Reproducible Research and Software in Python

This presentation by Quentin Peter focuses on the importance of reproducible research and software development, particularly in a Python environment. It delves into key aspects such as ensuring replicability, version control, automation, and documentation. Practical examples and strategies are provided to help researchers establish efficient workflows that enhance reproducibility in their work.


Uploaded on Oct 05, 2024


Presentation Transcript


  1. Reproducible Research and Software, with a Python focus. Quentin Peter

  2. Aims of the talk: Make you think about the questions you should ask yourself. Give you pointers to tools you could use. Do your own research to find the best setup for you! (At the end, I give an example of step-by-step instructions.)

  3. Examples of irreproducible research: Could someone replicate your analysis without your help? Could you replicate an analysis you did 6 months ago and get the same results? Do you have multiple versions of the same script with suffixes like _new / _final / _3? Do you still have the raw data? Is it backed up only on a single computer / hard drive somewhere? Does your analysis include large parts you do manually?

  4. Plan: What did you do? Why did you do it? How did you set up everything? When and what did you change? How can we access your work? See http://t-redactyl.io/blog/2016/10/a-crash-course-in-reproducible-research-in-python.html

  5. What did you do? Goal: I can easily go from raw data to the figure in your paper. E.g. write metadata in a JSON file. Best way: automate everything in a script. Worst way: do everything by hand and write patchy or no instructions. It is very hard to remember exactly what you did. Compromise: do-nothing scripts. They work like good documentation, and each step is easily replaced by actual code. Boring step: tidy the script after you're done. https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-the-key-to-gradual-automation/
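
A do-nothing script, as described in the linked post, can be sketched as follows. This is a minimal illustration, not the author's own script: the step names, the `data_dir` and `figure` keys, and the `run` helper are all hypothetical.

```python
"""Do-nothing script sketch: each step only prints instructions for now."""


def load_raw_data(context):
    # Hypothetical manual step: replace the message with real code later.
    return f"Copy the raw data files into {context['data_dir']}."


def make_figure(context):
    # Hypothetical manual step: replace the message with real code later.
    return f"Run the plotting notebook and save the result as {context['figure']}."


STEPS = [load_raw_data, make_figure]


def run(steps, context, wait=input, show=print):
    """Walk through the steps in order, pausing after each one."""
    for number, step in enumerate(steps, start=1):
        show(f"Step {number} ({step.__name__}): {step(context)}")
        wait("Press Enter when this step is done. ")
    return len(steps)
```

Calling `run(STEPS, {"data_dir": "data/raw", "figure": "figure_1.pdf"})` prints one instruction at a time. Once a step is automated, its function does the work itself instead of describing it, and the rest of the script stays unchanged.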

  6. Plan: What did you do? Why did you do it? How did you set up everything? When and what did you change? How can we access your work?

  7. Why did you do it? What assumptions do you build into your script / process? Write comments in the script. Better: use literate programming (mix code and text), e.g. a Jupyter notebook.
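
One way to record assumptions is to state them in a docstring and check them in code. The sketch below is only an illustration: the function, the units and the encoding of failed trials are invented for the example.

```python
def mean_reaction_time(times_ms):
    """Return the mean reaction time in milliseconds.

    Assumptions (hypothetical, for illustration):
    - times are in milliseconds, not seconds;
    - failed trials were recorded as negative values and must be dropped.
    """
    valid = [t for t in times_ms if t >= 0]
    # Fail loudly rather than silently return a meaningless number.
    if not valid:
        raise ValueError("no valid trials: check how failed trials are encoded")
    return sum(valid) / len(valid)
```

A reader who disagrees with an assumption can then find it in one place, instead of having to infer it from the results.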

  8. Plan: What did you do? Why did you do it? How did you set up everything? When and what did you change? How can we access your work?

  9. How did you set up everything? What is the result of this Python code? a = 3 / 2; print(a) It depends on the version of Python: 1 in Python 2, 1.5 in Python 3 (use 3). What are the arguments to the function numpy.histogram? Changed five times. (This is bad.) What happens when indexing an array with floats in numpy? You can't with new versions of numpy. (This is good.) https://python3statement.org/ https://numpy.org/neps/nep-0023-backwards-compatibility.html
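
The division example can be made explicit as below. This is a small sketch: the version check is one possible guard I am suggesting here, not something taken from the slides.

```python
import sys

# Refuse to run on an interpreter the analysis was never tested with.
if sys.version_info < (3,):
    raise RuntimeError("this analysis requires Python 3")

# True division: this line gave 1 under Python 2 and gives 1.5 under Python 3.
a = 3 / 2

# Floor division states the intent explicitly and gives 1 in both versions.
b = 3 // 2

print(a, b)
```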

  10. How did you set up everything? Scripts may crash (bad) or produce different results (very bad) when using different versions. Get the versions of all the packages you are using: pip freeze > spec-file-pip.txt (for pip), or conda list --explicit > spec-file.txt (for conda).
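
The same information can also be collected from inside a script and stored next to the results. The sketch below uses the standard library module importlib.metadata; the function names and the JSON layout are assumptions made for this example, and it complements rather than replaces pip freeze.

```python
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Collect the Python version and the version of every installed distribution."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages[name] = dist.metadata.get("Version")
    return {"python": platform.python_version(), "packages": packages}


def save_snapshot(path):
    """Write the snapshot as sorted, human-readable JSON and return it."""
    snapshot = environment_snapshot()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle, indent=2, sort_keys=True)
    return snapshot
```

Calling `save_snapshot("environment.json")` at the start of an analysis ties each set of results to the versions that produced it.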

  11. How did you set up everything? Use venv, virtualenv or conda environments: https://virtualenv.pypa.io/en/stable/ https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/ https://medium.com/slalom-technology/reproducible-data-science-environment-with-virtualenv-29a663018a72 https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html Use Docker: https://dev.to/rosejcday/python-and-jupyter-notebooks-23h5 Make a copy of your computer with a disk image. (Overkill, except if you use network.)
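
A virtual environment is usually created on the command line with python -m venv, but the standard library venv module can do the same from a script. A minimal sketch, where the helper name and the example directory are my own choices:

```python
import venv
from pathlib import Path


def create_environment(directory, with_pip=True):
    """Create a fresh virtual environment in `directory`.

    clear=True deletes anything already in that directory, so rerunning the
    function always starts from an empty environment.
    """
    venv.create(directory, with_pip=with_pip, clear=True)
    # pyvenv.cfg is written by venv and marks the directory as an environment.
    return Path(directory) / "pyvenv.cfg"
```

For example, `create_environment("env")` builds an environment in ./env, into which the pinned dependencies can then be installed.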

  12. Plan: What did you do? Why did you do it? How did you set up everything? When and what did you change? How can we access your work?

  13. When and what did you change? Possibility to go back. Find when you broke something. Use git. I like GitKraken (www.gitkraken.com).
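
Once the code is in git, a script can note which commit produced a result. This is not covered in the slides; the sketch below is one possible approach, and the helper name is hypothetical. It calls the standard git rev-parse HEAD command and returns None when no commit can be found.

```python
import subprocess


def current_commit(repository="."):
    """Return the hash of the checked-out commit, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repository,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or the directory does not exist.
        return None
    if result.returncode != 0:
        # Not a git repository, or no commit has been made yet.
        return None
    return result.stdout.strip()
```

Writing this hash into the output files makes it possible to go back to the exact code behind a figure.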

  14. Plan: What did you do? Why did you do it? How did you set up everything? When and what did you change? How can we access your work?

  15. How can we access your work? In 50 years? Three important parts of your work: The paper: university repository (https://elements.admin.cam.ac.uk). The code: university git / GitHub (https://git.uis.cam.ac.uk). The data: university repository (https://www.repository.cam.ac.uk/). Not some Dropbox!!!

  16. Summary: What did you do? Use code, clean it up. Why did you do it? Comment code, use Jupyter. Link the code and the paper. How did you set up everything? Give version numbers. Docker? When and what did you change? Use git. How can we access your work? Upload code to GitHub, data to the university repository. Link to the paper with a DOI.

  17. Summary: None of these solutions is the only one, but you must ask yourself these five questions. You should now be able to access your research for years to come! Questions? And now: my personal recommendation, in not that many steps.

  18. Step-by-step instructions for Python: The code. Create a git repository and find somewhere to host it (private GitHub / git.uis.cam.ac.uk). Learn to use git! (For example .gitignore.) It saves the code and I can easily come back. I can see what I have edited during the day! Write your code until it works, and try to comment important parts. Clean the code, comment it. No "# Add numbers" comments. Variables are snake_case, no Z, x, i.
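
The naming and commenting advice can be illustrated with a before and after. Both functions are invented for this example; the dosing protocol mentioned in the comment is hypothetical.

```python
# Harder to reuse: cryptic names, and a comment that only repeats the code.
def f(Z, x):
    # Add numbers
    return Z + x


# Easier to reuse: snake_case names, and a comment that records the reason.
def total_dose_mg(morning_dose_mg, evening_dose_mg):
    # Doses are summed per day because the (hypothetical) protocol reports daily totals.
    return morning_dose_mg + evening_dose_mg
```

Both functions compute the same thing; only the second tells a reader what the numbers mean and why they are added.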

  19. Step-by-step instructions for Python: Installation. Package your code properly (packaging.python.org). You should be able to install it with pip install. List all the dependencies! Create a venv (docs.python.org/3/tutorial/venv.html). Install the package in the venv. If it doesn't work, delete the venv, update setup.py, and create a new one. Save the output of pip freeze in the git repository. Create a standalone distribution (Docker, PyInstaller (www.pyinstaller.org), ...) if you are really motivated.
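
A saved pip freeze file is most useful if you can check an environment against it later. The sketch below is one way to do that with the standard library; the function names are hypothetical, and it only handles simple name==version lines, skipping editable installs and other forms.

```python
from importlib import metadata


def parse_freeze(text):
    """Read `name==version` lines, as written by `pip freeze`, into a dict."""
    pinned = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, options such as "-e", and lines without a simple pin.
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pinned[name.strip()] = version.strip()
    return pinned


def find_mismatches(pinned):
    """Return {name: (pinned_version, installed_version_or_None)} for every difference."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

An empty result from `find_mismatches(parse_freeze(saved_text))` means that every pinned package is installed at the recorded version.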

  20. Step-by-step instructions for Python: Share the code! Add your data DOI (from www.repository.cam.ac.uk) to the README, with instructions for running the code on the data. Add the paper DOI to the README (open access is better). Upload the code to GitHub and create a release. Use zenodo.org to get a DOI to add to the paper.

  21. Questions? Thanks for listening!
