Chapter 5 Python
Python is a powerful language for programming and data science and is widely used in scientific communities. The Python Docs website has links to news and resources, and the Hitchhiker’s Guide to Python provides a nice overview and links to many tutorials sorted by beginner, intermediate, and advanced.
Versions. Python 3, first released in 2008, introduced changes that are not backward compatible with Python 2, and it’s important to keep this in mind as you learn. Python 2 reached its official end of life in January 2020. A recommendation is that if you’re learning Python to create your own new workflows, focus on Python 3, and if you’ll primarily be running older workflows, focus on Python 2. Either way, the majority of your learning will be the same.
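To see which version a given interpreter is running, a minimal sketch using only the standard library follows. The helper name `describe_interpreter` is hypothetical and introduced here for illustration; the two differences it mentions (print and division) are only a small sample of what changed between versions.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Return a short description of a Python version tuple."""
    major, minor = version_info[0], version_info[1]
    if major < 3:
        note = "legacy Python 2: print is a statement, 1 / 2 gives 0"
    else:
        note = "Python 3: print() is a function, 1 / 2 gives 0.5"
    # str.format is used so the same source also parses under Python 2.
    return "{}.{} ({})".format(major, minor, note)


# Describe the interpreter that is running this script.
print(describe_interpreter())
```

Running this inside the environment a workflow uses is a quick way to confirm which of the two versions the tutorials below should match.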
Tip: Keep your eye on Scientific-Python, a new effort to coordinate scientific Python development, funded by the Moore Foundation and led by UC Berkeley.
R users. If you are an R user, consider using Python in RStudio with reticulate to reduce the overhead for getting started.
5.1 Getting a sense of Python
Reading blog posts is also a good way to get a sense of Python, and many of them let you follow along hands-on.
- Testers and Data Science - Jeff Nyman
- 4-part blog describing considerations for data science; you can follow hands-on with Python from Jupyter notebooks
- Python 3
- Python Recipe: Open a file, read it, print matching lines - Ben Welsh
- Nice narrative with minimal code describing wanting to search for a pattern in a file and save the search results
- Python 2
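The task described in the recipe above (searching a file for a pattern and saving the matches) can be sketched as follows. This is not the code from the blog post: it is a Python 3 illustration, and the function names `find_matching_lines` and `save_lines`, the sample text, and the file names are all hypothetical.

```python
import os
import re
import tempfile


def find_matching_lines(path, pattern):
    """Return the lines of a text file that contain a regex pattern."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8") as handle:
        # Keep only lines where the pattern occurs anywhere in the line.
        return [line.rstrip("\n") for line in handle if regex.search(line)]


def save_lines(lines, out_path):
    """Write the matching lines to a new file, one per line."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


# Small demonstration on a throwaway file (hypothetical sample data).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "notes.txt")
with open(source, "w", encoding="utf-8") as handle:
    handle.write("sample A: pass\nsample B: fail\nsample C: pass\n")

matches = find_matching_lines(source, r"pass")
save_lines(matches, os.path.join(workdir, "matches.txt"))
print(matches)
```

Returning the matches as a list before writing them keeps the search and the saving step separate, so either can be reused on its own.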
5.2 Learning Python
Programming with Python - The Carpentries (Software Carpentry)
- Assumes no previous coding experience, and setup recommends using Anaconda and Jupyter Notebooks. Recommended as an intro to how Python works for programming (see next).
- Python 3
Data analysis and visualization in Python for ecologists - The Carpentries (Data Carpentry)
- Not just for ecologists! Assumes no previous coding experience, and setup recommends using Anaconda and Jupyter Notebooks. Recommended as an intro to how to use Python for data analysis (see previous).
- Python 3
A Whirlwind Tour of Python - Jake VanderPlas
- Says it assumes familiarity with programming in another language, but is a nice overview for any beginner, and "particularly designed for those who wish to use Python for data science and/or scientific programming."
- Serves as an introduction to his longer book (listed next):
The Python Data Science Handbook - Jake VanderPlas
- Says it assumes familiarity with programming in another language, but is a nice start for any beginner
- Setup available to run code in Jupyter Notebooks and Google Colab
- Python 3
Python for Data Analysis - Luke Thompson
- Assumes no previous experience coding. Numbered lessons also include intro to the command line.
Scipy Lecture Notes - Edited by Varoquaux et al.
- Assumes some familiarity with coding concepts
- Python 3
Python Introduction - Google for Education
- Assumes strong familiarity with coding concepts, for example that you are learning Python as a second programming language.
- Python 2
5.2.1 Setting up a Python environment
- Creating a reproducible (python-centric) environment - Chris Sifuentes
- These slides from NDCN Office Hours describe what an environment is, why it’s important for reproducibility, and how to set it up in Python
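One thing an environment pins down is the exact version of each installed package. As a minimal sketch of that idea, assuming Python 3.8 or later (for the standard-library `importlib.metadata` module), the following lists what is installed in the current environment. It is not taken from the slides, and `snapshot_environment` is a hypothetical helper name; the output format resembles what `pip freeze` prints.

```python
from importlib import metadata


def snapshot_environment():
    """Return sorted 'name==version' strings for installed packages."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is missing a name
            pins.add("{}=={}".format(name, dist.version))
    # Sort case-insensitively so the listing is stable between runs.
    return sorted(pins, key=str.lower)


# Print one pinned package per line for the current environment.
for pin in snapshot_environment():
    print(pin)
```

Saving this listing alongside an analysis records which package versions produced the results, which is the reproducibility concern the slides address.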
5.3 Machine Learning
- Introduction to Machine Learning - Lawrence Carin, Coursera
- This course will “provide you a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) as well as demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.”