Chapter 9 Streamlined workflows with code

Alternative title: “coding for not just you in this moment”

We will discuss good coding practices, for beginning and seasoned coders alike, that make it easier to work across people, time, and computers.

Most of this comes directly from Jenny Bryan & Jim Hester’s awesome course What They Forgot to Teach You About R. I highly recommend reading Chapters 1-4, which go into much better detail than the notes here.

Note: these are all general coding practices, not R-specific ones.

9.1 Source files

What are they and why?

Code that creates objects is “source code”. Source code lives in plain text files that you edit in a text editor and then execute in the console.

Examples:

  • .R, .Rmd
  • .py
  • .m

9.1.1 Save the source, not the workspace

Save the source code; do not save the R object itself.

Save your commands as a .R or .py (“script”), or .Rmd or .ipynb (“R Markdown” or “notebook”) file. It doesn’t have to be polished. Just save it!

Everything that really matters should be achieved through code that you save, including objects and figures. The contrast is storing them, implicitly or explicitly, as part of an entire workspace, or creating them by clicking with the mouse.
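As a minimal sketch of what “achieved through code” can look like in base R, the lines below create an object and a figure and save both explicitly. The output folder, the file names, and the choice of the built-in cars dataset are assumptions made only for illustration.

```r
# Hypothetical output folder; in a real project this would be a project subfolder.
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Create an object with code (the 'cars' dataset ships with R).
fit <- lm(dist ~ speed, data = cars)

# Save the object explicitly, instead of relying on a saved workspace.
fit_path <- file.path(out_dir, "fit_speed.rds")
saveRDS(fit, fit_path)

# Save the figure with code, instead of exporting it with the mouse.
fig_path <- file.path(out_dir, "speed_vs_dist.pdf")
pdf(fig_path, width = 6, height = 4)
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
abline(fit)
dev.off()

# In a later session, the saved object can be read back in.
fit_again <- readRDS(fit_path)
```

Because every step is in the script, re-running the file recreates both outputs from scratch.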

9.1.2 Always start R with a blank slate

Saving code is an absolute requirement for reproducibility.

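When you quit, do not save the workspace to an .RData file. When you launch, do not reload the workspace from an .RData file.

In RStudio, set this via Tools > Global Options: under General, uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”.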

9.1.3 Restart R often during development

“Have you tried turning it off and then on again?” – timeless troubleshooting wisdom, applies to everything

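If you use RStudio, use the menu item Session > Restart R.

RStudio also offers ways to pick up development where you left off, i.e. to “re-run all the code up to HERE”: in a .R file, use Code > Run Region > Run From Beginning To Line; in a .Rmd file, use Run All Chunks Above.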

9.1.4 Avoid rm(list = ls())

It’s common to see scripts begin with this object-nuking command: rm(list = ls())

This is highly suggestive of a non-reproducible workflow.

The problem with rm(list = ls()) is that, given the intent, it does not go far enough.

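It only deletes user-created objects from the global workspace. Attached packages, changed options, and the working directory all persist, so the script can still depend on whatever happened earlier in the session.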
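The following sketch illustrates the point with base R only. The option name demo_leftover is hypothetical, and the tools package is used simply because it ships with R but is not attached by default.

```r
# Set up the kind of state a script leaves behind in a session.
x <- 1:10                               # a user-created object
options(demo_leftover = "still here")   # hypothetical option, for illustration only
library(tools)                          # attach a package that ships with R

# The "object-nuking" command.
rm(list = ls())

# The object is gone ...
object_gone <- !exists("x", inherits = FALSE)

# ... but the option and the attached package survive.
option_remains  <- identical(getOption("demo_leftover"), "still here")
package_remains <- "package:tools" %in% search()

print(c(object_gone = object_gone,
        option_remains = option_remains,
        package_remains = package_remains))
```

All three values come out TRUE: the object was removed, but the rest of the session state was not.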

Instead, restart R!

9.2 Filepaths

Every saved thing gets a unique path.

Your code needs to run from somewhere specific. And when it interacts with other things (data or other code), you need to tell your code where things are.

The more deliberate you are about where things live,

  • The easier it will be for you and future you
  • The easier it will be for other people
  • The easier it will be on another computer

9.2.1 setwd("path/that/only/works/on/my/machine")

The chance of setwd() having the desired effect – making the file paths work – for anyone besides its author is 0%.

It’s also unlikely to work for the author one or two years or computers from now.

Hard-wired, absolute paths, especially when sprinkled throughout the code, make a project brittle. Such code does not travel well across time or space.

9.2.2 setwd()

But if you still decide to use setwd() in your scripts, you should at least be very disciplined about it:

Only use setwd() at the very start of a file, i.e. in an obvious and predictable place.

Always set the working directory to the same place, namely the top level of the project. Always build subsequent paths relative to that.
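As a rough sketch of “paths relative to the top level”, the lines below contrast a hard-wired absolute path with one built by file.path(). The folder layout (data/raw/survey.csv) and the absolute path are hypothetical.

```r
# Brittle: an absolute path that exists on one machine only (hypothetical).
brittle_path <- "/Users/me/Documents/projectA/data/raw/survey.csv"

# Portable: a path relative to the project's top-level folder.
# file.path() joins the pieces with "/", which R accepts on every platform.
portable_path <- file.path("data", "raw", "survey.csv")
print(portable_path)
# "data/raw/survey.csv"

# R resolves a relative path against the current working directory,
# which is why the working directory should be the project's top level.
resolved_path <- file.path(getwd(), portable_path)
print(resolved_path)
```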

9.2.3 R users: use the here package

here() builds paths to your project’s files relative to the project’s top-level folder, which it locates from the working directory at the time the package is loaded.

library(here)
here()
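Building on the two calls above, here is a small sketch of how here() can be used to point at a file. It assumes the here package is installed (the code is skipped otherwise), and the data/raw/survey.csv layout is hypothetical.

```r
# Only run if the 'here' package is available (an assumption of this sketch).
if (requireNamespace("here", quietly = TRUE)) {
  # With no arguments, here() returns the folder it treats as the project root.
  project_root <- here::here()
  print(project_root)

  # With arguments, here() appends the pieces to that root.
  # The file itself is hypothetical and does not need to exist to build the path.
  survey_path <- here::here("data", "raw", "survey.csv")
  print(survey_path)
}
```

The resulting path does not depend on which subfolder a script happens to be run from, as long as it is run from somewhere inside the project.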

9.3 Project oriented workflows

9.3.1 Dilemma and Solution

Problem statement:

We want to work on project A with the working directory set to path/to/projectA (my data analysis) and on project B with the working directory set to path/to/projectB (my teaching material).

But we also want to keep code like setwd("path/to/projectA") out of our scripts.

Solution:

Use an IDE that supports a project-based workflow.

An integrated development environment (IDE) offers:

  • a powerful, R-aware code editor
  • many ways to send your code to a running R process
  • other modern conveniences

And it eliminates:

  • temptation to develop code directly in the Console (instead: write it in a .R file!)
  • tension between development convenience and portability of the code

9.3.2 Organize your work into projects

Here’s what I mean by “work in a project”:

  • File system discipline: put all files related to a project in a designated folder.
    • This applies to data, code, figures, notes, etc.
    • Depending on project complexity, you might enforce further organization into subfolders.
  • Working directory intentionality: when working on project A, make sure the working directory is set to project A’s folder.
    • Ideally, this is achieved via the development workflow and tooling, not by baking absolute paths into the code.
  • File path discipline: all paths are relative — relative to the project’s folder.

Synergistic habits: you’ll get the biggest payoff if you practice all of them together.
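As one possible illustration of these habits, the sketch below sets up a project folder with subfolders and then refers to a file by a path relative to the project. The project name, the subfolder names, and the figure file name are hypothetical, and the project is created under tempdir() only so the example is safe to run.

```r
# Hypothetical project folder, created in a temporary location for this sketch.
project_dir <- file.path(tempdir(), "projectA")

# File system discipline: one designated folder, with subfolders by content type.
subfolders <- c("data", "code", "figures", "notes")
for (folder in subfolders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Check what was created inside the project folder.
created <- list.files(project_dir)
print(created)

# File path discipline: inside the project, refer to files relative to its folder.
figure_path <- file.path("figures", "fig1.pdf")
print(figure_path)
```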

Portability: the project can be moved around on your computer or onto other computers and will still “just work”. This is the only practical convention that creates reliable, polite behavior across different computers, users, and time. This convention is neither new nor unique to R.

It’s like agreeing that we will all drive on the left or the right. A hallmark of civilization is following conventions that constrain your behavior a little, in the name of public safety.

9.3.3 RStudio Projects

The RStudio IDE has a notion of a (capital “P”) Project, which is a very effective implementation of (small “p”) projects.

A Project has an .Rproj file in its folder, which is used to store settings specific to that project. Use File > New Project … to get started.

RStudio lets you keep several Projects open at once:

  • no danger of crosstalk: each Project has its own R process, global workspace, and working directory
  • each Project is the same “unit” as a GitHub repo

9.3.4 Tips for RStudio Projects

One suggestion for organizing:

Have a dedicated folder for your Projects.

  • If you have One Main Place for Projects, then go there in Finder/File Explorer and open the .Rproj file to launch any specific Project.
  • Mine is called “~/github/”.

Switching Projects: RStudio knows about recent Projects.

9.3.5 Name files deliberately

Jenny Bryan’s 3 rules for Naming Things:

  • machine readable
  • human readable
  • plays well with default ordering
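A short sketch of what these rules buy you in practice, using base R only. The file names are invented for illustration, and the convention shown (underscores between fields, hyphens within a field, ISO dates, zero-padded numbers) is one common way to satisfy the rules, not the only one.

```r
# Hypothetical file names: fields separated by "_", words within a field by "-".
files <- c("2023-02-01_survey_site-b.csv", "2023-01-15_survey_site-a.csv")

# Machine readable: the names can be split into metadata with one line of code.
stems  <- tools::file_path_sans_ext(files)
fields <- do.call(rbind, strsplit(stems, "_", fixed = TRUE))
colnames(fields) <- c("date", "study", "site")
file_info <- data.frame(fields, stringsAsFactors = FALSE)
file_info$date <- as.Date(file_info$date)
print(file_info)

# Plays well with default ordering: ISO dates (YYYY-MM-DD) sort chronologically.
# method = "radix" sorts in the C locale, so the result is the same everywhere.
sorted_files <- sort(files, method = "radix")

# Zero-padded numeric prefixes keep scripts in their intended order ...
padded <- sort(c("10_plot.R", "02_clean.R", "01_load.R"), method = "radix")
# "01_load.R" "02_clean.R" "10_plot.R"

# ... while unpadded prefixes do not.
unpadded <- sort(c("10_plot.R", "2_clean.R", "1_load.R"), method = "radix")
# "10_plot.R" "1_load.R" "2_clean.R"
```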

Her slides are available from Speaker Deck or as a PDF download.