Image credit: Photo by Leo Rivas on Unsplash

Day 82-94: Idle thoughts on my workflow

So this is a post about how I set up one part of my workflow. I feel nervous about it for two reasons:

  • I’m afraid that I’m doing it wrong and someone will laugh at me
  • I’m afraid that I’m doing it right and I’ll sound like one of those people bragging about their awesome gaming rig or whatever

Yes, I realise that I’m setting myself up to feel bad. I should stop that.

Why write this?

Why am I bothering to document this at all, if it makes me feel nervous? Well, the other day I made a tiny modification to my everyday workflow, at least as it relates to my “intro to R” lecture notes, and posted about it on twitter just because it made me happy that I could make my own process a little smoother.

I was expecting maybe one or two favourites from my friends who also enjoy tinkering with these things. I wasn’t expecting over 100 likes, and definitely wasn’t expecting anyone to encourage me to write a blog post about my setup!

I don’t feel particularly confident with how I’ve set up my workflow, and I certainly don’t feel like I can give any useful advice, so why would anyone want to hear from me? In retrospect I guess it does make sense though. Everyone struggles to work out how to organise their projects in a way that suits their situation, their skills and their idiosyncratic style. So we’re all clutching at straws here, hoping to find some good tips by looking at what other people do. So even though I make no claims whatsoever as to competence, here are the constraints I’m working with and what I’ve ended up doing for the PSYR website…

The constraints

I feel like everyone ends up with a different set up for all sorts of reasons. I don’t believe there’s such a thing as good workflow or bad workflow, not inherently. There’s just “something that works for me” and “something that doesn’t work for me”. Often the reason something works or doesn’t work for you is historical – you learned this tool because of that situation, and it’s sometimes smarter to repurpose a tool you’re comfortable with than learn a completely new one. Life is short, and learning is hard. So for me, the workflow I have set up for managing R for Psychological Science isn’t just an adaptation to the specific problem of “write these lecture notes”: it’s also an adaptation to the broader problem of “working as a computational cognitive scientist”.

My workflow for PSYR sits in this context:

  • Most of my websites are “single serve” sites that I use to run behavioural experiments. They usually rely on client side javascript (especially jsPsych) to do the work, but there’s a tiny amount of server side code required. At a minimum, the data have to be written to the server. Using Drew Hendrickson’s code as my starting point, I use Google App Engine to host the sites. An extremely poorly documented template for setting up one of those sites is here.

  • I use my lab website as a way of linking to all my student and postdoc websites (when they exist… hint, hint, students 😀), archiving publications and live demonstrations of experiments,1 highlighting research and teaching, and creating pages associated with different projects and classes. The main site doesn’t have to be updated often, but it does need to be flexible, so it’s also hosted as an app engine site: here it is as an appspot subdomain.2

  • I also have a bunch of static sites that I administer, generated locally with blogdown (e.g. rladiessydney.org, djnavarro.net, thetranslobby.com), or just plain bootstrap (unsw-psych-women.org). Because I am lazy, they also use Google App Engine.

The PSYR project also sits in this context:

  • 90% of my stuff is stored in my Dropbox folder. It’s lightweight, low effort and easy to share with collaborators who don’t use git (i.e. most of them).
  • 5% of my stuff really needs proper version control, so it becomes a repository on GitHub. I’m still a git novice, so I don’t use git unless I have to.
  • 5% of my stuff disappears randomly because I am disorganised 😀

Mostly this works fine, but there are occasional annoyances. As far as I can tell, git and dropbox don’t play nicely with each other. Any time I have created a git repository inside a dropbox folder, bad things have happened (and a bit of googling confirms that I’m not the only one who has learned this lesson the hard way). As a consequence, all my local files are separated into “GitHubLand” and “DropboxLand”. The lab website lives in DropboxLand because that’s just easier… but I want the R for Psychological Science project to live in GitHubLand.

This is less than ideal, and produces a weird break in my workflow that I needed to fix.

The set up for PSYR

The PSYR website is as simple as I could manage. It’s an R Markdown website, not blogdown or bookdown. I have a lot of love for both blogdown and bookdown, but they’re more complicated tools (blogdown, for instance, mixes Hugo static site generation with R Markdown) and I just don’t need that complexity. My set up is already a mess, and the last thing I want is even more moving parts!

With that in mind, if you look at how PSYR is organised on the github repo, it’s basically a flat directory structure, and I render the site locally. Inside the psyr_tools.R file there’s this call to render_site:

library(rmarkdown)
library(here)

# render the site in a new environment: handy to make sure
# that nothing in the RMD files is evaluating in .GlobalEnv
psyr_render <- function() {
  render_site(input = here(), envir = new.env())
}

One thing I realised is kind of handy: evaluating within a new environment is a great way to catch some annoying issues with my code. By default the side effects of some functions (e.g. data()) happen in the global environment rather than the calling environment, which is super annoying in R Markdown because variables end up showing up in the wrong place and there’s no error message to catch it. By having render_site evaluate everything outside of .GlobalEnv, it becomes really obvious when I’ve forgotten to deal with this, because all of a sudden iris (or whatever) shows up in my global environment when it ought to have been kept within the local environment for the R Markdown document.
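
To see the kind of leak I mean, here’s a tiny sketch (this isn’t part of psyr_tools.R, it’s just an illustration): data() has an envir argument that defaults to .GlobalEnv, so even when code runs inside its own environment the dataset quietly lands in the global workspace.

# illustration only, not from psyr_tools.R: data() defaults to
# envir = .GlobalEnv, so the dataset escapes the sandbox environment
sandbox <- new.env()
local(data(iris), envir = sandbox)
exists("iris", envir = sandbox, inherits = FALSE)     # FALSE
exists("iris", envir = globalenv(), inherits = FALSE) # TRUE: iris leaked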

The next thing I realised is that my workflow when writing PSYR is really scattered. I tend to update multiple sections at the same time because I realise I need to link different things together. From a git perspective this really strongly argues for frequent commits, so that I don’t end up with one commit every week with a completely useless message reading “updated everything”. However, what I was finding is that this interrupts my workflow, because I’m constantly jumping out of RStudio to the terminal3 in order to commit.4 So I added these functions to psyr_tools.R:

# git commit from R console
psyr_commit <- function(message) {
  message <- paste0('"',message,'"')
  cmd <- paste(here("_shell","git_commit.sh"), message)
  system(cmd)
}

# git push from R console
psyr_push <- function() {
  system(here("_shell","git_push.sh"))
}

# git status from R console
psyr_status <- function(){
  system(here("_shell","git_status.sh"))
}

They’re not doing anything interesting; the shell scripts just contain the bash commands to do super basic git operations. Here’s the entirety of git_commit.sh:

#!/usr/bin/env bash

COMMIT_MSG="$1"

cd ~/GitHub/psyr
git add --all
git commit -m "$COMMIT_MSG"
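
(As an aside: I could presumably skip the intermediate shell script altogether and call git straight from R with system2(). That’s not what psyr_tools.R does, but a rough sketch of that alternative might look like this, with the repo path hard-coded the same way.)

# hypothetical alternative, not in psyr_tools.R: call git directly
# from R with system2() instead of going through a shell script
psyr_commit_direct <- function(message, repo = "~/GitHub/psyr") {
  owd <- setwd(path.expand(repo))  # move into the repo...
  on.exit(setwd(owd))              # ...and move back when done
  system2("git", c("add", "--all"))
  system2("git", c("commit", "-m", shQuote(message)))
}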

The last interruption I wanted to banish is the tediousness of deploying the PSYR site. It’s super annoying that PSYR lives in GitHubLand on my local machine, but the local copy of the lab website lives in DropboxLand. For ages I was manually doing a drag and drop of the static site in ~/GitHub/psyr/_site to the website staging location ~/Dropbox/Webpage/ccss/psyr, then switching to the terminal to deploy to Google App Engine. It’s not a hugely time consuming job, but it forces me to switch from thinking about R, to thinking about my file system, and then to thinking about app engine, and then I have a headache and need to take a lie down. Eventually I realised that this was stupid, because every single step of this operation has a simple bash command that I can wrap into another R function. So I added this to my psyr_tools.R file:

psyr_deploy <- function() {
  system(here("_shell","gae_deploy.sh"))
}

and added this to my tiny library of shell scripts:

#!/usr/bin/env bash

# the google sdk manages the path for gcloud through
# bash profile: need to reload for the script
source ~/.bash_profile

# copy static files from PSYR site to GAE site
cp -r ~/GitHub/psyr/_site/* ~/Dropbox/Webpage/ccss/psyr

# deploy (quietly) from GAE home
cd ~/Dropbox/Webpage/ccss
gcloud app deploy --quiet --project=compcogscisydney

and now I’m done.5 It’s kind of trivial, but I find it amazingly helpful. Because I’ve created R wrappers for everything, I find I never have to stop thinking about R. My whole process for checking, committing, pushing to GitHub and deploying to GAE now sits in R, and the commands are intuitive:

psyr_status()
psyr_commit("message")
psyr_push()
psyr_deploy()

Obviously, I don’t push or deploy as often as I commit, but just knowing that I don’t have to switch my pattern of thinking out of R in order to do any of these things makes me much more willing to do them.
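
And if I ever get even lazier, nothing stops me chaining them into a single wrapper. This one isn’t in psyr_tools.R yet, it’s just the obvious next step:

# hypothetical convenience wrapper (not currently in psyr_tools.R):
# commit, push and deploy in one go once I'm happy with the site
psyr_ship <- function(message) {
  psyr_commit(message)
  psyr_push()
  psyr_deploy()
}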

Yay!

That said… I honestly don’t feel all that confident that I’m doing any of this “right”. If anyone reading this has suggestions for how I could smooth it out or simplify it further, I’m always happy to learn!


  1. I feel like one day I need to learn Docker???? That’s got to be a useful tool to help with archiving, right???

  2. I register my websites using Google Domains and then set up a DNS redirect - it’s pretty easy to do, since both App Engine and Domains have such clear instructions that even someone like me who doesn’t really understand the internet can follow them.

  3. Yes, I realise there’s a terminal in RStudio now. When did that happen??? That does help, but I still found that every single time I had to stop and remember the command I wanted, so eventually I decided to automate it.

  4. I admit it - I use the command line for all my git stuff. I’m not a snob - I would love to use something like gitkraken or sourcetree or whatever, but honestly I find them super confusing. Every time I open one of them up I see this horrible clutter in the interface and think nah. 99% of what I do with git is (1) check status, (2) stage and commit, (3) push to github. I don’t want to learn a whole new application for that.

  5. There’s a part of me that is annoyed that the shell scripts all refer to the actual paths on my machine, but I guess that’s inevitable since these scripts are managing the relationship between two qualitatively different projects that just happen to be linked to each other. At least I’ve set it up so that it’s only this one tiny section of the code that does this, and none of it is within R.
