Day 8: Set default to prettiness

by Danielle Navarro, 04 May 2018

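The default package is pretty simple. It has three functions:

  • default() shows the default arguments of a function
  • default<- changes them
  • reset_default() restores the original defaults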

Hm. In principle this seems like a really nice thing to be able to do. In practice…

… oh yes. Can I really be trusted with the power to alter default settings???? Only time will tell, I suppose. Okay here goes!

What is my use case for this?

It has always annoyed me that functions like mean do not remove missing data by default. I know this makes me a bad person: after all, it’s often not safe to ignore NA values, which is why we have whole classes of imputation models for handling them. So it does make a lot of sense that the mean function sets na.rm = FALSE by default. Removing NA values genuinely should be a choice that the analyst is forced to actively make, rather than one that the software makes for them.

Yes, yes, but… I get so tiiiiiirred of typing na.rm = TRUE over and over again within a single analysis. Okay, sure, I agree that it’s on me to make the active choice to ignore missing data, but is it really necessary for me to keep typing in my decision, using my fingers like a sucker?

So I can imagine a workflow that goes like this:

  • Think about what I want to do
  • Set my defaults for this analysis
  • Do the analysis
  • Reset the defaults to their original values

That seems sensible to me, and it might even produce cleaner code, because I would have one section at the top where I make my assumptions explicit via the “defaults”, but then during the analysis itself the code doesn’t get cluttered by having to reassert my assumptions every single time I call mean or lm or whatever. So that might be quite nice!

What does worry me is the thought of taking a “set and forget” approach to defaults. I could easily imagine myself ending up in a situation where I change my defaults, forget that I have done so, and then get confused that my analyses don’t look the same as someone else’s. That sounds like a nightmare.

Working on the assumption that I would have the discipline to always reset my defaults, let’s have a go at playing around with them.
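An illustrative base-R sketch (not something the default package provides) can make the reset step automatic, which takes the “set and forget” risk off the table: put the changed default inside the function that does the analysis. It relies on S3 dispatch finding a method defined in the environment where mean() is called. The function name summarise_scores and the input values are hypothetical.

```r
# Hypothetical analysis function (base-R sketch, not the default package):
# the changed default exists only while this function is running
summarise_scores <- function(home, away) {
  # "Set my defaults": a local copy of mean.default with na.rm = TRUE.
  # S3 dispatch finds this copy first because it lives in the
  # environment where mean() is called.
  mean.default <- base::mean.default
  fmls <- as.list(formals(mean.default))
  fmls$na.rm <- TRUE
  formals(mean.default) <- fmls

  # "Do the analysis"
  c(home = mean(home), away = mean(away))
}

# "Reset" is automatic: outside the function, base R's default still applies
summarise_scores(c(100, NA, 80), c(90, 70, NA))  # home = 90, away = 80
mean(c(100, NA, 80))                             # NA
```

The trade-off is that the modified default is invisible outside summarise_scores, so nothing leaks into later analyses, but it also has to be repeated in every function that wants it.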

Taking it for a spin…

Right, so my first attempt to set a default didn’t work:

library(default)
default(fun = mean) <- list(na.rm = TRUE)
## Error: 'na.rm' is not an argument of this function

Say what? na.rm absolutely is an argument to the mean function, what the… oh, right. mean is an S3 generic: its own formal arguments are just x and the dots, and na.rm belongs to the method that does the actual work. Sigh. So presumably I need to set the default for the, um, default method, mean.default.
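This can be confirmed with base R’s formals(), which lists the arguments a function declares. The generic only declares x and ..., while na.rm shows up in the method:

```r
# The generic just dispatches via UseMethod("mean"); it declares only x and ...
names(formals(mean))          # c("x", "...")

# na.rm is declared by the method that handles ordinary vectors
names(formals(mean.default))  # c("x", "trim", "na.rm", "...")
```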

default(fun = mean.default) <- list(na.rm = TRUE)

Okay, that seems to have worked. Let’s have a look:

# some data
x <- c(3,6,2,7,2,7)
y <- c(3,6,2,7,2,7,NA)
mean(x)
mean(y)
## [1] 4.5
## [1] 4.5

Yep, the mean function is now ignoring NA values by default. Next, let’s make sure I can reset the defaults:

mean.default <- reset_default(mean.default)
mean(x)
mean(y)
## [1] 4.5
## [1] NA

It works! 🎉
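For comparison, the same set-then-reset cycle can be approximated in base R with formals<-. This is a sketch, not necessarily how the default package works internally: set_defaults() is a hypothetical helper, and “resetting” simply deletes the modified copy so that S3 dispatch falls back to base R’s mean.default.

```r
# Hypothetical helper (not part of the default package): return a copy of `f`
# with the defaults of the named arguments replaced
set_defaults <- function(f, ...) {
  new_defaults <- list(...)
  fmls <- as.list(formals(f))
  unknown <- setdiff(names(new_defaults), names(fmls))
  if (length(unknown) > 0) {
    stop("not an argument of this function: ", paste(unknown, collapse = ", "))
  }
  fmls[names(new_defaults)] <- new_defaults
  formals(f) <- fmls
  f
}

y <- c(3, 6, 2, 7, 2, 7, NA)

# Fails on the generic, because mean() only declares x and ...
try(set_defaults(mean, na.rm = TRUE))

# Shadow base R's method with a modified copy; dispatch from the global
# environment finds this copy before the version in base
mean.default <- set_defaults(base::mean.default, na.rm = TRUE)
mean(y)  # 4.5

# "Reset" by deleting the copy, so dispatch falls back to base::mean.default
rm(mean.default)
mean(y)  # NA
```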

Tidy code for histograms

A really nice application I can see myself relying on a lot is specifying the default styling for my plots within a single analysis. By using the default function to do this, I can push the “ancillary” code specifying the visual style for the plots into the header of my script, leaving the functional code for the analysis looking nice and tidy. I imagine there are better ways to do this using ggplot, but honestly I find there are a lot of situations (especially when computational modelling is involved) where ggplot isn’t really all that appropriate, and base graphics is still the best tool for the job. So it’s nice to be able to work with base graphics using clean code!

# set my defaults & load my data
library(default)
library(magrittr)  # provides the %$% exposition pipe used below
default(hist.default) <- list(main = "", col = "#88398A", border = "white")
afl <- read.csv(file = "")

# analysis
afl %$% hist(home.score)
afl %$% hist(away.score)

# reset the defaults
hist.default <- reset_default(hist.default)

In this situation, when I come back to this analysis several months (or years) later, I can just look at it and “see” what it’s doing, without having to sort through lots of arguments to work out which ones are “substantive” and which are not. Here’s the output:

##                              home.score away.score year round weekday day
## 1  North Melbourne  Brisbane        104        137 1987     1     Fri  27
## 2 Western Bulldogs  Essendon         62        121 1987     1     Sat  28
## 3          Carlton  Hawthorn        104        149 1987     1     Sat  28
## 4      Collingwood    Sydney         74        165 1987     1     Sat  28
## 5        Melbourne   Fitzroy        128         89 1987     1     Sat  28
## 6         St Kilda   Geelong        101        102 1987     1     Sat  28
##   month                       venue attendance
## 1     3    FALSE                MCG      14096
## 2     3    FALSE      Waverley Park      22550
## 3     3    FALSE       Princes Park      19967
## 4     3    FALSE      Victoria Park      17129
## 5     3    FALSE                MCG      18012
## 6     3    FALSE Gold Coast Stadium      15867
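The histogram styling can also be handled without touching the original function at all. As a sketch (pretty_hist is a hypothetical name, and this is plain base R rather than the default package), make a styled copy of graphics::hist.default, so there is no reset step to forget. The scores are the six home.score values from the rows printed above.

```r
# Styled copy of base R's histogram method; graphics::hist.default itself
# is left untouched, so there is nothing to reset afterwards
pretty_hist <- graphics::hist.default
fmls <- as.list(formals(pretty_hist))
fmls[c("main", "col", "border")] <- list("", "#88398A", "white")
formals(pretty_hist) <- fmls

# home scores from the first six rows of the AFL data shown above
home.score <- c(104, 62, 104, 74, 128, 101)
pretty_hist(home.score)
```

Because the copy keeps hist.default’s environment, it behaves like the original in every other respect; only the three styling defaults differ.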