Day 8: Set default to prettiness
by Danielle Navarro, 04 May 2018
The `default` package is pretty simple. It has three functions:
- `default` allows you to check the default values of a function’s arguments
- `default<-` allows you to change the default values
- `reset_default` restores the defaults to their original values
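A quick sketch of all three in action. Note that `sd()` is my choice of example here, not one from the post, and it is a plain function rather than an S3 generic, so everything works directly:

```r
library(default)   # the package being reviewed

# check the current default values of sd()'s arguments
default(sd)

# change them: make sd() drop missing values by default
default(sd) <- list(na.rm = TRUE)
sd(c(1, 2, NA))    # now returns a number rather than NA

# restore the original defaults
sd <- reset_default(sd)
sd(c(1, 2, NA))    # NA again
```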
Hm. In principle this seems like a really nice thing to be able to do. In practice…
(tweet from @dataandme, 26 April 2018)
… oh yes. Can I really be trusted with the power to alter default settings???? Only time will tell, I suppose. Okay here goes!
What is my use case for this?
It has always annoyed me that functions like `mean` do not remove missing data by default. I know this makes me a bad person: after all, it’s often not safe to ignore `NA` values, which is why we have whole classes of imputation models for handling them. So it does make a lot of sense that the `mean` function sets `na.rm = FALSE` by default. Removing `NA` values genuinely should be a choice that the analyst is forced to actively make, rather than one that the software makes for them.
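To make the point concrete, in base R (nothing from the package yet):

```r
y <- c(3, 6, 2, 7, 2, 7, NA)

mean(y)                 # NA: missingness propagates unless I decide otherwise
mean(y, na.rm = TRUE)   # 4.5: removing the NA is my explicit choice
```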
Yes, yes, but… I get so tiiiiiired of typing `na.rm = TRUE` over and over again within a single analysis. Okay, sure, I agree that it’s on me to make the active choice to ignore missing data, but is it really necessary for me to keep typing in my decision, using my fingers like a sucker?
So I can imagine a workflow that goes like this:
- Think about what I want to do
- Set my defaults for this analysis
- Do the analysis
- Reset the defaults to their original values
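The steps above, sketched with the post’s running example. This assumes the `default` package is installed, and it jumps straight to targeting `mean.default` rather than `mean` — the reason for that comes up below:

```r
library(default)

# 1. think: for this analysis I'm happy to drop NAs everywhere
# 2. set my defaults for this analysis
default(mean.default) <- list(na.rm = TRUE)

# 3. do the analysis
mean(c(3, 6, 2, 7, 2, 7, NA))   # 4.5: the NA is dropped without my asking

# 4. reset the defaults to their original values
mean.default <- reset_default(mean.default)
mean(c(3, 6, 2, 7, 2, 7, NA))   # back to NA
```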
That seems sensible to me, and it might even produce cleaner code, because I would have one section at the top where I make my assumptions explicit via the “defaults”, but then during the analysis itself the code doesn’t get cluttered by having to reassert my assumptions every single time I call `lm` or whatever. So that might be quite nice!
What does worry me is the thought of taking a “set and forget” approach to defaults. I could easily imagine myself ending up in a situation where I change my defaults, forget that I have done so, and then get confused that my analyses don’t look the same as someone else’s. That sounds like a nightmare.
Working on the assumption that I would have the discipline to always reset my defaults, let’s have a go at playing around with defaults.
Taking it for a spin…
Right, so my first attempt to set a default didn’t work:
```r
default(fun = mean) <- list(na.rm = TRUE)
## Error: 'na.rm' is not an argument of this function
```
But `na.rm` absolutely is an argument to the `mean` function, what the… oh, right. `mean` is a generic S3 function. Sigh. So presumably I need to set the default for the, um, default method:
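Base R makes it easy to see what’s going on here: `mean` itself just dispatches, and the actual defaults live in the methods.

```r
# mean is only a stub that dispatches on the class of its argument
print(mean)

# the functions that do the real work, including mean.default
print(methods(mean))
```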
```r
default(fun = mean.default) <- list(na.rm = TRUE)
```
Okay, that seems to have worked. Let’s have a look:
```r
# some data
x <- c(3, 6, 2, 7, 2, 7)
y <- c(3, 6, 2, 7, 2, 7, NA)
mean(x)
## 4.5
mean(y)
## 4.5
```
The `mean` function is now ignoring `NA` values by default. Next, let’s make sure I can reset the defaults:
```r
mean.default <- reset_default(mean.default)
mean(x)
## 4.5
mean(y)
## NA
```
It works! 🎉
Tidy code for histograms
A really nice application I can see myself relying on a lot is specifying the default styling for my plots within a single analysis. By using the `default` function to do this, I can push the “ancillary” code specifying the visual style for the plots into the header of my script, leaving the functional code for the analysis looking nice and tidy. I imagine there are better ways to do this using ggplot, but honestly I find there are a lot of situations (especially when computational modelling is involved) where ggplot isn’t really all that appropriate, and base graphics is still the best tool for the job. So it’s nice to be able to work with base graphics using clean code!
```r
library(magrittr)  # for the %$% exposition pipe

# set my defaults & load my data
default(hist.default) <- list(
  main = "",
  col = "#88398A",
  border = "white"
)
afl <- read.csv(file = "https://djnavarro.net/files/afl24.csv")

# analysis
head(afl)
afl %$% hist(home.score)
afl %$% hist(away.score)

# reset the defaults
hist.default <- reset_default(hist.default)
```
In this situation, when I come back to look at this analysis several months (years) later, I can look at it and just “see” what it’s doing, without having to sort through many, many arguments to work out which things are “substantive” and which are not. Here’s the output:
```
##          home.team away.team home.score away.score year round weekday day
## 1  North Melbourne  Brisbane        104        137 1987     1     Fri  27
## 2 Western Bulldogs  Essendon         62        121 1987     1     Sat  28
## 3          Carlton  Hawthorn        104        149 1987     1     Sat  28
## 4      Collingwood    Sydney         74        165 1987     1     Sat  28
## 5        Melbourne   Fitzroy        128         89 1987     1     Sat  28
## 6         St Kilda   Geelong        101        102 1987     1     Sat  28
##   month is.final              venue attendance
## 1     3    FALSE                MCG      14096
## 2     3    FALSE      Waverley Park      22550
## 3     3    FALSE       Princes Park      19967
## 4     3    FALSE      Victoria Park      17129
## 5     3    FALSE                MCG      18012
## 6     3    FALSE Gold Coast Stadium      15867
```