Day 39-46: On error bars and other simple data summaries

This series of posts has been on hold for the last few days because I’ve been solo parenting and had a few deadlines at work. I have no idea how single parents cope. It’s not like this is my first time at the rodeo, of course - probably the roughest solo parenting exercise was living out of the back of a RAV4 with two kids while trying to sell a house and move across the country (that one was tricky) - but every time my partner is gone for more than a few days I start to notice the extra workload, and all the optional things like writing blog posts or taking a shower tend to get put on hold.

At last though, I have a little time to myself. The kids have been out trampolining all day, we’re relaxing with our preferred screens, and I’ve been thinking a bit about statistics to take my mind off things. As usual, my inspiration for the post comes from twitter, and this is a tale of two tweets. First off, here’s me idly wondering about error bars, and what scientists typically prefer to display with error bars…

I admit I was surprised at how many people genuinely like standard errors. A few people have suggested to me that the reason they like standard errors is that they can run the “eyeball test” of significance - if error bars don’t overlap then the corresponding t-test is statistically significant (modulo all the usual caveats that attach to that). I’m sympathetic to that to some extent, but I still have worries that I’ll try to explain later in the post.

Using the ggstatsplot package

While I was thinking a little about data visualisation for simple statistical problems like “estimating a mean”, I saw this post from Indrajeet Patil pop up on my feed:

This ggstatsplot package sounds very neat. The idea, as I understand it, is to provide a very quick and simple way to produce a few standardised plots that would be useful in a lot of behavioural science contexts. To quote from the github page:

ggstatsplot is an extension of ggplot2 package for creating graphics with details from statistical tests included in the plots themselves and targeted primarily at behavioral sciences community to provide a one-line code to produce information-rich plots. Currently, it supports only the most common types of statistical tests (parametric, nonparametric, and robust versions of t-tets/anova, correlation, and contingency tables analyses). Accordingly, it produces limited kinds of plots: violin plots (for comparisons between groups or conditions), pie charts (for categorical data), scatterplots (for correlations between variables), and histograms (for hypothesis about distributions).

I think that’s a really handy thing to have. I can’t see myself ever using it to draw a figure that I’d publish in a paper, but I can see myself using it a lot during my first pass through a data set. To see why I like it, let’s say I’ve just run a simple between-subjects experiment with a single three-level factor, and I get data like this:

study <- tibble(
  group = rep(c("apples","oranges","bananas"), 20),
  outcome = rep(c(5.5,6,6.5), 20) + rnorm(60)
)  %T>% print
## # A tibble: 60 x 2
##    group   outcome
##    <chr>     <dbl>
##  1 apples     5.56
##  2 oranges    5.74
##  3 bananas    6.95
##  4 apples     6.23
##  5 oranges    5.88
##  6 bananas    7.21
##  7 apples     5.17
##  8 oranges    6.68
##  9 bananas    6.11
## 10 apples     7.80
## # ... with 50 more rows

Now, suppose I want to take a quick look at the study data and see if my results make any sense, at all. I don’t want to bother putting together a fancy production-ready graphic, but I do need something that is pretty enough not to give me a headache and make it easy to see what’s going on. This is absurdly easy to do with the ggbetweenstats function:

  data = study,
  x = group,
  y = outcome

The actual output also is kind enough to add some extra information about what assumptions the function has made and what tests it’s reporting. I think that’s a good thing, but I’m not at all interested in the statistical analysis in this context so I’ve just suppressed that part of the output to allow me to focus on pretty pictures! It’s nice:

  • The sample mean is displayed prominently (large red dots, labelled)
  • The raw data are present but unobtrusive in the background
  • It shows the traditional boxplot and a violin plot (plot.type=boxviolin)

If that seems a little too busy, it’s easy enough to keep just one of the two plots. So if I only want the box plot I can do this:

  data = study,
  x = group,
  y = outcome,
  plot.type = 'box'

I like this a lot. It’s so simple that I’d probably use it as a quick and dirty sweep through the data, and it’s something you can give to a new student who is just learning R and wanting to use it to explore their first experiment. It hits a sweet spot of being just rich enough to be informative while being simple enough not to be onerous.

Back to my puzzle

Okay, so why don’t I feel comfortable with drawing standard errors? Well, let’s suppose I drew standard error bars around the sample means, exactly like I’m supposed to. Here’s a pretty typical way I might display that:

study_stats <- study %>%
  group_by(group) %>%
    gp_mean = mean(outcome),
    n = n(),
    gp_se = sd(outcome) / sqrt(n)

p <-
    data = study_stats,
    mapping = aes(x = factor(group), y = gp_mean)
  ) +
    aes(ymin = gp_mean - gp_se,
        ymax = gp_mean + gp_se,
        width = .3),
    lwd = 1.5,
    colour = "grey50"
  ) +
  geom_point(size = 3) +


Obviously there’s less variability in the error bars because we’re expressing the standard error of the sample mean. This is an inferential statistic pertaining to the population parameter, not a summary of the sample characteristics. Interpreting this in the usual frequentist way, the standard error bars are random variables that are expected to trap the true mean 66% of the time. Fair enough. I’m not personally all that big a fan of thinking in frequentist terms, but I’m not doing yet another tedious Bayes versus frequentist thing in this post.

I’m thinking more in everyday scientific terms here, and thinking more about the perceptual properties of the data visualisation here. Suppose I do something that seems intuitive: I’ll replicate the experiment 100 times, and then plot the sample mean that emerges in each replication over the top of these plots:

study_rep <- tibble()

for(i in 1:50){
  study_tmp <- tibble(
    group = rep(c("apples","oranges","bananas"), 20),
    outcome = rep(c(5.5,6,6.5), 20) + rnorm(60)
  ) %>%
  group_by(group) %>%
    gp_mean = mean(outcome)
  study_tmp$rep <- i
  study_rep <- rbind(study_rep, study_tmp)

study_rep$xval <- (study_rep$group %>%
  as.factor %>% as.numeric) + .075

p2 <- p +
    data = study_rep,
    mapping = aes(x=xval, y=gp_mean, group=rep, colour=rep),
    alpha = .5,
    lty = 2
  ) +
    data = study_rep,
    mapping = aes(x=xval, y=gp_mean),
    width = .025
  ) +
  scico::scale_colour_scico(palette = "lajolla")


To me, that’s visually surprising. The statistician within me knows that this kind of thing is to be expected, because the error bars I generated from the first exercise are subject to sampling error, and the points generated in the replication are also subject to sampling error. I know this. I’m not dumb, honest I’m not.

But I’m still surprised every time. My visual system absolutely refuses to attach the “correct” interpretation of the standard error to the original picture. I look at the original set of error bars, and I can’t make myself believe that the level of uncertainty is as high as it really is. Compare and contrast. Here’s the original plot again:


Here’s what happens if I use ggbetweenstats to summarise the replication studies:

  data = study_rep,
  x = group,
  y = gp_mean,
  plot.type = 'violin'

For the life of me, I can’t see how I’m supposed to look at the first plot and know that I’m supposed to expect the second? It feels deeply weird to me.


I should also mention that the ggstatsplot package has a few other nice plots too. It handles histograms like this…

  data = study,
  x = outcome,
  binwidth = .5

Correlation matrices appear like this…

mu <- rep(0,10)
sig <- diag(10)+ matrix(runif(100)-.3,10,10)
sig <- sig + t(sig)
corrs <- mvtnorm::rmvnorm(100, mean=mu, sigma=sig) %>%
  data = corrs,
  cor.vars = 1:10

… and so on. There are a few other things in there, and the package is a work in progress, so I imagine some of this will change and more options will become available later.

Danielle Navarro
Associate Professor of Cognitive Science