Background: Map of the Paris pneumatic tube system (Wikipedia)

Day 4: A series of tubes (magrittr)

Yesterday was supposed to be laundry day. I had a massive backlog of clothes that needed to be washed, another pile that needed folding, and several free hours in which I could put several loads through the washing machine. It didn’t quite go to plan: our neighbour has been doing some construction work and the drainage pipes haven’t been all that reliable. The first load I tried to put through caused water to back up through the drains, and suddenly half of the house is a wading pool. The kids were very pleased. My partner and I, less so.

Anyway, while bailing water out of the laundry and thinking about pipes, it occurs to me that I really ought to be using magrittr. It’s not that I don’t know how to use %>%, it’s just that I haven’t spent enough time thinking about what it means for my workflow. It’s a little shameful that I’ve never even bothered to look at the documentation for magrittr. Maybe I should do something about that…

A simple pipe

Writing your own pipe is really easy, as long as you don’t want it to be very good. Here I define my own pipe %-->% and use it to draw a barplot that counts the number of characters in every line of Greg! The stop sign! (as discussed in the emo-tism post from a couple of days ago). First we define the pipe function as a custom operator:

`%-->%` <- function(x,f) { f(x) }

Now I can use it the same way I would use the %>% operator in the magrittr package:

greg <- "https://djnavarro.net/files/gregthestopsign.txt" 
greg %-->% readLines %-->% nchar %-->% barplot

The thing that I – and just about everyone else in #Rstats land it seems – love about this way of coding in R is the fact that it’s a nice metaphor for data analysis. You have a data source (greg) that gets “piped” through several steps (in this case it’s a read > process > plot sequence) and then out “pops” the result. Writing the same code in a more conventional way is rather less visually satisfying:

greg <- "https://djnavarro.net/files/gregthestopsign.txt" 
gregData <- readLines(greg)
gregSummary <- nchar(gregData)
barplot(gregSummary)

In the piped example, you get an intuitive feeling of the greg data flowing through each of the steps in a straight line, whereas in the traditional example it feels more like greg is zigzagging through the code, jumping from left to right as you move down the code! To my mind there’s nothing “right” or “wrong” about either version, but the more I use pipes in my own code the happier I feel about it.
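To make the equivalence concrete, here is a small sketch that swaps the song for a made-up character vector (the `lyric_lines` values are placeholders, not the real lyrics) and swaps barplot for sum so the result is easy to compare. It only assumes the toy %-->% operator defined earlier:

```r
# The toy pipe from above: pass the left-hand value to the function on the right
`%-->%` <- function(x, f) { f(x) }

# Placeholder input standing in for the lines of the song
lyric_lines <- c("Ba-ba-ba", "ba-ba", "")

# Piped version: reads left to right
piped <- lyric_lines %-->% nchar %-->% sum

# Conventional version: the same steps, nested inside out
nested <- sum(nchar(lyric_lines))

identical(piped, nested)
```

Both versions run exactly the same function calls; the pipe only changes the order in which you read them.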

My pipe is broken

The thing about plumbing, I mused to myself while scowling at my daughter who is gleefully splashing water all over the cupboards, is that it looks easier than it really is. Any fool can duct tape some PVC together and call it plumbing, but if it leaks then tree roots will get in and then it’s a whole thing and, well…

Clunky metaphors notwithstanding, the problem with my %-->% function is that it only works when the object on the right hand side of the pipe is a function. If I want to customise the barplot that gets drawn at the end – say, by drawing it in R-Ladies purple – it’s not sufficient to pipe my data to the barplot function. I need to tell R to call barplot with the col="#88398A" argument. Intuitively it feels like what I want to do is this…

greg %-->% readLines %-->% nchar %-->% barplot(col="#88398A")
## Error in barplot.default(col = "#88398A"): argument "height" is missing, with no default

… but of course my stupid, stupid pipe doesn’t know how to handle that! The code I wrote for %-->% works in the original example because barplot is a function, but barplot(col="#88398A") describes a call to the barplot function. I could of course redefine my pipe to be a bit more sophisticated so that it can handle arbitrary calls as input, but then I’d have to try to remember how quote and eval and substitute and all those vile low-level functions work… and seriously, why bother reinventing the pipe? The %>% operator in the magrittr package already does everything I could ever want. So thanks to the joy that is %>% I can seamlessly mix functions like readLines with calls like barplot(col="#88398A") in my code, and magrittr is smart enough to handle it:

greg %>% readLines %>% nchar %>% barplot(col="#88398A")
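For the curious, here is a rough sketch of what a slightly more sophisticated %-->% could look like. To be clear, this is a toy under stated assumptions and not how magrittr is actually implemented: it captures the right hand side with substitute, and if that is a call it splices the piped value in as the first argument. The `.lhs` name and the `is_wrapper` check are made up for this illustration.

```r
# A hypothetical call-aware pipe (illustration only, not magrittr's implementation)
`%-->%` <- function(x, f) {
  rhs <- substitute(f)  # the right-hand side as unevaluated code

  # Things like (sqrt) or pkg::fn are technically calls, but they evaluate
  # to a function, so they are treated like a bare function name
  is_wrapper <- is.call(rhs) && is.symbol(rhs[[1]]) &&
    as.character(rhs[[1]]) %in% c("(", "::", ":::", "function")

  if (is.call(rhs) && !is_wrapper) {
    # Rewrite g(a, b) as g(.lhs, a, b), then evaluate it with .lhs bound
    # to the piped value and everything else looked up in the caller
    new_call <- as.call(c(list(rhs[[1]], quote(.lhs)), as.list(rhs)[-1]))
    eval(new_call, list(.lhs = x), parent.frame())
  } else {
    f(x)  # bare function: same behaviour as the original toy pipe
  }
}

# Functions and calls can now be mixed in one chain
"a,b" %-->% strsplit(split = ",") %-->% unlist
```

Even this version ignores plenty of cases that a real pipe has to think about, which is rather the point: using %>% saves you from maintaining code like this.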

If I’ve learned anything from the pool of water spreading across my laundry floor, it’s that it’s really important to know you can trust your pipes.

There are several pipes in magrittr

Being a very lazy person, I’ve never really looked at the package documentation for magrittr. I should have. There are four operators in the package:

  • %>% is the forward-pipe operator.
  • %T>% is the tee operator.
  • %<>% is the compound assignment pipe-operator.
  • %$% is the exposition pipe-operator.

The %>% is the one that everyone uses all the time… but what are these other magical looking glyphs? What do they mean? Are they also gang signs?

Turns out they’re all kinds of awesomeness, and I am embarrassed not to have known about them before!

%T>%

Suppose I want two outputs to flow out of my pipe: for instance, suppose I want to read the first 15 lines of the song, print them on screen, and then graph the length in characters of each line. Intuitively, what I want to do is split my pipe in two, kind of like having a T-junction in a road or pipe. That’s what the %T>% operator does. It passes the input “down” into one function, but that function is a dead end; the original value gets passed on to the next function. Example:

greg %>% 
  readLines(n = 15) %T>%    # the URL is passed to readLines...
  cat(sep = "\n") %>%       # ... take a little detour to print the lyrics
  nchar %>%                 # the output from *readLines* is passed to nchar...
  barplot(col = "#88398A")  # now we draw the barplot 
## [Intro]
## Ba-ba-ba, ba-ba, ba-ba-ba-ba
## Ba-ba-ba, ba-ba, ba-ba-ba-ba
## Ba-ba-ba, ba-ba, ba-ba-ba-ba
## Ba-ba-ba, ba-ba, ba-ba-ba-ba
## 
## [Verse 1: Ron]
## The guy who slagged the football team, those yobs were not for him
## He turns into a real estate agent who believes in discipline
## The guy who's first to use cocaine, the wild boy breaking free
## He'll end up in a court of law as a prosecuting QC
## Remember the school captain? Success was a matter of time
## I can hear him now as she screams, "Greg, you missed the stop sign!"
## 
## [Verse 2: Ron]

Cool! We now have two outputs from our pipe… the actual lyrics (first output, at the top) and the plot (second output at the bottom). Pretty neat!
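To see why the tee matters here, it may help to compare it against an ordinary %>% in the same position. The sketch below uses a made-up two-line vector (`lyric_lines` is a placeholder, not the real lyrics) so that it runs without fetching the song:

```r
library(magrittr)

# Placeholder text standing in for the output of readLines
lyric_lines <- c("Ba-ba-ba", "ba-ba")

# With %T>%, cat() prints as a side effect and nchar() still receives the lines
tee_result <- lyric_lines %T>% cat(sep = "\n") %>% nchar

# With %>% all the way through, nchar() receives whatever cat() returns,
# and cat() returns NULL, so there is nothing left to count
plain_result <- lyric_lines %>% cat(sep = "\n") %>% nchar

tee_result
plain_result
```

In other words, %T>% is the right tool whenever a step is called for its side effect (printing, plotting, saving) and does not return anything useful.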

%<>%

I am viscerally annoyed 😠 😁 that %<>% is referred to in the documentation as the compound assignment pipe-operator when it is so obviously the Ouroboros operator.


The %<>% operator is what happens when you take a variable, feed it into a pipe, and then take the output from the end of the pipe and assign it back to the original variable. Clearly a snake eating its own tail.

me <- "snake"
me %<>% 
  (emo::ji) %>% 
  rep.int(times=10)
cat(me)
## 🐍 🐍 🐍 🐍 🐍 🐍 🐍 🐍 🐍 🐍
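According to the magrittr documentation, a chain that starts with %<>% is shorthand for piping a variable through the chain and assigning the result back to that same variable. A small sketch with made-up numbers (the `scores` and `scores_long` names are just for illustration) shows the two spellings side by side:

```r
library(magrittr)

scores <- c(3, 1, 2)
scores_long <- scores

# Compound assignment: pipe scores through the chain and overwrite it
scores %<>% sort %>% rev

# The longhand version of the same thing
scores_long <- scores_long %>% sort %>% rev

identical(scores, scores_long)
```

Note that the assignment covers the whole chain, not just the first step, which is why `me` in the snake example ends up holding ten snakes rather than one.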

%$%

When the input (left hand side) to this pipe is a data frame (or list or environment), the %$% operator “exposes” the names of the variables inside the data frame to the function on the right hand side. I can imagine this being very handy in all kinds of situations. Plotting data in the traditional graphics system is much nicer, for one thing!

iris %$% plot(Sepal.Length, Sepal.Width, pch=19)

Soooo many plots that I could have drawn in a cleaner way if I’d known about %$%.
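If it helps to place %$% relative to base R, it behaves much like with(): the column names become visible to the expression on the right. A quick sketch using cor instead of plot, so that the results can be compared directly:

```r
library(magrittr)

# Exposition pipe: column names are visible on the right-hand side
exposed <- iris %$% cor(Sepal.Length, Sepal.Width)

# Base R equivalents, for comparison
with_base <- with(iris, cor(Sepal.Length, Sepal.Width))
dollar    <- cor(iris$Sepal.Length, iris$Sepal.Width)

all.equal(exposed, with_base)
all.equal(exposed, dollar)
```

The advantage over with() is mostly that %$% slots into the middle of a longer pipeline, for example after a filtering step.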

Epilogue

Okay, so magrittr is awesome 💯 and I love it. It’s done nothing to help me with my underwater laundry though.

Danielle Navarro
Associate Professor of Cognitive Science
