Untidy evaluation

by Danielle Navarro, 27 Jan 2020



Due to terrible planning on my part I have found myself at rstudio::conf with an extra day to spare. I’ve answered my emails to the best of my jetlagged ability, I’m confused and disoriented in spite of the three coffees I’ve had so far, and so naturally this is an ideal time to revisit my understanding of non-standard evaluation (NSE) in R … the logic for this being that I’m already maxxed out on confusion and disorientation, so spending some time thinking about metaprogramming can’t possibly make things worse.

Right?

I’ve written about NSE once before, back when I was doing my “100 days of CRAN” package tryout exercise. I’m going to try to expand on that post here, to see if I’ve learned anything useful lately. This is probably going to be a two-part post… the first part will talk about (my understanding of) how non-standard evaluation works in base R, and the second part will move on to discussing how it works within the tidyverse.

Using quote() and eval() to work with expressions

One of the weirder features of R as a programming language is its lazy evaluation model. Whenever you pass an R expression to a function as an argument – that is, an expression that should “evaluate” to some value – the language doesn’t actually evaluate that expression on the spot. R is lazy. Maybe the function will never actually do anything with that value, and if so, why should R bother doing all that calculation? So instead of evaluating it immediately R creates a “promise” to evaluate that expression later on. Once the promise is created, R waits until the value of the expression is needed for some other purpose, and only then does it calculate the value of the expression. Normally you never notice this. For example in this code snippet

a <- 1
print(a + 1)
## [1] 2

the value of the expression a + 1 is needed the moment that it needs to be printed to the console, so it evaluates more or less immediately. However that’s not always the case, and R allows you to capture the unevaluated expression using the quote() function:

a <- 1
x <- a + 1
y <- quote(a + 1)

The difference is that the assignment operation we used to give x a value forces the expression a + 1 to evaluate at that point in time, so x already has a value of 2. In contrast, y captures the unevaluated expression and – as far as the variable y is concerned – the expression a + 1 has not yet been given any numerical value. To see this, let’s now change the value of a:

a <- 10

If we print out x we can see that it still has the value of 2 as you’d expect, because the a + 1 expression was evaluated earlier and x no longer has any connection to the symbol a.[1]

x
## [1] 2

However, this is not the case for y, because the expression a + 1 has not yet been evaluated. So if we use the eval() function to get R to evaluate y, we get a value of 11, because R uses the current value of a to do this:

eval(y)
## [1] 11
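
As a small aside that goes slightly beyond the example above: as far as I understand it, eval() also takes an envir argument, which can be an environment or a list, so the same quoted expression can be evaluated against different values of a. This is only a sketch using base R, and the variable name expr is my own choice for illustration:

```r
# Capture the expression without evaluating it
expr <- quote(a + 1)

# The captured object is a "call", and its pieces can be inspected
class(expr)
## [1] "call"
as.list(expr)   # the function `+` followed by its two arguments

# Evaluate the same expression against different bindings for a
a <- 10
eval(expr)                  # looks up a in the current environment
## [1] 11
eval(expr, list(a = 100))   # looks up a in the supplied list first
## [1] 101
```

The quoted expression itself never changes here; only the place where R looks up the symbol a does.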

Making promises

In the example above, when I created x by the usual assignment operation x <- a + 1 the mere act of assignment was enough to force R to evaluate the expression a + 1. For the most part, it doesn’t matter how you do the assignment. So, for example, if I use the assign() function to do the same thing, the value of x is evaluated immediately, as the following illustrates:

a <- 1
assign("x", a + 1)
a <- 10
x
## [1] 2

However, R also has a delayedAssign() function that allows you to create the promise object directly. If I (promise to) create the variable x in this fashion, it is not assigned any value until it is next needed. That means that the a + 1 expression remains unevaluated until that time. So this time we get a different answer:

a <- 1
delayedAssign("x", a + 1)
a <- 10
x
## [1] 11
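
One more detail that seems worth checking, at least as I understand promises: once a promise has been forced, its value is cached and the expression is not evaluated a second time. Here is a minimal sketch of that. The message() call is my addition, and is only there to make the moment of evaluation visible:

```r
a <- 1
delayedAssign("x", {
  message("evaluating a + 1 now")  # side effect marks when evaluation happens
  a + 1
})

a <- 10
x        # first use forces the promise: the message appears, value is 11
## evaluating a + 1 now
## [1] 11

a <- 100
x        # the promise is already forced, so no message and no re-evaluation
## [1] 11
```

So the "delay" only lasts until the first time x is needed. After that, x behaves like any other variable holding the value 11.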

On the weird behaviour of substitute()

Okay so it sort of makes sense that the idea behind quote() is to capture the argument as an unevaluated expression.

a <- 1
x <- quote(a + 1)

In this code snippet, the symbol a has been bound to the numeric value 1 by the assignment, but the symbol x has been bound to the unevaluated expression a + 1. We can force the evaluation of this expression simply by typing eval(x), but how do we capture the expression (i.e., a + 1) to which x has been bound? This is the job of the substitute() function. Instead of capturing the literal expression that we enter, it rewrites the expression according to the following rules. If I type substitute(x), the result is as follows:

  • Rule 1: If x does not refer to an object in this environment, then substitute() returns x as an unevaluated expression, exactly as if I had typed quote(x)
  • Rule 2: If x is a promise object (e.g., a function argument) or a variable that holds an unevaluated expression (e.g., if x <- quote(a + 1)), then substitute() returns the unevaluated expression to which it refers (i.e., the expression a + 1 is returned)
  • Rule 3: If x is an ordinary variable that has already been assigned a value, then that value is returned

Okay, that makes sense, so let’s try this. Currently in my global environment I have a as an ordinary variable and x as the expression a + 1. So all I have to do to capture the expression a + 1 is substitute(x), right?

substitute(x)
## x

Um… what?

Okay, so maybe I lied. The list of rules I gave above is not quite accurate. The rules above explain how substitute() works as long as the input is being evaluated anywhere except the global environment. This is unbearably annoying, and I am reliably informed by Thomas Lumley that the reason R does this was originally to be compatible with the S language, which didn’t have environments. If your call to substitute() is evaluated in the global environment, then it behaves like quote().

This confused me for a very long time because when I was first teaching myself non-standard evaluation, I was playing around with quote() and substitute() at the console, and I couldn’t work out what the difference between the two actually was. Worse, if you’re using substitute() inside a function (which is what you usually do) your code is almost certainly not going to be evaluated in the global environment, and as a consequence, it will do something different inside your function than it will if you type the same code at the console.

It is utterly infuriating!

The simplest way I can think of to illustrate what substitute() does without writing a function is to note that it has an env argument that lets the user specify the environment (or list) in which the substitution takes place. So let’s create a list l, and inside this list we’ll create two variables a and x:

l <- list(
  a = 10, 
  x = quote(a + 2)
)

So now when I do the substitution inside the list l, I get the behaviour described in my list of three rules. There is no variable b inside this list so Rule 1 applies and substitute() behaves the same as quote():

substitute(b, env = l) 
## b

In contrast not only does x exist in the list, it also corresponds to an unevaluated expression, so Rule 2 applies and substitute() returns that expression:

substitute(x, env = l)
## a + 2

Finally, because a exists inside the list as an ordinary variable, Rule 3 applies and substitute() returns the value of a:

substitute(a, env = l) 
## [1] 10

We could do the same thing a little more explicitly by creating a new environment e and assigning variables inside the environment:

e <- new.env()
assign("a", 10, envir = e)
assign("x", quote(a + 2), envir = e)

These would produce the same answers as before:

substitute(b, env = e)
## b
substitute(x, env = e)
## a + 2
substitute(a, env = e)
## [1] 10
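
As far as I can tell, the three rules get applied to every symbol in the expression, not only when the expression is a single name. The sketch below is my own illustration of that. It redefines the list l from earlier so that the snippet runs by itself, and the name result is just something I picked:

```r
l <- list(
  a = 10,
  x = quote(a + 2)
)

# b is left alone (Rule 1), x is replaced by its expression (Rule 2),
# and a is replaced by its value (Rule 3)
result <- substitute(b * x + a, env = l)

# The outcome is still an unevaluated call, built from the substituted pieces
class(result)
## [1] "call"

# Supplying values for the remaining symbols: 3 * (1 + 2) + 10
eval(result, list(a = 1, b = 3))
## [1] 19
```

Notice that the a that came from inside x is still a symbol after the substitution, which is why it can be given a new value at evaluation time, whereas the a that appeared directly in the expression has already been replaced by 10.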

Similarly, if I enclose everything inside a function f() and then call that function, I get answers that follow the three rules:

f <- function() {
  a <- 1
  x <- quote(a + 1)
  print(substitute(b)) # Rule 1  
  print(substitute(x)) # Rule 2
  print(substitute(a)) # Rule 3
}
f()
## b
## a + 1
## [1] 1
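
In practice the promise case mostly turns up through function arguments, because R passes arguments to functions as promises. A short sketch of that usage follows. The function names show_expr() and show_label() are hypothetical ones I made up for this example, and the only base R functions involved are substitute() and deparse():

```r
# Return the expression the caller typed for `arg`, without evaluating it
show_expr <- function(arg) {
  substitute(arg)
}

a <- 1
show_expr(a + 1)
## a + 1

# A common idiom: turn the captured expression into a string for labelling
show_label <- function(arg) {
  deparse(substitute(arg))
}
show_label(mean(a) + 1)
## [1] "mean(a) + 1"
```

Because arg is a promise inside the function, substitute() hands back the expression the caller wrote rather than its value.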

Sigh. I love this language dearly, but R is just maddening sometimes.


  1. Note that because in R multiple symbols can point to the same object, this would not strictly be the case if I’d set x <- a rather than x <- a + 1, but that’s irrelevant to the current post and is another topic entirely…