
Days 26-27: Expressions

A few days ago I posted about my first attempt to understand metaprogramming in R, trying to wrap my head around non-standard evaluation with the help of Hadley Wickham’s Advanced R book. That was a lot of fun. This time around, it’s the expressions section – well, the first half of it anyway 😀 – and an attempt to go through the looking glass and understand what is going on internally with R expressions. To make my life a little easier, I’m going to follow the gentle suggestions in the book and use some of the tools in the pryr package…

library(pryr)


Okay, so let’s say I’m thinking about a command like paste("alice","in","wonderland") and I want to understand what the corresponding R expression is, how it is structured, and how I can manipulate it. The first thing I want to do is use quote to capture the expression itself rather than have R evaluate it…

my_call <- quote(paste("alice","in","wonderland"))
my_call
## paste("alice", "in", "wonderland")

If I then type class(my_call), I can verify that this is an object of class call. If I use eval(my_call), R executes the command and pastes the three strings together into a single string…

eval(my_call)
## [1] "alice in wonderland"
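Here is a minimal base-R sketch of the same quote-then-evaluate steps. It doesn't need pryr. It also checks typeof(), which isn't mentioned above: at the type level, calls are "language" objects.

```r
# Capture the call without evaluating it
my_call <- quote(paste("alice", "in", "wonderland"))

class(my_call)    # "call"
is.call(my_call)  # TRUE
typeof(my_call)   # "language"

# Nothing runs until we explicitly ask for evaluation
result <- eval(my_call)
result            # "alice in wonderland"
```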

Structure of a call

In the last post I didn’t really do much with expressions other than learn how to quote and evaluate them, which is super-helpful, but I do want to get a better sense of what’s actually going on with these things. One of the things I find really helpful is to recognise that a call has a lot in common with a list. To illustrate:

as.list(my_call)
## [[1]]
## paste
## 
## [[2]]
## [1] "alice"
## 
## [[3]]
## [1] "in"
## 
## [[4]]
## [1] "wonderland"

So my_call[[1]] corresponds to the paste part of the call, with my_call[[2]] capturing the "alice" string, and so on. Like lists, calls can be manipulated using [[ ]] or [ ]. A more visually appealing representation of the structure of a call is provided by the call_tree function in the pryr package that takes the quoted call as input and prints it out as a tree:

call_tree(my_call)
## \- ()
##   \- `paste
##   \-  "alice"
##   \-  "in"
##   \-  "wonderland"


The different elements of a call can be different kinds of object, including other calls. For the fairly simple call we just created, we get this:

sapply(my_call, class)
## [1] "name"      "character" "character" "character"

The first element of the call is a name (also referred to as a symbol), which in this case refers to the paste function. Later elements capture the arguments fed to the function, which in this case corresponds to the three strings.
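To see the list-like behaviour directly, here is a small base-R sketch. Three pieces of base R are involved:

- [[ ]] pulls out a single element.
- [ ] returns a shorter call rather than a list.
- as.call() turns a list back into a call.

```r
my_call <- quote(paste("alice", "in", "wonderland"))

# [[ ]] extracts one element; the first is a symbol naming the function
fn <- my_call[[1]]
is.name(fn)          # TRUE
as.character(fn)     # "paste"

# [ ] keeps the result as a call, here dropping the last argument
short_call <- my_call[1:3]
short_call           # paste("alice", "in")
eval(short_call)     # "alice in"

# Round trip: call -> list -> call gives back an identical object
rebuilt <- as.call(as.list(my_call))
identical(rebuilt, my_call)   # TRUE
```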

Changing which function gets called

The list-like behaviour of calls is useful. For instance, in my_call I wanted to paste the three strings into one, but perhaps you want to combine them into a vector when you make your_call. One way to do that is to replace the first element of the call with c:

your_call <- my_call
your_call[[1]] <- quote(c)

So now if we compare these two calls…

my_call
your_call
## paste("alice", "in", "wonderland")
## c("alice", "in", "wonderland")

… and evaluate them …

eval(my_call)
eval(your_call)
## [1] "alice in wonderland"
## [1] "alice"      "in"         "wonderland"

… we do indeed get different results!
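The same swap can be wrapped in a small function. Note that swap_fn below is a hypothetical helper written for illustration; it is not part of pryr.

- It uses as.name(), so the new function can be given as a string.
- It relies on calls being copied on modification, just like lists, so the original my_call is left untouched.

```r
my_call <- quote(paste("alice", "in", "wonderland"))

# Hypothetical helper (not from pryr): return a copy of `call` with a different function
swap_fn <- function(call, fn_name) {
  stopifnot(is.call(call), is.character(fn_name), length(fn_name) == 1)
  call[[1]] <- as.name(fn_name)   # as.name("c") is the same symbol as quote(c)
  call
}

your_call <- swap_fn(my_call, "c")
your_call          # c("alice", "in", "wonderland")
eval(your_call)    # "alice" "in" "wonderland"

# The original call is unchanged
my_call            # paste("alice", "in", "wonderland")
```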


Calls in calls

Calls can include other calls. For instance, suppose queen_call is a shouty call that converts everything to uppercase:

queen_call <- quote(toupper(x))
call_tree(queen_call)
## \- ()
##   \- `toupper
##   \- `x

But perhaps that isn’t quite what the queen wants. Perhaps she wants to do your_call but just in a shouty voice. What I need to do is insert your_call where the x currently sits:

queen_call[[2]] <- your_call
call_tree(queen_call)
## \- ()
##   \- `toupper
##   \- ()
##     \- `c
##     \-  "alice"
##     \-  "in"
##     \-  "wonderland"

Another way to do the same thing is to construct the call directly with the call function, passing another call as one of the arguments. For instance, suppose the king wants exactly the same thing as the queen. I can build that like this:

king_call <- call("toupper", your_call)

These are identical calls:

identical(queen_call, king_call)
## [1] TRUE
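Base R offers a couple of other routes to the same nested call. This sketch compares three of them:

- call(), as above.
- Building the call from a list with as.call().
- bquote(), where .() splices in the value of your_call.

All three should produce identical objects.

```r
your_call <- quote(c("alice", "in", "wonderland"))

king_call   <- call("toupper", your_call)
list_call   <- as.call(list(as.name("toupper"), your_call))
bquote_call <- bquote(toupper(.(your_call)))

identical(king_call, list_call)     # TRUE
identical(king_call, bquote_call)   # TRUE
eval(king_call)                     # "ALICE" "IN" "WONDERLAND"
```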

Call trees can get messy!

Perhaps my friend wants to combine my call with the queen’s call. There is nothing stopping her from creating her_call in the same way:

her_call <- my_call
her_call[[4]] <- queen_call
call_tree(her_call)
## \- ()
##   \- `paste
##   \-  "alice"
##   \-  "in"
##   \- ()
##     \- `toupper
##     \- ()
##       \- `c
##       \-  "alice"
##       \-  "in"
##       \-  "wonderland"

Her call now looks like this:

her_call
## paste("alice", "in", toupper(c("alice", "in", "wonderland")))

It evaluates to this:

eval(her_call)
## [1] "alice in ALICE"      "alice in IN"         "alice in WONDERLAND"
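When trees get this deep, it helps to walk them recursively. The helper call_heads below is hypothetical (it is not from pryr or base R). It collects the function name at the head of every nested call.

For comparison, base R's all.names() returns every symbol in an expression. Here it happens to give the same three function names, because her_call contains no variables.

```r
her_call <- quote(paste("alice", "in", toupper(c("alice", "in", "wonderland"))))

# Hypothetical helper: depth-first list of the function names heading each call
call_heads <- function(x) {
  if (!is.call(x)) return(character(0))            # strings, numbers, symbols: no call here
  fn <- if (is.name(x[[1]])) as.character(x[[1]]) else character(0)
  c(fn, unlist(lapply(as.list(x)[-1], call_heads)))  # recurse into the arguments
}

call_heads(her_call)   # "paste" "toupper" "c"
all.names(her_call)    # "paste" "toupper" "c"
```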


Inserting a named argument into a call

Now it turns out that the jack likes my friend’s call, but asks that the results be collapsed into a single string. We can do that using the collapse argument to the paste function. Because calls support the $ method, we can assign directly to jack_call$collapse to insert collapse as a named argument…

jack_call <- her_call
jack_call$collapse <- quote(" *** ")
call_tree(jack_call)
## \- ()
##   \- `paste
##   \-  "alice"
##   \-  "in"
##   \- ()
##     \- `toupper
##     \- ()
##       \- `c
##       \-  "alice"
##       \-  "in"
##       \-  "wonderland"
##   \-  " *** "

The resulting call now looks like this:

jack_call
## paste("alice", "in", toupper(c("alice", "in", "wonderland")), 
##     collapse = " *** ")

It evaluates to this:

eval(jack_call)
## [1] "alice in ALICE *** alice in IN *** alice in WONDERLAND"
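Since $ works on a call much as it does on a list, the named argument behaves like a named list element:

- It shows up in names().
- It can be read back with [[.
- It can be removed again by assigning NULL.

Here is a short sketch. Wrapping the string literal in quote() is optional, since quoting a constant just returns the constant.

```r
jack_call <- quote(paste("alice", "in", toupper(c("alice", "in", "wonderland"))))
jack_call$collapse <- " *** "

arg_names <- names(jack_call)
arg_names                  # "" "" "" "" "collapse"
jack_call[["collapse"]]    # " *** "
eval(jack_call)            # "alice in ALICE *** alice in IN *** alice in WONDERLAND"

# Assigning NULL drops the argument again, just as it would for a list
plain_call <- jack_call
plain_call$collapse <- NULL
plain_call                 # paste("alice", "in", toupper(c("alice", "in", "wonderland")))
```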


Differences between apparently-identical calls…

Calls are sneaky things. There’s an exercise in the Advanced R book that asks us to compare these two calls:

a <- call("mean", 1:10)
b <- call("mean", quote(1:10))

They look the same…

print(a)
print(b)
## mean(1:10)
## mean(1:10)

They evaluate the same…

eval(a)
eval(b)
## [1] 5.5
## [1] 5.5

But they are not the same:

identical(a,b)
## [1] FALSE

Okay, so since I do like to try at least some of the exercises, it’s worth asking why they aren’t the same. It helps to show the call tree for both:

call_tree(a)
## \- ()
##   \- `mean
##   \- <integer>
call_tree(b)
## \- ()
##   \- `mean
##   \- ()
##     \- `:
##     \-  1
##     \-  10

Ah, that makes sense. In the first case, the expression 1:10 has already been evaluated and is a vector of integers in a. In the second case we quoted the expression so when b is created it takes the form of a nested call to : inside the call to mean.
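Checking the classes of the second elements confirms this. It also explains why print() can't tell the two calls apart: deparsing the integer vector 1:10 produces the text 1:10, the same text as the unevaluated call. A base-R sketch:

```r
a <- call("mean", 1:10)
b <- call("mean", quote(1:10))

class(a[[2]])   # "integer": the vector was computed when a was built
class(b[[2]])   # "call": `:`(1, 10) is still unevaluated

# Deparsing hides the difference, which is why both print as mean(1:10)
identical(deparse(a), deparse(b))   # TRUE

# Evaluating the nested call in b gives exactly the vector stored in a
identical(eval(b[[2]]), a[[2]])     # TRUE
```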

Hm… the exercise is asking me to say which one I prefer? Well, a is simpler, and has no nested calls. So in this case the second element a[[2]] is a vector of integers, and if I change the second element of that vector to -10, I get this as my modified call:

a[[2]][2] <- -10
print(a)
eval(a)
## mean(c(1, -10, 3, 4, 5, 6, 7, 8, 9, 10))
## [1] 4.3

On the other hand, b contains 1:10 as an unevaluated call (i.e. call(":", 1, 10)). Replacing the second element of that nested call, the 1, with -10 turns it into -10:10:

b[[2]][2] <- -10
print(b)
eval(b)
## mean(-10:10)
## [1] 0
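One detail in the modified version of a: a[[2]] started life as an integer vector. Assigning the double -10 into it coerces the whole vector to double, which is why the modified call prints as c(1, -10, 3, ...) with no L suffixes. The quick check below shows this. Using the integer literal -10L instead (an alternative not used in the post) keeps the vector integer, and the mean is the same either way.

```r
a <- call("mean", 1:10)
typeof(a[[2]])       # "integer"

# Assigning a double into an integer vector coerces the whole vector to double
a_dbl <- a
a_dbl[[2]][2] <- -10
typeof(a_dbl[[2]])   # "double"
eval(a_dbl)          # 4.3

# An integer literal keeps the vector integer
a_int <- a
a_int[[2]][2] <- -10L
typeof(a_int[[2]])   # "integer"
eval(a_int)          # 4.3
```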

I’m not sure which one I prefer in general. I think if I were genuinely doing metaprogramming, I’d prefer b since I can intervene on the structure of the call at any level. Nothing has been evaluated yet so I have more flexibility in terms of what I can do with it. On the other hand, if I were doing data analysis and I actually had a set of numbers I wanted to compute the mean for, I suppose the shallower tree in a makes more sense? But then again, why would I ever want to compute the mean for 1:10? I feel like there’s some weirdness here caused by the fact that mean(1:10) is just not something that ever appears in real data analysis!


Standardising calls…

The last thing I want to play with today, because it’s getting late and I’m sleepy, is the ordering of arguments within a call. When a call is created, it preserves the order and names of the arguments. So here are three calls that would all run the same analysis,

a <- quote(lm(outcome ~ predictor, dataset))
b <- quote(lm(formula = outcome ~ predictor, data = dataset))
c <- quote(lm(data = dataset, formula = outcome ~ predictor))

but they are not the same call:

print(a)
print(b)
print(c)
## lm(outcome ~ predictor, dataset)
## lm(formula = outcome ~ predictor, data = dataset)
## lm(data = dataset, formula = outcome ~ predictor)

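The call trees show c listing its arguments in a different order from a and b. (call_tree doesn’t display argument names, which is why a and b look alike here even though only b names its arguments.)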

call_tree(a)
## \- ()
##   \- `lm
##   \- ()
##     \- `~
##     \- `outcome
##     \- `predictor
##   \- `dataset
call_tree(b)
## \- ()
##   \- `lm
##   \- ()
##     \- `~
##     \- `outcome
##     \- `predictor
##   \- `dataset
call_tree(c)
## \- ()
##   \- `lm
##   \- `dataset
##   \- ()
##     \- `~
##     \- `outcome
##     \- `predictor

And accordingly they are not identical to one another:

identical(a,b)
identical(a,c)
identical(b,c)
## [1] FALSE
## [1] FALSE
## [1] FALSE
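Because call_tree doesn't show argument names, names() is a more direct way to see what separates these calls. In this base-R sketch the three calls are renamed call_a, call_b and call_c, to avoid reusing the name c for a variable.

- Removing the names from call_b leaves a call identical to call_a.
- call_c also differs in argument order, so stripping names alone wouldn't make it match.

```r
call_a <- quote(lm(outcome ~ predictor, dataset))
call_b <- quote(lm(formula = outcome ~ predictor, data = dataset))
call_c <- quote(lm(data = dataset, formula = outcome ~ predictor))

names(call_a)   # NULL: no argument is named
names(call_b)   # ""  "formula" "data"
names(call_c)   # ""  "data"    "formula"

# Stripping the names from call_b leaves exactly call_a
b_unnamed <- call_b
names(b_unnamed) <- NULL
identical(call_a, b_unnamed)   # TRUE
```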

This makes sense, but I can see that being inconvenient in a lot of situations. To help with that, the pryr package contains a standardise_call function that sorts the arguments into their canonical order and assigns them names. So if I standardise all three of these calls,

aa <- standardise_call(a)
bb <- standardise_call(b)
cc <- standardise_call(c)

I end up with three calls that look the same…

print(aa)
print(bb)
print(cc)
## lm(formula = outcome ~ predictor, data = dataset)
## lm(formula = outcome ~ predictor, data = dataset)
## lm(formula = outcome ~ predictor, data = dataset)

have the same call tree…

call_tree(aa)
## \- ()
##   \- `lm
##   \- ()
##     \- `~
##     \- `outcome
##     \- `predictor
##   \- `dataset
call_tree(bb)
## \- ()
##   \- `lm
##   \- ()
##     \- `~
##     \- `outcome
##     \- `predictor
##   \- `dataset
call_tree(cc)
## \- ()
##   \- `lm
##   \- ()
##     \- `~
##     \- `outcome
##     \- `predictor
##   \- `dataset

and are judged identical:

identical(aa,bb)
identical(aa,cc)
identical(bb,cc)
## [1] TRUE
## [1] TRUE
## [1] TRUE

Yay!
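Judging from the pryr source printed further down, standardise_call is a thin wrapper around base R's match.call(), so the same standardisation can be sketched without pryr. This assumes the function (here lm from the stats package) can be looked up directly. The call match.call(lm, cl) matches each argument of cl against the formals of lm, names it, and puts it in formals order.

```r
calls <- list(
  quote(lm(outcome ~ predictor, dataset)),
  quote(lm(formula = outcome ~ predictor, data = dataset)),
  quote(lm(data = dataset, formula = outcome ~ predictor))
)

# Match each call's arguments against the formals of lm()
standardised <- lapply(calls, function(cl) match.call(lm, cl))

standardised[[1]]   # lm(formula = outcome ~ predictor, data = dataset)

# Every standardised call is identical to the first one
all(vapply(standardised, identical, logical(1), standardised[[1]]))   # TRUE
```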

Oh, one last thing. As far as I can tell, standardise_call doesn’t work on a call to a primitive function. If I try the same exercise using round(3.1415, 2) – which, foolishly, was the first thing I tried – rather than lm(outcome ~ predictor, dataset), it returns the calls unmodified. In retrospect that’s not surprising. Primitive functions are implemented in C and have no R-level argument list for match.call to match against. The source code reflects this by returning primitive calls unchanged:

standardise_call
## function (call, env = parent.frame()) 
## {
##     stopifnot(is.call(call))
##     f <- eval(call[[1]], env)
##     if (is.primitive(f)) 
##         return(call)
##     match.call(f, call)
## }
## <environment: namespace:pryr>
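A quick check of the primitive case, plus a possible workaround that pryr does not use. For many primitives (round included, as far as I know), args() returns an ordinary closure carrying the primitive's documented argument list, and match.call() will accept that closure as the definition to match against. Treat this as an assumption to verify rather than pryr's behaviour; round_sig and std_round are just illustrative names.

```r
is.primitive(round)   # TRUE
formals(round)        # NULL: primitives carry no R-level formals

# Hypothetical workaround: match against the closure returned by args()
round_sig <- args(round)
std_round <- match.call(round_sig, quote(round(3.1415, 2)))
std_round             # round(x = 3.1415, digits = 2)
eval(std_round)       # 3.14
```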

To be continued?

Much to my surprise, I think I understood most of what I was doing today! Yay me! With any luck I’ll manage to make it through the rest of the metaprogramming section of Advanced R in later posts. This is fun!


Danielle Navarro
Associate Professor of Cognitive Science
