Anonymous Functions, Not Variables
Use the LHS of a formula to specify variable names in purrr-style lambda functions
I am a very heavy purrr user. The killer feature is clearly map_df (fairly recently rebranded as map_dfr and map_dfc for row and column binding, respectively) to iterate over a list à la lapply and simplify the result to a data frame. Thanks to the power of dplyr::bind_rows, it fixes all the drawbacks of sapply’s simplify2array behavior:
- It returns a data frame, not a matrix or array, so multiple types can be kept. Never see a matrix of lists again.
- Variables are aligned by names, not locations, and thus will not be recycled to locations where they do not belong, and NAs will be inserted as appropriate.
- Turning the list names into a column is as simple as passing a name for it to the .id parameter.
Along with tidyr’s gather and spread, map_df is indispensable for me; to do without would mean a lot of do.call(rbind, ...) that can go very wrong without warning.
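To make the contrast concrete, here is a minimal sketch with made-up inputs (records and rows are illustrative objects, not real data). It assumes dplyr is installed, since map_dfr relies on dplyr::bind_rows.

```r
library(purrr)

# Hypothetical inputs: two one-row data frames with different columns.
records <- list(
  first  = data.frame(a = 1, b = "x"),
  second = data.frame(b = "y", c = TRUE)
)

# Columns are matched by name, gaps become NA, and the list names
# are stored in the column named by .id.
combined <- map_dfr(records, identity, .id = "source")
combined

# Base rbind() matches by position: the value named "b" in the second
# vector silently lands in column "a".
rows <- list(c(a = 1, b = 2), c(b = 3, a = 4))
misaligned <- do.call(rbind, rows)
misaligned
```

The second result is a numeric matrix whose column names come from the first vector only, which is the kind of quiet misalignment the paragraph above warns about.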
purrr has lots of other lovely functions, too, like transpose, partial, and cross that are well worth learning. At the core of its functional programming abilities, though, is as_mapper, a function unlikely to be called much directly, but which is called by every map variant to decide what to do with the .f and ... parameters, which may either be parameters for subsetting each element or a function, possibly with additional parameters. In the latter case, purrr:::as_mapper.default is [now] simply a wrapper for rlang::as_closure.
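Calling as_mapper directly shows what each kind of .f turns into. This is only an illustrative sketch; the object names (record, by_name, and so on) are invented for the example.

```r
library(purrr)

# A function is passed through as a function.
by_function <- as_mapper(toupper)
by_function("a")

# A string or a number becomes an extractor, by name or by position.
record <- list(a = 10, b = 20)
by_name <- as_mapper("b")
by_position <- as_mapper(1)
by_name(record)
by_position(record)

# A one-sided formula becomes an anonymous function.
by_formula <- as_mapper(~ .x + 1)
by_formula(2)
```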
This function enables one of purrr’s far-reaching conveniences: the ability to define anonymous functions with formulas. In its most basic format, e.g. a unary function for map, function(x) as.character(x) can be rewritten as ~as.character(.x). The ~ both quotes the function and can be read for the unary case as function(.x).
For unary functions, the input can be referred to as .x, just . (which I generally avoid to eliminate confusion with data piped in), or ..1, since it is the first parameter. For binary functions like those for map2 or reduce, the inputs are typically referred to as .x and .y (though the first behaves the same as that of a unary function, and the second, as such, can also be referred to as ..2). For a polyadic function like that for pmap, the parameters are input as ..., which can be collected, e.g. params = list(...), or accessed directly by number with ..1, ..2, etc. notation as usual.
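A short sketch of each arity may help; the inputs are arbitrary and the result names are invented for the example.

```r
library(purrr)

# Unary: .x and ..1 both refer to the single input.
as_text <- map_chr(1:3, ~ as.character(.x))
doubled <- map_dbl(1:3, ~ ..1 * 2)

# Binary: .x and .y (equivalently ..1 and ..2).
products <- map2_dbl(1:3, 4:6, ~ .x * .y)
total <- reduce(1:4, ~ .x + .y)

# Polyadic: index with ..1, ..2, ..3 or collect everything with list(...).
pasted <- pmap_chr(list(c("a", "b"), 1:2, c(TRUE, FALSE)),
                   ~ paste(..1, ..2, ..3))
n_args <- pmap_int(list(1:2, 3:4, 5:6), ~ length(list(...)))
```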
While this notation has a small learning curve, the convenience when composing anonymous functions adds up. Keystrokes are saved. Less noticeably but perhaps more importantly, the mental effort of naming parameters is eliminated. (“Did I call something x already?”)
This is all very good. However, there are limitations. In particular, nested calls with this notation will typically not work, because the variable is called the same thing, e.g.
library(purrr)

map(c("a", "b", "c"),
    ~map_chr(1:3,
             ~paste0(.x, .x))) %>%
    str()
#> List of 3
#> $ : chr [1:3] "11" "22" "33"
#> $ : chr [1:3] "11" "22" "33"
#> $ : chr [1:3] "11" "22" "33"
Assuming such a case where double iteration is necessary and cannot be vectorized away,[1] there is simply no convenient way to write this code with this notation. If the function is supposed to be passed to the first parameter of one of the functions, an alternative is to pass the function and other parameters separately, e.g.
map(c("a", "b", "c"),
    ~map_chr(1:3,
             paste0,
             .x)) %>%
    str()
#> List of 3
#> $ : chr [1:3] "1a" "2a" "3a"
#> $ : chr [1:3] "1b" "2b" "3b"
#> $ : chr [1:3] "1c" "2c" "3c"
but this gets hard to read, and the order cannot be reversed without flipping the loops, which may be impractical or impossible. Thus, the alternative is to fall back to standard anonymous function notation so as to specify different names for each parameter:
map(c("a", "b", "c"),
    function(x) {
        map_chr(1:3, function(y) paste0(x, y))
    }) %>%
    str()
#> List of 3
#> $ : chr [1:3] "a1" "a2" "a3"
#> $ : chr [1:3] "b1" "b2" "b3"
#> $ : chr [1:3] "c1" "c2" "c3"
This works, but loses a lot of convenience for the sake of avoiding a name clash. But I think there is a better way.
The functions in question are anonymous functions, i.e. they are not assigned to a name, but are rather raw expressions that define a function. In R, the most common usage of such functions (sometimes called lambda expressions due to their origins in Alonzo Church’s lambda calculus) is when passing a function as a parameter to a function like map. Since their introduction in Lisp, anonymous functions have become a part of most modern programming languages. Syntax differs, of course; for example in Racket, a modern Lisp:
((lambda (x)
   (* x x))
 2)
#> 4
or Python:
(lambda x: print(x * x))(2)
#> 4
or Julia:
((x) -> x * x)(2)
#> 4
or R:
(function(x) x * x)(2)
#> [1] 4
Comparing these syntaxes to that of purrr’s anonymous functions, a disadvantage of the latter becomes apparent: there is no way to specify parameter names, which is why nested evaluation is impossible. That said, given both a notation like that of Julia above and how models in R use formula notation, it seems plausible that variable names could be specified on the otherwise unused left-hand side (LHS) of a formula function.[2] Multiple variables cannot be separated by commas (which would terminate the formula), but could be separated by +, like multiple terms in a regression.
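The idea can be checked in base R before reaching for any package. The sketch below uses an arbitrary formula; f, lhs, rhs, and comma_attempt are names made up for the illustration.

```r
# A two-sided formula is parsed but not evaluated, so x and y need not exist.
f <- x + y ~ x * y

class(f)        # a formula object
lhs <- f[[2]]   # the unevaluated call x + y
rhs <- f[[3]]   # the unevaluated call x * y

# The names joined by `+` on the LHS, in order of appearance.
lhs_names <- all.vars(lhs)
lhs_names

# A comma in the same position does not parse at all.
comma_attempt <- try(parse(text = "x, y ~ x * y"), silent = TRUE)
inherits(comma_attempt, "try-error")
```

Because the left-hand side is never evaluated, the + is purely a separator here and no addition takes place.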
To build such capability, let’s use rlang, which has handy utilities for parsing and creating formulas and functions.
- The variable names can be parsed from the LHS (returned as an expression by f_lhs) with base::all.vars, which takes an expression and returns a character vector of variables it uses.
- The function can be created with new_function, which requires
  - the parameters, passed as a named list. Since there should be no defaults, the args parameter should be a list with named empty elements. For known inputs, such a list can be created with alist, e.g. alist(x = , y = ), but because of its quoting semantics, it is very difficult to program with. A suitable list can be generated by simply making a list of empty expressions, though.
  - the body, which is simply the right-hand side of the formula, extracted with f_rhs, and
  - the environment. Formulas, like functions, have environments, which can be accessed with f_env.
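Each of these pieces can be inspected on its own first. The following is a sketch under one stated assumption: it uses rlang's missing_arg() to create the empty elements, which is a substitution for the empty expression described above, and all object names are invented for the example.

```r
library(rlang)

f <- x + y ~ x * y

# The three pieces of the formula.
f_lhs(f)   # x + y
f_rhs(f)   # x * y
f_env(f)   # the environment in which the formula was created

# With names known in advance, alist() gives arguments without defaults.
by_hand <- new_function(alist(x = , y = ), f_rhs(f), f_env(f))

# With names held in a character vector, build the list programmatically.
# Assumption: missing_arg() stands in for the empty expression.
arg_names <- all.vars(f_lhs(f))
args <- set_names(rep(list(missing_arg()), length(arg_names)), arg_names)
programmatic <- new_function(args, f_rhs(f), f_env(f))

by_hand(2, 3)
programmatic(2, 3)
```

Both functions should behave identically; the second form is the one that generalizes to an arbitrary left-hand side.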
All together,
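library(rlang)

lambda <- function(x) {
    # parameter names are the variables on the formula's left-hand side
    arg_names <- all.vars(f_lhs(x))
    # one empty (no-default) argument per name
    new_function(args = set_names(rep(list(expr()),
                                      length(arg_names)),
                                  arg_names),
                 body = f_rhs(x),    # the right-hand side becomes the body
                 env = f_env(x))     # evaluated in the formula's environment
}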
Now we can write formula functions in a fashion fully equivalent to ordinary ones, e.g.
lambda(x ~ x * x)
#> function (x)
#> x * x
lambda(x ~ x * x)(2)
#> [1] 4
It works for multiple parameters, too:
lambda(x + na.rm ~ mean(x, na.rm = na.rm))(c(1, NA, 3), na.rm = TRUE)
#> [1] 2
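Two properties are worth checking in a sketch: arguments are matched positionally when the function is handed to a purrr mapper, and free variables resolve in the formula's environment. The definition is repeated so the block runs on its own, with missing_arg() standing in for expr() as an assumption; suffix is a hypothetical variable invented for the example.

```r
library(rlang)
library(purrr)

# Same shape as the lambda above, repeated for self-containment.
# Assumption: missing_arg() replaces expr() for the empty arguments.
lambda <- function(x) {
    arg_names <- all.vars(f_lhs(x))
    new_function(args = set_names(rep(list(missing_arg()),
                                      length(arg_names)),
                                  arg_names),
                 body = f_rhs(x),
                 env = f_env(x))
}

suffix <- "!"   # hypothetical free variable, not a parameter

# map2_chr passes its two inputs positionally, so they bind to x and y
# in the order the names appear on the left-hand side.
labelled <- map2_chr(c("a", "b"), 1:2,
                     lambda(x + y ~ paste0(x, y, suffix)))
labelled
```

Since all.vars returns names in order of first appearance, the order of the terms on the left-hand side determines the parameter order.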
Now, in the original nested map example,
map(c("a", "b", "c"),
    lambda(x ~ map_chr(1:3,
                       lambda(y ~ paste0(x, y))))) %>%
    str()
#> List of 3
#> $ : chr [1:3] "a1" "a2" "a3"
#> $ : chr [1:3] "b1" "b2" "b3"
#> $ : chr [1:3] "c1" "c2" "c3"
Ideally, it would be nice to drop the lambda call altogether, but as far as I have been able to find, the only way to do so would be to redefine purrr:::as_mapper.default to call lambda instead of rlang::as_closure when passed a formula with anything on the left-hand side. For now, then, extra keystrokes abide.[3]
[1] Here, outer(c("a", "b", "c"), 1:3, paste0), though that does return a matrix instead of a list.
[2] The fascinating magrittr alternative pipeR implements something of the sort for naming data piped into a function.
[3] Maybe not all of them; see Part 2: gsubfn.