Mapping leaves
Recursive functional programming
at_depth
I love purrr.1 Aside from
its anonymous function
notation, one of the
functions that made me love the package was at_depth
, which iterates across a
list at a specified level of nesting. It has since been deprecated in favor of
modify_depth
, which is more powerful, but is significantly more finicky.
The additional power is because the .depth
parameter can now be passed a
negative integer to index up from the bottom of the list. When I first heard
this, I was excited, because it sounded like it could iterate across the ragged
leaf nodes of a list. After months of trying, I still could not get it to work.
I discovered that was because it does not iterate across leaf nodes, but instead
selects depth by indexing up from the deepest level of the list, iterating over
whatever nodes are at that level regardless of whether they are lists or not.
rapply
Part of my initial excitement upon hearing that modify_depth
could take
negative .depth
values was because I thought it may be less-confusing version
of rapply
, the second-most confusing function in base R after reshape
.2 rapply
is a recursive version of lapply
which
(in theory) lets you iterate a function across the leaf nodes of a list.
For example:
library(magrittr)
l <- list(1, list(2:3, 4, list(5:6), 7))
rapply(l, function(x) x + 1, how = "replace") %>%
str()
#> List of 2
#> $ : num 2
#> $ :List of 4
#> ..$ : num [1:2] 3 4
#> ..$ : num 5
#> ..$ :List of 1
#> .. ..$ : num [1:2] 6 7
#> ..$ : num 8
With the default how = "unlist"
, unlist
is called on the result, collapsing it:
rapply(l, function(x) x + 2)
#> [1] 3 4 5 6 7 8 9
The biggest problem with rapply
is that it decides whether to recurse over a node or apply the function based on the behavior of typeof
and is.list
, not those of class
. Consequently, it cannot operate on objects that are lists, like data frames or models—it will recurse right into them:
rapply(list(list(mtcars)), mean)
#> mpg cyl disp hp drat wt
#> 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
#> qsec vs am gear carb
#> 17.848750 0.437500 0.406250 3.687500 2.812500
Mapping over leaf nodes
So let’s build our own recursive map/apply function that can manage list
objects. Sticking to purrr’s structure, we’ll make the function an S3 generic
with a default method that will do the recursion. Objects can be differentiated
from unclassed lists with rlang::is_bare_list
, and purrr::as_mapper
will let
it handle the same variety of input structures as map
and friends.
map_leaves <- function(.x, .f, ...){
UseMethod("map_leaves")
}
map_leaves.default <- function(.x, .f, ...){
if (rlang::is_bare_list(.x)) {
purrr::map(.x, map_leaves, .f, ...) # recurse!
} else {
.f <- purrr::as_mapper(.f, ...)
.f(.x, ...) # call function on node
}
}
Let’s try it out:
l %>% map_leaves(~.x + 1) %>% str()
#> List of 2
#> $ : num 2
#> $ :List of 4
#> ..$ : num [1:2] 3 4
#> ..$ : num 5
#> ..$ :List of 1
#> .. ..$ : num [1:2] 6 7
#> ..$ : num 8
ltcars <- map_leaves(l, ~mtcars[.x, ])
ltcars
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#>
#> [[2]]
#> [[2]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#>
#> [[2]][[2]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#>
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
#>
#>
#> [[2]][[4]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Duster 360 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
Both of these could be done with rapply
. But rapply
can’t operate upon the
data frames of ltcars
, whereas map_leaves
can:
map_leaves(ltcars, dplyr::summarise_all, mean)
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#>
#> [[2]]
#> [[2]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.9 5 134 101.5 3.875 2.5975 17.815 0.5 1 4 2.5
#>
#> [[2]][[2]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#>
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 18.4 7 292.5 140 2.955 3.45 18.62 0.5 0 3 1.5
#>
#>
#> [[2]][[4]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
If we want to do the same thing in pure base R, it looks similar:
leafapply <- function(X, FUN, ...){
if (is.list(X) && is.null(attr(X, "class"))) {
lapply(X, leafapply, FUN, ...)
} else {
FUN(X, ...)
}
}
l %>% leafapply(`+`, 1) %>% str()
#> List of 2
#> $ : num 2
#> $ :List of 4
#> ..$ : num [1:2] 3 4
#> ..$ : num 5
#> ..$ :List of 1
#> .. ..$ : num [1:2] 6 7
#> ..$ : num 8
ltcars <- leafapply(l, function(x) mtcars[x, ])
ltcars
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#>
#> [[2]]
#> [[2]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#>
#> [[2]][[2]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#>
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
#>
#>
#> [[2]][[4]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Duster 360 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
leafapply(ltcars, function(x) aggregate(. ~ 1, x, mean))
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#>
#> [[2]]
#> [[2]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.9 5 134 101.5 3.875 2.5975 17.815 0.5 1 4 2.5
#>
#> [[2]][[2]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#>
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 18.4 7 292.5 140 2.955 3.45 18.62 0.5 0 3 1.5
#>
#>
#> [[2]][[4]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
Collapsing lists but not list objects
Neither map_leaves
nor leafapply
can condense the results like rapply
does with how = "unlist"
, though. unlist
itself won’t work, because it
also ignores list objects. To collapse recursively, we need to write a smarter
version of unlist
.
In this case, we need to recurse over sublists of the input (and, because it is
recursive, potentially sublists of sublists), and then collapse the recursively
flattened list. We can use a similar test to map_leaves
and leafapply
to
identify sublists, but we also need a function to collapse each flat list. To
imitate unlist
, this would be c
, applied with purrr::invoke
or do.call
.
But it makes sense to make this function user-settable, because for other data
types, different collapsing functions will allow more forms of collapsing.
deflate <- function(.x, .f, ...){
UseMethod("deflate")
}
deflate.default <- function(.x, .f, ...){
.f <- purrr::as_mapper(.f)
is_sublist <- purrr::map_lgl(.x, rlang::is_bare_list)
.x[is_sublist] <- purrr::map(.x[is_sublist], deflate, .f, ...)
purrr::invoke(.f, .x, ...)
}
deflate
can imitate unlist
:
deflate(l, c)
#> [1] 1 2 3 4 5 6 7
But because it can take different collapsing functions, it can also do fancier
collapsing more appropriate for, say, data frames:
deflate(ltcars, dplyr::bind_rows)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Again, we can write the same thing in base R:
Deflate <- function(f, x, ...){
is_sublist <- vapply(x, function(y) is.list(y) && is.null(attr(y, "class")), logical(1))
x[is_sublist] <- lapply(x[is_sublist], Deflate, f = f, ...)
do.call(f, c(x, ...))
}
Deflate(c, l)
#> [1] 1 2 3 4 5 6 7
Deflate(rbind, ltcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Exciting!
Next steps
Some possible extensions:
- Both functions could be combined into one to mimic
rapply
more directly. I like the modular, composable form, though; it fits the Unix philosophy nicely. - A
class
parameter à larapply
could be added tomap_leaves
(and made way more useful thanrapply
’s). Presumably formap_leaves
, nodes that don’t match the class should be ignored; fordeflate
they should be subsetted out. - Make a version of
eapply
that takes aclass
parameter. - A version of
map_leaves
that indexes up from each leaf node could be created. While powerful, this has a lot of potential to be finicky to use. - Make new methods for similar objects like dendrograms à la
dendrapply
. - Put all these functions in a package, if people will use it.
- Something else? Add a comment or ping me on Twitter!
I’m also very fond of
Map
andReduce
and so on, so maybe I just like functional programming.↩I can now get
rapply
to work about half the time, and even used it usefully once. I have never gottenreshape
to do what I want. I’ve seen other people make it work, so I know it can function, but for the life of me I can’t understand its parameters.↩
Share this post