Mapping leaves

Recursive functional programming

Edward Visel

10 minute read

at_depth

I love purrr.1 Aside from its anonymous function notation, one of the functions that made me love the package was at_depth, which iterates across a list at a specified level of nesting. It has since been deprecated in favor of modify_depth, which is more powerful, but is significantly more finicky.

The additional power is because the .depth parameter can now be passed a negative integer to index up from the bottom of the list. When I first heard this, I was excited, because it sounded like it could iterate across the ragged leaf nodes of a list. After months of trying, I still could not get it to work. I discovered that was because it does not iterate across leaf nodes, but instead selects depth by indexing up from the deepest level of the list, iterating over whatever nodes are at that level regardless of whether they are lists or not.

rapply

Part of my initial excitement upon hearing that modify_depth could take negative .depth values was because I thought it may be less-confusing version of rapply, the second-most confusing function in base R after reshape.2 rapply is a recursive version of lapply which (in theory) lets you iterate a function across the leaf nodes of a list.

For example:

library(magrittr)

l <- list(1, list(2:3, 4, list(5:6), 7))

rapply(l, function(x) x + 1, how = "replace") %>% 
    str()
#> List of 2
#>  $ : num 2
#>  $ :List of 4
#>   ..$ : num [1:2] 3 4
#>   ..$ : num 5
#>   ..$ :List of 1
#>   .. ..$ : num [1:2] 6 7
#>   ..$ : num 8

With the default how = "unlist", unlist is called on the result, collapsing it:

rapply(l, function(x) x + 2)
#> [1] 3 4 5 6 7 8 9

The biggest problem with rapply is that it decides whether to recurse over a node or apply the function based on the behavior of typeof and is.list, not those of class. Consequently, it cannot operate on objects that are lists, like data frames or models—it will recurse right into them:

rapply(list(list(mtcars)), mean)
#>        mpg        cyl       disp         hp       drat         wt 
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250 
#>       qsec         vs         am       gear       carb 
#>  17.848750   0.437500   0.406250   3.687500   2.812500

Mapping over leaf nodes

So let’s build our own recursive map/apply function that can manage list objects. Sticking to purrr’s structure, we’ll make the function an S3 generic with a default method that will do the recursion. Objects can be differentiated from unclassed lists with rlang::is_bare_list, and purrr::as_mapper will let it handle the same variety of input structures as map and friends.

map_leaves <- function(.x, .f, ...){
    UseMethod("map_leaves")
}

map_leaves.default <- function(.x, .f, ...){
    if (rlang::is_bare_list(.x)) {
        purrr::map(.x, map_leaves, .f, ...)    # recurse!
    } else {
        .f <- purrr::as_mapper(.f, ...)
        .f(.x, ...)    # call function on node
    }
}

Let’s try it out:

l %>% map_leaves(~.x + 1) %>% str()
#> List of 2
#>  $ : num 2
#>  $ :List of 4
#>   ..$ : num [1:2] 3 4
#>   ..$ : num 5
#>   ..$ :List of 1
#>   .. ..$ : num [1:2] 6 7
#>   ..$ : num 8

ltcars <- map_leaves(l, ~mtcars[.x, ])
ltcars
#> [[1]]
#>           mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
#> 
#> [[2]]
#> [[2]][[1]]
#>                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 
#> [[2]][[2]]
#>                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#>                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.46 20.22  1  0    3    1
#> 
#> 
#> [[2]][[4]]
#>             mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Duster 360 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4

Both of these could be done with rapply. But rapply can’t operate upon the data frames of ltcars, whereas map_leaves can:

map_leaves(ltcars, dplyr::summarise_all, mean)
#> [[1]]
#>   mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> 1  21   6  160 110  3.9 2.62 16.46  0  1    4    4
#> 
#> [[2]]
#> [[2]][[1]]
#>    mpg cyl disp    hp  drat     wt   qsec  vs am gear carb
#> 1 21.9   5  134 101.5 3.875 2.5975 17.815 0.5  1    4  2.5
#> 
#> [[2]][[2]]
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#>    mpg cyl  disp  hp  drat   wt  qsec  vs am gear carb
#> 1 18.4   7 292.5 140 2.955 3.45 18.62 0.5  0    3  1.5
#> 
#> 
#> [[2]][[4]]
#>    mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> 1 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4

If we want to do the same thing in pure base R, it looks similar:

leafapply <- function(X, FUN, ...){
    if (is.list(X) && is.null(attr(X, "class"))) {
        lapply(X, leafapply, FUN, ...)
    } else {
        FUN(X, ...)
    }
}

l %>% leafapply(`+`, 1) %>% str()
#> List of 2
#>  $ : num 2
#>  $ :List of 4
#>   ..$ : num [1:2] 3 4
#>   ..$ : num 5
#>   ..$ :List of 1
#>   .. ..$ : num [1:2] 6 7
#>   ..$ : num 8

ltcars <- leafapply(l, function(x) mtcars[x, ])
ltcars
#> [[1]]
#>           mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
#> 
#> [[2]]
#> [[2]][[1]]
#>                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 
#> [[2]][[2]]
#>                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#>                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.46 20.22  1  0    3    1
#> 
#> 
#> [[2]][[4]]
#>             mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Duster 360 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4

leafapply(ltcars, function(x) aggregate(. ~ 1, x, mean))
#> [[1]]
#>   mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> 1  21   6  160 110  3.9 2.62 16.46  0  1    4    4
#> 
#> [[2]]
#> [[2]][[1]]
#>    mpg cyl disp    hp  drat     wt   qsec  vs am gear carb
#> 1 21.9   5  134 101.5 3.875 2.5975 17.815 0.5  1    4  2.5
#> 
#> [[2]][[2]]
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 
#> [[2]][[3]]
#> [[2]][[3]][[1]]
#>    mpg cyl  disp  hp  drat   wt  qsec  vs am gear carb
#> 1 18.4   7 292.5 140 2.955 3.45 18.62 0.5  0    3  1.5
#> 
#> 
#> [[2]][[4]]
#>    mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> 1 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4

Collapsing lists but not list objects

Neither map_leaves nor leafapply can condense the results like rapply does with how = "unlist", though. unlist itself won’t work, because it also ignores list objects. To collapse recursively, we need to write a smarter version of unlist.

In this case, we need to recurse over sublists of the input (and, because it is recursive, potentially sublists of sublists), and then collapse the recursively flattened list. We can use a similar test to map_leaves and leafapply to identify sublists, but we also need a function to collapse each flat list. To imitate unlist, this would be c, applied with purrr::invoke or do.call. But it makes sense to make this function user-settable, because for other data types, different collapsing functions will allow more forms of collapsing.

deflate <- function(.x, .f, ...){
    UseMethod("deflate")
}

deflate.default <- function(.x, .f, ...){
    .f <- purrr::as_mapper(.f)
    is_sublist <- purrr::map_lgl(.x, rlang::is_bare_list)
    .x[is_sublist] <- purrr::map(.x[is_sublist], deflate, .f, ...)
    purrr::invoke(.f, .x, ...)
}

deflate can imitate unlist:

deflate(l, c)
#> [1] 1 2 3 4 5 6 7

But because it can take different collapsing functions, it can also do fancier
collapsing more appropriate for, say, data frames:

deflate(ltcars, dplyr::bind_rows)
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
#> 7 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4

Again, we can write the same thing in base R:

Deflate <- function(f, x, ...){
    is_sublist <- vapply(x, function(y) is.list(y) && is.null(attr(y, "class")), logical(1))
    x[is_sublist] <- lapply(x[is_sublist], Deflate, f = f, ...)
    do.call(f, c(x, ...))
}

Deflate(c, l)
#> [1] 1 2 3 4 5 6 7
Deflate(rbind, ltcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360        14.3   8  360 245 3.21 3.570 15.84  0  0    3    4

Exciting!

Next steps

Some possible extensions:

  • Both functions could be combined into one to mimic rapply more directly. I like the modular, composable form, though; it fits the Unix philosophy nicely.
  • A class parameter à la rapply could be added to map_leaves (and made way more useful than rapply’s). Presumably for map_leaves, nodes that don’t match the class should be ignored; for deflate they should be subsetted out.
  • Make a version of eapply that takes a class parameter.
  • A version of map_leaves that indexes up from each leaf node could be created. While powerful, this has a lot of potential to be finicky to use.
  • Make new methods for similar objects like dendrograms à la dendrapply.
  • Put all these functions in a package, if people will use it.
  • Something else? Add a comment or ping me on Twitter!

  1. I’m also very fond of Map and Reduce and so on, so maybe I just like functional programming.

  2. I can now get rapply to work about half the time, and even used it usefully once. I have never gotten reshape to do what I want. I’ve seen other people make it work, so I know it can function, but for the life of me I can’t understand its parameters.

comments powered by Disqus