Why use purrr::map instead of lapply?

Is there a reason I should use

map(<list-like-object>, function(x) <do stuff>) 

instead of

 lapply(<list-like-object>, function(x) <do stuff>) 

The output should be the same, and the benchmarks I ran suggest that lapply is slightly faster (as it should be, since map needs to evaluate all of the non-standard-evaluation input).

So, is there any reason why, for such simple cases, I should actually consider switching to purrr::map? I am not asking here about anyone's likes or dislikes regarding syntax, other functionality provided by purrr, etc., but strictly about comparing purrr::map with lapply, assuming standard evaluation, i.e. map(<list-like-object>, function(x) <do stuff>). Is there any advantage that purrr::map has in terms of performance, exception handling, etc.? The comments below suggest that there is not, but maybe someone could elaborate a little more?

+137
r purrr
Jul 14 '17 at 10:45
3 answers

If the only function you use from purrr is map(), then no, the advantages are not substantial. As Rich Paulo points out, the main advantage of map() is its helpers, which allow you to write compact code for common special cases:

  • ~ . + 1 is equivalent to function(x) x + 1

  • list("x", 1) is equivalent to function(x) x[["x"]][[1]]. These helpers are a bit more general than [[ - see ?pluck for details. For data rectangling, the .default argument is particularly helpful.
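A minimal sketch of both helpers, assuming purrr is installed. The `records` list is a made-up example; its second element deliberately lacks the "x" field so that `.default` has something to do.

```r
library(purrr)

# Hypothetical nested list; the second record has no "x" element.
records <- list(
  list(x = c(10, 20), id = "a"),
  list(id = "b")
)

# Formula shorthand: ~ .x$id stands for function(.x) .x$id
ids <- map_chr(records, ~ .x$id)

# List shorthand: pull out x[[1]] from each record and fall back to
# NA_real_ where that path does not exist.
first_x <- map_dbl(records, list("x", 1), .default = NA_real_)
```

With base R the second call would need an anonymous function that checks for the missing element by hand.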

But most of the time you are not using a single *apply()/map() function, you are using a bunch of them, and the advantage of purrr is much greater consistency between the functions. For example:

  • The first argument to lapply() is the data; the first argument to mapply() is the function. The first argument to all map functions is always the data.

  • With vapply(), sapply() and mapply() you can suppress names on the output with USE.NAMES = FALSE; but lapply() does not have that argument.

  • There is no consistent way to pass constant arguments on to the mapper function. Most functions use ... but mapply() uses MoreArgs (which you would expect to be called MORE.ARGS), and Map(), Filter() and Reduce() expect you to create a new anonymous function. In map functions, constant arguments always come after the function name.

  • Almost every purrr function is type stable: you can predict the output type exclusively from the function name. This is not true for sapply() or mapply(). Yes, there is vapply(); but there is no equivalent for mapply().
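To make two of these points concrete, here is a small sketch. The `xs` and `ws` lists are invented for illustration; everything else is base R or purrr as documented.

```r
library(purrr)

xs <- list(c(1, 2, 3), c(4, NA))   # hypothetical values
ws <- list(c(1, 1, 2), c(1, 1))    # hypothetical weights

# Constant arguments: mapply() takes the function first and wants MoreArgs ...
base_wm <- mapply(weighted.mean, xs, ws, MoreArgs = list(na.rm = TRUE))

# ... while map2_dbl() takes the data first and the constant after the function.
purrr_wm <- map2_dbl(xs, ws, weighted.mean, na.rm = TRUE)

# Type stability: on empty input sapply() gives back a list,
# whereas map_int() and vapply() still give an integer vector.
empty_sapply <- sapply(list(), length)
empty_map    <- map_int(list(), length)
empty_vapply <- vapply(list(), length, integer(1))
```

The empty-input case is the classic way sapply() surprises code that expected a vector.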

You might think that all of these minor distinctions are not important (just as some people think that there is no advantage to stringr over base R regular expressions), but in my experience they cause unnecessary friction when programming (the differing argument orders always used to trip me up), and they make functional programming techniques harder to learn, because as well as the big ideas, you also have to learn a bunch of incidental details.

purrr also fills in some handy map variants that are absent from base R:

  • modify() preserves the type of the data by using [[<- to modify "in place". Combined with the _if variant, this allows for (IMO beautiful) code such as modify_if(df, is.factor, as.character)

  • map2() allows you to map simultaneously over x and y. This makes it easier to express ideas like map2(models, datasets, predict)

  • imap() allows you to map simultaneously over x and its indices (either names or positions). This makes it easy to (for example) load all csv files in a directory, adding a filename column to each.

     dir(pattern = "\\.csv$") %>%
       set_names() %>%
       map(read.csv) %>%
       imap(~ transform(.x, filename = .y))

  • walk() returns its input invisibly, and is useful when you are calling a function for its side effects (e.g. writing files to disk).
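A short sketch of three of these variants on toy data. The data frame `df` and the list `scores` are made up for illustration.

```r
library(purrr)

# Hypothetical data frame with one factor column.
df <- data.frame(g = factor(c("a", "b")), n = 1:2)

# modify_if() keeps the data frame; lapply() would return a plain list.
df_chr <- modify_if(df, is.factor, as.character)

# imap_chr() sees each element (.x) together with its name (.y).
scores   <- list(math = c(90, 80), art = 70)
labelled <- imap_chr(scores, ~ paste0(.y, ": ", mean(.x)))

# walk() runs print() for its side effect and hands the input back invisibly.
out <- walk(scores, print)
```

Because walk() returns its input, it can sit in the middle of a pipeline without interrupting it.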

Not to mention other helpers, such as safely() and partial() .
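Since the question mentions exception handling, here is a rough sketch of what those two helpers do. The inputs are arbitrary, and `safe_log` and `mean_no_na` are just illustrative names.

```r
library(purrr)

# safely() wraps a function so that it never throws: each call returns
# a list with a `result` and an `error` element, one of which is NULL.
safe_log <- safely(log)
res <- map(list(10, "a"), safe_log)

ok_value   <- res[[1]]$result   # log(10)
bad_result <- res[[2]]$result   # NULL, because log("a") failed
bad_error  <- res[[2]]$error    # the captured error condition

# partial() pre-fills arguments of an existing function.
mean_no_na <- partial(mean, na.rm = TRUE)
m <- mean_no_na(c(1, NA, 3))
```

With plain lapply() the same loop would stop at the first failing element unless each call were wrapped in tryCatch() by hand.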

Personally, I find that when I use purrr, I can write functional code with less friction and greater ease; it narrows the gap between thinking up an idea and implementing it. But your mileage may vary; there is no need to use purrr unless it actually helps you.

Microbenchmarks

Yes, map() is slightly slower than lapply(). But the cost of using map() or lapply() is driven by what you are mapping, not by the overhead of performing the loop. The microbenchmark below suggests that the cost of map() compared to lapply() is about 40 ns per element, which seems unlikely to materially affect most R code.

    library(purrr)
    n <- 1e4
    x <- 1:n
    f <- function(x) NULL

    mb <- microbenchmark::microbenchmark(
      lapply = lapply(x, f),
      map = map(x, f)
    )
    summary(mb, unit = "ns")$median / n
    #> [1] 490.343 546.880

+200
Nov 05 '17 at 15:41

Comparing purrr and lapply comes down to convenience and speed.




1. purrr::map is syntactically more convenient than lapply

Extract the second element of each list item:

 map(list, 2) 

which, as @F. Privé pointed out, is the same as:

 map(list, function(x) x[[2]]) 

With lapply

 lapply(list, 2) # doesn't work 

we need to pass an anonymous function ...

 lapply(list, function(x) x[[2]]) # now it works 

... or, as @RichScriven pointed out, we pass [[ as an argument to lapply

 lapply(list, '[[', 2) # a bit simpler syntactically 

So if you find yourself applying functions to many lists with lapply, and are tired of either defining a custom function or writing an anonymous function, convenience is one reason to switch to purrr.

2. Type-specific map functions save many lines of code

  • map_chr()
  • map_lgl()
  • map_int()
  • map_dbl()
  • map_df() - my favorite, returns a data frame.

Each of these type-specific map functions returns an atomic vector (or, for map_df(), a data frame) rather than the lists returned by map() and lapply(). If you are dealing with nested lists of atomic vectors, you can use these type-specific map functions to pull the values out directly and have them coerced to int, dbl or chr vectors. The base R version would look something like as.numeric(sapply(...)), as.character(sapply(...)), etc. This gives purrr another point for convenience and functionality.
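As a rough illustration on a made-up nested list (`people` is hypothetical data):

```r
library(purrr)

# Hypothetical nested list of records.
people <- list(
  list(name = "Ann", age = 31L),
  list(name = "Bob", age = 45L)
)

# Base R: extract with sapply(), then coerce explicitly.
ages_base <- as.integer(sapply(people, function(p) p$age))

# purrr: the function name states the output type.
ages_purrr  <- map_int(people, "age")
names_purrr <- map_chr(people, "name")

# Asking for the wrong type is an error, not a silent coercion.
wrong_type <- tryCatch(map_int(people, "name"), error = function(e) e)
```

The last line is the practical difference: as.integer(sapply(...)) on the name field would quietly produce NAs with a warning, whereas map_int() refuses.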

3. Convenience aside, lapply is [slightly] faster than map

Using purrr's convenience functions, as @F. Privé pointed out, slows processing down a bit. Let's race each of the 4 cases I presented above.

    # devtools::install_github("jennybc/repurrrsive")
    library(repurrrsive)
    library(purrr)
    library(microbenchmark)
    library(ggplot2)

    mbm <- microbenchmark(
      lapply       = lapply(got_chars[1:4], function(x) x[[2]]),
      lapply_2     = lapply(got_chars[1:4], '[[', 2),
      map_shortcut = map(got_chars[1:4], 2),
      map          = map(got_chars[1:4], function(x) x[[2]]),
      times        = 100
    )
    autoplot(mbm)

[Figure: autoplot(mbm) output comparing the timings of the four expressions]

And the winner is ....

 lapply(list, '[[', 2) 

In sum, if raw speed is what you are after: base::lapply (although it is not that much faster)

For simple syntax and expressibility: purrr::map




This excellent purrr tutorial highlights the convenience of not having to write out anonymous functions explicitly when using purrr, and the benefits of the type-specific map functions.

+45
Sep 01 '17 at 6:31

Setting aside matters of taste (otherwise this question should be closed), syntax consistency, style, etc., the answer is no: there is no special reason to use map instead of lapply or other members of the apply family, such as the stricter vapply.

PS: To those downvoting gratuitously, just remember that the OP wrote:

I am not asking here about anyone's likes or dislikes regarding syntax, other functionality provided by purrr, etc., but strictly about comparing purrr::map with lapply, assuming standard evaluation

If you set aside syntax and the other functionality of purrr, there is no special reason to use map. I use purrr myself and I'm fine with Hadley's answer, but it ironically goes over the very things the OP stated upfront he was not asking about.

+28
Jul 31 '17 at 22:47