Challenge: optimize unlisting [easy]

Because SO has been a bit slow lately, I'm asking an easy question. I would appreciate it if the big fish stayed on the bench for this one and gave the newcomers a chance to respond.

Sometimes we have objects that hold a ridiculous number of large list elements (vectors). How would you "unlist" such an object into a single vector? Show that your method is faster than unlist().

+10
optimization list vector r
4 answers

If you don't need names and your list is one level deep, then if you can beat

    .Internal(unlist(your_list, FALSE, FALSE))

I will upvote everything you do on SO for the next year!
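
For a flat list, the public counterpart of that internal call is unlist() with both recursive and use.names switched off. A minimal sketch, using a small made-up list x (not from the question), that checks the fast path returns the same values as the default call:

```r
# Hypothetical example data: a flat (one level deep) named list.
x <- list(a = 1:3, b = 4:6, c = 7L)

# Default call: recursive = TRUE, use.names = TRUE.
slow <- unlist(x)

# Fast path: no recursion, no name construction.
fast <- unlist(x, recursive = FALSE, use.names = FALSE)

# Same values; only the names attribute differs.
identical(unname(slow), fast)  # TRUE
names(slow)                    # "a1" "a2" "a3" "b1" "b2" "b3" "c"
names(fast)                    # NULL
```

The point of the comparison is that the names are the only thing lost, so the fast path is safe whenever the names are not needed.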

[Update: if non-unique names are needed and the list is not recursive, here is a version that is more than 100 times faster than unlist():

    myunlist <- function(l){
        names <- names(l)
        vec <- unlist(l, F, F)
        reps <- unlist(lapply(l, length), F, F)
        names(vec) <- rep(names, reps)
        vec
    }

    myunlist(list(a=1:3, b=2))
    a a a b 
    1 2 3 2 

    > tl <- list(a = 1:20000, b = 1:5000, c = 2:30)
    > system.time(for(i in 1:200) unlist(tl))
       user  system elapsed 
      22.97    0.00   23.00 
    > system.time(for(i in 1:200) myunlist(tl))
       user  system elapsed 
        0.2     0.0     0.2 
    > system.time(for(i in 1:200) unlist(tl, F, F))
       user  system elapsed 
       0.02    0.00    0.02 

]
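
To make the naming difference concrete, here is a small sketch that contrasts the names unlist() builds with the repeated names produced by the approach in this update. It swaps in lengths() from base R for unlist(lapply(l, length)); the function name rep_names_unlist is made up for illustration, and it assumes a flat, fully named list.

```r
# Illustrative variant of the repeated-names idea.
# Assumption: l is a flat list and every element has a name.
rep_names_unlist <- function(l) {
  vec <- unlist(l, recursive = FALSE, use.names = FALSE)
  names(vec) <- rep(names(l), lengths(l))  # repeat each name once per value
  vec
}

x <- list(a = 1:3, b = 2)
names(unlist(x))            # "a1" "a2" "a3" "b"  (unique names)
names(rep_names_unlist(x))  # "a"  "a"  "a"  "b"  (non-unique names)
```

Building one repeated name per value avoids pasting an index onto every name, which is presumably where most of the time goes in the default call.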

[Update 2: A response to challenge no. 3 from Richie Cotton.

    bigList3 <- replicate(500, rnorm(1e3), simplify = F)

    unlist_vit <- function(l){
        names(l) <- NULL
        do.call(c, l)
    }

    library(rbenchmark)

    benchmark(unlist = unlist(bigList3, FALSE, FALSE),
              rjc = unlist_rjc(bigList3),
              vit = unlist_vit(bigList3),
              order = "elapsed",
              replications = 100,
              columns = c("test", "relative", "elapsed")
              )

        test relative elapsed
    1 unlist   1.0000    2.06
    3    vit   1.4369    2.96
    2    rjc   3.5146    7.24

]

PS: I assume a "big fish" is anyone with more reputation than you. So I'm a pretty small one here :).

+9

A non-unlist() solution would have to be pretty darned fast to beat unlist(), wouldn't it? Here it takes less than two seconds to unlist a list of 2,000 numeric vectors, each of length 100,000.

    > bigList2 <- as.list(data.frame(matrix(rep(rnorm(1000000), times = 200),
    +                                       ncol = 2000)))
    > print(object.size(bigList2), units = "Gb")
    1.5 Gb
    > system.time(foo <- unlist(bigList2, use.names = FALSE))
       user  system elapsed 
      1.897   0.000   2.019 

With bigList2 and foo in my workspace, R is using ~9 Gb of my available memory. The key is use.names = FALSE. Without it, unlist() is painfully slow. Exactly how slow, I'm still waiting to find out ...

We can speed this up a little more by setting recursive = FALSE, and then we effectively have the same answer as VitoshKa's (two representative timings):

    > system.time(foo <- unlist(bigList2, recursive = FALSE, use.names = FALSE))
       user  system elapsed 
      1.379   0.001   1.416 
    > system.time(foo <- .Internal(unlist(bigList2, FALSE, FALSE)))
       user  system elapsed 
      1.335   0.000   1.344 

... finally, the use.names = TRUE version finished ...:

    > system.time(foo <- unlist(bigList2, use = TRUE))
        user   system  elapsed 
    2307.839   10.978 2335.815 

and it was consuming all of my system's 16 GB of RAM, so I gave up at that point ...
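
For anyone without 16 GB to spare, the same comparison can be tried on a scaled-down list. This is only a sketch: smallList is a made-up stand-in for bigList2 (200 vectors of length 1,000), and no timings are shown because they depend heavily on the machine and the R version.

```r
set.seed(1)

# Hypothetical scaled-down test data: 200 named numeric vectors of length 1,000.
smallList <- replicate(200, rnorm(1e3), simplify = FALSE)
names(smallList) <- paste0("X", seq_along(smallList))

# Time both variants and keep the results for comparison.
t_names   <- system.time(with_names <- unlist(smallList, use.names = TRUE))
t_nonames <- system.time(no_names   <- unlist(smallList, use.names = FALSE))

# The values agree; constructing the names is the only extra work.
identical(unname(with_names), no_names)  # TRUE

# Elapsed seconds for each variant.
c(with_names = t_names[["elapsed"]], without_names = t_nonames[["elapsed"]])
```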

+2

As a medium-sized fish, I'm jumping in with a first-attempt solution that gives the small fish a benchmark to beat. It is about 3 times slower than unlist().

I'm using a smaller version of ucfagls's test list (since it fits in memory better).

 bigList3 <- as.list(data.frame(matrix(rep(rnorm(1e5), times = 200), ncol = 2000))) 

The basic idea is to create one long vector to hold the answer, and then loop over the list items, copying the values from each one into it.

    unlist_rjc <- function(l)
    {
      # length of each list element
      lengths <- vapply(l, length, FUN.VALUE = numeric(1), USE.NAMES = FALSE)
      total_len <- sum(lengths)
      # first and last position of each element in the output vector
      end_index <- cumsum(lengths)
      start_index <- 1 + c(0, end_index)
      # preallocate the result, then copy each element into place
      v <- numeric(total_len)
      for(i in seq_along(l))
      {
        v[start_index[i]:end_index[i]] <- l[[i]]
      }
      v
    }

    t1 <- system.time(for(i in 1:10) unlist(bigList3, FALSE, FALSE))
    t2 <- system.time(for(i in 1:10) unlist_rjc(bigList3))
    t2["user.self"] / t1["user.self"]  # 3.08

Challenges for small fish:
1. Can you extend it to deal with types other than numeric?
2. Can you make it work with recursion (nested lists)?
3. Can you do it faster?

I will upvote anyone with less reputation than me whose answer meets one or more of these mini-challenges.
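
One possible way to approach the first mini-challenge is to preallocate the result with vector(), using the type of the list elements instead of hard-coding numeric(). This is only a sketch, not part of the answer above: the name unlist_typed is made up, and it assumes a non-empty flat list whose elements are atomic vectors that all share one type (no coercion between types is attempted).

```r
unlist_typed <- function(l) {
  # Assumption: every element is an atomic vector of the same type.
  types <- unique(vapply(l, typeof, FUN.VALUE = character(1), USE.NAMES = FALSE))
  stopifnot(length(types) == 1L)

  lens <- lengths(l, use.names = FALSE)
  end_index <- cumsum(lens)
  start_index <- end_index - lens + 1L

  # Preallocate a result of the matching type, then copy each element in.
  v <- vector(mode = types, length = sum(lens))
  for (i in seq_along(l)) {
    if (lens[i] > 0L) {  # skip empty elements; start:end would count down
      v[start_index[i]:end_index[i]] <- l[[i]]
    }
  }
  v
}

unlist_typed(list(c("a", "b"), character(0), "c"))  # "a" "b" "c"
unlist_typed(list(c(TRUE, FALSE), NA))              # TRUE FALSE NA
```

The explicit check for empty elements matters because start:end counts downwards in R when start is larger than end. Whether this beats unlist() is a separate question; the sketch only addresses the type restriction.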

0

c() has a logical argument, recursive, which will recursively unlist a list of vectors when set to TRUE (the default is obviously FALSE).

    l <- replicate(500, rnorm(1e3), simplify = F)

    microbenchmark::microbenchmark(
      unlist = unlist(l, FALSE, FALSE),
      c = c(l, recursive = TRUE, use.names = FALSE)
    )

    # Unit: milliseconds
    #    expr      min       lq     mean   median       uq      max neval
    #  unlist 3.083424 3.121067 4.662491 3.172401 3.985668 27.35040   100
    #       c 3.084890 3.133779 4.090520 3.201246 3.920646 33.22832   100
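
A small sketch of what recursive = TRUE does on a nested list. The list nested is made-up example data, and the comparison with unlist() is added here only as a sanity check:

```r
# Made-up nested example data.
nested <- list(1:2, list(3L, list(4:5)), 6L)

flat_c      <- c(nested, recursive = TRUE, use.names = FALSE)
flat_unlist <- unlist(nested, recursive = TRUE, use.names = FALSE)

# Both flatten all levels into one integer vector.
identical(flat_c, flat_unlist)  # TRUE
flat_c                          # 1 2 3 4 5 6

# With the default recursive = FALSE, c() leaves the list structure alone.
is.list(c(nested))              # TRUE
```
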
0