Equivalent to dlply in data.table

I am trying to achieve what dlply does, but with data.table. Here is a very simple example:

    library(plyr)
    library(data.table)

    dt <- data.table( p = c("A", "B"), q = 1:2 )

    dlply( dt, "p", identity )
    $A
      p q
    1 A 1

    $B
      p q
    1 B 2

    dt[ , identity(.SD), by = p ]
       p q
    1: A 1
    2: B 2

    foo <- function(x) as.list(x)
    dt[ , foo(.SD), by = p ]
       p q
    1: A 1
    2: B 2

Obviously, the return values of foo are collapsed into a single data.table. And I don't want to use dlply, because it passes the split data.tables to foo as data.frames, which makes any further data.table operations inside foo inefficient.

3 answers

Here's a more data.table-oriented approach:

    setkey(dt, p)

    dt[, list(list(dt[J(.BY[[1]])])), by = p]$V1
    #[[1]]
    #   p q
    #1: A 1
    #
    #[[2]]
    #   p q
    #1: B 2

There are other data.table-style alternatives to the above, but this one seems to be the fastest. Here is a comparison with lapply:

    library(microbenchmark)

    dt <- data.table( p = rep( LETTERS[1:25], 1E6), q = 25*1E6, key = "p" )

    microbenchmark(dt[, list(list(dt[J(.BY[[1]])])), by = p]$V1,
                   lapply(unique(dt$p), function(x) dt[x]),
                   times = 10)
    #Unit: seconds
    #                                         expr      min       lq   median       uq      max neval
    # dt[, list(list(dt[J(.BY[[1]])])), by = p]$V1 1.111385 1.508594 1.717357 1.966694 2.108188    10
    #      lapply(unique(dt$p), function(x) dt[x]) 1.871054 1.934865 2.216192 2.282428 2.367505    10

Try the following:

    > split(dt, dt[["p"]])
    $A
       p q
    1: A 1

    $B
       p q
    1: B 2
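As a hedged aside: newer data.table releases ship their own split method that accepts a `by` argument, so the grouping column can be named directly. The sketch below assumes an installed data.table version that provides this argument; the variable name `pieces` is only illustrative.

```r
library(data.table)

dt <- data.table(p = c("A", "B"), q = 1:2)

# Assumption: the installed data.table version supports split(..., by = ).
# The result is a named list with one data.table per distinct value of p.
pieces <- split(dt, by = "p")

names(pieces)
pieces[["A"]]
```

Each element remains a data.table, so further data.table operations can be applied to the pieces without conversion.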

Regarding G. Grothendieck's answer, I was wondering how well split performs:

    dt <- data.table( p = rep( LETTERS[1:25], 1E6), q = 25*1E6, key = "p" )

    system.time(
      ll <- split(dt, dt[ ,p ] )
    )
       user  system elapsed
      5.237   1.340   6.563

    system.time(
      ll <- lapply( unique(dt[,p]), function(x) dt[x] )
    )
       user  system elapsed
      1.179   0.363   1.541

So, if there is no better answer, I would stick with

 lapply( unique(dt[,p]), function(x) dt[x] ) 
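
The lapply approach returns an unnamed list, whereas dlply names each element after its group. A minimal sketch of how the two could be reconciled, assuming the table is keyed by the grouping column; `split_by_key` and `key_value` are hypothetical names introduced here for illustration, not part of data.table:

```r
library(data.table)

dt <- data.table(p = c("A", "B", "A"), q = 1:3, key = "p")

# Hypothetical helper: split a keyed data.table into a named list of
# data.tables, one per distinct value of the first key column.
split_by_key <- function(x) {
  stopifnot(haskey(x))
  key_col <- key(x)[1]
  key_values <- unique(x[[key_col]])
  # Keyed lookup per value; each result is still a data.table.
  pieces <- lapply(key_values, function(key_value) x[J(key_value)])
  setNames(pieces, key_values)
}

pieces <- split_by_key(dt)
pieces$A
```

Because the lookup goes through the key, the function passed to lapply receives data.tables rather than data.frames, which was the original objection to dlply.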
