Update: The behavior of DT[, list(list(.)), by=.] sometimes gave incorrect results in R >= 3.1.0. This has now been fixed in commit #1280 in the current version of data.table, v1.9.3. From NEWS:
DT[, list(list(.)), by=.] returns correct results in R >= 3.1.0. The bug was due to recent (welcome) changes in R v3.1.0, where list(.) no longer results in a copy. Closes #481.
Using data.table is about 15 times faster than tapply:
    library(data.table)

    vec <- c("D", "B", "B", "C", "C")

    # .I holds the row numbers belonging to each group of vec
    dt = as.data.table(vec)[, list(list(.I)), by = vec]
    dt
Speed tests:
    vec = sample(letters, 1e7, TRUE)

    system.time(tapply(seq_along(vec), vec, identity)[unique(vec)])
    #   user  system elapsed
    #   7.92    0.35    8.50

    system.time({
      dt = as.data.table(vec)[, list(list(.I)), by = vec]
      setattr(dt$V1, 'names', dt$vec)
      dt$V1
    })
    #   user  system elapsed
    #   0.39    0.09    0.49
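To confirm that both approaches return the same positions, a quick sanity check on the small example vector can look like the sketch below. The names `by_tapply` and `by_dt` are introduced here only for illustration, and the table is built with `data.table(vec = vec)` so that the column name is explicit. It assumes a data.table version that includes the fix mentioned in the update.

```r
library(data.table)

vec <- c("D", "B", "B", "C", "C")

# Base R: positions of each value, reordered by first appearance
by_tapply <- tapply(seq_along(vec), vec, identity)[unique(vec)]

# data.table: .I gives the row numbers within each group
dt <- data.table(vec = vec)[, list(list(.I)), by = vec]
by_dt <- setNames(dt$V1, dt$vec)

# Compare group names and the positions stored for each group
same_names  <- identical(names(by_dt), names(by_tapply))
same_values <- all(mapply(identical, by_dt, by_tapply))
print(c(same_names = same_names, same_values = same_values))
```

Both flags should be TRUE: D maps to position 1, B to positions 2 and 3, and C to positions 4 and 5.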