This function is now implemented in data.table (since version 1.8.11), as seen from Zach's answer above.
I just saw this big piece of code from Arun here on SO . Therefore, I assume there is a solution to data.table . In relation to this problem:
library(data.table) set.seed(1234) DT <- data.table(x=rep(c(1,2,3),each=1e6), y=c("A","B"), v=sample(1:100,12)) out <- DT[,list(SUM=sum(v)),by=list(x,y)]
This gives the same results as the DWin approach:
tapply(DT$v,list(DT$x, DT$y), FUN=sum) AB 1 26499966 28166677 2 26499978 28166673 3 26500056 28166650
Also, it is fast:
system.time({ out <- DT[,list(SUM=sum(v)),by=list(x,y)] out[, as.list(setattr(SUM, 'names', y)), by=list(x)]}) ## user system elapsed ## 0.64 0.05 0.70 system.time(tapply(DT$v,list(DT$x, DT$y), FUN=sum)) ## user system elapsed ## 7.23 0.16 7.39
UPDATE
So, this solution also works for unbalanced data sets (i.e. some combinations do not exist), you must first enter the ones indicated in the data table:
library(data.table) set.seed(1234) DT <- data.table(x=c(rep(c(1,2,3),each=4),3,4), y=c("A","B"), v=sample(1:100,14)) out <- DT[,list(SUM=sum(v)),by=list(x,y)] setkey(out, x, y) intDT <- expand.grid(unique(out[,x]), unique(out[,y])) setnames(intDT, c("x", "y")) out <- out[intDT] out[, as.list(setattr(SUM, 'names', y)), by=list(x)]
Summary
Combining the comments to the above, here is a one-line solution:
DT[, sum(v), keyby = list(x,y)][CJ(unique(x), unique(y)), allow.cartesian = T][, setNames(as.list(V1), paste(y)), by = x]
It is also easy to change this to have more than just a sum, for example:
DT[, list(sum(v), mean(v)), keyby = list(x,y)][CJ(unique(x), unique(y)), allow.cartesian = T][, setNames(as.list(c(V1, V2)), c(paste0(y,".sum"), paste0(y,".mean"))), by = x] # x A.sum B.sum A.mean B.mean #1: 1 72 123 36.00000 61.5 #2: 2 84 119 42.00000 59.5 #3: 3 187 96 62.33333 48.0 #4: 4 NA 81 NA 81.0