Matrix.utils has an aggregate function, aggregate.Matrix. It does what you want in one line of code. Going by the median timings below, it is roughly 15 times faster than combineByRow and more than 100 times faster than by:
```r
library(Matrix.utils)    # provides aggregate.Matrix()
library(microbenchmark)
# combineByRow() is not from a package; it is assumed to be defined in your session already

N <- 10000
m <- matrix(runif(N * 100), nrow = N)
rownames(m) <- sample(1:(N/2), N, replace = TRUE)
```

```
> microbenchmark(a <- t(sapply(by(m, rownames(m), colSums), identity)), b <- combineByRow(m), c <- aggregate.Matrix(m, row.names(m)), times = 10)
Unit: milliseconds
                                                  expr        min         lq       mean     median         uq        max neval
 a <- t(sapply(by(m, rownames(m), colSums), identity)) 6000.26552 6173.70391 6660.19820 6419.07778 7093.25002 7723.61642    10
                                  b <- combineByRow(m)  634.96542  689.54724  759.87833  732.37424  866.22673  923.15491    10
                c <- aggregate.Matrix(m, row.names(m))   42.26674   44.60195   53.62292   48.59943   67.40071   70.40842    10
> identical(as.vector(a), as.vector(c))
[1] TRUE
```
EDIT: Frank is right, rowsum is somewhat faster than any of these solutions. You would want to use one of these other functions only if you were working with a Matrix, especially a sparse one, or if you were performing an aggregation other than sum.
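For the plain sum case, base R's rowsum() groups rows by a key and sums them in a single call. The following is a minimal sketch, assuming a smaller random matrix than in the benchmark so it runs quickly. It cross-checks rowsum() against the by()/colSums() approach and shows that a non-sum aggregation, such as a group mean, needs extra work (the group sizes) when you start from rowsum().

```r
# Minimal sketch: group-wise row sums with base R's rowsum().
# Assumption: a smaller random matrix than in the benchmark, so it runs quickly.
set.seed(1)
N <- 1000
m <- matrix(runif(N * 10), nrow = N)
rownames(m) <- sample(1:(N / 2), N, replace = TRUE)

# One output row per distinct row name; reorder = TRUE (the default) sorts the groups
s <- rowsum(m, rownames(m))

# Cross-check against the by()/colSums() approach from the benchmark.
# all.equal() is used instead of identical() because the two methods may
# accumulate the floating-point sums differently.
a <- t(sapply(by(m, rownames(m), colSums), identity))
stopifnot(isTRUE(all.equal(as.vector(a), as.vector(s))))

# rowsum() can only sum; a group mean also needs the group sizes.
# table() sorts its groups the same way, so each row of s is divided by its own count.
counts <- as.vector(table(rownames(m)))
means <- s / counts
b <- t(sapply(by(m, rownames(m), colMeans), identity))
stopifnot(isTRUE(all.equal(as.vector(b), as.vector(means))))
```

This is where the trade-off mentioned above shows up. rowsum() is the simplest choice for sums on an ordinary dense matrix. Other aggregations or sparse Matrix objects are where a more general tool earns its keep.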
Craig