How to sum column values in multiple tables if the tables have different lengths?

Well, that should be easy, but I'm looking for a solution that is as quick as possible.

Let's say I have 3 tables (the number of tables will be much larger):

    tab1 <- table(c(1, 1, 1, 2, 2, 3, 3, 3))
    tab2 <- table(c(1, 1, 4, 4, 4))
    tab3 <- table(c(1, 1, 2, 3, 5))

This is what we get:

    > tab1

    1 2 3 
    3 2 3 
    > tab2

    1 4 
    2 3 
    > tab3

    1 2 3 5 
    2 1 1 1 

What I want, computed quickly enough to work with many large tables, is the following:

    1 2 3 4 5 
    7 3 4 3 1 

So basically the tables are joined on the union of all their names, and the counts under each name are summed. Is there a basic function that does this that I am missing? Thank you for your help!

3 answers

We can concatenate the tables with c() to create 'v1', then use tapply() to sum the elements grouped by the names of that object.

    v1 <- c(tab1, tab2, tab3)
    tapply(v1, names(v1), FUN=sum)
    #1 2 3 4 5 
    #7 3 4 3 1 
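Since the question mentions that the number of tables will be much larger, the same idea can be applied to a list of tables rather than to separately named objects. The following is only a sketch: the list name `tabs` is hypothetical, and the numeric reordering at the end assumes every table name can be converted to a number.

```r
# Hypothetical list holding all the tables; in practice it could have any length.
tabs <- list(
  table(c(1, 1, 1, 2, 2, 3, 3, 3)),
  table(c(1, 1, 4, 4, 4)),
  table(c(1, 1, 2, 3, 5))
)

# Concatenate every table into one named vector of counts.
# unname() guards against list names being prepended to the value names.
v <- do.call(c, unname(tabs))

# Sum the counts that share a name.
res <- tapply(v, names(v), FUN = sum)

# tapply() sorts the group labels as character strings, so reorder
# numerically (this matters once labels such as "10" appear).
res <- res[order(as.numeric(names(res)))]
res
```

For the three example tables this gives the same totals as the call above; the reordering step only changes anything when the labels do not already sort numerically as strings.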

You can use rowsum(). The result is a one-column matrix rather than the flat output you show, but you can always restructure it after the calculation. rowsum() is known to be very efficient.

    x <- c(tab1, tab2, tab3)
    rowsum(x, names(x))
    #   [,1]
    # 1    7
    # 2    3
    # 3    4
    # 4    3
    # 5    1
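As a sketch of the restructuring step mentioned above, the single column of the rowsum() result can be extracted to get a named vector close to the layout asked for in the question. The variable names `m` and `out` are only illustrative.

```r
tab1 <- table(c(1, 1, 1, 2, 2, 3, 3, 3))
tab2 <- table(c(1, 1, 4, 4, 4))
tab3 <- table(c(1, 1, 2, 3, 5))

x <- c(tab1, tab2, tab3)

# rowsum() returns a matrix with one row per group and one column of sums.
m <- rowsum(x, names(x))

# Taking the only column drops the matrix structure and keeps the row
# names, which leaves a plain named vector of totals.
out <- m[, 1]
out
```

The totals themselves are unchanged by this step; only the shape of the result differs.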

Here is a benchmark, with akrun's data.table suggestion added as well.

    library(microbenchmark)
    library(data.table)

    xx <- rep(x, 1e5)

    microbenchmark(
        tapply = tapply(xx, names(xx), FUN=sum),
        rowsum = rowsum(xx, names(xx)),
        data.table = data.table(xx, names(xx))[, sum(xx), by = V2]
    )
    # Unit: milliseconds
    #        expr       min        lq      mean    median        uq       max neval
    #      tapply 150.47532 154.80200 176.22410 159.02577 204.22043 233.34346   100
    #      rowsum  41.28635  41.65162  51.85777  43.33885  45.43370 109.91777   100
    #  data.table  21.39438  24.73580  35.53500  27.56778  31.93182  92.74386   100

You can try this:

    df <- rbind(as.matrix(tab1), as.matrix(tab2), as.matrix(tab3))
    aggregate(df, by=list(row.names(df)), FUN=sum)
    #   Group.1 V1
    # 1       1  7
    # 2       2  3
    # 3       3  4
    # 4       4  3
    # 5       5  1