Summarize based on two grouping variables in R using data.table

Question

Summarize based on two grouping variables in R using data.table

I am trying to use data.table in R to summarize the following data table:

 SiteNo Var1 Var2 Var3 ... Var18 Group 1 0.1 0.3 1 0.3 1 2 0.3 0.1 0.9 0.2 1 etc.

There are 668,944 observations, 43 sites, 3 groups and 19 variables. I would like to get the results of a function (e.g. mean ) that sums each column / variable both by site and by group. Thus, there should be 43 sites x 3 groups x # of total statistics (for example, mean ). I used the following code:

 e.dt<-data.table(e) setkey(e.dt, Group) # set key to group number # get mean for each column/variable e.dt.mean<-e.dt[,lapply(.SD,mean), by="SiteNo"]

Using the above, I get 43 sites, but not 3 groups that I was behind. I could split the original data table into three groups, but wondered if there was a way to generalize using two variables (SiteNo and Group) using data.table .

I am still RTM on data.table , but so far I have not found the answer to above.

+7

r data.table

Prophet60091 Dec 17 '12 at 18:06

source share

1 answer

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2012-12-17T18:31:18+0000

Try to set the key for both the "Group" and the "SiteNo":

In the example under ?key :

 keycols <- c("SiteNo", "Group") setkeyv(e.dt, keycols)

Then use by as:

 e.dt[, lapply(.SD,mean), by = key(e.dt)]

Alternatively you can use:

 e.dt[, lapply(.SD,mean), by = "SiteNo,Group"]

or

 e.dt[, lapply(.SD, mean), by = list(SiteNo, Group)]

Summarize based on two grouping variables in R using data.table

More articles: