Summarize based on two grouping variables in R using data.table

I am trying to use data.table in R to summarize the following data table:

 SiteNo Var1 Var2 Var3 ... Var18 Group 1 0.1 0.3 1 0.3 1 2 0.3 0.1 0.9 0.2 1 etc. 

There are 668,944 observations, 43 sites, 3 groups and 19 variables. I would like to get the results of a function (e.g. mean ) that sums each column / variable both by site and by group. Thus, there should be 43 sites x 3 groups x # of total statistics (for example, mean ). I used the following code:

 e.dt<-data.table(e) setkey(e.dt, Group) # set key to group number # get mean for each column/variable e.dt.mean<-e.dt[,lapply(.SD,mean), by="SiteNo"] 

Using the above, I get 43 sites, but not 3 groups that I was behind. I could split the original data table into three groups, but wondered if there was a way to generalize using two variables (SiteNo and Group) using data.table .

I am still RTM on data.table , but so far I have not found the answer to above.

+7
source share
1 answer

Try to set the key for both the "Group" and the "SiteNo":

In the example under ?key :

 keycols <- c("SiteNo", "Group") setkeyv(e.dt, keycols) 

Then use by as:

 e.dt[, lapply(.SD,mean), by = key(e.dt)] 

Alternatively you can use:

 e.dt[, lapply(.SD,mean), by = "SiteNo,Group"] 

or

 e.dt[, lapply(.SD, mean), by = list(SiteNo, Group)] 
+11
source

All Articles