Use data.table to count and aggregate / summarize a column

I want to count and aggregate (summarize) a column in data.table and cannot find the most efficient way to do this. The question "R summing a few columns with data.table" seems close to what I want.

My details:

    library(data.table)

    set.seed(321)
    dat <- data.table(MNTH = c(rep(201501, 4), rep(201502, 3), rep(201503, 5), rep(201504, 4)),
                      VAR = sample(c(0, 1), 16, replace = TRUE))

    > dat
          MNTH VAR
     1: 201501   1
     2: 201501   1
     3: 201501   0
     4: 201501   0
     5: 201502   0
     6: 201502   0
     7: 201502   0
     8: 201503   0
     9: 201503   0
    10: 201503   1
    11: 201503   1
    12: 201503   0
    13: 201504   1
    14: 201504   0
    15: 201504   1
    16: 201504   0

I want to count the rows and sum VAR by MNTH using data.table. Desired result:

        MNTH COUNT VAR
    1 201501     4   2
    2 201502     3   0
    3 201503     5   2
    4 201504     4   2
1 answer

The post you are referring to shows how to apply a single aggregation method to several columns. If you want to apply different aggregation methods to different columns, you can do:

    dat[, .(count = .N, var = sum(VAR)), by = MNTH]

This gives:

         MNTH count var
    1: 201501     4   2
    2: 201502     3   0
    3: 201503     5   2
    4: 201504     4   2
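For anyone re-running this: `sample()` changed its default algorithm in R 3.6.0, so `set.seed(321)` may produce different `VAR` values on newer R versions. Below is a minimal sketch of the same grouped call with the `VAR` values typed in from the table printed in the question, so the result does not depend on the random number generator. The name `res` is only for illustration.

```r
library(data.table)

# VAR values copied from the table printed in the question,
# so the example does not rely on set.seed()/sample().
dat <- data.table(
  MNTH = rep(c(201501, 201502, 201503, 201504), times = c(4, 3, 5, 4)),
  VAR  = c(1, 1, 0, 0,  0, 0, 0,  0, 0, 1, 1, 0,  1, 0, 1, 0)
)

# .N is the number of rows in the current group;
# sum(VAR) is evaluated separately for each MNTH.
res <- dat[, .(count = .N, var = sum(VAR)), by = MNTH]
print(res)
```

Each expression inside `.()` becomes one column of the result, so further per-group summaries can be added in the same call.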

You can also add these values to the existing dataset by updating it by reference:

    dat[, `:=`(count = .N, var = sum(VAR)), by = MNTH]

This gives:

    > dat
          MNTH VAR count var
     1: 201501   1     4   2
     2: 201501   1     4   2
     3: 201501   0     4   2
     4: 201501   0     4   2
     5: 201502   0     3   0
     6: 201502   0     3   0
     7: 201502   0     3   0
     8: 201503   0     5   2
     9: 201503   0     5   2
    10: 201503   1     5   2
    11: 201503   1     5   2
    12: 201503   0     5   2
    13: 201504   1     4   2
    14: 201504   0     4   2
    15: 201504   1     4   2
    16: 201504   0     4   2
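Because the update modifies `dat` itself, you may want to work on a copy if the original table has to stay unchanged. A small sketch, using a shortened made-up table and the hypothetical name `dat2`:

```r
library(data.table)

# Small illustrative table (not the data from the question).
dat <- data.table(MNTH = rep(c(201501, 201502), times = c(2, 1)),
                  VAR  = c(1, 0, 1))

# copy() makes a deep copy; a plain `dat2 <- dat` would still refer
# to the same table, so the update would show up in both.
dat2 <- copy(dat)
dat2[, `:=`(count = .N, var = sum(VAR)), by = MNTH]

names(dat)   # original columns only
names(dat2)  # original columns plus count and var
```

If you do not need the original, updating in place as shown above avoids the extra copy.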

See the Getting Started guides on the GitHub wiki for more information on how to use the data.table syntax.
