As an example, I have the data table shown below. I want to do a simple aggregation grouped by a, where b = sum(b). For c, however, I want the value of c from the row where b is at its maximum within each group. The required output is shown below (dt.aggr). This leads to several questions:
1) Is there a way to do this in data.table?
2) Is there an easier way to do this in plyr?
3) In plyr, the output object changes from a data.table to a data.frame. Can I avoid this behavior?
library(plyr)
library(data.table)
dt <- data.table(a=c('a', 'a', 'a', 'b', 'b'), b=c(1, 2, 3, 4, 5),
                 c=c('m', 'n', 'p', 'q', 'r'))
dt
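# Split into one table per value of a, aggregate each piece, then recombine.
dt.split <- split(dt, dt$a)
dt.aggr <- ldply(lapply(dt.split,
                        FUN=function(dt){
                          # b: group total; c: value of c in the row where b is largest
                          dt[, .(b=sum(b), c=dt[b==max(b), c]), by=.(a)]
                        }), .id='a')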
dt.aggr
class(dt.aggr)
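For comparison, here is a minimal sketch of how the same aggregation might be written directly in data.table, without the split/ldply round trip. It assumes that `which.max(b)` is an acceptable way to pick the row (it returns the first maximum if there are ties, whereas `b==max(b)` would return every tied row). The name `dt.direct` is only for illustration.

```r
library(data.table)

dt <- data.table(a=c('a', 'a', 'a', 'b', 'b'), b=c(1, 2, 3, 4, 5),
                 c=c('m', 'n', 'p', 'q', 'r'))

# Per group of a: total of b, and the c value from the row with the largest b.
# which.max() returns the index of the first maximum, so ties yield one row.
dt.direct <- dt[, .(b=sum(b), c=c[which.max(b)]), by=a]
dt.direct
class(dt.direct)
```

Because everything happens inside `[.data.table`, the result should stay a data.table, with one row per group (b = 6, c = 'p' for group 'a'; b = 9, c = 'r' for group 'b').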