I am trying to use a group using the data.table package in R.
start <- as.Date('2014-1-1') end <- as.Date('2014-1-6') time.span <- seq(start, end, "days") a <- data.table(date = time.span, value=c(1,2,3,4,5,6), group=c('a','a','b','b','a','b')) date value group 1 2014-01-01 1 a 2 2014-01-02 2 a 3 2014-01-03 3 b 4 2014-01-04 4 b 5 2014-01-05 5 a 6 2014-01-06 6 b a[,mean(value),by=group] > group V1 1: a 2.6667 2: b 4.3333
This works great.
Since I work with dates, it may happen that a special date not only has one, but also two groups.
a <- data.table(date = time.span, value=c(1,2,3,4,5,6), group=list('a',c('a','b'),'b','b','a','b')) date value group 1 2014-01-01 1 a 2 2014-01-02 2 c("a", "b") 3 2014-01-03 3 b 4 2014-01-04 4 b 5 2014-01-05 5 a 6 2014-01-06 6 b a[,mean(value),by=group] > Error in `[.data.table`(a, , mean(value), by = group) : The items in the 'by' or 'keyby' list are length (1,2,1,1,1,1). Each must be same length as rows in x or number of rows returned by i (6).
I would like the date of the group with both groups to be used to calculate the average of group a, as well as group b.
Expected results:
mean a: 2.6667 mean b: 3.75
Is this possible with the data.table package?
Update
thanks to akrun my original problem is resolved. After "splitting" the data table and in my case to calculate different factors (depending on the groups), I need a data table in its "original" form with unique rows based on the date. My solution so far:
a <- data.table(date = time.span, value=c(1,2,3,4,5,6), group=list('a',c('a','b'),'b','b','a','b')) b <- a[rep(1:nrow(a), lengths(group))][, group:=unlist(a$group)] date value group 1 2014-01-01 1 a 2 2014-01-02 2 a 3 2014-01-02 2 b 4 2014-01-03 3 b 5 2014-01-04 4 b 6 2014-01-05 5 a 7 2014-01-06 6 b # creates new column with mean based on group b[,factor := mean(value), by=group] #creates new data.table c without duplicate rows (based on date) + if a row has group a & b it creates the product of their factors c <- b[,.(value = unique(value), group = list(group), factor = prod(factor)),by=date] date value group factor 01/01/14 1 a 2.666666667 02/01/14 2 c("a", "b") 10 03/01/14 3 b 3.75 04/01/14 4 b 3.75 05/01/14 5 a 2.666666667 06/01/14 6 b 3.75
I think this is not an ideal way to do this, but it works. Any suggestions how I could make this better?
Alternative solution (really slow !!!):
d <- a[rep(1:nrow(a), lengths(group))][,group:=unlist(a$group)][, mean(value), by = group] for(i in 1:NROW(a)){ y1 <- 1 for(j in a[i,group][[1]]){ y1 <- y1 * d[group==j, V1] } a[i, factor := y1] }
My quick solution:
Is there a chance to make it faster?