Can I change the grouping variable within a single dplyr statement?

Here is a simple example illustrating the problem:

    library(data.table)

    dt = data.table(a = c(1,1,2,2), b = 1:2)

    dt[, c := cumsum(a), by = b][, d := cumsum(a), by = c]
    #   a b c d
    #1: 1 1 1 1
    #2: 1 2 1 2
    #3: 2 1 3 2
    #4: 2 2 3 4

Trying to do the same in dplyr, I fail because the first group_by persists, so the second cumsum is grouped by both b and c:

    library(dplyr)

    df = data.frame(a = c(1,1,2,2), b = 1:2)

    df %.% group_by(b) %.% mutate(c = cumsum(a)) %.%
       group_by(c) %.% mutate(d = cumsum(a))
    #  a b c d
    #1 1 1 1 1
    #2 1 2 1 1
    #3 2 1 3 2
    #4 2 2 3 2

Is this a bug or a feature? If it is a feature, how can the data.table solution be replicated in a single expression?

1 answer

Try the following:

    > df %>% group_by(b) %>% mutate(c = cumsum(a)) %>%
    +   group_by(c) %>% mutate(d = cumsum(a))
    Source: local data frame [4 x 4]
    Groups: c

      a b c d
    1 1 1 1 1
    2 1 2 1 2
    3 2 1 3 2
    4 2 2 3 4

Update

With newer versions of dplyr, use %>% rather than %.%. An explicit ungroup is also no longer needed (per David Arenburg's comment).
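
To make the difference concrete, here is a minimal sketch that compares the two behaviours side by side. It assumes dplyr 1.0.0 or later, where the argument that keeps the existing groups is spelled .add; the names regrouped and nested are just illustrative. By default a second group_by replaces the earlier grouping, while .add = TRUE groups by both b and c and so reproduces the result from the question.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 2, 2), b = 1:2)

# Default: the second group_by() replaces the grouping by b,
# so d is accumulated within c only (same as the data.table result).
regrouped <- df %>%
  group_by(b) %>%
  mutate(c = cumsum(a)) %>%
  group_by(c) %>%
  mutate(d = cumsum(a)) %>%
  ungroup()

# .add = TRUE keeps b as a grouping variable, so d is accumulated
# within each (b, c) pair, which is what the question observed.
nested <- df %>%
  group_by(b) %>%
  mutate(c = cumsum(a)) %>%
  group_by(c, .add = TRUE) %>%
  mutate(d = cumsum(a)) %>%
  ungroup()

regrouped$d  # 1 2 2 4
nested$d     # 1 1 2 2
```

Adding ungroup() between the two group_by calls is harmless and can make the intent clearer to readers, but with the default behaviour it does not change the result.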
