dplyr summarise_each for aggregating results

Question


I have a data frame like this:

        metric1    metric2    metric3 field1 field2
 1   1.07809668  4.2569882  7.1710095      L     S1
 2   0.56174763  1.2660273 -0.3751915      L     S2
 3   1.17447327  5.5186679 11.6868322      L     S2
 4   0.32830724 -0.8374830  1.8973718      S     S2
 5  -0.51213503 -0.3076640 10.0730274      S     S1
 6   0.24133119  2.7984703 15.9622215      S     S1
 7   1.96664414  0.1818531  2.7416768      S     S3
 8   0.06669409  3.8652075 10.5066330      S     S3
 9   1.14660437  8.5703119  3.4294062      L     S4
 10 -0.72785683  9.3320762  1.3827989      L     S4

I show two fields here, but I have a few more. I need to summarize the metrics grouped by each field. For example, for field1:

 DF %>% group_by(field1) %>% summarise_each(funs(sum),metric1,metric2,metric3)

I can do this for each field, which gives the columns sum(metric1), sum(metric2), sum(metric3), but the output table I need looks something like this:

 L(field1) S(field1) S1(field2) S2(field2) S3(field2) S4(field2) sum(metric1) sum(metric2) sum(metric3)

I believe there should be a way to do this using tidyr along with dplyr, but I cannot figure it out.


r dplyr tidyr

macrotourist Apr 20 '15 at 10:20


2 answers

Using only dplyr and tidyr, you can do:

 library(dplyr)
 library(tidyr)

 df %>%
   unite(variable, field1, field2) %>%
   group_by(variable) %>%
   summarise_each(funs(sum)) %>%
   gather(metrics, value, -variable) %>%
   spread(variable, value)

Which gives:

 #Source: local data frame [3 x 7]
 #
 #  metrics     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
 #1 metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
 #2 metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
 #3 metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310

Edit

After reading the comment on David's answer, I think this is closer to the expected result:

 field1 <- group_by(df, field = field1) %>%
   summarise_each(funs(sum), -(field1:field2))
 field2 <- group_by(df, field = field2) %>%
   summarise_each(funs(sum), -(field1:field2))

 bind_rows(field1, field2) %>%
   gather(metrics, value, -field) %>%
   spread(field, value)

Which gives:

 #Source: local data frame [3 x 7]
 #
 #  metrics         L         S         S1        S2        S3         S4
 #1 metric1  3.233065  2.090842  0.8072928  2.064528  2.033338  0.4187475
 #2 metric2 28.944071  5.700384  6.7477945  5.947212  4.047061 17.9023881
 #3 metric3 23.294855 41.180931 33.2062584 13.209013 13.248310  4.8122051
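In recent dplyr and tidyr releases, summarise_each() and funs() are deprecated and gather()/spread() are superseded, so the same per-field totals could be sketched with across(), pivot_longer() and pivot_wider(). This is only a minimal sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; the data frame is rebuilt from the question, and the names which_field and result are illustrative, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Data from the question, rebuilt so the example is self-contained
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
metric1 metric2 metric3 field1 field2
1.07809668 4.2569882 7.1710095 L S1
0.56174763 1.2660273 -0.3751915 L S2
1.17447327 5.5186679 11.6868322 L S2
0.32830724 -0.8374830 1.8973718 S S2
-0.51213503 -0.3076640 10.0730274 S S1
0.24133119 2.7984703 15.9622215 S S1
1.96664414 0.1818531 2.7416768 S S3
0.06669409 3.8652075 10.5066330 S S3
1.14660437 8.5703119 3.4294062 L S4
-0.72785683 9.3320762 1.3827989 L S4
")

result <- df %>%
  # Stack field1 and field2 into one column so each level becomes a group
  # ("which_field" is an illustrative name for the column that records the origin)
  pivot_longer(c(field1, field2), names_to = "which_field", values_to = "field") %>%
  group_by(field) %>%
  # Sum every metric column within each field level
  summarise(across(starts_with("metric"), sum)) %>%
  # Reshape to one row per metric and one column per field level
  pivot_longer(-field, names_to = "metrics", values_to = "value") %>%
  pivot_wider(names_from = field, values_from = value)

print(result)
```

Stacking the field columns first avoids writing one group_by() call per field, which may help when there are more than two of them.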


Steven beaupré Apr 21 '15 at 6:20


David Arenburg · Accepted Answer · 2015-04-20T22:31:52+0000

Try recast from the reshape2 package:

 library(reshape2)
 recast(DF, variable ~ field1 + field2, sum)
 #   variable     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
 # 1  metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
 # 2  metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
 # 3  metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310

which is equivalent to

 dcast(melt(DF, c("field1", "field2")), variable ~ field1 + field2, sum)

You can also combine it with tidyr::gather if you want, but you cannot use tidyr::spread here because it has no fun.aggregate argument:

 DF %>%
   gather(variable, value, -(field1:field2)) %>%
   dcast(variable ~ field1 + field2, sum)
 #   variable     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
 # 1  metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
 # 2  metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
 # 3  metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310
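Newer tidyr versions provide pivot_wider(), whose values_fn argument plays a role similar to fun.aggregate, so the aggregation could probably be done without reshape2. The following is a hedged sketch, assuming tidyr >= 1.1.0 (where values_fn accepts a single function and names_sort is available); DF is rebuilt from the question and the name wide is illustrative.

```r
library(dplyr)
library(tidyr)

# Data from the question, rebuilt so the example is self-contained
DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
metric1 metric2 metric3 field1 field2
1.07809668 4.2569882 7.1710095 L S1
0.56174763 1.2660273 -0.3751915 L S2
1.17447327 5.5186679 11.6868322 L S2
0.32830724 -0.8374830 1.8973718 S S2
-0.51213503 -0.3076640 10.0730274 S S1
0.24133119 2.7984703 15.9622215 S S1
1.96664414 0.1818531 2.7416768 S S3
0.06669409 3.8652075 10.5066330 S S3
1.14660437 8.5703119 3.4294062 L S4
-0.72785683 9.3320762 1.3827989 L S4
")

wide <- DF %>%
  # Long format: one row per (field1, field2, metric) observation
  pivot_longer(starts_with("metric"), names_to = "variable", values_to = "value") %>%
  # Wide format: one column per field1_field2 combination,
  # summing the values that fall into the same cell
  pivot_wider(
    names_from = c(field1, field2),
    values_from = value,
    values_fn = sum,
    names_sort = TRUE
  )

print(wide)
```

Without values_fn, cells that receive more than one value would become list-columns rather than sums, which is the situation the aggregation argument is meant to handle.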
