Dplyr with joining subgroup

Question


The following problem can be seen as a “two-column reshape to wide”, and there are several classical ways to solve it, from base::reshape (horror) to reshape2. For the two-group case, a simple join of the subgroups works best.

Is it possible to reformulate the join within dplyr's piping framework? The example below is a little silly, but I need the join inside a longer chain of pipes, which I do not want to break.

    library(dplyr)

    d <- data.frame(subject = rep(1:5, each = 2),
                    treatment = letters[1:2],
                    bp = rnorm(10))

    # Desired form (placeholder, not runnable as written):
    # d %>%
    #   <piped manipulations> %>%
    #   <make wide> %>%
    #   <additional piped manipulations>

    # Make wide (old style)
    with(d, left_join(d[treatment == "a", ],
                      d[treatment == "b", ],
                      by = "subject"))

+3

Tags: r, dplyr, magrittr, reshape2

Dieter Menne Dec 11 '14 at 9:03


2 answers

A solution with group_by instead of a join:

    d %>%
      group_by(subject) %>%
      summarize(bp_a = bp[match("a", treatment)],
                bp_b = bp[match("b", treatment)])
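One property of the match() approach that may be worth knowing: if a subject has no row for a given treatment, match() returns NA, and indexing bp with that NA yields NA, so the subject is kept with a missing value rather than dropped. A minimal sketch, assuming a made-up incomplete data frame (`d_missing` and its values are hypothetical and not from the question):

```r
library(dplyr)

# Hypothetical data: subject 2 has no row for treatment "b"
d_missing <- data.frame(subject = c(1, 1, 2),
                        treatment = c("a", "b", "a"),
                        bp = c(0.5, 1.5, -0.3))

res <- d_missing %>%
  group_by(subject) %>%
  # match() gives the position of the first matching treatment, or NA
  summarize(bp_a = bp[match("a", treatment)],
            bp_b = bp[match("b", treatment)])

res
```

Here subject 2 still appears in the result, with bp_b set to NA.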

+2

Wojciech Sobala Dec 13 '14 at 10:30


docendo discimus · Accepted Answer · 2014-12-11 09:12

What about

    d %>%
      filter(treatment == "a") %>%
      left_join(., filter(d, treatment == "b"), by = "subject")
    #   subject treatment.x       bp.x treatment.y      bp.y
    # 1       1           a  0.4392647           b 0.6741559
    # 2       2           a -0.6010311           b 1.9845774
    # 3       3           a  0.1749082           b 1.7678771
    # 4       4           a -0.3089731           b 0.4427471
    # 5       5           a -0.8346091           b 1.7156319

You can continue the pipe chain immediately after the left_join.
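As a sketch of what continuing the chain could look like, the example below passes left_join's `suffix` argument so the clashing columns get readable names, then adds a per-subject difference. The fixed seed, the suffix values and the `diff_ba` column are assumptions made for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
d <- data.frame(subject = rep(1:5, each = 2),
                treatment = letters[1:2],
                bp = rnorm(10))

wide <- d %>%
  filter(treatment == "a") %>%
  # suffix replaces the default ".x" / ".y" on clashing column names
  left_join(filter(d, treatment == "b"), by = "subject",
            suffix = c("_a", "_b")) %>%
  # the chain simply carries on after the join
  select(subject, bp_a, bp_b) %>%
  mutate(diff_ba = bp_b - bp_a)  # hypothetical follow-up step

wide
```

Because the join happens inside the chain, any filtering or mutating done before it applies only to the left-hand side; the right-hand side here is still built from the original `d`.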

Or, if you do not need the separate treatment columns, you can use tidyr:

    library(tidyr)

    d %>% spread(treatment, bp)
    #   subject          a         b
    # 1       1  0.4392647 0.6741559
    # 2       2 -0.6010311 1.9845774
    # 3       3  0.1749082 1.7678771
    # 4       4 -0.3089731 0.4427471
    # 5       5 -0.8346091 1.7156319

(This is the same as using d %>% dcast(subject ~ treatment, value.var = "bp") from the reshape2 package, as Henrik noted in the comments.)
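For readers on tidyr 1.0.0 or later: spread() is marked as superseded there in favour of pivot_wider(). The sketch below shows what should be the equivalent call; the fixed seed is an assumption added for reproducibility, and note that pivot_wider() returns a tibble rather than a plain data frame.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
d <- data.frame(subject = rep(1:5, each = 2),
                treatment = letters[1:2],
                bp = rnorm(10))

wide <- d %>%
  # one column per treatment level, filled with the bp values
  pivot_wider(names_from = treatment, values_from = bp)

wide
```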
