dplyr: right_join after group_by does not produce the desired/expected result

Question

I am trying to get each of my id/year/month groups to contain rows for all seven weekdays, with NA for the "missing" weekdays.

Here are the data frames and my attempt at this task:

    > df
      id year month weekday  amount
    1  1 2015     1  Friday 3650.43
    2  2 2015     1  Monday 1271.12
    3  1 2015     2  Friday 1315.79
    4  2 2015     2  Monday 2195.37
    > wday
        weekday
    1    Friday
    2  Saturday
    3 Wednesday
    4    Sunday
    5   Tuesday
    6    Monday
    7  Thursday

I tried group_by() followed by right_join(), but the result is not what I expected. Is there an easy way to get the result I am after?

    > df <- df %>% group_by(id, year, month) %>% right_join(wday)
    Joining by: "weekday"
    > df
    Source: local data frame [9 x 5]
    Groups: id, year, month [?]

         id  year month   weekday  amount
      (dbl) (int) (int)     (chr)   (dbl)
    1     1  2015     1    Friday 3650.43
    2     1  2015     2    Friday 1315.79
    3    NA    NA    NA  Saturday      NA
    4    NA    NA    NA Wednesday      NA
    5    NA    NA    NA    Sunday      NA
    6    NA    NA    NA   Tuesday      NA
    7     2  2015     1    Monday 1271.12
    8     2  2015     2    Monday 2195.37
    9    NA    NA    NA  Thursday      NA

I want 7 rows for each id/year/month combination, where the amount for absent weekdays is NA (or ideally 0, but I know how to get that with mutate()).

The resulting data frame should look like this:

    > df
       id year month   weekday  amount
    1   1 2015     1    Friday 3650.43
    2   1 2015     1    Monday    0.00
    3   1 2015     1  Saturday    0.00
    4   1 2015     1    Sunday    0.00
    5   1 2015     1  Thursday    0.00
    6   1 2015     1   Tuesday    0.00
    7   1 2015     1 Wednesday    0.00
    8   1 2015     2    Friday 1315.79
    9   1 2015     2    Monday    0.00
    10  1 2015     2  Saturday    0.00
    11  1 2015     2    Sunday    0.00
    12  1 2015     2  Thursday    0.00
    13  1 2015     2   Tuesday    0.00
    14  1 2015     2 Wednesday    0.00
    15  2 2015     1    Friday    0.00
    16  2 2015     1    Monday 1271.12
    17  2 2015     1  Saturday    0.00
    18  2 2015     1    Sunday    0.00
    19  2 2015     1  Thursday    0.00
    20  2 2015     1   Tuesday    0.00
    21  2 2015     1 Wednesday    0.00
    22  2 2015     2    Friday    0.00
    23  2 2015     2    Monday 2195.37
    24  2 2015     2  Saturday    0.00
    25  2 2015     2    Sunday    0.00
    26  2 2015     2  Thursday    0.00
    27  2 2015     2   Tuesday    0.00
    28  2 2015     2 Wednesday    0.00

r dplyr

Gopala Dec 20 '15 at 15:05

3 answers

sqldf: For complex joins, it is usually easier to use SQL:

    library(sqldf)
    sqldf("select id, year, month, wday.weekday,
                  sum((df.weekday = wday.weekday) * amount) amount
           from df
           join wday
           group by 1, 2, 3, 4")

giving:

       id year month   weekday  amount
    1   1 2015     1    Friday 3650.43
    2   1 2015     1  Saturday    0.00
    3   1 2015     1 Wednesday    0.00
    4   1 2015     1    Sunday    0.00
    5   1 2015     1   Tuesday    0.00
    6   1 2015     1    Monday    0.00
    7   1 2015     1  Thursday    0.00
    8   2 2015     1    Friday    0.00
    9   2 2015     1  Saturday    0.00
    10  2 2015     1 Wednesday    0.00
    11  2 2015     1    Sunday    0.00
    12  2 2015     1   Tuesday    0.00
    13  2 2015     1    Monday 1271.12
    14  2 2015     1  Thursday    0.00
    15  1 2015     2    Friday 1315.79
    16  1 2015     2  Saturday    0.00
    17  1 2015     2 Wednesday    0.00
    18  1 2015     2    Sunday    0.00
    19  1 2015     2   Tuesday    0.00
    20  1 2015     2    Monday    0.00
    21  1 2015     2  Thursday    0.00
    22  2 2015     2    Friday    0.00
    23  2 2015     2  Saturday    0.00
    24  2 2015     2 Wednesday    0.00
    25  2 2015     2    Sunday    0.00
    26  2 2015     2   Tuesday    0.00
    27  2 2015     2    Monday 2195.37
    28  2 2015     2  Thursday    0.00

base R: We could replicate this in base R using merge and transform:

    # merge(..., by = c()) has no join keys, so it returns the cross join of df and wday
    xt <- transform(
      merge(df, wday, by = c()),
      amount = (as.character(weekday.x) == as.character(weekday.y)) * amount,
      weekday = weekday.y,
      weekday.x = NULL,
      weekday.y = NULL
    )
    aggregate(amount ~ ., xt, sum)

dplyr: If we really wanted to use dplyr, we could replace transform with mutate, rename, and select:

    library(dplyr)
    merge(df, wday, by = c()) %>%
      mutate(amount = (as.character(weekday.x) == as.character(weekday.y)) * amount) %>%
      rename(weekday = weekday.y) %>%
      select(-weekday.x) %>%
      group_by(id, year, month, weekday) %>%
      summarise(amount = sum(amount))

Note: If each id/year/month group has only one weekday row (as in the question), we could optionally omit the group by/sum, the aggregate, and the group_by/summarise in the three solutions, respectively.
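To make the note concrete, here is a minimal base R sketch without the aggregation step. It rebuilds df and wday from the question (with weekday as a character column, which is an assumption about the original types) and relies on there being only one row per id/year/month group.

```r
# Data rebuilt from the question; character weekday columns are an assumption.
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = 2015L,
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Cross join: every df row is paired with every weekday (4 x 7 = 28 rows).
xt <- merge(df, wday, by = c())

# Keep the amount only where the original weekday matches; otherwise 0.
xt$amount  <- (xt$weekday.x == xt$weekday.y) * xt$amount
xt$weekday <- xt$weekday.y
xt <- xt[c("id", "year", "month", "weekday", "amount")]
```

With more than one row per group, the cross join would produce several rows per weekday, and the sum would be needed again.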

G. Grothendieck Dec 20 '15 at 15:59

Using tidyr and dplyr. complete does the heavy lifting here: if every weekday already appeared somewhere in df, you would not need bind_rows or na.omit (or dplyr at all).

    library(dplyr)
    library(tidyr)

    df %>%                                    # initial data
      bind_rows(wday) %>%                     # add wday so that all the weekdays are present
      complete(id, year, month, weekday,      # complete all levels of id:year:month:weekday
               fill = list(amount = 0)) %>%   # fill the amount column with 0
      na.omit()                               # remove the NA rows introduced by bind_rows

jeremycg Dec 20 '15 at 16:27

akrun · Accepted Answer · 2015-12-20T15:20:09+0000

We can use expand.grid to build every id/year/month/weekday combination, left_join the original data onto it, and replace the resulting NA amounts with 0:

    library(dplyr)

    expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>%
      left_join(., df) %>%
      mutate(amount = replace(amount, is.na(amount), 0)) %>%
      arrange(id, year, month, weekday)
    #   id year month   weekday  amount
    #1   1 2015     1    Friday 3650.43
    #2   1 2015     1    Monday    0.00
    #3   1 2015     1  Saturday    0.00
    #4   1 2015     1    Sunday    0.00
    #5   1 2015     1  Thursday    0.00
    #6   1 2015     1   Tuesday    0.00
    #7   1 2015     1 Wednesday    0.00
    #8   1 2015     2    Friday 1315.79
    #9   1 2015     2    Monday    0.00
    #10  1 2015     2  Saturday    0.00
    #11  1 2015     2    Sunday    0.00
    #12  1 2015     2  Thursday    0.00
    #13  1 2015     2   Tuesday    0.00
    #14  1 2015     2 Wednesday    0.00
    #15  2 2015     1    Friday    0.00
    #16  2 2015     1    Monday 1271.12
    #17  2 2015     1  Saturday    0.00
    #18  2 2015     1    Sunday    0.00
    #19  2 2015     1  Thursday    0.00
    #20  2 2015     1   Tuesday    0.00
    #21  2 2015     1 Wednesday    0.00
    #22  2 2015     2    Friday    0.00
    #23  2 2015     2    Monday 2195.37
    #24  2 2015     2  Saturday    0.00
    #25  2 2015     2    Sunday    0.00
    #26  2 2015     2  Thursday    0.00
    #27  2 2015     2   Tuesday    0.00
    #28  2 2015     2 Wednesday    0.00
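For anyone without dplyr, roughly the same idea can be sketched in base R, using merge(all.x = TRUE) in place of left_join. This is an illustrative variant rather than the code from the answer; df and wday are rebuilt from the question, with weekday assumed to be a character column.

```r
# Data rebuilt from the question; character weekday columns are an assumption.
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = 2015L,
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# All combinations of the unique id, year, month values and the seven weekdays.
grid <- expand.grid(c(lapply(df[1:3], unique), wday["weekday"]),
                    stringsAsFactors = FALSE)

# Left join on the shared columns (id, year, month, weekday).
out <- merge(grid, df, all.x = TRUE)

# Combinations with no match in df get amount = NA; replace those with 0.
out$amount[is.na(out$amount)] <- 0

# Sort as in the desired output.
out <- out[order(out$id, out$year, out$month, out$weekday), ]
```

Note that expand.grid crosses the unique values of each column independently, so it would also create id/year/month combinations that never occur in df; in the question's data all four combinations exist, so the result has 28 rows.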
