Filter two data frames with the same group variables in dplyr

In many cases, after grouping a data frame by some variables, I want to apply a function that uses data from another data frame, subset by the same variables. The best solution I have found is to call semi_join inside the applied function, as follows:

library(dplyr)

d1 <- data.frame(model = c(1,1,2,2), x = runif(4) )
d2 <- data.frame(model=c(1,1,1,2,2,2), y = runif(6) )

myfun <- function(df1, df2) {
   # keep only the rows of df2 whose key values occur in the current group
   subsetdf2 <- semi_join(df2, df1)
   data.frame(z = sum(df1$x) - sum(subsetdf2$y)) # trivial manipulation just to exemplify
}

d1 %>% group_by(model) %>% do(myfun(., d2))

The problem is that semi_join prints a “Joining by ...” message on every call, and since the function is called many times, I get a lot of messages that flood the console. So, is there a way to reduce the verbosity of the joins? Do you know a more elegant way to do something like this?

P.S. A few years ago I asked a similar question for plyr: "subset inside a function by the variables specified in ddply".


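If by is not given, the join functions join on all variables the two data frames have in common and report them in a "Joining by:" message. To avoid the message, specify the by argument.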

For example:

semi_join(d2, d1, by="model")
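As a minimal sketch of the difference, the snippet below runs the same semi join three ways on the example data from the question. The `set.seed()` call and the variable names `implicit`, `explicit`, and `quiet` are additions for illustration only; `suppressMessages()` is a base R fallback that is not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
d1 <- data.frame(model = c(1, 1, 2, 2), x = runif(4))
d2 <- data.frame(model = c(1, 1, 1, 2, 2, 2), y = runif(6))

# Without `by`, dplyr picks the common columns and reports them in a message.
implicit <- semi_join(d2, d1)

# With an explicit `by`, no message is emitted.
explicit <- semi_join(d2, d1, by = "model")

# suppressMessages() silences the message when setting `by` is inconvenient.
quiet <- suppressMessages(semi_join(d2, d1))
```

All three calls should return the same rows; only the console output differs. Setting `by` is probably the safer choice, since it also guards against joining on a column that happens to share a name by accident.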

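EDIT. You could also replace semi_join with base R subsetting. The version below assumes the grouping variable is the first column of both data frames.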

myfun <- function(df1, df2) {
  subsetdf2 <- df2[df2[,1] %in% unique(df1[,1]),]
  data.frame(z = sum(df1$x) - sum(subsetdf2$y)) # trivial manipulation just to exemplify
}

Building on @cdeterman's answer, the grouping variable can be passed to the function as an argument.

d1 <- data.frame(model = c(1,1,2,2), x = runif(4) )
d2 <- data.frame(model=c(1,1,1,2,2,2), y = runif(6) )

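myfun <- function(df1, df2, gv) {
  # gv: name of the grouping variable, used as the explicit join key
  subsetdf2 <- semi_join(df2, df1, by = gv)
  data.frame(z = sum(df1$x) - sum(subsetdf2$y)) # trivial manipulation just to exemplify
}

group_var <- 'model'
# group_by_() is deprecated since dplyr 0.7; with dplyr >= 1.0 the equivalent
# is group_by(across(all_of(group_var)))
d1 %>% group_by_(group_var) %>% do(myfun(., d2,group_var))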
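For newer dplyr versions (0.8.1 or later), one possible alternative is `group_modify()`, which hands the function both the rows of the current group (`.x`) and a one-row tibble with the group keys (`.y`). The sketch below is not taken from the answers above: the name `myfun_keys` and the `set.seed()` call are hypothetical additions, and it assumes that the key columns carry the same names in both data frames.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
d1 <- data.frame(model = c(1, 1, 2, 2), x = runif(4))
d2 <- data.frame(model = c(1, 1, 1, 2, 2, 2), y = runif(6))

# Hypothetical helper: `keys` is the one-row tibble of group keys that
# group_modify() supplies, so the join key never has to be hard-coded.
myfun_keys <- function(df1, keys, df2) {
  subsetdf2 <- semi_join(df2, keys, by = names(keys))
  data.frame(z = sum(df1$x) - sum(subsetdf2$y))
}

result <- d1 %>%
  group_by(model) %>%
  group_modify(~ myfun_keys(.x, .y, d2)) %>%
  ungroup()
```

Because `by` is taken from `names(keys)`, the same function should work unchanged with several grouping variables, and no join message is printed.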