Using a filter inside a filter in dplyr gives unexpected results

Using R 3.1.2, dplyr 0.4.0.

I am trying to use filter inside filter, which sounds very simple, and I don't understand why it does not give me the expected result. I wrote this code about 6 months ago and I'm sure it worked then, so it must have stopped working because of an updated version of R, dplyr, or some other dependency. Anyway, here is some simple code that filters the rows of df1 based on a condition obtained by filtering a column in df2.

df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(x = "A", y = TRUE, stringsAsFactors = FALSE)
dplyr::filter(df1, x %in% (dplyr::filter(df2, y)$x))

I expect this to show the first row of df1, but instead I get

# [1] x
# <0 rows> (or 0-length row.names)

which I'm not sure what to make of. Why does it return a vector AND an empty data.frame?

If I break the filter code into two separate statements, I get what I expect

xval <- dplyr::filter(df2, y)$x
dplyr::filter(df1, x %in% xval)

#   x
# 1 A

Can someone help me understand why this behavior occurs? I'm not saying it is wrong, but I do not understand it.

1 answer

It is a fair question why your approach is not working (or rather, apparently no longer working). I cannot answer that, but I would suggest a different approach, as commented above, which avoids nested function calls (a filter inside another filter). That, IMO, is what dplyr is made for: being expressive and easy to read and understand, with syntax that reads from left to right and from top to bottom.

So, for your example, since the columns of interest to you are called "x", you can do:

filter(df2, y) %>% select(x) %>% inner_join(df1)
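  • filter df2 to the rows where "y" is TRUE
  • select only the column "x"
  • inner_join the result with df1 (by "x"). From the help for inner_join: "return all rows from x where there are matching values in y, and all columns from x and y."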
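
To make the three steps concrete, here is a minimal sketch that runs them one at a time on the data frames from the question. The intermediate names (step1, step2, result) are only for illustration, and by = "x" is passed explicitly, which is my addition to avoid the "Joining by" message.

```r
library(dplyr)

df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(x = "A", y = TRUE, stringsAsFactors = FALSE)

# Step 1: keep only the rows of df2 where y is TRUE
step1 <- filter(df2, y)

# Step 2: keep only the key column x
step2 <- select(step1, x)

# Step 3: keep the rows whose x also appears in df1
# (by = "x" is spelled out here; the one-liner above lets dplyr infer it)
result <- inner_join(step2, df1, by = "x")

print(result)
#   x
# 1 A
```

Each step takes a data frame and returns a data frame, so nothing has to be evaluated inside another filter call.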

If the columns to join by had different names, say "z" in df2 and "x" in df1, you could specify them with by:

filter(df2, y) %>% select(z) %>% inner_join(df1, by = c("z" = "x"))
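
A small sketch of joining on differently named key columns, under the assumption that the filtered data frame calls its key "z" while df1 calls it "x". The data frame df2z is hypothetical and not part of the question.

```r
library(dplyr)

df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)

# Hypothetical variant of df2 whose key column is named z instead of x
df2z <- data.frame(z = "A", y = TRUE, stringsAsFactors = FALSE)

# In by = c("z" = "x"), the name on the left refers to the left-hand table
# (the filtered df2z) and the value on the right to the right-hand table (df1)
result_z <- filter(df2z, y) %>%
  select(z) %>%
  inner_join(df1, by = c("z" = "x"))

print(result_z)
```

The joined key keeps the name from the left-hand table, so the result has a column called z.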

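Alternatively, you could use semi_join instead of inner_join here. From the help:

semi_join: return all rows from x where there are matching values in y, keeping just columns from x.

A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, whereas a semi join will never duplicate rows of x.

So, in this case: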

filter(df2, y) %>% select(x) %>% semi_join(df1)
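
The difference between the two joins only shows up when a key matches more than once, which the question's data does not exercise. The sketch below uses made-up data frames (left and right are hypothetical names) in which "A" appears twice on the right-hand side.

```r
library(dplyr)

# Hypothetical data: the key "A" occurs twice in `right`
left  <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
right <- data.frame(x = c("A", "A"), y = c(TRUE, TRUE),
                    stringsAsFactors = FALSE)

# inner_join returns one row per match and adds the columns of `right`
inner <- inner_join(left, right, by = "x")

# semi_join only filters `left`: no duplicated rows, no extra columns
semi <- semi_join(left, right, by = "x")

print(inner)  # two rows, columns x and y
print(semi)   # one row, column x only
```

Seen this way, semi_join(df1, filter(df2, y), by = "x") would be another way to keep the rows of df1 that have a match in the filtered df2, though that goes slightly beyond what the answer proposes.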