My question concerns the distinct function of dplyr.
First set up the data:
library(dplyr)

set.seed(0)
df <- data.frame(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
Consider the following two uses of distinct:
df %>% group_by(x) %>% distinct()
df %>% group_by(x) %>% distinct(y)
The first produces a different result from the second. As far as I can tell, the first finds "all distinct values of x, and returns the first value of y for each", whereas the second finds "for each value of x, all distinct values of y".
Why should this be so when
df %>% distinct(x, y)
df %>% distinct()
give the same result?
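For reference, here is a minimal sketch of the two readings described above, written without group_by so that each one is stated explicitly. The toy data frame and the names toy, first_per_x and all_pairs are made up for illustration, and the sketch assumes a dplyr version in which distinct() accepts the .keep_all argument.

```r
library(dplyr)

# Hypothetical toy data, deliberately smaller than df above
toy <- data.frame(
  x = c(1, 1, 1, 2, 2),
  y = c(5, 5, 6, 7, 7)
)

# Reading 1: one row per distinct x, keeping the first y seen for that x
first_per_x <- toy %>% distinct(x, .keep_all = TRUE)

# Reading 2: every distinct (x, y) combination
all_pairs <- toy %>% distinct(x, y)

print(first_per_x)
print(all_pairs)
```

On this toy data the first reading keeps one row per value of x, while the second keeps one row per (x, y) pair, which is the difference I am seeing between the two grouped calls.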
EDIT: It looks like this is already a known bug: https://github.com/hadley/dplyr/issues/1110