In most cases, the best solution is to use distinct() from dplyr, as already suggested.
However, here is another approach that uses the slice() function from dplyr.
    # Generate fake data for the example
    library(dplyr)
    set.seed(123)
    df <- data.frame(
      x = sample(0:1, 10, replace = TRUE),
      y = sample(0:1, 10, replace = TRUE),
      z = 1:10
    )

    # In each group of rows formed by combinations of x and y,
    # retain only the first row
    df %>%
      group_by(x, y) %>%
      slice(1)
Difference from using the distinct() function
The advantage of this solution is that it makes explicit which rows of the original data frame are retained, and it combines well with the arrange() function.
Suppose you have customer sales data and want to keep one record per customer, namely the record of their most recent purchase. Then you could write:
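To make the comparison concrete, here is a minimal sketch on a small made-up data frame (the data and the names `via_distinct`, `via_slice`, and `last_rows` are illustrative assumptions, not part of the original answer). It shows that `distinct()` with `.keep_all = TRUE` and `group_by()` plus `slice(1)` keep the same rows, while `slice()` also lets you pick a different position, such as the last row of each group.

```r
library(dplyr)

# Small hypothetical data frame used only for this illustration
df <- data.frame(
  x = c(0, 0, 1, 1, 0),
  y = c(0, 1, 1, 1, 0),
  z = 1:5
)

# distinct() keeps the first row found for each (x, y) combination
via_distinct <- df %>%
  distinct(x, y, .keep_all = TRUE)

# group_by() + slice(1) keeps the same rows; row order may differ
via_slice <- df %>%
  group_by(x, y) %>%
  slice(1) %>%
  ungroup()

# slice() can target other positions, e.g. the last row of each group
last_rows <- df %>%
  group_by(x, y) %>%
  slice(n()) %>%
  ungroup()
```

Choosing the row by position is what makes the `slice()` version easy to pair with a preceding `arrange()`.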
    customer_purchase_data %>%
      arrange(desc(Purchase_Date)) %>%
      group_by(Customer_ID) %>%
      slice(1)
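A self-contained version of this pipeline might look like the sketch below. The sample rows and the `Amount` column are invented for illustration; only `Customer_ID` and `Purchase_Date` come from the snippet above.

```r
library(dplyr)

# Hypothetical sales data; the values and the Amount column are assumptions
customer_purchase_data <- data.frame(
  Customer_ID   = c("A", "A", "B", "B", "B"),
  Purchase_Date = as.Date(c("2019-01-05", "2019-02-01",
                            "2018-12-24", "2019-01-15", "2018-11-30")),
  Amount        = c(20, 35, 15, 50, 10)
)

# Sort newest first, then keep the first (most recent) row per customer
latest_purchases <- customer_purchase_data %>%
  arrange(desc(Purchase_Date)) %>%
  group_by(Customer_ID) %>%
  slice(1) %>%
  ungroup()
```

In dplyr 1.0.0 and later, `slice_max(Purchase_Date, n = 1, with_ties = FALSE)` after `group_by(Customer_ID)` should give an equivalent result without the explicit `arrange()` step.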
bschneidr Feb 12 '19 at 23:04