I have the following data:
data <- structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 4758L, 4758L, 9584L, 9584L, 9584L, 9584L, 9584L, 9584L), time = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), fruit = structure(c(1L, 6L, 1L, 1L, 6L, 5L, 5L, 3L, 4L, 1L, 2L, 4L, 2L, 1L, 6L, 5L, 5L, 3L, 2L), .Label = c("apple", "banana", "lemon", "lime", "orange", "pear"), class = "factor"), count = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), cum_sum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 1L, 2L, 3L, 4L, 5L, 6L)), .Names = c("user", "time", "fruit", "count", "cum_sum" ), row.names = c(NA, -19L), class = "data.frame")
For each user in this set, I want to look at the sequence of fruits over time. However, some fruits appear at consecutive time points for the same user:
  user time  fruit count cum_sum
1 1234    1  apple     1       1
2 1234    2   pear     1       2
3 1234    3  apple     1       3
4 1234    4  apple     1       4
5 1234    5   pear     1       5
6 1234    6 orange     1       6
7 1234    7 orange     1       7
What I want instead is the time-ordered sequence of fruits, with back-to-back repeats of the same fruit collapsed into a single entry.
The problem is that if I group by user and fruit and then summarise, dplyr returns one row per user/fruit pair, sorted by the grouping keys (so the fruits come out alphabetically), and the time order is lost:
library(dplyr)
data %>%
  group_by(user, fruit) %>%
  summarise(temp_var = 1) %>%
  mutate(cum_sum = cumsum(temp_var))
For user 1234 above, for example, I want the fruits listed in time order with consecutive duplicates removed. So where the data has apple > pear > apple > apple > pear > orange > orange, I want to see only apple > pear > apple > pear > orange.
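
To make the target concrete, here is a minimal base R sketch of the kind of result I mean. It uses a small stand-in data frame (`df`, with just user 1234's first seven rows) rather than the full `data`, and the names `prev_fruit`, `keep` and `result` are only illustrative. It is one possible approach, not necessarily the idiomatic dplyr way I am asking about.

```r
# Stand-in for user 1234's first seven rows (values copied from the example above)
df <- data.frame(
  user  = rep(1234L, 7),
  time  = 1:7,
  fruit = c("apple", "pear", "apple", "apple", "pear", "orange", "orange"),
  stringsAsFactors = FALSE
)

# Sort by user and time so that "previous row" means "previous time point"
df <- df[order(df$user, df$time), ]

# Previous fruit within each user; NA marks each user's first row
prev_fruit <- ave(df$fruit, df$user, FUN = function(x) c(NA, head(x, -1)))

# Keep a row if it is the user's first row or its fruit differs from the previous one
keep <- is.na(prev_fruit) | df$fruit != prev_fruit
result <- df[keep, ]

result$fruit
# "apple" "pear" "apple" "pear" "orange"
```

Presumably the same "compare with the previous row" idea could be written in a dplyr pipeline using `group_by(user)` and `lag()`, but I have not gotten that to work while keeping the time order.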