I would like to mutate a data frame by applying a function that refers to another data frame. I can achieve this in several ways, but would like to know how to do it "correctly."
Here is an example of what I'm trying to do. I have one data frame with some start times and a second with some time-stamped observations. I would like to return a data frame containing each start time and the number of observations that fall within some window after that start, e.g.:
    set.seed(1337)
    df1 <- data.frame(id=LETTERS[1:3], start_time=1:3*10)
    df2 <- data.frame(time=runif(100)*100)
    lapply(df1$start_time, function(s) sum(df2$time>s & df2$time<(s+15)))
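For reference, one of the "several ways" mentioned above can keep the id column by writing the counts back onto `df1` in base R. This is only a sketch: the helper `count_in_window` and its `width` argument are names made up for illustration, and the 15-unit window is taken from the example above.

```r
set.seed(1337)
df1 <- data.frame(id = LETTERS[1:3], start_time = 1:3 * 10)
df2 <- data.frame(time = runif(100) * 100)

# Hypothetical helper: count observations strictly inside (s, s + width)
# for a single start time s.
count_in_window <- function(s, times, width = 15) {
  sum(times > s & times < s + width)
}

# vapply returns a plain integer vector (one count per start time),
# so it can be stored directly as a new column next to id and start_time.
df1$n <- vapply(df1$start_time, count_in_window, integer(1), times = df2$time)
df1
```

Unlike `lapply`, which returns a list, `vapply` checks that every result is a single integer, which makes it safe to assign as a column.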
The best I have managed so far with dplyr is the following, but it loses the id column:

    library(dplyr)

    df1 %>%
      rowwise() %>%
      do(count = filter(df2, time > .$start_time, time < (.$start_time + 15))) %>%
      mutate(n = nrow(count))
Output:

    Source: local data frame [3 x 2]
    Groups: <by row>

    # A tibble: 3 × 2
                      count     n
                     <list> <int>
    1 <data.frame [17 × 1]>    17
    2 <data.frame [18 × 1]>    18
    3 <data.frame [10 × 1]>    10
I expected to be able to do this:
    df1 <- data.frame(id=LETTERS[1:3], start_time=1:3*10)
    df2 <- data.frame(time=runif(100)*100)
    df1 %>%
      group_by(id) %>%
      mutate(count = nrow(filter(df2, time>start_time, time<(start_time+15))))
but this returns an error:
Error: comparison (6) is possible only for atomic and list types
What is the idiomatic dplyr way to do this?
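One possible approach, offered as a sketch rather than the definitive dplyr idiom: keep `rowwise()` but compute the count directly inside `mutate()`, referring to the second frame explicitly as `df2$time`. This avoids the nested `filter()` call and keeps `id` and `start_time` in the result. It assumes `df2` is available in the calling environment and that the window is the 15 units used above.

```r
library(dplyr)

set.seed(1337)
df1 <- data.frame(id = LETTERS[1:3], start_time = 1:3 * 10)
df2 <- data.frame(time = runif(100) * 100)

result <- df1 %>%
  rowwise() %>%                      # evaluate mutate() one row at a time
  mutate(n = sum(df2$time > start_time &
                 df2$time < start_time + 15)) %>%
  ungroup()                          # drop the row-wise grouping afterwards

result
```

Because `start_time` is a single value within each row, the comparison against the whole `df2$time` vector yields a logical vector whose sum is the number of observations in that row's window.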