Counting the number of events executed at a timestamp

I have a series of timestamps indicating the beginning and end of certain events.

library(chron)
start <- structure(c(14246.3805439815, 14246.3902662037, 14246.3909606481, 
14246.3992939815, 14246.4013773148, 14246.4034606481, 14246.4062384259, 
14246.4069328704, 14246.4069328704, 14246.4097106481, 14246.4097106481, 
14246.4104050926, 14246.4117939815, 14246.4117939815, 14246.4117939815, 
14246.4145717593, 14246.4152546296, 14246.4152662037, 14246.4152662037, 
14246.4159606481), format = structure(c("m/d/y", "h:m:s"), .Names = c("dates", 
"times")), origin = structure(c(1, 1, 1970), .Names = c("month", 
"day", "year")), class = c("chron", "dates", "times"))

finish <- structure(c(14246.436099537, 14246.4666550926, 14246.4083217593, 
14246.4374884259, 14246.4847106481, 14246.4867939815, 14246.4305439815, 
14246.4659606481, 14246.4520717593, 14246.9097106481, 14246.4930439815, 
14246.4763773148, 14246.4326273148, 14246.4291550926, 14246.4187384259, 
14246.9145717593, 14246.4395601852, 14246.4395717593, 14246.4395717593, 
14246.4367939815), format = structure(c("m/d/y", "h:m:s"), .Names = c("dates", 
"times")), origin = structure(c(1, 1, 1970), .Names = c("month", 
"day", "year")), class = c("chron", "dates", "times"))

events <- data.frame(start, finish)
head(events, 5)

                start              finish
1 (01/02/09 09:07:59) (01/02/09 10:27:59)
2 (01/02/09 09:21:59) (01/02/09 11:11:59)
3 (01/02/09 09:22:59) (01/02/09 09:47:59)
4 (01/02/09 09:34:59) (01/02/09 10:29:59)
5 (01/02/09 09:37:59) (01/02/09 11:37:59)

Now I want to calculate how many events are executed at specific timestamps.

intervals <- structure(c(14246.3958333333, 14246.40625, 14246.4166666667, 
14246.4270833333, 14246.4375), format = structure(c("m/d/y", 
"h:m:s"), .Names = c("dates", "times")), origin = structure(c(1, 
1, 1970), .Names = c("month", "day", "year")), class = c("chron", 
"dates", "times"))

intervals

[1] (01/02/09 09:30:00) (01/02/09 09:45:00) (01/02/09 10:00:00) (01/02/09 10:15:00) (01/02/09 10:30:00)

So I want to get the following result:

            intervals count
1 (01/02/09 09:30:00)     3
2 (01/02/09 09:45:00)     7
3 (01/02/09 10:00:00)    19
4 (01/02/09 10:15:00)    18
5 (01/02/09 10:30:00)    12

Although the problem is trivial to solve programmatically, I need to do it for 210,000 intervals and more than 1.2 million events. My current approach uses the data.table package and the & operator to check whether each interval timestamp falls between the start and finish times of each event.

library(data.table)
events <- data.table(events)
data.frame(intervals, count = sapply(1:5, function(i) sum(events[, start <= intervals[i] & intervals[i] <= finish])))

But given the size of my data, this approach takes a lot of time. Any tips on the best alternatives to achieve this in R?

Greetings.


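In R, vectorised code is usually faster than looping with sapply.

Here is a vectorised approach in base R. I timed both versions with replicate and system.time.

It is roughly 6 times faster than sapply with data.table (about 0.7 seconds versus about 4 seconds for 1000 replications).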

system.time(replicate(1000, 
    XX <- data.frame(
      intervals, 
      count = sapply(1:5, function(i) sum(events[, start <= intervals[i] & intervals[i] <= finish])))
))

   user  system elapsed 
   4.04    0.05    4.11 

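The idea is to expand the events and the intervals into matrices of the same shape, with one row per event and one column per interval. Compare them element-wise, then count the matches in each column with colSums: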

event.array <- function(x, interval){
  len <- length(interval)
  matrix(rep(unclass(x), len), ncol=len)
}

intervals.array <- function(x, intervals){
  len <- length(x)
  matrix(rep(unclass(intervals), len), nrow=len, byrow=TRUE)
} 


a.start <- event.array(start, intervals)
a.finish <- event.array(finish, intervals)
a.intervals <- intervals.array(start, intervals)

data.frame(intervals, 
           count=colSums(a.start <= a.intervals & a.finish >= a.intervals))

            intervals count
1 (01/02/09 09:30:00)     3
2 (01/02/09 09:45:00)     7
3 (01/02/09 10:00:00)    19
4 (01/02/09 10:15:00)    18
5 (01/02/09 10:30:00)    12
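
The same element-wise comparison can probably be written more compactly with outer(), which builds the events-by-intervals logical matrices directly. This is only a sketch on made-up toy numbers (ev_start, ev_finish and ev_times are hypothetical names, not the data from the question). For chron objects you would likely need to pass them through as.numeric() first.

```r
# Toy data, made up for illustration: three events and five timestamps
ev_start  <- c(1, 2, 4)
ev_finish <- c(3, 5, 6)
ev_times  <- c(0, 2, 3, 4.5, 7)

# Rows are events, columns are timestamps.
# A cell is TRUE when the event is running at that timestamp.
active <- outer(ev_start, ev_times, "<=") & outer(ev_finish, ev_times, ">=")

# Number of running events per timestamp
counts <- colSums(active)
counts
```

Like the matrix version above, this still allocates length(start) * length(intervals) cells, so it only suits inputs of moderate size.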

system.time(replicate(1000, 
  YY <- data.frame(
          intervals, 
          count=colSums(a.start <= a.intervals & a.finish >= a.intervals))
))

   user  system elapsed 
   0.67    0.02    0.69 

all.equal(XX, YY)
[1] TRUE
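
For the sizes mentioned in the question (1.2 million events and 210,000 intervals), the matrices above would be far too large to hold in memory. One possible alternative, sketched here on made-up toy data, is to count how many events have started by each timestamp and subtract how many had already finished before it, using findInterval() on the sorted vectors. The helper name count_active is hypothetical. The sketch assumes every event has start <= finish and that your R version supports the left.open argument of findInterval().

```r
# Hypothetical helper: count the events running at each timestamp
# without building an events-by-timestamps matrix.
count_active <- function(start, finish, times) {
  s <- sort(as.numeric(start))   # as.numeric() also strips the chron class
  f <- sort(as.numeric(finish))
  t <- as.numeric(times)
  started  <- findInterval(t, s)                    # number of starts   <= t
  finished <- findInterval(t, f, left.open = TRUE)  # number of finishes <  t
  started - finished
}

# Toy data, made up for illustration
ev_start  <- c(1, 2, 4)
ev_finish <- c(3, 5, 6)
ev_times  <- c(0, 2, 3, 4.5, 7)

fast  <- count_active(ev_start, ev_finish, ev_times)

# Brute-force check using the same condition as in the question
brute <- sapply(ev_times, function(x) sum(ev_start <= x & x <= ev_finish))
all(fast == brute)
```

With the data from the question, the call would be count_active(start, finish, intervals). Sorting dominates the cost, so this should scale roughly as n log n rather than with the product of the two lengths.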

How about using dim() instead of sum(), and ldply() instead of sapply()?

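library(plyr)  # provides ldply()
# Note: this uses strict < and >, unlike the <= and >= in the question
b <- function(i, df) data.frame(i, count = dim(df[with(df, start < i & finish > i), ])[1])
ldply(intervals, b, events)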

         i count
1 14246.40     3
2 14246.41     7
3 14246.42    19
4 14246.43    18
5 14246.44    12

Note that the chron class is dropped here, so i is printed as a number rather than as a timestamp.
