Datetime - determine if multiple (n) datetime ranges overlap with each other in R

Hello friends. I am having trouble determining whether several date-time ranges overlap one another and, if so, for what period of time they overlap. I have looked at the following links, "Determines if the two date ranges overlap" and "Overlap Period Detection Algorithm", and a few more.

I am not sure whether my approach is correct. Here is an example for n = 3.

Let's say I have n switches: sw1, sw2 and sw3. State is the ON/OFF state, i.e. 1/0.

    Switches,State,Intime,Outtime
    sw3,1,9:00:00,10:40:00
    sw2,1,9:30:00,10:15:00
    sw1,1,10:00:00,11:00:00
    sw2,1,10:20:00,10:30:00

This is one possible case. There may be more, and I am looking for those as well. Here all three switches are on from 10:00 to 10:15, i.e. 15 minutes, and from 10:20 to 10:30, i.e. 10 minutes. The combined time period during which all of these switches were turned on ('1') is 25 minutes.

                                10:00                               11:00
        sw1                     |-----------------------------------|
                      9:30             10:15 10:20   10:30
        sw2           |----------------|     |-------|
            9:00                                          10:40
        sw3 |---------------------------------------------|

Generalizing this to n overlapping switches is the difficult part. I am still working on it, so any suggestions or modifications are welcome. Thanks.

2 answers

1) Based on the sample data, we assume that the times are of the form hh:mm:00, where hh < 24.

Read in the test data. Create two functions: one that converts a character string of the form hh:mm:00 to a number of minutes, and one that converts a number of minutes to a chron "times" object. Create a minute-by-minute sequence for each row of the data, giving the list Intervals. Union the sequences that correspond to the same switch, giving the list Intervals.u, and then intersect the components of that list to give the sequence Intersection. Compute the runs, r, in Intersection to give a set of start and end points. Finally, count the number of minutes and convert the count to a "times" object. (The number of minutes and the duration depend only on r and Intersection, so we can skip the lines ending in ## if intervals.df is not needed.)

    # test data
    Lines <- "Switches,State,Intime,Outtime
    sw3,1,9:00:00,10:40:00
    sw2,1,9:30:00,10:15:00
    sw1,1,10:00:00,11:00:00
    sw2,1,10:20:00,10:30:00"
    DF <- read.csv(text = Lines, as.is = TRUE)

    library(chron)

    to.num <- function(x) floor(as.numeric(times(x)) * 24 * 60 + 1e-6)
    to.times <- function(x) times(x / (24 * 60))

    Seq <- function(r) seq(to.num(DF$Intime[r]), to.num(DF$Outtime[r]))
    Intervals <- lapply(1:nrow(DF), Seq)
    Intervals.u <- lapply(split(Intervals, DF$Switches),
                          function(L) Reduce(union, L))
    Intersection <- Reduce(intersect, Intervals.u)

    r <- rle(c(FALSE, diff(Intersection) == 1))
    i.ends <- cumsum(r$lengths)[r$values] ##
    ends <- to.times(Intersection[i.ends]) ##
    starts <- ends - to.times(r$lengths[r$values]) ##
    intervals.df <- data.frame(start = starts, end = ends); intervals.df ##
    ##      start      end
    ## 1 10:00:00 10:15:00
    ## 2 10:20:00 10:30:00

    mins <- length(Intersection) - sum(r$values); mins
    ## [1] 25
    duration <- to.times(mins); duration
    ## [1] 00:25:00
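To see how the rle step recovers the start and end points, here is a small sketch on a made-up vector. The vector v and the names steps, runs and covered are hypothetical and only for illustration. It uses base R only and mirrors the r and i.ends lines from the code above.

```r
# Hypothetical minute values: two runs of consecutive integers, 3..6 and 10..12.
v <- c(3, 4, 5, 6, 10, 11, 12)

# TRUE marks an element that continues a run.
# The first element of each run is FALSE.
r <- rle(c(FALSE, diff(v) == 1))

# Position of the last element of each run, and the number of steps in each run.
i.ends <- cumsum(r$lengths)[r$values]
steps <- r$lengths[r$values]

ends <- v[i.ends]
starts <- ends - steps
runs <- data.frame(start = starts, end = ends)
runs

# Minutes covered: number of elements minus number of runs.
covered <- length(v) - sum(r$values)
covered
```

Each run of k + 1 consecutive minute values spans k minutes, which is why the number of runs is subtracted from the number of elements.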

2) Regarding the comments about speed, we could instead use the IRanges package, which encodes ranges efficiently and also slightly reduces the code size:

    library(IRanges)
    Intervals <- IRanges(to.num(DF$Intime), to.num(DF$Outtime))
    Intersection <- Reduce(intersect, split(Intervals, DF$Switches))
    intervals.df <- data.frame(start = to.times(start(Intersection)),
                               end = to.times(end(Intersection)))
    intervals.df
    ##      start      end
    ## 1 10:00:00 10:15:00
    ## 2 10:20:00 10:30:00

    mins <- sum(width(Intersection) - 1); mins
    ## [1] 25
    duration <- to.times(mins); duration
    ## [1] 00:25:00

Updates: Some fixes and better variable names. Further improvements. Added (2).


One way to do this:

  • Calculate the unique minutes / seconds between Intime and Outtime for each switch. For example, if a switch turns on at 9:00 and goes off at 9:02, the unique minutes during which it was on are 9:00 and 9:01.
  • Count how many times each unique minute / second appears across all switches.
  • If any minute / second occurs as many times as there are switches (i.e. three in your case), then all switches must have been on during that minute / second.

Using this logic, here is a potential solution (where your data is stored in the data frame x):

    # Function to convert string to time.
    asTime <- function (tm) as.POSIXlt(tm, format = '%H:%M:%S')

    # Calculate unique minutes between Intimes and Outtimes.
    minSpan <- function (start, end) seq(asTime(start), asTime(end) - 1, 'min')

    # Calculate the time span in minutes for each row.
    spans <- mapply(minSpan, x$Intime, x$Outtime)

    # Count how many times each minute appears.
    counts <- table(do.call(c, spans))

    # Total number of switches.
    switches <- length(unique(x$Switches))

    # Count minutes where all switches have been on.
    length(counts[counts == switches])

This gives a result accurate to one minute, which matches the precision shown in your question. You can easily change it to seconds by changing 'min' to 'sec' in the minSpan() function.


In minSpan() I subtract 1 from Outtime (one second, which is enough to drop the final minute from the sequence):

    minSpan <- function (start, end) seq(asTime(start), asTime(end) - 1, 'min')

This is because if you have to count the minutes between, for example, 10:00 and 10:02, seq() will return three minutes, 10:00, 10:01, 10:02. But actually, the switch turned off at 10:02, so you really need an interval from 10:00 to 10:01.
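A quick way to check this end-point behaviour is the sketch below. The helper asTimeUTC is a hypothetical variant of asTime that pins the date and time zone, so that the result does not depend on the day or locale in which it is run.

```r
# Hypothetical helper: parse "h:mm:ss" on a fixed, arbitrary date in UTC.
asTimeUTC <- function(tm) {
  as.POSIXct(paste("2000-01-01", tm), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

on  <- asTimeUTC("10:00:00")
off <- asTimeUTC("10:02:00")

# Including the end point yields three minutes: 10:00, 10:01, 10:02.
with.end <- seq(on, off, by = "min")
format(with.end, "%H:%M")

# Subtracting 1 (second) from the end drops the final minute: 10:00, 10:01.
without.end <- seq(on, off - 1, by = "min")
format(without.end, "%H:%M")
```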


Anyway, this solution seems to work for the small example you provided. Depending on how big your data is, I expect it to be rather slow, but that might not be a problem.
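If speed does become a problem, one alternative is to work on the interval end points only instead of expanding every minute. The following is a minimal base R sketch of that idea (a sweep over sorted start and end events), not the method used above. The helper names toMinutes, mergeSpans and allOn are hypothetical, and it assumes times of the form h:mm:ss within a single day and no zero-length intervals.

```r
# Sample data from the question, stored in data frame x.
x <- read.csv(text = "Switches,State,Intime,Outtime
sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00", as.is = TRUE)

# Hypothetical helper: "h:mm:ss" string -> minutes since midnight.
toMinutes <- function(tm) {
  sapply(strsplit(as.character(tm), ":"),
         function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)))
}

# Hypothetical helper: merge the intervals of ONE switch into disjoint intervals.
mergeSpans <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  # A new group begins whenever a start lies beyond every end seen so far.
  prev.max.end <- c(-Inf, head(cummax(end), -1))
  grp <- cumsum(start > prev.max.end)
  data.frame(start = as.numeric(tapply(start, grp, min)),
             end = as.numeric(tapply(end, grp, max)))
}

# Hypothetical helper: intervals (in minutes) during which every switch is on.
allOn <- function(x) {
  d <- data.frame(s = toMinutes(x$Intime), e = toMinutes(x$Outtime))
  per.switch <- lapply(split(d, x$Switches), function(g) mergeSpans(g$s, g$e))
  m <- do.call(rbind, per.switch)
  n <- length(per.switch)
  # One event per end point: +1 when a switch turns on, -1 when it turns off.
  ev <- data.frame(t = c(m$start, m$end), d = rep(c(1, -1), each = nrow(m)))
  # At equal times, process "off" events before "on" events.
  ev <- ev[order(ev$t, ev$d), ]
  on <- cumsum(ev$d)
  # After event i all n switches are on; this lasts until the next event.
  i <- which(on == n)
  data.frame(start = ev$t[i], end = ev$t[i + 1])
}

res <- allOn(x)
res
sum(res$end - res$start)  # 25 minutes for the sample data
```

The work here grows with the number of rows rather than with the number of minutes covered, which is the reason for choosing it when the data is large.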
