1) Based on the sample data, we assume that the times are of the form hh:mm:00, where hh < 24.
Read in the test data. Create two functions: one that converts a character string of the form hh:mm:00 to a number of minutes, and one that converts a number of minutes to a chron "times" object. Create a minute-by-minute sequence for each row of the data, giving the list Intervals. Take the union of the sequences that belong to the same switch, giving the list Intervals.u, and then intersect the components of that list to give the sequence Intersection. Compute the runs, r, in Intersection to obtain the start and end points. Finally, count the number of minutes and convert it to "times". (The number of minutes and the duration depend only on r and Intersection, so the lines ending in ## can be skipped if intervals.df is not needed.)
# test data
Lines <- "Switches,State,Intime,Outtime
sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00"
DF <- read.csv(text = Lines, as.is = TRUE)

library(chron)

# "hh:mm:00" string -> minutes since midnight; minutes -> chron "times"
to.num <- function(x) floor(as.numeric(times(x)) * 24 * 60 + 1e-6)
to.times <- function(x) times(x / (24 * 60))

# minute-by-minute sequence for row r of DF
Seq <- function(r) seq(to.num(DF$Intime[r]), to.num(DF$Outtime[r]))
Intervals <- lapply(1:nrow(DF), Seq)

# union the sequences of each switch, then intersect across switches
Intervals.u <- lapply(split(Intervals, DF$Switches), function(L) Reduce(union, L))
Intersection <- Reduce(intersect, Intervals.u)

# runs of consecutive minutes in Intersection
r <- rle(c(FALSE, diff(Intersection) == 1))
i.ends <- cumsum(r$lengths)[r$values] ##
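The remaining steps described above (start and end points from the runs, then the minute count and duration) could be sketched as follows. This is a base-R illustration under stated assumptions, not necessarily the original code: `Intersection` is written out by hand as the minutes that the test data yields, and `fmt.min` is a hypothetical stand-in for `to.times` so that the block runs without chron. Lines marked ## are only needed for `intervals.df`.

```r
# Minutes since midnight common to all switches for the test data
# (10:00-10:15 and 10:20-10:30), written out so the sketch runs on its own.
Intersection <- c(600:615, 620:630)

# Hypothetical base-R stand-in for to.times(): minutes -> "hh:mm:00" string
fmt.min <- function(m) sprintf("%02d:%02d:00", m %/% 60L, m %% 60L)

# TRUE runs mark stretches of consecutive minutes; each FALSE marks a run's first minute
r <- rle(c(FALSE, diff(Intersection) == 1))

i.ends <- cumsum(r$lengths)[r$values]    ## position of each run's last minute
ends <- Intersection[i.ends]             ## last minute of each run
starts <- ends - r$lengths[r$values]     ## step back by the run length
intervals.df <- data.frame(start = fmt.min(starts), end = fmt.min(ends)) ##

# every run of n + 1 consecutive minute marks spans n minutes
mins <- length(Intersection) - sum(!r$values)
duration <- fmt.min(mins)
```

For the test data this gives two intervals, 10:00:00 to 10:15:00 and 10:20:00 to 10:30:00, with `mins` equal to 25.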
2) Regarding the comments about speed, we could instead use the IRanges package, which encodes ranges efficiently and also reduces the code size slightly:
library(IRanges)
Intervals <- IRanges(to.num(DF$Intime), to.num(DF$Outtime))
Intersection <- Reduce(intersect, split(Intervals, DF$Switches))
intervals.df <- data.frame(start = to.times(start(Intersection)),
                           end = to.times(end(Intersection)))
intervals.df
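To illustrate why working with ranges avoids enumerating every minute, here is a minimal base-R sketch of the same union-then-intersect logic on (start, end) pairs. It is not how IRanges is implemented, and it treats ranges as closed integer minute ranges; `to.min`, `merge.iv`, and `intersect.iv` are hypothetical helper names introduced here.

```r
# Test data from (1), kept as character times
DF <- data.frame(
  Switches = c("sw3", "sw2", "sw1", "sw2"),
  Intime   = c("9:00:00", "9:30:00", "10:00:00", "10:20:00"),
  Outtime  = c("10:40:00", "10:15:00", "11:00:00", "10:30:00"),
  stringsAsFactors = FALSE
)

# Hypothetical base-R stand-in for to.num(): "hh:mm:00" -> minutes since midnight
to.min <- function(x) {
  vapply(strsplit(x, ":"),
         function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
         integer(1))
}

# Union within one switch: merge overlapping or adjacent closed minute ranges
merge.iv <- function(s, e) {
  o <- order(s); s <- s[o]; e <- e[o]
  # a new group starts where a start lies more than one minute past all earlier ends
  grp <- cumsum(c(TRUE, s[-1] > cummax(e)[-length(e)] + 1L))
  data.frame(start = as.vector(tapply(s, grp, min)),
             end   = as.vector(tapply(e, grp, max)))
}

# Intersection of two sets of disjoint ranges: the overlap of every pair, if any
intersect.iv <- function(a, b) {
  g <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))
  s <- pmax(a$start[g$i], b$start[g$j])
  e <- pmin(a$end[g$i], b$end[g$j])
  keep <- s <= e
  out <- data.frame(start = s[keep], end = e[keep])
  out[order(out$start), ]
}

s <- to.min(DF$Intime)
e <- to.min(DF$Outtime)
per.switch <- lapply(split(seq_len(nrow(DF)), DF$Switches),
                     function(i) merge.iv(s[i], e[i]))
common <- Reduce(intersect.iv, per.switch)
common
```

The pairwise comparison in `intersect.iv` is quadratic in the number of ranges, which is fine for a handful of rows; a dedicated package such as IRanges is the better choice for large inputs.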
Updates: Some fixes and better variable names. Further improvements. Added (2).