Finding gaps between intervals using data.table

I have the following problem: given a set of non-overlapping intervals in a data.table, report the gaps between consecutive intervals within each ID.

I implemented this once in SQL, but I am struggling with data.table because it lacks a lead or lag function. For completeness, my SQL code is here. I know from the changelog that this functionality has been implemented in data.table version 1.9.5. Is this possible with data.table without doing many merges and without a lag or lead function?

Basically, I'm not completely against using merges (aka joins), as long as performance does not suffer. I think there is an easy implementation, but I can't figure out how to "get" the previous end time to use as the start time of each row in my gap table.

For example:

    library(data.table)

    # Example intervals (times parsed in the local time zone)
    dat <- data.table(
      ID    = c(1L, 1L, 1L, 2L, 2L, 2L),
      stime = as.POSIXct(c("2014-01-15 08:00:00", "2014-01-15 11:00:00",
                           "2014-01-16 11:30:00", "2014-01-15 09:30:00",
                           "2014-01-15 12:30:00", "2014-01-15 13:30:00")),
      etime = as.POSIXct(c("2014-01-15 10:30:00", "2014-01-15 12:00:00",
                           "2014-01-16 13:00:00", "2014-01-15 11:00:00",
                           "2014-01-15 12:45:00", "2014-01-15 14:30:00"))
    )
    # Sort by ID, then start time
    setkey(dat, ID, stime)

The desired output is:

    ID               stime               etime
     1 2014-01-15 10:30:00 2014-01-15 11:00:00
     1 2014-01-15 12:00:00 2014-01-16 11:30:00
     2 2014-01-15 11:00:00 2014-01-15 12:30:00
     2 2014-01-15 12:45:00 2014-01-15 13:30:00

Please note: gaps are reported even when they span several days.

2 answers

A variant of David's answer, probably a little less efficient but simpler to type:

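    # Sort by start time; within each ID, pair every end time except the last
    # with the next start time
    setkey(dat, stime)[, .(stime = etime[-.N], etime = stime[-1]), by = ID]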

It produces:

       ID               stime               etime
    1:  1 2014-01-15 10:30:00 2014-01-15 11:00:00
    2:  1 2014-01-15 12:00:00 2014-01-16 11:30:00
    3:  2 2014-01-15 11:00:00 2014-01-15 12:30:00
    4:  2 2014-01-15 12:45:00 2014-01-15 13:30:00

The setkey call is only there to make sure the table is sorted by start time.
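If you would rather not reorder dat by reference, the same idea should also work with order() in i instead of setkey. This is a sketch that rebuilds the example data so it runs on its own; it assumes the timestamps are parsed in the local time zone, and the name gaps is only illustrative.

```r
library(data.table)

# Example data from the question (times parsed in the local time zone)
dat <- data.table(
  ID    = c(1L, 1L, 1L, 2L, 2L, 2L),
  stime = as.POSIXct(c("2014-01-15 08:00:00", "2014-01-15 11:00:00",
                       "2014-01-16 11:30:00", "2014-01-15 09:30:00",
                       "2014-01-15 12:30:00", "2014-01-15 13:30:00")),
  etime = as.POSIXct(c("2014-01-15 10:30:00", "2014-01-15 12:00:00",
                       "2014-01-16 13:00:00", "2014-01-15 11:00:00",
                       "2014-01-15 12:45:00", "2014-01-15 14:30:00"))
)

# order() in i sorts the rows for this query only; dat itself keeps its order.
# Within each ID, a gap runs from each end time (except the last)
# to the next start time.
gaps <- dat[order(ID, stime), .(stime = etime[-.N], etime = stime[-1]), by = ID]
print(gaps)
```

An ID with a single interval simply yields no rows, because both etime[-.N] and stime[-1] are empty for it.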


Unless I'm missing something, your desired output is missing a row. In any case, here is an approach using shift from the devel version, as you mentioned.

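    library(data.table) ## v >= 1.9.5

    # shift() below is not grouped, so rows must be ordered by ID, then stime
    setkey(dat, ID, stime)

    # Row numbers of every interval except the last one within each ID
    indx <- dat[, .I[-.N], by = ID]$V1

    # A gap starts at an interval's end and ends at the next interval's start
    res <- dat[, .(ID, stime = etime, etime = shift(stime, type = "lead"))][indx]
    res
    #    ID               stime               etime
    # 1:  1 2014-01-15 10:30:00 2014-01-15 11:00:00
    # 2:  1 2014-01-15 12:00:00 2014-01-16 11:30:00
    # 3:  2 2014-01-15 11:00:00 2014-01-15 12:30:00
    # 4:  2 2014-01-15 12:45:00 2014-01-15 13:30:00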
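A possible simplification, offered as a sketch rather than as the answerer's code: calling shift with by = ID keeps the lead inside each ID, so the indx vector is not needed, and the NA produced for the last interval of each ID can simply be filtered out. It assumes a data.table version that has shift and rebuilds the example data so it runs on its own.

```r
library(data.table)  # shift() needs data.table v >= 1.9.5

# Example data from the question (times parsed in the local time zone)
dat <- data.table(
  ID    = c(1L, 1L, 1L, 2L, 2L, 2L),
  stime = as.POSIXct(c("2014-01-15 08:00:00", "2014-01-15 11:00:00",
                       "2014-01-16 11:30:00", "2014-01-15 09:30:00",
                       "2014-01-15 12:30:00", "2014-01-15 13:30:00")),
  etime = as.POSIXct(c("2014-01-15 10:30:00", "2014-01-15 12:00:00",
                       "2014-01-16 13:00:00", "2014-01-15 11:00:00",
                       "2014-01-15 12:45:00", "2014-01-15 14:30:00"))
)
setkey(dat, ID, stime)

# Grouping by ID keeps shift() from reaching into the next ID's rows.
# The last interval of each ID has no following start, so its lead is NA
# and that row is dropped in the second step of the chain.
res <- dat[, .(stime = etime, etime = shift(stime, type = "lead")), by = ID][!is.na(etime)]
print(res)
```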
