Select rows in a specific time range

Question


I have a data frame, for example:

    TimeStamp               Category
    2013-11-02 07:57:18 AM  0
    2013-11-02 08:07:19 AM  0
    2013-11-02 08:07:21 AM  0
    2013-11-02 08:07:25 AM  1
    2013-11-02 08:07:29 AM  0
    2013-11-02 08:08:18 AM  0
    2013-11-02 08:09:20 AM  0
    2013-11-02 09:04:18 AM  0
    2013-11-02 09:05:22 AM  0
    2013-11-02 09:07:18 AM  0

What I want to do is select a time frame of ±10 minutes around each row where Category is "1".

In this case, since Category = 1 occurs at 2013-11-02 08:07:25 AM, I want to select all rows from 07:57:25 AM to 08:17:25 AM.

What is the best way to handle this?

There may be several "1"s, each with its own time frame. (The real data frame is more complicated: it contains several TimeStamps for different users, i.e. there is another column called "UserID".)
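For readers who want to reproduce the answers below, here is a minimal sketch that rebuilds the example data as a data frame named df. The column types are an assumption (TimeStamp stored as character, Category as integer); the question does not state them.

```r
# Example data from the question. Column types are an assumption:
# TimeStamp as character, Category as integer.
df <- data.frame(
  TimeStamp = c(
    "2013-11-02 07:57:18 AM", "2013-11-02 08:07:19 AM",
    "2013-11-02 08:07:21 AM", "2013-11-02 08:07:25 AM",
    "2013-11-02 08:07:29 AM", "2013-11-02 08:08:18 AM",
    "2013-11-02 08:09:20 AM", "2013-11-02 09:04:18 AM",
    "2013-11-02 09:05:22 AM", "2013-11-02 09:07:18 AM"
  ),
  Category = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Inspect the structure before converting TimeStamp to POSIXct
str(df)
```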

+7

r dataframe

zxwjames Jun 24 '15 at 10:13


6 answers

Here is how I would do this using data.table::foverlaps

First, convert TimeStamp to a proper POSIXct

    library(data.table)
    setDT(df)[, TimeStamp := as.POSIXct(TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")]

Then we create a temporary data set containing only the rows where Category == 1, which we will join against. We also create an "end" column and key the table on the "start" and "end" columns.

 df2 <- setkey(df[Category == 1L][, TimeStamp2 := TimeStamp], TimeStamp, TimeStamp2)

Then we do the same for df, but set the intervals to ±10 minutes (600 seconds)

 setkey(df[, `:=`(start = TimeStamp - 600, end = TimeStamp + 600)], start, end)

Then all that remains is to run foverlaps and subset df by the matched indices

    indx <- foverlaps(df, df2, which = TRUE, nomatch = 0L)$xid
    df[indx, .(TimeStamp, Category)]
    #              TimeStamp Category
    # 1: 2013-11-02 08:07:19        0
    # 2: 2013-11-02 08:07:21        0
    # 3: 2013-11-02 08:07:25        1
    # 4: 2013-11-02 08:07:29        0
    # 5: 2013-11-02 08:08:18        0
    # 6: 2013-11-02 08:09:20        0

+7

David Arenburg Jun 24 '15 at 10:48


This works:

Data:

Following @DavidArenburg's comment (and as mentioned in his answer), the correct way to convert the TimeStamp column to a POSIXct object, if it isn't one already, is:

 df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")

Solution:

    library(lubridate) # for minutes()
    library(dplyr)     # for between()

    pickrows <- function(df) {
      # pick the Category == 1 rows
      df2 <- df[df$Category == 1, ]
      # for each timestamp, compute start and end (-10 and +10 minutes),
      # then pick the rows between them
      lapply(df2$TimeStamp, function(time) {
        start <- time - minutes(10)
        end <- time + minutes(10)
        df[between(df$TimeStamp, start, end), ]
      })
    }

    # run the function
    pickrows(df)

Output:

    > pickrows(df)
    [[1]]
                TimeStamp Category
    2 2013-11-02 08:07:19        0
    3 2013-11-02 08:07:21        0
    4 2013-11-02 08:07:25        1
    5 2013-11-02 08:07:29        0
    6 2013-11-02 08:08:18        0
    7 2013-11-02 08:09:20        0

Keep in mind that the function returns a list with one element per Category == 1 row (here it has only one element), so to combine the results into a single data.frame you need do.call(rbind, pickrows(df)).

+4

LyzandeR Jun 24 '15 at 22:45


Using lubridate:

    library(lubridate)
    df$TimeStamp <- ymd_hms(df$TimeStamp)
    span10 <- (df$TimeStamp[df$Category == 1] - minutes(10)) %--%
      (df$TimeStamp[df$Category == 1] + minutes(10))
    df[df$TimeStamp %within% span10, ]
    #             TimeStamp Category
    # 2 2013-11-02 08:07:19        0
    # 3 2013-11-02 08:07:21        0
    # 4 2013-11-02 08:07:25        1
    # 5 2013-11-02 08:07:29        0
    # 6 2013-11-02 08:08:18        0
    # 7 2013-11-02 08:09:20        0

+4

Pierre lafortune Jun 24 '15 at 23:04


I personally like the simplicity of the base R answer from @thelatemail. But just for fun, here is another answer using rolling joins in data.table, as opposed to the overlapping range join solution provided by @DavidArenburg.

    require(data.table)
    dt_1 = dt[Category == 1L]
    setkey(dt, TimeStamp)
    ix1 = dt[.(dt_1$TimeStamp - 600L), roll = -Inf, which = TRUE] # NOCB
    ix2 = dt[.(dt_1$TimeStamp + 600L), roll = Inf, which = TRUE]  # LOCF
    indices = data.table:::vecseq(ix1, ix2 - ix1 + 1L, NULL) # not exported function
    dt[indices]
    #              TimeStamp Category
    # 1: 2013-11-02 08:07:19        0
    # 2: 2013-11-02 08:07:21        0
    # 3: 2013-11-02 08:07:25        1
    # 4: 2013-11-02 08:07:29        0
    # 5: 2013-11-02 08:08:18        0
    # 6: 2013-11-02 08:09:20        0

This should work fine, even if you have more than one cell where Category is 1, AFAICT. It would be great to wrap this as a function for this type of operation for data.table ...

PS: refer to other posts to convert TimeStamp to POSIXct format.
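The idea of wrapping this in a function could be sketched as follows. This is not the rolling-join code above: it swaps in a data.table non-equi join (available in later data.table versions than the one this answer was written for), and the helper name rows_near_events and its secs argument are hypothetical. It assumes TimeStamp is already POSIXct and Category is integer.

```r
library(data.table)

# Hypothetical helper: return the rows of `dt` whose TimeStamp lies within
# `secs` seconds of any row where Category == 1.
rows_near_events <- function(dt, secs = 600) {
  # One [start, end] window per Category == 1 row
  windows <- dt[Category == 1L,
                .(start = TimeStamp - secs, end = TimeStamp + secs)]
  # Non-equi join: row numbers of dt that fall inside any window
  idx <- dt[windows, on = .(TimeStamp >= start, TimeStamp <= end),
            which = TRUE, nomatch = 0L]
  # Overlapping windows can match the same row twice, so de-duplicate
  dt[sort(unique(idx))]
}

# Small illustrative data set (a subset of the question's timestamps)
dt <- data.table(
  TimeStamp = as.POSIXct(
    c("2013-11-02 07:57:18", "2013-11-02 08:07:19", "2013-11-02 08:07:25",
      "2013-11-02 08:09:20", "2013-11-02 09:04:18"),
    tz = "UTC"
  ),
  Category = c(0L, 0L, 1L, 0L, 0L)
)

out <- rows_near_events(dt)
print(out)
```

Unlike the rolling-join version, this sketch does not require dt to be keyed and does not rely on an unexported function.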

+3

Arun Jun 24 '15 at 23:18


Here is my solution with dplyr and lubridate . Here are the steps:

Find the row where Category == 1, add and subtract 10 minutes using lubridate's minutes() via a simple c(-1, 1) * minutes(10), then use filter to subset on the two bounds stored in the rang vector.

    library(lubridate)
    library(dplyr)
    wi1 <- which(dat$Category == 1)
    rang <- dat$TimeStamp[wi1] + c(-1, 1) * minutes(10)
    dat %>% filter(TimeStamp >= rang[1] & TimeStamp <= rang[2])
    #             TimeStamp Category
    # 1 2013-11-02 08:07:19        0
    # 2 2013-11-02 08:07:21        0
    # 3 2013-11-02 08:07:25        1
    # 4 2013-11-02 08:07:29        0
    # 5 2013-11-02 08:08:18        0
    # 6 2013-11-02 08:09:20        0

+1

Sabdem Jun 24 '15 at 23:02


thelatemail · Accepted Answer · 2015-06-24T23:07:03+0000

In base R, without lubridate or anything else (assuming you convert TimeStamp to a POSIXct object first), for example:

    df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")
    df[with(df, abs(difftime(TimeStamp[Category == 1], TimeStamp, units = "mins")) <= 10), ]
    #            TimeStamp Category
    #2 2013-11-02 08:07:19        0
    #3 2013-11-02 08:07:21        0
    #4 2013-11-02 08:07:25        1
    #5 2013-11-02 08:07:29        0
    #6 2013-11-02 08:08:18        0
    #7 2013-11-02 08:09:20        0

If you have multiple 1s, you have to loop over them, like this:

    check <- with(df, lapply(TimeStamp[Category == 1], function(x)
      abs(difftime(x, TimeStamp, units = "mins")) <= 10))
    df[do.call(pmax, check) == 1, ]
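The question also mentions a UserID column. A minimal base R sketch of one way to extend the idea per user is shown below. It assumes the windows should only apply within the same user, which the question does not state explicitly. The data and the helper name near_event are hypothetical.

```r
# Hypothetical data: two users, each with one Category == 1 event.
df <- data.frame(
  UserID = c("a", "a", "a", "b", "b", "b"),
  TimeStamp = as.POSIXct(
    c("2013-11-02 08:00:00", "2013-11-02 08:07:25", "2013-11-02 08:30:00",
      "2013-11-02 08:05:00", "2013-11-02 09:00:00", "2013-11-02 09:04:00"),
    tz = "UTC"
  ),
  Category = c(0L, 1L, 0L, 0L, 1L, 0L)
)

# For one user's rows, flag those within `mins` minutes of any
# Category == 1 row of that same user.
near_event <- function(d, mins = 10) {
  events <- d$TimeStamp[d$Category == 1]
  if (length(events) == 0L) return(rep(FALSE, nrow(d)))
  # rows-by-events matrix of absolute time differences in seconds
  gaps <- abs(outer(as.numeric(d$TimeStamp), as.numeric(events), "-"))
  rowSums(gaps <= mins * 60) > 0
}

# Apply per user, then put the flags back in the original row order.
keep <- unsplit(lapply(split(df, df$UserID), near_event), df$UserID)
result <- df[keep, ]
print(result)
```

Here the 08:05:00 row of user "b" is dropped even though it is close to user "a"'s event, because windows are evaluated per user.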
