How to delete rows from a data table based on a condition in another data table

I have 2 data frames:

    library(data.table)

    master = data.table(MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1))
    mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:54",
                                               "2015-01-01 00:00:48", "2015-01-01 00:01:10",
                                               "2015-01-01 00:01:05"), tz = "GMT"))

I would like to keep every row of master whose time is within +/- 5 seconds of any time in mydata, and delete the rows of master that do not meet this condition.

Here is a simpler example if mydata has only 1 row:

    master = data.table(MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1))
    mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"), tz = "GMT"))

You can see that mydata contains only "2015-01-01 00:00:18". In this case, I want to delete all rows of master whose time is outside the +/- 5 second window, i.e. all rows before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".

That is the simple case. It gets harder if mydata contains two times that are close together:

  mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:22"),tz = "GMT")) 

Since "2015-01-01 00:00:18" is present again, I would normally delete all rows in master before "2015-01-01 00:00:13" and after "2015-01-01 00:00:23".

But here I cannot do that, because mydata also contains "2015-01-01 00:00:22", whose window extends to "2015-01-01 00:00:27".

So I now also need to keep the rows in master from "2015-01-01 00:00:23" to "2015-01-01 00:00:27".

Basically, I want to keep every row of master that is within +/- 5 seconds of any row in mydata. Rows of master that are not within 5 seconds of any time in mydata should be deleted.
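To make the requirement concrete, here is a minimal base R sketch of the two-time example. The variable names (`master_times`, `my_times`, `gap`, `keep`) are made up for illustration, and the window is assumed to include times exactly 5 seconds away, as in the 00:00:13 to 00:00:23 example.

```r
master_times <- as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1)
my_times <- as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:22"), tz = "GMT")

# Absolute gap in seconds between every master time (rows)
# and every mydata time (columns)
gap <- abs(outer(as.numeric(master_times), as.numeric(my_times), "-"))

# Keep a master row if it is within 5 seconds of at least one mydata time
keep <- rowSums(gap <= 5) > 0

# The kept times run from 00:00:13 to 00:00:27 (15 rows)
range(master_times[keep])
```

The two overlapping windows merge into one kept range, which is the behaviour described above.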

Update

Can you advise how to implement this if master and mydata have more than one column, for example:

    master = data.table(MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1),
                        othercol = seq(1, 100, 1))
    mydata = data.frame(MyTimes = as.POSIXct(c("2015-01-01 00:00:18"), tz = "GMT"),
                        othercol = c(1))

In fact, both master and mydata have more than 50 columns.
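One way this might be handled with many columns is a data.table non-equi join that matches on the time column only and returns row numbers, so every other column of master is carried along untouched. This is a sketch, not a tested solution for the real data: the `windows` table and its `lo`/`hi` columns are hypothetical names, the bounds are assumed to be inclusive, and non-equi joins need a reasonably recent data.table.

```r
library(data.table)

master <- data.table(
  MasterTimes = as.POSIXct("2015-01-01", tz = "GMT") + seq(1, 100, 1),
  othercol    = seq(1, 100, 1)
)
mydata <- data.table(
  MyTimes  = as.POSIXct(c("2015-01-01 00:00:18", "2015-01-01 00:00:22"), tz = "GMT"),
  othercol = c(1, 2)
)

# Hypothetical helper table: one [lo, hi] window per row of mydata
windows <- mydata[, .(lo = MyTimes - 5, hi = MyTimes + 5)]

# Non-equi join: row numbers of master whose time falls inside a window
idx <- master[windows,
              on = .(MasterTimes >= lo, MasterTimes <= hi),
              which = TRUE, nomatch = 0L]

# Overlapping windows can match the same row twice, so de-duplicate
kept <- master[sort(unique(idx))]
```

Because only row numbers come out of the join, the extra columns of mydata never get mixed into the result.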

r data.table
4 answers

Base R Solution:

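    # TRUE if `row` is strictly less than 5 seconds from at least one time in mydata
    check_valid_time <- function(row, mydata) {
      any(row > mydata$MyTimes - 5 & row < mydata$MyTimes + 5)
    }

    # Keep only the rows of master that pass the check
    master[sapply(master$MasterTimes, check_valid_time, mydata), ]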

One way would be the following. First, create foo, which contains every second within +/- 5 seconds of mydata$MyTimes for each row. Then subset master: first remove the rows whose time equals one of mydata$MyTimes, and then keep the rows whose MasterTimes appear in foo$whatever. Just in case, I sorted the data by MasterTimes at the end.

    # One row per second in each +/- 5 second window around mydata$MyTimes
    foo <- setDT(mydata)[, list(whatever = seq(MyTimes - 5, MyTimes + 5, by = 1)),
                         by = rownames(mydata)]

    # Drop the exact MyTimes, keep times inside any window, then sort
    x <- master[!MasterTimes %in% mydata$MyTimes][MasterTimes %in% foo$whatever]
    setorder(x, MasterTimes)

Based on nicola's comments:

    master[unlist(lapply(master$MasterTimes, function(x)
      any(abs(difftime(x, mydata$MyTimes, units = "secs")) < 5))), ]

Or like this:

    # unlist() flattens the tables, so this assumes each holds only the time column
    master[which(sapply(unlist(master), function(x)
      min(sapply(unlist(mydata), function(y) abs(x - y)))) < 5), ]