How to combine time frame data with NA output for non-overlapping parts?

Question

How to combine time frame data with NA output for non-overlapping parts?

I have two datasets (df1 and df2), and both are time-based. I want to make it "objective." When merging two data with c ("id1", "id2") I want to leave "NA" in non-overlapping time.

df1

id1 id2 click_timing 1 11 2015-02-03 01:00:00 1 11 2015-02-03 02:00:00 1 12 2015-02-03 03:00:00 1 12 2015-02-03 04:00:00 1 13 2015-02-03 05:10:00 2 34 2015-02-03 03:00:00 2 34 2015-02-03 04:00:00 2 36 2015-02-03 01:00:00 ...

df2

 id1 id2 start end 1 11 2015-02-03 00:20:00 2015-02-03 00:40:00 1 11 2015-02-03 00:50:00 2015-02-03 01:20:00 1 13 2015-02-03 01:10:00 2015-02-03 01:40:00 1 13 2015-02-03 04:50:00 2015-02-03 05:30:00 2 34 2015-02-03 03:50:00 2015-02-03 04:10:00 ...

objective exit

 id1 id2 click_timing start end 1 11 NA 2015-02-03 00:20:00 2015-02-03 00:40:00 1 11 2015-02-03 01:00:00 2015-02-03 00:50:00 2015-02-03 01:20:00 1 11 2015-02-03 02:00:00 NA NA 1 12 2015-02-03 03:00:00 NA NA 1 12 2015-02-03 04:00:00 NA NA 1 13 NA 2015-02-03 01:10:00 2015-02-03 01:40:00 1 13 2015-02-03 05:10:00 2015-02-03 04:50:00 2015-02-03 05:30:00 2 34 2015-02-03 03:00:00 NA NA 2 34 2015-02-03 04:00:00 2015-02-03 03:50:00 2015-02-03 04:10:00 2 36 2015-02-03 01:00:00 NA NA ...

+5

merge r dataset

John legend2 Apr 19 '15 at 23:59

source share

2 answers

bgoldst · Answer 1 · 2015-04-20T00:55:31+0000

Tough problem! I think you need to calculate the intersection between each individual click_timing value and each time period ( start and end ) by manually sorting all click_timing values, and then use the resulting index match as an additional join field

 df1 <- data.frame(id1=c(1,1,1,1,1,2,2,2), id2=c(11,11,12,12,13,34,34,36), click_timing=as.POSIXct(c('2015-02-03 01:00:00','2015-02-03 02:00:00','2015-02-03 03:00:00','2015-02-03 04:00:00','2015-02-03 05:10:00','2015-02-03 03:00:00','2015-02-03 04:00:00','2015-02-03 01:00:00')) ); df2 <- data.frame(id1=c(1,1,1,1,2), id2=c(11,11,13,13,34), start=as.POSIXct(c('2015-02-03 00:20:00','2015-02-03 00:50:00','2015-02-03 01:10:00','2015-02-03 04:50:00','2015-02-03 03:50:00')), end=as.POSIXct(c('2015-02-03 00:40:00','2015-02-03 01:20:00','2015-02-03 01:40:00','2015-02-03 05:30:00','2015-02-03 04:10:00')) ); m <- sapply(1:nrow(df1), function(i) which(df1$id1[i]==df2$id1 & df1$id2[i] == df2$id2 & df1$click_timing[i]>=df2$start & df1$click_timing[i]<=df2$end)[1] ); merge(cbind(df1,m=m),cbind(df2,m=1:nrow(df2)),by=c('id1','id2','m'),all=T)[-3]; ## id1 id2 click_timing start end ## 1 1 11 <NA> 2015-02-03 00:20:00 2015-02-03 00:40:00 ## 2 1 11 2015-02-03 01:00:00 2015-02-03 00:50:00 2015-02-03 01:20:00 ## 3 1 11 2015-02-03 02:00:00 <NA> <NA> ## 4 1 12 2015-02-03 04:00:00 <NA> <NA> ## 5 1 12 2015-02-03 03:00:00 <NA> <NA> ## 6 1 13 <NA> 2015-02-03 01:10:00 2015-02-03 01:40:00 ## 7 1 13 2015-02-03 05:10:00 2015-02-03 04:50:00 2015-02-03 05:30:00 ## 8 2 34 2015-02-03 04:00:00 2015-02-03 03:50:00 2015-02-03 04:10:00 ## 9 2 34 2015-02-03 03:00:00 <NA> <NA> ## 10 2 36 2015-02-03 01:00:00 <NA> <NA>

If there ever will be a case where one click_timing value intersects with several start and end pairs, then this solution will choose the one that occurs earlier (i.e. has the bottom row index in df2 ) than other matches.

Krome · Answer 2 · 2015-04-21T04:59:08+0000

Reproduction of an initial data frame and introduction of insignificant results:

 library(data.table) library(lubridate) df1<- fread("id1,id2,click_timing 1,11,2015-02-03 01:00:00 1,11,2015-02-03 02:00:00 1,12,2015-02-03 03:00:00 1,12,2015-02-03 04:00:00 1,13,2015-02-03 05:10:00 2,34,2015-02-03 03:00:00 2,34,2015-02-03 04:00:00 2,36,2015-02-03 01:00:00") # adding a redundant click_timing2 column to use as the end range for further foverlaps() function df1[, click_timing2:= click_timing] df1[,c("click_timing", "click_timing2"):= list(parse_date_time(click_timing, "%Y-%m-%d %T"), parse_date_time(click_timing2, "%Y-%m-%d %T"))] df2<- fread("id1,id2,start,end 1,11,2015-02-03 00:20:00,2015-02-03 00:40:00 1,11,2015-02-03 00:50:00,2015-02-03 01:20:00 1,13,2015-02-03 01:10:00,2015-02-03 01:40:00 1,13,2015-02-03 04:50:00,2015-02-03 05:30:00 2,34,2015-02-03 03:50:00,2015-02-03 04:10:00") df2[,c("start","end") := list(parse_date_time(start, "%Y-%m-%d %T"), parse_date_time(end, "%Y-%m-%d %T"))] setkey(df2, id1, id2, start, end)

Decision:

 df3<- foverlaps(df1, df2, by.x=c("id1", "id2", "click_timing", "click_timing2"), by.y = c("id1", "id2", "start", "end"), type="within") objective_output<- merge(df3, df2, by = c("id1", "id2", "start", "end"), all = T) # deleting redundant click_timing2 column objective_output[,click_timing2:= NULL] # reordering columns setcolorder(objective_output, c(1,2,5,3,4)) #setting key using all columns and thus reordering all rows setkey(objective_output) objective_output #id1 id2 click_timing start end # 1: 1 11 2015-02-03 02:00:00 <NA> <NA> # 2: 1 11 <NA> 2015-02-03 00:20:00 2015-02-03 00:40:00 # 3: 1 11 2015-02-03 01:00:00 2015-02-03 00:50:00 2015-02-03 01:20:00 # 4: 1 12 2015-02-03 03:00:00 <NA> <NA> # 5: 1 12 2015-02-03 04:00:00 <NA> <NA> # 6: 1 13 <NA> 2015-02-03 01:10:00 2015-02-03 01:40:00 # 7: 1 13 2015-02-03 05:10:00 2015-02-03 04:50:00 2015-02-03 05:30:00 # 8: 2 34 2015-02-03 03:00:00 <NA> <NA> # 9: 2 34 2015-02-03 04:00:00 2015-02-03 03:50:00 2015-02-03 04:10:00 #10: 2 36 2015-02-03 01:00:00 <NA> <NA>

How to combine time frame data with NA output for non-overlapping parts?

More articles: