Add a column by a link to a sliding connection

The main problem is described in the title of the question. We go straight to the example below.

I have two data sets:

library(data.table) dt1 <- data.table(date = as.Date("2015-06-28")+c(0L,3L,5L,7L), key="date") dt2 <- data.table(date = as.Date("2015-06-30")+c(0:1,4L), val = letters[7:9], dummy = rep(NA,3), key="date") 

I want the val column from dt2 added to dt1 using a sliding join.
The following statement will produce similar output for the expected:

 dt2[dt1, roll=TRUE] # date val dummy # 1: 2015-06-28 NA NA # 2: 2015-07-01 h NA # 3: 2015-07-03 h NA # 4: 2015-07-05 i NA 

These are two problems with this statement:
1. I did not want to have a dummy column 2. I want to do this from the link:

 address(dt1) # [1] "0x3b57540" address(dt2[dt1, roll=TRUE]) # [1] "0x3b4e1f0" 

So, I'm looking for a rolling connection and add a column by reference for my dt1 and dt2 , the expected results:

 # date val # 1: 2015-06-28 NA # 2: 2015-07-01 h # 3: 2015-07-03 h # 4: 2015-07-05 i 

And, of course, address(dt1) must match the address magic statement.

+5
source share
2 answers

Here you should use a method that should scale.

 address(dt1) # [1] "0x265a060" ix = dt2[dt1, roll=TRUE, which=TRUE] dt1[, val := dt2[ix, val]] dt1 # date val # 1: 2015-06-28 NA # 2: 2015-07-01 h # 3: 2015-07-03 h # 4: 2015-07-05 i address(dt1) # [1] "0x265a060" 

Or without creating an index, directly as:

 dt1[, val := dt2[dt1, val, roll = TRUE]] ## (1) 

Note that this is more efficient than doing:

 dt1[, val := dt2[dt1, roll = TRUE]$val] ## (2) 

(2) does the whole join (materializes all columns) and then retrieves val , where as (1) directly just retrieves the val column.

+3
source

Here's a workaround; not ideal, because it still involves creating additional memory, although there may be a way to avoid this, which does not immediately come to me (in any case, additional memory may be minimal):

 > address(dt1) [1] "0x57b5230" rng<-range(dt1[,range(date)],dt2[,range(date)]) x<-data.table(date=seq(from=rng[1],to=rng[2],by="day"), key="date") > address(x) [1] "0x6aa2df0" x[dt2,setdiff(names(dt2),"date"):=mget(setdiff(names(dt2),"date")) ][,val:=zoo::na.locf(val,na.rm=F)] > address(x) [1] "0x6aa2df0" > dt1[x,val:=i.val][] date val 1: 2015-06-28 NA 2: 2015-07-01 h 3: 2015-07-03 h 4: 2015-07-05 i > address(dt1) [1] "0x57b5230" 
+1
source

All Articles