Join data.table on the exact date or, if there is no match, on the nearest earlier date

I would like to join two data.tables using the date as the join key.

Sometimes there is no exact match, in which case I would like to match the nearest earlier date. My problem is very similar to this SQL question: SQL Join on Nearest less than date

I know that data.table syntax is similar to SQL, but I cannot work out how to code this. What is the correct syntax?

A simplified example:

 Dt1
 date        x
 1/26/2010   10
 1/25/2010   9
 1/24/2010   9
 1/22/2010   7
 1/19/2010   11

 Dt2
 date
 1/26/2010
 1/23/2010
 1/20/2010

Desired output:

 date        x
 1/26/2010   10
 1/23/2010   7
 1/20/2010   11

Thanks in advance.

r data.table
2 answers

Here you go:

 library(data.table) 

Create data:

 # Example data from the question (whitespace-separated columns)
 Dt1 <- read.table(text="
   date x
   1/26/2010 10
   1/25/2010 9
   1/24/2010 9
   1/22/2010 7
   1/19/2010 11", header=TRUE, stringsAsFactors=FALSE)

 Dt2 <- read.table(text="
   date
   1/26/2010
   1/23/2010
   1/20/2010", header=TRUE, stringsAsFactors=FALSE)

Convert both to data.table, convert the date strings to Date objects, and set the key of each table:

 Dt1 <- data.table(Dt1)
 Dt2 <- data.table(Dt2)

 # Parse the month/day/year strings into Date objects
 Dt1[, date:=as.Date(date, format=("%m/%d/%Y"))]
 Dt2[, date:=as.Date(date, format=("%m/%d/%Y"))]

 # Setting the key sorts each table by date
 setkey(Dt1, date)
 setkey(Dt2, date)

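Join the tables using roll=TRUE, which matches each date in Dt2 to the same date in Dt1 or, if there is none, to the nearest earlier one: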

 Dt1[Dt2, roll=TRUE]
            date  x
 [1,] 2010-01-20 11
 [2,] 2010-01-23  7
 [3,] 2010-01-26 10
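
As a minimal sketch of the same rolling join without setting keys first, assuming a data.table version that supports the `on=` argument (1.9.6 or later), the tables can be built directly and joined on the `date` column. The object name `res` is only illustrative and does not come from the answer above.

```r
library(data.table)

# Same data as in the question; dates are parsed up front.
Dt1 <- data.table(
  date = as.Date(c("1/26/2010", "1/25/2010", "1/24/2010", "1/22/2010", "1/19/2010"),
                 format = "%m/%d/%Y"),
  x = c(10, 9, 9, 7, 11)
)
Dt2 <- data.table(
  date = as.Date(c("1/26/2010", "1/23/2010", "1/20/2010"), format = "%m/%d/%Y")
)

# For each date in Dt2, take the Dt1 row with the same date or, failing that,
# the row with the most recent earlier date (last observation carried forward).
res <- Dt1[Dt2, on = "date", roll = TRUE]
print(res)
```

The rows of `res` follow the row order of `Dt2`. Passing `roll = -Inf` instead would roll in the other direction, matching the next later date when there is no exact match.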

 ?data.table                  # search for the `roll` argument
 example(data.table)          # search for the example using roll=TRUE
 vignette("datatable-intro")  # see section "3: Fast time series join"
 vignette("datatable-faq")    # see FAQs 2.16 and 2.20

This is one of the main features of data.table. Since the rows are ordered (unlike in SQL), this operation is simple and very fast. SQL is inherently unordered, so you need a self join and an "order by" to complete this task. It can be done in SQL and it works, but it can be slow and requires more code. Since SQL is a row store, even in-memory SQL has a lower bound determined by fetching pages from RAM into L2 cache. data.table is below this lower bound because it is a column store.
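
To illustrate why ordered rows make this lookup cheap, here is a rough base R sketch of the same idea using `findInterval()`, which performs a binary search on a sorted vector. This is only an illustration of the principle and is not a description of how data.table implements `roll` internally; the variable names `dates`, `targets`, `idx` and `matched` are made up for the example.

```r
# Sorted dates and their values (same data as in the question).
dates <- as.Date(c("2010-01-19", "2010-01-22", "2010-01-24", "2010-01-25", "2010-01-26"))
x <- c(11, 7, 9, 9, 10)
targets <- as.Date(c("2010-01-20", "2010-01-23", "2010-01-26"))

# For each target, findInterval() returns the index of the last element of the
# sorted vector that is <= the target (0 if every element is later).
idx <- findInterval(as.numeric(targets), as.numeric(dates))
idx[idx == 0L] <- NA  # no earlier date available for this target

matched <- x[idx]
print(data.frame(date = targets, x = matched))
```

Because the dates are already sorted, each lookup is a binary search rather than a scan or a self join followed by a sort.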

The two vignettes are also available on the data.table web page.
