Join two data tables and use only one column from the second dt

Suppose I have two data tables, dt1 and dt2, and I want to produce dt3 using data.table. The column names are A, B, C, E, F, G and H. The key of dt1 is column A and the key of dt2 is column E. The two tables have a different number of rows. I want to keep all columns from dt1 and add only one column (H) from dt2 to the joined table. Eventually I will save the result back as dt1 (although I show it as dt3 below).

How can I achieve this using data.table? I have an ugly solution that involves combining data frames.

 dt1
 A  B  C
 1  4  7
 2  5  8
 3  6  9
 2 20 21

 dt2
 E  F  G  H
 1 10 13 16
 3 12 15 18
 2 11 14 17

 dt3
 A  B  C  H
 1  4  7 16
 2  5  8 17
 3  6  9 18
 2 20 21 17
3 answers

To left join dt2 onto dt1 and add only column H from dt2, you can combine a binary join with the update-by-reference operator ( := )

 setkey(setDT(dt1), A)   # convert dt1 by reference and key it on A
 dt1[dt2, H := i.H]      # i.H refers to column H of dt2

See here and here for a detailed explanation of how it works.


With the devel version (v >= 1.9.5), we can make it even shorter by specifying the key inside setDT (as @Arun pointed out)

 setDT(dt1, key = "A")[dt2, H := i.H]

Edit 7/24/2015

You can now run the binary join using the new on argument, without setting keys

 setDT(dt1)[dt2, H := i.H, on = c(A = "E")]
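For anyone who wants to try this, here is a minimal, self-contained sketch of the on-based update join. It assumes a data.table version that supports the on argument, and it rebuilds dt1 and dt2 from the values shown in the question; the data.table() construction is my own addition, not part of the original answer.

```r
library(data.table)

# Sample data copied from the tables in the question
dt1 <- data.table(A = c(1, 2, 3, 2), B = c(4, 5, 6, 20), C = c(7, 8, 9, 21))
dt2 <- data.table(E = c(1, 3, 2), F = c(10, 12, 11),
                  G = c(13, 15, 14), H = c(16, 18, 17))

# For each row of dt2, find the rows of dt1 where A == E and write
# dt2's H into them by reference. The i. prefix refers to columns of
# the table passed in the i position (dt2 here).
dt1[dt2, H := i.H, on = c(A = "E")]

print(dt1)
```

Since no key is set, dt1 keeps its original row order, and H should come out as 16, 17, 18, 17, matching dt3 in the question. Any row of dt1 without a match in dt2 would get NA in H.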

data.table solution

 # look up each value of dt1$A in dt2$E and pull the matching H
 setDT(dt1)[, H := dt2$H[match(dt1$A, dt2$E)]]
 dt1
 #    A  B  C  H
 # 1: 1  4  7 16
 # 2: 2  5  8 17
 # 3: 3  6  9 18
 # 4: 2 20 21 17

Another solution, using dplyr

 library(dplyr)
 left_join(x = dt1, y = dt2, by = c("A" = "E")) %>%
   select(A, B, C, H)   # keep dt1's columns plus H

If both tables have the same number of rows, in matching order, I would use

 df3 <- cbind(df1, df2[, 4])
