Yesterday I gave this answer: Match data tables with five columns to change the value in another column.
In the comments, the OP asked whether we could effectively achieve a left join of the two tables, and thereby get the NAs that a left join would produce from the right table assigned to the left table. It seems to me that data.table does not provide any means of doing this.
Here is the example I used in that question:
    library(data.table);
    set.seed(1L);
    dt1 <- data.table(id=1:12,expand.grid(V1=1:3,V2=1:4),blah1=rnorm(12L));
    dt2 <- data.table(id=13:18,expand.grid(V1=1:2,V2=1:3),blah2=rnorm(6L));
    dt1;
    ##     id V1 V2      blah1
    ##  1:  1  1  1 -0.6264538
    ##  2:  2  2  1  0.1836433
    ##  3:  3  3  1 -0.8356286
    ##  4:  4  1  2  1.5952808
    ##  5:  5  2  2  0.3295078
    ##  6:  6  3  2 -0.8204684
    ##  7:  7  1  3  0.4874291
    ##  8:  8  2  3  0.7383247
    ##  9:  9  3  3  0.5757814
    ## 10: 10  1  4 -0.3053884
    ## 11: 11  2  4  1.5117812
    ## 12: 12  3  4  0.3898432
    dt2;
    ##    id V1 V2       blah2
    ## 1: 13  1  1 -0.62124058
    ## 2: 14  2  1 -2.21469989
    ## 3: 15  1  2  1.12493092
    ## 4: 16  2  2 -0.04493361
    ## 5: 17  1  3 -0.01619026
    ## 6: 18  2  3  0.94383621
    key <- paste0('V',1:2);
And here is the solution I gave, which does not produce NA for non-matching rows:
    dt1[dt2,on=key,id:=i.id];
    dt1;
    ##     id V1 V2      blah1
    ##  1: 13  1  1 -0.6264538
    ##  2: 14  2  1  0.1836433
    ##  3:  3  3  1 -0.8356286
    ##  4: 15  1  2  1.5952808
    ##  5: 16  2  2  0.3295078
    ##  6:  6  3  2 -0.8204684
    ##  7: 17  1  3  0.4874291
    ##  8: 18  2  3  0.7383247
    ##  9:  9  3  3  0.5757814
    ## 10: 10  1  4 -0.3053884
    ## 11: 11  2  4  1.5117812
    ## 12: 12  3  4  0.3898432
What we want is for the id values of 12 and below that remain in dt1 to be replaced with NA (not because they are 12 or below, and not because those id values are absent from dt2, but because the join on the key columns, namely V1 and V2, finds no match in dt2 for those rows of dt1).
As I said in the comments on that question, a workaround is to first assign NA to all of dt1$id, and then run the index-join-assign operation. This gives the expected result:
    dt1$id <- NA;
    dt1[dt2,on=key,id:=i.id];
    dt1;
    ##     id V1 V2      blah1
    ##  1: 13  1  1 -0.6264538
    ##  2: 14  2  1  0.1836433
    ##  3: NA  3  1 -0.8356286
    ##  4: 15  1  2  1.5952808
    ##  5: 16  2  2  0.3295078
    ##  6: NA  3  2 -0.8204684
    ##  7: 17  1  3  0.4874291
    ##  8: 18  2  3  0.7383247
    ##  9: NA  3  3  0.5757814
    ## 10: NA  1  4 -0.3053884
    ## 11: NA  2  4  1.5117812
    ## 12: NA  3  4  0.3898432
I think the workaround is fine, but I'm not sure why data.table seems unable to do this in one shot with a single index-join-assign operation. Below are three dead ends that I have explored:
1: nomatch
data.table provides the nomatch argument, which is a bit like the all, all.x and all.y arguments of merge(). It is actually a very limited argument; it only lets you switch from a right join (nomatch=NA, the default) to an inner join (nomatch=0). We cannot achieve a left join with it.
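To make the limitation concrete, here is a minimal sketch that rebuilds the example tables and counts the rows each nomatch setting returns. The variable names right_join, inner_join and flipped_* are my own, and I am assuming a data.table version that supports on= and still accepts nomatch=0.

```r
library(data.table)

# Rebuild the example tables from the question.
set.seed(1L)
dt1 <- data.table(id=1:12, expand.grid(V1=1:3, V2=1:4), blah1=rnorm(12L))
dt2 <- data.table(id=13:18, expand.grid(V1=1:2, V2=1:3), blah2=rnorm(6L))
key <- paste0('V', 1:2)

# dt1[dt2]: one result row per row of dt2 (the right table).
right_join <- dt1[dt2, on=key]             # nomatch=NA is the default
inner_join <- dt1[dt2, on=key, nomatch=0]  # drop rows of dt2 with no match in dt1

# Every row of dt2 has a match in dt1 here, so both results have 6 rows;
# neither keeps all 12 rows of dt1, which a left join would require.
nrow(right_join)
nrow(inner_join)

# The two settings only differ when some row of i has no match, e.g. flipped:
flipped_na   <- dt2[dt1, on=key]             # keeps all 12 rows of dt1
flipped_zero <- dt2[dt1, on=key, nomatch=0]  # keeps only the 6 matching rows
nrow(flipped_na)
nrow(flipped_zero)
```

Either way, nomatch only controls what happens to unmatched rows of the table in i; it never adds back unmatched rows of the table in x.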
2: flip dt1 and dt2
Since dt1[dt2] is a right join, we can just flip it, giving dt2[dt1], to get the corresponding left join.
This does not work, because we need to use the := syntax in the j argument to assign into dt1, and under the flipped call we would be assigning into dt2 instead. I tried assigning to i.id under the flipped call, but this had no effect on the original dt1.
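As a sketch of what the flipped call does return (the name flipped is mine, and the tables are rebuilt so the snippet runs on its own): the result is a new table holding the left-join rows, while dt1 itself is left untouched.

```r
library(data.table)

# Rebuild the example tables from the question.
set.seed(1L)
dt1 <- data.table(id=1:12, expand.grid(V1=1:3, V2=1:4), blah1=rnorm(12L))
dt2 <- data.table(id=13:18, expand.grid(V1=1:2, V2=1:3), blah2=rnorm(6L))
key <- paste0('V', 1:2)

# Flipped join: one result row per row of dt1, in dt1's row order.
flipped <- dt2[dt1, on=key]

# 'id' in the result comes from dt2 and is NA where dt1's row has no match;
# dt1's own id column is carried along under the name 'i.id'.
names(flipped)
sum(is.na(flipped$id))

# dt1 has not been modified by the join.
dt1$id
```

So the flipped join computes the values we want, but as a separate object; it does not assign them into dt1 by reference.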
3: use merge.data.table()
We can call merge.data.table() with the argument all.x=T to achieve a left join. The problem now is that merge.data.table() has no j argument, so it provides no means of assigning into a column of the left (or right) table.
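For illustration, a minimal sketch of the merge() route (the name merged is mine; the tables are rebuilt so the snippet is self-contained). It yields the left join as a new table, with the two id columns disambiguated by the default suffixes, and leaves dt1 as it was.

```r
library(data.table)

# Rebuild the example tables from the question.
set.seed(1L)
dt1 <- data.table(id=1:12, expand.grid(V1=1:3, V2=1:4), blah1=rnorm(12L))
dt2 <- data.table(id=13:18, expand.grid(V1=1:2, V2=1:3), blah2=rnorm(6L))
key <- paste0('V', 1:2)

# Left join via merge(): all rows of dt1 are kept.
merged <- merge(dt1, dt2, by=key, all.x=TRUE)

# The clashing id columns become id.x (from dt1) and id.y (from dt2);
# id.y is NA for rows of dt1 that have no match in dt2.
names(merged)
sum(is.na(merged$id.y))

# merge() returns a new table; dt1 is unchanged.
dt1$id
```

Getting the result back into dt1 would take a further step, which is exactly the one-shot assignment the question is asking about.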
So, is it possible to perform this operation at all with data.table? And if so, what is the best way to do this?