merge.data.table with all=TRUE introduces an NA row. Is this correct?

Merging a populated data.table with an empty one inserts an NA row into the resulting data.table.

    a = data.table(c=c(1,2),key='c')
    b = data.table(c=3,key='c')
    b=b[c!=3]
    b
    # Empty data.table (0 rows) of 1 col: c
    merge(a,b,all=T)
    #     c
    # 1: NA
    # 2:  1
    # 3:  2

Why? I was expecting it to return only the rows of data.table a, as is the case with merge.data.frame:

    > merge.data.frame(a,b,all=T,by='c')
    #   c
    # 1 1
    # 2 2
+6
4 answers

The example in the question is too simple to show the problem, hence the confusion and discussion. Using two one-column data.tables is not enough to show what merge does!

Here is a better example:

    > a = data.table(P=1:2,Q=3:4,key='P')
    > b = data.table(P=2:3,R=5:6,key='P')
    > a
       P Q
    1: 1 3
    2: 2 4
    > b
       P R
    1: 2 5
    2: 3 6
    > merge(a,b)  # correct
       P Q R
    1: 2 4 5
    > merge(a,b,all=TRUE)  # correct.
       P  Q  R
    1: 1  3 NA
    2: 2  4  5
    3: 3 NA  6
    > merge(a,b[0],all=TRUE)  # incorrect result when y is empty, agreed
        P  Q  R
    1: NA NA NA
    2: NA NA NA
    3:  1  3 NA
    4:  2  4 NA
    > merge.data.frame(a,b[0],all=TRUE)  # correct
      P Q  R
    1 1 3 NA
    2 2 4 NA

Ricardo got to the bottom of this and fixed it in version 1.8.9. From the NEWS:

merge no longer returns spurious NA row(s) when y is empty and all.y = TRUE (or all = TRUE), #2633. Thanks to Vinicius Almendre for reporting. Test added.
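To check how an installed version behaves, here is a minimal sketch. It assumes data.table 1.8.9 or later is installed and reuses the a and b tables from the example above; the variable name res is only for illustration.

```r
library(data.table)

a <- data.table(P = 1:2, Q = 3:4, key = "P")
b <- data.table(P = 2:3, R = 5:6, key = "P")

# b[0] is an empty data.table with the same columns as b
res <- merge(a, b[0], all = TRUE)

# With the fix in place, only the rows of a should remain,
# and the column R coming from the empty table should be all NA
print(res)
print(nrow(res) == nrow(a))
print(all(is.na(res$R)))
```

If the row count check prints FALSE, the installed version probably predates the fix.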

+5

all: logical; all = TRUE is shorthand to save setting both all.x = TRUE and all.y = TRUE.

all.x: logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NA in the columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

all.y: logical; analogous to all.x above.

This is taken from the data.table documentation. See the description of the arguments of the merge function for more details.

I think this answers your question.
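The arguments quoted above can be illustrated with a small sketch. It assumes data.table is installed; the tables a and b are the two-column tables used elsewhere on this page, and the result names (inner, left, right, full) are chosen here only for illustration.

```r
library(data.table)

a <- data.table(P = 1:2, Q = 3:4, key = "P")
b <- data.table(P = 2:3, R = 5:6, key = "P")

inner <- merge(a, b)                # only keys present in both tables
left  <- merge(a, b, all.x = TRUE)  # every row of a; R is NA where b has no match
right <- merge(a, b, all.y = TRUE)  # every row of b; Q is NA where a has no match
full  <- merge(a, b, all = TRUE)    # all.x = TRUE and all.y = TRUE together

print(inner$P)  # 2
print(left$P)   # 1 2
print(right$P)  # 2 3
print(full$P)   # 1 2 3
```

When both tables have rows, all = TRUE behaves as a full outer join on the key; the question concerns the edge case where one of the tables is empty.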

+1

Given a and b as you defined them, a simple rbind(a,b) will return only the rows of a.

However, if you want to combine an empty (NULL) data.table b with some other non-empty data.table a, there is another approach. I had a similar problem when I had to combine different data.tables across different loops, and I used this workaround.

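    # some loop that returns a data.table named a
    # another loop starts
    # isTRUE() is needed because all.equal() returns a character vector, not FALSE, on mismatch
    if (isTRUE(all.equal(a, b <- data.table()))) {
      b <- a
      next
    }
    merge(a, b, c("Factor1","Factor2"))

It helped me; maybe it will help you too.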
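A related way to guard against empty tables is sketched below. This is not the code from the answer above: merge_skip_empty is a hypothetical helper, and the column names val_a and val_b and the sample values are made up for illustration. It assumes that returning the non-empty table unchanged is acceptable when the other one has no rows.

```r
library(data.table)

# Hypothetical helper (not part of data.table): skip the merge entirely
# when one side has no rows, otherwise do a full outer join on `by`.
merge_skip_empty <- function(x, y, by) {
  if (nrow(y) == 0L) return(x)
  if (nrow(x) == 0L) return(y)
  merge(x, y, by = by, all = TRUE)
}

a     <- data.table(Factor1 = c("u", "v"), Factor2 = c(1L, 2L), val_a = c(10, 20))
b     <- data.table(Factor1 = "v", Factor2 = 2L, val_b = 5)
empty <- data.table()  # NULL data.table: 0 rows, 0 columns

r1 <- merge_skip_empty(a, empty, by = c("Factor1", "Factor2"))  # a is returned as-is
r2 <- merge_skip_empty(a, b, by = c("Factor1", "Factor2"))      # regular full outer join

print(r1)
print(r2)
```

Note that with this design the columns of an empty table are not added to the result, which differs from what merge with all = TRUE does.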

0

As with merge.data.frame, all=T is expected to be a full outer join, so you get all the keys from both tables; see merging.

0
