do.call(rbind, ...) on data.tables depends on the position of the NA

Consider this

    do.call(rbind, list(data.table(x=1, b='x'), data.table(x=1, b=NA)))

returns

       x  b
    1: 1  x
    2: 1 NA

but

    do.call(rbind, list(data.table(x=1, b=NA), data.table(x=1, b='x')))

returns

       x  b
    1: 1 NA
    2: 1 NA

How can I force the first behavior without changing the contents of the list?

data.table is much faster in my MapReduce jobs (I call data.table roughly 10 * 3 million times across 55 nodes, and it is many times faster than data.frame there), so I want this to work. Regards, saptarshi

1 answer

As Frank noted, the problem is that there are several different (and somewhat invisible) types of NA. The one created when you type NA at the command line has class "logical", but there are also NA_integer_, NA_real_, NA_character_ and NA_complex_.
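A quick way to see these types for yourself in base R (a small illustrative check, not part of the original answer; `na_types` is just a name picked for this sketch):

```r
# Ask R for the class of each built-in NA constant.
# The bare NA is logical; the others carry their own types.
na_types <- sapply(list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_), class)
print(na_types)
# na_types is c("logical", "integer", "numeric", "character", "complex")
```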

In the first example, the first data.table sets the class of column b to "character", and the NA in the second data.table is then coerced to NA_character_. In the second example, however, the NA in the first data.table sets column b to class "logical", so when the "x" in the same column of the second data.table is coerced to logical, it becomes a logical NA. (Try as.logical("x") to see why.)
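The two coercions involved can be checked directly in base R (again only an illustration; the variable names are made up for this sketch):

```r
# Character -> logical: "x" has no logical meaning, so it becomes NA
x_as_logical <- as.logical("x")

# Logical NA -> character: it becomes a character NA, i.e. NA_character_
na_as_character <- as.character(NA)

print(x_as_logical)                               # [1] NA
print(identical(na_as_character, NA_character_))  # [1] TRUE
```

This is why the first data.table in the list decides the outcome: coercing logical to character keeps the values, while coercing character to logical throws them away.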

This is all quite complicated (to explain, at least), but there is a fairly simple solution: create a 1-row template data.table and put it at the front of the list of data.tables you want to rbind(). It fixes the class of each column to what you want, regardless of which data.table follows it in the list passed to rbind(), and it can be dropped once everything has been bound together.

    library(data.table)

    ## The two lists of data.tables from the OP
    A <- list(data.table(x=1, b='x'), data.table(x=1, b=NA))
    B <- list(data.table(x=1, b=NA), data.table(x=1, b='x'))

    ## A 1-row template, used to set the column types (and then removed)
    DT <- data.table(x=numeric(1), b=character(1))

    ## Test it out
    do.call(rbind, c(list(DT), A))[-1,]
    #    x  b
    # 1: 1  x
    # 2: 1 NA
    do.call(rbind, c(list(DT), B))[-1,]
    #    x  b
    # 1: 1 NA
    # 2: 1  x

    ## Finally, as _also_ noted by Frank, rbindlist will likely be more efficient
    rbindlist(c(list(DT), B))[-1,]