NA / NaN / Inf in data.table 1.9.2

After looking through the new features in data.table 1.9.2, I don't quite understand the new NA / NaN / Inf handling.

News:

NA, NaN, +Inf and -Inf are now considered distinct values, may be in keys, can be joined to and can be grouped. data.table defines: NA < NaN < -Inf

I don't understand what "can be joined to and can be grouped" means.

DT <- data.table(A=c(NA,NA,1:3), B=c("a",NA,letters[1:3])) 

Now we have NA in both columns A and B, but I'm a bit lost about how to work with them and what the purpose of this new feature is. Could you give an example to illustrate it?

Thanks a lot!

1 answer

In previous versions of data.table, NA, NaN and Inf values could exist in a key, but you could not join to them or use binary search to select those rows consistently with other key values.

See "Select NA in a data.table in R" and "data.table subsetting by NaN doesn't work" for examples of SO questions that deal with these issues (you can also trace the history through the responses to the feature requests in the data.table project).

As of 1.9.2 (and later), these operations work.

# an example data set
DT <- data.table(A = c(NA,NaN,Inf,Inf,-Inf,NA,NaN,1,2,3), B = letters[1:10], key = 'A')

# selection using binary search
DT[.(Inf)]
#      A B
# 1: Inf c
# 2: Inf d

DT[.(-Inf)]
#       A B
# 1: -Inf e

# note that you need to use the right kind of NA
DT[.(NA_real_)]
#     A B
# 1: NA a
# 2: NA f

DT[.(NaN)]
#      A B
# 1: NaN b
# 2: NaN g

# grouping works
DT[,.N,by=A]
#       A N
# 1:   NA 2
# 2:  NaN 2
# 3: -Inf 1
# 4:    1 1
# 5:    2 1
# 6:    3 1
# 7:  Inf 2
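
The "can be joined to" part can be illustrated the same way. The following is only a minimal sketch, assuming data.table 1.9.2 or later; the `lookup` table and its `label` column are hypothetical names made up for this example. It also checks the key order that the NEWS item describes (NA, then NaN, then -Inf, then the finite values, then +Inf).

```r
library(data.table)

# Same example data as above: a keyed table containing NA, NaN, -Inf and +Inf
DT <- data.table(A = c(NA, NaN, Inf, Inf, -Inf, NA, NaN, 1, 2, 3),
                 B = letters[1:10], key = "A")

# Setting the key sorts the rows: NA first, then NaN, then -Inf,
# then the finite values, then +Inf
key_order <- DT$A

# Hypothetical lookup table (names invented for illustration), keyed on A as well
lookup <- data.table(A = c(NA, NaN, -Inf, Inf),
                     label = c("missing", "not a number",
                               "minus infinity", "plus infinity"),
                     key = "A")

# Keyed join: for each row of lookup, pull the matching rows of DT.
# NA and NaN are matched as distinct values.
joined <- DT[lookup]

# Count the matched rows per label
counts <- joined[, .N, by = label]
print(joined)
print(counts)
```

Each special value in `lookup` should match only its own rows in `DT`, so the NA rows and the NaN rows end up with different labels. The finite values 1, 2 and 3 have no entry in `lookup` and therefore do not appear in the result.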