R data.table replace “NULL” with “NA” when columns are factors

I pulled some data from the SQL database through ODBC, and the columns are automatically set to factor. This is something like the following:

library(RODBC)
library(data.table)
data <- data.table(sqlQuery(channel, query))

My data looks like this, only with many more columns:

data <- data.table("C1"=as.factor(c(letters[1:4], "NULL", letters[5])),
                   "C2"=as.factor(c(rnorm(3), "NULL", rnorm(2))),
                   "C3"=as.factor(c(letters[1], "NULL", letters[2:4], "NULL")))
> data
     C1                 C2   C3
1:    a -0.190200079604691    a
2:    b  0.310548914832963 NULL
3:    c 0.0153099116493453    b
4:    d               NULL    c
5: NULL  0.157187027626419    d
6:    e  0.118537540781528 NULL
> str(data)
Classes ‘data.table’ and 'data.frame':  6 obs. of  3 variables:
 $ C1: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 6 5
 $ C2: Factor w/ 6 levels "-0.190200079604691",..: 1 5 2 6 4 3
 $ C3: Factor w/ 5 levels "a","b","c","d",..: 1 5 2 3 4 5
 - attr(*, ".internal.selfref")=<externalptr> 

How do I replace "NULL" with NA? I want to treat these SQL "NULL" strings as missing values (NA). I tried the following, but it causes an error.

for (col in names(data)) {
  set(data, which(data[[col]]=="NULL"), col, NA)
}

> Error in set(data, which(data[[col]] == "NULL"), col, NA) : 
  Can't assign to column 'C1' (type 'factor') a value of type 'logical' (not character, factor, integer or numeric)

RODBC Solution

As @user20650 pointed out, a cleaner fix is to pass na.strings to sqlQuery, e.g. data <- data.table(sqlQuery(channel, query, na.strings=c("NA", "NULL"))), so that "NULL" is read as NA at import time.
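
To illustrate what na.strings does, here is a small sketch that needs no database. It assumes the argument behaves as in base R's type.convert (which, as far as I know, is what RODBC relies on when converting results). The vectors raw_chr and raw_num are made up for the example.

```r
# Hypothetical raw values as they might arrive from the driver
raw_chr <- c("a", "NULL", "b")
raw_num <- c("0.31", "NULL", "0.15")

# Strings listed in na.strings become NA during conversion
f <- type.convert(raw_chr, na.strings = c("NA", "NULL"), as.is = FALSE)
n <- type.convert(raw_num, na.strings = c("NA", "NULL"), as.is = TRUE)

levels(f)   # "NULL" never becomes a level
class(n)    # the numeric column comes back as numeric, not factor
```

A side benefit of fixing this at import time is that numeric columns such as C2 are no longer forced into factors by the stray "NULL" strings.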

+4

Try this:

is.na(data) <- data == "NULL"

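There is an is.na.data.frame method but no `is.na<-` method for data frames, so the assignment appears to dispatch to `is.na<-.default`.

After some noodling: `is.na<-.default` (whose body is {x[value] <- NA; x}) ends up calling `[<-.data.table`, which is what actually performs the replacement.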
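
A minimal base R sketch of what `is.na<-.default` amounts to for a single factor column (the vector x is made up for illustration). Note that the "NULL" level itself is still present afterwards, just unused; droplevels removes it.

```r
x <- factor(c("a", "NULL", "b"))

# Same effect as is.na(x) <- x == "NULL", i.e. x[value] <- NA
x[x == "NULL"] <- NA

is.na(x)                  # FALSE TRUE FALSE
"NULL" %in% levels(x)     # TRUE: the level is kept but no longer used

x <- droplevels(x)        # drop unused levels
"NULL" %in% levels(x)     # FALSE
```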

+6

You can edit the factor levels directly:

data[,names(data):=lapply(.SD,function(x){
  z <- levels(x)
  z[z=="NULL"] <- NA
  `levels<-`(x,z)
})]

To check the result, look at lapply(data,levels): "NULL" should no longer appear among the levels.
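
Why this works, sketched on a single base R factor (x is a made-up vector): as far as I can tell from `levels<-.factor`, when the new levels contain NA, that level is dropped and the matching elements become NA.

```r
x <- factor(c("a", "NULL", "b"))

z <- levels(x)
z[z == "NULL"] <- NA      # mark the unwanted level as missing
y <- `levels<-`(x, z)     # returns a modified copy of x

is.na(y)    # FALSE TRUE FALSE
levels(y)   # "a" "b"
```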


Alternatively (as suggested by @akrun), use recode from the car package:

library(car)
data[,names(data):=lapply(.SD, recode, '"NULL"=NA')]

The data.table way would be to modify the levels by reference with setattr:

for (j in names(data)) setattr(data[[j]],"levels",{
  z <- levels(data[[j]])
  z[z=="NULL"] <- NA
  z
})

Note that this sets the attribute directly rather than going through `levels<-`.

+4
