To add to the alternatives, you can also use replace instead of the typical blah[index] <- NA approach. With replace, it looks like this:
df <- replace(df, df == "NA", NA)
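To make the behaviour concrete, here is a minimal sketch on a made-up data frame (`toy` and its columns are hypothetical, not the data from the question). It assumes character columns holding the literal string "NA":

```r
# Hypothetical toy data: "NA" is a literal string here, not a missing value.
toy <- data.frame(
  a = c("x", "NA", "y"),
  b = c("NA", "1", "5"),
  stringsAsFactors = FALSE
)

# toy == "NA" is a logical matrix; replace() sets the TRUE cells to a real NA.
cleaned <- replace(toy, toy == "NA", NA)

# Column types are untouched: both columns are still character.
is.na(cleaned)
```

Note that replace returns a modified copy, so the result has to be assigned back if you want to keep it.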
Another alternative to consider is type.convert. This is the function that R uses when reading data to automatically convert column types. Thus, the result differs from your current approach in that, for example, the second column is converted to a numeric (integer) column.
# as.is = FALSE keeps text columns as factors; R >= 4.0 warns if as.is is omitted
df[] <- lapply(df, function(x) type.convert(as.character(x), na.strings = "NA", as.is = FALSE))
df
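As a rough illustration of the type changes, here is a small sketch on made-up data (`toy2` and its column names are hypothetical). It passes as.is = FALSE explicitly so that text columns come back as factors:

```r
# Hypothetical toy data with the literal string "NA" in every column.
toy2 <- data.frame(
  name  = c("HELLO", "NA", "BANANA"),
  count = c("NA", "5", "1"),
  empty = c("NA", "NA", "NA"),
  stringsAsFactors = FALSE
)

# Re-run R's column type detection, treating "NA" as missing.
toy2[] <- lapply(toy2, function(x) {
  type.convert(as.character(x), na.strings = "NA", as.is = FALSE)
})

# name becomes a factor without an "NA" level, count becomes integer,
# and the all-missing column becomes logical.
str(toy2)
```

If you would rather keep text columns as character vectors, as.is = TRUE should do that instead.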
Performance is compared here. The sample data is taken from @roland's answer.
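The sample data itself isn't reproduced in this answer. A stand-in with the same shape as the str() output shown further down could be built roughly like this; the seed, the row count, and the sampling are my assumptions, not @roland's actual code:

```r
# Assumed stand-in for the sample data (not @roland's original code).
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 1e5       # smaller than the 1e7 rows in the output shown in this answer

df <- data.frame(
  vect1 = sample(c("HELLO", "BANANA", "NA"), n, replace = TRUE),
  vect2 = sample(c("1", "5", "NA"), n, replace = TRUE),
  vect3 = rep("NA", n),
  stringsAsFactors = TRUE  # factor columns, as in the str() output
)

str(df)
```

Timings on this smaller data set won't be comparable to those on the full 10-million-row data.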
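Here are the functions to compare:
library(data.table)  # needed for as.data.table() and := in funr()

# The original approach from the question
funop <- function() {
  df[df == "NA"] <- NA
  df
}

# @roland's data.table approach
funr <- function() {
  ind <- which(vapply(df, function(x) class(x) %in% c("character", "factor"), FUN.VALUE = TRUE))
  as.data.table(df)[, names(df)[ind] := lapply(.SD, function(x) {
    is.na(x) <- x == "NA"
    x
  }), .SDcols = ind][]
}

# replace()
funam1 <- function() replace(df, df == "NA", NA)

# type.convert(); as.is = FALSE keeps text columns as factors
funam2 <- function() {
  df[] <- lapply(df, function(x) type.convert(as.character(x), na.strings = "NA", as.is = FALSE))
  df
}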
Here's the benchmarking:
library(microbenchmark)
microbenchmark(funop(), funr(), funam1(), funam2(), times = 10)
replace gives the same result as @roland's approach, which is similar to @jgozal's. The type.convert approach, however, produces different column types.
all.equal(funop(), setDF(funr()))
all.equal(funop(), funam1())

str(funop())
# 'data.frame': 10000000 obs. of 3 variables:
# $ vect1: Factor w/ 3 levels "BANANA","HELLO",..: 2 2 NA 2 1 1 1 NA 1 1 ...
# $ vect2: Factor w/ 3 levels "1","5","NA": NA 2 1 NA 1 NA NA 1 NA 2 ...
# $ vect3: Factor w/ 1 level "NA": NA NA NA NA NA NA NA NA NA NA ...

str(funam2())
# 'data.frame': 10000000 obs. of 3 variables:
# $ vect1: Factor w/ 2 levels "BANANA","HELLO": 2 2 NA 2 1 1 1 NA 1 1 ...
# $ vect2: int NA 5 1 NA 1 NA NA 1 NA 5 ...
# $ vect3: logi NA NA NA NA NA NA ...