How to convert character to numeric in data.table for specific columns?

The dataset below has the same characteristics as my large dataset. I manage it with data.table. Some columns are loaded as chr even though they contain numbers, and I want to convert them to numeric. The names of these columns are known.

 library(data.table)
 # simplified version
 dt = data.table(A = LETTERS[1:10], B = letters[1:10],
                 C = as.character(runif(10)), D = as.character(runif(10)))
 strTmp = c('C','D')  # names of the columns to be converted to numeric
 # columns converted to numeric, returned as a new 10 x 2 data.table
 dt.out1 <- dt[, lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp]

I can convert these 2 columns to numeric using the code above, but I want to update dt itself. I tried using := but it did not work. I need help here!

 dt.out2 <- dt[, strTmp:=lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp] # returned a 10 x 6 data.table (2 columns extra) 

I even tried the code below (written in data.frame style, which is not my ideal solution even if it works, because I worry that in some cases the column order may change), but it still does not work. Can anyone tell me why?

 dt[,strTmp,with=F] <- dt[,lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp] 

Thanks in advance!

Tags: r, data.table
2 answers

While Roland's answer is more idiomatic, for something as simple as this you can also consider calling set() in a loop. The approach might look something like this:

 strTmp = c('C','D')
 ind <- match(strTmp, names(dt))  # column positions of the names in strTmp
 for (i in seq_along(ind)) {
   set(dt, NULL, ind[i], as.numeric(dt[[ind[i]]]))
 }
 str(dt)
 # Classes 'data.table' and 'data.frame': 10 obs. of 4 variables:
 #  $ A: chr "A" "B" "C" "D" ...
 #  $ B: chr "a" "b" "c" "d" ...
 #  $ C: num 0.308 0.564 0.255 0.828 0.128 ...
 #  $ D: num 0.635 0.0485 0.6281 0.4793 0.7 ...
 #  - attr(*, ".internal.selfref")=<externalptr>

As the ?set help page explains, this avoids some of the overhead of [.data.table, in case that ever becomes a problem for you.
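A minimal variant of the loop above, assuming a reasonably recent data.table: set() also accepts existing column names for j, so the match() step can be skipped and the loop can run over strTmp directly. The set.seed() call is only an addition for reproducibility and is not part of the original answer.

```r
library(data.table)

set.seed(1)  # added only so the random example data is reproducible
dt <- data.table(A = LETTERS[1:10], B = letters[1:10],
                 C = as.character(runif(10)), D = as.character(runif(10)))
strTmp <- c("C", "D")

# set() takes column names (for existing columns) as j, so no match() is needed;
# each column is replaced by reference, without copying dt
for (col in strTmp) {
  set(dt, j = col, value = as.numeric(dt[[col]]))
}

str(dt)
```

Looping over names rather than positions keeps the code readable and does not depend on the column order in dt.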

  • You do not need to assign the result back to a data.table when you assign by reference with := (i.e. you do not need dt.out2 <- ).

  • You need to wrap the LHS of := in parentheses so that it is evaluated (and not used as a column name).

Like this:

 dt[, (strTmp) := lapply(.SD, as.numeric), .SDcols = strTmp]
 str(dt)
 # Classes 'data.table' and 'data.frame': 10 obs. of 4 variables:
 #  $ A: chr "A" "B" "C" "D" ...
 #  $ B: chr "a" "b" "c" "d" ...
 #  $ C: num 0.30204 0.00269 0.46774 0.08641 0.02011 ...
 #  $ D: num 0.151 0.0216 0.5689 0.3536 0.26 ...
 #  - attr(*, ".internal.selfref")=<externalptr>
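A self-contained sketch of the same approach, rebuilding the question's example data. It checks the column classes afterwards to show that dt itself was updated, even though the result of the := call is never assigned to anything.

```r
library(data.table)

dt <- data.table(A = LETTERS[1:10], B = letters[1:10],
                 C = as.character(runif(10)), D = as.character(runif(10)))
strTmp <- c("C", "D")

# (strTmp) makes := use the *value* of strTmp ("C", "D") as the target columns;
# .SDcols = strTmp restricts .SD to those same columns, in the same order
dt[, (strTmp) := lapply(.SD, as.numeric), .SDcols = strTmp]

# no dt <- ... needed: the columns were replaced by reference
sapply(dt, class)
```

Note that as.numeric() has no na.rm argument, which is why it is dropped here. Strings that cannot be parsed as numbers become NA, and R emits a "NAs introduced by coercion" warning.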
