Apply a function to a subset of columns in data.table with .SDcols

I want to apply a function to a subset of columns in a data.table. In this case, I just want to change the columns' types. I can do this in several ways in data.table, but I am looking for a method that neither requires an intermediate assignment (mycols in this example) nor requires me to specify the columns I want to change twice. Here is a simplified reproducible example:

 library('data.table')
 n <- 30
 dt <- data.table(
   a = sample(1:5, n, replace=TRUE),
   b = as.character(sample(seq(as.Date('2011-01-01'), as.Date('2015-01-01'), length.out=n))),
   c1235 = as.character(sample(seq(as.Date('2012-01-01'), as.Date('2013-01-01'), length.out=n))),
   d7777 = as.character(sample(seq(as.Date('2012-01-01'), as.Date('2013-01-01'), length.out=n)))
 )

WAY 1: this works, but the column names are hardcoded

 mycols <- c('b', 'c1235', 'd7777')
 dt1 <- dt[, (mycols) := lapply(.SD, as.Date), .SDcols=mycols]

WAY 2: this works, but I need to create an intermediate object (mycols) to make it work

 mycols <- which(sapply(dt, class)=='character')
 dt2 <- dt[, (mycols) := lapply(.SD, as.Date), .SDcols=mycols]

WAY 3: this works, but I need to write the same long expression twice

 dt3 <- dt[,(which(sapply(dt, class)=='character')):=lapply(.SD, as.Date), .SDcols=which(sapply(dt, class)=='character')] 

WAY 4: this does not work, but I need something similar that lets me specify the columns only once, in .SDcols. I am looking for a way to replace (.SD):= with something that works, or to combine everything in one expression. More generally, I would like to know whether anyone has a way to accomplish what WAYs 1, 2 and 3 do without creating an intermediate object that clutters the environment and without specifying the same columns twice.

 dt3 <- dt[,(.SD):=lapply(.SD, as.Date), .SDcols=which(sapply(dt, class)=='character')] 
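A side note on the snippets above: := updates a data.table by reference, so dt1, dt2 and dt3 are not independent copies of dt. The sketch below illustrates this on a small hypothetical table (not the one from the example) and uses copy() to keep the original untouched. It is only meant to show the behaviour, not to suggest how the conversion should be written.

```r
library(data.table)

# Small hypothetical table, made up for this illustration
dt <- data.table(a = 1:3, b = c("2011-01-01", "2012-06-15", "2013-12-31"))

# Work on a copy so the original keeps its character column
dt1 <- copy(dt)[, b := as.Date(b)]
class(dt$b)   # unchanged here, because only the copy was modified
class(dt1$b)

# Without copy(), := updates dt itself; dt2 is just another name for it
dt2 <- dt[, b := as.Date(b)]
class(dt$b)   # now converted as well
```

In practice this means that running WAY 1 first would already convert the columns in dt, so the later variants would no longer find any character columns unless dt is rebuilt or copied.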
1 answer

Here is one answer:

 for (j in which(sapply(dt, class)=='character'))
   set(dt, i=NULL, j=j, value=as.Date(dt[[j]]))
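To make the loop easy to try, here is a minimal self-contained sketch on a small hypothetical table (the data is made up for illustration). It swaps sapply(dt, class)=='character' for vapply() with is.character, which is a stylistic choice of this sketch rather than part of the answer; the set() call itself is the same.

```r
library(data.table)

# Hypothetical example table with two date-like character columns
dt <- data.table(
  a = 1:3,
  b = c("2011-01-01", "2012-06-15", "2013-12-31"),
  c1235 = c("2012-01-01", "2012-07-01", "2013-01-01")
)

# The column positions are computed once, when the loop starts.
# set() then replaces each selected column in place, so dt is never reassigned.
for (j in which(vapply(dt, is.character, logical(1)))) {
  set(dt, j = j, value = as.Date(dt[[j]]))
}

sapply(dt, class)  # inspect the resulting column classes
```

The column selection appears only once and no mycols object is created, although the loop variable j does remain in the environment afterwards.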

Here is the question where Arun and Matt prefer set() with a for loop over using .SD:

How to apply the same function to each specified column in data.table
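For something closer to WAY 4, a possible alternative is sketched below. It assumes a reasonably recent data.table release in which .SDcols accepts a predicate function and names(.SD) may appear on the left-hand side of :=. Both are assumptions about the installed version, so check ?data.table before relying on them. The table is again a small hypothetical one.

```r
library(data.table)

# Hypothetical example table, made up for this illustration
dt <- data.table(
  a = 1:3,
  b = c("2011-01-01", "2012-06-15", "2013-12-31"),
  d7777 = c("2012-01-01", "2012-07-01", "2013-01-01")
)

# .SDcols selects the character columns via a predicate function, and
# names(.SD) on the left-hand side refers to exactly those columns.
# Assumption: the installed data.table version supports both features.
dt[, names(.SD) := lapply(.SD, as.Date), .SDcols = is.character]

sapply(dt, class)  # inspect the resulting column classes
```

If this runs on your version, the selection is written once and nothing extra is left in the environment.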
