I have nc columns in a data.table and nc scalars in a vector. I want to take a linear combination of the columns, but I don't know in advance which columns I will use. What is the most efficient way to do this?
Setup
require(data.table)
set.seed(1)
n <- 1e5
nc <- 5
cf <- setNames(rnorm(nc), LETTERS[1:nc])
DT <- setnames(data.table(replicate(nc, rnorm(n))), LETTERS[1:nc])
ways to do it
Suppose I want to use the first four columns. I can write by hand:
DT[,list(cf['A']*A+cf['B']*B+cf['C']*C+cf['D']*D)]
I can think of two automatic ways (which work without knowing in advance which of the columns A-E should be used):
mycols <- LETTERS[1:4]
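The two approaches referred to here rely on Map/Reduce and on as.matrix, as the benchmarking discussion below indicates. A minimal sketch of what they might look like follows. The exact expressions, the V1 column name, and the as.vector wrapper are assumptions rather than the original code.

```r
library(data.table)

# Same setup as above, repeated so the sketch runs on its own
set.seed(1)
n  <- 1e5
nc <- 5
cf <- setNames(rnorm(nc), LETTERS[1:nc])
DT <- setnames(data.table(replicate(nc, rnorm(n))), LETTERS[1:nc])

mycols <- LETTERS[1:4]

# Option 1 (assumed form): scale each selected column by its coefficient,
# then add the scaled columns together
res_reduce <- DT[, list(V1 = Reduce(`+`, Map(`*`, cf[mycols], .SD))),
                 .SDcols = mycols]

# Option 2 (assumed form): convert the selected columns to a matrix and
# multiply by the coefficient vector; as.vector drops the n x 1 matrix shape
res_matrix <- DT[, list(V1 = as.vector(as.matrix(.SD) %*% cf[mycols])),
                 .SDcols = mycols]

# Hand-written version, for comparison
res_manual <- DT[, list(V1 = cf['A']*A + cf['B']*B + cf['C']*C + cf['D']*D)]
```

Both options index cf by name, so they should keep working if mycols is changed to any other subset of the column names.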
benchmarking
I expect that as.matrix will make the second option slow, and I don't have any intuition for the speed of Map-Reduce combinations.
require(rbenchmark)
options(datatable.verbose=FALSE)
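The benchmark call itself is not shown here. A sketch of how the three variants could be compared with rbenchmark follows. The test labels, the number of replications, and the form of the two automatic expressions are assumptions.

```r
library(data.table)
library(rbenchmark)
options(datatable.verbose = FALSE)

# Same setup as above, repeated so the sketch runs on its own
set.seed(1)
n  <- 1e5
nc <- 5
cf <- setNames(rnorm(nc), LETTERS[1:nc])
DT <- setnames(data.table(replicate(nc, rnorm(n))), LETTERS[1:nc])
mycols <- LETTERS[1:4]

# replications = 20 is an arbitrary choice for illustration
res <- benchmark(
  manual = DT[, list(cf['A']*A + cf['B']*B + cf['C']*C + cf['D']*D)],
  reduce = DT[, list(Reduce(`+`, Map(`*`, cf[mycols], .SD))), .SDcols = mycols],
  matrix = DT[, list(as.vector(as.matrix(.SD) %*% cf[mycols])), .SDcols = mycols],
  replications = 20,
  columns = c("test", "replications", "elapsed", "relative")
)

# Fastest first; "relative" is elapsed time divided by the fastest elapsed time
res[order(res$relative), ]
```

Timings will vary between runs and machines, so it is worth repeating the call a few times before drawing conclusions.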
I get 5% to 40% slowdown relative to the manual approach when I repeat the benchmark call.
my application
The sizes here, n and length(mycols), are close to what I'm working with, but I will do these calculations many times, changing the coefficient vector, cf, each time.
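To make the use case concrete, here is a rough sketch of the repeated calculation. The helper lincomb and the list cf_list are hypothetical names introduced only for illustration, and the ten random coefficient vectors stand in for the real ones.

```r
library(data.table)

set.seed(1)
n  <- 1e5
nc <- 5
DT <- setnames(data.table(replicate(nc, rnorm(n))), LETTERS[1:nc])
mycols <- LETTERS[1:4]

# Hypothetical helper: linear combination of the columns in `cols`,
# using the matching named entries of the coefficient vector `cf`
lincomb <- function(DT, cf, cols) {
  DT[, Reduce(`+`, Map(`*`, cf[cols], .SD)), .SDcols = cols]
}

# Hypothetical stand-in for the real coefficient vectors: 10 random draws
cf_list <- replicate(10, setNames(rnorm(nc), LETTERS[1:nc]), simplify = FALSE)

# One result vector of length n per coefficient vector
results <- lapply(cf_list, lincomb, DT = DT, cols = mycols)
```

Only cf changes between iterations, while DT and mycols stay fixed, so any per-call overhead is paid once for every coefficient vector.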
performance r data.table linear-algebra
Frank