The fastest way to take the weighted sum of matrix columns in R

I need a weighted sum of each matrix column.

data <- matrix(1:2e7,1e7,2) # warning large number, will eat up >100 megs of memory weights <- 1:1e7/1e5 system.time(colSums(data*weights)) system.time(apply(data,2,function(x) sum(x*weights))) all.equal(colSums(data*weights), apply(data,2,function(x) sum(x*weights))) 

Usually colSums(data*weights) is faster than calling the application.

I often perform this operation (on a large matrix). Therefore, the search for recommendations on the most effective implementation. Ideally, it would be great if we could pass the scales to colSums (or rowSums).

Thanks, appreciate any ideas!

+7
source share
2 answers

colSums and * are internal or primitive functions and will be much faster than the apply approach

Another approach you could try is to use some basic matrix algebra when you are looking

  weights %*% data 

The matrix multiplication method does not work faster, but it will avoid creating a temporary object of size data

 system.time({.y <- colSums(data * weights)}) ## user system elapsed ## 0.12 0.03 0.16 system.time({.x <- weights %*% data}) ## user system elapsed ## 0.20 0.05 0.25 
+8
source

Rcpp leads to increased performance (especially with a large number of columns).

 library(Rcpp) library(inline) src <- ' Rcpp::NumericMatrix dataR(data); Rcpp::NumericVector weightsR(weights); int ncol = dataR.ncol(); Rcpp::NumericVector sumR(ncol); for (int col = 0; col<ncol; col++){ sumR[col] = Rcpp::sum(dataR( _, col)*weightsR); } return Rcpp::wrap(sumR);' weighted.colSums <- cxxfunction( signature(data="numeric", weights="numeric"), src, plugin="Rcpp") data <- matrix(as.numeric(1:1e7),1e5,100) # warning large object weights <- 1:1e5/1e5 all.equal(colSums(data*weights), weighted.colSums(data, weights)) ## [1] TRUE print(system.time(colSums(data*weights))) ## user system elapsed ## 0.065 0.001 0.064 print(system.time(as.vector(weighted.colSums(data, weights)))) ## user system elapsed ## 0.019 0.001 0.019 all.equal(as.vector(weights %*% data), weighted.colSums(data, weights)) ## [1] TRUE print(system.time(weights %*% data)) ## user system elapsed ## 0.066 0.001 0.066 
+3
source

All Articles