Why is a hand-written Rcpp vectorized math function faster than its base R counterpart?

Well, I think I know the answer, but, inspired by this question, I would like to hear some opinions on the following: why is the Rcpp exercise below about 15% faster (for long vectors) than the built-in exp()? We all know that Rcpp is a wrapper around the R/C API, so we should expect slightly worse performance.

    Rcpp::cppFunction("
       NumericVector exp2(NumericVector x) {
          NumericVector z = Rcpp::clone(x);
          int n = z.size();
          for (int i=0; i<n; ++i)
             z[i] = exp(z[i]);
          return z;
       }
    ")

    library("microbenchmark")
    x <- rcauchy(1000000)
    microbenchmark(exp(x), exp2(x), unit="relative")
    ## Unit: relative
    ##     expr      min       lq   median       uq      max neval
    ##   exp(x) 1.159893 1.154143 1.155856 1.154482 0.926272   100
    ##  exp2(x) 1.000000 1.000000 1.000000 1.000000 1.000000   100
2 answers

Base R tends to do more checking for NA, so we can gain a little by skipping it. Also note that with tricks such as loop unrolling (as done in Rcpp Sugar), we can do a little better still.

So I added

    Rcpp::cppFunction("NumericVector expSugar(NumericVector x) { return exp(x); }")

and with this I get a further gain, with less code on the user side:

    R> microbenchmark(exp(x), exp2(x), expSugar(x), unit="relative")
    Unit: relative
            expr     min      lq    mean  median      uq     max neval
          exp(x) 1.11190 1.11130 1.11718 1.10799 1.08938 1.02590   100
         exp2(x) 1.08184 1.08937 1.07289 1.07621 1.06382 1.00462   100
     expSugar(x) 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000   100
    R>
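The two effects mentioned at the top of this answer (per-element NA checks and loop unrolling) can be illustrated in plain C++, outside of R. This is only a sketch under assumptions: `exp_checked`, `exp_unrolled` and the `nan_produced` flag are hypothetical names, and neither function is a copy of what base R or Rcpp Sugar actually does internally.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: inspects every result for NaN, roughly the kind of
// per-element bookkeeping attributed to base R above.
// `nan_produced` is a hypothetical flag, not part of any R API.
std::vector<double> exp_checked(const std::vector<double>& x, bool& nan_produced) {
    std::vector<double> out(x.size());
    nan_produced = false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i]);
        if (std::isnan(out[i])) {
            if (std::isnan(x[i])) out[i] = x[i];  // pass the incoming NaN through
            else nan_produced = true;             // NaN created by the computation
        }
    }
    return out;
}

// Same computation without checks, with the loop body unrolled four times.
std::vector<double> exp_unrolled(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> out(n);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = std::exp(x[i]);
        out[i + 1] = std::exp(x[i + 1]);
        out[i + 2] = std::exp(x[i + 2]);
        out[i + 3] = std::exp(x[i + 3]);
    }
    for (; i < n; ++i) out[i] = std::exp(x[i]);  // leftover elements
    return out;
}
```

Both functions return the same values; only the amount of work per element differs. An optimizing compiler may unroll the simple loop on its own, so the gap between the two can be small.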

If you really want to gain performance, the code needs to be written so that it uses the concurrency of the underlying hardware. You can do this with the RcppParallel package; its parallelFor would be ideal here.
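The idea of splitting the loop across cores can be sketched in plain C++. This is a stand-in built on `std::thread`, not RcppParallel code: `ExpWorker` and `exp_threaded` are hypothetical names, and the `operator()(begin, end)` shape is merely chosen to resemble the range-based worker objects that parallelFor operates on.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical worker: applies exp() to the half-open index range [begin, end).
struct ExpWorker {
    const std::vector<double>& input;
    std::vector<double>& output;

    void operator()(std::size_t begin, std::size_t end) const {
        for (std::size_t i = begin; i < end; ++i)
            output[i] = std::exp(input[i]);
    }
};

// Hypothetical helper (not RcppParallel's parallelFor): splits [0, n) into
// contiguous chunks and runs the worker on each chunk in its own thread.
std::vector<double> exp_threaded(const std::vector<double>& x, unsigned n_threads = 4) {
    const std::size_t n = x.size();
    std::vector<double> out(n);
    if (n == 0) return out;
    if (n_threads == 0) n_threads = 1;

    const std::size_t k = std::min<std::size_t>(n_threads, n);
    const std::size_t chunk = (n + k - 1) / k;  // ceiling division

    ExpWorker worker{x, out};
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        // Chunks do not overlap, so the threads never write the same element.
        pool.emplace_back([&worker, begin, end] { worker(begin, end); });
    }
    for (auto& t : pool) t.join();
    return out;
}
```

Because every thread writes to its own slice of the output, no locking is needed. Starting threads has a cost, so a split like this tends to pay off only for long vectors such as the million-element one benchmarked here.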

You can also try a more modern implementation of R/C++. The next version of Rcpp11, due out in a few days, will thread sugar expressions automatically, which makes expSugar from the previous answer faster.

Consider:

    #include <Rcpp.h>
    using namespace Rcpp ;

    // [[Rcpp::export]]
    NumericVector exp2(NumericVector x) {
       NumericVector z = Rcpp::clone(x);
       int n = z.size();
       for (int i=0; i<n; ++i)
          z[i] = exp(z[i]);
       return z;
    }

    // [[Rcpp::export]]
    NumericVector expSugar(NumericVector x) {
        return exp(x) ;
    }

    /*** R
        library(microbenchmark)
        x <- rcauchy(1000000)
        microbenchmark(exp(x), exp2(x), expSugar(x))
    */

With Rcpp I get:

    $ RcppScript /tmp/exp.cpp

    > library(microbenchmark)

    > x <- rcauchy(1e+06)

    > microbenchmark(exp(x), exp2(x), expSugar(x))
    Unit: milliseconds
            expr      min       lq   median       uq      max neval
          exp(x) 7.027006 7.222141 7.421041 8.631589 21.78305   100
         exp2(x) 6.631870 6.790418 7.064199 8.145561 31.68552   100
     expSugar(x) 6.491868 6.761909 6.888111 8.154433 27.36302   100

So a nice but somewhat anecdotal improvement, which can be explained by various factors such as inlining, as described in the other answers and comments.

With Rcpp11 and automatically threaded sugar, I get:

    $ Rcpp11Script /tmp/exp.cpp

    > library(microbenchmark)

    > x <- rcauchy(1e+06)

    > microbenchmark(exp(x), exp2(x), expSugar(x))
    Unit: milliseconds
            expr      min       lq   median       uq      max neval
          exp(x) 7.029882 7.077804 7.336214 7.656472 15.38953   100
         exp2(x) 6.636234 6.748058 6.917803 7.017314 12.09187   100
     expSugar(x) 1.652322 1.780998 1.962946 2.261093 12.91682   100