Your function calls the length method in each iteration of the loop. The compiler does not seem to cache it. To fix this vector storage size in a separate variable or use a range-based loop. Also note that we really do not need to explicitly check for missing values, because in C ++ all comparisons with NaN always return false .
Let me compare the performance:
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] IntegerVector tabulate1(const IntegerVector& x, const unsigned max) { IntegerVector counts(max); for (std::size_t i = 0; i < x.size(); i++) { if (x[i] > 0 && x[i] <= max) counts[x[i] - 1]++; } return counts; } // [[Rcpp::export]] IntegerVector tabulate2(const IntegerVector& x, const unsigned max) { IntegerVector counts(max); std::size_t n = x.size(); for (std::size_t i = 0; i < n; i++) { if (x[i] > 0 && x[i] <= max) counts[x[i] - 1]++; } return counts; } // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] IntegerVector tabulate3(const IntegerVector& x, const unsigned max) { IntegerVector counts(max); for (auto& now : x) { if (now > 0 && now <= max) counts[now - 1]++; } return counts; } // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] IntegerVector tabulate4(const IntegerVector& x, const unsigned max) { IntegerVector counts(max); for (auto it = x.begin(); it != x.end(); it++) { if (*it > 0 && *it <= max) counts[*it - 1]++; } return counts; } /***R library(microbenchmark) x <- sample(10, 1e5, rep = TRUE) microbenchmark( tabulate(x, 10), tabulate1(x, 10), tabulate2(x, 10), tabulate3(x, 10), tabulate4(x, 10) ) x[sample(10e5, 10e3)] <- NA microbenchmark( tabulate(x, 10), tabulate1(x, 10), tabulate2(x, 10), tabulate3(x, 10), tabulate4(x, 10) ) */
tabulate1 is the original version.
Test results:
Without NA :
Unit: microseconds expr min lq mean median uq max neval tabulate(x, 10) 143.557 146.8355 169.2820 156.1970 177.327 286.370 100 tabulate1(x, 10) 390.706 392.6045 437.7357 416.5655 443.065 748.767 100 tabulate2(x, 10) 108.149 111.4345 139.7579 118.2735 153.118 337.647 100 tabulate3(x, 10) 107.879 111.7305 138.2711 118.8650 139.598 300.023 100 tabulate4(x, 10) 391.003 393.4530 436.3063 420.1915 444.048 777.862 100
With NA :
Unit: microseconds expr min lq mean median uq max neval tabulate(x, 10) 943.555 1089.5200 1614.804 1333.806 2042.320 3986.836 100 tabulate1(x, 10) 4523.076 4787.3745 5258.490 4929.586 5624.098 7233.029 100 tabulate2(x, 10) 765.102 931.9935 1361.747 1113.550 1679.024 3436.356 100 tabulate3(x, 10) 773.358 914.4980 1350.164 1140.018 1642.354 3633.429 100 tabulate4(x, 10) 4241.025 4466.8735 4933.672 4717.016 5148.842 8603.838 100
The tabulate4 function, which uses an iterator, is also slower than tabulate . We can improve it: