Equivalent to "which" function in Rcpp

I am new to C ++ and Rcpp. Suppose I have a vector

t1<-c(1,2,NA,NA,3,4,1,NA,5) 

and I want to get the index of the elements t1 that are NA . I can write:

 NumericVector retIdxNA(NumericVector x) { // Step 1: get the positions of NA in the vector LogicalVector y=is_na(x); // Step 2: count the number of NA int Cnt=0; for (int i=0;i<x.size();i++) { if (y[i]) { Cnt++; } } // Step 3: create an output matrix whose size is same as that of NA // and return the answer NumericVector retIdx(Cnt); int Cnt1=0; for (int i=0;i<x.size();i++) { if (y[i]) { retIdx[Cnt1]=i+1; Cnt1++; } } return retIdx; } 

then i get

 retIdxNA(t1) [1] 3 4 8 

It was interesting to me:

(i) is there any equivalent of which in Rcpp?

(ii) is there a way to make the above function shorter / sharper? In particular, is there an easy way to combine steps 1, 2, 3 above?

+2
r rcpp
source share
4 answers

The latest version of RcppArmadillo has functions for identifying end and non-end indexes.

So this code

 #include <RcppArmadillo.h> // [[Rcpp::depends(RcppArmadillo)]] // [[Rcpp::export]] arma::uvec whichNA(arma::vec x) { return arma::find_nonfinite(x); } /*** R t1 <- c(1,2,NA,NA,3,4,1,NA,5) whichNA(t1) */ 

gives the desired answer (module one by one in C / C ++, since they are based on zero):

 R> sourceCpp("/tmp/uday.cpp") R> t1 <- c(1,2,NA,NA,3,4,1,NA,5) R> whichNA(t1) [,1] [1,] 2 [2,] 3 [3,] 7 R> 

Rcpp can do this if you first create a sequence for a subset:

 // [[Rcpp::export]] Rcpp::IntegerVector which2(Rcpp::NumericVector x) { Rcpp::IntegerVector v = Rcpp::seq(0, x.size()-1); return v[Rcpp::is_na(x)]; } 

Added to the code above it gives:

 R> which2(t1) [1] 2 3 7 R> 

The logical subset is also somewhat new in Rcpp.

+8
source share

Try the following:

 #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] IntegerVector which4( NumericVector x) { int nx = x.size(); std::vector<int> y; y.reserve(nx); for(int i = 0; i < nx; i++) { if (R_IsNA(x[i])) y.push_back(i+1); } return wrap(y); } 

which we can run, like this, in R:

 > which4(t1) [1] 3 4 8 

Performance

Note that we have changed the above solution to the backup space for the output vector. This replaces which3 , which:

 // [[Rcpp::export]] IntegerVector which3( NumericVector x) { int nx = x.size(); IntegerVector y; for(int i = 0; i < nx; i++) { // if (internal::Rcpp_IsNA(x[i])) y.push_back(i+1); if (R_IsNA(x[i])) y.push_back(i+1); } return y; } 

Then the performance on a vector of 9 elements is the following with which4 the fastest:

 > library(rbenchmark) > benchmark(retIdxNA(t1), whichNA(t1), which2(t1), which3(t1), which4(t1), + replications = 10000, order = "relative")[1:4] test replications elapsed relative 5 which4(t1) 10000 0.14 1.000 4 which3(t1) 10000 0.16 1.143 1 retIdxNA(t1) 10000 0.17 1.214 2 whichNA(t1) 10000 0.17 1.214 3 which2(t1) 10000 0.25 1.786 

Repeating this for a 9000-element vector, the Armadillo solution comes pretty quickly than the others. Here which3 (which coincides with which4 , except that it does not reserve space for the output vector) comes in the worst case, and which4 takes the second place.

 > tt <- rep(t1, 1000) > benchmark(retIdxNA(tt), whichNA(tt), which2(tt), which3(tt), which4(tt), + replications = 1000, order = "relative")[1:4] test replications elapsed relative 2 whichNA(tt) 1000 0.09 1.000 5 which4(tt) 1000 0.79 8.778 3 which2(tt) 1000 1.03 11.444 1 retIdxNA(tt) 1000 1.19 13.222 4 which3(tt) 1000 23.58 262.000 
+4
source share

All of the above solutions are serial. Although this is not trivial, it is entirely possible to use threads to implement which . See this post for more details. Although for such a small size it will not do more harm than good. How to take a plane a short distance, you lose too much time at the airport security.

R implements which by allocating memory for a logical vector, such as an input, performs one pass to store indexes in that memory, and then copies it to the appropriate logical vector.

Intuitively, this seems less efficient than a double-pass loop, but not necessary, since copying a range of data is cheap. More details here .

+4
source share

Just write a function for yourself, like:

 which_1<-function(a,b){ return(which(a>b)) } 

Then pass this function to rcpp.

-one
source share

All Articles