Rcpp function to select (and return) a sub-dataframe

Is it possible to write a C++ function that takes an R data frame as input, modifies it (in this case, takes a subset), and returns the result as a new data frame? The code below should make my question clearer:


    # Suppose I have the data frame below created in R:
    myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))

    # Suppose I want to write a C++ function that gets id as input and returns
    # a sub-dataframe corresponding to that id (**if it is possible to return
    # a DataFrame in C++**)

    // Auxiliary function --> helps get a sub-vector:
    arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
      arma::uvec AuxVec = find(IDVec == ID);
      arma::vec rslt = arma::vec(AuxVec.size());
      for (int i = 0; i < AuxVec.size(); i++){
        rslt[i] = vecMain[AuxVec[i]];
      }
      return rslt;
    }

    // Here is my C++ function:
    Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
      arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
      arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]);
      arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);

      // here I take a sub-vector:
      arma::vec id_sub = myVecSubset(id, id, ID);
      arma::vec alph_sub = myVecSubset(alph, id, ID);
      arma::vec mess_sub = myVecSubset(mess, id, ID);

      // here is the CHALLENGE: How to combine these vectors into a new data frame???
      ???
    }

So there are actually two main questions: 1) Is there a better way to take the sub-dataframe above in C++? (I wish I could simply write myDF[myDF$id == ID, ]!)

2) Is there any way to combine id_sub, alph_sub and mess_sub into an R data frame and return it?

I really appreciate your help.

3 answers

You do not need Rcpp and RcppArmadillo for this; you can just use R's subset or perhaps dplyr::filter. That is likely to be more efficient than your code, which has to deep-copy the data from the data frame into Armadillo vectors, create new Armadillo vectors, and then copy those back into R vectors so that you can build the data frame. This produces a lot of waste. Another source of waste is that you run find three times to compute exactly the same thing.
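To illustrate the last point, the matching row positions can be computed once and then reused for every column. This is only a minimal sketch in plain C++, with std::vector standing in for the data frame columns; the helper names matching_rows and take_rows are hypothetical and are not part of Rcpp or Armadillo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: 0-based positions of the rows whose id equals `wanted`.
std::vector<std::size_t> matching_rows(const std::vector<int>& ids, int wanted) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (ids[i] == wanted) rows.push_back(i);
    }
    return rows;
}

// Hypothetical helper: copy the selected rows out of a single column.
template <typename T>
std::vector<T> take_rows(const std::vector<T>& column,
                         const std::vector<std::size_t>& rows) {
    std::vector<T> out;
    out.reserve(rows.size());
    for (std::size_t r : rows) {
        assert(r < column.size());  // every position must be a valid row
        out.push_back(column[r]);
    }
    return out;
}
```

With this shape, matching_rows runs a single time on the id column, and take_rows is then applied to each of the columns with the same index vector.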

In any case, to answer your question, just use DataFrame::create.

    DataFrame::create(
      _["id"]    = id_sub,
      _["alpha"] = alph_sub,
      _["mess"]  = mess_sub
    );

Also note that in your code, alpha will be a factor, so arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]); is unlikely to do what you want.
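The reason is that an R factor is stored as integer codes with a separate table of level labels, so a numeric conversion would most likely hand back the codes rather than the letters. A rough model in plain C++, where the Factor struct and labels_of function are hypothetical stand-ins and not Rcpp types:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an R factor: 1-based integer codes plus level labels.
struct Factor {
    std::vector<int> codes;
    std::vector<std::string> levels;
};

// Recover the labels by looking each code up in the levels table.
std::vector<std::string> labels_of(const Factor& f) {
    std::vector<std::string> out;
    out.reserve(f.codes.size());
    for (int code : f.codes) {
        assert(code >= 1 && static_cast<std::size_t>(code) <= f.levels.size());
        out.push_back(f.levels[code - 1]);  // codes are 1-based, as in R
    }
    return out;
}
```

Reading only the codes field here corresponds to what a numeric vector built from the factor would contain; the letters live in levels.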


To add to Romain's answer, you can try calling the [ operator through Rcpp. Once we understand how df[x, ] is evaluated (i.e., it is really a call to "[.data.frame"(df, x, R_MissingArg)), this is easy to do:

    #include <Rcpp.h>
    using namespace Rcpp;

    Function subset("[.data.frame");

    // [[Rcpp::export]]
    DataFrame subset_test(DataFrame x, IntegerVector y) {
      return subset(x, y, R_MissingArg);
    }

    /*** R
    df <- data.frame(x=1:3, y=letters[1:3])
    subset_test(df, c(1L, 2L))
    */

gives me

    > df <- data.frame(x=1:3, y=letters[1:3])
    > subset_test(df, c(1L, 2L))
      x y
    1 1 a
    2 2 b

Calling back into R from Rcpp is generally slower, but depending on how much of a bottleneck this is, it might still be fast enough for you.

Be careful, though: for integer vectors this function uses 1-based subsetting, not 0-based subsetting.
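In practice this means that positions computed on the C++ side (0-based) have to be shifted by one before they are handed to "[.data.frame". A minimal sketch of that shift in plain C++; the function name to_one_based is hypothetical:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: convert 0-based C++ positions to 1-based R positions.
std::vector<int> to_one_based(const std::vector<int>& zero_based) {
    std::vector<int> one_based;
    one_based.reserve(zero_based.size());
    for (int pos : zero_based) {
        assert(pos >= 0);              // a negative value is not a valid 0-based row
        one_based.push_back(pos + 1);  // row 0 in C++ is row 1 in R
    }
    return one_based;
}
```

For example, the first two rows are {0, 1} in C++ terms and become {1, 2}, which matches the c(1L, 2L) used in the R call above.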


Here is a complete test file. It does not need your extractor function and simply reassembles the subsets, but it requires the newest Rcpp as it currently stands on GitHub, where Kevin happens to have added some work on subset indexing, which is what we need here:

    #include <Rcpp.h>

    /*** R
    ## Suppose I have the data frame below created in R:
    ## NB: stringsAsFactors set to FALSE
    ## NB: setting seed as well
    set.seed(42)
    myDF <- data.frame(id = rep(c(1,2), each = 5),
                       alph = letters[1:10],
                       mess = rnorm(10),
                       stringsAsFactors=FALSE)
    */

    // [[Rcpp::export]]
    Rcpp::DataFrame extract(Rcpp::DataFrame D, Rcpp::IntegerVector idx) {

      // pull each column out with the matching Rcpp vector type
      Rcpp::IntegerVector   id   = D["id"];
      Rcpp::CharacterVector alph = D["alph"];
      Rcpp::NumericVector   mess = D["mess"];

      // index every column with the same (0-based) idx and reassemble
      return Rcpp::DataFrame::create(Rcpp::Named("id")    = id[idx],
                                     Rcpp::Named("alpha") = alph[idx],
                                     Rcpp::Named("mess")  = mess[idx]);
    }

    /*** R
    extract(myDF, c(2,4,6,8))
    */

With this file we get the expected result:

    R> library(Rcpp)
    R> sourceCpp("/tmp/sepher.cpp")

    R> ## Suppose I have the data frame below created in R:
    R> ## NB: stringsAsFactors set to FALSE
    R> ## NB: setting seed as well
    R> set.seed(42)

    R> myDF <- data.frame(id = rep(c(1,2), each = 5),
    +                    alph = letters[1:10],
    +                    mess = rnorm(10),
    +  .... [TRUNCATED]

    R> extract(myDF, c(2,4,6,8))
      id alpha     mess
    1  1     c 0.363128
    2  1     e 0.404268
    3  2     g 1.511522
    4  2     i 2.018424
    R>
    R> packageDescription("Rcpp")$Version    ## unreleased version
    [1] "0.11.1.1"
    R>

I needed something similar just a few weeks ago (though it did not involve character vectors) and used Armadillo with its elem() function, passing an unsigned int vector as the index.
