Building a data frame in Rcpp

I want to build a data frame in Rcpp, but when I get it back in R, it doesn't look like a data frame. I tried pushing vectors, etc., but that gives the same result. Consider:

    RcppExport SEXP makeDataFrame(SEXP in) {
        Rcpp::DataFrame dfin(in);
        Rcpp::DataFrame dfout;
        for (int i = 0; i < dfin.length(); i++) {
            dfout.push_back(dfin(i));
        }
        return dfout;
    }

In R:

    > .Call("makeDataFrame", mtcars, PACKAGE = "myPkg")
    [[1]]
     [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
    [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
    [31] 15.0 21.4

    [[2]]
    [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

    [[3]]
     [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
    [13] 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0 304.0 350.0
    [25] 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0

    [[4]]
     [1] 110 110 93 110 175 105 245 62 95 123 123 180 180 180 205 215 230 66 52
    [20] 65 97 150 150 245 175 66 91 113 264 175 335 109

    [[5]]
     [1] 3.90 3.90 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 3.07 2.93
    [16] 3.00 3.23 4.08 4.93 4.22 3.70 2.76 3.15 3.73 3.08 4.08 4.43 3.77 4.22 3.62
    [31] 3.54 4.11

    [[6]]
     [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
    [13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840
    [25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780

    [[7]]
     [1] 16.46 17.02 18.61 19.44 17.02 20.22 15.84 20.00 22.90 18.30 18.90 17.40
    [13] 17.60 18.00 17.98 17.82 17.42 19.47 18.52 19.90 20.01 16.87 17.30 15.41
    [25] 17.05 18.90 16.70 16.90 14.50 15.50 14.60 18.60

    [[8]]
    [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1

    [[9]]
    [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

    [[10]]
    [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4

    [[11]]
    [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
4 answers

It seems Rcpp can return a correct data.frame if you give the column names explicitly. I'm not sure how to adapt this to your example, where the number of columns and their names are arbitrary.

    mkdf <- '
    Rcpp::DataFrame dfin(input);
    Rcpp::DataFrame dfout;
    for (int i = 0; i < dfin.length(); i++) {
        dfout.push_back(dfin(i));
    }
    return Rcpp::DataFrame::create(Named("x") = dfout(1),
                                   Named("y") = dfout(2));
    '
    library(inline)
    test <- cxxfunction(signature(input = "data.frame"), mkdf, plugin = "Rcpp")
    test(input = head(iris))

Short:

  • DataFrames really are just lists with the added requirement that all columns have a common length, so it's best to create them column by column.

  • The best way is to look at our unit tests. The file inst/unitTests/runit.DataFrame.R groups the tests for the DataFrame class.

  • You also found the .push_back() member function in Rcpp, which we added for convenience and by analogy with the STL. We caution that it is not recommended: because of differences in how R objects are created, we essentially always need to make full copies, so .push_back is not very efficient.

  • Even though I answer here often, the rcpp-devel list is the best place for Rcpp questions.
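To make the copying cost from the .push_back bullet concrete, here is a back-of-the-envelope count. It assumes, as the bullet describes, that each push_back allocates a new vector one element longer and copies every existing element into it, starting from an empty vector and ending with n elements:

```latex
\underbrace{\sum_{k=0}^{n-1} k = \frac{n(n-1)}{2}}_{\text{element copies, one push\_back at a time}}
\qquad\text{vs.}\qquad
\underbrace{n}_{\text{assignments into a preallocated } \texttt{List(n)}}
```

For the 11 columns of mtcars that is 55 copies versus 11 assignments. The difference is small at that size, but it grows quadratically with the number of elements, which is why preallocating a List of the right length and filling it by index is the better pattern.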


Using the information from @baptiste's answer, this is what finally gives a well-formed data frame:

    RcppExport SEXP makeDataFrame(SEXP in) {
        Rcpp::DataFrame dfin(in);
        Rcpp::DataFrame dfout;
        Rcpp::CharacterVector namevec;
        std::string namestem = "Column Heading ";
        for (int i = 0; i < 2; i++) {
            dfout.push_back(dfin(i));
            namevec.push_back(namestem + std::string(1, (char)(((int)'a') + i)));
        }
        dfout.attr("names") = namevec;
        Rcpp::DataFrame x;
        Rcpp::Language call("as.data.frame", dfout);
        x = call.eval();
        return x;
    }

I think this may still be inefficient because of the push_back calls (as @Dirk pointed out) and the second Language call. I looked at the Rcpp unit tests and still could not come up with anything better. Does anyone have an idea?

Update:

Using @Dirk's suggestions (thanks!), this seems to be a simpler and more efficient solution:

    RcppExport SEXP makeDataFrame(SEXP in) {
        Rcpp::DataFrame dfin(in);
        Rcpp::List myList(dfin.length());
        Rcpp::CharacterVector namevec;
        std::string namestem = "Column Heading ";
        for (int i = 0; i < dfin.length(); i++) {
            myList[i] = dfin(i);  // adding vectors
            namevec.push_back(namestem + std::string(1, (char)(((int)'a') + i)));  // making up column names
        }
        myList.attr("names") = namevec;
        Rcpp::DataFrame dfout(myList);
        return dfout;
    }
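Two small caveats about the column names in this version: namevec is still grown with CharacterVector::push_back, which copies on every call, and the expression (char)(((int)'a') + i) only yields letters for the first 26 columns (column 27 would get '{', column 28 '|', and so on). A minimal sketch of an alternative, assuming you are happy to build the names in plain C++ first: the helpers make_column_names and letter_suffix are hypothetical names, and the spreadsheet-style continuation (a..z, aa, ab, ...) is a swapped-in convention, not something the answer specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: letter suffix for 0-based column index i, using
// spreadsheet-style bijective base 26: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
std::string letter_suffix(std::size_t i) {
    std::string s;
    std::size_t k = i + 1;  // shift to 1-based so there is no "zero" digit
    while (k > 0) {
        --k;
        s.insert(s.begin(), static_cast<char>('a' + k % 26));
        k /= 26;
    }
    return s;
}

// Hypothetical helper: build all n names in a std::vector. reserve() means
// push_back never reallocates here, unlike growing an R vector element by element.
std::vector<std::string> make_column_names(std::size_t n, const std::string& stem) {
    std::vector<std::string> names;
    names.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        names.push_back(stem + letter_suffix(i));
    }
    assert(names.size() == n);
    return names;
}
```

Inside the Rcpp function, the namevec lines could then be replaced by a single assignment such as myList.attr("names") = make_column_names(myList.size(), "Column Heading "); since Rcpp wraps a std::vector<std::string> into an R character vector when it is assigned to an attribute.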

I agree with @joran. When a C function is called from R through .C(), the value returned in R is a list of all its arguments, both inputs and outputs, so each "column" of the data frame can be passed to C as a separate argument. Once the result is back in R, all that remains is to extract those list elements by index and give them the appropriate names.
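A minimal sketch of the C side of that approach, assuming the .C() interface. The routine name make_columns and its two output columns (a running total and a square of the input) are hypothetical illustrations, not taken from the answer. Because .C() passes every argument as a pointer, the function is declared extern "C" and fills preallocated output buffers in place:

```cpp
#include <cassert>

// Hypothetical routine meant to be called through R's .C() interface.
// .C() passes every argument as a pointer and returns all of them,
// including the output buffers this function fills, as an R list.
extern "C" void make_columns(const double* x, double* cum, double* sq, const int* n) {
    assert(x != nullptr && cum != nullptr && sq != nullptr && n != nullptr);
    double running = 0.0;
    for (int i = 0; i < *n; ++i) {
        running += x[i];
        cum[i] = running;     // output column 1: running total of x
        sq[i] = x[i] * x[i];  // output column 2: x squared
    }
}
```

From R, a call along the lines of res <- .C("make_columns", x = mtcars$mpg, cum = double(nrow(mtcars)), sq = double(nrow(mtcars)), n = nrow(mtcars), PACKAGE = "myPkg") returns a list whose elements are named after the arguments, so data.frame(cum = res$cum, sq = res$sq) gives the named data frame. The output vectors have to be allocated in R beforehand, since .C() cannot grow them.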

