Creating an R data.frame in Python with low-level rpy2

I am using the rpy2 package to port some R functionality to Python. The R functions I use need a data.frame object, and I can build one with rlike.TaggedList followed by robjects.DataFrame.

However, I am having performance problems compared to running the same R functions on exactly the same data in R itself, which made me try the low-level rpy2 interface as described here: http://rpy.sourceforge.net/rpy2/doc-2.3/html/performances.html

So far I have tried:

  • Using TaggedList with FloatSexpVector objects instead of numpy arrays, followed by a DataFrame.
  • Dropping the TaggedList and DataFrame classes and using a dictionary instead, like this:

    d = dict((var_name, var_sexp_vector) for ...)
    dataframe = robjects.r('data.frame')(**d)

Neither gave me a noticeable speed-up.

I noticed that DataFrame objects can take an rinterface.SexpVector in their constructor, so I thought about creating such a named vector, but I have no idea how to set the names (in R I would just write names(vec) = c('a', 'b', ...)).

How can I do this? Is there another way? And is there an easy way to profile rpy2, so I can find out where the bottleneck is?

EDIT:

The following code seems to work fine (about 4x faster) with the newer version of rpy2 (2.2.3):

    data = ro.r('list')([ri.FloatSexpVector(x) for x in vectors])[0]
    data.names = ri.StrSexpVector(vector_names)

However, this does not work with version 2.0.8 (the latest version available for Windows), since R does not seem to be able to use the names: "Error in eval(expr, envir, enclos) : object 'y' not found"

Ideas?

EDIT #2: Someone did a great job of building an rpy2 2.3 binary for Windows (Python 2.7). The code above works fine with it (almost 6x faster for my code).

link: https://bitbucket.org/breisfeld/rpy2_w32_fix/issue/1/binary-installer-for-win32
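Speed-up figures like "4x" and "6x" are easiest to compare when every construction route is timed the same way. Below is a minimal sketch using only the standard library's timeit module (Python 3); the two builder functions are hypothetical pure-Python stand-ins, to be replaced by zero-argument wrappers around the TaggedList/DataFrame route and the low-level route.

```python
import timeit


def best_time_per_call(func, number=20, repeat=5):
    """Best-of-`repeat` wall time (seconds) for a single call of func()."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def compare(candidates, number=20, repeat=5):
    """Map each label to its best time per call, fastest first."""
    timings = {label: best_time_per_call(func, number, repeat)
               for label, func in candidates.items()}
    return dict(sorted(timings.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Hypothetical stand-ins: replace with the two data.frame construction
    # routes, each wrapped in a function that takes no arguments.
    def build_with_loop():
        out = []
        for i in range(10000):
            out.append(float(i))
        return out

    def build_with_comprehension():
        return [float(i) for i in range(10000)]

    timings = compare({"loop": build_with_loop,
                       "comprehension": build_with_comprehension})
    for label, seconds in timings.items():
        print("%-13s %.6f s per call" % (label, seconds))
    print("loop / comprehension: %.1fx"
          % (timings["loop"] / timings["comprehension"]))
```

Taking the minimum over several repeats reduces noise from other processes; it measures the best case, not the average.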

Answer:

Python can be several times faster than R (even byte-compiled R), and I have been able to run operations on R data structures through rpy2 faster than in R itself. Seeing the relevant R and rpy2 code together would help in making more specific recommendations (and, if necessary, in improving rpy2).

In the meantime, SexpVector may not be what you want: it is nothing more than an abstract class for all R vectors (see the class diagram for rpy2.rinterface). ListSexpVector might be more appropriate:

    import rpy2.rinterface as ri
    ri.initr()
    l = ri.ListSexpVector([ri.IntSexpVector((1, 2, 3)), ri.StrSexpVector(("a", "b", "c"))])

An important detail is that R lists are recursive data structures, and R avoids a catch-22 by having the operator "[[" in addition to "[". Python has no equivalent, and I have not yet implemented "[[" as a method at the low level.
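One way to read the catch-22: because an element of a list can itself be a list, a single indexing operator cannot tell whether lst[1] should mean "a sub-list holding element 1" or "element 1 itself". R answers with two operators, "[" for the first and "[[" for the second, while Python's `__getitem__` gives a binding only one. The class below is a hypothetical toy model of R list semantics in plain Python, not part of rpy2, meant only to make the distinction concrete.

```python
from typing import Any, Optional, Sequence


class RList:
    """Hypothetical toy model of an R list (not an rpy2 class)."""

    def __init__(self, values: Sequence[Any], names: Optional[Sequence[str]] = None):
        self.values = list(values)
        self.names = list(names) if names is not None else None

    @staticmethod
    def _index(position: int) -> int:
        # R indices are 1-based; reject 0 and negatives instead of letting
        # Python silently wrap around to the end of the list.
        if position < 1:
            raise IndexError("R-style positions start at 1")
        return position - 1

    def subset(self, *positions: int) -> "RList":
        """Like R's lst[i]: always returns an RList, even for one element."""
        idx = [self._index(p) for p in positions]
        names = [self.names[i] for i in idx] if self.names is not None else None
        return RList([self.values[i] for i in idx], names)

    def extract(self, position: int) -> Any:
        """Like R's lst[[i]]: returns the element itself (possibly a list)."""
        return self.values[self._index(position)]

    def __len__(self) -> int:
        return len(self.values)


if __name__ == "__main__":
    lst = RList([[1, 2, 3], ["a", "b", "c"]], names=["x", "y"])
    print(lst.subset(1).names, lst.subset(1).values)  # a one-element RList
    print(lst.extract(1))                             # the element itself
```

With only one operator available, a binding has to choose which of these two meanings square brackets get and expose the other as a named method.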

Profiling in Python can be done, for example, with the standard library's cProfile module.
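A minimal sketch of that, written for Python 3 and using only cProfile, pstats and io: `profile_call` runs one call under the profiler and returns the report as text, sorted by cumulative time. `build_columns` is a hypothetical stand-in for the Python-side data preparation; in real use you would pass the function that builds the data.frame through rpy2. cProfile only sees Python-level calls, so time spent inside R should show up attributed to whichever rpy2 call triggered it.

```python
import cProfile
import io
import pstats


def build_columns(n_cols, n_rows):
    # Hypothetical stand-in for the Python-side work of preparing column data
    # before it is handed to rpy2; replace it with the real conversion code.
    return {"col%d" % i: [float(j) for j in range(n_rows)] for i in range(n_cols)}


def profile_call(func, *args, sort_key="cumulative", limit=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the outermost expensive calls come first.
    stats.sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    columns, report = profile_call(build_columns, 5, 10000)
    print(report)
```

Sorting by "tottime" instead of "cumulative" highlights functions that are slow in their own body rather than in what they call.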
