I am using the rpy2 package to bring some R functionality into Python. The R functions I use need a data.frame object, and I can build one by using rlike.TaggedList and then robjects.DataFrame.
However, I am having performance problems compared to the same R functions with exactly the same data, which made me try the low-level rpy2 interface as described here: http://rpy.sourceforge.net/rpy2/doc-2.3/html/performances.html
So far I have tried:
- Using TaggedList with FloatSexpVector objects instead of numpy arrays and a DataFrame.
- Dropping the TaggedList and DataFrame classes and using a dictionary like this:
    d = dict((var_name, var_sexp_vector) for ...)
    dataframe = robjects.r('data.frame')(**d)
Neither approach gave me a noticeable speedup.
I noticed that DataFrame objects can take an rinterface.SexpVector in their constructor, so I thought about creating such a named vector, but I have no idea how to set the names (in R I know it is just names(vec) = c('a', 'b', ...)).
How can I do this? Is there another way? And is there an easy way to profile rpy, so that I can find out where the bottleneck is?
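For the Python side at least, I assume the standard library's cProfile module could show how the time splits between building the vectors and the call into R. Below is a minimal sketch of what I have in mind. Here build_columns and call_r are hypothetical stand-ins for my real conversion code and the actual rpy2 call, not rpy2 functions. As far as I understand, cProfile only sees Python-level calls, so time spent inside R itself would show up as a single opaque call rather than being broken down further.

```python
import cProfile
import io
import pstats


def build_columns(n_cols, n_rows):
    # Hypothetical stand-in for the Python-side conversion step
    # (e.g. turning arrays into R vectors).
    return {"v%d" % i: [float(j) for j in range(n_rows)] for i in range(n_cols)}


def call_r(columns):
    # Hypothetical stand-in for the call into R
    # (e.g. data.frame construction plus the R function itself).
    return sum(sum(col) for col in columns.values())


def run():
    return call_r(build_columns(20, 10000))


profiler = cProfile.Profile()
result = profiler.runcall(run)  # runcall returns whatever run() returns

# Sort by cumulative time so the most expensive call chains come first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

If most of the cumulative time lands in the conversion step, the bottleneck is on the Python side; if it lands in the call into R, this kind of profiling will not say more than that.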
EDIT:
The following code seems to work fine (x4 faster) on the newer version of rpy (2.2.3):
    data = ro.r('list')([ri.FloatSexpVector(x) for x in vectors])[0]
    data.names = ri.StrSexpVector(vector_names)
However, this does not work with version 2.0.8 (the latest one supported on Windows), since R does not seem to be able to use the names: "Error in eval(expr, envir, enclos): object 'y' not found"
Ideas?
EDIT #2: Someone did a great job of building an rpy2 2.3 binary for Windows (Python 2.7), and the above works fine with it (almost x6 faster for my code).
link: https://bitbucket.org/breisfeld/rpy2_w32_fix/issue/1/binary-installer-for-win32