Load a high-dimensional R dataset into a Pandas DataFrame

Some R datasets can be loaded into a Pandas DataFrame or Panel quite easily:

    import pandas.rpy.common as com
    infert = com.load_data('infert')
    print(infert.head())

This seems to work as long as the dimension of the R dataset is <= 3. Higher-dimensional datasets print an error message:

    In [67]: com.load_data('Titanic')
    Cannot handle dim=4

This error message comes from the _convert_array function in rpy/common.py.

Of course, it makes sense that Pandas cannot directly fit a 4-dimensional array into a DataFrame or Panel, but is there some workaround for loading datasets like Titanic into a DataFrame (possibly with a hierarchical index)?

2 answers

With Pandas version 0.13.0 or later, pandas.rpy.common.load_data can load higher-dimensional datasets such as Titanic:

    import pandas.rpy.common as com
    df = com.load_data('Titanic')
    print(df.head())

gives

      Survived    Age     Sex Class  value
    0       No  Child    Male   1st    0.0
    1       No  Child    Male   2nd    0.0
    2       No  Child    Male   3rd   35.0
    3       No  Child    Male  Crew    0.0
    4       No  Child  Female   1st    0.0
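Since the question asks about a hierarchical index, here is a minimal sketch of how the long-format result above could be turned into one. It does not call pandas.rpy; the five rows are typed in by hand from the output shown, and the variable names (counts, wide, third_class_boys) are only illustrative.

```python
import pandas as pd

# First five rows of the long-format output shown above, typed in by hand.
df = pd.DataFrame({
    "Survived": ["No"] * 5,
    "Age": ["Child"] * 5,
    "Sex": ["Male", "Male", "Male", "Male", "Female"],
    "Class": ["1st", "2nd", "3rd", "Crew", "1st"],
    "value": [0.0, 0.0, 35.0, 0.0, 0.0],
})

# Move the four factor columns into a MultiIndex; "value" becomes a Series.
counts = df.set_index(["Survived", "Age", "Sex", "Class"])["value"]

# Look up a single cell by its full label tuple.
third_class_boys = counts.loc[("No", "Child", "Male", "3rd")]
print(third_class_boys)

# Pivot one index level into columns for a wider view.
# Combinations missing from these five rows show up as NaN.
wide = counts.unstack("Class")
print(wide)
```

With the full 32-row frame, the same set_index call should give one entry per cell of the original 4-dimensional table.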

Using @joran's very helpful suggestion, and after installing the reshape package with

    % sudo R
    R> install.packages('reshape')

I was able to load the Titanic dataset into a Pandas DataFrame using

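    import pandas as pd
    import pandas.rpy.common as com
    import rpy2.robjects as ro

    r = ro.r
    # Load the R reshape package, which provides melt()
    r('library(reshape)')
    # melt() flattens the 4-dimensional table into long format in R,
    # so convert_robj only has to handle an ordinary data frame
    df = com.convert_robj(r('melt(Titanic)'))
    print(df.head())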

which printed

      Class     Sex    Age Survived  value
    1   1st    Male  Child       No      0
    2   2nd    Male  Child       No      0
    3   3rd    Male  Child       No     35
    4  Crew    Male  Child       No      0
    5   1st  Female  Child       No      0
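To make clearer what melt is doing here, the following is a rough pure-Python sketch of the same flattening. It is not how reshape or pandas implement it. The helper name melt_array is hypothetical, the dimension labels are read off the output above (the second label of Age and Survived is assumed from R's built-in Titanic table), and only the first five cell values, taken from that output, are supplied.

```python
from itertools import product


def melt_array(flat_values, dimnames):
    """Flatten an R-style array into long-format rows (hypothetical helper).

    flat_values: cell values in R's column-major order, i.e. the first
                 dimension varies fastest.
    dimnames:    dict mapping dimension name -> list of labels, in
                 dimension order.
    """
    names = list(dimnames)
    # product() varies its LAST argument fastest, so iterate over the
    # dimensions in reverse and flip each combination back afterwards.
    combos = product(*(dimnames[n] for n in reversed(names)))
    rows = []
    # zip() stops at the shorter input, so a partial value vector simply
    # yields the leading rows.
    for value, combo in zip(flat_values, combos):
        row = dict(zip(names, reversed(combo)))
        row["value"] = value
        rows.append(row)
    return rows


# Dimension labels in the array's own dimension order.
titanic_dims = {
    "Class": ["1st", "2nd", "3rd", "Crew"],
    "Sex": ["Male", "Female"],
    "Age": ["Child", "Adult"],
    "Survived": ["No", "Yes"],
}

# Only the first five cells, copied from the output above.
rows = melt_array([0, 0, 35, 0, 0], titanic_dims)
for row in rows:
    print(row)
```

Each row carries one label per dimension plus the cell value, which is why a dataset of any dimension fits into a two-dimensional DataFrame once it has been melted.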