Pandas.DataFrame.load / save between python2 and python3: pickle protocol issues

I don't know how to make pickle load / save work between Python 2 and 3 with pandas DataFrames. The pickler has a "protocol" option that I played with unsuccessfully, but I hope someone has a quick idea for me to try. Here is the code to reproduce the error:

python2.7

    >>> import pandas; from pylab import *
    >>> a = pandas.DataFrame(randn(10,10))
    >>> a.save('a2')
    >>> a = pandas.DataFrame.load('a2')
    >>> a = pandas.DataFrame.load('a3')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
        return com.load(path)
      File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
        return pickle.load(f)
    ValueError: unsupported pickle protocol: 3

python3

    >>> import pandas; from pylab import *
    >>> a = pandas.DataFrame(randn(10,10))
    >>> a.save('a3')
    >>> a = pandas.DataFrame.load('a3')
    >>> a = pandas.DataFrame.load('a2')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
        return com.load(path)
      File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
        return pickle.load(f)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)

Maybe expecting pickle to work across Python versions is a bit optimistic?

3 answers

I had the same problem. You can change the protocol of a pickled DataFrame file with the following function in python3:

    import pickle

    def change_pickle_protocol(filepath, protocol=2):
        with open(filepath, 'rb') as f:
            obj = pickle.load(f)
        with open(filepath, 'wb') as f:
            pickle.dump(obj, f, protocol=protocol)

Then you should be able to open it in python2 without any problems.
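The function above covers the Python 3 to Python 2 direction. For the reverse case in the question (the UnicodeDecodeError when Python 3 reads a file written by Python 2), a minimal sketch is to retry with the `encoding` argument of `pickle.load`. The helper name `load_py2_pickle` is my own, and I am assuming the file contains Python 2 byte strings (for example NumPy data) that Python 3 tries to decode as ASCII. Whether a DataFrame from a very old pandas version then unpickles cleanly also depends on the pandas versions involved.

```python
import pickle

def load_py2_pickle(filepath):
    """Load a pickle in Python 3, retrying with latin1 for Python 2 byte strings.

    Hypothetical helper, not part of pandas or the standard library.
    """
    with open(filepath, 'rb') as f:
        try:
            # Default behaviour: Python 2 str objects are decoded as ASCII.
            return pickle.load(f)
        except UnicodeDecodeError:
            # Rewind and retry; latin1 maps every byte to a character,
            # so arbitrary Python 2 byte strings can be decoded.
            f.seek(0)
            return pickle.load(f, encoding='latin1')
```

Usage would be `obj = load_py2_pickle('a2')` from Python 3.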


You can override the highest protocol available in the pickle module:

    import pickle as pkl
    import pandas as pd

    if __name__ == '__main__':
        # this constant is defined in pickle.py in the pickle package:
        pkl.HIGHEST_PROTOCOL = 2
        # 'foo.pkl' was saved in pickle protocol 4
        df = pd.read_pickle(r"C:\temp\foo.pkl")
        # 'foo_protocol_2' will be saved in pickle protocol 2
        # and can be read in pandas with Python 2
        df.to_pickle(r"C:\temp\foo_protocol_2.pkl")

This is definitely not an elegant solution, but it works without modifying the pandas code.
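If changing the constant for the whole process feels too intrusive, one possible variation is to wrap the override in a context manager so the original value is restored afterwards. This is only a sketch: `capped_pickle_protocol` and `dump_like_old_pandas` are names I made up, and the second one merely stands in for library code that reads `pickle.HIGHEST_PROTOCOL` at call time. The trick has no effect on code that hard-codes a protocol number or passes `-1`.

```python
import contextlib
import pickle

@contextlib.contextmanager
def capped_pickle_protocol(protocol=2):
    """Temporarily set pickle.HIGHEST_PROTOCOL (hypothetical helper)."""
    original = pickle.HIGHEST_PROTOCOL
    pickle.HIGHEST_PROTOCOL = protocol
    try:
        yield
    finally:
        # Always restore the real value, even if saving fails.
        pickle.HIGHEST_PROTOCOL = original

def dump_like_old_pandas(obj):
    # Stand-in for library code that looks up the constant when called.
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

with capped_pickle_protocol(2):
    data = dump_like_old_pandas([1, 2, 3])
```

With pandas, the call inside the `with` block would be the `df.to_pickle(...)` line from the answer above, assuming that pandas version looks up the constant the same way.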


If you use pandas.DataFrame.to_pickle(), you can make the following modifications to the pandas source code so that the pickle protocol becomes configurable:

1) In the source file /pandas/io/pickle.py (before changing, copy the source file as /pandas/io/pickle.py.ori ), find the following lines:

    def to_pickle(obj, path):
        pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)

Change these lines to:

    def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):
        pkl.dump(obj, f, protocol=protocol)

2) In the source file /pandas/core/generic.py (before changing, copy the source file as /pandas/core/generic.py.ori ), find the following lines:

    def to_pickle(self, path):
        return to_pickle(self, path)

Change these lines to:

    def to_pickle(self, path, protocol=None):
        return to_pickle(self, path, protocol)

3) Restart the Python kernel if it is running, then save your DataFrame using one of the available pickle protocols (0, 1, 2, 3, 4):

    # Python 2.x can read this
    df.to_pickle('my_dataframe.pck', protocol=2)

    # no protocol given (None): pickle falls back to its Python 3 default
    # (protocol 3 or higher), which Python 2.x cannot read
    df.to_pickle('my_dataframe.pck')

4) After updating pandas, repeat steps 1 and 2.

5) (optional) Ask the developers to include this feature in the official releases (because your code will throw exceptions in any other Python environment without these changes).
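As an alternative to patching the installed pandas files, here is a rough sketch that writes the object through the standard `pickle` module with an explicit protocol and then checks the file header. Both function names are hypothetical. I am assuming that pickling the DataFrame directly is equivalent to what `to_pickle` does in your pandas version. As far as I know, newer pandas releases also accept a `protocol` argument in `to_pickle`, so check the documentation for your version first.

```python
import pickle

def to_pickle_with_protocol(obj, path, protocol=2):
    """Pickle obj to path with an explicit protocol (hypothetical helper)."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f, protocol=protocol)

def pickle_protocol_of(path):
    """Return the protocol declared in the file header.

    Protocols 2 and later start with the byte 0x80 followed by the
    protocol number; protocols 0 and 1 have no header, so None is returned.
    """
    with open(path, 'rb') as f:
        header = f.read(2)
    if len(header) == 2 and header[0:1] == b'\x80':
        return header[1]
    return None
```

For example, `to_pickle_with_protocol(df, 'my_dataframe.pck', protocol=2)` followed by `pickle_protocol_of('my_dataframe.pck')` should report 2 before you hand the file to Python 2.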

Have a nice day!
