I have a MacBook (OS X 10.9) with 16 GB of RAM and two Python versions installed via Anaconda: 2.7.8 and 3.4.1. Both have the latest scikit-learn, 0.15.1. When I run this simple code (just to check that large matrices can be serialized):
    import numpy as np
    test_data = np.random.rand(10000, 60000)
    print(test_data.nbytes / 2**30)

    from sklearn.externals import joblib
    joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')
Python 2.7.8 succeeds, but Python 3.4.1 fails with the following error:
    Failed to save <class 'numpy.ndarray'> to .npy file:
    Traceback (most recent call last):
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 240, in save
        obj, filename = self._write_array(obj, filename)
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 203, in _write_array
        self.np.save(filename, array)
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/npyio.py", line 453, in save
        format.write_array(fid, arr)
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/format.py", line 410, in write_array
        fp.write(array.tostring('C'))
    OSError: [Errno 22] Invalid argument

    Traceback (most recent call last):
      File "<ipython-input-3-90ed09e5c6d4>", line 1, in <module>
        joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 368, in dump
        pickler.dump(value)
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 412, in dump
        self.framer.end_framing()
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 196, in end_framing
        self.commit_frame(force=True)
      File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 208, in commit_frame
        write(data)
    OSError: [Errno 22] Invalid argument
The problem seems to be related to the amount of data being saved. For example, Python 3 has no trouble with np.random.rand(10000, 20000), which is about 1.5 GB.
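For reference, the two sizes can be worked out from the shapes alone, without allocating anything. This is only a sketch: the helper name `array_gib` is made up for illustration, and it assumes float64 elements (8 bytes each), which is what np.random.rand returns. The comparison against 2 GiB is a guess at where a limit might lie; I have not bisected the exact threshold.

```python
def array_gib(rows, cols, itemsize=8):
    """Size in GiB of a rows x cols array with `itemsize` bytes per element.

    itemsize=8 assumes float64 (hypothetical helper, for illustration only).
    """
    return rows * cols * itemsize / float(2**30)

failing = array_gib(10000, 60000)  # the array that fails to save on Python 3
working = array_gib(10000, 20000)  # the array that saves without problems

print(round(failing, 2), round(working, 2))
# Guessed threshold of 2 GiB: the failing array is above it, the working one below.
print(failing > 2, working > 2)
```

So the failing array is roughly three times the size of the one that works, and only the failing one exceeds 2 GiB.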
Plain pickle does not work either:
    import pickle
    with open('/Users/va/Desktop/test_data.pkl', 'wb') as f:
        pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)
which fails with:
    Traceback (most recent call last):
      File "<ipython-input-6-3f73f3011539>", line 3, in <module>
        pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)
    OSError: [Errno 22] Invalid argument
On Windows 7, Python 3.4 works fine with both joblib and pickle.
Any suggestions for resolving this issue with Python 3 on Mac?
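One workaround I could try, assuming the failure depends on how many bytes are passed to a single write() call (an assumption I have not verified), is to pickle to memory first and then write the bytes to disk in smaller pieces. The sketch below is untested on the 4.5 GB array; the function names `dump_in_chunks` and `load_in_chunks` and the 1 GiB default chunk size are arbitrary choices of mine, not part of joblib or pickle.

```python
import pickle

def dump_in_chunks(obj, path, chunk_size=2**30):
    """Pickle obj, then write it to path with at most chunk_size bytes per write()."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    with open(path, 'wb') as f:
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])

def load_in_chunks(path, chunk_size=2**30):
    """Read path back with at most chunk_size bytes per read(), then unpickle."""
    buf = bytearray()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
    return pickle.loads(bytes(buf))
```

It would be called as dump_in_chunks(test_data, '/Users/va/Desktop/test_data.pkl'). The obvious cost is memory: pickle.dumps holds a full serialized copy next to the original array, and loading makes a further copy, so this only makes sense when there is RAM to spare.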