Python unpickling: fix \r characters before loading

I have a pickled object (a list with several numpy arrays) that was created on Windows and apparently saved to a file opened in text mode rather than binary mode (i.e. with open(filename, 'w') instead of open(filename, 'wb')). The result is that I now can't unpickle it (not even on Windows), because it is littered with \r characters (and possibly more?). The main complaint is

 ImportError: No module named multiarray 

supposedly because it is looking for numpy.core.multiarray\r, which of course does not exist. Just removing the \r characters didn't do the trick (I tried both sed -e 's/\r//g' and, in Python, s = file.read().replace('\r', ''), but both broke the file and later gave cPickle.UnpicklingError).

The problem is that I really need to get the data out of these objects. Any ideas on how to fix the files?

Edit: Upon request, here are the first few hundred bytes of my file, as a Python repr:

 \x80\x02]q\x01(}q\x02(U\r\ntotal_timeq\x03G?\x90\x15r\xc9(s\x00U\rreaction_timeq\x04NU\x0ejump_directionq\x05cnumpy.core.multiarray\r\nscalar\r\nq\x06cnumpy\r\ndtype\r\nq\x07U\x02f8K\x00K\x01\x87Rq\x08(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\x025\x9d\x13\xfc#\xc8?\x86Rq\tU\x14normalised_directionq\r\nh\x06h\x08U\x08\xf0\xf9,\x0eA\x18\xf8?\x86Rq\x0bU\rjump_distanceq\x0ch\x06h\x08U\x08\x13\x14\xea&\xb0\x9b\x1a@\x86Rq\rU\x04jumpq\x0ecnumpy.core.multiarray\r\n_reconstruct\r\nq\x0fcnumpy\r\nndarray\r\nq\x10K\x00\x85U\x01b\x87Rq\x11(K\x01K\x02\x85h\x08\x89U\x10\x87\x16\xdaEG\xf4\xf3?\x06`OC\xe7"\x1a@tbU\x0emovement_speedq\x12h\x06h\x08U\x08\\p\xf5[2\xc2\xef?\x86Rq\x13U\x0ctrial_lengthq\x14G@\t\x98\x87\xf8\x1a\xb4\xbaU\tconditionq\x15U\x0bhigh_mentalq\x16U\x07subjectq\x17K\x02U\x12movement_directionq\x18h\x06h\x08U\x08\xde\x06\xcf\x1c50\xfd?\x86Rq\x19U\x08positionq\x1ah\x0fh\x10K\x00\x85U\x01b\x87Rq\x1b(K\x01K\x02\x85h\x08\x89U\x10K\xb7\xb4\x07q=\x1e\xc0\xf2\xc2YI\xb7U&\xc0tbU\x04typeq\x1ch\x0eU\x08movementq\x1dh\x0fh\x10K\x00\x85U\x01b\x87Rq\x1e(K\x01K\x02\x85h\x08\x89U\x10\xad8\x9c9\x10\xb5\xee\xbf\xffa\xa2hWR\xcf?tbu}q\x1f(h\x03G@\t\xba\xbc\xb8\xad\xc8\x14h\x04G?\xd9\x99%]\xadV\x00h\x05h\x06h\x08U\x08\xe3X\xa9=\xc1\xb1\xeb?\x86Rq h\r\nh\x06h\x08U\x08\x88\xf7\xb9\xc1\t\xd6\xff?\x86Rq!h\x0ch\x06h\x08U\x08v\x7f\xeb\x11\xea5\r@\x86Rq"h\x0eh\x0fh\x10K\x00\x85U\x01b\x87Rq#(K\x01K\x02\x85h\x08\x89U\x10\xcd\xd9\x92\x9a\x94=\x06@]C\xaf\xef\xeb\xef\x02@tbh\x12h\x06h\x08U\x08-\x9c&\x185\xfd\xef?\x86Rq$h\x14G@\r\xb8W\xb2`V\xach\x15h\x16h\x17K\x02h\x18h\x06h\x08U\x08\x8e\x87\xd1\xc2

You can also download the entire file (22k).

+7
4 answers

Assuming the file was created using the default protocol (0, which is ASCII-compatible), you should be able to load it anywhere using open('pickled_file', 'rU'), i.e. universal newlines.

If this does not work, show us the first few hundred bytes: run print repr(open('pickled_file', 'rb').read(200)) and paste the result into an edit of your question.
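
As a rough aid for reading that output, here is a minimal sketch (Python 3, standard library only). It reports the protocol number, if the stream starts with the PROTO opcode '\x80' that protocol 2 and later emit, and counts the '\r\n' pairs in the bytes it is given. The function name inspect_pickle_head is made up for illustration, and the sample bytes are copied from the start of the dump in the question.

```python
def inspect_pickle_head(head: bytes):
    """Return (protocol, crlf_count) for the leading bytes of a pickle file.

    protocol is None when there is no PROTO header, which is the case
    for protocols 0 and 1.
    """
    has_proto_header = len(head) >= 2 and head[0] == 0x80
    protocol = head[1] if has_proto_header else None
    return protocol, head.count(b"\r\n")


# Leading bytes taken from the dump posted in the question.
sample = b"\x80\x02]q\x01(}q\x02(U\r\ntotal_timeq\x03"
protocol, crlf_count = inspect_pickle_head(sample)
print(protocol, crlf_count)
```

On a real file, the bytes would come from open('pickled_file', 'rb').read(200). A binary protocol combined with a non-zero '\r\n' count is a hint, not proof, that the file went through a text-mode write.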

Update after the file contents were posted:

Your file starts with '\x80\x02'; it was dumped with protocol 2, the latest and best. Protocols 1 and 2 are binary protocols. Your file was written in text mode on Windows, which caused each '\n' to be converted to '\r\n' by the C runtime. Pickle files should be opened in binary mode, as follows:

    import pickle

    # Write in binary mode ('wb') so no newline translation happens
    with open('result.pickle', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

    # Read in binary mode ('rb') for the same reason
    with open('result.pickle', 'rb') as f:
        obj = pickle.load(f)

The docs are here. This code will work on both Windows and non-Windows systems.

You can recover the original pickle image by reading the file in binary mode and then reversing the damage by replacing all occurrences of '\r\n' with '\n'. Note: this recovery procedure is necessary whether or not you are trying to read the file on Windows.
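
A minimal sketch of that recovery, written for Python 3 with only the standard library. The function name repair_text_mode_pickle and the sample dictionary are made up for illustration. The damage is simulated in memory so the example runs anywhere; loading the numpy-based file from the question would additionally need numpy installed.

```python
import pickle


def repair_text_mode_pickle(damaged: bytes) -> bytes:
    """Undo the '\\n' -> '\\r\\n' translation of a Windows text-mode write."""
    return damaged.replace(b"\r\n", b"\n")


# Build a protocol-2 pickle that contains '\n' bytes: the 10-character key
# and the integer 10 are both encoded with a 0x0A byte.
original = pickle.dumps({"total_time": 0.0157, "values": list(range(20))},
                        protocol=2)

# Simulate what the text-mode write did to the stream.
damaged = original.replace(b"\n", b"\r\n")

repaired = repair_text_mode_pickle(damaged)
data = pickle.loads(repaired)
print(repaired == original, data["total_time"])
```

For a file on disk, read the bytes with open(path, 'rb'), pass them through the function, and either call pickle.loads on the result or write it to a new file opened with 'wb'. Keep the damaged file untouched until the repaired copy loads.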

+11

Newlines on Windows are not just '\r'; they are CRLF, i.e. '\r\n'.

Give file.read().replace('\r\n', '\n') a try. Previously you deleted every carriage return, including ones that may not have been part of a newline.
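
The difference can be shown with a small sketch (Python 3, standard library only; the dictionary is made-up sample data). In a binary pickle, a byte with value 13 ('\r') can be ordinary data such as a length field: the dump in the question contains U\rreaction_time, where '\r' is the length of the 13-character string. Deleting every '\r' removes those bytes too.

```python
import pickle

# "reaction_time" is 13 characters long, so its length field is a 0x0D ('\r')
# byte; "total_time" is 10 characters long, giving a 0x0A ('\n') byte.
original = pickle.dumps({"reaction_time": None, "total_time": 0.0157},
                        protocol=2)

# Simulate the Windows text-mode write.
damaged = original.replace(b"\n", b"\r\n")

strip_all_cr = damaged.replace(b"\r", b"")    # what sed 's/\r//g' does
undo_crlf = damaged.replace(b"\r\n", b"\n")   # only undo the translation

print(strip_all_cr == original, undo_crlf == original)
```

Only the second replacement gets back the original bytes, because it removes just the carriage returns that the text-mode write inserted in front of each '\n'.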

+5

Can't you, on Windows, just open the file in text mode, the same way it was written, read it, and then write it to another file that is correctly opened in binary mode?

0

Have you tried unpickling in text mode? I.e.

 x = pickle.load(open(filename, 'r')) 

(On Windows, of course.)

0