Why does loading and re-dumping a pickle in Python inflate the object's size on disk?

I have a pickled object in a file called b1.pkl:

    $ ls -l b*
    -rw-r--r-- 1 fireball staff 64743950 Oct 11 15:32 b1.pkl

Then I run the following Python code to load the object and dump it to a new file:

    import numpy as np
    import cPickle as pkl

    fin = open('b1.pkl', 'r')
    fout = open('b2.pkl', 'w')
    x = pkl.load(fin)
    pkl.dump(x, fout)
    fin.close()
    fout.close()

The file created by this code is almost three times as large:

    $ ls -l b*
    -rw-r--r-- 1 fireball staff  64743950 Oct 11 15:32 b1.pkl
    -rw-r--r-- 1 fireball staff 191763914 Oct 11 15:47 b2.pkl

Can someone explain why the new file is so much larger than the original? It should contain exactly the same structure.

3 answers

Perhaps the original pickle used a different protocol. For example, try passing protocol=2 as a keyword argument to the second pickle.dump call and test again. A binary pickle should be much smaller.
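To see how much the protocol alone matters, here is a minimal sketch in Python 3 (the thread itself uses Python 2's cPickle) that pickles the same float-heavy dictionary with every available protocol and prints the byte counts. The data and the `sizes_by_protocol` helper are made up for illustration, and exact numbers will vary between runs.

```python
import pickle
import random


def sizes_by_protocol(obj):
    """Return {protocol: pickled size in bytes} for every supported protocol."""
    return {
        proto: len(pickle.dumps(obj, protocol=proto))
        for proto in range(pickle.HIGHEST_PROTOCOL + 1)
    }


# Hypothetical float-heavy data, loosely in the spirit of the question.
data = {i: [random.random() for _ in range(25)] for i in range(1000)}

for proto, size in sizes_by_protocol(data).items():
    print(f"protocol {proto}: {size:>9,} bytes")
```

With protocol 0 each float is written as its decimal repr followed by a newline, while protocols 1 and up store it as an 8-byte binary double, so the text protocol comes out noticeably larger for this kind of data.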


Most likely, your original b1.pkl was pickled with a more compact binary protocol (1 or 2), so the file started out smaller.

When you load with cPickle, it automatically detects the protocol from the file. But when you dump the object again with the default arguments, it uses protocol 0, which produces a much larger file. It does this for portability and backward compatibility. You have to request a binary protocol explicitly.

    import numpy as np
    import cPickle

    # random data
    s = {}
    for i in xrange(5000):
        s[i] = np.random.randn(5, 5)

    # pickle it out the first time with binary protocol
    with open('data.pkl', 'wb') as f:
        cPickle.dump(s, f, 2)

    # read it back in and pickle it out with default args
    with open('data.pkl', 'rb') as f:
        with open('data2.pkl', 'wb') as o:
            s = cPickle.load(f)
            cPickle.dump(s, o)

    $ ls -l
    1174109 Oct 11 16:05 data.pkl
    3243157 Oct 11 16:08 data2.pkl
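The point that loading detects the protocol while dumping falls back to a default can be checked directly. This is a small Python 3 sketch with a made-up `obj`: `pickle.loads` is called without any protocol argument, and protocol 2 and later output begins with a PROTO opcode (byte `0x80` followed by the version number) while protocol 0 output does not. Python 3's default is already a binary protocol, so protocol 0 is passed explicitly here to mimic Python 2's default.

```python
import pickle

obj = {"a": [1.5, 2.5], "b": (3, 4)}

for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
    blob = pickle.dumps(obj, protocol=proto)
    # Protocols >= 2 start with the PROTO opcode (0x80) followed by the version byte.
    header = f"PROTO {blob[1]}" if blob[:1] == b"\x80" else "no PROTO opcode"
    # loads() reads everything it needs from the data; no protocol argument is passed.
    assert pickle.loads(blob) == obj
    print(f"protocol {proto}: {len(blob)} bytes, {header}")
```

Protocol 1 also has no PROTO header, so a missing header alone does not tell protocol 0 apart from protocol 1.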

pkl.dump(x, fout, 2) will probably produce a file of the same size as the original. Without an explicit protocol version, dump uses the old protocol 0.
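If the goal is to write the object back the way it came, one option is to inspect the original pickle and reuse its protocol. Below is a sketch in Python 3 built around two hypothetical helpers, `detect_protocol` and `repickle_like` (neither is a standard-library function). They rely on `pickletools.genops`: a protocol 2+ pickle starts with a PROTO opcode that carries the exact version, and for older pickles the sketch falls back to the highest protocol required by any opcode present, which is only a lower bound.

```python
import pickle
import pickletools
import random


def detect_protocol(data: bytes) -> int:
    """Best-effort guess of the protocol used to write `data`."""
    ops = list(pickletools.genops(data))
    first_op, first_arg, _ = ops[0]
    if first_op.name == "PROTO":
        # Protocol 2+ pickles record their exact version up front.
        return first_arg
    # No header: use the highest protocol any opcode needs (a lower bound).
    return max(op.proto for op, _, _ in ops)


def repickle_like(data: bytes, obj) -> bytes:
    """Hypothetical helper: dump `obj` with the protocol detected in `data`."""
    return pickle.dumps(obj, protocol=detect_protocol(data))


# Made-up stand-in for b1.pkl: written once with protocol 2, then loaded.
original = pickle.dumps(
    {i: [random.random() for _ in range(5)] for i in range(1000)}, protocol=2
)
obj = pickle.loads(original)

print("original:      ", len(original))
print("protocol 0:    ", len(pickle.dumps(obj, protocol=0)))
print("same protocol: ", len(repickle_like(original, obj)))
```

For this simple data the re-dump with the detected protocol matches the original's size, while the protocol 0 dump is larger. For objects with shared references or custom `__reduce__` methods the output can differ even with the same protocol.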


Source: https://habr.com/ru/post/927512/

