What is the best Python zip module for handling large files?

EDIT: In particular, compression and extraction speed.

Any suggestions?

thanks

Tags: performance, python, compression, extraction, zip
2 answers

So, I made a reasonably large zip file:

    $ ls -l *zip
    -rw-r--r--  1 aleax  5000  115749854 Nov 18 19:16 large.zip
    $ unzip -l large.zip | wc
       23396   93633 2254735

i.e., 116 MB with 23.4 thousand files in it, and the timings:

    $ time unzip -d /tmp large.zip >/dev/null
    real    0m14.702s
    user    0m2.586s
    sys     0m5.408s

this is the system-supplied command-line unzip binary - no doubt as finely tuned and optimized as a pure C executable can be. Then (after cleaning up /tmp ;-) ...:

    $ time py26 -c'from zipfile import ZipFile; z=ZipFile("large.zip"); z.extractall("/tmp")'
    real    0m13.274s
    user    0m5.059s
    sys     0m5.166s

... and this is Python with its standard library - a little more demanding of CPU time, but more than 10% faster in real, that is, elapsed, time.

You can, of course, repeat such measurements (on your specific platform - if it is short on CPU, e.g. a slow ARM chip, then Python's extra demands on CPU time may end up making it slower - and with your specific zipfiles of interest, since each large zip file will have a very different mix of contents and quite possibly different performance). But it suggests to me that there is not much room to build a Python extension that is much faster than the good old zipfile module - since Python using it beats the system-supplied pure-C unzip ;-)
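If you want to repeat the measurement without relying on the shell's time command, here is a minimal sketch (assuming the same large.zip in the current directory and /tmp as the target, as above) that times the standard-library extraction from inside Python:

    import time
    import zipfile

    start = time.time()
    z = zipfile.ZipFile("large.zip")
    names = z.namelist()            # entries listed in the archive's central directory
    z.extractall("/tmp")            # pure-stdlib extraction, available since Python 2.6
    z.close()
    print("extracted %d entries in %.2f seconds" % (len(names), time.time() - start))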


To process large files without loading them into memory, use the new streaming methods in Python 2.6's zipfile, for example ZipFile.open. Do not use extract or extractall unless you have strongly sanitized the file names in the ZIP archive.

(You used to have to read all the bytes into memory, or work around it with something like zipstream; that is now obsolete.)
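As an illustration only (my sketch, not the answerer's code), streaming extraction with ZipFile.open plus a simple name check could look like the following; the chunk size and the rejection rule are assumptions, adjust them to your needs:

    import os
    import shutil
    import zipfile

    def safe_extract(zip_path, dest_dir, chunk_size=64 * 1024):
        z = zipfile.ZipFile(zip_path)
        for name in z.namelist():
            # Reject absolute paths and ".." components instead of writing
            # wherever a hostile archive tells us to.
            if name.startswith("/") or ".." in name.split("/"):
                continue
            target = os.path.join(dest_dir, name)
            if name.endswith("/"):
                if not os.path.isdir(target):
                    os.makedirs(target)
                continue
            parent = os.path.dirname(target)
            if parent and not os.path.isdir(parent):
                os.makedirs(parent)
            src = z.open(name)          # file-like object, decompressed on the fly
            dst = open(target, "wb")
            shutil.copyfileobj(src, dst, chunk_size)  # copy in chunks, not all at once
            dst.close()
            src.close()
        z.close()

Calling safe_extract("large.zip", "/tmp") then behaves much like extractall, but never holds more than one chunk of decompressed data in memory at a time.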

