So, I made a large zip file of somewhat arbitrary size:
    $ ls -l *zip
    -rw-r--r--  1 aleax  5000  115749854 Nov 18 19:16 large.zip
    $ unzip -l large.zip | wc
       23396   93633 2254735
i.e., 116 MB with 23.4 thousand files in it.
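(If you want to build a comparable test archive yourself, here is a minimal sketch - the file count, sizes, and names are my own arbitrary choices for illustration, not the exact mix of the archive above:)

    import os
    import zipfile

    # Roughly comparable archive: ~23,000 files of ~5 KB each.
    with zipfile.ZipFile("large.zip", "w", zipfile.ZIP_DEFLATED) as z:
        payload = os.urandom(5000)  # incompressible, so the zip stays large
        for i in range(23000):
            z.writestr("file%05d.dat" % i, payload)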
Then I timed the extraction:

    $ time unzip -d /tmp large.zip >/dev/null

    real    0m14.702s
    user    0m2.586s
    sys     0m5.408s
This is the system-supplied unzip command-line binary - no doubt as finely tuned and optimized as a pure-C executable can be. Then (after clearing /tmp ;-) ...:
    $ time py26 -c'from zipfile import ZipFile; z=ZipFile("large.zip"); z.extractall("/tmp")'

    real    0m13.274s
    user    0m5.059s
    sys     0m5.166s
... and this is Python with its standard library - a bit more demanding of CPU time, but over 10% faster in real, i.e. elapsed, time.
You're welcome to repeat such measurements, of course - on your specific platform (if it's CPU-poor, e.g. a slow ARM chip, Python's extra demands on CPU time may end up making it slower overall) and with your specific zipfiles of interest, since each large zipfile will have a very different mix of contents and quite possibly different performance. But this suggests to me that there isn't much room to build a Python extension much faster than good old zipfile - since Python using it beats the pure-C, system-supplied unzip ;-)
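If you want a single script that repeats the comparison, here is a minimal sketch - it assumes an unzip binary on your PATH and a large.zip in the current directory, and it targets a modern Python 3 (subprocess.run) rather than the py26 used above:

    import shutil
    import subprocess
    import tempfile
    import time
    import zipfile

    ARCHIVE = "large.zip"  # assumed to exist; see above

    def time_one(label, extract):
        dest = tempfile.mkdtemp()  # fresh target dir, like clearing /tmp
        t0 = time.perf_counter()
        extract(dest)
        print("%s: %.3fs elapsed" % (label, time.perf_counter() - t0))
        shutil.rmtree(dest)

    def unzip_binary(dest):
        subprocess.run(["unzip", "-q", "-d", dest, ARCHIVE], check=True)

    def python_zipfile(dest):
        with zipfile.ZipFile(ARCHIVE) as z:
            z.extractall(dest)

    time_one("unzip binary  ", unzip_binary)
    time_one("python zipfile", python_zipfile)

Note that wall-clock numbers from a sketch like this will still vary run to run with filesystem caching, so repeat the runs a few times before drawing conclusions.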
Alex Martelli