What is the fastest bzip2 decompressor?

Which bzip2 implementation has the highest decompression speed?

There is http://bitbucket.org/james_taylor/seek-bzip2/src/tip/micro-bunzip.c, which claims:

Size and speed optimizations by Manuel Novoa III (mjn3@codepoet.org): more efficient reading of Huffman codes, a streamlined read_bunzip() function, and various other tweaks. In (limited) testing, it is about 20% faster than bzcat on x86 and about 10% faster on ARM. Note that about 2/3 of the time is spent in read_bunzip() reversing the Burrows-Wheeler transform. Much of that time is delay caused by cache misses.

Many of these cache misses could probably be eliminated with suitable techniques, so even faster implementations should be possible.

seek-bzip2 also has an interesting feature: it makes it easy to seek within the input file.

My program will consume the bzip2 output and could (theoretically) process different parts of the file in parallel, so parallel bzip2 implementations are also worth considering.

Thanks.

2 answers

Here are some comparisons (parallel versions included): http://lists.debian.org/debian-mentors/2009/02/msg00135.html

A bit more here: http://realworldtech.com/forums/index.cfm?action=detail&id=98883&threadid=98430&roomid=2

Linked from there is Intel's Cilk-parallel version of bzip2: http://software.intel.com/en-us/articles/a-parallel-bzip2/

In addition, the Intel IPP-accelerated bzip2 is very good. IPP also tries (with a negative effect) to parallelize some of bzip2's internals using OpenMP (Intel KMP 5), without decompressing whole blocks in parallel. When it is limited to one or two threads, 20 MB/s of decompressed output is realistic on a 2.4 GHz Core 2 (IPP "v8" code).

Hope this helps.


If you have access to multiprocessor machines (it is easy to spin up a multiprocessor virtual machine on Amazon EC2 or Digital Ocean) or machines with lots of RAM, you should definitely check out PBZIP2:

PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines.


To illustrate: I am currently decompressing a large 17 GB file. bzip2 wrote the decompressed output at about 10 Mbps; PBZIP2 now writes it at about 160 Mbps. I run it like this:

 pbzip2 -v -d -k -m10000 file.bz2 

i.e. -v verbose, -d decompress, -k keep the input file, -m10000 allow up to 10 GB of RAM (the limit is given in MB)

It is running on a Digital Ocean machine with 64 GB of RAM and 20 CPUs, which costs $0.952/hour. :-)
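Assuming pbzip2 was using all p = 20 CPUs, the throughput figures above give a speedup S and parallel efficiency E of roughly:

$$S = \frac{160}{10} = 16, \qquad E = \frac{S}{p} = \frac{16}{20} = 0.8$$

That is about 80% of ideal linear scaling, which is consistent with the "near-linear speedup" claim in the PBZIP2 description.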

