I need to choose a compression algorithm

I need to choose a compression algorithm to compress some data. I do not know in advance what type of data I will be compressing (think of it as a general-purpose archiver like WinRAR).

I have heard about the following algorithms, but I do not know which one I should use. Can someone post a short list of pros and cons? For my application, the first priority is decompression speed; the second priority is space savings. Compression speed (as opposed to decompression speed) does not matter.

  • Deflate
  • Implode
  • Plain Huffman
  • bzip2
  • LZMA
+6
Tags: language-agnostic, algorithm, compression
5 answers

If you need fast decompression, you should use LZO. Its compression speed and compression ratio are decent, but its decompression speed is hard to beat.
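A minimal usage sketch, assuming the third-party python-lzo binding (the module name and its compress/decompress API are assumptions about that package, not something from this thread):

    import lzo  # third-party binding: pip install python-lzo

    with open("input.bin", "rb") as f:  # illustrative file name
        data = f.read()

    compressed = lzo.compress(data)        # done once, ahead of time
    restored = lzo.decompress(compressed)  # the fast path this answer is about

    assert restored == data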

+5

I ran several tests compressing a .tar file that contained a mix of high-entropy data and text. Here are the results:

  Name  | Compression rate* | Decompression time
  7zip  | 87.8%             | 0.703 s
  bzip2 | 80.3%             | 1.661 s
  gzip  | 72.9%             | 0.347 s
  lzo   | 70.0%             | 0.111 s

  * Higher is better

From this, I came to the conclusion that an algorithm's compression ratio depends on its name: the earlier it comes in alphabetical order, the better its compression ratio, and so on.

So I decided to rename lzo to 1lzo. Now I have the best algorithm.
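(For anyone who wants to reproduce a test like this, here is a rough sketch using only Python's standard library; lzo and 7zip are omitted since they are not in it, and the file name is illustrative.)

    import bz2, gzip, lzma, time

    with open("sample.tar", "rb") as f:
        data = f.read()

    codecs = {
        "bzip2": (bz2.compress, bz2.decompress),
        "gzip": (gzip.compress, gzip.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }

    for name, (compress, decompress) in codecs.items():
        blob = compress(data)                  # compression time is not measured
        start = time.perf_counter()
        decompress(blob)                       # only decompression is timed
        elapsed = time.perf_counter() - start
        saved = 100.0 * (1 - len(blob) / len(data))
        print(f"{name}: {saved:.1f}% saved, {elapsed:.3f}s to decompress")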


EDIT: It is worth noting that, unfortunately, lzo is the only one of these with a very restrictive license (GPL) :(

+10

The Linux kernel documentation explains this well (for the algorithms included there):

  • Deflate (gzip) - fast, worst compression of the three
  • bzip2 - slow, middle compression
  • lzma - very slow to compress, fast to decompress (though slower than gzip), best compression

I have not used the others, so it is hard to say, but the speed of these algorithms can depend heavily on the architecture. For example, there are studies showing that compressing data on the hard drive actually speeds up I/O, because the CPU is so much faster than the disk that the extra work pays off. However, this depends largely on where the bottlenecks are.

Similarly, one algorithm can use memory extensively, which may or may not cause problems (is 12 MiB a lot or a little? On an embedded system it is a lot; on a modern x86 machine it is a tiny fraction of memory).
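As one concrete example of the memory point: in LZMA, the decompressor's memory use is driven by the dictionary size chosen at compression time. A minimal sketch with Python's standard lzma module, using an illustrative 1 MiB dictionary:

    import lzma

    # Decompressor memory for LZMA scales with the dictionary size picked
    # at compression time, so memory-constrained targets can trade some
    # compression ratio for a smaller dictionary. 1 MiB is illustrative.
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 20}]

    data = b"example payload " * 4096
    blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

    assert lzma.decompress(blob) == data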

+4

Take a look at 7zip. It is open source and supports 7 separate compression methods. In the limited testing we did, the 7z format produced a much smaller output file than zip, and it was also faster on our sample data.

Since our standard compression is zip, we have not looked at other compression methods yet.
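A minimal sketch of driving it from a script, assuming the 7z command-line tool is on the PATH (archive and directory names are illustrative):

    import subprocess

    # "a" adds files to an archive; "x" extracts preserving paths.
    subprocess.run(["7z", "a", "archive.7z", "data/"], check=True)
    subprocess.run(["7z", "x", "archive.7z", "-ooutput/"], check=True)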

+2

For an exhaustive benchmark on text data, you can check the Large Text Compression Benchmark.

For other types of data, it may still be indicative.

+1
