Low memory LZW compression / decompression

Can anyone give me pointers on how I can implement LZW compression/decompression in low-memory conditions (< 2 KB), if that is even possible?

+8
c algorithm embedded compression lzw
8 answers

The zlib library that everyone uses is bloated, among other problems (for embedded use). I am fairly sure it will not work for your case. I had a little more memory, maybe 16 KB, and could not get it to fit. It allocates and zeroes large chunks of memory, keeps copies of things, and so on. The algorithm can be done in that little memory; finding existing code that does it is the challenge.

I went with http://lzfx.googlecode.com. The decompression loop is tiny; it is the older LZ-type compression that relies on the prior results, so you need access to the already-decompressed output: the next byte is a 0x5, the next byte is a 0x23, the next 15 bytes are a copy of the 15 bytes found 200 bytes back, the next 6 bytes are a copy from 127 bytes back, and so on. The newer LZ algorithms are based on a variable-width code table that can be large or grow depending on the implementation.
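To make that concrete, here is a minimal sketch of this style of decoder (my illustration; the token layout below is invented and is not the actual lzfx format): a control byte either introduces a run of literal bytes or a back-reference into output already produced, which is why the decoder needs access to the uncompressed results.

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch of a back-reference decoder (NOT the lzfx wire format).
     * A control byte with the top bit set encodes a match of 2..9
     * bytes copied from up to 4095 bytes back in the output;
     * otherwise it encodes a run of 1..128 literal bytes.
     * Returns the decompressed size, or 0 on corrupt input. */
    size_t lz_decode(const uint8_t *in, size_t in_len,
                     uint8_t *out, size_t out_cap)
    {
        size_t i = 0, o = 0;

        while (i < in_len) {
            uint8_t ctrl = in[i++];
            if (ctrl & 0x80) {                    /* back-reference  */
                if (i >= in_len) return 0;
                size_t len  = ((ctrl >> 4) & 0x07) + 2;
                size_t dist = ((size_t)(ctrl & 0x0F) << 8) | in[i++];
                if (dist == 0 || dist > o || o + len > out_cap)
                    return 0;
                while (len--) {                   /* copy earlier output */
                    out[o] = out[o - dist];
                    o++;
                }
            } else {                              /* literal run     */
                size_t len = (size_t)ctrl + 1;
                if (i + len > in_len || o + len > out_cap)
                    return 0;
                while (len--)
                    out[o++] = in[i++];
            }
        }
        return o;
    }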

I was dealing with repetitive data, trying to squeeze a few KB down to a few hundred bytes. I think the compression was about 50%; not great, but it did the job, and the decompression routine was tiny. The lzfx package above is small, unlike zlib: essentially two main functions with the code right there, not dozens of files. You can likely change the depth of the buffer, and perhaps improve the compression algorithm if you wish. I had to modify the decompression code (maybe 20 or 30 lines): it was pointer-heavy, and I switched it to arrays because in my embedded environment the pointers were in the wrong place. That burns maybe one extra register, or not, depending on how you implement it and on your compiler. I also did this so I could abstract the byte fetches and stores, since I had the data packed into memory that was not byte-addressable.
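For that last case, a hypothetical sketch of what abstracted fetches and stores can look like (the names and the 32-bit little-endian packing are my assumptions, not the original code): every byte access goes through a helper that picks the byte out of a word-addressed region.

    #include <stdint.h>

    /* Hypothetical: bytes packed four per 32-bit word, little-endian
     * within the word, in memory that is only word-addressable. */
    extern volatile uint32_t word_mem[];

    static uint8_t fetch_byte(uint32_t addr)
    {
        return (uint8_t)(word_mem[addr >> 2] >> ((addr & 3u) * 8));
    }

    static void store_byte(uint32_t addr, uint8_t b)
    {
        uint32_t shift = (addr & 3u) * 8;
        uint32_t w = word_mem[addr >> 2];
        w = (w & ~(0xFFu << shift)) | ((uint32_t)b << shift);
        word_mem[addr >> 2] = w;
    }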

If you find something better, please post it here or ping me through Stack Overflow; I am also very interested in other embedded solutions. I searched quite a bit, and the above was the only useful thing I found, and I was lucky that my data was such that it compressed well enough using this algorithm ... for now.

+4

Can anyone give me pointers on how I can implement LZW compression/decompression in low-memory conditions (< 2 KB), if that is even possible?

Why LZW? LZW requires a lot of memory. It is based on a hash dictionary, and the compression ratio is proportional to the size of that dictionary: more memory means better compression; with less memory, the output can even be larger than the input.

I haven't dealt with encoding in a long time, but IIRC Huffman coding is a little better when it comes to memory consumption.

But it all depends on the type of information you want to compress.

+3

I have used LZSS, with code from Haruhiko Okumura as a base. It uses the last portion of the uncompressed data (2 KB) as the dictionary. The code I linked can be modified to use practically no extra memory if you have all of the uncompressed data available in memory. With a little googling you will find plenty of different implementations.
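For illustration, a sketch of roughly how such a decoder looks (the token layout and constants here are illustrative choices in the Okumura style, not his exact code): the entire working state is a 2 KB ring buffer of recent output, which doubles as the dictionary.

    #include <stdint.h>
    #include <stdio.h>

    #define N 2048            /* sliding-window size: the 2K dictionary  */
    #define THRESHOLD 2       /* matches this short are sent as literals */

    static uint8_t ring[N];   /* the only buffer the decoder needs       */

    /* A flag byte precedes up to 8 items; a 1 bit means a raw literal
     * byte follows, a 0 bit means a two-byte (position, length)
     * reference into the ring buffer of recent output. */
    void lzss_decode(FILE *in, FILE *out)
    {
        unsigned flags = 0;
        int r = 0, c;

        for (;;) {
            if (((flags >>= 1) & 0x100) == 0) {   /* need new flag byte */
                if ((c = getc(in)) == EOF) break;
                flags = (unsigned)c | 0xFF00;     /* high bits count 8  */
            }
            if (flags & 1) {                      /* literal byte       */
                if ((c = getc(in)) == EOF) break;
                putc(c, out);
                ring[r] = (uint8_t)c; r = (r + 1) & (N - 1);
            } else {                              /* (position, length) */
                int lo = getc(in), hi = getc(in);
                if (lo == EOF || hi == EOF) break;
                int pos = lo | ((hi & 0xE0) << 3);     /* 11-bit pos    */
                int len = (hi & 0x1F) + THRESHOLD + 1; /* 5-bit length  */
                while (len--) {
                    c = ring[pos & (N - 1)]; pos++;
                    putc(c, out);
                    ring[r] = (uint8_t)c; r = (r + 1) & (N - 1);
                }
            }
        }
    }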

+3

If the choice of compression algorithm is not set in stone, you could try gzip/LZ77 instead. Here is a very simple implementation I used and adapted once:

ftp://quatramaran.ens.fr/pub/madore/misc/myunzip.c

You will need to clean up how it reads input, add error handling, and so on, but it is a good start. It is probably far too big if your data and code together must fit in 2 KB, but at least the data side is already small.

The big plus is that it is in the public domain, so you can use it however you like!

+2

It has been more than 15 years since I last played with the LZW compression algorithm, so take the following with a grain of salt.

Given the memory constraints, it will be difficult at best. The dictionary you build will consume the vast majority of what you have available (assuming code + memory <= 2 KB).

Choose a small, fixed size for your dictionary, say 1,024 entries.

Let each dictionary entry take the following form:

 struct entry { intType prevIdx; charType newChar; };  /* prevIdx: entry this code extends; newChar: the byte appended to it */

This structure makes the dictionary recursive: for an entry to work correctly, the entry at its previous index must be valid. Is it workable? I'm not sure. But let us assume for the moment that it is, and see where it leads us ...
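To see why it can work, here is a minimal sketch (mine, not the answerer's; the type widths are placeholders) of how a decoder would expand a code under this layout: it walks the prevIdx chain backwards, collecting bytes in reverse, then emits them in order.

    #include <stdint.h>

    typedef uint16_t intType;   /* only 10 bits actually needed, see below */
    typedef uint8_t  charType;

    struct entry { intType prevIdx; charType newChar; };

    /* Codes 0..255 are the implicit single-byte roots, so the table
     * only stores entries for codes >= 256. Returns bytes written. */
    static int expand(const struct entry *dict, unsigned code,
                      uint8_t *out, int out_cap)
    {
        uint8_t stack[64];      /* longest string we expect to expand */
        int sp = 0, n = 0;

        while (code >= 256 && sp < (int)sizeof stack - 1) {
            stack[sp++] = dict[code - 256].newChar; /* byte added last  */
            code = dict[code - 256].prevIdx;        /* follow the chain */
        }
        stack[sp++] = (uint8_t)code;                /* root literal     */

        while (sp > 0 && n < out_cap)               /* emit in order    */
            out[n++] = stack[--sp];
        return n;
    }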

If you use standard types for the int and the char, you will run out of memory quickly. You will want to pack things as tightly as possible. Addressing 1,024 entries requires 10 bits. Your new character will most likely take 8 bits. Total = 18 bits.

18 bits * 1024 entries = 18432 bits or 2304 bytes.

At first glance, that looks too big. What can we do? Take advantage of the fact that the first 256 entries are already known (your typical extended ASCII set, or whatever you have), so they never need to be stored. That means we really only need 768 entries.

768 * 18 bits = 13824 bits or 1728 bytes.

This leaves you approximately 320 bytes to play with for your code. Naturally, you can play around with the dictionary size and see what works well for you, but you will not have much room for code either way. Since you are looking at so little code space, I would expect you to end up coding in assembly.
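For illustration, one way the packing could look (my sketch, not part of the original answer; the store side is analogous): the 768 entries live in a 1,728-byte array addressed at the bit level, trading access speed for size.

    #include <stdint.h>

    #define ENTRY_BITS   18
    #define DICT_ENTRIES 768                         /* codes 256..1023 */
    #define DICT_BYTES   ((DICT_ENTRIES * ENTRY_BITS + 7) / 8) /* 1728 */

    static uint8_t dict[DICT_BYTES];

    /* Read nbits, MSB first, starting at absolute bit position bitpos. */
    static uint32_t get_bits(uint32_t bitpos, int nbits)
    {
        uint32_t v = 0;
        while (nbits--) {
            v = (v << 1) | ((dict[bitpos >> 3] >> (7 - (bitpos & 7))) & 1u);
            bitpos++;
        }
        return v;
    }

    /* Entry for code 256..1023 starts at bit (code - 256) * 18:
     * 10 bits of prevIdx followed by 8 bits of newChar. */
    static void entry_read(unsigned code, uint16_t *prevIdx, uint8_t *newChar)
    {
        uint32_t base = (code - 256u) * ENTRY_BITS;
        *prevIdx = (uint16_t)get_bits(base, 10);
        *newChar = (uint8_t)get_bits(base + 10, 8);
    }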

Hope this helps.

+1

My best recommendation is to study the BusyBox source and see whether its LZW implementation is small enough to work in your environment.

0

The smallest dictionary for LZW is a trie built on linked lists. See the original implementation in LZW AB; I rewrote it in a fork, LZWS. The fork is compatible with compress. Detailed documentation here.

An n-bit dictionary requires (2 ** n) * sizeof(code) + ((2 ** n) - 257) * sizeof(code) + ((2 ** n) - 257) bytes; a small calculation sketch follows the list below.

So:

  1. 9-bit code: 1,789 bytes.
  2. 12-bit code: 19,709 bytes.
  3. 16-bit code: 326,909 bytes.
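A quick sketch to reproduce those figures, assuming sizeof(code) is 2 bytes (a 16-bit code type, which matches the numbers above); the label I attach to each term is my reading of a linked-list trie, not wording from the answer:

    #include <stdio.h>

    int main(void)
    {
        for (int n = 9; n <= 16; n++) {
            long codes = 1L << n;               /* 2 ** n                  */
            long bytes = codes * 2              /* e.g. first-child codes  */
                       + (codes - 257) * 2      /* e.g. next-sibling codes */
                       + (codes - 257);         /* one symbol byte each    */
            printf("%2d-bit code: %ld bytes\n", n, bytes);
        }
        return 0;    /* prints 1789 for n=9, 19709 for n=12, 326909 for n=16 */
    }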

Please note that these are the dictionary requirements alone. You need about another 100-150 bytes for state variables on the stack.

The decompressor will use less memory than the compressor.

Therefore, I think you could try to compress your data with the 9-bit version. But it will not give a good compression ratio; the more bits, the better.

0
    typedef unsigned int  UINT;
    typedef unsigned char BYTE;

    /* Prototypes only. The original used C++ references
     * (long &totalsize); written as pointers here to be valid C. */
    BYTE *lzw_encode(BYTE *input, BYTE *output, long filesize, long *totalsize);
    BYTE *lzw_decode(BYTE *input, BYTE *output, long filesize, long *totalsize);
-3
