Byte array compression in Java and decompression in C

Question

Byte array compression in Java and decompression in C

I currently have the following array in a Java program,

byte[] data = new byte[800];

and I would like to compress it before sending it to the microcontroller via the serial port (115200 Baud). I would then unzip the array on a microcontroller in C. However, I'm not quite sure what the best way to do this. Performance is a problem, since a microcontroller is just an arduino, so it cannot be too intense with a memory / processor. The data is more or less random ( edit . I believe that this is not so random, see below). I would say since it represents the rgb color value for every 16 bits.

What would be the best way to compress this data? Any idea how many compressions I could get?

change

Sorry for the lack of information. I need lossless compression, and I only intend to send 800 bytes at a time. My problem is that 800 bytes will not transmit fast enough at the 115200 baud rate that I use. I was hoping I could reduce the size a bit to improve speed.

Every two bytes look like this:

0RRRRRGGGGGBGBBBBB

Where the RG and B bits represent the values for the color channels red, green, and blue, respectively. Every two bytes is a separate LED on a 20x20 grid. I would suggest that many sets of two bytes would be identical, since I often assign the same color codes to multiple LEDs. It may also be that RGB values are often> 15, as I usually use bright colors when I can (but this can be a moot point, since they are not all usually> 15 at the same time).

+6

java c embedded compression decompression

Anon Nov 11 '10 at 21:34

source share

7 answers

Use miniLZO compression. Java version C version

+2

Neopallium Nov 11 '10 at 21:40

source share

A really simple compression / decompression algorithm that is practical in tiny embedded environments and easily "collapses its own" is the encoding of the path length. Basically, this means replacing the path of repeated values with a pair (counter, value). Of course, to introduce a pair, the hourly (magic) value is required, and then a mechanism that allows the magic value to appear in ordinary data (usually an escape sequence can be used for both tasks). In your example, it is best to use 16-bit values (2 bytes).

But, of course, it all depends on the data. Rather random data is incompressible by definition. It is best to collect some sample data first and then evaluate the compression parameters.

Change after posting more information

Just for fun and to show how easy it is to encode the path length, I encoded something. I'm afraid I used C for compression, since I'm not a Java guy. To keep things simple, I fully worked with 16-bit data. Optimization will consist in using an 8-bit number in a pair (count, value). I have not tried to compile or test this code. See also my comment on your question about the possible benefits of handling LED addresses.

 #define NBR_16BIT_WORDS 400 typedef unsigned short uint16_t; // Return number of words written to dst (always // less than or equal to NBR_16BIT_WORDS) uint16_t compress( uint16_t *src, uint16_t *dst ) { uint16_t *end = (src+NBR_16BIT_WORDS); uint16_t *dst_begin = dst; while( src < end ) { uint16_t *temp; uint16_t count=1; for( temp=src+1; temp<end; temp++ ) { if( *src == *temp ) count++; else break; } if( count < 3 ) *dst++ = *src++; else { *dst++ = (*src)|0x8000; *dst++ = count; *src += count; } } return dst-dst_begin; } void decompress( uint16_t *src, uint16_t *dst ) { uint16_t *end_src = (src+NBR_16BIT_WORDS); uint16_t *end_dst = (dst+NBR_16BIT_WORDS); while( src<end_src && dst<end_dst ) { data = *src++; if( (data&0x8000) == 0 ) *dst++ = data; else { data &= 0x7fff; uint16_t count = *src++; while( dst<end_dst && count-- ) *dst++ = data; } } }

+2

Bill forster Nov 11 '10 at 21:43

source share

One of the first things would be converting from RGB to YUV, or YCrCb, or something in that order. By doing this, you can usually get away with a sub-sample of the U and V channels (or Cr / Cb) up to half the resolution. This is quite common in most types of images (for example, JPEG and MPEG, and they do this, as well as sensors in most digital cameras).

In reality, starting with just 800 bytes of data, most other forms of compression will be a waste of time and effort. You will need to do a lot of work before you do a lot (and keeping it fast enough on the Arduino will also not be trivial).

Edit: it’s good if you are absolutely sure that you cannot change the data at all, everything becomes more complicated very quickly. The real question at this point is what you are facing. Others have already mentioned the possibility of something of the order of predictive delta compression, for example, based on previous pixels, to predict what will happen next, and then encode only the difference between the prediction and the actual value. However, as a rule, this requires the execution of the result using some kind of entropy algorithm, such as Shannon-Fanno or Huffman compression. Those, unfortunately, are usually not the fastest to unpack.

If your data is things like charts or graphs where you can expect that you have large areas of the same pixels, the length encoding (or run-end) can work very well. The advantage of this is that it is really trivial to unpack.

I doubt that LZ-based compression will work so well. LZ-based compression works (in general) by creating a dictionary of byte strings that have been noticed, and when / if the same byte string is displayed again, passing the code assigned to the previous instance instead of retransmitting the entire string. The problem is that you cannot transmit uncompressed bytes - you start by sending a codeword that represents that byte in the dictionary. In your case, you can use (for example) a 10-bit codeword. This means that the first time you send a special character, you need to send it as 10 bits, not just 8. You only begin to receive compression when you can create several longer (double-byte, three-byte, etc.) strings in your dictionary and find the corresponding line later in the tab.

This means that LZ-based compression usually gets quite low compression for the first few hundred bytes or so, and then breaks even for a while, and only after it has been working on some input for a while does it really start well shrink, Working for only 800 bytes at a time, I'm not at all sure that you will ever see much compression - in fact, working in such small blocks, it would not be surprising to see that the data expands fairly regularly (especially if it very random).

+2

Jerry Coffin Nov 11 '10 at 21:47

source share

The data is more or less random, I would say, since it represents the rgb color value for every 16 bits.
What would be the best way to compress this data? Any idea how many compressions I could get?

Ideally, you can compress 800 bytes of color data into one byte if the entire image is the same color. However, as Oli Charlworth notes, the more random the data, the less you can compress it. If your images look static on a TV, then it’s really good luck getting any kind of compression out of it.

+1

Roooke Nov 11 '10 at 21:51

source share

Definitely consider Olie Charlworth's answer. On a 20x20 grid, I don't know if you need a full 32x color palette.

In addition, in the previous question, you said that you tried to start it for 20 ms (50 Hz). Do you really need that kind of speed for this display? At 115200 bps, you can transfer ~ 11520 bytes / s - call it 10 Kbps for security (for example, your micro may have a delay between bytes, you should do some experiments to see what the “real” band is transmission). At 50 Hz, this allows you only 200 bytes per packet — you are looking for a compression ratio of over 75%, which may not be possible under any circumstances. You seem to be quite married to your requirements, but there may be time for uncomfortable communication.

If you want to go along the compression route, you probably just have to try several different algorithms with "real" data, as others have said, and try different encodings. I bet you can find extra processing time by doing math, etc. Between receiving bytes from a serial link (you will have about 80 microseconds between bytes) - if you use interrupts to read serial data instead of polling, you can probably do well using a double buffer and processing / displaying the previous buffer when reading to the current buffer.

EDIT: Is it possible to increase the serial port speed for 115200? This Amazon USB-to-serial adapter says it reaches 1 Mbps (probably actually 921600 bps). Depending on your equipment and environment, you may need to worry about bad data, but if you increase the speed enough, you can probably add a checksum and possibly even limited error correction.

I am not familiar with Arduino, but I have an 8-bit FreeScale HCS08 at a speed of 1.25 Mbps, although the bus actually runs RS-485 and not RS-232 (485 uses differential signaling for better noise), and I have no problems with noise errors. You can even consider an RS-485 USB adapter if you can connect it to an Arduino (you'll need conversion equipment to change 485 signals to Arduino levels).

EDIT 2: You can also consider this USB-SPI / I2C adapter if you have an available I2C or SPI interface and you can handle wiring. It says that it can switch to 400 kHz I2C or 200 kHz SPI, which by itself is not enough, but you can split the data between SPI / I2C and the existing serial connection.

+1

Sam skuce Nov 12 '10 at 5:58

source share

LZ77 / 78 is relatively easy to write http://en.wikipedia.org/wiki/LZ77_and_LZ78

However, given the small amount of data that you transfer, it probably should not be compressed at all.

0

kkress Nov 11 '10 at 21:44

source share

Oliver Charlesworth · Accepted Answer · 2010-11-11T21:47:09+0000

If the data is more or less random, then you probably will not be able to compress it, I'm afraid.

UPDATE

Given the new information, I'm sure you do not need 32 thousand colors on your LED display. I would suggest that 1024 or 256 colors might be enough. Therefore, you can get away with a trivial compression scheme (just match each word through a lookup table or maybe just drop the lsbs of each component), which will work even for completely uncorrelated pixel values.

Byte array compression in Java and decompression in C

More articles: