How to compress a sequence of integers?

I have an array that contains data ranging from -255 to + 255.eg An array can be like this:

int data[]={234,56,-4,24,56,78,23,89,234,68,-12,-253,45,128}; 

Here the order should be saved when unpacking, for example. after the 1st semester 234, 56 should come.

So, what are the ways to compress any arbitrary sequence of numbers for which a repeating pattern cannot be observed?

+4
source share
4 answers

The range from -255 to 255 means 511 values ​​→ 9 bits. If you take the character separately, 1 bit for the character and byte for the value.

You can write an array as a byte array, each byte value will be the absolute value of the corresponding int.

In a separate zone (long or possibly byte array), store the character bit.

+5
source

If there really are no templates in the data, a useful compression algorithm is not possible. Don't even bother trying!

Of course, in this case, because all numbers are in a limited range of n, then you have a pattern in bits, namely, that your most significant bits are either all 0 (positive) or all 1 (negative).

This way, standard compression algorithms like zip will work if you want to compress efficiently (assuming you have a long enough array of numbers to make it useful).

Alternatively, you can use the fact that you are effectively using 9-bit numbers. Thus, you can collapse your own compression algorithm by laying out the numbers as a long stream of 9-bit fragments and placing them in a byte array.

+5
source

In your situation (when pattern repetition is not observed), variable-length coding can help you.

For example, Elias gamma coding and Exponential-Golomb coding . The general idea is that for small numbers only a few bits are required for encoding. Exp-Golomb encoding is used in the H.264 / MPEG-4 AVC video compression standard. It is very easy to encode and decode sequences with it; it is also not very difficult to implement this encoding.

The way to encode all integers is to set a bijection by mapping the integers (0, 1, -1, 2, -2, 3, -3, ...) to (1, 2, 3, 4, 5, 6, 7, ...) before encoding.

For instance:

The sequence (after the bijection) [ 0, 2, 5, 8, 5, 2 ] will be encoded as 101100110000100100110011 - As you can see, there are no repeating patterns in this sequence, but it is encoded with only 24 bits.

A brief description of the decoding process:

  • Reading from the input stream and counting the leading zero bits (until you find a non-zero bit value) → zero_bits_count

  • Reading from the input stream the next (zero_bits_count + 1) bit -> binary

  • Convert binary to decimal

  • Return (decimal - 1)

1... -> no leading zeros, zero_bits_count = 0 -> read next 1 bit -> [1]... -> [1] is 1 -> 1 - 1 = 0

011... -> [0] - one leading zero, zero_bits_count = 1 -> read next 2 bits -> [11]... -> [11] is 3 -> 3 - 1 = 2

00110... -> [00] - two leading zeros, zero_bits_count = 2 -> read next 3 bits -> [110]... -> [110] is 6 -> 6 - 1 = 5

and etc.

+4
source

If the numbers are essentially random and evenly distributed, and the order has to be kept, then the best you can do is about 9 bits per character. In 9 bits, one value of 9 bits will not be used, i.e. -256 in a view with two additions. This is convenient since you can use this as an end character to mark the end of a list. Then you also encoded the length of the list, which would be necessary somehow in any case.

+1
source

All Articles