Binary Length Encoding

I have a web form for the content of which I would like to create a short presentation in Base64. The form, by the way, contains a list of 264 binary values, most of which will be 0 at any given time. (They represent regions on a geographical map). Even in Base64, this 264-bit number generates a long, intimidating string. I want to implement full-length coding as efficiently as possible. can you help me with this? I searched for binary RLE but did not find anything useful.

What I have tried so far is running RLE in a binary string using decimal counters and "A" as a separator, indicating the change between 0 and 1, and then converting the result from base 11 to base 64. For example:

00000000001111111000000010000000000000000000000001111111110001111010101000000000000000000000000000000000000111111111110111000000000000111111100000001000000000000000000000000111111111000111101010100000000000000000000000000000000000011111111111011100 

becomes

 10A5A5AA22A7A1A2AAAAAAA34A9AA1A10A5A5AA22A7A1A2AAAAAAA34A9AA1A 

which in turn becomes

 CNnbr/FxkgbbOw0LNAKgk65P8SdvaTG+t74o 

or, in base 62,

 6imo7zq1pqr2mqglTHzXwJRAksm7fvHZHWQK 

This is better, but I still can not doubt that I am doing something wrong - the number "A" is used, since the separator is the best way to do this?

And another update:

Thanks to @comingstorm, I cut the compressed line a bit more.

 ILHHASCAASBYwwccDASYgAEgWDI= 

As I mentioned in the comments, actual use cases usually result in an even shorter line.

+4
source share
3 answers

Since you encode bits, you probably want to use bit-based RLE instead of byte-based. In this context, you should consider Elias gamma coding (or some option) to efficiently encode the run length.

A reasonable first approximation for your encoding format might be:

  • first bit = the same as the first bit of an uncompressed line (to set the original polarity)
  • remaining bits: Elias coded lengths of consecutive bit-runs (alternating 1 and 0)

Since you know how many bits are in your uncompressed string, you do not need a termination code; you can simply add any necessary binary indentation as arbitrary bits.

Note that it is always possible to β€œcompress” the length of a string to expand your bit string; if this bothers you, you can add another start bit to indicate whether your data is in compressed or uncompressed format, limiting the overhead of compression to 1 bit.

+6
source

264 bits, which are only 33 bytes, and which are in base64 only 44 bytes. I think that this (very small) amount of information is hardly compressed. The rare representation of nulvinge means that it simply stores non-zero elements and their values ​​(as you have only 0/1), that is, in your case, just an index of non-zero bits. But since you have 264 possible bits, you need 9 bits for the index, which means that if you have more than 29 non-zero entries, you need more than the original.

Perhaps your question is formulated incorrectly, but I don’t see how 264 bits can lead to an intimidating base64 string (how you generate it - maybe you translate not 264 bits, but 264 ASCII characters (with values 0 and 1 ) - this explains your long string of result?).

+1
source

An alternative that I think is more than what you want is a sparse matrix: http://en.wikipedia.org/wiki/Sparse_matrix

0
source

All Articles