Variable Length Encoding of an Integer

What is the best way to encode a variable length unsigned integer in C #?


"The actual intention is to add integer size (bytes) of variable length to the file header."

For example: "Content-Length" - Http header

Can this be achieved with some changes in the logic below.


I wrote code that does this ....

+4
source share
5 answers

The method I used, which makes smaller values, uses fewer bytes, is to encode 7 bits of data + 1 bit of overhead pr. byte.

The encoding only works for positive values, starting from zero, but can be changed if necessary to process negative values.

How coding works:

  • Take the least significant 7 bits of your value and store them in bytes, this is what you are going to output
  • Move the value of 7 bits to the right, getting rid of those 7 bits that you just captured.
  • If the value is non-zero (i.e., after you have separated 7 bits from it), set the high bit of the byte that you are going to output before you output it
  • Output byte
  • If the value is non-zero (i.e. the same that led to setting the big bit), go back and repeat the steps from the beginning

To decode:

  • Start at position bit 0
  • Read one byte from file
  • Save if the high bit is set and mask it.
  • OR in the rest of the byte to your final value, in the bit position you are on
  • If the high bit has been set, increase the bit position by 7 and repeat the steps, skipping the first (do not reset the bit position)
  39 32 31 24 23 16 15 8 7 0
 value: | DDDDDDDD | CCCCCCCC | BBBBBBBBB | AAAAAAAA |
 encoded: | 0000DDDD | xDDDDCCC | xCCCCCBB | xBBBBBBA | xAAAAAAA |  (note, stored in reverse order)

As you can see, the encoded value can occupy one additional byte, which is used only halfway, due to the overhead of the control bits. If you expand this value to a 64-bit value, the extra byte will be fully utilized, so that there will still be only one byte of extra overhead.

Note Since the encoding stores values ​​one byte at a time, always in the same order, systems with higher or lower orders will not change the layout of this. The least significant byte is always stored first, etc.

Ranges and their coded size:

  0 - 127: 1 byte
         128 - 16.383: 2 bytes
      16.384 - 2.097.151: 3 bytes
   2.097.152 - 268.435.455: 4 bytes
 268.435.456 - max-int32: 5 bytes

Here are the C # implementations for both:

void Main() { using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Create)) using (BinaryWriter writer = new BinaryWriter(stream)) writer.EncodeInt32(123456789); using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Open)) using (BinaryReader reader = new BinaryReader(stream)) reader.DecodeInt32().Dump(); } // Define other methods and classes here public static class Extensions { /// <summary> /// Encodes the specified <see cref="Int32"/> value with a variable number of /// bytes, and writes the encoded bytes to the specified writer. /// </summary> /// <param name="writer"> /// The <see cref="BinaryWriter"/> to write the encoded value to. /// </param> /// <param name="value"> /// The <see cref="Int32"/> value to encode and write to the <paramref name="writer"/>. /// </param> /// <exception cref="ArgumentNullException"> /// <para><paramref name="writer"/> is <c>null</c>.</para> /// </exception> /// <exception cref="ArgumentOutOfRangeException"> /// <para><paramref name="value"/> is less than 0.</para> /// </exception> /// <remarks> /// See <see cref="DecodeInt32"/> for how to decode the value back from /// a <see cref="BinaryReader"/>. /// </remarks> public static void EncodeInt32(this BinaryWriter writer, int value) { if (writer == null) throw new ArgumentNullException("writer"); if (value < 0) throw new ArgumentOutOfRangeException("value", value, "value must be 0 or greater"); bool first = true; while (first || value > 0) { first = false; byte lower7bits = (byte)(value & 0x7f); value >>= 7; if (value > 0) lower7bits |= 128; writer.Write(lower7bits); } } /// <summary> /// Decodes a <see cref="Int32"/> value from a variable number of /// bytes, originally encoded with <see cref="EncodeInt32"/> from the specified reader. /// </summary> /// <param name="reader"> /// The <see cref="BinaryReader"/> to read the encoded value from. /// </param> /// <returns> /// The decoded <see cref="Int32"/> value. /// </returns> /// <exception cref="ArgumentNullException"> /// <para><paramref name="reader"/> is <c>null</c>.</para> /// </exception> public static int DecodeInt32(this BinaryReader reader) { if (reader == null) throw new ArgumentNullException("reader"); bool more = true; int value = 0; int shift = 0; while (more) { byte lower7bits = reader.ReadByte(); more = (lower7bits & 128) != 0; value |= (lower7bits & 0x7f) << shift; shift += 7; } return value; } } 
+13
source

If small values ​​are more common than large, you can use Golomb coding .

+1
source

You must first make a histogram of your value. If the distribution is random (i.e., each bit of your histogram count is close to another), then you cannot encode more efficiently than the binary representation for that number.

If your histogram is unbalanced (that is, if some values ​​are more present than others), then it makes sense to choose an encoding that uses fewer bits for these values, while using more bits for others is unimportant values.

For example, if the number required for encoding is 2 times more than 15 bits, the more you can use the 16th bit to say so and only store or send 16 bits (if it is zero, then the future byte forms 16-bit numbers that can fit into a 32-bit number). If it is 1, then the upcoming 25 bits form 32-bit numbers. You lose one bit here, but since this is unlikely, after all, for a large number, you win more bits.

Obviously, this is a trivial case, and its extension in more than 2 cases is the Huffman algorithm, which affects the "code word" close to optimal, based on the probability of occurrence of numbers.

There is also an arithmetic coding algorithm that does this too (and possibly something else).

In all cases, there is no solution that can store a random value more efficiently than what is currently being done in computer memory.

You need to think about how long and how difficult it will be to implement such a solution compared to the savings that you will receive at the end to find out if it is worth it. The language itself is irrelevant here.

+1
source

I know this question was asked a few years ago, but for MIDI developers I decided to share some code from a personal midi project that I am working on. The code block is based on a fragment from Paul Messica's book “Maximum MIDI” (this example is a modified version for my own needs, however, the concept is all there ...).

  public struct VariableLength { // Variable Length byte array to int public VariableLength(byte[] bytes) { int index = 0; int value = 0; byte b; do { value = (value << 7) | ((b = bytes[index]) & 0x7F); index++; } while ((b & 0x80) != 0); Length = index; Value = value; Bytes = new byte[Length]; Array.Copy(bytes, 0, Bytes, 0, Length); } // Variable Length int to byte array public VariableLength(int value) { Value = value; byte[] bytes = new byte[4]; int index = 0; int buffer = value & 0x7F; while ((value >>= 7) > 0) { buffer <<= 8; buffer |= 0x80; buffer += (value & 0x7F); } while (true) { bytes[index] = (byte)buffer; index++; if ((buffer & 0x80) > 0) buffer >>= 8; else break; } Length = index; Bytes = new byte[index]; Array.Copy(bytes, 0, Bytes, 0, Length); } // Number of bytes used to store the variable length value public int Length { get; private set; } // Variable Length Value public int Value { get; private set; } // Bytes representing the integer value public byte[] Bytes { get; private set; } } 

How to use:

 public void Example() { //Convert an integer into a variable length byte int varLenVal = 480; VariableLength v = new VariableLength(varLenVal); byte[] bytes = v.Bytes; //Convert a variable length byte array into an integer byte[] varLenByte = new byte[2]{131, 96}; VariableLength v = new VariableLength(varLenByte); int result = v.Length; } 
0
source

All Articles