The method I used, which makes smaller values, uses fewer bytes, is to encode 7 bits of data + 1 bit of overhead pr. byte.
The encoding only works for positive values, starting from zero, but can be changed if necessary to process negative values.
How coding works:
- Take the least significant 7 bits of your value and store them in bytes, this is what you are going to output
- Move the value of 7 bits to the right, getting rid of those 7 bits that you just captured.
- If the value is non-zero (i.e., after you have separated 7 bits from it), set the high bit of the byte that you are going to output before you output it
- Output byte
- If the value is non-zero (i.e. the same that led to setting the big bit), go back and repeat the steps from the beginning
To decode:
- Start at position bit 0
- Read one byte from file
- Save if the high bit is set and mask it.
- OR in the rest of the byte to your final value, in the bit position you are on
- If the high bit has been set, increase the bit position by 7 and repeat the steps, skipping the first (do not reset the bit position)
39 32 31 24 23 16 15 8 7 0
value: | DDDDDDDD | CCCCCCCC | BBBBBBBBB | AAAAAAAA |
encoded: | 0000DDDD | xDDDDCCC | xCCCCCBB | xBBBBBBA | xAAAAAAA | (note, stored in reverse order)
As you can see, the encoded value can occupy one additional byte, which is used only halfway, due to the overhead of the control bits. If you expand this value to a 64-bit value, the extra byte will be fully utilized, so that there will still be only one byte of extra overhead.
Note Since the encoding stores values one byte at a time, always in the same order, systems with higher or lower orders will not change the layout of this. The least significant byte is always stored first, etc.
Ranges and their coded size:
0 - 127: 1 byte
128 - 16.383: 2 bytes
16.384 - 2.097.151: 3 bytes
2.097.152 - 268.435.455: 4 bytes
268.435.456 - max-int32: 5 bytes
Here are the C # implementations for both:
void Main() { using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Create)) using (BinaryWriter writer = new BinaryWriter(stream)) writer.EncodeInt32(123456789); using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Open)) using (BinaryReader reader = new BinaryReader(stream)) reader.DecodeInt32().Dump(); } // Define other methods and classes here public static class Extensions { /// <summary> /// Encodes the specified <see cref="Int32"/> value with a variable number of /// bytes, and writes the encoded bytes to the specified writer. /// </summary> /// <param name="writer"> /// The <see cref="BinaryWriter"/> to write the encoded value to. /// </param> /// <param name="value"> /// The <see cref="Int32"/> value to encode and write to the <paramref name="writer"/>. /// </param> /// <exception cref="ArgumentNullException"> /// <para><paramref name="writer"/> is <c>null</c>.</para> /// </exception> /// <exception cref="ArgumentOutOfRangeException"> /// <para><paramref name="value"/> is less than 0.</para> /// </exception> /// <remarks> /// See <see cref="DecodeInt32"/> for how to decode the value back from /// a <see cref="BinaryReader"/>. /// </remarks> public static void EncodeInt32(this BinaryWriter writer, int value) { if (writer == null) throw new ArgumentNullException("writer"); if (value < 0) throw new ArgumentOutOfRangeException("value", value, "value must be 0 or greater"); bool first = true; while (first || value > 0) { first = false; byte lower7bits = (byte)(value & 0x7f); value >>= 7; if (value > 0) lower7bits |= 128; writer.Write(lower7bits); } } /// <summary> /// Decodes a <see cref="Int32"/> value from a variable number of /// bytes, originally encoded with <see cref="EncodeInt32"/> from the specified reader. /// </summary> /// <param name="reader"> /// The <see cref="BinaryReader"/> to read the encoded value from. /// </param> /// <returns> /// The decoded <see cref="Int32"/> value. /// </returns> /// <exception cref="ArgumentNullException"> /// <para><paramref name="reader"/> is <c>null</c>.</para> /// </exception> public static int DecodeInt32(this BinaryReader reader) { if (reader == null) throw new ArgumentNullException("reader"); bool more = true; int value = 0; int shift = 0; while (more) { byte lower7bits = reader.ReadByte(); more = (lower7bits & 128) != 0; value |= (lower7bits & 0x7f) << shift; shift += 7; } return value; } }