Condition for a one-bit code for a character in Huffman coding?

This is a question that came up at school, and it has kept bugging me, so I decided to ask it here.

In Huffman compression, fixed-length units (characters) are encoded as variable-length bit sequences. The length of each code sequence depends on the frequency (or probability) of the corresponding source symbol.
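For example (a toy alphabet of my own, shown in Python just to illustrate), the frequencies 1/2, 1/4, 1/8, 1/8 admit the Huffman code 0, 10, 110, 111, so the more frequent a symbol is, the shorter its code:

```python
# Toy alphabet (my own example) with dyadic frequencies and a matching Huffman code.
freqs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
codes = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

# Prefix property: no codeword is a prefix of another, so decoding is unambiguous.
words = list(codes.values())
assert not any(x != y and y.startswith(x) for x in words for y in words)

# Average code length: 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits per symbol.
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 1.75
```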

My question: what is the minimum frequency a character must have for that character to be encoded with a single bit?

+7
math compression huffman-code
2 answers

It turns out that the answer is 0.4; that is, if the highest symbol probability p satisfies p >= 0.4, a one-bit code for the corresponding symbol is guaranteed. In other words, this is a sufficient condition.

It is also true that p >= 1/3 is a necessary condition. That is, there can be examples with 0.4 > p >= 1/3 where the shortest code is 1 bit, but there are no such cases when p < 1/3.

One way to reason about this is to look at how the code tree is constructed, in particular at the frequencies of the last 3 surviving subtrees. The proof appears in Johnsen, "On the Redundancy of Binary Huffman Codes" (1980); unfortunately it is behind a paywall.
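A quick empirical check (a sketch of my own in Python; the huffman_lengths helper and the way the remaining probability is split are assumptions, not part of the proof): give the most frequent symbol probability p, split the rest evenly over a few other symbols, and see when its code length drops to one bit.

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return the Huffman code length of each symbol (rough sketch)."""
    tie = count()  # tie-breaker so the heap never compares the symbol lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    depth = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:  # merging two subtrees pushes their symbols one level deeper
            depth[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return depth

# The top symbol gets probability p; the remaining mass is split evenly over 4 symbols.
for p in (0.30, 0.33, 0.35, 0.40, 0.45):
    lengths = huffman_lengths([p] + [(1 - p) / 4] * 4)
    print(p, lengths[0])
```

With this particular split the top symbol gets 2 bits at p = 0.30 and 0.33 and 1 bit from 0.35 upward, which matches the statement: between 1/3 and 0.4 the outcome depends on the other probabilities, at p >= 0.4 one bit is guaranteed, and below 1/3 it never happens.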

+5

In general, a given character needs to make up about 50% of the input stream for Huffman coding to encode it with a single bit. The reason comes from how Huffman coding works (no codeword may be a prefix of another): if one character is encoded with one bit, every other character's codeword must begin with the opposite bit value (i.e. if one character is encoded as 0, all the others must begin with 1 plus at least one more bit). Since a one-bit codeword removes half of the available code space, that character needs to account for roughly half of the input for the trade to break even.

Note that there is a special case where the alphabet consists of only 3 characters. In that case, whichever symbol has the highest frequency is encoded with 1 bit, since the other two become the two second-bit variants under the first-bit value that was not chosen (if two or more symbols are equally likely, either one can take the short code). So in the 3-symbol case, a symbol with a probability of just 34% can be encoded with one bit (say 0), while the other two, with probabilities of 33% or lower, are encoded as 10 and 11.

So, considering all the possibilities, technically anything with probability 1/3 or higher can end up encoded as a single bit (in the 3-character case).
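A small worked sketch of that 3-symbol case (the 34/33/33 split is just the example above; the symbol names are mine):

```python
# Three symbols with probabilities 0.34, 0.33, 0.33 (the split from the example above).
probs = {'x': 0.34, 'y': 0.33, 'z': 0.33}

# Huffman merges the two least frequent symbols first ('y' and 'z' into a 0.66 subtree),
# then joins that subtree with 'x' at the root, so 'x' sits one level below the root.
codes = {'x': '0', 'y': '10', 'z': '11'}

# 'x' gets a 1-bit code even though its probability is barely above 1/3.
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, avg)  # 0.34*1 + 0.33*2 + 0.33*2 = 1.66 bits per symbol
```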

+7
