In general, about 50% of the input character stream should consist of a given character for Huffman to encode it as one bit. The reason for this is because of how the Huffman encoding works (encoding one character cannot be a prefix of another), by encoding a character with one bit, you need the first bit for each other character to be the opposite value (i.e. if one character is encoded as 0 , all others must begin with 1 plus at least one more bit). Since you eliminate half of the possible encoding space for any given bit length, you need to get a way to encode at least half of the input characters to break even.
Please note that there is a special case where the character space consists of only 3 characters. In this case, whichever symbol has the highest frequency, it will be encoded with 1 bit (since the other two will be variants of the second bit, depending on which value of the first bit is not selected) - if 2 or more is equally likely, or you can encode . Thus, in the case of 3 symbols, it is possible that a theorem with a probability of 34% can theoretically be encoded as one bit (say, 0 ), and the other two can have 33% or lower probabilities and be encoded as 10 and 11 .
So, if you are considering all the possibilities, then technically something 1/3 or higher can be encoded as one bit (in the case of 3 characters).
Amber
source share