Why is the entropy of a uniform distribution lower than that of repeated values in R?

According to Wikipedia, the uniform distribution is the "maximum entropy probability distribution." So, if I have two sequences (one uniformly distributed and one with repeated values), both of length k, I would expect the entropy of the uniformly distributed sequence to be higher than that of the sequence of repeated values. However, that is not what happens when I run the following code in R:

require(entropy)
entropy(runif(1024), method="ML", unit="log2")
entropy(rep(1,1024), method="ML", unit="log2")

The first call produces about 9.7 bits of entropy, and the second produces exactly 10 bits of entropy (log2(1024) = 10). Why doesn't the uniformly distributed sequence have more entropy than the repeated values?

1 answer

I think you are misinterpreting what the first argument, y , of entropy() represents. As described in ?entropy , it is a vector of counts. Those counts give the relative frequencies of each of the symbols from which messages on this "discrete information source" are composed.
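
To make that concrete, here is a minimal sketch of what entropy() (with the default method = "ML") does with the counts you pass in: it normalises them to relative frequencies and then applies the Shannon formula to those frequencies. The counts c(3, 1) are just an illustrative choice, not something from the question:

 library(entropy)
 y <- c(3, 1)              # counts for a two-symbol source: 3 of one symbol, 1 of the other
 p <- freqs.empirical(y)   # relative frequencies: 0.75 0.25
 -sum(p * log(p))          # Shannon entropy of those frequencies, in nats
 # [1] 0.5623351
 entropy(y, method = "ML") # same value
 # [1] 0.5623351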

To see how this works, look at a simpler example: a binary information source with two symbols (1/0, on/off, A/B, whatever you like). In that case, all of the following give the entropy of a source in which the relative frequencies of the two symbols are equal (i.e., half of the symbols are As and half are Bs):

 entropy(c(0.5, 0.5))
 # [1] 0.6931472
 entropy(c(1, 1))
 # [1] 0.6931472
 entropy(c(1000, 1000))
 # [1] 0.6931472
 entropy(c(0.0004, 0.0004))
 # [1] 0.6931472
 entropy(rep(1, 2))
 # [1] 0.6931472

Because they all refer to the same underlying distribution, in which probability is spread evenly over the available symbols, each of them gives the maximum possible entropy for a two-state information source (log(2) = 0.6931472).
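
If you want to check that the even split really is the maximum, here is a small base-R sketch (the helper H is defined purely for this illustration) that evaluates the two-symbol entropy H(p) = -p*log(p) - (1-p)*log(1-p) and locates its peak:

 H <- function(p) -p * log(p) - (1 - p) * log(1 - p)  # two-symbol Shannon entropy, in nats
 H(0.5)   # [1] 0.6931472  -- equals log(2)
 H(0.9)   # [1] 0.325083   -- any unequal split is lower
 optimize(H, interval = c(0, 1), maximum = TRUE)$maximum  # ~0.5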

With entropy(runif(2)) , by contrast, you are supplying relative frequencies for two symbols that have been drawn at random from a uniform distribution. Unless those two random numbers happen to be equal, you are telling entropy() that you have an information source whose two symbols are used with different frequencies. As a result, the calculated entropy will always be below log(2) . Here is a quick example to illustrate what I mean:

 set.seed(4)
 (x <- runif(2))
 # [1] 0.585800305 0.008945796

 freqs.empirical(x)  ## helper function called by `entropy()` via `entropy.empirical()`
 # [1] 0.98495863 0.01504137

 ## Low entropy, as you should expect
 entropy(x)
 # [1] 0.07805556

 ## Essentially the same thing; you can interpret this as the expected entropy
 ## of a source from which a message with 984 '0's and 15 '1's has been observed
 entropy(c(984, 15))

So, by passing a long string of 1 s as the argument y , as in entropy(rep(1, 1024)) , you are describing an information source that is the discrete analogue of a uniform distribution. Over the long run, or in a very long message, each of its 1024 letters is expected to occur with equal frequency, and you can't get any more uniform than that!
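
If what you actually wanted was the comparison from the question, a sketch along these lines behaves the way you expected: tabulate the observed symbols into counts first, then pass the counts to entropy() (the seed and the 1024-symbol alphabet are arbitrary choices for the illustration):

 set.seed(1)
 x <- sample(1:1024, 1024, replace = TRUE)                 # 1024 draws from a uniform 1024-symbol source
 entropy(as.numeric(table(x)), method = "ML", unit = "log2")
 # high, but below 10 bits, because sampling noise makes some symbols more frequent than others
 entropy(as.numeric(table(rep(1, 1024))), method = "ML", unit = "log2")
 # a constant sequence: exactly 0 bits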
