I think you're misunderstanding what the first argument, `y`, of `entropy()` represents. As mentioned in `?entropy`, it is a vector of counts. Those counts together give the relative frequencies of each of the symbols from which messages on this "discrete source of information" are composed.
To see how that plays out, have a look at a simpler example: a binary information source with just two symbols (1/0, on/off, A/B, what have you). In this case, all of the following give the entropy of a source in which the relative frequencies of the two symbols are equal (i.e. half of the symbols are As and half are Bs):
```r
entropy(c(0.5, 0.5))
# [1] 0.6931472
entropy(c(1, 1))
# [1] 0.6931472
entropy(c(1000, 1000))
# [1] 0.6931472
entropy(c(0.0004, 0.0004))
# [1] 0.6931472
entropy(rep(1, 2))
# [1] 0.6931472
```
Because they all refer to the same underlying distribution, in which probability is spread as evenly as possible among the available symbols, each of them gives the maximum possible entropy for a two-state information source (`log(2) = 0.6931472`).
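For reference, here is a minimal sketch of the same calculation done by hand, assuming the default settings of `entropy()` (`method = "ML"`, i.e. the plug-in estimator, with natural-log units):

```r
# Normalize the counts to relative frequencies, then apply the plug-in formula
counts <- c(1000, 1000)
p <- counts / sum(counts)   # 0.5, 0.5 -- the scale of the counts drops out
-sum(p * log(p))            # 0.6931472, i.e. log(2)
```

This also shows why `c(1, 1)`, `c(1000, 1000)`, and `c(0.0004, 0.0004)` all give the same answer: only the relative frequencies matter.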
When you instead call `entropy(runif(2))`, you are supplying relative weights for two symbols that were drawn at random from a uniform distribution. Unless those two random numbers happen to be exactly equal, you are telling `entropy()` that you have an information source with two symbols used at different frequencies. As a result, you get a computed entropy below `log(2)`. Here is a quick example to illustrate what I mean:
```r
set.seed(4)
(x <- runif(2))
```
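Completing that example (a sketch; the exact numbers depend on the seed and RNG, but any unequal pair of weights gives an entropy below `log(2)`):

```r
entropy(x)
# a value strictly less than log(2) = 0.6931472, because the two
# weights in x are (almost surely) unequal
```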
So, by passing a long string of 1s as the argument `y`, as in `entropy(rep(1, 1024))`, you are describing an information source that is a discrete analogue of the uniform distribution. In the long run, or in a very long message, each of its 1,024 letters is expected to occur with equal frequency, and you cannot get any more even than that!
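For comparison (again assuming the package defaults, `method = "ML"` and natural-log units), that call returns the maximum entropy possible for a 1,024-symbol source:

```r
entropy(rep(1, 1024))
# [1] 6.931472   i.e. log(1024), the maximum for 1024 equally likely symbols
```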