Is this password generator biased?

Is there a flaw in this command to generate passwords?

head -c 8 /dev/random | uuencode -m - | sed -n '2s/=*$//;2p'

After generating a few passwords, I began to suspect that it tends to favor certain characters. Of course, people are good at seeing patterns where there are none, so I decided to test the command on a larger sample. The results are below.

From a sample of 12,000 generated (12-character) passwords, here are the ten most and ten least common characters and how many times each appears.

  TOP 10            BOTTOM 10
  Freq | Char       Freq | Char
  -----|-----       -----|-----
  2751 | I          1833 | p
  2748 | Q          1831 | V
  2714 | w          1825 | 1
  2690 | Y          1821 | r
  2673 | k          1817 | 7
  2642 | o          1815 | R
  2628 | g          1815 | 2
  2609 | 4          1809 | u
  2605 | 8          1791 | P
  2592 | c          1787 | +

So, for example, "I" appears more than 1.5 times as often as "+".

Is this statistically significant? If so, how can the command be improved?

1 answer

Yes, I think it will be biased. uuencode encodes every 3 input bytes as 4 output characters. Since you give it 8 bytes, the final group has only 2 bytes, so it is filled out with padding (zero bits, not random data). That skews the last character of the password: it carries only 4 random bits, so it can only be one of 16 characters.
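
A minimal sketch that makes the padding effect visible: it feeds every possible value of the final input byte through base64 and collects the 11th output character. It assumes coreutils `base64` in place of `uuencode -m` (same alphabet); `last_char_alphabet` is a hypothetical helper name. Only 16 of the 64 characters can ever appear in that position, and all ten of the most common characters in the question's table (I, Q, w, Y, k, o, g, 4, 8, c) are among them.

```shell
#!/bin/sh
# Sketch: list every character that can appear as the 11th (last) character
# when 8 bytes are base64-encoded. Assumption: coreutils `base64` stands in
# for `uuencode -m`; both use the standard base64 alphabet.
last_char_alphabet() {
  b=0
  while [ "$b" -le 255 ]; do
    oct=$(printf '%03o' "$b")
    # Seven fixed bytes ("AAAAAAA") followed by one byte with value $b;
    # only the low 4 bits of this last byte reach character 11.
    printf "AAAAAAA\\$oct" | base64 | cut -c11
    b=$((b + 1))
  done | LC_ALL=C sort -u | tr -d '\n'
  echo
}

last_char_alphabet   # prints 048AEIMQUYcgkosw
```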

You could try

 head -c 9 /dev/random | uuencode -m - | sed -n '2p'

(9 bytes instead of 8) and post the results. It should not have the same problem.

P.S. You will also no longer need to strip the "=" padding, since 9 is a multiple of 3.
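
The 8-byte versus 9-byte comparison can be run with a minimal sketch that generates many passwords and counts how many distinct characters show up at each position. It assumes `/dev/urandom` instead of `/dev/random` (which can block on older Linux kernels) and coreutils `base64` instead of `uuencode -m`; `distinct_per_position` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: generate N passwords from NBYTES random bytes each and print, for
# each position, how many distinct characters were seen there.
# Assumptions: /dev/urandom and coreutils `base64` replace /dev/random and
# `uuencode -m`.
distinct_per_position() {
  nbytes=$1
  n=$2
  i=0
  while [ "$i" -lt "$n" ]; do
    head -c "$nbytes" /dev/urandom | base64
    i=$((i + 1))
  done | awk '
    {
      gsub(/=/, "")   # drop base64 padding, like the sed in the question
      if (length($0) > maxlen) maxlen = length($0)
      for (p = 1; p <= length($0); p++) seen[p, substr($0, p, 1)] = 1
    }
    END {
      # Count distinct characters per position from the (position, char) keys.
      for (k in seen) { split(k, idx, SUBSEP); count[idx[1]]++ }
      out = ""
      for (p = 1; p <= maxlen; p++) out = out (p > 1 ? " " : "") count[p]
      print out
    }'
}

distinct_per_position 8 1500   # 8 bytes, as in the question
distinct_per_position 9 1500   # 9 bytes, as suggested above
```

With 1,500 samples each, the 8-byte run should report 64 distinct characters at positions 1 to 10 but only 16 at position 11, while the 9-byte run should report 64 at all 12 positions. The loop starts two processes per password, so it is slow but easy to follow.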

http://en.wikipedia.org/wiki/Uuencoding

P.P.S. It is, of course, statistically significant. You would expect natural variation of about sqrt(mean), which (at a guess) is sqrt(2000), or about 45. So three standard deviations either way, +/- 134, or roughly 1866 to 2134, should contain about 99.7% of the counts. You are seeing something much more systematic.
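
The back-of-the-envelope check can be reproduced with a small sketch. It treats each character count as roughly Poisson, so the standard deviation is about sqrt(mean), and it uses the guessed mean of 2000, not a measured value; `band` and `zscore` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: Poisson approximation, standard deviation ~ sqrt(mean).
# The mean of 2000 is a rough guess, not a measured value.
zscore() {
  # $1 = expected mean, $2 = observed count; prints (obs - mean) / sd
  awk -v mean="$1" -v obs="$2" 'BEGIN { printf "%.2f\n", (obs - mean) / sqrt(mean) }'
}

band() {
  # $1 = expected mean; prints the mean +/- 3 standard deviations range
  awk -v mean="$1" 'BEGIN { sd = sqrt(mean); printf "%.0f to %.0f\n", mean - 3 * sd, mean + 3 * sd }'
}

band 2000          # prints 1866 to 2134
zscore 2000 2751   # "I", most common: prints 16.79
zscore 2000 1787   # "+", least common: prints -4.76
```

Both extremes from the question's table fall well outside the 3-standard-deviation band; "I" sits almost 17 standard deviations above the guessed mean.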

P.P.P.S. It's a neat idea.

Oops, I just realized that -m makes uuencode use base64 rather than the traditional uuencode algorithm, but the same idea applies.
