Bit efficient, uniform, cryptographically secure random number generation

I remember reading about a method for using random bits efficiently in an article on a math-oriented website, but I can't find the right keywords to locate it on Google again, and it is no longer in my browser history.

The gist of the problem posed there was to take a sequence of random numbers in the domain [domainStart, domainEnd) and use the bits of that sequence efficiently to map uniformly into the range [rangeStart, rangeEnd). Both the domain and the range are integers (more precisely, long values, not all of Z). What is an algorithm for this?

Implementation-wise, I have a function with this signature:

 long doRead(InputStream in, long rangeStart, long rangeEnd); 

The stream in is backed by a CSPRNG (fed by a hardware RNG through SecureRandom), and I am required to use it; the return value should lie between rangeStart and rangeEnd, but the obvious implementation is wasteful:

    long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
        long retVal = 0;
        long range = rangeEnd - rangeStart;
        // Fill one byte at a time until we have covered the range
        for (int i = 0; i < 8 && (1L << (8 * i)) < range; i++) {
            long b; // current input byte
            do {
                b = in.read();
                // but be sure we don't exceed range
            } while (retVal + (b << (8 * i)) >= range);
            retVal += b << (8 * i);
        }
        return retVal + rangeStart;
    }

I believe this is effectively the same idea as (rand() * (max - min)) + min, except that we discard the bits that push us past max. Rather than using a modulo operator, which can incorrectly bias the results toward lower values, we discard those bits and try again. Since hitting the CSPRNG can trigger reseeding (which can block the InputStream), I would like to avoid wasting random bits. Henry points out that this code is biased against 0 and 256 (for the range [0, 257)); Banthar demonstrates this with an example.
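To make the modulo bias mentioned above concrete, here is a minimal sketch. The class and method names are made up for illustration; it simply counts how often each residue occurs when every possible byte value is reduced with the modulo operator.

```java
// Hypothetical helper, for illustration only.
final class ModuloBiasDemo {
    // For every byte value 0..255, count which residue (b % modulus) it lands on.
    static int[] residueCounts(int modulus) {
        int[] counts = new int[modulus];
        for (int b = 0; b < 256; b++) {
            counts[b % modulus]++;
        }
        return counts;
    }
}
```

With a modulus of 6, the residues 0 to 3 are each hit by 43 byte values and the residues 4 and 5 by only 42, because 256 is not a multiple of 6. The lower values are therefore slightly favored.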

First edit: Henry reminded me that summation invokes the central limit theorem. I have fixed the code above to get around that problem.

Second edit: Mechanical snail suggested that I look at the source of Random.nextInt(). After reading it for a while, I realized that this problem is similar to the base conversion problem. See my answer below.

2 answers

Your algorithm produces biased results. Suppose rangeStart=0 and rangeEnd=257. If the first byte is greater than 0, that byte is the result. If it is 0, the result will be either 0 or 256, each with probability 1/2. Thus 0 and 256 are half as likely to be chosen as any other number.

I did a simple test to confirm this:

    p(0)=0.001945
    p(1)=0.003827
    p(2)=0.003818
    ...
    p(254)=0.003941
    p(255)=0.003817
    p(256)=0.001955
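A test along these lines could be sketched as follows. This is not the code that produced the numbers above. It is a reconstruction under assumptions: the class name, the seed, and the use of a seeded java.util.Random as a stand-in byte source are all hypothetical, and doRead is a compilable transcription of the byte-by-byte rejection from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Random;

// Hypothetical test harness, for illustration only.
final class ByteRejectionBias {
    // Byte-by-byte rejection as in the question: only the latest byte is retried.
    static long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
        long retVal = 0;
        long range = rangeEnd - rangeStart;
        for (int i = 0; i < 8 && (1L << (8 * i)) < range; i++) {
            long b;
            do {
                b = in.read();
            } while (retVal + (b << (8 * i)) >= range);
            retVal += b << (8 * i);
        }
        return retVal + rangeStart;
    }

    // Empirical frequency of each value in [0, rangeEnd) over the given number of samples.
    static double[] frequencies(long seed, int samples, int rangeEnd) {
        Random rng = new Random(seed); // stand-in for the real CSPRNG stream
        InputStream in = new InputStream() {
            @Override
            public int read() {
                return rng.nextInt(256);
            }
        };
        int[] counts = new int[rangeEnd];
        try {
            for (int s = 0; s < samples; s++) {
                counts[(int) doRead(in, 0, rangeEnd)]++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        double[] p = new double[rangeEnd];
        for (int v = 0; v < rangeEnd; v++) {
            p[v] = (double) counts[v] / samples;
        }
        return p;
    }
}
```

Calling frequencies with rangeEnd set to 257 and a large sample count should give roughly 1/512 for the values 0 and 256 and roughly 1/256 for everything in between, which is the pattern in the table above.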

I think you need to do the same as java.util.Random.nextInt and discard the whole number, not just the last byte.
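A minimal sketch of the "discard the whole number" idea is shown below. It is written in the same spirit as java.util.Random.nextInt but is not a copy of it: this variant reads just enough bytes to cover the range, masks the candidate down to the needed number of bits, and rejects the entire candidate if it falls outside the range. The class name is hypothetical, and the sketch assumes rangeEnd - rangeStart does not overflow a long.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, for illustration only.
final class WholeNumberRejection {
    static long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
        long range = rangeEnd - rangeStart; // assumed not to overflow
        if (range <= 0) {
            throw new IllegalArgumentException("rangeEnd must be greater than rangeStart");
        }
        // Number of bits needed to represent range - 1 (0 when range == 1).
        int bits = 64 - Long.numberOfLeadingZeros(range - 1);
        int numBytes = (bits + 7) / 8;
        long mask = (bits == 0) ? 0L : (-1L >>> (64 - bits));
        while (true) {
            long candidate = 0;
            for (int i = 0; i < numBytes; i++) {
                int b = in.read();
                if (b < 0) {
                    throw new EOFException("input exhausted");
                }
                candidate = (candidate << 8) | b;
            }
            candidate &= mask; // keep only the bits we need
            if (candidate < range) {
                return rangeStart + candidate;
            }
            // Otherwise discard the whole candidate, not just its last byte, and retry.
        }
    }
}
```

Because of the mask, each attempt succeeds with probability greater than 1/2, so the expected number of attempts is below 2. The price is that the masked-off bits and every rejected candidate are thrown away, which is exactly the waste the question is trying to avoid.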

After reading the source of Random.nextInt(), I realized that this problem is similar to the base conversion problem.

Rather than converting a single symbol at a time, it is more efficient to convert blocks of input symbols at a time through an accumulator "buffer" that is large enough to represent at least one symbol in both the domain and the range. The new code looks like this:

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    public int[] fromStream(InputStream input, int length, int rangeLow, int rangeHigh) throws IOException {
        int[] outputBuffer = new int[length];
        // buffer is initially 0, so there is only 1 possible state it can be in
        long numStates = 1;
        long buffer = 0;
        // Size of the output alphabet [rangeLow, rangeHigh)
        int alphaLength = rangeHigh - rangeLow;
        // Fill outputBuffer from 0 to length
        for (int i = 0; i < length; i++) {
            // Until buffer has sufficient data filled in from input to emit one symbol in the output alphabet, fill buffer.
            fill:
            while (numStates < alphaLength) {
                int next = input.read();
                if (next < 0) {
                    throw new EOFException("input exhausted");
                }
                // Shift buffer by 8 (*256) to mix in new data (of 8 bits)
                buffer = buffer << 8 | next;
                // Multiply by 256, as that is the number of states that we have possibly introduced
                numStates = numStates << 8;
            }
            // Emit the least significant symbol in alphaLength
            outputBuffer[i] = (int) (rangeLow + (buffer % alphaLength));
            // We have consumed the least significant portion of the input.
            buffer = buffer / alphaLength;
            // Track the number of states we've introduced into buffer
            numStates = numStates / alphaLength;
        }
        return outputBuffer;
    }

There is a fundamental difference between converting numbers between bases and this problem. To convert between bases, I think you need to have the whole number available to perform the calculation: successive divisions by the target base yield remainders, which are used to build the digits in the target alphabet. In this problem I don't need all of that information up front, as long as I don't bias the data, which means I can do what I did in the loop labeled "fill".
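One caveat: taking buffer % alphaLength is exactly uniform only when numStates happens to be a multiple of alphaLength. The sketch below shows one possible way to keep the buffered approach exactly uniform while still recycling leftover randomness. It is an assumption-laden variant, not the code above: the class name is hypothetical, and a rejection step is swapped in for the plain modulo.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, for illustration only.
final class RecyclingUniform {
    private final InputStream in;
    private long value = 0;     // invariant: uniform in [0, numStates)
    private long numStates = 1; // number of equally likely states of value

    RecyclingUniform(InputStream in) {
        this.in = in;
    }

    // Returns a uniform integer in [0, n).
    int next(int n) throws IOException {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        while (true) {
            // Mix in bytes until the buffer can represent at least one symbol.
            while (numStates < n) {
                int b = in.read();
                if (b < 0) {
                    throw new EOFException("input exhausted");
                }
                value = (value << 8) | b;
                numStates <<= 8;
            }
            long q = numStates / n;
            long limit = q * n; // largest multiple of n that fits in numStates
            if (value < limit) {
                int symbol = (int) (value % n);
                value /= n;     // the quotient is uniform in [0, q) and is kept
                numStates = q;
                return symbol;
            }
            // Rejected: the remainder above limit is still uniform, so keep it.
            value -= limit;
            numStates -= limit;
        }
    }
}
```

With this variant a single input byte is enough for eight calls to next(2), since every division by 2 is exact and nothing is rejected. For an alphabet of size 3, one byte yields a symbol in 255 of the 256 cases, and only the byte value 255 forces another read.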
