Improving "randomness" in extending the range of rand ()

Question

Improving "randomness" in extending the range of rand ()

I play with two algorithms that I found on other SO posts (listed in the links below), and try to figure out how to improve distribution. I am effectively expanding the random number range by doubling the number of bits and want the distribution to be as uniform as possible, removing (or at least reducing) the effects of modular bias and other artifacts for the shuffle algorithm that will use the result of my modified random number generator.

So, I understand that if I initialize my RNG with a constant seed (i.e.: srand(1) ), I will get the same pattern of deterministic exits from the rand() call in a for loop. Now, if I initialized my seed via srand(time(NULL)) , it would be a different template, but it still may not help me with the following problem: I am trying to figure out if I should implement the following algorithm:

Take two random numbers a, b
Calculate a * (RAND_MAX + 1) + b

Can I:

Create all possible coordinate pairs (a, b), where a,b ∈ Z+ on [0, RAND_MAX] ( a and b are positive integers from zero to RAND_MAX inclusive).
Increase the uniformity of the entire distribution (i.e., an optimally flat histogram).

While the output of rand() should be evenly distributed, I don't know if it guaranteed to give me values for N, N + 1 calls of rand in each loop and give me a list of each pair at point (1) before random sequence will be repeated again. My new random number generator can theoretically generate random values on [0, RAND_MAX ^ 2] , but I don’t know if there can be “holes” like values in this range that can never be generated by my algorithm.

I tried to continue this question myself, but I could not find information on how long a random rand() sequence was generated in C until it repeated. Without this and other information, I could not understand whether it is possible to generate each pair (a, b).

So, using rand() , is it possible to reach point (1), and if so, are there any solid suggestions on how to optimize your “randomness” according to point (2)?

Thanks for your time and help.

Update

Later I reviewed this problem and modeled it using an 8-bit PRNG. Although he could indeed generate all possible coordinate pairs, the distribution was really quite interesting and, of course, uneven. In the end, I read several articles / articles on PRNG and used the Mersenne Twiser algorithm to generate the extra bits needed (i.e. MT19937-64).

References

Extend range rand () max, Available 2014-05-07, < /questions/813325/extend-rand-max-range >
Shuffle array in C, Accessed 2014-05-07, < /questions/168552/shuffle-array-in-c >

+7

c algorithm random statistics

Devnull May 07 '14 at 16:57

source share

1 answer

DPenner1 · Accepted Answer · 2014-05-07T18:55:24+0000

Assumptions

As pointed out in the comments, the behavior of rand() is implementation dependent. So, let's make some simplifying assumptions to get to the point of the question:

rand() can generate all values from 0 to RAND_MAX . Justfication . If this failed, then it would be even more difficult to create all possible pairs (a, b).
rand() generates a statistically random sequence. Rationale : the result of compiling two random functions (or the same thing twice) is only good as a basic random function.

Of course, we should not expect the result to be better than the building blocks, so any flaws in the implementation of rand() will reflect itself in any functions that consist of it.

Distribution holes

The sowing rand() generates a determinate sequence for a given seed, since the seed determines the initial state of the PRNG. The maximum period of the sequence is 2 ^N where N is the number of bits in the state. Note that in fact the state may have more bits than RAND_MAX , we will assume RAND_MAX = (2 ^N - 1) for this section. Since this is a sequence, generating two consecutive "random" N-bit values a and b means that a & ne; b. Therefore, the method a * ( RAND_MAX + 1) + b will have some holes.

A little explanation for & ne; b: PRNGs work by maintaining an internal state of N bits. He uses this state uniquely to determine his next state, therefore, as soon as the same state is repeated, the sequence begins to repeat. The number of states that passed before the start of a sequence is called a period. So technically we could have a = b, but that means period 1, and that is a very bad PRNG . For more information, a helpful answer has been posted on the Software Engineering website for PRNG periods .

Holeless Algorithm

One way to allow consecutive “random” calls to be equal is to generate 2 N-bit numbers, but only consider a certain number of significant bits, that is, discard some. Now we can have a = b, but with a very small probability than another random number c. Note that this is similar to how the Java random number generator works. It is seeded with 48 bits, but outputs a 32-bit random number, discarding 16 bits (assuming in bits in state = number of bits in state).

However, since you need values greater than RAND_MAX , what you could do is use the method described above and then combine the bits until you get enough bits to reach the desired maximum (although, again, the distribution is not quite shape) .

Improving "randomness" in extending the range of rand ()

Update

Assumptions

Distribution holes

Holeless Algorithm

More articles: