Is the libc random number generator wrong?

Consider an algorithm for checking the probability that a certain number is selected from a set of N unique numbers after a certain number of attempts (for example, with N = 2, what is the probability in roulette (without 0) that it takes X trying to win black?).

The correct distribution for this is pow (1-1 / N, X-1) * (1 / N).

However, when I test this using the following code, there is always a deep ditch at X = 31, regardless of N, and regardless of seed.

Is this an internal flaw that cannot be prevented due to the specific implementation of the PRNG used, is this a real mistake, or am I not seeing anything obvious?

// C #include <sys/times.h> #include <math.h> #include <stdio.h> int array[101]; void main(){ int nsamples=10000000; double breakVal,diffVal; int i,cnt; // seed, but doesn't change anything struct tms time; srandom(times(&time)); // sample for(i=0;i<nsamples;i++){ cnt=1; do{ if((random()%36)==0) // break if 0 is chosen break; cnt++; }while(cnt<100); array[cnt]++; } // show distribution for(i=1;i<100;i++){ breakVal=array[i]/(double)nsamples; // normalize diffVal=breakVal-pow(1-1/36.,i-1)*1/36.; // difference to expected value printf("%d %.12g %.12g\n",i,breakVal,diffVal); } } 

Tested on updated Xubuntu 12.10 with libc6 2.15-0ubuntu20 and Intel Core i5-2500 SandyBridge, but I discovered this a few years ago on an older Ubuntu machine.

I also tested this on Windows 7 using Unity3D / Mono (not sure which version of Mono is, though), and here the ditch happens at X = 55 when using System.Random, while Unity built in Unity.Random has no visible ditch (at least not for X <100).

Distribution: enter image description here

Differences: enter image description here

+20
c math algorithm random glibc
Feb 04 '13 at 0:25
source share
3 answers

This is because the glibc random() function is not random enough. According to this page , for the random numbers returned by random() , we have:

o i = (o i-3 + o i-31 ) % 2^31

or

o i = (o i-3 + o i-31 + 1) % 2^31 .

Now take x i = o i % 36 and suppose the first equation above is used (this happens with a 50% chance for each number). Now, if x i-31 =0 and x i-3 !=0 , then the probability that x i =0 less than 1/36. This is due to the fact that 50% of the time o i-31 + o i-3 will be less than 2 ^ 31, and when this happens,

x i = o i % 36 = (o i-3 + o i-31 ) % 36 = o i-3 % 36 = x i-3 ,

which is nonzero. This leads to the fact that in the ditch you see 31 samples after 0 counts.

+10
Feb 04 '13 at 1:10
source share

What is measured in this experiment is the interval between successful tests of the Bernoulli experiment, where success is defined as random() mod k == 0 for some k (36 in OP). Unfortunately, this is overshadowed by the fact that the implementation of random() means that Bernoulli's tests are not statistically independent.

We will write rnd i to output i th "random ()" and note that:

rnd i = rnd i-31 + rnd i-3 with a probability of 0.75

rnd i = rnd i-31 + rnd i-3 + 1 with a probability of 0.25

(see picture below for illustration).

Suppose rnd i-31 mod k == 0 , and we are currently looking at rnd i . Then it should be so that rnd i-3 mod k β‰  0 , because otherwise we would calculate the cycle as the length k-3 .

But (most of the time) (mod k): rnd i = rnd i-31 + rnd i-3 = rnd i-3 β‰  0 .

Thus, the current study is not statistically independent from previous trials, and the 31st trial after success will be much less successful than in the unbiased series of Bernoulli trials.

The usual advice for using linear congruent generators that does not actually apply to the random() algorithm is to use higher order bits instead of the lower bits, because the higher order bit is β€œmore” random (that is, less correlated with sequential values). But this will not work in this case either, because the above identities are equally valid for the high log k bits function as for the mod k == low log k bits function.

In fact, we could expect a linear congruent generator to work better, especially if we use high-order output bits, because although LCG is not particularly good at Monte Carlo simulations, it does not suffer from random() linear feedback.




random , for the default case:

Let state be a vector of unsigned lengths. Initialize state 0 ...state 30 using the seed, some fixed values, and a mixing algorithm. For simplicity, we can consider the state vector to be infinite, although only the last 31 values ​​are used, therefore it is actually implemented as a ring buffer.

To generate rnd i : (Note: βŠ• mod 2 32 .)

state i = state i-31 βŠ• state i-3

rnd i = (state i - (state i mod 2)) / 2

Now notice that:

(i + j) mod 2 = i mod 2 + j mod 2 if i mod 2 == 0 or j mod 2 == 0

(i + j) mod 2 = i mod 2 + j mod 2 - 2 if i mod 2 == 1 and j mod 2 == 1

If i and j evenly distributed, the first case will occur in 75% of cases, and the second case - 25%.

So, by substituting into the generation formula:

rnd i = (state i-31 βŠ• state i-3 - ((state i-31 + state i-3 ) mod 2)) / 2

= ((state i-31 - (state i-31 mod 2)) βŠ• (state i-3 - (state i-3 mod 2))) / 2 or

= ((state i-31 - (state i-31 mod 2)) βŠ• (state i-3 - (state i-3 mod 2)) + 2) / 2

Two cases can be further reduced to:

rnd i = rnd i-31 βŠ• rnd i-3

rnd i = rnd i-31 & oplus; rnd i-3 + 1

As above, the first case arises in 75% of cases, considering that rnd i-31 and rnd i-3 are independently inferred from the uniform distribution (which they are not, but this is a reasonable first approximation).

+7
Feb 04 '13 at 2:31
source share

As others have noted, random() is not random enough.

Using higher bits instead of lower ones will not help in this case. According to the manual ( man 3 rand ), old rand() implementations had a problem in the lower bits. Therefore, random() recommended. Although the current implementation of rand() uses the same generator as random() .

I tried using the old rand() correctly:

 if ((int)(rand()/(RAND_MAX+1.0)*36)==0) 

... and got the same deep ditch at X = 31

Interestingly, if I mix rand() numbers with another sequence, I get rid of the ditch:

 unsigned x=0; //... x = (179*x + 79) % 997; if(((rand()+x)%36)==0) 

I am using the old Linear Congruential Generator . I selected 79, 179 and 997 randomly from the prime table. This should generate a repeating sequence of length 997.

However, this trick probably introduced some randomness, some trace ... The resulting mixed sequence will undoubtedly not give other statistical tests. x never takes the same value in successive iterations. Indeed, exactly 997 iterations are required to repeat each value.

'' [..] random numbers should not be generated using a randomly selected method. Some theory should be used "(D.E. Knut," The Art of Computer Programming ", Volume 2)

For modeling, if you want to be sure, use the Mersenne Twister

+1
Feb 04 '13 at 11:52
source share



All Articles