Homogeneity of random numbers taken modulo N

Question

Homogeneity of random numbers taken modulo N

One common way to select a random number in [0, n) is to take the result of rand() modulo n: rand() % n . However, even if the results returned by the available rand() implementation are completely homogeneous, there should be no problem with the uniformity of the obtained [0, n) numbers when RAND_MAX + 1 does not divide evenly by n? For example. Assume that RAND_MAX is 2 and n is 2. Then out of the 3 possible outputs of rand() : 0, 1 and 2 we get 0, 1 and 0, respectively, when we use them modulo n. Therefore, the output will not be uniform.

Is this a real problem in practice? What is the best way to select random numbers in [0, n) uniformly obtained from rand() output, preferably without any floating point arithmetic?

+11

c random uniform

dragonroot Oct 27

source share

4 answers

It depends on the:

Value RAND_MAX
Your value is N

Suppose your RAND_MAX is 2 ^ 32. If N is pretty small (say 2), then the offset is 1/2/31 - or too little to notice.

But if N is slightly larger, say 2 ^ 20, then the offset is 1/2 ^ 12, or about 1 in 4096. A lot more, but still quite a bit.

+4

slashingweapon Oct 27

source share

One approach you can do is the following:

Knowing the value of N , you do R_MAX = ((RAND_MAX + 1) / N) * N; for homogeneity.

So you can execute your own rand() function:

 int custom_rand(int mod) { int x = rand(); const int R_MAX = ((RAND_MAX + 1) / mod) * mod; while (x > R_MAX) { // discard the result if it is bigger x = rand(); } return (x % mod); }

+1

Murilo Vasconcelos Oct. 27 '12 at 22:11

source share

There are two problems with using the remainder (% is not a “modular” operator in C) for a uniform random number in the given range. Firstly, there is a slight shift to smaller numbers (mentioned above), and secondly, that typical PRNGs are usually less random in lower order bits. I seem to remember that from Knuth ("The Art of Programming", Vol II, Seminumerical Algorithms) along with the statement that (after translating from MIX to C) rand ()% 2 is a weak source of random single bits. Better to choose (rand ()> RAND_MAX / 2) (or check the high order bit if RAND_MAX is almost 2.)

The rest should be good enough randomly at small intervals. Avoid it for modeling. Actually, avoid rand () at all for large simulations or Monte Carlo calculations. Implementations have a period of the order of 2 ^ 32 or less. It is easy to exceed 4 billion tests on a processor with a frequency of 2+ GHz.

+1

Mike Housky Oct 27

source share

Kevin · Accepted Answer · 2012-10-27 22:03

You are right, rand() % N not exactly evenly distributed. Exactly how important this is depends on the range of numbers you want and on the degree of randomness you want, but if you want enough randomness that you don't care, you don't want to use rand() anyway. Get a real random number generator.

However, in order to get a real random distribution, change to the next power of 2 and the sample until you get one of the desired range (for example, for 0-9, use while(n = rand()%0x10 > 10); ).

Homogeneity of random numbers taken modulo N

More articles: