What is the optimal algorithm for creating an unbiased random integer within a range?

In this StackOverflow question:

Creating a random integer from a range

the accepted answer offers the following formula for generating a random integer between a given min and max, with min and max included in the range:

 output = min + (rand() % (int)(max - min + 1)) 

But it also says that

It is still slightly biased towards lower numbers ... It is also possible to extend it so that it removes the bias.

But the answer does not explain why the result is biased towards lower numbers or how to eliminate the bias. So the question is: is this the optimal approach to generating a random integer within a (signed) range, relying on nothing fancier than the rand() function, and if it is, how can the bias be removed?

EDIT:

I just tested the while-loop algorithm suggested by @Joey against floating-point extrapolation:

    static const double s_invRandMax = 1.0 / ((double)RAND_MAX + 1.0);
    return min + (int)((double)(max + 1 - min) * rand() * s_invRandMax);

to see how evenly the "balls" fall and are distributed between several "buckets": one test for the floating-point extrapolation and another for the while-loop algorithm. But the results differed depending on the number of "balls" (and "buckets"), so I could not easily pick a winner. Working code can be found on this Ideone page. For example, with 10 buckets and 100 balls, the maximum deviation from the ideal probability among the buckets is smaller for floating-point extrapolation than for the while-loop algorithm (0.04 versus 0.05), but with 1000 balls the while-loop algorithm's maximum deviation is smaller (0.024 versus 0.011), and with 10000 balls floating-point extrapolation is ahead again (0.0034 versus 0.0053), and so on without any real consistency. Since it seems that neither algorithm consistently produces a more uniform distribution than the other, I lean towards floating-point extrapolation, because it runs faster than the while-loop algorithm. So is floating-point extrapolation a good choice, or are my tests and conclusions not entirely correct?
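For reference, here is a minimal sketch of the kind of bucket test described above. This is a hypothetical reconstruction, not the actual Ideone code; maxDeviation and rangeRandomFloat are names invented for this sketch:

    #include <cstdio>
    #include <cstdlib>
    #include <cmath>

    // The floating-point extrapolation from the question.
    static int rangeRandomFloat(int min, int max) {
        static const double s_invRandMax = 1.0 / ((double)RAND_MAX + 1.0);
        return min + (int)((double)(max + 1 - min) * rand() * s_invRandMax);
    }

    // Throw `balls` samples into `nBuckets` buckets and return the largest
    // deviation of any bucket's observed frequency from the ideal 1/nBuckets.
    static double maxDeviation(int (*alg)(int, int), int nBuckets, int balls) {
        int counts[100] = {0};  // assumes nBuckets <= 100
        for (int i = 0; i < balls; ++i)
            ++counts[alg(0, nBuckets - 1)];
        double worst = 0.0;
        for (int b = 0; b < nBuckets; ++b)
            worst = std::fmax(worst, std::fabs(counts[b] / (double)balls - 1.0 / nBuckets));
        return worst;
    }

    int main() {
        std::printf("10 buckets, 100 balls: %.4f\n", maxDeviation(rangeRandomFloat, 10, 100));
        std::printf("10 buckets, 1000 balls: %.4f\n", maxDeviation(rangeRandomFloat, 10, 1000));
    }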

+11
c++ c random uniform
Aug 01 '12 at 12:05
4 answers

The problem arises whenever the number of possible outputs of the random number generator (RAND_MAX + 1) is not evenly divisible by the size of the desired range (max - min + 1). Since the mapping from random numbers to outputs is many-to-one, some outputs are necessarily reached by more random numbers than others. This holds regardless of how the mapping is performed: you can use modulo, division, conversion to floating point, or whatever voodoo you can come up with; the underlying problem remains.

The magnitude of the problem is very small, and undemanding applications can generally get away with ignoring it. The smaller the range and the larger RAND_MAX is, the less pronounced the effect.
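To get a feel for the magnitude, the exact probabilities can be computed directly. A minimal sketch, assuming a typical RAND_MAX of 32767 and a desired range of size 10:

    #include <cstdio>

    int main() {
        const long long N = 32768;  // assumption: RAND_MAX + 1 on a typical implementation
        const long long n = 10;     // size of the desired range
        long long extra = N % n;    // number of outputs that receive one extra input each
        std::printf("%lld outputs with probability %.6f\n", extra, (N / n + 1) / (double)N);
        std::printf("%lld outputs with probability %.6f\n", n - extra, (N / n) / (double)N);
    }

With these numbers, 8 of the 10 outputs occur with probability about 0.100006 and the other 2 with about 0.099976, a relative bias well under 0.1%.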

I took your sample program and tweaked it a bit. First, I created a special version of rand with a range of only 0-255 to better demonstrate the effect. I made a few tweaks to rangeRandomAlg2. Finally, I changed the number of "balls" to 1,000,000 to improve consistency. You can see the results here: http://ideone.com/4P4HY

Note that the floating-point version produces two tightly grouped probabilities, around 0.101 and 0.097, with nothing in between. That is the bias in action.
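That clustering can be checked by exhaustive enumeration rather than sampling; here is a small sketch, assuming the same 0-255 generator and 10 buckets as in the modified program:

    #include <cstdio>

    int main() {
        int buckets[10] = {0};
        // Feed every one of the 256 generator values through the float mapping.
        for (int x = 0; x < 256; ++x)
            ++buckets[(int)(x * 10 / 256.0)];
        for (int i = 0; i < 10; ++i)
            std::printf("bucket %d: %d/256 = %.4f\n", i, buckets[i], buckets[i] / 256.0);
    }

Six buckets receive 26 of the 256 inputs (about 0.1016) and four receive 25 (about 0.0977), which matches the two clusters above.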

I think calling this the "Java algorithm" is a bit misleading; I am sure it is much older than Java.

    int rangeRandomAlg2(int min, int max)
    {
        int n = max - min + 1;
        int remainder = RAND_MAX % n;
        int x;
        do {
            x = rand();
        } while (x >= RAND_MAX - remainder);
        return min + x % n;
    }
+9
Aug 01 '12

The problem is that you are doing a modulo operation. This would be no problem if your modulus evenly divided the number of values rand() can return, but usually it does not. As a very contrived example, suppose RAND_MAX is 10 and your modulus is 3. You would get the following possible random numbers and the following resulting remainders:

    rand():     0  1  2  3  4  5  6  7  8  9  10
    rand() % 3: 0  1  2  0  1  2  0  1  2  0  1

As you can see, 0 and 1 are slightly more likely than 2.

One solution to this problem is rejection sampling: by disallowing the numbers 9 and 10 above, you can make the resulting distribution uniform again. The tricky part is figuring out how to do it efficiently. A very good example (one that took me two days to understand why it works) can be found in Java's java.util.Random.nextInt(int).
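The core of that check can be sketched as follows. This is only a hedged adaptation of the idea, not Java's actual source, and uniformBelow is a name invented for this example:

    #include <cstdlib>

    // x - r is the start of the block of n consecutive values that x falls
    // into; if that block would extend past RAND_MAX it is truncated, so x
    // must be rejected to keep the result unbiased.
    int uniformBelow(int n) {
        int x, r;
        do {
            x = rand();
            r = x % n;
        } while ((long long)x - r + (n - 1) > RAND_MAX);  // long long avoids int overflow
        return r;
    }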

The reason Java's algorithm is a bit tricky is that it avoids slow operations such as multiplication and division for the check. If you do not care too much, you can also do it the naive way:

    int n = (int)(max - min + 1);
    int remainder = RAND_MAX % n;
    int x, output;
    do {
        x = rand();
        output = x % n;
    } while (x >= RAND_MAX - remainder);
    return min + output;

EDIT: Fixed a fencepost error in the above code; it now works as it should. I also created a small test program (in C#; taking a single PRNG for numbers from 0 to 15 and building a PRNG for numbers from 0 to 6 from it in various ways):

    using System;

    class Rand {
        static Random r = new Random();

        static int Rand16() {
            return r.Next(16);
        }

        static int Rand7Naive() {
            return Rand16() % 7;
        }

        static int Rand7Float() {
            return (int)(Rand16() / 16.0 * 7);
        }

        // corrected
        static int Rand7RejectionNaive() {
            int n = 7, remainder = 16 % n, x, output;
            do {
                x = Rand16();
                output = x % n;
            } while (x >= 16 - remainder);
            return output;
        }

        // adapted to fit the constraints of this example
        static int Rand7RejectionJava() {
            int n = 7, x, output;
            do {
                x = Rand16();
                output = x % n;
            } while (x - output + 6 > 15);
            return output;
        }

        static void Test(Func<int> rand, string name) {
            var buckets = new int[7];
            for (int i = 0; i < 10000000; i++) buckets[rand()]++;
            Console.WriteLine(name);
            for (int i = 0; i < 7; i++) Console.WriteLine("{0}\t{1}", i, buckets[i]);
        }

        static void Main() {
            Test(Rand7Naive, "Rand7Naive");
            Test(Rand7Float, "Rand7Float");
            Test(Rand7RejectionNaive, "Rand7RejectionNaive");
        }
    }

The result looks as follows (pasted into Excel with conditional cell coloring added so that the differences are more obvious):

[image: table of bucket counts for each method, with conditional coloring]

Now that I have fixed my mistake in the rejection sampling, it works as it should (before, it was biased towards 0). As you can see, the float method is not perfect at all; it just distributes the biased numbers differently.
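One way to see this is to enumerate all 16 generator values through the float mapping exactly; a small C++ sketch mirroring the C# Rand7Float above:

    #include <cstdio>

    int main() {
        int buckets[7] = {0};
        // Run all 16 possible generator values through (int)(x / 16.0 * 7).
        for (int x = 0; x < 16; ++x)
            ++buckets[(int)(x / 16.0 * 7)];
        for (int i = 0; i < 7; ++i)
            std::printf("%d: %d/16\n", i, buckets[i]);
    }

Buckets 0 and 3 each receive 3 of the 16 inputs while the rest receive 2, so the bias is merely moved around, not removed.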

+14
Aug 01 '12

It is easy to see why this algorithm produces a biased sample. Suppose your rand() function returns uniform integers from the set {0, 1, 2, 3, 4}. If I want to use this to generate a random bit, 0 or 1, I would say rand() % 2. The set {0, 2, 4} gives me 0, and the set {1, 3} gives me 1, so clearly I sample 0 with 60% and 1 with 40% probability. Not uniform at all!

To fix this, you must either make sure that the size of your desired range divides the range of the random number generator, or else discard the result whenever the random number generator returns a number at or above the largest multiple of the target range size that fits.

In the example above, the target range size is 2, and the largest multiple of it that fits into the generator's range is 4, so we discard any sample that is not in the set {0, 1, 2, 3} and roll again.
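As a minimal sketch of this rejection step (rand5 here is a stand-in built from rand() purely to make the example self-contained):

    #include <cstdio>
    #include <cstdlib>

    // Stand-in 5-valued uniform source (assumes the low bits of rand() are
    // uniform); any generator returning uniform values in {0..4} would do.
    static int rand5() {
        int x;
        do { x = rand() & 7; } while (x >= 5);  // uniform over {0, 1, 2, 3, 4}
        return x;
    }

    // The rejection step described above: discard 4, leaving the uniform set
    // {0, 1, 2, 3}, in which exactly two values map to each bit.
    static int randomBit() {
        int x;
        do { x = rand5(); } while (x >= 4);
        return x % 2;
    }

    int main() {
        int count[2] = {0, 0};
        for (int i = 0; i < 100000; ++i) ++count[randomBit()];
        std::printf("0: %d, 1: %d\n", count[0], count[1]);
    }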

+6
Aug 01 '12 at 12:12

The simplest solution is std::uniform_int_distribution<int>(min, max) .
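For completeness, a minimal usage sketch (C++11; the engine and seed choices here are just common defaults, not the only option):

    #include <iostream>
    #include <random>

    int main() {
        std::random_device rd;                          // nondeterministic seed source
        std::mt19937 gen(rd());                         // Mersenne Twister engine
        std::uniform_int_distribution<int> dist(1, 6);  // closed range [1, 6], bias-free
        for (int i = 0; i < 10; ++i)
            std::cout << dist(gen) << ' ';
        std::cout << '\n';
    }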

+3
Aug 03 '12 at 15:05


