What is the correct way to generate a random float using a binary random number generator?

Let's say we have a binary random number generator, int r();, which returns zero or one, each with probability 0.5.

I looked at Boost.Random and they generate, say, 32 bits and do something like this (pseudocode):

 x = double(rand_int32()); return min + x / (2^32) * (max - min); 

I have serious doubts about this. A double has 53 bits of mantissa, so 32 bits can never properly generate a fully random mantissa; there are also rounding errors and so on.

What would be a fast way to generate a uniformly distributed float or double in the half-open range [min, max), assuming IEEE 754? The emphasis here is on correctness of the distribution, not on speed.

To make "correct" precise: the correct distribution would be the one obtained by taking an infinitely precise uniformly distributed random number generator, rounding each generated number to the nearest IEEE 754 representation, and keeping it if that representation still lies within [min, max); otherwise the number is discarded for the distribution.

P.S.: I would also be interested in correct solutions for open ranges.

+8
c++ random uniform
4 answers

Here is a correct approach, with no attempt at efficiency.

We start with a bignum class, and then a rational wrapper around those bignums.

We produce a range "sufficiently bigger" than our range [min, max), so that rounding our smaller_min and bigger_max produces floating point values outside that range, in our rational built on the bignum.

Now we split the range exactly in the middle (which we can do, since we have a rational bignum system). We pick one of the two halves at random.

If, after rounding, the top and bottom of the chosen half are (A) both outside [min, max) (on the same side, note!), you reject and restart from the very beginning.

If (B) the top and bottom of your range both round to the same double (or float, if you are returning a float), you are done, and you return that value.

Otherwise (C) you recurse into this new, smaller range (subdivide, pick at random, check).

There is no guarantee that this procedure halts, because you can either repeatedly land on the "edge" between two rounded doubles, or repeatedly pick values outside the range [min, max). The probability of this (never halting), however, is zero (assuming a good random number generator, and a [min, max) of nonzero size).

This also works for (min, max), or even for picking a number in a rounded, sufficiently fat Cantor set. As long as the measure of the valid range of reals that round to the correct floating point values is nonzero, and the range has compact support, this procedure can be run and has probability 1 of completing, but no hard upper bound on the time it may take.
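For illustration, here is a toy sketch of the subdivide-and-round procedure, restricted to the range [0, 1), with a 64-bit dyadic numerator standing in for the bignum rationals. The function name, the 63-bit depth cap, and the restart on hitting that cap are my own simplifications, not part of the full scheme described above:

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Toy sketch of the subdivide-and-round procedure for the range [0, 1):
// the current interval [n/2^d, (n+1)/2^d) is tracked exactly in a 64-bit
// numerator (a stand-in for the bignum rationals), halved with one random
// bit per step until both endpoints round to the same double (case B).
// If that double falls outside [0, 1), we reject and restart (case A).
double subdivide_unit(std::mt19937_64& gen) {
    for (;;) {
        std::uint64_t n = 0;
        bool done = false;
        double result = 0.0;
        for (int d = 1; d <= 63; ++d) {
            n = (n << 1) | (gen() & 1);   // pick a half at random
            double lo = std::ldexp(static_cast<double>(n), -d);
            double hi = std::ldexp(static_cast<double>(n + 1), -d);
            if (lo == hi) { done = true; result = lo; break; }
        }
        if (done && result < 1.0) return result;   // case B, inside range
        // Otherwise we either rounded onto 1.0 (reject) or hit the 63-bit
        // depth cap (a shortcut of this sketch, probability roughly 2^-9
        // and only near zero): restart.
    }
}
```

Note the rejection branch: the subinterval just below 1 rounds to the double 1.0, which lies outside [0, 1), so it must be rejected exactly as case (A) prescribes.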

+3

The problem here is that in IEEE 754 the representable doubles are not equidistributed. That is, if we have a generator producing real numbers, say in (0,1), and then map them to the representable IEEE 754 numbers, the result will not be equidistributed.

Thus we have to define what "equidistribution" should mean. Assuming that each IEEE 754 number is simply a representative for the probability of lying in the interval defined by IEEE 754 rounding, the procedure of first generating equidistributed "numbers" and then rounding them to IEEE 754 will, by definition, generate equidistributed "IEEE 754 numbers".

Therefore, I believe the formula above will get arbitrarily close to such a distribution if we just pick the precision high enough. If we restrict the problem to finding a number in [0,1), this means restricting to the set of denormalized IEEE 754 numbers, which are in one-to-one correspondence with 53-bit integers. Thus it should be both fast and correct to generate just the mantissa with a 53-bit binary random number generator.

IEEE 754 arithmetic is always "arithmetic at infinite precision followed by rounding", i.e. the IEEE 754 number representing a*b is the one closest to a*b (in other words, you can think of a*b computed at infinite precision, then rounded to the closest IEEE 754 number). Hence I believe that min + (max-min) * x, where x is a denormalized number, is a valid approach.
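As a concrete sketch of this approach (the function name is illustrative, and a 64-bit engine stands in for 53 calls to the question's binary generator r()):

```cpp
#include <cstdint>
#include <random>

// Sketch of the approach above: 53 random bits become an equidistant value
// x in [0, 1), which is then mapped linearly into [min, max) with one
// IEEE 754 rounding per operation.
double uniform_in(double min, double max, std::mt19937_64& gen) {
    std::uint64_t bits = gen() >> 11;   // keep 53 random bits
    double x = bits * 0x1p-53;          // exact: multiples of 2^-53 in [0, 1)
    return min + (max - min) * x;       // rounded once per operation
}
```

Note that for some min/max, the final rounding can land exactly on max when x is very close to 1; a strictly half-open result needs an explicit rejection of that case.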

(Note: as you can see from my comment, at first I did not realize that you were asking about the case with min and max different from 0 and 1. Denormalized numbers have the property that they are equidistant; hence you get the equidistribution by mapping the 53 bits to the mantissa. Afterwards you can use floating point arithmetic, since it is correct up to machine precision. If you use the reverse mapping, you recover the equidistribution.)

See this question for another aspect of this problem: Scaling a uniform random range into a double one.

+1

std::uniform_real_distribution .

There's a really nice talk by STL from this year's Going Native conference that explains why you should use the standard distributions whenever possible. In short, hand-rolled code tends to be of ridiculously poor quality (think std::rand() % 100), or to have more subtle uniformity flaws, such as (std::rand() * 1.0 / RAND_MAX) * 99, which is the example given in the talk and a special case of the code posted in the question.
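For reference, a minimal usage sketch of the standard facility (the engine choice, seeding, and function name are illustrative):

```cpp
#include <random>

// Minimal use of the standard distribution recommended above: a seeded
// engine plus std::uniform_real_distribution, which produces values in
// [min, max) and does the scaling internally.
double draw_uniform(double min, double max, std::mt19937_64& gen) {
    std::uniform_real_distribution<double> dist(min, max);
    return dist(gen);
}
```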

EDIT: I took a look at libstdc++'s implementation of std::uniform_real_distribution, and here is what I found:

The implementation produces a number in the range [dist_min, dist_max) by a simple linear transformation of a number produced in the range [0, 1). It generates that source number using std::generate_canonical, whose implementation can be found here (at the end of the file). std::generate_canonical determines how many times (denoted k) the range of the distribution, expressed as an integer and denoted here as r*, will fit into the mantissa of the target type. What it then does, essentially, is generate one number in [0, r) for each r-sized segment of the mantissa and, using arithmetic, fill each segment accordingly. The formula for the resulting value can be expressed as

 Σ(i=0, k-1, X_i / r^(i+1)) 

where each X_i is an independent stochastic variable in [0, r). Each division by the range is equivalent to a shift by the number of bits used to represent it (i.e. log2(r)), and thus fills the corresponding mantissa segment. This way the entire precision of the target type is used, and since the resulting range is [0, 1), the exponent stays 0** (modulo bias), so you do not get the uniformity problems that arise once you start tinkering with the exponent.
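A sketch of that segment-filling idea, instantiated with r = 2^32 and k = 2 (my own toy instantiation, not the library code; two 32-bit draws cover the 53-bit double mantissa):

```cpp
#include <cstdint>
#include <random>

// Toy instantiation of the segment-filling formula with r = 2^32, k = 2:
// two independent draws X_0, X_1 in [0, 2^32) fill the high and low
// mantissa segments, giving X_0/r + X_1/r^2 in [0, 1).
double canonical_sketch(std::mt19937_64& gen) {
    const double r = 4294967296.0;   // 2^32
    double x0 = static_cast<double>(static_cast<std::uint32_t>(gen()));
    double x1 = static_cast<double>(static_cast<std::uint32_t>(gen()));
    return x0 / r + x1 / (r * r);    // each division is an exact shift
}
```

As the answer's off-by-one suspicion hints, rounding of the final sum can in principle land exactly on 1.0; this sketch shares that rare edge case.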

I would not trust this method to be cryptographically secure (and I have my suspicions about possible off-by-one errors in the computation of the size of r), but I imagine it is significantly more reliable in terms of uniformity than the Boost implementation you posted, and definitely better than anything involving std::rand.

It may be worth noting that the Boost code is actually a degenerate case of this algorithm, where k = 1, meaning that it is equivalent if the input range requires at least 23 bits to represent its size (IEEE 754 single precision) or at least 52 bits (double precision). This means a minimum range of ~8.4 million or ~4.5e15, respectively. In light of this information, I do not think that, if you are using a binary generator, the Boost implementation is quite going to cut it.

After a brief look at libc++'s implementation, it looks like they are using the same algorithm, implemented slightly differently.

(*) r is actually the input range plus one. This allows using the max value of the URNG as valid input.

(**) Strictly speaking, the encoded exponent is not 0, since IEEE 754 encodes an implicit leading 1 before the radix point. Conceptually, however, this is irrelevant to this algorithm.

+1

AFAIK, the correct (and probably also the fastest) way is to first create a 64-bit unsigned integer whose 52 fraction bits are random bits and whose exponent field is 1023, which, if the bits are reinterpreted as an (IEEE 754) double, gives a uniformly distributed random value in the range [1.0, 2.0). The last step is then to subtract 1.0 from that, which yields a uniformly distributed random double value in the range [0.0, 1.0).

In pseudo code:

 rndDouble = bitCastUInt64ToDouble((1023 << 52) | (rndUInt64 & 0xFFFFFFFFFFFFF)) - 1.0 

This method is mentioned here: http://xoroshiro.di.unimi.it (see "Generating uniform doubles in the unit interval").

EDIT: The recommended method has since been changed to: (x >> 11) * (1. / (UINT64_C(1) << 53))

See the link above for more details.
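Both variants can be written in portable C++ along these lines (the function names are mine; std::memcpy is the well-defined way to do the bit cast):

```cpp
#include <cstdint>
#include <cstring>

// Bit-cast variant: force the exponent field to 1023 so the bits encode
// a double in [1.0, 2.0), then subtract 1.0.
double unit_from_bits_cast(std::uint64_t x) {
    std::uint64_t u = (UINT64_C(1023) << 52) | (x & UINT64_C(0xFFFFFFFFFFFFF));
    double d;
    std::memcpy(&d, &u, sizeof d);   // well-defined bit reinterpretation
    return d - 1.0;                  // [1.0, 2.0) -> [0.0, 1.0)
}

// Updated variant from the same page: keep the top 53 bits and scale.
double unit_from_bits_shift(std::uint64_t x) {
    return (x >> 11) * (1.0 / (UINT64_C(1) << 53));
}
```

The shift variant can produce every multiple of 2^-53 in [0, 1), whereas the bit-cast variant loses one bit of resolution to the subtraction near zero.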

+1
