Why is random() * random() different from random() ** 2?

Is there any difference between random() * random() and random() ** 2? random() returns a uniformly distributed value between 0 and 1.

When testing both ways of generating squared random numbers, I noticed a slight difference. I generated 100,000 values with each method and counted how many fell into each interval of width 0.01 (0.00 to 0.01, 0.01 to 0.02, ...). It seems the two versions produce different distributions.

By squaring a single random number instead of multiplying two random numbers, you reuse the same random value, but I would think the distribution should remain the same. Is there any difference? If not, why does my test show one?


I generate two binned distributions (histograms) for random() * random() and one for random() ** 2 as follows:

    from random import random

    lst = [0 for i in range(100)]
    lst2, lst3 = list(lst), list(lst)

    # create two random distributions for random() * random()
    for i in range(100000):
        lst[int(100 * random() * random())] += 1
    for i in range(100000):
        lst2[int(100 * random() * random())] += 1
    # and one for random() ** 2
    for i in range(100000):
        lst3[int(100 * random() ** 2)] += 1

which gives

    >>> lst
    [5626, 4139, 3705, 3348, 3085, 2933, 2725, 2539, 2449, 2413, 2259, 2179, 2116, 2062, 1961, 1827, 1754, 1743, 1719, 1753, 1522, 1543, 1513, 1361, 1372, 1290, 1336, 1274, 1219, 1178, 1139, 1147, 1109, 1163, 1060, 1022, 1007, 952, 984, 957, 906, 900, 843, 883, 802, 801, 710, 752, 705, 729, 654, 668, 628, 633, 615, 600, 566, 551, 532, 541, 511, 493, 465, 503, 450, 394, 405, 405, 404, 332, 369, 369, 332, 316, 272, 284, 315, 257, 224, 230, 221, 175, 209, 188, 162, 156, 159, 114, 131, 124, 96, 94, 80, 73, 54, 45, 43, 23, 18, 3]
    >>> lst2
    [5548, 4218, 3604, 3237, 3082, 2921, 2872, 2570, 2479, 2392, 2296, 2205, 2113, 1990, 1901, 1814, 1801, 1714, 1660, 1591, 1631, 1523, 1491, 1505, 1385, 1329, 1275, 1308, 1324, 1207, 1209, 1208, 1117, 1136, 1015, 1080, 1001, 993, 958, 948, 903, 843, 843, 849, 801, 799, 748, 729, 705, 660, 701, 689, 676, 656, 632, 581, 564, 537, 517, 525, 483, 478, 473, 494, 457, 422, 412, 390, 384, 352, 350, 323, 322, 308, 304, 275, 272, 256, 246, 265, 227, 204, 171, 191, 191, 136, 145, 136, 108, 117, 93, 83, 74, 77, 55, 38, 32, 25, 21, 1]
    >>> lst3
    [10047, 4198, 3214, 2696, 2369, 2117, 2010, 1869, 1752, 1653, 1552, 1416, 1405, 1377, 1328, 1293, 1252, 1245, 1121, 1146, 1047, 1051, 1123, 1100, 951, 948, 967, 933, 939, 925, 940, 893, 929, 874, 824, 843, 868, 800, 844, 822, 746, 733, 808, 734, 740, 682, 713, 681, 675, 686, 689, 730, 707, 677, 645, 661, 645, 651, 649, 672, 679, 593, 585, 622, 611, 636, 543, 571, 594, 593, 629, 624, 593, 567, 584, 585, 610, 549, 553, 574, 547, 583, 582, 553, 536, 512, 498, 562, 536, 523, 553, 485, 503, 502, 518, 554, 485, 482, 470, 516]

The expected random noise is visible in the element-wise difference between the first two:

 [ 78, 79, 101, 111, 3, 12, 147, 31, 30, 21, 37, 26, 3, 72, 60, 13, 47, 29, 59, 162, 109, 20, 22, 144, 13, 39, 61, 34, 105, 29, 70, 61, 8, 27, 45, 58, 6, 41, 26, 9, 3, 57, 0, 34, 1, 2, 38, 23, 0, 69, 47, 21, 48, 23, 17, 19, 2, 14, 15, 16, 28, 15, 8, 9, 7, 28, 7, 15, 20, 20, 19, 46, 10, 8, 32, 9, 43, 1, 22, 35, 6, 29, 38, 3, 29, 20, 14, 22, 23, 7, 3, 11, 6, 4, 1, 7, 11, 2, 3, 2 ] 

But the difference between the first and the third is much larger, suggesting that the distributions really are different:

 [ 4421, 59, 491, 652, 716, 816, 715, 670, 697, 760, 707, 763, 711, 685, 633, 534, 502, 498, 598, 607, 475, 492, 390, 261, 421, 342, 369, 341, 280, 253, 199, 254, 180, 289, 236, 179, 139, 152, 140, 135, 160, 167, 35, 149, 62, 119, 3, 71, 30, 43, 35, 62, 79, 44, 30, 61, 79, 100, 117, 131, 168, 100, 120, 119, 161, 242, 138, 166, 190, 261, 260, 255, 261, 251, 312, 301, 295, 292, 329, 344, 326, 408, 373, 365, 374, 356, 339, 448, 405, 399, 457, 391, 423, 429, 464, 509, 442, 459, 452, 513 ] 
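For reference, a minimal sketch of how such difference lists can be computed (assuming the lst, lst2 and lst3 from the snippet above):

    # Element-wise absolute differences between the histograms above
    noise = [abs(a - b) for a, b in zip(lst, lst2)]  # two runs of random() * random()
    real = [abs(a - b) for a, b in zip(lst, lst3)]   # random() * random() vs random() ** 2
    print(sum(noise), sum(real))  # the total deviation is far larger in the second case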
2 Answers

Here are some graphs:

All the possible outcomes for random() * random():

A 2D heatmap with most intensity in the top-right.

The x axis is one random variable, increasing to the right; the y axis is the other, increasing upward.

You can see that if either is low, the result will be low, and both must be high to get a high result.

When the only determinant is a single axis, as in the case of random() ** 2, you get

A 2D heatmap that increases quadratically from bottom to top and is invariant along the x axis.

Here it is much more likely to get a very dark (large) value, since the entire top is dark, not just the corner.

When you linearize both (sorting all the outcomes), with random() * random() on top:

A linearization of the first graph
A linearization of the second graph

You can see that the distributions really are different.

The code:

    import numpy
    import matplotlib
    from matplotlib import pyplot
    import matplotlib.cm

    def make_fig(name, data):
        # Render a 2D array as a borderless greyscale image and save it
        figure = matplotlib.pyplot.figure()
        print(data.shape)
        figure.set_size_inches(data.shape[1] // 100, data.shape[0] // 100)
        axes = matplotlib.pyplot.Axes(figure, [0, 0, 1, 1])
        axes.set_axis_off()
        figure.add_axes(axes)
        axes.imshow(data, origin="lower", cmap=matplotlib.cm.Greys, aspect="auto")
        figure.savefig(name, dpi=200)

    xs, ys = numpy.mgrid[:1000, :1000]

    # every combination of the two variables, multiplied
    two_random = xs * ys
    make_fig("two_random.png", two_random)

    # sort all outcomes to "linearize" the distribution
    two_random_flat = two_random.flatten()
    two_random_flat.sort()
    two_random_flat = two_random_flat[::1000]
    make_fig("two_random_1D.png", numpy.tile(two_random_flat, (100, 1)))

    # a single variable, squared
    one_random = xs * xs
    make_fig("one_random.png", one_random)

    one_random_flat = one_random.flatten()
    one_random_flat.sort()
    one_random_flat = one_random_flat[::1000]
    make_fig("one_random_1D.png", numpy.tile(one_random_flat, (100, 1)))

You can also approach this mathematically. The probability of getting a value less than x, with 0 ≤ x ≤ 1, is:

For random()²:

    √x

since random()² < x exactly when random() < √x, and that happens with probability √x.
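A quick Monte Carlo check (a sketch, not part of the original answer) confirms this CDF:

    from random import random

    # Empirically check P(random() ** 2 < x) ≈ √x at a few sample points
    n = 1_000_000
    samples = [random() ** 2 for _ in range(n)]
    for x in (0.1, 0.25, 0.5, 0.9):
        empirical = sum(s < x for s in samples) / n
        print(x, empirical, x ** 0.5)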

For random() · random():

Calling the first random variable R and the second r, we can find the probability that Rr < x for a fixed value of R:

    P(Rr < x) = P(r < x/R)
              = 1      if x > R (and so x/R > 1)
              = x/R    otherwise

So we want

    ∫ P(Rr < x) dR from R=0 to R=1
      = ∫ 1 dR from R=0 to R=x  +  ∫ x/R dR from R=x to R=1
      = x + x(ln 1 − ln x)
      = x(1 − ln x)

As we see, √x ≠ x(1 − ln x).
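If you want to double-check the integral itself, a simple midpoint Riemann sum agrees with the closed form (a sketch; the value x = 0.3 is an arbitrary choice):

    from math import log

    # Numerically approximate ∫ P(Rr < x) dR over R in [0, 1] for a fixed x
    x = 0.3
    steps = 1_000_000
    total = sum(min(1.0, x / ((i + 0.5) / steps)) for i in range(steps))
    print(total / steps, x * (1 - log(x)))  # both ≈ 0.661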

Plotted, these distributions look like this:

Probability that the function is less than a given value

The y axis gives the probability that the expression (random()² or random() · random()) is smaller than the value on the x axis.

We see that for random() · random() the probability of large values is much lower.

Density functions

I think the most telling view comes from differentiating these (giving ½·x^(−½) and −ln x) and plotting the probability density functions:

Probabilities of each number's occurring

This shows the probability of each x in relative terms. For example, the probability that x is large (> 0.5) is roughly twice as high for the random()² variant.
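You can verify the "roughly twice" claim directly from the two CDFs derived above (a sketch):

    from math import log

    # P(result > 0.5) under each distribution, from the CDFs √x and x(1 − ln x)
    x = 0.5
    p_square = 1 - x ** 0.5           # ≈ 0.293 for random() ** 2
    p_product = 1 - x * (1 - log(x))  # ≈ 0.153 for random() * random()
    print(p_square, p_product, p_square / p_product)  # ratio ≈ 1.9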


Simplify the problem a little: consider throwing two dice and multiplying the results, versus throwing one die and squaring its result. In the first case you have a 1-in-36 chance of rolling a double 1, and therefore a 1-in-36 chance that the product is 1. In the second case you obviously have a 1-in-6 chance that the square is 1. The same applies to double 6, so the extremes are much more likely when squaring.

The same thing happens with random floats: you are much less likely to get two extreme random values than a single extreme one, so very small or very large results occur much more often when squaring than when multiplying two independent values.
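Since the dice case is small enough to enumerate exactly, a short sketch makes the asymmetry concrete:

    from collections import Counter

    # Exact outcome counts for the dice analogy
    product = Counter(a * b for a in range(1, 7) for b in range(1, 7))  # two dice
    square = Counter(a * a for a in range(1, 7))                       # one die

    print(product[1], "in 36")  # product == 1: only double 1
    print(square[1], "in 6")    # square == 1: any roll of 1
    print(product[36], "in 36", "vs", square[36], "in 6")  # same asymmetry at the top end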

