randint does not always follow a uniform distribution

I was playing with the random library in Python to simulate a project I am working on, and I ran into a very strange situation.

Say we have the following code in Python:

    from random import randint
    import seaborn as sns

    a = []
    for i in range(1000000):
        a.append(randint(1, 150))

    sns.distplot(a)

The plot follows a “discrete uniform” distribution, as it should.

Range between 1 and 150

However, when I change the range to 1 to 110, the graph has several peaks.

    from random import randint
    import seaborn as sns

    a = []
    for i in range(1000000):
        a.append(randint(1, 110))

    sns.distplot(a)

Range 1 to 110

My impression is that the peaks are at 0, 10, 20, 30, ... but I cannot explain it.

Edit: This question is not a duplicate of the proposed one, since the problem in my case was with the seaborn library and with how I visualized the data.

Edit 2: Following the suggestions in the answers, I tried to verify this by switching away from seaborn. Using matplotlib instead, both graphs looked the same.

    from random import randint
    import matplotlib.pyplot as plt

    a = []
    for i in range(1000000):
        a.append(randint(1, 110))

    plt.hist(a)

From matplotlib

+58
python random
Dec 12 '16 at 11:53
2 answers

The problem seems to be in your plotting library, seaborn, not in randint().

Your seaborn distplot has 50 bins, by my count. It seems that seaborn is actually binning your returned randint() values into those bins, and there is no way to distribute 110 values evenly over 50 bins. Therefore you get peaks wherever three values fall into one bin rather than the usual two for the other bins. The heights of your peaks confirm this: they are 50% higher than the other bars, as expected for three binned values rather than two.
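The arithmetic above can be checked without plotting anything. The following is a minimal sketch, assuming 50 equal-width bins that span the data from its minimum to its maximum, with the last bin including the right edge (a common histogram convention; seaborn's internals are not inspected here). The helper `values_per_bin` is a hypothetical name introduced only for this illustration.

```python
from collections import Counter


def values_per_bin(lo, hi, nbins):
    """Count how many integers in [lo, hi] land in each of nbins equal-width bins.

    Assumption: bins span [lo, hi]; each bin is half-open except the last,
    which also includes hi.
    """
    counts = [0] * nbins
    span = hi - lo
    for v in range(lo, hi + 1):
        # Integer arithmetic avoids floating-point trouble at bin edges.
        idx = min((v - lo) * nbins // span, nbins - 1)
        counts[idx] += 1
    return counts


per_bin_110 = values_per_bin(1, 110, 50)
per_bin_150 = values_per_bin(1, 150, 50)

# Tally how many bins hold two distinct values and how many hold three.
print(Counter(per_bin_110))
print(Counter(per_bin_150))
```

Under these assumptions, the range 1 to 110 gives 40 bins holding two values and 10 bins holding three, while the range 1 to 150 puts exactly three values in every bin, which matches the flat plot in the question.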

Another way to check this is to force seaborn to use 55 bins for these 110 values (or perhaps 10 bins, or some other divisor of 110). If you still get the peaks, then you should worry about randint().
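The same check can be approximated numerically, without seaborn. This sketch bins simulated randint() draws into 50, 55, and 10 equal-width bins and compares the tallest bar with the shortest. The helper `bin_counts`, the seed, and the sample size are choices made for this illustration, and the binning rule is an assumed convention rather than seaborn's actual code.

```python
import random


def bin_counts(samples, lo, hi, nbins):
    """Histogram counts for integer samples over nbins equal-width bins on [lo, hi]."""
    counts = [0] * nbins
    span = hi - lo
    for v in samples:
        # The last bin is closed on the right, so hi goes into bin nbins - 1.
        idx = min((v - lo) * nbins // span, nbins - 1)
        counts[idx] += 1
    return counts


random.seed(0)  # fixed seed only so that the run is repeatable
samples = [random.randint(1, 110) for _ in range(200_000)]

ratios = {}
for nbins in (50, 55, 10):
    counts = bin_counts(samples, 1, 110, nbins)
    # Ratio of tallest to shortest bar: near 1 means a flat histogram.
    ratios[nbins] = max(counts) / min(counts)
    print(nbins, "bins: max/min =", round(ratios[nbins], 3))
```

With 50 bins the ratio should sit near 1.5 (three values against two), while with 55 or 10 bins it should stay close to 1, apart from sampling noise.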

+116
Dec 12 '16 at 12:00

To add to @RoryDaulton's excellent answer, I ran randint(1, 110), built a frequency count, and converted it to an R vector of counts like this:

    from random import randint

    # Count how often each value from 1 to 110 is drawn
    hits = {i: 0 for i in range(1, 111)}
    for i in range(1000000):
        hits[randint(1, 110)] += 1

    # Convert the counts to an R vector literal
    hits = [hits[i] for i in range(1, 111)]
    s = 'c(' + ','.join(str(x) for x in hits) + ')'
    print(s)

    c(9123,9067,9124,8898,9193,9077,9155,9042,9112,9015,8949,9139,9064,9152,8848,9167,9077,9122,9025,9159,9109,9015,9265,9026,9115,9169,9110,9364,9042,9238,9079,9032,9134,9186,9085,9196,9217,9195,9027,9003,9190,9159,9006,9069,9222,9205,8952,9106,9041,9019,8999,9085,9054,9119,9114,9085,9123,8951,9023,9292,8900,9064,9046,9054,9034,9088,9002,8780,9098,9157,9130,9084,9097,8990,9194,9019,9046,9087,9100,9017,9203,9182,9165,9113,9041,9138,9162,9024,9133,9159,9197,9168,9105,9146,8991,9045,9155,8986,9091,9000,9077,9117,9134,9143,9067,9168,9047,9166,9017,8944)

Then I pasted this into the R console, reconstructed the observations, and ran R's hist() on the result, obtaining this histogram (with a superimposed density curve):

R histogram of the reconstructed observations, with density curve

As you can see, this confirms that the problem you are observing is not traceable to randint, but is an artifact of sns.distplot().
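For a check that involves no plotting and no binning at all, one option is a chi-square goodness-of-fit statistic on the raw per-value counts. This is a sketch added for illustration, not part of the original answer: the seed is arbitrary, and the counts are freshly simulated rather than taken from the vector above.

```python
import random

random.seed(1)  # arbitrary seed, used only to make the run repeatable
n_draws = 1_000_000
n_values = 110

# One counter per possible value, so no binning choice is involved.
hits = [0] * n_values
for _ in range(n_draws):
    hits[random.randint(1, n_values) - 1] += 1

# Under a uniform distribution every value is expected equally often.
expected = n_draws / n_values
chi2 = sum((h - expected) ** 2 / expected for h in hits)
dof = n_values - 1

print(f"chi-square = {chi2:.1f} with {dof} degrees of freedom")
```

If randint() is uniform, the statistic follows a chi-square distribution with 109 degrees of freedom, whose mean is 109 and whose standard deviation is about 14.8, so a value in that neighbourhood is consistent with uniformity. A value far above it would be a reason to suspect the generator rather than the plot.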

+20
Dec 12 '16 at 12:21


