Randint does not always follow uniform distribution

Question

I was playing with Python's random library to simulate a project I am working on, and I ran into something very strange.

Say we have the following code in Python:

    from random import randint
    import seaborn as sns

    a = []
    for i in range(1000000):
        a.append(randint(1, 150))

    sns.distplot(a)

The plot follows a “discrete uniform” distribution, as it should.

However, when I change the range to 1 through 110, the plot has several peaks.

    from random import randint
    import seaborn as sns

    a = []
    for i in range(1000000):
        a.append(randint(1, 110))

    sns.distplot(a)

My impression is that the peaks are at 0, 10, 20, 30, ... but I cannot explain them.

Edit: This question is not the same as the proposed duplicate, since the problem in my case was in the seaborn library and in how I visualized the data.

Edit 2: Following the suggestions in the answers, I tried to verify this by swapping out the seaborn library. Using matplotlib instead, both graphs looked the same:

    from random import randint
    import matplotlib.pyplot as plt

    a = []
    for i in range(1000000):
        a.append(randint(1, 110))

    plt.hist(a)

+58

python random

Tasos Dec 12 '16 at 11:53

2 answers

To add to the excellent answer by @RoryDaulton, I ran randint(1, 110) a million times, built a frequency count, and converted it to an R vector of counts:

    from random import randint

    # Tally how often each value from 1 to 110 is drawn
    hits = {i: 0 for i in range(1, 111)}
    for i in range(1000000):
        hits[randint(1, 110)] += 1

    # Flatten the tallies into a list and print them as an R vector literal
    hits = [hits[i] for i in range(1, 111)]
    s = 'c(' + ','.join(str(x) for x in hits) + ')'
    print(s)

Output:

    c(9123,9067,9124,8898,9193,9077,9155,9042,9112,9015,8949,9139,9064,9152,8848,9167,9077,9122,9025,9159,9109,9015,9265,9026,9115,9169,9110,9364,9042,9238,9079,9032,9134,9186,9085,9196,9217,9195,9027,9003,9190,9159,9006,9069,9222,9205,8952,9106,9041,9019,8999,9085,9054,9119,9114,9085,9123,8951,9023,9292,8900,9064,9046,9054,9034,9088,9002,8780,9098,9157,9130,9084,9097,8990,9194,9019,9046,9087,9100,9017,9203,9182,9165,9113,9041,9138,9162,9024,9133,9159,9197,9168,9105,9146,8991,9045,9155,8986,9091,9000,9077,9117,9134,9143,9067,9168,9047,9166,9017,8944)

I then pasted this into the R console, reconstructed the observations, and ran R's hist() on the result, getting this histogram (with a superimposed density curve):

As you can see, this confirms that the problem you are observing is not traceable to randint, but is an artifact of sns.distplot().

+20

John Coleman Dec 12 '16 at 12:21

Rory Daulton · Accepted Answer · 2016-12-12 12:00

The problem seems to be in your plotting library, seaborn, not in randint().

By my count, your seaborn distribution plot has 50 bars. It seems that seaborn is actually binning the values returned by randint() into those bars, and there is no way to spread 110 values evenly over 50 bins. Therefore, you get peaks wherever three values land in one bin instead of the usual two for the other bins. The heights of your peaks confirm this: they are 50% higher than the other bars, as expected for three binned values rather than two.
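The arithmetic behind this can be checked without any plotting. The following is a minimal sketch, taking the count of 50 equal-width bins above as given; `values_per_bin` is a hypothetical helper written for this illustration, not a seaborn function. It counts how many distinct integers fall into each bin for both ranges from the question.

```python
def values_per_bin(lo, hi, nbins):
    """Count how many distinct integers in [lo, hi] fall into each of
    nbins equal-width bins spanning lo..hi (hypothetical helper)."""
    counts = [0] * nbins
    span = hi - lo
    for v in range(lo, hi + 1):
        # Integer arithmetic avoids floating-point edge effects.
        # The last bin is closed on the right, so clamp v == hi into it.
        idx = min((v - lo) * nbins // span, nbins - 1)
        counts[idx] += 1
    return counts

counts_110 = values_per_bin(1, 110, 50)
counts_150 = values_per_bin(1, 150, 50)

print(sorted(set(counts_110)))  # [2, 3]: some bins hold two values, some three
print(counts_110.count(3))      # 10 bins hold three values; these are the peaks
print(sorted(set(counts_150)))  # [3]: every bin holds exactly three values
```

With 150 values every bin receives three integers, which is why that plot looks flat, while with 110 values ten of the fifty bins receive an extra integer.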

Another way to check this is to force seaborn to use 55 bins for these 110 values (or perhaps 10 bins, or some other divisor of 110). If you still get peaks, then you should worry about randint().
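This check can also be simulated in plain Python, as a rough sketch rather than a reproduction of what seaborn does internally. The `histogram` helper, the fixed seed, and the sample size of 200,000 are assumptions made for this example. It bins the same samples into 50, 55, and 10 equal-width bins and compares the tallest bar to the shortest.

```python
import random

def histogram(samples, lo, hi, nbins):
    """Tally samples into nbins equal-width bins over [lo, hi]
    (hypothetical helper, not a library function)."""
    counts = [0] * nbins
    span = hi - lo
    for v in samples:
        # Clamp the top value into the last bin, which is closed on the right.
        idx = min((v - lo) * nbins // span, nbins - 1)
        counts[idx] += 1
    return counts

random.seed(0)  # fixed seed, only so the run is repeatable
samples = [random.randint(1, 110) for _ in range(200_000)]

ratios = {}
for nbins in (50, 55, 10):
    counts = histogram(samples, 1, 110, nbins)
    # Ratio of tallest to shortest bar: about 1.5 signals 3-vs-2 binning,
    # about 1.0 signals a flat histogram.
    ratios[nbins] = max(counts) / min(counts)
    print(nbins, round(ratios[nbins], 2))
```

The ratio should come out near 1.5 for 50 bins and close to 1 for 55 and 10 bins, since both divide 110 evenly. In seaborn itself the corresponding check would be to pass a bins argument to the plotting call, for example sns.distplot(a, bins=55).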
