I have an integer N that I need to split into bins according to a probability distribution. For example, if I had N=100 objects distributed according to [0.02, 0.08, 0.16, 0.29, 0.45], I might get [1, 10, 20, 25, 44].
```python
import numpy as np

# sample distribution
d = np.array([x ** 2 for x in range(1, 6)], dtype=float)
d = d / d.sum()
dcs = d.cumsum()
bins = np.zeros(d.shape)

N = 100
for roll in np.random.rand(N):
    # grab the first index that the roll satisfies
    i = np.where(roll < dcs)[0][0]
    bins[i] += 1
```
In practice, both N and the number of bins are very large, so a Python-level loop is not really a viable option. Is there a way to vectorize this operation to speed it up?
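A minimal sketch of two possible vectorized alternatives, assuming NumPy 1.17 or later (for `np.random.default_rng`); neither is taken from the question itself. The first replaces the per-roll `np.where` lookup with a single `np.searchsorted` call and counts the results with `np.bincount`. The second relies on the fact that the bin counts of N independent categorical draws follow a multinomial distribution, so they can be sampled in one call without generating N individual rolls.

```python
import numpy as np

# Same sample distribution as in the loop version: p_i proportional to i**2
d = np.array([x ** 2 for x in range(1, 6)], dtype=float)
d = d / d.sum()
dcs = d.cumsum()
N = 100

rng = np.random.default_rng()

# Approach 1: vectorize the loop directly.
# searchsorted(..., side="right") returns, for each roll, the first index i
# with roll < dcs[i], which is what np.where(roll < dcs)[0][0] computes.
rolls = rng.random(N)
idx = np.searchsorted(dcs, rolls, side="right")
# Guard against round-off leaving dcs[-1] slightly below 1.0.
idx = np.minimum(idx, len(d) - 1)
bins = np.bincount(idx, minlength=len(d))

# Approach 2: sample the bin counts directly from a multinomial distribution.
bins_mn = rng.multinomial(N, d)
```

Both produce an integer array of length `len(d)` whose entries sum to N. Approach 1 reproduces the loop exactly for a given set of rolls; approach 2 draws from the same distribution of counts but does not consume the same random numbers, so results will differ run to run.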