Effectively create multiple instances of numpy.random.choice without replacement

I am new to Python. While reading, please indicate any other suggestions regarding ways to improve my Python code.

Question: How to create a 8xN sized array in Python containing random numbers? The limitation is that each column of this array must contain 8 draws without replacement from an integer set [1,8] . More specifically, when N = 10, I want something like this.

[[ 6. 2. 3. 4. 7. 5. 5. 7. 8. 4.] [ 1. 4. 5. 5. 4. 4. 8. 5. 7. 5.] [ 7. 3. 8. 8. 3. 8. 7. 3. 6. 7.] [ 3. 6. 7. 1. 5. 6. 2. 1. 5. 1.] [ 8. 1. 4. 3. 8. 2. 3. 4. 3. 3.] [ 5. 8. 1. 7. 1. 3. 6. 8. 1. 6.] [ 4. 5. 2. 6. 2. 1. 1. 6. 4. 2.] [ 2. 7. 6. 2. 6. 7. 4. 2. 2. 8.]] 

To do this, I use the following approach:

 import numpy.random import numpy def rand_M(N): M = numpy.zeros(shape = (8, N)) for i in range (0, N): M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 return M 

In practice, N will be ~ 1e7. The algorithm described above is O (n) in time and takes approximately 0.38 seconds at N = 1e3. Therefore, the time when N = 1e7 ~ 1 h (i.e., 3800 sec). There should be a much more efficient way.

Function Timing

 from timeit import Timer t = Timer(lambda: rand_M(1000)) print(t.timeit(5)) 0.3863314103162543 
+3
source share
4 answers

Create a random array of the specified shape, and then sort along the axis where you want to keep the limits, which gives us a vectorized and very efficient solution. This will be based on this smart answer on MATLAB randomly permuting columns differently . Here's the implementation -

Run Example -

 In [122]: N = 10 In [123]: np.argsort(np.random.rand(8,N),axis=0)+1 Out[123]: array([[7, 3, 5, 1, 1, 5, 2, 4, 1, 4], [8, 4, 3, 2, 2, 8, 5, 5, 6, 2], [1, 2, 4, 6, 5, 4, 4, 3, 4, 7], [5, 6, 2, 5, 8, 2, 7, 8, 5, 8], [2, 8, 6, 3, 4, 7, 1, 1, 2, 6], [6, 7, 7, 8, 6, 6, 3, 2, 7, 3], [4, 1, 1, 4, 3, 3, 8, 6, 8, 1], [3, 5, 8, 7, 7, 1, 6, 7, 3, 5]], dtype=int64) 

Runtime Tests -

 In [124]: def sortbased_rand8(N): ...: return np.argsort(np.random.rand(8,N),axis=0)+1 ...: ...: def rand_M(N): ...: M = np.zeros(shape = (8, N)) ...: for i in range (0, N): ...: M[:, i] = np.random.choice(8, size = 8, replace = False) + 1 ...: return M ...: In [125]: N = 5000 In [126]: %timeit sortbased_rand8(N) 100 loops, best of 3: 1.95 ms per loop In [127]: %timeit rand_M(N) 1 loops, best of 3: 233 ms per loop 

Thus, an acceleration of 120x is expected!

+3
source

How about shuffling, that is, shuffling?

 import random import numpy from timeit import Timer def B_rand_M(N): a = numpy.arange(1,9) M = numpy.zeros(shape = (8, N)) for i in range (0, N): M[:, i] = numpy.random.permutation(a) return M # your original implementation def J_rand_M(N): M = numpy.zeros(shape = (8, N)) for i in range (0, N): M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 return M 

some timings:

 def compare(N): for f in (J_rand_M, B_rand_M): t = Timer(lambda: f(N)).timeit(6) print 'time for %s(%s): %.6f' % (f.__name__, N, t) for i in range(6): print 'N = 10^%s' % i compare(10**i) print 

gives

 N = 10^0 time for J_rand_M(1): 0.001199 time for B_rand_M(1): 0.000080 N = 10^1 time for J_rand_M(10): 0.001112 time for B_rand_M(10): 0.000335 N = 10^2 time for J_rand_M(100): 0.011118 time for B_rand_M(100): 0.003022 N = 10^3 time for J_rand_M(1000): 0.110887 time for B_rand_M(1000): 0.030528 N = 10^4 time for J_rand_M(10000): 1.100540 time for B_rand_M(10000): 0.304696 N = 10^5 time for J_rand_M(100000): 11.151576 time for B_rand_M(100000): 3.049474 
+1
source

Just comment on your analysis at run time - my intuition is that O (n) is the best possible run time you can get when generating random O (n) numbers.

Have you actually tried running your code with n = 10 million? Your assumption that the execution time will be scaled by 1000 when the input data increases by 1000 times may not be true in practice, since there is usually a constant term when running any program (loading libraries, etc.), which can be significant depending on the problem.

Speaking, it seems that the question related to Eric Wright has been very carefully worked out and can be easily adapted to your question.

0
source

Use the code below to generate an array.

 import numpy as np N=1e7 # THe value you want to have np.random.randint(1,high=8,size=(8,N)) 

Hope this helps, it certainly won't take long.

0
source

All Articles