An elegant way to create a 2d array with all possible columns

In numpy, I would like to do a 2d arithmetic (r, at 2 ** r), where the columns are all possible binary columns.

For example, if the height of the columns is 5, the columns will be

[0,0,0,0,0], [0,0,0,0,1], [0,0,0,1,0], [0,0,0,1,1], [0,0,1,0,0], ... 

My decision

  np.array(list(itertools.product([0,1],repeat = c))).T 

It seems very ugly. Is there a more elegant way?

+7
python numpy
source share
5 answers

Here you can use broadcasting , for example:

 (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int) 

For r between 0 and 8 you can also use np.unpackbits -

 np.unpackbits(np.arange(2**r,dtype='uint8')[:,None], axis=1)[:,8-r:] 

Runtime Tests -

Case No. 1 (Original r = 5 ):

 In [217]: r = 5 In [218]: from itertools import product In [219]: %timeit np.array(list(product([0,1], repeat=5))) 10000 loops, best of 3: 33.9 µs per loop In [220]: %timeit np.unpackbits(np.arange(2**r,dtype='uint8')[:,None], axis=1)[:,8-r:] 100000 loops, best of 3: 10.6 µs per loop In [221]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int) 10000 loops, best of 3: 31.1 µs per loop 

Case No. 2 (Larger r ):

 In [242]: r = 15 In [243]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int) 100 loops, best of 3: 6.6 ms per loop In [244]: %timeit np.array(list(product([0,1], repeat=r))) 10 loops, best of 3: 77.5 ms per loop 
+4
source share

It is not much, but rather elegant:

 r = 5 boolcols = [[y&1<<x and 1 for x in range(r)[::-1]] for y in range(2**r)] 

Some explanations of the central part may be useful:

 y&1<<x and 1 

It is equivalent

 (y&(1<<x)) and 1 

Suppose that x = 3 and y = 5 we get:

  • 1<<x 1<<3 , which in binary notation is 1000 . The binary is shifted 3 steps to the left. In python you can write 0b1000 or just 8 . More details
  • y&(1<<x) now 5&8 , which is interchangeable with "AND" 0101 and 1000 , which is 0
  • Now we are left with (0) and 1 , which gives 0

Note: After using timeit to measure the effectiveness of this compared to other solutions, this is not the one you should choose to use! This is slower in every test I have done. Example:

 In [21]: r = 15 In [23]: %timeit [[y&1<<x and 1 for x in range(r)[::-1]] for y in range(2**r)] 10 loops, best of 3: 39.8 ms per loop In [30]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int) 1000 loops, best of 3: 1.31 ms per loop 
+3
source share

Another non-numpy solution:

 r = 3 print [map(int, list('{:0{w}b}'.format(x, w=r))) for x in xrange(2**r)] 

Donation:

 [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]] 
+3
source share

How about this

  np.array([list("0"*(r-1-int(np.log2(i)))+"{0:b}".format(i)) if i>0 else [i]*r for i in range(0,2**r)]).astype(int) 

EDIT

I noticed that there is no need to use log2 calculations in the formatting solution:

 frmt = "{0:0"+str(r)+"b}" print [map(int,list(frmt.format(i))) for i in range(0,2**r)] 

And the time difference ..

 In [17]: timeit [map(int,list(frmt.format(i))) for i in range(0,2**r)] 10000 loops, best of 3: 171 µs per loop In [18]: timeit np.array([list("0"*(r-1-int(np.log2(i)))+"{0:b}".format(i)) if i>0 else [i]*r for i in range(0,2**r)]).astype(int) 1000 loops, best of 3: 514 µs per loop 
+1
source share

itertools.product - fast C code; np.array is a slower general purpose code. np.fromiter can be faster in the right situations. But fromiter requires flat input, not nested lists. But another itertools great job of smoothing lists.

Here are some examples of time:

 In [72]: timeit list(product([0,1],repeat=5)) 100000 loops, best of 3: 7.03 us per loop In [73]: timeit list(chain(*product([0,1],repeat=5))) 100000 loops, best of 3: 18.3 us per loop In [74]: timeit np.fromiter(chain(*product([0,1],repeat=5)),int,count=160).reshape(-1,5) 10000 loops, best of 3: 33.8 us per loop In [75]: timeit np.array(list(product([0,1],repeat=5))) 10000 loops, best of 3: 65.1 us per loop 

In this case, including count win fromiter , doesn't really matter.

There is a certain pythonian elegance in chain generators.


As a comparison, my time for the pure numpy method is a bit slower:

 In [85]: timeit (((np.arange(2**5)[:,None] & 2**np.arange(5)[::-1]))>0).astype(int) 10000 loops, best of 3: 38.1 us per loop 

But @Divakar shows that this solution scales much better.

+1
source share

All Articles