Numpy.array indexing issue

I am trying to create a "mask" numpy.array by specifying certain criteria. Python even has good syntax for something like this:

    >>> A = numpy.array([1,2,3,4,5])
    >>> A > 3
    array([False, False, False,  True,  True])

But if I have a list of criteria, not a range:

    >>> A = numpy.array([1,2,3,4,5])
    >>> crit = [1,3,5]

I cannot do this:

    >>> A in crit

I have to fall back on something like a list comprehension, for example:

    >>> [a in crit for a in A]
    [True, False, True, False, True]

This gives the correct result.

The problem is that I am working with large arrays and the code above is very slow. Is there a more natural way to do this operation that would also speed it up?

EDIT: I was able to get a small speedup by making crit a set.
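For illustration, the set-based variant mentioned in the edit might look like the sketch below. The name `crit_set` is an assumption introduced here; the loop still runs in Python, so it only removes the linear scan of the list on each lookup.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
crit = [1, 3, 5]

# Membership tests against a set are O(1) on average,
# versus O(len(crit)) for a list.
crit_set = set(crit)

# tolist() converts the elements to plain Python ints before the lookup.
mask = np.array([a in crit_set for a in A.tolist()])
```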

EDIT2: For those interested:

Jouni's approach: 1000 loops, best of 3: 102 μs per loop

numpy.in1d: 1000 loops, best of 3: 1.33 ms per loop

EDIT3: I just checked again with B = randint(10, size=100):

Jouni's approach: 1000 loops, best of 3: 2.96 ms per loop

numpy.in1d: 1000 loops, best of 3: 1.34 ms per loop

Conclusion: use numpy.in1d() if B is not very small.

+6
python arrays numpy
3 answers

I think the numpy in1d function is what you are looking for:

    >>> A = numpy.array([1,2,3,4,5])
    >>> B = [1,3,5]
    >>> numpy.in1d(A, B)
    array([ True, False,  True, False,  True], dtype=bool)

As stated in its docstring, "in1d(a, b) is roughly equivalent to np.array([item in b for item in a])".

Admittedly, I have not done any speed tests, but that sounds like what you are looking for.
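As a small sketch of how the resulting mask is typically used, the example below builds it with `numpy.isin`, which newer NumPy releases recommend in place of `numpy.in1d`. It assumes NumPy 1.13 or later; on older versions `numpy.in1d(A, crit)` plays the same role for 1-D input.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
crit = [1, 3, 5]

# Element-wise membership test: True where the element of A is in crit.
mask = np.isin(A, crit)

# Boolean indexing keeps only the elements where the mask is True.
selected = A[mask]
```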

Another faster way

Here is another way to do it faster. First sort the array B (containing the elements you want to find in A), turn it into a numpy array, and then do:

 B[B.searchsorted(A)] == A 

although if you have elements in A larger than the largest in B, you will need to do:

    inds = B.searchsorted(A)
    inds[inds == len(B)] = 0
    mask = B[inds] == A

It may not be faster for small arrays (especially for a small B), but for large ones it will definitely be faster. Why? Because this is an O(N log M) algorithm, where N is the number of elements in A and M is the number of elements in B, whereas OR-ing together the individual masks is O(N * M). I tested it with N = 10000 and M = 14, and it was already faster. Anyway, I just thought you might like to know, especially if you really plan to use this on very large arrays.
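Put together, the steps above could be wrapped up roughly as follows. The function name `membership_mask` is hypothetical, and the empty-`B` guard is an addition not present in the answer.

```python
import numpy as np

def membership_mask(A, B):
    """Return a boolean mask: True where an element of A occurs in B.

    Hypothetical helper sketching the sorted-search approach.
    """
    A = np.asarray(A)
    B = np.sort(np.asarray(B))       # searchsorted requires sorted input
    if B.size == 0:                  # nothing can match an empty B
        return np.zeros(A.shape, dtype=bool)
    inds = B.searchsorted(A)         # insertion index of each element of A
    inds[inds == len(B)] = 0         # values above max(B) would index out of range
    return B[inds] == A              # a match only if the value found equals A

mask = membership_mask([1, 2, 3, 4, 5, 9], [5, 1, 3])
```

The clamped indices point at `B[0]`, which cannot equal a value larger than every element of `B`, so those positions correctly come out `False`.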

+6

Combine several comparisons with "or":

    from numpy.random import randint
    A = randint(10, size=10000)
    mask = (A == 1) | (A == 3) | (A == 5)

Or if you have list B and want to create a mask dynamically:

    from numpy import zeros
    B = [1, 3, 5]
    mask = zeros((10000,), dtype=bool)
    for t in B:
        mask = mask | (A == t)
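The same dynamic mask can also be written without an explicit loop. The sketch below shows two possible spellings on a small fixed array chosen here for illustration; both still do O(N * M) comparisons, so they are a convenience rather than a speedup.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 3, 1])
B = [1, 3, 5]

# OR together one comparison mask per value in B.
mask_reduce = np.logical_or.reduce([A == t for t in B])

# Broadcasting: compare every element of A with every element of B,
# then check whether any comparison in each row matched.
mask_broadcast = (A[:, None] == np.asarray(B)).any(axis=1)
```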
+3

Create a mask and use the numpy array's compress method. It should be much faster. If you have complex criteria, be sure to build them from array operations.

    a = numpy.array([3,1,2,4,5])
    mask = a > 3
    b = a.compress(mask)

or

    a = numpy.random.random_integers(1,5,100000)
    c = a.compress((a<=4)*(a>=2))     ## numbers with 2 <= n <= 4
    d = a.compress(~((a<=4)*(a>=2)))  ## numbers with n > 4 or n < 2

If you want a mask that selects every element of a that is in [1,3,5], you can do something like

    a = numpy.random.random_integers(1,5,100000)
    mask = (a==1)+(a==3)+(a==5)

or

    a = numpy.random.random_integers(1,5,100000)
    mask = numpy.zeros(len(a), dtype=bool)
    for num in [1,3,5]:
        mask += (a==num)
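As a quick check of how such a mask is applied, the sketch below uses a small fixed array (an illustrative choice, not from the answer) and compares `compress` with plain boolean indexing, which selects the same elements.

```python
import numpy as np

a = np.array([3, 1, 2, 4, 5])

# On boolean arrays, | is an element-wise logical OR.
mask = (a == 1) | (a == 3) | (a == 5)

picked = a.compress(mask)   # elements where mask is True
same = a[mask]              # boolean indexing gives the same result
rest = a[~mask]             # elements not in [1, 3, 5]
```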
0