How to remove multiple values from an array at once

Can someone provide me with a better (simpler, more readable, more Pythonic, more efficient, etc.) way to remove multiple values from an array than the following:

    import numpy as np

    # The array.
    x = np.linspace(0, 360, 37)

    # The values to be removed.
    a = 0
    b = 180
    c = 360

    new_array = np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))

A good answer to this question will give the same result as the above code (i.e. new_array), but it may handle equality between floats better than the code does.

Bonus

Can someone explain to me why this leads to the wrong result?

    In [5]: np.delete(x, x == a)
    /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:3254: FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
      "of casting it to integer", FutureWarning)
    Out[5]:
    array([  20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,  100.,
            110.,  120.,  130.,  140.,  150.,  160.,  170.,  180.,  190.,
            200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
            290.,  300.,  310.,  320.,  330.,  340.,  350.,  360.])

The values 0 and 10 have been deleted, not just 0 (a).

Note that x == a behaves as expected (so the problem is inside np.delete):

    In [6]: x == a
    Out[6]:
    array([ True, False, False, False, False, False,
           False, False, False, False, False, False,
           False, False, False, False, False, False,
           False, False, False, False, False, False,
           False, False, False, False, False, False,
           False, False, False, False, False, False,
           False], dtype=bool)

Note also that np.delete(x, np.where(x == a)) gives the correct result. Thus, it seems to me that np.delete cannot handle boolean indexes.

3 answers

Your code looks a bit complicated. I wondered whether you had considered NumPy's Boolean vector indexing.

Using the same setup as yours, I timed your code:

    In [175]: %%timeit
       .....: np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
       .....:
    10000 loops, best of 3: 32.9 µs per loop

Then I timed successive applications of Boolean indexing, one per value to be removed.

    In [176]: %%timeit
       .....: x1 = x[x != a]
       .....: x2 = x1[x1 != b]
       .....: new_array = x2[x2 != c]
       .....:
    100000 loops, best of 3: 6.56 µs per loop

Finally, for ease of programming and to extend the technique to an arbitrary number of excluded values, I rewrote the same code as a loop. It is a little slower because it has to make a copy first, but it is still very respectable.

    In [177]: %%timeit
       .....: new_array = x.copy()
       .....: for val in (a, b, c):
       .....:     new_array = new_array[new_array != val]
       .....:
    100000 loops, best of 3: 7.61 µs per loop

I think the real gain is in programming clarity. Finally, I thought it best to verify that the three algorithms give the same results:

    In [179]: new_array1 = np.delete(x,
       .....:     np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))

    In [180]: x1 = x[x != a]

    In [181]: x2 = x1[x1 != b]

    In [182]: new_array2 = x2[x2 != c]

    In [183]: new_array3 = x.copy()

    In [184]: for val in (a, b, c):
       .....:     new_array3 = new_array3[new_array3 != val]
       .....:

    In [185]: all(new_array1 == new_array2)
    Out[185]: True

    In [186]: all(new_array1 == new_array3)
    Out[186]: True
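If exact equality is acceptable, the loop could possibly be replaced by a single membership test. The following is only a sketch and was not part of the timings above; it assumes np.isin is available in the installed NumPy, and the name values is introduced here purely for illustration.

```python
import numpy as np

x = np.linspace(0, 360, 37)
values = [0, 180, 360]  # hypothetical name for the values to remove

# np.isin returns a boolean mask that is True where x equals any entry of values.
mask = np.isin(x, values)

# Keep everything that did not match.
new_array = x[~mask]
```

Like the != versions, this compares floats exactly, so it does not address the tolerance issue.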

To deal with the problem of floating-point comparisons, you would need to use NumPy's isclose() function. As expected, this sends the timing to hell:

    In [188]: %%timeit
       .....: new_array = x.copy()
       .....: for val in (a, b, c):
       .....:     new_array = new_array[~np.isclose(new_array, val)]
       .....:
    10000 loops, best of 3: 126 µs per loop
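One way to avoid calling isclose() once per value might be to broadcast x against all the values and reduce the result to one mask. This is a minimal sketch under that assumption, not a timed result; values is a name made up for this example, and the default tolerances of np.isclose are used.

```python
import numpy as np

x = np.linspace(0, 360, 37)
values = np.array([0.0, 180.0, 360.0])  # hypothetical name for the values to remove

# Compare every element of x with every value; the result has shape (37, 3).
close = np.isclose(x[:, None], values)

# Drop an element if it is close to at least one of the values.
new_array = x[~close.any(axis=1)]
```

Note that a comparison against 0 depends entirely on the absolute tolerance (atol), since the relative term vanishes there.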

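The answer to your bonus question is contained in the warning, but the warning is not very helpful unless you know that False and True compare numerically equal to zero and one, respectively. The boolean array is cast to integers, so your code is equivalent to

    np.delete(x, (x == a).astype(int))

That index list contains only the integers 1 and 0, so the elements at positions 1 and 0 (the values 10 and 0) are deleted.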

As the warning clearly states, the NumPy team anticipates that the result of using boolean arguments in np.delete() is likely to change in the future, but at present it only accepts index arguments.
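The cast described above can be reproduced explicitly. The sketch below deliberately avoids passing a boolean array to np.delete, because newer NumPy releases may treat it as a mask, as the warning announces; the variable names as_int, wrong and right are illustrative only.

```python
import numpy as np

x = np.linspace(0, 360, 37)
a = 0

mask = (x == a)              # True only at position 0
as_int = mask.astype(int)    # one 1 followed by zeros: only the indices 1 and 0

# Passing these integers deletes positions 0 and 1, i.e. the values 0 and 10.
wrong = np.delete(x, as_int)

# Converting the mask to the positions of its True entries deletes only 0.
right = np.delete(x, np.flatnonzero(mask))
```

Here np.flatnonzero(mask) plays the same role as np.where(x == a) in the question.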


You can also use np.ravel to get the indices of the values and then delete them with np.delete:

    In [32]: r = [a, b, c]

    In [33]: indx = np.ravel([np.where(x == i) for i in r])

    In [34]: indx
    Out[34]: array([ 0, 18, 36])

    In [35]: np.delete(x, indx)
    Out[35]:
    array([  10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,
            100.,  110.,  120.,  130.,  140.,  150.,  160.,  170.,  190.,
            200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
            290.,  300.,  310.,  320.,  330.,  340.,  350.])
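The np.ravel call above appears to rely on every value matching exactly once, so that all the per-value results have the same shape. A hedged variant that may cope better with values that occur zero or several times is to concatenate the per-value index arrays instead. In this sketch, 999 is a made-up value that does not occur in x.

```python
import numpy as np

x = np.linspace(0, 360, 37)
r = [0, 180, 360, 999]  # 999 is a made-up value that is absent from x

# np.flatnonzero gives a 1-D index array per value (possibly empty);
# np.concatenate joins them regardless of their individual lengths.
indx = np.concatenate([np.flatnonzero(x == i) for i in r])

new_array = np.delete(x, indx)
```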

You could borrow the np.allclose approach for testing if floats are equal:

    def float_equal(x, y, rtol=1.e-5, atol=1.e-8):
        # Same test that np.allclose applies element by element.
        return np.less_equal(abs(x - y), atol + rtol * abs(y))

    np.delete(x, np.where(np.logical_or.reduce([float_equal(x, y) for y in [0, 180, 360]])))

Here the where part produces:

    (array([ 0, 18, 36]),)

float_equal could probably be changed to broadcast x against y, eliminating the list comprehension.

I used the fact that logical_or is a ufunc and therefore has a reduce method.

You do not need where; just use the result of logical_or as a boolean index:

    I = np.logical_or.reduce([float_equal(x, y) for y in [0, 180, 360]])
    x[~I]

(With this small example, using the boolean index directly is about 2x faster than the np.delete(np.where(...)) approach.)


For this particular x, == produces the same thing:

    np.where(np.logical_or.reduce([x == y for y in [0, 180, 360]]))
    # (array([ 0, 18, 36]),)

This vectorized approach gives the same result:

    abc = np.array([0, 180, 360])
    np.where(np.sum(x == abc[:, None], axis=0))
    # (array([ 0, 18, 36]),)

x==abc[:,None] is a (3,37) boolean array; np.sum acts as a logical or.
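To make the broadcasting step concrete, here is a small sketch that inspects the intermediate array and compares the np.sum reduction with an explicit any. The names matches, counts and either are introduced only for this illustration.

```python
import numpy as np

x = np.linspace(0, 360, 37)
abc = np.array([0, 180, 360])

matches = (x == abc[:, None])      # one row per value in abc
counts = np.sum(matches, axis=0)   # number of values each element of x equals
either = matches.any(axis=0)       # explicit logical or over the rows

# A nonzero count and a True in `either` select the same positions.
new_array = x[~either]
```

Using any states the intent a little more directly than np.sum, while giving the same selection here.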

My float_equal also works this way:

    float_equal(x, abc[:, None]).sum(axis=0)
