How to ignore zeros when I take the median over the columns of an array?

Question

I have a simple numpy array.

 array([[10,  0, 10,  0],
        [ 1,  1,  0,  0],
        [ 9,  9,  9,  0],
        [ 0, 10,  1,  0]])

I would like to take the median of each column separately from this array.

However, the array contains zeros scattered in various places, and I would like to ignore them when calculating the medians.

To complicate things even further, I would like columns that contain only zeros to have a median of 0. These columns then act as placeholders, keeping the dimensions of the matrix the same.

None of the arguments in the numpy documentation do what I want (maybe I'm spoiled by the many switches we get in R!):

 numpy.median(a, axis=None, out=None, overwrite_input=False)

Can someone shed some light on an efficient way to do this that is in keeping with the numpy spirit? I could hack something together, but then it seems to me I would be defeating the purpose of using numpy in the first place.

Thanks in advance.
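For reference, the straightforward loop that the question calls a "hack" might look like the sketch below. It assumes a 2-D NumPy array, and the function name column_medians_ignoring_zeros_loop is hypothetical; the point is only to spell out the required behaviour (zeros are skipped, and an all-zero column gets 0) so the vectorized answers can be checked against it.

```python
import numpy as np

def column_medians_ignoring_zeros_loop(a):
    """Hypothetical helper: median of each column of a 2-D array, skipping zeros.

    Columns made up entirely of zeros get a median of 0, so the result
    always has one entry per column.
    """
    a = np.asarray(a)
    out = np.zeros(a.shape[1])  # float result; stays 0 for all-zero columns
    for j in range(a.shape[1]):
        col = a[:, j]
        nonzero = col[col != 0]  # drop the zeros from this column
        if nonzero.size:         # only overwrite the 0 if something is left
            out[j] = np.median(nonzero)
    return out

x = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])
print(column_medians_ignoring_zeros_loop(x))  # [9. 9. 9. 0.]
```

The explicit Python loop is easy to read but runs once per column, which is what the question hopes to avoid for wide arrays.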

python arrays numpy zero median

Matt O'Brien Feb 26 '14 at 17:43

4 answers

Masked arrays are always handy, but slooooooow (x and y are as defined in the accepted answer):

 In [14]: %timeit np.ma.median(y, axis=0).filled(0)
 1000 loops, best of 3: 1.73 ms per loop

 In [15]: %%timeit
    ....: ans = np.apply_along_axis(lambda v: np.median(v[v != 0]), 0, x)
    ....: ans[np.isnan(ans)] = 0.
    ....:
 1000 loops, best of 3: 402 µs per loop

 In [16]: ans = np.apply_along_axis(lambda v: np.median(v[v != 0]), 0, x)
    ....: ans[np.isnan(ans)] = 0.; ans
 Out[16]: array([ 9.,  9.,  9.,  0.])

Using np.nonzero is even faster:

 In [25]: %%timeit
    ....: ans = np.apply_along_axis(lambda v: np.median(v[np.nonzero(v)]), 0, x)
    ....: ans[np.isnan(ans)] = 0.
    ....:
 1000 loops, best of 3: 384 µs per loop
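The sessions above rely on x and y already existing in IPython. A self-contained sketch of the same three approaches might look like this; the helper names masked_version, apply_version, by_mask and by_nonzero are hypothetical. It also silences the RuntimeWarning that np.median emits for the empty all-zero column before that column's nan is replaced by 0. The timing loop uses the standard timeit module, and its numbers will not match the ones quoted above, since they depend on machine, array size and NumPy version.

```python
import timeit
import warnings
import numpy as np

x = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])
y = np.ma.masked_where(x == 0, x)  # zeros hidden behind the mask

def masked_version():
    # Fully masked (all-zero) columns come back masked; fill them with 0.
    return np.ma.median(y, axis=0).filled(0)

def apply_version(select):
    # select(v) returns the non-zero entries of a single column v.
    # np.median of an empty selection warns and returns nan, so the
    # warning is silenced here and the nans are replaced by 0 afterwards.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        ans = np.apply_along_axis(lambda v: np.median(select(v)), 0, x)
    ans[np.isnan(ans)] = 0.
    return ans

def by_mask(v):
    return v[v != 0]            # boolean-mask selection

def by_nonzero(v):
    return v[np.nonzero(v)]     # index-based selection

print(masked_version())           # [9. 9. 9. 0.]
print(apply_version(by_mask))     # [9. 9. 9. 0.]
print(apply_version(by_nonzero))  # [9. 9. 9. 0.]

# Rough timings for comparison on this tiny array.
for name, fn in [("masked", masked_version),
                 ("v != 0", lambda: apply_version(by_mask)),
                 ("nonzero", lambda: apply_version(by_nonzero))]:
    print(name, timeit.timeit(fn, number=1000))
```

All three give the same column medians; they differ only in how the zeros are excluded, so any speed difference comes from that step and from the per-column Python calls made by apply_along_axis.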

CT Zhu Feb 26 '14 at 18:18

This may help. Once you have the non-zero elements, you can take their median directly from a[nonzero(a)].

numpy.nonzero

numpy.nonzero(a)

 Return the indices of the elements that are non-zero.

 Returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension. The corresponding non-zero values can be obtained with:

     a[nonzero(a)]

 To group the indices by element, rather than dimension, use:

     transpose(nonzero(a))

 The result of this is always a 2-D array, with a row for each non-zero element.

 Parameters:
     a : array_like
         Input array.

 Returns:
     tuple_of_arrays : tuple
         Indices of elements that are non-zero.

 See also:
     flatnonzero : Return indices that are non-zero in the flattened version of the input array.
     ndarray.nonzero : Equivalent ndarray method.
     count_nonzero : Counts the number of non-zero elements in the input array.

 Examples

 >>> x = np.eye(3)
 >>> x
 array([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])
 >>> np.nonzero(x)
 (array([0, 1, 2]), array([0, 1, 2]))
 >>> x[np.nonzero(x)]
 array([ 1.,  1.,  1.])
 >>> np.transpose(np.nonzero(x))
 array([[0, 0],
        [1, 1],
        [2, 2]])

 A common use for nonzero is to find the indices of an array, where a condition is True. Given an array a, the condition a > 3 is a boolean array and since False is interpreted as 0, np.nonzero(a > 3) yields the indices of the a where the condition is true.

 >>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
 >>> a > 3
 array([[False, False, False],
        [ True,  True,  True],
        [ True,  True,  True]], dtype=bool)
 >>> np.nonzero(a > 3)
 (array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))

 The nonzero method of the boolean array can also be called.

 >>> (a > 3).nonzero()
 (array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
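On a 2-D array, a[nonzero(a)] returns the non-zero values as one flat array, so its median pools all columns together. A minimal sketch of how the tuple returned by np.nonzero (row indices, column indices) can instead be used to group the non-zero values by column, assuming the question's array; the variable names are only illustrative:

```python
import numpy as np

a = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])

# One index array per dimension: (row indices, column indices).
rows, cols = np.nonzero(a)
values = a[rows, cols]   # same as a[np.nonzero(a)]: a flat 1-D array

# Median over *all* non-zero entries, not per column.
print(np.median(values))  # 9.0

# Per-column medians: select the non-zero values whose column index is j,
# leaving 0 for columns with no non-zero entries.
medians = np.zeros(a.shape[1])
for j in range(a.shape[1]):
    col_values = values[cols == j]
    if col_values.size:
        medians[j] = np.median(col_values)
print(medians)            # [9. 9. 9. 0.]
```

Because np.nonzero walks the array in row-major order, the column indices come out interleaved, which is why the values are regrouped with cols == j rather than sliced contiguously.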

sabbahillel Feb 26 '14 at 17:55

You can use masked arrays:

 a = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
 m = np.ma.masked_equal(a, 0)

 In [44]: np.median(a)
 Out[44]: 1.0

 In [45]: np.ma.median(m)
 Out[45]: 9.0

 In [46]: m
 Out[46]:
 masked_array(data =
  [[10 -- 10 --]
  [1 1 -- --]
  [9 9 9 --]
  [-- 10 1 --]],
              mask =
  [[False  True False  True]
  [False False  True  True]
  [False False False  True]
  [ True False False  True]],
        fill_value = 0)
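As written, np.ma.median(m) pools every unmasked value into a single median. The question asks for one median per column, which the same masked array gives when axis=0 is passed. A minimal sketch reusing this answer's a and m; the all-zero column comes back masked, and .filled(0) turns it into the 0 placeholder the question wants:

```python
import numpy as np

a = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
m = np.ma.masked_equal(a, 0)  # mask every entry equal to 0

# No axis: one median over all unmasked values in the array.
print(np.ma.median(m))                     # 9.0

# axis=0: one median per column; the fully masked last column is filled with 0.
print(np.ma.median(m, axis=0).filled(0))   # [9. 9. 9. 0.]
```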

M4rtini Feb 26 '14 at 17:55

wflynny · Accepted Answer · 2014-02-26T18:02:34+0000

Use a masked array and np.ma.median(y, axis=0).filled(0) to get the medians of the columns; columns that are entirely masked (all zeros) are filled with 0.

 In [1]: x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])

 In [2]: y = np.ma.masked_where(x == 0, x)

 In [3]: x
 Out[3]:
 array([[10,  0, 10,  0],
        [ 1,  1,  0,  0],
        [ 9,  9,  9,  0],
        [ 0, 10,  1,  0]])

 In [4]: y
 Out[4]:
 masked_array(data =
  [[10 -- 10 --]
  [1 1 -- --]
  [9 9 9 --]
  [-- 10 1 --]],
              mask =
  [[False  True False  True]
  [False False  True  True]
  [False False False  True]
  [ True False False  True]],
        fill_value = 999999)

 In [6]: np.median(x, axis=0)
 Out[6]: array([ 5.,  5.,  5.,  0.])

 In [7]: np.ma.median(y, axis=0).filled(0)
 Out[7]: array([ 9.,  9.,  9.,  0.])
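A different route, not taken by any of the answers here, is to turn the zeros into NaN and call np.nanmedian (available since NumPy 1.9), which skips NaNs along the chosen axis. This is a sketch under the assumption that the data contain no genuine NaNs of their own, since those would be skipped too. An all-zero column becomes an all-NaN column, for which np.nanmedian warns and returns nan, so that nan is replaced by 0 just as in the other answers:

```python
import warnings
import numpy as np

x = np.array([[10,  0, 10, 0],
              [ 1,  1,  0, 0],
              [ 9,  9,  9, 0],
              [ 0, 10,  1, 0]])

# Treat zeros as missing: NaN needs a float array.
xf = np.where(x == 0, np.nan, x.astype(float))

with warnings.catch_warnings():
    # An all-NaN column triggers "All-NaN slice encountered" and yields nan.
    warnings.simplefilter("ignore", RuntimeWarning)
    med = np.nanmedian(xf, axis=0)

med[np.isnan(med)] = 0.  # all-zero column -> 0 placeholder
print(med)               # [9. 9. 9. 0.]
```

This keeps everything in plain ndarrays and avoids both the masked-array machinery and the per-column Python calls of apply_along_axis, at the cost of a float copy of the input.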
