Average of non-zero values

I have an N × M matrix and I want to find the average value of each row. Values range from 1 to 5, and entries that have no value are 0. However, the following method gives me the wrong average, since it also counts the entries whose value is 0.

matrix_row_mean= matrix.mean(axis=1) 

How can I get the average of only the non-zero values?

2 answers

Get a count of non-zeros in each row and use this to average the sum along each row. Thus, the implementation will look something like this:

 np.true_divide(matrix.sum(1),(matrix!=0).sum(1)) 

If you are using an older version of NumPy, you can convert the count to float instead of using np.true_divide, for example:

 matrix.sum(1)/(matrix!=0).sum(1).astype(float) 

Sample run:

 In [160]: matrix
 Out[160]:
 array([[0, 0, 1, 0, 2],
        [1, 0, 0, 2, 0],
        [0, 1, 1, 0, 0],
        [0, 2, 2, 2, 2]])

 In [161]: np.true_divide(matrix.sum(1),(matrix!=0).sum(1))
 Out[161]: array([ 1.5, 1.5, 1. , 2. ])
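The count-based approach can be wrapped in a small helper. The following is only a sketch: the function name nonzero_row_mean is made up for illustration, and returning NaN for rows that contain no non-zero value is an assumption, since the question does not say what should happen in that case.

```python
import numpy as np

# Sample matrix from the run above; 0 marks a missing value.
matrix = np.array([[0, 0, 1, 0, 2],
                   [1, 0, 0, 2, 0],
                   [0, 1, 1, 0, 0],
                   [0, 2, 2, 2, 2]])

def nonzero_row_mean(m):
    """Mean of the non-zero entries of each row (hypothetical helper).

    Rows without any non-zero entry yield NaN (an assumption).
    """
    sums = m.sum(axis=1).astype(float)
    counts = (m != 0).sum(axis=1)
    # Start from NaN and divide only where there is something to average.
    out = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts != 0)
    return out

row_means = nonzero_row_mean(matrix)
print(row_means)
```

Using the where argument of np.divide avoids a division-by-zero warning for all-zero rows; if such rows cannot occur in your data, the one-liner above is enough.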

Another way to solve the problem is to replace the zeros with NaNs and then use np.nanmean, which ignores those NaNs and therefore the original zeros, like this:

 np.nanmean(np.where(matrix!=0,matrix,np.nan),1) 

In terms of performance, I would recommend the first approach.
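To check that both approaches agree, a minimal self-contained comparison could look like the sketch below. The sample matrix and the variable names by_count and by_nanmean are chosen here for illustration only.

```python
import numpy as np

# Illustrative input; 0 marks a missing value.
matrix = np.array([[0, 0, 1, 0, 2],
                   [1, 0, 0, 2, 0],
                   [0, 1, 1, 0, 0],
                   [0, 2, 2, 2, 2]])

# Approach 1: row sum divided by the number of non-zero entries in the row.
by_count = np.true_divide(matrix.sum(axis=1), (matrix != 0).sum(axis=1))

# Approach 2: turn zeros into NaN, then let nanmean skip them.
# np.where promotes the integer matrix to float so it can hold NaN.
by_nanmean = np.nanmean(np.where(matrix != 0, matrix, np.nan), axis=1)

print(np.allclose(by_count, by_nanmean))
```

Note that the second approach builds a temporary float copy of the whole matrix, which is one plausible reason it tends to be slower; measure on your own data before relying on that.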


Here I will talk about a more general solution that uses a masked array. To illustrate the details, create a lower triangular matrix of ones:

 matrix = np.tril(np.ones((5, 5)), 0) 

If you are not familiar with the above terminology, this matrix looks as follows:

 [[ 1., 0., 0., 0., 0.],
  [ 1., 1., 0., 0., 0.],
  [ 1., 1., 1., 0., 0.],
  [ 1., 1., 1., 1., 0.],
  [ 1., 1., 1., 1., 1.]]

Now we want our function to return an average of 1 for each row. In other words, the mean along axis 1 should be a vector of five ones. To do this, we create a masked array in which entries equal to zero are considered invalid. This can be achieved using np.ma.masked_equal:

 masked = np.ma.masked_equal(matrix, 0) 

Finally, we perform NumPy operations on this array, which systematically ignore the masked elements (the zeros). With this in mind, we get the desired result:

 masked.mean(axis=1) 

This should result in a vector whose entries are all ones.


In more detail, the output of np.ma.masked_equal(matrix, 0) should look like this:

 masked_array(data =
  [[1.0 -- -- -- --]
  [1.0 1.0 -- -- --]
  [1.0 1.0 1.0 -- --]
  [1.0 1.0 1.0 1.0 --]
  [1.0 1.0 1.0 1.0 1.0]],
              mask =
  [[False True True True True]
  [False False True True True]
  [False False False True True]
  [False False False False True]
  [False False False False False]],
        fill_value = 0.0)

This means that the values shown as -- are considered invalid. The same information appears in the mask attribute of the masked array, where True indicates that the element is invalid and should therefore be ignored.

Finally, the output of the average operation on this array should be:

 masked_array(data = [1.0 1.0 1.0 1.0 1.0],
              mask = [False False False False False],
        fill_value = 1e+20)
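Applied to a matrix like the one in the question, the masked-array approach might look like the following sketch. The sample values are made up for illustration, and converting the result back to a plain array with filled(np.nan) is an optional extra step, not something the answer above requires.

```python
import numpy as np

# Illustrative matrix with values 1..5; 0 marks a missing value.
matrix = np.array([[0, 0, 1, 0, 2],
                   [1, 0, 0, 2, 0],
                   [0, 1, 1, 0, 0],
                   [0, 2, 2, 2, 2]])

# Mask every entry equal to 0 so that reductions skip it.
masked = np.ma.masked_equal(matrix, 0)

# The row means are computed over the unmasked entries only.
masked_means = masked.mean(axis=1)

# Optional: convert to a regular ndarray; a fully masked row would become NaN.
plain_means = masked_means.filled(np.nan)
print(plain_means)
```

A masked array keeps the original data intact, so the same mask can be reused for other reductions such as sum, std, or max without rebuilding anything.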