numpy.ma (masked) array mean method has an inconsistent return type

I noticed that the numpy masked-array mean method returns different types when it probably shouldn't:

    import numpy as np
    A = np.ma.masked_equal([1,1,0], value=0)
    B = np.ma.masked_equal([1,1,1], value=0)  # no masked values
    type(A.mean())  # numpy.float64
    type(B.mean())  # numpy.ma.core.MaskedArray

Other numpy.ma.core.MaskedArray methods seem consistent:

    type(A.sum()) == type(B.sum())    # True
    type(A.prod()) == type(B.prod())  # True
    type(A.std()) == type(B.std())    # True
    type(A.mean()) == type(B.mean())  # False

Can someone explain this?

UPDATE: As pointed out in the comments:

    C = np.ma.masked_array([1, 1, 1], mask=[False, False, False])
    type(C.mean()) == type(A.mean())  # True
1 answer

B.mean (that is, MaskedArray.mean) starts with:

    if self._mask is nomask:
        result = super(MaskedArray, self).mean(axis=axis, dtype=dtype)

np.ma.nomask is simply a boolean False (a np.bool_ scalar).

That is the case for your B:

 masked_array(data = [1 1 1], mask = False, fill_value = 0) 

For A, mask is a boolean array matching the size of data. For B it is the scalar False, and mean treats that as a special case.
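A minimal sketch for inspecting the two mask representations directly. `np.ma.getmask`, `np.ma.getmaskarray` and `np.ma.nomask` are real numpy.ma names; the helper `uses_nomask` is hypothetical and only restates the `self._mask is nomask` test from the snippet above. Whether B ends up with `nomask` may depend on the NumPy version, so no output is claimed for that line.

```python
import numpy as np

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)  # nothing actually masked

def uses_nomask(x):
    """Hypothetical helper: True when x stores no per-element mask,
    the case that MaskedArray.mean special-cases."""
    return np.ma.getmask(x) is np.ma.nomask

# The stored mask: a boolean array for A; for B it may be the scalar
# nomask (which is what the "mask = False" repr shows).
print(np.ma.getmask(A))   # [False False  True]
print(uses_nomask(A))     # False
print(uses_nomask(B))

# getmaskarray always expands to a full boolean array, giving a uniform
# view of the mask whichever representation is stored.
print(np.ma.getmaskarray(A))  # [False False  True]
print(np.ma.getmaskarray(B))  # [False False False]
```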

I need to dig a little further to understand what that implies.

    In [127]: np.mean(B)
    Out[127]: masked_array(data = 1.0, mask = False, fill_value = 0)

    In [141]: super(np.ma.MaskedArray, B).mean()
    Out[141]: masked_array(data = 1.0, mask = False, fill_value = 0)

I'm not sure that helps. There is some circular referencing between the np.ndarray methods and the np and np.ma functions, which makes it hard to tell exactly which code is being run. It looks as if the compiled mean method is used, but it is unclear how that handles the masking.

I wonder whether the intent was to use

  np.mean(B.data) # or B.data.mean() 

and that the super method call is the wrong choice here.
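One caveat on `B.data`: `.data` is the underlying ndarray with the mask dropped, which is harmless for B (nothing is masked) but would count the masked 0 in A. A masked-aware alternative, sketched here as one option rather than what numpy.ma intends, is `compressed()`, which returns a plain 1-D ndarray of the unmasked values, so `.mean()` on it is an ordinary ndarray reduction for both arrays. Note that for a fully masked array `compressed()` is empty and its mean is `nan` (with a RuntimeWarning).

```python
import numpy as np

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)

# .data ignores the mask entirely
print(A.data.mean())  # includes the masked 0, so not 1.0
print(B.data.mean())  # 1.0 (nothing masked, so no difference)

# compressed() keeps only unmasked values as a plain ndarray
print(A.compressed().mean())  # 1.0
print(B.compressed().mean())  # 1.0
```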

In any case, the same array, but with a vector mask, returns a scalar.

    In [132]: C
    Out[132]: masked_array(data = [1 1 1], mask = [False False False], fill_value = 0)

    In [133]: C.mean()
    Out[133]: 1.0
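Following from that, one way to steer B away from the nomask shortcut is to rebuild it with an explicit per-element mask, the same shape as C's. This is a sketch under the assumption that the shortcut is only taken when the stored mask is `nomask`; the name `B_full` is just for illustration. `np.ma.getmaskarray` expands `nomask` into an all-False boolean array.

```python
import numpy as np

B = np.ma.masked_equal([1, 1, 1], value=0)

# Rebuild B with a full boolean mask instead of the nomask scalar
B_full = np.ma.masked_array(B.data, mask=np.ma.getmaskarray(B))

print(B_full.mask)    # [False False False]
print(B_full.mean())  # 1.0
```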

======================

Trying the method without the nomask shortcut raises an error at

    dsum = self.sum(axis=axis, dtype=dtype)
    cnt = self.count(axis=axis)
    if cnt.shape == () and (cnt == 0):
        result = masked
    else:
        result = dsum * 1. / cnt

self.count returns a plain Python scalar in the nomask case, but a np.int32 when there is a regular mask array. So cnt.shape chokes: a Python int has no shape attribute, which raises an AttributeError.
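To check that the general branch itself is sound once that scalar problem is avoided, here is a sketch that mirrors the snippet above for a full reduction only (axis=None). It is not NumPy's code and the helper name is hypothetical; the one change is testing `np.shape(cnt)` instead of `cnt.shape`, since `np.shape` also accepts a plain Python int and returns `()` for it.

```python
import numpy as np

def masked_mean_no_shortcut(x):
    """Hypothetical helper: the general branch of the mean code above,
    without the nomask shortcut. Full reduction (axis=None) only."""
    dsum = x.sum()
    cnt = x.count()
    # np.shape works on plain ints as well as NumPy scalars,
    # unlike the cnt.shape attribute access.
    if np.shape(cnt) == () and cnt == 0:
        return np.ma.masked
    return dsum * 1. / cnt

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)
D = np.ma.masked_equal([0, 0, 0], value=0)  # every element masked

print(masked_mean_no_shortcut(A))  # 1.0
print(masked_mean_no_shortcut(B))  # 1.0
print(masked_mean_no_shortcut(D) is np.ma.masked)  # True
```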

trace is the only other masked method that tries this super(MaskedArray, ...) shortcut. There is clearly something clumsy about the mean code.

======================

Relevant bug: https://github.com/numpy/numpy/issues/5769

According to that issue, the same problem was raised last year: Numpy MaskedArray instance equivalence testing raises attribute error

It looks like there are a lot of masking issues, not just with mean. There may be fixes in the development master branch, or coming in the near future.
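Until then, one possible workaround is to normalize whatever mean returns. This is a sketch assuming the odd case is a 0-d MaskedArray like the Out[127] result; `scalar_mean` is a hypothetical name, not a numpy.ma function. It passes `np.ma.masked` through unchanged (the fully masked case) and otherwise converts the result to a plain NumPy scalar.

```python
import numpy as np

def scalar_mean(x):
    """Hypothetical wrapper: mean of masked array x as a plain NumPy
    scalar, or np.ma.masked if every element is masked."""
    r = x.mean()
    if r is np.ma.masked:
        return r
    # np.asarray drops any MaskedArray wrapper; indexing a 0-d array
    # with () yields a NumPy scalar such as np.float64.
    return np.asarray(r)[()]

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)
D = np.ma.masked_equal([0, 0, 0], value=0)  # every element masked

print(type(scalar_mean(A)) is type(scalar_mean(B)))  # True
print(scalar_mean(D) is np.ma.masked)                # True
```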

