How to subclass `numpy.ma.core.masked_array`?

Question


I am trying to write a subclass of `masked_array`. What I have so far:

```python
import numpy as np
import numpy.ma as ma

class gridded_array(ma.core.masked_array):
    def __init__(self, data, dimensions, mask=False, dtype=None,
                 copy=False, subok=True, ndmin=0, fill_value=None,
                 keep_mask=True, hard_mask=None, shrink=True):
        ma.core.masked_array.__init__(data, mask, dtype, copy, subok, ndmin,
                                      fill_value, keep_mask, hard_mask, shrink)
        self.dimensions = dimensions
```

However, when I create a `gridded_array`, I do not get what I expect:

```python
>>> from collections import OrderedDict
>>> dims = OrderedDict()
>>> dims['x'] = np.arange(4)
>>> gridded_array(np.random.randn(4), dims)
masked_array(data = [-- -- -- --],
             mask = [ True  True  True  True],
       fill_value = 1e+20)
```

I would expect an unmasked array. I suspect that the `dimensions` argument I pass in is being passed on to the `masked_array.__init__` call, but since I'm pretty new to OOP, I don't know how to fix this.

Any help is greatly appreciated.

PS: I'm on Python 2.7



andreas-h Sep 26 '12 at 8:43


2 answers

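The problem is that `masked_array` uses `__new__` instead of `__init__`, so your `dimensions` argument is misinterpreted: `gridded_array(data, dims)` first calls `masked_array.__new__(cls, data, dims)`, and the second positional parameter of that `__new__` is `mask`.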
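A minimal sketch of that mechanism, assuming a recent NumPy. `NoInit` is a hypothetical class that, like the question's code, only overrides `__init__`. Calling the class hands the same positional arguments to `MaskedArray.__new__` first, so the value meant as `dimensions` is used as the mask (a list of booleans is used here so the effect is easy to see).

```python
import numpy as np

class NoInit(np.ma.MaskedArray):
    # Hypothetical subclass that, like the question's code, only overrides __init__
    def __init__(self, data, dimensions):
        self.dimensions = dimensions

# Calling the class runs MaskedArray.__new__(cls, data, dimensions) first,
# so `dimensions` is bound to the `mask` parameter of __new__ ...
arr = NoInit(np.arange(4.0), [True, False, True, False])
print(np.ma.getmaskarray(arr))  # [ True False  True False]

# ... and only afterwards does our __init__ store the same object as an attribute
print(arr.dimensions)           # [True, False, True, False]
```

With the question's `OrderedDict`, the same thing happens: the dict ends up in the `mask` slot, which is why every element comes out masked.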

Override `__new__` instead:

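```python
import numpy.ma as ma

class gridded_array(ma.core.masked_array):
    def __new__(cls, data, dimensions, *args, **kwargs):
        # Let MaskedArray.__new__ build the array from the remaining arguments
        self = super(gridded_array, cls).__new__(cls, data, *args, **kwargs)
        # Then attach the extra attribute to the freshly created instance
        self.dimensions = dimensions
        return self
```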
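A short usage sketch of this `__new__` override, assuming a current NumPy where `np.ma.MaskedArray` is the same class as `ma.core.masked_array`. It checks that the data is no longer masked and that keyword arguments such as `mask` are still forwarded. The last lines show a caveat: an object created with `.view()` never goes through `__new__`, so it has no `dimensions` attribute, which is where the `__array_finalize__` mechanism discussed in the accepted answer comes in.

```python
from collections import OrderedDict

import numpy as np

class gridded_array(np.ma.MaskedArray):
    def __new__(cls, data, dimensions, *args, **kwargs):
        # Remaining positional/keyword arguments go to MaskedArray.__new__
        self = super(gridded_array, cls).__new__(cls, data, *args, **kwargs)
        self.dimensions = dimensions
        return self

dims = OrderedDict()
dims['x'] = np.arange(4)

g = gridded_array(np.random.randn(4), dims)
print(np.ma.getmaskarray(g))   # [False False False False]
print(list(g.dimensions))      # ['x']

# Keyword arguments are still forwarded to MaskedArray.__new__
h = gridded_array(np.arange(4.0), dims, mask=[False, True, False, False])
print(np.ma.getmaskarray(h))   # [False  True False False]

# A view never calls __new__, so the attribute is missing on it
v = np.ma.masked_array(np.arange(4.0)).view(gridded_array)
print(hasattr(v, 'dimensions'))  # False
```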


ecatmur Sep 26 '12 at 10:00


Pierre GM · Accepted Answer · 2012-09-26T10:04:57+0000

A word of warning: if you are new to OOP, subclassing ndarrays and MaskedArrays is not the easiest way to get started...

First of all, you should check out this tutorial. It should introduce you to the mechanisms involved in subclassing ndarrays.

MaskedArrays, like ndarrays, use the `__new__` method to instantiate the class, not `__init__`. By the time you reach the `__init__` of your subclass, you already have a fully working object, with the actual initialization delegated to the `__array_finalize__` method. Simply put: your `__init__` does not behave the way it would with a standard Python object. (In fact, I wonder when it is called at all... after `__array_finalize__`, if I remember correctly...)
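To make the call order concrete, here is a minimal sketch using a plain ndarray subclass; the class name `Traced` and the `calls` list are hypothetical, for illustration only. It records which hooks run: on construction `__init__` runs last, after `__array_finalize__`, and when slicing creates a new view only `__array_finalize__` runs.

```python
import numpy as np

calls = []  # records the order in which the hooks run

class Traced(np.ndarray):
    def __new__(cls, data):
        calls.append("__new__")
        # .view(cls) creates the subclass instance and triggers __array_finalize__
        return np.asarray(data).view(cls)

    def __array_finalize__(self, obj):
        calls.append("__array_finalize__")

    def __init__(self, data):
        calls.append("__init__")

t = Traced([1, 2, 3])
print(calls)   # ['__new__', '__array_finalize__', '__init__']

del calls[:]
s = t[1:]      # slicing creates a view: neither __new__ nor __init__ runs
print(calls)   # ['__array_finalize__']
```

This is why per-instance setup that must survive slicing and views belongs in `__array_finalize__` rather than in `__init__`.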

Now that you have been warned, you may want to ask yourself whether you really need to subclass ndarray:

What are your goals with gridded_array ?
Do you need to support all ndarray methods, or just some? All dtypes?
What should happen when you take a single element or a slice of your object?
Will you make extensive use of gridded_arrays as inputs to NumPy functions?

If in doubt, it might be easier to write `gridded_array` as a generic class that holds an ndarray (or MaskedArray) as an attribute (say `gridded_array._array`) and adds only the methods you actually need to work on `self._array`.
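A minimal sketch of that composition approach, assuming only a couple of operations are needed. The class name `GriddedArray` and its `shape`/`mean` members are hypothetical examples; the idea is simply to keep the MaskedArray in `self._array` and forward just the calls you use.

```python
from collections import OrderedDict

import numpy as np

class GriddedArray(object):
    """Hypothetical wrapper: holds a MaskedArray instead of subclassing it."""

    def __init__(self, data, dimensions, mask=np.ma.nomask):
        # An ordinary Python __init__ works here, because this is a normal class
        self._array = np.ma.masked_array(data, mask=mask)
        self.dimensions = OrderedDict(dimensions)

    @property
    def shape(self):
        return self._array.shape

    def mean(self):
        # Delegate only the operations you actually need
        return self._array.mean()

g = GriddedArray([1.0, 2.0, 3.0, 4.0], {'x': np.arange(4)},
                 mask=[False, False, False, True])
print(g.shape)             # (4,)
print(g.mean())            # 2.0 (the masked 4.0 is ignored)
print(list(g.dimensions))  # ['x']
```

The trade-off is that NumPy functions will not accept `GriddedArray` directly; you pass `g._array` (or add more forwarding methods) where needed.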

Suggestions

If you just need to "tag" each element of your `gridded_array`, you might be interested in pandas.
If you only need to deal with floats, MaskedArray may be a little overkill: just use NaNs to represent invalid data, since many NumPy functions have a NaN-aware equivalent (`np.nanmean`, `np.nansum`, ...). In the worst case, you can always mask your `gridded_array` when necessary: viewing an ndarray subclass with `.view(np.ma.MaskedArray)` should return a masked version of your input...
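A small sketch of the NaN-based alternative, with made-up sample values: NaN marks invalid entries, NaN-aware functions such as `np.nanmean` skip them, and a MaskedArray can still be produced on demand. Note that `.view(np.ma.MaskedArray)` on its own changes the type but masks nothing; `np.ma.masked_invalid` is the call that actually masks the NaNs.

```python
import numpy as np

# Invalid entries are stored as NaN instead of being masked
arr = np.array([1.0, np.nan, 3.0])

print(np.mean(arr))     # nan: the plain function propagates NaN
print(np.nanmean(arr))  # 2.0: the NaN-aware version skips it

# Masked versions on demand
as_masked = arr.view(np.ma.MaskedArray)   # a MaskedArray, but nothing is masked
invalid = np.ma.masked_invalid(arr)       # NaN entries are masked
print(np.ma.getmaskarray(as_masked))      # [False False False]
print(invalid.mean())                     # 2.0
```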
