It seems likely that dbaupp's answer is correct. But just for the sake of diversity, here is another solution that saves memory. This will work even for operations that do not have a built-in numpy equivalent.
>>> import numpy
>>> values = numpy.array([(x % 2) for x in range(12)], dtype=bool).reshape((4, 3))
>>> weights = numpy.array(range(1, 4))
>>> weights_stretched = numpy.lib.stride_tricks.as_strided(weights, (4, 3), (0, 8))
numpy.lib.stride_tricks.as_strided is a wonderful little function! It lets you specify shape and strides values that allow a small array to mimic a much larger one. Observe: there aren't really four rows here, it just looks that way:
>>> weights_stretched[0][0] = 4
>>> weights_stretched
array([[4, 2, 3],
       [4, 2, 3],
       [4, 2, 3],
       [4, 2, 3]])
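As an aside that is not part of the original answer: the (0, 8) above is the pair of (row stride, column stride) in bytes. A row stride of 0 makes every row alias the same memory, and 8 is the itemsize of a 64-bit integer. Here is a small sketch, assuming that default dtype; reading the column stride off the array itself avoids hard-coding the 8:

>>> weights.strides    # (8,) here, assuming the default 64-bit integer dtype
(8,)
>>> stretched = numpy.lib.stride_tricks.as_strided(
...     weights, shape=(4, 3), strides=(0, weights.strides[0]))
>>> (stretched == weights_stretched).all()    # same simulated 4x3 array, no copy made
True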
So instead of passing a huge array to MaskedArray, you can pass a smaller one. (But as you have already noticed, numpy masking works the opposite way from what you might expect: True masks a value rather than revealing it, so you have to keep your values inverted.) And as you can see, MaskedArray does not copy the data; it just reflects whatever is in weights_stretched:
>>> masked = numpy.ma.MaskedArray(weights_stretched, numpy.logical_not(values))
>>> weights_stretched[0][0] = 1
>>> masked
masked_array(data =
 [[-- 2 --]
 [1 -- 3]
 [-- 2 --]
 [1 -- 3]],
             mask =
 [[ True False  True]
 [False  True False]
 [ True False  True]
 [False  True False]],
       fill_value = 999999)
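(If the inverted mask convention seems surprising, here is a tiny illustration that is not from the original answer: True hides an element, False keeps it.)

>>> # Illustrative only: the masked 10 is excluded, so only 20 and 30 are summed.
>>> demo = numpy.ma.MaskedArray([10, 20, 30], mask=[True, False, False])
>>> demo.sum()
50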
Now we can just pass the masked array to sum:
>>> numpy.sum(masked, axis=1)
masked_array(data = [2 4 2 4],
             mask = [False False False False],
       fill_value = 999999)
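The timings below use two functions, dot1 and dot2, whose definitions are not included in the answer. A plausible reconstruction, assuming dot1 wraps numpy.dot and dot2 applies the stride_tricks/MaskedArray approach just shown (the function bodies, the random test data, and the re-use of the names values and weights are all my assumptions), might look like this:

>>> # Hypothetical benchmark setup: none of this appears in the original answer.
>>> values = numpy.random.rand(1000000, 30) > 0.5   # 1,000,000 x 30 boolean array
>>> weights = numpy.arange(1, 31)                   # one integer weight per column
>>> def dot1(values, weights):
...     # the built-in route: an ordinary matrix-vector product
...     return numpy.dot(values, weights)
...
>>> def dot2(values, weights):
...     # the stride_tricks + MaskedArray route shown above
...     stretched = numpy.lib.stride_tricks.as_strided(
...         weights, shape=values.shape, strides=(0, weights.strides[0]))
...     masked = numpy.ma.MaskedArray(stretched, numpy.logical_not(values))
...     return masked.sum(axis=1)
...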
I benchmarked numpy.dot (dot1) against the approach above (dot2) on a 1,000,000 x 30 array. These are the results on a relatively modern MacBook Pro:
>>> %timeit dot1(values, weights)
1 loops, best of 3: 194 ms per loop
>>> %timeit dot2(values, weights)
1 loops, best of 3: 459 ms per loop
As you can see, the built-in numpy solution is faster. But stride_tricks is worth knowing about, so I'm leaving it here.