Why is creating a NumPy masked array so slow with mask=None or mask=0?

Today I was profiling a function and found a (to me) strange bottleneck: creating a masked array with mask=None or mask=0, to initialize the mask to all zeros with the same shape as the data, is very slow:

    >>> import numpy as np
    >>> data = np.ones((100, 100, 100))
    >>> %timeit ma_array = np.ma.array(data, mask=None, copy=False)
    1 loop, best of 3: 803 ms per loop
    >>> %timeit ma_array = np.ma.array(data, mask=0, copy=False)
    1 loop, best of 3: 807 ms per loop

On the other hand, using mask=False or building the all-zeros mask manually is much faster:

    >>> %timeit ma_array = np.ma.array(data, mask=False, copy=False)
    1000 loops, best of 3: 438 µs per loop
    >>> %timeit ma_array = np.ma.array(data, mask=np.zeros(data.shape, dtype=bool), copy=False)
    1000 loops, best of 3: 453 µs per loop

Why are None and 0 almost 2000 times slower than False or np.zeros(data.shape) as the mask argument? The docstring only says that the mask:

Must be convertible to an array of booleans with the same shape as data. True indicates a masked (i.e. invalid) data.
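For what it's worth, all four spellings do end up producing the same result, an array with nothing masked; only the internal construction of the mask differs (a quick sanity check, timings aside):

```python
import numpy as np

data = np.ones((100, 100, 100))

# Each of these should yield an all-unmasked array with the data's shape;
# the spellings differ only in how the mask gets materialized internally.
for m in (None, 0, False, np.zeros(data.shape, dtype=bool)):
    arr = np.ma.array(data, mask=m, copy=False)
    assert arr.shape == data.shape
    assert not np.ma.getmaskarray(arr).any()
```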

I am using Python 3.5 and NumPy 1.11.0 on Windows 10.

2 answers

mask=False is special-cased in the NumPy 1.11.0 source code:

    if mask is True and mdtype == MaskType:
        mask = np.ones(_data.shape, dtype=mdtype)
    elif mask is False and mdtype == MaskType:
        mask = np.zeros(_data.shape, dtype=mdtype)

mask=0 and mask=None take the slow path instead: the mask is first created as a 0-dimensional array and then grown to the data's shape with np.resize.
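You can reproduce the slow path directly: np.resize builds the full-size mask by raveling the input and concatenating repeated copies of it, which is where the time goes when the input is a 1-element array. A sketch of what happens internally (not the actual np.ma code, and with a smaller shape than the question's (100, 100, 100) so it runs quickly):

```python
import numpy as np

shape = (20, 20, 20)

# Fast path (mask=False): allocate the boolean mask directly.
fast_mask = np.zeros(shape, dtype=np.bool_)

# Slow path (mask=0 or mask=None): the scalar is wrapped in a 0-d
# boolean array, then grown to the data's shape via np.resize, which
# concatenates thousands of copies of the tiny input to fill the shape.
slow_mask = np.resize(np.array(0, dtype=np.bool_), shape)

assert slow_mask.shape == fast_mask.shape == shape
assert not slow_mask.any()  # still an all-False mask either way
```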


I believe @user2357112 has the explanation. I profiled both cases; here are the results:

    In [14]: q.run('q.np.ma.array(q.data, mask=None, copy=False)')
             49 function calls in 0.161 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            3    0.000    0.000    0.000    0.000 :0(array)
            1    0.154    0.154    0.154    0.154 :0(concatenate)
            1    0.000    0.000    0.161    0.161 :0(exec)
           11    0.000    0.000    0.000    0.000 :0(getattr)
            1    0.000    0.000    0.000    0.000 :0(hasattr)
            7    0.000    0.000    0.000    0.000 :0(isinstance)
            1    0.000    0.000    0.000    0.000 :0(len)
            1    0.000    0.000    0.000    0.000 :0(ravel)
            1    0.000    0.000    0.000    0.000 :0(reduce)
            1    0.000    0.000    0.000    0.000 :0(reshape)
            1    0.000    0.000    0.000    0.000 :0(setprofile)
            5    0.000    0.000    0.000    0.000 :0(update)
            1    0.000    0.000    0.161    0.161 <string>:1(<module>)
            1    0.000    0.000    0.161    0.161 core.py:2704(__new__)
            1    0.000    0.000    0.000    0.000 core.py:2838(_update_from)
            1    0.000    0.000    0.000    0.000 core.py:2864(__array_finalize__)
            5    0.000    0.000    0.000    0.000 core.py:3264(__setattr__)
            1    0.000    0.000    0.161    0.161 core.py:6119(array)
            1    0.007    0.007    0.161    0.161 fromnumeric.py:1097(resize)
            1    0.000    0.000    0.000    0.000 fromnumeric.py:128(reshape)
            1    0.000    0.000    0.000    0.000 fromnumeric.py:1383(ravel)
            1    0.000    0.000    0.000    0.000 numeric.py:484(asanyarray)
            0    0.000             0.000          profile:0(profiler)
            1    0.000    0.000    0.161    0.161 profile:0(q.np.ma.array(q.data, mask=None, copy=False))

    In [15]: q.run('q.np.ma.array(q.data, mask=False, copy=False)')
             37 function calls in 0.000 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    0.000    0.000 :0(array)
            1    0.000    0.000    0.000    0.000 :0(exec)
           11    0.000    0.000    0.000    0.000 :0(getattr)
            1    0.000    0.000    0.000    0.000 :0(hasattr)
            5    0.000    0.000    0.000    0.000 :0(isinstance)
            1    0.000    0.000    0.000    0.000 :0(setprofile)
            5    0.000    0.000    0.000    0.000 :0(update)
            1    0.000    0.000    0.000    0.000 :0(zeros)
            1    0.000    0.000    0.000    0.000 <string>:1(<module>)
            1    0.000    0.000    0.000    0.000 core.py:2704(__new__)
            1    0.000    0.000    0.000    0.000 core.py:2838(_update_from)
            1    0.000    0.000    0.000    0.000 core.py:2864(__array_finalize__)
            5    0.000    0.000    0.000    0.000 core.py:3264(__setattr__)
            1    0.000    0.000    0.000    0.000 core.py:6119(array)
            0    0.000             0.000          profile:0(profiler)
            1    0.000    0.000    0.000    0.000 profile:0(q.np.ma.array(q.data, mask=False, copy=False))

So the bottleneck is the array concatenation step: concatenate accounts for 0.154 s of the 0.161 s total, and it is reached via the resize call that the slow path makes.
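Practically, then, on NumPy 1.11 the fix is just to spell the empty mask in a way that hits the fast path:

```python
import numpy as np

data = np.ones((100, 100, 100))

# Both spellings take the np.zeros fast path instead of np.resize:
a = np.ma.array(data, mask=False, copy=False)
b = np.ma.array(data, mask=np.zeros(data.shape, dtype=np.bool_), copy=False)

assert a.shape == b.shape == data.shape
assert not np.ma.getmaskarray(a).any()
assert not np.ma.getmaskarray(b).any()
```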

