Fastest Python/NumPy way to run a 2D kernel rank filter on masked arrays (and/or with selective ranks)

Given a 2D numpy array

    MyArray = np.array([[ 8.02, 9.54, 0.82, 7.56, 2.26, 9.47],
                        [ 2.68, 7.3 , 2.74, 3.03, 2.25, 8.84],
                        [ 2.21, 3.62, 0.55, 2.94, 5.77, 0.21],
                        [ 5.78, 5.72, 8.85, 0.24, 5.37, 9.9 ],
                        [ 9.1 , 7.21, 4.14, 9.95, 6.73, 6.08],
                        [ 1.8 , 5.14, 5.02, 6.52, 0.3 , 6.11]])

and a corresponding mask array

    MyMask = np.array([[ 0., 0., 1., 1., 0., 1.],
                       [ 1., 0., 0., 0., 0., 1.],
                       [ 0., 0., 0., 1., 0., 0.],
                       [ 0., 1., 1., 1., 1., 0.],
                       [ 0., 1., 0., 1., 0., 0.],
                       [ 0., 1., 0., 0., 1., 1.]])

I want to run a "leaky" median filter that ignores masked elements.

For example, a rank filter with a kernel

    k = np.array([[ 1, 1, 1],
                  [ 1, 0, 1],
                  [ 1, 1, 1]])

would be applied to MyArray: for each element of MyArray, sort the neighborhood defined by the kernel and return the median of the non-masked elements only (averaging the two middle values when the number of non-masked elements is even).
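As a concrete check, here is the entry at position (1, 1) of the expected result computed by hand, with the neighborhood values and mask bits read off MyArray and MyMask above:

```python
import numpy as np

# Ring neighborhood of MyArray[1, 1] under the kernel (center excluded),
# read row by row, with the corresponding entries of MyMask:
vals = np.array([8.02, 9.54, 0.82, 2.68, 2.74, 2.21, 3.62, 0.55])
msk  = np.array([0,    0,    1,    1,    0,    0,    0,    0], dtype=bool)

# Median of the six unmasked values: (2.74 + 3.62) / 2 = 3.18
print(np.median(vals[~msk]))
```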

I'm currently doing this with plain Python loops around bottleneck.nanmedian, mapping the mask to NaN. That gives exactly the result I need, but I was hoping for something built on whole-array 2D operations.

scipy.signal.order_filter and scipy.ndimage.filters.rank_filter are both available (rank_filter looks much faster), but they appear to sort NaN and Inf to the top of the neighborhood before extracting the rank, which skews the result. Neither method seems to support numpy.ma masked arrays, and neither accepts an array of per-element ranks (otherwise I could fill the masked entries with 0 and shift my rank accordingly), nor is there an obvious way to vary the kernel per location.

I am wondering whether I have missed some function or combination of functions, or whether I should look at writing a new routine in Cython.
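For reference, on newer NumPy (>= 1.20, which also ships np.nanmedian) one vectorized possibility is to build all 3x3 windows explicitly with sliding_window_view and let nanmedian skip the masked entries. This is a sketch, not an optimized routine; the one-pixel border is simply left at 0, mirroring the expected output below:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def masked_median_filter(a, mask):
    """Median over the 8-neighbor ring, ignoring masked elements."""
    # Map masked entries to NaN so nanmedian skips them.
    a = np.where(mask.astype(bool), np.nan, np.asarray(a, float))
    win = sliding_window_view(a, (3, 3)).copy()   # (N-2, M-2, 3, 3) windows
    win[:, :, 1, 1] = np.nan                      # hollow out the kernel center
    med = np.nanmedian(win.reshape(win.shape[:2] + (9,)), axis=2)
    out = np.zeros_like(a)                        # border left at 0
    out[1:-1, 1:-1] = med
    return out
```

Applied to MyArray and MyMask above, this reproduces the interior values of the expected result (e.g. 3.18 at position (1, 1)); windows whose every neighbor is masked come out as NaN.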

Ignoring border handling, the interior points of the result for the example above would be

    [[ 0.     0.     0.     0.     0.     0.   ]
     [ 0.     3.18   3.62   2.26   2.645  0.   ]
     [ 0.     2.74   3.325  2.74   2.64   0.   ]
     [ 0.     3.88   3.62   4.955  6.08   0.   ]
     [ 0.     5.02   5.77   5.77   6.52   0.   ]
     [ 0.     0.     0.     0.     0.     0.   ]]
1 answer

One approach is to trade RAM for the elimination of the Python loops: blow the original array up so that the filter can be applied to all sub-arrays at once. This is in the spirit of NumPy broadcasting.

For a 1000x1000 array, the vectorized function ran roughly 100 times faster in my testing.

In my code I used NaN for masking, but with a few extra lines you could also make it work with numpy.ma arrays. My NumPy version did not have a nanmedian function, so I used nanmean instead; the performance should be comparable.

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    # test data
    N = 1000
    A = np.random.rand(N, N) * 10
    mask = np.random.choice([True, False], size=(N, N))

    def filter_loop(A, mask):
        kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], bool)
        A = A.copy()
        A[mask] = np.nan
        N = A.shape[0] - 2  # assuming a square matrix
        out = np.empty((N, N))
        for i in range(N):
            for j in range(N):
                out[i, j] = np.nanmean(A[i:i+3, j:j+3][kernel])
        return out

    def filter_broadcast(A, mask):
        A = A.copy()
        A[mask] = np.nan
        N = A.shape[0] - 2
        # Overlapping 3x3 windows as a strided view, then a writable copy.
        B = as_strided(A, (N, N, 3, 3), A.strides + A.strides)
        B = B.copy().reshape((N, N, 3 * 3))
        B[:, :, 4] = np.nan  # drop the kernel center
        return np.nanmean(B, axis=2)
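Since the question actually asks for a median, here is the same blow-up trick with np.nanmedian (added in NumPy 1.9) swapped in for nanmean; a hedged sketch, written as a separate function rather than a change to the answer's code:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def filter_broadcast_median(A, mask):
    # Same as_strided blow-up as filter_broadcast above, but using
    # np.nanmedian (NumPy >= 1.9) to get the median the question asks for.
    A = np.asarray(A, float).copy()
    A[mask.astype(bool)] = np.nan            # map mask -> NaN
    N = A.shape[0] - 2                       # assuming a square array
    B = as_strided(A, (N, N, 3, 3), A.strides + A.strides)
    B = B.copy().reshape((N, N, 9))          # copy: the strided windows overlap
    B[:, :, 4] = np.nan                      # drop the kernel center
    return np.nanmedian(B, axis=2)
```

On the question's MyArray and MyMask this reproduces the interior of the expected matrix above (the output is the (N-2)x(N-2) interior, without the zero border).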
