EDIT: Given the template you are actually searching for, I would go with something like:
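    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    h, w = im.shape  # image height and width
    # win_img[i, j] is a view of the three horizontal neighbours im[i, j:j+3]
    win_img = as_strided(im, shape=(h, w - 3 + 1, 3),
                         strides=im.strides + im.strides[-1:])
    cond_1 = np.sum(win_img, axis=-1) == 1
    cond_2 = im == 0
    cond_3 = im == 1
    cond = cond_1[:-2, :] & cond_2[1:-1, 2:] & cond_3[2:, 2:]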
Now cond[i, j] holds the boolean result for the window centered at im[i+1, j+1], and cond is two elements smaller along each axis than your original image. You can get a boolean array covering the whole image with:

    cond_im = np.zeros_like(im, dtype=bool)
    cond_im[1:-1, 1:-1] = cond
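As a rough check of what the slices above select, here is a minimal sketch that runs the same code on a small made-up image and compares it with explicit loops. The array `im` and the name `expected` are assumptions introduced only for this illustration; they are not part of your data.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical 4x5 binary image, used only for illustration.
im = np.array([[0, 1, 0, 0, 1],
               [0, 0, 0, 1, 0],
               [1, 0, 1, 0, 1],
               [0, 0, 1, 1, 0]])
h, w = im.shape

# Row-wise windows: win_img[i, j] is a view of im[i, j:j+3].
win_img = as_strided(im, shape=(h, w - 3 + 1, 3),
                     strides=im.strides + im.strides[-1:])

cond_1 = np.sum(win_img, axis=-1) == 1
cond_2 = im == 0
cond_3 = im == 1
cond = cond_1[:-2, :] & cond_2[1:-1, 2:] & cond_3[2:, 2:]

# The same test written as explicit loops, to show which
# elements each slice picks for the window starting at (i, j).
expected = np.zeros((h - 2, w - 2), dtype=bool)
for i in range(h - 2):
    for j in range(w - 2):
        expected[i, j] = (im[i, j:j + 3].sum() == 1
                          and im[i + 1, j + 2] == 0
                          and im[i + 2, j + 2] == 1)

# Pad the result back to the shape of the image.
cond_im = np.zeros_like(im, dtype=bool)
cond_im[1:-1, 1:-1] = cond
```

If the elements tested by the loop version are not the ones your template needs, adjust the column slices of cond_2 and cond_3 accordingly.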
Take a windowed view of your array:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    h, w = im.shape  # image height and width
    win_img = as_strided(im, shape=(h - 3 + 1, w - 3 + 1, 3, 3),
                         strides=im.strides * 2)
Now win_img[i, j] is a (3, 3) array holding the contents of the 3x3 window of your image whose upper left corner is at (i, j).
If the pattern you are after is an array pattern of shape (3, 3), you can simply do:
    np.where(np.all(np.all(win_img == pattern, axis=-1), axis=-1))
to get a tuple of two arrays, with the rows and columns of the upper left corners of the windows where your template matches.
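Putting the pieces together, a minimal end-to-end sketch might look like the following. The image and the cross-shaped pattern are made up for illustration and are not taken from your question.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical image containing two copies of the pattern.
im = np.array([[0, 1, 0, 0, 0],
               [1, 1, 1, 0, 0],
               [0, 1, 0, 1, 0],
               [0, 0, 1, 1, 1],
               [0, 0, 0, 1, 0]])
pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]])
h, w = im.shape

# win_img[i, j] is a view of im[i:i+3, j:j+3]; no data is copied here.
win_img = as_strided(im, shape=(h - 3 + 1, w - 3 + 1, 3, 3),
                     strides=im.strides * 2)

# The comparison broadcasts pattern over every window; a window matches
# when all 9 of its elements are equal to the pattern.
rows, cols = np.where(np.all(np.all(win_img == pattern, axis=-1), axis=-1))
print(rows, cols)  # upper left corners (0, 0) and (2, 2)
```

Note that as_strided does no bounds checking, so a wrong shape or strides argument can read outside the array. On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view(im, (3, 3)) should give the same view with the shape computed for you.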
The only catch is that win_img == pattern creates an array with 9 times as many elements as your image, which can be a problem if the image is very large. If you run into memory trouble, split the template check into several bands and loop over them. A loop over 10 bands will still be much faster than your current two nested loops over the full width and height of the image.
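One possible way to do the banded check is sketched below. The function name match_in_bands and its n_bands parameter are hypothetical, and the sketch assumes a C-contiguous 2D image that is at least as large as the pattern.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def match_in_bands(im, pattern, n_bands=10):
    """Return (rows, cols) of the upper left corners where pattern matches,
    comparing only one horizontal band of windows at a time."""
    h, w = im.shape
    ph, pw = pattern.shape
    # View only; the large temporary is created by the == below.
    win_img = as_strided(im, shape=(h - ph + 1, w - pw + 1, ph, pw),
                         strides=im.strides * 2)
    n_rows = win_img.shape[0]
    band = max(1, -(-n_rows // n_bands))  # ceiling division
    rows, cols = [], []
    for start in range(0, n_rows, band):
        # The temporary boolean array covers just this band of window rows.
        hit = np.all(win_img[start:start + band] == pattern, axis=(-2, -1))
        r, c = np.where(hit)
        rows.append(r + start)  # shift back to whole-image row indices
        cols.append(c)
    return np.concatenate(rows), np.concatenate(cols)

# Usage on random data; the pattern is cut from the image so that
# at least one match, at (5, 7), is guaranteed.
rng = np.random.default_rng(0)
im = rng.integers(0, 2, size=(40, 30))
pattern = im[5:8, 7:10].copy()
rows, cols = match_in_bands(im, pattern, n_bands=10)
```

With n_bands=10 the temporary comparison array is roughly a tenth of the size it would be for the whole image, at the cost of ten short Python-level iterations.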