An idealized numpy way of working with a sliding window like this is to build a 4D array
C.shape = (N,M,3,3)
where
C[i,j,:,:] = np.array([[a[i-1, j-1], a[i-1, j], a[i-1, j+1]],
                       [a[i,   j-1], a[i,   j], a[i,   j+1]],
                       [a[i+1, j-1], a[i+1, j], a[i+1, j+1]]])
and write your function so that it does some kind of reduction over the last two dimensions. sum or mean would be typical, for example
B = C.sum(axis=(2,3))
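As a rough illustration of the definition above, here is a minimal sketch that fills C cell by cell and then reduces it. The zero padding of the border (via np.pad) and the small test array are my assumptions; the definition itself does not say what to do at the edges.

```python
import numpy as np

N, M = 4, 5
a = np.arange(N * M, dtype=float).reshape(N, M)

# Assumption: zero-pad by one cell so every a[i, j] has a full 3x3 neighbourhood.
ap = np.pad(a, 1, mode="constant")

# Build C straight from the definition: C[i, j] is the 3x3 block centred on a[i, j].
C = np.empty((N, M, 3, 3), dtype=a.dtype)
for i in range(N):
    for j in range(M):
        C[i, j] = ap[i:i + 3, j:j + 3]

# Reduce over the two window axes to get one value per cell of a.
B = C.sum(axis=(2, 3))
```

The double loop is slow for large arrays; it is only there to make the layout of C explicit.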
Other SO questions show how to use np.lib.stride_tricks.as_strided to build such an array. But with only a 3x3 sub-matrix, it may be just as quick to do something like
C = np.zeros((N,M,3,3))
C[1:,1:,0,0] = a[:-1,:-1]   # C[i,j,0,0] = a[i-1,j-1]; border cells stay 0
etc.
(or use hstack and vstack for the same effect).
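The "etc." can be written as a loop over the nine window offsets. This is only a sketch: it assumes a zero-padded copy of a (the name ap is mine), which gives the same result as leaving the border cells of C at zero.

```python
import numpy as np

N, M = 4, 5
a = np.arange(N * M, dtype=float).reshape(N, M)

# Assumption: zero padding, so ap has shape (N + 2, M + 2).
ap = np.pad(a, 1, mode="constant")

C = np.zeros((N, M, 3, 3), dtype=a.dtype)
for di in range(3):
    for dj in range(3):
        # Each assignment copies one shifted (N, M) slice of the padded array.
        C[:, :, di, dj] = ap[di:di + N, dj:dj + M]

B = C.sum(axis=(2, 3))
```

That is nine vectorized copies instead of N*M small ones, at the cost of storing nine times the data of a.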
But the nice thing (or maybe not so nice) about the strided approach is that it does not involve copying any of a's data: it is just a view.
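For what it's worth, the no-copy property of the as_strided construction can be checked with np.shares_memory. The sketch below again assumes a zero-padded array ap (that one pad is a copy, but the 4D window array built on top of it is not). I pass writeable=False because the windows overlap, so writing through the view would touch several windows at once.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

N, M = 4, 5
a = np.arange(N * M, dtype=float).reshape(N, M)
ap = np.pad(a, 1, mode="constant")  # assumed zero padding, shape (N + 2, M + 2)

# Reuse the row and column strides for the window axes,
# so that C[i, j, k, l] refers to ap[i + k, j + l].
s0, s1 = ap.strides
C = as_strided(ap, shape=(N, M, 3, 3), strides=(s0, s1, s0, s1), writeable=False)

shares = np.shares_memory(C, ap)  # the 4D array reuses ap's buffer
B = C.sum(axis=(2, 3))
```

In NumPy 1.20 and later, np.lib.stride_tricks.sliding_window_view(ap, (3, 3)) should give the same array with less room for stride mistakes.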
As for splitting the work into pieces, I can imagine using slices of C (on the first two dimensions), for example
C[0:100,0:100,:,:].sum(axis=(2,3))
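A sketch of how that block-wise processing might look, assuming C is the strided view over a zero-padded array. The block size of 100 and the array sizes are arbitrary choices for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

N, M = 250, 130
rng = np.random.default_rng(0)
a = rng.random((N, M))
ap = np.pad(a, 1, mode="constant")  # assumed zero padding

s0, s1 = ap.strides
C = as_strided(ap, shape=(N, M, 3, 3), strides=(s0, s1, s0, s1), writeable=False)

block = 100  # hypothetical block size
B = np.empty((N, M))
for i0 in range(0, N, block):
    for j0 in range(0, M, block):
        # Slices past the end are clipped, so the ragged last blocks just work.
        B[i0:i0 + block, j0:j0 + block] = (
            C[i0:i0 + block, j0:j0 + block].sum(axis=(2, 3))
        )
```

Each block only reads the part of ap it needs, so the blocks could in principle be handed to separate workers.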