Numpy: Replace each value in the array with the average value of its adjacent elements

I have an ndarray, and I want to replace each value in the array with the mean of its adjacent elements. The code below does the job, but it is very slow when I have 700 arrays of shape (7000, 7000), so I wonder if there is a better way to do this. Thanks!

    import numpy as np

    a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                  [4, 5, 6, 7, 8, 9, 10, 11, 12],
                  [3, 4, 5, 6, 7, 8, 9, 10, 11]])
    row, col = a.shape
    new_arr = np.ndarray(a.shape)
    for x in range(row):
        for y in range(col):
            min_x = max(0, x - 1)
            min_y = max(0, y - 1)
            # Mean over the 3x3 window, clipped at the array borders
            new_arr[x][y] = a[min_x:(x + 2), min_y:(y + 2)].mean()
    print(new_arr)
3 answers

Well, that's a smoothing operation from image processing, and it can be achieved with 2D convolution. Note that your code treats the elements near the borders differently (it averages over the smaller, clipped window there). So, if exact border values are not critical, you can use scipy's convolve2d, like so -

    from scipy.signal import convolve2d as conv2

    out = conv2(a, np.ones((3, 3)), 'same') / 9.0
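
If the border values do need to match the loop version exactly, one option (a sketch of my own, not part of the original answer) is to normalize each window sum by the actual number of in-bounds elements in the clipped window, rather than by a fixed 9:

    import numpy as np
    from scipy.signal import convolve2d as conv2

    kernel = np.ones((3, 3))
    # Sum over each 3x3 window (zero-padded beyond the borders)
    sums = conv2(a, kernel, 'same')
    # Number of in-bounds elements per window: 4 at corners, 6 at edges, 9 inside
    counts = conv2(np.ones(a.shape), kernel, 'same')
    out = sums / counts  # agrees with the nested-loop result everywhere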

This particular operation is built into the OpenCV module as cv2.blur and is very efficient there. The name describes its job: blurring input arrays that represent images. The efficiency most likely comes from the fact that it is implemented entirely in C/C++ under the hood, with a thin Python wrapper to handle NumPy arrays.

So the output can be computed with it like this:

    import cv2  # Import OpenCV module

    out = cv2.blur(a.astype(float), (3, 3))
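
Note that cv2.blur uses a reflecting border by default, so its values along the one-pixel border differ from the zero-padded convolution; away from the borders the two agree. A quick sanity check of my own (not from the original answer):

    import numpy as np
    import cv2
    from scipy.signal import convolve2d as conv2

    a = np.random.randint(0, 255, (100, 100)).astype(float)
    ref = conv2(a, np.ones((3, 3)), 'same') / 9.0
    out = cv2.blur(a, (3, 3))
    # Interiors agree; only the border differs due to the padding choice
    print(np.allclose(ref[1:-1, 1:-1], out[1:-1, 1:-1]))  # True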

Here are some quick timings on a decently large image/array -

    In [93]: a = np.random.randint(0, 255, (5000, 5000))  # Input array

    In [94]: %timeit conv2(a, np.ones((3, 3)), 'same') / 9.0
    1 loops, best of 3: 2.74 s per loop

    In [95]: %timeit cv2.blur(a.astype(float), (3, 3))
    1 loops, best of 3: 627 ms per loop

Following a discussion with @Divakar, here is a comparison of the different convolution methods available in scipy:

    import numpy as np
    from scipy import signal, ndimage

    def conv2(A, size):
        return signal.convolve2d(A, np.ones((size, size)), mode='same') / float(size**2)

    def fftconv(A, size):
        return signal.fftconvolve(A, np.ones((size, size)), mode='same') / float(size**2)

    def uniform(A, size):
        return ndimage.uniform_filter(A, size, mode='constant')

All 3 methods return exactly the same values. Note, however, that uniform_filter takes the parameter mode='constant', which sets the boundary condition of the filter, and constant padding with 0 is the same zero boundary that the other two methods implicitly enforce. For other use cases you can change the boundary condition accordingly.
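
A minimal check of my own (using the three definitions above) that the results really do agree:

    A = np.random.randn(100, 100)
    print(np.allclose(conv2(A, 3), fftconv(A, 3)))  # True
    print(np.allclose(conv2(A, 3), uniform(A, 3)))  # True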

Now some test matrices:

 A = np.random.randn(1000, 1000) 

And some timings:

    %timeit conv2(A, 3)      # 33.8 ms per loop
    %timeit fftconv(A, 3)    # 84.1 ms per loop
    %timeit uniform(A, 3)    # 17.1 ms per loop

    %timeit conv2(A, 5)      # 68.7 ms per loop
    %timeit fftconv(A, 5)    # 92.8 ms per loop
    %timeit uniform(A, 5)    # 17.1 ms per loop

    %timeit conv2(A, 10)     # 210 ms per loop
    %timeit fftconv(A, 10)   # 86 ms per loop
    %timeit uniform(A, 10)   # 16.4 ms per loop

    %timeit conv2(A, 30)     # 1.75 s per loop
    %timeit fftconv(A, 30)   # 102 ms per loop
    %timeit uniform(A, 30)   # 16.5 ms per loop

In short, uniform_filter is the fastest, because this convolution is separable: it decomposes into two one-dimensional convolutions, one per axis (similarly to gaussian_filter, which is also separable).

For other, non-separable kernels, the signal module (the one @Divakar's solution uses) is more likely to be faster.

The speed of both fftconvolve and uniform_filter stays roughly constant across kernel sizes, while convolve2d gets considerably slower as the kernel grows.
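
To illustrate the separability point, here is a small sketch of my own (not from the original answer): a 5x5 uniform filter gives the same result as two 1D uniform passes, one along each axis.

    import numpy as np
    from scipy import ndimage

    A = np.random.randn(1000, 1000)
    k = np.ones(5) / 5.0
    # Two 1D passes with zero-padded borders...
    sep = ndimage.convolve1d(ndimage.convolve1d(A, k, axis=0, mode='constant'),
                             k, axis=1, mode='constant')
    # ...match the full 2D uniform filter
    full = ndimage.uniform_filter(A, 5, mode='constant')
    print(np.allclose(sep, full))  # True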


I recently had a similar problem and had to find a different solution, since I could not use scipy.

    import numpy as np

    a = np.random.randint(100, size=(7000, 7000))  # Array of 7000 x 7000
    row, col = a.shape
    column_totals = a.sum(axis=0)  # Sum of each column, collected in a single 1D array
    new_array = np.zeros([row, col])  # Create a receiving array
    for i in range(row):
        # Each resulting row = the column totals minus the original row,
        # divided by the number of rows minus one.
        new_array[i] = (column_totals - a[i]) / (row - 1)
    print(new_array)
