Speed up convolution loop for numpy 3D array?

I am convolving along the z-axis of a 3d numpy array and then performing other operations on the result, but it is slow as currently implemented. Is the for loop slowing me down, or is it the convolution itself? I tried reshaping into a 1d vector and doing the convolution in one pass (as I would in Matlab), without a for loop, but that did not improve performance. My Matlab version is about 50% faster than anything I have come up with in Python. Relevant code section:

    convolved = np.zeros((y_lines, x_lines, z_depth))
    for i in range(0, y_lines):
        for j in range(0, x_lines):
            convolved[i, j, :] = fftconvolve(data[i, j, :], Gauss)      # 80% of time here
            result[i, j, :] = other_calculations(convolved[i, j, :])    # 20% of time here

Is there a better way to do this than a for loop? I have heard of Cython, but my Python experience is limited at the moment, so I would prefer the simplest solution.

+7
python arrays numpy for-loop convolution
2 answers

The fftconvolve function you are using is presumably from SciPy. If so, keep in mind that it accepts N-dimensional arrays. So a faster way to do your convolution is to build a 3d kernel that does nothing along the x and y dimensions and performs the 1d Gaussian convolution along z.

Below are some code and timing results. On my machine, with some toy data, this gave a 10× speedup, as you can see:

    import numpy as np
    from scipy.signal import fftconvolve
    from scipy.ndimage.filters import gaussian_filter

    # use scipy filtering functions designed to apply kernels to isolate a 1d gaussian kernel
    kernel_base = np.ones(shape=(5))
    kernel_1d = gaussian_filter(kernel_base, sigma=1, mode='constant')
    kernel_1d = kernel_1d / np.sum(kernel_1d)

    # make the 3d kernel that does gaussian convolution in z axis only
    kernel_3d = np.zeros(shape=(1, 1, 5,))
    kernel_3d[0, 0, :] = kernel_1d

    # generate random data
    data = np.random.random(size=(50, 50, 50))

    # define a function for loop based convolution for easy timeit invocation
    def convolve_with_loops(data):
        nx, ny, nz = data.shape
        convolved = np.zeros((nx, ny, nz))
        for i in range(0, nx):
            for j in range(0, ny):
                convolved[i, j, :] = fftconvolve(data[i, j, :], kernel_1d, mode='same')
        return convolved

    # compute the convolution two different ways: with loops (first) or as a 3d convolution (second)
    convolved = convolve_with_loops(data)
    convolved_2 = fftconvolve(data, kernel_3d, mode='same')

    # raise an error unless the two computations return equivalent results
    assert np.all(np.isclose(convolved, convolved_2))

    # time the two routes of the computation
    %timeit convolved = convolve_with_loops(data)
    %timeit convolved_2 = fftconvolve(data, kernel_3d, mode='same')

timeit results:

    10 loops, best of 3: 198 ms per loop
    100 loops, best of 3: 18.1 ms per loop
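Note that %timeit is an IPython magic; in a plain script you can get equivalent numbers with the standard timeit module. A minimal sketch, assuming the data, kernel_3d and convolve_with_loops defined in the listing above:

    import timeit

    # each call is relatively expensive, so keep the repetition count small
    t_loop = timeit.timeit(lambda: convolve_with_loops(data), number=10)
    t_3d = timeit.timeit(lambda: fftconvolve(data, kernel_3d, mode='same'), number=10)
    print('loop-based : %.1f ms per call' % (1000 * t_loop / 10))
    print('3d kernel  : %.1f ms per call' % (1000 * t_3d / 10))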
+4

I think you have already looked at the source code of the fftconvolve function. Normally, for real inputs, it uses the numpy.fft.rfftn and .irfftn functions, which compute N-dimensional transforms. Since the goal here is to perform many one-dimensional transforms, you can basically rewrite fftconvolve as follows (simplified):

    import numpy as np
    from scipy.signal.signaltools import _next_regular

    def fftconvolve_1d(in1, in2):
        # length of a full linear convolution along the last axis
        outlen = in1.shape[-1] + in2.shape[-1] - 1
        # pad to a 5-smooth size for a fast FFT
        n = _next_regular(outlen)
        tr1 = np.fft.rfft(in1, n)
        tr2 = np.fft.rfft(in2, n)
        out = np.fft.irfft(tr1 * tr2, n)
        return out[..., :outlen].copy()

And calculate the desired result:

 result = fftconvolve_1d(data, Gauss) 

This works because numpy.fft.rfft and .irfft (note the absence of the n in the name) transform along a single axis of the input array (the last axis by default). This is about 40% faster than the OP's code on my system.
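To make the axis behaviour concrete, here is a small check (a sketch only; it assumes the fftconvolve_1d defined above, a toy data cube, and a hand-rolled Gaussian as a stand-in for the OP's Gauss kernel) that a single call on the 3d array matches scipy's fftconvolve applied line by line along z:

    import numpy as np
    from scipy.signal import fftconvolve

    data = np.random.random(size=(20, 20, 50))
    Gauss = np.exp(-0.5 * (np.arange(-3, 4) / 1.0) ** 2)   # stand-in kernel
    Gauss /= Gauss.sum()

    # one call convolves every z-line at once, because rfft/irfft act on the last axis
    full = fftconvolve_1d(data, Gauss)

    # reference: explicit loop over x and y ('full' mode, length nz + len(Gauss) - 1)
    ref = np.empty(data.shape[:2] + (data.shape[2] + Gauss.size - 1,))
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            ref[i, j, :] = fftconvolve(data[i, j, :], Gauss)

    assert np.allclose(full, ref)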


Further speedup can be achieved by using a different FFT backend.

First, the functions in scipy.fftpack turn out to be somewhat faster than their NumPy equivalents. However, the output format of the SciPy variants is rather awkward (see the docs), which makes the multiplication step cumbersome.
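A quick way to see the problem: for real input of length n, numpy.fft.rfft returns n//2 + 1 complex coefficients, whereas scipy.fftpack.rfft packs real and imaginary parts into a single real array of length n, so the two spectra cannot simply be multiplied element-wise as in the code above. A minimal illustration:

    import numpy as np
    from scipy import fftpack

    x = np.arange(8.0)

    print(np.fft.rfft(x))     # 5 complex coefficients
    print(fftpack.rfft(x))    # 8 real numbers in scipy's packed format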

Another possible backend is FFTW, via the pyFFTW wrapper. The downside is that the transforms are preceded by a slow "planning" phase, and the inputs must be 16-byte aligned to reach the best performance. This is explained very well in the pyFFTW tutorial. The resulting code could be, for example:

    import numpy as np
    from scipy.signal.signaltools import _next_regular
    import pyfftw

    pyfftw.interfaces.cache.enable()    # cache the "planning"
    pyfftw.interfaces.cache.set_keepalive_time(1.0)

    def fftwconvolve_1d(in1, in2):
        outlen = in1.shape[-1] + in2.shape[-1] - 1
        n = _next_regular(outlen)
        tr1 = pyfftw.interfaces.numpy_fft.rfft(in1, n)
        tr2 = pyfftw.interfaces.numpy_fft.rfft(in2, n)
        # multiply into a 16-byte-aligned buffer so the inverse transform gets aligned input
        sh = np.broadcast(tr1, tr2).shape
        dt = np.common_type(tr1, tr2)
        pr = pyfftw.n_byte_align_empty(sh, 16, dt)
        np.multiply(tr1, tr2, out=pr)
        out = pyfftw.interfaces.numpy_fft.irfft(pr, n)
        return out[..., :outlen].copy()

With aligned inputs and cached "planning" I saw an almost 3× speedup over the code in the OP. Memory alignment is easy to check by looking at the memory address in the ctypes.data attribute of the array.
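For example (a minimal sketch; pyfftw.n_byte_align_empty is the same helper used in fftwconvolve_1d above):

    import numpy as np
    import pyfftw

    a = np.random.random(size=(50, 50, 50))

    # ctypes.data holds the address of the first element;
    # divisible by 16 means the buffer is 16-byte aligned
    print(a.ctypes.data % 16 == 0)

    # if it is not aligned, copy the data into an aligned buffer first
    aligned = pyfftw.n_byte_align_empty(a.shape, 16, a.dtype)
    aligned[:] = a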

+1
