You have probably already found the source code of the `fftconvolve` function. For real inputs it normally uses `numpy.fft.rfftn` and `numpy.fft.irfftn`, which compute N-dimensional transforms. Since the goal here is to perform many one-dimensional transforms instead, `fftconvolve` can basically be rewritten as follows (simplified):
```python
import numpy as np
from scipy.signal.signaltools import _next_regular

def fftconvolve_1d(in1, in2):
    outlen = in1.shape[-1] + in2.shape[-1] - 1
    n = _next_regular(outlen)          # pad to a fast ("regular") FFT size
    tr1 = np.fft.rfft(in1, n)
    tr2 = np.fft.rfft(in2, n)
    out = np.fft.irfft(tr1 * tr2, n)
    return out[..., :outlen].copy()    # trim the padding
```
Then compute the desired result:
```python
result = fftconvolve_1d(data, Gauss)
```
This works because `numpy.fft.rfft` and `numpy.fft.irfft` (note the absence of `n` in the names) transform a single axis of the input array (the last axis by default). On my system this is about 40% faster than the OP's code.
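A quick illustration of that per-row behaviour (a small demo added here, not part of the original answer):

```python
import numpy as np

a = np.random.rand(3, 8)       # three signals of length 8
spec = np.fft.rfft(a, n=16)    # one 1-D transform per row
print(spec.shape)              # (3, 9): 16 // 2 + 1 real-FFT bins per signal
```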
Further speedup can be achieved by using a different FFT backend.
First, the functions in `scipy.fftpack` turn out to be somewhat faster than their Numpy equivalents. However, the output format of the Scipy variants is rather inconvenient (see the docs): the real and imaginary parts are packed into a single real array, which makes multiplying the spectra awkward.
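To see the format difference side by side (a small demo; the interleaved layout is documented for `scipy.fftpack.rfft`):

```python
import numpy as np
from scipy import fftpack

x = np.random.rand(8)
print(np.fft.rfft(x))   # 5 complex coefficients
print(fftpack.rfft(x))  # 8 real values, Re/Im parts interleaved in one real array
```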
Another possible backend is FFTW, via the pyFFTW wrapper. The drawbacks are that the transforms are preceded by a slow "planning" phase and that the inputs must be 16-byte aligned to achieve the best performance. Both points are explained well in the pyFFTW tutorial. The resulting code could, for example, start like this:
```python
from scipy.signal.signaltools import _next_regular

import pyfftw
pyfftw.interfaces.cache.enable()  # cache the "planning" results between calls
```
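The rest of this code block did not survive, so here is a rough sketch (not the original answer's code) of how `fftconvolve_1d` could be built on `pyfftw.interfaces.numpy_fft`, which mirrors the `numpy.fft` API; `pyfftw.byte_align` (available in recent pyFFTW versions) copies an array into aligned memory when necessary, and `fftconvolve_1d_fftw` is a hypothetical name:

```python
import numpy as np
import pyfftw
from pyfftw.interfaces.numpy_fft import rfft, irfft
from scipy.signal.signaltools import _next_regular

pyfftw.interfaces.cache.enable()  # keep FFTW plans between calls

def fftconvolve_1d_fftw(in1, in2):
    # Hypothetical sketch: same structure as fftconvolve_1d above,
    # with numpy.fft swapped for the FFTW-backed interface.
    outlen = in1.shape[-1] + in2.shape[-1] - 1
    n = _next_regular(outlen)
    tr1 = rfft(pyfftw.byte_align(in1), n)
    tr2 = rfft(pyfftw.byte_align(in2), n)
    out = irfft(tr1 * tr2, n)
    return out[..., :outlen].copy()
```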
With aligned inputs and cached "planning" I saw an almost 3-fold speedup compared to the code in the OP. Memory alignment is easy to check by looking at the memory address in the `ctypes.data` attribute of the Numpy array.
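For example, a quick 16-byte alignment check (pure Numpy, no pyFFTW required):

```python
import numpy as np

a = np.zeros(1000)
print(a.ctypes.data % 16 == 0)  # True if the buffer starts on a 16-byte boundary
```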