EDIT: Improved speed to 220 μs - see the edit at the end - a version using Numpy FFT directly.
The required calculation can be evaluated easily by means of the autocorrelation function, or similarly by convolution. The Wiener-Khinchin theorem allows the autocorrelation to be computed with two fast Fourier transforms (FFT), with time complexity O(n log n). I use the accelerated convolution function fftconvolve from Scipy. An advantage is that it is easy to explain why it works. Everything is vectorized, with no loop at the level of the Python interpreter.
    import numpy as np
    from scipy.signal import fftconvolve

    def difference_by_convol(x, W, tau_max):
        x = np.array(x, np.float64)
        w = x.size
        x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
        conv = fftconvolve(x, x[::-1])
        df = x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:]
        return df[:tau_max + 1]
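A quick way to try it (the signal here is made-up random data, not the input used for the timings below):

    import numpy as np

    x = np.random.randn(4096)          # hypothetical test signal
    W = x.size                         # kept only for signature compatibility
    tau_max = 500

    df = difference_by_convol(x, W, tau_max)
    print(df.shape)                    # (501,); df[0] is 0 up to rounding error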
- Compared to differenceFunction_1loop in Elliot's answer: it is faster with FFT: 430 μs compared to the original 1170 μs. It starts to be faster from about tau_max >= 40. The numerical accuracy is great; the highest relative error is less than 1E-14 compared to the exact integer result. (Therefore it could easily be rounded to the exact long integer solution.)
- The tau_max parameter is not important for the algorithm itself; it only restricts the output at the end. A zero element at index 0 is included in the output because indexes should start from 0 in Python.
- The W parameter is not important in Python; the size is better determined by introspection of the data.
- The data are converted to np.float64 at the beginning to prevent repeated conversions. It is half a percent faster. Any type smaller than np.int64 would be unacceptable because of overflow.
- The required difference function is double energy minus the autocorrelation function, and the latter can be evaluated by convolution: correlate(x, x) = convolve(x, reversed(x)). (A quick numerical check of this identity follows after this list.)
- "As of Scipy v0.19, the normal convolve automatically chooses this method or the direct method based on an estimation of which is faster." That heuristic is not adequate here, because the convolution evaluates many more values of tau than tau_max, and it must be outweighed by the FFT being much faster than the direct method.
- It can also be calculated by Numpy's fft module without Scipy, by rewriting the answer "Calculate autocorrelation using FFT in Matlab" to Python (below at the end). I think the solution above is easier to understand.
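For example, a minimal numerical check of the correlate/convolve identity above (with made-up random data, only for illustration):

    import numpy as np

    x = np.random.randn(100)
    # Full autocorrelation equals convolution with the time-reversed signal.
    assert np.allclose(np.correlate(x, x, mode='full'), np.convolve(x, x[::-1]))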
Proof: (for Pythonistas :-)
The initial naive implementation can be written as:
df = [sum((x[j] - x[j + t]) ** 2 for j in range(w - t)) for t in range(tau_max + 1)]
where tau_max < w.
Expand it according to the rule (a - b)**2 == a**2 + b**2 - 2 * a * b:
    df = [sum(x[j] ** 2 for j in range(w - t))
          + sum(x[j] ** 2 for j in range(t, w))
          - 2 * sum(x[j] * x[j + t] for j in range(w - t))
          for t in range(tau_max + 1)]
Substitute the first two terms by means of x_cumsum = [sum(x[j] ** 2 for j in range(i)) for i in range(w + 1)], which can easily be calculated in linear time. Substitute sum(x[j] * x[j + t] for j in range(w - t)) by conv = fftconvolve(x, x[::-1], mode='full'), which has output of size len(x) + len(x) - 1.
    df = [x_cumsum[w - t] + x_cumsum[w] - x_cumsum[t] - 2 * conv[w - 1 + t]
          for t in range(tau_max + 1)]
Optimize it with vectorized expressions:
df = x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:]
Each step can also be verified and compared using test data numerically.
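For instance, such a check might look like this (the test data below are assumed, not taken from the original question):

    import numpy as np
    from scipy.signal import fftconvolve

    x = np.random.randint(-100, 100, size=200).astype(np.float64)
    w, tau_max = x.size, 50

    # Step 0: the naive definition.
    df0 = [sum((x[j] - x[j + t]) ** 2 for j in range(w - t))
           for t in range(tau_max + 1)]

    # Step 1: expanded squares, cumulative sums and one full convolution.
    x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
    conv = fftconvolve(x, x[::-1])
    df1 = [x_cumsum[w - t] + x_cumsum[w] - x_cumsum[t] - 2 * conv[w - 1 + t]
           for t in range(tau_max + 1)]

    # Step 2: the fully vectorized expression (same as difference_by_convol).
    df2 = (x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:])[:tau_max + 1]

    assert np.allclose(df0, df1) and np.allclose(df1, df2)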
EDIT: Implemented a solution directly with Numpy FFT.
    def difference_fft(x, W, tau_max):
        x = np.array(x, np.float64)
        w = x.size
        tau_max = min(tau_max, w)
        x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
        size = w + tau_max
        p2 = (size // 32).bit_length()
        nice_numbers = (16, 18, 20, 24, 25, 27, 30, 32)
        size_pad = min(x * 2 ** p2 for x in nice_numbers if x * 2 ** p2 >= size)
        fc = np.fft.rfft(x, size_pad)
        conv = np.fft.irfft(fc * fc.conjugate())[:tau_max]
        return x_cumsum[w:w - tau_max:-1] + x_cumsum[w] - x_cumsum[:tau_max] - 2 * conv
This is more than twice as fast as my previous solution, because the length of the convolution is restricted to the nearest "nice" number with small prime factors after W + tau_max, instead of always being evaluated at a full 2 * W. It is also no longer necessary to transform the same data twice, as it was with fftconvolve(x, x[::-1]).
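To illustrate that padding heuristic with an assumed example size (not the size from the benchmark): for size = W + tau_max the code picks the smallest number of the form nice * 2**p2 that is not smaller than size, so the FFT length keeps only small prime factors.

    # Assumed example: W = 4096, tau_max = 500, so size = 4596.
    size = 4596
    p2 = (size // 32).bit_length()            # 4596 // 32 = 143, which has 8 bits, so p2 = 8
    nice_numbers = (16, 18, 20, 24, 25, 27, 30, 32)
    size_pad = min(n * 2 ** p2 for n in nice_numbers if n * 2 ** p2 >= size)
    print(size_pad)                           # 4608 == 18 * 2**8 == 2**9 * 3**2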
    In [211]: %timeit differenceFunction_1loop(x, W, tau_max)
    1.1 ms ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [212]: %timeit difference_by_convol(x, W, tau_max)
    431 µs ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [213]: %timeit difference_fft(x, W, tau_max)
    218 µs ± 685 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The new solution is faster than Elliot's differenceFunction_1loop already for tau_max >= 20. That ratio does not depend much on the size of the data, because of the similar ratio of overhead costs.
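For completeness, one way to reproduce a comparable benchmark; the actual x, W and tau_max used for the timings above are not stated in this post, so the values below are only assumptions:

    import numpy as np

    # Hypothetical benchmark input; the post does not specify its test signal.
    x = np.random.randn(4096)
    W = x.size
    tau_max = 500

    # In IPython:
    #   %timeit difference_by_convol(x, W, tau_max)
    #   %timeit difference_fft(x, W, tau_max)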