Optimize the calculation of the "difference function"

My code calls the difference function many times while computing the YIN algorithm (a fundamental frequency extractor).

The difference function (equation 6 in the article) is defined as:

    d_t(tau) = sum_{j=1..W} (x_j - x_{j+tau})^2

And this is my implementation of the difference function:

    def differenceFunction(x, W, tau_max):
        df = [0] * tau_max
        for tau in range(1, tau_max):
            for j in range(0, W - tau):
                tmp = long(x[j] - x[j + tau])
                df[tau] += tmp * tmp
        return df

For example:

    import numpy as np

    x = np.random.randint(0, high=32000, size=2048, dtype='int16')
    W = 2048
    tau_max = 106
    differenceFunction(x, W, tau_max)

Is there a way to optimize this double-loop calculation (in pure Python, preferably with no libraries other than NumPy)?

EDIT: Changed code to avoid index error (j loop, @Elliot answer)

EDIT2: Changed code to use x[0] (j loop, comment by @hynekcer)

6 answers

EDIT: improved speed to 220 μs - see the edit at the end (direct NumPy FFT version)

The required calculation can be evaluated easily by the autocorrelation function or, equivalently, by convolution. The Wiener-Khinchin theorem allows computing the autocorrelation with two fast Fourier transforms (FFT), with time complexity O(n log n). I use the accelerated convolution function fftconvolve from Scipy. An advantage is that it is easy to explain why it works. Everything is vectorized, with no loop at the Python interpreter level.

    import numpy as np
    from scipy.signal import fftconvolve

    def difference_by_convol(x, W, tau_max):
        x = np.array(x, np.float64)
        w = x.size
        x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
        conv = fftconvolve(x, x[::-1])
        df = x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:]
        return df[:tau_max + 1]
  • Compared with differenceFunction_1loop in Elliot's answer: it is faster with FFT: 430 μs versus the original 1170 μs. It starts to be faster at about tau_max >= 40. The numerical accuracy is great; the highest relative error is less than 1E-14 compared with the exact integer result. (Therefore it could easily be rounded to the exact long integer solution.)
  • The tau_max parameter is not important for the algorithm; it only restricts the output at the end. A zero element at index 0 is added to the output because indices have to start from 0 in Python.
  • The W parameter is not important in Python; the size is better introspected from the data.
  • The data are converted to np.float64 first to prevent repeated conversions. This is half a percent faster. Any type smaller than np.int64 would be unacceptable because of overflow.
  • The required difference function is double the energy minus the autocorrelation function. That can be evaluated by convolution: correlate(x, x) = convolve(x, reversed(x)). (A small numerical check of this identity is sketched right after this list.)
  • "As of Scipy v0.19, the regular convolve automatically chooses this method or the direct method based on an estimation of which is faster." That heuristic is not adequate for this case, because the convolution evaluates many more tau values than tau_max, and that extra work must be outweighed by the FFT being much faster than the direct method.
  • It can also be calculated with Numpy's fft module, without Scipy, by rewriting the answer Calculate autocorrelation using FFT in Matlab to Python (below, at the end). I think the solution above can be easier to understand.
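For the correlation/convolution identity mentioned above, here is a minimal numerical check (not part of the answer's code; the test array y and its length are arbitrary):

    import numpy as np
    from scipy.signal import fftconvolve

    y = np.random.randn(64)

    # full autocorrelation equals convolution of the signal with its reverse
    corr = np.correlate(y, y, mode='full')
    conv = fftconvolve(y, y[::-1])

    print(np.allclose(corr, conv))  # True, up to floating-point rounding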

Proof: (for Pythonistas :-)

The initial naive implementation can be written as:

    df = [sum((x[j] - x[j + t]) ** 2 for j in range(w - t))
          for t in range(tau_max + 1)]

where tau_max < w.

Expand it according to the rule (a - b)**2 == a**2 + b**2 - 2 * a * b:

    df = [sum(x[j] ** 2 for j in range(w - t))
          + sum(x[j] ** 2 for j in range(t, w))
          - 2 * sum(x[j] * x[j + t] for j in range(w - t))
          for t in range(tau_max + 1)]

Replace the first two terms with x_cumsum = [sum(x[j] ** 2 for j in range(i)) for i in range(w + 1)], which can be calculated easily in linear time. Replace sum(x[j] * x[j + t] for j in range(w - t)) with conv = fftconvolve(x, x[::-1], mode='full'), which has size len(x) + len(x) - 1.

    df = [x_cumsum[w - t] + x_cumsum[w] - x_cumsum[t]
          - 2 * convolve(x, x[::-1])[w - 1 + t]
          for t in range(tau_max + 1)]

Optimize it as a vectorized expression:

 df = x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:] 

Each step can also be verified and compared using test data numerically.
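For example, the final expression can be checked against the naive definition like this (a small sketch, assuming difference_by_convol from above is defined; the test sizes here are arbitrary):

    import numpy as np

    w, tau_max = 256, 40
    x = np.random.randint(0, 32000, size=w).astype(np.int64)

    naive = [sum((x[j] - x[j + t]) ** 2 for j in range(w - t))
             for t in range(tau_max + 1)]
    fast = difference_by_convol(x, w, tau_max)

    # the FFT result matches the exact integer result up to tiny rounding error,
    # so it could even be rounded back to exact integers with np.rint
    print(np.allclose(naive, fast))  # True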


EDIT: Implemented solution directly using Numpy FFT.

    def difference_fft(x, W, tau_max):
        x = np.array(x, np.float64)
        w = x.size
        tau_max = min(tau_max, w)
        x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
        size = w + tau_max
        p2 = (size // 32).bit_length()
        nice_numbers = (16, 18, 20, 24, 25, 27, 30, 32)
        size_pad = min(n * 2 ** p2 for n in nice_numbers if n * 2 ** p2 >= size)
        fc = np.fft.rfft(x, size_pad)
        conv = np.fft.irfft(fc * fc.conjugate())[:tau_max]
        return x_cumsum[w:w - tau_max:-1] + x_cumsum[w] - x_cumsum[:tau_max] - 2 * conv

This is more than twice as fast as my previous solution because the convolution length is restricted to the nearest "nice" number with small prime factors above W + tau_max, rather than evaluated for the full 2 * W. It is also not necessary to transform the same data twice, as it was with fftconvolve(x, x[::-1]).
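For the question's sizes, the padding choice works out as follows (a worked example of the lines above, nothing new):

    W, tau_max = 2048, 106
    size = W + tau_max              # 2154
    p2 = (size // 32).bit_length()  # 67 -> 7, so the scale factor is 2**7 = 128
    # candidates: 16*128=2048, 18*128=2304, ...; the smallest one >= 2154 wins
    size_pad = 18 * 2 ** p2         # 2304, much shorter than the ~2*W = 4096 of the full convolution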

    In [211]: %timeit differenceFunction_1loop(x, W, tau_max)
    1.1 ms ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [212]: %timeit difference_by_convol(x, W, tau_max)
    431 µs ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [213]: %timeit difference_fft(x, W, tau_max)
    218 µs ± 685 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The new solution is faster than Elliot's differenceFunction_1loop for tau_max >= 20. That ratio does not depend much on the data size, because of a similar ratio of overhead costs.


First of all, you should consider the boundaries of the array. Your code, as originally written, would raise an IndexError. You can get a significant speedup by vectorizing the inner loop:

    import numpy as np

    # original version
    def differenceFunction_2loop(x, W, tau_max):
        df = np.zeros(tau_max, np.long)
        for tau in range(1, tau_max):
            for j in range(0, W - tau):  # -tau eliminates the IndexError
                tmp = np.long(x[j] - x[j + tau])
                df[tau] += np.square(tmp)
        return df

    # vectorized inner loop
    def differenceFunction_1loop(x, W, tau_max):
        df = np.zeros(tau_max, np.long)
        for tau in range(1, tau_max):
            tmp = (x[:-tau]) - (x[tau:]).astype(np.long)
            df[tau] = np.dot(tmp, tmp)
        return df

    x = np.random.randint(0, high=32000, size=2048, dtype='int16')
    W = 2048
    tau_max = 106

    twoloop = differenceFunction_2loop(x, W, tau_max)
    oneloop = differenceFunction_1loop(x, W, tau_max)

    # confirm that the result comes out the same.
    print(np.all(twoloop == oneloop))  # True

Now for some benchmarking. In IPython I get the following:

    In [103]: %timeit twoloop = differenceFunction_2loop(x, W, tau_max)
    1 loop, best of 3: 2.35 s per loop

    In [104]: %timeit oneloop = differenceFunction_1loop(x, W, tau_max)
    100 loops, best of 3: 8.23 ms per loop

So, about 300x acceleration.


Apart from optimizing the algorithm itself, you can also optimize the interpreter with numba.jit:

    import timeit
    import numpy as np
    from numba import jit

    def differenceFunction(x, W, tau_max):
        df = [0] * tau_max
        for tau in range(1, tau_max):
            for j in range(0, W - tau):
                tmp = int(x[j] - x[j + tau])
                df[tau] += tmp * tmp
        return df

    @jit
    def differenceFunction2(x, W, tau_max):
        df = np.zeros(tau_max)  # start from zeros so the accumulation below is correct
        for tau in range(1, tau_max):
            for j in range(0, W - tau):
                tmp = int(x[j] - x[j + tau])
                df[tau] += tmp * tmp
        return df

    x = np.random.randint(0, high=32000, size=2048, dtype='int16')
    W = 2048
    tau_max = 106
    differenceFunction(x, W, tau_max)

    print('old', timeit.timeit('differenceFunction(x, W, tau_max)',
                               'from __main__ import differenceFunction, x, W, tau_max',
                               number=20) / 20)
    print('new', timeit.timeit('differenceFunction2(x, W, tau_max)',
                               'from __main__ import differenceFunction2, x, W, tau_max',
                               number=20) / 20)

Result:

    old 0.18265145074453273
    new 0.016223197058214667

You can combine algorithm optimization and numba.jit for the best result.
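A minimal sketch of what that combination could look like, for example applying numba's @njit to Elliot's vectorized single-loop version (the function name below is made up, and it assumes numba can compile the slicing and reductions used here; treat it as an illustration, not a benchmarked implementation):

    import numpy as np
    from numba import njit

    @njit
    def differenceFunction_1loop_jit(x, W, tau_max):  # hypothetical name
        xf = x.astype(np.float64)          # promote once so the squares cannot overflow
        df = np.zeros(tau_max)
        for tau in range(1, tau_max):      # explicit outer loop, vectorized inner step
            tmp = xf[:W - tau] - xf[tau:W]
            df[tau] = np.sum(tmp * tmp)
        return df

    # usage with the question's data
    x = np.random.randint(0, high=32000, size=2048, dtype='int16')
    differenceFunction_1loop_jit(x, 2048, 106)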


Here is another approach using a list comprehension. It is roughly an order of magnitude faster than the original function, but it does not beat Elliot's answer. Just putting it out there anyway.

    import numpy as np
    import time

    # original version
    def differenceFunction_2loop(x, W, tau_max):
        df = np.zeros(tau_max, np.long)
        for tau in range(1, tau_max):
            for j in range(0, W - tau):  # -tau eliminates the IndexError
                tmp = np.long(x[j] - x[j + tau])
                df[tau] += np.square(tmp)
        return df

    # vectorized inner loop
    def differenceFunction_1loop(x, W, tau_max):
        df = np.zeros(tau_max, np.long)
        for tau in range(1, tau_max):
            tmp = (x[:-tau]) - (x[tau:]).astype(np.long)
            df[tau] = np.dot(tmp, tmp)
        return df

    # with list comprehension
    def differenceFunction_1loop_listcomp(x, W, tau_max):
        df = [sum(((x[:-tau]) - (x[tau:]).astype(np.long)) ** 2)
              for tau in range(1, tau_max)]
        return [0] + df[:]

    x = np.random.randint(0, high=32000, size=2048, dtype='int16')
    W = 2048
    tau_max = 106

    s = time.clock()
    twoloop = differenceFunction_2loop(x, W, tau_max)
    print(time.clock() - s)

    s = time.clock()
    oneloop = differenceFunction_1loop(x, W, tau_max)
    print(time.clock() - s)

    s = time.clock()
    listcomprehension = differenceFunction_1loop_listcomp(x, W, tau_max)
    print(time.clock() - s)

    # confirm that the result comes out the same.
    print(np.all(twoloop == listcomprehension))  # True

Timing results (approximate):

    differenceFunction_2loop()          = 0.47s
    differenceFunction_1loop()          = 0.003s
    differenceFunction_1loop_listcomp() = 0.033s

I do not know how to get rid of the nested loops in your problem, but for the arithmetic you can use NumPy functions, which are faster than manual operations.

    import numpy as np

    tmp = np.subtract(long(x[j]), x[j + tau])

I would do something like this:

    >>> x = np.random.randint(0, high=32000, size=2048, dtype='int16')
    >>> tau_max = 106
    >>> res = np.square((x[tau_max:] - x[:-tau_max]))

However, I am convinced that this is not the fastest way to do this.

