Is there a numpy autocorrelation function with a standardized output?

I followed the advice of defining an autocorrelation function in another post:

    def autocorr(x):
        result = np.correlate(x, x, mode='full')
        maxcorr = np.argmax(result)
        #print 'maximum = ', result[maxcorr]
        result = result / result[maxcorr]     # <=== normalization
        return result[result.size/2:]

However, the maximum value was not 1.0, so I added the line tagged "<=== normalization".

I tried the function on the Time Series Analysis dataset (Box-Jenkins), chapter 2. I expected to get the result shown in Fig. 2.7 of that book. However, I got the following:

(plot omitted: the unexpected autocorrelation output)

Does anyone have an explanation for this strange, unexpected autocorrelation behavior?

Addition (2012-09-07):

I got into Python programming and did the following:

    from ClimateUtilities import *
    import phys
    #
    # the above imports are from RTPierrehumbert's book "Principles of Planetary
    # Climate" and the homepage of that book at Cambridge University Press; they
    # mostly define the class "Curve()" used in the section below, which is not
    # necessary in order to solve my numpy problem ... :)
    #
    import numpy as np
    import scipy.spatial.distance

    # functions to be defined ... :
    #
    def autocorr(x):
        result = np.correlate(x, x, mode='full')
        maxcorr = np.argmax(result)
        #print 'maximum = ', result[maxcorr]
        result = result / result[maxcorr]
        return result[result.size/2:]

    ##
    # second try ... "Box and Jenkins" chapter 2.1, Autocorrelation Properties
    # of stationary models
    ##
    # from table 2.1 I get:
    s1 = np.array([47,64,23,71,38,64,55,41,59,48,71,35,57,40,58,44,
                   80,55,37,74,51,57,50,60,45,57,50,45,25,59,50,71,56,74,50,58,45,
                   54,36,54,48,55,45,57,50,62,44,64,43,52,38,59,
                   55,41,53,49,34,35,54,45,68,38,50,
                   60,39,59,40,57,54,23], dtype=float)
    # alternatively, in order to test:
    s2 = np.array([47,64,23,71,38,64,55,41,59,48,71])

    ##########################################################
    # according to BJ, ch. 2
    ##########################################################
    print '*************************************************'
    global s1short, meanshort, stdShort, s1dev, s1shX, s1shXk
    s1short = s1
    #s1short = s2           # for testing take s2
    meanshort = s1short.mean()
    stdShort = s1short.std()
    s1dev = s1short - meanshort
    #print 's1short = \n', s1short, '\nmeanshort = ', meanshort, \
    #      '\ns1deviation = \n', s1dev, '\nstdShort = ', stdShort
    s1sh_len = s1short.size
    s1shX = np.arange(1, s1sh_len + 1)
    #print 'Len = ', s1sh_len, '\nx-value = ', s1shX

    ##########################################################
    # c0 to be computed ...
    ##########################################################
    sumY = 0
    kk = 1
    for ii in s1shX:
        #print 'ii-1 = ', ii-1,
        if ii > s1sh_len:
            break
        sumY += s1dev[ii-1]*s1dev[ii-1]
        #print 'sumY = ', sumY, 's1dev**2 = ', s1dev[ii-1]*s1dev[ii-1]
    c0 = sumY / s1sh_len
    print 'c0 = ', c0

    ##########################################################
    # now compute autocorrelation
    ##########################################################
    auCorr = []
    s1shXk = s1shX
    lenS1 = s1sh_len
    nn = 1      # factor by which lenS1 should be divided in order
                # to reduce computation length ... 1, 2, 3, 4
                # should not exceed 4
    #print 's1shX = ', s1shX
    for kk in s1shXk:
        sumY = 0
        for ii in s1shX:
            #print 'ii-1 = ', ii-1, ' kk = ', kk, 'kk+ii-1 = ', kk+ii-1
            if ii >= s1sh_len or ii + kk - 1 >= s1sh_len/nn:
                break
            sumY += s1dev[ii-1]*s1dev[ii+kk-1]
            #print sumY, s1dev[ii-1], '*', s1dev[ii+kk-1]
        auCorrElement = sumY / s1sh_len
        auCorrElement = auCorrElement / c0
        #print 'sum = ', sumY, ' element = ', auCorrElement
        auCorr.append(auCorrElement)
        #print '', auCorr
        #
        # manipulate s1shX
        #
        s1shX = s1shXk[:lenS1-kk]
        #print 's1shX = ', s1shX
    #print 'AutoCorr = \n', auCorr

    #########################################################
    #
    # first 15 of the above values are consistent with
    # Box-Jenkins "Time Series Analysis", p. 34, Table 2.2
    #
    #########################################################
    s1sh_sdt = s1dev.std()      # standard deviation of the short series
    #print '\ns1sh_std = ', s1sh_sdt
    print '#########################################'

    # "Curve()" is a class from RTP's ClimateUtilities.py
    c2 = Curve()
    s1shXfloat = np.ndarray(shape=(1, lenS1), dtype=float)
    s1shXfloat = s1shXk     # to make floating point from integer
                            # might not be necessary
    #print 'test plotting ... ', s1shXk, s1shXfloat
    c2.addCurve(s1shXfloat)
    c2.addCurve(auCorr, '', 'Autocorr')
    c2.PlotTitle = 'Autokorrelation'
    w2 = plot(c2)

    ##########################################################
    #
    # now try function "autocorr(arr)" and plot it
    #
    ##########################################################
    auCorr = autocorr(s1short)
    c3 = Curve()
    c3.addCurve(s1shXfloat)
    c3.addCurve(auCorr, '', 'Autocorr')
    c3.PlotTitle = 'Autocorr with "autocorr"'
    w3 = plot(c3)
    #
    # well, that should be it!
    #
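The whole double loop above computes the usual sample autocorrelation r_k = c_k / c_0 from Box-Jenkins; for reference, here is a compact NumPy sketch of the same estimate (the function name `acf` and the small test are mine, not from the book):

```python
import numpy as np

def acf(x, nlags=15):
    """Sample autocorrelation r_k = c_k / c_0, as in Box-Jenkins ch. 2."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()                  # deviations from the mean
    n = len(x)
    c0 = np.dot(dev, dev) / n           # lag-0 autocovariance
    # c_k uses only the n-k overlapping samples; divide by c0 to standardize
    return np.array([np.dot(dev[:n - k], dev[k:]) / n / c0
                     for k in range(nlags + 1)])

s2 = np.array([47, 64, 23, 71, 38, 64, 55, 41, 59, 48, 71], dtype=float)
r = acf(s2, nlags=3)
print(r)  # r[0] is 1.0 by construction
```

Because the lag-0 term is c_0 / c_0, the first value is exactly 1.0, which is the standardization the original `autocorr` was missing.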
python numpy statistics correlation autocorrelation
Sep 04
2 answers

So, the problem with your initial attempt is that you do not subtract the mean from your signal. The following code should work:

    import numpy as np

    iact = []
    timeseries = (your data here)
    mean = np.mean(timeseries)
    timeseries -= np.mean(timeseries)
    autocorr_f = np.correlate(timeseries, timeseries, mode='full')
    temp = autocorr_f[autocorr_f.size/2:]/autocorr_f[autocorr_f.size/2]
    iact.append(sum(autocorr_f[autocorr_f.size/2:]/autocorr_f[autocorr_f.size/2]))

In my example, temp is the variable you are interested in: it is the normalized autocorrelation function. If it is the integrated autocorrelation time you are after, look at iact.
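To make that concrete, here is a self-contained version of the same recipe on a synthetic signal (the random data and the variable `mid` are my additions; note the `//` floor division needed under Python 3):

```python
import numpy as np

rng = np.random.default_rng(0)
timeseries = rng.normal(size=1000)          # stand-in for "(your data here)"

timeseries = timeseries - np.mean(timeseries)   # subtract the mean first
autocorr_f = np.correlate(timeseries, timeseries, mode='full')
mid = autocorr_f.size // 2                  # index of the zero-lag value
temp = autocorr_f[mid:] / autocorr_f[mid]   # normalized autocorrelation
iact = np.sum(temp)                         # crude integrated autocorr. time
print(temp[0])                              # 1.0 at lag 0, by construction
```

Dividing by the zero-lag value `autocorr_f[mid]` is what pins the lag-0 autocorrelation to exactly 1.0.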

May 13 '13 at 6:07

I am not sure what the problem is.

The autocorrelation of a vector x at lag 0 should be 1, since it is just the squared L2 norm divided by itself, i.e. dot(x, x) / dot(x, x) == 1.

In the general case, for any lags i, j in Z with i != j, the autocorrelation with unit scaling is dot(shift(x, i), shift(x, j)) / dot(x, x), where shift(y, n) is a function that shifts the vector y by n time points, and Z is the set of integers, since we are talking about an implementation (in theory the lags could range over the real numbers).
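That formula is easy to check numerically; here is a sketch where the shift is realized by slicing, so that only the overlapping samples enter the dot product (the function name and data are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)

def autocorr_at_lag(x, k):
    # dot(shift(x, k), x) / dot(x, x), keeping only the overlapping samples
    return np.dot(x[:-k] if k else x, x[k:]) / np.dot(x, x)

print(autocorr_at_lag(x, 0))    # exactly 1.0 at lag 0
```

By the Cauchy-Schwarz inequality every other lag gives a value with magnitude at most 1, so lag 0 is always the maximum for a nonzero signal.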

I get 1.0 as the maximum with the following code (run from the command prompt as $ ipython --pylab ), as expected:

    In [1]: n = 1000
    In [2]: x = randn(n)
    In [3]: xc = correlate(x, x, mode='full')
    In [4]: xc /= xc[xc.argmax()]
    In [5]: xchalf = xc[xc.size / 2:]
    In [6]: xchalf_max = xchalf.max()
    In [7]: print xchalf_max
    1.0

The only time the lag-0 autocorrelation is not equal to 1 is when x is the zero signal (all zeros).

The answer to your question: no, there is no NumPy function that performs this standardization for you automatically.
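Rolling your own is only a few lines, though; a sketch (the name `standardized_autocorr` is hypothetical), with a guard for the all-zero signal mentioned above:

```python
import numpy as np

def standardized_autocorr(x):
    """Non-negative-lag autocorrelation scaled so the lag-0 value is 1."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode='full')
    peak = r[r.size // 2]           # zero-lag value, equal to dot(x, x)
    if peak == 0:                   # all-zero signal: nothing to normalize by
        return r[r.size // 2:]      # already all zeros
    return r[r.size // 2:] / peak

out = standardized_autocorr(np.array([1.0, 2.0, 3.0]))
print(out[0])  # 1.0
```

For [1, 2, 3] the full correlation is [3, 8, 14, 8, 3], so dividing by the central value 14 yields [1.0, 8/14, 3/14].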

Besides, even if such a function existed, you would still have to check it against the expected result, and if you could say "Yes, it performed the standardization correctly", then I would guess you already know how to implement it yourself.

I suspect you may not have implemented your algorithm correctly, although I cannot be sure, since I am not familiar with it.

Oct. 30 '12 at 19:35


