Exponential decay curve fitting in numpy and scipy

I am having trouble fitting a curve to some data, but I can't work out where I am going wrong.

In the past I have done this with numpy.linalg.lstsq for exponential functions and scipy.optimize.curve_fit for sigmoid functions. This time I wanted to create a script that would let me specify various functions, determine their parameters, and test how well they fit the data. While doing this I noticed that scipy leastsq and numpy lstsq seem to give different answers for the same set of data and the same function. The function is simply y = e^(l*x) and is constrained so that y = 1 at x = 0.

The Excel trend line agrees with the numpy lstsq result, but since scipy leastsq can take any function, it would be good to work out what the problem is.

    import scipy.optimize as optimize
    import numpy as np
    import matplotlib.pyplot as plt

    ## Sampled data
    x = np.array([0, 14, 37, 975, 2013, 2095, 2147])
    y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763,
                  0.003630962, 0.001485394, 0.000495131])

    # function to fit
    fp = lambda p, x: np.exp(p * x)

    # error function
    e = lambda p, x, y: fp(p, x) - y

    # using scipy least squares
    l1, s = optimize.leastsq(e, -0.004, args=(x, y))
    print(l1)  # [-0.0132281]

    # using numpy least squares
    l2 = np.linalg.lstsq(np.vstack([x, np.zeros(len(x))]).T, np.log(y))[0][0]
    print(l2)  # -0.00313461628963 (same answer as Excel trend line)

    # smooth x for plotting
    x_ = np.arange(0, x[-1], 0.2)
    plt.figure()
    plt.plot(x, y, 'rx', x_, fp(l1, x_), 'b-', x_, fp(l2, x_), 'g-')
    plt.show()

Edit - Additional Information

The MWE above contains a small sample of the data set. When fitting the actual data, the scipy.optimize.curve_fit curve gives an R² of 0.82, while the numpy.linalg.lstsq curve, which matches the one calculated by Excel, gives an R² of 0.41.
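
For reference, here is a minimal sketch of how those R² values can be computed; the r_squared helper and the exact curve_fit call are my own illustration, not the original script (rcond=None assumes a recent NumPy):

    import numpy as np
    import scipy.optimize as optimize

    x = np.array([0., 14., 37., 975., 2013., 2095., 2147.])
    y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763,
                  0.003630962, 0.001485394, 0.000495131])

    def r_squared(y, y_pred):
        # Coefficient of determination: 1 - SS_res / SS_tot
        ss_res = np.sum((y - y_pred) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        return 1 - ss_res / ss_tot

    # Non-linear fit in y-space
    (l1,), _ = optimize.curve_fit(lambda x, p: np.exp(p * x), x, y, p0=[-0.004])
    # Linear fit in log-space (the lstsq / Excel approach)
    l2 = np.linalg.lstsq(x[:, None], np.log(y), rcond=None)[0][0]

    print(r_squared(y, np.exp(l1 * x)))  # y-space fit scores higher here
    print(r_squared(y, np.exp(l2 * x)))  # log-space fit scores lower here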

2 answers

You are minimizing different error functions.

When you use numpy.linalg.lstsq, the error function being minimized is

    np.sum((np.log(y) - p * x)**2)

while scipy.optimize.leastsq minimizes the function

    np.sum((y - np.exp(p * x))**2)

The first case requires a linear dependency between the dependent and independent variables, but the solution is known analytically; the second can handle any dependency, but relies on an iterative method.
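
A quick way to see this concretely (my own sketch, not part of the original answer): minimize each error function directly over the scalar p and compare with the two results from the question:

    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([0., 14., 37., 975., 2013., 2095., 2147.])
    y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763,
                  0.003630962, 0.001485394, 0.000495131])

    # Error function behind lstsq on log-transformed data
    log_space = lambda p: np.sum((np.log(y) - p * x) ** 2)
    # Error function behind leastsq on the raw data
    y_space = lambda p: np.sum((y - np.exp(p * x)) ** 2)

    print(minimize_scalar(log_space, bounds=(-1, 0), method='bounded').x)  # ~ -0.00313
    print(minimize_scalar(y_space, bounds=(-1, 0), method='bounded').x)   # ~ -0.0132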

On a separate note, I cannot check it right now, but when using numpy.linalg.lstsq you should not need the vstack row of zeros; the following also works:

    l2 = np.linalg.lstsq(x[:, None], np.log(y))[0][0]
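
I can't test it in the original environment either, but a quick check (my own sketch) suggests the two forms agree, since a column of zeros contributes nothing to the least-squares solution:

    import numpy as np

    x = np.array([0., 14., 37., 975., 2013., 2095., 2147.])
    y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763,
                  0.003630962, 0.001485394, 0.000495131])

    # With a zero column the design matrix is rank-deficient; lstsq returns
    # the minimum-norm solution, whose first coefficient matches the
    # single-column fit.
    a = np.linalg.lstsq(np.vstack([x, np.zeros(len(x))]).T, np.log(y), rcond=None)[0][0]
    b = np.linalg.lstsq(x[:, None], np.log(y), rcond=None)[0][0]
    print(np.isclose(a, b))  # True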

To expand a little on Jaime's point, any non-linear transformation of the data will lead to a different error function and hence to different solutions. These will also lead to different confidence intervals for the fitted parameters. So you have three possible criteria to use to make a decision: which error you want to minimize, which parameters you want more confidence in, and finally, if you are using the fit to predict some value, which method yields less error in that predicted value. Playing around a bit analytically and in Excel suggests that different kinds of noise in the data (e.g. if the noise function scales the amplitude, affects the time constant, or is additive) lead to different choices of solution.
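
As a rough illustration of that last point (my own sketch, with made-up noise levels, not part of the original answer), simulating multiplicative versus additive noise on the same decay shows how differently the two methods respond:

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    x = np.linspace(0, 2000, 50)
    true_l = -0.003
    clean = np.exp(true_l * x)

    def fit_log_space(x, y):
        # Linear least squares on log(y), as lstsq / Excel do
        return np.linalg.lstsq(x[:, None], np.log(y), rcond=None)[0][0]

    def fit_y_space(x, y):
        # Non-linear least squares on the raw data
        (l,), _ = curve_fit(lambda x, p: np.exp(p * x), x, y, p0=[true_l])
        return l

    # Multiplicative noise becomes additive after log(), so the
    # log-space fit handles it well
    y_mult = clean * np.exp(rng.normal(0, 0.1, x.size))
    # Additive noise swamps the small tail values; clip to keep log() defined
    y_add = np.clip(clean + rng.normal(0, 0.01, x.size), 1e-6, None)

    for label, y in [("multiplicative", y_mult), ("additive", y_add)]:
        print(label, fit_log_space(x, y), fit_y_space(x, y))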

I will also add that while this trick "works" for exponential decay to 0, it cannot be used in the more general (and common) case of damped exponentials (rising or falling) to values that cannot be assumed to be 0.
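
In that general case, fitting the full model non-linearly is the natural route; a minimal sketch, assuming a model y = a*exp(l*x) + c with an unknown offset c (synthetic data, my own illustration):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, l, c):
        # Decays from a + c at x = 0 toward the asymptote c
        return a * np.exp(l * x) + c

    rng = np.random.default_rng(1)
    x = np.linspace(0, 2000, 40)
    y = model(x, 1.0, -0.003, 0.2) + rng.normal(0, 0.01, x.size)

    popt, pcov = curve_fit(model, x, y, p0=[1.0, -0.004, 0.0])
    print(popt)  # roughly [1.0, -0.003, 0.2]

    # The log-linearization trick fails here: np.log(y) is no longer
    # linear in x unless the true offset c is known and subtracted first.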

