Extrapolating data with numpy / python

Say I have a simple data set. In dictionary form it might look like this:

{1:5, 2:10, 3:15, 4:20, 5:25}

(The keys are always in increasing order.) What I want to do is logically figure out what the next data point will be. In this example, that would be {6: 30}.

What would be the best way to do this?

+7
python numpy scipy
5 answers

After discussing this with you in the Python chat, I fitted your data to an exponential. This should give a reasonably good indication, since you are not looking for long-term extrapolation.

    import numpy as np
    from scipy.optimize import curve_fit
    import matplotlib.pyplot as plt

    def exponential_fit(x, a, b, c):
        return a*np.exp(-b*x) + c

    if __name__ == "__main__":
        x = np.array([0, 1, 2, 3, 4, 5])
        y = np.array([30, 50, 80, 160, 300, 580])

        # Fit the three parameters of the exponential model to the data.
        fitting_parameters, covariance = curve_fit(exponential_fit, x, y)
        a, b, c = fitting_parameters

        # Evaluate the fitted model one step past the last known x.
        next_x = 6
        next_y = exponential_fit(next_x, a, b, c)

        plt.plot(y)
        plt.plot(np.append(y, next_y), 'ro')
        plt.show()

The rightmost red dot shows the next "predicted" point.

+3

You can also use numpy's polyfit:

    import numpy as np

    data = np.array([[1, 5], [2, 10], [3, 15], [4, 20], [5, 25]])
    fit = np.polyfit(data[:, 0], data[:, 1], 1)  # The 1 signifies a linear fit.
    print(fit)
    # [ 5.00000000e+00 1.58882186e-15]   i.e. y = 5x + 0, up to rounding noise

    line = np.poly1d(fit)
    new_points = np.arange(5) + 6
    print(new_points)
    # [ 6  7  8  9 10]
    print(line(new_points))
    # [ 30. 35. 40. 45. 50.]

This lets you change the degree of the polynomial easily, since the polyfit function takes the arguments np.polyfit(x data, y data, degree). What is shown above is a linear fit; in general the returned array holds the coefficients of fit[0]*x^n + fit[1]*x^(n-1) + ... + fit[n]*x^0 for any degree n, so it has n+1 entries. The poly1d function turns this array into a function that returns the value of the polynomial for any given value of x.
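
To make the coefficient ordering concrete, here is a minimal sketch using the question's data. The degree of 2 and the variable names are arbitrary choices for illustration, not part of the answer. It checks that evaluating the coefficients by hand, highest power first, agrees with what poly1d returns.

```python
import numpy as np

# Data from the question: y = 5x exactly.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 10, 15, 20, 25], dtype=float)

degree = 2  # hypothetical choice, just to get a polynomial with three coefficients
coeffs = np.polyfit(x, y, degree)  # highest power first: [c2, c1, c0]

# Evaluate the polynomial by hand at x = 6 ...
x_new = 6.0
manual = sum(c * x_new ** (degree - i) for i, c in enumerate(coeffs))

# ... and through the callable that poly1d builds from the same array.
poly = np.poly1d(coeffs)

print(len(coeffs))          # degree + 1 coefficients
print(manual, poly(x_new))  # the two evaluations agree (close to 30 here)
```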

In general, extrapolating without a well-understood model gives unreliable results at best.
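
As a rough illustration of that warning, the sketch below fits the same five points with a line and with a degree-4 polynomial and compares their extrapolations. The small perturbations added to the question's data are made up for the example, and the variable names are hypothetical.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
# The question's data (y = 5x) with small made-up perturbations standing in for noise.
y = np.array([5.1, 9.8, 15.2, 19.9, 25.1])

linear = np.poly1d(np.polyfit(x, y, 1))   # two parameters: averages out the noise
quartic = np.poly1d(np.polyfit(x, y, 4))  # five parameters: passes through every point

# Compare the two models just outside and further outside the known range.
for x_new in (6, 10):
    print(x_new, linear(x_new), quartic(x_new))

linear_at_10 = linear(10)
quartic_at_10 = quartic(10)
```

Both fits reproduce the known points closely, but the degree-4 curve follows the perturbations and drifts far from the linear trend once x leaves the range 1 to 5.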


Exponential curve fitting:

    import numpy as np
    from scipy.optimize import curve_fit

    def func(x, a, b, c):
        return a * np.exp(-b * x) + c

    x = np.linspace(0, 4, 5)
    y = func(x, 2.5, 1.3, 0.5)
    yn = y + 0.2*np.random.normal(size=len(x))  # noise is random, so results vary per run

    fit, cov = curve_fit(func, x, yn)
    print(fit)            # Variables
    # [ 2.67217435 1.21470107 0.52942728]
    print(y)              # Original data
    # [ 3. 1.18132948 0.68568395 0.55060478 0.51379141]
    print(func(x, *fit))  # Fit to original + noise
    # [ 3.20160163 1.32252521 0.76481773 0.59929086 0.5501627 ]

+6

As pointed out in this answer to a related question, since scipy version 0.17.0 there is an option in scipy.interpolate.interp1d that allows linear extrapolation. In your case, you can do:

    >>> import numpy as np
    >>> from scipy import interpolate
    >>> x = [1, 2, 3, 4, 5]
    >>> y = [5, 10, 15, 20, 25]
    >>> f = interpolate.interp1d(x, y, fill_value="extrapolate")
    >>> print(f(6))
    30.0

+4

Since your data is approximately linear, you can do a linear regression and then use its results to calculate the next point with y = w[0]*x + w[1] (keeping the notation from the linked example for y = mx + b).
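
A minimal sketch of that regression follows. The link to the example is not preserved here, so the use of np.linalg.lstsq with a design matrix of x values and ones is an assumption about its approach; only the notation w[0], w[1] comes from the answer.

```python
import numpy as np

# Data from the question.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 10, 15, 20, 25], dtype=float)

# Design matrix with a column of x values and a column of ones,
# so that A @ w equals w[0]*x + w[1] (slope first, then intercept).
A = np.vstack([x, np.ones_like(x)]).T
w, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

# Use the fitted line to predict the next point.
next_x = 6
next_y = w[0] * next_x + w[1]
print(next_y)  # close to 30 for this data
```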

If your data is not approximately linear and you have no other theoretical form for the regression, general extrapolations (using polynomials or splines) are much less reliable, since they can behave erratically outside the known data points; see, for example, the accepted answer here.

+1

Using scipy.interpolate.splrep:

    >>> from scipy.interpolate import splrep, splev
    >>> d = {1:5, 2:10, 3:15, 4:20, 5:25}
    >>> x, y = zip(*d.items())
    >>> spl = splrep(x, y, k=1, s=0)
    >>> splev(6, spl)
    array(30.0)
    >>> splev(7, spl)
    array(35.0)
    >>> int(splev(7, spl))
    35
    >>> splev(10000000000, spl)
    array(50000000000.0)
    >>> int(splev(10000000000, spl))
    50000000000L

See How to get scipy.interpolate to provide an extrapolated result outside the input range?

0
