Linear regression on a Python time series (numpy or pandas)

I am new to Python and to programming in general, so please forgive any simple errors or things that should be obvious.

What I'm trying to do is pretty simple: I just want to fit a linear trend (a first-degree polynomial) to a bunch of time series to see whether the slopes are positive or negative. For now I'm just trying to get it to work once.

Problem: it seems that neither pandas nor numpy can do regressions on datetimes. My sampling times are irregular (usually one measurement per month, but not on the same day each month), so I can't use the solution proposed in Linear regression from the Pandas time series.

My CSV time series looks like this:

```
StationName, year, month, day, depth, NO3-N, PO4-P, TotP, TotN
Kvarnbacken (Savaran), 2003, 2, 25, 0.5, 46, 9, 14, 451
Kvarnbacken (Savaran), 2003, 3, 18, 0.5, 64, 15, 17, 310
Kvarnbacken (Savaran), 2003, 3, 31, 0.5, 76, 7, 19, 566
```

So far I have:

```python
import datetime as dt
from scipy import stats
import numpy as np
import pandas as pd  # needed for pd.read_csv and pd.to_datetime

# read in station csv file
data = pd.read_csv('Kvarnbacken (Savaran)_2003.csv')
data.head()

# set up dates to something python can recognize (e.g. 20030225 -> 2003-02-25)
data['date'] = pd.to_datetime(data.year * 10000 + data.month * 100 + data.day, format='%Y%m%d')
```
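A self-contained sketch of the same setup, with the three sample rows inlined through `io.StringIO` so it runs without the file. It assumes the real file has a space after each comma, as in the sample above; if so, `skipinitialspace=True` stops the column names from becoming `' year'`, `' TotP'` and so on (which would break `data.year`). Building the date from the year/month/day columns with `pd.to_datetime` is shown as an alternative to the integer trick.

```python
import io

import pandas as pd

# The three sample rows from the question, inlined so the sketch runs without the CSV file.
CSV_TEXT = """StationName, year, month, day, depth, NO3-N, PO4-P, TotP, TotN
Kvarnbacken (Savaran), 2003, 2, 25, 0.5, 46, 9, 14, 451
Kvarnbacken (Savaran), 2003, 3, 18, 0.5, 64, 15, 17, 310
Kvarnbacken (Savaran), 2003, 3, 31, 0.5, 76, 7, 19, 566
"""

# skipinitialspace=True drops the blank after each comma, so columns are named
# 'year', 'TotP', ... rather than ' year', ' TotP', ...
data = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# pandas can assemble datetimes directly from columns named year, month and day.
data['date'] = pd.to_datetime(data[['year', 'month', 'day']])
print(data[['date', 'TotP']])
```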

I tried

```python
slope, intercept, r_value, p_value, std_err = stats.linregress(data.date, data.TotP)
```

and got a TypeError: ufunc add cannot use operands with dtype types ('

I also tried

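```python
coefP = np.polyfit(data.date, data.TotP, 1)  # degree-1 (linear) fit
polyP = np.poly1d(coefP)                      # turn the coefficients into a callable polynomial
ys = polyP(data.date)                         # fitted values at each date
print('For P: coef, poly')
print(coefP)
print(polyP)
```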

and got the same error.
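The error most likely comes from the fitting routines having to add up the x values (for example, to take their mean), and NumPy defines no addition between two datetimes. Subtracting datetimes, on the other hand, gives timedeltas, and dividing those by a one-day timedelta yields plain floats that any regression function accepts. A minimal NumPy-only illustration with the sample dates:

```python
import numpy as np

dates = np.array(['2003-02-25', '2003-03-18', '2003-03-31'], dtype='datetime64[D]')

# Adding two datetimes is undefined, which is the kind of operation that fails.
try:
    dates + dates
except TypeError as exc:
    print('TypeError:', exc)

# Subtracting gives timedelta64 values; dividing by one day converts them to floats.
days_since = (dates - dates[0]) / np.timedelta64(1, 'D')
print(days_since)  # [ 0. 21. 34.]
```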

I guess the easiest way would be to count the days since the first measurement and then regress the total phosphorus concentration (TotP) on that days_since value, but I'm not sure how best to do this, or whether there is another trick.

1 answer

You can convert the datetimes to days as follows.

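```python
# Days elapsed since the first sample. .dt.days gives an integer day count;
# recent pandas versions no longer accept .astype('timedelta64[D]') for this.
data['days_since'] = (data.date - pd.to_datetime('2003-02-25')).dt.days
print(data[['date', 'days_since']])
#         date  days_since
# 0 2003-02-25           0
# 1 2003-03-18          21
# 2 2003-03-31          34
```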
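A variant of this conversion (not part of the original answer) avoids hard-coding the first date and keeps fractions of a day: subtract the earliest date and divide by a one-day `pd.Timedelta`. The `12:00` time on the second date below is hypothetical, added only to show the fractional result.

```python
import pandas as pd

# Sample dates; the 12:00 time on the second one is hypothetical.
dates = pd.Series(pd.to_datetime(['2003-02-25 00:00', '2003-03-18 12:00', '2003-03-31 00:00']))

# Elapsed time since the earliest measurement, as a float number of days.
days_since = (dates - dates.min()) / pd.Timedelta(days=1)
print(days_since.tolist())  # [0.0, 21.5, 34.0]
```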

You should now be able to regress, as you did above.

```python
slope, intercept, r_value, p_value, std_err = stats.linregress(data.days_since, data.TotP)
print(slope, intercept)
# 0.1466591166477916 13.977916194790488
```
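The `np.polyfit` approach from the question also works once x is numeric. With degree 1 it returns the coefficients highest power first, i.e. `[slope, intercept]`, so it should agree with `linregress`. A minimal check with the three sample points (days 0, 21, 34 and TotP 14, 17, 19):

```python
import numpy as np

days_since = np.array([0.0, 21.0, 34.0])
tot_p = np.array([14.0, 17.0, 19.0])

# Degree-1 fit: coefficients come back highest power first.
slope, intercept = np.polyfit(days_since, tot_p, 1)
trend = np.poly1d([slope, intercept])

print(slope, intercept)
print('increasing' if slope > 0 else 'decreasing or flat')
print(trend(days_since))  # fitted TotP at each sampling date
```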

You could also consider other regression tools such as statsmodels, especially if you will be doing this kind of thing often. (Note that in the formula the response comes first, so x and y appear in the reverse order compared with linregress.)

```python
import statsmodels.formula.api as smf

results = smf.ols('TotP ~ days_since', data=data).fit()
print(results.params)
# Intercept     13.977916
# days_since     0.146659
```

By the way, this is only a small part of the statsmodels output (use summary() instead of params to get much more).
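Since the question's goal was the sign of the slope for a bunch of series, one way to scale this up is to group by station and fit each group separately. This sketch assumes the series are stacked in one DataFrame with the question's `StationName`, `date` and `TotP` columns; `trend_slopes` is a hypothetical helper name, and `np.polyfit` stands in for `linregress` so only numpy and pandas are needed.

```python
import numpy as np
import pandas as pd


def trend_slopes(df, value_col, group_col='StationName', date_col='date'):
    """Fitted linear-trend slope of value_col (units per day) for each group."""
    slopes = {}
    for name, group in df.groupby(group_col):
        if len(group) < 2:
            # A straight line needs at least two points.
            slopes[name] = np.nan
            continue
        # Float days since this group's first measurement.
        days = (group[date_col] - group[date_col].min()) / pd.Timedelta(days=1)
        slopes[name] = np.polyfit(days, group[value_col], 1)[0]
    return pd.Series(slopes, name=f'{value_col}_slope_per_day')


# Usage with the three sample rows from the question.
data = pd.DataFrame({
    'StationName': ['Kvarnbacken (Savaran)'] * 3,
    'date': pd.to_datetime(['2003-02-25', '2003-03-18', '2003-03-31']),
    'TotP': [14, 17, 19],
})
slopes = trend_slopes(data, 'TotP')
print(slopes)
print(np.sign(slopes))  # +1 rising trend, -1 falling trend
```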
