I am trying to run a simple linear regression on a pandas DataFrame using scikit-learn's LinearRegression. My data is a time series, and the DataFrame has a datetime index:
                   value
    2007-01-01  0.771305
    2007-02-01  0.256628
    2008-01-01  0.670920
    2008-02-01  0.098047
Doing something as simple as
    from sklearn import linear_model

    lr = linear_model.LinearRegression()

    lr(data.index, data['value'])
does not work:
float() argument must be a string or a number
So I tried creating a new date column and converting it:
    data['date'] = data.index
    data['date'] = pd.to_datetime(data['date'])
    lr(data['date'], data['value'])
but now I get:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
So it seems the regression cannot handle datetime values. I found many ways to convert integer data to datetime, but could not find a way to go the other direction, from datetime to integer.
What is the right way to do this?
PS: I am interested in using scikit-learn because I plan to do more with it, so no statsmodels for now.
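For reference, here is a minimal sketch of the kind of datetime-to-number conversion being asked about. It assumes that "days elapsed since the first observation" is an acceptable numeric feature, which is only one possible choice. The sample frame is rebuilt from the values shown above, and the names `X`, `y`, and `trend` are illustrative, not part of the original code.

```python
import pandas as pd
from sklearn import linear_model

# Rebuild the sample frame from the question (datetime index, one value column).
data = pd.DataFrame(
    {"value": [0.771305, 0.256628, 0.670920, 0.098047]},
    index=pd.to_datetime(
        ["2007-01-01", "2007-02-01", "2008-01-01", "2008-02-01"]
    ),
)

# Assumption: use whole days since the first timestamp as the regressor.
# Subtracting a Timestamp from a DatetimeIndex gives a TimedeltaIndex,
# whose .days attribute holds integer day counts.
days = (data.index - data.index[0]).days

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features).
X = days.to_numpy().reshape(-1, 1)
y = data["value"].to_numpy()

lr = linear_model.LinearRegression()
lr.fit(X, y)            # fit() is the method that trains the estimator

trend = lr.predict(X)   # fitted values at the observed dates
```

Note that the estimator is trained through `fit(X, y)` rather than by calling the object directly, and that the features are passed as a two-dimensional array even when there is only one of them.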
python pandas
Ivan