I am forecasting, for example, with a rolling OLS regression using an estimation window of 100 observations, over the dataset found at this link ( https://drive.google.com/drive/folders/0B2Iv8dfU4fTUMVFyYTEtWXlzYkk ), which has the following format.
time       X    Y
0.000543   0    10
0.000575   0    10
0.041324   1    10
0.041331   2    10
0.041336   3    10
0.04134    4    10
...
9.987735   55   239
9.987739   56   239
9.987744   57   239
9.987749   58   239
9.987938   59   239
The third column (Y) in my dataset is my true value - this is what I want to predict (evaluate). I want to make a prediction of Y (i.e., predict the current value of Y according to the previous 3 rolling values of X). For this, I have the following Python script using statsmodels.
#!/usr/bin/python -tt
import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.read_csv('estimated_pred.csv')
df = df.dropna()  # drop any rows with missing values
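The rest of the script does the rolling estimation itself; roughly, it is along these lines (a simplified sketch: the window of 100 observations, the coefficient columns a, b1, b2, the predicted column and the RollOLS name match the output shown below, but the exact indexing in my script may differ):

# Rough sketch of the rolling estimation over windows of 100 observations.
window = 100
df['a'] = None   # intercept estimate for each window
df['b1'] = None  # coefficient on time
df['b2'] = None  # coefficient on X

for i in range(window, len(df)):
    temp = df.iloc[i - window:i, :]
    # Regress Y on a constant, time and X over the current window.
    RollOLS = sm.OLS(temp['Y'], sm.add_constant(temp[['time', 'X']])).fit()
    df.iloc[i, df.columns.get_loc('a')] = RollOLS.params['const']
    df.iloc[i, df.columns.get_loc('b1')] = RollOLS.params['time']
    df.iloc[i, df.columns.get_loc('b2')] = RollOLS.params['X']

# Predict the current Y from the previous row's estimated coefficients.
df['predicted'] = df['a'].shift(1) + df['b1'].shift(1) * df['time'] + df['b2'].shift(1) * df['X']
print(df)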
This gives me output in the following format.
         time   X   Y     a           b1            b2  predicted
0    0.000543   0  10  None         None          None        NaN
1    0.000575   0  10  None         None          None        NaN
2    0.041324   1  10  None         None          None        NaN
3    0.041331   2  10  None         None          None        NaN
4    0.041336   3  10  None         None          None        NaN
..        ...  ..  ..   ...          ...           ...        ...
50   0.041340   4  10    10            0   1.55431e-15        NaN
51   0.041345   5  10    10   1.7053e-13   7.77156e-16         10
52   0.041350   6  10    10  1.74623e-09  -7.99361e-15         10
53   0.041354   7  10    10  6.98492e-10  -6.21725e-15         10
..        ...  ..  ..   ...          ...           ...        ...
509  0.160835  38  20    20  4.88944e-09  -1.15463e-14         20
510  0.160839  39  20    20  1.86265e-09   5.32907e-15         20
..        ...  ..  ..   ...          ...           ...        ...
Finally, I want to include the mean squared error (MSE) for the entire forecast, alongside the OLS regression analysis summary. For example, if we look at line 5, the value of X is 2 and the value of Y is 10. Let's say the predicted value of Y on that line is 6; the squared error for that line would then be (10-6)^2, and the MSE is the average of these squared errors over the whole forecast (see the sketch after the summary output below). sm.OLS returns an instance of <class 'statsmodels.regression.linear_model.OLS'>, and when we do print(RollOLS.summary()) we get:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      Y   R-squared:                        -inf
Model:                            OLS   Adj. R-squared:                   -inf
Method:                 Least Squares   F-statistic:                    -48.50
Date:                Tue, 04 Jul 2017   Prob (F-statistic):               1.00
Time:                        22:19:18   Log-Likelihood:                 2359.7
No. Observations:                 100   AIC:                            -4713.
Df Residuals:                      97   BIC:                            -4706.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const        239.0000   2.58e-09   9.26e+10      0.000       239.000   239.000
time        4.547e-13   2.58e-10      0.002      0.999     -5.12e-10  5.13e-10
X          -3.886e-16    1.1e-13     -0.004      0.997     -2.19e-13  2.19e-13
==============================================================================
Omnibus:                       44.322   Durbin-Watson:                   0.000
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               86.471
Skew:                          -1.886   Prob(JB):                     1.67e-19
Kurtosis:                       5.556   Cond. No.                     9.72e+04
==============================================================================
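To be concrete about the MSE I want for the entire forecast: it is just the average of the squared differences between Y and the predicted column, something like this (a sketch; it simply skips the rows that do not yet have a prediction):

# MSE of the forecast: mean squared difference between the true Y and the
# predicted values, over the rows that actually have a prediction.
valid = df.dropna(subset=['predicted'])
errors = valid['Y'].astype(float) - valid['predicted'].astype(float)
mse = (errors ** 2).mean()
rmse = np.sqrt(mse)  # root mean squared error, if that is preferred
print(mse, rmse)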
But the value of rsquared (print(RollOLS.rsquared)) should, for example, be between 0 and 1 instead of -inf, and this seems to be a problem with a missing intercept. If we want to print the MSE, we do print(RollOLS.mse_model), etc., in accordance with the documentation. How can I add an intercept and print the regression statistics, as well as the predicted values, with the correct values? What am I doing wrong here? Or is there another way to do this using scikit-learn?
Desta Haileselassie Hagos Jul 05 '17 at 10:34