Pandas Rolling OLS

When I run the old code, I get the following warning: "pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages such as statsmodels." I could not figure out whether statsmodels has a user-friendly rolling OLS module. What was nice about the pandas.stats.ols module was that you could easily specify whether to include an intercept, the type of window (rolling or expanding), and the window length. Is there a module that does the same?
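For reference, the difference between the two window types can be sketched in a few lines of Python. window_bounds below is a hypothetical helper (not part of pandas or statsmodels) that lists the (start, stop) rows each regression would use, assuming the first estimate is made once a full window of observations is available.

```python
def window_bounds(n_obs, window, window_type="rolling"):
    """Hypothetical helper: (start, stop) row slices for each regression window.

    'rolling' keeps a fixed-length window that slides forward one row at a
    time; 'expanding' always starts at row 0 and grows by one row each step.
    The first window ends once `window` observations are available.
    """
    if window_type not in ("rolling", "expanding"):
        raise ValueError("window_type must be 'rolling' or 'expanding'")
    bounds = []
    for stop in range(window, n_obs + 1):
        start = stop - window if window_type == "rolling" else 0
        bounds.append((start, stop))
    return bounds


print(window_bounds(10, 5, "rolling")[:2])    # [(0, 5), (1, 6)]
print(window_bounds(10, 5, "expanding")[:2])  # [(0, 5), (0, 6)]
```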

For example:

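```python
import numpy as np
from pandas import DataFrame
# Legacy API: pandas.stats was deprecated and later removed, so this import
# only works on old pandas releases.
from pandas.stats.ols import MovingOLS

YY = DataFrame(np.log(np.linspace(1, 10, 10)), columns=['Y'])
XX = DataFrame(np.transpose([np.linspace(1, 10, 10), np.linspace(2, 10, 10)]),
               columns=['XX1', 'XX2'])

MovingOLS(y=YY['Y'], x=XX, intercept=True,
          window_type='rolling', window=5).resid
```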

I would like an example of how to get the result of the last line (the residuals of the moving OLS) using statsmodels or any other module.
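A minimal sketch using only NumPy and pandas follows. It assumes the value wanted for each date is the residual of that date's observation from the OLS fit over the window ending there; I have not checked that this matches every detail of MovingOLS.resid. rolling_ols_resid is a hypothetical helper. Note that in this example XX2 equals 10/9 + (8/9)*XX1 exactly, so together with the intercept the design matrix is rank-deficient; np.linalg.lstsq with an explicit rcond still returns the least-squares fit in that case. Newer statsmodels releases (0.11 and later) also provide statsmodels.regression.rolling.RollingOLS for rolling coefficients.

```python
import numpy as np
import pandas as pd

# The question's example data.
YY = pd.DataFrame(np.log(np.linspace(1, 10, 10)), columns=["Y"])
XX = pd.DataFrame(np.transpose([np.linspace(1, 10, 10), np.linspace(2, 10, 10)]),
                  columns=["XX1", "XX2"])


def rolling_ols_resid(y, x, window, intercept=True, rcond=1e-10):
    """Hypothetical helper: residual of the last observation in each rolling window.

    For each window ending at row t, fit OLS on rows t-window+1 .. t and
    return y[t] minus its fitted value. Rows before the first full window
    are NaN.
    """
    X = x.to_numpy(dtype=float)
    if intercept:
        X = np.column_stack([np.ones(len(X)), X])
    yv = y.to_numpy(dtype=float)
    out = pd.Series(np.nan, index=y.index, name="resid")
    for stop in range(window, len(yv) + 1):
        Xw, yw = X[stop - window:stop], yv[stop - window:stop]
        # Singular values below rcond * (largest) are treated as zero, so
        # exactly collinear columns (as here) still give the least-squares fit.
        beta = np.linalg.lstsq(Xw, yw, rcond=rcond)[0]
        out.iloc[stop - 1] = yw[-1] - Xw[-1] @ beta
    return out


resid = rolling_ols_resid(YY["Y"], XX, window=5, intercept=True)
print(resid)
```

Each window is refitted in a Python loop, which is simple and fine for small data; for long series a vectorized approach is faster.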

thanks

1 answer

I created an ols module designed to mimic pandas' deprecated MovingOLS; it is here.

It has three main classes:

  • OLS: static (single-window) ordinary least-squares regression. The outputs are NumPy arrays.
  • RollingOLS: rolling (multi-window) ordinary least-squares regression. The outputs are higher-dimensional NumPy arrays.
  • PandasRollingOLS: wraps the results of RollingOLS in pandas Series and DataFrames. Designed to mimic the look of the deprecated pandas module.
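As an illustration of the static case, ordinary least squares reduces to solving the normal equations X'X b = X'y. The sketch below is my own minimal version, not the package's code; ols_fit is a hypothetical name, and it assumes the columns of X are linearly independent so that X'X is invertible.

```python
import numpy as np


def ols_fit(y, x, intercept=True):
    """Hypothetical helper: static (single-window) OLS via the normal equations.

    Solves (X'X) beta = X'y with np.linalg.solve instead of forming an
    explicit inverse. Assumes the columns of X are linearly independent.
    Returns (beta, fitted, resid) as NumPy arrays.
    """
    X = np.asarray(x, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single regressor
    if intercept:
        X = np.column_stack([np.ones(len(X)), X])
    y = np.asarray(y, dtype=float)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    fitted = X @ beta
    return beta, fitted, y - fitted


x = np.arange(6.0)
y = 1.5 + 2.0 * x + np.array([0.1, -0.1, 0.0, 0.1, -0.1, 0.0])
beta, fitted, resid = ols_fit(y, x)
print(beta)  # intercept first, then the slope
```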

Note that the module is part of a package (pyfinance, which I am currently uploading to PyPI), and it requires one inter-package import.

The first two classes above are implemented entirely in NumPy and primarily use matrix algebra. RollingOLS also makes extensive use of broadcasting. The attributes largely mimic those of statsmodels' OLS RegressionResultsWrapper.
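To show what broadcasting buys here, the following sketch (again my own, not the package's implementation; rolling_ols_betas is a hypothetical name) gathers every window into one 3-D array with fancy indexing and solves all the normal equations in a single np.linalg.solve call, which operates on stacks of matrices. It assumes X already includes an intercept column if one is wanted and that each window's X'X is invertible.

```python
import numpy as np


def rolling_ols_betas(y, X, window):
    """Hypothetical helper: coefficients for every rolling window at once.

    X must already contain an intercept column if one is wanted, and each
    window's X'X must be invertible. Returns shape (n_windows, n_features).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(y)
    # idx[i] holds the row numbers of window i: shape (n_windows, window).
    idx = np.arange(n - window + 1)[:, None] + np.arange(window)[None, :]
    Xw = X[idx]                 # (n_windows, window, n_features)
    yw = y[idx][..., None]      # (n_windows, window, 1)
    Xt = np.transpose(Xw, (0, 2, 1))
    XtX = Xt @ Xw               # (n_windows, n_features, n_features)
    Xty = Xt @ yw               # (n_windows, n_features, 1)
    # np.linalg.solve broadcasts over the leading (window) dimension.
    return np.linalg.solve(XtX, Xty)[..., 0]


x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
betas = rolling_ols_betas(y, X, window=4)
print(betas.shape)  # (7, 2)
```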

Example:

```python
# Pull some data from fred.stlouisfed.org
from pandas_datareader.data import DataReader

syms = {'TWEXBMTH': 'usd',
        'T10Y2YM': 'term_spread',
        'PCOPPUSDM': 'copper'
        }

data = (DataReader(list(syms.keys()), 'fred', start='2000-01-01')
        .pct_change()
        .dropna())
data = data.rename(columns=syms)
print(data.head())
#                 usd  term_spread   copper
# DATE
# 2000-02-01  0.01260     -1.40909 -0.01997
# 2000-03-01 -0.00012      2.00000 -0.03720
# 2000-04-01  0.00564      0.51852 -0.03328
# 2000-05-01  0.02204     -0.09756  0.06135
# 2000-06-01 -0.01012      0.02703 -0.01850

# Rolling regressions
from pyfinance.ols import OLS, RollingOLS, PandasRollingOLS

y = data.usd
x = data.drop('usd', axis=1)

window = 12  # months
model = PandasRollingOLS(y=y, x=x, window=window)

# Here `.resids` will be a stacked, MultiIndex'd Series. Each outer
# index is a "period ending" and each inner index block holds the
# subperiods for that rolling window.
print(model.resids)
# end         subperiod
# 2001-01-01  2000-02-01    0.00834
#             2000-03-01   -0.00375
#             2000-04-01    0.00194
#             2000-05-01    0.01312
#             2000-06-01   -0.01460
#             2000-07-01   -0.00462
#             2000-08-01   -0.00032
#             2000-09-01    0.00299
#             2000-10-01    0.01103
#             2000-11-01    0.00556
#             2000-12-01   -0.01544
#             2001-01-01   -0.00425
# ...
# 2017-06-01  2016-07-01    0.01098
#             2016-08-01   -0.00725
#             2016-09-01    0.00447
#             2016-10-01    0.00422
#             2016-11-01   -0.00213
#             2016-12-01    0.00558
#             2017-01-01    0.00166
#             2017-02-01   -0.01554
#             2017-03-01   -0.00021
#             2017-04-01    0.00057
#             2017-05-01    0.00085
#             2017-06-01   -0.00320
# Name: resids, dtype: float64
```
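The stacked layout above keeps the residual of every observation in every window. If what you want is a single residual per date, as in the question, one option is to keep only each window's last observation. The sketch below builds a similar (end, subperiod) MultiIndex by hand on the question's data and then does exactly that; it does not use pyfinance, and the monthly date index is made up for illustration.

```python
import numpy as np
import pandas as pd

# The question's data with an illustrative monthly index.
dates = pd.date_range("2000-01-01", periods=10, freq="MS")
y = pd.Series(np.log(np.linspace(1, 10, 10)), index=dates)
# XX2 is omitted: in the question's data it equals 10/9 + (8/9)*XX1, so it
# adds nothing beyond XX1 and the intercept.
X = np.column_stack([np.ones(10), np.linspace(1, 10, 10)])
window = 5

pieces = {}
for stop in range(window, len(y) + 1):
    sl = slice(stop - window, stop)
    yw = y.iloc[sl].to_numpy()
    beta = np.linalg.lstsq(X[sl], yw, rcond=None)[0]
    # Residuals of every observation inside the window ending at this date.
    pieces[y.index[stop - 1]] = pd.Series(yw - X[sl] @ beta, index=y.index[sl])

# Outer level: period ending; inner level: subperiods in that window.
resids = pd.concat(pieces, names=["end", "subperiod"]).rename("resids")

# Keeping only each window's final observation gives one residual per date.
last = resids.groupby(level="end").last()
print(last)
```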