I'm sure this is simple, but as a complete Python newbie I'm having trouble figuring out how to iterate over the columns of a pandas DataFrame and run a regression on each one.
Here is what I'm doing:
all_data = {}
for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']:
    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015')

prices = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})
returns = prices.pct_change()
I know that I can run a regression as follows:
regs = sm.OLS(returns.FIUIX,returns.FSTMX).fit()
but suppose I want to do this for each column in the DataFrame. In particular, I want to regress FIUIX on FSTMX, then FSAIX on FSTMX, and then FSAVX on FSTMX. After each regression, I want to store the residuals.
I have tried various versions of the following, but I must be getting the syntax wrong:
resids = {}
for k in returns.keys():
    reg = sm.OLS(returns[k], returns.FSTMX).fit()
    resids[k] = reg.resid
I think the problem is that I don't know how to refer to the returns column by key, so returns[k] is probably wrong.
Any advice on the best way to do this would be greatly appreciated. Perhaps there is a general pandas approach that I am missing.
python pandas statsmodels
itzy Jan 29 '15 at 15:42