I am performing ridge regression on data with several collinear predictors. One method for choosing a stable fit is to plot a ridge trace, and thanks to the excellent scikit-learn example, I can do that. Another method is to compute the variance inflation factor (VIF) of each variable as k increases. Once the VIFs drop below 5, that value of k is considered acceptable. Statsmodels has code for computing VIFs, but only for OLS regression. I tried modifying it to handle ridge regression.
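For reference, the ridge VIF commonly used in textbooks is Marquardt's definition. As I understand it, SAS reports the same quantity with the `OUTVIF` option. It is computed from the correlation matrix $R$ of the predictors, so $k$ is on the correlation scale:

$$
\mathrm{VIF}_j(k) = \left[(R + kI)^{-1}\, R\, (R + kI)^{-1}\right]_{jj}
$$

At $k = 0$ this reduces to the ordinary OLS VIF, the $j$-th diagonal element of $R^{-1}$. Because of that, agreement at $k = 0$ does not by itself show that a formula is also correct for $k > 0$.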
I am checking my results against the example in chapter 10 of Regression Analysis by Example, 5th edition. My code produces the correct results for k = 0.000, but not for any larger k. Working SAS code is available, but I am not a SAS user, and I do not know how that implementation differs from scikit-learn's (and/or statsmodels').
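A minimal NumPy sketch of the ridge VIF in Marquardt's form, diag[(R + kI)^-1 R (R + kI)^-1] with R the correlation matrix of the predictors. It assumes the textbook's k is on that correlation scale. The function name `ridge_vif` and the synthetic data are hypothetical, not the book's import data. The loop tabulates the VIFs over a grid of k and reports the smallest k at which every VIF is below 5.

```python
import numpy as np

def ridge_vif(X, k):
    """Hypothetical helper: ridge VIFs, diag[(R + kI)^-1 R (R + kI)^-1]."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # correlation matrix of the predictors
    A = np.linalg.inv(R + k * np.eye(R.shape[0]))  # (R + kI)^-1
    return np.diag(A @ R @ A)

# Synthetic collinear data (hypothetical): columns 0 and 1 are nearly identical
rng = np.random.default_rng(0)
z = rng.normal(size=50)
X = np.column_stack([z + 0.05 * rng.normal(size=50),
                     z + 0.05 * rng.normal(size=50),
                     rng.normal(size=50)])

ks = np.linspace(0.0, 0.2, 21)              # k = 0.00, 0.01, ..., 0.20
for k in ks:
    print(f"k = {k:.3f}  VIFs = {np.round(ridge_vif(X, k), 2)}")

# Smallest k on the grid where every VIF is below 5 (None if there is none)
first_ok = next((k for k in ks if ridge_vif(X, k).max() < 5), None)
print("smallest k with all VIFs < 5:", first_ok)
```

At k = 0 the function returns the OLS VIFs, and each VIF decreases as k grows, which is what makes the "VIF < 5" rule usable for picking k.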
I have been stuck on this for several days, so any help would be greatly appreciated.
```python
# http://www.ats.ucla.edu/stat/sas/examples/chp/chp_ch10.htm
from __future__ import division

import numpy as np
import pandas as pd
from sklearn import preprocessing

example = pd.read_csv('by_example_import.csv')
example.dropna(inplace=True)

scaler = preprocessing.StandardScaler().fit(example)
# transform() returns a new array; it does not modify `example` in place
example = pd.DataFrame(scaler.transform(example),
                       columns=example.columns, index=example.index)

X = example.drop(['year', 'import'], axis=1)
```
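One possible source of mismatch for k > 0 is the scale of the penalty. As I understand it, the textbook and SAS apply k to the correlation-form matrix R, while scikit-learn's `Ridge` applies `alpha` to X'X of whatever data it is given. `StandardScaler` divides by the population standard deviation, so after scaling X'X = nR, and the textbook's k should correspond to `alpha = n * k`. The sketch below checks this on synthetic data (hypothetical, not the book's) by comparing `Ridge` coefficients with the closed form (R + kI)^-1 r, where r holds the correlations of each predictor with y.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic collinear data (hypothetical)
rng = np.random.default_rng(1)
n = 40
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n),
                     z + 0.1 * rng.normal(size=n),
                     rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# StandardScaler uses the population SD, so Xs.T @ Xs == n * R
Xs = StandardScaler().fit_transform(X)
ys = (y - y.mean()) / y.std()

k = 0.05  # ridge constant on the textbook's correlation scale (assumed)
R = np.corrcoef(X, rowvar=False)
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
beta_textbook = np.linalg.solve(R + k * np.eye(X.shape[1]), r)

# Same estimate from scikit-learn, with the penalty rescaled by n
beta_sklearn = Ridge(alpha=n * k).fit(Xs, ys).coef_

print(np.round(beta_textbook, 4))
print(np.round(beta_sklearn, 4))
```

With `alpha=k` instead of `alpha=n * k`, scikit-learn effectively uses k/n on the correlation scale, so the coefficients (and any VIFs derived from the same k) agree only at k = 0.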
python scikit-learn statsmodels
zerovector