I am working on a logistic regression model, and I am having trouble understanding how to apply the model fitted on my training set to my test set. Sorry, I'm new to Python and VERY new to statsmodels.
    import pandas as pd
    import statsmodels.api as sm
    from sklearn import cross_validation

    independent_vars = phy_train.columns[3:]

    X_train, X_test, y_train, y_test = cross_validation.train_test_split(
        phy_train[independent_vars], phy_train['target'],
        test_size=0.3, random_state=0)

    X_train = pd.DataFrame(X_train)
    X_train.columns = independent_vars

    X_test = pd.DataFrame(X_test)
    X_test.columns = independent_vars

    y_train = pd.DataFrame(y_train)
    y_train.columns = ['target']

    y_test = pd.DataFrame(y_test)
    y_test.columns = ['target']

    logit = sm.Logit(y_train, X_train[subset], missing='drop')
    result = logit.fit()
    print result.summary()

    y_pred = logit.predict(X_test[subset])
From the last line, I get this error:
    y_pred = logit.predict(X_test[subset])
    Traceback (most recent call last):
      File "", line 1, in
      File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\statsmodels\discrete\discrete_model.py", line 378, in predict
        return self.cdf(np.dot(exog, params))
    ValueError: matrices are not aligned
My training and test sets have the same number of variables, so I suspect I am misunderstanding what logit.predict() does.