Compute logistic regression in Python

I am trying to compute a logistic regression. My data is in a CSV file that looks like this:

    node_id,second_major,gender,major_index,year,dorm,high_school,student_fac
    0,0,2,257,2007,111,2849,1
    1,0,2,271,2005,0,51195,2
    2,0,2,269,2007,0,21462,1
    3,269,1,245,2008,111,2597,1
    ...

This is my code:

    import pandas as pd
    import statsmodels.api as sm
    import pylab as pl
    import numpy as np

    df = pd.read_csv("Reed98.csv")
    print df.describe()

    dummy_ranks = pd.get_dummies(df['second_major'], prefix='second_major')
    cols_to_keep = ['second_major', 'dorm', 'high_school']
    data = df[cols_to_keep].join(dummy_ranks.ix[:, 'year':])

    train_cols = data.columns[1:]
    # Index([gre, gpa, prestige_2, prestige_3, prestige_4], dtype=object)

    logit = sm.Logit(data['second_major'], data[train_cols])
    result = logit.fit()
    print result.summary()

When I run the code, I get this error:

    Traceback (most recent call last):
      File "D:\project\logisticregression.py", line 24, in <module>
        result = logit.fit()
      File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\discrete\discrete_model.py", line 282, in fit
        disp=disp, callback=callback, **kwargs)
      File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\discrete\discrete_model.py", line 233, in fit
        disp=disp, callback=callback, **kwargs)
      File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\base\model.py", line 291, in fit
        hess=hess)
      File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\base\model.py", line 341, in _fit_mle_newton
        newparams = oldparams - np.dot(np.linalg.inv(H),
      File "C:\Python26\Lib\site-packages\numpy\linalg\linalg.py", line 445, in inv
        return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
      File "C:\Python26\Lib\site-packages\numpy\linalg\linalg.py", line 328, in solve
        raise LinAlgError('Singular matrix')
    LinAlgError: Singular matrix

How should I rewrite the code?

1 answer

There is nothing wrong with the code. I assume that you have missing values in your data. Try dropna, or pass missing='drop' to Logit. You can also check that the right-hand side has full rank by comparing np.linalg.matrix_rank(data[train_cols].values) with len(train_cols).
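A minimal sketch of the checks suggested above, assuming only pandas and NumPy. The DataFrame `df` is hypothetical toy data (not Reed98.csv), and `design_matrix`, `logit_hessian` and `is_full_rank` are illustrative helpers, not statsmodels API. The sketch drops incomplete rows the way dropna does (passing missing='drop' to sm.Logit is the statsmodels-side alternative), compares the rank of the regressor matrix with its number of columns, and builds the logit Hessian H = -X' diag(p(1 - p)) X, which is the kind of matrix the Newton step in the traceback passes to np.linalg.inv. Because every weight p(1 - p) is positive, H has the same rank as X, so a rank-deficient X gives a singular H.

```python
import numpy as np
import pandas as pd


def design_matrix(data, y_col, x_cols):
    """Hypothetical helper: drop rows with any missing value (like DataFrame.dropna)
    in the response or regressors, then return the regressors as a float array."""
    x_cols = list(x_cols)
    clean = data[[y_col] + x_cols].dropna()
    return clean[x_cols].values.astype(float)


def logit_hessian(X, params):
    """Hypothetical helper: Hessian of the logistic log-likelihood,
    H = -X' diag(p * (1 - p)) X with p = 1 / (1 + exp(-X params))."""
    p = 1.0 / (1.0 + np.exp(-X.dot(params)))
    w = p * (1.0 - p)
    # X.T * w scales column i of X.T (i.e. observation i) by its weight w[i]
    return -np.dot(X.T * w, X)


def is_full_rank(X):
    """True when the columns of X are linearly independent."""
    return np.linalg.matrix_rank(X) == X.shape[1]


# Hypothetical toy data: 'x1_copy' duplicates 'x1', and one row is missing 'x2'.
df = pd.DataFrame({
    "y":       [0, 1, 0, 1, 1, 0],
    "x1":      [1, 0, 0, 1, 0, 1],
    "x1_copy": [1, 0, 0, 1, 0, 1],
    "x2":      [2, 1, 3, 0, np.nan, 1],
})

for cols in (["x1", "x2"], ["x1", "x1_copy", "x2"]):
    X = design_matrix(df, "y", cols)
    # Evaluate H at params = 0 (an arbitrary point; its rank does not depend on it)
    H = logit_hessian(X, np.zeros(X.shape[1]))
    print("{0}: rows={1} columns={2} rank(X)={3} rank(H)={4} full_rank={5}".format(
        cols, X.shape[0], X.shape[1],
        np.linalg.matrix_rank(X), np.linalg.matrix_rank(H), is_full_rank(X)))
```

In this toy example the row with the missing value is dropped, and the design with the duplicated column has 3 columns but rank 2, the kind of rank deficiency that leads to LinAlgError: Singular matrix when the Hessian is inverted.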

