I am performing logistic regression on Mac OS X Lion, using pandas 0.11.0 for data processing and statsmodels 0.4.3 for the actual regression.
I am going to run ~2900 different logistic regression models, and I want the results written to a CSV file in a specific format.
Currently, the only approach I know is `print result.summary()`, which prints the results to the shell as shown below:
                               Logit Regression Results
    ==============================================================================
    Dep. Variable:            death_death   No. Observations:                 9752
    Model:                          Logit   Df Residuals:                     9747
    Method:                           MLE   Df Model:                            4
    Date:                Wed, 22 May 2013   Pseudo R-squ.:                -0.02672
    Time:                        22:15:05   Log-Likelihood:                -5806.9
    converged:                       True   LL-Null:                       -5655.8
                                            LLR p-value:                     1.000
    ===============================================================================
                      coef    std err          z      P>|z|      [95.0% Conf. Int.]
    -------------------------------------------------------------------------------
    age_age5064    -0.1999      0.055     -3.619      0.000        -0.308    -0.092
    age_age6574    -0.2553      0.053     -4.847      0.000        -0.359    -0.152
    sex_female     -0.2515      0.044     -5.765      0.000        -0.337    -0.166
    stage_early    -0.1838      0.041     -4.528      0.000        -0.263    -0.104
    access         -0.0102      0.001    -16.381      0.000        -0.011    -0.009
    ===============================================================================
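Rather than parsing the text of `summary()`, the per-variable numbers in that table can be read from attributes of the fitted results object. The sketch below assumes a results object exposing `params`, `bse`, `tvalues` and `pvalues` as pandas Series indexed by variable name, plus `conf_int()` returning a DataFrame whose columns `0` and `1` hold the lower and upper bounds; recent statsmodels versions behave this way when the model is fit on pandas data, but I have not checked every attribute against 0.4.3. `coef_table` is a hypothetical helper name, and the `SimpleNamespace` stand-in only reuses the rounded numbers from the summary above so the snippet runs without statsmodels.

```python
from types import SimpleNamespace

import pandas as pd


def coef_table(result):
    """Collect the per-variable columns of result.summary() into one DataFrame."""
    ci = result.conf_int()  # assumed: columns 0 and 1 are the lower/upper bounds
    return pd.DataFrame({
        "coef": result.params,
        "std_err": result.bse,
        "z": result.tvalues,      # for Logit these are z statistics
        "p>|z|": result.pvalues,
        "ci_low": ci[0],
        "ci_high": ci[1],
    }, columns=["coef", "std_err", "z", "p>|z|", "ci_low", "ci_high"])


# Stand-in for a fitted Logit result, built from the rounded summary numbers.
names = ["age_age5064", "age_age6574", "sex_female", "stage_early", "access"]
stub = SimpleNamespace(
    params=pd.Series([-0.1999, -0.2553, -0.2515, -0.1838, -0.0102], index=names),
    bse=pd.Series([0.055, 0.053, 0.044, 0.041, 0.001], index=names),
    tvalues=pd.Series([-3.619, -4.847, -5.765, -4.528, -16.381], index=names),
    pvalues=pd.Series([0.0, 0.0, 0.0, 0.0, 0.0], index=names),
    conf_int=lambda: pd.DataFrame(
        {0: [-0.308, -0.359, -0.337, -0.263, -0.011],
         1: [-0.092, -0.152, -0.166, -0.104, -0.009]},
        index=names,
    ),
)

table = coef_table(stub)
print(table)
```

With a real model this would be called as `coef_table(result)` on the object returned by `fit()`.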
I also need the odds ratios, which I calculate with `print np.exp(result.params)`; they are printed in the shell as follows:
    age_age5064    0.818842
    age_age6574    0.774648
    sex_female     0.777667
    stage_early    0.832098
    access         0.989859
    dtype: float64
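The odds ratios are the exponentiated coefficients, and exponentiating the confidence bounds gives a confidence interval for each odds ratio. A small sketch; `odds_ratio_table` is a hypothetical helper, and the inputs are the rounded coefficients and bounds from the summary, so the odds ratios differ from the printed `np.exp(result.params)` values only in the later decimals. With a real result the call would be `odds_ratio_table(result.params, result.conf_int())`, assuming `conf_int()` returns bounds in columns `0` and `1`.

```python
import numpy as np
import pandas as pd


def odds_ratio_table(params, conf_int):
    """Exponentiate log-odds coefficients and their confidence bounds."""
    return pd.DataFrame({
        "odds_ratio": np.exp(params),
        "or_ci_low": np.exp(conf_int[0]),   # exp is monotonic, so bounds stay ordered
        "or_ci_high": np.exp(conf_int[1]),
    }, columns=["odds_ratio", "or_ci_low", "or_ci_high"])


# Rounded coefficients and 95% bounds copied from the summary table.
names = ["age_age5064", "age_age6574", "sex_female", "stage_early", "access"]
params = pd.Series([-0.1999, -0.2553, -0.2515, -0.1838, -0.0102], index=names)
ci = pd.DataFrame(
    {0: [-0.308, -0.359, -0.337, -0.263, -0.011],
     1: [-0.092, -0.152, -0.166, -0.104, -0.009]},
    index=names,
)

ors = odds_ratio_table(params, ci)
print(ors)
```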
I need all of these values written to the CSV file as one very long row per model (I'm not sure yet whether I need things like `Log-Likelihood`, but I included them for thoroughness):
`Log-Likelihood, age_age5064_coef, age_age5064_std_err, age_age5064_z, age_age5064_p>|z|,...age_age6574_coef, age_age6574_std_err, ......access_coef, access_std_err, ....age_age5064_odds_ratio, age_age6574_odds_ratio, ...sex_female_odds_ratio,.....access_odds_ratio`
I think you get the picture: one very long line with all of these actual values, plus a header row with all of the column names in a similar format.
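One way to get that single wide row is to turn each per-variable statistic into a `<variable>_<statistic>` column, with the model-level Log-Likelihood first and the odds ratios last, in the order of the header sketched above. A minimal sketch; `flatten_model` is a hypothetical helper that takes plain pandas objects (the log-likelihood, a variables × statistics table, and the odds-ratio Series) so it makes no assumptions about the results class. With statsmodels those pieces would presumably come from `result.llf`, the coefficient attributes, and `np.exp(result.params)`.

```python
from collections import OrderedDict

import pandas as pd


def flatten_model(llf, table, odds_ratios):
    """Return Log-Likelihood, then <var>_<stat> for every cell, then <var>_odds_ratio."""
    row = OrderedDict()
    row["Log-Likelihood"] = llf
    for var in table.index:              # keep all statistics of one variable together
        for stat in table.columns:
            row["{}_{}".format(var, stat)] = table.loc[var, stat]
    for var, value in zip(odds_ratios.index, odds_ratios.values):
        row["{}_odds_ratio".format(var)] = value
    return row


# Two variables from the question's output, to keep the example short.
names = ["age_age5064", "access"]
table = pd.DataFrame(
    {"coef": [-0.1999, -0.0102], "std_err": [0.055, 0.001],
     "z": [-3.619, -16.381], "p>|z|": [0.0, 0.0]},
    index=names, columns=["coef", "std_err", "z", "p>|z|"],
)
odds = pd.Series([0.818842, 0.989859], index=names)

row = flatten_model(-5806.9, table, odds)
print(list(row.keys()))
```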
I am familiar with the `csv` module in Python and with pandas. I'm not sure whether this information can be formatted and stored in a pandas DataFrame and then written to a file with `to_csv` once all ~2900 logistic regression models have finished; that would certainly be good. Alternatively, writing each row as soon as its model finishes (using the `csv` module) is also fine.
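Both options are straightforward once each model has been reduced to an ordered dict of column name to value. A sketch, assuming that flattening has already happened; the two rows below are made-up placeholders, not real results. Building one DataFrame at the end lets pandas take the union of the columns if different models have different variables (missing cells become empty), while `csv.DictWriter` writes each row immediately but fixes the header from the first row. `write_rows_incrementally` is a hypothetical helper name.

```python
import csv
import io
from collections import OrderedDict

import pandas as pd

# Made-up placeholder rows standing in for two flattened models.
rows = [
    OrderedDict([("model", "model_0001"), ("Log-Likelihood", -5806.9), ("access_coef", -0.0102)]),
    OrderedDict([("model", "model_0002"), ("Log-Likelihood", -5800.0), ("access_coef", -0.0100)]),
]

# Option 1: collect every row, build one DataFrame at the end, write once.
wide_buf = io.StringIO()  # for a real file, pass a path such as "logit_results.csv"
pd.DataFrame(rows).to_csv(wide_buf, index=False)


# Option 2: write each row as soon as its model is done.
def write_rows_incrementally(row_iter, fileobj):
    """Header comes from the first row; later rows must not add new columns."""
    writer = None
    for row in row_iter:  # in practice: fit a model, flatten it, write it
        if writer is None:
            writer = csv.DictWriter(fileobj, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)  # raises ValueError on a column missing from the header


stream_buf = io.StringIO()  # for a real file: open("logit_results.csv", "w", newline="")
write_rows_incrementally(iter(rows), stream_buf)

print(wide_buf.getvalue())
print(stream_buf.getvalue())
```

On Python 2, which the `print` statements above suggest, the file for the `csv` module would be opened with mode `"wb"` instead of `newline=""`.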
UPDATE:
So, I looked more closely at the statsmodels website, in particular trying to figure out how model results are stored in classes. There seems to be a class called `Results` that I will need to use. I think subclassing it and overriding some of its methods/operators may be a way to get the formatting I need. I have very little experience with this, and it will take me a lot of time to understand (which is fine). If someone with more experience can help, that would be awesome!
Here is the site where the classes are laid out: statsmodels result class
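Subclassing may be more machinery than needed: as far as I can tell, `fit()` constructs its own results object internally, so a subclass of the results class would not be returned unless the model were changed too. A lighter alternative is composition: wrap the fitted result, forward unknown attributes to it, and add only an export method. A sketch assuming the results object has `llf`, `params`, `bse`, `tvalues` and `pvalues` (the latter four as pandas Series indexed by variable name); `ExportableResult` is a hypothetical name, and the `SimpleNamespace` stub, built from rounded summary numbers, exists only so the snippet runs without statsmodels.

```python
from collections import OrderedDict
from types import SimpleNamespace

import numpy as np
import pandas as pd


class ExportableResult(object):
    """Wrap a fitted results object; attributes it lacks are forwarded to the result."""

    # (column suffix, attribute name on the wrapped result)
    STATS = [("coef", "params"), ("std_err", "bse"), ("z", "tvalues"), ("p>|z|", "pvalues")]

    def __init__(self, result):
        self._result = result

    def __getattr__(self, name):
        # Only called when normal lookup fails, so summary(), params, etc. still work.
        return getattr(self._result, name)

    def to_row(self):
        row = OrderedDict([("Log-Likelihood", self._result.llf)])
        params = self._result.params
        for var in params.index:
            for label, attr in self.STATS:
                row["{}_{}".format(var, label)] = getattr(self._result, attr)[var]
        for var in params.index:
            row["{}_odds_ratio".format(var)] = float(np.exp(params[var]))
        return row


names = ["sex_female", "access"]
stub = SimpleNamespace(
    llf=-5806.9,
    params=pd.Series([-0.2515, -0.0102], index=names),
    bse=pd.Series([0.044, 0.001], index=names),
    tvalues=pd.Series([-5.765, -16.381], index=names),
    pvalues=pd.Series([0.0, 0.0], index=names),
)

wrapped = ExportableResult(stub)
row = wrapped.to_row()
print(wrapped.llf)  # forwarded to the wrapped object; prints -5806.9
```

With statsmodels the wrapper would be applied to whatever `fit()` returns, e.g. `ExportableResult(sm.Logit(y, X).fit()).to_row()`, where `y` and `X` are the asker's response and design matrix.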