statsmodels.api.Logit: ValueError: array must not contain infs or NaNs

I am trying to fit a logistic regression in Python using statsmodels.api.Logit, but I get a ValueError: array must not contain infs or NaNs.

When I execute:

    import statsmodels.api as sm

    data['intercept'] = 1.0
    train_cols = data.columns[1:]
    logit = sm.Logit(data['admit'], data[train_cols])
    result = logit.fit(start_params=None, method='bfgs', maxiter=20, full_output=1, disp=1, callback=None)

The data has more than 15,000 columns and 2,000 rows, where data['admit'] is the target value and data[train_cols] holds the features. Can someone please give me some tips to fix this problem?

1 answer

By default, Logit does not check your data for infinities (np.inf) or NaNs (np.nan). In pandas, the latter usually indicates a missing entry.

To ignore rows with missing data and continue, use missing='drop' as follows:

 sm.Logit(data['admit'], data[train_cols], missing='drop') 

Refer to the Logit docs for other options.

If you do not expect your data to contain any missing entries or infinities, you may have loaded it incorrectly. Look at data[data.isnull().any(axis=1)] to find the rows where the problem is. (NB: isnull() does not count inf as null by default, so infinities have to be converted to NaN first for this check to catch them.)
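One possible way to locate the offending entries is sketched below. The tiny DataFrame is made up for illustration; the idea is to convert infinities to NaN first so that a single isnull() mask catches both kinds of bad value:

```python
import numpy as np
import pandas as pd

# Made-up example data with one NaN and one inf.
data = pd.DataFrame({
    "admit": [1.0, 0.0, 1.0, 0.0],
    "gre": [380.0, np.nan, 800.0, 640.0],
    "gpa": [3.61, 3.67, np.inf, 3.19],
})

# Treat +/-inf as missing so that isnull() flags them too.
cleaned = data.replace([np.inf, -np.inf], np.nan)
mask = cleaned.isnull()

# Rows with at least one bad value, shown with their original contents.
bad_rows = data[mask.any(axis=1)]

# Number of bad values per column, keeping only affected columns.
counts = mask.sum()
bad_cols = counts[counts > 0]

print(bad_rows)
print(bad_cols)
```

With 15,000 columns, the per-column counts in bad_cols are usually the faster way to see whether the problem is confined to a few features or spread across the whole file.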
