Building a multiple regression model raises an error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 and 1, and some numerical variables. When I fit it with statsmodels like this:

est = sm.OLS(y, X).fit() 

it raises:

 Pandas data cast to numpy dtype of object. Check input data with np.asarray(data). 

I converted all of the DataFrame's dtypes using df.convert_objects(convert_numeric=True).

After that, the dtypes of all the DataFrame columns are shown as int32 or int64. But the last line of the output still shows dtype: object, like this:

 4516        int32
 4523        int32
 4525        int32
 4531        int32
 4533        int32
 4542        int32
 4562        int32
 sex         int64
 race        int64
 dispstd     int64
 age_days    int64
 dtype: object

Here 4516, 4523 are variable labels.

Any ideas? I need to build a multiple regression model with more than a hundred variables. To do this, I combined three pandas DataFrames into the final DataFrame used to build the model.

2 answers

If X is your DataFrame, try using the .astype method to convert it to float when fitting the model:

 est = sm.OLS(y, X.astype(float)).fit() 
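
To illustrate why the cast can help, here is a minimal sketch using only pandas and NumPy. The data and the choice of which column is stored as Python objects are made up for the example; the question does not show which column (or whether y) was the culprit. It checks what np.asarray produces before and after the cast, as the error message suggests:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "age_days" holds numbers but is stored with object dtype,
# which can happen after combining several DataFrames.
X = pd.DataFrame({
    "sex": pd.Series([0, 1, 1, 0], dtype="int64"),
    "age_days": pd.Series([100, 250, 300, 410], dtype="object"),
})

# One object column is enough to make the whole array object-typed.
before = np.asarray(X).dtype
print(before)

# Casting every column to float yields a plain numeric array.
X_float = X.astype(float)
after = np.asarray(X_float).dtype
print(after)
```

If the first check prints object, the frame still contains at least one non-numeric column, and the cast shown in the answer is one way to remove it.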

If both y (the dependent variable) and X are taken from a DataFrame, cast both of them:

 est = sm.OLS(y.astype(float), X.astype(float)).fit() 
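
A side note on the dtype listing in the question: the trailing `dtype: object` line does not describe any column. `df.dtypes` returns a Series whose values are dtype objects, so that Series is itself object-typed even when every column is an integer. The sketch below uses a made-up two-column frame to show this and to list the columns that really are stored as objects:

```python
import pandas as pd

# Hypothetical frame in which every column is already an integer type.
df = pd.DataFrame({
    "sex": pd.Series([0, 1, 1], dtype="int64"),
    "4516": pd.Series([3, 5, 8], dtype="int32"),
})

# df.dtypes is a Series of dtype objects, so its own dtype is object.
summary_dtype = df.dtypes.dtype
print(df.dtypes)

# Columns that are genuinely object-typed (none in this frame).
object_columns = [col for col in df.columns if df[col].dtype == object]
print(object_columns)
```

The same per-column check can be applied to y (for a Series, `y.dtype`) to see whether the dependent variable needs the cast.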