Building a multi-regression model throws an error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

Question

I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 and 1, and some numerical variables. When I fit it with statsmodels like this:

est = sm.OLS(y, X).fit()

It throws:

 Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

I converted all the DataFrame dtypes using df.convert_objects(convert_numeric=True).

After that, all dtypes of the DataFrame columns are displayed as int32 or int64. But the last line of the output still shows dtype: object, like this:

 4516        int32
 4523        int32
 4525        int32
 4531        int32
 4533        int32
 4542        int32
 4562        int32
 sex         int64
 race        int64
 dispstd     int64
 age_days    int64
 dtype: object

Here 4516, 4523, etc. are variable (column) labels.

Any idea? I need to build a multi-regression model for more than a hundred variables. To do this, I combined 3 pandas DataFrames to come up with the final DataFrame for use in building the model.


python numpy pandas statsmodels

Sanoj Nov 20 '15 at 18:42


2 answers

Daniel Gibson · Answer 1 · 2016-02-17T17:43:32+0000

If X is your DataFrame, try using the .astype method to convert it to float when fitting the model:

 est = sm.OLS(y, X.astype(float)).fit()
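
To see why the cast helps, here is a minimal sketch of the check that the error message itself suggests. The data is hypothetical (only the column names come from the question), and it assumes the cause is a column that holds numbers stored as strings; a single such column is enough to turn the whole array into object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "race" holds numbers stored as strings (object dtype).
X = pd.DataFrame({
    "sex": [0, 1, 1, 0],
    "race": pd.Series(["1", "2", "1", "3"], dtype=object),
    "age_days": [3650, 4015, 2920, 5110],
})

# The check suggested by the error message: one object column
# makes the whole converted array object dtype.
print(np.asarray(X).dtype)

# Find the columns responsible.
object_cols = [col for col in X.columns if X[col].dtype == object]
print(object_cols)

# Cast everything to float. This raises ValueError if some value
# cannot be parsed as a number, which also helps locate bad data.
X_float = X.astype(float)
print(np.asarray(X_float).dtype)

# X_float can then be passed to the model, e.g. sm.OLS(y, X_float).fit()
```

If `astype(float)` raises instead of converting, the offending column contains values that are not numeric at all and needs cleaning before the model can be fitted.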

kratant adhaulia · Answer 2 · 2016-07-08T04:56:10+0000

If both y (the dependent variable) and X are taken from a DataFrame, cast both:

 est = sm.OLS(y.astype(float), X.astype(float)).fit()
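
Two side notes on the question, sketched with hypothetical data. First, the trailing `dtype: object` in the `df.dtypes` output describes the Series of dtypes itself, not the columns, so it appears even when every column is numeric. Second, `convert_objects` has since been removed from pandas; `pd.to_numeric` is one possible replacement, shown here on an assumed string-valued response `y`.

```python
import pandas as pd

# Hypothetical frame; the column names are borrowed from the question.
df = pd.DataFrame({
    "sex": [0, 1, 1],
    "race": [1, 2, 1],
    "age_days": [3650, 4015, 2920],
})

# df.dtypes is itself a Series whose values are dtype objects,
# so its footer reads "dtype: object" regardless of the column types.
print(df.dtypes)
print(df.dtypes.dtype)

# Hypothetical response stored as strings, with one unparseable entry.
y = pd.Series(["1.5", "2.0", "n/a"], dtype=object)

# pd.to_numeric converts what it can; errors="coerce" turns
# anything unparseable into NaN instead of raising.
y_num = pd.to_numeric(y, errors="coerce")
print(y_num.dtype)
print(int(y_num.isna().sum()))
```

Coerced NaN values still have to be dropped or filled before fitting, so this is a way to find bad entries rather than a complete fix.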
