Convert Pandas Dataframe to numpy for sklearn

Question

I am new to Python and sklearn. I have a pandas DataFrame of the Titanic dataset, and I want to use it for logistic regression in sklearn.

I tried the following:

data_np = data.astype(np.int32).values

But it does not work. I want to use several features from the dataset, such as "Pclass", "Age", "Sex", etc.

I want to convert the whole DataFrame, as well as individual columns such as data["Age"], into the NumPy format that sklearn expects. Any help is appreciated.

+4

python numpy pandas scikit-learn

Seja nair Apr 08 '15 at 10:28


3 answers

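Categorical values such as "male" and "female" have to be converted to numbers before they are passed to LogisticRegression. For that you can use pandas get_dummies(data['Sex']).

For a worked example, see this notebook: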

http://nbviewer.ipython.org/github/ogrisel/parallel_ml_tutorial/blob/master/rendered_notebooks/04%20-%20Pandas%20and%20Heterogeneous%20Data%20Modeling.ipynb
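
As a rough illustration of the get_dummies approach, here is a minimal sketch. The small DataFrame is a made-up stand-in for the Titanic data, and the choice of columns and the "Sex" prefix are assumptions for the example, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame; the values are invented for illustration.
data = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# One indicator column per category: Sex_female and Sex_male.
sex_dummies = pd.get_dummies(data["Sex"], prefix="Sex")

# Put the numeric columns and the indicator columns side by side.
features = pd.concat([data[["Pclass", "Age"]], sex_dummies], axis=1)

# Convert to a plain 2-D float array, which scikit-learn estimators accept.
X = features.to_numpy(dtype=np.float64)
print(features.columns.tolist())
print(X.shape)
```

Each row of X then holds Pclass, Age and the two indicator values, in the column order printed above.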

+3

ogrisel Apr 08 '15 at 13:25

To handle a mix of numeric and non-numeric data, use scikit-learn's LabelEncoder, which allows you to

Encode labels with a value from 0 to n_classes-1.

See also:

fooobar.com/questions/1215771 / ...
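
To make the "0 to n_classes-1" behaviour concrete, here is a small sketch on a three-valued column. The values mimic the Titanic "Embarked" column and are assumed for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of port codes, similar to the Titanic "Embarked" column.
embarked = ["S", "C", "Q", "S"]

encoder = LabelEncoder()
# fit_transform learns the sorted set of classes and returns integer codes.
codes = encoder.fit_transform(embarked)

print(encoder.classes_)                      # the sorted distinct labels
print(codes)                                 # one integer code per input value
print(encoder.inverse_transform([0, 1, 2]))  # map codes back to labels
```

Note that the integer codes carry an order (0 < 1 < 2) that a linear model will treat as numeric, so for columns with more than two categories the indicator-column approach may be the better fit.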

+1

AGS Apr 08 '15 at 12:57


user3116355 · Accepted Answer · 2015-04-09T01:24:46+0000

This is a common problem, and it usually comes from a lack of familiarity with NumPy.

The ['Sex'] column contains strings, so it has to be encoded as integers first.

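import numpy as np
from sklearn.preprocessing import LabelEncoder

# p_train and p_test are assumed to be the training and test DataFrames.
enc = LabelEncoder()
# Learn the string-to-integer mapping from the training data only.
label_encoder = enc.fit(p_train['Sex'])
print("Categorical classes:", label_encoder.classes_)
integer_classes = label_encoder.transform(label_encoder.classes_)
print("Integer classes:", integer_classes)
x_train = label_encoder.transform(p_train['Sex'])
x_test = label_encoder.transform(p_test['Sex'])

# Add a second axis: shape (n_samples,) becomes (n_samples, 1).
x_train = x_train[:, np.newaxis]
x_test = x_test[:, np.newaxis]

"female" and "male" are now encoded as 0 and 1 (LabelEncoder sorts the classes, so "female" becomes 0 and "male" becomes 1). sklearn expects a two-dimensional feature array, so np.newaxis is used to change the shape of x_train from (n_samples,) to (n_samples, 1). The same is done for x_test.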
