Convert Pandas Dataframe to numpy for sklearn

I am new to Python and sklearn. I have a pandas DataFrame holding the Titanic dataset, and I want to use it for logistic regression in sklearn.

I tried the following

data_np = data.astype(np.int32).values

But it does not work. I want to use several features from the dataset, such as "Pclass", "Age", "Sex", etc.

I want to convert the whole DataFrame, as well as individual columns such as data["Age"], into the numpy format that sklearn expects. Any help would be appreciated.

+4
3 answers

This is a common problem. The main reason is the lack of familiarity with numpy.

The ['Sex'] column is categorical, so it has to be encoded as integers before sklearn can use it.

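import numpy as np
from sklearn.preprocessing import LabelEncoder

# p_train and p_test are the training and test DataFrames
enc = LabelEncoder()
label_encoder = enc.fit(p_train['Sex'])
print("Categorical classes:", label_encoder.classes_)
integer_classes = label_encoder.transform(label_encoder.classes_)
print("Integer classes:", integer_classes)
x_train = label_encoder.transform(p_train['Sex'])
x_test = label_encoder.transform(p_test['Sex'])

# add a second axis: shape (n_samples,) -> (n_samples, 1)
x_train = x_train[:, np.newaxis]
x_test = x_test[:, np.newaxis]

The two categories are mapped to the integers 0 and 1. sklearn expects a 2-D feature array, so np.newaxis is used to change the shape of x_train from (n_samples,) to (n_samples, 1), and the same is done for x_test.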

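To see the shapes involved, here is a minimal sketch of the same steps. It assumes a tiny made-up frame in place of p_train (the values are invented for illustration, not taken from the real dataset):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for p_train; the values are made up.
p_train = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

enc = LabelEncoder().fit(p_train["Sex"])
x_train = enc.transform(p_train["Sex"])  # 1-D integer array
x_train_2d = x_train[:, np.newaxis]      # same values, one column

print(enc.classes_)                      # classes are kept in sorted order
print(x_train.shape, x_train_2d.shape)
```

The 2-D version is what an estimator's fit method can take as its feature matrix when this is the only feature.
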
+3

To handle non-numeric as well as numeric data, use scikit-learn's LabelEncoder, which lets you

encode labels with a value from 0 to n_classes-1.
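As a rough sketch of how this fits the question, the snippet below encodes the string column, converts the whole frame and a single column to numpy, and fits a logistic regression. The miniature frame and its values are made up, and filling missing ages with the median is only one possible assumption, not something the question specifies:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature of the Titanic frame; the values are made up.
data = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 2, 3],
    "Age":      [22.0, 38.0, np.nan, 35.0, 27.0, 4.0],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Survived": [0, 1, 1, 0, 1, 0],
})

features = data[["Pclass", "Age", "Sex"]].copy()
# Strings cannot be cast to a numeric dtype, so encode them first.
features["Sex"] = LabelEncoder().fit_transform(features["Sex"])
# NaN cannot be cast to an integer either; the median fill is an assumption.
features["Age"] = features["Age"].fillna(features["Age"].median())

X = features.to_numpy(dtype=np.float64)  # whole frame as a 2-D array
y = data["Survived"].to_numpy()          # target as a 1-D array
age = features["Age"].to_numpy()         # a single column as a 1-D array

model = LogisticRegression().fit(X, y)
print(X.shape, y.shape, model.predict(X).shape)
```

A call like data.astype(np.int32) tends to fail on a frame like this because of the string column and the missing values, which is why both are handled before the conversion.
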

See also:

fooobar.com/questions/1215771 / ...

+1
