LabelEncoder: specifying classes in a DataFrame

Question

I'm applying a LabelEncoder to a pandas DataFrame, df:

    Feat1 Feat2 Feat3 Feat4 Feat5
    A     A     A     A     E
    B     B     C     C     E
    C     D     C     C     E
    D     A     C     D     E

I am applying the label encoder to the DataFrame like this:

    from sklearn import preprocessing

    le = preprocessing.LabelEncoder()
    intIndexed = df.apply(le.fit_transform)

Here is how the labels are mapped:

    A = 0
    B = 1
    C = 2
    D = 3
    E = 0

I assume that E is not encoded as 4 because it does not appear in any column other than Feat5.

I need E to be encoded as 4, but I don't know how to do this in a DataFrame.

+6

python pandas scikit-learn machine-learning

gbhrea Aug 11 '16 at 10:09

2 answers

You can fit and transform in a single statement. Here is the code to encode a single column and assign it back to the DataFrame:

    from sklearn.preprocessing import LabelEncoder

    # columnName is the name of the column to encode
    df[columnName] = LabelEncoder().fit_transform(df[columnName])
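As a quick check of what a single-column fit does, here is a minimal sketch. It assumes the DataFrame from the question, rebuilt by hand below. A fresh encoder only learns the values present in the column it is fitted on, so encoding Feat5 on its own gives E the code 0.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# DataFrame rebuilt from the table in the question (assumed layout).
df = pd.DataFrame({
    "Feat1": list("ABCD"),
    "Feat2": list("ABDA"),
    "Feat3": list("ACCC"),
    "Feat4": list("ACCD"),
    "Feat5": list("EEEE"),
})

le = LabelEncoder()
codes = le.fit_transform(df["Feat5"])

# The encoder has only seen 'E', so 'E' is class 0.
print(le.classes_.tolist())  # ['E']
print(codes.tolist())        # [0, 0, 0, 0]
```

So this form is fine when each column can have its own independent codes, but it does not give one shared mapping across columns.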

+3

Anvesh_vs Jul 22 '17 at 17:34

Nickil maveli · Accepted Answer · 2016-08-11T10:39:45+0000

You can fit the label encoder first, and then transform the labels to their normalized encoding as follows:

    In [4]: from sklearn import preprocessing
       ...: import numpy as np

    In [5]: le = preprocessing.LabelEncoder()

    In [6]: le.fit(np.unique(df.values))
    Out[6]: LabelEncoder()

    In [7]: list(le.classes_)
    Out[7]: ['A', 'B', 'C', 'D', 'E']

    In [8]: df.apply(le.transform)
    Out[8]:
       Feat1  Feat2  Feat3  Feat4  Feat5
    0      0      0      0      0      4
    1      1      1      2      2      4
    2      2      3      2      2      4
    3      3      0      2      3      4
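The session above can be sketched as a standalone script. The DataFrame is rebuilt by hand from the question, and the name `encoded` is just an illustrative choice. The point is that the encoder is fitted once on every value in the frame, and only `transform` (not `fit_transform`) is applied per column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# DataFrame rebuilt from the table in the question (assumed layout).
df = pd.DataFrame({
    "Feat1": list("ABCD"),
    "Feat2": list("ABDA"),
    "Feat3": list("ACCC"),
    "Feat4": list("ACCD"),
    "Feat5": list("EEEE"),
})

le = LabelEncoder()
# np.unique flattens the 2-D array of values and returns the sorted unique labels.
le.fit(np.unique(df.values))

# Apply only transform per column, so every column shares the same mapping.
encoded = df.apply(le.transform)
print(encoded)
```

Because the fit happens once, E keeps the code 4 even though it appears only in Feat5.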

One way to specify default labels:

    In [9]: labels = ['A', 'B', 'C', 'D', 'E']

    In [10]: enc = le.fit(labels)

    In [11]: enc.classes_  # sorts the labels in alphabetical order
    Out[11]: array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')

    In [12]: enc.transform(['E'])  # transform expects a 1-D sequence of labels
    Out[12]: array([4])
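To inspect the mapping the encoder ends up with, one option is to build a plain dictionary from `classes_`. This is a small sketch rather than part of the original answer, and the name `mapping` is hypothetical. It also shows `inverse_transform`, which converts codes back to labels.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["A", "B", "C", "D", "E"]
le = LabelEncoder().fit(labels)

# Pair each learned class with the integer code it is assigned.
mapping = {
    label: int(code)
    for label, code in zip(le.classes_.tolist(), le.transform(le.classes_))
}
print(mapping)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}

# Codes can be converted back to the original labels.
print(le.inverse_transform([4, 0]).tolist())  # ['E', 'A']
```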