Sklearn LabelBinarizer returns a vector when there are 2 classes

The following code:

    from sklearn.preprocessing import LabelBinarizer

    lb = LabelBinarizer()
    lb.fit_transform(['yes', 'no', 'no', 'yes'])

returns:

 array([[1], [0], [0], [1]]) 

However, I would like there to be one column for each class:

 array([[1, 0], [0, 1], [0, 1], [1, 0]]) 

(I need the data in this format so I can feed it to a neural network that uses the softmax function at the output layer.)

If there are more than 2 classes, LabelBinarizer behaves as desired:

    from sklearn.preprocessing import LabelBinarizer

    lb = LabelBinarizer()
    lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])

returns

 array([[0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]) 

Above: one column for each class.

Is there an easy way to achieve the same (1 column per class) when there are 2 classes?

Edit: based on yangjie's answer, I wrote a class to wrap LabelBinarizer to create the desired behavior described above: http://pastebin.com/UEL2dP62

    import numpy as np
    from sklearn.preprocessing import LabelBinarizer


    class LabelBinarizer2:

        def __init__(self):
            self.lb = LabelBinarizer()

        def fit(self, X):
            # Convert X to array
            X = np.array(X)
            # Fit X using the LabelBinarizer object
            self.lb.fit(X)
            # Save the classes
            self.classes_ = self.lb.classes_

        def fit_transform(self, X):
            # Convert X to array
            X = np.array(X)
            # Fit + transform X using the LabelBinarizer object
            Xlb = self.lb.fit_transform(X)
            # Save the classes
            self.classes_ = self.lb.classes_
            if len(self.classes_) == 2:
                Xlb = np.hstack((Xlb, 1 - Xlb))
            return Xlb

        def transform(self, X):
            # Convert X to array
            X = np.array(X)
            # Transform X using the LabelBinarizer object
            Xlb = self.lb.transform(X)
            if len(self.classes_) == 2:
                Xlb = np.hstack((Xlb, 1 - Xlb))
            return Xlb

        def inverse_transform(self, Xlb):
            # Convert Xlb to array
            Xlb = np.array(Xlb)
            if len(self.classes_) == 2:
                X = self.lb.inverse_transform(Xlb[:, 0])
            else:
                X = self.lb.inverse_transform(Xlb)
            return X

Edit 2: It turns out that Yangjie also wrote a new version of LabelBinarizer, amazing!

python scikit-learn machine-learning
2 answers

I think there is no direct way to do this, especially if you also want inverse_transform to work.

But you can use numpy to build the two-column labels easily:

    In [18]: import numpy as np

    In [19]: from sklearn.preprocessing import LabelBinarizer

    In [20]: lb = LabelBinarizer()

    In [21]: label = lb.fit_transform(['yes', 'no', 'no', 'yes'])

    In [22]: label = np.hstack((label, 1 - label))

    In [23]: label
    Out[23]:
    array([[1, 0],
           [0, 1],
           [0, 1],
           [1, 0]])

Then you can use inverse_transform by passing it only the first column:

    In [24]: lb.inverse_transform(label[:, 0])
    Out[24]: array(['yes', 'no', 'no', 'yes'], dtype='<U3')

Based on the above solution, you can write a class that inherits from LabelBinarizer, so that the operations and results are consistent for both the binary and the multiclass case.

    from sklearn.preprocessing import LabelBinarizer
    import numpy as np


    class MyLabelBinarizer(LabelBinarizer):

        def transform(self, y):
            Y = super().transform(y)
            if self.y_type_ == 'binary':
                return np.hstack((Y, 1 - Y))
            else:
                return Y

        def inverse_transform(self, Y, threshold=None):
            if self.y_type_ == 'binary':
                return super().inverse_transform(Y[:, 0], threshold)
            else:
                return super().inverse_transform(Y, threshold)

Then

    lb = MyLabelBinarizer()

    label1 = lb.fit_transform(['yes', 'no', 'no', 'yes'])
    print(label1)
    print(lb.inverse_transform(label1))

    label2 = lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])
    print(label2)
    print(lb.inverse_transform(label2))

gives

    [[1 0]
     [0 1]
     [0 1]
     [1 0]]
    ['yes' 'no' 'no' 'yes']
    [[0 0 1]
     [0 1 0]
     [0 1 0]
     [0 0 1]
     [1 0 0]]
    ['yes' 'no' 'no' 'yes' 'maybe']
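
One detail worth noting about the class above: in the binary case, `np.hstack((Y, 1 - Y))` puts the indicator for `classes_[1]` ('yes') in column 0, whereas in the multiclass case column `i` corresponds to `classes_[i]`. If downstream code maps a softmax argmax back through `classes_`, the binary case would come out flipped. The following is a minimal sketch of a variant that keeps the columns aligned with `classes_`. The name `AlignedLabelBinarizer` is hypothetical, and the sketch assumes the default `neg_label=0`, `pos_label=1` and dense output.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer


class AlignedLabelBinarizer(LabelBinarizer):
    """Hypothetical variant: binary output has one column per entry of classes_."""

    def transform(self, y):
        Y = super().transform(y)
        if self.y_type_ == "binary":
            # Y is the indicator of classes_[1], so 1 - Y is the indicator
            # of classes_[0] (assumes neg_label=0 and pos_label=1).
            return np.hstack((1 - Y, Y))
        return Y

    def inverse_transform(self, Y, threshold=None):
        Y = np.asarray(Y)
        if self.y_type_ == "binary":
            # Column 1 carries the indicator of classes_[1].
            return super().inverse_transform(Y[:, 1], threshold)
        return super().inverse_transform(Y, threshold)


lb = AlignedLabelBinarizer()
onehot = lb.fit_transform(["yes", "no", "no", "yes"])
print(lb.classes_)                    # sorted class labels
print(onehot)                         # column i is the indicator of classes_[i]
print(lb.inverse_transform(onehot))   # round-trips to the original labels
```

With this ordering, `lb.classes_[onehot.argmax(axis=1)]` recovers the labels in both the binary and the multiclass case, at the cost of 'yes' no longer being the first column.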

This should do it:

    import numpy as np

    labels = ['yes', 'no', 'no', 'yes']
    np.array([[1, 0] if label == 'yes' else [0, 1] for label in labels])
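
The comprehension above hard-codes the label 'yes' and only works for two classes. As a rough sketch of the same idea for an arbitrary label set, using only numpy, one could index an identity matrix with integer class codes. The variable names `classes`, `codes` and `onehot` are illustrative, and the column order here follows the sorted class labels, so 'no' lands in column 0 rather than column 1.

```python
import numpy as np

labels = ["yes", "no", "no", "yes"]

# classes holds the sorted unique labels; codes[i] is the index of
# labels[i] within classes.
classes, codes = np.unique(labels, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for class k,
# so indexing with codes yields one row per label.
onehot = np.eye(len(classes), dtype=int)[codes]

print(classes)
print(onehot)

# Going back from one-hot rows to labels.
recovered = classes[onehot.argmax(axis=1)]
print(recovered)
```

If the column order of the original snippet is needed ('yes' first), reversing the columns with `onehot[:, ::-1]` would give it for this two-class example.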