How to binarize values in a pandas DataFrame?

Question

I have the following DataFrame:

 df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

I want to convert this into a DataFrame with “Male”, “Female” and “Unknown” columns, where the values 0 and 1 indicate the gender.

 Gender  Male  Female
 Male       1       0
 Female     0       1
 .          .       .

To do this, I wrote a function and applied it with map:

 def isValue(x, value):
     if x == value:
         return 1
     else:
         return 0

 for value in df['Gender'].unique():
     df[str(value)] = df['Gender'].map(lambda x: isValue(str(x), str(value)))

This works, but is there a better way to do it? Is there a built-in function in scikit-learn that I can use?

python pandas scikit-learn dataframe

Rakesh adhikesavan Aug 1 '16 at 17:15

2 answers

My preference is pd.get_dummies(). But yes, there is a sklearn method as well.

From the documentation:

 >>> from sklearn.preprocessing import OneHotEncoder
 >>> enc = OneHotEncoder()
 >>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
 OneHotEncoder(categorical_features='all', dtype=<... 'float'>,
        handle_unknown='error', n_values='auto', sparse=True)
 >>> enc.n_values_
 array([2, 3, 4])
 >>> enc.feature_indices_
 array([0, 2, 5, 9])
 >>> enc.transform([[0, 1, 1]]).toarray()
 array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
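
For the Gender column from the question, a rough sketch of the sklearn route might look like the following. It assumes a scikit-learn version that exposes the `categories_` attribute (0.20 or later, as far as I know). The names `encoded`, `columns` and `onehot` are just illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

# OneHotEncoder expects 2-D input, so pass a one-column DataFrame (double brackets).
enc = OneHotEncoder()

# fit_transform returns a sparse matrix by default; convert it to a dense int array.
encoded = enc.fit_transform(df[['Gender']]).toarray().astype(int)

# categories_ holds the sorted category labels found in each input column.
columns = list(enc.categories_[0])

# Wrap the result in a DataFrame aligned with the original index.
onehot = pd.DataFrame(encoded, columns=columns, index=df.index)
print(onehot)
```

Note that the columns come out in sorted order (Female, Male, Unknown), not in order of first appearance.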

Merlin Aug 1 '16 at 18:43

piRSquared · Accepted Answer · 2016-08-01T17:21:08+0000

Yes, there is a better way to do this. It is called pd.get_dummies:

 pd.get_dummies(df)
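
As a minimal sketch with the sample data from the question, this is roughly what the call produces. Passing `dtype=int` is my own addition here, so that the result holds 0/1 rather than whatever the installed pandas version uses by default. The variable name `dummies` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame(['Male', 'Female', 'Female', 'Unknown', 'Male'], columns=['Gender'])

# One indicator column per distinct value; by default each new column
# is named "<original column>_<value>".
dummies = pd.get_dummies(df, dtype=int)

print(list(dummies.columns))
print(dummies)
```

The indicator columns are ordered by sorted category value, and the original Gender column is not kept in the result.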

To reproduce what you have:

 order = ['Gender', 'Male', 'Female', 'Unknown']
 pd.concat([df, pd.get_dummies(df, prefix='', prefix_sep='').astype(int)], axis=1)[order]
