I have a pandas DataFrame with two categorical variables, an ID variable, and a target variable (for classification). I was able to convert the categorical values using OneHotEncoder, which results in a sparse matrix:
    from sklearn.preprocessing import OneHotEncoder

    ohe = OneHotEncoder()
    # First I remapped the string values in the categorical variables to integers,
    # as OneHotEncoder needs integers as input
    # ... remapping code ...
    ohe.fit(df[['col_a', 'col_b']])
    X_cat = ohe.transform(df[['col_a', 'col_b']])  # scipy sparse matrix
But I have no idea how to use this sparse matrix with DecisionTreeClassifier, especially since I want to add some other non-categorical variables to my DataFrame later. Thanks!
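A minimal sketch of one way to do this, assuming the categories have already been remapped to integers as described above. The DataFrame contents and the `num_col` column are hypothetical stand-ins for the real data and for a later non-categorical variable. The idea is to keep the one-hot block sparse, append the numeric column(s) with `scipy.sparse.hstack`, and pass the result straight to `DecisionTreeClassifier`, whose `fit` accepts scipy sparse matrices. The ID column is left out of the features, since an ID usually carries no predictive signal.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the real DataFrame.
df = pd.DataFrame({
    "id_col": [1, 2, 3, 4, 5, 6],
    "col_a": [0, 1, 2, 0, 1, 2],                  # already remapped to integers
    "col_b": [1, 0, 1, 0, 1, 0],                  # already remapped to integers
    "num_col": [0.5, 1.2, 3.3, 0.1, 2.2, 4.0],   # hypothetical numeric feature
    "target_col": [0, 1, 1, 0, 1, 1],
})

# One-hot encode the categorical columns; the result is a scipy sparse matrix.
ohe = OneHotEncoder()
X_cat = ohe.fit_transform(df[["col_a", "col_b"]])

# Put the numeric column(s) into sparse form and stack them next to the
# one-hot block (hstack returns COO by default, so convert to CSR).
X_num = sparse.csr_matrix(df[["num_col"]].to_numpy(dtype=float))
X = sparse.hstack([X_cat, X_num]).tocsr()
y = df["target_col"].to_numpy()

# The tree accepts the sparse matrix directly; no densifying is needed.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```

Keeping everything sparse matters mostly when the categorical variables have many levels; for small data, `X.toarray()` would work just as well.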
EDIT: In response to miraculixx's comment, I also tried DataFrameMapper from sklearn-pandas:
    from sklearn.preprocessing import OneHotEncoder
    from sklearn_pandas import DataFrameMapper

    mapper = DataFrameMapper([
        ('id_col', None),
        ('target_col', None),
        (['col_a'], OneHotEncoder()),
        (['col_b'], OneHotEncoder()),
    ])
    t = mapper.fit_transform(df)
But then I get this error:
    TypeError: there is no supported conversion for types: (dtype('O'), dtype('int64'), dtype('float64'), dtype('float64'))
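The four dtypes in the message appear to line up with the four mapper entries (`id_col`, `target_col`, and the two one-hot blocks), which suggests `id_col` holds strings (`dtype('O')`) that scipy cannot stack into a numeric sparse matrix. Including `target_col` among the features is also probably not intended. A minimal sketch that sidesteps both issues, using scikit-learn's `ColumnTransformer` (available from scikit-learn 0.20) as a swapped-in alternative to `DataFrameMapper`: only the feature columns go through the transformer, and the ID and target are dropped beforehand. The data and the `num_col` column are hypothetical. This also assumes scikit-learn 0.20 or later, where `OneHotEncoder` accepts string categories directly.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: a string ID (object dtype), string categories,
# a hypothetical numeric feature, and the target.
df = pd.DataFrame({
    "id_col": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "col_a": ["x", "y", "z", "x", "y", "z"],
    "col_b": ["p", "q", "p", "q", "p", "q"],
    "num_col": [0.5, 1.2, 3.3, 0.1, 2.2, 4.0],
    "target_col": [0, 1, 1, 0, 1, 1],
})

# One-hot encode the categorical columns; pass any remaining
# (numeric) feature columns through unchanged.
features = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["col_a", "col_b"])],
    remainder="passthrough",
)

model = Pipeline([
    ("features", features),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Keep the ID and the target out of the feature matrix.
X = df.drop(columns=["id_col", "target_col"])
y = df["target_col"]

model.fit(X, y)
pred = model.predict(X)
```

Wrapping the encoder and the tree in one `Pipeline` means the same encoding is reapplied automatically at prediction time, and new numeric columns are picked up by `remainder="passthrough"` without changing the code.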