Calculate sklearn.roc_auc_score for multiclass

I would like to calculate AUC, accuracy, recall, and precision for my classifier. I am doing supervised learning:

Here is my working code. It works fine for binary classes, but not for multiple classes. Suppose you have a dataframe with binary class labels:

    sample_features_dataframe = self._get_sample_features_dataframe()
    labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
    labeled_sample_features_dataframe, binary_class_series, multi_class_series = \
        self._prepare_dataframe_for_learning(labeled_sample_features_dataframe)

    k = 10
    k_folds = StratifiedKFold(binary_class_series, k)
    roc = accuracy = recall = precision = 0.0  # accumulators, averaged over the k folds below
    for train_indexes, test_indexes in k_folds:
        train_set_dataframe = labeled_sample_features_dataframe.loc[train_indexes.tolist()]
        test_set_dataframe = labeled_sample_features_dataframe.loc[test_indexes.tolist()]
        train_class = binary_class_series[train_indexes]
        test_class = binary_class_series[test_indexes]

        selected_classifier = RandomForestClassifier(n_estimators=100)
        selected_classifier.fit(train_set_dataframe, train_class)

        predictions = selected_classifier.predict(test_set_dataframe)
        predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

        roc += roc_auc_score(test_class, predictions_proba[:, 1])
        accuracy += accuracy_score(test_class, predictions)
        recall += recall_score(test_class, predictions)
        precision += precision_score(test_class, predictions)

At the end I divide each sum by k, of course, to get the average AUC, accuracy, etc. This code works fine. However, I cannot calculate the same metrics for several classes:

    train_class = multi_class_series[train_indexes]
    test_class = multi_class_series[test_indexes]

    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)

    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

I found that for several classes I should pass a value for the `average` parameter:

    roc += roc_auc_score(test_class, predictions_proba[:, 1], average="weighted")

But I got an error, raised from scikit-learn's `raise ValueError("{0} format is not supported".format(y_type))` check:

ValueError: multiclass format is not supported

2 answers

The `average` option of `roc_auc_score` is only defined for multilabel problems.

You can take a look at the following example from the scikit-learn documentation to define your own micro- or macro-averaged scores for multiclass problems:

http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings
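A minimal sketch of the approach from that example, using a small synthetic dataset for illustration (the data and variable names here are assumptions, not the asker's dataframe): binarize the multiclass labels so each class gets its own column, then `roc_auc_score` can compute macro- or micro-averaged AUC from the predicted probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 3-class problem (illustrative stand-in for the real data)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

# One column of 0/1 indicators per class
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# Macro: unweighted mean of the per-class AUCs
macro_auc = roc_auc_score(y_test_bin, proba, average="macro")
# Micro: pool all (sample, class) decisions into one AUC
micro_auc = roc_auc_score(y_test_bin, proba, average="micro")
print(macro_auc, micro_auc)
```

Once the targets are in this binarized (multilabel-indicator) format, the `average` parameter works as expected.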

Edit: there is an open issue on the scikit-learn tracker to implement ROC AUC for multiclass problems: https://github.com/scikit-learn/scikit-learn/issues/3298


You cannot use roc_auc as a single summary metric for multiclass models. If you want, you can calculate per-class roc_auc, like so:

    # One-vs-rest: fit a binary classifier per class and score its ROC AUC
    roc = {label: [] for label in multi_class_series.unique()}
    for label in multi_class_series.unique():
        selected_classifier.fit(train_set_dataframe, train_class == label)
        predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
        roc[label].append(roc_auc_score(test_class == label, predictions_proba[:, 1]))

However, the more common way to evaluate the performance of a multiclass model is with sklearn.metrics.confusion_matrix.

