Calculate sklearn.roc_auc_score for multiclass

I would like to calculate AUC, accuracy, recall, and precision for my classifier. I am doing supervised learning:

Here is my working code. It works fine for binary classes, but not for multiple classes. Suppose you have a dataframe with binary class labels:

    sample_features_dataframe = self._get_sample_features_dataframe()
    labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
    labeled_sample_features_dataframe, binary_class_series, multi_class_series = \
        self._prepare_dataframe_for_learning(labeled_sample_features_dataframe)

    k = 10
    k_folds = StratifiedKFold(binary_class_series, k)
    roc = accuracy = recall = precision = 0.0  # accumulators, averaged over the k folds below
    for train_indexes, test_indexes in k_folds:
        train_set_dataframe = labeled_sample_features_dataframe.loc[train_indexes.tolist()]
        test_set_dataframe = labeled_sample_features_dataframe.loc[test_indexes.tolist()]
        train_class = binary_class_series[train_indexes]
        test_class = binary_class_series[test_indexes]

        selected_classifier = RandomForestClassifier(n_estimators=100)
        selected_classifier.fit(train_set_dataframe, train_class)

        predictions = selected_classifier.predict(test_set_dataframe)
        predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

        roc += roc_auc_score(test_class, predictions_proba[:, 1])
        accuracy += accuracy_score(test_class, predictions)
        recall += recall_score(test_class, predictions)
        precision += precision_score(test_class, predictions)

At the end I divide each sum by k, of course, to get the average AUC, accuracy, etc. This code works fine. However, I cannot calculate the same metrics for several classes:

    train_class = multi_class_series[train_indexes]
    test_class = multi_class_series[test_indexes]

    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)

    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

I found that for several classes I should pass a value for the `average` parameter:

    roc += roc_auc_score(test_class, predictions_proba[:, 1], average="weighted")

But I got an error, raised from scikit-learn's `raise ValueError("{0} format is not supported".format(y_type))` check:

ValueError: multiclass format is not supported

2 answers

The `average` option of `roc_auc_score` is only defined for multilabel problems.

You can take a look at the following example from the scikit-learn documentation to define your own micro- or macro-averaged scores for multiclass problems:

http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings
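A minimal sketch of the approach from that example, using a small synthetic dataset for illustration (the data and variable names here are assumptions, not the asker's dataframe): binarize the multiclass labels so each class gets its own column, then `roc_auc_score` can compute macro- or micro-averaged AUC from the predicted probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 3-class problem (illustrative stand-in for the real data)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

# One column of 0/1 indicators per class
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# Macro: unweighted mean of the per-class AUCs
macro_auc = roc_auc_score(y_test_bin, proba, average="macro")
# Micro: pool all (sample, class) decisions into one AUC
micro_auc = roc_auc_score(y_test_bin, proba, average="micro")
print(macro_auc, micro_auc)
```

Once the targets are in this binarized (multilabel-indicator) format, the `average` parameter works as expected.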

Edit: there is an open issue on the scikit-learn tracker to implement ROC AUC for multiclass problems: https://github.com/scikit-learn/scikit-learn/issues/3298


You cannot use roc_auc as a single summary metric for multiclass models. If you want, you can calculate per-class roc_auc, like so:

    # One-vs-rest: fit a binary classifier per class and score its ROC AUC
    roc = {label: [] for label in multi_class_series.unique()}
    for label in multi_class_series.unique():
        selected_classifier.fit(train_set_dataframe, train_class == label)
        predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
        roc[label].append(roc_auc_score(test_class == label, predictions_proba[:, 1]))

However, the more common way to evaluate the performance of a multiclass model is with sklearn.metrics.confusion_matrix.

