I would like to calculate AUC, accuracy, accuracy for my classifier. I do supervised learning:
Here is my working code. This code works fine for a binary class, but not for multiple classes. Suppose you have a dataframe with binary classes:
sample_features_dataframe = self._get_sample_features_dataframe() labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe) labeled_sample_features_dataframe, binary_class_series, multi_class_series = self._prepare_dataframe_for_learning(labeled_sample_features_dataframe) k = 10 k_folds = StratifiedKFold(binary_class_series, k) for train_indexes, test_indexes in k_folds: train_set_dataframe = labeled_sample_features_dataframe.loc[train_indexes.tolist()] test_set_dataframe = labeled_sample_features_dataframe.loc[test_indexes.tolist()] train_class = binary_class_series[train_indexes] test_class = binary_class_series[test_indexes] selected_classifier = RandomForestClassifier(n_estimators=100) selected_classifier.fit(train_set_dataframe, train_class) predictions = selected_classifier.predict(test_set_dataframe) predictions_proba = selected_classifier.predict_proba(test_set_dataframe) roc += roc_auc_score(test_class, predictions_proba[:,1]) accuracy += accuracy_score(test_class, predictions) recall += recall_score(test_class, predictions) precision += precision_score(test_class, predictions)
In the end, I divided the results in K, of course, into getting the average AUC, accuracy, etc. This code is working fine. However, I cannot calculate the same for several classes:
train_class = multi_class_series[train_indexes] test_class = multi_class_series[test_indexes] selected_classifier = RandomForestClassifier(n_estimators=100) selected_classifier.fit(train_set_dataframe, train_class) predictions = selected_classifier.predict(test_set_dataframe) predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
I found that for several classes I should add a weighted parameter for the average.
roc += roc_auc_score(test_class, predictions_proba[:,1], average="weighted")
I got an error: raise ValueError (format "{0} is not supported" .format (y_type))
ValueError: multiclass format is not supported