Scikit-learn: roc_auc_score

I am using the roc_auc_score function from scikit-learn to evaluate the performance of my model. However, I get very different values depending on whether I use predict() or predict_proba():

    from sklearn.metrics import roc_curve, auc, roc_auc_score

    p_pred = forest.predict_proba(x_test)
    y_test_predicted = forest.predict(x_test)

    fpr, tpr, _ = roc_curve(y_test, p_pred[:, 1])
    roc_auc = auc(fpr, tpr)

    roc_auc_score(y_test, y_test_predicted)  # = 0.68
    roc_auc_score(y_test, p_pred[:, 1])      # = 0.93

Could you advise on this?

Thanks in advance

python scikit-learn machine-learning auc
1 answer

First, look at the difference between predict and predict_proba. The former predicts a class for a set of features, whereas the latter predicts the probability of each class.
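
In scikit-learn terms (a rough sketch, assuming a fitted binary classifier named forest and a test set x_test, as in the question):

    labels = forest.predict(x_test)        # hard class labels, e.g. array([1, 0, 1, ...])
    probas = forest.predict_proba(x_test)  # one probability per class per sample,
                                           # e.g. array([[0.3, 0.7], [0.8, 0.2], ...])
    # probas[:, 1] is the predicted probability of the positive class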

What you are seeing is the effect of the rounding implicit in the binary output y_test_predicted: it consists only of 0s and 1s, whereas p_pred consists of floating-point values between 0 and 1. roc_auc_score sweeps the decision threshold and computes the true positive rate and false positive rate at each setting, so the two inputs produce very different scores.

Consider the following case:

    y_test           = [1, 0, 0, 1, 0, 1, 1]
    p_pred           = [.6, .4, .6, .9, .2, .7, .4]
    y_test_predicted = [1, 0, 1, 1, 0, 1, 0]

Note that the ROC curve is built by considering every possible cutoff threshold. Now consider a threshold of 0.65...

The p_pred case gives:

    TPR = 0.5, FPR = 0

while the y_test_predicted case gives:

    TPR = 0.75, FPR = 0.33 (i.e. 1/3)

You can probably see that since these two points are different, the areas under the two curves will also be quite different.
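
As a quick sanity check of those numbers, here is a standalone sketch using the toy arrays above (plain NumPy plus roc_auc_score; the AUC values in the comments apply to this toy data only):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_test           = np.array([1, 0, 0, 1, 0, 1, 1])
    p_pred           = np.array([.6, .4, .6, .9, .2, .7, .4])
    y_test_predicted = np.array([1, 0, 1, 1, 0, 1, 0])

    def tpr_fpr(y_true, scores, threshold):
        # Classify as positive everything at or above the threshold,
        # then compute the true and false positive rates by hand.
        pred = scores >= threshold
        tpr = np.sum(pred & (y_true == 1)) / np.sum(y_true == 1)
        fpr = np.sum(pred & (y_true == 0)) / np.sum(y_true == 0)
        return tpr, fpr

    print(tpr_fpr(y_test, p_pred, 0.65))            # (0.5, 0.0)
    print(tpr_fpr(y_test, y_test_predicted, 0.65))  # (0.75, 0.333...)

    print(roc_auc_score(y_test, p_pred))            # ~0.83, uses the full ranking
    print(roc_auc_score(y_test, y_test_predicted))  # ~0.71, thresholded labels lose information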

But to really understand this, I suggest plotting the two ROC curves and comparing them; that makes the difference obvious.
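
For example, something along these lines (a sketch, assuming matplotlib is installed and y_test, p_pred, y_test_predicted are defined as in the question, with p_pred being the two-column output of predict_proba):

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, auc

    # Curve from the probabilities: one point per distinct score, so it is fairly smooth.
    fpr_p, tpr_p, _ = roc_curve(y_test, p_pred[:, 1])
    # Curve from the hard 0/1 labels: effectively a single non-trivial threshold.
    fpr_l, tpr_l, _ = roc_curve(y_test, y_test_predicted)

    plt.plot(fpr_p, tpr_p, label='predict_proba, AUC = %.2f' % auc(fpr_p, tpr_p))
    plt.plot(fpr_l, tpr_l, label='predict, AUC = %.2f' % auc(fpr_l, tpr_l))
    plt.plot([0, 1], [0, 1], linestyle='--', color='grey')  # chance line
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.legend()
    plt.show()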

Hope this helps!
