Scikit-learn: roc_auc_score

I am using the roc_auc_score function from scikit-learn to evaluate the performance of my model. However, I get very different values depending on whether I use predict() or predict_proba():

    from sklearn.metrics import roc_curve, auc, roc_auc_score

    p_pred = forest.predict_proba(x_test)
    y_test_predicted = forest.predict(x_test)

    fpr, tpr, _ = roc_curve(y_test, p_pred[:, 1])
    roc_auc = auc(fpr, tpr)

    roc_auc_score(y_test, y_test_predicted)  # = 0.68
    roc_auc_score(y_test, p_pred[:, 1])      # = 0.93

Could you advise on this?

Thanks in advance

python scikit-learn machine-learning auc
1 answer

First, look at the difference between predict and predict_proba. The former predicts a class for a set of features, whereas the latter predicts the probability of each class.
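
In scikit-learn terms (a rough sketch, assuming a fitted binary classifier named forest and a test set x_test, as in the question):

    labels = forest.predict(x_test)        # hard class labels, e.g. array([1, 0, 1, ...])
    probas = forest.predict_proba(x_test)  # one probability per class per sample,
                                           # e.g. array([[0.3, 0.7], [0.8, 0.2], ...])
    # probas[:, 1] is the predicted probability of the positive class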

What you are seeing is the effect of the rounding implicit in the binary output y_test_predicted: it consists only of 0s and 1s, whereas p_pred consists of floating-point values between 0 and 1. roc_auc_score sweeps the decision threshold and computes the true positive rate and false positive rate at each setting, so the two inputs produce very different scores.

Consider the following case:

    y_test           = [1, 0, 0, 1, 0, 1, 1]
    p_pred           = [.6, .4, .6, .9, .2, .7, .4]
    y_test_predicted = [1, 0, 1, 1, 0, 1, 0]

Note that the ROC curve is built by considering every possible cutoff threshold. Now consider a threshold of 0.65...

The p_pred case gives:

    TPR = 0.5, FPR = 0

while the y_test_predicted case gives:

    TPR = 0.75, FPR = 0.33 (i.e. 1/3)

You can probably see that since these two points are different, the areas under the two curves will also be quite different.
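
As a quick sanity check of those numbers, here is a standalone sketch using the toy arrays above (plain NumPy plus roc_auc_score; the AUC values in the comments apply to this toy data only):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_test           = np.array([1, 0, 0, 1, 0, 1, 1])
    p_pred           = np.array([.6, .4, .6, .9, .2, .7, .4])
    y_test_predicted = np.array([1, 0, 1, 1, 0, 1, 0])

    def tpr_fpr(y_true, scores, threshold):
        # Classify as positive everything at or above the threshold,
        # then compute the true and false positive rates by hand.
        pred = scores >= threshold
        tpr = np.sum(pred & (y_true == 1)) / np.sum(y_true == 1)
        fpr = np.sum(pred & (y_true == 0)) / np.sum(y_true == 0)
        return tpr, fpr

    print(tpr_fpr(y_test, p_pred, 0.65))            # (0.5, 0.0)
    print(tpr_fpr(y_test, y_test_predicted, 0.65))  # (0.75, 0.333...)

    print(roc_auc_score(y_test, p_pred))            # ~0.83, uses the full ranking
    print(roc_auc_score(y_test, y_test_predicted))  # ~0.71, thresholded labels lose information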

But to really understand this, I suggest plotting the two ROC curves and comparing them; that makes the difference obvious.
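
For example, something along these lines (a sketch, assuming matplotlib is installed and y_test, p_pred, y_test_predicted are defined as in the question, with p_pred being the two-column output of predict_proba):

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, auc

    # Curve from the probabilities: one point per distinct score, so it is fairly smooth.
    fpr_p, tpr_p, _ = roc_curve(y_test, p_pred[:, 1])
    # Curve from the hard 0/1 labels: effectively a single non-trivial threshold.
    fpr_l, tpr_l, _ = roc_curve(y_test, y_test_predicted)

    plt.plot(fpr_p, tpr_p, label='predict_proba, AUC = %.2f' % auc(fpr_p, tpr_p))
    plt.plot(fpr_l, tpr_l, label='predict, AUC = %.2f' % auc(fpr_l, tpr_l))
    plt.plot([0, 1], [0, 1], linestyle='--', color='grey')  # chance line
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.legend()
    plt.show()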

Hope this helps!
