First, look at the difference between predict and predict_proba. The former predicts a class for a given set of features, while the latter predicts the probabilities of the various classes.
What you are seeing is the effect of the rounding (thresholding) that is implicit in the binary predictions y_test_predicted. y_test_predicted consists of 1s and 0s, whereas p_pred consists of floating point values between 0 and 1. roc_auc_score sweeps over the possible decision thresholds and computes the true positive rate and false positive rate at each one, so the score can look quite different depending on which of the two inputs you give it.
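Here is a minimal sketch of the difference (assuming scikit-learn; the toy dataset and classifier are just placeholders, not from the original question). Scoring the same model with hard labels versus predicted probabilities typically yields different AUC values:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy data and model, purely for illustration
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

y_test_predicted = clf.predict(X_test)    # hard 0/1 labels
p_pred = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print(roc_auc_score(y_test, y_test_predicted))  # AUC from thresholded labels
print(roc_auc_score(y_test, p_pred))            # AUC from probabilities (usually higher)
```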
Consider the case when:
```
y_test           = [ 1,  0,  0,  1,  0,  1,  1]
p_pred           = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [ 1,  0,  1,  1,  0,  1,  0]
```
Note that the ROC curve is generated by considering all cutoff thresholds. Now consider a threshold of 0.65...
The p_pred case (thresholded at 0.65) gives:
TPR = 0.5, FPR = 0,
and the y_test_predicted case gives:
TPR = 0.75, FPR = 1/3 ≈ 0.33.
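You can hand-check those numbers with a few lines of plain Python (the tpr_fpr helper below is just an illustrative name, not a library function):

```python
y_test           = [1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [1, 0, 1, 1, 0, 1, 0]

def tpr_fpr(y_true, y_hat):
    # Count the confusion-matrix cells and return (TPR, FPR)
    tp = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

# Probabilities thresholded at 0.65 -> predictions [0, 0, 0, 1, 0, 1, 0]
print(tpr_fpr(y_test, [1 if p >= 0.65 else 0 for p in p_pred]))  # (0.5, 0.0)

# The fixed 0/1 predictions
print(tpr_fpr(y_test, y_test_predicted))                         # (0.75, 0.333...)
```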
You can probably see that since these two points are different, the areas under the two resulting curves will also be quite different.
But to really understand the difference, I suggest plotting the two ROC curves and looking at them side by side, as in the sketch below.
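A quick way to do that (assuming scikit-learn and matplotlib are available) is to build one curve from the probabilities and one from the 0/1 labels:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_test           = [1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [1, 0, 1, 1, 0, 1, 0]

for scores, label in [(p_pred, "predict_proba"), (y_test_predicted, "predict")]:
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, marker="o",
             label=f"{label} (AUC = {roc_auc_score(y_test, scores):.2f})")

plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```

The curve built from the hard labels only has one interior point (one threshold), while the curve built from the probabilities has a point per distinct score, which is why the two areas differ.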
Hope this helps!
AN6U5