It's my first time using scikit-learn to work with metrics, and I want to plot a ROC curve with this library.
The curve it produces says AUC = 1.00, which, as far as I know, is incorrect. Here is the code:
from sklearn.metrics import roc_curve, auc
import pylab as pl

def show_roc(test_target, predicted_probs):
    # compute the ROC points and the area under the curve
    fpr, tpr, thresholds = roc_curve(test_target, predicted_probs)
    roc_auc = auc(fpr, tpr)

    pl.clf()
    pl.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
    pl.plot([0, 1], [0, 1], 'k--')
    pl.xlim([-0.1, 1.2])
    pl.ylim([-0.1, 1.2])
    pl.xlabel('False Positive Rate')
    pl.ylabel('True Positive Rate')
    pl.title('Receiver operating characteristic example')
    pl.legend(loc="lower right")
    pl.show()

actual = [1, -1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1]
prediction_probas = [0.374, 0.145, 0.263, 0.129, 0.215, 0.538, 0.24, 0.183, 0.402, 0.2, 0.281,
                     0.277, 0.222, 0.204, 0.193, 0.171, 0.401, 0.204, 0.213, 0.182]

show_roc(actual, prediction_probas)
For this first set, here is the graph:
http://i.stack.imgur.com/pa93c.png
The probabilities are quite low, especially for the positive examples, so I don't understand why it draws the ideal ROC curve for these inputs.
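In case it helps to see the raw numbers, here is a small diagnostic snippet (a minimal sketch that just reuses the imports and the actual / prediction_probas lists from the code above) printing what roc_curve returns, plus the scores grouped by true label:

# inspect the thresholds roc_curve chose and the resulting ROC points
fpr, tpr, thresholds = roc_curve(actual, prediction_probas)
print("thresholds:", thresholds)
print("fpr:", fpr)
print("tpr:", tpr)
# the same scores grouped by their true label, to see how they are ranked
print("positive scores:", sorted(p for a, p in zip(actual, prediction_probas) if a == 1))
print("negative scores:", sorted(p for a, p in zip(actual, prediction_probas) if a == -1))

For comparison, here is a second, simpler set: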
actual = [1, 1, 1, 0, 0, 0]
prediction_probas = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]

show_roc(actual, prediction_probas)
For the second set, here is the graph output:

This seems more reasonable, and I included it for comparison.
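As a cross-check on this second set, I also compared the trapezoidal auc(fpr, tpr) against roc_auc_score (a minimal sketch; I'm assuming roc_auc_score, scikit-learn's direct AUC helper, is available in the installed version, and it reuses the second set's actual / prediction_probas lists):

from sklearn.metrics import roc_auc_score

# both should report the same area for the same labels and scores
fpr, tpr, thresholds = roc_curve(actual, prediction_probas)
print("auc(fpr, tpr):", auc(fpr, tpr))
print("roc_auc_score:", roc_auc_score(actual, prediction_probas))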
I have been reading the scikit-learn documentation for most of the day, and I'm at a dead end.