What is a threshold value in a Precision-Recall curve?

I know the concept of precision, as well as the concept of recall. But it's very difficult for me to understand the idea of a "threshold" that makes any PR curve possible.

Imagine that I am building a model that predicts the recurrence (yes or no) of cancer in patients using some reasonable classification algorithm on the relevant features. I split my data into training and test sets. Suppose I trained the model on the training data and computed precision and recall metrics on the test data.

But HOW can I draw a PR curve now? On what basis? I have only two values: one precision and one recall. I read that it is a "threshold" that allows you to get multiple precision-recall pairs. But what is this threshold? I am still a beginner and I cannot understand the concept of the threshold itself.

I see many model comparisons like the one below. But how do they get so many pairs?

[Figure: comparison of models using precision-recall curves]

precision-recall machine-learning classification auc
1 answer

First of all, you should remove the "roc" and "auc" tags, since a precision-recall curve is something else:

ROC Curves:

  • x-axis: False Positive Rate FPR = FP / (FP + TN) = FP / N
  • y-axis: True Positive Rate TPR = Recall = TP / (TP + FN) = TP / P

Precision-Recall curves:

  • x-axis: Recall = TP / (TP + FN) = TP / P = TPR
  • y-axis: Precision = TP / (TP + FP) = TP / PP
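To make these formulas concrete, here is a small Python sketch computing the axis values of both curves; the confusion-matrix counts are made up purely for illustration:

```python
# Made-up confusion-matrix counts, purely for illustration
TP, FP, TN, FN = 30, 10, 50, 10

P = TP + FN    # all actually positive instances
N = FP + TN    # all actually negative instances
PP = TP + FP   # all instances predicted positive

fpr = FP / N          # x-axis of the ROC curve
tpr = TP / P          # y-axis of the ROC curve (= recall = sensitivity)
recall = TP / P       # x-axis of the PR curve
precision = TP / PP   # y-axis of the PR curve

print(f"FPR={fpr:.2f}, TPR/Recall={tpr:.2f}, Precision={precision:.2f}")
```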

The cancer-detection example is a binary classification problem. Your model's predictions are based on a probability: the probability of the presence (or absence) of cancer.

In general, an instance will be classified as A if P(A) > 0.5 (your threshold). For this threshold value you get exactly one Recall-Precision pair, based on the resulting True Positives, True Negatives, False Positives and False Negatives.
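Here is a minimal sketch of how one threshold produces one pair; the labels and probabilities below are made up for illustration, standing in for your test set and your model's predicted probabilities:

```python
import numpy as np

# Hypothetical test-set data: true labels and predicted probabilities P(cancer)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_prob = np.array([0.9, 0.4, 0.65, 0.2, 0.1, 0.55, 0.8, 0.3, 0.45, 0.7])

threshold = 0.5
y_pred = (y_prob > threshold).astype(int)   # classify as "cancer" if P(A) > 0.5

TP = np.sum((y_pred == 1) & (y_true == 1))
FP = np.sum((y_pred == 1) & (y_true == 0))
FN = np.sum((y_pred == 0) & (y_true == 1))

precision = TP / (TP + FP)
recall = TP / (TP + FN)
print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```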

Now, when you change the threshold away from 0.5, you get a different result (a different pair). For example, you could already classify a patient as "cancer" for P(A) > 0.3. This will decrease precision and increase recall: you would tell some people they have cancer even though they do not, in order to make sure that patients who do have cancer receive the necessary treatment. This is the intuitive trade-off between TPR and FPR, between precision and recall, and between sensitivity and specificity.
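Sweeping the threshold over the whole range is exactly what produces the many pairs you see in a PR curve. A sketch, reusing the hypothetical arrays from above (scikit-learn's precision_recall_curve performs the same sweep over every distinct predicted probability):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])                        # same hypothetical data
y_prob = np.array([0.9, 0.4, 0.65, 0.2, 0.1, 0.55, 0.8, 0.3, 0.45, 0.7])

# Manual sweep: every threshold yields its own (precision, recall) pair
for t in [0.3, 0.5, 0.7]:
    y_pred = (y_prob > t).astype(int)
    TP = np.sum((y_pred == 1) & (y_true == 1))
    FP = np.sum((y_pred == 1) & (y_true == 0))
    FN = np.sum((y_pred == 0) & (y_true == 1))
    print(f"t={t}: precision={TP / (TP + FP):.2f}, recall={TP / (TP + FN):.2f}")

# scikit-learn does the same sweep over all distinct predicted probabilities
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
```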

Let me add these terms, since you will see them most often in biostatistics:

  • Sensitivity = TP / P = Recall = TPR
  • Specificity = TN / N = (1 - FPR)

ROC and Precision-Recall curves visualize all of these possible threshold values for your classifier.
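As a sketch, with scikit-learn and matplotlib you can obtain both curves directly from the true test labels and predicted probabilities; y_true and y_prob below stand in for your own model's outputs (here, the same hypothetical arrays as above):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, precision_recall_curve

# Stand-ins for your test labels and your model's predicted probabilities
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_prob = np.array([0.9, 0.4, 0.65, 0.2, 0.1, 0.55, 0.8, 0.3, 0.45, 0.7])

fpr, tpr, _ = roc_curve(y_true, y_prob)
precision, recall, _ = precision_recall_curve(y_true, y_prob)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(fpr, tpr)
ax1.set(xlabel="False Positive Rate", ylabel="True Positive Rate", title="ROC curve")
ax2.plot(recall, precision)
ax2.set(xlabel="Recall", ylabel="Precision", title="Precision-Recall curve")
plt.show()
```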

You should consider these metrics whenever accuracy alone is not a suitable measure of quality. Classifying all patients as "cancer-free" can give you a very high accuracy, but the corresponding ROC and Precision-Recall values would be nothing but trivial 1s and 0s (recall and TPR stuck at 0), immediately exposing the problem.
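A quick sketch of that failure mode, with hypothetical, heavily imbalanced labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 95 healthy patients, 5 with recurring cancer (hypothetical, imbalanced data)
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)      # classify everyone as "cancer-free"

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks great
print(recall_score(y_true, y_pred))    # 0.0  -- catches no cancer case at all
```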
