Recall measures: of all the actual positives in the data, what fraction did the model correctly identify as positive?
The denominator is all the positives — both those the model caught (TP) and those it missed (FN).
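Written with confusion-matrix counts, where TP is the number of true positives and FN the number of false negatives, the definition above is:

$$
\text{Recall} = \frac{TP}{TP + FN}
$$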
The metric goes by three names that mean exactly the same thing:
- Recall is the standard term in information retrieval and machine learning.
- Sensitivity is the standard term in medicine and biostatistics.
- True positive rate (TPR) is the standard term in signal detection and on the y-axis of ROC curves.
When recall matters most
High recall means the model rarely misses a positive. The applications that care most about recall are those where missing a positive is catastrophic:
- Medical screening, where missing a sick patient (FN) is much worse than wrongly flagging a healthy one (FP, leading to a follow-up test).
- Security and fraud detection, where missing a real intrusion is much worse than a false alarm.
- Safety-critical alerts, where any missed event has serious consequences.
In these settings, classifiers are tuned for high recall, often at the cost of accepting more false positives. The decision threshold of a probabilistic classifier (0.5 by default in logistic regression) can be lowered to catch more positives: TPR goes up, but the false positive rate (FPR) also goes up.
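The threshold trade-off can be illustrated with a small sketch. The labels and scores here are made up for illustration, and `tpr_fpr` is a hypothetical helper rather than a library function; it treats any score at or above the threshold as a positive prediction.

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


# Made-up labels (1 = positive) and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.3):
    tpr, fpr = tpr_fpr(y_true, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
# threshold=0.5: TPR=0.50, FPR=0.17
# threshold=0.3: TPR=1.00, FPR=0.50
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches every positive, but half of the negatives are now flagged as well.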
Relationship to other metrics
Recall and precision are easy to confuse but are different. Both have true positives (TP) in the numerator, but they divide by different things:
- Recall divides by all actual positives.
- Precision divides by all predicted positives.
A model that always predicts positive has perfect recall (no positives are missed) but terrible precision (most predictions are wrong). A model that predicts positive rarely but only when very confident has high precision but low recall. They’re in tension. F1 score is the standard way to combine them into a single number.
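The two extremes can be checked on a toy example. The data is made up, and `precision_recall_f1` is a hypothetical helper; returning 0.0 when a denominator is zero is an assumption of this sketch, not a universal convention.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up data: 2 positives among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_positive = [1] * 10        # predicts positive for everything
cautious = [1] + [0] * 9          # predicts positive once, correctly

for name, y_pred in (("always positive", always_positive), ("cautious", cautious)):
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(f"{name}: precision={p:.2f}, recall={r:.2f}, F1={f1:.2f}")
# always positive: precision=0.20, recall=1.00, F1=0.33
# cautious: precision=1.00, recall=0.50, F1=0.67
```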
The complementary metric on the negative side is specificity — of actual negatives, what fraction did the model correctly identify? Together, recall and specificity describe how the classifier behaves on the two classes separately.
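With TN as the number of true negatives and FP the number of false positives, specificity and its link to the false positive rate can be written as:

$$
\text{Specificity} = \frac{TN}{TN + FP} = 1 - \text{FPR}
$$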
In scikit-learn
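```python
from sklearn.metrics import recall_score

# y_test: true labels; y_pred: the model's predicted labels
recall = recall_score(y_test, y_pred)
```

For multi-class classification, recall is computed per class and averaged. The default, average='binary', only works for two-class targets, so the average= parameter must be set to choose how: 'macro' for an unweighted mean, 'weighted' for weighting by class support.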
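To make the difference between the two averaging modes concrete, here is a hand-rolled sketch of what they compute. It does not call scikit-learn; `per_class_recall` is a hypothetical helper and the labels are made up for illustration.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return ({class: recall}, {class: support}) for multi-class labels."""
    support = Counter(y_true)  # number of true samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = {c: correct[c] / support[c] for c in sorted(support)}
    return recalls, support


# Made-up labels with uneven class support: 6, 3 and 1 samples.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 2, 2]

recalls, support = per_class_recall(y_true, y_pred)

# 'macro': every class counts equally, regardless of size.
macro = sum(recalls.values()) / len(recalls)

# 'weighted': each class counts in proportion to its support.
weighted = sum(recalls[c] * support[c] for c in recalls) / len(y_true)

print(f"macro={macro:.3f}, weighted={weighted:.3f}")
# macro=0.722, weighted=0.700
```

The macro average is pulled up here by the single-sample class with perfect recall, while the weighted average is dominated by the largest class.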