Precision measures: of all the predictions the model made as positive, what fraction were actually positive?

$$\text{Precision} = \frac{TP}{TP + FP}$$

The denominator is all predicted positives, both correct (true positives, TP) and incorrect (false positives, FP). High precision means that when the model says positive, it is usually right.
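As a minimal sketch, precision can be computed directly from true and predicted labels by counting true positives and false positives. The function name precision and the toy label lists are illustrative assumptions. Returning 0.0 when the model makes no positive predictions (so the ratio is undefined) is a convention chosen here, not the only possible one.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are actually positive: TP / (TP + FP)."""
    # True positives: predicted positive and actually positive
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False positives: predicted positive but actually negative
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # No positive predictions at all: precision is undefined; return 0.0 by convention
        return 0.0
    return tp / predicted_positives


# Hypothetical labels: 4 positive predictions, 3 of them correct
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(precision(y_true, y_pred))  # 0.75
```

Note that the missed positive at index 3 (a false negative) does not affect precision at all; only the predicted-positive column matters.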
In information retrieval, where each returned search result counts as a positive prediction, precision is the fraction of returned results that are actually relevant. A search engine with high precision shows you mostly results you wanted; one with low precision shows you a lot of noise.
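The retrieval reading can be sketched the same way: treat the returned list as the predicted positives and a set of known-relevant documents as the actual positives. The document IDs and the function name retrieval_precision below are hypothetical, and the sketch assumes the result list contains no duplicates.

```python
def retrieval_precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (assumes no duplicate results)."""
    if not retrieved:
        # Nothing returned: no predicted positives, return 0.0 by convention
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant_set)
    return hits / len(retrieved)


# Hypothetical search: 5 results returned, 2 of which are relevant
results = ["doc3", "doc7", "doc1", "doc9", "doc4"]
relevant = {"doc1", "doc3", "doc5"}
print(retrieval_precision(results, relevant))  # 0.4
```

The relevant document doc5 was never returned; that miss lowers recall, not precision.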
In other classification contexts: a spam filter with high precision rarely flags legitimate email as spam. A fraud detector with high precision rarely blocks legitimate transactions. A cancer screening tool with high precision rarely raises false alarms.
Precision vs. recall
Precision and recall both have TP (true positives) in the numerator, but their denominators differ:
- Precision divides by all predicted positives.
- Recall divides by all actual positives.
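A model that always predicts positive has perfect recall (no positives missed), but its precision equals the fraction of examples that are actually positive, which is terrible when positives are rare. A model that predicts positive rarely, and only when very confident, has high precision but low recall. They are in tension: raising the decision threshold typically trades recall for precision.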
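The tension can be made concrete with a small sketch that computes both metrics (recall divides by actual positives, precision by predicted positives) and sweeps a decision threshold over model scores. The scores and labels are invented for illustration, and precision_recall is a hypothetical helper, not a library function.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0  # divide by predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # divide by actual positives
    return precision, recall


# Hypothetical data: 2 actual positives among 10 examples, plus model scores
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.55, 0.3, 0.05, 0.45]

# Threshold 0.0 is the "always predict positive" model
for threshold in (0.0, 0.5, 0.7):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    p, r = precision_recall(y_true, y_pred)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

With these made-up scores the three thresholds give precision/recall of 0.20/1.00, 0.67/1.00 and 1.00/0.50. The always-positive model's precision of 0.20 is exactly the share of positives in the data.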
Different applications weight them differently:
- A spam filter cares deeply about precision — we don’t want legitimate email going to spam — and somewhat less about recall (a few spam messages slipping through is annoying but tolerable).
- A cancer-screening tool cares deeply about recall — a missed diagnosis is catastrophic — and accepts a moderate false-positive rate (a follow-up test is inconvenient but not dangerous).
The metric that matters depends on the relative costs of each kind of mistake.
F1 combines them
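F1 score is the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

A harmonic mean is pulled toward the smaller of its inputs, so both precision and recall must be high for F1 to be high. F1 is the standard one-number summary when we don't have a strong reason to prefer precision or recall.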
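A short sketch shows why the harmonic mean is used rather than the ordinary average: for a lopsided model, the arithmetic mean still looks respectable while F1 stays close to the weaker metric. The helper name f1 and the example numbers are illustrative; returning 0.0 when both inputs are zero is a convention chosen here.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Both metrics are zero: the formula would divide by zero; return 0.0 by convention
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical lopsided model: perfect recall, poor precision
p, r = 0.1, 1.0
print(f"arithmetic mean = {(p + r) / 2:.2f}")  # 0.55
print(f"F1 = {f1(p, r):.2f}")                  # 0.18
```

When precision and recall are equal, F1 equals that shared value; the gap between F1 and the arithmetic mean grows as the two metrics diverge.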
In scikit-learn
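```python
from sklearn.metrics import precision_score
# y_test: true labels; y_pred: predicted labels (e.g. from model.predict(X_test))
prec = precision_score(y_test, y_pred)
```

For multi-class classification, precision is computed per class, and the average= parameter controls how the per-class scores are combined into one number.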
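A small multi-class sketch, using invented three-class labels, compares a few settings of average=. With average=None scikit-learn returns one precision per class; "macro" takes their unweighted mean; "micro" pools TP and FP across all classes (for single-label multi-class data this equals accuracy). The default average="binary" raises an error on multi-class targets, so an explicit choice is needed.

```python
from sklearn.metrics import precision_score

# Hypothetical 3-class labels
y_true = [0, 1, 2, 0, 1, 2, 0, 2]
y_pred = [0, 2, 2, 0, 1, 1, 0, 2]

# One precision per class: class 0 -> 3/3, class 1 -> 1/2, class 2 -> 2/3
per_class = precision_score(y_true, y_pred, average=None)
# Unweighted mean of the per-class values
macro = precision_score(y_true, y_pred, average="macro")
# Pooled over all classes: 6 correct out of 8 predictions
micro = precision_score(y_true, y_pred, average="micro")

print(per_class, macro, micro)
```

Macro averaging treats every class equally, so a rare class with poor precision drags the score down; micro averaging is dominated by the most frequently predicted classes.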