False negative

A false negative (FN) is a test example whose true class is positive but which the classifier predicted as negative. The model missed a positive: said the high-quality wine was low-quality, said the spam email was legitimate, said the cancer patient was healthy. In statistical hypothesis testing this kind of error is called a Type II error.

False negatives are one of the two off-diagonal entries of the Confusion matrix, alongside False positive (FP). They’re the mistakes the classifier makes by being too conservative about calling something positive.
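To make the four cells concrete, here is a minimal sketch that tallies a binary confusion matrix from paired labels. The function name `confusion_counts`, the 1/0 label encoding, and the toy data are assumptions made for illustration, not part of any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # positive, correctly caught
        elif truth == 0 and pred == 1:
            fp += 1          # negative, wrongly flagged
        elif truth == 1 and pred == 0:
            fn += 1          # positive, missed: the false negative
        else:
            tn += 1          # negative, correctly passed over
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical toy data: 4 actual positives, 4 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 2, 'TN': 3}
```

Two of the four actual positives were predicted negative, so FN = 2 here.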

The metric most directly affected is recall (also called sensitivity, true positive rate):

$\text{recall} = \frac{TP}{TP + FN}$

Recall is the fraction of all actual positives that the classifier caught. Because FN sits in the denominator, a high FN count drags recall down. The F1 score, $F_1 = 2 TP / (2 TP + FP + FN)$, is also affected, since FN appears in its denominator too.
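The two formulas can be checked with a few lines of arithmetic. This is a sketch with made-up counts; the helper names `recall` and `f1_score` are illustrative, and returning 0.0 when a denominator is zero is a convention assumed here, not something the definitions dictate.

```python
def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives (assumed convention)."""
    total_positives = tp + fn
    return tp / total_positives if total_positives else 0.0


def f1_score(tp, fp, fn):
    """2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts: same TP and FP, but the second model misses twice as many positives.
few_misses = {"tp": 80, "fp": 10, "fn": 20}
many_misses = {"tp": 80, "fp": 10, "fn": 40}

for name, c in (("few misses", few_misses), ("many misses", many_misses)):
    r = recall(c["tp"], c["fn"])
    f1 = f1_score(c["tp"], c["fp"], c["fn"])
    print(f"{name}: recall={r:.3f}, F1={f1:.3f}")
```

Doubling FN from 20 to 40 lowers recall from 80/100 to 80/120 and F1 from 160/190 to 160/210, with TP and FP unchanged.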

When false negatives matter most

The cost of a false negative depends on the application:

In cancer screening, a false negative is a sick patient told they’re healthy. They go home untreated and the disease progresses, which can be catastrophic. Cancer-screening tools are therefore designed for high recall (low FN), at the cost of accepting more false positives.
In security alarms, a false negative is an actual intrusion that goes undetected. The cost depends on what’s being protected.
In diagnostic tests for rare but serious diseases, the same logic applies: it is better to flag a few healthy patients (FP, leading to follow-up tests) than to miss any sick ones (FN, leading to untreated illness).
In spam filtering, a false negative is a spam message delivered to the inbox, annoying but tolerable. Spam-filter designers care less about FNs than about FPs (lost legitimate email).

The tradeoff with false positives

False negatives and false positives are in inherent tension. Lowering a classifier’s decision threshold catches more positives (more TP, fewer FN) but produces more false positives. Raising the threshold does the reverse: fewer false alarms (fewer FP) but more missed positives (more FN).
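The threshold effect is easy to see on a handful of scored examples. The sketch below uses invented scores and labels, and assumes an example is predicted positive when its score is at or above the threshold; `fn_fp_at` is a hypothetical helper name.

```python
# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]


def fn_fp_at(threshold, scores, labels):
    """Return (FN, FP) when predicting positive for score >= threshold."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fn, fp


for t in (0.25, 0.50, 0.75):
    fn, fp = fn_fp_at(t, scores, labels)
    print(f"threshold={t:.2f}: FN={fn}, FP={fp}")
# threshold=0.25: FN=0, FP=2
# threshold=0.50: FN=1, FP=1
# threshold=0.75: FN=2, FP=0
```

On this toy data, each step up in the threshold trades one false positive for one false negative.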

The ROC curve shows this tradeoff across all thresholds. Different applications pick different operating points: spam filters operate at high precision (few FPs); cancer screens operate at high recall (few FNs). The right tradeoff depends on the relative costs of each kind of mistake.
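One way to turn relative costs into an operating point is to pick the threshold that minimizes total misclassification cost on held-out data. The sketch below is only an illustration under stated assumptions: the scores, labels, candidate thresholds, and cost values are invented, and `total_cost` and `best_threshold` are hypothetical helper names.

```python
# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]


def total_cost(threshold, scores, labels, cost_fn, cost_fp):
    """Total cost of mistakes when predicting positive for score >= threshold."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(scores, labels, cost_fn, cost_fp, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(t, scores, labels, cost_fn, cost_fp))


candidates = (0.25, 0.50, 0.75)

# Screening-like setting: a missed positive is assumed 10x as costly as a false alarm.
screening = best_threshold(scores, labels, cost_fn=10, cost_fp=1, candidates=candidates)

# Spam-like setting: a false alarm is assumed 10x as costly as a missed positive.
spam = best_threshold(scores, labels, cost_fn=1, cost_fp=10, candidates=candidates)

print(screening, spam)  # 0.25 0.75
```

The same classifier ends up at opposite ends of the threshold range depending only on which mistake is priced higher.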

Idriss Rami — Notes
