Decision threshold

The decision threshold in classification with a probabilistic model is the cutoff at which a predicted probability is converted to a hard class prediction. A logistic regression classifier outputs a probability $\hat{p}(y = 1 \mid x)$, a real number between 0 and 1, and the threshold rule says: predict 1 if $\hat{p} \geq \text{threshold}$, else 0. The default threshold is 0.5.

But the threshold is a knob we can turn. Nothing forces 0.5.

Lower thresholds make the model more eager to predict positive. More examples cross 0.3 than cross 0.5, so more get predicted positive, increasing both true positives (good) and false positives (bad). Recall goes up; specificity goes down.
Higher thresholds make the model more conservative. Only the most confident predictions cross 0.7, so fewer positives are predicted, but the ones we do predict are more likely to be correct. Precision usually goes up (it is not guaranteed to rise at every step); recall goes down.
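
To make the trade-off concrete, here is a minimal sketch that counts outcomes at thresholds 0.3, 0.5 and 0.7. The labels and probabilities are made up for illustration, and confusion_counts is a hypothetical helper, not a library function.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, fn, tn) when predicting 1 iff prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up labels and predicted probabilities (assumption, illustration only).
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.85, 0.75, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10, 0.05]

for threshold in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, threshold)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)  # safe here: each threshold predicts some positives
    print(f"t={threshold}: TP={tp} FP={fp} recall={recall:.2f} "
          f"specificity={specificity:.2f} precision={precision:.2f}")
# t=0.3: TP=5 FP=2 recall=1.00 specificity=0.60 precision=0.71
# t=0.5: TP=4 FP=1 recall=0.80 specificity=0.80 precision=0.80
# t=0.7: TP=3 FP=0 recall=0.60 specificity=1.00 precision=1.00
```

On this toy data, moving from 0.3 to 0.7 trades recall (1.00 down to 0.60) for specificity and precision (both up to 1.00).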

Sweeping the threshold

As the threshold sweeps from 0 to 1, the model's true positive rate (TPR, the same thing as recall) and false positive rate (FPR, equal to 1 minus specificity) change:

At threshold 0: every example is predicted positive. TPR $= 1$, FPR $= 1$.
At threshold 1: no example is predicted positive (barring a score of exactly 1). TPR $= 0$, FPR $= 0$.

In between, the model traces out a curve in (FPR, TPR) space. That curve is the ROC curve, and the area under it (AUC) is a standard threshold-independent summary of classifier quality.
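
The sweep can be sketched in a few lines of plain Python. This is an illustrative implementation under simple assumptions (made-up data, one candidate threshold per distinct score, trapezoidal area); roc_points and auc are hypothetical helper names, and a library routine would normally be used in practice.

```python
def roc_points(y_true, y_prob):
    """Return (fpr, tpr) pairs, ordered from the strictest threshold to the loosest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Start above every score (nothing predicted positive), then lower the
    # threshold to each distinct score in descending order.
    thresholds = [float("inf")] + sorted(set(y_prob), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the piecewise-linear curve through the points (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and predicted probabilities (assumption, illustration only).
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.85, 0.75, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10, 0.05]

points = roc_points(y_true, y_prob)
print(points[0], points[-1])        # (0.0, 0.0) (1.0, 1.0)
print(round(auc(points), 2))        # 0.96
```

The curve starts at (0, 0) and ends at (1, 1), matching the two extremes above. The AUC of 0.96 reflects that 24 of the 25 positive-negative pairs in the toy data are ranked correctly.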

Picking the operating threshold

The right threshold depends on the costs of each kind of mistake:

For a cancer screen, missing a positive (FN) is catastrophic. Use a low threshold: high TPR, accept high FPR.
For a spam filter, flagging legitimate email (FP) is harmful. Use a high threshold: high precision, accept lower recall.
For a balanced task with equal costs, the default 0.5 is often fine.
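
One way to turn these cost considerations into a threshold is to score each candidate by its total misclassification cost. The sketch below assumes made-up data and made-up cost ratios (10:1 in each direction); total_cost and best_threshold are hypothetical helpers.

```python
def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Total cost of the mistakes made when predicting 1 iff prob >= threshold."""
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(y_true, y_prob, candidates, cost_fn, cost_fp):
    """Return the candidate with the lowest total cost (first one wins on ties)."""
    return min(candidates,
               key=lambda t: total_cost(y_true, y_prob, t, cost_fn, cost_fp))


# Made-up labels and predicted probabilities (assumption, illustration only).
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.85, 0.75, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10, 0.05]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Screening-like setting: a miss (FN) is assumed 10x as costly as a false alarm.
screening = best_threshold(y_true, y_prob, candidates, cost_fn=10, cost_fp=1)
# Spam-like setting: a false alarm (FP) is assumed 10x as costly as a miss.
spam = best_threshold(y_true, y_prob, candidates, cost_fn=1, cost_fp=10)
print(screening, spam)  # 0.4 0.6
```

With costly false negatives the chosen threshold drops below 0.5; with costly false positives it rises above it. The actual cost ratios have to come from the application.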

Some applications treat the threshold as a hyperparameter, tuned on a validation set against the cost function relevant to the application. Others pick a threshold to hit a target metric, e.g. recall at least 0.95, then maximize precision subject to that.
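
The target-metric approach can be sketched as a constrained search: keep only the thresholds that meet the recall target, then take the one with the best precision. The data below is made up and stands in for a validation set; pick_threshold is a hypothetical helper.

```python
def pick_threshold(y_true, y_prob, min_recall):
    """Among thresholds with recall >= min_recall, return the one with the
    highest precision as (precision, threshold, recall), or None if none qualify."""
    n_pos = sum(y_true)
    best = None
    # Each distinct score is a candidate threshold, tried from lowest to highest.
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        recall = tp / n_pos
        if recall < min_recall:
            continue  # constraint violated, skip this threshold
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is itself a score
        if best is None or precision > best[0]:
            best = (precision, t, recall)
    return best


# Made-up validation labels and probabilities (assumption, illustration only).
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.85, 0.75, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10, 0.05]

strict = pick_threshold(y_true, y_prob, min_recall=0.95)
relaxed = pick_threshold(y_true, y_prob, min_recall=0.80)
print(strict[1], relaxed[1])  # 0.4 0.6
```

Requiring recall of at least 0.95 forces the threshold down to 0.40 here (precision 5/6), while relaxing the target to 0.80 allows 0.60 with precision 1.0.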

Accuracy and other hard-prediction metrics describe a classifier at one chosen threshold (typically 0.5). The ROC curve and AUC describe its behavior across all thresholds, which is why they're often more informative than accuracy at a fixed threshold.

In scikit-learn

clf.predict(X) uses the default threshold (0.5 for logistic regression). For a custom threshold, get the probabilities and threshold them manually:

y_prob = clf.predict_proba(X_test)[:, 1]   # probability of class 1 (second column)
y_pred = (y_prob >= 0.3).astype(int)       # custom threshold = 0.3: predict 1 if p >= 0.3

Idriss Rami — Notes
