Logistic regression

Logistic regression is a binary classification model that applies the Sigmoid function to a linear combination of inputs to produce a probability of class 1. Despite the name, it’s used as a classifier, not a regressor: "regression" refers to the linear function inside, and the final prediction is a discrete class obtained by thresholding that probability.

The model

Define a linear function of the inputs:

$z = w_{0} + w_{1} x_{1} + w_{2} x_{2} + \dots + w_{m} x_{m}$

This is the same expression used in Linear regression. It can take any real value. Wrap it in the sigmoid:

$\hat{p}(y = 1 \mid x) = \frac{1}{1 + e^{-z}}$

The notation $\hat{p}(y = 1 \mid x)$ reads as the predicted probability that the class is 1, given the input $x$. The hat means predicted by the model; the bar means given that. The sigmoid bounds the output strictly between 0 and 1, so it can be read as a probability.
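The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names and the weights `w0`, `w` and input `x` are hypothetical values chosen so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w0, w):
    """Predicted probability of class 1 for a single input vector x."""
    z = w0 + np.dot(w, x)  # linear combination; can be any real value
    return sigmoid(z)

# Hypothetical weights and input, chosen only for illustration.
w0 = -1.0
w = np.array([2.0, -0.5])
x = np.array([1.0, 2.0])

# z = -1 + 2*1 - 0.5*2 = 0, and the sigmoid of 0 is 0.5
p_hat = predict_proba(x, w0, w)
```

Large positive values of `z` push the output toward 1 and large negative values push it toward 0, without ever reaching either bound.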

To turn this into a hard classification, threshold at 0.5:

$\hat{y} = \begin{cases} 1 & \text{if } \hat{p}(y = 1 \mid x) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$

The decision boundary is the surface where $\hat{p} = 0.5$, equivalently where $z = 0$, since the sigmoid of 0 is exactly 0.5. For two input features, this is a straight line in feature space; for three, a plane; in general, a hyperplane.
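To make the equivalence between the 0.5 threshold and the sign of $z$ concrete, here is a small sketch. The `classify` helper and the weights are hypothetical; they define the boundary line $-1 + x_{1} + x_{2} = 0$ in a two-feature space.

```python
import numpy as np

def classify(X, w0, w, threshold=0.5):
    """Return hard labels and raw scores z; X has one row per example."""
    z = w0 + X @ w
    p_hat = 1.0 / (1.0 + np.exp(-z))
    return (p_hat >= threshold).astype(int), z

# Hypothetical weights: the boundary is the line -1 + x1 + x2 = 0.
w0, w = -1.0, np.array([1.0, 1.0])
X = np.array([
    [0.0, 0.0],  # z = -1: below the boundary, class 0
    [0.5, 0.5],  # z =  0: exactly on the boundary, class 1 under >=
    [2.0, 1.0],  # z =  2: above the boundary, class 1
])

labels, z = classify(X, w0, w)

# Thresholding p_hat at 0.5 agrees with checking the sign of z.
same_as_sign_of_z = np.array_equal(labels, (z >= 0).astype(int))
```

Moving the threshold away from 0.5 shifts the boundary to a parallel hyperplane, which trades off one kind of classification error against the other.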

Training

Logistic regression is trained with Gradient descent using Binary cross-entropy as the Loss function:

$J(w) = -\frac{1}{N} \sum_{i = 1}^{N} \left[ y_{i} \log f(x_{i}) + (1 - y_{i}) \log (1 - f(x_{i})) \right]$

Here $f(x_{i})$ is the model’s predicted probability of class 1 for example $i$. The $1/N$ averages over the dataset; some textbooks drop it and use the un-normalized sum, which doesn’t change the minimizer. The cross-entropy loss penalizes confident mistakes much more harshly than uncertain ones: the per-example loss grows without bound as the predicted probability of the true class approaches 0. Mean squared error is a poor fit here: MSE combined with the sigmoid produces a non-convex loss surface, so gradient descent has weaker guarantees and can stall in flat regions. Cross-entropy with the sigmoid gives a convex loss, so any minimum that gradient descent reaches is a global one.
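A quick numerical check shows how unevenly the loss treats mistakes. The sketch below evaluates the formula for a single example whose true class is 1, at three predicted probabilities. The function name and the clipping constant `eps` are illustrative choices, not part of the definition.

```python
import numpy as np

def binary_cross_entropy(y, f, eps=1e-12):
    """Mean binary cross-entropy for labels y in {0, 1} and probabilities f."""
    f = np.clip(f, eps, 1.0 - eps)  # guard against log(0); eps is illustrative
    return -np.mean(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))

y = np.array([1.0])  # the true class is 1 in all three cases

loss_confident_right = binary_cross_entropy(y, np.array([0.99]))  # about 0.01
loss_uncertain = binary_cross_entropy(y, np.array([0.50]))        # log(2), about 0.69
loss_confident_wrong = binary_cross_entropy(y, np.array([0.01]))  # about 4.61
```

Going from an uncertain prediction to a confidently wrong one multiplies the loss by more than six, while being confidently right costs almost nothing.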

The training loop is the standard one:

Initialize parameters $w$ .
Compute predictions $f (x_{i})$ for every training example.
Compute the loss.
Compute the gradient $\nabla J (w)$ .
Update: $w \leftarrow w - \eta \nabla J(w)$, where $\eta$ is the learning rate.
Repeat until convergence.
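The loop above can be sketched in plain NumPy. This is a minimal illustration under several assumptions: the data is synthetic, the learning rate `eta` and iteration count `n_iters` are untuned illustrative values, a fixed iteration budget stands in for a real convergence test, and the gradient uses the standard closed form for sigmoid with cross-entropy, $\nabla J(w) = \frac{1}{N} X^{\top} (f - y)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, f, eps=1e-12):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    f = np.clip(f, eps, 1.0 - eps)
    return -np.mean(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))

def train(X, y, eta=0.1, n_iters=2000):
    """Batch gradient descent; returns weights (bias first) and loss history."""
    N, m = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])  # column of 1s carries the bias w_0
    w = np.zeros(m + 1)                   # initialize parameters
    losses = []
    for _ in range(n_iters):              # fixed budget instead of a convergence test
        f = sigmoid(Xb @ w)               # predictions for every example
        losses.append(bce(y, f))          # loss at the current weights
        grad = Xb.T @ (f - y) / N         # gradient of the mean cross-entropy
        w = w - eta * grad                # update step
    return w, losses

# Synthetic, linearly separable data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, losses = train(X, y)
Xb = np.hstack([np.ones((len(X), 1)), X])
accuracy = np.mean((sigmoid(Xb @ w) >= 0.5) == (y == 1))
```

With all weights at zero every prediction is 0.5, so the first recorded loss is $\log 2$. Library solvers generally use more sophisticated optimizers than this plain loop.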

In scikit-learn

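from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale the features, then fit the classifier, as a single estimator.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
clf.fit(X_train, y_train)           # X_train, y_train: training features and 0/1 labels
y_pred = clf.predict(X_test)        # hard class labels
y_prob = clf.predict_proba(X_test)  # class probabilities, one row per example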

predict() returns the hard class label (0 or 1). predict_proba() returns an $N \times 2$ array where the first column is the probability of class 0 and the second is the probability of class 1. Each row sums to 1. The second column is what we use to compute the ROC curve and AUC.
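As a sketch of how the second column feeds into AUC, the example below runs the same pipeline end to end. The dataset is synthetic (generated with `make_classification`) and the split sizes and random seeds are arbitrary assumptions, so the resulting score says nothing about real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
clf.fit(X_train, y_train)

y_prob = clf.predict_proba(X_test)  # shape (N, 2); columns follow clf.classes_
p_class1 = y_prob[:, 1]             # probability of class 1 for each test row
auc = roc_auc_score(y_test, p_class1)
```

AUC is computed from the probabilities rather than from `predict()` output because it measures how well the scores rank positives above negatives across all thresholds, not just at 0.5.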

max_iter=10000 controls the maximum number of optimizer iterations. The default 100 is sometimes too few for the optimizer to converge on real datasets; 10000 is a generous upper bound.

The make_pipeline(StandardScaler(), LogisticRegression(...)) pattern is the standard way to avoid Data leakage — the StandardScaler is fit only on training data and the same fitted scaling is applied to test data.
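One way to see this is to inspect the fitted scaler inside the pipeline. In the sketch below the data is synthetic and the test set is deliberately shifted (a contrived assumption) so that any leakage of test statistics into the scaler would be obvious.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, size=(100, 2))
X_test = rng.normal(loc=5.0, size=(20, 2))  # deliberately shifted test set
y_train = (X_train[:, 0] > 0).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
clf.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name.
scaler = clf.named_steps["standardscaler"]

# The scaler's statistics come from the training data only...
uses_train_mean = np.allclose(scaler.mean_, X_train.mean(axis=0))

# ...so they differ from what fitting on train and test together would give.
pooled_mean = np.vstack([X_train, X_test]).mean(axis=0)
ignores_test = not np.allclose(scaler.mean_, pooled_mean)

y_pred = clf.predict(X_test)  # test rows are scaled with the training statistics
```

The same property carries over to cross-validation: passing the whole pipeline to a cross-validation helper refits the scaler inside each fold instead of once on all the data.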

Limitations

Logistic regression is a linear classifier — its decision boundary is a hyperplane. It can’t capture non-linear class boundaries directly. The standard workarounds are:

Feature engineering: hand-craft non-linear features (squared, interaction, log-transformed) so the boundary in the engineered feature space is linear.
Kernel methods: implicitly map inputs to a higher-dimensional space where the boundary is linear.
Neural networks: stack many logistic regressions, learning a non-linear decision boundary end-to-end.
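The feature-engineering workaround can be illustrated on a toy problem where the classes form two concentric rings, so no straight line separates them. The sketch below is an assumption-laden example: the data comes from `make_circles`, the degree-2 expansion is one possible choice of engineered features, and accuracy is measured on the training data purely to show the effect.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Two concentric rings: not linearly separable in the raw features.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# Plain logistic regression on the raw two features.
linear = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))

# Same model after adding squared and interaction terms (x1^2, x1*x2, x2^2).
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LogisticRegression(max_iter=10000),
)

# Training accuracy, used here only to illustrate the difference.
acc_linear = linear.fit(X, y).score(X, y)
acc_quadratic = quadratic.fit(X, y).score(X, y)
```

The model itself is unchanged. The boundary is still a hyperplane, but in the expanded feature space, which corresponds to a curve (here roughly a circle) in the original one.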

For the canonical wine-quality and heart-disease examples in the Introduction to Data Science textbook, logistic regression gets to roughly 73-80% accuracy with AUC around 0.80 — respectable, well above random, but limited by the linearity assumption.

Idriss Rami — Notes
