Polynomial regression

Polynomial regression extends linear regression to data whose relationship to the input is non-linear. Instead of fitting a straight line $f(x) = w_{0} + w_{1} x$, we fit a polynomial:

$f (x, w) = w_{0} + w_{1} x + w_{2} x^{2} + \dots + w_{m} x^{m} = \sum_{j = 0}^{m} w_{j} x^{j}$

where $m$ is the degree of the polynomial. With $m = 1$ we recover linear regression; $m = 2$ gives a quadratic, which fits parabolic curvature; $m = 3$ gives a cubic, which can capture an inflection point. As $m$ grows, the shapes the model can represent become more flexible.

The vector $w = (w_{0}, w_{1}, \dots, w_{m})$ now has $m + 1$ components. The training problem is the same as for linear regression: find parameter values that make $f$ a good fit to the data, by minimizing a loss function (typically mean squared error) with gradient descent or a closed-form solver.
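As a minimal sketch of that training loop, plain gradient descent on the mean squared error can recover the coefficients of a quadratic. The noise-free synthetic data, the learning rate, and the step count are all illustrative assumptions, not values from these notes:

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 1 + 2x - 3x^2, no noise.
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2

m = 2                                    # polynomial degree
powers = np.arange(m + 1)                # exponents 0..m
X_pow = x[:, None] ** powers             # column j holds x**j

w = np.zeros(m + 1)                      # parameters w_0..w_m
learning_rate = 0.2                      # hand-picked for inputs in [-1, 1]
for _ in range(5000):
    residual = X_pow @ w - y                     # f(x, w) - y for every sample
    grad = 2.0 * X_pow.T @ residual / len(x)     # gradient of the MSE w.r.t. w
    w -= learning_rate * grad

print(np.round(w, 4))                    # estimates of (w_0, w_1, w_2)
```

The step size that works depends on the scale of the inputs. With unscaled inputs and higher degrees, the powers of $x$ span very different magnitudes, and gradient descent usually needs feature scaling or a closed-form solver instead.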

The trick that makes polynomial regression work with linear-regression machinery: treat the powers of $x$ as new features. Define $z_{1} = x, z_{2} = x^{2}, z_{3} = x^{3}, \dots$, and the polynomial in $x$ is exactly a linear function of the $z$’s. The model is non-linear in $x$ but still linear in the parameters $w$, so everything that works for linear regression (the closed-form solution, gradient descent, all of scikit-learn’s linear-regression tooling) works for polynomial regression on the expanded feature set.
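The feature-expansion trick can be sketched with NumPy alone: build the matrix of powers, then hand it to an ordinary least-squares solver. The synthetic cubic data and the variable names below are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): a noisy cubic, y = 0.5 - x + 2x^3 + noise.
x = rng.uniform(-2.0, 2.0, size=40)
y = 0.5 - x + 2.0 * x**3 + rng.normal(scale=0.1, size=x.size)

# Expanded features z_j = x**j for j = 0..3 (the j = 0 column is the intercept).
Z = np.vander(x, N=4, increasing=True)

# Ordinary least squares on Z: a linear-regression solve, nothing polynomial-specific.
w, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Cross-check against NumPy's polynomial fitter (it returns the highest power first).
w_polyfit = np.polyfit(x, y, deg=3)[::-1]

print("least squares on powers:", w)
print("np.polyfit             :", w_polyfit)
```

Both routes solve the same least-squares problem, so the two coefficient vectors should agree up to floating-point error.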

The tradeoff: bias and variance

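Higher degree fits more flexible shapes, but it needs more parameters, more data to estimate them well, and risks overfitting. In bias-variance terms, a low degree has high bias (it is too rigid to follow the true curve) but low variance, while a high degree has low bias but high variance (the fit changes a lot when the training sample changes). A degree-9 polynomial through 10 data points with distinct $x$ values can pass through every point exactly (zero training loss) while wiggling wildly between them, which means poor generalization to new data. Lower degree is less flexible but holds up better against noise.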
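The degree-9 case is easy to reproduce. This sketch assumes a sine curve as the hypothetical true function and Gaussian noise on 10 equally spaced training points, then compares a cubic with the degree-9 interpolant:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical ground truth, chosen only for illustration.
    return np.sin(2.0 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_new = np.linspace(0.0, 1.0, 200)       # dense grid between the training points

train_mse = {}
new_mse = {}
for degree in (3, 9):
    poly = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse[degree] = np.mean((poly(x_train) - y_train) ** 2)
    new_mse[degree] = np.mean((poly(x_new) - true_fn(x_new)) ** 2)
    print(f"degree {degree}: train MSE {train_mse[degree]:.2e}, "
          f"MSE vs. true curve {new_mse[degree]:.2e}")
```

The degree-9 training error is zero up to rounding because 10 coefficients can match 10 points exactly. How much worse it does between the points depends on the noise draw, so the second column is best read across several seeds.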

The right degree is a hyperparameter, usually picked by K-fold cross-validation: try several degrees and keep the one that generalizes best on held-out data.
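A minimal sketch of that selection loop with scikit-learn follows. The synthetic cubic data, the candidate range of degrees 1 to 8, and the choice of 5 folds are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data (an assumption): y = x^3 - x plus Gaussian noise.
X = rng.uniform(-2.0, 2.0, size=(60, 1))
y = X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.1, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # scikit-learn maximizes scores, so MSE is reported negated; flip the sign back.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best_degree = min(cv_mse, key=cv_mse.get)
for degree, mse in cv_mse.items():
    print(f"degree {degree}: CV MSE {mse:.4f}")
print("selected degree:", best_degree)
```

Degrees 1 and 2 cannot represent the cubic, so their held-out error is much larger. Differences among the higher degrees are small and can shift with the fold assignment.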

A second pathology specific to high-degree polynomials is Runge’s phenomenon: near the boundary of the training range, polynomial fits develop large oscillations that grow with the degree, even when the underlying function is smooth and well-behaved. The classic example is interpolating $f(x) = 1/(1 + 25 x^{2})$ at equally spaced points in $[-1, 1]$, where the polynomial oscillates wildly near $\pm 1$. So polynomial regression fits smooth curves well in the interior of the data, but extrapolation past the training range is unreliable, and very high degrees can fit the boundary regions worse than low degrees do.
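The classic example can be checked numerically. This sketch interpolates the function at `degree + 1` equally spaced points and records the largest error on a dense grid, along with where it occurs. The specific degrees tried (4, 10, 16) are an arbitrary choice:

```python
import numpy as np
from numpy.polynomial import Polynomial

def runge(x):
    return 1.0 / (1.0 + 25.0 * x**2)

x_dense = np.linspace(-1.0, 1.0, 2001)   # evaluation grid

max_error = {}
worst_x = {}
for degree in (4, 10, 16):
    # degree + 1 equally spaced nodes, so the fit interpolates them exactly.
    x_nodes = np.linspace(-1.0, 1.0, degree + 1)
    poly = Polynomial.fit(x_nodes, runge(x_nodes), deg=degree)

    error = np.abs(poly(x_dense) - runge(x_dense))
    max_error[degree] = error.max()
    worst_x[degree] = x_dense[error.argmax()]
    print(f"degree {degree:2d}: max |error| = {max_error[degree]:.3f} "
          f"at x = {worst_x[degree]:+.3f}")
```

The largest error grows with the degree and sits close to $\pm 1$, while the fit near the center of the interval keeps improving.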

Implementation

In scikit-learn, polynomial regression is the composition of a polynomial feature expansion and a linear regression:

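from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# X_train, y_train and X_test are assumed to be defined already;
# the X arrays must be 2-D with shape (n_samples, n_features).
model = make_pipeline(
    PolynomialFeatures(degree=3),  # expand x into [1, x, x^2, x^3]
    LinearRegression(),            # ordinary least squares on the expanded features
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)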

PolynomialFeatures(degree=3) expands each input $x$ into $[1, x, x^{2}, x^{3}]$ (for multivariate inputs it also generates the interaction terms up to total degree 3). The LinearRegression step then fits a linear model on the expanded features. There’s no separate PolynomialRegression class in scikit-learn because it’s just a pipeline of these two steps.

For more flexible non-linear models that don’t assume polynomial form, common alternatives are neural networks, decision trees, and kernel methods (kernel ridge regression, Gaussian processes).

Idriss Rami — Notes
