Polynomial regression is the natural extension of linear regression to non-linear data. Instead of fitting a straight line $y = w_0 + w_1 x$, we fit a polynomial:

$$y = w_0 + w_1 x + w_2 x^2 + \dots + w_d x^d$$

where $d$ is the degree of the polynomial. With $d = 1$ we recover linear regression. With $d = 2$ we have a quadratic, which can fit parabolic curvature. With $d = 3$ we have a cubic, which can capture inflection points. As $d$ grows, we can fit increasingly complex shapes.

The parameter vector $\mathbf{w} = (w_0, w_1, \dots, w_d)$ now has $d + 1$ components. The training problem is the same as for linear regression: find values for the parameters that make a good fit to the data, using a loss function (typically mean squared error) and gradient descent (or a closed-form solver).
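To make the training objective concrete, here is the mean-squared-error loss written out for the polynomial model. The notation is an assumption chosen for illustration: $n$ training pairs $(x_i, y_i)$, weights $w_0, \dots, w_d$, and degree $d$.

$$
L(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \Big( y_i - \sum_{j=0}^{d} w_j \, x_i^{j} \Big)^2
$$

The model is non-linear in $x$ but linear in the weights, so this loss is a convex quadratic in $\mathbf{w}$. That is why a closed-form least-squares solution exists alongside gradient descent.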

The trick that makes polynomial regression work with linear-regression machinery is to treat the powers of $x$ as new features. Define $z_j = x^j$ for $j = 1, \dots, d$, and the polynomial in $x$ is exactly a linear function of the $z_j$'s. Everything that works for linear regression (the closed-form solution, gradient descent, all of scikit-learn's linear-regression tooling) works for polynomial regression on the expanded feature set.
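The feature-expansion trick can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the toy data, the seed, and the names `Z`, `w`, and `w_polyfit` are assumptions made up for the example. It builds the matrix of powers, solves ordinary least squares on it, and cross-checks the result against `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical toy data: a noisy cubic sampled on [-2, 2].
x = np.linspace(-2.0, 2.0, 50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.1, size=x.shape)

d = 3  # polynomial degree

# Expanded features: columns are x^0, x^1, ..., x^d.
# The x^0 column of ones carries the intercept.
Z = np.vander(x, N=d + 1, increasing=True)

# Ordinary least squares on the expanded features (a plain linear fit).
w, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Cross-check: np.polyfit returns the highest power first, so reverse it.
w_polyfit = np.polyfit(x, y, deg=d)[::-1]

print("least squares on powers:", np.round(w, 3))
print("np.polyfit             :", np.round(w_polyfit, 3))
```

Both routes solve the same least-squares problem, so the two coefficient vectors should agree up to floating-point error.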

The tradeoff: bias and variance

Higher degree fits more flexible shapes, but it needs more parameters and more data to estimate them well, and it risks overfitting. A degree-9 polynomial through 10 data points with distinct inputs can pass through every point exactly (zero training loss) while wiggling wildly between them, which means poor generalization to new data. This is the high-variance end of the tradeoff. Lower degree is less flexible, so it can underfit curved data (high bias), but it is more robust to noise (low variance).
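The 10-point scenario can be reproduced with a short NumPy sketch. Everything specific here is an assumption for illustration: the sine target, the noise level, the seed, and the helper name `fit_and_score`. The code fits degrees 1, 3, and 9 to the same 10 noisy points and reports training and held-out error for each.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def true_fn(x):
    """Hypothetical smooth target used only for this illustration."""
    return np.sin(2.0 * np.pi * x)


# 10 noisy training points, plus a dense noise-free grid for evaluation.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.shape)
x_test = np.linspace(0.0, 1.0, 200)
y_test = true_fn(x_test)


def fit_and_score(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    Z_train = np.vander(x_train, N=degree + 1, increasing=True)
    Z_test = np.vander(x_test, N=degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Z_train, y_train, rcond=None)
    train_mse = np.mean((Z_train @ w - y_train) ** 2)
    test_mse = np.mean((Z_test @ w - y_test) ** 2)
    return train_mse, test_mse


results = {degree: fit_and_score(degree) for degree in (1, 3, 9)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE {train_mse:.2e}, test MSE {test_mse:.2e}")
```

With 10 parameters and 10 points, the degree-9 fit interpolates the training data, so its training error is zero up to rounding. Training error can only go down as the degree goes up, which is why it cannot be used to choose the degree; the held-out error is the number to watch, and its exact value depends on the noise draw.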

The right degree is a hyperparameter, usually picked by k-fold cross-validation: try several degrees and keep the one that generalizes best on held-out data.
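One way to run that search is with scikit-learn's `KFold` and `cross_val_score`. The sketch below is hedged: the synthetic cubic data, the candidate range of degrees 1 to 8, the choice of 5 folds, and the names `cv_mse` and `best_degree` are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: a noisy cubic in one input feature.
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=200)

# Shuffle before splitting so each fold covers the whole input range.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # scikit-learn maximizes scores, so MSE is reported negated.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best_degree = min(cv_mse, key=cv_mse.get)
for degree, mse in cv_mse.items():
    print(f"degree {degree}: mean CV MSE {mse:.3f}")
print("selected degree:", best_degree)
```

Degrees below the true one show a large held-out error from underfitting. Above it the curve tends to flatten or creep upward, so neighbouring degrees can score almost the same and the selected value may vary with the data and the fold split.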

A second pathology specific to high-degree polynomials is the Runge phenomenon: near the boundary of the training range, polynomial fits develop large oscillations that grow with the degree, even if the underlying function is smooth and well-behaved. The classic example is fitting $f(x) = \frac{1}{1 + 25x^2}$ on equally spaced points in $[-1, 1]$: the polynomial fit oscillates wildly near $x = \pm 1$. The takeaway: polynomial regression fits smooth curves well in the interior of the data, but extrapolation past the training range is unreliable, and very high degrees fit boundary regions worse than low degrees.
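The effect is easy to reproduce numerically. The sketch below uses the standard Runge function $1 / (1 + 25x^2)$ on $[-1, 1]$; the helper name `max_errors`, the choice of degrees 6, 10, and 20, and the cutoff $|x| \le 0.5$ for "interior" are assumptions for illustration. Each fit uses `degree + 1` equally spaced points, so the polynomial interpolates the samples.

```python
import numpy as np
from numpy.polynomial import Polynomial


def runge(x):
    """Runge's function: smooth everywhere on the real line."""
    return 1.0 / (1.0 + 25.0 * x**2)


x_dense = np.linspace(-1.0, 1.0, 2001)  # fine grid for measuring the error


def max_errors(degree):
    """Return (max interior error, max boundary-region error) of the fit."""
    x_nodes = np.linspace(-1.0, 1.0, degree + 1)  # equally spaced samples
    p = Polynomial.fit(x_nodes, runge(x_nodes), deg=degree)
    err = np.abs(p(x_dense) - runge(x_dense))
    interior = np.abs(x_dense) <= 0.5
    return err[interior].max(), err[~interior].max()


errors = {degree: max_errors(degree) for degree in (6, 10, 20)}
for degree, (interior_err, edge_err) in errors.items():
    print(f"degree {degree:2d}: interior {interior_err:.3e}, near edges {edge_err:.3e}")
```

The error near the edges grows with the degree even though the fit passes through every sample. Remedies outside the scope of this sketch include keeping the degree low or placing sample points more densely near the boundaries.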

Implementation

In scikit-learn, polynomial regression is the composition of a polynomial feature expansion and a linear regression:

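from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# X_train, y_train, and X_test are assumed to be existing arrays,
# with X_train and X_test of shape (n_samples, n_features).
model = make_pipeline(
    PolynomialFeatures(degree=3),  # expand inputs into polynomial features
    LinearRegression(),            # fit a linear model on the expanded features
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)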

PolynomialFeatures(degree=3) expands each input $x$ into $[1, x, x^2, x^3]$ (and includes interaction terms for multivariate inputs). The subsequent LinearRegression then fits a linear model on the expanded features. This is the standard way to do polynomial regression in scikit-learn; there's no separate PolynomialRegression class because it's a pipeline of two simpler steps.

For more flexible non-linear models that don’t assume polynomial form, neural networks, decision trees, and kernel methods (kernel ridge regression, Gaussian processes) are the alternatives.