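The directional derivative of a scalar function $f$ at a point $\mathbf{x}_0$, in the direction of a unit vector $\hat{\mathbf{u}}$, is the rate of change of $f$ along that direction:

$$D_{\hat{\mathbf{u}}} f(\mathbf{x}_0) = \nabla f(\mathbf{x}_0) \cdot \hat{\mathbf{u}} = |\nabla f(\mathbf{x}_0)| \cos\theta$$

where $\theta$ is the angle between $\nabla f$ (gradient) and $\hat{\mathbf{u}}$. Use the dot-product form to compute; the $\cos\theta$ form gives the geometry.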

What it means

The partial derivative $\partial f / \partial x$ tells you how $f$ changes when you move in $x$. The directional derivative generalizes this to any direction $\hat{\mathbf{u}}$: it tells you how fast $f$ changes when you walk in the direction $\hat{\mathbf{u}}$, per unit distance traveled.

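For $\hat{\mathbf{u}} = \hat{\mathbf{x}}$: $D_{\hat{\mathbf{x}}} f = \partial f / \partial x$. For $\hat{\mathbf{u}} = \hat{\mathbf{y}}$: $D_{\hat{\mathbf{y}}} f = \partial f / \partial y$. For arbitrary $\hat{\mathbf{u}} = (u_x, u_y, u_z)$:

$$D_{\hat{\mathbf{u}}} f = u_x \frac{\partial f}{\partial x} + u_y \frac{\partial f}{\partial y} + u_z \frac{\partial f}{\partial z}$$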

The partial derivatives along the three coordinate axes are components of the gradient; the directional derivative in an arbitrary direction is the projection of the gradient onto that direction.
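
The projection can be checked numerically. The sketch below assumes a hypothetical field `f`, a hand-derived gradient `grad_f`, and an arbitrary point and direction, all chosen for illustration only. It compares the dot-product value against a central finite difference taken along the same direction.

```python
import math

def f(x, y, z):
    # Hypothetical scalar field, chosen only for illustration.
    return x * x * y + math.sin(z)

def grad_f(x, y, z):
    # Analytic gradient of the field above.
    return (2 * x * y, x * x, math.cos(z))

def directional_derivative(grad, u):
    # Dot product of the gradient with a unit direction u.
    return sum(g * ui for g, ui in zip(grad, u))

p = (1.0, 2.0, 0.5)
u = (2 / 3, -1 / 3, 2 / 3)  # unit vector: (4 + 1 + 4) / 9 = 1

analytic = directional_derivative(grad_f(*p), u)

# Central finite difference along u as an independent check.
h = 1e-6
forward = f(*(pi + h * ui for pi, ui in zip(p, u)))
backward = f(*(pi - h * ui for pi, ui in zip(p, u)))
numeric = (forward - backward) / (2 * h)

print(analytic, numeric)
```

The two printed values should agree to several decimal places; the finite difference is only an approximation, so exact equality is not expected.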

Why $\hat{\mathbf{u}}$ must be a unit vector

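The formula $D_{\hat{\mathbf{u}}} f = \nabla f \cdot \hat{\mathbf{u}}$ assumes $|\hat{\mathbf{u}}| = 1$, i.e. “rate per unit distance.” Plug a non-unit vector $\mathbf{v}$ into the same formula and the result comes out scaled by $|\mathbf{v}|$. So for a non-unit $\mathbf{v}$:

$$D_{\hat{\mathbf{v}}} f = \nabla f \cdot \frac{\mathbf{v}}{|\mathbf{v}|}$$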

Forgetting to normalize is an easy mistake here.
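
A minimal sketch of the pitfall, assuming a made-up gradient value and direction. The helper names `dot`, `norm`, and `directional_derivative` are illustrative, not from any library.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def directional_derivative(grad, v):
    # Normalize first so the result is a rate per unit distance.
    n = norm(v)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return dot(grad, v) / n

grad = (3.0, 4.0)  # hypothetical gradient value at some point
v = (2.0, 0.0)     # non-unit direction, |v| = 2

raw = dot(grad, v)                         # inflated by |v|
correct = directional_derivative(grad, v)  # rate per unit distance
print(raw, correct)
```

Here `raw` is exactly `norm(v)` times `correct`, which is the scaling described above.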

Geometric interpretation: steepest ascent

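Writing $D_{\hat{\mathbf{u}}} f = |\nabla f| \cos\theta$:

  • Maximized at $\theta = 0$ ($\hat{\mathbf{u}}$ pointing along $\nabla f$): $D_{\hat{\mathbf{u}}} f = |\nabla f|$. This is the steepest-ascent direction.
  • Zero at $\theta = \pi/2$ ($\hat{\mathbf{u}}$ perpendicular to $\nabla f$): $D_{\hat{\mathbf{u}}} f = 0$. Moving along a level surface (where $f$ is constant) means $D_{\hat{\mathbf{u}}} f = 0$.
  • Minimized at $\theta = \pi$ ($\hat{\mathbf{u}}$ antiparallel to $\nabla f$): $D_{\hat{\mathbf{u}}} f = -|\nabla f|$. The steepest-descent direction, which is what Gradient descent runs on.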

The gradient is the steepest-ascent vector; its magnitude is the steepest slope; the directional derivative in any other direction is just the projection.
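
The three cases can be seen by rotating a unit vector away from the gradient. This is a two-dimensional sketch with a made-up gradient value; `rate` is a hypothetical helper that builds the unit direction at a given angle from the gradient and takes the dot product.

```python
import math

grad = (3.0, 4.0)  # hypothetical gradient, |grad| = 5
grad_norm = math.hypot(*grad)
grad_angle = math.atan2(grad[1], grad[0])

def rate(theta_deg):
    # Directional derivative along the unit vector that makes an
    # angle theta (in degrees) with the gradient.
    a = grad_angle + math.radians(theta_deg)
    u = (math.cos(a), math.sin(a))
    return grad[0] * u[0] + grad[1] * u[1]

for theta in (0, 60, 90, 180):
    print(theta, round(rate(theta), 6))
```

Up to rounding, the printed rates follow `grad_norm * cos(theta)`: the full magnitude at 0 degrees, half of it at 60, zero at 90, and the negated magnitude at 180.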

Worked example

Take .

Gradient:

At point :

Directional derivative in the direction of . First normalize: , so .

Interpretation: walking from in the direction at unit speed, increases at rate per unit distance.

The maximum possible rate at this point: , in the direction .

Connection to chain rule

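If $\mathbf{r}(t)$ is a parametrized curve passing through a point $\mathbf{x}_0$ at $t = 0$ with $\mathbf{r}'(0) = \mathbf{v}$, then

$$\left.\frac{d}{dt} f(\mathbf{r}(t))\right|_{t=0} = \nabla f(\mathbf{x}_0) \cdot \mathbf{v}$$

This is the chain rule. The directional derivative is the special case where $\mathbf{v}$ is a unit vector, measuring rate of change “per unit arc length” along the curve, at the starting point. Otherwise the rate is scaled by the speed $|\mathbf{v}|$.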
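
The chain-rule statement can be checked on a straight-line curve. The function, point, and velocity below are assumptions made for illustration; the curve `r` is simply the line through the point with the given velocity.

```python
import math

def f(x, y):
    # Hypothetical scalar function, chosen only for illustration.
    return x * x + 3 * x * y

def grad_f(x, y):
    return (2 * x + 3 * y, 3 * x)

p = (1.0, 2.0)
v = (3.0, 4.0)  # velocity with speed |v| = 5

def r(t):
    # Straight line through p with velocity v: r(0) = p, r'(0) = v.
    return (p[0] + t * v[0], p[1] + t * v[1])

# Rate of change of f along the curve at t = 0 (central difference).
h = 1e-6
curve_rate = (f(*r(h)) - f(*r(-h))) / (2 * h)

# Chain rule: gradient dotted with the velocity.
g = grad_f(*p)
chain_rule = g[0] * v[0] + g[1] * v[1]

# Dividing by the speed gives the rate per unit arc length.
per_unit_length = chain_rule / math.hypot(*v)

print(curve_rate, chain_rule, per_unit_length)
```

The first two values should match up to finite-difference error, and the third is the directional derivative along `v` once the speed is divided out.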

In machine learning

In gradient-based optimization, the “step direction” question is: in which direction $\hat{\mathbf{u}}$ should we update parameters to decrease the loss $L$ fastest? Answer: $\hat{\mathbf{u}} = -\nabla L / |\nabla L|$. The negative gradient direction has directional derivative $-|\nabla L|$, the most-negative possible value. See Gradient descent.
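
A minimal sketch of one plain gradient-descent step, assuming a hypothetical two-parameter quadratic loss and a fixed step size of 0.1. It also evaluates the directional derivative along the normalized negative gradient to show that it equals minus the gradient magnitude.

```python
import math

def loss(w):
    # Hypothetical quadratic loss with its minimum at (1, -2).
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2

def grad_loss(w):
    return (2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0))

w = (3.0, 0.0)
g = grad_loss(w)
g_norm = math.hypot(*g)

# Unit vector along the negative gradient and the slope in that direction.
u = (-g[0] / g_norm, -g[1] / g_norm)
slope = g[0] * u[0] + g[1] * u[1]  # equals -|grad|

# One plain gradient-descent update with an assumed step size.
step = 0.1
w_new = (w[0] - step * g[0], w[1] - step * g[1])

print(slope, loss(w), loss(w_new))
```

The step size matters: the directional derivative is a local rate, so a step that is too large can still increase the loss even along the steepest-descent direction.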

In adaptive methods (Adam, RMSprop, etc.), the effective update direction at each step is a messier function of past gradients and squared gradients, so it generally no longer points exactly along $-\nabla L$. The goal is the same, though: move in a direction along which the loss decreases locally.

In electromagnetics

The directional derivative shows up wherever you want the rate of change of a field along a specific direction, usually along a path, a boundary, or a streamline. The line integral $\int_C \mathbf{E} \cdot d\boldsymbol{\ell}$ can be viewed as integrating $-D_{\hat{\mathbf{t}}} V$ along the path (where $\hat{\mathbf{t}}$ is the unit tangent and $\mathbf{E} = -\nabla V$): the integral accumulates the rate of potential change along the path, giving the total potential difference.
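
As a numerical illustration, the sketch below assumes a hypothetical two-dimensional potential `V` and integrates the tangential component of the field along a quarter of the unit circle with a midpoint rule. The potential, the path, and the sample count are all illustrative choices.

```python
import math

def V(x, y):
    # Hypothetical scalar potential, chosen only for illustration.
    return x * x + x * y

def E(x, y):
    # Field derived from the potential: E = -grad V.
    return (-(2 * x + y), -x)

def line_integral(n=2000):
    # Midpoint rule along the unit-circle arc from (1, 0) to (0, 1).
    # On the unit circle the angle s is also the arc length.
    total = 0.0
    ds = (math.pi / 2) / n
    for k in range(n):
        s = (k + 0.5) * ds
        x, y = math.cos(s), math.sin(s)
        t_hat = (-math.sin(s), math.cos(s))  # unit tangent
        Ex, Ey = E(x, y)
        total += (Ex * t_hat[0] + Ey * t_hat[1]) * ds
    return total

integral = line_integral()
drop = V(1.0, 0.0) - V(0.0, 1.0)  # potential at start minus potential at end
print(integral, drop)
```

With the sign convention $\mathbf{E} = -\nabla V$, the accumulated integral approximates the potential at the start of the path minus the potential at its end.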