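The directional derivative of a scalar function $f$ at a point, in the direction of a unit vector $\hat{\mathbf{u}}$, is the rate of change of $f$ along that direction:

$$D_{\hat{\mathbf{u}}} f = \nabla f \cdot \hat{\mathbf{u}} = |\nabla f| \cos\theta$$

where $\theta$ is the angle between $\nabla f$ (gradient) and $\hat{\mathbf{u}}$. The dot-product form is the workhorse for computation; the $|\nabla f| \cos\theta$ form is the geometric interpretation.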
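As a quick numerical sketch of the two forms agreeing, the snippet below builds a gradient and a unit direction from polar angles and evaluates both expressions. The magnitude and angles are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical 2D gradient given by magnitude and polar angle (assumed values).
g_mag, g_angle = 5.0, math.radians(60)
u_angle = math.radians(15)            # polar angle of the unit direction

grad = (g_mag * math.cos(g_angle), g_mag * math.sin(g_angle))
u = (math.cos(u_angle), math.sin(u_angle))   # unit vector by construction

# Dot-product form: gradient dotted with the unit direction.
dot_form = grad[0] * u[0] + grad[1] * u[1]

# Geometric form: gradient magnitude times the cosine of the angle between them.
cos_form = g_mag * math.cos(g_angle - u_angle)

print(dot_form, cos_form)   # the two values agree up to floating-point rounding
```

Here the angle between the two vectors is 45 degrees, so both forms evaluate to roughly 3.54.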

What it means

The partial derivative $\partial f / \partial x$ tells you how $f$ changes when you move along $\hat{\mathbf{x}}$. The directional derivative generalizes this to any direction $\hat{\mathbf{u}}$: it tells you how fast $f$ changes when you walk in the direction $\hat{\mathbf{u}}$, per unit distance traveled.

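For $\hat{\mathbf{u}} = \hat{\mathbf{x}}$: $D_{\hat{\mathbf{x}}} f = \partial f / \partial x$. For $\hat{\mathbf{u}} = \hat{\mathbf{y}}$: $D_{\hat{\mathbf{y}}} f = \partial f / \partial y$. For arbitrary $\hat{\mathbf{u}} = (u_x, u_y, u_z)$:

$$D_{\hat{\mathbf{u}}} f = u_x \frac{\partial f}{\partial x} + u_y \frac{\partial f}{\partial y} + u_z \frac{\partial f}{\partial z}$$

The partial derivatives along the three coordinate axes are the components of the gradient; the directional derivative in an arbitrary direction is the projection of the gradient onto that direction.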
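The projection statement can be checked numerically without any analytic gradient. The sketch below estimates the three partial derivatives by central differences, projects them onto a unit direction, and compares the result with a direct finite-difference step along that direction. The function `f`, the point `p`, and the direction `u` are hypothetical choices made for this illustration.

```python
def f(x, y, z):
    # Hypothetical scalar field, chosen only for illustration.
    return x * y + z ** 2

def partial(func, p, axis, h=1e-5):
    """Central-difference estimate of the partial derivative along one axis."""
    lo, hi = list(p), list(p)
    lo[axis] -= h
    hi[axis] += h
    return (func(*hi) - func(*lo)) / (2 * h)

p = (1.0, 2.0, 3.0)
u = (1 / 3, 2 / 3, 2 / 3)     # unit vector: (1, 2, 2) / 3

# Gradient components are the partial derivatives along the axes.
grad = [partial(f, p, k) for k in range(3)]
projected = sum(g * c for g, c in zip(grad, u))   # gradient projected onto u

# Direct estimate: step a small distance h along u in both directions.
h = 1e-5
ahead = [pi + h * ui for pi, ui in zip(p, u)]
behind = [pi - h * ui for pi, ui in zip(p, u)]
direct = (f(*ahead) - f(*behind)) / (2 * h)

print(projected, direct)      # both are close to 16/3
```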

Why $\hat{\mathbf{u}}$ must be a unit vector

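The formula assumes $|\hat{\mathbf{u}}| = 1$, which is what makes it a “rate per unit distance.” If you use a non-unit vector $\mathbf{v}$ in the same formula, the result is scaled by $|\mathbf{v}|$. To express the directional derivative for a non-unit $\mathbf{v}$:

$$D_{\mathbf{v}/|\mathbf{v}|} f = \frac{\nabla f \cdot \mathbf{v}}{|\mathbf{v}|}$$

Forgetting to normalize is one of the more common errors in introductory vector calculus.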
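One way to guard against the normalization error is to normalize inside the helper, so callers can pass any nonzero vector. The following is a minimal sketch; the function name `directional_derivative` and the sample vectors are assumptions made for illustration.

```python
import math

def directional_derivative(grad, v):
    """Return grad . v / |v|; v is normalized here, so it need not be a unit vector."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("direction must be a nonzero vector")
    return sum(g * c for g, c in zip(grad, v)) / length

grad = (2.0, 1.0, 6.0)    # hypothetical gradient at some point
v = (1.0, 2.0, 2.0)       # non-unit direction, |v| = 3

unnormalized = sum(g * c for g, c in zip(grad, v))   # too large by a factor of |v|
correct = directional_derivative(grad, v)

print(unnormalized, correct)   # the first value is 3 times the second
```

Rejecting the zero vector explicitly avoids a division by zero and makes the failure mode visible to the caller.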

Geometric interpretation: steepest ascent

Writing $D_{\hat{\mathbf{u}}} f = |\nabla f| \cos\theta$:

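  • Maximized at $\theta = 0$ (i.e., $\hat{\mathbf{u}}$ pointing along $\nabla f$): $D_{\hat{\mathbf{u}}} f = |\nabla f|$. This is the “steepest ascent” direction.
  • Zero at $\theta = \pi/2$ ($\hat{\mathbf{u}}$ perpendicular to $\nabla f$): $D_{\hat{\mathbf{u}}} f = 0$. Moving along a level surface (where $f$ is constant) means $\hat{\mathbf{u}} \perp \nabla f$.
  • Minimized at $\theta = \pi$ ($\hat{\mathbf{u}}$ antiparallel to $\nabla f$): $D_{\hat{\mathbf{u}}} f = -|\nabla f|$. The “steepest descent” direction, foundational for Gradient descent.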

The gradient is the steepest-ascent vector; its magnitude is the steepest slope; the directional derivative in any other direction is just the projection.
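The three cases above can be illustrated in two dimensions by rotating a unit vector relative to a fixed gradient. This is only a sketch: the gradient `(3, 4)` is a hypothetical value, and the 1-degree sweep is a coarse check rather than a proof.

```python
import math

grad = (3.0, 4.0)                    # hypothetical 2D gradient, magnitude 5
g_mag = math.hypot(*grad)
g_angle = math.atan2(grad[1], grad[0])

def d_along(angle):
    """Directional derivative along the unit vector at the given polar angle."""
    return grad[0] * math.cos(angle) + grad[1] * math.sin(angle)

ascent = d_along(g_angle)                  # parallel to the gradient
level = d_along(g_angle + math.pi / 2)     # perpendicular to the gradient
descent = d_along(g_angle + math.pi)       # antiparallel to the gradient

# Sweep every direction in 1-degree steps; none should exceed the gradient magnitude.
sweep = [d_along(math.radians(k)) for k in range(360)]

print(ascent, level, descent)   # approximately 5, 0, -5
print(max(sweep) <= g_mag + 1e-12, min(sweep) >= -g_mag - 1e-12)
```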

Worked example

Take .

Gradient:

At point :

Directional derivative in the direction of . First normalize: , so .

Interpretation: walking from in the direction at unit speed, increases at rate per unit distance.

The maximum possible rate at this point: , in the direction .
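The same sequence of steps (gradient, evaluation at a point, normalization, dot product, maximum rate) can be sketched in code. Note that the function, point, and direction below are stand-ins chosen for illustration and are not the values of the example above.

```python
import math

def f(x, y):
    # Stand-in function (an assumption, not the function from the example above).
    return x ** 2 + 3 * x * y

def grad_f(x, y):
    # Analytic gradient of the stand-in: (df/dx, df/dy).
    return (2 * x + 3 * y, 3 * x)

point = (1.0, 2.0)                   # stand-in point
v = (3.0, 4.0)                       # stand-in direction, not yet normalized

g = grad_f(*point)                   # gradient at the point: (8.0, 3.0)
v_len = math.hypot(*v)               # |v| = 5
u = (v[0] / v_len, v[1] / v_len)     # unit direction

rate = g[0] * u[0] + g[1] * u[1]     # directional derivative, about 7.2
max_rate = math.hypot(*g)            # |grad f| = sqrt(73), about 8.54
steepest = (g[0] / max_rate, g[1] / max_rate)   # direction of the maximum rate

print(rate, max_rate, steepest)
```

As expected, the rate along the chosen direction is smaller than the maximum rate, which is attained only along the gradient itself.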

Connection to chain rule

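If $\mathbf{r}(t)$ is a parametrized curve passing through a point at $t = 0$ with $\mathbf{r}'(0) = \mathbf{v}$, then

$$\left. \frac{d}{dt} f(\mathbf{r}(t)) \right|_{t=0} = \nabla f \cdot \mathbf{v}$$

This is the chain rule. The directional derivative is the special case where $\mathbf{v}$ is a unit vector, measuring the rate of change “per unit arc length” along the curve at the starting point. Otherwise the rate is scaled by the speed $|\mathbf{v}|$.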
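A small numerical check of this relationship is sketched below. The field `f`, the curve `r`, and the hard-coded gradient are hypothetical; the curve is deliberately given a non-unit velocity so the scaling by speed is visible.

```python
import math

def f(x, y):
    # Hypothetical scalar field used only for this check.
    return x ** 2 * y

def r(t):
    # Hypothetical curve with r(0) = (1, 2) and r'(0) = (2, -1).
    return (1 + 2 * t, 2 - t + t ** 2)

grad_at_start = (4.0, 1.0)    # analytic gradient (2xy, x^2) evaluated at (1, 2)
v = (2.0, -1.0)               # velocity r'(0), not a unit vector

# Time derivative of f along the curve at t = 0, by central difference.
h = 1e-5
numeric = (f(*r(h)) - f(*r(-h))) / (2 * h)

# Chain rule prediction: gradient dotted with the velocity.
chain_rule = grad_at_start[0] * v[0] + grad_at_start[1] * v[1]

# Dividing by the speed gives the rate per unit arc length.
speed = math.hypot(*v)
per_unit_length = chain_rule / speed

print(numeric, chain_rule, per_unit_length)
```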

In machine learning

In gradient-based optimization, the “step direction” question is: in which direction $\hat{\mathbf{u}}$ should we update the parameters to decrease the loss $L$ fastest? Answer: $\hat{\mathbf{u}} = -\nabla L / |\nabla L|$. The negative gradient direction has directional derivative $-|\nabla L|$, the most negative possible value. See Gradient descent.
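The following sketch makes this concrete for a toy problem. The quadratic `loss`, the starting parameters, and the learning rate `lr` are all hypothetical choices; the update shown is plain gradient descent, not any particular library's optimizer.

```python
import math

def loss(w):
    # Hypothetical quadratic loss, chosen only for illustration.
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 0.5) ** 2

def grad_loss(w):
    # Analytic gradient of the loss above.
    return (2.0 * (w[0] - 1.0), 4.0 * (w[1] + 0.5))

w = (3.0, 1.0)
g = grad_loss(w)                      # (4.0, 6.0)
g_norm = math.hypot(*g)

# Unit vector along the negative gradient and the slope of the loss along it.
u = (-g[0] / g_norm, -g[1] / g_norm)
slope = g[0] * u[0] + g[1] * u[1]     # equals -|grad|

# One plain gradient-descent update with an assumed learning rate.
lr = 0.1
w_new = (w[0] - lr * g[0], w[1] - lr * g[1])

print(slope, -g_norm)                 # the two values match
print(loss(w), loss(w_new))           # the loss decreases after the step
```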

In adaptive methods (Adam, RMSprop, etc.), the effective $\hat{\mathbf{u}}$ at each step is a more complex function of past gradients and squared gradients, so it generally differs from the normalized negative gradient. The underlying concept, “move in the direction of greatest local decrease,” is still the directional-derivative principle.

In electromagnetics

The directional derivative appears wherever you want “the rate of change of a field along a specific direction,” typically along a path, a boundary, or a streamline. The line integral $\int \nabla V \cdot d\mathbf{l}$ can be viewed as integrating $D_{\hat{\mathbf{t}}} V$ along the path (where $\hat{\mathbf{t}}$ is the unit tangent and $d\mathbf{l} = \hat{\mathbf{t}}\, dl$): the integral accumulates the rate of potential change along the path, giving the total potential difference.
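The accumulation described above can be sketched numerically: split a path into short segments, multiply the directional derivative along each segment's unit tangent by the segment length, and sum. The potential `V`, the path, and the segment count are hypothetical choices for illustration, and the midpoint rule used here is only approximate.

```python
import math

def V(x, y):
    # Hypothetical potential, chosen only for illustration.
    return x ** 2 * y + y

def grad_V(x, y):
    # Analytic gradient of the potential above.
    return (2 * x * y, x ** 2 + 1)

def path(s):
    # Hypothetical path from (0, 0) at s = 0 to (1, 2) at s = 1.
    return (s, 2 * s ** 2)

n = 2000                                   # number of straight segments
total = 0.0
for i in range(n):
    x0, y0 = path(i / n)
    x1, y1 = path((i + 1) / n)
    dx, dy = x1 - x0, y1 - y0
    dl = math.hypot(dx, dy)                # segment length
    t_hat = (dx / dl, dy / dl)             # unit tangent of the segment
    gx, gy = grad_V((x0 + x1) / 2, (y0 + y1) / 2)   # gradient at the midpoint
    rate = gx * t_hat[0] + gy * t_hat[1]   # directional derivative along t_hat
    total += rate * dl                     # accumulate rate times distance

difference = V(*path(1.0)) - V(*path(0.0))

print(total, difference)   # the sum approximates the potential difference
```

Because the integrand is a gradient, the result depends only on the endpoints; a different path between the same two points should give approximately the same sum.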