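The directional derivative of a scalar function $f$ at a point $\mathbf{x}_0$, in the direction of a unit vector $\mathbf{u}$, is the rate of change of $f$ along that direction:
$$D_{\mathbf{u}} f(\mathbf{x}_0) = \nabla f(\mathbf{x}_0) \cdot \mathbf{u} = |\nabla f(\mathbf{x}_0)| \cos\theta$$
where $\theta$ is the angle between $\nabla f$ (gradient) and $\mathbf{u}$. Use the dot-product form to compute; the $\cos\theta$ form gives the geometry.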
What it means
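The partial derivative $\partial f/\partial x$ tells you how $f$ changes when you move in $x$. The directional derivative generalizes this to any direction $\mathbf{u}$: it tells you how fast $f$ changes when you walk in the direction $\mathbf{u}$, per unit distance traveled.
For $\mathbf{u} = \hat{\mathbf{x}}$: $D_{\mathbf{u}} f = \partial f/\partial x$. For $\mathbf{u} = \hat{\mathbf{y}}$: $D_{\mathbf{u}} f = \partial f/\partial y$. For arbitrary $\mathbf{u} = (u_x, u_y, u_z)$:
$$D_{\mathbf{u}} f = u_x \frac{\partial f}{\partial x} + u_y \frac{\partial f}{\partial y} + u_z \frac{\partial f}{\partial z}$$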
The partial derivatives along the three coordinate axes are components of the gradient; the directional derivative in an arbitrary direction is the projection of the gradient onto that direction.
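As a rough numerical check of the dot-product form, the sketch below compares it with a central finite difference taken along the direction. The field `f`, the point `p`, and the direction `u` are hypothetical choices made only for illustration.

```python
import math

def f(x, y, z):
    # Hypothetical scalar field, chosen only for illustration.
    return x * x * y + math.sin(z)

def grad_f(x, y, z):
    # Analytic partial derivatives of the field above.
    return (2 * x * y, x * x, math.cos(z))

def directional_derivative(grad, u):
    # Dot-product form: sum over axes of u_i * (partial f / partial x_i).
    return sum(g * ui for g, ui in zip(grad, u))

def finite_difference(func, p, u, h=1e-6):
    # Central difference along u: (f(p + h u) - f(p - h u)) / (2 h).
    fwd = func(*(pi + h * ui for pi, ui in zip(p, u)))
    bwd = func(*(pi - h * ui for pi, ui in zip(p, u)))
    return (fwd - bwd) / (2 * h)

p = (1.0, 2.0, 0.5)
u = (2 / 3, 1 / 3, 2 / 3)  # unit vector: (4 + 1 + 4) / 9 = 1

analytic = directional_derivative(grad_f(*p), u)
numeric = finite_difference(f, p, u)
along_x = directional_derivative(grad_f(*p), (1.0, 0.0, 0.0))  # reduces to df/dx

print(analytic, numeric, along_x)
```

The two values should agree to several decimal places, and choosing a coordinate axis as the direction recovers the corresponding partial derivative.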
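Why $\mathbf{u}$ must be a unit vector
The formula assumes $|\mathbf{u}| = 1$, i.e. “rate per unit distance.” Plug a non-unit vector $\mathbf{v}$ into the same formula and the result comes out scaled by $|\mathbf{v}|$. So for a non-unit $\mathbf{v}$:
$$D_{\hat{\mathbf{v}}} f = \frac{\nabla f \cdot \mathbf{v}}{|\mathbf{v}|}$$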
Forgetting to normalize is an easy mistake here.
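One way to guard against this is to normalize inside the helper rather than trusting the caller. This is a minimal sketch; the gradient value and the direction are made-up numbers, not taken from any example in this note.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def directional_derivative(grad, v):
    # Normalize first so the result is a rate per unit distance.
    length = norm(v)
    if length == 0:
        raise ValueError("direction must be non-zero")
    return dot(grad, v) / length

grad = (4.0, 1.0)  # hypothetical gradient value at some point
v = (3.0, 4.0)     # non-unit direction, |v| = 5

unnormalized = dot(grad, v)                # too large by a factor of |v|
correct = directional_derivative(grad, v)  # rate per unit distance

print(unnormalized, correct)
```

The unnormalized dot product is exactly `norm(v)` times the correctly normalized rate.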
Geometric interpretation: steepest ascent
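Writing $D_{\mathbf{u}} f = |\nabla f| \cos\theta$:
- Maximized at $\theta = 0$ ($\mathbf{u}$ pointing along $\nabla f$): $D_{\mathbf{u}} f = |\nabla f|$. This is the steepest-ascent direction.
- Zero at $\theta = \pi/2$ ($\mathbf{u}$ perpendicular to $\nabla f$): $D_{\mathbf{u}} f = 0$. Moving along a level surface (where $f$ is constant) means $D_{\mathbf{u}} f = 0$.
- Minimized at $\theta = \pi$ ($\mathbf{u}$ antiparallel to $\nabla f$): $D_{\mathbf{u}} f = -|\nabla f|$. The steepest-descent direction, which is what Gradient descent runs on.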
The gradient is the steepest-ascent vector; its magnitude is the steepest slope; the directional derivative in any other direction is just the projection.
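The three cases can be seen by sweeping the angle between the direction and the gradient. The sketch below works in 2D with an assumed gradient value of `(3, 4)`; only the angle relative to the gradient matters.

```python
import math

grad = (3.0, 4.0)  # hypothetical gradient value, magnitude 5
grad_norm = math.hypot(*grad)
grad_angle = math.atan2(grad[1], grad[0])

def rate_at(theta):
    # Unit vector at angle theta, measured from the gradient direction.
    u = (math.cos(grad_angle + theta), math.sin(grad_angle + theta))
    return grad[0] * u[0] + grad[1] * u[1]

for degrees in (0, 45, 90, 135, 180):
    theta = math.radians(degrees)
    # Dot-product form next to the |grad f| cos(theta) form.
    print(degrees, round(rate_at(theta), 6), round(grad_norm * math.cos(theta), 6))
```

The two printed columns should match: the rate peaks at 0 degrees, vanishes at 90, and reaches its most negative value at 180.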
Worked example
Take .
Gradient:
At point :
Directional derivative in the direction of . First normalize: , so .
Interpretation: walking from in the direction at unit speed, increases at rate per unit distance.
The maximum possible rate at this point: , in the direction .
Connection to chain rule
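If $\mathbf{r}(t)$ is a parametrized curve passing through a point $\mathbf{x}_0$ at $t = 0$ with $\mathbf{r}'(0) = \mathbf{v}$, then
$$\frac{d}{dt} f(\mathbf{r}(t)) \Big|_{t=0} = \nabla f(\mathbf{x}_0) \cdot \mathbf{v}$$
This is the chain rule. The directional derivative is the special case where $\mathbf{v}$ is a unit vector, measuring rate of change “per unit arc length” along the curve, at the starting point. Otherwise the rate is scaled by the speed $|\mathbf{v}|$.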
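A small numerical sketch of this relationship follows. The field `f` and the curve `r` are hypothetical; the curve passes through (1, 1) at t = 0 with velocity (2, 1), which is not a unit vector.

```python
import math

def f(x, y):
    # Hypothetical scalar field.
    return x * x + 3 * x * y

def grad_f(x, y):
    return (2 * x + 3 * y, 3 * x)

def r(t):
    # Hypothetical curve: r(0) = (1, 1), r'(0) = (2, 1).
    return (1 + 2 * t, math.exp(t))

v = (2.0, 1.0)  # velocity of the curve at t = 0
x0 = r(0.0)

# Left side: derivative of f along the curve, by central difference in t.
h = 1e-6
numeric = (f(*r(h)) - f(*r(-h))) / (2 * h)

# Right side: gradient at the point dotted with the velocity.
g = grad_f(*x0)
chain = g[0] * v[0] + g[1] * v[1]

# Dividing by the speed gives the rate per unit arc length.
per_unit_length = chain / math.hypot(*v)

print(numeric, chain, per_unit_length)
```

The time derivative matches the gradient dotted with the velocity; dividing by the speed recovers the directional derivative along the unit tangent.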
In machine learning
In gradient-based optimization, the “step direction” question is: in which direction $\mathbf{u}$ should we update parameters to decrease the loss $L$ fastest? Answer: $\mathbf{u} = -\nabla L / |\nabla L|$. The negative gradient direction has directional derivative $-|\nabla L|$, the most-negative possible value. See Gradient descent.
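In adaptive methods (Adam, RMSprop, etc.), the effective $\mathbf{u}$ at each step is a messier function of past gradients and squared gradients, so it is generally not the Euclidean steepest-descent direction. The aim is the same, though: move in a direction along which the loss is expected to decrease locally.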
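To make the plain gradient-descent case concrete, the sketch below checks that the normalized negative gradient has a directional derivative no larger than that of any sampled unit direction, and that one small step lowers the loss. The quadratic `loss`, the starting point, and the learning rate `lr` are all hypothetical.

```python
import math

def loss(w):
    # Hypothetical quadratic loss in two parameters.
    return (w[0] - 1) ** 2 + 2 * (w[1] + 0.5) ** 2

def grad_loss(w):
    return (2 * (w[0] - 1), 4 * (w[1] + 0.5))

w = (3.0, 1.0)
g = grad_loss(w)
g_norm = math.hypot(*g)

def rate(u):
    # Directional derivative of the loss at w along unit vector u.
    return g[0] * u[0] + g[1] * u[1]

# Steepest-descent direction: the negative gradient, normalized.
steepest = (-g[0] / g_norm, -g[1] / g_norm)

# Compare against unit directions sampled at one-degree spacing.
candidates = [
    (math.cos(math.radians(k)), math.sin(math.radians(k))) for k in range(360)
]
best_sampled = min(rate(u) for u in candidates)

# One plain gradient-descent step with an assumed learning rate.
lr = 0.1
w_next = (w[0] - lr * g[0], w[1] - lr * g[1])

print(rate(steepest), best_sampled, loss(w), loss(w_next))
```

The step size still matters: the directional derivative is a local rate, so too large a learning rate can overshoot even along the steepest-descent direction.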
In electromagnetics
The directional derivative shows up wherever you want the rate of change of a field along a specific direction, usually along a path, a boundary, or a streamline. The line integral $\int_C \nabla V \cdot d\mathbf{l}$ can be viewed as integrating $D_{\hat{\mathbf{t}}} V$ along the path (where $\hat{\mathbf{t}}$ is the unit tangent and $d\mathbf{l} = \hat{\mathbf{t}}\, dl$): the integral accumulates the rate of potential change along the path, giving the total potential difference.
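This accumulation can be checked numerically. The sketch below integrates the directional derivative of a potential along a quarter circle using the midpoint rule and compares the result with the difference in potential between the endpoints. The potential `V` and the path are hypothetical, chosen so the path is parametrized by arc length.

```python
import math

def V(x, y):
    # Hypothetical scalar potential.
    return x * x * y + y

def grad_V(x, y):
    return (2 * x * y, x * x + 1)

def path(s):
    # Quarter circle of radius 1; s is arc length, from 0 to pi/2.
    return (math.cos(s), math.sin(s))

def unit_tangent(s):
    # Derivative of path(s) with respect to arc length, so it has length 1.
    return (-math.sin(s), math.cos(s))

n = 10000
total_length = math.pi / 2
dl = total_length / n

integral = 0.0
for i in range(n):
    s = (i + 0.5) * dl  # midpoint of the i-th segment
    g = grad_V(*path(s))
    t = unit_tangent(s)
    # Directional derivative along the tangent, times the segment length.
    integral += (g[0] * t[0] + g[1] * t[1]) * dl

difference = V(*path(total_length)) - V(*path(0.0))

print(integral, difference)
```

The two numbers should agree closely, and the result depends only on the endpoints because the integrand is the gradient of a scalar potential.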