Directional derivative

The directional derivative $D_{\hat{u}} f$ of a scalar function $f (x, y, z)$ at a point, in the direction of a unit vector $\hat{u}$ , is the rate of change of $f$ along that direction:

$D_{\hat{u}} f = \nabla f \cdot \hat{u} = ∣\nabla f ∣ \cos θ,$

where $θ$ is the angle between $\nabla f$ (the gradient) and $\hat{u}$. Use the dot-product form to compute; the $∣\nabla f ∣ \cos θ$ form gives the geometry.

What it means

The partial derivative $\partial f / \partial x$ tells you how $f$ changes when you move in $+ \hat{x}$ . The directional derivative generalizes this to any direction $\hat{u}$ : it tells you how fast $f$ changes when you walk in the direction $\hat{u}$ , per unit distance traveled.

For $\hat{u} = \hat{x}$ : $D_{\hat{x}} f = \partial f / \partial x$ . For $\hat{u} = \hat{y}$ : $D_{\hat{y}} f = \partial f / \partial y$ . For arbitrary $\hat{u} = u_{x} \hat{x} + u_{y} \hat{y} + u_{z} \hat{z}$ :

$D_{\hat{u}} f = u_{x} \frac{\partial f}{\partial x} + u_{y} \frac{\partial f}{\partial y} + u_{z} \frac{\partial f}{\partial z} = \nabla f \cdot \hat{u} .$

The partial derivatives along the three coordinate axes are components of the gradient; the directional derivative in an arbitrary direction is the projection of the gradient onto that direction.
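A quick numerical sanity check of the formula, as a minimal sketch: it compares $\nabla f \cdot \hat{u}$ with a central finite difference taken along $\hat{u}$. The test function `f`, the point `p`, and the helper names are hypothetical, picked only for illustration.

```python
def f(x, y, z):
    # Hypothetical test function, chosen only for illustration.
    return x * y + z**2

def grad_f(x, y, z):
    # Analytic gradient of the function above: (df/dx, df/dy, df/dz).
    return (y, x, 2 * z)

def directional_derivative(grad, u_hat):
    # Dot product of the gradient with a unit direction vector.
    return sum(g * u for g, u in zip(grad, u_hat))

def finite_difference(func, p, u_hat, h=1e-6):
    # Central difference along u_hat: (f(p + h u) - f(p - h u)) / (2 h).
    fwd = func(*(pi + h * ui for pi, ui in zip(p, u_hat)))
    bwd = func(*(pi - h * ui for pi, ui in zip(p, u_hat)))
    return (fwd - bwd) / (2 * h)

p = (1.0, 2.0, 3.0)
u_hat = (2 / 3, 2 / 3, 1 / 3)   # unit vector: (2, 2, 1) / 3

exact = directional_derivative(grad_f(*p), u_hat)
approx = finite_difference(f, p, u_hat)
print(exact, approx)
```

The two numbers should agree to within finite-difference error, which is the "rate of change per unit distance" reading of the dot product.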

Why $\hat{u}$ must be a unit vector

The formula $D_{\hat{u}} f = \nabla f \cdot \hat{u}$ assumes $∣ \hat{u} ∣ = 1$ , i.e. “rate per unit distance.” Plug a non-unit vector $v$ into the same formula and the result $\nabla f \cdot v$ comes out scaled by $∣ v ∣$ . So for a non-unit $v$ :

$D_{v} f = \nabla f \cdot \frac{v}{∣ v ∣} .$

Forgetting to normalize is an easy mistake here.
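A small sketch of the effect, assuming a made-up gradient value and a made-up non-unit direction (both hypothetical): the raw dot product is inflated by $∣ v ∣$, and normalizing first recovers the rate per unit distance.

```python
import math

def dot(a, b):
    # Euclidean dot product of two equal-length tuples.
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale v to unit length; the zero vector has no direction.
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return tuple(x / norm for x in v)

grad = (3.0, 4.0, 0.0)   # hypothetical gradient value at some point
v = (2.0, 0.0, 0.0)      # non-unit direction, |v| = 2

raw = dot(grad, v)                 # scaled by |v|
rate = dot(grad, normalize(v))     # rate per unit distance
print(raw, rate)
```

Here `raw` is exactly twice `rate`, because $∣ v ∣ = 2$.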

Geometric interpretation: steepest ascent

Writing $D_{\hat{u}} f = ∣\nabla f ∣ \cos θ$:

Maximized at $θ = 0$ ( $\hat{u}$ pointing along $\nabla f$ ): $D_{\hat{u}} f = ∣\nabla f ∣$ . This is the steepest-ascent direction.
Zero at $θ = π /2$ ( $\hat{u}$ perpendicular to $\nabla f$ ): $D_{\hat{u}} f = 0$ . Moving along a level surface (where $f$ is constant) means $\nabla f ⊥ \hat{u}$ .
Minimized at $θ = π$ ( $\hat{u}$ antiparallel to $\nabla f$ ): $D_{\hat{u}} f = - ∣\nabla f ∣$ . The steepest-descent direction, which is what Gradient descent runs on.

The gradient is the steepest-ascent vector; its magnitude is the steepest slope; the directional derivative in any other direction is just the projection.
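The three cases can be checked numerically. This is a sketch in 2D with a hypothetical gradient value $(3, 4)$, so $∣\nabla f ∣ = 5$; `rate_at_angle` is an illustrative helper, not something from a library.

```python
import math

grad = (3.0, 4.0)                        # hypothetical 2D gradient, |grad| = 5
grad_norm = math.hypot(*grad)
grad_angle = math.atan2(grad[1], grad[0])

def rate_at_angle(theta):
    # Directional derivative along the unit vector that makes
    # angle theta with the gradient.
    u_hat = (math.cos(grad_angle + theta), math.sin(grad_angle + theta))
    return grad[0] * u_hat[0] + grad[1] * u_hat[1]

for theta in (0.0, math.pi / 2, math.pi):
    print(f"theta = {theta:.3f}  D_u f = {rate_at_angle(theta):+.3f}")
```

Up to rounding, the printed rates are $+∣\nabla f ∣$, $0$, and $- ∣\nabla f ∣$, and no other angle gives a rate outside that range.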

Worked example

Take $f (x, y, z) = x^{2} + 2 x y + z^{3}$ .

Gradient:

$\nabla f = (2 x + 2 y) \hat{x} + 2 x \hat{y} + 3 z^{2} \hat{z} .$

At point $(1, 1, 1)$ :

$\nabla f ∣_{(1, 1, 1)} = 4 \hat{x} + 2 \hat{y} + 3 \hat{z} .$

Directional derivative in the direction of $v = \hat{x} + \hat{y} - \hat{z}$. First normalize: $∣ v ∣ = \sqrt{3}$, so $\hat{u} = (\hat{x} + \hat{y} - \hat{z}) / \sqrt{3}$.

$D_{\hat{u}} f = \nabla f \cdot \hat{u} = \frac{4 + 2 - 3}{\sqrt{3}} = \frac{3}{\sqrt{3}} = \sqrt{3} \approx 1.73.$

Interpretation: walking from $(1, 1, 1)$ in the direction $(1, 1, - 1) / \sqrt{3}$ at unit speed, $f$ increases at rate $\sqrt{3}$ per unit distance.

The maximum possible rate at this point: $∣\nabla f ∣ = \sqrt{16 + 4 + 9} = \sqrt{29} \approx 5.39$, in the direction $\hat{u}_{\max} = (4, 2, 3) / \sqrt{29}$.
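The arithmetic of this example can be reproduced in a few lines. A minimal sketch using the hand-derived gradient from above; the variable names are illustrative only.

```python
import math

def grad_f(x, y, z):
    # Gradient of f(x, y, z) = x^2 + 2xy + z^3, derived by hand above.
    return (2 * x + 2 * y, 2 * x, 3 * z**2)

grad = grad_f(1.0, 1.0, 1.0)                    # gradient at (1, 1, 1)

v = (1.0, 1.0, -1.0)                            # requested direction, not yet unit length
v_norm = math.sqrt(sum(c * c for c in v))
u_hat = tuple(c / v_norm for c in v)

rate = sum(g * u for g, u in zip(grad, u_hat))  # directional derivative along u_hat
max_rate = math.sqrt(sum(g * g for g in grad))  # |grad f|, the steepest slope

print(grad, round(rate, 2), round(max_rate, 2))
```

The rounded values should match the approximations quoted in the example (about 1.73 and 5.39).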

Connection to chain rule

If $r (t)$ is a parametrized curve passing through a point at $t = 0$ with $\dot{r} (0) = v$ , then

$\left. \frac{d}{d t} f (r (t)) \right|_{t = 0} = \nabla f \cdot v .$

This is the chain rule. The directional derivative is the special case where $v = \hat{u}$ is a unit vector, measuring rate of change “per unit arc length” along the curve, at the starting point. Otherwise the rate is scaled by the speed $∣ v ∣$ .
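A numerical check of this statement, as a sketch. It reuses $f$ from the worked example and assumes a hypothetical curve $r (t) = p + t v + t^{2} a$, where the velocity `v` and the extra bending term `a` are made up for illustration. Only $\dot{r} (0) = v$ should matter at $t = 0$.

```python
import math

def f(x, y, z):
    return x**2 + 2 * x * y + z**3

def grad_f(x, y, z):
    return (2 * x + 2 * y, 2 * x, 3 * z**2)

p = (1.0, 1.0, 1.0)
v = (2.0, 0.0, -1.0)   # hypothetical velocity at t = 0, |v| = sqrt(5)
a = (0.0, 1.0, 0.0)    # hypothetical bending term; does not change r'(0)

def r(t):
    # Curve through p at t = 0 with velocity v there.
    return tuple(pi + t * vi + t * t * ai for pi, vi, ai in zip(p, v, a))

def ddt_at_zero(h=1e-5):
    # Central difference of t -> f(r(t)) at t = 0.
    return (f(*r(h)) - f(*r(-h))) / (2 * h)

chain_rule = sum(g * c for g, c in zip(grad_f(*p), v))   # grad f . v
speed = math.sqrt(sum(c * c for c in v))
per_unit_length = chain_rule / speed                     # grad f . (v / |v|)

print(ddt_at_zero(), chain_rule, per_unit_length)
```

The finite difference matches $\nabla f \cdot v$, and dividing by the speed gives the directional derivative along $v /∣ v ∣$.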

In machine learning

In gradient-based optimization, the “step direction” question is: in which $\hat{u}$ should we update parameters to decrease the loss $J (w)$ fastest? Answer: $\hat{u} = - \nabla J /∣\nabla J ∣$ . The negative gradient direction has directional derivative $- ∣\nabla J ∣$ , the most-negative possible value. See Gradient descent.
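As a sketch of that claim, take a hypothetical two-parameter quadratic loss (not from any particular model) and compare the slope along the negative gradient with the slopes along other unit directions. The step size is also an arbitrary illustrative choice.

```python
import math

def loss(w):
    # Hypothetical loss J(w) = w1^2 + 10 w2^2, for illustration only.
    return w[0]**2 + 10.0 * w[1]**2

def grad_loss(w):
    return (2.0 * w[0], 20.0 * w[1])

w = (1.0, 1.0)
g = grad_loss(w)
g_norm = math.hypot(*g)

# Steepest-descent unit direction and its directional derivative.
u_hat = (-g[0] / g_norm, -g[1] / g_norm)
slope = g[0] * u_hat[0] + g[1] * u_hat[1]      # equals -|grad J|

# Slopes along 360 evenly spaced unit directions, for comparison.
other_slopes = [
    g[0] * math.cos(math.radians(k)) + g[1] * math.sin(math.radians(k))
    for k in range(360)
]

step = 0.01                                    # arbitrary small step length
w_new = (w[0] + step * u_hat[0], w[1] + step * u_hat[1])
print(slope, min(other_slopes), loss(w), loss(w_new))
```

None of the sampled directions has a more negative slope than $- ∣\nabla J ∣$, and a small step along $\hat{u}$ lowers the loss by roughly `step * g_norm`.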

In adaptive methods (Adam, RMSprop, etc.), the effective $\hat{u}$ at each step is a messier function of past gradients and squared gradients, so it generally differs from $- \nabla J /∣\nabla J ∣$. The idea is the same, though: pick a direction along which the loss is expected to decrease locally.

In electromagnetics

The directional derivative shows up wherever you want the rate of change of a field along a specific direction, usually along a path, a boundary, or a streamline. With $E = - \nabla V$ and $\hat{t}$ the unit tangent to the path, the integrand of the line integral $\int_{C} E \cdot d l$ is $E \cdot \hat{t} \, d l = - D_{\hat{t}} V \, d l$. The integral therefore accumulates the (negative) rate of potential change along the path, giving the potential drop from the start of the path to its end.
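A numerical sketch of that accumulation, assuming a hypothetical potential $V = x^{2} + y z$ and a straight path between two made-up endpoints. With $E = - \nabla V$, the integrand $E \cdot \hat{t}$ equals $- D_{\hat{t}} V$, and the midpoint-rule sum should reproduce the potential drop between the endpoints.

```python
import math

def V(x, y, z):
    # Hypothetical potential, chosen only for illustration.
    return x**2 + y * z

def grad_V(x, y, z):
    return (2 * x, z, y)

def line_integral_E(a, b, n=1000):
    # Midpoint-rule approximation of the integral of E . dl along the
    # straight segment a -> b, written as a sum of -D_t V * dl.
    delta = tuple(bi - ai for ai, bi in zip(a, b))
    length = math.sqrt(sum(d * d for d in delta))
    t_hat = tuple(d / length for d in delta)   # unit tangent of the segment
    dl = length / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) / n
        point = tuple(ai + s * d for ai, d in zip(a, delta))
        d_t_V = sum(g * t for g, t in zip(grad_V(*point), t_hat))
        total += -d_t_V * dl
    return total

a = (0.0, 0.0, 0.0)
b = (1.0, 2.0, 3.0)
print(line_integral_E(a, b), V(*a) - V(*b))
```

Both printed numbers agree: the accumulated $- D_{\hat{t}} V$ equals $V$ at the start minus $V$ at the end.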

Idriss Rami — Notes
