If you recall some basic calculus, the gradient of a scalar function $f(x_1,\dots x_n)$ is just the generalization of the derivative:

$$f'(x_1,\dots x_n) =\begin{bmatrix}\frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} $$

And the Hessian of a scalar function $f(x_1,\dots x_n)$ is just the generalization of the second derivative:

$$f''(x_1,\dots x_n) =\begin{bmatrix}\frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_1\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} $$

Why is this interesting? Consider the case where $f$ is quadratic -- then, just as in one dimension $f$ can be written purely in terms of its value, derivative, and second derivative at 0, in several dimensions $f$ can be written purely in terms of its value, gradient, and Hessian at 0.

$$ \begin{align} f(x,y) &= c + (c_1x+c_2y) + (c_{11}x^2 + c_{12}xy + c_{22}y^2) \\ &= f(0) + \left(f_x(0)x+f_y(0)y\right) + \frac12 \left(f_{xx}(0)x^2+2f_{xy}(0)xy+ f_{yy}(0)y^2\right) \\ f(\mathbf{x}) &= f(0) + f'(0)\cdot \mathbf{x} + \frac12\,\mathbf{x}\cdot f''(0) \mathbf{x} \end{align}$$
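Here's a numerical sanity check of this expansion, with an arbitrarily chosen quadratic (the specific coefficients are just for illustration):

```python
import numpy as np

# A hypothetical quadratic f(x, y) = 3 + (2x - y) + (x^2 + 4xy + 5y^2),
# chosen arbitrarily to check the expansion above.
def f(v):
    x, y = v
    return 3 + (2*x - y) + (x**2 + 4*x*y + 5*y**2)

f0 = f(np.zeros(2))                  # value at 0
grad = np.array([2.0, -1.0])         # gradient at 0: (f_x, f_y)
hess = np.array([[2.0, 4.0],         # Hessian at 0: [[f_xx, f_xy],
                 [4.0, 10.0]])       #                [f_yx, f_yy]]

v = np.array([0.7, -1.3])            # an arbitrary test point
taylor = f0 + grad @ v + 0.5 * v @ hess @ v
print(np.isclose(f(v), taylor))      # exact agreement, since f is quadratic
```

Note the factor of $\frac12$ and the doubled cross term: $f_{xy}=f_{yx}$ each contribute half of the $xy$ coefficient.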

What this tells us is:

- The gradient is *naturally thought of* as a linear form.
- The Hessian is *naturally thought of* as a quadratic form.

A what and a what?

There are two ways of thinking of a thing like $\begin{bmatrix}a \\ b \end{bmatrix}$ -- a vector $a\mathbf{e}_1+b\mathbf{e}_2$, or a linear expression $ax_1+bx_2$, a function on $x_1,x_2$. The former is an object in the space $\mathbb{R}^n$, while the latter is a function $\mathbb{R}^n\to\mathbb{R}$ (do you see why?).

Similarly, there are two ways of thinking of a matrix $\begin{bmatrix}a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ -- a linear transformation $\mathbb{R}^n\to\mathbb{R}^n$, or a quadratic expression $a_{11}x_1^2+(a_{12}+a_{21})x_1x_2+a_{22}x_2^2$, which is a function on $x_1,x_2$ -- or, fed two different vectors, a bilinear form, a function $\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ (do you see why?).
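The two readings of the same array can be made concrete with a toy matrix (the entries below are arbitrary, not from anything above):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([0.5, -1.0])

# (1) As a linear transformation R^2 -> R^2: x maps to the vector A x.
image = A @ x

# (2) As a quadratic form R^2 -> R: x maps to the number x . A x,
#     which expands to a11 x1^2 + (a12 + a21) x1 x2 + a22 x2^2.
q = x @ A @ x
expanded = 1.0*x[0]**2 + (2.0 + 3.0)*x[0]*x[1] + 4.0*x[1]**2
print(np.isclose(q, expanded))
```

Same array, two different functions: one eats a vector and returns a vector, the other eats a vector (or two) and returns a number.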

So e.g. the gradient should naturally be thought of as a function that, given some vector as input, gives you the directional derivative in the direction of that vector.

(Make sure you understand this very clearly.)
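A quick numerical illustration of this claim, with an arbitrarily chosen $f$ (the specific function and point are just for the demonstration):

```python
import numpy as np

# f and its gradient, computed by hand for this example.
def f(p):
    x, y = p
    return x**2 * y + np.sin(y)

def grad_f(p):
    x, y = p
    return np.array([2*x*y, x**2 + np.cos(y)])

p = np.array([1.0, 0.5])     # an arbitrary base point
v = np.array([3.0, -2.0])    # an arbitrary direction (not necessarily unit)

# The gradient, used as a linear form on v, gives the directional derivative.
directional = grad_f(p) @ v

# Check against a central finite difference of f along v.
h = 1e-6
numeric = (f(p + h*v) - f(p - h*v)) / (2*h)
print(np.isclose(directional, numeric))
```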

Similarly, the Hessian should be thought of as a function that, given two vectors as input, gives the second derivative in their directions -- e.g. on inputs $\mathbf{e}_1,\mathbf{e}_2$ it gives $f_{xy}$.

(Make sure you understand this VERY clearly.)
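And the corresponding check for the Hessian as a bilinear form, again with an arbitrarily chosen $f$: $\mathbf{v}\cdot f''(p)\,\mathbf{w}$ should be the mixed second derivative, differentiating along $\mathbf{w}$ and then along $\mathbf{v}$.

```python
import numpy as np

# f = x^3 y + x y^2 and its Hessian, computed by hand for this example.
def f(p):
    x, y = p
    return x**3 * y + x * y**2

def hess_f(p):
    x, y = p
    return np.array([[6*x*y,        3*x**2 + 2*y],
                     [3*x**2 + 2*y, 2*x         ]])

p = np.array([1.0, 2.0])
v = np.array([1.0, -1.0])
w = np.array([2.0, 0.5])

# The Hessian, fed two vectors, gives a number: the mixed second derivative.
bilinear = v @ hess_f(p) @ w

# Check against a second-order central difference along v and then w.
h = 1e-4
numeric = (f(p + h*v + h*w) - f(p + h*v - h*w)
           - f(p - h*v + h*w) + f(p - h*v - h*w)) / (4*h**2)
print(np.isclose(bilinear, numeric, atol=1e-4))
```

Feeding it the same vector twice recovers the quadratic form $\mathbf{v}\cdot f''(p)\,\mathbf{v}$, the second derivative of $f$ along the line through $p$ in direction $\mathbf{v}$.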

Now suppose we wanted to talk about the *curvature of a surface*.

We know that the curvature of some curve $\phi(t)$ (parametrized by arc length) at the point $t=0$ is $\phi''(0)$. Naturally, we'd like the "curvature of a surface" to be some sort of function that gives you the curvature in each direction -- that gives you the second derivative in each direction. So naturally, you'd want something like the Hessian.

So we'd like to define some quadratic form $A$ such that $\phi'(0)\cdot A \phi'(0)$ is the curvature $\phi''(0)$. Actually, it should just be the *normal curvature*, the component of $\phi''(0)$ normal to the surface, the sort of curvature that can be attributed entirely to the surface, rather than to the curve wiggling around on the surface.
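Here's a sketch of this in the simplest situation -- assume the surface is a graph $z=g(x,y)$ with $g(0)=0$ and $\nabla g(0)=0$, so that at the origin the normal is $\mathbf{e}_z$ and the quadratic form $A$ is just the Hessian of $g$. For a paraboloid $g=\frac12(k_1x^2+k_2y^2)$ (an arbitrary choice), the normal curvature $\mathbf{v}\cdot A\mathbf{v}$ matches the normal component of $\phi''(0)$ for the curve through the origin in direction $\mathbf{v}$:

```python
import numpy as np

k1, k2 = 3.0, -1.0                   # principal curvatures at the origin
A = np.diag([k1, k2])                # Hessian of g at 0 = the quadratic form A

def g(x, y):
    return 0.5 * (k1 * x**2 + k2 * y**2)

t = 0.7                              # an arbitrary direction in the tangent plane
v = np.array([np.cos(t), np.sin(t)]) # unit tangent vector

# The quadratic form applied to v:
normal_curvature = v @ A @ v

# Compare with the curve phi(s) = (s v1, s v2, g(s v)) on the surface:
# the normal (z) component of phi''(0), by a central difference.
def phi(s):
    return np.array([s*v[0], s*v[1], g(s*v[0], s*v[1])])

h = 1e-4
second_deriv = (phi(h) - 2*phi(0.0) + phi(-h)) / h**2
print(np.isclose(normal_curvature, second_deriv[2], atol=1e-6))
```

This also recovers Euler's formula $k_1\cos^2 t + k_2\sin^2 t$ for the normal curvature in terms of the principal curvatures.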

[For whomsoever it may concern, Theorem 10.4 in your notes is what computes this quadratic form $A$ as the differential of the Gauss map, and is what motivates the Gauss map in the first place. This is why you should start with the last chapter and read backwards.]
