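If you recall some basic calculus, the gradient of a scalar function $f(x_1,\dots,x_n)$ is just the generalization of the derivative:
$$f'(x_1,\dots,x_n)=\begin{bmatrix}\dfrac{\partial f}{\partial x_1}\\ \vdots\\ \dfrac{\partial f}{\partial x_n}\end{bmatrix}$$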
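And the Hessian of a scalar function $f(x_1,\dots,x_n)$ is just the generalization of the second derivative:
$$f''(x_1,\dots,x_n)=\begin{bmatrix}\dfrac{\partial^2 f}{\partial x_1^2}&\cdots&\dfrac{\partial^2 f}{\partial x_1\,\partial x_n}\\ \vdots&\ddots&\vdots\\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1}&\cdots&\dfrac{\partial^2 f}{\partial x_n^2}\end{bmatrix}$$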
Why is this interesting? Consider just $f$ quadratic -- then, just as in one dimension a quadratic can be written only in terms of its value, derivative, and second derivative at 0, $f$ can be written only in terms of its value, gradient and Hessian at 0.
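$$\begin{aligned}
f(x,y) &= c+(c_1x+c_2y)+(c_{11}x^2+c_{12}xy+c_{22}y^2)\\
&= f(0)+\bigl(f_x(0)\,x+f_y(0)\,y\bigr)+\tfrac12\bigl(f_{xx}(0)\,x^2+2f_{xy}(0)\,xy+f_{yy}(0)\,y^2\bigr)\\
f(x) &= f(0)+f'(0)\cdot x+\tfrac12\,x\cdot f''(0)\,x
\end{aligned}$$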
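If you want to see this identity hold with actual numbers, here is a minimal sketch in plain Python. The coefficient values are made up for illustration, and the gradient and Hessian at 0 are written down by hand from the formula for f rather than computed automatically.

```python
# Hypothetical coefficients for
# f(x, y) = c + c1*x + c2*y + c11*x^2 + c12*x*y + c22*y^2
c, c1, c2, c11, c12, c22 = 3.0, -1.0, 2.0, 0.5, 4.0, -1.5

def f(x, y):
    return c + c1*x + c2*y + c11*x*x + c12*x*y + c22*y*y

# Value, gradient and Hessian at the origin, read off by differentiating f by hand.
f0 = c
grad0 = [c1, c2]                        # [f_x(0), f_y(0)]
hess0 = [[2*c11, c12], [c12, 2*c22]]    # [[f_xx, f_xy], [f_yx, f_yy]] at 0

def taylor(x, y):
    """Rebuild f from f(0), f'(0) and f''(0) only."""
    v = [x, y]
    linear = sum(grad0[i] * v[i] for i in range(2))
    quadratic = sum(v[i] * hess0[i][j] * v[j] for i in range(2) for j in range(2))
    return f0 + linear + 0.5 * quadratic

# The reconstruction agrees with f at a few arbitrary points.
for point in [(1.0, 2.0), (-0.5, 3.0), (2.0, -1.0)]:
    assert abs(f(*point) - taylor(*point)) < 1e-9
```

Note the factor 0.5 in front of the quadratic term: differentiating c11 x² twice gives 2 c11, so the Hessian entries are twice the diagonal coefficients.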
What this tells us is:
- The gradient is naturally thought of as a linear form.
- The Hessian is naturally thought of as a quadratic form.
A what and a what?
There are two ways of thinking of a thing like $\begin{bmatrix}a\\ b\end{bmatrix}$ -- a vector $ae_1+be_2$, or a linear expression $ax_1+bx_2$, a function of $x_1,x_2$. The former is an object in the space $\mathbb{R}^n$, while the latter is a function $\mathbb{R}^n\to\mathbb{R}$ (do you see why?).
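Similarly, there are two ways of thinking of a matrix $\begin{bmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{bmatrix}$ -- a linear transformation $\mathbb{R}^n\to\mathbb{R}^n$, or a quadratic expression $a_{11}x_1^2+(a_{12}+a_{21})x_1x_2+a_{22}x_2^2$, which is a function of $x_1,x_2$ and comes from a function $\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ with the same vector fed into both slots (do you see why?).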
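To make the "function" readings concrete, here is a small sketch in plain Python. The helper names `linear_form` and `bilinear_form` and the numbers are invented for this illustration; the point is only that the same list of numbers can be handed back as a function.

```python
def linear_form(a):
    """Read the coefficients a as the function x -> a . x (R^n -> R)."""
    return lambda x: sum(ai * xi for ai, xi in zip(a, x))

def bilinear_form(A):
    """Read the matrix A as the function (u, v) -> u . (A v) (R^n x R^n -> R)."""
    return lambda u, v: sum(
        u[i] * A[i][j] * v[j] for i in range(len(u)) for j in range(len(v))
    )

a = [2.0, -3.0]                    # hypothetical coefficients
A = [[1.0, 2.0], [4.0, 5.0]]       # hypothetical matrix, deliberately not symmetric

ell = linear_form(a)
B = bilinear_form(A)

x = [3.0, 7.0]
# The linear expression a*x1 + b*x2:
assert ell(x) == 2.0 * 3.0 - 3.0 * 7.0
# Feeding the same vector into both slots gives the quadratic expression
# a11*x1^2 + (a12 + a21)*x1*x2 + a22*x2^2:
quadratic_expression = 1.0 * 3.0**2 + (2.0 + 4.0) * 3.0 * 7.0 + 5.0 * 7.0**2
assert B(x, x) == quadratic_expression
```

With two different inputs, `B` picks out individual entries: `B([1, 0], [0, 1])` is a12 and `B([0, 1], [1, 0])` is a21.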
So e.g. the gradient should naturally be thought of as a function that, given some vector as input, gives you the directional derivative in the direction of that vector.
(Make sure you understand this very clearly.)
Similarly, the Hessian should be thought of as a function that, given two vectors as input, gives the second derivative in their directions: feed it the coordinate directions of $x$ and $y$ and you get $f_{xy}$.
(Make sure you understand this VERY clearly.)
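Here is one way to check both statements numerically, as a sketch in plain Python. The function f(x, y) = x²y + y³, the base point p = (1, 2) and the test vectors are arbitrary choices for illustration; the gradient and Hessian at p are worked out by hand, and the "true" directional derivatives are approximated with central finite differences.

```python
def f(x, y):
    return x * x * y + y ** 3

p = (1.0, 2.0)                                      # arbitrary base point

# By hand: f_x = 2xy, f_y = x^2 + 3y^2, f_xx = 2y, f_xy = 2x, f_yy = 6y.
grad_p = (2 * p[0] * p[1], p[0] ** 2 + 3 * p[1] ** 2)       # (4, 13)
hess_p = ((2 * p[1], 2 * p[0]), (2 * p[0], 6 * p[1]))       # ((4, 2), (2, 12))

def gradient_form(v):
    """The gradient as a linear form: one vector in, one number out."""
    return grad_p[0] * v[0] + grad_p[1] * v[1]

def hessian_form(u, v):
    """The Hessian as a bilinear form: two vectors in, one number out."""
    return sum(u[i] * hess_p[i][j] * v[j] for i in range(2) for j in range(2))

def directional_derivative(v, h=1e-5):
    """d/dt f(p + t v) at t = 0, by a central difference."""
    return (f(p[0] + h * v[0], p[1] + h * v[1])
            - f(p[0] - h * v[0], p[1] - h * v[1])) / (2 * h)

def second_directional_derivative(u, v, h=1e-3):
    """d^2/(ds dt) f(p + s u + t v) at s = t = 0, by a central difference."""
    def shifted(s, t):
        return f(p[0] + s * h * u[0] + t * h * v[0],
                 p[1] + s * h * u[1] + t * h * v[1])
    return (shifted(1, 1) - shifted(1, -1)
            - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)

u, v = (1.0, -1.0), (3.0, 4.0)
assert abs(gradient_form(v) - directional_derivative(v)) < 1e-4
assert abs(hessian_form(u, v) - second_directional_derivative(u, v)) < 1e-4
# The coordinate directions recover the familiar mixed partial f_xy(p) = 2x = 2.
assert hessian_form((1.0, 0.0), (0.0, 1.0)) == 2.0
```

The vectors here are not normalized. Both forms are linear in each input, so scaling a direction vector simply scales the answer; normalize first if you want the derivative per unit distance.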
Now suppose we wanted to talk about the curvature of a surface.
We know that the curvature of some curve ϕ(t) at the point t=0 is ϕ″(0) (for a unit-speed parametrization). Naturally, we'd like the "curvature of a surface" to be something like a function that gives you the curvature in each direction -- that gives you the second derivative in each direction. So naturally, you'd want something like the Hessian.
So we'd like to define some quadratic form A such that ϕ′(0)⋅Aϕ′(0) is the curvature ϕ″(0). Actually, it should just be the normal curvature, the component of ϕ″(0) normal to the surface, the sort of curvature that can be attributed entirely to the surface, rather than to the curve wiggling around on the surface.
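A special case where you can see this directly: assume the surface is the graph z = g(x, y) with g(0) = 0 and zero gradient at 0, so the tangent plane at the origin is horizontal and the unit normal there is (0, 0, 1). Under that assumption, A is just the Hessian of g at 0. The sketch below, in plain Python, uses a made-up saddle g(x, y) = (a x² + b y²)/2 and compares the normal component of ϕ″(0), estimated by finite differences along a curve on the surface, against the quadratic form. It is only an illustration of this special case, not the general construction.

```python
import math

a, b = 2.0, -1.0                       # hypothetical saddle coefficients

def g(x, y):
    """Height of the surface z = g(x, y); horizontal tangent plane at 0."""
    return 0.5 * (a * x * x + b * y * y)

A = ((a, 0.0), (0.0, b))               # Hessian of g at the origin
normal = (0.0, 0.0, 1.0)               # unit normal to the surface at the origin

def curve(t, v):
    """Curve on the surface through the origin with phi'(0) = (v1, v2, 0)."""
    x, y = t * v[0], t * v[1]
    return (x, y, g(x, y))

def normal_curvature_from_curve(v, h=1e-3):
    """Normal component of phi''(0), with phi''(0) from a central difference."""
    plus, mid, minus = curve(h, v), curve(0.0, v), curve(-h, v)
    accel = [(plus[k] - 2 * mid[k] + minus[k]) / (h * h) for k in range(3)]
    return sum(accel[k] * normal[k] for k in range(3))

def normal_curvature_from_form(v):
    """phi'(0) . A phi'(0), using only the tangent direction v."""
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

for theta in (0.0, math.pi / 6, math.pi / 4, math.pi / 2):
    v = (math.cos(theta), math.sin(theta))      # unit tangent direction
    assert abs(normal_curvature_from_curve(v)
               - normal_curvature_from_form(v)) < 1e-6
```

Along the x-axis the curvature comes out as a, along the y-axis as b, and in between it is a cos²θ + b sin²θ, so one quadratic form really does package "the curvature in each direction".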
[For whomsoever it may concern, Theorem 10.4 in your notes is what computes this quadratic form A as the differential of the Gauss map, and is what motivates the Gauss map in the first place. This is why you should start with the last chapter and read backwards.]
Hello, great article but could you please give some examples to illustrate this? Cannot really make sense of
"So e.g. the gradient should naturally be thought of as a function that, given some vector as input, gives you the directional derivative in the direction of that vector.
(Make sure you understand this very clearly.)
Similarly, the Hessian should be thought of as a function that, given two vectors as input, gives the second derivative in their directions (Make sure you understand this VERY clearly.)"