Why is the Pythagoras theorem true anyway?

[has been a draft for several years, just publishing it because I don't plan on completing it]


I have a list of things I don't really get in math: 

  1. the Pythagoras theorem 
  2. why Fourier transforms are special among transformations (I mean I get that it converts things into the frequency basis -- aka the eigenbasis of the differentiation operator -- but why does that turn out to be useful in seemingly unrelated settings like solving functional equations and convolutional neural networks) // why the normal distribution is a fixed point of the Fourier transform
  3. the second fundamental theorem of Lie theory aka the Baker-Campbell-Hausdorff formula
  4. Fisher information
  5. why the first two derivatives are all that matter in physics // why position and momentum determine the state
(It's mildly interesting that all these things seem to have a unifying theme of "why is the second order so much more important than anything higher-order?" Perhaps there is a connection between them -- there is probably a connection between 1 and 2, and I think I've read a paper about "deriving physical laws from Fisher information" or something like that, so maybe there's a connection between 4 and 5, and there probably is a connection between 2 and 5.)

The Pythagoras theorem sticks out in this list for being quite elementary, pedagogically. In fact, from a historical perspective, it is the most elementary result in math: in both ancient mathematical traditions, the Indian and the Greek, the Pythagoras theorem was the first mathematical rule to be discovered -- predating (perhaps causing) all other math and science. Yet all its proofs seem like hacks. No, I will not make four copies of that triangle and put them in a square.

One way to think about the Pythagoras theorem is like this: if you line up two segments of lengths $x$ and $y$ end to end, pointing the same way, the area of the square on their "hypotenuse" is $(x+y)^2=x^2+y^2+2xy$. If you rotate one of the segments 180 degrees so that it folds back along the other, the area of the square on their "hypotenuse" is $(x-y)^2=x^2+y^2-2xy$. But if you line them up exactly midway between those two extremes, i.e. perpendicularly, then that area is also midway between them, i.e. $x^2+y^2$.


Of course, this is not good enough motivation either: there is no reason we should be thinking of areas in the first place, and no reason why the length couldn't simply be the average of $x+y$ and $x-y$ (I mean OK, it can't, because that's just $x$, but you get the point) or something like that.

The reason why squares come into the picture, I think, has to do with the equivariance rules underlying geometry -- specifically, scale invariance.

(It makes sense to reason from invariance rules, because fundamentally, geometry is about the behaviour of a space under transformations. Euclidean geometry, for example, is defined by a plane invariant under translations, rotations and reflections and equivariant under scalings -- trigonometry is invariant under scalings too.)

Namely, any length $h$ constructed from $x$ and $y$ must satisfy:

$$h(\alpha x, \alpha y) = \alpha h(x, y)$$

 So, you can imagine, 

Fundamentally I think the Pythagoras theorem is a statement about our notion of "rigid motions". The first-principles way of stating the theorem is -- if you take the point $(x,y)$ and rotate it rigidly until it falls on the $x$-axis, its $x$ co-ordinate will now be $\sqrt{x^2+y^2}$. Indeed any proof of the theorem relies on appeals to what constitutes a rigid motion, e.g. this:


relies on moving the triangles around and accepting that lengths and angles -- whatever they may be -- remain invariant under these motions. In linear algebra these rigid motions are formalized as orthogonal transformations, and you can demonstrate that the "dot product" (which defines lengths and angles) is invariant under them.
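To make the "rotate it until it falls on the $x$-axis" statement concrete, here is a minimal numerical sketch. The helper names `rotate` and `dot` and the sample points are my own choices for illustration. Note that the check is circular in exactly the sense described above: writing a rotation in terms of $\cos$ and $\sin$ already assumes what a rigid motion is.

```python
import math

def rotate(p, theta):
    """Rotate the point p = (x, y) counterclockwise by theta radians."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def dot(p, q):
    """Ordinary dot product of two points in the plane."""
    return p[0] * q[0] + p[1] * q[1]

# Illustrative points, not taken from the post.
p, q = (3.0, 4.0), (-1.0, 2.0)

# Rotate p clockwise by its own polar angle so that it lands on the x-axis.
on_axis = rotate(p, -math.atan2(p[1], p[0]))
print(on_axis)  # close to (5.0, 0.0), up to floating-point error

# The dot product of two points is unchanged when both are rotated together.
theta = 0.7
before = dot(p, q)
after = dot(rotate(p, theta), rotate(q, theta))
print(before, after)
```

The new $x$ co-ordinate agrees with $\sqrt{3^2+4^2}$, and the two dot products agree up to rounding.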

But the fact that real space follows these rules is an empirical fact. The fact that rulers move in a "circle" when you "turn" them is an empirical fact -- it would have been possible to live in a world in which they instead move in a hyperbola (below is a sketch of me turning a ruler in this space), and then we would perceive hyperbolic distance as regular distance and be totally happy to accept that the length of the hypotenuse is $\sqrt{x^2-y^2}$.
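The hyperbolic version of "turning a ruler" can be sketched the same way. This is only an illustration under my own assumptions: the helper names `hyperbolic_rotate` and `interval` are made up, the sample point is arbitrary, and "turning" is taken to mean the standard hyperbolic rotation built from $\cosh$ and $\sinh$.

```python
import math

def hyperbolic_rotate(p, phi):
    """Apply a hyperbolic rotation with parameter phi to the point p = (x, y)."""
    x, y = p
    ch, sh = math.cosh(phi), math.sinh(phi)
    return (ch * x + sh * y, sh * x + ch * y)

def interval(p):
    """The quantity x^2 - y^2, which plays the role of squared length here."""
    return p[0] ** 2 - p[1] ** 2

# Illustrative point with |y| < |x|, not taken from the post.
p = (5.0, 3.0)

# "Turning" p moves it along the hyperbola x^2 - y^2 = 16.
for phi in (0.0, 0.5, 1.0):
    turned = hyperbolic_rotate(p, phi)
    print(turned, interval(turned))

# Turn p until it falls on the x-axis; this needs |y| < |x| for atanh.
on_axis = hyperbolic_rotate(p, -math.atanh(p[1] / p[0]))
print(on_axis)  # close to (4.0, 0.0), up to floating-point error
```

Here the point that lands on the $x$-axis has $x$ co-ordinate $\sqrt{5^2-3^2}$ rather than $\sqrt{5^2+3^2}$.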


Indeed this is already how we perceive spacetime -- and here is a "Minkowskian version" of the above proof of the Pythagoras theorem, from Ron Maimon:

Similarly in the space of probability distributions, we are happy to accept that rulers move like this -- that potatoes are the real circles in information geometry:



