Probabilistic inequalities

Consider a random variable with mean 0 and variance 1 (this is like the "natural units" of second-moment probability). Now this variance puts a value on how dispersed the PDF of the random variable can be: one couldn't, for example, have two Dirac-delta poles really far from the origin, because you can calculate the variance of that, and it's too high.

Which raises the question: for some $k$, what's the maximum fraction of the distribution that can be outside $(-k,k)$?

Ok, one thing is clear: at this maximum, there should be nothing outside $[-k,k]$, because then you could bring inwards without changing the fraction but reducing the variance.

Also, for any value in $(-k,k)$, if you moved it over to the mean (0), the variance would go down. So the distribution must be comprised of just three poles at 0, $-k$ and $+k$, and is necessarily symmetric so that the mean is at 0. Letting $p/2$ be the height of each pole at $k$, the variance in terms of $p$ is $pk^2$. So $pk^2=1$, and $p=1/k^2$. I.e.

$$P\left(\left|X\right|>k\right) \le 1/k^2$$
Or for general mean and variance:

$$P\left(\left|\frac{X-\mu}{\sigma}\right|>k\right) \le 1/k^2$$
This is Chebyshev's inequality, and gives you a limit on how much of the distribution can be some given distance $k$ from the mean. Note how it only becomes interesting for large $k$ ($k>1$).

Well, clearly this approach seems to open up a whole world of similar inequalities. Another is the Markov inequality, which states that (for a nonnegative random variable) no more than $1/k$ of a population can have value more than $k$ times the mean, i.e.

$$P\left(X\ge k\right)\le\frac{\mu}{k}$$
(Justify this with similar reasoning as Chebyshev's.)

In fact, Chebyshev's inequality can be derived as a special case of Markov's (do it).

The point of these inequalities is that means and variances are generally easy to track, even when probability distributions are unknown. Providing bounds on probability fractions based on these is very useful for proving convergence in probability -- for example, the weak law of large numbers becomes elementary with Chebyshev's inequality.