### Probabilistic convergence

The law of large numbers is something we all know, and in our heads it is almost the definition of probability (it's not): "as the number of samples increases, the average of a variable approaches its expected value". I.e. for IID $X_i$ with mean $\mu$:

$$\lim_{n\to\infty}\frac1n \sum_{i=1}^{n}X_i=\mu$$
Let's think about what this statement really says: as you take more and more readings of $X$, the average gets closer and closer to $\mu$. But the values of these readings are inherently probabilistic: this is not an actual sequence of real numbers you can take the limit of. Rather, you are saying that of all the possible realizations (which are sequences of real numbers), almost all of them (probabilistically) converge to $\mu$. Writing $X_n$ for the running average and $X$ for its limit, i.e.

$$\mathrm{Pr}\left[\lim_{n\to\infty}X_n=X\right]=1$$
This is known as almost sure convergence.
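
To make "almost all realizations converge" concrete, here is a minimal sketch that follows a handful of individual realizations of the running average. The choice of Uniform(0, 1) readings, the seeds, and the function name `running_means` are assumptions made for illustration only.

```python
import random

MU = 0.5  # expected value of a Uniform(0, 1) reading


def running_means(n_steps, seed):
    """Return the running averages of n_steps IID Uniform(0, 1) draws."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for n in range(1, n_steps + 1):
        total += rng.random()
        means.append(total / n)
    return means


# Each seed picks out one realization, i.e. one ordinary sequence of real numbers.
paths = [running_means(100_000, seed) for seed in range(5)]
for seed, path in enumerate(paths):
    # Running average after 10, 1000 and 100000 readings.
    print(seed, [round(path[n - 1], 4) for n in (10, 1_000, 100_000)])
```

Every path is a plain deterministic sequence once the seed is fixed, and each one should settle near `MU`. A finite simulation can only suggest this, of course: almost sure convergence is a statement about the whole infinite tail of each realization.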

In general, the thing on the right could've been a random variable, rather than a real number. And here's where some probability theory (read the article) comes in, because the random variables $X_n$ and $X$ need to be defined on the same sample space for this to make sense (i.e. it's not just about the distribution).

But with this, the definition as above still works: as an example, take the sample space $[0,1]$ and a sequence of random variables $X_n$, each of which is 1 on a sub-interval and 0 elsewhere, with the sub-intervals approaching $[0,1/2]$. Then $X_n$ converges almost surely to the random variable that is 1 on $[0,1/2]$ and 0 elsewhere.

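And yes, this works entirely because of the correlations between the $X_n$ and the limit: a random variable with the same distribution as the limit but defined differently on the sample space (say, 1 on $(1/2,1]$ and 0 elsewhere) would not be the limit.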
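
Here is a small sketch of this example. The specific intervals $[0, 1/2 + 1/(n+1)]$ are an assumption (the text only asks that they approach $[0,1/2]$), and `Y` is a hypothetical second random variable with the same distribution as the limit but living on the other half of the sample space.

```python
import random


def X_n(omega, n):
    # Indicator of [0, 1/2 + 1/(n+1)]; an assumed choice of shrinking intervals.
    return 1 if omega <= 0.5 + 1.0 / (n + 1) else 0


def X(omega):
    # The limit: indicator of [0, 1/2].
    return 1 if omega <= 0.5 else 0


def Y(omega):
    # Same distribution as X (1 with probability 1/2), but 1 on (1/2, 1] instead.
    return 1 if omega > 0.5 else 0


rng = random.Random(0)
omegas = [rng.random() for _ in range(10_000)]  # sample points of [0, 1]
n = 10**6

# Fraction of sample points at which X_n already agrees with each candidate limit.
agree_X = sum(X_n(w, n) == X(w) for w in omegas) / len(omegas)
agree_Y = sum(X_n(w, n) == Y(w) for w in omegas) / len(omegas)
print(agree_X, agree_Y)
```

`X` and `Y` are indistinguishable if you only look at their distributions, yet `X_n` agrees with `X` at essentially every sample point and with `Y` at essentially none. That is the sense in which almost sure convergence is about the random variables as functions on a shared sample space.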

In any case, almost sure convergence isn't always the best way to express random variables converging to each other, as you can see. E.g. the central limit theorem, $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_i-\mu}{\sigma}\to N(0,1)$, cannot be phrased in terms of almost sure convergence, because $N(0,1)$ is a distribution, not a random variable.

Indeed, you may have figured that the problem of a random sequence converging to a random variable is somewhat similar to the notion of "functions converging to a function": one may take the cumulative distribution functions $F_n$ of the random variables in the sequence and $F$ of the limit, and discuss their convergence. I.e.

$$F_n(x)\to F(x)$$
at every point $x$ where $F$ is continuous. This is called convergence in distribution.
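
To see convergence in distribution at work on the central limit theorem mentioned earlier, the sketch below compares the empirical CDF of standardized sums against the standard normal CDF at a few points. The Uniform(0, 1) summands, the grid of $x$ values, and all function names are assumptions chosen for illustration.

```python
import math
import random

MU = 0.5                    # mean of Uniform(0, 1)
SIGMA = math.sqrt(1 / 12)   # standard deviation of Uniform(0, 1)


def standardized_sum(n, rng):
    """One draw of (1/sqrt(n)) * sum_i (X_i - MU) / SIGMA for IID uniform X_i."""
    total = sum(rng.random() for _ in range(n))
    return (total - n * MU) / (SIGMA * math.sqrt(n))


def normal_cdf(x):
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def max_cdf_gap(n, trials, rng, grid):
    """Largest |F_n(x) - F(x)| over the grid, with F_n estimated from `trials` draws."""
    samples = [standardized_sum(n, rng) for _ in range(trials)]
    gaps = []
    for x in grid:
        empirical = sum(s <= x for s in samples) / trials
        gaps.append(abs(empirical - normal_cdf(x)))
    return max(gaps)


rng = random.Random(0)
GRID = [-2.0, -1.0, 0.0, 1.0, 2.0]
gap_1 = max_cdf_gap(1, 20_000, rng, GRID)
gap_30 = max_cdf_gap(30, 20_000, rng, GRID)
print(gap_1, gap_30)
```

Notice that nothing here compares individual realizations: only the distribution functions are involved, which is exactly why this notion makes sense even when the "limit" is just a distribution.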

While convergence in distribution does not imply almost sure convergence in general as we've seen, we would expect that it does imply it in the case where the limiting random variable is constant (because then the issue of correlations disappears).

But you may realize that this is not really so: a sequence may look increasingly like something without actually converging to it. For example, think about a sequence like 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1... with an infinite number of 1s, but decreasing in frequency. This doesn't converge to 0. As a deterministic sequence, it would never be expected to, since the positions of the 1s are hardcoded into the generation of the sequence. However, the sequence can also arise as a realization of a sequence of independent random variables $X_n$ that are 1 with probability $1/n$ and 0 otherwise. Then the $X_n$ converge in distribution to 0, but because $\sum 1/n$ diverges, almost every realization contains infinitely many 1s (this is the second Borel-Cantelli lemma), so the realizations almost never (thus in particular don't almost surely) converge to 0.
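
A quick sketch of why the 1s never stop showing up, assuming the $X_n$ are independent (without independence the conclusion can fail). For independent $X_n$, the probability of seeing no 1 at all between positions $N$ and $M$ telescopes to $\prod_{n=N}^{M}(1-1/n)=(N-1)/M$, which goes to 0 as $M$ grows. The window $[100, 10000]$ and the names below are arbitrary choices for illustration.

```python
import random


def has_late_one(first, last, rng):
    """Sample independent X_n ~ Bernoulli(1/n) for first <= n <= last.

    Returns True if at least one of them equals 1.
    """
    return any(rng.random() < 1.0 / n for n in range(first, last + 1))


rng = random.Random(0)
FIRST, LAST, TRIALS = 100, 10_000, 200

# Fraction of simulated realizations with a 1 somewhere in the window.
hits = sum(has_late_one(FIRST, LAST, rng) for _ in range(TRIALS))
fraction = hits / TRIALS

# Exact value from the telescoping product: 1 - (FIRST - 1) / LAST.
exact = 1 - (FIRST - 1) / LAST
print(fraction, exact)
```

So even though each individual $X_n$ with $n \ge 100$ is 1 with probability at most 1%, nearly every realization still has a 1 somewhere in the window, and pushing the window further out does not help.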

So it seems that asking for realizations to almost surely converge to the right thing is a bit too strong for a lot of purposes. A weaker notion of convergence than almost sure convergence can be constructed by considering probabilities of each $X_n$ separately rather than as a sequence: $X_n$ converges to $X$ if each $X_n$ is in the limit almost surely arbitrarily close to $X$. Or more precisely:

$$\lim_{n\to\infty}\mathrm{Pr}\left(\left|X_n-X\right|<\varepsilon\right)=1\quad\text{for every }\varepsilon>0$$
This is known as convergence in probability. Indeed:
1. Almost sure convergence implies convergence in probability (obviously).
2. Convergence in probability implies convergence in distribution (because they are both topological notions of convergence and the map from a random variable to its distribution is continuous).
3. When the limit random variable is constant, convergence in distribution implies convergence in probability.

In fact, the law of large numbers that we stated above (in terms of almost-sure convergence) is the strong law of large numbers, while the weak law of large numbers only states convergence in probability.
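
The weak law can be checked numerically by estimating $\mathrm{Pr}(|\bar X_n-\mu|<\varepsilon)$ separately for each $n$, across many independent realizations, rather than following any single realization. This is a rough sketch: the Uniform(0, 1) readings, $\varepsilon = 0.05$, the trial count, and the name `prob_close` are all assumptions.

```python
import random

MU = 0.5  # mean of Uniform(0, 1)


def prob_close(n, eps, trials, rng):
    """Estimate Pr(|mean of n draws - MU| < eps) from `trials` independent realizations."""
    close = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - MU) < eps:
            close += 1
    return close / trials


rng = random.Random(0)
estimates = {
    n: prob_close(n, eps=0.05, trials=2_000, rng=rng)
    for n in (10, 100, 1_000)
}
for n, p in estimates.items():
    print(n, p)
```

The estimated probabilities should climb towards 1 as $n$ grows. Note that each $n$ uses fresh realizations: convergence in probability only talks about one $X_n$ at a time, never about what happens along a whole sequence.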

Exercise:

Prove Slutsky's lemma: given that $X_n$ converges to $X$ in distribution and $Y_n$ converges to $y$ in probability, where $y$ is a constant random variable:

1. $X_n+Y_n$ converges to $X+y$ in distribution.
2. $X_nY_n$ converges to $Xy$ in distribution.
3. $X_n/Y_n$ converges to $X/y$ in distribution (provided $y\neq 0$).

Why is it necessary that $y$ be a constant?