In the article Random variables as vectors, we discussed how random variables can be seen as vectors, with their covariance as a dot product.
The basic motivation for this idea came from contrasting $\operatorname{Var}(X+X)=4\operatorname{Var}(X)$ with the formula for variables with zero covariance, $\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$. These correspond to the geometric cases of adding two parallel and two perpendicular vectors -- the general case is expressed through the cosine rule. What's the "cosine rule for random variables"?
Well, it's $\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y)$. To me, this -- like the dot product form of the cosine rule -- is highly suggestive of a bilinear form, specifically the Gram matrix, called the covariance matrix, of the random vector $\mathbf{X}=\begin{bmatrix}X\\Y\end{bmatrix}$ (which is really to be seen as a "matrix", since each random variable is itself to be understood as a row vector).
$$\Sigma(X_1,\dots,X_n) = \big[\operatorname{Cov}(X_i,X_j)\big]_{ij}$$
One may compare this Gram matrix interpretation -- $\Sigma=\mathbf{X}\mathbf{X}^T$ (note: not $\mathbf{X}^T\mathbf{X}$, the way we've defined $\mathbf{X}$ -- this is important!) -- to the one-dimensional variance formula $\sigma^2=XX^T$, and realise that the covariance matrix is the "right" measure of the variance of a random vector (note how if we made random variables column vectors, this would all become $\mathbf{X}^T\mathbf{X}$, etc.).
(yeah, yeah, you need to subtract the mean, etc.)
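Here's a quick numpy sketch of both facts -- the "cosine rule" for variances and the covariance matrix as a Gram matrix of mean-subtracted rows. (The data and names here are just illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# two correlated random variables, each a row vector of n samples
X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)

# "cosine rule": Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2 * np.cov(X, Y, bias=True)[0, 1]
print(np.isclose(lhs, rhs))  # True -- this is an exact identity for sample moments

# Gram matrix of the mean-subtracted rows: Sigma = X X^T / n
M = np.stack([X - X.mean(), Y - Y.mean()])  # shape (2, n)
Sigma = M @ M.T / n
print(np.allclose(Sigma, np.cov(X, Y, bias=True)))  # True
```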
Analogously, one may define a cross-covariance matrix $K_{\mathbf{X}\mathbf{Y}}=E\left[(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{Y}-\mu_{\mathbf{Y}})^T\right]$ measuring the covariance between two random vectors.
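A sample-based sketch of the same thing (the variables here are made up), again with rows as the individual random variables:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# X has 2 components, Y has 3; rows are variables, columns are samples
X = rng.normal(size=(2, n))
Y = np.vstack([X[0] + rng.normal(size=n), rng.normal(size=(2, n))])

# K_XY = E[(X - mu_X)(Y - mu_Y)^T], estimated from the samples
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
K_XY = Xc @ Yc.T / n  # shape (2, 3); entry (i, j) estimates Cov(X_i, Y_j)
print(K_XY.round(2))  # top-left entry ~ 1, everything else ~ 0
```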
It is rather natural to see this, being a bilinear form, as related to some notion of distance -- the standard deviation, after all, can be seen as a "natural distance unit" in one dimension (in the sense that the "unlikeliness" of a data point depends on its distance from the mean in units of standard deviation).
Suppose we wish to find the variance along some direction, i.e. the variance of the random variable $u_1X+u_2Y=\hat{u}^T\mathbf{X}$ with $|\hat{u}|=1$ -- this is clearly just $\hat{u}^T\Sigma\hat{u}$. So this defines a natural distance scale in the direction of $\hat{u}$, so that the norm of a vector $\vec{v}$ is defined as:
$$\|\vec{v}\| = \sqrt{\frac{\vec{v}^T\vec{v}}{\hat{v}^T\Sigma\hat{v}}}$$
This is not quite a bilinear form as written, but along the principal axes of $\Sigma$ -- where $\hat{v}$ is an eigenvector with eigenvalue $\lambda$, and the expression reduces to $\vec{v}^T\vec{v}/\lambda$ -- it agrees with the bilinear form that extends this distance scale to all directions:
$$\|\vec{v}\| = \sqrt{\vec{v}^T\Sigma^{-1}\vec{v}}$$
Another way to interpret this: $\Sigma^{-1/2}$ maps the distribution into a spherical one (one with identity covariance matrix), and this norm is just the Euclidean norm of the data point in that spherical distribution -- i.e. the norm adjusted for variances and covariances. This measure of distance is called the Mahalanobis distance.
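A small numpy sketch tying this together, with a made-up $\Sigma$: the directional variance $\hat{u}^T\Sigma\hat{u}$, the Mahalanobis norm via $\Sigma^{-1}$, and the same norm via the whitening map $\Sigma^{-1/2}$:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])  # a made-up covariance matrix

# variance in the direction of a unit vector u_hat is u_hat^T Sigma u_hat
u = np.array([1.0, 1.0])
u_hat = u / np.linalg.norm(u)
print(u_hat @ Sigma @ u_hat)

# Mahalanobis norm of a point v (taking the mean to be 0)
v = np.array([1.0, 2.0])
d = np.sqrt(v @ np.linalg.inv(Sigma) @ v)

# equivalently: whiten with Sigma^{-1/2} (computed via eigendecomposition),
# then take the ordinary Euclidean norm in the resulting spherical distribution
w, Q = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = Q @ np.diag(w**-0.5) @ Q.T
print(d, np.linalg.norm(Sigma_inv_sqrt @ v))  # the two agree
```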