The Winding Number: Classical Statistics, Confidence Regions and Hypothesis tests

The basic idea behind a confidence region is this: given that the true value of some parameter is $\theta$, we may have some mechanism to sample "random regions" $R$ such that 95% of these random regions contain $\theta$.

The first obvious issue is that this mechanism should not depend on $\theta$, as it is not known to us. We want a general experimental mechanism that, for any $\theta$, produces random regions of the same confidence level ("95%").

In some basic cases, this is easy: for example, suppose we have some $X\sim N(\mu, 1)$. Then for any $\mu$, 95% of intervals generated as $X\pm 1.96$ contain $\mu$.
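The coverage claim is easy to check by simulation. The sketch below is only an illustration: the function name `coverage`, the trial count and the seed are my own choices, not part of any standard recipe.

```python
import random

def coverage(mu, n_trials=100_000, half_width=1.96, seed=0):
    """Fraction of intervals X +/- half_width that contain mu, with X ~ N(mu, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = rng.gauss(mu, 1.0)  # one draw of the data
        if x - half_width <= mu <= x + half_width:
            hits += 1
    return hits / n_trials

# The coverage should be close to 0.95 whatever the true mean is.
for mu in (-3.0, 0.0, 42.0):
    print(mu, coverage(mu))
```

The same interval-generating rule is used for every $\mu$; only the simulated data changes.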

The key hint in the example above is that $\mu$ is a location parameter for $X$: the density of $X$ given $\mu$ is a function of $X-\mu$ alone, so the distribution of $X-\mu$ does not depend on $\mu$ and is just $N(0,1)$. We call $X-\mu$ a pivotal quantity.

In general, a pivotal quantity is a function $k(X,\theta)$ of the data and the true parameter value whose distribution is completely specified, i.e. does not depend on $\theta$. A confidence region for $k$ can then hopefully be transformed back into a confidence region for $\theta$ at the same confidence level.
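To see the "transform back" step outside the location case, here is a sketch for a scale parameter. As an assumption for illustration, take $X$ exponential with mean $\theta$, so that $X/\theta$ is exponential with mean 1 and is therefore pivotal. The names `A`, `B`, `interval_for_scale` and `coverage` are hypothetical.

```python
import math
import random

# 2.5% and 97.5% quantiles of the pivot Q = X / theta ~ Exponential(mean 1),
# so that P(A <= Q <= B) = 0.95.
A = -math.log(0.975)
B = -math.log(0.025)

def interval_for_scale(x):
    """Invert A <= x / theta <= B into an interval for theta."""
    return (x / B, x / A)

def coverage(theta, n_trials=100_000, seed=0):
    """Fraction of intervals that contain the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = rng.expovariate(1.0 / theta)  # exponential with mean theta
        lo, hi = interval_for_scale(x)
        hits += lo <= theta <= hi
    return hits / n_trials

for theta in (0.1, 2.0, 50.0):
    print(theta, coverage(theta))
```

The region for the pivot is fixed once and for all; dividing through by it turns each observed $x$ into a region for $\theta$ with the same 95% coverage.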

OK, next question: what is the implied prior of confidence region calculations? I.e. under what prior can the confidence level be interpreted as the probability that the true value of the parameter is contained in the confidence region?

(For a general prior, such a region that gives you some probability of containing the true value of the parameter is called a credible region.)

Well, what exactly is the confidence level? It's the probability that a randomly generated region contains the true parameter value, i.e. before you actually know what the region is. Once you have the concrete region, this probability may change depending on the prior probability of the true parameter value being in it.

In other words, the implied prior is one such that $\theta$ has an equal prior probability of being in any possible confidence region. This is easy to calculate in some specific examples:

If $\theta$ is a location parameter for $X$, the implied prior on $\theta$ is uniform, $\propto 1$.
If $\theta$ is a scale parameter for $X$, the implied prior on $\theta$ is logarithmic, $\propto 1/\theta$.
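The scale-parameter claim can be checked by hand in one concrete case. As an illustrative assumption, take a single observation $X$ that is exponential with mean $\theta$, so that $q = X/\theta$ is pivotal with density $e^{-q}$, and pick $a<b$ with $e^{-a}-e^{-b}=0.95$. The confidence region is then $x/b\le\theta\le x/a$. Under the prior $\propto 1/\theta$ the posterior is

$$p(\theta\mid x) \propto \frac{1}{\theta}\cdot\frac{1}{\theta}e^{-x/\theta},$$

and since $\frac{d}{d\theta}e^{-x/\theta} = \frac{x}{\theta^2}e^{-x/\theta}$,

$$P\left(\frac{x}{b}\le\theta\le\frac{x}{a}\,\middle|\,x\right)=\frac{\int_{x/b}^{x/a}\theta^{-2}e^{-x/\theta}\,d\theta}{\int_0^\infty\theta^{-2}e^{-x/\theta}\,d\theta}=\frac{\left(e^{-a}-e^{-b}\right)/x}{1/x}=e^{-a}-e^{-b}=0.95.$$

So in this case the posterior probability of the realized region equals the confidence level for every observed $x$, which is what the $1/\theta$ prior is supposed to deliver.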

The way that hypothesis testing is first introduced, one talks of things like "the probability of finding a value of $x$ at least as extreme as you did". And one sometimes chooses a "one-sided" hypothesis test and other times a "two-sided" hypothesis test. It should be clear that this isn't too fundamental a concept to be interested in.

Rather, one sensible, more generally appropriate way of thinking of hypothesis tests is in terms of confidence regions. Specifically: testing a null hypothesis is equivalent to asking if it is contained within the confidence region of our data.
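For the $N(\mu,1)$ example this equivalence can be spelled out directly. The sketch below assumes a single observation $x$ and a null value `mu0`; the function names are hypothetical, and `statistics.NormalDist` from the Python standard library supplies the normal CDF and quantiles.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def two_sided_p_value(x, mu0):
    """P(|X - mu0| >= |x - mu0|) when X ~ N(mu0, 1)."""
    return 2 * (1 - Z.cdf(abs(x - mu0)))

def in_confidence_interval(x, mu0, level=0.95):
    """Is mu0 inside the interval x +/- z, with z the two-sided quantile?"""
    half_width = Z.inv_cdf(0.5 + level / 2)
    return abs(x - mu0) <= half_width

# Rejecting at the 5% level and "mu0 falls outside the 95% interval"
# should agree for every observation.
for x in (0.3, 1.5, 2.2, -3.0):
    reject = two_sided_p_value(x, 0.0) < 0.05
    print(x, reject, not in_confidence_interval(x, 0.0))
```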

Obviously, this depends entirely on the shape we choose for our confidence region. We can always just choose a confidence region that includes or excludes our null hypothesis while maintaining the same confidence level.

While it may be disappointing that there is no one way to construct a confidence region, this makes a great deal of sense. For example, consider a multimodal distribution with two distinct peaks.

The sensible confidence region to construct would then be one that contains the bulk of both peaks. "Sensible" here means choosing the confidence region of least total length (you may observe that this criterion is not reparameterization-invariant).
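To make "least length" concrete, here is a rough numerical sketch on a density with two peaks. As a stand-in I assume an equal mixture of $N(-3,1)$ and $N(3,1)$; the grid bounds and function names are likewise my own choices. It compares the shortest 95% region (keep the highest-density grid cells until 95% of the mass is covered) with the single equal-tailed interval.

```python
from statistics import NormalDist

left, right = NormalDist(-3, 1), NormalDist(3, 1)

def pdf(x):
    return 0.5 * left.pdf(x) + 0.5 * right.pdf(x)

def cdf(x):
    return 0.5 * left.cdf(x) + 0.5 * right.cdf(x)

def highest_density_length(level=0.95, lo=-10.0, hi=10.0, n=20_000):
    """Total length of the grid cells with highest density covering `level`."""
    dx = (hi - lo) / n
    dens = sorted((pdf(lo + (i + 0.5) * dx) for i in range(n)), reverse=True)
    mass, cells = 0.0, 0
    for d in dens:
        if mass >= level:
            break
        mass += d * dx
        cells += 1
    return cells * dx

def equal_tailed_length(level=0.95, lo=-10.0, hi=10.0):
    """Length of the single interval that cuts (1 - level) / 2 off each tail."""
    def quantile(p):
        a, b = lo, hi
        for _ in range(60):  # bisection on the mixture CDF
            m = (a + b) / 2
            if cdf(m) < p:
                a = m
            else:
                b = m
        return (a + b) / 2
    tail = (1 - level) / 2
    return quantile(1 - tail) - quantile(tail)

print(highest_density_length(), equal_tailed_length())
```

The highest-density region comes out as two separate pieces, one around each peak, and is noticeably shorter than the single interval, which has to span the low-density valley in between.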

Different constructions of confidence regions are what give you things like two-tailed and one-tailed tests.
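A small sketch of how the choice of region changes the verdict, again assuming a single observation from $N(\mu,1)$. Both regions below have 95% coverage; the function names and the particular observation `x = 1.8` are made up for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def two_sided_region(x, level=0.95):
    """x +/- z: the region behind a two-tailed test."""
    h = Z.inv_cdf(0.5 + level / 2)
    return (x - h, x + h)

def one_sided_region(x, level=0.95):
    """[x - z, infinity): the region behind a one-tailed test."""
    return (x - Z.inv_cdf(level), math.inf)

def contains(region, value):
    lo, hi = region
    return lo <= value <= hi

x, mu0 = 1.8, 0.0
print(contains(two_sided_region(x), mu0))  # True: the null is not rejected
print(contains(one_sided_region(x), mu0))  # False: the null is rejected
```

The same observation keeps the null hypothesis $\mu=0$ under one construction and rejects it under the other, even though the confidence level is 95% in both cases.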

Also read: Choosing the more likely hypothesis by Richard Startz
