The Winding Number: You can't hack Bayes's theorem

Aka the difference between rationalization and rationality.

Eliezer Yudkowsky calls this the dilemma of the clever arguer: a propagandist (the clever arguer) is hired to sell you a box that may or may not contain a diamond -- he tells you that the box has a blue stamp on it (which you know occurs more often on boxes containing diamonds). If you could handle the box yourself, you could rationally evaluate all the characteristics of the box and compile their influences on your probability estimate. Are you then forced, for each argument the clever arguer provides, to helplessly update your probabilities as the clever arguer wishes, even though you know the propagandist has omitted the evidence he doesn't want you to know of?

There are various equivalent formulations of the problem including:

p-hacking
Filtered evidence: an experimenter flips a coin 10 times and tells you that the 4th, 5th and 7th tosses came up heads, without telling you anything about the other tosses.

The key thing to realize is that the information you get is not simply "the 4th, 5th and 7th tosses came up heads", but rather "the propagandist tells me that the 4th, 5th and 7th tosses came up heads". The statistical process we are studying is no longer the "natural" (IID Bernoulli) process we're used to, but a different process, which depends on the inner mechanism used by the propagandist. For example:

If the propagandist always tells you only the 4th, 5th and 7th tosses, you update your beliefs from this evidence as normal.
If the propagandist tells you exactly those tosses that came up heads, then you now know that the other seven tosses came up tails, and you update your beliefs accordingly.
If the propagandist chooses any three heads that came up to tell you about, then the probability of 4, 5 and 7 specifically being chosen is only slightly greater with a heads-biased coin than with a fair one (you can calculate this, but the key point is that the process we're observing is not about which tosses come up heads, but which tosses are chosen by the propagandist).
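To make the three cases concrete, here is a rough sketch that computes the likelihood of the report "tosses 4, 5 and 7 came up heads" under each mechanism, and compares a heads-biased coin against a fair one. The function names and the bias value 0.7 are my own choices for illustration, and the third mechanism assumes the propagandist picks three of the heads uniformly at random (and reports something else if there are fewer than three).

```python
from math import comb

N_TOSSES = 10   # total coin tosses
REPORTED = 3    # tosses the propagandist tells us about (4th, 5th, 7th)


def likelihood_fixed_positions(p):
    """Mechanism 1: tosses 4, 5 and 7 are always the ones reported.

    The report only says that those three tosses were heads.
    """
    return p ** REPORTED


def likelihood_all_heads_reported(p):
    """Mechanism 2: exactly the tosses that came up heads are reported.

    The report implies the other seven tosses were tails.
    """
    return p ** REPORTED * (1 - p) ** (N_TOSSES - REPORTED)


def likelihood_random_three_heads(p):
    """Mechanism 3 (assumed): three heads are picked uniformly at random.

    By symmetry every triple of positions is equally likely to be reported,
    so the likelihood is P(at least three heads) / C(10, 3).
    """
    p_at_least_three = sum(
        comb(N_TOSSES, k) * p ** k * (1 - p) ** (N_TOSSES - k)
        for k in range(REPORTED, N_TOSSES + 1)
    )
    return p_at_least_three / comb(N_TOSSES, REPORTED)


def likelihood_ratio(likelihood, p_biased=0.7, p_fair=0.5):
    """How strongly the report favours the biased coin over the fair coin."""
    return likelihood(p_biased) / likelihood(p_fair)


if __name__ == "__main__":
    for mechanism in (
        likelihood_fixed_positions,
        likelihood_all_heads_reported,
        likelihood_random_three_heads,
    ):
        print(mechanism.__name__, likelihood_ratio(mechanism))
```

The same sentence from the propagandist is decent evidence for a heads-biased coin under the first mechanism, strong evidence against it under the second, and nearly worthless under the third.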

We could have a probability distribution on the various possible mechanisms underlying the propagandist, and then the information we get from our observations is actually split between information on the propagandist's mechanism and information on the coin's mechanism.

Here's a model in which inference is easier to carry out: we have $n$ features $X_i$, independently distributed as $N(\mu, 1)$, with prior $N(0,1)$ on $\mu$, and the propagandist tells us only the value $x$ of the feature $X_j$ that is greatest among all the $X_i$s. The posterior density on $\mu$ can then be calculated:

\[\frac{{{\Phi _\mu }{{(x)}^{n - 1}}{\phi _\mu }(x){\phi _0}(\mu )}}{{\int_\mu {{\Phi _\mu }{{(x)}^{n - 1}}{\phi _\mu }(x){\phi _0}(\mu )\,d\mu } }}\]

where $\phi_\mu$ and $\Phi_\mu$ are the PDF and CDF of $N(\mu, 1)$ respectively. Here's an interactive visualization of this density:

Link to interactive version. (Note that in this visualization, $\mu$ is $x$ and $x$ is $z$.)

For example for $n=10$, $x=1$, the distribution actually shifts leftwards, because surely if $\mu$ were 0, the propagandist could have found a feature with a better value than 1.
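If you'd rather check this numerically than trust the picture, here is a minimal sketch that evaluates the posterior above on a grid and computes its mean. The grid bounds, step count and function names are arbitrary choices of mine, and the integration is a plain Riemann sum, so treat the numbers as approximate.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def posterior_on_grid(x, n, lo=-6.0, hi=6.0, steps=2401):
    """Posterior density of mu, given that the largest of n features equals x.

    Unnormalized density: Phi_mu(x)^(n-1) * phi_mu(x) * phi_0(mu),
    where phi_mu(x) = normal_pdf(x - mu) and Phi_mu(x) = normal_cdf(x - mu).
    """
    step = (hi - lo) / (steps - 1)
    grid = [lo + i * step for i in range(steps)]
    unnormalized = [
        normal_cdf(x - mu) ** (n - 1) * normal_pdf(x - mu) * normal_pdf(mu)
        for mu in grid
    ]
    total = sum(unnormalized) * step  # Riemann-sum approximation of the integral
    density = [u / total for u in unnormalized]
    return grid, density, step


def posterior_mean(x, n):
    """Approximate posterior mean of mu on the grid."""
    grid, density, step = posterior_on_grid(x, n)
    return sum(mu * d for mu, d in zip(grid, density)) * step


if __name__ == "__main__":
    print("n = 1: ", posterior_mean(1.0, 1))   # unfiltered observation
    print("n = 10:", posterior_mean(1.0, 10))  # best of ten features
```

With $n=1$ there is no filtering and the posterior mean is the usual $x/2 = 0.5$; with $n=10$ the same reported value $x=1$ gives a negative posterior mean, i.e. below the prior mean of 0.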

The moral of the story is that you can't hack Bayes's theorem; you can't fool a rational agent. Even if you know the true value of a parameter, you can't systematically produce misinformation about it for an agent who models how you choose what to report. This is a result of the conservation of expected evidence: the expectation of the posterior probability of each value is its prior probability.
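For reference, conservation of expected evidence is a one-line consequence of the law of total probability. Writing $h$ for a hypothesis and $e$ for whatever the propagandist ends up reporting (my notation, and a discrete report for simplicity; the continuous case replaces the sum with an integral):

\[\mathbb{E}_e\left[P(h \mid e)\right] = \sum_e P(e)\,P(h \mid e) = \sum_e P(h, e) = P(h)\]

So if some reports would push your belief in $h$ up, others must push it down by a compensating amount on average; no reporting strategy can be expected to raise your posterior.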

So when we talk about scientific standards -- about scientists revealing all relevant information and not p-hacking, etc. -- these are not requirements for Bayesian inference, but they're simply a way to ensure that scientific research is maximally informative. If you don't know the underlying process that scientists use to report their data, or if you know that they use a "biased" process, then your estimator of the relevant parameter will be less informative than it could have otherwise been.

The method they teach you in primary school to combat filtered evidence -- listing "arguments and counter-arguments", "pros and cons", etc. -- is far inferior to the standard of scientific ethics. Listing arguments and counter-arguments replaces a one-sided rationalization with a two-sided rationalization, but it doesn't truly approach rationality -- you just have two propagandists instead of one.
