Tuesday, February 21, 2012

How likely am I to be misled?

Suppose a hypothesis H is true, and I assign a positive prior probability P(H)>0 to H. I now perform some experiment such that I have a correct estimate of how likely the various possible outcomes of the experiment are given H as well as how likely they are given not-H. (For simplicity, I will suppose all the possible outcomes of the experiment have non-zero probability.) It could turn out that the experiment supports not-H, even though H is true. That's the case of being misled by the outcome of the experiment.

How likely am I to be misled? Perhaps very likely. Suppose that I have a machine that picks a number from 1 to 100, and suppose that the only two relevant hypotheses are H, on which the machine is biased in favor of the number 1, picking it with probability 2/100 and each of the other numbers with probability 98/9900, and not-H, on which all the numbers are equally likely. My experiment is to run the machine once. Suppose that H is in fact true. Then, by Bayes' theorem, if I get the number 1, my credence in H will go up, while if I get anything else, my credence in H will go down. Since my probability of getting something other than 1 is 98/100, very likely I will be misled. But only by a little: the amount of confirmation that a result other than 1 gives to the fairness hypothesis not-H is very small.
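To make the update concrete, here is a minimal Python sketch of the example. The likelihoods are the ones given above; the prior of 1/2 for H is an assumption added purely for illustration.

  # Bayesian update for the number-picking machine, assuming a prior of 1/2 for H.
  p_H = 0.5  # assumed prior for H (not specified in the post)

  # Likelihoods from the example: under H the machine picks 1 with probability
  # 2/100 and each other number with probability 98/9900; under not-H every
  # number has probability 1/100.
  lik_H = {'one': 2 / 100, 'other': 98 / 9900}
  lik_notH = {'one': 1 / 100, 'other': 1 / 100}

  def posterior(outcome):
      """P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|~H)P(~H))."""
      numerator = lik_H[outcome] * p_H
      return numerator / (numerator + lik_notH[outcome] * (1 - p_H))

  print(posterior('one'))    # about 0.667: a 1 raises my credence in H
  print(posterior('other'))  # about 0.497: any other number lowers it, but only slightly

So with probability 98/100 the outcome moves my credence in the wrong direction, but only from 0.5 to roughly 0.497.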

Could one cook up cases where I am likely to be misled to a significant degree?

It turns out that one can prove that one cannot. For we can measure the degree to which I am misled by the Bayes factor B=P(E|H)/P(E|~H), where E is the outcome of the experiment. When B<1, I am misled by the result of the experiment into downgrading my confidence in the truth of H, and the smaller B is, the more I am misled. But it turns out that there is a very elegant inequality saying how likely I am to be misled to any particular degree:

  1. P(B≤y)≤y.
This inequality holds for any experiment I could perform in which all the possible outcomes have non-zero probability, as long as my estimates of the conditional probabilities of the outcomes are correct.
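For the discrete case considered here, inequality (1) follows quickly from the definitions. Here is a sketch of the standard argument, with the probability computed on the supposition that H is true:

  \[
    P(B \le y) \;=\; \sum_{E \,:\, P(E \mid H) \le y\,P(E \mid \neg H)} P(E \mid H)
    \;\le\; y \sum_{E \,:\, P(E \mid H) \le y\,P(E \mid \neg H)} P(E \mid \neg H)
    \;\le\; y .
  \]

The first inequality replaces each P(E|H) in the sum by the no-smaller quantity y·P(E|~H), and the second uses the fact that the P(E|~H) sum to at most 1 over any set of outcomes.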

We can also state the result in terms of probabilities. Suppose I start by assigning probability 1/2 to H. I then perform an experiment and do a Bayesian update. How likely is it that I end up with a posterior probability at or below p? It turns out that this probability is no more than p/(1−p). So, for instance, the probability that an experiment will lead me to assign a probability of 1/10 or less to H is at most 1/9, i.e., about 11%, no matter how carefully the experiment is set up to be unfavorable to H, as long as I have correct estimates of the likelihoods of the experiment's outcomes on H and on not-H.
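The translation between the two formulations is just the odds form of Bayes' theorem. With a prior of 1/2 the prior odds are 1, so the posterior odds equal the Bayes factor:

  \[
    \frac{P(H \mid E)}{P(\neg H \mid E)}
      \;=\; \frac{P(H)}{P(\neg H)} \cdot \frac{P(E \mid H)}{P(E \mid \neg H)}
      \;=\; B .
  \]

Hence the posterior is at or below p exactly when B≤p/(1−p), and (1) then bounds the probability of that event by p/(1−p); for p=1/10 this bound is 1/9, about 11%.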

This has an interesting corollary. We know that it need not be the case that a series of experiments will yield convergence of posterior probabilities to truth, even if we have correct likelihood estimates. For the experiments might get less and less discriminating, and if they do so fast enough, the posteriors for the true hypothesis H will not converge to 1. But could things be so bad that the posteriors would be sure to converge to 0 (i.e., would do so with probability one)? No, it cannot be the case that with probability one the posteriors will converge to 0, because we could aggregate a sequence of experiments into a single large experiment (whose outcome is an n-tuple of outcomes) and use inequality (1).
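To spell out the aggregation argument, let B_n be the Bayes factor of the aggregated experiment consisting of the first n experiments, i.e., the product of the first n individual Bayes factors, with all probabilities computed on the supposition that H is true. If the posteriors converged to 0 with probability one, then, starting from any non-extreme prior, B_n would converge to 0 with probability one, and so for any fixed y with 0<y<1 we would have

  \[
    P(B_n \le y) \;\longrightarrow\; 1 \quad\text{as } n \to \infty ,
  \]

so that P(B_n≤y)>y for all large n. But each aggregated experiment is itself an experiment with correct likelihoods whose possible outcomes have non-zero probability, so (1) applies to it and gives P(B_n≤y)≤y, a contradiction.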

So even though Bayesian convergence doesn't always happen, even with correct likelihoods, we can nonetheless be confident that Bayesian mis-convergence won't happen if we have correct likelihoods.

1 comment:

Alexander R Pruss said...

Inequality (1) (in slightly different form) is proved in Richard Royall's book, Statistical Evidence: A Likelihood Paradigm.