Let’s say that I am in the lab and I am measuring some unknown value U. My best model of the measurement process involves a random additive error E independent of U, with E having some known distribution, say a Gaussian of some particular standard deviation (perhaps specified by the measurement equipment manufacturer) centered around zero. The measurement gives the value 7.3. How should I now answer probabilistic questions like: “How likely is it that U is actually between 7.2 and 7.4?”
Here’s how this is sometimes done in practice. We know that U = 7.3 − E. Then we say that the probability that U is, say, between 7.2 and 7.4 is the same as the probability that E is between −0.1 and 0.1, and we calculate the latter probability using the known distribution of E.
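To make the naive calculation concrete, here is a minimal sketch. The standard deviation of 0.1 is my own assumption for illustration (nothing above fixes it), and `normal_cdf` is just a helper name I made up.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

SIGMA = 0.1      # assumed standard deviation of E (hypothetical value)
reading = 7.3    # the measured value
lo, hi = 7.2, 7.4

# U = reading - E, so lo <= U <= hi exactly when reading - hi <= E <= reading - lo.
e_lo, e_hi = reading - hi, reading - lo

# Naive step: use the *prior* distribution of E, ignoring what the reading tells us.
naive_prob = normal_cdf(e_hi, 0.0, SIGMA) - normal_cdf(e_lo, 0.0, SIGMA)
print(round(naive_prob, 4))  # about 0.68: the interval is one standard deviation either side
```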
But this is an un-Bayesian way of proceeding. We can see that from the fact that we never said anything about our priors regarding U, and for a Bayesian that should matter. Here’s another way to see the mistake: When I calculated the probability that U was between 7.2 and 7.4, I used the prior distribution of E. But to do that neglects data that I have received. For instance, suppose that U is the diameter of a human hair that I have placed between my digital calipers. And the calipers show 7.3 millimeters. What is the probability that the hair really has a diameter between 7.2 and 7.4 millimeters? It’s vanishingly small! That would be just an absurdly large diameter for a hair. Rather, the fact that the calipers show 7.3 millimeters shows that E is approximately equal to 7.3 millimeters. The posterior distribution of E, given background information on human hair thickness, is very different from the prior distribution.
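Here is a rough sketch of the hair example with numbers plugged in. Everything numerical is an assumption of mine: a Gaussian prior for hair diameter centered at 0.07 mm with standard deviation 0.02 mm (crude, since it allows slightly negative diameters), and a Gaussian error with standard deviation 0.1 mm. With a Gaussian prior and Gaussian error, the posterior for U is again Gaussian, given by the standard precision-weighted formula.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def gaussian_posterior(prior_mean, prior_sd, reading, error_sd):
    """Posterior mean and sd of U, given reading = U + E with E ~ N(0, error_sd^2)."""
    prior_prec = 1.0 / prior_sd ** 2   # precision = 1 / variance
    like_prec = 1.0 / error_sd ** 2
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_prec * prior_mean + like_prec * reading)
    return post_mean, math.sqrt(post_var)

PRIOR_MEAN, PRIOR_SD = 0.07, 0.02   # assumed prior for hair diameter in mm (hypothetical)
ERROR_SD = 0.1                      # assumed caliper error sd in mm (hypothetical)
READING = 7.3

post_mean, post_sd = gaussian_posterior(PRIOR_MEAN, PRIOR_SD, READING, ERROR_SD)

# Bayesian answer: posterior probability that U is between 7.2 and 7.4.
post_prob = normal_cdf(7.4, post_mean, post_sd) - normal_cdf(7.2, post_mean, post_sd)

# Naive answer: prior probability that E is between -0.1 and 0.1.
naive_prob = normal_cdf(0.1, 0.0, ERROR_SD) - normal_cdf(-0.1, 0.0, ERROR_SD)

# Since E = READING - U, the posterior mean of E is READING minus the posterior mean of U.
posterior_mean_of_E = READING - post_mean

print(f"naive: {naive_prob:.4f}, posterior: {post_prob:.3g}")
print(f"posterior mean of E: {posterior_mean_of_E:.2f} mm")
```

On these assumptions the naive calculation reports a substantial probability while the posterior probability is numerically indistinguishable from zero, and the posterior for E sits many prior standard deviations away from zero.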
Yet the above is what one does in practice. Can one justify that practice? Yes, in some cases. Generalize a little. Let’s say we measure the value of U to be α, and we want to know the posterior probability that U lies in some set I. Since U = α − E once we know that U + E = α, this probability is:

P(U ∈ I | U + E = α) = P(α − E ∈ I | U + E = α).
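Now suppose that E has a certain maximum range, say, from −δ to δ. (For instance, there is no way that a digital display with four digits can show more than 9999 or less than −9999.) And suppose that U is uniformly distributed over the region from α − δ to α + δ, i.e., its distribution over that region is perfectly flat. In that case, it’s easy to see that E and U + E = α are actually statistically independent: whatever value e between −δ and δ the error takes, the value α − e that U would need to have falls in the flat region, so learning that U + E = α does not favor any value of E over any other. Thus:

P(U ∈ I | U + E = α) = P(α − E ∈ I | U + E = α) = P(α − E ∈ I).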
And so in this case our initial naive approach works just fine.
In the original setting, if for instance we’re completely confident that E cannot exceed 0.5 in absolute value, and our prior distribution for U is flat from 6.8 to 7.8, then the initial calculation that the probability that U is between 7.2 and 7.4 equals the prior probability that E is between −0.1 and 0.1 stands. (The counterexample then doesn’t apply, since in the counterexample we had the possibility, now ruled out, that E is really big.)
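Here is a small numerical check of that claim, as a sketch under my own assumptions: I take E to have a Gaussian shape with standard deviation 0.2, cut off at ±0.5, and a prior for U that is flat from 6.8 to 7.8. The cutoff Gaussian and the helper names are illustrative choices, not anything fixed above.

```python
import math

DELTA = 0.5     # E cannot exceed this in absolute value
SIGMA = 0.2     # assumed width of the error distribution (hypothetical)
READING = 7.3

def error_density(e):
    """Unnormalized density of E: Gaussian shape cut off at +/- DELTA."""
    if abs(e) > DELTA:
        return 0.0
    return math.exp(-0.5 * (e / SIGMA) ** 2)

def prior_density(u):
    """Unnormalized flat prior for U on [READING - DELTA, READING + DELTA]."""
    return 1.0 if READING - DELTA <= u <= READING + DELTA else 0.0

def posterior_unnorm(u):
    """Bayes: posterior density of U is proportional to prior(u) * density of E at READING - u."""
    return prior_density(u) * error_density(READING - u)

def integrate(f, a, b, n=20000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Posterior probability that U is between 7.2 and 7.4.
posterior_prob = (integrate(posterior_unnorm, 7.2, 7.4)
                  / integrate(posterior_unnorm, READING - DELTA, READING + DELTA))

# Prior probability that E is between -0.1 and 0.1.
prior_prob_E = (integrate(error_density, -0.1, 0.1)
                / integrate(error_density, -DELTA, DELTA))

print(f"{posterior_prob:.6f} {prior_prob_E:.6f}")
```

The two numbers should agree up to numerical integration error, which is the flat-prior case in which the naive calculation is exactly right.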
The original un-Bayesian approach basically pretended that U was (per impossibile) uniformly distributed over the whole real line. When U is close to uniformly distributed over a large salient portion of the real line, the original approach kind of works.
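One way to see the "kind of works" is to let the prior get flatter and watch the Bayesian answer approach the naive one. This sketch assumes a Gaussian prior for U centered at 7.0 and a Gaussian error with standard deviation 0.1; both are hypothetical choices of mine, and the posterior uses the standard Gaussian-prior, Gaussian-error update.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def posterior_prob(prior_mean, prior_sd, reading=7.3, error_sd=0.1, lo=7.2, hi=7.4):
    """P(lo <= U <= hi | reading) for a Gaussian prior on U and Gaussian error E."""
    prior_prec = 1.0 / prior_sd ** 2
    like_prec = 1.0 / error_sd ** 2
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_prec * prior_mean + like_prec * reading)
    post_sd = math.sqrt(post_var)
    return normal_cdf(hi, post_mean, post_sd) - normal_cdf(lo, post_mean, post_sd)

# Naive answer: prior probability that E is between -0.1 and 0.1.
naive = normal_cdf(0.1, 0.0, 0.1) - normal_cdf(-0.1, 0.0, 0.1)

PRIOR_MEAN = 7.0  # assumed center of the prior for U (hypothetical)

# Widen the prior step by step; a wider prior is closer to flat near the reading.
results = {sd: posterior_prob(PRIOR_MEAN, sd) for sd in (0.1, 0.5, 2.0, 10.0)}
for sd, p in results.items():
    print(f"prior sd {sd:>5}: posterior {p:.4f} vs naive {naive:.4f}")
```

With a narrow prior the two answers differ a lot; as the prior standard deviation grows, the posterior probability closes in on the naive value.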
The general point goes something like this: As long as the value of E is approximately independent of whether U + E = α, we can approximate the posterior distribution of E by its prior and all is well. In the case of the hair measurement, E was not approximately independent of whether U + E = 7.3: if U + E = 7.3, then E is very likely enormous, whereas I assume that E is not very likely to be enormous otherwise.
This is no doubt stuff well-known to statisticians, but I’m not a statistician, and it’s clarified some things for me.
The naive un-Bayesian calculation I gave at the beginning is precisely the one that I used in my previous post when adjusting for errors in the evaluation of evidence. But an appropriate flatness of prior distribution assumption can rescue the calculations in that post.