Monday, October 19, 2015

Correcting Bayesian calculations

Normally, we take a given measurement to be a sample from a bell-curve distribution centered on the true value. But we have to be careful. Suppose I report to you the volume of a cubical cup. What the error distribution is like depends on how I measured it. Suppose I weighed the cup before and after filling it with water. Then the error might well have the normal distribution we associate with the error of a scale. But suppose instead I measure the (inner) length of one of the sides of the cup, and then take the cube of that length. Then the measurement of the length will be normally distributed, but not the measurement of the volume. Suppose that what I mean by "my best estimate" of a value is the mathematical expectation of that value with respect to my credences. Then it turns out that my best estimate of the volume shouldn't be the cube of the side length, but rather it should be L^3 + 3Lσ^2, where L is the side-length and σ is the standard deviation in the side-length measurements. Intuitively, here's what happens. Suppose I measure the side length at 5 cm. Now, it's equally likely that the actual side length is 4 cm as that it is 6 cm. But 4^3 = 64 and 6^3 = 216. The average of these two equally likely values is 140, which is actually more than 5^3 = 125. So if by best-estimate I mean the estimate that is the mathematical expectation of the value with respect to my credences, the best-estimate for the volume should be higher than the cube of the best-estimate for the side-length. (I'm ignoring complications due to the question whether the side-length could be negative; in effect, I'm assuming that σ is quite a bit smaller than L.)
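The expression for the best estimate of the volume can be checked by expanding the cube. As a sketch, write the true side length as L + ε, where ε (my notation, not part of the original argument) is a normally distributed error with mean 0 and standard deviation σ, so that its odd moments vanish:

```latex
\begin{aligned}
E\big[(L+\varepsilon)^3\big]
  &= L^3 + 3L^2\,E[\varepsilon] + 3L\,E[\varepsilon^2] + E[\varepsilon^3] \\
  &= L^3 + 0 + 3L\sigma^2 + 0 \\
  &= L^3 + 3L\sigma^2 .
\end{aligned}
```

The same expansion covers the 4 cm versus 6 cm illustration: an error of plus or minus 1 cm with equal probability has σ = 1, so with L = 5 the formula gives 125 + 15 = 140.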

There is a very general point here. Suppose that by the best estimate of a quantity I mean the mathematical expectation of that quantity. Suppose that the quantity y I am interested in is given by the formula y=f(x) where x is something I directly measure and where my measurement of x has a symmetric error distribution (errors of the same magnitude in either direction are equally likely). If f is a strictly convex function, then my best estimate for y should actually be bigger than f(x): simply taking my best estimate for x and applying f will underestimate y. On the other hand, if f is strictly concave, then my best estimate for y should be smaller than f(x).
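The general point is an instance of Jensen's inequality, and it can be illustrated with a small sketch. The two-point error (the true value is equally likely to be x - delta or x + delta) and the choice of the square root as the concave example are my assumptions for illustration only.

```python
import math

def expected_value(f, x, delta):
    """Mean of f over two equally likely true values, x - delta and x + delta."""
    return 0.5 * (f(x - delta) + f(x + delta))

def cube(v):
    return v ** 3

# The cube is strictly convex for positive arguments: the mean exceeds f(x).
convex_estimate = expected_value(cube, 5.0, 1.0)

# The square root is strictly concave: the mean falls below f(x).
concave_estimate = expected_value(math.sqrt, 5.0, 1.0)

print(convex_estimate, cube(5.0))
print(concave_estimate, math.sqrt(5.0))
```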

But now let's consider something different: estimating the weight of evidence. Suppose I make a bunch of observations and update in a Bayesian way on the basis of them to arrive at a final credence. Now, it turns out that when you formulate Bayes' theorem in terms of the log-odds-ratio, it becomes a neat additive theorem:

• posterior log-odds-ratio = prior log-odds-ratio + log-likelihood-ratio.
[If p is the probability, the log-odds-ratio is log (p/(1−p)). If E is the evidence and H is the hypothesis, the log-likelihood-ratio is log (P(E|H)/P(E|~H)).] As we keep adding new evidence into the mix, we keep adding new log-likelihood-ratios to the log-odds-ratio. Assuming competency in doing addition, there are two or three sources of error, that is, sources of potential divergence between my actual credences and the rational credences given the evidence. First, I could have stupid priors. Second, I could have the wrong likelihoods. Third, perhaps, I could fail to identify the evidence correctly. Since these errors add up, it's not unreasonable to think that the error in the log-odds-ratio will be approximately normally distributed. (All I will need for my argument is that it has a distribution symmetric around some value.)
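The additive form of the update can be sketched in a few lines. The function names and the sample likelihood ratios below are hypothetical, chosen only to show the bookkeeping; natural logarithms are assumed throughout.

```python
import math

def log_odds(p):
    """Log-odds-ratio log(p / (1 - p)) of a probability p."""
    return math.log(p / (1.0 - p))

def probability(x):
    """Inverse of log_odds; algebraically equal to e^x / (e^x + 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def update(prior, likelihood_ratios):
    """Add one log-likelihood-ratio per piece of evidence to the prior log-odds."""
    x = log_odds(prior)
    for lr in likelihood_ratios:
        x += math.log(lr)  # lr stands for P(E|H) / P(E|~H)
    return probability(x)

# Made-up example: even prior odds and three pieces of evidence.
posterior = update(0.5, [4.0, 0.5, 3.0])
print(posterior)
```

An error in the prior or in any one likelihood ratio enters this running sum as one more additive term, which is why a roughly symmetric total error on the log-odds scale is a natural assumption.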

But as the case of the cubical cup shows, it does not follow that the error in the credence will be normally distributed. If x is the log-odds-ratio and p is the probability or credence, then p = e^x/(e^x+1). This is a very pretty function. It is concave for log-odds-ratios bigger than 0, corresponding to probabilities bigger than 1/2, and convex for log-odds-ratios smaller than 0, corresponding to probabilities less than 1/2, though it is actually fairly linear over a range of probabilities from about 0.3 to 0.7.
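The convexity and concavity claims can be read off the second derivative. A short derivation, using only the definition of p in terms of x:

```latex
\begin{aligned}
p &= \frac{e^x}{e^x+1}, \\
\frac{dp}{dx} &= \frac{e^x}{(e^x+1)^2} = p\,(1-p), \\
\frac{d^2p}{dx^2} &= (1-2p)\,\frac{dp}{dx} = p\,(1-p)\,(1-2p).
\end{aligned}
```

Since p(1−p) is positive, the sign of the second derivative is the sign of 1−2p: positive (convex) when p < 1/2, negative (concave) when p > 1/2, and zero at p = 1/2.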

We can now calculate an estimate of the rational credence by applying the function e^x/(e^x+1) to the log-odds-ratio. This will be equivalent to the standard Bayesian calculation of the rational credence. But as we learn from the cube case, we don't in general get the best estimate of a quantity y that is a mathematical function of another quantity x by measuring x with normally distributed error and computing the corresponding y. When the function in question is convex, my best estimate for y will be higher than what I get in this way. When the function is concave, I should lower it. Thus, as long as we are dealing with small normal error in the log-odds-ratio, when we are dealing with probabilities bigger than around 0.7, I should lower my credence from that yielded by the Bayesian calculation, and when we are dealing with probabilities smaller than around 0.3, I should raise my credence relative to the Bayesian calculation. When my credence is between 0.3 and 0.7, to a decent approximation I can stick to the Bayesian credence, as the transformation function between log-odds-ratios and probabilities is pretty linear there.
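For small error, the size of the adjustment can be sketched with a second-order Taylor expansion. Here ε (my notation) is the error in the log-odds-ratio x, assumed to have mean 0 and standard deviation σ, and p(x) = e^x/(e^x+1), whose second derivative is p(1−p)(1−2p). This is an approximation for small σ only, not an exact formula:

```latex
E\big[p(x+\varepsilon)\big]
  \;\approx\; p(x) + \tfrac{1}{2}\,\sigma^2\,p''(x)
  \;=\; p + \tfrac{1}{2}\,\sigma^2\,p\,(1-p)\,(1-2p).
```

The correction term is positive for p < 1/2 and negative for p > 1/2, and it is small near p = 1/2 because of the factor 1−2p. For instance, with σ = 0.4 and p = 0.01 the approximation gives about 0.0108.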

How much difference does this correction to Bayesianism make? That depends on what the actual normally distributed error in log-odds-ratios is. Let's make up some numbers and plug them into Derive. Suppose my standard deviation in log-odds-ratio is 0.4, which corresponds to an error of about 0.1 in probabilities when around 0.5. Then the correction makes almost no difference: it replaces a Bayesian's calculation of a credence 0.01 with a slightly more cautious 0.0108, say. On the other hand, if my log-odds-ratio standard deviation is 1, which corresponds to a variation of probability of around plus or minus 0.23 when centered on 0.5, then the correction changes a Bayesian's calculation of 0.01 to the definitely more cautious 0.016. But if my log-odds-ratio standard deviation is 2, corresponding to a variation of probability of around plus or minus 0.38 when centered on 0.5, then the correction changes a Bayesian's calculation of 0.01 to 0.04. That's a big difference.
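Figures of this kind can be reproduced without a computer algebra system. The following is a minimal sketch, assuming the error in the log-odds-ratio is exactly normal with mean 0 and taking the corrected credence to be the expectation of e^x/(e^x+1) under that error. The function names, the midpoint-rule integration, and the grid settings are my own choices, not the original calculation.

```python
import math

def logistic(x):
    """Map a log-odds-ratio to a probability; equals e^x / (e^x + 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def corrected_credence(p, sigma, steps=20000, width=10.0):
    """Expected credence when the log-odds equal logit(p) plus N(0, sigma^2) error."""
    x = math.log(p / (1.0 - p))
    h = 2.0 * width / steps
    total = 0.0
    for i in range(steps):
        # Midpoint rule over standard-normal z in [-width, width].
        z = -width + (i + 0.5) * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += logistic(x + sigma * z) * density * h
    return total

for sigma in (0.4, 1.0, 2.0):
    print(sigma, round(corrected_credence(0.01, sigma), 4))
```

Under these assumptions the printed values should land near the figures quoted above, and a credence of 0.5 is left unchanged by symmetry.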

There is an important lesson here. When I am badly unsure of the priors and/or likelihoods, I shouldn't just run with my best guesses and plug them into Bayes' theorem. I need to correct for the fact that my uncertainty about priors and/or likelihoods is apt to be normally (or at least symmetrically about the right value) distributed on the log-odds scale, not on the probability scale.

This could be relevant to the puzzle that some calculations in the fine-tuning argument yield way more confirmation than is intuitively right. (I am grateful to Mike Rota for drawing my attention to this puzzle, in a talk he gave at the ACPA.)