Friday, September 21, 2012

Priors don't wash out

When I was a grad student, I was taught that in Bayesian epistemology the prior probabilities wash out as evidence comes in.

But that's false or at least deeply misleading.

Suppose Sam and Jennifer start with significantly different priors, but have the same relevant conditional probabilities and the same evidence. Then their posterior probabilities will always be significantly different. For instance, suppose Sam and Jennifer start with priors of 0.1 and 0.9 respectively for some proposition p. They then get a ton of evidence, so that Sam's posterior probability is 0.99. But Jennifer's posterior probability will be way higher, about 0.99988. Suppose Sam's is 0.999. Then Jennifer's will be 0.999988. And so on. Jennifer's probabilities will always be way higher.
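The arithmetic behind those numbers can be checked with a minimal sketch. It assumes, as in the example, that both agents update on the same evidence with the same likelihoods, so a single Bayes factor multiplies both agents' odds. The helper names (`odds`, `prob`, `other_posterior`) are made up for illustration.

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)


def prob(o):
    """Convert odds in favor back to a probability."""
    return o / (1.0 + o)


def other_posterior(prior_a, prior_b, posterior_a):
    """Posterior of agent B, given agent A's prior and posterior.

    Assumes A and B share the same evidence and likelihoods, so the
    same Bayes factor applies to both sets of odds.
    """
    bayes_factor = odds(posterior_a) / odds(prior_a)
    return prob(odds(prior_b) * bayes_factor)


# Sam starts at 0.1 and Jennifer at 0.9, as in the example above.
for sam_posterior in (0.99, 0.999):
    jen_posterior = other_posterior(0.1, 0.9, sam_posterior)
    print(sam_posterior, round(jen_posterior, 6))
```

With Sam at 0.99 the Bayes factor is 891, which takes Jennifer's odds from 9 to 8019, a probability of 8019/8020.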

If the difference between 0.999 and 0.999988 seems small, that's because we're using the wrong scale. Notice that Sam assigns about 81 times as much probability to not-p as Jennifer does.

And in fact, if with Turing we measure probabilities with log-odds (log-odds(A) = log (P(A)/(1−P(A)))), then no matter how much shared evidence Sam and Jennifer collect, Jennifer's log-odds for p minus Sam's will always equal about 4.39, the natural log of 81.
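Here is a small sketch of the log-odds picture, assuming each piece of evidence contributes a log Bayes factor that is simply added to the current log-odds. The evidence list is hypothetical (three items favoring p by 4:1 and one favoring not-p by 4:1), and the function names are made up for illustration.

```python
import math


def log_odds(p):
    """Natural-log odds of a probability."""
    return math.log(p / (1.0 - p))


def from_log_odds(x):
    """Inverse of log_odds (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))


def update(prior, log_bayes_factors):
    """Add each log Bayes factor to the prior log-odds."""
    return from_log_odds(log_odds(prior) + sum(log_bayes_factors))


# Hypothetical shared evidence: three items for p, one against.
evidence = [math.log(4), math.log(4), math.log(4), -math.log(4)]

sam = update(0.1, evidence)
jen = update(0.9, evidence)

# The gap in log-odds is fixed by the priors alone.
gap = log_odds(jen) - log_odds(sam)
print(sam, jen, gap, math.log(81))
```

Whatever goes into the evidence list, including evidence pointing in both directions, the same sum is added to both agents' log-odds, so the gap stays at log 81.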

11 comments:

  1. I'm no expert, but does this not only hold if all the evidence comes in in one direction?

    E.g., given priors of Sam 0.1 and Jen 0.9, both receive 3 positives and 1 negative (the maths may be wrong here but I'm more interested in the logic), e.g. 4*0.75, which would leave, assuming equal weighting on each, Sam on 0.63 and Jen on 0.78?

    Like I said, the maths is out since I'm using a straight arithmetic mean, but does the logic hold? I'm not saying it does, I'm genuinely asking because I don't know.

  2. No, we don't evaluate evidence by arithmetic means. Would that it were so simple. Rather, we add log-Bayes-ratios to the log-odds.

  3. Depends on how you measure the difference in probability assignments, I suppose. It seems to me that the most straightforward measure of how different Sam's probability is than Jennifer's is the difference between them (!): Pr_S() - Pr_J(). And this difference goes to zero as more and more common evidence comes in. Doesn't this give us a legitimate sense in which the priors get washed out?

    Of course, we still have the following problem: given any n bits of common evidence that will come in (along with the relevant conditional probabilities), I can always specify a set of prior probability assignments that will not be washed out!

  4. Jonah,

    I don't think the log-odds formulation is best thought of as a way of measuring the difference in probability assignments. Rather, it is a way of measuring the difference in the weight that people assign to their evidence. The issue, then, is that there are multiple ways of measuring the difference in the weight two people place on the same evidence.

    One might take difference in probability assignment as the correct measure, but now that we're comparing weights of evidence, it is not obvious (in the way that it is if you are really looking for a way to compare probability assignments) that difference in probability assignment is the correct measure.

    At least, that's the way I'm reading the point here. Am I way off base?

  5. Jonah:

    I don't think absolute differences are a good way to measure differences in probability assignments. The difference between 0.50 and 0.55 is much smaller than that between 0.95 and 1.

    But bracketing that, there is still the issue that the priors are only washing out insofar as we have converged to 0 or 1. But when our evidence has not yet converged, which could still be the case after a lot of evidence, there will be no washing out.

  6. Jonathan: I'm wanting to emphasize that there is a clear sense in which priors *do* wash out with the accumulation of common evidence. This sense has to do with the difference in people's probability assignments (measured in the simplest way possible, by subtracting one from the other). This difference goes to zero / washes out with the accumulation of common evidence. As far as I can tell, it would only confuse matters to try and explicate this sense of the difference (the one that does get washed out) by using measures of confirmation, support, weight of evidence, etc. But now I'm wondering if I'm missing your point (??)

    Alex: You write, "The difference between 0.50 and 0.55 is much smaller than that between 0.95 and 1." But once we disentangle the notion of difference in probability assignments from that of difference in support, confirmation, or the like, then I don't see any reason why I should accept this. It seems to me that the difference in both cases (difference again in the most simple sense) is exactly the same ... and so it's nice that it is exactly the same! I completely agree with your second comment.

  7. I suppose it's a judgment call as to what the significant kind of difference is.

    Do you think a hydrogen atom is closer in size to my big toe (a difference of about two inches) than I am to someone who is four inches taller than me? :-)

  8. Well yeah. But I'd rather say that the difference in size is about the same -- given that an electron is so small in comparison. Wouldn't you?! In the simplest sense of difference, this is just obviously correct.

  9. Alex: Your post has me scratching my head. I have never understood the point about the "washing out of the priors" to mean that if enough evidence is accumulated, then eventually one will arrive at exactly the same final (posterior) probability regardless of one's starting point (one's prior probabilities).

    Rather, I thought the point of "the priors wash out" is something weaker than that, such as

    * the final probabilities are in the same 'direction' regardless of one's prior probabilities (i.e., the evidence either increases or decreases the probability of some hypothesis)
    * the 'magnitude' of both final probabilities is either above or below 0.5

  10. Jeffery:

    I guess your education was better than mine in this regard.

  11. "If the difference between 0.999 and 0.999988 seems small, that's because we're using the wrong scale." You've got a point there; however, in my line of work we round both numbers to 1.0 for presentations to higher-ups. Keeps them happy. :-)
