## Wednesday, March 21, 2012

### More on interpersonal data consolidation

This expands on an earlier discussion.

You query three distinguished astrobiologists, each of whom is currently doing Mars research, about how likely they think it is that there was once life on Mars. They give you probabilities of 0.85, 0.90 and 0.95, respectively, for there having been life on Mars, and explain the data on which they base their judgments. You find that each of them assigns the correct probability given their data set (hence they are each working with at least somewhat different data), given the same reasonable prior. Moreover, you have no other data about whether there was life on Mars.

What probability should you assign to L, the hypothesis that there was once life on Mars? The intuitive answer is: 0.90. But it turns out that what I told you in the preceding paragraph underdetermines the answer. What I said there is compatible with any probability strictly between zero and one, depending on what sorts of dependencies there are among the data sets on the basis of which the scientists have formed their respective judgments.

For instance, it could be that they have in common a single very strong piece of evidence E0 against L, but that each of them also has an even stronger piece of evidence in favor of L. Moreover, their respective pieces of evidence E+1, E+2 and E+3 in favor of L are conditionally independent of each other and of E0 (on L and on not-L). In such a case, when you consolidate their data, you get E0, E+1, E+2 and E+3. Since each of the E+i is by itself sufficient to significantly undo the anti-L effect of E0, it follows that when you consolidate all four pieces of data (starting with the same prior), you get a very high probability of L, indeed much higher than 0.90. In a case where the evidence against is shared but the evidence for is not, the consolidated probability is much higher than the individual probabilities.
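To make this concrete, here is a minimal Python sketch of the scenario. The numbers are made up for illustration: I give the shared evidence E0 a Bayes factor of 1/100 against L, and then back out the Bayes factor each expert's private evidence must have (from a prior of 1/2) to land them at 0.85, 0.90 and 0.95. Since the pieces of evidence are conditionally independent, the factors simply multiply.

```python
def odds(p):
    """Convert a probability to odds p/(1-p)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

B0 = 1 / 100  # hypothetical Bayes factor of the shared evidence E0 against L
experts = [0.85, 0.90, 0.95]

# From a prior of 1/2 (prior odds 1), expert i's posterior odds are
# B0 * B_plus[i], so their private evidence must have factor odds(p)/B0.
B_plus = [odds(p) / B0 for p in experts]

# Consolidating: E0 counts once, but all three private pieces count.
consolidated = B0
for b in B_plus:
    consolidated *= b

print(prob(consolidated))  # ≈ 0.9999999, far above any individual 0.85–0.95
```

Note that each expert's own posterior is only moderately high because E0 drags it down, yet the pooled evidence overwhelms E0 three times over.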

For another case, it could be that each expert started off with a credence of 1/2 in L, and then each expert had a completely different data set that moved them to their respective probabilities. In this case, when you consolidate, you will once again get a probability significantly higher than any of their individual probabilities, since their data will add up.

On the other hand, if they each have in common a single extremely strong piece of evidence in favor of L but also each have a different strong piece of evidence against L, and we've got the right independence hypotheses, then the result of consolidating their data will be a small probability.
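The mirror-image sketch, again with made-up numbers: give the shared evidence a hypothetical Bayes factor of 100 in favor of L, and back out the private anti-L factor each expert needs to end up at their stated probability from a prior of 1/2.

```python
def odds(p):
    """Convert a probability to odds p/(1-p)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

B0 = 100.0  # hypothetical Bayes factor of the shared evidence for L
experts = [0.85, 0.90, 0.95]

# Expert i's posterior odds are B0 * B_minus[i], so their private
# evidence against L must have factor odds(p)/B0 (each well below 1).
B_minus = [odds(p) / B0 for p in experts]

# Consolidating: the shared pro-L evidence counts once, but all three
# private anti-L pieces count.
consolidated = B0
for b in B_minus:
    consolidated *= b

print(prob(consolidated))  # ≈ 0.088, below every individual probability
```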

Both scenarios I've described are compatible with the setup. And if one assigns numbers appropriately, one can use these two scenarios to generate any consolidated probability strictly between 0 and 1.

The lesson here is that the result of consolidating expert opinions is not just a function of the experts' credences, even if these credences are all exactly right given the evidence the experts have. Consolidation needs to look under the hood of the experts' credences, to see just how much overlap and dependence there is in the evidence on which the experts are basing their views.

We can, however, give one general rule. If the experts are basing their views on entirely independent (given the hypothesis and given its negation) evidence, and each started with a prior credence of 1/2, then the consolidated odds are equal to the product of the odds, where the odds corresponding to a probability p are p/(1−p). (It's a lot easier to do Bayesian stuff in terms of odds or their logarithms.)
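Applying this rule to the three astrobiologists' numbers, on the assumption that their evidence really is fully independent and each started at 1/2:

```python
def odds(p):
    """Convert a probability to odds p/(1-p)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

ps = [0.85, 0.90, 0.95]

# Under full conditional independence and priors of 1/2, odds multiply.
combined_odds = 1.0
for p in ps:
    combined_odds *= odds(p)

print(combined_odds)        # ≈ 969 (= 17/3 × 9 × 19)
print(prob(combined_odds))  # ≈ 0.9990, well above any of 0.85, 0.90, 0.95
```

So in the fully independent case, the consolidated probability is about 0.999, not the intuitive 0.90.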