## Thursday, March 22, 2012

### Aggregating data from agents with the same evidence

Consider two or more rational agents who have, in some sense, the same evidence, but who evaluate the force of that evidence differently, have different priors, and assign different credences to p. Suppose for simplicity that you are a completely undecided agent, with no evidence of your own, rather than one of the people with the evidence (this brackets one of the questions that the disagreement literature is concerned with: whether, if you are one of these agents, you should stand pat). What credence should you assign after aggregating the agents' different credences?

An obvious suggestion is that we average the credences. That suggestion is incorrect, I believe.

The intuition I have is that averaging is the right move to make when aggregating estimates that are likely to suffer from normally distributed errors. But credences do not suffer from normally distributed errors. Suppose the correct credence, given the evidence, is 0.9. The rational agent's credence is not normally distributed around 0.9, since it cannot exceed 1 or fall below 0.

However, once we replace the credences with logarithms of odds, as we have learned to do from Turing, where the log-odds corresponding to a credence p is log (p/(1−p)), then we are dealing with the sorts of additive quantities where we can expect normally distributed error. When we are dealing with log-odds, Bayes' theorem becomes additive:

• posterior-log-odds = prior-log-odds + log-likelihood-ratio.

We can think of the rational agents as having normally distributed errors for their prior log-odds and for their estimate of the evidence's log-likelihood-ratio. (Maybe more can be said in defense of those assumptions.) We idealize, then, by supposing errors to be independent. And in cases where we are dealing with independent normally distributed errors, the best aggregation of the estimates is arithmetic averaging (cf. this post on voting).
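For readers who want to see where the additive form comes from, here is a short derivation. The symbol E for the shared evidence is my own notation for this sketch and does not appear in the post above. Bayes' theorem in odds form says

$$
\frac{P(p \mid E)}{P(\neg p \mid E)} = \frac{P(p)}{P(\neg p)} \cdot \frac{P(E \mid p)}{P(E \mid \neg p)},
$$

and taking logarithms of both sides turns the product into a sum:

$$
\log\frac{P(p \mid E)}{1 - P(p \mid E)} = \log\frac{P(p)}{1 - P(p)} + \log\frac{P(E \mid p)}{P(E \mid \neg p)}.
$$

The left side is the posterior log-odds, the first term on the right is the prior log-odds, and the second is the log-likelihood-ratio.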

If this line of thought works, what we should do is calculate the log-odds corresponding to the agents' credences, average these (somehow weighting by competence, I suppose, if there is competence data), and then calculate the credence corresponding to that average.
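The procedure just described can be sketched in a few lines of Python. This is only an illustration under the stated idealizations, not the code behind any particular calculator; the function names `logit`, `expit` and `aggregate`, and the optional `weights` argument standing in for competence data, are assumptions of this sketch.

```python
import math


def logit(p):
    """Log-odds of a credence p, which must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Credence corresponding to log-odds x (the inverse of logit)."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(credences, weights=None):
    """Average the agents' log-odds and convert the result back to a credence.

    `weights` is a hypothetical stand-in for competence data; with no
    weights given, every agent counts equally.
    """
    if not credences:
        raise ValueError("need at least one credence")
    if weights is None:
        weights = [1.0] * len(credences)
    if len(weights) != len(credences):
        raise ValueError("need exactly one weight per credence")
    total_weight = sum(weights)
    mean_log_odds = sum(w * logit(p) for w, p in zip(weights, credences)) / total_weight
    return expit(mean_log_odds)


# Two agents with credences 0.1 and 0.99, equally weighted.
print(round(aggregate([0.1, 0.99]), 2))
```

Credences of exactly 0 or 1 have infinite log-odds, so the sketch simply fails on them rather than trying to handle them.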

This method handles symmetry cases just as ordinary averaging does. If one agent says 0.9 and another says 0.1, then we get 0.5, as we should.

But this method of aggregation yields significantly different results when some of the credences are close to 0 or 1. Suppose we have two agents with credences 0.1 and 0.99. The arithmetic average would be 0.55. But this method recommends 0.77. Suppose we have three agents with credences 0.1, 0.1 and 0.99. The arithmetic average would be 0.40. But our aggregation method yields 0.52. On the other hand, if we have credences 0.02 and 0.8, the arithmetic average would be 0.41, but we get 0.22. All this is correct, under the assumption of normally distributed errors in log-odds.
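These figures can be checked by hand, since averaging log-odds amounts to taking the geometric mean of the odds. Writing o for odds (my notation for this check), credences 0.1 and 0.99 have odds 1/9 and 99, so

$$
o = \sqrt{\tfrac{1}{9} \cdot 99} = \sqrt{11} \approx 3.317, \qquad \frac{o}{1 + o} \approx 0.768.
$$

For 0.1, 0.1 and 0.99:

$$
o = \sqrt[3]{\tfrac{1}{9} \cdot \tfrac{1}{9} \cdot 99} = \sqrt[3]{\tfrac{11}{9}} \approx 1.069, \qquad \frac{o}{1 + o} \approx 0.517.
$$

And for 0.02 and 0.8, with odds 1/49 and 4:

$$
o = \sqrt{\tfrac{1}{49} \cdot 4} = \tfrac{2}{7}, \qquad \frac{o}{1 + o} = \tfrac{2}{9} \approx 0.222.
$$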

If you want to play with this, I made a simple credence aggregation calculator.

This method, thus, accords greater weight to those who are more certain, in either direction. Therefore, the method suffers from the same manipulation problem that the corresponding voting method does. The method will produce terrible results when applied to agents who significantly overestimate probabilities close to 1 or underestimate probabilities close to 0—or when they lie about their credences. That's why I am only advertising this method in the case of rational agents. How useful this is in real life is hard to say. It could be that one just needs to adapt the method by throwing out fairly extreme credences, just as one throws out outliers in science, by taking them to be evidence of credences not formed on the basis of evidence (this need not be pejorative—I am not an evidentialist).
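To see how severe the manipulation problem can be, here is a small sketch. The trimming rule and its thresholds (`lo=0.001`, `hi=0.999`) are purely hypothetical choices for illustration, as are the function names; nothing above fixes where the cutoff for "fairly extreme" should lie.

```python
import math


def logit(p):
    """Log-odds of a credence p, which must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def aggregate(credences):
    """Unweighted average in log-odds space, mapped back to a credence."""
    mean_log_odds = sum(logit(p) for p in credences) / len(credences)
    return 1.0 / (1.0 + math.exp(-mean_log_odds))


def aggregate_trimmed(credences, lo=0.001, hi=0.999):
    """Discard credences outside [lo, hi], then aggregate the rest.

    The thresholds are hypothetical; they are an assumption of this sketch.
    """
    kept = [p for p in credences if lo <= p <= hi]
    if not kept:
        raise ValueError("every credence was discarded as extreme")
    return aggregate(kept)


# Nine agents at 0.2 and one agent reporting near-certainty.
reports = [0.2] * 9 + [1 - 1e-12]

print(round(aggregate(reports), 2))          # the single extreme report dominates
print(round(aggregate_trimmed(reports), 2))  # with the extreme report discarded
```

A single near-certain report is enough to pull the aggregate of nine agents at 0.2 above 0.8, while the trimmed version returns 0.2. Of course, trimming also discards the information carried by an agent whose near-certainty really is based on evidence.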

There is, I think, an interesting lesson here that parallels a lesson I drew out in the voting case. In aggregating credences, just as in aggregating votes, we have two desiderata: (1) extract as much useful information as we can from the individual agent data, and (2) not allow individual non-rational or non-team-player agents to manipulate the outcome unduly. These two desiderata are at odds with each other. How far we can trust other agents not to be manipulative affects social epistemology just as it does voting.

But here is a happy thought for those of us who (like me) have high credences in various propositions that are dear to us and where those credences are, we think, evidence-based. For then we get to outvote, in the court of our own minds (for our friends may dismiss us as outliers), more sceptically oriented friends. Let's say my credence that it's objectively wrong to torture those known to be innocent is 0.99999999, but I have two colleagues who incline to irrealism, and hence assign 0.1 to this claim. Even if I accord no greater weight to my own opinions, I still end up with an aggregate credence of 0.99.