Let me qualify what I'm going to say by saying that I know next to nothing about the voting literature.

It's time for admissions committees to deliberate. But Arrow's Theorem says that once there are more than two options, no ranked voting method can satisfy a short list of reasonable-looking fairness conditions all at once.

In some cases, however, there is a simple voting method that, with appropriate assumptions, is provably optimal. The method is simply to have each voter estimate a voter-independent utility of every option, and then to average these estimates, and choose the option with the highest average. By a "voter-independent utility", I mean a utility that does not vary from voter to voter. This could be a global utility of the option or it could be a utility-for-the-community of the voter or even a degree to which a certain set of shared goals are furthered. In other words, it doesn't have to be a full agent-neutral utility, but it needs to be the case that the voters are all estimating the same value—so it can depend on the group of voters as a whole.

Now if we are instead to choose *n* non-interacting options (i.e., the utilities of the options are additive), then we just choose the *n* with the highest averages. Under some assumptions, these simple methods are optimal. The assumptions are onerous, however.
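Here is a minimal sketch of the averaging method in Python. The function names, the dictionary layout of the ballots, and the sample numbers are all made up for illustration. It assumes every evaluator scores every option on the same scale.

```python
from statistics import mean


def average_scores(ballots):
    """ballots: list of dicts mapping option -> one evaluator's utility estimate.

    Assumes every evaluator has scored every option.
    """
    options = ballots[0].keys()
    return {o: mean(b[o] for b in ballots) for o in options}


def choose_top(ballots, n=1):
    """Return the n options with the highest average estimate, best first."""
    averages = average_scores(ballots)
    return sorted(averages, key=averages.get, reverse=True)[:n]


# Three hypothetical evaluators scoring three hypothetical candidates.
ballots = [
    {"Ann": 7.0, "Bob": 5.0, "Cyd": 6.0},
    {"Ann": 6.0, "Bob": 8.0, "Cyd": 5.0},
    {"Ann": 8.0, "Bob": 5.0, "Cyd": 4.0},
]

print(choose_top(ballots, n=1))  # single winner
print(choose_top(ballots, n=2))  # two non-interacting slots
```

With these numbers the averages are 7, 6 and 5, so the single-winner choice is Ann and the two-slot choice is Ann and Bob.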

Voting theory, as far as I can tell, is usually conducted in terms of *preferences* between options. In political elections, many people's preferences are probably agent-centered: people are apt to vote for candidates they think will do more for them and for those they take to be close to them. In situations like that, the simple method won't work, because people aren't estimating voter-independent utilities but agent-centered utilities.

But there are cases where people really are doing something more like estimating voter-independent utilities. For instance, take graduate admissions or hiring. The voters there really are trying to optimize something like "the objective value of choosing this candidate or these candidates", though of course their deliberations suffer from all sorts of errors.

In such cases, instead of thinking of the problem as a preference reconciliation problem, we can think of it as an estimation problem. We have a set of unknown quantities, *the values of the options*. If we knew what these quantities are, we'd know what decision to take: we'd go for the option(s) with the highest values. Instead, we have a number of evaluators who are each trying to estimate this unknown. Assume that each evaluator's estimate of the unknown quantity simply adds an independent random error to the quantity, and that the error is normally distributed with mean zero. Assume, further, that either the variances of the normal errors are the same between evaluators or that our information about these variances is symmetric between the evaluators (thus, we may know that evaluators are not equally accurate, but we don't know which ones are the ones who are more accurate). Suppose that I have no further relevant information about the differences in the values of the options besides the evaluators' estimates, and so I have the same prior probability distribution for the value of each option (maybe it's a pessimistic one that says that the option is probably bad).

Given all of the above information, I now want to choose the option that maximizes, with respect to my epistemic probabilities, the expected value of the option. It turns out by Bayes' Theorem together with some properties of normal random variables that the expected value of an option *o*, given the above information, can be written *A* *a*₀ + *B* *a*(*o*), where *a*₀ is the mean of my baseline (prior) estimate for all the options and *a*(*o*) is the average of the evaluators' evaluations of *o*, and where both *A* and *B* are positive and are the same for every option. Since the expected value is then an increasing function of *a*(*o*), it follows that under the above assumptions, if I am trying to maximize expected value, choosing the option(s) with the highest value of *a*(*o*) is provably optimal.
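To make the weights *A* and *B* concrete, here is a sketch of one special case. It assumes, beyond what is said above, that my prior for each option's value is itself normal with mean *a*₀ and variance `prior_var`, and that every evaluator's error has the same known variance `error_var`. Under those assumptions the standard normal-normal update gives the posterior mean as a precision-weighted average. The function and parameter names are hypothetical.

```python
def posterior_weights(k, prior_var, error_var):
    """Weights (A, B) on the prior mean and on the evaluators' average.

    Assumes a normal prior with variance prior_var and k independent
    evaluations, each with normal error of variance error_var.
    """
    prior_precision = 1.0 / prior_var
    data_precision = k / error_var
    total = prior_precision + data_precision
    return prior_precision / total, data_precision / total


def expected_value(a0, evaluations, prior_var, error_var):
    """Posterior mean of an option's value: A * a0 + B * a(o)."""
    A, B = posterior_weights(len(evaluations), prior_var, error_var)
    a_o = sum(evaluations) / len(evaluations)
    return A * a0 + B * a_o


# Hypothetical numbers: pessimistic prior mean 2, four evaluators.
print(posterior_weights(4, prior_var=1.0, error_var=4.0))
print(expected_value(2.0, [5.0, 7.0, 6.0, 6.0], prior_var=1.0, error_var=4.0))
```

In this special case *A* and *B* depend only on the number of evaluators and the two variances, not on the option. So as long as every option gets the same number of evaluations, ranking by expected value is the same as ranking by *a*(*o*).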

Now there are some serious problems here, besides the looming problem that the whole business of numerical utilities may be bankrupt (which I think in some cases isn't so big an issue, because numerical utilities can be a useful approximation in some cases). One of them is that one evaluator can skew the evaluations by assigning such enormous utilities to the candidates that her evaluations swamp everyone else's data. The possibility of such an evaluator violates my assumption that each person's evaluation is equal to the unknown plus an error term centered on zero. Such an evaluator is either really stupid, or dishonest (i.e., not reporting her actual estimates of utilities). This problem by itself is enough to ensure that the method can't be used except in a community of justified mutual trust.

A second serious problem is that we're not very good at making absolute utility judgments, and are probably better at rank ordering. The optimality condition requires that we work with utilities rather than rank orderings. But in a case where the number of options is largish—admissions and hiring cases are like that—if we assume that value is normally distributed in the option pool, we can get an approximation to the utilities from an evaluator's rank ordering of the *n* options. One way to do this is to use the rank ordering to assign estimated percentile ranks to each option, and then convert them to one's best estimate of the normally distributed value (maybe this can just be done by applying the inverse normal cumulative distribution function—I am not a statistician). Then average these between evaluators. Doing this also compensates for any affine shift, such as that due to the exaggerating evaluator in the preceding paragraph. I can't prove the optimality of this method, and it is still subject to manipulation by a dishonest evaluator (say, one who engages in strategic voting rather than reporting her real views).
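A rough sketch of the rank-based variant follows. The midpoint convention for turning a rank into a percentile, (rank position minus one half) divided by the number of options, is my own assumption and only one of several options. The function names and sample rankings are hypothetical. It uses `NormalDist.inv_cdf` from Python's standard library as the inverse normal CDF.

```python
from statistics import NormalDist, mean


def normal_scores(ranking):
    """ranking: list of options, best first. Returns option -> estimated z-score."""
    n = len(ranking)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = {}
    for position, option in enumerate(ranking):
        # Midpoint percentile of this rank slot; the best option gets the highest.
        percentile = (n - position - 0.5) / n
        scores[option] = std_normal.inv_cdf(percentile)
    return scores


def aggregate(rankings):
    """Average each option's z-score across evaluators."""
    per_evaluator = [normal_scores(r) for r in rankings]
    return {o: mean(s[o] for s in per_evaluator) for o in rankings[0]}


# Three hypothetical evaluators ranking three options, best first.
rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
combined = aggregate(rankings)
print(max(combined, key=combined.get))
```

Because every evaluator's scores come from the same fixed set of percentiles, no single evaluator can swamp the others by inflating her numbers. Strategic misranking is still possible, though.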

I think the above can also work under some restrictive assumptions even if the evaluators are evaluating value-for-them rather than voter-independent value.

The basic thought in the above is that in some cases instead of approaching a voting situation as a preference situation, we approach it as a scientific estimation situation.

## 4 comments:

"I can't prove the optimality of this method"

That's because computer simulation shows this modified method is worse than Condorcet when the number of options is small (say, 3-10), though it's better than Condorcet when the number is bigger (say 30 or more), where I am only comparing against Condorcet when there is a Condorcet winner.

This is intriguing and I can see a version of it being used in some mutually-trusting situations. However, a cautionary tale:

I regularly give my students an example of me doing a bad action (not buckling my son in his carseat before driving a few quiet blocks) and ask them to estimate, on a scale of 0-10, how bad the action is. 0 means not bad at all and 10 is Hitler. I regularly get numbers around 3 and sometimes up to 8.

This shows either (a) people are really bad at estimating values on a scale, or (b) I need a very deep moral reformation.

Yeah, this is part of why ordinal rankings are probably better. It's a lot easier to see that your action is less bad than Hitler's paradigmatic bad actions than to see how much less bad it is.

I wonder if it would help to normalize your scale if you gave a particular example for what 1 means (say, your coming 15 seconds late to class because you were texting a friend for a trivial reason).

I wonder if something that may throw your students off is that some Protestants seem to think that all sins are equal (an invalid inference from the (also false) claim that all sins deserve hell). People who are in the grip of that idea then think your action and Hitler's are equal. But they can't get themselves to utter that absurdity (plus, it's rude to tell your teacher that they're as bad as Hitler), so they fudge and give you an 8, even though their theology implies you should have a 10. Just speculation.

I don't think the "all sins are equal" business is driving any/much of the results.

It did occur to me that perhaps people tend to use a log scale. Those are intuitive for data distributions with a lot of points on one end and diminishing points as you go up a scale.
