In my last two posts (here and here), I introduced the notion of an experiment being epistemically at least as good as another for a set of questions. I then announced a characterization of when this happens in the special case where the set of questions consists of a single binary (yes/no) question and the experiments are themselves binary.
The characterization was as follows. A binary experiment will result in one of two posterior probabilities for the hypothesis that our yes/no question concerns, and we can form the “posterior interval” between them. It turns out that one experiment is at least as good as another provided that the first one’s posterior interval contains the second one’s.
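Here is a minimal sketch of that interval test, assuming each binary experiment is described by the probability of a "positive" result given the hypothesis and given its negation. The function names and that parametrization are my own illustrative choices, and the sketch assumes both results have non-zero probability.

```python
def posterior_interval(prior, p_pos_given_h, p_pos_given_not_h):
    """Return (low, high): the two possible posteriors for H, sorted.

    Assumes both results of the experiment have non-zero probability.
    """
    # Total probability of the positive result.
    p_pos = prior * p_pos_given_h + (1 - prior) * p_pos_given_not_h
    p_neg = 1 - p_pos
    # Bayes' theorem for each of the two possible results.
    post_pos = prior * p_pos_given_h / p_pos
    post_neg = prior * (1 - p_pos_given_h) / p_neg
    return (min(post_pos, post_neg), max(post_pos, post_neg))


def interval_contains(outer, inner, tol=1e-12):
    """True if the interval `outer` contains the interval `inner`."""
    return outer[0] <= inner[0] + tol and inner[1] <= outer[1] + tol


# Hypothetical numbers: a sharp experiment and a blurrier one, same prior.
sharp = posterior_interval(0.5, 0.9, 0.1)
blurry = posterior_interval(0.5, 0.6, 0.4)
print(interval_contains(sharp, blurry))
print(interval_contains(blurry, sharp))
```

With these made-up numbers the sharp experiment's interval runs from about 0.1 to 0.9 and the blurry one's from about 0.4 to 0.6, so the first contains the second but not conversely.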
I then noted that I didn’t know what to say about non-binary questions (e.g., “How many mountains are there on Mars?”) when the experiments are still binary. Well, with a bit of thought, I think I now have it, and it’s almost exactly the same. A binary experiment now defines a “posterior line segment” in the space of probabilities, joining the two possible credence outcomes. (In the case of a probability space with a finite number n of points, the space of probabilities can be identified with the set of points in n-dimensional Euclidean space all of whose coordinates are non-negative and add up to 1.) A bit of thought about convex functions makes it pretty obvious that E2 is at least as good as E1 if and only if E2’s posterior line segment contains E1’s posterior line segment. (The necessity of this geometric condition is easy to see: consider a convex function that is zero everywhere on E2’s posterior line segment but non-zero on one of E1’s two possible posteriors, and use that convex function to generate the scoring rule.)
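The geometric condition can be checked numerically. What follows is only a sketch under assumptions of my own: a finite probability space, a binary experiment given as a list of probabilities of a "positive" result in each state, and hypothetical function names. It computes the two posteriors of each experiment and tests whether both endpoints of one segment lie on the other.

```python
def posteriors(prior, lik_pos):
    """Return the two posterior distributions of a binary experiment.

    prior[i] is the prior of state i; lik_pos[i] is the probability of a
    positive result in state i. Assumes both results are possible.
    """
    p_pos = sum(p * l for p, l in zip(prior, lik_pos))
    post_pos = [p * l / p_pos for p, l in zip(prior, lik_pos)]
    post_neg = [p * (1 - l) / (1 - p_pos) for p, l in zip(prior, lik_pos)]
    return post_pos, post_neg


def on_segment(point, a, b, tol=1e-9):
    """True if `point` lies on the line segment from `a` to `b`."""
    d = [bi - ai for ai, bi in zip(a, b)]
    v = [pi - ai for ai, pi in zip(a, point)]
    dd = sum(x * x for x in d)
    if dd <= tol:  # degenerate segment: a and b coincide
        return all(abs(x) <= tol for x in v)
    # Project onto the segment's direction, then check the leftover part.
    t = sum(x * y for x, y in zip(v, d)) / dd
    residual = [vi - t * di for vi, di in zip(v, d)]
    return -tol <= t <= 1 + tol and all(abs(r) <= tol for r in residual)


def segment_contains(outer, inner, tol=1e-9):
    """True if both endpoints of `inner` lie on the segment `outer`."""
    return all(on_segment(p, outer[0], outer[1], tol) for p in inner)


# Hypothetical three-state example with a uniform prior.
prior = [1 / 3, 1 / 3, 1 / 3]
e2 = posteriors(prior, [0.9, 0.5, 0.1])
# e1 reports e2's result but flips it 20% of the time (a noisy copy of e2).
e1 = posteriors(prior, [0.74, 0.5, 0.26])
# e3 is sensitive to the states in a different pattern.
e3 = posteriors(prior, [0.9, 0.1, 0.5])

print(segment_contains(e2, e1))
print(segment_contains(e2, e3))
print(segment_contains(e3, e2))
```

In this made-up example the noisy copy's segment sits inside the original's, while the third experiment's segment points in a different direction, so neither it nor E2 contains the other.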
This is a pretty hard condition to satisfy. The two experiments have to be pretty carefully gerrymandered to make their posterior line segments parallel, much less to make one a subset of the other. I conclude that when one’s interest is in more than just one binary question, one binary experiment will not be overall better than another except in very special cases.
Recall that my notion of “better” quantified over all proper scoring rules. I guess the upshot of this is that interesting comparisons of experiments are relative not only to a set of questions but also to a specific proper scoring rule.