Suppose we have a probability space Ω with an algebra F of events and a distinguished subalgebra H of events on Ω. My interest here is in accuracy H-scoring rules, which take a (finitely-additive) probability assignment p on H and assign to it a score function s(p) on Ω, with values in [−∞, M] for some finite M, subject to the constraint that s(p) is H-measurable. I will take the score of a probability assignment to represent the epistemic utility or accuracy of p.
For a probability p on
F, I will take the score of
p to be the score of the
restriction of p to H. (Note that any finitely-additive
probability on H extends to a
finitely-additive probability on F by the Hahn–Banach theorem, assuming
Choice.)
The scoring rule s is proper provided that E_p s(q) ≤ E_p s(p) for all p and q, and strictly proper if the inequality is strict whenever p ≠ q. Propriety says that one never expects a different probability from one's own to have a better score (if one did, wouldn't one have switched to it?).
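For a concrete check (my example, not part of the definition): on a two-point Ω, identify a probability with the number it assigns to one designated point, and let s be the quadratic (Brier) accuracy score. A quick numerical sketch that E_p s(q) is then uniquely maximized at q = p:

```python
# Strict propriety check for the Brier accuracy score on a two-point Omega.
# s(q) assigns score -(1-q)^2 at the designated point and -q^2 at the other.

def expected_score(p, q):
    """E_p s(q): the p-expected score of announcing probability q."""
    return p * -(1 - q) ** 2 + (1 - p) * -q ** 2

qs = [i / 1000 for i in range(1001)]
for p in [0.1, 0.3, 0.5, 0.9]:
    best = max(qs, key=lambda q: expected_score(p, q))
    # The maximizer of E_p s(q) over q should be q = p (up to grid resolution).
    assert abs(best - p) < 1e-9, (p, best)
print("E_p s(q) is uniquely maximized at q = p: the Brier score is strictly proper")
```

The grid search finds the announced probability q that p expects to score best; for a strictly proper rule this is always p itself.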
Say that the scoring rule s
is open-minded provided that for any probability p on F and any finite partition V of Ω into events in F with non-zero p-probability, the p-expected score of finding out
where in V we are and
conditionalizing on that is at least as big as the current p-expected score. If the scoring
rule is open-minded, then a Bayesian conditionalizer is never precluded
from accepting free information. Say that the scoring rule s is strictly open-minded provided that the p-expected score of finding out where in V we are and conditionalizing increases whenever there is at least one event E in V such that p(⋅|E) differs from p on H and p(E) > 0.
Given a scoring rule s, let the expected score function G_s on the probabilities on H be defined by G_s(p) = E_p s(p), with the same extension to probabilities on F as for scores.
It is well-known that:
1. The (strict) propriety of s entails the (strict) convexity of G_s.
It is easy to see that:
2. The (strict) convexity of G_s implies the (strict) open-mindedness of s.
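To illustrate (my example): on a two-point Ω, identifying a probability with the number x it assigns to one designated point, the Brier score has G_s equal to x·(−(1−x)²) + (1−x)·(−x²) = −x(1−x), which is strictly convex. A midpoint check:

```python
# G_s(p) = E_p s(p) for the Brier accuracy score on a two-point Omega.
def G(x):
    return x * -(1 - x) ** 2 + (1 - x) * -x ** 2   # simplifies to x^2 - x

# Strict convexity via midpoint checks on a grid:
pts = [i / 100 for i in range(101)]
for a in pts:
    for b in pts:
        if a < b:
            m = (a + b) / 2
            # For a strictly convex G, the midpoint value lies strictly below
            # the average of the endpoint values.
            assert G(m) < (G(a) + G(b)) / 2
print("midpoint convexity holds: G_s is strictly convex for the Brier score")
```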
Neither implication can be reversed. To see this, consider the single-proposition case, where Ω has two points, say 0 and 1, H and F are the powerset of Ω, and we are interested in the proposition that one of these points, say 1, is the actual truth. The scoring rule s is then equivalent to a pair of functions T and F on [0,1], where T(x) = s(p_x)(1) and F(x) = s(p_x)(0), and p_x is the probability that assigns x to the point 1. Then G_s corresponds to the function xT(x) + (1−x)F(x), and G_s is (strictly) convex if and only if this function is.
To see that the non-strict version of (1) cannot be reversed, suppose (T, F) is a non-trivial proper scoring rule such that the limit of F(x)/x as x goes to 0 is finite. Now form a new scoring rule by letting T*(x) = T(x) + (1−x)F(x)/x, and consider the scoring rule (T*, 0). The corresponding function xT*(x) = xT(x) + (1−x)F(x) is going to be convex, since it is the same as the function corresponding to (T, F). But (T*, 0) isn't going to be proper unless T* is constant: with the second component zero, propriety requires that xT*(y) ≤ xT*(x) for all x and y, and hence that T* attain its maximum at every point of (0,1], which isn't going to be true in general. The strict version is similar.
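A concrete instance of this recipe (mine): start from the Brier pair T(x) = −(1−x)² and F(x) = −x², so that F(x)/x = −x has a finite limit at 0. Then T*(x) = −(1−x)² − x(1−x) = −(1−x), the corresponding function xT*(x) = x² − x is convex, and yet (T*, 0) is improper, since announcing y = 1 is optimal no matter what one's credence x > 0 is:

```python
# (T*, 0) built from the Brier pair: T*(x) = -(1 - x), second component 0.
def T_star(x):
    return -(1 - x)

def expected_score(x, y):
    """E_{p_x} of the (T*, 0) score of announcing probability y: x*T*(y)."""
    return x * T_star(y) + (1 - x) * 0.0

ys = [i / 100 for i in range(101)]
for x in [0.2, 0.5, 0.8]:
    best = max(ys, key=lambda y: expected_score(x, y))
    assert best == 1.0   # y = 1 always wins, so (T*, 0) is not proper,
    # even though the corresponding function x*T*(x) = x^2 - x is convex.
print("(T*, 0): corresponding function is convex, but the rule is improper")
```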
To see that (2) cannot be reversed, note that the only non-trivial partition is {{0}, {1}}. If our current probability for 1 is x, the expected score upon learning where we are is xT(1) + (1−x)F(0). Strict open-mindedness thus requires precisely that xT(x) + (1−x)F(x) < xT(1) + (1−x)F(0) whenever x is neither 0 nor 1. It is clear that this is not enough for convexity: we can have wild oscillations of T and F on (0,1) as long as T(1) and F(0) are large enough.
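Here is one way to make that concrete (my construction, not the post's): let T and F oscillate on (0,1) while T(1) and F(0) are set to a large constant, so the strict open-mindedness inequality holds trivially even though xT(x) + (1−x)F(x) is wildly non-convex:

```python
import math

BIG = 10.0   # large endpoint scores T(1) = F(0) = BIG

def T(x):
    return BIG if x == 1.0 else math.sin(40 * x)

def F(x):
    return BIG if x == 0.0 else math.cos(40 * x)

def G(x):
    """The function corresponding to G_s: x*T(x) + (1-x)*F(x)."""
    return x * T(x) + (1 - x) * F(x)

# Strict open-mindedness: x*T(x) + (1-x)*F(x) < x*T(1) + (1-x)*F(0) on (0,1).
# The left side is bounded by 1 in absolute value, the right side equals 10.
for i in range(1, 100):
    x = i / 100
    assert G(x) < x * T(1.0) + (1 - x) * F(0.0)

# But G is not convex: hunt for a midpoint violation on a grid.
violation = any(
    G((a + b) / 2) > (G(a) + G(b)) / 2
    for a in [i / 100 for i in range(1, 99)]
    for b in [j / 100 for j in range(2, 100)]
    if a < b
)
assert violation
print("strictly open-minded in the two-point sense, yet G_s is not convex")
```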
Nonetheless, (2) can be reversed (both in the strict and non-strict
versions) on the following technical assumption:
3. There is an event Z in F such that Z ∩ A is a non-empty proper subset of A for every non-empty member A of H.
This technical assumption basically says that there is a non-trivial
event that is logically independent of everything in H. In real life, the technical
assumption is always satisfied, because there will always be something
independent of the algebra H
of events we are evaluating probability assignments to (e.g., in many
cases Z can be the event that
the next coin toss by the investigator’s niece will be heads). I will
prove that (2) can be reversed in the Appendix.
It is easy to see that adding (3) to our assumptions doesn’t help
reverse (1).
Since open-mindedness is pretty plausible to people of a Bayesian persuasion, this means that the convexity of G_s can be motivated independently of propriety. Perhaps instead of focusing on the propriety of s as much as the literature has done, we should focus on the convexity of G_s?
Let’s think about this suggestion. One of the most important uses of
scoring rules could be to evaluate the expected value of an experiment
prior to doing the experiment, and hence decide which experiment we
should do. If we think of an experiment as a finite partition V of the probability space with each
cell having non-zero probability by one’s current lights p, then the expected value of the
experiment is:
4. ∑_{A ∈ V} p(A) E_{p_A} s(p_A) = ∑_{A ∈ V} p(A) G_s(p_A),
where p_A is the result of conditionalizing p on A. In other words, to evaluate the expected values of experiments, all we care about is G_s, not s itself, and so the convexity of G_s is a very natural condition: we are never obligated to refuse to know the results of free experiments.
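A small numerical sketch (the setup is mine): with the Brier score on a four-point Ω and the experiment V = {{0,1},{2,3}}, the expected value of the experiment is computed entirely from G_s, and convexity guarantees it is at least the current expected score:

```python
# Brier score on Omega = {0,1,2,3}: s(q)(w) = -sum_{w'} (q(w') - [w'==w])^2.
def brier(q, w):
    return -sum((q[v] - (1.0 if v == w else 0.0)) ** 2 for v in range(4))

def G(p):
    """G_s(p) = E_p s(p)."""
    return sum(p[w] * brier(p, w) for w in range(4))

def conditionalize(p, A):
    pA = sum(p[w] for w in A)
    return [p[w] / pA if w in A else 0.0 for w in range(4)]

p = [0.1, 0.4, 0.2, 0.3]
V = [{0, 1}, {2, 3}]

# Expected value of the experiment: sum over cells A of p(A) * G_s(p_A).
value = sum(sum(p[w] for w in A) * G(conditionalize(p, A)) for A in V)
assert value >= G(p)   # free information never hurts when G_s is convex
print(f"G_s(p) = {G(p):.4f}, expected post-experiment score = {value:.4f}")
```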
However, at least in the case where Ω is finite, it is known that any (strictly) convex function (maybe subject to some growth conditions?) is equal to G_u for some (strictly) proper scoring rule u. So we don't really gain much generality by moving from the propriety of s to the convexity of G_s. Indeed, the above observations show that for finite Ω, a (strictly) open-minded way of evaluating the expected epistemic values of experiments in a setting rich enough to satisfy (3) can always be generated by a (strictly) proper scoring rule.
In other words, if we have a scoring rule that is open-minded but not proper,
we can find a proper scoring rule that generates the same prospective evaluations of the
value of experiments (assuming no special growth conditions are needed).
Appendix: We now prove the converse of (2) assuming
(3).
Assume open-mindedness. Let p_1 and p_2 be two distinct probabilities on H and let t ∈ (0,1). We must show that if p = tp_1 + (1−t)p_2, then

5. G_s(p) ≤ tG_s(p_1) + (1−t)G_s(p_2)
with the inequality strict if the open-mindedness is strict. Let Z be as in (3). Define

p′(A ∩ Z) = tp_1(A)
p′(A ∩ Z^c) = (1−t)p_2(A)
p′(A) = p(A)

for any A ∈ H. Then p′ is a probability on the algebra generated by H and Z extending p. Extend it to a probability on F by Hahn–Banach. By open-mindedness:
6. G_s(p′) ≤ p′(Z)E_{p′_Z}s(p′_Z) + p′(Z^c)E_{p′_{Z^c}}s(p′_{Z^c}), where p′_Z and p′_{Z^c} are the results of conditionalizing p′ on Z and Z^c.
But p′(Z) = p′(Ω ∩ Z) = tp_1(Ω) = t and p′(Z^c) = 1 − t. Moreover, p′_Z = p_1 on H and p′_{Z^c} = p_2 on H. Since H-scores don't care what the probabilities are doing outside of H, we have s(p′_Z) = s(p_1) and s(p′_{Z^c}) = s(p_2) and G_s(p′) = G_s(p). Moreover, our scores are H-measurable, so E_{p′_Z}s(p_1) = E_{p_1}s(p_1) and E_{p′_{Z^c}}s(p_2) = E_{p_2}s(p_2). Thus the displayed inequality becomes:

7. G_s(p) ≤ tG_s(p_1) + (1−t)G_s(p_2).
Hence we have convexity. And given strict open-mindedness, the
inequality will be strict, and we get strict convexity.
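As a numerical sanity check on this construction (the particular Ω, H, Z, and Brier H-score are my choices): take Ω = {0,1,2,3}, let H be generated by E = {0,2}, and let Z = {0,1}, which intersects every non-empty member of H in a non-empty proper subset, as (3) requires. The p′ defined above then conditionalizes back to p_1 on Z and p_2 on Z^c, and the open-mindedness inequality for the partition {Z, Z^c} is exactly the convexity inequality:

```python
# Omega = {0,1,2,3}; H is generated by E = {0,2}; Z = {0,1}.
# Identify a probability on H with the number x it assigns to E.

def G(x):
    """G_s for the Brier H-score as a function of x = p(E): -2x(1-x)."""
    return -2 * x * (1 - x)

def check(x1, x2, t):
    # p' on the four points, following the Appendix: p'({0}) = p'(E & Z), etc.
    p_prime = [t * x1, t * (1 - x1), (1 - t) * x2, (1 - t) * (1 - x2)]
    # p'(Z) = t, and conditionalizing p' on Z / Z^c recovers p1 / p2 on H:
    pZ = p_prime[0] + p_prime[1]
    assert abs(pZ - t) < 1e-12
    assert abs(p_prime[0] / pZ - x1) < 1e-12
    assert abs(p_prime[2] / (1 - pZ) - x2) < 1e-12
    # Open-mindedness for {Z, Z^c} is exactly the convexity inequality:
    x = t * x1 + (1 - t) * x2   # p = t*p1 + (1-t)*p2 on H
    assert G(x) <= t * G(x1) + (1 - t) * G(x2) + 1e-12

for x1, x2, t in [(0.1, 0.9, 0.3), (0.2, 0.7, 0.5), (0.4, 0.6, 0.25)]:
    check(x1, x2, t)
print("p' recovers p1 and p2 by conditionalizing on Z and Z^c; convexity verified")
```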