Wednesday, February 1, 2023

Open-mindedness and propriety

Suppose we have a probability space Ω with algebra F of events, and a distinguished subalgebra H of events on Ω. My interest here is in accuracy H-scoring rules, which take a (finitely-additive) probability assignment p on H and assign to it an H-measurable score function s(p) on Ω, with values in [−∞,M] for some finite M. I will take the score of a probability assignment p to represent the epistemic utility or accuracy of p.

For a probability p on F, I will take the score of p to be the score of the restriction of p to H. (Note that any finitely-additive probability on H extends to a finitely-additive probability on F by the Hahn-Banach theorem, assuming Choice.)

The scoring rule s is proper provided that E_p s(q) ≤ E_p s(p) for all p and q, where E_p is expectation with respect to p, and strictly proper if the inequality is strict whenever p ≠ q. Propriety says that one never expects a different probability from one’s own to have a better score (if one did, wouldn’t one have switched to it?).

Say that the scoring rule s is open-minded provided that for any probability p on F and any finite partition V of Ω into events in F with non-zero p-probability, the p-expected score of finding out where in V we are and conditionalizing on that is at least as big as the current p-expected score. If the scoring rule is open-minded, then a Bayesian conditionalizer is never precluded from accepting free information. Say that the scoring rule s is strictly open-minded provided that the p-expected score of finding out where in V we are and conditionalizing is strictly greater than the current p-expected score whenever there is at least one event E in V such that p(⋅|E) differs from p on H and p(E) > 0.

Given a scoring rule s, let the expected score function G_s on the probabilities on H be defined by G_s(p) = E_p s(p), with the same extension to probabilities on F as scores had.

It is well-known that:

  1. The (strict) propriety of s entails the (strict) convexity of G_s.

It is easy to see that:

  2. The (strict) convexity of G_s implies the (strict) open-mindedness of s.

Neither implication can be reversed. To see this, consider the single-proposition case, where Ω has two points, say 0 and 1, and H and F are the powerset of Ω, and we are interested in the proposition that one of these points, say 1, is the actual truth. The scoring rule s is then equivalent to a pair of functions T and F on [0,1] where T(x) = s(p_x)(1) and F(x) = s(p_x)(0) and where p_x is the probability that assigns x to the point 1. Then G_s corresponds to the function xT(x) + (1−x)F(x), and G_s is convex if and only if this function is.
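To make the single-proposition setup concrete, here is a minimal sketch. The Brier-style pair T(x) = −(1−x)², F(x) = −x² is my own illustrative choice and not something the argument depends on; the grid checks are numerical sanity checks, not proofs.

```python
# Single-proposition case: a scoring rule is a pair (T, F) of functions on [0, 1].
# Assumption: the Brier-style pair below is an illustrative choice only.

def T(x):
    """Score of assigning probability x to the point 1 when 1 is actual."""
    return -(1 - x) ** 2

def F(x):
    """Score of assigning probability x to the point 1 when 0 is actual."""
    return -x ** 2

def expected_score(x, y, T, F):
    """Expected score of credence y, by the lights of credence x."""
    return x * T(y) + (1 - x) * F(y)

def G(x, T, F):
    """Expected score function x*T(x) + (1 - x)*F(x)."""
    return expected_score(x, x, T, F)

def proper_on_grid(T, F, n=101, tol=1e-12):
    """Check E_x s(y) <= E_x s(x) for all grid points x, y."""
    grid = [i / (n - 1) for i in range(n)]
    return all(expected_score(x, y, T, F) <= G(x, T, F) + tol
               for x in grid for y in grid)

def midpoint_convex_on_grid(g, n=101, tol=1e-12):
    """Check g((a + b)/2) <= (g(a) + g(b))/2 for all grid points a, b."""
    grid = [i / (n - 1) for i in range(n)]
    return all(g((a + b) / 2) <= (g(a) + g(b)) / 2 + tol
               for a in grid for b in grid)

print(proper_on_grid(T, F))
print(midpoint_convex_on_grid(lambda x: G(x, T, F)))
```

For this pair the expected score function works out to x² − x, so both checks should come out true, in line with (1).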

To see that the non-strict version of (1) cannot be reversed, suppose (T,F) is a non-trivial proper scoring rule such that F(x)/x has a finite limit as x goes to 0. Now form a new scoring rule by letting T*(x) = T(x) + (1−x)F(x)/x, and consider the scoring rule (T*,0). The corresponding function xT*(x) = xT(x) + (1−x)F(x) is the expected score function of the proper rule (T,F), and so is convex by (1), but (T*,0) isn’t going to be proper unless T* is constant, which isn’t going to be true in general. The strict version is similar.
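Here is a small numerical sketch of this counterexample, again assuming the Brier-style pair as the starting proper rule (my choice for illustration; for it, F(x)/x = −x has limit 0 at 0). The names `T_star` and `F_star` are just labels for the pair (T*,0).

```python
def T(x):
    return -(1 - x) ** 2   # Brier-style score when the proposition is true

def F(x):
    return -x ** 2         # Brier-style score when the proposition is false

def T_star(x):
    """T*(x) = T(x) + (1 - x) F(x)/x, using the limit of F(x)/x at x = 0."""
    ratio = 0.0 if x == 0 else F(x) / x   # for this F, F(x)/x = -x, limit 0
    return T(x) + (1 - x) * ratio

def F_star(x):
    return 0.0

def expected_score(x, y, T, F):
    """Expected score of credence y, by the lights of credence x."""
    return x * T(y) + (1 - x) * F(y)

grid = [i / 100 for i in range(101)]

# (T*, 0) has the same expected score function as the proper rule (T, F).
same_G = all(
    abs(expected_score(x, x, T_star, F_star) - expected_score(x, x, T, F)) < 1e-12
    for x in grid
)

# Yet credence 0.5 expects credence 1 to score better than itself.
improper = (expected_score(0.5, 1.0, T_star, F_star)
            > expected_score(0.5, 0.5, T_star, F_star))

print(same_G, improper)
```

In this instance T*(x) simplifies to x − 1, which is increasing, so every non-zero credence expects credence 1 to do best.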

To see that (2) cannot be reversed, note that the only non-trivial partition is {{0}, {1}}. If our current probability for 1 is x, the expected score upon learning where we are is xT(1) + (1−x)F(0). Strict open-mindedness thus requires precisely that xT(x) + (1−x)F(x) < xT(1) + (1−x)F(0) whenever x is neither 0 nor 1. It is clear that this is not enough for convexity: we can have wild oscillations of T and F on (0,1) as long as T(1) and F(0) are large enough.
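One way to see this concretely is the following sketch. The oscillating function `wiggle` is a made-up example of mine: T(1) and F(0) are set to 0, and everywhere else both T and F oscillate between −3 and −1.

```python
import math

def wiggle(x):
    """Hypothetical oscillating score, always between -3 and -1."""
    return -2.0 + math.sin(10 * math.pi * x)

def T(x):
    return 0.0 if x == 1 else wiggle(x)   # T(1) = 0 is the best possible value

def F(x):
    return 0.0 if x == 0 else wiggle(x)   # F(0) = 0 is the best possible value

def G(x):
    """Current expected score x*T(x) + (1 - x)*F(x)."""
    return x * T(x) + (1 - x) * F(x)

def value_of_learning(x):
    """Expected score after learning which of the two points is actual."""
    return x * T(1) + (1 - x) * F(0)

interior = [i / 100 for i in range(1, 100)]
strictly_open_minded = all(G(x) < value_of_learning(x) for x in interior)

# Convexity fails: G peaks at the midpoint of two troughs.
a, b = 0.15, 0.35
convexity_fails = G((a + b) / 2) > (G(a) + G(b)) / 2

print(strictly_open_minded, convexity_fails)
```

The check over `interior` is only a grid check, but here the strict inequality holds at every interior x, since the expected score there is at most −1 while the value of learning is 0.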

Nonetheless, (2) can be reversed (both in the strict and non-strict versions) on the following technical assumption:

  3. There is an event Z in F such that Z ∩ A is a non-empty proper subset of A for every non-empty member A of H.

This technical assumption basically says that there is a non-trivial event that is logically independent of everything in H. In real life, the technical assumption is always satisfied, because there will always be something independent of the algebra H of events we are evaluating probability assignments to (e.g., in many cases Z can be the event that the next coin toss by the investigator’s niece will be heads). I will prove that (2) can be reversed in the Appendix.

It is easy to see that adding (3) to our assumptions doesn’t help reverse (1).

Since open-mindedness is pretty plausible to people of a Bayesian persuasion, this means that convexity of G_s can be motivated independently of propriety. Perhaps instead of focusing on propriety of s as much as the literature has done, we should focus on the convexity of G_s?

Let’s think about this suggestion. One of the most important uses of scoring rules could be to evaluate the expected value of an experiment prior to doing the experiment, and hence decide which experiment we should do. If we think of an experiment as a finite partition V of the probability space with each cell having non-zero probability by one’s current lights p, then the expected value of the experiment is:

  4. ∑_{A ∈ V} p(A) E_{p_A} s(p_A) = ∑_{A ∈ V} p(A) G_s(p_A),

where p_A is the result of conditionalizing p on A. In other words, to evaluate the expected values of experiments, all we care about is G_s, not s itself, and so the convexity of G_s is a very natural condition: we are never obligated to refuse to know the results of free experiments.
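As a rough illustration of this formula on a finite Ω, here is a sketch that again assumes a Brier-style score (my choice, not part of the argument). Probabilities are lists indexed by the points of Ω, and a partition is a list of sets of indices.

```python
import math

def brier_score(p, omega):
    """Brier-style accuracy of p if the actual point is omega (illustrative)."""
    return -sum((p_i - (1.0 if i == omega else 0.0)) ** 2
                for i, p_i in enumerate(p))

def G(p):
    """Expected score of p by its own lights."""
    return sum(p_w * brier_score(p, w) for w, p_w in enumerate(p) if p_w > 0)

def conditionalize(p, cell):
    """Result of conditionalizing p on the event `cell` (a set of indices)."""
    mass = sum(p[i] for i in cell)
    return [p[i] / mass if i in cell else 0.0 for i in range(len(p))]

def experiment_value(p, partition):
    """Sum over cells A of p(A) * G(p conditionalized on A)."""
    total = 0.0
    for cell in partition:
        mass = sum(p[i] for i in cell)
        total += mass * G(conditionalize(p, cell))
    return total

p = [0.25, 0.25, 0.25, 0.25]
partition = [{0, 1}, {2, 3}]
print(G(p), experiment_value(p, partition))
```

With the uniform prior on four points, the current expected score is −0.75 and the expected value of the two-cell experiment is −0.5, so the free experiment is worth accepting.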

However, at least in the case where Ω is finite, it is known that any (strictly) convex function (maybe subject to some growth conditions?) is equal to G_u for some (strictly) proper scoring rule u. So we don’t really gain much generality by moving from propriety of s to convexity of G_s. Indeed, the above observations show that for finite Ω, a (strictly) open-minded way of evaluating the expected epistemic values of experiments in a setting rich enough to satisfy (3) can always be generated by a (strictly) proper scoring rule.

In other words, if we have a scoring rule that is open-minded but not proper, we can find a proper scoring rule that generates the same prospective evaluations of the value of experiments (assuming no special growth conditions are needed).
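For the two-point case, one standard way to produce such a proper rule is the tangent-line construction: score a credence y by the value, at the truth, of the tangent line to the convex function at y. The sketch below assumes the convex function is differentiable and takes its derivative as an explicit argument; the helper name `proper_rule_from_convex` and the choice G(x) = x² − x are mine.

```python
def proper_rule_from_convex(G, dG):
    """Tangent-line construction (assumes G is convex and differentiable)."""
    def T(y):
        return G(y) + (1 - y) * dG(y)   # tangent at y, evaluated at x = 1
    def F(y):
        return G(y) - y * dG(y)         # tangent at y, evaluated at x = 0
    return T, F

def G(x):
    return x * x - x    # illustrative convex expected score function

def dG(x):
    return 2 * x - 1    # its derivative

T, F = proper_rule_from_convex(G, dG)

def expected_score(x, y):
    """Expected score of credence y, by the lights of credence x."""
    return x * T(y) + (1 - x) * F(y)

grid = [i / 100 for i in range(101)]
reproduces_G = all(abs(expected_score(x, x) - G(x)) < 1e-12 for x in grid)
proper = all(expected_score(x, y) <= expected_score(x, x) + 1e-12
             for x in grid for y in grid)

print(reproduces_G, proper)
```

For this G the construction returns T(y) = −(1−y)² and F(y) = −y², a Brier-style proper rule with the same expected score function.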

Appendix: We now prove the converse of (2) assuming (3).

Assume open-mindedness. Let p_1 and p_2 be two distinct probabilities on H and let t ∈ (0,1). We must show that if p = tp_1 + (1−t)p_2, then

  5. G_s(p) ≤ tG_s(p_1) + (1−t)G_s(p_2)

with the inequality strict if the open-mindedness is strict. Let Z be as in (3), and write Z^c for the complement of Z. Define

  6. p′(A ∩ Z) = tp_1(A)

  7. p′(A ∩ Z^c) = (1−t)p_2(A)

  8. p′(A) = p(A)

for any A ∈ H. Then p′ is a probability on the algebra generated by H and Z extending p. Extend it to a probability on F by Hahn-Banach. By open-mindedness:

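  9. G_s(p′) ≤ p′(Z) E_{p′_Z} s(p′_Z) + p′(Z^c) E_{p′_{Z^c}} s(p′_{Z^c}),

where p′_Z and p′_{Z^c} are the results of conditionalizing p′ on Z and on Z^c, respectively. But p′(Z) = p′(Ω ∩ Z) = tp_1(Ω) = t and p′(Z^c) = 1 − t. Moreover, p′_Z = p_1 on H and p′_{Z^c} = p_2 on H. Since H-scores don’t care what the probabilities are doing outside of H, we have s(p′_Z) = s(p_1) and s(p′_{Z^c}) = s(p_2) and G_s(p′) = G_s(p). Moreover our scores are H-measurable, so E_{p′_Z} s(p_1) = E_{p_1} s(p_1) and E_{p′_{Z^c}} s(p_2) = E_{p_2} s(p_2). Thus (9) becomes: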

  10. G_s(p) ≤ tG_s(p_1) + (1−t)G_s(p_2).

Hence we have convexity. And given strict open-mindedness, the inequality will be strict, since conditionalizing p′ on Z yields p_1 on H, which differs from p because p_1 ≠ p_2 and t < 1, and we get strict convexity.
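The construction in this proof can be checked numerically on a toy model. In the sketch below, H has two atoms, Z is modelled as an extra yes/no coordinate that splits every atom, and the expected score function is the Brier-style one (sum of squares minus 1); all of these are illustrative assumptions of mine.

```python
import math

def build_p_prime(p1, p2, t):
    """p'(A ∩ Z) = t*p1(A) and p'(A ∩ Z^c) = (1 - t)*p2(A), atom by atom.
    Keys are (atom of H, True if inside Z)."""
    joint = {}
    for a in range(len(p1)):
        joint[(a, True)] = t * p1[a]
        joint[(a, False)] = (1 - t) * p2[a]
    return joint

def restrict_to_H(joint, n):
    """Restriction of p' to H: forget whether we are inside Z."""
    return [joint[(a, True)] + joint[(a, False)] for a in range(n)]

def conditional_on(joint, n, in_Z):
    """p' conditionalized on Z (or on Z^c), restricted to H."""
    mass = sum(joint[(a, in_Z)] for a in range(n))
    return [joint[(a, in_Z)] / mass for a in range(n)]

def G(p):
    """Expected Brier-style score of p by its own lights (illustrative)."""
    return sum(q * q for q in p) - 1

p1, p2, t = [0.2, 0.8], [0.7, 0.3], 0.4
joint = build_p_prime(p1, p2, t)
p = restrict_to_H(joint, 2)
p_Z = conditional_on(joint, 2, True)
p_Zc = conditional_on(joint, 2, False)
mass_Z = sum(joint[(a, True)] for a in range(2))

lhs = G(p)                                       # current expected score
rhs = mass_Z * G(p_Z) + (1 - mass_Z) * G(p_Zc)   # after learning Z or Z^c
print(p, p_Z, p_Zc, lhs <= rhs)
```

Here the restriction of p′ to H is the mixture tp_1 + (1−t)p_2, the two conditionalizations recover p_1 and p_2 on H, and the open-mindedness inequality for the partition {Z, Z^c} is exactly the convexity inequality at p.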
