## Wednesday, January 27, 2021

### Nonadditive strictly proper scoring rules and arguments for probabilism

[This post uses the wrong concept of a strictly proper score. See the comments.]

A scoring rule for a credence assignment is a measure of the inaccuracy of the credences: the lower the value, the better.

A proper scoring rule is a scoring rule with the property that, for each probabilistically consistent credence assignment P, no consistent credence assignment Q has a lower expected score according to P than P itself does. If every consistent Q other than P has a strictly higher expected score according to P, the scoring rule is said to be strictly proper.

A scoring rule is additive provided that it is the sum of scoring rules each of which depends only on the credence assigned to a single proposition and the truth value of that proposition.
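As a concrete reference point, here is a minimal sketch that assumes the familiar Brier score, summed over every proposition of a small space, as the additive rule. The post itself does not fix a particular rule, and the names `OMEGA`, `PROPS`, `brier` and the rest are illustrative. It spot-checks propriety numerically: the probability P expects itself to be less inaccurate than randomly drawn rival credence assignments, most of which are inconsistent.

```python
import random
from itertools import combinations

# Hypothetical toy space: three worlds labelled 0, 1, 2.
OMEGA = (0, 1, 2)
# Every proposition is a subset of OMEGA, including the empty set and OMEGA itself.
PROPS = [frozenset(c) for r in range(len(OMEGA) + 1) for c in combinations(OMEGA, r)]


def brier(cred, w):
    """Additive Brier inaccuracy at world w: one squared-error term per proposition."""
    return sum(((1.0 if w in A else 0.0) - cred[A]) ** 2 for A in PROPS)


def from_point_masses(p):
    """The consistent credence assignment generated by the point masses p[w]."""
    return {A: sum(p[w] for w in A) for A in PROPS}


def expected_score(p, cred):
    """Expected inaccuracy of cred according to the probability with point masses p."""
    return sum(p[w] * brier(cred, w) for w in OMEGA)


p = (0.5, 0.3, 0.2)
P = from_point_masses(p)
rng = random.Random(0)
for _ in range(1000):
    Q = {A: rng.random() for A in PROPS}  # a random, typically inconsistent, rival
    # P expects itself to be strictly less inaccurate than the rival.
    assert expected_score(p, P) < expected_score(p, Q)
```

Each term of the sum depends only on one proposition's credence and truth value, which is exactly what additivity asks for.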

The formal epistemology literature has a lot of discussion of a strict domination theorem: given an additive strictly proper scoring rule, you do better to have a probabilistically consistent credence assignment, since for any inconsistent credence assignment there is a consistent one that gets a better score in every possible world.
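The theorem can be seen at work in a minimal sketch. For simplicity it assumes that credences are assigned only to the atoms and scored by the additive Brier rule on those atoms; both are simplifications chosen for illustration. The dominating consistent credence is taken to be the Euclidean projection onto the probability simplex, the geometric idea commonly used in proofs of this kind, and `project_to_simplex` uses the standard sort-based projection method.

```python
def project_to_simplex(c):
    """Euclidean projection of the vector c onto the probability simplex (sort-based method)."""
    u = sorted(c, reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - 1.0) / j
        if uj - t > 0:  # holds for an initial run of j's; keep the threshold from the last one
            theta = t
    return [max(x - theta, 0.0) for x in c]


def brier_atoms(cred, i):
    """Brier inaccuracy at world i of credences in the atoms (lower is better)."""
    return sum(((1.0 if j == i else 0.0) - x) ** 2 for j, x in enumerate(cred))


c = [0.6, 0.6]             # inconsistent: credences in the two atoms sum to 1.2
p = project_to_simplex(c)  # consistent: approximately [0.5, 0.5]
for i in range(len(c)):
    # The projection scores strictly better (lower) than c in every world.
    assert brier_atoms(p, i) < brier_atoms(c, i)
```

In world 0, for instance, c scores about 0.52 while its projection scores about 0.5, and likewise in world 1.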

The assumption of strict propriety gets a fair amount of discussion. Not so the assumption of additivity.

It turns out that if you drop additivity, the theorem fails. Indeed, this is trivial. Consider any strictly proper scoring rule s, and modify it to a rule s* that assigns the score −∞ to any inconsistent credence. Then any inconsistent credence receives the best possible score in every possible world. Moreover, s* is still strictly proper if s is because the definition of strict propriety only involves the behavior of the scoring rule as applied to consistent credences, and hence s* is strictly proper if and only if s is. And, of course, s* is not additive.
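The construction of s* is simple enough to write down directly. In this sketch the base rule s is assumed to be the Brier score on the atoms of a finite space, with credences given only on those atoms, so that consistency just means non-negative entries summing to 1; `is_consistent`, `brier_atoms` and `s_star` are hypothetical names.

```python
import math


def is_consistent(cred, tol=1e-9):
    """Credences on the atoms are consistent iff they are non-negative and sum to 1."""
    return all(x >= -tol for x in cred) and abs(sum(cred) - 1.0) <= tol


def brier_atoms(cred, i):
    """Assumed base rule s: Brier inaccuracy at world i (lower is better)."""
    return sum(((1.0 if j == i else 0.0) - x) ** 2 for j, x in enumerate(cred))


def s_star(cred, i, s=brier_atoms):
    """The gerrymandered rule: agrees with s on consistent credences, -inf on the rest."""
    return s(cred, i) if is_consistent(cred) else -math.inf


print([s_star([0.6, 0.6], i) for i in range(2)])  # prints [-inf, -inf]
```

Since s_star only departs from s on inconsistent inputs, any check of propriety that looks only at consistent credences cannot tell the two rules apart.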

But of course my rule s* is very much ad hoc, and it is gerrymandered to reward inconsistency. Can we make a non-additive scoring rule for which the domination theorem fails, one that lacks such gerrymandering and is somewhat natural?

I think so. Consider a finite probability space Ω with n points $\omega_1, \dots, \omega_n$. Now, consider a scoring rule generated as follows.

Say that a simple gamble g on Ω is an assignment of values to the n points. Let G be a set of simple gambles. Imagine an agent who decides which simple gamble g in G to take by the following natural method: she calculates $\sum_i P(\{\omega_i\})\, g(\omega_i)$, where P is her credence assignment, and chooses the gamble g that maximizes this sum. If there is a tie, she has some tie-resolution mechanism. Then we can say that the G-score of her credences is the negative of the utility gained from the gamble she chose. In other words, her G-score at the point $\omega_i$ is $-g(\omega_i)$, where g is the maximally auspicious gamble she selected according to her credences.

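It is easy to see that G-score is a proper score: according to a consistent P, the expected G-score of any credence assignment is the negative of the P-expected utility of the gamble it selects, and no gamble in G has a higher P-expected utility than the one P itself selects. Moreover, if there are never any ties in choosing the maximally auspicious gamble, and distinct consistent credence assignments always select distinct gambles, the score is strictly proper.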
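Here is a minimal sketch of the G-score for a finite menu of gambles, with a numerical spot-check of propriety. The menu `G`, the tie-breaking rule (the earliest gamble in the list wins, which is what Python's `max` does) and all function names are assumptions for illustration. Credences are represented only by their values on the singletons, since that is all the decision procedure uses.

```python
import random


def choose_gamble(cred, gambles):
    """The gamble with the largest credence-weighted sum; max() keeps the first of any tied gambles.

    cred lists the credences P({w_i}) in the singletons; each gamble is a tuple of payoffs g(w_i).
    """
    return max(gambles, key=lambda g: sum(c * gi for c, gi in zip(cred, g)))


def g_score(cred, gambles, i):
    """G-score at world w_i: minus the payoff at w_i of the gamble the credence selects."""
    return -choose_gamble(cred, gambles)[i]


def expected_g_score(p, cred, gambles):
    """Expected G-score of cred according to the probability p."""
    return sum(pi * g_score(cred, gambles, i) for i, pi in enumerate(p))


G = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0.4, 0.4, 0.4)]  # hypothetical menu of simple gambles
p = (0.5, 0.3, 0.2)                                     # a consistent credence on three points
rng = random.Random(1)
for _ in range(1000):
    q = [rng.random() for _ in p]  # arbitrary, typically inconsistent, credences
    # No credence assignment's choice does better by p's lights than p's own choice.
    assert expected_g_score(p, p, G) <= expected_g_score(p, q, G)
```

With a finite menu like this one, many different probabilities select the same gamble, so the resulting G-score is proper but not strictly proper; strict propriety needs distinct probabilities to select distinct gambles.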

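This is a very natural way to generate a score: we generate a score by looking at how well you would do when acting on the credences in the face of a practical decision. But the domination theorem fails for any score generated in this way. Here's why. Let P be an inconsistent credence assignment that is non-negative and non-zero on some singleton, and let P* be the consistent credence assignment defined by $P^*(A) = \sum_{\omega \in A} P(\{\omega\}) \big/ \sum_{\omega \in \Omega} P(\{\omega\})$. For every gamble, P's expected-utility sum is a positive multiple of P*'s, so P and P* select the same gamble (provided ties are resolved the same way), and the scoring rule scores P exactly as it scores P*. A consistent credence assignment that scored better than P in every possible world would therefore score better than P* in every world, and so would have a lower expected score according to P* than P* itself, contradicting propriety. Thus, the domination theorem fails to apply to any scoring rule generated in the above way.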
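The invariance at the heart of this argument can be checked directly. This sketch assumes a hypothetical finite menu of gambles and a tie-breaking rule (first listed gamble wins) that depends only on the ranking of the gambles, so rescaling the credences by a positive constant cannot change the choice.

```python
def choose_gamble(cred, gambles):
    """Gamble with the largest credence-weighted sum (the first listed gamble wins ties)."""
    return max(gambles, key=lambda g: sum(c * gi for c, gi in zip(cred, g)))


def normalize(cred):
    """P*: rescale non-negative singleton credences so that they sum to 1."""
    total = sum(cred)
    return [c / total for c in cred]


G = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0.4, 0.4, 0.4)]  # hypothetical gamble menu
q = [1.0, 0.6, 0.4]      # inconsistent: the singleton credences sum to 2
q_star = normalize(q)    # consistent: approximately [0.5, 0.3, 0.2]
assert choose_gamble(q, G) == choose_gamble(q_star, G)
# Same chosen gamble, hence the same G-score -g(w_i) at every world w_i.
print([-x for x in choose_gamble(q, G)])  # prints [-1, 0, 0]
```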

The only thing that remains is to check that there is some natural strictly proper rule that can be generated using the above method. Here's one. Let $G_n$ be the set of simple gambles that assign to the n points of Ω values that lie in the n-dimensional unit ball. In other words, each simple gamble $g \in G_n$ is such that $\sum_i g(\omega_i)^2 \le 1$.

A bit of easy constrained maximization using Lagrange multipliers shows that if P is a credence assignment on Ω such that $P(\{\omega_i\}) \neq 0$ for at least one point $\omega_i \in \Omega$, then there is a unique maximally auspicious gamble g, given by $g(\omega_j) = P(\{\omega_j\}) \big/ \left(\sum_i P(\{\omega_i\})^2\right)^{1/2}$. Because of the uniqueness, and because distinct probabilities yield distinct gambles, we have a strictly proper scoring rule.
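For readers who want the omitted step, here is one way the maximization can go, writing $p_i = P(\{\omega_i\})$ and $\|p\| = \left(\sum_i p_i^2\right)^{1/2} > 0$. The constraint binds because the objective is linear and not identically zero, the positive root for $2\lambda$ gives the maximum rather than the minimum, and the Cauchy-Schwarz inequality in the last line gives uniqueness.

$$
\begin{aligned}
&\text{maximize } \sum_i p_i\, g(\omega_i) \quad \text{subject to } \sum_i g(\omega_i)^2 \le 1,\\
&\mathcal{L} = \sum_i p_i\, g(\omega_i) - \lambda\Big(\sum_i g(\omega_i)^2 - 1\Big),\\
&\frac{\partial \mathcal{L}}{\partial g(\omega_j)} = p_j - 2\lambda\, g(\omega_j) = 0 \;\Rightarrow\; g(\omega_j) = \frac{p_j}{2\lambda},\\
&\sum_j g(\omega_j)^2 = 1 \;\Rightarrow\; 2\lambda = \|p\| \;\Rightarrow\; g(\omega_j) = \frac{p_j}{\|p\|},\\
&\sum_i p_i\, g(\omega_i) \le \|p\| \Big(\sum_i g(\omega_i)^2\Big)^{1/2} \le \|p\|,
\end{aligned}
$$

with equality throughout only at $g(\omega_j) = p_j / \|p\|$.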

The $G_n$-score of a credence assignment P is then $s(P, \omega_j) = -P(\{\omega_j\}) \big/ \left(\sum_i P(\{\omega_i\})^2\right)^{1/2}$.
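A numerical sketch of the $G_n$ case, assuming credences are represented by their values on the singletons; the function names and the sample probability are hypothetical. It checks the closed-form gamble against random gambles on the unit sphere (a linear objective over the ball attains its maximum on the sphere), and confirms that an inconsistent credence proportional to a probability gets the same score profile up to rounding, which is why the domination theorem fails here.

```python
import math
import random


def ball_gamble(cred):
    """Maximally auspicious gamble in the unit ball G_n: g(w_j) = P({w_j}) / ||P||.

    Requires at least one non-zero singleton credence, so that the norm is positive.
    """
    norm = math.sqrt(sum(c * c for c in cred))
    return [c / norm for c in cred]


def gn_score(cred, j):
    """G_n-score at world w_j: -P({w_j}) / (sum_i P({w_i})^2)^(1/2)."""
    return -ball_gamble(cred)[j]


p = [0.5, 0.3, 0.2]  # hypothetical probability on three points
g = ball_gamble(p)
best = sum(pi * gi for pi, gi in zip(p, g))  # equals ||p|| up to rounding

rng = random.Random(2)
for _ in range(1000):
    # A random gamble on the unit sphere, obtained by normalizing a Gaussian vector.
    h = [rng.gauss(0.0, 1.0) for _ in p]
    r = math.sqrt(sum(x * x for x in h))
    h = [x / r for x in h]
    assert sum(pi * hi for pi, hi in zip(p, h)) <= best + 1e-12

# An inconsistent credence proportional to p gets the same score at every world.
q = [1.0, 0.6, 0.4]
assert all(abs(gn_score(p, j) - gn_score(q, j)) < 1e-12 for j in range(3))
```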

This looks fairly natural. The choice of $G_n$ seems fairly natural as well. There is no gerrymandering going on. And yet the domination theorem fails for the $G_n$-score. (I think any strictly convex set of simple gambles would work in place of $G_n$, actually.)

Thus, absent some good argument for why the $G_n$-score is a bad way to score credences, it seems that the scoring rule domination argument isn't persuasive.

More generally, consider any credence-based procedure for deciding between finite sets of gambles that has the following two properties:

1. The procedure yields a gamble that maximizes expected utility in the case of consistent credences, and

2. The procedure never recommends a gamble that is dominated by another gamble.

There are such procedures that apply to interesting classes of inconsistent credences and that are nonetheless pretty natural. Given any such procedure, we can extend it arbitrarily to apply to all inconsistent credences and assign to each credence assignment, at each world, the negative of the selected gamble's payoff there; the result is a proper score to which the domination theorem doesn't apply. And if we make our set of gambles the n-ball $G_n$, then the score is strictly proper.

Dmitri Gallow said...

"Consider any strictly proper scoring rule s, and modify it to a rule s* that assigns the score −∞ to any inconsistent credence. Then any inconsistent credence receives the best possible score in every possible world. Moreover, s* is still strictly proper if s is because the definition of strict propriety only involves the behavior of the scoring rule as applied to consistent credences, and hence s* is strictly proper if and only if s is."

I think this isn't quite right. There are two ways that the term "strictly proper" gets used. In some statistical contexts, they say that s is strictly proper iff each probability expects itself to have a better s-score than every other *probability*. If this is how "strict propriety" is understood, then the counterexample works. But the way that the term gets used by people arguing for probabilism is different. Those people say that a score s is strictly proper iff each probability expects itself to have a better s-score than every other *credence* function (whether it's probabilistic or not).

Now, take a non-trivial probabilistic credence function Pr (one which doesn't give all of its probability to a single world) and a non-probabilistic credence function Cr. If s*(Cr, w) = −∞ for every world w, then Pr's expectation of Cr's s*-score will be −∞. And this will be better than Pr's expectation of its own s*-score. So there will be a probability function which expects another credence function to have a better s*-score than it does. So s* is not strictly proper.

Also, if you're running with the first definition of "strictly proper", you'll have failures of the domination result even with additivity. Let your overall score just be the sum of your Brier scores for the *atomic* propositions {w} (here, higher scores are better): $s(C, w) = -(1 - C(\{w\}))^2 - \sum_{w' \neq w} C(\{w'\})^2$. This score is strictly proper in the sense that every probability expects its own score to be higher than that of every other probability, and it is additive.

Then, let the non-probabilistic Cr be just like the probabilistic Pr when it comes to the atomic propositions, but give a probability of 0 to every other proposition. Cr will have the same score as Pr in every possible world. And so Pr will not strictly dominate Cr. That's not a problem for the theorems, since this score isn't strictly proper in the sense that those theorems use the term.
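This counterexample can be checked mechanically. The sketch assumes a hypothetical three-world space and a particular probability; the score is the negative Brier score restricted to atoms, as defined in this comment (so higher is better here). The assertions confirm that Cr, which is inconsistent because it gives credence 0 to the whole space, scores exactly like Pr in every world.

```python
from itertools import combinations

OMEGA = (0, 1, 2)  # hypothetical three-world space
PROPS = [frozenset(c) for r in range(len(OMEGA) + 1) for c in combinations(OMEGA, r)]


def atomic_brier(cred, w):
    """Negative Brier score over the atoms {w'} only (higher is better)."""
    return -sum(((1.0 if v == w else 0.0) - cred[frozenset([v])]) ** 2 for v in OMEGA)


p = {0: 0.5, 1: 0.3, 2: 0.2}                              # hypothetical point masses
Pr = {A: sum(p[w] for w in A) for A in PROPS}             # probabilistic credence
Cr = {A: (Pr[A] if len(A) == 1 else 0.0) for A in PROPS}  # same on atoms, 0 elsewhere

for w in OMEGA:
    # Cr is inconsistent (Cr(OMEGA) = 0), yet it scores exactly like Pr.
    assert atomic_brier(Pr, w) == atomic_brier(Cr, w)
```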

Alexander R Pruss said...

Thank you. I missed that in the theorems. That is embarrassing. And your point also applies to my "fairly natural" score at the end. Oh well.

Can one prove domination without additivity using the stronger notion of strict propriety?

Alexander R Pruss said...

I think the answer to my last question is negative, unless a similar mistake was made in this earlier post:
http://alexanderpruss.blogspot.com/2014/03/an-interesting-epistemic-scoring-rule.html

That earlier post, curiously, used the *correct* concept of propriety. But maybe it made some other mistake.

Dmitri Gallow said...

My understanding is that Predd et al. (https://arxiv.org/abs/0710.3183) assume both additivity and strict propriety (and continuity) in their main result. I think Joyce (http://www-personal.umich.edu/~jjoyce/papers/aac.pdf) similarly relies upon additivity.

Interestingly, additivity is not needed for Greaves and Wallace's accuracy justifications of conditionalization. In that case, I believe strict propriety and differentiability suffice.

Alexander R Pruss said...

It's fun to think about this geometrically. If |Ω| = n, then the scores of a credence assignment at the n worlds form a point in n-dimensional space. A necessary condition for strict propriety (in the sense you're talking about) is that the score for any probability is an extreme point of the convex hull of the set S of all scores. Say that the "rectilinear positive cone (rpc)" defined by a point z in the space is the set of all points z' such that every coordinate of z' is bigger than the corresponding coordinate of z. Then the domination condition holds provided that every score for a non-probability lies in the union of the rpcs defined by the scores for the probabilities.

This approach makes it easier for me to visualize non-additive scoring rules.
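The rpc condition translates directly into code. This sketch assumes the Brier score on the atoms of a two-point space (lower is better, so a point in the rpc of z is worse than z at every world) and checks against a hypothetical finite grid of probabilities; all names are illustrative. Because the condition quantifies over all probabilities, a finite sample can confirm that a score is dominated but cannot rule it out.

```python
def in_rpc(z_prime, z):
    """True if z_prime lies in the rectilinear positive cone of z (strictly bigger in every coordinate)."""
    return all(a > b for a, b in zip(z_prime, z))


def dominated(score, prob_scores):
    """Domination condition: the score lies in the union of the rpcs of the given probability scores."""
    return any(in_rpc(score, z) for z in prob_scores)


def brier_vector(cred):
    """Scores of credences on the atoms as a point in R^n: the Brier inaccuracy at each world."""
    n = len(cred)
    return [sum(((1.0 if j == i else 0.0) - x) ** 2 for j, x in enumerate(cred)) for i in range(n)]


# A hypothetical finite sample of the probabilities on a two-point space.
probs = [[k / 10, 1 - k / 10] for k in range(11)]
prob_scores = [brier_vector(q) for q in probs]

print(dominated(brier_vector([0.6, 0.6]), prob_scores))    # True: [0.5, 0.5] does better at both worlds
print(dominated(brier_vector([0.25, 0.75]), prob_scores))  # False: a probability, not dominated
```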

Alexander R Pruss said...

I think I may be onto a way of proving a domination result without additivity. If all works out, I will only need continuity (or maybe only the set of scores of probabilities being closed in $\mathbb{R}^n$), but perhaps I will need boundedness as well (unlike Predd et al.).

The example from my earlier blog post is not continuous (nor is the set of scores of probabilities closed).