Monday, March 8, 2021

Strict propriety of credential scoring rules

An (inaccuracy) scoring rule measures how far a probabilistic forecast lies from the truth. Thus, it assigns to each forecast p a score s(p), which is a [0, ∞]-valued random variable on the probability space Ω that measures distance from truth. Let’s work with finite probability spaces and assume all the forecasts are consistent probability functions.

A rule s is proper provided that E_p s(p) ≤ E_p s(q) for any probability functions p and q, where E_p f = ∑_{ω∈Ω} p({ω}) f(ω) is the expectation of f according to p, using the convention that 0 ⋅ ∞ = 0. Propriety is the very reasonable condition that, whatever your forecast, according to your forecast you don’t expect any other specific forecast to be better: if you did, you’d surely switch to it.

A rule is strictly proper provided that E_p s(p) < E_p s(q) whenever p and q are distinct. It says that by the lights of your forecast, your forecast is better than any other. It is rather harder to intuitively justify strict propriety.
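To make the two definitions concrete, here is a minimal Python sketch. It assumes a three-point Ω and uses the Brier score as the example rule; neither choice comes from the discussion above, and the helper names (`expect`, `brier`, `simplex_grid`, `is_proper`, `is_strictly_proper`) are purely illustrative. Checking on a finite grid of forecasts is only a sanity check, of course, not a proof.

```python
from itertools import product

def expect(p, f):
    """E_p f = sum over ω of p({ω}) f(ω), using the convention 0 * ∞ = 0."""
    return sum(pw * fw for pw, fw in zip(p, f) if pw != 0)

def brier(p):
    """Brier inaccuracy score of forecast p, as a list indexed by ω (assumed example rule)."""
    n = len(p)
    return [sum((p[j] - (1.0 if j == w else 0.0)) ** 2 for j in range(n))
            for w in range(n)]

def simplex_grid(n, steps):
    """All probability vectors on n points whose coordinates are multiples of 1/steps."""
    return [tuple(k / steps for k in ks)
            for ks in product(range(steps + 1), repeat=n)
            if sum(ks) == steps]

def is_proper(s, forecasts, tol=1e-12):
    """E_p s(p) <= E_p s(q) for all p, q in the list (up to rounding)."""
    return all(expect(p, s(p)) <= expect(p, s(q)) + tol
               for p in forecasts for q in forecasts)

def is_strictly_proper(s, forecasts, tol=1e-12):
    """E_p s(p) < E_p s(q) for all distinct p, q in the list."""
    return all(expect(p, s(p)) < expect(p, s(q)) - tol
               for p in forecasts for q in forecasts if p != q)

grid = simplex_grid(3, 10)
print(is_proper(brier, grid), is_strictly_proper(brier, grid))  # prints: True True
```

For the Brier score the gap E_p s(q) − E_p s(p) works out to the squared Euclidean distance between p and q, which is why the strict inequality holds on the grid.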

A very plausible condition is continuity: your score in every possible situation ω ∈ Ω depends continuously on your probability assignment.

Last week, with a lot of time on my hands while our minivan was having an oil change, I got interested in the question of what kinds of failures of strict propriety can be exhibited by a continuous proper scoring rule. It is, of course, easy to see that one can have continuous proper scoring rules that aren’t strictly proper: for instance, one can assign the same score to every forecast. Thinking about this and other examples, I conjectured that the only way strict propriety can fail in a continuous proper scoring rule (restricted to probability functions) is by assigning the same score to multiple forecasts.

Last night I found what looks to be a very simple proof of the conjecture. Assuming the proof is right (it still looks right this morning), if s is a continuous proper scoring rule defined on the probabilities, and E_p s(p) = E_p s(q), then s(p) = s(q) (everywhere in Ω).
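One way to get a feel for the claim is to look at a rule that is continuous and proper but fails strict propriety for a less trivial reason than being constant. The sketch below is an illustration only, not the proof: on an assumed three-point Ω, the hypothetical `event_rule` scores a forecast by the Brier score of its probability for the single event A = {0}, so it ignores how the remaining probability is split. On a grid of forecasts, every tie in expected score turns out to be a case where the two forecasts get the same score at every ω.

```python
from itertools import product

def expect(p, f):
    """E_p f with the convention 0 * ∞ = 0."""
    return sum(pw * fw for pw, fw in zip(p, f) if pw != 0)

def event_rule(p):
    """Hypothetical rule: Brier score of p's forecast for the event A = {0}."""
    a = p[0]
    return [(1.0 - a) ** 2, a ** 2, a ** 2]

steps = 10
grid = [tuple(k / steps for k in ks)
        for ks in product(range(steps + 1), repeat=3) if sum(ks) == steps]

tol = 1e-12

# Propriety on the grid: no q is expected (by p's lights) to beat p.
proper = all(expect(p, event_rule(p)) <= expect(p, event_rule(q)) + tol
             for p in grid for q in grid)

# Pairs of distinct forecasts with E_p s(p) = E_p s(q): failures of strictness.
ties = [(p, q) for p in grid for q in grid
        if p != q
        and abs(expect(p, event_rule(p)) - expect(p, event_rule(q))) <= tol]

# In every such tie, the score vectors coincide everywhere in Ω.
ties_have_equal_scores = all(event_rule(p) == event_rule(q) for p, q in ties)

print(proper, len(ties) > 0, ties_have_equal_scores)  # prints: True True True
```

Here the ties are exactly the pairs that agree on the probability of A, for instance (0.5, 0.5, 0) and (0.5, 0, 0.5).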

Given this, we get the following:

  • A continuous scoring rule defined on the probabilities is strictly proper if and only if it is proper and fine-grained,

where a scoring rule is fine-grained provided that it is one-to-one on the probabilities: it assigns different scores to different probabilities. (I mean: if p and q are different, then there is an ω ∈ Ω such that s(p)(ω)≠s(q)(ω).)
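Fine-grainedness is easy to test numerically on a finite set of forecasts. The following is a rough sketch under assumptions of my choosing for illustration: a three-point Ω, the Brier score as a fine-grained rule, and a constant rule as one that is not. The names `is_fine_grained`, `brier` and `constant_rule` are hypothetical helpers, and a grid check is not a proof.

```python
from itertools import product

def brier(p):
    """Brier inaccuracy score of forecast p, as a list indexed by ω (assumed example)."""
    n = len(p)
    return [sum((p[j] - (1.0 if j == w else 0.0)) ** 2 for j in range(n))
            for w in range(n)]

def constant_rule(p):
    """Assigns the same score to every forecast at every ω."""
    return [1.0] * len(p)

def is_fine_grained(s, forecasts, tol=1e-9):
    """True if any two distinct forecasts get scores that differ at some ω."""
    return all(any(abs(x - y) > tol for x, y in zip(s(p), s(q)))
               for p in forecasts for q in forecasts if p != q)

steps = 10
grid = [tuple(k / steps for k in ks)
        for ks in product(range(steps + 1), repeat=3) if sum(ks) == steps]

print(is_fine_grained(brier, grid), is_fine_grained(constant_rule, grid))  # prints: True False
```

Both rules are continuous and proper, so by the equivalence above only the first can be strictly proper.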

But fine-grainedness seems moderately plausible to me: a scoring rule is insufficiently “sensitive” if it assigns the same score to different consistent forecasts. So we have an argument for strict propriety, at least as restricted to consistent probability functions.
