## Thursday, February 9, 2012

### Probabilities, scoring functions, and an argument that it is infinitely worse to be certain that a truth is false than it is good to be certain that that truth is true

One oddity of the normal 0-to-1 probability measure is that it hides epistemically significant differences near the endpoints in a way that may skew intuitions.  You need a ton of evidence to move your probability from 0.99 to 0.9999.  But the absolute difference in probabilities is only 0.0099.

It turns out there is a nice solution to this, apparently due to Alan Turing, which I had fun rediscovering yesterday.  Define
• φ(H) = −log(1/P(H) − 1) = log(P(H)/P(~H)), and
• φ(H|E) = −log(1/P(H|E) − 1) = log(P(H|E)/P(~H|E)).
This symmetrically transforms probabilities from the 0-to-1 range to a −∞ to +∞ range.  To the right we have the graph of the transformation function.
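
To make the opening point concrete, here is a minimal sketch of the transformation in Python.  The function names `phi` and `phi_inverse` and the choice of the natural logarithm are my assumptions for illustration; any fixed base would only rescale the values.

```python
import math

def phi(p: float) -> float:
    """Log-odds of a probability p strictly between 0 and 1 (natural log assumed)."""
    return math.log(p / (1.0 - p))

def phi_inverse(x: float) -> float:
    """Map a phi-value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

for p in (0.5, 0.9, 0.99, 0.9999):
    print(f"P = {p:<7} phi = {phi(p):.3f}")

# The same absolute step of 0.0099 in probability, taken near the top
# of the scale and taken near the middle of the scale.
gap_high = phi(0.9999) - phi(0.99)
gap_mid = phi(0.5099) - phi(0.50)
print(f"gap near 1:   {gap_high:.3f}")
print(f"gap near 1/2: {gap_mid:.3f}")
```

On this scale the move from 0.99 to 0.9999 is a jump from about 4.6 to about 9.2, while the same 0.0099 step starting at 1/2 is worth only about 0.04.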

But here is something else that's neat about φ.  It lets you rewrite Bayes' theorem so it becomes:
• φ(H|E) = φ(H) + C(E,H),
where C(E,H) = log(P(E|H)/P(E|~H)) is the log-Bayes'-ratio measure of confirmation.
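
A quick numerical check of the additive form, as a sketch.  The prior and the two likelihoods are made-up numbers, and the helper names are hypothetical.

```python
import math

def phi(p: float) -> float:
    """Log-odds of p (natural log assumed)."""
    return math.log(p / (1.0 - p))

def confirmation(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """C(E,H) = log(P(E|H)/P(E|~H))."""
    return math.log(p_e_given_h / p_e_given_not_h)

# Illustrative numbers only, not taken from the post.
prior = 0.3
p_e_given_h = 0.8
p_e_given_not_h = 0.2

# Ordinary Bayes' theorem for P(H|E).
posterior = (prior * p_e_given_h) / (
    prior * p_e_given_h + (1.0 - prior) * p_e_given_not_h
)

lhs = phi(posterior)                                       # phi(H|E)
rhs = phi(prior) + confirmation(p_e_given_h, p_e_given_not_h)  # phi(H) + C(E,H)
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, since the additive form is just Bayes' theorem for H divided by Bayes' theorem for ~H, with logarithms taken.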

And it gets better.  Suppose E1,...,En are pieces of evidence that are conditionally independent given H and conditionally independent given ~H.  (One can think of these pieces of evidence as independent tests for H versus ~H.  For instance, if our two hypotheses are that our coin is fair or that it is biased 9:1 in favor of heads, then E1,...,En can be the outcomes of successive tosses.)  Then:
• φ(H|E1&...&En) = φ(H)+C(E1,H)+...+C(En,H).
In other words, φ linearizes the effect of independent evidence.  (Doesn't this make nicely plausible the claim that C(E,H) is the correct measure of confirmation, or at least is ordinally equivalent to it?)
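
Here is a sketch of the coin example, taking H to be the hypothesis that the coin is biased 9:1 in favor of heads and ~H that it is fair.  The toss string and all function names are assumptions for illustration.  The sum of confirmations is compared against a direct application of Bayes' theorem to the whole sequence.

```python
import math

P_HEADS_BIASED = 0.9  # H: biased 9:1 in favor of heads
P_HEADS_FAIR = 0.5    # ~H: fair coin

def phi(p: float) -> float:
    return math.log(p / (1.0 - p))

def phi_inverse(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def confirmation(toss: str) -> float:
    """C(E,H) for a single toss, 'H' for heads and anything else for tails."""
    if toss == "H":
        return math.log(P_HEADS_BIASED / P_HEADS_FAIR)
    return math.log((1.0 - P_HEADS_BIASED) / (1.0 - P_HEADS_FAIR))

def posterior_by_sum(prior: float, tosses: str) -> float:
    """Add up the confirmations on the phi scale, then convert back."""
    return phi_inverse(phi(prior) + sum(confirmation(t) for t in tosses))

def posterior_direct(prior: float, tosses: str) -> float:
    """Bayes' theorem applied to the joint likelihood of the whole sequence."""
    like_biased = math.prod(
        P_HEADS_BIASED if t == "H" else 1.0 - P_HEADS_BIASED for t in tosses
    )
    like_fair = math.prod(
        P_HEADS_FAIR if t == "H" else 1.0 - P_HEADS_FAIR for t in tosses
    )
    return prior * like_biased / (prior * like_biased + (1.0 - prior) * like_fair)

tosses = "HHTHHHHH"
print(posterior_by_sum(0.5, tosses), posterior_direct(0.5, tosses))
```

Because the update is a plain sum, the order of the tosses makes no difference to the result.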

Jim Hawthorne tells me that L. J. Savage used φ to prove a Bayesian convergence theorem, and it's not that hard to see from the above formulae how one might go about doing that.

Moreover, there is a rather interesting utility-related fact about φ.  Suppose we're performing exactly similar independent tests for H versus ~H, each of which provides only a very small incremental change in probabilities.  Suppose each test has a fixed cost to perform.  Suppose that in fact the hypothesis H is true, and we start with a φ-value of 0 (corresponding to a probability of 1/2).  Then, assuming that the conditional probabilities are such that one can confirm H by these tests, the expected cost of getting to a φ-value of y by using such independent tests turns out to be, roughly speaking, proportional to y.  Suppose, on the other hand, that you have a negative φ-value y and you want to know just how unfortunate that is, in light of the fact that H is actually true.  You can quantify the badness of the negative φ-value by looking at how much you should expect it to cost to perform the experiments needed to get to the neutral φ-value of zero.  It turns out that the cost is, again roughly speaking, proportional to |y|.  In other words, φ quantifies experimental costs.

This in turn leads to the following intuition.  If H is true, the epistemic utility of having a negative φ-value of y is going to be proportional to y, since the cost of moving from y to 0 is proportional to |y|.  Then, assuming our epistemic utilities are proper, I have a theorem that shows that this forces (at least under some mild assumptions on the epistemic utility) a particular value for the epistemic utility for positive y.

Putting this in terms of credences rather than φ-values, it turns out that our measure of the epistemic utility of assigning credence r to a truth is proportional to:
• −log(1/r − 1) for r ≤ 1/2
• 2 − 1/r for r ≥ 1/2.
In particular, it's infinitely worse to be certain that a truth p is false than it is good to be certain that the truth p is true (cf. a much weaker result I argued for earlier).  (Hence, if faith requires certainty, then faith is only self-interestedly rational when there is some other infinite benefit of that certainty--which there may well be.)
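
Here is a sketch of the two-part utility together with a brute-force check that it behaves like a proper scoring rule.  The function names are hypothetical, the natural logarithm is assumed, and I am assuming symmetry: assigning credence r to a falsehood is scored like assigning credence 1 − r to a truth.

```python
import math

def utility_if_true(r: float) -> float:
    """Epistemic utility (up to a positive constant) of credence r in a truth."""
    if r <= 0.0:
        return -math.inf
    if r <= 0.5:
        return -math.log(1.0 / r - 1.0)
    return 2.0 - 1.0 / r

def utility_if_false(r: float) -> float:
    """Assumed symmetric: credence r in a falsehood is credence 1 - r in a truth."""
    return utility_if_true(1.0 - r)

def expected_utility(p: float, r: float) -> float:
    """Expected utility of announcing credence r when one's real credence is p."""
    return p * utility_if_true(r) + (1.0 - p) * utility_if_false(r)

def best_credence(p: float, grid: int = 999) -> float:
    """Grid search for the announced credence with the highest expected utility."""
    candidates = [k / (grid + 1) for k in range(1, grid + 1)]
    return max(candidates, key=lambda r: expected_utility(p, r))

for p in (0.1, 0.3, 0.5, 0.8):
    print(f"true credence {p}: best announced credence {best_credence(p)}")

print(utility_if_true(1.0), utility_if_true(0.0))
```

If the utility is proper, the grid search should land on the true credence each time.  The last line shows the asymmetry: certainty in a truth is worth a finite amount, while certainty that a truth is false is worth negative infinity.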

The plot to the right shows the above two-part function.  (It may be of interest to note that the graph is concave--concavity is a property discussed in the scoring-rule literature.)  Notice how very close to linear it is in the region between around 0.25 and 0.6.

My one worry about this is that by quantifying the disvalue of below-1/2 credence of a truth in terms of the experimental costs of getting out of the credence, one may be getting at practical rather than epistemic utility.  I am not very worried about this.