One of the most problematic aspects of some scientific practice is a cut-off, say at 95%, for the evidence-based confidence needed for publication.

I just realized, with the help of a mention of *p*-based biases and improper scoring
rules somewhere on the web, that what is going on here is precisely a
problem of a reward structure that does not result in a proper scoring
rule, where a proper scoring rule is one where your current probability
assignment is guaranteed to have an optimal expected score according to
that very probability assignment. Given an improper scoring rule, one
has a perverse incentive to change one’s probabilities without
evidence.
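To illustrate the distinction, here is a minimal sketch of my own (not from any of the sources above) contrasting a proper scoring rule, the Brier score, with a threshold-style reward. Under the proper rule, reporting your actual probability maximizes your expected score by your own lights; the threshold rule pays you for crossing the cut-off regardless of your evidence, so it pushes your report toward the cut-off:

```python
def brier_score(report, outcome):
    # Negative squared error: higher is better. A classic proper scoring rule.
    return -(report - outcome) ** 2

def threshold_reward(report, outcome, threshold=0.95):
    # Publication-style rule: rewarded iff the reported probability
    # clears the threshold, regardless of the outcome.
    return 1.0 if report >= threshold else 0.0

def expected_score(rule, report, true_prob):
    # Expected score computed under one's *actual* probability.
    return true_prob * rule(report, 1) + (1 - true_prob) * rule(report, 0)

true_prob = 0.9  # your actual credence (illustrative)
reports = [i / 100 for i in range(101)]

# Which report maximizes expected score under each rule?
best_brier = max(reports, key=lambda r: expected_score(brier_score, r, true_prob))
best_thresh = max(reports, key=lambda r: expected_score(threshold_reward, r, true_prob))

print(best_brier)   # honest report: equals true_prob
print(best_thresh)  # at or above the threshold, despite the evidence
```

Under the Brier score the optimal report is your true credence of 0.9; under the threshold rule it is 0.95 or more, even though nothing about the evidence changed.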

To a first approximation, the problem is *really, really* bad.
Insofar as publication is the relevant reward, it is a reward
independent of the truth of the matter! In other words, the scoring rule
has a reward for gaining probability 0.95 (say) in the hypothesis,
regardless of whether the hypothesis is true or false.

Fortunately, it’s not quite so bad. Publication is the short-term
reward. But there are long-term rewards and punishments. If one
publishes, and later it turns out that one was right, one may get
significant social recognition as the *discoverer* of the truth
of the hypothesis. And if one publishes, and later it turns out one is
wrong, one gets some negative reputation.

However, notice this. Fame for having been right is basically independent of the exact probability of the hypothesis one established in the original paper. As long as the probability was sufficient for publication, one is rewarded with fame. Thus, if it turns out that one was right, one's long-term reward is fame if and only if one's probability met the threshold for publication and one was right. And one's penalty is some negative reputation if and only if one's probability met the threshold for publication and yet one was wrong. But note that scientists are actually extremely forgiving of people putting forward evidenced hypotheses that turn out to be false. Unlike in history, where some people live on in infamy, scientists who turn out to be wrong do not suffer infamy: at worst, some condescension. And the penalty barely varies with one's level of confidence.

The long-term reward structure is approximately this:

- If your probability is insufficient for publication: nothing.

- If your probability meets the threshold for publication and you're right: a big positive.

- If your probability meets the threshold for publication and you're wrong: at worst a small negative.

This is not a proper scoring rule. It’s not even close. To make it
into a proper scoring rule, the penalty for being wrong at the threshold
would need to be way higher than the reward for being right.
Specifically, if the threshold is *p* (say 0.95), then the ratio of reward to penalty
needs to be (1−*p*) : *p*. If *p* = 0.95, the reward to penalty
ratio would need to be 1:19. If *p* = 0.99, it would need to be a
staggering 1:99, and if *p* = 0.9, it would need to be a still
large 1:9. We are very, very far from that. And when we add the
truth-independent reward for publication, things become even worse.
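As a quick sanity check (my own illustration, in arbitrary units), one can verify that with a reward-to-penalty ratio of (1−*p*):*p* the expected long-term payoff of publishing breaks even exactly at the threshold, so there is no incentive to shade one's probability in either direction:

```python
def expected_payoff(q, reward, penalty):
    # Publish at credence q: gain `reward` if right (probability q),
    # lose `penalty` if wrong (probability 1 - q).
    return q * reward - (1 - q) * penalty

p = 0.95
reward, penalty = 1 - p, p  # the (1 - p) : p ratio from the text

print(expected_payoff(p, reward, penalty))         # ~0: break-even at threshold
print(expected_payoff(p - 0.01, reward, penalty))  # negative: below threshold
print(expected_payoff(p + 0.01, reward, penalty))  # positive: above threshold
```

With any more generous ratio, the break-even point drops below *p*, and inflating one's probability to reach the threshold becomes profitable.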

We can see that something is problematic if we think about cases like
this. Suppose your current level of confidence is just slightly above
the threshold, and a graduate student in your lab proposes to do one
last experiment in her spare time, using equipment and supplies that
would otherwise go to waste. Given the reward structure, it will likely
make sense for you to refuse this free offer of additional information.
If the experiment favors your hypothesis, you get nothing out of it: you could have published without it, and the same long-term rewards would still be available. But if the experiment disfavors your hypothesis, it will likely make your paper unpublishable (since you were at the threshold); and since it's just one experiment, it is unlikely to put you in a position to publish a paper against the hypothesis instead. At best, running it spares you the risk of the small negative reputation for having been wrong; but that is a small penalty, and an unlikely one (since by your data your hypothesis most likely *is* true), so it is not worth it. In other words, the structure rewards you for ignoring free information.
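The graduate student scenario can be made concrete with a small back-of-the-envelope calculation. All the numbers here are my own illustrative assumptions: a credence of 0.96 just above a 0.95 threshold, an experiment whose favorable outcome has likelihood 0.8 given the hypothesis and 0.2 given its negation, and a 10:1 reward-to-penalty structure (big positive for being right, small negative for being wrong):

```python
def posterior(q, likelihood_h, likelihood_not_h):
    # Bayes update of credence q on observing an outcome with these likelihoods.
    num = q * likelihood_h
    return num / (num + (1 - q) * likelihood_not_h)

def publish_payoff(q, reward=10.0, penalty=1.0, threshold=0.95):
    # Long-term payoff: big reward if right, small penalty if wrong,
    # nothing at all if the credence is below the publication threshold.
    if q < threshold:
        return 0.0
    return q * reward - (1 - q) * penalty

q = 0.96                       # just above the threshold
p_e = q * 0.8 + (1 - q) * 0.2  # chance the extra experiment comes out favorable

stop_now = publish_payoff(q)
run_more = (p_e * publish_payoff(posterior(q, 0.8, 0.2))
            + (1 - p_e) * publish_payoff(posterior(q, 0.2, 0.8)))

print(stop_now > run_more)  # True: the free experiment is refused
```

An unfavorable result drops the credence to about 0.86, below the threshold, so that branch pays nothing; the expected payoff of running the free experiment is strictly lower than publishing immediately, exactly the perverse incentive described above.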

How can we fix this? We simply cannot realistically fix it if we have a high probability threshold for publication. The only way to fix it while keeping a high probability threshold would be to impose a ridiculously high penalty for being wrong. But we shouldn't do things like sentencing scientists to jail for being wrong (which has happened). Increasing the probability threshold for publication would only require the penalty for being wrong to be even larger. Decreasing the probability threshold for publication helps a little. But as long as the reputational benefit from getting things right is larger than the reputational harm from getting things wrong, we are going to have perverse incentives from any probability threshold for publication greater than 1/2, no matter where that threshold lies. (This follows from Fact 2 in my recent post, together with the observation that Schervish's characterization of scoring rules implies that any reward function corresponds to a penalty function that is unique up to an additive constant.)

What's the solution? Maybe it's this: reward people for publishing lots of data, rather than for the data showing anything interesting, and do so sufficiently that it's always worth publishing more data?