Friday, April 24, 2015

Blackmail, promises and self-punishment

I was reading this interesting paper which comes up with "blackmail" stories against both evidential and causal decision theory (CDT). I'll focus on the causal case. The paper talks about an Artificial Intelligence context, but we can transpose the stories into something more interpersonal. John blackmails Patrick in such a way that it's guaranteed that if Patrick pays up there will be no more blackmail. As a good CDT agent, Patrick pays up, since it pays. However, Patrick would have been better off if he were the sort of person who refuses to pay off blackmailers. For John is a very good predictor of Patrick's behavior, and if John foresaw that Patrick would be unlikely to pay him off, then John wouldn't have taken the risk of blackmailing Patrick. So CDT agents are subject to blackmail.

One solution is to add to the agent's capabilities the ability to adopt a policy of behavior. Then it would have paid for Patrick to have adopted a policy of refusing to pay off blackmailers, and he would have adopted that policy. One problem with this, though, is that the agent could drop the policy afterwards, and in the blackmail situation it would pay to drop the policy. And that makes the agent subject to blackmail once again. (This is basically the retro-blackmail story in the paper.)

Anyway, thinking about these sorts of cases, I've been playing with a simplistic decision-theoretic model of promises and weak promises—or, more generally, commitments. When one makes a commitment, then on this model one changes one's utility function. The scenarios where one fails to fulfill the commitment get a lower utility, while scenarios where one succeeds in fulfilling the commitment are unchanged in utility. You might think that you get a utility bonus for fulfilling a commitment. That's mistaken. For if we got a utility bonus for fulfilling commitments, then we would have reason to promise to do all sorts of everyday things that we would do anyway, like eat breakfast.
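Here is a toy sketch of why the bonus picture goes wrong. Everything in it is my own illustration: the function `best_total`, the action names and the numbers are hypothetical. It compares an agent's best available utility after promising to do something he would do anyway, once on a penalty model and once on a bonus model.

```python
def best_total(base_utility, promised, bonus=0.0, penalty=0.0):
    """Utility of the agent's best action after promising `promised`.

    Keeping the promise adds `bonus`; breaking it subtracts `penalty`.
    The model in the post corresponds to bonus == 0.
    """
    def adjusted(action):
        u = base_utility[action]
        return u + bonus if action == promised else u - penalty

    return max(adjusted(action) for action in base_utility)


# Hypothetical utilities: the agent would eat breakfast anyway.
morning = {"eat_breakfast": 5.0, "skip_breakfast": 1.0}

no_promise = max(morning.values())
with_penalty = best_total(morning, "eat_breakfast", penalty=3.0)
with_bonus = best_total(morning, "eat_breakfast", bonus=3.0)
```

On the penalty model the promise leaves the agent exactly where he was (`with_penalty` equals `no_promise`), so there is no incentive to make idle promises. On the bonus model `with_bonus` is higher, so the agent would have reason to promise to eat breakfast every morning.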

This made me think about agents who have a special normative power: the power to lower their utility function in any way that they like. But they lack the power to raise it. In other words, they have the power to replace their utility function by a lower one. This can be thought of in terms of commitments—lowering the utility value of a scenario by some amount is equivalent to making a commitment of corresponding strength to ensure that scenario isn't actualized—or in terms of mechanisms for self-punishment. Imagine an agent who can make robots that will zap him in various scenarios.

Now, it would be stupid for an agent simply to lower his utility function by a constant amount everywhere. That wouldn't change the agent's behavior at all, but would make sure that the agent is less well off no matter what happens. However, it wouldn't be stupid for the agent to lower his utility function for scenarios where he gives in to blackmail by agents who can make good predictions of his behavior and who wouldn't have blackmailed him if they thought he wouldn't give in. If he lowers that utility enough—say, by making a promise not to negotiate with blackmailers or by generating a robot that zaps him painfully if he gives in—then a blackmailer like John will know that he is unlikely to give in to blackmail, and hence won't risk blackmailing him.
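A minimal sketch of this reasoning, under assumptions of my own: the payoff numbers are made up, John is treated as a perfect predictor (the story only needs a very good one), and `outcome` and `patrick_choice` are hypothetical names.

```python
# Hypothetical payoffs for Patrick (illustrative numbers only).
BASE_UTILITY = {
    "no_blackmail": 0.0,   # John never approaches him
    "pay": -10.0,          # blackmailed, pays up
    "refuse": -100.0,      # blackmailed, refuses, John carries out the threat
}


def patrick_choice(utility):
    """Once blackmailed, a CDT agent takes whichever action has higher utility."""
    return "pay" if utility["pay"] > utility["refuse"] else "refuse"


def outcome(penalty_for_paying):
    """Return (scenario, Patrick's utility), assuming John predicts perfectly."""
    if penalty_for_paying < 0:
        raise ValueError("the agent can only lower his utility function")
    utility = dict(BASE_UTILITY)
    utility["pay"] -= penalty_for_paying
    # John blackmails only if he foresees that Patrick would pay.
    if patrick_choice(utility) == "pay":
        return "pay", utility["pay"]
    return "no_blackmail", utility["no_blackmail"]


uncommitted = outcome(0.0)   # no self-punishment set up
committed = outcome(95.0)    # paying now counts as -105, worse than refusing
```

With these numbers the uncommitted Patrick is blackmailed and ends up at -10, while the committed Patrick is never blackmailed and ends up at 0. The penalty is never actually incurred, which is what makes setting it up worthwhile.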

The worry about the agent changing policies and thereby opening himself to blackmail does not apply on this story. For the agent in my model has only been given the power to lower his utility function at will. He doesn't have the power to raise it. If the agent were blackmailed, he could lower his utility function for the scenarios where he doesn't give in, and thereby get himself to give in. But it doesn't pay to do that, as is easy to confirm. It would pay for him to raise his utility function for the scenarios where he gives in, but he can't do that.
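The "easy to confirm" step can be checked with a small sketch. The numbers are hypothetical, and `best_response` and `lower` are names I am making up for illustration. Suppose Patrick has already lowered the utility of paying from -10 to -105, and John blackmails him anyway.

```python
def best_response(utility):
    """Return the (action, utility) pair the blackmailed agent ends up with."""
    action = max(utility, key=utility.get)
    return action, utility[action]


def lower(utility, action, amount):
    """The one power the agent has: lower a scenario's utility."""
    if amount < 0:
        raise ValueError("the agent cannot raise his utility function")
    changed = dict(utility)
    changed[action] -= amount
    return changed


# Hypothetical utilities after the commitment is in place.
committed = {"pay": -105.0, "refuse": -100.0}

keep = best_response(committed)                         # stands firm
cave = best_response(lower(committed, "refuse", 10.0))  # talks himself into paying
```

Standing firm leaves him at -100. Lowering the utility of refusing far enough does get him to pay, but he then sits at -105, which is worse. More generally, lowering entries can never raise the maximum, so no use of this power helps once the blackmail has happened. Restoring the utility of paying to -10 would help, but that is a raise, and `lower` rejects it.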

An agent like this would likewise give himself a penalty for two-boxing in Newcomb cases.
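To illustrate with a toy version of the standard Newcomb setup: the dollar amounts are the usual textbook ones, the predictor is assumed to be perfect, utility is assumed linear in dollars, and `cdt_choice` and `newcomb_payoff` are hypothetical names of mine.

```python
SMALL = 1_000      # always in the transparent box
BIG = 1_000_000    # in the opaque box only if one-boxing was predicted


def cdt_choice(penalty_for_two_boxing):
    """With the contents causally fixed, two-boxing gains SMALL minus the penalty."""
    return "two-box" if SMALL - penalty_for_two_boxing > 0 else "one-box"


def newcomb_payoff(penalty_for_two_boxing):
    """Return (choice, utility), assuming the predictor foresees the choice."""
    if penalty_for_two_boxing < 0:
        raise ValueError("the agent can only lower his utility function")
    choice = cdt_choice(penalty_for_two_boxing)
    opaque_box = BIG if choice == "one-box" else 0
    if choice == "one-box":
        return choice, opaque_box
    return choice, opaque_box + SMALL - penalty_for_two_boxing


plain_cdt = newcomb_payoff(0)         # two-boxes and finds the opaque box empty
self_punisher = newcomb_payoff(2_000) # one-boxes and finds it full
```

The plain CDT agent walks away with 1,000, while the agent who penalized two-boxing by more than 1,000 walks away with 1,000,000 and never pays the penalty.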

So it's actually good for agents to be able to lower their utility function. Setting up self-punishments so as to avoid blackmail can make perfect rational sense, even in the case of a perfectly rational agent.


Heath White said...

The basic logic here has been around since Schelling in the 60s, but (naively) I hadn't thought to apply it to questions of causal vs. evidential decision theory.

One "power" to lower our utilities is simply the psychological and social consequences of breaking commitments. We often feel guilty and receive opprobrium from others. Many have noted that these negative feelings are stronger than the corresponding ones of self- and other-congratulation when we keep commitments. Some people say that is a bad thing, but you have at least one argument that it is not.

Alexander R Pruss said...

I was thinking that failure to fulfill a commitment is simply constitutive of lesser wellbeing, without needing to invoke feelings.