Monday, March 23, 2026

AI and emotion

In some of my work, I use the example of a pill which gives one the warm glow that one has when one has done something selfless and morally good, but which one can take when one hasn’t done anything like that, just to feel good about oneself. Taking the pill is wrong, because that warm glow emotion is too morally important to be the subject of counterfeiting. Moreover, I think it remains wrong to take the warm glow pill even if one fully knows that one hasn’t done the morally good deed that it fakes the feeling of.

Generalizing, I think we shouldn’t deliberately induce emotions in contexts where they are inapt, when these emotions have significant moral importance. We shouldn’t induce them in ourselves or in others. For instance, we shouldn’t try to make others feel like we are their friends when we are not—even if they fully know that the feeling is misleading.

Now, there is a multitude of significantly morally important interpersonal emotions that are only apt as reactions to another person’s actions. These include feelings of being the object of good- or ill-will, feelings of gratitude or resentment, a feeling of not being alone, and of course a feeling of being a friend. Given their moral importance, we should not try to induce such emotions deliberately where they are inapt.

But I think a plausible case can be made that current AI chatbots are tuned (both through feedback from users and through the system prompt) to produce emotional reactions of this interpersonal sort—the communications of the chatbot are tuned to make one feel that one’s concerns are cared about. And since the chatbots aren’t persons, the emotions are inapt. The tuning is thus morally wrong, even if any sensible user knows that the chatbot has no cares.

One can sometimes have a double-effect justification for inducing misleading emotions, when doing so is an unintended side-effect. However, given that leaked system prompts do in fact contain instructions about emotional cadence, it is very implausible that the induction of inapt emotions is an unintended side-effect.

A couple of days ago, Anthropic offered me a decent chunk of money to do some part-time review of the reasoning capabilities of one or more of their models. I turned it down because of moral concerns along the above lines.

I note that double effect can, however, justify using a chatbot when one foresees but does not intend an inapt emotion in oneself (e.g., I find myself feeling grateful when I get a good AI answer), provided the goods gained from the use are sufficient in comparison to the significance of the inapt emotion. But I think the risk should be taken into account.

This is all rather similar to St. Augustine’s infamous concerns about stage drama. But I think one can make a distinction between the cases. Interpersonal emotions can be categorical or hypothetical. Categorical disapproval is apt only when a person has actually done something morally wrong. But we also have hypothetical disapproval: we can imagine someone hypothetically acting in some situation, and then have a feeling of disapproval towards that hypothetical action. I think there is a real felt difference between these two feelings, just as there is a real felt difference between seeing a sunset and imagining a sunset. And, perhaps, in the audience of a dramatic performance one only has—or at least should only have—the more hypothetical feeling.
