In some of my work, I use the example of a pill which gives one that warm glow that one has when one has done something selfless and morally good, but which one can take when one hasn't done anything like that, just to feel good about oneself. Taking the pill is wrong, because that warm glow emotion is too important morally for it to be the subject of counterfeiting. Moreover, I think it remains wrong to take the warm glow pill even if one fully knows that one hasn't done the morally good deed whose feeling it fakes.
Generalizing, I think we shouldn't deliberately induce emotions in contexts where they are inapt when these emotions have a significant amount of moral importance. We shouldn't induce them either in ourselves or in others. For instance, we shouldn't try to make others feel like we are their friends when we are not—even if they fully know that the feeling is misleading.
Now, there is a multitude of significantly morally important interpersonal emotions that are only apt as reactions to another person's actions. These include feelings of being the object of good- or ill-will, feelings of gratitude or resentment, a feeling of not being alone, and of course a feeling of being a friend. Given their moral importance, we should not try to induce such emotions deliberately where they are inapt.
But I think a plausible case can be made that current AI chatbots are tuned (both through feedback from users and through the system prompt) to produce emotional reactions of this interpersonal sort—the communications of the chatbot are tuned to make one feel that one's concerns are cared about. And since the chatbots aren't persons, the emotions are inapt. The tuning is thus morally wrong, even if any sensible user knows that the chatbot has no cares.
One can sometimes have a double-effect justification for inducing misleading emotions, when doing so is an unintended side-effect. However, given that leaked system prompts do in fact contain instructions about emotional cadence, it is very implausible to think that the induction of inapt emotions is an unintended side-effect.
A couple of days ago, Anthropic offered me a decent chunk of money for doing some part-time review of the reasoning capabilities of one or more of their models. I turned it down because of moral concerns along the above lines.
I note that double-effect can, however, justify using a chatbot when one does not intend an inapt emotion that one expects in oneself (e.g., I find myself feeling grateful when I get a good AI answer), provided that the goods gained from the use are sufficient in comparison to the significance of the inapt emotion. But I think the risk should be taken into account.
This is all rather similar to St. Augustine's infamous concerns about stage drama. But I think one can make a distinction between the cases. Interpersonal emotions can be categorical or hypothetical. Categorical disapproval is apt only when a person has actually done something morally wrong. But we also have hypothetical disapproval: we can imagine someone hypothetically acting in some situation, and then have a feeling of disapproval towards that hypothetical action. I think there is a real felt difference between these two feelings, just as there is a real felt difference between seeing a sunset and imagining a sunset. And, perhaps, the audience of a dramatic performance only has—or at least should only have—the more hypothetical feeling.
Have you thought much about the sorts of things that help people recalibrate their feeling faculties? I tend to think that a lot of laughter has a purpose like this. When, for example, I spend a long time treating a serious matter seriously, I can lose sight of the fact that, serious though it may be, it is only a relative seriousness; there are other things that command my attention and I kind of need to relax. A good laugh can help with that.
Maybe one of the functions of the hypothetical interpersonal emotions you talk about is also recalibrative. Say, I haven't been able to mourn the death of my friend properly, but a sad movie kind of helps remove the barriers I've been having. Or I might be having a difficult time recognizing some of my legitimate accomplishments, and imagining what a third party might say about them could allow me to feel more positively about them.
I’m not sure what I would want to say about other ways that we can get recalibrated. Here’s something I am inclined to say. Suppose that I feel alone even though I am *not* alone. I’m suffering from some sort of depression. Two pills are available to me. One allows me to feel alone when alone and unalone when unalone. The other induces the feeling of being unalone either way. It would be morally fine to take the former and morally wrong to take the latter, even if I am unalone for the entire duration of the pill’s effect.
It’s not entirely obvious to me what kinds of feelings chatbots induce, though I agree they’re designed to induce the categorical kind.
Can you say a bit more about why double effect reasoning doesn't apply in your case as well? You foresee that some people will have inapt feelings as a result of your helping train the AI. But you don't intend it. And proportional gravity is satisfied because the causal contribution you would make is minimal. Those same users would probably use it and have inapt emotions anyway. So the bad consequence isn't that bad. But you get a nice chunk of money and the chatbot becomes a bit of a better reasoner for its users because it is trained by you.
Scott:
It's a prudential judgment. I am very worried that depersonifying persons and personifying nonpersons is going to be one of the really big ethical problems over the next decades, and I don't want to have a hand in furthering that. Granted, one person's contribution to this big problem is likely to be very small--but one person's contribution to the goods of training is also likely to be very small.
Anthropic has been quite forthcoming that they think training person-like virtues into their language models and engaging with them as honest brokers is essential to creating agents that behave in a morally good way (as opposed to other companies, which focus more on establishing rules and controlling their behavior). They've also said that they can't rule out the models' having conscious experiences now and have strongly implied that they expect them to in the future.
Do you think this approach is inherently wrong? And, further, do you think that digital consciousness/personhood/well-being is impossible?
I do think that making people feel like they are interacting with persons when they are not is wrong. Of course, this does not apply if the AI actually *is* a person.
I don't think it's likely that a machine will be conscious. But if in fact an AI is a person, then I think it's a serious moral risk for us to create the AI. The reason is that persons are sacred, and one needs to have specific reason to think that it is permissible to produce a sacred thing in order to do so. This is why religions tend to have all these rules about who can perform sacred rituals, when they can be performed, etc. I think we have good reason to think that natural human reproduction is a permissible way to produce persons. But I don't think there are any other ways of producing persons where we can be confident of permissibility. That's playing God, as one says.
Moreover, we have good Kantian reasons to think that to produce persons for our benefit--rather than for their intrinsic good--is to treat these persons as mere means, and that is a failure of the respect due to persons.
I also think that *if* digital consciousness is possible, then we are taking an awful moral risk in training AI. The training process involves a vast amount of negative feedback, and *if* digital consciousness is possible, it is quite likely that negative feedback is unpleasant to the system.
But, as I said, I think it's pretty unlikely that there is digital personhood, so our main moral worry is about non-persons pretending to be persons.
I think I can get a handle on hypothetical disapproval, but the feelings of fear or sorrow elicited by, for example, excellent movies, seem different. Do you think it is problematic to feel these or to attempt to elicit them? Are horror movies and tear-jerkers wrong?
Maybe the feeling of sorrow is a feeling of sorrow for the _possibility_ of such evils? I still think that phenomenologically it feels different when it's real and when it's fictional. Imagine that you're watching a movie, and it makes you feel sad. And then you find out it was a true story. Doesn't that change the feeling?