Friday, July 26, 2024

Perfect nomic correlations

Here is an interesting special case of Ockham’s Razor:

  1. If we find that of nomic necessity whenever A occurs, so does B, then it is reasonable to assume that B is not distinct from A.

Here are three examples.

  A. We learn from Newton and Einstein that inertial mass and gravitational mass always have the same value. So by (1) we should suppose them to be one property, rather than two properties that are nomically correlated.

  B. In a Newtonian context consider the hypothesis of a gravitational field. Because the gravitational field values at any point are fully determined by the positions and masses of material objects, (1) tells us that it’s reasonable to assume the gravitational field isn’t some additional entity beyond the positions and masses of material objects.

  C. Suppose that we find that mental states supervene on physical states: that there is no difference in mental states without a corresponding difference in physical states. Then by (1) it’s reasonable to expect that mental states are not distinct from physical states. (This is of course more controversial than (A) and (B).)

But now consider that in a deterministic theory, future states occur of nomic necessity given past states. Thus, (1) makes it reasonable to reduce future states to past states: What it is for the universe to be in state S7 at time t7 is nothing but the universe’s being in state S0 at time t0 and the pair (S7,t7) having such-and-such a mathematical relationship to the pair (S0,t0). Similarly, entities that don’t exist at the beginning of the universe can be reduced to the initial state of the universe—we are thus reducible. This consequence of (1) will seem rather absurd to many people.

What should we do? One move is to embrace the consequence and conclude that indeed if we find good evidence for determinism, it will be reasonable to reduce the present to the past. I find this implausible.

Another move is to take the above argument as evidence against determinism.

Yet another move is to restrict (1) to cases where B occurs at the same time as A. This restriction is problematic in a relativistic context, since simultaneity is relative. Probably the better version of the move is to restrict (1) to cases where B occurs at the same time and place as A. Interestingly, this will undercut the gravitational field example (B). Moreover, because it is not clear that mental states have a location in space, this may undercut application (C) to mental states.

A final move is either to reject (1) or, more modestly, to claim that the evidence provided by nomic coincidence is pretty weak and defeasible on the basis of intuitions, such as our intuition that the present does not reduce to the past. In either case, application (C) is in question.

In any case, it is interesting to note that thinking about determinism gives us some reason to be suspicious of (1), and hence of the argument for mental reduction in (C).

Thursday, July 25, 2024

Aggression and self-defense

Let’s assume that lethal self-defense is permissible. Such self-defense requires an aggressor. There is a variety of concepts of an aggressor for purposes of self-defense, depending on what constitutes aggression. Here are a few accounts:

  1. voluntarily, culpably and wrongfully threatening one’s life

  2. voluntarily and wrongfully threatening one’s life

  3. voluntarily threatening one’s life

  4. threatening, voluntarily or involuntarily, one’s life.

(I am bracketing the question of less serious threats, where health but not life is threatened.)

I want to focus on accounts of self-defense on which aggression is defined by (4), namely where there is no mens rea requirement at all on the threat. This leads to a very broad doctrine of lethal self-defense. I want to argue that it is too broad.

First note that it is obvious that a criminal is not permitted to use lethal force against a police officer who is legitimately using lethal force against them. This implies that even (3) is too lax an account of aggression for purposes of self-defense, and a fortiori (4) is too lax.

Second, I will argue against (4) more directly. Imagine that Alice and Bob are locked in a room together for a week. Alice has just been infected with a disease which would do her no harm but would kill Bob. If Alice dies in the next day, the disease will not yet have become contagious, and Bob’s life will be saved. Otherwise, Bob will die. By (4), Bob can deem Alice an aggressor simply by her being alive—she threatens his life. So on an account of self-defense where (4) defines aggression, Bob is permitted to engage in lethal self-defense against Alice.

My intuitions say that this is clearly wrong. But not everyone will see it this way, so let me push on. If Bob is permitted to kill Alice because aggression doesn’t have a mens rea requirement, Alice is also permitted to lethally fight back against Bob, despite the fact that Bob is acting permissibly in trying to kill her. (After all, Alice was also acting permissibly in breathing, and thereby staying alive and threatening Bob.) So the result of a broad view of self-defense against any kind of threat, voluntary or not, is situations where two people will permissibly engage in a fight to the death.

Now, it is counterintuitive to suppose that there could be a case where two people are both acting justly in a fight to the death, apart from cases of non-moral error (say, each thinks the other is an attacking bear).

Furthermore, the result of such a situation is that basically the stronger of the two gets to kill the weaker and survive. The effect is not literally might makes right, but is practically the same. This is an implausibly discriminatory setup.

Third, consider a more symmetric variant. Two people are trapped in a spaceship that has only air enough for one to survive until rescue. If (4) is the right account of aggression, then simply by breathing each is an aggressor against the other. This is already a little implausible. Two people in a room breathing is not what one normally thinks of as aggression. Let me back this intuition up a little more. Suppose that there is only one person trapped in a spaceship, and there is not enough air to survive until rescue. If in the case of two people each was engaging in aggression against the other simply by virtue of removing oxygen from air to the point where the other would die, in the case of one person in the spaceship, that person is engaging in aggression against themselves by removing oxygen from air to the point where they themselves will die. But that’s clearly false.

I don’t know exactly how to define aggression for purposes of self-defense, but I am confident that (4) is much too broad. I think the police officer and criminal case shows that (3) is too broad as well. I feel pulled towards both (1) and (2), and I find it difficult to resolve the choice between them.

Wednesday, July 24, 2024

Knowing what it's like to see green

You know what it’s like to see green. Close your eyes. Do you still know what it’s like to see green?

I think so.

Maybe you got lucky and saw some green patches while your eyes were closed. But I am not assuming that happened. Even if you saw no green patches, you still knew what it was like to see green.

Philosophers who are really taken with qualia sometimes say that:

  1. Our knowledge of what it is like to see green could only be conferred on us by having an experience of green.

But if I have the knowledge of what it is like to see green when I am not experiencing green, then that can’t be right. For whatever state I am in when not experiencing green but knowing what it’s like to see green is a state that God could gift me with without ever giving me an experience of green. (One might worry that then it wouldn’t be knowledge, but something like true belief. But God could testify to the accuracy of my state, and that would make it knowledge.)

Perhaps, however, we can say this. When your eyes are closed and you see no green patches, you know what it’s like to see green in virtue of having the ability to visualize green, an ability that generates experiences of green. If so, we might weaken (1) to:

  2. Our knowledge of what it is like to see green could only be conferred on us by having an experience of green or an ability to generate such an experience at will by visual imagination.

We still have a conceptual connection between knowledge of the qualia and experience of the qualia then.

But I think (2) is still questionable. First, it seems to equivocate on “knowledge”. Knowledge grounded in abilities seems to be knowledge-how, and that’s not what the advocates of qualia are talking about.

Second, suppose you’ve grown up never seeing green. And then God gives you an ability to generate an experience of green at will by visual imagination: if you “squint your imagination” thus-and-so, you will see a green patch. But you’ve never so squinted yet. It seems odd to say you know what it’s like to see green.

Third, our powers of visual imagination vary significantly. Surely I know what it’s like to see paradigm instances of green, say the green of a lawn in an area where water is plentiful. When I try to imagine a green patch, if I get lucky, my mind’s eye presents to me a patch of something dim, muddy and greenish, or maybe a lime green flash. I can’t imagine a paradigm instance of green. And yet surely, I know what it’s like to see paradigm instances of green. It seems implausible to think that when my eyes are closed my knowledge of what it’s like to see green (and even paradigm green) is grounded in my ability to visualize these dim non-paradigm instances.

It seems to me that what the qualia fanatic should say is that:

  3. We only know what it’s like to see green when we are experiencing green.

But I think that weakens arguments from qualia against materialism because (3) is more than a little counterintuitive.

Wednesday, July 17, 2024

The explanation of our reliability is not physical

  1. All facts completely reducible to physics are first-order facts.

  2. All facts completely explained by first-order facts are themselves completely reducible to first-order facts.

  3. Facts about our epistemic reliability are facts about truth.

  4. Facts about truth are not completely reducible to first-order facts.

  5. Therefore, no complete explanation of our epistemic reliability is completely reducible to physics.

This is a variant on Plantinga’s evolutionary argument against naturalism.

Premise (4) follows from Tarski’s Indefinability of Truth Theorem.

The one premise in the argument that I am not confident of is (2). But it sounds right.

First-order naturalism

In a lovely paper, Leon Porter shows that semantic naturalism is false. One way to put the argument is as follows:

  1. If semantic naturalism is true, truth is a natural property.

  2. All natural properties are first order.

  3. Truth is not a first order property.

  4. So, truth is not a natural property.

  5. So, semantic naturalism is not true.

One can show (3) by using the liar paradox or just take it as the outcome of Tarski’s Indefinability of Truth Theorem.

Of course, naturalism entails semantic naturalism, so the argument refutes naturalism.

But it occurred to me today, in conversation with Bryan Reece, that perhaps one could have a weaker version of naturalism, which one might call first-order naturalism: the thesis that all first-order truths are natural truths.

First-order naturalism escapes Porter’s argument. It’s a pretty limited naturalism, but it has some force. It implies, for instance, that Zeus does not exist. For if Zeus exists, then that Zeus exists is a first-order truth that is not natural.

First-order naturalism is an interestingly modest naturalist thesis. It is interesting to think about its limits. One that comes to mind is that it does not appear to include naturalism about minds, since it does not appear possible to characterize minds in first-order language (minds represent the world, etc., and talk of representation is at least prima facie not first-order).

Truthteller's relative

The truthteller paradox is focused on the sentence:

  1. This sentence is true.

There is no contradiction in taking (1) to be true, but neither is there a contradiction in taking (1) to be false. So where is the paradox? Well, one way to see the paradox is to note that there is no more reason to take (1) to be true than to be false or vice versa. Maybe there is a violation of the Principle of Sufficient Reason.

For technical reasons, I will take “This sentence” in sentences like (1) to be an abbreviation for a complex definite syntactic description that has the property that the only sentence that can satisfy the description is (1) itself. (We can get such a syntactic description using the diagonal lemma, or just a bit of cleverness.)

But the fact that we don’t have a good reason to assign a specific truth value to (1) isn’t all there is to the paradox.

For consider this relative of the truthteller:

  2. This sentence is true or 2+2=4.

There is no difficulty in assigning a truth value to (2) if it has one: it’s got to be true because 2+2=4. But nonetheless, (2) is not meaningful. When we try to unpack its meaning, that meaning keeps on fleeing. What does (2) say? Not just that 2+2=4. There is that first disjunct in it after all. That first disjunct depends for its truth value on (2) itself, in a viciously circular way.

But after all shouldn’t we just say that (2) is true? I don’t think so. Here is one reason to be suspicious of the truth of (2). If (2) is true, so is:

  3. This sentence is true or there are stars.

But it seems that if (3) is meaningful, then it should have a truth value in every possible world. But that would include the possible world where there are no stars. However, in that world, the sentence (3) functions like the truthteller sentence (1), to which we cannot assign a truth value. Thus (3) does not have a sensible truth value assignment in worlds where there are no stars. But it is not the sort of sentence whose meaningfulness should vary between possible worlds. (It is important for this argument that the description that “This sentence” is an abbreviation for is syntactic, so that its referent should not vary between worlds.)

It might be tempting to take (2) to be basically an infinite disjunction of instances of “2+2=4”. But that’s not right. For by that token (3) would be basically an infinite disjunction of “there are stars”. But then (3) would be false in worlds where there are no stars, and that’s not clear.

If I am right, the fact that (1) wouldn’t have a preferred truth value is a symptom rather than the disease itself. For (2) would have a preferred truth value, but we have seen that it is not meaningful. This pushes me to think that the problem with (1) is the same as with (2) and (3): the attempt to bootstrap meaning in an infinite regress.

I don’t know how to make all this precise. I am just stating intuitions.

Monday, July 15, 2024

From love of neighbor to Christianity

Start with this argument:

  1. It’s not wrong for me to love my friend as if they were in the image and likeness of God.

  2. If someone is not God and not in the image and likeness of God, then to love them as if they were in the image and likeness of God is excessive.

  3. Excessive love is wrong.

  4. My friend is not God.

  5. So, my friend is in the image and likeness of God.

  6. So, God exists.

I think there may be some other variants on this argument that are worth considering. Replace being in the image and likeness of God, for instance, with (a) being so loved by God that God became incarnate out of love for them, or with (b) having the Spirit of God living in them. Then the conclusion is that God became incarnate or that the Spirit of God lives in our neighbor.

The general point is this. Christianity gives us an admirable aspiration as to how much we should love our neighbor. But that much love of our neighbor is inappropriate unless something like Christianity is true.

I think there is a way in which this argument is far from new. One of the great arguments for Christianity has always been those Christians who loved their neighbor as God called them to do. The immense attractiveness of their lives showed that their love was not wrong, and knowledge of these lives showed that they were indeed loving their neighbor in the ways the above arguments talk about.

Friday, July 12, 2024

An act with a normative end

Here’s an interesting set of cases that I haven’t seen a philosophical discussion of. To get some item B, you need to affirm that you did A (e.g., took some precautions, read some text, etc.). But to permissibly affirm that you did A, you need to do A. Let us suppose that you know that your affirmation will not be subject to independent verification, and you in fact do A.

Is A a means to B in this case?

Interestingly, I think the answer is: Depends.

Let’s suppose for simplicity that the case is such that it would be wrong to lie about doing A in order to get B. (I think lying is always wrong, but won’t assume this here.)

If you have such an integrity of character that you wouldn’t affirm that you did A without having done A, then indeed doing A is a means to affirming that you did A, which is a means to B, and in this case transitivity appears to hold: doing A is a means to B.

But we can imagine you have less integrity of character, and if the only way to get B would be to falsely affirm that you did A, you would dishonestly so affirm. However, you have enough integrity of character that you prefer honesty when the cost is not too high, and the cost of doing A is not too high. In such a case, you do A as a means to permissibly affirming that you did A. But it is affirming that you did A that is a means to getting B: permissibly affirming is not necessary. Thus, your doing A is not a means to getting B, but it is a means to the additional bonus that you get B without being dishonest.

In both specifications of character, your doing A is a means to its being permissible for you to affirm you did A. We see, thus, that we have a not uncommon set of cases where an ordinary action has a normative end, namely the permissibility of another action. (These are far from the only such cases. Requesting someone’s permission is another example of an action whose end is the permissibility of some other action.)

The cases also have another interesting feature: your action is a non-causal means to an end. For your doing A is a means to permissibility of affirming you did A, but does not cause that permissibility. The relationship is a grounding one.

Thursday, July 11, 2024

The dependence of evidence on prior confidence

Whether p is evidence for q will often depend on one’s background beliefs. This is a well-known phenomenon.

But here’s an interesting fact that I hadn’t noticed before: sometimes whether p is evidence for q depends on how confident one is in q.

The example is simple: let p be the proposition that all other reasonable people have confidence level around r in q. If r is significantly bigger than one’s current confidence level, then p tends to be evidence for q. If r is significantly smaller than one’s current confidence level, then p tends to be evidence against q.

Friday, July 5, 2024

From theism to something like Christianity

The Gospel message—the account of the infinite and perfect God becoming one of us in order to suffer and die in atonement of our sins—is immensely beautiful. Even abstracting from the truth of the message, it is more beautiful than the beauties of nature around us. Suppose, now, that God exists and the Gospel message is false. Then a human (or demonic) falsehood has exceeded the beauty of God’s created nature around us. That does not seem plausible. Thus, it is likely that:

  1. If God exists, the Gospel message is true.

Furthermore, it seems unlikely that God would allow us to come up with a falsehood about what he has done where the content of that falsehood exceeds in beauty and goodness what God has in fact done. If so, then:

  2. If God exists, something at least as beautiful and good as the Gospel message is true.

Thinking hard

I don’t remember seeing much philosophical discussion of the duty to think hard.

There is a distinction we should start with. For many xs it sounds right to say:

  1. If you’re going to have an opinion about x, you should have thought hard about x.

But that doesn’t imply a duty to think hard about x unless you have a duty to have an opinion about x.

What I am interested in are things that you simply ought to think hard about. Some of these cases follow from specifics of your situation. If someone is drowning, and you don’t see how to save them, you ought to think hard about how to save them. But the more interesting cases are things that human beings at large should think hard about.

Consider these two statements, both of them likely true:

  2. There are agnostics who have thought hard and honestly about God.

  3. There are agnostics who have not thought hard about God.

Clearly, it is not crazy to think that (2) is a version of the problem of hiddenness: If God exists, why would he stay hidden from someone who thought hard about him? But (3) is not troubling in the same way. If there is a problem for theism from (3), it is just the good ol’ problem of moral evil: If God is perfectly good, why would he allow someone not to think hard about him? And it doesn’t feel like an especially problematic version of the problem of evil (it feels much less problematic than the problem of child abuse, say).

The intuitive difference between (2) and (3) suggests this plausible thesis:

  4. All humans in normal circumstances should think hard about God.

Or maybe at least:

  5. All humans in normal circumstances should think hard about fundamental questions.

How hard are people obligated to think about God and similar questions? Pascal’s Wager suggests that one should think very hard about them, both for prudential and moral reasons (the latter because our thinking hard about fundamental questions enables us to help others think about them). After all, God, if he exists, is the infinitely good ground of being, and there is nothing more important to think about.

I should note that I don’t think (4) means that everyone should think hard about whether God exists. I am inclined to think it is possible, either by faith or by easy observation of the world, to reasonably come to a position where it’s pretty obvious that God exists. But one should still think hard about God, even so.

All this leaves open a further question. What is it to think hard about something? The time one puts into it is a part of that. But note that some of the time is apt to be unconscious: to think hard about something may involve significant periods during which one is not thinking consciously about the matter, but one comes back to it again and again. But there is also a seriousness or intensity of thought. I don’t know how exactly to specify what that means, but one interesting aspect of it is that if one is thinking seriously, one makes use of external tools. Thinking seriously can require actions of larger muscle groups: getting up to talk to friends; going to the library; performing scientific experiments; getting some scrap paper to make notes. (I sometimes know that I am not doing mathematics seriously if I don’t bother with scrap paper.) Thinking seriously involves more than just thinking. :-)

Tuesday, July 2, 2024

Do we have normative powers?

A normative power is supposed to be a power to directly change normative reality. We can, of course, indirectly change normative reality by affecting the antecedents of conditional norms: By unfairly insulting you, I get myself to have a duty to apologize, but that is simply due to a pre-existing duty to apologize for all unfair insults.

It would be attractive to deny our possession of normative powers. Typical examples of normative powers are promises, commands, permissions, and requests. But all of these can seemingly be reduced to conditional norms, such as:

  • Do whatever you promise

  • Do whatever you are validly commanded

  • Refrain from ϕing unless permitted

  • Treat the fact that something has been requested of you as a reason for doing it.

One might think that one can still count as having a normative power even if it is reducible to prior conditional norms. Here is a reason to deny this. I could promise to send you a dollar on any day on which your dog barks. Then your dog has the power to obligate me to send you a dollar, a power reducible to the norm arising from my promise. But dogs do not have normative powers. Hence an ability to change normative reality by affecting the antecedents of a prior conditional norm is not a normative power.

If this argument succeeds, if a power to affect normative reality is reducible to a non-normative power (such as the power to bark) and a prior norm, it is not a normative power. Are there any normative powers, then, powers not reducible in this way?

I am not sure. But here is a non-conclusive reason to think so. It seems we can invent new useful ways of affecting normative reality, within certain bounds. For instance, normally a request comes along with a permission—a request creates a reason for the other party to do the requested action while removing any reasons of non-consent against the performance. But there are rare contexts where it is useful to create a reason without removing reasons of non-consent. An example is “If you are going to kill me, kill me quickly.” One can see this as creating a reason for the murderer to kill one quickly, without removing reasons of non-consent against killing (or even killing quickly). Or, for another example, normally a general’s command in an important matter generates a serious obligation. But there could be cases where the general doesn’t want a subordinate to feel very guilty for failing to fulfill the command, and it would be useful for the general to make a new commanding practice, a “slight command” which generates an obligation, but one that it is only slightly wrong to disobey.

There are approximable and non-approximable promises. When I promise to bake you seven cookies, and I am short on flour, normally I have reason to bake you four. But there are cases where there is no reason to bake you four—perhaps you are going to have seven guests, and you want to serve them the same sweet, so four are useless to you (maybe you hate cookies). Normally we leave such decisions to common sense and don’t make them explicit. However, we could also imagine making them explicit, and we could imagine promises with express approximability rules (perhaps when you can’t do cookies, cupcakes will be a second best; perhaps they won’t be). We can even imagine complex rules of preferability between different approximations to the promise: if it’s sunny, seven cupcakes is a better approximation than five cookies, while if it’s cloudy, five cookies is a better approximation. These rules might also specify the degree of moral failure that each approximation represents. It is, plausibly, within our normative authority over ourselves to issue promises with all sorts of approximability rules, and we can imagine a society inventing such.

Intuitively, normally, if one is capable of a greater change of normative reality, one is capable of a lesser one. Thus, if a general has the authority to create a serious obligation, they have the authority to create a slight one. And if you are capable of both creating a reason and providing a permission, you should be able to do one in isolation from the other. If you have the authority to command, you have the standing to create non-binding reasons by requesting.

We could imagine a society which starts with two normative powers, promising and commanding, and then invents the “weaker” powers of requesting and permitting, and an endless variety of normative subtlety.

It seems plausible to think that we are capable of inventing new, useful normative practices. These, of course, cannot be a normative power grab: there are limits. The epistemic rule of thumb for determining these limits is that the powers do not exceed ones that we clearly have.

It seems a little simpler to think that we can create new normative powers within predetermined limits than that all our norms are preset, and we simply instance their antecedents. But while this is a plausible argument for normative powers, it is not conclusive.

Monday, July 1, 2024

Duplicating electronic consciousnesses

Assume naturalism and suppose that digital electronic systems can be significantly conscious. Suppose Alice is a deterministic significantly conscious digital electronic system. Imagine we duplicated Alice to make another such system, Bob, and fed them both the same inputs. Then there are two conscious beings with qualitatively the same stream of consciousness.

But now let’s add a twist. Suppose that we create a monitoring system that continually checks all of Alice and Bob’s components, and as soon as any corresponding components disagree—are in a different state—then the system pulls the plug on both, thereby resetting all components to state zero. In fact, however, everything works well, and the inputs are always the same, so there is never any deviation between Alice and Bob, and the monitoring system never does anything.

What happens to the consciousnesses? Intuitively, neither Alice nor Bob should be affected by a monitoring system that never actually does anything. But it is not clear that this is the conclusion that specific naturalist theories will yield.

First, consider functionalism. Once the monitoring system is in place, both Alice and Bob change with respect to their dispositional features. All the subsystems of Alice are now incapable of producing any result other than one synchronized to Bob’s subsystems, and vice versa. I think a strong case can be made that on functionalism, Alice and Bob’s subsystems lose their defining functions when the monitoring system is in place, and hence lose consciousness. Therefore, on functionalism, consciousness has an implausible extrinsicness to it. The duplication-plus-monitoring case is some evidence against functionalism.

Second, consider Integrated Information Theory. It is easy to see that the whole system, consisting of Alice, Bob and the monitoring system, has a very low Φ value. Its components can be thought of as just those of Alice and Bob, but with a transition function that sets everything to zero if there is a deviation. We can now split the system into two subsystems: Alice and Bob. Each subsystem’s behavior can be fully predicted from that subsystem’s state plus one additional bit of information that represents whether the other system agrees with it. Because of this, the Φ value of the system is at most 2 bits, and hence the system as a whole has very, very little consciousness.

Moreover, Alice remains significantly conscious: we can think of Alice as having just as much integrated information after the monitoring system is attached as before, but now having one new bit of environmental dependency, so the Φ measure does not change significantly from the monitoring being added. Moreover, because the joint system is not significantly conscious, Integrated Information Theory’s proviso that a system loses consciousness when it comes to be in a part-to-whole relationship with a more conscious system is irrelevant.

Likewise, Bob remains conscious. So far everything seems perfectly intuitive. Adding a monitoring system doesn’t create a new significantly conscious system, and doesn’t destroy the two existing conscious systems. However, here is the kicker. Let X be any subset of Alice’s components. Let S_X be the system consisting of the components in X together with all of Bob’s components that don’t correspond to the components in X. In other words, S_X is a mix of Alice’s and Bob’s components. It is easy to see that the information-theoretic behavior of S_X is exactly the same as the information-theoretic behavior of Alice (or of Bob for that matter). Thus, the Φ value of S_X will be the same for all X.

Hence, on Integrated Information Theory, each of the S_X systems will be equally conscious. The number of these systems equals 2^n where n is the number of components in Alice. Of course, one of these 2^n systems is Alice herself (that’s S_A where A is the set of all of Alice’s components) and another one is Bob himself (that’s S_∅, where ∅ is the empty set). Conclusion: By adding a monitoring system to our Alice and Bob pair, we have created a vast number of new equally conscious systems: 2^n − 2 of them!
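
Here is a minimal sketch of the component-mixing point in code. The particular update rule, the tiny size n = 3, and the helper names are illustrative assumptions; since Alice and Bob start in the same state and evolve deterministically on the same inputs, every mixed system S_X traces out exactly the same component values as Alice does:

```python
from itertools import product

def step(state, inp):
    """Toy deterministic update: each bit becomes the XOR of itself, its
    right neighbour, and the shared external input bit (illustrative only)."""
    n = len(state)
    return tuple(state[i] ^ state[(i + 1) % n] ^ inp for i in range(n))

n = 3
initial = (0, 1, 0)
inputs = [1, 0, 1, 1, 0]          # the same input stream fed to Alice and Bob

# Run Alice and Bob (an exact duplicate) in parallel.
alice, bob = initial, initial
alice_traj, bob_traj = [alice], [bob]
for inp in inputs:
    alice, bob = step(alice, inp), step(bob, inp)
    alice_traj.append(alice)
    bob_traj.append(bob)

# For every subset X of Alice's components, S_X takes X's components from
# Alice and the remaining components from Bob. Its trajectory of component
# values is identical to Alice's (and Bob's), for all 2^n choices of X.
for X in product([0, 1], repeat=n):            # X as a membership vector
    sx_traj = [tuple(a[i] if X[i] else b[i] for i in range(n))
               for a, b in zip(alice_traj, bob_traj)]
    assert sx_traj == alice_traj == bob_traj
print(f"all {2 ** n} mixed systems S_X share one trajectory")
```

The code only exhibits the identity of behavior; the claim that the Φ values of the S_X systems are therefore equal is the argument in the text.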

The ethical consequences are very weird. Suppose that Alice has some large number of components, say 10^11 (that’s how many neurons we have). We duplicate Alice to create Bob. We’ve doubled the number of beings with whatever interests Alice had. And then we add a dumb monitoring system that pulls the plug given a deviation between them. Suddenly we have created 2^(10^11) − 2 systems with the same level of consciousness. Suddenly, the moral consideration owed to the Alice/Bob line of consciousness vastly outweighs everything else.

So both functionalism and Integrated Information Theory have trouble with our duplication story.

Thursday, June 27, 2024

Improving the Epicurean argument for the harmlessness of death

The famous Epicurean argument that death (considered as leading to nonexistence) is not a harm is that death doesn’t harm one when one is alive and it doesn’t harm one when one is dead, since the nonexistent cannot be harmed.

However, the thesis that the nonexistent cannot be harmed is questionable: posthumous infamy seems to be a harm.

But there’s a neat way to fix this gap in the Epicurean argument. Suppose Bob lives 30 years in an ordinary world, and Alice lives a very similar 30 years, except that in her world time started with her existence and ended with her death. Thus, literally, Alice is always alive—she is alive at every time. But notice that the fact that the existence of everything else ends with Alice does not make Alice any better off than Bob! Thus, if death is a harm to Bob, it is a harm to Alice. But even if it is possible for the nonexistent to be harmed, Alice cannot be harmed at a time at which she doesn’t exist—because there is no time at which Alice doesn’t exist.

Hence, we can run a version of the Epicurean argument without the assumption that the nonexistent cannot be harmed.

I am inclined to think that the only satisfactory way out of the argument, especially in the case of Alice, is to adopt eternalism and say that death is a harm without being a harm at any particular time. What is a harm to Alice is that her life has an untimely shortness to it—a fact that is not tied to any particular time.

Tuesday, June 25, 2024

Infinite evil

Alice and Bob are both bad people, and both believe in magic. Bob believes that he lives in an infinite universe, with infinitely many sentient beings. Alice thinks all the life there is is life on earth. They each perform a spell intended to cause severe pain to all sentient beings other than themselves.

There is a sense in which Bob does something infinitely worse than Alice: he tries to cause severe pain to infinitely many beings, while Alice is only trying to harm finitely many beings.

It is hard to judge Bob as an infinitely worse person than Alice, because we presume that if Alice thought that there were infinitely many sentient beings, she would have done as Bob did.

But even if we do not judge Bob as an infinitely worse person, shouldn’t we judge his action as infinitely worse? Yet even that doesn’t seem right. And neither seems to deserve that much more punishment than a sadistic dictator who tries to infect “mere millions” with a painful disease.

Could it be that punishment maxes out at some point?

Using as a mere means

Carl is an inventor and Davita works for a competing company. They are stuck on a deserted island for a week. Carl informs Davita about something he has just invented. Davita is perfectly honest and if questioned in a court of law will testify to what Carl said. In announcing it to Davita, according to the patent laws of their country, Carl establishes the priority of his invention. Davita does not want to help a competitor establish priority. She does not consent to being used in this way. But Carl has no one else to tell about his invention and thereby establish priority.

Carl has used Davita. In fact, he has used her body, by vibrating her eardrums in order to convey to her information that she rationally does not want to hear. But I am inclined—though not extremely strongly—to think that Carl has acted permissibly. It is an important feature of human sociality that we be permitted to communicate messages that our interlocutor does not want to hear, though there are some exceptions, such as facts that will traumatize us, or that violate the privacy of someone else, or that are selected to be misleading, etc. But it is hard to see that Carl’s action falls under some such exception.

Does Carl use Davita as a mere means in the Kantian sense? I think so. Davita does not consent. She is rational in refusing to consent.

I am inclined to conclude that Kant is simply wrong about a blanket prohibition on using others as mere means.

But there still are cases where such a prohibition stands. For instance, in sexual contexts. So I think the prohibition on using others as mere means depends on substantive features of the situation.

All that said, I am not completely sure about the case. Carl does seem sneaky. If I were Davita, I would be annoyed with him. But I don’t think I could morally object to what he did. It would be like the annoyance one has with an opponent in a game who exploits one’s weakness.

Monday, June 24, 2024

Another slit rectangle IIT system

One more observation on Integrated Information Theory (IIT), in Aaronson’s simplified formulation.

Let R_{M,N} be a wide rectangular grid of points (x,y) with x and y integers such that 0 ≤ x < M and 0 ≤ y < N. Suppose M ≫ 4N and M is divisible by four. Let R_{M,N,t} be the grid R_{M,N} with all points with coordinates (M/4,y) where y ≥ tN removed. This is a grid with a “bottleneck” at x = M/4.

Let S_{M,N,t} be a system with a binary cell at each coordinate of R_{M,N,t} evolving according to the rule that at the next time step, each cell’s value changes to the xor of the up-to-four neighboring cells’ values (near the boundaries and the slit, the count will be less than four).
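
For concreteness, here is a minimal sketch of the grid and its update rule as just described (the function names and the choice to store cell values in a dictionary are mine):

```python
def slit_grid(M, N, t):
    """Cells of R_{M,N,t}: points (x, y) with 0 <= x < M, 0 <= y < N,
    minus the points (M/4, y) with y >= t*N (the slit at the bottleneck)."""
    cut = M // 4
    return {(x, y) for x in range(M) for y in range(N)
            if not (x == cut and y >= t * N)}

def step(values, cells):
    """One step of S_{M,N,t}: each cell's new value is the XOR of its
    (up to four) neighbouring cells' current values."""
    new = {}
    for (x, y) in cells:
        v = 0
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cells:
                v ^= values[nb]
        new[(x, y)] = v
    return new

# Example: a small instance with everything initialized to zero.
cells = slit_grid(M=32, N=8, t=0.5)
values = {c: 0 for c in cells}
values = step(values, cells)
```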

My intuitions about IIT say that the measure of integrated information Φ(S_{M,N,t}) will be equal to exactly 2N bits when t = 1, and will stay at 2N bits as we decrease t, until t is below or around 1/4, at which point it will jump to 2 ceil(tN) bits. This shows two problems with this version of IIT. First, as we cut a small slit in the rectangle and lengthen it, we should always be decreasing the amount of integrated information in the system—not just suddenly when the slit reaches around 3/4 of the width of the rectangle. Second, we would expect the amount of integrated information to vary much more continuously rather than suddenly jump from 2N to about N/2 bits at around t = 1/4.

The problem here is the rather gerrymandered character in which IIT minimizes one quantity to generate an optimal decomposition of the system, and then defines the measure of integrated information using another quantity.

Specifically, we calculate Φ by finding a cut of the system into two subsystems A and B that minimizes Φ(A,B)/min (|A|,|B|), and intuitively, there are two types of candidates for an optimal cut if M ≫ 4N:

  • cut the grid around x = M/2 into two equally sized portions using a cut that snips each horizontal line at exactly one cell (a vertical line is the most obvious option, but a diagonal cut will give the same Φ(A,B) value); the Φ(A,B) value is twice the vertical length of the cut, namely 2N bits

  • cut the grid around the bottleneck into two portions, A and B, where A contains slightly more than a quarter of the cells in the system using a cut that follows the same rule as above: more precisely, we start the cut at the top of the bottleneck, and cut diagonally down and to the right (the result is that A is an (M/4) by N rectangle together with an isosceles triangle with equal sides of size tN; the triangle’s area is swamped by the rectangle’s area because M ≫ 4N); the Φ(A,B) value is (maybe modulo an off-by-one error) twice the vertical length of the cut, namely 2 ceil(tN) bits.

Assuming these intuitions are right, when t is close to 1, the first type of cut results in a smaller Φ(A,B)/min (|A|,|B|) value, but when t is below or around 1/4, the second type of cut results in a smaller value.

Friday, June 21, 2024

Conjectures about a system in the context of Integrated Information Theory

I should really be done with Integrated Information Theory (IIT), in Aaronson’s simplified formulation, but I noticed a rather interesting difficulty.

In my previous post on the subject, I noticed that a double grid system where there are two grids stacked on top of one another, with the bottom grid consisting of inputs and the upper grid of outputs, and each upper value being the logical OR of the (up to) five neighboring input values will be conscious according to IIT if all the values are zero and the grid is large enough.

In this post, I am going to give some conjectures about the mathematics rather than even a proof sketch. But I think the conjectures are pretty plausible and, if true, they show something fishy about IIT’s measure of integrated information.

Consider our dual grid system, except that now the grids are roughly rectangular, with a length of M along the x-axis and a width of N along the y-axis (and the stacking along the z-axis). The exceptions to rectangularity are the following:

  • at x-coordinates M/4 − 1 and M/4 the width is N/8 instead of N

  • at x-coordinates M/2 − 1 and M/2 the width is N/10.

In other words, at two x-coordinate areas, the grids have bottlenecks, of slightly different sizes. We suppose M is significantly larger than N, and N is very, very large (say, 10^15).

Let A_k be the components on the grids with x-coordinates less than k and let B_k be the remaining components. I suspect (with a lot of confidence) that the optimal choice for a partition {A, B} that minimizes the “modified Φ value” Φ(A,B)/min(|A|,|B|) will be pretty close to {A_k, B_k} where k is in one of the bottlenecks. Thus to estimate Φ, we need only look at the Φ and modified Φ values for {A_{M/4}, B_{M/4}} and {A_{M/2}, B_{M/2}}. Note that if k is M/4 or M/2, then min(|A|,|B|) is approximately 2MN/4 and 2MN/2, respectively, since there are two grids of components.

I suspect (again with a lot of confidence) that Φ(A_k,B_k) will be approximately proportional to the width of the grid around coordinate k. Thus, Φ(A_{M/4},B_{M/4})/min(|A_{M/4}|,|B_{M/4}|) will be approximately proportional to (N/8)/(2NM/4) = 0.25/M while Φ(A_{M/2},B_{M/2})/min(|A_{M/2}|,|B_{M/2}|) will be approximately proportional to (N/10)/(2NM/2) = 0.1/M.

Moreover, I conjecture that the optimal partition will be close to {A_k, B_k} for some k in one of the bottlenecks. If so, then our best choice will be close to {A_{M/2}, B_{M/2}}, and it will yield a Φ value approximately proportional to N/10.

Now modify the system by taking each output component at an x-coordinate less than M/4 and putting four more output components beside the original output component, each with the very same value as the original output component.

I strongly suspect that the optimal partition will again be obtained by cutting the system at one of the two bottlenecks. The Φ values at the M/4 and M/2 bottlenecks will be unchanged—mere duplication of outputs does not affect information content—but the modified Φ values (obtained by dividing Φ(A,B) by min(|A|,|B|)) will be (N/8)/(6NM/4) ≈ 0.083/M and (N/10)/(2NM/2) = 0.1/M. Thus the optimal choice will be to partition the system at the M/4 bottleneck. This will yield a Φ value approximately proportional to N/8. Which is bigger than N/10.
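
The back-of-the-envelope comparison can be checked mechanically; here is a tiny sketch plugging in arbitrary placeholder values of M and N (the specific numbers matter only in that M is much larger than N):

```python
M, N = 10_000, 1_000   # placeholder sizes, M much larger than N

# modified Phi = Phi(A,B) / min(|A|,|B|) at each bottleneck
before = {"M/4": (N / 8) / (2 * N * M / 4),    # ~0.25/M
          "M/2": (N / 10) / (2 * N * M / 2)}   # ~0.1/M
after  = {"M/4": (N / 8) / (6 * N * M / 4),    # ~0.083/M (left-quarter outputs quintupled)
          "M/2": (N / 10) / (2 * N * M / 2)}   # ~0.1/M (unchanged)

print(min(before, key=before.get))  # M/2: original system cuts at the M/2 bottleneck
print(min(after, key=after.get))    # M/4: duplicated-output system cuts at the M/4 bottleneck
```

On these approximations, duplicating the left-quarter outputs moves the optimal cut from the M/2 bottleneck to the M/4 bottleneck, and hence Φ from roughly N/10 to roughly N/8, as claimed above.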

For concreteness, let’s now imagine that each output is an LED. We now see that if we replace some of the LEDs by five LEDs (namely, the ones in the left-hand quarter of the system), we increase the amount of integrated information from N/10 to N/8. This has got to be wrong. Simply by duplicating LEDs we don’t add anything to the information content. And we certainly don’t make a system more conscious just by lighting up a portion of it with additional LEDs.

Notice, too, that IIT has a special proviso: if one system is a part of another with a higher degree of consciousness, the part system has no consciousness. So now imagine that a Φ value proportional to N/10 is sufficiently large for significant consciousness, so our original system, without extra output LEDs, is conscious. Now, beside the left quarter of the LEDs, add the quadruples of new LEDs that simply duplicate the original LED values (they might not even be electrically connected to the original system: they might sense whether the original LED is on, and light up if so). According to IIT, then, the new system is more conscious than the old—and the old system has had its consciousness destroyed, simply by adding enough duplicates of its LEDs. This seems wrong.

Of course, my conjectures and back-of-the-envelope calculations could be false.

Thursday, June 20, 2024

Panomnipsychism

We have good empirical ways of determining the presence of a significant amount of gold and we also have good empirical ways of determining the absence of a significant amount of gold.

Not so with consciousness. While I can tell that some chunks of matter exhibit significant consciousness (especially, the chunk that I am made of), to tell that a chunk of matter—say, a rock or a tree—does not exhibit significant consciousness relies very heavily on pre-theoretical intuition.

This makes it very hard to study consciousness scientifically. In science, we want to come up with conditions that help us explain why a phenomenon occurs where it occurs and doesn’t occur where it doesn’t occur. But if we can’t observe where consciousness does not occur, things are apt to get very hard.

Consider panomnipsychism: every chunk of matter exhibits every possible conscious state at every moment of its existence. This explains all our observations of consciousness. And since we don’t observe any absences of consciousness, panomnipsychism is not refutable by observation. Moreover, panomnipsychism is much simpler than any competing theory, since competing theories will have to give nontrivial psychophysical laws that say what conscious states are correlated with what physical states. It’s just that panomnipsychism doesn’t fit with our intuitions that rocks and trees aren’t conscious.

One might object that panomnipsychism incorrectly predicts that I am right now having an experience of hang gliding, and I can tell that I am not having any such experience. Not so! Panomnipsychism does predict that the chunk of matter making me up currently is having an experience of hang-gliding-while-not-writing-a-post, and that this chunk is also having an experience of writing-a-post-while-not-hang-gliding. But these experiences are not unified with each other on panomnipsychism: they are separate strands of conscious experience attached to a single chunk of matter. My observation of writing without gliding is among the predictions of panomnipsychism.

It is tempting to say that panomnipsychism violates Ockham’s razor. Whether it does or does not will depend on whether we understand Ockham’s razor in terms of theoretical complexity or in terms of the number of entities (such as acts of consciousness). If we understand it in terms of theoretical complexity, then as noted panomnipsychism beats its competitors. But if we understand Ockham’s razor in terms of the number of entities, then we should reject Ockham’s razor. For we shouldn’t have a general preference for theories with fewer entities. For instance, the argument that the world will soon come to an end because otherwise there are more human beings in spacetime is surely a bad one.

I think there is nothing wrong with relying on intuition, including our intuitions about the absence of consciousness. But it is interesting to note how much we need to.

Wednesday, June 19, 2024

Entropy

If p is a discrete probability measure, then the Shannon entropy of p is H(p) = −∑_x p({x}) log p({x}). I’ve never had any intuitive feeling for Shannon entropy until I noticed the well-known fact that H(p) is the expected value of the logarithmic inaccuracy score of p by the lights of p. Since I’ve spent a long time thinking about inaccuracy scores, I now get some intuitions about entropy for free.

Entropy is a measure of the randomness of p. But now I am thinking that there are other measures: For any strictly proper inaccuracy scoring rule s, we can take E_p s(p) to be some sort of a measure of the randomness of p. These won’t have the nice connections with information theory, though.
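
A quick numerical check of the identity, and of the more general idea in the last paragraph. The sample distribution is arbitrary, and the Brier score stands in for an arbitrary strictly proper rule:

```python
from math import log2

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # an arbitrary discrete distribution

# Shannon entropy: H(p) = -sum_x p({x}) log p({x})
H = -sum(q * log2(q) for q in p.values())

# Logarithmic inaccuracy score of p if outcome x obtains: -log p({x}).
# Its expected value by the lights of p is exactly H(p).
expected_log_score = sum(p[x] * -log2(p[x]) for x in p)
assert abs(H - expected_log_score) < 1e-12

# Any strictly proper score s yields an analogous "randomness" measure E_p s(p),
# e.g. the Brier score: s(q, x) = sum_y (q(y) - [y == x])^2.
brier = lambda dist, x: sum((dist[y] - (y == x)) ** 2 for y in dist)
expected_brier = sum(p[x] * brier(p, x) for x in p)

print(H, expected_log_score, expected_brier)
```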

A bit more fun with Integrated Information Theory

I hope this is my last post for a while on Integrated Information Theory (IIT), in Aaronson’s simplified formulation.

One of the fun and well-known facts is that if you have an impractically large square two-dimensional grid of interconnected logic gates (presumably with some constant time-delay in each gate between inputs and outputs to prevent race conditions) in a fixed point (i.e., nothing is changing), the result can still have a degree of integrated information proportional to the square root of the number of gates. A particular known case is where you have a very large grid of XOR gates, with each gate’s output being connected to the inputs of its neighbors, all of them at 0.

That said, that kind of a grid does give off the “every part affects the rest of the system” vibe that IIT says consciousness consists in, so objecting that this grid isn’t conscious doesn’t impress IIT aficionados. Moreover, such a grid is more complex than it seems at first sight, because to avoid race conditions while maintaining the ostensible state transitions a practical implementation would require some kind of synchronization between the gates.

Today I want to note that there seems to be an even less intuitive conscious system according to IIT. Imagine a large N by N grid of binary data, “the inputs”, and then another large N by N grid of binary data, “the outputs”, aligned above the first grid. Each value on the output grid is then the logical OR of the input value under it with the four (or three for edge points and two for corner points) neighbors of that input value. And all of the input grid is at zero.

This does not give off any “every part affects the rest of the system” vibe. And getting consciousness out of zeroed OR gates with no feedback system seems really absurd.

To see the alleged consciousness, recall the IIT measure Φ of integrated information, which is supposed to be proportional to the amount of consciousness. For any partition of the components into two nonempty subsets A and B, we compute the “effective information” EI(A→B) that A provides for B. This is the entropy of the new values of the components in B given the old values of the components in A while randomizing over all possible values of the components in A. Let Φ(A,B) = EI(A→B) + EI(B→A) be the two-way effective information in the partition. Then choose A and B to minimize Φ(A,B)/min(|A|,|B|), and let the system’s Φ be Φ(A,B) for that choice of A and B. Aaronson says it’s not clear what to do if the minimum is not unique. To be conservative (i.e., count fewer systems as conscious), if there are multiple pairs that minimize Φ(A,B)/min(|A|,|B|), I’ll assume we choose one that also minimizes Φ(A,B).
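
Here is a rough brute-force sketch of this recipe for a tiny deterministic boolean system. I read EI(A→B) as: hold B’s components at their actual current values, replace A’s components by uniformly random bits, apply one step, and take the entropy of the resulting distribution over B’s new values. That reading, the tie-breaking, and the toy system at the end (a 2×2 version of the double OR grid, far too small to show the effect argued for below) are my assumptions, not the official formulation:

```python
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def ei(step, state, A, B):
    """EI(A -> B): entropy of B's next values when A's current components are
    uniformly randomized and B's components are held at their actual values."""
    dist = {}
    for bits in product([0, 1], repeat=len(A)):
        s = list(state)
        for i, b in zip(A, bits):
            s[i] = b
        nxt = step(tuple(s))
        key = tuple(nxt[i] for i in B)
        dist[key] = dist.get(key, 0.0) + 1.0 / 2 ** len(A)
    return entropy(dist)

def phi(step, state):
    """Minimize Phi(A,B)/min(|A|,|B|) over all bipartitions (ties broken by
    the smaller Phi(A,B)) and return the winning Phi(A,B) and partition."""
    n = len(state)
    best = None
    for mask in range(1, 2 ** n - 1):          # all nontrivial bipartitions
        A = [i for i in range(n) if mask & (1 << i)]
        B = [i for i in range(n) if not mask & (1 << i)]
        if 0 not in A:                          # count each {A, B} only once
            continue
        p = ei(step, state, A, B) + ei(step, state, B, A)
        score = p / min(len(A), len(B))
        if best is None or (score, p) < best[:2]:
            best = (score, p, A, B)
    return best[1], best[2], best[3]

def or_grid_step(s):
    """Toy 2x2 double grid: components 0-3 are the inputs (which never change),
    components 4-7 are the outputs; each output is the OR of the input under it
    and that input's grid neighbours."""
    pos = [(0, 0), (0, 1), (1, 0), (1, 1)]
    idx = {q: i for i, q in enumerate(pos)}
    new = list(s[:4])                           # inputs are unchanged
    for (x, y) in pos:
        v = s[idx[(x, y)]]
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in idx:
                v |= s[idx[nb]]
        new.append(v)
    return tuple(new)

print(phi(or_grid_step, (0,) * 8))              # Phi, A, B for the all-zero state
```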

Let’s now do a handwavy proof that Φ for our pair of grids when everything is at zero is at least proportional to N, and hence for large N we have consciousness according to IIT. Let’s say A and B minimize Φ(A,B)/min (|A|,|B|). Let C be the set of all inputs x such that x is connected to both an output in A and an output in B.

Suppose first that C is nonempty. Then |C| is approximately proportional to the size of the boundary of the set of outputs in A, or of the set of outputs in B, which will be at least proportional to the square root of the smaller of |A| and |B|. Moreover, EI(A→B) + EI(B→A) will be at least proportional to |C| given how the dependencies are arranged and given that all the values are at zero, so that if you have any unknowns among the values that an output value depends on, then the output value is also unknown. So, Φ(A,B)/min(|A|,|B|) will be at least proportional to (min(|A|,|B|))^(−1/2). Thus if A and B minimize Φ(A,B)/min(|A|,|B|), then A and B will have to be both of the same order of magnitude, namely N^2, since the greater the disparity, the bigger (min(|A|,|B|))^(−1/2) will be. In that case, Φ(A,B)/min(|A|,|B|) will be at least proportional to 1/N, and so Φ(A,B) will be at least proportional to (1/N) ⋅ N^2 = N.

Now suppose C is empty. Then one of A and B contains all the outputs. Let’s say it’s A. Then B consists solely of inputs, so EI(A→B) = 0, and Φ(A,B) = EI(B→A) will be at least proportional to the size of B. Then for large N, Φ(A,B)/min(|A|,|B|) will be bigger than if A and B instead contained respectively the left and right halves of the grids, since then Φ(A,B) would be at most proportional to the size of their boundary, i.e., to N, and hence Φ(A,B)/min(|A|,|B|) would be at most proportional to 1/N. So the case where C is empty cannot be a case where A and B are optimal, at least if N is large.

Sunday, June 16, 2024

Integrated Information Theory doesn't seem to get integrated information right

I’m still thinking about Integrated Information Theory (IIT), in Aaronson’s simplified formulation. Aaronson’s famous criticisms show pretty convincingly that IIT fails to correctly characterize consciousness: simple but large systems of unchanging logic gates end up having human-level consciousness on IIT.

However, IIT attempts to do two things: (a) provide an account of what it is for a system to have integrated information in terms of a measure Φ, and (b) equate conscious systems with ones that have integrated information.

In this post, I want to offer some evidence that IIT fails at (a). If IIT fails at (a), then it opens up the option that notwithstanding the counterexamples, IIT gets (b) right. I am dubious of this option. For one, the family of examples in this post suggests that IIT’s account of integrated information is too restrictive, and making it less restrictive will only make it more subject to Aaronson-style counterexample. For another, I have a conclusive reason to think that IIT is false: God is conscious but has no parts, whereas IIT requires all conscious systems to have parts.

On to my argument against (a). IIT implies that a system lacks integrated information provided that it can be subdivided into two subsystems of roughly equal size such that each subsystem’s evolution over the next time step is predictable on the basis of that subsystem alone, as measured by information-theoretic entropy, i.e., only a relatively small number of additional bits of information need to be added to perfectly predict the subsystem’s evolution.

The family of systems of interest to me are what I will call “low dependency input-output (ldio) systems”. In these systems, the components can be partitioned into input components and output components. Input component values do not change. Output component values depend deterministically on the input component values. Moreover, each output component value depends only on a small number of input components. It is a little surprising that any ldio system counts as having integrated information in light of the fact that the input components do not depend on output components, but there appear to be examples, even if details of proof have not yet been given. Aaronson is confident that low density parity check codes are an example. Another example is two large grids of equal size where the second (output) grid’s values consist of applying a step of an appropriate cellular automaton to the first (input) grid. For instance, one could put a one at an output grid point provided that the neighboring points on the input grid have an odd number of ones, and otherwise put a zero.

Now suppose we have an ldio system with a high degree of integrated information as measured by IIT’s Φ measure. Then we can easily turn it into a system with a much, much lower Φ using a trick. Instead of having the system update all its outputs at once, have the system update the outputs one-by-one. To do this, add to the system a small number of binary components that hold an “address” for the “current” output component—say, an encoding of a pair of coordinates if the system is a grid. Then at each time step have the system update only the specific output indicated by the address, and also have the system advance the address to the address of the next output component, wrapping around to the first output component once done with all of them. We could imagine that these steps are performed really, really fast, so in the blink of an eye we have updated all the outputs—but not all at once.
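
A toy sketch of the sequentialization trick, with an arbitrary parity rule and dependency pattern standing in for the original ldio (the class name, the rule, and the example sizes are all illustrative assumptions):

```python
class SequentialLdio:
    """Inputs never change; each step updates only the single output named by
    the address register, then advances the address (wrapping around)."""

    def __init__(self, inputs, deps, rule):
        self.inputs = list(inputs)        # input components (fixed)
        self.deps = deps                  # deps[j] = input indices that output j reads
        self.rule = rule                  # rule(list of input bits) -> new output bit
        self.outputs = [0] * len(deps)    # output components
        self.address = 0                  # which output gets updated next

    def step(self):
        j = self.address
        self.outputs[j] = self.rule([self.inputs[i] for i in self.deps[j]])
        self.address = (j + 1) % len(self.outputs)

# Example: each output is the parity (XOR) of three cyclically adjacent inputs.
inputs = [0, 1, 1, 0, 1, 0, 0, 1]
deps = [[i, (i + 1) % 8, (i + 2) % 8] for i in range(8)]
system = SequentialLdio(inputs, deps, rule=lambda bits: sum(bits) % 2)

for _ in range(len(deps)):    # after N steps every output has been refreshed once,
    system.step()             # matching one simultaneous update of the original ldio
print(system.outputs)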

This sequentialized version of the ldio is still an ldio: each output value depends on a small number of input values, plus the relatively small number of bits needed to specify the address (log₂N, where N is the number of outputs). But the Φ value is apt to be immensely reduced compared to the original system. For divide up the sequentialized version into any two subsystems of roughly equal size. The outputs (if any) in each subsystem can be determined by specifying the current address (log₂N bits) plus a small number of bits for the values of the inputs that the currently addressed output depends on. Thus each subsystem has a low number of bits of entropy when we randomize the values of the other subsystem, and hence the Φ measure will be low. While, say, the original system’s Φ measure is of the order Nᵖ for some power p, the new system’s Φ measure will be at most of the order of log₂N plus the maximum number of inputs that an output depends on.

But the sequentialized system will have the same time evolution as the original simultaneous-processing system as long as we look at the output of the sequentialized system after N steps, where N is the number of outputs. Intuitively, the sequentialized system has a high degree of integrated information if and only if the original system does (and is conscious if and only if the original system is).

I conclude that IIT has failed to correctly characterize integrated information.

There is a simple fix. Given a system S, there is a system Sk with the same components but each of whose steps consists in k steps of the system S. We could say that a system S has integrated information provided that there is some k such that Φ(Sk) is high. (We might even define the measure of integrated information as supₖΦ(Sk).) I worry that this move will make it too easy to have a high degree of integrated information. Many physical systems are highly predictable over a short period of time but become highly unpredictable over a long period of time, with results being highly sensitive to small variation in most of the initial values: think of weather systems. I am fairly confident that if we fix IIT as suggested, then planetary weather systems will end up having super-human levels of consciousness.
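For what it is worth, the proposed fix is easy to state in the same toy framework as above—Sk just composes the one-step transition with itself k times (the function name is mine).

    def k_step(step, k):
        # Transition function of Sk: k consecutive steps of the original system S.
        def stepped(state):
            for _ in range(k):
                state = step(state)
            return state
        return stepped

    # On the proposed fix, integrated information would be something like
    # the sup over k of phi_proxy(n_bits, k_step(step, k)).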

Tuesday, June 11, 2024

A very simple counterexample to Integrated Information Theory?

I’ve been thinking a bit about Integrated Information Theory (IIT) as a physicalist-friendly alternative to functionalism as an account of consciousness.

The basic idea of IIT is that we measure the amount of consciousness in a system by subdividing the system into pairs of subsystems and calculating how well one can predict the next state of each of the two subsystems without knowing the state of the other. If there is a partition that lets you make the predictions well, then the system is considered reducible, with low integrated information, and hence low consciousness. So you look for the best-case subdivision—the one where you can make the best predictions, as measured by Shannon entropy with a certain normalization—and say that the amount Φ of “integrated information” in the system varies inversely with the quality of these best predictions. The amount of consciousness in the system then corresponds to the amount Φ of integrated information.

Aaronson gives a simple mathematical framework and what sure look like counterexamples: systems that intuitively don’t appear to be mind-like and yet have a high Φ value. Surprisingly, though, Tononi (the main person behind IIT) has responded by embracing these counterexamples as cases of consciousness.

In this post, I want to offer a counterexample with a rather different structure. My counterexample has an advantage and a disadvantage with respect to Aaronson’s. The advantage is that it is a lot harder to embrace my counterexample as an example of consciousness. The disadvantage is that my example can be avoided by an easy tweak to the definition of Φ.

It is even possible that my tweak is already incorporated in the official IIT 4.0. I am right now only working with Aaronson’s perhaps simplified framework (for one, his framework depends on a deterministic transition function), because the official one is difficult for me to follow. And it is also possible that I am just missing something obvious and making some mistake. Maybe a reader will point that out to me.

The idea of my example is very simple. Imagine a system consisting of two components, each of which has N possible states. At each time step, the two components swap states. There is now only one decomposition of the system into two subsystems, which makes things much simpler. And note that each subsystem’s state at time n has no predictive power for its own state at n + 1, since its state at n + 1 is just the other subsystem’s state at n. The Shannon entropies corresponding to the best predictions are going to be log₂N, and so Φ of the system is 2log₂N. By making N arbitrarily large, we can make Φ arbitrarily large. In fact, if we have an analog system with infinitely many states, then Φ is infinite.
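A brute-force check of that arithmetic, in the same informal spirit as before (the only balanced bipartition is component 1 versus component 2; the helper function is mine):

    from collections import defaultdict
    from math import log2

    def swap_phi(N):
        # Sum, over the two components, of the entropy of that component's next
        # state given its own current state, with the other component uniform.
        def cond_entropy(which):
            h = 0.0
            for own in range(N):
                counts = defaultdict(int)
                for other in range(N):
                    state = (own, other) if which == 0 else (other, own)
                    nxt = (state[1], state[0])          # the swap dynamics
                    counts[nxt[which]] += 1
                h += -sum((c / N) * log2(c / N) for c in counts.values()) / N
            return h
        return cond_entropy(0) + cond_entropy(1)

    print(swap_phi(64))   # 12.0, i.e. 2*log2(64); grows without bound in N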

Advantage over Aaronson’s counterexamples: There is nothing in the least consciousness-like in this setup. We are just endlessly swapping states between two components. That’s not consciousness. Imagine the components are hard drives and we just endlessly swap the data between them. To make it even more vivid, suppose the two hard drives have the same data, so nothing actually changes in the swaps!

Disadvantage: IIT can escape the problem by modifying the measure Φ of integrated information in some way in the special case where the components are non-binary. Aaronson’s counterexamples use binary components, so they are unaffected. Here are three such tweaks. (i) Just divide by the logarithm of the maximum number of states in a component (this seems ad hoc). (ii) Restrict the theory to systems with binary components, and therefore require that any component with more than two possible states be reinterpreted as a collection of binary components encoding the non-binary state (but which binarization should one choose?). (iii) Define Φ of a non-binary system as the minimum of the Φ values over all possible binarizations. Either (i) or (iii) kills my counterexample.

Monday, June 10, 2024

Computation

I’ve been imagining a very slow embodiment of computation. You have some abstract computer program designed for a finite-time finite-space subset of a Turing machine. And now you have a big tank of black and white paint that is constantly being stirred in a deterministic way, but one that is some ways into the ergodic hierarchy: it’s weakly mixing. If you leave the tank for eternity, every so often the paint will make some seemingly meaningful patterns. In particular, on very rare occasions one finds in the tank an artistic drawing of the next step of the Turing machine’s functioning while executing that program—it will be a drawing of a tape, a head, and various symbols on the tape. Of course, in between these steps will be millennia of garbage.

In fact, it turns out that (with probability one) there will be some specific number n of years such that the correct first step of the Turing machine’s functioning will be drawn in exactly n years, the correct second step in exactly 2n years, the correct third one in exactly 3n years, and so on (remembering that there is only a finite number of steps, since we are working with a finite-space subset). (Technically, this is because weak mixing implies multiple weak mixing.) Moreover, each step causally depends on the preceding one. Will this be computation? Will the tank of paint be running the program in this process?

Intuitively, no. For although we do have causal connections between the state in n years and the next state in 2n years and so on, those connections are too counterfactually fragile. Let’s say you took the artistic drawing of the Turing machine in the tank at the first step (namely in n years) and you perturbed some of the paint particles in a way that makes no visible difference to the visual representation. Then probably by 2n years things would be totally different from what they should be. And if you changed the drawing to a drawing of a different Turing machine state, the every-n-years evolution would also change.

So it seems that for computation we need some counterfactual robustness. In a real computer, physical states define logical states in an infinity-to-one way (infinitely many “small” physical voltages count as a logical zero, and infinitely many “larger” physical voltages count as a logical one). We want to make sure that if the physical states were different but not sufficiently different to change the logical states, this would not be likely to affect the logical states in the future. And if the physical states were different enough to change the logical states, then the subsequent evolution would likely change in an orderly way. Not so in the paint system.
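A trivial illustration of that infinity-to-one mapping (the 1.5V threshold and the specific voltages are made up for the example):

    def logical_state(voltages, threshold=1.5):
        # Infinitely many analog voltage patterns map to the same tuple of bits.
        return tuple(0 if v < threshold else 1 for v in voltages)

    # Perturbations that stay on the same side of the threshold leave the
    # logical state—and hence the computation—unchanged:
    assert logical_state((0.2, 2.9)) == logical_state((0.3, 3.1)) == (0, 1)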

But the counterfactual robustness is tricky. Imagine a Frankfurt-style counterfactual intervener who is watching your computer while your computer is computing ten thousand digits of π. The observer has a very precise plan for all the analog physical states of your computer during the computation, and if there is the least deviation, the observer will blow up the computer. Fortunately, there is no deviation. But now with the intervener in place, there is no counterfactual robustness. So it seems the computation has been destroyed.

Maybe it’s fine to say it has been destroyed. The question of whether a particular physical system is actually running a particular program seems like a purely verbal question.

Unless consciousness is defined by computation. For whether a system is conscious, or at least conscious in a particular specific way, is not a purely verbal question. If consciousness is defined by computation, we need a mapping between physical states and logical computational states, and what that mapping is had better not be a purely verbal question.

Tuesday, June 4, 2024

The Epicurean argument on death

The Epicurean argument is that death considered as cessation of existence does us no harm, since it doesn’t harm us when we are alive (as we are not dead then) and it doesn’t harm us when we are dead (since we don’t exist then to be harmed).

Consider a parallel argument: It is not a harm to occupy too little space—i.e., to be too small. For the harm of occupying too little space doesn’t occur where we exist (since that is space we occupy) and it doesn’t occur where we don’t exist (since we’re not there). The obvious response is that if I am too small, then the whole of me is harmed by not occupying more space. Similarly, then, if death is cessation of existence, and I die, then the whole of me is harmed by not occupying more time.

Here’s another case. Suppose that a flourishing life for humans contains at least ten years of conversation while Alice only has five years of conversation over her 80-year span of life. When has Alice been harmed? Nowhen! She obviously isn’t harmed by the lack of conversation during the five years of conversation. But neither is she harmed at any given time during the 75 years that she is not conversing. For if she is harmed by the lack of conversation at any given time during those 75 years, she is harmed by the lack of conversation during all of them—they are all on par, except maybe infancy which I will ignore for simplicity. But she’s only missing five years of conversation, not 75. She isn’t harmed over all of the 75 years.

There are temporal distribution goods, like having at least ten years of conversation, or having a broad variety of experiences, or falling in love at least once. These distribution goods are not located at times—they are goods attached to the whole of the person’s life. And there are distribution bads, which are the opposites of the temporal distribution goods. If death is the cessation of existence, it is one of these.

I wonder, though, whether it is possible for a presentist to believe in temporal distribution goods. Maybe. If not, then that’s too bad for the presentist.

Monday, June 3, 2024

On a generalization of Double Effect

Traditional formulations of the Principle of Double Effect deal with things that are said to have absolute prohibitions against them, like killing the innocent: such things must never be intended, but sometimes may be produced as a side-effect.

Partly to generalize the traditional formulation, and partly to move beyond strict deontology, contemporary thinkers sometimes modify Double Effect to be some principle like:

  1. It is worse, or harder to justify, to intentionally cause harm than to do so merely foreseeably.

While (1) sounds plausible, there is a family of cases where intentionally causing harm is permitted but doing so unintentionally is not. To punish someone, you need to intend a harm (but maybe not an all-things-considered harm) to them. But there are cases where a harsh treatment is only permissible as a punishment. In some cases where someone has committed a serious crime and deserves to be imprisoned, yet the imprisonment is not necessary to protect society (e.g., because the criminal has for other reasons—say, a physical injury—become incapable of repeating the crimes), the imprisonment can be justified as punishment, but not otherwise. In those cases, the harm is permitted only if it is intended.

It still seems that it tends to be the case that intentional harm is harder to justify than merely foreseen harm.

Friday, May 24, 2024

Three or four ways to implement Bayesianism

We tend to imagine a Bayesian agent as starting with some credences, “the ur-priors”, and then updating the credences as the observations come in. It’s as if there was a book of credences in the mind, with credences constantly erased and re-written as the observations come in. When we ask the Bayesian agent for their credence in p, they search through the credence book for p and read off the number written beside it.

In this post, I will assume the ur-priors are “regular”: i.e., everything contingent has a credence strictly between zero and one. I will also assume that observations are always certain.

Still, the above need not be the right model of how Bayesianism is actually implemented. Another way is to have a book of ur-priors in the mind, and an ever-growing mental book of observations. When you ask such a Bayesian agent what their credence in p is, they look, on the spot, at their book of ur-priors and their book of observations, and then calculate the posterior for p.

The second way is not very efficient: you are constantly recalculating, and you need an ever-growing memory store for all the accumulated evidence. If you were making a Bayesian agent in software, the ever-changing credence book would be more efficient.

But here is an interesting way in which the second way would be better. Suppose you came to conclude that some of your ur-priors were stupid, through some kind of an epistemic conversion experience, say. Then you could simply change your ur-priors without rewriting anything else in your mind, and all your posteriors would automatically be computed correctly as needed.

In the first approach, if you had an epistemic conversion, you’d have to go back and reverse-engineer all your priors, and fix them up. Unfortunately, some priors will no longer be recoverable. From your posteriors after conditionalizing on E, you cannot recover your original priors for situations incompatible with E. And yet knowing what these priors were might be relevant to rewriting all your priors, including the ones compatible with E, in light of your conversion experience.

Here is a third way to implement Bayesianism that combines the best of the two approaches. You have a book of ur-priors and a book of current credences. You update the latter in ordinary updates. In case of an epistemic conversion experience, you rewrite your book of ur-priors, conditionalize the new ur-priors on the conjunction of all the propositions that you currently have credence one in, and replace the contents of your credence book with the result.
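Here is a minimal sketch of the three implementations for a toy finite space of worlds. The class names and the representation of propositions as sets of worlds are my own choices, made only to show how the bookkeeping differs.

    class CredenceBookAgent:
        # First way: a single book of credences, rewritten on every update.
        def __init__(self, priors):                  # priors: {world: probability}
            self.credences = dict(priors)

        def update(self, evidence):                  # evidence: a set of worlds
            total = sum(p for w, p in self.credences.items() if w in evidence)
            self.credences = {w: (p / total if w in evidence else 0.0)
                              for w, p in self.credences.items()}

        def credence(self, proposition):             # proposition: a set of worlds
            return sum(p for w, p in self.credences.items() if w in proposition)

    class UrPriorAgent:
        # Second way: a fixed book of ur-priors plus a growing book of
        # observations; posteriors are recomputed on demand.
        def __init__(self, ur_priors):
            self.ur_priors = dict(ur_priors)
            self.observations = []

        def update(self, evidence):
            self.observations.append(evidence)

        def credence(self, proposition):
            live = {w: p for w, p in self.ur_priors.items()
                    if all(w in e for e in self.observations)}
            total = sum(live.values())
            return sum(p for w, p in live.items() if w in proposition) / total

        def convert(self, new_ur_priors):
            # Epistemic conversion: swap the ur-priors; nothing else needs rewriting.
            self.ur_priors = dict(new_ur_priors)

    class HybridAgent(CredenceBookAgent):
        # Third way: ordinary updates go into the credence book, but the ur-priors
        # are kept around so that a conversion can rebuild the book from scratch.
        def __init__(self, ur_priors):
            super().__init__(ur_priors)
            self.ur_priors = dict(ur_priors)

        def convert(self, new_ur_priors):
            certain = {w for w, p in self.credences.items() if p > 0}
            self.ur_priors = dict(new_ur_priors)
            self.credences = dict(new_ur_priors)
            self.update(certain)   # conditionalize on what currently has credence one

Given the post’s assumptions—regular ur-priors and certain observations—the set of worlds with positive credence is exactly the strongest proposition the agent is certain of, which is what the hybrid agent conditionalizes on after a conversion.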

We’re not exactly Bayesian agents. Insofar as we approximate being Bayesian agents, I think we’re most like the agents of the first sort, the ones with one book which is ever rewritten. This makes epistemic conversions more difficult to conduct responsibly.

Perhaps we should try to make ourselves a bit more like Bayesian agents of the third sort by keeping track of our epistemic history—even if we cannot go all the way back to ur-priors. This could be done with a diary.

Thursday, May 23, 2024

A supertasked Sleeping Beauty

One of the unattractive ingredients of the Sleeping Beauty problem is that Beauty gets memory wipes. One might think that normal probabilistic reasoning presupposes no loss of evidence, and weird things happen when evidence is lost. In particular, thirding in Sleeping Beauty is supposed to be a counterexample to Van Fraassen’s reflection principle, that if you know for sure that you will have a rational credence of p, you should already have credence p. But that principle only applies to rational credences, and it has been claimed that forgetting makes one not be rational.

Anyway, it occurred to me that a causal infinitist can manufacture something like a version of Sleeping Beauty with no loss of evidence.

Suppose that:

  • On heads, Beauty is woken up at 8 + 1/n hours for n = 2, 4, 6, ... (i.e., at 8.5 hours or 8:30, at 8.25 hours or 8:15, at 8.1666… hours or 8:10, and so on).

  • On tails, Beauty is woken up at 8 + 1/n hours for n = 1, 2, 3, ... (i.e., at 9:00, 8:30, 8:20, 8:15, 8:12, …).

Each time Beauty is woken up, she remembers infinitely many wakeups. There is no forgetting. Intuitively she has twice as many wakeups on tails, which would suggest that the probability of heads is 1/3. If so, we have a counterexample to the reflection principle with no loss of memory.

Alas, though, the “twice as many” intuition is fishy, given that both infinities have the same cardinality. So we’ve traded the forgetting problem for an infinity problem.

Still, there may be a way of avoiding the infinity problem. Suppose a second independent fair coin is tossed. We then proceed as follows:

  • On heads+heads, Beauty is woken up at 8 + 1/n hours for n = 2, 4, 6, ...

  • On heads+tails, Beauty is woken up at 8 + 1/n hours for n = 1, 3, 5, ...

  • On tails+whatever, Beauty is woken up at 8 + 1/n hours for n = 1, 2, 3, ....

Then when Beauty wakes up, she can engage in standard Bayesian reasoning. She can stipulatively rigidly define t1 to be the current time. Then the probability of her waking up at t1 if the first coin is heads is 1/2, and the probability of her waking up at t1 if the first coin is tails is 1. And so by Bayes, it seems her credence in heads should be 1/3.
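Spelling out that Bayes step (writing H1 for heads on the first coin and W for Beauty’s waking at t1, and using the probabilities just stated):

    P(H_1 \mid W) = \frac{P(W \mid H_1)\,P(H_1)}{P(W \mid H_1)\,P(H_1) + P(W \mid \neg H_1)\,P(\neg H_1)}
                  = \frac{\tfrac12 \cdot \tfrac12}{\tfrac12 \cdot \tfrac12 + 1 \cdot \tfrac12} = \tfrac13.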

There is now neither forgetting nor fishy infinity stuff.

That said, one can specify that the reflection principle only applies if one can be sure ahead of time that one will at a specific time have a specific rational credence. I think one can do some further modifying of the above cases to handle that (e.g., one can maybe use time-dilation to set up a case where in one reference frame the wakeups for heads+heads are at different times from the wakeups for heads+tails, but in another frame they are the same).

All that said, the above stories all involve a supertask, so they require causal infinitism, which I reject.

Tuesday, May 21, 2024

A problem for probabilistic best systems accounts of laws

Suppose that we live in a Humean universe and the universe contains an extremely large collection of coins scattered on a flat surface. Statistical analysis of all the copper coins fits extremely well with the hypothesis that each coin was independently randomly placed with the chance of heads being 1/16 and that of tails being 15/16.

Additionally, there is a gold coin where you haven’t observed which side it’s on.

And there are no other coins.

On a Lewisian best systems account of laws of nature, if the number of coins is sufficiently large, it will be a law of nature that all coins are independently randomly placed with the chance of heads being 1/16 and that of tails being 15/16. This is true regardless of whether the gold coin is heads or tails. If you know the information I just gave, and have done the requisite statistical analysis of the copper coins, you can be fully confident that this is indeed a law of nature.

If you are fully confident that it is a law of nature that the chance of tails is 15/16, then your credence for tails for the unobserved gold coin should also be 15/16 (I guess this is a case of the Principal Principle).

But that’s wrong. The fact that the coin is of a different material from the observed coins should affect your credence in its being tails. Inductive inferences are weakened by differences between the unobserved and the observed cases.

One might object that perhaps the Lewisian will say that instead of a law saying that the chance of tails on a coin is 15/16, there would be a law that the chance of tails on a copper coin is 15/16. But that’s mistaken. The latter law is not significantly more informative than the former (given that all but one coin is copper), but is significantly less brief. And laws are generated by balancing informativeness with brevity.

Friday, May 17, 2024

Yet another argument for thirding in Sleeping Beauty?

Suppose that a fair coin has been flipped in my absence. If it’s heads, there is an independent 50% chance that I will be irresistibly brainwashed tonight after I go to bed in a way that permanently forces my credence in heads to zero. If it’s tails, there will be no brainwashing. When I wake up tomorrow, there will be a foul taste in my mouth of the brainwashing drugs if and only if I’ve been brainwashed.

So, I wake up tomorrow, find no taste of drugs in my mouth, and I wonder what I should do to my credence in heads. The obvious Bayesian approach would be to conditionalize on not being brainwashed, and lower my credence in heads to 1/3.

Next let’s evaluate epistemic policies in terms of a strictly proper accuracy scoring rule (T,F) (i.e., T(p) and F(p) are the epistemic utilities of having credence p when the hypothesis is in fact true or false, respectively). Let’s say that the policy is to assign credence p upon observing that I wasn’t brainwashed. My expected epistemic utility is then (1/4)T(p) + (1/4)T(0) + (1/2)F(p). Given any strictly proper scoring rule, this is optimized at p = 1/3. So we get the same advice as before.
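A quick numerical check with one particular strictly proper rule, the Brier-style score T(p) = −(1−p)² and F(p) = −p² (my choice of rule; any strictly proper rule gives the same optimum):

    def expected_utility(p):
        # (1/4) heads & not brainwashed, (1/4) heads & brainwashed (credence
        # forced to 0), (1/2) tails & not brainwashed, as in the text.
        T = lambda q: -(1 - q) ** 2      # utility of credence q when heads is true
        F = lambda q: -q ** 2            # utility of credence q when heads is false
        return 0.25 * T(p) + 0.25 * T(0.0) + 0.5 * F(p)

    best_p = max(range(1001), key=lambda k: expected_utility(k / 1000)) / 1000
    print(best_p)   # 0.333, i.e. the maximum is at p = 1/3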

So far so good. Now consider a variant where instead of a 50% chance of being brainwashed, I am put in a coma for the rest of my life. I think it shouldn’t matter whether I am brainwashed or put in a coma. Either way, I am no longer an active Bayesian agent with respect to the relevant proposition (namely, whether the coin was heads). So if I find myself awake, I should assign 1/3 to heads.

Next consider a variant where instead of a coma, I’m just kept asleep for all of tomorrow. Thus, on heads, I have a 50% chance of waking up tomorrow, and on tails I am certain to wake up tomorrow. It shouldn’t make a difference whether we’re dealing with a life-long coma or a day of sleep. Again, if I find myself awake, I should assign 1/3 to heads.

Now suppose that for the next 1000 days, each day on heads I have a 50% chance of waking up, and on tails I am certain to wake up, and after each day my memory of that day is wiped. Each day is the same as the one day in the previous experiment, so each day I am awake I should assign 1/3 to heads.

But by the Law of Large Numbers, this is basically an extended version of Sleeping Beauty: on heads I will wake up on approximately 500 days and on tails on 1000 days. So I should assign 1/3 to heads in Sleeping Beauty.
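A small simulation of the 1000-day version (the run count and seed are arbitrary) confirms that about a third of the awakenings occur in heads runs:

    import random

    def heads_fraction_of_awakenings(runs=2000, days=1000, seed=1):
        rng = random.Random(seed)
        heads_awakenings = tails_awakenings = 0
        for _ in range(runs):
            if rng.random() < 0.5:                  # heads: wake each day with prob 1/2
                heads_awakenings += sum(rng.random() < 0.5 for _ in range(days))
            else:                                   # tails: wake every day
                tails_awakenings += days
        return heads_awakenings / (heads_awakenings + tails_awakenings)

    print(heads_fraction_of_awakenings())   # roughly 0.333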