Sunday, June 16, 2024

Integrated Information Theory doesn't seem to get integrated information right

I’m still thinking about Integrated Information Theory (IIT), in Aaronson’s simplified formulation. Aaronson’s famous criticisms show pretty convincingly that IIT fails to correctly characterize consciousness: simple but large systems of unchanging logic gates end up having human-level consciousness on IIT.

However, IIT attempts to do two things: (a) provide an account of what it is for a system to have integrated information in terms of a measure Φ, and (b) equate conscious systems with ones that have integrated information.

In this post, I want to offer some evidence that IIT fails at (a). If IIT fails at (a), then that opens up the option that, notwithstanding the counterexamples, IIT gets (b) right. I am dubious of this option. For one, the family of examples in this post suggests that IIT’s account of integrated information is too restrictive, and making it less restrictive will only make it more subject to Aaronson-style counterexamples. For another, I have a conclusive reason to think that IIT is false: God is conscious but has no parts, whereas IIT requires all conscious systems to have parts.

On to my argument against (a). IIT implies that a system lacks integrated information provided that it can be subdivided into two subsystems of roughly equal size such that each subsystem’s evolution over the next time step is predictable on the basis of that subsystem alone, as measured by information-theoretic entropy, i.e., only a relatively small number of additional bits of information need to be added to perfectly predict the subsystem’s evolution.
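To make the bipartition test concrete, here is a minimal sketch in the spirit of Aaronson’s simplified framework, not the official IIT 4.0 measure: for a small deterministic system on bits, it computes how many extra bits are needed to predict a subsystem’s next state from that subsystem’s own current state (with the rest of the system randomized), and then minimizes the summed shortfall over roughly equal bipartitions. The function names and the omission of IIT’s normalization are my own simplifications.

```python
import itertools
from collections import defaultdict
from math import log2

def conditional_entropy(update, n, part):
    """Extra bits needed (on average) to predict the next state of the components
    in `part` from their own current state, with the rest of the n-bit system
    uniformly random. `update` maps an n-tuple of bits to an n-tuple of bits."""
    counts = defaultdict(lambda: defaultdict(int))
    for state in itertools.product((0, 1), repeat=n):
        cur = tuple(state[i] for i in part)
        nxt = tuple(update(state)[i] for i in part)
        counts[cur][nxt] += 1
    h = 0.0
    for cur, dist in counts.items():
        cur_total = sum(dist.values())
        p_cur = cur_total / 2 ** n
        for c in dist.values():
            p = c / cur_total
            h -= p_cur * p * log2(p)
    return h

def phi_like(update, n):
    """Toy stand-in for Phi: minimize, over roughly equal bipartitions, the sum of
    the two sides' conditional entropies. A low value means some cut leaves each
    side nearly able to predict its own evolution on its own."""
    best = float("inf")
    for a in itertools.combinations(range(n), n // 2):
        b = tuple(i for i in range(n) if i not in a)
        best = min(best, conditional_entropy(update, n, a) + conditional_entropy(update, n, b))
    return best

# A 4-bit cyclic shift needs at least one bit imported on each side of any balanced cut,
# while the identity map is fully reducible.
print(phi_like(lambda s: (s[-1],) + s[:-1], 4))  # 2.0
print(phi_like(lambda s: s, 4))                  # 0.0
```

Since it brute-forces all 2^n global states, this is only feasible for very small n; it is meant only to make the bipartition test described above concrete.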

The family of systems of interest to me is what I will call “low dependency input-output (ldio) systems”. In these systems, the components can be partitioned into input components and output components. Input component values do not change. Output component values depend deterministically on the input component values. Moreover, each output component value depends only on a small number of input components. It is a little surprising that any ldio system counts as having integrated information, given that the input components do not depend on the output components, but there appear to be examples, even if the details of proof have not yet been given. Aaronson is confident that low-density parity-check codes are an example. Another example is two large grids of equal size, where the second (output) grid’s values result from applying a step of an appropriate cellular automaton to the first (input) grid. For instance, one could put a one at a point of the output grid when the corresponding point’s neighbors on the input grid contain an odd number of ones, and a zero otherwise.
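Here is a minimal sketch of the grid example. The wrap-around (toroidal) edges and the small grid size are my own choices for the illustration, not anything the example requires.

```python
def step_output_grid(input_grid):
    """Output grid for the ldio example: each output cell is 1 exactly when
    the corresponding input cell's neighbors contain an odd number of ones.
    Wrap-around (toroidal) edges are assumed here; the post leaves this open."""
    n, m = len(input_grid), len(input_grid[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            ones = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue  # only the neighbors, not the cell itself
                    ones += input_grid[(i + di) % n][(j + dj) % m]
            out[i][j] = ones % 2  # parity rule: odd number of ones -> 1, even -> 0
    return out

# Each output cell depends on only 8 input cells: a low-dependency input-output system.
example_input = [[1, 0, 1, 0],
                 [0, 1, 0, 0],
                 [1, 1, 0, 1],
                 [0, 0, 1, 0]]
print(step_output_grid(example_input))
```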

Now suppose we have an ldio system with a high degree of integrated information as measured by IIT’s Φ measure. Then we can easily turn it into a system with a much, much lower Φ using a trick. Instead of having the system update all its outputs at once, have it update the outputs one by one. To do this, add to the system a small number of binary components that hold an “address” for the “current” output component—say, an encoding of a pair of coordinates if the system is a grid. Then at each time step have the system update only the specific output indicated by the address, and also advance the address to that of the next output component, wrapping around to the first output component once done with all of them. We could imagine that these steps are performed really, really fast, so in the blink of an eye we have updated all the outputs—but not all at once.
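A sketch of the sequentialization trick applied to the grid example above. For simplicity I store the address directly as a pair of coordinates advanced in row-major order, rather than as the binary components encoding it; that, and the function name, are my own choices.

```python
def sequential_micro_step(state):
    """One micro-step of the sequentialized ldio: update only the output cell
    at the current address, then advance the address (row-major, wrapping)."""
    input_grid, output_grid, (i, j) = state
    n, m = len(input_grid), len(input_grid[0])
    ones = sum(input_grid[(i + di) % n][(j + dj) % m]
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if not (di == 0 and dj == 0))
    new_output = [row[:] for row in output_grid]
    new_output[i][j] = ones % 2          # same parity rule as before
    j2 = (j + 1) % m                     # advance the address...
    i2 = (i + 1) % n if j2 == 0 else i   # ...wrapping around at the end
    return (input_grid, new_output, (i2, j2))
```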

This sequentialized version of the ldio is still an ldio: each output value depends on a small number of input values, plus the relatively small number of bits needed to specify the address (log₂N, where N is the number of outputs). But the Φ value is apt to be immensely reduced compared to the original system. For divide up the sequentialized version into any two subsystems of roughly equal size. The outputs (if any) in each subsystem can be determined by specifying the current address (log₂N bits) plus a small number of bits for the values of the inputs that the currently addressed output depends on. Thus each subsystem has only a small number of bits of entropy when we randomize the values of the other subsystem, and hence the Φ measure will be low. While, say, the original system’s Φ measure is of the order Np, the new system’s Φ measure will be at most of the order of log₂N plus the maximum number of inputs that an output depends on.
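For a rough sense of scale, here is the bound from the previous paragraph worked out with made-up numbers: a hypothetical 1000 × 1000 output grid under the eight-neighbor parity rule.

```python
from math import ceil, log2

N = 1000 * 1000      # hypothetical number of output components (illustrative only)
max_fan_in = 8       # each output depends on at most 8 inputs in the grid example
address_bits = ceil(log2(N))
# Rough cap on the extra bits any roughly equal subsystem needs per time step:
print(address_bits + max_fan_in)   # 20 + 8 = 28 bits, regardless of the cut
```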

But the sequentialized system will have the same time evolution as the original simultaneous-processing system, as long as we look at the state of the sequentialized system only every N steps, where N is the number of outputs. Intuitively, the sequentialized system has a high degree of integrated information if and only if the original system does (and is conscious if and only if the original system is).
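As a sanity check on that claim, reusing the two grid sketches above: N micro-steps of the sequentialized system, starting from the top-left address, should reproduce exactly one all-at-once step of the original.

```python
rows, cols = len(example_input), len(example_input[0])
state = (example_input, [[0] * cols for _ in range(rows)], (0, 0))
for _ in range(rows * cols):              # N micro-steps, one per output cell
    state = sequential_micro_step(state)
assert state[1] == step_output_grid(example_input)   # matches the simultaneous update
```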

I conclude that IIT has failed to correctly characterize integrated information.

There is a simple fix. Given a system S, there is a system Sₖ with the same components, but each of whose steps consists of k steps of the system S. We could say that a system S has integrated information provided that there is some k such that Φ(Sₖ) is high. (We might even define the measure of integrated information as supₖ Φ(Sₖ).) I worry that this move will make it too easy to have a high degree of integrated information. Many physical systems are highly predictable over a short period of time but become highly unpredictable over a long period of time, with the results being highly sensitive to small variations in most of the initial values: think of weather systems. I am fairly confident that if we fix IIT as suggested, then planetary weather systems will end up having super-human levels of consciousness.
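A sketch of the proposed fix, built on the toy phi_like from the sketch above: form Sₖ by composing the update map k times, and take the largest value over a finite range of k as a stand-in for the supremum. The cutoff k_max is arbitrary and mine.

```python
def compose(update, k):
    """The system S_k: one of its steps is k steps of S."""
    def composed(state):
        for _ in range(k):
            state = update(state)
        return state
    return composed

def integrated_information(update, n, k_max=8):
    """Finite stand-in for the supremum over k of Phi(S_k), using the toy phi_like above."""
    return max(phi_like(compose(update, k), n) for k in range(1, k_max + 1))
```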

Tuesday, June 11, 2024

A very simple counterexample to Integrated Information Theory?

I’ve been thinking a bit about Integrated Information Theory (IIT) as a physicalist-friendly alternative to functionalism as an account of consciousness.

The basic idea of IIT is that we measure the amount of consciousness in a system by subdividing the system into pairs of subsystems and calculating how well one can predict the next state of each of the two subsystems without knowing the state of the other. If there is a partition which lets you make the predictions well, then the system is considered reducible, with low integrated information, and hence low consciousness. So you look for the best-case subdivision—one where you can make the best predictions as measured by Shannon entropy with a certain normalization—and say that the amount Φ of “integrated information” in the system varies inversely with the quality of these best predictions. And then the amount of consciousness in the system corresponds to the amount Φ of integrated information.

Aaronson gives a simple mathematical framework and what sure look like counterexamples: systems that intuitively don’t appear to be mind-like and yet have a high Φ value. Surprisingly, though, Tononi (the main person behind IIT) has responded by embracing these counterexamples as cases of consciousness.

In this post, I want to offer a counterexample with a rather different structure. My counterexample has an advantage and a disadvantage with respect to Aaronson’s. The advantage is that it is a lot harder to embrace my counterexample as an example of consciousness. The disadvantage is that my example can be avoided by an easy tweak to the definition of Φ.

It is even possible that my tweak is already incorporated in the official IIT 4.0. I am right now only working with Aaronson’s perhaps simplified framework (for one, his framework depends on a deterministic transition function), because the official one is difficult for me to follow. And it is also possible that I am just missing something obvious and making some mistake. Maybe a reader will point that out to me.

The idea of my example is very simple. Imagine a system consisting of two components, each of which has N possible states. At each time step, the two components swap states. There is now only one decomposition of the system into two subsystems, which makes things much simpler. And note that each subsystem’s state at time n has no predictive power for its own state at n + 1, since at n + 1 it inherits the other subsystem’s state from n. The Shannon entropies corresponding to the best predictions are going to be log₂N, and so the Φ of the system is 2 log₂N. By making N arbitrarily large, we can make Φ arbitrarily large. In fact, if we have an analog system with infinitely many states, then Φ is infinite.
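Here is a brute-force check of the entropy claim for small N, under the same toy reading of the measure as in the sketches earlier on this page (not the official IIT definition): knowing one component’s current state tells you nothing about its next state, so each side is missing log₂N bits.

```python
import itertools
from collections import defaultdict
from math import log2

def swap_conditional_entropy(N):
    """H(A's next state | A's current state) for the two-component swap system
    with N-state components, B's current state uniform: computed by brute force."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in itertools.product(range(N), repeat=2):
        counts[a][b] += 1            # after the swap, A's next state is B's current state
    h = 0.0
    for a, dist in counts.items():
        total = sum(dist.values())
        for c in dist.values():
            p = c / total
            h -= (total / N ** 2) * p * log2(p)
    return h

N = 64
print(swap_conditional_entropy(N), log2(N))   # 6.0 6.0: each side is missing log2(N) bits,
                                              # so the summed Phi-like value is 2*log2(N)
```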

Advantage over Aaronson’s counterexamples: There is nothing in the least consciousness-like about this setup. We are just endlessly swapping states between two components. That’s not consciousness. Imagine the components are hard drives and we just endlessly swap the data between them. To make it even more vivid, suppose the two hard drives have the same data, so nothing actually changes in the swaps!

Disadvantage: IIT can escape the problem by modifying the measure Φ of integrated information in some way in the special case where the components are non-binary. Aaronson’s counterexamples use binary components, so they are unaffected. Here are three such tweaks. (i) Just divide Φ by the logarithm of the maximum number of states in a component (this seems ad hoc). (ii) Restrict the system to one with binary components, and hence require that any component with more than two possible states be reinterpreted as a collection of binary components encoding the non-binary state (but which binarization should one choose?). (iii) Define the Φ of a non-binary system as the minimum of the Φ values over all possible binarizations. Either (i) or (iii) kills my counterexample.
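For instance, tweak (i), with the normalization read as dividing by log₂ of the largest number of states any single component can take (my reading of the suggestion), caps the swap system’s value at 2 no matter how large N gets:

```python
from math import log2

def normalized_phi(raw_phi, max_states_per_component):
    """Tweak (i): divide the raw Phi-like value by log2 of the largest number
    of states that any single component can take."""
    return raw_phi / log2(max_states_per_component)

# The swap system's raw value is 2*log2(N); normalized, it is always 2.
for N in (2, 2 ** 10, 2 ** 30):
    print(normalized_phi(2 * log2(N), N))   # 2.0 each time
```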

Monday, June 10, 2024

Computation

I’ve been imagining a very slow embodiment of computation. You have some abstract computer program designed for a finite-time, finite-space subset of a Turing machine. And now you have a big tank of black and white paint that is constantly being stirred in a deterministic way, but one that is some ways into the ergodic hierarchy: it’s weakly mixing. If you leave the tank for eternity, every so often the paint will make some seemingly meaningful patterns. In particular, on very rare occasions one finds in the tank an artistic drawing of the next step of the Turing machine’s functioning while executing that program—it will be a drawing of a tape, a head, and various symbols on the tape. Of course, in between these steps there will be millennia of garbage.

In fact, it turns out that (with probability one) there will be some specific number n of years such that the correct first step of the Turing machine’s functioning will be drawn in exactly n years, the correct second step in exactly 2n years, the correct third one in exactly 3n years, and so on (remembering that there is only a finite number of steps, since we are working with a finite-space subset). (Technically, this is because weak mixing implies multiple weak mixing.) Moreover, each step causally depends on the preceding one. Will this be computation? Will the tank of paint be running the program in this process?

Intuitively, no. For although we do have causal connections between the state in n years and the next state in 2n years and so on, those connections are too counterfactually fragile. Let’s say you took the artistic drawing of the Turing machine in the tank at the first step (namely, in n years) and you perturbed some of the paint particles in a way that makes no visible difference to the visual representation. Then probably by 2n years things would be totally different from what they should be. And if you changed the drawing to a drawing of a different Turing machine state, the every-n-years evolution would also change, but not in a way that tracks the new state.

So it seems that for computation we need some counterfactual robustness. In a real computer, physical states define logical states in an infinity-to-one way (infinitely many “small” physical voltages count as a logical zero, and infinitely many “larger” physical voltages count as a logical one). We want to make sure that if the physical states were different but not sufficiently different to change the logical states, this would not be likely to affect the logical states in the future. And if the physical states were different enough to change the logical states, then the subsequent evolution would likely change in an orderly way. Not so in the paint system.
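A toy illustration of that many-to-one mapping, with made-up threshold voltages: perturbations that stay within a band leave the logical state untouched, which is the sort of counterfactual robustness at issue.

```python
def logical_state(voltage):
    """Many-to-one mapping from an analog voltage to a logical bit.
    The thresholds (0.8 V and 2.0 V) are made-up illustrative values."""
    if voltage <= 0.8:
        return 0          # every "small" voltage counts as a logical zero
    if voltage >= 2.0:
        return 1          # every "large" voltage counts as a logical one
    return None           # forbidden band: no well-defined logical state

# Perturbations that stay inside a band do not change the logical state:
print(logical_state(0.31), logical_state(0.29))   # both 0
print(logical_state(3.3), logical_state(3.1))     # both 1
```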

But the counterfactual robustness is tricky. Imagine a Frankfurt-style counterfactual intervener who is watching your computer while it is computing ten thousand digits of π. The intervener has a very precise plan for all the analog physical states of your computer during the computation, and if there is the least deviation, the intervener will blow up the computer. Fortunately, there is no deviation. But now, with the intervener in place, there is no counterfactual robustness. So it seems the computation has been destroyed.

Maybe it’s fine to say it has been destroyed. The question of whether a particular physical system is actually running a particular program seems like a purely verbal question.

Unless consciousness is defined by computation. For whether a system is conscious, or at least conscious in a particular way, is not a purely verbal question. If consciousness is defined by computation, we need a mapping between physical states and logical computational states, and what that mapping is had better not be a purely verbal question.

Tuesday, June 4, 2024

The Epicurean argument on death

The Epicurean argument is that death considered as cessation of existence does us no harm, since it doesn’t harm us when we are alive (as we are not dead then) and it doesn’t harm us when we are dead (since we don’t exist then to be harmed).

Consider a parallel argument: It is not a harm to occupy too little space—i.e., to be too small. For the harm of occupying too little space doesn’t occur where we exist (since that is space we occupy) and it doesn’t occur where we don’t exist (since we’re not there). The obvious response is that if I am too small, then the whole of me is harmed by not occupying more space. Similarly, then, if death is cessation of existence, and I die, then the whole of me is harmed by not occupying more time.

Here’s another case. Suppose that a flourishing life for humans contains at least ten years of conversation, while Alice has only five years of conversation over her 80-year span of life. When has Alice been harmed? Nowhen! She obviously isn’t harmed by the lack of conversation during the five years of conversation. But neither is she harmed at any given time during the 75 years that she is not conversing. For if she is harmed by the lack of conversation at any given time during those 75 years, she is harmed by the lack of conversation during all of them—they are all on a par, except maybe infancy, which I will ignore for simplicity. But she’s only missing five years of conversation, not 75. So she isn’t harmed over all of the 75 years.

There are temporal distribution goods, like having at least ten years of conversation, or having a broad variety of experiences, or falling in love at least once. These distribution goods are not located at times—they are goods attached to the whole of the person’s life. And there are distribution bads, which are the opposites of the temporal distribution goods. If death is the cessation of existence, it is one of these.

I wonder, though, whether it is possible for a presentist to believe in temporal distribution goods. Maybe. If not, then that’s too bad for the presentist.

Monday, June 3, 2024

On a generalization of Double Effect

Traditional formulations of the Principle of Double Effect deal with things that are said to have absolute prohibitions against them, like killing the innocent: such things must never be intended, but sometimes may be produced as a side-effect.

Partly to generalize the traditional formulation, and partly to move beyond strict deontology, contemporary thinkers sometimes modify Double Effect to be some principle like:

  1. It is worse, or harder to justify, to intentionally cause harm than to do so merely foreseeably.

While (1) sounds plausible, there is a family of cases where intentionally causing harm is permitted but doing so unintentionally is not. To punish someone, you need to intend a harm (but maybe not an all-things-considered harm) to them. But there are cases where a harsh treatment is only permissible as a punishment. In some cases where someone has committed a serious crime and deserves to be imprisoned, and yet the imprisonment is not necessary to protect society (e.g., because the criminal has for other reasons—say, a physical injury—become incapable of repeating the crimes), the imprisonment can be justified as punishment, but not otherwise. In those cases, the harm is permitted only if it is intended.

Still, it seems that intentional harm tends to be harder to justify than merely foreseen harm.