Wednesday, June 19, 2024

A bit more fun with Integrated Information Theory

I hope this is my last post for a while on Integrated Information Theory (IIT) in Aaronson’s simplified formulation.

One of the fun and well-known facts is that if you have an impractically large square two-dimensional grid of interconnected logic gates (presumably with some constant time-delay in each gate between inputs and outputs to prevent race conditions) in a fixed point (i.e., nothing is changing), the result can still have a degree of integrated information proportional to the square root of the number of gates. A particular known case is where you have a very large grid of XOR gates, with each gate’s output being connected to the inputs of its neighbors, all of them at 0.
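That the all-zero state really is a fixed point of such a grid is easy to check. Here is a minimal sketch (my own toy model, assuming each gate XORs the current outputs of its grid neighbors, with edge and corner gates simply having fewer neighbors):

```python
def xor_step(grid):
    """One synchronous update of an n-by-n grid of XOR gates,
    each gate reading the current outputs of its grid neighbors."""
    n = len(grid)
    new = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n:
                    acc ^= grid[ni][nj]
            new[i][j] = acc
    return new

n = 5
zeros = [[0] * n for _ in range(n)]
assert xor_step(zeros) == zeros  # the all-zero grid maps to itself
```

Since the XOR of any number of zeros is zero, the all-zero grid is stable, while a single 1 would spread to its neighbors on the next step.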

That said, that kind of grid does give off the “every part affects the rest of the system” vibe that IIT says consciousness consists in, so objecting that this grid isn’t conscious doesn’t impress IIT aficionados. Moreover, such a grid is more complex than it seems at first sight, because to avoid race conditions while maintaining the ostensible state transitions a practical implementation would require some kind of synchronization between the gates.

Today I want to note that there seems to be an even less intuitive conscious system according to IIT. Imagine a large N by N grid of binary data, “the inputs”, and then another large N by N grid of binary data, “the outputs”, aligned above the first grid. Each value on the output grid is then the logical OR of the input value under it with the four (or three for edge points and two for corner points) neighbors of that input value. And all of the input grid is at zero.
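As a concrete sketch, here is the feed-forward map from the input grid to the output grid (a toy rendering of the construction above; edge and corner cells just have fewer neighbors):

```python
def or_outputs(inputs):
    """Output grid: each cell is the OR of the input cell under it
    and that input cell's (up to four) grid neighbors."""
    n = len(inputs)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            v = inputs[i][j]
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n:
                    v |= inputs[ni][nj]
            out[i][j] = v
    return out

n = 4
zeros = [[0] * n for _ in range(n)]
assert or_outputs(zeros) == zeros  # all-zero inputs give all-zero outputs
```

A single 1 on the input grid lights up the output cell above it together with that cell’s neighbors; all-zero inputs give all-zero outputs, which is the case at issue.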

This does not give off any “every part affects the rest of the system” vibe. And getting consciousness out of zeroed OR gates with no feedback system seems really absurd.

To see the alleged consciousness, recall the IIT measure Φ of integrated information, which is supposed to be proportional to the amount of consciousness. For any partition of the components into two nonempty subsets A and B, we compute the “effective information” EI(A→B) that A provides for B. This is the entropy of the new values of the components in B when the old values of the components in A are replaced by uniformly random values, everything outside A being held at its actual value. Let Φ(A,B) = EI(A→B) + EI(B→A) be the two-way effective information in the partition. Then choose A and B to minimize Φ(A,B)/min(|A|,|B|), and let the system’s Φ be Φ(A,B) for that choice of A and B. Aaronson says it’s not clear what to do if the minimum is not unique. To be conservative (i.e., count fewer systems as conscious), if there are multiple pairs that minimize Φ(A,B)/min(|A|,|B|), I’ll assume we choose one that also minimizes Φ(A,B).
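To make the definition concrete, here is a brute-force sketch of this Φ computation for tiny systems — a toy, not a serious implementation. I am assuming the “noising” reading of effective information (the components in A are replaced by uniform random bits while everything else keeps its actual value) and deterministic dynamics given by a transition function f:

```python
from itertools import product, combinations
from math import log2
from collections import Counter

def entropy_bits(counts):
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def EI(f, state, A, B):
    """EI(A -> B): entropy of B's new values when A's old values are
    replaced by uniform random bits, everything else held fixed."""
    counts = Counter()
    for bits in product((0, 1), repeat=len(A)):
        s = list(state)
        for idx, b in zip(A, bits):
            s[idx] = b
        new = f(tuple(s))
        counts[tuple(new[i] for i in B)] += 1
    return entropy_bits(counts)

def phi(f, state):
    """Minimize (EI(A->B)+EI(B->A))/min(|A|,|B|) over all bipartitions,
    breaking ties by the smaller Phi(A,B), and return that Phi(A,B)."""
    m = len(state)
    best = None
    for r in range(1, m // 2 + 1):
        for A in combinations(range(m), r):
            B = tuple(i for i in range(m) if i not in A)
            val = EI(f, state, A, B) + EI(f, state, B, A)
            key = (val / min(len(A), len(B)), val)
            if best is None or key < best:
                best = key
    return best[1]

# Toy check: component 1 copies component 0, component 0 holds its value.
f = lambda s: (s[0], s[0])
assert phi(f, (0, 0)) == 1.0  # EI(A->B) = 1 bit, EI(B->A) = 0
```

One can run this on a 2 by 2 instance of the double grid by packing inputs and outputs into one bit-vector (with the inputs holding their values, which is my assumption about their dynamics); for realistic N the exponential number of partitions makes brute force hopeless.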

Let’s now do a handwavy proof that Φ for our pair of grids when everything is at zero is at least proportional to N, and hence for large N we have consciousness according to IIT. Suppose A and B minimize Φ(A,B)/min(|A|,|B|). Let C be the set of all inputs x such that x is connected to both an output in A and an output in B.
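For concreteness, here is one way to compute that cut set C on the double grid (a sketch under my own labeling: inputs are ('in', i, j) and outputs ('out', i, j), and only the output-side membership of A and B matters for C):

```python
def cut_set(A, B, n):
    """Inputs wired to at least one output in A and at least one in B."""
    def outputs_fed(i, j):
        # the output directly above (i, j) plus those above its neighbors
        yield ('out', i, j)
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n:
                yield ('out', ni, nj)
    A, B = set(A), set(B)
    return {('in', i, j)
            for i in range(n) for j in range(n)
            if any(o in A for o in outputs_fed(i, j))
            and any(o in B for o in outputs_fed(i, j))}

# Left/right split of a 4x4 double grid: C is the two middle input columns.
n = 4
left = {('out', i, j) for i in range(n) for j in range(n // 2)}
right = {('out', i, j) for i in range(n) for j in range(n // 2, n)}
assert len(cut_set(left, right, n)) == 2 * n
```

For a straight vertical cut, |C| grows like N, matching the boundary-size estimate used below.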

Suppose first that C is nonempty. Then |C| is approximately proportional to the size of the boundary of the set of outputs in A, or of the set of outputs in B, which will be at least proportional to the square root of the smaller of |A| and |B|. Moreover, EI(A→B) + EI(B→A) will be at least proportional to |C|, given how the dependencies are arranged and given that all the values are at zero, so that if any of the values an output value depends on are unknown, then the output value is also unknown. So Φ(A,B)/min(|A|,|B|) will be at least proportional to min(|A|,|B|)^(−1/2). Thus if A and B minimize Φ(A,B)/min(|A|,|B|), then A and B will have to be of the same order of magnitude, namely N^2, since the greater the disparity, the bigger min(|A|,|B|)^(−1/2) will be. In that case, Φ(A,B)/min(|A|,|B|) will be at least proportional to 1/N, and so Φ(A,B) will be at least proportional to (1/N) ⋅ N^2 = N.

Now suppose C is empty. Then one of A and B contains all the outputs; say it’s A. Then B consists solely of inputs, so EI(A→B) = 0, and Φ(A,B) = EI(B→A) will be at least proportional to the size of B. But for large N this choice is beaten by letting A and B contain respectively the left and right halves of the grids: then Φ(A,B) would be at most proportional to the size of the boundary between the halves, i.e., to N, and hence Φ(A,B)/min(|A|,|B|) would be at most proportional to 1/N. So the case where C is empty cannot be a case where A and B are optimal, at least if N is large.

1 comment:

Alexander R Pruss said...

Here is a fun embodiment of the array setup: Put people on an extremely large grid (way bigger than the earth). Have each person count how many of their four neighbors are smiling. If the array is large enough, it instantly becomes super-human conscious according to IIT. What's worse, all the people on the array become unconscious, because IIT forbids a conscious system from being a part of a more conscious system.