Sunday, June 16, 2024

Integrated Information Theory doesn't seem to get integrated information right

I’m still thinking about Integrated Information Theory (IIT), in Aaronson’s simplified formulation. Aaronson’s famous criticisms show pretty convincingly that IIT fails to correctly characterize consciousness: simple but large systems of unchanging logic gates end up having human-level consciousness according to IIT.

However, IIT attempts to do two things: (a) provide an account of what it is for a system to have integrated information in terms of a measure Φ, and (b) equate conscious systems with ones that have integrated information.

In this post, I want to offer some evidence that IIT fails at (a). If IIT fails at (a), then it opens up the option that notwithstanding the counterexamples, IIT gets (b) right. I am dubious of this option. For one, the family of examples in this post suggests that IIT’s account of integrated information is too restrictive, and making it less restrictive will only make it more subject to Aaronson-style counterexample. For another, I have a conclusive reason to think that IIT is false: God is conscious but has no parts, whereas IIT requires all conscious systems to have parts.

On to my argument against (a). IIT implies that a system lacks integrated information provided that it can be subdivided into two subsystems of roughly equal size such that each subsystem’s evolution over the next time step is predictable on the basis of that subsystem alone, as measured by information-theoretic entropy: only a relatively small number of additional bits of information are needed to perfectly predict the subsystem’s evolution.

The family of systems of interest to me are what I will call “low dependency input-output (ldio) systems”. In these systems, the components can be partitioned into input components and output components. Input component values do not change. Output component values depend deterministically on the input component values. Moreover, each output component value depends only on a small number of input components. It is a little surprising that any ldio system counts as having integrated information, given that the input components do not depend on the output components, but there appear to be examples, even if details of proof have not yet been given. Aaronson is confident that low-density parity-check codes are an example. Another example is two large grids of equal size, where the second (output) grid’s values consist of applying a step of an appropriate cellular automaton to the first (input) grid. For instance, one could place a one at an output cell provided that the neighboring cells on the input grid contain an odd number of ones, and otherwise place a zero.
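To make the grid example concrete, here is a minimal sketch (the function names and grid size are mine, purely for illustration): an input grid that never changes, and an output grid where each cell is the parity of its grid neighbors on the input grid. Each output depends on at most four inputs, so the system is ldio.

```python
import random

def parity_step(inp):
    """Compute the output grid: cell (i, j) is 1 iff the number of ones
    among (i, j)'s grid neighbors in the input grid `inp` is odd."""
    n = len(inp)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n:
                    s += inp[ni][nj]
            out[i][j] = s % 2
    return out

random.seed(0)
n = 8  # hypothetical grid size
inp = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
out = parity_step(inp)
```

Note that the information only flows one way: nothing in `out` ever feeds back into `inp`, which is what makes it surprising that such a system could count as having integrated information.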

Now suppose we have an ldio system with a high degree of integrated information as measured by IIT’s Φ measure. Then we can easily turn it into a system with a much, much lower Φ using a trick. Instead of having the system update all its outputs at once, have the system update the outputs one by one. To do this, add to the system a small number of binary components that hold an “address” for the “current” output component—say, an encoding of a pair of coordinates if the system is a grid. Then at each time step have the system update only the specific output indicated by the address, and also have the system advance the address to the address of the next output component, wrapping around to the first output component once done with all of them. We could imagine that these steps are performed really, really fast, so in the blink of an eye we have updated all the outputs—but not all at once.
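The sequentialization trick can be sketched as follows, using the parity-of-neighbors grid as the underlying ldio (a toy illustration; the function names are mine). Each step updates the single output indicated by the address and then advances the address; because outputs never feed back into inputs, running N single-output steps reproduces the simultaneous update exactly.

```python
import random

def fan_in(i, j, n):
    """The input cells that output (i, j) depends on (its grid neighbors)."""
    return [(i + di, j + dj)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= i + di < n and 0 <= j + dj < n]

def simultaneous_step(inp):
    """Update every output at once (the original ldio system)."""
    n = len(inp)
    return [[sum(inp[a][b] for a, b in fan_in(i, j, n)) % 2
             for j in range(n)] for i in range(n)]

def sequential_run(inp, out, addr, steps):
    """Each step updates only the output at flat index `addr`,
    then advances the address, wrapping around after the last output."""
    n = len(inp)
    out = [row[:] for row in out]
    for _ in range(steps):
        i, j = divmod(addr, n)
        out[i][j] = sum(inp[a][b] for a, b in fan_in(i, j, n)) % 2
        addr = (addr + 1) % (n * n)
    return out, addr

random.seed(1)
n = 8
inp = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
blank = [[0] * n for _ in range(n)]
seq_out, addr = sequential_run(inp, blank, 0, n * n)  # N = n*n single-output steps
assert seq_out == simultaneous_step(inp)  # same result, just not all at once
```

The final assertion holds precisely because the inputs never change: updating the outputs one at a time cannot interfere with later updates.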

This sequentialized version of the ldio is still an ldio: each output value depends on a small number of input values, plus the relatively small number of bits needed to specify the address (log₂N, where N is the number of outputs). But the Φ value is apt to be immensely reduced compared to the original system. To see this, divide the sequentialized version into any two subsystems of roughly equal size. The outputs (if any) in each subsystem can be determined by specifying the current address (log₂N bits) plus a small number of bits for the values of the inputs that the currently addressed output depends on. Thus each subsystem has a low number of bits of entropy when we randomize the values of the other subsystem, and hence the Φ measure will be low. While, say, the original system’s Φ measure is of the order N^p, the new system’s Φ measure will be at most of the order of log₂N plus the maximum number of inputs that an output depends on.
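As a numerical illustration of the bound (the concrete figures are my own, chosen purely for illustration): with a million outputs and a fan-in of four, predicting the next single-output update takes only about two dozen bits.

```python
import math

N = 10 ** 6      # number of outputs (hypothetical figure)
max_fan_in = 4   # max number of inputs any single output depends on

# Bits needed to predict a subsystem's next update in the sequentialized
# system: the current address plus the relevant input values.
seq_bound = math.ceil(math.log2(N)) + max_fan_in
print(seq_bound)  # 24: ceil(log2(10^6)) = 20, plus 4
```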

But the sequentialized system will have the same time evolution as the original simultaneous-processing system as long as we look at the output of the sequentialized system after N steps, where N is the number of outputs. Intuitively, the sequentialized system has a high degree of integrated information if and only if the original system does (and is conscious if and only if the original system is).

I conclude that IIT has failed to correctly characterize integrated information.

There is a simple fix. Given a system S, there is a system S_k with the same components, but each of whose steps consists of k steps of the system S. We could say that a system S has integrated information provided that there is some k such that Φ(S_k) is high. (We might even define the measure of integrated information as sup_k Φ(S_k).) I worry that this move will make it too easy to have a high degree of integrated information. Many physical systems are highly predictable over a short period of time but become highly unpredictable over a long period of time, with the results being highly sensitive to small variations in most of the initial values: think of weather systems. I am fairly confident that if we fix IIT as suggested, then planetary weather systems will end up having super-human levels of consciousness.
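The proposed fix amounts to replacing the system’s one-step update map with its k-fold composition before computing Φ, and then taking the supremum over k. A minimal sketch of the composition itself (the toy step function is mine, not anything from IIT):

```python
def compose_steps(step, k):
    """Return the update map of S_k: one step of S_k is k steps of S."""
    def step_k(state):
        for _ in range(k):
            state = step(state)
        return state
    return step_k

# Toy one-step dynamics on an integer state (purely illustrative).
def double_mod(x):
    return (2 * x) % 1000003

step3 = compose_steps(double_mod, 3)
assert step3(1) == 8  # three doublings of 1
```

One would then evaluate Φ on each composed system S_k and take the supremum over k; the worry in the text is that for chaotic systems this supremum blows up.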
