## Wednesday, May 7, 2008

### Dembski's definition of specified complexity

A central part of Dembski's definition of specified complexity is a way of measuring whether an event E is surprising. This is not just a probabilistic measure. If you roll eleven dice and get the "unsurprising" sequence 62354544555, this sequence has the same probability 1/6^11 as the intuitively more "surprising" sequences 12345654321 or 11111111111. It would be a mistake (a mistake actually made by some commenters on the design argument) to conclude from this probabilistic equality that there is no difference in surprisingness; what one should conclude is that surprisingness is not just a matter of the probabilities. Instead of talking about "surprisingness", however, Dembski talks about "specification". The idea is that you can "specify" the sequences 12345654321 or 11111111111 ahead of time in a neat way. The first you specify as the only sequence of eleven dice throws consisting of a strictly monotonic increase ending precisely where a strictly monotonic decrease begins. The second you specify as one of only six sequences of eleven dice throws in which all the throws yield the same result.
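As a quick sanity check on the arithmetic, here is a minimal Python sketch (my illustration, not Dembski's) confirming that each specific sequence of eleven fair-die throws, surprising-looking or not, has exactly the same probability:

```python
from fractions import Fraction

def sequence_probability(seq):
    """Probability of one particular sequence of independent fair-die throws."""
    p = Fraction(1, 1)
    for _ in seq:
        p *= Fraction(1, 6)  # each throw independently has probability 1/6
    return p

boring = (6, 2, 3, 5, 4, 5, 4, 4, 5, 5, 5)
peak = (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)
ones = (1,) * 11

# All three sequences are exactly equiprobable: 1/6^11.
assert (sequence_probability(boring)
        == sequence_probability(peak)
        == sequence_probability(ones)
        == Fraction(1, 6**11))
```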

I will describe Dembski's account of specification, which will be somewhat technical; then I will criticize it and consider a way of fixing it up that is not entirely satisfactory.

Dembski proposes a measure of specification.[note 1] Suppose we have a probability space S (e.g., the space of all sequences of eleven dice throws) with a probability measure P_H (defined by some chance hypothesis H). Let f be a "detached" real-valued function on S (a lot more on detachment later). An event E in the probability space S is just a measurable subset of S. For any real-valued function f defined on S and real number y, let f_y be the set of all points x in S such that f(x) ≥ y. This is an event in S: f_y is the event of being at a point x in our probability space where f(x) is at least y.

We now say that an event E in S is specified to significance a provided that there is a function f on S "detached" from E (a lot more on detachment later on) and a real number y such that f_y contains E and P_H(f_y) < a.

For instance, in our eleven dice throw case, if x is a sequence of eleven dice throw results, let f(x) be the greatest number n such that at least n of the throw results in x are the same. Then f_11 is the event that all eleven of the dice throws were the same. Let E be the event of the sequence 11111111111 occurring. Then E is contained in f_11, and P_H(f_11) = 6/6^11 = 1/6^10 < 10^-7, and so E is specified to significance 10^-7, as long as we can say that f is detached from E. Similarly, we can let f be the length of the largest interval over which a sequence of dice throws is monotonically increasing plus the length of the largest interval over which it is monotonically decreasing; then our sequence 12345654321 will be a member of f_12, and if f is detachable, we can thus compute a significance for this result.

The crucial part of the account is the notion of "detachability". Without such a condition, every improbable event E counts as specified. Consider our intuitively unsurprising sequence 62354544555 (which was, as a matter of fact, generated by a pretty random process: I made it using random.org[note 2]). Let f be the function assigning 1 to the sequence 62354544555 and 0 to every other sequence. Then our given sequence is the only member of f_1, and so without any detachability condition on f, we would conclude that we have specification to a high degree of significance. But of course this is cheating. The function f was jerry-rigged by me to detect the event we were looking at, and one can always thus jerry-rig a function. To check for specification, we need a function f that could in principle have been specified beforehand, i.e., before we found out what the result of the dice throwing experiment was. If we get significance with such a function, then we can have some confidence that our event E is specified.
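The cheat is mechanical, as this minimal sketch (mine, for illustration) makes plain: whatever outcome we happen to observe, we can define after the fact a rejection function that fires on exactly that outcome:

```python
from fractions import Fraction

# After observing any outcome whatsoever, define a rejection function that
# fires on exactly that outcome.
observed = (6, 2, 3, 5, 4, 5, 4, 4, 5, 5, 5)

def f_cheat(x):
    return 1 if x == observed else 0

assert f_cheat(observed) == 1
assert f_cheat((1,) * 11) == 0

# f_1 = {x : f_cheat(x) >= 1} contains only the observed sequence, so
# P_H(f_1) = 1/6^11 no matter what was observed: apparent "specification"
# to enormous significance, for free.
p_f1 = Fraction(1, 6**11)
assert p_f1 < Fraction(1, 10**8)
```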

Dembski, thus, owes us an account of detachability. In No Free Lunch, he offers the following:

> a rejection function f is detachable from E if and only if a subject possesses background knowledge K that is conditionally independent of E (i.e., P(E|H&K) = P(E|H)) and such that K explicitly and univocally identifies the function f.

Or, to put it in our notation, f is detachable from E iff the epistemic agent has background knowledge K such that P_H(E|K) = P_H(E). It is hard to overstress how central this notion of detachability is to Dembski's account of specification, and therefore to his notion of specified complexity, and thus to his project.

But there is a serious problem with detachability: I am not sure that the independence condition P_H(E) = P_H(E|K) makes much sense. Ordinarily, the expression P(...|K) makes sense only if K is an event in the probability space or K is a random variable on the probability space (i.e., a measurable function on the probability space). In this case, K is "knowledge". This is ambiguous between the content of the knowledge and the state of knowing. Let's suppose first that K is the content of the knowledge—that is, after all, what we normally mean in probabilistic epistemology when we talk of conditioning on knowledge. So K is some proposition which, presumably, expresses some event—probabilities are, strictly speaking, defined with respect to events, not propositions.[note 3] What is this proposition and event? The knowledge is supposed to "identify" the function f. It seems, then, that K is a proposition of the form "There is a unique function f such that D(f)", where D is an explicit and univocal identification.

But on this reading of "knowledge", the definition threatens uselessness. Let K be the proposition that there is a unique function f such that f(x) = 1 if and only if x equals 62354544555 and f(x) = 0 otherwise. This function f was our paradigm of a non-detachable function. But what is P_H(E|K)? Well, K is a necessary truth: it is a fact of mathematics that there is a unique function as described. If P_H is an objective probability, then all necessary truths have probability 1, and so conditioning on a necessary truth changes nothing: P_H(E|K) = P_H(E), and we get detachability for free for f, and indeed for every other function.
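The point that conditioning on a probability-1 event changes nothing can be checked concretely. Here is a toy sketch (my example, on a small three-throw space rather than the full eleven-throw one):

```python
from fractions import Fraction
from itertools import product

# A toy space of three fair-die throws: 216 equiprobable outcomes.
space = list(product(range(1, 7), repeat=3))

def prob(event):
    return Fraction(sum(1 for x in space if event(x)), len(space))

def cond_prob(event, given):
    hits = [x for x in space if given(x)]
    return Fraction(sum(1 for x in hits if event(x)), len(hits))

E = lambda x: x == (1, 1, 1)   # a particular "surprising" outcome
K = lambda x: True             # a necessary truth holds at every point of the space

assert prob(K) == 1
# Conditioning on a probability-1 event leaves every probability unchanged:
assert cond_prob(E, K) == prob(E) == Fraction(1, 216)
```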

So on the reading where K is the content of the knowledge, if necessary truths get unit probability, Dembski's definition is pretty much useless—every function that has a finite mathematical description becomes detachable, since truths about whether a given finite mathematical description uniquely describes a function are necessary truths.

But perhaps P_H is an epistemic probability, so that necessary truths might have probability less than 1. One problem with this is that much of the nice probabilistic apparatus now breaks down. How on earth do we define a probability space in such a way that we can assign probabilities less than 1 to necessary truths? Do we partition the space of possibilities-and-impossibilities into regions where it is true that there is a unique function f such that f(x)=1 iff x=62354544555 and f(x)=0 otherwise and regions where this is false? I am not sure what we can make of probabilities in the regions where this is false. Presumably they are regions where mathematics breaks down. How do we avoid incoherence in applying probability theory—as Dembski wants to!—over the space of possibilities-and-impossibilities?

Moreover, it seems to me that on any reasonable notion of epistemic probabilities, those necessary truths that the epistemic agent would immediately see as necessary truths were they presented to her should get probability 1. Any epistemic agent who is sufficiently smart to follow Dembski's arguments and who knows set theory would immediately see as a necessary truth the claim that there is a unique function f on S such that f(x)=1 iff x=62354544555 and f(x)=0 otherwise. So even if we allow that some necessary truths, such as that horses are mammals, might get epistemic probabilities less than 1, the ones that matter for Dembski are not like that—they are self-evident necessary truths in the sense that once you understand them, you understand that they are true. The prospects for an account of epistemic probability that does not assign 1 to such necessary truths strike me as unpromising, though I think this is the route Dembski actually wants to go according to Remark 2.5.7 of No Free Lunch.

Besides, as a matter of fact, any agent who is sufficiently smart to understand Dembski's methods will be one who assigns 1 to the claim that there is a unique function f as above. So on the objective probability reading, Dembski's definition makes every finitely specifiable function detachable. On the epistemic reading, it does so too, at least for agents who are sufficiently smart. This makes Dembski's definition just about useless for any legitimate purposes.

Let's now try the second interpretation of K, where K is not the content of the knowledge, but the event of the agent's actually knowing the identification of f. This is more promising, I think. Let p be the proposition that there is a unique function f on S such that f(x)=1 iff x=62354544555 and f(x)=0 otherwise. Let us suppose, then, that K is the event of the agent's knowing that p. It is essential, we've seen, to judging f non-detachable that P_H(K) not equal 1. This requires a theory of knowledge on which for an agent to know p is more than just for the agent to be in a position to know p, as when the agent knows things that self-evidently entail p. On this view, an actual explicit belief is required for knowledge. Seen this way, P_H(K) < 1, since the agent might never have thought about p. Since K is a bona fide event on this view, we can apply probability theory without any worries about incoherence. So far so good.

But new problems show up. It is essential to Dembski's application of his theory to Intelligent Design that it apply in cases where people have only thought of f after seeing the event E—cases of "old evidence". Take, for instance, Dembski's example of the guy whose allegedly random choices of ballot orderings heavily favored one party. Dembski proposes a function f that counts the number of times that one party is on the top of the ballot. But I bet that Dembski did not actually think of this function before he heard of the event E of skewed ballot orderings. Moreover, hearing of the event surely made him at least slightly more likely to think of this function. If he never heard of this event, he might never have thought about the issue of ballot orderings, and hence about functions counting them. There is surely some probabilistic dependence between Dembski's knowing that there is such a function and the event E. Similarly, seeing the sequence 11111111111 does make one more likely to think of the function counting the number of repetitions. One might have thought of that function anyway, but the chance of thinking of it is higher when one does see the result. Hence, there is no independence, and, thus, no detachability.

This problem is particularly egregious in some of the biological cases to which one might ultimately want to apply Dembski's theory. Let's consider the event E that there is intelligent life. Let K be any state of knowledge identifying a function. Surely, there is probabilistic dependence between E and K. After all, P_H(K|~E) = 0, since were there no intelligent life, nobody would know anything, as there would be nobody to do the knowing. Thus, K occurs only together with E, and so P_H(E|K) = 1, which entails that E and K are not probabilistically independent unless P_H(E) = 1.
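The inference here is just conditional-probability bookkeeping, and a toy model (mine, with arbitrary stand-in events on a small dice space) confirms it: whenever K never occurs without E, conditioning on K forces E to probability 1:

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=3))  # 216 equiprobable outcomes

def prob(event):
    return Fraction(sum(1 for x in space if event(x)), len(space))

def cond_prob(event, given):
    hits = [x for x in space if given(x)]
    return Fraction(sum(1 for x in hits if event(x)), len(hits))

E = lambda x: max(x) >= 4   # stand-in for "there is intelligent life"
K = lambda x: x[0] == 6     # a stand-in state of knowledge that entails E

assert cond_prob(K, lambda x: not E(x)) == 0   # P_H(K|~E) = 0
assert cond_prob(E, K) == 1                    # hence P_H(E|K) = 1
assert prob(E) < 1                             # so E and K are not independent
```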

So the problem is that in just about no interesting case where we already knew about E will f be detachable from E, and yet the paradigmatic applications of Dembski's theory to Intelligent Design are precisely such cases. Here is a suggestion for how to fix this up (inspired by some ideas in Dembski's The Design Inference). We allow a little bit of dependence between E and K, but require that the amount of dependence not be too big. My intuition is that the smaller the significance a of the specification (note that the smaller the significance a, the more significant the specification—that's how it goes in statistics), the more dependence we can permit. To do this right, we'd have to choose an appropriate measure of dependence, but since I'm just sketching the idea, I will leave out the details.

However, there is a difficulty. The difficulty is that in "flagship cases" of Intelligent Design, such as the arising of intelligence or of reproducing life-forms, there is a lot of dependence between E and K, since our language is in large part designed (consciously or not) for discussing these kinds of events. It is in large part because reproducing life-forms are abundant on earth that our language makes it easy to describe reproduction, and the ease of describing reproduction in our language significantly increases the probability that we will think of functions f that involve reproductive concepts. In these cases, the amount of dependence between E and K will be quite large.

There may still be cases where there is little dependence, at least relative to some background data. These will be cases where our language did not develop to describe the particular cases observed but developed to describe other cases, perhaps similar to the ones observed but largely probabilistically independent of them. Thus, our language about mechanics and propulsion plainly did not develop to describe bacterial flagella, and it may be that the existence of bacterial flagella is probabilistically independent of the things for which our language developed. So maybe the above account works if K is a state of knowing a specification that includes bacterial flagella. Or not! There are hard questions here. One of the hard questions concerns how particular K is. Is K the event of one particular knower, say William Dembski, having the identification of f? If so, then there is a lot of probabilistic dependence between the existence of bacterial flagella and K: the probability of Dembski's existing in a world without bacterial flagella is very low, since history would have gone very differently, and Dembski would probably never have come into existence.

Or is K the event of some knower or other having the identification of f? Then, to evaluate the dependence between K and the existence of bacterial flagella we would have to examine the almost intractable question of what a world without bacterial flagella would have been like.