## Tuesday, April 10, 2012

### Top-down and bottom-up syntax

There are two fundamentally different approaches to syntax. One way starts at the bottom, with fundamental building blocks like names, variables and predicates, and thinks of a sentence as built up out of these by applying various operators. Thus, we get "The cat is on the mat and the dog is beside the mat" from elements like "the cat", "is on", "the mat", "the dog" and "is beside", by using operators like conjunction and binary-predication:

(1) "The cat is on the mat and the dog is beside the mat" = conjunction(binary-predication("is on", "the cat", "the mat"), binary-predication("is beside", "the dog", "the mat")).

We can then parse the sentence back down into the elements it came from by inverting the operators (and if the operators are many-to-one there will be parsing ambiguity).
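Here is a minimal Python sketch of the bottom-up picture. It assumes a made-up lexicon of three names and two predicates, and it treats sentences as plain strings. The operators `binary_predication` and `conjunction` build sentences. The `parse` function works by inverting them. Because conjunction is many-to-one on strings, a three-conjunct sentence comes back with two construction histories.

```python
from itertools import product

# Toy lexicon: an illustrative assumption, not a grammar of English.
NAMES = ["the cat", "the dog", "the mat"]
BINARY_PREDICATES = ["is on", "is beside"]

def binary_predication(pred, subj, obj):
    """Bottom-up operator: build a sentence from a predicate and two names."""
    return f"{subj} {pred} {obj}"

def conjunction(p, q):
    """Bottom-up operator: build a conjunction out of two sentences."""
    return f"{p} and {q}"

def invert_binary_predication(sentence):
    """All (predicate, subject, object) triples the operator maps to sentence."""
    return [(f, a, b)
            for f, a, b in product(BINARY_PREDICATES, NAMES, NAMES)
            if binary_predication(f, a, b) == sentence]

def parse(sentence):
    """Parse by inverting the operators and return every construction history.
    More than one result means the operators were many-to-one (ambiguity)."""
    results = [("binary-predication",) + t
               for t in invert_binary_predication(sentence)]
    parts = sentence.split(" and ")
    for i in range(1, len(parts)):  # try inverting conjunction at each " and "
        left, right = " and ".join(parts[:i]), " and ".join(parts[i:])
        for lt in parse(left):
            for rt in parse(right):
                results.append(("conjunction", lt, rt))
    return results

s = conjunction(binary_predication("is on", "the cat", "the mat"),
                binary_predication("is beside", "the dog", "the mat"))
print(len(parse(s)))  # 1
t = conjunction(s, binary_predication("is on", "the dog", "the cat"))
print(len(parse(t)))  # 2: the grouping of the three conjuncts is ambiguous
```

Note that `parse` can only find analyses whose pieces are in the lexicon. That restriction is what causes trouble in points B and C.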

The other approach starts at the top with a sentence (or, more generally, well-formed formula) and then parses it by using parsing relations like conjoins (e.g., "p and q" conjoins "p" and "q") or binarily-applies (e.g., "the cat is on the mat" binarily-applies "is on" to "the cat" and "the mat").
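For contrast, here is a toy sketch of the top-down relations, again assuming that sentences are plain strings. `conjoins` and `binarily_applies` are predicates on a given sentence and its candidate parts. The hypothetical helper `conjunct_pairs` starts from the sentence and lists the ways it conjoins two strings. No lexicon is consulted, so the parts need not be built from anything.

```python
def conjoins(s, p, q):
    """Top-down relation: sentence s conjoins p and q.
    Nothing requires p or q to come from a lexicon."""
    return s == f"{p} and {q}"

def binarily_applies(s, pred, subj, obj):
    """Top-down relation: s binarily-applies pred to subj and obj."""
    return s == f"{subj} {pred} {obj}"

def conjunct_pairs(s, sep=" and "):
    """Start from the sentence: every (p, q) such that conjoins(s, p, q)."""
    pairs = []
    i = s.find(sep)
    while i != -1:
        pairs.append((s[:i], s[i + len(sep):]))
        i = s.find(sep, i + 1)
    return pairs

s = "the cat is on the mat and the dog is beside the mat"
print(conjunct_pairs(s))
# [('the cat is on the mat', 'the dog is beside the mat')]
```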

There are four reasons I know of for preferring the top-down approach.

A. The possibility of multiple ways of expressing the same structure. For instance, "p and q" conjoins "p" and "q", but it's not the only way of conjoining them: "p but q" also conjoins "p" and "q". The bottom-up approach can handle this by having multiple conjunction operators, like conjoin-with-and, conjoin-with-but and conjoin-with-and-also, but then we need to introduce a higher-order property of these operators that says that they are conjunctions. Moreover, we should not posit separate operators where the meaning is the same, and sometimes the meaning will be exactly the same.
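Here is a sketch of point A under the same string assumptions. On the bottom-up side, each surface form needs its own operator. It also needs a higher-order registry marking those operators as conjunctions; `CONJUNCTION_OPERATORS` is a hypothetical name for it. On the top-down side, a single `conjoins` relation is simply satisfied by several connectives.

```python
# Bottom-up: one operator per surface form...
def conjoin_with_and(p, q):
    return f"{p} and {q}"

def conjoin_with_but(p, q):
    return f"{p} but {q}"

def conjoin_with_and_also(p, q):
    return f"{p} and also {q}"

# ...plus a higher-order property saying which operators are conjunctions.
CONJUNCTION_OPERATORS = {conjoin_with_and, conjoin_with_but, conjoin_with_and_also}

def is_conjunction(op):
    return op in CONJUNCTION_OPERATORS

# Top-down: one conjoins relation, satisfied by several surface forms.
CONNECTIVES = (" and ", " but ", " and also ")

def conjoins(s, p, q):
    return any(s == p + c + q for c in CONNECTIVES)

print(conjoins("p but q", "p", "q"))  # True
print(conjoins("p or q", "p", "q"))   # False
```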

B. Partial sense. There is no way of forming the sentence

(2) 2+2=5 and the borogove is mimsy

in the bottom-up approach, because "borogove" is not a noun of English and "is mimsy" is not a predicate of English, so there is nothing to plug into a unary-predication operator to form the second conjunct. But on the top-down approach, we can do a first step of parsing the sentence: (2) conjoins "2+2=5" and "the borogove is mimsy". Since we know that the first conjunct is false, we can conclude that (2) isn't true before we even start asking whether the second conjunct makes sense.
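Here is a toy illustration of point B. It assumes that only sentences of the form `a+b=c` get truth values; everything else simply lacks sense and evaluates to `None`. The evaluator parses top-down. It first checks whether the sentence conjoins two parts. It returns false as soon as one conjunct is false, without evaluating the other. Treating "false and senseless" as false is an assumption of this sketch; it matches strong Kleene conjunction.

```python
import re

# Toy atomic sentences: only "a+b=c" over natural numbers gets a truth value.
ARITHMETIC = re.compile(r"\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)\s*")

def evaluate_atomic(s):
    """True/False for arithmetic sentences; None when no sense is assigned."""
    m = ARITHMETIC.fullmatch(s)
    if m is None:
        return None  # e.g. "the borogove is mimsy"
    a, b, c = (int(g) for g in m.groups())
    return a + b == c

def evaluate(s):
    """Top-down: first ask whether s conjoins p and q, then look at the parts."""
    p, sep, q = s.partition(" and ")
    if not sep:
        return evaluate_atomic(s)
    left = evaluate(p)
    if left is False:
        return False  # q is never examined
    right = evaluate(q)
    if right is False:
        return False
    if left is True and right is True:
        return True
    return None  # neither conjunct is false, but at least one lacks sense

print(evaluate("2+2=5 and the borogove is mimsy"))  # False
```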

C. Ungrammatical sentences. The bottom-up approach has no way of making sense of ungrammatical sentences like a non-native speaker's

(3) Jane love Bob.

For there is no predicate F such that the sentence is equal to binary-predication(F, "Jane", "Bob"), so there is no way of parsing it. The top-down approach, however, is not committed to all sentences coming from the application of specified predicates, so it can say that (3) binarily-applies "loves" to "Jane" and "Bob", school-marmish opinions to the contrary notwithstanding. The bottom-up approach can handle ungrammatical sentences in two different ways. One way is to suppose that any particular ungrammatical sentence is in fact a mistaken version of a grammatically correct sentence. Maybe that's true for (3), but I doubt that this is tenable for the full range of understandable but grammatically incorrect sentences. The second way is to include a range of ungrammatical operators, such as binary-predication-dropping-suffix-s. This is not satisfactory: there are too many of them.
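Here is a sketch of the bottom-up predicament in point C, assuming a toy lexicon of two names and two verbs. Inverting the grammatical operator finds no parse of "Jane love Bob." A hypothetical `binary_predication_dropping_suffix_s` operator recovers one. Every further kind of error, though, would need yet another operator.

```python
from itertools import product

# Toy lexicon: an illustrative assumption.
NAMES = ["Jane", "Bob"]
BINARY_PREDICATES = ["loves", "sees"]

def binary_predication(f, a, b):
    """Grammatical bottom-up operator."""
    return f"{a} {f} {b}."

def binary_predication_dropping_suffix_s(f, a, b):
    """Hypothetical 'ungrammatical operator': drops the verb's final -s."""
    verb = f[:-1] if f.endswith("s") else f
    return f"{a} {verb} {b}."

def invert(op, sentence):
    """Every (F, subject, object) that op maps to sentence."""
    return [(f, a, b)
            for f, a, b in product(BINARY_PREDICATES, NAMES, NAMES)
            if op(f, a, b) == sentence]

print(invert(binary_predication, "Jane love Bob."))  # []
print(invert(binary_predication_dropping_suffix_s, "Jane love Bob."))
# [('loves', 'Jane', 'Bob')]
```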

D. Extensibility. It's an oversimplification to think that a sentence that applies a predicate is formed simply out of the predicate and its arguments by means of a predication operator. There are other elements that need to be packaged up into the sentence, such as emphasis, degree of confidence, connotation, etc. These may be conveyed by tone of voice, context or choice of "synonym". One could handle this in two ways on the bottom-up view. One way is to add additional argument slots to the predication operators, slots for emphasis type, confidence, connotation, etc. This is messy, because as we discover new features of our language, we will have to keep on revising the arity of these operators. The second approach is to suppose that a sentence is formed by applying additional operators, such as an emphasis operator or a confidence operator, after applying, say, the last predication operator. Thus, a particular instance of "Socrates is wise" might be the result of:

(4) confidence(emphasis(predication("Socrates", "is wise"), 3.4), .98).

But now we can't take the resulting sentence and directly parse it into subject and predicate by simply inverting the predication operator. We first have to invert the confidence operator, and then we have to invert the emphasis operator. In other words, parsing requires inverting a large number of other operators. On the top-down approach, however, this is easy. For if S is our confidenced and emphasized token of "Socrates is wise", then applies(S, "is wise", "Socrates"). No need to invert several additional operators to say that. If we are interested in the other features of S, however, then we can see what other parsing predicates, such as has-confidence, can be applied to S. But that's optional. Because we are not, in principle, parsing by inverting compositional operators, we don't need to worry about the other operators when we don't care about that aspect of the communicative content.
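Here is a sketch of point D. It assumes dataclasses as stand-ins for the confidence, emphasis and predication operators in the "Socrates is wise" example, and a hypothetical `Token` record for the top-down side. `invert_to_predication` has to peel off the confidence and emphasis layers before it reaches the subject and predicate. `applies` reads those directly off the token. `has_confidence` is consulted only if we care about that feature.

```python
from dataclasses import dataclass, field

# Bottom-up: each feature is another operator wrapped around the predication.
@dataclass(frozen=True)
class Predication:
    subject: str
    predicate: str

@dataclass(frozen=True)
class Emphasis:
    inner: object
    level: float

@dataclass(frozen=True)
class Confidence:
    inner: object
    level: float

def invert_to_predication(term):
    """Peel off every wrapping operator before predication can be inverted."""
    peeled = []
    while not isinstance(term, Predication):
        peeled.append(type(term).__name__)
        term = term.inner
    return term, peeled

# Top-down: a token with optional features, and parsing relations over it.
@dataclass
class Token:
    text: str
    features: dict = field(default_factory=dict)

def applies(s, predicate, subject):
    """Holds regardless of the token's emphasis, confidence, etc."""
    return s.text == f"{subject} {predicate}"

def has_confidence(s, level):
    """An optional further parsing relation, consulted only if we care."""
    return s.features.get("confidence") == level

term = Confidence(Emphasis(Predication("Socrates", "is wise"), 3.4), .98)
S = Token("Socrates is wise", {"emphasis": 3.4, "confidence": .98})
print(invert_to_predication(term)[1])  # ['Confidence', 'Emphasis']
print(applies(S, "is wise", "Socrates"))  # True
```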

There is also a downside to the top-down approach. Because of point C, we have no way of codifying its parsing predicates, like binarily-applies, for natural languages. That, I think, is exactly how it should be.