I tell you: "I've been thinking about philosophy of language." Where is my token utterance? (With apologies to Dennett.)
The obvious answer is that my utterance is in the air between us, constituted by patterns of higher and lower density of air molecules. It is a temporally extended entity, perhaps an event.
But where in the air? The waves can be detected all around me, at many different points. And suppose I am talking to you through a wall. The time-varying patterns of air pressure changes produce pressure changes in the wall, which in turn result in more vibrations of the air on the other side of the wall, all the way into your ears. We can't exclude the wall--we shouldn't say that the utterance straddles the wall. Plus, we could talk, with great difficulty and poor audibility (glug, glug, glug), under water, and we wouldn't want to exclude the water, so we shouldn't limit the utterance to the air. And I could attach my eardrum right to the wall with a metal rod and hear the utterance in the wall, though I don't recommend the experiment. So the utterance is found in the wall, too.
But what if we're talking on cell phones? The sound waves get converted into movements of a magnet in a coil, thence into electricity, then into radio waves, and then into electricity in wires, light in fiber optics and/or radio waves going to and from satellites, and then back into radio waves, electricity, movements of a magnet, and finally sound waves again. Perhaps my utterance stops at my phone's microphone. But you hear my utterance when you listen at your phone. So my utterance is found on both ends of the communication. Does it straddle the electromagnetic middleman, or is it there, too, just as it was in the wall? Should we limit the utterance to vibrations of matter? This does not seem plausible once we've noticed that utterances are found in walls, but let me try a different tack.
I could talk with you even if you were deaf and knew how to lip read. In lip reading, you don't perceive the vibrations. Rather, you perceive the shape of the mouth that helps form the vibrations. If we do not count an utterance as found in the shape of the mouth, then utterances become inessential for human communication, which would be absurd. Now, perhaps you could say that when I am speaking with a lip reader, I am using a different modality--I am lipping. However, I could speak to several people, and unbeknownst to me one of them is a lip reader. Each of them perceives the same utterance of mine, and I speak in the same modality to them all. It's just that the lip reader accesses my utterance differently from those who hear it. Plus, we could speak to each other subvocally.
If my utterance is found in the wall and on the lips and in subvocal communication, it will also be found in the electromagnetic modalities of cell phone conversations. Now it looks like the utterance includes the whole causal process mediating between the speaker and the listener, broadly understood.
But what if there is no listener? "My hearing must be poor as I didn't hear what she said." Yet there is something she uttered. There are unheard utterances. But how far do they go? As I speak at the bottom of a high rocky mountain, the vibrations spread throughout the rock. By the time they reach the top, no human can hear them, but perhaps an alien with very sensitive ears can. We could try to bite the bullet and deny that there are unheard utterances. It takes a speaker and a listener to make an utterance (they might be the same if the speaker is speaking to herself). But that doesn't seem right. Is it really true that the alien's starting to listen at the top of the mountain suddenly makes the vibrations in the rock have been part of the utterance? After all, the vibrations happened before the alien listened, since they take non-zero time to propagate.
So it seems my utterance is throughout the rock, wherever an alien might listen in. If so, then more generally, we need to say that my utterance is everywhere in the universe where in principle one could decode it. If I am speaking over a cell phone, the electromagnetic radiation spreads throughout the forward light cone of the communication, and goes on for light-years into space, perhaps being in principle decodable for quite a great distance. Our utterances, then, are really large and go far--we cannot stop what we said. (This reminds one of James 3:5.)
The proposal, then, is that my utterance token is wherever there is sufficient information allowing decoding. But what is it to decode what I said? It seems to be to classify the utterance token under its type. "Oh, he was saying that we should develop the theory, not that we should devil-up the theory!" (One of my grad students had this kind of aha! moment; my odd accentuation of "develop" made him think through much of the fall semester that I was metaphorically talking of devilling up theories.) So, an utterance token is wherever it can be decoded into an utterance type. But of course, by this we mean correctly decoded. So we need a notion of the correct decoding of an utterance token.
Suppose I misspeak, and say: "Snow is right." You say: "Did I hear you correctly? Did you say 'Snow is right'?" I say: "Yes, but I misspoke. I meant to say that snow is white." So, in this case, it seems that the correct decoding is "Snow is right", even though I meant to say that snow is white. The sound from my misspeaking propagates through rocks and up a mountain whereon an alien is listening. At that point, the alien hears "Snow is white." The alien mishears, and if there is no way to correctly decode where the alien is, there is no utterance there.
So when we try to evaluate what the correct decoding of an utterance token is, we don't look in the speaker's mind but we don't look very far away either. Maybe the idea is this: the correct decoding of an utterance token is the one that we would get in a normal environment. But that's not right, either. For a speaker can compensate for an abnormal environment. If there is some weird background noise which is making rs be heard as ws and ws as rs, the speaker can move her vocal cords in a way that in a normal environment would produce the utterance "Snow is right" but under the circumstances produces the utterance "Snow is white." And she isn't misspeaking.
Perhaps, then, the story is this. The correct decoding of an utterance token is that decoding which the intended (expected?) listener would get if she and the communication were functioning in the way the speaker's implicit or explicit model of the listener's functioning and environment says they function. (What if the speaker intends two people to hear different utterances, for instance because they have different hearing impairments? Then the speaker makes two different utterances.)
Our story about utterance tokens and decodings is getting complex. Where the utterance is depends on what its correct decoding is. What its correct decoding is depends both on the speaker's model of the communication process and on what the speaker in fact produces.
That's the best I can do for utterances, and it's not so bad, I think. But at this point, or even earlier, the concept of an utterance is apt to seem rather unnatural. Such a messy mishmash of the intentional and external shouldn't be central to our concept of language. Instead of talking of utterances, we should simply talk of the causal mediation between the mental states of people: e.g., between the speaker x's intending to communicate A to y and the listener y's having apparently had A communicated by x (here, A is the message; in some cases we can model it as an ordered pair of a proposition and an illocutionary force). Sometimes things misfire. Aliens listen in who weren't intended to. (Caveat audiens is particularly applicable when the audiens is not intended to be an audiens.) Sounds get made that aren't intended. These are defective cases of communication. Classifying certain kinds of defects, like defects of decoding, may require a notion of utterance. But concepts that appear in the analysis of defects should not be expected to be particularly natural. We should, rather, start our analysis with the correctly functioning case, the case of proper communication. The philosophically crucial thing is that we have mental states of intending to communicate and being apparently communicated to, typically in different people, and what we call "language" is the story about the connection between these mental states.