Consider two random variables, X and Y, whose probability densities p_X and p_Y are shown in the following graph, with p_X(x)=1, in blue, and p_Y(x)=2x, in red.
Looking at the graph, it is tempting to say things like this: X has a uniform distribution, with equal probability of taking any value between 0 and 1, while for Y values close to 0 are much less likely than values close to 1. We might even look at the graph and say things like: P(X=0.1)=P(X=0.2) while P(Y=0.2)>P(Y=0.1).
Of course, with these continuous distributions, classical probability theory assigns every value the same probability, namely zero: P(X=a)=P(Y=a)=0 for all a. But this seems wrong, and so we may want to bring in infinitesimals to remedy this, assigning to P(Y=0.2) an infinitesimal twice as big as the one we assign to P(Y=0.1), while P(X=0.2)=P(X=0.1).
Or we might attempt to express the pointwise non-uniformity of Y by using conditional probability P(Y=0.2|Y=0.1 or Y=0.2)=2/3 and P(Y=0.1|Y=0.1 or Y=0.2)=1/3, while P(X=0.2|X=0.1 or X=0.2)=1/2=P(X=0.1|X=0.1 or X=0.2).
In other words, it is tempting to say: X is pointwise uniform while Y is not.
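One way to make the conditional-probability reading concrete (a minimal numerical sketch of my own, not from the post) is to take the conditional probabilities as limits of ratios over shrinking intervals around 0.1 and 0.2, using the CDFs F_X(t)=t and F_Y(t)=t^2 on [0,1]:

```python
# Sketch: conditional probabilities as limits over shrinking intervals.
# On [0,1], F_X(t) = t and F_Y(t) = t^2 (density 2t).

def interval_prob(cdf, a, eps):
    """P(a - eps < V <= a + eps) for a variable with the given CDF."""
    return cdf(a + eps) - cdf(a - eps)

F_X = lambda t: t        # uniform CDF on [0, 1]
F_Y = lambda t: t * t    # CDF with density 2t on [0, 1]

for eps in (1e-2, 1e-4, 1e-6):
    for name, cdf in (("X", F_X), ("Y", F_Y)):
        near_01 = interval_prob(cdf, 0.1, eps)
        near_02 = interval_prob(cdf, 0.2, eps)
        ratio = near_02 / (near_01 + near_02)
        print(f"eps={eps:g}  P({name}~0.2 | {name}~0.1 or {name}~0.2) ~= {ratio:.4f}")

# As eps -> 0 the ratio tends to 1/2 for X and 2/3 for Y,
# matching the conditional-probability reading above.
```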
Such pointwise thinking is problematic, however. For I could have generated Y by taking our uniformly distributed random variable X and setting Y=X^(1/2). (It's an easy exercise to see that if X is uniform then the probability density of X^(1/2) is given by p(x)=2x.) Suppose that I am right in what I said about the uniformity of pointwise and conditional probabilities for X. Then P(Y=0.1)=P(X=0.01)=P(X=0.04)=P(Y=0.2). And P(Y=0.2|Y=0.1 or Y=0.2)=P(X=0.04|X=0.01 or X=0.04)=1/2=P(X=0.01|X=0.01 or X=0.04)=P(Y=0.1|Y=0.1 or Y=0.2), since Y=0.1 if and only if X=0.01 and Y=0.2 if and only if X=0.04.
So in fact, Y could have the nonuniform distribution of the red line in the graph and yet be just as pointwise uniform as X.
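Here is a quick Monte Carlo sketch (my addition, not the author's) of the parenthetical claim that if X is uniform then Y = X^(1/2) has density 2y:

```python
# Check by simulation: Y = sqrt(X) with X ~ Uniform[0,1] has density p(y) = 2y.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000)   # X ~ Uniform[0, 1]
y = np.sqrt(x)              # Y = X^(1/2)

# Compare empirical interval probabilities with what the density 2y predicts:
eps = 0.01
for a in (0.1, 0.2, 0.9):
    empirical = np.mean((y > a - eps) & (y < a + eps))
    predicted = 2 * a * 2 * eps   # density 2a times interval length 2*eps
    print(f"P({a-eps:.2f} < Y < {a+eps:.2f}): empirical {empirical:.4f}, "
          f"2y prediction {predicted:.4f}")
```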
Lesson 1: It is a mistake to describe a uniform distribution on a continuous set as one "where every outcome is equally likely". For even if one finds a way of making nontrivial sense of this, by infinitesimals or conditional probabilities, say (and I think similar arguments will work for any other plausible characterization), a nonuniform distribution can satisfy this constraint just as happily.
Lesson 2: One cannot characterize continuous distributions by facts about pointwise probabilities. It is tempting to characterize the uniform distribution by P(X=a)=P(X=b) (infinitesimal version, but similarly for conditional probabilities) and the nonuniform one by P(Y=a)=(a/b)P(Y=b). But in fact both could have the same pointwise properties. I find this lesson deeply puzzling. Intuitively, it seems that chances of aggregate outcomes (like the chance that X is between 0.1 and 0.2) should come out of pointwise chances. But no.
The converse characterization would also be problematic: pointwise facts can't be derived from the distribution facts. For imagine a random variable Z which is such that Z=X unless X=1/2, and Z=1/4 if X=1/2 (cf. this paper). This variable has the same distribution as X, but it has obviously different pointwise probability facts.
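A small simulation sketch (assumptions mine) makes this last point vivid: modifying X on the single point 1/2 is invisible to anything distributional, since a continuous sample essentially never lands on that point.

```python
# Sketch: Z agrees with X except on the probability-zero event {X = 1/2},
# so every distributional quantity of Z matches X's.
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(1_000_000)
z = np.where(x == 0.5, 0.25, x)   # the pointwise modification; in practice it never fires

# Empirical CDFs coincide at every threshold we try:
for t in (0.1, 0.25, 0.5, 0.9):
    print(f"t={t}: F_X(t) ~= {np.mean(x <= t):.4f}, F_Z(t) ~= {np.mean(z <= t):.4f}")
```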
Regarding the final comment, in fact it's possible to get a random variable whose distribution is the same as Y's but whose pointwise infinitesimal (and I expect conditional--but I haven't worked that out) probabilities are plausibly more like we'd expect from the graph.
Let X be uniform and let X' be an independent copy of X. Let α = P(X=a) for all a, where α is infinitesimal. Let Y' = max(X,X'). Then it's pretty easy to check that Y' has the same probability distribution as Y. But pointwise, Y' is rather different from Y, when Y is as in the post. For it's easy to check that if we allow these infinitesimal calculations, then P(Y'=a) = 2aα + O(α^2), so to a first order approximation in α, we have exactly what the graph would lead us to think. On the other hand, P(Y=a) = α for all a.
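For what it's worth, here is a simulation sketch (mine, not the commenter's) of the distributional half of this claim: max(X,X') has CDF y^2 on [0,1], hence the same density 2y as Y in the post.

```python
# Check by simulation: Y' = max(X, X') with X, X' independent uniforms
# has CDF P(Y' <= t) = t^2 on [0, 1].
import numpy as np

rng = np.random.default_rng(2)
x = rng.random(1_000_000)
x2 = rng.random(1_000_000)        # independent copy X'
y_prime = np.maximum(x, x2)       # Y' = max(X, X')

for t in (0.1, 0.2, 0.5, 0.9):
    print(f"P(Y' <= {t}): empirical {np.mean(y_prime <= t):.4f}, t^2 = {t*t:.4f}")
```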
I wonder if the pointwise data carries some additional information about a random process beyond what is carried by the distributional data. Or if the pointwise stuff is all nonsense. My previous work on infinitesimals suggests the pointwise stuff is junk, but perhaps the conditional probability version works better.
I know nothing about probability, and I can’t follow your whole post, but here is a probably worthless idea.
We are confronted with two random variables X and Y, and we wonder how to think about them probabilistically. We do it this way: take a random sample of N values of X, where N is some finite number. The exact size of N doesn’t matter. Call the (non-continuous) variable describing the sample X*. Do the same for Y and Y*. I think it is now true to say that P(X*=0.1)=P(X*=0.2) while P(Y*=0.2)>P(Y*=0.1).
Then we say that for continuous variables, the probability properties are the properties of finite random samples of the variables.
This accords with what we intuitively want to say about the variables. I don’t know what all it would mess up.
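Reading the proposal charitably (my interpretation: exact values like 0.1 essentially never occur in a continuous sample, so the claim is presumably about counts in small neighborhoods of 0.1 and 0.2), a quick simulation does show the contrast the comment is after:

```python
# Sketch of the finite-sample idea, interpreted as counts in small
# neighborhoods of 0.1 and 0.2 rather than exact equality.
import numpy as np

rng = np.random.default_rng(3)
N, eps = 100_000, 0.01
x_star = rng.random(N)              # finite sample of X (uniform)
y_star = np.sqrt(rng.random(N))     # finite sample of Y (density 2y)

def count_near(sample, a, eps):
    """Number of sample points within eps of a."""
    return int(np.sum(np.abs(sample - a) < eps))

print("X*: near 0.1 ->", count_near(x_star, 0.1, eps),
      " near 0.2 ->", count_near(x_star, 0.2, eps))
print("Y*: near 0.1 ->", count_near(y_star, 0.1, eps),
      " near 0.2 ->", count_near(y_star, 0.2, eps))
# For X* the two counts come out about equal; for Y* the count near 0.2
# is about twice the count near 0.1, matching the intuitive reading of the graph.
```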