Monday, November 30, 2020

Independence, uniformity and infinitesimals

Suppose that a random variable X is uniformly distributed (in some intuitive sense) over some space. Then:

  1. P(X = y)=P(X = z) for any y and z in that space.

But I think something stronger should also be true:

  2. Let Y and Z be any random variables taking values in the same space as X, and suppose each variable is independent of X. Then P(X = Y) = P(X = Z).

Fixed constants are independent of X, so (1) follows from (2).

But if we have (2), and the plausible assumption:

  3. If X and Y are independent, then X and f(Y) are independent for any function f,

we cannot have infinitesimal probabilities. Here’s why. Suppose X and Y are independent random variables uniformly distributed over the interval [0, 1). Assume P(X = a) is infinitesimal for a in [0, 1). Then so is P(X = Y), since by (2) it equals P(X = a) for any constant a.

Let f(x) = 2x for x < 1/2 and f(x) = 2x − 1 for 1/2 ≤ x. Since X and Y are independent, so are X and f(Y), by (3). Thus, by (2):

  4. P(X = Y) = P(X = f(Y)).

Let g(x) = x/2 and let h(x) = (1 + x)/2. By the same reasoning, with the roles of X and Y exchanged:

  5. P(Y = g(X)) = P(Y = X)

  6. P(Y = h(X)) = P(Y = X).

But now notice that:

  7. Y = g(X) if and only if X = f(Y) and Y < 1/2

  8. Y = h(X) if and only if X = f(Y) and 1/2 ≤ Y

  9. (Y = g(X) or Y = h(X)) if and only if X = f(Y)

and note that we cannot have both Y = g(X) and Y = h(X). Hence:

  10. P(X = Y) = P(X = f(Y)) = P(Y = g(X)) + P(Y = h(X)) = P(Y = X) + P(Y = X) = 2P(X = Y),

and so:

  11. P(X = Y) = 0,

which contradicts the infinitesimality of P(X = Y).
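The three biconditionals relating f, g and h can be spot-checked mechanically. What follows is only a minimal sketch, not part of the argument itself: it tests the identities on a finite grid of exact rationals in [0, 1) (the grid and the name `biconditionals_hold` are illustrative assumptions), so it says nothing about probabilities, only about the functions.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def f(x):
    # Doubling map on [0, 1): 2x for x < 1/2, and 2x - 1 for 1/2 <= x.
    return 2 * x if x < HALF else 2 * x - 1

def g(x):
    # Maps [0, 1) onto [0, 1/2).
    return x / 2

def h(x):
    # Maps [0, 1) onto [1/2, 1).
    return (1 + x) / 2

def biconditionals_hold(points):
    """Check the three 'if and only if' claims for every pair (x, y) in points."""
    for x in points:
        for y in points:
            via_g = (y == g(x))
            via_h = (y == h(x))
            hit = (x == f(y))
            if via_g != (hit and y < HALF):
                return False
            if via_h != (hit and y >= HALF):
                return False
            if via_g and via_h:  # the two cases must be mutually exclusive
                return False
            if (via_g or via_h) != hit:
                return False
    return True

# Illustrative grid: the sixteen rationals k/16 in [0, 1).
grid = [Fraction(k, 16) for k in range(16)]
print(biconditionals_hold(grid))
```

Exact rationals are used rather than floats so that the equality tests are not affected by rounding.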

This argument works for any uniform distribution on an infinite set U. Just let A and B be a partition of U into two subsets of the same cardinality as U (this uses the Axiom of Choice). Let g be a bijection from U onto A and h a bijection from U onto B. Let f(x) = g⁻¹(x) for x ∈ A and f(x) = h⁻¹(x) for x ∈ B.
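For a concrete instance of this construction, take U to be the natural numbers, A the even numbers and B the odd numbers. This particular choice is an assumption made here for illustration; in this countable case the partition and the bijections are explicit. A minimal sketch, checking the required properties on an initial segment only:

```python
def g(n):
    # Bijection from U = {0, 1, 2, ...} onto A, the even numbers.
    return 2 * n

def h(n):
    # Bijection from U onto B, the odd numbers.
    return 2 * n + 1

def f(n):
    # f is the inverse of g on A and the inverse of h on B.
    # Both inverses are floor division by 2.
    return n // 2

def check(limit):
    """Verify the construction for all arguments below limit."""
    for x in range(limit):
        # f undoes both g and h.
        if f(g(x)) != x or f(h(x)) != x:
            return False
        # A and B are disjoint, so g(x) and h(x) never coincide.
        if g(x) == h(x):
            return False
    for y in range(limit):
        # Each y lies in exactly one of A and B.
        if (y == g(f(y))) == (y == h(f(y))):
            return False
    return True

print(check(1000))
```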

Note: We may wish to restrict (3) to intuitively “nice” functions, ones that don’t introduce non-measurability. The functions in the initial argument are “nice”.


KelâmM said...

Hi Alex
I know my question is not relevant, but I am curious, what would be your advice to philosophy students (in short) and how much time do you spend on philosophy before or now. Thank you :)

IanS said...

The set {X = Y} is non-measurable in the natural product measure.

The post could be read as an indirect argument for this. But you can see it directly. The base sets of the natural product measure are of the form (measurable set on X) x (measurable set on Y). The full algebra of measurable sets is the minimal algebra containing this collection. This won’t contain any sloping lines.

A simple example. Suppose ε is an infinitesimal. Say that any finite subset of [0, 1) of size n has measure nε, that its complement has measure 1 – nε, and that all other subsets of [0, 1) are non-measurable. This is an incomplete but finitely additive measure on [0, 1). The natural product algebra consists of ‘small’ sets of n vertical lines and m horizontal lines (n and m finite, possibly zero), with a finite number of points added and a finite number of other points removed, and ‘large’ sets, the complements of the small sets. It contains no sloping lines.
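The one-dimensional measure in this example can be sketched in code. The sketch below is only an illustration under stated assumptions: a value a + bε is stored as the pair (a, b), and the class names `Hyper` and `Subset` are hypothetical. It covers the finite and cofinite subsets of [0, 1) only, not the product algebra.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Hyper:
    """The value real + coeff * eps, for a fixed positive infinitesimal eps."""
    real: int
    coeff: int

    def __add__(self, other):
        return Hyper(self.real + other.real, self.coeff + other.coeff)

@dataclass(frozen=True)
class Subset:
    """A finite subset of [0, 1), or (if cofinite) the complement of one."""
    points: frozenset
    cofinite: bool = False

    def measure(self):
        n = len(self.points)
        # Finite sets get n * eps; their complements get 1 - n * eps.
        return Hyper(1, -n) if self.cofinite else Hyper(0, n)

    def disjoint_from(self, other):
        if self.cofinite and other.cofinite:
            return False  # two cofinite subsets of an infinite set always meet
        if not self.cofinite and not other.cofinite:
            return not (self.points & other.points)
        finite, co = (other, self) if self.cofinite else (self, other)
        # The finite set must lie inside the points removed from the cofinite set.
        return finite.points <= co.points

    def union(self, other):
        if not self.cofinite and not other.cofinite:
            return Subset(self.points | other.points)
        if self.cofinite and other.cofinite:
            return Subset(self.points & other.points, cofinite=True)
        finite, co = (other, self) if self.cofinite else (self, other)
        return Subset(co.points - finite.points, cofinite=True)

a = Subset(frozenset({Fraction(1, 4), Fraction(1, 2)}))
b = Subset(frozenset({Fraction(3, 4)}))
print(a.union(b).measure() == a.measure() + b.measure())
```

Finite additivity then amounts to the check in the last line: for disjoint measurable sets, the measure of the union equals the sum of the measures.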

Of course, you can extend the natural product measure by giving the set {X = Y} some suitable value (maybe ε, as in the post, maybe something else). But then you would not expect the transformation properties to hold.

Alexander R Pruss said...

The diagonal line is measurable in the classical product measure: it has measure zero. Generally, I think, the infinitesimalist wants to *extend* classical measures rather than contract them.

Note that a lot of the time when we are talking of {X = y} (for a constant y), it seems we really mean something like {X = Y} (for a random variable Y). Here's what I mean. Let's say that I have a point marked on a target (e.g., it's the center of some small perfect circle), and then I uniformly randomly throw a perfectly sharp (or symmetrical) dart at the target. Let's say I think there is an infinitesimal probability that I hit the marked point. But when I say "the marked point", that's really a random variable. I don't know the actual exact coordinates of the marked point. All I really have is some continuous distribution over some small area. So, when I ask how likely it is that the dart hit "the marked point", I am really asking for a probability of {X = Y} where Y has some continuous distribution over a small area.

Alexander R Pruss said...

KelâmM: I don't have any sage advice on this, sorry. Everybody is different.

Alexander R Pruss said...

BTW, this also works for qualitative probabilities with the additivity assumption that if A and B are disjoint and A~B~(A union B), then A~emptyset. The conclusion then is that (X=x)~emptyset. And it works for primitive conditional probabilities: there is no way to define conditional probabilities for all non-empty sets while respecting the assumptions.

Alexander R Pruss said...

Here's an argument for premise 2. Suppose a uniform lottery will pick a winning number, and you need to guess it ahead of time. You shouldn't be able to game the system without information about the winning number: all guessing strategies should be equally bad. But if P(X=Y)>P(X=a) for some Y independent of X and a constant a, then you could game the system by choosing your winning number using Y (as long as you didn't look at it after you chose it, for if you looked at it, you'd go back down to P(X=b) for some constant b, weird as that is!). Similarly, if the lottery picks a losing number, you shouldn't be able to use any strategy to decrease your probability of getting that losing number, so P(X=Y)<P(X=a) is also not possible.

Alexander R Pruss said...

Here's another reason the nonmeasurability move is not plausible. One particular embodiment of this scenario is as two independent countably infinite sequences of coin tosses, each indexed by the natural numbers. Then the event that the two sequences are the same sequence can be thought of as follows: For each natural number n, define a "virtual coin toss": it's heads if the nth toss in the first sequence is the same as the nth toss in the second sequence, and tails otherwise. So now we have a third sequence, this time of virtual coin tosses. If we think (as the infinitesimalist does) that it makes sense to ask for the probability of the first sequence being all heads, it should also make sense to ask for the probability of the third sequence being all heads.

Otherwise, one has an implausible privileging of real coin tosses over virtual ones. But after all, what's a "real" coin toss? There are many ways of implementing it. And tossing two coins, and seeing if they are the same, is just as good a way of implementing a coin toss, and sometimes superior. For instance, when my son and I decide who serves first in ping pong, we sometimes spin our rackets and one of us calls "same" or "different".
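For finite truncations, the claim that virtual tosses behave like real ones can be checked by brute force. The sketch below is an illustration only (the function names are hypothetical, and a finite check cannot settle the infinite case): it enumerates all equally likely pairs of length-n sequences and counts how often each virtual sequence arises.

```python
from collections import Counter
from itertools import product

def virtual_sequence(first, second):
    # 'H' where the two real tosses agree, 'T' where they differ.
    return tuple('H' if a == b else 'T' for a, b in zip(first, second))

def virtual_distribution(n):
    """Count each virtual sequence over all pairs of length-n toss sequences."""
    sequences = list(product('HT', repeat=n))
    return Counter(virtual_sequence(s, t) for s in sequences for t in sequences)

dist = virtual_distribution(3)
# Number of distinct virtual sequences, and the set of their counts.
print(len(dist), set(dist.values()))
```

Every virtual sequence arises from the same number of pairs, since for a fixed first sequence the second is determined by the virtual one. So the truncated virtual sequence is uniformly distributed, just like a real one.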

IanS said...

I’d have said that infinitesimalists are more concerned to refine classical probabilities than to extend them. Classical probabilities can’t discriminate between a singleton set and a two-element set: both have probability zero. Infinitesimals allow you to say that one is twice as likely as the other.

Granted, some approaches (e.g. Benci, Wenmackers) do assign probabilities to all sets, and this has been claimed as a virtue. But it comes at the expense of symmetry. To take an extreme case, if you want full permutation symmetry on a countably infinite set, classical (countably additive) probability can say only that the empty set has probability zero and the whole set has probability 1. With finitely additive real probabilities, you can say that finite sets have probability zero and cofinite sets have probability 1. Using infinitesimals, you can do better: finite sets of size n have probability nε and their complements have probability 1 – nε. None of these approaches gives probabilities to infinite co-infinite sets. This is not a defect; it is forced by the symmetry.

Now cross two independent such setups. With finitely additive real-valued probabilities, the diagonal set has probability zero. (By suitable permutations of X you can produce arbitrarily many disjoint images of the diagonal, so its (real) probability can only be zero.) With infinitesimals the diagonal set is non-measurable in the natural product measure. (The full algebra is given explicitly above.) Further, it can’t even be made measurable by extension. (A suitable permutation of X can move an infinite number of points off the diagonal and leave an infinite number on it. The intersection of the diagonal with its image under this permutation will be an infinite, relatively co-infinite subset of the diagonal. Then simultaneous identical permutations of X and Y can give a paradoxical decomposition.) But this does not make the infinitesimal approach less informative. There are lower bounds (nε^2 for any integer n) and upper bounds (any strictly positive real number). For example, the diagonal set is more likely than any singleton set. Real-valued probability can say only that both the diagonal and the singleton have probability zero.

IanS said...

On the argument for premise (2): Guessing “Y, whatever it turns out as”, is different from guessing a particular number.

Note that if Y and Z can take only a finite number of values, (2) is provably true. Conditional on each possible Y value, P(X = Y) = P(X = a) for some fixed a. So unconditionally P(X = Y) = P(X = a). But if Y can take infinitely many values, non-conglomerability spoils the argument.
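Spelled out, the finite case runs as follows. The notation y_1, ..., y_k for the values of Y is introduced here for illustration. The first step uses only finite additivity, the second uses independence, and the third uses (1):

```latex
\begin{align*}
P(X = Y) &= \sum_{i=1}^{k} P(X = y_i \text{ and } Y = y_i) \\
         &= \sum_{i=1}^{k} P(X = y_i)\, P(Y = y_i) \\
         &= P(X = a) \sum_{i=1}^{k} P(Y = y_i) \\
         &= P(X = a).
\end{align*}
```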

On the ‘virtual coin toss’: The infinitesimalist would say that you have to pick a model that includes everything of interest and stick with it. No doubt you have read the ongoing(!) literature on Williamson’s coin. It goes round and round on this very issue. The classicals argue that the shifted sequence is similar to the original (just as virtual coin tosses are similar to real ones). The infinitesimalists say that you have to pick a model upfront and stick with it. All the authors are smarter and more knowledgeable than I am…

IanS said...

Here are some thoughts on the case of a complete hyperreal distribution, with each singleton having equal probability ε (an infinitesimal), but with no symmetry requirement. I suspect that this is what you originally had in mind, before I confused the issue with symmetry.

The natural product algebra consists of all sets that can be described like this: a union of selected cross product cells from some finite partition on X crossed with some finite partition on Y.

The diagonal set is not in this algebra. (That would require infinite partitions.) So it is not measurable in the natural product measure. But, with no symmetry requirements, you can extend the measure to include it.

There are constraints. The diagonal contains infinitely many points, so nε^2, for any positive integer n, is a lower limit. The smallest possible measurable set formed from an n x n partition that could cover it would have probability n/n^2 = 1/n (in the special case that each set of the X partition had probability exactly 1/n). So 1/n, for any positive integer n, is an upper bound.
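The upper bound can be illustrated with a small computation. This is a sketch under simplifying assumptions that are not in the comment itself: the same n-cell partition is used on both X and Y, the cell probabilities are real (infinitesimal corrections are ignored), and the function names are hypothetical. The cover of the diagonal is then the union of the n diagonal cells.

```python
from fractions import Fraction

def diagonal_cover_probability(cell_probs):
    """Product-measure probability of the union of the diagonal cells A_i x A_i."""
    assert sum(cell_probs) == 1, "cell probabilities must sum to 1"
    return sum(p * p for p in cell_probs)

def equal_partition(n):
    # n cells, each with probability exactly 1/n.
    return [Fraction(1, n)] * n

# Equal cells give n * (1/n)^2 = 1/n.
print(diagonal_cover_probability(equal_partition(5)))

# An unequal partition into three cells gives a larger cover than 1/3.
uneven = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(diagonal_cover_probability(uneven))
```

Under these assumptions the equal partition gives the smallest cover for a given n, which matches the special case singled out above.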

Those are broad limits, but the best that can be done. Note that if you want to make the diagonal set measurable, you have to make all its subsets measurable (to complete the algebra). A simple way to do this is pick a strictly positive real number k, and give every subset of the diagonal k times the probability of its projection on X. This would give the whole diagonal probability k.

The moral: it does not seem to be true, at least without extra requirements, that P(X = Y) must equal P(X = a) for a fixed a. (Or so I say. As usual, I may not be thinking straight.)

Alexander R Pruss said...


It is true that the mathematics does not constrain P(X=Y) very much. Thanks for working out the range here.

However, if we are talking of chances rather than epistemic probabilities, I think reality, or hypothetical reality, should determine the probabilistic facts about X=Y. If we run an experiment producing X in one universe and an experiment producing Y in another causally isolated universe, and if we've fully probabilistically described X and fully probabilistically described Y, that should result in a full description of X=Y. In other words, the probabilistic facts about X=Y supervene on the probabilistic facts about X and about Y and the facts about the nature of the independence. But the independence is just full causal independence: there is nothing more to be said about it, I assume.

So the question is philosophical rather than mathematical: in such a hypothetical experiment, what would be the probability of X=Y? Mathematics puts constraints on it, such as your range. But it is a philosophical question what value it would have. Or whether it would have a value at all (your initial suggestion).

In the initial case of X and Y being uniformly distributed over an interval, I think there are two intuitive answers as to the value of P(X=Y): ε and ε*sqrt(2). I still think my intuitive argument for the answer being ε makes sense.

It occurs to me that there is another move available here to the infinitesimalist. That would be to deny that even the hyperreal probability distributions capture all the probabilistic facts about X and Y. Instead, there is an additional probabilistic fact, namely the fact about what the probability of X=Y would be, which does not supervene on the hyperreal probability distribution. And of course, there would be many, many other probabilistic facts like that (e.g., what is the probability of X=f(Y) for a variety of functions f).

Such a move could lead to some interesting technical developments. The move here would be analogous to the fundamental conditional probability theorist who says that classical probabilities do not capture all the probabilistic facts about a random variable, because to capture some of the probabilistic facts you need data about P(A|B) where P(B)=0.

Alexander R Pruss said...

Here's a pithy way to put my argument for (2): There are no better or worse ways to guess the result of a fair lottery. :-)

IanS said...

First, some typos: in the second last paragraph of my last comment: ‘…kε [not k] times the probability…’ and ‘… give the whole diagonal probability kε [not k].’

There are problems with taking infinitesimal probabilities as chances. Most basically, the intuitive arguments that lead to infinitesimal probabilities typically only determine them up to a real factor. A putative causal mechanism would need a definite scale factor, and there is no obvious natural value for it. Even if you think only about the ratios, there are still problems. Could there be a causal way to make one infinitely unlikely effect three times (for example) more likely than another? Maybe, but it doesn’t seem obvious how. It’s better (and usual, I think) to take infinitesimal probabilities as merely epistemic.

I have often thought that it is the special role of independence that makes probability theory different from straight measure theory. In theory and in practice, we use (or assume) independence to assemble the building blocks of complex models. As the example shows, this is not straightforward with infinitesimal probabilities – we have to explicitly model everything jointly.

Given the importance of independence, this is a serious problem for defenders of infinitesimal probabilities. It is part of the reason for the standard infinitesimalist position that if you want to use infinitesimal probabilities, you have to fully model everything of interest in advance. (This is in effect the ‘move’ in the last two paragraphs of your comment.)

On the pithy argument for (2): Guessing that X = Y is not guessing X. You could equally well take it as using X to guess Y. But the infinitesimal probabilities for the Xs could be scaled differently from those for the Ys. (This is consistent with both being fair infinite lotteries.) Then your argument, if it worked, would lead to P(X = a) = P(X = Y) = P(Y = b), a contradiction. This is a classic case of non-conglomerability.