Consider a fair spinner that uniformly chooses an angle between 0 and 360°. Intuitively, I’ve just fully described a probabilistic situation. In classical probability theory, there is indeed a very natural model of this: Lebesgue probability measure on the unit circle. This model’s probability measure can be proved to be the unique function λ on the subsets of the unit circle that satisfies these conditions:
1. Kolmogorov axioms with countable additivity
2. completeness: if λ(B) is zero and A ⊆ B, then λ is defined for A
3. rotational invariance
4. at least one arc on the circle of length greater than zero and less than 360° has an assigned probability
5. minimality: any other function that satisfies (1)–(4) agrees with λ on the sets where λ is defined.
In that sense “uniformly chooses” can be given a precise and unique meaning.
But we may be philosophically unhappy with λ as our probabilistic model of the spinner for one of two reasons. First, but less importantly, we may want to have meaningful probabilities for all subsets of the unit circle, while λ famously has “non-measurable sets” where it is not defined. Second, we may want to do justice to such intuitions as that it is more likely that the spinner will land exactly at 0° or 180° than that it will land exactly at 0°. But λ as applied to any finite (in fact, any countable) set of positions yields zero: there is no chance of the spinner landing there. Moreover, we want to be able to update our probabilities on learning, say, that the spinner landed on 0° or 180°. Presumably, after learning that disjunction, we want 0° and 180° to have probability 1/2 each, but λ provides no guidance on how to do that.
One way to solve this is to move to probabilities whose values are in some field extending the reals, say the hyperreals. Then we can assign a non-zero (but in some cases infinitesimal) probability to every non-empty subset of the circle. But this comes with two serious costs. First, we lose rotational invariance: it is easy to prove that we cannot have rotational invariance in such a context. Second, we lose uniqueness: there are many ways of assigning non-zero probabilities, and we know of no plausible set of conditions that makes the assignment unique. Both costs call into serious question whether we have captured the notion of “uniform distribution”, because uniformity sure sounds like it should involve rotational invariance and be the kind of property that should uniquely determine the probability model given some plausible assumptions like (1)–(5).
There is another approach for which one might have hope: use Popper functions, i.e., take conditional probabilities to be primitive. It follows from results of Armstrong and the supramenability of the group of rotations on the circle that there is a rotation-invariant (and, if we like, rotation and reflection invariant) finitely-additive full conditional probability on the circle, which assigns a meaningful real number to P(A|B) for any subsets A and B with B non-empty. Moreover, if Ω is the whole circle, then we can further require that P(A|Ω) = λ(A) if λ(A) is defined. And now we can compare the probability of two points and the probability of one point. For although P({x,y}|Ω) = λ({x,y}) = 0 = λ({x}) = P({x}|Ω) when x ≠ y, there is a natural sense in which {x, y} is more likely than {x} because P({x}|{x,y}) = 1/2.
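To make the comparison of {x} and {x, y} concrete, here is a minimal Python sketch of conditional probability restricted to finite conditioning sets. It assumes the counting rule P(A|B) = |A ∩ B| / |B|, which is what reflection invariance together with the multiplication axiom P(A∩B|C) = P(A|B∩C)·P(B|C) appears to force when B is finite. The function name `cond_prob` and the use of degree labels for points are illustrative choices, not anything from the results cited above, and nothing here says what happens for infinite B.

```python
from fractions import Fraction

def cond_prob(A, B):
    """Counting conditional probability P(A|B) for a finite, non-empty B.

    Assumption for illustration: points are hashable labels (here, angles
    in degrees) and B is finite, so P(A|B) = |A ∩ B| / |B|.
    """
    A, B = frozenset(A), frozenset(B)
    if not B:
        raise ValueError("conditioning set must be non-empty")
    return Fraction(len(A & B), len(B))

# Landing on 0 given that the spinner landed on 0 or 180.
half = cond_prob({0}, {0, 180})

# Multiplication axiom on a small example: P(A∩B|C) = P(A|B∩C) * P(B|C).
A, B, C = {0}, {0, 180}, {0, 90, 180, 270}
lhs = cond_prob(set(A) & set(B), C)
rhs = cond_prob(A, set(B) & set(C)) * cond_prob(B, C)
```

In this sketch `half` is 1/2 even though both {0} and {0, 180} have Lebesgue measure zero, which is the sense in which the two-point set is "more likely" than the one-point set.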
Unfortunately, the conditional probability approach still doesn’t have uniqueness, and this is the point of this post. Let’s say that what we require of our conditional probability assignment P is this:
6. standard axioms of finitely-additive full conditional probabilities
7. (strong) rotational and reflection invariance
8. being defined for all pairs of subsets of the circle with the second one non-empty
9. P(A|Ω) = λ(A) for any Lebesgue-measurable A.
Unfortunately, these conditions fail to uniquely define P. In fact, they fail to uniquely define P(A|B) for countably infinite B.
Here’s why. Let E be a countably infinite subset of the circle with the following property: for any non-identity isometry ρ of the circle (combination of rotations and reflections), E ∩ ρE is finite. (One way to generate E is this. Let E₀ be any singleton. Given Eₙ, let Gₙ be the set of isometries ρ such that ρx = y for some x, y in Eₙ. Then Gₙ is finite. Let z be any point not in {ρx : ρ ∈ Gₙ, x ∈ Eₙ}. Let Eₙ₊₁ = Eₙ ∪ {z} (since z is not unique, we’re using the Axiom of Dependent Choice, but a lot of other stuff depends on stronger versions of Choice anyway). Let E be the union of the Eₙ. Then one can check that E ∩ ρE is finite for any non-identity isometry ρ: once ρ is in some Gₙ, no point added after stage n can be the image or preimage under ρ of a different point of E.)
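The greedy construction can be tried out on a finite scale. The following Python sketch is only an illustration under stated assumptions: points are exact rationals in [0, 1) standing for fractions of a full turn, an isometry is encoded as a hypothetical pair `("rot", a)` for x ↦ x + a or `("ref", a)` for x ↦ a − x (both mod 1), and the new point z is taken to be the first 1/k not ruled out, which is just one arbitrary way of making the choice. It builds the first few stages and then measures the largest overlap between E and ρE over the non-identity isometries that map some point of E into E.

```python
from fractions import Fraction
from itertools import count

def isometries_between(points):
    """Isometries sending some point of `points` to some point of `points`.

    ("rot", a) encodes x -> x + a (mod 1); ("ref", a) encodes x -> a - x (mod 1).
    """
    gs = set()
    for x in points:
        for y in points:
            gs.add(("rot", (y - x) % 1))  # the rotation taking x to y
            gs.add(("ref", (x + y) % 1))  # the reflection taking x to y
    return gs

def apply(g, x):
    """Apply an encoded isometry to a point of the circle."""
    kind, a = g
    return (x + a) % 1 if kind == "rot" else (a - x) % 1

def build_E(n_points):
    """First n_points stages of the greedy construction, starting from {0}."""
    E = [Fraction(0)]
    while len(E) < n_points:
        G = isometries_between(E)
        forbidden = {apply(g, x) for g in G for x in E}
        # Arbitrary choice of z: the first 1/k outside the finite forbidden set.
        z = next(Fraction(1, k) for k in count(2)
                 if Fraction(1, k) not in forbidden)
        E.append(z)
    return E

def max_overlap(E):
    """Largest size of E ∩ ρE over non-identity ρ mapping a point of E into E."""
    E_set = set(E)
    G = isometries_between(E) - {("rot", Fraction(0))}
    return max(len(E_set & {apply(g, x) for x in E}) for g in G)

E = build_E(8)
overlap = max_overlap(E)
```

Isometries that map no point of E into E have empty overlap, so restricting `max_overlap` to `isometries_between(E)` loses nothing. The sketch only covers finitely many stages; the infinite set E is the union of all of them.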
Let μ be any finitely additive probability on E that assigns zero to finite subsets. Note that μ is not unique: there are many such μ. Now define a finitely additive measure ν on Ω as follows. If A is uncountable, let ν(A) = ∞. Otherwise, let ν(A) = ∑_ρ μ(E ∩ ρA), where the sum is taken over all isometries ρ. The condition that E ∩ ρE is finite for non-identity ρ and that μ is zero for finite sets ensures that if A ⊆ E, then ν(A) = μ(A). It is clear that ν is isometrically invariant.
Let λ* be any invariant extension of Lebesgue measure to a finitely additive measure on all subsets of the circle. By Armstrong’s results (most relevantly Proposition 1.7), there is a full conditional probability P satisfying (6)–(8) and such that P(A|E) = μ(A∩E) and P(A|Ω) = λ*(A) (here we use the fact that ν(A) = ∞ whenever λ*(A) > 0, since λ*(A) > 0 only for uncountable A). Since μ wasn’t unique and E is countable, conditions (6)–(9) fail to uniquely define P when the conditioning set is countably infinite.
There is no problem coming up with explicit Es. (If I’m thinking straight…) Here is one: {1/2, 1/3, 1/4, …}. Any countable collection with a single one-sided convergence point will do.
The post shows that complete conditional probabilities exist but are not unique. How far can you go if you want uniqueness but don’t insist on completeness? Not very far, I think.
There is an obvious intuitively natural conditional probability on the sets that can be described as the union of a finite number of intervals, plus or minus a finite number of single points. But going beyond this doesn’t seem easy.
Nice points. I guess the advantage of my non-explicit construction is that it generalizes more easily to other groups of symmetries on other spaces. (For instance, it works for any case of an infinite group acting on itself.)
There are some results in dimensions 1 and 2 about the uniqueness of Lebesgue measure in certain contexts in the Wagon-Tomkowicz book on the Banach-Tarski paradox.