Consider a fair spinner that uniformly chooses an angle between 0 and 360∘. Intuitively, I’ve just fully
described a probabilistic situation. In classical probability theory,
there is indeed a very natural model of this: Lebesgue probability
measure on the unit circle. This model’s probability measure can be
proved to be the unique (partially defined) function λ on subsets of the unit circle
that satisfies these conditions:
1. Kolmogorov axioms with countable additivity
2. completeness: if λ(B) is zero and A ⊆ B, then λ is defined for A
3. rotational invariance
4. at least one arc on the circle of length greater than zero and less than 360∘ has an assigned probability
5. minimality: any other function that satisfies 1–4 agrees with λ on the sets where λ is defined.
In that sense “uniformly chooses” can be given a precise and unique
meaning.
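For concreteness, here is one natural way to read those conditions symbolically (the formalization is mine, and dom λ denotes the collection of sets on which λ is defined):

```latex
% One reading of conditions (1)-(5); S is the unit circle.
\begin{enumerate}
  \item[(1)] $\mathrm{dom}(\lambda)$ is a $\sigma$-algebra on $S$, $\lambda \ge 0$, $\lambda(S)=1$, and
    $\lambda\big(\bigcup_n A_n\big) = \sum_n \lambda(A_n)$ for pairwise disjoint $A_n \in \mathrm{dom}(\lambda)$.
  \item[(2)] If $B \in \mathrm{dom}(\lambda)$, $\lambda(B)=0$ and $A \subseteq B$, then $A \in \mathrm{dom}(\lambda)$.
  \item[(3)] If $A \in \mathrm{dom}(\lambda)$ and $\rho$ is a rotation, then $\rho A \in \mathrm{dom}(\lambda)$ and $\lambda(\rho A) = \lambda(A)$.
  \item[(4)] Some arc of length strictly between $0$ and $360^\circ$ lies in $\mathrm{dom}(\lambda)$.
  \item[(5)] If $\lambda'$ satisfies (1)--(4), then $\mathrm{dom}(\lambda) \subseteq \mathrm{dom}(\lambda')$ and $\lambda' = \lambda$ on $\mathrm{dom}(\lambda)$.
\end{enumerate}
```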
But we may be philosophically unhappy with λ as our probabilistic model of the
spinner for one of two reasons. First, but less importantly, we may want
to have meaningful probabilities for all subsets of the unit
circle, while λ famously has
“non-measurable sets” where it is not defined. Second, we may want to do
justice to such intuitions as that it is more likely that the spinner
will land exactly at 0∘ or
180∘ than that it will land
exactly at 0∘. But λ as applied to any finite (in fact,
any countable) set of positions yields zero: there is no chance of the
spinner landing there. Moreover, we want to be able to update our
probabilities on learning, say, that the spinner landed on 0∘ or 180∘—presumably, after learning
that disjunction, we want 0∘
and 180∘ to have probability
1/2—but λ provides no guidance on
how to do that.
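To see the updating problem concretely: on the usual ratio definition of conditional probability, conditioning on the null event that the spinner landed at 0∘ or 180∘ is simply undefined:

```latex
% The ratio definition gives no verdict when the condition has measure zero:
\[
\lambda(\{0^\circ,180^\circ\}) = 0,
\qquad\text{so}\qquad
P\big(\{0^\circ\}\,\big|\,\{0^\circ,180^\circ\}\big)
  = \frac{\lambda(\{0^\circ\}\cap\{0^\circ,180^\circ\})}{\lambda(\{0^\circ,180^\circ\})}
  = \frac{0}{0}
\quad\text{is undefined.}
\]
```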
One way to solve this is to move to probabilities whose values are in
some field extending the reals, say the hyperreals. Then we can assign a
non-zero (but in some cases infinitesimal) probability to every subset
of the circle. But this comes with two serious costs. First, we lose
rotational invariance: it is easy
to prove that we cannot have rotational invariance in such a
context. Second, we lose uniqueness: there are many ways of assigning
non-zero probabilities, and we know of no plausible set of conditions
that makes the assignment unique. Both costs put in serious question
whether we have captured the notion of “uniform distribution”, because
uniformity sure sounds like it should involve rotational invariance and
be the kind of property that should uniquely determine the probability
model given some plausible assumptions like (1)–(5).
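Here is a sketch of the standard argument behind the first cost (my rendering, for a finitely additive P, hyperreal-valued or not, that gives every singleton a non-zero probability):

```latex
% Sketch: regularity + finite additivity + rotational invariance is impossible.
% Let \rho be rotation by an irrational multiple of 360^\circ and
% O = \{\rho^n x : n \ge 0\} the orbit of a point x, so the points \rho^n x
% are distinct and \rho O = O \setminus \{x\}.  Then invariance would give
\[
P(O) = P(O\setminus\{x\}) + P(\{x\}) = P(\rho O) + P(\{x\}) = P(O) + P(\{x\}),
\]
% forcing P(\{x\}) = 0 and contradicting the assignment of a non-zero
% (even infinitesimal) probability to every singleton.
```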
There is another approach for which one might have hope: use Popper
functions, i.e., take conditional probabilities to be primitive. It
follows from results of
Armstrong and the supramenability of the group of rotations on the
circle that there is a rotation-invariant (and, if we like, rotation and
reflection invariant) finitely-additive full conditional probability on
the circle, which assigns a meaningful real number to P(A|B) for any
subsets A and B with B non-empty. Moreover, if Ω is the whole circle, then we can
further require that P(A|Ω) = λ(A)
if λ(A) is defined.
And now we can compare the probability of two points and the probability
of one point. For although P({x,y}|Ω) = λ({x,y}) = 0 = λ({x}) = P({x}|Ω)
when x ≠ y, there is
a natural sense in which {x, y} is more likely than
{x} because P({x}|{x,y}) = 1/2.
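For reference, the kind of full conditional probability being invoked here can be axiomatized roughly as follows; this is my paraphrase of the usual definition, and it is the sort of thing condition (6) below refers to:

```latex
% One common formulation: P(A|B) is defined for every A \subseteq \Omega and
% every non-empty B \subseteq \Omega, and
% (i)  for each non-empty B, P(\cdot\,|\,B) is a finitely additive probability
%      with P(B\,|\,B) = 1;
% (ii) whenever B \cap C \ne \emptyset,
\[
P(A \cap B \mid C) = P(A \mid B \cap C)\, P(B \mid C).
\]
```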
Unfortunately, the conditional probability approach still doesn’t
have uniqueness, and this is the point of this post. Let’s say that what
we require of our conditional probability assignment P is this:
6. standard axioms of finitely-additive full conditional probabilities
7. (strong) rotational and reflection invariance
8. being defined for all pairs of subsets of the circle with the second one non-empty
9. P(A|Ω) = λ(A) for any Lebesgue-measurable A.
Unfortunately, these conditions fail to uniquely define P. In fact, they fail to uniquely
define P(A|B) for
countably infinite B.
Here’s why. Let E be a
countably infinite subset of the circle with the following property: for
any non-identity isometry ρ of
the circle (combination of rotations and reflections), E ∩ ρE is finite.
(One way to generate E is
this. Let E0 be any
singleton. Given En, let Gn be the set of
isometries ρ such that ρx = y for some
x, y in En. Then Gn is finite.
Let z be any point not in
{ρx : ρ ∈ Gn, x ∈ En}.
Let En+1 = En ∪ {z}
(since z is not unique, we’re
using the Axiom of Dependent Choice, but a lot of other stuff depends on
stronger versions of Choice anyway). Let E be the union of the En. Then it’s
easy to see that E ∩ ρE contains at
most one point for any non-identity isometry ρ.)
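Here is a toy, finite analogue of this greedy construction in Python (the floating-point discretization, the tolerance, and all the helper names are mine; the real construction is infinite, and where the code calls a random-number generator, the construction invokes Dependent Choice):

```python
import random

# Points are angles in degrees.  An isometry of the circle is either a
# rotation x -> x + theta (mod 360) or a reflection x -> theta - x (mod 360).

def isometries_between(x, y):
    """The two isometries sending angle x to angle y: one rotation, one reflection."""
    return [("rot", (y - x) % 360.0), ("ref", (x + y) % 360.0)]

def apply_isometry(iso, x):
    kind, theta = iso
    return (x + theta) % 360.0 if kind == "rot" else (theta - x) % 360.0

def circle_dist(a, b):
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def build_E(n_points, tol=1e-9):
    """Greedy analogue of E_0, E_1, ...: never add a point that some isometry
    already determined by the current points maps a current point onto."""
    E = [random.uniform(0.0, 360.0)]          # E_0: an arbitrary singleton
    while len(E) < n_points:
        # G_n: every isometry taking some point of E_n to some point of E_n
        G = [iso for x in E for y in E for iso in isometries_between(x, y)]
        # the forbidden set {rho x : rho in G_n, x in E_n}
        forbidden = [apply_isometry(iso, x) for iso in G for x in E]
        while True:                           # choose a fresh z outside it
            z = random.uniform(0.0, 360.0)
            if all(circle_dist(z, f) > tol for f in forbidden):
                E.append(z)
                break
    return E

print(build_E(6))
```

The only fact the construction needs is that the forbidden set at each stage is finite, so a fresh point always exists.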
Let μ be any finitely
additive probability on E that
assigns zero to finite subsets. Note that μ is not unique: there are many such
μ. Now define a finitely
additive measure ν on Ω as follows. If A is uncountable, let ν(A) = ∞. Otherwise, let
ν(A) = ∑ρ μ(E ∩ ρA),
where the sum is taken over all isometries ρ. The condition that E ∩ ρE is finite
for non-identity ρ and that
μ is zero for finite sets
ensures that if A ⊆ E, then ν(A) = μ(A).
It is clear that ν is
isometrically invariant.
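Spelling this step out (my rendering of the reasoning just given):

```latex
% For A \subseteq E, only the identity term of the sum survives:
\[
\nu(A) = \sum_{\rho} \mu(E \cap \rho A)
       = \mu(E \cap A) + \sum_{\rho \ne \mathrm{id}} \mu(E \cap \rho A)
       = \mu(A) + 0,
\]
% since for \rho \ne \mathrm{id}, E \cap \rho A \subseteq E \cap \rho E is finite
% and \mu vanishes on finite sets.  Invariance is a reindexing: for any isometry \sigma,
\[
\nu(\sigma A) = \sum_{\rho} \mu(E \cap \rho\sigma A)
              = \sum_{\rho'} \mu(E \cap \rho' A) = \nu(A),
\]
% with \rho' = \rho\sigma ranging over all isometries (and \sigma A is uncountable iff A is).
```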
Let λ* be any
invariant extension of Lebesgue measure to a finitely additive measure
on all subsets of the circle. By Armstrong’s results (most relevantly
Proposition 1.7), there is a full conditional probability P satisfying (6)–(8) and such that
P(A|E) = μ(A∩E)
and P(A|Ω) = λ*(A)
(here we use the fact that ν(A) = ∞ whenever λ*(A) > 0,
since λ*(A) > 0
only for uncountable A). Since λ* extends λ, condition (9) is satisfied as well. And since
μ wasn't unique and E is countable, conditions (6)–(9)
fail to uniquely define P(A|B) for
countably infinite B.
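To make the non-uniqueness fully explicit (this just unpacks the last sentence):

```latex
% Take \mu_1 \ne \mu_2, both finitely additive and vanishing on finite subsets
% of E, and A \subseteq E with \mu_1(A) \ne \mu_2(A).  The resulting P_1 and
% P_2 both satisfy (6)-(9), yet
\[
P_1(A \mid E) = \mu_1(A \cap E) = \mu_1(A) \ne \mu_2(A) = \mu_2(A \cap E) = P_2(A \mid E).
\]
```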