Suppose that I am throwing a perfectly sharp dart uniformly randomly at a continuous target. The chance that I will hit the center is zero.
What if I throw an infinite number of independent darts at the target? Do I improve my chances of hitting the center at least once?
Things depend on what size of infinity of darts I throw. Suppose I throw a countable infinity of darts. Then I don’t improve my chances: classical probability says that the union of countably many zero-probability events has zero probability.
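The countable case can be written out in one line. This is only a sketch of the standard countable subadditivity argument; the name A_n, for the event that the n-th dart hits the center, is notation introduced here and not in the original text.

```latex
P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr)
  \;\le\; \sum_{n=1}^{\infty} P(A_n)
  \;=\; \sum_{n=1}^{\infty} 0
  \;=\; 0 .
```

The inequality is countable subadditivity, which is available only for countable families of events, so the same computation says nothing about an uncountable number of throws.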
What if I throw an uncountable infinity of darts? The answer is that the usual way of modeling independent events does not assign any meaningful probabilities to whether I hit the center at least once. Indeed, the event that I hit the center at least once is “saturated nonmeasurable”, i.e., it is nonmeasurable and every measurable subset of it has probability zero and every measurable superset of it has probability one.
Proposition: Assume the Axiom of Choice. Let P be any probability measure on a set Ω and let N be any non-empty event with P(N) = 0. Let I be any uncountable index set. Let H be the subset of the product space Ω^I consisting of those sequences ω that hit N, i.e., ones such that for some i we have ω(i) ∈ N. Then H is saturated nonmeasurable with respect to the I-fold product measure P^I (and hence with respect to its completion).
One conclusion to draw is that the event H of hitting the center at least once in our uncountable number of throws in fact has a weird “nonmeasurable chance” of happening, one perhaps that can be expressed as the interval [0, 1]. But I think there is a different philosophical conclusion to be drawn: the usual “product measure” model of independent trials does not capture the phenomenon it is meant to capture in the case of an uncountable number of trials. The model needs to be enriched with further information that will then give us a genuine chance for H. Saturated nonmeasurability is a way of capturing the fact that the product measure can be extended to a measure that assigns any numerical probability between 0 and 1 (inclusive) one wishes. And one requires further data about the system in order to assign that numerical probability.
Let me illustrate this as follows. Consider the original single-case dart throwing system. Normally one describes the outcome of the system’s trials by the position z of the tip of the dart, so that the sample space Ω equals the set of possible positions. But we can also take a richer sample space Ω* which includes all the possible tip positions plus one more outcome, α, the event of the whole system ceasing to exist, in violation of the conservation of mass-energy. Of course, to be physically correct, we assign chance zero to outcome α.
Now, let O be the center of the target. Here are two intuitions:
(1) If the number of trials has a cardinality much greater than that of the continuum, it is very likely that O will result on some trial.
(2) No matter how many trials have been performed, even a large infinity of them, α will not occur.
But the original single-case system based on the sample space Ω* does not distinguish O and α probabilistically in any way. Let ψ be a bijection of Ω* to itself that swaps O and α but keeps everything else fixed. Then P(ψ[A]) = P(A) for any measurable subset A of Ω* (this follows from the fact that the probability of O is equal to the probability of α, both being zero), and so with respect to the standard probability measure on Ω*, there is no probabilistic difference between O and α.
If I am right about (1) and (2), then what happens in a sufficiently large number of trials is not captured by the classical chances in the single-case situation. That classical probabilities do not capture all the information about chances is something we should already have known from cases involving conditional probabilities. For instance P({O}|{O, α}) = 1 and P({α}|{O, α}) = 0, even though O and α are on par.
One standard solution to the conditional probability case is infinitesimals. Perhaps P({O}) is an infinitesimal ι but P({α}) is exactly zero. In that case, we may indeed be able to make sense of (1) and (2). But infinitesimals are not a good model on other grounds. (See Section 3 here.)
Thinking about the difficulties with infinitesimals, I get this intuition: we want to get probabilistic information about the single-case event that has a higher resolution than is given by classical real-valued probabilities but lower resolution than is given by infinitesimals. Here is a possibility. Each subset of the outcome space that has probability zero also has attached to it a monotone-increasing function from cardinalities to the set [0, 1]. If N is such a subset, and f_N is the function attached to it, then f_N(κ) tells us the probability that κ independent trials will yield at least one outcome in N.
We can then argue that f_N(κ) is always 0 or 1 for infinite κ. Here is why. Suppose f_N(κ) > 0. Then κ must be infinite, since if κ is finite then f_N(κ) = 1 − (1 − P(N))^κ = 0 as P(N) = 0. But 1 − f_N(κ + κ) = (1 − f_N(κ))^2, since the probabilities of independent misses multiply, and κ + κ = κ (assuming the Axiom of Choice), so that 1 − f_N(κ) = (1 − f_N(κ))^2, which implies that f_N(κ) is zero or one. We can come up with other constraints on f_N. For instance, if C is the union of A and B, then f_C(κ) is the greater of f_A(κ) and f_B(κ).
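The two arithmetic facts used in this argument can be checked directly: a zero single-trial chance stays zero over any finite number of independent trials, and a number equal to its own square must be 0 or 1. The following is a minimal sketch; the function names `hit_probability` and `fixed_points_of_squaring` are hypothetical helpers introduced here, and the grid of candidate values is an arbitrary choice for illustration.

```python
from fractions import Fraction


def hit_probability(p_single, trials):
    """Chance of at least one hit in `trials` independent trials,
    each with single-trial chance `p_single` (finite case only)."""
    return 1 - (1 - p_single) ** trials


def fixed_points_of_squaring(candidates):
    """Return the candidates x that satisfy x == x * x."""
    return [x for x in candidates if x == x * x]


# A null event stays null under any finite number of trials.
null_case = hit_probability(Fraction(0), 10**6)

# For contrast, a positive single-trial chance grows with the number of trials.
coin_case = hit_probability(Fraction(1, 2), 2)

# Among the values 0, 1/10, ..., 1, only 0 and 1 equal their own square.
grid = [Fraction(k, 10) for k in range(11)]
idempotent = fixed_points_of_squaring(grid)
```

Exact rational arithmetic is used so that the comparison x == x * x is not affected by floating-point rounding. The code only covers finite numbers of trials; the infinite case rests on the cardinal-arithmetic step κ + κ = κ, which has no finite analogue.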
Such an approach could help get a solution to a different problem, the problem of characterizing deterministic causation. To a first approximation, the solution would go as follows. Start with the inadequate story that deterministic causation is chancy causation with chance 1. (This is inadequate, because in the original dart-throwing case, the chance of missing the center is 1, but throwing the dart does not deterministically cause one to hit a point other than the center.) Then say that deterministic causation is chancy causation such that the failure event F is such that f_F(κ) = 0 for every cardinal κ.
But maybe instead of all this, one could just deny that there are meaningful chances to be assigned to events like the event of uncountably many trials missing or hitting the center of the target.
Sketch of proof of Proposition: The product space Ω^I is the space of all functions ω from I to Ω, with the product measure P^I generated by the product measures of cylinder sets. The cylinder sets are product sets of the form A = ∏_{i∈I} A_i such that there is a finite J ⊆ I such that A_i = Ω for i ∉ J, and the product measure of A is defined to be ∏_{i∈J} P(A_i).
First I will show that there is an extension Q of P^I such that Q(H) = 0 (an extension of a measure is a measure on a larger σ-algebra that agrees with the original measure on the smaller σ-algebra). Any P^I-measurable subset of H will then have Q-measure zero, and hence will have P^I-measure zero since Q extends P^I.
Let Q_1 be the restriction of P to Ω − N (this is still normalized to 1 as N is a null set). Let Q_1^I be the product measure on (Ω − N)^I. Let Q be the measure on Ω^I defined by Q(A) = Q_1^I(A ∩ (Ω − N)^I). Consider a cylinder set A = ∏_{i∈I} A_i where there is a finite J ⊆ I such that A_i = Ω whenever i ∉ J. Then
Q(A) = ∏_{i∈J} Q_1(A_i − N) = ∏_{i∈J} P(A_i − N) = ∏_{i∈J} P(A_i) = P^I(A).
Since P^I and Q agree on cylinder sets, by the definition of the product measure, Q is an extension of P^I. Moreover, Q(H) = 0, since H is disjoint from (Ω − N)^I.
To show that H is saturated nonmeasurable, we now only need to show that any P^I-measurable set in the complement of H must have probability zero. Let A be any P^I-measurable set in the complement of H. Then A is of the form {ω ∈ Ω^I : F(ω)}, where F(ω) is a condition involving only coordinates of ω indexed by a fixed countable set of indices from I (i.e., there is a countable subset J of I and a subset B of Ω^J such that F(ω) if and only if ω|J is a member of B, where ω|J is the restriction of ω to J). Since I is uncountable, there are indices outside J. But no such condition can exclude the possibility that a coordinate of ω outside that countable set is in N, unless the condition is entirely unsatisfiable, and hence no such set A lies in the complement of H, unless the set is empty. And that’s all we need to show.