Showing posts with label regularity. Show all posts

Wednesday, September 4, 2024

Independent invariant regular hyperreal probabilities: an existence result

A couple of years ago I showed how to construct hyperreal finitely additive probabilities on infinite sets that satisfy certain symmetry constraints and have the Bayesian regularity property that every possible outcome has non-zero probability. In this post, I want to show a result that allows one to construct such probabilities for an infinite sequence of independent random variables.

Suppose first we have a group G of symmetries acting on a space Ω. What I previously showed was that there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity (i.e., P(A) > 0 for every non-empty A) if and only if the action of G on Ω is “locally finite”, i.e.:

  • For any finitely generated subgroup H of G and any point x in Ω, the orbit Hx is finite.

Here is today’s main result (unless there is a mistake in the proof):

Theorem. For each i in an index set I, suppose we have a group Gi acting on a space Ωi. Let Ω = ∏iΩi and G = ∏iGi, and consider G acting componentwise on Ω. Then the following are equivalent:

  1. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity and the independence condition that P(A1 ∩ ... ∩ An) = P(A1)⋯P(An) whenever A1, ..., An are subsets of Ω such that each Ai depends only on coordinates from Ji ⊆ I, with J1, ..., Jn pairwise disjoint

  2. there is a hyperreal G-invariant finitely additive probability assignment on all the subsets of Ω that satisfies Bayesian regularity

  3. the action of G on Ω is locally finite.

Here, an event A depends only on coordinates from a set J just in case there is a subset A′ of ∏j ∈ JΩj such that A = {ω ∈ Ω : ω|J ∈ A′} (I am thinking of the members of a product of sets as functions from the index set to the union of the Ωi). For brevity, I will omit “finitely additive” from now on.

The equivalence of (2) and (3) is from my old result, and the implication from (1) to (2) is trivial, so the only thing to be shown is that (3) implies (1).

Example: If each group Gi is finite and of size at most N for a fixed N, then the local finiteness condition is met. (Each such group can be embedded into the symmetric group SN, and any power of a finite group is locally finite, so a fortiori its action is locally finite.) In particular, if all of the groups Gi are the same and finite, the condition is met. An example like that is where we have an infinite sequence of coin tosses, and the symmetry on each coin toss is the reversal of the coin.
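To make the local finiteness in the Example concrete, here is a small computational sketch (the encoding is my own): the coin-reversal symmetries of n coins form (Z/2)^n, acting on outcomes in {0,1}^n by componentwise flips, and the orbit of any outcome under a finitely generated subgroup is finite.

```python
# A minimal sketch (encoding my own): elements of (Z/2)^n are flip masks,
# acting on binary outcome strings by componentwise XOR.  We close a point
# under a couple of generators and confirm the orbit is finite (at most 2^k
# for k generators, since the masks are commuting involutions).

def orbit(x, generators):
    """BFS closure of the point x under XOR with the generator masks."""
    seen = {x}
    frontier = [x]
    while frontier:
        y = frontier.pop()
        for g in generators:
            z = tuple(a ^ b for a, b in zip(y, g))
            if z not in seen:
                seen.add(z)
                frontier.append(z)
    return seen

x = (0, 0, 0, 0, 0)                        # all-heads outcome of five coins
gens = [(1, 1, 0, 0, 0), (0, 0, 1, 0, 1)]  # two reversal patterns
orb = orbit(x, gens)
print(len(orb))  # 4
```

The orbit is just the coset of the point under the subgroup generated by the two masks, hence has at most 2^2 = 4 elements.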

Philosophical note: The above gives us the kind of symmetry we want for each individual independent experiment. But intuitively, if the experiments are identically distributed, we will want invariance with respect to a shuffling of the experiments. We are unlikely to get that, because the shuffling is unlikely to satisfy the local finiteness condition. For instance, for a doubly infinite sequence of coin tosses, we would want invariance with respect to shifting the sequence, and that doesn’t satisfy local finiteness.

Now, on to a sketch of the proof that (3) implies (1). The proof uses a sequence of three reductions using an ultraproduct construction to cases exhibiting more and more finiteness.

First, note that without loss of generality, the index set I can be taken to be finite. For if it’s infinite, for any finite partition K of I and any J ∈ K, let GJ = ∏i ∈ JGi and ΩJ = ∏i ∈ JΩi, with the obvious action of GJ on ΩJ. Then G is isomorphic to ∏J ∈ KGJ and Ω to ∏J ∈ KΩJ. If we have the result for finite index sets, we will get a regular hyperreal G-invariant probability on Ω that satisfies the independence condition in the special case where J1, ..., Jn are such that, for distinct i and j and every J ∈ K, at least one of Ji ∩ J and Jj ∩ J is empty. We then take an ultraproduct of these probability measures with respect to an ultrafilter on the partially ordered set of finite partitions of I ordered by fineness, and then we get the independence condition in full generality.

Second, without loss of generality, the groups Gi can be taken as finitely generated. For suppose we can construct a regular probability that is invariant under H = ∏iHi where Hi is a finitely generated subgroup of Gi and satisfies the independence condition. Then we take an ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finitely generated groups (Hi)i ∈ I where Hi is a subgroup of Gi and where the set is ordered by componentwise inclusion.

Third, also without loss of generality, the sets Ωi can be taken to be finite, by replacing each Ωi with an orbit of some finite collection of elements under the action of the finitely generated Gi, since such orbits will be finite by local finiteness, and once again taking an appropriate ultraproduct with respect to an ultrafilter on the partially ordered set of sequences of finite subsets of Ωi closed under Gi ordered by componentwise inclusion. The Bayesian regularity condition will hold for the ultraproduct if it holds for each factor in the ultraproduct.

We have thus reduced everything to the case where I is finite and each Ωi is finite. The existence of the hyperreal G-invariant finitely additive regular probability measure is now trivial: just let P(A) = |A|/|Ω| for every A ⊆ Ω. (In fact, the measure is countably additive and not merely finitely additive, real and not merely hyperreal, and invariant not just under the action of G but under all permutations.)
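For a toy instance of this base case (the particular sets and permutation are my own), take Ω = {0,1}^3 with the counting measure and check regularity, independence across disjoint coordinate sets, and invariance under an arbitrary permutation of Ω directly:

```python
from itertools import product
from fractions import Fraction

# Sketch of the base case: Omega = {0,1}^3 with P(A) = |A|/|Omega|.
Omega = list(product([0, 1], repeat=3))

def P(A):
    return Fraction(len(A), len(Omega))

A1 = {w for w in Omega if w[0] == 1}      # depends only on coordinate 0
A2 = {w for w in Omega if w[1] == w[2]}   # depends only on coordinates 1, 2

# Bayesian regularity: every non-empty event has positive probability.
assert all(P({w}) > 0 for w in Omega)

# Independence for events on disjoint coordinate sets.
assert P(A1 & A2) == P(A1) * P(A2)

# Invariance under any bijection of Omega: counting measure sees only size.
sigma = dict(zip(Omega, reversed(Omega)))
assert P({sigma[w] for w in A1}) == P(A1)
print("all checks pass")
```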

Monday, October 25, 2021

On two arguments for Bayesian regularity

Standard Bayesianism requires regularity: it requires that I not assign prior probability zero to any contingent proposition. There are two main reasons for this: one technical and one epistemological.

The technical reason is that it is difficult to make sense of conditionalizing on events with probability zero. (Granted, there are technical ways around this, but there are also problems with these.) But the difficulty of conditionalizing on events with probability zero does not give one any reason to prohibit assigning probability zero to events that one would never conditionalize on.

The Bayesian agent conditionalizes on evidence. But while the question of what constitutes evidence is highly controversial, there are some plausible things we could say about what could and could not be evidence for beings like us. Thus, the proposition that it’s looking like the multimeter is showing 3.1V seems like the sort of thing that could be evidence for a being like us, but the conjunction of the propositions constituting Relativity Theory does not seem like the sort of thing that could be evidence for a being like us (maybe it could be evidence for some supernatural being that has an infallible vision of the laws of nature; and maybe God could make us be such beings; but we don’t need to adapt our epistemology to such out-of-the-world possibilities).

If this is right, then the technical difficulties with conditionalizing on events with probability zero do not give us a good reason to assign a non-zero prior probability to Relativity Theory, or any other proposition that is not of the right sort to constitute a body of potential evidence (where a body of potential evidence is a consistent finite conjunction of individual pieces of evidence).

There is, however, a second reason not to assign prior probability zero to any contingent proposition. If we assign prior probability zero to some hypothesis H, say Relativity Theory, then the only way a body of evidence E could raise the probability of H to something non-zero would be if P(E)=0 (for if P(E)>0, then P(H|E)=P(HE)/P(E)≤P(H)/P(E)=0). Thus, if we assign prior probability zero to a hypothesis, it seems that we will be unacceptably stuck at probability zero for that hypothesis no matter what evidence comes in. This is not a merely technical reason: it is an epistemological one.
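The arithmetic of being stuck at zero can be seen in a two-line simulation (the likelihood values are my own illustration):

```python
# Illustration (numbers my own): a zero prior is absorbing under Bayes'
# theorem, while an arbitrarily tiny non-zero prior is not.

def update(prior, likelihood_H, likelihood_notH):
    """One Bayes update: returns P(H|E) from P(H) and the two likelihoods."""
    pe = likelihood_H * prior + likelihood_notH * (1 - prior)
    return likelihood_H * prior / pe

# A zero prior never moves, no matter how strong the evidence.
p = 0.0
for _ in range(100):
    p = update(p, 0.99, 0.01)
print(p)  # 0.0

# A tiny but non-zero prior recovers quickly under the same evidence.
q = 1e-12
for _ in range(10):
    q = update(q, 0.99, 0.01)
print(q > 0.5)  # True
```

Each update multiplies the odds for H by the likelihood ratio 99, so ten confirmations overwhelm the 10^−12 prior, while 0 times anything stays 0.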

Note that this formulation of the second reason for regularity depends on the first, though in a subtle way. The first reason gave us reason to have regularity for evidential propositions, i.e., propositions reporting a body of evidence. The second reason, if formulated as above, tells us that if we should have regularity for evidential propositions, then we should also have regularity for contingent hypotheses that are not themselves evidential propositions.

But now notice that the second reason for regularity seems to show rather more if we think it through. The reasoning here is that propositions like Relativity Theory should be confirmable but if we assign credence zero to them, they are not confirmable (assuming the first reason successfully shows that all bodies of evidence have non-zero probability). But now notice that the requirement of confirmability for a hypothesis shows something a lot stronger than that the hypothesis have non-zero probability. For surely it is not merely our view that Relativity Theory should be confirmable given infinite time. Rather, Relativity Theory should be the sort of proposition that would be confirmable by observation prior to the heat death of the universe, or maybe even within a single human lifetime. But the number of potential pieces of observational evidence for a being like us is finite (there are only finitely many perceptual states our brain can distinguish), and gathering a piece of evidence takes a minimum amount of time, and if Relativity Theory starts with a sufficiently low prior probability, we have no hope of confirming it before the deadline.

Hence, the confirmability intuition, if correct, yields a lot more than regularity: it yields substantive non-formal constraints on the priors. We shouldn’t assign a prior of, say, 10^−100 to Relativity Theory, at least not if our priors for observational evidence propositions are anything like what we tend to think they are. I am not, however, claiming that every contingent proposition should be confirmable before the heat death of the universe. We would not expect the proposition that there have been 10^10000 fair and independent coin tosses made over the lifetime of the universe and that they all turned out to be heads to be confirmable in this strong sense.
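To put rough numbers on the deadline worry (the Bayes factors here are purely my own illustrative assumptions): if each observation multiplies the odds for a hypothesis by at most B, then raising a 10^−100 prior to even odds takes at least 100/log10(B) observations.

```python
import math

# Back-of-envelope sketch (all numbers are illustrative assumptions):
# with prior 10^(-prior_exponent) and at most a factor of bayes_factor
# per observation, reaching even odds needs this many observations.

def observations_needed(prior_exponent, bayes_factor):
    return math.ceil(prior_exponent / math.log10(bayes_factor))

print(observations_needed(100, 10.0))   # 100 maximally favorable observations
print(observations_needed(100, 1.001))  # hundreds of thousands of weak ones
```

So even with implausibly strong evidence at every step, a 10^−100 prior needs a long run of uniformly favorable observations, and with realistically weak evidence the required run is enormous.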

In any case, here is what I think has happened. The first reason for regularity, the technical one, only applied to potential bodies of evidence. The second, on the other hand, shows more than it claims: it yields non-formal constraints on priors that go beyond regularity. In particular, I think, the subjective Bayesian is on thin ice if they want to require regularity.

Monday, August 30, 2021

Absence of evidence

It seems that the aphorism “Absence of evidence is not evidence of absence” is typically false.

For if H is a hypothesis and E is the claim that there is evidence for H, then E raises the probability of H: P(H|E)>P(H). But then (as long as P(E)>0, as Bayesian regularity will insist), it mathematically follows that P(∼H|∼E)>P(∼H). Thus the absence of evidence is evidence for the falsity (“absence”) of the hypothesis.
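The probabilistic core of this argument can be checked with exact arithmetic (the particular joint probabilities are my own):

```python
from fractions import Fraction

# Toy check (numbers my own): if E confirms H, then not-E confirms not-H.
P_H, P_E, P_HE = Fraction(1, 4), Fraction(1, 3), Fraction(1, 6)

# P(H|E) = (1/6)/(1/3) = 1/2 > 1/4 = P(H): E is evidence for H.
assert P_HE / P_E > P_H

P_notH = 1 - P_H
P_notE = 1 - P_E
P_notH_notE = 1 - P_H - P_E + P_HE   # inclusion-exclusion for P(~H and ~E)

# P(~H|~E) = (7/12)/(2/3) = 7/8 > 3/4 = P(~H): ~E is evidence for ~H.
assert P_notH_notE / P_notE > P_notH
print("absence of E confirms absence of H")
```

The general fact is immediate from P(H|E) > P(H) and P(E) > 0, but the toy case makes the direction of the inequalities vivid.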

I think there is only one place where one can challenge this argument, namely the claim:

  1. If there is evidence for H, then the fact that there is evidence for H is itself evidence for H.

First, let’s figure out what (1) is saying. I think the best reading is that it presupposes some kind of notion of a body of first-order evidence—maybe all the stuff that human beings have ever observed—and says that if the actual contents of that body of first-order evidence supports H, then the fact that that body supports H itself supports H.

Here is a way to make this precise. We suppose there is some random variable O whose value (not real valued, of course) is all first-order observations humans ever made. Let W be the set of all possible values that O could take on. For simplicity, we can take W to be finite: there is a maximum number of observations a human can make in a lifespan, a finite resolution to each observation, and a maximum number of human beings who could have lived on earth. Let o0 be the actual value that O has. Let WH = {o ∈ W : P(H|O = o)>P(H)}. The claim that there is evidence for H then comes to the claim that o0 ∈ WH.

Assuming we have Bayesian regularity, we can suppose O = o has non-zero probability for each o ∈ WH. Then the claim that the existence of evidence for H is itself evidence for H comes to this:

  2. P(H|O ∈ WH)>P(H).

And it is easy to check that this follows by finite conglomerability from the fact that P(H|O = o)>P(H) for each o ∈ WH.
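Here is a finite toy model of the conglomerability step (the joint distribution is my own): each o in WH individually confirms H, and so does the disjunction O ∈ WH.

```python
from fractions import Fraction

# Toy joint distribution (numbers my own): joint[h][o] = P(H=h, O=o),
# with O taking three values 0, 1, 2.
joint = {
    True:  [Fraction(2, 10), Fraction(1, 10), Fraction(1, 10)],
    False: [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)],
}
P_H = sum(joint[True])   # 2/5

# W_H: the values of O that individually confirm H.
W_H = [o for o in range(3)
       if joint[True][o] / (joint[True][o] + joint[False][o]) > P_H]

# P(H | O in W_H) is a weighted average of the P(H|O=o) for o in W_H,
# each of which exceeds P(H), so it exceeds P(H) too.
num = sum(joint[True][o] for o in W_H)
den = sum(joint[True][o] + joint[False][o] for o in W_H)
assert num / den > P_H
print("P(H | O in W_H) =", num / den, "> P(H) =", P_H)
```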

There might be cases where we expect infinite conglomerability to be lacking. In those cases (1) would be dubious. Here is one such case. Suppose Alice and Bob each get a ticket from a fair infinite jar with tickets numbered 1,2,3,…. Alice looks at her ticket. Bob doesn’t look at his yet, but knows that Alice has looked at hers. Bob notes that whatever number Alice has seen, it is nearly certain that his number is bigger (there are infinitely many numbers bigger than Alice’s number and only finitely many smaller ones). Thus, Bob knows that the evidence available to humans supports the thesis that his number is bigger than Alice’s. But Bob’s knowing this is not actually evidence that his number is bigger than Alice’s, for until Bob actually observes one or the other number, he is in the same evidential position as before Alice looked at her ticket—and at that point, it is obvious that it’s not more likely that Bob’s ticket has a bigger number than Alice’s.

But apart from weird cases where conglomerability fails, (1) is true, and so absence of evidence is evidence of absence, assuming we have enough Bayesian regularity.

Perhaps a charitable reading of the aphorism that absence of evidence isn’t evidence of absence is just that absence of evidence isn’t always significant evidence of absence. That seems generally correct.

Thursday, October 22, 2020

Preprint: Conditional, Regular Hyperreal and Regular Qualitative Probabilities Invariant Under Symmetries

Abstract: Classical countably additive real-valued probabilities come at a philosophical cost: in many infinite situations, they assign the same probability value---namely, zero---to cases that are impossible as well as to cases that are possible. There are three non-classical approaches to probability that can avoid this drawback: full conditional probabilities, qualitative probabilities and hyperreal probabilities. These approaches have been criticized for failing to preserve intuitive symmetries that can easily be preserved by the classical probability framework, but there has not been a systematic study of the conditions under which these symmetries can and cannot be preserved. This paper fills that gap by giving complete characterizations under which symmetries understood in a certain "strong" way can be preserved by these non-classical probabilities, as well as by offering some results to make it plausible that the strong notion of symmetry here may be the right one. Philosophical implications are briefly discussed, but the main purpose of the paper is to offer technical results to inform more sophisticated further philosophical discussion.

Preprint here.

Tuesday, August 11, 2020

Yet another variant of the Borel-Kolmogorov paradox

Suppose that a point is uniformly randomly chosen in the unit square. Then you learn that either the point lies on the diagonal y = x (red), or it lies on the horizontal line y = 1/2 (blue). What probability should you assign to its lying on the diagonal?

Answer 1: The diagonal has length √2 and the horizontal line has length 1. Thus, the total length of the lines where the point might be is √2 + 1, and the probability that it’s on the diagonal is √2/(√2 + 1) ≈ 0.59.

Answer 2: We can think of the uniform random choice of a point in the unit square as the choice of two independent coordinates, x and y. Suppose that x has been chosen. Then to be on the diagonal line, y has to equal x, while to be on the horizontal line, y has to equal 1/2. These two things are clearly equally likely, regardless of what x is, so the probability must be 1/2.

Both answers seem reasonable.
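One way to see where the two answers come from (this gloss is my own, not part of the post) is to condition on shrinking neighborhoods of the two lines: thickening by Euclidean distance recovers Answer 1’s ≈ 0.59, while thickening by vertical distance recovers Answer 2’s 1/2.

```python
import random

# Monte Carlo sketch (framing my own): condition on the point lying within
# eps of one of the two lines, for two different notions of "within eps".
random.seed(0)
eps, N = 0.001, 1_000_000
diag_euclid = diag_vert = horiz = 0
for _ in range(N):
    x, y = random.random(), random.random()
    if abs(y - x) < eps * 2 ** 0.5:   # Euclidean distance to y = x is |y-x|/sqrt(2)
        diag_euclid += 1
    if abs(y - x) < eps:              # vertical distance to y = x
        diag_vert += 1
    if abs(y - 0.5) < eps:            # the two distances agree for y = 1/2
        horiz += 1

ratio_euclid = diag_euclid / (diag_euclid + horiz)
ratio_vert = diag_vert / (diag_vert + horiz)
print(ratio_euclid)  # tends to sqrt(2)/(sqrt(2)+1), about 0.59
print(ratio_vert)    # tends to 1/2
```

The paradox is that both thickening schemes look like perfectly good ways of making “the point is on one of the two lines” precise.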

Suppose you are attracted to Answer 1 which gives 0.59. Then I can give you an argument for a third answer.

Answer 1b: Here is a way to uniformly choose a point in a square. I first uniformly choose a point in a rectangle whose height is twice its width, and then divide the y coordinate by a factor of two. Being on the diagonal or on the middle horizontal line of the point uniformly chosen in the square are respectively equivalent to being on the diagonal or on the middle horizontal line of the rectangle. But the length of the diagonal of the rectangle is √5, while the middle horizontal line has the same length 1 as in the square. So applying the reasoning behind Answer 1 to the rectangle case, the probability that the point is on the diagonal is √5/(√5 + 1) ≈ 0.69.

Thus, if you are attracted to the “geometrical” reasoning behind Answer 1, there are infinitely many other answers available, corresponding to the infinitely many ways of generating a point uniformly in a square by generating it in a rectangle and squashing or stretching.

This might push you to Answer 2, since the reasoning behind Answer 2 seems much more determinate. But there are variants to Answer 2. Here is another way to generate a point uniformly on the unit square. Rotate the unit square by 45 degrees clockwise around the origin to get a diamond whose size along the x and y axes is √2. Now choose x with a symmetric triangular probability density between 0 and √2, choose y0 uniformly between −1 and 1, and then rescale y0 to make its range fit within the diamond. Parallel reasoning to that used in Answer 2 will now generate a different answer, indeed an answer making the diagonal be more likely.

Note that while I put the paradox in terms of conditioning on an event of measure zero (the union of the two line segments), one can also put the paradox in terms of comparing probabilities if one likes to be able to compare zero probability sets.

Lesson: Either there are infinitely many different kinds of “uniform distributions of a point in a square”, or else we shouldn’t compare sets of zero measure.

Friday, July 17, 2020

Arbitrariness, regularity and comparative probabilities

In “Underdetermination of infinitesimal probabilities”, following a referee’s suggestion, I allow that qualitative (i.e., comparative) probabilities may escape arbitrariness problems for infinitesimal probabilities. But I now think this may be wrong.

Consider an infinite line of independent fair coins, numbered with the integers ..., −2, −1, 0, 1, 2, .... Let Rn be the hypothesis that coins n, n + 1, ... are all heads. Let Ln be the hypothesis that coins ..., n − 2, n − 1, n are all heads.

Suppose “being less likely or equally likely” is transitive, reflexive and total. Write A < B for A being less likely than B, A ≈ B for A and B being equally likely, and A ⪅ B for A being less likely than or equally likely as B.

We have strong regularity provided that if event A is a proper subset of event B, then A is strictly less likely than B.

Given strong regularity, the events Ln are strictly decreasing in probability: ... > L−2 > L−1 > L0 > L1 > ... and the events Rn are strictly increasing in probability: ... < R−2 < R−1 < R0 < R1 < ....

Theorem: Given strong regularity, exactly one of the following options is true:

  1. For all n and m, Ln < Rm (heads-runs are right-biased)

  2. For all n and m, Rm < Ln (heads-runs are left-biased)

  3. There is a unique n such that for all m ≤ n we have Rm ⪅ Lm and for all m > n we have Lm < Rm (there is a switch-over point at n).

But if (1) or (2) is true, it is difficult to see what objective reality could possibly ground whether heads-runs are right-biased or left-biased in our infinite sequence of coin tosses. And if (3) is true, it is difficult to see what objective reality could possibly make n be a switch-over point. The choice between left- and right-bias seems completely arbitrary, and the choice of a switch-over point is also arbitrary.

The Theorem follows from the following lemma:

Lemma: Let (S, ⪅) be a totally preordered set, and let Ln and Rn be sequences of members of S as n ranges over the integers. Suppose Ln is strictly decreasing and Rn is strictly increasing. Then exactly one of the following is true:

  1. For all n and m, Ln < Rm

  2. For all n and m, Rm < Ln

  3. There is a unique n such that for all m ≤ n we have Rm ⪅ Lm and for all m > n we have Lm < Rm.

Proof of Lemma: Suppose that for all n we have Ln < Rn. I claim that (1) is true. For suppose that (1) is false and hence (by totality) we have Rm ⪅ Ln for some m and n. Then m ≠ n, since Ln < Rn. Either m < n or n < m. If m < n, then Rm ⪅ Ln < Lm, and if m > n, then Rn < Rm ⪅ Ln. In either case we have a violation of the fact that Ln < Rn for all n.

Now suppose that for all n we have Rn < Ln. I now claim that (2) is true. For if (2) is false, we have Ln ⪅ Rm for some m ≠ n. Suppose m < n. Then Ln ⪅ Rm < Rn, a contradiction. And if n < m, then Lm < Ln ⪅ Rm, also a contradiction.

Let A(n) be the statement that for all m ≤ n we have Rm ⪅ Lm and for all m > n we have Lm < Rm. There is at most one n such that A(n). For suppose that we had A(n) and A(n′) and that n < n′. Then by A(n) we have Ln′ < Rn′ (since n′ > n), and by A(n′) we have Rn′ ⪅ Ln′ (since n′ ≤ n′), resulting in a contradiction.

Assume (1) and (2) are false. By what we have shown earlier and totality, if (2) is not true, there is an n such that Ln ⪅ Rn. Note that if Ln ⪅ Rn, then Lm < Ln ⪅ Rn < Rm for all m > n. Hence, either Ln ⪅ Rn is true for all n or else there is a smallest n for which it’s true. If it’s true for all n, then for all n we have Ln+1 < Ln ⪅ Rn < Rn+1, and hence for all n we have Ln < Rn, which we saw would imply (1), which we assumed to be false.

So suppose that n is the smallest integer for which Ln ⪅ Rn. Thus, Rm < Lm whenever m < n. Moreover, since Li is strictly decreasing and Ri is strictly increasing, we have Lm < Rn whenever m > n. There are now two possibilities. Either Ln < Rn or Ln ≈ Rn. If Ln ≈ Rn, then we have A(n) and the proof is complete. Suppose now that Ln < Rn. Then we have A(n − 1) and the proof is also complete.
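As a finite sanity check of the Lemma (using real-number surrogates of my own for the comparative probabilities), one can verify on a window of integers that exactly one switch-over point satisfies A(n):

```python
# Finite sketch (surrogates my own): L strictly decreasing, R strictly
# increasing over a window of integers; A(n) says R_m <= L_m for m <= n
# and L_m < R_m for m > n.  We check that exactly one n satisfies A(n).

def switch_points(L, R, ns):
    out = []
    for n in ns:
        if all(R(m) <= L(m) for m in ns if m <= n) and \
           all(L(m) < R(m) for m in ns if m > n):
            out.append(n)
    return out

ns = range(-50, 51)
L = lambda n: -n           # strictly decreasing
R = lambda n: n + 0.5      # strictly increasing
print(switch_points(L, R, ns))  # [-1]
```

With real-valued surrogates only case (3) can occur; cases (1) and (2) need order types that the reals cannot realize over all the integers, which is exactly why they look so arbitrary.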

Tuesday, July 14, 2020

Regularity and rotational invariance

Suppose that we have some sort of (not merely real-valued) probability assignment P to the Lebesgue measurable subsets of the unit circle Ω.

Theorem: Suppose that the probability values are rotationally invariant (P(A)=P(ρA) for any rotation ρ) and satisfy the two axioms:

  1. If A and B are disjoint, A and C are disjoint, and P(B)=P(C), then P(A ∪ B)=P(A ∪ C)

  2. P(Ω − A)=P(Ω − B) if and only if P(A)=P(B).

Then P(A)=P(∅) for every singleton A.

In other words, we cannot have regularity (non-empty sets having different probability from empty sets) if we have the additivity-type condition (1), the complement condition (2) and rotational invariance.

Proof: Fix an irrational number r and let B be the set of points at angles in degrees r, 2r, 3r, .... Let x0 be the point at angle 0. Then B and C = B ∪ {x0} are rotationally equivalent (you get the former from the latter by rotating by r degrees). So, P(B)=P(C). Let A = Ω − C. Then A and B are disjoint, as are A and C. Hence, P(A ∪ B)=P(A ∪ C) by axiom 1. But A ∪ C = Ω, so P(A ∪ B)=P(Ω). But A ∪ B = Ω − {x0}. So, P({x0}) = P(∅) by axiom 2. But all singletons are rotationally equivalent, so they all have the same measure.

This result is a variant of the results here.

Saturday, May 18, 2019

Regularity

Plausibly—though there are some set-theoretic worries that require some care if the language is rich enough—for a fixed language, there are only countably many situations we can describe. Consequently, we only need to do Bayesian epistemology for countably many events. But this solves the problem of regularity for uncountable sample spaces. For even if there are uncountably many events, only countably many are describable and hence matter, and they form a field (i.e., are closed under finite unions and complements) and:

Proposition: For any countable field F of subsets of a set Ω, there is a countably additive probability measure P on the power set of Ω such that every non-empty event in F has non-zero probability.

Proof: Let the non-empty members of F be u1, u2, .... Let a1, a2, ... be any sequence of positive numbers adding up to 1 (e.g., an = 2^−n). Choose one point xn ∈ un. Let P(A) = ∑n an1A(xn), where 1A(xn) is 1 if xn ∈ A and 0 otherwise.

Note that this proof uses the countable Axiom of Choice, but almost nobody is worried about that.
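The construction in the proof is short enough to run (the particular field and choice function below are my own toy example):

```python
from fractions import Fraction

# The Proposition's construction on a toy example: weight 2^-n on a chosen
# point x_n of the n-th non-empty member u_n of the field, re-normalized
# (needed here only because the toy field is finite, so the weights don't
# sum to 1).

def build_measure(field_members):
    weights = []
    total = Fraction(0)
    for n, u in enumerate(field_members, start=1):
        x = min(u)                  # the choice of x_n in u_n
        w = Fraction(1, 2 ** n)
        weights.append((x, w))
        total += w
    return lambda A: sum(w for x, w in weights if x in A) / total

# Non-empty members of a small field on Omega = {0,...,9}.
field = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}, set(range(10))]
P = build_measure(field)

assert P(set(range(10))) == 1
assert all(P(u) > 0 for u in field)                   # regularity on the field
assert P({0, 1, 2, 3, 4}) + P({5, 6, 7, 8, 9}) == 1  # additivity here
print("regular measure built from the field")
```

Here the choice function is just `min`, but over a general countable field one needs countable choice to pick the xn, which is the point of the remark above.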

Monday, May 13, 2019

A tweak to regularity

Let Gp be the law of gravitation that states that F = Gm1m2/r^p, for some real number p. There was a time when it was rational to believe G2. But here is a problem. When 0 < |p − 2| < 10^−100 (say), Gp is practically empirically indistinguishable from G2, in the sense that within the accuracy of our instruments it predicts exactly the same observations. Moreover, there are uncountably many values of p such that 0 < |p − 2| < 10^−100. This means that the prior probability for most (i.e., all but at most countably many) such values of p must have been 0. On the other hand, if the prior probability for G2 had been 0, then the posterior probability would have always stayed at 0 in our Bayesian updates (because the probability of our measurements conditionally on the denial of G2 never was 0, which it would have to have been to budge us from a zero prior).
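A rough calculation (my own numbers, and with an exponent gap far larger than 10^−100) shows how hopeless the discrimination is: the ratio of the G_p force to the G_2 force is r^(2−p), which differs from 1 by about |2 − p|·|ln r|.

```python
# Illustration (numbers my own): even at |p - 2| = 1e-10, the relative
# deviation of the force ratio r^(2-p) from 1 is tiny across distances
# from a millimetre to roughly the scale of the solar system.

dp = 1e-10
for r in (1e-3, 1.0, 1e13):           # metres
    rel_deviation = abs(r ** dp - 1)  # |r^(p-2) - 1| has the same magnitude
    print(f"r = {r:.0e} m: relative deviation {rel_deviation:.1e}")
```

All the deviations come out below 10^−8, far beyond any instrument’s relative accuracy, and at |p − 2| < 10^−100 the gap is unimaginably smaller still.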

So, G2 is exceptional in the sense that it has a non-zero prior probability, whereas most hypotheses Gp have zero prior probability. This embodies a radical preference for a more elegant theory.

Let N be the set of values of p such that the rational prior probability P(Gp) is non-zero. Then N contains at most countably many values of p. I conjecture that N is the set of all the real numbers that can be specifically defined in the language of mathematics (e.g., 2, 3.8, e^π and the smallest real root of z^7 + 3z^6 + 2z^5 + 7πz^3 − z + 18).

If this is right, then Bayesian regularity—the thesis that all contingent hypotheses should have non-zero probability—should be replaced by the weaker thesis that all contingent expressible hypotheses should have non-zero probability.

Note that all this doesn’t mean that we are a priori certain that the law of gravitation involves a mathematically definable exponent. We might well assign a non-zero probability to the disjunction of Gp over all non-definable p. We might even assign a moderately large non-zero probability to this disjunction.

Friday, August 10, 2018

Mathematical structures, physics and Bayesian epistemology

It seems that every mathematical structure (there are some technicalities as to how to define it) could metaphysically be the correct description of fundamental physical structure. This means that making Bayesianism be the whole story about epistemology—even for idealized agents—is a hopeless endeavor. For there is no hope for an epistemologically useful probability measure over the collection of all mathematical structures unless we rule out the vast majority of structures as having zero probability.

A natural law or divine command epistemology can solve this problem by requiring us to assign zero probability to some non-actual physical structures that are metaphysically possible but that our Creator wants us to be able to rule out a priori. In other words, our Creator can make us so that we only take epistemically seriously a small subset of the possibilia. This might help with the problem of scepticism, too.

Saturday, December 14, 2013

There is no regular approximately invariant finitely additive probability measure on all subsets of a cube or ball

For a totally ordered field K, say a hyperreal one, write x ≈ y (and say that x and y are approximately equal) provided that x − y is 0 or infinitesimal. A K-valued probability P defined for all subsets of Ω is said to be regular provided that P(A)>0 whenever A is non-empty. It is approximately rigid motion invariant provided that P(A)≈P(gA) for every rigid motion g and set A such that A ∪ gA ⊆ Ω. The following can be proved in Zermelo-Fraenkel (ZF) set theory without any Axiom of Choice:

Theorem 1. There is no totally ordered field K and a regular K-valued approximately rigid motion invariant finitely additive probability on all subsets of a ball or cube Ω.

If we delete "approximately", this follows from this.

The result follows from this post. Given such a regular measure we can define a preorder ≤ by letting A ≤ B if and only if P(A)≤P(B). By the Theorem from that post, it follows in ZF that Banach-Tarski is true. But Banach-Tarski implies that there is no approximately rigid motion invariant finitely additive probability on all subsets of a ball or cube.

(Why ball or cube? This saves me from having to worry about some edge effects given our definition of invariance.)

Another result, proved by similar methods:

Theorem 2. Let Ω be a subset of three-dimensional Euclidean space invariant under rotations about the origin 0. If K is a totally ordered field and P is a regular K-valued finitely additive probability on all subsets of Ω approximately invariant under rotations about the origin, then P({0})≈1.

Suppose now that we have a particle undergoing Brownian motion released at time t0 at the origin, and then observed at time t1. The probability of its being in some set at t1 should be at least approximately invariant under rotations, and of course it is unacceptable to say that the probability that it is at the origin is approximately one—on the contrary, with approximately unit probability it is going to be away from the origin.

Update: Similar things hold for full-conditional probabilities, where approximate invariance is replaced with invariance conditionally on the whole space (but there is no requirement of invariance conditionally on subsets of the space).

Saturday, November 23, 2013

The Axiom of Choice in some claims about probabilities

I spent the last week trying to get clear on the logical interconnections between a number of results about probabilities that are relevant to formal epistemology and that use a version of the Axiom of Choice in proof, such as:

  1. For every non-empty set Ω, there is an ordered field K and a K-valued finitely additive probability function that assigns non-zero probability to every non-empty subset of Ω.
  2. For every non-empty set Ω, there is a full finitely additive conditional probability on Ω (i.e., a Popper function with all non-empty subsets normal).
  3. The Banach-Tarski Paradox holds: one can decompose a three-dimensional ball into a finite number of pieces that can be moved around and made into two balls of the same size.
  4. There are Lebesgue non-measurable sets in the unit interval [0,1].
All of these results require some version of the Axiom of Choice. It turns out that there is a very simple map of their logical interconnections in Zermelo-Fraenkel (ZF) set theory:
  • BPI→(1)→(2)→(3)→(4),
where BPI is the Boolean Prime Ideal theorem, a weaker version of the Axiom of Choice.

The proof from BPI to (1) is standard--just let K be an ultrapower of the reals with an appropriate ultrafilter. That from (1) to (2) is almost immediate: just define the conditional probabilities via the ratio formula and take the standard part. Pawlikowski's proof of Banach-Tarski easily adapts to use (2) (officially, he uses Hahn-Banach). Finally, Foreman and Wehrung show in ZF that every subset of Rn is Lebesgue measurable iff every subset of [0,1] is. But it follows from (3) that not every subset of R3 is Lebesgue measurable.

This has important consequences. Without the Axiom of Choice, one can prove that either (a) there are sets that have no regular probabilities no matter what ordered field is chosen for the values and no full conditional probabilities, or (b) the Banach-Tarski Paradox holds and hence there are no rigid-motion-invariant probabilities on regions of three-dimensional space big enough to hold a ball. And in either case, Bayesianism has a problem.

Wednesday, September 4, 2013

Something positive about Bayesian regularity

The brunt of a lot of my recent posts has been that there is no hope for Bayesian regularity if one requires natural invariance conditions. But here is a positive result. For this result, we will need the values of the probabilities to be taken in a very special space, a variant of a space defined by Dos Santos. We now define this space. Let I be a totally ordered set under ≤. Let R(I) be the set of monotone non-increasing functions f from I to [0,∞] with the property that either f(x)=0 for all x or there is a unique (!) member i of I such that 0<f(i)<∞. Note that R(I) is itself a totally ordered set under pointwise comparison and it has a natural pointwise addition operation that respects the ordering. You can think of R(I) as very much like a set of non-negative hyperreals in which numbers whose ratio is infinitesimally close to 1 are identified.
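To make the definition concrete, here is a small sketch that models R(I) for a finite index set I = {0 < 1 < ... < n−1}, with elements stored as tuples (the finite I and the tuple encoding are my illustrative assumptions; the post allows any totally ordered I):

```python
import math

INF = math.inf

def valid(v):
    """Check v encodes an element of R(I): a non-increasing map
    I -> [0, INF] that is identically 0 or has exactly one index
    where the value is strictly between 0 and INF."""
    nonincreasing = all(v[i] >= v[i + 1] for i in range(len(v) - 1))
    finite_levels = sum(1 for x in v if 0 < x < INF)
    return nonincreasing and (finite_levels == 1 or all(x == 0 for x in v))

def add(v, w):
    """Pointwise addition, as in the post."""
    return tuple(x + y for x, y in zip(v, w))

def leq(v, w):
    """Pointwise comparison; on valid elements this order is total."""
    return all(x <= y for x, y in zip(v, w))
```

For instance, (INF, 1, 0) sits "infinitely above" (3, 0, 0): leq((3, 0, 0), (INF, 1, 0)) holds but not the converse, and adding the smaller to the larger leaves the larger unchanged, which is the hyperreal-like behavior described above.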

It is fairly easy to see that it follows from Proposition 1.7 of Armstrong that if G is a supramenable group and X is any space acted on by G, then there is an I and a finitely additive measure P on all subsets of X with values in R(I) that is strictly positive in the sense that P(B)=0 if and only if B is the empty set. Moreover, we can normalize P into something like a probability by supposing that I has a final element, call it 1, and requiring that P(X)(1)=1.

In particular, there will be a strictly positive R(I)-valued finitely additive measure on the circle and the line, invariant under isometries. But there is no such measure in dimensions greater than one, because of Banach-Tarski-type paradoxes.

Fact: There is a natural correspondence between real-valued Popper functions on X that make every non-empty subset normal and strictly positive finitely additive R(I)-valued measures. It's easy to see how this correspondence goes in one direction. Suppose we have such a strictly positive measure P. We want to define P(A|B) for some non-empty B. Choose the unique i in I such that P(B)(i) is in (0,∞) and then define P(A|B)=P(A∩B)(i)/P(B)(i). Moreover, the Popper function will be strongly invariant (P(gA|B)=P(A|B) if gA and A are subsets of B) if and only if the corresponding R(I)-valued measure is invariant.
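Here is a minimal, self-contained sketch of that direction of the correspondence, with R(I)-values modeled as tuples over a toy two-element I. The measure P on subsets of {'a', 'b'} is hypothetical, chosen so that {'b'} is "infinitely more likely" than {'a'}:

```python
import math

INF = math.inf

# Hypothetical toy measure: values are non-increasing tuples over I = {0 < 1}.
# {'b'} lives at level 1 "infinitely above" {'a'}, which lives at level 0.
P = {
    frozenset(): (0, 0),
    frozenset({'a'}): (1, 0),
    frozenset({'b'}): (INF, 1),
    frozenset({'a', 'b'}): (INF, 1),   # pointwise sum of the two singletons
}

def level(v):
    """The unique i with 0 < v[i] < INF (None for the zero element)."""
    for i, x in enumerate(v):
        if 0 < x < INF:
            return i
    return None

def cond(A, B):
    """P(A|B) = P(A ∩ B)(i) / P(B)(i), at the level i of P(B)."""
    i = level(P[frozenset(B)])
    return P[frozenset(A & B)][i] / P[frozenset(B)][i]
```

Then cond({'a'}, {'a', 'b'}) comes out 0 while cond({'a'}, {'a'}) comes out 1: conditioning on the "infinitesimal" event {'a'} reads off a deeper level of the tuple, which is exactly the Popper-function behavior the Fact describes.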

For epistemological purposes, this is a move in the happy direction, but the fact that nothing like this can work in Euclidean settings in higher dimensions is a problem.

Note that P as above will be regular in the weak sense that 0<P(A) if A is non-empty but typically not in the strong sense that if A is a proper subset of B, then P(A)<P(B).

Saturday, August 17, 2013

Regularity on the circle

Suppose that a point is uniformly chosen on the circumference of the circle T. Write A≤B for "the point is at least as likely to be in B as in A" and say A<B when A≤B but not B≤A. Here are some very plausible axioms:

  1. If A≤B and B≤C, then A≤C. (Transitivity)
  2. A≤A. (Reflexivity)
  3. Either A≤B or B≤A (or both). (Totality)
  4. If A is a proper subset of B, then A<B. (Regularity)
Moreover, we have a very plausible invariance condition:
  5. If r is any reflection in a line going through the center of the circle T and A≤B, then rA≤rB,
i.e., the probability comparison holds between A and B if and only if it holds between their reflections.

Proposition. There is no relation ≤ satisfying (1)-(5) for all countable subsets A, B and C of T.

I do not as yet know if the Proposition is true if we replace reflections by rotations in (5).

Totality and/or Regularity should go. Other cases suggest to me that both should go.

Proof of Proposition: Suppose ≤ satisfies (1)-(5). Say that A~B if and only if A≤B and B≤A. It is easy to see that ~ is transitive, since ≤ is transitive, and that if A~B then rA~rB. Now observe that rA~A. For either rA≤A or A≤rA by Totality. If rA≤A then A=r²A≤rA (the square of a reflection is the identity). If A≤rA then rA≤r²A=A. In both cases, thus, A~rA.

Therefore, if A≤B, then rA~A≤B and so rA≤B. Now, any rotation can be written as the composition of a pair of reflections (a rotation by angle θ equals the composition of reflections in lines subtending angle θ/2). Thus, for every rotation r, if we have A≤B, then we have rA≤B and rA≤rB. It follows easily that A<B if and only if rA<B.
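The reflection-composition fact can be checked numerically with 2×2 matrices (a standalone sanity check of the geometric step, not part of the proof):

```python
import math

def reflection(phi):
    """Matrix of reflection in the line through the origin at angle phi."""
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return [[c, s], [s, -c]]

def rotation(theta):
    """Matrix of rotation about the origin by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

theta = 1.234
# Reflect in the x-axis, then in the line at angle theta/2
# (lines subtending angle theta/2), yielding rotation by theta:
composed = matmul(reflection(theta / 2), reflection(0.0))
target = rotation(theta)
assert all(abs(composed[i][j] - target[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```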

Now, let r be a rotation by an angle which is an irrational number of degrees and let x0 be any point on the circle. Let A be the set {x0, rx0, r²x0, r³x0, ...}. Observe that rA = {rx0, r²x0, r³x0, r⁴x0, ...} is a proper subset of A (x0 is not equal to rⁿx0 for any positive integer n, as r was a rotation by an irrational number of degrees). Thus, rA<A by Regularity. Thus, A<A, which is a contradiction.
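The proper-subset phenomenon the proof exploits is easy to check numerically for finitely many orbit points (a sanity check with the arbitrary choice of a rotation by √2 degrees; the proof itself rests on the irrationality argument, not on computation):

```python
import math

# Rotation by sqrt(2) degrees, i.e. by sqrt(2)/360 of a full turn.
step = math.sqrt(2) / 360
N = 1000
# First N points of the orbit of x0 = 0, as fractions of a full turn:
orbit = [(n * step) % 1.0 for n in range(N)]
# No point recurs, so in particular x0 is not in rA,
# and rA = A minus {x0} is a proper subset of A.
assert len(set(round(x, 9) for x in orbit)) == N
```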

Wednesday, January 2, 2013

More on qualitative probability and regularity

For probabilities that assign numerical values, regularity is normally defined as the claim that P(A)>0 whenever A is non-empty. Given finite additivity, this entails that if A is a proper subset of B, then P(A)<P(B). If instead of assigning probability values we deal with qualitative probabilities—i.e., probability comparisons—we now have a choice of how to define regularity:

  • the probability comparison ≤ is weakly regular provided that ∅<A whenever A is non-empty
  • the probability comparison ≤ is strongly regular provided that A<B whenever A is a proper subset of B.

If one assumes the axiom:

  • (Additivity, de Finetti) If A∩C=∅ and B∩C=∅, then A≤B if and only if A∪C≤B∪C,
then weak regularity entails strong regularity (if A is a proper subset of B, then ∅<B∖A by weak regularity, so A=A∪∅<A∪(B∖A)=B by Additivity).

In a recent post, I showed that there is no rotationally invariant strongly regular qualitative probability defined on all countable subsets of the circle. But perhaps there is a useful weakly regular one that does not satisfy Additivity?

I don't know. Here's a start. Let P be the Bernstein-Wattenberg hyperreal-valued measure on the circle. Define A≤B if and only if for some rotations r and s we have P(rA)≤P(sB). Define A<B if and only if not B≤A. Then ≤ is weakly regular and rotationally invariant. But I can't prove that it's transitive. Is it?

Saturday, December 29, 2012

Qualitative probabilities, regularity and nonmeasurable sets

Normally, regularity is formulated as saying that P(A)>0 for every non-empty A. But suppose that instead of working with numerical probability assignments, we work with qualitative probabilities, i.e., probability comparisons. Thus, instead of saying B is at least as likely as A provided that P(B)≥P(A), we might take the relation of being at least as likely as to be primitive, and then give axioms.

Given a theory of qualitative probabilities, it will be possible to define an equiprobability relation ~ such that we can say A~B if and only if A and B are equiprobable. (The typical way would be to say that A~B provided that B is at least as likely as A and A is at least as likely as B.) This relation ~ will satisfy some axioms, but we actually won't need them for the argument. We shall suppose that ~ is defined on some collection of subsets of a sample space, which we will call the measurable sets. Our setup generalizes classical probabilities, as well as hyperreal probabilities, since if we have probability-values, we can say that A~B if and only if P(A)=P(B).

We can plausibly formulate regularity in terms of an equiprobability relation:

  • An equiprobability relation ~ is regular if and only if whenever A and B are measurable sets such that A is a proper subset of B, then we do not have A~B.
Now suppose that our sample space is (the circumference of) a circle. Then:
  • An equiprobability relation ~ is rotation-invariant if and only if whenever A and B are measurable sets such that B is a rotation of A, then A~B.

Now, we know that given the Axiom of Choice, and given classical probabilities, there is no way of defining probabilities for all subsets of our circle in a rotation-invariant way. Surprisingly, but very simply, if we assume regularity, we need neither classical probability—any equiprobability relation will do—nor the Axiom of Choice. In fact, we will have a countable nonmeasurable set, so when we add regularity to the mix, we have to sacrifice the measurability of sets that are unproblematically measurable using classical measures.

Theorem: There is no equiprobability relation ~ such that (a) all countable subsets of the circle are measurable; (b) the relation is regular; and (c) the relation is rotation-invariant.

Proof: Let u be any irrational number. Let B be the set of all points on the circle at angles 2πnu (to some fixed axis, say the x-axis), for positive integers n. Let A be the rotation of B by the angle 2πu. Then A is a proper subset of B (A consists of the points on the circle at angles 2πnu for n an integer greater than one, and by the irrationality of u these do not include the point at angle 2πu). So if we had regularity, we couldn't have A~B. But if we had rotation-invariance, we would have to have A~B. ∎

The above proof is based on the counterintuitive fact that there is a subset of the circle, i.e., B, that can be rotated to form a proper subset of itself, i.e., A. (This reminds me of the Sierpinski-Mazurkiewicz paradox and other cases of paradoxical decomposition, though it's much more trivial.)

This is, of course, a trivial modification of the Bernstein and Wattenberg inspired argument here.

Tuesday, December 11, 2012

Uniform measure and nonmeasurable sets, without the Axiom of Choice

Given the Axiom of Choice, there is no translation invariant probability measure on the interval [0,1) (the relevant translation is translation modulo 1). But this fact really does need something in the way of the Axiom of Choice. Moreover, the fact only obtains for countably additive measures. Interestingly, however, if we add the assumption that our measure assigns non-zero (presumably infinitesimal) weight to each point of [0,1), then the non-existence of a translation invariant finitely additive measure follows without the Axiom of Choice. I got the proof of this from Paul Pedersen who thinks he got it from the classic Bernstein and Wattenberg piece (I don't have their paper at hand). I am generalizing trivially.

Theorem: Let P be any finitely additive measure taking values in a partially ordered group G and defined on a collection of subsets of [0,1) that includes every countable subset. Suppose P({x})>0 for some x in [0,1). Then P is not translation invariant (modulo 1).

Proof: To obtain a contradiction, suppose P is translation invariant. Then P({x})>0 for every x in [0,1). Let r be any irrational number in (0,1), and let R be the set of numbers of the form nr modulo 1, as n ranges over the positive integers. Let R' be the set of numbers of the form nr modulo 1, as n ranges over the integers greater than 1. Then R' is a translation of R by r, modulo 1. Observe that r is not a member of R' since there is no natural number n greater than 1 such that r=nr modulo 1, since if there were, we would have (n−1)r=0 modulo 1, and hence r would be a rational number with denominator n−1. Thus by finite additivity P(R)=P(R')+P({r})>P(R'). Hence, R is a counterexample to translation invariance, contradicting our assumption.
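The combinatorial heart of the proof—that R' is the translate of R by r and yet r itself never recurs in R'—can be checked numerically for finitely many terms (a sanity check with the arbitrary irrational r = √2 − 1; the proof rests on irrationality, not computation):

```python
import math

r = math.sqrt(2) - 1          # an irrational number in (0, 1)
N = 10_000
# First N elements of R = {n*r mod 1 : n >= 1}; R' starts at n = 2.
R  = [(n * r) % 1.0 for n in range(1, N + 1)]
Rp = [((n + 1) * r) % 1.0 for n in range(1, N + 1)]
# r itself does not appear among the checked elements of R':
assert all(abs(x - r) > 1e-9 for x in Rp)
# and R' is (term by term) the translate of R by r, modulo 1:
assert all(abs(((x + r) % 1.0) - y) < 1e-9 for x, y in zip(R, Rp))
```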

Note 1: On the assumption that the half-open intervals are all measurable and the measurable sets form an algebra (the standard case), translation invariance modulo 1 follows from ordinary translation invariance within the interval, namely the condition that P(A)=P(A+x) whenever both A and A+x={y+x:y in A} are subsets of [0,1).

Note 2: The proof above shows that if P({x})>0 for every x in [0,1), then the set of all positive integral multiples of any fixed irrational number (modulo 1) is nonmeasurable. It is interesting to note that this nonmeasurable set is actually measurable using standard Lebesgue measure. Thus, by enforcing regularity using infinitesimals, one is making some previously measurable sets nonmeasurable if one insists on translation invariance.

Note 3: Bernstein and Wattenberg construct a hyperreal valued measure that is almost translation invariant: the difference between the measure of a set and of a translation of the set is infinitesimal.

Sunday, December 18, 2011

Bayesianism and regularity

Take regularity as the thesis that the rational agent assigns a probability of 0 only to impossible propositions and a probability of 1 only to necessary propositions. Bayesians like regularity in large part because regularity allows them to prove convergence theorems. These convergence theorems say that if you start with a regular probability assignment, and keep on gathering evidence, your probability assignments will converge to the truth. Here, a probability assignment for p "converges to the truth" provided that if p is true, then one's credences converge to 1, and if p is false, then one's credences converge to 0.
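As a toy illustration of the kind of convergence at issue (my own example, not any particular theorem the Bayesians invoke): an agent with a uniform prior over two hypothetical coin biases updates on flips of a coin whose true bias is 0.7, and the posterior for the true hypothesis heads to 1:

```python
import math
import random

random.seed(0)
true_bias, alt_bias = 0.7, 0.3   # the two candidate hypotheses
log_odds = 0.0                   # log posterior odds; 0 = uniform prior
for _ in range(2000):
    heads = random.random() < true_bias
    if heads:
        log_odds += math.log(true_bias / alt_bias)
    else:
        log_odds += math.log((1 - true_bias) / (1 - alt_bias))
posterior = 1.0 / (1.0 + math.exp(-log_odds))
assert posterior > 0.999   # credence in the true hypothesis nears 1
```

Of course, a sufficiently unlucky run of evidence (say, mostly tails) would push the posterior the wrong way, and that possibility is precisely what the argument below turns on.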

But they cannot use this argument for regularity. For consider the proposition Cp: "If you keep on gathering evidence in manner M, your probability assignment for p will converge to the truth" (take that as a material conditional). The kinds of convergence theorems that the Bayesians like in fact show that P(Cp)=1.[note 1] And that's why the Bayesians like these theorems. They give us confidence of convergence. But now notice that these very convergence theorems are incompatible with regularity. For it is clear that Cp is not a necessary truth. Just as it is possible to get an infinite run of heads (it's no less likely than any other infinite sequence) when tossing a coin, it's possible to have an infinite run of misleading evidence.

In summary, one of the main reasons Bayesians like regularity is that it yields convergence theorems. But the convergence theorems are not compatible with regularity. Oops. Not only do the convergence theorems refute regularity, but they are supposed to be the main motivation for regularity.

In email discussion, a colleague from another institution suggested that the regularist Bayesian might instead try to assign probability 1−e to Cp, where e is an infinitesimal. I don't have a proof that this can't work for the particular convergence theorems they're using, but I can show that it won't work for the strong Law of Large Numbers, and since the convergence theorems they're using are akin to the strong Law of Large Numbers, I don't hold out much hope for this approach.