Tuesday, December 6, 2011

A variant puzzle about probabilities and infinities

A being you know for sure to be completely reliable on what it says tells you that the universe will contain an infinite sequence of people who can be totally ordered by time of conception. The being also says that the people can be totally ordered by their distance from the universe's center (center of mass? or maybe the universe has some symmetries that define a center) at the time of their conception. Finally, the being tells you:

  1. If you order the people by time of conception, the sequence looks like this: 99 people who will die of cancer, then one person who won't, then 99 people who will die of cancer, then one person who won't, and so on.
  2. If you order the people by distance of conception from the center of the universe, the sequence looks like this: 99 people who won't die of cancer, then one person who will, then 99 people who won't, then one who will, and so on.
This is a consistent set of information.
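
To see why the two claims are consistent, note that both the cancer group and the non-cancer group are countably infinite, so the very same people can be enumerated either way. The sketch below, with person indices and a particular re-enumeration that I am supplying purely for illustration, computes the running frequency of cancer deaths under each enumeration.

```python
from itertools import count, islice

# People are indexed by conception order: n = 0, 1, 2, ...
# Ordering (1): 99 people who die of cancer, then one who doesn't, repeating.
def dies_of_cancer(n):
    return n % 100 != 99

# Ordering (2): enumerate the very same people in a different order, so that
# the pattern becomes 99 who don't die of cancer, then one who does, repeating.
# This is possible because both subpopulations are countably infinite.
def distance_order():
    cancer = (n for n in count() if dies_of_cancer(n))
    no_cancer = (n for n in count() if not dies_of_cancer(n))
    while True:
        for _ in range(99):
            yield next(no_cancer)
        yield next(cancer)

N = 100_000
freq_by_conception = sum(dies_of_cancer(n) for n in range(N)) / N
freq_by_distance = sum(dies_of_cancer(n) for n in islice(distance_order(), N)) / N
print(freq_by_conception, freq_by_distance)   # roughly 0.99 and 0.01
```

The two running frequencies settle near 99% and 1%, even though exactly the same infinite population appears in both enumerations.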

Question: What probability should you assign to the hypothesis that you will die of cancer?

If you just had (1), you'd probably say: 99%. If you just had (2), you'd probably say 1%. So, do we just average these and say 50%?

Now imagine you just have (1), and no information about how things look when ordered by distance of conception from the center of the universe. Then you know that there are infinitely many ways of imposing an ordering on the people in the universe. Further, you know that among these infinitely many orderings, there are just as many where the sequence looks like the one in (2) as there are where it looks like the one in (1). Why should the ordering by time of conception take priority over all of these other orderings?

An obvious answer is that the ordering in (1) is more natural, less gerrymandered, than most of the infinitely many orderings you can impose on the set of all people. But I wonder why naturalness matters for probabilities. Suppose there are presently infinitely many people in the universe and when you order them by present distance from the center of the universe, you get the sequence in (1). That seems a fairly natural ordering, though maybe a bit less so than the pure time-of-conception ordering. But now imagine a different world where the very same people, with the very same cancer outcomes, are differently arranged in respect of distance from the center of the universe, so you get the sequence in (2). Why should the probabilities of death by cancer be different between these two worlds?

So what to do? Well, the options seem to me to be:

  3. Dig heels in and insist that the natural orderings count for more. And where results with several natural orderings conflict, you do a weighted average, weighted by naturalness. And ignore worries about worlds where things are rearranged.
  4. Deny that there could be infinitely many people in the world, even successively, perhaps by denying the possibility of an infinite past, of a simultaneous infinity, and of the reality of the future.
  5. Deny that probabilities can be assigned in cases where the relevant sample space is countably infinite and there are infinitely many cases in each class.
I find (4) implausible. That leaves (3) and (5). I don't know which one is better. I worry about (3): I don't know if it's defensible or not. Now, if (5) is the only option left, then I think we get the interesting result that if we live in an infinite multiverse, we can't do statistical scientific work. But since statistical work is essential to science, it follows that if we live in an infinite multiverse, science is undercut. And hence one cannot rationally infer that we live in an infinite multiverse on scientific grounds.

18 comments:

  1. I really have no idea what the answer is, but here is a tentative defense of (3). With an infinite data set (and a few other conditions) someone gerrymandering an order can come up with any result they like. If we are not too sure about such a person's motives, then we cannot draw any inferences about probabilities from the ordering. There is too much threat of a "design argument" to explain the ordering, rather than just the natural fallout of the way things are.

    To say that an ordering is "natural" is roughly to say that no such design argument applies. That *may* give us more rational confidence in drawing inferences from the ordering.

  2. This can perhaps be made into a good argument against the Multiverse hypothesis.

    By the way, as an alternative to the trilemma at the end - perhaps one could argue (referring to 4) that some types of infinite sets are implausible. For instance, it seems to me that while Aquinas thought that some causally successive chains of causes must have a logical stopper, there are other sets (such as a set of events which are related to each other only 'accidentally') that could be infinite. The trick is really to give some account of the distinction between the kinds of sets which can be infinite and the kinds for which positing an infinite such set would be offensive to reason. I suspect that this will be largely informed by intuition, and thus it is difficult to make out a very good argument for this distinction, but it seems to me that it is at least plausible.

  3. This seems to me to be along the same lines as the Bertrand paradox.

    I suspect that "objective" (i.e. ontological as opposed to epistemic) probabilities only make sense in the framework of the symmetries of the physical laws governing the system in question. In classical statistical thermodynamics, for instance, the so-called ergodic hypothesis and Liouville's theorem can be used to show that the Boltzmann probabilities in the equations of statistical thermodynamics must be based on a phase-space-uniform distribution, thus (in that case) solving the problem of how to avoid invoking an unspecified and ambiguous notion of "naturalness" of the distribution. Similarly, for a more "everyday" example, consider the rolling of dice. If we assume an approximate physical symmetry between the faces rather than just an epistemic "principle of indifference" symmetry, then we can "borrow" the phase-space measure and use it for our "objective" probabilities. I think it is plausible that other cases are like this as well.

    Disclaimer: Much of the above is my personal opinion or speculation only.

  4. Nightvid:

    You can also get objective probabilities out of things with objective tendencies, like quantum measurements.

  5. (4) seems pretty plausible to me, because an infinite collection or series of objects standing in causal relations to one another does seem to pose a number of difficulties. I don't think these difficulties exist when one posits causally unconnected objects, like isolated worlds.

  6. A. Pruss,

    I find it odd that anyone would invoke quantum probabilities in this sort of discussion (although you are not the first one I have seen doing just that), because, ironically, quantum mechanics violates the axioms of probability (this has been shown via clever experiments on Bell's inequality). However, in the special case that the off-diagonal elements of the density matrices vanish, one can recover classical probability theory from quantum dynamics.
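
    For what it's worth, here is one minimal way to read the remark about off-diagonal elements, for a single qubit; the state and the measurement basis are illustrative choices of mine, not anything from the Bell experiments.

    ```python
    import numpy as np

    # A qubit in the state |psi> = (|0> + |1>)/sqrt(2), measured in the
    # basis |+/-> = (|0> +/- |1>)/sqrt(2).  (Illustrative choices only.)
    ket0 = np.array([1.0, 0.0])
    ket1 = np.array([0.0, 1.0])
    plus = (ket0 + ket1) / np.sqrt(2)

    rho_coherent = np.outer(plus, plus)             # off-diagonal elements = 1/2
    rho_decohered = np.diag(np.diag(rho_coherent))  # off-diagonal elements zeroed

    def prob_plus(rho):
        # Born rule: P(+) = <+| rho |+>
        return float(plus @ rho @ plus)

    # Classical reasoning (law of total probability over "which basis state"):
    # P(+) = rho_00 * |<+|0>|^2 + rho_11 * |<+|1>|^2 = 1/2.
    print(prob_plus(rho_coherent))   # 1.0 -- disagrees with the classical 1/2
    print(prob_plus(rho_decohered))  # 0.5 -- agrees once the off-diagonals vanish
    ```

    With the coherences present, the Born-rule value disagrees with the classical law-of-total-probability calculation; once the off-diagonals are zeroed, the two agree.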

  7. Can you give me a reference to alleged violation of the Kolmogorov axioms? Thanks!

    What Nightvid refers to is controversial. Here is a paper from 1996 arguing that Kolmogorov must be replaced with quantum probability. Here is a paper disputing the claim. Also, this paper by Pitowsky is very pretty, though I'm not entirely sure what conclusion he reaches vis-a-vis the relationship between Kolmogorov and the Aspect experiments. Finally, this recent paper lays out a lot of the more recent debate and argues that the incompatibility amounts to this: you can't have a Kolmogorov probability space defined for all of the random variables that you want in Aspect experiments. That isn't a violation of the Kolmogorov axioms, though; it's a fact about what things in the physical world have complete probabilistic models.

  9. Thanks for the references.

    I don't work in philosophy of quantum mechanics, so what I say below may be dubious.

    I had a look at the 1996 paper and got confused at the authors' description of the Aspect experiment:

    "In this experiment a random choice out of two different polarization measurements was performed on each side ..., giving rise to four random variables P1:=P1(alpha), P2:=P2(alpha2) and Q1:=Q1(beta1), Q2:=Q2(beta2), two of which are measured and compared at each trial." (p.5)

    So, of four "random variables", two are measured. But how in quantum mechanics can there be these other two unmeasured random variables?! I thought it was the point of the Copenhagen interpretation to deny that there is such a thing.

    Maybe, though, the way to read the paper is as follows:
    P1: 1 iff positive result measured on left at alpha1, 0 otherwise
    P2: 1 iff positive result measured on left at alpha2, 0 otherwise
    Q1: 1 iff positive result measured on right at beta1, 0 otherwise
    Q2: 1 iff positive result measured on right at beta2, 0 otherwise

    Now the random variables are fully defined.

    But that's not what the authors mean. For on page 6, the authors say: P[P(alpha)=Q(beta)=1] = (1/2) sin^2(alpha-beta). But that's just false if we define the random variables as I suggest. For P(alpha)=1 and Q(beta)=1 can only happen when the left measurement is at alpha and the right measurement is at beta, and that only happens at most 1/4 of the time. So the left hand side is at most 1/4, while the right hand side can go up to 1/2, and so the equation doesn't fit the random variables as I've defined them.

    I haven't checked that one can find a consistent probability space for the random variables I proposed. But I see no reason why not.
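
    Here is a quick numerical check of the 1/4 point, under two assumptions I am adding purely for the sake of the sketch: each detector's setting is chosen uniformly and independently on every trial, and, conditional on the settings, "both results positive" occurs with the quoted probability (1/2) sin^2(alpha - beta).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    alpha = [0.0, np.pi / 4]            # hypothetical left settings
    beta = [np.pi / 8, 3 * np.pi / 8]   # hypothetical right settings

    N = 200_000
    i = rng.integers(0, 2, N)           # left setting chosen each trial
    j = rng.integers(0, 2, N)           # right setting chosen each trial
    a, b = np.take(alpha, i), np.take(beta, j)

    # Conditional on the settings, "both results positive" with prob (1/2) sin^2(a - b)
    both_positive = rng.random(N) < 0.5 * np.sin(a - b) ** 2

    # P1 = 1 iff the left detector was at alpha_1 AND gave a positive result, etc.
    p1_and_q1 = (i == 0) & (j == 0) & both_positive
    print(p1_and_q1.mean())                       # stays below 1/4 (indeed below 1/8)
    print(0.5 * np.sin(alpha[0] - beta[0]) ** 2)  # the quoted value, which can reach 1/2
    ```

    So, under those added assumptions, the unconditional probability that P1 and Q1 are both 1 is capped by the probability of the setting pair, whatever the angles are.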

  10. Of the four random variables, two are measured in each trial of the experiment. They are measured that way because we don't know how to measure all of the variables at once. (And this is what causes the trouble, as the Pitowsky paper makes clear.)

    So, what's happening is this. We have two detectors, each of which has two allowed settings (alpha_1, alpha_2, beta_1, and beta_2). Hence, we have four cases: <alpha_1, beta_1>, <alpha_1, beta_2>, <alpha_2, beta_1>, and <alpha_2, beta_2>. Under each case, we run a bunch of trials. The probabilities that the authors write down are effectively conditional probabilities: conditional on the settings for the detectors.

    I am going to simplify the notation and use "alpha_1" as the variable that takes either 0 or 1 depending on whether a given particle-thingy is detected or not.

    So, when the authors write down P(alpha_i = beta_j) = sin^2(alpha_i - beta_j), they are saying that in trials where the left detector is set to alpha_i and the right detector is set to beta_j, the probability that the two detectors agree is given by the sin^2 term. Both sides of the equation range over [0,1].

    Now, it is a theorem of probability that for any four 0-1-valued random variables, X1, ..., X4, on the same probability space, P(X1=X3) <= P(X1=X4) + P(X2=X3) + P(X2=X4). But this inequality is violated for some settings of the detectors.
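
    Here is a small check of that last claim, using the agreement probability sin^2(alpha_i - beta_j) from the previous paragraph; the particular angles below are my own illustrative choice (any sufficiently small theta in this pattern gives a violation).

    ```python
    import numpy as np

    theta = np.pi / 12                    # 15 degrees; an illustrative choice
    alpha = [0.0, 2 * theta]              # left-detector settings
    beta = [3 * theta, theta]             # right-detector settings

    def p_agree(a, b):
        # the quantum prediction quoted above for P(detectors agree | settings a, b)
        return np.sin(a - b) ** 2

    lhs = p_agree(alpha[0], beta[0])
    rhs = (p_agree(alpha[0], beta[1])
           + p_agree(alpha[1], beta[0])
           + p_agree(alpha[1], beta[1]))
    print(lhs, rhs, lhs <= rhs)           # 0.5, ~0.20, False: the classical bound fails
    ```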

    Again, the trouble is that we have four separate samplings: one from each case. Each of the samples was generated by a separate experimental run. Under the assumption that the samples all come from the same stable population, the Kolmogorov axioms are violated. But at that point, we are free (I think) to either (a) reject the axioms or (b) reject the assumption that the samples all come from the same stable population. In other words, we may reject the assumption that all of the random variables are defined in the same probability space. I prefer option (b), but as I indicated, what to say at that point is somewhat controversial.

    If that doesn't make sense yet, I really recommend having a look at the Pitowsky paper.

  11. Nuts ... I forgot about html tags ... again ...

    The cases should be <alpha_1, beta_1>, <alpha_1, beta_2>, <alpha_2, beta_1>, <alpha_2, beta_2>.

    Sorry for the typo!

  12. "when the authors write down P(alpha_i = beta_j) = sin^2(alpha_i - beta_j), they are saying that in trials where the left detector is set to alpha_i and the right detector is set to beta_j, the probability that the two detectors agree is given by the sin^2 term"

    But described like that, it's a conditional probability, i.e., conditional on the dials being set as described, and so the unconditional probability notation is misleading and non-standard. The proper notation is something like:

    (*) P[P(alpha_i) = Q(beta_j) | L(alpha_i)&R(beta_j)] = sin^2(alpha_i - beta_j)

    where L(alpha_i) and R(beta_j) are respectively the events of the left detector being set to alpha_i and the right being set to beta_j.

    But with this clarification, Bell's inequality becomes inapplicable. For while one can (trivially) generalize Bell's inequality to work for conditional probabilities, it only works when the conditional probabilities all have the same condition. And here, they don't. sin^2(alpha_1-beta_1) is equal to a probability conditioned on L(alpha_1)&R(beta_1), sin^2(alpha_1-beta_2) is equal to a probability conditioned on L(alpha_1)&R(beta_2), and so on.
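
    To make the point concrete, here is a toy classical probability space, entirely of my own construction, in which the mixed-condition version of the inequality fails. The sample point is just the pair of detector settings, each pair with probability 1/4, and all four random variables are defined everywhere.

    ```python
    from itertools import product
    from fractions import Fraction

    # Hypothetical outcome assignments that depend on which settings were chosen.
    def P1(w): return 1
    def P2(w): return 1
    def Q1(w): return 1 if w == (1, 1) else 0
    def Q2(w): return 0

    space = list(product([1, 2], repeat=2))          # the four setting pairs
    prob = {w: Fraction(1, 4) for w in space}

    def cond_prob(event, condition):
        num = sum(prob[w] for w in space if condition(w) and event(w))
        den = sum(prob[w] for w in space if condition(w))
        return num / den

    lhs = cond_prob(lambda w: P1(w) == Q1(w), lambda w: w == (1, 1))
    rhs = (cond_prob(lambda w: P1(w) == Q2(w), lambda w: w == (1, 2))
           + cond_prob(lambda w: P2(w) == Q1(w), lambda w: w == (2, 1))
           + cond_prob(lambda w: P2(w) == Q2(w), lambda w: w == (2, 2)))
    print(lhs, rhs)   # 1 vs 0: Kolmogorov is perfectly happy with this
    ```

    So the mixed-condition inequality is not a theorem of probability; it needs an extra assumption tying the outcomes to a source that does not depend on the settings.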

    "Under the assumption that the samples all come from the same stable population..."

    I don't fully understand what "stable population" means here. The measurements affect the outcomes, and due to entanglement, do so at a distance. Is that compatible with the stability?

  13. Yes, I think that's right about the conditional probabilities. That's why I said that the probabilities the authors write down are conditional probabilities, conditional on the settings of the detectors.

    What I think they are doing is using a kind of short-hand. When they write "P(P1(alpha)=1)=p," for example, they are writing a short-hand of "P(P1=1 | D=alpha)=p," where D is the setting on the detector. The alphas and betas are not the units in this case, so they are not actually the arguments of the random variables. Those are being suppressed in the notation -- though they are mentioned earlier on.

    I'll have to think more about what difficulties the description raises.

    And again, yes ... I think. By "stable" I just mean to be flagging that the population of statistical units is not changing its character from one measurement event to the next. If you think that the measurements change the population in some interesting way, I don't see how you get the contradictions out in the end. But I could be wrong about that.

    The theorem only requires that the four random variables be defined in the same probability space. So, you can set aside my gloss in terms of populations of statistical units if it helps.

  14. It's fine if they suppress the conditions, but they use Bell's Inequality whose proof requires either that all four probabilities be unconditional or that they all be conditional on the same condition.

    In other words, Bell's Inequality says:
    P[P1=Q1|D] <= P[P1=Q2|D] + P[P2=Q1|D] + P[P2=Q2|D]
    for any condition D.

    But what they need for their incompatibility argument is not this, but instead:
    P[P1=Q1|L1R1] <= P[P1=Q2|L1R2] + P[P2=Q1|L2R1] + P[P2=Q2|L2R2],
    where L1R1 means the first dial is set to alpha1 and the second to beta1, and so on.

    So, to sum up:

    1. On the natural unconditional probability interpretation, they are assuming that there are unmeasured variables.

    2. On my modified interpretation on which the variables indicate actual measurements that have been made, their formulas are false.

    3. On a conditional probability interpretation, Bell's Inequality is inapplicable.

    "If you think that the measurements change the population in some interesting way"

    Isn't it one of the basic lessons of QM under the Copenhagen interpretation that measurement leads to collapse of the wavefunction, and hence a real change in the system?

  15. Here is a fourth interpretation:

    4. P(alpha1) is the probability of the counterfactual: were the alpha1 measurement to be made on the left, the result would be positive.

    But it is precisely the lesson of Bell's Inequality that the truth of such counterfactuals is a dubious matter. This dubiousness does not reflect at all badly on probability theory. Probability theory needs conditional probabilities. But conditional probabilities are not the same as probabilities of conditionals--David Lewis and others have conclusively shown the Adams thesis (which said that they are the same) to be false.
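
    For readers who want the gist, here is a compressed sketch of one of Lewis's triviality arguments, under the usual extra assumption that the identity P(A -> B) = P(B|A) is supposed to survive conditionalizing on B and on not-B; it is only a sketch, not Lewis's own formulation.

    ```latex
    % Suppose P(A \to B) = P(B \mid A) holds for P and for P(\cdot \mid B) and
    % P(\cdot \mid \neg B), and that P(A \wedge B) > 0 and P(A \wedge \neg B) > 0. Then:
    \begin{align*}
    P(A \to B) &= P(A \to B \mid B)\,P(B) + P(A \to B \mid \neg B)\,P(\neg B) \\
               &= P(B \mid A \wedge B)\,P(B) + P(B \mid A \wedge \neg B)\,P(\neg B) \\
               &= 1 \cdot P(B) + 0 \cdot P(\neg B) = P(B).
    \end{align*}
    % So P(B \mid A) = P(B) for all such A and B, which trivializes the thesis.
    ```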

  16. No, I don't think that response is quite right. The fact that the probabilities are conditional doesn't make any difference here.

    I don't want to repeat the write-up, but take a look at the response paper that I linked earlier for a derivation of the inequality using conditional probabilities where the conditioning events are all distinct. (The derivation is on pages 5-6.)

  17. The form of the inequality that Gill derives in his paper is the second one you mention: the one that you say is needed for the incompatibility result. Maybe he has done something wrong in his derivation, but if so, I don't see it.

  18. The Gill piece in its derivation assumes (page 5) that X=g(A,Z) and Y=g(B,Z), where Z is independent of A and B. I see no reason grounded in classical probability alone to suppose that the dependencies between A, B, X and Y are thus arranged.

    Oh, I see: Gill on page 7 criticizes precisely this assumption. :-)

    I actually think this criticism of his may be equivalent to mine, but it doesn't really matter.
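
    For contrast with the toy space sketched earlier in the thread, here is a quick simulation of what that assumption buys. The hidden variable Z, the settings, and the response functions g and h below are all my own arbitrary choices (I also give the two sides separate functions, though nothing turns on that); the only structural point is that Z is independent of A and B, so all four outcome variables live on one probability space and the Bell-type bound holds.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    N = 400_000

    Z = rng.random(N)             # hidden variable, independent of the settings
    A = rng.integers(1, 3, N)     # left setting, 1 or 2, chosen freely
    B = rng.integers(1, 3, N)     # right setting, 1 or 2, chosen freely

    def g(a, z):                  # left outcome depends only on (left setting, Z)
        return (z > 0.2 * a).astype(int)

    def h(b, z):                  # right outcome depends only on (right setting, Z)
        return (z < 0.9 - 0.1 * b).astype(int)

    X, Y = g(A, Z), h(B, Z)

    def p_agree(a, b):
        mask = (A == a) & (B == b)
        return np.mean(X[mask] == Y[mask])

    lhs = p_agree(1, 1)
    rhs = p_agree(1, 2) + p_agree(2, 1) + p_agree(2, 2)
    print(round(lhs, 3), round(rhs, 3))   # ~0.6 vs ~1.2: the Bell-type bound holds
    ```

    If g and h were instead allowed to see both settings, which is in effect what the toy counterexample earlier in the thread does, the bound could fail; so the work is being done by that dependency structure, not by the Kolmogorov axioms themselves.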
