There is a physical constant T. You know nothing about it except that it is a positive real number. However, you can do an experiment in the lab. This experiment generates a number t which is uniformly distributed between 0 and T, and successive runs of the experiment are stochastically independent.
Suppose the experiment is run once, and you find that t=0.7. How should you estimate T? More exactly, what subjective probability distribution should you assign to T? This is difficult to solve by standard Bayesian methods, because obviously either your prior for T should be a uniform distribution on the positive reals, or your prior for the logarithm of T should be a uniform distribution on all the reals. (I actually think the second is more natural, but I did the calculations below only for the first case. Sorry.) The problem is that there is no uniform probability measure on the positive reals or on the reals. (Well, we can have finitely additive measures, but those measures will be non-unique, and anyway won't help with this problem.)
So perhaps the conclusion we should draw from this is that you don't learn anything about T when you find out that t=0.7, other than the deductive fact that T is at least 0.7. But this is not quite right. For suppose you keep on repeating the experiment. If you draw a point at each measured value of t, you will eventually get a dotted line running from 0 on the left to some number on the right, and that right-hand endpoint will be a pretty good estimate of the value of T, not just a lower bound for it. But if the first piece of data only gives a lower bound, then, by similar reasoning, further pieces of data either will be irrelevant (if they give a t that's less than 0.7) or will only give better (i.e., higher) lower bounds for T, and we'll never get an estimate for T, just a lower bound.
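The creeping right-hand endpoint is easy to check by simulation. Here is a minimal sketch (the value T = 2.5 and the function name are my own illustrative choices): the largest draw always stays below T, but it closes in on T as the number of runs grows.

```python
import random

def run_experiment(T, n, seed=0):
    """Simulate n independent draws, each uniform on (0, T)."""
    rng = random.Random(seed)
    return [rng.uniform(0, T) for _ in range(n)]

T = 2.5  # the "unknown" constant, known only to the simulation
for n in [1, 10, 100, 10000]:
    draws = run_experiment(T, n)
    # The right-hand end of the dotted line creeps up toward T.
    print(n, max(draws))
```

With one draw the maximum is merely a lower bound; with many draws it pins T down, which is why the first datum ought to carry more than deductive information.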
So, the first piece of data should give something. (The reasoning here is inspired by a sentence I overheard someone—I can't remember who—say to John Norton, perhaps in the case of Doomsday.)
Now here is something a little bit fun. We might try to calculate the distribution for T after n experiments in the following way. First assume that T is uniformly distributed between 0 and L (where L is large enough that all the t measurements fit between 0 and L), then calculate a conditional distribution for T given the t measurements, and finally take the limit as L tends to infinity. Interestingly, this procedure fails if n=1, i.e., if we have only one measurement of a t value: the resulting limiting distribution is zero everywhere. However, if n>1, then the procedure converges to a well-defined distribution for T. Or so my very rough sketches show.
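That limiting calculation can be checked numerically. With a uniform prior on (0, L], the likelihood of n draws whose maximum is m is T^(-n) for T ≥ m (and 0 otherwise), so the posterior density at a point x is x^(-n) divided by the integral of u^(-n) from m to L. In the sketch below (the function name is mine), the density at a fixed point drains away to zero as L grows when n = 1, but when n ≥ 2 it converges, to (n−1)·m^(n−1)·x^(-n).

```python
import math

def posterior_density(x, ts, L):
    """Posterior density of T at x, given draws ts and a uniform prior on (0, L].

    The likelihood of the draws is T**(-n) for T >= max(ts), else 0.
    """
    n = len(ts)
    m = max(ts)
    if not (m <= x <= L):
        return 0.0
    # Normalising constant: integral of u**(-n) du from m to L.
    if n == 1:
        Z = math.log(L / m)
    else:
        Z = (m**(1 - n) - L**(1 - n)) / (n - 1)
    return x**(-n) / Z

# One measurement: the density at a fixed point dies away as L grows.
for L in [10.0, 1e4, 1e8]:
    print(L, posterior_density(1.0, [0.7], L))

# Two measurements: the density converges to (n-1) * m**(n-1) * x**(-n) = 0.7.
for L in [10.0, 1e4, 1e8]:
    print(L, posterior_density(1.0, [0.7, 0.3], L))
```

Note that only the largest draw enters the limit: with n ≥ 2 the limiting density is a Pareto-type distribution supported on [m, ∞).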
So there is a radical difference here between what we get with one measurement—no distribution—and what we get with two or more measurements—a well-defined distribution. I have doubts about whether standard Bayesian confirmation can make sense of this.
I'm no probability theorist (I've an interest in the meaning of 'probability'), but I find that reasoning intuitively implausible (and a good place to start). You are getting dots spread out randomly over a finite line interval (as you later say). The first dot gives you a lower bound (clearly). With two dots you have a very faint (and fuzzy) picture of a line, whether the second was nearer 0 or not. I think you are right about the radical difference. With two dots, the first dot is not just giving you a lower bound, even if the second dot is nearer to 0; whence I wonder whether the first dot was not an even fainter (and fuzzier) picture of a line. It does seem (to me) that if the line to T were very much longer than twice the line to the first point, then getting that point would have been very unlikely; which seems like the sort of informal logic (or reasoning) that lies behind seeing lots of dots as a line, and also like that behind Bayesianism (?)
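The commenter's closing intuition, that observing a first point near 0.7 would have been unlikely if T were much larger, is the likelihood at work: the chance that a single uniform draw lands at or below x is x/T. A small sketch (the function name is mine):

```python
def prob_draw_at_most(x, T):
    """Chance that a single uniform draw on (0, T) lands at or below x."""
    return min(x / T, 1.0)

# If T were far larger than the first observed point (0.7), a draw that
# small or smaller would have been improbable, which is what favours
# values of T not too far above the data.
for T in [0.7, 1.4, 7.0, 70.0]:
    print(T, prob_draw_at_most(0.7, T))
```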