
Friday, May 24, 2024

Three or four ways to implement Bayesianism

We tend to imagine a Bayesian agent as starting with some credences, “the ur-priors”, and then updating those credences as the observations come in. It’s as if there were a book of credences in the mind, constantly erased and re-written as the observations arrive. When we ask the Bayesian agent for their credence in p, they search through the credence book for p and read off the number written beside it.

In this post, I will assume the ur-priors are “regular”: i.e., everything contingent has a credence strictly between zero and one. I will also assume that observations are always certain.

Still, the above need not be the right model of how Bayesianism is actually implemented. Another way is to have a book of ur-priors in the mind, and an ever-growing mental book of observations. When you ask such a Bayesian agent what their credence in p is, they look, on the spot, at their book of ur-priors and their book of observations, and then calculate the posterior for p.

The second way is not very efficient: you are constantly recalculating, and you need an ever-growing memory store for all the accumulated evidence. If you were making a Bayesian agent in software, the ever-changing credence book would be more efficient.

But here is an interesting way in which the second way would be better. Suppose you came to conclude that some of your ur-priors were stupid, through some kind of an epistemic conversion experience, say. Then you could simply change your ur-priors without rewriting anything else in your mind, and all your posteriors would automatically be computed correctly as needed.

In the first approach, if you had an epistemic conversion, you’d have to go back and reverse-engineer all your priors, and fix them up. Unfortunately, some priors will no longer be recoverable. From your posteriors after conditionalizing on E, you cannot recover your original priors for situations incompatible with E. And yet knowing what these priors were might be relevant to rewriting all your priors, including the ones compatible with E, in light of your conversion experience.

Here is a third way to implement Bayesianism that combines the best of the two approaches. You have a book of ur-priors and a book of current credences. You update the latter in ordinary updates. In case of an epistemic conversion experience, you rewrite your book of ur-priors, and conditionalize on the conjunction of all the propositions that you currently have credence one in, and replace the contents of your credence book with the result.
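Here is a minimal sketch in Python of how the third implementation might be coded up, using a toy finite space of worlds (the class name, the weather worlds and the numbers are all just illustrative assumptions):

    # A toy model of the third implementation: one book of ur-priors over a small
    # finite set of "worlds" and one book of current credences. Ordinary updates
    # touch only the current book; a conversion rebuilds it from new ur-priors.
    class ThirdWayAgent:
        def __init__(self, ur_priors):
            # ur_priors: dict mapping each world to a probability (regular: all > 0)
            self.ur_priors = dict(ur_priors)
            self.credences = dict(ur_priors)  # the current credence book

        def credence(self, proposition):
            # A proposition is modeled as a set of worlds.
            return sum(p for w, p in self.credences.items() if w in proposition)

        def observe(self, evidence):
            # Ordinary Bayesian update: conditionalize the current book on the evidence.
            total = self.credence(evidence)
            self.credences = {w: (p / total if w in evidence else 0.0)
                              for w, p in self.credences.items()}

        def convert(self, new_ur_priors):
            # Epistemic conversion: replace the ur-priors and conditionalize them on
            # the conjunction of everything currently held with credence one, i.e.,
            # on the set of worlds that still have positive credence.
            live = {w for w, p in self.credences.items() if p > 0}
            total = sum(p for w, p in new_ur_priors.items() if w in live)
            self.ur_priors = dict(new_ur_priors)
            self.credences = {w: (p / total if w in live else 0.0)
                              for w, p in new_ur_priors.items()}

    # Example: learn that it is raining, then revise the ur-priors to be uniform.
    agent = ThirdWayAgent({("rain", "cold"): 0.4, ("rain", "warm"): 0.1,
                           ("dry", "cold"): 0.2, ("dry", "warm"): 0.3})
    agent.observe({("rain", "cold"), ("rain", "warm")})
    agent.convert({w: 0.25 for w in agent.ur_priors})
    print(agent.credence({("rain", "cold")}))  # 0.5 under the new ur-priors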

We’re not exactly Bayesian agents. Insofar as we approximate being Bayesian agents, I think we’re most like the agents of the first sort, the ones with one book which is ever rewritten. This makes epistemic conversions more difficult to conduct responsibly.

Perhaps we should try to make ourselves a bit more like Bayesian agents of the third sort by keeping track of our epistemic history—even if we cannot go all the way back to ur-priors. This could be done with a diary.

Thursday, February 2, 2023

Rethinking priors

Suppose I learned that all my original priors were consistent and regular but produced by an evil demon bent upon misleading me.

The subjective Bayesian answer is that since consistent and regular original priors are not subject to rational evaluation, I do not need to engage in any radical uprooting of my thinking. All I need to do is update on this new and interesting fact about my origins. I would probably become more sceptical, but all within the confines of my original priors, which presumably include such things as the conditional probability that I have a body given that I seem to have a body but there is an evil demon bent upon misleading me.

This answer seems wrong. So much the worse for subjective Bayesianism. A radical uprooting would be needed. It would be time to sit back, put aside preconceptions, and engage in some fallibilist version of the Cartesian project of radical rethinking. That project might be doomed, but it would be my only hope.

Now, what if instead of the evil demon, I learned that a random process independent of truth was the ultimate origin of my priors? I think the same thing would be true. It would be a time to be brave and uproot it all.

I think something similar is true piece by piece, too. I have a strong moral intuition that consequentialism is false. But suppose that I learned that when I was a baby, a mad scientist captured me and flipped a coin with the plan that on heads a high prior in anti-consequentialism would be induced and on tails it would be a high prior in consequentialism instead. I would have to rethink consequentialism. I couldn’t just stick with the priors.

Saturday, December 17, 2022

Variation in priors and community epistemic goods

Here is a hypothesis:

  • It is epistemically better for the human community if human beings do not all have the same (ur-) priors.

This could well be true because differences in priors lead to a variety of lines of investigation, a greater need for effort in convincing others, and less danger of the community as a whole getting stuck in a local epistemic optimum. If this hypothesis is true, then we would have an interesting story about why it would be good for our community if a range of priors were rationally permissible.

Of course, that it would be good for the community if some norm of individual rationality obtained does not prove that the norm obtains.

Moreover, note that it is very plausible that what range of variation of priors is good for the community depends on the species of rational animal we are talking about. Rational apes like us are likely more epistemically cooperative than rational sharks would be, and so rational sharks would benefit less from variation of priors, since for them the good of the community would be closer to just the sum of the individual goods.

But does epistemic rationality care about what is good for the community?

I think it does. I have been trying to defend a natural law account of rationality on which just as our moral norms are given by what is natural for the will, our epistemic norms are given by what is natural for our intellect. And just as our will is the will of a particular kind of deliberative animal, so too our intellect is the intellect of a particular kind of investigative animal. And we expect a correlation between what a social animal’s nature impels it to do and what is good for the social animal’s community. Thus, we expect a degree of harmony between the norms of epistemic rationality—which on my view are imposed by the nature of the animal—and the good of the community.

At the same time, the harmony need not be perfect. Just as there may be times when the good of the community and the good of the individual conflict in respect of non-epistemic flourishing, there may be such conflict in epistemic flourishing.

I am grateful to Anna Judd for pointing me to a possible connection between permissivism and natural law epistemology.

Wednesday, April 21, 2021

Is it permissible to fix cognitive mistakes?

Suppose I observe some piece of evidence, attempt a Bayesian update of my credences, but make a mistake in my calculations and update incorrectly. Suppose that by luck, the resulting credences are consistent and satisfy the constraint that the only violations of regularity are entailed or contradicted by my evidence. Then I realize my mistake. What should I do?

The obvious answer is: go back and correct my mistake.

But notice that going back and correcting my mistake is itself a transition between probabilities that does not follow the Bayesian update rule, and hence a violation of the standard Bayesian update rule.

To think a bit more about this, let’s consider how this plays out on subjective and objective Bayesianisms. On subjective Bayesianism, the only rational constraints are consistency, the Bayesian update rule and perhaps the constraint that the only violations of regularity are entailed or contradicted by my evidence. My new “mistaken” credences would have been right had I started with other consistent and regular priors. So there is nothing about my new credences that makes them in themselves rationally worse than the ones that would have resulted had I done the calculation right. The only thing that went wrong was the non-Bayesian transition. And if I now correct the mistake, I will be committing the rational sin of non-Bayesian transition once again. I have no justification for that.

Moreover, the standard arguments for Bayesian update apply just as much now in my new “mistaken” state: if I go back and correct my mistake, I will be subject to a diachronic Dutch Book, etc.

So, I should just stick to my guns, wherever they now point.

This seems wrongheaded. It sure seems like I should go back and fix my mistake. This, I think, shows that there is something wrong with subjective Bayesianism.

What about objective Bayesianism? Objective Bayesianism adds to the consistency, update and (perhaps) regularity restrictions in subjective Bayesianism some constraints on the original priors. These constraints may be so strict that only one set of original priors counts as permissible or they may be permissive enough to allow a range of original priors. Now note that the standard arguments for Bayesian update still apply. It looks, thus, like correcting my mistake will be adding a new rational sin to the books. And so it seems that the objective Bayesian also has to say that the mistake should not be fixed.

But this was too quick. For it might be that my new “mistaken” posteriors are such that given my evidential history they could not have arisen from any permissible set of original priors. If so, then it’s like my being in possession of stolen property—I have posteriors that I simply should not have—and a reasonable case can be made that I should go back and fix them. This fix will violate Bayesian update. And so we need to add an exception to the Bayesian update rules: it is permissible to engage in a non-Bayesian update in order to get to a permissible credential state, i.e., a credential state that could have arisen from a permissible set of priors given one’s evidential history. This exception seems clearly right. For imagine that you are the mythical Bayesian agent prior to having received any evidence—all you have are your original priors, and no evidence has yet shown up. Suddenly you realize that your credences violate the objective rules on what the priors should be. Clearly you should fix that.

Thus, the objective Bayesian does have some room for justifying a “fix mistakes” exception to the Bayesian update rule. That exception will still violate the standard arguments for Bayesian update, and so we will have to say something about what’s wrong with those arguments—perhaps the considerations they give, while having some force, do not override the need for one’s credences to be such that they could be backtracked to permissible original priors.

Considerations of mistakes give us reasons to prefer objective Bayesianism to subjective Bayesianism. But the objective Bayesian is not quite home free. Consider first the strict variety where there is only one permissible set of original priors. We have good empirical reason to think that there are about as many sets of original priors as there are people on earth. And on the strict version of objective Bayesianism, at most one of these sets of original priors is permissible. Thus it’s overwhelmingly unlikely that my original priors are permissible. Simply fixing my last mistake is very unlikely to move me to a set of posteriors that are correct given the unique set of permissible original priors and my evidential history. So it’s a matter of compounding one rational sin—my mistake—with another, without fixing the underlying problem. Maybe I can have some hope that fixing the mistake gets me closer to having posteriors that backtrack to the unique permissible original priors. But this is not all that clear.

What about permissive objective Bayesianism? Well, now things depend on our confidence that our original priors were in fact permissible and that no priors that generate our new “mistaken” posteriors given our evidential history would have been permissible. If we have a high enough confidence in that, then we have some reason to fix the mistake. But given the obvious fact that human beings so often reason badly, it seems unlikely that my original priors were in fact permissible—if Bayesianism is objective, we should believe in the “original cognitive sin” of bad original priors. Perhaps, just as I speculated on strict objective Bayesianism, we have some reason to hope that our actual original priors were closer to permissible than any priors that would generate our new “mistaken” posteriors. Perhaps.

So every kind of Bayesian has some difficulties with what to do given a miscalculation. Objective Bayesians have some hope of having an answer, but only if they have some optimism in our actual original priors being not too far from permissibility.

It is interesting that the intuition that we should fix our “mistaken” posteriors leads to a rather “Catholic” view of things: although doubtless there is original cognitive sin in our original priors, these priors are sufficiently close to permissibility that cognitive repairs make rational sense. We have depravity of priors, but not total depravity.

Friday, May 24, 2019

Improving on Solomonoff priors

Let’s say that we want prior probabilities for data that can be encoded as a countably infinite binary sequence. Generalized Solomonoff priors work as follows: We have a language L (in the original setting, it’ll be based on Turing machines) and we generate random descriptions in L in a canonical way (e.g., add an end-of-string symbol to L and randomly and independently generate symbols until you hit the end-of-string symbol, and then conditionalize on the string uniquely describing an infinite binary sequence). Typically the set of possible descriptions in L is countable and we get a nice well-defined probability measure on the space of all countably infinite binary sequences, which favors those sequences that are simpler in the sense of being capable of a simpler encoding.
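As a toy illustration of the canonical random-description process (a two-symbol alphabet plus an end-of-string marker, purely for the sake of the example):

    import random

    # Emit i.i.d. uniform symbols until the end-of-string symbol appears. Each
    # particular description d then has probability (1/3)**(len(d)+1), so shorter
    # (simpler) descriptions are exponentially more probable. A Solomonoff-style
    # prior would further conditionalize on d uniquely describing an infinite
    # binary sequence in the language L.
    def random_description(alphabet="01", end="$"):
        out = []
        while True:
            c = random.choice(alphabet + end)
            if c == end:
                return "".join(out)
            out.append(c)

    print(random_description())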

Here is a serious problem with this method. Let N be the set of all binary sequences that cannot be uniquely described in L. Then the method assigns prior probability zero to N, even though most sequences are in N. In particular, this means that if we get an L-indescribable sequence—and most sequences generated by independent coin tosses will be like that—then no matter how much of it we observe, we will be almost sure of the false claim that the sequence is L-describable.

Here, I think, is a better solution. Use a language L that can give descriptions of subsets of the space Ω of countably infinite binary sequences. Now our (finitely additive) priors will be generated as follows. Choose a random string of symbols in L and conditionalize on the string giving a unique description of a subset. If the subset S happens to be measurable with respect to the standard (essentially Lebesgue) measure on infinite binary sequences (i.e., the coin toss measure), then randomly choose a point in S using a finitely additive extension of the standard measure to all subsets of S. If the subset S is not measurable, then randomly choose a point in S using any finitely additive measure that assigns probability zero to all singletons.

For a reasonable language L, the resulting measure gives a significant probability to an unknown binary sequence being indescribable. For Ω itself will typically be easily described, and so there will be a significant probability p that our random description of a subset will in fact describe all of Ω, and the probability that we have an indescribable sequence will be at least p.

It wouldn’t surprise me if this is in the literature.

Tuesday, February 19, 2019

Conciliationism and natural law epistemology

Suppose we have a group of perfect Bayesian agents with the same evidence who nonetheless disagree. By definition of “perfect Bayesian agent”, the disagreement must be rooted in differences in priors between these peers. Here is a natural-sounding recipe for conciliating their disagreement: the agents go back to their priors, they replace their priors by the arithmetic average of the priors within the group, and then they re-update on all the evidence that they had previously got. (And in so doing, they lose their status as perfect Bayesian agents, since this procedure is not a Bayesian update.)

Since the average of consistent probability functions is a consistent probability function, we maintain consistency. Moreover, the recipe is a conciliation in the following sense: whenever the agents previously all agreed on some posterior, they still agree on it after the procedure, and with the same credence as before. Whenever the agents disagreed on something, they now agree, and their new credence is strictly between the lowest and highest posteriors that the group assigned prior to conciliation.

Here is a theory that can give a justification for this natural-sounding procedure. Start with natural law Bayesianism which is an Aristotelian theory that holds that human nature sets constraints on what priors count as natural to human beings. Thus, just as it is unnatural for a human being to be ten feet tall, it is unnatural for a human being to have a prior of 10−100 for there being mathematically elegant laws of nature. And just as there is a range of heights that is natural for a mature human being, there is a range of priors that is natural for the proposition that there are mathematically elegant laws.

Aristotelian natures, however, are connected with the actual propensities of the beings that have them. Thus, humans have a propensity to develop a natural height. Because of this propensity, an average height is likely to be a natural height. More generally, for any numerical attribute governed by a nature of kind K, the average value of that attribute amongst the Ks is likely to be within the natural range. Likely, but not certain. It is possible, for instance, to have a species whose average weight is too high or too low. But it’s unlikely.

Consequently, we would expect that if we average the values of the prior for a given proposition q over the human population, the average would be within the natural range for that prior. Moreover, as the size of a group increases, we expect the average value of an attribute over the group to approach the average value the attribute has in the full population. Then, if I am a member of the group of disagreeing evidence-sharing Bayesians, it is more likely that the average of the priors for q amongst the members of the group lies within the natural human range for that prior for q than it is that my own prior for q lies within the natural human range for q. It is more likely that I have an unnatural height or weight than that the average in a larger group is outside the natural range for height or weight.

Thus, the prior-averaging recipe is likely to replace priors that are defectively outside the normal human range with priors within the normal human range. And that’s to the good rationally speaking, because on a natural law epistemology, the rational way for humans to reason is the same as the normal way for humans to reason.

It’s an interesting question how this procedure compares to the procedure of simply averaging the posteriors. Philosophically, there does not seem to be a good justification of the latter. It turns out, however, that typically the two procedures give very similar results. For instance, I had my computer randomly generate 100,000 pairs of four-point prior probability spaces, and compare the result of prior- to posterior-averaging. The average of the absolute value of the difference in the outputs was 0.028. So the intuitive, but philosophically unjustified, averaging of posteriors is close to what I think is the more principled averaging of priors.
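Since the post does not specify how the evidence was modeled in that experiment, the following is only a guessed-at reconstruction in which both agents conditionalize on the same two-point subset of the four-point space; it is meant merely to show how prior-averaging and posterior-averaging can be compared numerically.

    import random

    def random_prior(n=4):
        ws = [random.random() for _ in range(n)]
        total = sum(ws)
        return [w / total for w in ws]

    def conditionalize(prior, event):
        total = sum(prior[i] for i in event)
        return [prior[i] / total if i in event else 0.0 for i in range(len(prior))]

    def average(p, q):
        return [(a + b) / 2 for a, b in zip(p, q)]

    event = {0, 1}  # the shared evidence: the true point is 0 or 1
    diffs = []
    for _ in range(100_000):
        p1, p2 = random_prior(), random_prior()
        via_priors = conditionalize(average(p1, p2), event)       # average priors, then update
        via_posteriors = average(conditionalize(p1, event),
                                 conditionalize(p2, event))       # update, then average posteriors
        diffs.append(abs(via_priors[0] - via_posteriors[0]))

    print(sum(diffs) / len(diffs))  # small on average, though the two procedures are not identical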

The procedure also has an obvious generalization from the case where the agents share the same evidence to the case where they do not. What’s needed is for the agents to make a collective list of all their evidence, replace their priors by averaged priors, and then update on all the items in the collective list.

Friday, February 23, 2018

Wobbly priors and posteriors

Here’s a problem for Bayesianism and/or our rationality that I am not sure what exactly to do about.

Take a proposition that we are now pretty confident of, but which was highly counterintuitive so our priors were tiny. This will be a case where we were really surprised. Examples:

  1. Simultaneity is relative

  2. Physical reality is indeterministic.

Let’s say our current level of credence is 0.95, but our priors were 0.001. Now, here is the problem. Currently we (let’s assume) believe the proposition. But if our priors were 0.0001, our credence would have been only 0.65, given the same evidence, and so we wouldn’t believe the claim. (Whatever the cut-off for belief is, it’s clearly higher than 2/3: nobody should believe on tossing a die that they will get 4 or less.)

Here is the problem. It’s really hard for us to tell the difference in counterintuitiveness between 0.001 and 0.0001. Such differences are psychologically wobbly. If we just squint a little differently when looking mentally a priori at (1) and (2), our credence can go up or down by an order of magnitude. And when our priors are even lower, say 0.00001, then an order of magnitude difference in counterintuitiveness is even harder to distinguish—yet an order of magnitude difference in priors is what makes the difference between a believable 0.95 posterior and an unbelievable 0.65 posterior. And yet our posteriors, I assume, don’t wobble between the two.

In other words, the problem is this: it seems that the tiny priors have an order of magnitude wobble, but our moderate posteriors don’t exhibit a corresponding wobble.

If our posteriors were higher, this wouldn’t be a problem. At a posterior of 0.9999, an order of magnitude wobble in priors results in a wobble between 0.9999 and 0.999, and that isn’t very psychologically noticeable (except maybe when we have really high payoffs).
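The arithmetic behind these numbers is just the fact that, with the evidence held fixed, changing the prior multiplies the prior odds, and hence the posterior odds, by the same factor. A small sketch:

    # Hold the Bayes factor fixed and swap in a different prior.
    def odds(p):
        return p / (1 - p)

    def prob(o):
        return o / (1 + o)

    def reweight(posterior, old_prior, new_prior):
        bayes_factor = odds(posterior) / odds(old_prior)
        return prob(odds(new_prior) * bayes_factor)

    print(reweight(0.95, 0.001, 0.0001))    # about 0.65: an order of magnitude of wobble destroys belief
    print(reweight(0.9999, 0.001, 0.0001))  # about 0.999: the same wobble is barely noticeable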

There is a solution to this problem. Perhaps our priors in claims aren’t tiny just because the claims are counterintuitive. It makes perfect sense to have tiny priors for reasons of indifference. My prior in winning a lottery with a million tickets and one winner is about one in a million, but my intuitive wobbliness on the prior is less than an order of magnitude (I might have some uncertainty about whether the lottery is fair, etc.). But mere counterintuitiveness should not lead to such tiny priors. The counterintuitive happens all too often! So, perhaps, our priors in (1) and (2) were, or should have been, more like 0.10. And now perhaps the wobble in the priors will be rather less: it might vary between 0.05 and 0.15, which will result in a less noticeable wobble, namely between 0.90 and 0.97.

Simple hypotheses like (1) and (2), thus, will have at worst moderately low priors, even if they are quite counterintuitive.

And here is an interesting corollary. The God hypothesis is a simple hypothesis—it says that there is something that has all perfections. Thus even if it is counterintuitive (as it is to many atheists), it still doesn’t have really tiny priors.

But perhaps we are irrational in not having our posteriors wobble in cases like (1) and (2).

Objection: When we apply our intuitions, we generate posteriors, not priors. So our priors in (1) and (2) can be moderate, maybe even 1/2, but then when we updated on the counterintuitiveness of (1) and (2), we got something small. And then when we updated on the physics data, we got to 0.95.

Response: This objection is based on a merely verbal disagreement. For whatever wobble there is in the priors on the account I gave in the post will correspond to a similar wobble in the counterintuitiveness-based update in the objection.

Thursday, February 22, 2018

In practice priors do not wash out often enough

Bayesian reasoning starts with prior probabilities and gathers evidence that leads to posterior probabilities. It is occasionally said that prior probabilities do not matter much, because they wash out as evidence comes in.

It is true that in the cases where there is convergence of probability to 0 or to 1, the priors do wash out. But much of our life—scientific, philosophical and practical—deals with cases where our probabilities are not that close to 0 or 1. And in those cases priors matter.

Let’s take a case which clearly matters: climate change. (I am not doing this to make any first-order comment on climate change.) The 2013 IPCC report defines several confidence levels:

  • virtually certain: 99-100%

  • very likely: 90-100%

  • likely: 66-100%

  • about as likely as not: 33-66%

  • unlikely: 0-33%

  • very unlikely: 0-10%

  • exceptionally unlikely: 0-1%.

They then assess that a human contribution to warmer and/or more frequent warm days over most land areas was “very likely”, and no higher confidence level occurs in their policymaker summary table SPM.1. Let’s suppose that this “very likely” corresponds to the middle of its confidence range, namely a credence of 0.95. How sensitive is this “very likely” to priors?

On a Bayesian reconstruction, there was some actual prior probability p0 for the claim, which, given the evidence, led to the posterior of (we’re assuming) 0.95. If that prior probability had been lower, the posterior would have been lower as well. So we can ask questions like this: How much lower than p0 would the prior have had to be for…

  • …the posterior to no longer be in the “very likely” range?

  • …the posterior to fall into the “about as likely as not” range?

These are precise and pretty simple mathematical questions. The Bayesian effect of evidence is purely additive when we work with log likelihood ratios instead of probabilities, i.e., with log p/(1 − p) in place of p, so a difference in prior log likelihood ratios generates an equal difference in posterior ones. We can thus get a formula for what kinds of changes of priors translate to what kinds of changes in posteriors. Given an actual posterior of q0 and an actual prior of p0, to have got a posterior of q1, the prior would have to have been (1 − q0)p0q1/[(q1 − q0)p0 + (1 − q1)q0], or so says Derive.

We can now plug in a few numbers, all assuming that our actual confidence is 0.95:

  • If our actual prior was 0.10, to leave the “very likely” range, our prior would have needed to be below 0.05.

  • If our actual prior was 0.50, to leave the “very likely” range, our prior would have needed to be below 0.32.

  • If our actual prior was 0.10, to get to the “about as likely as not” range, our prior would have needed to be below 0.01.

  • If our actual prior was 0.50, to get to the “about as likely as not range”, our prior would have needed to be below 0.09.
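Here is a short sketch that recovers these threshold priors from the formula above (q0 is the actual posterior, p0 the actual prior, and q1 the target posterior at the boundary of the relevant range):

    def prior_needed(p0, q0, q1):
        # The prior that would have led to posterior q1, given that p0 led to q0.
        return (1 - q0) * p0 * q1 / ((q1 - q0) * p0 + (1 - q1) * q0)

    for p0, q1 in [(0.10, 0.90), (0.50, 0.90), (0.10, 0.66), (0.50, 0.66)]:
        print(p0, q1, round(prior_needed(p0, 0.95, q1), 2))
    # prints 0.05, 0.32, 0.01 and 0.09 respectively, matching the four bullets above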

Now, we don’t know what our actual prior was, but we can see from the above that variation of priors well within an order of magnitude can push us out of the “very likely” range and into the merely “likely”. And it seems quite plausible that the difference between the “very likely” and merely “likely” matters practically, given the costs involved. And a variation in priors of about one order of magnitude moves us from “very likely” to “about as likely as not”.

Thus, as an empirical matter of fact, priors have not washed out in the case of global warming. Of course, if we observe long enough, eventually our evidence about global warming is likely to converge to 1. But by then it will be too late for us to act on that evidence!

And there is nothing special about global warming here. Plausibly, many scientific and ordinary beliefs that we need to act on have a confidence level of no more than about 0.95. And so priors matter, and can matter a lot.

We can give a rough estimate of how differences in priors make a difference regarding posteriors using the IPCC likelihood classifications. Roughly speaking, a change between one category and the next (e.g., “exceptionally unlikely” to “unlikely”) in the priors results in a change between a category and the next (e.g., “likely” to “very likely”) in the posteriors.

The only cases where priors have washed out are those where our credences have converged very close to 0 or to 1. There are many scientific and ordinary claims in this category. But not nearly enough for us to be satisfied. We do need to worry about priors, and we better not be subjective Bayesians.

Friday, March 3, 2017

Are the priors of subjective Bayesianism subjective?

Visual and auditory perception are not subject to rational evaluation. Nobody perceives visually or auditorily in a rational or irrational way. Nonetheless, perception is subject to evaluation. One’s perceptual faculties could function superlatively, adequately or inadequately. The mere fact that they are not subject to rational evaluation does not imply subjectivism about their functioning.

But now consider a Bayesian view on which:

  1. Our priors are not subject to rational evaluation.

That view is known in the literature as “subjective Bayesianism”. But if we take seriously the lesson from perception, we should be sceptical of the inference from (1) to:

  2. Our priors are merely subjective.

I have to confess to not taking this point seriously in the past, having been misled by the phrase “subjective Bayesianism” and by things I heard from subjective Bayesians.

What might a theory look like on which our priors are subject to evaluation but not rational evaluation? We could take our priors to be a kind of “probabilistic perception” of patterns in the world, a perception that is genetically and/or socially mediated. Such perceptions can be better or worse, just as the person who is looking at a horse and whose visual system classifies it as 95% likely to be a cat and 5% likely to be a horse is doing less well perceptually than one whose system makes the opposite classification. For instance, someone who has a prior close to 1 for the law of gravitation being an inverse 3.00001th power law is doing less well than someone who has a moderately high prior for it being an inverse cube law and a moderately high prior for it being an inverse square law.

But if we take the perception analogy seriously, we get this question: What are we “perceiving” with our priors? Maybe something like facts about the sorts of laws worlds like ours have?

Wednesday, March 1, 2017

Internalism, externalism and Bayesianism

Maybe a Bayesian should be a hybrid of an internalist and an externalist about justification. The internalist aspect would come from correct updating of credences on evidence, internalistically conceived of. The externalist aspect would come from the priors, which need to be well adapted to one's epistemic environment in such a way as to lead reasonably quickly to truth in our world and maybe also in a range of nearby worlds.

This seems a natural way to think about the internalist and externalist question by means of the analogy of designing a Bayesian artificial intelligence system. The programmer puts in the priors. The system is not "responsible" for them in any way (scare quotes, since I don't think computers are responsible, justified, etc--but something analogous to these properties will be there)--it is the programmer who is responsible. Nonetheless, if the priors are bad, the outputs will not be "justified". The system then computes--that is what it is "responsible" for. It seems natural to think of the parts that the programmer is responsible for as the externalist moment in "justification" and the parts that the system is "responsible" for as the internalist moment. And if we are Bayesian reasoners, then we should be able to say the same thing, minus the scare quotes, and with the programmer replaced by God and/or natural selection and/or human nature.

Tuesday, February 28, 2017

The problem of priors

Counterfactuals about scientific practice reveal some curious facts about our prior probabilities. Our handling of experimental data suggests an approximate flatness in our prior distributions of various constants (cf. this). But the flatness is not perfect. Suppose we are measuring some constant k in a law of nature, a constant that is either dimensionless or expressed in a natural unit system, and we come back with 2.00000. Then we will assign a fairly high credence to the hypothesis that k is exactly 2. But any kind of continuous prior distribution will assign zero prior to k being exactly 2, and the posterior will still be zero, so our prior for 2 must have been non-zero and non-infinitesimal. But for most numbers, the prior for k being that number must be zero or infinitesimal, or else the probabilities won’t add up to 1.

More generally, our priors favor simpler theories. And they favor them in a way that is tuned (finely or not). If our prior for k being exactly 2 were high, then we would believe that k = 2 even after a measurement of 3.2 (experimental error!). If our prior were too low, then we wouldn’t ever conclude that k = 2, no matter how many digits after the “2.” we measured to be zero.
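One standard way to model this kind of tuning is a prior that puts a non-infinitesimal lump on the simple exact value and spreads the rest continuously. The numbers below (a lump of 0.1 on k being exactly 2, a uniform spread over [0, 10], and a measurement error of 0.001) are purely illustrative assumptions, not anything from the post.

    import math

    def normal_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def posterior_k_is_exactly_2(measurement, sigma=0.001, lump=0.1):
        like_exact = normal_pdf(measurement, 2.0, sigma)  # likelihood if k is exactly 2
        like_spread = (1 - lump) / 10.0                   # approx. marginal likelihood of the uniform part
        return lump * like_exact / (lump * like_exact + like_spread)

    print(posterior_k_is_exactly_2(2.00000))  # close to 1: we conclude k is exactly 2
    print(posterior_k_is_exactly_2(3.2))      # close to 0: the lump does not trump a discordant measurement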

There is now an interesting non-normative question about the priors:

  • Why are human priors typically so tuned?

There is, of course, an evolutionary answer—our reasoning about the world wouldn’t work if we didn’t have a pattern of priors that was so tuned. But there is a second question that the evolutionary story does not answer. To get to the second question, observe that our priors ought to be so tuned. Someone whose epistemic practices involve the rejection of the confirmation of scientific theories on the basis of too strong a prejudice for simple theories (“There is only one thing, and it’s round—everything else is illusion”) or too weak a preference for simple theories (“There are just as many temperature trends where there is a rise for a hundred years and then a fall for a hundred years as where there is a rise for two hundred years, so we have no reason at all to think global warming will continue”) is not acting as she ought.

So now we have this normative question:

  • Why is it that our priors ought to be so tuned?

These give us the first two desiderata on a theory of priors:

  1. The theory should explain why our priors are tuned with respect to simplicity as they are.

  2. The theory should explain why our priors should be so tuned.

Here is another desideratum:

  3. The theory should exhibit a connection between priors and truth.

Next, observe that our priors are pretty vague. They certainly aren’t numerically precise, and they shouldn’t be, because beings with our capacity couldn’t reason with precise numerical credences in the kinds of situations where we need to.

  4. The theory should not imply that having the priors we should have requires us to always have numerically precise priors.

Further, there seems to be something to subjective Bayesianism, even if we should not go all the way with the subjective Bayesians. Which we should not, because then we cannot rationally criticize the person who has too strong or too weak an epistemic preference for simple theories.

  5. The theory should not imply a unique set of priors that everyone should have.

Next, different kinds of agents should have different priors. For instance, agents like us typically shouldn’t be numerically precise. But angelic intellects that are capable of instantaneous mathematical computation might do better with numerically precise priors. Moreover, and more controversially, beings that lived in a world with simpler or less simple laws shouldn’t be held hostage to the priors that work so well for us.

  6. The theory should allow for the possibility that priors vary between kinds of agents.

And then, of course, we have standard desiderata on all theories, such as that they be unified.

Finally, observe the actual methodology of philosophy of science: We observe how working scientists make inferences, and while we are willing at times to offer corrections, we use the actual inferential practices as evidence for how the inferential practices ought to go. In particular, we extract the kinds of priors that people have from their epistemic behavior when it is at its best:

  7. The theory should allow for the methodology of inferring what kinds of priors we ought to have from looking at actual epistemic behavior.

Subjective Bayesianism fails with respect to desiderata 2 and 3, and if it satisfies 1, it is only by being conjoined with some further story, which decreases the unity of the story. Objective Bayesianism fails with respect to desiderata 5 and 6, and some versions of it have trouble with 4. Moreover, to satisfy 1, it needs to be conjoined with a further story. And it’s not clear that objective Bayesianism is entitled to the methodology advocated in 7.

What we need is something in between subjective and objective Bayesianism. Here is such a theory: Aristotelian Bayesianism. On general Aristotelian principles, we have natures which dictate a range of normal features with an objective teleology. For instance, the nature of a sheep specifies that it should have four legs in support of quadrupedal locomotion. Moreover, in Aristotelian metaphysics, the natures also explain the characteristic structure of beings with that nature. Thus, the nature of a sheep is not only that in virtue of which a sheep ought to have four legs, but also has guided the embryonic development of typical sheep towards a four-legged state. Finally, in an Aristotelian picture, when things act normally, they tend to achieve the goals that their nature assigns to that activity.

Now, in my Aristotelian Bayesianism, our human nature leads to characteristic patterns of epistemic behavior for the telos of truth. From the patterns of behavior that are compatible with our nature, one can derive constraints on priors—namely, that they be such as to underwrite such behavior. These priors are implicit in the patterns of behavior.

We can now take the desiderata one by one:

  1. Our priors are tuned as they are since our development is guided by a nature that leads to epistemic behavior that determines priors to be so tuned.

  2. Our priors ought to be so tuned, because all things ought to act in the way that their nature makes natural.

  3. Natural behavior is teleological, and our epistemic behavior is truth-directed.

  4. The priors we ought to have are back-calculated from the epistemic behaviors we ought to have, and our behaviors cannot have precise numbers attached to them in such a way as to yield precise numerical priors.

  5. Nothing in the theory requires that unique priors be derivable from what epistemic behavior is characteristic. Typically, in Aristotelian theories, there is a range of normalcy—a ratio of length of legs to length of arms between x and y, etc.

  6. Different kinds of beings have different natures. Sheep ought to have four legs and we ought to have two. We are led to expect that different kinds of agents would have different appropriate priors. Moreover, animals tend to be adapted to their environment, so we would expect that in worlds that are sufficiently different, different priors would be appropriate.

  7. Since beings have a tendency towards acting naturally, the actual behavior of beings—especially when they appear to be at their best—provides evidence of the kind of behavior that they ought to exhibit. And from the kind of epistemic behavior we ought to exhibit, we can back-calculate the kinds of priors that are implicit in that behavior.

This post is inspired by Barry Loewer saying in discussion that I was Kantian because I think there are objective constraints on priors. I am not Kantian. I am Aristotelian.

Thursday, February 23, 2017

Flatness of priors

I. J. Good is said to have said that we can know someone’s priors by their posteriors. Suppose that Alice has the following disposition with respect to the measurement of an unknown quantity X: For some finite bound ϵ and finite interval [a, b], whenever Alice would learn that:

  1. The value of X + F is x where x is in [a, b], where
  2. F is a symmetric error independent of the actual value of X and certain to be no greater than ϵ according to her priors, and
  3. the interval [x − ϵ, x + ϵ] is a subset of [a, b]

then Alice’s posterior epistemically expected value for X would be x.

Call this The Disposition. Many people seem to have The Disposition for some values of ϵ, a and b. For instance, suppose that you’re like Cavendish and you’re measuring the gravitational constant G. Then within some reasonable range of values, if your measurement gives you G plus some independent symmetric error F, your epistemically expected value for G will probably be equal to the number you measure.

Fact. If Alice is a Bayesian agent who has The Disposition and X is measurable with respect to her priors, then Alice’s priors for X conditional on X being in [a, b] are uniform over [a, b].

So, by Good’s maxim about priors, the Cavendish-like figure has a uniform distribution for the gravitational constant within some reasonable interval (there is a lower bound of zero for G, and an upper bound provided by the fact that even before the experiment we know that we don’t experience strong gravitational attraction to other people).
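A quick numerical check of the Fact, with arbitrary illustrative numbers (a uniform prior on [0, 10] and a symmetric triangular error bounded by ϵ = 0.5):

    import numpy as np

    a, b, eps = 0.0, 10.0, 0.5
    xs = np.linspace(a, b, 2001)   # grid over the prior's support
    prior = np.ones_like(xs)       # uniform prior on [a, b] (unnormalized)

    def posterior_mean(x_obs):
        # Symmetric error density supported on [-eps, eps] (triangular here).
        err = np.abs(x_obs - xs)
        likelihood = np.where(err <= eps, eps - err, 0.0)
        post = prior * likelihood
        post /= post.sum()
        return float((xs * post).sum())

    print(posterior_mean(4.3))   # about 4.3
    print(posterior_mean(7.75))  # about 7.75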

Monday, October 12, 2015

Virtue epistemology and Bayesian priors

I wonder if virtue epistemology isn't particularly well-poised to solve the problem of prior probabilities. To a first approximation, you should adopt those prior probabilities that a virtuous agent would in a situation with no information. This is perhaps untenable, because maybe it's impossible to have a virtuous agent in a situation with no information (maybe one needs information to develop virtue). If so, then, to a second approximation, you should adopt those prior probabilities that are implicit in a virtuous agent's epistemic practices. Obviously a lot of work is needed to work out various details. And I suspect that the result will end up being kind-relative, like natural law epistemology (of which this might be a species).

Wednesday, February 24, 2010

Carnap's probability measure

Carnap's objective prior probability measure was designed to make induction possible. Almost nobody uses Carnap's probability measure any more—the only exception I am aware of is Tooley in his debate book with Plantinga on evil. I have no idea why Tooley is using the Carnap measure—I thought it was out of date. In any case, it's easy to point out at least two things that are wrong with the Carnap measure, and hence why Tooley's arguments based on it need to be reworked. To explain the problems with the Carnap measure, I need some details. If you're familiar with Carnap measure, you can skip ahead to "Problem 1".

Carnap's prior probability measure is best seen as a measure for the probability of claims made by sentences of a truth-functional language with n names, a1,...,an, and k unary predicates, Q1,...,Qk. Let N be the set of names, Q the set of predicates and T the set {True, False}. Call the language L(Q,N). Say that a state s is a function from the Cartesian product QxN to T, and let S be the set of all states. There is a natural way of saying whether a sentence u of L(Q,N) is true at a state s. Basically, you say that the sentence Qi(aj) is true at s if and only if s(Qi,aj)=True, and then extend truth-functionally to all sentences.

There is a natural probability measure on S, which I will call the "Wittgenstein measure", defined by PW(A)=|A|/|S| for every subset A of S, where |X| is the cardinality of the set X. This probability measure assigns equal probability to every state. Given a probability measure P on states, we get a probability measure for the sentences of L(Q,N). If u is such a sentence, define the subset uT={s:u is true at s} of S. Then, we can let P(u)=P(uT). The Wittgenstein measure does not allow induction. Suppose that we have three names, and two predicates, Raven and Black. Our evidence E is: Raven(a1), Raven(a2), Raven(a3), Black(a1) and Black(a2). Then, PW(Black(a3)|E)=1/2=PW(Black(a3)), as can be easily verified, because all states are equally likely, and hence the state that makes all the ai be black ravens is no more likely than the state that makes all the ai be ravens but with only a1 and a2 black.

So, Carnap wanted to come up with a probability measure that allows induction but is still fairly natural. What he did was this. Instead of assigning equal probability to each state, he assigned equal probability to each equivalence class of states. Say that s~t for states s and t if there is some permutation p of the names N such that s(R,p(a))=t(R,a) for every predicate R and every name a. Let [s] be the equivalence class of s under this relation: [s]={t:t~s}. Let S* be the set of these equivalence classes. Then, if s is a state, we define: PC({s})=1/(|[s]||S*|). In other words, each state in an equivalence class has equal probability, and each equivalence class has equal probability. If A is any subset of S, we then define PC(A) as the sum of PC({a}) as a ranges over the elements of A.

The merit of Carnap measure is that it assigns a greater probability to more uniform states. Thus, PC(Black(a3)|E) should be greater than 1/2 (I haven't actually worked the numbers).
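In fact the numbers can be worked out by brute force. The following sketch enumerates all 64 states for the three names and two predicates, groups them into the 20 equivalence classes, and compares the Wittgenstein and Carnap conditional probabilities; on this reckoning PW(Black(a3)|E) = 1/2 while PC(Black(a3)|E) = 3/4.

    from itertools import product, permutations
    from fractions import Fraction

    names = ("a1", "a2", "a3")
    preds = ("Raven", "Black")
    cells = list(product(preds, names))

    # A state assigns True/False to each (predicate, name) pair.
    states = [dict(zip(cells, vals)) for vals in product((True, False), repeat=len(cells))]

    def key(s):
        return tuple(s[c] for c in cells)

    def permuted(s, perm):
        # The state obtained by relabelling the names according to perm.
        mapping = dict(zip(names, perm))
        return {(q, mapping[n]): v for (q, n), v in s.items()}

    # Group the states into equivalence classes (orbits under name permutations).
    orbits = {frozenset(key(permuted(s, perm)) for perm in permutations(names)) for s in states}

    def P_wittgenstein(event):
        return Fraction(sum(1 for s in states if event(s)), len(states))

    def P_carnap(event):
        # Each class gets 1/|S*|, split equally among its members.
        total = Fraction(0)
        for orbit in orbits:
            members = [dict(zip(cells, k)) for k in orbit]
            total += sum(Fraction(1, len(orbits) * len(members)) for s in members if event(s))
        return total

    E = lambda s: all(s[("Raven", n)] for n in names) and s[("Black", "a1")] and s[("Black", "a2")]
    H = lambda s: s[("Black", "a3")]

    print(P_wittgenstein(lambda s: E(s) and H(s)) / P_wittgenstein(E))  # 1/2: no induction
    print(P_carnap(lambda s: E(s) and H(s)) / P_carnap(E))              # 3/4: induction works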

Problem 1: Carnap measure is not invariant under increase of the number of predicates. Intuitively, adding irrelevant predicates to the language, predicates that do not appear in either the evidence or the hypothesis, should not change the degree of confirmation. But it does. In fact, we have the following theorem. Let u be any sentence of L(Q,N). Let Qr be Q with r additional predicates thrown in. Let ur be a sentence of L(Qr,N) which is just like u (i.e., ur is u considered qua sentence of L(Qr,N)).

Theorem 1: PC(ur) tends to PW(u) as r tends to infinity.

In other words, as one increases the number of predicates, one loses the ability to do induction, since PW is no good for induction. The proof (which is non-trivial, but not insanely hard) is left to the reader.

Problem 2: Let d be a sentence of L(Q,N) saying that indiscernibles are identical. For instance, let dij be the disjunction ~(Q1(ai) iff Q1(aj)) or ... or ~(Qk(ai) iff Qk(aj)), and let d be the conjunction of the dij for all distinct i and j.

Theorem 2: PC(u|d)=PW(u|d).

Thus, when we condition on the identity of indiscernibles, Carnap measure collapses to Wittgenstein measure. But Wittgenstein measure is worthless for induction. And often the identity of indiscernibles holds. For instance, suppose we have a1,a2,a3 as our individuals, and our evidence is this: a1,a2,a3 are each a raven, a1 and a2 are black. So far so good, we can do induction and we get some confirmation of a3 being black. But suppose we also learn that identity of indiscernibles holds for these three ravens. Then we lose the confirmation! And we might well learn this. For instance, we might learn that exactly a1 and a3 are male, and exactly a1 and a2 each have an even number of feathers, and that means that identity of indiscernibles holds.

Moreover, I think most of us have a background belief that our world has such richness of properties that, at least as a contingent matter of fact, the identity of indiscernibles holds for macroscopic objects. If so, then Carnap measure makes induction impossible for macroscopic objects.

Sketch of proof of Theorem 2: Let D be the set of states at which identity of indiscernibles holds. Thus, D is the set of states s with the property that if a and b are distinct, then there is a predicate R such that s(R,a) differs from s(R,b). Observe that if s is any state in D, then |[s]|=n!, where n is the number of names. For, any permutation of the names induces a different state given the identity of indiscernibles, and there are n! permutations. Therefore, PC({s})=1/(n!|S*|). Hence, PC({s}) has the same value for every s in D. Therefore, PC({s}|D)=1/|D|. But, likewise, PW({s}|D)=1/|D|. The Theorem follows easily from this.

Remark: Theorem 2 gives an intuitive reason to believe Theorem 1. As one increases the number of predicates while keeping fixed the number of names, a greater and greater share of the state space satisfies the identity of indiscernibles.