Thursday, February 2, 2023

Rethinking priors

Suppose I learned that all my original priors were consistent and regular but produced by an evil demon bent upon misleading me.

The subjective Bayesian answer is that since consistent and regular original priors are not subject to rational evaluation, I do not need to engage in any radical uprooting of my thinking. All I need to do is update on this new and interesting fact about my origins. I would probably become more sceptical, but all within the confines of my original priors, which presumably include such things as the conditional probability that I have a body given that I seem to have a body but there is an evil demon bent upon misleading me.

This answer seems wrong. So much the worse for subjective Bayesianism. A radical uprooting would be needed. It would be time to sit back, put aside preconceptions, and engage in some fallibilist version of the Cartesian project of radical rethinking. That project might be doomed, but it would be my only hope.

Now, what if, instead of the evil demon, I learned that a random process independent of truth was the ultimate origin of my priors? I think the same thing would be true. It would be a time to be brave and uproot it all.

I think something similar is true piece by piece, too. I have a strong moral intuition that consequentialism is false. But suppose that I learned that when I was a baby, a mad scientist captured me and flipped a coin with the plan that on heads a high prior in anti-consequentialism would be induced and on tails it would be a high prior in consequentialism instead. I would have to rethink consequentialism. I couldn’t just stick with the priors.

Socrates and thinking for yourself

There is a popular picture of Socrates as someone inviting us to think for ourselves. I was just re-reading the Euthyphro, and realizing that the popular picture is severely incomplete.

Recall the setting. Euthyphro is prosecuting a murder case against his father. The case is fraught with complexity, and a typical Greek would think it should not be brought, for multiple reasons: the main one being that the accused is the prosecutor’s father and we have very strong duties towards parents, and a secondary one being that the killing was unintentional and a matter of neglect. Socrates then says:

most men would not know how they could do this and be right. It is not the part of anyone to do this, but of one who is far advanced in wisdom. (4b)

We learn in the rest of the dialogue that Euthyphro is pompous, full of himself, needs simple distinctions to be explained, and, to understate the point, is far from “advanced in wisdom”. And he thinks for himself, doing that which the ordinary Greek thinks to be a quite bad idea.

The message we get seems to be that you should abide by cultural norms, unless you are “far advanced in wisdom”. And when we add the critiques of cultural elites and ordinary competent craftsmen from the Apology, we see that almost no one is “advanced in wisdom”. The consequence is that we should not depart significantly from cultural norms.

This reading fits well with the general message we get about the poets: they don’t know how to live well, but they have some kind of a connection with the gods, so presumably we should live by their message. Perhaps there is an exception for those sufficiently wise to figure things out for themselves, but those are extremely rare, while those who think themselves wise are extremely common. There is a great risk in significantly departing from the cultural norms enshrined in the poets—for one is much more likely to be one of those who think themselves wise than one of those who are genuinely wise.

I am not endorsing this kind of complacency. For one, those of us who are religious have two rich sets of cultural norms to draw on, a secular set and a religious one, and in our present Western setting the two tend to have sufficient disagreement that complacency is not possible—one must make a choice in many cases. And then there is grace.

From strict anti-anti-Bayesianism to strict propriety

In my previous post, I showed that a continuous anti-anti-Bayesian accuracy scoring rule on probabilities defined on a sub-algebra of events satisfying the technical assumption that the full algebra contains an event logically independent of the sub-algebra is proper. However, I couldn’t figure out how to prove strict propriety given strict anti-anti-Bayesianism. I still don’t, but I can get closer.

First, a definition. A scoring rule on probabilities on the sub-algebra H is strictly anti-anti-Bayesian provided that one expects it to penalize non-trivial binary anti-Bayesian updates. I.e., if A is an event with prior probability p neither zero nor one, and Bayesian conditionalization on A (or, equivalently, on Ac) modifies the probability of some member of H, then the p-expected score of finding out whether A or Ac holds and conditionalizing on that is strictly better than if one adopted the procedure of conditionalizing on the complement of the actually obtaining event.

Suppose we have continuity, the technical assumption and anti-anti-Bayesianism. My previous post shows that the scoring rule is proper. I can now show that it is strictly proper if we strengthen anti-anti-Bayesianism to strict anti-anti-Bayesianism and add the technical assumption that the scoring rule satisfies the finiteness condition that Eps(p) is finite for any probability p on H. Since we’re working with accuracy scoring rules and these take values in [−∞,M] for finite M, the only way to violate the finiteness condition is to have Eps(p) =  − ∞, which would mean that s is very pessimistic about p: by p’s own lights, the expected score of p is infinitely bad. The finiteness condition thus rules out such maximal pessimism.

Here is a sketch of the proof. Suppose we do not have strict propriety. Then there will be two distinct probabilities p and q such that Eps(p) ≤ Eps(q). By propriety, the inequality must be an equality. By Proposition 9 of a recent paper of mine, it follows that s(p) = s(q) everywhere (this is where the finiteness condition is used). Now let r = (p+q)/2. Using the trick from the Appendix here, we can find a probability p′ on the full algebra and an event Z such that r is the restriction of p′ to H, p is the restriction of the Bayesian conditionalization of p′ on Z to H, and q is the restriction of the Bayesian conditionalization of p′ on Zc to H. Then the scores of p and q will be the same, and hence the scores of Bayesian and anti-Bayesian conditionalization on finding out whether Z or Zc is actual are guaranteed to be the same, and this violates strict anti-anti-Bayesianism.

One might hope that this will help those who are trying to construct accuracy arguments for probabilism—the doctrine that credences should be probabilities. The hitch in those arguments is establishing strict propriety. However, I doubt that what I have helps. First, I am working in a sub-algebra setting. Second, and more importantly, I am working in a context where scoring rules are defined only for probabilities, and so the strict propriety inequality I get is only for scores of pairs of probabilities, while the accuracy arguments require strict propriety for pairs of credences exactly one of which is not a probability.

Wednesday, February 1, 2023

From anti-anti-Bayesianism to propriety

Let’s work in the setting of my previous post, including technical assumption (3), and also assume Ω is finite and that our scoring rules are all continuous.

Say that an anti-Bayesian update is when you take a probability p, receive evidence A, and make your new credence be p(⋅|Ac), i.e., you conditionalize on the complement of the evidence. Anti-Bayesian update is really stupid, and you shouldn’t get rewarded for it, even if all you care about are events other than A and Ac.

Say that an H-scoring rule s is anti-anti-Bayesian providing that the expected score of a policy of anti-Bayesian update on an event A whose prior probability is neither zero nor one is never better than the expected score of a policy of Bayesian update.

I claim that given continuity, anti-anti-Bayesianism implies that the scoring rule is proper.

First, note that by continuity, if it’s proper at all the regular probabilities on H (ones that assign neither 0 nor 1 to any event other than ∅ and Ω), then it’s proper (I am assuming we handle infinities like in this paper, and use Lemma 1 there).

So all we need to do is show that it’s proper at all the regular probabilities on H. Let p be a regular probability, and contrary to propriety suppose that Eps(p) < Eps(q) for another probability q. For t ≥ 0, let pt be such that tq + (1−t)pt = p, i.e., let pt = (p − tq)/(1−t). Since p is regular, for t sufficiently small, pt will be a probability (all we need is that it be non-negative). Using the trick from the Appendix of the previous post with q in place of p1 and pt in place of p2, we can set up a situation where the Bayesian update will have expected score:

  • tEqs(q) + (1−t)Epts(pt)

and the anti-Bayesian update will have the expected score:

  • tEqs(pt) + (1−t)Epts(q).

Given anti-anti-Bayesianism, we must have

  • tEqs(pt) + (1−t)Epts(q) ≤ tEqs(q) + (1−t)Epts(pt).

Letting t → 0 and using continuity, we get:

  • Ep0s(q) ≤ Ep0s(p0).

But p0 = p. So we have propriety.

Open-mindedness and propriety

Suppose we have a probability space Ω with an algebra F of events and a distinguished subalgebra H of events on Ω. My interest here is in accuracy H-scoring rules, which take a (finitely-additive) probability assignment p on H and assign to it an H-measurable score function s(p) on Ω, with values in [−∞,M] for some finite M. I will take the score of a probability assignment to represent the epistemic utility or accuracy of p.

For a probability p on F, I will take the score of p to be the score of the restriction of p to H. (Note that any finitely-additive probability on H extends to a finitely-additive probability on F by the Hahn-Banach theorem, assuming Choice.)

The scoring rule s is proper provided that Eps(q) ≤ Eps(p) for all p and q, and strictly so if the inequality is strict whenever p ≠ q. Propriety says that one never expects a different probability from one’s own to have a better score (if one did, wouldn’t one have switched to it?).

Say that the scoring rule s is open-minded provided that for any probability p on F and any finite partition V of Ω into events in F with non-zero p-probability, the p-expected score of finding out where in V we are and conditionalizing on that is at least as big as the current p-expected score. If the scoring rule is open-minded, then a Bayesian conditionalizer is never precluded from accepting free information. Say that the scoring rule s is strictly open-minded provided that the p-expected score of finding out where in V we are and conditionalizing increases whenever there is at least one event E in V such that p(⋅|E) differs from p on H and p(E) > 0.

Given a scoring rule s, let the expected score function Gs on the probabilities on H be defined by Gs(p) = Eps(p), with the same extension to probabilities on F as scores had.

It is well-known that:

  1. The (strict) propriety of s entails the (strict) convexity of Gs.

It is easy to see that:

  2. The (strict) convexity of Gs implies the (strict) open-mindedness of s.

Neither implication can be reversed. To see this, consider the single-proposition case, where Ω has two points, say 0 and 1, and H and F are the powerset of Ω, and we are interested in the proposition that one of these points, say 1, is the actual truth. The scoring rule s is then equivalent to a pair of functions T and F on [0,1], where T(x) = s(px)(1) and F(x) = s(px)(0), and px is the probability that assigns x to the point 1. Then Gs corresponds to the function xT(x) + (1−x)F(x), and each is convex if and only if the other is.

To see that the non-strict version of (1) cannot be reversed, suppose (T,F) is a non-trivial proper scoring rule with the limit of F(x)/x as x goes to 0 finite. Now form a new scoring rule by letting T*(x) = T(x) + ((1−x)/x)F(x). Consider the scoring rule (T*,0). The corresponding function xT*(x) is going to be convex, but (T*,0) isn’t going to be proper unless T* is constant, which isn’t going to be true in general. The strict version is similar.

To see that (2) cannot be reversed, note that the only non-trivial partition is {{0}, {1}}. If our current probability for 1 is x, the expected score upon learning where we are is xT(1) + (1−x)F(0). Strict open-mindedness thus requires precisely that xT(x) + (1−x)F(x) < xT(1) + (1−x)F(0) whenever x is neither 0 nor 1. It is clear that this is not enough for convexity—we can have wild oscillations of T and F on (0,1) as long as T(1) and F(0) are large enough.
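For concreteness, here is a toy single-proposition rule (my own construction, with arbitrary numbers) that is strictly open-minded but whose Gs oscillates wildly, confirming that open-mindedness alone does not give convexity:

```python
import math

# Hypothetical (T, F) pair, chosen by me: wildly oscillating scores on (0,1),
# with large scores for certainty in the truth (T(1) = 5, F(0) = 5).
def T(x):
    return 5.0 if x == 1 else math.sin(20 * x) - 2

def F(x):
    return 5.0 if x == 0 else math.sin(20 * x) - 2

def G(x):
    # expected score, by the lights of credence x, of holding credence x
    return x * T(x) + (1 - x) * F(x)

grid = [i / 100 for i in range(1, 100)]

# Strict open-mindedness: G(x) < x*T(1) + (1-x)*F(0) for all x in (0,1).
open_minded = all(G(x) < x * T(1) + (1 - x) * F(0) for x in grid)

# Non-convexity: some midpoint lies strictly above the chord.
nonconvex = any(G((x + y) / 2) > (G(x) + G(y)) / 2 + 1e-9
                for x in grid for y in grid)
```

The oscillating scores stay well below the certainty scores T(1) = F(0) = 5, which is all that strict open-mindedness asks for here, while G is nowhere near convex.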

Nonetheless, (2) can be reversed (both in the strict and non-strict versions) on the following technical assumption:

  3. There is an event Z in F such that Z ∩ A is a non-empty proper subset of A for every non-empty member A of H.

This technical assumption basically says that there is a non-trivial event that is logically independent of everything in H. In real life, the technical assumption is always satisfied, because there will always be something independent of the algebra H of events we are evaluating probability assignments to (e.g., in many cases Z can be the event that the next coin toss by the investigator’s niece will be heads). I will prove that (2) can be reversed in the Appendix.

It is easy to see that adding (3) to our assumptions doesn’t help reverse (1).

Since open-mindedness is pretty plausible to people of a Bayesian persuasion, this means that convexity of Gs can be motivated independently of propriety. Perhaps instead of focusing on propriety of s as much as the literature has done, we should focus on the convexity of Gs?

Let’s think about this suggestion. One of the most important uses of scoring rules could be to evaluate the expected value of an experiment prior to doing the experiment, and hence decide which experiment we should do. If we think of an experiment as a finite partition V of the probability space with each cell having non-zero probability by one’s current lights p, then the expected value of the experiment is:

  4. ∑A ∈ V p(A)EpAs(pA) = ∑A ∈ V p(A)Gs(pA),

where pA is the result of conditionalizing p on A. In other words, to evaluate the expected values of experiments, all we care about is Gs, not s itself, and so the convexity of Gs is a very natural condition: we are never obligated to refuse to know the results of free experiments.
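As a quick illustration (the four-point Ω, uniform prior, and Brier score are my own toy choices, not anything from the argument above), the Gs-based valuation of an experiment can be computed directly:

```python
# Toy setup: four worlds, uniform credences, Brier accuracy score, and the
# experiment that reveals which half of Omega we are in.
OMEGA = [0, 1, 2, 3]

def brier(p, w):
    # accuracy (negative squared error) of assignment p at world w
    return -sum((p[v] - (1.0 if v == w else 0.0)) ** 2 for v in OMEGA)

def G(p):
    # G_s(p) = E_p s(p)
    return sum(p[w] * brier(p, w) for w in OMEGA)

def conditionalize(p, A):
    pA = sum(p[w] for w in A)
    return {w: (p[w] / pA if w in A else 0.0) for w in OMEGA}

p = {w: 0.25 for w in OMEGA}
V = [[0, 1], [2, 3]]

value_of_experiment = sum(sum(p[w] for w in A) * G(conditionalize(p, A))
                          for A in V)
```

Here G(p) = −0.75 while the experiment is worth −0.5 in expectation: by the convexity of G, learning which cell one is in never looks bad in advance.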

However, at least in the case where Ω is finite, it is known that any (strictly) convex function (maybe subject to some growth conditions?) is equal to Gu for some (strictly) proper scoring rule u. So we don’t really gain much generality by moving from propriety of s to convexity of Gs. Indeed, the above observations show that for finite Ω, a (strictly) open-minded way of evaluating the expected epistemic values of experiments in a setting rich enough to satisfy (3) is always generatable by a (strictly) proper scoring rule.

In other words, if we have a scoring rule that is open-minded but not proper, we can find a proper scoring rule that generates the same prospective evaluations of the value of experiments (assuming no special growth conditions are needed).

Appendix: We now prove the converse of (2) assuming (3).

Assume open-mindedness. Let p1 and p2 be two distinct probabilities on H and let t ∈ (0,1). We must show that if p = tp1 + (1−t)p2, then

  5. Gs(p) ≤ tGs(p1) + (1−t)Gs(p2)

with the inequality strict if the open-mindedness is strict. Let Z be as in (3). Define

  6. p′(A ∩ Z) = tp1(A)

  7. p′(A ∩ Zc) = (1−t)p2(A)

  8. p′(A) = p(A)

for any A ∈ H. Then p′ is a probability on the algebra generated by H and Z extending p. Extend it to a probability on F by Hahn-Banach. By open-mindedness:

  9. Gs(p′) ≤ p′(Z)EpZs(pZ) + p′(Zc)EpZcs(pZc).

But p′(Z) = p′(Ω ∩ Z) = t and p′(Zc) = 1 − t. Moreover, pZ = p1 on H and pZc = p2 on H. Since H-scores don’t care what the probabilities are doing outside of H, we have s(pZ) = s(p1) and s(pZc) = s(p2) and Gs(p′) = Gs(p). Moreover our scores are H-measurable, so EpZs(p1) = Ep1s(p1) and EpZcs(p2) = Ep2s(p2). Thus (9) becomes:

  10. Gs(p) ≤ tGs(p1) + (1−t)Gs(p2).

Hence we have convexity. And given strict open-mindedness, the inequality will be strict, and we get strict convexity.
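A small numerical instance of the construction may be useful (the probabilities and the value of t are mine, purely for illustration):

```python
# H is generated by a single proposition h in {0,1}; z in {0,1} encodes the
# extra event Z that splits every H-atom.  All numbers are illustrative.
t = 0.3
p1 = {0: 0.2, 1: 0.8}
p2 = {0: 0.6, 1: 0.4}
p = {h: t * p1[h] + (1 - t) * p2[h] for h in (0, 1)}

# The definition of p' from the Appendix:
pprime = {(h, 1): t * p1[h] for h in (0, 1)}              # mass on A ∩ Z
pprime.update({(h, 0): (1 - t) * p2[h] for h in (0, 1)})  # mass on A ∩ Zc

pZ = pprime[(0, 1)] + pprime[(1, 1)]                      # p'(Z)
restriction = {h: pprime[(h, 1)] + pprime[(h, 0)] for h in (0, 1)}
cond_Z = {h: pprime[(h, 1)] / pZ for h in (0, 1)}         # p'(.|Z) on H
cond_Zc = {h: pprime[(h, 0)] / (1 - pZ) for h in (0, 1)}  # p'(.|Zc) on H
```

One can check that p′(Z) = t, that p′ restricted to H is p, and that conditionalizing p′ on Z and on Zc recovers p1 and p2 respectively, as the proof requires.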

Tuesday, January 31, 2023

Scoring rules and publication thresholds

One of the most problematic aspects of some science practice is a cut-off, say at 95%, for the evidence-based confidence needed for publication.

I just realized, with the help of a mention of p-based biases and improper scoring rules somewhere on the web, that what is going on here is precisely a problem of a reward structure that does not result in a proper scoring rule, where a proper scoring rule is one where your current probability assignment is guaranteed to have an optimal expected score according to that very probability assignment. Given an improper scoring rule, one has a perverse incentive to change one’s probabilities without evidence.

To a first approximation, the problem is really, really bad. Insofar as publication is the relevant reward, it is a reward independent of the truth of the matter! In other words, the scoring rule has a reward for gaining probability 0.95 (say) in the hypothesis, regardless of whether the hypothesis is true or false.

Fortunately, it’s not quite so bad. Publication is the short-term reward. But there are long-term rewards and punishments. If one publishes, and later it turns out that one was right, one may get significant social recognition as the discoverer of the truth of the hypothesis. And if one publishes, and later it turns out one is wrong, one gets some negative reputation.

However, notice this. Fame for having been right is basically independent of the exact probability of the hypothesis one established in the original paper. As long as the probability was sufficient for publication, one is rewarded with fame. Thus one’s long-term reward is fame if and only if one’s probability met the threshold for publication and one turned out to be right. And one’s penalty is some negative reputation if and only if one’s probability met the threshold for publication and yet one turned out to be wrong. But note that scientists are actually extremely forgiving of people putting forward evidenced hypotheses that turn out to be false. Unlike in history, where some people live on in infamy, scientists who turn out to be wrong do not suffer infamy. At worst, some condescension. And it barely varies with one’s level of confidence.

The long-term reward structure is approximately this:

  • If your probability is insufficient for publication, nothing.

  • If your probability meets the threshold for publication and you’re right, big positive.

  • If your probability meets the threshold for publication and you’re wrong, at worst small negative.

This is not a proper scoring rule. It’s not even close. To make it into a proper scoring rule, the penalty for being wrong at the threshold would need to be way higher than the reward for being right. Specifically, if the threshold is p (say 0.95), then the ratio of reward to penalty needs to be (1−p) : p. If p = 0.95, the reward to penalty ratio would need to be 1:19. If p = 0.99, it would need to be a staggering 1:99, and if p = 0.9, it would need to be a still large 1:9. We are very, very far from that. And when we add the truth-independent reward for publication, things become even worse.
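The required ratios are just arithmetic; a minimal check:

```python
# For a bare publication threshold p to be incentive-compatible, the expected
# reward and penalty must balance exactly at the threshold:
# p * reward = (1 - p) * penalty, so reward : penalty = (1 - p) : p.
def required_reward_to_penalty(p):
    return (1 - p) / p

ratios = {p: required_reward_to_penalty(p) for p in (0.9, 0.95, 0.99)}
# 0.9 -> 1/9, 0.95 -> 1/19, 0.99 -> 1/99
```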

We can see that something is problematic if we think about cases like this. Suppose your current level of confidence is just slightly above the threshold, and a graduate student in your lab proposes to do one last experiment in her spare time, using equipment and supplies that would otherwise go to waste. Given the reward structure, it will likely make sense for you to refuse this free offer of additional information. If the experiment favors your hypothesis, you get nothing out of it—you could have published without it, and you’d still have the same longer-term rewards available. But if the experiment disfavors your hypothesis, it will likely make your paper unpublishable (since you were at the threshold), yet since it’s just one experiment, it is unlikely to put you into the position of being able to publish a paper against the hypothesis. At best it frees you from the risk of the small negative reputation for having been wrong; but that’s a small penalty, and an unlikely one (since by your own data your hypothesis is most likely true), so avoiding it is not worth much. In other words, the reward structure pays you to ignore free information.
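To make the perverse incentive vivid, here is a toy calculation (the threshold, rewards, and likelihoods are all invented numbers of mine):

```python
# Invented reward structure and experiment, purely illustrative.
THRESHOLD = 0.95
R_RIGHT, R_WRONG = 1.0, -0.1   # big positive if right, small negative if wrong

def publish_value(credence):
    # expected long-term payoff of publishing (0 if unpublishable)
    if credence < THRESHOLD:
        return 0.0
    return credence * R_RIGHT + (1 - credence) * R_WRONG

x = 0.96                             # just above the threshold
pE_if_true, pE_if_false = 0.8, 0.3   # hypothetical likelihoods of outcome E
pE = x * pE_if_true + (1 - x) * pE_if_false
post_E = x * pE_if_true / pE                 # favorable: still publishable
post_notE = x * (1 - pE_if_true) / (1 - pE)  # unfavorable: below threshold

value_without = publish_value(x)
value_with = pE * publish_value(post_E) + (1 - pE) * publish_value(post_notE)
```

Under these assumptions value_without exceeds value_with, so the free experiment has negative expected value for the researcher.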

How can we fix this? We simply cannot realistically fix it if we have a high probability threshold for publication. The only way to fix it while keeping a high probability threshold would be by having a ridiculously high penalty for being wrong. But we shouldn’t do stuff like sentencing scientists to jail for being wrong (which has happened). Increasing the probability threshold for publication would only require the penalty for being wrong to be increased. Decreasing probability thresholds for publication helps a little. But as long as there is a larger reputational benefit from getting things right than the reputational harm from getting things wrong, we are going to have perverse incentives from a probability threshold for publication bigger than 1/2, no matter where that threshold lies. (This follows from Fact 2 in my recent post, together with the observation that Schervish’s characterization of scoring rules implies that any reward function corresponds to a penalty function that is unique up to an additive constant.)

What’s the solution? Maybe it’s this: reward people for publishing lots of data, rather than for the data showing anything interesting, and do so sufficiently that it’s always worth publishing more data?

Monday, January 30, 2023

Epistemic goods

We think highly morally of teachers who put an enormous effort into getting their students to know and understand the material. Moreover, we think highly of these teachers regardless of whether they are in a discipline, like some branches of engineering, where the knowledge and understanding exists primarily for the sake of non-epistemic goods, or in a discipline, like cosmology, where the knowledge and understanding is primarily aimed at epistemic goods.

The virtues and vices in disseminating epistemic goods are just as much moral virtues and vices as those in disseminating other goods, such as food, shelter, friendship, or play, and there need be little difference in kind. The person who is jealous of another’s knowledge has essentially the same kind of vice as the one who is jealous of another’s physical strength. The person generous with their time in teaching exhibits essentially the same virtue as the one generous with their time in feeding others.

There is, thus, no significant difference in kind between the pursuit of epistemic goods and the norms of the pursuit of other goods. We not infrequently have to weigh one against the other, and it is a mark of the virtuous person that they do this well.

But if this is all correct, then by parallel we should not make a significant distinction in kind between the pursuit of epistemic goods for oneself and the pursuit of non-epistemic goods for oneself. Hence, norms governing the pursuit of knowledge and understanding seem to be just a species of prudential norms.

Does this mean that epistemic norms are just a species of prudential norms?

I don’t think so. Consider that prudentially we also pursue goods of physical health. However, norms of physical health are not a species of prudential norms. It is the medical professional who is the expert on the norms of physical health, not the prudent person as such. Prudential norms apply to voluntary behavior as such, while the norms of physical health apply to the body’s state and function. We might say that norms of the voluntary pursuit of the fulfillment of the norms of physical health are prudential norms, but the norms of physical health themselves are not prudential norms. Similarly, the norms of the voluntary pursuit of the fulfillment of epistemic norms are prudential norms, but the epistemic norms themselves are no more prudential norms than the health norms are.

Saturday, January 28, 2023

Making single-proposition scoring rules

Accuracy scoring rules measure the value of your probability assignment’s closeness to the truth. A scoring rule for a single proposition p can be thought of as a pair of functions, T and F on the interval [0,1] where T(x) tells us the score for assigning x to p when p is true and F(x) tells us the score for assigning x to p when p is false. The scoring rule is proper provided that:

  • xT(x) + (1−x)F(x) ≥ xT(y) + (1−x)F(y)

for all x and y. If you assign probability x to p, then xT(y) + (1−x)F(y) measures your expected value of the score for someone who assigns y. The propriety condition thus says that by your lights there isn’t a better probability to assign. After all, if there were, wouldn’t you assign it?
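As a sanity check on this condition, one can verify it numerically for the familiar Brier pair T(x) = −(1−x)² and F(x) = −x², a standard strictly proper rule (the grid check is mine, just an illustration of the inequality):

```python
# Brier pair: T(x) = -(1-x)^2 if the proposition is true, F(x) = -x^2 if false.
def T(x):
    return -(1 - x) ** 2

def F(x):
    return -x ** 2

def expected(x, y):
    # the x-expected score of someone who assigns y
    return x * T(y) + (1 - x) * F(y)

grid = [i / 50 for i in range(51)]
proper = all(expected(x, x) >= expected(x, y) - 1e-12
             for x in grid for y in grid)
```

For Brier, expected(x, x) − expected(x, y) works out to exactly (x − y)², so the rule is in fact strictly proper.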

I’ve been playing with how to construct proper scoring rules for a single proposition, and I found two nice ways that are probably in the literature but I haven’t seen explicitly. First, let F be any monotone (not necessarily strictly) decreasing function on [0,1] that is finite except perhaps at 1. Then let:

  • TF(x) = F(1/2) − ((1−x)/x)F(x) − ∫_{1/2}^x u^{−2}F(u) du.

I think we then have the following:

Fact 1: The pair (TF,F) is a proper scoring rule.

Second, let T be any monotone increasing function on [0,1] that is finite except perhaps at 0. Let:

  • FT(x) = T(1/2) − (x/(1−x))T(x) + ∫_{1/2}^x (1−u)^{−2}T(u) du.

I think then we have the following:

Fact 2: The pair (T,FT) is a proper scoring rule.

In other words, to generate a proper scoring rule, we just need to choose one of the two functions making up the scoring rule, make sure it is monotone in the right direction, and then we can generate the other function.

Here’s a sketch of the proof of Fact 1. Note first that if F = c is constant, then TF(x) = c − ((1−x)/x)c + c(x^{−1} − (1/2)^{−1}) = 0 for all x. Since the map F ↦ TF is linear, it follows that if F and H differ by a constant, then TF and TH are the same. Thus, subtracting a constant from F, we can assume without loss of generality that F is non-positive.

We can then approximate F by functions of the form ∑i ci 1[ai,1] with ci non-positive (here I have to confess to not having checked all the details of the approximation) and by linearity we only need to check propriety for F = −1[a,1]. If a = 0, then F is constant and TF will be zero, and we will trivially have propriety. So suppose a > 0. Let T(x) = −((1−x)/x)F(x) − ∫_0^x u^{−2}F(u) du. This differs by a constant from TF, so (TF,F) will be proper if and only if (T,F) is. Note that T(x) = 0 for x < a and for x ≥ a we have:

  • T(x) = ((1−x)/x) + ∫_a^x u^{−2} du = ((1−x)/x) − (x^{−1} − a^{−1}) = a^{−1} − 1.

Thus, T = ((1−a)/a) ⋅ 1[a,1]. Now let’s check if we have the propriety condition:

  1. xT(y) + (1−x)F(y) ≤ xT(x) + (1−x)F(x).

Suppose first that x ≥ a. Then the right-hand-side is x(1−a)/a − (1−x). This is non-negative for x ≥ a, and the left-hand-side of (1) is zero if y < a, so we are done if y < a. Since T and F are constant on [a,1], the two sides of (1) are equal for y ≥ a.

Now suppose that x < a. Then the right-hand-side is zero. And the left-hand-side is zero unless y ≥ a. So suppose y ≥ a. Since T and F are constant on [a,1], we only need to check (1) at y = 1. At y = 1, the left-hand-side of (1) is x(1−a)/a − (1−x) ≤ 0 if x < a.

Fact 2 follows from Fact 1 together with the observation that (T,F) is proper if and only if (F*,T*) is proper, where T*(x) = T(1−x) and F*(x) = F(1−x).

Thursday, January 26, 2023

A cure for some cases of TMI

Sometimes we know things we wish we didn’t. In some cases, without any brainwashing, forgetting or other irrational processes, there is a fairly reliable way to make that wish come true.

Suppose that a necessary condition for knowing is that my evidence yields a credence of 0.9900, and that I know p with evidence yielding a credence of 0.9910. Then here is how I can rid myself of the knowledge fairly reliably. I find someone completely trustworthy who would know for sure whether p is true, and I pay them to do the following:

  1. Toss three fair coins.

  2. Inform me whether the following conjunction is true: all coins landed heads and p is true.

Then at least 7/8 of the time, they will inform me that the conjunction is false. That’s a little bit of evidence against p. I do a Bayesian update on this evidence, and my posterior credence will be 0.9897, which is not enough for knowledge. Thus, with at least 7/8 reliability, I can lose my knowledge.
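The update is easy to verify (using the post’s stipulated 0.9910 prior and 0.9900 knowledge cutoff):

```python
prior = 0.9910               # my credence in p, just above the 0.9900 cutoff
p_conj = prior * (1 / 8)     # P(all three coins heads AND p)
p_told_false = 1 - p_conj    # chance I am told the conjunction is false
posterior = prior * (7 / 8) / p_told_false   # P(p | conjunction is false)
```

This gives a posterior of about 0.9897, below the 0.9900 needed for knowledge, and the chance of being told the conjunction is false is about 0.876, slightly better than 7/8.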

This method only works if my credence is slightly above what’s needed for knowledge. If what’s needed for knowledge is 0.990, then as soon as my credence rises to 0.995, there is no rational method with reliability better than 1/2 for making me lose the credence needed for knowledge (this follows from Proposition 1 here). So if you find yourself coming to know something that you don’t want to know, you should act fast, or you’ll have so much credence you will be beyond rational help. :-)

More seriously, we think of knowledge as something stable. But since evidence comes in degrees, there have got to be cases of knowledge that are quite unstable—cases where one “just barely knows”. It makes sense to think that if knowledge has some special value, these cases have rather less of it. Maybe it’s because knowledge comes in degrees, and these cases have less knowledge.

Or maybe we should just get rid of the concept of knowledge and theorize in terms of credence, justification and truth.

Wednesday, January 25, 2023

The special value of knowledge

Suppose there is a distinctive and significant value to knowledge. What I mean by that is that if two epistemic states are very similar in terms of truth, the level and type of justification, the subject matter and its relevance to life, the degree of belief, etc., but one is knowledge and the other is not, then the one that is knowledge has a significantly higher value because it is knowledge.

Plausibly, then, if we imagine Alice has some evidence for a truth p that is insufficient for knowledge, and slowly and continuously her evidence for p mounts up, when the evidence has crossed the threshold needed for knowledge, the value of Alice’s state with respect to p will have suddenly and discontinuously increased.

This hypothesis initially seemed to me to have an unfortunate consequence. Suppose Alice has just barely exceeded the threshold for knowledge of p, and she is offered a cost-free piece of information that may turn out to slightly increase or slightly decrease her overall evidence with respect to p, where the decrease would be sufficient to lose her knowledge of p (since she has only “barely” exceeded the evidential threshold for knowledge). It seems that Alice should refuse to look at the information, since the benefit of a slight improvement in credence if the evidence is non-misleading is outweighed by the danger of a significant and discontinuous loss of value due to loss of knowledge.

But that’s not quite right. For from Alice’s point of view, because the threshold for knowledge is not 1, there is a real possibility that p is false. But it may be that just as there is a discontinuous gain in epistemic value when your (rational) credence becomes sufficient for knowledge of something that is in fact true, it may be that there is a discontinuous loss of epistemic value when your credence becomes sufficient for knowledge of something false. (Of course, you can’t know anything false, but you can have evidence-sufficient-for-knowledge with respect to something false.) This is not implausible, and given this, by looking at the information, by her lights Alice also has a chance of a significant gain in value due to losing the illusion of knowledge in something false.

If we think that it’s never rational for a rational agent to refuse free information, then the above argument can be made rigorous to establish that any discontinuous rise in the epistemic value of credence at the point at which knowledge of a truth is reached is exactly mirrored by a discontinuous fall in the epistemic value of a state of credence where seeming-knowledge of a falsehood is reached. Moreover, the rise and the fall must be in the ratio 1 − r : r where r is the knowledge threshold. Note that for knowledge, r is plausibly pretty large, around 0.95 at least, and so the ratio between the special value of knowledge of a truth and the special disvalue of evidence-sufficient-for-knowledge for a falsehood will need to be at most 1:19. This kind of a ratio seems intuitively implausible to me. It seems unlikely that the special disvalue of evidence-sufficient-for-knowledge of a falsehood is an order of magnitude greater than the special value of knowledge. This contributes to my scepticism that there is a special value of knowledge.

Can we rigorously model this kind of an epistemic value assignment? I think so. Consider the following discontinuous accuracy scoring rule s1(x,t), where x is a probability and t is a 0 or 1 truth value:

  • s1(x,t) = 0 if 1 − r ≤ x ≤ r

  • s1(x,t) = a if r < x and t = 1 or if x < 1 − r and t = 0

  • s1(x,t) = −b if r < x and t = 0 or if x < 1 − r and t = 1.

Suppose that a and b are positive and a/b = (1−r)/r. Then if my scribbled notes are correct, it is straightforward but annoying to check that s1 is proper, and it has a discontinuous reward a for meeting threshold r with respect to a truth and a discontinuous penalty −b for meeting threshold r with respect to a falsehood. To get a strictly proper scoring rule, just add to it any strictly proper continuous accuracy scoring rule (e.g., Brier).
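If my reading of the rule is right, propriety can also be spot-checked numerically. Here is a quick Python sketch; the values r = 0.95 and a = 1 are illustrative choices of mine, with b then fixed by the constraint a/b = (1−r)/r:

```python
def s1(x, t, r=0.95, a=1.0):
    """Score for reporting probability x when the truth value t is 0 or 1."""
    b = a * r / (1 - r)  # fixed by the constraint a/b = (1-r)/r
    if 1 - r <= x <= r:
        return 0.0
    if (x > r and t == 1) or (x < 1 - r and t == 0):
        return a       # crossed the threshold with respect to the truth
    return -b          # crossed the threshold with respect to a falsehood

def expected_score(x, p, r=0.95, a=1.0):
    """Expected score of reporting x for an agent whose credence is p."""
    return p * s1(x, 1, r, a) + (1 - p) * s1(x, 0, r, a)

# Propriety: reporting one's actual credence should maximize expected score.
grid = [i / 200 for i in range(201)]
for p in grid:
    best = max(expected_score(x, p) for x in grid)
    assert expected_score(p, p) >= best - 1e-9
print("s1 is proper on the grid")
```

This is a grid check, not a proof, but it is reassuring that no misreport beats the honest report anywhere on the grid (the check also tacitly relies on r ≥ 1/2, which any plausible knowledge threshold satisfies).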

Tuesday, January 24, 2023

Thresholds and precision

In a recent post, I noted that it is possible to cook up a Bayesian setup where you don’t meet some threshold, say for belief or knowledge, with respect to some proposition, but you do meet the same threshold with respect to the claim that after you examine a piece of evidence, then you will meet the threshold. This is counterintuitive: it seems to imply that you can know that you will have enough evidence to know something even though you don’t yet. In a comment, Ian noted that one way out of this is to say that beliefs do not correspond to sharp credences. It then occurred to me that one could use the setup to probe the question of how sharp our credences are and what the thresholds for things like belief and knowledge are, perhaps complementarily to the considerations in this paper.

For suppose we have a credence threshold r and that our intuitions agree that we can’t have a situation where:

  a. we have transparency as to our credences,

  b. we don’t meet r with respect to some proposition p, but

  c. we meet r with respect to the proposition that we will meet the threshold with respect to p after we examine evidence E.

Let α > 0 be the “squishiness” of our credences. Let’s say that for one credence to be definitely bigger than another, their difference has to be at least α, and that to definitely meet (fail to meet) a threshold, we must be at least α above (below) it. We assume that our threshold r is definitely less than one: r + α ≤ 1.

We now want this constraint on r and α:

  1. We cannot have a case where (a), (b) and (c) definitely hold.

What does this tell us about r and α? We can actually figure this out. Consider a test for p that has no false negatives, but has a false positive rate of β. Let E be a positive test result. Our best bet for generating a case where (a)–(c) definitely hold will be to take the prior for p as close to r as possible while yet definitely below it, i.e., to let the prior for p be r − α. For making the prior that high makes (c) easier to definitely satisfy while keeping (b) definitely satisfied. Since there are no false negatives, the posterior for p will be:

  2. P(p|E) = P(p)/P(E) = (r−α)/((r−α) + β(1−(r−α))).

Let z = (r−α) + β(1−(r−α)) = (1−β)(r−α) + β. This is the prior probability of a positive test result. We will definitely meet r on a positive test result just in case we have (r−α)/z = P(p|E) ≥ r + α, i.e., just in case

  3. z ≤ (r−α)/(r+α).

(We definitely won’t meet r on a negative test result.) Thus to get (c) definitely true, we need (3) to hold as well as the probability of a positive test result to be at least r + α:

  4. z ≥ r + α.

Note that by appropriate choice of β, we can make z be anything between r − α and 1, and the right-hand-side of (3) is at least r − α since r + α ≤ 1. Thus we can make (c) definitely hold as long as the right-hand-side of (3) is bigger than or equal to the right-hand-side of (4), i.e., if and only if:

  5. (r+α)² ≤ r − α

or, equivalently:

  6. α ≤ (1/2)(√(1+6r−3r²) − 1 − r).

It’s in fact not hard to see that (6) is necessary and sufficient for the existence of a case where (a)–(c) definitely hold.

We thus have our joint constraint on the squishiness of our credences: bad things happen if our credences are so precise as to make (6) true with respect to a threshold r for which we don’t want (a)–(c) to definitely hold. The easiest scenario for making (a)–(c) definitely hold will be a binary test with no false negatives.

What exactly this says about α depends on where the relevant threshold lies. If the threshold r is 1/2, the squishiness needed for paradox is 0.15. That’s surely higher than the actual squishiness of our credences. So if we are concerned merely with the threshold being more-likely-than-not, then we can’t avoid the paradox, because there will be cases where our credence is definitely below the threshold while it’s definitely above the threshold that examination of the evidence will push us above the threshold.

But what’s a reasonable threshold for belief? Maybe something like 0.9 or 0.95. At r = 0.9, the squishiness needed for paradox is α = 0.046. I suspect our credences are more precise than that. If we agree that the squishiness of our credences is less than 4.6%, then we have an argument that the threshold for belief is more than 0.9. On the other hand, at r = 0.95, the squishiness needed for paradox is 2.4%. At this point, it becomes more plausible that our credences lack that kind of precision, but it’s not clear. At r = 0.98, the squishiness needed for paradox dips below 1%. Depending on how precise we think our credences are, we get an argument that the threshold for belief is something like 0.95 or 0.98.
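These figures are just evaluations of the right-hand side of (6) at the thresholds in question; a short Python sketch reproduces the 0.15, 0.046, 2.4% and sub-1% values above:

```python
from math import sqrt

def squishiness_for_paradox(r):
    """Right-hand side of (6): the largest squishiness at which a case
    making (a)-(c) definitely hold can still be constructed."""
    return 0.5 * (sqrt(1 + 6 * r - 3 * r * r) - 1 - r)

for r in (0.5, 0.9, 0.95, 0.98):
    print(f"r = {r:<4}: squishiness for paradox = {squishiness_for_paradox(r):.4f}")
```

The same function could be used to plot the squishiness-for-paradox against the threshold, as in the graph below.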

Here's a graph of the squishiness-for-paradox α against the threshold r:

Note that the squishiness of our credences likely varies with where the credences lie on the line from 0 to 1, i.e., varies with respect to the relevant threshold. For we can tell the difference between 0.999 and 1.000, but we probably can’t tell the difference between 0.700 and 0.701. So the squishiness should probably be counted relative to the threshold. Or perhaps it should be correlated to log-odds. But I need to get to looking at grad admissions files now.

Monday, January 23, 2023

Respecting conscience

One of the central insights of Western philosophy, beginning with Socrates, has been that few if any things are as bad for an individual as culpably doing wrong. It is better, we are told through much of the Western philosophical tradition, to suffer injustice than to do it.

Now, acting against one’s conscience is always wrong, and is almost always culpably wrong. For the most common case where doing something wrong isn’t culpable is that one is ignorant of the wrongness; but when one acts against one’s conscience one surely isn’t ignorant that one is acting against conscience, and that we ought to follow our conscience is obvious.

That said, I think a qualification is plausible. Some wrongdoings are minor, and in those cases the harm to the wrongdoer may be minor as well. But in any case, to get someone to act against their conscience in a matter that according to their conscience is major is to do them grave harm, a harm not that different from death.

Now, the state, just like individuals, should ceteris paribus avoid causing grave harm. Hence, the state should generally avoid getting people to do things that violate their conscience in major matters.

The difficult case, however, is when people’s consciences are mistaken to such a degree that conscience requires them to do something that unjustly harms others. (A less problematic mistake is when conscience is mistaken to such a degree that conscience requires them to do something that’s permissible but not obligatory. In those cases, tolerance is clearly called for. We shouldn’t pressure vegetarians to eat animals even if their conscientious objection to eating animals happens to be mistaken.)

One might think that what I said earlier implies that in this difficult case the state should always allow people to follow their conscience, because after all it is worse to do wrong—and violating conscience is wrong—than to have wrong done to one. But that would be absurd and horrible—think of a racist murderer whose faulty conscience requires them to kill.

A number of considerations, however, keep one from reaching this absurd conclusion.

  1. The harm of violating one’s conscience only happens to one if one willingly violates one’s conscience. If law enforcement physically prevents me from doing something that conscience requires of me, then I haven’t suffered the harm. Thus, interestingly, the consideration I sketched against violating one’s conscience does not apply when one is literally forced (fear of punishment, unless it is severe enough to suspend one’s freedom of will, does not actually force, but only incentivizes).

  2. In cases where doing wrong and suffering wrong are of roughly the same order of magnitude, it is very intuitive that we should prevent the suffering of wrong rather than the doing of wrong. Imagine that Alice is drowning while at the same time Bob is getting ready to assassinate a politician, but we know for sure that Bob’s bullets have all been replaced with blanks. If our choice is whether to try to dissuade Bob from attempting murder or keep Alice from drowning, we should keep Alice from drowning, even if on the Socratic view the harm to Bob from attempting murder will be greater than that to Alice from drowning. (I am assuming that in this case the two harms are nonetheless of something like the same order of magnitude.)

  3. A reasonable optimism says that in most cases most people’s consciences are correct. Thus typically we would expect that most violators of a legitimate law will not be acting out of conscience—for a necessary condition for the legitimacy of a law is that it does not conflict with a correct conscience. Thus, even if there is the rare murderer acting from mistaken conscience, most murderers act against conscience, and by incentivizing abstention from murder, in most cases the law helps people follow their conscience, and the small number of other cases can be tolerated as a side effect. Thus the considerations of conscience favor intolerant laws in such cases. Nonetheless, there are cases where most violators of a law would likely be acting from conscience. Thus, if we had a law requiring eating meat, we would expect that most of the violators would be conscientious. Similarly, a law against something—say, the wearing of certain clothes or symbols—that is rarely done except as a religious practice would likely be a law most violators of which would be conscientious.

  4. When someone’s conscience mistakenly requires something that violates an objective moral rule, there is a two-fold benefit to that person from a law incentivizing following the moral rule. The law is a teacher, and the state’s disapproval may change one’s mind about the matter. And even if it is a harm to one to violate conscience, it is also a harm to one to do something wrong even inculpably. Thus, the harm of violating conscience is somewhat offset by the benefit of not doing something else that is wrong.

  5. In some cases the person of mistaken conscience will still do the wrong deed despite the law’s contrary incentive. In such a case, both the perpetrator and the victim may be slightly better off for the law. The victim has a dignitary benefit from the very fact that the state says that the harm was unlawful. That dignitary benefit may be a cold comfort if the victim suffered a grave harm, but it is still a benefit. And the perpetrator is slightly better off, because following one’s conscience against external pressure has an element of admirability even when the conscience is mistaken.

Nonetheless, there will be cases where these considerations do not suffice, and the law should be tolerant of mistaken conscience.

In a just defensive war, to refuse to fight to defend one’s fellow citizens without special reason (perhaps priests and doctors should not kill) is wrong. But a grave harm is done to a conscientious objector who is induced to fight by legal incentives. Let’s think through the five considerations above. The first mainly applies to laws prohibiting a behavior rather than ones requiring a behavior. Short of brainwashing, it is impossible to make someone fight. (We could superglue their hands to a gun, and then administer electric shocks causing their fingers to spasm and fire a bullet, but that wouldn’t count as fighting.) The second applies somewhat: we do need to weigh the harms to innocent citizens from enemy invaders, harms that might be prevented if our conscientious objector fought. But note that there is something rather speculative about these harms. Someone who fights contrary to conscience is unlikely to be a very effective fighter, and it is far from clear that their military activity would actually prevent any harm to innocents. Now, regarding the third consideration, one can design a conscription law with an exemption that few who aren’t conscientious objectors would take advantage of. One way to do this is to require evidence of one’s conscience’s objection to fighting (e.g., prior membership in a pacifist organization). Another way is to impose non-combat duties on conscientious objectors that are as onerous and maybe as dangerous as combat would be. Regarding the fourth consideration, it seems unlikely that a typical conscientious objector’s objections to war would be changed by legal penalties. And the fifth seems a weak consideration in general. Putting all these together, the five considerations do not outweigh the prima facie case, grounded in the harm of going against conscience, against pressuring conscientious objectors to act against their (mistaken) conscience.

Saturday, January 21, 2023

Knowing you will soon have enough evidence to know

Suppose I am just the slightest bit short of the evidence needed for belief that I have some condition C. I consider taking a test for C that has a zero false negative rate and a middling false positive rate—neither close to zero nor close to one. On reasonable numerical interpretations of the previous two sentences:

  1. I have enough evidence to believe that the test would come out positive.

  2. If the test comes out positive, it will be another piece of evidence for the hypothesis that I have C, and it will push me over the edge to belief that I have C.

To see that (1) is true, note that the test is certain to come out positive if I have C and has a significant probability of coming out positive even if I don’t have C. Hence, the probability of a positive test result will be significantly higher than the probability that I have C. But I am just the slightest bit short of the evidence needed for belief that I have C, so the evidence that the test would be positive (let’s suppose a deterministic setting, so we have no worries about the sense of the subjunctive conditional here) is sufficient for belief.

To see that (2) is true, note that given that the false negative rate is zero, and the false positive rate is not close to one, I will indeed have non-negligible evidence for C if the test is positive.

If I am rational, my beliefs will follow the evidence. So if I am rational, in a situation like the above, I will take myself to have a way of bringing it about that I believe, and do so rationally, that I have C. Moreover, this way of bringing it about that I believe that I have C will itself be perfectly rational if the test is free. For of course it’s rational to accept free information. So I will be in a position where I am rationally able to bring it about that I rationally believe C, while not yet believing it.

In fact, the same thing can be said about knowledge, assuming there is knowledge in lottery situations. For suppose that I am just the slightest bit short of the evidence needed for knowledge that I have C. Then I can set up the story such that:

  3. I have enough evidence to know that the test would come out positive,

and:

  4. If the test comes out positive, I will have enough evidence to know that I have C.

In other words, oddly enough, just prior to getting the test results I can reasonably say:

  5. I don’t yet have enough evidence to know that I have C, but I know that in a moment I will.

This sounds like:

  6. I don’t know that I have C but I know that I will know.

But (6) is absurd: if I know that I will know something, then I am in a position to know that the matter is so, since that I will know p entails that p is true (assuming that p doesn’t concern an open future). However, there is no similar absurdity in (5). I may know that I will have enough evidence to know C, but that’s not the same as knowing that I will know C or even be in a position to know C. For it is possible to have enough evidence to know something without being in a position to know it (namely, when the thing isn’t true or when one is Gettiered).

Still, there is something odd about (5). It’s a bit like the line:

  7. After we have impartially reviewed the evidence, we will execute him.

Appendix: Suppose the threshold for belief or knowledge is r, where r < 1. Suppose that the false-positive rate for the test is 1/2 and the false-negative rate is zero. If E is a positive test result, then P(C|E) = P(C)P(E|C)/P(E) = P(C)/P(E) = 2P(C)/(1+P(C)). It follows by a bit of algebra that if my prior P(C) is more than r/(2−r), then P(C|E) is above the threshold r. Since r < 1, we have r/(2−r) < r, and so the story (either in the belief or knowledge form) works for the non-empty range of priors strictly between r/(2−r) and r.
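The appendix arithmetic is easy to spot-check in Python. The threshold r = 0.95 and the particular prior are illustrative choices of mine; any prior strictly between r/(2−r) and r works:

```python
def posterior(prior):
    """P(C|E) for a positive test E with P(E|C) = 1 and P(E|not-C) = 1/2."""
    return 2 * prior / (1 + prior)

r = 0.95                      # supposed threshold for belief/knowledge
low = r / (2 - r)             # minimal prior for the story, about 0.905
prior = (low + r) / 2         # a sample prior strictly between low and r

assert low < prior < r        # short of the threshold before the test...
assert posterior(prior) > r   # ...but over it after a positive result
print(f"prior {prior:.4f} -> posterior {posterior(prior):.4f}")
```

A prior of exactly r/(2−r) maps to a posterior of exactly r, which is why any prior strictly above it (but still below r) yields the paradoxical situation.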

Friday, January 20, 2023

Partial and complete explanations

  1. Any explanation for an event E that does not go all the way back to something self-explanatory is merely partial.

  2. A partial explanation is one that is a part of a complete explanation.

  3. So, if any event E has an explanation, it has an explanation going all the way back to something self-explanatory. (1,2)

  4. Some event has an explanation.

  5. An explanation going back to something self-explanatory involves the activity of a necessary being.

  6. So, there is an active necessary being. (4,5)

I am not sure I buy (1). But it sounds kind of right to me now. Additionally, (3) kind of sounds correct on its own. If A causes B and B causes C but there is no explanation of A, then it seems that B and C are really unexplained. Aristotle notes that there was a presocratic philosopher who explained why the earth doesn’t fall down by saying that it floats on water, and he notes that the philosopher failed to ask the same question about the water. I think one lesson of Aristotle’s critique is that if it is unexplained why the water doesn’t fall down, it is unexplained why the earth doesn’t fall down.

Thursday, January 19, 2023

What am I "really"?

What am I? A herring-eater, a husband, a badminton player, a philosopher, a father, a Canadian, a six-footer and a human are all correct answers. There is an ancient form of the “What is x?” question where we are looking for a “central” answer, and of course “a human” is usually taken to be that one. What makes for that answer being central?

Sometimes the word “essential” is thrown in: I am essentially human. But what does that mean? In contemporary analytic jargon, it just means that I cannot exist without being human. But the “central” answer to “What am I?” is not just an answer that states a property that I cannot lack. There are, after all, many properties essential in this modal sense that do not answer the question. “Someone conceived in 1972” and “Someone in a world where 2+2=4” attribute properties I cannot lack, but are not the central answers.

So the sense of the “What am I centrally, really, deep-down, essentially?” question isn’t just modal. What is it?

Here is a start.

  1. Necessarily, I am good and a human if and only if I am a good human.

But the same is not true for any other attribute besides “human” among those of the first paragraph. I can be good and a badminton player while not being a good badminton player, and I can be a good herring-eater without being good and a herring-eater. And I have no idea what it is to be a good six-footer, but perhaps the fact that I am a quarter of an inch short of six feet right now makes me not be one. (In some cases one direction may hold. It may be that if I am good and a human, then I am a good human.)

So our initial account of what is being asked for is that we are asking for an attribute F such that:

  2. Necessarily, x is a good F if and only if x is good.

But that’s not quite right. For:

  3. Necessarily, I am good and a virtuous human if and only if I am a good virtuous human.

And yet “a virtuous human” would not be the answer to the ancient “What am I centrally, really, deep-down, essentially?” question even if I were in fact a virtuous human.

But perhaps we can do better. The necessary biconditional (2) holds in the case “virtuous human”, but in a kind of trivial way: “a good virtuous human” is repetition. I think that, as often, we need to pass from a modal to a hyperintensional characterization. Consider that not only is (1) true, but also:

  4. Necessarily, if I am human, what it is for me to be good is for me to be a good human.

In other words, if I am a human, being a good human explains my being good. On the other hand, even if I were a virtuous human, my being a good virtuous human would not explain my being good. For redundancy is to be avoided in explanation, and “good virtuous human” is redundant.

Thus, I propose that:

  5. x is “centrally, really, deep-down, essentially” F just in case what it is for x to be good is for x to be a good F.

In other words, that which I am centrally, really, deep-down and essentially is that which sets the norms for me to be good simpliciter.

Objection 1: Some things are “centrally” electrons, but something’s being good at electronicity is absurd.

Response: I deny that it’s absurd. It’s just that all the electrons we meet are good at electronicity.

Objection 2: “Good” is attributive, and hence there is no such thing as being good simpliciter.

Response: “Good” is attributive in the sense that the schema

  6. x is good and x is F if and only if x is a good F

is not generally logically valid. But some instances of a schema can be logically valid even if the schema is not logically valid in general.