Wednesday, January 29, 2025

More on experiments

We all perform experiments very often. When I hear a noise and deliberately turn my head, I perform an experiment to find out what I will see if I turn my head. If I ask a question not knowing what answer I will hear, I am engaging in (human!) experimentation. Roughly, experiments are actions done in order to generate observations as evidence.

There are typically differences in rigor between the experiments we perform in daily life and the experiments scientists perform in the lab, but only typically so. Sometimes we are rigorous in ordinary life and sometimes scientists are sloppy.

In a Bayesian framework, the epistemic value of an experiment to a given person depends on several factors.

  1. The set of questions towards answers to which the experiment’s results are expected to contribute.

  2. Specifications of the value of different levels of credence regarding the answers to the questions in Factor 1.

  3. One’s prior levels of credence for the answers.

  4. The likelihoods of different experimental outcomes given different answers.

It is easiest to think of Factor 2 in practical terms. If I am thinking of going for a recreational swim but I am not sure whether my swim goggles have sprung a leak, it may be that if the probability of the goggles being sound is at least 50%, it’s worth going to the trouble of heading out for the pool, but otherwise it’s not. So an experiment that could only yield a 45% confidence in the goggles is useless to my decision whether to go to the pool, and there is no difference in value between an experiment that yields a 55% confidence and one that yields a 95% confidence. On the other hand, if I am an astronaut and am considering performing a non-essential extravehicular task, but I am worried that the only available spacesuit might have sprung a leak, an experiment that can only yield 95% confidence in the soundness of the spacesuit is pointless—if my credence in the spacesuit’s soundness is only 95%, I won’t use the spacesuit.
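The threshold structure of Factor 2 in these two examples can be put in a few lines of Python. This is only a rough sketch: the function name is made up for illustration, the 50% threshold comes from the goggles example, and the 99.9% spacewalk threshold is an assumption, since all the example tells us is that it is above 95%.

```python
def acts_on(credence: float, threshold: float) -> bool:
    """Return True if the credence is high enough to justify acting."""
    return credence >= threshold


POOL_THRESHOLD = 0.50        # from the goggles example
SPACEWALK_THRESHOLD = 0.999  # assumed value; the example only implies it exceeds 0.95

# For each credence an experiment might leave me with, check both decisions.
for credence in (0.45, 0.55, 0.95):
    print(
        credence,
        acts_on(credence, POOL_THRESHOLD),
        acts_on(credence, SPACEWALK_THRESHOLD),
    )
```

On the pool threshold, 55% and 95% lead to the same decision, which is why there is no difference in value between them. On the assumed spacewalk threshold, none of the three credences is enough.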

Factor 3 is relevant in combination with Factor 4, because together these two factors tell us how likely I am to end up with different posterior probabilities for the answers to the Factor 1 questions after the experiment. For instance, if I have seen that my goggles are missing a gasket, my prior credence in their soundness is so low that even a positive experimental result (say, no water in my eye after submerging my head in the sink) would not give me 50% credence that the goggles are fine, and so the experiment is pointless.
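Here is a small sketch of how Factors 3 and 4 combine via Bayes' theorem in the sink test. All the numbers are invented for illustration (the priors of 2% and 60%, and the likelihoods of a dry result given sound or leaky goggles), and the function name is hypothetical.

```python
def posterior_sound(prior: float, p_dry_if_sound: float, p_dry_if_leaky: float) -> float:
    """Posterior credence that the goggles are sound, given a dry sink test."""
    sound_and_dry = prior * p_dry_if_sound
    leaky_and_dry = (1.0 - prior) * p_dry_if_leaky
    return sound_and_dry / (sound_and_dry + leaky_and_dry)


# Assumed likelihoods (Factor 4): a dry result is likely if the goggles are
# sound, but a leaky pair can still pass a brief dunk in the sink.
P_DRY_IF_SOUND = 0.95
P_DRY_IF_LEAKY = 0.30

# Assumed priors (Factor 3): very low after noticing the missing gasket,
# moderate otherwise.
low = posterior_sound(0.02, P_DRY_IF_SOUND, P_DRY_IF_LEAKY)
moderate = posterior_sound(0.60, P_DRY_IF_SOUND, P_DRY_IF_LEAKY)

print(f"prior 0.02 -> posterior {low:.3f}")
print(f"prior 0.60 -> posterior {moderate:.3f}")
```

With these made-up numbers, the best possible result of the test still leaves the low-prior case far below the 50% threshold, while the same result pushes the moderate-prior case well above it.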

In a series of posts over the last couple of days, I explored the idea of a somewhat interest-independent comparison between the values of experiments, where one still fixes a set of questions (Factor 1), but says that one experiment is at least as good as another provided that it has at least as good an expected epistemic utility as the other for every proper scoring rule (Factor 2). This comparison criterion is equivalent to one that goes back to the 1950s. The comparison is only somewhat interest-independent, because it is still relativized to a set of questions.
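To make "expected epistemic utility" concrete, here is a sketch that computes it for a single proper scoring rule, the Brier score, treated as a utility (so higher is better). The two experiments, their likelihood tables and the uniform prior are all invented for illustration. Since the comparison above quantifies over every proper scoring rule, a check with one rule can at best show that a necessary condition holds.

```python
from typing import List


def brier_utility(credences: List[float], true_answer: int) -> float:
    """Negative Brier score of a credence vector when `true_answer` is correct."""
    return -sum(
        (c - (1.0 if j == true_answer else 0.0)) ** 2
        for j, c in enumerate(credences)
    )


def expected_utility(prior: List[float], likelihoods: List[List[float]]) -> float:
    """Expected Brier utility of updating on an experiment.

    likelihoods[a][o] is the probability of outcome o given answer a.
    """
    total = 0.0
    for o in range(len(likelihoods[0])):
        joint = [prior[a] * likelihoods[a][o] for a in range(len(prior))]
        p_outcome = sum(joint)
        if p_outcome == 0.0:
            continue  # an impossible outcome contributes nothing
        post = [j / p_outcome for j in joint]
        # Weight each answer's score by the joint probability of answer and outcome.
        total += sum(joint[a] * brier_utility(post, a) for a in range(len(prior)))
    return total


prior = [0.5, 0.5]             # assumed prior on a yes/no question
e1 = [[0.5, 0.5], [0.5, 0.5]]  # hypothetical uninformative experiment
e2 = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical informative experiment

u1 = expected_utility(prior, e1)
u2 = expected_utility(prior, e2)
print(u1, u2)
```

In this toy case the informative experiment comes out ahead on the Brier score, as one would expect. Showing that it is at least as good in the sense above would require the same inequality for every proper scoring rule.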

A somewhat interesting question that occurred to me yesterday is what effect Factor 3 has on this somewhat interest-independent comparison of experiments. If experiment E2 is at least as good as experiment E1 for every scoring rule on the question algebra, is this true regardless of which consistent and regular priors one has on the question algebra?

A bit of thought showed me a somewhat interesting fact. If there is only one binary (yes/no) question under Factor 1, then it turns out that the somewhat interest-independent comparison of experiments does not depend on the prior probability for the answer to this question (assuming it’s regular, i.e., neither 0 nor 1). But if the question algebra is any larger, this is no longer true: whether one experiment is at least as good as another in this somewhat interest-independent way then depends on the choice of priors in Factor 3.

We might now ask: Under what circumstances is an experiment at least as good as another for every proper scoring rule and every consistent and regular assignment of priors on the answers, assuming the question algebra has more than two non-trivial members? I suspect this is a non-trivial question.
