When you’re investigating reality as a scientist (and often as an
ordinary person) you perform experiments. Epistemologists and
philosophers of science have spent a lot of time thinking about how to
evaluate what you should do with the results of the
experiments—how they should affect your beliefs or credences—but
relatively little on the important question of which
experiments you should perform epistemologically speaking. (Of course,
ethicists have spent a good deal of time thinking about which
experiments you should not perform morally speaking.) Here I
understand “experiment” in a broad sense that includes such things as
pulling out a telescope and looking in a particular direction.
One might think there is not much to say. After all, it all depends
on messy questions of research priorities and costs of time and
material. But we can at least abstract from the costs and quantify over
epistemically reasonable research priorities, and define:
1. E2 is
epistemically at least as good an experiment as E1 provided that for
every epistemically reasonable research priority, E2 would serve the
priority at least as well as E1 would.
That’s not quite right, however. For we don’t know how well an
experiment would serve a research priority unless we know the
result of the experiment. So a better version is:
2. E2 is
epistemically at least as good an experiment as E1 provided that for
every epistemically reasonable research priority, the expected degree to
which E2 would
serve the priority is at least as high as the expected degree to which
E1 would.
Now we have a question we can address formally.
Let’s try.
3. A reasonable epistemic research priority is a strictly proper
scoring rule or epistemic utility, and the expected degree to which an
experiment would serve that priority is equal to the expected value of
the score after Bayesian update on the result of the experiment.
(Since we’re only interested in expected values of scores, we
can replace “strictly proper” with “strictly open-minded”.)
And we can identify an experiment with a partition of the probability
space: the experiment tells us where we are in that partition. (E.g., if
you are measuring some quantity to some number of significant digits,
the cells of the partition are equivalence classes under equality of the
quantity up to that many significant digits.) The following is then
easy to prove:
Proposition 1: On definitions (2) and (3), an
experiment E2 is
epistemically at least as good as experiment E1 if and only if the
partition associated with E2 is essentially at
least as fine as the partition associated with E1.
A partition R2
is essentially at least as fine as a partition R1 provided that for
every event A in R1 there is an event
B in R2 such that with
probability one B happens if
and only if A happens. (Here an event counts as being in a partition
when it is a union of cells of that partition.) The
definition is relative to the current credences which are assumed to be
probabilistic. If the current credences are regular—all non-empty events
have non-zero probability—then “essentially” can be dropped.
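To make the definitions concrete, here is a minimal Python sketch on a finite probability space. It rests on assumptions that are not in the text above: the Brier (quadratic) score over individual states stands in for one particular strictly proper scoring rule, an experiment is represented as a function from a state to the label of its cell, and the names `expected_score` and `essentially_at_least_as_fine` are hypothetical. Because it checks a single scoring rule, it can illustrate the "if" direction of Proposition 1 but cannot verify the quantification over every research priority.

```python
from collections import defaultdict
from fractions import Fraction


def brier_score(credence, true_state):
    """Negative quadratic loss of a credence over states (higher is better)."""
    return -sum((p - (1 if s == true_state else 0)) ** 2
                for s, p in credence.items())


def expected_score(prior, experiment):
    """Expected Brier score after Bayesian update on the experiment's cell."""
    cells = defaultdict(list)
    for state, p in prior.items():
        if p > 0:  # zero-probability states never contribute
            cells[experiment(state)].append(state)
    total = Fraction(0)
    for states in cells.values():
        p_cell = sum(prior[s] for s in states)
        posterior = {s: prior[s] / p_cell for s in states}
        total += sum(prior[s] * brier_score(posterior, s) for s in states)
    return total


def essentially_at_least_as_fine(prior, fine, coarse):
    """True if, off a probability-zero set, the `fine` cell fixes the `coarse` cell."""
    coarse_cell_of = {}
    for state, p in prior.items():
        if p == 0:
            continue
        if coarse_cell_of.setdefault(fine(state), coarse(state)) != coarse(state):
            return False
    return True


# Hypothetical example: a fair die, plus a state 0 that has probability zero.
prior = {0: Fraction(0), **{face: Fraction(1, 6) for face in range(1, 7)}}
parity = lambda s: s % 2     # coarse experiment: odd or even
exact = lambda s: max(s, 1)  # fine experiment, except that 0 is lumped with 1

score_parity = expected_score(prior, parity)
score_exact = expected_score(prior, exact)
print(score_parity, score_exact)
print(essentially_at_least_as_fine(prior, exact, parity))
```

In this toy case the cell {0, 1} of `exact` straddles the odd and even cells, so `exact` is not literally finer than `parity`. It is still essentially at least as fine, because state 0 has probability zero, and its expected score is at least as high.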
However, Proposition 1 suggests that our choice of definitions isn’t
that helpful. Consider two experiments. On E1, all the faculty
members from your Geology Department have their weight measured to the
nearest hundred kilograms. On E2, a thousand randomly
chosen individuals around the world have their weight measured to the
nearest kilogram. Intuitively, E2 is better. But
Proposition 1 shows that in the above sense neither experiment is better
than the other, since they generate partitions neither of which is
essentially finer than the other (the event of there being a member of
the Geology Department with weight at least 150 kilograms is in the
partition of E1 but
nothing coinciding with that event up to probability zero is in the
partition of E2).
And this is to be expected. For suppose that our research priority is to
know whether any members of your Geology Department weigh at least
150 kilograms, because for a departmental cave exploring trip we need to
know whether the current selection of harnesses, all of which are
rated for users under 150 kilograms, is sufficient. Then E1 is better. On the
other hand, if our research priority is to know the average weight of a
human being to the nearest ten kilograms, then E2 is better.
The problem with our definitions is that the range of possible
research priorities is just too broad. Here is one interesting way to
narrow it down. When we are talking about an experiment’s epistemic
value, we mean the value of the experiment towards a set of questions.
If the set of questions is a scientifically typical set of questions
about human population weight distribution, then E2 seems better than
E1. But if it is an
atypical set of questions about the Geology Department members’ weight
distribution, then E1 might be better. We
can formalize this, too. We can identify a set Q of questions with a partition of
probability space representing the possible answers. This partition then
generates an algebra FQ on the
probability space, which we can call the “question algebra”. Now we can
relativize our definitions to a set of questions.
E2 is
epistemically at least as good an experiment as E1 for a set of questions
Q provided that for every
epistemically reasonable research priority on Q, the expected degree to which
E2 would serve the
priority is at least as high as the expected degree to which E1 would.
A reasonable epistemic research priority on a set of questions
Q is a strictly proper scoring
rule or epistemic utility on FQ, and the
expected degree to which an experiment would serve Q is equal to the expected value of
the score after Bayesian update on the result of the
experiment.
We recover the old definitions by being omnicurious, namely letting
Q be all possible
questions.
What about Proposition 1? Well, one direction remains: if E2’s partition is
essentially at least as fine as E1’s, then E2 is at least as good with regard
to any set of questions, and in particular with regard to
Q. But what about the other
direction? Now the answer is negative. Suppose the question is what the
average weight of the six members of the Geology Department is up to the
nearest 100 kg. Consider two experiments: on the first, the members are
ordered alphabetically by first name, and a fair die is rolled to choose
one (if you roll 1, you choose the
first, etc.), and their weight is measured. On the second, the same is
done but with the ordering being by last name. Assuming the two
orderings are different, neither experiment’s partition is essentially
at least as fine as the other’s, but the expected contributions of both
experiments towards our question are equal.
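The die example can be checked exactly on a small model. The sketch below is only an illustration under assumptions that are not in the text: each member weighs either 40 or 160 kg with independent, made-up probabilities, the two orderings are arbitrary permutations, the measured quantity is the selected member's weight, the experiment reports both the die face and that weight, and the Brier score on the answers to the question stands in for one strictly proper scoring rule. All function and constant names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

LIGHT, HEAVY = 40, 160  # hypothetical possible weights in kg
# Hypothetical, independent chances that each of the six members is heavy.
P_HEAVY = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4),
           Fraction(1, 2), Fraction(1, 5), Fraction(2, 3)]
# Hypothetical orderings: die face d selects member ORDER[d - 1].
BY_FIRST_NAME = [0, 1, 2, 3, 4, 5]
BY_LAST_NAME = [3, 0, 5, 1, 2, 4]


def build_prior():
    """Prior over (weights, die) with a fair die independent of the weights."""
    prior = {}
    for weights in product([LIGHT, HEAVY], repeat=6):
        p = Fraction(1, 6)
        for w, h in zip(weights, P_HEAVY):
            p *= h if w == HEAVY else 1 - h
        for die in range(1, 7):
            prior[(weights, die)] = p
    return prior


def question(state):
    """Average weight of the six members, rounded to the nearest 100 kg."""
    weights, _die = state
    return 100 * ((sum(weights) + 300) // 600)  # floor(average / 100 + 1/2) * 100


def experiment(order):
    """Experiment reporting the die face and the selected member's weight."""
    def outcome(state):
        weights, die = state
        return die, weights[order[die - 1]]
    return outcome


def expected_score(prior, experiment_fn, question_fn):
    """Expected Brier score on the question after updating on the experiment."""
    cells = defaultdict(list)
    for state, p in prior.items():
        if p > 0:
            cells[experiment_fn(state)].append(state)
    total = Fraction(0)
    for states in cells.values():
        p_cell = sum(prior[s] for s in states)
        posterior = defaultdict(Fraction)  # credence over answers to the question
        for s in states:
            posterior[question_fn(s)] += prior[s] / p_cell
        for s in states:
            truth = question_fn(s)
            total -= prior[s] * sum((q - (1 if a == truth else 0)) ** 2
                                    for a, q in posterior.items())
    return total


def essentially_at_least_as_fine(prior, fine, coarse):
    """True if, off a probability-zero set, the `fine` cell fixes the `coarse` cell."""
    coarse_cell_of = {}
    for state, p in prior.items():
        if p > 0 and coarse_cell_of.setdefault(fine(state), coarse(state)) != coarse(state):
            return False
    return True


prior = build_prior()
first, last = experiment(BY_FIRST_NAME), experiment(BY_LAST_NAME)
score_first = expected_score(prior, first, question)
score_last = expected_score(prior, last, question)
print(score_first == score_last)
print(essentially_at_least_as_fine(prior, first, last),
      essentially_at_least_as_fine(prior, last, first))
```

In this model the equality does not depend on the particular weights or probabilities chosen. Under either ordering each member is selected with chance 1/6, independently of the weights, so both expected scores are the same average over members of the value of learning that member's weight.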
Is there a nice characterization in terms of partitions of when E2 is at least as good as
E1 with regard to a
set of questions Q? I don’t
know. It wouldn’t surprise me if there were something in the literature.
A nice start would be to see if we can answer the question in the
special case where Q is a
single binary question and where E1 and E2 are binary
experiments. But I need to go for a dental appointment now.