Friday, May 24, 2019

Improving on Solomonoff priors

Let’s say that we want prior probabilities for data that can be encoded as a countably infinite binary sequence. Generalized Solomonoff priors work as follows: We have a language L (in the original setting, it’ll be based on Turing machines) and we generate random descriptions in L in a canonical way (e.g., add an end-of-string symbol to L and randomly and independently generate symbols until you hit the end-of-string symbol, and then conditionalize on the string uniquely describing an infinite binary sequence). Typically the set of possible descriptions in L is countable and we get a nice well-defined probability measure on the space of all countably infinite binary sequences, which favors those sequences that are simpler in the sense of being capable of a simpler encoding.

Here is a serious problem with this method. Let N be the set of all binary sequences that cannot be uniquely described in L. Then the method assigns prior probability zero to N, even though most sequences are in N. In particular, this means that if we get an L-indescribable sequence—and most sequences generated by independent coin tosses will be like that—then no matter how much of it we observe, we will be almost sure of the false claim that the sequence is L-describable.

Here, I think, is a better solution. Use a language L that can give descriptions of subsets of the space Ω of countably infinite binary sequences. Now our (finitely additive) priors will be generated as follows. Choose a random string of symbols in L and conditionalize on the string giving a unique description of a subset. If the subset S happens to be measurable with respect to the standard (essentially Lebesgue) measure on infinite binary sequences (i.e., the coin toss measure), then randomly choose a point in S using a finitely additive extension of the standard measure to all subsets of S. If the subset S is not measurable, then randomly choose a point in S using any finitely additive measure that assigns probability zero to all singletons.

For a reasonable language L, the resulting measure gives a significant probability to an unknown binary sequence being indescribable. For Ω itself will typically be easily described, and so there will be a significant probability p that our random description of a subset will in fact describe all of Ω, and the probability that we have an indescribable sequence will be at least p.

It wouldn’t surprise me if this is in the literature.