Decision Theory

Perfect Prior Probabilities

The prior probability of a hypothesis is proportional to 2^-x, where x is the number of bits in the shortest computer program that prints a complete description of the hypothesis.

Consider a computer program that prints a string of bits that represents the entire universe. Or one possible configuration of it. That bitstring is a complete hypothesis.

One such computer program might represent the operations of our best approximation of the laws of physics. It would be relatively short, giving it a fairly high prior probability. And it would make pretty good predictions, so you’d update it to have a higher posterior probability upon observing, well, almost anything. This is physics we’re talking about, after all.

A rival theory is “god did it.” But to get a computer program to print out a complete description of the universe, you need to have some math in there. And if your program has the same output either way, the version without the description of god will be shorter, and thus more likely.

Tada! Occam’s razor!
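Here is a rough sketch of that comparison in code. It uses the standard Solomonoff weighting, where a program of x bits gets an unnormalized weight of 2^-x; any weighting that shrinks with length gives the same ordering. The program lengths and the name `solomonoff_weight` are made up for illustration, since nobody knows how many bits the laws of physics really take.

```python
from fractions import Fraction


def solomonoff_weight(program_bits: int) -> Fraction:
    """Unnormalized prior weight 2^-x for a program that is x bits long."""
    return Fraction(1, 2 ** program_bits)


# Hypothetical lengths, chosen only to illustrate the comparison.
PHYSICS_BITS = 1000          # program that encodes the laws of physics
GOD_DESCRIPTION_BITS = 50    # extra bits that leave the output unchanged

physics_only = solomonoff_weight(PHYSICS_BITS)
physics_plus_god = solomonoff_weight(PHYSICS_BITS + GOD_DESCRIPTION_BITS)

# Same output, different lengths: the shorter program gets the larger weight.
ratio = physics_only / physics_plus_god
print(physics_only > physics_plus_god)
print(ratio == 2 ** GOD_DESCRIPTION_BITS)
```

Under this weighting, every extra bit that doesn't change the output halves the prior, so the penalty for unnecessary detail grows very quickly.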

Throw in some Bayesian updating, and you’ve got the best possible method for arriving at accurate beliefs.

Soon: Bayes v. Science. Two enter, one leaves.

Bayesian Epistemology: The Short Version

There may be worlds in which it is possible to know a thing for certain, but in ours there is always a very tiny chance that you are being deceived by an alien reptile conspiracy. Some people throw up their hands in response to this and start buying lottery tickets, because “there’s always a chance, right?” The more general version of this naïve epistemological theory is “use wishful thinking to choose among all beliefs that are even remotely plausible.”

There are better ways of choosing beliefs than this, and in fact there is one particular epistemology that is mathematically guaranteed to give you the best chance of being right: Bayesian induction with a Solomonoff prior.¹

Bayesian Induction

In math:

Bayes’ Theorem: P(A|B) = P(B|A) × P(A) / P(B)

The math, in English:

So we’ve got this hypothesis that my roommate is a reptiloid alien, which we’ll call A, and we want to know how likely it is now that we’ve just seen that he has two sets of eyelids, an observation which we’ll call B. This is the posterior probability of A, which is P(A|B) in the formula above. (You read that as “the probability of A given B.” No butts are involved.)

To figure out that probability, we need to know three different things:

  1. The prior probability of A, which is how likely A was before we’d even heard of B. If we didn’t already have an idea of how likely A was, then we’d want to use the Solomonoff prior for A, which we’ll get to in a minute. (Note that the more likely A was before we saw B, the more likely it should be afterwards.)
  2. The conditional probability of B, given A, which is the P(B|A) in the formula. If my roommate is a reptiloid, then it is very likely that he’ll have two sets of eyelids, but it’s not guaranteed. (Nothing is.) Maybe reptiloids don’t have any eyelids at all. (The more likely it is for reptiloids to have two sets of eyelids, the more likely it is for my roommate to be a reptiloid if he does have two sets.)
  3. The prior probability of B, which is how likely B is overall, whether or not A is true. If there’s a common birth defect that causes people to have two pairs of eyelids, then it should be less likely that my roommate is an alien, because he probably just has the birth defect. But if there is no such birth defect, well, we should be more suspicious. (The more likely B is, the less likely our hypothesis A is.)

Once we have those numbers worked out, we can do some multiplication and some division and figure out exactly how paranoid I should be.
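For the curious, here is a minimal sketch of that arithmetic. Every number in it is invented for illustration, and the function names are my own. It also assumes we get P(B) by adding up the two ways B can happen (reptiloid with double eyelids, or human with double eyelids), which is the usual law of total probability.

```python
def total_probability(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(B): the chance of seeing B whether or not A is true."""
    return p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)


def posterior(prior_a: float, p_b_given_a: float, p_b: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs, not real statistics about reptiloids.
PRIOR_REPTILOID = 1e-6          # P(A): roommate is a reptiloid
P_EYELIDS_IF_REPTILOID = 0.9    # P(B|A): likely, but not guaranteed
P_EYELIDS_IF_HUMAN_RARE = 0.001   # P(B|not A): no common birth defect
P_EYELIDS_IF_HUMAN_COMMON = 0.01  # P(B|not A): a common birth defect exists

p_b_rare = total_probability(PRIOR_REPTILOID, P_EYELIDS_IF_REPTILOID, P_EYELIDS_IF_HUMAN_RARE)
p_b_common = total_probability(PRIOR_REPTILOID, P_EYELIDS_IF_REPTILOID, P_EYELIDS_IF_HUMAN_COMMON)

paranoia_rare = posterior(PRIOR_REPTILOID, P_EYELIDS_IF_REPTILOID, p_b_rare)
paranoia_common = posterior(PRIOR_REPTILOID, P_EYELIDS_IF_REPTILOID, p_b_common)

print(paranoia_rare)    # higher than the prior, but still small
print(paranoia_common)  # lower still, because B is less surprising
```

With these made-up inputs, the observation raises the probability well above the prior, but the more common double eyelids are among ordinary humans, the less the observation moves us.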

In English, with no math:

Represent all of your beliefs as probabilities or degrees of belief. When you encounter new evidence, update your beliefs to be more or less likely.

The Solomonoff Prior

To make any of this Bayesian induction stuff work, you need to be able to have some kind of prior probability for hypotheses before you see any evidence at all. The simplest way to do this would be to use an ignorance prior, which would mean treating all hypotheses as equally likely before you see evidence for or against them. But starting out with the assumption that my roommate is as likely to be a reptiloid as not seems… over-generous. We can do better.

Occam’s razor is a pretty handy principle, and we can adopt a version of it to serve as our universal prior: the simpler a hypothesis is, the more likely it is to be true. All we need now is a mathematical definition of simple and we’re set. How about:

The prior probability of a hypothesis is proportional to 2^-x, where x is the number of bits in the shortest computer program that prints a complete description of the hypothesis.

I’ll pry that definition apart tomorrow and explain how it ticks.

 

1: This is actually a slight lie. If the Church-Turing thesis is false, Solomonoff induction will perform poorly in cases where hypercomputation is relevant. But this possibility is exotic enough that it can be ignored for our purposes.

Unhappy With Utility

Philosophy is a multifaceted, sprawling academic discipline with an astonishing amount of accumulated historical baggage. While this is occasionally a good thing, it more often produces obnoxious and unnecessary misunderstandings. One such misunderstanding that has been particularly aggravating to me in recent weeks concerns the terms “utility” and “utility maximization”, each of which has two meanings in philosophy that are substantially different but easily confused.¹

The first kind of utility is experience-utility, a measure of the amount of happiness associated with an outcome; it is the subject of the moral theory of utilitarianism, originally developed by Jeremy Bentham and John Stuart Mill over the course of the late-18th and mid-19th centuries. Bentham and Mill argued that happiness is the greatest good, and so good actions are those which increase the amount of happiness in the world. Maximizing experience-utility is a Level 1 moral imperative; it is a theory of what people ought to do.²

The second kind of utility is decision-utility, a measure of preference or desire for an outcome; it is a key concept in economics, game theory, ethics, and artificial intelligence. This kind of utility can be formalized and discussed mathematically, and it has been proven (Math warning!) that you can construct a mathematical function (the utility function) for every “rational” agent which incorporates all of their preferences. Maximizing decision-utility is a description of the behavior of rational agents.³

Consider the case of a woman, Anne, who is walking through a park on the way to an important business meeting. As she passes a pond, she sees that there is a young boy, Billy, about 20 feet from shore, and it is immediately clear from the way he is flailing that he will soon drown. Anne is faced with a choice: she can dive into the pond and rescue the drowning boy, but then she will show up to her meeting very late and soaking wet. (She was a lifeguard in college, so it is all but certain that she will succeed with her rescue.) Or she can hurry on to her meeting. Anne is perfectly selfish and concerned only with her own happiness, so her experience-utility and decision-utility are always equal for a particular outcome; she prefers outcomes in proportion to how happy they will make her. Unfortunately for Billy, she knows that helping him won’t make her happy and that being on time for her meeting will. Billy drowns.

Now imagine that instead of Anne being presented with the problem of the drowning boy, Carl is the one walking through the park that day. Like Anne, Carl knows that he’ll be much happier if he has a successful meeting and doesn’t have to get drenched; his experience-utility will be higher if he walks away. But quite unlike Anne, Carl likes to read John Stuart Mill and considers himself a classical utilitarian. He realizes that the happiness that Billy and his family will feel if he survives would drastically outweigh the unhappiness that he himself will experience if he misses his meeting. He acts rationally and saves Billy, which maximizes his decision-utility and the world’s total amount of experience-utility, even though it means sacrificing some of his own experience-utility.
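The difference between Anne and Carl can be sketched as two agents running the same decision rule over different decision-utility functions. The happiness numbers below are invented, and the function names are hypothetical; the only thing that matters is their relative sizes.

```python
# Hypothetical experience-utility (happiness) for each outcome.
OWN_HAPPINESS = {"rescue": -10, "meeting": 10}          # the passerby's own happiness
OTHERS_HAPPINESS = {"rescue": 1000, "meeting": -1000}   # Billy and his family

OUTCOMES = ("rescue", "meeting")


def selfish_decision_utility(outcome: str) -> int:
    """Anne: preference tracks only her own experience-utility."""
    return OWN_HAPPINESS[outcome]


def utilitarian_decision_utility(outcome: str) -> int:
    """Carl: preference tracks the world's total experience-utility."""
    return OWN_HAPPINESS[outcome] + OTHERS_HAPPINESS[outcome]


def choose(decision_utility) -> str:
    """A rational agent picks the outcome with the highest decision-utility."""
    return max(OUTCOMES, key=decision_utility)


anne_choice = choose(selfish_decision_utility)
carl_choice = choose(utilitarian_decision_utility)

print(anne_choice)  # meeting
print(carl_choice)  # rescue
```

Both agents maximize decision-utility, so both count as rational in the second sense. They differ only in what their preferences are about, and only Carl's choice also maximizes total experience-utility.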

If I weren’t distinguishing between the two varieties of utility in these hypotheticals, I could criticize Anne for failing to maximize utility because she has made the world an unhappier place, and then I could criticize Carl for failing to maximize utility because he has made himself unhappy. If I had longwindedly described scenarios for Danielle and Eric, the irrational but otherwise identical twins of Anne and Carl who chose the options they didn’t prefer, I could criticize them both for failing to maximize utility as well. But while they would have the advantage of being brief, none of those criticisms would be very helpful or informative. Please keep your utilities straight. It’s always ugly when someone mixes up Water Works and the Electric Company.

 

1: Even my beloved Wikipedia fails to make this distinction. I’m not brave enough to wade in and fix that article, but more courageous readers are encouraged to do what they can.

2: Or so it is argued. There are a number of serious problems with purely hedonistic moral systems, which I may explore in greater depth in the future.

3: Humans are not rational agents by this definition, but we resemble them enough that economic models can pretend that we are and still prove useful for many purposes; economics tends to go horribly wrong when it forgets that people are not actually perfectly rational utility maximizers.