Prelude: the essence of Markov chain Monte Carlo methods

Let me explain the idea of Markov chain Monte Carlo (MCMC) methods using a very simple example. Consider estimating the age of this girl celebrating her birthday. We have two kinds of information about her age: measurement data and a priori knowledge

This picture shows the measurement information, namely a blurred image of the birthday cake with one candle for each year of the girl's age.

We cannot see the number of candles clearly, so we model our data using probabilities.

The a prior knowledge about the girl's age comes from her drawing getting published.

This is statistical information about the ages of cover artists in this children's magazine.

The Bayes formula tells us to multiply together the two probability distributions we have, resulting in a posterior probability distribution combining the two kinds of information we have.

Now we can compute the expectation, or conditional mean, of this probability distribution as a weighted average.

We can compute the conditional mean estimate using an MCMC sampling algorithm. Of course, in this simple example it is silly because we can very easily calculate that number as a weighted average of a few given numbers. However, the use of MCMC becomes a necessity when the variables take values in a high-dimensional Euclidean space where integration quadratures are ineffective because of the curse of dimensionality.

Start by creating a one-to-one correspondence between possible ages of the girl and the sides of a dice.

In the Metropolis-Hastings algorithm we describe, the process is started by choosing any initial member in the chain. In this case the initial member is one (corresponding to the age of 7 years).

Then we randomly pick a candidate for the next (here second) member in the chain by throwing the dice. If the posterior probability of the candidate is higher than that of the previous member, as is the case here, we accept the candidate.

After accepting the candidate for the second place in the chain, we throw the dice again to get a new candidate. Now the candidate has lower posterior probability than the previous member in the chain. Do we reject the candidate then? Well, not necessarily.

The beauty of the Metropolis-Hastings method is that we may still sometimes accept a candidate even if it is less probable than teh previous member in the chain. This is how it's done: compute the ratio of the posterior probabilities of the previous member and of the candidate. It is a number between zero and one. Draw a random number from the uniform probability distribution between zero and one. If the ratio is greater than the random number, accept the candidate. Otherwise, reject the candidate and repeat the previous member in the chain.

In this example, we ended up rejecting the candidate and repeating the second member in the chain so that it occupies both the second and third place. Then we draw the next random candidate by throwing the dice and continue.

The longer chain we compute, the better is our MCMC estimate for the conditional mean.