Gibbs Sampling

These are lecture notes from my Computer Science course.

What is this?

Another form of sampling, used in bayesian networks etc.

In previous lecture we talked about forward, rejection, and importance sampling. So why do we need Gibbs sampling too?

Limitations of importance sampling

If the evidence is improbable we end up with very low weighted samples, which means that the estimates you get aren’t very good because small weighted samples means each of your data points aren’t getting you very close to the actual probability.

Markov chain Monte Carlo (MCMC)

Instead of sampling from the target distribution, sample from a sequence of distributions which get progressively closer to the target distribution. You do this if you can’t sample easily from the actual target distribution because it’s tricky for some reason (he literally said that).

If you do this long enough, you’ll be sampling from the target distribution.

By the way there are two types of MCMC: ‘perfect sampling’ which is some algorithm that gets you exactly to the target distribution, and something else, which doesn’t.

Only use this when the simpler sampling methods aren’t possible (e.g. when you can sample from the target distribution).

Did you know: any simulation-based method is a Monte Carlo method, and it was named after the casino.

The technical stuff

In terms of a linear infinite bayesian network, a Markov chain looks like a chain:

X0 -> X1 -> X2 -> ...


  • Each Xi has the same finite set of values.
  • Each conditional probability table (probability of Xi+1 given Xi) is the same.

X0 is known as the initial distribution. The elements of the CPTs are transition probabilities. Remember that the CPT is the same for all the values.

Key thing about Markov Chains; as you progress down the chain, the initial distribution affects the results less and less and less. They ‘forget’ their initial distributions.

Stationary distributions

This is a bit like converging. If you have some transition matrix, and some distribution. If multiplying the distribution by the transition matrix results in the same distribution again, then it’s a stationary distribution. This is A Good Thing, apparently.

This doesn’t mean you’re in the same state all the time, just means the probability of being in any of the states is the same (duh).


The aim of MCMC methods is to design a ‘good’ Markov chain whose stationary distribution is the target distribution (perfect sampling) such that Pi quickly converges to the stationary distribution irrespective of the initial distribution (the ‘forgetting’ stuff again). In reality you never really get to that, so you get as close as possible and then throw away all the rubbish stuff and just use the ones that are pretty close to the target. This bit that you throw away is apparently known as the initial ‘burn-in’ sample.


Then there was some maths stuff about doing Gibbs sampling, I think. But I had a completely unrelated train of thought and so lost the plot. Slide 14-15 onwards in lect13.pdf. Was the end of the lecture anyway.

This entry was posted in agm, lecture. Bookmark the permalink.