Conditional Independence in Factored Distributions

These are lecture notes from my Computer Science course.

Summary of how far we got

We have been talking about factors: how to marginalise them (summing variables out to remove them), but also how to multiply and divide them.

Probably should look up why you do multiplication and division of factors? I think it’s something to do with conditional distributions: multiplying factors builds up joint distributions, and dividing a joint by a marginal gives a conditional.
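To check my understanding, here is a minimal sketch (my own code, not from the lecture) of factors over binary variables as tables. Multiplying P(A) by P(B|A) should give the joint P(A,B), and marginalising B back out should recover P(A):

```python
from itertools import product

# My own sketch: a factor is a list of variable names plus a table
# mapping assignment tuples to numbers. Binary variables only.

def multiply(f_vars, f_tab, g_vars, g_tab):
    """Pointwise product of two factors, over the union of their variables."""
    all_vars = list(dict.fromkeys(f_vars + g_vars))  # union, order-preserving
    tab = {}
    for assignment in product([0, 1], repeat=len(all_vars)):
        env = dict(zip(all_vars, assignment))
        f_key = tuple(env[v] for v in f_vars)
        g_key = tuple(env[v] for v in g_vars)
        tab[assignment] = f_tab[f_key] * g_tab[g_key]
    return all_vars, tab

def marginalise(f_vars, f_tab, var):
    """Sum a variable out of a factor (removing it)."""
    keep = [v for v in f_vars if v != var]
    idx = f_vars.index(var)
    tab = {}
    for key, val in f_tab.items():
        new_key = key[:idx] + key[idx + 1:]
        tab[new_key] = tab.get(new_key, 0.0) + val
    return keep, tab

# P(A) times P(B|A) gives the joint P(A,B); summing B out recovers P(A).
p_a = (['A'], {(0,): 0.4, (1,): 0.6})
p_b_given_a = (['A', 'B'], {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8})
joint_vars, joint_tab = multiply(*p_a, *p_b_given_a)
marg_vars, marg_tab = marginalise(joint_vars, joint_tab, 'B')
print(marg_tab)  # back to {(0,): 0.4, (1,): 0.6}
```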

Conditional independence

Key part of the module apparently. Like normal independence, but conditioned on knowing a third variable.

“X and Y are independent once we know Z.”

Example of conditional independence

UponTime --- GetTrain --- WorkonTime

There’s conditional independence here. If UponTime is Yes, there’s a good chance GetTrain is Yes, and if GetTrain is Yes, there’s a good chance WorkonTime is Yes.

Instantiate GetTrain to Yes. Then instantiating UponTime to Yes (or No) makes no difference to WorkonTime, because UponTime is conditionally independent of WorkonTime given GetTrain. Likewise, once you know that GetTrain is Yes, knowing WorkonTime tells you nothing extra about whether UponTime was Yes or No.
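A quick numeric sanity check I wrote myself (the probabilities here are made up, not from the lecture): once GetTrain is fixed, the probability of getting to work on time comes out the same whatever UponTime was.

```python
# Made-up numbers for the chain UponTime -> GetTrain -> WorkonTime.
# Each variable is 1 (Yes) or 0 (No).
p_up = {1: 0.7, 0: 0.3}                                # P(UponTime)
p_train = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.4, 0: 0.6}}   # P(GetTrain | UponTime)
p_work = {1: {1: 0.95, 0: 0.05}, 0: {1: 0.2, 0: 0.8}}  # P(WorkonTime | GetTrain)

def joint(u, t, w):
    """P(UponTime=u, GetTrain=t, WorkonTime=w), factored along the chain."""
    return p_up[u] * p_train[u][t] * p_work[t][w]

def p_work_given(u, t):
    """P(WorkonTime=1 | UponTime=u, GetTrain=t), by conditioning the joint."""
    return joint(u, t, 1) / (joint(u, t, 1) + joint(u, t, 0))

# Fix GetTrain=Yes: UponTime makes no difference to WorkonTime.
print(p_work_given(1, 1), p_work_given(0, 1))  # both 0.95
```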

Why is this important?

In most real situations variables aren’t completely independent; they almost always have some sort of mutual dependence. By using conditional independence relationships we can model this better, and also speed up computation (because they provide extra structure to exploit).


A hypergraph is a set of subsets of variables. E.g. if you have the variables A,B,C,D, a possible hypergraph might be:

H = {{A,B},{B,C},{C,D},{A,D}}

A ‘hyperedge’ is one of these subsets, e.g. {A,B}. A hypergraph where all hyperedges have exactly 2 elements is basically just a graph (well, they are all basically just graphs…).

A Hypergraph can be reduced by removing all hyperedges in H that are contained in some other hyperedge, e.g.

H = {{S}, {C,S}, {B,S}}
red(H) = {{C,S}, {B,S}}
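A quick sketch of reduction in Python (my own code): keep a hyperedge only if it is not a proper subset of some other hyperedge.

```python
def red(H):
    """Reduce a hypergraph: drop hyperedges contained in some other hyperedge."""
    H = [frozenset(e) for e in H]
    return {e for e in H if not any(e < f for f in H)}  # e < f is proper subset

H = [{'S'}, {'C', 'S'}, {'B', 'S'}]
print(sorted(sorted(e) for e in red(H)))  # [['B', 'S'], ['C', 'S']]
```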

A hypergraph can be represented as an undirected graph: for each hyperedge {A,B}, the undirected graph gets an (undirected) edge between the vertices A and B.
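A sketch of that conversion (my own code; for hyperedges with more than 2 elements I’ve assumed every pair of variables sharing a hyperedge gets an edge, which I think is how it works):

```python
from itertools import combinations

def to_graph(H):
    """Undirected edge set: one edge per pair of variables sharing a hyperedge."""
    edges = set()
    for hyperedge in H:
        for a, b in combinations(sorted(hyperedge), 2):
            edges.add((a, b))
    return edges

print(sorted(to_graph([{'A', 'B'}, {'B', 'C', 'D'}])))
# [('A', 'B'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```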

Why do we care?

A hypergraph of factors can be used to ‘read off’ conditional independence relations, I think using ‘the global Markov property’. This was kinda rushed at the end of the lecture so I’m not sure about it, but I think it was important.

If you have a graph like

A -- B
|    |
D -- C

{B, D} separates {A} from {C} because any path between A and C has to go through B or D. Therefore, in the distribution, if someone tells you the values of B and D, then the value of A does not affect the value of C (much like the WorkonTime example). In other words, the graph is a visualisation of information flow.

So in this example, A is independent of C given {B, D} (you need both B and D together, not just one of them). I think?
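Separation is easy to check mechanically, so here is a sanity-check sketch (my own code): a breadth-first search that is not allowed to enter the separator set. If it can’t reach C from A, then {B, D} separates them.

```python
from collections import deque

def separates(adj, sep, x, y):
    """True if every path from x to y in the undirected graph passes through sep."""
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return False  # found a path avoiding the separator
        for nbr in adj[node]:
            if nbr not in seen and nbr not in sep:
                seen.add(nbr)
                queue.append(nbr)
    return True

# The square A-B, B-C, C-D, D-A from above, as an adjacency dict:
adj = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
print(separates(adj, {'B', 'D'}, 'A', 'C'))  # True
print(separates(adj, {'B'}, 'A', 'C'))       # False: the path A-D-C gets round it
```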

More about this next time.

This entry was posted in agm, lecture.