2.2. Conditional Probabilities

The conditional probability \(\P(A\given B)\) is the probability of event \(A\) given that we know that event \(B\) has occurred too. For example, we may ask for the probability of throwing a 6 with a fair die given that we have thrown an even number of points.

Definition 2.2.1 (Conditional Probability)

The conditional probability of \(A\) given \(B\) is:

\[\P(A\given B) = \frac{\P(A\cap B)}{\P(B)}\]
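The definition can be checked numerically on the fair-die example from above. The following is a minimal sketch in Python; the helper functions `prob` and `cond_prob` are illustrative names, not part of the text:

```python
from fractions import Fraction

# Sample space of a fair die; each outcome has probability 1/6.
U = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform distribution on U."""
    return Fraction(len(event & U), len(U))

def cond_prob(A, B):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(A & B) / prob(B)

A = {6}                 # throwing a 6
B = {2, 4, 6}           # throwing an even number

print(cond_prob(A, B))  # 1/3
```

Knowing the throw is even shrinks the sample space to three equally likely outcomes, so the conditional probability rises from \(1/6\) to \(1/3\).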

In practical applications in machine learning we find ourselves in the situation where we would like to calculate the probability of some event, say \(\P(A)\), but only the conditional probabilities \(\P(A\given B)\) and \(\P(A\given \neg B)\) are known. Then the following theorem can be used.

Theorem 2.2.2 (Total Probability)
\[\P(A) = \P(A\given B)\,\P(B) + \P(A\given\neg B)\,\P(\neg B)\]
Proof

The proof starts with observing that:

\[A = (A\cap B) \cup (A\cap \neg B)\]

and because \(A\cap B\) and \(A\cap \neg B\) are disjoint we may apply the third axiom and obtain:

\[\begin{split}\P(A) &= \P(A\cap B) + \P(A\cap \neg B)\\ &= \frac{\P(A\cap B)}{\P(B)}\,\P(B) + \frac{\P(A\cap \neg B)}{\P(\neg B)}\,\P(\neg B)\\ &= \P(A\given B)\,\P(B) + \P(A\given\neg B)\,\P(\neg B)\end{split}\]
[Figure: venn_totalProb.png]

Fig. 2.2.1 Law of Total Probability. The hatched area indicates the set \(A\cap B_2\).
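The theorem can be verified on the die example: \(\P(6) = \P(6\given\text{even})\,\P(\text{even}) + \P(6\given\text{odd})\,\P(\text{odd}) = \tfrac13\cdot\tfrac12 + 0\cdot\tfrac12 = \tfrac16\). A sketch in Python (the helpers `prob` and `cond_prob` are illustrative names):

```python
from fractions import Fraction

# Sample space of a fair die.
U = {1, 2, 3, 4, 5, 6}

def prob(E):
    """P(E) under the uniform distribution on U."""
    return Fraction(len(E & U), len(U))

def cond_prob(A, B):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(A & B) / prob(B)

A = {6}                 # throwing a 6
B = {2, 4, 6}           # throwing an even number
not_B = U - B           # the complement: an odd number

# Total probability over the partition {B, ¬B}:
total = cond_prob(A, B) * prob(B) + cond_prob(A, not_B) * prob(not_B)
print(total == prob(A))  # True: both equal 1/6
```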

This theorem may be extended to partitions of the universe \(U\). A partition of \(U\) is a collection of subsets \(B_i\) for \(i=1,\ldots,n\) such that \(B_i \cap B_j=\emptyset\) for any \(i\not=j\) and \(B_1\cup B_2\cup\cdots\cup B_n=U\) .

Theorem 2.2.3 (Total Probability)

For any partition \(\{B_i\}\) of \(U\) we have:

\[\P(A) = \sum_{i=1}^{n} \P(A \given B_i)\,\P(B_i)\]

The proof is a generalization of the proof for the partition \(\{B, \neg B\}\).
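The general form can likewise be checked numerically. Below, the die's sample space is split into the three-block partition \(\{1,2\}, \{3,4\}, \{5,6\}\), and \(A\) is the event of throwing a prime number; the helper functions are illustrative names, as before:

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}

def prob(E):
    """P(E) under the uniform distribution on U."""
    return Fraction(len(E & U), len(U))

def cond_prob(A, B):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(A & B) / prob(B)

# A partition of U: pairwise disjoint sets whose union is U.
partition = [{1, 2}, {3, 4}, {5, 6}]
A = {2, 3, 5}           # throwing a prime number

# Sum P(A | B_i) P(B_i) over the partition:
total = sum(cond_prob(A, B_i) * prob(B_i) for B_i in partition)
print(total == prob(A))  # True: both equal 1/2
```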

Theorem 2.2.4 (Bayes Rule)

Bayes rule allows us to write \(\P(A\given B)\) in terms of \(\P(B\given A)\):

\[\P(A\given B) = \frac{\P(A)}{\P(B)}\,\P(B\given A)\]

The proof of Bayes rule follows directly from the definition of conditional probability: \(\P(A\given B)\,\P(B) = \P(A\cap B) = \P(B\given A)\,\P(A)\), and dividing both sides by \(\P(B)\) gives the result.
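Bayes rule can be confirmed on the die example, recovering \(\P(6\given\text{even})\) from \(\P(\text{even}\given 6)=1\). A sketch in Python, again with illustrative helper names:

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}

def prob(E):
    """P(E) under the uniform distribution on U."""
    return Fraction(len(E & U), len(U))

def cond_prob(A, B):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(A & B) / prob(B)

A = {6}                 # throwing a 6
B = {2, 4, 6}           # throwing an even number

# Bayes rule: P(A|B) = P(A)/P(B) * P(B|A)
bayes = prob(A) / prob(B) * cond_prob(B, A)
print(bayes == cond_prob(A, B))  # True: both equal 1/3
```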

Definition 2.2.5 (Chain Rule)

The definition of the conditional probability can be written in another form:

\[\P(A\,B) = \P(A\given B)\,\P(B)\]

In this form it is known as the chain rule (or product rule). This rule can be generalised as:

\[\P(A_1\,A_2\cdots A_n) = \P(A_1\given A_2,\ldots,A_n)\,\P(A_2\given A_3,\ldots,A_n) \cdots \P(A_{n-1}\given A_n)\,\P(A_n)\]
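For three die events the generalised chain rule reads \(\P(A_1\,A_2\,A_3) = \P(A_1\given A_2\cap A_3)\,\P(A_2\given A_3)\,\P(A_3)\), which the sketch below verifies (the events and helper names are chosen for illustration only):

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}

def prob(E):
    """P(E) under the uniform distribution on U."""
    return Fraction(len(E & U), len(U))

def cond_prob(A, B):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(A & B) / prob(B)

A1 = {2, 4, 6}          # even number
A2 = {3, 4, 5, 6}       # at least 3
A3 = {1, 2, 3, 4}       # at most 4

# Chain rule for three events:
chain = cond_prob(A1, A2 & A3) * cond_prob(A2, A3) * prob(A3)
print(chain == prob(A1 & A2 & A3))  # True: both equal 1/6
```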