Conditional Probabilities
=========================

The conditional probability $\P(A\given B)$ is the probability of event $A$ given that we know that event $B$ has occurred. For example, we may ask for the probability of throwing a 6 with a fair die given that we have thrown an even number of points.

.. proof:definition:: Conditional Probability

   The conditional probability of $A$ given $B$ is:

   .. math::
      \P(A\given B) = \frac{\P(A\cap B)}{\P(B)}

   defined whenever $\P(B)>0$.

In practical applications in machine learning we often find ourselves in the situation where we would like to calculate the probability of some event, say $\P(A)$, but only the conditional probabilities $\P(A\given B)$ and $\P(A\given \neg B)$ are known. Then the following theorem can be used.

.. proof:theorem:: Total Probability

   .. math::
      \P(A) = \P(A\given B)\,\P(B) + \P(A\given\neg B)\,\P(\neg B)

.. proof:proof::

   The proof starts by observing that:

   .. math::
      A = (A\cap B) \cup (A\cap \neg B)

   Because $A\cap B$ and $A\cap \neg B$ are disjoint, we may apply the third axiom and obtain (assuming $0 < \P(B) < 1$, so that both conditional probabilities are defined):

   .. math::
      \P(A) &= \P(A\cap B) + \P(A\cap \neg B)\\
      &= \frac{\P(A\cap B)}{\P(B)}\,\P(B) + \frac{\P(A\cap \neg B)}{\P(\neg B)}\,\P(\neg B)\\
      &= \P(A\given B)\,\P(B) + \P(A\given\neg B)\,\P(\neg B)

This theorem may be extended to **partitions** of the universe $U$. A partition of $U$ is a collection of subsets $B_i$ for $i=1,\ldots,n$ such that $B_i \cap B_j=\emptyset$ for any $i\not=j$ and $B_1\cup B_2\cup\cdots\cup B_n=U$.

.. figure:: /figures/venn_totalProb.png
   :width: 30%
   :align: center

   **Law of Total Probability.** The hatched area indicates the set $A\cap B_2$.

.. proof:theorem:: Total Probability

   For any partition $\{B_i\}$ of $U$ we have:

   .. math::
      \P(A) = \sum_{i=1}^{n} \P(A \given B_i)\,\P(B_i)

The proof is a generalization of the proof for the partition $\{B, \neg B\}$.

.. proof:theorem:: Bayes' Rule

   **Bayes' rule** allows us to write $\P(A\given B)$ in terms of $\P(B\given A)$:

   .. math::
      \P(A\given B) = \frac{\P(A)}{\P(B)}\,\P(B\given A)

The proof of Bayes' rule follows directly from the definition of the conditional probability: both $\P(A\given B)\,\P(B)$ and $\P(B\given A)\,\P(A)$ are equal to $\P(A\cap B)$, and dividing by $\P(B)$ gives the result.

.. proof:definition:: Chain Rule

   The definition of the conditional probability can be written in another form:

   .. math::
      \P(A\,B) = \P(A\given B)\,\P(B)

   In this form it is known as the **chain rule** (or **product rule**). This rule can be generalized as:

   .. math::
      \P(A_1\,A_2\cdots A_n) = \P(A_1\given A_2,\ldots,A_n)\,\P(A_2\given A_3,\ldots,A_n) \cdots \P(A_{n-1}\given A_n)\,\P(A_n)
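
As a worked instance of the definition, take the die example from the start of this section: with $A$ the event "a six is thrown" and $B$ the event "an even number is thrown", we have $A\cap B = A$, and therefore

.. math::
   \P(A\given B) = \frac{\P(A\cap B)}{\P(B)} = \frac{1/6}{1/2} = \frac{1}{3}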
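
The total probability theorem and Bayes' rule are typically used together. As an illustration (with made-up numbers, chosen purely for the sake of the example), suppose a spam filter knows that $\P(\text{spam}) = 0.2$, that a certain trigger word occurs in spam with probability $\P(\text{word}\given\text{spam}) = 0.9$, and that it occurs in legitimate mail with probability $\P(\text{word}\given\neg\,\text{spam}) = 0.1$. Total probability first gives the marginal probability of seeing the word:

.. math::
   \P(\text{word}) = 0.9 \cdot 0.2 + 0.1 \cdot 0.8 = 0.26

Bayes' rule then inverts the conditioning:

.. math::
   \P(\text{spam}\given\text{word})
   = \frac{\P(\text{spam})}{\P(\text{word})}\,\P(\text{word}\given\text{spam})
   = \frac{0.2}{0.26} \cdot 0.9 \approx 0.69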
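
All of the identities in this section can also be checked by brute-force enumeration. The following minimal Python sketch (an illustration added here, using nothing beyond the standard library) verifies the definition, the law of total probability, Bayes' rule, and the chain rule on the sample space of a fair die:

.. code-block:: python

   from fractions import Fraction

   # Sample space of a fair die; every outcome has probability 1/6.
   U = {1, 2, 3, 4, 5, 6}

   def P(event):
       """Probability of an event (a subset of U) under the uniform measure."""
       return Fraction(len(event & U), len(U))

   def P_given(a, b):
       """Conditional probability P(a | b) = P(a & b) / P(b)."""
       return P(a & b) / P(b)

   A = {6}            # "a six is thrown"
   B = {2, 4, 6}      # "an even number is thrown"
   not_B = U - B

   # Conditional probability: P(six | even) = (1/6) / (1/2) = 1/3
   assert P_given(A, B) == Fraction(1, 3)

   # Law of total probability for the partition {B, not B}
   assert P(A) == P_given(A, B) * P(B) + P_given(A, not_B) * P(not_B)

   # Bayes' rule: P(A | B) = P(A) / P(B) * P(B | A)
   assert P_given(A, B) == P(A) / P(B) * P_given(B, A)

   # Chain rule for three events:
   # P(A1 A2 A3) = P(A1 | A2 A3) * P(A2 | A3) * P(A3)
   A1, A2, A3 = {4, 5, 6}, {2, 4, 6}, {3, 4, 5, 6}
   assert P(A1 & A2 & A3) == P_given(A1, A2 & A3) * P_given(A2, A3) * P(A3)

   print("all identities verified")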