==============================
Conditional Random Variables
==============================

Consider two discrete random variables $X$ and $Y$. Then we can define the conditional probability

.. math:: \P(X=x \given Y=y)

Note that for any value $y$ we have a random variable $X\given Y=y$ with probability mass function

.. math:: X \given Y=y \sim p_{X\given Y=y}(x)

The notation for conditional random variables is not the same in all literature. You often find the notation

.. math:: p_{X\given Y}(x\given y)

which I find a bit confusing, as I like to reserve the $\given$ symbol to precede the conditioning event and not a mere value. But be warned: if you advance to Bayesian inference and statistics, the $\given$ symbol is used very often to denote the dependence on parameter values.

Now consider the situation where $X$ is a continuous random variable whereas $Y$ is a discrete random variable. In that case we have a probability density function for the random variable $X\given Y=y$:

.. math:: X\given Y=y \sim f_{X\given Y=y}(x)

The conditional random variable $Y\given X=x$ is a discrete random variable:

.. math:: Y\given X=x \sim p_{Y\given X=x}(y)

And yes, although the probability $\P(X=x)=0$, evidently $X=x$ can be the outcome of the random experiment (there always is some outcome, and that outcome can of course be $x$).

To illustrate this situation, let apples ($Y=1$) and pears ($Y=0$) be the possible outcomes of random variable $Y$, and let $X$ denote the weight of a piece of fruit (either an apple or a pear). Then we may wonder what the probability is for a piece of fruit with weight $x$ to be a pear or an apple, i.e. $\P(Y=y\given X=x)$. It is tempting to use Bayes' rule directly and write

.. math:: \P(Y=y\given X=x) = \frac{\P(X=x\given Y=y)\P(Y=y)}{\P(X=x)} \quad\text{THIS IS WRONG}

It is obviously wrong, as $X$ and $X\given Y=y$ are both continuous random variables (and hence the above expression evaluates to $0/0$).
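The $0/0$ problem can be made concrete with a small simulation. The sketch below uses an invented model (Gaussian class-conditional weights and a prior of 0.6 for apples are my own illustrative choices, not part of the text): no sampled weight ever exactly equals a query value $x$, so the counts needed by the naive Bayes' rule are both zero, while counting in a small interval around $x$ works fine.

```python
import random

random.seed(0)

# Hypothetical model (illustration only): apples (Y=1) weigh ~N(150, 15^2)
# grams, pears (Y=0) weigh ~N(170, 20^2) grams; apples have prior 0.6.
def sample_fruit():
    y = 1 if random.random() < 0.6 else 0
    x = random.gauss(150, 15) if y == 1 else random.gauss(170, 20)
    return y, x

data = [sample_fruit() for _ in range(100_000)]
x_query = 160.0

# Naive Bayes' rule with exact equality: the event X = x never occurs.
exact_matches = [y for (y, x) in data if x == x_query]
print(len(exact_matches))  # 0, so the empirical estimate is 0/0

# Conditioning on a small interval [x, x+dx) instead gives a sensible answer.
dx = 1.0
in_interval = [y for (y, x) in data if x_query <= x < x_query + dx]
p_apple_given_x = sum(in_interval) / len(in_interval)
print(p_apple_given_x)  # Monte Carlo estimate of P(Y=1 | X ≈ 160)
```

Shrinking $dx$ (while increasing the sample size) makes the interval estimate converge to the density-based posterior derived below.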
The failure of this naive application of Bayes' rule doesn't imply that we can't use it. To make the correct use of Bayes' rule a bit more intuitive than just stating the result, we introduce the (admittedly sloppy) notation

.. math:: X\approx x

for the event $x\leq X \leq x+dx$ with probability $f_X(x)dx$ for $dx\rightarrow 0$, i.e. for infinitesimally small $dx$:

.. math:: \P(X\approx x) = \P(x\leq X \leq x+dx) = f_X(x)dx

(remember that probability density times interval length is a probability). With this definition we have:

.. math:: \P(Y=y\given X=x) = \frac{\P(X\approx x\given Y=y)\P(Y=y)}{\P(X\approx x)} = \frac{f_{X\given Y=y}(x)\, dx\, \P(Y=y)}{f_X(x)\, dx}

The $dx$ factors in numerator and denominator cancel out, and so:

.. math:: \P(Y=y\given X=x) = \frac{f_{X\given Y=y}(x) \P(Y=y)}{f_X(x)}

where:

- $\P(Y=y\given X=x)$: the **a posteriori probability** of class $y$ given the value $x$,
- $\P(Y=y)$: the **a priori probability** of class $y$,
- $f_{X\given Y=y}$: the **class conditional probability density** function for $X\given Y=y$,
- $f_X$: the **evidence**, i.e. the probability density for $X$.
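The final formula is straightforward to evaluate once a model is chosen. A minimal sketch for the fruit example, assuming (as an invented illustration) Gaussian class-conditional densities and priors of my own choosing; the evidence $f_X(x)$ is obtained via the law of total probability, $f_X(x) = \sum_y f_{X\given Y=y}(x)\P(Y=y)$:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical priors and class-conditional densities (illustration only):
# apples (y=1): X|Y=1 ~ N(150, 15^2), pears (y=0): X|Y=0 ~ N(170, 20^2).
priors = {1: 0.6, 0: 0.4}
class_pdf = {1: lambda x: gauss_pdf(x, 150, 15),
             0: lambda x: gauss_pdf(x, 170, 20)}

def posterior(y, x):
    """P(Y=y | X=x) = f_{X|Y=y}(x) P(Y=y) / f_X(x)."""
    # Evidence by the law of total probability over the classes.
    evidence = sum(class_pdf[k](x) * priors[k] for k in priors)
    return class_pdf[y](x) * priors[y] / evidence

x = 160.0
p_apple, p_pear = posterior(1, x), posterior(0, x)
print(p_apple, p_pear)
# Sanity check: the posteriors over all classes sum to 1.
assert abs(p_apple + p_pear - 1.0) < 1e-12
```

Note that the evidence acts purely as a normalizing constant: for classification one could compare $f_{X\given Y=y}(x)\P(Y=y)$ across classes without computing $f_X(x)$ at all.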