==============================
Conditional Random Variables
==============================

Consider two discrete random variables $X$ and $Y$. Then we can define the conditional probability

.. math:: \P(X=x \given Y=y)

Note that for any value $y$ we have a random variable $X\given Y=y$ with probability mass function

.. math:: X \given Y=y \sim p_{X\given Y=y}(x)

The notation for conditional random variables is not the same in all literature. You often find the notation

.. math:: p_{X\given Y}(x\given y)

which I find a bit confusing, as I like to reserve the $\given$ symbol to precede the conditioning event and not a mere value. But be warned: if you advance to Bayesian inference and statistics, the $\given$ symbol is used very often to denote the dependence on parameter values.

Now consider the situation where $X$ is a continuous random variable whereas $Y$ is a discrete random variable. In that case we have a probability density function for the random variable $X\given Y=y$:

.. math:: X\given Y=y \sim f_{X\given Y=y}(x)

The conditional random variable $Y\given X=x$ is a discrete random variable:

.. math:: Y\given X=x \sim p_{Y\given X=x}(y)

And yes, although the probability $\P(X=x)=0$, evidently $X=x$ can be the outcome of the random experiment (there always is some outcome, and that outcome can of course be $x$).

To illustrate this situation, let apples ($Y=1$) and pears ($Y=0$) be the possible outcomes of random variable $Y$, and let $X$ denote the weight of a piece of fruit (either an apple or a pear). Then we may wonder what the probability is for a piece of fruit with weight $x$ to be a pear or an apple, i.e. $\P(Y=y\given X=x)$. It is tempting to use Bayes' rule directly and write

.. math:: \P(Y=y\given X=x) = \frac{\P(X=x\given Y=y)\P(Y=y)}{\P(X=x)} \quad\text{THIS IS WRONG}

It is obviously wrong, as $X$ and $X\given Y=y$ are both continuous random variables (and hence the above expression evaluates to $0/0$).
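The $0/0$ problem can be made concrete with a small simulation. The sketch below uses an invented model (Gaussian class-conditional weights and a prior of 0.6 for apples are my own illustrative choices, not part of the text): no sampled weight ever exactly equals a query value $x$, so the counts needed by the naive Bayes' rule are both zero, while counting in a small interval around $x$ works fine.

```python
import random

random.seed(0)

# Hypothetical model (illustration only): apples (Y=1) weigh ~N(150, 15^2)
# grams, pears (Y=0) weigh ~N(170, 20^2) grams; apples have prior 0.6.
def sample_fruit():
    y = 1 if random.random() < 0.6 else 0
    x = random.gauss(150, 15) if y == 1 else random.gauss(170, 20)
    return y, x

data = [sample_fruit() for _ in range(100_000)]
x_query = 160.0

# Naive Bayes' rule with exact equality: the event X = x never occurs.
exact_matches = [y for (y, x) in data if x == x_query]
print(len(exact_matches))  # 0, so the empirical estimate is 0/0

# Conditioning on a small interval [x, x+dx) instead gives a sensible answer.
dx = 1.0
in_interval = [y for (y, x) in data if x_query <= x < x_query + dx]
p_apple_given_x = sum(in_interval) / len(in_interval)
print(p_apple_given_x)  # Monte Carlo estimate of P(Y=1 | X ≈ 160)
```

Shrinking $dx$ (while increasing the sample size) makes the interval estimate converge to the density-based posterior derived below.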
The failure of this naive application of Bayes' rule doesn't imply that we can't use it. To make the correct use of Bayes' rule a bit more intuitive than just stating the result, we introduce the (admittedly sloppy) notation

.. math:: X\approx x

for the event $x\leq X \leq x+dx$ with probability $f_X(x)dx$ for $dx\rightarrow 0$, i.e. for infinitesimally small $dx$:

.. math:: \P(X\approx x) = \P(x\leq X \leq x+dx) = f_X(x)dx

(remember that probability density times interval length is a probability). With this definition we have:

.. math:: \P(Y=y\given X=x) = \frac{\P(X\approx x\given Y=y)\P(Y=y)}{\P(X\approx x)} = \frac{f_{X\given Y=y}(x)\, dx\, \P(Y=y)}{f_X(x)\, dx}

The $dx$ factors in numerator and denominator cancel out, and so:

.. math:: \P(Y=y\given X=x) = \frac{f_{X\given Y=y}(x) \P(Y=y)}{f_X(x)}

where:

- $\P(Y=y\given X=x)$: the **a posteriori probability** of class $y$ given the value $x$,
- $\P(Y=y)$: the **a priori probability** of class $y$,
- $f_{X\given Y=y}$: the **class conditional probability density** function for $X\given Y=y$,
- $f_X$: the **evidence**, i.e. the probability density for $X$.
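The final formula is straightforward to evaluate once a model is chosen. A minimal sketch for the fruit example, assuming (as an invented illustration) Gaussian class-conditional densities and priors of my own choosing; the evidence $f_X(x)$ is obtained via the law of total probability, $f_X(x) = \sum_y f_{X\given Y=y}(x)\P(Y=y)$:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical priors and class-conditional densities (illustration only):
# apples (y=1): X|Y=1 ~ N(150, 15^2), pears (y=0): X|Y=0 ~ N(170, 20^2).
priors = {1: 0.6, 0: 0.4}
class_pdf = {1: lambda x: gauss_pdf(x, 150, 15),
             0: lambda x: gauss_pdf(x, 170, 20)}

def posterior(y, x):
    """P(Y=y | X=x) = f_{X|Y=y}(x) P(Y=y) / f_X(x)."""
    # Evidence by the law of total probability over the classes.
    evidence = sum(class_pdf[k](x) * priors[k] for k in priors)
    return class_pdf[y](x) * priors[y] / evidence

x = 160.0
p_apple, p_pear = posterior(1, x), posterior(0, x)
print(p_apple, p_pear)
# Sanity check: the posteriors over all classes sum to 1.
assert abs(p_apple + p_pear - 1.0) < 1e-12
```

Note that the evidence acts purely as a normalizing constant: for classification one could compare $f_{X\given Y=y}(x)\P(Y=y)$ across classes without computing $f_X(x)$ at all.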