2.5.4. Expectation and Variance

The expectation \(\E(X)\) of a random variable is the value that is to be expected ‘on average’. That is, if we repeat the random experiment a large number of times, the average of all obtained values approaches the expectation.

Consider a simple die with 6 sides and assume it to be a fair die. Let \(X\) denote the number (from 1 to 6) thrown with the die. Then we have:

\[\begin{split}p_X(x) = \begin{cases}\tfrac{1}{6} &: 1\leq x \leq 6\\ 0 &: \text{elsewhere}\end{cases}\end{split}\]

Consider the experiment in which we throw the die 6,000,000 times. Intuitively we expect each of the possible outcomes to occur about 1,000,000 times, leading to the average:

\[\frac{1,000,000 \times 1 + \cdots + 1,000,000 \times 6}{6,000,000}\]

which is equal to

\[\frac{1}{6}\times 1 + \cdots + \frac{1}{6}\times 6\]

So we take the sum of \(x\,p_X(x)\) over all possible outcomes \(x\). That is exactly how the expectation is defined. Evidently the number of times we see each of the six possible values will not be exactly 1,000,000, but something close to it.
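This intuition is easily checked with a simulation. The following sketch (a minimal example assuming NumPy; the seed and the number of throws are arbitrary choices) averages a large number of simulated throws of a fair die:

```python
import numpy as np

# Simulate a large number of fair die throws and average the outcomes.
rng = np.random.default_rng(42)
throws = rng.integers(1, 7, size=6_000_000)  # outcomes 1..6, each with prob 1/6
print(throws.mean())  # close to 3.5, the expectation computed below
```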

Definition 2.5.1 (Expectation)

For a discrete RV we define the expectation as:

\[\E(X) = \sum_{x=-\infty}^{\infty} x\,p_X(x)\]

For a continuous RV the summation becomes an integral and thus the definition of the expectation becomes:

\[\E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx\]
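For the fair die this definition gives

\[\E(X) = \sum_{x=1}^{6} x\,\tfrac{1}{6} = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5\]

in agreement with the intuitive average computed above.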

Given a discrete random variable \(X\) with probability mass function \(p_X\), a new random variable can be constructed as \(Y=g(X)\): if we get a value \(x\) for \(X\), we calculate \(y=g(x)\) as a sample from the random variable \(Y=g(X)\). A simple example would be to take \(X\) to be the random variable representing the outcome of throwing a die and set \(Y=X^2\); if we throw a 6, the value for \(Y\) is 36.

For this new random variable \(Y\) it is simple to derive the probability mass function:

\[\begin{split}p_Y(y) = \begin{cases} \frac{1}{6} &: y\in\{1,4,9,16, 25, 36\} \\ 0 &: \text{otherwise} \end{cases}\end{split}\]

Using this expression for the probability mass function of \(Y\), the expectation of \(Y\) can be calculated. In other cases it is not so simple to come up with an expression for \(p_Y\), and then the following result comes in handy.

Theorem 2.5.2 (\(\E(g(X))\))

For a discrete random variable \(X\) and a function \(g: \setR\rightarrow\setR\) we have:

\[\E(g(X)) = \sum_{x=-\infty}^{\infty} g(x) p_X(x)\]

For a continuous random variable with probability density function \(f_X\) we have

\[\E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x) dx\]
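For the die example with \(g(x)=x^2\) the theorem gives

\[\E(X^2) = \sum_{x=1}^{6} x^2\,\tfrac{1}{6} = \frac{1+4+9+16+25+36}{6} = \frac{91}{6}\]

which is indeed the same value as \(\sum_y y\,p_Y(y)\) calculated with the probability mass function \(p_Y\) derived above.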

Using the above theorem we can prove an important property of the expectation: the scaling property. If we construct a new RV by multiplying the RV \(X\) by a constant \(a\) and adding a constant \(b\), we have:

Theorem 2.5.3 (Scaling Random Variable)
\[\E(a X + b) = a\E(X) + b\]
Proof

We give the proof for a continuous RV.

\[\begin{split}\E(aX+b) &= \int_{-\infty}^{\infty} (ax+b)\,f_X(x)\,dx\\ &= \int_{-\infty}^{\infty} a x\,f_X(x)\,dx + \int_{-\infty}^{\infty} b\,f_X(x)\,dx\\ &= a\,\int_{-\infty}^{\infty} x\,f_X(x)\,dx + b\,\int_{-\infty}^{\infty} f_X(x)\,dx\\ &= a \E(X) + b\end{split}\]

where the last step uses \(\int_{-\infty}^{\infty} f_X(x)\,dx = 1\).

The expectation of a random variable \(X\) tells us what value to expect on average. It does not say anything about the spread of individual values from the random variable.

The (population) variance is a way to quantify the spread around the mean and is defined as:

Definition 2.5.4 (Variance)

The (population) variance \(\Var(X)\) is defined as:

\[\Var(X) = \E( (X-\E(X))^2 )\]

it equals the expected squared deviation from the mean of \(X\).
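For the fair die, using Theorem 2.5.2 with \(g(x)=(x-\E(X))^2\) and \(\E(X)=3.5\), this gives

\[\Var(X) = \sum_{x=1}^{6} (x-3.5)^2\,\tfrac{1}{6} = \frac{2\,(2.5^2+1.5^2+0.5^2)}{6} = \frac{35}{12} \approx 2.92\]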

Note that the variance is expressed in the square of the unit in which the random variable is expressed (e.g. if \(X\) is expressed in meters (m) then \(\Var(X)\) is expressed in meters squared (m\(^2\))). Taking the square root of the variance leads to the standard deviation, which is expressed in the same units as the random variable and will be denoted as \(\std(X)\).

Definition 2.5.5 (Standard Deviation)

The standard deviation is the square root of the variance:

\[\std(X) = \sqrt{\Var(X)}\]
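For the fair die this gives \(\std(X) = \sqrt{35/12} \approx 1.71\), expressed in the same units as \(X\) itself.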

For the variance we have the following scaling property:

Theorem 2.5.6 (Scaling Random Variables)
\[\Var(aX+b) = a^2 \Var(X)\]

Note that adding a constant \(b\) does not change the variance and that the scaling factor \(a\) appears squared in the variance. We leave the proof as an exercise.
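Both scaling properties are easy to verify numerically. The following sketch (assuming NumPy; the sample size, seed and the values \(a=2\), \(b=5\) are arbitrary choices) compares the sample mean and variance of \(aX+b\) with the theoretical values for the fair die:

```python
import numpy as np

# Monte Carlo check of E(aX+b) = a E(X) + b and Var(aX+b) = a^2 Var(X)
# for the fair die.
rng = np.random.default_rng(42)
x = rng.integers(1, 7, size=1_000_000)  # samples of the die RV X
a, b = 2.0, 5.0
y = a * x + b

print(x.mean(), x.var())  # approx 3.5 and 35/12 ≈ 2.917
print(y.mean(), y.var())  # approx 2*3.5+5 = 12.0 and 4*35/12 ≈ 11.67
```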