2.11. Random Vectors

We have looked at multiple random variables before: for random variables \(X_1,X_2,\ldots,X_n\) we defined the joint distribution function \(F_{X_1,X_2,\ldots,X_n}\) as

\[F_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n) = P(X_1\leq x_1 \cap \cdots \cap X_n\leq x_n)\]

In case all random variables are continuous, the joint probability density function is defined as

\[f_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n) = \frac{\partial^n F_{X_1,X_2,\ldots,X_n}}{\partial x_1\cdots\partial x_n} (x_1,x_2,\ldots,x_n)\]
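
As a small worked example (chosen here purely for illustration, not taken from earlier sections), consider the bivariate distribution function

\[F_{X_1,X_2}(x_1,x_2) = \left(1-e^{-x_1}\right)\left(1-e^{-x_2}\right),\qquad x_1,x_2\geq 0\]

Differentiating once with respect to each variable gives the joint probability density function:

\[f_{X_1,X_2}(x_1,x_2) = \frac{\partial^2 F_{X_1,X_2}}{\partial x_1\,\partial x_2}(x_1,x_2) = e^{-x_1}\, e^{-x_2}\]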

To calculate the probability of the event that the values of the random variables fall within a subset \(A\subset\setR^n\), we evaluate the multivariate integral:

\[P( (X_1,\ldots,X_n)\in A ) = \int\cdots\int_{A} f_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n) dx_1\cdots dx_n\]

Note that \(dx_1dx_2\cdots dx_n\) is an infinitesimally small hypervolume element. Multiplying the probability density by this volume element yields a probability; adding these probabilities for all volume elements in the set \(A\) gives the desired probability.
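
A minimal numerical sketch of this sum-of-volume-elements interpretation (Python with NumPy/SciPy; the bivariate standard normal density and the set \(A=[0,1]\times[0,1]\) are assumptions made purely for illustration):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Joint density of two independent standard normal random variables
# (chosen only as a convenient, fully known example).
f = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)).pdf

# Approximate P((X1, X2) in A) for A = [0, 1] x [0, 1] by summing
# f(x1, x2) * dx1 * dx2 over a grid of small volume elements.
dx = 0.01
mids = np.arange(0.0, 1.0, dx) + dx / 2          # midpoints of the cells
x1, x2 = np.meshgrid(mids, mids)
grid = np.column_stack((x1.ravel(), x2.ravel()))
approx = f(grid).sum() * dx * dx

# Exact value for this particular density: (Phi(1) - Phi(0))^2.
exact = (norm.cdf(1.0) - norm.cdf(0.0)) ** 2

print(approx, exact)   # both approximately 0.1165
```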

Very often there is a need to characterize all random variables with one entity; think of situations, common in practice, where hundreds or even thousands of random variables are involved. For this purpose the random vector is introduced:

\[\begin{split}\v X = \matvec{c}{X_1\\ X_2\\ \vdots\\ X_n}\end{split}\]

A random vector is a vector whose elements are random variables. With this notation the distribution function can be abbreviated as \(F_{\v X}\) and the probability density function as \(f_{\v X}\). Using the vectorial notation, the probability of the event \(\v X\in A\) equals:

\[P( \v X \in A) = \int_A f_{\v X}(\v x) d\v x\]
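
To make the notation concrete, a small sketch (Python with NumPy; the choice of \(n=4\) standard normal elements is an assumption for illustration only): each realization of a random vector is an ordinary vector in \(\setR^n\).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A random vector X with n = 4 elements; here each element is taken
# to be a standard normal random variable (illustration only).
n = 4
x = rng.standard_normal(n)

# A single draw of X is an ordinary vector in R^n.
print(x.shape)   # (4,)
print(x)         # one realization of the random vector
```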

For discrete random variables the distribution function is defined in the same way, but instead of a probability density function we define the probability mass function:

\[p_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n) = P(X_1=x_1\cap \cdots \cap X_n=x_n)\]

or equivalently in vectorial notation:

\[p_{\v X}(\v x) = P(\v X =\v x)\]

The probability of \(\v X\in A\) for a discrete random vector is then a sum:

\[P(\v X\in A) = \sum_{\v x \in A} p_{\v X}(\v x)\]
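
As a sketch of such a sum (Python; the two fair dice and the event that they sum to 7 are illustrative assumptions, not part of the text above):

```python
from fractions import Fraction

# Joint pmf of two fair dice: p(x1, x2) = 1/36 for each of the 36 outcomes.
def p(x1, x2):
    return Fraction(1, 36)

# The event A: the two dice together sum to 7.
A = [(x1, x2) for x1 in range(1, 7) for x2 in range(1, 7) if x1 + x2 == 7]

# P(X in A) is the sum of the pmf over all points in A.
prob = sum(p(x1, x2) for (x1, x2) in A)
print(prob)   # 1/6
```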

and for a continuous random vector we get an integral:

\[P(\v X\in A) = \int_{\v x \in A} f_{\v X}(\v x) d\v x.\]

Note that \(f_{\v X}(\v x)\) is a probability density that, multiplied by the infinitesimal volume \(d\v x\), yields a probability. Thus we are summing probabilities, and in the limit of infinitesimally small intervals (\(dx\rightarrow0\)) we are integrating a probability density.
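
The limit \(dx\rightarrow0\) can be made visible numerically. A one-dimensional sketch (Python with NumPy; the exponential density and the interval \([0.5, 2]\) are assumptions chosen for illustration): the grid sums approach the exact probability as the cells shrink.

```python
import numpy as np

# Density of an exponential random variable with rate 1 (illustration only).
def f(x):
    return np.exp(-x)

# Approximate P(0.5 <= X <= 2) by summing f(x) * dx over ever finer grids.
a, b = 0.5, 2.0
for dx in (0.1, 0.01, 0.001):
    x = np.arange(a, b, dx) + dx / 2      # midpoints of the cells
    print(dx, (f(x) * dx).sum())

# Exact value of the integral: exp(-0.5) - exp(-2), approximately 0.4712.
print(np.exp(-a) - np.exp(-b))
```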

It is possible to define random vectors with a mixture of discrete and continuous random variables. In this lecture series we stick to either purely discrete or purely continuous random vectors.

Consider two random vectors \(\v X\) and \(\v Y\). We define these two random vectors to be independent in case their joint density factors into the product of the density of \(\v X\) and the density of \(\v Y\). This implies, but is stronger than, each element \(X_i\) being independent of each element \(Y_j\). Note that the elements of \(\v X\) among themselves might well be dependent! In a formula this notion of independence reads:

\[f_{X_1,X_2,\ldots,X_n,Y_1,Y_2,\ldots,Y_m}(x_1,x_2,\ldots,x_n,y_1,y_2,\ldots,y_m) = f_{X_1,X_2,\ldots,X_n}(x_1,x_2,\ldots,x_n)\, f_{Y_1,Y_2,\ldots,Y_m}(y_1,y_2,\ldots,y_m)\]

This formula is again a clear demonstration of the power of using vectors (linear algebra) to describe multivariate statistics: in vectorial notation we can simply write:

\[f_{\v X, \v Y}(\v x, \v y) = f_{\v X}(\v x)\, f_{\v Y}(\v y)\]
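
A minimal numerical sketch of this factorization, for the simplest case \(n=m=1\) (Python with SciPy; independent standard normals are assumed purely for illustration):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Joint density of (X, Y) for X and Y independent standard normals,
# represented as a bivariate normal with identity covariance matrix.
f_XY = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)).pdf

# At any point the joint density equals the product of the marginals.
x, y = 0.3, -1.2
print(f_XY([x, y]))                # approximately 0.0741
print(norm.pdf(x) * norm.pdf(y))   # the same value
```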