Probability and Statistics

What is the probability of throwing a 6 with a fair die? Almost everyone will immediately answer: \(1/6\). But what do we mean by that? Two obvious lines of reasoning that arrive at this answer are:

  • If we throw a die there are six possible outcomes. Because it is a fair die, every outcome is equally probable, and the probabilities of all possible outcomes add up to one. So the probability of throwing a 6 equals \(1/6\).

  • Let’s repeat the experiment a large number of times, say \(N\). After throwing the die \(N\) times we estimate the probability as:

    \[\text{Probability(throwing 6)} = \frac{\text{\#\{throwing 6\}}}{N}\]

    This estimate will approach the true probability if we let \(N\rightarrow\infty\).
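The frequentist estimate above is easy to try out in a simulation. The following is a minimal sketch in plain Python (the choice \(N = 100{,}000\) and the fixed seed are our own, for illustration only): it throws a virtual fair die \(N\) times and reports the relative frequency of sixes, which should come out close to \(1/6 \approx 0.167\).

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

N = 100_000  # number of throws (arbitrary illustrative choice)

# Simulate N throws of a fair die: each outcome 1..6 is equally likely
throws = [random.randint(1, 6) for _ in range(N)]

# Frequentist estimate: relative frequency of the outcome 6
estimate = throws.count(6) / N
print(estimate)  # close to 1/6 for large N
```

Increasing \(N\) makes the estimate cluster ever more tightly around \(1/6\), which is exactly the limit \(N \rightarrow \infty\) referred to above.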

These lines of reasoning are part of what is called the frequentist approach to probability. We will follow this route quite often in this course.

But now consider the question: what is the probability that it will rain tomorrow? Neither of the two lines of reasoning above helps us here. Tomorrow it will either rain or not, but these two outcomes are clearly not equally probable. And if we repeat the experiment \(N\) times (say we take \(N = 10\times365\), so that we have data over 10 years) and use the frequentist approach, we undoubtedly end up with the probability that it will rain on a randomly chosen day in the Netherlands. That is certainly not the probability we were looking for. So when the weather forecaster predicts a rain probability of 70% for tomorrow, this more or less reflects his belief, based on his data and his models.

Fortunately, the precise interpretation of probability is less important here. The classical mathematical rules of probability and statistics that we will look at in this course are largely independent of the interpretation. Loosely speaking we might say that probability is the mathematical language of choice when dealing with uncertainty.

Probability theory deals with random experiments and random processes: experiments and processes that are not deterministic, so it is simply not possible to know in advance exactly what the result will be. In probability theory we assume that for each of the possible outcomes of a random experiment we know the corresponding probability.

figures/kaasboor.jpg

Fig. 3 The Dutch term “steekproef” for a statistical sample comes from the way Dutch cheese is tested at the traditional cheese market in Alkmaar. A cylindrical sample is bored out of a cheese to assess its quality.

In statistics we enter the realm of everyday life. We may assume that there exists a function that assigns a probability to each possible outcome of a random experiment, but alas, all we have are observations (a sample, or ‘steekproef’) of the random experiment. Statistics then deals (among other things) with the question: what can be known about the underlying random experiment using only the observations in the sample? Because the sample consists of random numbers, the conclusions that statistics draws about the random experiment are in a sense random as well. We can never be completely sure about our (numerical) conclusions. A lot of statistics deals with this important question: can we quantify the probability that we arrive at the right (or wrong) conclusions?
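To get a feeling for this randomness of statistical conclusions, we can repeat a sampling experiment many times. The sketch below (plain Python; the sample size of 100 throws and the 1000 repetitions are arbitrary choices for illustration) estimates the probability of throwing a 6 from a small sample of a fair die, and repeats this for many independent samples. The spread of the resulting estimates shows how much the conclusion from a single sample can vary.

```python
import random

random.seed(1)  # fixed seed for reproducibility

def estimate_p6(n):
    """Estimate P(throwing 6) from a sample of n throws of a fair die."""
    throws = [random.randint(1, 6) for _ in range(n)]
    return throws.count(6) / n

n = 100          # sample size per experiment (arbitrary choice)
repeats = 1000   # number of independent samples

# Each sample yields a (slightly) different estimate of the same probability
estimates = [estimate_p6(n) for _ in range(repeats)]

mean_est = sum(estimates) / repeats
# Standard deviation of the estimates: a measure of sampling uncertainty
spread = (sum((e - mean_est) ** 2 for e in estimates) / repeats) ** 0.5
print(mean_est, spread)
```

The estimates cluster around the true value \(1/6\), but any single sample of 100 throws can easily be off by a few percentage points: this is the uncertainty that statistics tries to quantify.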