Probability Spaces and Random Variables#
Part of a series: Stochastic fundamentals.
Probability Spaces#
We start with the introduction of a probability space and a random variable. We will not go into too much detail here, nor will we prove any technical results from measure theory. Still, we give the formal definitions and briefly discuss why this definition is actually the most appropriate one for the analysis of probabilities.
Definition#
A probability space is a triplet \((\Omega, \Sigma, P)\) consisting of the following:
A Sample Space \(\Omega\) which is the set of all possible outcomes,
A set of events \(\Sigma\) where each event is a set containing zero or more outcomes, and
A probability measure, \(P : \Sigma \longrightarrow [0,1]\)
The three elements must satisfy some axioms, which we collect below after a short example and a discussion.
Example#
Maybe the simplest stochastic experiment is coin tossing. The coin can show heads (\(H\)) or tails (\(T\)), hence the sample space consists of just two elements, \(\Omega = \{ H,T \}\). We can now define events corresponding to all subsets of \(\Omega\):
\(\{ H \}\): The coin shows heads.
\(\{ T \}\): The coin shows tails.
\(\{ \}\): The coin shows neither heads nor tails.
\(\{ H,T \}\): The coin shows either heads or tails.
If our coin is fair, the probability measure satisfies: \(\begin{aligned} P\left( \{ H \} \right) = 0.5, \quad P\left( \{ T \} \right) = 0.5, \quad P\left( \{ \} \right) = 0, \quad P\left( \{ H,T \} \right) = 1. \end{aligned}\)
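The coin-toss example can be written out as a small program. The following is a minimal sketch, not a library implementation; all names are illustrative. Events are modelled as `frozenset`s, and the measure \(P\) extends additively from outcomes to events.

```python
# Sketch of the coin-toss probability space (Omega, Sigma, P).
# Events are frozensets of outcomes; names are illustrative.
omega = frozenset({"H", "T"})

# Sigma: all subsets of omega (the power set is fine for a finite sample space).
sigma = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]

# P: each outcome gets probability 0.5; events are summed additively.
def P(event):
    return sum(0.5 for outcome in event)

print(P(frozenset({"H"})))  # 0.5
print(P(omega))             # 1.0
print(P(frozenset()))       # 0 (the empty sum)
```

Note that additivity over disjoint outcomes is built in: the probability of \(\{H, T\}\) is automatically the sum of the probabilities of \(\{H\}\) and \(\{T\}\).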
Why so complicated?#
At this point, one might wonder why the definition is so complicated, in fact why we distinguish between outcomes and events. Let us mention three reasons:
The first reason is formal: This approach allows us to axiomatize probability theory and make it compatible with modern measure theory.
The example above shows that it is more convenient to assign probabilities to events (\(\Sigma\)) than to outcomes \((\Omega)\). The event “either heads or tails” is a perfectly valid event and should be assigned a probability right from the start.
The distinction becomes especially important if the set of outcomes is a continuous, i.e. uncountable, set. For instance, choose an interval on the real line \(\Omega = [0, \ell]\) and assume that all elements of \(\Omega\) are “equally probable”. The probability that exactly one outcome \(x \in \Omega\) is realized, is always zero, \(P\left( \{ x \} \right) = 0\). But we can assign probabilities to the events of the form “\(x\) lies between \(a\) and \(b\)”, for which we get \(\begin{aligned} P(a \le x \le b) = \frac{b-a}{\ell},\end{aligned}\) given \(0 \le a \le b \le \ell\).
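The interval formula above is easy to check numerically. Here is a hedged sketch (the function name is illustrative) of the event probability \(P(a \le x \le b) = (b-a)/\ell\) for a uniform distribution on \([0, \ell]\):

```python
# Sketch: probability of the event "x lies between a and b" for a
# uniform distribution on [0, ell]; the function name is illustrative.
def uniform_interval_prob(a, b, ell):
    assert 0 <= a <= b <= ell, "need 0 <= a <= b <= ell"
    return (b - a) / ell

# A single outcome (a = b) has probability zero ...
print(uniform_interval_prob(0.3, 0.3, 1.0))   # 0.0
# ... but intervals of positive length get positive probability.
print(uniform_interval_prob(0.25, 0.75, 1.0))
```

This illustrates the point of the section: individual outcomes carry zero probability, yet events (intervals) carry the probabilities we actually care about.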
The previous example is very simple, but it already shows that one should not try to assign probabilities to outcomes directly but rather to events. One might now try to assign probabilities to all possible sets of outcomes, such that \(\Sigma\) just becomes the power set of \(\Omega\). While this is perfectly adequate for countable sets \(\Omega\), it may cause highly paradoxical results for uncountable sets (e.g. the Banach-Tarski paradox). Hence, we have to be more careful and introduce the concept of a \(\sigma\)-algebra now.
Axioms#
The elements of the probability space \((\Omega, \Sigma, P)\) must satisfy the following axioms:
\(\Omega \ne \emptyset\),
\(\Sigma\) is a \(\sigma\)-algebra over the set \(\Omega\), meaning
\(\Omega \in \Sigma\)
\(A \in \Sigma \ \Rightarrow \ (\Omega \setminus A) \in \Sigma\)
\(A_1, A_2, \ldots \in \Sigma \Rightarrow \bigcup_{n\in \mathbb{N}} A_n \in \Sigma\) (i.e., all countable unions are in \(\Sigma\))
Using closure under complements and countable unions together with De Morgan's laws, one can prove that all countable intersections are also in \(\Sigma\).
The probability measure \(P\) must satisfy:
\(P(\Omega) = 1\)
\(P\) is countably additive, i.e., \(P(A_1 \cup A_2 \cup \ldots ) = P(A_1) + P(A_2) + \ldots\) for pairwise disjoint sets \(A_1, A_2, \ldots\) .
We note the following aspects of the definition of a \(\sigma\)-algebra. The smallest possible \(\sigma\)-algebra over \(\Omega\) is just \(\{ \emptyset, \Omega \}\); the largest one is the power set of \(\Omega\). As discussed above, serious problems arise if \(\Omega\) is continuous and we choose the power set. Hence, we must use a smaller \(\sigma\)-algebra, and we typically go for the Borel sets, which form the smallest \(\sigma\)-algebra containing all open subsets.
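For a finite \(\Omega\), the \(\sigma\)-algebra axioms reduce to finite closure conditions and can be checked mechanically. The following sketch (the function name is illustrative) verifies the three axioms for a candidate collection of events; for finite sets, closure under pairwise unions already implies closure under all finite unions.

```python
# Sketch: check the sigma-algebra axioms over a *finite* Omega.
# For finite sets, countable unions reduce to finite (pairwise) unions.
def is_sigma_algebra(omega, sigma):
    omega = frozenset(omega)
    sigma = {frozenset(s) for s in sigma}
    if omega not in sigma:                             # axiom: Omega in Sigma
        return False
    if any(omega - a not in sigma for a in sigma):     # axiom: complements
        return False
    return all(a | b in sigma for a in sigma for b in sigma)  # axiom: unions

coin = {"H", "T"}
print(is_sigma_algebra(coin, [set(), {"H"}, {"T"}, coin]))  # power set: True
print(is_sigma_algebra(coin, [set(), {"H"}, coin]))         # {T} missing: False
print(is_sigma_algebra(coin, [set(), coin]))                # smallest one: True
```

The last call confirms the remark above: \(\{\emptyset, \Omega\}\) is itself a valid (indeed the smallest) \(\sigma\)-algebra.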
Random Variables#
The basic definitions of probability theory are about sets, but often it is easier to deal with numbers than with sets. For instance, when rolling two dice, we are typically not interested in the probabilities of all events but rather in the number of points we get in total. Hence, we will now introduce the concept of a random variable.
Definition#
A random variable \(X\) is a measurable function defined from the set of outcomes \(\Omega\) to a measurable space \(E\), which we typically choose as the set of real numbers \(\mathbb{R}\). \(X : \Omega \longrightarrow E\)
If the range of \(X\) is finite or countably infinite, we call \(X\) a discrete random variable. If the range of \(X\) is uncountably infinite, we call \(X\) a continuous random variable.
We can assign probabilities to random variables in the following way: \(\begin{split} P(X=x) &:= P(\{\omega \in \Omega : X(\omega) = x\}) \\ P(X \leq x) &:= P(\{\omega \in \Omega : X(\omega) \leq x\}) \end{split}\)
Example: Dice tossing#
As a simple example, we ask the question: What is the probability of getting \(k\) sixes if we toss a die \(n\) times?
For \(n=2\) we could now fix the set of outcomes as
Both tosses yield a value less than \(6\).
The first toss yields a \(6\), the second less than \(6\).
The first toss yields a value less than six, the second yields a \(6\).
Both tosses yield a value of \(6\).
In shorthand notation, we might write this as \(\Omega = \{<<, 6<, <6, 66\}.\) We can then introduce a random variable \(X\) to count the number of sixes by defining the mapping
\(\begin{aligned} X: << &\mapsto 0 \\ 6< &\mapsto 1 \\ <6 &\mapsto 1 \\ 66 &\mapsto 2. \end{aligned}\)
If \(p\) is the probability to get a six in one toss (\(p=1/6\) if the die is fair), then
\(\begin{aligned} P(X=0) &= (1-p)^2 \\ P(X=1) &= p(1-p) + (1-p)p \\ P(X=2) &= p^2. \end{aligned}\)
In general, we find for \(n\) tosses that \(\begin{aligned} P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \end{aligned}\) which is called the binomial distribution.
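The binomial formula can be cross-checked by brute force: enumerate all \(6^n\) sequences of die faces and count those with exactly \(k\) sixes. The sketch below does this for a fair die; the function names are illustrative.

```python
from math import comb
from itertools import product

# The binomial formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k).
def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Brute force: enumerate all sequences of n fair-die tosses and count
# those containing exactly k sixes.
def brute_force(n, k):
    hits = sum(1 for seq in product(range(1, 7), repeat=n)
               if seq.count(6) == k)
    return hits / 6**n

n, k = 3, 1
print(binom_pmf(n, k, 1/6))  # C(3,1) * (1/6) * (5/6)^2 = 75/216
print(brute_force(n, k))     # agrees with the formula
```

The factor \(\binom{n}{k}\) is visible in the enumeration: it counts the positions within the sequence at which the \(k\) sixes can occur.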
Example: Indicator variables#
Indicator functions are random variables which characterize the occurrence of an event. They can be quite helpful, because they allow us to switch between the language of random variables and the language of sets and use whichever is more appropriate for a given problem.
So let \(A \in \Sigma\) be an event, then the indicator function associated with the set \(A\) is defined as
\(\begin{aligned} 1_A: \; & \Omega \longrightarrow \mathbb{R} \\ & \omega \in \Omega \mapsto 1_A(\omega) := \left\{ \begin{array}{ l l } 1 \; & \mbox{if} \; \omega \in A, \\ 0 & \mbox{otherwise}. \end{array} \right.\end{aligned}\)
We list some useful properties of indicator functions:
\(\begin{aligned} 1_{A \cap B}(\omega) &= 1_A(\omega) \times 1_B(\omega) \\ 1_A(\omega) &= 1_A(\omega)^2 \\ 1_{\overline A}(\omega) &= 1 - 1_A(\omega), \end{aligned}\)
where \(A\) and \(B\) are events and \(\overline A = \Omega \setminus A\).
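The three identities can be verified exhaustively on a small finite sample space. This sketch checks them for every \(\omega \in \Omega\); all names are illustrative.

```python
# Sketch: verify the indicator-function identities on a finite Omega.
omega = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def indicator(S):
    """Return the indicator function 1_S of the set S."""
    return lambda w: 1 if w in S else 0

one_A, one_B = indicator(A), indicator(B)
one_AB = indicator(A & B)           # indicator of the intersection
one_not_A = indicator(omega - A)    # indicator of the complement

for w in omega:
    assert one_AB(w) == one_A(w) * one_B(w)   # 1_{A ∩ B} = 1_A * 1_B
    assert one_A(w) == one_A(w) ** 2          # 1_A = (1_A)^2
    assert one_not_A(w) == 1 - one_A(w)       # 1_{Ω\A} = 1 - 1_A
print("all identities hold")
```

The second identity reflects that an indicator only takes the values \(0\) and \(1\), both of which are fixed points of squaring.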