Stochastic Independence


Part of a series: Stochastic fundamentals.


Let \((\Omega,\Sigma,P)\) be a probability space and the events \(A,B\in \Sigma\).

We defined the conditional probability \(P(A|B)\) as the probability that event \(A\) occurs given that \(B\) has already occurred. If the events \(A\) and \(B\) are independent, the probability of \(A\) should not depend on the occurrence of \(B\): \(\begin{aligned} \textrm{Independence} \longrightarrow P(A|B)=P(A) \end{aligned}\) This is the starting point of the definition of stochastic independence:

Def. The events \(A\) and \(B\) are called stochastically independent if and only if \(\begin{aligned} P(A\cap B)= P(A)\cdot P(B)\end{aligned}\)
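
The step from the conditional-probability condition to the product formula can be written out as follows, assuming \(P(B)>0\) so that \(P(A|B)\) is defined:

\[\begin{split} P(A|B) = P(A) &\iff \frac{P(A\cap B)}{P(B)} = P(A) \\ &\iff P(A\cap B) = P(A)\cdot P(B) \end{split}\]

The product form is symmetric in \(A\) and \(B\) and remains meaningful when \(P(B)=0\), which is one reason to use it as the definition.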


  1. Independence is a stochastic feature. It does not exclude influences between two events \(A, B\) in a real-world experiment.
    Example: Roll a fair die twice and define the events:
    \(A\): The sum of the points is odd.
    \(B\): The first roll yields an even number of points.
    Obviously the result of the first roll has some influence on \(A\). Nevertheless \(P(A)=\frac{1}{2}\) and \(P(B)=\frac{1}{2}\), and \(A\cap B\) occurs exactly when the first roll is even and the second roll is odd, so \(P(A\cap B)=\frac{9}{36}=\frac{1}{4}=P(A)P(B)\), i.e. \(A\) and \(B\) are stochastically independent.

  2. Let \(A,B,C\) be three events. Then \(\begin{aligned} P(A\cap B\cap C)=P(A)P(B)P(C) \end{aligned}\) alone does not imply stochastic independence of \(A\), \(B\) and \(C\): the product rule must also hold for every pair of events.
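
The two-roll example from remark 1 can be checked by enumerating all 36 outcomes. The following is a minimal sketch; the helper names `prob`, `is_a` and `is_b` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two rolls of a fair die.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for w in omega if event(w))
    return Fraction(favourable, len(omega))

def is_a(w):
    # A: the sum of the points is odd.
    return (w[0] + w[1]) % 2 == 1

def is_b(w):
    # B: the first roll yields an even number of points.
    return w[0] % 2 == 0

p_a = prob(is_a)
p_b = prob(is_b)
p_ab = prob(lambda w: is_a(w) and is_b(w))

# Product rule for stochastic independence.
print(p_a, p_b, p_ab, p_ab == p_a * p_b)
```

Exact fractions are used instead of floating-point numbers so that the equality test is not affected by rounding.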

Def. Let \(A_1, A_2, \ldots, A_n \in \Sigma\) be events. They are called stochastically independent (with respect to \(P\)) if

\(P\left( \bigcap _{j \in T} A_j \right) = \prod_{j \in T} P(A_j)\) for every subset \(T \subseteq \{1,2, \ldots, n\}\)
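
The subset condition can be tested mechanically on a finite sample space. The sketch below assumes a uniform distribution on eight points; the function `are_independent` and the concrete events are illustrative choices, not part of the definition. The first triple satisfies the product rule for all three events together but fails it for a pair, so it is not independent.

```python
from fractions import Fraction
from itertools import combinations

# Assumed sample space: eight equally likely points.
omega = set(range(1, 9))

def prob(event):
    return Fraction(len(event & omega), len(omega))

def are_independent(events):
    """Check the product rule for every subset T with at least two events."""
    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            intersection = set.intersection(*subset)
            expected = Fraction(1)
            for event in subset:
                expected *= prob(event)
            if prob(intersection) != expected:
                return False
    return True

# Triple product rule holds, but P(A ∩ B) = 3/8 differs from 1/4.
a, b, c = {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 6, 7, 8}
triple_rule_holds = prob(a & b & c) == prob(a) * prob(b) * prob(c)

# A triple for which every subset satisfies the product rule.
d, e, f = {1, 2, 3, 4}, {1, 2, 5, 6}, {1, 3, 5, 7}

print(triple_rule_holds, are_independent([a, b, c]), are_independent([d, e, f]))
```

Subsets with zero or one element are skipped because the product rule holds trivially for them.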

There is a generalization of independence in terms of conditional probabilities. Let \(A,B,C\) be events. Suppose we first measure \(C\), which gives us some knowledge about \(A\). Then we measure \(B\) in addition to \(C\). If we do not learn anything more about \(A\) by measuring \(B\), then \(A\) and \(B\) are conditionally independent given \(C\). Below we present the formal definition.

Def. The events \(A,B\) are conditionally independent given \(C\) iff \(P(A|B\cap C) = P(A|C)\), assuming \(P(B\cap C)>0\). Loosely speaking, the realization of the event \(C\) restricts us to a subset of \(\Omega\) on which \(A\) and \(B\) are independent.

The definition of conditional independence reduces to stochastic independence by setting \(C=\Omega\).
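
A small constructed example may help here. Assume a coin is chosen at random, either fair or biased with heads probability \(9/10\) (both values are assumptions made for this sketch), and is then flipped twice. Let \(A\) be "first flip is heads", \(B\) "second flip is heads" and \(C\) "the biased coin was chosen". Given \(C\) the two flips are independent, but without knowing the coin they are not.

```python
from fractions import Fraction
from itertools import product

# Assumed heads probabilities of the two coins; each coin is picked with probability 1/2.
heads_prob = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}

# Probability weight of every outcome (coin, first flip, second flip).
weights = {}
for coin, first, second in product(heads_prob, "HT", "HT"):
    weight = Fraction(1, 2)
    for flip in (first, second):
        weight *= heads_prob[coin] if flip == "H" else 1 - heads_prob[coin]
    weights[(coin, first, second)] = weight

def prob(event):
    return sum((w for outcome, w in weights.items() if event(outcome)), Fraction(0))

def cond_prob(event, given):
    return prob(lambda o: event(o) and given(o)) / prob(given)

def is_a(o):
    return o[1] == "H"       # first flip is heads

def is_b(o):
    return o[2] == "H"       # second flip is heads

def is_c(o):
    return o[0] == "biased"  # the biased coin was chosen

p_a_given_bc = cond_prob(is_a, lambda o: is_b(o) and is_c(o))
p_a_given_c = cond_prob(is_a, is_c)
p_ab = prob(lambda o: is_a(o) and is_b(o))

print(p_a_given_bc == p_a_given_c)      # conditional independence given C
print(p_ab == prob(is_a) * prob(is_b))  # unconditional independence
```

Seeing heads on the first flip makes the biased coin more likely, which is why \(A\) and \(B\) are dependent as long as the coin is unknown.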

For random variables

Let \(X_1, X_2, \ldots, X_n\) be real random variables, i.e. \(X_j : \Omega \longrightarrow \mathbb{R}\). In order to define stochastic independence in terms of random variables, we first need some definitions:

Def. The joint CDF of the random variables \(X_1, X_2, \ldots, X_n\) is defined as: \(F_{X_1, X_2, \ldots, X_n} (x_1, x_2, \ldots, x_n) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n)\)

Def. The joint PDF of \(X_1, X_2, \ldots, X_n\), if it exists, is given by,

\(f_{X_1, X_2, \ldots, X_n} (x_1, x_2, \ldots, x_n) = \frac{\partial^n F_{X_1, X_2, \ldots, X_n}} {\partial x_1 \, \partial x_2 \ldots \partial x_n} \Biggm\lvert _{(x_1, \ldots, x_n)}\)

Def. The marginal PDF of one of the random variables \(X_k\) is obtained by integrating the joint PDF over all other variables: \(f_{X_k} (x_k) = \int_{\mathbb{R}^{n-1}} f_{X_1, \ldots, X_n} (x_1, x_2, \ldots, x_n) \ dx_1 dx_2 \ldots dx_{k-1} dx_{k+1} \ldots dx_n\)

The definition of independence directly carries over to random variables:

Def. A set of discrete random variables is called stochastically independent if the probability mass functions satisfy

\(P\left(\bigcap_{j\in T} \{X_j = x_j\}\right) = \prod_{j\in T} P(X_j = x_j) \quad \forall T \subseteq \{1,2,\ldots, n\}\)
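
For two discrete random variables the only non-trivial subset is \(T=\{1,2\}\), so the condition says that the joint PMF equals the product of its marginals. The sketch below checks this on two small joint tables; the tables and the helper names `marginals` and `pmf_factorizes` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Marginal PMFs of X and Y from a joint PMF given as {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, Fraction(0)) + p
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_x, p_y

def pmf_factorizes(joint):
    """True if P(X=x, Y=y) = P(X=x) P(Y=y) for all pairs of values."""
    p_x, p_y = marginals(joint)
    return all(
        joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
        for x, y in product(p_x, p_y)
    )

# Joint PMF built as a product of two assumed marginals: independent by construction.
px = {0: Fraction(1, 4), 1: Fraction(3, 4)}
py = {0: Fraction(1, 3), 1: Fraction(2, 3)}
independent_joint = {(x, y): px[x] * py[y] for x in px for y in py}

# All mass on the diagonal: Y always equals X, so the variables are dependent.
dependent_joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(pmf_factorizes(independent_joint), pmf_factorizes(dependent_joint))
```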

Def. A set of continuous random variables is called stochastically independent if the CDFs satisfy: \(\begin{split} F_{\{X_j\}_{j\in T}} (\{x_j\}) &= P\left( \bigcap_{j\in T} \{X_j \leq x_j\} \right) \\ &= \prod_{j\in T} F_{X_j}(x_j) \quad \forall T \subseteq \{1,2,\ldots,n\} \end{split}\)

or, equivalently, if the PDFs satisfy: \(f_{\{X_j\}_{j\in T}} (\{x_j\}) = \prod_{j\in T} f_{X_j}(x_j) \quad \forall T \subseteq \{1,2,\ldots,n\}\)
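
As a worked example of the marginal PDF and the factorization criterion, consider the joint PDF \(f_{X,Y}(x,y) = x + y\) on the unit square \([0,1]^2\) (and zero elsewhere). This density is chosen here purely for illustration. The marginal PDF of \(X\) is

\[ f_X(x) = \int_0^1 (x+y) \ dy = x + \frac{1}{2}, \qquad 0 \leq x \leq 1, \]

and by symmetry \(f_Y(y) = y + \frac{1}{2}\). The product of the marginals is

\[ f_X(x) f_Y(y) = xy + \frac{x}{2} + \frac{y}{2} + \frac{1}{4} \neq x + y, \]

for instance at \((x,y)=(0,0)\), where the product equals \(\frac{1}{4}\) while the joint PDF is \(0\). Hence \(X\) and \(Y\) are not stochastically independent.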

Theorem (Multiplication rule) Let \(X,Y\) be stochastically independent random variables with finite expectations. Then \(E(XY) = E(X)\cdot E(Y)\)

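Proof. Assuming \(X\) and \(Y\) have a joint PDF, independence gives \(f_{X,Y}(x,y) = f_X(x) f_Y(y)\), so

\[\begin{split} E(XY) &= \int_{\mathbb{R}^2} f_{X,Y}(x,y) \, xy \ dx \, dy \\ &= \int_{\mathbb{R}^2} f_X(x) f_Y(y) \, xy \ dx \, dy \\ &= \int_{-\infty}^{+\infty} f_X(x) \, x \ dx \int_{-\infty}^{+\infty} f_Y(y) \, y \ dy \\ &= E(X) \cdot E(Y) \end{split}\]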
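
The multiplication rule can also be checked exactly in a discrete setting, where sums replace the integrals. The sketch below uses two independent fair dice, an example chosen here for illustration, and contrasts them with the fully dependent case \(Y=X\).

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

# Expectation of a single die.
e_x = sum(x * p for x in faces)

# Independent dice: the joint PMF is the product p * p.
e_xy_independent = sum(x * y * p * p for x, y in product(faces, repeat=2))

# Fully dependent case Y = X: E(XY) becomes E(X^2).
e_xy_dependent = sum(x * x * p for x in faces)

print(e_xy_independent == e_x * e_x)  # multiplication rule holds
print(e_xy_dependent == e_x * e_x)    # and fails without independence
```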

Independence vs. correlation

Let \(X,Y\) be real random variables. The covariance measures how much they vary together. It is defined as:

\(\begin{aligned} Cov(X,Y)=E\left[\left(X-E(X)\right)\left(Y-E(Y)\right)\right] \end{aligned}\)

Def. If both \(X\) and \(Y\) have finite, nonzero variance, we often use the normalized Pearson correlation coefficient: \(\begin{aligned} \rho_{X,Y}=\frac{Cov(X,Y)}{\sqrt{V(X)}\sqrt{V(Y)}} \end{aligned}\)

(Using Cauchy-Schwarz inequality we find that \(\rho_{X,Y}\in [-1,1]\).)
Covariance and correlation are often used to measure dependence. For example:

  • If \(X,Y\) are stochastically independent, then

    \[ \rho_{X,Y}=0 \]
  • If \(Y=aX\) with \(a>0\) (perfect linear correlation; for \(a<0\) the coefficient is \(-1\)):

    \[ \rho_{X,Y}=1 \]

But be careful with these kinds of measures!
The relation between correlation, independence and causality is much more subtle. For example, the correlation coefficient can be zero even though the variables are dependent.
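
A standard illustration of zero correlation without independence is \(Y=X^2\) for a symmetric \(X\). The sketch below assumes \(X\) is uniform on \(\{-1,0,1\}\), a choice made here for simplicity, and also evaluates the linear cases \(Y=2X\) and \(Y=-2X\). The helpers `expect`, `covariance` and `pearson` are illustrative names.

```python
from fractions import Fraction
from math import sqrt

# Assumed distribution: X uniform on {-1, 0, 1}.
support = [-1, 0, 1]
p = Fraction(1, 3)

def expect(g):
    """Expectation of g(X)."""
    return sum(g(x) * p for x in support)

def covariance(g, h):
    """Cov(g(X), h(X)) = E[g h] - E[g] E[h]."""
    return expect(lambda x: g(x) * h(x)) - expect(g) * expect(h)

def pearson(g, h):
    """Pearson coefficient of g(X) and h(X)."""
    return float(covariance(g, h)) / sqrt(float(covariance(g, g) * covariance(h, h)))

def identity(x):
    return x

rho_positive = pearson(identity, lambda x: 2 * x)   # Y = 2X
rho_negative = pearson(identity, lambda x: -2 * x)  # Y = -2X
rho_square = pearson(identity, lambda x: x * x)     # Y = X^2

# Dependence of X and Y = X^2: compare P(X=0, Y=0) with P(X=0) P(Y=0).
p_joint = sum(p for x in support if x == 0 and x * x == 0)
p_x0 = sum(p for x in support if x == 0)
p_y0 = sum(p for x in support if x * x == 0)

print(rho_positive, rho_negative, rho_square, p_joint == p_x0 * p_y0)
```

Here \(Cov(X,X^2)=E(X^3)-E(X)E(X^2)=0\) because \(X\) is symmetric around zero, although \(Y\) is completely determined by \(X\).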

Remark Correlation does not imply causation:
Assume that you measure a large correlation \(\rho_{X,Y}\) of the two variables \(X\) and \(Y\). Does this mean that \(X\) causes \(Y\)?
Not necessarily. There are several different options, e.g.:

  1. Reverse causation, i.e. \(Y\) in fact causes \(X\).

  2. They have a common cause.

  3. We just found a coincidental correlation.
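
The common-cause option can be illustrated with a small simulation. In this sketch a hidden variable \(Z\) drives both \(X\) and \(Y\), which otherwise do not influence each other. The Gaussian model, the noise level, the sample size and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def sample_correlation(xs, ys):
    """Sample Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10_000

# Hidden common cause Z; X and Y are noisy copies of Z with independent noise.
zs = [rng.gauss(0, 1) for _ in range(n)]
xs = [z + rng.gauss(0, 0.5) for z in zs]
ys = [z + rng.gauss(0, 0.5) for z in zs]

r = sample_correlation(xs, ys)
print(r)
```

Under this model the theoretical coefficient is \(\frac{1}{1+0.25}=0.8\), so a large correlation appears even though neither \(X\) nor \(Y\) enters the generation of the other.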


Philipp Böttcher, Dirk Witthaut