# Stochastic Independence

Part of a series: Stochastic fundamentals.


Let \((\Omega,\Sigma,P)\) be a probability space and let \(A,B\in \Sigma\) be events.

We defined the conditional probability \(P(A|B)\) as the probability that event \(A\) occurs, given that \(B\) has already occurred. If the events \(A\) and \(B\) are independent, the probability of \(A\) should not depend on the occurrence of \(B\): \(\begin{aligned} \textrm{Independence} \longrightarrow P(A|B)=P(A) \end{aligned}\) This is the starting point of the definition of stochastic independence:

**Def.**
The events \(A\) and \(B\) are called stochastically independent if and only
if \(\begin{aligned}
P(A\cap B)= P(A)\cdot P(B)\end{aligned}\)

**Notes:**

Independence is a stochastic property: it does not exclude influences between the two events \(A, B\) in a real-world experiment. Note also that the product form is symmetric in \(A\) and \(B\) and, unlike the conditional formulation, remains meaningful even when \(P(B)=0\).

**Example:** Roll a fair die twice and define the events:

\(A\): The sum of the points is odd

\(B\): The first roll yields an even number of points.

Obviously the result of the first roll has some influence on \(A\), but \(P(A)=\frac{1}{2}\), \(P(B)=\frac{1}{2}\) and therefore \(P(A\cap B)=\frac{1}{4}=P(A)P(B)\), i.e. \(A\) and \(B\) are *stochastically independent* (the enumeration sketch below verifies this).

**Remark:** Let \(A,B,C\) be three events. Then \(\begin{aligned} P(A\cap B\cap C)=P(A)P(B)P(C) \end{aligned}\) alone does not imply stochastic independence: the factorization has to hold for every sub-collection of the events, which leads to the following general definition.
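A minimal sketch (plain Python, used here only for illustration) that enumerates all 36 equally likely outcomes of the two rolls and verifies the product rule for \(A\) and \(B\):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))        # all (first roll, second roll) pairs
A = {w for w in outcomes if (w[0] + w[1]) % 2 == 1}    # sum of the points is odd
B = {w for w in outcomes if w[0] % 2 == 0}             # first roll is even

def P(event):
    """Probability under the uniform distribution on the 36 outcomes."""
    return Fraction(len(event), len(outcomes))

print(P(A), P(B), P(A & B))          # 1/2 1/2 1/4
print(P(A & B) == P(A) * P(B))       # True: A and B are stochastically independent
```

The same kind of check extends to more than two events once the general definition below is in place.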

**Def.**
Let \(A_1, A_2, \ldots, A_n \in \Sigma\) be events. They are called
*stochastically independent* (with respect to \(P\)) if

\(P\left( \bigcap _{j \in T} A_j \right) = \prod_{j \in T} P(A_j)\) for
*every* subset \(T \subseteq \{1,2, \ldots, n\}\).
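On a finite sample space this definition can be checked directly by iterating over all sub-collections of the events. The helper below (`mutually_independent` is just an illustrative name) does exactly that, and the hypothetical events \(A, B, C\) make the remark above concrete: the triple intersection factorizes although one pair does not.

```python
from itertools import combinations
from fractions import Fraction

def mutually_independent(events, outcomes):
    """Check P(intersection over T) == product of P(A_j) for every subset T with |T| >= 2."""
    P = lambda ev: Fraction(len(ev), len(outcomes))
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            intersection = set(outcomes)
            for ev in subset:
                intersection &= ev
            product_of_probs = Fraction(1)
            for ev in subset:
                product_of_probs *= P(ev)
            if P(intersection) != product_of_probs:
                return False
    return True

# Hypothetical example on the uniform space {1, ..., 8}:
Omega = set(range(1, 9))
A, B, C = {1, 2, 3, 4}, {2, 3, 4, 5}, {4, 5, 6, 7}
# P(A ∩ B ∩ C) = 1/8 = P(A) P(B) P(C), yet P(A ∩ B) = 3/8 != 1/4
print(mutually_independent([A, B, C], Omega))   # False
```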

There is a generalization of independence in terms of conditional
probabilities. Let \(A,B,C\) be events. Suppose we first observe \(C\), which
gives us some knowledge about \(A\). Then we observe \(B\) in addition to
\(C\). If observing \(B\) taught us nothing further about \(A\), these
events are *conditionally independent*. Below we present the formal
definition.

**Def.**
The events \(A,B\) are *conditionally independent* given \(C\) iff
\(P(A|B\cap C) = P(A|C)\) (assuming \(P(B\cap C)>0\)); an equivalent, symmetric formulation is
\(P(A\cap B|C) = P(A|C)\,P(B|C)\). Loosely speaking, the realization of the event
\(C\) restricts us to a subset of \(\Omega\) on which \(A\) and \(B\) are
independent.

The definition of *conditional independence* reduces to *stochastic
independence* by setting \(C=\Omega\).
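A concrete illustration (a hypothetical two-coin mixture, chosen only for this sketch): pick one of two coins at random and toss it twice. Given the chosen coin (the event \(C\)), the two tosses \(A\) and \(B\) are independent, but unconditionally they are not, because each toss carries information about which coin was picked.

```python
from fractions import Fraction

half = Fraction(1, 2)
p_fair, p_biased = Fraction(1, 2), Fraction(9, 10)   # heads probabilities of the two coins

def joint(first_heads, second_heads, p):
    """P(both toss results | coin with heads probability p): tosses are independent given the coin."""
    return (p if first_heads else 1 - p) * (p if second_heads else 1 - p)

# Unconditional probabilities: mixture over the two (equally likely) coins
P_A_and_B = half * joint(True, True, p_fair) + half * joint(True, True, p_biased)
P_A = half * p_fair + half * p_biased
P_B = P_A

print(P_A_and_B == P_A * P_B)                          # False: A, B are not independent
print(joint(True, True, p_fair) == p_fair * p_fair)    # True: conditionally independent given the fair coin
```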

## For random variables

Let \(X_1, X_2, \ldots, X_n\) be real random variables, i.e. \(X_j : \Omega \longrightarrow \mathbb{R}\). In order to define stochastic independence in terms of random variables, we first need some definitions:

**Def.**
The *joint* CDF of the random variables \(X_1, X_2, \ldots, X_n\) is
defined as: \(F_{X_1, X_2, \ldots, X_n} (x_1, x_2, \ldots, x_n)
=
P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n)\)

**Def.**
The *joint* PDF of \(X_1, X_2, \ldots, X_n\), if it exists, is given by

\(f_{X_1, X_2, \ldots, X_n} (x_1, x_2, \ldots, x_n) = \frac{\partial^n F_{X_1, X_2, \ldots, X_n}} {\partial x_1 \, \partial x_2 \cdots \partial x_n} \Biggm\lvert_{(x_1, \ldots, x_n)}\)

**Def.**
The *marginal* PDF of one of the random variables \(X_k\) is obtained by integrating out all the other variables:
\(f_{X_k} (x_k) = \int_{\mathbb{R}^{n-1}} f_{X_1, \ldots, X_n}
(x_1, \ldots, x_{k-1}, x_k, x_{k+1}, \ldots, x_n) \,
dx_1 \, dx_2 \ldots dx_{k-1} \, dx_{k+1} \ldots dx_n\)
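A small numerical sketch (NumPy, with two independent standard normals as an arbitrary example density) of what marginalization means in practice: evaluate the joint PDF on a grid and integrate out the other variable with a Riemann sum.

```python
import numpy as np

x = np.linspace(-6, 6, 601)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")

# Joint PDF of two independent standard normals (illustrative choice)
joint_pdf = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

# Marginal PDF of the first variable: integrate out y (Riemann sum stands in for the integral)
marginal_numeric = joint_pdf.sum(axis=1) * dx
marginal_exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print(np.max(np.abs(marginal_numeric - marginal_exact)))   # small discretization error
```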

The definition of independence directly carries over to random variables:

**Def.**
A set of *discrete* random variables is called *stochastically
independent* if the probability mass functions satisfy

\(P\left(\bigcap_{j\in T} X_j = x_j\right) = \prod_{j\in T} P(X_j = x_j) \quad \forall T \subseteq \{1,2,\ldots, n\}\)
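For two discrete random variables with finitely many values, this condition says that the joint PMF table is the outer product of the marginal PMFs. A brief sketch (the joint table below is a made-up example):

```python
import numpy as np

# Hypothetical joint PMF of two discrete random variables X (rows) and Y (columns)
joint_pmf = np.array([[0.10, 0.15, 0.25],
                      [0.10, 0.15, 0.25]])
assert np.isclose(joint_pmf.sum(), 1.0)

p_x = joint_pmf.sum(axis=1)          # marginal PMF of X
p_y = joint_pmf.sum(axis=0)          # marginal PMF of Y

# X and Y are independent iff the joint table equals the outer product of the marginals
independent = np.allclose(joint_pmf, np.outer(p_x, p_y))
print(independent)                   # True for this particular table
```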

**Def.**
A set of *continuous* random variables is called stochastically
independent if the CDFs satisfy:
\(\begin{split}
F_{\{X_j\}_{j\in T}} (\{x_j\}) &= P\left( \bigcap_{j\in T} X_j \leq x_j \right) \\
&= \prod_{j\in T} F_{X_j}(x_j)
\end{split}\)

or, equivalently, if the PDFs satisfy: \(f_{\{X_j\}_{j\in T}} (\{x_j\}) = \prod_{j\in T} f_{X_j}(x_j) \quad \forall T \subseteq \{1,2,\ldots,n\}\)

**Theorem (Multiplication rule)**
Let \(X,Y\) be stochastically independent random variables with finite expectations. Then
\(E(XY) = E(X)\cdot E(Y)\)

**Proof.**
Assuming \(X\) and \(Y\) have a joint PDF, independence gives \(f_{X,Y}(x,y) = f_X(x) f_Y(y)\), so

\(\begin{aligned}
E(XY) &= \int\!\!\int x y \, f_{X,Y}(x,y) \, dx \, dy
= \int\!\!\int x y \, f_X(x) f_Y(y) \, dx \, dy \\
&= \left( \int x f_X(x) \, dx \right) \left( \int y f_Y(y) \, dy \right)
= E(X) \cdot E(Y). \qquad \square
\end{aligned}\)
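A quick Monte Carlo sanity check of the multiplication rule (NumPy; the two distributions are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(loc=2.0, scale=1.0, size=n)   # E(X) = 2
y = rng.exponential(scale=3.0, size=n)       # E(Y) = 3, sampled independently of x

print(np.mean(x * y))                         # ~ 6.0
print(np.mean(x) * np.mean(y))                # ~ 6.0, in line with E(XY) = E(X) E(Y)
```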

## Independence vs. correlation

Let \(X,Y\) be real random variables. The *covariance* measures how much
they change together. This is defined as:

\(\begin{aligned} Cov(X,Y)=E\left[\left(X-E(X)\right)\left(Y-E(Y)\right)\right] \end{aligned}\)

**Def.**
If both \(X\) and \(Y\) have finite variance, we often use the normalized
*Pearson correlation coefficient*:
\(\begin{aligned}
\rho_{X,Y}=\frac{Cov(X,Y)}{\sqrt{V(X)}\sqrt{V(Y)}}
\end{aligned}\)

(Using the Cauchy-Schwarz inequality we find that \(\rho_{X,Y}\in [-1,1]\).)

Covariance and correlation are often used to measure dependence. For
example:

If \(X,Y\) are stochastically independent (and have finite variances), then

\[ \rho_{X,Y}=0 \]

If \(Y=aX\) with \(a>0\) (perfect positive linear dependence):

\[ \rho_{X,Y}=1 \]

(and \(\rho_{X,Y}=-1\) for \(a<0\)).

**But** be careful with these kinds of measures!

The relation between correlation, independence and causality is much
more subtle. For example, the correlation coefficient can be zero even
though the variables are dependent; a standard example is \(Y=X^2\) with
\(X\) symmetric around zero, as the sketch below shows.
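A short numerical illustration (again NumPy, with a standard normal \(X\) as an arbitrary choice): \(Y=X^2\) is completely determined by \(X\), yet the sample correlation is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)   # X symmetric around zero
y = x**2                         # Y is a deterministic function of X, hence clearly dependent

print(np.corrcoef(x, y)[0, 1])   # ~ 0.0: zero correlation despite full dependence
```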

**Remark**
Correlation does not imply causation:

Assume that you measure a large correlation \(\rho_{X,Y}\) of the two
variables \(X\) and \(Y\). Does this mean that \(X\) causes \(Y\)?

Not necessarily. There are several different options, e.g.:

reverse causation, i.e. in fact \(Y\) causes \(X\).

They have a common cause.

We just found a coincidental correlation.