Law of Large Numbers and Central Limit Theorem



Part of a series: Stochastic fundamentals.


We consider a set of \(n\) stochastically independent real random variables \(X_1, X_2, \dots, X_n\) and analyze the sum \(X_1 + X_2 + \dots + X_n\) or the arithmetic mean \(\frac 1 n \left( X_1 + X_2 + \dots + X_n \right) \,.\)

Why is this important?

  • Stochastic Processes:
    Many stochastic influences act on a system, but we are only interested in their cumulative effect.

  • Experiments:
    Suppose you measure a physical quantity, but the measurement process is subject to noise. Typically you repeat the measurement many times and average the results.

  • Economics, in particular insurance:
    Insurance companies have a large number of similar contracts with “stochastic” payments to be made in the future. The important quantity (for the company) is the sum of these payments.

The law of large numbers and the central limit theorem provide important results about the behavior of \(X_1 + X_2 + \dots + X_n\) in the limit \(n \rightarrow \infty\).

Theorem (Law of large numbers) Let \(X_1,X_2, \dots\) be a sequence of stochastically independent random variables with the same finite expected value \(\mu = E(X_j)\) and the same finite variance \(\sigma^2 = V(X_j) < \infty\).
Then for every \(\epsilon > 0\) the arithmetic mean \(\frac{1}{n} \sum_{j=1}^n X_j\) satisfies \(\lim_{n \rightarrow \infty} P \left( \bigg| \frac{1}{n} \sum_{j=1}^n X_j - \mu \bigg| \geq \epsilon \right) = 0 \,.\)
Interpretation: The probability that the arithmetic mean differs from \(\mu\) by at least any fixed \(\epsilon > 0\) tends to zero for \(n \rightarrow \infty\).

Proof The expected value is linear, and because the random variables are independent the variance of their sum is the sum of their variances. Hence
\(\begin{aligned} E \bigg( \frac{1}{n} \sum_{j=1}^n X_j \bigg) &= \frac{1}{n} \sum_{j=1}^n \underbrace{E(X_j)}_{= \mu} = \mu\\ V \bigg( \frac{1}{n} \sum_{j=1}^n X_j \bigg) &= \frac{1}{n^2} \sum_{j=1}^n V(X_j) = \frac{1}{n} \sigma^2 \end{aligned}\)
Using the Chebyshev inequality we then obtain \(P \left( \bigg| \frac{1}{n} \sum_{j=1}^n X_j - \mu \bigg| \geq \epsilon \right) \leq \frac{\sigma^2}{n \epsilon^2} \underset{{n \rightarrow \infty}}{\longrightarrow} 0 \,.\) \(\Box\)
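The statement and the Chebyshev bound from the proof can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes \(X_j \sim\) Uniform(0, 1) (so \(\mu = 1/2\), \(\sigma^2 = 1/12\)), and the function names, the number of trials and the seed are illustrative choices.

```python
import numpy as np


def deviation_probability(n, eps, trials=2_000, seed=0):
    """Monte Carlo estimate of P(|mean of n draws - mu| >= eps).

    Assumption for this sketch: the X_j are Uniform(0, 1), so mu = 1/2.
    """
    rng = np.random.default_rng(seed)
    mu = 0.5
    samples = rng.random((trials, n))   # each row is one experiment with n draws
    means = samples.mean(axis=1)        # arithmetic mean of each experiment
    return float(np.mean(np.abs(means - mu) >= eps))


def chebyshev_bound(n, eps, sigma2=1.0 / 12.0):
    """Upper bound sigma^2 / (n eps^2) from the proof, capped at 1."""
    return min(1.0, sigma2 / (n * eps**2))


if __name__ == "__main__":
    eps = 0.05
    for n in (10, 100, 1000):
        print(n, deviation_probability(n, eps), chebyshev_bound(n, eps))
```

The estimated probability should shrink as \(n\) grows and stay below the Chebyshev bound, which is typically far from tight.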

With the help of the central limit theorem we can even give the distribution of the arithmetic mean in the limit \(n \rightarrow \infty\). But before we can enjoy this result, we have to define what convergence means for distributions of random variables.

Definition A sequence of real random variables \(Z_1, Z_2, \dots\) is said to converge in distribution (or converge in law, weakly converge) to a random variable \(Z\) if \(\lim_{n \rightarrow \infty} F_{Z_n} (z) = F_Z (z)\) for every number \(z \in \mathbb{R}\) at which \(F_Z\) is continuous. Here \(F_{Z_n}\) and \(F_{Z}\) denote the CDFs of \(Z_n\) and \(Z\), respectively.
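To make the definition concrete, here is a small sketch with an example that is not taken from the original notes: let \(Z_n\) be uniformly distributed on the grid \(\{1/n, 2/n, \dots, 1\}\) and let \(Z\) be uniform on \([0, 1]\). The CDFs are known in closed form, so the gap between them can be evaluated directly. All function names are illustrative.

```python
import numpy as np


def cdf_discrete_uniform(z, n):
    """CDF of Z_n, uniform on the grid {1/n, 2/n, ..., 1}."""
    z = np.asarray(z, dtype=float)
    return np.clip(np.floor(n * z) / n, 0.0, 1.0)


def cdf_uniform(z):
    """CDF of Z, uniform on the interval [0, 1]."""
    z = np.asarray(z, dtype=float)
    return np.clip(z, 0.0, 1.0)


def max_cdf_gap(n, grid_size=10_001):
    """Largest |F_{Z_n}(z) - F_Z(z)| over an evaluation grid on [-0.5, 1.5]."""
    z = np.linspace(-0.5, 1.5, grid_size)
    return float(np.max(np.abs(cdf_discrete_uniform(z, n) - cdf_uniform(z))))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_cdf_gap(n))
```

In this example the gap is at most \(1/n\) at every \(z\), so \(F_{Z_n}(z) \to F_Z(z)\) everywhere and \(Z_n\) converges in distribution to \(Z\), although every \(Z_n\) is discrete and \(Z\) is continuous.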

Theorem (Lévy’s continuity theorem) Let \(Z_1,Z_2,\dots\) be a sequence of real random variables with characteristic functions \(\varphi_{Z_n} (t)\).

Then the following two statements are equivalent:

  1. \(Z_n\) converges in distribution to some random variable \(Z\): \(Z_n \overset{d}{\longrightarrow} Z\)

  2. The sequence of characteristic functions converges point-wise to some function \(\varphi_Z (t)\): \(\varphi_{Z_n}(t) \longrightarrow \varphi_Z (t) ~~~ \forall t \in \mathbb{R}\) and \(\varphi_Z (t)\) is continuous at \(t = 0\) (and hence everywhere).

This theorem represents a great simplification, as we only have to consider characteristic functions to prove convergence in distribution. The proof of Lévy’s continuity theorem is rather technical and hence we omit it here. It can be found in the literature, e.g. in [Fristedt and Gray, 1996].
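As an illustration of how the theorem is used, consider a classical example that is not part of the original notes: a binomial random variable with \(n\) trials and success probability \(\lambda/n\) converges in distribution to a Poisson random variable with parameter \(\lambda\). The sketch below checks the point-wise convergence of the characteristic functions numerically; the function names, the value \(\lambda = 2\) and the range of \(t\) are illustrative assumptions.

```python
import numpy as np


def cf_binomial(t, n, p):
    """Characteristic function of Binomial(n, p): (1 - p + p e^{it})^n."""
    return (1.0 - p + p * np.exp(1j * t)) ** n


def cf_poisson(t, lam):
    """Characteristic function of Poisson(lam): exp(lam (e^{it} - 1))."""
    return np.exp(lam * (np.exp(1j * t) - 1.0))


def max_cf_gap(n, lam=2.0):
    """Largest |phi_{Z_n}(t) - phi_Z(t)| on a grid of t values in [-10, 10]."""
    t = np.linspace(-10.0, 10.0, 2001)
    return float(np.max(np.abs(cf_binomial(t, n, lam / n) - cf_poisson(t, lam))))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        print(n, max_cf_gap(n))
```

The gap shrinks as \(n\) grows. Since the limit function is continuous at \(t = 0\), Lévy’s continuity theorem then yields convergence in distribution without any direct manipulation of the CDFs.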

Now we have all technical results and methods available to enjoy the central limit theorem and its proof.

Theorem (Lindeberg-Lévy Central limit theorem)

Let \(X_1,X_2,\dots\) be a sequence of iid real random variables with \(E(X_j) = \mu\) and \(V(X_j) = \sigma^2 < \infty\). Consider the arithmetic mean \(S_n = \frac 1 n \sum_{j=1}^n X_j \,.\) Then the random variable \(Z_n = \sqrt{n} (S_n - \mu)\) converges in distribution to a normally distributed random variable: \(\sqrt{n} (S_n - \mu) \overset{d}{\longrightarrow} \mathcal{N} (0,\sigma^2)\)

Proof Write \(Z_n = \sqrt{n} (S_n - \mu) = \sum_{j=1}^n \frac{X_j - \mu}{\sqrt{n}} \,.\) Then the characteristic function of \(Z_n\) reads \(\varphi_{Z_n} (t) = \prod_{j=1}^n \varphi_{\frac{X_j-\mu}{\sqrt{n}}} (t) \underset{iid}{=} \left[ \varphi_{X_j - \mu} \left(\frac{t}{\sqrt{n}} \right)\right]^n \,.\)

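Expanding the exponential in a Taylor series (assuming, for this sketch, that the third and fourth moments exist; the general case needs only the expansion up to second order with a remainder \(o(t^2/n)\)), we have

\(\begin{aligned} \varphi_{X_j - \mu} \left(\frac{t}{\sqrt{n}} \right) &= E \left( e^{i \frac{t}{\sqrt{n}} (X_j - \mu)} \right)\\ &= \underbrace{E(1)}_{=1} + \frac{i t}{\sqrt{n}} \underbrace{E(X_j-\mu)}_{=0} - \frac{t^2}{2 n} \underbrace{E\left((X_j-\mu)^2\right)}_{=\sigma^2} + \frac{t^3}{6 n^{3/2}} \underbrace{i^3 E\left((X_j-\mu)^3\right)}_{= c} + \mathcal{O} \left(\frac{t^4}{n^2}\right) \,. \end{aligned}\)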

Thus we find

\(\begin{aligned} \varphi_{Z_n} (t) &= \left[ 1 - \frac{\sigma^2 t^2}{2 n} + c \frac{t^3}{6 n^{3/2}} + \mathcal{O} \left(\frac{t^4}{n^2}\right) \right]^n ~~\overset{n \rightarrow \infty}{\longrightarrow}~~ e^{-\sigma^2 t^2 /2} \,, \end{aligned}\)
which is the characteristic function of \(\mathcal{N} (0, \sigma^2)\). Here we used that \(\left(1 + \frac{a_n}{n}\right)^n \rightarrow e^{a}\) whenever \(a_n \rightarrow a\).

Using Lévy’s continuity theorem we finally obtain \(Z_n \overset{d}{\longrightarrow} \mathcal{N} (0,\sigma^2)\) \(\Box\)
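Both steps of the proof can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original notes. First, for \(X_j \sim\) Uniform(-1, 1) (with \(\mu = 0\), \(\sigma^2 = 1/3\) and characteristic function \(\sin(t)/t\)) it evaluates how close \(\left[\varphi(t/\sqrt{n})\right]^n\) is to \(e^{-\sigma^2 t^2/2}\). Second, for exponentially distributed \(X_j\) with \(\mu = \sigma = 1\) it compares the empirical CDF of simulated \(Z_n\) with the CDF of \(\mathcal{N}(0, 1)\). Function names, sample sizes and the seed are illustrative.

```python
import math

import numpy as np


def cf_limit_gap(n, t):
    """|[phi(t/sqrt(n))]^n - exp(-sigma^2 t^2 / 2)| for X_j ~ Uniform(-1, 1)."""
    s = t / math.sqrt(n)
    phi = math.sin(s) / s          # c.f. of Uniform(-1, 1), valid for s != 0
    sigma2 = 1.0 / 3.0             # variance of Uniform(-1, 1)
    return abs(phi**n - math.exp(-sigma2 * t**2 / 2.0))


def normal_cdf(z, sigma=1.0):
    """CDF of N(0, sigma^2), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))


def simulate_z(n, trials=5_000, seed=0):
    """Draw `trials` samples of Z_n = sqrt(n) (S_n - mu) for Exponential(1) data."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0, size=(trials, n))   # mu = 1, sigma = 1
    return math.sqrt(n) * (x.mean(axis=1) - 1.0)


def max_cdf_gap(z_samples, sigma=1.0):
    """Largest gap between the empirical CDF and the N(0, sigma^2) CDF."""
    z = np.sort(z_samples)
    ecdf = np.arange(1, len(z) + 1) / len(z)
    target = np.array([normal_cdf(v, sigma) for v in z])
    return float(np.max(np.abs(ecdf - target)))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print("c.f. gap", n, cf_limit_gap(n, t=2.0))
    for n in (1, 200):
        print("CDF gap", n, max_cdf_gap(simulate_z(n)))
```

For \(n = 1\) the distribution of \(Z_n\) is clearly not normal, while for larger \(n\) the remaining CDF gap is dominated by Monte Carlo noise from the finite number of trials.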

Remark The ‘classical’ Lindeberg-Lévy CLT makes some very strict assumptions, which can be relaxed at least partially:

  1. Identical distribution of the \(X_j\).
    This assumption can indeed be dropped to a large extent. See the Lyapunov CLT below.

  2. Stochastic independence of the \(X_j\).
    Some weak dependencies can be tolerated in the sense that one can replace the condition of independence by a weaker assumption (not covered here).

  3. The finite second moment \(\sigma^2 < \infty\) is essential in the proof of the CLT. Nevertheless, similar limit results can be found for distributions without a finite second moment. See next chapter.

Theorem (Lyapunov CLT) Let \(X_1, X_2, \dots\) be a sequence of independent random variables with finite expected values \(\mu_j\) and variances \(\sigma_j^2\). Define \(s_n^2 = \sum_{j=1}^n \sigma_j^2 \,.\) If for some \(\delta>0\) the Lyapunov condition \(\lim_{n \rightarrow \infty} \frac{1}{s_n^{2+\delta}} \sum_{j=1}^n E\left( |X_j - \mu_j|^{2+\delta} \right) = 0\) is satisfied, then the sum of the \((X_j-\mu_j)/s_n\) converges in distribution to the standard normal distribution: \(\frac{1}{s_n} \sum_{j=1}^n (X_j - \mu_j) \overset{d}{\longrightarrow} \mathcal{N} (0,1).\)

Idea of proof The characteristic function of \(Z_n = \frac{1}{s_n} \sum_{j=1}^n (X_j - \mu_j)\) is given by \(\varphi_{Z_n} (t) = \prod_{j=1}^n \bigg( 1 - \underbrace{\frac{\sigma_j^2 t^2}{2 s_n^2}}_{\sim 1/n} + \underbrace{\frac{i^3 t^3}{6} \frac{E \big( (X_j-\mu_j)^3 \big)}{s_n^3}}_{\text{vanishes faster than } 1/n} + \dots \bigg)\)
To ensure convergence we need that the second term in the parentheses scales as \(\frac{1}{n}\) and that the third term vanishes faster than \(\frac{1}{n}\).

This is ensured by the Lyapunov condition and we can evaluate the limit to arrive at \(e^{-t^2/2}\).
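A small sketch can illustrate the Lyapunov condition for variables that are independent but not identically distributed. The example is an assumption made for illustration and does not come from the original notes: \(X_j \sim\) Uniform\((-a_j, a_j)\) with half-widths \(a_j \in \{1, 2, 3\}\) cycling with \(j\), so that \(\mu_j = 0\), \(\sigma_j^2 = a_j^2/3\) and \(E(|X_j|^4) = a_j^4/5\). With \(\delta = 2\) the Lyapunov ratio can be computed exactly, and a simulation checks that the standardized sum behaves like a standard normal variable.

```python
import numpy as np


def half_widths(n):
    """Hypothetical half-widths a_j in {1, 2, 3}, cycling with j = 1, ..., n."""
    j = np.arange(1, n + 1)
    return 1.0 + (j % 3)


def lyapunov_ratio(n):
    """Lyapunov ratio for delta = 2: sum_j E|X_j|^4 / s_n^4."""
    a = half_widths(n)
    s_n2 = np.sum(a**2 / 3.0)      # s_n^2 = sum of the variances a_j^2 / 3
    fourth = np.sum(a**4 / 5.0)    # E|X_j|^4 = a_j^4 / 5 for Uniform(-a_j, a_j)
    return float(fourth / s_n2**2)


def standardized_sums(n, trials=5_000, seed=0):
    """Simulate Z_n = (1 / s_n) * sum_j X_j for X_j ~ Uniform(-a_j, a_j)."""
    rng = np.random.default_rng(seed)
    a = half_widths(n)
    x = rng.uniform(-1.0, 1.0, size=(trials, n)) * a   # column j scaled by a_j
    s_n = np.sqrt(np.sum(a**2 / 3.0))
    return x.sum(axis=1) / s_n


if __name__ == "__main__":
    for n in (30, 300, 3000):
        print(n, lyapunov_ratio(n))
    z = standardized_sums(300)
    print(z.mean(), z.var(), np.mean(np.abs(z) <= 1.96))
```

In this example the ratio decays like \(1/n\), so the Lyapunov condition holds, and the simulated \(Z_n\) should have mean close to 0, variance close to 1 and roughly 95% of its mass in \([-1.96, 1.96]\).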



Bert E. Fristedt and Lawrence F. Gray. A Modern Approach to Probability Theory. Springer, 1st edition, 1996. ISBN 978-0-8176-3807-8.


Philipp Böttcher, Dirk Witthaut