Bayesian Methods#

Warning

This article is under construction.

The fundamental difference between frequentist and Bayesian statistics is the interpretation of the parameters \(\theta\). In frequentist statistics, parameters are unknown but fixed values which can be estimated. The term “frequentist” refers to the fact that the aleatory uncertainty of such an estimate decreases with repeated observation due to the law of large numbers; here, a probability represents a relative frequency. In Bayesian statistics, on the other hand, parameters are random variables, i.e. \(\theta \in \Theta\), with their own distributions (here denoted by \(\pi(\theta)\)). Hence, there is no single value to be found, but rather high-probability regions to be explored. Here, a probability reflects the strength of belief in the corresponding event. The key concept of Bayesian thinking is that such beliefs are updated via Bayes’ theorem when new information in the form of data, \(x\), becomes available:

(10)#\[ \pi(\theta|x) = \frac{\pi(x|\theta)\cdot \pi(\theta)}{\underbrace{\int_{\Theta}\pi(x|\theta')\cdot \pi(\theta') \ d\theta'}_{=\pi(x)}} \propto \pi(x|\theta)\cdot \pi(\theta) \]

The term Bayesian statistics may be seen as a tribute emphasizing the importance of Bayes’ theorem. Every component of Equation (10) plays an important role in Bayesian statistics and has therefore been given its own name in the literature:

  • \(\pi(\theta)\) is the prior distribution of \(\theta\) and should reflect all the available knowledge before seeing the new data \(x\).

  • \(\pi(x|\theta)\) is the likelihood of the data for a particular value of the parameter.

  • \(\pi(\theta|x)\) is the posterior distribution of \(\theta\) which incorporates the new information from \(x\). Studying the posterior is often the heart of a Bayesian analysis.

  • \(\pi(x)\) is the marginal data distribution, obtained by integrating over the whole parameter space \(\Theta\), and is often computationally infeasible to evaluate.
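
As a concrete illustration of the prior-to-posterior update in Equation (10), the following is a minimal sketch using a conjugate Beta-Binomial model, where the posterior is available in closed form. The coin-flip data and the Beta(2, 2) prior are arbitrary assumptions made purely for illustration and do not come from this article.

```python
import numpy as np
from scipy import stats

# Observed data x: outcomes of 10 coin flips (1 = heads), an arbitrary example.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
n_heads, n_tails = x.sum(), len(x) - x.sum()

# Prior pi(theta): Beta(2, 2), a mild belief that the coin is roughly fair.
a_prior, b_prior = 2.0, 2.0

# For this conjugate model, Bayes' theorem yields a Beta posterior in closed
# form: pi(theta | x) = Beta(a_prior + #heads, b_prior + #tails).
a_post, b_post = a_prior + n_heads, b_prior + n_tails
posterior = stats.beta(a_post, b_post)

lo, hi = posterior.interval(0.95)
print(f"Posterior mean of theta: {posterior.mean():.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

In such conjugate cases the marginal \(\pi(x)\) is never needed explicitly; for most realistic models, however, no closed-form posterior exists, which motivates the approximation methods below.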

If the marginal data distribution cannot be calculated, the posterior must be approximated in some other way using the proportionality in Equation (10). Two widely used approaches for this are variational Bayes (i.e. minimizing the distance between a simple function and the posterior) and Markov chain Monte Carlo (MCMC; i.e. drawing samples whose limiting distribution equals the posterior). The UQ dictionary covers two MCMC variants.
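
To illustrate the core idea shared by MCMC methods, here is a minimal random-walk Metropolis-Hastings sketch, used only as an illustrative example: it draws samples using nothing but the unnormalized product \(\pi(x|\theta)\cdot\pi(\theta)\) from Equation (10), so the marginal \(\pi(x)\) never has to be computed. The Gaussian model, prior, and step size are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example data: assume x_i ~ N(theta, 1) with unknown mean theta.
x = rng.normal(loc=1.5, scale=1.0, size=50)

def log_unnormalized_posterior(theta):
    """log pi(x|theta) + log pi(theta), i.e. the log of the numerator in (10)."""
    log_prior = -0.5 * theta**2                   # standard normal prior N(0, 1)
    log_likelihood = -0.5 * np.sum((x - theta)**2)
    return log_prior + log_likelihood

# Random-walk Metropolis-Hastings: propose a local move and accept it with
# probability min(1, posterior ratio); the normalizing constant pi(x) cancels.
n_samples, step_size = 5000, 0.5
samples = np.empty(n_samples)
theta = 0.0
for i in range(n_samples):
    proposal = theta + step_size * rng.normal()
    log_accept = log_unnormalized_posterior(proposal) - log_unnormalized_posterior(theta)
    if np.log(rng.uniform()) < log_accept:
        theta = proposal
    samples[i] = theta

burn_in = 1000  # discard early samples before the chain has converged
print(f"Posterior mean estimate: {samples[burn_in:].mean():.3f}")
```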

Many more MCMC variants exist because applicability and convergence time depend on the problem at hand as well as on the model formulation (e.g. whether the model is parametric or non-parametric). Bayesian mixture models, for instance, offer great flexibility when the data consist of heterogeneous sub-populations, but come at the price of more challenging posterior sampling. Besides the sampling algorithms themselves, other aspects of MCMC have been researched to further boost sampling (e.g. the inclusion of a temperature). If you are uncertain about the model, Bayes Factors may come in handy to test hypotheses.
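
To make the last point slightly more concrete: a Bayes factor is the ratio of the marginal data distributions \(\pi(x)\) under two competing model hypotheses. The sketch below reuses the 7-heads-in-10-tosses data from the first example, where the marginal is available in closed form; the two priors compared here are arbitrary assumptions chosen only for illustration.

```python
import numpy as np
from scipy.special import betaln, comb

# Same coin-flip data as above: 7 heads out of 10 tosses (arbitrary example).
k, n = 7, 10

def log_marginal_likelihood(a, b):
    """Closed-form log pi(x) for a Binomial likelihood with a Beta(a, b) prior."""
    return np.log(comb(n, k)) + betaln(a + k, b + n - k) - betaln(a, b)

# Hypothesis 1: a "fair coin" prior concentrated around theta = 0.5.
# Hypothesis 2: a uniform (uninformative) prior over [0, 1].
log_bf = log_marginal_likelihood(50, 50) - log_marginal_likelihood(1, 1)
print(f"Bayes factor (fair-coin prior vs. uniform prior): {np.exp(log_bf):.2f}")
```

A Bayes factor well above 1 favors the first hypothesis, well below 1 the second; values close to 1 indicate that the data do not discriminate between them.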

Author#

Jonas Bauer

Contributors#

Hendrik Paasche