Data Assimilation [Draft]



This article is currently under construction.

When describing a physical process, for example the weather on Earth, or an engineered system, such as a self-driving car, we typically have two primary sources of information. One is a mathematical model; the other is information collected by measurement (observation). Mathematical models are uncertain because of modeling approximations, unknown processes, and uncertain initial conditions. Observations, on the other hand, are imperfect versions of reality and carry errors. We therefore want to use both sources of information to make the best predictions for the weather, or to operate the self-driving car most effectively. Data assimilation (DA) does just that: it describes the mathematical, numerical, and computational framework for combining the information from mathematical models and observations to best estimate the state of the system at hand. DA combines models and observations in a way that accounts for the uncertainties of both.

The question is how DA combines the information from mathematical models and observations. There are currently three main approaches to data assimilation.

  1. Solve the actual problem

  2. Solve an optimization problem

  3. Solve a simplified problem

An elegant way to perform data assimilation is to compute the conditional probabilities describing the mathematical model given the observations [Asch et al., 2016, Law et al., 2015]. The mathematical model gives us certain probabilities, for example, the probability of rain or sun next weekend in a weather forecast, and the observations are then used to improve these probabilities. Let us explain this using an example.

Suppose we roll a die and must guess which number comes up. The appearance of one number, say 1, is as likely as that of any of the other five. That means the probability of rolling a 1, 2, 3, 4, 5, or 6 is 1/6 each. In this case, the distribution is called “uniform”. The expected value is the sum of each number multiplied by its probability; therefore, the expected value is E = 3.5. We can also approximate this expected value by repeatedly rolling the die, that is, by drawing “samples”. The approximation gets better as we draw more samples. If we could draw infinitely many samples, we would obtain the expected value of precisely 3.5 [Chorin and Hald, 2013].
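This sampling idea is easy to try in code. The sketch below (plain Python; the function name and sample sizes are chosen only for illustration) estimates the die’s expected value by averaging repeated rolls, and shows the estimate tightening as the number of samples grows:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def estimate_die_mean(n_samples):
    """Estimate the expected value of a fair die by averaging n_samples rolls."""
    rolls = [random.randint(1, 6) for _ in range(n_samples)]
    return sum(rolls) / n_samples

# The exact expected value is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.
# The average of the samples drifts toward 3.5 as we draw more of them.
for n in (100, 10_000, 1_000_000):
    print(n, estimate_die_mean(n))
```

Each printed average approximates 3.5, with the larger sample sizes typically landing closer, which is exactly the behavior the laws of large numbers (discussed next) guarantee.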

The goal in data assimilation is to estimate the expected value of the distribution given by the conditional probabilities. The sampling method illustrated here is called the Monte Carlo method. More precisely, this method estimates an expected value by drawing many samples from a given distribution and averaging them. The estimate gets closer to the desired value as we take more samples.

The main justification for using Monte Carlo methods comes from what are known as the “laws of large numbers”. There are two such laws: the “weak” and the “strong” law. Roughly speaking, the weak law tells us that, assuming the expected value we want to estimate exists, the error between the sample average and that expected value shrinks to zero as we average more and more samples. The strong law gives us slightly more information. It says that for any chosen margin of error, no matter how small, there will always exist a number such that if we average at least that many samples, the error will be at most the one we have chosen. Notice that we had to assume that the expected value we want to estimate exists. But this is a natural assumption: if we did not believe this value existed, we would not be trying to estimate it.

In practice, when we use Monte Carlo methods for data assimilation, we devise ways of drawing samples from the conditional probabilities defined jointly by our mathematical model and the observations. Many scientists and mathematicians work on finding new and effective ways to draw samples from probability distributions that are not necessarily well understood. Designing such methods for data assimilation is very complicated. Briefly, the reason is that data assimilation done this way becomes increasingly difficult the larger the problem is, in terms of both the number of variables to account for and the quantity of data collected. This is called the “curse of dimensionality,” and for this reason, data assimilation techniques of this kind are only used in relatively “small” problems.

The other two techniques mentioned above, optimization and simplification of the problem, are often used in practice, even on large and important problems such as numerical weather prediction. By an “optimization problem,” we mean that you try to calibrate your model so that its output is as close as possible to the observations. The mismatch between model output and data is quantified by what is called a “cost function,” and your “optimal” model output is the one that yields the lowest error, that is, the smallest value of the cost function. Typically, the cost function is again derived from the formulation of the data assimilation problem in terms of conditional probabilities [Talagrand and Courtier, 1987]. Optimization-based data assimilation algorithms are in operational use in numerical weather prediction, and progress in data assimilation is often credited as one of the main drivers of the increase in forecast skill over the past decades [Bauer et al., 2015].
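As a minimal sketch of the optimization view, consider a single-variable problem with one background (model) value and one observation. The cost function below penalizes distance to each source, weighted by its error variance; all numbers and names are made up for illustration, and for this quadratic cost the minimizer has a closed form:

```python
x_b, var_b = 10.0, 4.0   # background (model) estimate and its error variance
y, var_o = 12.0, 1.0     # observation and its error variance

def cost(x):
    """Weighted squared mismatch of a candidate state x to model and data."""
    return (x - x_b) ** 2 / var_b + (x - y) ** 2 / var_o

# For a quadratic cost, the minimizer is a variance-weighted average of
# the background and the observation (set the derivative of cost to zero).
x_opt = (x_b / var_b + y / var_o) / (1 / var_b + 1 / var_o)

print(x_opt)  # prints 11.6: between x_b and y, closer to the more accurate source
```

The optimal state lands between model and observation, pulled toward whichever source has the smaller error variance; in large systems the same idea is solved numerically rather than in closed form.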

However, implementing these techniques as computer code can be tricky, and there are problems for which this optimization approach is too complicated and challenging. For such problems, a surprisingly effective strategy is simply not to solve them. Instead, we solve an easier problem whose solution is not too far from that of the difficult problem we really want to solve.

If the mathematical model is simple enough, the corresponding conditional probabilities are also simple. By a simple mathematical model, we mean one that is linear. Linear models with Gaussian errors are “easy” to deal with: they give rise to “easy” conditional probabilities, described by a “Gaussian” distribution. What makes Gaussian probabilities simple is that all we need to know about them are two numbers, namely, the mean and the variance. The mean is the average value we expect to encounter; it is another name for the expected value. The variance describes the variation around the mean one should expect when performing repeated experiments. Unfortunately, almost no physical process or engineered system appears naturally as a linear model, and the probabilities we observe in practice are rarely Gaussian.
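In the Gaussian case, conditioning on an observation can be written down explicitly: the posterior is again Gaussian, so tracking its mean and variance is all that is needed. A minimal sketch, with the function name and all input values invented for illustration:

```python
def gaussian_update(mean_prior, var_prior, obs, var_obs):
    """Condition a Gaussian prior on a direct, noisy observation.

    Returns the mean and variance of the (also Gaussian) posterior.
    """
    gain = var_prior / (var_prior + var_obs)        # weight given to the data
    mean_post = mean_prior + gain * (obs - mean_prior)
    var_post = (1 - gain) * var_prior               # uncertainty always shrinks
    return mean_post, var_post

m, v = gaussian_update(0.0, 2.0, 1.0, 2.0)
print(m, v)  # prints 0.5 1.0: mean moves halfway toward the data, variance halves
```

With equal prior and observation variances the posterior mean sits exactly halfway between the two, and the posterior variance is smaller than either input variance, reflecting the information gained.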

One way forward is to “linearize” our model, that is, to replace the actual model with a linear version of it. The linearized model gives rise to Gaussian probabilities, so we are in business to solve this simplified problem. The solution obtained can be very close to that of the nonlinear problem if the linearization is done carefully enough to capture most of the dominant behaviors of our model. The Kalman filter is the original data assimilation method for a class of linear problems, and versions of it, such as the Ensemble Kalman Filter, are used in many nonlinear problems, for example in Earth system models.
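For a scalar linear model the full Kalman filter cycle, forecast with the model, then update with the observation, fits in a few lines. In the sketch below the model coefficient, noise variances, and synthetic observations are all invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the synthetic run is reproducible

# Toy linear model: x_{k+1} = a * x_k + model noise, observed directly with noise.
a, q, r = 0.9, 0.1, 0.5   # model coefficient, model noise variance, obs noise variance

def kalman_step(mean, var, obs):
    """One forecast/update cycle of a scalar Kalman filter."""
    # Forecast: propagate mean and variance through the linear model.
    mean_f = a * mean
    var_f = a * a * var + q
    # Update: blend forecast and observation using the Kalman gain.
    gain = var_f / (var_f + r)
    mean_a = mean_f + gain * (obs - mean_f)
    var_a = (1 - gain) * var_f
    return mean_a, var_a

# Run the filter against noisy observations of a hidden "truth".
truth, mean, var = 5.0, 0.0, 10.0   # start the filter far from the truth
for _ in range(50):
    truth = a * truth + random.gauss(0, q ** 0.5)
    obs = truth + random.gauss(0, r ** 0.5)
    mean, var = kalman_step(mean, var, obs)

print(abs(mean - truth), var)  # estimate tracks the truth; variance settles low
```

Despite the deliberately poor initial guess, the filter’s estimate locks onto the hidden state within a few cycles, and its variance settles at a small steady value, the quantitative version of “combining model and data beats either alone.”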

Although data assimilation may seem straightforward, several challenges complicate its implementation. While the problem is elegantly expressed in terms of conditional probabilities, there are complexities associated with its formulation. Setting up the conditional probabilities relies on assumptions about the distribution of errors. One must specify the expected deviation between model outputs and collected data, which requires statements about things we do not truly know in a precise mathematical sense. For example, how can we relate the errors of a satellite temperature measurement to the output of a model that does not represent temperature in the same way? Sometimes, one tries a few (or many) assumptions about such errors and then uses the one that leads to the “best” results. Here, “best” usually refers to the assumptions that lead to the best fit between the model and the measurements.

Another difficulty is that the conditional probabilities are often far from “standard,” that is, far from probabilities that mathematicians or scientists understand well. This is mainly because models of complicated physical or engineered processes are themselves complicated. Drawing samples from conditional probabilities that arise from complex mathematical models is an active topic of mathematical research.

A third difficulty is that many problems are “big.” When we say “big,” we refer to the vast number of variables involved in defining the conditional probabilities. Returning to numerical weather prediction, a global weather model requires the specification of many variables. Ideally, we would need to define meteorological quantities such as temperature, wind, and humidity at every point on the Earth. However, achieving this level of detail is unattainable because it would require an infinite number of values. Instead, a more feasible approach is to define the meteorological variables on a predefined “grid” that covers a large but finite number of global locations. Even so, a global numerical weather model involves several hundred million variables. Moreover, the quantity of data is also significant: in practice, millions of measurements are assimilated into a global model every six hours.
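A back-of-the-envelope count shows how quickly the variables add up; the resolution numbers below are illustrative only and do not describe any particular operational system:

```python
# Rough size of a global weather model state on a latitude-longitude grid.
lat_points = 720    # ~0.25 degree spacing in latitude (illustrative)
lon_points = 1440   # ~0.25 degree spacing in longitude (illustrative)
levels = 90         # vertical model levels
fields = 5          # e.g. temperature, humidity, pressure, two wind components

state_size = lat_points * lon_points * levels * fields
print(f"{state_size:,}")  # prints 466,560,000: several hundred million variables
```

Even at this modest illustrative resolution the state already holds hundreds of millions of numbers, which is why the choice of assimilation algorithm is dominated by computational cost.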

The model itself and the data assimilation method must then be implemented as computer code. For a problem of the size of the global atmosphere, this requires careful algorithm design and coding. Current data assimilation technology for numerical weather prediction is often based on optimization techniques or methods for simplified (linearized) problems. Many researchers also work on combining the three approaches.

In conclusion, data assimilation is a robust framework for merging a mathematical model and observations to best estimate the state of the system being modeled, while accounting for the uncertainties of both. Many scientific and engineering problems require data assimilation. The list of problems where data assimilation is useful is very long, but it includes numerical weather prediction, oceanography, hydrology, personalized medicine, cognitive science, and robotics.

Given data assimilation’s wide use across such different applications, it is remarkable that these problems share essentially the same formulation in terms of conditional probabilities, and the same three main avenues to success (Monte Carlo methods, optimization, and linearization). It is up to the scientist or engineer to combine these methods into a suitable recipe for solving the data assimilation problem at hand. In the UQ project, we analyze linearized approximation for UQ in Ocean Biogeochemical Models.



Mark Asch, Marc Bocquet, and Maëlle Nodet. Data Assimilation: Methods, Algorithms, and Applications. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, USA, 2016.


Peter Bauer, Alan Thorpe, and Gilbert Brunet. The quiet revolution of numerical weather prediction. Nature, 525(7567):47–55, 2015. doi:10.1038/nature14956.


Alexandre J. Chorin and Ole H. Hald. Stochastic Tools in Mathematics and Science. Texts in Applied Mathematics. Springer New York, NY, 3rd edition, 2013.


Kody Law, Andrew Stuart, and Konstantinos Zygalakis. Data Assimilation: A Mathematical Introduction. Springer International Publishing, Switzerland, 2015.


Olivier Talagrand and Philippe Courtier. Variational assimilation of meteorological observations with the adjoint vorticity equation. I: Theory. Quarterly Journal of the Royal Meteorological Society, 113(478):1311–1328, 1987.


Nabir Mamnun