Adjoint Sensitivity Analysis in UQ [DRAFT]

Part of a series: Sensitivity analysis.

Sensitivity analysis plays a crucial role in estimating how the output of a computational model varies when its input data are uncertain. In uncertainty quantification (UQ) for numerical models with uncertain initial conditions, boundary conditions, and/or model parameters, derivative-based methods are particularly useful. Consider a simple ordinary differential equation (ODE) problem in which both the rate parameter \(Z_1\) and the initial condition \(Z_2\) are uncertain:

\[\begin{split} \left. \begin{array}{ll} \frac{d}{dt} u(t,Z) - Z_1u(t,Z) &= 0 \\ u(0,Z) - Z_2 &= 0 \end{array} \right \}= \mathcal{E}(u,Z) = 0 . \end{split}\]

Here, the goal is to compute the expected value and variance of a quantity of interest (QoI) \(R(u(Z))\), for example the solution at a final time \(T\), \(R(u(Z)) = u(T,Z)\). Viewed as a function of the inputs alone, this is the reduced QoI \(\tilde{R}(Z) = R(u(Z))\). The sensitivities are the partial derivatives of the QoI with respect to the input parameters and are essential for this task, because they give the first-order Taylor expansion of \(\tilde{R}\) around a nominal point \(Z^0\):

\[ \underbrace{\tilde{R}(Z)}_{\in \mathbb{R}} = \underbrace{\tilde{R}(Z^0)}_{\in \mathbb{R}} + \overbrace{\underbrace{\frac{\partial \tilde{R}(Z^0)}{\partial Z}}_{\in \mathbb{R}^{1 \times d}}}^{\text{sensitivities}} \underbrace{(Z - Z^0)}_{\in \mathbb{R}^d} + \mathcal{O}\left( (Z-Z^0)^2 \right). \]
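As a concrete illustration, the model ODE above has the closed-form solution \(u(t,Z) = Z_2 e^{Z_1 t}\), so the reduced QoI and its sensitivities can be written down by hand. The sketch below assumes that the QoI is the solution at the final time \(T\); the function names (`qoi`, `sensitivities`, `taylor_qoi`) and all numeric values are illustrative choices, not part of the original text.

```python
import math

T = 1.0  # final time (illustrative value)

def qoi(z1, z2, t=T):
    """Reduced QoI R~(Z) = u(T, Z) = Z2 * exp(Z1 * T) for du/dt = Z1 * u, u(0) = Z2."""
    return z2 * math.exp(z1 * t)

def sensitivities(z1, z2, t=T):
    """Row vector dR~/dZ = (dR~/dZ1, dR~/dZ2), derived analytically."""
    return (t * z2 * math.exp(z1 * t), math.exp(z1 * t))

def taylor_qoi(z, z0):
    """First-order Taylor approximation of R~ around the nominal point z0."""
    s = sensitivities(*z0)
    return qoi(*z0) + sum(si * (zi - z0i) for si, zi, z0i in zip(s, z, z0))

z0 = (-0.5, 2.0)    # nominal inputs Z^0 (assumed values)
z = (-0.48, 2.05)   # slightly perturbed inputs

exact = qoi(*z)
approx = taylor_qoi(z, z0)
print(f"exact = {exact:.6f}, first-order = {approx:.6f}, "
      f"error = {abs(exact - approx):.2e}")
```

For small perturbations the first-order approximation stays close to the exact value, and the gap grows quadratically with the size of the perturbation.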

The main goal in uncertainty quantification is to compute expected value and variance. In order to compute them, assume that \(Z\) has the pdf \(f\). Then, the expected value can be approximated as follows:

\[\begin{split} \begin{split} E[\tilde{R}(Z)] &= \int \tilde{R}(z) f(z) dz \\ &\approx \int \left( \tilde{R}(Z^0) + \frac{\partial \tilde{R} (Z^0)}{\partial Z}(z-Z^0) \right) f(z) dz \\ &= \tilde{R}(Z^0) + \frac{\partial \tilde{R}(Z^0)}{\partial Z} \int (z - Z^0) f(z)dz. \end{split} \end{split}\]

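If the linearization point is chosen as the mean, \(Z^0 = E[Z]\), the remaining integral vanishes and, to first order, \(E[\tilde{R}(Z)] \approx \tilde{R}(E[Z])\). With the same choice of \(Z^0\) and the shorthand \(S^T = \frac{\partial \tilde{R}(Z^0)}{\partial Z}\), the variance is approximated as follows: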

\[\begin{split} \begin{split} Var[\tilde{R}(Z)] &= E \left[ \left( \tilde{R}(Z)-E[\tilde{R}(Z)] \right)^2 \right] \\ &\approx E \left[ \left( \tilde{R}(Z)-\tilde{R}(Z^0) \right)^2 \right] \\ &\approx \int \left( \frac{\partial \tilde{R}(Z^0)}{\partial Z} (z-Z^0) \right)^2 f(z)dz \\ &= \int S^T (z-Z^0)(z-Z^0)^T S f(z)dz \\ &= S^T \underbrace{\int (z-Z^0)(z-Z^0)^T f(z)dz}_{cov(Z) \in \mathbb{R}^{d \times d}} \ S\\ &= S^T \ cov(Z) \ S. \end{split} \end{split}\]

The last expression is known as the sandwich rule.
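The first-order mean and the sandwich rule can be checked on the ODE example, whose solution at the final time is \(Z_2 e^{Z_1 T}\). The sketch below assumes independent Gaussian inputs with illustrative means and standard deviations (these distributions are an assumption, not part of the text); under that assumption the exact mean and variance follow from the moment-generating function of the normal distribution, which gives a reference to compare against.

```python
import numpy as np

T = 1.0                        # final time (illustrative)
mean = np.array([-0.5, 2.0])   # E[Z] = Z^0 (assumed values)
std = np.array([0.05, 0.1])    # assumed standard deviations
cov = np.diag(std**2)          # independent inputs give a diagonal covariance

def qoi(z):
    """Reduced QoI R~(z) = z2 * exp(z1 * T)."""
    return z[1] * np.exp(z[0] * T)

def sensitivity(z):
    """Analytic gradient of the reduced QoI with respect to (z1, z2)."""
    return np.array([T * z[1] * np.exp(z[0] * T), np.exp(z[0] * T)])

# First-order estimates: mean from the linearization, variance from the sandwich rule.
S = sensitivity(mean)
mean_lin = qoi(mean)
var_lin = S @ cov @ S

# Exact moments for independent Gaussian inputs, using E[exp(a*Z1)] = exp(a*m1 + a^2*s1^2/2).
m1, m2 = mean
s1, s2 = std
mean_exact = m2 * np.exp(m1 * T + 0.5 * (s1 * T) ** 2)
second_moment = (m2**2 + s2**2) * np.exp(2 * m1 * T + 2 * (s1 * T) ** 2)
var_exact = second_moment - mean_exact**2

print(f"mean: first-order {mean_lin:.5f}, exact {mean_exact:.5f}")
print(f"var:  sandwich    {var_lin:.6f}, exact {var_exact:.6f}")
```

The agreement is good here because the input variances are small; for larger variances or strongly nonlinear QoIs the first-order estimates degrade.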

The question of how to compute the sensitivities is still open. Since the interest is now in derivatives rather than in randomness, the uncertain parameters are written as \(z\) instead of \(Z\) from here on. Using the chain rule, define

\[ S^T = \frac{\partial \tilde{R}(z)}{\partial z} = \frac{\partial R (u(z))}{\partial z} = \frac{\partial}{\partial u} R(u(z)) \cdot \frac{\partial }{\partial z} u(z), \]

where \(z \in \mathbb{R}^d\). In order to get \(\frac{\partial }{\partial z} u(z)\), the model equation \(\mathcal{E}(u,z) = 0\) is differentiated with respect to \(z\),

\[\begin{split} \begin{split} \mathcal{E}(u(z),z) &= 0 \\ \underbrace{ \frac{\partial}{\partial u} \mathcal{E}(u(z),z) }_{\mathbb{R}^{n \times n}} \cdot \underbrace{ \frac{\partial }{\partial z} u(z)}_{\mathbb{R}^{n \times d}} + \underbrace{\frac{\partial}{\partial z} \mathcal{E}(u(z),z)}_{\mathbb{R}^{n \times d}} &= 0. \end{split} \end{split}\]

The dimensions of all the quantities are

\[ z \in \mathbb{R}^d , \ \ u(z) \in \mathbb{R}^n \ \ , R(u(z)) \in \mathbb{R}^q \]

and the model residual satisfies \(\mathcal{E}(u,z) \in \mathbb{R}^n\). Assuming that the solution exists and is unique, and that \(\frac{\partial }{\partial u} \mathcal{E}(u(z),z)\) is invertible, one can write

\[ \frac{\partial }{\partial z} u(z) = - \left( \frac{\partial}{\partial u} \mathcal{E}(u(z),z) \right)^{-1} \frac{\partial}{\partial z} \mathcal{E}(u(z),z). \]

In practice, \(\frac{\partial}{\partial z} u(z)\) is computed without forming the inverse, by solving the linear system

\[ \frac{\partial}{\partial u} \mathcal{E} (u(z),z) \cdot \frac{\partial}{\partial z} u(z) = - \frac{\partial}{\partial z} \mathcal{E}(u(z),z) \]

for its \(d\) right-hand side columns. Thus,

\[ S^T = - \frac{\partial}{\partial u} R(u(z)) \cdot \left( \frac{\partial }{\partial u} \mathcal{E}(u(z),z)\right)^{-1} \frac{\partial}{\partial z} \mathcal{E}(u(z),z). \]
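A minimal sketch of this direct computation, assuming a hypothetical linear model \(\mathcal{E}(u,z) = Au - Bz\) with a linear QoI \(R(u) = Cu\). The matrices `A`, `B`, `C` are random placeholders chosen only for illustration, and the finite-difference comparison at the end is a sanity check rather than part of the method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q = 6, 3, 1  # state size, number of inputs, number of QoIs (illustrative)

# Hypothetical linear model: E(u, z) = A u - B z = 0, with QoI R(u) = C u.
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)  # well conditioned
B = rng.standard_normal((n, d))
C = rng.standard_normal((q, n))

def solve_state(z):
    """Solve E(u, z) = 0 for the state u."""
    return np.linalg.solve(A, B @ z)

def reduced_qoi(z):
    """Reduced QoI R~(z) = R(u(z))."""
    return C @ solve_state(z)

dE_du = A    # shape (n, n)
dE_dz = -B   # shape (n, d)
dR_du = C    # shape (q, n)

# Direct sensitivity method: solve dE/du * du/dz = -dE/dz, one column per input.
du_dz = np.linalg.solve(dE_du, -dE_dz)   # shape (n, d)
S_T = dR_du @ du_dz                      # shape (q, d)

# Sanity check with central finite differences around an arbitrary point.
z0 = rng.standard_normal(d)
eps = 1e-6
fd = np.column_stack([
    (reduced_qoi(z0 + eps * e) - reduced_qoi(z0 - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print("max deviation from finite differences:", np.max(np.abs(S_T - fd)))
```

The call to `np.linalg.solve` with a right-hand side of shape `(n, d)` corresponds to the `d` linear systems counted in the text.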

Evaluating \(S^T\) in this way requires solving \(d\) linear systems of size \(n \times n\), one per column of \(\frac{\partial}{\partial z} \mathcal{E}\). For small \(d\) this is affordable, and the approach is called the direct sensitivity method. As \(d\) increases, however, it becomes computationally expensive. In this case the adjoint sensitivity method can be used instead. Consider the transpose of \(S^T\):

\[ \underbrace{S}_{\mathbb{R}^{d \times q}} = \underbrace{\left( - \frac{\partial }{\partial z} \mathcal{E}[u(z),z] \right)^{T}}_{\mathbb{R}^{d \times n}}\cdot \underbrace{\left( \frac{\partial }{\partial u} \mathcal{E}[u(z),z] \right)^{-T}}_{\mathbb{R}^{n \times n}} \cdot \underbrace{\left( \frac{\partial }{\partial u} R(u(z))\right)^T}_{\mathbb{R}^{n \times q}}. \]

Here, one needs to solve a linear system with the system matrix

\[ \left( \frac{\partial}{\partial u} \mathcal{E} [u(z),z] \right)^T \in \mathbb{R}^{n \times n} \]

and the right-hand side matrix

\[ \left( \frac{\partial}{\partial u} R(u(z)) \right)^T \in \mathbb{R}^{n \times q}. \]

In this formulation only \(q\) linear systems of size \(n \times n\) have to be solved. Since \(q\) is usually much smaller than \(d\), the adjoint sensitivity method saves a large amount of computational cost when \(d\) is very large. The same technique is known as backpropagation in the field of artificial neural networks (NN).
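The difference in the number of solves can be sketched on a hypothetical linear model \(\mathcal{E}(u,z) = Au - Bz\) with QoI \(R(u) = Cu\), where the Jacobians are constant matrices. All matrices and sizes below are random placeholders for illustration; the direct computation is included only to confirm that both routes give the same sensitivities.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, q = 40, 500, 1  # many inputs, a single QoI (illustrative sizes)

# Jacobians of the hypothetical linear model E(u, z) = A u - B z, R(u) = C u.
dE_du = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)  # A
dE_dz = -rng.standard_normal((n, d))                                # -B
dR_du = rng.standard_normal((q, n))                                 # C

# Adjoint method: solve (dE/du)^T lambda = (dR/du)^T, only q right-hand sides.
lam = np.linalg.solve(dE_du.T, dR_du.T)   # shape (n, q)
S_adjoint = -dE_dz.T @ lam                # shape (d, q)

# Direct method for comparison: d right-hand sides.
du_dz = np.linalg.solve(dE_du, -dE_dz)    # shape (n, d)
S_direct = (dR_du @ du_dz).T              # shape (d, q)

print("max difference:", np.max(np.abs(S_adjoint - S_direct)))
```

With `q = 1` the adjoint route needs a single solve with the transposed matrix, whereas the direct route needs one solve per input parameter.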

The formulation can be written more compactly by introducing a Lagrange multiplier (adjoint variable) \(\lambda\):

\[ S = - \left( \frac{\partial }{\partial z} \mathcal{E}[u(z),z] \right)^T \cdot \underbrace{\left( \frac{\partial }{\partial u} \mathcal{E}[u(z),z] \right)^{-T} \cdot \left( \frac{ \partial }{\partial u} R(u(z)) \right)^T}_{\lambda}. \]

Equivalently, \(\lambda\) solves the adjoint equation

\[ \left( \frac{\partial }{\partial u} \mathcal{E}[u(z),z] \right)^T \lambda = \left( \frac{\partial }{\partial u} R(u(z))\right)^T. \]

The same equations follow from a Lagrangian that combines the QoI with the model constraint,

\[ L(u,z,\lambda) = R(u) - \lambda^T \mathcal{E}(u,z). \]

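Setting the partial derivative with respect to \(\lambda\) to zero recovers the model equation \(\mathcal{E}(u,z) = 0\), and setting the partial derivative with respect to \(u\) to zero gives the adjoint equation for \(\lambda\). The partial derivative with respect to \(z\), evaluated at these \(u\) and \(\lambda\), then yields the adjoint sensitivity,

\[ \frac{\partial }{\partial z} L(u,z,\lambda) = - \lambda^T \frac{\partial}{\partial z} \mathcal{E}(u,z) = S^T = \frac{\partial }{\partial z} \tilde{R}(z) = \frac{ \partial }{\partial z} R(u(z)). \]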
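As a hedged sketch of how these three conditions fit together, the ODE \(\frac{d}{dt}u = z_1 u\), \(u(0) = z_2\) can be discretized with the explicit Euler scheme, which turns it into a finite-dimensional system \(\mathcal{E}(u,z) = 0\) for the vector of time-step values. The choice of explicit Euler, the function name `euler_adjoint_sensitivity`, and the numeric values are assumptions made for illustration; the QoI is taken to be the last time-step value.

```python
import numpy as np

def euler_adjoint_sensitivity(z1, z2, T=1.0, N=100):
    """Adjoint sensitivities of u_N for explicit Euler applied to du/dt = z1*u, u(0) = z2."""
    h = T / N
    a = 1.0 + h * z1

    # dL/dlambda = 0: forward solve of the discrete model E(u, z) = 0.
    u = np.empty(N + 1)
    u[0] = z2
    for k in range(N):
        u[k + 1] = a * u[k]

    # Residuals: E_0 = u_0 - z2 and E_{k+1} = u_{k+1} - a * u_k.
    dE_du = np.eye(N + 1) - a * np.eye(N + 1, k=-1)   # shape (n, n), n = N + 1
    dE_dz = np.zeros((N + 1, 2))
    dE_dz[1:, 0] = -h * u[:-1]                        # derivative with respect to z1
    dE_dz[0, 1] = -1.0                                # derivative with respect to z2

    # QoI R(u) = u_N, so dR/du is the last unit vector.
    dR_du = np.zeros(N + 1)
    dR_du[-1] = 1.0

    # dL/du = 0: adjoint equation (dE/du)^T lambda = (dR/du)^T.
    lam = np.linalg.solve(dE_du.T, dR_du)

    # dL/dz: sensitivity S = -(dE/dz)^T lambda.
    S = -dE_dz.T @ lam
    return u[-1], S

qoi_value, S = euler_adjoint_sensitivity(z1=-0.5, z2=2.0)
print("QoI u_N:", qoi_value)
print("adjoint sensitivities (d/dz1, d/dz2):", S)
```

For this discretization the QoI is \(u_N = z_2 (1 + h z_1)^N\), so the adjoint result can be compared against the derivatives \(z_2 N h (1 + h z_1)^{N-1}\) and \((1 + h z_1)^N\) obtained by hand.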

The adjoint sensitivity method is not restricted to finite-dimensional systems; it can also be extended to ODEs and PDEs. (For further explanation see Prof. Martin Frank’s lecture notes.)

Authors

Maqsood Mubarak Rajput