
This is a repository copy of Accurate solution of Bayesian inverse uncertainty quantification problems combining reduced basis methods and reduction error models.

White Rose Research Online URL for this paper:http://eprints.whiterose.ac.uk/95248/

Version: Accepted Version

Article:

Manzoni, A., Pagani, S. and Lassila, T. (2016) Accurate solution of Bayesian inverse uncertainty quantification problems combining reduced basis methods and reduction error models. SIAM/ASA Journal on Uncertainty Quantification, 4 (1). pp. 380-412. ISSN 2166-2525

https://doi.org/10.1137/140995817

[email protected]
https://eprints.whiterose.ac.uk/

Reuse

Unless indicated otherwise, fulltext items are protected by copyright with all rights reserved. The copyright exception in section 29 of the Copyright, Designs and Patents Act 1988 allows the making of a single copy solely for the purpose of non-commercial research or private study within the limits of fair dealing. The publisher or other rights-holder may allow further reproduction and re-use of this version - refer to the White Rose Research Online record for this item. Where records identify the publisher as the copyright holder, users can verify any specific terms of use on the publisher’s website.

Takedown

If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request.

Accurate solution of Bayesian inverse uncertainty

quantification problems combining reduced basis methods

and reduction error models ∗

A. Manzoni† S. Pagani‡ T. Lassila§

December 9, 2015

Abstract

Computational inverse problems related to partial differential equations (PDEs) often contain nuisance parameters that cannot be effectively identified but still need to be considered as part of the problem. The objective of this work is to show how to take advantage of a reduced-order framework to speed up Bayesian inversion on the identifiable parameters of the system, while marginalizing away the (potentially large number of) nuisance parameters. The key ingredients are twofold. On the one hand, we rely on a reduced basis (RB) method, equipped with computable a posteriori error bounds, to speed up the solution of the forward problem. On the other hand, we develop suitable reduction error models (REMs) to quantify in an inexpensive way the error between the full-order and the reduced-order approximation of the forward problem, in order to gauge the effect of this error on the posterior distribution of the identifiable parameters. Numerical results dealing with inverse problems governed by elliptic PDEs, in the case of both scalar parameters and parametric fields, highlight the combined role played by RB accuracy and REM effectivity.

1 Introduction

The efficient solution of inverse problems governed by partial differential equations (PDEs) is a relevant challenge from both a theoretical and a computational standpoint. In these problems, unknown or uncertain parameters related to a PDE model have to be estimated from indirect observations of suitable quantities of interest. Being able to design efficient solvers for inverse problems is paramount in several applications, ranging from life sciences (e.g., electrical impedance tomography [26, 32], characterization of myocardial ischemias and identification of blood flow parameters [42, 21]), to material sciences (e.g., scattering problems [10] or subsurface damage detection [3]) and environmental sciences (e.g., identification of permeability in groundwater flows [29], or basal sliding in ice dynamics [34]).

In a parametrized context, a forward problem consists in evaluating some outputs of interest (depending on the PDE solution) for specified parameter inputs. Whenever some parameters are uncertain, they can be identified by considering either a deterministic or a statistical framework. In the former case, we solve an optimization problem by minimizing (e.g. in the least-squares sense) the discrepancy between the output quantities predicted by the PDE model and the observations. In the latter case, we assess the relative likelihood of the parameters which are consistent with the observed output, that is, we need to quantify uncertainties associated with the identifiable parameters due to measurement errors and to nuisance parameters.

∗We are grateful to Prof. Anna Maria Paganoni (Politecnico di Milano) for her valuable insights and careful remarks, and to Prof. Alfio Quarteroni (EPFL and Politecnico di Milano) for his useful suggestions. This work was supported by the Italian “National Group of Computing Science” (GNCS-INDAM).

†CMCS-MATHICSE-SB, Ecole Polytechnique Fédérale de Lausanne, Station 8, CH-1015 Lausanne, Switzerland, [email protected] (corresponding author)

‡MOX, Dipartimento di Matematica, Politecnico di Milano, P.za Leonardo da Vinci 32, I-20133 Milano, Italy, [email protected]

§CISTIB, Department of Electronic and Electrical Engineering, University of Sheffield, Mappin Street, Sheffield S1 3JD, United Kingdom, [email protected]


Such a problem can be referred to as an inverse uncertainty quantification (UQ) problem. By relying on a Bayesian approach, we model the unknown parameters as random variables and characterize their posterior probability density function, which includes information both on prior knowledge of the parameters' distribution and on the model used to compute the PDE-based outputs [44]. In this way, inference of unknown parameters from noisy data accounts for the information coming from (possibly complex and nonlinear) physical models [45, 19]. Nevertheless, we need to face some key numerical challenges, related to (i) parametric dimensionality, (ii) slow Markov chain Monte Carlo (MCMC) convergence and (iii) many forward queries; see e.g. [16, 40] for a general introduction to MCMC techniques. While the first issue can be addressed by considering a parametric reduction (e.g. through a modal decomposition exploiting a Karhunen-Loève expansion [26] or suitable greedy algorithms [25]), several techniques have emerged in the last decade to speed up both MCMC sampling algorithms [28, 23] and the solution of the forward problem.

In this work we focus on this latter aspect, showing how a low-dimensional, projection-based reduced-order model (ROM) can be exploited to speed up the solution of Bayesian inverse problems dealing with parametrized PDEs. Among projection-based ROMs, reduced basis (RB) methods built through greedy algorithms [6, 21] or proper orthogonal decomposition (POD) [14, 26, 29, 39] have already been successfully exploited in this field during the last decade. Such methods approximate the solution of the forward problem by a handful of snapshots of the full-order problem – i.e., solutions computed for properly selected parameter values; as a result, each forward query becomes relatively inexpensive and the overall computational cost of Bayesian inversion can be greatly reduced. Very recently, a possible way to compute snapshots adaptively from the posterior distribution, yielding a data-driven ROM, has been shown in [12]. Proper generalized decomposition has also been combined with stochastic spectral methods to deal with dynamical systems in the presence of stochastic parametric uncertainties [11]. Besides projection-based ROMs, a low-fidelity model can also be built according to simplified physics, coarser discretizations, or multiscale formulations. Such a model can also be equipped with correction functions using global polynomials in terms of the stochastic parameters. For instance, non-intrusive polynomial chaos using orthogonal polynomials [15] and stochastic collocation using interpolation polynomials [2, 47] have been developed in conjunction with physics-based low-fidelity models [31]. See, e.g., [6] for a detailed discussion on the use of low-fidelity or surrogate models to speed up inverse problems.

A relevant question arising when a ROM is exploited to solve inverse UQ problems is related to the propagation of reduction errors along the inversion process. In other words, we need to quantify those uncertainties due to the use of a ROM and associated with the identifiable parameters, which we refer to as ROM uncertainty. The latter can be seen as a form of epistemic uncertainty (i.e., pertaining to uncertainty about a deterministic quantity that nevertheless cannot be fully ascertained) and its quantification is essential in order to obtain precise and robust solutions to the inverse UQ problem. ROM uncertainty is indeed quite similar to the so-called model form uncertainty, arising whenever only a limited understanding of the modeled process is available [18, 17]. Concerning the modeling of the reduction error itself, different approaches have been considered very recently: the so-called approximation error model [1, 26], Gaussian process (GP)-based calibration (or GP machine learning) [39], interpolant-based error indicators [33], and regression-based error indicators [13]. In all of these cases, a statistical representation of the ROM error through calibration experiments is used to model the difference between the full-order and the lower-order model. In the approximation error model, an additive Gaussian noise term is introduced in the model to represent the reduction error; in GP machine learning a Gaussian process is used, estimating the covariance function through the calibration experiments. However, error characterization through well-defined probability distributions may impose a large amount of (sometimes unjustified) statistical structure, thus leading to inaccurate results. For instance, in the approximation error model, the uncertainty yielded by the ROM is described as an independent Gaussian error term, which might be a restrictive assumption (see Sect. 7). On the other hand, GP machine learning techniques entail more severe computational costs; a possible reduction technique in this context has been recently presented in [13] and takes advantage of a posteriori error bounds, weighted with a Gaussian process to generate a more accurate correction.


One of our proposed reduction error models (see Sect. 4.3) shares some similarities with this approach, but enables variance estimation directly from a regression analysis and an easier treatment of sign effects/corrections, an aspect not treated in [13].

In contrast to the approaches taken in [13, 33], where the objective was to construct reduction error models that could be used to train or adapt the ROM accordingly in case no other error estimator was readily available, our proposed approach is instead aimed at the accurate solution of inverse problems using ROMs, and may use existing ROM error bounds as additional information, if available. In previous work [21] we have shown that using low-fidelity ROMs as surrogates for solving inverse problems can lead to biased and overly optimistic posterior distributions. Our goal is to exploit the existing features of the ROM, such as rigorous a posteriori error estimators, and then to correct for the reduction error within the Bayesian inversion process through suitable reduction error models (REMs). By extending some preliminary ideas presented in [21], we show how to take advantage of RB error bounds to gain a strong computational speedup without necessarily increasing the size of the reduced basis or the resulting online computational cost. Hence, we (i) propose three reduction error models (REMs) to manage ROM uncertainties, (ii) include them within the Bayesian computational framework, and (iii) show how they allow us to obtain posterior distributions that are free of bias and more reliable than those provided by the ROM alone.

First, we test the approximation error model of [1] (called REM-1 from here on), which requires no a posteriori error bounds to be constructed, but which may perform quite poorly when dealing with problems involving nuisance parameters and/or empirical error distributions far from the Gaussian case. Then, we present two alternatives. In the second approach (REM-2), a radial basis interpolant allows us to evaluate a surrogate reduction error over the parameter space, relying on few calibration experiments. A special feature of this interpolant-based approach, compared to others presented in the literature, is that instead of directly interpolating the ROM error we interpolate the (inverse) effectivity of the ROM error estimator. In the third approach (REM-3), we fit a log-linear regression model to explain the variability of the reduction errors by means of the corresponding error bounds.

The structure of the paper is as follows. In Sect. 2 we provide a general formulation of the class of problems we are interested in, whereas in Sect. 3 we recall the main properties of RB methods. In Sect. 4 we introduce the three reduction error models considered, and we show how to incorporate them into the Bayesian estimator; in Sect. 5 we sum up the proposed computational procedure for the solution of a reduced inverse problem. In Sect. 6 we prove some results related to the effectivity of the corrections made by the REMs on the reduced-order likelihood function and the corresponding posterior distribution. We assess the performance of the proposed framework on two numerical examples of inverse problems governed by elliptic PDEs in Sect. 7, and finally draw some conclusions in Sect. 8.

2 Bayesian inverse problems governed by PDEs

Let us introduce the abstract formulation of the forward problem modeling our system of interest, and recall the basic features of the Bayesian framework for the solution of inverse problems governed by PDEs.

2.1 Abstract formulation of forward problem

In this paper we consider systems modelled by linear parametrized PDEs and (a set of) linear outputs of interest. Let us denote by X a Hilbert space of functions defined over a domain Ω ⊂ R^d, d = 1, 2, 3. In the case of second-order, elliptic PDE operators, typically

(H^1_0(Ω))^ν ⊂ X ⊂ (H^1(Ω))^ν,

where ν = 1 (resp. ν = d) in the case of scalar (resp. vectorial) problems. Here we restrict ourselves to the scalar case; the extension to the vectorial case is straightforward, as far as the formulation of an inverse problem is concerned. The forward problem depends on a finite-dimensional set of parameters µ ∈ P ⊂ R^P.


Its abstract formulation reads as follows: given µ ∈ P, find u(µ) ∈ X such that

a(u(µ), v; µ) = f(v; µ)  ∀v ∈ X   (state equation)
s(µ) = ℓ(u(µ))   (observation equation).   (1)

Here a : X × X × P → R and f : X × P → R are a parametrized bilinear (resp. linear) form, s : P → R^s and ℓ = (ℓ_1, . . . , ℓ_s), with ℓ_j : X → R a linear form for any j = 1, . . . , s. This assumption is not as restrictive as one would think, since for example field-valued inverse problems can be treated by discretizing the underlying field with sufficient accuracy (P ≫ 1) or by employing a truncated Karhunen-Loève expansion. In particular, we assume that the parameter vector is divided into two parts: µ = (γ, ζ) ∈ R^(P_γ + P_ζ); we denote by γ the identifiable parameters, and by ζ the nuisance parameters. For the cases at hand, both identifiable and nuisance parameters will be related to physical properties of the system, thus entering the differential operator and possibly boundary conditions and source terms. See e.g. [21] for further details about the case of geometrical parameters.

We require that the forward problem is well-posed for any choice of the parameter vector µ ∈ P. To this aim, we assume that a(·, ·; µ) is continuous and uniformly coercive over X for any µ ∈ P, and that f(·; µ) is continuous, that is, f(·; µ) ∈ X′ for any µ ∈ P, where X′ is the dual space of X. We also require that ℓ_j ∈ X′ for any j = 1, . . . , s. The Lax-Milgram lemma ensures uniqueness of the solution (and its continuous dependence on the data) for any µ ∈ P. This framework can also be adapted to stable problems in the sense of an inf-sup condition, by using the Babuška-Nečas theorem; see e.g. [36] for further details.

A numerical approximation of the forward problem (1) can be obtained by introducing, e.g., a Galerkin-finite element (FE) method relying on a finite-dimensional space X_h ⊂ X of (possibly very large) dimension dim(X_h) = N_h; here h denotes a FE mesh parameter. Hence, the full-order model (FOM) related to the forward problem reads as follows: given µ ∈ P, find u_h(µ) ∈ X_h such that

a(u_h(µ), v_h; µ) = f(v_h; µ)  ∀v_h ∈ X_h   (FOM state),
s_h(µ) = ℓ(u_h(µ))   (FOM observation).   (2)

Under the above assumptions, problem (2) is well-posed and admits a unique solution for any µ ∈ P. In particular, the following continuous dependence on data estimate holds:

‖u_h(µ)‖_X ≤ ‖f(·; µ)‖_X′ / α_h(µ)   ∀µ ∈ P,

being α_h(µ) the (discrete) stability factor related to the PDE operator, that is

α_h(µ) = inf_{v_h ∈ X_h} a(v_h, v_h; µ)/‖v_h‖²_X ≥ α_0 > 0   ∀µ ∈ P,

for a suitable α_0 ∈ R. Being able to efficiently evaluate a tight lower bound 0 < α_h^LB(µ) ≤ α_h(µ) of the stability factor for any µ ∈ P plays a key role in the a posteriori error bounds related to the RB approximation and, finally, in the solution of the inverse problem.

If the forward problem consists of solving (2) to predict the outcome of an experiment – by computing u(µ) and evaluating the output s(µ) – in an inverse problem observed data or measurements s∗ are used to estimate the unknown parameters µ characterizing the physical system. Such a problem can be cast in the optimal control framework, see e.g. [21] for further details. If s∗ is an experimental measure, possibly polluted by measurement error, we need to rely instead on a statistical framework in order to quantify uncertainties (due to both measurement errors and nuisance parameters) on the estimated parameter values.

2.2 Bayesian framework

We consider a Bayesian framework [19, 44, 45] for the solution of inverse UQ problems. We model both the observations s∗ and the parameters µ as random variables, by introducing suitable probability density functions (PDFs). The solution of the inverse problem is given by a point or


interval estimation computed on the basis of the posterior probability density π_post : P × Y → R⁺₀, i.e. the probability density of the parameter µ given the measured value of s∗, which, thanks to Bayes' theorem, can be obtained as

π_post(µ | s∗) = π(s∗ | µ) π_prior(µ) / η(s∗).   (3)

Here π_prior : P → R⁺₀ is the prior probability density, expressing all available information on µ independently of the measurements on s∗ that will be considered as data; π : Y × P → R⁺₀ is the likelihood function of s∗ conditionally on µ; finally

η(s∗) = ∫_P π(s∗ | µ) π_prior(µ) dµ.

In order to describe measurement errors, we consider an additive noise model, that is, if we suppose that µ is the true parameter, the outcome of the experiment is

s∗ = sh(µ) + εnoise = ℓ(uh(µ)) + εnoise, (4)

where the measurement noise ε_noise follows a probability distribution π_noise. In this way, our data are d noisy s-variate measures s∗_1, . . . , s∗_d, with s∗_i ∈ R^s for any i = 1, . . . , d, modelled by assuming that the outcome of the numerical model is given by the output evaluated for the true parameter value. The most typical description of experimental uncertainties is the Gaussian model, that is, we deal with normally distributed, uncorrelated errors ε_noise ∼ N(0, σ²_i δ_ij), i, j = 1, . . . , s, with known variances σ²_i, independent of µ. We also denote the likelihood function appearing in (3) by highlighting the dependence on the FE space, as

π(s∗|µ) = π_h(s∗|µ) = π_ε(s∗ − s_h(µ)),   (5)

so that the expression of the posterior PDF given by (3) becomes

π_post^h(µ | s∗) = π_h(s∗|µ) π_prior(µ) / η_h(s∗),  being  η_h(s∗) = ∫_P π_h(s∗|µ) π_prior(µ) dµ.   (6)

We highlight the dependence of the posterior PDF on the FE space, too. If the output depends linearly on the parameters and we choose a Gaussian prior, the posterior is also Gaussian. Instead, as soon as µ ↦ s(µ) is a nonlinear map, the expression of the likelihood function yields a posterior distribution which cannot be written in closed form, requiring instead an exhaustive exploration of the parameter space. This becomes very hard to perform, above all if the parameter space has a large dimension. We then need to rely on MCMC to sample the posterior PDF, through e.g. the well-known Metropolis-Hastings or Gibbs sampling techniques [16, 24]. These methods are exploited to draw a sequence of random samples from a (multi-dimensional) PDF which cannot be expressed in closed form, not only to approximate the posterior PDF, but also to compute integrals related to this distribution. Then, since we are not interested in the nuisance parameters ζ, we proceed to marginalize them out. This leads to computing the conditional marginal distribution

π_post^h(γ | s∗) = (1/η_h(s∗)) ∫ π_h(s∗|µ) π_prior(γ, ζ) dζ.   (7)

MCMC methods are needed to evaluate (possibly) high-dimensional integrals like the one in (7). These methods involve repeated evaluations of the likelihood function π_h(s∗ | γ, ζ) – and thus repeated evaluations of the forward problem (2) – so that relying on the FOM would be too expensive even in the case of linear elliptic problems.
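In practice, once MCMC samples of the joint posterior are available, the marginalization in (7) requires no extra integration: it suffices to discard the ζ-components of the chain. A minimal Python sketch of this post-processing step follows; the chain array is a hypothetical placeholder for actual MCMC output.

import numpy as np

# Placeholder chain of joint posterior samples mu = (gamma, zeta);
# in practice these come from the Metropolis-Hastings sampler of Sect. 5.
chain = np.random.randn(10000, 3)      # hypothetical: P_gamma = 1, P_zeta = 2

P_gamma = 1
gamma_samples = chain[:, :P_gamma]     # dropping the zeta-columns marginalizes (7)

# point and interval estimates based on the marginal posterior
gamma_mean = gamma_samples.mean(axis=0)
gamma_ci = np.percentile(gamma_samples, [2.5, 97.5], axis=0)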

Therefore, we seek to replace (2) with a computationally less expensive, reduced-order model providing an inexpensive approximation u_n(µ) to u_h(µ). This allows us to compute a reduced-order (and inexpensive) approximation s_n(µ) to the full-order output s_h(µ). Replacing the full-order likelihood function π_h with its ROM approximation

πn(s∗|µ) = πε(s∗ − sn(µ)) (8)

clearly affects the posterior distribution, which changes as follows:


π_post^n(µ | s∗) = π_n(s∗|µ) π_prior(µ) / η_n(s∗),  being  η_n(s∗) = ∫_P π_n(s∗|µ) π_prior(µ) dµ.   (9)

Consequently, the marginal PDF of the identifiable parameters becomes

π_post^n(γ | s∗) = (1/η_n(s∗)) ∫ π_n(s∗|µ) π_prior(γ, ζ) dζ.   (10)

Nevertheless, being able to quantify the ROM effect on the inversion procedure by relying on a suitable measure of the error s_h(µ) − s_n(µ) is paramount: the main goal of this paper is to set up a suitable procedure to answer this question.

3 Reduced order models and a posteriori error bounds

Solving large PDE systems for several parameter values may require huge computational resources, unless efficient and reliable ROMs for parametrized PDEs are used. In this paper we rely on a certified reduced basis method, whose basic ingredients are recalled in this section. A general introduction to this method can be found e.g. in [37]; see also [41].

3.1 Reduced subspaces and projection-based ROMs

The RB method is a projection-based ROM which allows us to compute an approximation u_n(µ) of the solution u_h(µ) (as well as an approximation s_n(µ) of the output s_h(µ)) through a Galerkin projection onto a reduced subspace X_n: given µ ∈ P, find u_n(µ) ∈ X_n s.t.

a(u_n(µ), v_n; µ) = f(v_n; µ)  ∀v_n ∈ X_n   (ROM state)
s_n(µ) = ℓ(u_n(µ))   (ROM observation),   (11)

where dim(X_n) = n ≪ N_h. The reduced subspace X_n is constructed from a set of (well-chosen) full-order solutions, usually by exploiting one of the following techniques [22, 38]:

• Greedy algorithm [35, 46]. Basis functions are obtained by orthonormalizing a set of full-order solutions, corresponding to a specific choice S_n = {µ¹, . . . , µⁿ} of parameter values, built by means of the following greedy procedure. Let us denote by Ξ_train ⊂ P a (sufficiently rich) finite training sample, selected from P according to a uniform distribution. Given a prescribed µ¹ ∈ Ξ_train and a sharp, inexpensive error bound ∆_n(µ) (see Sect. 3.3) such that

‖u_h(µ) − u_n(µ)‖_X ≤ ∆_n(µ)   for all µ ∈ P,

we choose the remaining parameter values as

µⁿ := arg max_{µ ∈ Ξ_train} ∆_{n−1}(µ),   for n = 2, . . . , n_max,

until ∆_{n_max}(µ) ≤ ε_RB^tol for all µ ∈ Ξ_train, being ε_RB^tol a suitably small tolerance.

• Proper orthogonal decomposition [4, 43]. In this case, the reduced subspace X_n is given by the first n (left) singular vectors of the snapshot matrix S = [u_h(µ¹) | . . . | u_h(µ^Ns)] ∈ R^(Nh×Ns), corresponding to the largest n singular values σ_1 ≥ σ_2 ≥ . . . ≥ σ_n. Here u_h(µ¹), . . . , u_h(µ^Ns) are N_s full-order solutions of the forward problem, computed for a random sample µ¹, . . . , µ^Ns. By construction, the POD basis is orthonormal; moreover, the error in the basis is equal to the sum of the squares of the singular values corresponding to the neglected modes, and the maximum subspace dimension is such that ∑_{i=n+1}^r σ²_i ≤ ε_POD^tol, r = min{N_s, N_h}, being ε_POD^tol a suitably small tolerance (a minimal computational sketch is given below).
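For illustration, a minimal POD construction via the thin SVD of the snapshot matrix might read as follows; the snapshot matrix and the tolerance are hypothetical placeholders.

import numpy as np

def pod_basis(S, tol):
    # left singular vectors and singular values of the snapshot matrix
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    residual = np.sum(sigma**2) - np.cumsum(sigma**2)  # sum of squared neglected modes
    n = int(np.argmax(residual <= tol)) + 1            # smallest n meeting the tolerance
    return U[:, :n]                                    # orthonormal reduced basis

S = np.random.rand(500, 40)        # hypothetical snapshots: Nh = 500, Ns = 40
Xn = pod_basis(S, tol=1e-6)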

3.2 Affine parametric dependence and Offline/Online decomposition

Constructing the reduced subspace requires several evaluations of the FOM, which are performed only once, during the so-called Offline stage. Each Online evaluation of the reduced solution (and


related output) requires solving a problem of very small dimension n ≪ N_h. Such an Offline/Online decomposition is made possible under the assumption that a suitable affine parametric dependence property is fulfilled by the µ-dependent operators. Hence, we require that a(·, ·; µ), f(·; µ) can be written as a separable expansion of µ-independent bilinear/linear forms:

a(u, v; µ) = ∑_{q=1}^{Qa} Θ_q^a(µ) a_q(u, v),   f(v; µ) = ∑_{q=1}^{Qf} Θ_q^f(µ) f_q(v),

for some integers Q_a, Q_f. A similar decomposition would be required also for the linear forms ℓ_j, j = 1, . . . , s, if they were µ-dependent, too.

3.3 A posteriori error bounds

We can easily derive an a posteriori (residual-based) error bound with respect to the full-order solution, for both the PDE solution and linear outputs [37, 38]. Let us denote by r(w; µ) = f(w; µ) − a(u_n(µ), w; µ), for any w ∈ X_h, the residual of the state equation (evaluated on the RB solution u_n(µ)), and by ‖r(·; µ)‖_X′ = sup_{v ∈ X_h} r(v; µ)/‖v‖_X its dual norm. Then, the error bound on the solution reads as follows:

‖u_h(µ) − u_n(µ)‖_X ≤ ∆_n(µ) := ‖r(·; µ)‖_X′ / α_h^LB(µ)   ∀µ ∈ P.   (12)

We remark that the computation of the dual norm of the residuals, as well as of the lower bound α_h^LB to the stability factors, also takes advantage of a similar Offline-Online stratagem, allowing us to get an inexpensive evaluation of the error bound for each µ ∈ P; see e.g. [38].

Regarding the error bound on the output, which is relevant to the Bayesian inversion, we recall here its expression in the case of linear, noncompliant outputs, as the numerical test cases presented in the remainder deal with this situation. For any ℓ_j ∈ X′, let us introduce the following (full-order approximation of the) dual problem: find ψ_h^j(µ) ∈ X_h such that

a(v_h, ψ_h^j(µ); µ) = −ℓ_j(v_h)   ∀v_h ∈ X_h.

In addition to the reduced space for the forward problem (2) (which can be referred to as the primal problem), let us define a reduced subspace for each dual problem, by using the same algorithm (either greedy-RB or POD) chosen for the primal problem. Here X_n^j denotes the dual subspace related to the j-th output, although the dimension of each of these subspaces can be different, and differ from n. The resulting RB approximation ψ_n^j(µ) ∈ X_n^j solves

a(v_n, ψ_n^j(µ); µ) = −ℓ_j(v_n)   ∀v_n ∈ X_n^j

and is required to get the following error bound on the output:

|s_h^j(µ) − s_n^j(µ)| ≤ ∆_n^j(µ) ≡ ( ‖r(·; µ)‖_(X_h)′ / (α_h^LB(µ))^(1/2) ) · ( ‖r^j(·; µ)‖_(X_h)′ / (α_h^LB(µ))^(1/2) )   ∀µ ∈ P,   (13)

where r^j(w; µ) = −ℓ_j(w) − a(w, ψ_n^j(µ); µ), ∀w ∈ X_h, is the dual residual related to the j-th output. Here s_h^j(µ) = ℓ_j(u_h(µ)) denotes the full-order j-th output, whereas s_n^j(µ) = ℓ_j(u_n(µ)) is the corresponding ROM output. In the same way, for the RB dual solution we have:

‖ψ_h^j(µ) − ψ_n^j(µ)‖_X ≤ ∆_n^(ψ_j)(µ) := ‖r^j(·; µ)‖_X′ / α_h^LB(µ)   ∀µ ∈ P.

As already remarked for the primal residuals, the dual residuals can also be efficiently evaluated by taking advantage of the affine µ-dependence. We also denote by

Φ_n^j(µ) = ∆_n^j(µ) / (s_h^j(µ) − s_n^j(µ))   ∀µ ∈ P,  1 ≤ j ≤ s,

the effectivity of the estimator ∆_n^j(µ). It is possible to show that 1 ≤ |Φ_n^j(µ)| ≤ M_h(µ)/α_h^LB(µ), where M_h(µ) is the FE continuity constant of a(·, ·; µ); see e.g. [37] for further details.
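Note that, once the residual dual norms and the lower bound α_h^LB(µ) are available from the Offline stage, both the bound (13) and (over the calibration set, where s_h^j is known) the effectivity Φ_n^j(µ) cost only a few scalar operations per query. A schematic sketch with placeholder values:

def output_error_bound(primal_res_norm, dual_res_norm, alpha_LB):
    # bound (13): product of primal and dual residual dual norms over alpha_LB
    return primal_res_norm * dual_res_norm / alpha_LB

def effectivity(bound, true_error):
    # Phi_n^j(mu): ratio between the bound and the true output error
    return bound / true_error

bound = output_error_bound(primal_res_norm=1e-3, dual_res_norm=2e-3, alpha_LB=0.5)
phi = effectivity(bound, true_error=1.5e-6)    # |phi| >= 1 for a rigorous bound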


4 Reduction Error Models

Being able to evaluate the output quantity of a PDE system at greatly reduced cost is essential to speed up the solution of inverse UQ problems within a Bayesian framework. Our goal, once an RB approximation has been built in the offline stage, is to exploit its fast and inexpensive online queries to speed up the evaluation of the posterior PDF, of related (point or interval) estimates, and of MCMC integrals like (7) or (10). Moreover, by taking into account reduction errors or available error bounds, we can obtain reliable solutions at the end of the inversion process, too. Although ROMs have been exploited to speed up the solution of inverse problems in several works, very few papers have focused on the analysis of reduction error propagation; see, e.g., [23, 8].

In particular, we wish to incorporate a model for the error engendered by the ROM into the Bayesian estimator. To this end, we provide suitable (both deterministic and statistical) reduction error models (REMs), possibly by exploiting available error bounds on the outcome of the forward problem. A basic observation is made possible by expressing, when dealing with linear (with respect to the PDE solution) outputs, the measurement equation (4) as

s∗ = ℓ(u_n(µ)) + [ℓ(u_h(µ)) − ℓ(u_n(µ))] + ε_noise = ℓ(u_n(µ)) + δ(µ),   (14)

where δ(µ) = ε_ROM(µ) + ε_noise and

ε_ROM(µ) = [s_h^1(µ) − s_n^1(µ), . . . , s_h^s(µ) − s_n^s(µ)]^T   (15)

is the reduction error, that is, the error due to the ROM approximation of the forward problem and related output. Although in principle ε_ROM is deterministic, in practice its evaluation is out of reach, since it would require, for any µ, the solution of the full-order model. Here we propose three approaches for approximating the reduction error ε_ROM(µ) by a suitable indicator ε_#(µ), # = 1, 2, 3, according to suitable reduction error models which can be easily incorporated in the Bayesian framework. In particular:

• in [REM-1] we treat ROM errors as epistemic uncertainties, represented through randomvariables, following the so-called approximation error model [1];

• in [REM-2] we provide a deterministic approximation of ROM errors by considering radial basis interpolants of the error over the parameter space;

• in [REM-3] we express ROM errors through a linear model depending on output errorbounds, fitted through regression analysis.

Hence, we end up with error indicators which can be either deterministic – that is, ε(µ) = m_ROM(µ), being m_ROM(µ) a suitable function of µ – or expressed through a random variable ε(µ), whose distribution π_ε is characterized by m_ROM(µ) = E[ε(µ)] and Σ_ROM(µ) = Cov[ε(µ)]. Correspondingly, we end up with a corrected reduced-order likelihood

π̃_n(s∗|µ) = π_ε(s∗ − s_n(µ) − m_ROM(µ))   (deterministic REM),
π̃_n(s∗|µ) = π_δ(s∗ − s_n(µ) − m_ROM(µ))   (statistical REM),   (16)

being δ(µ) = ε(µ) + ε_noise and Cov[δ(µ)] = Σ_noise(µ) + Σ_ROM(µ), assuming that ROM errors and measurement noise are independent. Correspondingly, we obtain the following corrected reduced-order posterior PDF

π̃_post^n(µ | s∗) = π̃_n(s∗|µ) π_prior(µ) / η̃_n(s∗),  being  η̃_n(s∗) = ∫_P π̃_n(s∗|µ) π_prior(µ) dµ,   (17)

yielding a similar correction in the marginal PDF of the identifiable parameters (10). In the following subsections we discuss the construction of these three REMs.

4.1 REM-1: approximation error model

Following the so-called approximation error model of Kaipio et al. [1], we assume that ROM errors are the outcome of a Gaussian random variable, so that ε_ROM(µ) is replaced by


ε_1(µ) ∼ N(m_ROM, Σ_ROM),   (18)

being m_ROM ∈ R^s and Σ_ROM ∈ R^(s×s) the (constant) sample mean and covariance, respectively, obtained by sampling the N_cal errors {s_h(µ^k) − s_n(µ^k)}_{k=1}^{Ncal}:

m_ROM = (1/N_cal) ∑_{k=1}^{Ncal} ε_ROM(µ^k),
Σ_ROM = (1/(N_cal − 1)) ∑_{k=1}^{Ncal} (ε_ROM(µ^k) − m_ROM)(ε_ROM(µ^k) − m_ROM)^T.

We refer to the random sample S_Ncal = {µ¹, . . . , µ^Ncal} as the calibration set, since the computational experiments leading to ε_ROM(µ), µ ∈ S_Ncal, are additional queries to both the FOM and the ROM, required to characterize our REM-1. In this case, the correction does not depend on µ by construction. If we assume in addition that measurement errors are Gaussian – that is, ε_noise ∼ N(0, Σ_noise) – and independent from ROM errors, we have that

δ_1(µ) := ε_noise + ε_1(µ) ∼ N(m_ROM, Σ_noise + Σ_ROM).   (19)

Indeed, the effect of the ROM error results in a shift of the likelihood function and an additional contribution to its variance, provided the normality assumption on the errors evaluated over S_Ncal is fulfilled. In this case, only a slight modification of the numerical solver is required within the MCMC process, thus making this REM particularly easy to implement. Nevertheless, ROM errors very often do not follow a Gaussian distribution, so that further operations are required in order to use such a REM.
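In code, the Gaussian version of REM-1 reduces to a sample mean and covariance over the calibration errors, followed by the shifted, variance-inflated likelihood (19). A minimal sketch, where the calibration outputs are hypothetical placeholders:

import numpy as np

# hypothetical calibration outputs (Ncal x s): full order and reduced order
s_h_cal = np.random.rand(100, 3)
s_n_cal = s_h_cal + 1e-3 * np.random.randn(100, 3)

err = s_h_cal - s_n_cal                    # sampled errors eps_ROM(mu^k)
m_ROM = err.mean(axis=0)                   # constant sample mean
Sigma_ROM = np.cov(err, rowvar=False)      # constant sample covariance

def rem1_loglik(s_star, s_n_mu, Sigma_noise):
    # corrected Gaussian log-likelihood (19): shifted mean, inflated covariance
    Sigma = Sigma_noise + Sigma_ROM
    r = s_star - s_n_mu - m_ROM
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (r @ np.linalg.solve(Sigma, r) + logdet + r.size * np.log(2 * np.pi))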

To overcome this fact, we can generalize the previous framework by considering any (parametric) distribution for the ROM errors, conveniently fitted on the set of calibration experiments, that is, ε_1(µ) ∼ π_ε, possibly depending on a set of shape parameters. Although we assume that ROM errors are independent from (Gaussian) measurement errors, the distribution of δ_1(µ) = ε_noise + ε_1(µ) cannot be found, in general, in closed form. In these cases, at each MCMC step, we characterize δ_1(µ), the sum of two independent random variables, through the convolution of their densities:

π_δ1(z) = ∫ π_ε(z − ν) π_noise(ν) dν.   (20)

The evaluation of this integral can be performed, e.g., by an internal MCMC algorithm, which does not feature expensive extra calculations at each step of the outer MCMC process.

Thus, in the case of REM-1, a global approximation of the ROM error over the parameter space is provided – hence, one not depending on µ – by prescribing the distribution of a random variable fitted over a sample of calibration experiments.

4.2 REM-2: radial basis interpolation

Despite being straightforward, REM-1 can perform badly, for instance when ROM errors cannot be explained by means of a statistical distribution in closed form, because of their complex variability over the parameter space. For this reason, we turn instead to a local error model, still exploiting the errors {s_h(µ^k) − s_n(µ^k)}_{k=1}^{Ncal} computed over a calibration set S_Ncal, by considering a radial basis interpolant for each output component. In particular, a deterministic µ-dependent correction can be obtained as

s_h^j(µ) − s_n^j(µ) ≃ Π^j(µ) ∆_n^j(µ),   j = 1, . . . , s,

where Π^j(µ) is a weighted combination of radial basis functions (RBF), i.e.

Π^j(µ) = ∑_{k=1}^{Ncal} w_k^j φ(‖µ − µ^k‖),   j = 1, . . . , s,

and the coefficients {w_k^j}_{k=1}^{Ncal} are determined so that Π^j fulfills the following interpolation constraints over the calibration set:


(s_h^j(µ^k) − s_n^j(µ^k)) / ∆_n^j(µ^k) = Π^j(µ^k),   k = 1, . . . , N_cal,  j = 1, . . . , s.

In other words, we compute an interpolant of the inverse effectivities (Φ_n^j(µ))⁻¹ ∈ [−1, 1], for each j = 1, . . . , s. This choice yields more accurate results than those obtained by interpolating ROM errors over the calibration set, that is, by considering s_h^j(µ^k) − s_n^j(µ^k) = Π^j(µ^k), k = 1, . . . , N_cal, j = 1, . . . , s (see Sect. 7.1). Here φ : R⁺₀ → R is a fixed shape function, radial with respect to the Euclidean distance ‖ · ‖ over R^P; we use multiquadric basis functions (φ(r) = √(1 + r²)), see e.g. [9] for other available options. Thus, in our deterministic REM-2 we replace the ROM errors ε_ROM(µ) by ε_2(µ), with

m_ROM(µ) = [Π¹(µ)∆_n¹(µ), . . . , Π^s(µ)∆_n^s(µ)]^T,   (21)

which features a µ-dependent shifting of the likelihood function. If in addition we want to take into account the variability of the ROM errors, we can incorporate the sample covariance evaluated over the calibration set as in REM-1, and thus consider

δ_2(µ) := ε_noise + ε_2(µ) ∼ N(m_ROM(µ), Σ_noise + Σ_ROM).

Assuming that ROM errors are Gaussian might be very limiting, as highlighted in the previous section. Here we consider this argument just as a heuristic correction.

Although very simple to characterize, REM-2 suffers from the usual curse of dimensionality of multivariate interpolation, which arises from the fact that simple choices of interpolation node sets (such as tensor product grids) grow exponentially in size as the parametric dimension P = dim(P) is increased, so that both sampling the parameter space and evaluating calibration experiments rapidly become less and less affordable even for relatively small P (say P > 5). Sparse grid techniques provide a partial but not totally satisfactory resolution to this problem [5]. Nevertheless, RBF interpolation is suitable also for scattered data and, for the cases at hand, shows a very good compromise between accuracy and simplicity.
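A minimal multiquadric implementation of REM-2 is sketched below: the inverse effectivities are interpolated over the calibration set, and the µ-dependent shift (21) is reconstructed online by rescaling with the error bound. All calibration arrays are hypothetical placeholders.

import numpy as np

def multiquadric(r):
    return np.sqrt(1.0 + r**2)                       # phi(r) = sqrt(1 + r^2)

def rbf_fit(mu_cal, values):
    # solve the interpolation system Phi w = values over the calibration points
    r = np.linalg.norm(mu_cal[:, None, :] - mu_cal[None, :, :], axis=2)
    return np.linalg.solve(multiquadric(r), values)

def rbf_eval(mu, mu_cal, weights):
    return multiquadric(np.linalg.norm(mu_cal - mu, axis=1)) @ weights

mu_cal = np.random.rand(100, 3)                      # Ncal calibration points in P
inv_eff = 0.5 * (2 * np.random.rand(100) - 1)        # (s_h^j - s_n^j)/Delta_n^j in [-1,1]
w = rbf_fit(mu_cal, inv_eff)

mu, Delta_n_j = np.random.rand(3), 1e-3              # online query (placeholders)
m_ROM_j = rbf_eval(mu, mu_cal, w) * Delta_n_j        # mu-dependent correction (21)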

4.3 REM-3: linear regression model

A possible way to overcome the curse of dimensionality is to rely on a model where the surrogate ROM error is computed as a function of a scalar quantity depending on µ, rather than on µ itself, no matter the parametric dimension P of µ ∈ P ⊂ R^P. In fact, the quantity which provides a good, inexpensive and readily available representation of the ROM error is the a posteriori error bound (13).

In order to derive a REM depending on a posteriori error bounds, we remark that a linear dependence between (the absolute values of the) output errors and the related error bounds appears when considering a logarithmic transformation, as already pointed out in a recent contribution [13]. Thus, we can in principle consider the following model:

ln |s_h^j(µ) − s_n^j(µ)| = β_0^j + β_1^j ln(∆_n^j(µ)) + δ_reg^j,   j = 1, . . . , s,   (22)

being δ_reg^j ∼ N(0, σ²_reg,j), and fit it to the datasets {∆_n^j(µ^k), s_h^j(µ^k) − s_n^j(µ^k)}_{k=1}^{Ncal} obtained by sampling errors and corresponding error bounds for each output, over a calibration set S_Ncal. By doing this, we get the estimates β_0^j, β_1^j of the coefficients by exploiting standard linear regression theory, as well as the estimates of the variances σ²_reg,j through the corresponding mean square errors. Thus, by fitting model (22) we obtain the following relation for the absolute value of the ROM error:

|s_h^j(µ) − s_n^j(µ)| = exp(β_0^j + β_1^j ln(∆_n^j(µ)) + δ_reg^j),   j = 1, . . . , s,

being δ_reg^j ∼ N(0, σ²_δ,j). If we consider the deterministic REM-3, the absolute value of the ROM errors |ε_ROM(µ)| can be replaced by

m_ROM(µ) = [exp(β_0^1 + β_1^1 ln(∆_n^1(µ))), . . . , exp(β_0^s + β_1^s ln(∆_n^s(µ)))]^T.   (23)


Instead, by taking into account the variability shown by the ROM errors, |ε_ROM(µ)| would be replaced by the following log-normal random variables:

ε_3(µ) ∼ [logN(β_0^1 + β_1^1 ln(∆_n^1(µ)), σ²_reg,1), . . . , logN(β_0^s + β_1^s ln(∆_n^s(µ)), σ²_reg,s)]^T.   (24)

However, in both cases no indication about the sign of the error is provided by the error bound (13), so that a correction based on (22) alone would necessarily be too poor. Once we have fitted the linear models, we shall infer the error sign (on each output component j = 1, . . . , s) from the calibration set, in order to replace ε_ROM^j(µ) by ρ_j ε_3^j(µ), where ε_3^j(µ) is given by (23) or (24) and ρ_j ∈ {−1, +1} is determined by adopting one of the following strategies:

1. Nearest neighbor, assigning to ε_3^j(µ) the sign of the sampled error in the calibration set closest to µ, that is

ρ_j = sgn(s_h^j(µ^k*) − s_n^j(µ^k*)),   µ^k* = arg min_{µ^k ∈ S_Ncal} ‖µ − µ^k‖;

2. Bernoulli trial, assigning to ε_3^j(µ) the sign

ρ_j = −1 + 2X_j,   X_j ∼ Be(p_j),   p_j = |S_+^j| / N_cal ∈ [0, 1],

being S_+^j = {µ ∈ S_Ncal : s_h^j(µ) − s_n^j(µ) > 0} and S_−^j = {µ ∈ S_Ncal : s_h^j(µ) − s_n^j(µ) < 0}, i.e. weighting the Bernoulli trial according to the observed distribution of signs in the calibration sample. While heuristic in nature, this rule does correctly treat the cases where the reduction error is heavily biased towards one sign or the other, instead of simply assigning, say, a positive sign to all errors.

The splitting of the calibration set into two subsets S_+^j and S_−^j suggests considering a regression model over each subset, depending on the sign of the computed errors. In conclusion, we obtain for each µ ∈ P the random variable

δ_3(µ) := ε_noise + ρ ε_3(µ),   (25)

which is the sum of two random variables with different distributions (and whose density can be computed similarly to (20)).

5 Inversion procedure

Let us now summarize the whole numerical procedure we use to solve a parametric inverse UQ problem. A first, offline stage (Algorithm 1) consists in the computation of the reduced space and in the evaluation of the additional calibration set. More advanced versions of the greedy algorithm, where the two greedy loops are interlaced, could also be used, but since this step is independent of the REM step that follows, here we have presented a basic version for simplicity. During the online stage (Algorithm 2), an MCMC procedure is performed by considering at each iteration the reduced output s_n(µ) instead of the full-order output s_h(µ).

The selection of the basis functions is performed through a greedy (RB) algorithm. This procedure requires the efficient evaluation of the a posteriori error bound (12), for which a suitable offline/online procedure is exploited (see [27] for further details). The calibration procedure consists in evaluating the difference between the ROM output s_n(µ) and the full-order output s_h(µ) for each parameter in the calibration set S_Ncal. Finally, the REM construction is performed according to the procedure described in Sect. 4.

During the online stage, the posterior distribution is sampled through a Metropolis–Hastings algorithm, which generates a sequence of sample values whose distribution converges to the desired corrected distribution π̃_post^n. Each MCMC iteration entails an online query, which is performed in an efficient way by the ROM and the REM. The quality of the sampling sequence is finally improved by performing a subsequent burn-in and thinning, in order to reduce the autocorrelation between the sampled points; see e.g. [16, 24] for further details.


Algorithm 1 Offline procedure
1: procedure Basis computation
2:   FE matrices:
3:     A_q^h, F_q^h ← state problem
4:     L_{q,s}^h ← output evaluation & dual problems
5:   Lower bound:
6:     α^LB(µ) ← successive constraint method / heuristic strategies
7:   Greedy procedure, state problem:
8:   while max_{µ_i ∈ Ξ_train} ∆_n(µ_i) > ε_RB^tol do
9:     µ^n = arg max_{µ_i ∈ Ξ_train} ∆_n(µ_i);  S_n = S_{n−1} ∪ span{u_h(µ^n)}
10:    A_q^n, F_q^n ← compute reduced state matrices
11:  Greedy procedure, dual problems: for each output j = 1, . . . , s
12:  while max_{µ_i ∈ Ξ_train} ∆_n^j(µ_i) > ε_RB^tol do
13:    µ^{n_j} = arg max_{µ_i ∈ Ξ_train} ∆_n^j(µ_i);  S_n^j = S_{n−1}^j ∪ span{ψ_h^j(µ^{n_j})}
14:    A_{q,j}^n, L_{q,j}^n ← compute reduced dual matrices
15: procedure REM Calibration
16:  for k = 1 : N_cal do
17:    s_h(µ^k), u_h(µ^k) ← FE state problem
18:    s_n(µ^k), u_n(µ^k), ∆_n^{(1:s)}(µ^k) ← RB state and dual problems
19:    err_{k,1:s} = s_h^{(1:s)}(µ^k) − s_n^{(1:s)}(µ^k)
20:  compute REM

Algorithm 2 Online procedure
1: procedure Metropolis sampling
2:   µ^(1) ← initial value
3:   sampling loop:
4:   for k = 2 : K do
5:     µ ← random walk
6:     [s_n(µ), ∆_n(µ)] ← compute RB state + dual problems
7:     m_ROM(µ) ← evaluate REM mean
8:     if REM is deterministic then
9:       π̃_n ← π_ε(s∗ − s_n(µ) − m_ROM(µ))
10:    if REM is statistical then
11:      Σ_ROM(µ) ← evaluate REM covariance matrix
12:      π̃_n ← π_δ(s∗ − s_n(µ) − m_ROM(µ))
13:    π̃_post^n(µ|s∗) ← Bayes' formula
14:    γ ← π̃_post^n(µ|s∗) / π̃_post^n(µ^(k)|s∗)
15:    y ← random sampling from U(0, 1)
16:    if y < γ then
17:      µ^(k+1) ← µ;  k ← k + 1
18:  burn-in:
19:    eliminate the first M samples µ^(1:M)
20:  thinning:
21:    keep every d-th draw of the chain µ^(1:d:end)
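For concreteness, the core of Algorithm 2 might look as follows in Python (deterministic-REM variant with a Gaussian noise model); rb_solve and rem_mean are hypothetical stand-ins for the RB online query and the calibrated REM of Sect. 4.

import numpy as np

def metropolis_rem(s_star, log_prior, rb_solve, rem_mean, Sigma_noise, mu0,
                   K=50000, step=0.1, burn_in=5000, thin=10,
                   rng=np.random.default_rng()):
    def log_post(mu):
        r = s_star - rb_solve(mu) - rem_mean(mu)     # REM-corrected misfit, cf. (16)
        return -0.5 * r @ np.linalg.solve(Sigma_noise, r) + log_prior(mu)

    mu = np.asarray(mu0, dtype=float)
    lp = log_post(mu)
    chain = []
    for _ in range(K):
        prop = mu + step * rng.standard_normal(mu.size)   # random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:           # Metropolis accept/reject
            mu, lp = prop, lp_prop
        chain.append(mu.copy())
    return np.array(chain)[burn_in::thin]                 # burn-in and thinning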

6 Effectivity of proposed Reduction Error Models

Let us now analyze the effectivity of the corrections made on the reduced-order likelihood function thanks to the proposed reduction error models. In particular, we aim at stating some conditions to be fulfilled by the REM corrections in order to guarantee that the corrected posterior PDF π̃_post^n is more robust and closer to the full-order PDF π_post^h than the reduced-order PDF π_post^n without corrections.

To this end, let us recall the notion of Kullback-Leibler (KL) divergence, which is a non-symmetric measure of the difference between two probability distributions π_A and π_B:

D_KL(π_A || π_B) = ∫ π_A(z) log(π_A(z)/π_B(z)) dz.   (26)

Clearly, D_KL(π_A || π_B) ≥ 0, whereas D_KL(π_A || π_B) = 0 if π_A = π_B almost surely. This notion has already been used to compare approximations of posterior distributions obtained through generalized polynomial chaos representations; see e.g. [30, 8] for further details.


6.1 Consistency result

Before comparing our REMs and showing their effect on the reduced-order posterior PDFs, we prove that the reduced-order likelihood function π_n approximates the full-order one π_h in a consistent way as the ROM dimension increases:

Proposition 1. Let us consider the additive Gaussian noise model (4) and the RB approximation s_n(µ) of the output s_h(µ) defined by (11) and (2), by assuming an analytic µ-dependence in the bilinear/linear forms. Then, for any µ ∈ P,

D_KL(π_h || π_n) = ∑_{j=1}^s (1/(2σ_j²)) (s_h^j(µ) − s_n^j(µ))²,   (27)

so that lim_{n→Nh} D_KL(π_h || π_n) = 0 exponentially.

Proof. The solution u_n(µ) ∈ X_n of (11) is obtained as a Galerkin projection onto X_n, so that

‖u_h(µ) − u_n(µ)‖_X ≤ (M̄/α_0)^(1/2) inf_{w_n ∈ X_n} ‖u_h(µ) − w_n‖_X,

being M(µ) ≤ M̄ the continuity constant of a(·, ·; µ) and α_0 > 0 such that α_h(µ) ≥ α_0 for any µ ∈ P. Thus we have that ‖u_h(µ) − u_n(µ)‖_X → 0 when n → N_h. By the same argument, ‖ψ_h(µ) − ψ_n(µ)‖_X → 0 when n → N_h. Moreover, the Kolmogorov n-width¹ of the solution set M_h converges exponentially:

d_n(M_h; X) ≤ C e^(−αn)   for some C, α > 0,   (28)

provided the µ-dependence in the bilinear/linear forms is analytic; see [20] for further details. Regarding the output, by exploiting Galerkin orthogonality twice we have that

s_h^j(µ) − s_n^j(µ) = ℓ_j(u_h(µ)) − ℓ_j(u_n(µ)) = a(u_n(µ), ψ_n(µ); µ) − a(u_h(µ), ψ_h(µ); µ) = a(u_n(µ) − u_h(µ), ψ_n(µ) − ψ_h(µ); µ).

Then |s_h^j(µ) − s_n^j(µ)| ≤ M̄ ‖u_h(µ) − u_n(µ)‖_X ‖ψ_h(µ) − ψ_n(µ)‖_X, so that, when n → N_h, |s_h^j(µ) − s_n^j(µ)| → 0 for any j = 1, . . . , s; furthermore, the order of convergence of the ROM output to the FEM output is twice that of the state solution. In particular, by considering an additive Gaussian noise model, for any µ ∈ P

D_KL(π_h || π_n) = ∫_{R^s} π_h(s|µ) log(π_h(s|µ)/π_n(s|µ)) ds = ∑_{j=1}^s (1/(2σ_j²)) (s_h^j(µ) − s_n^j(µ))²,

thanks to definition (26) of the Kullback-Leibler divergence, so that lim_{n→Nh} D_KL(π_h || π_n) = 0; finally, convergence takes place at an exponential rate thanks to (28).

The consistency property can be extended to the posterior PDFs according to the following:

Proposition 2. Under the assumptions of Proposition 1, lim_{n→Nh} D_KL(π_post^h || π_post^n) = 0 exponentially.

Proof. Thanks to the definition of the KL divergence,

D_KL(π_post^h || π_post^n) = ∫_P (π_h(s∗|µ) π_prior(µ)/η_h(s∗)) log( (π_h(s∗|µ)/π_n(s∗|µ)) (η_n(s∗)/η_h(s∗)) ) dµ
  = log(η_n(s∗)/η_h(s∗)) + ∫_P (π_h(s∗|µ) π_prior(µ)/η_h(s∗)) log(π_h(s∗|µ)/π_n(s∗|µ)) dµ.   (29)

By using the definitions of π_h and π_n, and the Lipschitz continuity of exp(−s) for s ≥ 0 (that is, |e^(−s) − e^(−t)| ≤ Λ|s − t| for any s, t > 0, with Λ = 1), we obtain

¹The Kolmogorov n-width d_n(M_h; X) measures how well a finite-dimensional subspace uniformly approximates the manifold M_h = {u_h(µ), µ ∈ P} of the PDE solutions: d_n(M_h; X) = inf_{X_n ⊂ X} sup_{u_h ∈ M_h} inf_{w_n ∈ X_n} ‖u_h − w_n‖_X, where the first infimum is taken over all linear subspaces X_n ⊂ X of dimension n.


|π_h(s∗|µ) − π_n(s∗|µ)| = ∏_{j=1}^s (1/√(2πσ_j²)) | exp(−(s∗_j − s_n^j(µ))²/(2σ_j²)) − exp(−(s∗_j − s_h^j(µ))²/(2σ_j²)) |
  ≤ ∏_{j=1}^s (1/√(2πσ_j²)) | −(s∗_j − s_n^j(µ))²/(2σ_j²) + (s∗_j − s_h^j(µ))²/(2σ_j²) |
  ≤ ∏_{j=1}^s (1/(2σ_j² √(2πσ_j²))) |s_n^j(µ) − s_h^j(µ)| |2s∗_j − s_h^j(µ) − s_n^j(µ)|,

so that, for any µ ∈ P, |π_h(s∗|µ) − π_n(s∗|µ)| → 0 when n → N_h because |s_h^j(µ) − s_n^j(µ)| → 0 for any j = 1, . . . , s. In the same way, |η_n(s∗) − η_h(s∗)| = |∫_P (π_h(s∗|µ) − π_n(s∗|µ)) π_prior(µ) dµ| → 0 for any given s∗ ∈ R^s. Thus, both terms in the second line of (29) vanish for n → N_h.

6.2 A result of effectivity for the proposed REMs

Since we are mainly interested in the case where the ROM dimension n is fixed (and possibly small), we want to show that performing a correction according to a REM improves the quality of the reduced posterior (in terms of the KL divergence).

By following the structure of the previous section, we first state a result dealing with the approximation of the likelihood function by a corrected ROM:

Proposition 3. Under the assumptions of Proposition 1, if there exist constants C_j < 1 such that, for any j = 1, . . . , s,

|s_h^j(µ) − s_n^j(µ) − m_ROM^j(µ)| ≤ C_j |s_h^j(µ) − s_n^j(µ)|   ∀µ ∈ P,   (30)

then

D_KL(π_h || π̃_n) ≤ (max_{j=1,...,s} C_j²) D_KL(π_h || π_n),   (31)

provided that the correction is made according to a deterministic REM.

Proof. In analogy with relation (27), a correction operated by means of a deterministic REM affects just E[s∗|µ], so that

D_KL(π_h || π̃_n) = ∑_{j=1}^s (1/(2σ_j²)) (s_h^j(µ) − s_n^j(µ) − m_ROM^j(µ))².   (32)

Thus, under condition (30), (31) directly follows.

By means of (30), we require that the correction provided by a REM is effective, that is, it yields a reduction in the KL divergence between the reduced-order and the full-order posterior PDFs, when in the former case a correction through a deterministic REM is considered. Instead, when relying on statistical REMs, we need to distinguish between two cases:

• the correction results in a normal random variable (REM-1 or REM-2), with mean m_ROM and covariance matrix (Σ_ROM)_ij = (σ_j^ROM)² δ_ij. In this case, we would obtain

D_KL(π_h || π̃_n) = (1/2) ∑_{j=1}^s [ (s_h^j(µ) − s_n^j(µ) − m_ROM^j(µ))² / (σ_j² + (σ_j^ROM)²) + σ_j²/(σ_j² + (σ_j^ROM)²) − 1 − log( σ_j²/(σ_j² + (σ_j^ROM)²) ) ]   (33)

instead of (32). Thus, in order to ensure that a relation like (31) still holds, we need to further require that (σ_j^ROM)² is sufficiently small compared to σ_j², j = 1, . . . , s. In practice, problems only arise if σ_j = 0 (i.e. the j-th likelihood function is a true delta function), whereas any realistic inverse problem will inherently experience finite variance in its outputs, arising from measurement noise, numerical approximations etc.; so, provided that the ROM approximation is convergent, a sufficiently large reduced basis can always be found such that the remainder terms in (33) are negligible;


• the correction results in a non-normally distributed random variable (such as, e.g., when using REM-3, where the correction is a multivariate log-normal random variable). In this case we cannot provide a closed-form expression for the KL divergence. However, also in this case the key factors affecting the comparison between D_KL(π_h || π̃_n) and D_KL(π_h || π_n) are the same as in the previous case (as shown by the numerical results in the following section).

Let us now turn to evaluate how the corrections introduced through our REMs impact the posterior PDFs. First of all, let us remark that, by taking the expectation of the KL divergence between π_h(s∗|µ) and π_n(s∗|µ), and changing the order of integration, we obtain

E[D_KL(π_h || π_n)] = ∫_P D_KL(π_h || π_n) π_prior(µ) dµ.   (34)

Moreover, thanks to the positivity of the KL divergence and relation (31), we get

E[D_KL(π_h || π̃_n)] ≤ (max_{j=1,...,s} C_j²) E[D_KL(π_h || π_n)].   (35)

6.3 Posterior comparison for fixed n

We now want to compare the KL divergences between the full-order and the corrected/uncorrected posterior PDFs for small n. We can show the following:

Proposition 4. Under the assumptions of Proposition 3 and provided that η_n(s) ∼ η_h(s) for n → N_h, for any s ∈ R^s, we have that

E[D_KL(π_post^h || π̃_post^n)] ≤ E[D_KL(π_post^h || π_post^n)]   (36)

if the correction is made according to a deterministic REM.

Proof. Let us express the right-hand side of (36) as

E[D_KL(π_post^h || π_post^n)] = ∫_{R^s} ( log(η_n(s)/η_h(s)) + ∫_P (π_h(s|µ) π_prior(µ)/η_h(s)) log(π_h(s|µ)/π_n(s|µ)) dµ ) η_h(s) ds.   (37)

In the same way, the left-hand side of (36) becomes

E[D_KL(π_post^h || π̃_post^n)] = ∫_{R^s} ( log(η̃_n(s)/η_h(s)) + ∫_P (π_h(s|µ) π_prior(µ)/η_h(s)) log(π_h(s|µ)/π̃_n(s|µ)) dµ ) η_h(s) ds.   (38)

We proceed by analyzing separately the two terms of the right-hand side of (37). The second term coincides with (34), i.e.

∫_{R^s} ( ∫_P (π_h(s|µ) π_prior(µ)/η_h(s)) log(π_h(s|µ)/π_n(s|µ)) dµ ) η_h(s) ds = ∫_P ( ∫_{R^s} π_h(s|µ) log(π_h(s|µ)/π_n(s|µ)) ds ) π_prior(µ) dµ = E[D_KL(π_h || π_n)].

In the same way, the second term of the right-hand side of (38) is such that

∫_{R^s} ( ∫_P (π_h(s|µ) π_prior(µ)/η_h(s)) log(π_h(s|µ)/π̃_n(s|µ)) dµ ) η_h(s) ds = E[D_KL(π_h || π̃_n)].

On the other hand, by developing the first term of (37) with a Taylor expansion, we obtain

∫_{R^s} log(η_n(s)/η_h(s)) η_h(s) ds = ∫_{R^s} ( (η_n(s)/η_h(s) − 1) − (1/2)(η_n(s)/η_h(s) − 1)² + O((η_n(s)/η_h(s) − 1)³) ) η_h(s) ds
  = ∫_{R^s} (η_n(s) − η_h(s)) ds − (1/2) ∫_{R^s} ( η_n(s)²/η_h(s) − 2η_n(s) + η_h(s) ) ds + ∫_{R^s} O((η_n(s)/η_h(s) − 1)³) η_h(s) ds.

The first term of the last sum can be rewritten as

∫_{R^s} (η_n(s) − η_h(s)) ds = ∫_P ( ∫_{R^s} π_n(s|µ) ds ) π_prior(µ) dµ − ∫_P ( ∫_{R^s} π_h(s|µ) ds ) π_prior(µ) dµ,   (39)


and it vanishes, since

∫_P ( ∫_{R^s} π_h(s|µ) ds ) π_prior(µ) dµ = ∫_P ( ∫_{R^s} π_n(s|µ) ds ) π_prior(µ) dµ = ∫_P π_prior(µ) dµ = 1.

In this way,

∫_{R^s} log(η_n(s)/η_h(s)) η_h(s) ds = −(1/2) ∫_{R^s} ( η_n(s)²/η_h(s) − 1 ) ds + ∫_{R^s} O((η_n(s)/η_h(s) − 1)³) η_h(s) ds,

considering the integral of the remainder term of the Taylor expansion to be sufficiently small when η_n ∼ η_h. Similarly, the first term of the right-hand side of (38) is negligible, so that

E[D_KL(π_post^h || π_post^n)] = −(1/2) ∫_{R^s} ( η_n(s)²/η_h(s) − 1 ) ds + E[D_KL(π_h || π_n)] + ∫_{R^s} O((η_n(s)/η_h(s) − 1)³) η_h(s) ds,

E[D_KL(π_post^h || π̃_post^n)] = −(1/2) ∫_{R^s} ( η̃_n(s)²/η_h(s) − 1 ) ds + E[D_KL(π_h || π̃_n)] + ∫_{R^s} O((η̃_n(s)/η_h(s) − 1)³) η_h(s) ds.

In conclusion, by using (35), inequality (36) follows under the following condition:

(1/2) ∫_{R^s} ( η_n(s)²/η_h(s) − η̃_n(s)²/η_h(s) ) ds + ∫_{R^s} O((η̃_n(s)/η_h(s) − 1)³) η_h(s) ds ≤ (1 − max_{j=1,...,s} C_j²) E[D_KL(π_h || π_n)] + ∫_{R^s} O((η_n(s)/η_h(s) − 1)³) η_h(s) ds,

which can be seen as a robustness condition on the correction entailed by the REM.

Remark 5. According to (6) and (9), η_n ∼ η_h as soon as the likelihood functions π_h and π_n are very close to each other, that is, D_KL(π_h || π_n) < ε for any given, small ε > 0.

7 Numerical results and discussion

We present two numerical examples illustrating the properties and the performance of the correction strategies proposed in Sect. 4. We recall that our final goal is to exploit fast and inexpensive online ROM queries, improving their accuracy through suitable REMs. For the cases at hand, we consider a model problem dealing with the scalar heat equation, where uncertain parameters describe either the scalar conductivities over different subdomains or the continuous conductivity field over the whole domain. By measuring the average of the state solution over three boundary portions, identifiable parameters are reconstructed by marginalizing away the nuisance parameters.

7.1 Test case 1

We consider in Ω = (0, 1.5)² the following diffusion problem:

∇ · (k(x, µ)∇u) = 0   in Ω
k(x, µ)∇u · n = 0   on Γ_w
k(x, µ)∇u · n = 1   on Γ_b
u = 0   on Γ_t,   (40)

where ∂Ω = Γ_w ∪ Γ_b ∪ Γ_t (see Fig. 1, left). Here k(x, µ) is a parametrized diffusion coefficient:

k(x, µ) = 0.1 I_Ω0(x) + ∑_{i=1}^3 µ_i I_Ωi(x),

where I_Ωi is the characteristic function of Ω_i, being Ω_1 = (0, 0.5)² ∪ (1, 1.5)², Ω_2 = (0, 0.5) × (1, 1.5) ∪ (1, 1.5) × (0, 0.5), Ω_3 = (0.5, 1)² and Ω_0 = Ω \ ∪_{i=1}^3 Ω_i; the outputs are

s_j(µ) = ∫_{Γ_j} u(µ) dΓ,   j = 1, 2, 3.


Figure 1: Test case 1: domain, boundary conditions (left) and different sample solutions of (40) (right).

Our objective is to identify γ = µ_3 by observing the outputs s_j(µ), j = 1, 2, 3, in the presence of two nuisance parameters ζ = (µ_1, µ_2). We suppose that the target value s∗ corresponds to the full-order output vector evaluated for γ∗ = 2 and a random value of ζ ∈ (0.01, 10)².

The forward problem (40) is first discretized using the FE approach with linear P1 finite elements, which generates a total of Nh = 1,056 degrees of freedom. In view of the application of the MCMC algorithm, we adopt the RB method: we stop the greedy algorithm after selecting n = 10 basis functions for the forward problem and for each dual problem related to one of the three outputs, in view of the analysis of the correction effects. We intentionally select few basis functions (satisfying a tolerance ε_RB^tol = 5 · 10⁻²) in order to assess the capability of our REMs in correcting the outcome of a possibly inaccurate ROM. Note that for more complex (e.g. non-affine and/or nonlinear) problems a larger number of RB functions might be required to guarantee sufficient accuracy, with a consequent loss of efficiency. As a matter of fact, in our case we reach a considerable speedup (n/Nh ≃ 1/100). Nevertheless, output evaluations are affected by ROM errors, which have to be taken into account by the correction methods. To this end, we construct a calibration set of Ncal = 100 points in the parameter space from a Gauss-Patterson sparse grid, for which we compute the full-order FE and RB solutions as well as the corresponding outputs. In this way, we obtain a sample of output ROM errors, upon which we calibrate the proposed REMs. This operation can be performed in parallel and does not significantly impact the offline complexity. We highlight that in our test cases very similar calibration performances can be obtained by relying on a random sampling of the parameter space, due to the small parametric dimension.
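The calibration stage itself is straightforward to script. A minimal sketch (in Python, with hypothetical helpers solve_fe_outputs and solve_rb_outputs standing in for the full-order and reduced-order solvers, and a random design in place of the Gauss-Patterson grid, which as noted above performs very similarly here):

    import numpy as np

    rng = np.random.default_rng(0)
    Ncal = 100
    # Calibration points sampled uniformly in the parameter box (0.01, 10)^3.
    mu_cal = rng.uniform(0.01, 10.0, size=(Ncal, 3))

    errors = np.empty((Ncal, 3))        # one column per output s_j
    for i, mu in enumerate(mu_cal):
        s_fe = solve_fe_outputs(mu)     # hypothetical full-order (FE) solver
        s_rb = solve_rb_outputs(mu)     # hypothetical RB solver
        errors[i, :] = s_fe - s_rb      # output ROM errors used to fit the REMs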

Before analyzing the quality of the corrections, we provide a few details on the construction of the REMs. By direct inspection of the sample of errors generated over the calibration set, we remark that their distribution is far from Gaussian (Fig. 2).

Figure 2: Histogram of the ROM errors for each output sj(µ), j = 1, 2, 3.

The Shapiro-Wilk test rejects the null hypothesis that the errors come from a normal distribution (p-value < 10⁻⁵ for each j = 1, 2, 3). Moreover, the errors are not equally distributed over the parameter space: for this reason, µ-dependent approaches (like REM-2 and REM-3) should better capture the error variability. Concerning REM-2, we point out that the approach relying on error bound effectivities is preferable to the one based on ROM errors if we aim at minimizing the maximum norm of the interpolation error (see Fig. 3).
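The normality check is easy to reproduce; a minimal sketch, reusing the errors array from the calibration sketch above:

    from scipy.stats import shapiro

    # A small p-value rejects the Gaussian hypothesis underlying REM-1.
    for j in range(errors.shape[1]):
        stat, p_value = shapiro(errors[:, j])
        print(f"output s_{j+1}: W = {stat:.4f}, p = {p_value:.2e}")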

Figure 3: REM-2 construction: comparison between the absolute values of the error (in blue), of the RBF interpolation of the errors (in black, dashed) and of the REM-2 reconstructed errors (in red, dashed), for each output sj(µ), j = 1, 2, 3, on varying µ3 (µ1 = 7, µ2 = 3).
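The interpolation step behind a REM-2-type correction can be sketched as follows, under our assumption that the interpolated quantity is the inverse effectivity (ratio of computed output error to its a posteriori bound) at the calibration points; bounds_cal is a hypothetical array of RB output error bounds:

    import numpy as np
    from scipy.interpolate import RBFInterpolator

    j = 0                                      # output index
    ratios = errors[:, j] / bounds_cal[:, j]   # inverse effectivities

    # Radial basis function interpolant of the ratios over the parameter space.
    rem2 = RBFInterpolator(mu_cal, ratios)

    def rem2_correction(mu_new, bound_new):
        # REM-2-style correction: interpolated ratio times the error bound.
        return rem2(np.atleast_2d(mu_new))[0] * bound_new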

Furthermore, for REM-3, a linear regression model on the log-variable is fitted for each output, distinguishing the sign of the errors (see Fig. 4). Unsurprisingly, the models providing a better fit are those built on larger datasets, even if the Bernoulli sign trick proposed in Sect. 4.3 helps to filter out the models built on smaller samples.
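A minimal sketch of this regression step, assuming (as in model (22)) linearity in the log/log space between error magnitude and error bound, with positive and negative errors fitted separately and the empirical sign frequency feeding a Bernoulli sign model (variable names are ours):

    import numpy as np

    j = 0
    pos = errors[:, j] > 0                     # split the sample by error sign
    beta_pos = np.polyfit(np.log(bounds_cal[pos, j]),
                          np.log(errors[pos, j]), deg=1)
    beta_neg = np.polyfit(np.log(bounds_cal[~pos, j]),
                          np.log(-errors[~pos, j]), deg=1)
    p_pos = pos.mean()                         # empirical P(error > 0)

    def rem3_correction(bound_new, rng=np.random.default_rng()):
        # Draw a signed error prediction from the fitted log/log models.
        sign = 1.0 if rng.uniform() < p_pos else -1.0
        beta = beta_pos if sign > 0 else beta_neg
        return sign * np.exp(np.polyval(beta, np.log(bound_new)))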

A first test for assessing the quality of our REMs considers a random sample of 5000 points in P, comparing the frequencies of the ROM errors with those of the corresponding corrections obtained with the three REMs when (for the sake of comparison) Ncal = 50 or Ncal = 200 points are used in the calibration sample (see Fig. 5).

Figure 4: REM-3 construction: linear regression (in the log/log space) of the ROM errors against error bounds, with 95% confidence intervals. A linear model (22) in the log/log space is fitted over each subset of computed error bounds and corresponding (positive or negative) errors S_j^+, S_j^−, j = 1, 2, 3.

Figure 5: Comparison between the REM reconstruction of the errors for the outputs s1, s2 and s3 (from top to bottom: without REM, REM-1, REM-2, REM-3 with both options for the sign model). Two different calibration samples have been considered: Ncal = 50 for the gray distribution, Ncal = 200 for the black one.

As expected, REM-1 results in a simple shift of the error distribution. On the other hand, the distributions of the errors corrected by the µ-dependent approaches (REM-2 and REM-3) show a more symmetric shape, with mean closer to zero and smaller variance than the ROM error distributions. Hence, the µ-dependent approaches reduce the ROM error by at least one order of magnitude. REM-2 performs very well in this case, although it can suffer from the curse of dimensionality.

We have then applied the MCMC algorithm to reconstruct the unknown parameter values, starting from a uniform prior on the parameters and sampling the RB posterior corrected by the three different REMs (a minimal sketch of the corrected sampling loop is given after the list below). A closer look at the results reported in Figs. 5–7 allows us to conclude that:

• the correction provided by REM-1 does not improve the accuracy of the RB posterior, since the distribution of the calibration ROM errors is far from Gaussian. Even by assuming log-normal calibration ROM errors, we do not get a significant improvement in the accuracy of the corrected posterior (see Fig. 6);

• REM-2 performs quite well when compared purely in terms of model error estimation to REM-1 and REM-3 (see Fig. 5). As a matter of fact, REM-2 also enhances the evaluation of the reduced posteriors; the same conclusion can be drawn for REM-3, provided a suitable sign model is taken into account (see Fig. 7, left);

• by also considering the statistical correction (Fig. 7, right), we get maximum a posteriori estimates close to the full-order ones, even if this procedure generates heavy-tailed posterior distributions and consequently less tight a posteriori prediction intervals.
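For reference, a minimal random-walk Metropolis sketch of the corrected sampling loop (our own schematic: solve_rb_outputs and rem_correction are hypothetical stand-ins for the RB solver and for any of the REMs, and a Gaussian noise model with variance sigma2 is assumed in the likelihood):

    import numpy as np

    def log_likelihood(mu, s_star, sigma2):
        # Gaussian log-likelihood of the REM-corrected RB outputs.
        s = solve_rb_outputs(mu) + rem_correction(mu)
        return -0.5 * np.sum((s_star - s) ** 2) / sigma2

    def mcmc(s_star, mu0, n_iter, sigma2=1e-4, step=0.1,
             rng=np.random.default_rng()):
        # Random-walk Metropolis over the uniform prior box (0.01, 10)^3.
        mu = np.asarray(mu0, dtype=float)
        ll = log_likelihood(mu, s_star, sigma2)
        chain = [mu.copy()]
        for _ in range(n_iter):
            prop = mu + step * rng.standard_normal(mu.size)
            if np.all((prop > 0.01) & (prop < 10.0)):  # zero prior outside
                ll_prop = log_likelihood(prop, s_star, sigma2)
                if np.log(rng.uniform()) < ll_prop - ll:
                    mu, ll = prop, ll_prop
            chain.append(mu.copy())
        return np.array(chain)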

Figure 6: Distribution of the output errors divided by sign (left) and the corrected reduced posterior obtained by the REM-1 correction (right), starting from a uniform prior distribution. REM-1 does not yield a relevant improvement of the posterior distribution obtained with the RB method.

Figure 7: Posterior distributions obtained with deterministic (left) and statistical (right) corrections, using a uniform prior distribution. REM-2 and REM-3 yield a much more relevant improvement of the posterior distribution obtained with the RB method than REM-1.

So far we have considered a uniform prior distribution. Turning to a Gaussian prior on the parameters, numerical results still confirm the good behavior of REM-2 and REM-3, and the worse performance of REM-1. In this case, we have tested a situation of wrong a priori input with low confidence on it, i.e. µ_{p,3} = 3 and variance σ_i² δ_ii = 1, 1 ≤ i ≤ s. As before, we are able to get a good reconstruction of the full-order posterior distribution with REM-2 and REM-3 (see Fig. 8).

Figure 8: Posterior distributions obtained with deterministic (left) and statistical (right) REMs in the case of a Gaussian prior distribution. Also in this case, REM-2 and REM-3 yield a much more relevant improvement of the posterior distribution obtained with the RB method than REM-1.

Finally, as a validation of the theoretical results presented in Sect. 6, we report in Table 1 the maximum a posteriori (MAP) estimate, the 95% prediction interval (PI) and the KL divergences for the three REMs and both prior distributions. We are able to verify the relation between REM corrections and KL divergences of Proposition 4. However, this relation does not hold in the presence of a significant variance associated with the correction (in this case the KL divergence increases).
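The reported KL divergences can be estimated directly from the MCMC samples; a minimal sketch, using a simple histogram-based estimator over a common binning (our own choice; kernel density estimates would work equally well):

    import numpy as np

    def kl_divergence(samples_p, samples_q, bins=50):
        # Histogram estimate of D_KL(p || q) from two 1-d sample sets.
        lo = min(samples_p.min(), samples_q.min())
        hi = max(samples_p.max(), samples_q.max())
        p, edges = np.histogram(samples_p, bins=bins, range=(lo, hi), density=True)
        q, _ = np.histogram(samples_q, bins=bins, range=(lo, hi), density=True)
        w = np.diff(edges)
        mask = (p > 0) & (q > 0)          # skip empty bins to avoid log(0)
        return np.sum(w[mask] * p[mask] * np.log(p[mask] / q[mask]))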

                            uniform prior                           Gaussian prior
                    µ3MAP    PI 95%             KL-dist     µ3MAP    PI 95%             KL-dist
FE                  1.9837   (1.8075, 2.1598)     –         2.1498   (2.0166, 2.2830)     –
RB                  1.8139   (1.5968, 2.0311)   1.2152      2.0776   (1.9879, 2.2499)   0.2936
REM-1               1.8997   (1.5783, 2.2213)   0.3829      2.2726   (2.0986, 2.5446)   1.0856
REM-2               1.9881   (1.7608, 2.2150)   0.0555      2.1493   (2.0110, 2.2876)   0.0014
REM-3 (near)        1.9917   (1.7623, 2.1898)   0.0516      2.1420   (2.0102, 2.2818)   0.0213
REM-3 (near+var)    1.9719   (1.4932, 2.4501)   0.3423      2.2121   (2.0224, 2.4017)   0.3064
REM-3 (BE)          1.9762   (1.8002, 2.1850)   0.0406      2.1331   (2.0329, 2.3228)   0.0124
REM-3 (BE+var)      1.9645   (1.5546, 2.3723)   0.2474      2.1830   (1.9553, 2.4464)   0.2538

Table 1: MAP estimates, 95% prediction intervals and KL divergences from the posterior FE distribution.

The use of the RB method yields a reduction of order 10² in terms of both RB dimension and CPU time required by each Online forward query. Moreover, the proposed REMs yield a KL divergence of the REM-corrected posterior distributions from the full-order one which is 50–100 times smaller than in the case without correction.

Regarding computational costs, a REM-1 correction can be obtained inexpensively, whereas REM-2 and REM-3 entail a CPU time for the correction comparable to the one required by the solution of the RB system (see Table 6). Concerning the Offline CPU time, the calibration stage required by any REM takes about 10% of the CPU time required by the Offline RB stage when n = 10 RB functions and Ncal = 100 calibration points are used.

The numerical performance in terms of both efficiency and accuracy obviously depends on the RB dimension n and the size Ncal of the calibration sample. To assess this fact, we have considered different combinations (n, Ncal) for the test case discussed in this section.

By comparing the KL divergences in Table 2, it clearly turns out that a better RB approximation yields a more accurate solution of the inverse UQ problem. Building an RB approximation of dimension n = 10 (resp. n = 40) requires an Offline CPU time of 134 s (resp. 226 s), whereas its Online evaluation requires 2 · 10⁻⁴ s (resp. 7 · 10⁻⁴ s). This latter fact has a major impact on the overall CPU time required by the MCMC procedure, which grows by the same factor 3.5. On the other hand, considering a calibration sample of increasing size Ncal = 50, 100, 200 yields better results in terms of (corrected) posterior distributions, while entailing a milder increase of both (Offline and Online) costs: evaluating the errors over the calibration sample requires 6, 13, 25 s, respectively, during the Offline stage, whereas the Online correction requires almost the same time in the case of REM-2 (7.5 · 10⁻⁴ s) and of REM-3 with the Bernoulli sign model (4 · 10⁻⁴ s); in the case of REM-3 with the nearest neighbor sign model, the cost varies from 4.75 · 10⁻⁴ s (Ncal = 50) to 1.05 · 10⁻³ s (Ncal = 200).

                                 n = 10                                  n = 40
                        µ3MAP    PI 95%             KL-dist     µ3MAP    PI 95%             KL-dist
FE                      1.9837   (1.8075, 2.1598)     –         1.9837   (1.8075, 2.1598)     –
RB                      1.8139   (1.5968, 2.0311)   1.2152      2.0645   (1.8892, 2.3114)   0.2806
REM-1     Ncal = 50     1.9192   (1.7872, 2.1104)   0.3524      1.9418   (1.8206, 2.2137)   0.0512
          Ncal = 100    1.8997   (1.5783, 2.2213)   0.3829      1.9618   (1.8357, 2.2300)   0.0675
          Ncal = 200    1.93393  (1.8535, 2.1022)   0.2464      1.9671   (1.8463, 2.2054)   0.0459
REM-2     Ncal = 50     1.9657   (1.8238, 2.2168)   0.0850      1.9752   (1.8175, 2.1902)   0.0430
          Ncal = 100    1.9881   (1.7608, 2.2150)   0.0555      1.9928   (1.7981, 2.2025)   0.0380
          Ncal = 200    1.9881   (1.8335, 2.1429)   0.0541      1.9782   (1.8076, 2.1735)   0.0344
REM-3     Ncal = 50     1.9713   (1.8260, 2.1166)   0.1075      1.9630   (1.8140, 2.1604)   0.0634
(near)    Ncal = 100    1.9917   (1.7623, 2.1898)   0.0516      1.9923   (1.8180, 2.2136)   0.0351
          Ncal = 200    1.9760   (1.8002, 2.1851)   0.0405      1.9779   (1.8196, 2.1572)   0.0337
REM-3     Ncal = 50     1.95873  (1.8637, 2.1482)   0.0947      1.9488   (1.8101, 2.1726)   0.0967
(BE)      Ncal = 100    1.9762   (1.8002, 2.1850)   0.0406      1.9665   (1.8138, 2.1370)   0.0344
          Ncal = 200    1.9814   (1.7929, 2.1699)   0.0377      1.9818   (1.8271, 2.1966)   0.0328

Table 2: MAP estimates, 95% prediction intervals and KL divergences from the posterior FE distribution obtained with a uniform prior, different RB dimensions n = 10, 40 and sizes Ncal = 50, 100, 200 of the calibration sample.

A REM correction thus proves to be necessary also when a more accurate RB approximation is considered. Nevertheless, the latter may entail substantially higher costs, especially when dealing with more complex problems than the one considered in this paper. In this respect, a larger size Ncal of the calibration sample may yield more accurate results without an excessive impact on the computational efficiency. In any case, both n and Ncal have to be chosen according to the problem at hand; from our experience, to reach the same level of accuracy it may be preferable to deal with less accurate (but cheaper) RB approximations and larger calibration samples; see, e.g., also the discussion reported in [7].

7.2 Test case 2

In order to consider a problem with higher parametric dimension P, we modify problem (40) by considering a parametric field description of the diffusion coefficient k(x): all the possible configurations of the field are generated from a standard multivariate Gaussian of dimension Nh (we also deal in this case with a discretization made of linear P1 finite elements). To reduce this complexity, we assume the field to be generated by a multivariate Gaussian distribution with covariance matrix
\[
C_{ij}=a\exp\left(-\frac{\|\mathbf{x}_i-\mathbf{x}_j\|^2}{2b^2}\right)+c\,\delta_{ij}\qquad\forall\, i,j=1,\dots,N_h,
\]
where a, b, c > 0 and {x_i}_{i=1}^{Nh} are the nodes of the computational mesh. A further simplification is obtained by considering a Karhunen-Loève expansion of the random field, which identifies the d most relevant (independent) eigenmodes ξ₁, ξ₂, ..., ξ_d of the covariance operator C corresponding to the largest eigenvalues λ₁(C) ≥ λ₂(C) ≥ ... ≥ λ_d(C). The same result can be achieved by computing the POD of a set of random fields generated according to the proposed distribution. The field description then reduces to

\[
k(\mathbf{x},\boldsymbol{\mu})=3.5+\sum_{i=1}^{d}\mu_i\sqrt{\lambda_i}\,\xi_i(\mathbf{x}),\tag{41}
\]
where µi, 1 ≤ i ≤ d, play the role of identifiable parameters (each one a priori distributed as a standard Gaussian). In this way the sampling is done in a d-dimensional space, with d ≪ Nh = 1,572. By taking a = 1, b = 0.6 and c = 10⁻⁸ for the covariance matrix, about 90% of the variance is explained by d = 4 modes (see Fig. 9, top). We also consider the presence of four nuisance parameters (see Fig. 9, bottom), which describe localized distortions zi of the parametric field given by

\[
\sum_{i=1}^{4}\mu_{4+i}\,z_i(\mathbf{x})=\sum_{i=1}^{4}\mu_{4+i}\exp\left(-\frac{(x-x_i)^2+(y-y_i)^2}{0.025}\right),\tag{42}
\]
where x = [0.25, 0.25, 1.25, 1.25] and y = [0.25, 1.25, 1.25, 0.25].
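The modes in (41) can be obtained by a direct eigendecomposition of the covariance matrix assembled on the mesh nodes; a minimal sketch, where nodes is a hypothetical (Nh, 2) array of mesh coordinates and the truncation d = 4 follows the choice made above:

    import numpy as np

    a, b, c, d = 1.0, 0.6, 1e-8, 4

    # Covariance matrix C_ij = a exp(-|x_i - x_j|^2 / (2 b^2)) + c delta_ij.
    diff = nodes[:, None, :] - nodes[None, :, :]
    C = a * np.exp(-np.sum(diff**2, axis=-1) / (2 * b**2)) + c * np.eye(len(nodes))

    # Eigenpairs of the symmetric matrix, sorted by decreasing eigenvalue.
    lam, xi = np.linalg.eigh(C)
    lam, xi = lam[::-1], xi[:, ::-1]

    def field(mu):
        # Truncated Karhunen-Loeve field (41) evaluated at the mesh nodes.
        return 3.5 + xi[:, :d] @ (mu[:d] * np.sqrt(lam[:d]))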

Figure 9: Top: the four most relevant modes ξi, i = 1, ..., 4, of the Karhunen-Loève expansion. Bottom: the four nuisance modes zi, i = 1, ..., 4.

In this case, the inverse problem consists in identifying the posterior distributions of µi, 1 ≤ i ≤ 4, by observing the outputs sj(µ), j = 1, 2, 3, in the presence of the four nuisance parameters µi, 5 ≤ i ≤ 8. We consider as target value s∗ = s(µ∗), where µ∗ = [γ∗, ζ∗], γ∗ = (−1, 0.5, −0.6, −0.8) and ζ∗ is chosen randomly. We build a RB approximation of the state problem and of the dual problems associated with each of the three outputs; n = 20 basis functions are selected for each of these four problems. This yields a reduction error on the outputs that has been treated with the proposed REMs, built on a calibration set of Ncal = 100 values.

We then apply the MCMC procedure and evaluate the effect of the proposed REMs on the posterior distributions. Conclusions similar to those reported in Sect. 7.1 can be drawn also for this second test case. As shown in Fig. 10, REM-2 is more effective than REM-1; moreover, REM-3 proves to be the most effective model in terms of correction of the RB posterior, possibly due to the increased parametric dimension (see Fig. 11 for the case of the Bernoulli sign model; similar results are obtained with the nearest neighbor sign model).

Figure 10: Marginal posterior distribution for the four parameters of interest obtained with the RB method and the REM-1, REM-2 corrections. REM-2 yields a better correction (and less skewed distributions) than REM-1.

We report the MAP estimates for each identifiable parameter and the KL divergences for each REM in Table 3; the 95% prediction intervals for the MAP estimates are reported in Table 4. These results confirm the efficacy of the deterministic corrections provided by REM-2 and REM-3.

Figure 11: Marginal posterior distribution for the four parameters of interest obtained with the RB method and REM-3 using a Bernoulli sign model. Both the deterministic (d) and the statistical (s) versions of the REM yield a very good correction of the posterior distribution.

Moreover, regarding the identification of µ3 and µ4, the presence of a statistical correction does not lead to an increase of the KL divergence values.

                    µ1MAP    DKL     µ2MAP    DKL     µ3MAP    DKL     µ4MAP    DKL
FE                  −0.987    –      0.482     –      −0.571    –      −0.743    –
RB                  −0.967   0.327   0.139    0.241   −0.132   0.849   −0.476   0.439
REM-1               −0.988   0.575   0.356    0.713   −0.504   0.722   −0.528   0.657
REM-2               −0.986   0.205   0.3047   0.132   −0.526   0.177   −0.732   0.121
REM-3 (near)        −0.997   0.288   0.584    0.166   −0.657   0.179   −0.831   0.142
REM-3 (near+var)    −0.953   0.530   0.614    0.409   −0.687   0.432   −0.852   0.309
REM-3 (BE)          −0.980   0.221   0.546    0.154   −0.632   0.195   −0.770   0.146
REM-3 (BE+var)      −0.997   0.753   0.488    0.330   −0.601   0.313   −0.760   0.390

Table 3: MAP estimates for the identifiable parameters and KL divergences from the FE posterior distributions.

                    µ1 PI 95%            µ2 PI 95%           µ3 PI 95%            µ4 PI 95%
FE                  (−1.02, −0.95)       (−0.52, 1.12)       (−0.89, −0.16)       (−1.23, −0.07)
RB                  (−1.032, −0.944)     (−0.624, 0.902)     (−0.864, 0.361)      (−0.967, 0.015)
REM-1               (−1.130, −0.938)     (−2.675, 3.000)     (−1.902, 0.335)      (−2.488, 1.433)
REM-2               (−1.045, −0.928)     (−1.392, 1.707)     (−1.182, 0.129)      (−1.321, 0.458)
REM-3 (near)        (−1.070, −0.924)     (−0.906, 2.135)     (−1.354, −0.021)     (−1.8094, 0.103)
REM-3 (near+var)    (−1.017, −0.890)     (−1.203, 2.371)     (−1.433, 0.119)      (−2.059, 0.396)
REM-3 (BE)          (−1.067, −0.892)     (−0.930, 2.021)     (−1.395, 0.133)      (−1.806, 0.266)
REM-3 (BE+var)      (−1.113, −0.881)     (−1.151, 2.127)     (−1.403, 0.200)      (−2.020, 0.502)

Table 4: 95% prediction intervals for each MAP estimator for the different REMs.

Finally, we evaluate the diffusivity field (41) for the different MAP estimates, recovered a posteriori by considering the different REMs (see Fig. 12). In order to quantify the difference between the target field and the fields obtained through the inversion procedure, we compute the L²(Ω)-norm of their difference (see Table 5). As expected, the most distant field is the one obtained with the RB method without corrections, while the presence of the REMs substantially improves the reconstruction of the field; the best performance in terms of parameter identification is provided by REM-3 based on a sign correction strategy.

               RB      REM-1   REM-2   REM-3 (near)   REM-3 (Be)
‖ · ‖L2(Ω)    0.328   0.146   0.099   0.060          0.045

Table 5: L2(Ω)-norm of the distance between the field reconstructed with the different REMs and the target one.

Figure 12: Diffusivity field reconstructed by means of the different REMs (parameter values identified with maximum a posteriori estimators).

Approximation data                test 1        test 2
Number of FE dofs Nh              1,056         1,572
Number of RB dofs n               10            20
Dofs reduction                    101:1         79:1
Number of parameters              3             8
FE solution                       4 · 10⁻² s    1.3 · 10⁻¹ s
Offline: basis computation        134 s         776 s
Offline: REM calibration          13 s          24 s
MCMC iterations                   5 · 10⁵       10⁶

Performances                      test 1        test 2
burn-in M                         500           1000
thinning d                        50            100
ROM solution                      2 · 10⁻⁴ s    3 · 10⁻⁴ s
ROM error bound                   4 · 10⁻⁴ s    8.2 · 10⁻⁴ s
REM-1                             ≈ 0 s         ≈ 0 s
REM-2                             7.5 · 10⁻⁴ s  1.32 · 10⁻³ s
REM-3 (near)                      7 · 10⁻⁴ s    1.12 · 10⁻³ s
REM-3 (Be)                        4 · 10⁻⁴ s    8.2 · 10⁻⁴ s

Table 6: Computational performances of the proposed framework for both test cases.

8 Conclusions

The combined RB/REM methodology proposed in this paper shows how to speed up the solution of an inverse UQ problem dealing with PDEs without affecting the accuracy of the posterior estimates. Provided the RB dimension n is sufficiently small, inexpensive online evaluations can be performed during the MCMC algorithm; the latter takes, e.g., less than 1 h in the first test case we considered, compared to more than 6 h when relying on the full-order model. On the other hand, the REM calibration – which has a negligible impact on the offline phase – is instrumental to avoid bias in the computed posterior distributions. Indeed, provided that the REM is trained on a sufficiently large calibration set, the small RB dimension can be compensated with a suitable REM strategy, like the proposed REM-2 or REM-3 approaches. The latter turns out to be a promising option also in view of higher-dimensional problems.

The general trend arising from all our numerical experiments is therefore that REM-1 may be treated as a catch-all method that can be used in the most general case, without access to any a posteriori error estimators, but which relies heavily on the assumption of Gaussianity and may not give good results when this assumption is strongly violated. On the other hand, REM-2 makes no a priori structural assumptions (except smoothness of the inverse effectivity function) and provides excellent accuracy on low-dimensional problems, but relies on multivariate interpolation that does not scale well to high-dimensional problems. REM-3 produces less accurate reduction error approximations, but only uses regression, so it is more robust in terms of parametric dimension; this method too can exploit existing a posteriori error estimators in correcting the posterior distributions.


References

[1] S. R. Arridge, J. P. Kaipio, V. Kolehmainen, M. Schweiger, E. Somersalo, T. Tarvainen, and V. Vauhkonen, Approximation errors and model reduction with an application in optical diffusion tomography, Inverse Problems, 22(1):175–195, 2006.

[2] I. Babuška, F. Nobile, and R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM Review, 52(2):317–355, 2010.

[3] H. T. Banks, M. L. Joyner, B. Wincheski, and W. P. Winfree, Nondestructive evaluation using a reduced-order computational methodology, Inverse Problems, 16(4):929, 2000.

[4] G. Berkooz, P. Holmes, and J. L. Lumley, The proper orthogonal decomposition in the analysis of turbulent flows, Annu. Rev. Fluid Mech., 25(1):539–575, 1993.

[5] H. J. Bungartz and M. Griebel, Sparse grids, Acta Numerica, 13:147–269, 2004.

[6] L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, Y. Marzouk, B. van Bloemen Waanders, and K. Willcox (Eds.), Large-Scale Inverse Problems and Quantification of Uncertainty, pages 123–149, John Wiley & Sons, Ltd, 2010.

[7] J. Biehler, M. W. Gee, and W. A. Wall, Towards efficient uncertainty quantification in complex and large-scale biomechanical problems based on a Bayesian multi-fidelity scheme, Biomech. Model. Mechanobiol., 14(3):489–513, 2015.

[8] A. Birolleau, G. Poëtte, and D. Lucor, Adaptive Bayesian inference for discontinuous inverse problems, application to hyperbolic conservation laws, Commun. Comput. Phys., 16(1):1–34, 2014.

[9] M. D. Buhmann, Radial basis functions, Acta Numerica, 9:1–38, 2000.

[10] T. Bui-Thanh and O. Ghattas, An analysis of infinite dimensional Bayesian inverse shape acoustic scattering and its numerical approximation, SIAM J. Uncert. Quant., submitted, 2012.

[11] M. Chevreuil and A. Nouy, Model order reduction based on proper generalized decomposition for the propagation of uncertainties in structural dynamics, Int. J. Numer. Meth. Engng., 89(2):241–268, 2012.

[12] T. Cui, Y. M. Marzouk, and K. Willcox, Data-driven model reduction for the Bayesian solution of inverse problems, arXiv:1403.4290 e-prints, 2014.

[13] M. Drohmann and K. C. Carlberg, The ROMES method for statistical modeling of reduced-order-model error, SIAM/ASA J. Uncer. Quantif., 3(1):116–145, 2015.

[14] D. Galbally, K. Fidkowski, K. Willcox, and O. Ghattas, Nonlinear model reduction for uncertainty quantification in large-scale inverse problems, Int. J. Numer. Methods Engrg., 81(12):1581–1608, 2010.

[15] R. G. Ghanem and P. D. Spanos, Stochastic Finite Elements: A Spectral Approach, Springer, 1991.

[16] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice, CRC Press, volume 2, 1995.

[17] J. C. Helton, J. D. Johnson, C. J. Sallaberry, and C. B. Storlie, Survey of sampling-based methods for uncertainty and sensitivity analysis, Reliab. Eng. Syst. Saf., 91(10):1175–1209, 2006.

[18] J. Jakeman, M. Eldred, and D. Xiu, Numerical approach for quantification of epistemic uncertainty, J. Comput. Phys., 229:4648–4663, 2010.

[19] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Springer Science+Business Media, Inc., 2005.

[20] T. Lassila, A. Manzoni, A. Quarteroni, and G. Rozza, Generalized reduced basis methods and n-width estimates for the approximation of the solution manifold of parametric PDEs, Bollettino UMI, Serie IX, Vol. VI(1):113–135, 2013.

[21] T. Lassila, A. Manzoni, A. Quarteroni, and G. Rozza, A reduced computational and geometrical framework for inverse problems in haemodynamics, Int. J. Numer. Methods Biomed. Engng., 29(7):741–776, 2013.

[22] T. Lassila, A. Manzoni, A. Quarteroni, and G. Rozza, Model order reduction in fluid dynamics: challenges and perspectives, in A. Quarteroni and G. Rozza, editors, Reduced Order Methods for Modeling and Computational Reduction, volume 9, pages 235–274, Springer, MS&A Series, 2014.

[23] Y. Li and Y. M. Marzouk, Adaptive construction of surrogates for the Bayesian solution of inverse problems, arXiv:1309.5524 e-prints, 2013.

[24] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer, New York, 2001.

[25] C. Lieberman, K. Willcox, and O. Ghattas, Parameter and state model reduction for large-scale statistical inverse problems, SIAM J. Sci. Comput., 32(5):2523–2542, 2010.

[26] A. Lipponen, A. Seppänen, and J. P. Kaipio, Reduced-order model for electrical impedance tomography based on proper orthogonal decomposition, arXiv:1207.0914 e-prints, 2012.

[27] A. Manzoni and F. Negri, Rigorous and heuristic strategies for the approximation of stability factors in nonlinear parametrized PDEs, Adv. Comput. Math., 41(5):1255–1288, 2015.

[28] J. Martin, L. C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM J. Sci. Comput., 34(3):A1460–A1487, 2012.

[29] Y. M. Marzouk and H. N. Najm, Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems, J. Comput. Phys., 228:1862–1902, 2009.

[30] Y. M. Marzouk and D. Xiu, A stochastic collocation approach to Bayesian inference in inverse problems, Commun. Comput. Phys., 6(4):826–847, 2009.

[31] L. W. T. Ng and M. S. Eldred, Multifidelity uncertainty quantification using nonintrusive polynomial chaos and stochastic collocation, paper AIAA-2012-1852 in Proceedings of the 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, 2012.

[32] A. Nissinen, V. Kolehmainen, and J. P. Kaipio, Reconstruction of domain boundary and conductivity in electrical impedance tomography using the approximation error approach, Int. J. Uncert. Quant., 1(3):203–222, 2011.

[33] A. Paul-Dubois-Taine and D. Amsallem, An adaptive and efficient greedy procedure for the optimal training of parametric reduced-order models, Int. J. Numer. Meth. Engng., 102(5):1262–1292, 2015.

[34] N. Petra, J. Martin, G. Stadler, and O. Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems, arXiv:1308.6221 e-prints, 2013.

[35] C. Prud'homme, D. V. Rovas, K. Veroy, L. Machiels, Y. Maday, A. T. Patera, and G. Turinici, Reliable real-time solution of parametrized partial differential equations: reduced-basis output bound methods, J. Fluid Eng., 124(1):70–80, 2002.

[36] A. Quarteroni, Numerical Models for Differential Problems, volume 8, Springer, MS&A Series, 2014.

[37] A. Quarteroni, A. Manzoni, and F. Negri, Reduced Basis Methods for Partial Differential Equations. An Introduction, volume 92, Springer, Unitext Series, 2016.

[38] A. Quarteroni, G. Rozza, and A. Manzoni, Certified reduced basis approximation for parametrized partial differential equations in industrial applications, J. Math. Ind., 3(1), 2011.

[39] O. Roderick, M. Anitescu, and Y. Peet, Proper orthogonal decompositions in multifidelity uncertainty quantification of complex simulation models, Int. J. Comput. Math., 91(4):748–769, 2014.

[40] C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 2004.

[41] G. Rozza, D. B. P. Huynh, and A. T. Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations, Arch. Comput. Methods Engrg., 15:229–275, 2008.

[42] Y. Serinagaoglu, D. H. Brooks, and R. S. MacLeod, Improved performance of Bayesian solutions for inverse electrocardiography using multiple information sources, IEEE Trans. Biomed. Engng., 53(10):2024–2034, 2006.

[43] L. Sirovich, Turbulence and the dynamics of coherent structures, Part I: coherent structures, Quart. Appl. Math., 45(3):561–571, 1987.

[44] A. M. Stuart, Inverse problems: a Bayesian perspective, Acta Numer., 19(1):451–559, 2010.

[45] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, Society for Industrial and Applied Mathematics, 2004.

[46] K. Veroy, C. Prud'homme, D. V. Rovas, and A. T. Patera, A posteriori error bounds for reduced basis approximation of parametrized noncoercive and nonlinear elliptic partial differential equations, in Proceedings of the 16th AIAA Computational Fluid Dynamics Conference, 2003, Paper 2003-3847.

[47] D. Xiu and J. S. Hesthaven, High-order collocation methods for differential equations with random inputs, SIAM J. Sci. Comput., 27(3):1118–1139, 2005.
