

Multilevel Double Loop Monte Carlo and Stochastic Collocation Methods with Importance Sampling for Bayesian Optimal Experimental Design

Joakim Beck∗1, Ben Mansour Dia†2, Luis F.R. Espath‡1, and Raúl Tempone§1,3

1King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Science and Engineering Division (CEMSE), Thuwal 23955-6900, Saudi Arabia
2Center for Integrative Petroleum Research (CIPR), College of Petroleum Engineering and Geosciences, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
3Alexander von Humboldt Professor in Mathematics of Uncertainty Quantification, RWTH Aachen University, 52062 Aachen, Germany

February 4, 2020

Abstract

An optimal experimental set-up maximizes the value of data for statistical inferences and predictions. The efficiency of strategies for finding optimal experimental set-ups is particularly important for experiments that are time-consuming or expensive to perform. For instance, in the situation when the experiments are modeled by Partial Differential Equations (PDEs), multilevel methods have been proven to dramatically reduce the computational complexity of their single-level counterparts when estimating expected values. For a setting where PDEs can model experiments, we propose two multilevel methods for estimating a popular design criterion known as the expected information gain in simulation-based Bayesian optimal experimental design. The expected information gain criterion is of a nested expectation form, and only a handful of multilevel methods have been proposed for problems of such form. We propose a Multilevel Double Loop Monte Carlo (MLDLMC), which is a multilevel strategy with Double Loop Monte Carlo (DLMC), and a Multilevel Double Loop Stochastic Collocation (MLDLSC), which performs a high-dimensional integration by deterministic quadrature on sparse grids. For both methods, the Laplace approximation is used for importance sampling that significantly reduces the computational work of estimating inner expectations. The optimal values of the method parameters are determined by minimizing the average computational work, subject to satisfying the desired error tolerance. The computational efficiencies of the methods are demonstrated by estimating the expected information gain for Bayesian inference of the fiber orientation in composite laminate materials from an electrical impedance tomography experiment. MLDLSC performs better than MLDLMC when the regularity of the quantity of interest, with respect to the additive noise and the unknown parameters, can be exploited.

Keywords: Electrical impedance tomography, Expected information gain, Importance sampling, Multilevel, Stochastic collocation

AMS 2010 subject classification: 62K05, 65N21, 65C60, 65C05

∗[email protected]  †[email protected]  ‡[email protected]  §[email protected]


1 Introduction

Experiments are meant to provide meaningful information about selected quantities of interest. An experiment may assume different set-ups in a broad sense, and can be time-consuming or expensive to perform. Therefore, the design of experiments plays an important role in improving the information gain of the experiment; a comprehensive review of utility functions and their computational algorithms for Bayesian optimal experimental design is available in [33]. Bayesian optimal experimental design involves the task of designing experiments with the objective of maximizing the value of data for solving inverse problems, in a Bayesian sense. Recent work on Bayesian alphabetical optimal experimental design includes [1, 2, 3, 12, 39, 42]. A widely popular Bayesian information-theoretic utility function for nonlinear models, known as Expected Information Gain (EIG), was introduced in 1956 by Lindley [27] for measuring the amount of information provided by an experiment. Lindley defined the EIG utility function as the average, with respect to our prior knowledge expressed through a prior probability density function (pdf), of the relative information entropy, based on the Shannon information entropy [37], of the prior pdf with respect to the posterior pdf. The prior and posterior pdfs express our knowledge before and after performing the experiment, respectively. The relative information entropy is equivalent to the Kullback-Leibler divergence [24, 25] of the prior pdf from the posterior pdf. The EIG criterion is computationally challenging to compute since it is a nested expectation of the form $\mathrm{E}[f_1(X_1)/\mathrm{E}[f_2(X_1,X_2)|X_1]]$, where $f_i$ are real-valued functions and $X_i$ are random variables. As an alternative to direct estimation of EIG, a lower bound estimate can be used as a design criterion, as proposed by Tsilifis et al. [41]; it provides an approximate solution to the original design problem and requires less computational work to solve. In the present work, the goal is to develop computationally efficient estimators of the EIG criterion that satisfy a specified accuracy requirement. The subsequent, albeit important, task of efficiently maximizing the expected information gain in order to find the most informative experimental set-up is beyond the scope of our study. Efficient optimization strategies on continuous design spaces include stochastic gradient methods (e.g., [9, 22, 23]) and the approximate coordinate exchange algorithm (e.g., [30, 32]).

Ryan [35] applied a Double Loop Monte Carlo (DLMC) estimator for the EIG criterion, which entails applying Monte Carlo (MC) sampling for estimating the outer expectation and, for each outer sample, MC sampling to estimate the inner expectation. The DLMC estimator is highly computationally taxing and yields a bias due to the inner averaging. The computational work required by DLMC can be reduced by approximating the inner expectation by the Laplace method [26, 28, 29] instead of MC sampling, but this comes at the price of a bias that is challenging to control. An alternative to the Laplace method is the Laplace-based importance sampling presented in [34], which dramatically improves the sampling efficiency of the inner MC sampling of DLMC. In [6], an optimization strategy for estimating the EIG criterion with error control was derived for DLMC with Laplace-based Importance Sampling (DLMCIS), which for a desired error tolerance minimizes the computational work; it was shown that the estimator requires vastly fewer inner-loop samples than standard DLMC, typically by orders of magnitude, for the same accuracy. Huan et al. [23] approximated the EIG by standard DLMC after first replacing the underlying model's input-output mapping by a polynomial approximation, using polynomial chaos expansions with pseudo-spectral projection. This approach requires substantially fewer model evaluations than DLMC, but the approximation error is neither fully estimated nor controlled.

In this work, we improve upon DLMCIS [6] by employing multilevel techniques [15]. Multilevel Monte Carlo (MLMC) estimators (e.g., [10, 15, 19]) have been widely used for estimating expectations of quantities of interest that depend on the solution of Partial Differential Equations (PDEs), as they accelerate the computation of expectations by using control variates, based on successive differences over a sequence of increasingly refined meshes, to reduce the variance of MC estimators. The EIG criterion is of a nested expectation form, and only a few multilevel methods have been proposed for quantities of interest of such form; see, e.g., Section 9 of [15] for nested MLMC simulation and


[16] for nested MLMC for efficient risk estimation. We propose two multilevel estimators for EIG: Multilevel Double Loop Monte Carlo (MLDLMC) and Multilevel Double Loop Stochastic Collocation (MLDLSC). MLDLMC uses a multilevel strategy with DLMCIS, where the level defines the resolution of the numerical discretization of the underlying model and the number of inner-loop samples. As for the work complexity of MLDLMC, we provide an upper bound on the total work for a prescribed desired accuracy. A recent work by Goda et al. [17] proposed, independently of this work, an MLMC estimator using the Laplace-based importance sampling for the EIG criterion. They propose an MLMC estimator with an antithetic technique in which the number of inner-loop samples defines the level. The problem setting of [17] differs significantly from ours, as we consider the case where the solution to the underlying model needs to be approximated instead of being known exactly. Hence, our multilevel methods have a multilevel hierarchy that controls the number of inner samples as well as the mesh discretization. MLDLSC, the other proposed method, uses Multi-Index Stochastic Collocation (MISC) [18] for the outer expectation of EIG and full-tensor stochastic collocation, see, e.g., [4, 8], for approximating the inner expectations. MLDLSC provides error control of the quantity of interest and reduces the computational work for a given accuracy requirement, compared to [23], as it combines model evaluations at different mesh resolutions, with only a small number of those evaluations on the fine meshes that are more expensive to evaluate. MLDLSC uses the Laplace-based importance sampling [6] for the inner expectation.

To assess the computational efficiency of our proposed methods, we consider an electrical impedance tomography (EIT) problem in which we infer the angle of fibers in a composite laminate material. The composite laminate has four plies, and five electrodes are deployed on each side of the plate. Each ply is an orthotropic layer with its fibers uniformly distributed along one predetermined direction. The electrodes inject electrical current and measure the electrical potential, which in turn is used to infer the material properties; we adopt the complete electrode model (CEM) [38] to simulate the EIT experiments for composite laminate materials. The goal of the experiment is to gain information about the fiber orientations in the composite laminate material from the measured potential. MLDLMC and MLDLSC are applied to efficiently estimate the EIG for a given experimental set-up.

The outline of the paper is as follows. In Section 2.1, we present the EIG criterion and the underlying data model assumption. In Section 2.2, we detail the numerical discretization approximation of the EIG. The Laplace-based importance sampling used in both of the proposed multilevel methods is given in Section 2.3. The MLDLMC estimator is presented in Section 3.1. Then, in Section 3.2, the bias, variance, and work of DLMCIS are analyzed and used in Section 3.3 to determine suitable values for the method parameters of the MLDLMC estimator by minimizing the computational work for a specified accuracy requirement. In Section 3.4, the work complexity of MLDLMC, with respect to a desired error tolerance, is compared to that of standard MLMC. The proposed MLDLSC is described in Section 4. Finally, in Section 5, we provide a numerical comparison of the computational performances of the two methods, MLDLMC and MLDLSC, for estimating the expected information gain for an EIT experiment.

2 Problem setting

2.1 Bayesian optimal experimental design

In this work, we consider the data model

\[
Y(\theta_t, \xi) = G(\theta_t, \xi) + \boldsymbol{\varepsilon}, \tag{1}
\]


where $Y := (y_1, \dots, y_i, \dots, y_{N_e})$, $y_i \in \mathbb{R}^q$ are observed experiment responses, $N_e$ is the number of repeated experiments, $G(\theta_t,\xi) := g(\theta_t,\xi)\,\mathbf{1}$ with $\mathbf{1} := (1, \dots, 1)$, $g(\theta_t,\xi) \in \mathbb{R}^q$ is the column vector of forward model outputs, $\theta_t \in \mathbb{R}^d$ is the true parameter, $\xi \in \Xi$ is the design parameter, $\Xi$ is the experimental design space, and $\boldsymbol{\varepsilon} := (\epsilon_1, \dots, \epsilon_i, \dots, \epsilon_{N_e})$, where $\epsilon_i \in \mathbb{R}^q$ are independent and identically distributed (i.i.d.) zero-mean Gaussian errors with covariance matrix $\Sigma_\epsilon$, that is, $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \Sigma_\epsilon)$, with distribution $\pi_\epsilon(\boldsymbol{\varepsilon}) = \prod_{i=1}^{N_e} \pi_{\epsilon_i}(\epsilon_i)$.

We consider the case when the parameter $\theta_t$ is unknown. To this end, we treat $\theta_t$ as a random parameter, $\theta \in \Theta$, with prior distribution $\pi(\theta)$, defined on the space $\Theta \subseteq \mathbb{R}^d$.

The goal of Bayesian optimal experimental design is to determine the optimal set-up of an experiment, as defined by the design parameter $\xi$, for Bayesian inference of $\theta_t$. The information gain for a given experimental design, $\xi$, is measured by the Kullback-Leibler divergence [24], which is based on the Shannon entropy [37]. The Kullback-Leibler divergence, denoted by $D_{KL}(\pi(\theta|Y,\xi)\,\|\,\pi(\theta))$, is a distance measure between the prior $\pi(\theta)$ and posterior $\pi(\theta|Y,\xi)$ pdfs, i.e.,
\[
D_{KL}(\pi(\theta|Y,\xi)\,\|\,\pi(\theta)) := \int_\Theta \pi(\theta|Y,\xi) \log\left(\frac{\pi(\theta|Y,\xi)}{\pi(\theta)}\right) \mathrm{d}\theta. \tag{2}
\]

The larger the value of $D_{KL}$, the more informative the given experiment is about the unknown parameter $\theta_t$. The information gains for different designs, $\xi$, are mutually independent of each other. Henceforth, we omit dependences on $\xi$ for the sake of conciseness. Since $Y$ is not available to us during the design selection, we work with the expected value of $D_{KL}$,

\[
I := \int_{\mathcal{Y}} D_{KL}(\pi(\theta|Y)\,\|\,\pi(\theta))\, p(Y)\, \mathrm{d}Y = \int_{\mathcal{Y}} \int_\Theta \log\left(\frac{\pi(\theta|Y)}{\pi(\theta)}\right) \pi(\theta|Y)\, \mathrm{d}\theta\; p(Y)\, \mathrm{d}Y
= \int_\Theta \int_{\mathcal{Y}} \log\left(\frac{p(Y|\theta)}{p(Y)}\right) p(Y|\theta)\, \mathrm{d}Y\; \pi(\theta)\, \mathrm{d}\theta, \tag{3}
\]

which is also known as the Expected Information Gain (EIG) [27]; this is the design criterion considered in this work for Bayesian optimal experimental design. The latter equality follows from Bayes' rule, and $p(Y)$ denotes the pdf of $Y$ over the support $\mathcal{Y} := \mathbb{R}^{q \times N_e}$. In accordance with the data model (1), the likelihood, denoted by $p(Y|\theta)$, is

\[
p(Y|\theta) := \det(2\pi\Sigma_\epsilon)^{-\frac{N_e}{2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \|y_i - g(\theta)\|^2_{\Sigma_\epsilon^{-1}}\right),
\]
where the matrix norm is $\|x\|^2_{\Sigma_\epsilon^{-1}} = x^T \Sigma_\epsilon^{-1} x$ for a vector $x$ and covariance matrix $\Sigma_\epsilon$. For notational convenience, we introduce the conditional expectation,

\[
Z(\theta) := \mathrm{E}[f(Y,\theta)|\theta] = \int_{\mathcal{Y}} \log\left(\frac{p(Y|\theta)}{p(Y)}\right) p(Y|\theta)\, \mathrm{d}Y
= \int_{\mathcal{E}} \log\left(\frac{\pi_\epsilon(\boldsymbol{\varepsilon})}{\int_\Theta \det(2\pi\Sigma_\epsilon)^{-\frac{N_e}{2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \|g(\theta) + \epsilon_i - g(\theta')\|^2_{\Sigma_\epsilon^{-1}}\right) \pi(\theta')\, \mathrm{d}\theta'}\right) \pi_\epsilon(\boldsymbol{\varepsilon})\, \mathrm{d}\boldsymbol{\varepsilon}, \tag{4}
\]

where
\[
f(Y,\theta) := \log\left(\frac{p(Y|\theta)}{p(Y)}\right). \tag{5}
\]

Using (4), we formulate the EIG criterion (3) as
\[
I = \mathrm{E}[Z(\theta)]. \tag{6}
\]
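The nested structure of (6) is what makes the EIG costly: every outer sample requires an inner integration for the evidence $p(Y)$. The following minimal sketch illustrates a plain Double Loop Monte Carlo (DLMC) evaluation of (6); the forward model, the prior, and all problem dimensions are hypothetical placeholders (not the EIT model of Section 5), and the constant $\det(2\pi\Sigma_\epsilon)^{-N_e/2}$ is omitted since it cancels in the log-ratio (5).

```python
import numpy as np

q, d, N_e = 3, 2, 2                          # hypothetical problem dimensions
Sigma_eps = 1e-2 * np.eye(q)                 # noise covariance Sigma_eps
Sigma_inv = np.linalg.inv(Sigma_eps)
A = np.array([[1.0, 0.3], [0.2, 1.0], [0.5, -0.4]])  # fixed (q, d) matrix

def g(theta):                                # hypothetical forward model in R^q
    return A @ np.tanh(theta)

def log_lik(Y, theta):                       # log p(Y|theta), up to the constant
    r = Y - g(theta)[None, :]                # that cancels in the ratio (5)
    return -0.5 * np.einsum('ij,jk,ik->', r, Sigma_inv, r)

def dlmc_eig(N, M, rng):
    acc = 0.0
    for _ in range(N):                       # outer loop over (theta, eps)
        theta = rng.uniform(-1.0, 1.0, size=d)
        Y = g(theta)[None, :] + rng.multivariate_normal(
            np.zeros(q), Sigma_eps, size=N_e)
        # inner loop: p(Y) ~ (1/M) sum_m p(Y|theta'_m), theta'_m ~ prior;
        # with few good inner samples this mean is prone to underflow,
        # which motivates the importance sampling of Section 2.3
        inner = np.array([log_lik(Y, rng.uniform(-1.0, 1.0, size=d))
                          for _ in range(M)])
        acc += log_lik(Y, theta) - np.log(np.mean(np.exp(inner)))
    return acc / N

print(dlmc_eig(N=100, M=100, rng=np.random.default_rng(0)))
```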


2.2 Numerical approximation of expected information gain

We consider the situation when the model $g$ needs to be evaluated by a numerical approximation through discretization in space of $g$, denoted by $g_\ell$, with an accuracy controlled by the mesh-element size, denoted by $h_\ell > 0$. We consider a sequence of such discretization-based approximations, $\{g_\ell\}_{\ell=0}^\infty$, with decreasing mesh-element size, i.e., $h_\ell < h_{\ell-1}$.

Assumption 1 (Convergence properties of $g_\ell$).

[Weak convergence] $\exists\, C_w > 0,\ \eta_w > 0$ such that
\[
\|\mathrm{E}[g_\ell(\theta) - g(\theta)]\|_{\Sigma_\epsilon^{-1}} \le C_w h_\ell^{\eta_w}, \tag{7}
\]
for all $\ell$ as $h_\ell \to 0$.

[Strong convergence] For $p \ge 2$, $\exists\, C_s > 0,\ \eta_s > 0$ such that
\[
\mathrm{E}\left[\|g_\ell(\theta) - g_{\ell-1}(\theta)\|^p_{\Sigma_\epsilon^{-1}}\right]^{\frac{1}{p}} \le C_s h_\ell^{\eta_s}, \tag{8}
\]
for all $\ell > 0$ as $h_\ell \to 0$.

[Average computational work] $\exists\, \gamma > 0$ such that
\[
W(g_\ell) \propto h_\ell^{-\gamma}, \tag{9}
\]
with respect to $\ell$ as $h_\ell \to 0$, wherein $W(\cdot)$ denotes the average computational work.

Assumption 1 states some necessary assumptions on $g_\ell$ that will be used later on. Furthermore, $g_\ell$ needs to be twice differentiable with respect to $\theta$ and uniformly bounded by some constant independent of $\ell$. Since we have assumed that we only work with the approximations of $g$, let us introduce the data model
\[
Y = G_\ell(\theta) + \boldsymbol{\varepsilon}, \tag{10}
\]
where $G_\ell(\theta) := g_\ell(\theta)\,\mathbf{1}$ approximates the forward model $G(\theta)$. The approximate EIG criterion (6), given the approximate data model (10), is defined as follows:
\[
I_\ell := \int_\Theta Z_\ell(\theta)\, \pi(\theta)\, \mathrm{d}\theta, \tag{11}
\]

where
\[
Z_\ell(\theta) := \int_{\mathcal{Y}} f_\ell(Y,\theta)\, p_\ell(Y|\theta)\, \mathrm{d}Y
= \int_{\mathcal{E}} \log\left(\frac{\pi_\epsilon(\boldsymbol{\varepsilon})}{\int_\Theta \det(2\pi\Sigma_\epsilon)^{-\frac{N_e}{2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \|g_\ell(\theta) + \epsilon_i - g_\ell(\theta')\|^2_{\Sigma_\epsilon^{-1}}\right) \pi(\theta')\, \mathrm{d}\theta'}\right) \pi_\epsilon(\boldsymbol{\varepsilon})\, \mathrm{d}\boldsymbol{\varepsilon}, \tag{12}
\]

from using $g_\ell$ in $Z(\theta)$, and the log-ratio $f$ is approximated by
\[
f_\ell(Y,\theta) := \log\left(\frac{p_\ell(Y|\theta)}{p_\ell(Y)}\right). \tag{13}
\]

We define an approximate likelihood by
\[
p_\ell(Y|\theta) := \det(2\pi\Sigma_\epsilon)^{-\frac{N_e}{2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \|y_i - g_\ell(\theta)\|^2_{\Sigma_\epsilon^{-1}}\right), \tag{14}
\]

and we approximate the evidence, for any $Y$ satisfying the approximate data model (10), as
\[
p_\ell(Y) := \int_\Theta p_\ell(Y|\theta')\, \pi(\theta')\, \mathrm{d}\theta' = \int_\Theta \det(2\pi\Sigma_\epsilon)^{-\frac{N_e}{2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \left\|g_\ell(\theta) + \epsilon_i - g_\ell(\theta')\right\|^2_{\Sigma_\epsilon^{-1}}\right) \pi(\theta')\, \mathrm{d}\theta'.
\]


2.3 Laplace-based importance sampling in the expected information gain

Whenever the posterior distribution, $\pi_\ell(\theta|Y) = p_\ell(Y|\theta)\pi(\theta)/p_\ell(Y)$, can be well approximated by a multivariate normal distribution, we advocate using the Laplace-based importance sampling [6, 34]. More specifically, we introduce an importance sampling distribution, denoted by $\tilde{\pi}_\ell(\theta|Y)$, to compute the approximate evidence $p_\ell(Y)$ as follows:

\[
p_\ell(Y) = \int_\Theta p_\ell(Y|\theta')\, \pi(\theta')\, \mathrm{d}\theta' = \int_\Theta p_\ell(Y|\theta')\, R_\ell(\theta'; Y)\, \tilde{\pi}_\ell(\theta'|Y)\, \mathrm{d}\theta'
\]
\[
= \int_\Theta \det(2\pi\Sigma_\epsilon)^{-\frac{N_e}{2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \left\|g_\ell(\theta) + \epsilon_i - g_\ell(\theta')\right\|^2_{\Sigma_\epsilon^{-1}}\right) R_\ell(\theta'; G_\ell(\theta) + \boldsymbol{\varepsilon})\, \tilde{\pi}_\ell(\theta'|G_\ell(\theta) + \boldsymbol{\varepsilon})\, \mathrm{d}\theta', \tag{15}
\]

where the ratio, $R_\ell$, is
\[
R_\ell(\theta; Y) := \pi(\theta)/\tilde{\pi}_\ell(\theta|Y). \tag{16}
\]

The Laplace-based importance sampling measure, $\tilde{\pi}_\ell$, is a multivariate normal pdf, denoted by $\mathcal{N}(\hat{\theta}_\ell(Y), \Sigma_\ell(\hat{\theta}_\ell(Y)))$, i.e.,
\[
\tilde{\pi}_\ell(\theta|Y) := \det\left(2\pi\Sigma_\ell(\hat{\theta}_\ell(Y))\right)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}\left\|\theta - \hat{\theta}_\ell(Y)\right\|^2_{\Sigma_\ell^{-1}(\hat{\theta}_\ell(Y))}\right), \tag{17}
\]

where $\hat{\theta}_\ell(Y)$ is the maximum a posteriori (MAP) estimate,
\[
\hat{\theta}_\ell(Y) := \arg\min_{\theta \in \Theta} \left[\frac{1}{2}\sum_{i=1}^{N_e} \|y_i - g_\ell(\theta)\|^2_{\Sigma_\epsilon^{-1}} - \log(\pi(\theta))\right], \tag{18}
\]

and, as shown in [28], the covariance is the inverse Hessian matrix of the negative logarithm of the posterior pdf,
\[
\Sigma_\ell(\theta) := \left(N_e J_\ell(\theta)^T \Sigma_\epsilon^{-1} J_\ell(\theta) - \nabla_\theta \nabla_\theta \log(\pi(\theta))\right)^{-1} + \mathcal{O}_P\left(\frac{1}{\sqrt{N_e}}\right), \tag{19}
\]
where $J_\ell(\theta) := -\nabla_\theta g_\ell(\theta)$. Note that $\hat{\theta}_\ell$ depends on the data $Y$. Moreover, (19) says that the larger the number of repetitive experiments $N_e$, the more accurately we can approximate the covariance $\Sigma_\ell(\hat{\theta}_\ell(Y))$ of the importance-sampling pdf, $\tilde{\pi}_\ell$.
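To make the change of measure concrete, the sketch below estimates the evidence via (15)-(18) for the same hypothetical toy model as above: the MAP estimate (18) is found numerically, the covariance is the Gauss-Newton part of (19) (the flat prior contributes no Hessian term), and the importance weights are the ratios $R_\ell$ of (16). The model, prior, and dimensions are again illustrative assumptions, not the paper's EIT model.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
q, d, N_e = 3, 2, 2
Sigma_eps = 1e-2 * np.eye(q)
Sigma_inv = np.linalg.inv(Sigma_eps)
A = np.array([[1.0, 0.3], [0.2, 1.0], [0.5, -0.4]])  # fixed (q, d) matrix

def g(theta):                                  # hypothetical forward model
    return A @ np.tanh(theta)

def jac(theta):                                # J = dg/dtheta, shape (q, d)
    return A * (1.0 / np.cosh(theta) ** 2)

def log_lik(Y, theta):                         # up to the constant cancelling in (5)
    r = Y - g(theta)[None, :]
    return -0.5 * np.einsum('ij,jk,ik->', r, Sigma_inv, r)

def log_prior(theta):                          # U(-1, 1)^d prior
    return d * np.log(0.5) if np.all(np.abs(theta) <= 1.0) else -np.inf

def log_evidence_laplace_is(Y, theta_init, M):
    # MAP estimate (18), initialized at the parameter that generated Y (Remark 1)
    neg_log_post = lambda th: -(log_lik(Y, th) + log_prior(th))
    th_map = minimize(neg_log_post, theta_init, method='Nelder-Mead').x
    J = jac(th_map)                            # Gauss-Newton covariance, cf. (19)
    Sigma_post = np.linalg.inv(N_e * J.T @ Sigma_inv @ J)
    thetas = rng.multivariate_normal(th_map, Sigma_post, size=M)
    diffs = thetas - th_map                    # log of the Gaussian proposal (17)
    log_prop = -0.5 * (np.einsum('ij,ij->i', diffs @ np.linalg.inv(Sigma_post), diffs)
                       + np.log(np.linalg.det(2.0 * np.pi * Sigma_post)))
    # log importance weights log(p(Y|theta_m) R_m), cf. (15)-(16)
    logs = np.array([log_lik(Y, th) + log_prior(th) for th in thetas]) - log_prop
    return np.log(np.mean(np.exp(logs)))

theta = rng.uniform(-1.0, 1.0, size=d)
Y = g(theta)[None, :] + rng.multivariate_normal(np.zeros(q), Sigma_eps, size=N_e)
# one sample of the integrand f in (5), using only M = 16 inner samples
print(log_lik(Y, theta) - log_evidence_laplace_is(Y, theta, M=16))
```

Because the proposal is centered on the MAP and scaled by the Laplace covariance, a handful of inner samples already yields a stable evidence estimate, in contrast to the prior-sampling inner loop of the previous sketch.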

3 Multilevel Double Loop Monte Carlo

The standard MLMC [14, 20] has been widely applied and extended to various problems [15]. The idea behind multilevel methods is to not compute the expectation of the quantity of interest using only $g_\ell$ for a fine mesh-element size $h_\ell$, but instead to reduce the computational work complexity by distributing the computations over a sequence of $L+1$ mesh-element sizes, $\{h_\ell\}_{\ell=0}^L$, from coarse to fine meshes, and then combine the results. Multilevel methods distribute the computational workload such that the majority of the model evaluations are on the coarser meshes. The classical choice of decreasing sequence is
\[
h_\ell := \beta^{-\ell} h_0, \quad \text{for some } \beta \in \mathbb{N}^+; \tag{20}
\]
typically $\beta = 2$, i.e., the size is progressively halved with increasing $\ell$, with $h_0$ being the coarsest mesh-element size considered.


3.1 Multilevel Double Loop Monte Carlo (MLDLMC) estimator

The approximate EIG (11) at level $L$ can be written as the telescopic sum
\[
I_L = \sum_{\ell=0}^{L} \mathrm{E}[\Delta Z_\ell(\theta)], \tag{21}
\]
where the index $\ell$ is here referred to as the "level," and
\[
\Delta Z_\ell(\theta) :=
\begin{cases}
Z_\ell(\theta) - Z_{\ell-1}(\theta), & \text{if } \ell > 0,\\
Z_\ell(\theta), & \text{if } \ell = 0.
\end{cases} \tag{22}
\]

The function $Z_\ell$ depends on $f_\ell$, (12), and, in turn, $f_\ell$ depends on the approximate evidence $p_\ell$ (15). We estimate the approximate evidence by MC sampling with the Laplace-based importance sampling described in Section 2.3, i.e., for each $\theta \overset{\text{i.i.d.}}{\sim} \pi(\theta)$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,\Sigma_\epsilon)$, sample $Y = G_\ell(\theta) + \boldsymbol{\varepsilon}$, then compute the approximate evidence as follows:
\[
\hat{p}_\ell(Y, \theta; \{\theta_m\}) := \frac{1}{M_\ell} \sum_{m=1}^{M_\ell} p_\ell(Y|\theta_m)\, R_\ell(\theta_m; Y) \approx p_\ell(Y), \tag{23}
\]
where $\theta_m \overset{\text{i.i.d.}}{\sim} \tilde{\pi}_\ell(\theta|Y) = \mathcal{N}(\hat{\theta}_\ell(Y), \Sigma(\hat{\theta}_\ell(Y)))$ from expressions (17) and (18), the number of samples is denoted by $M_\ell$, and $R_\ell(\theta; Y)$ is given in (16). As in standard MLMC [14, 15], we apply sample averaging to the $L+1$ telescopic differences of conditional expectations to obtain an MLDLMC estimator for the EIG criterion defined in (3). Let us introduce

\[
\hat{f}_\ell(Y, \theta; \{\theta_m\}) := \log\left(\frac{p_\ell(Y|\theta)}{\hat{p}_\ell(Y, \theta; \{\theta_m\})}\right). \tag{24}
\]

Then, by using the approximate EIG (21), with data following (10) for each level $\ell$, the MLDLMC estimator for the EIG criterion (3) reads
\[
I_{\text{MLDLMC}} := \frac{1}{N_0} \sum_{n=1}^{N_0} \hat{f}_0\big(Y^{(0)}_{0,n}, \theta_{0,n}; \{\theta_{0,n,m}\}_{m=1}^{M_0}\big)
+ \sum_{\ell=1}^{L} \frac{1}{N_\ell} \sum_{n=1}^{N_\ell} \Big[\hat{f}_\ell\big(Y^{(\ell)}_{\ell,n}, \theta_{\ell,n}; \{\theta_{\ell,n,m}\}_{m=1}^{M_\ell}\big) - \hat{f}_{\ell-1}\big(Y^{(\ell-1)}_{\ell,n}, \theta_{\ell,n}; \{\theta_{\ell,n,m}\}_{m=1}^{M_{\ell-1}}\big)\Big], \tag{25}
\]

where $Y^{(k)}_{\ell,n} = G_k(\theta_{\ell,n}) + \boldsymbol{\varepsilon}_{\ell,n} \overset{\text{i.i.d.}}{\sim} p_k(Y|\theta_{\ell,n})$, $\theta_{\ell,n} \overset{\text{i.i.d.}}{\sim} \pi(\theta)$, $\theta_{k,n,m} \overset{\text{i.i.d.}}{\sim} \tilde{\pi}_k(\theta|Y^{(k)}_{\ell,n}) = \mathcal{N}\big(\hat{\theta}(Y^{(k)}_{\ell,n}), \Sigma(\hat{\theta}(Y^{(k)}_{\ell,n}))\big)$, and $\{\theta_{\ell,n,m}\}_{m=1}^{M_{\ell-1}} \subseteq \{\theta_{\ell,n,m}\}_{m=1}^{M_\ell}$. Here the superscript of $Y^{(k)}$ indicates that the data $Y$ depends on $p_k$, as defined in (14).
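The estimator (25) couples the two terms of each level difference through a common outer sample $(\theta_{\ell,n}, \boldsymbol{\varepsilon}_{\ell,n})$ and a nested inner sample set. The sketch below shows this coupling structure on a hypothetical model hierarchy whose level-$\ell$ perturbation mimics an $\mathcal{O}(h_\ell)$ discretization error; for brevity, the inner evidence here uses plain prior sampling rather than the Laplace-based importance sampling of Section 2.3.

```python
import numpy as np

rng = np.random.default_rng(0)
q, d, N_e = 3, 2, 1
Sigma_eps = 1e-2 * np.eye(q)
Sigma_inv = np.linalg.inv(Sigma_eps)
A = np.array([[1.0, 0.3], [0.2, 1.0], [0.5, -0.4]])  # fixed (q, d) matrix

def g_ell(theta, l):
    # hypothetical level-l model; the factor mimics an O(h_l) discretization error
    return (1.0 + 0.2 * 2.0 ** -l) * (A @ np.tanh(theta))

def log_lik(Y, theta, l):
    r = Y - g_ell(theta, l)[None, :]
    return -0.5 * np.einsum('ij,jk,ik->', r, Sigma_inv, r)

def f_hat(l, theta, eps, inner):          # log-ratio (24) at level l
    Y = g_ell(theta, l)[None, :] + eps    # same eps is reused at levels l and l-1
    logs = np.array([log_lik(Y, th, l) for th in inner])
    return log_lik(Y, theta, l) - np.log(np.mean(np.exp(logs)))

def mldlmc(L, N, M):
    est = 0.0
    for l in range(L + 1):
        s = 0.0
        for _ in range(N[l]):
            theta = rng.uniform(-1.0, 1.0, size=d)
            eps = rng.multivariate_normal(np.zeros(q), Sigma_eps, size=N_e)
            inner = rng.uniform(-1.0, 1.0, size=(M[l], d))
            s += f_hat(l, theta, eps, inner)
            if l > 0:                     # coupled coarse term: same (theta, eps),
                s -= f_hat(l - 1, theta, eps, inner[:M[l - 1]])  # nested inner set
        est += s / N[l]
    return est

print(mldlmc(L=3, N=[400, 200, 100, 50], M=[64, 64, 64, 64]))
```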

Remark 1 (Choice of $\hat{\theta}_n$). As shown in [6, 34], a Laplace-based importance sampling centered on the MAP estimate $\hat{\theta}_n := \hat{\theta}(Y_n)$ can drastically reduce the number of inner samples. In fact, it was demonstrated in [6] that even a few samples can be sufficient for moderate error tolerances, which is equivalent to using the Laplace method as in [28] but centered on $\hat{\theta}_n$ instead of $\theta_n$, where $\theta_n$ is the parameter that is used to generate the data $Y_n$. To estimate $\hat{\theta}_n$, we require additional evaluations of the forward model for each outer sample. The search for $\hat{\theta}_n$ by solving the optimization problem (18) is substantially reduced when initialized at $\theta_n$. As mentioned above, an alternative approach is to center the new measure on $\theta_n$, but this is a less accurate approximation because the discrepancy between $\theta_n$ and the MAP estimate $\hat{\theta}_n$ may be large, risking numerical underflow, which was discussed in detail in [6].


3.2 Bias, variance and work analysis

The bias, variance, and computational work of the MLDLMC estimator (25) need to be analyzed. MLDLMC is a consistent estimator, i.e., the bias goes to zero asymptotically, and its bias can be bounded from above, as $L \to \infty$ and $M_L \to \infty$, by
\[
|I - \mathrm{E}[I_{\text{MLDLMC}}]| \lesssim C_2 h_L^{\eta_w} + C_1 M_L^{-1}, \tag{26}
\]
with
\[
C_1 = \frac{1}{2}\, \mathrm{E}\left[\mathrm{V}\left[\frac{p(Y|\theta)}{p(Y)}\,\Big|\,Y\right]\right], \tag{27}
\]
and $C_2$ being the constant for the weak convergence of $I_\ell$. The upper bound for the bias (26) follows from the bias result in Proposition 1 of [6] for the DLMC estimator with mesh-element size $h_L$. The variance of the MLDLMC estimator is

\[
\mathrm{V}[I_{\text{MLDLMC}}] = \frac{V_0}{N_0} + \sum_{\ell=1}^{L} \frac{V_\ell}{N_\ell}, \tag{28}
\]
since the samples for each level are mutually independent of those of the other levels. Here $V_0$ denotes the variance of the DLMC estimator at the coarsest level ($\ell = 0$),
\[
V_0 := \mathrm{V}\left[\hat{f}_0\big(Y^{(0)}_{0,n}, \theta_{0,n}; \{\theta_{0,n,m}\}_{m=1}^{M_0}\big)\right], \tag{29}
\]
and, as shown in Proposition 1 of [6], as $M_0 \to \infty$ the variance $V_0$ behaves as
\[
V_0 \approx C_3 + \frac{C_4}{M_0}, \tag{30}
\]
for some constants $C_3, C_4 > 0$, where $C_4 \ll C_3$ due to the Laplace-based importance sampling [6]. The expectation and the variance of the telescopic difference of conditional expectations at level $\ell$ are defined by

\[
E_\ell := \mathrm{E}\left[\hat{f}_\ell\big(Y^{(\ell)}_{\ell,n}, \theta_{\ell,n}; \{\theta_{\ell,n,m}\}_{m=1}^{M_\ell}\big) - \hat{f}_{\ell-1}\big(Y^{(\ell-1)}_{\ell,n}, \theta_{\ell,n}; \{\theta_{\ell,n,m}\}_{m=1}^{M_{\ell-1}}\big)\right], \quad \text{for } \ell > 0, \tag{31}
\]
and
\[
V_\ell := \mathrm{V}\left[\hat{f}_\ell\big(Y^{(\ell)}_{\ell,n}, \theta_{\ell,n}; \{\theta_{\ell,n,m}\}_{m=1}^{M_\ell}\big) - \hat{f}_{\ell-1}\big(Y^{(\ell-1)}_{\ell,n}, \theta_{\ell,n}; \{\theta_{\ell,n,m}\}_{m=1}^{M_{\ell-1}}\big)\right], \quad \text{for } \ell > 0, \tag{32}
\]
respectively. Theorem 1 provides an asymptotic upper bound on $V_\ell$ for $\ell > 0$.

Theorem 1. Given Assumption 1, then for $\ell > 0$,
\[
V_\ell \lesssim \left[M_\ell\left(\frac{1}{M_\ell} - \frac{1}{M_{\ell-1}}\right)^2 + \frac{h_\ell^{2\eta_s}}{M_{\ell-1}} + \frac{M_\ell - M_{\ell-1}}{M_{\ell-1}^2}\right] + h_\ell^{2\eta_w}, \tag{33}
\]
as $M_0 \to \infty$.

Proof. The goal is to find an upper bound for the variance of the differences in the MLDLMC estimator (25), denoted by $V_\ell$, i.e.,
\[
V_\ell := \mathrm{V}\left[\hat{f}_\ell\big(Y, \theta; \{\theta_{\ell,n,m}\}_{m=1}^{M_\ell}\big) - \hat{f}_{\ell-1}\big(Y, \theta; \{\theta_{\ell,n,m}\}_{m=1}^{M_{\ell-1}}\big)\right]
= \mathrm{V}\left[\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right)\right]. \tag{34}
\]


First, consider the log-ratio between the approximate evidences evaluated at $\ell$ and $\ell - 1$,
\[
\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right), \tag{35}
\]

where
\[
\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell}) \propto \frac{1}{M_\ell} \sum_{m=1}^{M_\ell} \underbrace{\exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \left\|y^{(\ell)}_{i,\ell,n} - g_{i,\ell}(\theta_{\ell,n,m})\right\|^2_{\Sigma_\epsilon^{-1}}\right) R_{\ell,n,m}}_{=:\, X^{(\ell)}_{\ell,n,m}}, \tag{36}
\]
and denote $\mu^{(\ell)}_{\ell,n} := \mathrm{E}[X^{(\ell)}_{\ell,n,m}|\theta_{\ell,n}, \boldsymbol{\varepsilon}_{\ell,n}]$ and $R_{\ell,n,m} := R_\ell(\theta_{\ell,n,m}; Y^{(\ell)}_{\ell,n})$. Similarly, consider

\[
\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}}) \propto \frac{1}{M_{\ell-1}} \sum_{m=1}^{M_{\ell-1}} \underbrace{\exp\left(-\frac{1}{2}\sum_{i=1}^{N_e} \left\|y^{(\ell-1)}_{i,\ell,n} - g_{i,\ell-1}(\theta_{\ell,n,m})\right\|^2_{\Sigma_\epsilon^{-1}}\right) R_{\ell,n,m}}_{=:\, X^{(\ell-1)}_{\ell,n,m}}, \tag{37}
\]
and denote $\mu^{(\ell-1)}_{\ell,n} := \mathrm{E}[X^{(\ell-1)}_{\ell,n,m}|\theta_{\ell,n}, \boldsymbol{\varepsilon}_{\ell,n}]$. The log-ratio (35) of approximate evidences is then rewritten as

\[
\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right)
= \log\left(\left(\frac{\mu^{(\ell)}_{\ell,n}}{\mu^{(\ell-1)}_{\ell,n}} - 1\right) + 1\right)
+ \log\left(\frac{1}{M_\ell}\sum_{m=1}^{M_\ell} \frac{X^{(\ell)}_{\ell,n,m} - \mu^{(\ell)}_{\ell,n}}{\mu^{(\ell)}_{\ell,n}} + 1\right)
- \log\left(\frac{1}{M_{\ell-1}}\sum_{m=1}^{M_{\ell-1}} \frac{X^{(\ell-1)}_{\ell,n,m} - \mu^{(\ell-1)}_{\ell,n}}{\mu^{(\ell-1)}_{\ell,n}} + 1\right). \tag{38}
\]
We will use that $M_\ell \to \infty$ for all $\ell$, since $M_0 \to \infty$ and $M_\ell \ge M_{\ell-1}$ for $\ell > 0$. Then, by using the Taylor expansion $\log(x+1) \sim x$ in a neighborhood of zero, we are led to approximate (38) by

\[
\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right) - \left(\frac{\mu^{(\ell)}_{\ell,n}}{\mu^{(\ell-1)}_{\ell,n}} - 1\right)
\sim \frac{1}{\mu^{(\ell)}_{\ell,n} M_\ell} \sum_{m=1}^{M_\ell} \left(X^{(\ell)}_{\ell,n,m} - \mu^{(\ell)}_{\ell,n}\right) - \frac{1}{\mu^{(\ell-1)}_{\ell,n} M_{\ell-1}} \sum_{m=1}^{M_{\ell-1}} \left(X^{(\ell-1)}_{\ell,n,m} - \mu^{(\ell-1)}_{\ell,n}\right)
\]
\[
= \left(\frac{1}{\mu^{(\ell)}_{\ell,n} M_\ell} - \frac{1}{\mu^{(\ell-1)}_{\ell,n} M_{\ell-1}}\right) \sum_{m=1}^{M_\ell} \left(X^{(\ell)}_{\ell,n,m} - \mu^{(\ell)}_{\ell,n}\right)
+ \frac{1}{\mu^{(\ell-1)}_{\ell,n} M_{\ell-1}} \left\{\sum_{m=1}^{M_{\ell-1}} \left[\left(X^{(\ell)}_{\ell,n,m} - X^{(\ell-1)}_{\ell,n,m}\right) - \left(\mu^{(\ell)}_{\ell,n} - \mu^{(\ell-1)}_{\ell,n}\right)\right]
+ \sum_{m=M_{\ell-1}+1}^{M_\ell} \left(X^{(\ell)}_{\ell,n,m} - \mu^{(\ell)}_{\ell,n}\right)\right\}, \tag{39}
\]

where we denote the first term by $A$ and the sum of the last two by $B$. Thus,
\[
\mathrm{V}[A+B] = \mathrm{V}[A] + \mathrm{V}[B] + 2\,\mathrm{C}[A,B] \le 2(\mathrm{V}[A] + \mathrm{V}[B]). \tag{40}
\]


Next, denoting for any random variable $Z$ the conditional expectations and variances by
\[
\mathrm{E}_{\ell,n}[Z] := \mathrm{E}[Z|\theta_{\ell,n}, \boldsymbol{\varepsilon}_{\ell,n}], \qquad \mathrm{V}_{\ell,n}[Z] := \mathrm{V}[Z|\theta_{\ell,n}, \boldsymbol{\varepsilon}_{\ell,n}],
\]
and estimating the variance of expression (39) by combining it with expression (40), we arrive at the following conditional variance estimate:
\[
\mathrm{V}_{\ell,n}\left[\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right)\right]
\le 2\left(\frac{1}{\mu^{(\ell)}_{\ell,n} M_\ell} - \frac{1}{\mu^{(\ell-1)}_{\ell,n} M_{\ell-1}}\right)^2 M_\ell\, \mathrm{V}_{\ell,n}\left[X^{(\ell)}_{\ell,n,m} - \mu^{(\ell)}_{\ell,n}\right]
+ \frac{4}{(\mu^{(\ell-1)}_{\ell,n} M_{\ell-1})^2}\left(M_{\ell-1}\, \mathrm{V}_{\ell,n}\left[X^{(\ell)}_{\ell,n,m} - X^{(\ell-1)}_{\ell,n,m}\right] + (M_\ell - M_{\ell-1})\, \mathrm{V}_{\ell,n}\left[X^{(\ell)}_{\ell,n,m} - \mu^{(\ell)}_{\ell,n}\right]\right). \tag{41}
\]

Now, to estimate $\mathrm{V}_{\ell,n}\big[X^{(\ell)}_{\ell,n,m} - X^{(\ell-1)}_{\ell,n,m}\big]$ above, we consider the difference
\[
\mathcal{E} := X^{(\ell)}_{\ell,n,m} - X^{(\ell-1)}_{\ell,n,m} = \left(\exp(T_\ell) - \exp(T_{\ell-1})\right) R_{\ell,n,m},
\]
where $T_\ell$ and $T_{\ell-1}$ denote the exponents in (36) and (37), respectively, i.e., $X^{(\ell)}_{\ell,n,m} = \exp(T_\ell)\, R_{\ell,n,m}$ and $X^{(\ell-1)}_{\ell,n,m} = \exp(T_{\ell-1})\, R_{\ell,n,m}$. The difference $\mathcal{E}$ can be rewritten as
\[
\mathcal{E} = \left(\exp(T_\ell - T_{\ell-1}) - 1\right) \exp(T_{\ell-1})\, R_{\ell,n,m}. \tag{42}
\]
Then, by the Taylor expansion $\exp(x) - 1 \sim x$ in a neighborhood of zero, we are led to approximate (42) by
\[
\mathcal{E} \sim (T_\ell - T_{\ell-1}) \exp(T_{\ell-1})\, R_{\ell,n,m}. \tag{43}
\]

Thus, the variance of $\mathcal{E}$ can be bounded as
\[
\mathrm{V}_{\ell,n}[\mathcal{E}] \lesssim \mathrm{E}\left[(T_\ell - T_{\ell-1})^2 \exp(2T_{\ell-1})\, R^2_{\ell,n,m}\right]. \tag{44}
\]

Inserting into the difference $T_\ell - T_{\ell-1}$ the definitions of $X^{(\ell)}_{\ell,n,m}$ in (36) and $X^{(\ell-1)}_{\ell,n,m}$ in (37), we obtain
\[
T_\ell - T_{\ell-1} = -\frac{1}{2}\sum_{i=1}^{N_e} \underbrace{\left(\left\|y^{(\ell)}_{i,\ell,n} - g_{i,\ell}(\theta_{\ell,n,m})\right\|^2_{\Sigma_\epsilon^{-1}} - \left\|y^{(\ell-1)}_{i,\ell,n} - g_{i,\ell-1}(\theta_{\ell,n,m})\right\|^2_{\Sigma_\epsilon^{-1}}\right)}_{F}, \tag{45}
\]
where $F$ can be written as the following inner product in the $\Sigma_\epsilon^{-1}$ norm,
\[
F = \Big(\underbrace{g_\ell(\theta_{\ell,n}) - g_{\ell-1}(\theta_{\ell,n}) - \big(g_\ell(\theta_{\ell,n,m}) - g_{\ell-1}(\theta_{\ell,n,m})\big)}_{\ominus g_\ell},\;
\underbrace{g_\ell(\theta_{\ell,n}) + g_{\ell-1}(\theta_{\ell,n}) - \big(g_\ell(\theta_{\ell,n,m}) + g_{\ell-1}(\theta_{\ell,n,m})\big) + 2\boldsymbol{\varepsilon}_{\ell,n}}_{\oplus g_\ell}\Big)_{\Sigma_\epsilon^{-1}}. \tag{46}
\]
Also, using the Cauchy–Schwarz inequality, we bound $F^2$ by
\[
F^2 \le \|\ominus g_\ell\|^2_{\Sigma_\epsilon^{-1}}\, \|\oplus g_\ell\|^2_{\Sigma_\epsilon^{-1}}. \tag{47}
\]


Next, to bound the variance of the approximation of $\mathcal{E}$ in (43), for suitable $p, q > 0$ such that $1/p + 1/q = 1$, we use the Hölder inequality, yielding
\[
\mathrm{V}_{\ell,n}[\mathcal{E}] \lesssim \mathrm{E}_{\ell,n}\left[\|\ominus g_\ell\|^2_{\Sigma_\epsilon^{-1}}\, \|\oplus g_\ell\|^2_{\Sigma_\epsilon^{-1}} \exp(2T_{\ell-1})\, R^2_{\ell,n,m}\right]
\lesssim \mathrm{E}_{\ell,n}\left[\|\ominus g_\ell\|^{2p}_{\Sigma_\epsilon^{-1}}\right]^{1/p} \mathrm{E}_{\ell,n}\left[\|\oplus g_\ell\|^{2q}_{\Sigma_\epsilon^{-1}} \exp(2qT_{\ell-1})\, R^{2q}_{\ell,n,m}\right]^{1/q}
\lesssim h_\ell^{2\eta_s}, \tag{48}
\]
where the constants in the inequalities (48) are integrable and depend on $\theta_{\ell,n}$, $Y^{(\ell)}_{\ell,n}$, and $Y^{(\ell-1)}_{\ell,n}$. Thus, the conditional variance of the log-ratio of the evidences in (39) is bounded by
\[
\mathrm{V}_{\ell,n}\left[\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right)\right]
\lesssim M_\ell\left(\frac{1}{\mu^{(\ell)}_{\ell,n} M_\ell} - \frac{1}{\mu^{(\ell-1)}_{\ell,n} M_{\ell-1}}\right)^2 + \frac{h_\ell^{2\eta_s}}{\mu^{(\ell-1)}_{\ell,n} M_{\ell-1}} + \frac{M_\ell - M_{\ell-1}}{(\mu^{(\ell-1)}_{\ell,n} M_{\ell-1})^2}
\lesssim \frac{1}{\mu^{(\ell)}_{\ell,n}}\left[M_\ell\left(\frac{1}{M_\ell} - \frac{1}{M_{\ell-1}}\right)^2 + \frac{h_\ell^{2\eta_s}}{M_{\ell-1}} + \frac{M_\ell - M_{\ell-1}}{M_{\ell-1}^2}\right]. \tag{49}
\]

To conclude on the optimal work, we need to estimate the total variance corresponding to level $\ell$, namely
\[
V_\ell = \mathrm{E}\left[\mathrm{V}_{\ell,n}\left[\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right)\right]\right]
+ \mathrm{V}\left[\mathrm{E}_{\ell,n}\left[\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right)\right]\right].
\]
Combining (49) with
\[
\left|\mathrm{E}_{\ell,n}\left[\log\left(\frac{\hat{p}_\ell(Y, \theta; \{\theta_m\}_{m=1}^{M_\ell})}{\hat{p}_{\ell-1}(Y, \theta; \{\theta_m\}_{m=1}^{M_{\ell-1}})}\right)\right]\right| \sim \left|\frac{\mu^{(\ell)}_{\ell,n}}{\mu^{(\ell-1)}_{\ell,n}} - 1\right| \sim h_\ell^{\eta_w}
\]
yields

\[
V_\ell \lesssim \left[M_\ell\left(\frac{1}{M_\ell} - \frac{1}{M_{\ell-1}}\right)^2 + \frac{h_\ell^{2\eta_s}}{M_{\ell-1}} + \frac{M_\ell - M_{\ell-1}}{M_{\ell-1}^2}\right] + h_\ell^{2\eta_w}.
\]

This concludes the proof.

Theorem 1 is an important result, as it shows the variance decay per level $\ell$, as $M_\ell \to \infty$, which will later be used when estimating the statistical error of the MLDLMC estimator. For comparison purposes, we note that the corresponding result for standard MLMC estimation of the expected value of $\|g\|_{\Sigma_\epsilon^{-1}}$ is $V_\ell \lesssim h_\ell^{2\eta_s}$ for $\ell > 0$ [15]. This comparison highlights the extra layer of complexity for multilevel techniques when the quantities of interest are more challenging in structure, such as the nested expectation form of the EIG criterion. The average computational work of the MLDLMC estimator (25) is modeled as
\[
W(I_{\text{MLDLMC}}) \propto \sum_{\ell=0}^{L} N_\ell M_\ell W(g_\ell), \tag{50}
\]
where $W(g_\ell)$ is the average work of a single evaluation of $g_\ell$ and is modeled as $W(g_\ell) \propto h_\ell^{-\gamma}$; see (9) in Assumption 1. Note that the work model (50) exploits that $W(\hat{f}_\ell - \hat{f}_{\ell-1}) \propto M_\ell W(g_\ell)$.


3.3 Choice of MLDLMC parameters

Following the approach in [11], we select the values of the MLDLMC parameters, $L$, $\{M_\ell\}_{\ell=0}^L$, and $\{N_\ell\}_{\ell=0}^L$, for a random estimator $\mathcal{I}$ (short for $I_{\text{MLDLMC}}$) that minimizes the average computational work such that the absolute error, $|I - \mathcal{I}|$, is less than or equal to a desired error tolerance $TOL > 0$ with probability $1 - \alpha$, i.e.,
\[
P(|I - \mathcal{I}| \le TOL) \ge 1 - \alpha, \tag{51}
\]
where $0 < \alpha < 1$ and, typically, $\alpha \ll 1$. A solution to the above optimization problem can be found by solving the problem below, where we split the total error into a bias component and a statistical error:
\[
|I - \mathcal{I}| \le |I - \mathrm{E}[\mathcal{I}]| + |\mathrm{E}[\mathcal{I}] - \mathcal{I}|.
\]
Then, we minimize the average work such that the constraints
\[
|I - \mathrm{E}[\mathcal{I}]| \le (1-\kappa)\,TOL \tag{52}
\]
and
\[
|\mathrm{E}[\mathcal{I}] - \mathcal{I}| \le \kappa\,TOL \tag{53}
\]
hold for a balancing parameter $0 < \kappa < 1$. The constraint (52) is the bias constraint, and the second constraint (53) is a statistical constraint. The constraint (53), imposed on the statistical error, must hold with probability $1 - \alpha$. From a Central Limit Theorem (Theorem 1.1 of [21]; Lemma 7.1 of [11]) for normalized MLMC estimators, if $\eta_s > \gamma$, then
\[
\frac{\mathcal{I} - I_L}{\sqrt{\mathrm{V}[\mathcal{I}]}} \rightharpoonup \mathcal{N}(0,1), \quad \text{as } TOL \to 0,
\]
where $\mathcal{N}(0,1)$ is a standard normal random variable, and $\rightharpoonup$ denotes convergence in distribution. Therefore, the statistical error constraint (53) is approximated by a variance constraint, which is easier to handle numerically, i.e.,
\[
\mathrm{V}[\mathcal{I}] \le \left(\frac{\kappa\,TOL}{C_\alpha}\right)^2, \tag{54}
\]
where $C_\alpha = \Phi^{-1}(1 - \frac{\alpha}{2})$ and $\Phi^{-1}(\cdot)$ is the inverse cumulative distribution function (cdf) of the standard normal distribution. By using the results for the bias, variance, and work given in Section 3.2, we can state an optimization problem for finding the method parameters of MLDLMC subject to (51):

\[
\underset{(N_\ell,\,M_\ell,\,L,\,\kappa)}{\arg\min}\ \sum_{\ell=0}^{L} N_\ell M_\ell h_\ell^{-\gamma}
\quad \text{subject to} \quad
\begin{cases}
C_2 h_L^{\eta_w} + C_1 M_L^{-1} \le (1-\kappa)\,TOL,\\[4pt]
\displaystyle \sum_{\ell=0}^{L} V_\ell N_\ell^{-1} \le (\kappa\,TOL/C_\alpha)^2.
\end{cases}
\]

It is challenging to find a closed-form solution to the above optimization problem; thus, we determine the level $L$ and $M_L$ directly from the bias constraint as follows. Let us split the bias constraint into two bias constraints:
\[
C_2 h_L^{\eta_w} \le \tfrac{1}{2}(1-\kappa)\,TOL, \quad \text{and} \quad C_1 M_L^{-1} \le \tfrac{1}{2}(1-\kappa)\,TOL.
\]

By fixing $h_\ell = h_0\beta^{-\ell}$ as in (20), we obtain the following values, denoted by $\kappa^*$ and $L^*$, for the balancing parameter $\kappa$ and the highest level $L$, respectively,
\[
L^* = \left\lceil \eta_w^{-1}\left(\log_\beta(2C_2 h_0^{\eta_w}) + \log_\beta(TOL^{-1})\right)\right\rceil, \tag{55}
\]


and
\[
\kappa^* = 1 - C_2 h_{L^*}^{\eta_w}\, TOL^{-1}. \tag{56}
\]
The choice of $M_L$, denoted by $M_L^*$, is given as
\[
M_L^* = \left\lceil \frac{C_1}{1-\kappa^*}\, TOL^{-1} \right\rceil. \tag{57}
\]
Next, for $\ell = 0, \dots, L-1$, let the choice of $M_\ell$, denoted by $M_\ell^*$, be given by
\[
M_\ell^* = M_L^*, \tag{58}
\]
which results in simplifying cancellations in the variance estimate (49), leading to, for $\ell > 0$,
\[
V_\ell \lesssim C\,\frac{h_\ell^{2\eta_s}}{M_L} + h_\ell^{2\eta_w}, \tag{59}
\]
for some $C > 0$. Given $\kappa^*$, $M_\ell^*$, and $L^*$, the optimal number of outer samples $N_\ell$ is the solution of

for some C > 0. Given κ∗, M∗` , and L∗, the optimal number of outer samples N` is the solution of

N∗` = arg minN`

L∗∑`=0

N`M∗` h−γ`

subject toV0

N0+

L∗∑`=0

V`N−1` ≤ (κ∗TOL/Cα)2

which is given by

N∗` =

⌈(Cα

κ∗TOL

)2√

V`M∗`W (g`)

(L∗∑`=0

√V`M

∗`W (g`)

)⌉, (60)

as shown for standard MLMC in [15]; the ceiling of the optimal solution ensures that $N_\ell^*$ is a positive integer. In conclusion, the proposed MLDLMC is given by (25) with numbers of outer samples $\{N_\ell^*\}_{\ell=0}^L$ and inner samples $\{M_\ell^*\}_{\ell=0}^L$ for level $L = L^*$, such that, as $TOL \to 0$, the random estimator satisfies
\[
P(|I - \mathcal{I}| \le TOL) \ge 1 - \alpha,
\]
for the desired error tolerance $TOL > 0$, where $0 < \alpha \ll 1$. The average work of the proposed MLDLMC can be bounded from above as follows:

with probability 1 − α, where 0 < α 1. The average work of the proposed MLDLMC can bebounded from above as follows:

W (IMLDLMC) /

(Cα

κ∗TOL

)2(

L∗∑`=0

√V`M

∗`W (g`)

)2

+

L∗∑`=0

M∗`W (g`). (61)

Also, an alternative to (58), namely to use M∗` = ML for all `, is to numerically determine M∗` L∗−1`=0

by minimizing the upper bound of the work, (61), with respect to M∗` L∗−1`=0 .
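The parameter choices (55)-(60) are simple closed-form expressions once the constants are available. Below is a minimal sketch, assuming illustrative values for $C_1$, $C_2$, and the per-level variances $V_\ell$; in practice these are estimated from pilot runs, cf. Section 5.3.

```python
import numpy as np

def mldlmc_parameters(TOL, h0, beta, eta_w, gamma, C1, C2, V, C_alpha=1.96):
    # L* and kappa* from (55)-(56), M*_L from (57), with M*_l = M*_L as in (58)
    L = int(np.ceil((np.log(2 * C2 * h0 ** eta_w) + np.log(1.0 / TOL))
                    / (eta_w * np.log(beta))))
    kappa = 1.0 - C2 * (h0 * beta ** -L) ** eta_w / TOL
    M = int(np.ceil(C1 / ((1.0 - kappa) * TOL)))
    # N*_l from (60), with the work model W(g_l) ~ h_l^-gamma from (9)
    h = h0 * float(beta) ** -np.arange(L + 1)
    W = M * h ** -gamma                        # per-sample work M*_l W(g_l)
    total = np.sum(np.sqrt(V[:L + 1] * W))
    N = np.ceil((C_alpha / (kappa * TOL)) ** 2
                * np.sqrt(V[:L + 1] / W) * total).astype(int)
    return L, kappa, M, N

# hypothetical constants and per-level variances V_l, for illustration only
V = np.array([1.0, 0.25, 0.06, 0.015, 0.004, 0.001, 2.5e-4])
print(mldlmc_parameters(TOL=0.01, h0=0.5, beta=2, eta_w=1.5, gamma=2.0,
                        C1=5e-3, C2=1.0, V=V))
```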

3.4 Computational work discussion

There is a connection between the average work of the proposed MLDLMC (given in Section 3.3) and that of the standard MLMC. The work of MLDLMC with $L+1$ levels can be bounded from above as follows:

\[
W(I_{\text{MLDLMC}}) \propto \sum_{\ell=0}^{L} N_\ell^* M_\ell^* h_\ell^{-\gamma}
\le \left(\frac{C_\alpha}{\kappa\,TOL}\right)^2 \left(\sum_{\ell=0}^{L} \sqrt{V_\ell M_L^* W(g_\ell)}\right)^2 + M_L^* \sum_{\ell=0}^{L} W(g_\ell)
\]
\[
\lesssim \left(\frac{C_\alpha}{\kappa\,TOL}\right)^2 \left(\sum_{\ell=0}^{L} \sqrt{V_\ell M_L^* W(g_\ell)}\right)^2
\lesssim \left(\frac{C_\alpha}{\kappa\,TOL}\right)^2 \left(\sqrt{M_L^* W(g_0)} + \sum_{\ell=1}^{L} \sqrt{\left(h_\ell^{2\eta_s} + C M_L^* h_\ell^{2\eta_w}\right) W(g_\ell)}\right)^2
\]
\[
\lesssim \left(\frac{C_\alpha}{\kappa\,TOL}\right)^2 \left(\sum_{\ell=1}^{L} \sqrt{h_\ell^{2\eta_s - \gamma} + C\,TOL^{-1}\, h_\ell^{2\eta_w - \gamma}}\right)^2. \tag{62}
\]

The constant $C > 0$ typically satisfies $C \ll 1$ due to the Laplace-based importance sampling, and from the upper bound of the work we can see that the total work of the proposed MLDLMC behaves as that of standard MLMC as long as $TOL$ is large enough that $C\,TOL^{-1} \ll 1$. The asymptotic work complexity of standard MLMC with respect to $TOL$ can be found in Theorem 2.1 of [15]. Note that, asymptotically as $TOL \to 0$, we see from the work bound (62) that MLDLMC exhibits a worse complexity than MLMC, because of the additional term $C\,TOL^{-1} h_\ell^{2\eta_w - \gamma}$ in (62), as expected due to the nested expectation of the EIG criterion. For the numerical example in Section 5, we observe that the computational work of the proposed MLDLMC (25) follows $TOL^{-2}$, up to some logarithmic factor, over a reasonable range of $TOL$, which is the optimal work rate of standard MLMC under Assumption 1. For cases with $2\eta_s > \gamma$, we thus expect the work of MLDLMC to follow the optimal rate $TOL^{-2}$ for ranges of $TOL$ satisfying $C\,TOL^{-1} \ll 1$, which can be achieved thanks to the efficiency of the Laplace-based importance sampling.

4 Multilevel Double Loop Stochastic Collocation (MLDLSC)

As an alternative to methods based on MC sampling, we propose a Multilevel Double Loop Stochastic Collocation (MLDLSC) method, based on the Multi-Index Stochastic Collocation (MISC) algorithm [18, 19], which exploits the regularity of the dependence on the random input variables. The idea is to compute the telescopic-sum differences, i.e., expectations, in the multilevel estimator (21) by stochastic collocation, which is a high-dimensional integration over the probability space achieved by deterministic quadrature on sparse grids; see, e.g., [4, 5].

4.1 Multilevel stochastic collocation

We start by defining a quadrature operator for a one-dimensional real-valued continuous function $u : \Gamma^i \to \mathbb{R}$, where $\Gamma^i = [-1,1]$ is any of the univariate sub-domains $\Gamma^1, \dots, \Gamma^d$ composing the complete $d$-dimensional domain $\Gamma := \Gamma^1 \times \cdots \times \Gamma^d$. The quadrature operator is defined as
\[
Q^{m(\beta)} : C^0(\Gamma^i) \to \mathbb{R}, \qquad Q^{m(\beta)}[u] = \sum_{j=1}^{m(\beta)} u(z_{\beta,j})\,\omega_{\beta,j}, \tag{63}
\]
where $\beta$ is a positive integer specifying the "level" of the quadrature operator, and $m(\beta)$ is a strictly increasing function giving the number of distinct collocation points, $\{z_{\beta,j}\}_{j=1}^{m(\beta)}$, with $z_{\beta,j} \in \Gamma^i$ and corresponding weights $\{\omega_{\beta,j}\}_{j=1}^{m(\beta)}$. The collocation points are chosen according to the underlying probability distribution; see [43]. For the uniform probability distribution, we adopt the Clenshaw-Curtis family of points and weights, which has the desired property of being nested. The distribution of points is given by

\[
z_{\beta,j} = \cos\left(\frac{(j-1)\pi}{m(\beta)-1}\right), \quad 1 \le j \le m(\beta),
\]
where the function $m(\beta)$ is defined as $m(\beta) = 2^{\beta-1} + 1$ for $\beta \ge 2$, with $m(0) = 0$ and $m(1) = 1$.
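For illustration, the nested Clenshaw-Curtis points at the first few levels can be generated directly from the two formulas above; the weights are omitted here for brevity, and the single point of the $m(1) = 1$ rule is taken as the midpoint of $[-1,1]$, the usual convention for this nested family.

```python
import numpy as np

def m(beta):
    # m(0) = 0, m(1) = 1, and m(beta) = 2^(beta-1) + 1 for beta >= 2
    return 0 if beta == 0 else 1 if beta == 1 else 2 ** (beta - 1) + 1

def cc_points(beta):
    n = m(beta)
    if n == 0:
        return np.array([])
    if n == 1:
        return np.array([0.0])           # midpoint convention for the 1-point rule
    j = np.arange(1, n + 1)
    return np.cos((j - 1) * np.pi / (n - 1))

p2, p3 = cc_points(2), cc_points(3)
print(p2, p3)
# nestedness: every level-2 point reappears among the level-3 points
print(all(np.isclose(p3, x).any() for x in p2))   # True
```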

The generalization to high-dimensional real-valued continuous functions $u : \Gamma \to \mathbb{R}$ is obtained by introducing a quadrature operator that is a tensorization of the one-dimensional quadrature operators, i.e.,
\[
Q^{m(\boldsymbol{\beta})} : C^0(\Gamma) \to \mathbb{R}, \qquad Q^{m(\boldsymbol{\beta})} = \bigotimes_{1 \le i \le d} Q^{m_i(\beta_i)}, \qquad Q^{m(\boldsymbol{\beta})}[u] = \sum_{j=1}^{\#m(\boldsymbol{\beta})} u(z_j)\,\omega_j,
\]
where $z_j$ are the points on the tensor grid $\bigotimes_{1 \le i \le d} \{z_{\beta_i,j}\}_{j=1}^{m_i(\beta_i)}$, $\omega_j$ are the products of the weights imposed by the one-dimensional quadrature rules, $m_i(\beta)$ is the function giving the number of collocation points for input direction $i$, and $\#m(\boldsymbol{\beta})$ denotes the total number of collocation points on the full grid for a multi-index $\boldsymbol{\beta}$, i.e., $\#m(\boldsymbol{\beta}) := \prod_{i=1}^d m_i(\beta_i)$. A hierarchy of anisotropic full-tensor approximations can be constructed by selecting $\boldsymbol{\beta} \in \mathbb{N}^d$ such that
\[
\lceil s_i \beta_i \rceil = w,
\]
for a sequence of approximation levels $w \in \mathbb{N}$, where $s_i$ is a user-specified importance weight for input direction $i$. This is known as the total product (TP) approximation. However, this leads to the total number of collocation points growing exponentially as $w$ increases. To mitigate the curse of dimensionality, we adopt a sparsification technique known as sparse grid stochastic collocation (SGSC); see, e.g., [4, 5].

The TP approximation for integration is denoted by $U_{\boldsymbol{\beta}} := Q^{m(\boldsymbol{\beta})}[u]$. The SGSC quadrature uses the difference operators, $\Delta_i$, given for $1 \le i \le d$ by
\[
\Delta_i[U_{\boldsymbol{\beta}}] :=
\begin{cases}
U_{\boldsymbol{\beta}} - U_{\boldsymbol{\beta} - e_i}, & \text{if } \beta_i > 1,\\
U_{\boldsymbol{\beta}}, & \text{if } \beta_i = 1,
\end{cases} \tag{64}
\]
where $(e_i)_k = 1$ if $i = k$, and zero otherwise. The sparse-grid stochastic collocation quadrature can be formulated as
\[
I_{\text{SGSC}} = \sum_{\boldsymbol{\beta} \in \Lambda} \Delta[U_{\boldsymbol{\beta}}] = \sum_{\boldsymbol{\beta} \in \Lambda}\; \sum_{\substack{\boldsymbol{j} \in \{0,1\}^d\\ \boldsymbol{\beta} + \boldsymbol{j} \in \Lambda}} (-1)^{|\boldsymbol{j}|}\, U_{\boldsymbol{\beta}}, \tag{65}
\]
for some multi-index set $\Lambda \subset \mathbb{N}^d$, and the mixed-difference operator is given by
\[
\Delta[U_{\boldsymbol{\beta}}] := \bigotimes_{1 \le i \le d} \Delta_i[U_{\boldsymbol{\beta}}] := \Delta_1[\Delta_2[\dots \Delta_d[U_{\boldsymbol{\beta}}]]] = \sum_{\boldsymbol{j} \in \{0,1\}^d} (-1)^{|\boldsymbol{j}|}\, U_{\boldsymbol{\beta} - \boldsymbol{j}}.
\]

Now, we consider a case in which $u$ is numerically approximated by $u_\ell$ at a discretization level $\ell$ defined by the mesh-element size $h_\ell$. The complete sparse hierarchy can thus be specified by $\ell$ in the physical space and by $\boldsymbol{\beta}$ in the probability space, which leads us to the MLSC estimator of $\mathrm{E}[u]$, given by
\[
I_{\text{MLSC}} := \sum_{[\ell,\boldsymbol{\beta}] \in \Lambda} \Delta^{\text{mix}}[U_{\ell,\boldsymbol{\beta}}] = \sum_{[\ell,\boldsymbol{\beta}] \in \Lambda}\; \sum_{\substack{\boldsymbol{j} \in \{0,1\}^{d+1}\\ [\ell,\boldsymbol{\beta}] + \boldsymbol{j} \in \Lambda}} (-1)^{|\boldsymbol{j}|}\, U_{\ell,\boldsymbol{\beta}}, \tag{66}
\]


where $\Lambda \subset \mathbb{N}^{d+1}$, $U_{\ell,\boldsymbol{\beta}} := Q^{m(\boldsymbol{\beta})}[u_\ell]$, and
\[
\Delta^{\text{mix}}[U_{\ell,\boldsymbol{\beta}}] :=
\begin{cases}
\Delta[U_{\ell,\boldsymbol{\beta}} - U_{\ell-1,\boldsymbol{\beta}}], & \text{if } \ell > 0,\\
\Delta[U_{\ell,\boldsymbol{\beta}}], & \text{if } \ell = 0.
\end{cases} \tag{67}
\]
We evaluate MLSC by computing the full-tensor approximations $U_{\ell,\boldsymbol{\beta}}$ independently and combining them linearly according to the combination technique (66). Of course, the effectiveness depends on the choice of the multi-index set $\Lambda$. The idea behind the sparse construction is that $\Lambda$ should be chosen to exclude "expensive" isotropic full-tensor approximations from the estimate, by refining only a subset of the physical or probability directions simultaneously. Then, we combine these approximations using the combination-technique formula (66) to create a more accurate approximation. Various approaches have been proposed for selecting the multi-index set $\Lambda$, such as using the classical sets given in [4] or selecting the set adaptively, as discussed in [7, 13, 31, 36].
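A sketch of the combination-technique formula (66): given a downward-closed multi-index set $\Lambda$ and a routine returning the full-tensor value $U_{\boldsymbol{k}}$, each index is weighted by $c_{\boldsymbol{k}} = \sum_{\boldsymbol{j}:\,\boldsymbol{k}+\boldsymbol{j} \in \Lambda} (-1)^{|\boldsymbol{j}|}$. The toy $U$ below is a hypothetical convergent sequence used only to check that the combination approaches its limit; it stands in for the per-index quadrature values of the actual method.

```python
import numpy as np
from itertools import product

def combination_technique(Lambda, U):
    # combination formula (66): index k in Lambda gets the weight
    # c_k = sum over j in {0,1}^dim with k + j in Lambda of (-1)^|j|
    Lset = set(map(tuple, Lambda))
    total = 0.0
    for k in Lset:
        c = sum((-1) ** sum(j) for j in product((0, 1), repeat=len(k))
                if tuple(np.add(k, j)) in Lset)
        if c != 0:                        # most interior indices get weight 0
            total += c * U(k)
    return total

def U(k):
    # toy full-tensor value, converging to 1 as all entries of k grow
    return np.prod([1.0 - 4.0 ** -(ki + 1) for ki in k])

Lambda = [k for k in product(range(6), repeat=2) if sum(k) <= 5]
print(combination_technique(Lambda, U))   # close to the limit 1.0
```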

4.2 Multilevel Double Loop Stochastic Collocation (MLDLSC) estimator

Here, we recast the EIG (3) into an integration with respect to $\boldsymbol{\varepsilon}$ instead of $Y$:
\[
I = \int_\Theta \int_{\mathcal{E}} \log\left(\frac{\pi_\epsilon(\boldsymbol{\varepsilon})}{\int_\Theta p(G(\theta) + \boldsymbol{\varepsilon}|\theta')\,\pi(\theta')\,\mathrm{d}\theta'}\right) \pi_\epsilon(\boldsymbol{\varepsilon})\,\pi(\theta)\,\mathrm{d}\boldsymbol{\varepsilon}\,\mathrm{d}\theta \tag{68}
\]
\[
= \int_\Theta \int_{\mathcal{E}} \log\left(\frac{\pi_\epsilon(\boldsymbol{\varepsilon})}{\int_\Theta p(G(\theta) + \boldsymbol{\varepsilon}|\theta')\, R(\theta'; G(\theta) + \boldsymbol{\varepsilon})\, \tilde{\pi}(\theta'|G(\theta) + \boldsymbol{\varepsilon})\,\mathrm{d}\theta'}\right) \pi_\epsilon(\boldsymbol{\varepsilon})\,\pi(\theta)\,\mathrm{d}\boldsymbol{\varepsilon}\,\mathrm{d}\theta, \tag{69}
\]
where $\pi_\epsilon(\boldsymbol{\varepsilon}) = \prod_{i=1}^{N_e} \pi_\epsilon(\epsilon_i)$, the likelihood ratio is $R(\theta'; G(\theta) + \boldsymbol{\varepsilon}) = \pi(\theta')/\tilde{\pi}(\theta'|G(\theta) + \boldsymbol{\varepsilon})$ as defined in (16), the importance sampling distribution is $\tilde{\pi}(\theta'|G(\theta) + \boldsymbol{\varepsilon}) \sim \mathcal{N}(\hat{\theta}(G(\theta) + \boldsymbol{\varepsilon}), \Sigma(G(\theta) + \boldsymbol{\varepsilon}))$ as defined in (17), with the MAP estimate, $\hat{\theta}(G(\theta) + \boldsymbol{\varepsilon})$, as given in (18), and the approximate covariance is $\Sigma(G(\theta) + \boldsymbol{\varepsilon})$ as given in (19). We introduce the auxiliary function,
\[
\Psi_\ell(\theta'; G_\ell(\theta) + \boldsymbol{\varepsilon}) := p_\ell(G_\ell(\theta) + \boldsymbol{\varepsilon}|\theta')\, R_\ell(\theta'; G_\ell(\theta) + \boldsymbol{\varepsilon}).
\]

Furthermore, we let $\boldsymbol{\beta} = (\boldsymbol{\beta}_1, \boldsymbol{\beta}_2)$ and $m_{\boldsymbol{\beta}} = (m_{\boldsymbol{\beta}_1}, m_{\boldsymbol{\beta}_2})$, where $\boldsymbol{\beta}_1, \boldsymbol{\beta}_2$ are the multi-indices associated with the random variables of the outer ($\theta$, $\epsilon_i$) and inner ($\theta'$) integrals, respectively. The proposed MLDLSC estimator for approximating the EIG in the form given in (68) is
\[
I_{\text{MLDLSC}} := \sum_{[\ell,\boldsymbol{\beta}] \in \Lambda} \Delta[F_{\ell,\boldsymbol{\beta}}] = \sum_{[\ell,\boldsymbol{\beta}] \in \Lambda}\; \sum_{\substack{\boldsymbol{j} \in \{0,1\}^{d+1}\\ [\ell,\boldsymbol{\beta}] + \boldsymbol{j} \in \Lambda}} (-1)^{|\boldsymbol{j}|}\, F_{\ell,\boldsymbol{\beta}}, \tag{70}
\]
where $F_{\ell,\boldsymbol{\beta}} := Q^{m_{\boldsymbol{\beta}_1}}[f_{\ell,\boldsymbol{\beta}_2}]$, $\Lambda \subset \mathbb{N}^{d+1}$ is the multi-index set, and
\[
f_{\ell,\boldsymbol{\beta}_2}(\boldsymbol{\varepsilon}, \theta) := \log\left(\frac{\pi_\epsilon(\boldsymbol{\varepsilon})}{Q^{m_{\boldsymbol{\beta}_2}}[\Psi_\ell]}\right). \tag{71}
\]

The natural choice of collocation points and weights is Gauss-Hermite for a random variable $\epsilon_i \sim \mathcal{N}(0, \Sigma_{\epsilon_i})$. If the covariance, $\Sigma$, is a positive-definite non-diagonal matrix, then we can transform the standard $\mathcal{N}(0, \mathbf{1})$ Gauss-Hermite points, here denoted by $z_{\mathcal{N}(0,\mathbf{1})}$, to $\mathcal{N}(\mu, \Sigma)$ Gauss-Hermite points, denoted by $z$, by following two steps:
\[
\Sigma = LL^T, \tag{72}
\]
\[
z = L z_{\mathcal{N}(0,\mathbf{1})} + \mu, \tag{73}
\]
where $L$ is the lower triangular matrix from the Cholesky decomposition. Similarly, the Gauss-Hermite points can be calculated from the standard Gauss-Hermite points for the random variables $\theta'$ that follow the importance sampling pdf, $\tilde{\pi}_\ell$. An efficient importance sampling measure for the inner expectation of the stochastic collocation approach is necessary to avoid numerical underflow and, as demonstrated in [6], the Laplace-based importance sampling is adequate. The collocation points and weights for the outer variable $\theta$ are chosen with respect to $\pi(\theta)$.
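A minimal sketch of the two-step transform (72)-(73), using `numpy`'s Gauss-Hermite rule (defined for the weight $e^{-x^2}$, hence the $\sqrt{2}$ and $\sqrt{\pi}$ rescalings to the standard normal) and a tensor grid; the sanity check verifies that the transformed rule reproduces the mean and covariance of $\mathcal{N}(\mu, \Sigma)$.

```python
import numpy as np
from itertools import product

def gauss_hermite_mvn(mu, Sigma, n):
    # 1D Gauss-Hermite rule for the weight exp(-x^2); rescale to N(0, 1)
    x, w = np.polynomial.hermite.hermgauss(n)
    z1, w1 = np.sqrt(2.0) * x, w / np.sqrt(np.pi)
    d = len(mu)
    L = np.linalg.cholesky(Sigma)            # step (72)
    pts, wts = [], []
    for idx in product(range(n), repeat=d):  # tensorize the 1D rule
        z_std = z1[list(idx)]
        pts.append(L @ z_std + mu)           # step (73)
        wts.append(np.prod(w1[list(idx)]))
    return np.array(pts), np.array(wts)

# sanity check: the rule reproduces the mean and covariance of N(mu, Sigma)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
pts, wts = gauss_hermite_mvn(mu, Sigma, n=5)
print(pts.T @ wts)                                  # ~ mu
print((pts - mu).T @ (wts[:, None] * (pts - mu)))   # ~ Sigma
```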

5 Optimal electrode placement in electrical impedance tomography

We apply the two methods proposed in this work, MLDLMC and MLDLSC, and the optimized DLMC method with Laplace-based importance sampling (DLMCIS) of [6], to estimate the expected information gain for an electrical impedance tomography (EIT) experiment. EIT is a technique for imaging the interior conductivity of a closed body based on voltage measurements from electrodes placed on the body's free surface. In this experiment, low-frequency electrical currents are injected through electrodes attached to a composite laminate material made of four orthotropic plies. The potential field in the body of the material is considered quasi-static for a given conductivity.

5.1 Configuration of the experiment

The complete electrode model (CEM) [38] is used to formulate the problem. The composite body $D$, with boundary $\partial D$, is formed of $N_p$ plies, i.e., $D = \cup_{k=1}^{N_p} D_k$. The configuration is such that the plies overlap with their fibers facing different directions. The upper and lower surfaces of the boundary $\partial D$ are equipped with $N_{el}$ square-shaped electrodes $E_l$, $l = 1, \dots, N_{el}$, with dimensions $e_{el}$ and a surface impedance of $z_l$, as illustrated in Figure 1.

Figure 1: (Color online) Experimental set-up (cross section) of a composite plate to be examined, made up of four plies with the same thickness. Five electrodes are placed on the top surface and five on the bottom surface. Red (blue) electrodes represent the current inlet (outlet). The currents at the electrodes are known, while the potential field on the body $D$ and the potentials at the electrodes $E_l$ are unknowns.

The potential field $u$ obeys
\[
\nabla \cdot \boldsymbol{j}(\omega, \boldsymbol{x}) = 0, \quad \text{in } D, \tag{74}
\]
and
\[
\boldsymbol{j}(\omega, \boldsymbol{x}) = \sigma(\omega, \boldsymbol{x}) \cdot \nabla u(\omega, \boldsymbol{x}), \tag{75}
\]
where $\boldsymbol{j}$ is the flux of electric current, and $\sigma$ is the conductivity field, given by
\[
\sigma(\omega, \boldsymbol{x}) = Q^T(\theta_k(\omega)) \cdot \sigma \cdot Q(\theta_k(\omega)), \quad \text{for } \boldsymbol{x} \in D_k, \quad k = 1, \dots, N_p.
\]


The CEM is a set of boundary conditions for (74)-(75), given by
\[
\begin{cases}
\boldsymbol{j} \cdot \boldsymbol{n} = 0, & \text{on } \partial D \setminus (\cup_l E_l),\\[4pt]
\displaystyle \int_{E_l} \boldsymbol{j} \cdot \boldsymbol{n}\, \mathrm{d}x = I_l, & \text{on } E_l,\ l = 1, \dots, N_{el},\\[4pt]
\displaystyle \frac{1}{|E_l|} \int_{E_l} u\, \mathrm{d}x + z_l \int_{E_l} \boldsymbol{j} \cdot \boldsymbol{n}\, \mathrm{d}x = U_l, & \text{on } E_l,\ l = 1, \dots, N_{el},
\end{cases} \tag{76}
\]
where $\boldsymbol{n}$ represents the outward normal unit vector. To obtain well-posedness (existence and uniqueness of $(u, \boldsymbol{U})$), the Kirchhoff law of charge conservation and the ground potential condition,
\[
\sum_{l=1}^{N_{el}} I_l = 0 \quad \text{and} \quad \sum_{l=1}^{N_{el}} U_l = 0, \tag{77}
\]
respectively, are imposed as constraints.

respectively, are constraints. The orthogonal matrix Q(θk) is a rotational matrix that defines theorientation of the fibers, in ply k at a given angle θk, while σ stands for the orthotropic conductivity,i.e.,

Q(θk) =

cos(θk) 0 − sin(θk)0 1 0

sin(θk) 0 cos(θk)

and σ =

σ1 0 00 σ2 00 0 σ3

.In the rest of the paper, the EIT model refers to (74), (75), (76), and (77). The conductivity σ is

random and assumed to be a uniformly and strictly positive element of L∞(Ω×D) in order to guarantee

ellipticity. The vectors I =(I1, I2, · · · , IN

el

), and U =

(U1, U2, · · · , UN

el

)respectively determine the

vector of the injected (deterministic) current and the vector measurement of the (random) potential

at the electrodes. According to the constraints in (77), I belongs to the mean-free subspace RNelfree of

RNel and U is an element of RNelfree . In solving (74) with the consitutive relation (75) subject to the

second condition of (76) (i.e., the assigned current at the electrodes) and (77). The unknowns arerepresented by the pair of potential field on D and potential at the electrodes El, that is, (u,U).

5.2 Experimental design formulation

Ten electrodes are placed on the composite laminate body of four ($N_p = 4$) plies, with five placed on the top ply and five on the bottom ply, to measure the electrical potentials $U_l$ at the electrodes. The parameters of the four plies are $\sigma_1 = 0.05$, $\sigma_2 = \sigma_3 = 10^{-3}$, and $z_l = 0.1$. In Figure 1, the red-filled rectangles on the plies represent the electrodes where current is injected and the blue ones represent the electrodes where the current exits. The inlet and outlet currents are equal to 0.1 in absolute value. The orientation angles $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$ of the fibers are the uncertain parameters targeted by the statistical inference, i.e., the experimental goal, and we consider the following uniform distributions to describe our prior knowledge:
\[
\pi(\theta_1) \sim \mathcal{U}\left(\frac{\pi}{3} - 0.05,\ \frac{\pi}{3} + 0.05\right), \qquad \pi(\theta_2) \sim \mathcal{U}\left(\frac{\pi}{4} - 0.05,\ \frac{\pi}{4} + 0.05\right),
\]
\[
\pi(\theta_3) \sim \mathcal{U}\left(\frac{\pi}{5} - 0.05,\ \frac{\pi}{5} + 0.05\right), \qquad \pi(\theta_4) \sim \mathcal{U}\left(\frac{\pi}{6} - 0.05,\ \frac{\pi}{6} + 0.05\right).
\]

For the Bayesian experimental design problem, the data model (cf. (1)) is given by
\[
y_i = U_h(\theta) + \epsilon_i, \quad \text{for } i = 1, \dots, N_e, \tag{78}
\]
where $y_i \in \mathbb{R}^{N_{el}-1}$, i.e., $q = N_{el} - 1$, and the error distribution is Gaussian, i.e., $\epsilon_i \sim \mathcal{N}(0, 10^{-4}\,\mathbf{1}_{q \times q})$. No repeated experiments are considered, i.e., $N_e = 1$. The vector $\theta = (\theta_1, \theta_2, \theta_3, \theta_4)$ represents


the unknown orientation angles that we want to infer, and $U_h = (U_1, \dots, U_{N_{el}-1})$ is a finite element approximation, in the Galerkin sense, of $\boldsymbol{U}$ from the variational problem of finding $(u, \boldsymbol{U}) \in L^2_P(\Omega; \mathcal{H})$ such that
\[
\mathrm{E}[\mathcal{B}((u, \boldsymbol{U}), (v, \boldsymbol{V}))] = \boldsymbol{I} \cdot \mathrm{E}[\boldsymbol{V}], \quad \text{for all } (v, \boldsymbol{V}) \in L^2_P(\Omega; \mathcal{H}), \tag{79}
\]
where, for any event $\omega \in \Omega$, the bilinear form $\mathcal{B} : \mathcal{H} \times \mathcal{H} \to \mathbb{R}$ is
\[
\mathcal{B}((u, \boldsymbol{U}), (v, \boldsymbol{V})) = \int_D \boldsymbol{j} \cdot \nabla v\, \mathrm{d}D + \sum_{l=1}^{N_{el}} \frac{1}{z_l} \int_{E_l} (U_l - u)(V_l - v)\, \mathrm{d}E_l, \tag{80}
\]
where $(\Omega, \mathcal{F}, P)$ stands for the complete probability space, $\mathcal{F}$ is the $\sigma$-field of events, $P : \mathcal{F} \to [0,1]$ is the probability measure, and $\Omega$ is the set of outcomes. The space of the solution for the potential field $(u(\omega), \boldsymbol{U}(\omega))$ is $\mathcal{H} := H^1(D) \times \mathbb{R}^{N_{el}}_{\text{free}}$ for a given random event $\omega \in \Omega$, and $L^2_P(\Omega; \mathcal{H})$ is the Bochner space given by
\[
L^2_P(\Omega; \mathcal{H}) := \left\{(u, \boldsymbol{U}) : \Omega \to \mathcal{H} \ \text{s.t.} \ \int_\Omega \|(u(\omega), \boldsymbol{U}(\omega))\|^2_{\mathcal{H}}\, \mathrm{d}P(\omega) < \infty\right\}.
\]

Figure 2 (top) shows both the electric potential field and the current streamlines, while Figure 2 (bottom) shows the current streamlines along the four plies of the composite material. Streamlines connect electrodes with prescribed inlet flux to electrodes with prescribed outlet flux. Due to jump discontinuities in the conductivity parameter, the current flux lacks smoothness. A similar behavior occurs at the edges of the electrodes, where there is a sudden transition between the boundary conditions (no flux and current inlet/outlet flux). As for the potential field, since we account for a surface impedance $z_l$ to emulate imperfections when an electrode is attached to the surface of the plate, the potential at the electrodes differs slightly from the potential field at those boundaries.

Figure 2: (Color online) Finite element approximation of the potential field (top) and streamlines (bottom). Streamlines depict the current flux from inlet (red electrodes) to outlet (blue electrodes).

5.3 Implementation details of multilevel methods

By least-squares estimation, we estimate for $g_\ell$ the work rate $\gamma \approx 2$ and the weak error rate $\eta_w \approx 1.5$ in Assumption 1. The weak convergence of $g_\ell$ is used as an estimate for the weak convergence of the approximate EIG, $I_\ell$ in (11). We define the sequence for the mesh hierarchy as $h_\ell = 2^{-\ell} h_0$, $\ell > 0$, where $h_0$ is the coarsest mesh-element size. In our numerical experiment, $h_0$ is the mesh size determined by a rectangular mesh ($N_x = 10$, $N_y = 4$) on a domain $L_x \times L_y$ ($L_x = 20$, $L_y = 4$).

The MLDLMC estimator is given by (25), and the method parameters are estimated according to the procedure given in Section 3.3. The unknowns are the two constants $C_1$ and $C_2$ in the bias constraints, and estimates are also required for $V_\ell$. Thus, we perform two runs of MLDLMC with $L = 5$ and $N_\ell = 5$ for all levels $\ell$, with identical samples for both runs, one run with $M_L = 1$ and the other with $M_L = 10$. By using these initial runs and the associated samples of $g_\ell$ for $\ell = 0, \dots, L$, we can estimate $C_1$, $C_2$, $\eta_s$, and $V_\ell$. We note that $C_1 \approx 5 \times 10^{-3}$ for our example, and since $C_1$ is smaller than any $TOL$ to be considered, we set $M_L = 1$. By construction (58), we have $M_\ell = 1$ for all $\ell$. Given $C_2$, we can estimate $L$ and $\kappa$ from (55) and (56). The variances $V_\ell$ are numerically estimated, and from these the number of outer samples $N_\ell$ is determined from (60). Since $M_L$ is small, we can resort to a standard MLMC implementation, as shown in Section 3.4. We use the Continuation Multilevel Monte Carlo (CMLMC) given by Algorithm 2 of [11], as it has the nice feature of adaptively estimating $V_\ell$ and accordingly determining $N_\ell$ until $TOL$ is satisfied.
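A simplified sketch of that continuation idea (not the full Algorithm 2 of [11], and handling only the statistical constraint; $L$ and the bias are assumed fixed as in Section 3.3): per-level variances are re-estimated from the samples gathered so far, the allocation (60) is recomputed, and the loop continues until no level needs more samples. The level sampler and work model are hypothetical stand-ins.

```python
import numpy as np

def cmlmc_style(sample_diff, L, TOL, C_alpha=1.96, N_warmup=20, gamma=2.0):
    work = 2.0 ** (gamma * np.arange(L + 1))  # W(g_l) ~ h_l^-gamma, cf. (9)
    samples = [[sample_diff(l) for _ in range(N_warmup)] for l in range(L + 1)]
    for _ in range(100):                      # continuation passes
        V = np.array([np.var(s, ddof=1) for s in samples])   # estimated V_l
        total = np.sum(np.sqrt(V * work))
        N = np.ceil((C_alpha / TOL) ** 2 * np.sqrt(V / work) * total).astype(int)
        if all(len(samples[l]) >= N[l] for l in range(L + 1)):
            break                             # allocation has stabilized
        for l in range(L + 1):                # top up each level, cf. (60)
            samples[l] += [sample_diff(l) for _ in range(N[l] - len(samples[l]))]
    return sum(np.mean(s) for s in samples)

rng = np.random.default_rng(0)
def sample_diff(l):
    # hypothetical level difference with decaying mean and variance
    return 2.0 ** (-1.5 * l) + 2.0 ** -l * rng.standard_normal()

print(cmlmc_style(sample_diff, L=4, TOL=0.05))
```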

The MLDLSC estimator is given by (70), and the approach is described in detail in Section 4.2. We adopt the MISC implementation given by Algorithm 1 of [7]. The inner expectation is approximated by a full-tensor quadrature with the change of measure being the Laplace-based importance sampling, and its level is set to 0, which is analogous to the choice $M_L = 1$ for MLDLMC. The random parameter $\theta$ follows a uniform distribution in this case; thus, the Clenshaw–Curtis points and weights are used [40].

5.4 Numerical results

In this section, we analyze the performance of MLDLMC and MLDLSC for estimating the value of the EIG criterion for a fixed EIT experiment. For both estimators, the goal is to achieve a specified error tolerance, $TOL$. As described in Section 3.3, since MLDLMC is a random estimator, a relaxed probability constraint is applied, given in (51) with $\alpha = 0.05$, which should ensure that the absolute error is below $TOL$ with a 95% probability of success. The performance, in terms of computational efficiency and consistency, is investigated over a range of $TOL$.

In Figure 3, we numerically verify, for $TOL$ ranging from 1 to 0.001, the consistency between $TOL$ and the computed absolute error, where the reference value was obtained using MLDLSC at the finest considered tolerance. Even though in practice one would only perform a single or a few runs of MLDLMC, here, for analysis purposes, we instead repeat the random estimator MLDLMC one hundred times for each considered $TOL$, using different pseudo-random states. We observe that only 2% of the MLDLMC runs result in relative errors larger than $TOL$, which is consistent with our choice of a 95% probability of success. From the results shown in Figure 3b, one observes that the estimation error of MLDLSC is consistently below $TOL$, but the error estimates underpinning the adaptivity algorithm of MLDLSC are not sharp.


Figure 3: Relative absolute error versus error tolerance TOL for MLDLMC (a, left) and MLDLSC (b, right). The blue dots are individual runs. For each considered tolerance TOL, we perform 100 runs of the random estimator MLDLMC.

We observe that the sample mean (Fig. 4a) and sample variance (Fig. 4b) of the telescopic differences in MLDLMC with respect to level ℓ decay at rates roughly equal to the assumed asymptotic rates η_w = 1 and 2η_s = 2, respectively. These plots show the consistency of the discretization method in terms of the weak and strong convergence stated in Assumption 1.

Figure 4: E_ℓ (a, weak convergence) and V_ℓ (b, strong convergence) with respect to level ℓ for MLDLMC.


Figure 5: The finest level L of the MLDLMC estimator with respect to TOL.

The finest mesh level L considered in the MLDLMC method for the different choices of TOL is shown in Figure 5. The finest level, L = 6, uses a mesh with N_x = 640 and N_y = 256. We observe that the level L follows approximately L ≈ 1.4 log(TOL^(−1)), in agreement with the asymptotic behavior of L with respect to TOL; see (55). In Figure 6, the computational times of MLDLSC and MLDLMC, with 100 repeated runs, are compared over a range of TOL. An estimate of the average computational time for the DLMCIS estimator [6] is also included to demonstrate the computational efficiency of the two proposed multilevel methods.

Figure 6: Computational time for MLDLMC and MLDLSC, and the estimated time for DLMCIS.

MLDLSC exploits the regularity of the underlying model with respect to the uncertain parameter θ and thus achieves a better work complexity than MLDLMC. Conversely, MLDLMC is expected to perform better when the regularity is low. MLDLSC seems to perform worse than expected for large TOL, which could be attributed to its error estimates not being sharp. As shown in Section 3.4, the work complexity of MLDLMC follows that of the standard MLMC, in this case TOL^(−2) log(TOL^(−1))^2; see Theorem 2.1 of [15] with β = γ, where in that theorem β := 2η_s. MLDLSC and MLDLMC are superior to DLMCIS, which shows that the multilevel construction greatly accelerates the computational performance.
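
The quoted rates can be compared directly. The snippet below tabulates the multilevel rate TOL^(−2) log(TOL^(−1))^2 against a generic single-level double-loop rate of TOL^(−3), which is used here only as an illustrative reference slope; the constants are arbitrary and not taken from the paper.

import numpy as np

# Theoretical work-complexity slopes (arbitrary constants): the multilevel
# rate TOL^-2 * log(1/TOL)^2 versus an illustrative single-level TOL^-3.
for tol in np.logspace(-0.5, -3, 6):
    w_ml = tol**-2 * np.log(1.0 / tol)**2
    w_sl = tol**-3
    print(f"TOL={tol:8.2e}  multilevel~{w_ml:10.3e}  single-level~{w_sl:10.3e}")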


6 Conclusion

In the situation where the experiments are modeled by PDEs, and under certain mild conditions on the underlying computational model, we propose two multilevel methods for the computationally efficient estimation of the EIG criterion in the context of Bayesian optimal experimental design. The first method is the MLDLMC estimator, a multilevel extension of the DLMCIS method, and the second is the MLDLSC method, which uses an adaptive sparse-grid stochastic collocation scheme and, given enough regularity with respect to the random parameter, can in many situations achieve a higher accuracy than MLDLMC at a lower computational cost. We found that the computational efficiency of the multilevel construction relies strongly on keeping the number of inner samples low, which in the proposed methods is achieved by the Laplace-based importance sampling. In Bayesian optimal experimental design, the relative difference in EIG values between designs is of interest, and within the likely range of accuracies needed for such an estimation, the work complexity of MLDLMC is the same as for the standard MLMC.

We have demonstrated that the proposed multilevel estimators for the EIG criterion are computationally efficient, as they balance the work over a hierarchy of meshes, and robust, with reliable error control, i.e., they achieve a specified error tolerance with a high probability of success.

Acknowledgements

The research reported in this publication was supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under award numbers URF/1/2281-01-01 and URF/1/2584-01-01 in the KAUST Competitive Research Grants Program, Rounds 3 and 4, respectively, and by the Alexander von Humboldt Foundation. J. Beck, L.F.R. Espath, and R. Tempone are members of the KAUST SRI Center for Uncertainty Quantification in Computational Science and Engineering.

References

[1] Alexanderian, A., Petra, N., Stadler, G., and Ghattas, O. A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems. SIAM Journal on Scientific Computing 38, 1 (2016), A243–A272.

[2] Alexanderian, A., and Saibaba, A. Efficient D-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems. SIAM Journal on Scientific Computing 40, 5 (2018), A2956–A2985.

[3] Attia, A., Alexanderian, A., and Saibaba, A. K. Goal-oriented optimal design of experiments for large-scale Bayesian linear inverse problems. Inverse Problems 34, 9 (2018), 095009.

[4] Babuška, I., Nobile, F., and Tempone, R. A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Review 52, 2 (2010), 317–355.

[5] Bäck, J., Nobile, F., Tamellini, L., and Tempone, R. Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: A numerical comparison. In Spectral and High Order Methods for Partial Differential Equations (Berlin, Heidelberg, 2011), J. Hesthaven and E. Rønquist, Eds., Springer Berlin Heidelberg, pp. 43–62.

[6] Beck, J., Dia, B. M., Espath, L. F., Long, Q., and Tempone, R. Fast Bayesian experimental design: Laplace-based importance sampling for the expected information gain. Computer Methods in Applied Mechanics and Engineering 334 (2018), 523–553.


[7] Beck, J., Tamellini, L., and Tempone, R. IGA-based multi-index stochastic collocation for random PDEs on arbitrary domains. Computer Methods in Applied Mechanics and Engineering 351 (2019), 330–350.

[8] Beck, J., Tempone, R., Nobile, F., and Tamellini, L. On the optimal polynomial approximation of stochastic PDEs by Galerkin and collocation methods. Mathematical Models and Methods in Applied Sciences 22, 9 (2012), 1250023.

[9] Carlon, A., Dia, B., Espath, L., Lopez, R., and Tempone, R. Nesterov-aided stochastic gradient methods using Laplace approximation for Bayesian design optimization. arXiv:1807.00653v2 (2018).

[10] Charrier, J., Scheichl, R., and Teckentrup, A. Finite element error analysis of elliptic PDEs with random coefficients and its application to multilevel Monte Carlo methods. SIAM Journal on Numerical Analysis 51, 1 (2013), 322–352.

[11] Collier, N., Haji-Ali, A., Nobile, F., von Schwerin, E., and Tempone, R. A continuation multilevel Monte Carlo algorithm. BIT Numerical Mathematics (2014), 1–34.

[12] Crestel, B., Alexanderian, A., Stadler, G., and Ghattas, O. A-optimal encoding weights for nonlinear inverse problems, with application to the Helmholtz inverse problem. Inverse Problems 33, 7 (2017), 074008.

[13] Gerstner, T., and Griebel, M. Dimension-adaptive tensor-product quadrature. Computing 71, 1 (2003), 65–87.

[14] Giles, M. Multilevel Monte Carlo path simulation. Operations Research 56, 3 (2008), 607–617.

[15] Giles, M. Multilevel Monte Carlo methods. Acta Numerica 24 (2015), 259–328.

[16] Giles, M. B., and Haji-Ali, A.-L. Multilevel nested simulation for efficient risk estimation. SIAM/ASA Journal on Uncertainty Quantification 7, 2 (2019), 497–525.

[17] Goda, T., Hironaka, T., and Iwamoto, T. Multilevel Monte Carlo estimation of expected information gains. Stochastic Analysis and Applications (2019), 1–20.

[18] Haji-Ali, A., Nobile, F., Tamellini, L., and Tempone, R. Multi-index stochastic collocation convergence rates for random PDEs with parametric regularity. Foundations of Computational Mathematics 16, 6 (2016), 1555–1605.

[19] Haji-Ali, A., Nobile, F., Tamellini, L., and Tempone, R. Multi-index stochastic collocation for random PDEs. Computer Methods in Applied Mechanics and Engineering 306 (2016), 95–122.

[20] Heinrich, S. Monte Carlo complexity of global solution of integral equations. Journal of Complexity 14 (1998), 151–175.

[21] Hoel, H., and Krumscheid, S. Central limit theorems for multilevel Monte Carlo methods. Journal of Complexity 54 (2019), 101407.

[22] Huan, X., and Marzouk, Y. Gradient-based stochastic optimization methods in Bayesian experimental design. International Journal for Uncertainty Quantification 4, 6 (2014).

[23] Huan, X., and Marzouk, Y. M. Simulation-based optimal Bayesian experimental design for nonlinear systems. Journal of Computational Physics 232, 1 (2013), 288–317.

[24] Kullback, S. Information Theory and Statistics. Wiley, 1959.


[25] Kullback, S., and Leibler, R. On information and sufficiency. Annals of Mathematical Statistics 22 (1951), 79–86.

[26] Lewi, J., Butera, R., and Paninski, L. Sequential optimal design of neurophysiology experiments. Neural Computation 21, 3 (2009), 619–687.

[27] Lindley, D. On a measure of information provided by an experiment. The Annals of Mathematical Statistics 27 (1956), 986–1005.

[28] Long, Q., Scavino, M., Tempone, R., and Wang, S. Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations. Computer Methods in Applied Mechanics and Engineering 259 (2013), 24–39.

[29] Long, Q., Scavino, M., Tempone, R., and Wang, S. A Laplace method for under-determined Bayesian optimal experimental design. Computer Methods in Applied Mechanics and Engineering 285 (2015), 849–876.

[30] Meyer, R., and Nachtsheim, C. The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics 37, 1 (1995), 60–69.

[31] Nobile, F., Tamellini, L., Tesei, F., and Tempone, R. An adaptive sparse grid algorithm for elliptic PDEs with lognormal diffusion coefficient. In Sparse Grids and Applications – Stuttgart 2014. Springer, 2016, pp. 191–220.

[32] Overstall, A., and Woods, D. Bayesian design of experiments using approximate coordinate exchange. Technometrics 59, 4 (2017), 458–470.

[33] Ryan, E., Drovandi, C., McGree, J., and Pettitt, A. A review of modern computational algorithms for Bayesian optimal design. International Statistical Review 84 (2016), 128–154.

[34] Ryan, E., Drovandi, C., and Pettitt, A. Fully Bayesian experimental design for pharmacokinetic studies. Entropy 17, 3 (2015), 1063–1089.

[35] Ryan, K. J. Estimating expected information gains for experimental designs with application to the random fatigue-limit model. Journal of Computational and Graphical Statistics 12, 3 (2003), 585–603.

[36] Schillings, C., and Schwab, C. Sparse, adaptive Smolyak quadratures for Bayesian inverse problems. Inverse Problems 29, 6 (2013), 065011.

[37] Shannon, C. A mathematical theory of communication. Bell System Technical Journal 27 (1948), 379–423.

[38] Somersalo, E., Cheney, M., and Isaacson, D. Existence and uniqueness for electrode models for electric current computed tomography. SIAM Journal on Applied Mathematics 52 (1992), 1023–1040.

[39] Tenorio, L., Lucero, C., Ball, V., and Horesh, L. Experimental design in the context of Tikhonov regularized inverse problems. Statistical Modelling 13, 5-6 (2013), 481–507.

[40] Trefethen, L. Is Gauss quadrature better than Clenshaw–Curtis? SIAM Review 50, 1 (2008), 67–87.

[41] Tsilifis, P., Ghanem, R., and Hajali, P. Efficient Bayesian experimentation using an expected information gain lower bound. SIAM/ASA Journal on Uncertainty Quantification 5, 1 (2017), 30–62.


[42] Walsh, S., Wildey, T., and Jakeman, J. Optimal experimental design using a consistent Bayesian approach. ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part B: Mechanical Engineering 4, 1 (2018), 011005.

[43] Xiu, D., and Karniadakis, G. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM Journal on Scientific Computing 24, 2 (2002), 619–644.
