multilevel sampling methods for inverse problems constrained by partial differential … ·...

Multilevel Sampling Methods for Inverse ProblemsConstrained by Partial Differential Equations

Aretha Teckentrup

School of Mathematics, University of Edinburgh

Joint work with:

Tim Dodwell (Exeter), Chris Ketelsen (DigitalGlobe),Rob Scheichl (Heidelberg), Andrew Stuart (Caltech)

University of Greenwich - February 20, 2019

A. Teckentrup (Edinburgh) Multilevel methods in IPs February 20, 2019 1 / 25

Outline

1 Bayesian Inverse Problems

2 (Multilevel) Ratio Estimators

3 (Multilevel) Metropolis Hastings Estimators

Bayesian Inverse ProblemsDefinition and Applications

An inverse problem is concerned with determining causal factors fromobserved results.

In mathematical terms, we want to determine unknown parameters ina model from indirect observations.

Inverse problems appear in many application areas, including

I the determination of an earthquake’s epicentre using observations ofseismic waves on the earth’s surface,

I the detection of flaws or cracks within a concrete structure fromacoustic or electromagnetic measurements at its surface.

The model of the process is frequently given by a partial differentialequation, where we have observations of the solution and want todetermine initial conditions/boundary conditions/coefficients.

Bayesian Inverse ProblemsDefinition and Applications

An inverse problem is concerned with determining causal factors fromobserved results.

In mathematical terms, we want to determine unknown parameters ina model from indirect observations.

Inverse problems appear in many application areas, including

I the determination of an earthquake’s epicentre using observations ofseismic waves on the earth’s surface,

I the detection of flaws or cracks within a concrete structure fromacoustic or electromagnetic measurements at its surface.

The model of the process is frequently given by a partial differentialequation, where we have observations of the solution and want todetermine initial conditions/boundary conditions/coefficients.

Bayesian Inverse ProblemsMathematical Formulation

Suppose we are given a mathematical model of a process, representedby a (typically non-linear) map G.

We are interested in the following inverse problem: given observationaldata y ∈ Rdy , determine model parameters u ∈ U ⊆ Rdu such that

y = G(u) + η,

where η represents observational noise, due to for examplemeasurement error.

Simply ”inverting G” is not possible, since

I we do not know the value of η, and

I typically the problem is ill-posed and/or ill-conditioned.

y = G(u) + η,

Bayesian Inverse ProblemsMathematical Formulation [Stuart ’10] [Kaipio, Somersalo ’04]

The Bayesian approach treats u, y and η as random variables.

We choose a prior measure µ0 on u with density π0.

Under the measurement model y = G(u) + η with η ∼ N(0,Γ), wehave y|u ∼ N(G(u),Γ), and the likelihood of the data y is

P(y|u) h exp(− 1

2‖y − G(u)‖2Γ−1

)=: exp

(− Φ(u)

Using Bayes’ Theorem, we obtain the posterior measure µy of u|ywith density πy, given by

πy(u) =1

(− Φ(u)

)π0(u),

(dµydµ0

(u) =1

(− Φ(u)

))where Z =

∫U exp

(− Φ(u)

)π0(u)du = Eπ0 [exp

(− Φ

P(y|u) h exp(− 1

2‖y − G(u)‖2Γ−1

)=: exp

(− Φ(u)

πy(u) =1

(− Φ(u)

)π0(u),

(dµydµ0

(u) =1

(− Φ(u)

))where Z =

∫U exp

(− Φ(u)

(− Φ

P(y|u) h exp(− 1

2‖y − G(u)‖2Γ−1

)=: exp

(− Φ(u)

πy(u) =1

(− Φ(u)

)π0(u),

(dµydµ0

(u) =1

(− Φ(u)

))where Z =

∫U exp

(− Φ(u)

(− Φ

P(y|u) h exp(− 1

2‖y − G(u)‖2Γ−1

)=: exp

(− Φ(u)

πy(u) =1

(− Φ(u)

)π0(u),

(dµydµ0

(u) =1

(− Φ(u)

))where Z =

∫U exp

(− Φ(u)

(− Φ

Bayesian Inverse ProblemsComputational Challenges

For the remainder of this talk, suppose are interested in computingEπy [φ], for some quantity of interest φ(u).

In most cases, we do not have a closed form expression for theposterior density πy, since the normalising constant Z is not knownexplicitly.(Exception: parameter-to-observation map G linear and prior π0

Gaussian ⇒ posterior πy also Gaussian.)

We will present two efficient approaches to computing Eπy [φ]:

I (multilevel) ratio estimators,

I (multilevel) Metropolis Hastings estimators.

Bayesian Inverse ProblemsComputational Challenges

For the remainder of this talk, suppose are interested in computingEπy [φ], for some quantity of interest φ(u).

In most cases, we do not have a closed form expression for theposterior density πy, since the normalising constant Z is not knownexplicitly.(Exception: parameter-to-observation map G linear and prior π0

Gaussian ⇒ posterior πy also Gaussian.)

We will present two efficient approaches to computing Eπy [φ]:

I (multilevel) ratio estimators,

I (multilevel) Metropolis Hastings estimators.

(Multilevel) Ratio EstimatorsComputing Expectations

Suppose are interested in computing Eπy [φ]. Using Bayes’ Theorem, wecan write this as

Eπy [φ] = Eπ0 [1

Zexp(−Φ)φ] =

Eπ0 [φ exp(−Φ)]

Eπ0 [exp(−Φ)].

We have rewritten the posterior expectation as a ratio of two prior

expectations.

We can now approximate

Eπy [φ] =Q

Z≈ Q

where Q is an estimator of Q := Eπ0 [φ exp(−Φ)] and Z is an estimator ofZ = Eπ0 [exp

(− Φ

The prior distribution is known in closed form, and furthermore often has asimple structure (e.g. i.i.d. Gaussian or uniform).

(Multilevel) Ratio EstimatorsComputing Expectations

Suppose are interested in computing Eπy [φ]. Using Bayes’ Theorem, wecan write this as

Eπy [φ] = Eπ0 [1

Zexp(−Φ)φ] =

Eπ0 [φ exp(−Φ)]

Eπ0 [exp(−Φ)].

We have rewritten the posterior expectation as a ratio of two prior

expectations. We can now approximate

Eπy [φ] =Q

Z≈ Q

where Q is an estimator of Q := Eπ0 [φ exp(−Φ)] and Z is an estimator ofZ = Eπ0 [exp

(− Φ

The prior distribution is known in closed form, and furthermore often has asimple structure (e.g. i.i.d. Gaussian or uniform).

(Multilevel) Ratio EstimatorsMonte Carlo Estimators

The standard Monte Carlo (MC) estimator of Z = Eπ0 [exp(−Φ)],

ZMCh,N =

N∑i=1

exp(−Φh(u(i))),

is an equal weighted average of N i.i.d samples exp(−Φh(u(i))), where Φh

denotes an approximation of Φ using mesh width h.

The mean square error of ZMCh,N is

e(ZMCh,N )2 := E[(ZMC

h,N − Eπ0 [exp(−Φ)])2]

=Vπ0 [exp(−Φh]

N︸︷︷︸sampling error

+ (Eπ0 [exp(−Φh)− exp(−Φ)])2︸︷︷︸numerical error

⇒ need to choose N large and h small!

(Multilevel) Ratio EstimatorsMonte Carlo Estimators

The standard Monte Carlo (MC) estimator of Z = Eπ0 [exp(−Φ)],

ZMCh,N =

N∑i=1

exp(−Φh(u(i))),

is an equal weighted average of N i.i.d samples exp(−Φh(u(i))), where Φh

denotes an approximation of Φ using mesh width h.

The mean square error of ZMCh,N is

e(ZMCh,N )2 := E[(ZMC

h,N − Eπ0 [exp(−Φ)])2]

=Vπ0 [exp(−Φh]

N︸︷︷︸sampling error

+ (Eπ0 [exp(−Φh)− exp(−Φ)])2︸︷︷︸numerical error

⇒ need to choose N large and h small!A. Teckentrup (Edinburgh) Multilevel methods in IPs February 20, 2019 8 / 25

(Multilevel) Ratio EstimatorsMultilevel Monte Carlo estimators [Giles, ’08], [Heinrich ’01]

Multilevel methods use a decreasing sequence of mesh widths {h`}L`=0.

Linearity of expectation gives us

[e−ΦhL

]= Eπ0

[e−Φh0

L∑`=1

[e−Φh` − e

−Φh`−1

The multilevel Monte Carlo (MLMC) estimator

ZML{h`,N`} =

N0∑i=1

e−Φh0(u(i,0)) +

L∑`=1

N∑i=1

e−Φh` (u(i,`))− e

−Φh`−1(u(i,`))

is a sum of L+ 1 independent MC estimators. The sequence {N`} isdecreasing, which means a significant portion of the computational effortis shifted onto the coarse grids.

Multilevel methods use a decreasing sequence of mesh widths {h`}L`=0.

Linearity of expectation gives us

[e−ΦhL

]= Eπ0

[e−Φh0

L∑`=1

[e−Φh` − e

−Φh`−1

The multilevel Monte Carlo (MLMC) estimator

ZML{h`,N`} =

N0∑i=1

e−Φh0(u(i,0)) +

L∑`=1

N∑i=1

e−Φh` (u(i,`))− e

−Φh`−1(u(i,`))

is a sum of L+ 1 independent MC estimators. The sequence {N`} isdecreasing, which means a significant portion of the computational effortis shifted onto the coarse grids.

The mean square error of ZML{h`,N`} is

e(ZML{h`,N`})

2 =Vπ0 [e

−Φh0 ]

L∑`=1

Vπ0 [e−Φh` − e

−Φh`−1 ]

N`︸︷︷︸sampling error

+ (Eπ0 [e−ΦhL − e−Φ])2︸︷︷︸

numerical error

N0 still needs to be large, but samples are much cheaper to obtain oncoarse grids.

N` (`� 0) are much smaller, since Vπ0 [e−Φh` − e−Φh`−1 ]→ 0 as

h` → 0.

The mean square error of ZML{h`,N`} is

e(ZML{h`,N`})

2 =Vπ0 [e

−Φh0 ]

L∑`=1

Vπ0 [e−Φh` − e

−Φh`−1 ]

N`︸︷︷︸sampling error

+ (Eπ0 [e−ΦhL − e−Φ])2︸︷︷︸

numerical error

N0 still needs to be large, but samples are much cheaper to obtain oncoarse grids.

N` (`� 0) are much smaller, since Vπ0 [e−Φh` − e−Φh`−1 ]→ 0 as

h` → 0.

(Multilevel) Ratio EstimatorsConvergence of Mean Square Error [Stuart, Scheichl, T. ’17]

We want to bound the mean square error (MSE): E[(

QZ −

Rearranging the MSE and applying the triangle inequality, we have

E[(QZ− Q

)2]≤ 2

(E[(Q− Q)2

[(Q/Z)2(Z − Z)2

If (Q/Z)2 ∈ L∞, then the MSE of Q/Z can be bounded in terms ofthe individual MSEs of Q and Z.

If (Q/Z)2 ∈ Lr, for some 2 < r <∞, then the MSE of Q/Z can bebounded in terms of the MSE of Q and a higher order error of Z,E[(Z − Z)p

]for some p > 2.

We want to bound the mean square error (MSE): E[(

QZ −

Rearranging the MSE and applying the triangle inequality, we have

E[(QZ− Q

)2]≤ 2

(E[(Q− Q)2

[(Q/Z)2(Z − Z)2

If (Q/Z)2 ∈ L∞, then the MSE of Q/Z can be bounded in terms ofthe individual MSEs of Q and Z.

If (Q/Z)2 ∈ Lr, for some 2 < r <∞, then the MSE of Q/Z can bebounded in terms of the MSE of Q and a higher order error of Z,E[(Z − Z)p

]for some p > 2.

Consider the diffusion problem

−∇ · (k(x)∇p(x)) = g(x), in D ⊂ Rd, y = O(p) + η,

with prior distribution

I a log-normal distribution k(x) = exp(∑du

n=1 unφn(x))

, with {φn}dun=1

given functions in L∞(D) and un ∼ N(0, 1) ⇒ Q/Z ∈ Lr, r <∞I a uniform distribution, i.e. k(x) = m(x) +

n=1 unφn(x), with

{φn}dun=1 given functions in L∞(D) and un ∼ U [−1, 1] ⇒ Q/Z ∈ L∞

(Multilevel) Ratio EstimatorsNumerical Results [Stuart, Scheichl, T. ’17]

Flow cell model problem on (0, 1)2

u log-normal random field

Observed data: y = {p(xi) + ηi}9i=1, with ηi ∼ N(0, 0.09)

QoI φ is outflow over right boundary

Accuracy ǫ10

MC (dep.)QMC (dep.)MLMC (dep.)

Accuracy ǫ10

MC (indep.)QMC (indep.)MLMC (indep.)

(Multilevel) Ratio EstimatorsRelated works

A number of works have recently considered the ratio estimator approachin the context of PDE constrained inverse problems, as it allows to reusemachinery developed for π0:

[Schillings, Schwab ’13]: dimension-adaptive sparse grids

[Dick, Gantner, Le Gia, Schwab ’17]: (multilevel) higher orderQuasi-Monte Carlo

[Gantner, Peters ’18]: higher order Quasi-Monte Carlo for PDEs onrandom domains

When ‖Γ‖ � 1 or dy � 1, the posterior density πy may concentrate, andthe prior evaluations become difficult to evaluate accurately.

[Schillings, Schwab ’16]: rescaling of parameter space around(unique) MAP point

[Schillings, Sprungk, Wacker ’19]: use of Laplace approximations

(Multilevel) Ratio EstimatorsRelated works

A number of works have recently considered the ratio estimator approachin the context of PDE constrained inverse problems, as it allows to reusemachinery developed for π0:

[Schillings, Schwab ’13]: dimension-adaptive sparse grids

[Dick, Gantner, Le Gia, Schwab ’17]: (multilevel) higher orderQuasi-Monte Carlo

[Gantner, Peters ’18]: higher order Quasi-Monte Carlo for PDEs onrandom domains

When ‖Γ‖ � 1 or dy � 1, the posterior density πy may concentrate, andthe prior evaluations become difficult to evaluate accurately.

[Schillings, Schwab ’16]: rescaling of parameter space around(unique) MAP point

[Schillings, Sprungk, Wacker ’19]: use of Laplace approximations

(Multilevel) Metropolis Hastings EstimatorsIntroduction

The second approach we are going to discuss is the (multilevel)Metropolis-Hastings algorithm, a type of Markov chain Monte Carlo(MCMC) algorithm.

The estimator of Eπy [φ] takes the form

EMHh,N =

N∑i=1

φh(u(i)),

where u(i) ∼ πyh, for 1 ≤ i ≤ N , and πyh(u) = 1Zh

exp(−Φh(u))π0(u).

Since πyh is not known in closed form, it is generally not possible togenerate i.i.d. samples distributed according to πyh.

(Multilevel) Metropolis Hastings EstimatorsIntroduction

The second approach we are going to discuss is the (multilevel)Metropolis-Hastings algorithm, a type of Markov chain Monte Carlo(MCMC) algorithm.

The estimator of Eπy [φ] takes the form

EMHh,N =

N∑i=1

φh(u(i)),

where u(i) ∼ πyh, for 1 ≤ i ≤ N , and πyh(u) = 1Zh

exp(−Φh(u))π0(u).

Since πyh is not known in closed form, it is generally not possible togenerate i.i.d. samples distributed according to πyh.

(Multilevel) Metropolis Hastings EstimatorsStandard Metropolis-Hastings [Robert, Casella ’99]

ALGORITHM 1. (Standard MH-MCMC)

Choose u(1) with πyh(u(1)) > 0.

At state u(i), sample a proposal u′ from density q(u′ |u(i)).

Accept sample u′ with probability

α(u′ |u(i)) = min

πyh(u′) q(u(i) |u′)πyh(u(i)) q(u′ |u(i))

i.e. u(i+1) = u′ with probability α(u′ |u(i)); otherwise stay atu(i+1) = u(i).

The proposal density q is chosen to be easy to sample from.

The accept/reject step is added in order to obtain samples from πyh.

Knowledge of the normalising constant Zh of πyh is not required.

(Multilevel) Metropolis Hastings EstimatorsDefinition of Multilevel Metropolis-Hastings [Dodwell, Ketelsen, Scheichl, T. ’15]

Key ingredients in multilevel method:

telescoping sum: E [φhL ] = E [φh0 ] +∑L

`=1 E [φh` ]− E[φh`−1

number of samples per level N` is decreasing.

The target distribution πyh` depends on h`, so need to define multilevelestimator carefully!

EπyhL[φhL ] = Eπyh0

[φh0 ]︸︷︷︸standard MCMC

+L∑`=1

Eπyh`[φh` ]− Eπyh`−1

[φh`−1

]︸︷︷︸

2 level MCMC (NEW)

EMLMH{h`,N`} :=

N0∑i=1

φh0(u(i)0 ) +

L∑`=1

N∑i=1

(φh`(u

(i)` )− φh`−1

(U(i)`−1)

)We need to be able to choose N` → 1 as `→∞!

`=1 E [φh` ]− E[φh`−1

[φh0 ] +

L∑`=1

[φh`−1

+L∑`=1

[φh`−1

]︸︷︷︸

2 level MCMC (NEW)

EMLMH{h`,N`} :=

N0∑i=1

φh0(u(i)0 ) +

L∑`=1

N∑i=1

(φh`(u

(i)` )− φh`−1

(U(i)`−1)

`=1 E [φh` ]− E[φh`−1

L∑`=1

[φh`−1

]︸︷︷︸

2 level MCMC (NEW)

EMLMH{h`,N`} :=

N0∑i=1

φh0(u(i)0 ) +

L∑`=1

N∑i=1

(φh`(u

(i)` )− φh`−1

(U(i)`−1)

We need to be able to choose N` → 1 as `→∞!

`=1 E [φh` ]− E[φh`−1

L∑`=1

[φh`−1

]︸︷︷︸

2 level MCMC (NEW)

EMLMH{h`,N`} :=

N0∑i=1

φh0(u(i)0 ) +

L∑`=1

N∑i=1

(φh`(u

(i)` )− φh`−1

(U(i)`−1)

We need to generate (coupled) Markov chains {u(i)` } and {U (i)

`−1}, suchthat

{U (i)`−1} has marginal distribution πyh`−1

{u(i)` } has marginal distribution πyh` ,

V[φh`(u(i)` )− φh`−1

(U(i)`−1)]→ 0 as `→∞.

The main idea of our algorithm is to:

use Metropolis-Hastings with U ′`−1 ∼ q(·|U(i)`−1) to generate U

(i+1)`−1 ,

use Metropolis-Hastings with u′` = U(i+1)`−1 to generate u

(i+1)` .

In practice, we use t` steps of Metropolis-Hastings to generate U(i+1)`−1 ,

such that U(i)`−1 and U

(i+1)`−1 are essentially uncorrelated. This significantly

simplifies the computation of the acceptance probability for u(i+1)` .

`−1}, suchthat

V[φh`(u(i)` )− φh`−1

(U(i)`−1)]→ 0 as `→∞.

(i+1)`−1 ,

(i+1)` .

`−1}, suchthat

V[φh`(u(i)` )− φh`−1

(U(i)`−1)]→ 0 as `→∞.

(i+1)`−1 ,

(i+1)` .

(Multilevel) Metropolis Hastings EstimatorsImplementation details [Dodwell, Ketelsen, Scheichl, T. ’15]

The acceptance probability α2L for u(i)` is easy to compute, and we

can prove E[α2L]→ 1 as `→∞. This means P[u

(i)` = U

(i)`−1

]→ 1 as

`→∞.

To define the level ` approximation πyh` , we can

I change the number of parameters on level `: u ∈ RN → u1:R`∈ RR` ,

I change the noise level on level `: Γ→ Γ`, where η ∼ N(0,Γ).

In practice, we always start at level 0 when generating samples, anduse a sub-sampling rate t`.

I This leads to an efficient implementation with small integratedautocorrelation times (O(1)) on levels ` ≥ 1 on level `.

(Multilevel) Metropolis Hastings EstimatorsImplementation details [Dodwell, Ketelsen, Scheichl, T. ’15]

The acceptance probability α2L for u(i)` is easy to compute, and we

can prove E[α2L]→ 1 as `→∞. This means P[u

(i)` = U

(i)`−1

]→ 1 as

`→∞.

To define the level ` approximation πyh` , we can

I change the number of parameters on level `: u ∈ RN → u1:R`∈ RR` ,

I change the noise level on level `: Γ→ Γ`, where η ∼ N(0,Γ).

In practice, we always start at level 0 when generating samples, anduse a sub-sampling rate t`.

I This leads to an efficient implementation with small integratedautocorrelation times (O(1)) on levels ` ≥ 1 on level `.

(Multilevel) Metropolis Hastings EstimatorsNumerical Example [Dodwell, Ketelsen, Scheichl, T. ’15]

Flow cell model problem on (0, 1)2

u log-normal random field

Observed data: y = {p(xi) + ηi}16i=1, with ηi ∼ N(0, 10−4)

QoI φ is outflow over right boundary

00.01 0.02 0.03 0.04

MLMCMCStandard MCMC

(Multilevel) Metropolis Hastings EstimatorsRelated works

Earlier related work using coarse models to ”pre-screen” proposals forfine models:

I [Liu, ’01] : surrogate transition method

I [Christen, Fox ’05] : delayed acceptance

I [Efendiev, Hou, Lou ’06] : application to PDEs

There are several other works discussing multilevel methods in thecontext of inverse problems, see the MLMC community webpage.

Conclusions

Standard sampling methods quickly become computationallyinfeasible for practical applications.

Multilevel methods are powerful tools for reducing the computationalburden in practical applications, leading to a cost reduction of ordersof magnitude.

In the context of Bayesian inverse problems, multilevel methodsinclude

I multilevel ratio estimators (importance sampling),

I multilevel Metropolis-Hastings estimators.

Multilevel methods are generally applicable!

References I

Mulilevel Community Webpage.

https://people.maths.ox.ac.uk/gilesm/mlmc_community.html.

J. A. Christen and C. Fox, Markov chain Monte Carlo using an approximation,Journal of Computational and Graphical Statistics, 14 (2005), pp. 795–810.

J. Dick, R. N. Gantner, Q. T. Le Gia, and C. Schwab, Multilevelhigher-order quasi-Monte Carlo Bayesian estimation, Mathematical Models andMethods in Applied Sciences, 27 (2017), pp. 953–995.

T. Dodwell, C. Ketelsen, R. Scheichl, and A. Teckentrup, Ahierarchical multilevel Markov chain Monte Carlo algorithm with applications touncertainty quantification in subsurface flow, SIAM/ASA Journal on UncertaintyQuantification, 3 (2017), pp. 1075–1108.

Y. Efendiev, T. Hou, and W. Luo, Preconditioning Markov chain MonteCarlo simulations using coarse-scale models, SIAM Journal on ScientificComputing, 28 (2006), pp. 776–803.

References II

R. N. Gantner and M. D. Peters, Higher-Order Quasi-Monte Carlo forBayesian Shape Inversion, SIAM/ASA Journal on Uncertainty Quantification, 6(2018), pp. 707–736.

J. Kaipio and E. Somersalo, Statistical and computational inverse problems,Springer, 2004.

J. S. Liu, Monte Carlo strategies in scientific computing, Springer, 2001.

C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 1999.

D. Rudolf, Explicit error bounds for Markov chain Monte Carlo, arXiv preprintarXiv:1108.3201, (2011).

R. Scheichl, A. Stuart, and A. Teckentrup, Quasi-Monte Carlo andMultilevel Monte Carlo Methods for Computing Posterior Expectations in EllipticInverse Problems, SIAM/ASA Journal on Uncertainty Quantification, 5 (2017),pp. 493–518.

C. Schillings and C. Schwab, Sparse, adaptive Smolyak quadratures forBayesian inverse problems, Inverse Problems, 29 (2013), p. 065011.

References III

, Scaling limits in computational Bayesian inversion, ESAIM: MathematicalModelling and Numerical Analysis, 50 (2016), pp. 1825–1856.

C. Schillings, B. Sprungk, and P. Wacker, On the Convergence of theLaplace Approximation and Noise-Level-Robustness of Laplace-based Monte CarloMethods for Bayesian Inverse Problems, arXiv preprint arXiv:1901.03958, (2019).

A. M. Stuart, Inverse Problems: A Bayesian Perspective, Acta Numerica, 19(2010), pp. 451–559.

multilevel sampling methods for inverse problems constrained by partial differential … ·...

Documents