

Accelerating Bayesian inference with computationally intensive models, with application to Pine Island Glacier

Patrick R. Conrad (MIT), Patrick Heimbach (MIT), Youssef Marzouk (MIT), Natesh Pillai (Harvard), and Aaron Smith (Univ. of Ottawa)

Antarctica and climate change

The Western Antarctic Ice Sheet has recently shown growing mass loss along the Amundsen coast

Figure: Western Antarctic Ice Sheet [Rignot et al. 2011]

Figure: Pine Island Glacier [NASA]

Vast uncertainty in ice-ocean dynamics

Figure: Temperature profile under Pine Island Glacier, Antarctica [Jacobs et al.]

- How readily is heat absorbed by the ice?
- How much mixing occurs near the ice-ocean interface?
- Ultimately, can we predict melt rates and the stability of the glacier?

Forward model of ice-ocean coupling

- MIT General Circulation Model, configured for Pine Island
- Realistic geometry on coarse scale (4 km × 4 km × 20 m) or fine scale (1 km × 1 km × 20 m) models
- Several input parameters are unknown

Constructing an inference problem

Figure panels: Satellite image; Bathymetry and sample locations (longitude ticks from 102°W 30′ to 100°W 30′, latitude ticks around 75°S, color scale from 0 to 1000)

- Representative locations for temperature and salinity observations

Bayesian inference illustration

- Bayesian inference expresses our prior beliefs over parameters $\theta \in \mathbb{R}^n$ with a probability density,

$$p(\theta),$$

and constructs a posterior probability density,

$$p(\theta \mid d) \propto L(\theta \mid d, f(\theta))\, p(\theta),$$

expressing our beliefs after comparing the data $d \in \mathbb{R}^d$ to the computationally expensive forward model $f(\theta)$.

- Well suited to limited data and complex models

Figure: Prior contours, posterior contours, and the MAP point $\theta_{MAP}$
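As a minimal sketch of the posterior relationship on this slide, the unnormalized log-posterior is the sum of a log-likelihood and a log-prior. The forward model, the Gaussian prior, and the noise level `sigma` below are hypothetical stand-ins chosen for illustration; they are not the ice-ocean model or the priors used in this work.

```python
def forward_model(theta):
    """Hypothetical cheap stand-in for the expensive forward model f(theta)."""
    return [theta[0] + theta[1], theta[0] * theta[1]]

def log_prior(theta, mean=0.0, width=1.0):
    """Independent Gaussian prior on each parameter (an assumption)."""
    return sum(-0.5 * ((t - mean) / width) ** 2 for t in theta)

def log_likelihood(theta, data, sigma=0.1):
    """i.i.d. Gaussian misfit between the data d and the prediction f(theta)."""
    prediction = forward_model(theta)
    return sum(-0.5 * ((d - p) / sigma) ** 2 for d, p in zip(data, prediction))

def log_posterior(theta, data):
    """log p(theta | d), up to an additive normalizing constant."""
    return log_likelihood(theta, data) + log_prior(theta)

# Synthetic, noise-free data generated at a known parameter value.
data = forward_model([1.0, 2.0])
print(log_posterior([1.0, 2.0], data))  # parameters that reproduce the data
print(log_posterior([0.0, 0.0], data))  # parameters that do not
```

Every call to `log_posterior` triggers one forward model run, which is why the cost of `forward_model` dominates when it is expensive.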

Markov chain Monte Carlo (MCMC)

Figure: Posterior contours, proposal contours, and MCMC samples

- Significant literature discusses proposals that "mix" quickly, i.e., that generate nearly independent samples
- Evaluates forward model $N$ times
- Run-time can be dominated by cost of $f$
- Standard MCMC ties the cost of characterizing $p(\theta \mid d)$ to the cost of evaluating $f(\theta)$
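To make the cost coupling concrete, here is a minimal random-walk Metropolis sketch that counts forward model calls. The one-dimensional forward model, the datum, the noise level, and the step size are all hypothetical choices for illustration only.

```python
import math
import random

def make_counted_model():
    """Wrap a toy forward model so that every call is counted."""
    calls = {"n": 0}

    def f(theta):
        calls["n"] += 1
        return theta ** 3 / 10 + theta  # hypothetical forward model

    return f, calls

def log_post(theta, f, datum=1.0, sigma=0.5):
    # Gaussian likelihood around f(theta) plus a standard normal prior.
    return -0.5 * ((datum - f(theta)) / sigma) ** 2 - 0.5 * theta ** 2

def metropolis(n_steps, step=0.8, seed=0):
    rng = random.Random(seed)
    f, calls = make_counted_model()
    x = 0.0
    lp_x = log_post(x, f)  # one model run for the initial state
    samples = []
    for _ in range(n_steps):
        q = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        lp_q = log_post(q, f)         # one model run per proposal
        if rng.random() < math.exp(min(0.0, lp_q - lp_x)):
            x, lp_x = q, lp_q
        samples.append(x)
    return samples, calls["n"]

samples, n_calls = metropolis(1000)
print(len(samples), n_calls)
```

The number of model runs grows in lockstep with the number of MCMC steps, regardless of how smooth or simple the forward model is.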

MCMC with Local Approximations

Given $X_0$, initialize $S_0$, then simulate chain $\{X_t\}_{t \le N}$ with kernel:

MH Kernel $K_t(x, \cdot)$
1. Given $X_t$, draw $q_t \sim Q(X_t, \cdot)$ from kernel $Q$ with (symmetric) translation invariant density $q(x, \cdot)$.
2. Compute acceptance ratio

$$\alpha = \min\left(1, \frac{L(q_t \mid d, \tilde{f}_t(q_t))\, p(q_t)}{L(X_t \mid d, \tilde{f}_t(X_t))\, p(X_t)}\right).$$

3. As needed, select new samples near $q_t$ or $X_t$, yielding $S_t \subseteq S_{t+1}$. Refine $\tilde{f}_t \to \tilde{f}_{t+1}$.
4. Draw $u \sim U(0, 1)$. If $u < \alpha$, let $X_{t+1} = q_t$, otherwise $X_{t+1} = X_t$.
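The four steps can be sketched in one dimension as follows. This is a simplified illustration, not the authors' implementation: the refinement rule (run the true model whenever fewer than `min_points` stored samples lie within `radius`) is an assumed stand-in for "as needed", the surrogate is a local linear least-squares fit rather than a quadratic, and `true_model` is a hypothetical cheap function.

```python
import math
import random

def true_model(theta):
    return math.sin(theta) + theta  # hypothetical "expensive" forward model

class LocalSurrogate:
    """Approximates true_model from nearby stored samples (the set S)."""

    def __init__(self, radius=0.3, min_points=3):
        self.points = []  # list of (theta, f(theta)) pairs
        self.radius = radius
        self.min_points = min_points

    def neighbours(self, theta):
        return [(t, y) for t, y in self.points if abs(t - theta) < self.radius]

    def __call__(self, theta):
        # Step 3 (assumed rule): refine when too few samples are nearby.
        if len(self.neighbours(theta)) < self.min_points:
            if all(t != theta for t, _ in self.points):
                self.points.append((theta, true_model(theta)))
        near = self.neighbours(theta)
        # Local linear least-squares fit through the nearby samples.
        n = len(near)
        mean_t = sum(t for t, _ in near) / n
        mean_y = sum(y for _, y in near) / n
        var = sum((t - mean_t) ** 2 for t, _ in near)
        cov = sum((t - mean_t) * (y - mean_y) for t, y in near)
        slope = cov / var if var > 0 else 0.0
        return mean_y + slope * (theta - mean_t)

def log_post(theta, model, datum=1.5, sigma=0.3):
    return -0.5 * ((datum - model(theta)) / sigma) ** 2 - 0.5 * theta ** 2

def run_chain(n_steps, step=0.5, seed=1):
    rng = random.Random(seed)
    surrogate = LocalSurrogate()
    x = 0.0
    samples = []
    for _ in range(n_steps):
        q = x + rng.gauss(0.0, step)       # step 1: symmetric proposal
        lp_q = log_post(q, surrogate)      # steps 2-3: approximate, refining
        lp_x = log_post(x, surrogate)      # the current point is re-evaluated
        alpha = math.exp(min(0.0, lp_q - lp_x))
        if rng.random() < alpha:           # step 4: accept or reject
            x = q
        samples.append(x)
    return samples, len(surrogate.points)

samples, model_runs = run_chain(2000)
print(len(samples), model_runs)
```

Because stored samples are reused, the count of true model runs stops growing once the region visited by the chain is well covered, while the number of MCMC steps keeps increasing.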

Local approximations

- To compute $\tilde{f}(\theta)$, construct a model over ball $B_R(\theta)$
- Use samples $\theta_i \in S$ at distance $r = \|\theta - \theta_i\| < R$
- Approximation converges locally under loose conditions [Cleveland]
- For example, quadratic approximations over $B_R(\theta)$ [Conn et al.]:

$$\|f - Q_R f\| \le \|f\|\, \kappa \lambda R^3$$
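The cubic dependence on the radius can be checked numerically in one dimension. The sketch below interpolates a quadratic through three samples inside the ball and compares the error at the center for radius R and R/2; the test function `math.exp` and the sample placement are hypothetical choices, and the constants in the bound are not computed.

```python
import math

def quadratic_through(points):
    """Quadratic interpolating three (theta_i, f_i) pairs, in Lagrange form."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def q(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

    return q

def local_error(f, theta, radius):
    """Error at theta of a quadratic built from samples within the ball."""
    offsets = (-0.9, 0.1, 0.8)  # assumed sample placement, all with |offset| < 1
    samples = [(theta + c * radius, f(theta + c * radius)) for c in offsets]
    return abs(f(theta) - quadratic_through(samples)(theta))

err_R = local_error(math.exp, 0.0, 0.4)
err_half = local_error(math.exp, 0.0, 0.2)
ratio = err_R / err_half
print(err_R, err_half, ratio)
```

Halving the radius reduces the error by a factor close to 2^3 = 8, which is the behavior an R^3 bound predicts for a smooth function.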

Local approximation illustration

Figure panels: Early times; Late times

Models are refined using new points chosen when model quality appears poor

Ergodicity and exactness of approximate samplers

Assume the log-posterior is approximated with local quadratic models and $\theta \in \mathcal{X} \subseteq \mathbb{R}^n$ for compact $\mathcal{X}$, or $p(\theta \mid d)$ obeys a Gaussian envelope:

$$\lim_{r \to \infty} \sup_{|\theta| = r} \left| \log p(\theta \mid d) - \log p_\infty(\theta) \right| = 0$$

for some quadratic form $\log p_\infty$ with negative definite coefficient matrix.

Then, under standard regularity assumptions for the geometrically ergodic kernel $K_\infty$ and posterior $p(\theta \mid d)$, the chain $X_t$ is ergodic and asymptotically samples from the exact posterior:

$$\lim_{t \to \infty} \left\| P(X_t) - p(\theta \mid d) \right\|_{TV} = 0$$

Example: Elliptic permeability inversion

Infer parameters of $k$ given observations of $u$ in the PDE:

$$\nabla_s \cdot \left( k(s, \theta)\, \nabla_s u(s, \theta) \right) = 0$$
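For intuition, a one-dimensional analogue of this forward model can be solved in closed form. The sketch below assumes the domain [0, 1], Dirichlet boundary values u(0) = 1 and u(1) = 0, and a piecewise-constant permeability k_j = exp(theta_j) on equal cells; none of these choices come from the slides.

```python
import math

def forward_model(theta):
    """Solve d/ds(k du/ds) = 0 on [0, 1] with u(0) = 1 and u(1) = 0.

    k is piecewise constant with k_j = exp(theta_j) on equal-width cells.
    Returns u at the cell boundaries, including both end points.
    """
    k = [math.exp(t) for t in theta]
    h = 1.0 / len(k)
    resistance = [h / kj for kj in k]
    total = sum(resistance)
    # The flux k du/ds is constant, so u drops across each cell in
    # proportion to that cell's share of the total resistance.
    u = [1.0]
    for r in resistance:
        u.append(u[-1] - r / total)
    return u

print(forward_model([0.0, 0.0, 0.0, 0.0]))  # uniform k gives a linear profile
print(forward_model([0.0, math.log(3.0)]))  # less permeable left cell
```

Observations of `u` at a few boundaries would then play the role of the data used to infer `theta`.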

Accuracy of chains

Figure: Relative covariance error versus MCMC step (step axis ticks at 10^4 and 10^5; error axis ticks from 10^-3 to 10^1) for the True model, Linear, Quadratic, and GP approximations

Cost of chains

Figure: Total number of evaluations versus MCMC step (step axis ticks at 10^4 and 10^5; evaluation axis ticks from 10^2 to 10^5) for the True model, Linear, Quadratic, and GP approximations

Prior and likelihood selection

- Priors are log-normal with expert-chosen mean and width
- Likelihoods are i.i.d. Gaussian with variance suggested by in situ experimental data

Parameter              Nominal value, µ′   Prior "width" σ′
Drag coefficients      1.5E-3              1.5E-3
Heat & Salt transfer   1.0E-4              0.5E-4
Prandtl Number         13.8                1.
Schmidt Number         2432.               200.
Horizontal Diffusion   5.0E-5              5.0E-5
ZetaN                  5.2E-2              0.5E-3

Temperature            –                   0.04
Salinity               –                   0.1
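A log-normal prior and an i.i.d. Gaussian likelihood can be written down as below. The slides do not say how the nominal value and "width" map to log-normal parameters, so this sketch assumes moment matching (the log-normal's mean and standard deviation equal the table entries). Treating 0.04 as a standard deviation for temperature is likewise an assumption, and the observed and predicted values are made up.

```python
import math

def lognormal_params(mean, width):
    """Parameters (m, s) of log(theta) ~ N(m, s^2) by moment matching (assumed)."""
    s2 = math.log(1.0 + (width / mean) ** 2)
    return math.log(mean) - 0.5 * s2, math.sqrt(s2)

def lognormal_logpdf(theta, m, s):
    if theta <= 0.0:
        return -math.inf  # the log-normal has no mass at non-positive values
    z = (math.log(theta) - m) / s
    return -0.5 * z * z - math.log(theta * s * math.sqrt(2.0 * math.pi))

def gaussian_loglik(observed, predicted, sigma):
    """Log-likelihood for independent Gaussian errors with common sigma."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((o - p) / sigma) ** 2 - norm
               for o, p in zip(observed, predicted))

m, s = lognormal_params(13.8, 1.0)  # Prandtl Number row of the table
print(lognormal_logpdf(13.8, m, s))
print(gaussian_loglik([0.10, 0.12], [0.11, 0.10], sigma=0.04))
```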

Computational details and results

- Compute synthetic data using the fine scale model, then try to infer the parameters using the coarse scale model
- Constructed 30 parallel chains with shared evaluations
- Chains run for approximately two weeks
- Results shown after burn-in is removed

Inference cost summary

                    Samples    Model runs   Savings
Drill and surface   225,000    53,000       ≥ 4.2x
Surface only        450,000    52,000       ≥ 8.6x
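The savings column is consistent with dividing the number of samples by the number of model runs, under the assumption that a standard MCMC chain would need at least one model run per sample. A short check of that arithmetic:

```python
# (samples, model runs) taken from the table above
runs = {
    "Drill and surface": (225_000, 53_000),
    "Surface only": (450_000, 52_000),
}

# Assumed definition: savings = samples / model runs
savings = {name: samples / model_runs
           for name, (samples, model_runs) in runs.items()}

for name, factor in savings.items():
    print(f"{name}: {factor:.2f}x")
```

The ratios are about 4.25 and 8.65, matching the reported lower bounds of 4.2x and 8.6x.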

Prior and posterior marginals

Figure: Prior and posterior marginals of the Drag, Transfer, Prandtl, Schmidt, Diff, and Zeta parameters, comparing Drill and Surface, Surface Only, and Prior

Contributions

- Introduce a novel framework for using local approximations within MCMC; prove that the framework produces asymptotically exact samples.
- Demonstrate strong numerical performance on canonical inference problems.
- Construct a realistic, synthetic inference problem for ice-ocean coupling near Pine Island Glacier.
- Apply local approximation methods to reduce computational cost of inference in the Pine Island Glacier setting.

This work is supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DE-SC0007099, part of the SciDAC Institute for the Quantification of Uncertainty in Extreme-Scale Computations (QUEST).