linear bayesian update surrogate for updating pce coefficients

Bayesian Update in low-rank tensor format

A. Litvinenko, B. V. Rosic, E. Zander, O. Pajonk, H. G. Matthies,

Institute for Scientific Computing, TU Braunschweig, Germany

July 13, 2011


1 Introduction

2 Direct General Bayesian Approach

3 Discretisation

4 Numerical Examples

5 Conclusion


Introduction

Inverse Problem: Find parameter q given measurement data z

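[Diagram: forward problem (u?): the parameter q and the forcing f enter the model A(q, u), giving the state u = S(q, f) and the measurement Y(q, u) = z. Inverse problem (q?): recover q from z.]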

Ill-posed problem: issues of existence, uniqueness and stability

Bayesian Regularization

- Additional information to data z: qf (apriori information, forecast)

What is qf ?

- classical Bayesian approach: qf := πf apriori pdf

πa(q|z) = const πf (q)π(z|q) = const πf (q)L(q)

- Markov Chain Monte Carlo methods (MCMC) [Gamerman 2006]
- spectral stochastic FEM + MCMC [Kucerova et al. 2010, Marzouk 2009]
- collocation methods [Christen & Fox 2005]

- Drawback: requires a complete statistical description of the problem


Direct General Bayesian Approach

- Probability space (Ω,B,P)

- the space of RVs with finite variance S := L2(Ω) (stochastic space)

- the Hilbert space Q (deterministic space)

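- Q-valued RVs form the space $\mathscr{Q} := Q \otimes S$

True measurement

- Linear measurement $y = Y(q, u) \in \mathcal{Y}$ is polluted by noise $\varepsilon$:

$z = y + \varepsilon, \quad \varepsilon \sim N(0, C_\varepsilon) \;\Rightarrow\; z \in \mathscr{Y}_0 \subseteq \mathscr{Y} := \mathcal{Y} \otimes S$

A priori information

$q_f : \Omega \to Q, \quad q_f \in \mathscr{Q}_f \subset \mathscr{Q}$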


Direct General Bayesian Approach

- Already defined: $z \in \mathscr{Y}_0$, $q_f \in \mathscr{Q}_f$
- Given a linear mapping $H : \mathscr{Q} \to \mathscr{Y}$, predict the observation

$y = H q_f, \quad y \in \mathscr{Q}_0 = H^*(\mathscr{Y}_0)$

Theorem

In the setting just described, the random variable $q_a \in \mathscr{Q}$ (“a” stands for “assimilated” or “analysis”) is the orthogonal (minimum-variance) projection of q onto the subspace $\mathscr{Q}_f + \mathscr{Q}_0$:

$q_a(\omega) = q_f(\omega) + K\,(z(\omega) - y(\omega)), \quad K := C_{q_f y}\,(C_y + C_\varepsilon)^{-1}$

with $q_f$ being the orthogonal projection onto $\mathscr{Q}_f$ and K the “Kalman gain” operator [Luenberger 1969, Rosic et al. 2011, Pajonk et al. 2011].

- Does not assume Gaussian statistics; in the linear case it reduces to the Kalman filter [Evensen 2009]
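The update formula of the theorem can be tried out on samples. The following is a minimal sketch, not the authors' implementation: it estimates the covariances from an ensemble, and the dimensions, the matrix `H`, the uniform (non-Gaussian) prior and the noise level are all hypothetical values chosen for illustration.

```python
import numpy as np

def kalman_gain(C_qy, C_y, C_eps):
    """K = C_qy (C_y + C_eps)^{-1}, computed with a linear solve."""
    C_d = C_y + C_eps
    return np.linalg.solve(C_d.T, C_qy.T).T

def linear_bayes_update(qf, y, z, C_eps):
    """Update prior samples qf (n, N) with forecasts y (n, L) and data z (n, L)."""
    n = qf.shape[0]
    qc = qf - qf.mean(axis=0)
    yc = y - y.mean(axis=0)
    C_qy = qc.T @ yc / (n - 1)      # sample estimate of Cov(q_f, y)
    C_y = yc.T @ yc / (n - 1)       # sample estimate of Cov(y, y)
    K = kalman_gain(C_qy, C_y, C_eps)
    return qf + (z - y) @ K.T       # q_a = q_f + K (z - y), applied per sample

# Hypothetical test problem (all numbers are assumptions).
rng = np.random.default_rng(0)
n, N, L = 20000, 3, 2
H = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])
q_true = np.array([0.5, 1.5, 1.0])
qf = 1.0 + rng.uniform(-1.0, 1.0, size=(n, N))   # non-Gaussian prior samples
C_eps = 0.01 * np.eye(L)
z = H @ q_true + rng.multivariate_normal(np.zeros(L), C_eps, size=n)
y = qf @ H.T                                      # forecast observation y = H q_f
qa = linear_bayes_update(qf, y, z, C_eps)

err_prior = np.linalg.norm(H @ qf.mean(axis=0) - H @ q_true)
err_post = np.linalg.norm(H @ qa.mean(axis=0) - H @ q_true)
```

In this sketch the posterior mean reproduces the observed quantities much better than the prior mean, and the ensemble spread shrinks, although only the two observed combinations of q are constrained.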


Discretisation

“Projection of Projection ”

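- the orthogonal projector $P : \mathscr{Q} \to \hat{\mathscr{Q}}$, $P^* = P$

$\hat{\mathscr{Q}} := Q_N \otimes S_J$

- project onto $\hat{\mathscr{Q}}$:

$\hat{q}_a(\omega) = P q_a(\omega) = P\big(q_f(\omega) + K\,(z(\omega) - y(\omega))\big)$

$= P q_f(\omega) + P K\,(z(\omega) - y(\omega))$

$= \hat{q}_f(\omega) + \hat{K}\,(\hat{z}(\omega) - \hat{y}(\omega)),$

where $\hat{y}(\omega) = H P q_f(\omega) = H \hat{q}_f(\omega)$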


Example

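- Darcy law:

$-\operatorname{div}(\kappa(x,\omega)\nabla u(x,\omega)) = f(x,\omega)$ in $G$,

$u(x,\omega) = 0$ on $\partial G$.

- For simplicity, the conductivity is assumed to be a scalar field with an a priori distribution (chosen via the maximum entropy principle):

$\kappa_f(x) := \exp(q_f(x)), \quad q_f(x) \sim N(\mu_{q_f}, \sigma^2_{q_f})$

- Covariance function:

$\mathrm{Cov}_{q_f}(x, y) = \sigma^2_{q_f} \exp(-|x - y| / l_c)$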

- following conditions hold:

κf (x , ω) > 0, ‖κf‖L∞(G×Ω) <∞, ‖1/κf‖L∞(G×Ω) <∞.
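To make the prior model concrete, the following sketch samples a lognormal field with the exponential covariance above. It is an illustration under assumptions: the grid is one-dimensional, a Cholesky factor is used for sampling (the slides use a KLE later), and the values of the mean, standard deviation and correlation length are hypothetical.

```python
import numpy as np

def exp_covariance(x, sigma, l_c):
    """Covariance matrix sigma^2 * exp(-|x_i - x_j| / l_c) on the points x."""
    d = np.abs(x[:, None] - x[None, :])
    return sigma**2 * np.exp(-d / l_c)

def sample_lognormal_field(x, mu, sigma, l_c, n_samples, rng):
    """Draw kappa = exp(q), with q Gaussian with mean mu and exponential covariance."""
    C = exp_covariance(x, sigma, l_c)
    # Small diagonal shift guards the Cholesky factorisation against round-off.
    Lc = np.linalg.cholesky(C + 1e-12 * np.eye(len(x)))
    q = mu + rng.standard_normal((n_samples, len(x))) @ Lc.T
    return np.exp(q)

# Hypothetical parameters (assumptions, not taken from the slides).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
sigma_q, l_c, mu_q = 0.4, 0.5, 0.8
C = exp_covariance(x, sigma_q, l_c)
kappa = sample_lognormal_field(x, mu_q, sigma_q, l_c, n_samples=1000, rng=rng)
```

Because every sample is an exponential of a finite Gaussian value, the positivity condition on the conductivity holds by construction.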


Variational Formulation

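- The solution space:

$\mathscr{U} := U \otimes S, \quad U := H^1_0(G) = \{u \in H^1(G) \mid u = 0 \text{ on } \partial G\}$

- Equilibrium equation:

$\mathbf{a}(v, u) := E\big(a(\omega)(v(\cdot,\omega), u(\cdot,\omega))\big) = E\big(\langle \ell(\omega), v(\cdot,\omega)\rangle\big) =: \langle\langle \ell, v \rangle\rangle \quad \forall v \in \mathscr{U},$

$a(\omega)(v, u) := \int_G \nabla v(x) \cdot (\kappa_f(x,\omega)\nabla u(x))\,dx,$

$\langle \ell(\omega), v \rangle := \int_G v(x) f(x,\omega)\,dx.$

- Well-posedness follows from the Lax-Milgram theorem.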


Discretisation

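- Finite element discretisation: $u(x,\omega) = \sum_{n=1}^{N} u_n(\omega)\phi_n(x)$

$A(\omega)[u(\omega)] = f(\omega),$

where $(A(\omega))_{m,n} := a(\omega)(\phi_m, \phi_n)$ with the bilinear form $a(\omega)$, $(f(\omega))_m := \langle \ell(\omega), \phi_m \rangle$, and $u(\omega) = [u_1(\omega), \ldots, u_N(\omega)]^T$.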


PCE and KLE

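- Wiener's polynomial chaos expansion: $u_n(\theta) = \sum_{\alpha \in J} u_n^{\alpha} H_\alpha(\theta(\omega))$,

$\alpha = (\alpha_1, \ldots, \alpha_j, \ldots) \in \mathbb{N}_0^{(\mathbb{N})}, \quad (1)$

$\forall \beta \in J: \; E\big([f(\theta) - A(\theta)u(\theta)]\,H_\beta(\theta)\big) = 0, \quad (2)$

with $f^\beta := E(f(\theta)H_\beta(\theta))$ and $A^{\beta,\alpha} := E(H_\beta(\theta)A(\theta)H_\alpha(\theta))$,

$\forall \beta \in J: \; \sum_{\alpha \in J} A^{\beta,\alpha} u^\alpha = f^\beta, \quad (3)$

which is a linear, symmetric and positive definite system of equations of size $N \times R$.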
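The Galerkin projection relies on the orthogonality of the Hermite basis, $E(H_\gamma H_\beta) = \gamma!\,\delta_{\gamma\beta}$. A short numerical check for a single Gaussian variable is sketched below; it assumes the probabilists' Hermite polynomials from NumPy and a Gauss-Hermite rule, and the polynomial degree and number of nodes are arbitrary choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_gram(p, n_quad=20):
    """Gram matrix E(He_j He_k), j, k = 0..p, for a standard Gaussian variable."""
    x, w = hermegauss(n_quad)            # nodes and weights for exp(-x^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)         # normalise to a probability measure
    # Row k holds He_k evaluated at the quadrature nodes.
    H = np.array([hermeval(x, np.eye(p + 1)[k]) for k in range(p + 1)])
    return (H * w) @ H.T

gram = hermite_gram(4)
```

The result is, up to round-off, the diagonal matrix with entries 0!, 1!, 2!, 3!, 4!, which is the matrix written as Δ0 on the later slides (for one variable).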


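- The Karhunen-Loève expansion (KLE) of the stiffness matrix and the right-hand side:

$\mathbf{A}\mathbf{u} := \Big(\sum_{j=0}^{\infty} A_j \otimes \Delta_j\Big)\Big(\sum_{\alpha \in J} u_\alpha \otimes e_\alpha\Big) = \Big(\sum_{\alpha \in J} f_\alpha \otimes e_\alpha\Big) =: \mathbf{f},$

where $(\Delta_j)_{\alpha,\beta} = E(H_\alpha \xi_j H_\beta)$, $\kappa_f = \sum_{j=1}^{M} \kappa_f^{j}\,\xi_j$ and $|J| = R$.

- Sparse tensor Galerkin methods [Zander, Matthies 2010]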


Simulation of Measurements

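- Measure some functional of the solution u in finitely many (L) patches:

$\hat{G} := \{x_1, \ldots, x_L\} \subset G, \quad L := |\hat{G}|.$

- The average hydraulic head:

$y(u,\omega) := [\ldots, y(x_j), \ldots] \in \mathbb{R}^L, \quad y(x_j) = \int_{G_j} u(x,\omega)\,dx,$

$y = [y(x_1,\omega), \ldots, y(x_L,\omega)]^T$

- Observation:

$z := y + \varepsilon, \quad \varepsilon \sim N(0, C_\varepsilon)$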


Inverse Problem

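- The positive fields $\kappa_f$ form a cone in the vector space of RVs, not a subspace, so the linear update is applied to $q_f = \log \kappa_f$
- Project: $\kappa_f = \sum_{\alpha \in J} \kappa_f^{(\alpha)} H_\alpha(\theta(\omega))$ (similarly for z and y)

$q_f(x,\omega) = \log \kappa_f = \sum_{\alpha \in J} q_f^{(\alpha)}(x) H_\alpha(\theta(\omega)) = Q_f H, \quad Q_f \in \mathbb{R}^{N \times R}, \; H \in \mathbb{R}^{R}$

Let $Q_a = [\ldots, q_a^\beta, \ldots]$, $Z = [\ldots, z^\beta, \ldots]$ and $Y = [\ldots, y^\beta, \ldots]$.

- Matrix form of the update formula:

$Q_a = Q_f + K(Z - Y), \quad K \in \mathbb{R}^{N \times L}; \; Z, Y \in \mathbb{R}^{L \times R}$

- Map back: $\kappa_a = \exp(q_a(x,\omega))$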


Bayesian update procedure

Input: a priori information $q_f(\omega)$ and measurements z.

1 approximate $q_f(\omega)$ and the input $z(\omega)$ by PCE

2 set $Q_f = [\ldots, q_f^\beta, \ldots]$, $Z = [\ldots, z^\beta, \ldots]$

3 solve $u(\omega) = S(q_f(\omega); f(\omega))$

4 forecast the measurement: $y(\omega) = Y(q_f(\omega); u(\omega)) = Y(q_f(\omega); S(q_f(\omega); f(\omega)))$

5 PCE representation of $y(\omega)$: $Y = [\ldots, y^\beta, \ldots]$

6 compute the covariance $C_d = C_y + C_\varepsilon = Y \Delta_0 Y^T + C_\varepsilon$

7 compute $G = C_d^{-1}(Z - Y)$

8 compute the covariance $C_{q_f y} = Q_f \Delta_0 Y^T$

9 compute the update $Q_a = Q_f + C_{q_f y}\, G$

Output: assimilated data $Q_a = [\ldots, q_a^\beta, \ldots]$.
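Steps 6 to 9 of the procedure act only on PCE coefficient matrices, so they fit in a few lines. The sketch below makes several assumptions that are not in the slides: a single Gaussian germ with Hermite norms β!, a linear observation map `H` standing in for the forward solve of steps 3 to 5, the mean term (β = 0) excluded from the covariances by zeroing its entry in Δ0, and a measurement `Z` that carries only its mean term while the noise enters through `C_eps`.

```python
import numpy as np
from math import factorial

def pce_update(Qf, Y, Z, C_eps, norms):
    """Linear Bayesian update of PCE coefficients; column 0 holds the mean."""
    D0 = np.diag(np.asarray(norms, dtype=float))
    D0[0, 0] = 0.0                        # assumption: mean term excluded from covariances
    C_d = Y @ D0 @ Y.T + C_eps            # step 6: C_d = C_y + C_eps
    G = np.linalg.solve(C_d, Z - Y)       # step 7: G = C_d^{-1} (Z - Y)
    C_qy = Qf @ D0 @ Y.T                  # step 8: C_{q_f y}
    return Qf + C_qy @ G                  # step 9: Q_a = Q_f + C_{q_f y} G

# Hypothetical sizes and data (assumptions for illustration).
rng = np.random.default_rng(1)
N, L, p = 4, 2, 3
R = p + 1
norms = [factorial(k) for k in range(R)]  # E(H_beta^2) = beta! for one Gaussian germ
Qf = rng.standard_normal((N, R))          # prior PCE coefficients
H = rng.standard_normal((L, N))           # stand-in linear observation operator
Y = H @ Qf                                # PCE of the forecast measurement
C_eps = 0.05 * np.eye(L)
z_obs = rng.standard_normal(L)
Z = np.zeros((L, R))
Z[:, 0] = z_obs                           # measurement enters through its mean term
Qa = pce_update(Qf, Y, Z, C_eps, norms)
```

With these assumptions the updated mean column coincides with the classical Kalman mean update, which gives a simple consistency check.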


Kalman Filter

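- The variance, with $\tilde{q}_a := q_a - E(q_a)$:

$C_{q_a} = E(\tilde{q}_a(\cdot) \otimes \tilde{q}_a(\cdot)) = \sum_{\gamma,\beta>0} q_a^\gamma \otimes q_a^\beta\, E(H_\gamma H_\beta) = \sum_{\gamma>0} q_a^\gamma \otimes q_a^\gamma\, \gamma!,$

$C_{q_a} = \tilde{Q}_a \Delta_0 \tilde{Q}_a^T$, where $\tilde{Q}_a$ is $Q_a$ without the mean term ($\gamma = 0$)

- Kalman formula:

$C_{q_a} = C_{q_f} + C_{q_f y}(C_y + C_\varepsilon)^{-1} C_{q_f y}^T - 2\,C_{q_f y}(C_y + C_\varepsilon)^{-1} C_{q_f y}^T$

$= C_{q_f} - C_{q_f y}(C_y + C_\varepsilon)^{-1} C_{q_f y}^T$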
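The two forms of the Kalman covariance formula can be compared numerically. The sketch below is an illustration with randomly generated matrices (all sizes and values are assumptions); it uses a linear observation map `H` so that the cross-covariance and the forecast covariance follow from the prior covariance.

```python
import numpy as np

def posterior_covariance(C_qf, C_qy, C_y, C_eps):
    """Return the expanded and the simplified form of the Kalman covariance."""
    C_d = C_y + C_eps
    K = np.linalg.solve(C_d, C_qy.T).T            # K = C_qy C_d^{-1} (C_d is symmetric)
    long_form = C_qf + K @ C_d @ K.T - 2.0 * K @ C_qy.T
    short_form = C_qf - K @ C_qy.T
    return long_form, short_form

# Hypothetical covariances built from a random linear observation map.
rng = np.random.default_rng(2)
N, L = 5, 3
M = rng.standard_normal((N, N))
C_qf = M @ M.T + np.eye(N)                        # symmetric positive definite prior
H = rng.standard_normal((L, N))
C_qy = C_qf @ H.T
C_y = H @ C_qf @ H.T
C_eps = 0.1 * np.eye(L)
long_form, short_form = posterior_covariance(C_qf, C_qy, C_y, C_eps)
```

Both forms agree, the result stays positive definite, and its trace is smaller than that of the prior covariance, which is the expected variance reduction.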


Low rank data format

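Aim: to compute the following update in low-rank tensor format:

$q_a(\omega) = q_f(\omega) + K\,(z(\omega) - y(\omega)), \quad (4)$

with

$K = C_{q_f y}(C_y + C_\varepsilon)^{-1}, \quad (5)$

where $C_{q_f y} = \mathrm{Cov}(q_f, y) = E\big((q_f - E(q_f))(y - E(y))^T\big)$, $C_y = \mathrm{Cov}(y, y)$ and $C_\varepsilon = \mathrm{Cov}(\varepsilon, \varepsilon)$. These covariance matrices can be approximated in H-matrix or low-rank tensor formats [Litvinenko et al. 2008].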


Compression of PCE coefficients

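Let the random field (RF) $q(x,\theta)$, $\theta = (\theta_1, \ldots, \theta_M, \ldots)$, be approximated by

$q(x,\theta) = \sum_{\beta \in J} H_\beta(\theta)\, q_\beta(x), \quad (6)$

$q_\beta(x) = \frac{1}{\beta!} \int_\Theta H_\beta(\theta)\, q(x,\theta)\, P(d\theta) \approx \frac{1}{\beta!} \sum_{i=1}^{n_q} H_\beta(\theta_i)\, q(x,\theta_i)\, w_i, \quad (7)$

where $n_q$ is the number of quadrature points. Written as a product of the vector of realisations and a weight vector (keeping the factor $1/\beta!$ from Eq. 7):

$q_\beta(x) \approx \frac{1}{\beta!}\,[q(x,\theta_1), \ldots, q(x,\theta_{n_q})] \cdot [H_\beta(\theta_1) w_1, \ldots, H_\beta(\theta_{n_q}) w_{n_q}]^T \quad (8)$

Denote

$c_\beta := \frac{1}{\beta!}\,[H_\beta(\theta_1) w_1, \ldots, H_\beta(\theta_{n_q}) w_{n_q}] \in \mathbb{R}^{n_q} \quad (9)$

and approximate the set of realisations in low-rank format:

$[q(x,\theta_1), \ldots, q(x,\theta_{n_q})] \approx A B^T.$

The matrix of all PCE coefficients is then

$\mathbb{R}^{N \times |J|} \ni [\ldots q_\beta(x) \ldots] \approx A B^T [\ldots c_\beta^T \ldots], \quad \beta \in J. \quad (10)$

A further compression using $H_\beta(\theta) = \prod_{j=1}^{M} h_{\beta_j}(\theta_j)$, where $h_{\beta_j}(\theta_j)$ are 1D Hermite polynomials, is possible.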
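The compression in Eq. 10 can be sketched as follows. This is an illustration under assumptions: a single Gaussian variable θ, a Gauss-Hermite rule for the quadrature, a truncated SVD as the low-rank format, and a made-up random field exp(0.1 θ sin x); the factor 1/β! is included in the weight vectors.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

def quadrature_weight_matrix(p, n_q):
    """Nodes theta_i and the matrix whose column beta is H_beta(theta_i) w_i / beta!."""
    theta, w = hermegauss(n_q)
    w = w / np.sqrt(2.0 * np.pi)                  # weights of the standard Gaussian measure
    C = np.column_stack([
        hermeval(theta, np.eye(p + 1)[b]) * w / factorial(b)
        for b in range(p + 1)
    ])
    return theta, C

def low_rank_factors(S, tol):
    """Truncated SVD S ~ A B^T, keeping singular values above tol * s_max."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return U[:, :k] * s[:k], Vt[:k].T

# Hypothetical random field sampled at the quadrature nodes (an assumption).
x = np.linspace(0.0, np.pi, 200)
p, n_q = 3, 20
theta, C = quadrature_weight_matrix(p, n_q)
S = np.exp(0.1 * np.outer(np.sin(x), theta))      # realisations, shape (N, n_q)
A, B = low_rank_factors(S, tol=1e-8)

coeffs_full = S @ C                               # PCE coefficients from the dense matrix
coeffs_lr = A @ (B.T @ C)                         # same coefficients via the low-rank factors
rel_err = np.linalg.norm(coeffs_full - coeffs_lr) / np.linalg.norm(coeffs_full)
```

The product is evaluated right to left, so the dense matrix of realisations is never formed from the factors; only the small matrix `B.T @ C` and the tall factor `A` are multiplied.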


Response surface in low-rank format

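Putting everything together gives a low-rank representation of the response surface (RS):

$q(x,\theta) = \sum_{\beta \in J} H_\beta(\theta)\, q_\beta(x) = H q^T(x), \quad (11)$

where $H = (\ldots, H_\beta(\theta), \ldots)$ and $q(x) = (\ldots, q_\beta(x), \ldots)$. Using Eq. 10, we obtain

$q(x,\theta) = H q(x)^T = H A B^T [\ldots c_\beta^T \ldots], \quad (12)$

where the vector $c_\beta$ is defined in Eq. 9. The matrices $A$, $B^T$ and $[\ldots c_\beta^T \ldots]$ are given. Fixing the random parameter $\theta = \theta^*$, compute the vector $H$ and then a realisation $q(x,\theta^*)$ of the RF.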


Application of response surface

Now, having the RS

$q(x,\theta) = H A B^T [\ldots c_\beta^T \ldots], \quad (13)$

we generate a realisation of the RV θ, compute the vector H, multiply it by A, multiply the resulting vector by $B^T$ and then by the matrix $[\ldots c_\beta^T \ldots]$. We repeat this, e.g., $10^6$ times and then use the obtained sample to compute (at each point x):

- error bars (command errorbar in Matlab),
- quantiles (command quantile in Matlab),
- the cumulative distribution function (command ksdensity in Matlab).
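A NumPy counterpart of this sampling loop is sketched below. It is an illustration, not the authors' code: the factor `W` stands for the small product of $B^T$ and the weight matrix, the products are arranged so that the dimensions match for a batch of samples, and the coefficients describe the made-up field q(x, θ) = 1 + x θ so that the result can be checked by hand.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def hermite_vectors(thetas, p):
    """Row s holds (He_0(theta_s), ..., He_p(theta_s))."""
    return np.column_stack([hermeval(thetas, np.eye(p + 1)[b]) for b in range(p + 1)])

def sample_response_surface(A, W, thetas):
    """Evaluate q(x, theta) = sum_beta (A W)[x, beta] He_beta(theta) for all samples."""
    Hm = hermite_vectors(thetas, W.shape[1] - 1)   # shape (n_samples, R)
    return (Hm @ W.T) @ A.T                        # shape (n_samples, N)

# Hypothetical factors encoding q(x, theta) = 1 + x * theta (an assumption).
x = np.linspace(-1.0, 1.0, 5)
A = np.column_stack([np.ones_like(x), x])          # (N, 2)
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])                    # (2, R) with R = 3

rng = np.random.default_rng(0)
thetas = rng.standard_normal(100_000)
samples = sample_response_surface(A, W, thetas)
mean = samples.mean(axis=0)
std = samples.std(axis=0)
q05, q95 = np.quantile(samples, [0.05, 0.95], axis=0)
```

For this test field the sample mean is close to 1 and the sample standard deviation close to |x| at every point; `np.quantile` plays the role of the Matlab `quantile` command mentioned above.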


Relative errors and memory of rank-k approx.

rank k   press.   density  tke      ev       xv       memory, MB
10       1.9e-2   1.9e-2   4.0e-3   1.4e-3   1.1e-2   21
20       1.4e-2   1.3e-2   5.9e-3   4.1e-4   9.7e-3   42
50       5.3e-3   5.1e-3   1.5e-4   7.7e-5   3.4e-3   104

Table: Matrices $\in \mathbb{R}^{260000 \times 600}$. The dense matrix format costs 1.25 GB.


Numerical examples of tensor approximations

The Gaussian kernel $\exp(-h^2)$ has Kronecker rank 1.

The exponential kernel $\exp(-h)$ can be approximated by a tensor with low Kronecker rank r.

Approximation of $C \in \mathbb{R}^{N \times N}$, $N = 41^2 = 1681$, in the KT format:

r                    1      2      3      4      5      6      10
‖C−Cr‖∞ / ‖C‖∞       11.5   1.7    0.4    0.14   0.035  0.007  2.8e-8
‖C−Cr‖2 / ‖C‖2       6.7    0.52   0.1    0.03   0.008  0.001  5.3e-9
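The difference between the two kernels can be reproduced on a small grid. The sketch below assumes a regular n × n grid on the unit square with n = 10 (not the 41 × 41 grid of the table) and measures the error in the Frobenius norm, using the rearrangement of Van Loan and Pitsianis, so the numbers are not comparable to the table entries.

```python
import numpy as np

def kernel_matrix(n, kernel):
    """Covariance matrix kernel(|p - q|) for the points of an n x n grid on [0, 1]^2."""
    g = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(g, g, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel()])   # point (i, j) has index i * n + j
    diff = pts[:, None, :] - pts[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    return kernel(h)

def kronecker_singular_values(C, n):
    """Singular values of the rearranged matrix; C = C1 (x) C2 gives exactly one nonzero."""
    R = C.reshape(n, n, n, n).transpose(0, 2, 1, 3).reshape(n * n, n * n)
    return np.linalg.svd(R, compute_uv=False)

def rel_error(s, r):
    """Relative Frobenius error of the best Kronecker rank-r approximation."""
    return np.sqrt(np.sum(s[r:] ** 2) / np.sum(s ** 2))

n = 10
s_gauss = kronecker_singular_values(kernel_matrix(n, lambda h: np.exp(-h ** 2)), n)
s_exp = kronecker_singular_values(kernel_matrix(n, lambda h: np.exp(-h)), n)
errors_exp = [rel_error(s_exp, r) for r in (1, 2, 3, 4)]
```

For the Gaussian kernel only one singular value is numerically nonzero, in line with Kronecker rank 1, while for the exponential kernel the error decreases as the rank r grows.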


Sequential Updating

[Diagram: sequential updating. Labels, in order: start, κf1, 1. update (f1), κa1, κf2, 2. update (f2).]


Measurement points

[Four plots of the measurement positions on the domain [−1, 1] × [−1, 1]:]

a) 447 measurement patches
b) 239 measurement patches
c) 120 measurement patches
d) 10 measurement patches

Figure: Position of measurement points (FEM nodes) used in the experiments


Given Data

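- Right-hand side: $f = f_0 \sin\big(\tfrac{2\pi}{\lambda}\, x^T d + \varphi\big)$,

$d = [\cos\alpha, \sin\alpha], \quad \alpha \in [-\pi/2, \pi/2], \quad \varphi \in [0, 2\pi]$

- The “virtual truth” is taken as

a) $\kappa = 2$
b) $\kappa = 2 + 0.3\,(x + y)$
c) $\kappa = 2.2 - 0.1\,(x^2 + y^2)$

- A priori information:

$E(\kappa) = 2.4, \quad \sigma_\kappa = 0.4$

- Order of the PCE: $p = 3$; number of KLE modes: $M \le 50$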


Relative Error

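Experiment   L     εp     1st    2nd    3rd    4th
1.           447   0.45   0.08   0.04   0.03   0.03
2.           239   0.45   0.08   0.05   0.05   0.04
3.           120   0.45   0.07   0.05   0.05   0.04
4.           60    0.45   0.07   0.06   0.05   0.05
5.           10    0.45   0.13   0.08   0.07   0.07

Table: “Constant truth”: decay of the relative error εa in each experiment (εp: error of the prior; 1st to 4th: error after each sequential update)

$\varepsilon_a := \dfrac{\|\kappa_a - \kappa_t\|_{L_2(\Omega \otimes G)}}{\|\kappa_t\|_{L_2(\Omega \otimes G)}}; \qquad \varepsilon_a := \dfrac{|E(\kappa_a) - E(\kappa_t)|}{|E(\kappa_t)|}$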


Relative Error

[Plot: relative error εa (logarithmic axis, 10^-2 to 10^0) versus the number of sequential updates (0 to 4); curves for 447, 239, 120, 60 and 10 measurement points.]

Figure: “Linear truth”, experiment 1 (L=447): convergence behaviour of the relative error εa with respect to the number of sequential updates and measurement points


Relative Error

[Three surface plots over the domain [−1, 1] × [−1, 1]: a) εa [%], b) εa [%], c) I [%].]

Figure: “Constant truth”, experiment 1 (L=447) after the 4th update: a) relative error εa (the mean of the posterior compared to the mean of the truth), b) relative error εa (the posterior compared to the truth), c) improvement I (the posterior compared to the prior)


PDF

[Plot: PDF (vertical axis, 0 to 6) versus κ (horizontal axis, 0.5 to 5); curves κf and κa.]

Figure: “Constant truth”, experiment 3 (L=120): posterior probability density function of κa compared to the prior κf for a single point in the domain


Update

Figure: “Linear truth”, experiment 1 (L=447) after the 1st update: a) mean of the prior, κf, b) truth, κ, c) mean of the posterior, κa


Update

[Three surface plots over the domain [−1, 1] × [−1, 1]: a) κf, b) true κ, c) κa.]

Figure: “Quadratic truth”, experiment 1 (L=447) after the 4th update: a) mean of the prior, κf, b) truth, κ, c) mean of the posterior, κa


Example: The Lorenz-84 Model

Described by the system:

$\dfrac{dx}{dt} = -ax - y^2 - z^2 + aF_1,$

$\dfrac{dy}{dt} = -y + xy - bxz + F_2, \quad (14)$

$\dfrac{dz}{dt} = -z + xz + bxy,$

where $F_1$ and $F_2$ represent known thermal forcings, and a and b are fixed constants.


The Lorenz-84 model shows chaotic behaviour and is very sensitive to the initial conditions. For this reason we model these as independent Gaussian RVs:

$x_0(\omega) \sim N(x_0, \sigma_1),$

$y_0(\omega) \sim N(y_0, \sigma_2), \quad (15)$

$z_0(\omega) \sim N(z_0, \sigma_3).$

Due to the appearance of RVs, the deterministic model turns into a system of SDEs.


Figure: Bi-modal identification experiment after 1 update. Shown are the results for different amounts of measurements used to determine the PCE coefficients. First, we use 10 measurements (a), then 100 (b) and finally 1000 (c). The plot contains the truth, the prior and the posterior, as well as the last used measurement as an example.


Figure: Bi-modal identification experiment after 10 updates


Figure: Bi-modal identification experiment after 100 updates


Conclusion

- The ill-posed problem is regularized by the introduction of a priori information.
- The update of the prior is a projection of the minimum variance estimator from linear Bayesian updating onto the polynomial chaos basis.
- For the mean and variance, the estimation is of the Kalman type.
- The estimation is purely deterministic, without need for any kind of sampling procedure.
- The presented linear Bayesian update does not need any linearity in the forward model, and it can readily update non-Gaussian uncertainties.


Any Questions?

Thank you for your attention! Any Questions?

LiBerty

LInear BayEsian diRecT polYnomial chaos update


References

1 Gamerman, D. and Lopes, H. F., Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Chapman and Hall, 2006

2 Kucerova, A. and Matthies, H. G., Uncertainty Updating in the Description of Heterogeneous Materials, Technische Mechanik, Vol. 30, pp. 211–225, 2010

3 Marzouk, Y. M. and Najm, H. N., Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems, J. Comput. Phys., Vol. 228, 2009

4 Christen, J. A. and Fox, C., MCMC using an approximation, J. Comput. Graph. Stat., Vol. 14, pp. 795–810, 2005

5 Luenberger, D. G., Optimization by Vector Space Methods, John Wiley and Sons, Inc., New York, 1969

6 Rosic, B., Litvinenko, A., Pajonk, O. and Matthies, H. G., Direct Bayesian update of polynomial chaos representations, J. Comput. Phys., 2011, submitted

7 Pajonk, O., Rosic, B. V., Litvinenko, A. and Matthies, H. G., A Deterministic Filter for non-Gaussian Bayesian Estimation, Physica D: Nonlinear Phenomena, 2011, submitted
