Uploaded by annice-henderson, 18-Jan-2018.

TRANSCRIPT

Page 1: Lecture II-3: Interpolation and Variational Methods

Lecture Outline:

• The Interpolation Problem, Estimation Options
• Regression Methods
  – Linear
  – Nonlinear
• Input-oriented Bayesian Methods
  – Linear
  – Nonlinear
• Variational Solutions
• SGP97 Case Study

Page 2: A Typical Interpolation Problem -- Groundwater Flow

The problem is to characterize unknown heads at nodes on a discrete grid. Estimates rely on scattered head measurements and must be compatible with the groundwater flow equation.

How can we characterize unknown states (heads) and inputs (recharge) at all nodes?

y = vector of hydraulic heads at n grid nodes
u = vector of recharge values at n grid nodes (uncertain)
T = scalar transmissivity (assumed known)
M = matrix of coefficients used to interpolate nodal heads to measurement locations
z = vector of measurements at n locations
ε = vector of n measurement errors (uncertain)

[Figure: plan view of the discrete grid, with grid nodes and well observation locations marked.]

State Eq. (GW flow eq.):
  Continuous:   T ∇²y(x) + u(x) = 0, with y(x) specified on the boundaries
  Discretized:  A(T) y = B u

Output Eq.:  w = M y
Meas. Eq.:   z = M y + ε = w + ε
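As a concrete (hypothetical) illustration of the discretized state and output equations, the sketch below builds a 1-D analogue of A(T) y = B u with a standard second-difference operator and samples the resulting heads at a few well nodes. The grid size, transmissivity, and well locations are arbitrary choices, not values from the lecture.

```python
import numpy as np

# Hypothetical 1-D analogue of the slide's model: T * d2y/dx2 + u = 0 with
# y = 0 on the boundaries, discretized on n interior nodes with spacing h.
# Grid size, transmissivity, and well locations are made-up values.
n, h, T = 20, 1.0, 2.0

# A(T): second-difference operator scaled by T / h^2
A = (T / h**2) * (np.diag(-2.0 * np.ones(n))
                  + np.diag(np.ones(n - 1), 1)
                  + np.diag(np.ones(n - 1), -1))
B = -np.eye(n)                  # so that A(T) y = B u  <=>  T y'' + u = 0

u = np.ones(n)                  # uniform recharge input
y = np.linalg.solve(A, B @ u)   # nodal heads (state equation solve)

# M: interpolation matrix that picks heads at a few "well" nodes, w = M y
wells = [3, 9, 15]
M = np.zeros((len(wells), n))
M[np.arange(len(wells)), wells] = 1.0
w = M @ y                       # model output at measurement locations
```

With uniform recharge the computed head profile is symmetric and positive, as expected for this boundary-value problem.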

Page 3: Options for Solving the Interpolation Problem

The two most commonly used options for solving the interpolation problem emphasize point and probabilistic estimates, respectively.

1. Classical regression approach: Assume the input u is unknown and the measurement error ε is random with a zero mean and known covariance C_ε. Adjust nodal values of u to obtain the 'best' (e.g. least-squares) fit between the model output w and the measurement z. Given certain assumptions, this 'point' estimate may be used to derive probabilistic information about the range of likely states.

2. Bayesian estimation approach: Assume u and ε are random vectors described by known unconditional PDFs f_u(u) and f_ε(ε). Derive the conditional PDF of the state f_{y|z}(y|z) or, when this is not feasible, identify particular properties of this PDF. Use this information to characterize the uncertain state variable.

Although these methods can lead to similar results in some cases, they are based on different assumptions and have somewhat different objectives. We will emphasize the Bayesian approach.

Page 4: Classical Regression - Linear Problems

In the regression approach the "goodness of fit" between model outputs and observations is measured in terms of the weighted sum-squared error J_LS:

J_LS = [z - w]^T C_ε^{-1} [z - w]

When the problem is linear (as in the groundwater example), the state and output are linear functions of the input:

y = A^{-1}(T) B u = D u
w = M y = M D u = G u

so the function to minimize is:

J_LS(u) = [z - G u]^T C_ε^{-1} [z - G u]

In this case the error J_LS is a quadratic function of u with a unique minimum which is a linear function of z:

û_LS = [G^T C_ε^{-1} G]^{-1} G^T C_ε^{-1} z

û_LS is the classic least-squares estimate of u. The corresponding least-squares estimate of y is:

ŷ_LS = D û_LS

Note that the matrix [G^T C_ε^{-1} G] has an inverse only when the number of unknowns in u does not exceed the number of measurements in z (and G has full column rank).
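A minimal numerical sketch of the least-squares estimate û_LS = [G^T C_ε^{-1} G]^{-1} G^T C_ε^{-1} z; the matrix G, the true input, and the noise level are all made-up values:

```python
import numpy as np

# Minimal sketch of the linear least-squares estimate on this page; the
# matrix G, true input, and noise level are all made-up values.
rng = np.random.default_rng(0)
m_obs, n_in = 8, 3                          # more measurements than unknowns
G = rng.normal(size=(m_obs, n_in))
u_true = np.array([1.0, -2.0, 0.5])
C_eps = 0.01 * np.eye(m_obs)                # measurement error covariance
z = G @ u_true + rng.multivariate_normal(np.zeros(m_obs), C_eps)

# u_LS = [G^T C_eps^-1 G]^-1 G^T C_eps^-1 z
Ci = np.linalg.inv(C_eps)
u_ls = np.linalg.solve(G.T @ Ci @ G, G.T @ Ci @ z)
```

At the minimum the weighted normal-equation residual G^T C_ε^{-1} (z - G û_LS) vanishes, which is a quick sanity check on any implementation.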

Page 5: Classical Regression - Nonlinear Problems

When the state and/or measurement vectors are nonlinear functions of the input, the regression approach can be applied iteratively. Suppose that w = g(u). At each iteration the linear estimation equations are used, with the nonlinear model approximated by a first-order Taylor series on iteration k (k = 1, ..., k_max):

w = g(u) ≈ g(û_k) + G_k (u - û_k)

where:

G_k = ∂g(u)/∂u evaluated at u = û_k

Then the least-squares estimation equations at iteration k become:

û_{LS,k+1} = û_{LS,k} + [G_k^T C_ε^{-1} G_k]^{-1} G_k^T C_ε^{-1} [z - g(û_{LS,k})]

The iteration is started with a "first guess" û_1 and then continued until the sequence of estimates converges. An estimate of the state y = d(u) is obtained from the converged estimate of u:

ŷ_LS = d(û_LS)

In practice, J_LS may have many local minima in the nonlinear case and convergence is not guaranteed (i.e. the estimation problem may be ill-posed).
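The iteration above can be sketched for a small hypothetical nonlinear model. The model w = g(u), its Jacobian, and the error covariance are invented for illustration; the synthetic data are noise-free so the iteration should recover the true input:

```python
import numpy as np

# Hypothetical mildly nonlinear model w = g(u), fit with the iterative
# linearization from the slide. H, u_true, and the error covariance are
# made-up values; the data are noise-free so the iteration should recover
# u_true exactly.
rng = np.random.default_rng(1)
H = rng.normal(size=(6, 2))
u_true = np.array([0.5, -0.3])

def g(u):                       # w = g(u) = v + 0.1 v^2, with v = H u
    v = H @ u
    return v + 0.1 * v**2

def jac(u):                     # G_k = dg/du evaluated at u
    v = H @ u
    return (1.0 + 0.2 * v)[:, None] * H

C_eps = 0.001 * np.eye(6)
Ci = np.linalg.inv(C_eps)
z = g(u_true)                   # noise-free synthetic data

u_hat = np.zeros(2)             # "first guess" u_1
for k in range(50):
    Gk = jac(u_hat)
    step = np.linalg.solve(Gk.T @ Ci @ Gk, Gk.T @ Ci @ (z - g(u_hat)))
    u_hat = u_hat + step
    if np.linalg.norm(step) < 1e-12:
        break
```

For strongly nonlinear g the same loop can stall in a local minimum, which is the ill-posedness noted above.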

Page 6: Bayesian Estimation - Linear Multivariate Normal Problems

Bayesian estimation focuses on the conditional PDF f_{y|z}(y|z). This PDF conveys all the information about the uncertain state y contained in the measurement vector z.

Derivation of f_{y|z}(y|z) (which is multivariate normal) is straightforward when u and z are jointly normal. This requirement is met in our groundwater example if we assume that:

• f_u(u) is multivariate normal with specified mean ū and covariance C_uu
• f_ε(ε) is multivariate normal with a zero mean and covariance C_ε
• The state and measurement equations are linear and the measurement error is additive, so y = D u and z = M y + ε = M D u + ε = G u + ε
• u and ε are independent

In this case, f_{y|z}(y|z) is completely defined by its mean and covariance, which can be derived from the general expression for a conditional multivariate normal PDF:

ŷ_B = E(y|z) = ȳ + C_yz C_zz^{-1} [z - z̄]

C_{y|z} = C_yy - C_yz C_zz^{-1} C_yz^T

These expressions are equivalent to those obtained from kriging with a known mean and optimal interpolation, when comparable assumptions are made.

Page 7: Derivation of the Unconditional Mean and Covariance - Linear Multivariate Normal Problems

The groundwater model enters the Bayesian estimation equations through the unconditional means and the unconditional covariances C_yz and C_zz. These can be derived from the linear state and measurement equations and the specified covariances C_uu and C_ε:

ȳ = E[y] = D ū        z̄ = E[z] = G ū

C_yy = E[(y - ȳ)(y - ȳ)^T] = D C_uu D^T

C_yz = E[(y - ȳ)(z - z̄)^T] = D C_uu G^T

C_zz = E[(z - z̄)(z - z̄)^T] = G C_uu G^T + C_ε

The conditional mean estimate ŷ_B obtained from these expressions can be shown to approach the least-squares estimate ŷ_LS as C_uu → ∞.

An approach similar to the one outlined above can be used to derive the conditional mean and covariance of the uncertain input u.
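The unconditional moments above and the conditional update from the previous page can be chained together numerically. The matrices D and M, the prior mean, and the covariances below are small made-up stand-ins:

```python
import numpy as np

# Sketch chaining the unconditional moments (this page) with the conditional
# update (previous page). D, M, the prior mean, and the covariances are
# small made-up stand-ins.
rng = np.random.default_rng(2)
n_in, n_st, m_obs = 3, 4, 5
D = rng.normal(size=(n_st, n_in))           # y = D u
M = rng.normal(size=(m_obs, n_st))          # z = M y + eps
G = M @ D
u_bar = np.array([1.0, 0.0, -1.0])
C_uu = np.diag([0.5, 0.3, 0.4])
C_eps = 0.05 * np.eye(m_obs)

# Unconditional moments
y_bar, z_bar = D @ u_bar, G @ u_bar
C_yy = D @ C_uu @ D.T
C_yz = D @ C_uu @ G.T
C_zz = G @ C_uu @ G.T + C_eps

# Conditional (Bayesian) update for one synthetic measurement vector
u = rng.multivariate_normal(u_bar, C_uu)
z = G @ u + rng.multivariate_normal(np.zeros(m_obs), C_eps)
K = C_yz @ np.linalg.inv(C_zz)
y_hat = y_bar + K @ (z - z_bar)             # conditional mean E[y|z]
C_y_z = C_yy - K @ C_yz.T                   # conditional covariance
```

Conditioning on z can only reduce uncertainty: the conditional covariance C_{y|z} differs from C_yy by a positive semidefinite term, so its trace never exceeds that of C_yy.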

Page 8: Interpreting Bayesian Estimation Results

The conditional PDFs produced in the linear multivariate normal case are not particularly informative in themselves. In practice, it is more useful to examine spatial plots of scalar properties of these PDFs, such as the mean and standard deviation, or plots of the marginal conditional PDFs at particular locations.

The conditional mean is generally used as a point estimate of y while the conditional standard deviation provides a measure of confidence in this estimate. Note that the conditional standard deviation decreases near well locations, reflecting the local information provided by the head measurements.

[Figure: contour plots of the conditional mean ŷ_B = E[y|z] and the conditional standard deviation σ_{y|z} over the grid, and the marginal conditional PDF f_{y|z}(y|z) at node 14.]

Page 9: Bayesian Estimation - Nonlinear Problems

When the state and/or measurement vectors are nonlinear functions of the input, the variables y and z are generally not multivariate normal, even if u and ε are normal. In this case, it is difficult to derive the conditional PDF f_{y|z}(y|z) directly.

An alternative is to work with f_{u|z}(u|z), the conditional PDF of u. Once f_{u|z}(u|z) is computed it may be possible to use it to derive f_{y|z}(y|z) or some of its properties.

The PDF f_{u|z}(u|z) can be obtained from Bayes theorem:

f_{u|z}(u|z) = f_{z|u}(z|u) f_u(u) / f_z(z) = f_{z|u}(z|u) f_u(u) / ∫ f_{z|u}(z|u) f_u(u) du

We suppose that f_u(u) and f_ε(ε) are given (e.g. multivariate normal). If the measurement error is additive but the transformations y = d(u) and w = m(y) are nonlinear, then:

z = m(y) + ε = m[d(u)] + ε = g(u) + ε

and the PDF f_{z|u}(z|u) is:

f_{z|u}(z|u) = f_ε[z - g(u)]

In this case, we have all the information required to apply Bayes theorem.
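For a scalar u the Bayes theorem expression can be evaluated directly on a grid. The prior, the nonlinear model g(u) = u², the measurement value, and the error standard deviation below are all hypothetical:

```python
import numpy as np

# Scalar illustration of the Bayes theorem expression on a grid. The prior,
# the nonlinear model g(u) = u**2, the measurement value z, and the error
# standard deviation are all hypothetical.
u_grid = np.linspace(-2.0, 6.0, 2001)
du = u_grid[1] - u_grid[0]

def normal_pdf(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

f_u = normal_pdf(u_grid, 2.0, 1.0)          # prior f_u(u): N(2, 1)
z = 9.0                                     # one measurement
f_z_u = normal_pdf(z, u_grid**2, 0.5)       # likelihood f_eps[z - g(u)]

post = f_z_u * f_u                          # numerator of Bayes theorem
post /= post.sum() * du                     # divide by f_z(z) (grid integral)
u_mode = u_grid[np.argmax(post)]            # conditional mode
```

With z = 9 the likelihood peaks near u = ±3; the prior centered at 2 selects the positive branch, so the posterior mode lands just below 3. This grid approach is exactly what becomes infeasible as the dimension of u grows, motivating the mode-seeking methods on the following pages.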

Page 10: Obtaining Practical Bayesian Estimates -- The Conditional Mode

For problems of realistic size the conditional PDF f_{u|z}(u|z) is difficult to derive in closed form and is too large to store in numerical form. Even when this PDF can be computed, it is difficult to interpret. Usually spatial plots of scalar PDF properties provide the best characterization of the system's inputs and states.

In the nonlinear case it is difficult to derive exact expressions for the conditional mean and standard deviation or for the marginal conditional densities. However, it is possible to estimate the conditional mode (maximum) of f_{u|z}(u|z).

[Figure: conditional PDF f_{u|z}(u|z) of u (given z) for a scalar (single input) problem, with the mode (peak) marked.]

Page 11: Deriving the Conditional Mode

The conditional mode is derived by noting that the maximum (with respect to u) of the PDF f_{u|z}(u|z) is the same as the minimum of -ln[f_{u|z}(u|z)] (since -ln[·] is a monotonically decreasing function of its argument). From Bayes theorem we have (for additive measurement error):

J_B = -ln[f_{u|z}(u|z)] = -ln f_{z|u}(z|u) - ln f_u(u) + ln f_z(z)
    = -ln f_ε[z - g(u)] - ln f_u(u) + terms that do not depend on u

If ε and u are multivariate normal this expression may be written as:

J_B = ½ [z - g(u)]^T C_ε^{-1} [z - g(u)] + ½ [u - ū]^T C_uu^{-1} [u - ū] + terms that do not depend on u

The estimated mode of f_{u|z}(u|z) is the value of u (represented by û_{B,mode}) which minimizes J_B. Note that J_B is an extended form of the least-squares error measure J_LS used in nonlinear regression.

û_{B,mode} is found with an iterative search similar to the one used to solve the nonlinear regression problem. This search usually converges better than the regression search because the second term in J_B tends to give a better defined minimum. This term is sometimes called a regularization term.
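The regularizing effect of the second term in J_B is easy to see for a linear model g(u) = G u: with fewer measurements than unknowns the least-squares term alone has no unique minimum, but J_B does. The problem data below are made up, and the closed-form minimizer used for the check is the standard linear-Gaussian result (not given on the slide):

```python
import numpy as np

# Sketch of J_B for a linear model g(u) = G u with fewer measurements (2)
# than unknowns (3): the least-squares term alone is underdetermined, but
# the prior term gives J_B a unique minimum. All values are made up; the
# closed-form minimizer is the standard linear-Gaussian result.
rng = np.random.default_rng(3)
G = rng.normal(size=(2, 3))
u_bar, C_uu = np.zeros(3), np.eye(3)
C_eps = 0.1 * np.eye(2)
z = np.array([1.0, -1.0])
Ci, Cui = np.linalg.inv(C_eps), np.linalg.inv(C_uu)

def J_B(u):
    r, p = z - G @ u, u - u_bar
    return 0.5 * r @ Ci @ r + 0.5 * p @ Cui @ p

# Closed-form minimizer (conditional mode) for the linear-Gaussian case
u_map = np.linalg.solve(G.T @ Ci @ G + Cui, G.T @ Ci @ z + Cui @ u_bar)
```

Perturbing u_map in any coordinate direction increases J_B, confirming the unique minimum that the prior term provides.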

Page 12: Iterative Solution of Nonlinear Bayesian Minimization Problems

In spatially distributed problems where the dimension of u is large, a gradient-based search is the preferred method for minimizing J_B. The search is carried out iteratively, with the new estimate (at the end of iteration k) computed from the old estimate (at the end of iteration k-1) and the gradient of J_B evaluated at the old estimate:

û_k = û_{k-1} - ρ_{k-1} [∂J_B/∂u]_{k-1}

where ρ_{k-1} is a scalar step size chosen by the search algorithm and:

[∂J_B/∂u]_{k-1} = ∂J_B(u)/∂u evaluated at u = û_{k-1}

Conventional numerical computation of ∂J_B/∂u using, for example, a finite-difference technique is very time-consuming, requiring order n model runs per iteration, where n is the dimension of u. Variational (adjoint) methods can greatly reduce the effort needed to compute ∂J_B/∂u.

[Figure: contours of J_B for a problem with 2 uncertain inputs u_1 and u_2, with the search steps û_{k-1} → û_k → û_{k+1} shown in red.]
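The gradient search above can be sketched for the linear-Gaussian J_B, where the gradient is available analytically (the adjoint method on the following pages is a cheap way to obtain this gradient for large nonlinear models). The problem data are made up, and the fixed step size is set from the largest Hessian eigenvalue to guarantee convergence; in practice a line search would be used:

```python
import numpy as np

# Sketch of the gradient search for the linear-Gaussian J_B. The gradient is
# computed analytically here; the problem data are made up, and the step size
# is set from the largest Hessian eigenvalue so the fixed-step iteration is
# guaranteed to converge (a real search would use a line search instead).
rng = np.random.default_rng(3)
G = rng.normal(size=(2, 3))
u_bar, C_uu = np.zeros(3), np.eye(3)
C_eps = 0.1 * np.eye(2)
z = np.array([1.0, -1.0])
Ci, Cui = np.linalg.inv(C_eps), np.linalg.inv(C_uu)

def grad_JB(u):
    return -G.T @ Ci @ (z - G @ u) + Cui @ (u - u_bar)

H = G.T @ Ci @ G + Cui                      # Hessian (constant, linear case)
rho = 1.0 / np.linalg.eigvalsh(H).max()     # safe fixed step size

u_hat = np.zeros(3)                         # first guess
for k in range(5000):
    g = grad_JB(u_hat)
    u_hat = u_hat - rho * g                 # u_k = u_{k-1} - rho * dJB/du
    if np.linalg.norm(g) < 1e-12:
        break

# Closed-form minimizer for comparison (standard linear-Gaussian result)
u_map = np.linalg.solve(H, G.T @ Ci @ z + Cui @ u_bar)
```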

Page 13: Variational (Adjoint) Methods for Deriving Search Gradients - 1

Variational methods obtain the search gradient ∂J_B/∂u indirectly, from the first variation of a modified form of J_B. These methods treat the state equation as an equality constraint. This constraint is adjoined to J_B with a Lagrange multiplier (or adjoint vector). To illustrate, consider a static interpolation problem with nonlinear state and measurement equations and an additive measurement error:

y = d(u)        z = m(y) + ε

When the state equation is adjoined, the part of J_B that depends on u is:

J_B = ½ [z - m(y)]^T C_ε^{-1} [z - m(y)] + ½ [u - ū]^T C_uu^{-1} [u - ū] + λ^T [y - d(u)]

where λ is the Lagrange multiplier (or adjoint) vector. At a local minimum the first variation of J_B must equal zero:

δJ_B = [λ - (∂m/∂y)^T C_ε^{-1} (z - m(y))]^T δy + [C_uu^{-1} (u - ū) - (∂d/∂u)^T λ]^T δu = 0

If λ is selected to ensure that the first bracketed term is zero then the second bracketed term is the desired gradient ∂J_B/∂u.

Page 14: Variational (Adjoint) Methods for Deriving Search Gradients - 2

The variational approach for computing ∂J_B/∂u on iteration k of the search can be summarized as follows:

1. Compute the state using the input estimate from iteration k-1:
   ŷ_{k-1} = d(û_{k-1})

2. Compute the adjoint from the new state:
   λ_{k-1} = [∂m/∂y]^T_{k-1} C_ε^{-1} [z - m(ŷ_{k-1})]

3. Compute the gradient at û_{k-1}:
   [∂J_B/∂u]_{k-1} = C_uu^{-1} [û_{k-1} - ū] - [∂d/∂u]^T_{k-1} λ_{k-1}

4. Compute the new input estimate:
   û_k = û_{k-1} - ρ_{k-1} [∂J_B/∂u]_{k-1}

Here the subscripts k-1 on the partial derivatives ∂m/∂y and ∂d/∂u indicate that they are evaluated at ŷ_{k-1} and û_{k-1}, respectively.

There are many versions of this static variational algorithm, depending on the form used to write the state equation. All of these give the same final result. In particular, all require only one solution of the state equation, together with inversions of the covariance matrices C_ε and C_uu. When these matrices are diagonal (implying uncorrelated input and measurement errors) the inversions are straightforward. When correlation is included they can be computationally demanding.
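The four steps above can be exercised on a toy problem. The state map d (elementwise quadratic) and measurement map m (linear) below are invented for illustration, and the adjoint-based gradient is checked against central finite differences of J_B:

```python
import numpy as np

# Toy check of the four-step adjoint gradient. The state map d (elementwise
# quadratic) and measurement map m (linear) are invented for illustration;
# the adjoint-based gradient is compared with a central finite-difference
# gradient of J_B.
rng = np.random.default_rng(4)
n, m_obs = 4, 3
M = rng.normal(size=(m_obs, n))
u_bar, C_uu = np.zeros(n), np.eye(n)
C_eps = 0.1 * np.eye(m_obs)
Ci, Cui = np.linalg.inv(C_eps), np.linalg.inv(C_uu)
z = rng.normal(size=m_obs)

def d(u):                                   # state equation y = d(u)
    return u + 0.1 * u**2

def m(y):                                   # measurement model z = m(y) + eps
    return M @ y

def J_B(u):
    r, p = z - m(d(u)), u - u_bar
    return 0.5 * r @ Ci @ r + 0.5 * p @ Cui @ p

u_hat = rng.normal(size=n)                  # current iterate u_{k-1}
y_hat = d(u_hat)                            # 1. state from current input
lam = M.T @ Ci @ (z - m(y_hat))             # 2. adjoint ([dm/dy]^T = M^T here)
Dd = np.diag(1.0 + 0.2 * u_hat)             # [dd/du] evaluated at u_hat
grad = Cui @ (u_hat - u_bar) - Dd.T @ lam   # 3. gradient of J_B

# Central finite-difference check of step 3
h = 1e-6
fd = np.array([(J_B(u_hat + h * e) - J_B(u_hat - h * e)) / (2 * h)
               for e in np.eye(n)])
```

Note the efficiency argument from the slide: the adjoint gradient needs one state solve and one adjoint solve, while the finite-difference check needs 2n evaluations of J_B.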

Page 15: Case Study Area

SGP97 Experiment - Soil Moisture Campaign

[Figure: map of the SGP97 case study area, showing aircraft microwave measurement coverage.]

Page 16: Test of Variational Smoothing Algorithm – SGP97 Soil Moisture Problem

Observing System Simulation Experiment (OSSE):

[Figure: flowchart of the OSSE. Soil properties and land use, mean initial conditions, and mean land-atmosphere boundary fluxes (perturbed by random input and initial condition errors) drive a land surface model that produces the "true" soil, canopy moisture, and temperature. A radiative transfer model converts these to "true" radiobrightness, to which random measurement error is added to give the "measured" radiobrightness. The variational algorithm, given the measured radiobrightness together with the soil properties and land use, mean fluxes and initial conditions, and error covariances, produces estimated radiobrightness and soil moisture, which are compared with the truth to evaluate the estimation error.]

Page 17: Synthetic Experiment (OSSE) based on SGP97 Field Campaign

The synthetic experiment uses real soil, land cover, and precipitation data from SGP97 (Oklahoma). Radiobrightness measurements are generated from our land surface and radiative transfer models, with space/time correlated model error (process noise) and measurement error added.

[Figure: SGP97 study area, showing principal inputs to the data assimilation algorithm.]

Page 18: Effects of Smoothing Window Configuration

Position and length of the variational smoothing window affect estimation accuracy. Estimation error is less for longer windows that are reinitialized just after (rather than just before) measurement times.

[Figure: top-node saturation RMS error [-] vs. day of year (170-182) for three smoothing window configurations, with radiobrightness observation times marked: reference experiment (rms = 0.029), 3 assimilation intervals A (rms = 0.03), 12 assimilation intervals B (rms = 0.032), 12 assimilation intervals C (rms = 0.038). A companion panel shows precipitation [mm/d] vs. day of year.]

Page 19: Effects of Precipitation Information

The variational algorithm performs well even without precipitation information. In this case, soil moisture is inferred only from the microwave measurements.

[Figure: top-node saturation RMS error [-] vs. day of year (170-182): reference experiment (rms = 0.014), estimate with precipitation withheld (rms = 0.034), prior with precipitation withheld (rms = 0.19). A companion panel shows precipitation [mm/d] vs. day of year.]

Page 20: Summary

The Bayesian estimation approach outlined above is frequently used to solve static data assimilation (or interpolation) problems. It has the following notable features:

• When the state and measurement equations are linear and the inputs and measurement errors are normally distributed, the conditional PDFs f_{y|z}(y|z) and f_{u|z}(u|z) are multivariate normal. In this case the Bayesian conditional mean and Bayesian conditional mode approaches give the same point estimate (i.e. the conditional mode is equal to the conditional mean).

• When the problem is nonlinear the Bayesian conditional mean and mode estimates are generally different. The Bayesian conditional mean estimate is generally not practical to compute for nonlinear problems of realistic size.

• The least-squares approach is generally less likely than the Bayesian approach to converge to a reasonable answer for nonlinear problems, since it does not benefit from the "regularization" imparted by the second term in J_B.

• The variational (adjoint) approach greatly improves the computational efficiency of the Bayesian conditional mode estimation algorithm, especially for large problems.

• The input-oriented variational approach discussed here is a 3DVAR data assimilation algorithm. The name reflects the fact that 3DVAR is used for problems with variability in three spatial dimensions but not in time. 4DVAR data assimilation methods extend the concepts discussed here to time-dependent (dynamic) problems.