lecture 4: practical examples. remember this? m est = m a + m [ d obs – gm a ] where m = [g t c d...

Lecture 4:

Practical Examples

Remember this?

$m^{est} = m_A + M\,[\,d^{obs} - G m_A\,]$

where $M = [\,G^T C_d^{-1} G + C_m^{-1}\,]^{-1} G^T C_d^{-1}$

It’s exactly the same as solving this equation

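$$\begin{bmatrix} C_d^{-1/2} G \\ C_m^{-1/2} \end{bmatrix} m = \begin{bmatrix} C_d^{-1/2} d \\ C_m^{-1/2} m_A \end{bmatrix}$$

which has the form $Fm = h$ and can be solved by simple least squares!

$m = [F^T F]^{-1} F^T h$

This form of the equation is usually easier to set up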

in the uncorrelated case, the equation simplifies to

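$$\begin{bmatrix} \sigma_d^{-1} G \\ \sigma_m^{-1} \end{bmatrix} m = \begin{bmatrix} \sigma_d^{-1} d \\ \sigma_m^{-1} m_A \end{bmatrix}$$

Here $\sigma_d^{-1}$ and $\sigma_m^{-1}$ are diagonal matrices of reciprocal standard deviations, so:

each data equation is weighted by the reciprocal of the standard deviation of that datum

each prior equation is weighted by the reciprocal of the standard deviation of that prior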
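The equivalence of the two forms can be checked numerically. The sketch below is only an illustration under assumed inputs: the sizes, the random G, d, m_A and the standard deviations are all made up, and the uncorrelated case is assumed so that the covariance matrices are diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_data, n_model = 8, 5                      # illustrative sizes (assumed)
G = rng.normal(size=(n_data, n_model))      # made-up data kernel
d_obs = rng.normal(size=n_data)             # made-up data
m_A = rng.normal(size=n_model)              # made-up prior model
sigma_d = rng.uniform(0.5, 2.0, size=n_data)
sigma_m = rng.uniform(0.5, 2.0, size=n_model)

# Form 1: m_est = m_A + M [d_obs - G m_A]
Cd_inv = np.diag(sigma_d**-2)
Cm_inv = np.diag(sigma_m**-2)
M_mat = np.linalg.solve(G.T @ Cd_inv @ G + Cm_inv, G.T @ Cd_inv)
m_est_1 = m_A + M_mat @ (d_obs - G @ m_A)

# Form 2: stack weighted data equations on top of weighted prior equations,
# then solve F m = h by ordinary least squares
F = np.vstack([G / sigma_d[:, None], np.diag(1.0 / sigma_m)])
h = np.concatenate([d_obs / sigma_d, m_A / sigma_m])
m_est_2, *_ = np.linalg.lstsq(F, h, rcond=None)

print(np.max(np.abs(m_est_1 - m_est_2)))    # difference should be at round-off level
```

Form 2 never builds or inverts a covariance matrix, which is one reason it tends to be easier to set up.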

Example 1

1D Interpolation

Find a function f(x) that

1) goes through all your data points (observations)

2) does something smooth in between (prior information)

This is interpolation … but

why not just use least-squares?

m – a vector of all the points at which you want to estimate the function, including the points for which you have observations

d – a vector of just those points where you have observations

So the equation Gm=d is very simple, a model parameter equals the data when the corresponding observation is available:

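$$\begin{bmatrix} \vdots \\ \cdots\; 0 \;\cdots\; 0 \;\; 1 \;\; 0 \;\cdots\; 0 \;\cdots \\ \vdots \end{bmatrix} \begin{bmatrix} \vdots \\ m_i \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots \\ d_j \\ \vdots \end{bmatrix}$$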

Just a single “1” per row

You then implement a smoothness constraint by first developing a matrix D that computes the non-smoothness of m

$$D = \begin{bmatrix} \vdots \\ \cdots\; 0 \;\cdots\; 1 \;\; -2 \;\; 1 \;\cdots\; 0 \;\cdots \\ \vdots \end{bmatrix}$$

One possibility is to use the finite-difference approximation of the second derivative

And by realizing that maximizing smoothness is the same as minimizing $|Dm|^2$

and minimizing $|Dm|^2$ is the same as choosing

$C_m^{-1} \propto D^T D$ (along with $m_A = 0$).

First derivative

$[dm/dx]_i \approx (1/\Delta x)\,(m_i - m_{i-1}) \propto m_i - m_{i-1}$

Second derivative

$[d^2m/dx^2]_i \propto [dm/dx]_{i+1} - [dm/dx]_i$

$= m_{i+1} - m_i - m_i + m_{i-1}$

$= m_{i+1} - 2m_i + m_{i-1}$

So the F m = h equation is:

$$\begin{bmatrix} G \\ \varepsilon D \end{bmatrix} m = \begin{bmatrix} d \\ 0 \end{bmatrix}$$

$\varepsilon$ is a damping parameter that represents the relative weight of the smoothness constraint, that is, how certain we are that the solution is smooth.

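$$\begin{bmatrix}
1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & \cdots & 0 & 1 & 0 \\
\vdots & & & & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 1 \\
-\varepsilon & \varepsilon & 0 & \cdots & 0 & 0 \\
\varepsilon & -2\varepsilon & \varepsilon & \cdots & 0 & 0 \\
\vdots & & & & & \vdots \\
0 & 0 & \cdots & \varepsilon & -2\varepsilon & \varepsilon \\
0 & 0 & \cdots & 0 & -\varepsilon & \varepsilon
\end{bmatrix} m = \begin{bmatrix} d_1 \\ d_7 \\ \vdots \\ d_N \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}$$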

Example: 101 points equally spaced along the x-axis

So 101 values of the function f(x)

40 of these values are measured (the data, d); the rest are unknown

Two kinds of prior information: minimize the 2nd derivative at the 99 interior x’s; minimize the 1st derivative at the left and right x’s

(nice to have the same number of priors as unknowns, but not required)

$\varepsilon = 10^{-6}$

[Figure: data and interpolated result, f(x) versus x]
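The interpolation set-up described above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the sine test function, the random choice of which 40 points are observed, and all variable names are assumptions. The damping parameter is written `eps`.

```python
import numpy as np

N = 101                                   # grid points = model parameters
x = np.linspace(0.0, 1.0, N)
rng = np.random.default_rng(1)
obs_idx = np.sort(rng.choice(N, size=40, replace=False))  # assumed observed points
d = np.sin(2 * np.pi * x[obs_idx])        # hypothetical data, not from the lecture

# G: a single 1 per row, picking out the observed model parameters
G = np.zeros((len(obs_idx), N))
G[np.arange(len(obs_idx)), obs_idx] = 1.0

# D: first difference at the two ends, second difference at the 99 interior points
D = np.zeros((N, N))
D[0, :2] = [-1.0, 1.0]
for i in range(1, N - 1):
    D[i, i - 1:i + 2] = [1.0, -2.0, 1.0]
D[-1, -2:] = [-1.0, 1.0]

eps = 1e-6                                # small damping
F = np.vstack([G, eps * D])
h = np.concatenate([d, np.zeros(N)])
m, *_ = np.linalg.lstsq(F, h, rcond=None)

misfit = np.max(np.abs(m[obs_idx] - d))   # how closely the curve passes through the data
print(misfit)
```

With a damping this small, the curve passes through the data to well within plotting accuracy, while the smoothness rows decide what happens between observations.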

$\varepsilon$ can be chosen by trial and error

but usually the result is fairly insensitive to $\varepsilon$, as long as it’s small

[Figure: $\varepsilon$ varied over six orders of magnitude; log10(Total Error) versus log10($\varepsilon$)]

A purist might say that this is not really interpolation, because the curve goes through the data only in the limit $\varepsilon \to 0$

but for small $\varepsilon$’s the error is extremely small

Example 2

Reconstructing 2D data known to obey a differential equation

$\nabla^2 f = 0$

e.g. f(x,y) could be temperature

[Figure: square grid with 21 unknowns along each side]

$21 \times 21 = 441$ unknowns

44 observed data

Prior information:

$\nabla^2 f = d^2f/dx^2 + d^2f/dy^2 = 0$ in interior of the box

$n \cdot \nabla f = 0$ on edges of box

(sides of box are insulating)

The biggest issue here is bookkeeping

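Conceptually, the model parameters are on an $n \times m$ grid $m_{ij}$

But they have to be reorganized into a vector $m_k$ to do the calculations

$$\begin{bmatrix}
m_{11} & m_{12} & m_{13} & \cdots & m_{1n} \\
m_{21} & m_{22} & m_{23} & \cdots & m_{2n} \\
m_{31} & m_{32} & m_{33} & \cdots & m_{3n} \\
\vdots & & & & \vdots \\
m_{m1} & m_{m2} & m_{m3} & \cdots & m_{mn}
\end{bmatrix} \rightarrow \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ \vdots \\ m_{nm} \end{bmatrix}$$

e.g. $m_{ij} \rightarrow m_k$ with $k = (i-1)\,n + j$, where $n$ is the number of entries in each row of the grid (for a square grid such as the $21 \times 21$ example, $n = m$)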

Thus a large percentage of the code is concerned with converting back and forth between positions in the grid and positions in the corresponding vector. It can look pretty messy!
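The bookkeeping can be isolated in two small helper functions. The sketch below is an assumption-laden illustration: the function names are hypothetical, indices are 1-based to match the slide notation, the grid is stored row by row, and the Laplacian uses the standard 5-point stencil with unit grid spacing. Only the interior rows are built; the insulating-edge rows are left out.

```python
import numpy as np

def grid_to_vec(i, j, ncols):
    """1-based grid position (row i, column j) -> 1-based vector index k."""
    return (i - 1) * ncols + j

def vec_to_grid(k, ncols):
    """1-based vector index k -> 1-based grid position (i, j)."""
    i, j = divmod(k - 1, ncols)
    return i + 1, j + 1

def laplacian_rows(nrows, ncols):
    """Prior-information rows stating that the 5-point Laplacian is zero at interior nodes."""
    rows = []
    for i in range(2, nrows):            # interior rows 2 .. nrows-1
        for j in range(2, ncols):        # interior columns 2 .. ncols-1
            row = np.zeros(nrows * ncols)
            row[grid_to_vec(i, j, ncols) - 1] = -4.0
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                row[grid_to_vec(i + di, j + dj, ncols) - 1] = 1.0
            rows.append(row)
    return np.array(rows)

D = laplacian_rows(21, 21)               # 19 * 19 interior nodes, 441 unknowns
print(D.shape)
print(vec_to_grid(grid_to_vec(3, 7, 21), 21))
```

Keeping the index conversion in one place means the rest of the code can think in grid positions and never compute `k` by hand.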

[Figure: results]

[Figure: comparison]

Example 3

Linear Systems

Scenario 1: no past history needed

Flame with time-varying heat h(t)

Thermometer measuring temperature $\theta(t)$

Flame instantaneously heats the thermometer

Thermometer retains no heat

$\theta(t) \propto h(t)$

Scenario 2: past history needed

Flame with time-varying heat h(t)

Thermometer measuring temperature $\theta(t)$

Heat takes time to seep through plate

Plate retains heat

$\theta(t = t') \propto$ history of $h(t)$ for time $t < t'$

Steel plate

How to write a Linear System

$\theta(t) \propto$ history of $h(t')$ for all times in the past

$\theta(t_0) = \dots + g_0 h(t_0) + g_1 h(t_{-1}) + g_2 h(t_{-2}) + g_3 h(t_{-3}) + g_4 h(t_{-4}) + \dots$

$\theta(t_1) = \dots + g_0 h(t_1) + g_1 h(t_0) + g_2 h(t_{-1}) + g_3 h(t_{-2}) + g_4 h(t_{-3}) + \dots$

g is called the “impulse response” of the system

Matrix formulations

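$$\begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_N \end{bmatrix} = \begin{bmatrix} g_0 & 0 & 0 & 0 & 0 & 0 \\ g_1 & g_0 & 0 & 0 & 0 & 0 \\ \vdots & & & & & \\ g_N & \cdots & g_3 & g_2 & g_1 & g_0 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_N \end{bmatrix}$$

that is, $\theta = G h$

Note problem with parts of the equation being “off the ends” of the matrix

This formulation might be especially useful when we know $\theta$ and $g$ and want to find $h$

$$\begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_N \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 & 0 & 0 & 0 \\ h_1 & h_0 & 0 & 0 & 0 & 0 \\ \vdots & & & & & \\ h_N & \cdots & h_3 & h_2 & h_1 & h_0 \end{bmatrix} \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_N \end{bmatrix}$$

that is, $\theta = H g$

This formulation might be especially useful when we know $\theta$ and $h$ and want to find $g$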
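Both matrix formulations describe the same discrete convolution, which a short check makes concrete. In this sketch the helper name `conv_matrix` and the sample values of g and h are made up for illustration; `theta` stands for the temperature series.

```python
import numpy as np

def conv_matrix(v):
    """Lower-triangular Toeplitz matrix with A[i, j] = v[i - j] for i >= j."""
    n = len(v)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, :i + 1] = v[i::-1]   # v[i], v[i-1], ..., v[0]
    return A

g = np.array([1.0, 0.5, 0.25, 0.125])   # made-up impulse response
h = np.array([2.0, 0.0, 1.0, 3.0])      # made-up heating history

G = conv_matrix(g)
H = conv_matrix(h)

theta_from_G = G @ h                     # theta = G h
theta_from_H = H @ g                     # theta = H g
theta_direct = np.convolve(g, h)[:len(h)]

print(theta_from_G, theta_from_H, theta_direct)
```

Truncating the convolution to the first N+1 samples is the code-level face of the "off the ends" problem: heating before the first sample is silently treated as zero.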

Thermometer measuring plate temperature $\theta$

Thermometer measuring flame heat $h$

Goal: infer “physics” of plate, as embodied in its impulse response function, g

Set up of problem

[Figure panels: $g(t)$, $h^{true}(t)$, $\theta^{true}(t)$]

Simulate noisy data

$\theta^{obs}(t) = \theta^{true}(t) + \text{noise}$

$h^{obs}(t) = h^{true}(t) + \text{noise}$

Results

[Figure: $g^{true}(t)$ and $g^{est}(t)$] … yuck!

Fix-up: try for shorter $g(t)$ and use 2nd derivative damping

Damping: $\varepsilon^2 = 100$
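The fix-up can be sketched as a damped least-squares problem with a short filter. Everything numerical here is an assumption made for illustration: the shape of the true impulse response, the random heating history, the noise levels and the filter length. The damping parameter is written `eps`, and whether eps squared of 100 is a sensible value depends on how the data are scaled.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 200, 30                              # time samples, filter length (assumed)
t = np.arange(M)
g_true = t * np.exp(-t / 4.0)               # hypothetical smooth impulse response
h_true = rng.normal(size=N)                 # hypothetical heating history
theta_true = np.convolve(h_true, g_true)[:N]
theta_obs = theta_true + 0.5 * rng.normal(size=N)   # noisy plate temperature
h_obs = h_true + 0.1 * rng.normal(size=N)           # noisy flame heat

# H is N x M with H[i, j] = h_obs[i - j] for i >= j, so that theta = H g
H = np.zeros((N, M))
for j in range(M):
    H[j:, j] = h_obs[:N - j]

# D: second-difference operator, (M - 2) x M
D = np.zeros((M - 2, M))
for i in range(M - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]

def estimate_g(eps):
    """Solve [H; eps*D] g = [theta_obs; 0] by least squares."""
    F = np.vstack([H, eps * D])
    rhs = np.concatenate([theta_obs, np.zeros(M - 2)])
    return np.linalg.lstsq(F, rhs, rcond=None)[0]

g_undamped = estimate_g(0.0)
g_damped = estimate_g(10.0)                 # eps**2 = 100

def roughness(g):
    return float(np.sum((D @ g) ** 2))

print(roughness(g_undamped), roughness(g_damped))
```

The damped estimate can never be rougher than the undamped one, since any extra roughness would have to be paid for in the penalty term.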

Example 4

prediction error filter

how well does the past predict the present?

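$\theta_5 = g_1\theta_4 + g_2\theta_3 + g_3\theta_2 + g_4\theta_1 \dots$

$\theta_6 = g_1\theta_5 + g_2\theta_4 + g_3\theta_3 + g_4\theta_2 \dots$

$\theta_7 = g_1\theta_6 + g_2\theta_5 + g_3\theta_4 + g_4\theta_3 \dots$

$0 = g_0\theta_5 + g_1\theta_4 + g_2\theta_3 + g_3\theta_2 + g_4\theta_1 \dots$

$0 = g_0\theta_6 + g_1\theta_5 + g_2\theta_4 + g_3\theta_3 + g_4\theta_2 \dots$

$0 = g_0\theta_7 + g_1\theta_6 + g_2\theta_5 + g_3\theta_4 + g_4\theta_3 \dots$

with $g_0 = -1$

Solve $\Theta g = 0$ by least squares with prior information $g_0 = -1$, where $\Theta$ is the matrix of $\theta$’s

use large damping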

20 years of LaGuardia Airport temperatures, filter length M = 10 days

[Figures: filter $g$ for filter length M = 10 days and for filter length M = 100 days]

$\theta * g$ is the unpredictable part of $\theta$
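A prediction error filter can be estimated with the same stacked least-squares machinery. The sketch below uses a synthetic first-order autoregressive series in place of real temperature data, which is purely an assumption for illustration. The very short filter length and the weight `w` on the prior equation for the first filter coefficient are likewise made-up choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
theta = np.zeros(n)
for k in range(1, n):                        # synthetic series: each value is 0.9 times the last, plus noise
    theta[k] = 0.9 * theta[k - 1] + rng.normal()

M = 3                                        # filter coefficients g0, g1, g2
# Each row holds [theta_k, theta_(k-1), theta_(k-2)], so row @ g = 0 is one filter equation
rows = np.array([theta[k - M + 1:k + 1][::-1] for k in range(M - 1, n)])

w = 1e6                                      # large weight on the prior equation g0 = -1
prior_row = np.zeros(M)
prior_row[0] = 1.0
F = np.vstack([rows, w * prior_row])
h = np.concatenate([np.zeros(len(rows)), [-w]])
g = np.linalg.lstsq(F, h, rcond=None)[0]

pred_error = rows @ g                        # the unpredictable part of the series
print(g)
print(np.var(theta), np.var(pred_error))
```

For this synthetic series the second coefficient should come out close to 0.9 and the third close to zero, and the prediction error should have a much smaller variance than the series itself.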

Let’s try it with the Neuse River Hydrograph Dataset

Filter length M=100

What’s that?

[Figure: plots labeled $g$]

Close up of first year of data

Note that the prediction error, $\theta * g$, is spikier than the hydrograph data, $\theta$. I think that this means that some of the dynamics of the river flow are being captured by the filter, $g$, and that the unpredictable part is mostly the forcing, that is, precipitation

Example 5

Tomography

Tomography: reconstructing an image from measurements made along rays

CAT scan: density image, reconstructed from X-ray absorption

Seismic Tomography: velocity image, reconstructed from seismic ray travel times

MRI : proton density image, reconstructed from radio wave emission intensity along lines of constant precession frequency

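[Figure: a ray running from source to receiver]

$\text{data}_{\text{ray } i} = \int_{\text{ray } i} \text{model}(x,y)\, dL$, where $L$ is arc length

Discretize image into pixels or voxels

$\text{data}_{\text{ray } i} = \sum_{\text{voxel } j} \text{model}_j\, L_{ij}$, where $L_{ij}$ is the arc length of ray i in voxel j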

So the data kernel, G, is very simple

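$$G = \begin{bmatrix}
\cdots & \cdots & \cdots & \cdots & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
\cdots & \cdots & L_{ij} & \cdots & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots
\end{bmatrix}$$

where $L_{ij}$ is the arc length of ray i in voxel j

Many elements will be zero: $G_{ij} = 0$ when ray i does not go through voxel j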

the hard parts are:

1. computing the ray paths, if they are more complicated than straight lines

2. book-keeping, e.g. figuring out which rays pass through which voxels
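For straight rays, the bookkeeping in item 2 can be sketched as follows. The helper `ray_row` is hypothetical, and it approximates the arc length in each pixel by sampling many points along the ray rather than computing exact intersections with the grid lines. Pixels are assumed to be unit squares on an nx by ny grid.

```python
import numpy as np

def ray_row(src, rec, nx, ny, n_samples=2000):
    """Approximate one row of G for a straight ray from src to rec.

    Assumption: arc length per pixel is estimated by counting sample points,
    each standing for a sub-segment of length (ray length / n_samples).
    """
    src = np.asarray(src, dtype=float)
    rec = np.asarray(rec, dtype=float)
    length = np.linalg.norm(rec - src)
    s = (np.arange(n_samples) + 0.5) / n_samples       # midpoints of the sub-segments
    pts = src + s[:, None] * (rec - src)
    ix = np.floor(pts[:, 0]).astype(int)
    iy = np.floor(pts[:, 1]).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    k = iy[inside] * nx + ix[inside]                   # pixel (ix, iy) -> vector index
    row = np.zeros(nx * ny)
    np.add.at(row, k, length / n_samples)              # accumulate length per pixel
    return row

nx = ny = 4
G = np.array([
    ray_row((0.0, 0.5), (4.0, 0.5), nx, ny),           # horizontal ray through the first row of pixels
    ray_row((0.0, 0.0), (4.0, 4.0), nx, ny),           # diagonal ray
])
print(np.count_nonzero(G, axis=1))                     # most elements of each row are zero
```

Given a slowness vector `m` on the same pixel ordering, the travel times would then be `G @ m`. An exact implementation would replace the sampling with ray-grid intersection lengths.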

Sample seismic tomography problem: here’s the true model, $m^{true}$

sources and receivers

Note: for the equation Gm=d to be linear, m must be 1/velocity or “slowness”

Straight line ray paths

The true traveltime data, $d^{true}$

In the previous plot, each ray is indexed by its closest distance to the origin, R, and its orientation, $\theta$

[Figure: a ray, with R and $\theta$ marked]

Each ray plots as one point on the image, with its travel time indicated by its color

true model, $m^{true}$

estimated model, $m^{est}$

(solution via damped least squares)

After doubling the station/receiver density …

true model, $m^{true}$

estimated model, $m^{est}$
