collected_slides_5b.pdf
TRANSCRIPT
General pseudo-inverse
if $A$ has SVD $A = U \Sigma V^T$,
$$A^\dagger = V \Sigma^{-1} U^T$$
is the pseudo-inverse or Moore-Penrose inverse of $A$
if $A$ is skinny and full rank,
$$A^\dagger = (A^T A)^{-1} A^T$$
gives the least-squares solution $x_{\mathrm{ls}} = A^\dagger y$
if $A$ is fat and full rank,
$$A^\dagger = A^T (A A^T)^{-1}$$
gives the least-norm solution $x_{\mathrm{ln}} = A^\dagger y$
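As a quick sanity check of these formulas, here is a minimal numpy sketch (the matrix and data are invented for illustration) verifying that the SVD-based pseudo-inverse matches the skinny full-rank formula and the least-squares solution:

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(8, 3)   # skinny, full rank (with probability 1)
y = np.random.randn(8)

# pseudo-inverse via compact SVD: A^+ = V diag(1/sigma_i) U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# skinny full-rank formula: (A^T A)^{-1} A^T
A_pinv2 = np.linalg.inv(A.T @ A) @ A.T

x_ls = A_pinv @ y
assert np.allclose(A_pinv, A_pinv2)
assert np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0])
```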
Full SVD
SVD of $A \in \mathbb{R}^{m \times n}$ with $\mathrm{Rank}(A) = r$:
$$A = U_1 \Sigma_1 V_1^T = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_r^T \end{bmatrix}$$
find $U_2 \in \mathbb{R}^{m \times (m-r)}$, $V_2 \in \mathbb{R}^{n \times (n-r)}$ s.t. $U = [U_1\ U_2] \in \mathbb{R}^{m \times m}$ and $V = [V_1\ V_2] \in \mathbb{R}^{n \times n}$ are orthogonal
add zero rows/cols to $\Sigma_1$ to form $\Sigma \in \mathbb{R}^{m \times n}$:
$$\Sigma = \begin{bmatrix} \Sigma_1 & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix}$$
then we have
$$A = U_1 \Sigma_1 V_1^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}$$
i.e.:
$$A = U \Sigma V^T$$
called full SVD of $A$
(SVD with positive singular values only is called compact SVD)
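In numpy the two forms correspond to the full_matrices flag of np.linalg.svd; a small sketch (example matrix invented for illustration):

```python
import numpy as np

np.random.seed(1)
A = np.random.randn(5, 3) @ np.random.randn(3, 7)  # 5x7, rank <= 3

# full SVD: U is 5x5, V is 7x7, Sigma must be zero-padded to 5x7
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ S @ Vt)

# compact SVD: keep only the r strictly positive singular values
r = np.sum(s > 1e-10)
U1, S1, V1t = U[:, :r], np.diag(s[:r]), Vt[:r, :]
assert np.allclose(A, U1 @ S1 @ V1t)
```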
Image of unit ball under linear transformation
full SVD: $A = U \Sigma V^T$
gives interpretation of $y = Ax$:
rotate (by $V^T$)
stretch along axes by $\sigma_i$ ($\sigma_i = 0$ for $i > r$)
zero-pad (if $m > n$) or truncate (if $m < n$) to get $m$-vector
rotate (by $U$)
Image of unit ball under A
[Figure: unit ball is rotated by $V^T$, stretched by $\Sigma = \mathrm{diag}(2, 0.5)$, then rotated by $U$; the result is an ellipse with semiaxes $2u_1$ and $0.5u_2$]
$\{Ax \mid \|x\| \le 1\}$ is an ellipsoid with principal axes $\sigma_i u_i$.
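A quick numeric illustration of this fact (matrix values chosen arbitrarily): map many unit vectors through $A$ and check that the longest image has norm $\sigma_1$ and points along $u_1$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 0.5]])
U, s, Vt = np.linalg.svd(A)

# image of the unit circle under A
theta = np.linspace(0, 2 * np.pi, 2000)
X = np.vstack([np.cos(theta), np.sin(theta)])   # unit vectors
Y = A @ X

norms = np.linalg.norm(Y, axis=0)
k = np.argmax(norms)
print(norms[k], s[0])                                # both ~ sigma_1
print(np.abs(Y[:, k] / norms[k]), np.abs(U[:, 0]))   # aligned with u_1 (up to sign)
```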
Sensitivity of linear equations to data error
consider $y = Ax$, $A \in \mathbb{R}^{n \times n}$ invertible; of course $x = A^{-1} y$
suppose we have an error or noise in $y$, i.e., $y$ becomes $y + \delta y$
then $x$ becomes $x + \delta x$ with $\delta x = A^{-1} \delta y$
hence we have $\|\delta x\| = \|A^{-1} \delta y\| \le \|A^{-1}\| \, \|\delta y\|$
if $\|A^{-1}\|$ is large,
small errors in $y$ can lead to large errors in $x$
can't solve for $x$ given $y$ (with small errors)
hence, $A$ can be considered singular in practice
a more refined analysis uses relative instead of absolute errors in $x$ and $y$
since $y = Ax$, we also have $\|y\| \le \|A\| \, \|x\|$, hence
$$\frac{\|\delta x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \, \frac{\|\delta y\|}{\|y\|}$$
$$\kappa(A) = \|A\| \, \|A^{-1}\| = \sigma_{\max}(A) / \sigma_{\min}(A)$$
is called the condition number of $A$
we have:
relative error in solution $x$ $\le$ condition number $\times$ relative error in data $y$
or, in terms of # bits of guaranteed accuracy:
# bits accuracy in solution $\approx$ # bits accuracy in data $- \log_2 \kappa$
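A small numpy sketch of this effect (the ill-conditioned matrix is contrived for illustration): perturb $y$ slightly and watch the relative error in $x$ get amplified by up to $\kappa(A)$:

```python
import numpy as np

np.random.seed(2)
# contrived ill-conditioned matrix: singular values 1 and 1e-6
U, _ = np.linalg.qr(np.random.randn(2, 2))
V, _ = np.linalg.qr(np.random.randn(2, 2))
A = U @ np.diag([1.0, 1e-6]) @ V.T

kappa = np.linalg.cond(A)          # sigma_max / sigma_min ~ 1e6

x = np.random.randn(2)
y = A @ x
dy = 1e-8 * np.random.randn(2)     # tiny data error
dx = np.linalg.solve(A, y + dy) - x

rel_in = np.linalg.norm(dy) / np.linalg.norm(y)
rel_out = np.linalg.norm(dx) / np.linalg.norm(x)
print(f"kappa = {kappa:.1e}, amplification = {rel_out / rel_in:.1e}")
```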
State estimation set up
we consider the discrete-time system
$$x(t+1) = A x(t) + B u(t) + w(t), \qquad y(t) = C x(t) + D u(t) + v(t)$$
$w$ is state disturbance or noise
$v$ is sensor noise or error
$A$, $B$, $C$, and $D$ are known
$u$ and $y$ are observed over time interval $[0, t-1]$
$w$ and $v$ are not known, but can be described statistically, or assumed small (e.g., in RMS value)
State estimation problem
state estimation problem: estimate $x(s)$ from
$$u(0), \ldots, u(t-1), \quad y(0), \ldots, y(t-1)$$
$s = 0$: estimate initial state
$s = t-1$: estimate current state
$s = t$: estimate (i.e., predict) next state
an algorithm or system that yields an estimate $\hat{x}(s)$ is called an observer or state estimator
$\hat{x}(s)$ is denoted $\hat{x}(s \mid t-1)$ to show what information the estimate is based on (read: "$\hat{x}(s)$ given $t-1$")
Noiseless case
let's look at finding $x(0)$, with no state or measurement noise:
$$x(t+1) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t)$$
with $x(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^m$, $y(t) \in \mathbb{R}^p$
then we have
$$\begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} = \mathcal{O}_t x(0) + \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix}$$
where
$$\mathcal{O}_t = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{t-1} \end{bmatrix}, \qquad \mathcal{T}_t = \begin{bmatrix} D & 0 & \cdots & & \\ CB & D & 0 & \cdots & \\ \vdots & & \ddots & & \\ CA^{t-2}B & CA^{t-3}B & \cdots & CB & D \end{bmatrix}$$
$\mathcal{O}_t$ maps initial state into resulting output over $[0, t-1]$
$\mathcal{T}_t$ maps input to output over $[0, t-1]$
hence we have
$$\mathcal{O}_t x(0) = \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix}$$
RHS is known, $x(0)$ is to be determined
hence:
can uniquely determine $x(0)$ if and only if $N(\mathcal{O}_t) = \{0\}$
$N(\mathcal{O}_t)$ gives ambiguity in determining $x(0)$
if $x(0) \in N(\mathcal{O}_t)$ and $u = 0$, output is zero over interval $[0, t-1]$
input $u$ does not affect ability to determine $x(0)$; its effect can be subtracted out
Observability matrix
by the Cayley-Hamilton theorem, each $A^k$ is a linear combination of $A^0, \ldots, A^{n-1}$
hence for $t \ge n$, $N(\mathcal{O}_t) = N(\mathcal{O})$ where
$$\mathcal{O} = \mathcal{O}_n = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}$$
is called the observability matrix
if $x(0)$ can be deduced from $u$ and $y$ over $[0, t-1]$ for any $t$, then $x(0)$ can be deduced from $u$ and $y$ over $[0, n-1]$
$N(\mathcal{O})$ is called the unobservable subspace; describes ambiguity in determining state from input and output
system is called observable if $N(\mathcal{O}) = \{0\}$, i.e., $\mathrm{Rank}(\mathcal{O}) = n$
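A minimal sketch of this rank test (the $A$ and $C$ matrices below are invented examples): stack $C, CA, \ldots, CA^{n-1}$ and check the rank:

```python
import numpy as np

def obsv(A, C):
    """Observability matrix [C; CA; ...; CA^{n-1}]."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # discrete double integrator
C_pos = np.array([[1.0, 0.0]])       # position sensor
C_vel = np.array([[0.0, 1.0]])       # velocity sensor only

print(np.linalg.matrix_rank(obsv(A, C_pos)))  # 2 -> observable
print(np.linalg.matrix_rank(obsv(A, C_vel)))  # 1 -> position unobservable
```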
Observers for noiseless case
suppose $\mathrm{Rank}(\mathcal{O}_t) = n$ (i.e., system is observable) and let $F$ be any left inverse of $\mathcal{O}_t$, i.e., $F \mathcal{O}_t = I$
then we have the observer
$$\hat{x}(0) = F \left( \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix} \right)$$
which deduces $x(0)$ (exactly) from $u$, $y$ over $[0, t-1]$
in fact we have
$$\hat{x}(\tau - t + 1) = F \left( \begin{bmatrix} y(\tau - t + 1) \\ \vdots \\ y(\tau) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(\tau - t + 1) \\ \vdots \\ u(\tau) \end{bmatrix} \right)$$
i.e., our observer estimates what the state was $t-1$ epochs ago, given the past $t-1$ inputs & outputs
observer is a (multi-input, multi-output) finite impulse response (FIR) filter, with inputs $u$ and $y$, and output $\hat{x}$
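To make this concrete, here is a small simulation sketch (system matrices invented for illustration): build $\mathcal{O}_t$ and $\mathcal{T}_t$, simulate, then recover $x(0)$ exactly with $F = \mathcal{O}_t^\dagger$:

```python
import numpy as np

np.random.seed(3)
n, m, p, t = 3, 1, 1, 6
A = np.random.randn(n, n)
B = np.random.randn(n, m)
C = np.random.randn(p, n)
D = np.zeros((p, m))

# simulate with unknown initial state
x0 = np.random.randn(n)
u = np.random.randn(t, m)
x, ys = x0.copy(), []
for k in range(t):
    ys.append(C @ x + D @ u[k])
    x = A @ x + B @ u[k]
Y = np.concatenate(ys)

# O_t and block-Toeplitz T_t
Ot = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(t)])
Tt = np.zeros((p * t, m * t))
for i in range(t):
    for j in range(i + 1):
        blk = D if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ B
        Tt[i*p:(i+1)*p, j*m:(j+1)*m] = blk

x0_hat = np.linalg.pinv(Ot) @ (Y - Tt @ u.flatten())
print(np.allclose(x0_hat, x0))   # True: exact recovery in the noiseless case
```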
rewrite as
$$\mathcal{O} x = \begin{bmatrix} y \\ \dot{y} \\ \vdots \\ y^{(n-1)} \end{bmatrix} - \mathcal{T} \begin{bmatrix} u \\ \dot{u} \\ \vdots \\ u^{(n-1)} \end{bmatrix}$$
RHS is known; $x$ is to be determined
hence if $N(\mathcal{O}) = \{0\}$ we can deduce $x(t)$ from derivatives of $u(t)$, $y(t)$ up to order $n-1$
in this case we say the system is observable
can construct an observer using any left inverse $F$ of $\mathcal{O}$:
$$\hat{x} = F \left( \begin{bmatrix} y \\ \dot{y} \\ \vdots \\ y^{(n-1)} \end{bmatrix} - \mathcal{T} \begin{bmatrix} u \\ \dot{u} \\ \vdots \\ u^{(n-1)} \end{bmatrix} \right)$$
reconstructs $x(t)$ (exactly and instantaneously) from
$$u(t), \ldots, u^{(n-1)}(t), \quad y(t), \ldots, y^{(n-1)}(t)$$
derivative-based state reconstruction is the dual of state transfer using impulsive inputs
A converse
suppose $z \in N(\mathcal{O})$ (the unobservable subspace), and $u$ is any input, with $x$, $y$ the corresponding state and output, i.e.,
$$\dot{x} = Ax + Bu, \qquad y = Cx + Du$$
then the state trajectory $\tilde{x} = x + e^{tA} z$ satisfies
$$\dot{\tilde{x}} = A\tilde{x} + Bu, \qquad y = C\tilde{x} + Du$$
i.e., input/output signals $u$, $y$ are consistent with both state trajectories $x$, $\tilde{x}$
hence if the system is unobservable, no signal processing of any kind applied to $u$ and $y$ can deduce $x$
unobservable subspace $N(\mathcal{O})$ gives fundamental ambiguity in deducing $x$ from $u$, $y$
interpretation: $\hat{x}_{\mathrm{ls}}(0)$ minimizes discrepancy between
output $\hat{y}$ that would be observed, with input $u$ and initial state $\hat{x}(0)$ (and no sensor noise), and
output $y$ that was observed,
measured as
$$\sum_{\tau=0}^{t-1} \|\hat{y}(\tau) - y(\tau)\|^2$$
can express least-squares initial state estimate as
$$\hat{x}_{\mathrm{ls}}(0) = \left( \sum_{\tau=0}^{t-1} (A^T)^\tau C^T C A^\tau \right)^{-1} \sum_{\tau=0}^{t-1} (A^T)^\tau C^T \tilde{y}(\tau)$$
where $\tilde{y}$ is the observed output with the portion due to the input subtracted: $\tilde{y} = y - h * u$ where $h$ is the impulse response
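A sketch of this formula in numpy (matrices and noise level invented; taking $u = 0$ so $\tilde{y} = y$): estimate $x(0)$ from noisy outputs via the Gramian sum:

```python
import numpy as np

np.random.seed(4)
n, p, t = 3, 2, 20
A = np.random.randn(n, n) / np.sqrt(n)   # keep dynamics mild
C = np.random.randn(p, n)
x0 = np.random.randn(n)

# outputs with sensor noise (u = 0, so y_tilde = y)
Ak = np.eye(n)
ys = []
for _ in range(t):
    ys.append(C @ Ak @ x0 + 0.05 * np.random.randn(p))
    Ak = A @ Ak

# Gramian sum Q and weighted output sum r
Q = np.zeros((n, n))
r = np.zeros(n)
Ak = np.eye(n)
for tau in range(t):
    Q += Ak.T @ C.T @ C @ Ak      # (A^T)^tau C^T C A^tau
    r += Ak.T @ C.T @ ys[tau]     # (A^T)^tau C^T y(tau)
    Ak = A @ Ak

x0_ls = np.linalg.solve(Q, r)
print(np.linalg.norm(x0_ls - x0))   # small estimation error
```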
set of possible estimation errors is ellipsoid
$$\tilde{x}(0) \in \mathcal{E}_{\mathrm{unc}} = \left\{ \mathcal{O}_t^\dagger \begin{bmatrix} v(0) \\ \vdots \\ v(t-1) \end{bmatrix} \;\middle|\; \frac{1}{t} \sum_{\tau=0}^{t-1} \|v(\tau)\|^2 \le \alpha^2 \right\}$$
$\mathcal{E}_{\mathrm{unc}}$ is uncertainty ellipsoid for $x(0)$ (least-squares gives best, i.e., smallest, $\mathcal{E}_{\mathrm{unc}}$)
shape of uncertainty ellipsoid determined by matrix
$$\left( \mathcal{O}_t^T \mathcal{O}_t \right)^{-1} = \left( \sum_{\tau=0}^{t-1} (A^T)^\tau C^T C A^\tau \right)^{-1}$$
maximum norm of error is
$$\|\hat{x}_{\mathrm{ls}}(0) - x(0)\| \le \alpha \sqrt{t} \, \|\mathcal{O}_t^\dagger\|$$
Continuous-time least-squares state estimation
assume $\dot{x} = Ax + Bu$, $y = Cx + Du + v$ is observable
least-squares estimate of initial state $x(0)$, given $u(\tau)$, $y(\tau)$, $0 \le \tau \le t$: choose $\hat{x}_{\mathrm{ls}}(0)$ to minimize integral square residual
$$J = \int_0^t \|\tilde{y}(\tau) - C e^{A\tau} x(0)\|^2 \, d\tau$$
where $\tilde{y} = y - h * u$ is the observed output minus the part due to the input
let's expand as $J = x(0)^T Q x(0) + 2 r^T x(0) + s$,
$$Q = \int_0^t e^{A^T \tau} C^T C e^{A\tau} \, d\tau, \qquad r = -\int_0^t e^{A^T \tau} C^T \tilde{y}(\tau) \, d\tau, \qquad s = \int_0^t \tilde{y}(\tau)^T \tilde{y}(\tau) \, d\tau$$
setting $\nabla_{x(0)} J$ to zero, we obtain the least-squares observer
$$\hat{x}_{\mathrm{ls}}(0) = -Q^{-1} r = \left( \int_0^t e^{A^T \tau} C^T C e^{A\tau} \, d\tau \right)^{-1} \int_0^t e^{A^T \tau} C^T \tilde{y}(\tau) \, d\tau$$
estimation error is
$$\tilde{x}(0) = \hat{x}_{\mathrm{ls}}(0) - x(0) = \left( \int_0^t e^{A^T \tau} C^T C e^{A\tau} \, d\tau \right)^{-1} \int_0^t e^{A^T \tau} C^T v(\tau) \, d\tau$$
therefore if $v = 0$ then $\hat{x}_{\mathrm{ls}}(0) = x(0)$
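A rough numerical sketch of the Gramian $Q$ (system matrices invented; uses a simple Riemann sum with scipy's matrix exponential, accurate enough for illustration):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, -0.2]])    # lightly damped oscillator (example)
C = np.array([[1.0, 0.0]])
t_final, steps = 5.0, 2000
dt = t_final / steps

# Q = int_0^t expm(A^T tau) C^T C expm(A tau) dtau, by Riemann sum
Q = np.zeros((2, 2))
for k in range(steps):
    E = expm(A * (k * dt))
    Q += E.T @ C.T @ C @ E * dt

# Q invertible <=> x(0) recoverable from this output record
print(np.linalg.eigvalsh(Q))    # both eigenvalues > 0
```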
Given examples $\{x_i, y_i\}_{i=1}^N$ we want to model the relation between $x_i$ and $y_i$ as $y_i \approx A x_i$. Define the estimation problem as:
$$\arg\min_A \sum_{i=1}^N \|y_i - A x_i\|^2 = \sum_{i=1}^N \left( x_i^T A^T A x_i - 2 y_i^T A x_i + y_i^T y_i \right)$$
we differentiate w.r.t. $A$ and set to 0:
$$\nabla_A \sum_{i=1}^N \|y_i - A x_i\|^2 = 0 \;\Longrightarrow\; \sum_{i=1}^N \left( 2 A x_i x_i^T - 2 y_i x_i^T \right) = 0 \;\Longrightarrow\; A \sum_{i=1}^N x_i x_i^T = \sum_{i=1}^N y_i x_i^T$$
If $\sum_{i=1}^N x_i x_i^T$ is full rank (requires $N \ge n$), then
$$A \sum_{i=1}^N x_i x_i^T = \sum_{i=1}^N y_i x_i^T \;\Longrightarrow\; A = \left( \sum_{i=1}^N y_i x_i^T \right) \left( \sum_{i=1}^N x_i x_i^T \right)^{-1}$$
E.g., we'd like to estimate $A$ in the system $x_{t+1} = A x_t + w$ ($w$ is noise). To solve, simply replace $y_i$ with $x_{i+1}$ in the above solution.
Note that this would also be the most likely $A$ if $w$ were Gaussian noise with zero mean and unit variance.
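A minimal sketch of this identification procedure (true dynamics and noise level invented for illustration):

```python
import numpy as np

np.random.seed(5)
n, T = 2, 200
A_true = np.array([[0.9, 0.2],
                   [-0.1, 0.8]])   # stable example dynamics

# simulate x_{t+1} = A x_t + w with small Gaussian noise
X = np.zeros((T + 1, n))
X[0] = np.random.randn(n)
for t in range(T):
    X[t + 1] = A_true @ X[t] + 0.1 * np.random.randn(n)

# normal equations: A (sum x_i x_i^T) = sum x_{i+1} x_i^T
Sxx = X[:-1].T @ X[:-1]           # sum of x_i x_i^T  (rows of X are x_i^T)
Syx = X[1:].T @ X[:-1]            # sum of x_{i+1} x_i^T
A_hat = Syx @ np.linalg.inv(Sxx)

print(np.linalg.norm(A_hat - A_true, 'fro') ** 2)   # small squared error
```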
Left: Phase plane, values of $x_t$ where $w \sim N(0, I)$.
Right: Squared error between true and estimated $A$ as a function of step number; error $= \sum_{i,j} (A_{ij} - \hat{A}_{ij})^2$ is the squared Frobenius norm $\|A - \hat{A}\|_F^2$.
[Figure: left panel, phase plane of $x_{t+1} = A x_t + w$; right panel, squared error $\|A - \hat{A}\|^2$ vs. time over 100 steps]
EE363 Winter 2005-06
Lecture 6
Estimation
Gaussian random vectors
minimum mean-square estimation (MMSE)
MMSE with linear measurements
relation to least-squares, pseudo-inverse
Gaussian random vectors
random vector $x \in \mathbb{R}^n$ is Gaussian if it has density
$$p_x(v) = (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left( -\tfrac{1}{2} (v - \bar{x})^T \Sigma^{-1} (v - \bar{x}) \right),$$
for some $\Sigma = \Sigma^T > 0$, $\bar{x} \in \mathbb{R}^n$
denoted $x \sim N(\bar{x}, \Sigma)$
$\bar{x} \in \mathbb{R}^n$ is the mean or expected value of $x$, i.e.,
$$\bar{x} = \mathbf{E}\, x = \int v \, p_x(v) \, dv$$
$\Sigma = \Sigma^T > 0$ is the covariance matrix of $x$, i.e.,
$$\Sigma = \mathbf{E}(x - \bar{x})(x - \bar{x})^T$$
$$= \mathbf{E}\, x x^T - \bar{x}\bar{x}^T = \int (v - \bar{x})(v - \bar{x})^T p_x(v) \, dv$$
density for $x \sim N(0, 1)$:
[Figure: plot of the density $p_x(v) = \frac{1}{\sqrt{2\pi}} e^{-v^2/2}$ over $-4 \le v \le 4$]
mean and variance of scalar random variable $x_i$ are
$$\mathbf{E}\, x_i = \bar{x}_i, \qquad \mathbf{E}(x_i - \bar{x}_i)^2 = \Sigma_{ii}$$
hence standard deviation of $x_i$ is $\sqrt{\Sigma_{ii}}$
covariance between $x_i$ and $x_j$ is $\mathbf{E}(x_i - \bar{x}_i)(x_j - \bar{x}_j) = \Sigma_{ij}$
correlation coefficient between $x_i$ and $x_j$ is $\rho_{ij} = \Sigma_{ij} / \sqrt{\Sigma_{ii} \Sigma_{jj}}$
mean (norm) square deviation of $x$ from $\bar{x}$ is
$$\mathbf{E}\, \|x - \bar{x}\|^2 = \mathbf{E}\, \mathrm{Tr}(x - \bar{x})(x - \bar{x})^T = \mathrm{Tr}\, \Sigma = \sum_{i=1}^n \Sigma_{ii}$$
(using $\mathrm{Tr}\, AB = \mathrm{Tr}\, BA$)
example: $x \sim N(0, I)$ means $x_i$ are independent identically distributed (IID) $N(0, 1)$ random variables
geometrically:
mean $\bar{x}$ gives center of ellipsoid
semiaxes are $\sqrt{\alpha \lambda_i}\, u_i$, where $u_i$ are (orthonormal) eigenvectors of $\Sigma$ with eigenvalues $\lambda_i$
example: $x \sim N(\bar{x}, \Sigma)$ with $\bar{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$
$x_1$ has mean 2, std. dev. $\sqrt{2}$; $x_2$ has mean 1, std. dev. 1
correlation coefficient between $x_1$ and $x_2$ is $\rho = 1/\sqrt{2}$
$\mathbf{E}\, \|x - \bar{x}\|^2 = 3$
90% confidence ellipsoid corresponds to $\alpha = 4.6$:
[Figure: scatter of 100 samples in the $(x_1, x_2)$ plane with the $\alpha = 4.6$ confidence ellipsoid]
(here, 91 out of 100 fall in $\mathcal{E}_{4.6}$)
Affine transformation
suppose $x \sim N(\bar{x}, \Sigma_x)$
consider affine transformation of $x$:
$$z = Ax + b,$$
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$
then $z$ is Gaussian, with mean
$$\mathbf{E}\, z = \mathbf{E}(Ax + b) = A \, \mathbf{E}\, x + b = A\bar{x} + b$$
and covariance
$$\Sigma_z = \mathbf{E}(z - \bar{z})(z - \bar{z})^T = \mathbf{E}\, A(x - \bar{x})(x - \bar{x})^T A^T = A \Sigma_x A^T$$
examples:
if $w \sim N(0, I)$ then $x = \Sigma^{1/2} w + \bar{x}$ is $N(\bar{x}, \Sigma)$
useful for simulating vectors with given mean and covariance
conversely, if $x \sim N(\bar{x}, \Sigma)$ then $z = \Sigma^{-1/2}(x - \bar{x})$ is $N(0, I)$
(normalizes & decorrelates)
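A short sketch of this simulation trick (mean and covariance reused from the example above): generate samples via a matrix square root and check the empirical moments:

```python
import numpy as np
from scipy.linalg import sqrtm

xbar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

# x = Sigma^{1/2} w + xbar with w ~ N(0, I)
N = 100_000
W = np.random.randn(N, 2)
X = W @ sqrtm(Sigma).real + xbar   # rows are samples

print(X.mean(axis=0))              # ~ [2, 1]
print(np.cov(X.T))                 # ~ Sigma
```

A Cholesky factor ($\Sigma = LL^T$, `np.linalg.cholesky`) works equally well in place of the symmetric square root.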
Degenerate Gaussian vectors
it is convenient to allow $\Sigma$ to be singular (but still $\Sigma = \Sigma^T \ge 0$)
(in this case the density formula obviously does not hold)
meaning: in some directions $x$ is not random at all
write $\Sigma$ as
$$\Sigma = \begin{bmatrix} Q_+ & Q_0 \end{bmatrix} \begin{bmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_+ & Q_0 \end{bmatrix}^T$$
where $Q = [Q_+\ Q_0]$ is orthogonal, $\Sigma_+ > 0$
columns of $Q_0$ are orthonormal basis for $N(\Sigma)$
columns of $Q_+$ are orthonormal basis for $\mathrm{range}(\Sigma)$
then $Q^T x = [z^T \; w^T]^T$, where
$z \sim N(Q_+^T \bar{x}, \Sigma_+)$ is (nondegenerate) Gaussian (hence, density formula holds)
$w = Q_0^T x$ is not random
($Q_0^T x$ is called the deterministic component of $x$)
Linear measurements
linear measurements with noise:
$$y = Ax + v$$
$x \in \mathbb{R}^n$ is what we want to measure or estimate
$y \in \mathbb{R}^m$ is measurement
$A \in \mathbb{R}^{m \times n}$ characterizes sensors or measurements
$v$ is sensor noise
common assumptions:
$x \sim N(\bar{x}, \Sigma_x)$
$v \sim N(\bar{v}, \Sigma_v)$
$x$ and $v$ are independent
$N(\bar{x}, \Sigma_x)$ is the prior distribution of $x$ (describes initial uncertainty about $x$)
$\bar{v}$ is noise bias or offset (and is usually 0)
$\Sigma_v$ is noise covariance
Minimum mean-square estimation
suppose $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are random vectors (not necessarily Gaussian)
we seek to estimate $x$ given $y$
thus we seek a function $\phi : \mathbb{R}^m \to \mathbb{R}^n$ such that $\hat{x} = \phi(y)$ is near $x$
one common measure of nearness: mean-square error,
$$\mathbf{E}\, \|\phi(y) - x\|^2$$
minimum mean-square estimator (MMSE) $\phi_{\mathrm{mmse}}$ minimizes this quantity
general solution: $\phi_{\mathrm{mmse}}(y) = \mathbf{E}(x \mid y)$, i.e., the conditional expectation of $x$ given $y$
MMSE for Gaussian vectors
now suppose $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are jointly Gaussian:
$$\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} \bar{x} \\ \bar{y} \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{xy}^T & \Sigma_y \end{bmatrix} \right)$$
(after a lot of algebra) the conditional density is
$$p_{x|y}(v \mid y) = (2\pi)^{-n/2} (\det \Lambda)^{-1/2} \exp\left( -\tfrac{1}{2} (v - w)^T \Lambda^{-1} (v - w) \right),$$
where
$$\Lambda = \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{xy}^T, \qquad w = \bar{x} + \Sigma_{xy} \Sigma_y^{-1} (y - \bar{y})$$
hence the MMSE estimator (i.e., conditional expectation) is
$$\hat{x} = \phi_{\mathrm{mmse}}(y) = \mathbf{E}(x \mid y) = \bar{x} + \Sigma_{xy} \Sigma_y^{-1} (y - \bar{y})$$
$\phi_{\mathrm{mmse}}$ is an affine function
MMSE estimation error, $\hat{x} - x$, is a Gaussian random vector:
$$\hat{x} - x \sim N(0, \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{xy}^T)$$
note that
$$\Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{xy}^T \le \Sigma_x$$
i.e., covariance of estimation error is always less than prior covariance of $x$
Best linear unbiased estimator
estimator
$$\hat{x} = \phi_{\mathrm{blu}}(y) = \bar{x} + \Sigma_{xy} \Sigma_y^{-1} (y - \bar{y})$$
makes sense when $x$, $y$ aren't jointly Gaussian
this estimator
is unbiased, i.e., $\mathbf{E}\, \hat{x} = \mathbf{E}\, x$
often works well
is widely used
has minimum mean square error among all affine estimators
sometimes called best linear unbiased estimator
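A small Monte Carlo sketch of this estimator (the joint distribution is invented for illustration): apply $\hat{x} = \bar{x} + \Sigma_{xy} \Sigma_y^{-1}(y - \bar{y})$ in the scalar case and compare its mean-square error to the naive guess $\bar{x}$:

```python
import numpy as np

np.random.seed(6)
# invented jointly Gaussian pair: y carries information about x
N = 200_000
x = np.random.randn(N)
y = x + 0.5 * np.random.randn(N)      # y = x + noise

xbar, ybar = x.mean(), y.mean()
Sxy = np.cov(x, y)[0, 1]
Sy = y.var()

x_hat = xbar + Sxy / Sy * (y - ybar)  # BLUE
print(np.mean((x_hat - x) ** 2))      # ~ 0.2 = Sigma_x - Sxy^2/Sy
print(np.mean((xbar - x) ** 2))       # ~ 1.0 (prior variance)
```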
MMSE with linear measurements
consider specific case
$$y = Ax + v, \qquad x \sim N(\bar{x}, \Sigma_x), \qquad v \sim N(\bar{v}, \Sigma_v),$$
$x$, $v$ independent
MMSE of $x$ given $y$ is affine function
$$\hat{x} = \bar{x} + B(y - \bar{y})$$
where $B = \Sigma_x A^T (A \Sigma_x A^T + \Sigma_v)^{-1}$, $\bar{y} = A\bar{x} + \bar{v}$
interpretation:
$\bar{x}$ is our best prior guess of $x$ (before measurement)
$y - \bar{y}$ is the discrepancy between what we actually measure ($y$) and the expected value of what we measure ($\bar{y}$)
estimator modifies prior guess by $B$ times this discrepancy
estimator blends prior information with measurement
$B$ gives gain from observed discrepancy to estimate
$B$ is small if noise term $\Sigma_v$ in "denominator" is large
MMSE error with linear measurements
MMSE estimation error, $\tilde{x} = \hat{x} - x$, is Gaussian with zero mean and covariance
$$\Sigma_{\mathrm{est}} = \Sigma_x - \Sigma_x A^T (A \Sigma_x A^T + \Sigma_v)^{-1} A \Sigma_x$$
$\Sigma_{\mathrm{est}} \le \Sigma_x$, i.e., measurement always decreases uncertainty about $x$
difference $\Sigma_x - \Sigma_{\mathrm{est}}$ gives value of measurement $y$ in estimating $x$
e.g., $\left( \Sigma_{\mathrm{est},ii} / \Sigma_{x,ii} \right)^{1/2}$ gives fractional decrease in uncertainty of $x_i$ due to measurement
note: error covariance $\Sigma_{\mathrm{est}}$ can be determined before measurement $y$ is made!
to evaluate $\Sigma_{\mathrm{est}}$, only need to know
$A$ (which characterizes sensors)
prior covariance of $x$ (i.e., $\Sigma_x$)
noise covariance (i.e., $\Sigma_v$)
you do not need to know the measurement $y$ (or the means $\bar{x}$, $\bar{v}$)
useful for experiment design or sensor selection
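A minimal sketch tying this together ($A$, $\Sigma_x$, $\Sigma_v$, and the measurement values are invented): compute the gain $B$ and the error covariance $\Sigma_{\mathrm{est}}$ with no measurement in hand, then use $B$ once $y$ arrives:

```python
import numpy as np

# invented sensor model and priors
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])          # three sensors, two states
Sx = np.diag([4.0, 1.0])            # prior covariance of x
Sv = 0.25 * np.eye(3)               # sensor noise covariance
xbar = np.array([0.0, 0.0])         # prior mean (vbar = 0 here)

# gain and error covariance: computable before any measurement
S = A @ Sx @ A.T + Sv
B = Sx @ A.T @ np.linalg.inv(S)
S_est = Sx - B @ A @ Sx

print(np.diag(S_est))               # posterior variances < prior [4, 1]

# once a measurement arrives:
y = np.array([1.0, 2.0, 1.5])
x_hat = xbar + B @ (y - A @ xbar)
print(x_hat)
```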