collected_slides_5b.pdf


TRANSCRIPT


General pseudo-inverse

if $A$ has SVD $A = U\Sigma V^T$,
\[ A^\dagger = V\Sigma^{-1}U^T \]
is the pseudo-inverse or Moore–Penrose inverse of $A$

if $A$ is skinny and full rank,
\[ A^\dagger = (A^TA)^{-1}A^T \]
gives the least-squares solution $x_{\mathrm{ls}} = A^\dagger y$

if $A$ is fat and full rank,
\[ A^\dagger = A^T(AA^T)^{-1} \]
gives the least-norm solution $x_{\mathrm{ln}} = A^\dagger y$

SVD Applications 16–2
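As a quick numerical check of these formulas, a minimal NumPy sketch (the matrix and $y$ are arbitrary illustrative values, not from the slides):

```python
import numpy as np

# skinny, full-rank A (arbitrary illustrative values)
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# pseudo-inverse from the SVD: A† = V Σ^{-1} U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# skinny full-rank formula: A† = (A^T A)^{-1} A^T
assert np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T)
assert np.allclose(A_pinv, np.linalg.pinv(A))

# x_ls = A† y is the least-squares solution of min ||Ax - y||
y = np.array([1.0, 0.0, 1.0])
assert np.allclose(A_pinv @ y, np.linalg.lstsq(A, y, rcond=None)[0])
```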


Full SVD

SVD of $A \in \mathbf{R}^{m\times n}$ with $\mathrm{rank}(A) = r$:
\[ A = U_1\Sigma_1V_1^T = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_r^T \end{bmatrix} \]

find $U_2 \in \mathbf{R}^{m\times(m-r)}$, $V_2 \in \mathbf{R}^{n\times(n-r)}$ s.t. $U = [U_1\ U_2] \in \mathbf{R}^{m\times m}$ and $V = [V_1\ V_2] \in \mathbf{R}^{n\times n}$ are orthogonal

add zero rows/cols to $\Sigma_1$ to form $\Sigma \in \mathbf{R}^{m\times n}$:
\[ \Sigma = \begin{bmatrix} \Sigma_1 & 0_{r\times(n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{bmatrix} \]

SVD Applications 16–5


then we have
\[ A = U_1\Sigma_1V_1^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0_{r\times(n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} \]
i.e.:
\[ A = U\Sigma V^T \]

called full SVD of $A$

(SVD with positive singular values only called compact SVD)

SVD Applications 16–6
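A short NumPy illustration of full vs. compact SVD, using an arbitrary rank-deficient matrix as an example:

```python
import numpy as np

rng = np.random.default_rng(0)

# a 4x3 matrix of rank 2: third column is the sum of the first two
B = rng.standard_normal((4, 2))
A = np.column_stack([B, B[:, 0] + B[:, 1]])

# full SVD: U is 4x4, V is 3x3, Sigma is 4x3 (zero-padded)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros(A.shape)
Sigma[: len(s), : len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)

# compact SVD keeps only the r positive singular values
r = int(np.sum(s > 1e-10))
assert np.allclose(A, U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :])
```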


Image of unit ball under linear transformation

full SVD: $A = U\Sigma V^T$

gives interpretation of $y = Ax$:

rotate (by $V^T$)

stretch along axes by $\sigma_i$ ($\sigma_i = 0$ for $i > r$)

zero-pad (if $m > n$) or truncate (if $m < n$) to get $m$-vector

rotate (by $U$)

SVD Applications 16–7


Image of unit ball under $A$

[Figure: the unit circle is rotated by $V^T$, stretched by $\Sigma = \mathrm{diag}(2, 0.5)$, then rotated by $U$, giving an ellipse with semiaxes $2u_1$ and $0.5u_2$.]

$\{Ax \mid \|x\| \le 1\}$ is ellipsoid with principal axes $\sigma_i u_i$

SVD Applications 16–8


Sensitivity of linear equations to data error

consider $y = Ax$, $A \in \mathbf{R}^{n\times n}$ invertible; of course $x = A^{-1}y$

suppose we have an error or noise in $y$, i.e., $y$ becomes $y + \delta y$

then $x$ becomes $x + \delta x$ with $\delta x = A^{-1}\delta y$

hence we have $\|\delta x\| = \|A^{-1}\delta y\| \le \|A^{-1}\|\,\|\delta y\|$

if $\|A^{-1}\|$ is large,

small errors in $y$ can lead to large errors in $x$

can't solve for $x$ given $y$ (with small errors)

hence, $A$ can be considered singular in practice

SVD Applications 16–15


a more refined analysis uses relative instead of absolute errors in $x$ and $y$

since $y = Ax$, we also have $\|y\| \le \|A\|\,\|x\|$, hence
\[ \frac{\|\delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|\delta y\|}{\|y\|} \]

$\kappa(A) = \|A\|\,\|A^{-1}\| = \sigma_{\max}(A)/\sigma_{\min}(A)$ is called the condition number of $A$

we have: relative error in solution $x$ $\le$ condition number $\times$ relative error in data $y$

or, in terms of # bits of guaranteed accuracy:
\[ \#\text{ bits accuracy in solution} \approx \#\text{ bits accuracy in data} - \log_2 \kappa \]

SVD Applications 16–16
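A small NumPy sketch of the condition number and the accuracy rule of thumb (the matrix is an illustrative near-singular example):

```python
import numpy as np

# an ill-conditioned 2x2 matrix (nearly dependent rows)
A = np.array([[1.0, 1.0], [1.0, 1.0001]])

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                  # sigma_max / sigma_min
assert np.isclose(kappa, np.linalg.cond(A))

# roughly log2(kappa) bits of accuracy are lost when solving y = Ax
print(f"kappa = {kappa:.1e}, bits lost ~ {np.log2(kappa):.1f}")
```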


State estimation set up

we consider the discrete-time system
\[ x(t+1) = Ax(t) + Bu(t) + w(t), \qquad y(t) = Cx(t) + Du(t) + v(t) \]

$w$ is state disturbance or noise

$v$ is sensor noise or error

$A$, $B$, $C$, and $D$ are known

$u$ and $y$ are observed over time interval $[0, t-1]$

$w$ and $v$ are not known, but can be described statistically, or assumed small (e.g., in RMS value)

Observability and state estimation 19–2


State estimation problem

state estimation problem: estimate $x(s)$ from
\[ u(0), \ldots, u(t-1), \qquad y(0), \ldots, y(t-1) \]

$s = 0$: estimate initial state

$s = t-1$: estimate current state

$s = t$: estimate (i.e., predict) next state

an algorithm or system that yields an estimate $\hat{x}(s)$ is called an observer or state estimator

$\hat{x}(s)$ is denoted $\hat{x}(s|t-1)$ to show what information the estimate is based on (read, "$\hat{x}(s)$ given $t-1$")

Observability and state estimation 19–3


Noiseless case

let's look at finding $x(0)$, with no state or measurement noise:
\[ x(t+1) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t) \]
with $x(t) \in \mathbf{R}^n$, $u(t) \in \mathbf{R}^m$, $y(t) \in \mathbf{R}^p$

then we have
\[ \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} = \mathcal{O}_t x(0) + \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix} \]

Observability and state estimation 19–4


where
\[ \mathcal{O}_t = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{t-1} \end{bmatrix}, \qquad \mathcal{T}_t = \begin{bmatrix} D & 0 & \cdots & & \\ CB & D & 0 & \cdots & \\ \vdots & & \ddots & & \\ CA^{t-2}B & CA^{t-3}B & \cdots & CB & D \end{bmatrix} \]

$\mathcal{O}_t$ maps initial state into resulting output over $[0, t-1]$

$\mathcal{T}_t$ maps input to output over $[0, t-1]$

hence we have
\[ \mathcal{O}_t x(0) = \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix} \]

RHS is known, $x(0)$ is to be determined

Observability and state estimation 19–5


hence:

can uniquely determine $x(0)$ if and only if $N(\mathcal{O}_t) = \{0\}$

$N(\mathcal{O}_t)$ gives ambiguity in determining $x(0)$

if $x(0) \in N(\mathcal{O}_t)$ and $u = 0$, output is zero over interval $[0, t-1]$

input $u$ does not affect ability to determine $x(0)$; its effect can be subtracted out

Observability and state estimation 19–6


Observability matrix

by the Cayley–Hamilton theorem, each $A^k$ is a linear combination of $A^0, \ldots, A^{n-1}$

hence for $t \ge n$, $N(\mathcal{O}_t) = N(\mathcal{O})$ where
\[ \mathcal{O} = \mathcal{O}_n = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \]
is called the observability matrix

if $x(0)$ can be deduced from $u$ and $y$ over $[0, t-1]$ for any $t$, then $x(0)$ can be deduced from $u$ and $y$ over $[0, n-1]$

$N(\mathcal{O})$ is called the unobservable subspace; describes ambiguity in determining state from input and output

system is called observable if $N(\mathcal{O}) = \{0\}$, i.e., $\mathrm{rank}(\mathcal{O}) = n$

Observability and state estimation 19–7
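A minimal NumPy sketch of this rank test (the system matrices are illustrative assumptions, not from the slides):

```python
import numpy as np

def obsv(A, C):
    """Observability matrix O = [C; CA; ...; CA^(n-1)]."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

# toy system (illustrative values)
A = np.array([[0.9, 1.0], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])

O = obsv(A, C)
observable = np.linalg.matrix_rank(O) == A.shape[0]
print("observable:", observable)   # True: rank(O) = n = 2
```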


Observers for noiseless case

suppose $\mathrm{rank}(\mathcal{O}_t) = n$ (i.e., system is observable) and let $F$ be any left inverse of $\mathcal{O}_t$, i.e., $F\mathcal{O}_t = I$

then we have the observer
\[ x(0) = F\left( \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix} \right) \]
which deduces $x(0)$ (exactly) from $u$, $y$ over $[0, t-1]$

in fact we have
\[ x(\tau - t + 1) = F\left( \begin{bmatrix} y(\tau - t + 1) \\ \vdots \\ y(\tau) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(\tau - t + 1) \\ \vdots \\ u(\tau) \end{bmatrix} \right) \]

Observability and state estimation 19–10
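A sketch of this observer in NumPy, using $F = \mathcal{O}_t^\dagger$ as the left inverse (the toy system matrices and horizon are illustrative assumptions):

```python
import numpy as np

def Ot_Tt(A, B, C, D, t):
    """Stack O_t = [C; CA; ...; CA^(t-1)] and the block-Toeplitz T_t."""
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    Ot = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(t)])
    Tt = np.zeros((p * t, m * t))
    for i in range(t):                 # block row i
        Tt[i*p:(i+1)*p, i*m:(i+1)*m] = D
        for j in range(i):             # blocks CA^(i-j-1)B below the diagonal
            Tt[i*p:(i+1)*p, j*m:(j+1)*m] = C @ np.linalg.matrix_power(A, i - j - 1) @ B
    return Ot, Tt

# toy observable system (illustrative values)
A = np.array([[0.9, 1.0], [0.0, 0.8]]); B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]]);             D = np.array([[0.0]])

rng = np.random.default_rng(0)
t, x0 = 5, np.array([1.0, -2.0])
u = rng.standard_normal((t, 1))

# simulate the noiseless system and record the outputs
x, Y = x0.copy(), []
for k in range(t):
    Y.append(C @ x + D @ u[k])
    x = A @ x + B @ u[k]
Y = np.concatenate(Y)

# observer: x(0) = F (Y - T_t U) with F = O_t^+ (a left inverse)
Ot, Tt = Ot_Tt(A, B, C, D, t)
x0_hat = np.linalg.pinv(Ot) @ (Y - Tt @ u.ravel())
assert np.allclose(x0_hat, x0)   # exact recovery in the noiseless case
```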


i.e., our observer estimates what the state was $t-1$ epochs ago, given the past $t-1$ inputs & outputs

observer is a (multi-input, multi-output) finite impulse response (FIR) filter, with inputs $u$ and $y$, and output $\hat{x}$

Observability and state estimation 19–11


rewrite as
\[ \mathcal{O}x = \begin{bmatrix} y \\ \dot y \\ \vdots \\ y^{(n-1)} \end{bmatrix} - \mathcal{T} \begin{bmatrix} u \\ \dot u \\ \vdots \\ u^{(n-1)} \end{bmatrix} \]

RHS is known; $x$ is to be determined

hence if $N(\mathcal{O}) = \{0\}$ we can deduce $x(t)$ from derivatives of $u(t)$, $y(t)$ up to order $n-1$

in this case we say the system is observable

can construct an observer using any left inverse $F$ of $\mathcal{O}$:
\[ x = F\left( \begin{bmatrix} y \\ \dot y \\ \vdots \\ y^{(n-1)} \end{bmatrix} - \mathcal{T} \begin{bmatrix} u \\ \dot u \\ \vdots \\ u^{(n-1)} \end{bmatrix} \right) \]

Observability and state estimation 19–15


reconstructs $x(t)$ (exactly and instantaneously) from
\[ u(t), \ldots, u^{(n-1)}(t), \qquad y(t), \ldots, y^{(n-1)}(t) \]

derivative-based state reconstruction is dual of state transfer using impulsive inputs

Observability and state estimation 19–16


A converse

suppose $z \in N(\mathcal{O})$ (the unobservable subspace), and $u$ is any input, with $x$, $y$ the corresponding state and output, i.e.,
\[ \dot x = Ax + Bu, \qquad y = Cx + Du \]

then state trajectory $\tilde x = x + e^{tA}z$ satisfies
\[ \dot{\tilde x} = A\tilde x + Bu, \qquad y = C\tilde x + Du \]

i.e., input/output signals $u$, $y$ consistent with both state trajectories $x$, $\tilde x$

hence if system is unobservable, no signal processing of any kind applied to $u$ and $y$ can deduce $x$

unobservable subspace $N(\mathcal{O})$ gives fundamental ambiguity in deducing $x$ from $u$, $y$

Observability and state estimation 19–17


interpretation: $\hat{x}_{\mathrm{ls}}(0)$ minimizes discrepancy between

output $\hat y$ that would be observed, with input $u$ and initial state $\hat x(0)$ (and no sensor noise), and

output $y$ that was observed,

measured as
\[ \sum_{\tau=0}^{t-1} \|\hat y(\tau) - y(\tau)\|^2 \]

can express least-squares initial state estimate as
\[ \hat{x}_{\mathrm{ls}}(0) = \left( \sum_{\tau=0}^{t-1} (A^T)^\tau C^T C A^\tau \right)^{-1} \sum_{\tau=0}^{t-1} (A^T)^\tau C^T \tilde y(\tau) \]
where $\tilde y$ is observed output with portion due to input subtracted: $\tilde y = y - h * u$ where $h$ is impulse response

Observability and state estimation 19–19
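A NumPy sketch of this summation formula, taking $u = 0$ so that $\tilde y = y$; with sensor noise the recovery is approximate rather than exact (the system and noise level are illustrative assumptions):

```python
import numpy as np

# toy observable system with sensor noise (illustrative values)
A = np.array([[0.9, 1.0], [0.0, 0.8]]); C = np.array([[1.0, 0.0]])
t, x0 = 50, np.array([1.0, -2.0])

# simulate with u = 0, so ytilde(k) = y(k) = C A^k x0 + v(k)
rng = np.random.default_rng(0)
ytilde = [C @ np.linalg.matrix_power(A, k) @ x0 + 0.1 * rng.standard_normal(1)
          for k in range(t)]

# x_ls(0) = (sum (A^T)^k C^T C A^k)^(-1) sum (A^T)^k C^T ytilde(k)
Q = sum(np.linalg.matrix_power(A.T, k) @ C.T @ C @ np.linalg.matrix_power(A, k)
        for k in range(t))
r = sum(np.linalg.matrix_power(A.T, k) @ C.T @ ytilde[k] for k in range(t))
x0_ls = np.linalg.solve(Q, r)
print(x0_ls)   # close to x0 = [1, -2]
```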


set of possible estimation errors is ellipsoid
\[ \tilde x(0) \in \mathcal{E}_{\mathrm{unc}} = \left\{ \mathcal{O}_t^\dagger \begin{bmatrix} v(0) \\ \vdots \\ v(t-1) \end{bmatrix} \;\middle|\; \frac{1}{t}\sum_{\tau=0}^{t-1} \|v(\tau)\|^2 \le \alpha^2 \right\} \]

$\mathcal{E}_{\mathrm{unc}}$ is uncertainty ellipsoid for $x(0)$ (least-squares gives best $\mathcal{E}_{\mathrm{unc}}$)

shape of uncertainty ellipsoid determined by matrix
\[ \left( \mathcal{O}_t^T \mathcal{O}_t \right)^{-1} = \left( \sum_{\tau=0}^{t-1} (A^T)^\tau C^T C A^\tau \right)^{-1} \]

maximum norm of error is
\[ \|\hat{x}_{\mathrm{ls}}(0) - x(0)\| \le \sqrt{t}\,\alpha\,\|\mathcal{O}_t^\dagger\| \]

Observability and state estimation 19–21


Continuous-time least-squares state estimation

assume $\dot x = Ax + Bu$, $y = Cx + Du + v$ is observable

least-squares estimate of initial state $x(0)$, given $u(\tau)$, $y(\tau)$, $0 \le \tau \le t$: choose $\hat{x}_{\mathrm{ls}}(0)$ to minimize integral square residual
\[ J = \int_0^t \left\| \tilde y(\tau) - Ce^{\tau A}x(0) \right\|^2 d\tau \]
where $\tilde y = y - h * u$ is observed output minus part due to input

let's expand as $J = x(0)^T Q x(0) - 2r^T x(0) + s$,
\[ Q = \int_0^t e^{\tau A^T} C^T C e^{\tau A}\,d\tau, \qquad r = \int_0^t e^{\tau A^T} C^T \tilde y(\tau)\,d\tau, \qquad s = \int_0^t \tilde y(\tau)^T \tilde y(\tau)\,d\tau \]

Observability and state estimation 19–28


setting $\nabla_{x(0)} J$ to zero, we obtain the least-squares observer
\[ \hat{x}_{\mathrm{ls}}(0) = Q^{-1}r = \left( \int_0^t e^{\tau A^T} C^T C e^{\tau A}\,d\tau \right)^{-1} \int_0^t e^{\tau A^T} C^T \tilde y(\tau)\,d\tau \]

estimation error is
\[ \tilde x(0) = \hat{x}_{\mathrm{ls}}(0) - x(0) = \left( \int_0^t e^{\tau A^T} C^T C e^{\tau A}\,d\tau \right)^{-1} \int_0^t e^{\tau A^T} C^T v(\tau)\,d\tau \]

therefore if $v = 0$ then $\hat{x}_{\mathrm{ls}}(0) = x(0)$

Observability and state estimation 19–29


Given examples $\{x_i, y_i\}_{i=1}^N$ we want to model the relation between $x_i$ and $y_i$ as $y_i \approx Ax_i$. Define the estimation problem as:
\[ \hat A = \operatorname*{argmin}_A \sum_{i=1}^N \|y_i - Ax_i\|^2 = \operatorname*{argmin}_A \sum_{i=1}^N \left( x_i^T A^T A x_i - 2y_i^T A x_i + y_i^T y_i \right) \]

we differentiate w.r.t. $A$ and set to 0:
\[ \nabla_A \sum_{i=1}^N \|y_i - Ax_i\|^2 = \sum_{i=1}^N \left( 2Ax_ix_i^T - 2y_ix_i^T \right) = 0 \quad\Longrightarrow\quad A\sum_{i=1}^N x_ix_i^T = \sum_{i=1}^N y_ix_i^T \]


If $\sum_{i=1}^N x_ix_i^T$ is full rank (requires $N \ge n$) then
\[ A\sum_{i=1}^N x_ix_i^T = \sum_{i=1}^N y_ix_i^T \quad\Longrightarrow\quad \hat A = \left( \sum_{i=1}^N y_ix_i^T \right) \left( \sum_{i=1}^N x_ix_i^T \right)^{-1} \]

E.g., we'd like to estimate $A$ in the system $x_{t+1} = Ax_t + w$ ($w$ is noise). To solve, simply replace $y_i$ with $x_{i+1}$ in the above solution.

Note that this would also be the most likely $A$ if $w$ were Gaussian noise with zero mean and unit variance.
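A NumPy sketch of this estimator for the dynamics-identification example (the true $A$ and run length are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.98, 0.1], [-0.1, 0.98]])   # illustrative stable dynamics

# simulate x(t+1) = A x(t) + w,  w ~ N(0, I)
N, n = 500, 2
X = np.zeros((N + 1, n))
for t in range(N):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(n)

# least-squares estimate: A_hat = (sum x_{t+1} x_t^T)(sum x_t x_t^T)^{-1}
Sxy = sum(np.outer(X[t + 1], X[t]) for t in range(N))
Sxx = sum(np.outer(X[t], X[t]) for t in range(N))
A_hat = Sxy @ np.linalg.inv(Sxx)

print(np.sum((A_true - A_hat) ** 2))   # squared Frobenius error, small
```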


Left: Phase plane, values of $x_t$ where $w \sim N(0, I)$.

Right: Squared error between true and estimated $A$ as a function of step number. $\text{error} = \sum_{i,j}(A_{ij} - \hat A_{ij})^2$ is called the (squared) Frobenius norm.

[Figure: left, phase-plane trajectory of $x_{t+1} = Ax_t + w$; right, squared error $\|A - \hat A\|^2$ versus time.]


EE363 Winter 2005-06

Lecture 6: Estimation

Gaussian random vectors

minimum mean-square estimation (MMSE)

MMSE with linear measurements

relation to least-squares, pseudo-inverse

Estimation 6–1


Gaussian random vectors

random vector $x \in \mathbf{R}^n$ is Gaussian if it has density
\[ p_x(v) = (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left( -\tfrac{1}{2}(v - \bar x)^T \Sigma^{-1} (v - \bar x) \right), \]
for some $\Sigma = \Sigma^T > 0$, $\bar x \in \mathbf{R}^n$

denoted $x \sim N(\bar x, \Sigma)$

$\bar x \in \mathbf{R}^n$ is the mean or expected value of $x$, i.e.,
\[ \bar x = \mathbf{E}\,x = \int v\,p_x(v)\,dv \]

$\Sigma = \Sigma^T > 0$ is the covariance matrix of $x$, i.e.,
\[ \Sigma = \mathbf{E}(x - \bar x)(x - \bar x)^T \]

Estimation 6–2


\[ = \mathbf{E}\,xx^T - \bar x\bar x^T = \int (v - \bar x)(v - \bar x)^T p_x(v)\,dv \]

density for $x \sim N(0, 1)$:

[Figure: plot of $p_x(v) = \frac{1}{\sqrt{2\pi}}\,e^{-v^2/2}$ for $-4 \le v \le 4$.]

Estimation 6–3


mean and variance of scalar random variable $x_i$ are
\[ \mathbf{E}\,x_i = \bar x_i, \qquad \mathbf{E}(x_i - \bar x_i)^2 = \Sigma_{ii} \]
hence standard deviation of $x_i$ is $\sqrt{\Sigma_{ii}}$

covariance between $x_i$ and $x_j$ is $\mathbf{E}(x_i - \bar x_i)(x_j - \bar x_j) = \Sigma_{ij}$

correlation coefficient between $x_i$ and $x_j$ is $\rho_{ij} = \dfrac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}}$

mean (norm) square deviation of $x$ from $\bar x$ is
\[ \mathbf{E}\,\|x - \bar x\|^2 = \mathbf{E}\,\mathrm{Tr}\,(x - \bar x)(x - \bar x)^T = \mathrm{Tr}\,\Sigma = \sum_{i=1}^n \Sigma_{ii} \]
(using $\mathrm{Tr}\,AB = \mathrm{Tr}\,BA$)

example: $x \sim N(0, I)$ means $x_i$ are independent identically distributed (IID) $N(0, 1)$ random variables

Estimation 6–4


geometrically:

mean $\bar x$ gives center of ellipsoid

semiaxes are $\sqrt{\alpha\lambda_i}\,u_i$, where $u_i$ are (orthonormal) eigenvectors of $\Sigma$ with eigenvalues $\lambda_i$

Estimation 6–6


example: $x \sim N(\bar x, \Sigma)$ with $\bar x = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$

$x_1$ has mean 2, std. dev. $\sqrt 2$; $x_2$ has mean 1, std. dev. 1

correlation coefficient between $x_1$ and $x_2$ is $\rho = 1/\sqrt 2$

$\mathbf{E}\,\|x - \bar x\|^2 = 3$

90% confidence ellipsoid corresponds to $\alpha = 4.6$:

[Figure: 100 samples and the confidence ellipse in the $(x_1, x_2)$ plane.]

(here, 91 out of 100 fall in $\mathcal{E}_{4.6}$)

Estimation 6–7
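An empirical check of this example by sampling, with the confidence ellipsoid taken as $\mathcal{E}_\alpha = \{x \mid (x - \bar x)^T\Sigma^{-1}(x - \bar x) \le \alpha\}$ (a minimal NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])

# draw 100 samples of x ~ N(xbar, Sigma)
X = rng.multivariate_normal(xbar, Sigma, size=100)

# Mahalanobis distances (x - xbar)^T Sigma^{-1} (x - xbar)
d = X - xbar
mahal = np.einsum("ij,jk,ik->i", d, np.linalg.inv(Sigma), d)
print(np.mean(mahal <= 4.6))   # ~0.9: P(chi^2 with 2 dof <= 4.6) = 0.90
```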


Affine transformation

suppose $x \sim N(\bar x, \Sigma_x)$

consider affine transformation of $x$:
\[ z = Ax + b, \]
where $A \in \mathbf{R}^{m\times n}$, $b \in \mathbf{R}^m$

then $z$ is Gaussian, with mean
\[ \mathbf{E}\,z = \mathbf{E}(Ax + b) = A\,\mathbf{E}\,x + b = A\bar x + b \]
and covariance
\[ \Sigma_z = \mathbf{E}(z - \bar z)(z - \bar z)^T = \mathbf{E}\,A(x - \bar x)(x - \bar x)^T A^T = A\Sigma_x A^T \]

Estimation 6–8


examples:

if $w \sim N(0, I)$ then $x = \Sigma^{1/2}w + \bar x$ is $N(\bar x, \Sigma)$

useful for simulating vectors with given mean and covariance

conversely, if $x \sim N(\bar x, \Sigma)$ then $z = \Sigma^{-1/2}(x - \bar x)$ is $N(0, I)$

(normalizes & decorrelates)

Estimation 6–9
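A NumPy sketch of both directions, computing the symmetric square root $\Sigma^{1/2}$ from the eigendecomposition of $\Sigma$ (the sample size and $\bar x$, $\Sigma$ values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])

# symmetric square root: Sigma^{1/2} = Q diag(sqrt(lambda)) Q^T
lam, Q = np.linalg.eigh(Sigma)
S_half = Q @ np.diag(np.sqrt(lam)) @ Q.T

# x = Sigma^{1/2} w + xbar with w ~ N(0, I) gives x ~ N(xbar, Sigma)
W = rng.standard_normal((100000, 2))
X = W @ S_half + xbar            # S_half is symmetric
print(X.mean(axis=0))            # ~ xbar
print(np.cov(X, rowvar=False))   # ~ Sigma

# conversely z = Sigma^{-1/2}(x - xbar) normalizes and decorrelates
Z = (X - xbar) @ np.linalg.inv(S_half)
print(np.cov(Z, rowvar=False))   # ~ I
```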


Degenerate Gaussian vectors

it is convenient to allow $\Sigma$ to be singular (but still $\Sigma = \Sigma^T \ge 0$)

(in this case density formula obviously does not hold)

meaning: in some directions $x$ is not random at all

write $\Sigma$ as
\[ \Sigma = \begin{bmatrix} Q_+ & Q_0 \end{bmatrix} \begin{bmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_+ & Q_0 \end{bmatrix}^T \]
where $Q = [Q_+\ Q_0]$ is orthogonal, $\Sigma_+ > 0$

columns of $Q_0$ are orthonormal basis for $N(\Sigma)$

columns of $Q_+$ are orthonormal basis for $\mathrm{range}(\Sigma)$

Estimation 6–11


then $Q^Tx = [z^T\ w^T]^T$, where

$z \sim N(Q_+^T\bar x, \Sigma_+)$ is (nondegenerate) Gaussian (hence, density formula holds)

$w = Q_0^Tx$ is not random

($Q_0^Tx$ is called the deterministic component of $x$)

Estimation 6–12


Linear measurements

linear measurements with noise:
\[ y = Ax + v \]

$x \in \mathbf{R}^n$ is what we want to measure or estimate

$y \in \mathbf{R}^m$ is measurement

$A \in \mathbf{R}^{m\times n}$ characterizes sensors or measurements

$v$ is sensor noise

Estimation 6–13


common assumptions:

$x \sim N(\bar x, \Sigma_x)$

$v \sim N(\bar v, \Sigma_v)$

$x$ and $v$ are independent

$N(\bar x, \Sigma_x)$ is the prior distribution of $x$ (describes initial uncertainty about $x$)

$\bar v$ is noise bias or offset (and is usually 0)

$\Sigma_v$ is noise covariance

Estimation 6–14


Minimum mean-square estimation

suppose $x \in \mathbf{R}^n$ and $y \in \mathbf{R}^m$ are random vectors (not necessarily Gaussian)

we seek to estimate $x$ given $y$

thus we seek a function $\phi : \mathbf{R}^m \to \mathbf{R}^n$ such that $\hat x = \phi(y)$ is near $x$

one common measure of nearness: mean-square error,
\[ \mathbf{E}\,\|\phi(y) - x\|^2 \]

minimum mean-square estimator (MMSE) $\phi_{\mathrm{mmse}}$ minimizes this quantity

general solution: $\phi_{\mathrm{mmse}}(y) = \mathbf{E}(x|y)$, i.e., the conditional expectation of $x$ given $y$

Estimation 6–17


MMSE for Gaussian vectors

now suppose $x \in \mathbf{R}^n$ and $y \in \mathbf{R}^m$ are jointly Gaussian:
\[ \begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} \bar x \\ \bar y \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{xy}^T & \Sigma_y \end{bmatrix} \right) \]

(after a lot of algebra) the conditional density is
\[ p_{x|y}(v|y) = (2\pi)^{-n/2} (\det \Lambda)^{-1/2} \exp\left( -\tfrac{1}{2}(v - w)^T \Lambda^{-1} (v - w) \right), \]
where
\[ \Lambda = \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{xy}^T, \qquad w = \bar x + \Sigma_{xy}\Sigma_y^{-1}(y - \bar y) \]

hence MMSE estimator (i.e., conditional expectation) is
\[ \hat x = \phi_{\mathrm{mmse}}(y) = \mathbf{E}(x|y) = \bar x + \Sigma_{xy}\Sigma_y^{-1}(y - \bar y) \]

Estimation 6–18


$\phi_{\mathrm{mmse}}$ is an affine function

MMSE estimation error, $\hat x - x$, is a Gaussian random vector
\[ \hat x - x \sim N(0,\ \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{xy}^T) \]

note that
\[ \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{xy}^T \le \Sigma_x \]
i.e., covariance of estimation error is always less than prior covariance of $x$

Estimation 6–19


Best linear unbiased estimator

estimator
\[ \hat x = \phi_{\mathrm{blu}}(y) = \bar x + \Sigma_{xy}\Sigma_y^{-1}(y - \bar y) \]
makes sense when $x$, $y$ aren't jointly Gaussian

this estimator

is unbiased, i.e., $\mathbf{E}\,\hat x = \mathbf{E}\,x$

often works well

is widely used

has minimum mean square error among all affine estimators

sometimes called best linear unbiased estimator

Estimation 6–20


MMSE with linear measurements

consider specific case
\[ y = Ax + v, \qquad x \sim N(\bar x, \Sigma_x), \qquad v \sim N(\bar v, \Sigma_v), \]
$x$, $v$ independent

MMSE of $x$ given $y$ is affine function
\[ \hat x = \bar x + B(y - \bar y) \]
where $B = \Sigma_x A^T(A\Sigma_x A^T + \Sigma_v)^{-1}$, $\bar y = A\bar x + \bar v$

interpretation:

$\bar x$ is our best prior guess of $x$ (before measurement)

$y - \bar y$ is the discrepancy between what we actually measure ($y$) and the expected value of what we measure ($\bar y$)

Estimation 6–21
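A NumPy sketch of this estimator, also computing the error covariance $\Sigma_{\mathrm{est}}$ discussed on the slides below (all numerical values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# prior x ~ N(xbar, Sx), noise v ~ N(0, Sv), measurement y = A x + v
xbar = np.array([0.0, 0.0])
Sx = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0]])                 # one scalar sensor
Sv = np.array([[0.1]])

# gain B = Sx A^T (A Sx A^T + Sv)^{-1}
B = Sx @ A.T @ np.linalg.inv(A @ Sx @ A.T + Sv)

# draw a true x and a measurement, then estimate
x = rng.multivariate_normal(xbar, Sx)
y = A @ x + rng.multivariate_normal(np.zeros(1), Sv)
x_hat = xbar + B @ (y - A @ xbar)          # ybar = A xbar here (vbar = 0)

# error covariance, computable before y is measured
S_est = Sx - Sx @ A.T @ np.linalg.inv(A @ Sx @ A.T + Sv) @ A @ Sx
print(np.sqrt(np.diag(S_est) / np.diag(Sx)))  # fractional uncertainty left in each x_i
```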


estimator modifies prior guess by $B$ times this discrepancy

estimator blends prior information with measurement

$B$ gives gain from observed discrepancy to estimate

$B$ is small if noise term $\Sigma_v$ in 'denominator' is large

Estimation 6–22


MMSE error with linear measurements

MMSE estimation error, $\tilde x = \hat x - x$, is Gaussian with zero mean and covariance
\[ \Sigma_{\mathrm{est}} = \Sigma_x - \Sigma_x A^T (A\Sigma_x A^T + \Sigma_v)^{-1} A\Sigma_x \]

$\Sigma_{\mathrm{est}} \le \Sigma_x$, i.e., measurement always decreases uncertainty about $x$

difference $\Sigma_x - \Sigma_{\mathrm{est}}$ gives value of measurement $y$ in estimating $x$

e.g., $\left( \Sigma_{\mathrm{est},ii} / \Sigma_{x,ii} \right)^{1/2}$ gives fractional decrease in uncertainty of $x_i$ due to measurement

note: error covariance $\Sigma_{\mathrm{est}}$ can be determined before measurement $y$ is made!

Estimation 6–23


to evaluate $\Sigma_{\mathrm{est}}$, we only need to know

$A$ (which characterizes sensors)

prior covariance of $x$ (i.e., $\Sigma_x$)

noise covariance (i.e., $\Sigma_v$)

you do not need to know the measurement $y$ (or the means $\bar x$, $\bar v$)

useful for experiment design or sensor selection

Estimation 6–24