collected_slides_5b.pdf
TRANSCRIPT
General pseudo-inverse
if $A$ has SVD $A = U \Sigma V^T$,
$$A^\dagger = V \Sigma^{-1} U^T$$
is the pseudo-inverse or Moore-Penrose inverse of $A$
if $A$ is skinny and full rank,
$$A^\dagger = (A^T A)^{-1} A^T$$
gives the least-squares solution $x_{\mathrm{ls}} = A^\dagger y$
if $A$ is fat and full rank,
$$A^\dagger = A^T (A A^T)^{-1}$$
gives the least-norm solution $x_{\mathrm{ln}} = A^\dagger y$
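As a quick sanity check of these formulas, here is a minimal numpy sketch (the matrix and data are invented for illustration) verifying that the SVD-based pseudo-inverse matches the skinny full-rank formula and the least-squares solution:

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(8, 3)   # skinny, full rank (with probability 1)
y = np.random.randn(8)

# pseudo-inverse via compact SVD: A^+ = V diag(1/sigma_i) U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# skinny full-rank formula: (A^T A)^{-1} A^T
A_pinv2 = np.linalg.inv(A.T @ A) @ A.T

x_ls = A_pinv @ y
assert np.allclose(A_pinv, A_pinv2)
assert np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0])
```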
Full SVD
SVD of $A \in \mathbb{R}^{m \times n}$ with $\mathrm{Rank}(A) = r$:
$$A = U_1 \Sigma_1 V_1^T = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_r^T \end{bmatrix}$$
find $U_2 \in \mathbb{R}^{m \times (m-r)}$, $V_2 \in \mathbb{R}^{n \times (n-r)}$ s.t. $U = [U_1\ U_2] \in \mathbb{R}^{m \times m}$ and $V = [V_1\ V_2] \in \mathbb{R}^{n \times n}$ are orthogonal
add zero rows/cols to $\Sigma_1$ to form $\Sigma \in \mathbb{R}^{m \times n}$:
$$\Sigma = \begin{bmatrix} \Sigma_1 & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix}$$
then we have
$$A = U_1 \Sigma_1 V_1^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}$$
i.e.:
$$A = U \Sigma V^T$$
called full SVD of $A$
(SVD with positive singular values only is called compact SVD)
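In numpy the two forms correspond to the full_matrices flag of np.linalg.svd; a small sketch (example matrix invented for illustration):

```python
import numpy as np

np.random.seed(1)
A = np.random.randn(5, 3) @ np.random.randn(3, 7)  # 5x7, rank <= 3

# full SVD: U is 5x5, V is 7x7, Sigma must be zero-padded to 5x7
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ S @ Vt)

# compact SVD: keep only the r strictly positive singular values
r = np.sum(s > 1e-10)
U1, S1, V1t = U[:, :r], np.diag(s[:r]), Vt[:r, :]
assert np.allclose(A, U1 @ S1 @ V1t)
```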
Image of unit ball under linear transformation
full SVD: $A = U \Sigma V^T$
gives interpretation of $y = Ax$:
rotate (by $V^T$)
stretch along axes by $\sigma_i$ ($\sigma_i = 0$ for $i > r$)
zero-pad (if $m > n$) or truncate (if $m < n$) to get $m$-vector
rotate (by $U$)
Image of unit ball under A
[Figure: unit ball is rotated by $V^T$, stretched by $\Sigma = \mathrm{diag}(2, 0.5)$, then rotated by $U$; the result is an ellipse with semiaxes $2u_1$ and $0.5u_2$]
$\{Ax \mid \|x\| \le 1\}$ is an ellipsoid with principal axes $\sigma_i u_i$.
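A quick numeric illustration of this fact (matrix values chosen arbitrarily): map many unit vectors through $A$ and check that the longest image has norm $\sigma_1$ and points along $u_1$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 0.5]])
U, s, Vt = np.linalg.svd(A)

# image of the unit circle under A
theta = np.linspace(0, 2 * np.pi, 2000)
X = np.vstack([np.cos(theta), np.sin(theta)])   # unit vectors
Y = A @ X

norms = np.linalg.norm(Y, axis=0)
k = np.argmax(norms)
print(norms[k], s[0])                                # both ~ sigma_1
print(np.abs(Y[:, k] / norms[k]), np.abs(U[:, 0]))   # aligned with u_1 (up to sign)
```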
Sensitivity of linear equations to data error
consider $y = Ax$, $A \in \mathbb{R}^{n \times n}$ invertible; of course $x = A^{-1} y$
suppose we have an error or noise in $y$, i.e., $y$ becomes $y + \delta y$
then $x$ becomes $x + \delta x$ with $\delta x = A^{-1} \delta y$
hence we have $\|\delta x\| = \|A^{-1} \delta y\| \le \|A^{-1}\| \, \|\delta y\|$
if $\|A^{-1}\|$ is large,
small errors in $y$ can lead to large errors in $x$
can't solve for $x$ given $y$ (with small errors)
hence, $A$ can be considered singular in practice
a more refined analysis uses relative instead of absolute errors in $x$ and $y$
since $y = Ax$, we also have $\|y\| \le \|A\| \, \|x\|$, hence
$$\frac{\|\delta x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \, \frac{\|\delta y\|}{\|y\|}$$
$$\kappa(A) = \|A\| \, \|A^{-1}\| = \sigma_{\max}(A) / \sigma_{\min}(A)$$
is called the condition number of $A$
we have:
relative error in solution $x$ $\le$ condition number $\times$ relative error in data $y$
or, in terms of # bits of guaranteed accuracy:
# bits accuracy in solution $\approx$ # bits accuracy in data $- \log_2 \kappa$
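A small numpy sketch of this effect (the ill-conditioned matrix is contrived for illustration): perturb $y$ slightly and watch the relative error in $x$ get amplified by up to $\kappa(A)$:

```python
import numpy as np

np.random.seed(2)
# contrived ill-conditioned matrix: singular values 1 and 1e-6
U, _ = np.linalg.qr(np.random.randn(2, 2))
V, _ = np.linalg.qr(np.random.randn(2, 2))
A = U @ np.diag([1.0, 1e-6]) @ V.T

kappa = np.linalg.cond(A)          # sigma_max / sigma_min ~ 1e6

x = np.random.randn(2)
y = A @ x
dy = 1e-8 * np.random.randn(2)     # tiny data error
dx = np.linalg.solve(A, y + dy) - x

rel_in = np.linalg.norm(dy) / np.linalg.norm(y)
rel_out = np.linalg.norm(dx) / np.linalg.norm(x)
print(f"kappa = {kappa:.1e}, amplification = {rel_out / rel_in:.1e}")
```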
State estimation set up
we consider the discrete-time system
$$x(t+1) = A x(t) + B u(t) + w(t), \qquad y(t) = C x(t) + D u(t) + v(t)$$
$w$ is state disturbance or noise
$v$ is sensor noise or error
$A$, $B$, $C$, and $D$ are known
$u$ and $y$ are observed over time interval $[0, t-1]$
$w$ and $v$ are not known, but can be described statistically, or assumed small (e.g., in RMS value)
State estimation problem
state estimation problem: estimate $x(s)$ from
$$u(0), \ldots, u(t-1), \quad y(0), \ldots, y(t-1)$$
$s = 0$: estimate initial state
$s = t-1$: estimate current state
$s = t$: estimate (i.e., predict) next state
an algorithm or system that yields an estimate $\hat{x}(s)$ is called an observer or state estimator
$\hat{x}(s)$ is denoted $\hat{x}(s \mid t-1)$ to show what information the estimate is based on (read: "$\hat{x}(s)$ given $t-1$")
Noiseless case
let's look at finding $x(0)$, with no state or measurement noise:
$$x(t+1) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t)$$
with $x(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^m$, $y(t) \in \mathbb{R}^p$
then we have
$$\begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} = \mathcal{O}_t x(0) + \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix}$$
where
$$\mathcal{O}_t = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{t-1} \end{bmatrix}, \qquad \mathcal{T}_t = \begin{bmatrix} D & 0 & \cdots & & \\ CB & D & 0 & \cdots & \\ \vdots & & \ddots & & \\ CA^{t-2}B & CA^{t-3}B & \cdots & CB & D \end{bmatrix}$$
$\mathcal{O}_t$ maps initial state into resulting output over $[0, t-1]$
$\mathcal{T}_t$ maps input to output over $[0, t-1]$
hence we have
$$\mathcal{O}_t x(0) = \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix}$$
RHS is known, $x(0)$ is to be determined
hence:
can uniquely determine $x(0)$ if and only if $N(\mathcal{O}_t) = \{0\}$
$N(\mathcal{O}_t)$ gives ambiguity in determining $x(0)$
if $x(0) \in N(\mathcal{O}_t)$ and $u = 0$, output is zero over interval $[0, t-1]$
input $u$ does not affect ability to determine $x(0)$; its effect can be subtracted out
Observability matrix
by the Cayley-Hamilton theorem, each $A^k$ is a linear combination of $A^0, \ldots, A^{n-1}$
hence for $t \ge n$, $N(\mathcal{O}_t) = N(\mathcal{O})$ where
$$\mathcal{O} = \mathcal{O}_n = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}$$
is called the observability matrix
if $x(0)$ can be deduced from $u$ and $y$ over $[0, t-1]$ for any $t$, then $x(0)$ can be deduced from $u$ and $y$ over $[0, n-1]$
$N(\mathcal{O})$ is called the unobservable subspace; describes ambiguity in determining state from input and output
system is called observable if $N(\mathcal{O}) = \{0\}$, i.e., $\mathrm{Rank}(\mathcal{O}) = n$
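A minimal sketch of this rank test (the $A$ and $C$ matrices below are invented examples): stack $C, CA, \ldots, CA^{n-1}$ and check the rank:

```python
import numpy as np

def obsv(A, C):
    """Observability matrix [C; CA; ...; CA^{n-1}]."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # discrete double integrator
C_pos = np.array([[1.0, 0.0]])       # position sensor
C_vel = np.array([[0.0, 1.0]])       # velocity sensor only

print(np.linalg.matrix_rank(obsv(A, C_pos)))  # 2 -> observable
print(np.linalg.matrix_rank(obsv(A, C_vel)))  # 1 -> position unobservable
```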
Observers for noiseless case
suppose $\mathrm{Rank}(\mathcal{O}_t) = n$ (i.e., system is observable) and let $F$ be any left inverse of $\mathcal{O}_t$, i.e., $F \mathcal{O}_t = I$
then we have the observer
$$\hat{x}(0) = F \left( \begin{bmatrix} y(0) \\ \vdots \\ y(t-1) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(0) \\ \vdots \\ u(t-1) \end{bmatrix} \right)$$
which deduces $x(0)$ (exactly) from $u$, $y$ over $[0, t-1]$
in fact we have
$$\hat{x}(\tau - t + 1) = F \left( \begin{bmatrix} y(\tau - t + 1) \\ \vdots \\ y(\tau) \end{bmatrix} - \mathcal{T}_t \begin{bmatrix} u(\tau - t + 1) \\ \vdots \\ u(\tau) \end{bmatrix} \right)$$
i.e., our observer estimates what the state was $t-1$ epochs ago, given the past $t-1$ inputs & outputs
observer is a (multi-input, multi-output) finite impulse response (FIR) filter, with inputs $u$ and $y$, and output $\hat{x}$
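To make this concrete, here is a small simulation sketch (system matrices invented for illustration): build $\mathcal{O}_t$ and $\mathcal{T}_t$, simulate, then recover $x(0)$ exactly with $F = \mathcal{O}_t^\dagger$:

```python
import numpy as np

np.random.seed(3)
n, m, p, t = 3, 1, 1, 6
A = np.random.randn(n, n)
B = np.random.randn(n, m)
C = np.random.randn(p, n)
D = np.zeros((p, m))

# simulate with unknown initial state
x0 = np.random.randn(n)
u = np.random.randn(t, m)
x, ys = x0.copy(), []
for k in range(t):
    ys.append(C @ x + D @ u[k])
    x = A @ x + B @ u[k]
Y = np.concatenate(ys)

# O_t and block-Toeplitz T_t
Ot = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(t)])
Tt = np.zeros((p * t, m * t))
for i in range(t):
    for j in range(i + 1):
        blk = D if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ B
        Tt[i*p:(i+1)*p, j*m:(j+1)*m] = blk

x0_hat = np.linalg.pinv(Ot) @ (Y - Tt @ u.flatten())
print(np.allclose(x0_hat, x0))   # True: exact recovery in the noiseless case
```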
rewrite as
$$\mathcal{O} x = \begin{bmatrix} y \\ \dot{y} \\ \vdots \\ y^{(n-1)} \end{bmatrix} - \mathcal{T} \begin{bmatrix} u \\ \dot{u} \\ \vdots \\ u^{(n-1)} \end{bmatrix}$$
RHS is known; $x$ is to be determined
hence if $N(\mathcal{O}) = \{0\}$ we can deduce $x(t)$ from derivatives of $u(t)$, $y(t)$ up to order $n-1$
in this case we say the system is observable
can construct an observer using any left inverse $F$ of $\mathcal{O}$:
$$\hat{x} = F \left( \begin{bmatrix} y \\ \dot{y} \\ \vdots \\ y^{(n-1)} \end{bmatrix} - \mathcal{T} \begin{bmatrix} u \\ \dot{u} \\ \vdots \\ u^{(n-1)} \end{bmatrix} \right)$$
reconstructs $x(t)$ (exactly and instantaneously) from
$$u(t), \ldots, u^{(n-1)}(t), \quad y(t), \ldots, y^{(n-1)}(t)$$
derivative-based state reconstruction is the dual of state transfer using impulsive inputs
A converse
suppose $z \in N(\mathcal{O})$ (the unobservable subspace), and $u$ is any input, with $x$, $y$ the corresponding state and output, i.e.,
$$\dot{x} = Ax + Bu, \qquad y = Cx + Du$$
then the state trajectory $\tilde{x} = x + e^{tA} z$ satisfies
$$\dot{\tilde{x}} = A\tilde{x} + Bu, \qquad y = C\tilde{x} + Du$$
i.e., input/output signals $u$, $y$ are consistent with both state trajectories $x$, $\tilde{x}$
hence if the system is unobservable, no signal processing of any kind applied to $u$ and $y$ can deduce $x$
unobservable subspace $N(\mathcal{O})$ gives fundamental ambiguity in deducing $x$ from $u$, $y$
interpretation: $\hat{x}_{\mathrm{ls}}(0)$ minimizes discrepancy between
output $\hat{y}$ that would be observed, with input $u$ and initial state $\hat{x}(0)$ (and no sensor noise), and
output $y$ that was observed,
measured as
$$\sum_{\tau=0}^{t-1} \|\hat{y}(\tau) - y(\tau)\|^2$$
can express least-squares initial state estimate as
$$\hat{x}_{\mathrm{ls}}(0) = \left( \sum_{\tau=0}^{t-1} (A^T)^\tau C^T C A^\tau \right)^{-1} \sum_{\tau=0}^{t-1} (A^T)^\tau C^T \tilde{y}(\tau)$$
where $\tilde{y}$ is the observed output with the portion due to the input subtracted: $\tilde{y} = y - h * u$ where $h$ is the impulse response
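A sketch of this formula in numpy (matrices and noise level invented; taking $u = 0$ so $\tilde{y} = y$): estimate $x(0)$ from noisy outputs via the Gramian sum:

```python
import numpy as np

np.random.seed(4)
n, p, t = 3, 2, 20
A = np.random.randn(n, n) / np.sqrt(n)   # keep dynamics mild
C = np.random.randn(p, n)
x0 = np.random.randn(n)

# outputs with sensor noise (u = 0, so y_tilde = y)
Ak = np.eye(n)
ys = []
for _ in range(t):
    ys.append(C @ Ak @ x0 + 0.05 * np.random.randn(p))
    Ak = A @ Ak

# Gramian sum Q and weighted output sum r
Q = np.zeros((n, n))
r = np.zeros(n)
Ak = np.eye(n)
for tau in range(t):
    Q += Ak.T @ C.T @ C @ Ak      # (A^T)^tau C^T C A^tau
    r += Ak.T @ C.T @ ys[tau]     # (A^T)^tau C^T y(tau)
    Ak = A @ Ak

x0_ls = np.linalg.solve(Q, r)
print(np.linalg.norm(x0_ls - x0))   # small estimation error
```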
set of possible estimation errors is ellipsoid
$$\tilde{x}(0) \in \mathcal{E}_{\mathrm{unc}} = \left\{ \mathcal{O}_t^\dagger \begin{bmatrix} v(0) \\ \vdots \\ v(t-1) \end{bmatrix} \;\middle|\; \frac{1}{t} \sum_{\tau=0}^{t-1} \|v(\tau)\|^2 \le \alpha^2 \right\}$$
$\mathcal{E}_{\mathrm{unc}}$ is uncertainty ellipsoid for $x(0)$ (least-squares gives best, i.e., smallest, $\mathcal{E}_{\mathrm{unc}}$)
shape of uncertainty ellipsoid determined by matrix
$$\left( \mathcal{O}_t^T \mathcal{O}_t \right)^{-1} = \left( \sum_{\tau=0}^{t-1} (A^T)^\tau C^T C A^\tau \right)^{-1}$$
maximum norm of error is
$$\|\hat{x}_{\mathrm{ls}}(0) - x(0)\| \le \alpha \sqrt{t} \, \|\mathcal{O}_t^\dagger\|$$
Continuous-time least-squares state estimation
assume $\dot{x} = Ax + Bu$, $y = Cx + Du + v$ is observable
least-squares estimate of initial state $x(0)$, given $u(\tau)$, $y(\tau)$, $0 \le \tau \le t$: choose $\hat{x}_{\mathrm{ls}}(0)$ to minimize integral square residual
$$J = \int_0^t \|\tilde{y}(\tau) - C e^{A\tau} x(0)\|^2 \, d\tau$$
where $\tilde{y} = y - h * u$ is the observed output minus the part due to the input
let's expand as $J = x(0)^T Q x(0) + 2 r^T x(0) + s$,
$$Q = \int_0^t e^{A^T \tau} C^T C e^{A\tau} \, d\tau, \qquad r = -\int_0^t e^{A^T \tau} C^T \tilde{y}(\tau) \, d\tau, \qquad s = \int_0^t \tilde{y}(\tau)^T \tilde{y}(\tau) \, d\tau$$
setting $\nabla_{x(0)} J$ to zero, we obtain the least-squares observer
$$\hat{x}_{\mathrm{ls}}(0) = -Q^{-1} r = \left( \int_0^t e^{A^T \tau} C^T C e^{A\tau} \, d\tau \right)^{-1} \int_0^t e^{A^T \tau} C^T \tilde{y}(\tau) \, d\tau$$
estimation error is
$$\tilde{x}(0) = \hat{x}_{\mathrm{ls}}(0) - x(0) = \left( \int_0^t e^{A^T \tau} C^T C e^{A\tau} \, d\tau \right)^{-1} \int_0^t e^{A^T \tau} C^T v(\tau) \, d\tau$$
therefore if $v = 0$ then $\hat{x}_{\mathrm{ls}}(0) = x(0)$
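A rough numerical sketch of the Gramian $Q$ (system matrices invented; uses a simple Riemann sum with scipy's matrix exponential, accurate enough for illustration):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, -0.2]])    # lightly damped oscillator (example)
C = np.array([[1.0, 0.0]])
t_final, steps = 5.0, 2000
dt = t_final / steps

# Q = int_0^t expm(A^T tau) C^T C expm(A tau) dtau, by Riemann sum
Q = np.zeros((2, 2))
for k in range(steps):
    E = expm(A * (k * dt))
    Q += E.T @ C.T @ C @ E * dt

# Q invertible <=> x(0) recoverable from this output record
print(np.linalg.eigvalsh(Q))    # both eigenvalues > 0
```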
Given examples $\{x_i, y_i\}_{i=1}^N$ we want to model the relation between $x_i$ and $y_i$ as $y_i \approx A x_i$. Define the estimation problem as:
$$\arg\min_A \sum_{i=1}^N \|y_i - A x_i\|^2 = \sum_{i=1}^N \left( x_i^T A^T A x_i - 2 y_i^T A x_i + y_i^T y_i \right)$$
we differentiate w.r.t. $A$ and set to 0:
$$\nabla_A \sum_{i=1}^N \|y_i - A x_i\|^2 = 0 \;\Longrightarrow\; \sum_{i=1}^N \left( 2 A x_i x_i^T - 2 y_i x_i^T \right) = 0 \;\Longrightarrow\; A \sum_{i=1}^N x_i x_i^T = \sum_{i=1}^N y_i x_i^T$$
If $\sum_{i=1}^N x_i x_i^T$ is full rank (requires $N \ge n$), then
$$A \sum_{i=1}^N x_i x_i^T = \sum_{i=1}^N y_i x_i^T \;\Longrightarrow\; A = \left( \sum_{i=1}^N y_i x_i^T \right) \left( \sum_{i=1}^N x_i x_i^T \right)^{-1}$$
E.g., we'd like to estimate $A$ in the system $x_{t+1} = A x_t + w$ ($w$ is noise). To solve, simply replace $y_i$ with $x_{i+1}$ in the above solution.
Note that this would also be the most likely $A$ if $w$ were Gaussian noise with zero mean and unit variance.
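A minimal sketch of this identification procedure (true dynamics and noise level invented for illustration):

```python
import numpy as np

np.random.seed(5)
n, T = 2, 200
A_true = np.array([[0.9, 0.2],
                   [-0.1, 0.8]])   # stable example dynamics

# simulate x_{t+1} = A x_t + w with small Gaussian noise
X = np.zeros((T + 1, n))
X[0] = np.random.randn(n)
for t in range(T):
    X[t + 1] = A_true @ X[t] + 0.1 * np.random.randn(n)

# normal equations: A (sum x_i x_i^T) = sum x_{i+1} x_i^T
Sxx = X[:-1].T @ X[:-1]           # sum of x_i x_i^T  (rows of X are x_i^T)
Syx = X[1:].T @ X[:-1]            # sum of x_{i+1} x_i^T
A_hat = Syx @ np.linalg.inv(Sxx)

print(np.linalg.norm(A_hat - A_true, 'fro') ** 2)   # small squared error
```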
Left: Phase plane, values of $x_t$ where $w \sim N(0, I)$.
Right: Squared error between true and estimated $A$ as a function of step number; error $= \sum_{i,j} (A_{ij} - \hat{A}_{ij})^2$ is the squared Frobenius norm $\|A - \hat{A}\|_F^2$.
[Figure: left panel, phase plane of $x_{t+1} = A x_t + w$; right panel, squared error $\|A - \hat{A}\|^2$ vs. time over 100 steps]
EE363 Winter 2005-06
Lecture 6
Estimation
Gaussian random vectors
minimum mean-square estimation (MMSE)
MMSE with linear measurements
relation to least-squares, pseudo-inverse
Gaussian random vectors
random vector $x \in \mathbb{R}^n$ is Gaussian if it has density
$$p_x(v) = (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left( -\tfrac{1}{2} (v - \bar{x})^T \Sigma^{-1} (v - \bar{x}) \right),$$
for some $\Sigma = \Sigma^T > 0$, $\bar{x} \in \mathbb{R}^n$
denoted $x \sim N(\bar{x}, \Sigma)$
$\bar{x} \in \mathbb{R}^n$ is the mean or expected value of $x$, i.e.,
$$\bar{x} = \mathbf{E}\, x = \int v \, p_x(v) \, dv$$
$\Sigma = \Sigma^T > 0$ is the covariance matrix of $x$, i.e.,
$$\Sigma = \mathbf{E}(x - \bar{x})(x - \bar{x})^T$$
$$= \mathbf{E}\, x x^T - \bar{x}\bar{x}^T = \int (v - \bar{x})(v - \bar{x})^T p_x(v) \, dv$$
density for $x \sim N(0, 1)$:
[Figure: plot of the density $p_x(v) = \frac{1}{\sqrt{2\pi}} e^{-v^2/2}$ over $-4 \le v \le 4$]
mean and variance of scalar random variable $x_i$ are
$$\mathbf{E}\, x_i = \bar{x}_i, \qquad \mathbf{E}(x_i - \bar{x}_i)^2 = \Sigma_{ii}$$
hence standard deviation of $x_i$ is $\sqrt{\Sigma_{ii}}$
covariance between $x_i$ and $x_j$ is $\mathbf{E}(x_i - \bar{x}_i)(x_j - \bar{x}_j) = \Sigma_{ij}$
correlation coefficient between $x_i$ and $x_j$ is $\rho_{ij} = \Sigma_{ij} / \sqrt{\Sigma_{ii} \Sigma_{jj}}$
mean (norm) square deviation of $x$ from $\bar{x}$ is
$$\mathbf{E}\, \|x - \bar{x}\|^2 = \mathbf{E}\, \mathrm{Tr}(x - \bar{x})(x - \bar{x})^T = \mathrm{Tr}\, \Sigma = \sum_{i=1}^n \Sigma_{ii}$$
(using $\mathrm{Tr}\, AB = \mathrm{Tr}\, BA$)
example: $x \sim N(0, I)$ means $x_i$ are independent identically distributed (IID) $N(0, 1)$ random variables
geometrically:
mean $\bar{x}$ gives center of ellipsoid
semiaxes are $\sqrt{\alpha \lambda_i}\, u_i$, where $u_i$ are (orthonormal) eigenvectors of $\Sigma$ with eigenvalues $\lambda_i$
example: $x \sim N(\bar{x}, \Sigma)$ with $\bar{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$
$x_1$ has mean 2, std. dev. $\sqrt{2}$; $x_2$ has mean 1, std. dev. 1
correlation coefficient between $x_1$ and $x_2$ is $\rho = 1/\sqrt{2}$
$\mathbf{E}\, \|x - \bar{x}\|^2 = 3$
90% confidence ellipsoid corresponds to $\alpha = 4.6$:
[Figure: scatter of 100 samples in the $(x_1, x_2)$ plane with the $\alpha = 4.6$ confidence ellipsoid]
(here, 91 out of 100 fall in $\mathcal{E}_{4.6}$)
Affine transformation
suppose $x \sim N(\bar{x}, \Sigma_x)$
consider affine transformation of $x$:
$$z = Ax + b,$$
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$
then $z$ is Gaussian, with mean
$$\mathbf{E}\, z = \mathbf{E}(Ax + b) = A \, \mathbf{E}\, x + b = A\bar{x} + b$$
and covariance
$$\Sigma_z = \mathbf{E}(z - \bar{z})(z - \bar{z})^T = \mathbf{E}\, A(x - \bar{x})(x - \bar{x})^T A^T = A \Sigma_x A^T$$
examples:
if $w \sim N(0, I)$ then $x = \Sigma^{1/2} w + \bar{x}$ is $N(\bar{x}, \Sigma)$
useful for simulating vectors with given mean and covariance
conversely, if $x \sim N(\bar{x}, \Sigma)$ then $z = \Sigma^{-1/2}(x - \bar{x})$ is $N(0, I)$
(normalizes & decorrelates)
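A short sketch of this simulation trick (mean and covariance reused from the example above): generate samples via a matrix square root and check the empirical moments:

```python
import numpy as np
from scipy.linalg import sqrtm

xbar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

# x = Sigma^{1/2} w + xbar with w ~ N(0, I)
N = 100_000
W = np.random.randn(N, 2)
X = W @ sqrtm(Sigma).real + xbar   # rows are samples

print(X.mean(axis=0))              # ~ [2, 1]
print(np.cov(X.T))                 # ~ Sigma
```

A Cholesky factor ($\Sigma = LL^T$, `np.linalg.cholesky`) works equally well in place of the symmetric square root.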
Degenerate Gaussian vectors
it is convenient to allow $\Sigma$ to be singular (but still $\Sigma = \Sigma^T \ge 0$)
(in this case the density formula obviously does not hold)
meaning: in some directions $x$ is not random at all
write $\Sigma$ as
$$\Sigma = \begin{bmatrix} Q_+ & Q_0 \end{bmatrix} \begin{bmatrix} \Sigma_+ & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_+ & Q_0 \end{bmatrix}^T$$
where $Q = [Q_+\ Q_0]$ is orthogonal, $\Sigma_+ > 0$
columns of $Q_0$ are orthonormal basis for $N(\Sigma)$
columns of $Q_+$ are orthonormal basis for $\mathrm{range}(\Sigma)$
then $Q^T x = [z^T \; w^T]^T$, where
$z \sim N(Q_+^T \bar{x}, \Sigma_+)$ is (nondegenerate) Gaussian (hence, density formula holds)
$w = Q_0^T x$ is not random
($Q_0^T x$ is called the deterministic component of $x$)
Linear measurements
linear measurements with noise:
$$y = Ax + v$$
$x \in \mathbb{R}^n$ is what we want to measure or estimate
$y \in \mathbb{R}^m$ is measurement
$A \in \mathbb{R}^{m \times n}$ characterizes sensors or measurements
$v$ is sensor noise
common assumptions:
$x \sim N(\bar{x}, \Sigma_x)$
$v \sim N(\bar{v}, \Sigma_v)$
$x$ and $v$ are independent
$N(\bar{x}, \Sigma_x)$ is the prior distribution of $x$ (describes initial uncertainty about $x$)
$\bar{v}$ is noise bias or offset (and is usually 0)
$\Sigma_v$ is noise covariance
Minimum mean-square estimation
suppose $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are random vectors (not necessarily Gaussian)
we seek to estimate $x$ given $y$
thus we seek a function $\phi : \mathbb{R}^m \to \mathbb{R}^n$ such that $\hat{x} = \phi(y)$ is near $x$
one common measure of nearness: mean-square error,
$$\mathbf{E}\, \|\phi(y) - x\|^2$$
minimum mean-square estimator (MMSE) $\phi_{\mathrm{mmse}}$ minimizes this quantity
general solution: $\phi_{\mathrm{mmse}}(y) = \mathbf{E}(x \mid y)$, i.e., the conditional expectation of $x$ given $y$
MMSE for Gaussian vectors
now suppose $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are jointly Gaussian:
$$\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} \bar{x} \\ \bar{y} \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{xy}^T & \Sigma_y \end{bmatrix} \right)$$
(after a lot of algebra) the conditional density is
$$p_{x|y}(v \mid y) = (2\pi)^{-n/2} (\det \Lambda)^{-1/2} \exp\left( -\tfrac{1}{2} (v - w)^T \Lambda^{-1} (v - w) \right),$$
where
$$\Lambda = \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{xy}^T, \qquad w = \bar{x} + \Sigma_{xy} \Sigma_y^{-1} (y - \bar{y})$$
hence the MMSE estimator (i.e., conditional expectation) is
$$\hat{x} = \phi_{\mathrm{mmse}}(y) = \mathbf{E}(x \mid y) = \bar{x} + \Sigma_{xy} \Sigma_y^{-1} (y - \bar{y})$$
$\phi_{\mathrm{mmse}}$ is an affine function
MMSE estimation error, $\hat{x} - x$, is a Gaussian random vector:
$$\hat{x} - x \sim N(0, \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{xy}^T)$$
note that
$$\Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{xy}^T \le \Sigma_x$$
i.e., covariance of estimation error is always less than prior covariance of $x$
Best linear unbiased estimator
estimator
$$\hat{x} = \phi_{\mathrm{blu}}(y) = \bar{x} + \Sigma_{xy} \Sigma_y^{-1} (y - \bar{y})$$
makes sense when $x$, $y$ aren't jointly Gaussian
this estimator
is unbiased, i.e., $\mathbf{E}\, \hat{x} = \mathbf{E}\, x$
often works well
is widely used
has minimum mean square error among all affine estimators
sometimes called best linear unbiased estimator
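A small Monte Carlo sketch of this estimator (the joint distribution is invented for illustration): apply $\hat{x} = \bar{x} + \Sigma_{xy} \Sigma_y^{-1}(y - \bar{y})$ in the scalar case and compare its mean-square error to the naive guess $\bar{x}$:

```python
import numpy as np

np.random.seed(6)
# invented jointly Gaussian pair: y carries information about x
N = 200_000
x = np.random.randn(N)
y = x + 0.5 * np.random.randn(N)      # y = x + noise

xbar, ybar = x.mean(), y.mean()
Sxy = np.cov(x, y)[0, 1]
Sy = y.var()

x_hat = xbar + Sxy / Sy * (y - ybar)  # BLUE
print(np.mean((x_hat - x) ** 2))      # ~ 0.2 = Sigma_x - Sxy^2/Sy
print(np.mean((xbar - x) ** 2))       # ~ 1.0 (prior variance)
```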
MMSE with linear measurements
consider specific case
$$y = Ax + v, \qquad x \sim N(\bar{x}, \Sigma_x), \qquad v \sim N(\bar{v}, \Sigma_v),$$
$x$, $v$ independent
MMSE of $x$ given $y$ is affine function
$$\hat{x} = \bar{x} + B(y - \bar{y})$$
where $B = \Sigma_x A^T (A \Sigma_x A^T + \Sigma_v)^{-1}$, $\bar{y} = A\bar{x} + \bar{v}$
interpretation:
$\bar{x}$ is our best prior guess of $x$ (before measurement)
$y - \bar{y}$ is the discrepancy between what we actually measure ($y$) and the expected value of what we measure ($\bar{y}$)
estimator modifies prior guess by $B$ times this discrepancy
estimator blends prior information with measurement
$B$ gives gain from observed discrepancy to estimate
$B$ is small if noise term $\Sigma_v$ in "denominator" is large
MMSE error with linear measurements
MMSE estimation error, $\tilde{x} = \hat{x} - x$, is Gaussian with zero mean and covariance
$$\Sigma_{\mathrm{est}} = \Sigma_x - \Sigma_x A^T (A \Sigma_x A^T + \Sigma_v)^{-1} A \Sigma_x$$
$\Sigma_{\mathrm{est}} \le \Sigma_x$, i.e., measurement always decreases uncertainty about $x$
difference $\Sigma_x - \Sigma_{\mathrm{est}}$ gives value of measurement $y$ in estimating $x$
e.g., $\left( \Sigma_{\mathrm{est},ii} / \Sigma_{x,ii} \right)^{1/2}$ gives fractional decrease in uncertainty of $x_i$ due to measurement
note: error covariance $\Sigma_{\mathrm{est}}$ can be determined before measurement $y$ is made!
to evaluate $\Sigma_{\mathrm{est}}$, only need to know
$A$ (which characterizes sensors)
prior covariance of $x$ (i.e., $\Sigma_x$)
noise covariance (i.e., $\Sigma_v$)
you do not need to know the measurement $y$ (or the means $\bar{x}$, $\bar{v}$)
useful for experiment design or sensor selection
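A minimal sketch tying this together ($A$, $\Sigma_x$, $\Sigma_v$, and the measurement values are invented): compute the gain $B$ and the error covariance $\Sigma_{\mathrm{est}}$ with no measurement in hand, then use $B$ once $y$ arrives:

```python
import numpy as np

# invented sensor model and priors
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])          # three sensors, two states
Sx = np.diag([4.0, 1.0])            # prior covariance of x
Sv = 0.25 * np.eye(3)               # sensor noise covariance
xbar = np.array([0.0, 0.0])         # prior mean (vbar = 0 here)

# gain and error covariance: computable before any measurement
S = A @ Sx @ A.T + Sv
B = Sx @ A.T @ np.linalg.inv(S)
S_est = Sx - B @ A @ Sx

print(np.diag(S_est))               # posterior variances < prior [4, 1]

# once a measurement arrives:
y = np.array([1.0, 2.0, 1.5])
x_hat = xbar + B @ (y - A @ xbar)
print(x_hat)
```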