
ECE5530: Multivariable Control Systems II. 3–1

LINEAR QUADRATIC REGULATOR

3.1: Cost functions; deterministic LQR problem

Cost functions

■ The engineering tradeoff in control-system design is

    Fast response                      Slower response
    Large intermediate states  versus  Smaller intermediate states
    Large control effort               Smaller control effort

■ Optimizing a tradeoff like this can be turned into a mathematical

optimization by defining a cost function that must be minimized.

■ The cost function used to compare controllers should include a measure of the size of output errors and the size of control:

    J = ∫_0^{t_f} z^T(t) W(t) z(t) dt = ‖z(t)‖^2_{W(t),[0,t_f]},

where z(t) is assumed to include both output errors and control inputs and W(t) is psd at all times.

■ The cost function may be expanded to give a more detailed description:

    J = ∫_0^{t_f} [e^T(t)  u^T(t)] [W_1(t) 0; 0 W_2(t)] [e(t); u(t)] dt
      = ∫_0^{t_f} e^T(t) W_1(t) e(t) + u^T(t) W_2(t) u(t) dt,

where W_1(t) and W_2(t) are both psd.

Lecture notes prepared by Dr. Gregory L. Plett. Copyright © 2016, 2010, 2008, 2006, 2004, 2002, 2001, 1999, Gregory L. Plett


■ This cost function is not as general, but is often sufficient.

■ Note we can write e(t) = r(t) − C_y x(t).

■ Then,

    J = ∫_0^{t_f} (r(t) − C_y x(t))^T W_1(t) (r(t) − C_y x(t)) + u^T(t) W_2(t) u(t) dt.

■ Often, we can formulate a goal such that we want to drive x(t) → 0 or some linear combination of states to zero. This is called a regulator.

■ Then,

    J = ∫_0^{t_f} x^T(t) Q(t) x(t) + u^T(t) R(t) u(t) dt,

where Q(t) = C_y^T W_1(t) C_y and R(t) = W_2(t), for example.

■ Sometimes, final output error is more critical than intermediate state errors (e.g., the flight of a missile):

    J = (r(t_f) − C_y x(t_f))^T V (r(t_f) − C_y x(t_f)) + ∫_0^{t_f} u^T(t) R u(t) dt.

■ The value of the cost function depends on the reference input applied, the disturbance applied, initial conditions, final conditions, and constraints on state or control.

  • Collectively, these are known as test conditions (e.g., nominal or worst-case).

  • For a valid comparison between controllers, the same test conditions must be used.

The deterministic LQR problem

■ In our case, the deterministic linear quadratic regulator (LQR)

problem assumes that we have full state information available, and


that we desire to minimize the cost function

    J(x_k, u_k) = x_N^T H_d x_N + Σ_{k=0}^{N−1} [x_k^T Q_d x_k + u_k^T R_d u_k],

subject to x_{k+1} = A x_k + B u_k (if the system dynamics are discrete), or

    J(x(t), u(t)) = x^T(t_f) H x(t_f) + ∫_0^{t_f} x^T(t) Q x(t) + u^T(t) R u(t) dt

subject to ẋ(t) = A x(t) + B u(t) (if the system dynamics are continuous).

" xTN HdxN or xT .tf /Hx.tf / is the penalty for “missing” the desired

final state.

" xTk Qdxk or xT .t/Qx.t/ is the penalty on excessive state size.

" uTk Rduk or uT .t/Ru.t/ is the penalty on excessive control effort.

■ The matrices H, H_d, Q, Q_d and R, R_d are symmetric matrices that put more (or less) cost on each term in the cost function.

  • We require H, H_d ≥ 0, Q, Q_d ≥ 0 and R, R_d > 0.

  • The weighting of these matrices is relative to each other; if we multiply everything by 2, the solution does not change (but the minimum cost is doubled).

■ The solution is an input trajectory, either u(t) or u_k.

■ There are two approaches to optimization problems such as this:

  • Calculus of variations [Burl];

  • Dynamic programming [Bay].

■ We will look at both, since both are common in the literature.

" We’ll use calculus of variations to derive the continuous-time result.

" We will review the dynamic-programming results from ECE5520

for the discrete-time problem.
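For the discrete-time problem, the dynamic-programming route boils down to a backward Riccati recursion. A minimal sketch follows; the plant matrices, weights, and horizon here are illustrative assumptions, not values from these notes:

```python
import numpy as np

# Illustrative discrete-time plant and weights (assumptions for this sketch).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Qd = np.eye(2)            # state penalty
Rd = np.array([[1.0]])    # control penalty
Hd = np.eye(2)            # terminal penalty
N = 100                   # horizon length

# Backward recursion: P_N = Hd, then for k = N-1, ..., 0:
#   K_k = (Rd + B^T P_{k+1} B)^{-1} B^T P_{k+1} A
#   P_k = Qd + A^T P_{k+1} A - A^T P_{k+1} B K_k
P = Hd
gains = []
for k in reversed(range(N)):
    K = np.linalg.solve(Rd + B.T @ P @ B, B.T @ P @ A)
    P = Qd + A.T @ P @ A - A.T @ P @ B @ K
    gains.append(K)
gains.reverse()           # gains[k] gives the feedback u_k = -gains[k] @ x_k

print(gains[0])           # gain applied at k = 0 (near steady state for long N)
```

For a long horizon the early gains settle to the constant infinite-horizon LQR gain, which is why constant-gain state feedback is so common in practice.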


Lagrange multipliers

■ The LQR optimization is subject to the constraint imposed by the system dynamics: e.g., ẋ(t) = A x(t) + B u(t).

■ Without the constraint, we might consider optimizing the cost function by using its gradient, ∇J.

" The gradient at any location points in the direction of the steepest

increase in the function.

" Following the negative gradient will lead us down to a minimum of

the function.

" Whenever the gradient is zero, we are either at a minimum, a

maximum, or a saddle point.

" Second-derivative tests can tell us what is the case at any

particular location.

■ Now, suppose that we must minimize a function J(x) subject to the constraint c(x) = 0. Plotting the constraint itself:

[Figure: the curve c(x) = 0. The directions you can move are perpendicular to the gradient; the gradient of c(x) at any point is orthogonal to the curve c(x) = 0.]

■ If ∇J has any component aligned with the constraint, then the constrained cost can be reduced by moving along the constraint in the negative direction of that component.

■ However, when ∇J is perpendicular to the constraint, that point is a local minimum, maximum, or saddle point.


■ Since the gradient of the constraint is perpendicular to the constraint, we have that ∇J = −λ∇c, and the Lagrange multiplier λ is the proportionality factor.

■ So, we satisfy a constrained optimization when ∇(J + λc) = 0.

  • We do so by making an augmented cost J_a(x, λ) = J(x) + λ c(x),

  • Taking partial derivatives of J_a(x, λ), and

  • Setting these derivatives to zero.

■ We will use this approach to solve the LQR problem.


3.2: Optimization via calculus of variations

■ Differentiation is the primary tool for optimization w.r.t. a (scalar) variable.

■ Our objective requires differentiation of a real scalar cost function

w.r.t. the control input (a function of time).

■ Can be accomplished by using a generalization of the differential

called the variation.

■ Consider: The real scalar function of a scalar J(x) has a local minimum at x* iff

    J(x* + δx) ≥ J(x*)

for all δx sufficiently small. Equivalently,

    ΔJ(x*, δx) = J(x* + δx) − J(x*) ≥ 0.

■ ΔJ(x*, δx) is called the increment of J. Expand via Taylor series:

    ΔJ(x*, δx) = J(x* + δx) − J(x*)
               = (dJ(x*)/dx) δx + ½ (d²J(x*)/dx²) δx² + h.o.t. ≥ 0.

■ Above, δx is called the differential of x, and the term linear in δx (which is [dJ(x*)/dx] δx in this case) is called the differential of J.

■ When dealing with a functional (a real scalar function of functions), δx is called the variation of x and the term linear in δx is called the variation of J, denoted δJ(x*, δx). So,

    ΔJ(x*, δx) = δJ(x*, δx) + h.o.t. ≥ 0.

■ The variation of J is a generalization of the differential and can be

applied to the optimization of a functional.


NECESSARY CONDITION OF OPTIMALITY: The variation of J is zero at x* for all δx.

EXAMPLE: Consider

    J(x) = x² + 6x + 8.

1. Find the increment ΔJ.

2. Simplify, and extract the variation δJ.

3. Set δJ = 0 and solve.

■ The increment of J is

    ΔJ(x, δx) = (x + δx)² + 6(x + δx) + 8 − (x² + 6x + 8)
              = x² + 2xδx + δx² + 6x + 6δx + 8 − x² − 6x − 8
              = (2x + 6)δx + δx².

■ For optimality, the variation of J must be zero:

    δJ(x, δx) = (2x + 6)δx = 0

for all δx ⇒ therefore, x = −3.

■ Using standard calculus, we would have dJ/dx = 2x + 6 = 0 and would get the answer more quickly. However, the calculus of variations can be directly extended to the optimization of a functional.

Lagrange multipliers (again)

■ The optimal control problem is a constrained minimization problem.

■ We want minimum cost subject to the dynamics of the plant, ẋ = Ax + Bu.

■ The calculus of variations applies to unconstrained minimization problems.


■ Lagrange multipliers convert a constrained minimization problem to a

higher-order unconstrained minimization problem. Can then use

calculus of variations.

■ Consider minimizing J(x), x ∈ R^n, subject to the constraint c(x) = 0.

  • The differential in J must be parallel to the differential in c for an optimal solution.

■ That is, generalizing our prior solution (and dividing both sides by δx), we must have

    δJ(x*)/δx + λ δc(x*)/δx = 0

for some scalar λ. This can also be written as the solution to an unconstrained augmented cost function

    J_a(x, λ) = J(x) + λ c(x).

■ Note that we are just adding zero to our cost function in a clever way that enforces the constraint. When c(x) is a vector, we use

    J_a(x, λ) = J(x) + λ^T c(x).

EXAMPLE: We wish to minimize J(x) = x_1² + x_2² subject to the constraint that c(x) = 2x_1 + x_2 + 4 = 0.

■ The augmented cost function is

    J_a(x, λ) = x_1² + x_2² + λ(2x_1 + x_2 + 4).

■ The increment of the augmented cost function is


    ΔJ_a(x, λ) = J_a(x + δx, λ + δλ) − J_a(x, λ)
               = (x_1 + δx_1)² + (x_2 + δx_2)² + (λ + δλ)[2(x_1 + δx_1) + (x_2 + δx_2) + 4]
                 − x_1² − x_2² − λ[2x_1 + x_2 + 4]
               = (2x_1 + 2λ)δx_1 + (2x_2 + λ)δx_2 + (2x_1 + x_2 + 4)δλ
                 + δx_1² + δx_2² + 2δλδx_1 + δλδx_2.

■ Then,

    δJ_a(x, λ, δx, δλ) = (2x_1 + 2λ)δx_1 + (2x_2 + λ)δx_2 + (2x_1 + x_2 + 4)δλ = 0.

■ Therefore,

    δJ_a/δx_1 = 2x_1 + 2λ = 0
    δJ_a/δx_2 = 2x_2 + λ = 0
    δJ_a/δλ = 2x_1 + x_2 + 4 = 0.

■ Notice that the unconstrained minimization problem simply has 2x_1 = 2x_2 = 0. Adding the constraint is seen by the added “λ” terms plus the δJ_a/δλ term.

■ Solving these three equations in three unknowns gives

    [x_1  x_2  λ] = [−1.6  −0.8  1.6].


3.3: The LQR problem solved via calculus of variations

■ For the continuous-time LQR problem, we wish to minimize

    J(x(t), u(t)) = ½ x^T(t_f) H x(t_f) + ½ ∫_0^{t_f} x^T(t) Q x(t) + u^T(t) R u(t) dt

subject to the constraint that ẋ(t) = A x(t) + B_u u(t).

■ We add the constraint to the integral term to get

    J_a(x(t), u(t), λ(t)) = J(x(t), u(t)) + ∫_0^{t_f} λ^T(t)[A x(t) + B_u u(t) − ẋ(t)] dt
        = ½ x^T(t_f) H x(t_f) + ∫_0^{t_f} ( ½ x^T(t) Q x(t) + ½ u^T(t) R u(t)
                                            + λ^T(t)[A x(t) + B_u u(t) − ẋ(t)] ) dt.

■ The Lagrange multiplier λ(t) is often referred to as the costate, for reasons that will become clear later.

■ We find the optimal control by first forming the increment of J_a:

    ΔJ_a(x, u, λ, δx, δu, δλ)
        = J_a(x + δx, u + δu, λ + δλ) − J_a(x, u, λ)
        = ½ (x(t_f) + δx(t_f))^T H (x(t_f) + δx(t_f))
          + ∫_0^{t_f} ½ (x + δx)^T Q (x + δx) + ½ (u + δu)^T R (u + δu)
                      + (λ + δλ)^T [A(x + δx) + B_u(u + δu) − (ẋ + δẋ)] dt
          − ½ x^T(t_f) H x(t_f) − ∫_0^{t_f} ½ x^T Q x + ½ u^T R u + λ^T(Ax + B_u u − ẋ) dt,

where time dependence has been omitted to simplify the expression.


■ Expand, noting that x^T Q δx = (x^T Q δx)^T = δx^T Q^T x = δx^T Q x,

    ΔJ_a = ½ δx^T(t_f) H δx(t_f)
           + ∫_0^{t_f} ½ δx^T Q δx + ½ δu^T R δu + δλ^T(A δx + B_u δu − δẋ) dt
           + x^T(t_f) H δx(t_f)
           + ∫_0^{t_f} x^T Q δx + u^T R δu + δλ^T(Ax + B_u u − ẋ) + λ^T(A δx + B_u δu − δẋ) dt.

■ The variation of J_a must equal zero:

    δJ_a(x, u, λ, δx, δu, δλ) = x^T(t_f) H δx(t_f)
        + ∫_0^{t_f} (x^T Q + λ^T A)δx + (u^T R + λ^T B_u)δu
                    + δλ^T(Ax + B_u u − ẋ) − λ^T δẋ dt = 0.

■ The last term, λ^T δẋ, deserves special attention. It is a function of δx, so it is not an independent variable. We can eliminate it from the equation via integration by parts:

    ∫_0^{t_f} λ^T(t) δẋ(t) dt = λ^T(t) δx(t) |_0^{t_f} − ∫_0^{t_f} λ̇^T(t) δx(t) dt.

■ Note that the initial state is fixed, so its variation δx(0) is zero, and

    ∫_0^{t_f} λ^T(t) δẋ(t) dt = λ^T(t_f) δx(t_f) − ∫_0^{t_f} λ̇^T(t) δx(t) dt.

■ Substituting,

    δJ_a(x, u, λ, δx, δu, δλ) = [x^T(t_f) H − λ^T(t_f)] δx(t_f)
        + ∫_0^{t_f} (x^T Q + λ^T A + λ̇^T)δx + (u^T R + λ^T B_u)δu
                    + δλ^T(Ax + B_u u − ẋ) dt = 0.


■ Since the variations δx, δu, δλ are arbitrary, this expression is zero iff

    λ^T(t_f) = x^T(t_f) H,
    λ̇^T(t) = −x^T(t) Q − λ^T(t) A,
    0 = u^T(t) R + λ^T(t) B_u,
    ẋ(t) = A x(t) + B_u u(t).

■ From the third result, the optimal control is u(t) = −R^{-1} B_u^T λ(t), where the inverse exists since R > 0. We still need to solve for λ(t) . . .

■ Combining the two remaining differential equations (substituting for u(t) as above),

    [ẋ(t); λ̇(t)] = [A, −B_u R^{-1} B_u^T; −Q, −A^T] [x(t); λ(t)] = Z [x(t); λ(t)],

where Z is called the Hamiltonian matrix.

■ We solve this system of equations noting that x(0) = x_0 and λ(t_f) = H x(t_f):

    [x(t_f); λ(t_f)] = e^{Z(t_f − t)} [x(t); λ(t)]
                     = [Φ_11(t_f − t), Φ_12(t_f − t); Φ_21(t_f − t), Φ_22(t_f − t)] [x(t); λ(t)].

■ Substitute λ(t_f):

    [x(t_f); H x(t_f)] = [Φ_11(t_f − t), Φ_12(t_f − t); Φ_21(t_f − t), Φ_22(t_f − t)] [x(t); λ(t)].

■ Eliminating x(t_f) by substituting the first equation into the second,

    H[Φ_11(t_f − t) x(t) + Φ_12(t_f − t) λ(t)] = Φ_21(t_f − t) x(t) + Φ_22(t_f − t) λ(t).

■ Solve for λ(t):

    λ(t) = [Φ_22(t_f − t) − H Φ_12(t_f − t)]^{-1} [H Φ_11(t_f − t) − Φ_21(t_f − t)] x(t)
         = P(t) x(t),


so

    u(t) = −R^{-1} B_u^T P(t) x(t) = −K(t) x(t).

Solving for P(t) via the Hamiltonian system

■ From above, λ(t) = P(t) x(t). Differentiating (via the product rule) we get

    λ̇(t) = Ṗ(t) x(t) + P(t) ẋ(t).

■ Substitute for λ̇(t) and ẋ(t) to get

    −Q x(t) − A^T λ(t) = Ṗ(t) x(t) + P(t)[A x(t) − B_u R^{-1} B_u^T λ(t)].

■ Substitute λ(t) = P(t) x(t) to get

    [Ṗ(t) + P(t) A + A^T P(t) + Q − P(t) B_u R^{-1} B_u^T P(t)] x(t) = 0.

■ Since this is valid for arbitrary x(t), we get

    Ṗ(t) = −P(t) A − A^T P(t) − Q + P(t) B_u R^{-1} B_u^T P(t).

■ This is called the differential (matrix) Riccati equation.

■ It is a nonlinear differential equation with boundary condition P(t_f) = H, solved backward in time.

Steady-state solution

■ As the differential equation for P(t) is simulated backward in time from the terminal point, it tends toward steady-state values as t → 0. It is much simpler to approximate the optimal control gains as a constant set of gains calculated using P_ss:

    0 = P_ss B R^{-1} B^T P_ss − P_ss A − A^T P_ss − Q.

■ This is called the (continuous-time) algebraic Riccati equation (ARE). In MATLAB, see care.m.
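In Python, scipy's solve_continuous_are plays the role of care.m. A quick sketch on an illustrative double-integrator plant (the matrices below are assumptions for this sketch, not values from the notes):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative double-integrator plant and weights.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Pss solves  A^T P + P A - P B R^{-1} B^T P + Q = 0.
Pss = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ Pss)   # steady-state LQR gain
print(Pss)
print(K)   # for this plant, K works out to [1, sqrt(3)]
```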


3.4: Solving the Riccati equation

Solving the differential Riccati equation via simulation

■ The differential Riccati equation may be solved numerically by integrating the matrix differential equation

    Ṗ(t) = P(t) B R^{-1} B^T P(t) − P(t) A − A^T P(t) − Q

backward in time.

■ An easy way to do this is to use Simulink with matrix signals.

EXAMPLE: Consider the continuous-time system

    ẋ(t) = [1 0; 2 0] x(t) + [1; 0] u(t)
    y(t) = [0 1] x(t).

■ Solve the differential matrix Riccati equation that results in the control signal that minimizes the cost function

    J = ½ x^T(5) [2 0; 0 2] x(5) + ½ ∫_0^5 [y^T(t) y(t) + u^T(t) u(t)] dt.

■ First, note that the open-loop system is unstable, with poles at 0

and 1. It is controllable and observable.

■ The cost function is written in terms of y(t) but not x(t). However, since there is no feedthrough term, we can also write it as

    J = ½ x^T(5) [2 0; 0 2] x(5) + ½ ∫_0^5 [x^T(t) C^T C x(t) + u^T(t) u(t)] dt.

This is a common “trick”.


■ Therefore, the penalty matrices are Q = C^T C and R = ρ = 1.

■ We can simulate the finite-horizon case to find P.t/.
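Outside Simulink, the same backward integration can be sketched with an ODE solver: substituting s = t_f − t turns the backward equation into a forward one, dP/ds = P A + A^T P + Q − P B R^{-1} B^T P, started from P(t_f) = H (system and weights as in this example):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Example system and weights from the notes.
A = np.array([[1.0, 0.0],
              [2.0, 0.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[0.0, 1.0]])
Q = C.T @ C
Rinv = np.array([[1.0]])   # R = 1
H = 2.0 * np.eye(2)        # terminal penalty P(tf) = H
tf = 5.0

def riccati_rhs(s, p):
    # dP/ds with s = tf - t (i.e., the negated backward-time Riccati equation)
    P = p.reshape(2, 2)
    dP = P @ A + A.T @ P + Q - P @ B @ Rinv @ B.T @ P
    return dP.ravel()

sol = solve_ivp(riccati_rhs, [0.0, tf], H.ravel(), rtol=1e-8, atol=1e-10)
P0 = sol.y[:, -1].reshape(2, 2)   # P at t = 0, essentially the steady-state value
print(P0)
```

By t = 0 the solution has settled, so P0 should match the steady-state P_ss found from the ARE below.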

[Simulink diagram: a matrix-signal integrator with “initial condition” P(t_f), run over the final time; matrix gains A, A', and B*inv(R)*B', a matrix multiply, and a clock, with the signals t and P(t) logged to the workspace.]

To plot the logged result:

    P = P.signals.values;
    t = squeeze(t.signals.values);
    P2 = reshape(P, [min(size(P))^2 1 length(P)]);
    plot(t, squeeze(P2)')

[Figure: “Solving for P”: entries of P(t) versus time (s) over 0 to 5, approaching steady-state values near t = 0.]

Solving the algebraic Riccati equation manually

■ We can also solve the infinite-horizon case (analytically, for this example). Consider the ARE

    0 = A^T P + P A + C^T C − P B R^{-1} B^T P

    [0 0; 0 0] = [1 2; 0 0][p_11 p_12; p_21 p_22] + [p_11 p_12; p_21 p_22][1 0; 2 0]
                 + [0 0; 0 1] − [p_11 p_12; p_21 p_22][1 0; 0 0][p_11 p_12; p_21 p_22]

               = [p_11 + 2p_12, p_12 + 2p_22; 0, 0] + [p_11 + 2p_12, 0; p_12 + 2p_22, 0]
                 + [0 0; 0 1] − [p_11², p_11 p_12; p_11 p_12, p_12²].

■ This matrix equality represents a set of three simultaneous equations

(because P is symmetric). They are:


    2p_11 − p_11² + 4p_12 = 0
    p_12 + 2p_22 − p_11 p_12 = 0
    1 − p_12² = 0.

■ The final equation gives us p_12 = ±1. If we select p_12 = −1 then the first equation will have complex roots (bad). So, p_12 = 1.

■ Then, p_11 = 1 ± √5. If p_11 = 1 − √5 then P cannot be positive definite. Therefore, p_11 = 1 + √5 = 3.236.

■ Finally, we get p_22 = √5/2 = 1.118.

■ These are the same values as the steady-state solution found by

integrating the differential Riccati equation.

■ The static feedback control signal is

    u(t) = −R^{-1} B^T P_ss x(t) = −[3.236  1] x(t).

For this feedback, the closed-loop poles are at −√5/2 ± (√3/2)j (stable).
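The hand calculation above can be checked with scipy's ARE solver; a sketch using this example's matrices:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# System and weights from the example.
A = np.array([[1.0, 0.0],
              [2.0, 0.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[0.0, 1.0]])
Q = C.T @ C
R = np.array([[1.0]])

Pss = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ Pss)     # steady-state gain
poles = np.linalg.eigvals(A - B @ K)  # closed-loop poles
print(Pss)    # approximately [[3.236, 1], [1, 1.118]]
print(K)      # approximately [[3.236, 1]]
print(poles)  # approximately -1.118 +/- 0.866j
```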

Solving the algebraic Riccati equation generally

■ We can sometimes solve the ARE manually, as above.

■ Or, we can solve the ARE by substituting P(t_f) = H and solving the differential equation backward in time until steady state is achieved.

  • This is usually a bad idea due to the computation involved and the propagation of numeric errors.

■ Instead, recall

    [ẋ(t); λ̇(t)] = [A, −B_u R^{-1} B_u^T; −Q, −A^T] [x(t); λ(t)] = Z [x(t); λ(t)].


■ Diagonalize the Hamiltonian matrix. This can be done if all eigenvalues are distinct. If not, perturb Q or R.

    [ż_1(t); ż_2(t)] = [−Λ 0; 0 Λ] [z_1(t); z_2(t)].

■ The matrix Λ is diagonal, and contains the RHP poles of the Hamiltonian matrix.

■ We achieved the transformation via the transformation matrix Ψ:

    [x(t); λ(t)] = [Ψ_11 Ψ_12; Ψ_21 Ψ_22] [z_1(t); z_2(t)].

■ The reverse relationship is then

    [z_1(t); z_2(t)] = [(Ψ^{-1})_11 (Ψ^{-1})_12; (Ψ^{-1})_21 (Ψ^{-1})_22] [x(t); λ(t)].

■ Solving for z_2(t) we get

    z_2(t) = e^{Λt} z_2(0) = (Ψ^{-1})_21 x(t) + (Ψ^{-1})_22 λ(t)
           = [(Ψ^{-1})_21 + (Ψ^{-1})_22 P(t)] x(t).

■ As t → ∞, x(t) → 0 for the cost function to remain finite. Therefore

    lim_{t→∞} z_2(t) = lim_{t→∞} e^{Λt} z_2(0) = lim_{t→∞} [(Ψ^{-1})_21 + (Ψ^{-1})_22 P(t)] x(t) = 0,

which forces z_2(0) = 0 and therefore z_2(t) = 0. Then

    x(t) = Ψ_11 z_1(t);
    λ(t) = Ψ_21 z_1(t).


■ Since λ(t) = P x(t) = Ψ_21(Ψ_11)^{-1} x(t), then

    P = Ψ_21(Ψ_11)^{-1}

and the steady-state optimal feedback gain matrix is

    K = R^{-1} B_u^T P = R^{-1} B_u^T Ψ_21(Ψ_11)^{-1}.

■ Summary:

1. Find the eigenvalues and eigenvectors of the Hamiltonian matrix.

2. Select the eigenvectors associated with the stable eigenvalues and stack them as [Ψ_11; Ψ_21].

3. Compute the steady-state P = Ψ_21(Ψ_11)^{-1}.

4. Compute the steady-state control gain as K = R^{-1} B_u^T Ψ_21(Ψ_11)^{-1}.

■ A “Schur decomposition” algorithm also exists; it is numerically more stable.
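The four-step eigenvector recipe can be sketched numerically for the earlier example system (a sketch only; production solvers use the more robust Schur form):

```python
import numpy as np

# Example system from these notes.
A = np.array([[1.0, 0.0],
              [2.0, 0.0]])
Bu = np.array([[1.0],
               [0.0]])
C = np.array([[0.0, 1.0]])
Q = C.T @ C
Rinv = np.array([[1.0]])  # R = 1

# 1. Hamiltonian matrix and its eigendecomposition.
Z = np.block([[A, -Bu @ Rinv @ Bu.T],
              [-Q, -A.T]])
evals, evecs = np.linalg.eig(Z)

# 2. Stack the eigenvectors of the stable eigenvalues as [Psi11; Psi21].
stable = evecs[:, evals.real < 0]
Psi11, Psi21 = stable[:2, :], stable[2:, :]

# 3-4. Steady-state P and gain K (imaginary parts cancel up to roundoff).
P = np.real(Psi21 @ np.linalg.inv(Psi11))
K = Rinv @ Bu.T @ P
print(P)   # approximately [[3.236, 1], [1, 1.118]]
print(K)   # approximately [[3.236, 1]]
```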


3.5: Symmetric root locus

■ We have found that the closed-loop dynamics are governed by

    [ẋ(t); λ̇(t)] = [A, −B_u R^{-1} B_u^T; −Q, −A^T] [x(t); λ(t)] = Z [x(t); λ(t)].

■ We have seen how to eliminate λ(t) using the solution to a Riccati equation (more later), but the Hamiltonian is still useful since it can tell us where the closed-loop poles are.

  • Closed-loop poles are given by det(sI − Z) = 0.

  • We can evaluate this using the block-matrix determinant identity:

    det[A B; C D] = det(A) det(D − C A^{-1} B).

■ So, substituting terms from the Hamiltonian,

    det(sI − Z) = |sI − A| · det[(sI + A^T) − Q(sI − A)^{-1} B_u R^{-1} B_u^T]
                = |sI − A| · |sI + A^T| · det[I − (sI + A^T)^{-1} Q (sI − A)^{-1} B_u R^{-1} B_u^T] = 0.

■ Note: det(I + ABC) = det(I + CAB), so

    det(sI − Z) = |sI − A| · |sI + A^T| · det[I + R^{-1} B_u^T (−sI − A^T)^{-1} Q (sI − A)^{-1} B_u].

■ Let Q = C_y^T Q_y C_y. Also,

    G_yu(s) = C_y(sI − A)^{-1} B_u
    G_yu^T(−s) = B_u^T(−sI − A^T)^{-1} C_y^T
    D(s) = det(sI − A)
    D(−s) = det(sI + A^T)(−1)^n.


■ Therefore

    det(sI − Z) = (−1)^n D(s) D(−s) det[I + R^{-1} G_yu^T(−s) Q_y G_yu(s)]
                = (−1)^n D(s) D(−s) [1 + R^{-1} Q_y G_yu(s) G_yu(−s)]   if SISO.

■ This is called the “symmetric root locus” for reasons that will become clear.

  • In the SISO case, there will be easy drawing rules.

  • In the MIMO case, we don’t have easy drawing rules, but we still have closed-loop pole symmetry such that if p is a pole of the system, then −p is also a pole of the system.

■ The stable poles are those of the regulator. The “unstable” poles, when solving the differential equation in the forward direction, become stable poles when solving in the backward direction, so they are poles of λ(t).

Symmetric root locus in MATLAB

■ We want to plot the root locus

    1 + (1/ρ) G^T(−s) G(s) = 0.

■ We need to find a way to represent G^T(−s) G(s) as a state-space system in MATLAB:

    G(s) = C(sI − A)^{-1} B + D

and

    G^T(−s) = B^T(−sI − A^T)^{-1} C^T + D^T.

■ This can be represented in block-diagram form as:


[Block diagram: G(s), with input u(t), state x(t), and gains A, B, C, D, in series with G^T(−s), with state λ(t), gains −A^T, B^T, −C^T, D^T, and output y(t).]

■ The overall system has state

    [ẋ(t); λ̇(t)] = [A, 0; −C^T C, −A^T][x(t); λ(t)] + [B; −C^T D] u(t)
    y(t) = [D^T C  B^T][x(t); λ(t)] + D^T D u(t).

function srl(sys)

[A,B,C,D]=ssdata(sys);

bigA=[A zeros(size(A)); -C'*C -A'];

bigB=[B; -C'*D];

bigC=[D'*C B'];

bigD=D'*D;

srlsys=ss(bigA,bigB,bigC,bigD);

rlocus(srlsys);

EXAMPLE: Let

    G(s) = 1 / ((s − 1.5)(s² + 2s + 2)).

Note that G(s) is unstable.

[Figure: symmetric root locus for G(s); the locus is symmetric about both the real and imaginary axes.]
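A quick numeric sketch of the ±symmetry, forming D(s)D(−s) + (1/ρ)N(s)N(−s) for this G(s) at an illustrative weight ρ = 1:

```python
import numpy as np

# Closed-loop pole candidates satisfy D(s)D(-s) + (1/rho) N(s)N(-s) = 0.
# Here G(s) = 1 / ((s - 1.5)(s^2 + 2s + 2)), so N(s) = 1; rho = 1 is illustrative.
D = np.polymul([1.0, -1.5], [1.0, 2.0, 2.0])   # s^3 + 0.5 s^2 - s - 3
Dneg = D * np.array([-1.0, 1.0, -1.0, 1.0])    # D(-s): flip odd-power signs
rho = 1.0
char = np.polymul(D, Dneg)
char[-1] += 1.0 / rho                           # + (1/rho) N(s)N(-s), with N = 1
roots = np.roots(char)

# Symmetry: if p is a root, so is -p; the stable half are the regulator poles.
regulator_poles = roots[roots.real < 0]
print(np.sort_complex(roots))
print(regulator_poles)
```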


3.6: Stochastic LQR

■ An intermediate step to developing controllers for systems with

incomplete, noisy measurements.

■ We consider a noisy system with full state available:

    ẋ(t) = A x(t) + B_u u(t) + B_w w(t),

where w(t) ∼ N(0, S_w) and white; x(0) is stochastic too: E[x(0)] = m_0 and E[x(0) x(0)^T] = Σ_x(0).

Cost functions involving random signals

■ We need to use expectation to come up with a real measure of performance:

    J = E[ ∫_0^{t_f} z^T(t) W(t) z(t) dt ] = ∫_0^{t_f} E[z^T(t) W(t) z(t)] dt.

■ For a finite final time, constant W, and stationary random inputs, this is proportional to

    J = E[z^T(t) W z(t)].

■ We can compute the cost by noticing that J is a scalar and that J = trace{J}:

    J = trace{E[z^T(t) W z(t)]}.

■ The trace operator is invariant under cyclic permutation:

    J = trace{W E[z(t) z^T(t)]} = trace{W Σ_z}.


■ We can compute

    Σ_z = ∫_0^∞ ∫_0^∞ g_cl(τ_1) R_w(τ_1 − τ_2) g_cl^T(τ_2) dτ_1 dτ_2,

where g_cl is the impulse response of the closed-loop system from input w to output z.

■ If w is white with spectral density S_w, this simplifies to

    Σ_z = ∫_0^∞ g_cl(τ) S_w g_cl^T(τ) dτ.

■ We can also compute Σ_z from Σ_x if z(t) = C_z x(t):

    Σ_z = C_z E[x(t) x^T(t)] C_z^T = C_z Σ_x C_z^T.

Then J = trace{W C_z Σ_x C_z^T}, where

    A_cl Σ_x + Σ_x A_cl^T + B_cl S_w B_cl^T = 0

may be solved for Σ_x using lyap.

Finite horizon, white process noise

■ Moving from cost functions in general to the specific LQR case: the cost is now of the form

    J_s = E[ x^T(t_f) H x(t_f) + ∫_0^{t_f} x^T(t) Q x(t) + u^T(t) R u(t) dt ].

■ The controller must be some kind of feedback system (state or output) since x(0) is not known a priori. Linearity in the feedback is assumed (and is optimal if the disturbance inputs are Gaussian).

■ The system state contains all the information about past inputs and

past states that contribute to the future behavior of the system.


■ Also, since w(t) is white, noise inputs are uncorrelated with past inputs and cannot be predicted from past inputs or states.

■ Therefore, the current state contains all the available information on the future behavior of the plant.

■ Therefore, the control at any time instant is the same as for deterministic LQR with unknown initial state.

Finite horizon, colored process noise

■ If we know that the noise is colored, we should augment the plant with the shaping-filter dynamics.

■ Design a stochastic regulator for the augmented system. Same design procedure as deterministic, since the input is white Gaussian.

■ The solution includes feedback on the disturbance dynamics’ “state.” This is nonphysical: we cannot measure x_h(t).

■ Therefore we will need to estimate x_h(t) . . . more on this later with LQG.

Infinite horizon, white process noise

■ Extend the analysis of stochastic LQR to the case of infinite time horizon t_f → ∞.

■ Already did this for deterministic LQR. Had no particular problems because

• The system is asymptotically stable,

• The only disturbance is the initial condition x(0), which goes to zero as t_f → ∞. Therefore J_LQR is finite.

■ Cannot make similar claims for the stochastic case: the continuing driving noise means that even the optimized cost → ∞.

■ Solution: Use a time-averaged cost

J = \lim_{t_f \to \infty} E\left[ \frac{1}{t_f} \int_0^{t_f} x^T(t) Q x(t) + u^T(t) R u(t) \, dt \right].

■ Then, same optimality conditions as for finite horizon case since both

are fixed end-time problems.

■ Consider a time-invariant system.

■ If the control is properly posed, the optimum closed-loop system is stable; x(t) and u(t) are stationary random processes, and we can write

J = \lim_{t \to \infty} E\left[ x^T(t) Q x(t) + u^T(t) R u(t) \right].

■ Provides steady-state mean-square response.

■ Can use this form to analyze controller performance. Consider the control u = -Kx. Then

\dot{x}(t) = (A - B_u K) x(t) + B_w w(t)
y(t) = C_y x(t).

■ Then,

J = \lim_{t \to \infty} E\left[ x^T(t) Q x(t) + x^T(t) K^T R K x(t) \right]
  = \lim_{t \to \infty} E\left[ x^T(t) \left( Q + K^T R K \right) x(t) \right]
  = \lim_{t \to \infty} \operatorname{trace}\left\{ \left( Q + K^T R K \right) E\left[ x(t) x^T(t) \right] \right\}
  = \operatorname{trace}\left\{ \left( Q + K^T R K \right) \Sigma_{x,ss} \right\},

where \Sigma_{x,ss} solves

(A - B_u K) \Sigma_{x,ss} + \Sigma_{x,ss} (A - B_u K)^T + B_w S_w B_w^T = 0.
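As a concrete illustration of this performance measure, the steady-state cost of a candidate gain can be evaluated in Python with SciPy's Lyapunov solver (a sketch: the double-integrator system and the two gains below are invented for the example, not taken from the notes):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lqr_cost(A, Bu, Bw, K, Q, R, Sw):
    """Steady-state cost trace{(Q + K'RK) Sigma_x,ss} for the control u = -Kx."""
    Acl = A - Bu @ K
    assert np.all(np.linalg.eigvals(Acl).real < 0), "closed loop must be stable"
    # Sigma_x,ss solves Acl*Sigma + Sigma*Acl' + Bw*Sw*Bw' = 0
    Sigma = solve_continuous_lyapunov(Acl, -Bw @ Sw @ Bw.T)
    return np.trace((Q + K.T @ R @ K) @ Sigma)

# Illustrative double integrator driven by white noise on the second state
A = np.array([[0.0, 1.0], [0.0, 0.0]])
Bu = np.array([[0.0], [1.0]])
Bw = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); Sw = np.array([[1.0]])

K1 = np.array([[1.0, 1.0]])   # one stabilizing gain
K2 = np.array([[2.0, 2.0]])   # another candidate
print(lqr_cost(A, Bu, Bw, K1, Q, R, Sw),
      lqr_cost(A, Bu, Bw, K2, Q, R, Sw))   # ≈ 2.0 and 1.875 for these choices
```

Comparing the two numbers ranks the candidate gains directly, which is exactly how this cost will later be used to compare LQR against LQG.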

■ We will later use this cost when comparing the performance of LQR

versus LQG.

H2 optimal control

■ A final observation is that we can pose the stochastic LQR problem

as an H2 optimization.

■ The H refers to the Hardy space of all stable, LTI systems.

■ The subscript “2” denotes the applicable system norm.

■ The H2 problem is to find the LTI controller for the plant

\dot{x}(t) = A x(t) + \begin{bmatrix} B_u & I \end{bmatrix} \begin{bmatrix} u(t) \\ w(t) \end{bmatrix}

\begin{bmatrix} m(t) \\ y_1(t) \\ u_1(t) \end{bmatrix} =
\begin{bmatrix} I \\ Q^{1/2} \\ 0 \end{bmatrix} x(t) +
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ R^{1/2} & 0 \end{bmatrix}
\begin{bmatrix} u(t) \\ w(t) \end{bmatrix},

that stabilizes the closed-loop system and minimizes the system 2-norm

J_2 = \left( \int_0^\infty \operatorname{trace}\left\{ g_{cl}^T(t) g_{cl}(t) \right\} dt \right)^{1/2} = \left\| G_{cl} \right\|_2,

where g_{cl}(t) is the impulse response from w(t) to the reference output (a combination of u_1(t) and y_1(t)).

■ Equivalent to the steady-state stochastic regulator with S_w = I. Then, the steady-state mean-square value of the reference output is

E\left[ \begin{bmatrix} y_1^T(\infty) & u_1^T(\infty) \end{bmatrix} \begin{bmatrix} y_1(\infty) \\ u_1(\infty) \end{bmatrix} \right] = E\left[ x^T(\infty) Q x(\infty) + u^T(\infty) R u(\infty) \right],

which is also the square of the closed-loop 2-norm, \| G_{cl} \|_2^2.
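The 2-norm itself can be computed without evaluating the impulse-response integral: for G(s) = C(sI − A)^{-1}B, \|G\|_2^2 = trace(C X C^T), where X is the controllability Gramian. A minimal Python sketch (the helper name h2_norm is mine):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def h2_norm(A, B, C):
    """||G||_2 for G(s) = C (sI - A)^{-1} B via the controllability Gramian:
    A X + X A' + B B' = 0, then ||G||_2^2 = trace(C X C')."""
    X = solve_continuous_lyapunov(A, -B @ B.T)
    return np.sqrt(np.trace(C @ X @ C.T))

# First-order sanity check: G(s) = 1/(s+1) has ||G||_2^2 = int_0^inf e^{-2t} dt = 1/2
A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
print(h2_norm(A, B, C)**2)   # ≈ 0.5
```

This Gramian formula is the state-space counterpart of the J_2 integral above.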

3.7: The LQR problem solved via dynamic programming

■ We now consider the discrete-time LQR problem, and a different

method of solution known as dynamic programming.

• Review from the final chapter of the ECE5520 materials.

■ Dynamic programming solves a multi-step optimization a single step at a time, efficiently eliminating suboptimal paths.

■ Consider the task of finding the lowest-cost route from point x_o to x_f, where there are many possible ways to get there.

[Figure: directed graph of candidate routes from node 1 to node 8, with segment costs J_12, J_15, J_23, J_24, J_36, J_38, J_46, J_47, J_58, J_68, J_78.]

■ Then J^*_{18} = \min\left\{ J_{15} + J_{58},\; J_{12} + J_{24} + J_{46} + J_{68},\; \ldots \right\}.

■ We need to make only one simple observation:

In general, if x_i is an intermediate point between x_o and x_f and x_i is on the optimal path, then J^*_{of} = J_{oi} + J^*_{if}.

■ This is called Bellman’s principle of optimality:

“An optimal path has the property that whatever the initial

conditions and control variables (choices) over some initial

period, the control (or decision variables) chosen over the

remaining period must be optimal for the remaining problem,

with the state resulting from the early decisions taken to be the

initial condition.”
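To make Bellman's principle concrete, here is a toy backward recursion in Python over a route graph like the one above; the numeric edge costs are hypothetical (the notes leave each J_ij symbolic):

```python
# Hypothetical costs for the arcs (i -> j) of the route diagram
edges = {
    (1, 2): 4, (1, 5): 7, (2, 3): 3, (2, 4): 2, (3, 6): 1, (3, 8): 9,
    (4, 6): 4, (4, 7): 6, (5, 8): 5, (6, 8): 2, (7, 8): 1,
}

# Bellman backward recursion: a node's cost-to-go is the min over its
# outgoing arcs of (arc cost + successor's cost-to-go), starting at the goal.
cost_to_go = {8: 0}
for node in (7, 6, 5, 4, 3, 2, 1):          # reverse topological order
    cost_to_go[node] = min(J + cost_to_go[j]
                           for (i, j), J in edges.items() if i == node)
print(cost_to_go[1])                         # J*_18; here 10, via 1-2-3-6-8
```

Each node is examined once, so the recursion never enumerates full paths; that pruning is exactly what the principle of optimality licenses.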

Vector derivatives

■ In the derivation, we will need to take derivatives of vector/matrix quantities.

■ This small dictionary should help: x, y \in \mathbb{R}^n, A \in \mathbb{R}^{n \times n}.

1. \frac{\partial}{\partial x}\left( x^T A y \right) = A y,

2. \frac{\partial}{\partial x}\left( y^T A x \right) = A^T y,

3. \frac{\partial}{\partial x}\left( x^T A x \right) = (A + A^T) x.
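The three identities are easy to verify numerically with central differences; a quick Python check with random data (the helper grad is mine):

```python
import numpy as np

# Illustrative random data for the finite-difference check
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

def grad(f, x0, h=1e-6):
    """Central-difference gradient of a scalar-valued f at x0."""
    g = np.zeros_like(x0)
    for i in range(len(x0)):
        e = np.zeros_like(x0); e[i] = h
        g[i] = (f(x0 + e) - f(x0 - e)) / (2 * h)
    return g

print(np.allclose(grad(lambda v: v @ A @ y, x), A @ y, atol=1e-5))          # rule 1
print(np.allclose(grad(lambda v: y @ A @ v, x), A.T @ y, atol=1e-5))        # rule 2
print(np.allclose(grad(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5))  # rule 3
```

All three comparisons come back True, matching the dictionary above.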

The discrete-time LQR problem

■ We want to choose u_k such that we minimize

J_{i,N} = x_N^T H_d x_N + \sum_{k=i}^{N-1} \left( x_k^T Q_d x_k + u_k^T R_d u_k \right).

■ To find the optimum u_k, we start at the last step and work backwards.

J_{N-1,N} = x_N^T H_d x_N + x_{N-1}^T Q_d x_{N-1} + u_{N-1}^T R_d u_{N-1}.

■ We express x_N as a function of x_{N-1} and u_{N-1} via the system dynamics

J_{N-1,N} = \left( A x_{N-1} + B u_{N-1} \right)^T H_d \left( A x_{N-1} + B u_{N-1} \right) + x_{N-1}^T Q_d x_{N-1} + u_{N-1}^T R_d u_{N-1}
          = x_{N-1}^T A^T H_d A x_{N-1} + u_{N-1}^T B^T H_d B u_{N-1} + x_{N-1}^T A^T H_d B u_{N-1} + u_{N-1}^T B^T H_d A x_{N-1} + x_{N-1}^T Q_d x_{N-1} + u_{N-1}^T R_d u_{N-1}.

■ We minimize over all possible inputs u_{N-1} by differentiation:

\frac{\partial J_{N-1,N}}{\partial u_{N-1}} = 2 B^T H_d B u_{N-1} + 2 B^T H_d A x_{N-1} + 2 R_d u_{N-1} = 0

0 = 2 \left( R_d + B^T H_d B \right) u_{N-1} + 2 B^T H_d A x_{N-1}.

■ Therefore,

u^*_{N-1} = -\left( R_d + B^T H_d B \right)^{-1} B^T H_d A x_{N-1}.

■ The exciting point is that the optimal u_{N-1}, with no constraints on its functional form, turns out to be a linear state feedback! To ease notation, define

K_{N-1} = \left( R_d + B^T H_d B \right)^{-1} B^T H_d A

such that

u^*_{N-1} = -K_{N-1} x_{N-1}.

■ Now, we can express the value of J^*_{N-1,N} as

J^*_{N-1,N} = \left( A x_{N-1} - B K_{N-1} x_{N-1} \right)^T H_d \left( A x_{N-1} - B K_{N-1} x_{N-1} \right) + x_{N-1}^T Q_d x_{N-1} + x_{N-1}^T K_{N-1}^T R_d K_{N-1} x_{N-1}
            = x_{N-1}^T \left[ \left( A - B K_{N-1} \right)^T H_d \left( A - B K_{N-1} \right) + Q_d + K_{N-1}^T R_d K_{N-1} \right] x_{N-1}.

■ Simplify notation once again by defining P_N = H_d and

P_{N-1} = \left( A - B K_{N-1} \right)^T P_N \left( A - B K_{N-1} \right) + Q_d + K_{N-1}^T R_d K_{N-1},

so that

J^*_{N-1,N} = x_{N-1}^T P_{N-1} x_{N-1}.

■ To see that this notation makes sense, notice that

J_{N,N} = J^*_{N,N} = x_N^T P_N x_N \triangleq x_N^T H_d x_N.

■ Now, we take another step backwards and compute the cost J_{N-2,N}

J_{N-2,N} = J_{N-2,N-1} + J_{N-1,N}.

Therefore, the optimal policy (via dynamic programming) is

J^*_{N-2,N} = J_{N-2,N-1} + J^*_{N-1,N}.

■ To minimize this, we realize that N-1 is now the goal state, and

J_{N-2,N-1} + J^*_{N-1,N} = \left( A x_{N-2} + B u_{N-2} \right)^T P_{N-1} \left( A x_{N-2} + B u_{N-2} \right) + x_{N-2}^T Q_d x_{N-2} + u_{N-2}^T R_d u_{N-2}.

■ We can find the best result just as before:

u^*_{N-2} = -K_{N-2} x_{N-2},

where

K_{N-2} = \left( R_d + B^T P_{N-1} B \right)^{-1} B^T P_{N-1} A.

■ In general,

u^*[k] = -K_k x[k],

where

K_k = \left( R_d + B^T P_{k+1} B \right)^{-1} B^T P_{k+1} A

and

P_k = \left( A - B K_k \right)^T P_{k+1} \left( A - B K_k \right) + Q_d + K_k^T R_d K_k.

■ This difference equation for P_k has a starting condition that occurs at the final time, and is solved recursively backwards in time.

EXAMPLE: Simulate a feedback controller for the system

x_{k+1} = \begin{bmatrix} 2 & 1 \\ -1 & 1 \end{bmatrix} x_k + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u_k, \qquad x_0 = \begin{bmatrix} 2 \\ -3 \end{bmatrix}

such that the cost criterion

J = x_{10}^T \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} x_{10} + \sum_{k=1}^{9} \left( x_k^T \begin{bmatrix} 2 & 0 \\ 0 & 0.1 \end{bmatrix} x_k + 2 u_k^2 \right)

is minimized.

■ From the problem, we gather that

P_{10} = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \qquad Q_d = \begin{bmatrix} 2 & 0 \\ 0 & 0.1 \end{bmatrix}, \qquad R_d = [2].

■ Iteratively, solve for K_9, P_9, K_8, P_8, and so forth down to K_1 and P_1. Then, u[k] = -K_k x[k].

A=[2 1; -1 1]; B=[0; 1]; x0=[2; -3];
P=zeros(2,2,10); K=zeros(1,2,9);
x=zeros(2,1,10); x(:,:,1)=x0;
P(:,:,10)=[5 0; 0 5]; R=2; Q=[2 0; 0 0.1];
for i=9:-1:1,
    K(:,:,i)=inv(R+B'*P(:,:,i+1)*B)*B'*P(:,:,i+1)*A;
    P(:,:,i)=(A-B*K(:,:,i))'*P(:,:,i+1)*(A-B*K(:,:,i))+ ...
        Q+K(:,:,i)'*R*K(:,:,i);
end
for i=1:9,
    x(:,:,i+1)=A*x(:,:,i)-B*K(:,:,i)*x(:,:,i);
end
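For readers working outside MATLAB, the same backward recursion can be sketched in Python/numpy (dictionary keys mirror the 1-based indexing of the notes):

```python
import numpy as np

A = np.array([[2.0, 1.0], [-1.0, 1.0]])
B = np.array([[0.0], [1.0]])
Qd = np.array([[2.0, 0.0], [0.0, 0.1]])
Rd = np.array([[2.0]])
N = 10

P = {N: 5.0 * np.eye(2)}                    # P_10 = H_d
K = {}
for k in range(N - 1, 0, -1):               # k = 9, 8, ..., 1
    # K_k = (Rd + B' P_{k+1} B)^{-1} B' P_{k+1} A
    K[k] = np.linalg.solve(Rd + B.T @ P[k + 1] @ B, B.T @ P[k + 1] @ A)
    Acl = A - B @ K[k]
    P[k] = Acl.T @ P[k + 1] @ Acl + Qd + K[k].T @ Rd @ K[k]

x = {1: np.array([[2.0], [-3.0]])}          # the given initial state
for k in range(1, N):
    x[k + 1] = (A - B @ K[k]) @ x[k]
print(P[1])
```

After only nine backward steps, P_1 is already very close to the steady-state Riccati solution, which is the observation the infinite-horizon section below builds on.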

[Figure: simulation results. State vector x[k] versus time sample k; feedback gains K[k] (elements k_1, k_2) versus k; elements of the P matrix (P_11, P_12 = P_21, P_22) versus k.]

3.8: Infinite-horizon discrete-time LQR

■ If we let N → ∞, then P_k tends to a steady-state solution as k → 0. Therefore, K_k → K. This is clearly a much easier control design, and usually does just about as well.

■ To find the steady-state P and K, we let P_k = P_{k+1} = P_{ss} in the above equations:

P_{ss} = \left( A - B K \right)^T P_{ss} \left( A - B K \right) + Q_d + K^T R_d K

and

K = \left( R_d + B^T P_{ss} B \right)^{-1} B^T P_{ss} A,

which may be combined to get

P_{ss} = A^T P_{ss} A - A^T P_{ss} B \left( R_d + B^T P_{ss} B \right)^{-1} B^T P_{ss} A + Q_d,

which is called a (discrete-time) algebraic Riccati equation, and may be solved in MATLAB using dare.m

EXAMPLE: For the previous example (with a finite end time), the solution reached for P_1 was

P_1 = \begin{bmatrix} 49.5336 & 28.5208 \\ 28.5208 & 20.8434 \end{bmatrix}.

In MATLAB, dare(A,B,Q,R) for the same system gives

P_{ss} = \begin{bmatrix} 49.5352 & 28.5215 \\ 28.5215 & 20.8438 \end{bmatrix}.

So, we see that the system settles very quickly to steady-state behavior.
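SciPy provides the analogous solver outside MATLAB; a sketch for the example system, with scipy.linalg.solve_discrete_are playing the role of dare.m:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# The example system from these notes
A = np.array([[2.0, 1.0], [-1.0, 1.0]])
B = np.array([[0.0], [1.0]])
Qd = np.array([[2.0, 0.0], [0.0, 0.1]])
Rd = np.array([[2.0]])

Pss = solve_discrete_are(A, B, Qd, Rd)
# Steady-state gain K = (Rd + B' Pss B)^{-1} B' Pss A
K = np.linalg.solve(Rd + B.T @ Pss @ B, B.T @ Pss @ A)
print(np.round(Pss, 4))
```

The resulting Pss matches the dare values quoted above, and the gain K stabilizes A − BK.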

■ There are many ways to solve the DARE, but when Q_d has the form C^T C and the system is SISO, there is a simple method which yields the optimal closed-loop eigenvalues directly. (Note, when Q_d = C^T C we are minimizing the output energy |y_k|^2.)

Chang–Letov method

■ The optimal eigenvalues are the roots of the equation

1 + \frac{1}{\rho} G^T(z^{-1}) G(z) = 0

which are inside the unit circle, where

G(z) = C \left( zI - A \right)^{-1} B + D.

(Proved earlier for the continuous-time version.)

EXAMPLE: Consider G(z) = \frac{1}{z - 1}, so

1 + \frac{\rho^{-1}}{(z-1)(z^{-1}-1)} = 0

2 + \rho^{-1} - z - z^{-1} = 0

z = 1 + \frac{1}{2\rho} \pm \sqrt{ \frac{1}{4\rho^2} + \frac{1}{\rho} }.
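A quick numerical sanity check of the closed-form roots in Python (the choice ρ = 2 is arbitrary):

```python
import math

rho = 2.0                                  # illustrative control weighting
root = math.sqrt(1 / (4 * rho**2) + 1 / rho)
z_in = 1 + 1 / (2 * rho) - root            # pole inside the unit circle
z_out = 1 + 1 / (2 * rho) + root           # its reciprocal partner

print(2 + 1 / rho - z_in - 1 / z_in)       # ≈ 0: z_in satisfies the equation
print(z_in * z_out)                        # ≈ 1: roots come in reciprocal pairs
print(abs(z_in) < 1)                       # True: this root is the optimal pole
```

For ρ = 2 the roots are exactly 0.5 and 2.0, so the stable root 0.5 is selected as the optimal closed-loop pole.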

■ The locus of optimal pole locations for all ρ forms a reciprocal root locus.

Reciprocal root locus in MATLAB (SISO)

■ We want to plot the root locus

1 + \frac{1}{\rho} G^T(z^{-1}) G(z) = 0,

where

G(z) = C \left( zI - A \right)^{-1} B + D.

■ We know how to plot a root locus of the form

1 + K G'(z) = 0,

so we need to find a way to convert G^T(z^{-1}) G(z) into G'(z).

■ We know that

G^T(z^{-1}) = B^T \left( z^{-1} I - A^T \right)^{-1} C^T + D^T
            = B^T z \left( z I - A^{-T} \right)^{-1} \left( -A^{-T} C^T \right) + D^T.

■ Combining G(z) and G^T(z^{-1}) in block-diagram form:

[Figure: block diagram cascading G(z) (input u[k], state x[k], delays z^{-1}, gains A, B, C, D) with the realization of G^T(z^{-1}) (state ξ[k], output y[k], gains A^{-T}, B^T, -C^T, D^T).]

■ The overall system has state

\begin{bmatrix} x[k+1] \\ \xi[k+1] \end{bmatrix} =
\begin{bmatrix} A & 0 \\ -A^{-T} C^T C & A^{-T} \end{bmatrix}
\begin{bmatrix} x[k] \\ \xi[k] \end{bmatrix} +
\begin{bmatrix} B \\ -A^{-T} C^T D \end{bmatrix} u[k]

y[k] = \begin{bmatrix} -B^T A^{-T} C^T C + D^T C & B^T A^{-T} \end{bmatrix}
\begin{bmatrix} x[k] \\ \xi[k] \end{bmatrix} +
\left( D^T D - B^T A^{-T} C^T D \right) u[k].

function rrl(sys)
[A,B,C,D]=ssdata(sys);
bigA=[A zeros(size(A)); -inv(A)'*C'*C inv(A)'];
bigB=[B; -inv(A)'*C'*D];
bigC=[-B'*inv(A)'*C'*C+D'*C B'*inv(A)'];
bigD=-B'*inv(A)'*C'*D+D'*D;
rrlsys=ss(bigA,bigB,bigC,bigD,-1);
rlocus(rrlsys);
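The structural fact the locus relies on, that the poles of G^T(z^{-1})G(z) come in reciprocal pairs, can be checked numerically. A Python sketch with an invented invertible A and output matrix C (not the notes' example):

```python
import numpy as np

# Illustrative invertible A and output matrix
A = np.array([[0.5, 0.2], [0.0, -0.8]])
C = np.array([[1.0, 1.0]])
Ait = np.linalg.inv(A).T

# The same augmented A matrix that rrl builds; it is block (lower-)triangular,
# so its eigenvalues are those of A together with those of A^{-T}
bigA = np.block([[A, np.zeros_like(A)], [-Ait @ C.T @ C, Ait]])
eigs = np.linalg.eigvals(bigA)

for z in eigs:                             # every eigenvalue has a 1/z partner
    assert np.min(np.abs(eigs - 1 / z)) < 1e-9
print(sorted(np.abs(eigs)))
```

Here the eigenvalues are {0.5, −0.8} together with their reciprocals {2, −1.25}, exactly the symmetry a reciprocal root locus exhibits.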

EXAMPLE: Let

G(z) = \frac{(z + 0.25)(z^2 + z + 0.5)}{(z - 0.2)(z^2 - 2z + 2)}.

Note that G(z) is unstable.

[Figure: reciprocal root locus for this G(z).]

OBSERVATIONS: For the "expensive cost of control" case, stable poles remain where they are and unstable poles are mirrored into the unit disc. (They are not moved to be just barely stable, as we might expect!)

■ For the "cheap cost of control" case, poles migrate to the finite zeros of the transfer function, and to the origin (deadbeat control).
