
Page 1

ME233 Advanced Control II

Lecture 1

Dynamic Programming &

Optimal Linear Quadratic Regulators (LQR)

(ME233 Class Notes DP1-DP4)

Page 2

Outline

1. Dynamic Programming

2. Simple multi-stage example

3. Solution of the finite-horizon optimal Linear Quadratic Regulator (LQR)

Page 3

Dynamic Programming

Invented by Richard Bellman in 1953

• From IEEE History Center: Richard Bellman:

– “His invention of dynamic programming in 1953 was a major breakthrough in the theory of multistage decision processes…”

– “A breakthrough which set the stage for the application of functional equation techniques in a wide spectrum of fields…”

– “…extending far beyond the problem-areas which provided the initial motivation for his ideas.”

Page 4

Dynamic Programming

Invented by Richard Bellman in 1953

• From IEEE History Center: Richard Bellman:

– In 1946 he entered Princeton as a graduate student, at age 26.

– He completed his Ph.D. degree in a record time of three months.

– His Ph.D. thesis, entitled "Stability Theory of Differential Equations" (1946), was subsequently published as a book in 1953, and is regarded as a classic in its field.

Page 5

Dynamic Programming

We will use dynamic programming to derive the solution of:

• Discrete time LQR and related problems

• Discrete time Linear Quadratic Gaussian (LQG) controller.

– Optimal estimation and regulation

Page 6

Dynamic Programming Example

Illustrative Example:

Find "optimal" path from A to B by moving only to the right.

[Figure: grid of nodes from A (left) to B (right), each edge labeled with its cost. Annotation at a highlighted node: "if we are here…", moves continuing to the right are "ok", moving back to the left is "not ok".]

Page 7

Dynamic Programming Example

Illustrative Example:

Find "optimal" path from A to B by moving only to the right.

[Figure: same grid of edge costs.]

• The number next to each line is the "cost" of going along that particular path.

Page 8

Dynamic Programming Example

Illustrative Example:

Find "optimal" path from A to B by moving only to the right.

[Figure: same grid, annotated with values 7 and 11.]

Page 9

Dynamic Programming Example

Illustrative Example:

Find "optimal" path from A to B by moving only to the right.

[Figure: same grid, annotated with candidate costs 16 and 23.]

Page 10

Dynamic Programming Example

Illustrative Example:

Find optimal path from A to B by moving only to the right.

[Figure: same grid of edge costs.]

• Optimal path from A to B is the one with the smallest overall cost.

• There are 70 possible routes starting from A.

Page 11

Dynamic Programming

Key idea:

• Convert a single “large” optimization problem into a series of “small” multistage optimization problems.

– Principle of optimality: “From any point on an optimal trajectory, the remaining trajectory is optimal for the corresponding problem initiated at that point.”

– Optimal Value Function: Compute the optimal value of the cost from each state to the final state.

Page 12

Dynamic Programming Example

Illustrative Example:

• Use principle of optimality
• Compute Optimal Value Function and optimal control at each state
• Start from the final state B

[Figure: same grid, annotated with values 9, 12, 7, and 11 near B.]

Assume that we are here… determine the optimal path from this node to B.

Page 13

Dynamic Programming Example

Illustrative Example:

• Use principle of optimality
• Compute Optimal Value Function and optimal control at each state
• Start from the final state B

[Figure: same grid, annotated with values 9, 12, 7, and 11 near B.]

two options:

7 + 9 = 16

11 + 12 = 23

Page 14

Dynamic Programming Example

Illustrative Example:

• Use principle of optimality
• Compute Optimal Value Function and optimal control at each state
• Start from the final state B

[Figure: same grid, annotated with values 9 and 12 near B.]

Assign:

• optimal path
• optimal cost: 16

Page 15

Dynamic Programming Example

Illustrative Example:

• Use principle of optimality
• Compute Optimal Value Function and optimal control at each state
• Start from the final state B

[Figure: same grid; nodes already processed are labeled with optimal costs-to-go 16, 15, and 12.]

Continue…

two options:

10 + 15 = 25
12 + 16 = 28

Page 16

Dynamic Programming Example

Illustrative Example:

• Use principle of optimality
• Compute Optimal Value Function and optimal control at each state
• Start from the final state B

[Figure: same grid, annotated with values 9, 12, 16, and 15 near B.]

Continue…

10 + 15 = 25
12 + 16 = 28

Assign optimal cost: 25

Page 17

Dynamic Programming Example

Continue until A is reached. The value assigned to A is the optimal cost of the path from A to B.
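As a sketch of this backward pass in code: a minimal Python version of the same idea on a small stage graph. The edge costs below are made up for illustration (the figure's exact grid is not fully recoverable here), but the mechanics of the recursion are the ones described above.

```python
# Backward dynamic programming on a small multistage graph.
# cost[(i, j)] = cost of the edge from node i at the current stage
# to node j at the next stage.  Edge costs are illustrative only.
stages = [
    {(0, 0): 10, (0, 1): 13},                       # A -> stage 1
    {(0, 0): 7, (0, 1): 11, (1, 0): 6, (1, 1): 9},  # stage 1 -> stage 2
    {(0, 0): 9, (1, 0): 12},                        # stage 2 -> B
]

V = {0: 0}           # optimal value function, initialized at B
policy = []          # optimal successor choice at each stage
for edges in reversed(stages):
    V_new, best = {}, {}
    for (i, j), c in edges.items():
        total = c + V[j]              # edge cost + optimal cost-to-go
        if i not in V_new or total < V_new[i]:
            V_new[i], best[i] = total, j
    policy.append(best)
    V = V_new
policy.reverse()

print("optimal cost from A:", V[0])   # 26 for these illustrative costs
```

Note how each stage only compares the immediate edge cost plus the already-computed optimal cost-to-go, exactly the "two options" comparison on the previous slides.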

Page 18

LTI Optimal Regulators

• State-space description of a discrete-time LTI system (for now, everything is deterministic)

• Find the "optimal" control that drives the state to the origin ("optimal" in some sense, to be defined later…)

Page 19

Finite Horizon LQ optimal regulator

Consider the nth-order discrete-time LTI system:

$$x(k+1) = A\,x(k) + B\,u(k), \qquad x(0) = x_o$$

We want to find the optimal control sequence $u^o(0), u^o(1), \ldots, u^o(N-1)$ which minimizes the cost functional:

$$J = \frac{1}{2}\,x^T(N)\,S\,x(N) + \frac{1}{2}\sum_{k=0}^{N-1}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right]$$

Page 20

Finite Horizon LQ optimal regulator

Consider the nth-order discrete-time LTI system $x(k+1) = A\,x(k) + B\,u(k)$ with $x(0) = x_o$.

Notice that the value of the cost depends on the initial condition $x_o$. To emphasize the dependence on $x_o$, the cost can be written as $J(x_o)$.

Page 21

LQ Cost Functional:

$$J = \frac{1}{2}\,x^T(N)\,S\,x(N) + \frac{1}{2}\sum_{k=0}^{N-1}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right]$$

• N: total number of steps, the "horizon"

• the terminal term $\frac{1}{2}x^T(N)\,S\,x(N)$ penalizes the final state deviation from the origin

• the sum penalizes the transient state deviation from the origin and the control effort

• S, Q, and R are symmetric, with $S \succeq 0$, $Q \succeq 0$, $R \succ 0$

Page 22

LQ Cost Functional:

Simplified nomenclature: write, for instance,

$$\phi\big(x(N)\big) = \frac{1}{2}\,x^T(N)\,S\,x(N) \quad \text{(final state cost)}$$

$$L\big(x(k),u(k)\big) = \frac{1}{2}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right] \quad \text{(transient cost at each step)}$$

so that $J = \phi(x(N)) + \sum_{k=0}^{N-1} L(x(k),u(k))$.

Page 23

Additional notation

For $0 \le m \le N-1$ define:

Optimal control sequence from instant m: $u^o_m = \{u^o(m), u^o(m+1), \ldots, u^o(N-1)\}$

Arbitrary control sequence from instant m: $u_m = \{u(m), u(m+1), \ldots, u(N-1)\}$

Page 24

Dynamic Programming

Optimal cost functional:

$$J^o_0\big(x(0)\big) = \min_{u_0} J_0\big(x(0), u_0\big)$$

• $u_0$: control sequence from instant 0

• the optimal cost is a function of the initial state x(0)

Page 25

Optimal Incremental Cost Function

For $0 \le m \le N-1$ define the optimal cost function from state x(m) at instant m:

$$J^o_m\big(x(m)\big) = \min_{u_m}\left\{\frac{1}{2}\,x^T(N)\,S\,x(N) + \frac{1}{2}\sum_{k=m}^{N-1}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right]\right\}$$

where $u_m$ is the control sequence from instant m.

Page 26

Optimal Cost Function

Optimal cost function at the final state x(N):

$$J^o_N\big(x(N)\big) = \frac{1}{2}\,x^T(N)\,S\,x(N)$$

… only a function of the final state x(N)

Page 27

Dynamic Programming

Optimal value function:

For $m = N-1, N-2, \ldots, 0$, the principle of optimality gives

$$J^o_m\big(x(m)\big) = \min_{u(m)}\left\{\frac{1}{2}\,x^T(m)\,Q\,x(m) + \frac{1}{2}\,u^T(m)\,R\,u(m) + J^o_{m+1}\big(x(m+1)\big)\right\}$$

Page 28

Dynamic Programming

Optimal value function (Bellman equation), with $x(m+1) = A\,x(m) + B\,u(m)$ substituted:

$$J^o_m\big(x(m)\big) = \min_{u(m)}\left\{\frac{1}{2}\,x^T(m)\,Q\,x(m) + \frac{1}{2}\,u^T(m)\,R\,u(m) + J^o_{m+1}\big(A\,x(m) + B\,u(m)\big)\right\}$$

Page 29

Dynamic Programming

Optimal value function:

$$J^o_m\big(x(m)\big) = \min_{u(m)}\left\{\frac{1}{2}\,x^T(m)\,Q\,x(m) + \frac{1}{2}\,u^T(m)\,R\,u(m) + J^o_{m+1}\big(A\,x(m) + B\,u(m)\big)\right\}$$

given x(m), the terms inside the braces are only functions of u(m) !!

→ only an optimization with respect to a single vector

Page 30

Bellman Equation

1. The Bellman equation can be solved recursively (backwards), starting from N, with boundary condition $J^o_N(x(N)) = \frac{1}{2}\,x^T(N)\,S\,x(N)$

2. Each iteration involves only an optimization with respect to a single variable (u(m)) – multistage optimization

Page 31

Recursive Solution to the Bellman Equation

Boundary condition (a known function of x(N)):

$$J^o_N\big(x(N)\big) = \frac{1}{2}\,x^T(N)\,S\,x(N)$$

For $m < N$, the functions $J^o_m(\cdot)$ are not yet known.

Page 32

Recursive Solution to the Bellman Equation

Start with N−1: find $J^o_{N-1}(x(N-1))$ by solving:

$$J^o_{N-1}\big(x(N-1)\big) = \min_{u(N-1)}\left\{\frac{1}{2}\,x^T(N-1)\,Q\,x(N-1) + \frac{1}{2}\,u^T(N-1)\,R\,u(N-1) + J^o_N\big(A\,x(N-1) + B\,u(N-1)\big)\right\}$$

• assume that x(N−1) is given

• $u^o(N-1)$ will be a function of x(N−1)

• $J^o_N(\cdot)$ is a known function of its argument

Page 33

Recursive Solution to the Bellman Equation

Continue with N−2: find $J^o_{N-2}(x(N-2))$ by solving:

$$J^o_{N-2}\big(x(N-2)\big) = \min_{u(N-2)}\left\{\frac{1}{2}\,x^T(N-2)\,Q\,x(N-2) + \frac{1}{2}\,u^T(N-2)\,R\,u(N-2) + J^o_{N-1}\big(A\,x(N-2) + B\,u(N-2)\big)\right\}$$

• assume that x(N−2) is given

• $u^o(N-2)$ will be a function of x(N−2)

• $J^o_{N-1}(\cdot)$ is a known function of x(N−1)

Page 34

Solving the Bellman Equation for an LQR

1) $J^o_N\big(x(N)\big) = \frac{1}{2}\,x^T(N)\,S\,x(N)$

2) $L\big(x(k),u(k)\big) = \frac{1}{2}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right]$

Both are quadratic functions.

Page 35

Minimization of quadratic functions

For $f(u) = \frac{1}{2}\,u^T\Lambda u + u^T b + c$ with $\Lambda = \Lambda^T \succ 0$, we have that:

$$\min_u f(u) = c - \frac{1}{2}\,b^T\Lambda^{-1}b$$

• Optimal u given by $u^o = -\Lambda^{-1}b$

Proof: completing the square.
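The completing-the-square step, written out with the same generic symbols $\Lambda$, $b$, $c$:

$$f(u) = \frac{1}{2}\,u^T\Lambda u + u^T b + c = \frac{1}{2}\big(u + \Lambda^{-1}b\big)^T \Lambda \big(u + \Lambda^{-1}b\big) + c - \frac{1}{2}\,b^T\Lambda^{-1}b$$

Since $\Lambda \succ 0$, the first term is nonnegative and vanishes exactly at $u^o = -\Lambda^{-1}b$, which yields both the minimizer and the minimum value $c - \frac{1}{2}b^T\Lambda^{-1}b$.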

Page 36

Minimization of quadratic functions (continued)

For $\Lambda = \Lambda^T \succ 0$ we have that $\min_u\left\{\frac{1}{2}\,u^T\Lambda u + u^Tb + c\right\} = c - \frac{1}{2}\,b^T\Lambda^{-1}b$

• Optimal u given by $u^o = -\Lambda^{-1}b$

Proof: the quadratic term $\frac{1}{2}(u + \Lambda^{-1}b)^T\Lambda(u + \Lambda^{-1}b)$ is nonnegative, with equality iff $u = -\Lambda^{-1}b$.

Page 37

Finite-horizon LQR solution

$$u^o(k) = -K(k)\,x(k), \qquad K(k) = \left[R + B^T P(k+1)\,B\right]^{-1} B^T P(k+1)\,A$$

$$J^o_k\big(x(k)\big) = \frac{1}{2}\,x^T(k)\,P(k)\,x(k)$$

where P(k) is computed backwards in time using the discrete Riccati difference equation:

$$P(k) = Q + A^T P(k+1)\,A - A^T P(k+1)\,B\left[R + B^T P(k+1)\,B\right]^{-1} B^T P(k+1)\,A, \qquad P(N) = S$$
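A minimal numpy sketch of this backward recursion (the function name and interface here are my own, not from the class notes):

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, S, N):
    """Backward Riccati recursion for the finite-horizon LQR.

    Returns gains K[0..N-1] and cost matrices P[0..N], where
    u(k) = -K[k] @ x(k) and J_k(x) = 0.5 * x.T @ P[k] @ x.
    """
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = S                                  # boundary condition P(N) = S
    for k in range(N - 1, -1, -1):
        # K(k) = [R + B' P(k+1) B]^{-1} B' P(k+1) A
        K[k] = np.linalg.solve(R + B.T @ P[k + 1] @ B,
                               B.T @ P[k + 1] @ A)
        # Riccati difference equation for P(k)
        P[k] = Q + A.T @ P[k + 1] @ A - A.T @ P[k + 1] @ B @ K[k]
    return K, P
```

Each pass through the loop is one Bellman iteration: the minimization over u(k) has already been carried out in closed form, which is what makes the LQR case "easily solved".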

Page 38

Proof of finite-horizon LQR solution

Proof (by induction on decreasing k):

Let $J^o_{k+1}\big(x(k+1)\big) = \frac{1}{2}\,x^T(k+1)\,P(k+1)\,x(k+1)$

(Trivially holds for k = N−1, by definition of $P(N) = S$.)

Page 39

Proof of finite-horizon LQR solution (continued)

Page 40

Proof of finite-horizon LQR solution (continued)

The Bellman equation gives:

$$J^o_k\big(x(k)\big) = \min_{u(k)}\left\{\frac{1}{2}\,x^T(k)\,Q\,x(k) + \frac{1}{2}\,u^T(k)\,R\,u(k) + \frac{1}{2}\big[A\,x(k) + B\,u(k)\big]^T P(k+1)\big[A\,x(k) + B\,u(k)\big]\right\}$$

Page 41

Proof of finite-horizon LQR solution (continued)

Using the results for quadratic optimizations (with $\Lambda = R + B^T P(k+1) B$ and $b = B^T P(k+1) A\,x(k)$):

$$u^o(k) = -K(k)\,x(k), \qquad J^o_k\big(x(k)\big) = \frac{1}{2}\,x^T(k)\,P(k)\,x(k)$$

where

$$K(k) = \left[R + B^T P(k+1)\,B\right]^{-1} B^T P(k+1)\,A$$

$$P(k) = Q + A^T P(k+1)\,A - A^T P(k+1)\,B\left[R + B^T P(k+1)\,B\right]^{-1} B^T P(k+1)\,A$$

Page 42

Example – Double Integrator

[Block diagram: u(k) → ZOH → u(t) → 1/s → v(t) → 1/s → x(t); velocity v and position x sampled with period T.]

Double integrator with ZOH and sampling time T = 1:

$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} + \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}u(k)$$

$x_1$: position, $x_2$: velocity

Page 43

Example – Double Integrator

Choose the LQR cost:

$$J = \frac{1}{2}\,x^T(N)\,S\,x(N) + \frac{1}{2}\sum_{k=0}^{N-1}\left[x_1^2(k) + R\,u^2(k)\right], \qquad Q = \begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}$$

only penalize position $x_1$ and control u

Page 44

Example – Double Integrator (DI)

Compute P(k) for an arbitrary terminal weight P(N) and horizon N. Computing backwards:

$$P(k) = Q + A^T P(k+1)\,A - A^T P(k+1)\,B\left[R + B^T P(k+1)\,B\right]^{-1} B^T P(k+1)\,A$$
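A short numpy sketch of this computation for the double integrator (R = 10 as in the cases below; the terminal weight P(N) = 0 is an arbitrary choice for illustration):

```python
import numpy as np

# Double integrator with ZOH, T = 1 (from the example above)
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.5],
              [1.0]])
Q = np.diag([1.0, 0.0])        # penalize position x1 only
R = np.array([[10.0]])
P = np.zeros((2, 2))           # arbitrary terminal weight P(N)
N = 10

# Backward Riccati recursion, k = N-1, ..., 0
for k in range(N - 1, -1, -1):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

print(P)   # P(0); its entries P11, P12, P22 are what the plots below show
```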

Page 45

Example – DI Finite Horizon Case 1

• N = 10, R = 10,

[Plot: P11(k), P12(k), P22(k) versus k, for k = 0…10.]

Page 46

Example – DI Finite Horizon Case 2

• N = 30, R = 10,

[Plot: P11(k), P12(k), P22(k) versus k, for k = 0…30.]

Page 47

Example – DI Finite Horizon Case 3

• N = 30, R = 10,

[Plot: P11(k), P12(k), P22(k) versus k, for k = 0…30.]

Page 48

Example – DI Finite Horizon

Observation:

In all cases, regardless of the choice of the terminal weight P(N), when the horizon N is sufficiently large, the backwards computation of the Riccati equation always converges to the same solution.

We will return to this important idea in a few lectures.
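One way to check this numerically for the double-integrator example: iterate the recursion over a long horizon and compare with the steady-state solution returned by scipy's solve_discrete_are (a sketch; the 200-step horizon is an arbitrary stand-in for "sufficiently large"):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[10.0]])

# Steady-state solution of the discrete algebraic Riccati equation
P_inf = solve_discrete_are(A, B, Q, R)

# Backward Riccati recursion from an arbitrary terminal weight
P = np.zeros((2, 2))
for _ in range(200):                  # "sufficiently large" horizon
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

print(np.allclose(P, P_inf))          # True: the recursion reaches the same limit
```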

Page 49

Properties of Matrix P(k)

P(k) satisfies:

1) $P(k) = P^T(k)$ (symmetric)

2) $P(k) \succeq 0$ (positive semi-definite)

Page 50

Properties of Matrix P(k)

1) $P(k) = P^T(k)$ (symmetric)

Proof: (by induction on decreasing k)

Base case, k = N: $P(N) = S = S^T$.

For $k < N$: transpose both sides of the Riccati difference equation; if $P(k+1) = P^T(k+1)$, the right-hand side is unchanged, so $P(k) = P^T(k)$.

Page 51

Properties of Matrix P(k)

2) $P(k) \succeq 0$ (positive semi-definite)

Proof: (by induction on decreasing k)

Base case, k = N: $P(N) = S \succeq 0$.

For $k < N$: algebra puts the Riccati equation in the form

$$P(k) = Q + K^T(k)\,R\,K(k) + \big[A - B\,K(k)\big]^T P(k+1)\,\big[A - B\,K(k)\big]$$

a sum of positive semi-definite terms whenever $P(k+1) \succeq 0$.
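A quick numerical sanity check of both properties, reusing the double-integrator recursion from earlier (a sketch, not a proof):

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[10.0]])

P = np.zeros((2, 2))                      # P(N) = S: symmetric and PSD
for _ in range(30):                       # k = N-1, ..., 0
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K
    assert np.allclose(P, P.T)            # 1) symmetric
    assert np.linalg.eigvalsh(P).min() >= -1e-9   # 2) positive semi-definite
```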

Page 52

Summary

• Bellman's invention of dynamic programming was a major breakthrough in the theory of multistage decision processes and optimization

• Key ideas

– Principle of optimality

– Computation of optimal cost function

• Illustrated with a simple multi-stage example

Page 53

Summary

• Bellman’s equation:

– has to be solved backwards in time

– may be difficult to solve

– the solution yields a feedback law

Page 54

Summary

Linear Quadratic Regulator (LQR)

• Bellman’s equation is easily solved

• Optimal cost is a quadratic function of the state

• The matrix P(k) is computed from a Riccati difference equation

• Optimal control is a linear time-varying feedback law