Lecture 16: Switched Linear QuadraticRegulation
1 / 37
Optimal Control of D-T Nonswitched Systems
A discrete-time controlled dynamical system
xk+1 = f (xk , uk), k = 0, 1, . . .
Problem: Given a time horizon k ∈ N ∪ {∞}, find the optimal inputsequence u = (u0, . . . , uN−1) that minimizes
J(u) =N−1∑k=0
`(xk , uk) + φ(xN)
• Running cost `(xk , uk) ≥ 0 and terminal cost φ(xN) ≥ 0
• State constraint x ∈ X and control constraint u ∈ U
2 / 37
D-T Linear Quadratic Regulation
A discrete-time LTI system with given initial condition x0:
xk+1 = Axk + Buk
yk = Cxk + Duk
Problem: find optimal input sequence u = (u0, . . . , uN−1) that minimizes
J(u) =N−1∑k=0
(xTk Qxk + uTk Ruk
)︸ ︷︷ ︸
running cost
+ xTN Qf xN︸ ︷︷ ︸terminal cost
• State weight matrix Q = QT � 0
• Control weight matrix R = RT � 0 (no free control)
• Terminal state weight matrix Qf = QTf � 0
3 / 37
Switched Linear Quadratic Regulation
Problem: For the switched linear system
xk+1 = Aσkxk + Bσkuk
yk = Cσkxk + Dσkuk
Find the optimal input sequence (u0, . . . , uN−1) and mode sequence(σ0, . . . , σN−1) that minimize the cost function
N−1∑k=0
(xTk Qσkxk + uTk Rσkuk
)+ xTN Qf xN
• Both dynamics and costs are different in different modes
• Special case: no continuous input (Bi = 0, Di = 0 for all i)
• Challenge: number of mode sequences increases exponentially with N
4 / 37
Example: Energy Efficient Building Control
Zone3
Zone2
Zone11clQ
1htQ
2clQ
2htQ
3clQ
3htQ
• Goal:
1 Maintain zone temperatures near setpoints (loss-of-productivity cost)2 Less energy consumption (energy cost)
• Control: cooling/heating equipments (some on/off only)
5 / 37
Solution Roadmap
• Solve LQR problem using dynamic programming method
• Extend the method to solve SLQR problem
• Complexity reduction techniques
LQR as a constrained optimization problem:
Minimize
[N−1∑k=0
(xTk Qxk + uTk Ruk
)+ xTN Qf xN
]subject to xk+1 = Axk + Buk , k = 0, . . . ,N − 1
x0 fixed
6 / 37
Direct Solution
LQR as a least square problem:
Minimize xT
Q
Q. . .
Qf
︸ ︷︷ ︸
Q
x + uT
R
R. . .
R
︸ ︷︷ ︸
R
u
such that
x1x2...xN
︸ ︷︷ ︸
x
=
B 0 · · ·AB B 0 · · ·
......
. . .
AN−1B AN−2B · · · B
︸ ︷︷ ︸
G
u0u1...
uN−1
︸ ︷︷ ︸
u
+
AA2
...AN
︸ ︷︷ ︸
H
x0
Optimal control sequence is u∗ = −(R + GTQG)−1GTQHx0
• Not numerically viable for large N (e.g., N =∞)
7 / 37
Dynamical Programming Approach
Idea: Solve a sequence of optimal control problems over time horizons{t, . . . ,N}, for t = N,N − 1, . . . , 0
• Value function at time t is the optimal cost over horizon {t, . . . ,N}:
Vt(x) = minut ,ut+1,...,uN−1
N−1∑k=t
(xTk Qxk + uTk Ruk
)+ xTN Qf xN
with the initial condition xt = x
• Value function iteration: Vt−1(·) can be computed based on Vt(·)
• Optimal cost of original problem is V0(x0)
• Optimal input sequence can be recovered from value functions
8 / 37
Motivating Example
• Start from point A
• End at point B
• Each step only move right
• Cost labeled on each edge
Problem: find the path from A to B with the least cost
• For `-by-` grid, total number of legal paths is (2`)!(`!)2
9 / 37
Value Function
Value function V(z) at node z is the least cost to reach B from z
Value function iteration: V (z) satisfies
V (z) = min{wu + V (z ′u),wd + V (z ′d)}
Principle of Optimality: If x∗0 = z → x∗1 → x∗2 → · · · → B is a least-cost pathfrom z to B, then any truncation of it is also a least-cost pathe.g., x∗1 → x∗2 → · · · → B is a least-cost path from x∗1 to B
Reduced computational complexity: for `-by-` grid, only need to compute `2
value functions, each with fixed complexity
10 / 37
Iteration Results
• Stage 1: compute the value functions from right to left
• Stage 2: recover the least-cost path from left to right
11 / 37
Value Functions of LQR Problem
Value function at time t ∈ {0, 1, . . . ,N} and state x ∈ Rn is
Vt(x) = minut ,ut+1,...,uN−1
{N−1∑k=t
(xTk Qxk + uTk Ruk
)+ xTN Qf xN
∣∣∣∣ xt = x
}• Vt(x) is the optimal cost from time t to N starting from xt = x
• V0(x0) is the optimal cost of the original LQR problem
Bellman equation: (value function iteration)
Vt(x) = minut=v
[xTQx + vTRv︸ ︷︷ ︸current running cost
+Vt+1(Ax + Bv)︸ ︷︷ ︸cost to go
]• Principle of Optimality: cost-to-go from next state xt+1 is also optimal
12 / 37
Value Function Iteration
• Value function at time N is quadratic: VN(x) = xTQf x
• Suppose Vt+1(x) = xTPt+1x is quadratic, then
Vt(x) = minv
[xTQx + vTRv + Vt+1(Ax + Bv)
]= min
v
[xv
]T [Q + ATPt+1A ATPt+1BBTPt+1A R + BTPt+1B
] [xv
]= xTPtx
where Pt is the Schur complement of R + BTPt+1B in the block matrix:
Pt := ρ(Pt+1) = Q + ATPt+1A− ATPt+1B(R + BTPt+1B)−1BTPt+1A︸ ︷︷ ︸Riccati mapping
• Minimum achieved by the optimal control
u∗t = − (R + BTPt+1B)−1BTPt+1A︸ ︷︷ ︸Kalman gain
x = −Ktx
13 / 37
Useful Properties of Riccati Mapping
Riccati mapping ρ : S+ → S+ maps PSD matrices to PSD matrices:
Pt = Q + ATPt+1A− ATPt+1B(R + BTPt+1B)−1BTPt+1A
• derived from xTPtx = minv
[xv
]T [Q + ATPt+1A ATPt+1B
BTPt+1 R + BTPt+1B
] [xv
]
• ρ is PSD-monotone:
P � P ′ implies ρ(P) � ρ(P ′)
• ρ is PSD-concave, i.e., for P,P ′ ∈ S+ and θ ∈ [0, 1]
ρ(θP + (1− θ)P ′
)� θ · ρ(P) + (1− θ) · ρ(P ′)
“Kalman filtering with intermittent observations,” Sinopoli, B. et al, IEEE Trans.Automatic Control, vol. 49, no. 9, pp. 1453-1464, 2004.
14 / 37
LQR Solution Algorithm
Stage 1: Compute the value functions backward in time:
Set PN = Qf
for t = N − 1, . . . , 1, 0 doPt = Q + ATPt+1A− ATPt+1B(R + BTPt+1B)−1BTPt+1A
end for
Return xT0 P0x0 as the optimal cost
Stage 2: Compute the optimal controls and state solution forward in time:
Set x∗0 = x0for t = 0, 1, . . . ,N − 1 dou∗t = −(R + BTPt+1B)−1BTPt+1Ax
∗t
x∗t+1 = Ax∗t + Bu∗tend for
Return u∗t and x∗t as the optimal control and state sequences
15 / 37
Inverted Pendulum: Model
States x =[z θ z θ
]T:
• θ: angle of pendulum
• θ: angular velocity of pendulum
• z : horizontal position of cart
• z : velocity of cart
Linearized dynamics near xe =[0 0 0 0
]T(sampled at T = 0.5 s):
xk+1 =
1 0.1054 0.06128 0.016880 5.753 −1.859 1.1540 0.5287 −0.1617 0.10370 27.31 −9.328 5.665
︸ ︷︷ ︸
A
xk +
0.057630.24420.15261.225
︸ ︷︷ ︸
B
uk
16 / 37
Inverted Pendulum: Stabilization
x0 =[−1 0 0 0
]TGoal: Find u0, . . . , uN−1 to minimize
J = α
N∑k=1
‖xk‖2 + β
N−1∑k=0
|uk |2
• LQR formulation: Q = Qf = αI , R = β
• Stabilization with energy consideration
17 / 37
Inverted Pendulum: Solutions
18 / 37
Infinite Horizon Case (N =∞)
• If (A,B) is stabilizable, then Riccati iteration will converge to asolution Pss of the Algebraic Riccati Equation (ARE):
Pss = Q + ATPssA− ATPssB(R + BTPssB)−1BTPssA
• If further (Q,A) is detectable, then Pss is unique, and under thesteady-state optimal control gain
Kss = (R + BTPssB)−1BTPssA,
the closed-loop system Acl = A− BKss is stable
• Hence stabilization can be studied via LQR
“On the discrete time matrix Riccati equation of optimal control,” P.E. Caines andD.Q. Mayne, Int. J. Control, vol. 12, no. 5, pp. 785-794, 1970.
19 / 37
Switched LQR Problem
Find control sequence u0, . . . , uN−1 and mode sequence σ0, . . . , σN−1 to
minimize
[N−1∑k=0
(xTk Qσkxk + uTk Rσkuk
)+ xTN Qf xN
]subject to xk+1 = Aσkxk + Bσkuk , k = 0, . . . ,N − 1, and fixed x0
Value function at each t = 0, 1, . . . ,N and x is the optimal cost overhorizon {t, . . . ,N} assuming xt = x
Vt(x) = minσt ,...,σN−1ut ,...,uN−1
N−1∑k=t
(xTk Qσkx
Tk + uTk Rσkuk
)+ xTN Qf xN
• V0(x0) is the optimal cost of the original SLQR problem
• Vt(x) does not depend on mode (why?)
20 / 37
SLQR Bellman Equation
Value functions can be obtained from the iteration
Vt(x) = minσt=σut=v
[xTQσx + vTRσv + Vt+1(Aσx + Bσv)
]• Optimal control and mode are the ones achieving minimum above
• Optimal state feedback switching policy σ∗t (x)
• Optimal state feedback control policy u∗t (x)
• Value functions are in general not quadratic
“On the value functions of the discrete-time switched LQR problem,” W. Zhang, J. Hu,and A. Abate, TAC 2009.
21 / 37
Value Function at t = N − 1
VN−1(x) = minσ
minv
[xTQσx + vTRσv + VN(Aσx + Bσv)
]︸ ︷︷ ︸
Bellman equation of σ-th subsystem
= minσ
[xTρσ(Qf )x
]Here, ρσ is the Riccati mapping of subsystem (Aσ,Bσ) with weights Qσ,Rσ
22 / 37
Optimal Control at t = N − 1
VN−1(x) is pointwise minimum of a number of quadratic functions
VN−1(x) = minP∈PN−1
xTPx
where PN−1 = {ρ1(Qf ), . . . , ρM(Qf )} := ρM(Qf )
• State space partitioned into sectors (cones)
• Each sector has an optimal mode and aKalman gain
• May have multiple sectors with differentKalman gains for a mode
23 / 37
General t Case
Suppose Vt+1(x) = minP∈Pt+1 xTPx for some set Pt+1 ⊂ Sn+
At time t, the value function Vt(x) is given by
Vt(x) = minP∈Pt
xTPx
where Pt is obtained from Pt+1 by switched Riccati mapping:
Pt = ρM(Pt+1) := {ρi (P) |P ∈ Pt+1, i ∈M} ⊂ Sn+
24 / 37
SLQR Solution Algorithm
Stage 1: Compute the value functions backward in time:
Set PN = {Qf }for t = N − 1, . . . , 1, 0 doPt = ρM(Pt+1)
end for
Return minP∈P0
xT0 Px0 as the optimal cost
Stage 2: Compute optimal mode/control/state sequences forward in time:
Set x∗0 = x0for t = 0, 1, . . . ,N − 1 do
(σ∗t , u∗t ) = arg min
σ,v
[(x∗t )TQσx
∗t + vTRσv + Vt+1(Aσx
∗t + Bσv)
]x∗t+1 = Aσ∗t x
∗t + Bσ∗t u
∗t
end for
Return σ∗t , u∗t and x∗t as the optimal mode/control/state sequences
25 / 37
Complexity Reduction
Issue: Number of matrices in Pt grows exponentially
In the set Pt defining the value function Vt(x) = minP∈Pt xTPx
• Matrix P ∈ Pt is called effective if for at least one x 6= 0
xTPx < xTP ′x , ∀P ′ ∈ Pt \ {P}
• Otherwise P is called redundant and can be discarded without loss(due to monotonicity of Riccati mapping)
Sufficient condition for P ∈ Pt to be redundant:
P � a convex combination of matrices in Pt \ {P}
• Checked via an LMI feasibility problem
26 / 37
Decision Tree Pruning
27 / 37
Example
Switched LQR problem specified by
A1 =
[2 00 2
], B1 =
[12
], A2 =
[1.5 10 1.5
], B2 =
[10
]Qσ = I , Rσ = 1, N = 20
0 0.5 1 1.5 2 2.5 3 3.50
5
10
15
20
25
30
16 matrices in P16
28 / 37
Further Reduction by Relaxation
Matrix P in Pt is called ε-redundant if P + εI is redundant
• ε > 0 is a small relaxation parameter
• A sufficient condition is given by
P + εI � a convex combination of matrices in Pt \ {P}
• Oftentimes a small ε could result in significant reduction
“Infinite horizon switched LQR problems in discrete time: A suboptimal algorithm withperformance analysis,” W. Zhang, J. Hu and A. Abate, TAC 2012.
29 / 37
Example
A1 =
[2 10 1
], B1 =
[11
], A2 =
[2 10 0.5
], B2 =
[12
]Q1 = Q2 = I , R1 = R2 = 1, Qf = I , N = 100
k |PN−k |1 2
2 4
3 5
4 5
5 5
6 5
30 / 37
Example (cont.)Optimal switching policy (Gray region: mode 1 optimal; Black region: mode 2 optimal)
31 / 37
Another Example
A1 =
[2 00 2
], B1 =
[12
]A2 =
[1.5 10 1.5
], B2 =
[10
]Qσ = I , Rσ = 1, N = 20
0 5 10 15 2010
0
102
104
106
108
1010
1012
# of steps left
Com
plex
ity
|Hk|
|Hk|
|Hǫk|
36014
1012
• Without any reduction, complexity grows exponentially
• With reduction, complexity saturates at 360 matrices
• With relaxation (ε = 10−3), complexity saturates at 14 matrices
32 / 37
SLQR Problem with Switching Cost
Cost function to be minimized:
N−1∑k=0
xTk Qσkxk + uTk Rσkuk + w(σk ,σk+1)(xk)︸ ︷︷ ︸switching cost
+ xTN Qf xN
Value function Vt(σ, x) is the optimal cost-to-go starting from xt = xwith previous mode being σt−1 = σ
Bellman equation:
Vt(σ, x) = minσt=σ′ut=v
[xTQσ′x + vTRσ′v + w(σ,σ′)(x) + Vt+1(σ′,Aσ′x + Bσ′v)
]• Optimal switching policy σ∗
t (σ, x) and optimal control u∗t (σ, x)
• Both depend on previous mode σ and current state x
• Same technique if w(σ,σ′)(·) is quadratic, linear, or constant,
33 / 37
Continuous-Time LQR Problem
For the continuous-time LTI system
x = Ax + Bu, x(0) = x0
Find the control input u(t) over the time horizon [0, tf ] to
minimize
∫ tf
0
(xTQx + uTRu
)dt + x(tf )TQf x(tf )
• Q = QT � 0, R = RT � 0, Qf = QTf � 0
Value function Vt(x) is the optimal cost-to-go at time t from state x :
Vt(x) = minu(s), s∈[t,tf ]
∫ tf
t
[x(s)TQx(s)T + u(s)TRu(s)
]ds + x(tf )TQf x(tf )
• Value functions are still quadratic: Vt(x) = xTP(t)x , t ∈ [0, tf ]
34 / 37
Continuous-Time Bellman Equation
Vt(x) ' minu(·)≡w
δ(xTQx + wTRw)︸ ︷︷ ︸cost during [t, t + δ]
+Vt+δ(x + δ(Ax + Bw))︸ ︷︷ ︸cost-to-go from time t + δ
As δ → 0, we obtain the Riccati (matrix) differential equation:
−P(t) = Q + P(t)A + ATP(t)− P(t)BR−1BTP(t), P(tf ) = Qf
The optimal control is a linear state feedback controller:
u∗t = −R−1BTP(t)x
35 / 37
Continuous-Time SLQR Problem
For the continuous-time switched linear system
x = Aσx + Bσu, x(0) = x0
Find the mode σ(t) ∈M and input u(t) over the time horizon [0, tf ] to
minimize
∫ tf
0
(xTQσx + uTRσu
)dt + x(tf )TQf x(tf )
• Qσ = QTσ � 0, Rσ = RT
σ � 0, Qf = QTf � 0
Value function Vt(x) is the optimal cost-to-go at time t from state x :
Vt(x)= minu(s),σ(s)
∫ tf
t
[x(s)TQσ(s)x(s)T +u(s)TRσ(s)u(s)
]ds + x(tf )TQf x(tf )
36 / 37
Value Functions of C.-T. SLQR Problem
The value function Vt(x) is still of the form
Vt(x) = infP∈P(t)
xTPx
• |P(t)| =∞ as infinite switchings possible in a short time interval
• P(t) can be computed from the Riccati differential inclusion
−P(t) ∈ {Qλ + P(t)Aλ + ATλP(t)− P(t)BλR
−1λ BT
λ P(t) : λ ∈ S}
where Aλ,Bλ,Qλ,Rλ is any convex combination of Ai ,Bi ,Qi ,Ri for i ∈M
In general, P(t) is very difficult to compute analytically and numerically
• Impose dwell-time or maximum switching times constraints
• Discretize the C.-T. SLS into D.-T. SLS
37 / 37