towards safe learning-based...
Post on 19-Oct-2020
3 Views
Preview:
TRANSCRIPT
-
Towards Safe Learning-Based ControlPerformance & Constraint Satisfaction under Uncertainties
Melanie ZeilingerInstitute for Dynamic Systems and Control
mzeilinger@ethz.ch
Collaboration with:E. Klenske, P. Hennig, B. Schölkopf (MPI)K. Akametalu, J. Fisac C. Tomlin (UC Berkeley)
-
Variable speed control of compressorsABB drives control the compressors of the world‘s longest gas export pipeline
theweek.comabb.com control.ee.ethz.ch
PerformanceSafety
-
Performance and SafetyGo Together
no knowledge of safety boundaries
with knowledge of safety boundaries
Video courtesy of Kene Akametalu
-
Variable speed control of compressorsABB drives control the compressors of the world‘s longest gas export pipeline
theweek.comabb.com control.ee.ethz.ch
Model! System dynamics
! Constraints
! Computation
PerformanceSafety
-
The Quest for a Good ModelModel Uncertainty
ẋ = f (x, u, t, d) g(x, u, t, d) � 0
Complexity External effects Variation
tcadsolutions.com
autoguide.com
-
Optimization-based Control High Performance for Constrained Systems
6
mea
sure
men
ts
Dynamical System
control input
minN�
k=0
l(x(k), u(k)) + g(x(N))
x(k + 1) = f (x(k), u(k), d(k))
(x(k), u(k)) � X
Standard control
Model predictive control
-
Optimization-based Control High Performance for Constrained Systems
7
mea
sure
men
ts
Dynamical System
control input
minN�
k=0
l(x(k), u(k)) + g(x(N))
x(k + 1) = f (x(k), u(k), d(k))
(x(k), u(k)) � X
Rely on system model! High performance! Recursive constraint satisfaction! Stability by design! Automatic synthesis
-
Automatic Synthesis Of High Performance, Safe Controllers
8
Learning&
Controller
Performance &
Safety
Software/Implementation
mea
sure
men
ts
Dynamical System
control input
Environment Human
Online data
Specifications+
Online data
Learning-basedcontroller
-
Previous WorkReal-time Constraints in Optimization-based Control
9
ControllerPerformance
& Safety
Software/Implementation
! Real-time model predictive control [Zeilinger et al., 2008 – 2014] ! High-speed, certified optimization [Domahidi, Z. et al., 2012; Pu, Z. et al., 2014, 2016]
mea
sure
men
ts
Dynamical System
control input
minN�
k=0
l(x(k), u(k)) + g(x(N))
x(k + 1) = f (x(k), u(k), d(k))
(x(k), u(k)) � X
Software/Specifications+Online data
-
Outline
10
mea
sure
men
ts
Dynamical System
control input
Environment Human
Online data
This talk:! Learning for performance: Tailoring optimal controllers online ! Safety guarantees: Constraint satisfaction for any performance controller12
Learning-basedcontroller
-
Outline
11
mea
sure
men
ts
Dynamical System
control input
Environment Human
Online data
This talk:! Learning for performance: Tailoring optimal controllers online ! Safety guarantees: Constraint satisfaction for any performance controller12
minN�
k=0
l(x(k), u(k)) + g(x(N))
x(k + 1) = f (x(k), u(k), d(k))
(x(k), u(k)) � X
How to predict complex behavior, quantify & reduce
uncertainties?
-
Quantifying Uncertain Predictions From Online Data
12
estimated RegressionDynamical Model
Infer function d (t ) online:
measured states
applied inputs
{x(t k)}Kk=0
{u(t k)}Kk=0
�d̃(t k)
� Kk=0
inferred function d̂(t )
e.g. x(k + 1 )= f (x(k ), u(k )) + d(x, u, k )
x(k + 1) = f (x(k), u(k), d(k))
d̂
x 1 x2
-
Quantifying Uncertain Predictions From Online Data Using Gaussian Process Regression
13
meanfunction
variance
measured states
applied inputs
estimated GPRegression
Dynamical Model
{x(t k)}Kk=0
{u(t k)}Kk=0
d̄(t )
� 2d (t )
" Predictive distribution
Infer function d (t ) online:
GP prior: mean + covariance fcn
d̄(t )
2 �d (t )
d̂(t )
t
�d̃(t k)
� Kk=0
x(k + 1) = f (x(k), u(k), d(k))
-
Quantifying Uncertain Predictions From Online Data Using Gaussian Process Regression
14
meanfunction
variance
measured states
applied inputs
estimated GPRegression
Dynamical Model
{x(t k)}Kk=0
{u(t k)}Kk=0
d̄(t )
� 2d (t )
Infer function d (t ) online:
GP prior: mean + covariance fcn
�d̃(t k)
� Kk=0
Examples:! Mechanical errors! Prediction of loads, efficiencies etc. in energy systems! Unknown nonlinear dynamics (e.g. ground effects)
Gears Cause Periodic Errorsexample: astrophotography
1
" Predictive distribution
x(k + 1) = f (x(k), u(k), d(k))
-
Example:Telescope Guiding
Photograph by Robert Vanderbei, La Palma, 2012
-
Quantifying Uncertain Predictions From Online Data Using Gaussian Process Regression
16
meanfunction
variance
measured states
applied inputs
estimated GPRegression
Dynamical Model
{x(t k)}Kk=0
{u(t k)}Kk=0
d̄(t )
� 2d (t )
Infer function d (t ) online:
Gaussian Process regression with ‘weakly periodic’ kernel
�d̃(t k)
� Kk=0
linear system + gear effectx(k + 1 ) = A x(k ) + B u(k ) + d(k )
-
Comparison to State of the ArtOpen Source Software PHD Guiding by Edgar Klenske
17currently own branch on github.com/OpenPHDGuiding/phd2
0 500 1,000 1,500 2,000 2,500 3,000 3,500�2�1
012
errorRA
[px]
Tracking Algorithm of PHD2 Guiding 2.5.0
0 500 1,000 1,500 2,000 2,500 3,000 3,500�2�1
012
errorRA
[px]
Tracking with GP Prediction
RMS(e) = 0.4174
RMS(e) = 0.3080
26% reduction
erro
r RA
[px]
erro
r RA
[px]
26% reduction
time [s]
Tracking Algorithm of PHD2 Guiding 2.5.0
Tracking with GP Prediction
RMS(error) = 0.4174
RMS(error) = 0.3080
-
Tracking in “Darkness”Robustness through Improved Predictions
18
0 200 400 600 800 1,000 1,200 1,400 1,600 1,800 2,0000
10
20
30
40
50
time [s]
accumulated
gear
errorRA
[px]
Tracking with ”cloud”
accu
mula
ted
gear
erro
r RA
[px]
time [s]
-
Outline
19
mea
sure
men
ts
Dynamical System
control input
Environment Human
Online data
This talk:! Learning for performance: Tailoring optimal controllers online ! Safety guarantees: Constraint satisfaction for any performance controller12
minN�
k=0
l(x(k), u(k)) + g(x(N))
x(k + 1) = f (x(k), u(k), d(k))
(x(k), u(k)) � X
-
The Issue of The TransientAn Example
20
What if we have to learn while
satisfying constraints?
Stanford Autonomous Helicopter – Tic-toc learning from scratch [Abbeel, Coates, Quigley, Ng, NIPS 2007]
Transients make system traverse through unsafe behavior
Goal: Safe online learning in control
-
Safety GuaranteesInvariance & Stability
21
Constraint satisfactionSatisfy constraints now & in future" Utilize feasible, invariant set
StabilityControl system to output target" Show Lyapunov function
states x � X , inputs u � UModel: ẋ(t) = f (x(t), u(t))
V (x) positive definite, bounded˙V (x) � ��(|x |) �x � X
O = {x � Rn | �u(·) � Uts.t. �� � [ 0,�),�(� ; x, t, u(·)) � X}
-
Safety GuaranteesInvariance & Stability for Uncertain Systems
22
Robust Invariance Satisfy constraints now & in future" Utilize feasible, robust invariant set
Input-to-State Stability (ISS)Control system to output target" Show ISS Lyapunov function
, disturbances d � DModel: ẋ(t ) = f (x(t ), u(t ), d(t ))
states x � X , inputs u � U
O = {x � R n | �u(· ) � U t�� � [ 0 ,�), �d(· ) � D t
�(� ; x, t , u(· ), d(· )) � X }
V (x) positive definite, bounded˙V (x, d) � ��(|x |) + �(|d |)
�x � X , d � D
Alternative: Constraint relaxation" Soft constraints with stability guarantees
[Zeilinger et al., TAC 2014]
-
Safety GuaranteesInvariance & Stability for Uncertain Systems
23
Robust Invariance Satisfy constraints now & in future" Utilize feasible, robust invariant set
Input-to-State Stability (ISS)Control system to output target" Show ISS Lyapunov function
, disturbances d � DModel: ẋ(t ) = f (x(t ), u(t ), d(t ))
states x � X , inputs u � U
O = {x � R n | �u(· ) � U t�� � [ 0 ,�), �d(· ) � D t
�(� ; x, t , u(· ), d(· )) � X }
V (x) positive definite, bounded˙V (x, d) � ��(|x |) + �(|d |)
�x � X , d � D
Invariant set is based on system model" Common practice: Use conservative bound on model uncertainties" Small safe set restricts performance
-
" Distribution for model uncertainties" Estimate of disturbance set
Quantifying Model Uncertainties Using Gaussian Process Regression
24
meanfunction
variance
Infer disturbance function d (x ) online:
measured states
applied inputs
estimated GPRegression
Dynamical Model
{x(t k)}Kk=0
{u(t k)}Kk=0
ẋ = f (x, u, d(x))
�d̂(xk)
� Kk=0
{x(t k)}Kk=0d̄(x)
� 2d (x)
ˆD (x) = [ d̄(x)� m � d (x), d̄(x) + m � d (x)]
Prior distribution
m chosen according to desired probability p(e.g. m=3 for p=99.7%)
-
Safety Based on Robust InvarianceMain Idea
25
Goal: Guarantee safety for any online controller
Floor
Ceilin
gSafe
Unsafe
Largest controlled invariant set within constraints
Example: Quadrotor tracking up and down reference (e.g.pick-up task)
ẋ = f (x, u) + d(x)
states:vertical position
velocity
control:thrust
model uncertainty
Idea: Utilize robust invariant safe set
Unsafe
-
Safety Based on Reachability Main Idea
26
Goal: Guarantee safety for any online controller
Idea: Utilize robust invariant safe set
1. Inside: Apply performance controller2. On boundary: Apply safety controller
" Control law:
" Guaranteed constraint satisfaction, if model captures true system dynamics
[Gillula & Tomlin, ICRA 2012]
safety controller
u =
�u p (x), x
u s (x),
Floor
Ceilin
gSafe
Unsafe
Apply safety controller
-
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4ẋ = f (x, u) + d1 (x)
d1 (x) � D 1 (x)
Safety Based on Reachability and Online Model Validation
27
Goal: Leverage online data to improve performance and safety properties
1. Learn model from online data and compute safe set
2. On boundary: Apply safety controller
3. Inside: Apply performance controller
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
Idea: Utilize robust invariant safe set
-
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4ẋ = f (x, u) + d1 (x)
d1 (x) � D 1 (x)
Safety Based on Reachability and Online Model Validation
28
Goal: Leverage online data to improve performance and safety properties
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4ẋ = f (x, u) + d2(x)
d2(x) � D 2(x)1. Learn model from online data and
compute safe set2. On boundary: Apply safety controller
3. Inside: Apply performance controller
Idea: Utilize robust invariant safe set
-
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Safety Based on Reachability and Online Model Validation
29
Goal: Leverage online data to improve performance and safety properties
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
1. Learn model from online data and compute safe set
2. On boundary: Apply safety controller
3. Inside: Check confidence in model! If confident: Apply performance-
maximizing controller! If unconfident: Contract safe set
and apply safety controller
Idea: Utilize robust invariant safe setẋ = f (x, u) + d2(x)
d2(x) � D 2(x)
Loss of model confidence
-
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Safety Based on Reachability and Online Model Validation
30
Goal: Leverage online data to improve performance and safety properties
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
Idea: Utilize robust invariant safe setẋ = f (x, u) + d2(x)
d2(x) � D 2(x)
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Contracted safe set
Loss of model confidence
1. Learn model from online data and compute safe set
2. On boundary: Apply safety controller
3. Inside: Check confidence in model! If confident: Apply performance-
maximizing controller! If unconfident: Contract safe set
and apply safety controller
ẋ = f (x, u) + d2(x)
d2(x) � D 2(x)
-
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Safety Based on Reachability and Online Model Validation
31
Safety by combining learning and control theoretic techniques
Goal: Leverage online data to improve performance and safety properties
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
Idea: Utilize robust invariant safe setẋ = f (x, u) + d2(x)
d2(x) � D 2(x)
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4 1. Learn model from online data and compute safe set
2. On boundary: Apply safety controller
3. Inside: Check confidence in model! If confident: Apply performance-
maximizing controller! If unconfident: Contract safe set
and apply safety controller
-
0
0.5
1
1.5
2
2.5
3
Alti
tud
e (
m)
No Online Validation
0 10 20 30 40 50 600
0.5
1
1.5
2
2.5
3
Alti
tud
e (
m)
Online Disturbance Validation
Time (s)
Safety: increased thrust command
Safety: decreased thrust command
Physical Collision
Reaction to Incorrect ModelWith Model Validation, No Re-computation of Safe Set
32
ground collisions
Detect unmodeleddisturbance" Contract safe set" Wait for
recomputation of safe set
No online validation
Online disturbance validation
-
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Safe Online Learning With Model Learning and Validation
33
! First updated model is invalid
! Method detects inconsistency, slightly contracts safe set
! Improved tracking after next model update
Initial Inaccurate Improved
System never leaves converged safe set
-
Safety Based on Reachability and Online Model Validation
34
Safety by combining learning and control theoretic techniques
Goal: Leverage online data to improve performance and safety properties
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4 1. Learn model from online data and compute safe set
2. On boundary: Apply safety controller
3. Inside: Check confidence in model! If confident: Apply performance-
maximizing controller! If unconfident: Contract safe set
and apply safety controller
Idea: Utilize robust invariant safe set
-
The Safe SetHamilton-Jacobi-Isaacs Formulation
35Constraint Set
u � U , d � DModel:
K
ẋ = f (x, u, d)
Goal: Compute safe set = maximum robust control invariant subset
Stay in : Control u vs. disturbance d
K
-
The Safe SetHamilton-Jacobi-Isaacs Formulation
36
l(x )l(x ) : positive in setand negative otherwise
Safe Set
Constraint Set
u � U , d � D
J(x )
� J (x, t )
� t= �min
�
0 ,ma xu �U
mind �D
� J (x, t )
�x
T
f (x, u, d)
�
J (x, 0) = l(x)
[Mitchell, Bayen, Tomlin, TAC 2005]
Model:
{x : J (x) � 0 }
K
ẋ = f (x, u, d)
Modified HJI equation:
Stay in : Control u vs. disturbance d
K
Safe Set:= Largest robust control invariant set = Set of states for which disturbance cannot win!
-
The Safe SetHamilton-Jacobi-Isaacs Formulation
37
l(x )
Safe Set:= Largest robust control invariant set = Set of states for which disturbance cannot win!
l(x ) : positive in setand negative otherwise
Safe Set
Constraint Set
u � U , d � D
J(x )
� J (x, t )
� t= �min
�
0 ,ma xu �U
mind �D
� J (x, t )
�x
T
f (x, u, d)
�
J (x, 0) = l(x)
[Mitchell, Bayen, Tomlin, TAC 2005]
Model:
{x : J (x) � 0 }
K
ẋ = f (x, u, d)
Safetycontroller: u s (x) = ma x
u �Umind �D
� J (x) T
�xf (x, u, d)
Modified HJI equation:
Stay in : Control u vs. disturbance d
K
-
Safety Based on Reachability and Online Model Validation
38
Safety by combining learning and control theoretic techniques
Goal: Leverage online data to improve performance and safety properties
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3
Velo
city
(m/s
)
-3
-2
-1
0
1
2
3
4
1
2
3
4
Idea: Utilize robust invariant safe set
1. Learn model from online data and compute safe set
2. On boundary: Apply safety controller
3. Inside: Check confidence in model! If confident: Apply performance-
maximizing controller! If unconfident: Contract safe set
and apply safety controller
-
Benefit of HJI FormulationInfinite Number of Invariant Sets
39
� J (x, t )
� t= �min
�
0 ,ma xu �U
mind �D
� J (x, t )
�x
T
f (x, u, d)
�
Lemma:
Safe Set: {x : J (x) � 0 }
K
J (x){x | J (x) � �,� � 0 }
d � D
x, u, d)
�
J (x)
-
Safety Strategy with Online Model Validation
40
Given:
! Initialize: Active safe set = biggest candidate set" Critical level JL = 0
! At each time step:Check confidence in If not confident:
Contract safe set to include current state
! Control law
ˆD (x)
ˆD (x), J (x)
Ku =
�u p (x), J (x) > J L
u s (x),
J L = ma x {J (x), J L }
Standard:J (x) > 0
J (x)
J LJ L
-
Safety Strategy with Online Model Validation
41
Given:
! Initialize: Active safe set = biggest candidate set" Critical level JL = 0
! At each time step:Check confidence inIf not confident:
Contract safe set to include current state
! Control law
ˆD (x), J (x)
K
ˆD (x)
u =
�u p (x), J (x) > J L
u s (x),
J L = ma x {J (x), J L }
J (x)
J L
J L
-
Safety Strategy with Online Model Validation
42
ˆD (x), J (x)
K
Given:
! Initialize: Active safe set = biggest candidate set" Critical level JL = 0
! At each time step:Check confidence inIf not confident:
Contract safe set to include current state
! Control law
ˆD (x)
re-compute
u =
�u p (x), J (x) > J L
u s (x),
J L = ma x {J (x), J L }
= biggest candidate set
J (x)
-
Safety Strategy with Online Model ValidationGlobal Guarantees
43
Theorem:
Intuitively: Safety guaranteed, if there exists some superlevel set, such that disturbance is captured on its boundary
K
z �J S � [ 0 , J (z )]d(x) � ˆD (x) �x � {x | J (x) = J S }
[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]
-
Safety A First Experiment with Humans
44
Model inconsistency
Safer to stay above this altitude
-
SummaryPerformance & Safety for Learning-based Control
45
! Online model learning provides automatic enhancement of prediction & controller performance
" Important to characterize residual uncertainty
! Invariant sets provide safe exploration space, but require system model
! Model validation critical when learning online" Iteratively identify safe set
[Klenske et al., TCST 2015]
Acknowledgements:Edgar Klenske, Philipp Hennig, Bernhard SchölkopfKene Akametalu, Jaime Fisac, Shahab Kaynama, Claire Tomlin,
Optimal Controller
Dynamical System
External effects(human, environment)
Model Learning
d̂
[Akametalu, Fisac et al., CDC 2014]
top related