towards safe learning-based...

Towards Safe Learning-Based ControlPerformance & Constraint Satisfaction under Uncertainties

Melanie ZeilingerInstitute for Dynamic Systems and Control

mzeilinger@ethz.ch

Collaboration with:E. Klenske, P. Hennig, B. Schölkopf (MPI)K. Akametalu, J. Fisac C. Tomlin (UC Berkeley)

Variable speed control of compressorsABB drives control the compressors of the world‘s longest gas export pipeline

theweek.comabb.com control.ee.ethz.ch

PerformanceSafety

Performance and SafetyGo Together

no knowledge of safety boundaries

with knowledge of safety boundaries

Video courtesy of Kene Akametalu

Variable speed control of compressorsABB drives control the compressors of the world‘s longest gas export pipeline

theweek.comabb.com control.ee.ethz.ch

Model! System dynamics

! Constraints

! Computation

PerformanceSafety

The Quest for a Good ModelModel Uncertainty

ẋ = f (x, u, t, d) g(x, u, t, d) � 0

Complexity External effects Variation

tcadsolutions.com

autoguide.com

Optimization-based Control High Performance for Constrained Systems

6

mea

sure

men

ts

Dynamical System

control input

minN�

k=0

l(x(k), u(k)) + g(x(N))

x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

Standard control

Model predictive control

Optimization-based Control High Performance for Constrained Systems

7

mea

sure

men

ts

Dynamical System

control input

minN�

k=0

l(x(k), u(k)) + g(x(N))

x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

Rely on system model! High performance! Recursive constraint satisfaction! Stability by design! Automatic synthesis

Automatic Synthesis Of High Performance, Safe Controllers

8

Learning&

Controller

Performance &

Safety

Software/Implementation

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data

Specifications+

Online data

Learning-basedcontroller

Previous WorkReal-time Constraints in Optimization-based Control

9

ControllerPerformance

& Safety

Software/Implementation

! Real-time model predictive control [Zeilinger et al., 2008 – 2014] ! High-speed, certified optimization [Domahidi, Z. et al., 2012; Pu, Z. et al., 2014, 2016]

mea

sure

men

ts

Dynamical System

control input

minN�

k=0

l(x(k), u(k)) + g(x(N))

x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

Software/Specifications+Online data

Outline

10

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data

This talk:! Learning for performance: Tailoring optimal controllers online ! Safety guarantees: Constraint satisfaction for any performance controller12

Learning-basedcontroller

Outline

11

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data


minN�

k=0

l(x(k), u(k)) + g(x(N))

x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

How to predict complex behavior, quantify & reduce

uncertainties?

Quantifying Uncertain Predictions From Online Data

12

estimated RegressionDynamical Model

Infer function d (t ) online:

measured states

applied inputs

{x(t k)}Kk=0

{u(t k)}Kk=0

�d̃(t k)

� Kk=0

inferred function d̂(t )

e.g. x(k + 1 )= f (x(k ), u(k )) + d(x, u, k )

x(k + 1) = f (x(k), u(k), d(k))

d̂

x 1 x2

Quantifying Uncertain Predictions From Online Data Using Gaussian Process Regression

13

meanfunction

variance

measured states

applied inputs

estimated GPRegression

Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

d̄(t )

� 2d (t )

" Predictive distribution


GP prior: mean + covariance fcn

d̄(t )

2 �d (t )

d̂(t )

t

�d̃(t k)

� Kk=0

x(k + 1) = f (x(k), u(k), d(k))


14

meanfunction

variance

measured states

applied inputs


Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

d̄(t )

� 2d (t )


GP prior: mean + covariance fcn

�d̃(t k)

� Kk=0

Examples:! Mechanical errors! Prediction of loads, efficiencies etc. in energy systems! Unknown nonlinear dynamics (e.g. ground effects)

Gears Cause Periodic Errorsexample: astrophotography

1

" Predictive distribution

x(k + 1) = f (x(k), u(k), d(k))

Example:Telescope Guiding

Photograph by Robert Vanderbei, La Palma, 2012


16

meanfunction

variance

measured states

applied inputs


Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

d̄(t )

� 2d (t )


Gaussian Process regression with ‘weakly periodic’ kernel

�d̃(t k)

� Kk=0

linear system + gear effectx(k + 1 ) = A x(k ) + B u(k ) + d(k )

Comparison to State of the ArtOpen Source Software PHD Guiding by Edgar Klenske

17currently own branch on github.com/OpenPHDGuiding/phd2

0 500 1,000 1,500 2,000 2,500 3,000 3,500�2�1

012

errorRA

[px]

Tracking Algorithm of PHD2 Guiding 2.5.0

0 500 1,000 1,500 2,000 2,500 3,000 3,500�2�1

012

errorRA

[px]

Tracking with GP Prediction

RMS(e) = 0.4174

RMS(e) = 0.3080

26% reduction

erro

r RA

[px]

erro

r RA

[px]

26% reduction

time [s]

Tracking Algorithm of PHD2 Guiding 2.5.0

Tracking with GP Prediction

RMS(error) = 0.4174

RMS(error) = 0.3080

Tracking in “Darkness”Robustness through Improved Predictions

18

0 200 400 600 800 1,000 1,200 1,400 1,600 1,800 2,0000

10

20

30

40

50

time [s]

accumulated

gear

errorRA

[px]

Tracking with ”cloud”

accu

mula

ted

gear

erro

r RA

[px]

time [s]

Outline

19

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data


minN�

k=0

l(x(k), u(k)) + g(x(N))

x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

The Issue of The TransientAn Example

20

What if we have to learn while

satisfying constraints?

Stanford Autonomous Helicopter – Tic-toc learning from scratch [Abbeel, Coates, Quigley, Ng, NIPS 2007]

Transients make system traverse through unsafe behavior

Goal: Safe online learning in control

Safety GuaranteesInvariance & Stability

21

Constraint satisfactionSatisfy constraints now & in future" Utilize feasible, invariant set

StabilityControl system to output target" Show Lyapunov function

states x � X , inputs u � UModel: ẋ(t) = f (x(t), u(t))

V (x) positive definite, bounded˙V (x) � ��(|x |) �x � X

O = {x � Rn | �u(·) � Uts.t. �� [ 0,�),�(� ; x, t, u(·)) � X}

Safety GuaranteesInvariance & Stability for Uncertain Systems

22

Robust Invariance Satisfy constraints now & in future" Utilize feasible, robust invariant set

Input-to-State Stability (ISS)Control system to output target" Show ISS Lyapunov function

, disturbances d � DModel: ẋ(t ) = f (x(t ), u(t ), d(t ))

states x � X , inputs u � U

O = {x � R n | �u(· ) � U t�� [ 0 ,�), �d(· ) � D t

�(� ; x, t , u(· ), d(· )) � X }

V (x) positive definite, bounded˙V (x, d) � ��(|x |) + �(|d |)

�x � X , d � D

Alternative: Constraint relaxation" Soft constraints with stability guarantees

[Zeilinger et al., TAC 2014]

Safety GuaranteesInvariance & Stability for Uncertain Systems

23

Robust Invariance Satisfy constraints now & in future" Utilize feasible, robust invariant set

Input-to-State Stability (ISS)Control system to output target" Show ISS Lyapunov function

, disturbances d � DModel: ẋ(t ) = f (x(t ), u(t ), d(t ))

states x � X , inputs u � U

O = {x � R n | �u(· ) � U t�� [ 0 ,�), �d(· ) � D t

�(� ; x, t , u(· ), d(· )) � X }

V (x) positive definite, bounded˙V (x, d) � ��(|x |) + �(|d |)

�x � X , d � D

Invariant set is based on system model" Common practice: Use conservative bound on model uncertainties" Small safe set restricts performance

" Distribution for model uncertainties" Estimate of disturbance set

Quantifying Model Uncertainties Using Gaussian Process Regression

24

meanfunction

variance

Infer disturbance function d (x ) online:

measured states

applied inputs


Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

ẋ = f (x, u, d(x))

�d̂(xk)

� Kk=0

{x(t k)}Kk=0d̄(x)

� 2d (x)

ˆD (x) = [ d̄(x)� m � d (x), d̄(x) + m � d (x)]

Prior distribution

m chosen according to desired probability p(e.g. m=3 for p=99.7%)

Safety Based on Robust InvarianceMain Idea

25

Goal: Guarantee safety for any online controller

Floor

Ceilin

gSafe

Unsafe

Largest controlled invariant set within constraints

Example: Quadrotor tracking up and down reference (e.g.pick-up task)

ẋ = f (x, u) + d(x)

states:vertical position

velocity

control:thrust

model uncertainty

Idea: Utilize robust invariant safe set

Unsafe

Safety Based on Reachability Main Idea

26

Goal: Guarantee safety for any online controller


1. Inside: Apply performance controller2. On boundary: Apply safety controller

" Control law:

" Guaranteed constraint satisfaction, if model captures true system dynamics

[Gillula & Tomlin, ICRA 2012]

safety controller

u =

�u p (x), x

u s (x),

Floor

Ceilin

gSafe

Unsafe

Apply safety controller

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4ẋ = f (x, u) + d1 (x)

d1 (x) � D 1 (x)

Safety Based on Reachability and Online Model Validation

27

Goal: Leverage online data to improve performance and safety properties

1. Learn model from online data and compute safe set

2. On boundary: Apply safety controller

3. Inside: Apply performance controller

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]


Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4ẋ = f (x, u) + d1 (x)

d1 (x) � D 1 (x)


28



Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4ẋ = f (x, u) + d2(x)

d2(x) � D 2(x)1. Learn model from online data and

compute safe set2. On boundary: Apply safety controller

3. Inside: Apply performance controller


Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4


29





3. Inside: Check confidence in model! If confident: Apply performance-

maximizing controller! If unconfident: Contract safe set

and apply safety controller

Idea: Utilize robust invariant safe setẋ = f (x, u) + d2(x)

d2(x) � D 2(x)

Loss of model confidence

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4


30




d2(x) � D 2(x)

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Contracted safe set

Loss of model confidence






ẋ = f (x, u) + d2(x)

d2(x) � D 2(x)

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4


31

Safety by combining learning and control theoretic techniques




d2(x) � D 2(x)

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4 1. Learn model from online data and compute safe set





0

0.5

1

1.5

2

2.5

3

Alti

tud

e (

m)

No Online Validation

0 10 20 30 40 50 600

0.5

1

1.5

2

2.5

3

Alti

tud

e (

m)

Online Disturbance Validation

Time (s)

Safety: increased thrust command

Safety: decreased thrust command

Physical Collision

Reaction to Incorrect ModelWith Model Validation, No Re-computation of Safe Set

32

ground collisions

Detect unmodeleddisturbance" Contract safe set" Wait for

recomputation of safe set

No online validation

Online disturbance validation

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Safe Online Learning With Model Learning and Validation

33

! First updated model is invalid

! Method detects inconsistency, slightly contracts safe set

! Improved tracking after next model update

Initial Inaccurate Improved

System never leaves converged safe set


34




Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4 1. Learn model from online data and compute safe set






The Safe SetHamilton-Jacobi-Isaacs Formulation

35Constraint Set

u � U , d � DModel:

K

ẋ = f (x, u, d)

Goal: Compute safe set = maximum robust control invariant subset

Stay in : Control u vs. disturbance d

K


36

l(x )l(x ) : positive in setand negative otherwise

Safe Set

Constraint Set

u � U , d � D

J(x )

� J (x, t )

� t= �min

�

0 ,ma xu �U

mind �D

� J (x, t )

�x

T

f (x, u, d)

�

J (x, 0) = l(x)

[Mitchell, Bayen, Tomlin, TAC 2005]

Model:

{x : J (x) � 0 }

K

ẋ = f (x, u, d)

Modified HJI equation:


K

Safe Set:= Largest robust control invariant set = Set of states for which disturbance cannot win!


37

l(x )

Safe Set:= Largest robust control invariant set = Set of states for which disturbance cannot win!

l(x ) : positive in setand negative otherwise

Safe Set

Constraint Set

u � U , d � D

J(x )

� J (x, t )

� t= �min

�

0 ,ma xu �U

mind �D

� J (x, t )

�x

T

f (x, u, d)

�

J (x, 0) = l(x)

[Mitchell, Bayen, Tomlin, TAC 2005]

Model:

{x : J (x) � 0 }

K

ẋ = f (x, u, d)

Safetycontroller: u s (x) = ma x

u �Umind �D

� J (x) T

�xf (x, u, d)

Modified HJI equation:


K


38




Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4







Benefit of HJI FormulationInfinite Number of Invariant Sets

39

� J (x, t )

� t= �min

�

0 ,ma xu �U

mind �D

� J (x, t )

�x

T

f (x, u, d)

�

Lemma:

Safe Set: {x : J (x) � 0 }

K

J (x){x | J (x) � �,� � 0 }

d � D

x, u, d)

�

J (x)

Safety Strategy with Online Model Validation

40

Given:

! Initialize: Active safe set = biggest candidate set" Critical level JL = 0

! At each time step:Check confidence in If not confident:

Contract safe set to include current state

! Control law

ˆD (x)

ˆD (x), J (x)

Ku =

�u p (x), J (x) > J L

u s (x),

J L = ma x {J (x), J L }

Standard:J (x) > 0

J (x)

J LJ L


41

Given:


! At each time step:Check confidence inIf not confident:


! Control law

ˆD (x), J (x)

K

ˆD (x)

u =

�u p (x), J (x) > J L

u s (x),

J L = ma x {J (x), J L }

J (x)

J L

J L


42

ˆD (x), J (x)

K

Given:


! At each time step:Check confidence inIf not confident:


! Control law

ˆD (x)

re-compute

u =

�u p (x), J (x) > J L

u s (x),

J L = ma x {J (x), J L }

= biggest candidate set

J (x)

Safety Strategy with Online Model ValidationGlobal Guarantees

43

Theorem:

Intuitively: Safety guaranteed, if there exists some superlevel set, such that disturbance is captured on its boundary

K

z �J S � [ 0 , J (z )]d(x) � ˆD (x) �x � {x | J (x) = J S }


Safety A First Experiment with Humans

44

Model inconsistency

Safer to stay above this altitude

SummaryPerformance & Safety for Learning-based Control

45

! Online model learning provides automatic enhancement of prediction & controller performance

" Important to characterize residual uncertainty

! Invariant sets provide safe exploration space, but require system model

! Model validation critical when learning online" Iteratively identify safe set

[Klenske et al., TCST 2015]

Acknowledgements:Edgar Klenske, Philipp Hennig, Bernhard SchölkopfKene Akametalu, Jaime Fisac, Shahab Kaynama, Claire Tomlin,

Optimal Controller

Dynamical System

External effects(human, environment)

Model Learning

d̂

[Akametalu, Fisac et al., CDC 2014]

towards safe learning-based...

Documents

towards safe reinforcement learning in the real world

inductive learning methods towards learning outcomes

towards mobile learning support

licorice towards safe use

towards collaborative learning @ scale

towards a safe, low-cost, intelligent wheelchair

safe seats of learning

towards a patient-safe future patient engagement

towards a safe and healthy growth

towards open & connected learning

agi safety and understanding - tom everittreinforcement...

building learning culture towards a learning …

terrorist safe havens: towards an understanding of what

meta-learning : towards universal learning paradigms

safe and effective nuclear power plant life cycle management...

e learning towards-aec2

swin: towards type-safe java program adaptation between …

challenge: providing a safe learning environment · pdf...

meta-learning : towards universal learning paradigms

towards an effective and safe treatment of inflammatory