
Page 1: Development of a Deep Recurrent Neural Network Controller

Development of a Deep Recurrent Neural Network Controller for Flight Applications

American Control Conference (ACC), May 26, 2017

Scott A. Nivison and Pramod P. Khargonekar

Department of Electrical and Computer Engineering, University of Florida

Distribution A: Approved for public release; distribution is unlimited.

Page 2: Development of a Deep Recurrent Neural Network Controller

Outline

• Inspiration
• DLC Limitations and Research Goals
• Background – Direct Policy Search
• Flight Vehicle (Plant) Model
• Architecture – Deep Learning Flight Control
• Optimization – Deep Learning Controller
• Simulation Results
• Conclusions
• Future Work


Page 3: Development of a Deep Recurrent Neural Network Controller

Inspiration

Deep Learning (DL):
• Dominant performance in multiple machine learning areas: speech, language, vision, face recognition, etc.
• Replaces time-consuming, hand-tuned machine learning methods


Key breakthroughs in establishing deep learning:
• Optimization methods: stochastic gradient descent, Hessian-free optimization, Nesterov's momentum, etc.
• Deep recurrent neural network (RNN) architectures: deep stacked RNN, deep bidirectional RNN, etc.
• RNN modules: Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM)
• Automatic gradient computation
• Parallel computing

Connecting deep learning to control:
• Guided Policy Search – uses trajectory optimization to assist policy learning (S. Levine and V. Koltun, 2013)
• Hessian-free optimization – deep RNN learning control with uncertainties (I. Sutskever, 2013)

Page 4: Development of a Deep Recurrent Neural Network Controller

DLC Limitations and Research Goals

Policy Search / Deep Learning (DL) Control Limitations:
• Large number of computations at each time step
• No standard training procedures for control tasks
• Few analysis tools for deep neural network architectures
• Non-convex optimization provides few guarantees
• Lack of research on optimizing for robustness
• Most RL approaches combine modeled dynamics with real-life trials to tune the policy (not practical for flight control)


Research Goals

Develop a robust deep recurrent neural network (RNN) controller with gated recurrent unit (GRU) modules for a highly agile, high-speed flight vehicle, trained using a set of sample trajectories that contain disturbances, aerodynamic uncertainties, and significant control attenuation/amplification in the plant dynamics during optimization.

Page 5: Development of a Deep Recurrent Neural Network Controller

Direct Policy Search Background

Reinforcement Learning (RL): develop methods to train an agent by maximizing a long-term reward through repeated interactions with its environment.

Markovian Dynamics:

$$x_{t+1} = f(x_t, u_t) + w, \qquad x_0 \sim p(x_0)$$

Find a parametrized policy $\pi_\Theta^*$ for the finite-horizon problem:

$$\pi_\Theta^* = \arg\max_{\pi_\Theta} J(\Theta) = \arg\max_{\pi_\Theta} \sum_{t=1}^{t_f} \gamma^t \, E\big[ r(x_t, u_t) \mid \pi_\Theta \big], \qquad \gamma \in [0, 1]$$

Certainty-Equivalence (CE) Assumption: the optimal policy for the learned model corresponds to the optimal policy for the true dynamics.

How to use models for long-term predictions:
1. Stochastic inference (i.e., trajectory sampling)
2. Deterministic approximate inference

*M. P. Deisenroth et al., "A Survey on Policy Search for Robotics," Foundations and Trends in Robotics, vol. 2, 2013.

Page 6: Development of a Deep Recurrent Neural Network Controller

Direct Policy Search Background


Stochastic Sampling:


𝐽Θ= π‘Žπ‘Žπ‘Žπ‘Žπ‘Žπ‘₯πœ‹Ξ£π‘‘=1𝑑𝑓 𝛾𝑑Ε[π‘Ž(π‘₯𝑑,𝑒𝑑)|πœ‹Ξ˜]

Expected long-term reward:

Approximation of $J(\Theta)$ by stochastic sampling of $N$ trajectories:

$$\hat{J}(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{t_f} \gamma^t \, r(x_t^i), \qquad \lim_{N \to \infty} \hat{J}(\Theta) = J(\Theta)$$
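As an illustration of the trajectory-sampling estimate $\hat{J}(\Theta)$, the Python sketch below averages discounted rewards over $N$ sampled rollouts; `simulate_trajectory` and `reward` are hypothetical placeholders for the closed-loop rollout and per-step reward used in the paper:

```python
import numpy as np

def estimate_long_term_reward(theta, simulate_trajectory, reward,
                              n_trajectories=100, gamma=0.99):
    """Monte Carlo estimate of J(theta): (1/N) * sum_i sum_t gamma^t * r(x_t^i).

    `simulate_trajectory(theta)` is assumed to roll out the closed-loop plant
    under policy parameters `theta` (with sampled uncertainties and noise) and
    return the state sequence x_1, ..., x_{t_f}; `reward(x)` is the per-step
    reward. Both stand in for the paper's plant model and reward.
    """
    total = 0.0
    for _ in range(n_trajectories):
        states = simulate_trajectory(theta)                  # one sampled rollout
        rewards = np.array([reward(x) for x in states])      # r(x_t) for each step
        discounts = gamma ** np.arange(1, len(rewards) + 1)  # gamma^t, t = 1..t_f
        total += float(np.dot(discounts, rewards))           # discounted return
    return total / n_trajectories                            # J_hat(theta)
```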


Deterministic Approximations:

𝐸 π‘Ž π‘₯𝑑,𝑒𝑑 |πœ‹Ξ˜ = οΏ½π‘Ž π‘₯𝑑 𝑝 π‘₯𝑑 𝑑π‘₯𝑑

𝑝 π‘₯𝑑 β‰… Ɲ π‘₯𝑑|πœ‡π‘‘π‘₯ , Σ𝑑 π‘₯

Approximation of 𝑝 π‘₯𝑑 : (e. g. Unscented Transformation or Moment Matching)
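For the deterministic alternative, a generic unscented-transformation step (textbook form, not the paper's implementation) can propagate the Gaussian approximation $p(x_t) \approx \mathcal{N}(\mu_t^x, \Sigma_t^x)$ through nonlinear dynamics:

```python
import numpy as np

def unscented_propagate(mu, sigma, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate a Gaussian N(mu, sigma) through nonlinear dynamics x' = f(x)
    using the unscented transformation (generic sigma-point form)."""
    n = mu.size
    lam = alpha**2 * (n + kappa) - n
    # Sigma points: mu and mu +/- the columns of sqrt((n + lam) * sigma)
    sqrt_mat = np.linalg.cholesky((n + lam) * sigma)
    points = np.vstack([mu, mu + sqrt_mat.T, mu - sqrt_mat.T])    # (2n+1, n)
    # Standard mean and covariance weights
    w_m = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    # Push each sigma point through the dynamics and re-fit a Gaussian
    propagated = np.array([f(p) for p in points])
    mu_next = w_m @ propagated
    diff = propagated - mu_next
    sigma_next = (w_c[:, None] * diff).T @ diff
    return mu_next, sigma_next
```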

*M. P. Deisenroth et al., "A Survey on Policy Search for Robotics," Foundations and Trends in Robotics, vol. 2, 2013.

Page 7: Development of a Deep Recurrent Neural Network Controller

Deep Learning based Flight Control


Deep Learning Training Architecture

π‘₯𝑑 are the states of the plant π‘’π‘‘π‘Žπ‘Žπ‘‘ is the actuator output 𝑦𝑑 is the output vector πœπ‘ is the plant noise πœ†π‘’ is the control effectiveness 𝑑𝑒 is an input disturbance πœŒπ‘’,πœŒπ›Ό ,πœŒπ‘ž are uncertainty parameters

Generalized Form of the Discrete Plant Dynamics
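The exact generalized plant form appears only as a block diagram on the slide; the sketch below is a hypothetical illustration of how the listed quantities could enter a discrete-time plant step (control effectiveness and input disturbance perturbing the commanded control, uncertainty parameters perturbing the dynamics, additive plant noise on the state):

```python
import numpy as np

def plant_step(x_t, u_act, f_nominal, rho, lam_u=1.0, d_u=0.0,
               noise_std=0.0, rng=None):
    """Hypothetical discrete plant step with the uncertainty hooks listed above.

    f_nominal(x, u, rho) is a placeholder for the nominal plant update; rho is a
    dict of multiplicative uncertainty parameters (rho_u, rho_alpha, rho_q).
    """
    rng = rng or np.random.default_rng()
    u_eff = lam_u * u_act + d_u              # control effectiveness + input disturbance
    x_next = f_nominal(x_t, u_eff, rho)      # uncertain nominal dynamics (assumed form)
    nu_p = rng.normal(0.0, noise_std, size=np.shape(x_next))
    x_next = x_next + nu_p                   # additive plant noise nu_p (assumption)
    y_t = np.array(x_next, copy=True)        # output vector (full state assumed)
    return x_next, y_t
```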

Page 8: Development of a Deep Recurrent Neural Network Controller

Flight Vehicle Model


Longitudinal Rigid-Body Dynamics – Force and Moment Equations:

Aerodynamic Coefficients (Longitudinal):
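For reference, the longitudinal rigid-body force and moment equations take the standard form below (generic notation; the slide's exact equations and the polynomial build-ups of the aerodynamic coefficients $C_L$, $C_D$, $C_m$ are not reproduced here):

$$\begin{aligned}
\dot{V} &= \frac{T\cos\alpha - D}{m} - g\sin\gamma, &
\dot{\alpha} &= q - \frac{L + T\sin\alpha}{mV} + \frac{g\cos\gamma}{V},\\
\dot{q} &= \frac{M}{I_{yy}}, &
\dot{\theta} &= q,\\
L &= \bar{q}\,S\,C_L(\alpha, q, \delta_e), &
D &= \bar{q}\,S\,C_D(\alpha, q, \delta_e),\\
M &= \bar{q}\,S\,\bar{c}\,C_m(\alpha, q, \delta_e), &
\bar{q} &= \tfrac{1}{2}\rho V^2 .
\end{aligned}$$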

Page 9: Development of a Deep Recurrent Neural Network Controller

Deep Learning Controller - Architecture


Stacked Recurrent Neural Network (S-RNN)

Controller Input Vector:

$$c_t = [\, e_i,\ \alpha,\ q,\ q_{cmd} \,], \qquad e = y_{cmd} - y_{act}, \qquad e_i = \int_0^{t_f} e \, dt$$

$\Theta_i$ – matrix of parameters for each GRU module: $\Theta_i = \{ U_u, W_u, U_r, W_r, U_h, W_h, b_1, b_2, b_3 \}$

$\Theta$ – total parameters of the controller: $\Theta = [\Theta_1, \Theta_2, \ldots, \Theta_L, V, c]$

$L$ – total number of GRU layers
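To make the stacked GRU controller concrete, the sketch below (NumPy, a conventional GRU gate formulation; illustrative only, not the authors' code) runs one controller time step through $L$ stacked GRU modules and the affine output layer $(V, c)$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(theta_i, x, h_prev):
    """One GRU module: theta_i = (U_u, W_u, U_r, W_r, U_h, W_h, b1, b2, b3)."""
    U_u, W_u, U_r, W_r, U_h, W_h, b1, b2, b3 = theta_i
    z = sigmoid(U_u @ x + W_u @ h_prev + b1)           # update gate
    r = sigmoid(U_r @ x + W_r @ h_prev + b2)           # reset gate
    h_cand = np.tanh(U_h @ x + W_h @ (r * h_prev) + b3)
    return (1.0 - z) * h_prev + z * h_cand             # new hidden state

def controller_step(theta_layers, V, c, c_t, h_list):
    """Stacked RNN controller: input c_t = [e_i, alpha, q, q_cmd] -> actuator command."""
    x = c_t
    new_h = []
    for theta_i, h_prev in zip(theta_layers, h_list):
        h = gru_step(theta_i, x, h_prev)
        new_h.append(h)
        x = h                                          # feed layer output upward
    u_cmd = V @ x + c                                  # affine output layer
    return u_cmd, new_h
```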

Page 10: Development of a Deep Recurrent Neural Network Controller

Deep Learning Controller - Optimization


Estimated Expected Long-Term Reward:

$$J(\Theta) = \frac{1}{N} \sum_{i=1}^{N} J_i(\Theta)$$

Cost function for each trajectory $i$:

$$J_i(\Theta) = \sum_{t=0}^{t_f} \gamma^t \, \chi(x_t, u_t)$$

$label = P$: only uncertainties; $label = R$: uncertainties, noise, and disturbances

Instantaneous Measurement:

$$\chi(x_t, u_t) = \begin{cases} k_1 e_t^2 + k_2 f_{\dot{e}}^2 & \text{if } label = P \\ k_3 f_e^2 + k_4 f_{\dot{e}}^2 & \text{if } label = R \end{cases}$$

$$f_e = \max(|e_t| - b_e,\, 0), \qquad f_{\dot{e}} = \max(|\dot{e}_t| - b_{\dot{e}},\, 0)$$

Time-varying funnel:

$$\beta_e(t) = \{\, x \in \mathbb{R}^n : |e(x, t)| \le b_e(t) \,\}, \qquad \beta_{\dot{e}}(t) = \{\, x \in \mathbb{R}^n : |\dot{e}(x, t)| \le b_{\dot{e}}(t) \,\}$$

where $R_\alpha, R_q, R_u, \Lambda_u \sim \mathrm{Uniform}(min, max)$; $[\rho_\alpha^i, \rho_q^i, \rho_u^i, \lambda_u^i]$ are the uncertainties for the $i$-th trajectory; $N$ is the number of sampled trajectories; $t_f$ is the duration of each sampled trajectory; $k_1, k_2, k_3, k_4$ are positive constant gains; $b_e, b_{\dot{e}}$ are static constant bounds of the funnel; and $e_t = y_{cmd} - y_{ref}$.
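A minimal sketch of the piecewise instantaneous cost above (the gains and funnel bounds are illustrative values):

```python
def funnel_cost(e_t, e_dot_t, label, k=(1.0, 1.0, 1.0, 1.0), b_e=0.1, b_edot=0.5):
    """Piecewise instantaneous measurement chi built from the tracking error e_t
    and error rate e_dot_t.

    label == "P" (performance trajectories): penalize e_t directly plus any
    error-rate excursion outside the funnel.
    label == "R" (robust trajectories): penalize only excursions of e and e_dot
    outside the static funnel bounds b_e, b_edot.
    """
    k1, k2, k3, k4 = k
    f_e = max(abs(e_t) - b_e, 0.0)             # funnel violation in error
    f_edot = max(abs(e_dot_t) - b_edot, 0.0)   # funnel violation in error rate
    if label == "P":
        return k1 * e_t**2 + k2 * f_edot**2
    return k3 * f_e**2 + k4 * f_edot**2
```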

Page 11: Development of a Deep Recurrent Neural Network Controller

DLC – Incremental Training


Optimization Specifications:

• Controller architectures: RNN/GRU, RNN, TD-FNN
• L-BFGS (quasi-Newton) optimization
• N = 2,970 sample trajectories (540 performance trajectories, 2,430 robust trajectories)
• 3.1 GHz PC with 256 GB RAM and 10 cores, parallel processing
• ~40 hours per 1,000 sample trajectories during optimization
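The optimization can be posed as a standard batch quasi-Newton problem over the sampled trajectories. The sketch below uses SciPy's L-BFGS-B with its default finite-difference gradient and a hypothetical `trajectory_cost` that rolls out one sampled trajectory and returns $J_i(\Theta)$; the actual setup relies on computed gradients and parallel rollouts:

```python
import numpy as np
from scipy.optimize import minimize

def total_cost(theta, trajectory_cost, uncertainty_samples):
    """J(theta) = (1/N) * sum_i J_i(theta) over the sampled trajectories."""
    costs = [trajectory_cost(theta, sample) for sample in uncertainty_samples]
    return float(np.mean(costs))

def train_controller(theta0, trajectory_cost, uncertainty_samples, max_iter=200):
    """Quasi-Newton (L-BFGS) optimization of the controller parameter vector."""
    result = minimize(
        total_cost, theta0,
        args=(trajectory_cost, uncertainty_samples),
        method="L-BFGS-B",                  # limited-memory quasi-Newton
        options={"maxiter": max_iter},
    )
    return result.x, result.fun
```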

Page 12: Development of a Deep Recurrent Neural Network Controller

DLC - Results


[Slide shows the initial conditions, the augmented polynomial short-period model, and the flight condition used for evaluation.]

Page 13: Development of a Deep Recurrent Neural Network Controller

DLC - Results


Performance Metrics:

$$\mathrm{CTE}_j = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{t_f} |e_t|, \qquad \mathrm{CCR}_j = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{t_f} |\dot{u}_t|$$

$$\mathrm{CTE} = \sum_{j=1}^{M} \mathrm{CTE}_j, \qquad \mathrm{CCR} = \sum_{j=1}^{M} \mathrm{CCR}_j$$

$N$ – number of trajectories for each analysis model; $t_f$ – duration of each sampled trajectory; CTE – cumulative tracking error; CCR – cumulative control rate

[Slide shows the initial conditions, the augmented polynomial short-period model, and the flight condition used for evaluation.]

DLC Performance: 66% reduction in CTE, 91.5% reduction in CCR
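The cumulative metrics can be computed directly from the simulated trajectories; a small sketch, assuming `errors` and `control_rates` hold the per-trajectory time histories for one analysis model:

```python
import numpy as np

def cumulative_metrics(errors, control_rates):
    """Cumulative tracking error (CTE) and cumulative control rate (CCR) for one
    analysis model, averaged over its N trajectories.

    errors        : array of shape (N, t_f) with tracking error e_t per step
    control_rates : array of shape (N, t_f) with control rate u_dot_t per step
    """
    cte = np.mean(np.sum(np.abs(errors), axis=1))         # (1/N) sum_i sum_t |e_t|
    ccr = np.mean(np.sum(np.abs(control_rates), axis=1))  # (1/N) sum_i sum_t |u_dot_t|
    return cte, ccr

# Totals over the M analysis models are then simple sums of the per-model values.
```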

Page 14: Development of a Deep Recurrent Neural Network Controller

DLC - Results


[Slide shows tracking-response plots comparing the Gain Scheduled Controller and the Deep Learning Controller for the given initial conditions, augmented polynomial short-period model, and flight condition.]

Page 15: Development of a Deep Recurrent Neural Network Controller

DLC - Results


[Slide shows response plots for the Gain Scheduled Controller and the Deep Learning Controller, including the gain-scheduled response with no uncertainty or disturbances.]

Page 16: Development of a Deep Recurrent Neural Network Controller

Contribution and Conclusion


• Created a novel training procedure focused on bringing deep learning benefits to flight control.
• Trained the controller using a set of sample trajectories that contain disturbances, aerodynamic uncertainties, and significant control attenuation/amplification in the plant dynamics during optimization.
• Found benefits in using a piecewise cost function that allows the designer to address robustness and performance criteria simultaneously.
• Utilized an incremental initialization training procedure for deep recurrent neural networks.

Page 17: Development of a Deep Recurrent Neural Network Controller

Future Work


• Pursue a vehicle model with flexible-body effects, time delays, control effectiveness, center-of-gravity changes, and aerodynamic parameter variations.
• Explore improving parameter convergence and analytic guarantees: Kullback-Leibler (KL) divergence, importance sampling, etc.
• Pursue the development and use of robustness analysis tools for deep learning controllers to provide region-of-attraction estimates and time-delay margins: sum-of-squares programming, etc.

Page 18: Development of a Deep Recurrent Neural Network Controller

Questions?
