  • Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning

    Sheng Li, Maxim Egorov and Mykel Kochenderfer

    06/19/2019

    19 June, 2019 Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning1

  • High Density Airspace Operations

    Definition of a dense airspace: for aircraft that are in an encounter,

    $\Pr(\text{num\_intruders} > 1) > 50\%$

    Future airspace simulation for Auckland
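The density criterion can be checked empirically from logged encounters. The sketch below is only an illustration: the function name `is_dense_airspace` and the sample lists are hypothetical, and it assumes each sample is the number of intruders seen in one encounter.

```python
from typing import Sequence


def is_dense_airspace(intruder_counts: Sequence[int], threshold: float = 0.5) -> bool:
    """Return True if the share of encounters with more than one intruder exceeds `threshold`.

    Each entry of `intruder_counts` is the number of intruders in one encounter
    (hypothetical logging format, not taken from the slides).
    """
    if not intruder_counts:
        raise ValueError("need at least one encounter sample")
    multi_intruder = sum(1 for n in intruder_counts if n > 1)
    return multi_intruder / len(intruder_counts) > threshold


# Made-up samples for illustration only.
sparse_samples = [1, 1, 1, 2, 1, 1]   # 1 of 6 encounters has more than one intruder
dense_samples = [1, 2, 3, 2, 4, 1]    # 4 of 6 encounters have more than one intruder

print(is_dense_airspace(sparse_samples), is_dense_airspace(dense_samples))
```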

  • High Density Airspace Operations

    Unmanned Flight Management

    CAS: Collision Avoidance Systems

  • Outline

    • Current methods

    • Proposed solution: deep correction

    • Results and analysis

    • Summary and future work

  • Current Methods for Multi-agent CAS

    ORCA [1]

    • Principle: geometry based

    • Advantages: fast, smooth

    • Disadvantages: hard to tune, sensitive to uncertainties, sometimes infeasible

    ACAS X [2]

    • Principle: value table based

    • Advantages: robust, safe, can handle uncertainties

    • Disadvantages: optimized for pairwise encounters, possibly over-conservative

    Not optimized for dense airspace

    [Figure: a multi-agent encounter]

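  • VICAS: resolve pairwise encounter

    [Figure labels: Ownship ($v_{\text{own}}$), Intruder ($v_{\text{int}}$), $\psi$]

    • Markov decision process (MDP)

    • Assume horizontal (2D) encounters

    • State: $s = (\rho, \theta, \varphi, v_{\text{own}}, v_{\text{int}})$

    • Action: $A = \{-10, -5, 0, +5, +10\}$ °/sec for heading $\cup\ \{\text{Clear of Conflict (COC)}\}$

    • Reward: $R(s, a) = R_{\text{closeness}}(s) + R_{\text{colli}}(s) + R_{\text{[illegible]}}$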
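As a rough illustration of the action set, the sketch below encodes the five turn-rate advisories plus COC and applies one of them to a heading. The helper `step_heading`, the 1-second default step, and the choice to treat COC as "no heading change" are assumptions made for this example, not details given on the slide.

```python
from typing import Optional, Tuple

COC = None  # "Clear of Conflict": no advisory is issued

# Heading-rate advisories in degrees per second, as listed on the slide.
TURN_RATES_DEG_PER_SEC: Tuple[float, ...] = (-10.0, -5.0, 0.0, 5.0, 10.0)
ACTIONS: Tuple[Optional[float], ...] = TURN_RATES_DEG_PER_SEC + (COC,)


def step_heading(heading_deg: float, action: Optional[float], dt: float = 1.0) -> float:
    """Advance a heading by one time step under the given advisory.

    Assumption for this sketch: COC leaves the heading unchanged. In a full
    simulator the aircraft would instead follow its nominal guidance.
    """
    rate = 0.0 if action is COC else action
    return (heading_deg + rate * dt) % 360.0


print(len(ACTIONS), step_heading(355.0, 10.0), step_heading(90.0, COC))
```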

  • Utility Decomposition and Fusion

    Utility Decomposition: dividing the encounter into pairwise encounters

    Utility Fusion: “adding-up” pairwise utilities to decide on safe actions

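    [Figure: an ownship surrounded by intruders]

    [Diagram: in the decomposition step the state $s$ is split into pairwise states $s_1, s_2, \ldots, s_n$ with pairwise utilities $Q_1^*, Q_2^*, \ldots, Q_n^*$; in the fusion step these are combined into $Q_{\text{low}}^*$, and the ownship action is $\arg\max Q_{\text{low}}^*$]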

  • VICAS for Multi-Intruders

    VICASClosest: using the closest intruder

    $Q_{\text{low}}^*(s, a) \approx Q_i^*(s_i, a), \quad i = \arg\min_{j \in \{1, \ldots, n\}} \rho_j$

    • A very rough approximation

    VICASMulti: using all the $n$ intruders

    $Q_{\text{low}}^*(s, a) \approx \min_{i \in \{1, \ldots, n\}} Q_i^*(s_i, a)$

    • Considers the most dangerous intruder for each action, risk averse

    [Diagram: decomposition of $s$ into pairwise states $s_1, s_2, \ldots, s_n$, fusion of $Q_1^*, Q_2^*, \ldots, Q_n^*$ into $Q_{\text{low}}^*$, and the ownship action $\arg\max Q_{\text{low}}^*$]
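The two fusion rules can be sketched in a few lines. This is a minimal illustration that assumes the pairwise Q-values are already available as one row per intruder and one column per action; the function names and the numbers are made up for the example.

```python
from typing import List, Sequence


def fuse_closest(q_pairwise: Sequence[Sequence[float]], ranges: Sequence[float]) -> List[float]:
    """VICASClosest: keep only the Q-values of the intruder with the smallest range."""
    closest = min(range(len(ranges)), key=lambda j: ranges[j])
    return list(q_pairwise[closest])


def fuse_multi(q_pairwise: Sequence[Sequence[float]]) -> List[float]:
    """VICASMulti: for each action, take the minimum Q-value over all intruders."""
    return [min(column) for column in zip(*q_pairwise)]


def best_action(q_values: Sequence[float]) -> int:
    """Index of the action with the largest fused Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Two intruders, three actions (illustrative numbers only).
q = [[1.0, 0.5, -2.0],   # pairwise Q-values against intruder 0
     [0.2, 0.8, 0.9]]    # pairwise Q-values against intruder 1
rho = [300.0, 120.0]     # ranges in metres; intruder 1 is closest

a_closest = best_action(fuse_closest(q, rho))  # uses only intruder 1
a_multi = best_action(fuse_multi(q))           # uses the per-action worst case
print(a_closest, a_multi)
```

In this example the closest-intruder rule picks the third action, which is the worst one against the farther intruder, while the min rule avoids it. That is the risk-averse behaviour the slide attributes to VICASMulti.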

  • Airspace Simulation

    ORCA with realistic “clamped” dynamics:

    • $\tau = 1$ sec, $R = 150$ m

    • $v_{\max} = 50$ m/sec, $a_{\max} = 2$ m/sec²

    • $\text{TurnRate}_{\max} = 10$ °/sec

    • High NMAC rates due to 2nd-order dynamics

    VICASMulti:

    • $v = 50$ m/sec

    • $\text{TurnRate} \in \{\pm 10, \pm 5, 0\}$ °/sec

    • Lower NMAC rates but unsteady airspace

  • Safety and Efficiency

    [Figure, first panel: Safety, Near Mid-Air Collision (NMAC) rate. x-axis: take-off rate (flights / km²-hr), 10 to 60; y-axis: NMACs / flight hour (×10⁻³), 0 to 80. Curves: No CAS, ORCA, VICASClosest, VICASMulti]

    [Figure, second panel: Efficiency, |Taken Path| / |Shortest Path|. x-axis: take-off rate (flights / km²-hr), 10 to 60; y-axis: normalized route length, 1.0 to 2.0]

    Region: 10 km × 10 km, Demand: geographically uniform, Simulation Time: 5000 sec

    NMAC rates of VICASMulti increase explosively

    Low efficiency causes congestion in the airspace
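The two metrics on this slide can be computed from logged trajectories roughly as follows. This sketch assumes a trajectory is a list of 2D points in metres and that the shortest path is the straight line from the first to the last point; the function names are hypothetical, and how an NMAC is detected is not covered here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def path_length(path: Sequence[Point]) -> float:
    """Sum of straight-line segment lengths along a trajectory."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def normalized_route_length(path: Sequence[Point]) -> float:
    """|Taken Path| / |Shortest Path|, with the shortest path taken as the
    straight line from origin to destination (an assumption of this sketch)."""
    shortest = math.dist(path[0], path[-1])
    if shortest == 0.0:
        raise ValueError("origin and destination coincide")
    return path_length(path) / shortest


def nmacs_per_flight_hour(nmac_count: int, total_flight_seconds: float) -> float:
    """NMAC rate over the total flight time accumulated by all aircraft."""
    return nmac_count / (total_flight_seconds / 3600.0)


detour = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]   # two 5 m legs around a 6 m straight line
print(normalized_route_length(detour))
print(nmacs_per_flight_hour(3, 7200.0))
```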

  • Outline

    • Current methods

    • Proposed solution: deep correction

    • Results and analysis

    • Summary and future work

  • What is Deep Correction?

    • Multi-fidelity optimization

    • The high fidelity model ($f_{\text{hi}}$) is expensive to evaluate

    • Use a surrogate model:

    $f_{\text{hi}} \approx f_{\text{low}} + \delta$

    $Q^* \approx Q_{\text{low}}^* + \delta$

    • Correction $\delta$ is a deep Q-network

    [Figure: a diagram for deep correction. The pairwise utilities $Q_1^*, \ldots, Q_n^*$ of the states $s_1, \ldots, s_n$ are fused into $Q_{\text{low}}^*$, the correction $\delta$ from a DQN with weights $\theta$ is added, and the agent's action is the argmax]

  • Why Deep Correction?

    • $Q^*$ is hard to optimize

    • $Q_{\text{low}}^*$ is easy to obtain

    • Deep Q-network is powerful

    $Q^* \approx Q_{\text{low}}^* + \delta$

    [Figure: a diagram for deep correction]

  • Deep Correction

    Utility decomposition / fusion + deep correction:

    $\tilde{Q}^* = (1 - k)\, Q_{\text{low}}^* + k\,\delta$

    Decomposition and fusion: VICASMulti

    [Diagram: the state $s$ is decomposed into $s_1, s_2, \ldots, s_n$, the pairwise utilities $Q_1^*, Q_2^*, \ldots, Q_n^*$ are fused into $Q_{\text{low}}^*$, the correction $\delta$ from a DQN with weights $\theta$ is added, and the agent's action is the argmax]
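The weighted combination of the fused low-fidelity utility and the correction term can be sketched as below. The function name `corrected_q`, the numbers, and the value of `k` are illustrative assumptions; in practice `delta` would come from the trained correction network.

```python
from typing import List, Sequence


def corrected_q(q_low: Sequence[float], delta: Sequence[float], k: float) -> List[float]:
    """Per-action combination (1 - k) * Q_low + k * delta."""
    if len(q_low) != len(delta):
        raise ValueError("q_low and delta must cover the same actions")
    return [(1.0 - k) * ql + k * d for ql, d in zip(q_low, delta)]


def best_action(q_values: Sequence[float]) -> int:
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_low = [0.2, 0.5, -2.0]    # fused low-fidelity utilities (made-up values)
delta = [1.0, -0.5, 0.0]    # stand-in for the correction network's output
q_corr = corrected_q(q_low, delta, k=0.5)

print(best_action(q_low), best_action(q_corr))
```

With these numbers the correction shifts the preferred action from the second to the first, which shows how a learned term can change the advisory without replacing the low-fidelity solution.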

  • Deep Q-Network

    [Figure: a neural network with weights $\theta$ that takes the state $s$ as input and outputs $Q(s, a; \theta)$]

    Neural networks are universal nonlinear function approximators

  • Deep Q-Learning

    Example: training DQNs to play Atari games [3]

    Deep Q-learning can achieve superhuman performance in Atari games
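For readers unfamiliar with deep Q-learning, the core of the update is the one-step temporal-difference target used in [3]. The sketch below shows only that target and the squared error a network would be trained to reduce; the function names, discount factor, and numbers are illustrative, and the training setup used for the correction network is not described on these slides.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float],
              gamma: float = 0.99, terminal: bool = False) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward  # no bootstrapping past the end of an episode
    return reward + gamma * max(next_q_values)


def squared_td_error(q_sa: float, target: float) -> float:
    """Loss term minimised with respect to the network weights."""
    return (target - q_sa) ** 2


y = td_target(reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5)
print(y, squared_td_error(q_sa=1.5, target=y))
```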

  • Correction State Formation

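    [Figure: a diagram for deep correction, with VICASMulti providing $Q_{\text{low}}^*$ from the state $s$ and the correction $\delta$ (a DQN with weights $\theta$) taking a separate correction state as input]

    • The correction state needs a fixed size

    • CorrectedSector and CorrectedClosest: two ways of forming the correction state

    • Add destination info in state:

      • Efficiency stimulation in reward

      • $R(s, a) = R_{\text{closeness}}(s) + R_{\text{colli}}(s) + R_{\text{[illegible]}}$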
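The transcript does not preserve how CorrectedSector and CorrectedClosest build their states, so the sketch below is only a guess at what fixed-size encodings of a variable number of intruders could look like. Everything in it is an assumption: the (range, bearing) features, the padding value, the number of sectors, and both function names.

```python
import math
from typing import List, Sequence, Tuple

Intruder = Tuple[float, float]  # (range in m, bearing in rad): hypothetical features


def closest_k_state(intruders: Sequence[Intruder], k: int = 2,
                    pad: Intruder = (1e4, 0.0)) -> List[float]:
    """Keep the k nearest intruders and pad with a far-away dummy if fewer exist."""
    nearest = sorted(intruders, key=lambda it: it[0])[:k]
    nearest += [pad] * (k - len(nearest))
    return [value for it in nearest for value in it]


def sector_state(intruders: Sequence[Intruder], n_sectors: int = 4,
                 max_range: float = 1e4) -> List[float]:
    """Split the surroundings into equal angular sectors and record the
    nearest intruder range in each (max_range when a sector is empty)."""
    state = [max_range] * n_sectors
    width = 2.0 * math.pi / n_sectors
    for rho, bearing in intruders:
        idx = int((bearing % (2.0 * math.pi)) / width) % n_sectors
        state[idx] = min(state[idx], rho)
    return state


traffic = [(500.0, 0.1), (200.0, 0.2), (900.0, 3.5)]
print(closest_k_state(traffic, k=2))
print(sector_state(traffic, n_sectors=4))
```

Either encoding always returns a vector of the same length, whatever the number of intruders, which is the property a fixed-input network needs.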

  • Outline

    • Current methods

    • Proposed solution: deep correction

    • Results and analysis

    • Summary and future work

  • Policy Sensitivity

    Advisory maps (pairwise): corrected CAS have more compact alert area

  • Policy Sensitivity

    Advisory maps (multi-threat): corrected CAS have more compact alert area

  • Policy Sensitivity

    Encounter simulations with fixed numbers of aircraft

    Corrected CAS have low alert frequencies

  • Trajectories

    • No CAS: an encounter with NMAC

    • VICASMulti: winding routes and oscillations in actions

    • VICASClosest: less winding routes

  • Trajectories

    CorrectedClosest CorrectedSector

    Avoiding collision with minimal maneuvers and straightforward paths

  • Airspace Simulation

    VICASMulti:

    • $v = 50$ m/sec

    • $\text{TurnRate} \in \{\pm 10, \pm 5, 0\}$ °/sec

    • Unsteady airspace

    CorrectedClosest:

    • $v = 50$ m/sec

    • $\text{TurnRate} \in \{\pm 10, \pm 5, 0\}$ °/sec

    • Lower NMAC rates and steady airspace

  • Safety and Efficiency

    [Figure, first panel: Safety, NMAC rate. x-axis: take-off rate (flights / km²-hr), 10 to 60; y-axis: NMACs / flight hour (×10⁻³), 0 to 80. Curves: No CAS, ORCA, VICASClosest, VICASMulti, CorrectedSector, CorrectedClosest]

    [Figure, second panel: Efficiency, |Taken Path| / |Shortest Path|. x-axis: take-off rate (flights / km²-hr), 10 to 60; y-axis: normalized route length, 1.0 to 2.0]

    Region: 10 km × 10 km, Demand: geographically uniform, Simulation Time: 5000 sec

    Corrected CAS have low NMAC rates

    Corrected CAS have decent route efficiency

  • Trade-off

    [Figure, two panels: NMACs / flight hour (×10⁻³) versus normalized route length, with points for take-off rates of 20, 40, and 60 and for No CAS, ORCA, VICASClosest, VICASMulti, CorrectedSector, and CorrectedClosest. The first panel covers normalized route lengths from 1 to 8 and NMAC rates from 0 to 80; the second covers route lengths from 1.0 to 2.0 and NMAC rates from 0 to 40]

    Corrected CAS are the best performing points at the bottom left corners of the Pareto frontiers
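A Pareto frontier over the two objectives (shorter routes, fewer NMACs) can be extracted as in the sketch below. The labels A to D and their coordinates are invented purely to exercise the function and are not results from the slides.

```python
from typing import Dict, List, Tuple


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of points not dominated by any other point.

    Each value is (normalized route length, NMAC rate); smaller is better in both.
    A point is dominated if another is no worse in both and strictly better in one.
    """
    front = []
    for name, (x, y) in points.items():
        dominated = any(
            ox <= x and oy <= y and (ox < x or oy < y)
            for other, (ox, oy) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Invented operating points: (normalized route length, NMAC rate).
candidates = {"A": (1.0, 60.0), "B": (1.1, 5.0), "C": (1.5, 20.0), "D": (1.8, 4.0)}
print(pareto_front(candidates))
```

Here C is dominated by B, which is both shorter and safer, so only A, B, and D remain on the frontier.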

  • Impact on Encounters

    The impact of the active CAS on the encounter distribution:

    $D_{\text{TV}}(P_0 \,\|\, P) = \frac{1}{2} \sum_{x} \left| P_0(x) - P(x) \right|$

    $x$ is the number of intruders in an encounter

    [Figure: encounter distributions over the number of intruders]

    Corrected CAS have low impact on the encounter distribution
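Reading the distance on this slide as half the summed absolute difference between two distributions over the number of intruders, it can be computed as in the sketch below. The helper names and the encounter counts are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def empirical_distribution(samples: Iterable[int]) -> Dict[int, float]:
    """Relative frequency of each intruder count in a list of encounters."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}


def half_l1_distance(p0: Dict[int, float], p: Dict[int, float]) -> float:
    """0.5 * sum over x of |P0(x) - P(x)|, with missing entries treated as zero."""
    support = set(p0) | set(p)
    return 0.5 * sum(abs(p0.get(x, 0.0) - p.get(x, 0.0)) for x in support)


# Made-up distributions over the number of intruders in an encounter.
p_reference = {1: 0.5, 2: 0.3, 3: 0.2}
p_with_cas = {1: 0.6, 2: 0.3, 3: 0.1}

print(half_l1_distance(p_reference, p_with_cas))
print(empirical_distribution([1, 1, 2, 3]))
```

The value lies between 0 (identical distributions) and 1 (disjoint support), so a small value means the CAS leaves the encounter mix largely unchanged.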

  • Outline

    • Current methods

    • Proposed solution: deep correction

    • Results and analysis

    • Summary and future work

  • Summary for Deep Correction

    • Using a deep Q-network as correction term

    • Trained by deep Q-learning

    • Both safety and efficiency are improved in dense airspace

    • Impact on encounter distribution is small

    [Figure: a diagram for deep correction]

  • Future Work

    • Exploring relationships between strategic deconfliction and on-board collision avoidance

    • “End-to-end”: integrating guidance and collision avoidance

    • “Reciprocal”: considering the “reactive nature” of the other aircraft

    • Using a multi-agent reinforcement learning framework

    [Figure: a shared policy receives observations obs 1, obs 2, …, obs n from the world and returns actions Action 1, Action 2, …, Action n]

    A framework with centralized policy and decentralized control

  • Q&A

  • References

    [1] J. van den Berg, et al., “Reciprocal n-body collision avoidance,” in Robotics Research. Berlin, Heidelberg: Springer, 2011, pp. 3–19.

    [2] M. J. Kochenderfer, J. E. Holland, and J. P. Chryssanthacopoulos, “Next generation airborne collision avoidance system,” Lincoln Laboratory Journal, vol. 19, no. 1, pp. 17–33, 2012.

    [3] V. Mnih, et al., “Playing Atari with deep reinforcement learning,” 2013. Available at https://arxiv.org/abs/1312.5602.
