
Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning

Sheng Li, Maxim Egorov and Mykel Kochenderfer

06/19/2019


High Density Airspace Operations

Definition of a dense airspace: among aircraft that are having encounters,

Pr(num_intruders > 1) > 50%
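The density criterion above can be checked against logged encounters. The following is a minimal sketch, assuming a hypothetical list `intruder_counts` that holds the number of intruders seen in each recorded encounter; the function name and the data are illustrative and not from the slides.

```python
def is_dense(intruder_counts, threshold=0.5):
    """Estimate Pr(num_intruders > 1) from observed encounters.

    intruder_counts: one integer per encounter (hypothetical log format).
    Returns the estimated probability and whether it exceeds the threshold.
    """
    if not intruder_counts:
        raise ValueError("need at least one encounter")
    multi = sum(1 for n in intruder_counts if n > 1)
    p = multi / len(intruder_counts)
    return p, p > threshold


# Six illustrative encounters, four of which involve more than one intruder.
p, dense = is_dense([1, 2, 3, 1, 2, 4])
print(p, dense)
```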


Future airspace simulation for Auckland



Unmanned Flight Management

CAS: Collision Avoidance Systems

Outline

• Current methods

• Proposed solution: deep correction

• Results and analysis

• Summary and future work

Current Methods for Multi-agent CAS

ORCA [1]

• Principle: geometry based

• Advantages: fast; smooth

• Disadvantages: hard to tune; sensitive to uncertainties; sometimes infeasible

ACAS X [2]

• Principle: value table based

• Advantages: robust; safe; can handle uncertainties

• Disadvantages: optimized for pairwise encounters; possibly over-conservative

Both are not optimized for dense airspace


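[Figure: a multi-agent encounter. The ownship (velocity $v_{own}$) and an intruder (velocity $v_{int}$) are shown with the relative geometry $\rho$, $\theta$, $\psi$.]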

VICAS: resolve pairwise encounter

• Markov decision process (MDP)

• Assume horizontal (2D) encounters

• State: $s = (\rho, \theta, \varphi, v_{own}, v_{int})$

• Action: $A = \{-10, -5, 0, +5, +10\}$ °/sec for heading $\cup$ {Clear of Conflict (COC)}

• Reward: $R(s, a) = R_{closeness}(s) + R_{colli}(s) + R_{VW}$
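The pairwise state can be illustrated with a short sketch. The slides do not define the angles, so the code below assumes a common convention: the range is the ownship-to-intruder distance, the bearing is measured relative to the ownship heading, and the third angle is the intruder heading relative to the ownship heading. The function and constant names are hypothetical.

```python
import math

# Heading-rate actions from the slide, in degrees per second, plus COC.
ACTIONS_DEG_PER_SEC = (-10.0, -5.0, 0.0, 5.0, 10.0)
COC = "COC"


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def pairwise_state(own_xy, own_heading, own_speed, int_xy, int_heading, int_speed):
    """Build (range, bearing, relative heading, v_own, v_int) for one intruder.

    Angle conventions are assumptions made for illustration only.
    """
    dx = int_xy[0] - own_xy[0]
    dy = int_xy[1] - own_xy[1]
    rho = math.hypot(dx, dy)
    theta = wrap_angle(math.atan2(dy, dx) - own_heading)
    rel_heading = wrap_angle(int_heading - own_heading)
    return (rho, theta, rel_heading, own_speed, int_speed)


# Head-on geometry: intruder 1 km ahead, flying toward the ownship.
state = pairwise_state((0.0, 0.0), 0.0, 50.0, (1000.0, 0.0), math.pi, 50.0)
print(state)
```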

Utility Decomposition and Fusion

Utility Decomposition: dividing the encounter into pairwise encounters

Utility Fusion: “adding-up” pairwise utilities to decide on safe actions

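[Figure: decomposition and fusion. The encounter state $s$ of the ownship and its intruders is decomposed into pairwise states $s_1, s_2, \dots, s_n$; each pairwise state yields a utility $Q_1^*, Q_2^*, \dots, Q_n^*$; fusion combines them into $Q^*_{low}$, and the ownship action is $\arg\max Q^*_{low}$.]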

VICAS for Multi-Intruders

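[Figure: the decomposition and fusion diagram, from pairwise states $s_1, s_2, \dots, s_n$ through pairwise utilities $Q_1^*, Q_2^*, \dots, Q_n^*$ to the ownship action $\arg\max Q^*_{low}$.]

VICASClosest: using the closest intruder

$Q^*_{low}(s, a) \approx Q_i^*(s_i, a)$, where $i = \arg\min_{j \in \{1, \dots, n\}} \rho_j$

• A very rough approximation

VICASMulti: using all the $n$ intruders

$Q^*_{low}(s, a) \approx \min_{i \in \{1, \dots, n\}} Q_i^*(s_i, a)$

• Considers the most dangerous intruder for each action, risk averse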
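The two fusion rules can be compared on a small example. This is a minimal sketch, assuming the pairwise utilities are already available as a hypothetical table `q_pairwise[i][a]` (one row per intruder, one column per action) and `rho[i]` is the range to intruder `i`; the numbers are made up for illustration.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


def fuse_closest(q_pairwise, rho):
    """VICASClosest: use only the utilities of the intruder with the smallest range."""
    closest = min(range(len(rho)), key=rho.__getitem__)
    return list(q_pairwise[closest])


def fuse_multi(q_pairwise):
    """VICASMulti: for each action, take the minimum utility over all intruders."""
    return [min(column) for column in zip(*q_pairwise)]


# Two intruders, three actions (illustrative values only).
q_pairwise = [
    [1.0, 0.5, -2.0],  # intruder 0: action 2 is dangerous
    [0.2, 0.1, 0.9],   # intruder 1: action 2 looks best
]
rho = [800.0, 300.0]   # intruder 1 is the closest

print(argmax(fuse_closest(q_pairwise, rho)))  # action chosen from the closest intruder
print(argmax(fuse_multi(q_pairwise)))         # action chosen by the risk-averse rule
```

In this example the closest-intruder rule picks the action that is worst with respect to the farther intruder, while the minimum rule avoids it, which is the risk-averse behavior the slide describes.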


Airspace Simulation

ORCA with realistic “clamped” dynamics:

• $\tau = 1$ sec, $R = 150$ m

• $v_{max} = 50$ m/sec, $a_{max} = 2$ m/sec$^2$

• $\mathit{TurnRate}_{max} = 10$ °/sec

• High NMAC rates due to 2nd-order dynamics

VICASMulti:

• $v = 50$ m/sec

• $\mathit{TurnRate} \in \{\pm 10, \pm 5, 0\}$ °/sec

• Lower NMAC rates but unsteady airspace
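The dynamics limits listed above can be sketched as follows. The slides do not say how the clamping is applied, so this example assumes that a commanded speed and heading are limited per time step by the acceleration and turn-rate bounds; the time step `dt` and all function names are hypothetical.

```python
import math


def clamp(value, low, high):
    return max(low, min(high, value))


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def clamped_update(speed, heading, desired_speed, desired_heading, dt=1.0,
                   v_max=50.0, a_max=2.0, turn_rate_max_deg=10.0):
    """Move toward a commanded speed and heading without exceeding the limits."""
    dv = clamp(desired_speed - speed, -a_max * dt, a_max * dt)
    new_speed = clamp(speed + dv, 0.0, v_max)
    max_turn = math.radians(turn_rate_max_deg) * dt
    dpsi = clamp(wrap_angle(desired_heading - heading), -max_turn, max_turn)
    return new_speed, heading + dpsi


def constant_speed_step(x, y, heading, turn_rate_deg, speed=50.0, dt=1.0):
    """Kinematic step with fixed speed and a discrete turn-rate action."""
    heading = heading + math.radians(turn_rate_deg) * dt
    x = x + speed * math.cos(heading) * dt
    y = y + speed * math.sin(heading) * dt
    return x, y, heading


# A 90-degree turn request is limited to 10 degrees in one second.
print(clamped_update(40.0, 0.0, 50.0, math.pi / 2))
print(constant_speed_step(0.0, 0.0, 0.0, 5.0))
```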

Safety and Efficiency

Safety: Near Mid-Air Collision (NMAC) Rate

[Figure: NMACs / flight hour (×10⁻³, 0 to 80) vs. take-off rate (flights / km²-hr, 10 to 60) for No CAS, ORCA, VICASClosest, and VICASMulti.]

Efficiency: |Taken Path| / |Shortest Path|

[Figure: normalized route length (1.0 to 2.0) vs. take-off rate (10 to 60) for the same methods.]

Region: 10 km × 10 km, Demand: geographically uniform, Simulation Time: 5000 sec

• NMAC rates of VICASMulti increase explosively

• Low efficiency causes congestion in the airspace
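The two metrics can be computed from simulation logs roughly as follows. This is a minimal sketch that assumes the shortest path is the straight line from the first to the last trajectory point and that flight time is logged in seconds; both choices, and the function names, are assumptions rather than details from the slides.

```python
import math


def path_length(points):
    """Total length of a piecewise-linear 2D trajectory."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def normalized_route_length(points):
    """|Taken Path| / |Shortest Path|, with a straight-line shortest path."""
    shortest = math.dist(points[0], points[-1])
    if shortest == 0.0:
        raise ValueError("origin and destination coincide")
    return path_length(points) / shortest


def nmac_rate(num_nmacs, total_flight_seconds):
    """NMACs per flight hour."""
    return num_nmacs / (total_flight_seconds / 3600.0)


# A detour through (3, 4) on the way from (0, 0) to (6, 0).
print(normalized_route_length([(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]))
print(nmac_rate(2, 7200.0))
```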


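What is Deep Correction?

• Multi-fidelity optimization

• The high fidelity model ($f_{hi}$) is expensive to evaluate

• Use a low fidelity surrogate model ($f_{low}$) plus a correction term $\delta$:

$f_{hi} \approx f_{low} + \delta$

$Q^* \approx Q^*_{low} + \delta$

• Correction $\delta$ is a deep Q-network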


[Figure: a diagram for deep correction. Pairwise states $s_1, \dots, s_n$ yield utilities $Q_1^*, \dots, Q_n^*$, which are fused into $Q^*_{low}$; the correction $\delta$, a DQN with weights $\theta$, is added before the argmax selects the agent action.]

Why Deep Correction?

• $Q^*$ is hard to optimize

• $Q^*_{low}$ is easy to obtain

• Deep Q-network is powerful

$Q^* \approx Q^*_{low} + \delta$

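Deep Correction

Utility decomposition / fusion + deep correction:

[Figure: the agent state $s$ is decomposed into pairwise states $s_1, s_2, \dots, s_n$ with utilities $Q_1^*, Q_2^*, \dots, Q_n^*$; VICASMulti fusion gives $Q^*_{low}$; the correction $\delta$ is added, and the agent action is the argmax of the result.]

$\tilde{Q}^* = (1 - k)\, Q^*_{low} + k\,\delta$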
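The weighted combination of the fused utility and the correction term can be sketched as below. The slides do not state the range of the weight, so restricting `k` to [0, 1] is an assumption, and the utility values are invented for illustration.

```python
def corrected_q(q_low, delta, k):
    """Blend per-action utilities: (1 - k) * Q_low + k * delta."""
    if not 0.0 <= k <= 1.0:  # assumed range for the blending weight
        raise ValueError("k is assumed to lie in [0, 1]")
    return [(1.0 - k) * ql + k * d for ql, d in zip(q_low, delta)]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


q_low = [0.2, 0.1, -2.0]   # fused low-fidelity utilities, one per action
delta = [-1.0, 0.5, 0.0]   # correction values, e.g. from a trained network

blended = corrected_q(q_low, delta, k=0.5)
print(blended, argmax(blended))
```

With `k = 0` the policy reduces to the low-fidelity policy, and larger values of `k` give the learned correction more influence.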


Deep Q-Network

[Figure: a neural network with weights $\theta$ maps the state $s$ to $Q(s, a; \theta)$.]

Neural networks are universal nonlinear function approximators
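As an illustration of the mapping from a state to action values, here is a minimal sketch of a one-hidden-layer network written in plain Python. The layer sizes, the ReLU nonlinearity, and the weights are arbitrary choices for the example and do not reflect the architecture used in the talk.

```python
def relu(vector):
    return [max(0.0, x) for x in vector]


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def q_network(state, theta):
    """Return one Q-value per action for the given state."""
    hidden = relu(affine(theta["W1"], theta["b1"], state))
    return affine(theta["W2"], theta["b2"], hidden)


# Hypothetical weights: 2 state features, 2 hidden units, 3 actions.
theta = {
    "W1": [[1.0, 0.0], [0.0, -1.0]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "b2": [0.0, 0.0, 0.5],
}

print(q_network([2.0, 3.0], theta))
```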


Deep Q-Learning

Example: training DQNs to play Atari games [3]


Deep Q-learning can achieve superhuman performance in Atari games
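The core update target of Q-learning, which deep Q-learning fits a network to, can be written in a few lines. This is a minimal sketch of the standard one-step target; the discount factor and the sample numbers are illustrative, and details such as replay memory and optimizer settings are left out.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    For a terminal transition the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_values, action, target):
    """Squared error between the target and the current estimate Q(s, a)."""
    return (target - q_values[action]) ** 2


# One illustrative transition: reward 1.0, two actions available in the next state.
target = td_target(1.0, [0.5, 2.0], done=False, gamma=0.9)
print(target, squared_td_error([2.0, 0.3], action=0, target=target))
```

Training adjusts the network weights to reduce this squared error over many sampled transitions.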

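Correction State Formation

[Figure: the state $s$ feeds VICASMulti, which gives $Q^*_{low}$; the correction state 𝑠W feeds the correction network; the two outputs are added and the agent action is the argmax.]

• 𝑠W needs a fixed size

• Two formations: CorrectedSector and CorrectedClosest

• Add destination info in state

• Efficiency incentive in reward: $R(s, a) = R_{closeness}(s) + R_{colli}(s) + R_{VW}$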
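One way to obtain a fixed-size correction state from a variable number of intruders is sketched below. The slides only name the two formations, so both constructions here are assumptions: the "closest" version keeps the `k` nearest intruders and pads missing slots, and the "sector" version keeps the nearest range in each angular sector around the ownship. All names, the padding values, and the sector count are hypothetical.

```python
import math


def closest_k_state(intruders, k=2, pad=(10000.0, 0.0)):
    """Keep the k nearest (range, bearing) pairs and pad up to k entries."""
    nearest = sorted(intruders, key=lambda pair: pair[0])[:k]
    nearest = nearest + [pad] * (k - len(nearest))
    return [value for pair in nearest for value in pair]


def sector_state(intruders, num_sectors=4, max_range=10000.0):
    """Keep the smallest range observed in each bearing sector."""
    ranges = [max_range] * num_sectors
    width = 2.0 * math.pi / num_sectors
    for rho, theta in intruders:
        index = int((theta % (2.0 * math.pi)) // width) % num_sectors
        ranges[index] = min(ranges[index], rho)
    return ranges


# Three intruders given as (range in m, bearing in rad); values are illustrative.
intruders = [(500.0, 0.1), (200.0, -1.0), (900.0, 2.0)]
print(closest_k_state(intruders, k=2))
print(sector_state(intruders, num_sectors=4))
```

Either vector has the same length regardless of how many intruders are present, which is what a fixed-input network requires.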


Policy Sensitivity

• Advisory maps (pairwise): corrected CAS have a more compact alert area

• Advisory maps (multi-threat): corrected CAS have a more compact alert area

• Encounter simulations with fixed numbers of aircraft: corrected CAS have low alert frequencies

Trajectories

• No CAS: an encounter with NMAC

• VICASMulti: winding routes and oscillations in actions

• VICASClosest: less winding routes

• CorrectedClosest and CorrectedSector: avoiding collision with minimal maneuvers and straightforward paths

Airspace Simulation

VICASMulti:

• $v = 50$ m/sec

• $\mathit{TurnRate} \in \{\pm 10, \pm 5, 0\}$ °/sec

• Unsteady airspace

CorrectedClosest:

• $v = 50$ m/sec

• $\mathit{TurnRate} \in \{\pm 10, \pm 5, 0\}$ °/sec

• Lower NMAC rates and steady airspace

Safety and Efficiency

Safety: NMAC Rate

[Figure: NMACs / flight hour (×10⁻³, 0 to 80) vs. take-off rate (10 to 60) for No CAS, ORCA, VICASClosest, VICASMulti, CorrectedSector, and CorrectedClosest.]

Efficiency: |Taken Path| / |Shortest Path|

[Figure: normalized route length (1.0 to 2.0) vs. take-off rate (10 to 60) for the same methods.]

Region: 10 km × 10 km, Demand: geographically uniform, Simulation Time: 5000 sec

• Corrected CAS have low NMAC rates

• Corrected CAS have decent route efficiency

Trade-off

[Figure: NMACs / flight hour (×10⁻³) vs. normalized route length at take-off rates of 20, 40, and 60, for No CAS, ORCA, VICASClosest, VICASMulti, CorrectedSector, and CorrectedClosest. First panel: route length 1 to 8, NMAC rate 0 to 80. Second panel: route length 1.0 to 2.0, NMAC rate 0 to 40.]

Corrected CAS are the best-performing points, at the bottom-left corners of the Pareto frontiers

Impact on Encounters

The impact of the active CAS on the encounter distribution:

$D_{TV}(P_0 \,\|\, P) = \frac{1}{2} \sum_{x \in X} |P_0(x) - P(x)|$

where $x$ is the number of intruders in an encounter

[Figure: encounter distributions over the number of intruders.]

Corrected CAS have low impact on the encounter distribution
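The distance between two encounter distributions, taken as half the sum of absolute differences over the number of intruders, can be computed as in the sketch below. The two distributions are invented for illustration, and treating one of them as a baseline is an assumption about the slide's notation.

```python
def half_l1_distance(p_base, p_other):
    """Half the sum of absolute differences between two discrete distributions.

    Each distribution is a dict mapping the number of intruders to a probability.
    Values missing from one distribution are treated as probability zero.
    """
    support = set(p_base) | set(p_other)
    return 0.5 * sum(abs(p_base.get(x, 0.0) - p_other.get(x, 0.0)) for x in support)


# Illustrative distributions over the number of intruders in an encounter.
p_base = {1: 0.5, 2: 0.3, 3: 0.2}
p_with_cas = {1: 0.6, 2: 0.3, 3: 0.1}

print(half_l1_distance(p_base, p_with_cas))
```

The result is 0 for identical distributions and 1 for distributions with disjoint support, so small values indicate that the active CAS changes the encounter mix only slightly.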


Summary for Deep Correction

• Using a deep Q-network as the correction term

• Trained by deep Q-learning

• Both safety and efficiency are improved in dense airspace

• Impact on encounter distribution is small

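[Figure: the deep correction diagram, from pairwise states $s_1, \dots, s_n$ and utilities $Q_1^*, \dots, Q_n^*$ through $Q^*_{low}$ plus the correction to the agent action.]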


Future Work

• Exploring relationships between strategic deconfliction and on-board collision avoidance

• “End-to-end”: integrating guidance and collision avoidance

• “Reciprocal”: considering the “reactive nature” of the other aircraft

• Using a multi-agent reinforcement learning framework


[Figure: a framework with centralized policy and decentralized control. A shared policy receives the observations (obs 1, obs 2, …, obs n) from the world and returns the corresponding actions (Action 1, Action 2, …, Action n).]

Q&A


References

[1] J. van den Berg, et al., “Reciprocal n-body collision avoidance,” in Robotics Research. Berlin, Heidelberg: Springer, 2011, pp. 3–19.

[2] M. J. Kochenderfer, J. E. Holland, and J. P. Chryssanthacopoulos, “Next generation airborne collision avoidance system,” Lincoln Laboratory Journal, vol. 19, no. 1, pp. 17–33, 2012.

[3] V. Mnih, et al., “Playing Atari with deep reinforcement learning,” 2013. Available at https://arxiv.org/abs/1312.5602.
