query-efficient imitation learning for end-to-end ... · • supervised learning • alvinn net...

17
Query-Efficient Imitation Learning for End-to-End Simulated Driving Jiakai Zhang, Kyunghyun Cho New York University

Upload: others

Post on 20-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Query-Efficient Imitation Learningfor End-to-End Simulated Driving

Jiakai Zhang, Kyunghyun Cho

New York University

Page 2: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Introduction• End-to-end learning for self-driving

• Related work

Learning method• Convolutional neural network

• Imitation learning using SafeDAgger

Experiment• Setup

• Results

Conclusion and future work

Overview

Page 3: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Introduction

End-to-end learning for self-driving

• Sensory input from front-facing camera

• Control signal

Steering Brake

Page 4: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Introduction

Related work

• Supervised learning

• ALVINN net [Pomerleau 1989]

• DeepDriving [Chen et al. 2015]

• End-to-end learning for self-driving cars [Bojarski et al. 2016]

• Imitation learning

• DAgger [Ross, Gordon, and Bagnell 2010]

• SafeDAgger [Zhang and Cho 2017]

Page 5: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

DAgger algorithm

Dataset 𝐷0 Policy 𝜋1Initialize

Policy 𝜋𝑖 = 𝛽𝑖𝜋∗ + (1 − 𝛽𝑖) 𝜋𝑖

Dataset 𝐷𝑖 = 𝐷′ ∪ 𝐷𝑖−1

Dataset 𝐷′

Policy 𝜋𝑖

Iteration

Return Best policy 𝜋𝑖

Disadvantage:

• Query a reference policy

constantly

• Safe issue to environment

Page 6: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

SafeDAgger algorithm

Dataset 𝐷0 Policy 𝜋1Initialize

Dataset 𝐷𝑖 = 𝐷′ ∪ 𝐷𝑖−1

Dataset𝐷′ not safe

Policy 𝜋𝑖

Iteration

Return Best policy 𝜋𝑖

Safety classifier 𝑐1

Policy 𝜋𝑖 = 𝛽𝑖𝜋∗ + (1 − 𝛽𝑖) 𝜋𝑖

Safety classifier 𝑐𝑖

Safety classifier 𝑐1

Safety classifier 𝑐𝑖Advantage:

• Query-efficient

• Safety feature

Page 7: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Safety classifier

• Deviation of a primary policy from a reference

policy defined

• Optimal safety classifier defined as

Learning safety classifier

• Minimize a binary cross-entropy loss

Page 8: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Experiment – Setup

TORCS – Open source racing game

Training tracks

Test tracks

Page 9: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Experiment – Model

Input image – 3x160x72

Convolutional layer – 64x3x3

Max Pooling – 2x2

Convolutional layer – 128x5x5

Fully connected layer

Control

signals

Environment

variables

x 4

x 2

Primary policy

Feature map

Fully connected layer

Safety value

x 2

Safety classifier

Optimization algorithm: stochastic gradient descent

Page 10: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

ResultsSafe Frames

Unsafe Frames

Page 11: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Evaluation on test tracks

1. Mean squared error of steering angle

2. Damage per lap

3. Number of laps

4. Portion of time driven by a reference policy

Results

Page 12: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Dashed curve – with traffic

Solid curve – without traffic

Mean squared error of steering angle

Results

# of Dagger Iterations

MS

E (

Ste

erin

g A

ngle

)

Page 13: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Damage per Lap

Dashed curve – with traffic

Solid curve – without traffic

Results

# of Dagger Iterations

Dam

age

per

Lap

Page 14: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Number of Laps

Dashed curve – with traffic

Solid curve – without traffic

Results

# of Dagger Iterations

Avg. # o

f L

aps

Page 15: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Portion of time driven by a reference policy

Dashed curve – with traffic

Solid curve – without traffic

Results

# of Dagger Iterations

% o

f c s

afe

= 0

Page 16: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Demo

Page 17: Query-Efficient Imitation Learning for End-to-End ... · • Supervised learning • ALVINN net [Pomerleau 1989] • DeepDriving [Chen et al. 2015] • End-to-end learning for self-driving

Conclusion Proposed SafeDAgger algorithm

• Query efficient

• Safety feature

End-to-end simulated driving

• Trained a convolutional neural network to drive in TORCS with traffic

Future work Evaluate SafeDAgger in the real world

Learn to use temporal information