Deep Learning for Robotics Sergey Levine


Page 1: Deep Learning for Robotics...Deep Learning: End-to-end vision standard computer vision features (e.g. HOG) mid-level features (e.g. DPM) classifier (e.g. SVM) deep learning Felzenszwalb

Deep Learning for Robotics

Sergey Levine


Deep Learning: End-to-end vision

standard computer vision: features (e.g. HOG) → mid-level features (e.g. DPM) → classifier (e.g. SVM) [Felzenszwalb ‘08]

deep learning [Krizhevsky ‘12]


Action (run away)

perception

action


Action (run away)

sensorimotor loop


“When a man throws a ball high in the air and catches it again, he behaves as if he had solved a set of differential equations in predicting the trajectory of the ball … at some subconscious level, something functionally equivalent to the mathematical calculations is going on.”

-- Richard Dawkins

McLeod & Dienes. Do fielders know where to go to catch the ball or only how to get there? Journal of Experimental Psychology 1996, Vol. 22, No. 3, 531-543


Philipp Krahenbuhl, Stanford University


End-to-end vision

standard computer vision: features (e.g. HOG) → mid-level features (e.g. DPM) → classifier (e.g. SVM) [Felzenszwalb ‘08]

deep learning [Krizhevsky ‘12]

End-to-end robotic control

deep sensorimotor learning: observations → motor torques

standard robotic control: observations → state estimation (e.g. vision) → modeling & prediction → motion planning → low-level controller (e.g. PD) → motor torques


indirect supervision

actions have consequences


Why should we care?

action


Why should we care?


Goals of this lecture

• Introduce formalisms of decision making

– states, actions, time, cost

– Markov decision processes

• Survey recent research

– imitation learning

– reinforcement learning

• Outstanding research challenges

– what is easy

– what is hard

– where could we go next?
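The formalism named in the first goal (states, actions, time, cost) can be made concrete with a toy example. The sketch below is not from the lecture: the line world, the `slip` probability, and the names `step`, `cost`, and `rollout` are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy MDP (not from the lecture): an agent on a line of
# positions 0..4 wants to reach position 4.
GOAL = 4

def step(state, action, rng, slip):
    """Stochastic dynamics p(s' | s, a): with probability `slip` the move fails."""
    if rng.random() < slip:
        return state
    return min(max(state + action, 0), GOAL)

def cost(state, action):
    """Per-step cost c(s, a): 1 until the goal is reached, 0 afterwards."""
    return 0.0 if state == GOAL else 1.0

def rollout(policy, horizon, slip=0.1, seed=0):
    """Run a policy for `horizon` time steps and return the total cost."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(horizon):
        action = policy(state)
        total += cost(state, action)
        state = step(state, action, rng, slip)
    return total

always_right = lambda s: +1   # a policy maps states to actions
always_left = lambda s: -1
```

Comparing `rollout(always_right, 10)` with `rollout(always_left, 10)` shows how the total cost over a fixed horizon ranks policies.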


Contents

Imitation learning

Reinforcement learning

Imitation without a human

Research frontiers


Terminology & notation

1. run away

2. ignore

3. pet


Terminology & notation

1. run away

2. ignore

3. pet


Contents

Imitation learning

Reinforcement learning

Imitation without a human

Research frontiers


Imitation Learning

Images: Bojarski et al. ‘16, NVIDIA

training data

supervised learning
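Imitation learning as pictured here reduces to supervised learning on (observation, expert action) pairs. A minimal sketch, assuming a scalar observation and a linear policy; the demonstration values and the name `fit_linear_policy` are made up for illustration and are not the network used by Bojarski et al.

```python
# Hypothetical demonstrations: (observation, expert action) pairs, e.g.
# lateral offset from the lane centre -> steering command. Values are made up.
demos = [(-1.0, 0.5), (-0.5, 0.25), (0.0, 0.0), (0.5, -0.25), (1.0, -0.5)]

def fit_linear_policy(pairs):
    """Least-squares fit of action = w * observation + b."""
    n = len(pairs)
    mean_o = sum(o for o, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in pairs)
    var = sum((o - mean_o) ** 2 for o, _ in pairs)
    w = cov / var
    b = mean_a - w * mean_o
    return w, b

w, b = fit_linear_policy(demos)

def policy(obs):
    """The cloned policy: predicts the expert's action for a new observation."""
    return w * obs + b
```

The policy is only fitted on observations the expert visited, which is where the distribution mismatch discussed on the following slides comes from.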


Does it work? No!


Does it work? Yes!

Video: Bojarski et al. ‘16, NVIDIA


Why did that work?

Bojarski et al. ‘16, NVIDIA


Can we make it work more often?

cost

stability


Learning from a stabilizing controller

(more on this later)


Can we make it work more often?


Can we make it work more often?

DAgger: Dataset Aggregation

Ross et al. ‘11
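The DAgger loop (run the current policy, have the expert label the states it visits, aggregate, retrain) can be sketched as follows. The toy dynamics, the linear `expert`, and the function names are assumptions for illustration only; in practice the labels come from a human or another controller.

```python
import random

def expert(obs):
    """Stand-in for the expert labeller (hypothetical: steer back towards 0)."""
    return -0.5 * obs

def fit(dataset):
    """Least-squares slope for action = w * obs (the supervised learning step)."""
    num = sum(o * a for o, a in dataset)
    den = sum(o * o for o, _ in dataset)
    return num / den if den > 0 else 0.0

def run_policy(w, steps, rng):
    """Roll out the learned policy and record the observations it visits."""
    obs, visited = 1.0, []
    for _ in range(steps):
        visited.append(obs)
        obs = obs + w * obs + rng.gauss(0.0, 0.1)   # toy dynamics with noise
    return visited

def dagger(iterations=5, steps=20, seed=0):
    rng = random.Random(seed)
    # Initial demonstrations: a few observations labelled by the expert.
    dataset = [(o, expert(o)) for o in (1.0, 0.5, 0.25)]
    w = fit(dataset)
    for _ in range(iterations):
        visited = run_policy(w, steps, rng)            # run the current policy
        labelled = [(o, expert(o)) for o in visited]   # expert labels those states
        dataset.extend(labelled)                       # aggregate the datasets
        w = fit(dataset)                               # retrain on everything
    return w, len(dataset)
```

The key point is that the training set grows with states drawn from the learner's own state distribution rather than the expert's.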


DAgger Example

Ross et al. ‘11


What’s the problem?

Ross et al. ‘11


Imitation learning: recap

• Usually (but not always) insufficient by itself

– Distribution mismatch problem

• Sometimes works well

– Hacks (e.g. left/right images)

– Samples from a stable trajectory distribution

– Add more on-policy data, e.g. using DAgger

training data

supervised learning


Imitation learning: questions

• Distribution mismatch does not seem to be the whole story

– Imitation often works without DAgger

– Can we think about how stability factors in?

• Do we need to add data for imitation to work?

– Can we just use existing data more cleverly?

training data

supervised learning


Contents

Imitation learning

Reinforcement learning

Imitation without a human

Research frontiers


Terminology & notation

1. run away

2. ignore

3. pet


Trajectory optimization
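The slide's equations did not survive extraction. As one standard instance of trajectory optimization, the sketch below solves a scalar linear-quadratic problem by backward recursion. The dynamics x' = a*x + b*u, the cost q*x^2 + r*u^2, and all function names are assumptions for illustration, not necessarily the formulation on the slide.

```python
def lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for x' = a*x + b*u with cost q*x^2 + r*u^2."""
    p = q                       # value at the final state: V_T(x) = q * x^2
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)     # optimal feedback u = -k * x
        p = q + a * a * p - a * b * p * k     # value function one step earlier
        gains.append(k)
    gains.reverse()             # gains[t] is the gain to apply at time t
    return gains, p             # p * x0^2 is the optimal total cost

def optimal_controls(x0, gains, a, b):
    """Roll the feedback law forward to obtain the open-loop control sequence."""
    x, controls = x0, []
    for k in gains:
        u = -k * x
        controls.append(u)
        x = a * x + b * u
    return controls

def total_cost(x0, controls, a, b, q, r):
    """Cost of a control sequence, including the cost of the final state."""
    x, cost = x0, 0.0
    for u in controls:
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q * x * x
```

Any other control sequence, for example all zeros, gives a total cost at least as large as the one returned by `optimal_controls`.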


Probabilistic version


Probabilistic version (in pictures)


DAgger without Humans

Ross et al. ‘11


Another problem


PLATO: Policy Learning with Adaptive Trajectory Optimization

Kahn, Zhang, Levine, Abbeel ‘16


path replanned!


avoids high cost!

input substitution trick

need state at training time

but not at test time!


Beyond driving & flying


Trajectory Optimization with Unknown Dynamics

[L. et al. NIPS ‘14]


Trajectory Optimization with Unknown Dynamics

new old

[L. et al. NIPS ‘14]


Learning on PR2

[L. et al. ICRA ‘15]


Combining with Policy Learning


expectation under current policy

trajectory distribution(s)

Lagrange multiplier

L. et al. ICML ’14 (dual descent)

can also use BADMM (L. et al. ’15)


Guided Policy Search

supervised learning

trajectory-centric RL


[see L. et al. NIPS ‘14 for details]


training time test time

L.*, Finn*, Darrell, Abbeel ‘16


~ 92,000 parameters


Experimental Tasks


Imitating optimal control: questions

• Any difference from standard imitation learning?

– Can change behavior of the “teacher” programmatically

• Can the policy help optimal control (rather than just the other way around)?


Contents

Imitation learning

Reinforcement learning

Imitation without a human

Research frontiers


Can we avoid dynamics completely?


A simple reinforcement learning algorithm


REINFORCE

likelihood ratio policy gradient

Williams ‘92
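The likelihood ratio estimator averages the gradient of the log-probability of each sampled action, weighted by the outcome that followed. A minimal sketch on a two-action problem, written in terms of reward maximization. The problem, the `REWARD` table, and the single-parameter sigmoid policy are assumptions for illustration, and no baseline is used.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

REWARD = {0: 0.0, 1: 1.0}   # hypothetical two-action problem

def reinforce_gradient(theta, num_samples, rng):
    """Monte Carlo estimate of dJ/dtheta = E[ d/dtheta log pi(a) * r(a) ]."""
    p = sigmoid(theta)                      # pi(a = 1)
    total = 0.0
    for _ in range(num_samples):
        a = 1 if rng.random() < p else 0
        # Derivative of log pi(a) with respect to theta for a sigmoid policy.
        grad_log_pi = (1.0 - p) if a == 1 else -p
        total += grad_log_pi * REWARD[a]
    return total / num_samples

def train(steps=200, lr=0.5, batch=100, seed=0):
    """Stochastic gradient ascent on expected reward."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        theta += lr * reinforce_gradient(theta, batch, rng)
    return theta
```

For this problem the exact gradient is p * (1 - p), so at `theta = 0` the estimate should be close to 0.25. No model of the dynamics is needed, only samples.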


What the heck did we just do?

one more piece…


Policy gradient challenges

• High variance in gradient estimate

– Smarter baselines

• Poor conditioning

– Use higher order methods (see: natural gradient)

• Very hard to choose step size

– Trust region policy optimization (TRPO)

Schulman, L., Moritz, Jordan, Abbeel ‘15


Value functions


Value functions


Value function challenges

• The usual problems with policy gradient

– Poor conditioning (use natural gradient)

– Hard to choose step size (use TRPO)

• Bias/variance tradeoff

– Combine Monte Carlo and function approximation: Generalized Advantage Estimation (GAE)

• Instability/overfitting

– Limit how much the value function estimate changes per iteration

high variance?

high bias?

Schulman, Moritz, L., Jordan, Abbeel ‘16


Generalized advantage estimation

Schulman, Moritz, L., Jordan, Abbeel ‘16
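Generalized advantage estimation blends one-step temporal-difference residuals with an exponentially decaying weight. A minimal sketch of the backward recursion, written in terms of rewards; the function name `gae` and the default `gamma` and `lam` values are illustrative choices, not taken from the slides.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    `values` has one more entry than `rewards`: V(s_0), ..., V(s_T).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` gives the one-step TD residual (low variance, more bias from the value function), while `lam=1` gives the Monte Carlo return minus the value estimate (less bias, higher variance).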


Online actor-critic methods

Continuous Control with Deep Reinforcement Learning (Lillicrap et al. ‘15)

Deep deterministic policy gradient (DDPG)


Just the Q function?


Discrete Q-learning

Human-Level Control Through Deep Reinforcement Learning (Mnih et al. ‘15)
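The update behind discrete Q-learning can be shown in tabular form. This sketch uses a lookup table on a hypothetical four-state chain instead of the deep network used by Mnih et al.; the environment, the hyperparameters, and the names are assumptions for illustration.

```python
import random

N_STATES, GOAL = 4, 3          # hypothetical chain: states 0..3, goal at 3
ACTIONS = (0, 1)               # 0 = left, 1 = right

def env_step(state, action):
    """Deterministic chain dynamics; reward 1 on reaching the goal."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                       # cap the episode length
            action = rng.choice(ACTIONS)           # random exploration (off-policy)
            nxt, reward, done = env_step(state, action)
            # Bootstrapped target: r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = nxt
    return q

q = q_learning()
# Greedy policy: pick the action with the largest Q-value in each state.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

The `max` over actions in the target is cheap for a handful of discrete actions, which is the step that becomes difficult with continuous actions.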


Continuous Q-learning


Normalized Advantage Functions

*Random, DDPG, NAF policies: final rewards and episodes to converge
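In NAF the Q-function is a state value plus an advantage that is quadratic in the action, so the maximizing action is available in closed form. A minimal sketch of that parameterization; in the actual method the mean, the lower-triangular factor, and the value are network outputs, whereas here they are plain numbers passed in by the caller, and the name `naf_q_value` is hypothetical.

```python
def naf_q_value(u, mu, lower, value):
    """Q(x, u) = V(x) - 0.5 * (u - mu)^T P (u - mu), with P = L L^T.

    `mu`, `lower` (L) and `value` (V) stand in for network outputs at state x.
    """
    n = len(u)
    diff = [u[i] - mu[i] for i in range(n)]
    # (u - mu)^T L L^T (u - mu) equals the squared norm of L^T (u - mu).
    proj = [sum(lower[i][j] * diff[i] for i in range(n)) for j in range(n)]
    return value - 0.5 * sum(p * p for p in proj)
```

Because the quadratic term is never positive, the action `u = mu` maximizes Q and the maximum equals `value`, so no search over actions is needed.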


Policy Learning with Multiple Robots: Deep RL with NAF

Gu*, Holly*, Lillicrap, L., ‘16

Shane Gu Ethan Holly Tim Lillicrap


Deep RL with Policy Gradients

• Unbiased but high-variance gradient

• Stable

• Requires many samples

• Example: TRPO [Schulman et al. ‘15]


Deep RL with Off-Policy Q-Function Critic

• Low-variance but biased gradient

• Much more efficient (because off-policy)

• Much less stable (because biased)

• Example: DDPG [Lillicrap et al. ‘16]


Improving Efficiency & Stability with Q-Prop

Shane Gu

Policy gradient: Q-function critic:

Q-Prop:

• Unbiased gradient, stable

• Efficient (uses off-policy samples)

• Critic comes from off-policy data

• Gradient comes from on-policy data

• Automatic variance-based adjustment


Comparisons

• Works with smaller batches than TRPO

• More efficient than TRPO

• More stable than DDPG with respect to hyperparameters

• Likely responsible for the better performance on harder task


Sample complexity

• Deep reinforcement learning is very data-hungry

– DQN: about 100 hours to learn Breakout

– GAE: about 50 hours to learn to walk

– DDPG/NAF: 4-5 hours to learn basic manipulation, walking

• Model-based methods are more efficient

– Time-varying linear models: 3 minutes for real-world manipulation

– GPS with vision: 30-40 minutes for real-world visuomotor policies


Reinforcement learning tradeoffs

• Reinforcement learning (for the purpose of this slide) = model-free RL

• Fewer assumptions

– Don’t need to model dynamics

– Don’t (in general) need state definition, only observations

– Fully general stochastic environments

• Much slower (model-based acceleration?)

• Hard to stabilize

– Few convergence results with neural networks


Contents

Imitation learning

Reinforcement learning

Imitation without a human

Research frontiers


ingredients for success in learning:

supervised learning: computation, algorithms, data

learning sensorimotor skills: computation, algorithms~data?

L., Pastor, Krizhevsky, Quillen ‘16


Grasping with Learned Hand-Eye Coordination

• 800,000 grasp attempts for training (3,000 robot-hours)

• monocular camera (no depth)

• 2-5 Hz update

• no prior knowledge

monocular RGB camera

7 DoF arm

2-finger gripper

object bin

L., Pastor, Krizhevsky, Quillen ‘16


Using Grasp Success Prediction

training testing

L., Pastor, Krizhevsky, Quillen ‘16
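One way a grasp success predictor can drive the arm at test time is to score candidate motions and execute the most promising one, repeating this at every control step. The sketch below is a simplification under stated assumptions: it uses plain random sampling over a 2-D motion, and `toy_predictor` is a hypothetical stand-in for the learned network, not the predictor from the paper.

```python
import random

def choose_motion(predict_success, image, num_candidates=64, seed=0):
    """Pick the candidate motion with the highest predicted grasp success."""
    rng = random.Random(seed)
    candidates = [(rng.uniform(-1, 1), rng.uniform(-1, 1))
                  for _ in range(num_candidates)]
    return max(candidates, key=lambda v: predict_success(image, v))

def toy_predictor(image, motion):
    """Hypothetical predictor: success is likeliest when the commanded
    motion points at the object position stored in `image`."""
    ox, oy = image["object_xy"]
    return 1.0 / (1.0 + (motion[0] - ox) ** 2 + (motion[1] - oy) ** 2)
```

Calling `choose_motion` again after each new camera image is what makes the scheme closed-loop: the choice is revised as the scene changes.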


Open-Loop vs. Closed-Loop Grasping

Pinto & Gupta, 2015

open-loop grasping: failure rate 33.7%

closed-loop grasping: failure rate 17.5%

depth + segmentation: failure rate 35%

L., Pastor, Krizhevsky, Quillen ‘16


Grasping Experiments

L., Pastor, Krizhevsky, Quillen ‘16


Learning what Success Means

can we learn the cost with visual features?


Learning what Success Means

with C. Finn, P. Abbeel


Learning what Success Means


Challenges & Frontiers

• Algorithms

– Sample complexity

– Safety

– Scalability

• Supervision

– Automatically evaluate success

– Learn cost functions


Acknowledgements

Pieter Abbeel, Chelsea Finn, Peter Pastor, Alex Krizhevsky, Trevor Darrell, Deirdre Quillen, John Schulman, Philipp Moritz, Michael Jordan, Greg Kahn, Tianhao Zhang, Shane Gu