Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine

University of California, Berkeley

Page 2

How Long Does Learning Take?

~800,000 grasp attempts [Levine et al. 2017]

~21 million games [Silver et al. 2017]

~50 million frames [Mnih et al. 2015]

Page 3

Can we speed this up?

Page 4

Model-Based Reinforcement Learning

Optimize Policy

Execute Policy

Train Dynamics Model
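
The three boxes on this slide form a loop: run the current policy, fit a dynamics model to the collected transitions, then plan against that model. The following is a minimal sketch of that loop, not the authors' code. The 1-D environment `true_step`, the linear model, and the random-shooting planner are all assumptions chosen to keep the example small; they stand in for the neural-network models and planner used in the actual work.

```python
import random

def true_step(state, action):
    # Hypothetical environment, hidden from the agent: a damped 1-D point.
    return 0.9 * state + 0.5 * action

def fit_linear_model(data):
    # Train Dynamics Model: least-squares fit of next ~ a*state + b*action,
    # solved through the 2x2 normal equations.
    sxx = sum(s * s for s, u, _ in data)
    sxu = sum(s * u for s, u, _ in data)
    suu = sum(u * u for s, u, _ in data)
    sxy = sum(s * n for s, u, n in data)
    suy = sum(u * n for s, u, n in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

def plan_action(model, state, goal, rng, horizon=5, candidates=200):
    # Optimize Policy: random shooting (an assumed, simple planner).
    # Sample action sequences, roll them out in the model, keep the best.
    a, b = model
    best_cost, best_first = float("inf"), 0.0
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for u in actions:
            s = a * s + b * u
            cost += (s - goal) ** 2
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first

def run(trials=3, steps=20, goal=1.0, seed=0):
    rng = random.Random(seed)
    data, model, state = [], None, 0.0
    for _ in range(trials):
        state = 0.0
        for _ in range(steps):
            # Execute Policy: random actions until a model exists,
            # then replan at every step and apply the first action.
            if model is None:
                action = rng.uniform(-1.0, 1.0)
            else:
                action = plan_action(model, state, goal, rng)
            nxt = true_step(state, action)
            data.append((state, action, nxt))
            state = nxt
        model = fit_linear_model(data)
    return model, state

model, final_state = run()
```

Because the toy environment is linear and noise-free, one trial of random data is enough for the fit to recover the true coefficients; real tasks need a far more expressive model, which is what the following slides address.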

Page 5

Comparative Performance on HalfCheetah

Page 7

Deterministic Neural Nets as Models

Page 12

Probabilistic Neural Nets as Models
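
A probabilistic model predicts a distribution over the next state rather than a single point. One common way to train such a model, assumed here, is to have the network output a mean and a log-variance per state dimension and to minimize the Gaussian negative log-likelihood. The function name `gaussian_nll` and the list-based interface are illustrative only.

```python
import math

def gaussian_nll(mean, log_var, target):
    """Negative log-likelihood of `target` under a diagonal Gaussian
    N(mean, exp(log_var)), summed over state dimensions."""
    total = 0.0
    for m, lv, y in zip(mean, log_var, target):
        total += 0.5 * (math.log(2.0 * math.pi) + lv + (y - m) ** 2 / math.exp(lv))
    return total

# The same prediction error of 3.0 scored under two predicted variances.
confident = gaussian_nll([0.0], [0.0], [3.0])           # variance 1
uncertain = gaussian_nll([0.0], [math.log(9.0)], [3.0])  # variance 9
```

Unlike a squared-error loss, this objective lets the model lower its loss on hard-to-predict transitions by reporting a larger variance, at the cost of the `log_var` penalty term.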

Page 13

Probabilistic Ensembles as Models
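
An ensemble of probabilistic models can separate two kinds of uncertainty: noise that each member predicts (aleatoric) and disagreement between members (epistemic). The sketch below illustrates the idea with bootstrap resampling. Each member is a 1-D linear-Gaussian regressor, an assumed stand-in for a probabilistic neural network, and every name in the code is hypothetical.

```python
import random

def fit_member(data):
    # Least-squares line y ~ w*x + c, plus the residual variance as a
    # constant noise estimate (stand-in for a learned variance output).
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    w = sum((x - mx) * (y - my) for x, y in data) / sxx
    c = my - w * mx
    var = sum((y - (w * x + c)) ** 2 for x, y in data) / n
    return w, c, var

def fit_ensemble(data, num_members, rng):
    # Each member sees its own bootstrap resample (drawn with replacement).
    return [fit_member([rng.choice(data) for _ in data])
            for _ in range(num_members)]

def predict(members, x):
    means = [w * x + c for w, c, _ in members]
    avg = sum(means) / len(means)
    epistemic = sum((m - avg) ** 2 for m in means) / len(means)  # disagreement
    aleatoric = sum(v for _, _, v in members) / len(members)     # predicted noise
    return avg, epistemic, aleatoric

rng = random.Random(0)
# Hypothetical data: y = 2x + Gaussian noise, observed only for x in [0, 1].
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1))
        for x in (rng.random() for _ in range(50))]
members = fit_ensemble(data, num_members=5, rng=rng)
_, epistemic_near, aleatoric_near = predict(members, 0.5)
_, epistemic_far, _ = predict(members, 10.0)
```

Member disagreement is small inside the training range and grows far outside it, which is the signal a planner can use to avoid exploiting regions where the model has seen no data.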

Page 15

Trajectory Sampling for State Propagation
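
Trajectory sampling propagates a set of particles through the learned model, with each particle drawing its next state from one ensemble member's predictive distribution. The sketch below is an illustration under assumptions: `members` is a list of hypothetical callables returning a Gaussian mean and standard deviation for a scalar state, and the `resample_each_step` flag switches between keeping each particle on one member for the whole rollout and reassigning members at every step.

```python
import random

def propagate(members, start_state, actions, num_particles, rng,
              resample_each_step=False):
    """Roll particles forward; members[i](state, action) -> (mean, std)."""
    states = [start_state] * num_particles
    assigned = [rng.randrange(len(members)) for _ in range(num_particles)]
    for action in actions:
        if resample_each_step:
            # Reassign every particle to a random member at each time step.
            assigned = [rng.randrange(len(members)) for _ in range(num_particles)]
        # Each particle samples its next state from its assigned member.
        states = [rng.gauss(*members[idx](s, action))
                  for s, idx in zip(states, assigned)]
    return states

# Two hypothetical members that disagree about the dynamics; zero predicted
# noise makes the effect of member assignment easy to see.
members = [
    lambda s, a: (0.9 * s + 0.5 * a, 0.0),
    lambda s, a: (0.8 * s + 0.5 * a, 0.0),
]
rng = random.Random(0)
fixed = propagate(members, 1.0, [0.1] * 5, num_particles=50, rng=rng)
mixed = propagate(members, 1.0, [0.1] * 5, num_particles=50, rng=rng,
                  resample_each_step=True)
```

With fixed assignment the particles split into one cluster per member, so the spread of the final states reflects model disagreement; a planner can then average a cost over the particles to score an action sequence.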

Page 25

Experimental Results

Page 26

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine

Data efficient

Competitive asymptotic performance

Easy to implement

Code: https://github.com/kchua/handful-of-trials

Website: https://sites.google.com/view/drl-in-a-handful-of-trials

Poster #165