google brain skill representation and supervision in multi ...€¦ · robot skill embeddings -...

39
Skill Representation and Supervision in Multi-Task RL Karol Hausman Google Brain In collaboration with

Upload: others

Post on 16-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Skill Representation and Supervisionin Multi-Task RL

Karol Hausman

Google Brain

In collaboration with

Page 2: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Why multi-task reinforcement learning?

Page 3: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Supervised learning: generalization

training set

training labels

televisioncat

car

training

test settest

Page 4: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Single-task deep RL: generalization

training labels

reward

training set

test set

training

test

Page 5: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Multi-task deep RL: generalization

training set

training labels

reward task 1

reward task 2

reward task 3

training

test settest

Page 6: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Single-task deep RL: resets

[Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning, Chebotar*, Hausman*, Zhang*, et al., 2017]

Page 7: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Multi-task deep RL: resets

[Supervision via Competition: Robot Adversaries for Learning Tasks, Pinto, et al., 2017]

Page 8: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Single-task deep RL: rewards

Page 9: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Multi-task deep RL: rewards

[Hindsight Experience Replay, Andrychowicz, et al., 2017]

Page 10: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Why not multi-task reinforcement learning?

Page 11: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Challenges

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills● optimization of multiple tasks (conflicting gradients, gradient magnitudes)● data imbalance issues (harder easier tasks, good exploration in all of them)● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

.

Multi-task deep RL

Page 12: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Challenges

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills● optimization of multiple tasks (conflicting gradients, gradient magnitudes)● data imbalance issues (harder easier tasks, good exploration in all of them)● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

Multi-task deep RL

Page 13: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Skill Representation and Reusability

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills

Supervision and Efficiency

● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

Multi-task deep RL

Page 14: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Skill Representation and Reusability

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills

Supervision and Efficiency

● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

Multi-task deep RL

Page 15: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Skill representation

[Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras et al. 2018]

task ID

[Visual Reinforcement Learning with Imagined Goals,

Pong et al. 2018]

Page 16: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Latent space in images and policies

Images

Policies

Page 17: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

main idea: learn multiple re-usable skills and their skill embedding

embedding can represent different solutions for every task:

17

embedding

task atask btask ctask dtask etask f

Robot skill embeddings

Page 18: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

main idea: learn multiple re-usable skills and their skill embedding

embedding can represent different solutions for every task:

18

embedding

task atask btask ctask dtask etask f

Robot skill embeddings

Page 19: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

training:

observationsrobot/policy

environment

embedding

stateshistory

Robot skill embeddings

embeddingtask ID

[Learning an embedding space for reusable robotic skills, Hausman et al.]

Page 20: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

test:

observationsrobot/policy

environment

task ID embedding embedding

stateshistory

Robot skill embeddings

[Learning an embedding space for reusable robotic skills, Hausman et al.]

Page 23: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Robot skill embeddings - sim2real transfer

observationsrobot/policy

environment

embedding

stateshistory

embeddingtask ID

Learn in simLearn in real

[Scaling simulation-to-real transfer by learning composable robot skills Julian, et al., 2018]

[Zero-Shot Skill Composition and Simulation-to-Real Transfer by Learning Task Representations, He, at al., 2018]

Page 24: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Robot skill embeddings - sim2real transfer

Page 25: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Skill Representation and Reusability

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills

Supervision and Efficiency

● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

Multi-task deep RL

Page 26: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Skill Representation and Reusability

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills

Supervision and Efficiency

● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

Multi-task deep RL

Page 27: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Supervision

[Better Language Models and Their Implications, OpenAI Blog, 2019]

[Collective robot reinforcement learning with distributed asynchronous guided policy search,

Yahya et al. 2017]

[Diversity is all you need, Learning Diverse Skills without a Reward Function,

Eysenbach, 2018]

Page 28: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Efficiency

[SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning,

Zhang et al. 2019]

[Learning Dexterous In-Hand Manipulation, OpenAI et al. 2018]

~100 years of experience ~1 hour of experience

Page 29: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Global vs Behavior-Specific Dynamics Models

Page 30: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

main idea: use empowerment to simultaneously optimize for skills and their specific dynamics

mutual information objective:

Dynamics-Aware Unsupervised Discovery of Skills (DADS)

Predictability

Diversity

[Dynamics-Aware Unsupervised Discovery of Skills, Sharma, et al. 2018]

Page 31: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Dynamics-Aware Unsupervised Discovery of Skills (DADS)

Page 32: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Dynamics-Aware Unsupervised Discovery of Skills (DADS)

Page 33: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Dynamics-Aware Unsupervised Discovery of Skills (DADS)

Page 34: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Skill Representation and Reusability

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills

Supervision and Efficiency

● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

Multi-task deep RL

Page 35: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Challenges

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills● optimization of multiple tasks (conflicting gradients, gradient magnitudes)● data imbalance issues (harder easier tasks, good exploration in all of them)● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time

Multi-task deep RL

Page 37: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Challenges

● task specifications, what constitutes a task, how to represent a skill?● reuse of already-learned skills● optimization of multiple tasks (conflicting gradients, gradient magnitudes)● data imbalance issues (harder easier tasks, good exploration in all of them)● multiple skills - multiple pains: rewards, setups, etc.● efficient sequencing of skills at test time● and many more...

Multi-task deep RL

Page 38: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task
Page 39: Google Brain Skill Representation and Supervision in Multi ...€¦ · Robot skill embeddings - sim2real transfer observations robot/policy environment embedding states history task

Dynamics-Aware Unsupervised Discovery of Skills, NeurIPS 2019 Submission

A. Sharma, S. Gu, S. Levine, V. Kumar, K. Hausman

Learning an Embedding Space for Transferable Robot Skills, ICLR 2018

K. Hausman, T. Springenberg, Z. Wang, N. Heess, M. Riedmiller