Relational Transfer in Reinforcement Learning

Lisa Torrey University of Wisconsin – Madison CS 540 Transfer Learning


Post on 27-May-2015


Page 1: Relational Transfer in Reinforcement Learning

Lisa Torrey

University of Wisconsin – Madison

CS 540

Transfer Learning

Page 2: Relational Transfer in Reinforcement Learning

Transfer Learning in Humans

Education: hierarchical curriculum. Learning tasks share common stimulus-response elements.

Abstract problem-solving: learning tasks share general underlying principles.

Multilingualism: knowing one language affects learning in another. Transfer can be both positive and negative.

Page 3: Relational Transfer in Reinforcement Learning

Transfer Learning in AI

Given: Task S

Learn: Task T

Page 4: Relational Transfer in Reinforcement Learning

Goals of Transfer Learning

[Figure: performance plotted against training, with and without transfer. Transfer can give a higher start, a higher slope, and a higher asymptote.]
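The three goals on this slide can be read off a pair of learning curves. The following is a minimal sketch, assuming each curve is a list of performance values, one per training batch. The function name `transfer_gains`, the slope estimate (average improvement over the first half of training), and the asymptote estimate (mean of the last few points) are illustrative choices that do not come from the slides.

```python
def transfer_gains(no_transfer, transfer, tail=3):
    """Compare two learning curves (performance per training batch).

    Returns the difference (transfer minus no-transfer) in:
      start     -- performance at the first batch
      slope     -- average improvement per batch over the first half
      asymptote -- mean performance over the last `tail` batches
    """
    if len(no_transfer) != len(transfer) or len(transfer) < 2:
        raise ValueError("curves must have equal length of at least 2")

    def start(curve):
        return curve[0]

    def slope(curve):
        half = max(1, len(curve) // 2)
        return (curve[half] - curve[0]) / half

    def asymptote(curve):
        last = curve[-tail:]
        return sum(last) / len(last)

    return {
        "start": start(transfer) - start(no_transfer),
        "slope": slope(transfer) - slope(no_transfer),
        "asymptote": asymptote(transfer) - asymptote(no_transfer),
    }


# Made-up curves, purely for illustration.
baseline = [0.0, 0.1, 0.2, 0.3, 0.4, 0.4]
with_transfer = [0.2, 0.4, 0.6, 0.7, 0.7, 0.7]
gains = transfer_gains(baseline, with_transfer)
```

A positive value for a key means transfer helped on that measure; a negative value would indicate negative transfer on it.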

Page 5: Relational Transfer in Reinforcement Learning

Inductive Learning

[Figure: a search within the allowed hypotheses, which are a subset of all hypotheses.]

Page 6: Relational Transfer in Reinforcement Learning

Transfer in Inductive Learning

[Figure: all hypotheses, allowed hypotheses, and the search.]

Thrun and Mitchell 1995: Transfer slopes for gradient descent

Page 7: Relational Transfer in Reinforcement Learning

Transfer in Inductive Learning

Bayesian methods

Bayesian Learning: Prior distribution + Data = Posterior distribution

Bayesian Transfer

Raina et al. 2006: Transfer a Gaussian prior
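The "prior + data = posterior" picture has a closed form when both the prior and the noise are Gaussian. The sketch below is the textbook conjugate update for the mean of a Gaussian with known noise variance, not the method of Raina et al.; the function name and all numbers are assumptions chosen for illustration. It contrasts a broad, uninformative prior with a narrower prior that is imagined to come from a source task.

```python
def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior over an unknown mean, given a Gaussian prior and
    observations with known Gaussian noise variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# A few target-task observations (made up).
target_data = [2.1, 1.9, 2.3]

# No transfer: a broad prior centred at zero.
no_transfer = gaussian_posterior(0.0, 100.0, target_data, noise_var=1.0)

# Transfer: a prior assumed to have been estimated on a source task.
transfer = gaussian_posterior(2.0, 0.5, target_data, noise_var=1.0)
```

With the transferred prior the posterior variance is smaller after the same three observations, which is one way a good prior can speed up learning. A badly mismatched prior would pull the posterior mean away from the data instead.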

Page 8: Relational Transfer in Reinforcement Learning

Transfer in Inductive Learning

Hierarchical methods

[Figure: concept hierarchy with nodes Line, Curve, Surface, Circle, and Pipe.]

Stracuzzi 2006: Learn Boolean concepts that can depend on each other

Page 9: Relational Transfer in Reinforcement Learning

Transfer in Inductive Learning

Dealing with Missing Data or Labels

Shi et al. 2008: Transfer via active learning

Task S

Task T

Page 10: Relational Transfer in Reinforcement Learning

Reinforcement Learning

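Environment: initial state s1

Agent: Q(s1, a) = 0; π(s1) = a1; takes action a1

Environment: δ(s1, a1) = s2; r(s1, a1) = r2; returns s2 and r2

Agent: Q(s1, a1) ← Q(s1, a1) + Δ; π(s2) = a2; takes action a2

Environment: δ(s2, a2) = s3; r(s2, a2) = r3; returns s3 and r3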
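The agent-environment loop on this slide can be sketched as tabular Q-learning. The slide does not define Δ; the code assumes the standard Q-learning step, Δ = α (r + γ max Q(s', ·) − Q(s, a)), and an epsilon-greedy version of π. The four-state chain environment, the function names, and all parameter values are hypothetical.

```python
import random


def q_learning(delta, reward, states, actions, start, terminal,
               episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q(s, a) = 0 for every state-action pair at the start.
    Q = {(s, a): 0.0 for s in states for a in actions}

    def policy(s):
        # Epsilon-greedy pi(s): mostly greedy, sometimes random.
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])

    for _ in range(episodes):
        s, steps = start, 0
        while s != terminal and steps < 100:
            a = policy(s)
            s_next = delta(s, a)      # environment transition
            r = reward(s, a)          # environment reward
            if s_next == terminal:
                target = r
            else:
                target = r + gamma * max(Q[(s_next, b)] for b in actions)
            # Q(s, a) <- Q(s, a) + Delta
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, steps = s_next, steps + 1
    return Q


# Hypothetical chain: states 0..3, reward 1 for reaching state 3.
states = [0, 1, 2, 3]
actions = ["left", "right"]


def delta(s, a):
    return min(s + 1, 3) if a == "right" else max(s - 1, 0)


def reward(s, a):
    return 1.0 if delta(s, a) == 3 else 0.0


Q = q_learning(delta, reward, states, actions, start=0, terminal=3)
greedy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states[:-1]}
```

After training, the greedy policy derived from `Q` should head toward the rewarding end of the chain.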

Page 11: Relational Transfer in Reinforcement Learning

Transfer in Reinforcement Learning

Starting-point methods

Hierarchical methods

Alteration methods

Imitation methods

New RL algorithms

Page 12: Relational Transfer in Reinforcement Learning

Transfer in Reinforcement Learning

Starting-point methods

Taylor et al. 2005: Value-function transfer

Initial Q-table, no transfer:

0 0 0 0
0 0 0 0
0 0 0 0

Initial Q-table, transfer (values from the source task):

2 5 4 8
9 1 7 2
5 9 1 4

Both are followed by target-task training.
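A starting-point method changes only how the target-task Q-table is initialized. The sketch below is a simplified illustration rather than the algorithm of Taylor et al.: it assumes a tabular Q-function, reads the slide's numbers as a table of 3 states by 4 actions, and uses hypothetical `state_map` and `action_map` dictionaries (identity when omitted) to relate target entries to source entries.

```python
def init_q_table(target_states, target_actions, source_q=None,
                 state_map=None, action_map=None):
    """Build the initial Q-table for the target task.

    Without a source table every entry is 0. With one, each target
    (state, action) pair copies the value of the source pair it maps
    to, falling back to 0 when no source value exists.
    """
    state_map = state_map or {}
    action_map = action_map or {}
    q = {}
    for s in target_states:
        for a in target_actions:
            value = 0.0
            if source_q is not None:
                src = (state_map.get(s, s), action_map.get(a, a))
                value = source_q.get(src, 0.0)
            q[(s, a)] = value
    return q


# Source-task Q-values from the slide (rows assumed to be states,
# columns assumed to be actions).
rows = [[2, 5, 4, 8],
        [9, 1, 7, 2],
        [5, 9, 1, 4]]
source_q = {(s, a): float(v)
            for s, row in enumerate(rows)
            for a, v in enumerate(row)}

states, actions = range(3), range(4)
no_transfer = init_q_table(states, actions)
transferred = init_q_table(states, actions, source_q)
```

Either table can then be handed to an ordinary RL algorithm for target-task training; the learning rule itself is unchanged.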

Page 13: Relational Transfer in Reinforcement Learning

Transfer in Reinforcement Learning

Hierarchical methods

[Figure: task hierarchy with nodes Soccer, Pass, Shoot, Run, and Kick.]

Mehta et al. 2008: Transfer a learned hierarchy

Page 14: Relational Transfer in Reinforcement Learning

Transfer in Reinforcement Learning

Alteration methods

Walsh et al. 2006: Transfer aggregate states

Task S

Original states, original actions, original rewards

New states, new actions, new rewards

Page 15: Relational Transfer in Reinforcement Learning

Transfer in Reinforcement Learning

New RL Algorithms

Torrey et al. 2006: Transfer advice about skills

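Environment: initial state s1

Agent: Q(s1, a) = 0; π(s1) = a1; takes action a1

Environment: δ(s1, a1) = s2; r(s1, a1) = r2; returns s2 and r2

Agent: Q(s1, a1) ← Q(s1, a1) + Δ; π(s2) = a2; takes action a2

Environment: δ(s2, a2) = s3; r(s2, a2) = r3; returns s3 and r3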

Page 16: Relational Transfer in Reinforcement Learning

Transfer in Reinforcement Learning

Imitation methods

[Figure: the policy used over the course of training, labelled source and target.]

Torrey et al. 2007: Demonstrate a strategy

Page 17: Relational Transfer in Reinforcement Learning

My Research

Starting-point methods

Imitation methods

Hierarchical methods

New RL algorithms

Skill Transfer

Macro Transfer

Page 18: Relational Transfer in Reinforcement Learning

RoboCup Domain

3-on-2 BreakAway

3-on-2 KeepAway

3-on-2 MoveDownfield

2-on-1 BreakAway

Page 19: Relational Transfer in Reinforcement Learning

Inductive Logic Programming

IF [ ] THEN pass(Teammate)

IF distance(Teammate) ≤ 10 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
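The rules on this slide run from an empty body to progressively more specific ones, which is the shape of a top-down rule search. The sketch below is a propositional simplification, not a real ILP system: conditions are plain threshold tests on numeric features, the score (positives covered minus negatives covered) is an assumed heuristic, and the labelled examples are made up.

```python
import operator


# A condition is (feature, comparison, threshold); a rule body is a
# list of conditions that must all hold.
def covers(body, example):
    return all(cmp(example[feature], threshold)
               for feature, cmp, threshold in body)


def score(body, examples):
    """Positives covered minus negatives covered."""
    pos = sum(1 for ex, label in examples if label and covers(body, ex))
    neg = sum(1 for ex, label in examples if not label and covers(body, ex))
    return pos - neg


def refine(candidates, examples):
    """Greedy top-down search starting from the empty rule body."""
    body = []
    best = score(body, examples)
    while True:
        options = [(score(body + [c], examples), -i, c)
                   for i, c in enumerate(candidates) if c not in body]
        if not options:
            return body
        # Highest score wins; ties go to the earliest candidate.
        top, _, cond = max(options, key=lambda o: (o[0], o[1]))
        if top <= best:
            return body
        body.append(cond)
        best = top


candidates = [
    ("distance", operator.le, 10),
    ("distance", operator.le, 5),
    ("angle", operator.ge, 15),
    ("angle", operator.ge, 30),
]

# (state features, was passing a good choice?) -- illustrative data.
examples = [
    ({"distance": 4, "angle": 35}, True),
    ({"distance": 3, "angle": 40}, True),
    ({"distance": 5, "angle": 31}, True),
    ({"distance": 4, "angle": 20}, False),
    ({"distance": 8, "angle": 45}, False),
    ({"distance": 12, "angle": 50}, False),
    ({"distance": 2, "angle": 10}, False),
]

learned = refine(candidates, examples)
```

On this toy data the search first adds the distance test and then the stricter angle test, ending at a body of the same form as the most specific rule on the slide.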

Page 20: Relational Transfer in Reinforcement Learning

Advice Taking

Batch Reinforcement Learning via Support Vector Regression (RL-SVR)

[Figure: the agent interacts with the environment for Batch 1, Q-functions are computed, the agent interacts with the environment for Batch 2, and so on.]

Find Q-functions that minimize: ModelSize + C × DataMisfit

Page 21: Relational Transfer in Reinforcement Learning

Advice Taking

Batch Reinforcement Learning with Advice (KBKR)

[Figure: the agent interacts with the environment for Batch 1, Q-functions are computed using advice, and the agent interacts with the environment for Batch 2.]

Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit
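To make the three terms of the objective concrete, here is a small sketch that only evaluates it for a linear Q-function. It is a simplification and not the KBKR formulation: ModelSize is taken to be the sum of absolute weights, DataMisfit the sum of absolute errors on a batch, and AdviceMisfit a hinge penalty checked at sampled states, where each piece of advice says "if the condition holds, Q should be at least `min_q`". All names and numbers are assumptions.

```python
def q_value(weights, bias, features):
    return sum(w * x for w, x in zip(weights, features)) + bias


def objective(weights, bias, batch, advice, C=1.0, mu=1.0):
    """ModelSize + C * DataMisfit + mu * AdviceMisfit for a linear Q."""
    model_size = sum(abs(w) for w in weights)
    data_misfit = sum(abs(q_value(weights, bias, x) - y) for x, y in batch)
    advice_misfit = 0.0
    for condition, min_q in advice:
        for x, _ in batch:
            if condition(x):
                # Penalize only when Q falls below the advised bound.
                advice_misfit += max(0.0, min_q - q_value(weights, bias, x))
    return model_size + C * data_misfit + mu * advice_misfit


# Features are (distance to teammate, passing angle); targets are
# made-up Q-value estimates for the pass action.
batch = [([4.0, 35.0], 0.8), ([12.0, 10.0], 0.1)]

# Hypothetical advice: when a pass looks safe, Q(pass) should be >= 0.5.
advice = [(lambda x: x[0] <= 5 and x[1] >= 30, 0.5)]

with_advice = objective([0.0, 0.0], 0.0, batch, advice)
without_advice = objective([0.0, 0.0], 0.0, batch, advice, mu=0.0)
```

Setting `mu` to zero recovers the advice-free objective, and larger values of `mu` make violating the advice more costly relative to fitting the data.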

Page 22: Relational Transfer in Reinforcement Learning

Skill Transfer Algorithm

Source

ILP

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)

Mapping

[Human advice]

Advice Taking

Target

Page 23: Relational Transfer in Reinforcement Learning

Selected Results

Skill transfer to 3-on-2 BreakAway from several tasks

Page 24: Relational Transfer in Reinforcement Learning

Macro-Operators

pass(Teammate)

move(Direction)

shoot(goalRight)

shoot(goalLeft)

IF [ ... ] THEN pass(Teammate)

IF [ ... ] THEN move(ahead)

IF [ ... ] THEN shoot(goalRight)

IF [ ... ] THEN shoot(goalLeft)

IF [ ... ] THEN pass(Teammate)

IF [ ... ] THEN move(left)

IF [ ... ] THEN shoot(goalRight)

IF [ ... ] THEN shoot(goalRight)
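A macro-operator can be pictured as a chain of nodes, each tied to an action and guarded by rules. The sketch below is one possible reading, not the author's implementation: every node has an `enter` test and a `loop` test, and the macro is abandoned when neither applies. The slides elide the rule bodies ("[ ... ]"), so the conditions and state features used here are entirely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Node:
    action: str
    enter: Callable[[State], bool]  # may the macro move into this node?
    loop: Callable[[State], bool]   # may the macro stay in this node?


def run_macro(nodes: List[Node], states: List[State]) -> List[str]:
    """Return the action the macro picks for each observed state,
    stopping as soon as no rule applies."""
    actions = []
    current = -1  # not yet inside the macro
    for state in states:
        nxt = current + 1
        if nxt < len(nodes) and nodes[nxt].enter(state):
            current = nxt                      # advance to the next node
        elif current >= 0 and nodes[current].loop(state):
            pass                               # stay in the current node
        else:
            break                              # abandon the macro
        actions.append(nodes[current].action)
    return actions


# Hypothetical rule bodies standing in for the elided ones.
macro = [
    Node("pass(Teammate)",
         enter=lambda s: s["teammate_open"] > 0,
         loop=lambda s: False),
    Node("move(ahead)",
         enter=lambda s: True,
         loop=lambda s: s["goal_distance"] > 10),
    Node("shoot(goalRight)",
         enter=lambda s: s["goal_distance"] <= 10,
         loop=lambda s: False),
]

observed = [
    {"teammate_open": 1, "goal_distance": 25},
    {"teammate_open": 0, "goal_distance": 20},
    {"teammate_open": 0, "goal_distance": 15},
    {"teammate_open": 0, "goal_distance": 8},
    {"teammate_open": 0, "goal_distance": 8},
]

chosen = run_macro(macro, observed)
```

The run ends after the shot because no node follows it and its `loop` test is false.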

Page 25: Relational Transfer in Reinforcement Learning

Demonstration

[Figure: the policy used over the course of training, labelled source and target.]

An imitation method
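Demonstration can be sketched as a schedule over which policy picks the action: the transferred source strategy for an initial period, then the policy being learned in the target task. The function name and the length of the demonstration period (`demo_episodes`) are assumptions, not values from the slides.

```python
def choose_action(episode, state, source_policy, target_policy,
                  demo_episodes=100):
    """Use the transferred strategy early on, then the learned policy.

    Returns the chosen action and the name of the policy that chose it.
    """
    if episode < demo_episodes:
        return source_policy(state), "source"
    return target_policy(state), "target"


# Stand-in policies for illustration only.
def source_policy(state):
    return "pass(Teammate)"


def target_policy(state):
    return "shoot(goalRight)"


early = choose_action(0, {}, source_policy, target_policy)
late = choose_action(150, {}, source_policy, target_policy)
```

In a fuller system the target learner would also train on the experience gathered during the demonstration period, so that it has sensible value estimates when it takes over.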

Page 26: Relational Transfer in Reinforcement Learning

Macro Transfer Algorithm

Source

ILP

Demonstration

Target

Page 27: Relational Transfer in Reinforcement Learning

Macro Transfer Algorithm

Learning structures

Positive: BreakAway games that score

Negative: BreakAway games that didn’t score

ILP

IF actionTaken(Game, StateA, pass(Teammate), StateB) AND actionTaken(Game, StateB, move(Direction), StateC) AND actionTaken(Game, StateC, shoot(goalRight), StateD) AND actionTaken(Game, StateD, shoot(goalLeft), StateE)

THEN isaGoodGame(Game)
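A candidate structure can be evaluated by checking which games contain its action sequence. The sketch below makes two assumptions that are not stated on the slide: consecutive repeats of the same action are collapsed before matching, and the structure must then appear as a contiguous run. The games and the precision measure are illustrative.

```python
from itertools import groupby


def collapse(actions):
    """Merge consecutive repeats: [a, a, b] becomes [a, b]."""
    return [action for action, _ in groupby(actions)]


def matches(structure, game_actions):
    """True if the structure appears as a contiguous run of actions."""
    seq = collapse(game_actions)
    n = len(structure)
    return any(seq[i:i + n] == structure for i in range(len(seq) - n + 1))


def precision(structure, good_games, bad_games):
    """Fraction of matching games that are good games."""
    tp = sum(matches(structure, g) for g in good_games)
    fp = sum(matches(structure, g) for g in bad_games)
    return tp / (tp + fp) if tp + fp else 0.0


structure = ["pass", "move", "shoot"]

# Made-up action traces from scoring and non-scoring games.
good_games = [
    ["move", "pass", "move", "move", "shoot"],
    ["pass", "move", "shoot"],
]
bad_games = [
    ["move", "move", "shoot"],
    ["pass", "move", "shoot"],
]

p = precision(structure, good_games, bad_games)
```

A structure search would compare such scores across candidate action sequences and keep the one that best separates scoring games from the rest.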

Page 28: Relational Transfer in Reinforcement Learning

Macro Transfer Algorithm

Learning rules for arcs

Positive: states in good games that took the arc

Negative: states in good games that could have taken the arc but didn’t

ILP

shoot(goalRight)

IF [ … ] THEN enter(State)

IF [ … ] THEN loop(State, Teammate)

pass(Teammate)

Page 29: Relational Transfer in Reinforcement Learning

Selected Results

Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway

Page 30: Relational Transfer in Reinforcement Learning

Summary

Machine learning is often designed for standalone tasks

Transfer is a natural learning ability that we would like to incorporate into machine learners

There have been some successes, but challenges remain, such as avoiding negative transfer and automating mapping