Advice Taking and Transfer Learning: Naturally-Inspired Extensions to Reinforcement Learning

Lisa Torrey, Trevor Walker, Richard Maclin*, Jude Shavlik
University of Wisconsin - Madison
University of Minnesota - Duluth*


Page 1:

Advice Taking and Transfer Learning:
Naturally-Inspired Extensions to Reinforcement Learning

Lisa Torrey, Trevor Walker, Richard Maclin*, Jude Shavlik

University of Wisconsin - Madison
University of Minnesota - Duluth*

Page 2:

Reinforcement Learning

[Diagram: the agent chooses an action in the environment; the environment returns a state and a reward. The reward may be delayed.]

Page 3:

Q-Learning

- Update the Q-function incrementally
- Follow the current Q-function to choose actions
- Converges to an accurate Q-function

Q-function: (state, action) → value

policy(state) = argmax_action Q(state, action)
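
As a concrete illustration of this loop, here is a minimal tabular Q-learning sketch in Python; the epsilon-greedy exploration and the learning constants are illustrative choices, not details from the talk.

    import random
    from collections import defaultdict

    # Minimal tabular Q-learning sketch: Q maps (state, action) -> value,
    # and the policy is an argmax over actions (illustrative constants).
    Q = defaultdict(float)
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    def policy(state, actions):
        # Epsilon-greedy: mostly follow the current Q-function
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state, actions):
        # Incremental update toward the one-step lookahead target
        target = reward + GAMMA * max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])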

Page 4:

Limitations

- Agents begin without any information
- Random exploration is required in the early stages of learning
- Long training times can result

Page 5:

Naturally-Inspired Extensions

Advice Taking
[Diagram: a human teacher passes knowledge to the RL agent.]

Transfer Learning
[Diagram: a source-task agent passes knowledge to the target-task agent.]

Page 6:

Potential Benefits

[Figure: performance vs. training, with and without knowledge. Knowledge can yield a higher start, a higher slope, and a higher asymptote.]

Page 7:

Outline

- RL in a complex domain
- Extension #1: Advice Taking
- Extension #2: Transfer Learning
  - Skill Transfer
  - Macro Transfer
  - MLN Transfer

Page 8:

The RoboCup Domain

- KeepAway: +1 per time step
- MoveDownfield: +1 per meter
- BreakAway: +1 upon goal
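
A toy sketch of these three reward schemes; the event signals meters_gained and scored_goal are hypothetical stand-ins for what the RoboCup simulator reports.

    # Toy sketch of the three reward schemes above (hypothetical
    # event flags standing in for the simulator's signals).
    def reward(task, meters_gained=0.0, scored_goal=False):
        if task == "KeepAway":
            return 1.0                    # +1 for each time step the ball is kept
        if task == "MoveDownfield":
            return meters_gained          # +1 per meter moved downfield
        if task == "BreakAway":
            return 1.0 if scored_goal else 0.0   # +1 only upon scoring a goal
        raise ValueError(f"unknown task: {task}")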

Page 9:

The RoboCup Domain

State features:
- distBetween(a0, Player)
- distBetween(a0, GoalPart)
- distBetween(Attacker, goalCenter)
- distBetween(Attacker, ClosestDefender)
- distBetween(Attacker, goalie)
- angleDefinedBy(topRight, goalCenter, a0)
- angleDefinedBy(GoalPart, a0, goalie)
- angleDefinedBy(Attacker, a0, ClosestDefender)
- angleDefinedBy(Attacker, a0, goalie)
- timeLeft

Actions:
- move(ahead), move(away), move(left), move(right)
- pass(Teammate)
- shoot(GoalPart)
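
For illustration, one grounded BreakAway state and action set might look like this in code. The numeric values are invented, and the capitalized terms above (Teammate, GoalPart, Attacker) are logical variables that ground to concrete objects such as a1 or goalLeft.

    # Illustrative grounding of the relational features for one state
    # (made-up values).
    state = {
        "distBetween(a0, a1)": 15.0,
        "distBetween(a0, a2)": 5.0,
        "distBetween(a0, goalie)": 20.0,
        "angleDefinedBy(topRight, goalCenter, a0)": 30.0,
        "timeLeft": 80,
    }

    actions = ["move(ahead)", "move(away)", "move(left)", "move(right)",
               "pass(a1)", "pass(a2)", "shoot(goalLeft)", "shoot(goalRight)"]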

Page 10:

Q-Learning

Q-function: (state, action) → value

policy(state) = argmax_action Q(state, action)

State   Action   Q
1       1         0.5
1       2        -0.5
1       3         0
2…      1…        0.3…

An explicit table cannot cover the continuous RoboCup state features, so the Q-function is learned by function approximation.

Page 11:

Approximating the Q-function

Linear support-vector regression:

Q-value = (weight vector)ᵀ ● (feature vector)

feature vector: [distBetween(a0, a1), distBetween(a0, a2), distBetween(a0, goalie), …]
weight vector:  [0.2, -0.1, 0.9, …]

Set the weights to minimize:

ModelSize + C × DataMisfit
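
A sketch of the linear Q-function and this objective, using the one-norm for ModelSize and a squared error as a simple stand-in for the support-vector regression loss.

    import numpy as np

    # Sketch of a linear Q-function: Q(s, a) = w_a . x(s),
    # one weight vector per action (illustrative values).
    x = np.array([15.0, 5.0, 20.0])     # feature vector, e.g. distances
    w = np.array([0.2, -0.1, 0.9])      # learned weights for one action

    q_value = w @ x                      # Q-value = w^T x

    # Conceptual training objective: ModelSize + C * DataMisfit.
    def objective(w, X, y, C=1.0):
        model_size = np.sum(np.abs(w))            # one-norm of the weights
        data_misfit = np.sum((X @ w - y) ** 2)    # fit to training Q-values
        return model_size + C * data_misfit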

Page 12:

RL in 3-on-2 BreakAway

[Figure: learning curve. X-axis: Training Games (0 to 3000); Y-axis: Probability of Goal (0 to 0.6).]

Page 13:

Outline

- RL in a complex domain
- Extension #1: Advice Taking
- Extension #2: Transfer Learning
  - Skill Transfer
  - Macro Transfer
  - MLN Transfer

Page 14:

Extension #1: Advice Taking

IF an opponent is near

AND a teammate is open

THEN pass is the best action

Page 15:

Advice in RL

Advice sets constraints on Q-values under specified conditions:

IF an opponent is near me
AND a teammate is open
THEN pass has a high Q-value

Apply advice as soft constraints in the optimization:

ModelSize + C × DataMisfit + μ × AdviceMisfit
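
A sketch of how such a soft constraint could enter the objective; the hinge-style AdviceMisfit below is an illustrative stand-in, not the authors' exact formulation.

    import numpy as np

    # Soft advice constraint (sketch): in states where the advice
    # condition holds, penalize the advised action's Q-value falling
    # below a desired threshold.
    def advice_misfit(w, X_advice, threshold=0.8):
        # X_advice: feature vectors of states where the condition holds
        q = X_advice @ w
        return np.sum(np.maximum(0.0, threshold - q))   # hinge-style slack

    def objective(w, X, y, X_advice, C=1.0, mu=1.0):
        model_size = np.sum(np.abs(w))
        data_misfit = np.sum((X @ w - y) ** 2)
        return model_size + C * data_misfit + mu * advice_misfit(w, X_advice)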

Page 16:

Advice Performance

Page 17:

Outline

- RL in a complex domain
- Extension #1: Advice Taking
- Extension #2: Transfer Learning
  - Skill Transfer
  - Macro Transfer
  - MLN Transfer

Page 18:

Extension #2: Transfer

- 3-on-2 BreakAway
- 3-on-2 KeepAway
- 3-on-2 MoveDownfield

Page 19:

Relational Transfer

First-order logic describes relationships between objects:

distBetween(a0, Teammate) > 10
distBetween(Teammate, goalCenter) < 15

We want to transfer relational knowledge:
- Human-level reasoning
- General representation

Page 20:

Outline

- RL in a complex domain
- Extension #1: Advice Taking
- Extension #2: Transfer Learning
  - Skill Transfer
  - Macro Transfer
  - MLN Transfer

Page 21:

Skill Transfer

Learn advice about good actions from the source task:

good_action(pass(Teammate)) :-
    distBetween(a0, Teammate) > 10,
    distBetween(Teammate, goalCenter) < 15.

Example 1:
    distBetween(a0, a1) = 15
    distBetween(a0, a2) = 5
    distBetween(a0, goalie) = 20
    ...
    action = pass(a1)
    outcome = caught(a1)

Select positive and negative examples of good actions and apply inductive logic programming to learn rules.
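
As an illustration, the learned rule above could be checked against a grounded example like this (a hypothetical dictionary representation; the actual system learns and evaluates such rules with an ILP engine).

    # Apply the learned skill-transfer rule to a grounded example.
    def good_pass(state, teammate):
        return (state[f"distBetween(a0, {teammate})"] > 10 and
                state[f"distBetween({teammate}, goalCenter)"] < 15)

    example = {
        "distBetween(a0, a1)": 15.0,
        "distBetween(a1, goalCenter)": 12.0,
        "distBetween(a0, a2)": 5.0,
        "distBetween(a2, goalCenter)": 25.0,
    }

    print(good_pass(example, "a1"))   # True  -> pass(a1) matches the rule
    print(good_pass(example, "a2"))   # False -> pass(a2) does not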

Page 22:

User Advice in Skill Transfer

- There may be new skills in the target task that cannot be learned from the source, e.g., shooting in BreakAway
- We allow users to add their own advice about these new skills
- User advice simply adds to the transfer advice

Page 23:

Skill Transfer to 3-on-2 BreakAway

[Figure: learning curves. X-axis: Training Games (0 to 3000); Y-axis: Probability of Goal (0 to 0.6). Curves: Standard RL; Skill Transfer from 2-on-1 BreakAway; Skill Transfer from 3-on-2 MoveDownfield; Skill Transfer from 3-on-2 KeepAway.]

Page 24:

Outline

- RL in a complex domain
- Extension #1: Advice Taking
- Extension #2: Transfer Learning
  - Skill Transfer
  - Macro Transfer
  - MLN Transfer

Page 25:

Macro Transfer

Learn a strategy from the source task:

- Find an action sequence that separates good games from bad games
- Learn first-order rules to control transitions along the sequence

move(ahead) → pass(Teammate) → shoot(GoalPart)
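
A sketch of a macro as a rule-gated action sequence; the condition functions below are hypothetical stand-ins for the learned first-order transition rules.

    # Macro sketch: an action sequence whose transitions are gated by
    # learned rules (hypothetical conditions over state features).
    macro = [
        ("move(ahead)",     lambda s: s["distBetween(a0, goalCenter)"] > 15),
        ("pass(Teammate)",  lambda s: s["distBetween(a0, ClosestDefender)"] < 8),
        ("shoot(GoalPart)", lambda s: s["distBetween(a0, goalCenter)"] <= 15),
    ]

    def run_macro(state_stream):
        step = 0
        for state in state_stream:
            if step >= len(macro):
                break
            action, advance_if = macro[step]
            yield action
            if advance_if(state):        # rule fires -> advance in the sequence
                step += 1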

Page 26:

Transfer via Demonstration

[Timeline: games played in the target task]

- Games 0-100: execute the macro strategy; the agent learns an initial Q-function
- After game 100: perform standard RL; the agent adapts to the target task
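
A sketch of this demonstration schedule; macro_policy and rl_policy are placeholder names for the transferred strategy and the standard RL policy.

    # Demonstration schedule sketch: follow the transferred macro for
    # the first 100 target-task games, then switch to standard RL.
    DEMO_GAMES = 100

    def macro_policy(state):
        return "pass(a1)"                # placeholder for the macro strategy

    def rl_policy(state):
        return "shoot(goalLeft)"         # placeholder for argmax over Q

    def choose_action(game_number, state):
        if game_number < DEMO_GAMES:
            return macro_policy(state)   # demonstration period
        return rl_policy(state)          # agent adapts with standard RL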

Page 27:

Macro Transfer to 3-on-2 BreakAway

[Figure: learning curves. X-axis: Training Games (0 to 3000); Y-axis: Probability of Goal (0 to 0.6). Curves: Standard RL; Skill Transfer from 2-on-1 BreakAway; Macro Transfer from 2-on-1 BreakAway.]

Page 28:

Outline

- RL in a complex domain
- Extension #1: Advice Taking
- Extension #2: Transfer Learning
  - Skill Transfer
  - Macro Transfer
  - MLN Transfer

Page 29:

MLN Transfer

- Learn a Markov Logic Network to represent the source-task policy relationally
- Apply the policy via demonstration in the target task

[Diagram: the MLN Q-function maps (state, action) → value.]

Page 30:

Markov Logic Networks

A Markov network models a joint distribution.

A Markov Logic Network combines probability with logic:
- Template: a set of first-order formulas with weights
- Each grounded predicate in a formula becomes a node
- Predicates in a grounded formula are connected by arcs

Probability of a world: (1/Z) exp(Σᵢ Wᵢ Nᵢ), where Wᵢ is the weight of formula i and Nᵢ is its number of true groundings.

[Diagram: a small Markov network over nodes X, Y, Z, A, B.]
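
A sketch of this probability computation; the weights, counts, and partition function Z below are illustrative numbers.

    import math

    # World probability sketch: each formula i has weight W_i and N_i
    # true groundings; Z normalizes over all possible worlds.
    def world_score(weights, counts):
        return math.exp(sum(w * n for w, n in zip(weights, counts)))

    weights = [0.75, 1.33]    # W_i: formula weights
    counts  = [1, 3]          # N_i: true groundings per formula

    Z = 42.0                  # hypothetical partition function
    p_world = world_score(weights, counts) / Z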

Page 31:

MLN Q-function

Formula 1 (W1 = 0.75; N1 = 1 teammate):

IF distance(me, Teammate) < 15
AND angle(me, goalie, Teammate) > 45
THEN Q є (0.8, 1.0)

Formula 2 (W2 = 1.33; N2 = 3 goal parts):

IF distance(me, GoalPart) < 10
AND angle(me, goalie, GoalPart) > 45
THEN Q є (0.8, 1.0)

Probability that Q є (0.8, 1.0):

exp(W1·N1 + W2·N2) / (1 + exp(W1·N1 + W2·N2))
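
Plugging in the formula weights and grounding counts from this slide gives the bin probability directly:

    import math

    # Bin-probability computation from the slide: a logistic function
    # of the weighted grounding counts.
    W1, N1 = 0.75, 1     # formula 1: one teammate satisfies it
    W2, N2 = 1.33, 3     # formula 2: three goal parts satisfy it

    s = W1 * N1 + W2 * N2
    p_high_q = math.exp(s) / (1 + math.exp(s))   # ≈ 0.991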

Page 32:

Using an MLN Q-function

Q є (0.8, 1.0): P1 = 0.75
Q є (0.5, 0.8): P2 = 0.15
Q є (0, 0.5):   P3 = 0.10

Q = P1 · E[Q | bin1] + P2 · E[Q | bin2] + P3 · E[Q | bin3]

where E[Q | bin] is the Q-value of the most similar training example in that bin.
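
A sketch of this expected-value computation with the probabilities above and invented E[Q | bin] values.

    # Expected Q from the MLN's bin probabilities; E[Q | bin] is taken
    # from the most similar training example in each bin (values invented).
    bin_probs    = [0.75, 0.15, 0.10]    # P1, P2, P3
    bin_expected = [0.90, 0.65, 0.30]    # hypothetical E[Q | bin] values

    q_value = sum(p * e for p, e in zip(bin_probs, bin_expected))
    print(q_value)   # 0.75*0.90 + 0.15*0.65 + 0.10*0.30 = 0.8025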

Page 33:

MLN Transfer to 3-on-2 BreakAway

[Figure: learning curves. X-axis: Training Games (0 to 3000); Y-axis: Probability of Goal (0 to 0.6). Curves: MLN Transfer; Macro Transfer; Value-function Transfer; Standard RL.]

Page 34:

Conclusions

- Advice and transfer can provide RL agents with knowledge that improves early performance
- Relational knowledge is desirable because it is general and involves human-level reasoning
- More detailed knowledge produces larger initial benefits, but is less widely transferable

Page 35:

Acknowledgements

- DARPA grant HR0011-04-1-0007
- DARPA grant HR0011-07-C-0060
- DARPA grant FA8650-06-C-7606
- NRL grant N00173-06-1-G002

Thank You