Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another

Lisa Torrey, Trevor Walker, Jude Shavlik (University of Wisconsin-Madison, USA)
Richard Maclin (University of Minnesota-Duluth, USA)

TRANSCRIPT

Page 1

Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another

Lisa Torrey, Trevor Walker, Jude Shavlik
University of Wisconsin-Madison, USA

Richard Maclin
University of Minnesota-Duluth, USA

Page 2

Our Goal

Transfer knowledge…

… between reinforcement learning tasks

… employing SVM function approximators

… using advice

Page 3

Transfer

Learn first task → (knowledge acquired) → learn related task

Exploit previously learned models

Improve learning of new tasks

[Figure: learning curves of performance vs. experience, with transfer and without transfer]

Page 4

Reinforcement Learning

state … action … reward … new state

Q-function: the value of taking an action from a state

Policy: take the action with max Qaction(state)

[Diagram: example rewards of +2, -1, and 0]
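To make the loop concrete, here is a minimal tabular Q-learning sketch. It is generic, not the talk's approximator: the environment interface (reset, step, actions) and the learning parameters are illustrative assumptions.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: act, observe (reward, new state), update Q."""
    Q = defaultdict(float)  # Q[(state, action)], default 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Policy: take the action with max Q(state, action),
            # with epsilon-greedy exploration.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            new_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(Q[(new_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = new_state
    return Q
```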

Page 5

Advice for Transfer

Task A Solution: “Based on what worked in Task A, I suggest…”

Task B Learner: “I’ll try it, but if it doesn’t work I’ll do something else.”

Advice improves RL performance

Advice can be refined or even discarded

Page 6

Transfer Process

Task A experience + advice from user (optional) → Task A Q-functions

Task A Q-functions + mapping from user (Task A → Task B) → Transfer Advice

Transfer Advice + Task B experience + advice from user (optional) → Task B Q-functions

Page 7

RoboCup Soccer Tasks

KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]

BreakAway: score a goal [Maclin et al., AAAI 2005]

Page 8

RL in RoboCup Tasks

           KeepAway              BreakAway
Features   (time left)
Actions    Pass, Hold            Pass, Move, Shoot
Rewards    +1 each time step     +2, 0, or -1 at end

Page 9

Transfer Process

Task A experience → Task A Q-functions

Task A Q-functions + mapping from user (Task A → Task B) → Transfer Advice

Transfer Advice + Task B experience → Task B Q-functions

Page 10

Approximating Q-Functions

Given examples: state features Si = <f1, …, fn> with estimated values y ≈ Qaction(Si)

Learn linear coefficients: y = w1 f1 + … + wn fn + b

Non-linearity from Boolean tile features: tile(i, lower, upper) = 1 if lower ≤ fi < upper
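A small sketch of this approximator: Boolean tile features discretize each raw feature, and the Q-estimate is linear in the raw features plus their tiles. The tile boundaries and feature values below are illustrative, not from the talk.

```python
import numpy as np

def tile_features(raw, boundaries):
    """Boolean tiles: tile(i, lower, upper) = 1 if lower <= f_i < upper."""
    feats = []
    for i, value in enumerate(raw):
        edges = boundaries[i]  # e.g. [0, 5, 10, 20] for feature i
        for lower, upper in zip(edges[:-1], edges[1:]):
            feats.append(1.0 if lower <= value < upper else 0.0)
    return np.array(feats)

def q_estimate(raw, w, b, boundaries):
    """y = w1*f1 + ... + wn*fn + b over raw features plus their tiles."""
    f = np.concatenate([np.asarray(raw, dtype=float), tile_features(raw, boundaries)])
    return float(w @ f + b)

# Usage: two raw features, each tiled into three intervals -> 2 + 6 weights.
bounds = [[0, 5, 10, 20], [0, 15, 30, 90]]
w, b = np.zeros(8), 0.0
print(q_estimate([7.0, 40.0], w, b, bounds))
```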

Page 11

Support Vector Regression

Given: states S with Q-estimates y

Linear program:

minimize   ||w||1 + |b| + C ||k||1

such that  y - k ≤ Sw + b ≤ y + k
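A minimal sketch of this linear program using cvxpy, a modeling layer the talk does not mention; the training data here are synthetic placeholders. Since the slack k is nonnegative, ||k||1 is written as sum(k).

```python
import cvxpy as cp
import numpy as np

# Hypothetical training data: rows of S are state feature vectors,
# y are the Q-estimates to fit.
rng = np.random.default_rng(0)
S = rng.normal(size=(50, 5))
y = S @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.3

w = cp.Variable(5)                 # linear coefficients
b = cp.Variable()                  # bias
k = cp.Variable(50, nonneg=True)   # per-example slack
C = 1.0                            # slack penalty

# minimize ||w||_1 + |b| + C ||k||_1  such that  y - k <= Sw + b <= y + k
objective = cp.Minimize(cp.norm1(w) + cp.abs(b) + C * cp.sum(k))
constraints = [S @ w + b <= y + k, S @ w + b >= y - k]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```

The L1 objective encourages sparse weights, which keeps the learned Q-functions short linear expressions of features.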

Page 12

Transfer Process

Task A experience → Task A Q-functions

Task A Q-functions + mapping from user (Task A → Task B) → Transfer Advice

Transfer Advice + Task B experience → Task B Q-functions

Page 13

Advice Example

Need only follow advice approximately

Add soft constraints to linear program

if distance_to_goal ≤ 10

and shot_angle ≥ 30

then prefer shoot over all other actions

Page 14

Incorporating Advice [Maclin et al., AAAI 2005]

if v11 f1 + … + v1n fn ≤ d1

and vm1 f1 + … + vmn fn ≤ dm

then Qshoot > Qother for all other actions

Advice and Q-functions share the same language: linear expressions of features
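The sketch below extends the earlier cvxpy program with one piece of advice as a soft constraint. The talk's actual mechanism is the knowledge-based formulation of Maclin et al. (AAAI 2005), which handles the whole advice region; this simplified stand-in only penalizes violations at training states where the advice conditions hold, and all names, data, and penalties are illustrative.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n = 60, 4
S = rng.normal(size=(m, n))        # Task B state features (placeholder)
y_shoot = rng.normal(size=m)       # Q-estimates for shoot
y_other = rng.normal(size=m)       # Q-estimates for one other action

w_s, b_s = cp.Variable(n), cp.Variable()   # Q_shoot model
w_o, b_o = cp.Variable(n), cp.Variable()   # Q_other model
k_s = cp.Variable(m, nonneg=True)          # regression slacks
k_o = cp.Variable(m, nonneg=True)
z = cp.Variable(m, nonneg=True)            # advice slacks: soft constraint

# Stand-in for "the advice conditions hold in this state"
# (e.g. distance_to_goal <= 10 and shot_angle >= 30).
idx = np.where(S[:, 0] <= 0.0)[0]

C, mu, beta = 1.0, 1.0, 0.1   # fit penalty, advice penalty, preference margin
obj = (cp.norm1(w_s) + cp.abs(b_s) + cp.norm1(w_o) + cp.abs(b_o)
       + C * (cp.sum(k_s) + cp.sum(k_o)) + mu * cp.sum(z))
cons = [S @ w_s + b_s <= y_shoot + k_s, S @ w_s + b_s >= y_shoot - k_s,
        S @ w_o + b_o <= y_other + k_o, S @ w_o + b_o >= y_other - k_o]
# Soft preference: where the advice applies, Q_shoot should beat Q_other
# by margin beta, at a linear cost mu*z for violating it.
pref = (S @ w_s + b_s) - (S @ w_o + b_o)
cons.append(pref[idx] >= beta - z[idx])
cp.Problem(cp.Minimize(obj), cons).solve()
```

Because the advice enters only through penalized slacks, the learner can follow it approximately, refine it, or discard it when the data disagree.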

Page 15

Transfer Process

Task A experience → Task A Q-functions

Task A Q-functions + mapping from user (Task A → Task B) → Transfer Advice

Transfer Advice + Task B experience → Task B Q-functions

Page 16

Expressing Policy with Advice

Old Q-functions: Qhold_ball(s), Qpass_near(s), Qpass_far(s)

Advice expressing the policy:

if Qhold_ball(s) > Qpass_near(s)

and Qhold_ball(s) > Qpass_far(s)

then prefer hold_ball over all other actions

Page 17

Mapping Actions

Mapping from user:

hold_ball → move
pass_near → pass_near
pass_far → pass_near

Old Q-functions: Qhold_ball(s), Qpass_near(s), Qpass_far(s)

Mapped policy:

if Qhold_ball(s) > Qpass_near(s)

and Qhold_ball(s) > Qpass_far(s)

then prefer move over all other actions

Page 18

Mapping Features

Mapping from user (Q-function mapping):

Qhold_ball(s) = w1 · dist_keeper1 + w2 · dist_taker2 + …

Q′hold_ball(s) = w1 · dist_attacker1 + w2 · MAX_DIST + …

Each Task A feature maps to its Task B analog; a feature with no analog (here dist_taker2) is replaced by a constant (MAX_DIST).
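A toy sketch of this feature mapping, with a hypothetical mapping table: each Task A feature either maps to a Task B feature or, lacking an analog, is replaced by a constant such as MAX_DIST. The names, weights, and state values are placeholders.

```python
MAX_DIST = 100.0  # illustrative constant for features with no Task B analog

# Hypothetical user mapping: Task A feature -> Task B feature (None = no analog)
feature_map = {
    "dist_keeper1": "dist_attacker1",
    "dist_taker2": None,
}

def mapped_q(weights, bias, task_b_state):
    """Evaluate a mapped Q'-function on a Task B state (dict of feature values)."""
    total = bias
    for feat_a, w in weights.items():
        feat_b = feature_map.get(feat_a)
        value = task_b_state[feat_b] if feat_b is not None else MAX_DIST
        total += w * value
    return total

# Usage with placeholder weights and a placeholder Task B state.
print(mapped_q({"dist_keeper1": 0.4, "dist_taker2": -0.2}, 0.0,
               {"dist_attacker1": 12.5}))
```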

Page 19

Transfer Example

Old model:

Qx = wx1 f1 + wx2 f2 + bx
Qy = wy1 f1 + by
Qz = wz2 f2 + bz

Mapped model:

Q′x = wx1 f′1 + wx2 f′2 + bx
Q′y = wy1 f′1 + by
Q′z = wz2 f′2 + bz

Advice:

if Q′x > Q′y
and Q′x > Q′z
then prefer x′

Advice (expanded):

if wx1 f′1 + wx2 f′2 + bx > wy1 f′1 + by
and wx1 f′1 + wx2 f′2 + bx > wz2 f′2 + bz
then prefer x′ to all other actions
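A sketch of this mechanical expansion, with placeholder coefficients: substituting the mapped linear Q′-models into the policy preference yields one linear inequality per non-preferred action. The model representation (a weight dict plus bias per action) is illustrative.

```python
def expand_advice(models, preferred):
    """Yield, per other action, the inequality
    pref_w . f' + pref_b > other_w . f' + other_b."""
    w_pref, b_pref = models[preferred]
    for action, (w_other, b_other) in models.items():
        if action != preferred:
            yield action, (w_pref, b_pref, w_other, b_other)

# The slide's models with placeholder coefficients: Q'x uses f'1 and f'2,
# Q'y only f'1, Q'z only f'2 (absent features have weight 0).
models = {
    "x": ({"f1": 0.5, "f2": 0.3}, 0.1),
    "y": ({"f1": 0.7}, -0.2),
    "z": ({"f2": 0.4}, 0.0),
}
for other, ineq in expand_advice(models, "x"):
    print(f"Q'x > Q'{other}:", ineq)
```

Each yielded inequality is in the advice language of the previous slides, so it can be handed directly to the advice-taking linear program.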

Page 20

Transfer Experiment

Between RoboCup subtasks: from 3-on-2 KeepAway to 2-on-1 BreakAway

Two simultaneous mappings: transfer passing skills, and map passing skills to shooting

Page 21

Experiment Mappings

Play a moving KeepAway game: Pass → Pass, Hold → Move

Pretend an imaginary teammate is standing in the goal: Pass → Shoot

Page 22

Experimental Methodology

Averaged over 10 BreakAway runs

Transfer: advice from one KeepAway model

Control: runs without advice

Page 23

Results

[Figure: probability of scoring a goal (y-axis, 0 to 0.8) vs. games played (x-axis, 0 to 10000)]

Page 24

Analysis

Transfer advice helps BreakAway learners: 7% more likely to score a goal after learning

Improvement is delayed: the advantage begins after 2500 games

Some advice rules apply rarely: preconditions for the shoot advice are not often met

Page 25

Related Work: Transfer

Remember action subsequences [Singh, ML 1992]

Restrict action choices [Sherstov & Stone, AAAI 2005]

Transfer Q-values directly in KeepAway [Taylor & Stone, AAMAS 2005]

Page 26

Related Work: Advice

“Take action A now” [Clouse & Utgoff, ICML 1992]

“In situations S, action A has value X” [Maclin & Shavlik, ML 1996]

“In situations S, prefer action A over B” [Maclin et al., AAAI 2005]

Page 27

Future Work

Increase speed of linear-program solving

Decrease sensitivity to imperfect advice

Extract advice from kernel-based models

Help user map actions and features

Page 28

Conclusions

Transfer exploits previously learned models to improve learning of new tasks

Advice is an appealing way to transfer

Linear regression approach incorporates advice straightforwardly

Transferring a policy accommodates different reward structures

Page 29

Acknowledgements

DARPA grant HR0011-04-1-0007

United States Naval Research Laboratory grant N00173-04-1-G026

Michael Ferris, Olvi Mangasarian, Ted Wild