Relational Transfer in Reinforcement Learning


Lisa Torrey

University of Wisconsin – Madison

CS 540

Transfer Learning

Transfer Learning in Humans

Education: a hierarchical curriculum works because learning tasks share common stimulus-response elements

Abstract problem solving: learning tasks share general underlying principles

Multilingualism: knowing one language affects learning another

Transfer can be both positive and negative

Transfer Learning in AI

Given: source Task S

Learn: target Task T

Goals of Transfer Learning

[Plot: performance vs. training time, with and without transfer. Transfer can give a higher start, a higher slope, and a higher asymptote.]

Inductive Learning

[Diagram: inductive learning as a search through the space of allowed hypotheses, a subset of all hypotheses.]

Transfer in Inductive Learning

[Diagram: transfer narrows the set of allowed hypotheses and directs the search within it.]

Thrun and Mitchell 1995: Transfer slopes for gradient descent

Transfer in Inductive Learning

Bayesian methods

Bayesian learning: Prior distribution + Data = Posterior distribution

Bayesian transfer: the source task supplies the prior distribution

Raina et al. 2006: Transfer a Gaussian prior
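For concreteness, here is a minimal Python sketch of Bayesian transfer with a conjugate Gaussian prior, in the spirit of Raina et al. 2006; the prior parameters, target-task data, and noise variance are all invented for illustration.

```python
import numpy as np

# A minimal sketch of Bayesian transfer with a conjugate Gaussian prior,
# in the spirit of Raina et al. 2006. The prior (from the source task),
# the target-task data, and the noise variance are all invented numbers.

prior_mean, prior_var = 2.0, 1.0      # prior learned in the source task
data = np.array([2.6, 2.4, 2.9])      # target-task observations
noise_var = 0.5                       # known observation noise

# Conjugate Gaussian update: precisions add, means combine by precision.
n = len(data)
post_var = 1.0 / (1.0 / prior_var + n / noise_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / noise_var)

print(f"posterior: mean={post_mean:.3f}, var={post_var:.3f}")
```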

Transfer in Inductive Learning

Hierarchical methods

[Diagram: a concept hierarchy in which Line and Curve support Surface and Circle, which in turn support Pipe.]

Stracuzzi 2006: Learn Boolean concepts that can depend on each other
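A minimal sketch of what mutually dependent Boolean concepts might look like, echoing the Line/Curve/Circle/Pipe hierarchy above; the concept definitions and features are invented, and a real system would learn each definition from data.

```python
# A minimal sketch of Boolean concepts that depend on one another. The
# definitions and features are invented; the point is that higher-level
# concepts reuse lower-level concepts as inputs.

def is_curve(x):
    return x["curvature"] > 0.1            # primitive concept

def is_circle(x):
    return is_curve(x) and x["closed"]     # depends on is_curve

def is_pipe(x):
    return is_circle(x) and x["extruded"]  # depends on is_circle

example = {"curvature": 0.3, "closed": True, "extruded": True}
print(is_pipe(example))  # True
```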

Transfer in Inductive Learning

Dealing with Missing Data or Labels

Shi et al. 2008: Transfer via active learning

[Diagram: knowledge from Task S guides active learning in Task T.]

Reinforcement Learning

[Diagram: the agent-environment loop]

The agent starts with Q(s, a) = 0 for all state-action pairs. In state s1 it takes action a1 = π(s1); the environment returns the next state s2 = δ(s1, a1) and reward r2 = r(s1, a1). The agent updates Q(s1, a1) ← Q(s1, a1) + Δ, then takes a2 = π(s2), receives s3 = δ(s2, a2) and r3 = r(s2, a2), and so on.
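As a concrete illustration of this loop, here is a minimal tabular Q-learning sketch; the env.reset/env.step interface is an assumption in the style of Gym-like APIs, and any environment offering those two methods would do.

```python
import random
from collections import defaultdict

# A minimal tabular Q-learning sketch of the agent-environment loop above.
def q_learning(env, actions, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)  # Q(s, a) = 0 initially, as on the slide

    def policy(s):  # epsilon-greedy pi(s)
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)  # delta(s, a) and r(s, a)
            # Delta is the usual temporal-difference step:
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```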

Transfer in Reinforcement Learning

Starting-point methods

Hierarchical methods

Alteration methods

Imitation methods

New RL algorithms

Transfer in Reinforcement Learning

Starting-point methods

[Diagram: without transfer, target-task training starts from an all-zero Q-table; with transfer, it starts from Q-values learned in the source task:

no transfer:    transfer:
0 0 0 0         2 5 4 8
0 0 0 0         9 1 7 2
0 0 0 0         5 9 1 4]

Taylor et al. 2005: Value-function transfer
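A minimal sketch of this idea: copy source-task Q-values into the target task's initial Q-table through an inter-task mapping. The mapping function is an assumption; in practice it relates each target state-action pair to a source counterpart.

```python
# A minimal sketch of value-function transfer: fill the target task's
# initial Q-table with source-task values instead of zeros.

def transfer_q_table(source_Q, target_states, target_actions, mapping):
    target_Q = {}
    for s in target_states:
        for a in target_actions:
            src_key = mapping(s, a)  # -> (source_state, source_action)
            target_Q[(s, a)] = source_Q.get(src_key, 0.0)
    return target_Q

# Target-task training (e.g., the q_learning sketch above) then begins
# from target_Q rather than from an all-zero table.
```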

Transfer in Reinforcement Learning

Hierarchical methods

[Diagram: a task hierarchy for Soccer, with primitive skills Run and Kick composing higher-level skills Pass and Shoot.]

Mehta et al. 2008: Transfer a learned hierarchy

Transfer in Reinforcement Learning

Alteration methods

Walsh et al. 2006: Transfer aggregate states

[Diagram: Task S's original states, actions, and rewards are altered into new states, actions, and rewards.]
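A minimal sketch of one such alteration, state aggregation; the aggregation function here is a hypothetical abstraction that collapses many original states into one new state.

```python
# A minimal sketch of learning over aggregate states. Transferring the
# aggregation gives the target task a smaller problem to solve.

def aggregate(state):
    # Hypothetical: keep only coarse position features of the state.
    return (round(state["x"]), round(state["y"]))

Q = {}

def q_value(state, action):
    # Every lookup (and update) uses aggregate(s) in place of s.
    return Q.get((aggregate(state), action), 0.0)
```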

Transfer in Reinforcement Learning

New RL Algorithms

Torrey et al. 2006: Transfer advice about skills

[Diagram: the same agent-environment loop as on the Reinforcement Learning slide, with transferred advice about skills shaping how the Q-function is learned.]

Transfer in Reinforcement Learning

Imitation methods

[Diagram: a timeline of target-task training showing which policy is used: the source policy during an initial demonstration period, the target policy afterward.]

Torrey et al. 2007: Demonstrate a strategy

My Research

Starting-point methods

Imitation methods

Hierarchical methods

New RL algorithms

Skill Transfer

Macro Transfer

RoboCup Domain

3-on-2 BreakAway

3-on-2 KeepAway

3-on-2 MoveDownfield

2-on-1 BreakAway

Inductive Logic Programming

IF [ ] THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 THEN pass(Teammate)

IF distance(Teammate) ≤ 10 THEN pass(Teammate)
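A minimal sketch of how candidate clauses like these might be scored against positive and negative examples during refinement search; the examples and the simple coverage-based score are invented for illustration, not the actual ILP search.

```python
# A minimal sketch of scoring candidate clauses in an ILP-style refinement
# search. Feature names mirror the slide.

def make_rule(max_dist, min_angle):
    return lambda e: e["distance"] <= max_dist and e["angle"] >= min_angle

candidates = {
    "dist<=10":           make_rule(10, 0),
    "dist<=5":            make_rule(5, 0),
    "dist<=5, angle>=15": make_rule(5, 15),
    "dist<=5, angle>=30": make_rule(5, 30),
}

def score(rule, positives, negatives):
    # Reward covering positives, penalize covering negatives.
    return sum(rule(e) for e in positives) - sum(rule(e) for e in negatives)

positives = [{"distance": 4, "angle": 35}, {"distance": 3, "angle": 20}]
negatives = [{"distance": 9, "angle": 5}]

best = max(candidates,
           key=lambda name: score(candidates[name], positives, negatives))
print(best)  # "dist<=5" on this toy data
```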

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit

Batch Reinforcement Learning via Support Vector Regression (RL-SVR)

[Diagram: the agent repeatedly gathers a batch of experience from the environment (Batch 1, Batch 2, …) and computes Q-functions from each batch.]

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit

Batch Reinforcement Learning with Advice (KBKR)

[Diagram: the same batch loop as above (Batch 1, Batch 2, …), with advice feeding into each Q-function computation.]
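A minimal sketch of this objective for a linear Q-model; the hinge-style advice term is a simplified stand-in for KBKR's constraint-based formulation, and all names here are illustrative.

```python
import numpy as np

# A minimal sketch of the slide's objective for a linear Q-model
# Q(x) = w . x. The advice term penalizes predictions that fall below an
# advised Q-value wherever the advice applies.

def objective(w, X, y, advice_X, advice_y, C=1.0, mu=1.0):
    model_size = np.abs(w).sum()                                  # ModelSize
    data_misfit = np.abs(X @ w - y).sum()                         # DataMisfit
    advice_misfit = np.maximum(advice_y - advice_X @ w, 0).sum()  # AdviceMisfit
    return model_size + C * data_misfit + mu * advice_misfit
```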

Skill Transfer Algorithm

Source task

ILP learns skill rules, e.g.:

IF distance(Teammate) ≤ 5
AND angle(Teammate, Opponent) ≥ 30
THEN pass(Teammate)

Mapping translates the rules into the target task's representation.

Advice taking applies the mapped rules, optionally alongside human advice, during target-task learning.

Selected Results

Skill transfer to 3-on-2 BreakAway from several tasks

Macro-Operators

[Diagram: a macro-operator as an action sequence: pass(Teammate) → move(Direction) → shoot(goalRight) → shoot(goalLeft), with a rule attached to each step:]

IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(ahead)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalLeft)

IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(left)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalRight)

Demonstration

[Diagram: a timeline of target-task training showing the policy used: the source-task policy during an initial demonstration period, the target-task policy afterward.]

An imitation method

Macro Transfer Algorithm

Source task → ILP learns a macro → Demonstration in the target task

Macro Transfer Algorithm: learning structures

Positive examples: BreakAway games that score

Negative examples: BreakAway games that do not score

ILP learns clauses such as:

IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)

Macro Transfer Algorithm: learning rules for arcs

Positive examples: states in good games that took the arc

Negative examples: states in good games that could have taken the arc but did not

ILP learns arc rules such as:

IF [ … ] THEN enter(State)

IF [ … ] THEN loop(State, Teammate)

[Diagram: e.g., the arcs around the pass(Teammate) and shoot(goalRight) nodes.]
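A minimal sketch of executing a transferred macro as a finite-state machine with enter/loop decisions; node names follow the slides, and the rule functions are invented stand-ins for the ILP-learned clauses.

```python
# A minimal sketch of a macro as a finite-state machine: each node is an
# action, and a learned rule decides whether to loop in the node or enter
# the next one.

MACRO = [
    ("pass(Teammate)",   lambda s: s.get("pass_completed", False)),
    ("move(Direction)",  lambda s: s.get("move_finished", False)),
    ("shoot(goalRight)", lambda s: s.get("shot_taken", False)),
    ("shoot(goalLeft)",  lambda s: True),  # final node
]

def macro_step(state, node):
    """Return the action for the current node and the next node index."""
    action, enter_rule = MACRO[node]
    if enter_rule(state) and node + 1 < len(MACRO):
        return action, node + 1  # the enter rule fired: advance
    return action, node          # otherwise loop in the current node
```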

Selected Results

Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway

Summary

Machine learning is often designed for standalone tasks.

Transfer is a natural learning ability that we would like to incorporate into machine learners.

There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.
