Relational Transfer in Reinforcement Learning


Lisa Torrey

University of Wisconsin – Madison

CS 540

Transfer Learning

Transfer Learning in Humans

Education: a hierarchical curriculum works because learning tasks share common stimulus-response elements

Abstract problem solving: learning tasks share general underlying principles

Multilingualism: knowing one language affects learning another

Transfer can be both positive and negative

Transfer Learning in AI

Given: source Task S

Learn: target Task T

Goals of Transfer Learning

[Plot: performance vs. training time, with and without transfer. Transfer can give a higher start, a higher slope, and a higher asymptote.]

Inductive Learning

[Diagram: inductive learning as a search through the space of allowed hypotheses, a subset of all hypotheses.]

Transfer in Inductive Learning

[Diagram: transfer narrows the set of allowed hypotheses and directs the search within it.]

Thrun and Mitchell 1995: Transfer slopes for gradient descent

Transfer in Inductive Learning

Bayesian methods

Bayesian learning: Prior distribution + Data = Posterior distribution

Bayesian transfer: the source task supplies the prior distribution

Raina et al. 2006: Transfer a Gaussian prior
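For concreteness, here is a minimal Python sketch of Bayesian transfer with a conjugate Gaussian prior, in the spirit of Raina et al. 2006; the prior parameters, target-task data, and noise variance are all invented for illustration.

```python
import numpy as np

# A minimal sketch of Bayesian transfer with a conjugate Gaussian prior,
# in the spirit of Raina et al. 2006. The prior (from the source task),
# the target-task data, and the noise variance are all invented numbers.

prior_mean, prior_var = 2.0, 1.0      # prior learned in the source task
data = np.array([2.6, 2.4, 2.9])      # target-task observations
noise_var = 0.5                       # known observation noise

# Conjugate Gaussian update: precisions add, means combine by precision.
n = len(data)
post_var = 1.0 / (1.0 / prior_var + n / noise_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / noise_var)

print(f"posterior: mean={post_mean:.3f}, var={post_var:.3f}")
```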

Transfer in Inductive Learning

Hierarchical methods

[Diagram: a concept hierarchy in which Line and Curve support Surface and Circle, which in turn support Pipe.]

Stracuzzi 2006: Learn Boolean concepts that can depend on each other
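A minimal sketch of what mutually dependent Boolean concepts might look like, echoing the Line/Curve/Circle/Pipe hierarchy above; the concept definitions and features are invented, and a real system would learn each definition from data.

```python
# A minimal sketch of Boolean concepts that depend on one another. The
# definitions and features are invented; the point is that higher-level
# concepts reuse lower-level concepts as inputs.

def is_curve(x):
    return x["curvature"] > 0.1            # primitive concept

def is_circle(x):
    return is_curve(x) and x["closed"]     # depends on is_curve

def is_pipe(x):
    return is_circle(x) and x["extruded"]  # depends on is_circle

example = {"curvature": 0.3, "closed": True, "extruded": True}
print(is_pipe(example))  # True
```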

Transfer in Inductive Learning

Dealing with Missing Data or Labels

Shi et al. 2008: Transfer via active learning

[Diagram: knowledge from Task S guides active learning in Task T.]

Reinforcement Learning

[Diagram: the agent-environment loop]

The agent starts with Q(s, a) = 0 for all state-action pairs. In state s1 it takes action a1 = π(s1); the environment returns the next state s2 = δ(s1, a1) and reward r2 = r(s1, a1). The agent updates Q(s1, a1) ← Q(s1, a1) + Δ, then takes a2 = π(s2), receives s3 = δ(s2, a2) and r3 = r(s2, a2), and so on.
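As a concrete illustration of this loop, here is a minimal tabular Q-learning sketch; the env.reset/env.step interface is an assumption in the style of Gym-like APIs, and any environment offering those two methods would do.

```python
import random
from collections import defaultdict

# A minimal tabular Q-learning sketch of the agent-environment loop above.
def q_learning(env, actions, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)  # Q(s, a) = 0 initially, as on the slide

    def policy(s):  # epsilon-greedy pi(s)
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)  # delta(s, a) and r(s, a)
            # Delta is the usual temporal-difference step:
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```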

Transfer in Reinforcement Learning

Starting-point methods

Hierarchical methods

Alteration methods

Imitation methods

New RL algorithms

Transfer in Reinforcement Learning

Starting-point methods

[Diagram: without transfer, target-task training starts from an all-zero Q-table; with transfer, it starts from Q-values learned in the source task:

no transfer:    transfer:
0 0 0 0         2 5 4 8
0 0 0 0         9 1 7 2
0 0 0 0         5 9 1 4]

Taylor et al. 2005: Value-function transfer
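A minimal sketch of this idea: copy source-task Q-values into the target task's initial Q-table through an inter-task mapping. The mapping function is an assumption; in practice it relates each target state-action pair to a source counterpart.

```python
# A minimal sketch of value-function transfer: fill the target task's
# initial Q-table with source-task values instead of zeros.

def transfer_q_table(source_Q, target_states, target_actions, mapping):
    target_Q = {}
    for s in target_states:
        for a in target_actions:
            src_key = mapping(s, a)  # -> (source_state, source_action)
            target_Q[(s, a)] = source_Q.get(src_key, 0.0)
    return target_Q

# Target-task training (e.g., the q_learning sketch above) then begins
# from target_Q rather than from an all-zero table.
```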

Transfer in Reinforcement Learning

Hierarchical methods

[Diagram: a task hierarchy for Soccer, with primitive skills Run and Kick composing higher-level skills Pass and Shoot.]

Mehta et al. 2008: Transfer a learned hierarchy

Transfer in Reinforcement Learning

Alteration methods

Walsh et al. 2006: Transfer aggregate states

[Diagram: Task S's original states, actions, and rewards are altered into new states, actions, and rewards.]
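A minimal sketch of one such alteration, state aggregation; the aggregation function here is a hypothetical abstraction that collapses many original states into one new state.

```python
# A minimal sketch of learning over aggregate states. Transferring the
# aggregation gives the target task a smaller problem to solve.

def aggregate(state):
    # Hypothetical: keep only coarse position features of the state.
    return (round(state["x"]), round(state["y"]))

Q = {}

def q_value(state, action):
    # Every lookup (and update) uses aggregate(s) in place of s.
    return Q.get((aggregate(state), action), 0.0)
```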

Transfer in Reinforcement Learning

New RL Algorithms

Torrey et al. 2006: Transfer advice about skills

[Diagram: the same agent-environment loop as on the Reinforcement Learning slide, with transferred advice about skills shaping how the Q-function is learned.]

Transfer in Reinforcement Learning

Imitation methods

[Diagram: a timeline of target-task training showing which policy is used: the source policy during an initial demonstration period, the target policy afterward.]

Torrey et al. 2007: Demonstrate a strategy

My Research

Starting-point methods

Imitation methods

Hierarchical methods

New RL algorithms

Skill Transfer

Macro Transfer

RoboCup Domain

3-on-2 BreakAway

3-on-2 KeepAway

3-on-2 MoveDownfield

2-on-1 BreakAway

Inductive Logic Programming

IF [ ] THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 THEN pass(Teammate)

IF distance(Teammate) ≤ 10 THEN pass(Teammate)
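A minimal sketch of how candidate clauses like these might be scored against positive and negative examples during refinement search; the examples and the simple coverage-based score are invented for illustration, not the actual ILP search.

```python
# A minimal sketch of scoring candidate clauses in an ILP-style refinement
# search. Feature names mirror the slide.

def make_rule(max_dist, min_angle):
    return lambda e: e["distance"] <= max_dist and e["angle"] >= min_angle

candidates = {
    "dist<=10":           make_rule(10, 0),
    "dist<=5":            make_rule(5, 0),
    "dist<=5, angle>=15": make_rule(5, 15),
    "dist<=5, angle>=30": make_rule(5, 30),
}

def score(rule, positives, negatives):
    # Reward covering positives, penalize covering negatives.
    return sum(rule(e) for e in positives) - sum(rule(e) for e in negatives)

positives = [{"distance": 4, "angle": 35}, {"distance": 3, "angle": 20}]
negatives = [{"distance": 9, "angle": 5}]

best = max(candidates,
           key=lambda name: score(candidates[name], positives, negatives))
print(best)  # "dist<=5" on this toy data
```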

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit

Batch Reinforcement Learning via Support Vector Regression (RL-SVR)

[Diagram: the agent repeatedly gathers a batch of experience from the environment (Batch 1, Batch 2, …) and computes Q-functions from each batch.]

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit

Batch Reinforcement Learning with Advice (KBKR)

[Diagram: the same batch loop as above (Batch 1, Batch 2, …), with advice feeding into each Q-function computation.]
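A minimal sketch of this objective for a linear Q-model; the hinge-style advice term is a simplified stand-in for KBKR's constraint-based formulation, and all names here are illustrative.

```python
import numpy as np

# A minimal sketch of the slide's objective for a linear Q-model
# Q(x) = w . x. The advice term penalizes predictions that fall below an
# advised Q-value wherever the advice applies.

def objective(w, X, y, advice_X, advice_y, C=1.0, mu=1.0):
    model_size = np.abs(w).sum()                                  # ModelSize
    data_misfit = np.abs(X @ w - y).sum()                         # DataMisfit
    advice_misfit = np.maximum(advice_y - advice_X @ w, 0).sum()  # AdviceMisfit
    return model_size + C * data_misfit + mu * advice_misfit
```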

Skill Transfer Algorithm

Source task

ILP learns skill rules, e.g.:

IF distance(Teammate) ≤ 5
AND angle(Teammate, Opponent) ≥ 30
THEN pass(Teammate)

Mapping translates the rules into the target task's representation.

Advice taking applies the mapped rules, optionally alongside human advice, during target-task learning.

Selected Results

Skill transfer to 3-on-2 BreakAway from several tasks

Macro-Operators

[Diagram: a macro-operator as an action sequence: pass(Teammate) → move(Direction) → shoot(goalRight) → shoot(goalLeft), with a rule attached to each step:]

IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(ahead)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalLeft)

IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(left)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalRight)

Demonstration

[Diagram: a timeline of target-task training showing the policy used: the source-task policy during an initial demonstration period, the target-task policy afterward.]

An imitation method

Macro Transfer Algorithm

Source task → ILP learns a macro → Demonstration in the target task

Macro Transfer Algorithm: learning structures

Positive examples: BreakAway games that score

Negative examples: BreakAway games that do not score

ILP learns clauses such as:

IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)

Macro Transfer Algorithm: learning rules for arcs

Positive examples: states in good games that took the arc

Negative examples: states in good games that could have taken the arc but did not

ILP learns arc rules such as:

IF [ … ] THEN enter(State)

IF [ … ] THEN loop(State, Teammate)

[Diagram: e.g., the arcs around the pass(Teammate) and shoot(goalRight) nodes.]
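A minimal sketch of executing a transferred macro as a finite-state machine with enter/loop decisions; node names follow the slides, and the rule functions are invented stand-ins for the ILP-learned clauses.

```python
# A minimal sketch of a macro as a finite-state machine: each node is an
# action, and a learned rule decides whether to loop in the node or enter
# the next one.

MACRO = [
    ("pass(Teammate)",   lambda s: s.get("pass_completed", False)),
    ("move(Direction)",  lambda s: s.get("move_finished", False)),
    ("shoot(goalRight)", lambda s: s.get("shot_taken", False)),
    ("shoot(goalLeft)",  lambda s: True),  # final node
]

def macro_step(state, node):
    """Return the action for the current node and the next node index."""
    action, enter_rule = MACRO[node]
    if enter_rule(state) and node + 1 < len(MACRO):
        return action, node + 1  # the enter rule fired: advance
    return action, node          # otherwise loop in the current node
```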

Selected Results

Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway

Summary

Machine learning is often designed for standalone tasks.

Transfer is a natural learning ability that we would like to incorporate into machine learners.

There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.
