
Page 1:

Machine Learning via Advice Taking

Jude Shavlik

Page 2:

Thanks To ...

Rich Maclin, Lisa Torrey, Trevor Walker

Prof. Olvi Mangasarian, Glenn Fung, Ted Wild

DARPA

Page 3:

Quote (2002) from DARPA

Sometimes an assistant will merely watch you and draw conclusions.

Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.'

It's a combination of learning by example and by being guided.

Page 4:

Widening the “Communication Pipeline” between Humans and Machine Learners

Teacher

Pupil

Machine Learner

Page 5:

Our Approach to Building Better Machine Learners

• Human partner expresses advice “naturally” and w/o knowledge of ML agent’s internals

• Agent incorporates advice directly into the function it is learning

• Additional feedback (rewards, I/O pairs, inferred labels, more advice) used to refine learner continually

Page 6:

“Standard” Machine Learning vs. Theory Refinement

• Positive Examples (“should see doctor”) temp = 102.1, age = 21, sex = F, …

temp = 101.7, age = 37, sex = M, …

• Negative Examples (“take two aspirins”) temp = 99.1, age = 43, sex = M, …

temp = 99.6, age = 24, sex = F, …

• Approximate Domain Knowledge if temp = high and age = young … then neg example

Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.

Page 7:

Rich Maclin’s PhD (1995)

IF a Bee is (Near and West)
   & an Ice is (Near and North)
THEN BEGIN
   Move East
   Move North
END

Page 8:

Sample Results

[Figure: reinforcement on the test set (y-axis, roughly -1.0 to 2.5) versus number of training episodes (0 to 4000), comparing learning with advice and without advice.]

Page 9:

Our Motto

Give advice

rather than commands

to your computer

Page 10:

Outline

• Prior Knowledge and Support Vector Machines
  • Intro to SVMs
  • Linear Separation
  • Non-Linear Separation
  • Function Fitting (“Regression”)
• Advice-Taking Reinforcement Learning
• Transfer Learning via Advice Taking

Page 11:

Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: classes A+ and A- separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$; the support vectors lie on these planes, and the margin between the planes is $2 / \|w\|_2$.]

Page 12:

Linear Algebra for SVMs

• Given p points in n-dimensional space
• Represent them by the p-by-n matrix A of reals
• Each row $A_i$ is in class +1 or -1; record the labels on the diagonal of matrix D, with $D_{ii} = \pm 1$
• Separate by two bounding planes:

$A_i w \geq \gamma + 1$, for $D_{ii} = +1$
$A_i w \leq \gamma - 1$, for $D_{ii} = -1$

• More succinctly: $D(Aw - e\gamma) \geq e$, where e is a vector of ones

Page 13:

“Slack” Variables: Dealing with Data that is not Linearly Separable

[Figure: overlapping A+ and A- points; the slack variable y measures how far a point falls on the wrong side of its bounding plane, and the support vectors sit on or beyond the planes.]

Page 14:

Support Vector Machines: Quadratic Programming Formulation

• Solve this quadratic program:

$\min_{w, \gamma, y} \; \nu e'y + \tfrac{1}{2}\|w\|_2^2$
s.t. $D(Aw - e\gamma) + y \geq e, \quad y \geq 0$

• Maximize the margin by minimizing $\tfrac{1}{2}\|w\|_2^2$
• Minimize the sum of the slack variables, $e'y$, weighted by $\nu$
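The slides include no code, so here is a minimal Python/NumPy sketch (my own, not from the talk) of this QP's equivalent hinge-loss form, $\min_{w,\gamma} \tfrac{1}{2}\|w\|^2 + \nu \sum_i \max(0,\, 1 - D_{ii}(A_i w - \gamma))$, optimized by plain subgradient descent; all names are illustrative.

import numpy as np

def linear_svm(A, d, nu=1.0, lr=0.01, epochs=500):
    """Soft-margin linear SVM: minimize 0.5*||w||^2 + nu * hinge slacks."""
    p, n = A.shape
    w, gamma = np.zeros(n), 0.0
    for _ in range(epochs):
        margins = d * (A @ w - gamma)       # D_ii * (A_i w - gamma)
        viol = margins < 1                  # points whose slack y_i would be > 0
        grad_w = w - nu * (d[viol][:, None] * A[viol]).sum(axis=0)
        grad_gamma = nu * d[viol].sum()
        w -= lr * grad_w
        gamma -= lr * grad_gamma
    return w, gamma

# Toy usage: two clusters; classify new points with sign(x @ w - gamma).
A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = linear_svm(A, d)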

Page 15:

Support Vector Machines: Linear Programming Formulation

Use the 1-norm instead of the 2-norm (typically runs faster; better feature selection; might generalize better, NIPS ’03):

$\min_{w, \gamma, y} \; \nu e'y + \|w\|_1$
s.t. $D(Aw - e\gamma) + y \geq e, \quad y \geq 0$
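For concreteness, a sketch of this 1-norm formulation as an actual LP using SciPy's linprog (again mine, not the talk's); the standard trick splits w = wp - wm with wp, wm >= 0 so that $\|w\|_1 = e'(wp + wm)$.

import numpy as np
from scipy.optimize import linprog

def svm_1norm(A, d, nu=1.0):
    """1-norm SVM as an LP over variables [wp (n), wm (n), gamma (1), y (p)]:
    minimize e'wp + e'wm + nu*e'y  s.t.  D(A(wp - wm) - e*gamma) + y >= e."""
    p, n = A.shape
    D = np.diag(d)
    c = np.concatenate([np.ones(2 * n), [0.0], nu * np.ones(p)])
    # linprog wants A_ub x <= b_ub, so negate the constraint:
    #   -DA*wp + DA*wm + d*gamma - y <= -e
    A_ub = np.hstack([-D @ A, D @ A, d.reshape(-1, 1), -np.eye(p)])
    b_ub = -np.ones(p)
    bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    wp, wm, gamma = res.x[:n], res.x[n:2 * n], res.x[2 * n]
    return wp - wm, gamma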

Page 16:

Knowledge-Based SVMs: Generalizing “Example” from POINT to REGION

[Figure: in addition to individual A+ and A- points, whole polyhedral regions are labeled A+ or A-.]

Page 17:

Incorporating “Knowledge Sets” into the SVM Linear Program

• Suppose the knowledge set $\{x \mid Bx \leq b\}$ belongs to class A+
• Hence it must lie in the half space $\{x \mid x'w \geq \gamma + 1\}$
• We therefore have the implication

$Bx \leq b \;\Rightarrow\; x'w \geq \gamma + 1$

This implication is equivalent to a set of linear constraints (proof in the NIPS ’02 paper).
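For reference, a sketch of that equivalence as I understand it from the NIPS ’02 paper (not text that appears on the slide): assuming the knowledge set is nonempty, LP duality shows the implication holds exactly when

$\exists\, u \geq 0: \quad B'u + w = 0, \qquad b'u + \gamma + 1 \leq 0,$

and it is these linear conditions on $(u, w, \gamma)$ that get added to the SVM linear program.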

Page 18:

Resulting LP for KBSVMs

We get this linear program (LP), whose advice constraints range over the knowledge regions.
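A sketch of the resulting LP, reconstructed from the KBSVM paper rather than copied from the slide image (written for knowledge regions $\{x \mid B_i x \leq b_i\}$, $i = 1, \dots, k$, all of class +1; the signs flip for class -1 regions):

$\min_{w, \gamma, y \geq 0, u_i \geq 0} \; \nu e'y + \|w\|_1$
s.t. $D(Aw - e\gamma) + y \geq e$
$B_i'u_i + w = 0$ and $b_i'u_i + \gamma + 1 \leq 0$, for each region $i$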

Page 19:

KBSVM with Slack Variables

The advice constraints above were hard (right-hand sides of 0); this version adds slack variables to them, so the advice need only be satisfied approximately.

Page 20:

SVMs and Non-Linear Separating Surfaces

[Figure: + and - points that are not linearly separable in the original feature space (f1, f2) become linearly separable after mapping to the new space (h(f1, f2), g(f1, f2)).]

• Non-linearly map to a new space
• Linearly separate in the new space (using kernels)
• The result is a non-linear separator in the original space

Fung et al. (2003) presents knowledge-based non-linear SVMs.

Page 21:

Support Vector Regression (aka Kernel Regression)

Linearly approximate a function, given an array A of inputs and a vector y of (numeric) outputs:

$f(x) \approx x'w + b$

Find weights such that

$Aw + be \approx y$

In dual space, $w = A'\alpha$, so we get

$AA'\alpha + be \approx y$

Kernelizing (to get a non-linear approximation):

$K(A, A')\alpha + be \approx y$

[Figure: a fitted curve through (x, y) data points.]

Page 22:

What to Optimize?

Linear program to optimize:

$\min_{w, b, s} \; \|w\|_1 + C\, e's$  s.t.  $-s \leq Aw + be - y \leq s$

• The 1st term ($\|w\|_1$) is a “regularizer” that minimizes model complexity
• The 2nd term is the approximation error, weighted by the parameter C
• This becomes a classical “least squares” fit if the quadratic version is used and the first term is ignored

Page 23:

Predicting Y for New X

$y = K(x', A')\alpha + b$

• Use the kernel to compute a “distance” to each training point (i.e., row in A)
• Weight by $\alpha_i$ (hopefully many $\alpha_i$ are zero), then sum
• Add b (a scalar)
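To make this concrete, a small NumPy sketch (mine, not the talk's): it fits $\alpha$ with a ridge/least-squares solve rather than the 1-norm LP above, and assumes a Gaussian kernel, but prediction follows exactly the $y = K(x', A')\alpha + b$ recipe.

import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def fit_kernel_regression(A, y, sigma=1.0, lam=1e-3):
    """Absorb b by centering y, then solve (K + lam*I) alpha = y - b."""
    b = y.mean()
    K = rbf_kernel(A, A, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(A)), y - b)
    return alpha, b

def predict(X_new, A, alpha, b, sigma=1.0):
    """y = K(x', A') alpha + b: weight each training row by alpha_i and sum."""
    return rbf_kernel(X_new, A, sigma) @ alpha + b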

Page 24:

Knowledge-Based SVR
Mangasarian, Shavlik, & Wild, JMLR ’04

Add soft constraints to the linear program (so the learner need only follow the advice approximately):

minimize $\|w\|_1 + C\|s\|_1$ + penalty for violating advice
such that $y - s \leq Aw + be \leq y + s$  (“slacked” match to advice)

Advice: in this region, y should exceed 4.

[Figure: the fitted curve is pulled up so that it respects the advice region where y ≥ 4.]

Page 25:

Testbeds: Subtasks of RoboCup

• KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]
• BreakAway: score a goal [Maclin et al., AAAI 2005]

Page 26:

Reinforcement Learning Overview

• Receive a state, described by a set of features
• Take an action
• Receive a reward
• Use the rewards to estimate the Q-values of actions in states
• Policy: choose the action with the highest Q-value in the current state
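To make the loop concrete, here is a tabular Q-learning sketch in plain Python (my illustration: the env interface with reset/actions/step is hypothetical, and the talk itself approximates Q with kernel regression rather than a table):

import random

def q_learning_episode(env, Q, alpha=0.1, gamma=0.9, eps=0.1):
    """Run one episode; Q maps (state, action) -> estimated value."""
    s = env.reset()                              # receive a state
    done = False
    while not done:
        acts = env.actions(s)
        if random.random() < eps:                # explore occasionally...
            a = random.choice(acts)
        else:                                    # ...else follow the policy:
            a = max(acts, key=lambda a_: Q.get((s, a_), 0.0))
        s2, r, done = env.step(a)                # take an action, get a reward
        # use the reward to refine the Q-value estimate (TD update)
        future = 0.0 if done else max(Q.get((s2, a_), 0.0) for a_ in env.actions(s2))
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * future - Q.get((s, a), 0.0))
        s = s2
    return Q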

Page 27:

Incorporating Advice in KBKR

Advice format: $Bx \leq d \;\Rightarrow\; f(x) \geq h'x + \beta$

Example:

If distanceToGoal ≤ 10 and shotAngle ≥ 30
Then Q(shoot) ≥ 0.9

With $x = (\text{distanceToTeammate},\ \text{distanceToGoal},\ \text{shotAngle})'$, this becomes

$B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad d = \begin{pmatrix} 10 \\ -30 \end{pmatrix}, \quad h = 0, \quad \beta = 0.9$
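A tiny NumPy sketch of that encoding (the feature ordering is my assumption from the slide's matrix):

import numpy as np

# x = (distanceToTeammate, distanceToGoal, shotAngle); ordering assumed
B = np.array([[0.0, 1.0,  0.0],    # distanceToGoal <= 10
              [0.0, 0.0, -1.0]])   # -shotAngle <= -30, i.e. shotAngle >= 30
d = np.array([10.0, -30.0])
beta = 0.9                         # advised lower bound on Q(shoot)

def advice_applies(x):
    """True when state x lies in the advice region {x | Bx <= d}."""
    return bool(np.all(B @ x <= d))

x = np.array([15.0, 8.0, 45.0])    # goal 8 away, shot angle 45 degrees
print(advice_applies(x))           # True: KBKR then pushes Q_shoot(x) >= beta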

Page 28:

Giving Advice About Relative Values of Multiple Functions

Maclin et al., AAAI ’05

When the input satisfies

preconditions(input)

Then

f1(input) > f2(input)

Page 29:

Sample Advice-Taking Results

if distanceToGoal ≤ 10

and shotAngle ≥ 30

then prefer shoot over all other actions

[Figure: Prob(Score Goal) (0.0 to 1.0) versus games played (0 to 25,000) in 2-vs-1 BreakAway with rewards +1/-1; the advice run (given Q(shoot) > Q(pass) and Q(shoot) > Q(move) under the condition above) is compared with standard RL.]

Page 30:

Transfer Learning

• Agent learns Task A (the source)
• Agent encounters related Task B (the target)
• Agent uses knowledge from Task A to learn Task B faster

The agent could discover how the tasks are related on its own; we instead use a user mapping to tell the agent this.

Page 31:

Transfer Learning: The Goal for the Target Task

[Figure: performance versus training; compared with the “without transfer” curve, the “with transfer” curve shows a better start, a faster rise, and a better asymptote.]

Page 32:

Our Transfer Algorithm

1. Observe source-task games to learn skills
2. Use ILP to turn the learned skills into transfer advice for the target task
3. If there is user advice, add it in
4. Learn the target task with KBKR

Page 33:

Learning Skills By Observation

• Source-task games are sequences of (state, action) pairs
• Learning skills is like learning to classify states by their correct actions
• ILP = Inductive Logic Programming

State 1:
distBetween(me, teammate2) = 15
distBetween(me, teammate1) = 10
distBetween(me, opponent1) = 5
...
action = pass(teammate2)
outcome = caught(teammate2)

Page 34:

ILP: Searching for First-Order Rules

[Figure: top-down refinement lattice that specializes a clause by adding literals:
P :- true
P :- Q    P :- R    P :- S
P :- R, Q    P :- R, S
...
P :- R, S, V, W, X]

We also use a random-sampling approach.
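As a toy illustration of this kind of top-down search, a short Python sketch (entirely mine; the literal names and scoring function are placeholders, and real ILP systems do far more):

def refine(body, literals=("Q", "R", "S", "V", "W", "X")):
    """Specialize the clause 'P :- body' by conjoining one new literal."""
    for lit in literals:
        if lit not in body:
            yield body + [lit]

def beam_search(score, beam=2, depth=5):
    """Greedy beam search from the most general clause, P :- true.
    In practice score(body) would compare coverage of positive states
    (where the action was taken) against negative ones."""
    frontier, best = [[]], []
    for _ in range(depth):
        children = [c for b in frontier for c in refine(b)]
        if not children:
            break
        children.sort(key=score, reverse=True)
        frontier = children[:beam]
        if score(frontier[0]) > score(best):
            best = frontier[0]
    return best

# Example with a stand-in score that rewards bodies containing R:
print(beam_search(lambda body: len(body) + ("R" in body)))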

Page 35:

Advantages of ILP

• Can produce first-order rules for skills: a single rule pass(Teammate) instead of separate rules pass(teammate1), …, pass(teammateN)
• Captures only the essential aspects of the skill; we expect these aspects to transfer better
• Can incorporate background knowledge

Page 36:

Example of a Skill Learned by ILP from KeepAway

pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.

We also gave “human” advice about shooting, since that is a new skill in BreakAway.

Page 37:

TL Level 7: KA to BA Raw Curves

Page 38:

TL Level 7: KA to BA Averaged Curves

Page 39:

TL Level 7: Statistics (TL metrics, average reward)

Type  Name                                 KA to BA           MD to BA
                                           Score   P value    Score   P value
I     Jump start                           0.05    0.0312     0.08    0.0086
      Jump start, smoothed                 0.08    0.0002     0.06    0.0014
II    Transfer ratio                       1.82    0.0034     1.86    0.0004
      Transfer ratio (truncated)           1.82    0.0032     1.86    0.0004
      Average relative reduction (narrow)  0.58    0.0042     0.54    0.0004
      Average relative reduction (wide)    0.70    0.0018     0.71    0.0008
      Ratio (of area under the curves)     1.37    0.0056     1.41    0.0012
      Transfer difference                  503.57  0.0046     561.27  0.0008
      Transfer difference (scaled)         1017.00 0.0040     1091.2  0.0016
III   Asymptotic advantage                 0.09    0.0086     0.11    0.0040
      Asymptotic advantage, smoothed       0.08    0.0116     0.10    0.0030

Boldface indicates a significant difference was found.

Page 40:

Conclusion

• Can use much more than I/O pairs in ML

• Give advice to computers; they automatically refine it based on feedback from the user or the environment

• Advice is an appealing mechanism for transferring learned knowledge computer-to-computer

Page 41:

Some Papers (on-line, use Google :-)

Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning 1996

Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian, & Shavlik, NIPS 2002

Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian, & Shavlik, COLT 2003

Knowledge-Based Kernel Approximation, Mangasarian, Shavlik, & Wild, JMLR 2004

Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker, & Wild, AAAI 2005

Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker, & Maclin, ECML 2006

Page 42:

Backups

Page 43:

Breakdown of Results

[Figure: probability of goal (0 to 0.6) versus games played (0 to 5000), comparing four conditions: all advice, transfer advice only, user advice only, and no advice.]

Page 44:

What if User Advice is Bad?

[Figure: probability of goal (0 to 0.6) versus games played (0 to 5000), comparing transfer with good advice, transfer with bad advice, bad advice only, and no advice.]

Page 45:

Related Work on Transfer

• Q-function transfer in RoboCup
  • Taylor & Stone (AAMAS 2005, AAAI 2005)
• Transfer via policy reuse
  • Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
  • Madden & Howley (AI Review 2004)
  • Torrey et al. (ECML 2005)
• Transfer via relational RL
  • Driessens et al. (ICML workshop 2006)