Machine Learning via
Advice Taking
Jude Shavlik
Thanks To ...
Rich Maclin, Lisa Torrey, Trevor Walker
Prof. Olvi Mangasarian, Glenn Fung, Ted Wild
DARPA
Quote (2002) from DARPA
Sometimes an assistant will merely watch you and draw conclusions.
Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.'
It's a combination of learning by example and by being guided.
Widening the “Communication Pipeline” between Humans and Machine Learners

(Diagram: a human Teacher instructing a Pupil, the Machine Learner.)
Our Approach to Building Better Machine Learners
• Human partner expresses advice “naturally” and w/o knowledge of ML agent’s internals
• Agent incorporates advice directly into the function it is learning
• Additional feedback (rewards, I/O pairs, inferred labels, more advice) is used to continually refine the learner
“Standard” Machine Learning vs. Theory Refinement

• Positive examples (“should see doctor”):
    temp = 102.1, age = 21, sex = F, …
    temp = 101.7, age = 37, sex = M, …
• Negative examples (“take two aspirins”):
    temp = 99.1, age = 43, sex = M, …
    temp = 99.6, age = 24, sex = F, …
• Approximate domain knowledge:
    if temp = high and age = young … then negative example

Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.
Rich Maclin’s PhD (1995)
IF   a Bee is (Near and West)
AND  an Ice is (Near and North)
THEN BEGIN
     Move East
     Move North
END
Sample Results

(Figure: test-set reinforcement vs. number of training episodes (0 to 4000), comparing learning with advice and without advice.)
Our Motto
Give advice, rather than commands, to your computer.
Outline

• Prior Knowledge and Support Vector Machines
  – Intro to SVMs
  – Linear Separation
  – Non-Linear Separation
  – Function Fitting (“Regression”)
• Advice-Taking Reinforcement Learning
• Transfer Learning via Advice Taking
Support Vector Machines:
Maximizing the Margin between Bounding Planes

(Figure: classes A+ and A− separated by the two bounding planes; the support vectors are the points lying on these planes.)

    x′w = γ + 1
    x′w = γ − 1

Margin between the planes: 2 / ||w||₂
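For completeness (standard SVM algebra, not spelled out on the slide): the two planes are parallel with common normal w, so their separation is the difference of their offsets divided by the length of w:

    margin = [(γ + 1) − (γ − 1)] / ||w||₂ = 2 / ||w||₂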
Linear Algebra for SVMs

• Given p points in n-dimensional space
• Represent them by the p-by-n matrix A of reals
• Each point A_i is in class +1 or −1; record the labels in a diagonal matrix D with D_ii = ±1
• Separate by two bounding planes:

    A_i w ≥ γ + 1, for D_ii = +1
    A_i w ≤ γ − 1, for D_ii = −1

• More succinctly:

    D(Aw − eγ) ≥ e, where e is a vector of ones
“Slack” Variables:
Dealing with Data that is not Linearly Separable

(Figure: classes A+ and A−, with a slack variable y measuring how far a point lies on the wrong side of its bounding plane; support vectors marked.)
Support Vector Machines:
Quadratic Programming Formulation

• Solve this quadratic program:

    min{w, γ, y}  ν e′y + ½ ||w||₂²
    s.t.          D(Aw − eγ) + y ≥ e
                  y ≥ 0

• Minimizing ½ ||w||₂² maximizes the margin
• Minimizing the weighted sum of slack variables, ν e′y, penalizes points that violate the bounding planes
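As a concrete illustration (a minimal sketch assuming the cvxpy modeling library; not code from the talk), the QP above can be written almost verbatim:

    import cvxpy as cp
    import numpy as np

    def svm_qp(A, labels, nu=1.0):
        """Soft-margin SVM: min nu*e'y + 0.5*||w||_2^2
        s.t. D(Aw - e*gamma) + y >= e, y >= 0."""
        p, n = A.shape
        w, gamma, y = cp.Variable(n), cp.Variable(), cp.Variable(p)
        e = np.ones(p)
        D = np.diag(labels)                       # labels are +1 / -1
        objective = cp.Minimize(nu * cp.sum(y) + 0.5 * cp.sum_squares(w))
        constraints = [D @ (A @ w - gamma * e) + y >= e, y >= 0]
        cp.Problem(objective, constraints).solve()
        return w.value, gamma.value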
Support Vector Machines:
Linear Programming Formulation

Use the 1-norm instead of the 2-norm (typically runs faster; better feature selection; might generalize better, NIPS ’03):

    min{w, γ, y}  ν e′y + ||w||₁
    s.t.          D(Aw − eγ) + y ≥ e
                  y ≥ 0
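Switching to the 1-norm changes only the objective in the sketch above (cvxpy again, whose cp.norm1 handles the 1-norm directly):

    objective = cp.Minimize(nu * cp.sum(y) + cp.norm1(w))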
Knowledge-Based SVMs:
Generalizing “Example” from POINT to REGION

(Figure: classes A+ and A−, with knowledge sets drawn as regions rather than individual points.)

Incorporating “Knowledge Sets” into the SVM Linear Program

• Suppose that the knowledge set {x | Bx ≤ b} belongs to class A+
• Hence it must lie in the halfspace {x | x′w ≥ γ + 1}
• We therefore have the implication

    Bx ≤ b  ⇒  x′w ≥ γ + 1

• This implication is equivalent to a set of linear constraints (proof in the NIPS ’02 paper)
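A sketch of why (standard LP duality; the full proof is in the NIPS ’02 paper): the implication holds exactly when the linear program min { x′w : Bx ≤ b } has optimal value at least γ + 1. Taking the dual of that LP converts this condition into the existence of multipliers u satisfying

    B′u + w = 0,   b′u + γ + 1 ≤ 0,   u ≥ 0

These conditions are linear in (w, γ, u), so they can be added directly to the SVM linear program.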
Resulting LP for KBSVMs

(Slide shows the resulting linear program: the 1-norm SVM LP above, augmented with one set of advice constraints per knowledge-set region.)

KBSVM with Slack Variables

(Slide shows the same LP with slack variables added to the advice constraints: terms that previously had to equal 0 need only be near 0, so the advice is followed approximately.)
SVMs and Non-Linear Separating Surfaces

(Figure: positive and negative points plotted in the original feature space (f1, f2), and again in a new space with coordinates h(f1, f2) and g(f1, f2), where they become linearly separable.)

• Non-linearly map the data to a new space
• Linearly separate in the new space (using kernels)
• The result is a non-linear separator in the original space

Fung et al. (2003) presents knowledge-based non-linear SVMs.
Support Vector Regression
(aka Kernel Regression)

• Linearly approximate a function, given an array A of inputs and a vector y of (numeric) outputs:

    f(x) ≈ x′w + b

• Find weights such that

    Aw + be ≈ y

• In the dual space, w = A′α, so we get

    (AA′)α + be ≈ y

• “Kernel-izing” (to get a non-linear approximation):

    K(A, A′)α + be ≈ y

(Figure: a non-linear curve fit through sample (x, y) points.)
What to Optimize?

Linear program to optimize:

    min  ||w||₁ + C ||s||₁
    s.t. y − s ≤ Aw + be ≤ y + s

• The 1st term (||w||₁) is a “regularizer” that minimizes model complexity
• The 2nd term is the approximation error, weighted by parameter C
• This becomes the classical “least squares” fit if the quadratic version is used and the first term is dropped
Predicting Y for New X

    y = K(x′, A′)α + b

• Use the kernel to compute a “distance” to each training point (i.e., each row in A)
• Weight by α_i (hopefully many α_i are zero) and sum
• Add b (a scalar)
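Putting the last few slides together in code (a minimal sketch, not the authors’ implementation: it uses a Gaussian kernel and a ridge-regularized least-squares fit in place of the 1-norm linear program; all names are mine):

    import numpy as np

    def gaussian_kernel(A, B, sigma=1.0):
        # K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def fit(A, y, lam=1e-2, sigma=1.0):
        # Solve K(A, A') alpha + b e ~ y in the least-squares sense
        p = A.shape[0]
        K = gaussian_kernel(A, A, sigma)
        M = np.hstack([K + lam * np.eye(p), np.ones((p, 1))])
        sol, *_ = np.linalg.lstsq(M, y, rcond=None)
        return sol[:p], sol[p]                 # alpha, b

    def predict(x, A, alpha, b, sigma=1.0):
        # y = K(x', A') alpha + b
        return gaussian_kernel(np.atleast_2d(x), A, sigma) @ alpha + b

    # Example: fit y = sin(x) from 40 samples, then predict at x = 0.5
    A = np.linspace(-3, 3, 40).reshape(-1, 1)
    y = np.sin(A).ravel()
    alpha, b = fit(A, y)
    print(predict([0.5], A, alpha, b))         # close to sin(0.5) = 0.479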
Knowledge-Based SVR
(Mangasarian, Shavlik, & Wild, JMLR ’04)

Add soft constraints to the linear program (so the solution need only follow the advice approximately):

    min  ||w||₁ + C ||s||₁ + penalty for violating advice
    s.t. y − s ≤ Aw + be ≤ y + s
         “slacked” match to the advice

(Figure: advice “in this region, y should exceed 4”, shown as a region that the fitted curve is pulled above.)
Testbeds: Subtasks of RoboCup

• Mobile KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]
• BreakAway: score a goal [Maclin et al., AAAI 2005]
Reinforcement Learning Overview

• Receive a state (described by a set of features)
• Take an action
• Receive a reward
• Use the rewards to estimate the Q-values of actions in states
• Policy: choose the action with the highest Q-value in the current state
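The loop above is standard Q-learning; here is a minimal tabular sketch (my illustration, with a hypothetical env object providing reset/step; the talk instead approximates Q with kernel regression):

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, eps=0.1):
        Q = defaultdict(float)                   # Q[(state, action)] -> value
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # epsilon-greedy policy: usually take the highest-Q action
                if random.random() < eps:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                # move Q(state, action) toward the one-step lookahead estimate
                best_next = max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next
                                               - Q[(state, action)])
                state = next_state
        return Q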
Incorporating Advice in KBKR

• Advice format:

    Bx ≤ d  ⇒  f(x) ≥ h′x + β

• Example advice:

    If   distanceToGoal ≤ 10
    and  shotAngle ≥ 30
    Then Q(shoot) ≥ 0.9

• With x = (distanceToTeammate, distanceToGoal, shotAngle), this becomes

    [ 0  1  0 ]       [  10 ]
    [ 0  0 −1 ] x  ≤  [ −30 ]    ⇒    f(x) ≥ 0.9
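In code, translating this advice into the (B, d, h, β) form might look like the following sketch (my own hypothetical helper, using the slide’s feature ordering):

    import numpy as np

    # x = (distanceToTeammate, distanceToGoal, shotAngle)
    # "distanceToGoal <= 10"  ->  row [0, 1, 0] with bound 10
    # "shotAngle >= 30"       ->  flip sign: row [0, 0, -1] with bound -30
    B = np.array([[0.0, 1.0,  0.0],
                  [0.0, 0.0, -1.0]])
    d = np.array([10.0, -30.0])
    h = np.zeros(3)    # the advised lower bound is a constant here, ...
    beta = 0.9         # ... so f(x) >= h'x + beta = 0.9

    def advice_applies(x):
        """True when a state satisfies the advice preconditions Bx <= d."""
        return bool(np.all(B @ x <= d))

    print(advice_applies(np.array([15.0, 8.0, 45.0])))   # True: close, wide angle
    print(advice_applies(np.array([15.0, 12.0, 45.0])))  # False: too far from goal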
Giving Advice About Relative Values of Multiple Functions
(Maclin et al., AAAI ’05)

    When the input satisfies preconditions(input)
    Then f1(input) > f2(input)
Sample Advice-Taking Results

Advice:

    if distanceToGoal ≤ 10
    and shotAngle ≥ 30
    then prefer shoot over all other actions

i.e., Q(shoot) > Q(pass) and Q(shoot) > Q(move)

(Figure: Prob(Score Goal) vs. games played (0 to 25,000) in 2-on-1 BreakAway with rewards +1 / −1, comparing advice-taking to standard RL.)
Transfer Learning

• The agent learns Task A (the source task)
• The agent encounters a related Task B (the target task)
• The agent uses knowledge from Task A to learn Task B faster
• Ideally the agent discovers how the tasks are related; here, we use a user-provided mapping to tell the agent this
Transfer Learning: The Goal for the Target Task

(Figure: performance vs. training, with and without transfer. Transfer should give a better start, a faster rise, and a better asymptote.)
Our Transfer Algorithm

• Observe source-task games and use ILP to learn skills
• Translate the learned skills into transfer advice
• If there is user advice, add it in
• Learn the target task with KBKR
Learning Skills By Observation

• Source-task games are sequences of (state, action) pairs
• Learning skills is like learning to classify states by their correct actions
• ILP = Inductive Logic Programming

Example training state:

    State 1:
      distBetween(me, teammate2) = 15
      distBetween(me, teammate1) = 10
      distBetween(me, opponent1) = 5
      ...
      action  = pass(teammate2)
      outcome = caught(teammate2)
ILP: Searching for First-Order Rules

The search refines candidate rules from general to specific:

    P :- true
    P :- Q        P :- R        P :- S
    P :- R, Q     P :- R, S
    P :- R, S, V, W, X

We also use a random-sampling approach.
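A toy sketch of that top-down refinement (my illustration only, far simpler than a real ILP system): greedily add whichever candidate literal best separates positive from negative example states.

    def learn_rule(positives, negatives, candidate_literals):
        """Greedy top-down refinement: start from the most general rule
        (body = true) and add literals until no negatives are covered.
        Literals are predicates mapping a state to True/False."""
        body, cands = [], list(candidate_literals)
        pos, neg = list(positives), list(negatives)
        while neg and cands:
            def precision(lit):
                p = sum(lit(s) for s in pos)
                n = sum(lit(s) for s in neg)
                return p / (p + n + 1e-9)
            best = max(cands, key=precision)
            if precision(best) == 0:
                break                  # no remaining literal covers a positive
            cands.remove(best)
            body.append(best)
            pos = [s for s in pos if best(s)]   # keep only covered examples
            neg = [s for s in neg if best(s)]
        return body

Each literal here would stand for a threshold test on a state, such as distBetween(me, Teammate) > 14.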
Advantages of ILP

• Can produce first-order rules for skills: a single rule pass(Teammate) rather than separate rules pass(teammate1), …, pass(teammateN)
• Captures only the essential aspects of the skill; we expect these aspects to transfer better
• Can incorporate background knowledge
Example of a Skill Learned by ILP from KeepAway

    pass(Teammate) :-
        distBetween(me, Teammate) > 14,
        passAngle(Teammate) > 30,
        passAngle(Teammate) < 150,
        distBetween(me, Opponent) < 7.

We also gave “human” advice about shooting, since that is a new skill in BreakAway.
TL Level 7: KA to BA Learning Curves

(Figures: raw and averaged target-task learning curves for KeepAway-to-BreakAway transfer.)
TL Level 7: Statistics (TL metrics, average reward)

Type  Name                                   KA to BA           MD to BA
                                             Score   P Value    Score   P Value
I     Jump start                             0.05    0.0312     0.08    0.0086
      Jump start, smoothed                   0.08    0.0002     0.06    0.0014
II    Transfer ratio                         1.82    0.0034     1.86    0.0004
      Transfer ratio (truncated)             1.82    0.0032     1.86    0.0004
      Average relative reduction (narrow)    0.58    0.0042     0.54    0.0004
      Average relative reduction (wide)      0.70    0.0018     0.71    0.0008
      Ratio of areas under the curves        1.37    0.0056     1.41    0.0012
      Transfer difference                    503.57  0.0046     561.27  0.0008
      Transfer difference (scaled)           1017.00 0.0040     1091.2  0.0016
III   Asymptotic advantage                   0.09    0.0086     0.11    0.0040
      Asymptotic advantage, smoothed         0.08    0.0116     0.10    0.0030

(Boldface on the original slide indicates a significant difference.)
Conclusion
• ML can use much more than I/O pairs
• Give advice to computers; they automatically refine it based on feedback from the user or the environment
• Advice is an appealing mechanism for transferring learned knowledge computer-to-computer
Some Papers (on-line, use Google :-)
Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning 1996
Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian, & Shavlik, NIPS 2002
Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian, & Shavlik, COLT 2003
Knowledge-Based Kernel Approximation, Mangasarian, Shavlik, & Wild, JMLR 2004
Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker, & Wild, AAAI 2005
Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker, & Maclin, ECML 2006
Backups
Breakdown of Results

(Figure: probability of goal vs. games played (0 to 5000), comparing four conditions: all advice, transfer advice only, user advice only, and no advice.)
What if User Advice is Bad?

(Figure: probability of goal vs. games played (0 to 5000), comparing transfer with good advice, transfer with bad advice, bad advice only, and no advice.)
Related Work on Transfer

• Q-function transfer in RoboCup: Taylor & Stone (AAMAS 2005, AAAI 2005)
• Transfer via policy reuse: Fernandez & Veloso (AAMAS 2006, ICML workshop 2006); Madden & Howley (AI Review 2004); Torrey et al. (ECML 2005)
• Transfer via relational RL: Driessens et al. (ICML workshop 2006)