ac#ve&learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: aarti...
TRANSCRIPT
![Page 1: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/1.jpg)
Ac#ve Learning
Burr Se0les
Machine Learning 10-‐701 / 15-‐781 April 19, 2011
some slides adapted from: Aarti Singh, Rui Castro, Rob Nowak
![Page 2: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/2.jpg)
Supervised Learning
raw unlabeled data x1, x2, x3, . . .
supervised learner induces a classifier
labeled training instances
�x1, y1�, �x2, y2�, �x3, y3�, . . .
expert / oracle analyzes experiments
to determine labels
random sample
![Page 3: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/3.jpg)
Semi-‐Supervised Learning
raw unlabeled data
expert / oracle analyzes experiments
to determine labels
semi-supervised learner induces a classifier
x1, x2, x3, . . .
labeled training instances
�x1, y1�, �x2, y2�, �x3, y3�, . . .
exploit the structure in unlabeled data
random sample
![Page 4: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/4.jpg)
Ac#ve Learning
raw unlabeled data
expert / oracle analyzes experiments
to determine labels
active learner induces a classifier
x1, x2, x3, . . .inspect the unlabeled data
�x1, y1�
request labels for selected data
�x1, ?�
�x2, y2�
�x2, ?�
![Page 5: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/5.jpg)
The 20 Ques#ons Game
“Are you female?”
“Do you have a moustache?”
“No.”
“Yes.”
our goal is to pose the most informa#ve “queries”
how can we automate this process?
![Page 6: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/6.jpg)
Thought Experiment
• suppose you are on an Earth convoy sent to colonize planet Zelgon
people who ate the smooth Zelgian fruits found them tasty!
people who ate the spikey Zelgian fruits got sick!
![Page 7: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/7.jpg)
Determining Poison vs. Yummy Fruits
• there is a con#nuous range of spikey-‐to-‐smooth fruit shapes on Zelgon:
you need to learn how to recognize fruits as poisonous or safe
and you need to do this while risking as li0le as possible (i.e., colonist health)
![Page 8: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/8.jpg)
Learning a Change Point
goal: learn threshold θ* as accurately as possible, using as few labeled instances as possible.
![Page 9: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/9.jpg)
The Problem with Passive Learning
in passive supervised learning, the instances must be chosen before any “tests” are done!
error rate ε requires us to risk O(1/ε) people’s health!
![Page 10: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/10.jpg)
Can We Do Be0er?
this is just a binary search…
requiring O(1/ε) fruits (samples) and only O (log2 1/ε) tests (queries)
your first “ac#ve learning” algorithm!
![Page 11: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/11.jpg)
Rela#onship to Ac#ve Learning
• key idea: the learner chooses the training data – on Zelgon: whether a fruit was poisonous/safe – in general: the true label of some instance
• goal: reduce the training costs – on Zelgon: the number of “lives at risk”
– in general: the number of “queries” (=> labor costs, disk storage space, etc.)
![Page 12: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/12.jpg)
Prac#cal Query Scenarios
• query synthesis [Anguin, 1988]
• selec#ve sampling [Atlas et al., 1989]
• pool-‐based ac#ve learning [Lewis & Gale, 1994]
![Page 13: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/13.jpg)
learner
Query Synthesis
Input Space or Distribu#on
generate an informative instance ne novo xi
label & add to training set
�xi, yi�
oracle / expert
�xi, ?�query
![Page 14: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/14.jpg)
Problems with Query Synthesis
14
an early real-‐world applica#on: neural-‐net queries synthesized for handwri0en digits [Lang & Baum, 1992]
problem: humans couldn’t interpret the queries!
ideally, we can ensure that the queries come from the underlying “natural” distribu;on
![Page 15: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/15.jpg)
Selec#ve Sampling
learner
Con#nuous Data Source
observe a new instance
xi
label & add to training set
�xi, yi�
decide to query or ignore
oracle / expert
�xi, ?�
![Page 16: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/16.jpg)
Selec#ve Sampling (2)
• advantage: ensures that query instances come from the true underlying data distribu#on
• assump;on: drawing an instance from the distribu#on is significantly less expensive than obtaining its label – onen true in prac#ce, e.g., downloading Web documents vs. assigning topic labels to them
![Page 17: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/17.jpg)
learner
Pool-‐Based Ac#ve Learning
label & add to training set
�xi, yi�
oracle / expert
�xi, ?�query choose the best
out of the sample pool xi
Large Pool of Unlabeled Data
![Page 18: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/18.jpg)
Learning Curves
text classification: baseball vs. hockey
active learning
passive learning
bette
r
![Page 19: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/19.jpg)
Who Uses Ac#ve Learning?
Sentiment analysis for blogs; Noisy relabeling – Prem Melville!
Biomedical NLP & IR; Computer-aided diagnosis – Balaji Krishnapuram
MS Outlook voicemail plug-in [Kapoor et al., IJCAI'07]; “A variety of prototypes that are in use throughout the company.” – Eric Horvitz
“While I can confirm that we're using active learning in earnest on many problem areas… I really can't provide any more details than that. Sorry to be so opaque!" – David Cohn
![Page 20: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/20.jpg)
OK, How Do We Select Queries?
• let’s interpret our Zelgian fruit binary search in terms of a probabilis6c classifier:
0.5 0.0
1.0 0.5 0.5
![Page 21: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/21.jpg)
Uncertainty Sampling
• query instances the learner is most uncertain about
400 instances sampled from 2 class Gaussians
random sampling 30 labeled instances
(accuracy=0.7)
uncertainty sampling 30 labeled instances
(accuracy=0.9)
[Lewis & Gale, SIGIR’94]
![Page 22: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/22.jpg)
Uncertainty Measures
least confident
smallest-margin
entropy
note: for binary tasks, these are equivalent!
![Page 23: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/23.jpg)
Mul#-‐Class Uncertainty
illustration of preferred (dark red) posterior distributions in a 3-label classification task
note: for mul#-‐class tasks, these are not equivalent!
![Page 24: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/24.jpg)
Informa#on-‐Theore#c Interpreta#on
• the “surprisal” I is a measure (in bits, nats, etc.) of the informa#on content for outcome y of variable Y:
• so this is how “informa#ve” the oracle’s label y will be
• but the learner doesn’t know the oracle’s answer yet! we can es#mate it as an expecta6on over all possible labels:
• which is entropy-‐based uncertainty sampling
![Page 25: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/25.jpg)
Uncertainty Sampling in Prac#ce
• pool-‐based ac#ve learning: – evaluate each x in U – rank and query the top K instances
– retrain, repeat
• selec#ve sampling: – threshold a “region of uncertainty,” e.g., [0.2, 0.8] – observe new instances, but only query those that fall within the region
– retrain, repeat
![Page 26: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/26.jpg)
Simple and Widely-‐Used
• text classifica;on – Lewis & Gale ICML’94;
• POS tagging – Dagan & Engelson, ICML’95;
Ringger et al., ACL’07
• disambigua;on – Fujii et al., CL’98;
• parsing – Hwa, CL’ 04
• informa;on extrac;on – Scheffer et al., CAIDA’01;
Se0les & Craven, EMNLP’08
• word segmenta;on – Sassano, ACL’02
• speech recogni;on – Tur et al., SC’05
• translitera;on – Kuo et al., ACL’06
• transla;on – Haffari et al., NAACL’09
![Page 27: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/27.jpg)
Uncertainty Sampling FAIL! [Cohn et al., ML’94]
initial random sample misses the right triangle
neural net uncertainty sampling only queries the left side
![Page 28: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/28.jpg)
What To Do?
• plain uncertainty sampling only uses the confidence of a single classifier – some#mes called a “point es#mate” for parametric models
– this classifier can become overly confident about instances is really knows nothing about!
• instead, let’s consider a different no#on of “uncertainty”… about the classifier itself
![Page 29: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/29.jpg)
Remember Version Spaces?
• the set of all classifiers that are consistent with the labeled training data
• the larger the version space V, the less likely each possible classifier is… we want queries to reduce |V|
![Page 30: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/30.jpg)
Alien Fruits Revisited
• let’s try interpre#ng our binary search in terms of a version-‐space search:
possible classifiers (thresholds): 8 4 2 1
![Page 31: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/31.jpg)
Simple Version Space Algorithm
• enumerate all legal hypotheses – or compute |V| analy#cally
• the op#mal query is the one that most reduces the size of V (in expecta#on over y):
• ideally we can halve the size of the version space • binary search does this in 1D (e.g., Zelgian fruits)
![Page 32: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/32.jpg)
Version Spaces for SVMs
F (feature space)
“version space duality” (Vapnik, 1998)
points in F correspond to hyperplanes in H
and vice versa H (hypothesis space)
V
SVM with largest margin is the center of the largest
hypersphere in V V
![Page 33: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/33.jpg)
Bisec#ng the SVM Version Space
V
given a choice of unlabeled instances (planes in H), we want to query one that mostly “bisects” V
i.e., the instance that comes closest to the SVM weight vector
this corresponds to the instance closest to the SVM decision boundary, i.e., smallest-‐margin uncertainty sampling
special case for SVMs: the best classifier is (hopefully) the center of the version space
![Page 34: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/34.jpg)
Problem: V Can Be a Big Space • in general, V may be too large to enumerate or measure |V| through analysis or trickery
• idea: train two classifiers G and S which represent the two “extremes” of the version space
• if these two models disagree, the instances falls within the “region of uncertainty”
![Page 35: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/35.jpg)
Toy Example: Learning A Square [Atlas et al., NIPS’89]
![Page 36: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/36.jpg)
Triangles Revisited
150 random samples 150 queries (G & S disagreement)
[Cohn et al., ML’94]
![Page 37: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/37.jpg)
Query-‐By-‐Commi0ee (QBC)
• simpler, more general approach
• train a commi0ee of classifiers C – no need to maintain G and S
– commi0ee can be any size (but onen just 2)
• query instances for which commi0ee members disagree
[Seung et al., COLT’92]
![Page 38: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/38.jpg)
QBC Guarantees
• let d be the VC dimension of hypothesis space
• under certain condi#ons, QBC achieves predic#on error ε with high probability using: – O(d/ε) unlabeled instances – O(log2 d/ε) queries
• an exponen#al improvement!
[Freund et al., ML’97]
![Page 39: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/39.jpg)
QBC in Prac#ce
• selec#ve sampling: – train a commi0ee C – observe new instances, but only query those for which there is disacreement (or a lot of disagreement)
– retrain, repeat
• pool-‐based ac#ve learning: – train a commi0ee C
– measure disagreement for each x in U
– rank and query the top K instances – retrain, repeat
![Page 40: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/40.jpg)
QBC Design Decisions
• how to build a commi0ee: – “sample” models from P(θ|L)
• [Dagan & Engelson, ICML’95; McCallum & Nigam, ICML’98]
– standard ensembles (e.g., bagging, boos#ng) • [Abe & Mamitsuka, ICML’98]
• how to measure disagreement (many): – “XOR” commi0ee classifica#ons – view vote distribu#on as probabili#es, use uncertainty measures (e.g., entropy)
![Page 41: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/41.jpg)
Bayesian Interpreta#on • we can use Bayes’ rule to derive an es#mate of the ensemble predic#on for a new x:
PC(y|x) =�
θ∈CPθ(y|x)P (θ)
φV E(x) = −�
y
�
θ∈C
�Pθ(y|x)P (θ)
�log
�Pθ(y|x)P (θ)
�
• QBC a0empts to reduce uncertainty over both: – the label y – the classifier θ
![Page 42: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/42.jpg)
If Andy Warhol Were Bayesian…
2-dimensional 3-class
problem
4 example decision trees from 50 labeled instances
Bayesian prediction
from 10-tree ensemble
vote entropy among committee of 10 trees
![Page 43: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/43.jpg)
Tangent: Ac#ve vs. Semi-‐Supervised
uncertainty sampling query instances the model
is least confident about
query-by-committee (QBC) use ensembles to rapidly reduce the version space
self-training expectation-maximization (EM)
entropy regularization (ER) propagate confident labelings
among unlabeled data
co-training multi-view learning
use ensembles with multiple views to constrain the version space
w.r.t. unlabeled data
• both try to a0ack the same problem: making the most of unlabeled data U
![Page 44: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/44.jpg)
Problem: Outliers
• an instance may be uncertain or controversial (for QBC) simply because it’s an outlier
• querying outliers is not likely to help us reduce error on more typical data
![Page 45: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/45.jpg)
Solu#on 1: Density Weigh#ng • weight the uncertainty (“informa#veness”) of an instance by its density w.r.t. the pool U [Settles & Craven, EMNLPʼ08]$
“base” informativeness
density term
[McCallum & Nigam, ICML’98; Nguyen & Smeulders, ICML’04; Xu et al., ECIR’07]
• use U to approximate P(x) and avoid outliers
![Page 46: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/46.jpg)
Solu#on 2: Es#mated Error Reduc#on
• minimize the risk R(x) of a query candidate – expected uncertainty over U if x is added to L
[Roy & McCallum, ICML’01; Zhu et al., ICML-WS’03]
sum over unlabeled instances
uncertainty of u after retraining with x
expectation over possible labelings of x
![Page 47: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/47.jpg)
Text Classifica#on Examples [Roy & McCallum, ICML’01]
![Page 48: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/48.jpg)
Text Classifica#on Examples [Roy & McCallum, ICML’01]
![Page 49: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/49.jpg)
Rela#onship to Uncertainty Sampling
• a different perspec#ve: aim to maximize the informa6on gain over U
…reduces to uncertainty sampling!
uncertainty before query risk term
assume x is representative of U
assume this evaluates to zero
![Page 50: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/50.jpg)
“Error Reduc#on” Scoresheet
• pros: – more principled query strategy
– can be model-‐agnos#c • literature examples: naïve Bayes, LR, GP, SVM
• cons: – too expensive for most model classes
• some solu#ons: subsample U; use approximate training
– intractable for mul#-‐class and structured outputs
![Page 51: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/51.jpg)
Alterna#ve Query Types • for some tasks, we can onen intui#vely label features
– the feature word “puck” indicates the label hockey – the feature word “strike” indicates the label baseball
• dual supervision exploits this domain knowledge using both instance-‐ and feature labels [Se0les, 2011; A0enberg et al., 2010; Druck et al., 2009]
– e.g., “does puck indicate the class hockey?”
• does it help to ac6vely solicit domain knowledge?
![Page 52: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/52.jpg)
DUALIST
• open-‐source sonware project for interac#ve text annota#on which combines: – semi-‐supervised learning
• naïve Bayes + EM
– domain knowledge • i.e., priors on P(word|y) parameters
– ac#ve learning • instance queries using uncertainty sampling • feature queries using mutual informa#on
![Page 53: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/53.jpg)
Results: University Web Pages
![Page 54: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/54.jpg)
Results: Science Newsgroups
![Page 55: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/55.jpg)
Real-‐World Annota#on Costs • so far, we’ve assumed that queries are equally expensive to label – for many tasks, labeling “costs” vary
less costly $ more costly $$$
![Page 56: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/56.jpg)
Example: Annota#on Time As Cost
• where does this variance come from? – some#mes annotator-‐dependent
– stochas#c effects
[Settles et. al, NIPS’08]
• do annota#on #mes vary among instances?
![Page 57: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/57.jpg)
Can Labeling Times be Predicted?
cost predictor: regression model using meta-features
[Settles et. al, NIPS’08]
![Page 58: Ac#ve&Learning&tom/10701_sp11/slides/settles-2.active.pdfsome slides adapted from: Aarti Singh, Rui Castro, Rob Nowak . Supervised&Learning& raw unlabeled data x 1,x 2,x 3,... supervised](https://reader034.vdocuments.net/reader034/viewer/2022042909/5f3afd5d142ff35e6008a8b5/html5/thumbnails/58.jpg)
Interes#ng Open Issues
• be0er cost-‐sensi#ve approaches • “crowdsourced” labels (noisy oracles) • batch ac#ve learning (many queries at once)
• mul#-‐task ac#ve learning
• HCI / user interface issues • data reusability