Agnostic Active Learning
Maria-Florina Balcan*, Alina Beygelzimer**, John Langford***
* : Carnegie Mellon University, ** : IBM T.J. Watson Research Center, *** : Yahoo! Research
Journal of Computer and System Sciences 2009
2010-10-08
Presented by Yongjin Kwon
Copyright 2010 by CEBT
Introduction
Nowadays, a plentiful amount of data is cheaply available and is used to find useful patterns or concepts.
Traditional machine learning has concentrated on problems that require labeled data only.
However, labeling is expensive!
e.g., speech recognition, document classification
How can we reduce the number of labeled examples required?
Exploit the abundance of unlabeled data!
Introduction (Cont’d)
Semi-supervised Learning
Use a set of unlabeled data under additional assumptions.
Active Learning
Ask for labels of “informative” data.
[Figure: supervised learning vs. semi-supervised and active learning, arranged from less informative to more informative use of data]
Active Learning
If the machine actively selects "informative" data to learn from, it will perform better with less training!
[Figure: (a) passive learning, where everything must be prepared in advance and teaching is one-way; (b) active learning, where the learner queries "informative" points only and receives answers]
Active Learning (Cont’d)
What are “informative” points?
If the learner is already certain about the label of a point, that point is less informative; points whose labels the learner is uncertain about are more informative.
Typical Active Learning Approach
Start by querying the labels of a few randomly-chosen points.
Repeat the following process:
Determine the decision boundary on current set of labeled points.
Choose the next unlabeled point closest to the current decision boundary (i.e., the most "uncertain" or "informative" point).
Query that point and obtain its label.
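The loop above can be sketched as margin-based uncertainty sampling. This is an illustrative sketch, not an algorithm from the paper: the least-squares linear fit, the function name `uncertainty_sampling`, and the oracle interface are all assumptions made for the example.

```python
import numpy as np

def uncertainty_sampling(X, oracle, seed_size=4, budget=20, rng=None):
    """Repeatedly query the unlabeled point closest to the decision boundary."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    labeled = list(rng.choice(n, size=seed_size, replace=False))
    y = {i: oracle(X[i]) for i in labeled}            # a few random initial queries
    w = None
    for _ in range(budget - seed_size):
        # Fit a linear decision boundary on the current labeled set (least squares).
        A = np.c_[X[labeled], np.ones(len(labeled))]
        w, *_ = np.linalg.lstsq(A, np.array([y[i] for i in labeled]), rcond=None)
        unlabeled = [i for i in range(n) if i not in y]
        if not unlabeled:
            break
        # The most "uncertain" point is the one with the smallest |margin|.
        margins = [abs(np.r_[X[i], 1.0] @ w) for i in unlabeled]
        pick = unlabeled[int(np.argmin(margins))]
        y[pick] = oracle(X[pick])                     # query its label
        labeled.append(pick)
    return w, y
```

With a fixed budget, the learner concentrates its label requests near the boundary instead of labeling every point in X.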
[Figure: binary classification with a decision boundary]
Improvement in Label Complexity
1-D Binary Classification in the noise-free setting
Find the optimal threshold (or classifier).
In order to achieve misclassification error ≤ ε,
– Supervised Learning : O(1/ε) labeled examples are needed.
– Active Learning : O(log 1/ε) labeled examples are needed!
Exponential improvement in label complexity!!
How general is this phenomenon?
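The 1-D argument can be made concrete: with a noise-free label oracle, binary search over [0, 1] pins down the threshold to precision ε using only ⌈log₂(1/ε)⌉ label requests. A minimal sketch (the function name and oracle interface are illustrative):

```python
import math

def learn_threshold(oracle, eps):
    """oracle(x) -> +1 if x is at or above the true threshold, else -1 (no noise)."""
    lo, hi = 0.0, 1.0
    queries = 0
    while hi - lo > eps:                  # halve the uncertainty interval each query
        mid = (lo + hi) / 2
        queries += 1
        if oracle(mid) == 1:
            hi = mid                      # threshold is at or below mid
        else:
            lo = mid                      # threshold is above mid
    return (lo + hi) / 2, queries

# For eps = 1e-3 this uses ceil(log2(1000)) = 10 label requests, while a
# passive learner would need on the order of 1000 labeled examples.
```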
[Figure: binary search for a 1-D threshold between "+" and "−" regions. Label complexity = number of label requests to achieve a given accuracy.]
CAL Active Learning
General-purpose learning strategy (in the noise-free setting)
[Figure: binary classification with a rectangular classifier; a point falling in the region of uncertainty triggers a label query ("Ask its label!")]
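CAL's strategy can be sketched over a finite class of 1-D threshold classifiers, h_t(x) = +1 iff x ≥ t: keep every hypothesis consistent with the labels seen so far, and query a stream point only when the surviving hypotheses disagree on it. The grid of thresholds and the oracle interface below are illustrative assumptions.

```python
def cal(stream, oracle, thresholds):
    """Query only points in the region of uncertainty (noise-free setting)."""
    version_space = set(thresholds)          # hypotheses consistent so far
    queries = 0
    for x in stream:
        preds = {(1 if x >= t else -1) for t in version_space}
        if len(preds) > 1:                   # hypotheses disagree: x is informative
            label = oracle(x)
            queries += 1
        else:                                # all survivors agree: infer the label
            label = preds.pop()
        # Keep only hypotheses consistent with (x, label).
        version_space = {t for t in version_space
                         if (1 if x >= t else -1) == label}
    return version_space, queries
```

No label is requested for points outside the region of uncertainty, which is where the savings over passive learning come from.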
Label Complexity of CAL
In realizable (or noise-free) case
Label complexity for misclassification error ≤ ε:
– Supervised Learning : O(1/ε) labeled examples
– Active Learning : O(log 1/ε) labeled examples
In unrealizable (or agnostic) case
There is no perfect classifier of any form!
A small amount of adversarial noise can make CAL fail to find the (ε-)optimal classifier!
A noise-robust algorithm is needed…
[Figure: 1-D binary classification with a threshold; noisy points near the optimal classifier]
A² Algorithm
General-purpose learning strategy (in the agnostic setting)
Do NOT trust answers from the oracle completely.
Compare error bounds between classifiers.
[Figure: binary classification with linear classifiers. (a) Realizable case: after a few queries the remaining labels are forced ("Must be RED!", "Now it must be RED!") and the best classifier is determined. (b) Unrealizable case: some points remain uncertain, so the best classifier cannot be pinned down.]
A² Algorithm (Cont’d)
General-purpose learning strategy (in the agnostic setting)
Do NOT trust answers from the oracle completely.
Compare error bounds between classifiers.
[Figure: the size of the region of uncertainty, with upper and lower bounds on error. Presenter's note: in my opinion, the paper is wrong at these points.]
A² Algorithm (Cont’d)
[Figure: 1-D binary classification with thresholds. After sampling and labeling, the error rate of each classifier over the domain is estimated with an upper bound and a lower bound. Remove classifiers whose lower bound exceeds the minimum upper bound.]
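The elimination rule (remove any classifier whose error lower bound exceeds the smallest upper bound, since it cannot be near-optimal) can be sketched for a finite hypothesis class. The Hoeffding-style confidence radius and all names below are illustrative assumptions, not the paper's exact bounds:

```python
import math

def hoeffding_radius(n, delta):
    # Width of a two-sided Hoeffding confidence interval from n samples.
    return math.sqrt(math.log(2 / delta) / (2 * n))

def a2_prune(hypotheses, sample, delta=0.05):
    """sample: labeled pairs (x, y) drawn from the region of disagreement."""
    n = len(sample)
    rad = hoeffding_radius(n, delta)
    bounds = {}
    for h in hypotheses:
        err = sum(h(x) != y for x, y in sample) / n   # empirical error rate
        bounds[h] = (max(0.0, err - rad), min(1.0, err + rad))
    min_upper = min(ub for _, ub in bounds.values())
    # A classifier whose lower bound exceeds the minimum upper bound
    # cannot be (near-)optimal, so it is removed.
    return [h for h in hypotheses if bounds[h][0] <= min_upper]
```

Because the surviving classifiers all disagree only inside the shrinking region of uncertainty, later samples can be drawn from that region alone.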
A² Algorithm (Cont’d)
Correctness
It returns an (ε-)optimal classifier with high probability.
Fallback Analysis
It is never much worse than a standard batch, bound-based algorithm in terms of label complexity.
Improvement in label complexity
It achieves great improvement over passive learning in some special cases (thresholds, and homogeneous linear separators under the uniform distribution).
Conclusions
A² Algorithm
First active learning algorithm that finds an (ε-)optimal classifier in the unrealizable (or agnostic) case
It achieves a (near-)exponential improvement in label complexity in several unrealizable settings.
It never requires substantially more labeling requests than passive learning.
Discussions
This paper takes a theoretical approach to active learning, especially in the unrealizable (or agnostic) case.
It does NOT guarantee an improvement in label complexity for every hypothesis class.
The A² Algorithm is intended to theoretically extend the power of active learning to the unrealizable case.
How can we apply it for practical purposes?