Agnostic Active Learning Maria-Florina Balcan*, Alina Beygelzimer**, John Langford*** * : Carnegie Mellon University, ** : IBM T.J. Watson Research Center, *** : Yahoo! Research Journal of Computer and System Sciences 2009 2010-10-08 Presented by Yongjin Kwon


Page 1:

Agnostic Active Learning

Maria-Florina Balcan*, Alina Beygelzimer**, John Langford***

* : Carnegie Mellon University, ** : IBM T.J. Watson Research Center, *** : Yahoo! Research

Journal of Computer and System Sciences 2009

2010-10-08

Presented by Yongjin Kwon

Page 2:

Copyright 2010 by CEBT

Introduction

Nowadays plentiful data are cheaply available and are used to find useful patterns or concepts.

Traditional machine learning has concentrated on methods that learn from labeled data alone.

However, labeling is expensive!

e.g., speech recognition, document classification, etc.

How can we reduce the number of labeled data required?

Exploit the abundance of unlabeled data!


Page 3:

Introduction (Cont’d)

Semi-supervised Learning

Use a set of unlabeled data under additional assumptions.

Active Learning

Ask for labels of “informative” data.

[Figure: supervised learning sits at the more-informative (labeled) end of the data spectrum; semi-supervised and active learning also exploit the less-informative (unlabeled) end.]

Page 4:

Active Learning

If the machine actively selects “informative” data to learn from, it can perform better with less training!

[Figure: (a) Passive learning: one-way teaching; everything should be prepared in advance. (b) Active learning: the learner queries “informative” points only and receives answers.]

Page 5:

Active Learning (Cont’d)

What are “informative” points?

If the learner is already certain about the label of a point, then that point is less informative; the points it is most unsure about are the most informative.

[Figure: examples of less informative and more informative points.]

Page 6:

Typical Active Learning Approach

Start by querying the labels of a few randomly-chosen points.

Repeat the following process:

Determine the decision boundary on current set of labeled points.

Choose the next unlabeled point closest to the current decision boundary (i.e., the most “uncertain” or “informative” point).

Query that point and obtain its label.
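The loop above can be sketched in code. This is a minimal sketch under assumed conditions: a synthetic 2-D pool, a hypothetical noise-free labeling oracle, and a few perceptron passes standing in for “determine the decision boundary”. It illustrates the query loop, not the paper’s algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
X_pool = rng.uniform(-1, 1, size=(200, 2))       # unlabeled pool (synthetic)
oracle = lambda x: 1 if x[0] + x[1] > 0 else -1  # hypothetical labeling oracle

# Step 1: query the labels of a few randomly chosen points.
labeled = {int(i): oracle(X_pool[i])
           for i in rng.choice(len(X_pool), size=5, replace=False)}

def fit(labeled):
    """Determine a linear decision boundary via a few perceptron passes."""
    w = np.zeros(2)
    for _ in range(100):
        for i, y in labeled.items():
            if y * (w @ X_pool[i]) <= 0:
                w += y * X_pool[i]
    return w

# Step 2: repeatedly query the unlabeled point closest to the boundary.
for _ in range(20):
    w = fit(labeled)
    norm = np.linalg.norm(w) or 1.0
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
    # |w.x| / ||w|| = distance to the boundary = "uncertainty" of x
    i_star = min(unlabeled, key=lambda i: abs(w @ X_pool[i]) / norm)
    labeled[i_star] = oracle(X_pool[i_star])

print(len(labeled))  # 5 seed labels + 20 queries = 25
```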

[Figure: binary classification; the decision boundary on the current set of labeled points.]

Page 7:

Improvement in Label Complexity

1-D Binary Classification in the noise-free setting

Find the optimal threshold (or classifier).

In order to achieve misclassification error ≤ ε:

– Supervised Learning: O(1/ε) labeled examples are needed.

– Active Learning: O(log(1/ε)) labeled examples are needed!

Exponential improvement in label complexity!!

How general is this phenomenon?

[Figure: 1-D threshold with + and − regions; active learning locates the threshold by binary search. Label complexity = the number of label requests needed to achieve a given accuracy.]
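The binary-search argument can be made concrete. In this small sketch (hypothetical hidden threshold and oracle, not from the paper), each label query halves the interval containing the true threshold, so about log2(1/ε) labels suffice, versus about 1/ε random labels for passive learning.

```python
def active_threshold(oracle, eps):
    """Locate a 1-D threshold in [0, 1] to precision eps by binary search."""
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1                 # one label request per halving
        if oracle(mid) == +1:        # mid is already on the positive side
            hi = mid                 # threshold lies in [lo, mid]
        else:
            lo = mid                 # threshold lies in [mid, hi]
    return (lo + hi) / 2, queries

true_t = 0.37                        # hypothetical hidden threshold
oracle = lambda x: +1 if x >= true_t else -1
est, q = active_threshold(oracle, 1e-3)
print(q, abs(est - true_t) < 1e-3)   # → 10 True  (ceil(log2(1000)) = 10)
```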

Page 8:

CAL Active Learning

General-purpose learning strategy (in the noise-free setting)

[Figure: binary classification with a rectangular classifier; a point falling in the region of uncertainty triggers a label query (“Ask its label!”).]
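CAL’s rule can be sketched with a finite hypothesis class; here 1-D thresholds stand in for the rectangles in the figure (an illustrative assumption, not the paper’s setting). A label is requested only when the surviving hypotheses disagree on a point, i.e. only when it falls in the region of uncertainty; all other labels are inferred.

```python
import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0, 1, 101)            # finite hypothesis class
true_t = 0.37                                  # hypothetical hidden threshold
oracle = lambda x: +1 if x >= true_t else -1   # noise-free labels

version_space = set(thresholds)                # hypotheses consistent so far
queries = 0
for x in rng.uniform(0, 1, size=300):          # stream of unlabeled points
    preds = {+1 if x >= t else -1 for t in version_space}
    if len(preds) > 1:                         # x is in the region of uncertainty
        y = oracle(x)                          # ask its label!
        queries += 1
        version_space = {t for t in version_space
                         if (+1 if x >= t else -1) == y}

# Each query eliminates at least one hypothesis, so far fewer than
# 300 labels are requested; the remaining labels are inferred.
print(queries, len(version_space))
```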

Page 9:

Label Complexity of CAL

In realizable (or noise-free) case

Label complexity for misclassification error ≤ ε:

– Supervised Learning: O(1/ε) labeled examples

– Active Learning: O(log(1/ε)) labeled examples

In unrealizable (or agnostic) case

There is no perfect classifier of any form!

A small amount of adversarial noise can make CAL fail to find an (ε-)optimal classifier!

A noise-robust algorithm is needed…

[Figure: binary threshold classification; under noise, even the optimal classifier makes errors.]

Page 10:

A² Algorithm

General-purpose learning strategy (in the agnostic setting)

Do NOT trust answers from the oracle completely.

Compare error bounds between classifiers.

[Figure: binary classification with a linear classifier. (a) Realizable case: after a few queries the remaining labels are forced (“Must be RED!”, “Now it must be RED!”), so the best classifier is identified. (b) Unrealizable case: a blue (possibly noisy) label leaves the learner still uncertain about the best classifier.]

Page 11:

A² Algorithm (Cont’d)

General-purpose learning strategy (in the agnostic setting)

Do NOT trust answers from the oracle completely.

Compare error bounds between classifiers.

[Figure: size of the region of uncertainty, annotated with upper and lower bounds on error. Presenter’s note: “In my opinion, the paper is wrong at these points.”]

Page 12:

A² Algorithm (Cont’d)

[Figure: 1-D threshold classification. After sampling and labeling, each classifier’s error rate over the domain is bracketed by an upper and a lower bound. Remove classifiers whose error lower bound exceeds the minimum upper bound.]
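The elimination rule on this slide can be sketched with hypothetical Hoeffding-style bounds. This is a simplification of A² (which samples inside the region of uncertainty; this sketch samples the whole domain): each classifier’s error is bracketed as empirical error ± deviation, and any classifier whose lower bound exceeds the smallest upper bound is removed.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
thresholds = np.linspace(0, 1, 21)            # small hypothesis class

# Hypothetical noisy oracle: the best threshold is 0.5, but 10% of
# labels are flipped, so no classifier is perfect (the agnostic case).
def noisy_oracle(x):
    y = +1 if x >= 0.5 else -1
    return -y if rng.random() < 0.1 else y

xs = rng.uniform(0, 1, size=2000)             # sampling ...
ys = np.array([noisy_oracle(x) for x in xs])  # ... and labeling

def emp_err(t):
    preds = np.where(xs >= t, +1, -1)
    return float(np.mean(preds != ys))

# Hoeffding-style deviation, union-bounded over the class (assumed form).
delta = 0.05
dev = math.sqrt(math.log(2 * len(thresholds) / delta) / (2 * len(xs)))

bounds = {t: (emp_err(t) - dev, emp_err(t) + dev) for t in thresholds}
min_upper = min(ub for (_, ub) in bounds.values())

# Remove classifiers whose error lower bound exceeds the min upper bound.
survivors = [t for t, (lb, _) in bounds.items() if lb <= min_upper]
print(len(survivors))   # only thresholds near 0.5 survive
```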

Page 13:

A² Algorithm (Cont’d)

Correctness

It returns an ε-optimal classifier with high probability.

Fallback Analysis

It is never much worse than a standard batch, bound-based algorithm in terms of label complexity.

Improvement in label complexity

It achieves a large improvement over passive learning in some special cases (thresholds, and homogeneous linear separators under a uniform distribution).


Page 14:

Conclusions

A² Algorithm

First active learning algorithm that finds an (ε-)optimal classifier in the unrealizable (or agnostic) case

It achieves a (near-)exponential improvement in label complexity for several unrealizable settings.

It never requires substantially more labeling requests than passive learning.


Page 15:

Discussions

This paper presents a theoretical treatment of active learning, especially in the unrealizable (or agnostic) case.

It does NOT guarantee an improvement in label complexity for every hypothesis class.

The A² Algorithm is intended to theoretically extend the power of active learning to the unrealizable case.

How can we apply it for practical purposes?
