Agnostic Active Learning Maria-Florina Balcan*, Alina Beygelzimer**, John Langford*** * : Carnegie Mellon University, ** : IBM T.J. Watson Research Center, *** : Yahoo! Research Journal of Computer and System Sciences 2009 2010-10-08 Presented by Yongjin Kwon


Page 1:

Agnostic Active Learning

Maria-Florina Balcan*, Alina Beygelzimer**, John Langford***

* : Carnegie Mellon University, ** : IBM T.J. Watson Research Center, *** : Yahoo! Research

Journal of Computer and System Sciences 2009

2010-10-08

Presented by Yongjin Kwon

Page 2:

Copyright 2010 by CEBT

Introduction

Nowadays plentiful data are cheaply available and are used to find useful patterns or concepts.

Traditional machine learning has concentrated on methods that learn from labeled data alone.

However, labeling is expensive!

e.g., speech recognition, document classification, etc.

How can we reduce the number of labeled data required?

Exploit the abundance of unlabeled data!


Page 3:

Introduction (Cont’d)

Semi-supervised Learning

Use a set of unlabeled data under additional assumptions.

Active Learning

Ask for labels of “informative” data.

[Figure: supervised learning sits at the more-informative (labeled) end of the data spectrum; semi-supervised and active learning also exploit the less-informative (unlabeled) end.]

Page 4:

Active Learning

If the machine actively selects “informative” data to learn from, it can perform better with less training!

[Figure: (a) Passive learning: one-way teaching; everything should be prepared in advance. (b) Active learning: the learner queries “informative” points only and receives answers.]

Page 5:

Active Learning (Cont’d)

What are “informative” points?

If the learner is already certain about the label of a point, then that point is less informative; the points it is most unsure about are the most informative.

[Figure: examples of less informative and more informative points.]

Page 6:

Typical Active Learning Approach

Start by querying the labels of a few randomly-chosen points.

Repeat the following process:

Determine the decision boundary on current set of labeled points.

Choose the next unlabeled point closest to the current decision boundary (i.e., the most “uncertain” or “informative” point).

Query that point and obtain its label.
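The loop above can be sketched in code. This is a minimal sketch under assumed conditions: a synthetic 2-D pool, a hypothetical noise-free labeling oracle, and a few perceptron passes standing in for “determine the decision boundary”. It illustrates the query loop, not the paper’s algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
X_pool = rng.uniform(-1, 1, size=(200, 2))       # unlabeled pool (synthetic)
oracle = lambda x: 1 if x[0] + x[1] > 0 else -1  # hypothetical labeling oracle

# Step 1: query the labels of a few randomly chosen points.
labeled = {int(i): oracle(X_pool[i])
           for i in rng.choice(len(X_pool), size=5, replace=False)}

def fit(labeled):
    """Determine a linear decision boundary via a few perceptron passes."""
    w = np.zeros(2)
    for _ in range(100):
        for i, y in labeled.items():
            if y * (w @ X_pool[i]) <= 0:
                w += y * X_pool[i]
    return w

# Step 2: repeatedly query the unlabeled point closest to the boundary.
for _ in range(20):
    w = fit(labeled)
    norm = np.linalg.norm(w) or 1.0
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
    # |w.x| / ||w|| = distance to the boundary = "uncertainty" of x
    i_star = min(unlabeled, key=lambda i: abs(w @ X_pool[i]) / norm)
    labeled[i_star] = oracle(X_pool[i_star])

print(len(labeled))  # 5 seed labels + 20 queries = 25
```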

[Figure: binary classification; the decision boundary on the current set of labeled points.]

Page 7:

Improvement in Label Complexity

1-D Binary Classification in the noise-free setting

Find the optimal threshold (or classifier).

In order to achieve misclassification error ≤ ε:

– Supervised Learning: O(1/ε) labeled examples are needed.

– Active Learning: O(log(1/ε)) labeled examples are needed!

Exponential improvement in label complexity!!

How general is this phenomenon?

[Figure: 1-D threshold with + and − regions; active learning locates the threshold by binary search. Label complexity = the number of label requests needed to achieve a given accuracy.]
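The binary-search argument can be made concrete. In this small sketch (hypothetical hidden threshold and oracle, not from the paper), each label query halves the interval containing the true threshold, so about log2(1/ε) labels suffice, versus about 1/ε random labels for passive learning.

```python
def active_threshold(oracle, eps):
    """Locate a 1-D threshold in [0, 1] to precision eps by binary search."""
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1                 # one label request per halving
        if oracle(mid) == +1:        # mid is already on the positive side
            hi = mid                 # threshold lies in [lo, mid]
        else:
            lo = mid                 # threshold lies in [mid, hi]
    return (lo + hi) / 2, queries

true_t = 0.37                        # hypothetical hidden threshold
oracle = lambda x: +1 if x >= true_t else -1
est, q = active_threshold(oracle, 1e-3)
print(q, abs(est - true_t) < 1e-3)   # → 10 True  (ceil(log2(1000)) = 10)
```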

Page 8:

CAL Active Learning

General-purpose learning strategy (in the noise-free setting)

[Figure: binary classification with a rectangular classifier; a point falling in the region of uncertainty triggers a label query (“Ask its label!”).]
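CAL’s rule can be sketched with a finite hypothesis class; here 1-D thresholds stand in for the rectangles in the figure (an illustrative assumption, not the paper’s setting). A label is requested only when the surviving hypotheses disagree on a point, i.e. only when it falls in the region of uncertainty; all other labels are inferred.

```python
import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0, 1, 101)            # finite hypothesis class
true_t = 0.37                                  # hypothetical hidden threshold
oracle = lambda x: +1 if x >= true_t else -1   # noise-free labels

version_space = set(thresholds)                # hypotheses consistent so far
queries = 0
for x in rng.uniform(0, 1, size=300):          # stream of unlabeled points
    preds = {+1 if x >= t else -1 for t in version_space}
    if len(preds) > 1:                         # x is in the region of uncertainty
        y = oracle(x)                          # ask its label!
        queries += 1
        version_space = {t for t in version_space
                         if (+1 if x >= t else -1) == y}

# Each query eliminates at least one hypothesis, so far fewer than
# 300 labels are requested; the remaining labels are inferred.
print(queries, len(version_space))
```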

Page 9:

Label Complexity of CAL

In realizable (or noise-free) case

Label complexity for misclassification error ≤ ε:

– Supervised Learning: O(1/ε) labeled examples

– Active Learning: O(log(1/ε)) labeled examples

In unrealizable (or agnostic) case

There is no perfect classifier of any form!

A small amount of adversarial noise can make CAL fail to find an (ε-)optimal classifier!

A noise-robust algorithm is needed…

[Figure: binary threshold classification; under noise, even the optimal classifier makes errors.]

Page 10:

A² Algorithm

General-purpose learning strategy (in the agnostic setting)

Do NOT trust answers from the oracle completely.

Compare error bounds between classifiers.

[Figure: binary classification with a linear classifier. (a) Realizable case: after a few queries the remaining labels are forced (“Must be RED!”, “Now it must be RED!”), so the best classifier is identified. (b) Unrealizable case: a blue (possibly noisy) label leaves the learner still uncertain about the best classifier.]

Page 11:

A² Algorithm (Cont’d)

General-purpose learning strategy (in the agnostic setting)

Do NOT trust answers from the oracle completely.

Compare error bounds between classifiers.

[Figure: size of the region of uncertainty, annotated with upper and lower bounds on error. Presenter’s note: “In my opinion, the paper is wrong at these points.”]

Page 12:

A² Algorithm (Cont’d)

[Figure: 1-D threshold classification. After sampling and labeling, each classifier’s error rate over the domain is bracketed by an upper and a lower bound. Remove classifiers whose error lower bound exceeds the minimum upper bound.]
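The elimination rule on this slide can be sketched with hypothetical Hoeffding-style bounds. This is a simplification of A² (which samples inside the region of uncertainty; this sketch samples the whole domain): each classifier’s error is bracketed as empirical error ± deviation, and any classifier whose lower bound exceeds the smallest upper bound is removed.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
thresholds = np.linspace(0, 1, 21)            # small hypothesis class

# Hypothetical noisy oracle: the best threshold is 0.5, but 10% of
# labels are flipped, so no classifier is perfect (the agnostic case).
def noisy_oracle(x):
    y = +1 if x >= 0.5 else -1
    return -y if rng.random() < 0.1 else y

xs = rng.uniform(0, 1, size=2000)             # sampling ...
ys = np.array([noisy_oracle(x) for x in xs])  # ... and labeling

def emp_err(t):
    preds = np.where(xs >= t, +1, -1)
    return float(np.mean(preds != ys))

# Hoeffding-style deviation, union-bounded over the class (assumed form).
delta = 0.05
dev = math.sqrt(math.log(2 * len(thresholds) / delta) / (2 * len(xs)))

bounds = {t: (emp_err(t) - dev, emp_err(t) + dev) for t in thresholds}
min_upper = min(ub for (_, ub) in bounds.values())

# Remove classifiers whose error lower bound exceeds the min upper bound.
survivors = [t for t, (lb, _) in bounds.items() if lb <= min_upper]
print(len(survivors))   # only thresholds near 0.5 survive
```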

Page 13:

A² Algorithm (Cont’d)

Correctness

It returns an ε-optimal classifier with high probability.

Fallback Analysis

It is never much worse than a standard batch, bound-based algorithm in terms of label complexity.

Improvement in label complexity

It achieves a large improvement over passive learning in some special cases (thresholds, and homogeneous linear separators under a uniform distribution).


Page 14:

Conclusions

A² Algorithm

First active learning algorithm that finds an (ε-)optimal classifier in the unrealizable (or agnostic) case

It achieves a (near-)exponential improvement in label complexity for several unrealizable settings.

It never requires substantially more labeling requests than passive learning.


Page 15:

Discussions

This paper presents a theoretical treatment of active learning, especially in the unrealizable (or agnostic) case.

It does NOT guarantee an improvement in label complexity for every hypothesis class.

The A² Algorithm is intended to theoretically extend the power of active learning to the unrealizable case.

How can we apply it for practical purposes?
