ENSEMBLE LEARNING: ADABOOST Jianping Fan Dept of Computer Science UNC-Charlotte


Page 1:

ENSEMBLE LEARNING: ADABOOST

Jianping Fan Dept of Computer Science UNC-Charlotte

Page 2:

ENSEMBLE LEARNING

A machine learning paradigm in which multiple learners are trained to solve the same problem.

[Diagram: previously, a single learner was trained for the problem; an ensemble trains multiple learners for the same problem.]

The generalization ability of the ensemble is usually significantly better than that of an individual learner.

Boosting is one of the most important families of ensemble methods.

Page 3:

A BRIEF HISTORY

Resampling for estimating a statistic:
- Bootstrapping

Resampling for classifier design:
- Bagging
- Boosting (Schapire 1989)
- AdaBoost (Freund & Schapire 1995)

Page 4:

BOOTSTRAP ESTIMATION

Repeatedly draw n samples from D

For each set of samples, estimate a statistic

The bootstrap estimate is the mean of the individual estimates

Used to estimate a statistic (parameter) and its variance
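A minimal sketch of the bootstrap estimate described above. NumPy is assumed; `bootstrap_estimate`, the toy data, and the choice of the median as the statistic are mine, purely for illustration:

```python
import numpy as np

def bootstrap_estimate(D, statistic, B=1000, seed=None):
    """Repeatedly draw n samples from D with replacement, estimate the
    statistic on each resample, and return the mean of the individual
    estimates (the bootstrap estimate) together with their variance."""
    rng = np.random.default_rng(seed)
    n = len(D)
    estimates = np.array([statistic(rng.choice(D, size=n, replace=True))
                          for _ in range(B)])
    return estimates.mean(), estimates.var()

# Illustrative usage: estimate the median of a sample and its variance.
data = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=100)
est, var = bootstrap_estimate(data, np.median, B=2000, seed=1)
```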

Page 5:

BAGGING: BOOTSTRAP AGGREGATING

For $i = 1, \ldots, M$:
- Draw $n' < n$ samples from $D$ with replacement
- Learn classifier $C_i$

The final classifier is a vote of $C_1, \ldots, C_M$.

Increases classifier stability / reduces variance.
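A sketch of the bagging loop above, assuming scikit-learn-style base learners (a decision tree stands in for $C_i$) and labels in {-1, +1}; all names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, seed=None):
    """For i = 1..M: draw a bootstrap sample from the data with
    replacement and learn classifier C_i on it."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)   # sample with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Final classifier: a majority vote of C_1 .. C_M."""
    votes = np.stack([clf.predict(X) for clf in classifiers])
    return np.sign(votes.sum(axis=0))      # labels assumed in {-1, +1}; ties -> 0
```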

Page 6:

BAGGING

[Diagram: T random samples, each drawn from the training set with replacement, are fed to the learning algorithm (ML) to produce classifiers $f_1, f_2, \ldots, f_T$, which are combined into the final classifier $f$.]

Page 7:

BOOSTING

[Diagram: the original training sample produces $f_1$; successive weighted samples produce $f_2, \ldots, f_T$ via the learning algorithm (ML); the weak learners are combined into the final classifier $f$.]

Page 8:

REVISIT BAGGING

Page 9:

BOOSTING CLASSIFIER

Page 10:

BAGGING VS BOOSTING

Bagging: the construction of complementary base-learners is left to chance and to the instability of the learning method.

Boosting: actively seeks to generate complementary base-learners by training the next base-learner on the mistakes of the previous learners.

Page 11:

BOOSTING (SCHAPIRE 1989)

- Randomly select $n_1 < n$ samples from $D$ without replacement to obtain $D_1$; train weak learner $C_1$
- Select $n_2 < n$ samples from $D$, half of them misclassified by $C_1$, to obtain $D_2$; train weak learner $C_2$
- Select all samples from $D$ on which $C_1$ and $C_2$ disagree; train weak learner $C_3$
- The final classifier is a vote of the weak learners (sketched below)
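The three-learner scheme above, sketched in code under my own assumptions (depth-1 trees as weak learners, labels in {-1, +1}, and enough data that $C_1$ makes some mistakes and $C_1$, $C_2$ disagree somewhere):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_1989(X, y, n1, seed=None):
    """Schapire's (1989) boosting with three weak learners, as outlined above."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # C1: n1 < n samples drawn from D without replacement.
    idx1 = rng.choice(n, size=n1, replace=False)
    C1 = DecisionTreeClassifier(max_depth=1).fit(X[idx1], y[idx1])
    # C2: a set in which half the samples are misclassified by C1.
    wrong = np.flatnonzero(C1.predict(X) != y)   # assumed non-empty
    right = np.flatnonzero(C1.predict(X) == y)
    k = min(len(wrong), len(right))
    idx2 = np.concatenate([rng.choice(wrong, k, replace=False),
                           rng.choice(right, k, replace=False)])
    C2 = DecisionTreeClassifier(max_depth=1).fit(X[idx2], y[idx2])
    # C3: all samples on which C1 and C2 disagree (assumed non-empty).
    idx3 = np.flatnonzero(C1.predict(X) != C2.predict(X))
    C3 = DecisionTreeClassifier(max_depth=1).fit(X[idx3], y[idx3])
    # Final classifier: a vote of the three weak learners.
    return lambda Xq: np.sign(C1.predict(Xq) + C2.predict(Xq) + C3.predict(Xq))
```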

Page 12:

ADABOOST (FREUND & SCHAPIRE 1995)

- Instead of sampling, re-weight the training examples
- The previous weak learner has only 50% accuracy over the new distribution
- Can be used to learn weak classifiers
- Final classification is based on a weighted vote of the weak classifiers

Page 13:

ADABOOST TERMS

- Learner = Hypothesis = Classifier
- Weak learner: < 50% error over any distribution
- Strong classifier: thresholded linear combination of weak-learner outputs

Page 14:

ADABOOST: ADAPTIVE BOOSTING

A learning algorithm for building a strong classifier out of a lot of weaker ones.

Page 15:

ADABOOST CONCEPT

Weak classifiers, each slightly better than random:

$h_1(x) \in \{+1, -1\}$
$h_2(x) \in \{+1, -1\}$
...
$h_T(x) \in \{+1, -1\}$

Strong classifier:

$H_T(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$

Page 16:

WEAKER CLASSIFIERS

Weak classifiers $h_1(x), \ldots, h_T(x) \in \{+1, -1\}$, each slightly better than random, are combined into the strong classifier $H_T(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$.

Each weak classifier learns by considering one simple feature, so the T most beneficial features for classification should be selected.

How to:
- define features?
- select beneficial features?
- train weak classifiers?
- manage (weight) training samples?
- associate a weight to each weak classifier?
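The slides leave the weak learner unspecified; one common instantiation (my assumption, not stated above) is a decision stump that thresholds a single feature, which also makes the feature selection explicit: each round picks the single most beneficial feature under the current weights. A minimal weighted stump, sketched:

```python
import numpy as np

def fit_stump(X, y, D):
    """Weighted decision stump: exhaustively pick the (feature, threshold,
    polarity) minimizing the weighted error sum_i D[i] * [y_i != h(x_i)]."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)                 # (error, feature, threshold, polarity)
    for j in range(d):                         # one simple feature at a time
        for thr in np.unique(X[:, j]):
            for pol in (+1, -1):
                pred = pol * np.where(X[:, j] > thr, 1, -1)
                err = D[pred != y].sum()       # weighted error under D
                if err < best[0]:
                    best = (err, j, thr, pol)
    err, j, thr, pol = best
    h = lambda Xq: pol * np.where(Xq[:, j] > thr, 1, -1)
    return h, err
```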

Page 17:

THE STRONG CLASSIFIERS

The weak classifiers $h_1(x), \ldots, h_T(x) \in \{+1, -1\}$, each slightly better than random, are combined into the strong classifier $H_T(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$.

How good will the strong one be?

Page 18:

THE ADABOOST ALGORITHM

Given: $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$, $y_i \in \{-1, +1\}$

Initialization: $D_1(i) = \frac{1}{m}$, $i = 1, \ldots, m$
($D_t(i)$: probability distribution over the $x_i$'s at time $t$)

For $t = 1, \ldots, T$:

- Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$, i.e. $h_t = \arg\min_{h_j} \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$ (minimize the weighted error)

- Weight the classifier: $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$ (this choice minimizes the exponential loss)

- Update the distribution: $D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization (misclassified patterns get more weight, i.e. more chance for learning)

Page 19:

THE ADABOOST ALGORITHM

[The algorithm of Page 18 repeated, now with the final step:]

Output the final classifier: $H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$
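A direct transcription of the algorithm on Pages 18-19, assuming a `weak_learner(X, y, D)` callback that returns a predictor and its weighted error (for instance the stump sketched on Page 16); the names and the early-exit guard are mine:

```python
import numpy as np

def adaboost_fit(X, y, weak_learner, T=50):
    """AdaBoost for y in {-1, +1}, following the slide step by step."""
    m = len(X)
    D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    hs, alphas = [], []
    for t in range(T):
        h, eps = weak_learner(X, y, D)         # minimize error w.r.t. D_t
        if eps >= 0.5:                         # no longer better than random: stop
            break
        eps = max(eps, 1e-12)                  # guard the log below
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t = 1/2 ln((1-eps_t)/eps_t)
        D *= np.exp(-alpha * y * h(X))         # up-weight misclassified examples
        D /= D.sum()                           # Z_t: renormalize to a distribution
        hs.append(h)
        alphas.append(alpha)
    # Output: H(x) = sign(sum_t alpha_t h_t(x))
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hs)))
```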

Page 20:

BOOSTING ILLUSTRATION

Weak Classifier 1

Page 21:

BOOSTING ILLUSTRATION

Weights Increased

Page 22:

THE ADABOOST ALGORITHM

$D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is a normalization factor and typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$.

The weights of incorrectly classified examples are increased, so that the base learner is forced to focus on the hard examples in the training set.

Page 23:

BOOSTING ILLUSTRATION

Weak Classifier 2

Page 24:

BOOSTING ILLUSTRATION

Weights Increased

Page 25:

BOOSTING ILLUSTRATION

Weak Classifier 3

Page 26:

BOOSTING ILLUSTRATION

Final classifier is a combination of weak classifiers

Page 27:

THE ADABOOST ALGORITHM

[The full algorithm from Pages 18-19 repeated.]

What goal does AdaBoost want to reach?

Page 28:

THE ADABOOST ALGORITHM

[The full algorithm repeated, highlighting the classifier weight $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$.]

What goal does AdaBoost want to reach? These choices are goal dependent.

Page 29:

GOAL

Minimize the exponential loss:

$\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[ e^{-yH(x)} \right]$

Final classifier: $H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$

Page 30:

GOAL

Minimize the exponential loss $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[ e^{-yH(x)} \right]$ for the final classifier $H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$.

Minimizing $e^{-yH(x)}$ maximizes the margin $yH(x)$.

Page 31:

GOAL

Final classifier: $H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$; minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[ e^{-yH(x)} \right]$.

$E_{x,y}\left[ e^{-yH_t(x)} \right] = E_x\left[ E_y\left[ e^{-yH_t(x)} \mid x \right] \right]$

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$, and

$E_y\left[ e^{-y[H_{t-1}(x) + \alpha_t h_t(x)]} \mid x \right] = E_y\left[ e^{-yH_{t-1}(x)}\, e^{-y \alpha_t h_t(x)} \mid x \right]$

$E_{x,y}\left[ e^{-yH_t(x)} \right] = E_x\left[ e^{-yH_{t-1}(x)} \left( e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x)) \right) \right]$

Page 32:

[Setup repeated: minimize $\mathrm{loss}_{\exp}$, with $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$, $H_0(x) = 0$.]

$E_{x,y}\left[ e^{-yH_t(x)} \right] = E_x\left[ e^{-yH_{t-1}(x)} \left( e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x)) \right) \right]$

$\alpha_t = ?$ Set

$\frac{\partial\, E_{x,y}\left[ e^{-yH_t(x)} \right]}{\partial \alpha_t} = 0$:

$E_x\left[ e^{-yH_{t-1}(x)} \left( -e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x)) \right) \right] = 0$

Page 33:

Solving for $\alpha_t$:

$E_x\left[ e^{-yH_{t-1}(x)} \left( -e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x)) \right) \right] = 0$

$\Rightarrow \alpha_t = \frac{1}{2} \ln \frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$

where $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$, taking $P(x_i, y_i) \equiv D_t(i)$.
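A quick numeric check of the closed form (the numbers are illustrative):

```latex
% With weighted error \epsilon_t = 0.25:
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
         = \tfrac{1}{2}\ln 3 \approx 0.549.
% \epsilon_t = 0.5 gives \alpha_t = 0 (a random guesser gets no vote),
% and \epsilon_t \to 0 gives \alpha_t \to \infty:
% more accurate weak learners receive larger votes.
```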

Page 34:

[The derivation of Page 33 is shown alongside the full AdaBoost algorithm from Page 19.]

$\alpha_t = \frac{1}{2} \ln \frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$, with $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$ and $P(x_i, y_i) \equiv D_t(i)$: the derived weight is exactly the "weight classifier" step of the algorithm.

Page 35:

[Same setup, with the full algorithm shown again.] The remaining question: $D_{t+1} = ?$

Page 36:

$D_{t+1} = ?$ First, which $h_t$ should be chosen at step $t$?

$E_{x,y}\left[ e^{-yH_t} \right] = E_{x,y}\left[ e^{-yH_{t-1}}\, e^{-\alpha_t y h_t} \right]$

Expanding $e^{-\alpha_t y h_t}$ to second order:

$\approx E_{x,y}\left[ e^{-yH_{t-1}} \left( 1 - y \alpha_t h_t + \frac{y^2 \alpha_t^2 h_t^2}{2} \right) \right]$

$h_t = \arg\min_h E_{x,y}\left[ e^{-yH_{t-1}} \left( 1 - y \alpha_t h + \frac{y^2 \alpha_t^2 h^2}{2} \right) \right]$

Since $y^2 h^2 = 1$:

$h_t = \arg\min_h E_{x,y}\left[ e^{-yH_{t-1}} \left( 1 - y \alpha_t h + \frac{\alpha_t^2}{2} \right) \right] = \arg\min_h E_x\left[ E_y\left[ e^{-yH_{t-1}} \left( 1 - y \alpha_t h + \frac{\alpha_t^2}{2} \right) \mid x \right] \right]$

Page 37:

$h_t = \arg\min_h E_x\left[ E_y\left[ e^{-yH_{t-1}} \left( 1 - y \alpha_t h + \frac{\alpha_t^2}{2} \right) \mid x \right] \right]$

Dropping the terms that do not depend on $h$:

$h_t = \arg\min_h E_x\left[ E_y\left[ -e^{-yH_{t-1}}\, y \alpha_t h \mid x \right] \right] = \arg\max_h E_x\left[ E_y\left[ e^{-yH_{t-1}}\, y h \mid x \right] \right]$

$= \arg\max_h E_x\left[ (1)\, h(x)\, e^{-H_{t-1}(x)} P(y = 1 \mid x) + (-1)\, h(x)\, e^{H_{t-1}(x)} P(y = -1 \mid x) \right]$

Page 38:

$h_t = \arg\max_h E_{x,\, y \sim e^{-yH_{t-1}(x)} P(y|x)}\left[ y\, h(x) \right]$,

which is maximized when $h(x) = y$ for every $x$, i.e.

$h_t(x) = \mathrm{sign}\left( P_{y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = 1 \mid x) - P_{y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = -1 \mid x) \right)$

$h_t(x) = \mathrm{sign}\left( E_{y \sim e^{-yH_{t-1}(x)} P(y|x)}\left[ y \mid x \right] \right)$

Page 39:

So at time $t$, the optimal weak learner is trained as if the data were drawn from $x, y \sim e^{-yH_{t-1}(x)} P(y \mid x)$:

$h_t(x) = \mathrm{sign}\left( P_{y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = 1 \mid x) - P_{y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = -1 \mid x) \right)$

Page 40:

$D_{t+1} = ?$

At time $t$, the effective distribution is $x, y \sim e^{-yH_{t-1}(x)} P(y \mid x)$.

At time 1: $x, y \sim P(y \mid x)$ with $P(y_i \mid x_i) = 1$, so $D_1(i) = \frac{1}{Z_1} \cdot \frac{1}{m} = \frac{1}{m}$.

At time $t+1$: $x, y \sim e^{-yH_t(x)} P(y \mid x) \propto D_t \cdot e^{-\alpha_t y h_t(x)}$, which is exactly the update in the algorithm:

$D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization.

[The full AdaBoost algorithm from Page 19 is shown again for comparison.]

Page 41:

PROS AND CONS OF ADABOOST

Advantages:
- Very simple to implement
- Does feature selection, resulting in a relatively simple classifier
- Fairly good generalization

Disadvantages:
- Suboptimal solution
- Sensitive to noisy data and outliers

Page 42:

INTUITION

Train a set of weak hypotheses $h_1, \ldots, h_T$.

The combined hypothesis $H$ is a weighted majority vote of the $T$ weak hypotheses; each hypothesis $h_t$ has a weight $\alpha_t$.

During training, focus on the examples that are misclassified: at round $t$, example $x_i$ has the weight $D_t(i)$.

Page 43:

BASIC SETTING

- Binary classification problem
- Training data: $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$, $y_i \in Y = \{+1, -1\}$
- $D_t(i)$: the weight of $x_i$ at round $t$; $D_1(i) = 1/m$
- A learner $L$ that finds a weak hypothesis $h_t : X \to Y$ given the training set and $D_t$
- The error of a weak hypothesis $h_t$: $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$

Page 44:

THE BASIC ADABOOST ALGORITHM

For $t = 1, \ldots, T$:

- Train the weak learner using the training data and $D_t$

- Get $h_t : X \to \{-1, +1\}$ with error $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$

- Choose $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$

- Update
  $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
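Putting the pieces together, a hypothetical end-to-end run; `fit_stump` and `adaboost_fit` are the sketches from Pages 16 and 19, and the toy data is made up:

```python
import numpy as np

# Toy two-class data: class +1 clusters high, class -1 clusters low.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, (100, 2)),
               rng.normal(-1.0, 1.0, (100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])

H = adaboost_fit(X, y, weak_learner=fit_stump, T=20)
train_acc = (H(X) == y).mean()   # should beat any single stump on this data
print(f"training accuracy: {train_acc:.3f}")
```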

Page 45:

THE GENERAL ADABOOST ALGORITHM

Page 46:

PROS AND CONS OF ADABOOST

Advantages:
- Very simple to implement
- Does feature selection, resulting in a relatively simple classifier
- Fairly good generalization

Disadvantages:
- Suboptimal solution
- Sensitive to noisy data and outliers