CLASSIFICATION: Ensemble Methods
Combines multiple models
Construct multiple classifiers from the training set
Aggregate their predictions on the testing set
Meta-algorithm
CLASSIFICATION: Ensemble Methods
Improves stability and accuracy
Reduces variance
Helps avoid overfitting
Compensates for poor learning algorithms
Uses more computation
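To make the variance-reduction claim concrete, a standard result: for n identically distributed models with variance σ² and pairwise correlation ρ, the variance of their averaged prediction is

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} f_i(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{n}\,\sigma^2
```

Averaging drives the second term toward zero, so the less correlated the models, the larger the gain.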
ENSEMBLE METHODS: Examples
Bagging (bootstrap aggregation)
Bagging with MetaCost
Random forests
Boosting
Stacked generalization (usually used with different learning algorithms)
Bayesian model combination
ENSEMBLE METHODS: Bagging
Randomly create samples (with replacement) from a data set
Create classifiers (same type) for each sample
Run classifiers on testing sample
Use majority voting to determine classification of testing sample
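As a concrete illustration (not from the slides), here is a minimal Python sketch of these four steps, using scikit-learn decision trees as the base classifier; the dataset and hyperparameters are purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
models = []
for _ in range(n_models):
    # 1-2. Randomly sample (with replacement) and fit one classifier per sample
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# 3. Run every classifier on the testing sample
votes = np.stack([m.predict(X_test) for m in models])  # (n_models, n_test)

# 4. Majority vote determines the final classification (binary labels here)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("bagged accuracy:", (majority == y_test).mean())
```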
ENSEMBLE METHODS: Bagging with MetaCost
Used when each model can output probability estimates
Probability estimates used to obtain expected cost of each prediction
Relabels training instances with the class that minimizes the expected cost
Learns a new classifier on the relabeled data
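A rough sketch of the MetaCost relabeling idea, reusing the bagging setup from the previous sketch; the 2-class cost matrix is an assumption for illustration (Domingos's MetaCost additionally refines how the probability estimates are obtained):

```python
from sklearn.ensemble import BaggingClassifier

# C[i, j] = cost of predicting class i when the true class is j (illustrative)
C = np.array([[0.0, 1.0],
              [5.0, 0.0]])

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0).fit(X_train, y_train)

proba = bag.predict_proba(X_train)        # P(j | x) from the ensemble
expected_cost = proba @ C.T               # expected cost of each possible prediction
relabeled = expected_cost.argmin(axis=1)  # relabel to minimize expected cost

final = DecisionTreeClassifier().fit(X_train, relabeled)  # learn the new classifier
```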
ENSEMBLE METHODS: Random Forests
A modification of bagging applied to tree learners
Considers only a random subset of features at each split
Promotes tree diversity
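In scikit-learn this corresponds to RandomForestClassifier, where max_features controls the size of the random feature subset considered at each split; the values below are illustrative and reuse the train/test split from the bagging sketch:

```python
from sklearn.ensemble import RandomForestClassifier

# max_features="sqrt" considers only sqrt(n_features) candidate features
# per split, which is what promotes diversity among the trees
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0).fit(X_train, y_train)
print("forest accuracy:", rf.score(X_test, y_test))
```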
ENSEMBLE METHODS: Boosting
Seeks models that complement one another
Combines models of same type
New models are constructed to better handle instances misclassified by previous models, focusing on hard-to-classify examples
Uses weighted averaging, often adaptively
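AdaBoost is the classic instance; a minimal sketch (hyperparameters illustrative, data from the bagging sketch). Each new weak learner is fit to a reweighted training set that emphasizes previously misclassified instances, and the final prediction is a weighted vote:

```python
from sklearn.ensemble import AdaBoostClassifier

boost = AdaBoostClassifier(n_estimators=50, random_state=0)  # stumps by default
boost.fit(X_train, y_train)
print("boosted accuracy:", boost.score(X_test, y_test))
```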
ENSEMBLE METHODS: Stacked Generalization
Introduced by David Wolpert, 1992
Base ("level-0") algorithms are trained on the training set
A stacking ("level-1") algorithm uses the base algorithms' predictions as inputs
ENSEMBLE METHODS: Stacked Generalization
Employs j-fold cross-validation of the training set
Train and test each of the level-0 algorithms using the split training data to create the level-0 models
Test each model on each split to create level-1 data
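A minimal scikit-learn sketch of this procedure (the base learners, level-1 model, and cv value are illustrative choices): StackingClassifier fits the level-1 model on out-of-fold predictions of the level-0 models, which is exactly the j-fold scheme above:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),  # level-0
                ("svm", SVC(probability=True, random_state=0))],   # level-0
    final_estimator=LogisticRegression(),                          # level-1
    cv=5)                                                          # j-fold, j=5
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```

With probability=True the level-1 inputs are class probabilities rather than hard labels, in line with the Ting and Witten (1999) observation noted below.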
ENSEMBLE METHODS: Stacked Generalization
Can be used for both supervised and unsupervised learning
Best performers in Netflix competition were forms of stacked generalization
Can even create multiple levels of stacking ("level-2", etc.), sometimes called "stacked stacking"
Works best with class probabilities (Ting and Witten, 1999)
ENSEMBLE METHODS: Bayesian Model Combination
Built upon Bayes Model Averaging and Bayes Optimal Classifier
Bayes Optimal Classifier: an ensemble (using Bayes' rule) of all hypotheses in the hypothesis space
On average, it is the ideal ensemble
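For reference, the Bayes optimal classifier is commonly written as (with C the set of classes, H the hypothesis space, and T the training data):

```latex
y = \arg\max_{c_j \in C} \sum_{h_i \in H} P(c_j \mid h_i)\, P(T \mid h_i)\, P(h_i)
```

It is intractable in practice because it sums over every hypothesis in H.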
ENSEMBLE METHODS: Bayesian Model Combination
Bayes Model Averaging
Approximates the Bayes optimal classifier
Samples from the hypothesis space, e.g., via Monte Carlo sampling
Tends to promote overfitting
Performs worse in practice than simpler techniques (e.g., bagging)
ENSEMBLE METHODS: Bayesian Model Combination
Bayes Model Combination
A correction to Bayes Model Averaging
Samples over model weightings (i.e., whole ensembles) rather than individual models
Overcomes BMA's drawback of concentrating nearly all weight on a single model
Better performance than BMA or bagging
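A toy sketch of the BMC idea, assuming pre-trained models that expose predict_proba, a held-out validation set, and Dirichlet sampling of the weightings; all of these choices are illustrative assumptions, not part of the slides:

```python
import numpy as np

def bmc_predict_proba(models, X_val, y_val, X_test, n_samples=100, seed=0):
    """Average test predictions over sampled model *weightings*, each
    weighting scored by its likelihood on the validation set."""
    rng = np.random.default_rng(seed)
    P_val = np.stack([m.predict_proba(X_val) for m in models])    # (M, n_val, K)
    P_test = np.stack([m.predict_proba(X_test) for m in models])  # (M, n_test, K)
    logliks, mixes = [], []
    for _ in range(n_samples):
        w = rng.dirichlet(np.ones(len(models)))     # one sampled ensemble weighting
        mix_val = np.tensordot(w, P_val, axes=1)    # (n_val, K)
        # log-likelihood of the validation labels under this weighting
        logliks.append(np.log(mix_val[np.arange(len(y_val)), y_val] + 1e-12).sum())
        mixes.append(np.tensordot(w, P_test, axes=1))
    logliks = np.array(logliks)
    post = np.exp(logliks - logliks.max())          # unnormalized posterior weights
    post /= post.sum()
    return np.tensordot(post, np.stack(mixes), axes=1)  # (n_test, K)
```

The returned matrix holds class probabilities per test instance; argmax over the last axis gives the predicted class.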