TRANSCRIPT
Ensemble Methods
“No free lunch theorem” Wolpert and Macready 1995
Solution search also involves searching for learners:
Different algorithms
Different parameters
Different input representations/features
Different data
Base learner
Diversity over accuracy
Model combination
Voting
Bagging
Boosting
Cascading
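The simplest of these combiners, voting, can be sketched as follows (a minimal illustration, not tied to any particular library):

```python
from collections import Counter

# Majority voting: each base learner casts one vote and the most
# common prediction wins. Ties are broken by first occurrence.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["cat", "dog", "cat"]))  # prints "cat"
```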
Data set = [1,2,3,4,5,6,7,8,9,10]
Samples (drawn with replacement):
Input to learner 1 = [10,2,5,10,3]
Input to learner 2 = [4,5,2,7,6,3]
Input to learner 3 = [8,8,4,9,1]
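The sampling above is bootstrap sampling: each learner's input is drawn with replacement from the same data set, which is why duplicates (like the two 10s for learner 1) appear. A minimal sketch:

```python
import random

random.seed(0)  # fixed seed only to make the illustration reproducible
dataset = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def bootstrap_sample(data, k):
    # Draw k examples uniformly at random, with replacement.
    return [random.choice(data) for _ in range(k)]

# One bootstrap sample per base learner.
samples = [bootstrap_sample(dataset, len(dataset)) for _ in range(3)]
```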
Create complementary learners
Train successive learners on the mistakes of predecessors
Weak learners combine to a strong learner
AdaBoost – Adaptive Boosting
Allows for a smaller training set
Simple, binary classifiers
Modify probability of drawing examples from a training set based on errors
α1 = ½ ln((1 − error) / error)

With error = 0.33:
α1 = ½ ln((1 − 0.33) / 0.33) ≈ 0.35
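The α computation and the example reweighting it drives can be sketched as follows (a simplified fragment of the AdaBoost weight update, not a full implementation):

```python
import math

def alpha(error):
    # Vote weight of a learner with the given weighted error rate.
    return 0.5 * math.log((1 - error) / error)

def reweight(weights, correct, a):
    # Scale up misclassified examples (exp(+a)), scale down correct
    # ones (exp(-a)), then renormalize so the weights sum to 1.
    w = [wi * math.exp(-a if c else a) for wi, c in zip(weights, correct)]
    z = sum(w)
    return [wi / z for wi in w]

a1 = alpha(0.33)  # the slide's example: roughly 0.35
```

The renormalization step is what makes the weights a valid probability distribution for drawing the next learner's training examples.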
Sequence classifiers by complexity
Use classifier j+1 if classifier j doesn’t meet a confidence threshold
Train cascading classifiers on instances the previous classifier is not confident about
Most examples classified quickly, harder ones passed to more expensive classifiers
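The cascade logic above can be sketched with hypothetical classifiers that each return a label and a confidence (names and threshold are illustrative):

```python
# Each classifier is a function x -> (label, confidence); the list is
# ordered from cheapest to most expensive.
def cascade_predict(classifiers, x, threshold=0.9):
    for classify in classifiers:
        label, confidence = classify(x)
        if confidence >= threshold:
            return label  # confident enough: stop early
    return label  # fall back to the last (most expensive) answer

# Toy cascade: a cheap rule that is only sure about large inputs,
# backed by an expensive classifier that is always confident.
cheap = lambda x: ("big" if x > 100 else "small", 0.95 if x > 100 else 0.5)
expensive = lambda x: ("big" if x > 10 else "small", 1.0)
```

Easy inputs never reach the expensive classifier, which is the point: most examples are classified quickly and only the uncertain ones pay the full cost.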
Boosting and Cascading
Object detection/tracking
Collaborative filtering
Neural networks
Optical character recognition
Biometrics
Data mining
Ensemble methods are proven effective, but why?