TRANSCRIPT
CSC 411: Introduction to Machine Learning
Lecture 5: Ensembles II
Mengye Ren and Matthew MacKay
University of Toronto
UofT CSC411 2019 Winter Lecture 02 1 / 24
Boosting
Recall that an ensemble is a set of predictors whose individual decisions are combined in some way to classify new examples.
(Previous lecture) Bagging: Train classifiers independently on random subsets of the training data.
(This lecture) Boosting: Train classifiers sequentially, each time focusing on training data points that were previously misclassified.
Let's start with the concepts of weighted training sets and weak learner/classifier (or base classifiers).
Weighted Training set

The misclassification error $\sum_{n=1}^N \mathbb{I}[h(x^{(n)}) \neq t^{(n)}]$ weights each training example equally.

Key idea: can learn a classifier using different costs (aka weights) for examples.
- The classifier "tries harder" on examples with higher cost.

Change the cost function:

$$\sum_{n=1}^N \mathbb{I}[h(x^{(n)}) \neq t^{(n)}] \quad \text{becomes} \quad \sum_{n=1}^N w^{(n)} \, \mathbb{I}[h(x^{(n)}) \neq t^{(n)}]$$

Usually we require each $w^{(n)} > 0$ and $\sum_{n=1}^N w^{(n)} = 1$.
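The weighted cost above can be checked numerically; a minimal NumPy sketch (the toy labels and weights below are made up for illustration):

```python
import numpy as np

# Hypothetical toy data: predictions h(x) and targets t, both in {-1, +1}.
h = np.array([+1, -1, +1, +1, -1])
t = np.array([+1, +1, +1, -1, -1])

# Unweighted misclassification count: every example counts equally.
unweighted = np.sum(h != t)

# Weighted version: each w^(n) > 0 and sum(w) = 1, so examples with
# larger weight (here the 2nd and 4th) cost more when misclassified.
w = np.array([0.1, 0.3, 0.1, 0.4, 0.1])
weighted = np.sum(w * (h != t))

print(unweighted)  # 2
print(weighted)    # 0.3 + 0.4 = 0.7
```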
Weak Learner/Classifier

(Informal) A weak learner is a learning algorithm that outputs a hypothesis (e.g., a classifier) that performs slightly better than chance, e.g., it predicts the correct label with probability $0.5 + \epsilon$ in the binary-label case.
- For a weighted training set, it gets slightly less than 0.5 weighted error.

We are interested in weak learners that are computationally efficient:
- Decision trees
- Even simpler, a decision stump: a decision tree with only a single split

[The formal definition of weak learnability has quantifiers such as "for any distribution over data" and the requirement that its guarantee holds only probabilistically.]
Weak Classifiers

These weak classifiers, which are decision stumps, consist of the set of horizontal and vertical half spaces.

[Figure: vertical half spaces and horizontal half spaces]
Weak Classifiers

[Figure: vertical half spaces and horizontal half spaces]

A single weak classifier is not capable of making the training error very small. It only performs slightly better than chance, i.e., the error of classifier $h$ according to the given weights $\{w^{(1)}, \ldots, w^{(N)}\}$ (with $\sum_{n=1}^N w^{(n)} = 1$ and $w^{(n)} \geq 0$),

$$\text{err} = \sum_{n=1}^N w^{(n)} \, \mathbb{I}[h(x^{(n)}) \neq t^{(n)}],$$

is at most $\frac{1}{2} - \gamma$ for some $\gamma > 0$.

Can we combine a set of weak classifiers in order to make a better ensemble of classifiers?
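A decision stump minimizing the weighted error can be found by brute force over features, thresholds, and orientations. This is a sketch on hypothetical 2-D data, not the lecture's reference code:

```python
import numpy as np

def fit_stump(X, t, w):
    """Brute-force decision stump: pick the (feature, threshold, sign)
    with the lowest weighted misclassification error."""
    best = None
    for j in range(X.shape[1]):                  # each feature (axis)
        for thresh in np.unique(X[:, j]):        # each candidate split
            for sign in (+1, -1):                # each orientation
                pred = np.where(X[:, j] > thresh, sign, -sign)
                err = np.sum(w * (pred != t)) / np.sum(w)
                if best is None or err < best[0]:
                    best = (err, j, thresh, sign)
    return best

# Hypothetical 2-D data: the class depends only on the first coordinate,
# so a vertical half space separates it perfectly.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([-1, +1, -1, +1])
w = np.full(4, 0.25)
err, j, thresh, sign = fit_stump(X, t, w)
print(err, j)  # 0.0 0
```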
AdaBoost (Adaptive Boosting)

Boosting: Train classifiers sequentially, each time assigning higher weight to training data points that were previously misclassified.

Key steps of AdaBoost:
1. At each iteration we re-weight the training samples by assigning larger weights to samples (i.e., data points) that were classified incorrectly.
2. We train a new weak classifier based on the re-weighted samples.
3. We add this weak classifier to the ensemble of classifiers. This is our new classifier.
4. We repeat the process many times.

The weak learner needs to minimize weighted error.

AdaBoost reduces bias by making each classifier focus on previous mistakes.
AdaBoost Algorithm

Input: data $\mathcal{D}_N = \{x^{(n)}, t^{(n)}\}_{n=1}^N$; weak classifier WeakLearn (a classification procedure that returns a classifier from the base hypothesis space $\mathcal{H}$, with $h: x \to \{-1, +1\}$ for $h \in \mathcal{H}$); number of iterations $T$
Output: classifier $H(x)$

Initialize sample weights: $w^{(n)} = \frac{1}{N}$ for $n = 1, \ldots, N$

For $t = 1, \ldots, T$:
- Fit a classifier to the data using the weighted samples ($h_t \leftarrow \text{WeakLearn}(\mathcal{D}_N, w)$), e.g.,
  $$h_t \leftarrow \operatorname*{argmin}_{h \in \mathcal{H}} \sum_{n=1}^N w^{(n)} \, \mathbb{I}\{h(x^{(n)}) \neq t^{(n)}\}$$
- Compute the weighted error $\text{err}_t = \dfrac{\sum_{n=1}^N w^{(n)} \, \mathbb{I}\{h_t(x^{(n)}) \neq t^{(n)}\}}{\sum_{n=1}^N w^{(n)}}$
- Compute the classifier coefficient $\alpha_t = \frac{1}{2} \log \frac{1 - \text{err}_t}{\text{err}_t}$
- Update the data weights
  $$w^{(n)} \leftarrow w^{(n)} \exp\!\left(-\alpha_t \, t^{(n)} h_t(x^{(n)})\right) \quad \left[\equiv w^{(n)} \exp\!\left(2 \alpha_t \, \mathbb{I}\{h_t(x^{(n)}) \neq t^{(n)}\}\right)\right]$$

Return $H(x) = \operatorname{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$
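The algorithm above translates almost line-for-line into NumPy. This is a sketch under stated assumptions, not a reference implementation: `weak_learn(X, t, w)` is an assumed callable returning a classifier, and the error is clipped to guard the logarithm when a weak learner happens to be perfect on the weighted data.

```python
import numpy as np

def adaboost(X, t, weak_learn, T):
    """Sketch of the AdaBoost loop. `weak_learn(X, t, w)` is assumed to
    return a callable h with h(X) in {-1, +1}; labels t are in {-1, +1}."""
    N = len(t)
    w = np.full(N, 1.0 / N)                        # w^(n) = 1/N
    hs, alphas = [], []
    for _ in range(T):
        h = weak_learn(X, t, w)                    # fit to weighted samples
        pred = h(X)
        err = np.sum(w * (pred != t)) / np.sum(w)  # weighted error
        err = np.clip(err, 1e-10, 1 - 1e-10)       # guard the log below
        alpha = 0.5 * np.log((1 - err) / err)      # classifier coefficient
        w = w * np.exp(-alpha * t * pred)          # re-weight the data
        hs.append(h)
        alphas.append(alpha)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hs)))

# Usage with a trivial (hypothetical) weak learner on 1-D data:
X = np.array([-2.0, -1.0, 1.0, 2.0])
t = np.array([-1, -1, +1, +1])
threshold_learner = lambda X, t, w: (lambda Z: np.where(Z > 0, 1, -1))
H = adaboost(X, t, threshold_learner, T=3)
print(H(X))  # agrees with t on all four points
```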
Weighting Intuition

Recall: $H(x) = \operatorname{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$, where $\alpha_t = \frac{1}{2} \log \frac{1 - \text{err}_t}{\text{err}_t}$

Weak classifiers which get lower weighted error get more weight in the final classifier.

Also: $w^{(n)} \leftarrow w^{(n)} \exp\left(2 \alpha_t \, \mathbb{I}\{h_t(x^{(n)}) \neq t^{(n)}\}\right)$
- If $\text{err}_t \approx 0$, $\alpha_t$ is high, so misclassified examples get much more attention.
- If $\text{err}_t \approx 0.5$, $\alpha_t$ is low, so misclassified examples are not emphasized.
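This intuition can be tabulated directly. The `boost` column (a made-up name for this sketch) is $\exp(2\alpha_t)$, the factor by which a misclassified example's weight grows in the update above:

```python
import numpy as np

# How alpha_t reacts to the weighted error err_t: near-perfect weak
# classifiers get large alpha; near-chance ones get alpha close to 0.
for err in [0.01, 0.1, 0.3, 0.49]:
    alpha = 0.5 * np.log((1 - err) / err)
    print(f"err={err:.2f}  alpha={alpha:.2f}  boost={np.exp(2 * alpha):.1f}x")
# Note exp(2*alpha) = (1-err)/err, so err=0.01 boosts mistakes 99x
# while err=0.49 barely changes their weight at all.
```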
AdaBoost Example
[Figure: training data]
[Slide credit: Verma & Thrun]
AdaBoost Example

Round 1

$$w = \left(\tfrac{1}{10}, \ldots, \tfrac{1}{10}\right) \;\Rightarrow\; \text{Train a classifier (using } w\text{)} \;\Rightarrow\; \text{err}_1 = \frac{\sum_{n=1}^{10} w^{(n)} \, \mathbb{I}[h_1(x^{(n)}) \neq t^{(n)}]}{\sum_{n=1}^{10} w^{(n)}} = \frac{3}{10}$$

$$\Rightarrow\; \alpha_1 = \frac{1}{2} \log \frac{1 - \text{err}_1}{\text{err}_1} = \frac{1}{2} \log\left(\frac{1}{0.3} - 1\right) \approx 0.42 \;\Rightarrow\; H(x) = \operatorname{sign}\left(\alpha_1 h_1(x)\right)$$
[Slide credit: Verma & Thrun]
AdaBoost Example

Round 2

$$w \leftarrow \text{new weights} \;\Rightarrow\; \text{Train a classifier (using } w\text{)} \;\Rightarrow\; \text{err}_2 = \frac{\sum_{n=1}^{10} w^{(n)} \, \mathbb{I}\{h_2(x^{(n)}) \neq t^{(n)}\}}{\sum_{n=1}^{10} w^{(n)}} = 0.21$$

$$\Rightarrow\; \alpha_2 = \frac{1}{2} \log \frac{1 - \text{err}_2}{\text{err}_2} = \frac{1}{2} \log\left(\frac{1}{0.21} - 1\right) \approx 0.66 \;\Rightarrow\; H(x) = \operatorname{sign}\left(\alpha_1 h_1(x) + \alpha_2 h_2(x)\right)$$

[Slide credit: Verma & Thrun]
AdaBoost Example

Round 3

$$w \leftarrow \text{new weights} \;\Rightarrow\; \text{Train a classifier (using } w\text{)} \;\Rightarrow\; \text{err}_3 = \frac{\sum_{n=1}^{10} w^{(n)} \, \mathbb{I}\{h_3(x^{(n)}) \neq t^{(n)}\}}{\sum_{n=1}^{10} w^{(n)}} = 0.14$$

$$\Rightarrow\; \alpha_3 = \frac{1}{2} \log \frac{1 - \text{err}_3}{\text{err}_3} = \frac{1}{2} \log\left(\frac{1}{0.14} - 1\right) \approx 0.91 \;\Rightarrow\; H(x) = \operatorname{sign}\left(\alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x)\right)$$
[Slide credit: Verma & Thrun]
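The coefficients from the three rounds above can be re-derived from the reported errors alone:

```python
import numpy as np

# Reproducing the classifier coefficients from rounds 1-3:
alphas = {t: 0.5 * np.log(1 / err - 1)   # same as (1/2) log((1-err)/err)
          for t, err in [(1, 0.30), (2, 0.21), (3, 0.14)]}
for t, a in alphas.items():
    print(f"alpha_{t} = {a:.2f}")
# alpha_1 = 0.42, alpha_2 = 0.66, alpha_3 = 0.91
```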
AdaBoost Example
[Figure: final classifier]
[Slide credit: Verma & Thrun]
AdaBoost Algorithm
[Figure: AdaBoost schematic. The samples train $h_1$; the re-weighted samples train $h_2$, then $h_3$, $\ldots$, then $h_T$, and the ensemble combines them as]

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right), \qquad \text{err}_t = \frac{\sum_{n=1}^N w^{(n)} \, \mathbb{I}\{h_t(x^{(n)}) \neq t^{(n)}\}}{\sum_{n=1}^N w^{(n)}}, \qquad \alpha_t = \frac{1}{2} \log\left(\frac{1 - \text{err}_t}{\text{err}_t}\right)$$
sha1_base64="o3z3ZC8kU1t+nm6vA42CJrkmjUo=">AAACk3icdVFNaxRBEO0dv7Lr10bx5KVxUeLBZSYICiKExIMXIYKbBDLLUtNbM9OkP4buGnFp5pf5Szx61T9h72YOZkMKGl69Vw+qXxWNkp7S9NcguXX7zt17O8PR/QcPHz0e7z458bZ1AmfCKuvOCvCopMEZSVJ41jgEXSg8LS6O1vrpd3ReWvONVg3ONVRGllIARWoxng2HOaimhgXxjzwvHYiQdWG/46OoKFvlCkva64U3PCf8QQGd66KhC1s9z52sanq9GE/Sabopfh1kPZiwvo4Xu4NX+dKKVqMhocD78yxtaB7AkRQKu1HeemxAXECF5xEa0OjnYfP/jr+MzJKX1sVniG/Y/x0BtPcrXcRJDVT7bW1N3qRRrburnKqsk5GW4gZha1sq38+DNE1LaMTlsmWrOFm+PghfSoeC1CoCENEvBRc1xLgpnm2Ub4zhyGoNZum7mGy2neN1cLI/zdJp9vXt5OCwz3iHPWcv2B7L2Dt2wD6zYzZjgv1kv9kf9jd5lnxIDpNPl6PJoPc8ZVcq+fIPOUDMCw==</latexit>
wi wi exp⇣2↵tI{ht(x
(i)) 6= t(i)}⌘
<latexit sha1_base64="E7ey2D1vUl4iSw4a8mCWl+cdJ5s=">AAACl3icdZHbahsxEIbl7SGpe4jTXpXeiJoW+8bshkJ719BAyF0TqJOUyF208qxXRIeNNNvELPtseY4+QG7TV6i89kXjkAHBP9/oh+GfrFTSYxz/6USPHj95urH5rPv8xctXW73t18feVk7AWFhl3WnGPShpYIwSFZyWDrjOFJxk53uL+clvcF5a8wPnJUw0nxmZS8ExoLT38zKVlCnIkTtnL2nbwlW5ZAO6w7gqC54iZd8N1LRIA2RZXl81v+qBHDZDygxcUFx2lDaUOTkrcJj2+vEoboveF8lK9MmqDtPtzkc2taLSYFAo7v1ZEpc4qblDKRQ0XVZ5KLk45zM4C9JwDX5Stxk09EMgU5pbF55B2tL/HTXX3s91Fn5qjoVfny3gQzMsdHOXqZl1MmApHhisbYv5l0ktTVkhGLFcNq8URUsXR6FT6UCgmgfBRfBLQUXBHRcYTtdlrbHes1pzM/VNSDZZz/G+ON4ZJfEoOfrU3/22yniTvCPvyYAk5DPZJQfkkIyJINfkhtySv9Hb6Gu0Hx0sv0adlecNuVPR0T9Ty80k</latexit><latexit sha1_base64="E7ey2D1vUl4iSw4a8mCWl+cdJ5s=">AAACl3icdZHbahsxEIbl7SGpe4jTXpXeiJoW+8bshkJ719BAyF0TqJOUyF208qxXRIeNNNvELPtseY4+QG7TV6i89kXjkAHBP9/oh+GfrFTSYxz/6USPHj95urH5rPv8xctXW73t18feVk7AWFhl3WnGPShpYIwSFZyWDrjOFJxk53uL+clvcF5a8wPnJUw0nxmZS8ExoLT38zKVlCnIkTtnL2nbwlW5ZAO6w7gqC54iZd8N1LRIA2RZXl81v+qBHDZDygxcUFx2lDaUOTkrcJj2+vEoboveF8lK9MmqDtPtzkc2taLSYFAo7v1ZEpc4qblDKRQ0XVZ5KLk45zM4C9JwDX5Stxk09EMgU5pbF55B2tL/HTXX3s91Fn5qjoVfny3gQzMsdHOXqZl1MmApHhisbYv5l0ktTVkhGLFcNq8URUsXR6FT6UCgmgfBRfBLQUXBHRcYTtdlrbHes1pzM/VNSDZZz/G+ON4ZJfEoOfrU3/22yniTvCPvyYAk5DPZJQfkkIyJINfkhtySv9Hb6Gu0Hx0sv0adlecNuVPR0T9Ty80k</latexit><latexit sha1_base64="E7ey2D1vUl4iSw4a8mCWl+cdJ5s=">AAACl3icdZHbahsxEIbl7SGpe4jTXpXeiJoW+8bshkJ719BAyF0TqJOUyF208qxXRIeNNNvELPtseY4+QG7TV6i89kXjkAHBP9/oh+GfrFTSYxz/6USPHj95urH5rPv8xctXW73t18feVk7AWFhl3WnGPShpYIwSFZyWDrjOFJxk53uL+clvcF5a8wPnJUw0nxmZS8ExoLT38zKVlCnIkTtnL2nbwlW5ZAO6w7gqC54iZd8N1LRIA2RZXl81v+qBHDZDygxcUFx2lDaUOTkrcJj2+vEoboveF8lK9MmqDtPtzkc2taLSYFAo7v1ZEpc4qblDKRQ0XVZ5KLk45zM4C9JwDX5Stxk09EMgU5pbF55B2tL/HTXX3s91Fn5qjoVfny3gQzMsdHOXqZl1MmApHhisbYv5l0ktTVkhGLFcNq8URUsXR6FT6UCgmgfBRfBLQUXBHRcYTtdlrbHes1pzM/VNSDZZz/G+ON4ZJfEoOfrU3/22yniTvCPvyYAk5DPZJQfkkIyJINfkhtySv9Hb6Gu0Hx0sv0adlecNuVPR0T9Ty80k</latexit><latexit 
sha1_base64="E7ey2D1vUl4iSw4a8mCWl+cdJ5s=">AAACl3icdZHbahsxEIbl7SGpe4jTXpXeiJoW+8bshkJ719BAyF0TqJOUyF208qxXRIeNNNvELPtseY4+QG7TV6i89kXjkAHBP9/oh+GfrFTSYxz/6USPHj95urH5rPv8xctXW73t18feVk7AWFhl3WnGPShpYIwSFZyWDrjOFJxk53uL+clvcF5a8wPnJUw0nxmZS8ExoLT38zKVlCnIkTtnL2nbwlW5ZAO6w7gqC54iZd8N1LRIA2RZXl81v+qBHDZDygxcUFx2lDaUOTkrcJj2+vEoboveF8lK9MmqDtPtzkc2taLSYFAo7v1ZEpc4qblDKRQ0XVZ5KLk45zM4C9JwDX5Stxk09EMgU5pbF55B2tL/HTXX3s91Fn5qjoVfny3gQzMsdHOXqZl1MmApHhisbYv5l0ktTVkhGLFcNq8URUsXR6FT6UCgmgfBRfBLQUXBHRcYTtdlrbHes1pzM/VNSDZZz/G+ON4ZJfEoOfrU3/22yniTvCPvyYAk5DPZJQfkkIyJINfkhtySv9Hb6Gu0Hx0sv0adlecNuVPR0T9Ty80k</latexit>
UofT CSC411 2019 Winter Lecture 02 15 / 24
AdaBoost Example
Each figure shows the number m of base learners trained so far, the decision of the most recent learner (dashed black), and the boundary of the ensemble (green)
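The update rules above can be sketched in a few lines with brute-force decision stumps as the weak learner. This is an illustration only, not the lecture's code, run on a made-up 1-D dataset; the stump search and tie-breaking are arbitrary choices:

```python
import math

# Toy 1-D dataset: labels t in {-1, +1}. The pattern (+ + + - - - + +)
# is not separable by any single stump, so boosting has work to do.
X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
T = [+1, +1, +1, -1, -1, -1, +1, +1]
N = len(X)

def predict(x, thresh, sign):
    """Decision stump: sign if x > thresh, else -sign."""
    return sign if x > thresh else -sign

def fit_stump(w):
    """Brute-force weak learner: minimize the weighted 0-1 error."""
    best = None
    for thresh in X:
        for sign in (+1, -1):
            err = sum(wi for wi, x, t in zip(w, X, T)
                      if predict(x, thresh, sign) != t)
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    return best

w = [1.0 / N] * N          # uniform initial weights
ensemble = []              # list of (alpha_t, thresh, sign)
for _ in range(10):
    err, thresh, sign = fit_stump(w)   # already normalized: sum(w) == 1
    if err == 0 or err >= 0.5:
        break                          # no longer a useful weak learner
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((alpha, thresh, sign))
    # Up-weight misclassified examples by exp(2 * alpha), then renormalize.
    w = [wi * (math.exp(2 * alpha) if predict(x, thresh, sign) != t else 1.0)
         for wi, x, t in zip(w, X, T)]
    z = sum(w)
    w = [wi / z for wi in w]

def H(x):
    """Final classifier: sign of the alpha-weighted vote."""
    score = sum(a * predict(x, th, s) for a, th, s in ensemble)
    return +1 if score >= 0 else -1

train_err = sum(H(x) != t for x, t in zip(X, T)) / N
print("rounds used:", len(ensemble), "training error:", train_err)
```

On this dataset the best single stump has training error 0.25, and the boosted ensemble does at least as well, typically driving the training error to zero within a few rounds.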
UofT CSC411 2019 Winter Lecture 02 16 / 24
AdaBoost Minimizes the Training Error

Theorem. Assume that at each iteration of AdaBoost the WeakLearn returns a hypothesis with error err_t ≤ 1/2 − γ for all t = 1, . . . , T with γ > 0. The training error of the output hypothesis H(x) = sign(∑_{t=1}^T α_t h_t(x)) is at most

L_N(H) = (1/N) ∑_{i=1}^N I{H(x^(i)) ≠ t^(i)} ≤ exp(−2γ²T).

This is under the simplifying assumption that each weak learner is γ-better than a random predictor.

Analyzing the convergence of AdaBoost is generally difficult.
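To get a feel for the bound (an illustration with an arbitrary choice of γ, not part of the lecture): with γ = 0.1, exp(−2γ²T) drops below 1% once T ≥ ln(100)/(2γ²):

```python
import math

def adaboost_training_error_bound(gamma, T):
    """The upper bound exp(-2 * gamma**2 * T) from the theorem."""
    return math.exp(-2 * gamma ** 2 * T)

gamma = 0.1
# Solve exp(-2*gamma^2*T) <= 0.01 for T: T >= ln(100) / (2*gamma^2).
T_needed = math.ceil(math.log(100) / (2 * gamma ** 2))
print(T_needed)                                        # 231 rounds
print(adaboost_training_error_bound(gamma, T_needed))  # just under 0.01
```

The bound decays exponentially in T, so even a barely-better-than-random weak learner eventually drives the training error to zero, just slowly for small γ.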
Generalization Error of AdaBoost

AdaBoost's training error (loss) converges to zero. What about the test error of H?

As we add more weak classifiers, the overall classifier H becomes more "complex".

We expect more complex classifiers to overfit.

If one runs AdaBoost long enough, it can in fact overfit.

(Figure: "Overfitting Can Happen": test and train error vs. # rounds, boosting "stumps" on the heart-disease dataset; the test error eventually increases.)

• but often doesn't...
Generalization Error of AdaBoost

But often it does not!

Sometimes the test error decreases even after the training error is zero!

(Figure: "Actual Typical Run": test and train error vs. number of rounds T, boosting C4.5 on the "letter" dataset.)

• test error does not increase, even after 1000 rounds
• (total size > 2,000,000 nodes)
• test error continues to drop even after training error is zero!

# rounds      5     100    1000
train error   0.0   0.0    0.0
test error    8.4   3.3    3.1

• Occam's razor wrongly predicts "simpler" rule is better

How does that happen?

We will provide an alternative viewpoint on AdaBoost later in the course.

[Slide credit: Robert Schapire's slides, http://www.cs.princeton.edu/courses/archive/spring12/cos598A/schedule.html ]
AdaBoost for Face Detection

Famous application of boosting: detecting faces in images

Viola and Jones created a very fast face detector that can be scanned across a large image to find the faces.

A few twists on the standard algorithm

I Change loss function for weak learners: false positives less costly than misses

I Smart way to do inference in real-time (in 2001 hardware)
AdaBoost for Face Recognition

The base classifier/weak learner just compares the total intensity in two rectangular pieces of the image and classifies based on a comparison of this difference to some threshold.

I There is a neat trick for computing the total intensity in a rectangle in a few operations.

I So it is easy to evaluate a huge number of base classifiers, and they are very fast at runtime.

I The algorithm adds classifiers greedily based on their quality on the weighted training cases.

I Each classifier uses just one feature.
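The "neat trick" is the integral image (summed-area table): after one pass over the image, the total intensity of any axis-aligned rectangle can be read off with four lookups. A minimal sketch on a made-up 3x3 image (the function names here are illustrative, not from any particular library):

```python
def integral_image(img):
    """Summed-area table: ii[r][c] = sum of img[0..r-1][0..c-1].

    The extra row and column of zeros keeps the lookup formula branch-free.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Total intensity of img[top..bottom][left..right], in 4 lookups."""
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))   # 45: sum of the whole image
print(rect_sum(ii, 1, 1, 2, 2))   # 28: 5 + 6 + 8 + 9
```

Since each rectangle costs four lookups regardless of its size, a Viola–Jones-style feature (the difference of two rectangle sums) costs a constant number of operations, which is what makes evaluating a huge number of base classifiers feasible.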
Summary
Boosting reduces bias by generating an ensemble of weak classifiers.

Each classifier is trained to reduce the errors of the previous ensemble.

It is quite resilient to overfitting, though it can overfit.

We will later provide a loss-minimization viewpoint on AdaBoost, which allows us to derive other boosting algorithms for regression, ranking, etc.
Ensembles Recap
Ensembles combine classifiers to improve performance
Boosting

I Reduces bias
I Increases variance (large ensemble can cause overfitting)
I Sequential
I High dependency between ensemble elements

Bagging

I Reduces variance (large ensemble can't cause overfitting)
I Bias is not changed (much)
I Parallel
I Want to minimize correlation between ensemble elements.
Next Lecture: Linear Regression