Principled Asymmetric Boosting Approaches to Rapid Training and Classification in Face Detection
Principled Asymmetric Boosting Approaches to Rapid Training and Classification in Face Detection

presented by Minh-Tri Pham, Ph.D. Candidate and Research Associate, Nanyang Technological University, Singapore
Outline
• Motivation
• Contributions
  – Automatic Selection of Asymmetric Goal
  – Fast Weak Classifier Learning
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Problem
Application
• Face recognition
• 3D face reconstruction
• Camera auto-focusing
• Windows face logon (Lenovo Veriface Technology)
Appearance-based Approach
• Scan the image with a probe window patch (x, y, s)
  – at different positions and scales
  – Binary-classify each patch as face or non-face
• Desired output: the states (x, y, s) containing a face
• Most popular approach:
  – Viola-Jones '01-'04, Li et al. '02, Wu et al. '04, Brubaker et al. '04, Liu et al. '04, Xiao et al. '04, Bourdev-Brandt '05, Mita et al. '05, Huang et al. '05-'07, Wu et al. '05, Grabner et al. '05-'07, and many more
Appearance-based Approach
• Statistics:
  – 6,950,440 patches in a 320x240 image
  – P(face) < 10⁻⁵
• Key requirement: a very fast classifier
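The patch count depends on the scan stride and scale step, which the slide does not specify; a minimal sketch of the enumeration, with hypothetical parameters (base window 24, scale step 1.25, stride 1), looks like:

```python
def count_patches(width, height, base=24, scale_step=1.25, stride=1):
    """Count sliding-window patches over all scales.

    The base window size, scale step, and stride are hypothetical;
    the slide's exact scan parameters are not given.
    """
    total = 0
    size = base
    while size <= min(width, height):
        nx = (width - size) // stride + 1   # horizontal positions
        ny = (height - size) // stride + 1  # vertical positions
        total += nx * ny
        size = int(size * scale_step)       # next, coarser scale
    return total

count_patches(320, 240)
```

With a finer scale step and denser strides the count quickly grows toward the millions quoted above, which is why the per-patch classifier must be extremely cheap.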
A very fast classifier
• Cascade of non-face rejectors:
[Diagram: F1 → pass → F2 → pass → … → pass → FN → pass → face; each stage can reject → non-face]
A very fast classifier
• Cascade of non-face rejectors:
[Diagram: F1 → pass → F2 → pass → … → pass → FN → pass → face; each stage can reject → non-face]
• F1, F2, …, FN: asymmetric classifiers
  – FRR(Fk) ≈ 0
  – FAR(Fk) as small as possible (e.g. 0.5-0.8)
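The cascade evaluation above can be sketched as a short function; the stage classifiers here are placeholder callables with hypothetical thresholds, not trained rejectors:

```python
def cascade_classify(patch, stages):
    """Run rejector stages F1..FN in order; the first rejection stops evaluation."""
    for stage in stages:
        if not stage(patch):      # this stage rejects the patch
            return "non-face"
    return "face"                 # the patch passed every stage

# toy stages: each passes a patch whose score exceeds its threshold (hypothetical)
stages = [lambda score, t=t: score > t for t in (0.1, 0.3, 0.5)]
```

Because most patches are rejected by the first cheap stages, the average cost per patch stays far below the cost of evaluating all N stages.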
Non-face Rejector
• A strong combination of weak classifiers:
[Diagram: F1 = f1,1 + f1,2 + … + f1,K; if the sum > θ then pass, else reject]
  – f1,1, f1,2, …, f1,K: weak classifiers
  – θ: threshold
Boosting
• A sequence of stages; in each stage, a Weak Classifier Learner is trained on weighted examples.
• Between stages, wrongly classified examples gain weight; correctly classified examples lose weight.
[Figure: positive and negative examples across Stage 1 (Weak Classifier Learner 1) and Stage 2 (Weak Classifier Learner 2), split into wrongly and correctly classified groups]
Asymmetric Boosting
• Same stage-wise learning, but weight positives λ times more than negatives.
[Figure: positive and negative examples across Stage 1 (Weak Classifier Learner 1) and Stage 2 (Weak Classifier Learner 2)]
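A minimal sketch of the weighting idea, with the asymmetry parameter written as `lam` (λ); the update rule below is standard discrete-AdaBoost-style reweighting, used for illustration rather than the thesis's exact scheme:

```python
import math

def asymmetric_boost_weights(labels, lam):
    """Initial weights: positives count lam times more than negatives (in total)."""
    n_pos = sum(1 for y in labels if y == +1)
    n_neg = len(labels) - n_pos
    return [lam / n_pos if y == +1 else 1.0 / n_neg for y in labels]

def update_weights(weights, labels, preds, alpha):
    """AdaBoost-style update: upweight mistakes, downweight hits, renormalize."""
    w = [wi * math.exp(-alpha * y * h) for wi, y, h in zip(weights, labels, preds)]
    z = sum(w)
    return [wi / z for wi in w]

labels = [+1, +1, -1, -1, -1]
w = asymmetric_boost_weights(labels, lam=5.0)
w = update_weights(w, labels, preds=[+1, -1, -1, -1, +1], alpha=0.5)
```

The λ-weighted initialization biases every subsequent stage toward keeping the false rejection rate of positives low, which is exactly what a cascade stage needs.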
Weak classifier
• Classify a Haar-like feature value
[Diagram: input patch → feature value v → classify v → score]
Main issues
• Requires too much intervention from experts
A very fast classifier
• Cascade of non-face rejectors:
[Diagram: F1 → pass → F2 → pass → … → pass → FN → pass → face; each stage can reject → non-face]
• F1, F2, …, FN: asymmetric classifiers
  – FRR(Fk) ≈ 0
  – FAR(Fk) as small as possible (e.g. 0.5-0.8)
• How to choose bounds for FRR(Fk) and FAR(Fk)?
Asymmetric Boosting
• Weight positives λ times more than negatives.
• How to choose λ?
[Figure: positive and negative examples across Stage 1 (Weak Classifier Learner 1) and Stage 2 (Weak Classifier Learner 2)]
Non-face Rejector
• A strong combination of weak classifiers:
[Diagram: F1 = f1,1 + f1,2 + … + f1,K; if the sum > θ then pass, else reject]
  – f1,1, f1,2, …, f1,K: weak classifiers
  – θ: threshold
• How to choose θ?
Main issues
• Requires too much intervention from experts
• Very long learning time
Weak classifier
• Classify a Haar-like feature value
[Diagram: input patch → feature value v → classify v → score]
• …10 minutes to learn a weak classifier
Main issues
• Requires too much intervention from experts
• Very long learning time
  – To learn a face detector (≈ 4,000 weak classifiers):
    4,000 × 10 minutes ≈ 1 month
• Only suitable for objects with small shape variance
Outline
• Motivation
• Contributions
  – Automatic Selection of Asymmetric Goal
  – Fast Weak Classifier Learning
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Detection with Multi-exit Asymmetric Boosting

CVPR'08 poster paper:
Minh-Tri Pham, Viet-Dung D. Hoang, and Tat-Jen Cham. Detection with Multi-exit Asymmetric Boosting. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, 2008.
• Won Travel Grant Award
Problem overview
• Common appearance-based approach:
[Diagram: F1 → pass → F2 → pass → … → pass → FN → pass → object; each stage can reject → non-object]
[Diagram: F1 = f1,1 + f1,2 + … + f1,K; if the sum > θ then pass, else reject]
  – F1, F2, …, FN: boosted classifiers
  – f1,1, f1,2, …, f1,K: weak classifiers
  – θ: threshold
Objective
• Find f1,1, f1,2, …, f1,K, and θ such that:
  – FAR(F1) ≤ α0
  – FRR(F1) ≤ β0
  – K is minimized (K is proportional to F1's evaluation time)

where

  F1(x) = sign( Σ_{i=1..K} f1,i(x) − θ )
Existing trends (1)

Idea
• For k from 1 until convergence:
  – Let F1(x) = sign( Σ_{i=1..k} f1,i(x) )
  – Learn a new weak classifier f1,k(x):
    f̂1,k = argmin_{f1,k} FAR(F1) + FRR(F1)
  – Adjust θ to see if we can achieve FAR(F1) ≤ α0 and FRR(F1) ≤ β0:
    • Break the loop if such a θ exists

Issues
• Weak classifiers are sub-optimal w.r.t. the training goal.
• Too many weak classifiers are required in practice.
Existing trends (2)

Idea
• For k from 1 until convergence:
  – Let F1(x) = sign( Σ_{i=1..k} f1,i(x) − θ )
  – Learn a new weak classifier f1,k(x):
    f̂1,k = argmin_{f1,k} FAR(F1) + λ FRR(F1)
  – Break the loop if FAR(F1) ≤ α0 and FRR(F1) ≤ β0

Pros
• Reduces FRR at the cost of increasing FAR – acceptable for cascades
• Fewer weak classifiers

Cons
• How to choose λ?
• Much longer training time

Solution to the cons
• Trial and error: choose λ such that K is minimized.
Our solution

Learn every weak classifier f1,k(x) using the same asymmetric goal:

  f̂1,k = argmin_{f1,k} FAR(F1) + λ FRR(F1),   where λ = α0/β0.

Why?
Because…
• Consider two desired bounds (or targets) for learning a boosted classifier FM(x):
  – Exact bound (1): FAR(FM) ≤ α0 and FRR(FM) ≤ β0
  – Conservative bound (2): FAR(FM) + (α0/β0) FRR(FM) ≤ α0
• (2) is more conservative than (1) because (2) ⇒ (1).
[ROC plots (FAR vs. FRR): operating points H1, H2, … move toward the exact and conservative bounds as weak classifiers are added; shown for λ = 1 and for λ = α0/β0]
• At λ = α0/β0, for every new weak classifier learned, the ROC operating point moves the fastest toward the conservative bound.
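The implication (2) ⇒ (1) is easy to verify numerically; the rates below are arbitrary sanity-check values, not results from the thesis:

```python
def satisfies_conservative(far, frr, a0, b0):
    """Conservative bound (2): FAR + (a0/b0)*FRR <= a0."""
    return far + (a0 / b0) * frr <= a0

def satisfies_exact(far, frr, a0, b0):
    """Exact bound (1): FAR <= a0 and FRR <= b0."""
    return far <= a0 and frr <= b0

# (2) => (1): every point inside the conservative bound is inside the exact one
a0, b0 = 0.8, 0.01
for far in (0.0, 0.2, 0.5, 0.79):
    for frr in (0.0, 0.002, 0.005, 0.009):
        if satisfies_conservative(far, frr, a0, b0):
            assert satisfies_exact(far, frr, a0, b0)
```

The converse does not hold: a point such as FAR = 0.7, FRR = 0.009 satisfies the exact bound but not the conservative one, which is what makes (2) the stricter target.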
Implication
• When the ROC operating point lies inside the conservative bound:
  – FAR(F1) ≤ α0
  – FRR(F1) ≤ β0
  – Conditions met, therefore θ = 0.
[Diagram: F1 = f1,1 + f1,2 + … + f1,K; if the sum > θ then pass, else reject]
Multi-exit Boosting
A method to train a single boosted classifier with multiple exit nodes:
[Diagram: f1 + f2 + … + f8 → object; exit nodes with pass/reject decisions after some of the weak classifiers → non-obj]
• fi: a weak classifier; an fi followed by a decision to continue or reject is an exit node.
• Features:
  • Weak classifiers are trained with the same goal: λ = α0/β0.
  • Every pass/reject decision is guaranteed with FAR ≤ α0 and FRR ≤ β0.
  • The classifier is a cascade (exit nodes play the roles of F1, F2, F3, …).
  • Score is propagated from one node to the next.
• Main advantages:
  • Weak classifiers are learned (approximately) optimally.
  • No training of multiple boosted classifiers.
  • Far fewer weak classifiers are needed than in traditional cascades.
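The score-propagating, multi-exit evaluation can be sketched as follows; the weak classifiers and exit thresholds here are hypothetical placeholders:

```python
def multi_exit_classify(x, weak_classifiers, exits):
    """Accumulate one running score; test it at each exit node.

    weak_classifiers: list of functions x -> real-valued score
    exits: dict mapping weak-classifier index -> rejection threshold
    """
    score = 0.0
    for i, f in enumerate(weak_classifiers):
        score += f(x)                 # the score propagates across exit nodes
        if i in exits and score <= exits[i]:
            return "non-object"       # reject at this exit node
    return "object"                   # survived every exit

# toy example: 4 weak classifiers, exit nodes after indices 1 and 3
weak = [lambda x: x, lambda x: x - 0.1, lambda x: x, lambda x: x - 0.2]
exits = {1: 0.0, 3: 0.5}
```

Unlike a traditional cascade, no partial sum is discarded at a stage boundary, which is one reason fewer weak classifiers suffice.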
Results: Goal (λ) vs. Number of weak classifiers (K)
• Toy problem: learn a (single-exit) boosted classifier F for classifying face/non-face patches such that FAR(F) < 0.8 and FRR(F) < 0.01
  – Empirically best goal: λ ∈ [10, 100]
  – Our method chooses: λ = 0.8/0.01 = 80
• Similar results were obtained for tests on other desired error rates.
Ours vs. Others (in Face Detection)
• Use Fast StatBoost as the base method for fast-training a weak classifier.

| Method | No. of weak classifiers | No. of exit nodes | Total training time |
| --- | --- | --- | --- |
| Viola-Jones [3] | 4,297 | 32 | 6h20m |
| Viola-Jones [4] | 3,502 | 29 | 4h30m |
| Boosting chain [7] | 959 | 22 | 2h10m |
| Nested cascade [5] | 894 | 20 | 2h |
| Soft cascade [1] | 4,871 | 4,871 | 6h40m |
| Dynamic cascade [6] | 1,172 | 1,172 | 2h50m |
| Multi-exit Asymmetric Boosting | 575 | 24 | 1h20m |
Ours vs. Others (in Face Detection)
• MIT+CMU Frontal Face Test set: [ROC curves of the compared detectors]
Conclusion
• Multi-exit Asymmetric Boosting trains every weak classifier approximately optimally.
  – Better accuracy
  – Much fewer weak classifiers
  – Significantly reduced training time
• No more trial and error for training a boosted classifier.
Outline
• Motivation
• Contributions
  – Automatic Selection of Asymmetric Goal
  – Fast Weak Classifier Learning
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Fast Training and Selection of Haar-like Features using Statistics

ICCV'07 oral paper:
Minh-Tri Pham and Tat-Jen Cham. Fast Training and Selection of Haar Features using Statistics in Boosting-based Face Detection. In Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
• Won Travel Grant Award
• Won Second Prize, Best Student Paper in Year 2007 Award, Pattern Recognition and Machine Intelligence Association (PREMIA), Singapore
Motivation
• Face detectors today:
  – Real-time detection speed
  …but…
  – Weeks of training time
Why is Training so Slow?

| Factor | Description | Common value |
| --- | --- | --- |
| N | number of examples | 10,000 |
| M | number of weak classifiers in total | 4,000-6,000 |
| T | number of Haar-like features | 40,000 |

• Time complexity: O(MNT log N)
  – 15ms to train a feature classifier
  – 10 minutes to train a weak classifier
  – 27 days to train a face detector

A view of a face detector training algorithm:

    for weak classifier m from 1 to M:
        ...
        update weights – O(N)
        for feature t from 1 to T:
            compute N feature values – O(N)
            sort N feature values – O(N log N)
            train feature classifier – O(N)
        select best feature classifier – O(T)
        ...
Why Should the Training Time be Improved?
• Tradeoff between time and generalization
  – E.g. training becomes 100 times slower if we increase both N and T by 10 times
• Trial and error to find key parameters for training
  – Much longer training time needed
• Online-learning face detectors have the same problem
Existing Approaches to Reduce the Training Time
• Sub-sample the Haar-like feature set
  – Simple, but loses generalization
• Use histograms and real-valued boosting (B. Wu et al. '04)
  – Pro: reduces O(MNT log N) to O(MNT)
  – Con: raises overfitting concerns:
    • Real AdaBoost is not known to be overfitting-resistant
    • A weak classifier may overfit if too many histogram bins are used
• Pre-compute the feature values' sorting orders (J. Wu et al. '07)
  – Pro: reduces O(MNT log N) to O(MNT)
  – Con: requires huge memory storage
    • For N = 10,000 and T = 40,000, a total of 800MB is needed.
Why is Training so Slow?
• Time complexity: O(MNT log N)
  – 15ms to train a feature classifier
  – 10 minutes to train a weak classifier
  – 27 days to train a face detector
• Bottleneck: at least O(NT) to train a weak classifier
• Can we avoid O(NT)?
Our Proposal
• Fast StatBoost: train feature classifiers using statistics rather than the input data
  – Con: less accurate… but not critical for a feature classifier
  – Pro: much faster training time: constant time instead of linear time
Fast StatBoost
• Training feature classifiers using statistics:
  – Assumption: feature value v(t) is normally distributed given the face class c
  – Closed-form solution for the optimal threshold
[Figure: face and non-face Gaussians over the feature value, with the optimal threshold between them]
• Fast linear projection of the statistics of a window's integral image into 1D statistics of a feature value:

  μ(t) = g(t)ᵀ m_J,   σ(t)² = g(t)ᵀ Σ_J g(t)

  – J: random vector representing a window's integral image
  – m_J, Σ_J: mean vector and covariance matrix of J
  – g(t): Haar-like feature, a sparse vector with fewer than 20 non-zero elements
  – μ(t), σ(t)²: mean and variance of feature value v(t)
• Constant time to train a feature classifier
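A sketch of the projection and of a closed-form two-Gaussian threshold; the crossing-point formula below is the standard equal-prior Gaussian decision boundary, used as a stand-in for the thesis's exact estimator:

```python
import numpy as np

def project_stats(g, m_J, S_J):
    """Project integral-image statistics (m_J, S_J) to 1D feature statistics."""
    mu = float(g @ m_J)            # mean of feature value v = g . J
    var = float(g @ S_J @ g)       # variance of v
    return mu, var

def gaussian_threshold(mu1, var1, mu2, var2):
    """Threshold where two equal-prior Gaussian densities cross."""
    a = 1.0 / var2 - 1.0 / var1
    b = 2.0 * (mu1 / var1 - mu2 / var2)
    c = mu2**2 / var2 - mu1**2 / var1 + np.log(var2 / var1)
    if abs(a) < 1e-12:             # equal variances: the midpoint
        return (mu1 + mu2) / 2.0
    roots = np.roots([a, b, c])
    # prefer the crossing that lies between the two class means
    between = [r.real for r in roots if min(mu1, mu2) <= r.real <= max(mu1, mu2)]
    return between[0] if between else roots[0].real
```

Because `g` is sparse (fewer than 20 non-zero elements), both projections cost a constant number of operations per feature, which is where the O(1)-per-feature-classifier claim comes from.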
Fast StatBoost
• The integral image's statistics are obtained directly from the weighted input data
  – Input: N training integral images J1, …, JN with class labels c1, …, cN and current weights w1(m), …, wN(m)
  – For each class c we compute:
    • Sample total weight: ẑc = Σ_{n: cn = c} wn(m)
    • Sample mean vector: m̂c = (1/ẑc) Σ_{n: cn = c} wn(m) Jn
    • Sample covariance matrix: Σ̂c = (1/ẑc) Σ_{n: cn = c} wn(m) Jn Jnᵀ − m̂c m̂cᵀ
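The weighted class-conditional statistics above can be sketched in a few lines of NumPy; the array names are illustrative, and the covariance is computed in the equivalent centered form:

```python
import numpy as np

def class_stats(J, w, labels, c):
    """Weighted mean and covariance of integral images J for class c.

    J: (N, d) array of flattened integral images
    w: (N,) current boosting weights
    labels: (N,) class labels
    """
    mask = labels == c
    Jc, wc = J[mask], w[mask]
    z = wc.sum()                                # sample total weight
    m = (wc[:, None] * Jc).sum(axis=0) / z      # weighted mean vector
    diff = Jc - m
    # weighted covariance: sum of w * (J - m)(J - m)^T, normalized by z
    S = (wc[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0) / z
    return z, m, S
```

This single O(Nd²) pass per boosting round replaces the O(NT)-per-round work of touching every feature's values, since all T feature classifiers are then trained from (m, S) alone.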
Fast StatBoost

| Factor | Description | Common value |
| --- | --- | --- |
| N | number of examples | 10,000 |
| M | number of weak classifiers in total | 4,000-6,000 |
| T | number of Haar-like features | 40,000 |
| d | number of pixels of a window | 300-500 |

• To train a weak classifier:
  – Extract the class-conditional integral-image statistics
    • Time complexity: O(Nd²)
    • The factor d² is negligible because fast algorithms exist, hence in practice: O(N)
  – Train T feature classifiers by projecting the statistics into 1D
    • Time complexity: O(T)
  – Select the best feature classifier
    • Time complexity: O(T)
• Overall time complexity: O(N+T)

A view of our face detector training algorithm:

    for weak classifier m from 1 to M:
        ...
        update weights – O(N)
        extract statistics of integral image – O(Nd^2)
        for feature t from 1 to T:
            project statistics into 1D – O(1)
            train feature classifier – O(1)
        select best feature classifier – O(T)
        ...
Experimental Results
• Setup:
  – Intel Pentium IV 2.8GHz
  – 19 feature types, 295,920 Haar-like features
[Figure: the nineteen Haar-like feature types used in our experiments – edge, line, diagonal-line, corner, and center-surround features]
• Time for extracting the statistics:
  – Main factor: the covariance matrices
  – GotoBLAS: 0.49 seconds per matrix
• Time for training T features: 2.1 seconds
• Total training time: 3.1 seconds per weak classifier with 300K features
  – Existing methods: up to 10 minutes with 40K features or fewer
Experimental Results
• Comparison with Fast AdaBoost (J. Wu et al. '07), the fastest known implementation of Viola-Jones' framework:
[Plot: training time of a weak classifier (seconds) vs. number of features T, from 0 to 300,000, for Fast AdaBoost and Fast StatBoost]
Experimental Results
• Performance of a cascade:
[Figure: ROC curves of the final cascades for face detection]

| Method | Total training time | Memory requirement |
| --- | --- | --- |
| Fast AdaBoost (T=40K) | 13h 20m | 800 MB |
| Fast StatBoost (T=40K) | 02h 13m | 30 MB |
| Fast StatBoost (T=300K) | 03h 02m | 30 MB |
Conclusions
• Fast StatBoost: use statistics instead of the input data to train feature classifiers
• Time:
  – Reduces the face detector training time from up to a month to 3 hours
  – Significant gains in both N and T with little increase in training time
    • Due to O(N+T) per weak classifier
• Accuracy:
  – Even better accuracy for the face detector
    • Due to many more Haar-like features being explored
Outline
• Motivation
• Contributions
  – Automatic Selection of Asymmetric Goal
  – Fast Weak Classifier Learning
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Online Asymmetric Boosting
• Cascade of non-face rejectors:
• Weak classifier
[This section was presented through figures, animated across four slides]
Outline
• Motivation
• Contributions
  – Automatic Selection of Asymmetric Goal
  – Fast Weak Classifier Learning
  – Online Asymmetric Boosting
  – Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Summary
• Online Asymmetric Boosting
  – Integrates asymmetric boosting with online learning
• Fast Training and Selection of Haar-like Features using Statistics
  – Dramatically reduces training time from weeks to a few hours
• Multi-exit Asymmetric Boosting
  – Approximately minimizes the number of weak classifiers
Thank You