Combining Randomization and Discrimination for Fine-Grained Image Categorization
TRANSCRIPT
Motivation
Reference B. Yao*, A. Khosla* and L. Fei-Fei. “Combining Randomization and Discrimination for Fine-Grained Image Categorization.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
Our Work
• Approach: A model combining randomization and discrimination: a dense feature representation classified by a random forest with discriminative decision trees
[Figure: an original image, the traditional method (SPM), and our intuition: what humans do]
Dense Feature Representation
• Our representation consists of (pairs of) image regions of arbitrary sizes at arbitrary locations:
Experiment
People-Playing-Musical-Instruments (PPMI)
PASCAL Action Dataset
Future Work:
• Improve speed by exploiting the inherent parallel nature of random forests using GPUs
• Strong classifiers with analytical solutions (e.g. LDA)
• Incorporate multiple features
Combining Randomization and Discrimination for Fine-Grained Image Categorization Bangpeng Yao*, Aditya Khosla* and Li Fei-Fei
{bangpeng,aditya86,feifeili}@cs.stanford.edu The Vision Lab Computer Science Dept. Stanford University
• Objective: Find image regions that contain discriminative information for fine-grained image categorization.
Caltech-UCSD Birds 200
• 200-class classification of 200 bird species from North America
(* - indicates equal contribution)
[Figure: our goal: distinguish visually similar species, e.g. two California Gulls vs. a Herring Gull]
[Bar chart: classification accuracy on Caltech-UCSD Birds 200: SPM 16.1%, LLC 18.0%, MKL 19.0%, Ours 19.2%]
• MKL: Uses many features including Gray/ColorSIFT, geometric blur, color histograms, etc.
[Figure: class and image heatmaps on PPMI, e.g. playing vs. not playing instrument for flute, guitar, and violin]
Random forest with Discriminative Decision Trees
[Figure: a decision tree with a strong classifier at each internal node and class distributions at the leaf nodes]
Coarse-to-fine Learning
• 9-class classification of human actions (%mAP)
[Table: per-class results including Playing Instrument, Reading, and Riding Bike]
[Table: settings 1-5 with values 0.225, 0.277, 0.237, 0.178, 0.167]
[Figure: image regions sampled from the normalized image, parameterized by the center of the image region and its size (region width and region height)]
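The region parameterization can be sketched as follows (hypothetical code, not the authors'; the normalized image size and the uniform sampling ranges are assumptions):

```python
import random

def sample_region(img_w, img_h, rng):
    """Sample one rectangular region (x, y, w, h) of arbitrary size at an
    arbitrary location inside a normalized img_w x img_h image."""
    w = rng.randint(1, img_w)        # region width
    h = rng.randint(1, img_h)        # region height
    x = rng.randint(0, img_w - w)    # left edge (fixes the region center)
    y = rng.randint(0, img_h - h)    # top edge
    return x, y, w, h

def sample_region_pair(img_w, img_h, rng):
    """The representation also uses pairs of regions."""
    return sample_region(img_w, img_h, rng), sample_region(img_w, img_h, rng)

rng = random.Random(0)
regions = [sample_region(128, 128, rng) for _ in range(1000)]
```

Sampling many such regions per image is what makes the feature space dense, and each tree node later picks from a random subset of them.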
PPMI Binary Classification
Generalization Error of RF
Training a Strong Classifier
• {1, …, 5} represent original class labels
• Random binary assignment: each original class is randomly mapped to a binary label, e.g. classes (1, 2, 3, 4, 5) → (0, 1, 1, 1, 0); repeated for further random assignments
• Train a binary SVM on the resulting binary labels
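The relabeling step described above can be sketched as follows (a hypothetical illustration, not the authors' code; a ridge-regularized least-squares classifier stands in for the binary SVM the poster trains, and the feature dimensions are assumptions):

```python
import numpy as np

def random_binary_assignment(labels, classes, rng):
    """Randomly map each original class label (e.g. {1, ..., 5}) to 0 or 1."""
    mapping = {c: int(rng.integers(0, 2)) for c in classes}
    return np.array([mapping[y] for y in labels]), mapping

def train_binary_stand_in(X, y01):
    """Stand-in for the binary SVM: ridge-regularized least squares on
    +/-1 targets. Returns a weight vector with a trailing bias term."""
    t = 2.0 * np.asarray(y01) - 1.0
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    return np.linalg.solve(Xb.T @ Xb + 1e-3 * np.eye(Xb.shape[1]), Xb.T @ t)

rng = np.random.default_rng(0)
labels = rng.integers(1, 6, size=100)   # original labels in {1..5} (synthetic)
X = rng.normal(size=(100, 8))           # region features (synthetic)
y01, mapping = random_binary_assignment(labels, range(1, 6), rng)
w = train_binary_stand_in(X, y01)
```

Repeating this with fresh random assignments yields many different binary problems, which is what injects randomization into the node classifiers.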
Control Experiments
PPMI 24-Class Classification
[Bar chart: mAP: BoW 22.7%, Grouplet 36.7%, SPM 39.1%, LLC 41.8%, Ours 47.0%]
[Bar chart: PPMI binary classification (playing vs. not playing), overall mAP: BoW 78.0%, Grouplet 85.1%, SPM 88.2%, LLC 89.2%, Ours 92.1%; per-instrument mAP shown for Bassoon, Erhu, Flute, French Horn, Guitar, Saxophone, Violin, Trumpet, Cello, Clarinet, Harp, and Recorder]
• Ours: Uses a single feature (ColorSIFT)
[Figure: example bird test images, correctly and wrongly classified, with per-class accuracies of 85.7%, 66.7%, 45.5%, 31.8%, 20.0%, and 0.0%]
• Learning our random forest classifier:
• Select the best sampled candidate using the information gain criterion:
Features for image regions:
• Grayscale SIFT descriptors for PPMI and PASCAL Action
• ColorSIFT descriptors for Caltech-UCSD Birds-200
• Dense SIFT sampling at multiple scales (8, 12, 16, 24, 30)
• Locality-constrained Linear Coding (LLC) features
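Since LLC coding is central to the feature pipeline, here is a simplified sketch of the approximated LLC encoder (the number of nearest codewords k and the regularization constant beta are assumptions, not values from the poster):

```python
import numpy as np

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Encode descriptor x over its k nearest codebook atoms: solve a small
    least-squares system over the shifted local bases, then enforce the
    sum-to-one constraint. Returns a sparse code of length len(codebook)."""
    d = np.linalg.norm(codebook - x, axis=1)      # distances to all codewords
    idx = np.argsort(d)[:k]                       # k nearest codewords
    z = codebook[idx] - x                         # shifted local bases (k x dim)
    G = z @ z.T                                   # local Gram matrix
    G += (beta * np.trace(G) + 1e-12) * np.eye(k) # regularize for stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                                  # sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = w
    return code
```

Pooling these sparse codes over each sampled image region would then give the per-region feature vectors used at the tree nodes.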
ΔE = E(I) − (|I_l| / |I|)·E(I_l) − (|I_r| / |I|)·E(I_r)
I : set of all training examples at the node; I_l, I_r : the two subsets produced by the candidate split
E(I) : entropy of training examples
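The information gain criterion can be sketched numerically as follows (natural-log entropy is an assumption; the function names are illustrative):

```python
import numpy as np

def entropy(labels):
    """E(I): entropy of the class-label distribution of a set of examples."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def information_gain(labels, left_mask):
    """Delta E = E(I) - |I_l|/|I| E(I_l) - |I_r|/|I| E(I_r) for a binary split."""
    I = np.asarray(labels)
    m = np.asarray(left_mask)
    l, r = I[m], I[~m]
    if len(l) == 0 or len(r) == 0:
        return 0.0  # degenerate split carries no gain
    n = len(I)
    return entropy(I) - len(l) / n * entropy(l) - len(r) / n * entropy(r)

# a perfect split of two classes recovers the full entropy ln(2)
gain = information_gain([0, 0, 1, 1], [True, True, False, False])
```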
• Classification of test example:
C = argmax_c (1/T) Σ_{t=1..T} p_t(c)
T : number of trees
C : class label of test example
p_t(c) : probability of test example belonging to class c for tree t
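This averaging rule can be sketched as below (the T x C probability array is an assumed input format):

```python
import numpy as np

def forest_predict(tree_probs):
    """tree_probs[t, c] = p_t(c), the probability tree t assigns to class c.
    Predict C = argmax_c (1/T) * sum_t p_t(c)."""
    avg = np.asarray(tree_probs, dtype=float).mean(axis=0)  # average over T trees
    return int(np.argmax(avg)), avg

# three trees, two classes
label, posterior = forest_predict([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```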
• Generalization error of a random forest is bounded by:
GE ≤ ρ̄·(1 − s²) / s²
ρ̄ : correlation between decision trees
s : strength of the decision trees
• Dense feature space decreases ρ̄ • Strong classifiers increase s
⇒ Better generalization
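The trade-off can be explored numerically with Breiman's bound (the ρ̄ and s values below are illustrative assumptions, not measurements from the poster):

```python
def rf_generalization_bound(rho, s):
    """Breiman's bound on random-forest generalization error:
    GE <= rho * (1 - s**2) / s**2, for mean inter-tree correlation rho
    and tree strength s (0 < s <= 1)."""
    return rho * (1.0 - s**2) / s**2

# dense feature space -> lower correlation; strong node classifiers -> higher strength
baseline = rf_generalization_bound(rho=0.5, s=0.6)
improved = rf_generalization_bound(rho=0.3, s=0.8)
```

Lowering ρ̄ and raising s both tighten the bound, which is the argument for combining the dense (randomized) feature space with strong node classifiers.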
[Figure: learned regions of interest by tree depth; region area shrinks as depth increases]
• Our method automatically learns a coarse-to-fine region of interest (e.g. shown below for ‘playing trumpet’ class)
• This is similar to the human visual system, which is believed to analyze raw input from low to high spatial frequencies, or from large global shapes to smaller local ones.