Object Recognition with Informative Features
and Linear Classification
A paper by Michel Vidal-Naquet and Shimon Ullman
Presented by Aciel Eshky and Marco Brigham
Overview
Introduction
Features: Informative Features; Simple Generic Features
Classification Schemes: Classification by linear separation; Tree Augmented Network
Experiments
Conclusion
Future Research
Introduction – Trade-off
Complexity of features
vs.
Complexity of classification scheme
Introduction – Classification Process
1. Extract features: represent objects using these features
2. Apply a classifier to the measured features: reach a decision regarding the represented class
Introduction – Features and Schemes
Two types of features: informative features (I, class-specific) and generic features (G, simple, non-class-specific)
Two classification schemes: simple linear separation (S, LSVM) and a complex classification scheme (C, TAN)
Introduction - Proposed Strategy
Generic features: simple, non-class-specific, in a high-dimensional space
Complex classification schemes are more suitable
Informative features: rich, class-specific, in a low-dimensional space
Simple linear classification schemes are more suitable
Introduction – Proposed Strategy
Maximize feature information
in order to
use simple and efficient linear classification
Informative Features
Selection of informative features that convey maximal information about the class
Informative Features
Selection of informative features
• Generate set {Fi} of candidate fragments
• Compute for each Fi the optimal threshold θi for detection by visual similarity
• Select set of features S={Fj} that convey maximal information about the class C
Informative Features
(1) Set {Fi} of candidate fragments
Source: training images of the class
Rectangular crops of different sizes and locations (typically 10^4 fragments)
Informative Features
(2) Visual similarity measure: slide the fragment over the image and measure the normalized cross-correlation
Alternative similarity measures have also been used: ordinal ranking of pixels combined with intensity gradients; colour, texture and 3D cues
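A minimal NumPy sketch of the normalized cross-correlation measure, assuming grey-level images stored as 2-D arrays. The function names are hypothetical, and the brute-force loops favour clarity over speed.

```python
import numpy as np

def normalized_cross_correlation(patch, fragment):
    """NCC of two equally sized grey-level arrays; 1.0 means a perfect match."""
    a = patch.astype(float) - patch.mean()
    b = fragment.astype(float) - fragment.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant patch or fragment has zero variance: report no correlation.
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def ncc_map(image, fragment):
    """Slide the fragment over every position where it fits inside the image."""
    fh, fw = fragment.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = normalized_cross_correlation(image[y:y + fh, x:x + fw], fragment)
    return out

# Usage: a fragment cut from the image itself is found where it was cut.
rng = np.random.default_rng(0)
image = rng.random((14, 21))
fragment = image[3:8, 5:11]
scores = ncc_map(image, fragment)
print(tuple(int(v) for v in np.unravel_index(np.argmax(scores), scores.shape)))  # (3, 5)
```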
Informative Features
(2) Detection threshold: fragment Fi is detected in an image if the visual similarity measure is above threshold θi
Xi is a binary variable indicating the presence (1) or absence (0) of fragment Fi in an image
θi is obtained by maximizing the mutual information I(Xi;C) for the images in the training set
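The threshold search can be sketched as a sweep over candidate values, assuming each training image has already been reduced to the best similarity score of fragment Fi. Trying the observed scores as candidates, and the names `mutual_information` and `best_threshold`, are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, cs):
    """I(X;C) in bits for two equal-length sequences of discrete labels."""
    n = len(xs)
    joint, px, pc = Counter(zip(xs, cs)), Counter(xs), Counter(cs)
    # sum of p(x,c) * log2(p(x,c) / (p(x) p(c))), written with counts
    return sum(k / n * math.log2(k * n / (px[x] * pc[c])) for (x, c), k in joint.items())

def best_threshold(similarities, labels):
    """Pick theta_i maximizing I(X_i;C), where X_i = 1 if similarity > theta_i.

    similarities: best similarity of fragment F_i in each training image
    labels: 1 for class images, 0 for non-class images
    """
    best_theta, best_mi = None, -1.0
    for theta in sorted(set(similarities)):  # observed scores as candidates
        xs = [1 if s > theta else 0 for s in similarities]
        mi = mutual_information(xs, labels)
        if mi > best_mi:
            best_theta, best_mi = theta, mi
    return best_theta, best_mi

sims = [0.9, 0.8, 0.85, 0.7, 0.3, 0.4, 0.2, 0.5]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_threshold(sims, labels))  # (0.5, 1.0)
```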
Informative Features
Example: a training set with 100 car and 100 non-car images. For a given value of θi, the fragment Fi is detected 44 times in car images and 6 times in non-car images: H(C) = 1, H(C|Xi) = 0.847 and I(Xi;C) = 0.153
A higher value for θi, i.e. higher minimum visual similarity for the fragment Fi to be detected in an image, will lower the detection frequency in the non-car images but also in the car images.
The mutual information I(Xi;C) measures the information that is conveyed about the class C by the presence or absence of fragment Fi in an image
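The numbers in the example can be checked with a few lines of arithmetic: P(Xi=1) = 50/200, with 44 of those 50 detections in car images, and P(Xi=0) = 150/200, with 56 of those 150 images being cars. The helper names are hypothetical.

```python
import math

def entropy(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi_from_counts(hit_class, n_class, hit_other, n_other):
    """H(C), H(C|X) and I(X;C) for a binary fragment detector, from counts."""
    n = n_class + n_other
    h_c = entropy(n_class / n)
    detected = hit_class + hit_other
    missed = n - detected
    # H(C|X) = P(X=1) H(C|X=1) + P(X=0) H(C|X=0)
    h_c_given_x = 0.0
    if detected:
        h_c_given_x += detected / n * entropy(hit_class / detected)
    if missed:
        h_c_given_x += missed / n * entropy((n_class - hit_class) / missed)
    return h_c, h_c_given_x, h_c - h_c_given_x

h_c, h_c_x, mi = mi_from_counts(44, 100, 6, 100)
print(round(h_c, 3), round(h_c_x, 3), round(mi, 3))  # 1.0 0.847 0.153
```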
Informative Features
(3) Greedy search: the most informative fragments are added in succession to set S until additional fragments no longer increase the information content of S
– S={X1}, where X1 is the fragment with the highest I(X1;C)
– X2 is selected on the basis of how much information it adds with respect to X1,
i.e. X2 maximizes I(X2, X1; C) - I(X1; C)
– Iteration n+1: Xn+1 maximizes the minimal information added with respect to each element of Sn, i.e. Xn+1 = argmax over candidates Xk of min over Xj in Sn of [I(Xk, Xj; C) - I(Xj; C)]
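A sketch of the max-min greedy search, assuming each candidate fragment is given as its vector of binary detections over the training images. The names and the `min_gain` stopping tolerance are hypothetical; a feature pair is treated as a single variable by zipping the two detection vectors.

```python
import math
from collections import Counter

def mutual_information(xs, cs):
    """I(X;C) in bits; xs may hold tuples, so a feature pair counts as one variable."""
    n = len(xs)
    joint, px, pc = Counter(zip(xs, cs)), Counter(xs), Counter(cs)
    return sum(k / n * math.log2(k * n / (px[x] * pc[c])) for (x, c), k in joint.items())

def greedy_select(features, labels, max_features, min_gain=1e-9):
    """Greedy max-min selection; features maps a name to its binary detections."""
    single = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = [max(single, key=single.get)]  # X1 has the highest I(X1;C)
    while len(selected) < max_features:
        best, best_gain = None, min_gain
        for f, v in features.items():
            if f in selected:
                continue
            # Worst-case gain over the selected set: min_j I(Xf, Xj; C) - I(Xj; C)
            gain = min(mutual_information(list(zip(v, features[s])), labels) - single[s]
                       for s in selected)
            if gain > best_gain:
                best, best_gain = f, gain
        if best is None:  # no candidate adds information: stop
            break
        selected.append(best)
    return selected

labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {"a": [1, 1, 1, 0, 0, 0, 0, 0],
            "b": [1, 1, 1, 0, 0, 0, 0, 0],   # duplicate of "a": adds nothing
            "c": [0, 0, 0, 1, 0, 0, 0, 0]}   # detects the car that "a" misses
print(greedy_select(features, labels, 3))  # ['a', 'c']
```

The min over Sn rejects a candidate that is redundant with any single selected fragment, which is why the duplicate "b" is never added.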
Simple Generic Features
Individually carry little information and are not class-specific
Correct combinations capture class-specific visual properties
Wavelet Transform
Captures the image's local frequency and orientation information:
Within an analysis window, at different scales and at every location
By means of a kernel function, which characterises the wavelet transform by choosing the type of feature to which the transform is sensitive
Wavelet Transform
Choice of wavelet feature
Tested three types of wavelets with different resolutions
Working with low-resolution images
The third type was chosen for its best performance
Quantization
Binarize the wavelet transform for comparison purposes, representing the presence or absence of a feature
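The slides do not say which three wavelet types were tested, so this sketch uses Haar-like kernels as a hypothetical stand-in: horizontal, vertical and diagonal responses computed at every window position at two scales, then binarized. The kernel choice, the area normalization and the threshold value are all assumptions.

```python
import numpy as np

def haar_responses(image, s):
    """Haar-like responses at scale s for every 2s x 2s window position.

    Returns shape (3, H - 2s + 1, W - 2s + 1): horizontal, vertical and
    diagonal responses, normalized by the window area.
    """
    img = image.astype(float)
    h, w = img.shape
    out = np.zeros((3, h - 2 * s + 1, w - 2 * s + 1))
    area = 4.0 * s * s
    for y in range(out.shape[1]):
        for x in range(out.shape[2]):
            tl = img[y:y + s, x:x + s].sum()                  # top-left quadrant
            tr = img[y:y + s, x + s:x + 2 * s].sum()          # top-right
            bl = img[y + s:y + 2 * s, x:x + s].sum()          # bottom-left
            br = img[y + s:y + 2 * s, x + s:x + 2 * s].sum()  # bottom-right
            out[0, y, x] = ((tl + tr) - (bl + br)) / area  # horizontal edge
            out[1, y, x] = ((tl + bl) - (tr + br)) / area  # vertical edge
            out[2, y, x] = ((tl + br) - (tr + bl)) / area  # diagonal
    return out

def binary_wavelet_features(image, scales=(1, 2), threshold=0.25):
    """Quantize: 1 where |response| exceeds the threshold (feature present), else 0."""
    return np.concatenate([(np.abs(haar_responses(image, s)) > threshold).astype(int).ravel()
                           for s in scales])

# Usage: a vertical step edge in a 10 x 15 window switches on vertical-edge features.
window = np.zeros((10, 15))
window[:, 7:] = 1.0
features = binary_wavelet_features(window)
print(features.size)  # 630 = 3 * (9 * 14) + 3 * (7 * 12)
```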
Classification by linear separation
During classification, the system generates a feature vector X = [X1,...,Xn]: an encoding of the image in feature space, obtained by measuring the presence of specific visual features in the image
Final decision: the vector is plugged into a classification function f(X), which estimates the presence or absence of the class; it returns 1 if present and 0 if not present
Classification by linear separation
Linear discriminant classification function learned by Linear Support Vector Machine training (LSVM)
Optimal: best expected performance on new data
When the data is not linearly separable, the minimization function is combined with a cost that penalizes misclassified training samples
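LSVM training is usually done with an SVM solver. As a self-contained stand-in, this sketch minimizes the soft-margin primal objective by plain full-batch subgradient descent; the hinge term plays the role of the misclassification cost above. Names, learning rate and epoch count are assumptions, not the authors' settings.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Soft-margin linear SVM trained by subgradient descent on
        0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)),
    with labels y_i in {-1, +1}.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # samples that are misclassified or inside the margin
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def f(X, w, b):
    """Linear classification function: 1 if the class is present, 0 if not."""
    return (X @ w + b > 0).astype(int)

X = np.array([[2.0, 1.0], [3.0, -1.0], [-2.0, 1.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
print(f(X, w, b))  # [1 1 0 0]
```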
Tree Augmented Network (TAN)
Models pairwise statistical dependencies between features to better approximate the underlying distribution (a richer model)
Representation: a Bayesian network whose nodes represent the features or the class variable, and whose connections represent statistical dependencies
Tree Augmented Network (TAN)
Hierarchical implication: each feature depends on (1) the class and (2) its parent feature in the tree structure
Restricting the model to a tree structure limits its modelling power but aids computation
Tree Augmented Network (TAN)
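Tree structure found during learning by searching for the maximum weighted spanning tree
Weight of the edge connecting Xi and Xj: the mutual information I(Xi;Xj)
Class-conditional distribution given by: $P(X_1,\dots,X_n \mid C) = P(X_r \mid C) \prod_{i \neq r} P(X_i \mid X_{\pi(i)}, C)$, where $X_r$ is the root of the tree and $X_{\pi(i)}$ is the parent of $X_i$
Bayesian decision rule: decide $C=1$ if $P(C=1)\,P(X \mid C=1) > P(C=0)\,P(X \mid C=0)$, otherwise $C=0$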
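A compact sketch of TAN learning and the decision rule, assuming binary features, feature 0 as the tree root, and Laplace-smoothed probability tables (all assumptions; function names hypothetical). One deliberate swap: edges are weighted by the class-conditional mutual information I(Xi;Xj|C), as in the standard TAN construction of Friedman et al., rather than the unconditional I(Xi;Xj) written on the slide.

```python
import math
from collections import Counter

def cond_mutual_information(xi, xj, cs):
    """I(Xi;Xj|C) in bits from paired discrete samples."""
    n = len(cs)
    nijc, nic = Counter(zip(xi, xj, cs)), Counter(zip(xi, cs))
    njc, nc = Counter(zip(xj, cs)), Counter(cs)
    return sum(k / n * math.log2(k * nc[c] / (nic[(a, c)] * njc[(b, c)]))
               for (a, b, c), k in nijc.items())

def learn_tan(X, cs):
    """Maximum weighted spanning tree (Prim's algorithm, rooted at feature 0)
    plus the counts needed for the conditional probability tables."""
    d = len(X[0])
    cols = [[row[i] for row in X] for i in range(d)]
    weight = {(i, j): cond_mutual_information(cols[i], cols[j], cs)
              for i in range(d) for j in range(d) if i != j}
    parent = {0: None}
    while len(parent) < d:
        # Heaviest edge joining the tree to a feature not yet in it.
        i, j = max(((i, j) for i in parent for j in range(d) if j not in parent),
                   key=lambda e: weight[e])
        parent[j] = i
    counts = Counter()
    for row, c in zip(X, cs):
        for i, p in parent.items():
            counts[(i, row[i], None if p is None else row[p], c)] += 1
    prior = {c: k / len(cs) for c, k in Counter(cs).items()}
    return parent, prior, counts

def classify_tan(model, row):
    """Decision rule: argmax_c P(c) P(X_root|c) prod_i P(X_i|X_parent(i), c)."""
    parent, prior, counts = model
    def log_joint(c):
        s = math.log(prior[c])
        for i, p in parent.items():
            pa = None if p is None else row[p]
            # Laplace-smoothed estimate of P(X_i = row[i] | parent value, c)
            num = counts[(i, row[i], pa, c)] + 1
            den = counts[(i, 0, pa, c)] + counts[(i, 1, pa, c)] + 2
            s += math.log(num / den)
        return s
    return max(prior, key=log_joint)

X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
     [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1]]
cs = [1, 1, 1, 1, 0, 0, 0, 0]
model = learn_tan(X, cs)
print(model[0])  # {0: None, 1: 0, 2: 0}
```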
Experiments
Performance evaluation: 4 combinations of feature extraction and classifier
(fragments or wavelets) with (linear SVM or TAN)
Classification task: detecting side views of cars in 14 x 21 pixel images
Full database: 573 positives, 461 negatives
Cars occupy a 10 x 15 pixel box
Feature space dimension: 168 binary features
At each iteration the database is reshuffled to generate training and test sets; 20 cross-validation iterations are used to generate the ROC curve.
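Results below are quoted as detection rates at a fixed false-alarm (FA) rate. A minimal sketch of reading that number off one ROC curve, assuming real-valued classifier scores for the test images. The function name and the "score >= threshold" convention are assumptions; per-split values would then be averaged over the cross-validation iterations.

```python
def detection_rate_at_fa(scores, labels, max_fa=0.05):
    """Best hit rate among ROC points whose false-alarm rate is <= max_fa.

    Each distinct score is tried as a threshold (predict "car" if score >= t);
    labels are 1 for positives and 0 for negatives.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        hits = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        false_alarms = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        if false_alarms / n_neg <= max_fa:
            best = max(best, hits / n_pos)
    return best

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(detection_rate_at_fa(scores, labels), 3))  # 0.667
```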
Experiments
(1) Fragment-based classifiers: initial fragment pool
59200 fragments from the first 100 car images
Sizes: 4x4 to 10x14 pixels (none covers the full car)
Taken from the area surrounding the car (10x15 pixels)
The approximate original location of each fragment is recorded
Detection is based on a 5x5 window around the original location
Training and test sets are drawn from the remaining images (473 car and 461 non-car)
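The location-restricted detection can be sketched as follows, assuming the recorded location is the fragment's top-left corner and that a 5x5 window means offsets of up to 2 pixels in each direction. Both readings, and the function names, are assumptions.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def detect_fragment(image, fragment, origin, theta, radius=2):
    """Return (X_i, best similarity) for a fragment searched near its origin.

    origin is the (row, col) of the fragment's top-left corner in the image it
    was cut from; radius=2 gives the 5 x 5 window of candidate positions.
    """
    fh, fw = fragment.shape
    y0, x0 = origin
    best = -1.0
    for y in range(y0 - radius, y0 + radius + 1):
        for x in range(x0 - radius, x0 + radius + 1):
            # Skip placements that would fall outside the image.
            if y >= 0 and x >= 0 and y + fh <= image.shape[0] and x + fw <= image.shape[1]:
                best = max(best, ncc(image[y:y + fh, x:x + fw], fragment))
    return int(best > theta), best

rng = np.random.default_rng(1)
source = rng.random((14, 21))
fragment = source[4:8, 6:10]  # cut at origin (4, 6)
# A new image in which the same structure appears one pixel lower and one pixel left.
shifted = np.roll(source, shift=(1, -1), axis=(0, 1))
print(detect_fragment(shifted, fragment, origin=(4, 6), theta=0.9)[0])  # 1
```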
Training procedure: 200 positive + 200 negative samples from the reshuffled data are used to select useful fragments and learn the classification parameters
Testing procedure: 273 positive + 261 negative samples from the remaining data
Experiments
Feature selection is computationally challenging
Selected fragments are of intermediate complexity (in size and resolution), trading specificity against relative frequency (e.g. a whole face vs. an eyebrow)
Figure: triangles show Merit = I(X;C); squares show Weight = log2(P(X|C) / P(X|not C))
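Both quantities can be computed from detection counts. This sketch reuses the illustrative counts from the earlier example (44 of 100 car images, 6 of 100 non-car images) and takes X as "fragment detected"; the function name is hypothetical.

```python
import math

def merit_and_weight(hits_class, n_class, hits_other, n_other):
    """Merit = I(X;C) in bits; Weight = log2(P(X=1|C) / P(X=1|not C))."""
    n = n_class + n_other
    # Joint counts for the (detected?, in class?) cells
    cells = {(1, 1): hits_class, (1, 0): hits_other,
             (0, 1): n_class - hits_class, (0, 0): n_other - hits_other}
    px = {1: hits_class + hits_other, 0: n - hits_class - hits_other}
    pc = {1: n_class, 0: n_other}
    merit = sum(k / n * math.log2(k * n / (px[x] * pc[c]))
                for (x, c), k in cells.items() if k > 0)
    weight = math.log2((hits_class / n_class) / (hits_other / n_other))
    return merit, weight

merit, weight = merit_and_weight(44, 100, 6, 100)
print(round(merit, 3), round(weight, 2))  # 0.153 2.87
```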
(2) Wavelet-based classifiers: simplified wavelets at 2 scales
(larger, more complex wavelet patches are probably too big for the image size)
Training procedure: 200 positive + 200 negative samples from the (reshuffled) full database; training on the car location (10x15 pixels) for both positive and negative samples; selects the classification parameters
Testing procedure: the remaining samples of each cross-validation set; classification is based on the maximum response over the (10x15 pixel) windows in the images
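Classification by maximum window response can be sketched as below, assuming any trained classifier is wrapped as a function that scores one 10x15 window. The mean-brightness scorer is a hypothetical placeholder for that classifier, and the threshold is an assumption.

```python
import numpy as np

def max_window_response(image, score_window, win=(10, 15)):
    """Maximum classifier response over all win-sized windows of the image."""
    wh, ww = win
    h, w = image.shape
    return max(score_window(image[y:y + wh, x:x + ww])
               for y in range(h - wh + 1) for x in range(w - ww + 1))

def classify_image(image, score_window, threshold=0.0):
    """1 (car present) if the best window's response exceeds the threshold."""
    return int(max_window_response(image, score_window) > threshold)

def mean_score(window):
    """Hypothetical scorer: mean brightness stands in for a trained classifier."""
    return float(window.mean())

image = np.zeros((14, 21))
image[2:12, 3:18] = 1.0  # a bright 10 x 15 "object"
print(classify_image(image, mean_score, threshold=0.5))  # 1
```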
Experiments
Classification results (ROC curves): the fragment-based classifiers performed better. Detection rate at 5% false alarms (FA): fragments 92% (with both classifiers) vs. wavelets 80% (TAN) and 70% (LSVM)
The choice of classifier has a great impact on the wavelet scheme but no impact on the fragment-based scheme; a polynomial (degree 3) SVM classifier did not improve classification
Experiments
Classification results: information gain (at 5% FA), $\Delta I = I(C;\hat{C}_{TAN}) - I(C;\hat{C}_{LSVM})$, averaged over the cross-validation iterations
i.e. the difference between the class information provided by the two classifiers
(perfect classification I=1, random I=0)
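The information measure can be sketched directly from true labels and classifier decisions on one test split. With balanced classes, H(C) = 1 bit, so perfect classification gives I = 1 and decisions independent of the class give I = 0. Function names are hypothetical.

```python
import math
from collections import Counter

def class_information(true, pred):
    """I(C; C_hat) in bits between true labels and classifier decisions."""
    n = len(true)
    joint, pc, ph = Counter(zip(true, pred)), Counter(true), Counter(pred)
    return sum(k / n * math.log2(k * n / (pc[c] * ph[h]))
               for (c, h), k in joint.items())

def information_gain(true, pred_tan, pred_lsvm):
    """Delta I = I(C; C_hat_TAN) - I(C; C_hat_LSVM) for one test split."""
    return class_information(true, pred_tan) - class_information(true, pred_lsvm)

true = [1, 1, 0, 0]
print(information_gain(true, [1, 1, 0, 0], [1, 0, 1, 0]))  # 1.0
```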
Complex classification benefited the wavelets, not the fragments
Figure legend: crosses = wavelets, squares = fragments
Experiments
The complex classifier's parameters involve second-order statistics, so it requires more data to be effective
Lower performance of the complex classifier is due to overfitting of its parameters
The fragment-based scheme relies on the features themselves, not on higher-order interactions between them
Is the poor wavelet performance due to quantization of the wavelet coefficients? Linear and kernel SVMs tested with the full wavelet coefficients showed decreased performance in this low-resolution application
Experiments
Other object classes were also tested (faces, animals, etc.), with similar results
A back-propagation neural network was also used to extract face features:
these features had low information content (less than 10% of that of fragments), and performance decreased severely with a linear classifier
Experiments
Feature type and training difficulty: number of training examples vs. generalization capability, measured by I(C;Ĉ) at 5% FA
Crosses: wavelets + TAN
Squares: fragments + LSVM
Fragments learn the common structure that discriminates between the classes from fewer examples.
Experiments
For features that are conditionally independent:
the less information each feature carries, the more features are needed
Note: features are often selected to reduce conditional dependence
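A numerical illustration of this trade-off, assuming equal class priors and n conditionally independent binary features that all share the same hit rate P(Xi=1|C=1) and false-alarm rate P(Xi=1|C=0); the rates used are hypothetical. Because the number of detected features k is then a sufficient statistic for C, I(C;X1..Xn) = 1 - H(C|k).

```python
import math

def info_of_n_features(n, p_hit, p_fa):
    """I(C; X1..Xn) in bits for n conditionally independent binary features."""
    h_c_given_x = 0.0
    for k in range(n + 1):
        l1 = math.comb(n, k) * p_hit**k * (1 - p_hit)**(n - k)  # P(k | C=1)
        l0 = math.comb(n, k) * p_fa**k * (1 - p_fa)**(n - k)    # P(k | C=0)
        pk = 0.5 * (l1 + l0)
        if pk == 0:
            continue
        q = 0.5 * l1 / pk  # posterior P(C=1 | k)
        if 0 < q < 1:
            h_c_given_x += pk * (-q * math.log2(q) - (1 - q) * math.log2(1 - q))
    return 1.0 - h_c_given_x

def features_needed(target_bits, p_hit, p_fa, max_n=200):
    """Smallest n reaching target_bits of class information, or None."""
    for n in range(1, max_n + 1):
        if info_of_n_features(n, p_hit, p_fa) >= target_bits:
            return n
    return None

print(features_needed(0.5, 0.9, 0.1))  # 1: one informative feature suffices
print(features_needed(0.5, 0.6, 0.4))  # many weak features are needed
```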
Discussion and Conclusion
Common approaches for object recognition with simple features:
Simple generic features do not perform well with linear classification
Use more complex classification stages (neural nets, etc.)
Issues: no general optimal method; case-by-case techniques must be evaluated
Mapping to higher-dimensional spaces where linear separation is possible
SVM and kernel-based methods
Issues: no general method to find a good mapping; case-by-case mappings must be evaluated
Discussion and Conclusion
The approach used with informative features:
Extracting information-rich, class-specific features and combining them with a linear classification scheme
Results demonstrated the superiority of this approach
Dimensionality:
Linear separation can be obtained in a low-dimensional feature space if the features are chosen to have high information content
If the features have low information content, the feature space must be high-dimensional
Future Studies
Combine simple and complex features in multi-stage classification
1. Use simpler features for initial filtering and identification of sub-regions of interest
2. Apply informative features on the selected regions
Hierarchical informative features: fragments are organized by type (fragments covering the same area of the object), and features of a common type are represented in terms of simpler fragments