Object Recognition with Informative Features and Linear Classification

A paper by Michel Vidal-Naquet and Shimon Ullman

Presented by Aciel Eshky and Marco Brigham

Overview

Introduction

Features: Informative Features, Simple Generic Features

Classification Schemes: Classification by linear separation, Tree Augmented Network

Experiments

Conclusion

Future Research

Introduction – Trade-off

Complexity of features

vs.

Complexity of classification scheme

Introduction – Classification Process

1. Extract features: represent objects using these features

2. Apply a classifier to the measured features: reach a decision regarding the represented class

Introduction – Features and Schemes

Two types of features
– Informative features (class-specific): I
– Generic features (simple, non-class-specific): G

Two classification schemes
– Simple linear separation (LSVM): S
– Complex classification scheme (TAN): C

Introduction - Proposed Strategy

Generic Features: simple, non-class-specific, in a high-dimensional space

Complex classification schemes are more suitable

Informative Features: rich, class-specific, in a low-dimensional space

Simple linear classification schemes are more suitable

Introduction – Proposed Strategy

Maximize feature information

in order to

use simple and efficient linear classification

Informative Features

Selection of informative features that convey maximal information about the class


Selection of informative features

• Generate set {Fi} of candidate fragments

• Compute for each Fi the optimal threshold θ for visual similarity detection

• Select set of features S={Fj} that convey maximal information about the class C


(1) Set {Fi} of candidate fragments

Source: training images of the class

Rectangular crops of different sizes and at different locations (typically 10^4 fragments)


(2) Visual similarity measure

Sliding the fragment over the image and measuring the normalized cross-correlation

Alternative similarity measures have also been used: ordinal ranking of pixels + intensity gradient; colour, texture and 3D cues
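The sliding-window similarity measure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the exhaustive search over all positions, and the random test image are assumptions made for the example (a later slide restricts detection to a 5x5 window around the fragment's original location).

```python
import numpy as np

def normalized_cross_correlation(patch, fragment):
    """NCC between two equally sized grey-level arrays, in [-1, 1]."""
    p = patch.astype(float) - patch.mean()
    f = fragment.astype(float) - fragment.mean()
    denom = np.sqrt((p * p).sum() * (f * f).sum())
    if denom == 0.0:
        # A constant patch or fragment has no defined correlation; treat as no match.
        return 0.0
    return float((p * f).sum() / denom)

def max_similarity(image, fragment):
    """Slide the fragment over every image position and keep the best NCC."""
    fh, fw = fragment.shape
    ih, iw = image.shape
    best = -1.0
    for r in range(ih - fh + 1):
        for c in range(iw - fw + 1):
            window = image[r:r + fh, c:c + fw]
            best = max(best, normalized_cross_correlation(window, fragment))
    return best

# Hypothetical data: a random 14 x 21 image and a fragment cropped from it.
rng = np.random.default_rng(0)
image = rng.random((14, 21))
fragment = image[3:9, 5:13].copy()
score = max_similarity(image, fragment)  # the crop matches itself perfectly
```

Because the measure subtracts the mean and divides by the norms, it is unaffected by a uniform change of brightness or contrast in the patch.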


(2) Detection threshold

Fragment Fi is detected in an image if the visual similarity measure is above the threshold θi

Xi is the presence of fragment Fi in an image

θi is obtained by maximizing the mutual information I(Xi;C) for the images in the training set


Example: training set with 100 car and 100 non-car images

For a given value of θi, the fragment Fi is detected 44 times in car images and 6 times in non-car images

H(C) = 1, H(C|Xi) = 0.847, I(Xi;C) = 0.153

A higher value for θi, i.e. higher minimum visual similarity for the fragment Fi to be detected in an image, will lower the detection frequency in the non-car images but also in the car images.

The mutual information I(Xi;C) measures the information that is conveyed about the class C by the presence or absence of fragment Fi in an image
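The numbers in the car example can be checked with a few lines of code, and the same function can drive the threshold search. This is a sketch under stated assumptions: the function names are hypothetical, the candidate thresholds are simply the observed similarity scores, and "detected" is taken to mean a score at or above θ.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(n_pos_det, n_pos, n_neg_det, n_neg):
    """I(X;C) in bits from detection counts in class and non-class images."""
    n = n_pos + n_neg
    h_c = entropy([n_pos / n, n_neg / n])
    h_c_given_x = 0.0
    # Two groups of images: fragment detected (X=1) and not detected (X=0).
    for pos, neg in ((n_pos_det, n_neg_det),
                     (n_pos - n_pos_det, n_neg - n_neg_det)):
        total = pos + neg
        if total:
            h_c_given_x += (total / n) * entropy([pos / total, neg / total])
    return h_c - h_c_given_x

def best_threshold(scores_pos, scores_neg):
    """Scan the observed scores and keep the threshold that maximizes I(X;C)."""
    best_theta, best_mi = None, -1.0
    for theta in sorted(set(scores_pos) | set(scores_neg)):
        det_pos = sum(s >= theta for s in scores_pos)
        det_neg = sum(s >= theta for s in scores_neg)
        mi = mutual_information(det_pos, len(scores_pos), det_neg, len(scores_neg))
        if mi > best_mi:
            best_theta, best_mi = theta, mi
    return best_theta, best_mi

# Slide example: 44 detections in 100 car images, 6 in 100 non-car images.
mi = mutual_information(44, 100, 6, 100)
print(round(mi, 3))  # 0.153

# Hypothetical similarity scores for three class and three non-class images.
theta, theta_mi = best_threshold([0.9, 0.8, 0.7], [0.4, 0.2, 0.1])
```

Raising θ above the best value loses detections in the class images; lowering it admits detections in the non-class images. Both reduce I(X;C).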


(3) Greedy search

The most informative fragments are added in succession to the set S, until additional fragments no longer increase the information content of S

– S={X1}, where X1 is the fragment with the highest I(X1;C)

– X2 is selected on the basis of how much information it adds with respect to X1, i.e. X2 maximizes I(X2,X1;C) - I(X1;C)

– Iteration n+1: Xn+1 maximizes the minimal addition of information over the elements of Sn, i.e. the minimum over Xj in Sn of I(Xn+1,Xj;C) - I(Xj;C)
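The greedy max-min selection above can be sketched on a binary detection matrix. This is an illustration under assumptions, not the authors' code: the function names, the `min_gain` stopping tolerance, the `max_features` cap and the toy data are all hypothetical.

```python
import numpy as np

def mi_discrete(a, c):
    """I(A;C) in bits for two integer-coded discrete arrays of equal length."""
    a = np.asarray(a)
    c = np.asarray(c)
    mi = 0.0
    for av in np.unique(a):
        for cv in np.unique(c):
            p_ac = np.mean((a == av) & (c == cv))
            if p_ac > 0:
                mi += p_ac * np.log2(p_ac / (np.mean(a == av) * np.mean(c == cv)))
    return float(mi)

def greedy_select(X, c, max_features=10, min_gain=1e-6):
    """Max-min greedy selection over the columns of a 0/1 detection matrix X."""
    n_feat = X.shape[1]
    single = [mi_discrete(X[:, k], c) for k in range(n_feat)]
    selected = [int(np.argmax(single))]          # X1: highest I(X;C)
    while len(selected) < max_features:
        best_k, best_gain = None, min_gain
        for k in range(n_feat):
            if k in selected:
                continue
            # The pair (X_k, X_j) is coded as one variable: 2*X_k + X_j.
            gain = min(mi_discrete(2 * X[:, k] + X[:, j], c) - single[j]
                       for j in selected)
            if gain > best_gain:
                best_k, best_gain = k, gain
        if best_k is None:                       # nothing adds information
            break
        selected.append(best_k)
    return selected

# Toy data: column 1 duplicates column 0; column 2 fixes column 0's one error.
c = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 1],
              [1, 1, 0], [1, 1, 0], [1, 1, 0], [1, 1, 0]])
selected = greedy_select(X, c)
```

In the toy data the duplicate column is never selected, because it adds nothing to the fragment already in S. The complementary column is selected even though its individual information is low.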

Simple Generic Features

Individually: little information, not class-specific

Correct combinations capture class-specific visual properties

Wavelet Transform

Captures the image's local frequency and orientation information:

Within an analysis window, at different scales, at every location

By means of a kernel function, which characterises the wavelet transform by determining the type of feature to which the transform is sensitive

Wavelet Transform

Choice of wavelet feature

Tested three types of wavelets with different resolutions

Working with low resolution images

Third type chosen for best performance

Quantization

Binarize the wavelet transform for comparison purposes, representing the presence or absence of a feature

Classification by linear separation

During classification, the system generates a feature vector X = [X1,...,Xn]: the encoding of the image in feature space, obtained by measuring the presence of specific visual features in the image

Final decision: plug the vector into the classification function f(X), which estimates presence or absence of the class and returns 0 if not present, 1 if present

Classification by linear separation

Linear discriminant classification function learned by Linear Support Vector Machine training (LSVM)

Optimal: best performance given new data

When the data is not linearly separable, the minimization function is combined with a cost that depends on the number of misclassifications
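A linear classification function of this kind can be sketched with NumPy alone. This is a rough illustration, not the training procedure used in the paper: it minimizes the usual soft-margin objective (squared weight norm plus a hinge-loss cost on margin violations) by plain subgradient descent, and the function names, step size, epoch count and toy data are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, c_penalty=1.0, epochs=2000, lr=0.01):
    """Soft-margin linear SVM by batch subgradient descent; y holds 0/1 labels."""
    t = 2 * y - 1                      # map labels {0, 1} to {-1, +1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = t * (X @ w + b)
        viol = margins < 1             # samples inside the margin or misclassified
        # Subgradient of 0.5*||w||^2 + c_penalty * sum(hinge losses).
        grad_w = w - c_penalty * (X[viol].T @ t[viol])
        grad_b = -c_penalty * t[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def classify(X, w, b):
    """f(X): 1 if the class is judged present, 0 otherwise."""
    return (X @ w + b > 0).astype(int)

# Hypothetical binary feature vectors: feature 0 is present in every class image.
X = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1],
              [0, 0, 1], [0, 1, 1], [0, 0, 0]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])
w, b = train_linear_svm(X, y)
pred = classify(X, w, b)
```

The `c_penalty` parameter plays the role of the cost on misclassifications: a larger value penalizes margin violations more heavily relative to the margin width.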

Tree Augmented Network (TAN)

Models pair-wise statistical dependencies between features to better approximate the underlying distribution (rich model)

Representation: a Bayesian network in which nodes represent the features or the class variable, and connections represent statistical correlation


Hierarchical implication: each feature depends on (1) the class and (2) its parent feature (tree structure)

The restriction to a tree structure limits modelling power but aids computation


Tree structure found during learning by searching for maximum weighted spanning trees

Weight of the edge connecting Xi and Xj = mutual information I(Xi;Xj)

Class conditional distribution given by:

Bayesian Decision Rule:
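The formulas for the class conditional distribution and the decision rule are not reproduced in this text. Assuming the standard TAN factorization, in which each feature is conditioned on the class and on its single parent in the tree, and the standard Bayes rule for two classes, they would read as below. The parent index π(i) and the notation for the absent class are introduced here for illustration.

```latex
% Assumption: standard TAN factorization. \pi(i) is the tree parent of X_i;
% the root feature has no parent and is conditioned on C only.
\[
P(X_1,\dots,X_n \mid C) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid X_{\pi(i)},\, C\bigr)
\]

% Assumption: standard Bayes decision rule, class present (C) vs. absent (\bar{C}).
\[
\hat{C} \;=\;
\begin{cases}
1 & \text{if } P(C)\,P(X_1,\dots,X_n \mid C) \;>\; P(\bar{C})\,P(X_1,\dots,X_n \mid \bar{C}) \\
0 & \text{otherwise}
\end{cases}
\]
```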

Experiments

Performance evaluation: 4 combinations of feature extraction and classifier

Features: fragments / wavelets; classifiers: linear SVM / TAN

Classification task: detecting side views of cars in 14 x 21 pixel images

Full database: 573 positives, 461 negatives

Cars occupy a 10 x 15 pixel box

Feature space dimension: 168 binary features

At each iteration the database is reshuffled to generate training and test sets

20 cross-validation iterations to generate the ROC curve
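The reshuffling protocol can be sketched as a generator of index splits. This is an assumed reading of the procedure, using the database sizes above and the 200 positive + 200 negative training samples described for the full-database case; the function name and seed are hypothetical.

```python
import numpy as np

def reshuffled_splits(n_pos=573, n_neg=461, n_train=200, iterations=20, seed=0):
    """Yield (train, test) index pairs; each side is a (positives, negatives) tuple."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        pos = rng.permutation(n_pos)   # fresh shuffle of positive image indices
        neg = rng.permutation(n_neg)   # fresh shuffle of negative image indices
        train = (pos[:n_train], neg[:n_train])
        test = (pos[n_train:], neg[n_train:])
        yield train, test

splits = list(reshuffled_splits())
(train_pos, train_neg), (test_pos, test_neg) = splits[0]
```

Each iteration trains on a different random subset and tests on the rest, so the ROC curve reflects an average over 20 train/test partitions rather than a single split.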

Experiments

(1) Fragment-based classifiers

Initial fragment pool: 59200 fragments from the first 100 car images
– Sizes: 4x4 to 10x14 (not the full car)
– Taken from the area surrounding the car (10x15 pixels)
– Approximate original fragment location recorded
– Detection restricted to a 5x5 window around the original location

Training and test sets drawn from the remaining images (473 car and 461 non-car)

Training procedure: 200 positive + 200 negative samples from the reshuffled data; selects useful fragments and learns the classification parameters

Testing procedure: 273 positive + 261 negative samples from the remaining data

Experiments

Feature selection is computationally challenging

Intermediate complexity fragments (size and resolution)

Specificity vs. relative frequency (face vs. eyebrow)

Figure legend: ▲ Merit = I(X;C); ■ Weight = log2( P(X|C) / P(X|not C) )

(2) Wavelet-based classifiers

Simplified wavelet at 2 scales (patches of other complexity are probably too big for the image size)

Training procedure: 200 positive + 200 negative samples from the (reshuffled) full database; training on the car location (10x15 pixels) for both positive and negative samples; selects the classification parameters

Testing procedure: remaining samples of each cross-validation set; classification based on the maximum response over (10x15 pixel) windows in the images

Experiments

Classification results

ROC curves: fragment-based classifiers performed better. At 5% false alarms (FA): fragments 92% (all classifiers) vs. wavelets 80% (TAN) and 70% (LSVM)

Classifier choice has a great impact on the wavelet scheme and no impact on the fragment-based scheme

A polynomial (degree 3) SVM classifier did not improve classification

Experiments

Classification results: information gain (at 5% FA)

ΔI = I(C; Ĉ_TAN) - I(C; Ĉ_LSVM), averaged over the cross-validation iterations

ΔI is the difference between the class information provided by the two classifiers (perfect classification: I = 1; random: I = 0)
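The information a classifier's decision carries about the true class can be computed from its operating point. The sketch below assumes equal class priors and treats the reported percentages at 5% FA as detection (hit) rates; both are assumptions for illustration, as is the function name.

```python
from math import log2

def classifier_information(hit_rate, fa_rate, p_pos=0.5):
    """I(C; C_hat) in bits for a binary classifier at a given operating point."""
    # Joint distribution over (true class, predicted class).
    joint = {
        (1, 1): p_pos * hit_rate,
        (1, 0): p_pos * (1 - hit_rate),
        (0, 1): (1 - p_pos) * fa_rate,
        (0, 0): (1 - p_pos) * (1 - fa_rate),
    }
    p_true = {1: p_pos, 0: 1 - p_pos}
    p_pred = {1: joint[1, 1] + joint[0, 1], 0: joint[1, 0] + joint[0, 0]}
    return sum(p * log2(p / (p_true[c] * p_pred[h]))
               for (c, h), p in joint.items() if p > 0)

# Wavelet features at 5% false alarms: 80% with TAN, 70% with LSVM.
i_tan = classifier_information(0.80, 0.05)
i_lsvm = classifier_information(0.70, 0.05)
delta_i = i_tan - i_lsvm   # positive: the complex classifier adds information

perfect = classifier_information(1.0, 0.0)   # 1 bit
chance = classifier_information(0.5, 0.5)    # 0 bits
```

With equal priors the two limiting cases match the scale quoted on the slide: a perfect classifier yields 1 bit and a classifier at chance yields 0.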

Complex classification benefited wavelets, not fragments

Crosses: Wavelets Squares: fragments

Experiments

Complex classifier parameters involve second-order statistics and require more data to be effective

Lower performance is due to overfitting of the complex classifier parameters: the complex classifier requires more data to be accurate

The fragment-based scheme relies on the features themselves, with no reliance on higher-order interactions

Is the poor performance due to quantization of the wavelet coefficients? Linear and kernel SVMs tested with the full wavelet coefficients showed decreased performance in the low-resolution application

Experiments

Different object classes used (faces, animals, etc.): similar results

Also used a back-propagation neural network to extract face features: low information content of features (less than 10% of that of fragments) and a severe performance decrease with a linear classifier

Experiments

Feature type and training difficulty: number of training examples vs. generalization capability, measured by I(C; Ĉ) at 5% FA

Crosses: Wavelets + TAN

Squares: fragments + LSVM

Fragments learn common structure that discriminates between classes with fewer examples.

Experiments

For features that are conditionally independent:

Less information content more features needed

Note: features are often selected to reduce conditional dependence

Discussion and Conclusion

Common approaches for object recognition with simple features:

Simple generic features do not perform well with linear classification

Use more complex classification stages (neural nets, etc)

Issues: no general optimal method, case-by-case techniques must be evaluated

Mapping to higher dimensional spaces where linear separation is possible

SVM and kernel based methods

Issues: no general method to find good mapping, case-by-case mappings must be evaluated

Discussion and Conclusion

The approach used with informative features:

Extracting information rich + class specific features combined with linear classification scheme

Results demonstrated the superiority of this approach

Dimensionality:

Linear separation can be obtained in a low-dimensional feature space if features are chosen with high information content

If the information content is low, the feature space is high-dimensional

Future Studies

Combine simple and complex features in multi-stage classification

1. Use simpler features for initial filtering and identification of sub-regions of interest

2. Apply informative features on selected region

Hierarchical informative features: fragments are organized by type (covering the same area of the object); features of a common type are represented in terms of simpler fragments

References

[1] S. Ullman and M. Vidal-Naquet. Object recognition with informative features and linear classification.

[2] S. Ullman, E. Sali and M. Vidal-Naquet. Visual features of intermediate complexity and their use in classification.

[3] S. Ullman, E. Sali and M. Vidal-Naquet. Fragment-based approach to object representation and classification.

[4] S. Ullman. Object recognition and segmentation by fragment-based hierarchy.
