Class 3: Feature Coding and Pooling
Liangliang Cao, Feb 7, 2013
EECS 6890 - Topics in Information Processing
Spring 2013, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch
Problem
The Myth of Human Vision
Can you understand the following?
婴儿 बचचा
People may have difficulties understanding different texts (the words above and below all mean "baby"), but NOT images.
아가 bebé
Photo courtesy of luster
Visual Recognition And Search, Columbia University, Spring 2013
Problem
Can a computer vision system recognize objects or scenes like humans do?
Why Is This Problem Important?
Traditional companies, search engines, mobile apps
Problem
Examples of Object Recognition Datasets
http://www.vision.caltech.edu
Problem
http://groups.csail.mit.edu/vision/SUN/
Overview of Classification Model
Overview of the coding and pooling choices covered in this class:

Method                      | Coding                              | Pooling
Histogram of SIFT           | Vector quantization                 | Histogram aggregation
Soft quantization           | Uncertainty-based quantization      | Histogram aggregation
Sparse coding               | Sparse-coding quantization          | Max pooling
Fisher vector / Supervector | GMM probability estimation          | GMM adaptation / aggregation
Coding: map local features into a compact representation.
Pooling: aggregate these compact representations together.
Outline
• Histogram of local features
• Bag of words model
• Soft quantization and sparse coding
• Fisher vector and supervector
Histogram of Local Features
Bag of Words Models
Recall of Last Class
• Powerful local features: DoG, Hessian/Harris, dense sampling
• Non-fixed number of local regions per image!
Bag of Words Models
Recall of Last Class (2)
• Histograms can provide a fixed-size representation of images
• Spatial pyramid/gridding can enhance the histogram representation with spatial information
Bag of Words Models
Histogram of Local Features
dim = #codewords
Bag of Words Models
Histogram of Local Features (2)
dim = #codewords × #grids
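The dimension count above can be checked with a small sketch. It assumes keypoint positions normalized to [0,1)² and codeword ids already computed; both inputs here are toy values for illustration, not data from the slides.

```python
import numpy as np

def grid_histograms(positions, ids, n_words, grid=(2, 2)):
    """Concatenate per-cell codeword histograms: dim = n_words * n_cells."""
    gx, gy = grid
    # positions in [0,1)^2; map each keypoint to a grid cell
    cx = np.minimum((positions[:, 0] * gx).astype(int), gx - 1)
    cy = np.minimum((positions[:, 1] * gy).astype(int), gy - 1)
    cell = cy * gx + cx
    hists = [np.bincount(ids[cell == c], minlength=n_words) for c in range(gx * gy)]
    return np.concatenate(hists)

# Toy input: four keypoints, one per cell of a 2x2 grid, 3 codewords
pos = np.array([[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.9]])
ids = np.array([0, 1, 1, 2])
h = grid_histograms(pos, ids, n_words=3)
print(h.shape)  # (12,) = 3 codewords x 4 grid cells
```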
Bag of Words Models
Local Feature Quantization
Slide courtesy of Fei-Fei Li
Bag of Words Models
Local Feature Quantization
Bag of Words Models
Local Feature Quantization
- Vector quantization
- Dictionary learning
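As a sketch of the two steps above, hard vector quantization followed by histogram pooling might look like the following; the 2-D toy codebook stands in for a k-means-trained SIFT dictionary (it is an illustrative assumption, not the slides' data).

```python
import numpy as np

def quantize(features, codebook):
    """Assign each local feature to its nearest codeword (hard VQ)."""
    # Squared Euclidean distance between every feature and every codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(features, codebook):
    """L1-normalized histogram of codeword assignments: a fixed-size image representation."""
    ids = quantize(features, codebook)
    hist = np.bincount(ids, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: 2-D "descriptors" and a 3-word codebook
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
features = np.array([[0.1, 0.2], [4.8, 5.1], [9.7, 0.3], [0.0, 0.1]])
h = bow_histogram(features, codebook)
print(h)  # half the features fall on codeword 0
```

The histogram has one bin per codeword, so its size is fixed no matter how many local regions the image produced.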
Histogram of Local Features
Dictionary for Codewords
Bag of Words Models
Most slides in this section are courtesy of Fei-Fei Li
Object Bag of ‘words’
Bag of Words Models
Underlying Assumptions - Text
Two example passages with the salient words a bag-of-words model would extract:

Passage 1 (on vision): "Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messa…"
Salient words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

Passage 2 (on trade): "China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a pred…"
Salient words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value
Bag of Words Models
Underlying Assumptions - Image
The bag-of-words pipeline:
Learning: feature detection & representation → image representation (K-means codebook) → category models (and/or) classifiers
Recognition: feature detection & representation → image representation → category decision
Bag of Words Models
Borrowing Techniques from Text Classification
• pLSA
• Naïve Bayesian Model

Notation:
• w_n: each patch in an image; w_n = [0, 0, …, 1, …, 0, 0]^T
• w: a collection of all N patches in an image; w = [w_1, w_2, …, w_N]
• d_j: the j-th image in an image collection
• c: category of the image
• z: theme or topic of the patch
Bag of Words Models
Probabilistic Latent Semantic Analysis (pLSA)
Graphical model: d → z → w (plates over N words, D documents). Hofmann, 2001

Latent Dirichlet Allocation (LDA)
Graphical model: c → z → w (plates over N words, D documents). Blei et al., 2001
Bag of Words Models
Probabilistic Latent Semantic Analysis (pLSA)
Graphical model: d → z → w (plates over N words, D documents); discovered topic example: "face"
Sivic et al., ICCV 2005
Bag of Words Models
Graphical model: d → z → w, with K topics (plates over N words, D documents)

p(w_i | d_j) = Σ_{k=1}^{K} p(w_i | z_k) p(z_k | d_j)

(observed codeword distribution per image) = (codeword distributions per theme/topic) × (theme distributions per image)

Parameters are estimated by EM or Gibbs sampling.
Slide credit: Josef Sivic
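The pLSA decomposition p(w_i|d_j) = Σ_k p(w_i|z_k) p(z_k|d_j) can be fit by EM. A toy numpy sketch on a small synthetic term-document count matrix (K=2 topics; not the image data from the slides):

```python
import numpy as np

def plsa(counts, K, n_iter=100, seed=0):
    """EM for pLSA on a document-word count matrix (D x W).
    Returns p(w|z) of shape (K, W) and p(z|d) of shape (D, K)."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: p(z|d,w) proportional to p(w|z) p(z|d), shape (D, W, K)
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate from expected counts n(d,w) p(z|d,w)
        nz = counts[:, :, None] * post
        p_w_z = nz.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = nz.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Toy corpus: two clearly separated "topics" over 4 words
counts = np.array([[8, 7, 0, 0], [9, 6, 0, 0], [0, 0, 7, 8], [0, 0, 6, 9]], float)
p_w_z, p_z_d = plsa(counts, K=2)
print(p_z_d.round(2))  # each document concentrates on one topic
```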
Bag of Words Models
Recognition using pLSA
z* = argmax_z p(z | d)
Slide credit: Josef Sivic
Bag of Words Models
Scene Recognition using LDA: example scene category "beach"
Latent Dirichlet Allocation (LDA) graphical model: c → z → w (plates over N words, D documents)
Fei-Fei et al., ICCV 2005
Bag of Words Models
Spatial-Coherent Latent Topic Model
Cao and Fei-Fei, ICCV 2007
Bag of Words Models
Simultaneous Segmentation and Recognition
Bag of Words Models
Pros and Cons
Images differ from texts!
Bag of words models are good at:
- Modeling prior knowledge
- Providing intuitive interpretation
But these models suffer from:
- Loss of information when quantizing "visual words" → calls for better coding
- Loss of spatial information → calls for better pooling
Soft Quantization and Sparse Coding
Soft Quantization
Hard Quantization
Soft Quantization
Uncertainty-Based Quantization
Model the uncertainty across multiple codewords.
van Gemert et al., Visual Word Ambiguity, PAMI 2009
Soft Quantization
Intuition of UNC
Hard quantization
Soft quantization based on uncertainty
van Gemert et al., Visual Word Ambiguity, PAMI 2009
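A sketch of the soft-quantization idea: each feature distributes unit mass over codewords with a Gaussian kernel of bandwidth sigma. The kernel choice and the toy codebook are illustrative assumptions; they follow van Gemert et al.'s kernel-codebook idea in spirit, not as an exact reimplementation.

```python
import numpy as np

def soft_histogram(features, codebook, sigma=1.0):
    """Soft assignment u_k(x) proportional to exp(-||x - b_k||^2 / (2 sigma^2)),
    pooled over features into a soft histogram instead of hard counts."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)    # each feature distributes unit mass
    return w.sum(axis=0) / len(features)

# A feature exactly between two codewords keeps its ambiguity
codebook = np.array([[0.0, 0.0], [4.0, 0.0]])
feat = np.array([[2.0, 0.0]])
h = soft_histogram(feat, codebook, sigma=2.0)
print(h)  # ~[0.5, 0.5] rather than forcing a single codeword
```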
Soft Quantization
Improvement of UNC
Soft Quantization
Sparse Coding-Based Quantization
Hard quantization can be viewed as an "extremely sparse representation" (B is the codebook, c the code):
min_c ||x - Bc||^2   s.t.   ||c||_0 = 1, ||c||_1 = 1, c ≥ 0
A more general, but hard-to-solve, representation uses an ℓ0 penalty:
min_c ||x - Bc||^2 + λ ||c||_0
In practice we consider sparse coding with the ℓ1 relaxation:
min_c ||x - Bc||^2 + λ ||c||_1
Soft Quantization
Yang et al. obtain good recognition accuracy by combining sparse coding with a spatial pyramid and dictionary training. More details will be in the group presentation.
Yang et al., Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009
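The ℓ1 problem can be solved, for instance, with ISTA (iterative soft thresholding). This toy solver and its 3-atom codebook are illustrative assumptions, one of several possible solvers rather than the one used by Yang et al.:

```python
import numpy as np

def sparse_code(x, B, lam=0.1, n_iter=500):
    """ISTA for min_c ||x - B c||^2 + lam * ||c||_1 (one descriptor x, codebook B)."""
    L = 2 * np.linalg.norm(B, 2) ** 2        # Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        g = 2 * B.T @ (B @ c - x)            # gradient of the quadratic term
        z = c - g / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return c

# Toy codebook with 3 atoms (columns); x lies on the second atom
B = np.array([[1.0, 0.0, 0.7],
              [0.0, 1.0, 0.7]])
x = np.array([0.0, 1.0])
c = sparse_code(x, B, lam=0.2)
print(c.round(2))  # weight concentrates on atom 1; the other coefficients are exactly zero
```

Note the contrast with hard quantization: the code is sparse but can spread over a few atoms, and its magnitude reflects the reconstruction.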
Fisher Vector and Supervector
One of the most powerful image/video classification techniques.
Thanks to Zhen Li and Qiang Chen for constructive suggestions on this section.
Fisher Vector and Supervector
Winning Systems
Classification task: 2009, 2010, 2011, 2012
Large Scale Visual Recognition Challenge: 2010, 2011
Fisher Vector and Supervector
Literature

Supervector:
[1] Yan et al., Regression from patch-kernel. CVPR 2008
[2] Zhou et al., SIFT-Bag kernel for video event analysis. ACM Multimedia 2008
[3] Zhou et al., Hierarchical Gaussianization for image classification. ICCV 2009: 1971-1977
[4] Zhou et al., Image classification using super-vector coding of local image descriptors. ECCV 2010

Fisher Vector:
[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification. ECCV 2010
[6] Jégou et al., Aggregating local image descriptors into compact codes. PAMI 2011

These papers are not very easy to read. Let me take a simplified perspective via the coding & pooling framework.
Fisher Vector and Supervector
Coding without Information Loss
• Coding with hard assignment
• Coding with soft assignment
• How to keep all the information?
Fisher Vector and Supervector
An Intuitive Illustration
Coding
Fisher Vector and Supervector
An Intuitive Illustration (continued)
Coding: Component 1, Component 2, Component 3
Fisher Vector and Supervector
An Intuitive Illustration (continued)
Pooling: the coded features falling on each GMM component (Component 1, Component 2, Component 3) are aggregated within that component.
Fisher Vector and Supervector
Implementation of Supervector
In speech (speaker identification), the supervector refers to the stacked means of an adapted GMM:
Supervector = [μ_1^T, μ_2^T, …, μ_K^T]^T (the adapted means of all K components, stacked into one long vector)
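One sketch of computing such a supervector, assuming relevance-MAP adaptation of the means only (the relevance factor r and the toy GMM are illustrative assumptions; this simplifies the full Reynolds-style adaptation):

```python
import numpy as np

def supervector(features, means, sigmas, weights, r=16.0):
    """MAP-adapt the means of a diagonal GMM to one image's descriptors,
    then stack the adapted means (the speech-style supervector)."""
    # Posteriors gamma_t(k) under the universal (diagonal) GMM
    diff = features[:, None, :] - means[None, :, :]               # (T, K, D)
    logp = -0.5 * (diff ** 2 / sigmas[None] ** 2).sum(-1) \
           - np.log(sigmas).sum(-1)[None] + np.log(weights)[None]
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp); gamma /= gamma.sum(axis=1, keepdims=True)
    n_k = gamma.sum(axis=0)                                       # soft counts (K,)
    first = gamma.T @ features                                    # first-order stats (K, D)
    # Relevance-MAP mean update: blend data statistics with the prior mean
    adapted = (first + r * means) / (n_k + r)[:, None]
    return adapted.reshape(-1)                                    # stacked means

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.ones((2, 2)); weights = np.array([0.5, 0.5])
feats = rng.normal([1.0, 1.0], 0.1, size=(200, 2))   # image data near (1, 1)
sv = supervector(feats, means, sigmas, weights, r=1.0)
print(sv.reshape(2, 2))  # component 0's mean moves toward (1,1); component 1 barely moves
```

Components that see no data keep their prior means, which is exactly what makes the stacked vector comparable across images.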
Fisher Vector and Supervector
Interpretation of the Supervector: original distribution vs. new (adapted) distribution
Picture from Reynolds, Quatieri, and Dunn, DSP, 2001
Fisher Vector and Supervector
Normalization of the Supervector
In practice, a normalization process using the covariance matrix often improves performance. Moreover, we can subtract the original mean vector for ease of normalization.
The resulting representation is also called Hierarchical Gaussianization (HG).
Fisher Vector and Supervector
Fisher Vector
Let p(X | λ) be the probability density function with parameter λ.
[Jaakkola and Haussler, NIPS 98] suggested that X can be described by the derivative with respect to λ:
G_X = ∇_λ log p(X | λ)
Now we can define the Fisher kernel
K(X, Y) = G_X^T F^{-1} G_Y
where F = E[G_X G_X^T] is called the Fisher information matrix.
Fisher Vector and Supervector
Fisher Vector with GMM
Consider the Gaussian Mixture Model
p(x | λ) = Σ_k w_k N(x; μ_k, σ_k^2),   λ = {w_k, μ_k, σ_k}
We consider the derivatives with respect to the means and standard deviations (diagonal covariance).
Let γ_t(k) = p(k | x_t) be the posterior (soft assignment) of component k for descriptor x_t. With the GMM, the Fisher vector components can be obtained:
G_{μ,k} = 1/(T √w_k) Σ_t γ_t(k) (x_t - μ_k)/σ_k
G_{σ,k} = 1/(T √(2 w_k)) Σ_t γ_t(k) [ (x_t - μ_k)^2/σ_k^2 - 1 ]
The Fisher vector stacks G_{μ,k} and G_{σ,k} over all K components.
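These two gradients can be written down directly in numpy, following Perronnin et al.'s formulation for a diagonal GMM. The toy GMM parameters below are assumptions for illustration; in practice they come from EM on training descriptors.

```python
import numpy as np

def fisher_vector(features, means, sigmas, weights):
    """Fisher vector of a diagonal GMM: mean and variance gradients,
    stacked over all K components (dimension 2*K*D)."""
    T, D = features.shape
    # Normalized differences (x_t - mu_k) / sigma_k, shape (T, K, D)
    diff = (features[:, None, :] - means[None]) / sigmas[None]
    logp = -0.5 * (diff ** 2).sum(-1) - np.log(sigmas).sum(-1)[None] + np.log(weights)[None]
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp); gamma /= gamma.sum(axis=1, keepdims=True)   # posteriors gamma_t(k)
    # Gradients w.r.t. means and standard deviations
    g_mu = (gamma[:, :, None] * diff).sum(0) / (T * np.sqrt(weights)[:, None])
    g_sig = (gamma[:, :, None] * (diff ** 2 - 1)).sum(0) / (T * np.sqrt(2 * weights)[:, None])
    return np.concatenate([g_mu.reshape(-1), g_sig.reshape(-1)])

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.ones((2, 2)); weights = np.array([0.5, 0.5])
fv = fisher_vector(rng.normal(0, 1, (500, 2)), means, sigmas, weights)
print(fv.shape)  # (8,) = 2 * K * D
```

When the descriptors are drawn exactly from one component, that component's gradients are near zero: the Fisher vector encodes how the data deviates from the universal model.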
Fisher Vector and Supervector
Comparison
• Supervector
• Fisher vector
Fisher Vector and Supervector
Comparison
• Supervector: diagonal covariance matrix, posterior estimation of the means
• Fisher vector: diagonal covariance, with the same derivation
The two representations are almost the same, even though they come from different motivations.
Fisher Vector and Supervector
How to Code Your Own
• Learn from existing code: http://lear.inrialpes.fr/src/inria_fisher/ (Linux or Mac)
• Learn from public GMM code
• Be careful with pitfalls:
- Probabilities are comparable to the machine's rounding error: compute log P instead of P
- Try different normalization strategies
- Try to make the code efficient
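The log P pitfall above is usually handled with the log-sum-exp trick; a minimal sketch (the input here is a hypothetical array of per-component log-joints log(w_k N(x; μ_k, σ_k))):

```python
import numpy as np

def log_posteriors(log_joint):
    """Stable GMM posteriors from per-component log-joints.
    Working in log space avoids underflow when every probability is
    below machine precision."""
    m = log_joint.max(axis=1, keepdims=True)
    # log-sum-exp: log sum_k exp(l_k) = m + log sum_k exp(l_k - m)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)

# Log-probabilities so small that exp() underflows to 0.0 in double precision
lj = np.array([[-1000.0, -1001.0]])
print(np.exp(lj))           # naive route: [[0. 0.]] -- all information lost
print(log_posteriors(lj))   # stable route: ~[[0.731 0.269]]
```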
Summary
Coding and pooling choices covered in this class:

Method                      | Coding                              | Pooling
Histogram of SIFT           | Vector quantization                 | Histogram aggregation
Soft quantization           | Uncertainty-based quantization      | Histogram aggregation
Sparse coding               | Sparse-coding quantization          | Max pooling
Fisher vector / Supervector | GMM probability estimation          | GMM adaptation / aggregation
Summary
• Simplest model: histogram of visual words
• Models with intuitive interpretation: pLSA, LDA, S-LTM, …
• Soft quantization: uncertainty-based quantization and sparse coding
• Very good performance: Fisher vector or supervector