Class 3: Feature Coding and Pooling
Liangliang Cao, Feb 7, 2013
EECS 6890 - Topics in Information Processing
Spring 2013, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch
Problem
The Myth of Human Vision
Can you understand the following?
婴儿 बचचा
People may have difficulties understanding different texts (the words above and below all mean "baby"), but NOT images.
아가 bebé
Photo courtesy of luster
Visual Recognition And Search, Columbia University, Spring 2013
Problem
Can a computer vision system recognize objects or scenes like humans do?
Why Is This Problem Important?
Traditional companies, search engines, mobile apps
Problem
Examples of Object Recognition Datasets
http://www.vision.caltech.edu
Problem
http://groups.csail.mit.edu/vision/SUN/
Overview of Classification Model
Overview of the coding and pooling choices covered in this class:

Method                      | Coding                              | Pooling
Histogram of SIFT           | Vector quantization                 | Histogram aggregation
Soft quantization           | Uncertainty-based quantization      | Histogram aggregation
Sparse coding               | Sparse-coding quantization          | Max pooling
Fisher vector / Supervector | GMM probability estimation          | GMM adaptation / aggregation
Coding: map local features into a compact representation.
Pooling: aggregate these compact representations together.
Outline
• Histogram of local features
• Bag of words model
• Soft quantization and sparse coding
• Fisher vector and supervector
Histogram of Local Features
Bag of Words Models
Recall of Last Class
• Powerful local features: DoG, Hessian/Harris, dense sampling
• Non-fixed number of local regions per image!
Bag of Words Models
Recall of Last Class (2)
• Histograms can provide a fixed-size representation of images
• Spatial pyramid/gridding can enhance the histogram representation with spatial information
Bag of Words Models
Histogram of Local Features
dim = #codewords
Bag of Words Models
Histogram of Local Features (2)
dim = #codewords × #grids
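The dimension count above can be checked with a small sketch. It assumes keypoint positions normalized to [0,1)² and codeword ids already computed; both inputs here are toy values for illustration, not data from the slides.

```python
import numpy as np

def grid_histograms(positions, ids, n_words, grid=(2, 2)):
    """Concatenate per-cell codeword histograms: dim = n_words * n_cells."""
    gx, gy = grid
    # positions in [0,1)^2; map each keypoint to a grid cell
    cx = np.minimum((positions[:, 0] * gx).astype(int), gx - 1)
    cy = np.minimum((positions[:, 1] * gy).astype(int), gy - 1)
    cell = cy * gx + cx
    hists = [np.bincount(ids[cell == c], minlength=n_words) for c in range(gx * gy)]
    return np.concatenate(hists)

# Toy input: four keypoints, one per cell of a 2x2 grid, 3 codewords
pos = np.array([[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.9]])
ids = np.array([0, 1, 1, 2])
h = grid_histograms(pos, ids, n_words=3)
print(h.shape)  # (12,) = 3 codewords x 4 grid cells
```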
Bag of Words Models
Local Feature Quantization
Slide courtesy of Fei-Fei Li
Bag of Words Models
Local Feature Quantization
Bag of Words Models
Local Feature Quantization
- Vector quantization
- Dictionary learning
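As a sketch of the two steps above, hard vector quantization followed by histogram pooling might look like the following; the 2-D toy codebook stands in for a k-means-trained SIFT dictionary (it is an illustrative assumption, not the slides' data).

```python
import numpy as np

def quantize(features, codebook):
    """Assign each local feature to its nearest codeword (hard VQ)."""
    # Squared Euclidean distance between every feature and every codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(features, codebook):
    """L1-normalized histogram of codeword assignments: a fixed-size image representation."""
    ids = quantize(features, codebook)
    hist = np.bincount(ids, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: 2-D "descriptors" and a 3-word codebook
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
features = np.array([[0.1, 0.2], [4.8, 5.1], [9.7, 0.3], [0.0, 0.1]])
h = bow_histogram(features, codebook)
print(h)  # half the features fall on codeword 0
```

The histogram has one bin per codeword, so its size is fixed no matter how many local regions the image produced.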
Histogram of Local Features
Dictionary for Codewords
Bag of Words Models
Most slides in this section are courtesy of Fei-Fei Li
Object Bag of ‘words’
Bag of Words Models
Underlying Assumptions - Text
Two example passages with the salient words a bag-of-words model would extract:

Passage 1 (on vision): "Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messa…"
Salient words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

Passage 2 (on trade): "China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a pred…"
Salient words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value
Bag of Words Models
Underlying Assumptions - Image
The bag-of-words pipeline:
Learning: feature detection & representation → image representation (K-means codebook) → category models (and/or) classifiers
Recognition: feature detection & representation → image representation → category decision
Bag of Words Models
Borrowing Techniques from Text Classification
• pLSA
• Naïve Bayesian Model

Notation:
• w_n: each patch in an image; w_n = [0, 0, …, 1, …, 0, 0]^T
• w: a collection of all N patches in an image; w = [w_1, w_2, …, w_N]
• d_j: the j-th image in an image collection
• c: category of the image
• z: theme or topic of the patch
Bag of Words Models
Probabilistic Latent Semantic Analysis (pLSA)
Graphical model: d → z → w (plates over N words, D documents). Hofmann, 2001

Latent Dirichlet Allocation (LDA)
Graphical model: c → z → w (plates over N words, D documents). Blei et al., 2001
Bag of Words Models
Probabilistic Latent Semantic Analysis (pLSA)
Graphical model: d → z → w (plates over N words, D documents); discovered topic example: "face"
Sivic et al., ICCV 2005
Bag of Words Models
Graphical model: d → z → w, with K topics (plates over N words, D documents)

p(w_i | d_j) = Σ_{k=1}^{K} p(w_i | z_k) p(z_k | d_j)

(observed codeword distribution per image) = (codeword distributions per theme/topic) × (theme distributions per image)

Parameters are estimated by EM or Gibbs sampling.
Slide credit: Josef Sivic
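The pLSA decomposition p(w_i|d_j) = Σ_k p(w_i|z_k) p(z_k|d_j) can be fit by EM. A toy numpy sketch on a small synthetic term-document count matrix (K=2 topics; not the image data from the slides):

```python
import numpy as np

def plsa(counts, K, n_iter=100, seed=0):
    """EM for pLSA on a document-word count matrix (D x W).
    Returns p(w|z) of shape (K, W) and p(z|d) of shape (D, K)."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: p(z|d,w) proportional to p(w|z) p(z|d), shape (D, W, K)
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate from expected counts n(d,w) p(z|d,w)
        nz = counts[:, :, None] * post
        p_w_z = nz.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = nz.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Toy corpus: two clearly separated "topics" over 4 words
counts = np.array([[8, 7, 0, 0], [9, 6, 0, 0], [0, 0, 7, 8], [0, 0, 6, 9]], float)
p_w_z, p_z_d = plsa(counts, K=2)
print(p_z_d.round(2))  # each document concentrates on one topic
```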
Bag of Words Models
Recognition using pLSA
z* = argmax_z p(z | d)
Slide credit: Josef Sivic
Bag of Words Models
Scene Recognition using LDA: example scene category "beach"
Latent Dirichlet Allocation (LDA) graphical model: c → z → w (plates over N words, D documents)
Fei-Fei et al., ICCV 2005
Bag of Words Models
Spatial-Coherent Latent Topic Model
Cao and Fei-Fei, ICCV 2007
Bag of Words Models
Simultaneous Segmentation and Recognition
Bag of Words Models
Pros and Cons
Images differ from texts!
Bag of words models are good at:
- Modeling prior knowledge
- Providing intuitive interpretation
But these models suffer from:
- Loss of information when quantizing "visual words" → calls for better coding
- Loss of spatial information → calls for better pooling
Soft Quantization and Sparse Coding
Soft Quantization
Hard Quantization
Soft Quantization
Uncertainty-Based Quantization
Model the uncertainty across multiple codewords.
van Gemert et al., Visual Word Ambiguity, PAMI 2009
Soft Quantization
Intuition of UNC
Hard quantization
Soft quantization based on uncertainty
van Gemert et al., Visual Word Ambiguity, PAMI 2009
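A sketch of the soft-quantization idea: each feature distributes unit mass over codewords with a Gaussian kernel of bandwidth sigma. The kernel choice and the toy codebook are illustrative assumptions; they follow van Gemert et al.'s kernel-codebook idea in spirit, not as an exact reimplementation.

```python
import numpy as np

def soft_histogram(features, codebook, sigma=1.0):
    """Soft assignment u_k(x) proportional to exp(-||x - b_k||^2 / (2 sigma^2)),
    pooled over features into a soft histogram instead of hard counts."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)    # each feature distributes unit mass
    return w.sum(axis=0) / len(features)

# A feature exactly between two codewords keeps its ambiguity
codebook = np.array([[0.0, 0.0], [4.0, 0.0]])
feat = np.array([[2.0, 0.0]])
h = soft_histogram(feat, codebook, sigma=2.0)
print(h)  # ~[0.5, 0.5] rather than forcing a single codeword
```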
Soft Quantization
Improvement of UNC
Soft Quantization
Sparse Coding-Based Quantization
Hard quantization can be viewed as an "extremely sparse representation" (B is the codebook, c the code):
min_c ||x - Bc||^2   s.t.   ||c||_0 = 1, ||c||_1 = 1, c ≥ 0
A more general, but hard-to-solve, representation uses an ℓ0 penalty:
min_c ||x - Bc||^2 + λ ||c||_0
In practice we consider sparse coding with the ℓ1 relaxation:
min_c ||x - Bc||^2 + λ ||c||_1
Soft Quantization
Yang et al. obtain good recognition accuracy by combining sparse coding with a spatial pyramid and dictionary training. More details will be in the group presentation.
Yang et al., Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009
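The ℓ1 problem can be solved, for instance, with ISTA (iterative soft thresholding). This toy solver and its 3-atom codebook are illustrative assumptions, one of several possible solvers rather than the one used by Yang et al.:

```python
import numpy as np

def sparse_code(x, B, lam=0.1, n_iter=500):
    """ISTA for min_c ||x - B c||^2 + lam * ||c||_1 (one descriptor x, codebook B)."""
    L = 2 * np.linalg.norm(B, 2) ** 2        # Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        g = 2 * B.T @ (B @ c - x)            # gradient of the quadratic term
        z = c - g / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return c

# Toy codebook with 3 atoms (columns); x lies on the second atom
B = np.array([[1.0, 0.0, 0.7],
              [0.0, 1.0, 0.7]])
x = np.array([0.0, 1.0])
c = sparse_code(x, B, lam=0.2)
print(c.round(2))  # weight concentrates on atom 1; the other coefficients are exactly zero
```

Note the contrast with hard quantization: the code is sparse but can spread over a few atoms, and its magnitude reflects the reconstruction.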
Fisher Vector and Supervector
One of the most powerful image/video classification techniques.
Thanks to Zhen Li and Qiang Chen for constructive suggestions on this section.
Fisher Vector and Supervector
Winning Systems
Classification task: 2009, 2010, 2011, 2012
Large Scale Visual Recognition Challenge: 2010, 2011
Fisher Vector and Supervector
Literature

Supervector:
[1] Yan et al., Regression from patch-kernel. CVPR 2008
[2] Zhou et al., SIFT-Bag kernel for video event analysis. ACM Multimedia 2008
[3] Zhou et al., Hierarchical Gaussianization for image classification. ICCV 2009: 1971-1977
[4] Zhou et al., Image classification using super-vector coding of local image descriptors. ECCV 2010

Fisher Vector:
[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification. ECCV 2010
[6] Jégou et al., Aggregating local image descriptors into compact codes. PAMI 2011

These papers are not very easy to read. Let me take a simplified perspective via the coding & pooling framework.
Fisher Vector and Supervector
Coding without Information Loss
• Coding with hard assignment
• Coding with soft assignment
• How to keep all the information?
Fisher Vector and Supervector
An Intuitive Illustration
Coding
Fisher Vector and Supervector
An Intuitive Illustration (continued)
Coding: Component 1, Component 2, Component 3
Fisher Vector and Supervector
An Intuitive Illustration (continued)
Pooling: the coded features falling on each GMM component (Component 1, Component 2, Component 3) are aggregated within that component.
Fisher Vector and Supervector
Implementation of Supervector
In speech (speaker identification), the supervector refers to the stacked means of an adapted GMM:
Supervector = [μ_1^T, μ_2^T, …, μ_K^T]^T (the adapted means of all K components, stacked into one long vector)
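One sketch of computing such a supervector, assuming relevance-MAP adaptation of the means only (the relevance factor r and the toy GMM are illustrative assumptions; this simplifies the full Reynolds-style adaptation):

```python
import numpy as np

def supervector(features, means, sigmas, weights, r=16.0):
    """MAP-adapt the means of a diagonal GMM to one image's descriptors,
    then stack the adapted means (the speech-style supervector)."""
    # Posteriors gamma_t(k) under the universal (diagonal) GMM
    diff = features[:, None, :] - means[None, :, :]               # (T, K, D)
    logp = -0.5 * (diff ** 2 / sigmas[None] ** 2).sum(-1) \
           - np.log(sigmas).sum(-1)[None] + np.log(weights)[None]
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp); gamma /= gamma.sum(axis=1, keepdims=True)
    n_k = gamma.sum(axis=0)                                       # soft counts (K,)
    first = gamma.T @ features                                    # first-order stats (K, D)
    # Relevance-MAP mean update: blend data statistics with the prior mean
    adapted = (first + r * means) / (n_k + r)[:, None]
    return adapted.reshape(-1)                                    # stacked means

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.ones((2, 2)); weights = np.array([0.5, 0.5])
feats = rng.normal([1.0, 1.0], 0.1, size=(200, 2))   # image data near (1, 1)
sv = supervector(feats, means, sigmas, weights, r=1.0)
print(sv.reshape(2, 2))  # component 0's mean moves toward (1,1); component 1 barely moves
```

Components that see no data keep their prior means, which is exactly what makes the stacked vector comparable across images.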
Fisher Vector and Supervector
Interpretation of the Supervector: original distribution vs. new (adapted) distribution
Picture from Reynolds, Quatieri, and Dunn, DSP, 2001
Fisher Vector and Supervector
Normalization of the Supervector
In practice, a normalization process using the covariance matrix often improves performance. Moreover, we can subtract the original mean vector for ease of normalization.
The resulting representation is also called Hierarchical Gaussianization (HG).
Fisher Vector and Supervector
Fisher Vector
Let p(X | λ) be the probability density function with parameter λ.
[Jaakkola and Haussler, NIPS 98] suggested that X can be described by the derivative with respect to λ:
G_X = ∇_λ log p(X | λ)
Now we can define the Fisher kernel
K(X, Y) = G_X^T F^{-1} G_Y
where F = E[G_X G_X^T] is called the Fisher information matrix.
Fisher Vector and Supervector
Fisher Vector with GMM
Consider the Gaussian Mixture Model
p(x | λ) = Σ_k w_k N(x; μ_k, σ_k^2),   λ = {w_k, μ_k, σ_k}
We consider the derivatives with respect to the means and standard deviations (diagonal covariance).
Let γ_t(k) = p(k | x_t) be the posterior (soft assignment) of component k for descriptor x_t. With the GMM, the Fisher vector components can be obtained:
G_{μ,k} = 1/(T √w_k) Σ_t γ_t(k) (x_t - μ_k)/σ_k
G_{σ,k} = 1/(T √(2 w_k)) Σ_t γ_t(k) [ (x_t - μ_k)^2/σ_k^2 - 1 ]
The Fisher vector stacks G_{μ,k} and G_{σ,k} over all K components.
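These two gradients can be written down directly in numpy, following Perronnin et al.'s formulation for a diagonal GMM. The toy GMM parameters below are assumptions for illustration; in practice they come from EM on training descriptors.

```python
import numpy as np

def fisher_vector(features, means, sigmas, weights):
    """Fisher vector of a diagonal GMM: mean and variance gradients,
    stacked over all K components (dimension 2*K*D)."""
    T, D = features.shape
    # Normalized differences (x_t - mu_k) / sigma_k, shape (T, K, D)
    diff = (features[:, None, :] - means[None]) / sigmas[None]
    logp = -0.5 * (diff ** 2).sum(-1) - np.log(sigmas).sum(-1)[None] + np.log(weights)[None]
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp); gamma /= gamma.sum(axis=1, keepdims=True)   # posteriors gamma_t(k)
    # Gradients w.r.t. means and standard deviations
    g_mu = (gamma[:, :, None] * diff).sum(0) / (T * np.sqrt(weights)[:, None])
    g_sig = (gamma[:, :, None] * (diff ** 2 - 1)).sum(0) / (T * np.sqrt(2 * weights)[:, None])
    return np.concatenate([g_mu.reshape(-1), g_sig.reshape(-1)])

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.ones((2, 2)); weights = np.array([0.5, 0.5])
fv = fisher_vector(rng.normal(0, 1, (500, 2)), means, sigmas, weights)
print(fv.shape)  # (8,) = 2 * K * D
```

When the descriptors are drawn exactly from one component, that component's gradients are near zero: the Fisher vector encodes how the data deviates from the universal model.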
Fisher Vector and Supervector
Comparison
• Supervector
• Fisher vector
Fisher Vector and Supervector
Comparison
• Supervector: diagonal covariance matrix, posterior estimation of the means
• Fisher vector: diagonal covariance, with the same derivation
The two representations are almost the same, even though they come from different motivations.
Fisher Vector and Supervector
How to Code Your Own
• Learn from existing code: http://lear.inrialpes.fr/src/inria_fisher/ (Linux or Mac)
• Learn from public GMM code
• Be careful with pitfalls:
- Probabilities are comparable to the machine's rounding error: compute log P instead of P
- Try different normalization strategies
- Try to make the code efficient
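The log P pitfall above is usually handled with the log-sum-exp trick; a minimal sketch (the input here is a hypothetical array of per-component log-joints log(w_k N(x; μ_k, σ_k))):

```python
import numpy as np

def log_posteriors(log_joint):
    """Stable GMM posteriors from per-component log-joints.
    Working in log space avoids underflow when every probability is
    below machine precision."""
    m = log_joint.max(axis=1, keepdims=True)
    # log-sum-exp: log sum_k exp(l_k) = m + log sum_k exp(l_k - m)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)

# Log-probabilities so small that exp() underflows to 0.0 in double precision
lj = np.array([[-1000.0, -1001.0]])
print(np.exp(lj))           # naive route: [[0. 0.]] -- all information lost
print(log_posteriors(lj))   # stable route: ~[[0.731 0.269]]
```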
Summary
Coding and pooling choices covered in this class:

Method                      | Coding                              | Pooling
Histogram of SIFT           | Vector quantization                 | Histogram aggregation
Soft quantization           | Uncertainty-based quantization      | Histogram aggregation
Sparse coding               | Sparse-coding quantization          | Max pooling
Fisher vector / Supervector | GMM probability estimation          | GMM adaptation / aggregation
Summary
• Simplest model: histogram of visual words
• Models with intuitive interpretation: pLSA, LDA, S-LTM, …
• Soft quantization: uncertainty-based quantization and sparse coding
• Very good performance: Fisher vector or supervector