Class 3: Feature Coding and Pooling


Page 1: Class 3: Feature Coding and Pooling

Class 3: Feature Coding and Pooling

Liangliang Cao, Feb 7, 2013
EECS 6890 - Topics in Information Processing
Spring 2013, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch

Page 2: Class 3: Feature Coding and Pooling

Problem

The Myth of Human Vision

Can you understand the following?

婴儿 (Chinese), बच्चा (Hindi), 아가 (Korean), bebé (Spanish): all mean "baby".

People may have difficulty understanding different texts, but NOT images.

Photo courtesy of luster

Page 3: Class 3: Feature Coding and Pooling

Problem

Can a computer vision system recognize objects or scenes like a human?


Page 4: Class 3: Feature Coding and Pooling

Why Is This Problem Important?

Traditional companies, search engines, mobile apps


Page 5: Class 3: Feature Coding and Pooling

Problem

Examples of Object Recognition Datasets

http://www.vision.caltech.edu


Page 6: Class 3: Feature Coding and Pooling

Problem

http://groups.csail.mit.edu/vision/SUN/

Page 7: Class 3: Feature Coding and Pooling

Overview of Classification Model

Method                      Coding                            Pooling
Histogram of SIFT           Vector quantization               Histogram aggregation
Soft quantization           Uncertainty-based quantization    Histogram aggregation
Sparse coding               Soft quantization                 Max pooling
Fisher vector/Supervector   GMM probability estimation        GMM adaptation

Coding: map local features into a compact representation.

Pooling: aggregate these compact representations together.


Page 8: Class 3: Feature Coding and Pooling

Outline

• Histogram of local features
• Bag of words model
• Soft quantization and sparse coding
• Fisher vector and supervector


Page 9: Class 3: Feature Coding and Pooling

Histogram of Local Features


Page 10: Class 3: Feature Coding and Pooling

Bag of Words Models

Recall of Last Class

• Powerful local features
  - DoG
  - Hessian, Harris
  - Dense sampling

A non-fixed number of local regions per image!


Page 11: Class 3: Feature Coding and Pooling

Bag of Words Models

Recall of Last Class (2)

• Histograms can provide a fixed-size representation of images
• Spatial pyramid/gridding can enhance the histogram representation with spatial information


Page 12: Class 3: Feature Coding and Pooling

Bag of Words Models

Histogram of Local Features

[Figure: histogram over codewords] dim = #codewords


Page 13: Class 3: Feature Coding and Pooling

Bag of Words Models

Histogram of Local Features (2)

[Figure: concatenated per-grid histograms] dim = #codewords × #grids
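As a hypothetical sketch of this gridding step (the function name, 2×2 grid, and inputs are illustrative, not from the slides):

```python
import numpy as np

# Split an image's keypoints over a 2x2 spatial grid and concatenate
# the per-cell bag-of-words histograms: dim = n_words * grid * grid.
def spatial_histogram(xy, words, width, height, n_words, grid=2):
    cell_x = np.minimum((xy[:, 0] * grid / width).astype(int), grid - 1)
    cell_y = np.minimum((xy[:, 1] * grid / height).astype(int), grid - 1)
    hists = []
    for gy in range(grid):
        for gx in range(grid):
            in_cell = (cell_x == gx) & (cell_y == gy)
            h = np.bincount(words[in_cell], minlength=n_words).astype(float)
            hists.append(h / max(h.sum(), 1.0))   # per-cell normalization
    return np.concatenate(hists)
```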


Page 14: Class 3: Feature Coding and Pooling

Bag of Words Models

Local Feature Quantization

Slide courtesy of Fei-Fei Li


Page 15: Class 3: Feature Coding and Pooling

Bag of Words Models

Local Feature Quantization


Page 16: Class 3: Feature Coding and Pooling

Bag of Words Models

Local Feature Quantization

- Vector quantization
- Dictionary learning
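A minimal sketch of both steps with scikit-learn's k-means; the sizes, names, and random data are illustrative stand-ins, not the course's code:

```python
import numpy as np
from sklearn.cluster import KMeans

train_desc = np.random.randn(10000, 128)       # stand-in for SIFT descriptors

# Dictionary learning: codewords are the k-means cluster centers
kmeans = KMeans(n_clusters=1000, n_init=3, random_state=0).fit(train_desc)

# Vector quantization: assign each local feature to its nearest codeword
image_desc = np.random.randn(350, 128)         # one image's local features
words = kmeans.predict(image_desc)
hist = np.bincount(words, minlength=1000).astype(float)
hist /= hist.sum()                             # dim = #codewords
```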


Page 17: Class 3: Feature Coding and Pooling

Histogram of Local Features

Dictionary for Codewords


Page 18: Class 3: Feature Coding and Pooling

Bag of Words Models

Most slides in this section are courtesy of Fei-Fei Li


Page 19: Class 3: Feature Coding and Pooling

Object Bag of ‘words’


Page 20: Class 3: Feature Coding and Pooling

Bag of Words Models

Underlying Assumptions: Text

[Slide figure: two example texts reduced to bags of words]

Text 1 (excerpt): "Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages..." → bag of words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

Text 2 (excerpt): "China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted..." → bag of words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value


Page 21: Class 3: Feature Coding and Pooling

Bag of Words Models

Underlying Assumptions: Image


Page 22: Class 3: Feature Coding and Pooling

[Pipeline diagram: feature detection & representation → image representation (via a K-means codebook) → category models (and/or) classifiers → category decision, with a learning phase and a recognition phase]

Page 23: Class 3: Feature Coding and Pooling

Bag of Words Models

Borrowing Techniques from Text Classification

• pLSA
• Naïve Bayesian Model

Notation:

• $w_n$: each patch in an image, $w_n = [0, 0, \dots, 1, \dots, 0, 0]^T$
• $w$: a collection of all N patches in an image, $w = [w_1, w_2, \dots, w_N]$
• $d_j$: the j-th image in an image collection
• $c$: category of the image
• $z$: theme or topic of the patch


Page 24: Class 3: Feature Coding and Pooling

Bag of Words Models

Probabilistic Latent Semantic Analysis (pLSA)

[Plate diagram: d → z → w, with plates N and D] (Hofmann, 2001)

Latent Dirichlet Allocation (LDA)

[Plate diagram: c → z → w, with plates N and D] (Blei et al., 2001)


Page 25: Class 3: Feature Coding and Pooling

Bag of Words Models

Probabilistic Latent Semantic Analysis (pLSA)

[Plate diagram: d → z → w, with plates N and D; a discovered topic: "face"]

Sivic et al., ICCV 2005

Page 26: Class 3: Feature Coding and Pooling

Bag of Words Models

[Plate diagram: d → z → w, with plates N and D]

$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$

observed codeword distribution per image = codeword distributions per theme (topic) × theme distributions per image

Parameters are estimated by EM or Gibbs sampling.

Slide credit: Josef Sivic
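A compact NumPy sketch of this EM procedure (all names and the smoothing constant are illustrative choices, not from the slides):

```python
import numpy as np

def plsa_em(counts, K, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-word (or image-codeword) count matrix."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_w_given_z = rng.random((K, W))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
    p_z_given_d = rng.random((D, K))
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior p(z | d, w), shape (D, K, W)
        joint = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: reweight the posteriors by the observed counts
        weighted = counts[:, None, :] * post
        p_w_given_z = weighted.sum(axis=0)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_given_d = weighted.sum(axis=2)
        p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_given_z, p_z_given_d
```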

Page 27: Class 3: Feature Coding and Pooling

Bag of Words Models

Recognition using pLSA

$z^\ast = \arg\max_{z}\, p(z \mid d)$

Slide credit: Josef Sivic
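In terms of the hypothetical pLSA sketch on the previous page, this decision rule is one line:

```python
topic = p_z_given_d.argmax(axis=1)   # most probable theme for each image
```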

Page 28: Class 3: Feature Coding and Pooling

Bag of Words Models

Scene Recognition using LDA

Latent Dirichlet Allocation (LDA)

[Plate diagram: c → z → w, with plates N and D; a discovered scene topic: "beach"]

Fei-Fei et al., ICCV 2005


Page 29: Class 3: Feature Coding and Pooling

Bag of Words Models

Spatial-Coherent Latent Topic Model

Cao and Fei-Fei, ICCV 2007


Page 30: Class 3: Feature Coding and Pooling

Bag of Words Models

Simultaneous Segmentation and Recognition


Page 31: Class 3: Feature Coding and Pooling

Bag of Words Models

Pros and Cons

Images differ from texts!

Bag of Words Models are good at:
- Modeling prior knowledge
- Providing intuitive interpretation

But these models suffer from:
- Loss of information in the quantization of "visual words" (motivating better coding)
- Loss of spatial information (motivating better pooling)

Page 32: Class 3: Feature Coding and Pooling

Soft Quantization and Sparse Coding


Page 33: Class 3: Feature Coding and Pooling

Soft Quantization

Hard Quantization


Page 34: Class 3: Feature Coding and Pooling

Soft Quantization

Uncertainty-Based Quantization

Model the uncertainty across multiple codewords

van Gemert et al., Visual Word Ambiguity, PAMI 2009
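A minimal sketch of one common form of this idea, Gaussian kernel weighting over codewords; sigma and the names here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_assign_histogram(descriptors, codebook, sigma=1.0):
    """Kernel-codebook soft assignment: each descriptor votes for every
    codeword with a Gaussian weight instead of only its nearest one."""
    # Squared distances between descriptors and codewords: (N, K)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True) + 1e-12   # normalize per descriptor
    return w.sum(axis=0) / len(descriptors)     # average into a histogram
```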


Page 35: Class 3: Feature Coding and Pooling

Soft Quantization

Intuition of UNC

Hard quantization

Soft quantization based on uncertainty

van Gemert et al., Visual Word Ambiguity, PAMI 2009


Page 36: Class 3: Feature Coding and Pooling

Soft Quantization

Improvement of UNC


Page 37: Class 3: Feature Coding and Pooling

Soft Quantization

Sparse Coding-Based Quantization

Hard quantization can be viewed as an "extremely sparse representation":

$\min_{\alpha}\, \|x - B\alpha\|^2 \quad \text{s.t.} \quad \|\alpha\|_0 = 1,\; \alpha \in \{0, 1\}^K$

A more general but hard-to-solve representation relaxes the constraint to $\|\alpha\|_0 \le s$ (the $\ell_0$ constraint makes it combinatorial).

In practice we consider sparse coding with an $\ell_1$ penalty:

$\min_{\alpha}\, \|x - B\alpha\|^2 + \lambda \|\alpha\|_1$



Page 39: Class 3: Feature Coding and Pooling

Soft Quantization

Yang et al. obtain good recognition accuracy by combining sparse coding with a spatial pyramid and dictionary training.

More details will be in the group presentation.

Yang et al, Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009
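A sketch of this coding/pooling combination with scikit-learn; the dictionary size, alpha values, and random data are illustrative stand-ins for learned SIFT dictionaries:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

descriptors = np.random.randn(5000, 128)        # stand-in for SIFT descriptors

# Dictionary training on local descriptors
dico = MiniBatchDictionaryLearning(n_components=256, alpha=1.0, random_state=0)
B = dico.fit(descriptors).components_           # dictionary, shape (256, 128)

# Sparse coding of one image's descriptors, then max pooling
image_desc = np.random.randn(300, 128)
codes = sparse_encode(image_desc, B, algorithm='lasso_lars', alpha=0.1)
image_feature = np.abs(codes).max(axis=0)       # max pooling -> 256-dim feature
```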


Page 40: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

One of the most powerful image/video classification techniques

Thanks to Zhen Li and Qiang Chen for constructive suggestions on this section


Page 41: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Winning Systems

[Figure: winning systems. Classification task: 2009, 2010, 2011, 2012. Large Scale Visual Recognition Challenge: 2010, 2011]


Page 42: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector: Literature

Supervector:

[1] Yan et al., Regression from patch-kernel. CVPR 2008
[2] Zhou et al., SIFT-Bag kernel for video event analysis. ACM Multimedia 2008
[3] Zhou et al., Hierarchical Gaussianization for image classification. ICCV 2009
[4] Zhou et al., Image classification using super-vector coding of local image descriptors. ECCV 2010

Fisher Vector:

[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification. ECCV 2010
[6] Jégou et al., Aggregating local image descriptors into compact codes. PAMI 2011

These papers are not very easy to read. Let me take a simplified perspective via the coding & pooling framework.


Page 43: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Coding without Information Loss

• Coding with hard assignment

• Coding with soft assignment

• How to keep all the information?



Page 45: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

An Intuitive Illustration: Coding


Page 46: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

An Intuitive Illustration: Coding

[Figure: local features assigned across Component 1, Component 2, Component 3]


Page 47: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

An Intuitive Illustration: Pooling

[Figure: the coded local features are summed (pooled) within each component: Component 1, Component 2, Component 3]


Page 48: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Implementation of Supervector

In speech (speaker identification), the supervector refers to the stacked means of adapted GMMs:

Supervector $= \left[\mu_1^T, \mu_2^T, \dots, \mu_K^T\right]^T$
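A minimal sketch of building such a supervector by relevance-MAP adaptation of the means, in the spirit of Reynolds et al.; the relevance factor r and all names are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def supervector(descriptors, ubm: GaussianMixture, r=16.0):
    """Stack the relevance-MAP-adapted means of a background GMM (ubm)."""
    post = ubm.predict_proba(descriptors)        # (T, K) posteriors
    n_k = post.sum(axis=0)                       # soft count per component
    mu_hat = (post.T @ descriptors) / (n_k[:, None] + 1e-12)
    alpha = (n_k / (n_k + r))[:, None]           # adaptation weight per component
    adapted = alpha * mu_hat + (1.0 - alpha) * ubm.means_
    return adapted.reshape(-1)                   # (K * d,) stacked means
```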


Page 49: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Interpretation with Supervector

[Figure: original distribution vs. new (adapted) distribution]

Picture from Reynolds, Quatieri, and Dunn, DSP, 2001


Page 50: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Normalization of Supervector

In practice, a normalization process using the covariance matrix often improves performance.

Moreover, we can subtract the original mean vector for ease of normalization.

The resulting representation is also called Hierarchical Gaussianization (HG).
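One plausible rendering of this normalized form, following the super-vector coding literature (the exact scaling varies across the cited papers):

$$ s_k = \sqrt{w_k}\;\Sigma_k^{-1/2}\bigl(\hat{\mu}_k - \mu_k\bigr), \qquad s = \bigl[s_1^T, s_2^T, \dots, s_K^T\bigr]^T $$

where $\hat{\mu}_k$ is the adapted mean and $w_k$, $\mu_k$, $\Sigma_k$ are the original component weight, mean, and (diagonal) covariance.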


Page 51: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Fisher Vector

Let $p(X \mid \lambda)$ be the probability density function with parameters $\lambda$.

Jaakkola and Haussler (NIPS 1998) suggested that $X$ can be described by the derivative with respect to the parameters,

$G_\lambda^X = \nabla_\lambda \log p(X \mid \lambda)$

Now we can define the Fisher kernel

$K(X, Y) = (G_\lambda^X)^T\, F_\lambda^{-1}\, G_\lambda^Y$

where $F_\lambda = E_X\!\left[ G_\lambda^X (G_\lambda^X)^T \right]$ is called the Fisher information matrix.


Page 52: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Fisher Vector with GMM

Consider the Gaussian Mixture Model

$p(x \mid \lambda) = \sum_{k=1}^{K} w_k\, \mathcal{N}(x; \mu_k, \Sigma_k)$, with diagonal $\Sigma_k$.

We consider the derivatives with respect to the means $\mu_k$.

With the GMM, Fisher vectors can be obtained as follows. Let

$\gamma_t(k) = \dfrac{w_k\, \mathcal{N}(x_t; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j\, \mathcal{N}(x_t; \mu_j, \Sigma_j)}$

be the posterior (soft assignment) of descriptor $x_t$. The Fisher vector stacks, for each component $k$,

$\mathcal{G}_{\mu_k}^X = \dfrac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \Sigma_k^{-1/2}\,(x_t - \mu_k)$
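A sketch of this mean-gradient part, matching the formula above (it assumes a scikit-learn GMM fit with covariance_type='diag'; names are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(descriptors, gmm: GaussianMixture):
    """Mean-gradient part of the Fisher vector for a diagonal-covariance GMM."""
    T = len(descriptors)
    gamma = gmm.predict_proba(descriptors)            # (T, K) posteriors
    sigma = np.sqrt(gmm.covariances_)                 # (K, d) diagonal std-devs
    parts = []
    for k in range(gmm.n_components):
        diff = (descriptors - gmm.means_[k]) / sigma[k]   # whitened residuals
        g_k = (gamma[:, k:k + 1] * diff).sum(axis=0)
        parts.append(g_k / (T * np.sqrt(gmm.weights_[k])))
    return np.concatenate(parts)                      # (K * d,) Fisher vector
```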


Page 53: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Comparison

• Supervector

• Fisher vector


Page 54: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

Comparison

Supervector: adapted means with a diagonal covariance matrix.

Fisher vector: posterior estimation with a diagonal covariance, following essentially the same derivation.

The two representations are almost the same, even though they were developed with different motivations.


Page 55: Class 3: Feature Coding and Pooling

Fisher Vector and Supervector

How to Code Your Own

• Learn from existing code: http://lear.inrialpes.fr/src/inria_fisher/ (Linux or Mac)
• Learn from public GMM code
• Be careful with pitfalls:
  - Probabilities are comparable to the machine's rounding error: compute log P instead of P
  - Try different normalization strategies
  - Try to make the code efficient
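A tiny illustration of the log-probability pitfall, with values chosen to force underflow:

```python
import numpy as np
from scipy.special import logsumexp

log_p = np.array([-900.0, -901.0, -905.0])   # component log-densities

naive = np.log(np.exp(log_p).sum())          # exp underflows to 0 -> -inf
stable = logsumexp(log_p)                    # about -899.7, computed safely

print(naive, stable)
```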


Page 56: Class 3: Feature Coding and Pooling

Summary

Method                      Coding                            Pooling
Histogram of SIFT           Vector quantization               Histogram aggregation
Soft quantization           Uncertainty-based quantization    Histogram aggregation
Sparse coding               Soft quantization                 Max pooling
Fisher vector/Supervector   GMM probability estimation        GMM adaptation


Page 58: Class 3: Feature Coding and Pooling

Summary

• Simplest model: histogram of visual words
• Models with good illustration: pLSA, LDA, S-LTM, …
• Soft quantization: soft quantization and sparse coding
• Very good performance: Fisher vector or supervector
