A Bayesian Hierarchical Model for
Learning Natural Scene
Categories
L. Fei-Fei and P. Perona. CVPR 2005
Presented By
N. Soumya, ME (SSA)
Goal: Learn and Recognize Natural Scene
Categories
Classify a scene without first extracting objects.
The key idea is to use an intermediate representation
(themes) before classifying scenes.
In previous work, such themes were learnt from hand-annotations
by experts, while the method in this paper learns the
theme distributions, as well as the codeword distributions over
the themes, without supervision.
Visual Themes
zebra
grass
tree
Mixture Models
New image = α1 · zebra + α2 · grass + α3 · tree
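The mixture above can be sketched numerically: a new image's codeword distribution is a convex combination of the theme distributions. All numbers below are invented for illustration, not from the paper:

```python
import numpy as np

# Invented theme histograms over a 3-word codebook (rows sum to 1).
themes = np.array([
    [0.7, 0.2, 0.1],  # "zebra" theme
    [0.1, 0.8, 0.1],  # "grass" theme
    [0.2, 0.1, 0.7],  # "tree" theme
])

# Invented mixing proportions α1..α3; they must sum to 1.
alpha = np.array([0.5, 0.3, 0.2])

# The new image's codeword distribution is the weighted sum of the themes.
new_image = alpha @ themes
print(new_image)  # [0.42 0.36 0.22], itself a valid distribution
```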
Flow Chart: Quick Overview
Local Region Detection
Four Different ways of extracting local regions
1) Evenly sampled grid.
2) Random Sampling.
3) Kadir and Brady Saliency Detector.
4) Lowe's DOG Detector.
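Option 1, the evenly sampled grid, is the simplest to sketch; the image size and grid spacing below are arbitrary illustrative values:

```python
import numpy as np

# Sketch of option 1: an evenly sampled grid of local-region centres.
# Image size (240x320) and spacing (10 px) are assumed, not the paper's.
h, w, step = 240, 320, 10
ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
keypoints = np.stack([ys.ravel(), xs.ravel()], axis=1)  # (row, col) centres
print(keypoints.shape)  # (768, 2): one local region per grid point
```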
Local Region Representation
1) Normalised 11x11 pixel gray values.
2) 128-Dimensional SIFT Vector.
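Representation 1 can be sketched as follows; normalizing the flattened patch to zero mean and unit norm is an assumed convention, not spelled out in the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Representation 1: an 11x11 gray patch, flattened to a 121-D vector.
# Zero-mean, unit-norm normalization is an assumption for this sketch.
patch = rng.integers(0, 256, size=(11, 11)).astype(float)
v = patch.ravel() - patch.mean()
v /= np.linalg.norm(v)
print(v.shape)  # (121,)
```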
Codebook
• 174 codewords.
• Codewords are sorted in descending order of membership.
• The most dominant codewords represent simple orientation and illumination patterns, similar to the ones the early human visual system responds to.
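A codebook like this is typically obtained by k-means clustering of patch descriptors. A minimal k-means sketch on random stand-in descriptors (a small k is used here instead of the paper's 174 codewords):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for patch descriptors (e.g. 128-D SIFT vectors).
descriptors = rng.normal(size=(500, 128))
k = 8  # the paper's codebook has 174 words; a small k keeps the sketch fast

# Plain k-means: assign each descriptor to its nearest codeword,
# then recompute each codeword as the mean of its members.
codewords = descriptors[rng.choice(len(descriptors), k, replace=False)]
for _ in range(10):
    dists = np.linalg.norm(descriptors[:, None] - codewords[None], axis=2)
    labels = dists.argmin(axis=1)
    codewords = np.array([
        descriptors[labels == j].mean(axis=0) if (labels == j).any() else codewords[j]
        for j in range(k)
    ])

# Sort codewords by membership, descending, as in the slide.
counts = np.bincount(labels, minlength=k)
order = counts.argsort()[::-1]
codewords, counts = codewords[order], counts[order]
print(counts)  # most dominant codeword first
```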
Topic Models for Learning & Recognition
The basic idea is that documents are represented as random mixtures over latent topics (themes), where a topic is characterized by a distribution over words.
Learning:
Learn a model that best represents the distribution of codewords in each scene category.
Recognition:
Identify all the codewords in the unknown image. Then find the category model that best fits the distribution of codewords in that image.
The algorithm used here is the Latent Dirichlet Allocation (LDA) model proposed by Blei et al. [Ref 2]
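As a simplified stand-in for the full LDA scoring, the recognition step can be sketched with plain per-class codeword distributions; the class models and detected codewords below are invented, and the paper scores the full hierarchical model rather than this naive likelihood:

```python
import numpy as np

# Invented per-class codeword distributions over a 3-word codebook.
class_models = {
    "mountain": np.array([0.6, 0.3, 0.1]),
    "beach":    np.array([0.2, 0.2, 0.6]),
}

# Codeword indices detected in the unknown image.
image_codewords = [2, 2, 0, 2, 1]

def classify(words, models):
    # Simplified decision rule: maximize the log-likelihood of the
    # observed codewords (the paper fits the full LDA model instead).
    scores = {c: float(np.log(p[words]).sum()) for c, p in models.items()}
    return max(scores, key=scores.get)

print(classify(image_codewords, class_models))  # beach
```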
[Plate diagram: class label c → theme z → codeword w; plate N over the words in an image, plate D over the images]
Latent Dirichlet Allocation (LDA)
[Figure: LDA example on a "beach" image]
Analogy to document classification
• Image ↔ Document.
• Mixture of themes ↔ Mixture of topics.
• Visual codeword ↔ Word.
• Each topic is represented as a multinomial distribution over words with a Dirichlet prior.
• A document is generated by sampling a mixture of topics and then words from that mixture.
• The distribution of words is also multinomial.
η – distribution of class labels
θ – parameter (estimated)
c – class label
π – distribution of themes for an image
z – theme
x – patch
β – parameter (estimated)
Hierarchical Representation of the Scene Category Model
(In the diagram: shaded nodes are observed variables, unshaded nodes are unobserved variables.)
How to Generate an Image?
1) Choose a scene (mountain, beach, …).
2) Given the scene, generate an intermediate probability vector over 'themes'.
3) For each word:
   a) Determine the current theme from the mixture of themes.
   b) Draw a codeword from that theme.
p(c|η)= Mult(c|η)
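The generative process can be sketched as a sampler, following the paper's notation (θ_c are the per-class Dirichlet parameters, β holds the per-theme codeword distributions); all numeric values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented parameters: 2 scene classes, 3 themes, 3 codewords.
eta = np.array([0.5, 0.5])            # p(c | η): distribution over classes
theta = np.array([[2.0, 1.0, 1.0],    # Dirichlet parameters, one row per class
                  [1.0, 1.0, 2.0]])
beta = np.array([[0.7, 0.2, 0.1],     # codeword distribution per theme
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]])

c = rng.choice(len(eta), p=eta)        # 1) choose a scene class
pi = rng.dirichlet(theta[c])           # 2) theme proportions for this image
words = []
for _ in range(20):                    # 3) for each word:
    z = rng.choice(len(pi), p=pi)      #    pick the current theme
    x = rng.choice(beta.shape[1], p=beta[z])  # draw a codeword from it
    words.append(int(x))
print(c, words)
```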
θ and β must be estimated before we can find the theme mixing
proportions of a previously unseen image.
θ – a matrix which encodes the Dirichlet parameters for each image
class.
β – a matrix which encodes the probability of observing a codeword
w conditioned on a theme z.
Must integrate over hidden variables π, z
Variational Inference in LDA
• The goal is to maximise the log likelihood
log p(x|θ,β,c) by estimating the optimal θ and
β. Unfortunately, this is intractable to
compute in general due to the
coupling between π and β.
• Variational methods: use Jensen's
inequality to obtain a lower bound
(via a variational distribution) on the log likelihood,
indexed by a set of variational
parameters γ and φ.
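Concretely, in the notation of Blei et al., Jensen's inequality gives the bound

```latex
\log p(\mathbf{x}\mid\theta,\beta,c)
  \;\geq\;
  \mathbb{E}_{q}\!\bigl[\log p(\pi,\mathbf{z},\mathbf{x}\mid\theta,\beta,c)\bigr]
  - \mathbb{E}_{q}\!\bigl[\log q(\pi,\mathbf{z})\bigr]
  \;=\; L(\gamma,\phi;\theta,\beta),
\qquad
q(\pi,\mathbf{z}\mid\gamma,\phi)
  = \mathrm{Dir}(\pi\mid\gamma)\prod_{n=1}^{N}\mathrm{Mult}(z_n\mid\phi_n),
```

where q is the fully factorized variational distribution over the hidden variables π and z.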
Variational EM
This leads to variational EM algorithm
• (E Step) For each class of images, find the optimizing values of the variational parameters (γ, φ).
• (M Step) Maximize variational distribution w.r.t. θ, β for the γ and φ values found in the E step.
Iterate the E and M steps until convergence.
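For a single image, the E-step reduces to the fixed-point updates of Blei et al. for the variational parameters γ and φ. A minimal sketch with invented parameters (3 themes, 4 codewords):

```python
import numpy as np
from scipy.special import digamma

# E-step fixed-point updates for one image (Blei et al., JMLR 2003).
# All parameter values are invented: 3 themes, 4 codewords.
alpha = np.array([1.0, 1.0, 1.0])          # Dirichlet prior for this class
beta = np.array([[0.5, 0.2, 0.2, 0.1],     # p(word | theme), rows sum to 1
                 [0.1, 0.5, 0.2, 0.2],
                 [0.2, 0.1, 0.2, 0.5]])
words = [0, 0, 1, 3, 2]                    # codeword indices in the image

k, n = len(alpha), len(words)
phi = np.full((n, k), 1.0 / k)             # q(z_i): uniform initialization
gamma = alpha + n / k                      # variational Dirichlet parameter

for _ in range(50):                        # iterate the two updates
    phi = beta[:, words].T * np.exp(digamma(gamma))
    phi /= phi.sum(axis=1, keepdims=True)  # normalize over themes
    gamma = alpha + phi.sum(axis=0)

print(gamma / gamma.sum())  # posterior theme proportions for the image
```

The M-step would then re-estimate θ and β across all images of a class using these γ and φ values.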
Results
Results: The Distributions
[Figures: the learned theme distribution for each category, and the codeword distribution within each theme]
Testing Image Results
Correct / Incorrect
Superimposed are the patches of the most significant codewords. In incorrectly categorized images, the significant codewords of the model tend to occur less often.
Performance Summary
References
1) L. Fei-Fei and P. Perona. A Bayesian Hierarchical Model for Learning Natural Scene Categories. CVPR 2005.
2) D. M. Blei, A. Y. Ng and M. I. Jordan. Latent Dirichlet Allocation. JMLR, 2003.
3) L. Fei-Fei. Bag of Words Models. CVPR 2007 Short Course presentation slides. http://vision.cs.princeton.edu/documents/CVPR2007_tutorial_bag_of_words.ppt