A Bayesian Hierarchical Model for
Learning Natural Scene
Categories
L. Fei-Fei and P. Perona. CVPR 2005
Presented By
N. Soumya, ME (SSA)
Goal: Learn and Recognize Natural Scene
Categories
Classify a scene without first extracting objects.
The key idea is to use an intermediate representation
(themes) before classifying scenes.
In previous work, such themes were learnt from hand-annotations
by experts, while the method in this paper learns the
theme distributions, as well as the codeword distributions over
the themes, without supervision.
Visual Themes
zebra
grass
tree
Mixture Models
New image = α1 · zebra + α2 · grass + α3 · tree
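The mixture above can be sketched numerically: a new image's codeword distribution is a convex combination of the theme distributions. All numbers below are invented for illustration, not from the paper:

```python
import numpy as np

# Invented theme histograms over a 3-word codebook (rows sum to 1).
themes = np.array([
    [0.7, 0.2, 0.1],  # "zebra" theme
    [0.1, 0.8, 0.1],  # "grass" theme
    [0.2, 0.1, 0.7],  # "tree" theme
])

# Invented mixing proportions α1..α3; they must sum to 1.
alpha = np.array([0.5, 0.3, 0.2])

# The new image's codeword distribution is the weighted sum of the themes.
new_image = alpha @ themes
print(new_image)  # [0.42 0.36 0.22], itself a valid distribution
```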
Flow Chart: Quick Overview
Local Region Detection
Four Different ways of extracting local regions
1) Evenly sampled grid.
2) Random Sampling.
3) Kadir and Brady Saliency Detector.
4) Lowe's DOG Detector.
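Option 1, the evenly sampled grid, is the simplest to sketch; the image size and grid spacing below are arbitrary illustrative values:

```python
import numpy as np

# Sketch of option 1: an evenly sampled grid of local-region centres.
# Image size (240x320) and spacing (10 px) are assumed, not the paper's.
h, w, step = 240, 320, 10
ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
keypoints = np.stack([ys.ravel(), xs.ravel()], axis=1)  # (row, col) centres
print(keypoints.shape)  # (768, 2): one local region per grid point
```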
Local Region Representation
1) Normalised 11x11 pixel gray values.
2) 128-Dimensional SIFT Vector.
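Representation 1 can be sketched as follows; normalizing the flattened patch to zero mean and unit norm is an assumed convention, not spelled out in the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Representation 1: an 11x11 gray patch, flattened to a 121-D vector.
# Zero-mean, unit-norm normalization is an assumption for this sketch.
patch = rng.integers(0, 256, size=(11, 11)).astype(float)
v = patch.ravel() - patch.mean()
v /= np.linalg.norm(v)
print(v.shape)  # (121,)
```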
Codebook
• 174 codewords.
• Codewords are sorted in descending order of membership.
• The most dominant codewords represent simple orientation and illumination patterns, similar to the ones the early human visual system responds to.
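A codebook like this is typically obtained by k-means clustering of patch descriptors. A minimal k-means sketch on random stand-in descriptors (a small k is used here instead of the paper's 174 codewords):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for patch descriptors (e.g. 128-D SIFT vectors).
descriptors = rng.normal(size=(500, 128))
k = 8  # the paper's codebook has 174 words; a small k keeps the sketch fast

# Plain k-means: assign each descriptor to its nearest codeword,
# then recompute each codeword as the mean of its members.
codewords = descriptors[rng.choice(len(descriptors), k, replace=False)]
for _ in range(10):
    dists = np.linalg.norm(descriptors[:, None] - codewords[None], axis=2)
    labels = dists.argmin(axis=1)
    codewords = np.array([
        descriptors[labels == j].mean(axis=0) if (labels == j).any() else codewords[j]
        for j in range(k)
    ])

# Sort codewords by membership, descending, as in the slide.
counts = np.bincount(labels, minlength=k)
order = counts.argsort()[::-1]
codewords, counts = codewords[order], counts[order]
print(counts)  # most dominant codeword first
```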
Topic Models for Learning & Recognition
The basic idea is that documents are represented as random mixtures over latent topics (themes), where a topic is characterized by a distribution over words.
Learning:
Learn a model that best represents the distribution of codewords in each scene category.
Recognition:
Identify all the codewords in the unknown image. Then find the category model that best fits the distribution of codewords in that image.
The algorithm used here is the Latent Dirichlet Allocation (LDA) model proposed by Blei et al. [Ref 2]
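As a simplified stand-in for the full LDA scoring, the recognition step can be sketched with plain per-class codeword distributions; the class models and detected codewords below are invented, and the paper scores the full hierarchical model rather than this naive likelihood:

```python
import numpy as np

# Invented per-class codeword distributions over a 3-word codebook.
class_models = {
    "mountain": np.array([0.6, 0.3, 0.1]),
    "beach":    np.array([0.2, 0.2, 0.6]),
}

# Codeword indices detected in the unknown image.
image_codewords = [2, 2, 0, 2, 1]

def classify(words, models):
    # Simplified decision rule: maximize the log-likelihood of the
    # observed codewords (the paper fits the full LDA model instead).
    scores = {c: float(np.log(p[words]).sum()) for c, p in models.items()}
    return max(scores, key=scores.get)

print(classify(image_codewords, class_models))  # beach
```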
[Plate diagram: class label c → theme z → codeword w; plate N over the words in an image, plate D over the images]
Latent Dirichlet Allocation (LDA)
[Figure: LDA example on a "beach" image]
Analogy to document classification
• Image ↔ Document.
• Mixture of themes ↔ Mixture of topics.
• Visual codeword ↔ Word.
• Each topic is represented as a multinomial distribution over words with a Dirichlet prior.
• A document is generated by sampling a mixture of topics and then words from that mixture.
• The distribution of words is also multinomial.
η – distribution of class labels
θ – parameter (estimated)
c – class label
π – distribution of themes for an image
z – theme
x – patch
β – parameter (estimated)
Hierarchical Representation of the Scene Category Model
(In the diagram: shaded nodes are observed variables, unshaded nodes are unobserved variables.)
How to Generate an Image?
1) Choose a scene (mountain, beach, …).
2) Given the scene, generate an intermediate probability vector over 'themes'.
3) For each word:
   a) Determine the current theme from the mixture of themes.
   b) Draw a codeword from that theme.
p(c|η)= Mult(c|η)
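The generative process can be sketched as a sampler, following the paper's notation (θ_c are the per-class Dirichlet parameters, β holds the per-theme codeword distributions); all numeric values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented parameters: 2 scene classes, 3 themes, 3 codewords.
eta = np.array([0.5, 0.5])            # p(c | η): distribution over classes
theta = np.array([[2.0, 1.0, 1.0],    # Dirichlet parameters, one row per class
                  [1.0, 1.0, 2.0]])
beta = np.array([[0.7, 0.2, 0.1],     # codeword distribution per theme
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]])

c = rng.choice(len(eta), p=eta)        # 1) choose a scene class
pi = rng.dirichlet(theta[c])           # 2) theme proportions for this image
words = []
for _ in range(20):                    # 3) for each word:
    z = rng.choice(len(pi), p=pi)      #    pick the current theme
    x = rng.choice(beta.shape[1], p=beta[z])  # draw a codeword from it
    words.append(int(x))
print(c, words)
```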
θ and β must be estimated before we can find the theme mixing
proportions of a previously unseen image.
θ – a matrix which encodes the Dirichlet parameters for each image
class.
β – a matrix which encodes the probability of observing a codeword
w conditioned on a theme z.
Must integrate over hidden variables π, z
Variational Inference in LDA
• The goal is to maximise the log likelihood
log p(x|θ,β,c) by estimating the optimal θ and
β. Unfortunately, this is intractable to
compute in general due to the
coupling between π and β.
• Variational methods: use Jensen's
inequality to obtain a lower bound
(via a variational distribution) on the log likelihood,
indexed by a set of variational
parameters γ and φ.
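Concretely, in the notation of Blei et al., Jensen's inequality gives the bound

```latex
\log p(\mathbf{x}\mid\theta,\beta,c)
  \;\geq\;
  \mathbb{E}_{q}\!\bigl[\log p(\pi,\mathbf{z},\mathbf{x}\mid\theta,\beta,c)\bigr]
  - \mathbb{E}_{q}\!\bigl[\log q(\pi,\mathbf{z})\bigr]
  \;=\; L(\gamma,\phi;\theta,\beta),
\qquad
q(\pi,\mathbf{z}\mid\gamma,\phi)
  = \mathrm{Dir}(\pi\mid\gamma)\prod_{n=1}^{N}\mathrm{Mult}(z_n\mid\phi_n),
```

where q is the fully factorized variational distribution over the hidden variables π and z.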
Variational EM
This leads to variational EM algorithm
• (E Step) For each class of images, find the optimizing values of the variational parameters (γ, φ).
• (M Step) Maximize variational distribution w.r.t. θ, β for the γ and φ values found in the E step.
Iterate the E and M steps until convergence.
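For a single image, the E-step reduces to the fixed-point updates of Blei et al. for the variational parameters γ and φ. A minimal sketch with invented parameters (3 themes, 4 codewords):

```python
import numpy as np
from scipy.special import digamma

# E-step fixed-point updates for one image (Blei et al., JMLR 2003).
# All parameter values are invented: 3 themes, 4 codewords.
alpha = np.array([1.0, 1.0, 1.0])          # Dirichlet prior for this class
beta = np.array([[0.5, 0.2, 0.2, 0.1],     # p(word | theme), rows sum to 1
                 [0.1, 0.5, 0.2, 0.2],
                 [0.2, 0.1, 0.2, 0.5]])
words = [0, 0, 1, 3, 2]                    # codeword indices in the image

k, n = len(alpha), len(words)
phi = np.full((n, k), 1.0 / k)             # q(z_i): uniform initialization
gamma = alpha + n / k                      # variational Dirichlet parameter

for _ in range(50):                        # iterate the two updates
    phi = beta[:, words].T * np.exp(digamma(gamma))
    phi /= phi.sum(axis=1, keepdims=True)  # normalize over themes
    gamma = alpha + phi.sum(axis=0)

print(gamma / gamma.sum())  # posterior theme proportions for the image
```

The M-step would then re-estimate θ and β across all images of a class using these γ and φ values.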
Results
Results: The Distributions
[Figures: the learned theme distribution for each category, and the codeword distribution within each theme]
Testing Image Results
Correct / Incorrect
Superimposed are the patches of the most significant codewords. In incorrectly categorized images, the significant codewords of the model tend to occur less often.
Performance Summary
References
1) L. Fei-Fei and P. Perona. A Bayesian Hierarchical Model for Learning Natural Scene Categories. CVPR 2005.
2) D. M. Blei, A. Y. Ng and M. I. Jordan. Latent Dirichlet Allocation. JMLR, 2003.
3) L. Fei-Fei. Bag of Words Models. CVPR 2007 Short Course presentation slides. http://vision.cs.princeton.edu/documents/CVPR2007_tutorial_bag_of_words.ppt