Unsupervised Image Clustering using Probabilistic Continuous Models and Information Theoretic Principles. Shiri Gordon, Electrical Engineering – System, Faculty of Engineering, Tel-Aviv University. Under the supervision of: Doctor Hayit Greenspan

Post on 21-Mar-2016

Page 1:

Unsupervised Image Clustering using Probabilistic Continuous Models and Information Theoretic Principles

Shiri Gordon, Electrical Engineering – System, Faculty of Engineering, Tel-Aviv University

Under the supervision of: Doctor Hayit Greenspan

Page 2:

Introduction: Content-Based Image Retrieval (CBIR)

• The interest in Content-Based Image Retrieval (CBIR) and efficient image search algorithms has grown out of the necessity of managing large image databases

• Most CBIR systems are based on search-by-query
– The user provides an example image
– The database is searched exhaustively for images which are most similar to the query
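
The search-by-query loop described above can be sketched as follows. This is a minimal illustration, not the presentation's implementation: the toy feature vectors, the image names, the Euclidean distance, and the function name `search_by_query` are all assumptions made for the example (the presentation itself represents images as GMMs and compares them with a KL distance).

```python
import math

def euclidean(a, b):
    """Placeholder distance between two feature vectors (an assumption)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def search_by_query(query, database, distance=euclidean, top_k=2):
    """Exhaustive search: score every database image, return the closest names."""
    scored = sorted((distance(query, features), name)
                    for name, features in database.items())
    return [name for _, name in scored[:top_k]]

# Hypothetical database: image name -> toy 3-dimensional feature vector.
toy_database = {
    "sunset_1": (0.9, 0.4, 0.1),
    "sunset_2": (0.8, 0.5, 0.2),
    "snow_1": (0.9, 0.9, 1.0),
    "forest_1": (0.1, 0.6, 0.2),
}

closest = search_by_query((0.88, 0.42, 0.12), toy_database)
print(closest)
```

Every query touches every image, so the cost grows linearly with the size of the database; this is the cost that clustering is meant to reduce.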

Page 3:

CBIR: Issues

• Image representation

• Distance measure between images

• Image search algorithms

• Existing systems:
– QBIC – IBM
– Blobworld – Berkeley
– Photobook – MIT
– VisualSEEk – Columbia

Page 4:

What is Image Clustering?

• Performing supervised / unsupervised mapping of the archive images into classes

• The classes should provide the same information about the image archive as the entire image collection

Page 5:

Why do we need Clustering?

• Faster search-by-query algorithms

• Browsing environment

• Image categorization

[Figure labels: query image, cluster center, images]

Page 6:

Why do we need Clustering?

• Browsing environment

• Image categorization

• Faster search-by-query algorithms

[Figure labels: cluster center, images]

Page 7:

Why do we need Clustering?

• Browsing environment

• Image categorization

• Faster search-by-query algorithms

[Figure labels: cluster center, images; categories “Yellow”, “Blue”, “Green”]

Page 8:

GMM-IB System Block-Diagram

[Block diagram labels: Images; Image GMM; Clustering via Information-Bottleneck (IB) method; Cluster GMM; Image Clusters]

Page 9:

Image Representation [“Blobworld”: Belongie, Carson, Greenspan, Malik, PAMI 2002]

• Feature space = color (CIE L*a*b*); spatial (x, y); …

• Grouping the feature vectors in a 5-dimensional space

• The image is modeled as a Gaussian mixture distribution in feature space

[Figure labels: pixels, feature vectors, regions]

Page 10:

Image Representation via Gaussian Mixture Modeling (GMM)

• Feature space → EM → GMM

$$f(y \mid \theta) = \sum_{j=1}^{k} \alpha_j \frac{1}{\sqrt{(2\pi)^d |\Sigma_j|}} \exp\left\{ -\frac{1}{2} (y - \mu_j)^T \Sigma_j^{-1} (y - \mu_j) \right\}$$

$$\alpha_j > 0, \quad \sum_{j=1}^{k} \alpha_j = 1, \quad \mu_j \in R^d, \quad \Sigma_j \text{ is a } d \times d \text{ positive definite matrix}$$

• Parameter set: $\theta = \{\alpha_j, \mu_j, \Sigma_j\}_{j=1}^{k}$

• Expectation-maximization (EM) algorithm: determines the maximum likelihood parameters of a mixture of k Gaussians
– Initialization of the EM algorithm via K-means
– Model selection via MDL (Minimum Description Length)
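
The EM step named above can be illustrated with a deliberately reduced sketch. It assumes a 1-dimensional feature (so each covariance matrix collapses to a scalar variance) and takes the initial means as an argument instead of running K-means; MDL model selection is not shown. The function name `em_gmm_1d` and the synthetic data are assumptions for the example, not part of the presentation.

```python
import math
import random

def gaussian_pdf(y, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, initial_means, n_iter=200, min_var=1e-6):
    """Fit a 1-D mixture of Gaussians by EM; returns (weights, means, variances)."""
    k = len(initial_means)
    n = len(data)
    means = list(initial_means)
    weights = [1.0 / k] * k
    center = sum(data) / n
    spread = sum((y - center) ** 2 for y in data) / n
    variances = [max(spread, min_var)] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for y in data:
            joint = [weights[j] * gaussian_pdf(y, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate alpha_j, mu_j and the variance of each component.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            if n_j < 1e-12:
                continue  # component received no mass; leave it unchanged
            weights[j] = n_j / n
            means[j] = sum(r[j] * y for r, y in zip(resp, data)) / n_j
            var_j = sum(r[j] * (y - means[j]) ** 2 for r, y in zip(resp, data)) / n_j
            variances[j] = max(var_j, min_var)
    return weights, means, variances

# Synthetic data: two well-separated groups (an assumption for the demo).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(10.0, 1.0) for _ in range(300)])
weights, means, variances = em_gmm_1d(data, initial_means=[2.0, 7.0])
print(weights, means, variances)
```

In the 5-dimensional setting of the slides, the same two steps apply with full mean vectors and covariance matrices; a numerical library would normally be used for the matrix inverse and determinant.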

Page 11:

5-dimensional space: Color (L*a*b*) & Spatial (x, y)

GMM

Page 12:

Category GMM

[Figure labels: images, image models, category model]

• Variability in colors per spatial location

• Variability in location per color

Page 13:

GMM – KL Framework [Greenspan, Goldberger, Ridel. CVIU 2001]

• Kullback-Leibler (KL) distance between distributions:

$$D(f_I \,\|\, f_C) = E_{f_I}\left[ \log \frac{f_I(x)}{f_C(x)} \right] \approx \frac{1}{n} \sum_{t=1}^{n} \log \frac{f_I(x_t)}{f_C(x_t)}$$

where f_I is the image distribution, f_C is the category distribution, x_1, ..., x_n is the feature set extracted from the image, and n is the data set size.

• KL distance between image model and category model:

Image \ category    (1)     (2)     (3)     (4)
(1) monkey          6.5    32.5    34.8    16.4
(2) snow           29.6    10.4    42.1    30.4
(3) sunset         30.2    36.3    14.2    27.7
(4) flowers        14.4    28.7    29.1     8.5
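
The sample-average approximation of the KL distance can be sketched as below. The sketch assumes 1-dimensional GMMs written as lists of `(weight, mean, variance)` tuples, and it simulates the image's feature set by sampling from the image model; in the presentation the samples are the feature vectors extracted from the image itself. All model parameters and function names here are hypothetical.

```python
import math
import random

def gmm_pdf(x, gmm):
    """Density of a 1-D GMM given as a list of (weight, mean, variance)."""
    return sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
               for w, m, v in gmm)

def sample_gmm(gmm, n, rng):
    """Draw n samples: pick a component by weight, then sample its Gaussian."""
    component_weights = [w for w, _, _ in gmm]
    samples = []
    for _ in range(n):
        _, mean, var = rng.choices(gmm, weights=component_weights)[0]
        samples.append(rng.gauss(mean, math.sqrt(var)))
    return samples

def kl_estimate(samples, f_image, f_category):
    """(1/n) * sum_t log( f_I(x_t) / f_C(x_t) ) over the image's samples."""
    return sum(math.log(gmm_pdf(x, f_image) / gmm_pdf(x, f_category))
               for x in samples) / len(samples)

# Hypothetical image model and two hypothetical category models.
image_model = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
near_category = [(0.5, 0.2, 1.2), (0.5, 5.1, 1.0)]
far_category = [(1.0, 20.0, 1.0)]

rng = random.Random(1)
features = sample_gmm(image_model, 2000, rng)
kl_near = kl_estimate(features, image_model, near_category)
kl_far = kl_estimate(features, image_model, far_category)
print(kl_near, kl_far)
```

Assigning the image to the category with the smallest estimate mirrors how the table above is read: each image is closest to its own category.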

Page 14:

Unsupervised Clustering using the Information-Bottleneck (IB) principle

• The desired clustering is the one that minimizes the loss of mutual information between objects and features extracted from them

• The information contained in the objects about the features is ‘squeezed’ through a compact ‘bottleneck’ of clusters

• N. Slonim, N. Tishby. In Proc. of NIPS 1999

Page 15:

Information Bottleneck Principle: Motivation

X: objects; C: clusters; Y: features

$$\max_{|C| = K} I(C; Y)$$

where K is the number of required clusters, or equivalently

$$\min_{C} \left( I(X; Y) - I(C; Y) \right)$$

Page 16:

Information Bottleneck Principle: Greedy Criterion

• The minimization problem posed by the IB principle can be approximated by various algorithms using a greedy merging criterion:

$$d(c_1, c_2) = I(C_{before}; Y) - I(C_{after}; Y)$$

$$= \sum_{y,\, i=1,2} p(c_i, y) \log \frac{p(c_i, y)}{p(c_i)\, p(y)} - \sum_{y} p(c_1 \cup c_2, y) \log \frac{p(c_1 \cup c_2, y)}{p(c_1 \cup c_2)\, p(y)}$$

$$= \sum_{i=1,2} p(c_i)\, D\left( p(y \mid c_i) \,\|\, p(y \mid c_1 \cup c_2) \right)$$

where p(c_i) is the prior probability of cluster c_i and D is the KL distance:

$$D(f \,\|\, g) = E_f\left[ \log \frac{f}{g} \right]$$
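
The equality between the two forms of the greedy criterion can be checked numerically. The sketch below assumes a small discrete joint distribution p(c, y) stored as a list of rows, which is a stand-in for the continuous GMM setting of the presentation; the table values and function names are hypothetical.

```python
import math

def mutual_information(joint):
    """I(C;Y) in nats, where joint[c][y] = p(c, y)."""
    p_c = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (p_c[i] * p_y[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def kl(p, q):
    """Discrete KL distance D(p || q), assuming q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def merge_cost(joint, i, j):
    """sum over c in {c_i, c_j} of p(c) * D( p(y|c) || p(y | c_i U c_j) )."""
    p_i, p_j = sum(joint[i]), sum(joint[j])
    cond_merged = [(a + b) / (p_i + p_j) for a, b in zip(joint[i], joint[j])]
    cond_i = [a / p_i for a in joint[i]]
    cond_j = [b / p_j for b in joint[j]]
    return p_i * kl(cond_i, cond_merged) + p_j * kl(cond_j, cond_merged)

def merge(joint, i, j):
    """Joint distribution after clusters i and j are merged into one row."""
    merged_row = [a + b for a, b in zip(joint[i], joint[j])]
    rest = [row for idx, row in enumerate(joint) if idx not in (i, j)]
    return rest + [merged_row]

# Hypothetical p(c, y): 3 clusters (rows) by 3 feature values (columns).
toy_joint = [
    [0.20, 0.05, 0.05],
    [0.15, 0.10, 0.05],
    [0.02, 0.08, 0.30],
]

loss = mutual_information(toy_joint) - mutual_information(merge(toy_joint, 0, 1))
print(loss, merge_cost(toy_joint, 0, 1))
```

The two printed values agree up to floating-point error, and merging the two similar rows (0 and 1) costs less information than merging either of them with row 2.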

Page 17:

GMM-IB Framework

X: images; C: image clusters; Y: feature vectors

$$\min_{C} \left( I(X; Y) - I(C; Y) \right)$$

Cluster GMM:

$$p(y \mid c) = \frac{1}{|c|} \sum_{x \in c} p(y \mid x)$$

Merging criterion, where p(c_i) is the prior probability and D is the KL distance:

$$d(c_1, c_2) = \sum_{i=1,2} p(c_i)\, D\left( p(y \mid c_i) \,\|\, p(y \mid c_1 \cup c_2) \right)$$
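
An agglomerative loop driven by this merging criterion can be sketched as follows. To stay self-contained it assumes discrete conditional distributions p(y|x) in place of image GMMs, so the KL distance is exact rather than approximated; the four toy objects and the function name `agglomerative_ib` are assumptions for illustration.

```python
import math

def kl(p, q):
    """Discrete KL distance D(p || q), assuming q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def merge_cost(row_a, row_b):
    """Information lost by merging two clusters given their rows of p(c, y)."""
    p_a, p_b = sum(row_a), sum(row_b)
    cond_merged = [(u + v) / (p_a + p_b) for u, v in zip(row_a, row_b)]
    cond_a = [u / p_a for u in row_a]
    cond_b = [v / p_b for v in row_b]
    return p_a * kl(cond_a, cond_merged) + p_b * kl(cond_b, cond_merged)

def agglomerative_ib(joint):
    """Start with one cluster per object and greedily merge the cheapest pair.

    Returns a list of (members of the new cluster, information lost) per step.
    """
    clusters = [([idx], list(row)) for idx, row in enumerate(joint)]
    history = []
    while len(clusters) > 1:
        cost, a, b = min(
            (merge_cost(clusters[a][1], clusters[b][1]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        members = sorted(clusters[a][0] + clusters[b][0])
        row = [u + v for u, v in zip(clusters[a][1], clusters[b][1])]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append((members, row))
        history.append((members, cost))
    return history

# Hypothetical p(x, y): four objects with equal priors, three feature values.
toy_joint = [
    [0.175, 0.050, 0.025],
    [0.150, 0.075, 0.025],
    [0.025, 0.050, 0.175],
    [0.025, 0.075, 0.150],
]

history = agglomerative_ib(toy_joint)
for members, cost in history:
    print(members, cost)
```

The sequence of merges defines a tree over the objects, and the per-step cost is the loss of mutual information at that step; a sharp rise in that cost is one natural signal for where to stop merging.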

Page 18:

Example

[Figure: numbered labels 8 down to 0; image not preserved]

Page 19:

Results: AIB - Optimum number of clusters

[Figure: loss of mutual information during the clustering process]

Page 20:

Results: AIB - Generated Tree

?

Page 21:

Mutual Information as a quality measure

• The reduction in the uncertainty of X based on the knowledge of Y:

$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$

• No closed-form expression exists for a mixture-of-Gaussians distribution

• The greedy criterion derived from the IB principle provides a tool for approximating this measure
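
For a discrete joint distribution the mutual information formula can be evaluated directly, which makes its two extremes easy to see. The sketch below is only an illustration of the definition on toy tables (the tables and the function name are assumptions); it does not address the mixture-of-Gaussians case, for which the slide notes there is no closed form.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats, where joint[x][y] = p(x, y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Independent X and Y: knowing Y says nothing about X.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Y determines X completely: the uncertainty of X (log 2 nats) is removed.
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

A clustering that keeps I(C;Y) close to I(X;Y) has discarded little of the information the objects carry about the features, which is why the measure can be used to compare clusterings.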

Page 22:

Mutual Information as a quality measure: Example

[Figure labels: C1, C2, C3]

          C1      C2      C3
I(C;Y)    1.51    1.32    1.18
I(X;Y)    2.73    2.72    2.72

Page 23:

Results

• Image database of 1460 images hand-picked from the COREL database to create 16 labeled categories

• Building the GMM model for each image

• Applying the various algorithms to the database, using various image representations

Page 24:

Results: Retrieval Experiments

Clustering for efficient retrieval

Comparing between clustering methodologies

Page 25:

Results: Mutual Information as a quality measure

• Comparing between image representations

• Comparing between clustering algorithms

Clustering method          I(C;Y)
AIB                        1.63
K-means + reduced GMM      1.68
SIB + average GMM          1.67

Page 26:

Summary

• Image clustering is done using the IB method

• IB is applied on continuous representations of images and categories with Gaussian Mixture Models

• From the AIB algorithm:
– We conclude the optimal number of clusters in the database
– We have a “built-in” distance measure
– The database is arranged in a tree structure that provides a browsing environment and more efficient search algorithms
– The tree can be modified using algorithms like the SIB and K-means to achieve a more stable solution

Page 27:

Future Work

• Making the current framework more feasible for large databases:
– A simpler approximation for the KL-distance
– Incorporating the reduced category GMM into the clustering algorithms

• Performing relaxation on the hierarchical tree structure

• Using the tree structure for the creation of a “user-friendly” environment

• Extending the feature space