CS848 Similarity Search in Multimedia Databases Dr. Gisli Hjaltason Content-based Retrieval Using Local Descriptors: Problems and Issues from Databases Perspective Laurent Amsaleg & Patrick Gros IRISA-CNRS, Campus de Beaulieu, Rennes, France Presented by: Wei Jiang February 19, 2003


CS848 Similarity Search in Multimedia Databases
Dr. Gisli Hjaltason

Content-based Retrieval Using Local Descriptors: Problems and Issues from Databases Perspective

Laurent Amsaleg & Patrick Gros

IRISA-CNRS, Campus de Beaulieu, Rennes, France

Presented by: Wei Jiang

February 19, 2003


Outline

Introduction
Image Processing Techniques
– Global vs Local Descriptors
– Extracting Interest Points
– Computing Local Descriptors
– Extension to Colour Images
Database Indexing Techniques
– Traditional Approaches
– VA-File and Pyramid-Tree
Performance Evaluation
Conclusion & Perspectives


Introduction

Descriptor
– Root of an image content-based retrieval system
– Represented by a multi-dimensional vector of real numbers
– Encodes specific information extracted from an image
– Invariant to some types of variations

Content-based retrieval
– Image processing techniques to extract descriptors from images
• Large-grain recognition: color histogram, grey-level histogram
• Fine-grain recognition: local descriptors
– Database techniques to store descriptors and accelerate searches
• Curse of dimensionality
• Pyramid-Tree and VA-File


Introduction (cont'd)

Motivation:

To explore the consequences of using local descriptors together with up-to-date database multi-dimensional indexing strategies.


Global vs. Local Descriptors

Global Descriptor
– Computes one descriptor per image
– Cannot identify elements within images
– Lower computation cost
– Smaller database size
– Lower searching cost
– Globally robust

Local Descriptor
– Computes several descriptors per image
– Can identify elements within images
– Higher computation cost
– Bigger database size
– Higher searching cost
– Increases the recognition power
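
To make the "one descriptor per image" versus "several descriptors per image" contrast concrete, here is a minimal Python sketch. It uses a grey-level histogram as the global descriptor and, purely as a stand-in, a histogram of a small window around each point as a local descriptor. The function names, the window radius and the hand-picked points are assumptions for illustration; the paper's local descriptors are built from image derivatives, not from histograms.

```python
from collections import Counter

def grey_histogram(pixels, bins=4, max_level=256):
    """Normalised grey-level histogram of a sequence of pixel values."""
    pixels = list(pixels)
    width = max_level // bins
    counts = Counter(min(p // width, bins - 1) for p in pixels)
    return [counts.get(b, 0) / len(pixels) for b in range(bins)]

def global_descriptor(image, bins=4):
    """One vector describing the whole image."""
    return grey_histogram((p for row in image for p in row), bins)

def local_descriptors(image, points, radius=1, bins=4):
    """One vector per point, computed on a small window around it."""
    h, w = len(image), len(image[0])
    result = []
    for r, c in points:
        window = [image[y][x]
                  for y in range(max(0, r - radius), min(h, r + radius + 1))
                  for x in range(max(0, c - radius), min(w, c + radius + 1))]
        result.append(grey_histogram(window, bins))
    return result

image = [[0, 0, 200, 200],
         [0, 0, 200, 200],
         [90, 90, 255, 255],
         [90, 90, 255, 255]]
points = [(1, 1), (2, 2)]  # hypothetical interest points, chosen by hand

global_vec = global_descriptor(image)         # a single vector for the image
local_vecs = local_descriptors(image, points)  # one vector per point
print(len([global_vec]), "global vector,", len(local_vecs), "local vectors")
```

With k points per image, a database of N images stores N·k vectors instead of N, which is where the bigger database size and higher searching cost come from.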


Extracting Interest Points

Interest Points
– Points in one image that will also be found in similar images
– Where the signal changes 2-dimensionally


Harris Corner Detector

Gaussian filter to avoid image noise
– Gaussian function: $G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$
– Gaussian distribution and discrete approximation (with mean (0,0) and $\sigma = 1$)
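
A discrete approximation of a zero-mean Gaussian with σ = 1 can be sketched in a few lines of Python. The 5×5 size (radius 2) and the renormalisation so that the weights sum to 1 are assumptions made here; the original slide showed its kernel as a figure that did not survive extraction.

```python
import math

def gaussian_kernel(radius=2, sigma=1.0):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a grid and normalise the weights."""
    weights = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
                for x in range(-radius, radius + 1)]
               for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

kernel = gaussian_kernel()
for row in kernel:
    print(" ".join(f"{w:.4f}" for w in row))
```

Normalising after sampling replaces the 1/(2πσ²) factor and keeps the overall image brightness unchanged when the kernel is used for smoothing.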


Harris Corner Detector

Gaussian filter to avoid image noise
– Computing the convolution of the original signal with the Gaussian function to get the smoothed signal I

Computing the eigenvalues of the matrix

Large values of both eigenvalues indicate an interest point
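
The steps above can be sketched in plain Python. The matrix itself was a figure on the original slide; this sketch assumes the standard Harris formulation, the 2×2 matrix [[Ix², IxIy], [IxIy, Iy²]] of products of first derivatives of I, averaged over a Gaussian neighbourhood. The function names, the central-difference derivatives and the border replication are illustrative choices, not taken from the paper.

```python
import math

def gaussian_1d(radius=2, sigma=1.0):
    """Normalised 1-D Gaussian weights; the 2-D filter is applied separably."""
    w = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth(img, k):
    """Convolve rows, then columns, with k, replicating border pixels."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    rows = [[sum(k[j + r] * img[y][min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[min(max(y + j, 0), h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def min_eigenvalue_map(img, radius=2, sigma=1.0):
    """Smaller eigenvalue of the (assumed) Harris matrix at every pixel."""
    k = gaussian_1d(radius, sigma)
    I = smooth(img, k)  # smoothed signal I
    h, w = len(I), len(I[0])
    # First derivatives by central differences.
    Ix = [[(I[y][min(x + 1, w - 1)] - I[y][max(x - 1, 0)]) / 2 for x in range(w)] for y in range(h)]
    Iy = [[(I[min(y + 1, h - 1)][x] - I[max(y - 1, 0)][x]) / 2 for x in range(w)] for y in range(h)]
    # Matrix entries, averaged over a Gaussian neighbourhood.
    A = smooth([[Ix[y][x] ** 2 for x in range(w)] for y in range(h)], k)
    B = smooth([[Iy[y][x] ** 2 for x in range(w)] for y in range(h)], k)
    C = smooth([[Ix[y][x] * Iy[y][x] for x in range(w)] for y in range(h)], k)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Eigenvalues of [[A, C], [C, B]] are half_trace +/- root.
            half_trace = (A[y][x] + B[y][x]) / 2
            root = math.sqrt(((A[y][x] - B[y][x]) / 2) ** 2 + C[y][x] ** 2)
            row.append(half_trace - root)
        out.append(row)
    return out

# Synthetic test image: a bright quadrant whose only corner is at (8, 8).
size, start = 20, 8
img = [[100.0 if y >= start and x >= start else 0.0 for x in range(size)] for y in range(size)]
lam = min_eigenvalue_map(img)
corner, edge, flat = lam[8][8], lam[16][8], lam[2][2]
print(f"corner={corner:.2f} edge={edge:.2e} flat={flat:.2e}")
```

On a straight edge the signal changes in one direction only, so the smaller eigenvalue stays near zero. It becomes large only where the signal changes 2-dimensionally, which is why this sketch keeps the smaller eigenvalue as the interest measure.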


Computing Local Descriptors

Goal:
– To make local descriptors invariant with respect to given types of variations

Computing in two steps:
– Computing the derivatives of the smoothed signal I (up to the third order) to provide a basic description
– Mixing the derivatives to enforce invariance properties and to make descriptors robust to changes


Local Descriptors for Grey-level Images

Gaining translational and rotational invariance
– Nine invariant quantities to eliminate the angle of rotation

Gaining photometric invariance
– Illumination is modelled by I → aI + b

Gaining scale invariance
– Multi-scale approach
– F and G are functions related by a scale a: F(x) = G(ax)
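
A short worked derivation shows why working with derivatives helps with the photometric and scale changes listed above. This is the standard calculus argument in one dimension, given as a sketch; it is not the paper's construction of the nine invariants. Note that the slides reuse the letter a for both the illumination gain and the scale factor; the two are unrelated.

```latex
% Photometric change I' = aI + b: the offset b vanishes under differentiation.
\frac{\partial I'}{\partial x} = a\,\frac{\partial I}{\partial x},
\qquad
\frac{\partial^2 I'}{\partial x^2} = a\,\frac{\partial^2 I}{\partial x^2}
% Every derivative is multiplied by the same gain a, so the ratio of two
% derivative-based quantities of the same degree in I depends on neither a nor b.

% Scale change F(x) = G(ax): by the chain rule, for the n-th derivative,
F^{(n)}(x) = a^{n}\, G^{(n)}(ax)
```

The factor a^n grows with the order of the derivative, so descriptors computed at a single scale cannot be compared directly across scaled images. This is consistent with the multi-scale approach on the slide.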


Extension to Colour Images

RGB system
Extracting interest points by Harris detector
Computing the derivatives separately for every channel
Mixing the derivatives
– Gaining rotational invariance
– Gaining scale invariance
– Gaining photometric invariance


Database Indexing Techniques

Problem:
– The set of descriptors gets too big to fit in main memory

Solution:
– To store descriptors on disk
– Multi-dimensional index structures to accelerate searches

Goal:
– To minimize the resulting number of I/Os


Traditional Approaches

Data-partitioning index methods
– Dividing the data space according to the distribution of data

• R-Tree: Minimum bounding rectangles and overlaps

• SS-Tree: bounding spheres instead of rectangles

• SR-Tree: intersection of a bounding sphere and a bounding rectangle

• TV-Tree: divide the dimensions into three classes

– Drawback: in high-dimensional space, the probability of accessing every index page gets close to 1


Traditional Approaches

Space-partitioning indexing
– Dividing the data space along predefined lines, regardless of the actual data clustering

• Grid-file, KDB-Tree, etc.

– Drawbacks
• Inefficient in high-dimensional space

• Indexing large volumes of empty space

• When the query point is near a cell boundary, the search cost is increased.
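
The "large volumes of empty space" drawback can be illustrated with a small Python experiment. It splits every dimension of the unit cube once, which gives 2^d cells, and counts how many cells contain at least one of 1000 uniformly random points. The grid, the point count and the uniform distribution are assumptions chosen for illustration; they do not model any particular index structure or the paper's data.

```python
import random

def occupied_cells(points, splits_per_dim=2):
    """Number of distinct grid cells that contain at least one point of [0, 1]^d."""
    return len({tuple(min(int(x * splits_per_dim), splits_per_dim - 1) for x in p)
                for p in points})

random.seed(0)
n_points = 1000
for d in (2, 8, 16, 32):
    pts = [[random.random() for _ in range(d)] for _ in range(n_points)]
    total = 2 ** d
    used = occupied_cells(pts)
    print(f"d={d:2d}  cells={total:>11d}  occupied={used:5d}  "
          f"empty fraction={1 - used / total:.6f}")
```

The number of occupied cells can never exceed the number of points, while the number of cells doubles with every added dimension, so almost all of the partitioned space is empty once d is large.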


VA-File

To improve the sequential search
Splitting each dimension, and encoding the grid cells
A file storing all the descriptors
Another file storing the geometrical approximations of these descriptors, associating each descriptor id with a cell number
Searching algorithm computes the geometrical approximation, determines close cells, and scans them in increasing order of distance
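
The two-file idea can be sketched as follows. The sketch assumes descriptors lie in [0, 1]^d, that each dimension is split into 2^bits equal intervals, and that both "files" are in-memory lists. All names are hypothetical, and the search is a simplified nearest-neighbour variant written for illustration, not the implementation evaluated in the paper.

```python
import math
import random

def build_va_file(vectors, bits=2):
    """Approximation file: position i holds the cell indexes of descriptor i."""
    cells = 1 << bits
    return [tuple(min(int(x * cells), cells - 1) for x in v) for v in vectors]

def lower_bound(query, approx, bits=2):
    """Smallest possible distance between the query and any point of the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c * width, (c + 1) * width
        gap = lo - q if q < lo else q - hi if q > hi else 0.0
        total += gap * gap
    return math.sqrt(total)

def nearest_neighbour(query, vectors, approximations, bits=2):
    """Return (id, distance, number of full descriptors actually read)."""
    # Phase 1: scan only the compact approximations and sort by lower bound.
    bounds = sorted((lower_bound(query, a, bits), i) for i, a in enumerate(approximations))
    # Phase 2: read full descriptors in increasing order of lower bound and
    # stop as soon as no remaining cell can contain a closer descriptor.
    best_id, best_dist, visited = None, math.inf, 0
    for bound, i in bounds:
        if bound >= best_dist:
            break
        visited += 1
        dist = math.dist(query, vectors[i])
        if dist < best_dist:
            best_id, best_dist = i, dist
    return best_id, best_dist, visited

random.seed(1)
data = [[random.random() for _ in range(8)] for _ in range(200)]
approx = build_va_file(data)
query = [random.random() for _ in range(8)]
best_id, best_dist, visited = nearest_neighbour(query, data, approx)
print(f"nearest id={best_id}, distance={best_dist:.4f}, full vectors read={visited}")
```

Every approximation is still scanned sequentially. The saving comes from the approximations being much smaller than the descriptors, and from reading only a subset of the full descriptors in the second phase.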


Pyramid-Tree

Dividing a d-dimensional space into 2d pyramids; the top of each pyramid is the center of the data space
Each pyramid is cut into slices parallel to its base
Any point of the multi-dimensional space is mapped into a pair (pyramid number, height in the pyramid)
A given slice of a specific pyramid is stored as a page of a B+ tree
The number of slices increases linearly (and not exponentially) with the number of dimensions
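
The mapping from a point to a (pyramid number, height) pair can be sketched as below, assuming the data space is the unit cube [0, 1]^d with its center at (0.5, ..., 0.5). The function names are hypothetical, and the numbering convention (pyramids 0 to d−1 on the low side of each axis, d to 2d−1 on the high side) follows the usual description of the Pyramid-Technique rather than anything stated on the slide.

```python
def pyramid_value(point):
    """Map a point of [0, 1]^d to (pyramid number, height in the pyramid)."""
    d = len(point)
    # The dimension in which the point is farthest from the center decides
    # which of the 2d pyramids contains it.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    height = abs(0.5 - point[j_max])
    number = j_max if point[j_max] < 0.5 else j_max + d
    return number, height

def pyramid_key(point):
    """Single one-dimensional key suitable for a B+ tree: number + height."""
    number, height = pyramid_value(point)
    return number + height

for p in ([0.1, 0.5, 0.6], [0.5, 0.5, 0.9], [0.45, 0.2, 0.7]):
    number, height = pyramid_value(p)
    print(p, "-> pyramid", number, f"height {height:.2f}", f"key {pyramid_key(p):.2f}")
```

Because the height never exceeds 0.5, the keys of different pyramids fall into disjoint intervals, so a one-dimensional B+ tree keeps the points of one slice of one pyramid next to each other.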


Performance Evaluations

Experimental environment
– SUN Ultra 5 workstation running SunOS 5.7
– CPU: 333 MHz UltraSPARC-IIi
– Main memory: 384 MB
– Local secondary storage: 8 GB

Databases
– One color image database
– One grey-level image database
– A third one for the recognition evaluation test only


Experiments

Comparing the recognition power of color and grey descriptors

Influence of the Dimensionality of Data

Influence of the Database Size

Impact of the Number of Descriptors in a Request


Experiment 2: Influence of the Dimensionality of Data


Experiment 3: Influence of the Database Size


Experiment 4: Impact of the Number of Descriptors in a Request


Conclusion and Perspectives

Slowdown is caused by the dimensionality of data, the size of database, and the number of descriptors.

The three efficient multi-dimensional indexing techniques do NOT efficiently cope with the fine-grain recognition with local descriptors.

It’s crucial to come up with new indexing techniques specially designed to efficiently support the use of local descriptors.


Research Direction

Numerous local descriptors for a single query create redundancy

Exploit the distribution of data to accelerate the queries

Change the management of memory to benefit from consecutive queries

Use several low-dimensional indexes instead of a single high-dimensional index