BIT 3193 MULTIMEDIA DATABASE
CHAPTER 4 : QUERYING MULTIMEDIA DATABASES
• The structure of an image is much less explicit.
• We therefore need to apply techniques that will identify a structure.
• characterizing the content of visual objects is much more complex and uncertain.
• characterized by feature vectors
• A feature is an attribute derived from transforming the original visual object by using an image analysis algorithm.
• The visual query mode involves matching the input image to pre-extracted features of real objects.
• pre-extracted features are held in the database
• Purpose:
• to extract a set of numerical features that removes redundancy from the image and reduces its dimension
• The most commonly used features for content-based image retrieval are shape, color and texture.
• A content-based image retrieval (CBIR) system uses image visual content features to retrieve relevant images from an image database.
• CBIR systems retrieve images according to specified features that users are interested in.
• features such as texture, color, shape, and location properties can reflect the contents of an image
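As a concrete sketch of these ideas, the following Python (not from the original slides; the bin count and the L1 metric are illustrative choices) computes a normalized color-histogram feature vector from raw RGB pixels and compares two images by histogram distance:

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and count
    occurrences, producing a fixed-length feature vector."""
    step = 256 // bins
    counts = Counter(
        (r // step, g // step, b // step) for (r, g, b) in pixels
    )
    n = len(pixels)
    # Normalized histogram over all bins^3 quantized colors.
    return [counts[(i, j, k)] / n
            for i in range(bins) for j in range(bins) for k in range(bins)]

def l1_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Two images with similar color content end up with nearby feature vectors, which is exactly what makes histogram features usable for retrieval.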
Example: Trainable System for Object Detection
• A set of positive example images of the object class considered (e.g., images of frontal faces) and a set of negative examples (e.g., any non-face image) are collected.
• The images are transformed into (feature) vectors in a chosen representation
• (e.g., a vector of the size of the image with the pixel value at each location; this is called the “pixel” representation)
• The vectors (examples) are used to train a pattern classifier, the Support Vector Machine (SVM), to learn the classification task of separating positive from negative examples.
• To detect objects in out-of-sample images, the system slides a fixed size window over an image and uses the trained classifier to decide which patterns show the objects of interest.
• At each window position, the system extracts the same set of features as in the training step and feeds them into the classifier; the classifier output determines whether or not the window contains an object of interest.
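The sliding-window loop described above can be sketched as follows; the `classify` callable stands in for the trained SVM, which is assumed to already exist, and the "pixel" representation is used as the feature vector:

```python
def sliding_window_detect(image, win_w, win_h, classify, step=1):
    """Slide a fixed-size window over a 2D image (list of rows) and
    apply a trained classifier to each window's feature vector.
    Returns the (x, y) positions classified as objects of interest."""
    hits = []
    height, width = len(image), len(image[0])
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            # Same "pixel representation" features as in training:
            # the raw window pixels flattened into a vector.
            window = [image[y + dy][x + dx]
                      for dy in range(win_h) for dx in range(win_w)]
            if classify(window):
                hits.append((x, y))
    return hits
```

In practice the window is also rescaled to detect objects at multiple sizes; that loop is omitted here for brevity.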
• Representation Technique for Face and People Detection
• Pixel Representation
• Eigenvector Representation
• Wavelet Representation
• Multimedia data, such as images or video, are typically represented or stored as very high-dimensional vectors.
• Because the data are so high-dimensional, searching and other operations on such systems are costly.
• It is therefore practically important to find compact representations of multimedia data that do not significantly affect the performance of tasks such as detection.
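One simple way to obtain such a compact representation, shown here only as an illustrative sketch (the slides do not prescribe a method), is a Gaussian random projection, which approximately preserves distances between feature vectors while shrinking their dimension:

```python
import random

def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian random projection matrix (Johnson-Lindenstrauss style):
    distances between projected vectors approximate the originals."""
    rng = random.Random(seed)
    scale = (1.0 / d_out) ** 0.5
    return [[rng.gauss(0, 1) * scale for _ in range(d_in)]
            for _ in range(d_out)]

def project(matrix, vec):
    """Map a high-dimensional feature vector to a compact one."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]
```

A 4000-element texture vector, for instance, could be projected down to a few dozen dimensions before indexing, trading a small amount of accuracy for much faster search.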
• Can be based on:
• color
• using color histograms and color variants
• texture
• variation in intensity and topography of surfaces
• shape
• using aspect ratios
• circularity and moments for global features
• using boundary segments for local features
• Can be based on:
• position
• using spatial indexing
• image transformations
• using transformations
• appearance
• using a combination of color, texture and intensity surfaces
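Two of the global shape measures listed above, circularity and aspect ratio, are simple enough to sketch directly (the formulas are standard; the function names are our own):

```python
import math

def circularity(area, perimeter):
    """Circularity = 4*pi*A / P^2; equals 1 for a perfect circle and
    shrinks for elongated or irregular (spiky) shapes."""
    return 4 * math.pi * area / (perimeter ** 2)

def aspect_ratio(width, height):
    """Ratio of bounding-box sides, a crude global shape feature."""
    return max(width, height) / min(width, height)
```

A circle scores exactly 1, a square about 0.785, and shapes with spikes and holes score much lower, which is one reason such defects cause errors in shape indexing.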
Feature    | Measures                                                                                     | Theory                   | Main use                            | Problems
Color      | Histogram                                                                                    | Swain and Ballard        | Color indexing                      | Lighting variations
Texture    | Pixel intensity, illumination, topography; degrees of directionality, regularity, periodicity | Gabor filters, fractals  | Indexing, texture thesaurus         |
Shape      | Global features: aspect ratio, circularity, moments; local features: boundary segments       | Active contours          | Shape indexing, object recognition  | Spikes and holes in objects cause errors in indexing
Appearance | Global features: curvature, orientation; local features: local curvatures and orientation    | Transforms               | Image classification                |
Position   | Spatial relationships                                                                        | Tessellations (Voronoi)  | Object recognition                  |
Table 4.1 : Features used in retrieval
Feature    | Advantage                                                                                                            | Disadvantage
Color      | Can be applied to all colored images, 2D and 3D                                                                      |
Texture    | Distinguishes between image regions with similar color, e.g. sea and sky                                             | Large feature vectors, each containing 4000 elements, have been used
Shape      | Important in image segmentation; can classify images as stick-like, plate-like or blob-like                          | Representation is difficult; viewpoint changes an object's shape; spikes and holes; 3D is very difficult
Appearance | Important way of judging similarity; can generate invariant measures; describes an image at varying levels of detail |
Position   | Can be applied to 2D and 3D images                                                                                   | Images must contain objects in defined spatial relationships; spatial indexing not useful unless combined with color and texture
Table 4.2 : Advantages and Disadvantages of feature methods of retrieval
• There are two alternative approaches:
• use a query image
• user can provide an image or compose a target image by selecting and clicking color palettes and texture patterns
• use user-defined features
• allow user to select a sample image
• query process
• the distribution of image objects is then recomputed in terms of their distance from the sample image
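A minimal sketch of this query-by-example step, assuming feature vectors have already been extracted and stored, ranks database images by their Euclidean distance from the sample image's vector:

```python
def rank_by_example(sample_vec, database, k=3):
    """Query by example: rank database images by Euclidean distance
    of their pre-extracted feature vectors from the sample image's.
    `database` maps image names to feature vectors."""
    def dist(vec):
        return sum((a - b) ** 2 for a, b in zip(sample_vec, vec)) ** 0.5
    ranked = sorted(database.items(), key=lambda item: dist(item[1]))
    return [name for name, _ in ranked[:k]]
```

Real systems replace the linear scan with a multidimensional index, but the ranking principle is the same.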
• use automatic methods for generating content-dependent metadata
• speech recognition techniques are used to identify both speakers and the spoken words
• factors which influence the complexity of the identification problems encountered include:
• isolated words (easier to recognize)
• single speaker (one is easier)
• vocabulary size (smaller is easier)
• grammar (tightly constrained is easier)
• users can use query by example (QBE)
• the technologies used to achieve this have to be integrated and include:
• large vocabulary speech recognition
• speaker segmentation
• speaker clustering
• speaker identification
• name spotting
• topic classification
• story segmentation
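Of these, word spotting is the simplest to illustrate. This naive sketch scans a time-stamped transcript produced by a recognizer; the data layout is an assumption for illustration, not the API of any of the real integrated technologies:

```python
def word_spot(transcript, query_words):
    """Naive word spotting: return the time-stamped hits of the query
    words in a recognized transcript of (time, word) pairs."""
    wanted = {w.lower() for w in query_words}
    return [(t, w) for (t, w) in transcript if w.lower() in wanted]
```

The timestamps let a retrieval system jump straight to the matching point in the audio.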
• Videos are far more complex.
• Role of video feature extraction:
• image-based features
• motion-based features (e.g. motion of the camera)
• object detection and tracking
• speech recognition
• speaker identification
• word spotting
• audio classification
[Figure: video document hierarchy. A Clip contains Stories/Scenes (Story 1 … Story m); each Story contains Shots (Shot 1, Shot 2, … Shot k); each Shot contains Frames. A Shot is captured between a record and a stop camera operation.
Attributes at each level:
• Clip: Index, Category, Title, Date, Source, Duration
• Scene / Story segment: Theme, Duration, Frame Start, Frame End, Number of Shots, Event, Keywords
• Shot: Theme, Duration, Frame Start, Frame End, Camera, Audio Level
• Frame: Frame number]
• Clip
• digital video document that can last from a few seconds to a few hours
• Scene
• sequential collection of shots unified by a common event or locale (background).
• a clip has one or more scenes
• Shot
• fundamental unit
• much research has focused on segmenting video by detecting boundaries between camera shots
• defined as a sequence of frames captured by a single camera in a single continuous action in time and space
• example : two people having a conversation
• low-level syntactic building blocks of a video sequence
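A common baseline for the shot-boundary detection mentioned above compares the color histograms of consecutive frames; this sketch (the threshold and the L1 metric are illustrative choices, not a prescribed method) flags a cut wherever the difference spikes:

```python
def detect_shot_boundaries(frame_histograms, threshold=0.5):
    """Declare a shot boundary wherever the L1 distance between
    consecutive frames' normalized color histograms exceeds a
    threshold, a standard baseline for hard-cut detection."""
    boundaries = []
    for i in range(1, len(frame_histograms)):
        prev, cur = frame_histograms[i - 1], frame_histograms[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            boundaries.append(i)  # a new shot starts at frame i
    return boundaries
```

Gradual transitions (fades, dissolves) need more elaborate detectors, since the per-frame difference stays below any single threshold.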
• The video operations are:
• create
• concatenate, union and intersection (based on temporal and spatial conditions)
• output
• Query example:
“ Show the details of movies where a character said “I am not interested in a semantic argument, I just need the protein”
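The temporal side of these operations can be sketched on frame ranges; the interval representation below is an assumption for illustration, not the slides' formal model:

```python
def concatenate(a, b):
    """Concatenate two shots or clips given as frame lists."""
    return a + b

def temporal_intersection(iv1, iv2):
    """Intersection of two clips' frame ranges (start, end), inclusive;
    None if they do not overlap in time."""
    start, end = max(iv1[0], iv2[0]), min(iv1[1], iv2[1])
    return (start, end) if start <= end else None

def temporal_union(iv1, iv2):
    """Union of two overlapping frame ranges; None if disjoint."""
    if temporal_intersection(iv1, iv2) is None:
        return None
    return (min(iv1[0], iv2[0]), max(iv1[1], iv2[1]))
```

Spatial conditions (e.g. two objects overlapping on screen) would be handled analogously on bounding boxes.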
[Figure A : Video retrieval system. The user submits query inputs; query processing, subject to access control and rights management, runs against indexes, visual summaries, and video processing and annotation summaries built over the digital video collection; query results are returned through query presentation, and content delivery streams the matching video back to the user.]
ISO/IEC 13249 (SQL/MM)
• SQL Multimedia and Application Packages
• Standardized in 2001 by ISO subcommittee SC32 Working Group
• Provides structured object types and methods to store and manipulate image data by content
• Supports the OR (Object Relational) data model
Part 1: Framework | Part 2: Full Text | Part 3: Spatial | Part 5: Still Image | Part 6: Data Mining
Object types that comply with the first edition of the ISO/IEC 13249-5:2001 SQL/MM Part 5: Still Image standard:
• SI_AverageColor: describes the average color feature of an image.
• SI_Color: encapsulates the color values of a digitized image.
• SI_ColorHistogram: describes the relative frequencies of the colors exhibited by samples of an image.
• SI_FeatureList: describes an image represented by a composite feature based on up to four basic image features (SI_AverageColor, SI_ColorHistogram, SI_PositionalColor, and SI_Texture) and their associated feature weights.
• SI_StillImage: represents digital images with inherent image characteristics such as height, width, format, and so on.
• SI_PositionalColor: describes the positional color feature of an image. Assuming that an image is divided into n by m rectangles, the positional color feature characterizes an image by the n by m most significant colors of the rectangles.
• SI_Texture: describes the texture feature of an image, characterized by the size of repeating items (coarseness), brightness variations (contrast), and predominant direction (directionality).
Read the following website for further information on Oracle implementation of SQL/MM Still Image:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14297/ch_stimgref.htm#CHDBAGID.
Example of media table for still Images defined as per SQL/MM standards
Given the following PM.SI_MEDIA table definition in Oracle implementation:
CREATE TABLE PM.SI_MEDIA (
  PRODUCT_ID       NUMBER(6),
  PRODUCT_PHOTO    SI_StillImage,
  AVERAGE_COLOR    SI_AverageColor,
  COLOR_HISTOGRAM  SI_ColorHistogram,
  FEATURE_LIST     SI_FeatureList,
  POSITIONAL_COLOR SI_PositionalColor,
  TEXTURE          SI_Texture,
  CONSTRAINT id_pk PRIMARY KEY (PRODUCT_ID)
);
Example 1: Construct an SI_AverageColor object from a specified color using the SI_AverageColor(averageColorSpec) constructor.
Solution:
DECLARE
  myColor    SI_Color;
  myAvgColor SI_AverageColor;
BEGIN
  myColor := NEW SI_Color(null, null, null);
  myColor.SI_RGBColor(10, 100, 200);
  myAvgColor := NEW SI_AverageColor(myColor);
  INSERT INTO PM.SI_MEDIA (product_id, average_color)
  VALUES (75, myAvgColor);
  COMMIT;
END;
Example 2: Derive an SI_AverageColor value using the SI_AverageColor(sourceImage) constructor.
Solution:
DECLARE
  myimage    SI_StillImage;
  myAvgColor SI_AverageColor;
BEGIN
  SELECT product_photo INTO myimage
  FROM PM.SI_MEDIA
  WHERE product_id = 1;
  myAvgColor := NEW SI_AverageColor(myimage);
END;
Example 3: Insert into the PM.SI_MEDIA table an object with PRODUCT_ID = 1 and an average color of RED = 20, GREEN = 30 and BLUE = 50.
Solution:
DECLARE
  myColor    SI_Color;
  myAvgColor SI_AverageColor;
BEGIN
  myColor := NEW SI_Color(null, null, null);
  myColor.SI_RGBColor(20, 30, 50);
  myAvgColor := NEW SI_AverageColor(myColor);
  INSERT INTO PM.SI_MEDIA (product_id, average_color)
  VALUES (1, myAvgColor);
  COMMIT;
END;
Example 4: Derive the SI_AverageColor object for the image with PRODUCT_ID = 13 using the SI_FindAvgClr() function.
Solution:
DECLARE
  myimage    SI_StillImage;
  myAvgColor SI_AverageColor;
BEGIN
  SELECT product_photo INTO myimage
  FROM PM.SI_MEDIA
  WHERE product_id = 13;
  myAvgColor := SI_FindAvgClr(myimage);
END;
• In 2002, the ISO subcommittee MPEG published a standard: MPEG-7
• Formally named Multimedia Content Description Interface
• MPEG-4 was the first multimedia representation standard, based on object coding
• MPEG-7 is currently the most complete description standard for multimedia data
• Any audio/visual material associated with multimedia data can be indexed and searched
• Provides:
• a set of Descriptors (D): quantitative measures of audio/visual features
• Description Schemes (DS): the structure of Descriptors and their relationships
• MPEG-7 descriptions associated with
• Still pictures, graphics, 3D models, audio, speech, video
• Composition information about how these elements are combined in a multimedia presentation (scenarios)
• MPEG-7 descriptions do not depend on the ways the described content is coded or stored
• It is possible to create an MPEG-7 description of an analogue movie or of a picture that is printed on paper, in the same way as of digitized content.
• MPEG-7 can exploit the advantages provided by MPEG-4 coded content
• Material encoded using MPEG-4 provides the means to encode audio-visual material as objects having certain relations in time (synchronization) and space (on the screen for video, or in the room for audio)
• This makes it possible to attach descriptions to elements (objects) within the scene, such as audio and visual objects
• Same material can be described using different types of features, tuned to the area of application
• Eg : A visual material:
• Lower abstraction level would be a description of shape, size, texture, color, movement (trajectory) and position (“where in the scene can the object be found?”)
• The highest level would give semantic information: “This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background”
• Apart from describing what is depicted in the content, MPEG-7 carries the following additional information about the multimedia data:
• The form (e.g. JPEG, MPEG-2)
• The overall data size (helps determine whether the material can be “read” by the user terminal)
• Conditions for accessing the material (includes links to a registry with intellectual property rights information, and price)
• Classification (includes parental rating, and content classification into a number of pre-defined categories)
• Links to other relevant material (helps the user speed up the search)
• The context (the occasion of the recording, e.g. Olympic Games 1996, final of 200 meter hurdles, men)
• Main elements of the MPEG-7 standard:
• Description Tools: Descriptors (D) and Description Schemes (DS)
• A Description Definition Language (DDL): defines the syntax of the MPEG-7 Description Tools and allows the creation of new Description Schemes
• System tools: support a binary coded representation for efficient storage and transmission, transmission mechanisms (both for textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
• The key information that the description tools capture includes:
• Structural information on spatial, temporal or spatio-temporal components of the content (scene cuts, segmentation in regions, region motion tracking).
• Low level features in the content (colors, textures, sound timbres, melody description).
• Conceptual information of the reality captured by the content (objects and events, interactions among objects).
• Information about how to browse the content in an efficient way (summaries, variations, spatial and frequency subbands).
• Information about collections of objects.
• Information about the interaction of the user with the content (user preferences, usage history)
[Figures: Scope of MPEG-7; MPEG-7 Main Elements; Abstract representation of possible applications using MPEG-7]
Integration of MPEG-7 into MMDBMS
• MPEG-7 relies on XML Schema, so mapping strategies from XML to the database data model are an issue
• SQL/MM and querying:
• due to the rich descriptions provided by MPEG-7, enhancements to SQL/MM are needed
• operations that manipulate, and produce as results, XML are an option
• indexing methods for multidimensional data can be used to index multimedia data
• MPEG-7 provides methods for semantic indexing
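To illustrate the XML-to-database mapping issue, the sketch below parses a simplified MPEG-7-style fragment (the element names are abbreviated assumptions, not a schema-valid MPEG-7 instance) into relational-style rows using Python's standard ElementTree:

```python
import xml.etree.ElementTree as ET

# A minimal MPEG-7-style fragment, for illustration only.
description = """
<Mpeg7>
  <VideoSegment id="shot1">
    <MediaTime start="0" end="120"/>
    <FreeTextAnnotation>barking brown dog on the left</FreeTextAnnotation>
  </VideoSegment>
</Mpeg7>
"""

def segments_to_rows(xml_text):
    """Flatten segment descriptions into relational-style rows, the
    kind of XML-to-database mapping the integration issue refers to."""
    root = ET.fromstring(xml_text)
    rows = []
    for seg in root.findall("VideoSegment"):
        time = seg.find("MediaTime")
        note = seg.findtext("FreeTextAnnotation", default="")
        rows.append({"id": seg.get("id"),
                     "start": int(time.get("start")),
                     "end": int(time.get("end")),
                     "annotation": note})
    return rows
```

Once flattened like this, segment metadata can be stored in ordinary tables and queried with SQL, at the cost of losing some of the description's hierarchical structure.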
More on MPEG-7 can be found in:
ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, MPEG-7 Overview: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm