BIT 3193 MULTIMEDIA DATABASE
CHAPTER 4 : QUERYING MULTIMEDIA DATABASES
• The structure of an image is much less explicit.
• We therefore need to apply techniques that will identify a structure.
• characterizing the content of visual objects is much more complex and uncertain.
• characterized by feature vectors
• A feature is an attribute derived from transforming the original visual object by using an image analysis algorithm.
• The visual query mode involves matching the input image to pre-extracted features of real objects.
• pre-extracted features are held in the database
• Purpose:
• to extract a set of numerical features that removes redundancy from the image and reduces its dimension
• The most commonly used features for content-based image retrieval are shape, color and texture.
• A content-based image retrieval (CBIR) system uses image visual content features to retrieve relevant images from an image database.
• CBIR systems retrieve images according to specified features that users are interested in.
• features such as texture, color, shape, and location properties can reflect the contents of an image
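As a concrete sketch of these ideas, the following Python (not from the original slides; the bin count and the L1 metric are illustrative choices) computes a normalized color-histogram feature vector from raw RGB pixels and compares two images by histogram distance:

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and count
    occurrences, producing a fixed-length feature vector."""
    step = 256 // bins
    counts = Counter(
        (r // step, g // step, b // step) for (r, g, b) in pixels
    )
    n = len(pixels)
    # Normalized histogram over all bins^3 quantized colors.
    return [counts[(i, j, k)] / n
            for i in range(bins) for j in range(bins) for k in range(bins)]

def l1_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Two images with similar color content end up with nearby feature vectors, which is exactly what makes histogram features usable for retrieval.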
Example: Trainable System for Object Detection
• A set of positive example images of the object class considered (e.g., images of frontal faces) and a set of negative examples (e.g., any non-face image) are collected.
• The images are transformed into (feature) vectors in a chosen representation
• (e.g., a vector of the size of the image with the pixel value at each location; this is called the “pixel” representation)
• The vectors (examples) are used to train a pattern classifier, the Support Vector Machine (SVM), to learn the classification task of separating positive from negative examples.
• To detect objects in out-of-sample images, the system slides a fixed size window over an image and uses the trained classifier to decide which patterns show the objects of interest.
• At each window position, the system extracts the same set of features as in the training step and feeds them into the classifier; the classifier output determines whether or not the window contains an object of interest.
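The sliding-window loop described above can be sketched as follows; the `classify` callable stands in for the trained SVM, which is assumed to already exist, and the "pixel" representation is used as the feature vector:

```python
def sliding_window_detect(image, win_w, win_h, classify, step=1):
    """Slide a fixed-size window over a 2D image (list of rows) and
    apply a trained classifier to each window's feature vector.
    Returns the (x, y) positions classified as objects of interest."""
    hits = []
    height, width = len(image), len(image[0])
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            # Same "pixel representation" features as in training:
            # the raw window pixels flattened into a vector.
            window = [image[y + dy][x + dx]
                      for dy in range(win_h) for dx in range(win_w)]
            if classify(window):
                hits.append((x, y))
    return hits
```

In practice the window is also rescaled to detect objects at multiple sizes; that loop is omitted here for brevity.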
• Representation Technique for Face and People Detection
• Pixel Representation
• Eigenvector Representation
• Wavelet Representation
• Multimedia data, such as images or video, are typically represented or stored as very high-dimensional vectors.
• Because the data are so high-dimensional, searching and other operations on such systems are costly.
• It is therefore practically important to find compact representations of multimedia data that do not significantly affect the performance of tasks such as detection.
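One simple way to obtain such a compact representation, shown here only as an illustrative sketch (the slides do not prescribe a method), is a Gaussian random projection, which approximately preserves distances between feature vectors while shrinking their dimension:

```python
import random

def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian random projection matrix (Johnson-Lindenstrauss style):
    distances between projected vectors approximate the originals."""
    rng = random.Random(seed)
    scale = (1.0 / d_out) ** 0.5
    return [[rng.gauss(0, 1) * scale for _ in range(d_in)]
            for _ in range(d_out)]

def project(matrix, vec):
    """Map a high-dimensional feature vector to a compact one."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]
```

A 4000-element texture vector, for instance, could be projected down to a few dozen dimensions before indexing, trading a small amount of accuracy for much faster search.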
• Can be based on:
• color
• using color histograms and color variants
• texture
• variation in intensity and topography of surfaces
• shape
• using aspect ratios
• circularity and moments for global features
• using boundary segments for local features
• Can be based on:
• position
• using spatial indexing
• image transformations
• using transformations
• appearance
• using a combination of color, texture and intensity surfaces
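Two of the global shape measures listed above, circularity and aspect ratio, are simple enough to sketch directly (the formulas are standard; the function names are our own):

```python
import math

def circularity(area, perimeter):
    """Circularity = 4*pi*A / P^2; equals 1 for a perfect circle and
    shrinks for elongated or irregular (spiky) shapes."""
    return 4 * math.pi * area / (perimeter ** 2)

def aspect_ratio(width, height):
    """Ratio of bounding-box sides, a crude global shape feature."""
    return max(width, height) / min(width, height)
```

A circle scores exactly 1, a square about 0.785, and shapes with spikes and holes score much lower, which is one reason such defects cause errors in shape indexing.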
Feature    | Measures                                                                                     | Theory                   | Main use                            | Problems
Color      | Histogram                                                                                    | Swain and Ballard        | Color indexing                      | Lighting variations
Texture    | Pixel intensity, illumination, topography; degrees of directionality, regularity, periodicity | Gabor filters, fractals  | Indexing, texture thesaurus         |
Shape      | Global features: aspect ratio, circularity, moments; local features: boundary segments       | Active contours          | Shape indexing, object recognition  | Spikes and holes in objects cause errors in indexing
Appearance | Global features: curvature, orientation; local features: local curvatures and orientation    | Transforms               | Image classification                |
Position   | Spatial relationships                                                                        | Tessellations (Voronoi)  | Object recognition                  |
Table 4.1 : Features used in retrieval
Feature    | Advantage                                                                                                            | Disadvantage
Color      | Can be applied to all colored images, 2D and 3D                                                                      |
Texture    | Distinguishes between image regions with similar color, e.g. sea and sky                                             | Large feature vectors, each containing 4000 elements, have been used
Shape      | Important in image segmentation; can classify images as stick-like, plate-like or blob-like                          | Representation is difficult; viewpoint changes an object's shape; spikes and holes; 3D is very difficult
Appearance | Important way of judging similarity; can generate invariant measures; describes an image at varying levels of detail |
Position   | Can be applied to 2D and 3D images                                                                                   | Images must contain objects in defined spatial relationships; spatial indexing not useful unless combined with color and texture
Table 4.2 : Advantages and Disadvantages of feature methods of retrieval
• There are two alternative approaches:
• use a query image
• user can provide an image or compose a target image by selecting and clicking color palettes and texture patterns
• use user-defined features
• allow user to select a sample image
• query process
• the distribution of image objects is then recomputed in terms of their distance from the sample image
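A minimal sketch of this query-by-example step, assuming feature vectors have already been extracted and stored, ranks database images by their Euclidean distance from the sample image's vector:

```python
def rank_by_example(sample_vec, database, k=3):
    """Query by example: rank database images by Euclidean distance
    of their pre-extracted feature vectors from the sample image's.
    `database` maps image names to feature vectors."""
    def dist(vec):
        return sum((a - b) ** 2 for a, b in zip(sample_vec, vec)) ** 0.5
    ranked = sorted(database.items(), key=lambda item: dist(item[1]))
    return [name for name, _ in ranked[:k]]
```

Real systems replace the linear scan with a multidimensional index, but the ranking principle is the same.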
• use automatic methods for generating content-dependent metadata
• speech recognition techniques are used to identify both speakers and the spoken words
• factors which influence the complexity of the identification problems encountered include:
• isolated words (easier to recognize)
• single speaker (one is easier)
• vocabulary size (smaller is easier)
• grammar (tightly constrained is easier)
• users can use query by example (QBE)
• the technologies used to achieve this have to be integrated and include:
• large vocabulary speech recognition
• speaker segmentation
• speaker clustering
• speaker identification
• name spotting
• topic classification
• story segmentation
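Of these, word spotting is the simplest to illustrate. This naive sketch scans a time-stamped transcript produced by a recognizer; the data layout is an assumption for illustration, not the API of any of the real integrated technologies:

```python
def word_spot(transcript, query_words):
    """Naive word spotting: return the time-stamped hits of the query
    words in a recognized transcript of (time, word) pairs."""
    wanted = {w.lower() for w in query_words}
    return [(t, w) for (t, w) in transcript if w.lower() in wanted]
```

The timestamps let a retrieval system jump straight to the matching point in the audio.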
• Videos are far more complex.
• Role of video feature extraction:
• image-based features
• motion-based features (e.g. motion of the camera)
• object detection and tracking
• speech recognition
• speaker identification
• word spotting
• audio classification
[Figure: video document hierarchy. A Clip contains Stories/Scenes (Story 1 … Story m); each Story contains Shots (Shot 1, Shot 2, … Shot k); each Shot contains Frames. A Shot is captured between a record and a stop camera operation.
Attributes at each level:
• Clip: Index, Category, Title, Date, Source, Duration
• Scene / Story segment: Theme, Duration, Frame Start, Frame End, Number of Shots, Event, Keywords
• Shot: Theme, Duration, Frame Start, Frame End, Camera, Audio Level
• Frame: Frame number]
• Clip
• digital video document that can last from a few seconds to a few hours
• Scene
• sequential collection of shots unified by a common event or locale (background).
• a clip has one or more scenes
• Shot
• fundamental unit
• much research has focused on segmenting video by detecting boundaries between camera shots
• defined as a sequence of frames captured by a single camera in a single continuous action in time and space
• example : two people having a conversation
• low-level syntactic building blocks of a video sequence
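A common baseline for the shot-boundary detection mentioned above compares the color histograms of consecutive frames; this sketch (the threshold and the L1 metric are illustrative choices, not a prescribed method) flags a cut wherever the difference spikes:

```python
def detect_shot_boundaries(frame_histograms, threshold=0.5):
    """Declare a shot boundary wherever the L1 distance between
    consecutive frames' normalized color histograms exceeds a
    threshold, a standard baseline for hard-cut detection."""
    boundaries = []
    for i in range(1, len(frame_histograms)):
        prev, cur = frame_histograms[i - 1], frame_histograms[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            boundaries.append(i)  # a new shot starts at frame i
    return boundaries
```

Gradual transitions (fades, dissolves) need more elaborate detectors, since the per-frame difference stays below any single threshold.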
• The video operations are:
• create
• concatenate, union and intersection (based on temporal and spatial conditions)
• output
• Query example:
“ Show the details of movies where a character said “I am not interested in a semantic argument, I just need the protein”
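The temporal side of these operations can be sketched on frame ranges; the interval representation below is an assumption for illustration, not the slides' formal model:

```python
def concatenate(a, b):
    """Concatenate two shots or clips given as frame lists."""
    return a + b

def temporal_intersection(iv1, iv2):
    """Intersection of two clips' frame ranges (start, end), inclusive;
    None if they do not overlap in time."""
    start, end = max(iv1[0], iv2[0]), min(iv1[1], iv2[1])
    return (start, end) if start <= end else None

def temporal_union(iv1, iv2):
    """Union of two overlapping frame ranges; None if disjoint."""
    if temporal_intersection(iv1, iv2) is None:
        return None
    return (min(iv1[0], iv2[0]), max(iv1[1], iv2[1]))
```

Spatial conditions (e.g. two objects overlapping on screen) would be handled analogously on bounding boxes.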
[Figure A : Video retrieval system. The user submits query inputs; query processing, subject to access control and rights management, runs against indexes, visual summaries, and video processing and annotation summaries built over the digital video collection; query results are returned through query presentation, and content delivery streams the matching video back to the user.]
ISO/IEC 13249 (SQL/MM)
• SQL Multimedia and Application Packages
• Standardized in 2001 by ISO subcommittee SC32 Working Group
• Provides structured object types and methods to store and manipulate image data by content
• Supports the OR (Object Relational) data model
Part 1: Framework | Part 2: Full Text | Part 3: Spatial | Part 5: Still Image | Part 6: Data Mining
Object types that comply with the first edition of the ISO/IEC 13249-5:2001 SQL/MM Part 5: Still Image standard:
• SI_AverageColor: describes the average color feature of an image.
• SI_Color: encapsulates the color values of a digitized image.
• SI_ColorHistogram: describes the relative frequencies of the colors exhibited by samples of an image.
• SI_FeatureList: describes an image represented by a composite feature based on up to four basic image features (SI_AverageColor, SI_ColorHistogram, SI_PositionalColor, and SI_Texture) and their associated feature weights.
• SI_StillImage: represents digital images with inherent image characteristics such as height, width, format, and so on.
• SI_PositionalColor: describes the positional color feature of an image. Assuming that an image is divided into n by m rectangles, the positional color feature characterizes an image by the n by m most significant colors of the rectangles.
• SI_Texture: describes the texture feature of an image, characterized by the size of repeating items (coarseness), brightness variations (contrast), and predominant direction (directionality).
Read the following website for further information on Oracle implementation of SQL/MM Still Image:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14297/ch_stimgref.htm#CHDBAGID.
Example of media table for still Images defined as per SQL/MM standards
Given the following PM.SI_MEDIA table definition in Oracle implementation:
CREATE TABLE PM.SI_MEDIA (
  PRODUCT_ID       NUMBER(6),
  PRODUCT_PHOTO    SI_StillImage,
  AVERAGE_COLOR    SI_AverageColor,
  COLOR_HISTOGRAM  SI_ColorHistogram,
  FEATURE_LIST     SI_FeatureList,
  POSITIONAL_COLOR SI_PositionalColor,
  TEXTURE          SI_Texture,
  CONSTRAINT id_pk PRIMARY KEY (PRODUCT_ID)
);
Example 1: Construct an SI_AverageColor object from a specified color using the SI_AverageColor(averageColorSpec) constructor.
Solution:
DECLARE
  myColor    SI_Color;
  myAvgColor SI_AverageColor;
BEGIN
  myColor := NEW SI_Color(null, null, null);
  myColor.SI_RGBColor(10, 100, 200);
  myAvgColor := NEW SI_AverageColor(myColor);
  INSERT INTO PM.SI_MEDIA (product_id, average_color)
  VALUES (75, myAvgColor);
  COMMIT;
END;
Example 2: Derive an SI_AverageColor value using the SI_AverageColor(sourceImage) constructor.
Solution:
DECLARE
  myimage    SI_StillImage;
  myAvgColor SI_AverageColor;
BEGIN
  SELECT product_photo INTO myimage
  FROM PM.SI_MEDIA
  WHERE product_id = 1;
  myAvgColor := NEW SI_AverageColor(myimage);
END;
Example 3: Insert into the PM.SI_MEDIA table an object with PRODUCT_ID = 1 and an average color of RED = 20, GREEN = 30 and BLUE = 50.
Solution:
DECLARE
  myColor    SI_Color;
  myAvgColor SI_AverageColor;
BEGIN
  myColor := NEW SI_Color(null, null, null);
  myColor.SI_RGBColor(20, 30, 50);
  myAvgColor := NEW SI_AverageColor(myColor);
  INSERT INTO PM.SI_MEDIA (product_id, average_color)
  VALUES (1, myAvgColor);
  COMMIT;
END;
Example 4: Derive the SI_AverageColor object for the image with PRODUCT_ID = 13 using the SI_FindAvgClr() function.
Solution:
DECLARE
  myimage    SI_StillImage;
  myAvgColor SI_AverageColor;
BEGIN
  SELECT product_photo INTO myimage
  FROM PM.SI_MEDIA
  WHERE product_id = 13;
  myAvgColor := SI_FindAvgClr(myimage);
END;
• In 2002, the ISO subcommittee MPEG published a standard: MPEG-7
• Formally named Multimedia Content Description Interface
• MPEG-4 was the first multimedia representation standard, based on object coding
• MPEG-7 is currently the most complete description standard for multimedia data
• Any audio/visual material associated with multimedia data can be indexed and searched
• Provides:
• a set of Descriptors (D): quantitative measures of audio/visual features
• Description Schemes (DS): the structure of Descriptors and their relationships
• MPEG-7 descriptions associated with
• Still pictures, graphics, 3D models, audio, speech, video
• Composition information about how these elements are combined in a multimedia presentation (scenarios)
• MPEG-7 descriptions do not depend on the ways the described content is coded or stored
• It is possible to create an MPEG-7 description of an analogue movie or of a picture that is printed on paper, in the same way as of digitized content.
• MPEG-7 can exploit the advantages provided by MPEG-4 coded content
• Material encoded using MPEG-4 provides the means to encode audio-visual material as objects having certain relations in time (synchronization) and space (on the screen for video, or in the room for audio)
• This makes it possible to attach descriptions to elements (objects) within the scene, such as audio and visual objects
• Same material can be described using different types of features, tuned to the area of application
• Eg : A visual material:
• Lower abstraction level would be a description of shape, size, texture, color, movement (trajectory) and position (“where in the scene can the object be found?”)
• The highest level would give semantic information: “This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background”
• Apart from describing what is depicted in the content, MPEG-7 carries the following additional information about the multimedia data:
• The form (e.g. JPEG, MPEG-2)
• The overall data size (helps determine whether the material can be “read” by the user terminal)
• Conditions for accessing the material (includes links to a registry with intellectual property rights information, and price)
• Classification (includes parental rating, and content classification into a number of pre-defined categories)
• Links to other relevant material (helps the user speed up the search)
• The context (the occasion of the recording, e.g. Olympic Games 1996, final of 200 meter hurdles, men)
• Main elements of the MPEG-7 standard:
• Description Tools: Descriptors (D) and Description Schemes (DS)
• A Description Definition Language (DDL): defines the syntax of the MPEG-7 Description Tools and allows the creation of new Description Schemes
• System tools: support a binary coded representation for efficient storage and transmission, transmission mechanisms (both for textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
• The key information that the description tools capture includes:
• Structural information on spatial, temporal or spatio-temporal components of the content (scene cuts, segmentation in regions, region motion tracking).
• Low level features in the content (colors, textures, sound timbres, melody description).
• Conceptual information of the reality captured by the content (objects and events, interactions among objects).
• Information about how to browse the content in an efficient way (summaries, variations, spatial and frequency subbands).
• Information about collections of objects.
• Information about the interaction of the user with the content (user preferences, usage history)
[Figures: Scope of MPEG-7; MPEG-7 Main Elements; Abstract representation of possible applications using MPEG-7]
Integration of MPEG-7 into MMDBMS
• MPEG-7 relies on XML Schema, so mapping strategies from XML to the database data model are an issue
• SQL/MM and querying:
• due to the rich descriptions provided by MPEG-7, enhancements to SQL/MM are needed
• operations that manipulate, and produce as results, XML are an option
• indexing methods for multidimensional data can be used to index multimedia data
• MPEG-7 provides methods for semantic indexing
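To illustrate the XML-to-database mapping issue, the sketch below parses a simplified MPEG-7-style fragment (the element names are abbreviated assumptions, not a schema-valid MPEG-7 instance) into relational-style rows using Python's standard ElementTree:

```python
import xml.etree.ElementTree as ET

# A minimal MPEG-7-style fragment, for illustration only.
description = """
<Mpeg7>
  <VideoSegment id="shot1">
    <MediaTime start="0" end="120"/>
    <FreeTextAnnotation>barking brown dog on the left</FreeTextAnnotation>
  </VideoSegment>
</Mpeg7>
"""

def segments_to_rows(xml_text):
    """Flatten segment descriptions into relational-style rows, the
    kind of XML-to-database mapping the integration issue refers to."""
    root = ET.fromstring(xml_text)
    rows = []
    for seg in root.findall("VideoSegment"):
        time = seg.find("MediaTime")
        note = seg.findtext("FreeTextAnnotation", default="")
        rows.append({"id": seg.get("id"),
                     "start": int(time.get("start")),
                     "end": int(time.get("end")),
                     "annotation": note})
    return rows
```

Once flattened like this, segment metadata can be stored in ordinary tables and queried with SQL, at the cost of losing some of the description's hierarchical structure.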
More on MPEG-7 can be found in:
ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, MPEG-7 Overview: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm