Content-based Video Indexing and Retrieval


TRANSCRIPT

Page 1: Content-based  Video Indexing and Retrieval

Content-based Video Indexing and Retrieval

Page 2: Content-based  Video Indexing and Retrieval

Motivation

There has been tremendous growth in the amount of digital video data in recent years.

Lack of tools to classify and retrieve video content.

There exists a gap between low-level features and high-level semantic content.

Enabling machines to understand video is important and challenging.

Page 3: Content-based  Video Indexing and Retrieval

Motivation

Necessity of a Video Database Management System:
• Increase in the amount of video data captured.
• Need for an efficient way to handle multimedia data.

Traditional Databases vs. Video Databases:
• Traditional databases have the tuple as the basic unit of data.
• Video databases have the shot as the basic unit of data.

Page 4: Content-based  Video Indexing and Retrieval

Video Management

Video consists of:

• Text
• Audio
• Images

All change over time

Page 5: Content-based  Video Indexing and Retrieval

Video Data Management

• Metadata-based method
• Text-based method
• Audio-based method
• Content-based method
• Integrated approach

Page 6: Content-based  Video Indexing and Retrieval

Metadata-based Method

Video is indexed and retrieved based on structured metadata information by using a traditional DBMS

Examples of metadata are the title, author, producer, director, date, and type of video.

Page 7: Content-based  Video Indexing and Retrieval

Text-based Method

Video is indexed and retrieved based on associated subtitles (text) using traditional IR techniques for text documents.

Transcripts and subtitles already exist for many types of video, such as news and movies, eliminating the need for manual annotation.

Page 8: Content-based  Video Indexing and Retrieval

Text-based Method

• Basic method is to use human annotation
• Can be done automatically where subtitles / transcriptions exist
• BBC: 100% of output subtitled by 2008
• Speech recognition for archive material

Page 9: Content-based  Video Indexing and Retrieval

Text-based Method

• Keyword search based on subtitles
• Content based

Live demo: http://km.doc.ic.ac.uk/vse/

Page 10: Content-based  Video Indexing and Retrieval

Text-based Method

Page 11: Content-based  Video Indexing and Retrieval

Audio-based Method

Video is indexed and retrieved based on associated soundtracks using the methods for audio indexing and retrieval.

Speech recognition is applied if necessary.

Page 12: Content-based  Video Indexing and Retrieval

Content-based Method

There are two approaches for content-based video retrieval:

• Treat video as a collection of images
• Divide video sequences into groups of similar frames

Page 13: Content-based  Video Indexing and Retrieval

Integrated Approach

Two or more of the above techniques are used as a combination in order to provide more flexibility in video retrieval.

Page 14: Content-based  Video Indexing and Retrieval

Video Data Management

1. Video Parsing: manipulation of the whole video to break it down into key frames.

2. Video Indexing: retrieving information about the frames for indexing in a database.

3. Video Retrieval and Browsing: users access the database through queries or through interactions.

Page 15: Content-based  Video Indexing and Retrieval

Video Parsing

Scene: a single dramatic event taken by a small number of related cameras.

Shot: a sequence taken by a single camera.

Frame: a still image.

Page 16: Content-based  Video Indexing and Retrieval

Video Parsing

Detection and identification of meaningful segments of video.

[Diagram: a video is decomposed hierarchically into scenes, shots, and frames; shot boundary analysis detects the obvious cuts, and key frame analysis selects representative frames.]

Page 17: Content-based  Video Indexing and Retrieval

Video Parsing

Page 18: Content-based  Video Indexing and Retrieval

System overview

[System diagram: test data passes through shot boundary detection and key-frame extraction; feature generation produces feature vectors; a query is compared against the stored vectors using a weighted sum of distances (ΣwD); results are retrieved with k-NN (boosting, VSM) and refined through relevance feedback.]

Page 19: Content-based  Video Indexing and Retrieval

Video Shot Definition

A shot is a contiguous recording of one or more video frames depicting a continuous action in time and space.

During a shot, the camera may remain fixed, or may exhibit such motions as panning, tilting, zooming, tracking, etc.

Page 20: Content-based  Video Indexing and Retrieval

Video Shot Detection

Segmentation is a process for dividing a video sequence into shots.

Consecutive frames on either side of a camera break generally display a significant quantitative change in content.

We need a suitable quantitative measure that captures the difference between two frames.

Page 21: Content-based  Video Indexing and Retrieval

Video Shot Detection

Use of pixel differences: tend to be very sensitive to camera motion and minor illumination changes.

Global histogram comparisons: produce relatively accurate results compared to others.

Local histogram comparisons: produce the most accurate results compared to others.

Use of motion vectors: produce more false positives than histogram-based methods.

Use of the DCT coefficients from MPEG files: produce more false positives than histogram-based methods.

Page 22: Content-based  Video Indexing and Retrieval

Shot Boundary Detection

Frame Dissimilarity: a normalized color histogram difference is adopted as the measure of dissimilarity, or distance, between frames.

Shot Dissimilarity: the minimum dissimilarity between any two frames of the two shots.

Distance of frames: D(f_i, f_j) = Σ_b |h_ib − h_jb| / N, where h_ib is a given bin in the histogram of frame i and N is the total number of pixels in each frame.

Measure of shot dissimilarity: D(S_i, S_j) = min over (k, l) of D(f_ki, f_lj), where f_ki is frame k of shot i.
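The formulas above can be made concrete with a small sketch. The following Python/NumPy code is only an illustration, not part of the original slides: the 16-bin-per-channel histogram and the concatenation of the three channel histograms are assumptions; the slides specify only the normalized histogram difference and the min-over-frame-pairs shot dissimilarity.

```python
import numpy as np

def color_histogram(frame, bins=16):
    # Per-channel histogram of an RGB frame (H x W x 3, uint8), concatenated.
    return np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ])

def frame_distance(frame_i, frame_j, bins=16):
    # D(f_i, f_j) = sum_b |h_ib - h_jb| / N, with N the number of pixels per frame.
    n_pixels = frame_i.shape[0] * frame_i.shape[1]
    h_i = color_histogram(frame_i, bins)
    h_j = color_histogram(frame_j, bins)
    return np.abs(h_i - h_j).sum() / n_pixels

def shot_dissimilarity(shot_i, shot_j):
    # D(S_i, S_j) = min over frame pairs (k, l) of D(f_ki, f_lj).
    return min(frame_distance(f_k, f_l) for f_k in shot_i for f_l in shot_j)
```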

Page 23: Content-based  Video Indexing and Retrieval

Shot boundary detection

• Split video into meaningful segments
• Traditionally look at inter-frame differences
• Common problems: gradual changes, rapid motion
• Our solution: inspired by Pye et al. and Zhang et al.; moving average over a greater range

Page 24: Content-based  Video Indexing and Retrieval

Shot boundary detection

At each frame, compute four distance measures d2, d4, d8, and d16 across ranges of 2, 4, 8, and 16 frames respectively.

Coincident peaks indicate shot boundaries; the d4 difference is used to find transition start/end times.
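A minimal sketch of this multi-range scheme, reusing the frame_distance function above; the peak threshold and the "all ranges agree" rule are illustrative assumptions rather than the exact detector described by Pye et al. or Zhang et al.

```python
def multi_range_distances(frames, ranges=(2, 4, 8, 16)):
    # d_r(t) = distance between frame t - r and frame t, for r in {2, 4, 8, 16}.
    d = {r: [0.0] * len(frames) for r in ranges}
    for t in range(len(frames)):
        for r in ranges:
            if t >= r:
                d[r][t] = frame_distance(frames[t - r], frames[t])
    return d

def shot_boundaries(frames, threshold=0.5):
    # Flag frames where all four distance measures rise above the threshold;
    # such coincident peaks across ranges indicate a shot boundary.
    d = multi_range_distances(frames)
    return [t for t in range(len(frames))
            if all(d[r][t] > threshold for r in d)]
```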

Page 25: Content-based  Video Indexing and Retrieval

SBD examples

[Example frames of a cut transition and a gradual transition]

Page 26: Content-based  Video Indexing and Retrieval

Video Indexing and Retrieval

• Based on representative frames
• Based on motion information
• Based on objects

Page 27: Content-based  Video Indexing and Retrieval

Representative Frames

The most common way of creating a shot index is to use a representative frame to represent each shot.

Features of this frame are extracted and indexed based on color, shape, and texture (as in image retrieval).

Page 28: Content-based  Video Indexing and Retrieval

Representative Frames

If shots are quite static, any frame within the shot can be used as a representative.

Otherwise, more effective methods should be used to select the representative frame.

Page 29: Content-based  Video Indexing and Retrieval

Representative Frames

Two issues:

How many frames must be selected from each shot.

How to select these frames.

Page 30: Content-based  Video Indexing and Retrieval

Representative Frames

How many frames per shot? Three methods:

1. One frame per shot. This method does not consider the length or content changes of the shot.

2. The number of selected representatives depends on the length of the video shot. Content is still not handled properly.

3. Divide shots into subshots and select one representative frame from each subshot. Both length and content are taken into account.

Page 31: Content-based  Video Indexing and Retrieval

Representative Frames

Now we know the number of representative frames per shot.

The next step is to determine HOW to select these frames.

Page 32: Content-based  Video Indexing and Retrieval

Representative Frames

Definition: a SEGMENT is a shot, a second of video, or a subshot.

Page 33: Content-based  Video Indexing and Retrieval

Representative Frames

Method I

The first frame is selected from the segment. This is based on the observation that a segment is usually described by its first few frames.

Page 34: Content-based  Video Indexing and Retrieval

Representative Frames

Method II

An average frame is defined so that each pixel in this frame is the average of pixel values at the same grid point in all the frames of the segment.

Then the frame within the segment that is most similar to the average frame is selected as the representative.
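A minimal sketch of Method II, assuming frames are equally sized NumPy arrays; the sum of absolute pixel differences as the similarity measure is an assumption, since the slides do not say how "most similar" is computed.

```python
import numpy as np

def representative_by_average_frame(segment):
    # Method II: build the pixel-wise average frame, then return the frame
    # in the segment that is closest to that average frame.
    stack = np.stack([f.astype(np.float64) for f in segment])
    average_frame = stack.mean(axis=0)
    diffs = np.abs(stack - average_frame).reshape(len(segment), -1).sum(axis=1)
    return segment[int(np.argmin(diffs))]
```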

Page 35: Content-based  Video Indexing and Retrieval

Representative Frames

Method III

The histograms of all the frames in the segment are averaged. The frame whose histogram is closest to this average histogram is selected as the representative frame of the segment.
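And a corresponding sketch of Method III, reusing the color_histogram helper from the shot-boundary example; comparing histograms with an L1 distance is an assumption.

```python
import numpy as np

def representative_by_average_histogram(segment, bins=16):
    # Method III: average the frame histograms, then return the frame whose
    # histogram is closest to that average histogram.
    hists = np.array([color_histogram(f, bins) for f in segment], dtype=np.float64)
    average_hist = hists.mean(axis=0)
    diffs = np.abs(hists - average_hist).sum(axis=1)
    return segment[int(np.argmin(diffs))]
```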

Page 36: Content-based  Video Indexing and Retrieval

Representative Frames

Method IV

Each frame is divided into background and foreground objects. A large background is then constructed from the background of all frames, and then the main foreground objects of all frames are superimposed onto the constructed background.

Page 37: Content-based  Video Indexing and Retrieval

Foreground and Background Variance Method: Overview

Videos are divided into categories along with their shots. We calculate the Foreground Variance, Background Variance, and Average Color of each shot and store them in the database. Shots are retrieved by comparing the Foreground Variance, Background Variance, and Average Color values.

What are the background and foreground? The background is the area outside the primary object; the foreground is the area where the primary object can be found.

Page 38: Content-based  Video Indexing and Retrieval

Foreground and Background Variance Method

Choosing Foreground and Background: W = C × (1/10).

[Diagram: a frame of dimension C with margins of width W separating the foreground region from the background.]

Page 39: Content-based  Video Indexing and Retrieval

Foreground and Background Variance Method: Actual Method

Steps for calculating Foreground Variance values:
• Take each pixel of the foreground area and access its individual Red, Green, and Blue values.
• Calculate the Average Red, Average Green, and Average Blue color values for the foreground.
• Repeat the above process for all the frames of the shot, giving Average Red, Average Green, and Average Blue color values for each frame.
• Using these foreground values from all the frames of a shot, calculate the Variance of Red, Green, and Blue.

Page 40: Content-based  Video Indexing and Retrieval

Foreground and Background Variance Method: Actual Method

Steps for calculating Foreground Variance values:
• The formula for the Variance of Red for the foreground is VFgRed = Σ (Xi − Mean)² / (N − 1), where Xi are the average Red values of the foreground for each frame and N is the total number of frames.
• The same process is repeated for Green and Blue to obtain VFgGreen and VFgBlue.
• Along the same lines we find the Background Variance values VBgRed, VBgGreen, and VBgBlue.
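A hedged sketch of the variance computation, using the border width W = C × (1/10) from the earlier slide to split each frame into background (the margins) and foreground (the interior); whether C refers to the width, the height, or both, and whether all four sides belong to the background, are assumptions.

```python
import numpy as np

def split_foreground_background(frame):
    # Margins of width dimension/10 are treated as background (assumption);
    # the interior is the foreground. Returns two (n_pixels, 3) arrays.
    h, w, _ = frame.shape
    mh, mw = max(1, h // 10), max(1, w // 10)
    mask = np.zeros((h, w), dtype=bool)
    mask[mh:h - mh, mw:w - mw] = True            # True = foreground
    return frame[mask], frame[~mask]

def foreground_background_variances(frames):
    # Per-frame average R, G, B of each region, then the sample variance
    # (N - 1 denominator, as in the slide formula) of those averages over the shot.
    fg_means, bg_means = [], []
    for frame in frames:
        fg, bg = split_foreground_background(frame.astype(np.float64))
        fg_means.append(fg.mean(axis=0))         # per-frame (R, G, B) of foreground
        bg_means.append(bg.mean(axis=0))
    vfg = np.var(fg_means, axis=0, ddof=1)       # (VFgRed, VFgGreen, VFgBlue)
    vbg = np.var(bg_means, axis=0, ddof=1)       # (VBgRed, VBgGreen, VBgBlue)
    return vfg, vbg
```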

Page 41: Content-based  Video Indexing and Retrieval

Foreground and Background Variance Method: Actual Method

Steps for calculating Average Color values:
• Access each pixel of each frame and read its individual color values.
• Add up all the individual Red, Green, and Blue values of the pixels separately.
• To calculate the Average Red Color for one frame, divide the sum of all the red pixel values by the total number of pixels in the frame.
• To calculate the Average Red Color value for the entire shot, divide the sum of the Red Color values of the individual frames by the total number of frames. This gives AvgRed.

Page 42: Content-based  Video Indexing and Retrieval

Foreground and Background Variance Method: Actual Method

Steps for calculating Average Color values:
• Similarly we calculate the AvgGreen and AvgBlue values for the entire shot.
• We now have a total of nine values per shot, which we store in the database.
• To retrieve similar shots, we compare these nine values in the database to the corresponding values of the query shot.
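A small sketch of the average-color computation described on the last two slides, assuming frames are RGB NumPy arrays; it averages each frame over its pixels and then averages those per-frame means over the shot.

```python
import numpy as np

def shot_average_color(frames):
    # Per-frame mean over all pixels, then the mean of those values over the
    # shot, giving (AvgRed, AvgGreen, AvgBlue).
    per_frame_means = [f.astype(np.float64).mean(axis=(0, 1)) for f in frames]
    return np.mean(per_frame_means, axis=0)
```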

Page 43: Content-based  Video Indexing and Retrieval

Foreground and Background Variance Method: Actual Method

We compare the Foreground, Background, and Average Color values using the formula
Ri = √((Δ1 − ∂1)² + (Δ2 − ∂2)² + (Δ3 − ∂3)²),
where Δ1, Δ2, and Δ3 are database values and ∂1, ∂2, and ∂3 are the corresponding query-shot values.

We add up all the Ri values after comparing the Foreground, Background, and Average Color values. If that sum is less than 100, the shot is retrieved and displayed. The shots are displayed in order of their closeness to the query shot (closest first).
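A minimal sketch of the retrieval step, assuming each database record stores the three (R, G, B) triples computed above: foreground variance, background variance, and average color. The threshold of 100 is taken from the slide; the record layout and variable names are assumptions.

```python
import math

def triple_distance(db_triple, query_triple):
    # Ri = sqrt((Δ1 - ∂1)^2 + (Δ2 - ∂2)^2 + (Δ3 - ∂3)^2)
    return math.sqrt(sum((d - q) ** 2 for d, q in zip(db_triple, query_triple)))

def retrieve_similar_shots(database, query_fg_var, query_bg_var, query_avg_color,
                           threshold=100.0):
    # database: iterable of (shot_id, fg_var, bg_var, avg_color) records,
    # where the last three entries are (R, G, B) triples.
    results = []
    for shot_id, fg_var, bg_var, avg_color in database:
        total = (triple_distance(fg_var, query_fg_var)
                 + triple_distance(bg_var, query_bg_var)
                 + triple_distance(avg_color, query_avg_color))
        if total < threshold:
            results.append((total, shot_id))
    return [shot_id for _, shot_id in sorted(results)]   # closest shots first
```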

Page 44: Content-based  Video Indexing and Retrieval

Motion Information

Motivation

Indexing and retrieval based on representative frames ignores the motion information contained in a video segment.

Page 45: Content-based  Video Indexing and Retrieval

Motion Information

The following parameters are used:

• Motion content
• Motion uniformity
• Motion panning
• Motion tilting

Page 46: Content-based  Video Indexing and Retrieval

Motion Information (content)

This is a measure of the total amount of motion within a given video. It measures the action content of the video.

For example, a talking person video has a very small motion content, while a violent car explosion typically has high motion content.
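The slides do not say how motion content is computed, so the following is only one plausible sketch: the mean magnitude of dense optical flow (Farnebäck, via OpenCV) averaged over a shot. The panning and tilting measures described below could analogously use the mean horizontal and vertical flow components.

```python
import cv2
import numpy as np

def motion_content(frames):
    # Average optical-flow magnitude over a shot, used here as a proxy for the
    # amount of action; frames are BGR images as read by OpenCV.
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    magnitudes = []
    for prev, curr in zip(grays, grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        magnitudes.append(mag.mean())
    return float(np.mean(magnitudes)) if magnitudes else 0.0
```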

Page 47: Content-based  Video Indexing and Retrieval

Motion Information (uniformity)

This is a measure of the smoothness of the motion within a video as a function of time.

Page 48: Content-based  Video Indexing and Retrieval

Motion Information (panning)

This measure captures the panning motion (left-to-right or right-to-left motion of the camera).

Page 49: Content-based  Video Indexing and Retrieval

Motion Information (tilting)

This is a measure of the vertical component of the motion within a video. Panning shots have a lower value than videos with a large amount of vertical motion.

Page 50: Content-based  Video Indexing and Retrieval

Motion Information

The above measures are associated either with the entire video or with each shot of the video.

Page 51: Content-based  Video Indexing and Retrieval

Object-based Retrieval

Motivation

The major drawback of shot-based video indexing is that while the shot is the smallest unit in the video sequence, it does not lend itself directly to content-based representation.

Page 52: Content-based  Video Indexing and Retrieval

Object-based Retrieval

Any given scene is a complex collection of parts or objects.

The location and physical qualities of each object, as well as the interaction with others, define the content of the scene.

Object-based techniques try to identify objects and relationships among these objects.

Page 53: Content-based  Video Indexing and Retrieval

Object-based Retrieval

In a still image, object segmentation and identification are normally difficult tasks.

In a video sequence, an object moves as a whole. Therefore, we can group pixels that move together into an object.

Using this idea, object segmentation can be quite accurate.
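A rough sketch of the "group pixels that move together" idea, using dense optical flow plus k-means clustering of the per-pixel flow vectors; the clustering method and the number of clusters are illustrative assumptions, not the segmentation algorithm implied by the slides.

```python
import cv2
import numpy as np

def segment_by_motion(prev_frame, curr_frame, n_clusters=2):
    # Cluster pixels by their optical-flow vectors: pixels that move together
    # fall into the same cluster, giving a crude object/background split.
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    samples = flow.reshape(-1, 2).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, _ = cv2.kmeans(samples, n_clusters, None, criteria,
                              3, cv2.KMEANS_RANDOM_CENTERS)
    return labels.reshape(h, w)   # per-pixel cluster label
```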

Page 54: Content-based  Video Indexing and Retrieval

Object-based Retrieval

Object-based video indexing and retrieval can be performed easily when video is compressed using the MPEG-4 object-based coding standard.

An MPEG-4 video is composed of one or more video objects (VOs). A VO consists of one or more video object layers (VOLs).

Page 55: Content-based  Video Indexing and Retrieval

An Architecture for Video Database System

[Architecture diagram. Components: RawVideo, RawImage, and PhysicalObject databases; indexed sequences of frames; object identification and tracking; intra/inter-frame (motion) analysis; inter-object movement analysis; object descriptions, image features, and spatial semantics of objects (human, building, ...); semantic association (President, Capitol, ...); spatial and temporal abstraction; object definitions (events/concepts); and spatio-temporal semantics, the formal specification of events/activities/episodes for content-based retrieval.]

Page 56: Content-based  Video Indexing and Retrieval

Conclusion

Video indexing and retrieval is very important in multimedia database management.

Video contains more information than other media types (text, audio, images).

Methods: representative frames, motion information, object-based retrieval.