
Page 1: 1 Video Syntax Analysis 2

Wei-Ta Chu

2010/10/7

Video Syntax Analysis 2

Multimedia Content Analysis, CSIE, CCU

Page 2: 1 Video Syntax Analysis 2

Scene Detection in Movies and TV Shows


Rasheed, et al., "Scene detection in Hollywood movies and TV shows," Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 343-348, 2003.

Page 3: 1 Video Syntax Analysis 2

Introduction


Every year around 4500 motion pictures are released around the world, spanning approximately 9000 hours of video.

Inexpensive and popular digital technology is available through cable and the internet, such as video on demand.

Accessing video content: detect shots and a set of key frames; combine similar shots to form scenes or story units.

Page 4: 1 Video Syntax Analysis 2

Problems in Previous Works


A false color match between shots of two different scenes may wrongly combine scenes.

Action scenes may be broken into many scenes for not satisfying the color matching criterion.

Page 5: 1 Video Syntax Analysis 2

System Flowchart


BSC: backward shot coherence
PSB: potential scene boundaries

Page 6: 1 Video Syntax Analysis 2

Shot Detection


Based on histogram intersection:
16-bin normalized HSV color histogram
8 bins for hue, 4 bins each for saturation and value
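A minimal sketch of this shot-detection step, assuming OpenCV for color conversion, a concatenated 8+4+4-bin normalized HSV histogram, and an illustrative threshold (function names and the threshold value are hypothetical, not from the paper):

```python
import cv2
import numpy as np

def hsv_histogram(frame):
    """16-bin normalized HSV histogram: 8 hue + 4 saturation + 4 value bins, concatenated."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h = cv2.calcHist([hsv], [0], None, [8], [0, 180]).ravel()
    s = cv2.calcHist([hsv], [1], None, [4], [0, 256]).ravel()
    v = cv2.calcHist([hsv], [2], None, [4], [0, 256]).ravel()
    hist = np.concatenate([h, s, v])
    return hist / hist.sum()

def detect_shots(frames, threshold=0.5):
    """Declare a shot boundary when histogram intersection with the previous frame drops below threshold."""
    boundaries = []
    prev = hsv_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = hsv_histogram(frames[i])
        intersection = np.minimum(prev, cur).sum()  # 1.0 = identical histograms, 0.0 = disjoint
        if intersection < threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```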

Page 7: 1 Video Syntax Analysis 2

Key frame Selection


(1) Initially, the middle frame of the shot is selected and added to the (initially empty) set K_i.
(2) Each frame within the shot is compared to every frame in K_i.
(3) If the frame differs from all previously chosen key frames by more than a fixed threshold, it is added to K_i.
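A sketch of this key-frame selection loop, assuming a caller-supplied frame-distance function and threshold (names are hypothetical):

```python
def select_key_frames(shot_frames, dist, threshold=0.5):
    """Greedy key-frame selection: seed with the middle frame, then add any frame
    that differs from all previously chosen key frames by more than the threshold."""
    key_frames = [shot_frames[len(shot_frames) // 2]]        # (1) middle frame seeds K_i
    for frame in shot_frames:                                 # (2) compare every frame to K_i
        if all(dist(frame, kf) > threshold for kf in key_frames):
            key_frames.append(frame)                          # (3) sufficiently different -> add to K_i
    return key_frames
```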

Page 8: 1 Video Syntax Analysis 2

Shot-based Features


Shot length
Shot motion content:
Estimate the parameters of a global affine motion model.
Calculate the difference between the actual and the re-projected motion of blocks.
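As a hedged sketch of what such a measure can look like (the paper's exact formulation may differ), a six-parameter affine model re-projects each block, and the shot motion content (SMC) can be taken as the mean residual between the observed and the re-projected block motion:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} a_5 \\ a_6 \end{pmatrix},
\qquad
\mathrm{SMC} \approx \frac{1}{|B|} \sum_{b \in B}
\bigl\| \mathbf{v}_b^{\text{observed}} - \mathbf{v}_b^{\text{re-projected}} \bigr\|
```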

Page 9: 1 Video Syntax Analysis 2

Shot Motion Content


Page 10: 1 Video Syntax Analysis 2

Scene Boundary Detection Algorithm


Pass 1: Detecting potential scene boundaries based on color properties.

Pass 2: Removal of weak scene boundaries by analyzing the shot length and motion content.

Page 11: 1 Video Syntax Analysis 2

Potential Scene Boundaries


Backward shot coherence

Page 12: 1 Video Syntax Analysis 2

Potential Scene Boundaries

Backward shot coherence: compute the shot coherence of shot i against a window of previous shots, taking the maximum shot coherence in a window of length N.

The shots with local minimum BSCs are scene boundary candidates.

To filter out false alarms: if a pair of key frames of two adjacent potential scenes are similar, merge them into one scene.
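A rough sketch of the BSC computation under these definitions (the shot_similarity argument stands in for the paper's key-frame color-based coherence measure; names are hypothetical):

```python
def backward_shot_coherence(shots, shot_similarity, N=10):
    """BSC of shot i: the best match (maximum coherence) against the previous N shots."""
    bsc = []
    for i, shot in enumerate(shots):
        window = shots[max(0, i - N):i]
        bsc.append(max((shot_similarity(shot, prev) for prev in window), default=0.0))
    return bsc  # shots at local minima of this curve are potential scene boundaries
```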

Page 13: 1 Video Syntax Analysis 2

Potential Scene Boundaries


BSC for 300 shots, with the first key frame of each shot.

Page 14: 1 Video Syntax Analysis 2

Selection of Window Size


The computation of BSC is controlled by the selection of the window size N.

N is a memory parameter which mimics a human's ability to recall a shot seen in the past.

If N is too large, it may span several scenes. If N is too small, over-segmentation of the video may result.

N = 10 in this paper.

Page 15: 1 Video Syntax Analysis 2

Scene Dynamics Analysis


Scenes with weak structure are often broken into several scenes, e.g. action scenes, due to the non-repetitiveness of their shots.

Scene dynamics:
Action scenes have larger SMC and smaller L (shot length).
The PSB between two consecutive scenes is removed if the SD of both scenes exceeds a fixed threshold.

Page 16: 1 Video Syntax Analysis 2

Scene Dynamics Analysis


Page 17: 1 Video Syntax Analysis 2

Scene Representation


A shot is a good representative when:
The shot is shown several times (higher SC).
The shot spans a longer period of time (larger shot length).
The shot has minimal motion content (smaller SMC).
Multiple faces are preferred.

Page 18: 1 Video Syntax Analysis 2

Shot Goodness


A correlation matrix of dimension N x N is constructed, where element (i, j) is the coherence of shot i with shot j.

Three shots with the highest W are selected as candidate shots, and face detection is performed.

Page 19: 1 Video Syntax Analysis 2

Detection of Faces


A method based on skin detection is adopted. The middle frame of each candidate shot is tested. Each isolated segment of skin is considered a face, and the frame with the highest number of votes is taken as the scene key frame.

In the case of a tie or no face, the key frame of the shot with the highest goodness value is selected.

Page 20: 1 Video Syntax Analysis 2

Scene Key Frame


Page 21: 1 Video Syntax Analysis 2

More Examples


Page 22: 1 Video Syntax Analysis 2

Experimental Results


Five movies, one sitcom, and one TV show

False alarm (false positive)

Miss (false negative)

Page 23: 1 Video Syntax Analysis 2

Experimental Results


Slight over-segmentation is preferable to under-segmentation.

While browsing a video, it is better to have two segments of one scene rather than one segment consisting of two scenes.

Page 24: 1 Video Syntax Analysis 2

Experimental Results


Page 25: 1 Video Syntax Analysis 2

References


Rasheed, et al., "Scene detection in Hollywood movies and TV shows," Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 343-348, 2003.

Yeung, et al., "Segmentation of video by clustering and graph analysis," Computer Vision and Image Understanding, vol. 71, no. 1, pp. 94-109, 1998.

Vendrig, et al., "Systematic evaluation of logical story unit segmentation," IEEE Transactions on Multimedia, vol. 4, no. 4, pp. 492-499, 2002.

Page 26: 1 Video Syntax Analysis 2

Brief Introduction of Montage


Page 27: 1 Video Syntax Analysis 2

Montage


Montage “refers to the editing of the film, the cutting and piecing together of exposed film in a manner that best conveys the intent of the work.”

Page 28: 1 Video Syntax Analysis 2

Methods of Montage


Metric: The editing follows a specific number of frames, cutting to the next shot no matter what is happening within the image. This montage is used to elicit the most basal and emotional of reactions in the audience.

Example

http://en.wikipedia.org/wiki/Soviet_montage_theory

Page 29: 1 Video Syntax Analysis 2

Methods of Montage


Rhythmic: Cutting based on time -- along with a change in the speed of the metric cuts -- to induce more complex meanings than what is possible with metric montage.

Once sound was introduced, rhythmic montage also included aural elements (music, dialogue, sounds).

Example

Page 30: 1 Video Syntax Analysis 2

Methods of Montage


Tonal: A tonal montage uses the emotional meaning of the shots -- not just manipulating the temporal length of the cuts or their rhythmical characteristics -- to elicit a reaction from the audience even more complex than from the metric or rhythmic montage.

For example, a sleeping baby would emote calmness and relaxation.

Example: This is the clip following the death of the revolutionary sailor Vakulinchuk, a martyr for sailors and workers.

Page 31: 1 Video Syntax Analysis 2

Methods of Montage


Overtonal/Associational: The overtonal montage is the accumulation of metric, rhythmic, and tonal montage to synthesize its effect on the audience for an even more abstract and complicated effect.

Example: In this clip, the men are workers walking towards a confrontation at their factory, and later in the movie, the protagonist uses ice as a means of escape.

Page 32: 1 Video Syntax Analysis 2

Methods of Montage


Intellectual: Uses shots which, combined, elicit an intellectual meaning.

Example: from Eisenstein's October and Strike. In Strike, a shot of striking workers being attacked, cut with a shot of a bull being slaughtered, creates a film metaphor suggesting that the workers are being treated like cattle. This meaning does not exist in the individual shots; it only arises when they are juxtaposed.

http://www.tcf.ua.edu/classes/Jbutler/T112/EditingIllustrations06.htm

Page 33: 1 Video Syntax Analysis 2

Wei-Ta Chu

2010/10/7

Overview of CBIR


Y. Rui, T.S. Huang, and S.-F. Chang, "Image retrieval: current techniques, promising directions, and open issues," Journal of Visual Communication and Image Representation, vol. 10, pp. 39-62, 1999.

Page 34: 1 Video Syntax Analysis 2

Image Retrieval


Image retrieval has been an active research area since the 1970s, with the thrust coming from the research communities of database management and computer vision.

Text-based approaches:
Annotate images by text.
Use text-based database management systems to perform image retrieval.

Page 35: 1 Video Syntax Analysis 2

Needs of Content-based Image Retrieval


In the early 1990s, two difficulties arose:
The vast amount of labor required to annotate large-scale image collections.
The rich content of images and the subjectivity of human perception.

Instead of annotating by text-based keywords, images are indexed by their own visual content, such as color and texture.

Page 36: 1 Video Syntax Analysis 2

An Image Retrieval System Architecture


An image retrieval system architecture draws on several fields: image processing and compression; computer vision and image understanding; information retrieval and database management systems; computational geometry, database management, and pattern recognition; user psychology and user interface.

Page 37: 1 Video Syntax Analysis 2

Feature Extraction


General features: color, texture, shape, …

Domain-specific features: human faces, fingerprints

Page 38: 1 Video Syntax Analysis 2

Color (1/2)


Robust to background complication and independent of image size and orientation.

Color histogram:
Histogram intersection (L1 metric)
L2-related metric
Cumulated color histogram

Euclidean distance is the L2 norm.
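For two histograms h_q and h_t normalized to sum to one over B bins, histogram intersection gives a similarity, and the corresponding distance is equivalent to half the L1 distance:

```latex
S(h_q, h_t) = \sum_{b=1}^{B} \min\bigl(h_q[b],\, h_t[b]\bigr),
\qquad
d(h_q, h_t) = 1 - S(h_q, h_t) = \tfrac{1}{2} \sum_{b=1}^{B} \bigl| h_q[b] - h_t[b] \bigr|
```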

Page 39: 1 Video Syntax Analysis 2

Color (2/2)


Color moments:
Most of the information is concentrated in the low-order moments.
The first moment (mean), the second (variance), and the third (skewness).
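For one color channel with pixel values p_1, …, p_N, a common formulation of these three moments (the second given here as the standard deviation, i.e. the square root of the variance, and the third as a cube-rooted central moment) is:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad
\sigma = \Bigl(\frac{1}{N}\sum_{i=1}^{N} (p_i - \mu)^2\Bigr)^{1/2}, \qquad
s = \Bigl(\frac{1}{N}\sum_{i=1}^{N} (p_i - \mu)^3\Bigr)^{1/3}
```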

Color set:
A selection of colors from the quantized color space.
Color set feature vectors are binary. Thus a binary search tree was constructed to allow a fast search.

Page 40: 1 Video Syntax Analysis 2

Texture (1/2)


Visual patterns that have properties of homogeneity that do not result from a single color or intensity.

Contains important information about the structural arrangement of surfaces and their relationship to the surrounding environment.

Rushing, et al., "Using association rules as texture," IEEE Trans. on PAMI, vol. 23, no. 8, pp. 845-858, 2001.

Page 41: 1 Video Syntax Analysis 2

Texture (2/2)


Co-occurrence matrix of texture features:
Explores the gray-level spatial dependence of texture.
Based on the orientation and distance between image pixels.

Tamura texture features

Use of the wavelet transform in texture representation

Page 42: 1 Video Syntax Analysis 2

Shape


Boundary-based features:
Use only the outer boundary of the shape.
Fourier descriptor: use the Fourier-transformed boundary as the shape feature.

Region-based features:
Use the entire shape region.
Moment invariants: use region-based moments which are invariant to transformations.

Page 43: 1 Video Syntax Analysis 2

Color Layout


A global color feature tends to give too many false positives when the image collection is large.

Use both color features and spatial relations: divide the whole image into blocks and extract color features from each block.

Kasutani, et al., "The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval," Proc. of ICIP, pp. 674-677, 2001.

Page 44: 1 Video Syntax Analysis 2

Segmentation


Very important to image retrieval: both the shape feature and the layout feature depend on good segmentation.

Still an unsolved problem.

Chien, et al., "Predictive watershed: a fast watershed algorithm for video segmentation," IEEE Trans. on CSVT, vol. 13, no. 5, pp. 453-461, 2003.

Page 45: 1 Video Syntax Analysis 2

Summary


Many visual features have been explored. Which features and representations should be used is application dependent.

MPEG-7 standard: multimedia content description interface.

Page 46: 1 Video Syntax Analysis 2

High Dimensional Indexing


Make CBIR truly scalable to large-size image collections.

Two main challenges:
High dimensionality
Non-Euclidean similarity measures

Approach:
Dimension reduction
Use appropriate multidimensional indexing techniques

Page 47: 1 Video Syntax Analysis 2

Curse of Dimensionality


An example of classifying data in two dimensions

Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Page 48: 1 Video Syntax Analysis 2

Curse of Dimensionality


If we divide a space into regular cells, the number of such cells grows exponentially with the dimensionality of the space; for example, k cells per dimension in a D-dimensional space gives k^D cells.

We need an exponentially large quantity of training data in order to ensure that the cells are not empty.

Page 49: 1 Video Syntax Analysis 2

Dimension Reduction


Although the curse of dimensionality exists, it does not prevent us from finding effective techniques:
Real data will often be confined to a region of the space having lower dimensionality.
Real data will typically exhibit some smoothness properties, so that small changes in the input variables will produce small changes in the target variables.

Karhunen-Loeve transform (KLT)

Clustering

Page 50: 1 Video Syntax Analysis 2

Multidimensional Indexing Techniques


Select appropriate multidimensional indexing algorithms to index the reduced, but still high-dimensional, feature vectors.

Techniques: k-d tree, R-tree, …

Page 51: 1 Video Syntax Analysis 2

Retrieval Systems


Most image retrieval systems support:
Random browsing
Search by example
Search by sketch
Search by text (including keyword or speech)
Navigation with customized image categories

Page 52: 1 Video Syntax Analysis 2

QBIC, Query by Image Content


Supports queries based on example images, user-constructed sketches and drawings, and selected color and texture patterns.

Features: color, texture, shape

Indexing: R*-tree

http://wwwqbic.almaden.ibm.com/

Page 53: 1 Video Syntax Analysis 2

Virage


Supports visual queries based on color, composition, texture, and structure, and arbitrary combinations of these atomic queries.

http://www.virage.com/

Page 54: 1 Video Syntax Analysis 2

PhotoBook


A set of interactive tools for browsing and searching images.

Features: shape, texture, face

Includes the human in the image annotation and retrieval loop: relevance feedback.

Page 55: 1 Video Syntax Analysis 2

VisualSEEk and WebSEEk


Spatial relationship queries over image regions and visual feature extraction from the compressed domain.

Features: color set, wavelet-transform-based texture

http://persia.ee.columbia.edu:8008/

Page 56: 1 Video Syntax Analysis 2

MARS


The research features are the integration of DBMS and IR, the integration of indexing and retrieval, and the integration of computer and human.

Investigates how to organize various visual features into a meaningful retrieval architecture which can dynamically adapt to different applications and different users.

Page 57: 1 Video Syntax Analysis 2

Others


ALIPR (Automatic Linguistic Indexing of Pictures - Real Time)

RetrievalWare, Netra, ART MUSEUM, Blob-world, …

Page 58: 1 Video Syntax Analysis 2

Wei-Ta Chu

2009/10/15

VisualSEEk


J.R. Smith and S.-F. Chang, “VisualSEEk: a fully automated content-based image query system” Proc. of ACM Multimedia, pp. 87-98, 1996.

Page 59: 1 Video Syntax Analysis 2

Introduction


Enable querying by image regions and spatial layout.

Unconstrained images are decomposed into near-symbolic images which lend themselves to efficient spatial query.

Address spatial queries involving adjacency, overlap, and encapsulation of regions.

Page 60: 1 Video Syntax Analysis 2

Introduction


Need to devise an image similarity function which contains both color features and spatial components.

Intrinsic parameters:
Similarity between query and target colors and/or region sizes and (absolute) spatial locations.

Derived parameters:
The inferences that can be made from the intrinsic parameters, such as relative spatial locations and the overall assessment of image matches with multiple regions.

Page 61: 1 Video Syntax Analysis 2

Image Query Process


Page 62: 1 Video Syntax Analysis 2

Characteristics of VisualSEEk


Automated extraction of localized regions and features

Querying by both feature and spatial information

Feature extraction from compressed data

Development of techniques for fast indexing and retrieval

Development of highly functional user tools

Page 63: 1 Video Syntax Analysis 2

System Overview


Page 64: 1 Video Syntax Analysis 2

Color Sets


T_c: color space transformation

Q_c^M: quantizer that partitions the color space into M subspaces

B_c^M: M-dimensional binary space such that each axis corresponds to one unique index value m

A color set is a binary vector in B_c^M which corresponds to a selection of colors {m}.

Page 65: 1 Video Syntax Analysis 2

Example


T_c: RGB to HSV

M = 8 for Q_c^M: quantize the HSV color space to 2 hues, 2 saturations, and 2 values.

B_c^M is an eight-dimensional binary space.

A color set c contains a selection from the eight colors, e.g. c = [10010100] corresponds to the selection of three colors, m = 0, m = 3, and m = 5, from the quantized HSV color space.
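A small illustrative sketch of this example (hypothetical helper names; a 2x2x2 quantization giving indices m = 0..7, and a bit set to 1 when a quantized color covers at least a minimum fraction of the region):

```python
import numpy as np

def quantize_hsv(h, s, v):
    """Map an HSV pixel (h in [0, 360), s and v in [0, 1]) to one of M = 8 indices:
    2 hue bins x 2 saturation bins x 2 value bins."""
    return (int(h >= 180) << 2) | (int(s >= 0.5) << 1) | int(v >= 0.5)

def color_set(region_hsv_pixels, min_fraction=0.1):
    """Binary color set: bit m is 1 if quantized color m covers at least min_fraction of the region."""
    counts = np.zeros(8)
    for (h, s, v) in region_hsv_pixels:
        counts[quantize_hsv(h, s, v)] += 1
    return (counts / counts.sum() >= min_fraction).astype(int)
```

With a suitable threshold, a region dominated by quantized colors m = 0, 3, and 5 yields the color set [1, 0, 0, 1, 0, 1, 0, 0], matching the example above.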

Page 66: 1 Video Syntax Analysis 2

Color Sets


Color sets provide a compact alternative to color histograms for representing color information.

Their utilization stems from the conjecture that salient regions have not more than a few, equally prominent colors.

Page 67: 1 Video Syntax Analysis 2

Color Sets


Page 68: 1 Video Syntax Analysis 2

Color Set Back-Projection


Goal: to extract color regions.

Back-projection process:
Color set selection
Back-projection onto the image
Thresholding and labeling

Back-projection: given image I and color set c, let k be the index of the color at image point I(x, y); then generate image B(x, y) by B(x, y) = c[k].
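A compact sketch of the back-projection step (the quantize argument is any function mapping an HSV pixel to its quantized color index k; thresholding and labeling of B, e.g. connected-component analysis, would follow):

```python
import numpy as np

def back_project(image_hsv, color_set, quantize):
    """B(x, y) = c[k], where k is the quantized color index of pixel I(x, y)."""
    h, w = image_hsv.shape[:2]
    B = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            k = quantize(*image_hsv[y, x])   # color index of this pixel
            B[y, x] = color_set[k]           # 1 where the pixel's color is in the set
    return B
```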

Page 69: 1 Video Syntax Analysis 2

Color Set Back-Projection

Page 70: 1 Video Syntax Analysis 2

Color Similarity


In the HSV color space, the similarity between any two colors m_i = (h_i, s_i, v_i) and m_j = (h_j, s_j, v_j) is defined as follows.
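One commonly cited form of this HSV color similarity, treating hue and saturation as polar coordinates (a sketch; consult the paper for the exact expression and scaling constant):

```latex
a_{i,j} = 1 - \frac{1}{\sqrt{5}}
\Bigl[ (v_i - v_j)^2
     + (s_i \cos h_i - s_j \cos h_j)^2
     + (s_i \sin h_i - s_j \sin h_j)^2 \Bigr]^{1/2}
```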

Color histogram:
Histogram distance: Minkowski metric.
Under a Minkowski metric, a dark red image is equally dissimilar to a red image as to a blue image.

Page 71: 1 Video Syntax Analysis 2

Color Similarity


Histogram Quadratic Distance:
It measures the weighted similarity between histograms. The quadratic distance between histograms h_q and h_t is given below, where A = [a_{i,j}] and a_{i,j} denotes the similarity between the colors with indices i and j.
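In quadratic form (following Hafner et al.):

```latex
d_{\mathrm{quad}}(h_q, h_t) = (h_q - h_t)^{T} A \,(h_q - h_t)
```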

Since the histogram quadratic distance computes the cross similarity between colors, it is computationally expensive.

Hafner, et al., "Efficient Color Histogram Indexing for Quadratic Form Distance Functions," IEEE Trans. on PAMI, vol. 17, no. 7, pp. 729-736, Jul. 1995.

Page 72: 1 Video Syntax Analysis 2

Color Similarity


Histogram Quadratic Distance

Consider a pure red image x = [1.0, 0.0, 0.0]^T and a pure orange image y = [0.0, 1.0, 0.0]^T.

The quadratic distance between x and y is 0.2. The Euclidean distance between x and y is sqrt(2) ≈ 1.41.
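A worked check of these numbers, assuming a similarity matrix with unit self-similarity and a red-orange cross-similarity of a_ro = 0.9 (an assumed value, chosen to be consistent with the 0.2 result):

```latex
x - y = [1, -1, 0]^{T}, \qquad
(x - y)^{T} A\, (x - y) = a_{rr} + a_{oo} - 2\,a_{ro} = 1 + 1 - 2(0.9) = 0.2, \qquad
\lVert x - y \rVert_2 = \sqrt{1^2 + (-1)^2} = \sqrt{2} \approx 1.41
```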

Hafner, et al., "Efficient Color Histogram Indexing for Quadratic Form Distance Functions," IEEE Trans. on PAMI, vol. 17, no. 7, pp. 729-736, Jul. 1995.

Page 73: 1 Video Syntax Analysis 2

Color Similarity


Color sets give only a selection of colors.

Color set distance: the quadratic distance between two color sets c_q and c_t is given below.
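A sketch of that distance, by analogy with the histogram quadratic form above (with the binary color set vectors in place of the histograms):

```latex
d(c_q, c_t) = (c_q - c_t)^{T} A \,(c_q - c_t)
```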

Page 74: 1 Video Syntax Analysis 2

Color Set Query Strategy


The color set query compares only the color content of regions or images.

Given query Q = {c_q}, the best match to Q is the target T_j = {c_j} that minimizes the color set distance.

Color region matching is accomplished by performing several range queries on the query color set's colors, taking the intersection of these lists, and minimizing the sum of attributes in the intersection list.

Page 75: 1 Video Syntax Analysis 2

Single Region Query


Fixed query location: the spatial distance between regions is given by the Euclidean distance of their centroids.

Page 76: 1 Video Syntax Analysis 2

Single Region Query

Bounded query location: users specify bounds within which a target region is assigned a spatial distance of zero. When a target region is outside of the bounds, the distance is calculated as the Euclidean distance.

Useful in many situations when users do not care about the exact position.

Page 77: 1 Video Syntax Analysis 2

Centroid Location Spatial Access–Spatial Quad-Trees


The centroids of the image regions are indexed using a spatial quad-tree on their x and y values.

The quad-tree provides quick access to 2-D data points. A query for a region at location (x_t, y_t) is processed by first traversing the spatial quad-tree to the containing node, then exhaustively searching the block for the points that minimize the distance to (x_t, y_t).

Page 78: 1 Video Syntax Analysis 2

Rectangle Location Spatial Access–R-Trees


Region spatial locations are also indexed by their minimum bounding rectangles (MBRs).

MBRs of the regions are indexed using an R-tree.

The R-tree provides a dynamic structure for indexing rectangles.

The R-tree, which consists of a hierarchy of overlapping spatial nodes, is designed to visit only a small number of nodes in a spatial search.

Page 79: 1 Video Syntax Analysis 2

Size


Area distance

Spatial extent: calculated based on the widths and heights of the minimum bounding rectangles.

Page 80: 1 Video Syntax Analysis 2

Single Region Query Strategy


Integrates the distances of color set, region location, area, and spatial extent as a weighted sum.
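As an illustrative sketch only (the individual weights and exact term definitions are whatever the paper assigns; the names here are placeholders):

```latex
d_{\mathrm{total}} = w_{c}\, d_{\mathrm{color}} + w_{l}\, d_{\mathrm{location}}
                   + w_{a}\, d_{\mathrm{area}} + w_{e}\, d_{\mathrm{extent}}
```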

Page 81: 1 Video Syntax Analysis 2

Single Region Query Strategy


Query: find the region that best matches Q = {c_q, (x_q, y_q), area_q, (w_q, h_q)}.

First compute the individual queries for color, location, size, and spatial extent.

The intersection of the region match lists is then computed to obtain the set of common images.

Page 82: 1 Video Syntax Analysis 2

Multiple Regions Query

Intersect the results of single-region matches.

Compute image match scores by adding the weighted scores of the best region matches.

Check relative spatial locations.

Page 83: 1 Video Syntax Analysis 2

Absolute Locations


Query: find the regions that best match Q = {Q_A, Q_B, Q_C}, where Q_i = {c_q^i, (x_q^i, y_q^i), area_q^i, (w_q^i, h_q^i)}.

The query is processed by intersecting the query region lists to obtain the list of candidate images. The best match minimizes the weighted sum of the region distances between the query and target image.

Page 84: 1 Video Syntax Analysis 2

Region Relative Location


Convert relative locations into 2-D strings, e.g. (t0 t1 < t2 < t7 < t3 < t6 < t4 < t5) and (t0 < t5 t7 < t6 < t2 < t3 t1 < t4).

Scale invariance and rotation invariance: (t0 < t7 t2 t1 < t6 t5 t3 < t4) and (t5 < t6 t7 < t4 t0 t3 t2 < t1).

Adjacency, nearness, overlap, and surround can be detected by checking the 2-D strings.

Page 85: 1 Video Syntax Analysis 2

Relative Locations


For each candidate image, the 2-D string is generated from the identified regions and is compared to the 2-D string of the query image.

This final operation either validates the target image or rejects it.

Page 86: 1 Video Syntax Analysis 2

Evaluation


Users sketch regions, position them on the query grid, and assign them properties of color, size, and absolute location.

The user may also assign boundaries for location and size.

Page 87: 1 Video Syntax Analysis 2

Evaluation

The global color histogram query process gives users little control in specifying the query and more readily returns images that are not desired.

Page 88: 1 Video Syntax Analysis 2

Evaluation

Page 89: 1 Video Syntax Analysis 2

Synthetic Evaluation Data

Page 90: 1 Video Syntax Analysis 2

Evaluation


Q1: the region indexing and distance computation strategy in this paper

Q2: the same query strategy on a region database that was generated automatically from the target images using color set back-projection

Q3: based on color histograms

Page 91: 1 Video Syntax Analysis 2

Evaluation of Color Sets


Retrieval effectiveness degrades only slightly using color sets.

This indicates that the perceptually significant color information is retained in the color sets.

Page 92: 1 Video Syntax Analysis 2

Examples of VisualSEEk Queries
