
Page 1: 1 Video Syntax Analysis 2

Wei-Ta Chu

2010/10/7

Video Syntax Analysis 2

Multimedia Content Analysis, CSIE, CCU

Page 2: 1 Video Syntax Analysis 2

Scene Detection in Movies and TV Shows


Rasheed, et al., "Scene detection in Hollywood movies and TV shows," Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 343-348, 2003.

Page 3: 1 Video Syntax Analysis 2

Introduction


Every year around 4500 motion pictures are released around the world, spanning approximately 9000 hours of video.

Inexpensive and popular digital technology is available through cable and the internet, such as video on demand.

Accessing video content: detect shots and a set of key frames; combine similar shots to form scenes or story units.

Page 4: 1 Video Syntax Analysis 2

Problems in Previous Works


A false color match between shots of two different scenes may wrongly combine scenes.

Action scenes may be broken into many scenes for not satisfying the color matching criterion.

Page 5: 1 Video Syntax Analysis 2

System Flowchart


BSC: backward shot coherence
PSB: potential scene boundaries

Page 6: 1 Video Syntax Analysis 2

Shot Detection


Based on histogram intersection:
16-bin normalized HSV color histogram
8 bins for hue, 4 bins each for saturation and value
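A minimal sketch of this shot-detection step, assuming OpenCV for color conversion, a concatenated 8+4+4-bin normalized HSV histogram, and an illustrative threshold (function names and the threshold value are hypothetical, not from the paper):

```python
import cv2
import numpy as np

def hsv_histogram(frame):
    """16-bin normalized HSV histogram: 8 hue + 4 saturation + 4 value bins, concatenated."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h = cv2.calcHist([hsv], [0], None, [8], [0, 180]).ravel()
    s = cv2.calcHist([hsv], [1], None, [4], [0, 256]).ravel()
    v = cv2.calcHist([hsv], [2], None, [4], [0, 256]).ravel()
    hist = np.concatenate([h, s, v])
    return hist / hist.sum()

def detect_shots(frames, threshold=0.5):
    """Declare a shot boundary when histogram intersection with the previous frame drops below threshold."""
    boundaries = []
    prev = hsv_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = hsv_histogram(frames[i])
        intersection = np.minimum(prev, cur).sum()  # 1.0 = identical histograms, 0.0 = disjoint
        if intersection < threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```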

Page 7: 1 Video Syntax Analysis 2

Key frame Selection


(1) Initially, the middle frame of the shot is selected and added to the (initially empty) set K_i.
(2) Each frame within the shot is compared to every frame in K_i.
(3) If the frame differs from all previously chosen key frames by more than a fixed threshold, it is added to K_i.
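A sketch of this key-frame selection loop, assuming a caller-supplied frame-distance function and threshold (names are hypothetical):

```python
def select_key_frames(shot_frames, dist, threshold=0.5):
    """Greedy key-frame selection: seed with the middle frame, then add any frame
    that differs from all previously chosen key frames by more than the threshold."""
    key_frames = [shot_frames[len(shot_frames) // 2]]        # (1) middle frame seeds K_i
    for frame in shot_frames:                                 # (2) compare every frame to K_i
        if all(dist(frame, kf) > threshold for kf in key_frames):
            key_frames.append(frame)                          # (3) sufficiently different -> add to K_i
    return key_frames
```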

Page 8: 1 Video Syntax Analysis 2

Shot-based Features


Shot length
Shot motion content:
Estimate the parameters of a global affine motion model.
Calculate the difference between the actual and the re-projected motion of blocks.
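As a hedged sketch of what such a measure can look like (the paper's exact formulation may differ), a six-parameter affine model re-projects each block, and the shot motion content (SMC) can be taken as the mean residual between the observed and the re-projected block motion:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} a_5 \\ a_6 \end{pmatrix},
\qquad
\mathrm{SMC} \approx \frac{1}{|B|} \sum_{b \in B}
\bigl\| \mathbf{v}_b^{\text{observed}} - \mathbf{v}_b^{\text{re-projected}} \bigr\|
```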

Page 9: 1 Video Syntax Analysis 2

Shot Motion Content


Page 10: 1 Video Syntax Analysis 2

Scene Boundary Detection Algorithm


Pass 1: Detecting potential scene boundaries based on color properties.

Pass 2: Removal of weak scene boundaries by analyzing the shot length and motion content.

Page 11: 1 Video Syntax Analysis 2

Potential Scene Boundaries


Backward shot coherence

Page 12: 1 Video Syntax Analysis 2

Potential Scene Boundaries

Backward shot coherence: compute the shot coherence of shot i against a window of previous shots, taking the maximum shot coherence in a window of length N.

The shots with local minimum BSCs are scene boundary candidates.

To filter out false alarms: if a pair of key frames of two adjacent potential scenes are similar, merge them into one scene.
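A rough sketch of the BSC computation under these definitions (the shot_similarity argument stands in for the paper's key-frame color-based coherence measure; names are hypothetical):

```python
def backward_shot_coherence(shots, shot_similarity, N=10):
    """BSC of shot i: the best match (maximum coherence) against the previous N shots."""
    bsc = []
    for i, shot in enumerate(shots):
        window = shots[max(0, i - N):i]
        bsc.append(max((shot_similarity(shot, prev) for prev in window), default=0.0))
    return bsc  # shots at local minima of this curve are potential scene boundaries
```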

Page 13: 1 Video Syntax Analysis 2

Potential Scene Boundaries


BSC for 300 shots, with the first key frame of each shot.

Page 14: 1 Video Syntax Analysis 2

Selection of Window Size


The computation of BSC is controlled by the selection of the window size N.

N is a memory parameter which mimics a human's ability to recall a shot seen in the past.

If N is too large, it may span several scenes. If N is too small, over-segmentation of the video may result.

N = 10 in this paper.

Page 15: 1 Video Syntax Analysis 2

Scene Dynamics Analysis


Scenes with weak structure are often broken into several scenes, e.g. action scenes, due to the non-repetitiveness of their shots.

Scene dynamics:
Action scenes have larger SMC and smaller L (shot length).
The PSB between two consecutive scenes is removed if the SD of both scenes exceeds a fixed threshold.

Page 16: 1 Video Syntax Analysis 2

Scene Dynamics Analysis


Page 17: 1 Video Syntax Analysis 2

Scene Representation


A shot is a good representative when:
The shot is shown several times (higher SC).
The shot spans a longer period of time (larger shot length).
The shot has minimal motion content (smaller SMC).
Multiple faces are preferred.

Page 18: 1 Video Syntax Analysis 2

Shot Goodness


A correlation matrix of dimension N x N is constructed, where element (i, j) is the coherence of shot i with shot j.

Three shots with the highest W are selected as candidate shots, and face detection is performed.

Page 19: 1 Video Syntax Analysis 2

Detection of Faces


A method based on skin detection is adopted. The middle frame of each candidate shot is tested. Each isolated segment of skin is considered a face, and the frame with the highest number of votes is taken as the scene key frame.

In the case of a tie or no face, the key frame of the shot with the highest goodness value is selected.

Page 20: 1 Video Syntax Analysis 2

Scene Key Frame


Page 21: 1 Video Syntax Analysis 2

More Examples


Page 22: 1 Video Syntax Analysis 2

Experimental Results


Five movies, one sitcom, and one TV show

False alarm (false positive)

Miss (false negative)

Page 23: 1 Video Syntax Analysis 2

Experimental Results


Slight over-segmentation is preferable to under-segmentation.

While browsing a video, it is better to have two segments of one scene rather than one segment consisting of two scenes.

Page 24: 1 Video Syntax Analysis 2

Experimental Results


Page 25: 1 Video Syntax Analysis 2

References


Rasheed, et al., "Scene detection in Hollywood movies and TV shows," Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 343-348, 2003.

Yeung, et al., "Segmentation of video by clustering and graph analysis," Computer Vision and Image Understanding, vol. 71, no. 1, pp. 94-109, 1998.

Vendrig, et al., "Systematic evaluation of logical story unit segmentation," IEEE Transactions on Multimedia, vol. 4, no. 4, pp. 492-499, 2002.

Page 26: 1 Video Syntax Analysis 2

Brief Introduction of Montage


Page 27: 1 Video Syntax Analysis 2

Montage


Montage “refers to the editing of the film, the cutting and piecing together of exposed film in a manner that best conveys the intent of the work.”

Page 28: 1 Video Syntax Analysis 2

Methods of Montage


Metric: The editing follows a specific number of frames, cutting to the next shot no matter what is happening within the image. This montage is used to elicit the most basal and emotional of reactions in the audience.

Example

http://en.wikipedia.org/wiki/Soviet_montage_theory

Page 29: 1 Video Syntax Analysis 2

Methods of Montage


Rhythmic: Cutting based on time -- along with a change in the speed of the metric cuts -- to induce more complex meanings than what is possible with metric montage.

Once sound was introduced, rhythmic montage also included aural elements (music, dialogue, sounds).

Example

Page 30: 1 Video Syntax Analysis 2

Methods of Montage


Tonal: A tonal montage uses the emotional meaning of the shots -- not just manipulating the temporal length of the cuts or their rhythmical characteristics -- to elicit a reaction from the audience even more complex than from the metric or rhythmic montage.

For example, a sleeping baby would emote calmness and relaxation.

Example: This is the clip following the death of the revolutionary sailor Vakulinchuk, a martyr for sailors and workers.

Page 31: 1 Video Syntax Analysis 2

Methods of Montage


Overtonal/Associational: The overtonal montage is the accumulation of metric, rhythmic, and tonal montage to synthesize its effect on the audience for an even more abstract and complicated effect.

Example: In this clip, the men are workers walking towards a confrontation at their factory, and later in the movie, the protagonist uses ice as a means of escape.

Page 32: 1 Video Syntax Analysis 2

Methods of Montage


Intellectual: Uses shots which, combined, elicit an intellectual meaning.

Example: from Eisenstein's October and Strike. In Strike, a shot of striking workers being attacked, cut with a shot of a bull being slaughtered, creates a film metaphor suggesting that the workers are being treated like cattle. This meaning does not exist in the individual shots; it only arises when they are juxtaposed.

http://www.tcf.ua.edu/classes/Jbutler/T112/EditingIllustrations06.htm

Page 33: 1 Video Syntax Analysis 2

Wei-Ta Chu

2010/10/7

Overview of CBIR


Y. Rui, T.S. Huang, and S.-F. Chang, "Image retrieval: current techniques, promising directions, and open issues," Journal of Visual Communication and Image Representation, vol. 10, pp. 39-62, 1999.

Page 34: 1 Video Syntax Analysis 2

Image Retrieval


Image retrieval has been an active research area since the 1970s, with the thrust coming from the research communities of database management and computer vision.

Text-based approaches:
Annotate images by text.
Use text-based database management systems to perform image retrieval.

Page 35: 1 Video Syntax Analysis 2

Needs of Content-based Image Retrieval


In the early 1990s, two difficulties arose:
The vast amount of labor required to annotate large-scale image collections.
The rich content of images and the subjectivity of human perception.

Instead of annotating by text-based keywords, images are indexed by their own visual content, such as color and texture.

Page 36: 1 Video Syntax Analysis 2

An Image Retrieval System Architecture


An image retrieval system architecture draws on several fields: image processing and compression; computer vision and image understanding; information retrieval and database management systems; computational geometry, database management, and pattern recognition; user psychology and user interface.

Page 37: 1 Video Syntax Analysis 2

Feature Extraction


General features: color, texture, shape, …

Domain-specific features: human faces, fingerprints

Page 38: 1 Video Syntax Analysis 2

Color (1/2)


Robust to background complication and independent of image size and orientation.

Color histogram:
Histogram intersection (L1 metric)
L2-related metric
Cumulated color histogram

Euclidean distance is the L2 norm.
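For two histograms h_q and h_t normalized to sum to one over B bins, histogram intersection gives a similarity, and the corresponding distance is equivalent to half the L1 distance:

```latex
S(h_q, h_t) = \sum_{b=1}^{B} \min\bigl(h_q[b],\, h_t[b]\bigr),
\qquad
d(h_q, h_t) = 1 - S(h_q, h_t) = \tfrac{1}{2} \sum_{b=1}^{B} \bigl| h_q[b] - h_t[b] \bigr|
```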

Page 39: 1 Video Syntax Analysis 2

Color (2/2)


Color moments:
Most of the information is concentrated in the low-order moments.
The first moment (mean), the second (variance), and the third (skewness).
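For one color channel with pixel values p_1, …, p_N, a common formulation of these three moments (the second given here as the standard deviation, i.e. the square root of the variance, and the third as a cube-rooted central moment) is:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad
\sigma = \Bigl(\frac{1}{N}\sum_{i=1}^{N} (p_i - \mu)^2\Bigr)^{1/2}, \qquad
s = \Bigl(\frac{1}{N}\sum_{i=1}^{N} (p_i - \mu)^3\Bigr)^{1/3}
```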

Color set:
A selection of colors from the quantized color space.
Color set feature vectors are binary. Thus a binary search tree was constructed to allow a fast search.

Page 40: 1 Video Syntax Analysis 2

Texture (1/2)


Visual patterns that have properties of homogeneity that do not result from a single color or intensity.

Contains important information about the structural arrangement of surfaces and their relationship to the surrounding environment.

Rushing, et al., "Using association rules as texture," IEEE Trans. on PAMI, vol. 23, no. 8, pp. 845-858, 2001.

Page 41: 1 Video Syntax Analysis 2

Texture (2/2)


Co-occurrence matrix of texture features:
Explores the gray-level spatial dependence of texture.
Based on the orientation and distance between image pixels.

Tamura texture features

Use of the wavelet transform in texture representation

Page 42: 1 Video Syntax Analysis 2

Shape


Boundary-based features:
Use only the outer boundary of the shape.
Fourier descriptor: use the Fourier-transformed boundary as the shape feature.

Region-based features:
Use the entire shape region.
Moment invariants: use region-based moments which are invariant to transformations.

Page 43: 1 Video Syntax Analysis 2

Color Layout


A global color feature tends to give too many false positives when the image collection is large.

Use both color features and spatial relations: divide the whole image into blocks and extract color features from each block.

Kasutani, et al., "The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval," Proc. of ICIP, pp. 674-677, 2001.

Page 44: 1 Video Syntax Analysis 2

Segmentation


Very important to image retrieval: both the shape feature and the layout feature depend on good segmentation.

Still an unsolved problem.

Chien, et al., "Predictive watershed: a fast watershed algorithm for video segmentation," IEEE Trans. on CSVT, vol. 13, no. 5, pp. 453-461, 2003.

Page 45: 1 Video Syntax Analysis 2

Summary


Many visual features have been explored. Which features and representations should be used is application dependent.

MPEG-7 standard: multimedia content description interface.

Page 46: 1 Video Syntax Analysis 2

High Dimensional Indexing


Make CBIR truly scalable to large-size image collections.

Two main challenges:
High dimensionality
Non-Euclidean similarity measures

Approach:
Dimension reduction
Use appropriate multidimensional indexing techniques

Page 47: 1 Video Syntax Analysis 2

Curse of Dimensionality


An example of classifying data in two dimensions

Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Page 48: 1 Video Syntax Analysis 2

Curse of Dimensionality


If we divide a space into regular cells, the number of such cells grows exponentially with the dimensionality of the space; for example, k cells per dimension in a D-dimensional space gives k^D cells.

We need an exponentially large quantity of training data in order to ensure that the cells are not empty.

Page 49: 1 Video Syntax Analysis 2

Dimension Reduction


Although the curse of dimensionality exists, it does not prevent us from finding effective techniques:
Real data will often be confined to a region of the space having lower dimensionality.
Real data will typically exhibit some smoothness properties, so that small changes in the input variables will produce small changes in the target variables.

Karhunen-Loeve transform (KLT)

Clustering

Page 50: 1 Video Syntax Analysis 2

Multidimensional Indexing Techniques


Select appropriate multidimensional indexing algorithms to index the reduced, but still high-dimensional, feature vectors.

Techniques: k-d tree, R-tree, …

Page 51: 1 Video Syntax Analysis 2

Retrieval Systems


Most image retrieval systems support:
Random browsing
Search by example
Search by sketch
Search by text (including keyword or speech)
Navigation with customized image categories

Page 52: 1 Video Syntax Analysis 2

QBIC, Query by Image Content


Supports queries based on example images, user-constructed sketches and drawings, and selected color and texture patterns.

Features: color, texture, shape

Indexing: R*-tree

http://wwwqbic.almaden.ibm.com/

Page 53: 1 Video Syntax Analysis 2

Virage


Supports visual queries based on color, composition, texture, and structure, and arbitrary combinations of these atomic queries.

http://www.virage.com/

Page 54: 1 Video Syntax Analysis 2

PhotoBook


A set of interactive tools for browsing and searching images.

Features: shape, texture, face

Includes the human in the image annotation and retrieval loop: relevance feedback.

Page 55: 1 Video Syntax Analysis 2

VisualSEEk and WebSEEk


Spatial relationship queries over image regions and visual feature extraction from the compressed domain.

Features: color set, wavelet-transform-based texture

http://persia.ee.columbia.edu:8008/

Page 56: 1 Video Syntax Analysis 2

MARS


The research features are the integration of DBMS and IR, the integration of indexing and retrieval, and the integration of computer and human.

Investigates how to organize various visual features into a meaningful retrieval architecture which can dynamically adapt to different applications and different users.

Page 57: 1 Video Syntax Analysis 2

Others


ALIPR (Automatic Linguistic Indexing of Pictures - Real Time)

RetrievalWare, Netra, ART MUSEUM, Blob-world, …

Page 58: 1 Video Syntax Analysis 2

Wei-Ta Chu

2009/10/15

VisualSEEk


J.R. Smith and S.-F. Chang, “VisualSEEk: a fully automated content-based image query system” Proc. of ACM Multimedia, pp. 87-98, 1996.

Page 59: 1 Video Syntax Analysis 2

Introduction


Enable querying by image regions and spatial layout.

Unconstrained images are decomposed into near-symbolic images which lend themselves to efficient spatial query.

Address spatial queries involving adjacency, overlap, and encapsulation of regions.

Page 60: 1 Video Syntax Analysis 2

Introduction


Need to devise an image similarity function which contains both color features and spatial components.

Intrinsic parameters:
Similarity between query and target colors and/or region sizes and (absolute) spatial locations.

Derived parameters:
The inferences that can be made from the intrinsic parameters, such as relative spatial locations and the overall assessment of image matches with multiple regions.

Page 61: 1 Video Syntax Analysis 2

Image Query Process


Page 62: 1 Video Syntax Analysis 2

Characteristics of VisualSEEk


Automated extraction of localized regions and features

Querying by both feature and spatial information

Feature extraction from compressed data

Development of techniques for fast indexing and retrieval

Development of highly functional user tools

Page 63: 1 Video Syntax Analysis 2

System Overview


Page 64: 1 Video Syntax Analysis 2

Color Sets


T_c: color space transformation

Q_c^M: quantizer that partitions the color space into M subspaces

B_c^M: M-dimensional binary space such that each axis corresponds to one unique index value m

A color set is a binary vector in B_c^M which corresponds to a selection of colors {m}.

Page 65: 1 Video Syntax Analysis 2

Example


T_c: RGB to HSV

M = 8 for Q_c^M: quantize the HSV color space to 2 hues, 2 saturations, and 2 values.

B_c^M is an eight-dimensional binary space.

A color set c contains a selection from the eight colors, e.g. c = [10010100] corresponds to the selection of three colors, m = 0, m = 3, and m = 5, from the quantized HSV color space.
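A small illustrative sketch of this example (hypothetical helper names; a 2x2x2 quantization giving indices m = 0..7, and a bit set to 1 when a quantized color covers at least a minimum fraction of the region):

```python
import numpy as np

def quantize_hsv(h, s, v):
    """Map an HSV pixel (h in [0, 360), s and v in [0, 1]) to one of M = 8 indices:
    2 hue bins x 2 saturation bins x 2 value bins."""
    return (int(h >= 180) << 2) | (int(s >= 0.5) << 1) | int(v >= 0.5)

def color_set(region_hsv_pixels, min_fraction=0.1):
    """Binary color set: bit m is 1 if quantized color m covers at least min_fraction of the region."""
    counts = np.zeros(8)
    for (h, s, v) in region_hsv_pixels:
        counts[quantize_hsv(h, s, v)] += 1
    return (counts / counts.sum() >= min_fraction).astype(int)
```

With a suitable threshold, a region dominated by quantized colors m = 0, 3, and 5 yields the color set [1, 0, 0, 1, 0, 1, 0, 0], matching the example above.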

Page 66: 1 Video Syntax Analysis 2

Color Sets


Color sets provide a compact alternative to color histograms for representing color information.

Their utilization stems from the conjecture that salient regions have not more than a few, equally prominent colors.

Page 67: 1 Video Syntax Analysis 2

Color Sets


Page 68: 1 Video Syntax Analysis 2

Color Set Back-Projection


Goal: to extract color regions.

Back-projection process:
Color set selection
Back-projection onto the image
Thresholding and labeling

Back-projection: given image I and color set c, let k be the index of the color at image point I(x, y); then generate image B(x, y) by B(x, y) = c[k].
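A compact sketch of the back-projection step (the quantize argument is any function mapping an HSV pixel to its quantized color index k; thresholding and labeling of B, e.g. connected-component analysis, would follow):

```python
import numpy as np

def back_project(image_hsv, color_set, quantize):
    """B(x, y) = c[k], where k is the quantized color index of pixel I(x, y)."""
    h, w = image_hsv.shape[:2]
    B = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            k = quantize(*image_hsv[y, x])   # color index of this pixel
            B[y, x] = color_set[k]           # 1 where the pixel's color is in the set
    return B
```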

Page 69: 1 Video Syntax Analysis 2

Color Set Back-Projection

Page 70: 1 Video Syntax Analysis 2

Color Similarity


In the HSV color space, the similarity between any two colors m_i = (h_i, s_i, v_i) and m_j = (h_j, s_j, v_j) is defined as follows.
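One commonly cited form of this HSV color similarity, treating hue and saturation as polar coordinates (a sketch; consult the paper for the exact expression and scaling constant):

```latex
a_{i,j} = 1 - \frac{1}{\sqrt{5}}
\Bigl[ (v_i - v_j)^2
     + (s_i \cos h_i - s_j \cos h_j)^2
     + (s_i \sin h_i - s_j \sin h_j)^2 \Bigr]^{1/2}
```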

Color histogram:
Histogram distance: Minkowski metric.
Under a Minkowski metric, a dark red image is equally dissimilar to a red image as to a blue image.

Page 71: 1 Video Syntax Analysis 2

Color Similarity


Histogram Quadratic Distance:
It measures the weighted similarity between histograms. The quadratic distance between histograms h_q and h_t is given below, where A = [a_{i,j}] and a_{i,j} denotes the similarity between the colors with indices i and j.
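In quadratic form (following Hafner et al.):

```latex
d_{\mathrm{quad}}(h_q, h_t) = (h_q - h_t)^{T} A \,(h_q - h_t)
```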

Since the histogram quadratic distance computes the cross similarity between colors, it is computationally expensive.

Hafner, et al., "Efficient Color Histogram Indexing for Quadratic Form Distance Functions," IEEE Trans. on PAMI, vol. 17, no. 7, pp. 729-736, Jul. 1995.

Page 72: 1 Video Syntax Analysis 2

Color Similarity


Histogram Quadratic Distance

Consider a pure red image x = [1.0, 0.0, 0.0]^T and a pure orange image y = [0.0, 1.0, 0.0]^T.

The quadratic distance between x and y is 0.2. The Euclidean distance between x and y is sqrt(2) ≈ 1.41.
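A worked check of these numbers, assuming a similarity matrix with unit self-similarity and a red-orange cross-similarity of a_ro = 0.9 (an assumed value, chosen to be consistent with the 0.2 result):

```latex
x - y = [1, -1, 0]^{T}, \qquad
(x - y)^{T} A\, (x - y) = a_{rr} + a_{oo} - 2\,a_{ro} = 1 + 1 - 2(0.9) = 0.2, \qquad
\lVert x - y \rVert_2 = \sqrt{1^2 + (-1)^2} = \sqrt{2} \approx 1.41
```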

Hafner, et al., "Efficient Color Histogram Indexing for Quadratic Form Distance Functions," IEEE Trans. on PAMI, vol. 17, no. 7, pp. 729-736, Jul. 1995.

Page 73: 1 Video Syntax Analysis 2

Color Similarity


Color sets give only a selection of colors.

Color set distance: the quadratic distance between two color sets c_q and c_t is given below.
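A sketch of that distance, by analogy with the histogram quadratic form above (with the binary color set vectors in place of the histograms):

```latex
d(c_q, c_t) = (c_q - c_t)^{T} A \,(c_q - c_t)
```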

Page 74: 1 Video Syntax Analysis 2

Color Set Query Strategy


The color set query compares only the color content of regions or images.

Given query Q = {c_q}, the best match to Q is the target T_j = {c_j} that minimizes the color set distance.

Color region matching is accomplished by performing several range queries on the query color set's colors, taking the intersection of these lists, and minimizing the sum of attributes in the intersection list.

Page 75: 1 Video Syntax Analysis 2

Single Region Query


Fixed query location: the spatial distance between regions is given by the Euclidean distance of their centroids.

Page 76: 1 Video Syntax Analysis 2

Single Region Query

Bounded query location: users specify bounds within which a target region is assigned a spatial distance of zero. When a target region is outside of the bounds, the distance is calculated as the Euclidean distance.

Useful in many situations when users do not care about the exact position.

Page 77: 1 Video Syntax Analysis 2

Centroid Location Spatial Access–Spatial Quad-Trees


The centroids of the image regions are indexed using a spatial quad-tree on their x and y values.

The quad-tree provides quick access to 2-D data points. A query for a region at location (x_t, y_t) is processed by first traversing the spatial quad-tree to the containing node, then exhaustively searching the block for the points that minimize the distance to (x_t, y_t).

Page 78: 1 Video Syntax Analysis 2

Rectangle Location Spatial Access–R-Trees


Region spatial locations are also indexed by their minimum bounding rectangles (MBRs).

MBRs of the regions are indexed using an R-tree.

The R-tree provides a dynamic structure for indexing rectangles.

The R-tree, which consists of a hierarchy of overlapping spatial nodes, is designed to visit only a small number of nodes in a spatial search.

Page 79: 1 Video Syntax Analysis 2

Size


Area distance

Spatial extent: calculated based on the widths and heights of the minimum bounding rectangles.

Page 80: 1 Video Syntax Analysis 2

Single Region Query Strategy


Integrates the distances of color set, region location, area, and spatial extent as a weighted sum.
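As an illustrative sketch only (the individual weights and exact term definitions are whatever the paper assigns; the names here are placeholders):

```latex
d_{\mathrm{total}} = w_{c}\, d_{\mathrm{color}} + w_{l}\, d_{\mathrm{location}}
                   + w_{a}\, d_{\mathrm{area}} + w_{e}\, d_{\mathrm{extent}}
```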

Page 81: 1 Video Syntax Analysis 2

Single Region Query Strategy


Query: find the region that best matches Q = {c_q, (x_q, y_q), area_q, (w_q, h_q)}.

First compute the individual queries for color, location, size, and spatial extent.

The intersection of the region match lists is then computed to obtain the set of common images.

Page 82: 1 Video Syntax Analysis 2

Multiple Regions Query

Intersect the results of single-region matches.

Compute image match scores by adding the weighted scores of the best region matches.

Check relative spatial locations.

Page 83: 1 Video Syntax Analysis 2

Absolute Locations


Query: find the regions that best match Q = {Q_A, Q_B, Q_C}, where Q_i = {c_q^i, (x_q^i, y_q^i), area_q^i, (w_q^i, h_q^i)}.

The query is processed by intersecting the query region lists to obtain the list of candidate images. The best match minimizes the weighted sum of the region distances between the query and target image.

Page 84: 1 Video Syntax Analysis 2

Region Relative Location


Convert relative locations into 2-D strings, e.g. (t0 t1 < t2 < t7 < t3 < t6 < t4 < t5) and (t0 < t5 t7 < t6 < t2 < t3 t1 < t4).

Scale invariance and rotation invariance: (t0 < t7 t2 t1 < t6 t5 t3 < t4) and (t5 < t6 t7 < t4 t0 t3 t2 < t1).

Adjacency, nearness, overlap, and surround can be detected by checking the 2-D strings.

Page 85: 1 Video Syntax Analysis 2

Relative Locations


For each candidate image, the 2-D string is generated from the identified regions and is compared to the 2-D string of the query image.

This final operation either validates the target image or rejects it.

Page 86: 1 Video Syntax Analysis 2

Evaluation


Users sketch regions, position them on the query grid, and assign them properties of color, size, and absolute location.

The user may also assign boundaries for location and size.

Page 87: 1 Video Syntax Analysis 2

Evaluation

The global color histogram query process gives users little control in specifying the query and more readily returns images that are not desired.

Page 88: 1 Video Syntax Analysis 2

Evaluation

Page 89: 1 Video Syntax Analysis 2

Synthetic Evaluation Data

Page 90: 1 Video Syntax Analysis 2

Evaluation


Q1: the region indexing and distance computation strategy in this paper

Q2: the same query strategy on a region database that was generated automatically from the target images using color set back-projection

Q3: based on color histograms

Page 91: 1 Video Syntax Analysis 2

Evaluation of Color Sets


Retrieval effectiveness degrades only slightly using color sets.

This indicates that the perceptually significant color information is retained in the color sets.

Page 92: 1 Video Syntax Analysis 2

Examples of VisualSEEk Queries
