Dulce Ponceleon
Searching Video Collections: Representation, Indexing, Browsing and Evaluation
Part I
Universidad de Chile, December 2002

Searching Video Collections: Overview

Part I
• Introduction to Multimedia Information Retrieval
• Multimedia Representation
• Multimedia Indexing

Part II
• Audio Analysis
• Speech Indexing
• Query Formulation
• Multimedia Retrieval

Part III
• Browsing
• Distribution/Streaming
• Evaluation
• Multimedia IR Applications
• Conclusions

Searching Video Collections: Part I
• Introduction to Multimedia Information Retrieval
• Multimedia Representation
  - Visual Features (Still Images and Image Sequences): Color, Texture, Shape, Edges, Objects, Motion
• Multimedia Indexing
  - Video Segmentation: Shot-Boundary Detection, Effects Detection
  - Beyond Basic Visual Features: Text, Face

What is Multimedia?

Unstructured data types: text, images, audio, video
• Different from DBMS structured records, e.g. Name: <s>, Sex: <s>, Age: <I>, SSN: <I>…

Structure in unstructured data
• All unstructured data has content
• Typically it also has associated metadata
• Text has layout and logical structure
• Multimedia has complex spatial, temporal, and semantic structure

History: From Text IR to MMIR
• Library of Alexandria (3rd century BC): 500,000 volumes, catalogues, classification
• First concordance of the Bible (13th century AD)
• Printing press (15th century)
• Johnson's dictionary (1755)
• Dewey Decimal Classification (1876)
• Punched-card retrieval (1930s)
• Luhn describes statistical retrieval/abstracting (1959)
• MEDLINE (1964, goes online in 1971)

*Adapted from a presentation © Bruce Croft

History: From Text IR to MMIR
• Cranfield effort defines evaluation (1966)
• DIALOG from Lockheed (1967)
• Salton's book about SMART and IR (1968), which discusses many techniques that are still used today
• Relevance ranking available (late 1980s)
• Large-scale probabilistic system (West, 1992)
• Google, search engines (1996)

Do you use Google?

Do you use Google once a day?

Do you use Google 10 times a day?

Do you ?

Do you Image ?

Do you use Google image search?

More than once a day?

Do you video Google?

What comes to mind when we say MM retrieval?
• Keanu Reeves avoiding bullets
• Helicopter crash
i.e. Hollywood's multimedia retrieval

Little Value in Indexing Published Content (?)

Publishing implies
• High production effort
• Broad appeal

Easy to annotate manually (once)
• Somebody edits in a dissolve => they can add a manual annotation

No demand
• People aren't clamoring for image retrieval: "Give me Rock Hudson washing up on the beach"

An MM Indexing Product

Bare Facts Video Guide

Indexes nudity in Hollywood videos

Very specialized

Kundi (.com-era startup)
• "Hot Now" button: a user identifies important content
• Voting/moderation scheme (à la Slashdot, /.)
• Notifications shared with other users

World is not Bleak!
• Cameras everywhere!
  - Number of sensors doubling every year
  - Fixed (webcams)
  - Mobile (on your person)
• Security
• Customer Relationship Management

Security
• Important in the new world
• Find "interesting" events
  - Look for anomalies
  - Name that event
  - Look for secondary events

Customer Relationship Management
• High value in customization
• Imagine a camera at the store entrance
• Can we determine gender?
  - Suggest a sale item in men's clothing
• Can we recognize a previous customer?
  - Probably not well enough

Text vs. Multimedia

Properties of Multimedia
1. Visual components
2. Spatial components
3. Temporal components
4. Ease of data entry
5. Well-defined interaction unit?
6. Well-defined semantic unit?

Data Type   1  2  3  4               5  6
Text        Y  Y  N  Easy            Y  Y
Image       Y  Y  N  Difficult       N  N
Audio       N  N  Y  Difficult       N  N
Video       Y  Y  Y  Very difficult  N  N

What Type of Queries Would You Like to Answer?
• Downhill skiing [Foote99, Over01]
• Scenes that include a space shuttle launching
• Scenes with a yellow boat, pink flower
• People on the beach
• Speaker talking in front of the US flag
• Corn on the cob in a field
• Impact of heavy airliner landing on runways

What Type of Queries CAN You Answer Today?
• Use a sample image or video clip [Flickner95]
• Use basic art tools to express "a red object moving from the upper left to the lower right corner on a white background" [Dimitrova94, Chang98, Smith96, Yining98]
• Not at the semantic level desired

Two Fundamental Multimedia Retrieval Paradigms

Expression-based retrieval aka Query-by-Example [Foote99]

Semantic-based Retrieval based on automatically extracted metadata or manually annotated metadata [Barnard01]

What is Content Analysis?
• Analysis of low-level features
  - Basic features, physical properties
• Semantics for high-level abstractions
• Special algorithms borrowing from several disciplines
• Use of all available media
• Related areas: signal processing, computer vision, speech recognition, pattern and image recognition, OCR, natural language, audio analysis

Query Based Multimedia IR System Overview

Figure (flow): the user's information need is represented as a query (text or audio-visual); the multimedia content is represented as indexed multimedia content; both feed "Retrieve and compute similarity" -> "Rank results" -> "Evaluate".

Similarity-based Image Search

Manual annotations are far from suitable
• Subjective; feasible?
• A picture is worth … how many keywords?
• A picture with no keywords: how much is it worth?

Typical automatic procedure
• Use features to characterize images
• Store feature vectors
• Enable the user to start (limited!) semantic queries
• Yield a set of resulting images based on the distance between features; the smallest distance represents the best match
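The automatic procedure above comes down to nearest-neighbor ranking over stored feature vectors. The following is a minimal sketch that assumes Euclidean distance and a tiny hypothetical database of 3-D color features. `rank_by_distance` and the feature values are illustrative and do not come from any particular system.

```python
import numpy as np

def rank_by_distance(query, database):
    """Rank stored images by Euclidean distance between feature vectors.

    query:    1-D feature vector of the query image
    database: 2-D array with one stored feature vector per row
    Returns (order, distances), sorted so the smallest distance (best match) is first.
    """
    q = np.asarray(query, dtype=float)
    db = np.asarray(database, dtype=float)
    distances = np.linalg.norm(db - q, axis=1)    # one distance per stored image
    order = np.argsort(distances, kind="stable")  # ascending: best match first
    return order, distances[order]

# Hypothetical 3-D features, e.g. the mean (R, G, B) of each image in [0, 1].
features = np.array([
    [0.9, 0.1, 0.1],   # image 0: mostly red
    [0.1, 0.8, 0.2],   # image 1: mostly green
    [0.8, 0.2, 0.1],   # image 2: reddish
])
order, dists = rank_by_distance([1.0, 0.0, 0.0], features)
print(order)  # [0 2 1]
```

In a real collection, the linear scan would normally be replaced by a spatial index. The ranking principle stays the same.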

Analysis of Picture Sequences

Goals
• Recognition of objects
• Recognition of camera motion

Features
• Object motion
• Hints to semantics
  - Example: motion vs. non-motion sequences

Recognition of motion in combination with segmentation
• Tracking object boundaries across subsequent frames yields better segmentation than using still images alone.

Visual Features in Multimedia
• Color, color, color
• Texture
• Shape
• Edges
• Object outline
  - Foreground vs. background
  - Edge detection
• Motion trajectories
• Higher-level semantics
• Multimodal

Audio Features in Multimedia

Features depend on the audio category
• Speech
• Music
• Sounds (e.g. explosions, street noise)

Features
• Energy, loudness
• Pitch
• Cepstral coefficients
• Beat
• Harmonics

Visual Features: Color
• Introduction
• Color models
• Color representations
• Color features
• Similarity measures

What is Color?
• It is a perceptual phenomenon
• Each color corresponds to a narrow band of wavelengths within the electromagnetic spectrum
• Visible wavelengths: 400–700 nm range
  - 400–480 nm is blue, ~520 nm is green, 600–700 nm is red
• The human eye can distinguish about 400,000 colors
• < 400 nm: ultraviolet and X-rays
• > 700 nm: infrared, microwaves, FM radio, TV, AM radio, etc.

Visual Features: Color

The Color Phenomenon
• The dominant wavelength of a light is called its hue
• The intensity (energy) of a light is called luminance or brightness
• The amount of pure light (pink vs. red) is called saturation or purity
• Collectively, hue and saturation are referred to as chromaticity

Color Retrieval
• It is a global feature
• Independent of view and resolution
• No object/background segmentation is required
• Can handle deformation of objects
• Can handle articulated objects
• Color coherence, color layout

Drawbacks: color constancy

Color Space

The RGB Model
• (red, green, blue) at different intensities
• Used for active devices

The YIQ Model
• Developed by NTSC and used for the first color TV broadcast in 1953
• Designed to be compatible with black & white TV
• Luminance signal:
  Y = 0.30 R + 0.59 G + 0.11 B
• Two color-difference signals:
  I = 0.60 R - 0.28 G - 0.32 B
  Q = 0.21 R - 0.52 G + 0.31 B
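The sketch below checks the YIQ equations by applying them as a matrix. It assumes R, G and B are normalized to [0, 1] and uses the NTSC coefficients rounded to two decimals: I = 0.60 R - 0.28 G - 0.32 B and Q = 0.21 R - 0.52 G + 0.31 B. Each chrominance row sums to zero, so any gray input gives I = Q = 0. This is what keeps the signal compatible with black & white sets. The function name is hypothetical.

```python
import numpy as np

# NTSC RGB -> YIQ, coefficients rounded to two decimals; R, G, B in [0, 1].
RGB_TO_YIQ = np.array([
    [0.30,  0.59,  0.11],   # Y: luminance, the only signal a black & white set uses
    [0.60, -0.28, -0.32],   # I: first color-difference signal (row sums to 0)
    [0.21, -0.52,  0.31],   # Q: second color-difference signal (row sums to 0)
])

def rgb_to_yiq(rgb):
    """Convert one RGB triple, or an (..., 3) array of triples, to YIQ."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YIQ.T

print(rgb_to_yiq([0.5, 0.5, 0.5]))  # a gray pixel: Y = 0.5 and I, Q are (numerically) 0
```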

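Color Space Linearity

For color retrieval we need a measure of color difference. In the RGB color space, each color is a triple (r, g, b). Drawbacks:
• It is not designed for humans
• It is mainly used for active display monitors
• It is perceptually non-linear: equal Euclidean distances in RGB do not correspond to equal perceived color differences

A color space is needed in which distances correspond linearly to our perception.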

The CIE Color Space
• The color industry uses several perceptually linear (uniform) color spaces for quality control, such as CIE L*u*v*
• These are non-linear transformations of the RGB space
• They are device independent
• Euclidean distance can be used as a measure of similarity
• Empirical studies show that this is very close to human perception of color differences
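The following minimal sketch of the RGB -> CIE XYZ -> CIE L*u*v* path makes the "Euclidean distance in L*u*v*" idea concrete. It assumes sRGB input in [0, 1] with a D65 white point and uses the standard sRGB transfer curve and matrix. Other RGB spaces need a different matrix. The resulting Euclidean distance is the CIE 1976 ΔE*uv color difference. Function names are illustrative.

```python
import numpy as np

# Assumption: sRGB primaries with a D65 white point (standard sRGB -> XYZ matrix).
SRGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def srgb_to_xyz(rgb):
    c = np.asarray(rgb, dtype=float)
    # Undo the sRGB transfer curve to get linear light, then apply the matrix.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    return SRGB_TO_XYZ @ linear

WHITE_XYZ = srgb_to_xyz([1.0, 1.0, 1.0])  # reference white of this RGB space

def _uv_prime(xyz):
    x, y, z = xyz
    denom = x + 15 * y + 3 * z
    if denom == 0:          # pure black: chromaticity undefined, but L* = 0 anyway
        return 0.0, 0.0
    return 4 * x / denom, 9 * y / denom

def srgb_to_luv(rgb):
    xyz = srgb_to_xyz(rgb)
    yr = xyz[1] / WHITE_XYZ[1]
    eps = (6 / 29) ** 3
    # Cube-root lightness curve, with a linear segment near black.
    L = 116 * np.cbrt(yr) - 16 if yr > eps else (29 / 3) ** 3 * yr
    u_p, v_p = _uv_prime(xyz)
    un_p, vn_p = _uv_prime(WHITE_XYZ)
    return np.array([L, 13 * L * (u_p - un_p), 13 * L * (v_p - vn_p)])

def color_difference(rgb1, rgb2):
    """Euclidean distance in L*u*v* (CIE 1976 Delta E*uv)."""
    return float(np.linalg.norm(srgb_to_luv(rgb1) - srgb_to_luv(rgb2)))

print(color_difference([1.0, 0.0, 0.0], [0.9, 0.1, 0.1]))
```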

Color Model towards Image Representation

Digital image: 2D array of pixels
• 2D array of intensities: binary (1 bit/pixel), grayscale (8 bits/pixel) or color (24 bits/pixel)
• 2D array of codes: each code corresponds to an RGB triple

Example pixel values:
134 135 132  12  15 ...
133 134 133 133  11 ...
130 133 132  16  12 ...
137 135  13  14  13 ...
140 135 134  14  12 ...

Color Modeled as Blocks
• Convert RGB to YUV (in JPEG, the YCbCr variant) and divide the image into 8x8 blocks
  - Luminance (Y) and chrominance (Cb, Cr)
  - Blue color difference: Cb
  - Red color difference: Cr
• Only half resolution is needed for chrominance
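The block pipeline above can be sketched with NumPy. The sketch assumes 8-bit RGB input and the full-range ITU-R BT.601 equations that JPEG uses for YCbCr. For the chroma step it assumes 4:2:0 subsampling, which averages each 2x2 neighborhood and so keeps half the chrominance resolution in each direction. This is one common choice; 4:2:2 halves only the horizontal resolution. All function names are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """JPEG-style (BT.601, full range) RGB -> YCbCr for 8-bit values."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue color difference
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red color difference
    return y, cb, cr

def subsample_420(channel):
    """Halve a chroma channel in both directions by averaging 2x2 neighborhoods."""
    h, w = channel.shape
    c = channel[: h - h % 2, : w - w % 2]               # drop an odd last row/column
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def blocks_8x8(channel):
    """Split a channel whose sides are multiples of 8 into an (rows, cols, 8, 8) array."""
    h, w = channel.shape
    return channel.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)

image = np.full((16, 16, 3), 100.0)                     # hypothetical flat gray image
y, cb, cr = rgb_to_ycbcr(image)
print(blocks_8x8(y).shape, blocks_8x8(subsample_420(cb)).shape)  # (2, 2, 8, 8) (1, 1, 8, 8)
```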

Discrete Cosine Transform
• Transform each block of 8x8 samples into a block of 8x8 spatial-frequency coefficients
• Energy tends to be concentrated in a few significant coefficients
• Other coefficients are close to zero

Figure: DCT basis
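The energy-compaction claim is easy to check numerically. This sketch builds the orthonormal 8x8 DCT-II basis directly, without assuming any codec library. It transforms a smooth hypothetical block and reports how much energy the two largest coefficients hold. Because the basis is orthonormal, total energy is preserved, so the reported share is meaningful.

```python
import numpy as np

N = 8  # block size used by JPEG/MPEG

def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)          # the DC row gets a different scale factor
    return c

C = dct_matrix()

def dct2(block):
    """2-D DCT of an 8x8 block: transform the columns, then the rows."""
    return C @ np.asarray(block, dtype=float) @ C.T

def idct2(coeffs):
    """Inverse 2-D DCT (the basis is orthonormal, so the inverse is the transpose)."""
    return C.T @ coeffs @ C

# Hypothetical smooth block: a horizontal ramp from 50 to 120.
ramp = np.tile(np.linspace(50, 120, N), (N, 1))
energy = dct2(ramp) ** 2
share_top2 = np.sort(energy.ravel())[::-1][:2].sum() / energy.sum()
print(f"largest 2 of 64 coefficients hold {share_top2:.1%} of the energy")
```

Compression schemes exploit this by quantizing the many near-zero coefficients coarsely or dropping them.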

Color and Color Mappings

Figure: RGB and HSV color spaces (copyright Smith & Chang 1996)

Color sets: binary vectors representing color (good for regional color)

Color Representations

Pair-wise
• Represents color with a matrix of pixels
• Computes changes at corresponding pixel locations
• Advantage: it considers spatial location
• Disadvantage: too low level, dependent on image size, not a concise representation

Histogram color representation
• Linearly re-quantize the contents into N levels
• Simple method, used for video segmentation

Cluster color representation
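Here is a minimal sketch of the histogram representation. It assumes 8-bit RGB images and uniform re-quantization of each channel into `levels` bins; 4 per channel, giving 64 joint bins, is a hypothetical choice. A bin-by-bin L1 difference is included because comparing histograms of successive frames is how this representation is typically used for video segmentation. Function names are illustrative.

```python
import numpy as np

def color_histogram(image, levels=4):
    """Normalized color histogram of an (H, W, 3) 8-bit RGB image.

    Each channel is linearly re-quantized into `levels` bins, giving
    levels**3 joint color bins; counts are divided by the pixel count.
    """
    img = np.asarray(image, dtype=np.int64)
    q = img * levels // 256                               # 0..levels-1 per channel
    bins = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]
    hist = np.bincount(bins.ravel(), minlength=levels ** 3).astype(float)
    return hist / hist.sum()

def histogram_difference(h1, h2):
    """Bin-by-bin L1 difference: 0 for identical histograms, 2 for disjoint ones."""
    return float(np.abs(h1 - h2).sum())

red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
blue = np.zeros((4, 4, 3), dtype=np.uint8)
blue[..., 2] = 255
print(histogram_difference(color_histogram(red), color_histogram(blue)))  # 2.0
```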

Color Similarity
• For (L, u, v) space, we can use the Mahalanobis distance, where the data are colors and the correlation encodes perceptual similarity
• For HSV space, similarity is derived from the distance in the cylindrical HSV color space
• Histogram quadratic distance:
  - Introduced in the QBIC project (IBM 1993)
  - Provides better similarity than "like-bin" comparison
  - Computationally expensive
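The histogram quadratic distance is based on the quadratic form d^2 = (h1 - h2)^T A (h1 - h2), where A holds the similarity between bin colors. The sketch below assumes one common choice, A[i, j] = 1 - d_ij / d_max, with Euclidean distances between hypothetical bin colors. It shows why the measure beats like-bin comparison: similar colors in different bins are not treated as completely different. It also shows why the measure is costly, since the matrix product is quadratic in the number of bins. Function names are illustrative.

```python
import numpy as np

def similarity_matrix(bin_colors):
    """A[i, j] = 1 - d_ij / d_max, where d_ij is the distance between bin colors."""
    c = np.asarray(bin_colors, dtype=float)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    return 1.0 - d / d.max()

def quadratic_distance(h1, h2, A):
    """Histogram quadratic distance sqrt((h1 - h2)^T A (h1 - h2))."""
    diff = np.asarray(h1, dtype=float) - np.asarray(h2, dtype=float)
    # A is not guaranteed positive semidefinite, so clamp tiny negative values.
    return float(np.sqrt(max(diff @ A @ diff, 0.0)))

# Hypothetical 3-bin palette: red, orange, blue.
A = similarity_matrix([[1, 0, 0], [1, 0.5, 0], [0, 0, 1]])
red, orange, blue = np.eye(3)
print(quadratic_distance(red, orange, A), quadratic_distance(red, blue, A))
```

A like-bin L1 comparison would rate "all red" vs. "all orange" exactly as far apart as "all red" vs. "all blue" (both 2). The quadratic form rates red and orange as closer.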

What is Texture?
• It is a perceptual phenomenon
• It is a region phenomenon (not a point phenomenon)
• It depends a lot on the scale
• Repeating patterns of local variations in image intensity that are too fine to be distinguished as separate objects

Visual Features: Texture

Approaches
• Statistical (coarseness, directionality, contrast) [Tamura78, Liu96]
• Spectral [Ma96]

Should be invariant to intensity, scale, orientation
Natural scenes are challenging

Figure: MIT's Photobook texture matching (query image)

Tamura Texture Feature

Primary features
• Contrast: related to picture quality, sharpness
• Coarseness: coarse-grained vs. fine-grained
• Directionality

Secondary features
• Line-likeness (line-like vs. blob-like)
• Regularity
• Roughness
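Of the Tamura features, contrast has the simplest closed form: F_con = σ / (α4)^(1/4). Here α4 = μ4 / σ^4 is the kurtosis of the gray-level distribution. The following is a minimal sketch over a whole gray-level image; the function name is illustrative. Coarseness and directionality need more machinery and are omitted.

```python
import numpy as np

def tamura_contrast(gray):
    """Tamura contrast F_con = sigma / kurtosis**(1/4), with kurtosis = mu4 / sigma**4."""
    g = np.asarray(gray, dtype=float).ravel()
    mu = g.mean()
    var = ((g - mu) ** 2).mean()
    if var == 0:                       # flat image: no contrast at all
        return 0.0
    mu4 = ((g - mu) ** 4).mean()
    kurtosis = mu4 / var ** 2          # polarization of the gray-level distribution
    return float(np.sqrt(var) / kurtosis ** 0.25)

# Hypothetical test pattern: an 8x8 checkerboard of gray levels 0 and 100.
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 100.0
print(tamura_contrast(checker))  # 50.0
```

Dividing by the kurtosis term lowers the score for distributions whose variance comes mostly from a few outliers. A clean two-level pattern keeps the full σ.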

What is Shape?
• It is also a perceptual phenomenon
• A 2D shape descriptor should be invariant to translation, scale changes, and rotation

Measures:

Visual Features: Shape
• Boundary-based approach: use contours, ignore the interior
• Region-based approach: use interior details (holes, etc.) besides boundary details

Can we reconstruct the object from the shape descriptors?

Shape Techniques Overview

The figure organizes shape description along these axes: boundary based vs. region based; spatial domain vs. transform domain; structural vs. geometric; partial vs. complete. Techniques shown:
• Corner points, chain points, shape numbers, perimeter, area, elongation, compactness, Fourier descriptors
• Contour segments, breakpoints
• Areas, holes, Euler number, moment invariants, Zernike moments, compactness, elongation, symmetry
• Primitives, rules, 2D strings
• Hough transform, Walsh transform, wavelet transform
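Fourier descriptors are listed among the techniques above, and they show how the invariance requirements for shape descriptors can be met. The following is a minimal sketch, assuming the contour is given as ordered (x, y) boundary points. The method treats the points as complex numbers and takes the FFT. It then drops the DC term (removing translation), keeps magnitudes (removing rotation and starting point) and divides by the first harmonic (removing scale). The function name and the test shape are hypothetical.

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Invariant Fourier descriptor of a closed contour.

    contour: (N, 2) array of boundary points (x, y), sampled in order around the shape.
    Returns |F[1..n_coeffs]| / |F[1]|, which ignores translation, scale, rotation
    and the choice of starting point (but not the direction of traversal).
    """
    pts = np.asarray(contour, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]     # treat the boundary as a complex signal
    F = np.fft.fft(z)[1:]              # drop F[0]: it only encodes the centroid (translation)
    mags = np.abs(F)                   # magnitudes discard rotation and start-point phase
    return mags[:n_coeffs] / mags[0]   # normalize by |F[1]| to remove scale

# Hypothetical non-circular closed curve sampled at 64 points.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
shape = np.stack([np.cos(t) + 0.3 * np.cos(3 * t), np.sin(t)], axis=1)

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = np.roll(2.5 * shape @ R.T + [10.0, -4.0], 5, axis=0)  # rotate, scale, shift, new start

print(np.allclose(fourier_descriptor(shape), fourier_descriptor(moved)))  # True
```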

Region-Based Shape & Texture Matching

MIT’s Photobook:

FourEyes

Visual Features: Motion
• Align two images to achieve the best match.
• Determine motion between images in a sequence.

Figure: motion field (copyright Lucas & Kanade)

Optical Flow
• Real-world object motion is transformed into color changes in images
• Efficient computation of motion vectors: use gray-value images

Optical flow
• Motion of gray-value patterns in the image plane
• First step: calculate the motion vector of each gray-value pixel
• Second step: calculate a continuous vector field (interpolation)

Optical Flow ...

Constraints
• Both steps use constraints
• Both steps introduce motion vector failures

Approaches
• Differential techniques (derivatives of gray values)
• Correlation-based techniques (correlation of regions)
• Energy-based techniques (velocity filters)
• Phase-based techniques (phase dependence with regard to band-pass filters)
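The differential approach can be sketched in the style of Lucas & Kanade. The idea is to assume every pixel in a window shares one velocity and to solve the brightness-constancy constraint Ix*vx + Iy*vy + It = 0 by least squares. This minimal version estimates a single vector for the whole window, with no pyramid or iteration. The function name and the conditioning threshold are assumptions.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1):
    """Estimate one (vx, vy) motion vector for a whole window (differential method).

    Every pixel contributes one constraint Ix*vx + Iy*vy + It = 0; the
    over-determined system is solved in the least-squares sense.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    Iy, Ix = np.gradient(0.5 * (f0 + f1))   # spatial derivatives (rows = y, columns = x)
    It = f1 - f0                            # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    ATA = A.T @ A
    if np.linalg.cond(ATA) > 1e6:           # all gradients in one direction: aperture problem
        raise ValueError("ill-conditioned window (aperture problem)")
    return np.linalg.solve(ATA, -A.T @ It.ravel())

# Hypothetical test: a Gaussian blob moved by (0.5, 0.3) pixels.
yy, xx = np.mgrid[0:32, 0:32].astype(float)

def blob(cx, cy):
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * 4.0 ** 2))

print(lucas_kanade_window(blob(15, 16), blob(15.5, 16.3)))  # close to [0.5, 0.3]
```

When all gradients in the window point the same way, as along a straight edge, the 2x2 normal matrix becomes singular. Only the flow component across the edge can then be recovered. The check in the sketch reports this aperture problem instead of returning a meaningless vector.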

Optical Flow: Examples

Figure panels: original, needle, flicker

Optical Flow: Problems
• Correspondence problem
• Other problems
  - Aperture problem (figure: the problem and its solution)
  - Deformable objects
  - Periodical structures
(Figures compare frames at times t0 and t1.)
• Optical flow is an unreliable feature for content analysis!

Aperture Problem

Aperture Problem

Motion Estimation: Examples

Figure panels: block-based, region-based, pixel-based

Pixel-based Motion Vector in Video Compression

Motion Vectors

Modern video compression algorithms calculate motion vectors for blocks of pixels (examples: MPEG-1, MPEG-2, H.261, H.263). Block motion can be used to detect camera operations, but it is much less reliable for analyzing object motion.

Advantage: motion vectors are available without expensive calculation if encoder/decoder information is used.
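Block matching is one common way for encoders to obtain these block motion vectors. The following is a minimal full-search sketch. It assumes grayscale frames, 8x8 blocks, a ±4-pixel search window, and the sum of absolute differences (SAD) as the cost. Real encoders use faster search strategies, and the function name is hypothetical.

```python
import numpy as np

def block_motion_vector(ref, cur, top, left, block=8, search=4):
    """Full-search block matching with a SAD cost.

    Returns the displacement (dy, dx), within +/-search pixels, such that the block of
    `cur` at (top, left) best matches `ref` at (top + dy, left + dx).
    """
    target = cur[top:top + block, left:left + block].astype(float)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue                  # candidate block falls outside the reference frame
            cost = np.abs(ref[y:y + block, x:x + block].astype(float) - target).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32))
cur = np.roll(ref, (2, 3), axis=(0, 1))   # whole frame shifted down 2, right 3
print(block_motion_vector(ref, cur, top=12, left=12))  # (-2, -3)
```

If most blocks in a frame return the same vector, that hints at a camera pan rather than object motion.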

Motion Vectors

Example: famous MPEG test clip (figure: displacement vectors; velocity vector, i.e. flow vector)

Assumption: for a small time interval, velocity is constant.

Local Motion: Motion Trajectory Extraction
• Object tracking through motion estimation
  - In the spatial domain (2D or 3D)
  - In the compressed domain, using motion vectors
• Trajectory representation using symbolic or analytical notation

Trajectory Representation and Retrieval [Dimitrova94]

Figure panels: a) trajectory motion pattern, b) B-spline curve, c) chain code, d) differential chain code
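Chain codes are one of the trajectory representations above and can be sketched as follows. The sketch assumes the trajectory has been sampled on a grid so that consecutive points are 8-neighbors. It uses the usual Freeman convention: 0 = east, then counter-clockwise in 45-degree steps, with y pointing up. The differential chain code records only the turn between moves, so it is unchanged when the whole trajectory is rotated by a multiple of 90 degrees. Function names are illustrative and not taken from [Dimitrova94].

```python
# Freeman 8-direction codes: 0 = east, counter-clockwise in 45-degree steps (y up).
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    """Chain code of a grid trajectory whose consecutive points are 8-neighbors."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def differential_chain_code(codes):
    """Turn between consecutive moves, mod 8 (rotation-insensitive representation)."""
    return [(b - a) % 8 for a, b in zip(codes, codes[1:])]

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_code(square)
print(codes, differential_chain_code(codes))  # [0, 2, 4, 6] [2, 2, 2]
```

Matching on the differential code lets a query such as "moves right, then turns left" retrieve trajectories regardless of their absolute heading.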

MPEG-7 Visual Descriptors

Motion Activity: Motivation
• Need to capture the "pace" or intensity of activity; for example, draw a distinction between
  - "High action" segments, such as chase scenes
  - "Low action" segments, such as talking heads
• Emphasize simple extraction and matching
• Use gross motion characteristics, thus avoiding object segmentation, tracking, etc.
• Compressed-domain extraction is important

MPEG-7 Motion Activity Descriptor

Attributes used
• Intensity/magnitude: 3 bits
• Spatial characteristics: 16 bits
• Temporal characteristics: 30 bits
• Directional characteristics: 3 bits

MPEG-7 Motion Activity Descriptor

Intensity
• Expresses the "pace" or intensity of action
• Extracted by suitably quantizing the variance of motion vector magnitudes

Direction
• Expresses the dominant direction, if definable, as one of a set of eight equally spaced directions
• Extracted by averaging the angle (direction) of each motion vector
• Useful where there is strong directional motion
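The following minimal sketch covers the intensity and direction attributes, assuming the motion vectors of a shot are available as an (N, 2) array. Intensity is obtained by quantizing the standard deviation of the magnitudes. This is equivalent to quantizing the variance with rescaled thresholds. The thresholds used here are hypothetical and are not the normative MPEG-7 values. Directions are averaged as unit vectors (a circular mean), because a plain average of angles breaks at the 0/360-degree wrap-around. Function names are illustrative.

```python
import numpy as np

# Hypothetical thresholds (pixels per frame) on the spread of motion-vector
# magnitudes; the normative values are defined in the MPEG-7 standard itself.
INTENSITY_THRESHOLDS = [1.0, 3.0, 6.0, 10.0]

def activity_intensity(mvs):
    """Quantize the spread of motion-vector magnitudes into levels 1 (low) .. 5 (high)."""
    mvs = np.asarray(mvs, dtype=float)
    spread = np.hypot(mvs[:, 0], mvs[:, 1]).std()
    return int(np.searchsorted(INTENSITY_THRESHOLDS, spread, side="right")) + 1

def dominant_direction(mvs):
    """Circular mean of motion-vector angles, quantized to one of 8 directions.

    Returns 0..7 (0 = +x, counter-clockwise in 45-degree steps), or None if nothing moves.
    """
    mvs = np.asarray(mvs, dtype=float)
    moving = mvs[np.hypot(mvs[:, 0], mvs[:, 1]) > 0]
    if len(moving) == 0:
        return None
    angles = np.arctan2(moving[:, 1], moving[:, 0])
    # Average unit vectors, so 350 and 10 degrees average to 0 degrees, not 180.
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean()) % (2 * np.pi)
    return int(np.round(mean_angle / (np.pi / 4))) % 8

print(activity_intensity([[0, 0], [20, 0]]), dominant_direction([[0, 3], [0, 1]]))  # 5 2
```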

MPEG-7 Motion Activity Descriptor

Spatial distribution: using run-lengths
• Captures the size and number of moving regions in the shot on a frame-by-frame basis
• Enables distinction between shots with one large region in the middle, such as talking heads, and shots with several small moving regions, such as aerial soccer shots
• Thus "sparse" shots have many long runs, while "dense" shots do not have many long runs

Figure: runs labeled short, medium, long
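The run-length idea can be sketched as follows, assuming a per-frame matrix of motion-vector magnitudes with one value per block. Blocks below the frame's average magnitude are treated as inactive. Runs of inactive blocks are then counted in raster-scan order. The short/medium/long cut-offs (thirds of the frame width) are hypothetical choices for illustration. Function names are illustrative.

```python
import numpy as np

def zero_run_lengths(magnitudes):
    """Run lengths of inactive blocks (below the frame average) in raster-scan order."""
    m = np.asarray(magnitudes, dtype=float)
    inactive = (m < m.mean()).ravel()          # raster scan: row by row
    runs, current = [], 0
    for flag in inactive:
        if flag:
            current += 1
        elif current:                          # an active block ends the current run
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def classify_runs(runs, width):
    """Count short/medium/long runs using hypothetical cut-offs relative to frame width."""
    short = sum(r < width / 3 for r in runs)
    medium = sum(width / 3 <= r < 2 * width / 3 for r in runs)
    long_ = sum(r >= 2 * width / 3 for r in runs)
    return {"short": short, "medium": medium, "long": long_}

# Hypothetical 4x8 block grid with a single moving block: a "sparse" frame.
mags = np.zeros((4, 8))
mags[1, 3] = 8.0
runs = zero_run_lengths(mags)
print(runs, classify_runs(runs, width=8))  # [11, 20] {'short': 0, 'medium': 0, 'long': 2}
```

A frame with many small moving regions scattered across it would instead break the inactive area into many short runs.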

Video Indexing

Analysis of still images
• Features: color, texture, shape
• Distance metrics

Analysis of image sequences
• Segmentation
• Cut detection
• Motion vectors
• Shot transitions
• Camera operations
• Scene analysis
• Selection of keyframes
• Shot similarity

Figure: hierarchy video -> scenes -> shots -> frames

Camera Motion Descriptors
• Camera track, boom, and dolly motion modes
• Camera pan, tilt, and roll motion modes

Video Indexing: Multilayered Hierarchical Structure of a Video Clip

Figure copyright J. Hunter 2001, "Dublin Core and MPEG-7 Metadata for Video"

Video Indexing: Semantic Units (Hierarchy)
• Objects, regions, frames
• Shot: continuous sequence of frames captured from one camera
• Scene: one or more shots presenting different views of the same event (related in time or space)
• Segment: one or more related scenes

Transitions
• Cut: an abrupt shot change that occurs in a single frame
• Dissolve: continuous transition, a progressive linear combination of the two shots
• Fade: a slow change in brightness, usually resulting in or starting with a solid black frame
• Wipe: pixels from the second shot replace those of the first shot in a regular pattern
• Others: special effects; editing tools can offer up to 200 effects
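A dissolve is a progressive linear combination of two shots, and a fade is the special case where one side is a solid color. Both are therefore easy to synthesize, which is handy for producing test material for shot-transition detectors. The following minimal sketch assumes each shot is represented by a single still frame (in real video both shots keep moving) and uses a linear ramp. Function names are hypothetical.

```python
import numpy as np

def dissolve(shot_a_frame, shot_b_frame, n_frames):
    """Cross dissolve: frame t is (1 - a_t) * A + a_t * B, with a_t rising linearly from 0 to 1."""
    a = np.asarray(shot_a_frame, dtype=float)
    b = np.asarray(shot_b_frame, dtype=float)
    return [(1 - t) * a + t * b for t in np.linspace(0.0, 1.0, n_frames)]

def fade_out(frame, n_frames, color=0.0):
    """A fade out is a dissolve into a solid (monochrome) frame, black by default."""
    frame = np.asarray(frame, dtype=float)
    return dissolve(frame, np.full_like(frame, color), n_frames)

frames = dissolve(np.zeros((2, 2)), np.full((2, 2), 100.0), 5)
print([float(f.mean()) for f in frames])  # [0.0, 25.0, 50.0, 75.0, 100.0]
```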

Video Indexing Example

Dublin Core Metadata

Shots (description: formats)
Text: Text
Camera Distance: Controlled Vocabulary
Camera Angle: Controlled Vocabulary
Camera Motion: Controlled Vocabulary
Duration: secs, frames
Start Time: secs, frame #, SMPTE
End Time: secs, frame #, SMPTE
KeyFrame: GIF, JPEG
Lighting: Controlled Vocabulary
Open Trans: Controlled Vocabulary
Close Trans: Controlled Vocabulary

Scenes (description: formats)
Text: Text
Script: Text
Transcript: Text
Edit List: Text
Duration: secs, frames
Start Time: secs, frame #, SMPTE
End Time: secs, frame #, SMPTE
KeyFrame: GIF, JPEG
Locale: Text
Cast: Text
Object: Text
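Start and end times may be stored as seconds, frame numbers or SMPTE timecodes. A small conversion sketch, assuming non-drop-frame timecode (HH:MM:SS:FF) at an integer frame rate; drop-frame timecode for 29.97 fps video is not handled, and the function names are illustrative:

```python
def smpte_to_frame(timecode, fps):
    """Convert non-drop-frame SMPTE timecode 'HH:MM:SS:FF' to a frame index."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    if not 0 <= ff < fps:
        raise ValueError("frame field out of range for this frame rate")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frame_to_smpte(frame, fps):
    """Inverse conversion: frame index to non-drop-frame 'HH:MM:SS:FF'."""
    total_seconds, ff = divmod(frame, fps)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


print(smpte_to_frame("00:12:45:20", 25))  # 19145
print(frame_to_smpte(19145, 25))          # 00:12:45:20
```

For example, 00:12:45:20 at 25 fps is frame 19145, i.e. 765.8 seconds into the video; the same timecode at 30 fps is frame 22970.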


Reliable Shot Detection

The three most commonly used transition types are:

Abrupt cuts (hard cuts)
Fades
Dissolves


Cut Detection

Cut: a sudden change of image content between consecutive shots
Cut detection: separate the video into shots so that features can be calculated for each shot separately

[Figure: frames along a time axis, with a cut between two shots]


Shot Transitions

Fade in: change of image content from a monochrome color to an image (example: fade from white or black)

Fade out: change of image content from an image to a monochrome color (example: fade to white or black)

[Figure: fade sequence along a time axis]


What Is a Dissolve?

Dissolve: a shot transition in which the images of the two shots are overlaid

[Figure: dissolve sequence along a time axis]


Types of Dissolve

Cross dissolve

Additive dissolve


Shot Boundary Detection

Pixel differences
Statistical differences
Histograms
Compression differences
Edge tracking
Motion vectors

[Figure: example video frame with SMPTE timecode 00:12:45:20]


Pixel Differences: Basic Idea

Count the pixels whose value changes by more than a threshold $t$ between two frames. If this count is greater than a second threshold $T_b$, a shot boundary is detected.

Drawbacks

Sensitive to camera motion (pan, zoom)
Sensitive to object motion
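The basic pixel-difference test can be sketched in a few lines of NumPy, assuming 8-bit gray-scale frames of equal size; `t` and `T_b` correspond to the two thresholds above, and the function names are illustrative:

```python
import numpy as np


def changed_pixel_count(prev, curr, t):
    """Number of pixels whose gray level changes by more than t between two frames."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    diff = np.abs(curr.astype(np.int32) - prev.astype(np.int32))
    return int(np.count_nonzero(diff > t))


def is_shot_boundary(prev, curr, t, T_b):
    """Shot boundary if more than T_b pixels changed by more than t."""
    return changed_pixel_count(prev, curr, t) > T_b


a = np.zeros((4, 4), dtype=np.uint8)
b = a.copy()
b[:2, :] = 200  # the top half of the frame changes
print(changed_pixel_count(a, b, 50), is_shot_boundary(a, b, 50, 5))  # 8 True
```

Casting to a signed type before subtracting matters: with `uint8` frames, 10 - 20 would otherwise wrap around to 246 and count as a large change.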


Pixel Differences: Improvements

Basic method plus a 3x3 averaging filter applied before the comparison [Zhang93]

Divide the image into 12 regions and find the best match for each region in a neighborhood around the corresponding region in the other image. The difference is the sum of the region differences [Shahraray95]

Chromatic images: change in gray level with respect to the second image; relatively constant during dissolves and fades

Still sensitive to camera and object motion
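The first two improvements can be combined as in the sketch below: smooth both frames with a 3x3 averaging filter, then compare 12 regions using a small block-matching search. The 3x4 grid layout, the search radius of 2 pixels and the mean absolute difference used per region are assumptions made for illustration, not details taken from [Zhang93] or [Shahraray95].

```python
import numpy as np


def box3(img):
    """3x3 averaging filter; border pixels use replicated edges."""
    padded = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    h, w = np.shape(img)
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def region_difference(prev, curr, rows=3, cols=4, radius=2):
    """Sum over a rows x cols grid of the best-match mean absolute difference."""
    a = np.asarray(prev, dtype=float)
    b = np.asarray(curr, dtype=float)
    h, w = a.shape
    rh, cw = h // rows, w // cols
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * rh, c * cw
            block = a[y0:y0 + rh, x0:x0 + cw]
            best = np.inf
            # Search shifted positions of the same region in the other frame.
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = y0 + dy, x0 + dx
                    if 0 <= y and y + rh <= h and 0 <= x and x + cw <= w:
                        cand = b[y:y + rh, x:x + cw]
                        best = min(best, float(np.mean(np.abs(block - cand))))
            total += best
    return total


rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(30, 40))
panned = np.roll(frame, 1, axis=1)          # small camera pan
other = rng.integers(0, 256, size=(30, 40))  # unrelated content
print(region_difference(box3(frame), box3(panned)) < region_difference(box3(frame), box3(other)))
```

The search lets a region "follow" small camera or object motion, which is why a one-pixel pan scores far lower than a genuine content change.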


Histogram Differences

Use color or gray-scale histograms of the pixels as a feature to detect shot boundaries.
Assumption: for the same background and the same objects, the histogram changes very little.
Let $H_i(j)$ be the value of the $j$-th bin of the histogram of the $i$-th frame. The difference is then given by

$$CHD_i = \sum_j \left| H_i(j) - H_{i+1}(j) \right|$$

If the difference exceeds a threshold $T_b$, a shot boundary is detected:

$$CHD_i > T_b$$
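The histogram difference $CHD_i$ and the threshold test translate directly into code. A minimal sketch, assuming 8-bit gray-scale frames and 64 histogram bins (both assumptions); the function names are illustrative:

```python
import numpy as np


def gray_histogram(frame, bins=64):
    """Histogram H_i of an 8-bit gray-scale frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist


def chd(frames, bins=64):
    """CHD_i = sum_j |H_i(j) - H_{i+1}(j)| for each pair of consecutive frames."""
    hists = [gray_histogram(f, bins) for f in frames]
    return [int(np.abs(hists[i] - hists[i + 1]).sum()) for i in range(len(hists) - 1)]


def detect_cuts(frames, T_b, bins=64):
    """Indices i with CHD_i > T_b, i.e. a boundary between frame i and frame i+1."""
    return [i for i, d in enumerate(chd(frames, bins)) if d > T_b]


dark = np.full((10, 10), 20)
bright = np.full((10, 10), 200)
frames = [dark] * 3 + [bright] * 3
print(chd(frames), detect_cuts(frames, T_b=50))  # [0, 0, 200, 0, 0] [2]
```

Because $CHD_i$ counts pixels, a suitable $T_b$ scales with the frame size; normalizing the histograms by the number of pixels lets one threshold work across resolutions.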


Histograms: Example

[Figure: histograms of frames before and after a cut]


Histograms: Difference Graph

[Figure: histogram difference plotted over time; peaks above the threshold mark cuts]


Histogram-Based Cut Detection

Different images can have the same histogram.

[Figure: two pairs of different images with the same histogram: an obvious example and a not-so-obvious example]
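The obvious case is easy to reproduce: two images with the same number of black and white pixels, arranged differently, have identical histograms even though half of their pixels differ. A small NumPy check (the image size and bin count are arbitrary choices):

```python
import numpy as np

a = np.zeros((8, 8), dtype=int)
a[:, 4:] = 255  # right half white
b = np.zeros((8, 8), dtype=int)
b[4:, :] = 255  # bottom half white

hist_a, _ = np.histogram(a, bins=64, range=(0, 256))
hist_b, _ = np.histogram(b, bins=64, range=(0, 256))

histogram_difference = int(np.abs(hist_a - hist_b).sum())
changed_pixels = int(np.count_nonzero(a != b))
print(histogram_difference, changed_pixels)  # 0 32
```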


Histogram-Based Cut Detection: Challenges

Different images can have similar histograms

Color values of consecutive images can change significantly without a cut occurring, for example due to:

explosions

changes of scene illumination

fast movement of large objects

Performance of histogram-based cut detection: between 90 and 98 percent (the latter only in some cases)


Histogram Differences: Improvements

A coarse quantization is good enough. Typically a 6-bit code is used: the 2 high-order bits of each of the R, G and B channels.

This leads to 64-bin histograms, a good trade-off between accuracy and speed for shot boundary detection.
Selecting the threshold $T_b$ is crucial; a suitable value depends heavily on the content.
Gradual transitions: use two thresholds instead of one global threshold, one for abrupt cuts and one for special effects.
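The 6-bit quantization can be sketched as follows, assuming 8-bit RGB input in an array whose last axis holds (R, G, B); the RRGGBB bit order is an assumption, and any fixed order works equally well:

```python
import numpy as np


def color_code_6bit(rgb):
    """6-bit code from the 2 high-order bits of R, G and B (assumes values 0..255)."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    r = rgb[..., 0] >> 6  # each channel reduced to 0..3
    g = rgb[..., 1] >> 6
    b = rgb[..., 2] >> 6
    return (r << 4) | (g << 2) | b  # codes 0..63


def histogram64(rgb):
    """64-bin color histogram of an RGB frame."""
    return np.bincount(color_code_6bit(rgb).ravel(), minlength=64)


frame = np.array([[[255, 255, 255], [0, 0, 0]]])  # one white and one black pixel
print(histogram64(frame)[[0, 63]])  # [1 1]
```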


Histogram Comparison

[Figure: similarity measure versus frame number for a talk show sequence; frames shown include 405, 459, 810, 972 and 1026, with similarity values 0.4264, 0.4298, 0.1602 and 0.0383]

Copyright Philips (MPEG-7 contribution)


Histogram Differences: Twin-Comparison Method

Compute $CHD_i$ for all frames in the video.
Mark camera breaks wherever $CHD_i > T_b$.
Mark potential gradual transition subsequences $[F_s, F_e]$ wherever $CHD_i > T_s$, where $T_s < T_b$ is a lower threshold.
For each potential gradual transition $[F_s, F_e]$, accumulate the frame-to-frame differences $CHD_i$ over the subsequence into $AC$. If $AC > T_b$, declare $[F_s, F_e]$ a gradual transition and add it to the set $GT$.
This algorithm works well and is widely used.
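A compact sketch of the twin-comparison logic, operating on a precomputed series where `chd[i]` is the difference between frames i and i+1. It simplifies under stated assumptions: a candidate ends as soon as the difference drops to $T_s$ or below, and $AC$ is the plain sum of the frame-to-frame differences inside the candidate. The function name is illustrative.

```python
def twin_comparison(chd, T_b, T_s):
    """Return (cuts, gradual): cut indices i and gradual transitions (F_s, F_e)."""
    cuts, gradual = [], []
    i, n = 0, len(chd)
    while i < n:
        if chd[i] > T_b:                 # camera break
            cuts.append(i)
            i += 1
        elif chd[i] > T_s:               # start of a potential gradual transition
            start, acc = i, 0.0
            while i < n and T_s < chd[i] <= T_b:
                acc += chd[i]            # accumulate frame-to-frame differences
                i += 1
            if acc > T_b:                # AC > T_b: declare a gradual transition
                gradual.append((start, i))  # differences start..i-1 span frames start..i
        else:
            i += 1
    return cuts, gradual


series = [1, 1, 3, 3, 3, 3, 1, 1, 20, 1]
print(twin_comparison(series, T_b=10, T_s=2))  # ([8], [(2, 6)])
```

No single value in the run 3, 3, 3, 3 exceeds $T_b = 10$, but their sum of 12 does, which is exactly the case a single global threshold misses.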


IBM’s CueVideo Shot Boundary Detection

[Figure: example video frame with SMPTE timecode 00:12:45:20]

Detects cuts, dissolves, fades and other gradual changes
Compares multiple pairs of frames: 1, 3 and 7 frames apart
Processes decoded frames
Supports MPEG, QT, AVI, live feed, …
No user-tuned parameters, which allows batch processing
Detection of flashes and bad frames
One pass, which allows live video processing

Copyright IBM Almaden
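CueVideo's internal algorithm is not described here, but the benefit of comparing frames several apart can be illustrated. In the hypothetical sketch below (not IBM's implementation), the per-frame histogram difference of a linear dissolve is small, while differences taken 3 or 7 frames apart grow with the distance; a cut produces the same jump at every distance. The 2-bin histograms are synthetic.

```python
import numpy as np


def lagged_differences(hists, lags=(1, 3, 7)):
    """d_k[i] = sum_j |H_i(j) - H_{i+k}(j)| for each lag k (frames k apart)."""
    h = np.asarray(hists, dtype=float)
    return {k: np.abs(h[:-k] - h[k:]).sum(axis=1) for k in lags}


# Linear dissolve: 2-bin histograms moving from [100, 0] to [0, 100] over 11 frames.
dissolve = [[100 - 10 * t, 10 * t] for t in range(11)]
# Cut: five frames with one histogram, then five with another.
cut = [[100, 0]] * 5 + [[0, 100]] * 5

for name, hists in (("dissolve", dissolve), ("cut", cut)):
    d = lagged_differences(hists)
    print(name, {k: float(v.max()) for k, v in d.items()})
# dissolve {1: 20.0, 3: 60.0, 7: 140.0}
# cut {1: 200.0, 3: 200.0, 7: 200.0}
```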


CueVideo Histogram Example:


Edge Change Ratio (ECR)

Properties

$s_i$ and $s_{i-1}$: number of edge pixels in images $i$ and $(i-1)$

$E^{out}_{i-1}$: exiting edge pixels, i.e. pixels that are edge pixels in image $(i-1)$ but not in image $i$

$E^{in}_i$: entering edge pixels, i.e. pixels that are not edge pixels in image $(i-1)$ but are edge pixels in image $i$

Broad edges are used to make the measure robust to noise

Edge change ratio between images $i$ and $(i-1)$:

$$ECR_i = \max\left(\frac{E^{in}_i}{s_i}, \frac{E^{out}_{i-1}}{s_{i-1}}\right)$$
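A minimal NumPy sketch of $ECR_i$, assuming gray-scale frames. The edge detector is an assumption (a gradient-magnitude threshold standing in for a proper detector such as Canny), broad edges are obtained by dilating with a 3x3 square, and the value returned when a frame has no edge pixels is a convention chosen here. The threshold test $ECR_i \geq T$ is included as `ecr_cuts`; all names are illustrative.

```python
import numpy as np


def edge_map(img, thresh=30.0):
    """Crude edge map: gradient magnitude above thresh (stand-in for e.g. Canny)."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy) > thresh


def dilate(mask, r=1):
    """Binary dilation with a (2r+1) x (2r+1) square: the 'broad' edges."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def ecr(prev, curr, thresh=30.0, r=1):
    """Edge change ratio ECR_i between image i-1 (prev) and image i (curr)."""
    e_prev, e_curr = edge_map(prev, thresh), edge_map(curr, thresh)
    s_prev, s_curr = int(e_prev.sum()), int(e_curr.sum())
    if s_prev == 0 or s_curr == 0:
        # Assumed convention: no change if both are edgeless, full change otherwise.
        return 0.0 if s_prev == s_curr else 1.0
    e_in = int(np.count_nonzero(e_curr & ~dilate(e_prev, r)))   # entering edge pixels
    e_out = int(np.count_nonzero(e_prev & ~dilate(e_curr, r)))  # exiting edge pixels
    return max(e_in / s_curr, e_out / s_prev)


def ecr_cuts(frames, T=0.5, **kw):
    """Frame indices i with ECR_i >= T (a cut between frame i-1 and frame i)."""
    return [i for i in range(1, len(frames)) if ecr(frames[i - 1], frames[i], **kw) >= T]


vertical = np.zeros((20, 20)); vertical[:, 10:] = 200
shifted = np.zeros((20, 20)); shifted[:, 11:] = 200      # edge moved by one pixel
horizontal = np.zeros((20, 20)); horizontal[10:, :] = 200  # different content
print(ecr(vertical, shifted), ecr_cuts([vertical, shifted, horizontal]))  # 0.0 [2]
```

The dilation is what keeps the one-pixel shift from registering as a change: every edge pixel of the shifted frame still falls inside the broadened edges of the previous one.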


Computation of ECR: Example

[Figure: images (i-1) and i, their edge images, and the inverted edge images; combining them with AND yields the ECR images of entering edge pixels $E^{in}_i$ and exiting edge pixels $E^{out}_{i-1}$]


ECR Cut Detection

[Figure: ECR (D) plotted over time for five cases: inside a shot, cut, fade out, fade in, and dissolve]


ECR Cut Detection: Cuts

If $ECR_i$ is the edge change ratio between frames $i$ and $(i-1)$, a cut is detected if

$$ECR_i \geq T$$

where $T$ is a threshold.

Fast object and camera motion lead to high ECR values without cuts.

[Figure: ECR values over time, with peaks at cuts]


ECR Cut Detection: Fade In, Fade Out

Fade out: the number of edge pixels is zero after the last frame of the sequence

Fade in: the number of edge pixels is zero before the first frame of the sequence

[Figure: ECR patterns for a fade in and a fade out]
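Given per-frame edge-pixel counts $s_i$, the two rules above reduce to finding where the count reaches or leaves zero. A minimal sketch (the function name is hypothetical); on its own it cannot distinguish a fade from a cut to a black frame, so in practice it would be combined with the ECR pattern over the neighboring frames:

```python
def fade_boundaries(edge_counts):
    """Locate fades from per-frame edge-pixel counts s_i.

    Returns (fade_out_ends, fade_in_starts): the first edgeless frame after a
    fade out, and the first frame with edges after an edgeless run (fade in).
    """
    fade_out_ends, fade_in_starts = [], []
    for i in range(1, len(edge_counts)):
        if edge_counts[i - 1] > 0 and edge_counts[i] == 0:
            fade_out_ends.append(i)
        elif edge_counts[i - 1] == 0 and edge_counts[i] > 0:
            fade_in_starts.append(i)
    return fade_out_ends, fade_in_starts


print(fade_boundaries([50, 40, 20, 5, 0, 0, 3, 15, 40]))  # ([4], [6])
```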


ECR Cut Detection: Problems

Fast object or camera motion

Explosions

Fades and dissolves: soft transitions are difficult to detect

Other effects: wipe detection is unreliable

Performance: typically between 90 and 95 percent


Shot-Boundary Detection: Conclusions

Histogram-based techniques are good at recognizing cuts

Standard-deviation techniques are good at recognizing fades

Dissolves are the most challenging

Problems

Ground truth: experimental data must be analyzed manually

Databases? Benchmarks?

Definition of a fade or dissolve
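One way to read the standard-deviation remark: a fade to a monochrome frame drives the standard deviation of the pixel intensities towards zero. The sketch below assumes exactly that, and flags a monochrome frame as the end of a fade out only if the standard deviation decreased strictly over the preceding `min_len` frames; `min_len`, `eps` and the function names are assumptions.

```python
import numpy as np


def monochrome_frames(frames, eps=1.0):
    """Indices of frames whose pixel standard deviation is (nearly) zero."""
    return [i for i, f in enumerate(frames) if float(np.std(f)) <= eps]


def fade_out_candidates(frames, min_len=3, eps=1.0):
    """Monochrome frames preceded by min_len frames of strictly decreasing std."""
    stds = [float(np.std(f)) for f in frames]
    result = []
    for i, s in enumerate(stds):
        if s <= eps and i >= min_len:
            window = stds[i - min_len:i + 1]
            if all(a > b for a, b in zip(window, window[1:])):
                result.append(i)
    return result


base = np.tile(np.arange(0.0, 100.0, 10.0), (10, 1))
fade = [base * k for k in (1.0, 0.75, 0.5, 0.25, 0.0, 0.0)]
cut_to_black = [base, base, base, np.zeros((10, 10))]
print(fade_out_candidates(fade), fade_out_candidates(cut_to_black))  # [4] []
```

Requiring a decreasing run is what separates a fade from a hard cut to a black frame, which would otherwise look identical at the final frame.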



Text Detection: Applications

Annotation and search of image and video libraries

TV, movie studios, advertising, and surveillance

Automatic identification and logging of the beginning and end of key events based on captions
Video summarization
Ticker tape analysis
Commercial detection
Sports program indexing


Text Detection: Design Decisions

What kind of text occurrences? Scene text, overlay text

With what style attributes? Font size, font type, text color

In what kind of media data? Image-based, video-based

What should be achieved? Localization, segmentation, recognition

How will the results be used? Indexing, object-based video encoding

any

both


Example: MPEG-4 Text Extraction

Locate text of any size at any position in images, web pages and videos
Segment and recognize text
Encode the extracted text as a rigid foreground object in MPEG-4 (with Yen-Kuang Chen)

[Figure: PSNR of the Y component (27.5 to 31.5) versus bit rate (160 to 195 kbit/s) for single-VOP and multiple-VOP encoding]


Example:

[Figure: video frame containing overlay text]

OCR result: Dec 25 1998


Text Detection Example - Latin Script


Text Detection: Korean Script Example


Text Extracted from Video



Face Detection


Pool of Features

About 130,000 features for a 24x24 window
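Where such a large number comes from can be seen by counting the placements of a single upright rectangle prototype, scaled independently in x and y, inside a 24x24 window. The sketch below assumes integer scaling and upright prototypes only; the slide's total of about 130,000 sums over several prototypes, which this sketch does not enumerate. The function name is illustrative.

```python
def count_upright_features(proto_w, proto_h, window=24):
    """Placements of one upright prototype of base size proto_w x proto_h,
    scaled by independent integer factors in x and y, inside the window."""
    total = 0
    for w in range(proto_w, window + 1, proto_w):        # widths proto_w, 2*proto_w, ...
        for h in range(proto_h, window + 1, proto_h):    # heights proto_h, 2*proto_h, ...
            total += (window - w + 1) * (window - h + 1)  # possible top-left corners
    return total


print(count_upright_features(2, 1), count_upright_features(3, 1))  # 43200 27600
```

A single two-rectangle edge prototype already yields 43,200 features, so a handful of prototypes quickly reaches the six-figure pool size.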


Rapid Computation

[Figure: two integral-image diagrams, each indexed by (x, y)]

Rainer Lienhart, Jochen Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP 2002, pp. 900-903, Sep. 2002.
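The rapid computation rests on an integral image (summed-area table): after one pass over the image, the sum over any upright rectangle takes four lookups, so a rectangle feature costs the same at every scale. A minimal sketch of the upright case only (the rotated variant in the cited paper is not shown); `two_rect_feature` is a hypothetical edge-type feature:

```python
import numpy as np


def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; an extra zero row and column simplify lookups."""
    img = np.asarray(img)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical edge-type feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = np.arange(12).reshape(3, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2), two_rect_feature(ii, 0, 0, 4, 3))  # 30 -12
```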


Cascade of Classifiers

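Premise

The size of the feature pool (>100,000) exceeds what any reasonable classifier can handle.
A cascade of classifiers (a special kind of decision tree) can outperform a single-stage classifier, because it can use more features at the same computational complexity.
Use boosting (Discrete, Real or Gentle AdaBoost, LogitBoost).

[Figure: an input pattern passes through stages 1, 2, ..., N and is accepted as an object only if it passes every stage. Each stage passes about 99.8% of objects and 50% of non-objects, so the cumulative rates are P(x|o) = .998 and P(x|¬o) = .5 after stage 1, P(x|o) = .998^2 = .996 and P(x|¬o) = .5^2 after stage 2, and P(x|o) = .998^N ≈ .90 and P(x|¬o) = .5^N after stage N. Correspondingly, about .002, .004 and ~.1 of the objects have been rejected after stages 1, 2 and N.]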
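The arithmetic above can be checked directly, assuming every stage has the same rates (0.998 for objects, 0.5 for non-objects). The second function also illustrates why a cascade is cheap: a non-object pattern reaches stage k only with probability $0.5^{k-1}$, so on average fewer than two stages are evaluated for background, however long the cascade. The names are illustrative.

```python
def cascade_rates(n_stages, hit=0.998, false_alarm=0.5):
    """Overall (detection rate, false-positive rate) of n identical stages in series."""
    return hit ** n_stages, false_alarm ** n_stages


def expected_stages_background(n_stages, false_alarm=0.5):
    """Expected number of stages evaluated for a non-object pattern.

    Stage k (k = 1..n) runs only if the pattern passed stages 1..k-1,
    which happens with probability false_alarm ** (k - 1).
    """
    return sum(false_alarm ** k for k in range(n_stages))


for n in (1, 2, 20, 50):
    hit, fa = cascade_rates(n)
    print(n, round(hit, 4), fa, round(expected_stages_background(n), 4))
```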


Cascade Concept

[Figure: the target concept surrounded by background regions that are removed successively in stages 1 to 5]


Thank you for your attention



Edge Detection

Basic idea: consider the 1st and 2nd derivatives across an edge. The position of the edge can be estimated from the maximum of the 1st derivative or from the zero crossing of the 2nd derivative.
Generalize the technique to calculate the derivative of a two-dimensional image.
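In one dimension both estimates are easy to compute with finite differences. A minimal sketch on a synthetic smooth step (a logistic curve centered at 9.5, an assumption chosen so the true edge position is known); both functions should return values close to 9.5. The function names are illustrative.

```python
import numpy as np


def edge_from_first_derivative(signal):
    """Edge position = location of the largest |1st derivative|."""
    d1 = np.diff(signal)  # d1[i] approximates the slope at position i + 0.5
    return int(np.argmax(np.abs(d1))) + 0.5


def edge_from_zero_crossing(signal):
    """Edge position = zero crossing of the 2nd derivative, linearly interpolated."""
    d2 = np.diff(signal, n=2)  # d2[i] approximates the curvature at position i + 1
    for i in range(len(d2) - 1):
        if d2[i] * d2[i + 1] < 0:  # sign change between positions i + 1 and i + 2
            return (i + 1) + d2[i] / (d2[i] - d2[i + 1])
    return None


x = np.arange(20, dtype=float)
step = 1.0 / (1.0 + np.exp(-(x - 9.5)))  # smooth step edge centered at 9.5
print(edge_from_first_derivative(step), edge_from_zero_crossing(step))  # both close to 9.5
```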


Canny Edge Detector

Designed to be an optimal edge detector (according to particular criteria). It takes a gray-scale image as input and produces as output an image showing the positions of tracked intensity discontinuities.


Canny Edge Detector

Multi-stage process:

The image is smoothed by Gaussian convolution.

A simple 2-D first-derivative operator highlights regions of the image with high first spatial derivatives.

Edges give rise to ridges in the gradient magnitude image. The algorithm tracks along the top of these ridges and sets to zero all pixels that are not actually on a ridge top (non-maximal suppression).

The tracking process exhibits hysteresis, controlled by two thresholds (a high and a low one).
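The hysteresis step can be sketched on its own, assuming a gradient-magnitude image that has already been smoothed and thinned by non-maximal suppression (those stages are not shown). Pixels above the high threshold are kept, and pixels above the low threshold are kept only if they are 8-connected to a kept pixel; the threshold values in the example are arbitrary and the function name is illustrative.

```python
from collections import deque

import numpy as np


def hysteresis(magnitude, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) connected to them."""
    strong = magnitude >= high
    weak = magnitude >= low
    keep = strong.copy()
    h, w = magnitude.shape
    queue = deque(zip(*np.nonzero(strong)))  # start tracking from every strong pixel
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True  # weak pixel connected to an edge: keep it
                    queue.append((ny, nx))
    return keep


mag = np.array([[0, 5, 5, 5, 9],
                [0, 0, 0, 0, 0],
                [5, 5, 0, 0, 0]], dtype=float)
print(hysteresis(mag, low=4, high=8).astype(int))
```

The weak run in the top row survives because it touches the strong pixel at its end, while the equally weak pixels in the bottom row are discarded as isolated noise.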