Dulce Ponceleon
Searching Video Collections: Representation, Indexing, Browsing and Evaluation
Part I
Universidad de Chile December 2002
Searching Video Collections: Overview
Part I
• Introduction to Multimedia Information Retrieval
• Multimedia Representation
• Multimedia Indexing
Part II
• Audio Analysis
• Speech Indexing
• Query Formulation
• Multimedia Retrieval
Part III
• Browsing
• Distribution/Streaming
• Evaluation
• Multimedia IR Applications
• Conclusions
Searching Video Collections: Part I
• Introduction to Multimedia Information Retrieval
• Multimedia Representation
  • Visual Features (Still Images and Image Sequences): Color, Texture, Shape, Edges, Objects, Motion
• Multimedia Indexing
  • Video Segmentation: Shot-Boundary Detection, Effects Detection
  • Beyond Basic Visual Features: Text, Face
What is Multimedia?
• Unstructured data types: text, images, audio, video
  • Different from DBMS structured records
  • Name: <s>, Sex: <s>, Age: <I>, SSN: <I>…
• Structure in unstructured data
  • All unstructured data has content
  • Typically also has associated metadata
  • Text has layout and logical structure
  • Multimedia has complex spatial, temporal, and semantic structure
History: From Text IR to MMIR
• Library of Alexandria (3rd century BC): 500,000 volumes, catalogues, classification
• First concordance of the Bible (13th century AD)
• Printing press (15th century)
• Johnson’s dictionary (1755)
• Dewey Decimal classification (1876)
• Punched card retrieval (1930’s)
• Luhn describes statistical retrieval/abstracting (1959)
• MEDLINE (1964, goes on-line in 1971)
*Adapted from a presentation © Bruce Croft
History: From Text IR to MMIR
• Cranfield effort defines evaluation (1966)
• DIALOG from Lockheed (1967)
• Salton’s book about SMART and IR (1968): discusses many techniques that are used today
• Relevance ranking available (late 80’s)
• Large-scale probabilistic system (West, 1992)
• Google, Search Engines (1996)
Do you use Google?
Do you use Google once a day?
Do you use Google 10 times a day?
Do you ?
Do you Image ?
Do you use Google image search?
More than once a day?
Do you video Google?
What comes to mind when we say MM Retrieval?
Keanu Reeves avoiding bullets
Helicopter Crash
i.e. Hollywood’s Multimedia Retrieval
Little Value in Indexing Published Content (?)
• Publishing implies
  • High production effort
  • Broad appeal
• Easy to manually annotate (once)
  • Somebody edits in a dissolve => they can add manual annotation
• No demand
  • People aren’t clamoring for image retrieval
  • “Give me Rock Hudson washing up on beach”
An MM Indexing Product
Bare Facts Video Guide
Indexes nudity in Hollywood videos
Very specialized
Kundi (.com-era startup)
• Hot Now button: a user identifies important content
• Voting/moderation scheme (à la /.)
• Notifications shared with other users
HotNow!
World is not Bleak!
• Cameras everywhere!
  • Number of sensors doubling every year
  • Fixed (webcams)
  • Mobile (on your person)
• Security
• Customer Relationship Management
Security
• Important in new world
• Find “interesting” events
  • Look for anomalies
  • Name that event
  • Look for secondary events
Customer Relationship Management
• High value in customization
• Imagine camera at store entrance
• Can we determine gender?
  • Suggest sale item in men’s clothing
• Can we recognize previous customer?
  • Probably not well enough
Text vs. Multimedia
Properties of Multimedia
1. Visual Components
2. Spatial Components
3. Temporal Components
4. Ease of data entry
5. Well defined interaction unit?
6. Well defined semantic unit?

Data Type   1   2   3   4                5   6
Text        Y   Y   N   Easy             Y   Y
Image       Y   Y   N   Difficult        N   N
Audio       N   N   Y   Difficult        N   N
Video       Y   Y   Y   Very Difficult   N   N
What Type of Queries would you like to Answer?
• Downhill skiing [Foote99, Over01]
• Scenes that include space shuttle launching
• Scenes with a yellow boat, pink flower
• People on the beach
• Speaker talking in front of the US flag
• Corn on the cob in a field
• Impact of heavy airliner landing on runways
What Type of Queries CAN you Answer Today?
• Use a sample image or video clip [Flickner95]
• Use basic art tools to express “a red object moving from the upper left to the lower right corner on a white background” [Dimitrova94, Chang98, Smith96, Yining98]
• Not at the semantic level desired
Two Fundamental Multimedia Retrieval Paradigms
Expression-based retrieval aka Query-by-Example [Foote99]
Semantic-based Retrieval based on automatically extracted metadata or manually annotated metadata [Barnard01]
What is Content Analysis?
• Analysis of low-level features: basic features, physical properties
• Semantics for high-level abstractions
• Special algorithms borrowing from several disciplines
• Use of all media available
• Related areas: signal processing, computer vision, speech recognition, pattern and image recognition, OCR, natural language, audio analysis
Query Based Multimedia IR System Overview
[Diagram: the user's information need is represented as a query (audio-visual or text); the multimedia content is represented as indexed multimedia content; the system retrieves and computes similarity, ranks the results, and the results are evaluated.]
Similarity-based Image Search
• Manual annotations are far from suitable
  • Subjective; feasible?
  • A picture is worth … how many keywords?
  • A picture with no keywords: how much is it worth?
• Typical automatic procedure
  • Use features to characterize images
  • Store feature vectors
  • Enable the user to start (limited!) semantic queries
  • Yield a set of resulting images based on distance of features
  • Smallest distance represents the best match
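The automatic procedure above can be sketched in a few lines. This is a minimal illustration, assuming each image is already described by a fixed-length feature vector and that Euclidean distance is an acceptable distance measure; the function name `rank_by_distance` and the toy vectors are hypothetical.

```python
import numpy as np

def rank_by_distance(query, features):
    """Return database indices sorted from best to worst match."""
    query = np.asarray(query, dtype=float)
    features = np.asarray(features, dtype=float)
    # One Euclidean distance per stored feature vector.
    distances = np.linalg.norm(features - query, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances[order]

# Toy database: three images, each described by a 3-bin color histogram.
database = np.array([[0.9, 0.1, 0.0],
                     [0.2, 0.7, 0.1],
                     [0.8, 0.2, 0.0]])
order, dists = rank_by_distance([0.9, 0.1, 0.0], database)
print(order, dists)
```

The smallest distance comes first, so `order[0]` is the best match.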
Analysis of Picture Sequences
• Goals
  • Recognition of objects
  • Recognition of camera motion
• Features
  • Object motion
  • Hints to semantics
• Example: motion vs. non-motion sequences
• Recognition of motion in combination with segmentation
  • Tracking of object boundaries in subsequent frames yields higher segmentation performance than use of still images.
Visual Features in Multimedia
• Color, Color, Color
• Texture
• Shape
• Edges
• Object outline
  • Foreground vs. background
  • Edge detection
• Motion trajectories
• Higher-level semantics
• Multimodal
Audio Features in Multimedia
• Features depend on audio category
  • Speech
  • Music
  • Sounds (e.g. explosions, street noise, etc.)
• Features
  • Energy, loudness
  • Pitch
  • Cepstral coefficients
  • Beat
  • Harmonics
Visual Features: Color
• Introduction
• Color Models
• Color Representations
• Color Features
• Similarity Measures
What is Color?
• It is a perceptual phenomenon
• Each color corresponds to a narrow band of wavelengths within the electromagnetic spectrum
• Visible wavelengths: 400 – 700 nm range
  • 400 – 480 nm is blue, ~520 nm is green, 600 – 700 nm is red
• Human eye can distinguish 400,000 colors
• < 400 nm: ultraviolet and X-rays
• > 700 nm: infrared, microwaves, FM radio, TV, AM radio, etc.
Visual Features: Color
The Color Phenomenon
• The dominant wavelength of a light is called hue
• The intensity (energy) of a light is called luminance or brightness
• The amount of pure light (pink vs. red) is called saturation or purity
• Collectively, hue and saturation are referred to as chromaticity
Color Retrieval
• It is a global feature
• Independent of view and resolution
• No object/background segmentation is required
• Can handle deformation of object
• Can handle articulated object
• Color coherence
• Color layout
• Drawbacks: color constancy
Color Space
• The RGB model
  • (red, green, blue) at different intensities
  • Used for active devices
• The YIQ model
  • Developed by NTSC and used for the first color TV broadcast in 1953
  • To be compatible with black & white TV
  • Luminance signal: Y = 0.30 R + 0.59 G + 0.11 B
  • Two color difference signals:
    I = 0.60 R - 0.28 G - 0.32 B
    Q = 0.21 R - 0.52 G + 0.31 B
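The RGB to YIQ conversion is a single matrix multiplication. The sketch below uses the rounded NTSC coefficients Y = (0.30, 0.59, 0.11), I = (0.60, -0.28, -0.32) and Q = (0.21, -0.52, 0.31), chosen so that both color difference rows sum to zero; the function name `rgb_to_yiq` is hypothetical and the input is assumed to be RGB values in the range 0 to 1.

```python
import numpy as np

# Rows: Y (luminance), I and Q (color differences), rounded NTSC coefficients.
RGB_TO_YIQ = np.array([
    [0.30,  0.59,  0.11],
    [0.60, -0.28, -0.32],
    [0.21, -0.52,  0.31],
])

def rgb_to_yiq(rgb):
    """Convert an array of shape (..., 3) from RGB to YIQ."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb @ RGB_TO_YIQ.T

white = rgb_to_yiq([1.0, 1.0, 1.0])
red = rgb_to_yiq([1.0, 0.0, 0.0])
print(white, red)
```

A gray or white pixel carries no color difference signal, which is what made the scheme compatible with black & white receivers: they only use Y.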
Color Space Linearity
• For color retrieval we need a measure of color difference
• RGB color space: each color is a triple (r, g, b)
• Drawbacks
  • It is not designed for humans
  • Mainly used for active display monitors
  • It is perceptually non-linear
• A linear color space is needed which corresponds to our perception
The CIE Color Space
• There are several linear color spaces used in the color industry for quality control, such as L u v
• These are non-linear transformations of the RGB space
• It is device independent
• Euclidean distance can be used as a measure of similarity
• Empirical studies show that this is very close to human perception of color difference
Color Model towards Image Representation
• Digital image: 2D array of pixels
  • 2D array of intensities: binary (1 bit/pixel), grayscale (8 bits/pixel) or color (24 bits/pixel)
  • 2D array of codes: code corresponds to RGB triple

134 135 132  12  15 ...
133 134 133 133  11 ...
130 133 132  16  12 ...
137 135  13  14  13 ...
140 135 134  14  12 ...
Color Modeled as Blocks
• Divide into 8x8 blocks and convert RGB to YUV
  • Luminance (Y) and chrominance (Cb, Cr)
  • Blue color difference Cb
  • Red color difference Cr
• Only half resolution needed for chrominance
Discrete Cosine Transform
• Transform each block of 8x8 samples into a block of 8x8 spatial frequency coefficients
• Energy tends to be concentrated into a few significant coefficients
• Other coefficients are close to zero
DCT Basis
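The energy compaction property can be checked numerically. The sketch below builds an orthonormal DCT-II basis matrix with NumPy and applies it to rows and columns of a square block; the helper names `dct_matrix`, `dct2` and `idct2` are hypothetical, and the orthonormal scaling is an assumption (codecs may scale differently).

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis function."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    x = np.arange(n).reshape(1, -1)   # sample index
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row has its own scale factor
    return c

def dct2(block):
    """2-D DCT of a square block (transform columns, then rows)."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs):
    """Inverse of dct2; valid because the basis matrix is orthonormal."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

flat = np.full((8, 8), 100.0)         # a uniform block
coeffs = dct2(flat)
print(np.round(coeffs[:2, :2], 6))
```

For a uniform block all the energy ends up in the single DC coefficient at position (0, 0), and every other coefficient is zero up to rounding error.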
Color and Color Mappings
Copyright by Smith & Chang 1996
RGB, HSV
Color Sets = binary vector representing color (good for regional color)
Color Representations
• Pair-wise
  • Represents color with a matrix of pixels
  • Computes changes at corresponding pixel locations
  • Advantage: it considers spatial location
  • Disadvantage: too low level, dependent on image size, not a concise representation
• Histogram color representation
  • Linearly re-quantize the contents into N levels
  • Simple method, used for video segmentation
• Cluster color representation
Color Similarity
• For (L, u, v) space, we can use the Mahalanobis distance, where
  • Data = color
  • Correlation = perceptual similarity
• For HSV space, similarity is derived from the distance in the cylindrical HSV color space
• Histogram quadratic distance:
  • Introduced in QBIC project (IBM 1993)
  • Provides better similarity than “like-bin” comparison
  • Computationally expensive
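To see why a quadratic distance can beat a like-bin comparison, consider the usual form d(h, g) = (h - g)^T A (h - g), where A holds the similarity between bin colors. The sketch below is an illustration under assumptions: A is built as 1 - d_ij / d_max from Euclidean distances between hypothetical bin colors, and all function names are invented for this example.

```python
import numpy as np

def similarity_matrix(bin_colors):
    """a_ij = 1 - d_ij / d_max, from pairwise distances between bin colors."""
    bin_colors = np.asarray(bin_colors, dtype=float)
    diff = bin_colors[:, None, :] - bin_colors[None, :, :]
    d = np.linalg.norm(diff, axis=2)
    return 1.0 - d / d.max()

def quadratic_distance(h1, h2, a):
    """Histogram quadratic distance (h1 - h2)^T A (h1 - h2)."""
    z = np.asarray(h1, dtype=float) - np.asarray(h2, dtype=float)
    return float(z @ a @ z)

# Three bins whose representative colors are red, orange and blue.
A = similarity_matrix([[1.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 0.0, 1.0]])
all_red, all_orange, all_blue = [1, 0, 0], [0, 1, 0], [0, 0, 1]
d_red_orange = quadratic_distance(all_red, all_orange, A)
d_red_blue = quadratic_distance(all_red, all_blue, A)
print(d_red_orange, d_red_blue)
```

A like-bin comparison treats both pairs as equally different, since none of the bins overlap, while the quadratic form rates the red and orange images as much closer. The cost is the matrix product, which grows with the square of the number of bins.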
What is Texture
• It is a perceptual phenomenon
• It is a region phenomenon (not a point phenomenon)
• Depends a lot on the scale
• Repeating patterns of local variations in image intensity which are too fine to be distinguished as a separate object
Visual Features: Texture
• Approaches
  • Statistical (coarseness, directionality, contrast) [Tamura78, Liu96]
  • Spectral [Ma96]
• Should be invariant to intensity, scale, orientation
• Natural scenes are challenging
Query Image
MIT’s Photobook Texture Matching
Tamura Texture Feature
• Primary features
  • Contrast: related to picture quality, sharpness
  • Coarseness: coarse-grained vs. fine-grained
  • Directionality
• Secondary features
  • Line-likeness (line-like vs. blob-like)
  • Regularity
  • Roughness
What is Shape?
• It is also a perceptual phenomenon
• A 2D shape descriptor should be invariant to translation, scale changes, rotation
Measures:
Visual Features: Shape
• Boundary-based approach: use contours, ignore interior
• Region-based approach: use interior details (holes, etc.) besides boundary details
• Can we reconstruct the object from the shape descriptors?
Shape Techniques Overview
[Diagram: taxonomy of shape description techniques; the labels below are listed in the order they appear on the slide]
Shape Description: Boundary Based, Region Based
Spatial Domain, Transform Domain
Structural, Geometric
Partial, Complete
Corner Points, Chain Points, Shape Numbers, Perimeter, Area, Elongation, Compactness, Fourier Descriptors
Contour Segments, Breakpoints
Areas, Holes, Euler Number, Moment Invariants, Zernike Moments, Compactness, Elongation, Symmetry
Primitives, Rules, 2D Strings
Hough Transformation, Walsh Transform, Wavelet Transform
Region-Based Shape & Texture Matching
MIT’s Photobook:
FourEyes
Visual Features: Motion
• Align two images to achieve the best match.
• Determine motion between sequence images
Copyright Lucas & Kanade
Motion Field
Optical Flow
• Real-world object motions are transformed into color changes in images
• Efficient computation of motion vectors: use gray-value images
• Optical flow: motion of gray-value patterns in the image plane
  • First step: calculate the motion vector of each gray-value pixel
  • Second step: calculate a continuous vector field (interpolation)
Optical Flow ...
• Constraints
  • Both steps use constraints
  • Both steps introduce motion vector failures
• Approaches
  • Differential techniques (derivatives of gray values)
  • Correlation-based techniques (correlation of regions)
  • Energy-based techniques (velocity filters)
  • Phase-based techniques (phase dependence with regard to band pass filters)
Optical Flow: Examples
[Figure panels: original, needle, flicker]
Optical Flow: Problems
• Correspondence problem
• Aperture problem, and solution of the aperture problem
• Other problems
  • Deformable objects
  • Periodical structures
[Figures: example frames at t0 and t1 illustrating each problem]
• Optical flow is an unreliable feature for content analysis!
Aperture Problem
Motion Estimation: Examples
[Figure panels: block-based, region-based, pixel-based]
Pixel-based Motion Vector in Video Compression
Motion Vectors
• Modern compression algorithms for video calculate motion vectors for pixel blocks (examples: MPEG-1, MPEG-2, H.261, H.263).
• Block motion can be used to detect camera operations, but cannot be used to analyze object motion.
• Advantage: motion vectors are available without expensive calculation if encoder/decoder information is used
Example
Motion Vectors
Example: famous MPEG test clip
Displacement Vectors
Velocity vector (flow vector)
ASSUMPTION: for a small time interval, velocity is constant
Local Motion: Motion Trajectory Extraction
• Object tracking through motion estimation
  • In spatial domain, 2D or 3D
  • In compressed domain, using motion vectors
• Trajectory representation using symbolic or analytical notation
Trajectory Representation and Retrieval
a) Trajectory motion pattern
b) B-Spline curve
c) Chain code
d) Differential chain code
[Dimitrova94]
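As an illustration of the two symbolic notations, the sketch below encodes a trajectory given as unit steps on a grid. It assumes the common Freeman 8-direction convention (0 = east, counted counter-clockwise, with y growing upward); the slide does not state which convention the cited work uses, and the function names are hypothetical.

```python
# Freeman 8-direction codes: 0 = east, then counter-clockwise (assumed convention).
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    """Encode consecutive unit steps between grid points as direction codes."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def differential_chain_code(codes):
    """Encode each change of direction (mod 8) instead of the direction itself."""
    return [(b - a) % 8 for a, b in zip(codes, codes[1:])]

trajectory = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
codes = chain_code(trajectory)
diffs = differential_chain_code(codes)
print(codes, diffs)
```

The differential code does not change when the whole trajectory is rotated by a multiple of 45 degrees, which can help when matching motion patterns regardless of orientation.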
MPEG-7 Visual Descriptors
Motion Activity: Motivation
• Need to capture “pace” or intensity of activity
• For example, draw distinction between
  • “High action” segments such as chase scenes
  • “Low action” segments such as talking heads
• Emphasize simple extraction and matching
• Use gross motion characteristics, thus avoiding object segmentation, tracking etc.
• Compressed domain extraction is important
MPEG-7 Motion Activity Descriptor
Attributes used
• Intensity/Magnitude: 3 bits
• Spatial Characteristics: 16 bits
• Temporal Characteristics: 30 bits
• Directional Characteristics: 3 bits
MPEG-7 Motion Activity Descriptor
• Intensity
  • Expresses “pace” or intensity of action
  • Extracted by suitably quantizing the variance of the motion vector magnitude
• Direction
  • Expresses the dominant direction, if definable, as one of a set of eight equally spaced directions
  • Extracted by using averages of the angle (direction) of each motion vector
  • Useful where there is strong directional motion
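A rough sketch of the intensity and direction attributes follows. It is not the normative MPEG-7 extraction: the quantization thresholds in `intensity_edges` are hypothetical placeholders, and the dominant direction is taken from the angle of the mean motion vector rather than from an average of angles, which sidesteps the wrap-around problem at 0 and 360 degrees.

```python
import numpy as np

def motion_activity(vectors, intensity_edges=(1.0, 4.0, 9.0, 16.0)):
    """Return (intensity level 1..5, direction index 0..7 or None)."""
    v = np.asarray(vectors, dtype=float).reshape(-1, 2)
    magnitude = np.hypot(v[:, 0], v[:, 1])
    # Quantize the variance of the magnitudes with placeholder thresholds.
    level = int(np.digitize(magnitude.var(), np.asarray(intensity_edges))) + 1
    mean = v.mean(axis=0)
    if np.hypot(mean[0], mean[1]) < 1e-9:
        return level, None            # no definable dominant direction
    angle = np.arctan2(mean[1], mean[0])
    # Snap to the nearest of eight directions spaced 45 degrees apart.
    direction = int(np.round(angle / (np.pi / 4))) % 8
    return level, direction

print(motion_activity([(3, 0)] * 4))        # uniform motion to the right
print(motion_activity([(0, 0), (10, 0)]))   # very uneven magnitudes
print(motion_activity([(1, 0), (-1, 0)]))   # opposing motion cancels out
```

Because the input is just a list of motion vectors, the same function can be fed with vectors read from a compressed stream, in line with the emphasis on compressed domain extraction.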
MPEG-7 Motion Activity Descriptor
Spatial Distribution: using run-lengths
• Captures the size and number of moving regions in the shot on a frame by frame basis
• Enables distinction between shots with one large region in the middle, such as talking heads, and shots with several small moving regions, such as aerial soccer shots
• Thus “sparse” shots have many long runs while “dense” shots do not have many long runs.
[Figure labels: short, medium and long runs]
Video Indexing
• Analysis of still image
  • Features: color, texture, shape
  • Distance metrics
• Analysis of image sequence
  • Segmentation
  • Cut detection
  • Motion vectors
  • Shot transitions
  • Camera operations
  • Scene analysis
  • Selection of keyframes
  • Shot similarity
[Diagram: video, scenes, shots, frames]
Camera Motion Descriptors
Camera track, boom, and dolly motion modes.
Camera pan, tilt and roll motion modes.
Video Indexing
Multilayered Hierarchical Structure of a Video Clip
Copyright by J. Hunter 2001, Dublin Core and MPEG-7 Metadata for Video
Video Indexing
Semantic Units (Hierarchy)
• Object, regions, frames
• Shot: continuous sequence of frames captured from one camera
• Scene: one or more shots presenting different views of the same event (time or space related)
• Segment: one or more related scenes
Transitions
• Cut: an abrupt shot change that occurs in a single frame
• Dissolve: continuous transition, progressive linear combination
• Fade: a slow change in brightness, usually resulting in or starting with a solid black frame
• Wipe: pixels from the second shot replace those of the first shot in a regular pattern
• Others: special effects; editing tools can offer up to 200 effects
Video Indexing Example
Dublin Core Metadata

Shots
Description        Formats
Text               Text
Camera Distance    Controlled Vocabulary
Camera Angle       Controlled Vocabulary
Camera Motion      Controlled Vocabulary
Duration           secs, frames
Start Time         secs, frame #, SMPTE
End Time           secs, frame #, SMPTE
KeyFrame           GIF, JPEG
Lighting           Controlled Vocabulary
Open Trans         Controlled Vocabulary
Close Trans        Controlled Vocabulary

Scenes
Description        Formats
Text               Text
Script             Text
Transcript         Text
Edit List          Text
Duration           secs, frames
Start Time         secs, frame #, SMPTE
End Time           secs, frame #, SMPTE
KeyFrame           GIF, JPEG
Locale             Text
Cast               Text
Object             Text
Reliable Shot Detection
The three most commonly used transition types are:
• Abrupt cuts, hard cuts
• Fades
• Dissolves
Cut Detection
• Cut: sudden change of image content between continuous shots
• Cut detection: separate video into shots and calculate features for shots separately.
[Figure: frames along a time axis]
Shot Transitions
• Fade in
  • Change of image content from monochrome color to image
  • Example: fade from white/black
• Fade out
  • Change of image content from image to monochrome color
  • Example: fade to white/black
[Figure: frames along a time axis]
What is Dissolve?
• Dissolve: shot transition with image overlays
[Figure: frames along a time axis]
Types of Dissolve
Cross dissolve
Additive dissolve
Shot Boundary Detection
• Pixel differences
• Statistical differences
• Histograms
• Compression differences
• Edge tracking
• Motion vectors
SMPTE 00:12:45:20
Pixel Differences: Basic Idea
• Compute the total number of pixels that change in value by more than a threshold $t$
• If this total is greater than a second threshold $T_b$, then a shot boundary is detected
• Drawbacks
  • Sensitive to camera motion (pan, zoom)
  • Sensitive to object motion
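The two-threshold test can be sketched as follows for gray-value frames. One deliberate deviation, stated as an assumption: the second threshold is applied to the fraction of changed pixels rather than to the raw count, so that it does not depend on the image size. The function name and default thresholds are hypothetical.

```python
import numpy as np

def pixel_difference_boundary(frame_a, frame_b, t=20, t_b=0.5):
    """Return (boundary detected?, fraction of pixels that changed by more than t)."""
    a = np.asarray(frame_a, dtype=np.int32)   # signed type avoids uint8 wrap-around
    b = np.asarray(frame_b, dtype=np.int32)
    changed = np.abs(a - b) > t               # first threshold: per-pixel change
    fraction = float(changed.mean())
    return fraction > t_b, fraction           # second threshold: how many changed

dark = np.zeros((4, 4), dtype=np.uint8)
bright = np.full((4, 4), 200, dtype=np.uint8)
print(pixel_difference_boundary(dark, bright))
print(pixel_difference_boundary(dark, dark))
```

A pan or a large moving object changes many pixels at once as well, which is exactly the sensitivity listed under drawbacks.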
Pixel Differences: Improvements
• Basic method plus the use of a 3x3 averaging filter before the comparison [Zhang93]
• Divide the image into 12 regions and find the best match for each region in a neighborhood around the region in the other image. The difference is the sum of the region differences. [Shahraray95]
• Chromatic images:
  • Change in gray level in 2nd image
  • Relatively constant for dissolves and fades
  • Still sensitive to camera and object motion
Histogram Differences
• Use color/gray-scale histograms of pixels as a feature to detect shot boundaries
• Assumption: for the same background and same objects, there is very little change in the histogram
• Let $H_i(j)$ be the histogram value for the $j$-th bin of the $i$-th frame; then the difference is given by
  $CHD_i = \sum_j |H_i(j) - H_{i+1}(j)|$
• If the difference exceeds a threshold, $CHD_i > T_b$, a shot boundary is detected
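The histogram difference is the sum over all bins of the absolute difference between the histograms of frame i and frame i+1. A minimal sketch for gray-value frames, assuming 8-bit pixel values and a hypothetical choice of 64 bins:

```python
import numpy as np

def gray_histogram(frame, bins=64):
    """Histogram of 8-bit gray values using equally wide bins."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist.astype(np.int64)

def chd(frame_i, frame_next, bins=64):
    """Sum over bins of |H_i(j) - H_{i+1}(j)|."""
    return int(np.abs(gray_histogram(frame_i, bins)
                      - gray_histogram(frame_next, bins)).sum())

black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
ramp = (np.arange(16, dtype=np.uint8) * 16).reshape(4, 4)
print(chd(black, white), chd(ramp, ramp.T))
```

Transposing the ramp image rearranges its pixels without changing the histogram, so the difference is zero. This is the reason the method tolerates moderate motion, and also the reason different images can share the same histogram.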
Histograms: Example
Cut
Histograms: Difference Graph
[Figure: histogram difference over time; cuts appear as peaks above the threshold]
Histogram-Based Cut Detection
• Different images can have the same histogram
[Figures: an obvious example and a not so obvious example of image pairs with the same histogram]
Histogram-Based Cut Detection: Challenges
• Different images can have similar histograms
• Color values of subsequent images change significantly without a cut occurring
  • Explosions
  • Change of scene illumination
  • Fast movement of large objects
• Performance of histogram-based cut detection: between 90 and even 98 percent (in some cases)
Histogram Differences: Improvements
• A coarse quantization is good enough. Typically a 6-bit code: the 2 higher-order bits of each of the R, G and B channels.
• This leads to 64-bin histograms.
• Good trade-off between accuracy and speed for shot boundary detection
• Threshold selection is crucial. The threshold $T_b$ depends very much on the content
• Gradual transitions: use two thresholds instead of one global threshold, one for abrupt cuts and one for special effects
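The 6-bit code can be formed with bit shifts. The sketch below assumes 8-bit RGB input with the channel as the last axis and packs the bits in R, G, B order; the packing order and the function names are illustrative choices, not taken from the slides.

```python
import numpy as np

def six_bit_codes(rgb_image):
    """Pack the 2 highest-order bits of R, G and B into one code in 0..63."""
    rgb = np.asarray(rgb_image, dtype=np.uint8)
    top = (rgb >> 6).astype(np.int64)         # keep the top 2 bits of each channel
    return (top[..., 0] << 4) | (top[..., 1] << 2) | top[..., 2]

def histogram_64(rgb_image):
    """64-bin color histogram built from the 6-bit codes."""
    return np.bincount(six_bit_codes(rgb_image).ravel(), minlength=64)

pixels = np.array([[[255, 0, 0], [255, 255, 255]],
                   [[0, 0, 0], [250, 10, 20]]], dtype=np.uint8)
print(six_bit_codes(pixels))
print(histogram_64(pixels).sum())
```

The pure red pixel and the slightly different red (250, 10, 20) fall into the same bin, which illustrates how coarse quantization absorbs small color variations.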
Histogram Comparison
[Figure: talk show sequence; frame numbers 405, 459, 810, 972 and 1026 with similarity measures 0.4264, 0.4298, 0.1602 and 0.0383]
Copyright Philips (MPEG-7 contribution)
Histogram Differences: Twin-Comparison Method
• Compute $CHD_i$ for all frames in the video
• Mark camera breaks where $CHD_i > T_b$
• Mark potential gradual transition subsequences $GT = \{[F_s, F_e]\}$ wherever $CHD_i > T_s$
• For each gradual transition $[F_s, F_e]$, accumulate the frame-to-frame differences into $AC$
• If $AC > T_b$, then declare $[F_s, F_e]$ as a gradual transition
• This algorithm works well and is widely used
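The steps above can be sketched over a precomputed list of histogram differences. This is a simplified reading of the method: a candidate subsequence ends as soon as one difference drops to the lower threshold or below, with no tolerance for brief dips, and the function name is hypothetical.

```python
def twin_comparison(chd, t_b, t_s):
    """chd[i] is the difference between frames i and i+1.

    Returns (cuts, gradual) where cuts lists indices with chd > t_b and
    gradual lists (F_s, F_e) frame ranges whose accumulated difference > t_b.
    """
    cuts, gradual = [], []
    start, acc = None, 0.0
    for i, d in enumerate(chd):
        if d > t_b:
            cuts.append(i)               # camera break
            start, acc = None, 0.0       # a hard cut discards any open candidate
        elif d > t_s:
            if start is None:
                start, acc = i, 0.0      # potential gradual transition begins
            acc += d                     # accumulate frame-to-frame differences
        else:
            if start is not None and acc > t_b:
                gradual.append((start, i))
            start, acc = None, 0.0
    if start is not None and acc > t_b:
        gradual.append((start, len(chd)))
    return cuts, gradual

differences = [1, 1, 50, 1, 6, 7, 8, 6, 1, 6, 1]
cuts, gradual = twin_comparison(differences, t_b=20, t_s=5)
print(cuts, gradual)
```

In the example, the single difference of 6 near the end exceeds the lower threshold but never accumulates past the higher one, so it is not reported as a transition.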
IBM’s CueVideo Shot Boundary Detection
SMPTE 00:12:45:20
• Detects cuts, dissolves, fades and other gradual changes
• Compares multiple pairs of frames: 1, 3 and 7 frames apart
• Processes decoded frames
• Supports MPEG, QT, AVI, live feed,…
• No user-tuned parameters: allows batch processing
• Detection of flashes, bad frames
• One pass: allows live video processing
Copyright IBM Almaden
CueVideo Histogram Example:
Edge Change Ratio (ECR)
Properties
• Number of edge pixels in images i and (i-1): $s_i$ and $s_{i-1}$
• $E^{out}_{i-1}$: number of pixels that are edge pixels in image (i-1) but not edge pixels in image i
• $E^{in}_i$: number of pixels that are not edge pixels in image (i-1) but are edge pixels in image i
• Use of broad edges (noise independence)
• Edge change ratio between images i and (i-1):
  $ECR_i = \max\left(\frac{E^{in}_i}{s_i}, \frac{E^{out}_{i-1}}{s_{i-1}}\right)$
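The edge change ratio is the larger of two fractions: entering edge pixels over the edge pixel count of image i, and exiting edge pixels over the edge pixel count of image (i-1). The sketch below assumes the binary edge maps are already available (for instance from an edge detector) and implements the broad edges with a simple square dilation; the function names, the dilation radius and the handling of frames without edges are assumptions made for this illustration.

```python
import numpy as np

def dilate(edges, radius=1):
    """Broaden a boolean edge map by `radius` pixels in every direction."""
    h, w = edges.shape
    padded = np.pad(edges, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def edge_change_ratio(prev_edges, cur_edges, radius=1):
    """max(E_in / s_i, E_out / s_{i-1}) for two boolean edge maps."""
    prev_edges = np.asarray(prev_edges, dtype=bool)
    cur_edges = np.asarray(cur_edges, dtype=bool)
    s_prev, s_cur = int(prev_edges.sum()), int(cur_edges.sum())
    if s_prev == 0 and s_cur == 0:
        return 0.0                       # assumption: no edges at all, no change
    if s_prev == 0 or s_cur == 0:
        return 1.0                       # assumption: all edges appeared or vanished
    e_in = int((cur_edges & ~dilate(prev_edges, radius)).sum())    # entering edges
    e_out = int((prev_edges & ~dilate(cur_edges, radius)).sum())   # exiting edges
    return max(e_in / s_cur, e_out / s_prev)

def vertical_line(column, size=8):
    img = np.zeros((size, size), dtype=bool)
    img[:, column] = True
    return img

print(edge_change_ratio(vertical_line(1), vertical_line(2)))   # edge moved by one pixel
print(edge_change_ratio(vertical_line(1), vertical_line(6)))   # edge replaced elsewhere
```

With a dilation radius of one pixel, a small shift of the edge is absorbed and the ratio stays at zero, while an edge that reappears far away counts as fully exiting and fully entering.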
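Computation of ECR: Example
[Diagram: image (i-1) and image i are converted to edge images; each edge image is combined by AND with the inverted edge image of the other frame to obtain the ECR images $EC^{in}_i$ and $EC^{out}_{i-1}$, from which the ECR is computed]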
ECR Cut Detection
[Figure: five plots of D over time, one each for inside shot, cut, fade out, fade in and dissolve]
ECR Cut Detection: Cuts
• If $ECR_i$ is the edge change ratio between frames i and (i-1), a cut is detected if $ECR_i \geq T$, where T is a threshold
• Fast object and camera motion leads to high ECR values without cuts
[Figure: ECR over time with cuts marked]
ECR Cut Detection: Fade In, Fade Out
• Fade out: number of edge pixels is zero after the last frame of the sequence
• Fade in: number of edge pixels is zero before the first frame of the sequence
[Figure panels: fade in, fade out]
ECR Cut Detection: Problems
• Fast object or camera motion
• Explosions
• Fades and dissolves
  • Soft transitions are difficult to detect
  • Other effects: wipe detection unreliable
• Performance: typically between 90 and 95 percent
Shot-Boundary Detection: Conclusions
• Histogram-based techniques are good at recognizing cuts
• Standard deviation techniques are good at recognizing fades
• Dissolves are the most challenging
• Problems
  • Ground truth: experimental data must be analyzed manually
  • Database? Benchmarks?
  • Definition of a fade/dissolve
Text Detection: Applications
• Annotation and search of image and video libraries
  • TV, movie studios, advertising, and surveillance
• Automatic identification and logging of the beginning and end of key events based on captions
• Video summarization
• Ticker tape analysis
• Commercial detection
• Sports programs indexing
Text Detection: Design Decisions
• What kind of text occurrences?
  • Scene text
  • Overlay text
• With what style attributes?
  • Font size
  • Font type
  • Text color
• In what kind of media data?
  • Image-based
  • Video-based
• What should be achieved?
  • Localization
  • Segmentation
  • Recognition
• How will the results be used?
  • Indexing
  • Object-based video encoding
any
both
Example: MPEG-4 Text Extraction
• Locate text of any size at any position in images, web pages and videos
• Segment and recognize text
• Encode extracted text as rigid foreground object in MPEG-4 (with Yen-Kuang Chen)
[Chart: PSNR (Y), from 27.5 to 31.5, versus bit rate, from 160 to 195 KBits/sec, for single VOP and multiple VOP encoding]
Example:
OCR result: Dec 25 1998
Text Detection Example - Latin Script
Text Detection: Korean Script Example
Text Extracted from Video
Face Detection
Pool of Features
=> ~130,000 features for 24x24 window
Rapid Computation
[Figure: rectangle sums in x/y image coordinates]
Rainer Lienhart,Jochen Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP 2002, pp. 900-903, Sep. 2002.
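Haar-like features are sums and differences of pixel values over rectangles, and they are commonly evaluated in constant time with a summed area table (integral image). The sketch below shows that idea for upright rectangles only; it is an illustration under that assumption, not the exact scheme of the cited paper, and the function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed area table with an extra zero row and column at the top and left."""
    img = np.asarray(img, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def two_rect_feature(ii, x, y, w, h):
    """Edge-like feature: left half minus right half (w is assumed to be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
step = np.zeros((4, 4), dtype=np.int64)
step[:, :2] = 1                              # bright left half, dark right half
print(rect_sum(ii, 1, 1, 2, 2), two_rect_feature(integral_image(step), 0, 0, 4, 4))
```

Once the table is built in one pass over the image, every rectangle sum costs four array lookups regardless of its size, which is what makes evaluating a very large feature pool feasible.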
Cascade of Classifiers
• Premise
  • Size of feature pool (>100,000) exceeds what any reasonable classifier can handle
  • A cascade of classifiers (special kind of decision tree) can outperform a single-stage classifier because it can use more features at the same computational complexity
  • Use boosting (Discrete/Real/Gentle AdaBoost, LogitBoost)
[Diagram: input pattern, stage 1, stage 2, …, stage N, object]
• Passed on by stage 1: P(x|¬o) = .5, P(x|o) = .998 (rejected: P(x|o) = .002)
• Passed on by stage 2: P(x|¬o) = .5^2, P(x|o) = .998^2 = .996 (rejected: P(x|o) = .004)
• Passed on by stage N: P(x|¬o) = .5^N, P(x|o) = .998^N ~ .90 (rejected: P(x|o) ~ .1)
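The figures on the slide follow from multiplying the per-stage rates. A small sketch, assuming every stage has the same hit rate and false alarm rate and that stages act independently; the function name is hypothetical.

```python
def cascade_rates(hit_rate, false_alarm_rate, stages):
    """Overall (hit rate, false alarm rate) after passing through all stages."""
    return hit_rate ** stages, false_alarm_rate ** stages

for n in (1, 2, 10, 20):
    hits, false_alarms = cascade_rates(0.998, 0.5, n)
    print(n, round(hits, 4), false_alarms)
```

The false alarm rate halves with every stage while the hit rate decays only slowly, which is why many cheap stages can reject almost all background windows while keeping most objects.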
Cascade Concept
[Figure: target concept with background progressively removed in stages 1 through 5]
Thank you for your attention
Edge Detection
• Basic idea: 1st and 2nd derivative of an edge
  • The position of the edge can be estimated with the maximum of the 1st derivative or with the zero-crossing of the 2nd derivative
• Generalize the technique to calculate the derivative of a two-dimensional image
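Both estimates can be tried on a one-dimensional intensity profile. The sketch below uses plain finite differences as the derivative, which is an assumption made for brevity (practical detectors smooth the signal first); the function name is hypothetical.

```python
import numpy as np

def edge_position_1d(signal):
    """Return (index of the largest 1st difference, zero-crossings of the 2nd difference)."""
    signal = np.asarray(signal, dtype=float)
    d1 = np.diff(signal)                 # d1[k] = signal[k+1] - signal[k]
    d2 = np.diff(d1)                     # d2[k] = d1[k+1] - d1[k]
    peak = int(np.argmax(np.abs(d1)))
    signs = np.sign(d2)
    # A zero-crossing lies between d2[k] and d2[k+1] when their signs differ.
    crossings = [k for k in range(len(d2) - 1) if signs[k] * signs[k + 1] < 0]
    return peak, crossings

profile = [0, 0, 1, 4, 9, 10, 10]        # a smooth step from dark to bright
peak, crossings = edge_position_1d(profile)
print(peak, crossings)
```

Here the largest first difference lies between samples 3 and 4, and the second difference changes sign between d2[2] and d2[3], which are centered on samples 3 and 4, so both estimates place the edge at the same location.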
Canny Edge Detector
• Designed to be an optimal edge detector (according to particular criteria)
• It takes as input a gray-scale image and produces as output an image showing the positions of tracked intensity discontinuities
• Multi-stage process
  • The image is smoothed by Gaussian convolution
  • A simple 2-D first derivative operator highlights regions of the image with high first spatial derivatives
  • The algorithm tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge top (non-maximal suppression)
  • The tracking process exhibits hysteresis