Content-based Video Indexing,
Classification & Retrieval
Presented by HOI, Chu Hong
Nov. 27, 2002
Outline
- Motivation
- Introduction
- Two approaches for semantic analysis
  - A probabilistic framework (Naphade, Huang ’01)
  - Object-based abstraction and modeling [Lee, Kim, Hwang ’01]
- A multimodal framework for video content interpretation
- Conclusion
Motivation
The amount of digital video data has grown rapidly in recent years.
Tools for classifying and retrieving video content are lacking.
A gap exists between low-level features and high-level semantic content.
Enabling machines to understand video is important and challenging.
Introduction
Content-based video indexing
- The process of attaching content-based labels to video shots
- Essential for content-based classification and retrieval
- Uses automatic analysis techniques:
  - shot detection, video segmentation
  - key frame selection
  - object segmentation and recognition
  - visual/audio feature extraction
  - speech recognition, video text, VOCR
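As one illustration of the shot detection step listed above, here is a minimal sketch. It assumes frames arrive as flat lists of 8-bit gray levels and that a cut is declared when the normalized histogram difference between consecutive frames exceeds a threshold. The function names, the bin count and the threshold are illustrative assumptions, not taken from the slides.

```python
def gray_histogram(frame, bins=8):
    """Normalized histogram of a frame given as a flat list of 0-255 gray levels."""
    hist = [0] * bins
    for pixel in frame:
        hist[pixel * bins // 256] += 1
    total = len(frame)
    return [count / total for count in hist]


def detect_shot_boundaries(frames, threshold=0.5, bins=8):
    """Return indices i where a cut is assumed between frame i-1 and frame i."""
    boundaries = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        # Half the L1 distance between normalized histograms lies in [0, 1].
        diff = 0.5 * sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


# Toy sequence: two dark frames, two bright frames, one dark frame.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, bright, bright, dark]
print(detect_shot_boundaries(frames))  # [2, 4]
```

A fixed threshold only catches abrupt cuts; gradual transitions such as fades would need a different test.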
Introduction
Content-based video classification
- Segment and classify videos into meaningful categories
- Classify videos based on predefined topics
- Useful for browsing and searching by topic
- Multimodal method
  - Visual features
  - Audio features
  - Motion features
  - Textual features
- Domain-specific knowledge
Introduction
Content-based video retrieval
- Simple visual feature query
  - Retrieve video with key frame: color R (80%), G (10%), B (10%)
- Feature combination query
  - Retrieve video with high upward motion (70%), blue (30%)
- Query by example (QBE)
  - Retrieve video that is similar to an example
- Localized feature query
  - Retrieve video with a car moving toward the right
- Object relationship query
  - Retrieve video with a girl watching the sunset
- Concept query (query by keyword)
  - Retrieve "explosion", "White Christmas"
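The simple visual feature query above (R 80%, G 10%, B 10%) can be sketched as a ranking of key frames by how close their color proportions are to the query. This is a minimal sketch: the key-frame data, the function names and the choice of L1 distance are all assumptions made for illustration.

```python
def color_proportions(pixels):
    """Fraction of total intensity carried by the R, G and B channels."""
    sums = [0, 0, 0]
    for r, g, b in pixels:
        sums[0] += r
        sums[1] += g
        sums[2] += b
    total = sum(sums)
    if total == 0:
        return (1 / 3, 1 / 3, 1 / 3)  # black frame: treat as neutral
    return tuple(s / total for s in sums)


def rank_key_frames(key_frames, query):
    """Sort key-frame ids by L1 distance between their proportions and the query."""
    scored = []
    for frame_id, pixels in key_frames.items():
        props = color_proportions(pixels)
        distance = sum(abs(p - q) for p, q in zip(props, query))
        scored.append((distance, frame_id))
    return [frame_id for _, frame_id in sorted(scored)]


# Hypothetical key frames, each reduced to two RGB pixels.
key_frames = {
    "sunset": [(200, 30, 20), (220, 20, 30)],
    "forest": [(20, 180, 40), (30, 200, 30)],
    "ocean": [(10, 40, 210), (20, 30, 190)],
}
query = (0.8, 0.1, 0.1)  # R 80%, G 10%, B 10%
ranking = rank_key_frames(key_frames, query)
print(ranking)  # ['sunset', 'forest', 'ocean']
```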
Introduction
Feature extraction
- Color features
- Texture features
- Shape features
- Sketch features
- Audio features
- Camera motion features
- Object motion features
Semantic Indexing & Querying
Limitations of QBE
- Similarity is measured using only low-level features
- Does not reflect the user’s perception
- High-level features are difficult to annotate
From syntactic to semantic
- Bridge the gap between low-level features and semantic content
- Semantic indexing, query by keyword (QBK)
Semantic description scheme: MPEG-7
- Semantic interaction between concepts
- No scheme to learn the model for individual concepts
Semantic Modeling & Indexing
Two approaches
- Probabilistic framework, ‘Multiject’ (Naphade’01)
- Object-based abstraction and indexing [Lee, Kim, Hwang ’01]
A probabilistic approach (‘Multiject’ & ‘Multinet’) (Naphade, Huang ’01)
Multiject: a probabilistic multimedia object
Three categories of semantic concepts:
- Objects: face, car, animal, building
- Sites: sky, mountain, outdoor, cityscape
- Events: explosion, waterfall, gunshot, dancing
Multiject for semantic concept
[Figure: the "Outdoor" multiject takes visual features, audio features, text features and other multijects as inputs, and outputs P(Outdoor = Present | features, other multijects) = 0.7]
How to create a Multiject
- Shot-boundary detection
- Spatio-temporal segmentation of within-shot frames
- Feature extraction (color, texture, edge direction, etc.)
- Modeling
  - Sites: mixture of Gaussians
  - Events: hidden Markov models (HMMs) with observation densities as Gaussian mixtures
  - All audio events: modeled using HMMs
- Each segment is tested for each concept, and the information is then composed at frame level
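To make the "mixture of Gaussians" modeling of sites concrete, here is a minimal sketch of how a segment's feature value could be tested against a site concept. It assumes a single scalar feature, one mixture for "concept present" and one for "concept absent", and Bayes' rule with a prior. The mixture parameters and function names are hypothetical, and the slides do not state how the authors turn likelihoods into a presence probability.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_likelihood(x, components):
    """Likelihood of x under a mixture given as (weight, mean, variance) triples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def concept_posterior(x, present_model, absent_model, prior_present=0.5):
    """P(concept present | x) from two class-conditional mixtures via Bayes' rule."""
    p_present = mixture_likelihood(x, present_model) * prior_present
    p_absent = mixture_likelihood(x, absent_model) * (1 - prior_present)
    return p_present / (p_present + p_absent)


# Hypothetical models for a "sky" site over one feature (e.g. a blueness score).
sky_present = [(0.6, 0.8, 0.01), (0.4, 0.6, 0.02)]
sky_absent = [(1.0, 0.2, 0.04)]

print(round(concept_posterior(0.75, sky_present, sky_absent), 3))
print(round(concept_posterior(0.20, sky_present, sky_absent), 3))
```

In practice the mixture parameters would be estimated from labeled training segments (for example with expectation-maximization) rather than set by hand.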
Multiject : Hierarchical HMM
- ss1 - ssm: state sequence for the supervisor HMM
- sa1 - sam: state sequence for the audio HMM
- xa1 - xam: audio observations
- sv1 - svm: state sequence for the video HMM
- xv1 - xvm: video observations
Multinet: Concept Building based on Multiject
• A network of multijects modeling interaction between them
• + / - : positive/negative interaction between multijects
Bayesian Multinet
• Nodes : binary random variables (presence/absence of multiject)
• Layer 0 : frame-level multiject-based semantic features
• Layer 1 : inference from layer 0 :
• Layer 2 : higher level for performance improvement
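The idea of binary multiject nodes with positive and negative interactions can be sketched with a very small network. This is a minimal sketch, assuming one higher-layer concept node whose presence influences the detection of two lower-layer multijects, with detections conditionally independent given the concept. The structure, the conditional probabilities and the function name are illustrative assumptions, not the network from Naphade and Huang.

```python
def posterior_present(prior, likelihoods, evidence):
    """P(concept = present | evidence) in a two-layer network of binary nodes.

    likelihoods[name] = (P(name detected | concept present),
                         P(name detected | concept absent))
    evidence[name] = True/False detection of that multiject.
    The detections are assumed conditionally independent given the concept.
    """
    joint_present = prior
    joint_absent = 1 - prior
    for name, detected in evidence.items():
        p_if_present, p_if_absent = likelihoods[name]
        joint_present *= p_if_present if detected else 1 - p_if_present
        joint_absent *= p_if_absent if detected else 1 - p_if_absent
    return joint_present / (joint_present + joint_absent)


# Hypothetical conditional probabilities: "sky" interacts positively with
# "outdoor" (+), while "face" is given a negative interaction (-).
likelihoods = {"sky": (0.8, 0.1), "face": (0.2, 0.6)}

print(round(posterior_present(0.5, likelihoods, {"sky": True, "face": False}), 3))
print(round(posterior_present(0.5, likelihoods, {"sky": False, "face": True}), 3))
```

With these numbers, detecting sky raises the belief in "outdoor" well above the prior, and detecting a face without sky lowers it.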
Object-based Semantic Video Modeling
[Figure: processing pipeline. Video Sequence → VO Extraction → Object-based Video Abstraction → Object-based Low-Level Feature Extraction → Semantic Features Modeling → Indexing/Retrieving]
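Object Extraction based on Object Tracking [Kim, Hwang ’00]
[Figure: object tracking loop. Inputs: frames I(n-1) and I(n), and the previous video object vo(n-1) fed back through a delay. Blocks: Motion Projection, Model Update (Histogram Backprojection), Object Post-processing. Output: video object vo(n).]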
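The model update step names histogram backprojection. The sketch below shows the generic ratio-histogram form of that technique on gray levels, not necessarily the exact procedure of Kim and Hwang. The bin count, the toy frame and the function names are assumptions.

```python
def histogram(pixels, bins=4):
    """Count gray levels (0-255) into equal-width bins."""
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // 256] += 1
    return hist


def backproject(image, model_hist, bins=4):
    """Replace each pixel by the ratio-histogram value of its bin.

    ratio[i] = min(model[i] / image_hist[i], 1); high values mark pixels whose
    gray level is typical of the object model.
    """
    flat = [p for row in image for p in row]
    image_hist = histogram(flat, bins)
    ratio = [
        min(m / c, 1.0) if c > 0 else 0.0
        for m, c in zip(model_hist, image_hist)
    ]
    return [[ratio[p * bins // 256] for p in row] for row in image]


# Object model built from a bright object seen in the previous frame.
model_hist = histogram([220, 230, 240, 225], bins=4)

# Current frame: dark background with the bright object in the right columns.
frame = [
    [10, 20, 230, 225],
    [15, 10, 240, 220],
]
for row in backproject(frame, model_hist):
    print(row)
# [0.0, 0.0, 1.0, 1.0]
# [0.0, 0.0, 1.0, 1.0]
```

Thresholding the backprojected map gives a candidate object mask that a post-processing step could then clean up.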
Semantic Feature Modeling
- Modeling based on temporal variation of object features
- Boundary shape and motion statistics of object area
[Figure: block diagram with an abstracted frame sequence, pre-processing, object features and HMM training]
HMM Modeling
1. Observation sequence of object features: O1, ..., OT
2. Left-right 1-D HMM modeling with states S1, S2, ..., ST
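A left-right HMM over an observation sequence O1 ... OT can be scored with the forward algorithm. The following is a minimal sketch, assuming object features have already been quantized into discrete symbols. The three-state model and all probabilities are hypothetical.

```python
def forward_likelihood(observations, initial, transition, emission):
    """P(O_1..O_T | model) for a discrete HMM, computed with the forward algorithm."""
    n_states = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Left-right topology: each state either stays or moves to the next state,
# so the transition matrix is upper bidiagonal. All numbers are hypothetical.
initial = [1.0, 0.0, 0.0]
transition = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
# Two quantized observation symbols (0 and 1) per state.
emission = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
]

print(forward_likelihood([0, 0, 1, 1], initial, transition, emission))
print(forward_likelihood([1, 1, 0, 0], initial, transition, emission))
```

A sequence that drifts from symbol 0 to symbol 1 scores higher under this model than the reverse, which is how one trained HMM per semantic feature could be used to pick the best-matching label.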
Video Modeling: Three Layer Structure
[Figure: Three layer structure of video modeling, compared to NLP]
- Video Understanding: Content Interpretation; Semantic Video Modeling (Frame-based Structural Modeling, Object-based Structural Modeling); Audio-Visual Feature Extraction
- Natural Language Processing: Interpretation; Sentence Structure & Grammar; Word Recognition
A Multimodal Framework for Video Content Interpretation
- Long-term goal
  - Application to an automatic TV program scout
  - Allow users to request topic-level programs
- Integrate multiple modalities: visual, audio and text information
- Multi-level concepts
  - Low: low-level features
  - Mid: object detection, event modeling
  - High: classification result of semantic content
- Probabilistic model, using a Bayesian network for classification (causal relationships, domain knowledge)
How to work with the framework?
- Preprocessing
  - Story segmentation (shot detection)
  - VOCR, speech recognition
  - Key frame selection
- Feature extraction
  - Visual features based on key frames: color, texture, shape, sketch, etc.
  - Audio features: average energy, bandwidth, pitch, mel-frequency cepstral coefficients, etc.
  - Textual features (transcript)
    - Knowledge tree with many keyword categories: politics, entertainment, stock, art, war, etc.
    - Word spotting, vote histogram
  - Motion features
    - Camera operation: panning, tilting, zooming, tracking, booming, dollying
    - Motion trajectories (moving objects)
    - Object abstraction, recognition
- Building and training the Bayesian network
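The textual feature step (word spotting and a vote histogram over keyword categories) can be sketched as follows. This is a minimal sketch: the keyword lists, the sample transcript and the function name are hypothetical, since the slides only name the categories.

```python
import re
from collections import Counter


def vote_histogram(transcript, categories):
    """Count, per category, how many transcript words match its keyword list."""
    words = re.findall(r"[a-z']+", transcript.lower())
    votes = Counter({name: 0 for name in categories})
    for word in words:
        for name, keywords in categories.items():
            if word in keywords:
                votes[name] += 1
    return votes


# Hypothetical keyword lists; the slides only name the categories.
categories = {
    "politics": {"election", "senate", "president", "vote"},
    "stock": {"shares", "market", "index", "investors"},
    "war": {"troops", "ceasefire", "army"},
}

transcript = "The market fell as investors sold shares before the election."
votes = vote_histogram(transcript, categories)
print(votes.most_common())  # [('stock', 3), ('politics', 1), ('war', 0)]
```

The resulting histogram could serve as the textual evidence fed into the Bayesian network alongside the visual, audio and motion features.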
Challenging points
- Preprocessing is significant in the framework.
  - Accuracy of key-frame selection
  - Accuracy of speech recognition & VOCR
- Good feature extraction is important for classification performance.
- Modeling semantic video objects and events
- How to integrate multiple modalities still needs careful consideration.
Conclusion
- Introduced several basic concepts
- Semantic video modeling and indexing
- Proposed a multimodal framework for topic classification of video
- Discussed challenging problems
Q & A
Thank you!