Content-based Video Indexing, Classification & Retrieval Presented by HOI, Chu Hong Nov. 27, 2002



Outline

- Motivation
- Introduction
- Two approaches for semantic analysis
  - A probabilistic framework (Naphade, Huang ’01)
  - Object-based abstraction and modeling [Lee, Kim, Hwang ’01]
- A multimodal framework for video content interpretation
- Conclusion


Motivation

- The amount of digital video data has grown enormously in recent years.
- There is a lack of tools to classify and retrieve video content.
- There is a gap between low-level features and high-level semantic content.
- Enabling machines to understand video is both important and challenging.


Introduction

Content-based video indexing
- The process of attaching content-based labels to video shots
- Essential for content-based classification and retrieval
- Performed with automatic analysis techniques:
  - shot detection, video segmentation
  - key-frame selection
  - object segmentation and recognition
  - visual/audio feature extraction
  - speech recognition, video text, VOCR (video OCR)
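Shot detection, the first technique listed above, is commonly done by comparing color histograms of consecutive frames and declaring a cut where they differ sharply. The following is a minimal sketch of that idea, not the specific detector used in the systems discussed; the frame representation (a list of (r, g, b) tuples), the 4-levels-per-channel quantization and the 0.5 threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255 (assumed frame format)


def color_histogram(frame: Sequence[Pixel], levels: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `levels` bins per channel."""
    hist = [0.0] * (levels ** 3)
    for r, g, b in frame:
        # Map 0-255 to 0..levels-1 for each channel, then to one joint bin.
        ri, gi, bi = (c * levels // 256 for c in (r, g, b))
        hist[(ri * levels + gi) * levels + bi] += 1
    return [h / len(frame) for h in hist]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames: Sequence[Sequence[Pixel]],
                           threshold: float = 0.5) -> List[int]:
    """Indices i where a cut is declared between frame i-1 and frame i."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


red = [(200, 10, 10)] * 100   # hypothetical uniform "red" frame
blue = [(10, 10, 200)] * 100  # hypothetical uniform "blue" frame
print(detect_shot_boundaries([red, red, blue, blue]))  # [2]
```

This only catches hard cuts; gradual transitions such as fades and dissolves usually need a windowed or two-threshold comparison.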


Introduction

Content-based video classification
- Segment and classify videos into meaningful categories
- Classify videos by predefined topics
- Useful for browsing and searching by topic
- Multimodal method:
  - visual features
  - audio features
  - motion features
  - textual features
- Domain-specific knowledge


Introduction

Content-based video retrieval
- Simple visual feature query: retrieve videos whose key frames have color R (80%), G (10%), B (10%)
- Feature combination query: retrieve videos with high upward motion (70%) and blue (30%)
- Query by example (QBE): retrieve videos similar to an example
- Localized feature query: retrieve videos with a car running toward the right
- Object relationship query: retrieve videos with a girl watching the sunset
- Concept query (query by keyword): retrieve "explosion", "White Christmas"
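The simple and combined visual queries above can both be read as ranking videos by a weighted sum of per-feature scores, with the percentages acting as weights. A minimal sketch under that reading; the feature names (`motion_up`, `blue`), the per-video scores and the linear weighting are hypothetical, and a real system would compute the scores from the extracted features.

```python
from typing import Dict, List, Tuple


def rank_videos(index: Dict[str, Dict[str, float]],
                query: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank videos by a weighted sum of feature scores.

    index: video id -> {feature name -> score in [0, 1]} (hypothetical scores)
    query: feature name -> weight (assumed to sum to 1)
    """
    scored = []
    for video_id, features in index.items():
        # Missing features count as a score of 0.
        score = sum(w * features.get(name, 0.0) for name, w in query.items())
        scored.append((video_id, score))
    # Highest combined score first; ties broken by video id for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical per-video feature scores.
index = {
    "rocket_launch": {"motion_up": 0.9, "blue": 0.6},
    "ocean_calm": {"motion_up": 0.1, "blue": 0.95},
    "city_traffic": {"motion_up": 0.2, "blue": 0.1},
}
query = {"motion_up": 0.7, "blue": 0.3}  # "high upward motion (70%), blue (30%)"
print(rank_videos(index, query))
```

A linear combination is the simplest choice; it assumes the per-feature scores are already on comparable scales.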


Introduction

Feature extraction
- Color features
- Texture features
- Shape features
- Sketch features
- Audio features
- Camera motion features
- Object motion features
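Color features for a key frame are often summarized by color moments: the mean, standard deviation and skewness of each channel. A minimal sketch of that one descriptor, assuming a key frame given as a non-empty list of (r, g, b) tuples; it is an illustrative choice, not necessarily the color feature used in the systems discussed here.

```python
import math
from typing import List, Sequence, Tuple


def color_moments(frame: Sequence[Tuple[int, int, int]]) -> List[float]:
    """Return [mean, std, skew] for R, then G, then B (9 values in total)."""
    n = len(frame)
    features: List[float] = []
    for channel in range(3):
        values = [pixel[channel] for pixel in frame]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Skewness as the signed cube root of the third central moment,
        # a common convention for the color-moments descriptor.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        features.extend([mean, math.sqrt(var), skew])
    return features


print(color_moments([(0, 0, 0), (255, 0, 0)]))
```

Nine numbers per key frame are far more compact than a full histogram, at the cost of discarding the color distribution's shape beyond its first three moments.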


Semantic Indexing & Querying

- Limitations of QBE:
  - similarity is measured using only low-level features
  - it does not reflect the user's perception
  - high-level features are difficult to annotate
- From syntactic to semantic:
  - bridge the gap between low-level features and semantic content
  - semantic indexing, query by keyword (QBK)
- Semantic description scheme (MPEG-7):
  - describes semantic interaction between concepts
  - provides no scheme to learn models for individual concepts
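With semantic labels attached to shots, query by keyword (QBK) reduces to a lookup in an inverted index from concept names to shots. A minimal sketch, assuming each shot already carries concept probabilities (for instance from a probabilistic concept detector); the `SemanticIndex` class, the shot ids and the 0.5 confidence cut-off are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class SemanticIndex:
    """Hypothetical inverted index: concept keyword -> [(shot id, confidence)]."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def add_shot(self, shot_id: str, concepts: Dict[str, float]) -> None:
        for concept, prob in concepts.items():
            self._postings[concept.lower()].append((shot_id, prob))

    def query(self, keyword: str, min_conf: float = 0.5) -> List[str]:
        """Shot ids labeled with `keyword` at or above `min_conf`, most confident first."""
        hits = [(s, p) for s, p in self._postings.get(keyword.lower(), []) if p >= min_conf]
        return [s for s, _ in sorted(hits, key=lambda h: -h[1])]


sem_index = SemanticIndex()
sem_index.add_shot("shot_01", {"explosion": 0.9, "outdoor": 0.8})
sem_index.add_shot("shot_02", {"outdoor": 0.6, "sky": 0.7})
sem_index.add_shot("shot_03", {"explosion": 0.3, "indoor": 0.9})
print(sem_index.query("outdoor"))    # ['shot_01', 'shot_02']
print(sem_index.query("Explosion"))  # ['shot_01']
```

Keeping the confidence in each posting lets the same index serve both strict and recall-oriented queries by changing `min_conf`.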


Semantic Modeling & Indexing

Two approaches
- Probabilistic framework, ‘Multiject’ (Naphade ’01)
- Object-based abstraction and indexing [Lee, Kim, Hwang ’01]


A probabilistic approach (‘Multiject’ & ‘Multinet’) (Naphade, Huang ’01)

- Multiject: a probabilistic multimedia object
- Three categories of semantic concepts:
  - Objects: face, car, animal, building
  - Sites: sky, mountain, outdoor, cityscape
  - Events: explosion, waterfall, gunshot, dancing


Multiject for semantic concept

[Figure: the "Outdoor" multiject takes visual, audio and text features, together with the outputs of other multijects, and outputs P(Outdoor = Present | features, other multijects) = 0.7.]
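The multiject output above is a posterior probability that fuses evidence from several modalities. One simple way to obtain such a posterior, assuming the modalities are conditionally independent given the concept (a naive-Bayes simplification, not necessarily the model used by Naphade and Huang), is to add each modality's log-likelihood ratio to the prior log-odds. The likelihood values in the example are made up.

```python
import math
from typing import Dict, Tuple


def multiject_posterior(prior: float,
                        likelihoods: Dict[str, Tuple[float, float]]) -> float:
    """P(concept present | all modality features) under naive-Bayes fusion.

    prior: P(present) before seeing any features (strictly between 0 and 1).
    likelihoods: modality -> (p(features | present), p(features | absent)).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for p_present, p_absent in likelihoods.values():
        # Independence assumption: each modality adds its own log-likelihood ratio.
        log_odds += math.log(p_present / p_absent)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical likelihoods for the 'Outdoor' concept in one shot.
evidence = {"visual": (0.6, 0.3), "audio": (0.4, 0.5), "text": (0.2, 0.15)}
print(round(multiject_posterior(0.5, evidence), 3))  # 0.681
```

Outputs of other multijects could be added as further entries under the same independence assumption, although modeling correlated concepts properly is what the Multinet is for.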


How to create a Multiject

- Shot-boundary detection
- Spatio-temporal segmentation of within-shot frames
- Feature extraction (color, texture, edge direction, etc.)
- Modeling:
  - Sites: mixtures of Gaussians
  - Events: hidden Markov models (HMMs) with Gaussian-mixture observation densities
  - All audio events: modeled using HMMs
- Each segment is tested for each concept, and the information is then composed at the frame level.
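Site concepts are modeled with mixtures of Gaussians. A minimal sketch of how a region's feature vector might be scored against one mixture trained on "concept present" examples and one trained on "concept absent" examples; the diagonal covariances, the two features and all parameter values are assumptions, and training (usually by expectation-maximization) is omitted.

```python
import math
from typing import List, Sequence, Tuple

# A component is (weight, mean vector, variance vector); diagonal covariance assumed.
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], mixture: List[Component]) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), using the log-sum-exp trick."""
    terms = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def site_present(x: Sequence[float], present_gmm: List[Component],
                 absent_gmm: List[Component], log_prior_ratio: float = 0.0) -> bool:
    """Declare the site present if the log-likelihood ratio (plus prior) is positive."""
    return (gmm_log_likelihood(x, present_gmm)
            - gmm_log_likelihood(x, absent_gmm) + log_prior_ratio) > 0


# Hypothetical 'sky' models over (mean blueness, mean brightness) of a region.
sky = [(0.6, [0.8, 0.7], [0.01, 0.02]), (0.4, [0.6, 0.9], [0.02, 0.01])]
not_sky = [(1.0, [0.3, 0.4], [0.05, 0.05])]
print(site_present([0.75, 0.72], sky, not_sky))  # True
print(site_present([0.25, 0.35], sky, not_sky))  # False
```

The log-sum-exp step keeps the mixture likelihood numerically stable when individual component densities are very small.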


Multiject: Hierarchical HMM

- ss1 ... ssm: state sequence of the supervisor HMM
- sa1 ... sam: state sequence of the audio HMM
- xa1 ... xam: audio observations
- sv1 ... svm: state sequence of the video HMM
- xv1 ... xvm: video observations


Multinet: Concept Building based on Multiject

• A network of multijects modeling interaction between them

• + / - : positive/negative interaction between multijects
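One way to picture the positive and negative links above is a small undirected model over binary concept variables: each multiject contributes its own probability as unary evidence, and each link adds a pairwise term that rewards (+) or penalizes (-) joint presence. A minimal sketch with exact inference by enumeration; the concepts, link weights and input probabilities are hypothetical, and this factor-graph-style formulation is an illustrative simplification rather than the published Multinet.

```python
import itertools
import math
from typing import Dict, Tuple


def multinet_marginals(unary: Dict[str, float],
                       links: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """P(concept present) after combining multiject outputs with +/- links.

    unary: concept -> probability from its own multiject (strictly between 0 and 1).
    links: (concept_a, concept_b) -> weight; > 0 favors co-occurrence (+),
           < 0 discourages it (-).
    Exact enumeration over all 2**n joint states, so only practical for small n.
    """
    names = list(unary)
    mass_present = {name: 0.0 for name in names}
    z = 0.0  # normalizing constant
    for state in itertools.product((0, 1), repeat=len(names)):
        assign = dict(zip(names, state))
        # Unary terms: each multiject's own belief about its concept.
        score = sum(math.log(p if assign[n] else 1.0 - p) for n, p in unary.items())
        # Pairwise terms: only active when both linked concepts are present.
        score += sum(w * assign[a] * assign[b] for (a, b), w in links.items())
        weight = math.exp(score)
        z += weight
        for name in names:
            if assign[name]:
                mass_present[name] += weight
    return {name: mass_present[name] / z for name in names}


# Hypothetical multiject outputs and link weights for one shot.
unary = {"outdoor": 0.5, "sky": 0.9, "indoor": 0.5}
links = {("outdoor", "sky"): 2.0, ("outdoor", "indoor"): -3.0}
print(multinet_marginals(unary, links))
```

With no links the marginals equal the multiject outputs; a confident "sky" pulls "outdoor" up through the positive link, while "indoor" pushes it down through the negative one.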


Bayesian Multinet

• Nodes: binary random variables (presence/absence of a multiject)

• Layer 0: frame-level multiject-based semantic features

• Layer 1: inference from layer 0

• Layer 2: a higher level, added for performance improvement


Object-based Semantic Video Modeling

[Figure: processing pipeline, Video Sequence → VO Extraction → Object-based Video Abstraction → Object-based Low-Level Feature Extraction → Semantic Feature Modeling → Indexing/Retrieving]


Object Extraction based on Object Tracking [Kim, Hwang ‘00]

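[Figure: tracking loop with blocks Motion Projection, Model Update (Histogram Backprojection) and Object Post-processing. Signals: frames I(n-1) and I(n), video objects VO(n-1) and VO(n), with VO(n) fed back through a delay.]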
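The model-update step uses histogram backprojection, a technique due to Swain and Ballard: each pixel of the new frame is replaced by how characteristic its color is of the tracked object, so the object shows up as a bright region. A minimal sketch of the ratio-histogram variant R[j] = min(M[j] / I[j], 1), assuming frames already quantized into color-bin indices and a model histogram collected from the previous object mask; the toy data are hypothetical, and the surrounding motion projection and post-processing are not shown.

```python
from collections import Counter
from typing import Dict, List


def ratio_histogram(model_bins: List[int], image_bins: List[int]) -> Dict[int, float]:
    """R[j] = min(M[j] / I[j], 1), with M and I raw bin counts of model and image."""
    model_counts, image_counts = Counter(model_bins), Counter(image_bins)
    return {j: min(model_counts[j] / count, 1.0) for j, count in image_counts.items()}


def backproject(frame: List[List[int]], model_bins: List[int]) -> List[List[float]]:
    """Replace every pixel's color bin with its ratio-histogram value."""
    ratio = ratio_histogram(model_bins, [b for row in frame for b in row])
    return [[ratio[b] for b in row] for row in frame]


# Hypothetical data: the object model, taken from the previous object mask,
# is mostly color bin 2.
model = [2, 2, 2, 2, 1]
frame = [[0, 0, 2],
         [0, 2, 2],
         [1, 1, 1]]
for row in backproject(frame, model):
    print(row)
```

Pixels in bin 2 score 1.0 and pixels in bin 0 score 0.0, so the high-valued region marks the likely object position; a tracker would typically smooth or threshold this map around the motion-projected location.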


Semantic Feature Modeling

- Modeling based on the temporal variation of object features
- Boundary shape and motion statistics of the object area

[Figure: object features from the abstracted frame sequence pass through pre-processing and HMM training, drawn as two parallel branches.]
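Both feature types named above, boundary shape and motion statistics of the object area, can be computed from one binary object mask per abstracted frame. A minimal sketch, assuming masks given as 2-D lists of 0/1 with a non-empty object; the boundary-pixel count is a simple stand-in for a shape descriptor, and the (area, boundary, dx, dy) observation layout is an assumption rather than the authors' feature set.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]


def mask_features(mask: Mask) -> Tuple[int, float, float, int]:
    """Return (area, centroid row, centroid col, boundary pixel count)."""
    h, w = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    area = len(pixels)
    cy = sum(r for r, _ in pixels) / area
    cx = sum(c for _, c in pixels) / area
    boundary = 0
    for r, c in pixels:
        neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        # On the boundary if any 4-neighbor is off-image or background.
        if any(not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]
               for nr, nc in neighbors):
            boundary += 1
    return area, cy, cx, boundary


def observation_sequence(masks: List[Mask]) -> List[Tuple[float, float, float, float]]:
    """Per-frame (area, boundary, dx, dy): shape plus centroid motion."""
    obs = []
    prev: Optional[Tuple[float, float]] = None
    for mask in masks:
        area, cy, cx, boundary = mask_features(mask)
        dx, dy = (0.0, 0.0) if prev is None else (cx - prev[1], cy - prev[0])
        obs.append((float(area), float(boundary), dx, dy))
        prev = (cy, cx)
    return obs


# Hypothetical 2x2 object moving one column to the right.
m1 = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
m2 = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(observation_sequence([m1, m2]))  # [(4.0, 4.0, 0.0, 0.0), (4.0, 4.0, 1.0, 0.0)]
```

Sequences like this are what an HMM would consume, typically after normalization or vector quantization.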


HMM Modeling
1. Observation sequence O1, ..., OT of object features
2. Left-right 1-D HMM modeling with states S1, S2, ..., ST
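The left-right 1-D HMM above can score an observation sequence O1, ..., OT with the forward algorithm, and a shot can then be assigned to the event model with the highest likelihood. A minimal sketch, assuming the object features have already been vector-quantized into discrete symbols; the stay-or-advance topology, the three-state model and its probabilities are hypothetical, and training (e.g. Baum-Welch) is omitted.

```python
import math
from typing import List, Sequence


def left_right_transitions(n_states: int, stay: float) -> List[List[float]]:
    """Left-right topology in which each state either stays or advances by one."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        a[i][i], a[i][i + 1] = stay, 1.0 - stay
    a[-1][-1] = 1.0  # the final state is absorbing
    return a


def forward_log_likelihood(obs: Sequence[int], start: Sequence[float],
                           trans: List[List[float]], emit: List[List[float]]) -> float:
    """log P(O1..OT | model) via the forward algorithm with per-step scaling.

    Raises ValueError (from math.log) if the sequence has zero probability.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(O_t)
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [x / scale for x in alpha]  # rescale to avoid underflow
    return log_prob


# Hypothetical 3-state model over 3 quantized feature symbols.
trans = left_right_transitions(3, stay=0.7)
emit = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
start = [1.0, 0.0, 0.0]  # left-right models start in the first state
forward_seq = [0, 0, 1, 1, 2, 2]
reverse_seq = [2, 2, 1, 1, 0, 0]
print(forward_log_likelihood(forward_seq, start, trans, emit))
print(forward_log_likelihood(reverse_seq, start, trans, emit))
```

Because the model can only move forward through its states, the sequence that follows the state order scores much higher than its reverse, which is what makes left-right HMMs suited to events with a fixed temporal progression.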


Video Modeling: Three Layer Structure

[Figure: three-layer structure of video modeling, compared to NLP]
- Low layer: audio-visual feature extraction (video understanding) vs. word recognition (natural language processing)
- Middle layer: frame-based and object-based structural modeling vs. sentence structure & grammar
- High layer: semantic video modeling and content interpretation vs. interpretation


A Multimodal Framework for Video Content Interpretation

- Long-term goal: an automatic TV program scout that lets users request programs at the topic level
- Integrate multiple modalities: visual, audio and textual information
- Multi-level concepts:
  - Low: low-level features
  - Mid: object detection, event modeling
  - High: classification results for semantic content
- Probabilistic model: a Bayesian network for classification (captures causal relationships and domain knowledge)


How to work with the framework?

- Preprocessing
  - Story segmentation (shot detection)
  - VOCR, speech recognition
  - Key-frame selection
- Feature extraction
  - Visual features (from key frames): color, texture, shape, sketch, etc.
  - Audio features: average energy, bandwidth, pitch, mel-frequency cepstral coefficients (MFCCs), etc.
  - Textual features (from the transcript): a knowledge tree with many keyword categories (politics, entertainment, stock, art, war, etc.), word spotting, vote histogram
  - Motion features: camera operations (panning, tilting, zooming, tracking, booming, dollying), motion trajectories of moving objects, object abstraction and recognition
- Building and training the Bayesian network
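For the textual features, word spotting against keyword categories can produce a vote histogram over topics, which could then serve as evidence for the Bayesian network. A minimal sketch, assuming a flat keyword table in place of the knowledge tree; the category names follow the slide but the keyword lists and the `vote_histogram` function are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical keyword lists; a real knowledge tree would be larger and hierarchical.
KEYWORDS: Dict[str, List[str]] = {
    "politics": ["election", "minister", "parliament", "vote"],
    "stock": ["shares", "index", "market", "investors"],
    "war": ["troops", "missile", "ceasefire"],
}


def vote_histogram(transcript: str) -> Dict[str, float]:
    """Spot keywords in the transcript, count votes per category, then normalize."""
    words = re.findall(r"[a-z]+", transcript.lower())
    lookup = {kw: cat for cat, kws in KEYWORDS.items() for kw in kws}
    votes = Counter(lookup[w] for w in words if w in lookup)
    total = sum(votes.values())
    return {cat: votes[cat] / total if total else 0.0 for cat in KEYWORDS}


text = "Investors sold shares as the market index fell after the minister spoke."
print(vote_histogram(text))  # {'politics': 0.2, 'stock': 0.8, 'war': 0.0}
```

Normalizing the votes makes transcripts of different lengths comparable; a hierarchical knowledge tree would additionally let a hit on a specific keyword vote for its parent categories.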


Challenging points

- Preprocessing is significant in the framework:
  - accuracy of key-frame selection
  - accuracy of speech recognition and VOCR
- Good feature extraction is important for classification performance.
- Modeling semantic video objects and events
- How to integrate multiple modalities still needs careful consideration.


Conclusion

- Introduced several basic concepts
- Semantic video modeling and indexing
- Proposed a multimodal framework for topic classification of video
- Discussed challenging problems


Q & A

Thank you!