utilizing video ontology for fast and accurate query-by-example retrieval

23
Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval Kimiaki Shirahama Graduate School of Economics, Kobe University Kuniaki Uehara Graduate School of System Informatics, Kobe University

Upload: sakura

Post on 24-Feb-2016

46 views

Category:

Documents


0 download

DESCRIPTION

Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval. Kimiaki Shirahama Graduate School of Economics, Kobe University Kuniaki Uehara Graduate School of System Informatics, Kobe University. Need for Video Ontology. QBE (Query by Example): - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Kimiaki ShirahamaGraduate School of Economics, Kobe University

Kuniaki UeharaGraduate School of System Informatics, Kobe University

Page 2: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Need for Video OntologyQBE (Query by Example):1. Represent a query by providing example shots2. Retrieve shots similar to example shots in terms of features

The similarity on features does not agree with the similarity on semantic contents!

Characterized by few edges corresponding to sky regions (Overfitting)

(Example shots) (Retrieved shots)Query: Tall buildings are shown

Relevant

Irrelevant

Page 3: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Our Use of Video Ontology

(Example shots)

Query: Tall buildings are shownRetained

Filtered

Construct a video ontology as knowledge base in QBE(Formal and explicit specification of concepts, concept properties and their relations)→ Filter shots that are clearly irrelevant to the query

Building: 0.9Urban: 0.7Person: 0.1

Building: 0.8Urban: 0.8Person: 0

Building: 0.9Urban: 0.6Person: 0

Building: 0.8Urban: 0.9Person: 0.2

Building: 0.1Urban: 0.0Person: 0.8

Building: 0.0Urban: 0.0Person: 0.1

Building: 0.0Urban: 0.1Person: 0

Building: 0.9Urban: 0.7Person: 0.1

Concept detection scores arealready assigned to all shots.

Page 4: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Overview of Our Approach

1. Video ontology construction: Organize concepts into a meaningful structure2. Concept selection: Select concepts related to a given query3. Shot filtering: Filter irrelevant shots based on selected concepts

a. Filter as many irrelevant shots as possibleb. Retain as many relevant shots as possible

Building

Outdoor

Tower Window

Building

Tower

Outdoor

Window

House

House

Concept vocabulary Video ontology

Query: Buildings are shown

Concept selection:Buildings, Outdoor,Tower, etc.

Retained

Filtered

Shot filtering

Page 5: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Large-Scale Concept Ontology for Multimedia (LSCOM)

The most popular ontology for video retrieval 1,000 concepts in broadcast news video domain

Broadcast_News

Location People Objects

Office

Meeting

Crowd

Face

Bus

Vehicle

Program

Weather

Sports

Activity andevents

People related

Walk/Run

Graphics

Maps

Charts

Concept properties and relations are clearly insufficient!→ Organize LSCOM concepts into a meaningful structure!

Many research effort to develop effective detection methods for LSCOM concepts→ Use detection scores of 374 concepts provided by City University of Hong Kong

Page 6: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

appearWith

Concept Organization Approach

Limited kinds of reasoning(e.g. majority voting, linear combination) Various kinds of reasoning!

Inductive approachUse annotated video collection (training data)Only degrees of relatedness are represented

Deductive approach (our approach)Manually organize LSCOM conceptsVarious properties and relations are represented

Tower House

OBJECT

Vehicle

WindowpartOf

Ground_Vehicle

WITH_PERSON Car

Person

Bicycle Motorcycle

OutdoorlocatedAt

CONSTRUCTION

Building

(Wei, 2011)

Page 7: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Our Video OntologyShot with 8 attributes (categories of semantic contents)

Disjoint partition Visual co-occurrence

Define new concepts to construct a meaningful structure

Page 8: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

appearWith

Video Ontology Construction

Ground_Vehicle

1. Disjoint partition

Car

Vehicle

Cannot be an instance of morethan one subconcept!

Ground_Vehicle

Car

Vehicle

Bus

2. Visual co-occurrence

Ground_Vehicle

WITH_PERSON

Bicycle

Car

Some concepts are frequently shownin the same shots

All concepts should not be organizedinto a single hierarchy

Motorcycle

Person

Page 9: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Overview of Our Approach

1. Video ontology construction: Organize concepts into a meaningful structure2. Concept selection: Select concepts related to a given query3. Shot filtering:

a. Filter as many irrelevant shots as possibleb. Retain as many relevant shots as possible

Building

Outdoor

Tower Window

Building

Tower

Outdoor

Window

House

House

Concept vocabulary Video ontology

Query: Buildings are shown

Concept selection:Buildings, Outdoor,Tower, etc.

Retained

Filtered

Shot filtering

Page 10: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Concept Selection1. Select concepts matching words in the text description of a query2. Select their subconcepts, and concepts specified as properties3. Select concepts with properties matching words in the text description of the query

Query: Buildings are shown

4. Validate selected concepts using example shots→ Concepts detected with high detection scores are selected

Page 11: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Overview of Our Approach

1. Video ontology construction: Organize concepts into a meaningful structure2. Concept selection: Select concepts related to a given query3. Shot filtering:

a. Filter as many irrelevant shots as possibleb. Retain as many relevant shots as possible

Building

Outdoor

Tower Window

Building

Tower

Outdoor

Window

House

House

Concept vocabulary Video ontology

Query: Buildings are shown

Concept selection:Buildings, Outdoor,Tower, etc.

Retained

Filtered

Shot filtering

Page 12: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Shot Filtering Using Concept Relations

(Wrongly retained shot)

Query: Person appears with computers

Person, Female_Person Computer, Room, Laboratory

Use concept relations to efficiently filter irrelevant shots

Indoor object Indoor locations

Shots where Outdoor isdetected should be filtered!

Simple filtering: Filter shots if none ofselected concepts are detected.

Page 13: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Use of Concept Relations Hierarchical relations

Sibling relations (Disjoint partition)

Building

Office_Building

LOCATION

INDOOROutdoorUnderwaterOuter_Space

Should not be detected

Query: Buildings are shown

IF detected,

THEN should be detected,

Query: Person appears with computers→ INDOOR needs to be detected

Office_Building iswrongly detected

Page 14: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Overview of Our Approach

1. Video ontology construction: Organize concepts into a meaningful structure2. Concept selection: Select concepts related to a given query3. Shot filtering:

a. Filter as many irrelevant shots as possibleb. Retain as many relevant shots as possible

Building

Outdoor

Tower Window

Building

Tower

Outdoor

Window

House

House

Concept vocabulary Video ontology

Query: Buildings are shown

Concept selection:Buildings, Outdoor,Tower, etc.

Retained

Filtered

Shot filtering

Page 15: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Uncertainty in Concept Detection

Indoor fails to be detected

Query: Person appears with computers → INDOOR needs to be detected

Ontologies present a priori knowledge which is taken as true by human.→ Lack the support for uncertainty

Traditional ontology reasoning assumes … (Russell, 2003)1. Locality: If A⇒B, then B is concluded by A without considering any other rules.2. Detachment: Once B is proven, it is used regardless of how it was derived.→ Do not consider the uncertainty of a hypothesis 3. Truth functionality: A complex rule can be examined from the truth of its components.→ Does not consider the uncertainty of combining multiple hypotheses

How to manage erroneous concept detection results?

Page 16: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Dempster-Shafer Theory (1/2)

1. Locality: If A⇒B, then B is concluded by A without considering any other rules.2. Detachment: Once B is proven, it is used regardless of how it was derived.→ Do not consider the uncertainty of a hypothesis ⇒ Represent the degree of belief of a hypothesis

Demspter-Shafer Theory (DST):Generalization of Bayesian theory where the probability of a hypothesis isdefined based on its degree of belief

Degree of belief that a shot is certainly relevantm({relevance}) = 0.2

Degree of belief that its relevance is uncertain m({relevance, irrelevance}) = 0.6

Degree of belief that it is certainly irrelevantm({irrelevance}) = 0.2

Query: Person appears with computers → INDOOR needs to be detected

Page 17: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Dempster-Shafer Theory (2/2)

3. Truth functionality: A complex rule can be examined from the truth of its components.→ Does not consider the uncertainty of combining multiple hypotheses ⇒ Combination rule for integrating uncertain hypotheses

Filtering by video ontologym({relevance}) = 0.2m({relevance, irrelevance}) = 0.6m({irrelevance}) = 0.2

Filtering by visual –based approachm({relevance}) = 0.7m({relevance, irrelevance}) = 0.2m({irrelevance}) = 0.1

Conflict

Agreement

Query: Persons appear with computers → INDOOR needs to be detected

jm({relevant}) = 0.71

Page 18: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Experimental SettingTRECVID 2009 video data

219 development videos (36,106 shots) → Manually select 10 example shots for each query

619 test videos (97,150 shots)→ Retrieve shots matching the query

Target Queries• Query 1: A view of one or more tall buildings and the top story visible• Query 2: One or more people, each at a table or desk with a computer visible• Query 3: One or more people, each sitting in a char, talking

Evaluation measures1. Precision: The fraction of retained shots that are relevant to a query.2. Recall: The fraction of relevant shots that are successfully retained.3. Filter recall: The fraction of irrelevant shots that are successfully filtered.4. Retrieval performance: The number of relevant shots within 1,000 retrieved shots

Retrieval method: (Shirahama, 2011)

Page 19: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Importance of Concept SelectionOnto WordNet Visual

Shot filtering does not heavily rely on concept selection methods.(Selected concepts are similar to each other)

Filter shots if none of selected concepts are detected

Examine the importance of using concept relations for shot filtering

Query 1 Query 2 Query 3 Query 1 Query 2 Query 3 Query 1 Query 2 Query 3

(a) Precision (b) Recall (c) Filter recall

Page 20: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Effectiveness of Using Concept Relations and DST

Onto: Many irrelevant shots are wrongly retained (Filter recall) No-DST: Precision is significantly improved, but recall is degraded DST: Recall is recovered while improving precision and filter recall

Onto (base) No-DST DST

(a) Precision (b) Recall (c) Filter recall

Query 1 Query 2 Query 3 Query 1 Query 2 Query 3 Query 1 Query 2 Query 3

Best!Concept relations and DST are useful to achieve shot filtering, which not only filters many irrelevant shots, but also retains many relevant shots.

Page 21: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Retrieval Performance and Time

Shot filtering by our video ontology is effective for bothimproving the retrieval performance and reducing the retrieval time!

Without using shot filteringUsing shot filtering

(a) Retrieval performance (b) Retrieval time

Query 1 Query 2 Query 3 Query 1 Query 2 Query 3

Page 22: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Conclusion and Future Works

ConclusionVideo ontology construction and utilization for fast and accurate QBE1. Video ontology construction based on design patterns of general ontologies

Disjoint partition and visual co-occurrence2. Concept selection by tracing the video ontology3. Shot filtering

Improve precision → Concept relationsImprove recall → Dempster-Shafer theory (DST)

Future works Compute detection scores for self-defined concepts Estimate optimal parameter in Basic Belief Assignment (BBA) functions in DST Reduce retrieval time by parallelizing the retrieval process

Page 23: Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval

Thank you!