Correlative Multi-Label Video Annotation

Post on 25-Jun-2015


DESCRIPTION

This work won the best paper award at ACM Multimedia 2007, a leading conference in multimedia research. Automatically annotating concepts in video is key to semantic-level video browsing, search and navigation. Research on this topic has evolved through two paradigms. The first paradigm used binary classification to detect each concept in a concept set individually. It achieved only limited success because it did not model the inherent correlations between concepts, e.g., urban and building. The second paradigm added a second step on top of the individual concept detectors to fuse multiple concepts. However, its performance varies because errors incurred in the first detection step can propagate to the second fusion step and degrade overall performance. To address these issues, we propose a third paradigm that simultaneously classifies concepts and models the correlations between them in a single step, using a novel Correlative Multi-Label (CML) framework. We compare our approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set and report superior performance.

TRANSCRIPT

ACM Multimedia 2007

Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, Tao Mei and Hong-Jiang Zhang

Microsoft Research Asia

September 25, 2007

Motivation
Correlative Multi-Label Annotation
Modeling correlations
Learning the classifier
Connections to Gibbs Random Field
Experiments
Live Demo

How many images and videos in the world?

May 2007: 500 million
Aug. 2007: 1 billion
2000 images/minute
Sep. 2007: 84 million

1970s-80s: Manual Labeling
1990s: Pure Content Based (QBE)
Now: Automated Annotation (Learning-Based)

[Figure: timeline with a Year axis from 1970 to 2000, moving from manual to automatic, learning-based annotation]

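Learning-based video annotation schemes

[Figure: training samples labeled with concepts (Person, Grass, Tree, Building, Road, Face) go through feature extraction, then modeling and learning, to produce a classifier; the classifier is applied to a new sample, e.g. "Lake?"]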

A typical strategy: Individual Concept Detection

Annotate multiple concepts separately

[Figure: low-level features feed six independent detectors (Outdoor, Face, Person, People-Marching, Road, Walking-Running), each outputting -1 / 1]
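
The individual-detection strategy above can be sketched as one independent binary classifier per concept. This is only an illustration, not the detectors used in the talk: the linear scorers, the made-up `weights` and `biases`, and the function name `detect_individually` are all assumptions introduced for the example.

```python
CONCEPTS = ["Outdoor", "Face", "Person", "People-Marching", "Road", "Walking-Running"]

def detect_individually(x, weights, biases):
    """Label each concept with its own linear detector, ignoring all other concepts."""
    labels = []
    for w, b in zip(weights, biases):
        # The score of a concept uses only that concept's weights (hypothetical values).
        score = sum(wd * xd for wd, xd in zip(w, x)) + b
        labels.append(1 if score >= 0 else -1)
    return labels

# Toy usage: 3-dimensional low-level features and made-up detector weights.
x = [0.1, 0.2, 0.3]
weights = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
           [0.0, -1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
biases = [0.0] * len(CONCEPTS)
labels = detect_individually(x, weights, biases)
print(dict(zip(CONCEPTS, labels)))
```

Because no detector sees another detector's output, a correlation such as the one between Person and Walking-Running cannot influence any decision here.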

√ Person, √ Street, √ Building
× Beach, × Mountain
√ Crowd, √ Outdoor, √ Walking/Running
? Marching, then √ Marching

Another typical strategy: Fusion-Based

Context Based Concept Fusion (CBCF)

[Figure: low-level features feed the six individual detectors (Outdoor, Face, Person, People-Marching, Road, Walking-Running); their scores form a concept model vector, which a concept fusion step maps to the final -1 / 1 outputs]
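
The two-step fusion strategy can be sketched as follows: step one produces one score per concept (the concept model vector), and step two decides each label from the whole vector. This is a sketch under assumptions, not the CBCF method itself: the linear fusion layer, all numeric values, and the function names `concept_scores` and `fuse` are hypothetical.

```python
def concept_scores(x, weights, biases):
    """Step 1: independent detectors produce one real-valued score per concept."""
    return [sum(wd * xd for wd, xd in zip(w, x)) + b for w, b in zip(weights, biases)]

def fuse(score_vector, fusion_weights, fusion_biases):
    """Step 2: each final label is decided from the whole concept model vector."""
    labels = []
    for row, b in zip(fusion_weights, fusion_biases):
        fused = sum(f * s for f, s in zip(row, score_vector)) + b
        labels.append(1 if fused >= 0 else -1)
    return labels

# Toy usage with two concepts and made-up numbers.
x = [0.5, -0.2]
weights = [[1.0, 0.0], [0.0, 1.0]]          # one detector per concept
biases = [0.0, 0.0]
scores = concept_scores(x, weights, biases)  # concept model vector

# Concept 1 also looks at the score of concept 0 (hypothetical fusion weights).
fusion_weights = [[1.0, 0.0], [1.0, 1.0]]
labels = fuse(scores, fusion_weights, [0.0, 0.0])
print(scores, labels)
```

In this toy case the second concept has a negative detector score but is labeled positive after fusion. The same dependency means that an error in a step-one score feeds directly into step two.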

Our strategy: Integrated Concept Detection

Correlative Multi-Label Learning (CML)

[Figure: low-level features feed a single model in which the six concepts (Outdoor, Face, Person, People-Marching, Road, Walking-Running) are linked to each other and labeled -1 / 1 jointly]

Multi-Label Annotation

1st Paradigm, Individual Detectors: no correlation
2nd Paradigm, Fusion Based: has correlations, but uses a second step
3rd Paradigm, Integrated: models concepts and correlations in one step

How to model concepts and the correlations among concepts in a single step

Notations

Modeling concepts and correlations simultaneously

Example: x = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6), y = (person: ±1, road: ±1, beach: ±1, car: ±1, tree: ±1)

Feature/concept elements (concept present: Yes / No), where D is the feature dimension and K the number of concepts:

$$\theta^{l}_{d,p}(x, y) = x_d \cdot \delta[y_p = l], \quad l \in \{+1, -1\},\ 1 \le d \le D,\ 1 \le p \le K$$

Modeling concepts and correlations simultaneously

Concept/concept elements (each concept present: Yes / No):

$$\theta^{m,n}_{p,q}(x, y) = \delta[y_p = m] \cdot \delta[y_q = n], \quad m, n \in \{+1, -1\},\ 1 \le p < q \le K$$

Modeling concepts and correlations

Dimension of θ(x, y): 2K(D + K − 1)
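
A minimal sketch of how such a joint feature vector could be built, assuming it has one feature/concept element per (feature d, concept p, label ±1) and one concept/concept element per (concept pair p < q, label pair). The label values in `y`, the element ordering, and the function names `theta` and `predict` are assumptions made for the example. The exhaustive search in `predict` visits all 2^K label vectors, so it is only practical for a toy K.

```python
from itertools import product

def theta(x, y):
    """Joint feature vector over features x (length D) and labels y (length K, entries +1/-1)."""
    D, K = len(x), len(y)
    vec = []
    # Feature/concept elements: x_d if concept p carries label l, else 0.
    for p in range(K):
        for l in (1, -1):
            for d in range(D):
                vec.append(x[d] if y[p] == l else 0.0)
    # Concept/concept elements: 1 if concepts p and q carry labels (m, n), else 0.
    for p in range(K):
        for q in range(p + 1, K):
            for m in (1, -1):
                for n in (1, -1):
                    vec.append(1.0 if (y[p] == m and y[q] == n) else 0.0)
    return vec

def predict(x, w, K):
    """Return the label vector maximizing <w, theta(x, y)> by exhaustive search."""
    def score(y):
        return sum(wi * ti for wi, ti in zip(w, theta(x, y)))
    return max(product((1, -1), repeat=K), key=score)

x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
y = (1, 1, -1, 1, -1)          # illustrative labels for person, road, beach, car, tree
D, K = len(x), len(y)
features = theta(x, y)
print(len(features), 2 * K * (D + K - 1))
```

The two printed numbers agree: 2KD feature/concept elements plus 4 · K(K − 1)/2 concept/concept elements gives 2K(D + K − 1) in total.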

Learning the classifier

Misclassification error
Loss function
Empirical risk
Regularization
Introduce slack variables
Lagrange dual
Find solution by SMO
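
The first steps of this pipeline can be illustrated with a small sketch. It makes several assumptions: the misclassification error is taken to be the Hamming distance between label vectors, the loss is a structured hinge loss (whose value plays the role of the slack variable), and the trade-off `lam` and step size `lr` are hypothetical. It also swaps in a plain primal subgradient step in place of the Lagrange dual and SMO named on the slide, purely to keep the example short.

```python
from itertools import product

def theta(x, y):
    """Joint feature vector: feature/concept elements, then concept/concept elements."""
    vec = [xd if yp == l else 0.0 for yp in y for l in (1, -1) for xd in x]
    K = len(y)
    vec += [1.0 if (y[p], y[q]) == (m, n) else 0.0
            for p in range(K) for q in range(p + 1, K)
            for m in (1, -1) for n in (1, -1)]
    return vec

def score(w, x, y):
    return sum(wi * ti for wi, ti in zip(w, theta(x, y)))

def labelings(K):
    return list(product((1, -1), repeat=K))

def hamming(y_true, y):
    """Misclassification error: number of concepts labeled differently."""
    return sum(a != b for a, b in zip(y_true, y))

def hinge_loss(w, x, y_true):
    """Largest margin violation over all label vectors (never negative)."""
    base = score(w, x, y_true)
    return max(hamming(y_true, y) + score(w, x, y) - base
               for y in labelings(len(y_true)))

def objective(w, data, lam):
    """Regularized empirical risk: 0.5 * ||w||^2 + lam * mean hinge loss."""
    reg = 0.5 * sum(wi * wi for wi in w)
    risk = sum(hinge_loss(w, x, y) for x, y in data) / len(data)
    return reg + lam * risk

def subgradient_step(w, data, lam, lr):
    """One subgradient step on the objective (a stand-in for solving the dual)."""
    grad = list(w)  # gradient of the regularizer
    for x, y_true in data:
        # Most violating label vector under the current weights.
        y_hat = max(labelings(len(y_true)),
                    key=lambda y: hamming(y_true, y) + score(w, x, y))
        for i, (t_hat, t_true) in enumerate(zip(theta(x, y_hat), theta(x, y_true))):
            grad[i] += lam * (t_hat - t_true) / len(data)
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# Toy data: two samples, D = 2 features, K = 2 concepts.
data = [([1.0, 0.0], (1, 1)), ([0.0, 1.0], (-1, -1))]
w0 = [0.0] * len(theta(*data[0]))
w1 = subgradient_step(w0, data, lam=1.0, lr=0.1)
print(objective(w0, data, lam=1.0), objective(w1, data, lam=1.0))
```

On this toy data the objective drops after one step. Enumerating every label vector costs 2^K evaluations per sample, so a practical solver would need something more efficient.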

Connection to Gibbs Random Field

Define a random field: the concept labels form a random field whose neighborhood system consists of all adjacent sites, that is, this RF is fully connected
Rewrite the classifier
Define energy function
Define GRF
Intuitive explanation of CML
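
One way to read the connection is that a score over label vectors, negated, acts as an energy H(y), and the Gibbs distribution P(y) = exp(-H(y)) / Z then puts the most probability on the highest-scoring labeling. The sketch below illustrates only this standard construction. The toy score with per-concept and pairwise terms, all numeric values, and the names `gibbs_distribution` and `toy_score` are assumptions, not the formulas from the slides.

```python
import math
from itertools import product

def gibbs_distribution(score, K):
    """P(y) = exp(-H(y)) / Z with energy H(y) = -score(y), over all 2^K label vectors."""
    labelings = list(product((1, -1), repeat=K))
    energies = [-score(y) for y in labelings]
    weights = [math.exp(-h) for h in energies]
    Z = sum(weights)  # partition function
    return {y: wgt / Z for y, wgt in zip(labelings, weights)}

# Toy score for K = 3 concepts: one term per site plus one term per pair of sites.
# Every pair has a term, which mirrors a fully connected random field.
unary = [0.5, -0.2, 0.1]                              # hypothetical site potentials
pairwise = {(0, 1): 0.8, (0, 2): -0.3, (1, 2): 0.4}   # hypothetical pair potentials

def toy_score(y):
    s = sum(u * yp for u, yp in zip(unary, y))
    s += sum(v * y[p] * y[q] for (p, q), v in pairwise.items())
    return s

probs = gibbs_distribution(toy_score, K=3)
best_by_prob = max(probs, key=probs.get)
best_by_score = max(product((1, -1), repeat=3), key=toy_score)
print(best_by_prob, best_by_score)
```

Because the exponential is monotone, maximizing the score and maximizing the Gibbs probability select the same label vector.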

Experiments

TRECVID 2005 dataset (170 hours)
39 concepts (LSCOM-Lite)
Training (65%), Validation (16%), Testing (19%)
CML (MAP=0.290) improves IndSVM (MAP=0.246) by 17% and CBCF (MAP=0.253) by 14%

[Figure: bar charts comparing CML, CBCF and SVM; annotations read "SVM → CML ↑ 17%, CBCF → CML ↑ 14%" on one chart and "SVM → CML ↑ 131%, CBCF → CML ↑ 128%" on another]
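
The reported improvements are relative gains in MAP, which can be checked from the three MAP values on the slide. The helper name `relative_gain` is introduced here only for illustration.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline

# MAP values as reported on the slide.
map_cml, map_indsvm, map_cbcf = 0.290, 0.246, 0.253

gain_over_svm = relative_gain(map_cml, map_indsvm)
gain_over_cbcf = relative_gain(map_cml, map_cbcf)
print(f"{gain_over_svm:.1%} over IndSVM, {gain_over_cbcf:.1%} over CBCF")
```

Both values fall in the ranges 17-18% and 14-15%, which matches the 17% and 14% quoted on the slide if those figures are truncated to whole percentages.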

Correlative Multi-Label Video Annotation
A new paradigm for multi-label annotation
Models correlations and concepts simultaneously
Has a close connection to Gibbs Random Field

Multi-Instance Multi-Label Annotation
Exploits correlations among concepts and among instances at the same time
Yields not only image/frame-level annotation but also region-level annotation

[Figure: a Scenery image with regions labeled Sky, Mountain, Water, Sands]
