correlative multi-label video annotation

30
ACM Multimedia 2007 ACM Multimedia 2007 Guo-Jun Qi, Guo-Jun Qi, Xian-Sheng Hua Xian-Sheng Hua , Yong Rui, Jinhui Tang, Tao , Yong Rui, Jinhui Tang, Tao Mei and Hong-Jiang Zhang Mei and Hong-Jiang Zhang Microsoft Research Asia Microsoft Research Asia September 25, 2007 September 25, 2007

Upload: xian-sheng-hua

Post on 25-Jun-2015

1.080 views

Category:

Technology


1 download

DESCRIPTION

This work won the best paper award in the most prestigious conference in multimedia research. Automatically annotating concepts for video is a key to semantic-level video browsing, search and navigation. The research on this topic evolved through two paradigms. The first paradigm used binary classification to detect each individual concept in a concept set. It achieved only limited success, as it did not model the inherent correlation between concepts, e.g., urban and building. The second paradigm added a second step on top of the individual-concept detectors to fuse multiple concepts. However, its performance varies because the errors incurred in the first detection step can propagate to the second fusion step and therefore degrade the overall performance. To address the above issues, we propose a third paradigm which simultaneously classifies concepts and models correlations between them in a single step by using a novel Correlative Multi-Label (CML) framework. We compare the performance between our proposed approach and the state-of-the-art approaches in the first and second paradigms on the widely used TRECVID data set. We report superior performance from the proposed approach.

TRANSCRIPT

Page 1: Correlative Multi-Label Video Annotation

ACM Multimedia 2007ACM Multimedia 2007

Guo-Jun Qi, Guo-Jun Qi, Xian-Sheng HuaXian-Sheng Hua, Yong Rui, Jinhui Tang, Tao Mei and , Yong Rui, Jinhui Tang, Tao Mei and Hong-Jiang ZhangHong-Jiang Zhang

Microsoft Research AsiaMicrosoft Research Asia

September 25, 2007September 25, 2007

Page 2: Correlative Multi-Label Video Annotation

MotivationMotivation Correlative Multi-Label AnnotationCorrelative Multi-Label Annotation Modeling correlationsModeling correlations Learning the classifierLearning the classifier Connections to Gibbs Random FieldConnections to Gibbs Random Field

Experiments Experiments Live DemoLive Demo

2

Page 3: Correlative Multi-Label Video Annotation

How many images and videos in the How many images and videos in the world?world?

3

May 2007: 500

millionsAug. 2007 : 1

billion2000 images

/minute

Sep. 2007 : 84

millions

Page 4: Correlative Multi-Label Video Annotation

70 - 80’ Manual Labeling

90’ Pure Content Based (QBE)

Now Automated Annotation

Year

Manual

Automatic

Learning-Based

1970 1980 1990 2000

Page 5: Correlative Multi-Label Video Annotation

Now Automated Annotation

Learning-Based

Modeling and

Learning

Classifier

Training samples

Features

Learning-based video annotation schemes

Person

Grass

Tree

Building

Road

Face

New sampleLake?

Page 6: Correlative Multi-Label Video Annotation

A typical strategy – A typical strategy – Individual Concept Individual Concept DetectionDetection

Annotate multiple concepts separatelyAnnotate multiple concepts separately

6

Low-Level Features

Outdoor Face PersonPeople-

MarchingRoad

Walking- Running

-1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1

Page 7: Correlative Multi-Label Video Annotation

7

√ Person√ Street√ Building

× Beach× Mountain

√ Crowd√ Outdoor√ Walking/Running

√ Marching? Marching

Page 8: Correlative Multi-Label Video Annotation

Low-Level Features

Outdoor Face PersonPeople-Marchin

gRoad

Walking- Running

-1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1

8

Low-Level Features

Outdoor Face PersonPeople-Marchin

gRoad

Walking- Running

-1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1

Concept Model Vector

Score Score Score Score Score Score

Concept Fusion

Another typical strategy – Another typical strategy – Fusion-BasedFusion-Based Context Based Concept fusion (CBCF)Context Based Concept fusion (CBCF)

Page 9: Correlative Multi-Label Video Annotation

9

Low-Level Features

Outdoor Face PersonPeople-

MarchingRoad

Walking- Running

-1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1

Concept Fusion

Concept Model Vector

Score Score Score Score Score Score

Page 10: Correlative Multi-Label Video Annotation

10

Our strategy – Our strategy – Integrated Concept Integrated Concept DetectionDetection

Correlative Multi-Label Learning (CML)Correlative Multi-Label Learning (CML)

-1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1 -1 / 1

Low-Level Features

OutdoorPeople-

MarchingRoadFace Person

Walking- Running

Page 11: Correlative Multi-Label Video Annotation

11

Multi-Label Annotation

No correlation

Has Correlations, but uses a second step

Model concepts and correlations in one step

Individual Detectors

Fusion Based

Integrated

1st Paradigm

2nd Paradigm

3rd Paradigm

Page 12: Correlative Multi-Label Video Annotation

Our strategy – Our strategy – Integrated Concept Integrated Concept DetectionDetection

Correlative Multi-Label Learning (CML)Correlative Multi-Label Learning (CML)

Page 13: Correlative Multi-Label Video Annotation

13

How to model concepts and the How to model concepts and the correlations among concept in a single correlations among concept in a single stepstep

Page 14: Correlative Multi-Label Video Annotation

NotationsNotations

14

Page 15: Correlative Multi-Label Video Annotation

Modeling concept and correlations Modeling concept and correlations simultaneouslysimultaneously

1 1

15

6.0,5.0,4.0,3.0,2.0,1.0x

1:,1:,1:,1:,1: treecarbeachroadperson y

02.002.0

1.0001.0

01.0

12,2

12,2

11,2

11,2

13,1

13,1

12,1

12,1

11,1

11,1

NoYesconceptfeature

/,

-

Page 16: Correlative Multi-Label Video Annotation

Modeling concept and correlations Modeling concept and correlations simultaneouslysimultaneously

1 1

16

6.0,5.0,4.0,3.0,2.0,1.0x

1:,1:,1:,1:,1: treecarbeachroadperson y

0010

00011,1

3,11,1

3,11,1

3,11,13,1

112,1

112,1

1,12,1

1,12,1

,,

NYNYConceptConcept

/,/2,1

-

Page 17: Correlative Multi-Label Video Annotation

Modeling concept and correlationsModeling concept and correlations

17

12 KDK

Page 18: Correlative Multi-Label Video Annotation

Learning the classifierLearning the classifier

Misclassification Error

Loss function

Empirical risk

Regularization

Introduce slackvariables

Lagrange dual

Find solution by SMO

18

Page 19: Correlative Multi-Label Video Annotation

Connection to Gibbs Random FieldConnection to Gibbs Random Field

Define a random field

19

Rewrite the classifier

is a random field

consists of all adjacent sites, that is, this RF is fully connected

Define energy functionDefine GRF

Page 20: Correlative Multi-Label Video Annotation

Connection to Gibbs Random FieldConnection to Gibbs Random Field

Rewrite the classifier

20

Define energy function

Intuitive explanation of CML

Define a random field

is a random field

consists of all adjacent sites, that is, this RF is fully connected

Define GRF

Page 21: Correlative Multi-Label Video Annotation

ExperimentsExperiments TRECVID 2005 dataset (170 hours)TRECVID 2005 dataset (170 hours) 39 concepts (LSCOM-Lite)39 concepts (LSCOM-Lite) Training (65%), Validation (16%), Testing (19%)Training (65%), Validation (16%), Testing (19%)

21

Page 22: Correlative Multi-Label Video Annotation

ExperimentsExperiments TRECVID 2005 dataset (170 hours)TRECVID 2005 dataset (170 hours) 39 concepts (LSCOM-Lite)39 concepts (LSCOM-Lite) Training (65%), Validation (16%), Testing (19%)Training (65%), Validation (16%), Testing (19%) CML (CML (MAP=0.290MAP=0.290) improves IndSVM () improves IndSVM (MAP=0.246MAP=0.246) 17% and CBCF ) 17% and CBCF

((MAP=0.253MAP=0.253) 14%) 14%

22

CMLCBCFSVM

SVM CML ↑ 17%CBCF CML ↑14%

Page 23: Correlative Multi-Label Video Annotation

ExperimentsExperiments TRECVID 2005 dataset (170 hours)TRECVID 2005 dataset (170 hours) 39 concepts (LSCOM-Lite)39 concepts (LSCOM-Lite) Training (65%), Validation (16%), Testing (19%)Training (65%), Validation (16%), Testing (19%) CML (CML (MAP=0.290MAP=0.290) improves IndSVM () improves IndSVM (MAP=0.246MAP=0.246) 17% and CBCF ) 17% and CBCF

((MAP=0.253MAP=0.253) 14%) 14%

23

CMLCBCFSVM

SVM CML ↑ 131%CBCF CML ↑128%

Page 24: Correlative Multi-Label Video Annotation

ExperimentsExperiments TRECVID 2005 dataset (170 hours)TRECVID 2005 dataset (170 hours) 39 concepts (LSCOM-Lite)39 concepts (LSCOM-Lite) Training (65%), Validation (16%), Testing (19%)Training (65%), Validation (16%), Testing (19%) CML (CML (MAP=0.290MAP=0.290) improves IndSVM () improves IndSVM (MAP=0.246MAP=0.246) 17% and CBCF ) 17% and CBCF

((MAP=0.253MAP=0.253) 14%) 14%

24

CMLCBCFSVM CMLCBCFSVM CMLCBCFSVM

Page 25: Correlative Multi-Label Video Annotation

ExperimentsExperiments TRECVID 2005 dataset (170 hours)TRECVID 2005 dataset (170 hours) 39 concepts (LSCOM-Lite)39 concepts (LSCOM-Lite) Training (65%), Validation (16%), Testing (19%)Training (65%), Validation (16%), Testing (19%) CML (CML (MAP=0.290MAP=0.290) improves IndSVM () improves IndSVM (MAP=0.246MAP=0.246) 17% and CBCF ) 17% and CBCF

((MAP=0.253MAP=0.253) 14%) 14%

25

Page 26: Correlative Multi-Label Video Annotation

26

Page 27: Correlative Multi-Label Video Annotation

Correlative Multi-Label Video AnnotationCorrelative Multi-Label Video Annotation A new paradigm for multi-label annotationA new paradigm for multi-label annotation Models correlations and concepts Models correlations and concepts

simultaneouslysimultaneously Has a close connection to Gibbs Random FieldHas a close connection to Gibbs Random Field

27

Page 28: Correlative Multi-Label Video Annotation

Multi-Instance Multi-Label AnnotationMulti-Instance Multi-Label Annotation Exploit correlations among concepts and among Exploit correlations among concepts and among

instances at the same timeinstances at the same time Not only can get image/frame level annotation, Not only can get image/frame level annotation,

but also can get region level annotationbut also can get region level annotation

28

Sky

MountainWater

Sands

Scenery

Page 29: Correlative Multi-Label Video Annotation

29

Page 30: Correlative Multi-Label Video Annotation

Correlative Multi-Label Video AnnotationCorrelative Multi-Label Video Annotation A new paradigm for multi-label annotationA new paradigm for multi-label annotation Models correlations and concepts Models correlations and concepts

simultaneouslysimultaneously Has a close connection to Gibbs Random FieldHas a close connection to Gibbs Random Field

30