Flickr Distance

Flickr Distance ACM Multimedia 2008 Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li Microsoft Research Asia University of Science and Technology of China October 28, 2008


Page 1: Flickr Distance

Flickr Distance
ACM Multimedia 2008

Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li

Microsoft Research Asia
University of Science and Technology of China

October 28, 2008

Page 2: Flickr Distance

Multimedia Information Retrieval: Indexing, Ranking, Clustering, ……, Recommendation, Annotation

Page 3: Flickr Distance

Multimedia Information Retrieval

Image Similarity/Distance

Concept Similarity/Distance

Annotation, Indexing, Ranking, Clustering, ……, Recommendation

Page 4: Flickr Distance

Image Similarity/Distance

Concept Similarity/Distance

Page 5: Flickr Distance

Image Similarity/Distance: Numerous efforts have been made.

Concept Similarity/Distance

Page 6: Flickr Distance

Image Similarity/Distance: Numerous efforts have been made.

Concept Similarity/Distance: More and more used, but not well studied.

Example concepts: Olympic, Sports, Cat, Tiger, Paw

Page 7: Flickr Distance


WordNet Distance

Google Distance

Tag Concurrence Distance

Page 8: Flickr Distance

WordNet Distance

WordNet: 150,000 words

WordNet Distance
- There are quite a few methods to compute it in WordNet.
- The basic idea is to measure the length of the path between two words.

Pros and Cons
- Pros: Built by human experts, so close to human perception.
- Cons: Coverage is limited and difficult to extend.

Page 9: Flickr Distance

Google Distance

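Normalized Google Distance (NGD)
- Reflects the concurrency of two words in Web documents.
- Defined as

  NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}

  where f(x) is the number of pages containing x, f(x, y) is the number of pages containing both x and y, and N is the total number of indexed pages.

Pros and Cons
- Pros: Easy to get and huge coverage.
- Cons: Only reflects concurrency in textual documents. Not really concept distance (semantic relationship).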
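The standard NGD definition (Cilibrasi and Vitányi) can be sketched in a few lines of Python. The function name and the example hit counts are illustrative assumptions, not values from the talk.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts.

    fx, fy: number of pages containing each term (assumed > 0)
    fxy:    number of pages containing both terms
    n:      total number of indexed pages
    """
    if fxy == 0:
        # The two terms never co-occur: treat them as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Hypothetical counts: two terms that always appear together have distance 0.
same = normalized_google_distance(1000, 1000, 1000, 10**6)
# Hypothetical counts: partial overlap gives a value between 0 and 1.
partial = normalized_google_distance(100, 100, 10, 10_000)
```

Because only page counts enter the formula, the measure reflects textual co-occurrence and nothing about image content, which is the limitation the slide points out.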

Page 10: Flickr Distance

Concept Pairs         Google Distance
Airplane – Dog        0.2562
Football – Soccer     0.1905
Horse – Donkey        0.2147
Airplane – Airport    0.3094
Car – Wheel           0.3146

Page 11: Flickr Distance

Tag Concurrence Distance

Image Tag Concurrence Distance (Qi, Hua, et al., ACM MM 2007)
- Reflects how often two tags occur in the same image.
- Based on the same idea as NGD.
- Mostly sparse (> 95% of the entries in the similarity matrix are zero).

Pros and Cons
- Pros: Images are taken into account.
- Cons:
  a) Tags are sparse, so visual concurrency is not well reflected.
  b) Training data is difficult to get.

Figures: similarity matrix for 500 tags; similarity matrix for 50 tags.
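A minimal sketch of a tag concurrence distance, assuming the NGD formula is applied directly to image counts (the slide only says it is "based on the same idea" and does not give the exact formula). The function name and the toy tag lists are hypothetical.

```python
import math
from itertools import combinations

def tag_concurrence_distances(image_tags):
    """Return {(tag_a, tag_b): distance} for every tag pair, NGD-style.

    image_tags: list of tag lists, one per image.
    Pairs that never share an image get distance infinity.
    """
    n = len(image_tags)
    count, pair_count = {}, {}
    for tags in image_tags:
        unique = sorted(set(tags))
        for t in unique:
            count[t] = count.get(t, 0) + 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    distances = {}
    for a, b in combinations(sorted(count), 2):
        fxy = pair_count.get((a, b), 0)
        if fxy == 0:
            distances[(a, b)] = math.inf
            continue
        la, lb = math.log(count[a]), math.log(count[b])
        denom = math.log(n) - min(la, lb)
        # denom is 0 only when both tags appear in every image.
        distances[(a, b)] = (max(la, lb) - math.log(fxy)) / denom if denom > 0 else 0.0
    return distances

# Toy collection: most tag pairs never co-occur, which is the sparsity problem.
toy = [["a", "b"], ["a", "b"], ["a"], ["c"]]
toy_distances = tag_concurrence_distances(toy)
```

With real tag data most pairs fall into the "never co-occur" branch, which matches the slide's remark that the similarity matrix is more than 95% zeros.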

Page 12: Flickr Distance

Concept Pairs         Google Distance    Tag Concurrence Distance
Airplane – Dog        0.2562             0.8532
Football – Soccer     0.1905             0.1739
Horse – Donkey        0.2147             0.4513
Airplane – Airport    0.3094             0.1833
Car – Wheel           0.3146             0.9617

Page 13: Flickr Distance

Different Concept Relationships

- Synonymy: different words but the same meaning (table tennis – ping-pong)
- Visually Similar: similar things or things of the same type (horse – donkey)
- Meronymy: part and the whole (car – wheel)
- Concurrency: exist at the same scene/place (airplane – airport)

Page 14: Flickr Distance

Concept Distance
- Mine from ontology: WordNet distance is good, but coverage is too low.
- Mine from text documents: Google distance’s coverage is very high, but it is for the text domain.
- Mine from image tags: Image tag concurrence distance implicitly uses image information, but tags are too sparse.

Page 15: Flickr Distance

Can we mine concept distance from image content?

Page 16: Flickr Distance

Some Facts

- Semantic concept distance is based on human cognition.
- 80% of human cognition comes from visual information.
- There are around 2.8 billion photos on Flickr (as of Sep 2008).
- On average, each Flickr image has around 8 tags.

Goal: to mine concept distance from a large tagged image collection based on image content.

Example image tags:
- bear, fur, grass, tree
- polar bear, water, sea
- polar bear, fighting, usa

Page 17: Flickr Distance

Overview of Flickr Distance


Concept A: Airplane

Concept B: Airport

Concept Model A

Concept Model B

Flickr Distance (A, B)

Page 18: Flickr Distance

Flickr Distance

Flickr Distance values (table column, in slide order): 0.5151, 0.0315, 0.4231, 0.0576, 0.0708

Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency.

Page 19: Flickr Distance

What We Need

R1: A Good Image Collection
- Large
- High coverage, especially on daily life
- With tags

Page 20: Flickr Distance

What We Need

R2: A Good Concept Representation or Model
- Based on image content
- Can cover wider concept relationships
- Can handle a large concept set

Concept Models
- Discriminative: SVM, Boosting, …
- Generative
  - Global Feature
  - Local Feature
    - w/o Spatial Relation: Bag-of-Words (pLSA, LDA), …
    - w/ Spatial Relation: 2D HMM, MRF, …

Page 21: Flickr Distance

What We Need

Concept Models
- Discriminative: SVM, Boosting, …
- Generative
  - Global Feature
  - Local Feature
    - w/o Spatial Relation: Bag-of-Words, …
    - w/ Spatial Relation: 2D HMM, MRF, …

VLM – Visual Language Model
- Spatial-relation sensitive
- Efficient
- Can handle object variations

Page 22: Flickr Distance

Statistical Language Model

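Example sentence: "I am talking about statistical language model."

Unigram Model

  P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x)

Bigram Model

  P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-1})

Trigram Model

  P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-2} w_{x-1})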
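The unigram, bigram, and trigram assumptions can be illustrated with a maximum-likelihood n-gram estimator. This is a generic text sketch with hypothetical names (`train_ngram`, `prob`); it is not the visual language model itself, which applies the same idea to visual words on a 2D grid.

```python
from collections import Counter

def train_ngram(tokens, n):
    """Estimate P(w_x | previous n-1 words) from counts (no smoothing)."""
    context_counts = Counter()
    ngram_counts = Counter()
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])  # empty tuple for a unigram model
        ngram_counts[context + (tokens[i],)] += 1
        context_counts[context] += 1

    def prob(word, context=()):
        context = tuple(context)
        total = context_counts[context]
        # Unseen contexts get probability 0 in this unsmoothed sketch.
        return ngram_counts[context + (word,)] / total if total else 0.0

    return prob

tokens = "i am talking about statistical language model".split()
unigram = train_ngram(tokens, 1)
bigram = train_ngram(tokens, 2)
trigram = train_ngram(tokens, 3)

p_uni = unigram("model")                          # P(model)
p_bi = bigram("model", ["language"])              # P(model | language)
p_tri = trigram("model", ["statistical", "language"])
```

The longer the context, the more specific the model, at the cost of needing more data to estimate each conditional distribution.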

Page 24: Flickr Distance

Performance of VLM

Comparison on Image Categorization (Caltech, 8 categories / 5097 images)

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14

Page 25: Flickr Distance

Latent-Topic VLM (1)

Why Latent-Topic

Latent-Topic VLM
- Visual variations of a concept are taken as latent topics.

  P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C) = \sum_{k=1}^{K} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, z_k^C) \, P(z_k^C \mid d_j^C)

  where
  C: a concept
  d_j^C: the j-th image in concept C
  z_k^C: the k-th latent topic of concept C
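The latent-topic formula is a mixture: each topic has its own conditional trigram distribution, and an image blends them with its topic weights. A minimal sketch, where the nested-dictionary layout and the name `mix_topics` are assumptions made for illustration:

```python
def mix_topics(topic_probs, topic_weights):
    """Compute P(w | context, d) = sum_k P(w | context, z_k) * P(z_k | d).

    topic_probs[k][context][word] = P(word | context, z_k), where context
    stands for the pair of neighbouring visual words (w_{x-1,y}, w_{x,y-1}).
    topic_weights[k] = P(z_k | d) for one image d.
    """
    mixed = {}
    for probs, weight in zip(topic_probs, topic_weights):
        for context, dist in probs.items():
            row = mixed.setdefault(context, {})
            for word, p in dist.items():
                row[word] = row.get(word, 0.0) + weight * p
    return mixed

# Two hypothetical topics over one context and two visual words.
topics = [
    {("a", "b"): {"c": 0.9, "d": 0.1}},
    {("a", "b"): {"c": 0.2, "d": 0.8}},
]
image_dist = mix_topics(topics, [0.5, 0.5])
```

Since each topic row sums to 1 and the weights sum to 1, every mixed row is again a valid probability distribution.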

Page 26: Flickr Distance

Latent-Topic VLM (2)

Latent-Topic VLM Training
- Solved by the EM algorithm.
- The objective is to maximize the joint distribution of the concept C and its visual word arrangement A_w:

  \text{maximize } p(A_w, C) = \prod_{d_j^C \in C} \prod_{x,y} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C)

- E-step: estimate the posteriors of the hidden topics.
- M-step: maximize the likelihood of the visual arrangement.
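The two EM steps can be sketched as a pLSA-style procedure over conditional trigram counts. This is an illustrative sketch under stated assumptions, not the authors' exact update rules: the data layout (one count dictionary per image, assumed non-empty), the random initialisation, the tiny smoothing constant, and the name `em_latent_topics` are all hypothetical.

```python
import math
import random

def em_latent_topics(images, num_topics, iterations=20, seed=0):
    """images: list of dicts mapping (context, word) -> count, one per image.

    Returns (p_w, p_z, history): p_w[k][(context, word)] = P(word | context, z_k),
    p_z[j][k] = P(z_k | d_j), and the log-likelihood before each update.
    """
    rng = random.Random(seed)
    events = sorted({e for img in images for e in img})

    def normalise_by_context(raw):
        totals = {}
        for (ctx, _), v in raw.items():
            totals[ctx] = totals.get(ctx, 0.0) + v
        return {e: v / totals[e[0]] for e, v in raw.items()}

    p_w = [normalise_by_context({e: rng.random() + 0.1 for e in events})
           for _ in range(num_topics)]
    p_z = [[1.0 / num_topics] * num_topics for _ in images]
    history = []
    for _ in range(iterations):
        new_w = [{e: 1e-12 for e in events} for _ in range(num_topics)]
        new_z = [[0.0] * num_topics for _ in images]
        loglik = 0.0
        for j, img in enumerate(images):
            for e, n in img.items():
                # E-step: posterior of each hidden topic for this observation.
                joint = [p_w[k][e] * p_z[j][k] for k in range(num_topics)]
                total = sum(joint)
                loglik += n * math.log(total)
                for k in range(num_topics):
                    q = joint[k] / total
                    new_w[k][e] += n * q
                    new_z[j][k] += n * q
        # M-step: re-estimate both distributions from the expected counts.
        p_w = [normalise_by_context(t) for t in new_w]
        p_z = [[v / sum(row) for v in row] for row in new_z]
        history.append(loglik)
    return p_w, p_z, history

# Two toy images that prefer different visual words in the same context.
toy_images = [
    {(("a", "b"), "c"): 3, (("a", "b"), "d"): 1},
    {(("a", "b"), "c"): 1, (("a", "b"), "d"): 3},
]
p_w, p_z, history = em_latent_topics(toy_images, num_topics=2, iterations=15)
```

The recorded log-likelihood should not decrease from one iteration to the next (up to the tiny smoothing term), which is the usual sanity check for an EM implementation.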

Page 27: Flickr Distance

Performance of LT-VLM

Comparison on Image Categorization (Caltech, 8 categories / 5097 images)

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14
LT-VLM        94              0.24

Page 28: Flickr Distance

Flickr Distance

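Kullback-Leibler (KL) divergence (topic distance)
- Good, but not symmetric.

  D_{KL}(P_{z_i^{C_1}} \,\|\, P_{z_j^{C_2}}) = \sum_{l} P_{z_i^{C_1}}(l) \log \frac{P_{z_i^{C_1}}(l)}{P_{z_j^{C_2}}(l)}

Jensen-Shannon (JS) divergence (topic distance)
- Better, as it is symmetric.
- The square root of the JS divergence is a metric, and so is Flickr Distance.

  D_{JS}(P_{z_i^{C_1}} \,\|\, P_{z_j^{C_2}}) = \frac{1}{2} D_{KL}(P_{z_i^{C_1}} \,\|\, M) + \frac{1}{2} D_{KL}(P_{z_j^{C_2}} \,\|\, M), \quad M = \frac{1}{2} \left( P_{z_i^{C_1}} + P_{z_j^{C_2}} \right)

Flickr Distance (concept distance)

  D_{Flickr}(C_1, C_2) = \sum_{i=1}^{K} \sum_{j=1}^{K} P(z_i^{C_1} \mid C_1) \, P(z_j^{C_2} \mid C_2) \, D_{JS}(P_{z_i^{C_1}} \,\|\, P_{z_j^{C_2}})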
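The three quantities compose directly: KL divergence between two topic distributions, JS divergence as its symmetrised form, and the concept distance as a weighted sum of JS divergences over all topic pairs. A minimal sketch, assuming each topic's trigram distribution has been flattened into a list of probabilities over the same index set; function names and example values are illustrative.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence via the midpoint distribution M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def flickr_distance(topics_a, weights_a, topics_b, weights_b):
    """Weighted sum of topic-to-topic JS divergences between two concepts.

    topics_x[i]  : distribution of topic i of concept x
    weights_x[i] : P(z_i | concept x)
    """
    return sum(
        wa * wb * js_divergence(pa, pb)
        for pa, wa in zip(topics_a, weights_a)
        for pb, wb in zip(topics_b, weights_b)
    )

# Hypothetical two-topic concepts over a three-element index set.
concept_a = ([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0.6, 0.4])
concept_b = ([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]], [0.5, 0.5])
d_ab = flickr_distance(*concept_a, *concept_b)
d_ba = flickr_distance(*concept_b, *concept_a)  # equal to d_ab by symmetry
```

With natural logarithms the JS divergence is bounded by log 2, so the concept distance stays in a fixed range regardless of vocabulary size.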

Page 29: Flickr Distance

Procedure of Flickr Distance

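1. Start from two concepts, e.g. Concept A: Airplane and Concept B: Airport.
2. Tag search in Flickr for each concept.
3. LT-VLM: build Concept Model A and Concept Model B.
4. Jensen-Shannon Divergence between the two models gives Flickr Distance (A, B).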

Page 30: Flickr Distance

Experiments

Evaluation
- Objective evaluation
- Subjective evaluation

Applications
- Concept clustering
- Image annotation
- Tag recommendation

Page 31: Flickr Distance

Experiments - Configurations

Images
- 6,400,000 from Flickr

Concepts
- 130,000,000 different tags
- 10,000,000 filtered tags
- 1,000 randomly-selected tags

Comparison
- Normalized Google Distance (NGD)
- Tag Concurrence Distance (TCD)
- Flickr Distance (FD)

Page 32: Flickr Distance

Eva1: Subjective Evaluation

Ground-Truth
- 12 persons are asked to score the semantic correlation of each concept pair.
- Average scores are taken as ground-truth.

Evaluate Accuracy of “Relative Distance Pairs”
- Step 1: Find all distance pairs D(a,b) and D(c,d).
- Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth.

Chart: Correct Rate for NGD, TCD, and FD (axis range 0.47 to 0.57)
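The "relative distance pairs" check can be sketched as an order-agreement rate. This sketch assumes the ground truth has already been expressed as a distance (larger means less related); human correlation scores would need to be inverted first. The function name and example values are hypothetical.

```python
from itertools import combinations

def relative_order_accuracy(predicted, ground_truth):
    """Fraction of distance pairs (D(a,b), D(c,d)) ordered as in the ground truth.

    predicted, ground_truth: dicts mapping a concept pair to a distance.
    Pairs that are tied in the ground truth are skipped.
    """
    keys = sorted(set(predicted) & set(ground_truth))
    correct = total = 0
    for p, q in combinations(keys, 2):
        truth = ground_truth[p] - ground_truth[q]
        if truth == 0:
            continue
        guess = predicted[p] - predicted[q]
        total += 1
        if guess * truth > 0:  # same sign means same order
            correct += 1
    return correct / total if total else 0.0

# Hypothetical distances for three concept pairs.
predicted = {("a", "b"): 0.1, ("c", "d"): 0.5, ("e", "f"): 0.6}
truth = {("a", "b"): 1.0, ("c", "d"): 3.0, ("e", "f"): 2.0}
rate = relative_order_accuracy(predicted, truth)
```

Only the ordering matters here, so distance measures on different scales can be compared against the same ground truth.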

Page 33: Flickr Distance

Eva2: Objective Evaluation

Ground-Truth
- WordNet Distance
- Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
- Step 1: Find all distance pairs D(a,b) and D(c,d).
- Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth.

Chart: Correct Rate for NGD, TCD, and FD (axis range 0.45 to 0.54)

Page 34: Flickr Distance

App1: Concept Clustering

Concept Clustering
- 23 concepts; 3 groups: (1) outer space, (2) animal, and (3) sports

Normalized Google Distance
- Group 1: bears, horses, moon, space
- Group 2: bowling, dolphin, donkey, Saturn, sharks, snake, softball, spiders, turtle, Venus, whale, wolf
- Group 3: baseball, basketball, football, golf, soccer, tennis, volleyball

Tag Concurrence Distance
- Group 1: moon, space, Venus, whale
- Group 2: baseball, donkey, softball, wolf
- Group 3: basketball, bears, bowling, dolphin, football, golf, horses, Saturn, sharks, soccer, spiders, tennis, turtle, volleyball

Flickr Distance
- Group 1: moon, Saturn, space, Venus
- Group 2: bears, dolphin, donkey, golf, horses, sharks, spiders, tennis, whale, wolf
- Group 3: baseball, basketball, football, snake, soccer, bowling, softball, volleyball

Page 35: Flickr Distance

App2: Image Annotation

Based on an approach using concept relation
- Dual Cross-Media Relevance Model (DCMRM, J. Liu et al., ACM MM 2007)
- On 79 concepts / 79,000 images

Total number of correctly annotated keywords at the first N words

N    NGD-DCMRM    TC-DCMRM    FD-DCMRM
1    55           53          57
2    212          186         354
3    212          193         423
4    301          310         960

Page 36: Flickr Distance

App3: Tag Recommendation

To Improve Tagging Quality
- Eliminating tag incompleteness, noise, and ambiguity
- 500 images / 10 recommended tags per image

Method                      Precision @ 10
NGD                         0.652
Tag Concurrence Distance    0.665
Flickr Distance             0.758

Page 39: Flickr Distance

Summary

Flickr Distance
- A novel approach to discover semantic relationships from image content
  - based on real-life images from the Web
  - based on collective intelligence from grassroots
- A distance more consistent with human perception
- A measurement more effective in many applications

Page 40: Flickr Distance

Future Work


Flickr Distance as a Service.

Page 41: Flickr Distance

Thank You


Page 42: Flickr Distance

Backup


Page 43: Flickr Distance

TagNet

TagNet – Visual Concept Net

  G = (V, E, W)
  v \in V: concept (node)
  e \in E: semantic relationship (edge)
  w \in W: Flickr Distance (weight)

Can be used in many applications
- Knowledge representation
- Concept learning
- Multimedia retrieval
- ...
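A weighted graph G = (V, E, W) of this kind can be held in a plain adjacency dictionary. The sketch below is an assumption about one convenient representation; the optional `max_distance` cutoff and the example distances are hypothetical and not part of the talk.

```python
def build_tagnet(distances, max_distance=None):
    """Build an undirected weighted graph from pairwise concept distances.

    distances: dict mapping (concept_a, concept_b) -> distance (edge weight).
    max_distance: optional cutoff; pairs farther apart get no edge.
    Returns {concept: {neighbour: distance}}.
    """
    graph = {}
    for (a, b), d in distances.items():
        # Every concept becomes a node, even if all its edges are dropped.
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if max_distance is not None and d > max_distance:
            continue
        graph[a][b] = d
        graph[b][a] = d
    return graph

# Hypothetical distances between three concepts.
net = build_tagnet(
    {("airplane", "airport"): 0.05, ("airplane", "dog"): 0.5},
    max_distance=0.1,
)
```

Storing each edge in both directions keeps neighbour lookups simple, which suits applications such as retrieval that repeatedly ask for the concepts closest to a given one.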

Page 44: Flickr Distance

TagNet

Visualization
- The bigger the distance, the longer the edge.
- Drawn using a tool called NetDraw, provided by the International Network for Social Network Analysis.

Page 45: Flickr Distance

Outline
- Motivation
- Overview
- Visual Language Model
- Flickr Distance Calculation
- Evaluations and Applications

Page 46: Flickr Distance

Semantic Relationship Is Important

Many efforts on using semantic relationships
- GJ Qi et al. Correlative Multi-Label Video Annotation. ACM MM 2007.
- R. Datta et al. Image Retrieval: Ideas, Influences and the Trends of the New Age. ACM Computing Surveys, 2008.
- L. Leslie et al. Annotation of Paintings with High-Level Semantic Concepts Using Transductive Inference and Ontology-based Concept Disambiguation. ACM MM 2007.
- J. Yu et al. Semantic Subspace Projection and Its Application in Image Retrieval. IEEE T CSVT 2008.

Applications of semantic relationships
- Natural language processing
- Object detection
- Concept detection
- Multimedia retrieval

Page 47: Flickr Distance

Discussion

- Why can VLM divergence estimate concept distance?
- Why does FD work well even when tags are not complete?

Example concepts: Computer, TV, Office
- room patterns, computer patterns, other patterns
- room patterns, TV patterns, other patterns
- room patterns, screen patterns, other patterns

VLM: distribution of trigrams

Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency.

Page 49: Flickr Distance

Visual Word Generation

Typical methods
- SIFT + Clustering/PCA

Our method
- Patch + Texture Direction Histogram + Hashing
- Efficient, low-dimensional, and rotation-invariant
- Only needs 1/20 of the computation of the SIFT feature

Pipeline: Image Patch → Patch Gradient → Texture Histogram → Hashing → Visual Word (e.g. 1 0 0 1 0 0 1 0)
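The patch-to-visual-word pipeline can be sketched as follows. The slide does not give the details, so everything specific here is an assumption: central-difference gradients, an 8-bin direction histogram weighted by gradient magnitude, and a hash that sets a bit when a bin exceeds the mean. Rotation invariance is not implemented in this sketch.

```python
import math

def visual_word(patch, bins=8):
    """Map a grayscale patch (list of rows) to a binary code.

    Returns (bits, code) where bits is a list of 0/1 values, one per
    direction bin, and code is the same pattern read as an integer.
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # The extra modulo guards against rounding up to index == bins.
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    mean = sum(hist) / bins
    # Hashing step (assumed): one bit per bin, set when the bin is above average.
    bits = [1 if value > mean else 0 for value in hist]
    return bits, int("".join(map(str, bits)), 2)

# A 4x4 patch whose intensity rises from left to right: all gradients point
# along +x, so only the first direction bin is populated.
ramp = [[x for x in range(4)] for _ in range(4)]
bits, code = visual_word(ramp)
```

A short binary code like this can be used directly as a visual-word index, which avoids the clustering step that SIFT-based vocabularies need.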

Page 50: Flickr Distance

Performance of VLM

Comparison on Image Categorization (Caltech, 8 categories / 5097 images) (L. Wu, et al. MIR 2007/T-MM 2008)

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14
LT-VLM        94              0.24

Page 51: Flickr Distance

Eva1: Objective Evaluation

Ground-Truth
- WordNet Distance
- Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
- Step 1: Find all concept triples (A, B, C).
- Step 2: Get 6 distance pairs for each triple (consider asymmetry).
- Step 3: Compute the correct ratio of each distance pair in terms of order (not value), compared with the ground-truth distance pair.

Example (NGD vs. Ground-Truth for a triple A, B, C): (AB, AC) ×; (AB, BC) √; (AC, BC) √

Page 52: Flickr Distance

Performance of VLM

Comparison on Image Categorization (Caltech, 8 categories / 5097 images)

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14
LT-VLM        94              0.24

Page 53: Flickr Distance

Future Work

Scalability
- Large-scale testing
- TagNet as a service

Other data
- “PicNet Distance” based on different dataset / Optimizing dataset
- Integrating text/tag concurrency distance and Flickr Distance

Concept modeling
- Handling scale variations (multiple-resolution)
- New models

More applications
- Tag ranking
- Query suggestions