MMRetrieval.net: A Multimodal Search Engine

Upload: konstantinos-zagoris

Post on 13-Jan-2015


TRANSCRIPT

Page 1: MultiModal Retrieval Image

MMRetrieval.net: A Multimodal Search Engine

Page 2: MultiModal Retrieval Image

Multimodal Information

• Single-language, text-only retrieval has reached a limit.
• Content-based Image Retrieval is computationally costly and still in its infancy.
• Digital information is increasingly becoming multimodal. Example: Wikipedia.

Page 3: MultiModal Retrieval Image

Modality

• Dictionary: a tendency to conform to a general pattern or belong to a particular group or category.
• Definition of modality in Information Retrieval: it is unclear, fuzzy.
  • 1st definition: Modality = Media
  • 2nd definition: Modality = Data Stream

Page 4: MultiModal Retrieval Image

MMRetrieval.net

• A product of cooperation, started June 2010
• Avi Arampatzis, Lecturer, D.U.T.H.
• Konstantinos Zagoris, Ph.D., D.U.T.H.
• Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.

Page 5: MultiModal Retrieval Image

ImageCLEF 2010 Wikipedia Retrieval Task

• ImageCLEF 2010 Wikipedia Collection, consisting of 237,434 items
  • Images (the primary media)
  • Noisy and incomplete user-supplied textual annotations
  • Wikipedia articles containing the images
• Written in any combination of English, German, French, or any other unidentified language

Page 6: MultiModal Retrieval Image

Wikipedia Collection

<image id="244845" file="images/25/244845.jpg">
  <name>Balloons Festival - Chateaux d'Oex.jpg</name>
  <text xml:lang="en">
    <description/>
    <comment/>
    <caption article="text/en/4/331622">Balloon festival </caption>
  </text>
  <text xml:lang="de">
    <description/>
    <comment/>
    <caption/>
  </text>
  <text xml:lang="fr">
    <description/>
    <comment/>
    <caption/>
  </text>
  <comment>(Balloon festival in Chateaux d'Oex. Category:Chateau d'Oex Category:Hot air balloons) </comment>
  <license>GFDL</license>
</image>

Page 7: MultiModal Retrieval Image

ImageCLEF 2010 Wikipedia Retrieval Task

• 70 test topics, each consisting of a textual and a visual part:
  • three title fields (one per language: English, German, French)
  • one or more example images

Page 8: MultiModal Retrieval Image

Wikipedia Topic

<topic>
  <number>8</number>
  <title xml:lang="en">tennis player on court</title>
  <title xml:lang="de">tennisspieler auf dem platz</title>
  <title xml:lang="fr">joueur de tennis sur le terrain</title>
  <image>2197587684_94542c6fbd.jpg</image>
  <image>777629689_443a25ba08.jpg</image>
</topic>

Page 9: MultiModal Retrieval Image

Extraction of Modalities

• Visual descriptors: Joint Composite Descriptor (JCD), Spatial Color Distribution (SpCD)
• Text fields: description, comment, caption, article, name
• Languages: English, French, German
• Lemur Toolkit V4.11 and Indri V2.11 with the tf.idf retrieval model

Page 10: MultiModal Retrieval Image

MMRetrieval.net Structure

Page 11: MultiModal Retrieval Image

Fusion in Information Retrieval

• combining evidence about relevance from different sources of information, from several modalities
• fusion consists of two components:
  • score normalization
  • score combination

Page 12: MultiModal Retrieval Image

Score Normalization

• the relevance scores of different modalities are not directly comparable
• scores of popular text retrieval models (tf.idf) can be turned into probabilities of relevance via the score-distributional method
• image descriptor scores do not fit this method
• MinMax (maps scores linearly to [0,1])
• Zscore (maps each score to the number of standard deviations it lies above or below the mean score)
• the non-linear Known-Item Aggregate Cumulative Density Function (KIACDF)
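The two linear normalizations named on this slide can be sketched as follows. This is a minimal illustration, not the MMRetrieval.net code: the function names min_max and z_score are hypothetical, and the handling of a constant score list (all zeros) is an assumption. KIACDF is not sketched because the slides do not define it.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Map scores linearly onto [0, 1].

    Assumption: a list whose scores are all equal maps to all zeros.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Express each score as standard deviations above or below the mean.

    Uses the population standard deviation of the score list (an assumption).
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]
```

MinMax bounds the output, which makes weighted mixing straightforward; Zscore is unbounded but less sensitive to a single extreme score setting the range.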

Page 13: MultiModal Retrieval Image

Score Combination

• CompSUM
• CompMULT
• CompMAX
• CompMED
• CompWSUM
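The slide lists only the names of the combination methods. The sketch below reads them in the most literal way (sum, product, maximum, median, weighted sum of an item's normalized per-modality scores); that reading is an assumption drawn from the names alone, and the function names are hypothetical.

```python
from math import prod
from statistics import median


def comp_sum(scores):
    """Sum of an item's normalized scores across modalities."""
    return sum(scores)


def comp_mult(scores):
    """Product of the normalized scores."""
    return prod(scores)


def comp_max(scores):
    """Highest normalized score."""
    return max(scores)


def comp_med(scores):
    """Median normalized score."""
    return median(scores)


def comp_wsum(scores, weights):
    """Weighted sum; weights[i] is the weight of the i-th modality."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))
```

Each function takes the scores of a single item, one per modality, and returns the fused score used for ranking.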

Page 14: MultiModal Retrieval Image

Results

MAP
Rank  Participant  MAP
1     xrce         0.2765
2     unt          0.2251
3     telecom      0.2227
4     i2rcviu      0.2126
5     dcu          0.2039
6     cheshire     0.2014
7     duth         0.1998
8     uned         0.1927
9     daedalus     0.1820
10    sztaki       0.1794
11    nus          0.1581
12    rgu          0.0617
13    uaic         0.0423

P@10
Rank  Participant  P@10
1     xrce         0.6114
2     duth         0.5200
3     i2rcviu      0.4971
4     cheshire     0.4929
5     telecom      0.4914
6     sztaki       0.4857
7     daedalus     0.4471
8     unt          0.4314
9     dcu          0.4271
10    uned         0.4200
11    nus          0.3529
12    rgu          0.2271
13    uaic         0.1543

P@20
Rank  Participant  P@20
1     xrce         0.5407
2     duth         0.4836
3     telecom      0.4407
4     cheshire     0.4364
5     sztaki       0.4329
6     i2rcviu      0.4321
7     daedalus     0.4029
8     unt          0.3986
9     dcu          0.3907
10    uned         0.3671
11    nus          0.3264
12    uaic         0.1529
13    rgu          0.1514

Page 15: MultiModal Retrieval Image

Corrected Results

MAP
Rank  Participant  MAP
1     xrce         0.2765
2     duth         0.2561
3     unt          0.2251
4     telecom      0.2227
5     i2rcviu      0.2126
6     dcu          0.2039
7     cheshire     0.2014
8     uned         0.1927
9     daedalus     0.1820
10    sztaki       0.1794
11    nus          0.1581
12    rgu          0.0617
13    uaic         0.0423

P@10
Rank  Participant  P@10
1     xrce         0.6114
2     duth         0.5257
3     i2rcviu      0.4971
4     cheshire     0.4929
5     telecom      0.4914
6     sztaki       0.4857
7     daedalus     0.4471
8     unt          0.4314
9     dcu          0.4271
10    uned         0.4200
11    nus          0.3529
12    rgu          0.2271
13    uaic         0.1543

P@20
Rank  Participant  P@20
1     xrce         0.5407
2     duth         0.4900
3     telecom      0.4407
4     cheshire     0.4364
5     sztaki       0.4329
6     i2rcviu      0.4321
7     daedalus     0.4029
8     unt          0.3986
9     dcu          0.3907
10    uned         0.3671
11    nus          0.3264
12    uaic         0.1529
13    rgu          0.1514

Page 16: MultiModal Retrieval Image

Fusion Problems

• appropriate weighting of modalities and score normalization/combination are not trivial problems
• if results are assessed by visual similarity only, fusion is not a theoretically sound method

Page 17: MultiModal Retrieval Image

Content-based Image Retrieval Problems

• Content-based Image Retrieval (CBIR) with global features is notoriously noisy for image queries of low generality (generality being the fraction of relevant images in a collection)
• it does not scale up well to large databases, efficiency-wise

Page 18: MultiModal Retrieval Image

Two-Stage Image Retrieval

• how it works: first use the secondary modality to rank the collection, then perform CBIR only on the top-K items
• assumption: primary (image) and secondary (text) modalities
• hypothesis: CBIR can do better than text retrieval in small sets or sets of high query generality
• efficiency benefit: using a 'cheaper' secondary modality also improves efficiency by cutting down on costly CBIR operations
• possible drawback: relevant images with empty or very noisy secondary modalities would be completely missed
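The two stages described on this slide can be sketched as below. This is a minimal illustration under stated assumptions: two_stage_rank, text_scores and cbir_score are hypothetical names, K is passed in as a fixed value, and items below the top-K are kept in their text order rather than dropped (the slide does not say which). Per-query selection of K is not reproduced here.

```python
def two_stage_rank(text_scores, cbir_score, k):
    """Rank by text first, then re-rank only the top-k items by image similarity.

    text_scores: dict mapping item id -> text score (higher is better).
    cbir_score:  callable returning the visual score of an item; it is
                 invoked only for the top-k items, which is where the
                 efficiency gain comes from.
    k:           number of items passed on to the second stage.
    """
    # Stage 1: rank the whole collection with the cheaper secondary modality.
    by_text = sorted(text_scores, key=text_scores.get, reverse=True)
    top_k, rest = by_text[:k], by_text[k:]

    # Stage 2: run the costly CBIR comparison on the top-k items only.
    visual = {item: cbir_score(item) for item in top_k}
    reranked = sorted(top_k, key=visual.get, reverse=True)

    # Assumption: the remaining items keep their text-based order.
    return reranked + rest
```

An item with an empty or noisy text annotation receives a low stage-1 score and never reaches the CBIR stage, which is the drawback noted in the last bullet.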

Page 19: MultiModal Retrieval Image

Previous Work

• re-ranking the best results by visual content has been seen before, mostly in different setups
• all these approaches employed a static, predefined K for all queries
• it is not clear whether it works

Page 20: MultiModal Retrieval Image

Our Two-Stage Method

• dynamic: K is calculated dynamically per query
• optimizes a predefined effectiveness measure
• without using external information or training data

Page 21: MultiModal Retrieval Image

Retrieval Results

[Figure: retrieval results for the query "cockpit of an airplane"; panels: Image Only, Text Only, Static K=25, Dynamic K]

Page 22: MultiModal Retrieval Image

Best Fusion Method – Max of Sums

• $i$ is the index running over the example images ($i = 1, 2, \ldots$)
• $j$ is the index running over the visual descriptors ($j \in \{1, 2\}$)
• $\mathrm{DESC}_{ji}$ is the score against the $i$-th example image for the $j$-th descriptor
• the parameter $w$ controls the relative contribution of the two media

$$ s = (1 - w)\,\max_i \Big( \sum_j \mathrm{MinMax}(\mathrm{DESC}_{ji}) \Big) + w\,\mathrm{MinMax}(\mathrm{tf.idf}) $$
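The formula above can be evaluated for a single item as in the sketch below. It assumes the descriptor and tf.idf scores have already been MinMax-normalized over the collection; the function name max_of_sums and the nested-list layout of desc_scores are hypothetical choices made for the illustration.

```python
def max_of_sums(desc_scores, text_score, w):
    """Fuse visual and textual evidence for one item.

    desc_scores: desc_scores[i][j] is the MinMax-normalized score of the
                 item against example image i for visual descriptor j.
    text_score:  MinMax-normalized tf.idf score of the item.
    w:           weight in [0, 1] controlling the relative contribution
                 of the two media (w = 1 means text only).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Sum over descriptors j for each example image i, then take the
    # best-matching example image.
    visual = max(sum(per_descriptor) for per_descriptor in desc_scores)
    return (1 - w) * visual + w * text_score
```

For instance, with two example images and two descriptors, max_of_sums([[0.5, 0.25], [0.75, 0.5]], 0.5, 0.5) takes the second image's sum of 1.25 as the visual part and mixes it equally with the text score.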

Page 23: MultiModal Retrieval Image

Fusion vs Two-Stage

Page 24: MultiModal Retrieval Image

Implementation

• developed in the C#/.NET Framework 4.0
• HTML, CSS and JavaScript (AJAX) technologies for the interface
• requires a fairly modern browser

Page 25: MultiModal Retrieval Image

Directions for Further Research

• Multi-stage retrieval for multimodal databases, based on a modality hierarchy.
• Fuzzy fusion (replace w with a membership function m).
• Create artificial modalities (not only from relevance scores).
• Pseudo relevance feedback, cross-media feedback.

Page 26: MultiModal Retrieval Image

Publications

• Multimedia Search with Noisy Modalities: Fusion and Multistage Retrieval. Avi Arampatzis, Savvas A. Chatzichristofis, and Konstantinos Zagoris. In: CLEF (Notebook Papers/LABs/Workshops), 22-23 September, Padua, Italy, 2010.
• www.MMRetrieval.net: A Multimodal Search Engine. Konstantinos Zagoris, Avi Arampatzis, and Savvas A. Chatzichristofis. In: Proceedings of the 3rd International Conference on SImilarity Search and APplications, SISAP 2010, Istanbul, Turkey, September 18-19, 2010. © Association for Computing Machinery (ACM).

Page 27: MultiModal Retrieval Image

Thank you!