The INFILE project: a crosslingual filtering systems evaluation campaign

Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Ismaïl Timimi, Khalid Choukri

LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit

13/05/07

Overview

- Goals and features of the INFILE campaign
- Test collections:
  - Documents
  - Topics
  - Assessments
- Evaluation protocol
  - Evaluation procedure
  - Evaluation metrics
- Conclusions

Goals and features of the INFILE Campaign

- Information filtering evaluation
  - filter documents according to long-term information needs (user profiles, also called topics)
  - adaptive: uses simulated user feedback
  - follows the TREC adaptive filtering task
- Crosslingual
  - three languages: English, French, Arabic
- Close to the real activity of competitive intelligence (CI) professionals
  - in particular, profiles developed by CI professionals (STI)
- Pilot track in CLEF 2008

Test Collection

- Built from a corpus of news from the AFP (Agence France-Presse)
  - almost 1.5 million news items in French, English and Arabic
- For the information filtering task:
  - 100,000 documents to filter, in each language
- NewsML format
  - standard XML format for news (IPTC)

Document example

[Figure: sample NewsML document, with callouts for the document identifier, keywords and headline]

Document example

[Figure: sample NewsML document, with callouts for the location, IPTC category, AFP category and content]
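The fields called out in the two document examples can be pulled out of an XML document with a few lines of Python. This is a minimal sketch only: the element names (`DocId`, `HeadLine`, `IptcCategory`, and so on) and the sample values are simplified, hypothetical stand-ins, not the actual NewsML schema or a real AFP document.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for an AFP NewsML document.
# Element names and values are illustrative assumptions, not the real schema.
SAMPLE = """<NewsItem>
  <DocId>doc-0001</DocId>
  <HeadLine>Example headline</HeadLine>
  <Keywords><Keyword>energy</Keyword><Keyword>research</Keyword></Keywords>
  <Location>Paris</Location>
  <IptcCategory>science and technology</IptcCategory>
  <AfpCategory>SCI</AfpCategory>
  <Content><p>First paragraph.</p><p>Second paragraph.</p></Content>
</NewsItem>"""


def parse_document(xml_text):
    """Extract the annotated fields into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("DocId"),
        "headline": root.findtext("HeadLine"),
        "keywords": [k.text for k in root.iter("Keyword")],
        "location": root.findtext("Location"),
        "iptc_category": root.findtext("IptcCategory"),
        "afp_category": root.findtext("AfpCategory"),
        # Join the paragraphs of the body into one text block.
        "content": "\n".join(p.text for p in root.find("Content").iter("p")),
    }


doc = parse_document(SAMPLE)
```

A real NewsML file nests these fields more deeply, so the lookup paths would need to be adapted to the actual documents.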

Profiles

- 50 interest profiles
  - 20 profiles in the domain of science and technology
    - developed by CI professionals from INIST, ARIST, Oto Research, Digiport
  - 30 profiles of general interest

Profiles

- Each profile contains 5 fields:
  - title: a description in a few words
  - description: a one-sentence description
  - narrative: a longer description of what is considered a relevant document
  - keywords: a set of key words, key phrases or named entities
  - sample: a sample of a relevant document (one paragraph)
- Participants may use any subset of the fields for their filtering
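As an illustration of the five-field structure and of filtering with a subset of fields, a profile could be represented as below. The class name `Profile`, the helper `build_query` and the example values are hypothetical; the campaign's own file format for profiles is not shown in the slides.

```python
from dataclasses import dataclass, fields


@dataclass
class Profile:
    """One interest profile with the five fields listed above."""
    title: str
    description: str
    narrative: str
    keywords: list
    sample: str


def build_query(profile, use_fields):
    """Concatenate the chosen profile fields into one query string."""
    valid = {f.name for f in fields(Profile)}
    unknown = set(use_fields) - valid
    if unknown:
        raise ValueError(f"unknown profile fields: {sorted(unknown)}")
    parts = []
    for name in use_fields:
        value = getattr(profile, name)
        # Keywords are stored as a list; the other fields are plain strings.
        parts.append(" ".join(value) if isinstance(value, list) else value)
    return " ".join(parts)


# Made-up example profile, for illustration only.
profile = Profile(
    title="solar energy",
    description="Documents about advances in solar energy.",
    narrative="Relevant documents report new solar technologies.",
    keywords=["photovoltaic", "solar panel"],
    sample="Researchers announced a more efficient solar cell.",
)
query = build_query(profile, ["title", "keywords"])
```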

Constitution of the corpus

- To build the corpus of documents to filter:
  - find relevant documents for the profiles in the original corpus
  - use a pooling technique with the results of IR tools
- The whole corpus is indexed with 4 IR engines (Lucene, Indri, Zettair and the CEA search engine)
- Each search engine is queried independently using the 5 different fields of the profiles + all fields + all fields but the sample
  - 28 runs (4 engines × 7 query variants)
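The count of 28 runs follows from crossing the 4 engines with 7 query variants (5 single fields, all fields, all fields but the sample). A short enumeration makes the arithmetic explicit; the labels used for engines and fields are just illustrative strings.

```python
from itertools import product

ENGINES = ["Lucene", "Indri", "Zettair", "CEA"]
FIELDS = ["title", "description", "narrative", "keywords", "sample"]

# One query variant per single field, plus two combined variants.
VARIANTS = [(f,) for f in FIELDS]
VARIANTS.append(tuple(FIELDS))                              # all fields
VARIANTS.append(tuple(f for f in FIELDS if f != "sample"))  # all but the sample

# Every engine is queried with every variant.
RUNS = list(product(ENGINES, VARIANTS))
print(len(RUNS))  # 4 engines x 7 variants = 28
```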

Constitution of the corpus (2)

- Pooling using a "Mixture of Experts" model
  - the first 10 documents of each run are taken
  - the first pool is assessed
  - a score is computed for each run and each topic according to the assessments of the first pool
  - the next pool is created by merging runs using a weighted sum
    - weights are proportional to the score
  - ongoing assessments
- Keep all documents assessed
  - documents returned by IR systems but judged not relevant form a set of difficult documents
- Choose random documents (noise)
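The pooling loop above can be sketched for a single topic as follows. The slides do not say how a run is scored or how ranks enter the weighted sum, so two choices here are assumptions made only for illustration: a run's score is the fraction of its top documents judged relevant, and a document at rank r contributes 1/r to the merged score. All function names are hypothetical.

```python
def first_pool(runs, depth=10):
    """Union of the first `depth` documents of every run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def run_scores(runs, assessments, depth=10):
    """Score each run by the share of its top documents judged relevant
    (assumed scoring rule, not taken from the slides)."""
    scores = {}
    for run_id, ranking in runs.items():
        top = ranking[:depth]
        hits = sum(1 for d in top if assessments.get(d, False))
        scores[run_id] = hits / len(top) if top else 0.0
    return scores


def next_pool(runs, scores, assessed, size=10):
    """Merge runs with a weighted sum; weights are proportional to the scores."""
    total = sum(scores.values())
    if total == 0:
        # Fallback when no run found anything relevant: equal weights.
        weights = {r: 1.0 / len(runs) for r in runs}
    else:
        weights = {r: s / total for r, s in scores.items()}
    merged = {}
    for run_id, ranking in runs.items():
        for rank, doc in enumerate(ranking, start=1):
            if doc in assessed:
                continue  # already judged, keep it out of the new pool
            # Assumed rank contribution: 1 / rank.
            merged[doc] = merged.get(doc, 0.0) + weights[run_id] / rank
    ranked = sorted(merged, key=lambda d: (-merged[d], d))
    return ranked[:size]


# Toy data for one topic, with a pool depth of 2 instead of 10.
runs = {"A": ["d1", "d2", "d3", "d4"], "B": ["d5", "d2", "d6", "d7"]}
pool = first_pool(runs, depth=2)
assessments = {"d1": True, "d2": True, "d5": False}
scores = run_scores(runs, assessments, depth=2)
following = next_pool(runs, scores, assessed=set(assessments), size=2)
```

In this toy case run A is scored higher than run B, so its unassessed documents dominate the next pool.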

Evaluation procedure

- One-pass test
- Interactive protocol using a client-server architecture (webservice communication)
  - participant registers
  - retrieves one document
  - filters the document
  - asks for feedback (on kept documents)
  - retrieves a new document
- Limited number of feedbacks (50)
- A new document is available only if the previous one has been filtered
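The protocol above can be simulated with an in-memory stand-in for the server. This sketch covers a single profile and omits registration and the webservice layer; the class `FilteringServer`, its method names and `run_client` are hypothetical and do not reflect the campaign's actual interface.

```python
class FilteringServer:
    """In-memory stand-in for the campaign server (hypothetical interface)."""

    def __init__(self, stream, relevant, max_feedback=50):
        self._stream = list(stream)      # documents in arrival order
        self._relevant = set(relevant)   # ground truth for one profile
        self._next = 0                   # index of the next document to serve
        self._pending = None             # served but not yet filtered
        self._kept = set()
        self.feedback_left = max_feedback

    def get_document(self):
        """Serve the next document, only once the previous one is filtered."""
        if self._pending is not None:
            raise RuntimeError("previous document has not been filtered")
        if self._next >= len(self._stream):
            return None
        self._pending = self._stream[self._next]
        self._next += 1
        return self._pending

    def submit_decision(self, doc_id, keep):
        """Record the filtering decision for the pending document."""
        if doc_id != self._pending:
            raise ValueError("decision does not match the pending document")
        if keep:
            self._kept.add(doc_id)
        self._pending = None

    def ask_feedback(self, doc_id):
        """Return the relevance of a kept document while the budget lasts."""
        if doc_id not in self._kept:
            raise ValueError("feedback is only given on kept documents")
        if self.feedback_left == 0:
            return None
        self.feedback_left -= 1
        return doc_id in self._relevant


def run_client(server, decide, learn):
    """One pass over the stream: retrieve, filter, optionally ask feedback."""
    kept = []
    while (doc := server.get_document()) is not None:
        keep = decide(doc)
        server.submit_decision(doc, keep)
        if keep:
            kept.append(doc)
            judgement = server.ask_feedback(doc)
            if judgement is not None:
                learn(doc, judgement)  # adapt the filter with the feedback
    return kept


# Toy run with a feedback budget of 1 instead of 50.
seen = []
server = FilteringServer(["d1", "d2", "d3", "d4"], relevant={"d2"}, max_feedback=1)
kept = run_client(server,
                  decide=lambda d: d != "d3",
                  learn=lambda d, j: seen.append((d, j)))
```

Keeping the pending document on the server side is what enforces the one-pass constraint: a client cannot look ahead in the stream before committing to a decision.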

Evaluation metrics

- Precision / Recall / F-measure
- Utility (from TREC)

Contingency table:

                relevant   not relevant
retrieved          a            b
not retrieved      c            d

$$P = \frac{a}{a+b} \qquad R = \frac{a}{a+c} \qquad F = \frac{2PR}{P+R}$$

$$u = w_1 \cdot a - w_2 \cdot b$$

$$u_n = \frac{\max\left(\frac{u}{u_{max}},\, u_{min}\right) - u_{min}}{1 - u_{min}}$$
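A small worked example may help check these formulas by hand. The weights, the lower bound `u_min` and the counts below are illustrative values only, and two conventions are assumptions of this sketch: a ratio with a zero denominator is reported as 0, and `u_max` is taken as the utility of a perfect filter, `w1 * (a + c)`.

```python
def precision(a, b):
    """P = a / (a + b); reported as 0.0 when nothing was retrieved."""
    return a / (a + b) if a + b else 0.0


def recall(a, c):
    """R = a / (a + c); reported as 0.0 when there is no relevant document."""
    return a / (a + c) if a + c else 0.0


def f_measure(p, r):
    """F = 2PR / (P + R); reported as 0.0 when both P and R are zero."""
    return 2 * p * r / (p + r) if p + r else 0.0


def utility(a, b, w1, w2):
    """Linear utility u = w1 * a - w2 * b."""
    return w1 * a - w2 * b


def normalized_utility(u, u_max, u_min):
    """Scaled utility: bounded below by u_min, then mapped onto [0, 1]."""
    return (max(u / u_max, u_min) - u_min) / (1 - u_min)


# Illustrative counts: 8 relevant kept, 2 non-relevant kept, 4 relevant missed.
a, b, c = 8, 2, 4
w1, w2, u_min = 1.0, 0.5, -0.5   # illustrative parameter values

p = precision(a, b)
r = recall(a, c)
f = f_measure(p, r)
u = utility(a, b, w1, w2)
u_max = w1 * (a + c)             # assumed: utility of a perfect filter
u_n = normalized_utility(u, u_max, u_min)
```

With these numbers the normalized utility is (7/12 + 0.5) / 1.5, about 0.72.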

Evaluation metrics (2)

- Detection cost (from TDT)
  - uses probability of missed documents and false alarms

                relevant   not relevant
retrieved          a            b
not retrieved      c            d

$$P_{miss} = \frac{c}{a+c} \qquad P_{false} = \frac{b}{b+d}$$

$$c_{det} = c_{miss} \cdot P_{miss} \cdot P_{topic} + c_{false} \cdot P_{false} \cdot (1 - P_{topic})$$
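The detection cost can be computed directly from the four counts of the contingency table. The cost parameters and the topic prior used in this sketch are illustrative values, not the ones chosen for the campaign, and reporting a probability as 0 when its denominator is zero is an assumed convention.

```python
def detection_cost(a, b, c, d, c_miss, c_false, p_topic):
    """c_det = c_miss * P_miss * P_topic + c_false * P_false * (1 - P_topic)."""
    p_miss = c / (a + c) if a + c else 0.0    # share of relevant documents missed
    p_false = b / (b + d) if b + d else 0.0   # share of non-relevant documents kept
    return c_miss * p_miss * p_topic + c_false * p_false * (1 - p_topic)


# Illustrative counts and parameters (assumptions, not campaign settings).
cost = detection_cost(a=8, b=2, c=4, d=86,
                      c_miss=10.0, c_false=1.0, p_topic=0.1)
```

Here P_miss is 4/12 and P_false is 2/88, so the cost is about 0.354; unlike the other measures, lower is better.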

Evaluation metrics (3)

- Computed per profile and averaged over all profiles
- Adaptivity: score evolution curve (values computed every 10,000 documents)
- Two experimental measures
  - originality: number of relevant documents a system uniquely retrieves
  - anticipation: inverse rank of the first relevant document detected
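The two experimental measures can be sketched for one profile as below. The slides do not specify which list the rank in "anticipation" refers to; this sketch assumes it is the rank, in stream order, within the list of relevant documents of the profile. Function names and data are hypothetical.

```python
def originality(kept_by_system, relevant):
    """Number of relevant documents that only one system retrieved.

    kept_by_system maps a system name to the set of documents it kept.
    """
    result = {}
    for name, kept in kept_by_system.items():
        others = set()
        for other, other_kept in kept_by_system.items():
            if other != name:
                others |= other_kept
        result[name] = len((kept & relevant) - others)
    return result


def anticipation(relevant_in_order, kept):
    """Inverse rank of the first relevant document the system detected.

    Assumption: the rank is counted in the stream-ordered list of relevant
    documents; 0.0 is returned if the system detected none of them.
    """
    for rank, doc in enumerate(relevant_in_order, start=1):
        if doc in kept:
            return 1.0 / rank
    return 0.0


# Toy data: three relevant documents and two systems.
relevant_in_order = ["d1", "d2", "d3"]
kept_by_system = {"S1": {"d1", "d2", "x"}, "S2": {"d2", "d3"}}

orig = originality(kept_by_system, set(relevant_in_order))
ant_s1 = anticipation(relevant_in_order, kept_by_system["S1"])
ant_s2 = anticipation(relevant_in_order, kept_by_system["S2"])
```

In this toy case each system has one relevant document of its own, and S2 misses the first relevant document, which halves its anticipation score.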

Conclusions

- INFILE campaign
  - Information filtering evaluation: adaptive, crosslingual, close to real usage
- Ongoing pilot track in CLEF 2008
  - constitution of the corpus currently under way
  - dry run mid-June
  - evaluation campaign in July
  - workshop in September
- Work in progress
  - the modelling of the filtering task assumed by the CI practitioners
