13/05/07, LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit
The INFILE project: a crosslingual filtering systems evaluation campaign
Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Ismaïl Timimi, Khalid Choukri
Overview
- Goals and features of the INFILE campaign
- Test collections:
  - Documents
  - Topics
  - Assessments
- Evaluation protocol
  - Evaluation procedure
  - Evaluation metrics
- Conclusions
Goals and features of the INFILE Campaign
- Information Filtering Evaluation
  - filter documents according to long-term information needs (user profiles, also called topics)
  - adaptive: uses simulated user feedback
  - follows the TREC adaptive filtering task
- Crosslingual
  - three languages: English, French, Arabic
- Close to the real activity of competitive intelligence (CI) professionals
  - in particular, profiles developed by CI professionals (STI)
- Pilot track in CLEF 2008
Test Collection
- Built from a corpus of news from the AFP (Agence France Presse)
  - almost 1.5 million news items in French, English and Arabic
- For the information filtering task:
  - 100 000 documents to filter, in each language
- NewsML format
  - standard XML format for news (IPTC)
Document example
[Figure: annotated NewsML document. Labelled fields: document identifier, keywords, headline, location, IPTC category, AFP category, content.]
Profiles
- 50 interest profiles
  - 20 profiles in the domain of science and technology
    - developed by CI professionals from INIST, ARIST, Oto Research, Digiport
  - 30 profiles of general interest
Profiles
- Each profile contains 5 fields:
  - title: a description in a few words
  - description: a one-sentence description
  - narrative: a longer description of what is considered a relevant document
  - keywords: a set of key words, key phrases or named entities
  - sample: a sample of a relevant document (one paragraph)
- Participants may use any subset of the fields for their filtering
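
The five-field profile can be pictured as a small record type. The sketch below is only an illustration: the class name `Profile` and the helper `query_text` are hypothetical and not part of the campaign material. It shows how a participant might assemble a query from any subset of the fields.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Profile:
    """Hypothetical in-memory form of one profile (field names from the slide)."""
    title: str
    description: str
    narrative: str
    keywords: List[str] = field(default_factory=list)
    sample: str = ""

    def query_text(self, fields: Iterable[str]) -> str:
        """Concatenate the chosen fields, one per line, skipping empty ones."""
        parts = []
        for name in fields:
            value = getattr(self, name)
            if isinstance(value, list):
                value = " ".join(value)
            if value:
                parts.append(value)
        return "\n".join(parts)
```

For instance, `profile.query_text(["title", "keywords"])` would build a short query, while passing all five field names would use the full profile.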
Constitution of the corpus
- To build the corpus of documents to filter:
  - find relevant documents for the profiles in the original corpus
  - use a pooling technique with results of IR tools
- The whole corpus is indexed with 4 IR engines (Lucene, Indri, Zettair and CEA search engine)
- Each search engine is queried independently using the 5 different fields of the profiles + all fields + all fields but the sample
  - 28 runs
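
The figure of 28 runs is consistent with 4 engines times 7 query configurations (5 single fields, all fields, and all fields but the sample). The sketch below simply enumerates that grid; the function names are hypothetical and the engines are represented by their names only.

```python
from itertools import product

ENGINES = ["Lucene", "Indri", "Zettair", "CEA"]
FIELDS = ["title", "description", "narrative", "keywords", "sample"]


def field_configurations(fields):
    """Return the 7 query configurations: each field alone, all fields,
    and all fields except the sample."""
    singles = [(name,) for name in fields]
    all_fields = tuple(fields)
    all_but_sample = tuple(name for name in fields if name != "sample")
    return singles + [all_fields, all_but_sample]


def enumerate_runs():
    """Pair every engine with every query configuration."""
    return list(product(ENGINES, field_configurations(FIELDS)))
```

Calling `len(enumerate_runs())` gives the number of (engine, configuration) pairs under this reading.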
Constitution of the corpus (2)
- Pooling using a "Mixture of Experts" model
  - the first 10 documents of each run are taken
  - the first pool is assessed
  - a score is computed for each run and each topic according to the assessments of the first pool
  - the next pool is created by merging runs using a weighted sum
    - weights are proportional to the score
  - ongoing assessments
- Keep all documents assessed
  - documents returned by IR systems but judged not relevant form a set of difficult documents
- Choose random documents (noise)
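
The pooling steps above can be sketched for a single topic as follows. The slides do not say how a run's score is computed or what each run contributes to the weighted sum, so two choices here are assumptions: the score is the fraction of a run's top documents judged relevant, and each run votes for a document with the reciprocal of its rank. All function names are hypothetical.

```python
from collections import defaultdict


def first_pool(runs, depth=10):
    """Union of the top `depth` documents of every run (runs: name -> ranked ids)."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def run_scores(runs, judgments, depth=10):
    """ASSUMED score: share of a run's top `depth` documents judged relevant."""
    scores = {}
    for name, ranking in runs.items():
        top = ranking[:depth]
        hits = sum(1 for doc in top if judgments.get(doc, False))
        scores[name] = hits / len(top) if top else 0.0
    return scores


def next_pool(runs, scores, judged, size=10):
    """Merge runs by a weighted sum; weights are proportional to the run scores.
    ASSUMED per-run contribution: reciprocal rank of the document in that run."""
    total = sum(scores.values())
    weights = {name: (s / total if total else 1.0 / len(scores))
               for name, s in scores.items()}
    merged = defaultdict(float)
    for name, ranking in runs.items():
        for rank, doc in enumerate(ranking, start=1):
            if doc not in judged:  # only unassessed documents enter the next pool
                merged[doc] += weights[name] / rank
    ranked = sorted(merged, key=lambda doc: (-merged[doc], doc))
    return ranked[:size]
```

Under these assumptions, a run whose first documents were all judged not relevant receives a zero weight and stops influencing the next pool, which is the intended effect of score-proportional weights.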
Evaluation procedure
- One-pass test
- Interactive protocol using a client-server architecture (webservice communication)
  - participant registers
  - retrieves one document
  - filters the document
  - asks for feedback (on kept documents)
  - retrieves a new document
- Limited number of feedbacks (50)
- A new document is available only if the previous one has been filtered
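
The interaction loop above can be simulated in memory. The sketch below is not the campaign's web service: `SimulatedServer`, its methods and `run_client` are hypothetical names, and the feedback budget is treated as one total per session, which the slides do not specify. It only illustrates the ordering constraints (one document at a time, feedback on kept documents only, limited feedback).

```python
class SimulatedServer:
    """In-memory stand-in for the campaign server (hypothetical interface)."""

    def __init__(self, documents, relevant_ids, max_feedback=50):
        self._documents = list(documents)      # (doc_id, text) pairs, in stream order
        self._relevant = set(relevant_ids)
        self._cursor = 0
        self._awaiting_decision = False
        self._kept = set()
        self.feedback_left = max_feedback      # ASSUMED: one budget per session

    def next_document(self):
        """Return the next document, or None when the stream is exhausted."""
        if self._awaiting_decision:
            raise RuntimeError("previous document has not been filtered yet")
        if self._cursor >= len(self._documents):
            return None
        self._awaiting_decision = True
        return self._documents[self._cursor]

    def submit_decision(self, doc_id, keep):
        """Record the keep/discard decision and unlock the next document."""
        if not self._awaiting_decision:
            raise RuntimeError("no document is awaiting a decision")
        if doc_id != self._documents[self._cursor][0]:
            raise RuntimeError("decision does not match the current document")
        if keep:
            self._kept.add(doc_id)
        self._awaiting_decision = False
        self._cursor += 1

    def feedback(self, doc_id):
        """Return True/False for a kept document, or None once the budget is spent."""
        if doc_id not in self._kept:
            raise RuntimeError("feedback is only available for kept documents")
        if self.feedback_left <= 0:
            return None
        self.feedback_left -= 1
        return doc_id in self._relevant


def run_client(server, keep_document, on_feedback):
    """One pass over the stream: filter each document, ask feedback on kept ones."""
    kept = []
    while True:
        doc = server.next_document()
        if doc is None:
            break
        doc_id, text = doc
        keep = keep_document(text)
        server.submit_decision(doc_id, keep)
        if keep:
            kept.append(doc_id)
            label = server.feedback(doc_id)
            if label is not None:
                on_feedback(text, label)   # hook where an adaptive filter would update
    return kept
```

A real participant would replace `keep_document` and `on_feedback` with its own filter and adaptation step, and would probably spend the feedback budget more selectively than this loop does.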
Evaluation metrics
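Contingency table for one profile:

| | relevant | not relevant |
|---|---|---|
| retrieved | a | b |
| not retrieved | c | d |

- Precision / Recall / F-measure

$$P = \frac{a}{a+b}, \qquad R = \frac{a}{a+c}, \qquad F = \frac{2PR}{P+R}$$

- Utility (from TREC)

$$u = w_1 \cdot a - w_2 \cdot b$$

$$u_n = \frac{\max\left(\dfrac{u}{u_{max}},\ u_{min}\right) - u_{min}}{1 - u_{min}}$$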
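
A small sketch of these measures, computed from the contingency counts a, b, c, d. The slides give no values for the weights or the utility bounds, so `w1`, `w2`, `u_max` and `u_min` are explicit parameters here; the function names and the convention of returning 0 for an empty denominator are assumptions.

```python
def precision(a, b):
    """P = a / (a + b); 0.0 when nothing was retrieved (assumed convention)."""
    return a / (a + b) if (a + b) else 0.0


def recall(a, c):
    """R = a / (a + c); 0.0 when there is no relevant document (assumed convention)."""
    return a / (a + c) if (a + c) else 0.0


def f_measure(p, r):
    """F = 2PR / (P + R); 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def utility(a, b, w1, w2):
    """Linear utility u = w1 * a - w2 * b."""
    return w1 * a - w2 * b


def normalized_utility(u, u_max, u_min):
    """u_n = (max(u / u_max, u_min) - u_min) / (1 - u_min)."""
    if u_max <= 0:
        raise ValueError("u_max must be positive")
    if u_min >= 1:
        raise ValueError("u_min must be below 1")
    return (max(u / u_max, u_min) - u_min) / (1 - u_min)
```

With this scaling, a system at or below the lower bound `u_min` scores 0 and a system reaching `u_max` scores 1, which makes profiles with very different numbers of relevant documents comparable.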
Evaluation metrics (2)
- Detection cost (from TDT)
  - uses the probability of missed documents and of false alarms

| | relevant | not relevant |
|---|---|---|
| retrieved | a | b |
| not retrieved | c | d |

$$P_{miss} = \frac{c}{a+c}, \qquad P_{false} = \frac{b}{b+d}$$

$$c_{det} = c_{miss} \cdot P_{miss} \cdot P_{topic} + c_{false} \cdot P_{false} \cdot (1 - P_{topic})$$
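
The detection cost can be computed directly from the same four counts. In the sketch below the costs `c_miss`, `c_false` and the prior `p_topic` are required arguments because the slides do not give their values; the function name and the zero-denominator convention are assumptions.

```python
def detection_cost(a, b, c, d, c_miss, c_false, p_topic):
    """c_det = c_miss * P_miss * P_topic + c_false * P_false * (1 - P_topic).

    a, b, c, d are the contingency counts: retrieved relevant, retrieved not
    relevant, missed relevant, correctly rejected.
    """
    p_miss = c / (a + c) if (a + c) else 0.0    # share of relevant documents missed
    p_false = b / (b + d) if (b + d) else 0.0   # share of non-relevant documents kept
    return c_miss * p_miss * p_topic + c_false * p_false * (1.0 - p_topic)
```

Unlike precision and recall, this measure is a cost: lower values are better, and a perfect filter (b = 0 and c = 0) scores 0.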
Evaluation metrics
- Scores are computed per profile and averaged over all profiles
- Adaptivity: score evolution curve (values computed every 10 000 documents)
- Two experimental measures
  - originality
    - number of relevant documents a system uniquely retrieves
  - anticipation
    - inverse rank of the first relevant document detected
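
The two experimental measures can be sketched as follows. The slides define them in one line each, so the details are assumptions: originality compares the relevant documents kept by one system against those kept by all other systems, and anticipation ranks the relevant documents of a profile in stream order and takes the reciprocal rank of the first one the system kept. Function and parameter names are hypothetical.

```python
def originality(retrieved_by_system, relevant, system):
    """Number of relevant documents that only `system` retrieved.

    retrieved_by_system maps a system name to the set of document ids it kept.
    """
    own = retrieved_by_system[system] & relevant
    others = set().union(*(docs for name, docs in retrieved_by_system.items()
                           if name != system))
    return len(own - others)


def anticipation(relevant_in_stream_order, kept):
    """ASSUMED reading: 1 / rank of the first relevant document the system kept,
    where rank counts relevant documents in arrival order; 0.0 if none was kept."""
    for rank, doc_id in enumerate(relevant_in_stream_order, start=1):
        if doc_id in kept:
            return 1.0 / rank
    return 0.0
```

Under this reading, a system that catches the very first relevant document of a profile gets an anticipation of 1, and the value decreases the later its first correct detection occurs.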
Conclusions
- INFILE campaign
  - Information Filtering Evaluation: adaptive, crosslingual, close to real usage
- Ongoing pilot track in CLEF 2008
  - corpus currently being constituted
  - dry run mid-June
  - evaluation campaign in July
  - workshop in September
- Work in progress
  - the modelling of the filtering task assumed by the CI practitioners