
Impressions of 10 years of CLEF

Donna Harman, Scientist Emeritus

National Institute of Standards and Technology

TREC to CLEF

• TREC ran the first cross-language evaluation in TREC-6 (1997), using Swiss newswire that had been obtained by the Swiss Federal Institute of Technology, Zurich

• The languages were English, French, and German, and there were 13 participating groups

• NIST hired two “tri-lingual” assessors to build the topics and to do the assessments

• The track was well received, good research happened, but the topic building/assessing was a disaster!!

TREC-7 and 8

• For this reason, the CLIR task was run in a distributed manner for the next two years, across 4 groups, with each group building topics in its own language and also doing the relevance assessments.

• The groups were:
– NIST (English)
– University of Zurich (French)
– Social Science Information Centre, Bonn, and the University of Koblenz (German)
– CNR, Pisa (Italian)

The move to Europe

• This distributed method worked fine, but it became obvious after three years that the U.S. participants did not have the background (or maybe the interest) to progress much further, and it was decided to move to Europe

• Peter Schäuble convinced Carol Peters to take this on and CLEF started in 2000.

Growth in Languages

[Chart: growth in ad hoc task languages, 2000–2008 — English, French, German, Italian, Spanish, Dutch, Finnish, Swedish, Russian, Portuguese, Bulgarian, Hungarian, Czech]

English: LA Times 94/95, Glasgow Herald 95

French: Le Monde 94, ATS 94/95

German: Frankfurter Rundschau 94, Der Spiegel 94/95, SDA 94/95

Italian: La Stampa 94, AGZ 94/95

Spanish: EFE 94/95

Dutch: Algemeen Dagblad 94/95, NRC Handelsblad 94/95

Portuguese: Público 94/95, Folha 94/95

Russian: Izvestia 95

Finnish: Aamulehti 94/95

Swedish: TT 94/95

Bulgarian: Sega 2002, Standart 2002

Czech: Mladá fronta Dnes 2002, Lidové noviny

Hungarian: Magyar Hírlap 2002

The all-important data

Ad hoc track effort

[Chart: ad hoc track effort, 2000–2008 — participating countries, assessors, and pool sizes (in tens of thousands of documents)]
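The pool sizes in the chart refer to the standard pooling method for building relevance judgments: the top-ranked documents from every submitted run are merged, and only that pool is judged by the assessors. A minimal sketch, with hypothetical run contents and document IDs, is given below.

```python
# Minimal sketch of relevance pooling: merge the top-k document
# IDs from every submitted run for a topic; assessors then judge
# only this pool. The run contents below are hypothetical.

def build_pool(runs, k=100):
    """Return the union of the top-k document IDs across all runs."""
    pool = set()
    for ranked_docs in runs:
        pool.update(ranked_docs[:k])
    return pool

# Two hypothetical ranked runs for one topic.
run_a = ["LAT940101-0001", "GH950301-0012", "LAT940107-0033"]
run_b = ["GH950301-0012", "LAT940215-0078", "LAT940101-0001"]
print(sorted(build_pool([run_a, run_b], k=2)))
# -> ['GH950301-0012', 'LAT940101-0001', 'LAT940215-0078']
```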

Monolingual, Bilingual and Multilingual Research

• Monolingual runs in all of these languages, adding new sources of linguistic tools such as stemmers, decompounders, etc.

• Bilingual runs across many languages, including some “unusual” pairs; new sources of bilingual data often found

• Multilingual runs that require merging of results across all target languages
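Merging the per-language result lists is what makes multilingual runs hard, because scores from different systems and languages are not directly comparable. Below is a minimal sketch of one common baseline (min-max score normalization followed by a global sort), not any particular participant's method; the run data and document IDs are hypothetical.

```python
# Sketch of merging per-language result lists into one multilingual
# ranking via min-max score normalization. Runs are lists of
# (doc_id, score) pairs; all data here is hypothetical.

def normalize(run):
    """Min-max normalize the scores of one ranked list."""
    scores = [score for _, score in run]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero
    return [(doc, (score - lo) / span) for doc, score in run]

def merge(runs, depth=1000):
    """Merge several per-language runs by normalized score."""
    pooled = []
    for run in runs:
        pooled.extend(normalize(run))
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return pooled[:depth]

# Example: two target-language runs for the same topic.
run_de = [("FR940104-0001", 12.3), ("FR940322-0017", 9.8)]
run_it = [("LS940511-0042", 0.82), ("LS940630-0007", 0.55)]
print(merge([run_de, run_it], depth=3))
```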

Savoy’s web page

Language     Stoplist             Stemmer (in C unless noted)
English      571 (SMART)          Porter??
French       463 words            Yes, + more complex one
German       603 words            Yes, + more complex one
Italian      430 words            Yes
Spanish      307 words            Yes
Portuguese   356 words            Yes
Finnish      747 words            Yes
Swedish      386 words            Yes
Arabic       347 words (UTF-8)    Two available
Russian      420 words (UTF-8)    Yes, in Perl
Hungarian    737 words (UTF-8)    Yes
Bulgarian    258 words (UTF-8)    Yes, in Perl
Czech        257 words (UTF-8)    2 in Java
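Savoy's resources are distributed as plain stopword lists plus stemmer source code. A minimal sketch of how such resources are typically applied at indexing time is shown below, using NLTK's Snowball stemmer as a stand-in for the C implementations; the stoplist filename is hypothetical.

```python
# Minimal sketch: stopword removal plus stemming for French,
# using NLTK's Snowball stemmer as a stand-in for Savoy's C
# stemmers. The stoplist filename is hypothetical.
from nltk.stem.snowball import SnowballStemmer

def load_stoplist(path):
    """Read one stopword per line (UTF-8), as on Savoy's page."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def index_terms(text, stoplist, stemmer):
    """Lowercase, drop stopwords, and stem the remaining tokens."""
    tokens = text.lower().split()
    return [stemmer.stem(t) for t in tokens if t not in stoplist]

stop_fr = load_stoplist("french_stopwords.txt")  # hypothetical file
stemmer_fr = SnowballStemmer("french")
print(index_terms("les langues de la recherche", stop_fr, stemmer_fr))
```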

Research groups across Europe and around the World

• 2000: 20 groups from Netherlands, Switzerland, Germany, France, Italy, USA, UK, Canada, Spain, Finland

• 2001: 34 groups, adding Hong Kong, Thailand, Japan, Taiwan, Sweden

• 2002: 37 groups including 9 new ones

• 2003: 42 groups including 16 new ones, and a group from Ireland

• 2004: 26 groups including 2 new groups from Portugal

• 2005: 23 groups including 5 new groups and 3 new countries (Indonesia, Russia and Hungary)

• 2006: 25 groups including 10 new groups and 2 new countries (Brazil and India)

• 2007: 22 groups including groups from the Czech Republic

CLEF 2000–2009: Participation per Track

[Chart: participating groups per track, 2000–2009, for the tracks AdHoc, DomSpec, iCLEF, CL-SR, QA@CLEF, ImageCLEF, WebCLEF, GeoCLEF, VideoCLEF, INFILE, MorphoChallenge, CLEF-IP, LogCLEF, and GridCLEF]

CLEF 2009 Workshop: 30 September – 2 October, Corfu, Greece

ImageCLEF (2003)

• Data was 30,000 photographs from well-known Scottish photographers; all photographs have manual English captions

• 50 topics were built in English and then translated by native speakers to Spanish, Dutch, German, French and Italian

• The task was to work from these languages to the target English captions

• 4 groups participated

ImageCLEF (2004)

• Continued work with Scottish photos

• Added medical retrieval task using 8,725 medical images, such as scans and X-rays

• Most of these images had case notes in French or English

• Topics included query-by-example images for the medical collection

• 18 groups participated using both CLIR and image retrieval methodologies

ImageCLEF (2005)

• Continued work with Scottish photos; 19 groups participated using 26 different topic languages

• More medical images: 50,000 images total with annotations in “assorted” languages (English, French and German)

• Medical topics more complex, specifically targeted at either image retrieval, text retrieval, or a need for both

• Added automatic annotation task for the medical images

ImageCLEF (2006)

• ImageCLEFphoto: new collection of 20,000 “touristic” photos with captions in English, German and Spanish

• 60 topics for this collection based on a real log file

• 36 participating groups using 12 topic languages

• 20 groups for a general annotation task (21 classes)

• Same 50,000 medical images, but medical topics taken from log files, specifically selected to cover two of four “categories”:
– Anatomic region shown in image
– Image modality (x-ray, MRI, etc.)
– Pathology or disease shown in image
– Abnormal visual observation

QA@CLEF

• 2003: 8 groups did factoid question answering for 200 questions (translated into 5 languages) against target text in 3 languages

• 2004: 18 groups did factoid and definition questions (700 questions), with the questions in 9 languages against target text in 7 languages

QA@CLEF

• 2005: 24 groups did factoid, definition, and NIL questions, with the questions in 10 languages against target text in 9 languages

• 2006: 30 groups did “snippet” answers for 200 questions in 11 source languages; also 2 pilot tasks, Answer Validation and WiQA

• 2007: “main task” adding Wikipedia data, plus answer validation and a pilot for QA using spoken questions

iCLEF

• 2002, 2003: user studies on CLIR document selection (e.g., new interfaces to help with translation) or new interactive tools for doing CLIR

• 2004: interactive cross-language QA

• 2005: continued interactive QA plus interactive search of Scottish photos

• 2006, 2007: interactive Flickr searching

Structured Data Retrieval

• The TREC-7 and TREC-8 CLIR tracks worked with GIRT-2, including over 50,000 structured documents (bibliographic data with manual index terms), plus an English–German thesaurus

• 9 years of CLEF saw this data grow to 151,319 German documents (also translated to English), plus 20,000 English documents and 145,802 Russian documents; plus vocabulary mappings across this collection

• CLEF 2008 used the TEL collection of library catalogs in English, French and German

WebCLEF

• 2005: EuroGOV collection of over 3.5 million pages in 20 languages; tasks were known-item topics, homepages and named pages; 11 teams built topics, did assessments and did experiments

• 2006: 1,940 known-item topics, some automatically generated

• 2007 and 2008: 30 specially-built topics to mimic building a Wikipedia article or survey

GeoCLEF

• 2005: English and German CLEF newspapers with 25 “old” CLEF topics retrofitted with geospatial information

• 2006: 25 topics in 5 languages with target source documents in English, German, Portuguese and Spanish

• 2007: much deeper analysis of what actually makes a topic geospatial!

• 2008: included a Wikipedia pilot task

Assorted@CLEF

• Speech retrieval
– 2003, 2004: topics in 6 languages against the English TREC speech data
– 2005, 2006, 2007: searching spontaneous speech in Czech and English

• Non-European languages
– 2007: topics in Bengali, Hindi, Marathi, Tamil and Telugu against the English data
– 2008: topics and documents in Persian

CLEF contributions

• First large test collections for 13 European languages

• Extensive research in IR within and across these languages

• Training for many new IR groups across Europe (and elsewhere)

• Lots of new linguistic tools developed as a result of CLEF

More CLEF contributions

• The first major evaluation of image retrieval, both in the photographic area and in the medical area

• This led not only to these major image collections but to new groups and lots of excellent research in image retrieval

• First step in geospatial retrieval

• Major spontaneous speech retrieval effort

• ETC. ETC. ETC.