CLEF 2009, Corfu
Question Answering Track Overview
J. Turmo, P.R. Comas, S. Rosset, O. Galibert, N. Moreau, D. Mostefa, P. Rosso, D. Buscaldi
D. Santos, L.M. Cabral
A. Peñas, P. Forner, R. Sutcliffe, Á. Rodrigo, C. Forascu, I. Alegria, D. Giampiccolo, N. Moreau, P. Osenova
QA Tasks & Time
QA tasks over time, 2003–2009:
• Multiple Language QA Main Task, with temporal restrictions and lists; ResPubliQA
• Answer Validation Exercise (AVE)
• GikiCLEF
• Real Time
• QA over Speech Transcriptions (QAST)
• WiQA
• WSD QA
2009 campaign
ResPubliQA: QA on European Legislation
GikiCLEF: QA requiring geographical reasoning on Wikipedia
QAST: QA on Speech Transcriptions of European Parliament Plenary sessions
QA 2009 campaign
| Task | Registered groups | Participant groups | Submitted runs | Organizing people |
|---|---|---|---|---|
| ResPubliQA | 20 | 11 | 28 + 16 (baseline runs) | 9 |
| GikiCLEF | 27 | 8 | 17 runs | 2 |
| QAST | 12 | 4 | 86 (5 subtasks) | 8 |
| Total | 59 showed interest | 23 groups | 147 runs evaluated | 19 + additional assessors |
ResPubliQA 2009: QA on European Legislation
Organizers
Anselmo Peñas, Pamela Forner, Richard Sutcliffe, Álvaro Rodrigo, Corina Forascu, Iñaki Alegria, Danilo Giampiccolo, Nicolas Moreau, Petya Osenova
Additional Assessors
Fernando Luis Costa, Anna Kampchen, Julia Kramme, Cosmina Croitoru
Advisory Board
Donna Harman, Maarten de Rijke, Dominique Laurent
Evolution of the task
From 2003 to 2009:
• Target languages: 3 (2003), 7 (2004), 8 (2005), 9 (2006), 10 (2007), 11 (2008), 8 (2009)
• Collections: News 1994 → + News 1995 → + Wikipedia Nov. 2006 → European Legislation (2009)
• Number of questions: 200 → 500
• Type of questions: 200 factoid → + temporal restrictions → + definitions → − type of question → + lists → + linked questions → + closed lists → − linked, + reason, + purpose, + procedure (2009)
• Supporting information: document → snippet → paragraph
• Size of answer: snippet → exact → paragraph
Objectives
1. Move towards a domain of potential users
2. Compare systems working in different languages
3. Compare QA Tech. with pure IR
4. Introduce more types of questions
5. Introduce Answer Validation Tech.
Collection
• Subset of JRC-Acquis (10,700 docs per language)
• Parallel at the document level
• EU treaties, EU legislation, agreements and resolutions
• Economy, health, law, food, …
• Between 1950 and 2006
• XML-TEI.2 encoding
• Unfortunately not parallel at the paragraph level -> extra work
500 questions
REASON: Why did a commission expert conduct an inspection visit to Uruguay?
PURPOSE/OBJECTIVE: What is the overall objective of the eco-label?
PROCEDURE: How are stable conditions in the natural rubber trade achieved?
In general, any question that can be answered in a paragraph.
500 questions
Also FACTOID
• In how many languages is the Official Journal of the Community published?
DEFINITION
• What is meant by “whole milk”?
No NIL questions
Translation of questions
Selection of the final pool of 500 questions out of the 600 produced
Systems response
No Answer ≠ Wrong Answer
1. Decide if the answer is given or not
• [ YES | NO ]
• Classification problem
• Machine learning, provers, etc.
• Textual entailment
2. Provide the paragraph (ID+Text) that answers the question
Aim: leaving a question unanswered has more value than giving a wrong answer.
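Step 1 is a binary classification. As a minimal sketch (not any participant's actual system), assume a hypothetical validator that scores each candidate paragraph in [0, 1]; thresholding that score implements the answer/no-answer decision:

```python
# Illustrative sketch: THRESHOLD and decide() are our own names,
# assuming a hypothetical validation score in [0, 1].

THRESHOLD = 0.5  # would be tuned on development data

def decide(validation_score: float) -> str:
    """Answer only when the validator is confident enough; otherwise
    return NoA, since an unanswered question beats a wrong answer."""
    return "ANSWER" if validation_score >= THRESHOLD else "NoA"

print(decide(0.9))  # ANSWER
print(decide(0.2))  # NoA
```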
![Page 16: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/16.jpg)
16
Assessments
R: the question is answered correctly
W: the question is answered incorrectly
NoA: the question is not answered
• NoA R: NoA, but the candidate answer was correct
• NoA W: NoA, and the candidate answer was incorrect
• NoA Empty: NoA, and no candidate answer was given
Evaluation measure: c@1, an extension of traditional accuracy (the proportion of questions answered correctly) that also takes unanswered questions into account.
![Page 17: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/17.jpg)
17
Evaluation measure
n: number of questions
n_R: number of correctly answered questions
n_U: number of unanswered questions

$$c@1 = \frac{1}{n}\left(n_R + n_U \cdot \frac{n_R}{n}\right)$$
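The measure can be computed directly from the three counts; a small sketch (the function name is ours):

```python
# c@1 = (1/n) * (n_R + n_U * n_R / n): each unanswered question is
# credited at the accuracy rate n_R / n instead of counting as zero.

def c_at_1(n_correct: int, n_unanswered: int, n: int) -> float:
    """c@1 from the number of correct, unanswered, and total questions."""
    return (n_correct + n_unanswered * n_correct / n) / n

# With no abstentions, c@1 reduces to plain accuracy:
print(round(c_at_1(260, 0, 500), 2))    # 0.52
# Abstaining on 156 questions instead of answering them wrongly pays off:
print(round(c_at_1(260, 156, 500), 2))  # 0.68
```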
![Page 18: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/18.jpg)
18
Evaluation measure
If n_U = 0 then c@1 = n_R / n (accuracy)
If n_R = 0 then c@1 = 0
If n_U = n then c@1 = 0
Leaving a question unanswered adds value only if it avoids returning a wrong answer.

$$c@1 = \frac{1}{n}\left(n_R + n_U \cdot \frac{n_R}{n}\right)$$

The added value is the performance shown on the answered questions: the accuracy factor n_R / n.
![Page 19: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/19.jpg)
19
List of Participants
System Team
elix ELHUYAR-IXA, SPAIN
icia RACAI, ROMANIA
iiit Search & Info Extraction Lab, INDIA
iles LIMSI-CNRS-2, FRANCE
isik ISI-Kolkata, INDIA
loga U. Koblenz-Landau, GERMANY
mira MIRACLE, SPAIN
nlel U. Politecnica Valencia, SPAIN
syna Synapse Développement, FRANCE
uaic Al.I. Cuza U. of Iasi, ROMANIA
uned UNED, SPAIN
![Page 20: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/20.jpg)
20
Value of reducing wrong answers
| System | c@1 | Accuracy | #R | #W | #NoA | #NoA R | #NoA W | #NoA empty |
|---|---|---|---|---|---|---|---|---|
| combination | 0.76 | 0.76 | 381 | 119 | 0 | 0 | 0 | 0 |
| icia092roro | 0.68 | 0.52 | 260 | 84 | 156 | 0 | 0 | 156 |
| icia091roro | 0.58 | 0.47 | 237 | 156 | 107 | 0 | 0 | 107 |
| UAIC092roro | 0.47 | 0.47 | 236 | 264 | 0 | 0 | 0 | 0 |
| UAIC091roro | 0.45 | 0.45 | 227 | 273 | 0 | 0 | 0 | 0 |
| base092roro | 0.44 | 0.44 | 220 | 280 | 0 | 0 | 0 | 0 |
| base091roro | 0.37 | 0.37 | 185 | 315 | 0 | 0 | 0 | 0 |
![Page 21: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/21.jpg)
21
Detecting wrong answers
| System | c@1 | Accuracy | #R | #W | #NoA | #NoA R | #NoA W | #NoA empty |
|---|---|---|---|---|---|---|---|---|
| combination | 0.56 | 0.56 | 278 | 222 | 0 | 0 | 0 | 0 |
| loga091dede | 0.44 | 0.40 | 186 | 221 | 93 | 16 | 68 | 9 |
| loga092dede | 0.44 | 0.40 | 187 | 230 | 83 | 12 | 62 | 9 |
| base092dede | 0.38 | 0.38 | 189 | 311 | 0 | 0 | 0 | 0 |
| base091dede | 0.35 | 0.35 | 174 | 326 | 0 | 0 | 0 | 0 |

While maintaining the number of correct answers, the candidate answer was not correct for 83% of unanswered questions.
A very good step towards improving the system.
![Page 22: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/22.jpg)
22
IR is important, but not enough
| System | c@1 | Accuracy | #R | #W | #NoA | #NoA R | #NoA W | #NoA empty |
|---|---|---|---|---|---|---|---|---|
| combination | 0.90 | 0.90 | 451 | 49 | 0 | 0 | 0 | 0 |
| uned092enen | 0.61 | 0.61 | 288 | 184 | 28 | 15 | 12 | 1 |
| uned091enen | 0.60 | 0.59 | 282 | 190 | 28 | 15 | 13 | 0 |
| nlel091enen | 0.58 | 0.57 | 287 | 211 | 2 | 0 | 0 | 2 |
| uaic092enen | 0.54 | 0.52 | 243 | 204 | 53 | 18 | 35 | 0 |
| base092enen | 0.53 | 0.53 | 263 | 236 | 1 | 1 | 0 | 0 |
| base091enen | 0.51 | 0.51 | 256 | 243 | 1 | 0 | 1 | 0 |
| elix092enen | 0.48 | 0.48 | 240 | 260 | 0 | 0 | 0 | 0 |
| uaic091enen | 0.44 | 0.42 | 200 | 253 | 47 | 11 | 36 | 0 |
| elix091enen | 0.42 | 0.42 | 211 | 289 | 0 | 0 | 0 | 0 |
| syna091enen | 0.28 | 0.28 | 141 | 359 | 0 | 0 | 0 | 0 |
| isik091enen | 0.25 | 0.25 | 126 | 374 | 0 | 0 | 0 | 0 |
| iiit091enen | 0.20 | 0.11 | 54 | 37 | 409 | 0 | 11 | 398 |
| elix092euen | 0.18 | 0.18 | 91 | 409 | 0 | 0 | 0 | 0 |
| elix091euen | 0.16 | 0.16 | 78 | 422 | 0 | 0 | 0 | 0 |
Feasible task
The perfect combination is 50% better than the best system
Many systems fall below the IR baselines
![Page 23: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/23.jpg)
23
Comparison across languages
• Same questions
• Same documents
• Same baseline systems
A strict comparison, affected only by the language variable; it is feasible to detect the most promising approaches across languages.
![Page 24: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/24.jpg)
24
Comparison across languages
| System | RO | ES | EN | IT | DE |
|---|---|---|---|---|---|
| icia092 | 0.68 | | | | |
| nlel092 | | 0.47 | | | |
| uned092 | | 0.41 | 0.61 | | |
| uned091 | | 0.41 | 0.60 | | |
| icia091 | 0.58 | | | | |
| nlel091 | | 0.35 | 0.58 | 0.52 | |
| uaic092 | 0.47 | | 0.54 | | |
| uaic091 | 0.45 | | | | |
| loga091 | | | | | 0.44 |
| loga092 | | | | | 0.44 |
| Baseline | 0.44 | 0.40 | 0.53 | 0.42 | 0.38 |

Systems above the baselines: icia, Boolean retrieval + intensive NLP + ML-based validation, and very good knowledge of the collection (Eurovoc terms…).
Baseline: Okapi BM25 tuned for paragraph retrieval.
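The baselines apply Okapi BM25 to paragraph retrieval. A self-contained sketch of standard BM25 scoring over a paragraph collection (k1, b, the tokenisation, and the toy paragraphs are illustrative defaults, not the track's actual tuning):

```python
import math
from collections import Counter

# Standard Okapi BM25 over paragraphs tokenised as lower-case
# whitespace-separated words. k1 and b are common defaults.

def bm25_scores(query, paragraphs, k1=1.2, b=0.75):
    docs = [p.lower().split() for p in paragraphs]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                 # term frequency in this paragraph
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

# Toy collection (invented paragraphs, echoing the question types above):
paras = ["the eco-label objective is consumer information",
         "stable conditions in the natural rubber trade",
         "whole milk definition and labelling"]
scores = bm25_scores("natural rubber trade", paras)
print(scores.index(max(scores)))  # 1: the rubber-trade paragraph ranks first
```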
![Page 25: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/25.jpg)
25
Comparison across languages
Systems above the baselines: nlel092, n-gram-based retrieval combining evidence from several languages.
Baseline: Okapi BM25 tuned for paragraph retrieval.
![Page 26: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/26.jpg)
26
Comparison across languages
Systems above the baselines: uned, Okapi BM25 + NER + paragraph validation + n-gram-based re-ranking.
Baseline: Okapi BM25 tuned for paragraph retrieval.
![Page 27: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/27.jpg)
27
Comparison across languages
| System | RO | ES | EN | IT | DE |
|---|---|---|---|---|---|
| icia092 | 0.68 | | | | |
| nlel092 | | 0.47 | | | |
| uned092 | | 0.41 | 0.61 | | |
| uned091 | | 0.41 | 0.60 | | |
| icia091 | 0.58 | | | | |
| nlel091 | | 0.35 | 0.58 | 0.52 | |
| uaic092 | 0.47 | | 0.54 | | |
| uaic091 | 0.45 | | | | |
| loga091 | | | | | 0.44 |
| loga092 | | | | | 0.44 |
| Baseline | 0.44 | 0.40 | 0.53 | 0.42 | 0.38 |

Systems above the baselines: nlel091, n-gram-based paragraph retrieval.
Baseline: Okapi BM25 tuned for paragraph retrieval.
![Page 28: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/28.jpg)
28
Comparison across languages
Systems above the baselines: loga, Lucene + deep NLP + logic + ML-based validation.
Baseline: Okapi BM25 tuned for paragraph retrieval.
![Page 29: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/29.jpg)
29
Conclusion
• Compare systems working in different languages
• Compare QA technology with pure IR
  - Pay more attention to paragraph retrieval: an old issue, late-1990s state of the art (English)
  - Pure IR performance: 0.38 – 0.58
  - Highest difference with respect to the IR baselines: 0.44 – 0.68, through intensive NLP and ML-based answer validation
• Introduce more types of questions
  - Some types are difficult to distinguish
  - Any question that can be answered in a paragraph
  - Analysis of results by question type (in progress)
![Page 30: CLEF 2009, Corfu Question Answering Track Overview](https://reader035.vdocuments.net/reader035/viewer/2022062723/56813f5b550346895daa2951/html5/thumbnails/30.jpg)
30
Conclusion
• Introduce answer validation technology
  - Evaluation measure: c@1
  - Value of reducing wrong answers
  - Detecting wrong answers is feasible
• Feasible task
  - 90% of questions have been answered
  - Room for improvement: best systems around 60%
• Even with fewer participants we have more comparison, more analysis, more learning
• ResPubliQA proposal for 2010: SC and breakout session
Interest in ResPubliQA 2010
GROUP
1. Uni. "Al.I.Cuza" Iasi (Dan Cristea, Diana Trandabat)
2. Linguateca (Nuno Cardoso)
3. RACAI (Dan Tufis, Radu Ion)
4. Jesus Vilares
5. Univ. Koblenz-Landau (Bjorn Pelzer)
6. Thomson Reuters (Isabelle Moulinier)
7. Gracinda Carvalho
8. UNED (Alvaro Rodrigo)
9. Uni. Politecnica Valencia (Paolo Rosso & Davide Buscaldi)
10. Uni. Hagen (Ingo Glockner)
11. Linguit (Jochen L. Leidner)
12. Uni. Saarland (Dietrich Klakow)
13. ELHUYAR-IXA (Arantxa Otegi)
14. MIRACLE TEAM (Paloma Martínez Fernández)
But we need more
You already have a gold standard of 500 questions & answers to play with…