Improving Answer Precision and Recall of List
Questions
Kian Wei Kor
THE UNIVERSITY OF EDINBURGH
Master of Science
School of Informatics
University of Edinburgh
2005
Abstract
This thesis presents a novel approach to answering natural language list questions, or
questions that have more than a single correct answer. It is based on the hypothesis that
the set of answers to a list question will often occur in a similar context. By analyzing
candidate answers produced by an existing Question Answering system, it is possible
to identify the common context shared by two or more candidate answers. Once the
common context is identified, it is then possible to extrapolate from this common con-
text to identify more answer candidates previously not found by the original Question
Answering system.
Acknowledgements
Many thanks to Bonnie Webber, Johan Bos, Kisuh Ahn and Malvina Nissim for your
invaluable advice and guidance. To Alok Mishra, June Tee and Sasithorn Parinamosot
for being great company this past year. And finally, to my parents who have made this
all possible.
Declaration
I declare that this thesis was composed by myself, that the work contained herein is
my own except where explicitly stated otherwise in the text, and that this work has not
been submitted for any other degree or professional qualification except as specified.
(Kian Wei Kor)
Table of Contents
1 Introduction 1
2 Background 4
2.1 Question Answering . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Text REtrieval Conference (TREC) . . . . . . . . . . . . . . 5
2.2 Relations and Patterns in Question Answering . . . . . . . . . . . . . 6
2.2.1 Dual Iterative Pattern Relation Extraction . . . . . . . . . . . 6
2.2.2 DIPRE and Question Answering . . . . . . . . . . . . . . . . 7
2.3 String Matching and String Alignment Algorithms . . . . . . . . . . 10
2.3.1 Smith-Waterman-Gotoh Algorithm . . . . . . . . . . . . . . 11
3 The LiQED System 13
3.1 External Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.1 Apache Lucene Search Engine . . . . . . . . . . . . . . . . . 14
3.1.2 OpenNLP Toolkit . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.3 LingPipe Natural Language Toolkit . . . . . . . . . . . . . . 14
3.1.4 JAligner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.5 Google Web API . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.6 Amazon Web Services API . . . . . . . . . . . . . . . . . . . 15
3.2 DIPRE Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.2 Pattern Generation . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.3 Extracting Answers . . . . . . . . . . . . . . . . . . . . . . . 23
4 LiQED in TREC 26
4.1 Improved Answer Extraction . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Hypernyms for Answer Extraction . . . . . . . . . . . . . . . 27
4.1.2 Publication Answers . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Google Reranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3 Answer Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5 Evaluation and Analysis 32
5.1 Training Set - TREC 2004 . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Test Set - TREC 2005 . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2.1 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.2 Event Questions . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.3 Semantically Deep Questions . . . . . . . . . . . . . . . . . 36
5.2.4 Answer Extraction . . . . . . . . . . . . . . . . . . . . . . . 38
6 Conclusion 40
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.2 Future Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.2.1 Alternative Question Targets . . . . . . . . . . . . . . . . . . 42
6.2.2 Fine-Grained Named Entity Recognition . . . . . . . . . . . 42
6.2.3 Anaphora Resolution . . . . . . . . . . . . . . . . . . . . . . 42
6.2.4 Answer Verification . . . . . . . . . . . . . . . . . . . . . . 43
A TREC 2004 List Questions 44
B TREC 2005 List Questions 49
C Sample LiQED Pattern File 56
Bibliography 59
Chapter 1
Introduction
Statistical Natural Language technologies are becoming increasingly important in today's world. As the world wide web continues to grow, there is an increasing amount of textual information and knowledge waiting to be mined. While search engine technologies currently fill this information gap, it is increasingly clear that simply retrieving relevant documents may not be the best solution to this problem. What users are searching for is information and knowledge. Users do not want to sift through documents returned by search engines, looking for relevant nuggets of information. What users want are answers: precise information nuggets and evidence to support the validity of these information nuggets.
Question Answering (QA) systems are an attempt to fulfill this need for exact answers. Unlike a search engine that returns a ranked list of relevant documents for a query, a question answering system takes a question posed by a user and returns what it believes is the correct answer to that question. Open domain Question Answering as a research area took off in 1999, mainly due to the introduction of a Question Answering Track in the Text REtrieval Conference (TREC), see Voorhees (1999), held by the American National Institute of Standards and Technology (NIST). The first TREC QA systems showed that by combining Natural Language Processing and Information Retrieval techniques, it is computationally possible to extract answers to questions from a large corpus of news reports.
Question Answering systems have grown increasingly sophisticated over the past 6 years. A typical current day QA system would contain components for natural language processing, statistical and probabilistic machine learning, and logic inference, as well as incorporating various sources of world knowledge. Yet despite all this increased complexity, Voorhees (2004) shows that the best QA system to date has a balanced F-Score of 0.770. There is still room for much improvement.
Question Answering research has mainly been focused on factoid questions. That is, questions that have a single, concise fact as an answer. Examples of factoid questions include "In what sea did the Russian submarine Kursk sink?" and "Who won the Miss Universe 2000 crown?". One area of Question Answering that has been neglected in the past few years is the list question. List questions are similar to factoid questions, except there is more than one correct and distinct answer to the question. For example, "Which countries expressed regret about the loss of Russian submarine Kursk?" and "Name the contestants of Miss Universe 2000.".
Most research thus far treats list questions simply as shorthand for asking the same factoid question multiple times. The set of all correct, distinct answers in the document collection that satisfy the factoid question is the correct answer to the list question. In some instances, the answer to a list question is simply the top N distinct answers found by the factoid question answering system.
This may not necessarily be the only way or best way to answer list questions.
This dissertation presents a supplementary approach to answering list questions. It is
based on the hypothesis that the set of answers to a list question often appear in similar
contexts. By analyzing candidate answers produced by an existing factoid QA system,
it is possible to identify the common context in which two or more answer candidates appear. Once the common context is identified, it is then possible to extrapolate from
this common context to identify more answer candidates previously not found by the
original factoid QA system.
The common context described above can be expressed in several different forms, for example as common syntactic constituents or semantic structures. In this thesis, words that frequently co-occur with two or more answer candidates are used as the common context.
Chapter 2 will provide some background information on some of the techniques and algorithms used. Chapter 3 details the implementation of the LiQED (short for List QED) system, which takes a set of candidate answers from an existing QA system, Edinburgh University's QED system by Leidner et al. (2003), and expands the answer set with new and distinct candidate answers. Chapter 4 wraps up the discussion of the LiQED system with some of the challenges involved in using LiQED as the list question answering module for the TREC conference. Chapter 5 presents the results of applying LiQED in the TREC 2005 Question Answering evaluation, and Chapter 6 concludes with analysis and potential areas for enhancement.
Chapter 2
Background
This chapter presents some of the background work that directly relates to the work
done in this thesis. The first part of this chapter gives some basic background on the
field of Question Answering and the de facto arena for the evaluation of Question Answering systems, the Text REtrieval Conference. This continues on to the second part of
the chapter, which covers prior work that relates to the thesis. This includes the idea of
using patterns and relations by Brin (1999), and the work of Ravichandran and Hovy
(2002) in using automatically generated text patterns for Question Answering. The
chapter ends with a brief look at two string matching algorithms, one of which will be
used in the thesis.
2.1 Question Answering
A Question Answering (QA) System is a computer system that has access to one or
more sources of information and is capable of using these information sources to find
answers to natural language questions posed by human users.
A question answering system is quite similar to a search engine. Both provide a user interface that allows human users to find information. However, there are several differences that distinguish a search engine from a question answering system. The most salient difference is that a search engine relies on the human user to pose queries, not questions. A query is a specially formatted string that contains keywords
and search engine commands. The output of a search engine is typically a set of doc-
uments that match the user's query. A Question Answering system, on the other hand, takes a natural language question as input and typically returns a specific answer as its output.
The earliest Question Answering systems, such as BASEBALL by Green et al. (1961), were developed in the 1960s. BASEBALL provided a natural language user interface to a database of baseball facts and figures. Another early Question Answering system is LUNAR by Woods (1973), which allowed NASA geologists to ask questions of a database containing information and analysis of lunar rocks and soil samples gathered from the Apollo 11 lunar expedition mission.
Today, there are many different types of question answering systems. Generally the systems fall under two classes: open domain Question Answering systems and closed domain Question Answering systems. A closed-domain Question Answering system is designed to answer questions that fall within a specific specialist domain. For example, there is a medical question answering system by Yun and Graeme (2004) and an aircraft maintenance question answering system by Rinaldi et al. (2003). Open-domain systems deal with generalist questions about nearly everything. An example of an open-domain question answering system is MIT's web-based START system by Katz (1997), which can be found at http://www.ai.mit.edu/projects/infolab/.
2.1.1 Text REtrieval Conference (TREC)
Open domain Question Answering as a research area took off in 1999, due to the in-
troduction of a Question Answering Track in the Text REtrieval Conference (TREC)
Voorhees (1999), held by the American National Institute of Standards and Technology
(NIST). The first TREC QA systems showed that by combining Natural Language
Processing and Information Retrieval techniques, it is computationally possible to ex-
tract answers to questions from a large corpus of news reports.
2.1.1.1 Aquaint Corpus
The corpus used in the Question Answering Track in TREC is the AQUAINT corpus.
Aquaint consists of newswire articles stored as text data in English and is drawn from three sources: the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.
The corpus contains 1,033,461 news articles spanning the years 1996 to 2000.
The corpus is 3 gigabytes in size, containing roughly 25 million sentences and 375
million words.
2.2 Relations and Patterns in Question Answering
This section explores some prior work that has been done to exploit the relationships
inherent between two concepts. Concepts are represented in natural language as words.
The recent work on the semantic web by Berners-Lee et al. (2001) aims to capture the
relationship between these concepts via the way concept words relate to each other. In
question answering, there is a clear and direct relationship between a question and its
answers. Both the question (or more specifically the question topic) and the answers
are in some sense related. Most QA systems try to leverage these relationships in some
way.
This thesis takes a look at list questions because these questions offer an opportu-
nity to directly examine the relationship between a question and its answers. Given a
question and some correct answers, is it possible to identify the relationship between
question topic and an answer? Can we then use this relationship to expand the set of
correct answers? Given a set of answers from a QA system which may contain false
positives, can relationships be reliably identified in the presence of such noise?
There exists a wide range of natural language processing techniques applicable to this task. For this thesis, we will implement and examine one possible
technique.
2.2.1 Dual Iterative Pattern Relation Extraction
Brin (1999) suggested that it is possible to extract pairs of related concepts, for example
authors and their book titles, from the web using just a small initial set of seed samples. The general algorithm he proposed, called Dual Iterative Pattern Relation Extraction (DIPRE), works as follows.
1. Start with a small seed set of (author, title) pairs.
2. Find all occurrences of those pairs on the web.
3. Identify patterns for the citations of the books from these occurrences.
4. Search the web for these patterns to recognize more new (author, title) pairs.
5. Repeat the steps with the new (author, title) pairs to find even more (author, title)
pairs.
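The loop above can be sketched as a toy, in-memory Java implementation. This is an illustrative sketch only, not Brin's actual system: the document list, the fixed-width context windows, and all class and method names are invented for the example, and a real implementation would also score patterns for specificity before trusting them.

```java
import java.util.*;
import java.util.regex.*;

/**
 * Toy, in-memory sketch of one DIPRE iteration. The "web" is a list of
 * strings, a pattern is just the literal text around a known (author, title)
 * occurrence, and new pairs are whatever the same context captures elsewhere.
 */
public class DipreSketch {
    public record Pair(String author, String title) {}

    public static Set<Pair> expand(List<String> docs, Set<Pair> seeds) {
        // Steps 1-3: find occurrences of the seed pairs and derive
        // (prefix, middle, suffix) context patterns from them.
        List<String[]> patterns = new ArrayList<>();
        for (String doc : docs) {
            for (Pair p : seeds) {
                int a = doc.indexOf(p.author());
                int t = doc.indexOf(p.title());
                if (a < 0 || t <= a) continue;
                String prefix = doc.substring(Math.max(0, a - 10), a);
                String middle = doc.substring(a + p.author().length(), t);
                int end = t + p.title().length();
                String suffix = doc.substring(end, Math.min(doc.length(), end + 10));
                patterns.add(new String[] { prefix, middle, suffix });
            }
        }
        // Steps 4-5: match the patterns against all documents to pick up
        // new (author, title) pairs.
        Set<Pair> found = new HashSet<>(seeds);
        for (String[] pat : patterns) {
            Pattern re = Pattern.compile(Pattern.quote(pat[0]) + "(.+?)"
                    + Pattern.quote(pat[1]) + "(.+?)" + Pattern.quote(pat[2]));
            for (String doc : docs) {
                Matcher m = re.matcher(doc);
                while (m.find()) {
                    found.add(new Pair(m.group(1), m.group(2)));
                }
            }
        }
        return found;
    }
}
```

Seeding `expand` with a single pair over two sentences sharing the same surrounding context would, for instance, surface the second pair automatically; iterating the call on the enlarged set corresponds to step 5.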
Brin shows that using this method, he successfully found over 15,000 (author, title) pairs from just an initial seed of 5 (author, title) pairs. DIPRE has also been
used by Yi and Sundaresan (1999) to identify acronyms of organizations on the web.
2.2.2 DIPRE and Question Answering
While Brin's DIPRE algorithm has been successfully applied to the web, it is not certain whether such an algorithm, or a similar one, will be successful within a question answering context. Thus far the algorithm has only been applied to simple, clearly
defined and commonly occurring relationship types such as (author, title) pairs or (or-
ganization, acronym) pairs.
In separate work, Hovy et al. (2002) applied a similar algorithm in question
answering systems. By examining past TREC factoid questions, six commonly oc-
curring relationship types were identified. They showed that it is possible to extract
patterns for such relationships from a corpus. Table 2.1 shows the 6 specific relation-
ship types and an example of the type of pattern picked up by Hovy and Ravichandran’s
system.
Unlike Hovy and Ravichandran, this work focuses on relationships specific to
each individual question rather than specialized relationships applicable only to cer-
tain classes of questions. This is achieved by adapting the DIPRE algorithm to operate
in a question answering context.
The adapted DIPRE algorithm for question answering treats the Aquaint corpus
as a bag of sentences. Given a question and a set of potential answers, the adapted algorithm works as follows:
Relationship Type       Sample Pattern
<PERSON>-<BIRTHYEAR>    <PERSON> was born on <BIRTHYEAR>
<PERSON>-<INVENTION>    the <INVENTION> was invented by <PERSON>
<DISCOVERY>-<PERSON>    discovery of <DISCOVERY> by <PERSON>
<ENTITY>-<TYPE>         , a form of <TYPE>, <ENTITY>
<PERSON>-<FAME>         the famous <FAME>, <PERSON>,
<NAME>-<LOCATION>       at the <NAME> in <LOCATION>

Table 2.1: 6 common relationship types found in factoid questions
1. Find sentences that contain the relationship.
2. Identify surface text patterns common to two or more sentences.
3. Find other sentences that match these surface text patterns, extract answers.
The above algorithm can be iterated like the original DIPRE algorithm. However, there is a danger of false positive answers causing a feedback loop and amplifying the number of false positives with each iteration. Thus, for this thesis, it was decided not to apply this algorithm in an iterative fashion.
2.2.2.1 The Question - Answer Relationship
Table 2.2 shows a TREC 2005 List question and the set of all correct answers. It is possible to say that the question target, OPEC, is in some way related to the answer set. The relationship is that between the organization OPEC and its member countries. Likewise, each answer is also in some way related to other answers in the answer set.

Target      OPEC
Question    What are OPEC countries?
Answers     Brazil
            Algeria
            Indonesia
            Iran
            Iraq
            Kuwait
            Libya
            Nigeria
            Qatar
            Saudi Arabia
            United Arab Emirates

Table 2.2: Question on OPEC countries

We thus have two possible question-specific relationship types that can be exploited in list questions: the Question to Answer relationship and the Answer to Answer relationship. Both relationships were initially explored. However, initial experimentation showed that there were disadvantages to using the Answer to Answer relationship.

1. Simple Patterns. Pattern generation experiments were conducted using known Answer to Answer pairs from the TREC 2004 list question set. The majority of the generated text patterns were simply answer pairs delimited by a comma "," or the word "and". In other words, the patterns generated correspond precisely to what one would expect to match a sequence of answers. These are common and intuitive patterns that do not require a sophisticated method for extraction.

2. Computational Complexity. Given a set of N answers, it takes O(N(N-1)/2) time to perform pairwise comparisons between all distinct answer pairs. In comparison, Question to Answer relationship pairs only require O(N) comparisons. In light of the simple patterns produced by Answer to Answer pairs, it was determined that using Answer to Answer relations would not be helpful.

3. Identifying Conjunctions. It was found that patterns generated using Question and Answer relationships can also identify and extract a list of sequential answers in a single sentence, obviating the need for Answer and Answer relationships to identify a list of answers.
For these reasons, the Answer to Answer relationship was not used in the final build of the system. Only the question target and answer set will be used. For the remainder of this thesis, angular brackets will be used to represent words related to a certain concept. Examples can already be found in Table 2.1. Specifically, <QUESTION> will refer to words in the question target, while <ANSWER> refers to words that make up an answer.
2.3 String Matching and String Alignment Algorithms
In order to identify common text patterns in <QUESTION>-<ANSWER> relationship pairs, it is important to have an algorithm that is able to pick up surface text patterns common to two or more of these pairs of words. Hovy et al. (2002) borrowed the idea of suffix trees from computational biology for this purpose.
A suffix tree is a tree-like data structure for storing a string. Suffix trees are used to
solve the exact string matching problem in linear time, achieving about the same worst-
case bound as the Knuth et al. (1977) and the Boyer and Moore (1997) algorithms. See
Gusfield (1997) and Nelson (1996) for more information on suffix trees.
A suffix tree T for a string S (with n = |S|) is a rooted, labeled tree with a leaf for each non-empty suffix of S. Furthermore, a suffix tree satisfies the following properties:
• Each internal node, other than the root, has at least two children;
• Each edge leaving a particular node is labeled with a non-empty substring of S
of which the first symbol is unique among all first symbols of the edge labels of
the edges leaving this particular node;
• For any leaf in the tree, the concatenation of the edge labels on the path from the root to this leaf exactly spells out a non-empty suffix of S.
By concatenating a pair of sentences into a single string and constructing its suffix tree, the longest common substring can be identified in O(n+m) time, where n is the length of the first string and m is the length of the second string.
• Mozart (1756-1791) was a genius.
• The great Mozart (1756-1791) achieved fame at a young age.
In the above two sentences, a suffix tree would be able to identify that the longest common substring is Mozart (1756-1791), which nicely matches Hovy's <NAME>-<BIRTHYEAR> text pattern. However, if either sentence contains some form of long distance dependency, as often occurs in natural language, a suffix tree would fail to find the text pattern.
For example, a suffix tree will only pick up the fragment author of Silent Spring from the two sentences below. This fragment would not result in a positive match with any <BOOK>-<AUTHOR> patterns as the fragment does not contain an author name.
• Rachel Carson is the author of Silent Spring.
• Rachel Carson, founder of contemporary environmental movement, author of
Silent Spring, died on April 14 1964.
During the course of experimentation for this thesis, it was discovered that a significant majority of the sentences that contain a list question relationship pair also separate the question topic and answer via some form of long distance dependency. Thus, an algorithm that is able to perform substring matching on sentences with long distance dependencies is required.
2.3.1 Smith-Waterman-Gotoh Algorithm
Besides the suffix tree algorithm, the field of computational biology also uses a number of other algorithms for matching DNA or protein sequences. One such algorithm
is the Smith-Waterman algorithm by Smith and Waterman (1981). It is a dynamic
programming algorithm that works by computing the optimal local alignment between
two sequences or strings.
Suppose we have two string sequences A = (a1, a2, a3, ..., an) and B = (b1, b2, b3, ..., bm), and a scoring or similarity matrix s, where s(ai, bj) is a measure of similarity between element ai in sequence A and element bj in sequence B. The algorithm computes the optimal alignment using a score function, H(i, j), that measures the degree of optimal alignment between the two sequences ending at elements ai and bj.
H(i, j) = max { 0,
                H(i-1, j-1) + s(ai, bj),
                H(i-1, j) - d,
                H(i, j-1) - d }

where d is the gap penalty value. In biology, d is usually a function which allows biologists to vary the penalty costs for different gap sizes. For the purposes of this thesis, the gap penalty is constant, thus d = 1. The original Smith-Waterman algorithm stores the intermediate values of H(i, j) in a two-dimensional n by m matrix, which takes O(m²n) time to compute. There is an improved version of the algorithm by Gotoh (1982) that reduces the computation time to O(mn). Once this matrix is computed, the optimal alignment can be found by retracing steps through the matrix.
To illustrate the results of the algorithm, here is an example from biology where
the algorithm is used to compare human and mouse DNA.
Human DNA QWEFTEDPGGDEAFT...
|-| ||||-|||...
Mouse DNA EEEET PGGDFAFT...
The algorithm identifies similar parts of the two DNA sequences and finds the optimal alignment, indicated by the vertical bars |. Gaps are indicated by spaces and the horizontal bar - indicates a single DNA (character) substitution.
By replacing DNA sequences with natural language word sequences, it is possible
to use this algorithm to identify matching substrings even if there are long-distance
dependencies in the sentences. The Smith-Waterman-Gotoh algorithm is thus the al-
gorithm of choice for this thesis.
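A minimal word-level version of the algorithm can be sketched as follows. The gap penalty mirrors the constant d = 1 above; the match and mismatch scores are illustrative assumptions (the thesis does not fix them here), and the class name is invented for the example:

```java
import java.util.*;

/**
 * Word-level Smith-Waterman local alignment with a constant gap penalty.
 * MATCH/MISMATCH scores are assumed values for illustration; GAP = -1
 * corresponds to the constant penalty d = 1 in the text.
 */
public class WordAlign {
    static final int MATCH = 2, MISMATCH = -1, GAP = -1;

    /** Returns the matching words along the best local alignment of a and b. */
    public static List<String> align(String[] a, String[] b) {
        int n = a.length, m = b.length;
        int[][] h = new int[n + 1][m + 1];
        int bestI = 0, bestJ = 0;
        // Fill the score matrix H(i, j), tracking the highest-scoring cell.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = a[i - 1].equalsIgnoreCase(b[j - 1]) ? MATCH : MISMATCH;
                h[i][j] = Math.max(0, Math.max(h[i - 1][j - 1] + s,
                        Math.max(h[i - 1][j] + GAP, h[i][j - 1] + GAP)));
                if (h[i][j] > h[bestI][bestJ]) { bestI = i; bestJ = j; }
            }
        }
        // Retrace steps through the matrix, collecting the matched words.
        LinkedList<String> out = new LinkedList<>();
        int i = bestI, j = bestJ;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            int s = a[i - 1].equalsIgnoreCase(b[j - 1]) ? MATCH : MISMATCH;
            if (h[i][j] == h[i - 1][j - 1] + s) {
                if (s == MATCH) out.addFirst(a[i - 1]);
                i--; j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {
                i--;          // gap in sequence B
            } else {
                j--;          // gap in sequence A
            }
        }
        return out;
    }
}
```

On the two tokenized Mozart sentences from Section 2.3, the aligned words are the shared run Mozart ( 1756 - 1791 ), even though the sentences diverge before and after it.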
Chapter 3
The LiQED System
The DIPRE algorithm itself is really just generalized pseudocode that prescribes an approach for finding new instances of a relationship type from some relationship samples. This chapter and the next will go into the details of how this algorithm is fleshed out into a system capable of answering TREC list questions.
Written in Java, the complete system is called LiQED, short for List QED, because it only answers list questions. The system draws its initial set of candidate answers from Edinburgh's QED Question Answering system.
As mentioned in chapter 2, the system only uses <QUESTION>-<ANSWER> relation pairs to extract answers for a list question. <QUESTION> is the question target associated with every question and <ANSWER> are candidate answers generated by QED.
Using purely shallow methods, LiQED first searches for sentences that contain both the question target and a candidate answer. From these sentences, LiQED generates text patterns which are then used to search for more sentences. Next, an answer extraction step identifies new candidate answers, and finally the answers are reranked to give the final set of answers.
3.1 External Libraries
Before going into the details of LiQED, here is a brief overview of the external libraries
used in the system.
3.1.1 Apache Lucene Search Engine
Apache Lucene is a high-performance, full-featured text search engine library written
entirely in Java. The library is released by the Apache Software Foundation under the
Apache Software License. Lucene, being a search engine API, allows the creation of a vector-space model index which can then be searched with a large variety of query tools, including boolean, phrase, wildcard and fuzzy match queries. Of particular use is Lucene's capability for searching hierarchical "span phrase" queries, which are used to search for sentences with long range dependencies.
3.1.2 OpenNLP Toolkit
The OpenNLP Toolkit and OpenNLP MaxEnt projects by Tom Morton provide a set of open source, Java-based Natural Language Processing tools including sentence detection, tokenization, part-of-speech tagging, chunking, parsing and named-entity detection. These tools have been trained using the Maximum Entropy model provided by OpenNLP MaxEnt.
While the toolkit itself is still very much a work in progress, most of the tools
such as sentence detection, tokenization, POS tagging and chunking are stable and
sufficiently accurate for the purpose of this thesis. However, the toolkit’s named entity
tagger requires an inordinate amount of memory and is very slow. Thus for LiQED,
the toolkit is only used for tokenization, POS tagging and chunking.
3.1.3 LingPipe Natural Language Toolkit
LingPipe is a commercial Natural Language Processing library. LingPipe can be li-
censed for no cost under a non-commercial use license. Like the OpenNLP toolkit,
it provides sentence detection, tokenization, part-of-speech tagging, chunking, parsing
and named-entity detection.
LingPipe is used mainly for its named entity tagger as OpenNLP’s tagger is highly
inefficient.
3.1.4 JAligner
JAligner is an open source Java implementation of the Smith and Waterman (1981) algorithm, with the improvements Gotoh (1982) made to it. The algorithm was originally designed as a fast tool for protein and nucleic acid sequence comparison. However, it works equally well for linguistic purposes, as it is based on dynamic programming string comparison. The algorithm is used to generate the text patterns that will be used to identify answers.
3.1.5 Google Web API
The Google Web API is a SOAP based web service API provided by Google. Using this API enables a Java program to issue queries and get results from the Google search engine. The Google API is used in two different areas of the system: reranking the final set of answers and identifying hypernyms.
3.1.6 Amazon Web Services API
The Amazon Web Services API is a SOAP based web service API provided by Ama-
zon.com. This API allows a program to query the Amazon.com database of books and
music. The Amazon API is used mainly as an external information source to verify
answers for publication (books, songs and movies) questions.
3.2 DIPRE Implementation
There are two distinct phases in the DIPRE algorithm. The first phase involves au-
tomatically generating patterns from a set of initial relation pairs. In the context of
question answering, a relation pair would be the question topic and a candidate an-
swer.
The next phase takes the generated patterns and matches them to sentences in the
corpus to find new potential answers. A final answer extraction step then picks up what
the system believes are answers to the question.
3.2.1 Preprocessing
Both the Pattern Generation and Answer Extraction phases require searching over sentences. To speed up the process of sentence searching, a sentence-level index of the Aquaint corpus is created using the Lucene search engine library. In addition, every word in every sentence is also tagged with part-of-speech and chunk information. This tag information is stored together with the original sentence in the Lucene index and is also retrieved together with its associated sentence.
3.2.1.1 The Aquaint Index
The following steps were applied to all documents in the Aquaint corpus to construct
the Lucene search index.
1. Break each news article into constituent sentences using LingPipe’s sentence
detector.
2. For each sentence,
(a) Tokenize the sentence with OpenNLP Tokenizer.
(b) Stem each word in the sentence with Lucene’s Porter Stemmer.
(c) Label each stemmed word with a part-of-speech tag with OpenNLP’s POS
Tagger.
(d) Label each stemmed word with a chunk tag with OpenNLP’s Chunk Tag-
ger.
(e) Store in Lucene the DOCID of the source article, line number, the original
sentence, the tokenized, stemmed and lowercased version of the sentence,
POS tags and Chunk tags.
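As a rough illustration of this preprocessing pipeline, here is a toy sentence-level inverted index in plain Java. It stands in for the Lucene index and omits the stemming, POS and chunk tags; unlike LiQED's index it also discards punctuation. All names and the naive sentence splitter are invented for the example:

```java
import java.util.*;

/**
 * Toy sentence-level inverted index: split each document into sentences,
 * lowercase and tokenize them, and map each term to the set of sentence
 * ids containing it. A stand-in for the real Lucene index.
 */
public class SentenceIndex {
    private final List<String> sentences = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    public void addDocument(String doc) {
        // Naive sentence splitter: break after ., ! or ? followed by spaces.
        for (String sent : doc.split("(?<=[.!?])\\s+")) {
            int id = sentences.size();
            sentences.add(sent);
            for (String tok : sent.toLowerCase().split("\\W+")) {
                if (!tok.isEmpty()) {
                    postings.computeIfAbsent(tok, k -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    /** Returns the sentences containing every one of the query terms. */
    public List<String> search(String... terms) {
        Set<Integer> hits = null;
        for (String t : terms) {
            Set<Integer> p = postings.getOrDefault(t.toLowerCase(), Set.of());
            if (hits == null) hits = new TreeSet<>(p);
            else hits.retainAll(p);
        }
        List<String> out = new ArrayList<>();
        if (hits != null) for (int id : hits) out.add(sentences.get(id));
        return out;
    }
}
```

Searching this index for a question target word together with a candidate answer word returns exactly the sentences the pattern generation phase starts from.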
All punctuation and stop words have been preserved in the sentences as they prove
to be very important during the pattern generation phase. This will be discussed in
detail in the next section.
The end result is a sentence-level index of all news articles in the Aquaint corpus. The index takes up 13 GB of disk space. Table 3.1 shows the index size for each news source and year.
Agency Year Date Sub-Index Size
APW 1998 Jun 01 to Dec 31 1,000 MB
APW 1999 Jan 01 to Nov 01 1,007 MB
APW 2000 Jan 01 to Sep 30 711 MB
NYT 1998 Jun 01 to Dec 31 1,800 MB
NYT 1999 Jan 01 to Dec 31 3,000 MB
NYT 2000 Jan 01 to Sep 29 2,000 MB
XIE 1996 Jun 01 to Dec 31 469 MB
XIE 1997 Jan 01 to Dec 31 496 MB
XIE 1998 Jan 01 to Dec 31 537 MB
XIE 1999 Jan 01 to Dec 31 545 MB
XIE 2000 Jan 01 to Sep 30 425 MB
Total Disk space 13.246 GB
Table 3.1: Size of the indexed Aquaint corpus
3.2.2 Pattern Generation
Given a question target and a set of candidate answers, the pattern generation phase
seeks to identify a chain of words, a text pattern, that will identify potential answers.
The end result of this pattern generation phase is a pattern file that contains a list of text patterns that can be used to identify more answers. Appendix C exhibits one such sample pattern file, for the question "What movies was he (Bing Crosby) in?", which will be used for illustration purposes in this chapter.
3.2.2.1 Identifying Relevant Words
The first step in creating text patterns is to identify sentences that contain the question target and one of the candidate answers. In our example question, "What movies was Bing Crosby in?", the question target is Bing Crosby, and QED identified six candidate answers, which are listed in Table 3.2.
Of these candidate answers, only High Society is a movie starring Bing Crosby. Road refers to a series of movies featuring Bing Crosby, and White Christmas is a song, also by Bing Crosby. Southern Man, Mr. Tambourine Man and Looking Forward are songs or albums by David Crosby, who is not in any way related to Bing Crosby.
A series of queries on the Aquaint corpus picks up around 30 sentences that contain both the question target and one candidate answer. Table 3.3 shows some of these sentences, with the question target and QED's candidate answers highlighted in bold.
Given sentences like the above, the first step in constructing text patterns is to identify terms that are, in some sense, relevant to the question and its answers. Relevant terms here include not just content words, but also stop words and even punctuation marks and symbols. The reason for including punctuation in list questions is that Yang et al. (2003) and others note that multiple answers appearing in the same sentence are often delimited by punctuation marks. As will be seen later, it is important to capture these punctuation marks in the generated text patterns.
Every term found in the matching sentences will be tested for relevancy. Relevance
is defined in terms of distance from the question target and candidate answer. For each
term in a sentence, its relevance is defined by:
Relevance(term) = wq (1 − Distance(term, Tq) / Maxspan(Tq)) + wa (1 − Distance(term, Ta) / Maxspan(Ta))

where Tq is the set of words in the question target, Ta is the set of answer words that occur in the sentence, and wq and wa are weights such that wq + wa = 1. Distance(term, T) is the shortest distance between the term and any word in set T. Maxspan(T), which serves as a normalization function, is the longest distance between any words in set T, or the sentence boundary.
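Under one reading of Maxspan(T), namely the largest Distance(term, T) over any token in the sentence (an assumption, but one that reproduces the worked values below), the relevance score can be sketched as:

```python
def relevance(idx, tokens, Tq, Ta, wq=0.5, wa=0.5):
    """Relevance of the token at position idx, per the formula above."""
    def dist(i, T):
        # Distance(term, T): shortest distance to any word of T
        return min(abs(i - j) for j, w in enumerate(tokens) if w in T)
    def maxspan(T):
        # Assumed reading: the largest Distance(term, T) in the sentence
        return max(dist(i, T) for i in range(len(tokens)))
    return (wq * (1 - dist(idx, Tq) / maxspan(Tq))
            + wa * (1 - dist(idx, Ta) / maxspan(Ta)))

sent = "60 years ago : Bob Hope and Bing Crosby starred in '' Road to Singapore '' .".split()
print(relevance(sent.index("starred"), sent, {"Bing", "Crosby"}, {"Road"}))  # → 0.8125
```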
As an example, the formula is applied to every term in the sentence: 60 years ago : Bob Hope and Bing Crosby starred in '' Road to Singapore '' .
White Christmas Mr. Tambourine Man Looking Forward
High Society Road Southern Man
Table 3.2: QED’s candidate answers for ”What movies was Bing Crosby in?”
• The record was previously held by Bing Crosby 's “ White Christmas . ”
• It has sold an estimated 15 million copies and is the second best-selling single in history , runner-up to Bing Crosby 's “ White Christmas . ”
• Bing Crosby and Fred Astaire star in “ Holiday Inn , ” the 1942 musical featuring the classic song “ White Christmas , ” for $ 9.98 .
• “ High Society ” with Bing Crosby and Grace Kelly , “ Anchors Aweigh ” with Gene Kelly and Kathryn Grayson and “ On the Town ” with Gene Kelly and Betty Garrett .
• “ High Society , ” TCM Saturday at 8 : Bing Crosby , Grace Kelly and Frank Sinatra star in this 1956 remake of “ The Philadelphia Story . ”
• 60 years ago : Bob Hope and Bing Crosby starred in ” Road to Singapore . ”
• are having more fun on the road than Bob Hope and Bing Crosby in “ The Road to Morocco . ”
• “ The Enchanted Cottage , ” 1944 starring Robert Young ; “ Road to Utopia ” 1945 starring Bob Hope and Bing Crosby ; and Alfred Hitchcock 's “ The Man Who Knew Too Much ” 1956 .
Table 3.3: Sentences containing both the question target and candidate answers for the question "What movies was Bing Crosby in?"
Let wq = 0.5, wa = 0.5, Tq = {Bing, Crosby}, Ta = {Road}, Maxspan(Tq) = 8, Maxspan(Ta) = 12.

Relevance(60) = 0.5(1 − 7/8) + 0.5(1 − 12/12) = 0.0625
Relevance(years) = 0.5(1 − 6/8) + 0.5(1 − 11/12) = 0.1667
Relevance(ago) = 0.5(1 − 5/8) + 0.5(1 − 10/12) = 0.2708
Relevance(:) = 0.5(1 − 4/8) + 0.5(1 − 9/12) = 0.3750
Relevance(Bob) = 0.5(1 − 3/8) + 0.5(1 − 8/12) = 0.4792
Relevance(Hope) = 0.5(1 − 2/8) + 0.5(1 − 7/12) = 0.5833
Relevance(and) = 0.5(1 − 1/8) + 0.5(1 − 6/12) = 0.6875
Relevance(Bing)
Relevance(Crosby)
Relevance(starred) = 0.5(1 − 1/8) + 0.5(1 − 3/12) = 0.8125
Relevance(in) = 0.5(1 − 2/8) + 0.5(1 − 2/12) = 0.7917
Relevance('') = 0.5(1 − 3/8) + 0.5(1 − 1/12) = 0.7708
Relevance(Road)
Relevance(to) = 0.5(1 − 5/8) + 0.5(1 − 1/12) = 0.6458
Relevance(Singapore) = 0.5(1 − 6/8) + 0.5(1 − 2/12) = 0.5417
Relevance('') = 0.5(1 − 7/8) + 0.5(1 − 3/12) = 0.4375
In the example sentence, the terms starred, in and '' have the highest relevance because they appear between the question target and the candidate answer. By averaging the relevance scores of all occurrences of the same term, we can then rank all terms according to their relevance to both Tq and Ta. Using a threshold as a filter, we can select the set of most relevant terms, R. Table 3.4 lists set R for the example question.
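The averaging and threshold step can be sketched as follows; the 0.5 threshold and the per-occurrence scores are illustrative values, not LiQED's actual ones:

```python
from collections import defaultdict

def relevant_terms(scored_occurrences, threshold=0.5):
    """Average each term's relevance over all its occurrences, then keep
    the terms whose mean score clears the threshold as the set R."""
    totals, counts = defaultdict(float), defaultdict(int)
    for term, score in scored_occurrences:
        totals[term] += score
        counts[term] += 1
    return {t for t in totals if totals[t] / counts[t] >= threshold}

# Hypothetical per-occurrence scores gathered from several matching sentences.
occurrences = [("and", 0.69), ("and", 0.55), ("starred", 0.81), ("years", 0.17)]
print(sorted(relevant_terms(occurrences)))  # → ['and', 'starred']
```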
It was found that the most relevant terms generally fall into one of three classes:
Contextual Terms add context information that improves the precision of locating relevant sentences. For example, Bob and Hope from the example question fall into this category, because the actor Bob Hope regularly co-starred with Bing Crosby. Thus Bob Hope can help in disambiguating random movies from movies starring Bing Crosby.
to the and ’s
with bob hope in
. ‘‘ ’’ ,
Table 3.4: Set of terms relevant to the question "What movies was Bing Crosby in?"
Chunk Markers identify a position in a sentence where an answer can be found. Chunk markers are typically punctuation marks, determiners and prepositions. These chunk markers are especially helpful when the answer and question target are separated by a span of non-relevant words due to a long-distance dependency.
Sequence Markers are really a subset of chunk markers, because they also identify locations within a sentence that contain answers. These are conjunction terms such as "and" and commas. The presence of these terms is a good indicator that multiple answers can be found within a single sentence.
3.2.2.2 Identifying Patterns
Having identified a set of relevant terms, the next step is to use these relevant terms
to construct surface text patterns. The sentences need to be compared using a diff-
like algorithm that looks for similar terms within a pair of sentences. This similarity
search is performed using the Smith-Waterman-Gotoh algorithm described in Chapter
2. Before sentences can be compared, they are first cleaned and simplified to ensure
that the algorithm can reliably identify good text patterns.
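As a rough sketch of the alignment step, the following implements a token-level Smith-Waterman local alignment with a simple linear gap penalty (the Gotoh refinement described in Chapter 2 adds affine gap costs, omitted here for brevity); matched tokens are kept and non-matching positions become wildcards:

```python
def align(a, b, match=2, mismatch=-1, gap=-1):
    """Token-level Smith-Waterman local alignment (linear gap penalty).

    Returns the best-scoring local alignment of token lists a and b,
    with non-matching positions replaced by the wildcard '*'."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell to recover the aligned token pattern.
    out, i, j = [], bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out.append(a[i - 1] if a[i - 1] == b[j - 1] else "*")
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out.append("*")
            i -= 1
        else:
            out.append("*")
            j -= 1
    return out[::-1]

s1 = "bob hope and X starred".split()
s2 = "bob hope and Y starred".split()
print(" ".join(align(s1, s2)))  # → bob hope and * starred
```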
To ensure that the algorithm focuses only on terms near the question target and candidate answer, all terms falling outside a three-chunk window are dropped. The two BIO chunk-tagged sentences in Table 3.5 illustrate this process. In the two examples, the question target is Bing Crosby and the candidate answer is White Christmas. Terms falling within the three-chunk window, highlighted in boldface, are retained while the remaining terms are dropped.
The BNP record INP was BVP previously IVP held IVP by BPP
Bing BNP Crosby INP ’s BNP ‘‘ INP White INP Christmas INP
. O ’’ O
[ Bing BNP Crosby INP and INP Fred INP Astaire INP star INP
in BPP ‘‘ O ] Holiday BNP Inn INP , O ’’ O the BNP 1942 INP
musical INP featuring BVP [ the BNP classic INP song INP ‘‘ O
White BNP Christmas INP , O ’’ O ] for BPP $ BNP 9.98 INP . O
Table 3.5: Chunk-tagged sentences; retained terms are highlighted in boldface.
If there are long-distance dependencies between the question target and the candidate answer, there will be two sentence fragments. Square brackets, [ and ], are used to delineate the two sentence fragments. If there is no long-distance dependency, there will only be a single, longer sentence fragment.
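A simplified sketch of the windowing step, counting tokens rather than BIO chunks (an approximation; the real system counts chunks), shows how two fragments arise when the target and answer are far apart:

```python
def window_fragments(tokens, keywords, window=3):
    """Keep only tokens within `window` positions of a target or answer
    token. Each contiguous run of kept tokens becomes one fragment, so a
    long gap between question target and answer yields two fragments."""
    anchors = [i for i, t in enumerate(tokens) if t in keywords]
    keep = [any(abs(i - a) <= window for a in anchors) for i in range(len(tokens))]
    fragments, current = [], []
    for tok, k in zip(tokens, keep):
        if k:
            current.append(tok)
        elif current:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments

tokens = ("Bing Crosby and Fred Astaire star in `` Holiday Inn , '' the 1942 "
          "musical featuring the classic song `` White Christmas , ''").split()
frags = window_fragments(tokens, {"Bing", "Crosby", "White", "Christmas"})
print(len(frags), frags[0])
```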
The removal of noisy terms is performed by a term replacement function. The function determines whether each term in the remaining sequence of terms is a member of the set of question target terms, Tq, the set of candidate answer terms, Ta, the set of relevant terms, R, or none of these sets. The function then replaces the term with a representative label depending on its membership. Table 3.3 shows the original sentences and Table 3.6 shows the cleaned and simplified sentence fragments.
Replace(t) =
    <QUESTION>   if t ∈ Tq
    <ANSWER>     if t ∈ Ta
    t            if t ∈ R
    *            otherwise
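The replacement function translates almost directly into code. The sketch below also collapses adjacent identical placeholder labels, which the fragments in Table 3.6 suggest (a single <QUESTION> stands for the two-word target Bing Crosby); the sample sets are taken from the running example:

```python
def replace_term(t, Tq, Ta, R):
    """The Replace(t) function defined above."""
    if t in Tq:
        return "<QUESTION>"
    if t in Ta:
        return "<ANSWER>"
    if t in R:
        return t
    return "*"

def simplify(tokens, Tq, Ta, R):
    """Apply Replace to every token, collapsing adjacent identical tags."""
    out = []
    for t in tokens:
        label = replace_term(t, Tq, Ta, R)
        if label in ("<QUESTION>", "<ANSWER>") and out and out[-1] == label:
            continue  # e.g. 'Bing Crosby' becomes one <QUESTION> tag
        out.append(label)
    return out

Tq, Ta = {"Bing", "Crosby"}, {"Road"}
R = {"to", "the", "and", "'s", "with", "bob", "hope", "in", ".", "``", "''", ","}
frag = "bob hope and Bing Crosby starred in `` Road to Singapore".split()
print(" ".join(simplify(frag, Tq, Ta, R)))
# → bob hope and <QUESTION> * in `` <ANSWER> to *
```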
A pairwise comparison between these sentence fragments is then performed using the Smith-Waterman-Gotoh algorithm to create our surface text patterns. Not all generated text patterns are usable, as some patterns will not contain <QUESTION> or <ANSWER> tags. Other patterns do contain both tags but consist only of wildcard (*) terms and stop words, which would pick up too many false positives. Both types of unusable patterns are removed, leaving only text patterns that contain both
* * <QUESTION> ’s ‘‘ <ANSWER> . ’’
* to <QUESTION> ’s ‘‘ <ANSWER> . ’’
[ <QUESTION> and * * * in ‘‘ ] [ ‘‘ <ANSWER> , ’’ ]
‘‘ <ANSWER> ’’ with <QUESTION> and * *
[ ‘‘ <ANSWER> , ’’ * * ] [ * * <QUESTION> , * * ]
bob hope and <QUESTION> * in ’’ <ANSWER> to *
bob hope and <QUESTION> in ‘‘ The <ANSWER> to *
[ * ‘‘ <ANSWER> to * ] [ * * bob hope and <QUESTION> * and ]
Table 3.6: Cleaned and simplified sentence fragments.
a <QUESTION> and an <ANSWER> tag and do not consist solely of wildcards and stop words. Table 3.7 shows some of the final surface text patterns that can be used to identify new and distinct answers. For a full list of the surface text patterns generated for this example question, please refer to Appendix C.
<QUESTION> ’s ‘‘ <ANSWER> ’’
[ ‘‘ <ANSWER> to ] [ bob hope and <QUESTION> ]
bob hope and <QUESTION> * ‘‘ <ANSWER>
bob hope and <QUESTION> * <ANSWER>
the ‘‘ <ANSWER> * ’’ <QUESTION>
Table 3.7: Answer-finding text patterns for ”What movies was Bing Crosby in?”
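The filtering rule for unusable patterns can be sketched as follows; the stop word list is illustrative only:

```python
STOPS = {"the", "a", "an", "and", "to", "in", "of", "for"}

def usable(pattern):
    """Keep a pattern only if it contains both tags plus at least one
    content term that is neither a wildcard nor a stop word."""
    toks = pattern.split()
    if "<QUESTION>" not in toks or "<ANSWER>" not in toks:
        return False
    return any(t not in ("*", "<QUESTION>", "<ANSWER>") and t not in STOPS
               for t in toks)

print(usable("bob hope and <QUESTION> * <ANSWER>"))  # → True
print(usable("* the <QUESTION> * <ANSWER> *"))       # → False
print(usable("bob hope and <QUESTION> * in"))        # → False (no <ANSWER>)
```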
3.2.3 Extracting Answers
Now that text patterns have been generated for each question, it is a simple matter of searching the Aquaint corpus for sentences that match these text patterns. For every matching sentence found, the text pattern also identifies a window of between one and three chunks that potentially contains an answer. This level of granularity is not sufficient for the purposes of question answering, as the exact answer has not yet been provided. An additional step is required to extract the exact answer from these answer chunks.
3.2.3.1 Named Entity Based Answer Extraction
The QED system provides a fine-grained expected answer type, which is the system's prediction of the type of answer required for a question. A series of simple regular expressions was used to map QED's expected answer type to a named entity type. The
LingPipe named entity tagger is then used to sift through the answer chunks, looking
for labeled named entities of the correct type. These named entities are the final answer
candidates generated by LiQED. Table 3.8 shows the series of regular expressions used
to map QED’s expected answer type to LingPipe’s named entity type.
There are limitations to using a coarse-grained named entity tagger such as LingPipe to identify answer strings. Most significant is that LiQED is restricted to answering only questions that expect the answer to be a person, location or organization.
This limitation is caused by the choice of a simple answer extraction mechanism, and is not a limitation of the thesis. Ideally, a fine-grained named entity tagger that is able to identify a larger variety of named entity types would better suit the task. However, a suitable fine-grained named entity tagger could not be found and integrated in time for the TREC evaluation.
Chapter 4 will detail further enhancements that expand LiQED's coverage beyond person, location and organization type questions.
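A sketch of the answer-type mapping, using an illustrative subset of the rules implied by Table 3.8 (the actual regular expressions used by LiQED are not reproduced in the text):

```python
import re

# Illustrative subset of the mapping in Table 3.8.
TYPE_RULES = [
    (re.compile(r"person|people|citizen|woman|women|man|men|child|adult|human|name"), "PERSON"),
    (re.compile(r"location|city|metropolis|town|village|county|province|state|country|nation"), "LOCATION"),
    (re.compile(r"organi[sz]ation|company|business|institution"), "ORGANIZATION"),
]

def ne_type(expected_answer_type):
    """Map a QED expected answer type onto a LingPipe named entity type."""
    for pattern, label in TYPE_RULES:
        if pattern.search(expected_answer_type):
            return label
    return None  # outside the basic person/location/organization coverage

print(ne_type("location:country"), ne_type("general:company"), ne_type("publication:movie"))
```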
QED Expected Answer Type Named Entity Type
person PERSON
people PERSON
citizen PERSON
man PERSON
men PERSON
woman PERSON
women PERSON
mortal PERSON
adult PERSON
child PERSON
male PERSON
human PERSON
name PERSON
location LOCATION
city LOCATION
metropolis LOCATION
town LOCATION
village LOCATION
county LOCATION
province LOCATION
state LOCATION
country LOCATION
nation LOCATION
organization ORGANIZATION
organisation ORGANIZATION
company ORGANIZATION
business ORGANIZATION
institution ORGANIZATION
Table 3.8: Mapping QED’s expected answer type to LingPipe’s named entity type
Chapter 4
LiQED in TREC
This chapter describes some further modifications and optimizations made to LiQED
just two weeks prior to the release of the 2005 TREC Question Answering Track main
task test set. The focus of this phase of work is to enhance and optimize the basic
system.
4.1 Improved Answer Extraction
As alluded to in Chapter 3, the simple named entity answer extraction mechanism will fail if the answer type of a question is not a person, location or organization. Most of the time had been spent developing and fine-tuning LiQED's implementation of the DIPRE algorithm, the core of this thesis, leaving less than two weeks to address this limitation in answer extraction. Thus quick and easy-to-implement solutions were required.
The key issue in extracting answers is that LiQED by itself is only able to identify a sentence fragment that may contain an answer. The system does not have any information to assist it in identifying an answer within those sentence fragments, so LiQED has to rely on external sources to extract correct answers.
For person, location and organization answer types, a named entity tagger is used.
The assumption here is that if LiQED identifies that a sentence fragment contains an
answer and the named entity tagger identifies an answer of the correct type within
the sentence fragment, then there is a high chance that the answer identified by both LiQED and the tagger is a correct answer. This is essentially a simple form of ensemble learning or boosting (see R. Meir and G. Rätsch (2003)).
This simple ensemble learning technique can be applied to other information sources.
4.1.1 Hypernyms for Answer Extraction
Table 4.1 lists the expected answer types of all list questions in the TREC 2005 test set. These answer types are essentially class labels, or hypernyms: words whose meaning denotes a superordinate or superclass. Conversely, hyponyms are words that denote membership of a class. For example, animal is a hypernym of dog, and dog is a hyponym of animal.
date:date general:award general:character general:child
general:company general:competitor general:contestant general:course
general:eyewitness general:festival general:graduate general:group
general:holding general:horse general:individual general:leader
general:legionnaire general:manufacturer general:medal general:member
general:nationality general:occupation general:off general:official
general:opponent general:organization general:people general:person
general:personnel general:player general:position general:product
general:program general:puppet general:ship general:show
general:species general:student general:submarine general:team
general:theme general:thing general:variety general:victim
general:work location:country location:location name:name
publication:book publication:movie publication:song publication:title
Table 4.1: QED Expected Answer Types
Hearst (1992) defines a simple surface text pattern that is able to reliably identify such hypernym-hyponym class relationships. The surface text pattern is "X such as Y", where X is the class label and Y is a member of class X. This text pattern can be applied to a web search engine to identify members of a specific answer type.
Specifically for LiQED, this surface text pattern is expressed as a query to the Google search engine via the Google Web API. The first part of the query is a phrase search for "X such as", where X is the hypernym, i.e. the expected answer type. The Google query also includes the question target as context, to ensure that potential answers are relevant to the list question. Hyponyms are then extracted from the snippets returned by Google using a series of simple rules. Table 4.2 shows the query and the first ten hyponyms extracted for question 136.7, "What Shiite leaders were killed in Pakistan?" While not all extracted hyponyms are correct (the 9th hyponym in Table 4.2 is incorrect), they are sufficient for the task.
Question Target Shiite
Answer Type general:leader
Google Query ”Shiite” AND ”leaders such as”
Extracted Hyponyms Abu Mazen
Ahmad Chalabi
Ahmad Shah Masud
Asi
Ayatollah Khomeini
Ayatollah Sistani
Ayman al-Zawahiri
Baburam
Blair who
Burhanuddin Rabbani
Table 4.2: Shiite leaders found by Google
The extracted hyponyms are cached and compared against the sentence fragments identified by LiQED as containing an answer. If a hyponym is found within a LiQED sentence fragment, it is flagged as an answer candidate. This technique allows LiQED to expand beyond answering only questions that expect a person, location or organization answer type.
4.1.2 Publication Answers
Amazon.com provides a web service API (Application Program Interface) that allows programs to query Amazon's database of books and music albums. For list questions that expect a publication answer type, a query on the question target is made to Amazon via the web service API. The results of the query are concatenated to the list of hyponyms already found in the previous section. Again, any hyponyms identified in the sentence fragments are flagged as LiQED's answer candidates.
Out of the 93 list questions in the TREC 2005 test set, there were 6 publication questions, so the Amazon web service API proved to be useful. The publication list questions include:
Q76.7: What movies was Bing Crosby in?
Q97.5: List the Counting Crows' record titles.
Q108.6: Name movies released by Sony Pictures Entertainment (SPE).
Q113.5: Name some of Paul Newman's movies.
Q114.4: Name movies/TV shows Jesse Ventura appeared in.
Q121.3: What books did Rachel Carson write?
Table 4.3: Publication questions in the TREC 2005 Test Set
4.2 Google Reranking
A set of answers generated by a list question answering system such as LiQED can
be treated as a ranked list of answers. For example, a threshold can be applied to
answers ranked by confidence, which ensures that precision can be improved while
still retaining good recall scores.
Since LiQED uses surface text pattern matching, a sentence fragment is deemed to
contain an answer if it matches any of the automatically generated text patterns. There
is no notion of confidence with pattern matching. However, by counting the number
of times the same answer is picked up by our automatically generated text patterns, a
rough level of confidence can be given.
To improve on this confidence, Google is used to rerank the answers by the degree
of correlation the answer has to the question target. The idea is to improve the ranking
of answers that are more related to the question target and decrease the ranking of
answers that do not seem to be related. A simple correlation formula is used.
Correlation(x, y) = 0.5 (Count(x ∧ y)/Count(x) + Count(x ∧ y)/Count(y))

where x is the question target and y is an answer. Count(x) is the number of relevant documents Google returns when the query is x. The computed correlation score is then equally weighted with a simple linear rank score of the answer:

Confidence(x) = 0.5 (Correlation(target, x) + (NumAnswers − Rank(x))/NumAnswers)

where NumAnswers is the number of distinct answers LiQED found and Rank(x) is LiQED's original ranking for answer x. The answers are reranked in descending order of this confidence score. Through experimentation, it was found that answers with a confidence score of 0.5 or less were usually wrong, and these are removed. The remaining answers constitute the final reranked list of answers.
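The two formulas combine as follows; the hit counts below are hypothetical stand-ins, not real Google counts:

```python
def correlation(count_x, count_y, count_xy):
    """Correlation(x, y) from the Google hit counts, as defined above."""
    return 0.5 * (count_xy / count_x + count_xy / count_y)

def confidence(corr, rank, num_answers):
    """Equal weighting of the correlation score and a linear rank score."""
    return 0.5 * (corr + (num_answers - rank) / num_answers)

# Hypothetical hit counts for a target/answer pair.
corr = correlation(count_x=50_000, count_y=40_000, count_xy=20_000)
conf = confidence(corr, rank=1, num_answers=5)
print(round(corr, 3), round(conf, 3))  # → 0.45 0.625
# Answers with confidence <= 0.5 would be dropped before the final ranking.
```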
4.3 Answer Merging
The answers generated by QED tend to be different from the answers generated by LiQED, because the two systems apply very different techniques in generating their answer sets. To get the best recall performance, the best answers from both systems need to be merged.
Simply combining all answers from both systems would result in a large number of answers and potentially adversely affect the precision score, so there is a need to strike a balance between precision and recall. After several rounds of trial-and-error testing, the following procedure was used to create the final answer set for LiQED.
1. Select the top 10 answers from QED.
2. Add the top 10 answers from LiQED that are not identical to any of the top 10 QED answers.
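The two-step procedure above is a few lines of code; the case-insensitive duplicate test is an assumption, as the text only says "not identical":

```python
def merge_answers(qed_answers, liqed_answers, k=10):
    """QED's top k answers, plus LiQED's top k answers that do not
    duplicate any answer already taken."""
    merged = list(qed_answers[:k])
    seen = {a.lower() for a in merged}
    for a in liqed_answers[:k]:
        if a.lower() not in seen:
            merged.append(a)
            seen.add(a.lower())
    return merged

qed = ["High Society", "Road", "White Christmas"]
liqed = ["Holiday Inn", "High Society", "Road to Morocco"]
print(merge_answers(qed, liqed))
```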
This procedure creates the final set of LiQED answers that was submitted for evaluation as run 2 in TREC 2005. The set of answers generated by QED itself was submitted as run 1. This setup allows easy comparison between the two systems. The next chapter analyzes the performance of LiQED and compares it with QED.
Chapter 5
Evaluation and Analysis
Chapters 3 and 4 detailed the construction of the LiQED system. This chapter exam-
ines and analyzes the performance of LiQED on the TREC 2005 question answering
test set.
The official TREC evaluations allow a maximum of three distinct runs, with the answers from each run evaluated separately. For list questions, we submitted the top 12 answers from QED as the first run. The second run gave the top 10 answers from QED and LiQED, as described in Chapter 4. In addition to the top 7 answers from QED and LiQED, the final run also included the top 7 answers from TOQA, a topic-based question answering system by Kisuh Ahn.
This chapter focuses mainly on comparing the QED answers in run 1 against run 2, which combines the top answers from QED with additional answers from LiQED. The third run with TOQA will not be discussed here, as the addition of a third system introduces too many new variables and makes it hard to determine the behavior of the three combined systems.
5.1 Training Set - TREC 2004
The 56 list questions from TREC 2004 were used as a training set to tune LiQED. Table
5.1 shows the precision and recall scores of LiQED on the training set. The precision and recall formulas used here are those for instance precision and instance recall
as defined in the TREC QA Task, see Voorhees (2004):
A system's response to a list question was scored using instance precision (IP) and instance recall (IR) based on the list of known instances. Let S be the number of known instances, D be the number of correct, distinct responses returned by the system, and N be the total number of responses returned by the system. Then IP = D/N and IR = D/S. Precision and recall were then combined using the F measure with equal weight given to recall and precision, F = (2 × IP × IR)/(IP + IR).
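The scoring just quoted can be computed directly; the answer key and responses below are made-up illustrations:

```python
def list_scores(responses, known_instances):
    """Instance precision, recall and F, per Voorhees (2004)."""
    D = len(set(responses) & set(known_instances))  # correct, distinct responses
    N = len(responses)                              # total responses returned
    S = len(known_instances)                        # known instances
    ip, ir = D / N, D / S
    f = 2 * ip * ir / (ip + ir) if ip + ir else 0.0
    return ip, ir, f

known = {"High Society", "Road to Singapore", "Holiday Inn", "Going My Way"}
responses = ["High Society", "Holiday Inn", "White Christmas", "Road"]
print(list_scores(responses, known))  # → (0.5, 0.5, 0.5)
```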
The list of known instances for a question is the set of all correct answers found by the systems that participated in that year's QA task. Table 5.1 shows the performance of both QED and LiQED on the 2004 question set. Since LiQED operates by taking QED's answers as examples to look for more answers of a similar type, the recall score is indicative of the degree to which LiQED improves QED's performance. There is a 0.0390 improvement in recall and a 0.0081 decrease in precision. Overall, run 2 improves on run 1's F-score by 0.0083. This is sufficient to place the system in 4th position for list questions in last year's evaluation.
Run Precision Recall F-Score
QED 0.1927 0.2419 0.2145
QED + LiQED 0.1846 0.2809 0.2228
QED + LiQED + TOQA 0.1719 0.2761 0.2119
Table 5.1: QED and LiQED performance on TREC 2004 Training Set
5.2 Test Set - TREC 2005
The TREC 2005 question set was released on July 21, 2005 and participating systems
were required to submit their answers a week later, on July 28, 2005 (an extension of one day was given due to a temporary hardware failure of the NIST web server that was hosting the questions and the answer submission form). The question set has a total of 600 questions covering 75 different topics (question targets). Each topic asks between zero and two list questions. In total, there were 93 list questions in the test set.
The number of surface text patterns generated by LiQED varies from none, for questions where QED provided no answers, to over a thousand. An average of 182.2 surface text patterns was generated for each question; in total, LiQED generated 16,944 text patterns.
From these patterns, LiQED found 824 distinct answers or an average of 8.9 an-
swers per question. After reranking and filtering the answers using Google, the system
was left with 379 distinct answers or an average of 4.08 answers per question. These
remaining answers were then added to QED’s top 10 answers, resulting in 956 submit-
ted answers or 10.3 answers per question.
5.2.1 Evaluation
The official evaluation results for TREC 2005 will only be released in November 2005, too late to be included in this report. Instead of the official evaluation scores, these are preliminary scores from a personal evaluation performed on the answers generated by both QED and LiQED. At this point, I need to add a cautionary note for the reader. I have been as unbiased as possible in my evaluation of my own system; however, one is only human, and thus the results shown below should be seen only as interim results that will be superseded by the official TREC evaluation results. Also, the precision score for list questions depends on the set of all correct answers given by all systems that submitted answers to TREC for evaluation. Obviously, this set will not be known until the official evaluation results are revealed. Thus only recall scores are evaluated.
Among the 93 list questions in the test set, there was some ambiguity between question Q128.3, "What countries constitute the OPEC committee?", and question Q128.5, "What are OPEC countries?". The two questions ask for very similar answers, and the few people we polled were unable to distinguish any difference between an OPEC country and an OPEC committee country. We therefore presume that both questions are essentially the same question, phrased in different forms. For that reason one question, Q128.3, was dropped from the evaluation, leaving 92 questions in the question set.
Table 5.2 shows only the recall scores of QED and LiQED, both for the full set of 92 list questions and for a subset of 64 questions that is discussed in a later section. As with the training set, there is an improvement of 0.0140 in recall. While this improvement is smaller than the one achieved on the training set, it is not unexpected, for reasons discussed in the next sections.
Run Recall, 64 Questions Recall, 92 Questions
QED 0.1271 0.1445
QED + LiQED 0.1529 0.1585
QED + LiQED + TOQA 0.1507 0.1409
Table 5.2: QED and LiQED performance on TREC 2005 Test Set
5.2.2 Event Questions
Unlike the TREC 2004 questions, this year's questions included questions on temporal events. The events mainly focused on recent events like the 1998 Nagano Olympic Games and the Port Arthur Massacre. However, there were also questions on past events like the Hindenburg disaster. From the perspective of a Question Answering system, events can generally be classified into one of three types:
Named Event A significant, one-off event that is important enough to be given a name. Examples include the Hindenburg disaster and the Port Arthur Massacre.
Unnamed Event A minor, one-off event not important enough to be named. Examples include the 1998 indictment and trial of Susan McDougal, the first 2000 Bush-Gore presidential debate, and a plane clipping cable wires in an Italian resort. The difficulty with an unnamed event lies in identifying articles and passages that are relevant to the event. Often, a named event may actually start out as an unnamed event. For example, the event of a plane crashing into the World Trade Center in 2001 was not called the "9-11 Attack" until it was established to be the work of terrorists.
Periodic Event An event that occurs periodically, perhaps quarterly or annually. These include holidays and major festivals such as Christmas, Halloween and the Edinburgh Fringe Festival. Sporting events like the Olympics, the Superbowl and Wimbledon are also periodic events. Like unnamed events, periodic events can be difficult to disambiguate; after all, the 2000 Olympics in Sydney is a different event from the 1996 Olympics in Atlanta.
Event-based questions were particularly difficult for LiQED, for two reasons. Firstly, event questions are new, and up until a week before the actual questions were revealed there were no example questions on which to train or tune LiQED. More critically, some of the events do not have a proper name. LiQED relies heavily on a well-defined question target to identify relevant sentences. When the question target is an unnamed event like "France wins World Cup in soccer" or a periodic event like the Olympic Games, LiQED has difficulty identifying relevant sentences and thus fails to identify new answers. Table 5.3 lists the event targets in this year's question set. Unnamed events are marked with an asterisk (∗) and periodic events are marked with a cross (+).
Out of the 93 list questions in the test set, 20 questions were based on events. Of
the 20 event questions, LiQED only found additional answers for one question. In
comparison, out of the 92 evaluated questions LiQED found additional answers for 14
questions.
5.2.3 Semantically Deep Questions
Unlike the TREC 2004 question set, the TREC 2005 question set contained more challenging questions that require a Question Answering system to have some form of semantic reasoning or inferencing module. Table 5.4 lists some of these questions.
As LiQED uses purely shallow methods to identify answers, it is unable to answer any of these questions correctly. For these questions, LiQED either found no new answers or found many incorrect answers. This is perfectly illustrated in Table 5.5, which lists the answers to two questions, Q77.6 "Name opponents who Foreman defeated" and Q77.7 "Name opponents who defeated Foreman". Ideally, these two
1980 Mount St. Helens eruption
1998 Nagano Olympic Games+
1998 Baseball World Series+
1998 indictment and trial of Susan McDougal∗
1999 North American International Auto Show+
Boston Big Dig
Crash of EgyptAir Flight 990∗
first 2000 Bush-Gore presidential debate∗
France wins World Cup in soccer∗
Hindenburg disaster
Kip Kinkel school shooting∗
Miss Universe 2000 crowned+
Plane clips cable wires in Italian resort∗
Port Arthur Massacre
Preakness 1998+
return of Hong Kong to Chinese sovereignty∗
Russian submarine Kursk sinks∗
Super Bowl XXXIV +
Table 5.3: Event-based question targets in TREC 2005
sets of answers should be disjoint. In the case of LiQED, the two answer sets are identical.
Combined, the event questions and the semantically deep questions make up 28 of the 92 questions in the test set. In other words, LiQED is unable to answer, or has difficulty answering, nearly a third of the list questions. Discounting these questions leaves 64 questions in the question set. Table 5.2 shows the recall scores for this subset of 64 questions. As anticipated, LiQED's recall performance is better here, improving QED's score by a respectable 0.0258.
Comparing the recall scores between the 64-question and the 92-question sets, it is
clear that QED did not have problems with the 28 event or inference-type questions
QID     Question Target              Question
Q67.6   Miss Universe 2000 crowned   Name other contestants (besides Miss Universe).
Q77.6   George Foreman               Name opponents who Foreman defeated.
Q77.7   George Foreman               Name opponents who defeated Foreman.
Q81.2   Preakness 1998               List other horses who won the Kentucky Derby and Preakness but not the Belmont.
Q100.7  Sammy Sosa                   Name the pitchers off of which Sosa homered.
Q119.4  Harley-Davidson              What other products (beside motorcycles) do they produce?
Q123.5  Vicente Fox                  What countries did Vicente Fox visit after election?
Q126.3  Pope Pius XII                What official positions did he hold prior to becoming Pius XII?
Q133.3  Hurricane Mitch              As of the time of Hurricane Mitch, what previous hurricanes had higher death totals?
Q137.3  Kinmen Island                What other island groups are controlled by this government (Taiwan)?
Table 5.4: Questions that require semantic inferencing
since its recall score increased from 0.1271 to 0.1445. Conversely, LiQED's recall
score increased by a mere 0.0056, indicating that it failed to identify answers for
most of the event or inference-type questions.
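The recall comparison above uses the standard list-question notion of recall: the fraction of the known correct answers that a system returns. The following minimal sketch illustrates the computation; the function name and the toy answer lists are illustrative, not taken from LiQED or the TREC data.

```python
def list_recall(found, known):
    """Recall for a single list question: the fraction of the known
    correct answers that the system actually returned."""
    found_set = {a.lower() for a in found}
    known_set = {a.lower() for a in known}
    return len(found_set & known_set) / len(known_set)

# Toy example with illustrative answer lists:
known = ["Joe Frazier", "Ken Norton", "Archie Moore"]
found = ["Joe Frazier", "Muhammad Ali"]
print(list_recall(found, known))  # one of three known answers found
```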
5.2.4 Answer Extraction
The majority of the work on this thesis was focused on implementing the adapted DIPRE
question answering algorithm. As a consequence, less effort was placed on answer
extraction, and this has had some negative effect on the overall performance of
LiQED. Typically, the automatically generated text patterns are able to identify
several hundred relevant sentences and, due to the amount of time required to
manually examine every sentence, a strict evaluation of all questions is not
possible. However, just
Opponents who Foreman defeated. Opponents who defeated Foreman.
George Foreman George Foreman
Joe Frazier Joe Frazier
Ken Norton Ken Norton
Sonny Sonny
Archie Moore Archie Moore
Table 5.5: LiQED answers for the two questions on boxer George Foreman.
examining the example question used in Chapter 3, "What movies was Bing Crosby
in?", clearly shows that answer extraction can be improved. Table 5.6 lists the
correct answers identified by the text patterns alongside the actual set of extracted
answers that constitute the final answer set. In total, 14 correct answers were
identified by the patterns but only 5 answers were extracted.
Answers Found by Patterns     Answers Extracted
Birth of the Blues            Going My Way
East Side of Heaven           High Society
Going My Way                  Holiday Inn
High Society                  Road to Zanzibar
Holiday Inn                   White Christmas
Legend of Sleepy Hollow
Pennies From Heaven
Rhythm on the Range
Road to Morocco
Road to Singapore
Road to Utopia
Road to Zanzibar
Waikiki Wedding
White Christmas
Table 5.6: Identified and extracted answers for the question, "What movies was Bing
Crosby in?"
Chapter 6
Conclusion
6.1 Conclusion
This thesis set out to examine whether it is possible to extrapolate from an existing
set of answers to identify more answers. Through the use of surface text patterns
automatically generated from the commonality found within the initial answer set,
new answers were indeed found. The basic LiQED system was able to automatically
capture question-specific relationships instead of the pre-determined, broad-coverage
relationships used in prior work by Hovy and Ravichandran. Table 6.1 shows some of
the question-specific relationships where LiQED is able to positively extrapolate
from the initial set of answers to identify new answers. These relations tend to be
deeper and more specific than the 6 relationship types used by Hovy et al. (2002).
In general, the system requires at least two correct answers to be provided before it
is able to identify new correct answers. This is expected as a minimum of two correct
answers are required to find commonality. So long as there are two or more correct
answers, the generated patterns can and do pick up new answers. However, the current
answer extraction implementation needs to be improved, as many correct answers were
not extracted.
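This two-answer requirement can be made concrete with a deliberately minimal sketch of the commonality-and-extrapolation idea. The sentences and helper names below are invented for illustration; LiQED's actual patterns are richer, with wildcards and chunk brackets.

```python
import re

def shared_template(pairs):
    """Given (sentence, answer) pairs for two or more known correct
    answers, return the common wording as a pattern with an <ANSWER>
    slot, or None when the sentences share no common context."""
    templates = {sent.replace(ans, "<ANSWER>") for sent, ans in pairs}
    return templates.pop() if len(templates) == 1 else None

def extrapolate(template, corpus):
    """Instantiate the template as a regex and harvest new answers."""
    pre, post = template.split("<ANSWER>")
    regex = re.escape(pre) + r"([A-Z][A-Za-z .]+?)" + re.escape(post)
    return [m.group(1) for sent in corpus for m in re.finditer(regex, sent)]

# Two known answers occurring in a common context...
pairs = [("Foreman defeated Joe Frazier by knockout.", "Joe Frazier"),
         ("Foreman defeated Ken Norton by knockout.", "Ken Norton")]
template = shared_template(pairs)   # "Foreman defeated <ANSWER> by knockout."

# ...let a new answer be harvested from unseen text.
corpus = ["Foreman defeated Jose Roman by knockout.",
          "Ali defeated Foreman in Zaire."]
print(extrapolate(template, corpus))  # ['Jose Roman']
```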
Besides surface text patterns, other information can also be used as context. For
example, part-of-speech tags can be included in the text pattern. Alternatively,
sentence structure can be used as context. In this case, new answers are
Chapter 6. Conclusion 41
<ACTOR>-<MOVIE>
<BOXER>-<OPPONENT>
<MUSEUM>-<ARTWORK>
<PARENT>-<CHILD>
<GOLFER>-<OPPONENT>
<GOLFER>-<GOLF COURSE>
<BAND>-<ALBUM>
<SINGER>-<SONG>
<LOCATION>-<PERSON>
<ORGANIZATION>-<MEMBER>
<PROJECT>-<ORGANIZATION>
Table 6.1: Some relationships identified by LiQED
extracted from sentences that conform with commonly occurring branches of parse
trees of answers in the initial answer set.
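The part-of-speech suggestion can be sketched as follows: pattern tokens become (word, tag) pairs, and a wildcard word is constrained by its tag. The tags, tokens and matcher below are hypothetical, and tagged input is assumed to come from an external tagger.

```python
def match_pos_pattern(pattern, tagged_sentence):
    """Match a pattern of (word, tag) pairs against a POS-tagged
    sentence; '*' in the word position matches any word carrying
    the required tag. Returns the first matching window, or None."""
    n = len(pattern)
    for i in range(len(tagged_sentence) - n + 1):
        window = tagged_sentence[i:i + n]
        if all(pw in ("*", ww) and pt == wt
               for (pw, pt), (ww, wt) in zip(pattern, window)):
            return window
    return None

# "<target> starred in <some proper noun>":
pattern = [("starred", "VBD"), ("in", "IN"), ("*", "NNP")]
sentence = [("Crosby", "NNP"), ("starred", "VBD"),
            ("in", "IN"), ("Holiday", "NNP"), ("Inn", "NNP")]
print(match_pos_pattern(pattern, sentence))
```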
Several secondary goals were also achieved. These include:
1. A novel, shallow approach that enables the generation of surface patterns able
to match sentences containing long-distance dependencies. This is achieved
through the use of word alignment, via the Smith-Waterman-Gotoh algorithm,
instead of word matching for pattern generation. One caveat is that this
technique does not ensure coordination between the two constituents in a sentence.
2. An observation on the importance of prepositions, determiners and punctuation
symbols, especially conjunction symbols, in identifying answers to list
questions. Typically these are treated as stop words and ignored. However, they
have proven to be useful as they tend to prefix answers.
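The alignment in point 1 can be sketched at the word level as follows. This is plain Smith-Waterman with a linear gap penalty; the affine-gap refinement contributed by Gotoh, which the actual system uses, is omitted for brevity, and the sentences are invented.

```python
def word_sw_align(a, b, match=2, mismatch=-1, gap=-1):
    """Word-level Smith-Waterman local alignment of two token lists.
    Returns the highest-scoring pair of locally aligned spans."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_ij = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    ai, bj = best_ij
    i, j = ai, bj
    # Trace back from the best-scoring cell to recover the spans.
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return a[i:ai], b[j:bj]

s1 = "Crosby starred with Bob Hope in the Road to Morocco".split()
s2 = "Crosby appeared with Bob Hope in Road to Utopia".split()
print(word_sw_align(s1, s2))  # aligns despite the extra word "the"
```

Aligning words rather than characters is what lets a single pattern cover sentences whose shared context is interrupted by extra material, which exact word matching cannot do.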
In conclusion, the basic LiQED system has proven that the hypothesis is sound
and can be applied to identify new answers. The remainder of this chapter will cover
possible areas of enhancements to this basic system.
6.2 Future Enhancements
While the basic concept has been shown to be feasible, there is considerable room
for improvement. The following are areas where further work would improve LiQED's
performance.
6.2.1 Alternative Question Targets
One of the main issues in LiQED is its over-reliance on the question target. If
LiQED is unable to find sentences containing the question target, it is unable to
generate patterns and thus unable to extract new answers. The introduction of unnamed
and periodic events in TREC 2005 further exacerbates this issue. One possible method
to alleviate this problem is to identify alternative question targets that are
synonymous with the original question target. For example, the 1998 Nagano Olympic
Games is often referred to as the Nagano Olympics or the 1998 Olympics. Such
alternative question targets would enable LiQED to better find matching sentences.
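A first cut at such alternative targets can be generated heuristically from the target string itself. The rules below are illustrative only; a fuller solution would draw on a gazetteer or a synonym resource.

```python
def target_variants(target):
    """Generate simple surface variants of an event-style question
    target: drop a leading year, or keep the year with only the
    head word. Heuristic and deliberately minimal."""
    words = target.split()
    variants = {target}
    if len(words) > 1 and words[0].isdigit() and len(words[0]) == 4:
        variants.add(" ".join(words[1:]))         # drop the year
        variants.add(words[0] + " " + words[-1])  # year + head word
    return variants

print(sorted(target_variants("1998 Nagano Olympic Games")))
```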
6.2.2 Fine-Grained Named Entity Recognition
As detailed in Chapters 3 and 4, one of the issues with the LiQED system is that it
is only able to identify a sentence fragment that contains an answer, and it requires
an external information source to confirm the extracted answers. A fine-grained
named entity tagger, able to identify more than just persons, locations and
organizations, would help the system identify answers to more question types.
6.2.3 Anaphora Resolution
Currently, LiQED only constructs text patterns that search for sentences containing
words from both the question target and a candidate answer. However, not all
sentences that contain an answer will also contain words from the question target.
Very likely, the question target is mentioned in one sentence and answers are found
in succeeding sentences. In such cases, it may be possible to apply anaphora
resolution to extend pattern matching beyond a single sentence by matching the
question target in the first sentence with anaphoric references in subsequent
sentences. This would especially help with questions on unnamed events.
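A crude stand-in for full anaphora resolution already illustrates the idea: when a sentence mentions the question target, rewrite a leading pronoun in the following sentence as the target, so that the ordinary single-sentence patterns apply. The pronoun list and the sentences below are illustrative.

```python
PRONOUNS = {"He", "She", "It", "They"}

def expand_with_anaphora(sentences, target):
    """Substitute the question target for a leading pronoun in any
    sentence that follows a mention of the target."""
    expanded = list(sentences)
    for i, sent in enumerate(sentences[:-1]):
        nxt = sentences[i + 1].split()
        if target in sent and nxt and nxt[0] in PRONOUNS:
            expanded[i + 1] = " ".join([target] + nxt[1:])
    return expanded

doc = ["Bing Crosby had a string of hits.",
       "He starred in Holiday Inn."]
print(expand_with_anaphora(doc, "Bing Crosby")[1])
# Bing Crosby starred in Holiday Inn.
```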
6.2.4 Answer Verification
LiQED uses purely shallow methods to identify potential answers. Due to the use of
context, the answers generated by LiQED tend to be answers belonging to the same
class rather than correct answers. This is evident from the two questions that ask
for boxers that George Foreman defeated and boxers that defeated George Foreman.
LiQED gave the exact same answers to both questions: it could distinguish boxers
from non-boxers but was unable to determine who the winner was. In other words,
LiQED itself is unable to determine whether a potential answer is actually a correct
answer. Ideally, some form of post-hoc inference module should be used to verify
that an answer produced by LiQED is indeed correct. Such a module would also address
the needs of answering inference-based questions.
If implemented, the improvements discussed in this section should reliably improve
both the question coverage and the recall of LiQED.
Appendix A
TREC 2004 List Questions
1. Crips
1.3 Which cities have Crip gangs?
2. Fred Durst
2.3 What are titles of the group’s releases?
3. Hale Bopp comet
3.3 In what countries was the comet visible on its last return?
4. James Dean
4.4 What movies did he appear in?
5. AARP
5.5 What companies has AARP endorsed?
6. Rhodes scholars
6.3 Name famous people who have been Rhodes scholars.
6.4 What countries have Rhodes scholars come from?
7. agouti
7.3 In what countries are they found?
8. Black Panthers
8.4 Who have been members of the organization?
Appendix A. TREC 2004 List Questions 45
9. Insane Clown Posse
9.1 Who are the members of this group?
9.2 What albums have they made?
10. prions
10.3 What diseases are prions associated with?
10.4 What researchers have worked with prions?
11. the band Nirvana
11.2 Who are the band members?
11.5 What are their albums?
15. Rat Pack
15.1 Who are the members of the Rat Pack?
16. cataract
16.3 Who are doctors that have performed cataract surgery?
18. boxer Floyd Patterson
18.6 List the names of boxers he fought.
20. Concorde
20.2 What airlines have Concordes in their fleets?
21. Club Med
21.2 List the spots in the United States.
22. Franz Kafka
22.4 What books did he author?
24. architect Frank Gehry
24.4 What prizes or awards has he won?
24.5 What buildings has he designed?
25. Harlem Globe Trotters
25.4 What countries have they played in?
26. Ice-T
26.5 What are names of his albums?
30. minstrel Al Jolson
30.5 What songs did he sing?
31. Jean Harlow
31.7 What movies did she appear in?
31.8 What leading men did she star opposite of?
32. Wicca
32.4 What festivals does it have?
34. Amtrak
34.5 Name cities that have an Amtrak terminal.
36. Khmer Rouge
36.4 Who were leaders of the Khmer Rouge?
37. Wiggles
37.2 Who are the members’ names?
37.4 List the Wiggles’ songs.
38. quarks
38.4 What are the different types of quarks?
39. The Clash
39.3 Name their songs.
41. Teapot Dome scandal
41.4 Who were the major players involved in the scandal?
43. Nobel prize
43.2 What are the different categories of Nobel prizes?
45. International Finance Corporation (IFC)
45.3 What countries has the IFC financed projects in?
47. Bashar Assad
47.5 What schools did he attend?
48. Abu Nidal
48.4 In what countries has he operated from?
50. Cassini space probe
50.4 What planets will it pass?
51. Kurds
51.3 What other countries do Kurds live in?
52. Burger King
52.5 What countries is Burger King located in?
53. Conde Nast
53.4 What magazines does Conde Nast publish?
54. Eileen Marie Collins
54.6 What schools did she attend?
55. Walter Mosley
55.4 What books has he written?
56. Good Friday Agreement
56.3 What groups are affected by it?
56.4 Who were the key players in negotiating the agreement?
58. philanthropist Alberto Vilar
58.2 What organizations has he donated money to?
58.4 What companies has he invested in?
61. Muslim Brotherhood
61.4 What countries does it operate in?
61.5 Name members of the group.
62. Berkman Center for Internet and Society
62.4 Name members of the center.
63. boll weevil
63.3 What states have had problems with boll weevils?
64. Johnny Appleseed
64.5 In what states did he plant trees?
65. space shuttles
65.1 What are the names of the space shuttles?
Appendix B
TREC 2005 List Questions
66. Russian submarine Kursk sinks
66.5 Which countries expressed regret about the loss?
66.7 Which U.S. submarines were reportedly in the area?
67. Miss Universe 2000 crowned
67.6 Name other contestants.
68. Port Arthur Massacre
68.6 What were the names of the victims?
68.7 What were the nationalities of the victims?
69. France wins World Cup in soccer
69.7 Name players on the French team.
70. Plane clips cable wires in Italian resort
70.7 Who were on-ground witnesses to the accident?
71. F16
71.6 What countries besides U.S. fly F16s?
72. Bollywood
72.6 Who are some of the Bollywood stars?
Appendix B. TREC 2005 List Questions 50
73. Viagra
73.6 In what countries could Viagra be obtained on the black market?
74. DePauw University
74.6 Name graduates of the university.
75. Merck & Co.
75.5 Name companies that are business competitors.
75.7 Name products manufactured by Merck.
76. Bing Crosby
76.7 What movies was he in?
77. George Foreman
77.6 Name opponents who Foreman defeated.
77.7 Name opponents who defeated Foreman.
78. Akira Kurosawa
78.7 What were some of his Japanese film titles?
79. Kip Kinkel school shooting
79.3 List students who were shot by Kip Kinkel.
80. Crash of EgyptAir Flight 990
80.6 Identify the nationalities of passengers on Flight 990.
81. Preakness 1998
81.2 List other horses who won the Kentucky Derby and Preakness but not the Belmont.
82. Howdy Doody Show
82.3 Name the various puppets used in the ”Howdy Doody Show”.
82.4 Name the characters in the show.
83. Louvre Museum
83.4 Name the works of art that have been stolen from the Louvre.
84. meteorites
84.7 Provide a list of names or identifications given to meteorites.
85. Norwegian Cruise Lines (NCL)
85.1 Name the ships of the NCL.
85.6 Name so-called theme cruises promoted by NCL.
86. Sani Abacha
86.5 Name the children of Sani Abacha.
87. Enrico Fermi
87.4 List things named in honor of Enrico Fermi.
88. United Parcel Service (UPS)
88.4 In what foreign countries does the UPS operate?
89. Little League Baseball
89.3 What Little League teams have won the World Series?
90. Virginia wine
90.1 What grape varieties are Virginia wines made from?
90.5 Name the Virginia wine festivals.
91. Cliffs Notes
91.3 Give the titles of Cliffs Notes Condensed Classics.
92. Arnold Palmer
92.3 What players has Arnold competed against in the Skins Games?
92.4 Which golf courses were designed by Arnold?
93. first 2000 Bush-Gore presidential debate
93.7 Who helped the candidates prepare?
94. 1998 indictment and trial of Susan McDougal
94.4 Who testified for Mrs. McDougal’s defense?
95. return of Hong Kong to Chinese sovereignty
95.5 What other countries formally congratulated China on the return?
96. 1998 Nagano Olympic Games
96.3 Who won gold medals in Nagano?
97. Counting Crows
97.5 List the Crows’ record titles.
97.6 List the Crows’ band members.
98. American Legion
98.5 List Legionnaires.
99. Woody Guthrie
99.1 List Woody Guthrie’s songs.
100. Sammy Sosa
100.7 Name the pitchers off of which Sosa homered.
101. Michael Weiss
101.7 List Michael Weiss’s competitors.
102. Boston Big Dig
102.6 List individuals associated with the Big Dig.
103. Super Bowl XXXIV
103.6 List players who scored touchdowns in the game.
104. 1999 North American International Auto Show
104.4 List auto manufacturers in the show.
105. 1980 Mount St. Helens eruption
105.6 List names of eyewitnesses of the eruption.
106. 1998 Baseball World Series
106.6 Name the players in the series.
107. Chunnel
107.6 List dates of Chunnel closures.
108. Sony Pictures Entertainment (SPE)
108.4 Name movies released by SPE.
108.5 Name TV shows by the SPE.
109. Telefonica of Spain
109.5 Name companies involved in mergers with Telefonica of Spain.
110. Lions Club International
110.5 Name officials of the club.
110.6 Name programs sponsored by the Lions Club.
111. AMWAY
111.4 Name the officials of the company.
112. McDonald’s Corporation
112.5 Name the corporation’s top officials.
112.6 Name the non-hamburger restaurant holdings of the corporation.
113. Paul Newman
113.4 Name the camps started under his Hole in the Wall Foundation.
113.5 Name some of his movies.
114. Jesse Ventura
114.3 List his various occupations.
114.4 Name movies/TV shows he appeared in.
115. Longwood Gardens
115.7 List personnel of the gardens.
116. Camp David
116.6 Who are some world leaders that have met there?
117. kudzu
117.4 What are other names it is known by?
118. U.S. Medal of Honor
118.4 What Medal of Honor recipients are in Congress?
119. Harley-Davidson
119.4 What other products do they produce?
120. Rose Crumb
120.5 What awards has she received?
121. Rachel Carson
121.3 What books did she write?
122. Paul Revere
122.7 What were some of his occupations?
123. Vicente Fox
123.5 What countries did Vicente Fox visit after election?
124. Rocky Marciano
124.6 Who were some of his opponents?
125. Enrico Caruso
125.1 What operas has Caruso sung?
126. Pope Pius XII
126.3 What official positions did he hold prior to becoming Pius XII?
127. U.S. Naval Academy
127.6 List people who have attended the Academy.
128. OPEC
128.3 What countries constitute the OPEC committee?
128.5 List OPEC countries.
129. NATO
129.4 Which countries were the original signers?
130. tsunami
130.5 What countries has it struck?
131. Hindenburg disaster
131.7 Name individuals who witnessed the disaster.
132. Kim Jong Il
132.4 What posts has Kim Jong Il held in the government of this country?
133. Hurricane Mitch
133.3 As of the time of Hurricane Mitch, what previous hurricanes had higher death totals?
133.4 What countries offered aid for this hurricane?
134. genome
134.2 List species whose genomes have been sequenced.
134.3 List the organizations that sequenced the Human genome.
135. Food-for-Oil Agreement
135.5 What countries participated in this agreement by providing food or medicine?
136. Shiite
136.7 What Shiite leaders were killed in Pakistan?
137. Kinmen Island
137.3 What other island groups are controlled by this government?
138. International Bureau of Universal Postal Union (UPU)
138.3 Where were UPU congresses held?
139. Organization of Islamic Conference (OIC)
139.2 Which countries are members of the OIC?
139.3 Who has served as Secretary General of the OIC?
140. PBGC
140.4 Employees of what companies are receiving benefits from this organization?
Appendix C
Sample LiQED Pattern File
The following file, liqed.list.pat, was generated by the Pattern Generation phase
for question 76.7, What movies was he in? (He, in this case, is Bing Crosby.) The
text patterns use a series of symbolic representations and markers. <QUESTION> refers
to the question target, Bing Crosby, and <ANSWER> refers to the location of a
possible answer. An asterisk, *, is a wildcard that can match any single word. The
square brackets, [ ], signify a contiguous sentence fragment or chunk.
% Q76.7: What movies was he in ?
%
% Question topic: Bing Crosby (233 sentences)
% Potential answer: White Christmas (190 sentences)
% Potential answer: Mr. Tambourine Man (23 sentences)
% Potential answer: Looking Forward (10092 sentences)
% Potential answer: High Society (129 sentences)
% Potential answer: Road (77370 sentences)
% Potential answer: Southern Man (32 sentences)
%
% "Bing Crosby" and "White Christmas" (21 sentences)
% "Bing Crosby" and "Mr. Tambourine Man" (0 sentences)
% "Bing Crosby" and "Looking Forward" (0 sentences)
% "Bing Crosby" and "High Society" (2 sentences)
% "Bing Crosby" and "Road" (12 sentences)
% "Bing Crosby" and "Southern Man" (0 sentences)
Appendix C. Sample LiQED Pattern File 57
%
% Most Relevant Terms :
% "to", "the", "and", "’s", "’’", "with",
% ".", "bob", "hope", "in", "‘‘", ","
%
% Patterns:
, ‘‘ <ANSWER> * * * ’’ with <QUESTION>
, ‘‘ <ANSWER> * ’’ * <QUESTION> *
, <QUESTION> ‘‘ <ANSWER> * ’’
<ANSWER> ’’ * <QUESTION>
<ANSWER> ’’ * <QUESTION> .
<ANSWER> ’’ with <QUESTION>
<ANSWER> * ’’ * * <QUESTION>
<ANSWER> * ’’ * <QUESTION>
<ANSWER> * ’’ with <QUESTION> .
<QUESTION> ’s ‘‘ <ANSWER>
<QUESTION> ’s ‘‘ <ANSWER> ’’
<QUESTION> ’s ‘‘ <ANSWER> ’’ and
<QUESTION> ’s ‘‘ <ANSWER> * ’’
<QUESTION> ’s ‘‘ <ANSWER> . ’’
<QUESTION> * ‘‘ * <ANSWER>
<QUESTION> * ‘‘ * <ANSWER> * to
<QUESTION> * ‘‘ <ANSWER>
<QUESTION> * ‘‘ <ANSWER> ’’
<QUESTION> * ‘‘ <ANSWER> ’’ * the
<QUESTION> * ‘‘ <ANSWER> * ’’
<QUESTION> * ‘‘ <ANSWER> * to
<QUESTION> * ‘‘ <ANSWER> . ’’
<QUESTION> ‘‘ <ANSWER>
<QUESTION> ‘‘ <ANSWER> * ’’
<QUESTION> ‘‘ <ANSWER> * ,
<QUESTION> ‘‘ <ANSWER> . ’’
[ * ‘‘ * <ANSWER> to ] [ bob hope and <QUESTION> ]
[ * ‘‘ <ANSWER> to ] [ bob hope and <QUESTION> , and ]
[ , * ‘‘ * <ANSWER> to ] [ bob hope and <QUESTION> ]
[ <ANSWER> to ] [ * * * * * bob hope and <QUESTION> * ]
[ ‘‘ <ANSWER> to ] [ bob hope and <QUESTION> ]
[ ‘‘ <ANSWER> to ] [ bob hope and <QUESTION> * and ]
‘‘ <ANSWER> ’’ * <QUESTION>
‘‘ <ANSWER> ’’ <QUESTION>
‘‘ <ANSWER> * ’’ * * <QUESTION>
‘‘ <ANSWER> * ’’ * <QUESTION>
‘‘ <ANSWER> * ’’ * <QUESTION> .
‘‘ <ANSWER> * ’’ <QUESTION>
‘‘ <ANSWER> * ’’ with <QUESTION>
‘‘ <ANSWER> , ’’ * * <QUESTION>
‘‘ <ANSWER> , ’’ * * <QUESTION> ,
‘‘ <ANSWER> , ’’ * <QUESTION> * ]
‘‘ <ANSWER> , ’’ <QUESTION> ,
bob hope and <QUESTION> * * <ANSWER>
bob hope and <QUESTION> * * <ANSWER> to
bob hope and <QUESTION> * <ANSWER>
bob hope and <QUESTION> * ‘‘ * <ANSWER>
bob hope and <QUESTION> * ‘‘ <ANSWER>
bob hope and <QUESTION> in * * <ANSWER> to
hope and <QUESTION> * * * ‘‘ * <ANSWER> to
hope and <QUESTION> * * * ‘‘ the <ANSWER> to
the * bob hope and <QUESTION> * ‘‘ * <ANSWER> to
the ‘‘ <ANSWER> * ’’ <QUESTION>
to <QUESTION> ’s ‘‘ <ANSWER> * ’’
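One possible reading of this pattern syntax as an executable matcher is sketched below. This is a hypothetical interpretation for illustration, not LiQED's own matching code; it assumes ASCII quote tokens and treats the chunk brackets as plain token separators, discarding their chunk-boundary meaning.

```python
import re

def pattern_to_regex(pattern, target):
    """Compile one liqed.list.pat pattern into a regex: <QUESTION>
    becomes the literal question target, <ANSWER> a capture group,
    and * any single word."""
    parts = []
    for tok in pattern.split():
        if tok == "<QUESTION>":
            parts.append(re.escape(target))
        elif tok == "<ANSWER>":
            parts.append(r"(\S+(?: \S+)*?)")  # one or more words, lazily
        elif tok == "*":
            parts.append(r"\S+")              # wildcard: any single word
        elif tok in ("[", "]"):
            continue                          # chunk brackets ignored here
        else:
            parts.append(re.escape(tok))
    return re.compile(" ".join(parts))

rx = pattern_to_regex("<QUESTION> 's `` <ANSWER> ''", "bing crosby")
m = rx.search("fans loved bing crosby 's `` white christmas '' back then")
print(m.group(1))  # white christmas
```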
Bibliography
Berners-Lee, T., Hendler, J., and Lassila, O. (2001). The semantic web. Scientific
American.
Boyer, R. S. and Moore, J. S. (1977). A fast string searching algorithm. Communica-
tions of the Association for Computing Machinery, 20(10):762–772.
Brin, S. (1999). Extracting patterns and relations from the world wide web. In WebDB
'98: Selected Papers from the International Workshop on the World Wide Web and
Databases, pages 172–183, London, UK. Springer-Verlag.
Gotoh, O. (1982). An improved algorithm for matching biological sequences. Journal
of Molecular Biology, 162:705–708.
Green, B., Wolf, A., Chomsky, C., and Laughery, K. (1961). Baseball: an automatic
question answerer. In Proceedings of the Western Joint Computer Conference, pages
219–224.
Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences: Computer Science
and Computational Biology. Cambridge University Press.
Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In
Proceedings of the Fourteenth International Conference on Computational Linguis-
tics, Nantes, France.
Hovy, E., Hermjakob, U., and Ravichandran, D. (2002). A question/answer typol-
ogy with surface text patterns. In Proceedings of the Human Language Technology
Conference, San Diego, CA.
Katz, B. (1997). From sentence processing to information access on the world wide
web. In Proceedings of the AAAI Spring Symposium on Natural Language
Processing for the World Wide Web.
Knuth, D. E., Morris, J. H., and Pratt, V. R. (1977). Fast pattern matching in strings.
SIAM Journal on Computing, 6(2):323–350.
Leidner, J., Bos, J., Dalmas, T., Curran, J. R., Clark, S., Bannard, C. J., Webber, B., and
Steedman, M. (2003). QED: The Edinburgh TREC-2003 question answering system.
In Text REtrieval Conference (TREC).
Meir, R. and Rätsch, G. (2003). An introduction to boosting and leveraging. In
Mendelson, S. and Smola, A., editors, Advanced Lectures on Machine Learning,
LNCS, pages 119–184. Springer.
Nelson, M. (1996). Fast string searching with suffix trees. Dr. Dobb's Journal.
Niu, Y. and Hirst, G. (2004). Analysis of semantic classes in medical text for
question answering. In Aliod, D. M. and Vicedo, J. L., editors, ACL 2004: Question
Answering in Restricted Domains, pages 54–61, Barcelona, Spain. Association for
Computational Linguistics.
Ravichandran, D. and Hovy, E. H. (2002). Learning surface text patterns for a question
answering system. In ACL, pages 41–47.
Rinaldi, F., Dowdall, J., Kaljurand, K., Hess, M., and Mollá, D. (2003). Exploiting para-
phrases in a question answering system. In Proceedings of the Second International
Workshop on Paraphrasing, pages 25–32.
Smith, T. F. and Waterman, M. S. (1981). Identification of common molecular subse-
quences. Journal of Molecular Biology, 147:195–197.
Voorhees, E. M. (1999). The TREC-8 question answering track report. In Text REtrieval
Conference (TREC).
Voorhees, E. M. (2004). Overview of the TREC 2004 question answering track. In Text
REtrieval Conference (TREC).
Woods, W. (1973). Progress in natural language understanding: An application to
lunar geology. In AFIPS Conference Proceedings, volume 42, pages 441–450.
Yang, H., Cui, H., Kan, M.-Y., Maslennikov, M., Qiu, L., and Chua, T.-S. (2003).
QUALIFIER in TREC-12 QA main task. In Text REtrieval Conference (TREC), page 480.
Yi, J. and Sundaresan, N. (1999). Mining the web for acronyms using the duality of
patterns and relations. In WIDM '99: Proceedings of the 2nd International Workshop
on Web Information and Data Management, pages 48–52, New York, NY, USA.
ACM Press.