Day 14 Information Retrieval Question Answering 1

Upload: gertrude-turner

Post on 02-Jan-2016

Day 14

Information Retrieval: Question Answering

TREC

• TREC: Text REtrieval Conference
• Administered by the National Institute of Standards and Technology (NIST)
• Competition held annually since 1992
• First conference included leading text retrieval groups at UMass, City University London, Cornell, and a smattering of industry groups

The Aquaint Corpus

• The corpus used by TREC from which answers are drawn.
• LDC2002T31 (on patas)
  – Newswire from three sources:
    • Xinhua News Service (People's Republic of China)
    • New York Times News Service
    • Associated Press Worldstream News Service
  – Not current: years 1996-2000 for Xinhua, 1998-2000 for NYT and AP
  – For TREC competition: assumed current

TREC QA track

• Three types of questions in TREC QA track:
  – Factoid
  – List
  – Other
• All clustered into topics

TREC Question File

Question Answering (QA)

• Uses IR and IE techniques (and more…)
• Questions posed in Natural Language
  – Who was Genghis Khan?
  – What songs did Barry Manilow compose?
  – What countries fly the F-16?
  – When was James Dean born?
  – What does Park Jae-sang sing?
• Answers retrieved from a collection of documents (or a database, or the Web)

Park Jae-sang

Designing a QA System

• Start with a question:
  – Who won the Nobel Peace Prize in 1991?
• Assume you have a search engine API at your disposal
• Need to return the answer:
  – Aung San Suu Kyi
  – Aung San Suu Kyi won the Nobel Peace Prize in 1991
• What do you do?

Designing a QA System

• Assume:
  – The search engine returns
    • Snippets, and
    • Documents
  – Documents
    • Are in English
    • Contain passages of interest
    • Not all documents will have the answer

Designing a QA System

But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991.

The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989.

The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

Who won the Nobel Peace Prize in 1991?

Designing a QA System

• Assume:
  – The sky’s the limit with respect to tools, resources, time, etc.

Question Answering (QA)

• For a QA system to work, we need to
  – Find documents that may contain the answer
    • Form search engine query from original question
  – Find passages within the documents that may contain the answer
    • What is the “type” of answer?
    • Determine what kind of answer is expected (query classification)
  – Extract the answer from the relevant passage(s)
    • Repeated occurrences may reinforce a candidate answer
  – Return the answer

A Generic QA Framework

[Figure: block diagram with the components Questions, Search Engine, Document Collection, Top n documents, Document Processing, Answers]

• Passage extractor needed too

The UWCLMAQA System

Steps for the UWCLMAQA System

• Query Analysis
• Query Processing (some additional steps)
• Document Selection
• Passage Extraction & Ranking
• Answer Extraction
• “Unit” evaluation done at each step

Query Analysis

• Grouped questions into types
• Purpose: Determine what the answer will look like
• Categorized by enhanced UIUC scheme:
  – Abbreviation
  – Description
  – Entity
  – Human
  – Location: Country, State, City
  – Numeric: Date, Measure

UIUC: http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/definition.html
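As a rough illustration of what "determine what the answer will look like" can mean in code, here is a minimal sketch that maps a question to one of the coarse types listed above. It assumes hand-written wh-word patterns; the slides do not describe the classifier actually used, and the `RULES` table and the name `classify_question` are hypothetical.

```python
import re

# Coarse labels follow the UIUC scheme named on the slide. The patterns are
# illustrative assumptions, not the rules of the original system.
RULES = [
    (r"^(who|whom|whose)\b", "Human"),
    (r"^where\b", "Location"),
    (r"^(what|which) (country|countries|state|states|city|cities)\b", "Location"),
    (r"^(when|how (many|much|long|far|old))\b", "Numeric"),
    (r"\b(stand for|abbreviation)\b", "Abbreviation"),
    (r"^(why|how)\b", "Description"),
]

def classify_question(question):
    """Return the first coarse answer type whose pattern matches."""
    q = question.strip().lower()
    for pattern, label in RULES:
        if re.search(pattern, q):
            return label
    return "Entity"  # fallback when no rule fires

if __name__ == "__main__":
    for q in ["Who won the Nobel Peace Prize in 1991?",
              "When was James Dean born?",
              "What countries fly the F-16?",
              "What songs did Barry Manilow compose?"]:
        print(q, "->", classify_question(q))
```

Rules are tried in order, so the more specific "how many" pattern is listed before the generic "how" pattern.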

Alternative Strategy: Query Analysis and Rewrite

• Intuition: The user’s question is often syntactically quite close to sentences that contain the answer
  – Where is the Louvre Museum located?
    • The Louvre Museum is located in Paris.
  – Who created the character of Scrooge?
    • Charles Dickens created the character of Scrooge.

Alternative Strategy: Query Analysis and Rewrite

• Hand-craft category-specific transformation rules, e.g.:
  “Where is the Louvre Museum located?” →
    “is the Louvre Museum located”
    “the is Louvre Museum located”
    “the Louvre is Museum located”
    “the Louvre Museum is located”
    “the Louvre Museum located is”
• Search for all permutations
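The Louvre rewrites above can be generated mechanically. This is a minimal sketch of one such rule, assuming the question has the shape "wh-word verb rest?"; the function name `rewrite_question` is hypothetical, and a real system would need one rule per question category.

```python
def rewrite_question(question):
    """Drop the wh-word and move the following verb to every position in the rest."""
    tokens = question.rstrip("?").split()
    if len(tokens) < 3:
        return []  # too short to have a wh-word, a verb, and a remainder
    verb, rest = tokens[1], tokens[2:]
    rewrites = []
    for i in range(len(rest) + 1):
        # Insert the verb before position i of the remaining words.
        rewrites.append(" ".join(rest[:i] + [verb] + rest[i:]))
    return rewrites

if __name__ == "__main__":
    for r in rewrite_question("Where is the Louvre Museum located?"):
        print(r)
```

Most of the generated strings are ungrammatical, which is harmless: they simply match nothing when issued as phrase queries, while a string such as "the Louvre Museum is located" is likely to be followed directly by the answer.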

Query Processing

• Basic process:
  – Extracted question
  – Appended topic
  – “Web boosted” query
  – Threw against Lucene

Query Processing

• Web boosting strategy
  – Supplied question and topic to Google API
  – Results were
    • Stop-worded, query terms removed
    • Ranked by frequency
  – 5 most frequent terms added to Lucene query
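The term-selection part of web boosting can be sketched as follows. This is an illustration under assumptions: the snippets are passed in as plain strings (standing in for whatever the Google API returned), the stop-word list is a tiny hypothetical one, and `boost_terms` is not a name from the original system.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stop-word list for the example.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and", "to", "who", "by", "for", "on"}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def boost_terms(query, snippets, k=5):
    """Return the k most frequent snippet terms that are neither stop words nor query terms."""
    query_terms = set(tokenize(query))
    counts = Counter(
        tok
        for snippet in snippets
        for tok in tokenize(snippet)
        if tok not in STOPWORDS and tok not in query_terms
    )
    return [term for term, _ in counts.most_common(k)]

if __name__ == "__main__":
    snippets = [
        "Aung San Suu Kyi won the Nobel Peace Prize in 1991",
        "Nobel laureate Aung San Suu Kyi of Burma",
        "Suu Kyi, Burma opposition leader",
    ]
    print(boost_terms("Who won the Nobel Peace Prize in 1991?", snippets))
```

The returned terms would then be appended to the query before it is sent to the local index, so that documents mentioning likely answer terms rank higher.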

Document Selection

• Lucene returned top 1,000 documents
• Took top 3 for Factoid, Top 25 for List
• (Hook for reranking provided, but not implemented.)
• Our doc retrieval performance for 2005 Qs:
  – F-measure: .3517 (n=3), .3620 (n=1)
  – Mean 2005: .2958
  – Max (LCC) 2005: .7920

Passage Extraction & Ranking

• From top documents, extracted relevant paragraphs
• Paragraphs ranked by tf/idf:
  – tf = 1 + log(word frequency in paragraph)
  – idf = log(total doc count / # docs containing word)
  – total doc count = # docs by day by news source
  – tf/idf score normalized by paragraph length

Passage Ranking

• tf/idf multiplied by count of query terms in paragraph (giving them more weight)
• 10 paragraphs returned for factoids
• 45 paragraphs returned for lists

Answer Extraction

• Most factoids need NP answer (e.g., most are NEs, such as countries, cities, dates, people’s names, company names, …)
  – All NPs considered as possible answers
• For passages
  – Used Lingua::EN::Sentence to find sentences (sentence breaking)
  – POS tagged (Stanford POS Tagger)
  – Chunked using the fnTBL Chunker (ID NP-chunks)
• Prior query classification used to identify kind of NP answer expected
• Some other heuristics (e.g., most likely place NP would occur)
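To make the idea of "all NPs as candidate answers" concrete without a tagger or chunker, here is a deliberately crude stand-in: runs of capitalized words approximate proper-noun NPs, candidates that merely repeat the question are dropped, and repeated occurrences across passages act as votes. This is not the POS-tag and chunk pipeline described above; the regular expression, the `TITLES` set, and the function name are all assumptions for illustration.

```python
import re
from collections import Counter

# Honorifics stripped so "Ms Aung San Suu Kyi" and "Aung San Suu Kyi" pool their votes.
TITLES = {"Mr", "Mrs", "Ms", "Dr"}

def candidate_answers(passages, question):
    """Count capitalized-word runs per passage, skipping ones that echo the question."""
    question_words = set(re.findall(r"\w+", question.lower()))
    counts = Counter()
    for passage in passages:
        for match in re.finditer(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b", passage):
            words = [w for w in match.group().split() if w not in TITLES]
            if not words:
                continue
            # A candidate made only of question words cannot be the answer.
            if all(w.lower() in question_words for w in words):
                continue
            counts[" ".join(words)] += 1
    return counts.most_common()

if __name__ == "__main__":
    passages = [
        "the continued detention of Ms Aung San Suu Kyi, the opposition leader "
        "who won the Nobel Peace Prize in 1991.",
        "kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of "
        "the opposition party",
        "is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 "
        "Nobel Peace Prize.",
    ]
    print(candidate_answers(passages, "Who won the Nobel Peace Prize in 1991?"))
```

A capitalization heuristic also picks up sentence-initial words such as "It", which is one reason a real system relies on tagging, chunking, and the expected answer type to filter candidates.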

Answer Extraction

• For lists:
  – Question topic appeared the most important
  – Heavily weighted topic terms for Lucene
  – Similar process to Factoids (tagging, chunking) for finding answers
  – Cut-off determined by 2005 data

Answer Extraction

• For others:
  – Anything left over that might be answer bearing
  – Top 15 returned

How’d we do?

• Before answering the question:
  – Mean Reciprocal Rank (MRR)

Mean Reciprocal Rank (MRR)

• Assumes: test set of questions with human-labeled answers

• Assumes: system returns short ranked list of answers or passages with answers

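• Each question is scored with the reciprocal rank of its first correct answer (0 if none of the returned answers is correct); MRR is the mean of these scores over the N questions:

$$\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{rank}_i}$$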
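A minimal sketch of the MRR computation, assuming the standard definition: each question contributes the reciprocal of the rank of its first correct answer, or 0 if none of its returned answers is correct. The function name and the example data are illustrative only.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """ranked_answers: one ranked list per question; gold_answers: one set per question."""
    if not ranked_answers:
        return 0.0
    total = 0.0
    for answers, gold in zip(ranked_answers, gold_answers):
        for rank, answer in enumerate(answers, start=1):
            if answer in gold:
                total += 1.0 / rank  # only the first correct answer counts
                break
    return total / len(ranked_answers)

if __name__ == "__main__":
    ranked = [["Aung San Suu Kyi", "Burma"],  # correct at rank 1
              ["1930", "1931"],               # correct at rank 2
              ["answer A", "answer B"]]       # no correct answer
    gold = [{"Aung San Suu Kyi"}, {"1931"}, {"answer C"}]
    print(mean_reciprocal_rank(ranked, gold))
```

In the example the three questions contribute 1, 1/2, and 0, giving an MRR of 0.5.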

How’d we do?

• Factoid
  – UWCLMAQA: .112 and .109
  – Median: .186, Best: .578, Worst: .040
• List
  – UWCLMAQA: .051 and .046
  – Median: .087, Best: .433, Worst: .000
• Other
  – UWCLMAQA: .164 and .153
  – Median: .125, Best: .250, Worst: .000

Full List of Tools Used

• SGML::Parser::OpenSP: http://search.cpan.org/~bjoern/SGML-Parser-OpenSP-0.98/
• OpenSP: http://openjade.sourceforge.net/
• UIUC Question Classification: http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/
• Lucene: http://lucene.apache.org/java/docs/index.html
• SAX (Simple API for XML): http://www.saxproject.org/
• Maxent Toolkit: http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html
• PyGoogle: http://pygoogle.sourceforge.net/
• SOAPy: http://soapy.sourceforge.net/
• Google API: http://www.google.com/apis/
• Lingua::Stem: http://search.cpan.org/~snowhare/Lingua-Stem-0.82/lib/Lingua/Stem/En.pm
• Lingua::EN::Sentence: http://search.cpan.org/~shlomoy/Lingua-EN-Sentence-0.25/lib/Lingua/EN/Sentence.pm
• Stanford POS Tagger: http://nlp.stanford.edu/software/tagger.shtml
• fnTBL Chunker: http://nlp.cs.jhu.edu/~rflorian/fntbl/
• Lingpipe: http://www.alias-i.com/lingpipe/
• LevenshteinXS.pm: http://search.cpan.org/~jgoldberg/Text-LevenshteinXS-0.03/