SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon Presented by CHARANYA VENKATESH KUMAR


Post on 15-Dec-2015


SEARCHING QUESTION AND ANSWER ARCHIVES

Dr. Jiwoon Jeon

Presented by CHARANYA VENKATESH KUMAR


Discussion

Current Information Retrieval systems?


OVERVIEW

Introduction
Q&A Retrieval
Test Collections
Translation Based Q&A retrieval framework
Learning word-to-word translations


INTRODUCTION

Q&A Retrieval problem
Challenges
  Semantically similar questions
    Problem: Word mismatch problem
    Solution: Machine translation-based information retrieval model
  Quality of the Answers
    Problem: Many answers to a given question
    Solution: Answer Quality Prediction Technique


What is New?

New Type of Information System
New Translation-based Retrieval Model
New Document Quality Estimation Method
Integration of Advances in Multiple Research Areas
New Paraphrase Generation Method
Utilizing Web as a Resource for Retrieval


Q & A RETRIEVAL

Question & Answer Archives
  Websites with FAQ
  Community based question answering services
Task Definition


Q & A Retrieval (Contd..)

Advantages
  Handle natural language questions
  Return answers instead of relevant documents
Disadvantages
  Can answer only previously answered questions


Q & A RETRIEVAL SYSTEM ARCHITECTURE


CHALLENGES

Finding relevant Question & Answer Pairs
  Importance of question parts
  Word mismatch problem
Estimating Answer Quality
  Importance


TEST COLLECTIONS

Components:
  Set of documents
  Set of information needs (queries)
  Set of relevance judgments
Pooling Method
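The pooling method is usually described as judging only the union of the top-ranked results returned by several retrieval systems, rather than the whole collection. A minimal sketch of that idea follows; the function name `build_pool`, the pool depth, and the toy ranked lists are illustrative assumptions, not details taken from the slides.

```python
def build_pool(ranked_runs, depth):
    """Return the set of documents to be judged for one query.

    ranked_runs: list of ranked result lists, one per retrieval system.
    depth: how many top results to take from each system (assumed value).
    """
    pool = set()
    for run in ranked_runs:
        pool.update(run[:depth])  # only the top `depth` results are pooled
    return pool


# Hypothetical ranked lists from three systems for a single query.
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d3", "d5", "d1", "d6"],
    ["d7", "d2", "d8", "d9"],
]
pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Only the pooled items are shown to assessors; anything outside the pool is treated as not relevant.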


WONDIR COLLECTION

Earliest community based QA service in the US.

1 million question and answer pairs used from this service

Average question length = 27 words

Average answer length = 28 words


Examples


Queries
  Closed-class questions that ask for fact-based short answers.
  E.g.: Where is Charlotte located?
Relevance Judgment
  220 relevant Q&A pairs for 50 queries using the pooling method.
  Relevance Judgment Criteria


WebFAQ COLLECTION by Jijkoun and de Rijke

Collection of FAQs gathered using web crawlers and made public for research purposes.

Found web pages that contain the word “FAQ”.

Used heuristic methods to automatically extract question and answer pairs from the web pages.


NAVER COLLECTION

Leading portal site in South Korea
Community-based answering service
Collection A:
  Category information – To test category specific translations
Collection B:
  Non-Textual Information – To build answer quality prediction technique


Naver Collection (Contd..)

Question – Title & Body
Naver Test Collection A
Naver Test Collection B
Relevance:
  Question semantically related to query and
  Question contains all query terms
  Q&A pair was clicked multiple times for the query.


Comparison of test Collections


Translation Based Q&A Retrieval framework

Use of Machine Translation technique for information retrieval

Word mismatch problem
Translation based approach


IBM Statistical Machine translation Models

Do not require any linguistic knowledge of the source or target language.

Exploit only co-occurrence statistics of terms in training data.


IBM Models

Model 1
  Treats every possible word alignment equally
Model 2
  Assumes only positions of terms are related to the word alignment
Model 3
  The first term and the second term generated from the same term are independent


IBM Models (Contd..)

Model 4
  First order alignment model
  Every word is dependent only on the previous aligned word.
Model 5
  Reformulation of Model 4


Advantages of Model 1

Efficient implementation is possible using a form of query expansion.

Performance gain of using low level translation models is high.

Can be easily integrated into the query likelihood language model.


IBM Model 1 Equation

The probability that a query Q of length m is the translation of a document D (of length n) is given as


IBM Model 1 Equation
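The equation itself did not survive in this transcript. For orientation, the standard IBM Model 1 form from the statistical machine translation literature is sketched below, using the slide's Q, D, m and n. The symbols q_i (i-th query term), d_j (j-th document term, with d_0 standing for a null term), the constant ε and P_ml are assumptions introduced here. The slide's exact notation may differ.

```latex
% Standard IBM Model 1: every alignment of query terms to document terms is equally likely.
P(Q \mid D) = \frac{\epsilon}{(n+1)^{m}} \prod_{i=1}^{m} \sum_{j=0}^{n} P(q_i \mid d_j)

% Regrouping the inner sum by distinct document words w,
% with P_{ml}(w \mid D) = c(w; D) / n the maximum likelihood estimate:
\frac{1}{n+1} \sum_{j=0}^{n} P(q_i \mid d_j)
  = \frac{n}{n+1} \sum_{w \in D} P(q_i \mid w)\, P_{ml}(w \mid D)
  + \frac{1}{n+1}\, P(q_i \mid \text{null})

% Hence
P(Q \mid D) = \epsilon \prod_{i=1}^{m}
  \left[ \frac{n}{n+1} \sum_{w \in D} P(q_i \mid w)\, P_{ml}(w \mid D)
       + \frac{1}{n+1}\, P(q_i \mid \text{null}) \right]
```

The regrouped form has the same shape as a smoothed query likelihood score: a document-dependent part mixed with a document-independent part.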


Translation based Language Models

Language model is a mechanism for generating text.

Unigram language model
  Assumes each word is generated independently
  Concerns only probabilities of sampling a single word.
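As a small illustration of the unigram assumption, the sketch below estimates word probabilities from a toy document by maximum likelihood and multiplies them to score a word sequence. The function names and the example text are hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum likelihood unigram model: P(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def sequence_probability(model, tokens):
    """Probability of a sequence under the independence assumption."""
    prob = 1.0
    for w in tokens:
        prob *= model.get(w, 0.0)  # unseen words get probability zero
    return prob


doc = "the cat sat on the mat".split()
model = unigram_model(doc)
seen = sequence_probability(model, ["the", "cat"])
unseen = sequence_probability(model, ["the", "dog"])
print(seen, unseen)
```

A single unseen word ("dog") drives the whole product to zero under this estimate.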


Language modeling approach to IR

With a maximum likelihood estimator, words unseen in a document have zero probability.
Smoothing:
  Transfers some probability mass from the seen words to the unseen words.
  Dirichlet smoothing – good performance and cheap computational cost.


Language modeling approach to IR (Contd..)

The ranking function for the query likelihood language model with Dirichlet smoothing can be written as
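The slide's formula is not preserved here. The commonly used Dirichlet-smoothed estimate is P(w|D) = (c(w;D) + mu * P(w|C)) / (|D| + mu), and documents are ranked by the sum of log P(w|D) over query terms. The sketch below implements that textbook form. The parameter name `mu`, its value, and the toy collection are assumptions for illustration.

```python
import math
from collections import Counter


def dirichlet_query_likelihood(query, doc, coll_counts, coll_len, mu=2000.0):
    """Log query likelihood with Dirichlet smoothing.

    P(w|D) = (c(w; D) + mu * P(w|C)) / (|D| + mu)
    """
    doc_counts = Counter(doc)
    score = 0.0
    for w in query:
        p_coll = coll_counts.get(w, 0) / coll_len
        p = (doc_counts.get(w, 0) + mu * p_coll) / (len(doc) + mu)
        if p == 0.0:  # word absent from the whole collection
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection (hypothetical); mu is set small because the documents are tiny.
docs = {
    "d1": "where is charlotte located in north carolina".split(),
    "d2": "charlotte is a city".split(),
}
coll_counts = Counter(tok for tokens in docs.values() for tok in tokens)
coll_len = sum(coll_counts.values())
query = "charlotte located".split()
scores = {
    d: dirichlet_query_likelihood(query, toks, coll_counts, coll_len, mu=10.0)
    for d, toks in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because of smoothing, d2 still receives a finite score even though it does not contain "located".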


IBM Model 1 vs. Query Likelihood

Comparable components in the two models


Self Translation Model

Every word has some probability to translate to itself.
  Cannot be 1
  If too low, retrieval performance deteriorates


TransLM

Final ranking function looks like
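The final ranking function is not preserved in this transcript. One form found in the translation-based language model literature mixes the maximum likelihood estimate with a translation estimate and then applies Dirichlet smoothing. The sketch below follows that form and should be read as an assumption about the general shape, not as the slide's exact equation. The mixing weight `beta`, the smoothing parameter `mu`, and the toy translation table are all hypothetical.

```python
import math
from collections import Counter


def translm_score(query, doc, trans, coll_prob, beta=0.5, mu=10.0):
    """Log score of a query under a translation-based language model.

    trans[t][w] is P(w | t): probability that document word t translates
    to query word w (including self-translation P(t | t)).
    """
    counts = Counter(doc)
    n = len(doc)
    score = 0.0
    for w in query:
        p_ml = counts.get(w, 0) / n
        # Translation estimate: sum over document words t of P(w|t) * P_ml(t|D)
        p_tr = sum(trans.get(t, {}).get(w, 0.0) * c / n for t, c in counts.items())
        p_mix = (1.0 - beta) * p_ml + beta * p_tr
        p = (n * p_mix + mu * coll_prob.get(w, 0.0)) / (n + mu)
        if p <= 0.0:
            return float("-inf")
        score += math.log(p)
    return score


# Hypothetical translation table and collection probabilities.
trans = {
    "automobile": {"automobile": 0.4, "car": 0.6},
    "repair": {"repair": 0.8, "fix": 0.2},
    "shop": {"shop": 0.9, "store": 0.1},
}
coll_prob = {"car": 0.01, "repair": 0.01}
doc = ["automobile", "repair", "shop"]
query = ["car", "repair"]

with_translation = translm_score(query, doc, trans, coll_prob, beta=0.5)
without_translation = translm_score(query, doc, trans, coll_prob, beta=0.0)
print(with_translation, without_translation)
```

With `beta=0.0` the score reduces to the Dirichlet-smoothed query likelihood, so the word mismatch between "car" and "automobile" is only bridged when the translation component is switched on.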


Efficiency Issues and Implementation of TransLM

Flipped Translation Tables
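One plausible reading of a flipped translation table, assumed here since the slide's figure is not preserved: a table learned as source word to target words is re-indexed by target word, so that at query time each query term can directly look up the document terms that translate into it. The table contents and function name below are hypothetical.

```python
from collections import defaultdict


def flip_translation_table(trans):
    """Re-index trans[source][target] = P(target | source) by target word."""
    flipped = defaultdict(dict)
    for source, targets in trans.items():
        for target, prob in targets.items():
            flipped[target][source] = prob  # probability value is unchanged
    return dict(flipped)


# Hypothetical table indexed by source (document) word.
trans = {
    "automobile": {"automobile": 0.4, "car": 0.6},
    "vehicle": {"vehicle": 0.7, "car": 0.3},
}
flipped = flip_translation_table(trans)
print(flipped["car"])
```

Looking up `flipped["car"]` now yields every document word that can generate the query word "car", which acts like a form of query expansion.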


Term-at-a-time Algorithm
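The algorithm on this slide is not preserved. As a generic sketch of term-at-a-time evaluation, the code below processes one query term at a time, walks the posting lists of the document terms that translate into it, and updates per-document accumulators. The index layout, the flipped table, and all names are assumptions for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            index[tok][doc_id] = index[tok].get(doc_id, 0) + 1
    return index


def term_at_a_time(query, index, flipped, doc_len):
    """Accumulate sum_t P(w|t) * tf(t, D) / |D| for each query term w and document D."""
    acc = defaultdict(lambda: defaultdict(float))
    for w in query:  # one query term at a time
        for t, prob in flipped.get(w, {}).items():  # terms that translate to w
            for doc_id, tf in index.get(t, {}).items():  # postings of t
                acc[doc_id][w] += prob * tf / doc_len[doc_id]
    return {doc_id: dict(terms) for doc_id, terms in acc.items()}


# Hypothetical documents and flipped translation table (target -> source -> prob).
docs = {"d1": ["automobile", "repair"], "d2": ["bike", "repair"]}
flipped = {
    "car": {"automobile": 0.6, "car": 0.4},
    "repair": {"repair": 0.8},
}
index = build_index(docs)
doc_len = {doc_id: len(tokens) for doc_id, tokens in docs.items()}
result = term_at_a_time(["car", "repair"], index, flipped, doc_len)
print(result)
```

Only documents that appear in some relevant posting list are ever touched, which is the main efficiency benefit over scoring every document.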


Properties of Word Relationships

Not Symmetric
Not fixed
  Change depending on retrieval or translation tasks.
Must be given as probability values.


Training Sample Generation

Key Idea
  If two answers are very similar, then the corresponding questions are semantically similar.
Similarity Measures
  Cosine Similarity
  Query Likelihood scores between two answers (LM SCORE)
  LM-HRANK
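The key idea can be sketched with the simplest of the listed measures, cosine similarity over raw term counts: questions whose answers score above a threshold are paired as training samples. The threshold value, the function names, and the toy Q&A pairs are assumptions. The LM-based measures are not shown.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (raw term counts)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def similar_question_pairs(qa_pairs, threshold):
    """Pair up questions whose answers are at least `threshold` similar."""
    pairs = []
    for (q1, a1), (q2, a2) in combinations(qa_pairs, 2):
        if cosine(a1.split(), a2.split()) >= threshold:
            pairs.append((q1, q2))
    return pairs


# Hypothetical Q&A archive.
qa_pairs = [
    ("how do i fix a flat tire", "remove the wheel and patch the inner tube"),
    ("what to do about a punctured tyre", "patch the inner tube after you remove the wheel"),
    ("where is charlotte located", "charlotte is in north carolina"),
]
training_pairs = similar_question_pairs(qa_pairs, threshold=0.8)
print(training_pairs)
```

The two tire questions share almost no words, yet they are paired because their answers overlap heavily. Such pairs are the raw material for learning word-to-word translations.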


Word Relationship Types

P(Q|A) Source – Answer ; Target – Question

P(A|Q) Source – Question ; Target – Answer

P(Q|Q) P(Q<->Q)


EM Algorithm

Find word relationships that maximize the likelihood of sampling the target text from the source text in training samples.


EM Algorithm (Contd..)

The translation probability from a source word t to a target word w is given as
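The update equations from the slide are not preserved. The sketch below shows the standard EM procedure for IBM Model 1 (without a null word, for brevity), estimating P(w|t) for a source word t and target word w from source/target training pairs. The toy pairs, the iteration count, and the function name are assumptions.

```python
def train_model1(pairs, iterations=10):
    """Estimate trans[t][w] = P(w | t) with EM.

    pairs: list of (source_tokens, target_tokens) training samples.
    """
    # Initialise uniformly over the target words each source word co-occurs with.
    trans = {}
    for src, tgt in pairs:
        for t in src:
            for w in tgt:
                trans.setdefault(t, {})[w] = 1.0
    for t, ws in trans.items():
        for w in ws:
            ws[w] = 1.0 / len(ws)

    for _ in range(iterations):
        counts = {t: {w: 0.0 for w in ws} for t, ws in trans.items()}
        for src, tgt in pairs:
            for w in tgt:
                # E-step: share one count for w among the source words.
                norm = sum(trans[t][w] for t in src)
                for t in src:
                    counts[t][w] += trans[t][w] / norm
        # M-step: renormalise the expected counts per source word.
        for t, ws in counts.items():
            total = sum(ws.values())
            trans[t] = {w: c / total for w, c in ws.items()}
    return trans


# Hypothetical training samples (source question, target question).
pairs = [
    (["car", "repair"], ["automobile", "fix"]),
    (["car", "wash"], ["automobile", "clean"]),
    (["bike", "repair"], ["bicycle", "fix"]),
]
trans = train_model1(pairs, iterations=10)
best_for_car = max(trans["car"], key=trans["car"].get)
print(best_for_car, trans["car"])
```

Swapping source and target in `pairs` and retraining gives a different table, which is consistent with the observation that word relationships are not symmetric.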



Examples


Examples (Contd..)


SUMMARY

Introduction
Q&A Retrieval
Test Collections
Translation Based Q&A retrieval framework
Learning word-to-word translations


Coming Up Next…

Estimating Answer Quality
Experiments