Integrating Word Relationships into Language Models

Guihong Cao, Jian-Yun Nie, Jing Bai
Département d'Informatique et de Recherche Opérationnelle, Université de Montréal
Presenter: Chia-Hao Lee


Page 1: Integrating Word Relationships into Language Models

Integrating Word Relationships into Language Models

Guihong Cao, Jian-Yun Nie, Jing Bai
Département d'Informatique et de Recherche Opérationnelle, Université de Montréal

Presenter: Chia-Hao Lee

Page 2: Integrating Word Relationships into Language Models

Outline

• Introduction

• Previous Work

• A Dependency Model to Combine WordNet and Co-occurrence

• Parameter estimation
  – Estimating conditional probabilities
  – Estimating mixture weights

• Experiments

• Conclusion and future work

Page 3: Integrating Word Relationships into Language Models

Introduction

• In recent years, language models for information retrieval (IR) have increased in popularity.

• The basic idea is to compute the conditional probability P(Q|D).

• In most approaches, the computation is conceptually decomposed into two distinct steps:
  – (1) Estimating the document model
  – (2) Computing the query likelihood using the estimated document model

Page 4: Integrating Word Relationships into Language Models

• When estimating the document model, the words in the document are assumed to be independent with respect to one another, leading to the so called “bag-of-word” model.

• However, from our own knowledge of natural language, we know that the assumption of term independence is a matter of mathematical convenience rather than a reality.

• For example, the words “computer” and “program” are not independent. A query asking for “computer” might well be satisfied by a document about “program”.

Introduction (cont.)

Page 5: Integrating Word Relationships into Language Models

• Some studies have been carried out to relax the independence assumption.

• The first direction is data-driven: it tries to capture dependencies among terms using statistical information derived directly from the corpus.

• Another direction is to exploit hand-crafted thesauri, such as WordNet.

Introduction (cont.)

Page 6: Integrating Word Relationships into Language Models

Previous Work

• In the classical language modeling approach to IR, a multinomial model over terms, P(w|d), is estimated for each document in the collection to be indexed and searched.

• In most cases, each query term is assumed to be independent of the others, so the query likelihood is estimated by $P(q|d) = \prod_{i=1}^{n} P(q_i|d)$.

• After the specification of a document prior P(d), the posterior probability of a document is given by:

$$P(d|q) \propto P(q|d)\,P(d) \quad (1)$$
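To make the classical model concrete, here is a minimal Python sketch (not from the paper) that ranks documents by the smoothed unigram query likelihood with a uniform document prior. The function names, data structures and the Dirichlet smoothing parameter `mu` are illustrative assumptions; the paper's own smoothing for the unigram model (Equation 12 below) uses absolute discounting instead.

```python
import math
from collections import Counter

def unigram_query_likelihood(query_terms, doc_terms, collection_counts,
                             collection_size, mu=2000.0):
    """log P(q|d) under a unigram document model with Dirichlet smoothing."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = 0.0
    for q in query_terms:
        p_coll = collection_counts.get(q, 0) / collection_size   # P_MLE(q|C)
        p_q = (doc_counts.get(q, 0) + mu * p_coll) / (doc_len + mu)
        log_p += math.log(max(p_q, 1e-12))                       # guard against zero
    return log_p

def rank_documents(query_terms, docs, collection_counts, collection_size):
    """Score documents by log P(d|q) = log P(q|d) + log P(d), with a uniform prior P(d)."""
    log_prior = -math.log(len(docs))
    scores = {doc_id: log_prior + unigram_query_likelihood(
                  query_terms, terms, collection_counts, collection_size)
              for doc_id, terms in docs.items()}
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
```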

Page 7: Integrating Word Relationships into Language Models

• However, the classical language modeling approach to IR does not address the problem of dependence between words.

• The term “dependence” may mean two different things:
  – Dependence between words within a query or within a document
  – Dependence between query words and document words

• Under the first meaning, one may try to recognize the relationships between words within a sentence.

• Under the second meaning, dependence means any relationship that can be exploited during query evaluation.

Previous Work (cont.)

Page 8: Integrating Word Relationships into Language Models

• To incorporate term relationships into the document language model, a translation model t(q_i|w) is proposed.

• With the translation model, the document-to-query model becomes:

$$P(q|d) = \prod_{i=1}^{n} \sum_{w} t(q_i|w)\,P(w|d) \quad (2)$$

• Even though this model is more general than other language models, it is difficult to determine the translation probability t(q_i|w) in practice.

• To solve this problem, we generate an artificial collection of “synthetic” data for training by assuming that a sentence is parallel to the paragraph that contains the sentence.

Previous Work (cont.)
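The following short Python sketch illustrates the translation-model likelihood of Equation 2. It is not from the paper; the dictionary layouts (`translation_probs` keyed by `(q_i, w)` pairs, `p_w_given_d` keyed by document words) are illustrative assumptions.

```python
import math

def translation_query_likelihood(query_terms, translation_probs, p_w_given_d):
    """log P(q|d) under Equation 2: P(q|d) = prod_i sum_w t(q_i|w) P(w|d)."""
    log_p = 0.0
    for qi in query_terms:
        p_qi = sum(translation_probs.get((qi, w), 0.0) * p_w
                   for w, p_w in p_w_given_d.items())
        log_p += math.log(max(p_qi, 1e-12))   # guard against zero probability
    return log_p
```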

Page 9: Integrating Word Relationships into Language Models

A Dependency Model to Combine WordNet and Co-occurrence

• Given a query q and a document d, the query and the document can be related directly, or they can be related indirectly through some word relationships.

• An example of the first case is that the document and the query contain the same words.

• In the second case, a document can contain a different word that is synonymous with or related to the one in the query.

Page 10: Integrating Word Relationships into Language Models

• In order to take both cases into our modeling, we assume that there are two sources for generating a query term from a document: a dependency model and a non-dependency model.

θ_D : the parameter of the dependency model

θ_D̄ : the parameter of the non-dependency model

A Dependency Model to Combine WordNet and Co-occurrence (cont.)

$$P(q|d) = \prod_{i=1}^{n} P(q_i|d) = \prod_{i=1}^{n}\left[P(q_i,\theta_D|d) + P(q_i,\theta_{\bar D}|d)\right] = \prod_{i=1}^{n}\left[P(q_i|d,\theta_D)\,P(\theta_D|d) + P(q_i|d,\theta_{\bar D})\,P(\theta_{\bar D}|d)\right] \quad (3)$$

Page 11: Integrating Word Relationships into Language Models

• The non-dependency model tries to capture the direct generation of the query by the document; we can model it by the unigram document model:

$$P(q_i|d,\theta_{\bar D})\,P(\theta_{\bar D}|d) = P_U(q_i|d)\,P(U|d) \quad (4)$$

P_U(q_i|d) : the probability under the unigram model

• For the dependency model, we first select a term w in the document randomly.

• Second, a query term is generated based on the observed term. Therefore we have:

$$P(q_i|d,\theta_D) = \sum_{w\in d} P(q_i|w)\,P(w|d,\theta_D)$$

A Dependency Model to Combine WordNet and Co-occurrence (cont.)
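A minimal sketch of the per-term mixture of Equations 3 and 4, assuming dictionary inputs and a callable unigram model; the names `p_dep` (for P(θ_D|d)) and `p_qi_given_w` are illustrative.

```python
def term_score(qi, p_w_given_d, p_qi_given_w, p_unigram, p_dep):
    """P(q_i|d): mixture of a dependency model (query term generated via a
    document word) and a non-dependency (unigram) model."""
    # sum_w P(q_i|w) P(w|d, theta_D)
    dep_part = sum(p_qi_given_w.get((qi, w), 0.0) * p_w
                   for w, p_w in p_w_given_d.items())
    return p_dep * dep_part + (1.0 - p_dep) * p_unigram(qi)
```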

Page 12: Integrating Word Relationships into Language Models

• As for the translation model, we also have the problem of estimating the dependency between two terms, i.e. P(q_i|w).

• To address the problem, we assume that some word relationships have been manually identified and stored in a linguistic resource, and some other relationships have to be found automatically according to co-occurrences.

A Dependency Model to Combine WordNet and Co-occurrence (cont.)


Page 13: Integrating Word Relationships into Language Models

• So, this combination can be achieved by linear interpolation smoothing. Thus:

$$P(q_i|w) = \lambda\,P(q_i|w, L) + (1-\lambda)\,P(q_i|w, \bar L) \quad (5)$$

P(q_i|w, L) : the conditional probability of q_i given w according to WordNet
P(q_i|w, L̄) : the probability that the link between q_i and w is established by other means
λ : the interpolation factor; Equation 5 can thus be considered a two-component mixture model

• In our study, we only consider co-occurrence information besides WordNet.

• So, P(q_i|w, L̄) is just the co-occurrence model.

A Dependency Model to Combine WordNet and Co-occurrence (cont.)
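A one-line Python sketch of the interpolation in Equation 5; the dictionaries `p_link` and `p_cooc` (keyed by `(q_i, w)` pairs) and the default value of `lam` are illustrative assumptions.

```python
def p_qi_given_w(qi, w, p_link, p_cooc, lam=0.5):
    """Equation 5: P(q_i|w) = lambda * P(q_i|w, L) + (1 - lambda) * P(q_i|w, not L)."""
    return lam * p_link.get((qi, w), 0.0) + (1.0 - lam) * p_cooc.get((qi, w), 0.0)
```

In practice `lam` would be tuned or learned rather than fixed at 0.5.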

Page 14: Integrating Word Relationships into Language Models

• For simplicity of expression, we denote the link model as P_L(q_i|w) = P(q_i|w, L) and the co-occurrence model as P_CO(q_i|w) = P(q_i|w, L̄).

• Substituting Equations 4 and 5 into Equation 3, we obtain Equation 6:

$$
\begin{aligned}
P(q|d) &= \prod_{i=1}^{n}\left[P(\theta_D|d)\,P(q_i|d,\theta_D) + P(U|d)\,P_U(q_i|d)\right] \\
&= \prod_{i=1}^{n}\left[P(\theta_D|d)\sum_{w\in d} P(q_i|w)\,P(w|d,\theta_D) + P(U|d)\,P_U(q_i|d)\right] \\
&= \prod_{i=1}^{n}\Big[\lambda\,P(\theta_D|d)\sum_{w\in d} P_L(q_i|w)\,P(w|d,\theta_D) + (1-\lambda)\,P(\theta_D|d)\sum_{w\in d} P_{CO}(q_i|w)\,P(w|d,\theta_D) + P(U|d)\,P_U(q_i|d)\Big] \quad (6)
\end{aligned}
$$

using $\sum_{w\in d} P(q_i|w)P(w|d,\theta_D) = \lambda\sum_{w\in d} P_L(q_i|w)P(w|d,\theta_D) + (1-\lambda)\sum_{w\in d} P_{CO}(q_i|w)P(w|d,\theta_D)$.

A Dependency Model to Combine WordNet and Co-occurrence (cont.)

Page 15: Integrating Word Relationships into Language Models

A Dependency Model to Combine WordNet and Co-occurrence (cont.)


Page 16: Integrating Word Relationships into Language Models

• The idea becomes clearer if we make some simplifications in the formula.

• So, we can get:

A Dependency Model to Combine WordNet and Co-occurrence (cont.)

$$P_L(q_i|d) = \sum_{w\in d} P_L(q_i|w)\,P(w|d,\theta_D) \quad (7)$$

$$P_{CO}(q_i|d) = \sum_{w\in d} P_{CO}(q_i|w)\,P(w|d,\theta_D) \quad (8)$$

$$P(q|d) = \prod_{i=1}^{n}\left[P(\theta_D|d)\,\lambda\,P_L(q_i|d) + P(\theta_D|d)\,(1-\lambda)\,P_{CO}(q_i|d) + P(U|d)\,P_U(q_i|d)\right] \quad (9)$$

Equation 9 consists of a link model, a co-occurrence model and a unigram model.

Page 17: Integrating Word Relationships into Language Models

• Let λ_L, λ_CO, λ_U denote the respective weights of the link model, the co-occurrence model, and the unigram model, where λ_L = λ P(θ_D|d), λ_CO = (1−λ) P(θ_D|d) and λ_U = P(U|d).

• Then Equation 9 can be rewritten as:

$$P(q|d) = \prod_{i=1}^{n}\left[\lambda_L\,P_L(q_i|d) + \lambda_{CO}\,P_{CO}(q_i|d) + \lambda_U\,P_U(q_i|d)\right] \quad (10)$$

• For information retrieval, the most important terms are nouns. So, we concentrate on three relations related to nouns: synonym, hypernym, and hyponym. Splitting the link model accordingly gives:

A Dependency Model to Combine WordNet and Co-occurrence (cont.)

$$P(q|d) = \prod_{i=1}^{n}\left[\lambda_1 P_{SYN}(q_i|d) + \lambda_2 P_{HYPE}(q_i|d) + \lambda_3 P_{HYPO}(q_i|d) + \lambda_4 P_{CO}(q_i|d) + \lambda_5 P_U(q_i|d)\right] \quad (11)$$

NSLM

SLM
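The final scoring of Equations 10 and 11 is a weighted sum of component probabilities per query term. The sketch below (not from the paper) shows this in Python; the component names, the callable interface and the requirement that the weights sum to 1 are illustrative assumptions.

```python
import math

def nslm_score(query_terms, component_models, weights):
    """Document score under the mixture of Equation 11.
    `component_models` maps a component name ('syn', 'hype', 'hypo', 'co', 'uni')
    to a function returning P_component(q_i|d) for the current document;
    `weights` maps the same names to the corresponding lambdas (summing to 1)."""
    log_p = 0.0
    for qi in query_terms:
        p_qi = sum(weights[name] * model(qi) for name, model in component_models.items())
        log_p += math.log(max(p_qi, 1e-12))
    return log_p
```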

Page 18: Integrating Word Relationships into Language Models

Parameter estimation

• 1. Estimating conditional probabilities
  – For the unigram model P_U(q_i|d), we use the MLE estimate, smoothed by interpolated absolute discounting, that is:

$$P_U(q_i|d) = \frac{\max(c(q_i;d) - \delta,\,0)}{|d|} + \frac{\delta\,|d|_u}{|d|}\,P_{MLE}(q_i|C) \quad (12)$$

δ : the discount factor
|d| : the length of the document d
|d|_u : the count of unique terms in the document d
P_MLE(q_i|C) : the maximum likelihood probability of the word in the collection
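A small Python sketch of the interpolated absolute discounting of Equation 12; the default discount value and the callable `p_mle_collection` are illustrative assumptions.

```python
from collections import Counter

def unigram_abs_discount(qi, doc_terms, p_mle_collection, delta=0.7):
    """Equation 12: interpolated absolute discounting for P_U(q_i|d)."""
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    unique = len(counts)                                  # |d|_u, unique terms in d
    discounted = max(counts.get(qi, 0) - delta, 0.0) / doc_len
    backoff = (delta * unique / doc_len) * p_mle_collection(qi)
    return discounted + backoff
```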

Page 19: Integrating Word Relationships into Language Models

• For P(w|d, θ_D), it can be approximated by the maximum likelihood probability P_MLE(w|d).

• This approximation is motivated by the fact that the word w is primarily generated from d in a way quite independent of the model θ_D.

• Next we estimate P_L(w_i|w), the probability of a link between two words according to WordNet.

Parameter estimation (cont.)

Page 20: Integrating Word Relationships into Language Models

• Equation 13 defines our estimation of P_L(w_i|w) by interpolated absolute discounting:

Parameter estimation (cont.)

$$P_L(w_i|w) = \frac{\max(c(w_i, w, LW) - \delta,\,0)}{\sum_{w_j} c(w_j, w, LW)} + \frac{\delta\,c(*, w, LW)}{\sum_{w_j} c(w_j, w, LW)}\,P_{add\text{-}one}(w_i|w, LW) \quad (13)$$

$$P_{add\text{-}one}(w_i|w, LW) = \frac{c(w_i, w, LW) + 1}{\sum_{j=1}^{v} c(w_j, w, LW) + v}$$

w_i and w are assumed to have a relationship in WordNet.

c(*, w, LW) : the number of unique terms which have a relationship with w in WordNet and co-occur with it in the window W
c(w_i, w, LW) : the count of co-occurrences of w_i with w within the predefined window
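The sketch below is a rough illustration (not the paper's implementation) of how the linked co-occurrence counts behind Equation 13 could be collected and turned into P_L(w_i|w). It assumes NLTK's WordNet interface with synonym, hypernym and hyponym links for nouns; the window size, discount value and helper names are illustrative.

```python
from collections import defaultdict
from nltk.corpus import wordnet as wn   # assumes the NLTK WordNet data is installed

def related_in_wordnet(w1, w2):
    """True if w1 and w2 are linked in WordNet as synonyms, hypernyms or hyponyms (nouns)."""
    for s1 in wn.synsets(w1, pos=wn.NOUN):
        neighbours = set(s1.lemma_names())
        for rel in s1.hypernyms() + s1.hyponyms():
            neighbours.update(rel.lemma_names())
        if w2 in neighbours:
            return True
    return False

def link_cooccurrence_counts(corpus_sentences, window=5):
    """Collect c(w_i, w, LW): co-occurrences within a window that are also WordNet-linked."""
    counts = defaultdict(int)
    for tokens in corpus_sentences:
        for pos, w in enumerate(tokens):
            for wi in tokens[pos + 1: pos + 1 + window]:
                if wi != w and related_in_wordnet(wi, w):
                    counts[(wi, w)] += 1
                    counts[(w, wi)] += 1
    return counts

def p_link(wi, w, counts, vocab_size, delta=0.7):
    """Equation 13: interpolated absolute discounting over the linked co-occurrence counts."""
    total = sum(c for (x, y), c in counts.items() if y == w)   # sum_j c(w_j, w, LW)
    if total == 0:
        return 0.0
    unique = sum(1 for (x, y) in counts if y == w)             # c(*, w, LW)
    add_one = (counts.get((wi, w), 0) + 1) / (total + vocab_size)
    return max(counts.get((wi, w), 0) - delta, 0.0) / total + (delta * unique / total) * add_one
```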

Page 21: Integrating Word Relationships into Language Models

• The estimation of the co-occurrence model P_CO(w_i|w) is similar to that of the link model P_L(w_i|w), except that when counting the co-occurrence frequency, the requirement of having a link in WordNet is removed.

Parameter estimation (cont.)

$$P_{CO}(w_i|w) = \frac{\max(c(w_i, w, W) - \delta,\,0)}{\sum_{w_j} c(w_j, w, W)} + \frac{\delta\,c(*, w, W)}{\sum_{w_j} c(w_j, w, W)}\,P_{add\text{-}one}(w_i|w, W) \quad (14)$$

$$P_{add\text{-}one}(w_i|w, W) = \frac{c(w_i, w, W) + 1}{\sum_{j=1}^{v} c(w_j, w, W) + v}$$

Page 22: Integrating Word Relationships into Language Models

• 2. Estimating mixture weights

We introduce an EM algorithm to estimate the mixture weights in NSLM.

Because NSLM is a three-component mixture model, the optimal weights should maximize the likelihood of the queries.

Let Λ_q = (λ_L, λ_CO, λ_U) be the mixture weights; we then have:

Parameter estimation (cont.)

$$\Lambda_q^* = \arg\max_{\Lambda_q} \log \sum_{i=1}^{N} \alpha_i \prod_{j=1}^{m} \left[\lambda_U P_U(q_j|d_i) + \lambda_L P_L(q_j|d_i) + \lambda_{CO} P_{CO}(q_j|d_i)\right] \quad (15)$$

N : the number of documents in the dataset
m : the length of query q
α_i (1 ≤ i ≤ N) : the prior probability with which to choose the document d_i to generate the query

Page 23: Integrating Word Relationships into Language Models

• However, some documents having high weights are not truly relevant to the query. They contain noise.

• To account for the noise, we further assume that there are two distinctive sources to generate the query.

• One is the relevant documents; the other is a noisy source, which is approximated by the collection C.

Parameter estimation (cont.)

P_U(q_j|C), P_L(q_j|C), P_CO(q_j|C) : respectively the unigram model, link model and co-occurrence model built from the collection
γ : the weight of the noise

$$\Lambda_q^* = \arg\max_{\Lambda_q} \log \sum_{i=1}^{N} \alpha_i \prod_{j=1}^{m} \Big[(1-\gamma)\big(\lambda_U P_U(q_j|d_i) + \lambda_L P_L(q_j|d_i) + \lambda_{CO} P_{CO}(q_j|d_i)\big) + \gamma\big(\lambda_U P_U(q_j|C) + \lambda_L P_L(q_j|C) + \lambda_{CO} P_{CO}(q_j|C)\big)\Big] \quad (16)$$

The collection terms play the role of smoothing.

Page 24: Integrating Word Relationships into Language Models

• With this setting, the hidden α_i (1 ≤ i ≤ N) and the weights Λ_q can be estimated using the EM algorithm.

• The update formulas are as follows:

Parameter estimation (cont.)

Writing $A_{ij}^{(r)} = \lambda_U^{(r)} P_U(q_j|d_i) + \lambda_L^{(r)} P_L(q_j|d_i) + \lambda_{CO}^{(r)} P_{CO}(q_j|d_i)$ and $B_j^{(r)} = \lambda_U^{(r)} P_U(q_j|C) + \lambda_L^{(r)} P_L(q_j|C) + \lambda_{CO}^{(r)} P_{CO}(q_j|C)$, the updates at iteration r are:

$$\alpha_i^{(r+1)} = \frac{\alpha_i^{(r)} \prod_{j=1}^{m}\big[(1-\gamma)A_{ij}^{(r)} + \gamma B_j^{(r)}\big]}{\sum_{i'=1}^{N} \alpha_{i'}^{(r)} \prod_{j=1}^{m}\big[(1-\gamma)A_{i'j}^{(r)} + \gamma B_j^{(r)}\big]} \quad (17)$$

$$\lambda_L^{(r+1)} = \frac{1}{m}\sum_{j=1}^{m}\sum_{i=1}^{N} \alpha_i^{(r)}\,\frac{\lambda_L^{(r)}\big[(1-\gamma)P_L(q_j|d_i) + \gamma P_L(q_j|C)\big]}{(1-\gamma)A_{ij}^{(r)} + \gamma B_j^{(r)}}$$

with analogous updates for λ_CO^{(r+1)} and λ_U^{(r+1)}, replacing P_L by P_CO and P_U respectively.
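The sketch below shows a generic EM loop for the weights (λ_U, λ_L, λ_CO) of the noisy mixture in Equation 16. It is an illustrative implementation, not necessarily the exact update of Equation 17: for simplicity the document prior α_i is kept uniform here, whereas the paper also re-estimates it. The matrix layouts of the inputs are assumptions.

```python
def em_mixture_weights(p_u, p_l, p_co, p_u_c, p_l_c, p_co_c, gamma=0.2, iters=50):
    """EM for (lambda_U, lambda_L, lambda_CO).
    p_u[i][j], p_l[i][j], p_co[i][j]: P_U(q_j|d_i), P_L(q_j|d_i), P_CO(q_j|d_i);
    p_u_c[j], p_l_c[j], p_co_c[j]: the corresponding collection probabilities;
    gamma: the noise weight."""
    n_docs, m = len(p_u), len(p_u_c)
    alpha = [1.0 / n_docs] * n_docs              # uniform document prior (simplification)
    lam = {"U": 1 / 3, "L": 1 / 3, "CO": 1 / 3}
    for _ in range(iters):
        # E-step: expected responsibility of each component per (document, query term)
        resp_sum = {"U": 0.0, "L": 0.0, "CO": 0.0}
        for i in range(n_docs):
            for j in range(m):
                comp = {
                    "U": lam["U"] * ((1 - gamma) * p_u[i][j] + gamma * p_u_c[j]),
                    "L": lam["L"] * ((1 - gamma) * p_l[i][j] + gamma * p_l_c[j]),
                    "CO": lam["CO"] * ((1 - gamma) * p_co[i][j] + gamma * p_co_c[j]),
                }
                total = sum(comp.values()) or 1e-12
                for k in resp_sum:
                    resp_sum[k] += alpha[i] * comp[k] / total
        # M-step: renormalise the expected counts into new weights
        z = sum(resp_sum.values()) or 1e-12
        lam = {k: v / z for k, v in resp_sum.items()}
    return lam
```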

Page 25: Integrating Word Relationships into Language Models

Experiments

• We evaluated our model described in the previous sections using three different TREC collections: WSJ, AP and SJM.

Page 26: Integrating Word Relationships into Language Models

Experiments (cont.)

Page 27: Integrating Word Relationships into Language Models

Conclusion and future work

• In this paper, we integrated word relationships into the language modeling framework.

• We used the EM algorithm to train the parameters. This method worked well in our experiments.