Similarity-Based Methods for Word Sense Disambiguation


Page 1: Similarity based methods for word sense disambiguation

Copyright © Wondershare Software

- Ido Dagan
- Lillian Lee
- Fernando Pereira

Page 2: Similarity based methods for word sense disambiguation

• The problem: how to estimate the sense (probability) of unseen word pairs, i.e. pairs that are not present in the training set.

Eg: "I want to be a scientist." / "He robbed the bank."

Page 3: Similarity based methods for word sense disambiguation

• They compared four similarity-based estimation methods:

1. KL divergence
2. Total divergence to the average
3. L1 norm
4. Confusion probability

against two well-established methods:

1. Katz's back-off scheme
2. Maximum-likelihood estimation (MLE)

Page 4: Similarity based methods for word sense disambiguation

• Katz's back-off scheme (1987), widely used in bigram language modeling, estimates the probability of an unseen bigram by backing off to scaled unigram estimates: probability mass discounted from seen bigrams is redistributed over unseen ones in proportion to unigram frequency.

Eg: {make, take} plans.

Page 5: Similarity based methods for word sense disambiguation

• Because the estimate for an unseen bigram depends only on unigram frequencies, back-off has the undesirable result of assigning the same probability to all unseen bigrams built from equally frequent unigrams.

Eg: {a b} and {c b} get the same estimate whenever a and c have the same unigram frequency.
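The artifact can be seen in a toy back-off model. This is a minimal sketch, with simple absolute discounting standing in for Katz's full discounting scheme; the corpus counts and the discount value are invented for illustration:

```python
from collections import Counter

# Toy counts (invented for illustration).
bigrams = Counter({("make", "plans"): 3, ("take", "plans"): 2, ("make", "tea"): 1})
unigrams = Counter({"make": 4, "take": 2, "plans": 5, "tea": 1, "action": 1})
total_unigrams = sum(unigrams.values())

def backoff(w1, w2, discount=0.5):
    """Simplified Katz-style back-off: a discounted ML estimate for seen
    bigrams; otherwise the left-over mass alpha(w1) is spread in
    proportion to unigram frequency."""
    if bigrams[(w1, w2)] > 0:
        return (bigrams[(w1, w2)] - discount) / unigrams[w1]
    seen_mass = sum(c - discount for (u, _), c in bigrams.items() if u == w1)
    alpha = 1.0 - seen_mass / unigrams[w1]
    return alpha * unigrams[w2] / total_unigrams

# "tea" and "action" have equal unigram frequency, so the unseen bigrams
# (take, tea) and (take, action) receive identical estimates.
print(backoff("take", "tea") == backoff("take", "action"))  # True
```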

Page 6: Similarity based methods for word sense disambiguation

• In class-based methods, words of similar meaning are grouped statistically into classes.

• Each group of words has a single representative: its class.

• A word is therefore modeled by the average behavior of many words.

• When in doubt between two words, consult the data for other words of the same classes.

Eg: classes {a,b,c,d,e} & {f,g,h,i}

Page 7: Similarity based methods for word sense disambiguation

• Because a word is modeled by the average behavior of many words, the uniqueness of its meaning is lost.

Eg: Thanda (Hindi for "cold")

• Initially the probability of unseen word pairs remains zero, which leads to extremely inaccurate estimates of word-pair probabilities.

Eg: Periodic table

Page 8: Similarity based methods for word sense disambiguation

• Estimates from the words most compatible with (most similar to) a word w are combined; the evidence provided by each word w' is weighted by a function of its compatibility with w.

• No word pair is dropped, however rare, as happens in Katz's back-off scheme.

Page 9: Similarity based methods for word sense disambiguation

• Similarity-based estimation involves three components:

1. A scheme for deciding which word pairs require similarity based estimation.

2. A method for combining information from similar words.

3. A function measuring similarity between words.

Page 10: Similarity based methods for word sense disambiguation

• The good points of Katz's back-off scheme and MLE are combined.

• The MLE probability is P_ML(w2|w1) = c(w1,w2) / c(w1).

• The similarity-based estimate is piecewise:

P(w2|w1) = P_d(w2|w1)           if c(w1,w2) > 0   (seen pair)
P(w2|w1) = α(w1) · P_r(w2|w1)   otherwise          (unseen pair)

where P_d is a discounted estimate, P_r is the similarity-based estimate, and α(w1) is a normalization factor.
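The role of α(w1) is to make the two branches form a proper distribution. A small numeric check, with invented values standing in for P_d and P_r:

```python
# Invented toy values: P_d over seen successors of w1, P_r over unseen ones.
p_d_seen = {"plans": 0.75}                # discounted estimates, seen pairs
p_r_unseen = {"tea": 0.6, "action": 0.4}  # similarity-based estimates, unseen pairs

def alpha(p_d_seen, p_r_unseen):
    """alpha(w1) scales the similarity-based estimates so that the full
    conditional distribution P(. | w1) sums to one."""
    return (1.0 - sum(p_d_seen.values())) / sum(p_r_unseen.values())

a = alpha(p_d_seen, p_r_unseen)
dist = dict(p_d_seen)
dist.update({w2: a * p for w2, p in p_r_unseen.items()})
print(abs(sum(dist.values()) - 1.0) < 1e-12)  # True
```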

Page 11: Similarity based methods for word sense disambiguation

• Similarity-based models assume that if a word w1' is similar to w1, then w1' can yield information about the probability of unseen word pairs involving w1.

• The intuition: w2 is more likely to occur with w1 if it tends to occur with the words most similar to w1.

• They used a weighted average of the evidence provided by similar words, where the weight given to a particular word depends on its similarity to w1.

Page 12: Similarity based methods for word sense disambiguation

• The number of words treated as similar to a word w1 is capped at a threshold, because considering every word in a large training set would use a very large amount of resources.

• The number of similar words (k) and the threshold on dissimilarity between words (t) are tuned experimentally.
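The weighted average over the k nearest, sufficiently similar words can be sketched as follows. The neighbour set, the dissimilarity values, and the 1/d weighting are all invented for illustration; the paper tunes the weight function per similarity measure:

```python
def p_r(w2, w1_neighbours, cond, k=2, t=1.0, eps=1e-9):
    """Similarity-based estimate for an unseen pair: a weighted average of
    P(w2 | w1') over the k most similar words w1' whose dissimilarity to
    w1 stays below the threshold t."""
    # Keep neighbours under the dissimilarity threshold, then the k nearest.
    near = sorted((d, w) for w, d in w1_neighbours.items() if d < t)[:k]
    weights = {w: 1.0 / (d + eps) for d, w in near}
    norm = sum(weights.values())
    return sum(weights[w] * cond[w].get(w2, 0.0) for w in weights) / norm

# Invented conditionals P(w2 | w1') and dissimilarities to w1 = "take".
cond = {"make": {"plans": 0.7, "tea": 0.3},
        "do":   {"plans": 0.5, "work": 0.5},
        "eat":  {"tea": 1.0}}
dissim_to_take = {"make": 0.2, "do": 0.4, "eat": 1.5}  # "eat" exceeds t

print(round(p_r("plans", dissim_to_take, cond), 3))  # 0.633
```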

Page 13: Similarity based methods for word sense disambiguation

• These word-similarity functions can be derived automatically from statistics of the training data, as opposed to functions derived from manually constructed word classes:

1. KL divergence
2. Total divergence to the average
3. L1 norm
4. Confusion probability

Page 14: Similarity based methods for word sense disambiguation

• KL divergence is the standard measure of dissimilarity between two probability mass functions:

D(w1 || w1') = Σ_{w2} P(w2|w1) · log[ P(w2|w1) / P(w2|w1') ]

• For D to be defined, P(w2|w1') > 0 whenever P(w2|w1) > 0.

• This condition may not hold in practice, so smoothing is required, which is very expensive for large vocabularies.
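The support condition can be made concrete in a few lines; the distributions below are invented:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) between two pmfs given as dicts.
    Undefined (here: infinite) when q gives zero to a point p supports."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # support mismatch: smoothing would be needed
        total += px * math.log(px / qx)
    return total

p = {"plans": 0.7, "tea": 0.3}
q = {"plans": 0.5, "work": 0.5}
print(kl(p, p), kl(p, q))  # 0.0 inf
```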

Page 15: Similarity based methods for word sense disambiguation

• It is a relative measure based on the total KL divergence to the average of the two distributions:

A(w1, w1') = D(w1 || avg) + D(w1' || avg), where avg(w2) = [ P(w2|w1) + P(w2|w1') ] / 2

• This reduces to a sum over only those w2 where both conditionals are non-zero.

Page 16: Similarity based methods for word sense disambiguation

• A(w1,w1') is bounded, ranging between 0 and 2 log 2.

• Smoothed estimates are not required because probability ratios are not involved.

• Calculating A(w1,w1') requires summing only over those w2 for which P(w2|w1) and P(w2|w1') are both non-zero, which makes the computation quite fast.
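A quick numeric check of the definition and the 2 log 2 bound, on invented toy distributions:

```python
import math

def total_div_to_avg(p, q):
    """A(w1, w1') = D(p || m) + D(q || m), with m the pointwise average.
    m is non-zero wherever p or q is, so no smoothing is needed."""
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in set(p) | set(q)}
    def d(a):
        return sum(ax * math.log(ax / m[x]) for x, ax in a.items() if ax > 0)
    return d(p) + d(q)

p = {"plans": 1.0}
q = {"work": 1.0}
# Disjoint supports give the maximum value, 2 log 2.
print(math.isclose(total_div_to_avg(p, q), 2 * math.log(2)))  # True
```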

Page 17: Similarity based methods for word sense disambiguation

• The L1 norm is defined as

L(w1, w1') = Σ_{w2} | P(w2|w1) − P(w2|w1') |

which likewise reduces to a form depending only on those w2 where a conditional is non-zero.

• It is also bounded, between 0 and 2.
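A minimal implementation with a check of the [0, 2] bound (distributions invented):

```python
def l1(p, q):
    """L(w1, w1') = sum over w2 of |P(w2|w1) - P(w2|w1')|; in [0, 2]."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))

print(l1({"plans": 1.0}, {"work": 1.0}))  # 2.0  (disjoint supports: maximum)
print(l1({"plans": 0.6, "tea": 0.4}, {"plans": 0.6, "tea": 0.4}))  # 0.0
```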

Page 18: Similarity based methods for word sense disambiguation

• The confusion probability estimates whether a word w1' can be substituted for a word w1.

• Unlike D, A, and L, under this measure w1 may not be "closest" to itself, i.e. there may exist a word w1' such that P_c(w1'|w1) > P_c(w1|w1).
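The "not closest to itself" property can be demonstrated with one common formulation of the confusion probability, P_c(w1'|w1) = Σ_{w2} P(w1'|w2) · P(w2|w1). The exact form used in the paper, and the toy counts below, are assumptions for illustration:

```python
# Invented bigram counts c(w1, w2): "a" and "b" share the single context "x",
# but "b" dominates it.
counts = {("a", "x"): 1, ("b", "x"): 9}

def p_c(w1_sub, w1):
    """Confusion probability: how substitutable w1_sub is for w1,
    P_c(w1'|w1) = sum over w2 of P(w1'|w2) * P(w2|w1)."""
    c_w1 = sum(c for (u, _), c in counts.items() if u == w1)
    total = 0.0
    for (u, w2), c in counts.items():
        if u != w1:
            continue
        c_w2 = sum(cc for (_, v), cc in counts.items() if v == w2)
        total += (counts.get((w1_sub, w2), 0) / c_w2) * (c / c_w1)
    return total

# "b" is more substitutable for "a" than "a" is for itself.
print(p_c("b", "a") > p_c("a", "a"))  # True
```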

Page 19: Similarity based methods for word sense disambiguation

• Because the senses a dictionary provides for an actual word may be too fine or too coarse, and producing correctly sense-tagged training data would take a large amount of resources, the experiment is done on pseudo-words.

Eg: {make,take} plans and {make,take} action, where {make,take} is a pseudo-word tested with the objects plans and action.

Page 20: Similarity based methods for word sense disambiguation

• In the experiment, each method is given a noun and two verbs and must decide which verb is more likely to take the noun as its direct object.

• The bigram language model was built from 587,833 training bigrams.

• Testing used 17,152 unseen bigrams, divided into five equal parts, T1 to T5.

• Error rate is the performance metric.
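The decision rule and the error-rate metric can be sketched like this. The model scores are invented, and counting a tie as half an error is a common convention assumed here:

```python
def error_rate(test_items, prob):
    """Pseudo-word disambiguation: for each (noun, correct_verb, other_verb),
    pick the verb v with the larger prob(noun, v); wrong picks count as
    errors and ties as half an error."""
    errors = 0.0
    for noun, correct, other in test_items:
        pc, po = prob(noun, correct), prob(noun, other)
        if po > pc:
            errors += 1.0
        elif po == pc:
            errors += 0.5
    return errors / len(test_items)

# Invented model scores.
scores = {("plans", "make"): 0.6, ("plans", "take"): 0.2,
          ("action", "take"): 0.5, ("action", "make"): 0.5}
items = [("plans", "make", "take"), ("action", "take", "make")]
print(error_rate(items, lambda n, v: scores.get((n, v), 0.0)))  # 0.25
```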

Page 21: Similarity based methods for word sense disambiguation

• Back-off consistently performed worse than MLE, so back-off is not included in the experiments.

• The experiments use only unsmoothed data, so KL divergence is not included either.

Pages 22-24: results tables and charts (not captured in this transcript).

Page 25: Similarity based methods for word sense disambiguation

• Similarity-based methods performed about 40% better than the back-off and MLE methods.

• Singletons should not be omitted from the training data for similarity-based methods.

• The total-divergence-to-the-average method (A) performs best in all cases.

Page 26: Similarity based methods for word sense disambiguation
