Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
1998. 12. 10.
Oh-Woog Kwon, KLE Lab., CSE, POSTECH
Introduction
An unsupervised algorithm for WSD
• Avoids the need for costly hand-tagged training data
• Uses two powerful properties of human language:
1. One sense per collocation (dictionary definition of collocation):
2. One sense per discourse:
Example: 동물의 눈은 물체를 보는 기관이다. ("An animal's 눈 [eye] is an organ for seeing objects.") In this context, 눈 takes only one sense (eye), not two senses (eye or snow).
[Figure: a single discourse (Text 101) in which every occurrence of "bank" carries the same sense]
One Sense Per Discourse
A test of one sense per discourse (table on p. 189, using 37,232 hand-tagged examples); a sketch of how these figures can be computed follows below.
• Accuracy: when a word recurs within a discourse, is it used with the same sense? (99.8%)
• Applicability: does a word appear more than once in a discourse? (50.1%)
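As a rough, illustrative sketch (not from the original slides), the two figures above can be computed from hand-tagged data grouped by discourse; the (discourse_id, word, sense) layout and the function name are assumptions of this sketch.

from collections import defaultdict

def one_sense_per_discourse_stats(tagged):
    """tagged: iterable of (discourse_id, word, sense) triples, hand-tagged.
    Returns (applicability, accuracy):
      applicability = fraction of (discourse, word) pairs with two or more occurrences
      accuracy      = among those pairs, fraction whose occurrences all share one sense"""
    senses = defaultdict(list)                       # (discourse, word) -> [sense, ...]
    for disc, word, sense in tagged:
        senses[(disc, word)].append(sense)
    multi = [s for s in senses.values() if len(s) >= 2]
    applicability = len(multi) / len(senses) if senses else 0.0
    consistent = sum(1 for s in multi if len(set(s)) == 1)
    accuracy = consistent / len(multi) if multi else 0.0
    return applicability, accuracy

# Toy example: 'plant' appears twice in discourse d1, both times with the same sense.
data = [("d1", "plant", "factory"), ("d1", "plant", "factory"),
        ("d2", "plant", "living"), ("d2", "crane", "bird")]
print(one_sense_per_discourse_stats(data))           # (0.333..., 1.0)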
Advantage of one sense per discourse: it can be used in conjunction with a separate model of local context for each word.
[Figure: the local contexts of every occurrence of "bank" in Text 101 are combined (summed) into a single local-context model for "bank"]
One Sense Per Collocation
Types of collocation, ordered by predictive power:
• Immediately adjacent collocations > collocations at a distance
• At equivalent distance, predicate-argument relationships > arbitrary associations
• Collocations with content words > collocations with function words
So adjacent content words can disambiguate word sense (see the feature-extraction sketch below).
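As a small illustration of these feature types (an assumption-laden sketch, not code from the paper), such features can be read directly off a tokenized context: the immediately adjacent words plus content words within a ±k window. The stop-word list and window size below are placeholder choices.

# Illustrative feature extractor; STOP_WORDS and k are placeholder choices.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "near"}

def collocation_features(tokens, target_index, k=5):
    """Return collocation features for the word at tokens[target_index]."""
    feats = []
    # Immediately adjacent collocations (most predictive).
    if target_index > 0:
        feats.append(("word_at_-1", tokens[target_index - 1].lower()))
    if target_index + 1 < len(tokens):
        feats.append(("word_at_+1", tokens[target_index + 1].lower()))
    # Content words within +/- k positions (weaker, but still useful).
    lo, hi = max(0, target_index - k), min(len(tokens), target_index + k + 1)
    for i in range(lo, hi):
        if i != target_index and tokens[i].lower() not in STOP_WORDS:
            feats.append(("word_in_k", tokens[i].lower()))
    return feats

tokens = "the manufacturing plant is near the river".split()
print(collocation_features(tokens, tokens.index("plant")))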
A supervised algorithm based on the above property: the Decision List algorithm [Yarowsky, ACL-94]
• Originally applied to accent restoration in Spanish and French
• Used here as a component of the proposed unsupervised algorithm
Decision List Algorithm
Step 1: Identify the Ambiguities in the Target Word. Example: 눈 → eye, snow
Step 2: Collect Training Contexts for Each Sense. Example:
• eye: "… 사람의 눈은 좋은 …" (a person's eyes are good), "… 곤충의 눈은 머리에 …" (an insect's eyes are on its head), …
• snow: "… 하늘에서 눈이 내리고 …" (snow falls from the sky), "… 어제 눈이 내려 …" (it snowed yesterday), …
Step 3: Measure Collocational Distributions. Example:
• word at position -1 [사람 눈, "person's 눈"]: eye (1,000), snow (0)
• word within ±k words [하늘 "sky" within k words]: eye (2), snow (10,000)
Step 4: Sort by Log-Likelihood into Decision Lists
Step 5: Optional Pruning and Interpolation
Step 6: Train Decision Lists for General Classes of Ambiguity
Step 7: Classification Using Decision Lists. Use only the single most reliable collocation matched in the target context, ranked by the score:
Abs( Log( Pr(sense1 | Collocation_i) / Pr(sense2 | Collocation_i) ) )
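A compact sketch of Steps 3, 4, and 7 (assumptions: two senses labeled "sense1" and "sense2", a small smoothing constant alpha, and features represented as hashable tokens; this is an illustration, not the paper's exact implementation): count how often each collocation co-occurs with each sense, score it with a smoothed version of the log-likelihood ratio above, sort, and classify with the single highest-ranked matching collocation.

import math
from collections import defaultdict

def train_decision_list(examples, alpha=0.1):
    """examples: list of (features, sense) pairs, sense in {'sense1', 'sense2'}.
    Returns rules as (score, feature, predicted_sense), sorted by reliability."""
    counts = defaultdict(lambda: {"sense1": 0, "sense2": 0})
    for feats, sense in examples:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        p1, p2 = c["sense1"] + alpha, c["sense2"] + alpha    # smoothed counts
        score = abs(math.log(p1 / p2))                        # the score above
        rules.append((score, f, "sense1" if p1 > p2 else "sense2"))
    rules.sort(key=lambda r: r[0], reverse=True)              # Step 4: sort by log-likelihood
    return rules

def classify(rules, feats):
    """Step 7: use only the single most reliable collocation matched in the context."""
    feats = set(feats)
    for score, f, sense in rules:
        if f in feats:
            return sense
    return None

examples = [([("word_at_-1", "manufacturing")], "sense2"),
            ([("word_in_k", "life")], "sense1")]
rules = train_decision_list(examples)
print(classify(rules, [("word_at_-1", "manufacturing")]))     # sense2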
Unsupervised Learning Algorithm - 1
Illustrated by the disambiguation of 7,538 instances of "plant".
STEP 1: Collect all contexts of the ambiguous word in the untagged training set (right column of p. 190).
STEP 2:
a) Choose a small number of seed collocations for each sense.
b) Tag all training examples containing a seed collocate with that seed's sense label => two seed sets (left column of p. 191, Figure 1); a tagging sketch follows the seed options below.
Options for choosing training seeds:
• Use words from dictionary definitions
• Use a single defining collocate for each class (e.g. from a thesaurus such as WordNet)
• Label salient corpus collocates (not fully automatic):
  – automatically collect words that co-occur with the target word
  – a human judge decides which ones to use
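STEP 2(b) can be pictured as a simple filter over the untagged contexts: any context containing a seed collocate is tagged with that seed's sense, and everything else stays in the residual set. The seed words "life" and "manufacturing" follow the paper's "plant" example; the function itself is an illustrative sketch.

def tag_with_seeds(contexts, seeds):
    """contexts: list of token lists; seeds: dict mapping sense -> set of seed collocates.
    Returns (tagged, residual), where tagged is a list of (tokens, sense)."""
    tagged, residual = [], []
    for tokens in contexts:
        matched = [sense for sense, words in seeds.items() if words & set(tokens)]
        if len(matched) == 1:                  # unambiguous seed match
            tagged.append((tokens, matched[0]))
        else:                                  # no seed, or conflicting seeds
            residual.append(tokens)
    return tagged, residual

seeds = {"plant/living": {"life"}, "plant/factory": {"manufacturing"}}
contexts = ["the plant life of the valley".split(),
            "automated manufacturing plant in fremont".split(),
            "the plant was closed last year".split()]
tagged, residual = tag_with_seeds(contexts, seeds)
print(len(tagged), len(residual))              # 2 1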
Unsupervised Learning Algorithm - 2
STEP 3: (p. 192, Figure 2)
a) Train the supervised classification (decision-list) algorithm on the two seed sets.
b) Classify the entire sample set using the classifier from (a); add examples whose classification probability is above a threshold to the seed sets.
c) Optionally apply the one-sense-per-discourse constraint:
• Detect the dominant sense of each discourse (using a threshold).
• Augmentation: if a dominant sense exists, add the previously untagged contexts in that discourse to the seed set of the dominant sense.
• Filtering: otherwise (where there is substantial disagreement about the dominant sense), return all instances in the discourse to the residual set.
d) Repeat STEP 3 (a bootstrapping loop; see the sketch below).
• The iteration can escape from initial misclassifications.
• Two techniques help avoid a local minimum:
  – periodically and incrementally increasing the width of the context window
  – randomly perturbing the class-inclusion threshold, similar to simulated annealing
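The loop in STEP 3 can be sketched as the bootstrapping routine below. It reuses train_decision_list and the rule format from the decision-list sketch earlier; the score threshold stands in for the paper's probability threshold, and the optional discourse step and the window/threshold perturbations are omitted for brevity.

def classify_scored(rules, feats):
    """Return (sense, score) from the single most reliable matching rule."""
    feats = set(feats)
    for score, f, sense in rules:
        if f in feats:
            return sense, score
    return None, 0.0

def bootstrap(tagged, residual, threshold=1.0, max_iters=20):
    """tagged: list of (features, sense) seed pairs; residual: list of feature lists."""
    for _ in range(max_iters):
        rules = train_decision_list(tagged)        # STEP 3a: retrain on current seed sets
        newly_tagged, still_residual = [], []
        for feats in residual:                     # STEP 3b: reclassify the sample set
            sense, score = classify_scored(rules, feats)
            if sense is not None and score >= threshold:
                newly_tagged.append((feats, sense))    # confident -> grows the seed sets
            else:
                still_residual.append(feats)
        if not newly_tagged:                       # STEP 4: residual set has stabilized
            break
        tagged, residual = tagged + newly_tagged, still_residual
    return train_decision_list(tagged), residual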
Unsupervised Learning Algorithm - 3
STEP 4: Stop when the algorithm converges on a stable residual set.
STEP 5: Classify new data using the final decision lists. For error correction, optionally apply the one-sense-per-discourse constraint (sketched below).
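A sketch of the final one-sense-per-discourse correction in STEP 5, treated here as a per-discourse majority vote with an assumed dominance threshold; the data layout and threshold value are illustrative.

from collections import Counter

def apply_discourse_constraint(labels, min_share=0.75):
    """labels: list of (discourse_id, predicted_sense_or_None).
    If one sense holds at least min_share of a discourse's labeled occurrences,
    it overrides every tag in that discourse."""
    by_disc = {}
    for disc, sense in labels:
        by_disc.setdefault(disc, []).append(sense)
    dominant = {}
    for disc, senses in by_disc.items():
        votes = Counter(s for s in senses if s is not None)
        if votes:
            sense, count = votes.most_common(1)[0]
            if count / sum(votes.values()) >= min_share:
                dominant[disc] = sense
    return [(disc, dominant.get(disc, sense)) for disc, sense in labels]

preds = [("d1", "factory"), ("d1", "factory"), ("d1", "factory"),
         ("d1", "living"), ("d2", None)]
print(apply_discourse_constraint(preds))
# d1's minority 'living' tag is flipped to 'factory'; d2 is left unchanged.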
Evaluation
The test data were extracted from a 460-million-word corpus containing news articles, scientific abstracts, spoken transcripts, and novels; the target words are those used in previous research.
Compared systems (see the table on p. 194):
(5) the supervised (decision-list) algorithm
(6) the unsupervised algorithm using only two words as seeds
(7) using the salient words of a dictionary definition as seeds
(8) using quick hand tagging of a list of algorithmically identified salient collocates
(9) (7) + the one-sense-per-discourse constraint used only in the classification procedure
(10) (9) + the one-sense-per-discourse constraint also used during learning
Conclusion
The proposed unsupervised algorithm, bootstrapped from a small set of seeds and exploiting the one-sense-per-collocation and one-sense-per-discourse properties, performs word sense disambiguation with accuracy rivaling supervised methods while avoiding costly hand-tagged training data.