Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
1998. 12. 10.
Oh-Woog Kwon, KLE Lab., CSE, POSTECH
Introduction
An unsupervised algorithm for WSD
• Avoids the need for costly hand-tagged training data
• Uses two powerful properties of human language:
1. One sense per collocation (dictionary definition of collocation):
2. One sense per discourse:
Example: 동물의 눈은 물체를 보는 기관이다. ("An animal's 눈 [eye] is an organ for seeing objects.") In this context, 눈 takes only one sense (eye), not two senses (eye or snow).
[Figure: a single discourse (Text 101) in which every occurrence of "bank" carries the same sense]
One Sense Per Discourse
A test of one sense per discourse (table on p. 189, using 37,232 hand-tagged examples); a sketch of how these figures can be computed follows below.
• Accuracy: when a word recurs within a discourse, is it used with the same sense? (99.8%)
• Applicability: does a word appear more than once in a discourse? (50.1%)
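As a rough, illustrative sketch (not from the original slides), the two figures above can be computed from hand-tagged data grouped by discourse; the (discourse_id, word, sense) layout and the function name are assumptions of this sketch.

from collections import defaultdict

def one_sense_per_discourse_stats(tagged):
    """tagged: iterable of (discourse_id, word, sense) triples, hand-tagged.
    Returns (applicability, accuracy):
      applicability = fraction of (discourse, word) pairs with two or more occurrences
      accuracy      = among those pairs, fraction whose occurrences all share one sense"""
    senses = defaultdict(list)                       # (discourse, word) -> [sense, ...]
    for disc, word, sense in tagged:
        senses[(disc, word)].append(sense)
    multi = [s for s in senses.values() if len(s) >= 2]
    applicability = len(multi) / len(senses) if senses else 0.0
    consistent = sum(1 for s in multi if len(set(s)) == 1)
    accuracy = consistent / len(multi) if multi else 0.0
    return applicability, accuracy

# Toy example: 'plant' appears twice in discourse d1, both times with the same sense.
data = [("d1", "plant", "factory"), ("d1", "plant", "factory"),
        ("d2", "plant", "living"), ("d2", "crane", "bird")]
print(one_sense_per_discourse_stats(data))           # (0.333..., 1.0)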
Advantage of one sense per discourse: it can be used in conjunction with a separate model of local context for each word.
[Figure: the local contexts of every occurrence of "bank" in Text 101 are combined (summed) into a single local-context model for "bank"]
One Sense Per Collocation
Types of collocation, ordered by predictive power:
• Immediately adjacent collocations > collocations at a distance
• At equivalent distance, predicate-argument relationships > arbitrary associations
• Collocations with content words > collocations with function words
So adjacent content words can disambiguate word sense (see the feature-extraction sketch below).
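As a small illustration of these feature types (an assumption-laden sketch, not code from the paper), such features can be read directly off a tokenized context: the immediately adjacent words plus content words within a ±k window. The stop-word list and window size below are placeholder choices.

# Illustrative feature extractor; STOP_WORDS and k are placeholder choices.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "near"}

def collocation_features(tokens, target_index, k=5):
    """Return collocation features for the word at tokens[target_index]."""
    feats = []
    # Immediately adjacent collocations (most predictive).
    if target_index > 0:
        feats.append(("word_at_-1", tokens[target_index - 1].lower()))
    if target_index + 1 < len(tokens):
        feats.append(("word_at_+1", tokens[target_index + 1].lower()))
    # Content words within +/- k positions (weaker, but still useful).
    lo, hi = max(0, target_index - k), min(len(tokens), target_index + k + 1)
    for i in range(lo, hi):
        if i != target_index and tokens[i].lower() not in STOP_WORDS:
            feats.append(("word_in_k", tokens[i].lower()))
    return feats

tokens = "the manufacturing plant is near the river".split()
print(collocation_features(tokens, tokens.index("plant")))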
A supervised algorithm based on the above property: the Decision List algorithm [Yarowsky, ACL-94]
• Originally applied to accent restoration in Spanish and French
• Used here as a component of the proposed unsupervised algorithm
Decision List Algorithm
Step 1: Identify the Ambiguities in the Target Word. Example: 눈 → eye, snow
Step 2: Collect Training Contexts for Each Sense. Example:
• eye: "… 사람의 눈은 좋은 …" (a person's eyes are good), "… 곤충의 눈은 머리에 …" (an insect's eyes are on its head), …
• snow: "… 하늘에서 눈이 내리고 …" (snow falls from the sky), "… 어제 눈이 내려 …" (it snowed yesterday), …
Step 3: Measure Collocational Distributions. Example:
• word at position -1 [사람 눈, "person's 눈"]: eye (1,000), snow (0)
• word within ±k words [하늘 "sky" within k words]: eye (2), snow (10,000)
Step 4: Sort by Log-Likelihood into Decision Lists
Step 5: Optional Pruning and Interpolation
Step 6: Train Decision Lists for General Classes of Ambiguity
Step 7: Classification Using Decision Lists. Use only the single most reliable collocation matched in the target context, ranked by the score:
Abs( Log( Pr(sense1 | Collocation_i) / Pr(sense2 | Collocation_i) ) )
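A compact sketch of Steps 3, 4, and 7 (assumptions: two senses labeled "sense1" and "sense2", a small smoothing constant alpha, and features represented as hashable tokens; this is an illustration, not the paper's exact implementation): count how often each collocation co-occurs with each sense, score it with a smoothed version of the log-likelihood ratio above, sort, and classify with the single highest-ranked matching collocation.

import math
from collections import defaultdict

def train_decision_list(examples, alpha=0.1):
    """examples: list of (features, sense) pairs, sense in {'sense1', 'sense2'}.
    Returns rules as (score, feature, predicted_sense), sorted by reliability."""
    counts = defaultdict(lambda: {"sense1": 0, "sense2": 0})
    for feats, sense in examples:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        p1, p2 = c["sense1"] + alpha, c["sense2"] + alpha    # smoothed counts
        score = abs(math.log(p1 / p2))                        # the score above
        rules.append((score, f, "sense1" if p1 > p2 else "sense2"))
    rules.sort(key=lambda r: r[0], reverse=True)              # Step 4: sort by log-likelihood
    return rules

def classify(rules, feats):
    """Step 7: use only the single most reliable collocation matched in the context."""
    feats = set(feats)
    for score, f, sense in rules:
        if f in feats:
            return sense
    return None

examples = [([("word_at_-1", "manufacturing")], "sense2"),
            ([("word_in_k", "life")], "sense1")]
rules = train_decision_list(examples)
print(classify(rules, [("word_at_-1", "manufacturing")]))     # sense2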
Unsupervised Learning Algorithm - 1
Illustrated by the disambiguation of 7,538 instances of "plant".
STEP 1: Collect all contexts of the ambiguous word in the untagged training set (right column of p. 190).
STEP 2:
a) Choose a small number of seed collocations for each sense.
b) Tag all training examples containing a seed collocate with that seed's sense label => two seed sets (left column of p. 191, Figure 1); a tagging sketch follows the seed options below.
Options for choosing training seeds:
• Use words from dictionary definitions
• Use a single defining collocate for each class (e.g. from a thesaurus such as WordNet)
• Label salient corpus collocates (not fully automatic):
  – automatically collect words that co-occur with the target word
  – a human judge decides which ones to use
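STEP 2(b) can be pictured as a simple filter over the untagged contexts: any context containing a seed collocate is tagged with that seed's sense, and everything else stays in the residual set. The seed words "life" and "manufacturing" follow the paper's "plant" example; the function itself is an illustrative sketch.

def tag_with_seeds(contexts, seeds):
    """contexts: list of token lists; seeds: dict mapping sense -> set of seed collocates.
    Returns (tagged, residual), where tagged is a list of (tokens, sense)."""
    tagged, residual = [], []
    for tokens in contexts:
        matched = [sense for sense, words in seeds.items() if words & set(tokens)]
        if len(matched) == 1:                  # unambiguous seed match
            tagged.append((tokens, matched[0]))
        else:                                  # no seed, or conflicting seeds
            residual.append(tokens)
    return tagged, residual

seeds = {"plant/living": {"life"}, "plant/factory": {"manufacturing"}}
contexts = ["the plant life of the valley".split(),
            "automated manufacturing plant in fremont".split(),
            "the plant was closed last year".split()]
tagged, residual = tag_with_seeds(contexts, seeds)
print(len(tagged), len(residual))              # 2 1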
Unsupervised Learning Algorithm - 2
STEP 3: (p. 192, Figure 2)
a) Train the supervised classification (decision-list) algorithm on the two seed sets.
b) Classify the entire sample set using the classifier from (a); add examples whose classification probability is above a threshold to the seed sets.
c) Optionally apply the one-sense-per-discourse constraint:
• Detect the dominant sense of each discourse (using a threshold).
• Augmentation: if a dominant sense exists, add the previously untagged contexts in that discourse to the seed set of the dominant sense.
• Filtering: otherwise (where there is substantial disagreement about the dominant sense), return all instances in the discourse to the residual set.
d) Repeat STEP 3 (a bootstrapping loop; see the sketch below).
• The iteration can escape from initial misclassifications.
• Two techniques help avoid a local minimum:
  – periodically and incrementally increasing the width of the context window
  – randomly perturbing the class-inclusion threshold, similar to simulated annealing
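The loop in STEP 3 can be sketched as the bootstrapping routine below. It reuses train_decision_list and the rule format from the decision-list sketch earlier; the score threshold stands in for the paper's probability threshold, and the optional discourse step and the window/threshold perturbations are omitted for brevity.

def classify_scored(rules, feats):
    """Return (sense, score) from the single most reliable matching rule."""
    feats = set(feats)
    for score, f, sense in rules:
        if f in feats:
            return sense, score
    return None, 0.0

def bootstrap(tagged, residual, threshold=1.0, max_iters=20):
    """tagged: list of (features, sense) seed pairs; residual: list of feature lists."""
    for _ in range(max_iters):
        rules = train_decision_list(tagged)        # STEP 3a: retrain on current seed sets
        newly_tagged, still_residual = [], []
        for feats in residual:                     # STEP 3b: reclassify the sample set
            sense, score = classify_scored(rules, feats)
            if sense is not None and score >= threshold:
                newly_tagged.append((feats, sense))    # confident -> grows the seed sets
            else:
                still_residual.append(feats)
        if not newly_tagged:                       # STEP 4: residual set has stabilized
            break
        tagged, residual = tagged + newly_tagged, still_residual
    return train_decision_list(tagged), residual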
Unsupervised Learning Algorithm - 3
STEP 4: Stop when the algorithm converges on a stable residual set.
STEP 5: Classify new data using the final decision lists. For error correction, optionally apply the one-sense-per-discourse constraint (sketched below).
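A sketch of the final one-sense-per-discourse correction in STEP 5, treated here as a per-discourse majority vote with an assumed dominance threshold; the data layout and threshold value are illustrative.

from collections import Counter

def apply_discourse_constraint(labels, min_share=0.75):
    """labels: list of (discourse_id, predicted_sense_or_None).
    If one sense holds at least min_share of a discourse's labeled occurrences,
    it overrides every tag in that discourse."""
    by_disc = {}
    for disc, sense in labels:
        by_disc.setdefault(disc, []).append(sense)
    dominant = {}
    for disc, senses in by_disc.items():
        votes = Counter(s for s in senses if s is not None)
        if votes:
            sense, count = votes.most_common(1)[0]
            if count / sum(votes.values()) >= min_share:
                dominant[disc] = sense
    return [(disc, dominant.get(disc, sense)) for disc, sense in labels]

preds = [("d1", "factory"), ("d1", "factory"), ("d1", "factory"),
         ("d1", "living"), ("d2", None)]
print(apply_discourse_constraint(preds))
# d1's minority 'living' tag is flipped to 'factory'; d2 is left unchanged.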
Evaluation
The test data were extracted from a 460-million-word corpus containing news articles, scientific abstracts, spoken transcripts, and novels; the target words are those used in previous research.
Compared systems (see the table on p. 194):
(5) the supervised (decision-list) algorithm
(6) the unsupervised algorithm using only two words as seeds
(7) using the salient words of a dictionary definition as seeds
(8) using quick hand tagging of a list of algorithmically identified salient collocates
(9) (7) + the one-sense-per-discourse constraint used only in the classification procedure
(10) (9) + the one-sense-per-discourse constraint also used during learning
Conclusion
The proposed unsupervised algorithm, bootstrapped from a small set of seeds and exploiting the one-sense-per-collocation and one-sense-per-discourse properties, performs word sense disambiguation with accuracy rivaling supervised methods while avoiding costly hand-tagged training data.