Supervised, Semi-supervised and Unsupervised Approaches for Word Sense Disambiguation
SUPERVISED, SEMI-SUPERVISED AND UNSUPERVISED APPROACHES FOR WORD SENSE DISAMBIGUATION

Slides by Arindam Chatterjee & Salil Joshi
Under the guidance of Prof. Pushpak Bhattacharyya
May 01, 2010
ROADMAP
1. Bird's Eye View
2. Supervised Approaches
3. Semi-supervised Approaches
4. Unsupervised Approaches
5. Summary
BIRD’S EYE VIEW
WSD Approaches:
• Machine Learning: Supervised, Semi-supervised, Unsupervised
• Knowledge Based
• Hybrid

For each family of approaches, we look at:
• The unifying thread of operation.
• Distinguishing features of the algorithms.
SUPERVISED APPROACHES
WSD as classification: classes = senses.

[Diagram: training phase: a model is trained from training instances (words) such as "water, river" (CLASS 1 / SENSE 1), "money, finance" (CLASS 2 / SENSE 2), "blood, plasma" (CLASS 3 / SENSE 3). Testing phase: a new instance such as "money, finance" is classified into one of the sense classes based on its feature vector.]
FEATURE VECTOR FOR WSD

In supervised WSD, the feature vector consists of the following four features:
1. Part of speech (POS) of w.
2. Semantic & syntactic features of w.
3. Collocation vector (set of words around it): typically consists of the next word (+1), the next-to-next word (+2), -2, -1 and their POS's.
4. Co-occurrence vector (number of times w occurs in a bag of words around it).
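As an illustration, here is a minimal sketch of extracting features 1, 3 and 4 above (the semantic/syntactic features of w are omitted). It assumes NLTK is installed with its default POS tagger model downloaded; the function name and output layout are our own, not from the slides.

```python
# Minimal feature extraction sketch for supervised WSD.
# Requires: pip install nltk, plus nltk.download('averaged_perceptron_tagger').
import nltk

def wsd_features(tokens, target, window=3):
    tagged = nltk.pos_tag(tokens)               # [(word, POS), ...]
    i = tokens.index(target)

    # Feature 1: POS of the target word w.
    pos_w = tagged[i][1]

    # Feature 3: collocation vector -- words at offsets -2, -1, +1, +2 and their POS.
    colloc = []
    for off in (-2, -1, 1, 2):
        j = i + off
        if 0 <= j < len(tagged):
            colloc.extend(tagged[j])

    # Feature 4: co-occurrence counts in a bag of words around w.
    bag = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
    cooc = {w: bag.count(w) for w in set(bag)}

    return {"pos": pos_w, "collocations": colloc, "cooccurrence": cooc}

tokens = "I usually have grilled bass on Sunday".split()
print(wsd_features(tokens, "bass"))
```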
Unifying thread of operation:
1. Use of annotated corpora.
2. They are all target-word WSD approaches.
3. Representation of words as feature vectors.

Algorithms:
1. Decision List
2. Decision Tree
3. Naïve Bayes
4. Exemplar Based Approach
5. Support Vector Machines
6. Neural Networks
7. Ensemble Methods
1. DECISION LISTS
1. Based on the 'one sense per collocation' property: nearby words provide strong and consistent clues to the sense of a target word.
2. A decision list is an ordered set of if-then-else rules: if (feature X) then sense (Si).
3. Each rule is weighted by a score.
4. In the training phase, the decision list is built from evidence in the corpus.
5. In the testing phase, the sense with the highest score wins.
1. DECISION LISTS (CONTD.)

TRAINING PHASE

For a particular word:
1. Features are extracted from the corpus.
2. An ordered decision list of the form {feature-value, sense, score} is created.
3. The score of a feature f is the log-likelihood ratio of the sense given the feature:

$$\mathrm{score}(S_i) = \max_f \log \frac{P(S_i \mid f)}{\sum_{j \neq i} P(S_j \mid f)}$$
The decision list for the word bank (courtesy Navigli, 2009):

Feature              | Prediction   | Score
account with bank    | bank/FINANCE | 4.83
standing in bank     | bank/FINANCE | 3.35
bank of blood        | bank/SUPPLY  | 2.48
work in bank         | bank/FINANCE | 2.33
the left river bank  | bank/RIVER   | 1.12
of the bank          | -            | 0.01

Test sentence: "I went for a walk along the river bank"
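To make the testing phase concrete, here is a minimal sketch under a simplifying assumption: each rule fires on a single trigger word rather than the full collocation. The senses and scores follow the table above; the matching logic is our own illustration, not the algorithm's exact implementation.

```python
# Decision-list testing sketch: rules are ordered by score, highest first;
# the first matching rule determines the sense.
decision_list = [
    ("account",  "bank/FINANCE", 4.83),
    ("standing", "bank/FINANCE", 3.35),
    ("blood",    "bank/SUPPLY",  2.48),
    ("work",     "bank/FINANCE", 2.33),
    ("river",    "bank/RIVER",   1.12),
]

def disambiguate(sentence):
    words = set(sentence.lower().split())
    for trigger, sense, score in decision_list:
        if trigger in words:          # first (highest-scoring) match wins
            return sense, score
    return None, 0.0

print(disambiguate("I went for a walk along the river bank"))
# -> ('bank/RIVER', 1.12)
```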
2. SUPPORT VECTOR MACHINES (CONTD.)
This distance gives the confidence score for each SVM.

E.g., if a word has 4 senses, one binary SVM is trained per sense:

SVM | A (positive class) | B (negative class)
1   | S1                 | S2, S3, S4
2   | S2                 | S1, S3, S4
3   | S3                 | S1, S2, S4
4   | S4                 | S1, S2, S3

The SVM with the highest confidence score becomes the winner sense.
3. ENSEMBLE METHODS

A collection of classifiers (C1, C2, ..., Cn) is combined to improve the overall accuracy of the WSD system.

[Diagram: the ensemble components (classifiers) C1, C2, C3 each score the senses S1 and S2; a score function combines their outputs into Total_Score(S1) and Total_Score(S2).]

For each approach, the score function varies.
A. MAJORITY VOTING

Here the score function is a vote function: each ensemble component votes for one sense of the target word, and the sense with the largest number of votes is selected as the winner sense.

$$\hat{S} = \operatorname*{argmax}_{S_i \in \mathrm{Senses}_D(w)} \big|\{\, j : \mathrm{vote}(C_j) = S_i \,\}\big|$$
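A minimal sketch of this vote function, using only the standard library; the input format is our own choice.

```python
# Majority voting over ensemble predictions: one predicted sense per classifier.
from collections import Counter

def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["S1", "S1", "S2"]))  # -> 'S1'
```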
B. PROBABILITY MIXTURE

The score function is a confidence score. Each classifier's confidence scores are normalized by that classifier's maximum score:

$$P_{C_j}(S_i) = \frac{\mathrm{score}(C_j, S_i)}{\max_k \mathrm{score}(C_j, S_k)}$$

The normalized scores are summed up, and the sense with the maximum sum is selected as the winner sense.

Classifier | Sense | Confidence score | Normalized score
C1         | S1    | 0.6              | 0.6/0.6 = 1.0
C1         | S2    | 0.4              | 0.4/0.6 ≈ 0.7
C2         | S1    | 0.7              | 0.7/0.7 = 1.0
C2         | S2    | 0.3              | 0.3/0.7 ≈ 0.4
C3         | S1    | 0.8              | 0.8/0.8 = 1.0
C3         | S2    | 0.2              | 0.2/0.8 ≈ 0.3

Total_Score(S1) = 1.0 + 1.0 + 1.0 = 3.0
Total_Score(S2) = 0.7 + 0.4 + 0.3 = 1.4
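A minimal sketch of this combination, reusing the confidence scores from the table above; the function name and input format are our own.

```python
# Probability mixture: normalize each classifier's scores by its maximum,
# then sum the normalized scores per sense.
def probability_mixture(classifier_scores):
    totals = {}
    for scores in classifier_scores:          # one {sense: confidence} per classifier
        top = max(scores.values())
        for sense, score in scores.items():
            totals[sense] = totals.get(sense, 0.0) + score / top
    return max(totals, key=totals.get), totals

winner, totals = probability_mixture([
    {"S1": 0.6, "S2": 0.4},
    {"S1": 0.7, "S2": 0.3},
    {"S1": 0.8, "S2": 0.2},
])
print(winner, totals)  # -> S1, {'S1': 3.0, 'S2': ~1.4 (1.345 unrounded)}
```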
[Diagram: probability mixture. C1, C2, C3 assign confidence/normalized scores to S1 (0.6/1.0, 0.7/1.0, 0.8/1.0; total 3.0) and S2 (0.4/0.7, 0.3/0.4, 0.2/0.3; total 1.4); S1 is the winner sense.]
C. RANK BASED COMBINATION

The score function is the rank of each sense. The ranks are negated and summed up; the sense with the highest sum wins.

$$\hat{S} = \operatorname*{argmax}_{S_i \in \mathrm{Senses}_D(w)} \sum_{j=1}^{m} -\mathrm{Rank}_{C_j}(S_i)$$

Classifier | Sense | Rank | Negated rank
C1         | S1    | 1    | -1
C1         | S2    | 2    | -2
C2         | S1    | 2    | -2
C2         | S2    | 1    | -1
C3         | S1    | 1    | -1
C3         | S2    | 2    | -2

Total_Score(S1) = (-1) + (-2) + (-1) = -4
Total_Score(S2) = (-2) + (-1) + (-2) = -5
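A minimal sketch of the rank-based combination above; input format is our own.

```python
# Rank-based combination: sum negated ranks per sense (rank 1 = best).
def rank_combination(rankings):
    totals = {}
    for ranks in rankings:                    # one {sense: rank} per classifier
        for sense, rank in ranks.items():
            totals[sense] = totals.get(sense, 0) - rank
    return max(totals, key=totals.get), totals

winner, totals = rank_combination([
    {"S1": 1, "S2": 2},
    {"S1": 2, "S2": 1},
    {"S1": 1, "S2": 2},
])
print(winner, totals)  # -> S1, {'S1': -4, 'S2': -5}
```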
[Diagram: rank-based combination. C1, C2, C3 assign rank/negated-rank pairs to S1 (1/-1, 2/-2, 1/-1; score -4) and S2 (2/-2, 1/-1, 2/-2; score -5); S1 is the winner sense.]
SEMI-SUPERVISED APPROACHES
Supervised approaches use large amounts of annotated data; semi-supervised approaches use minimal annotated data, greatly reducing the data required.
Unifying thread of operation:
1. Use of minimal annotated corpora.
2. Use of unannotated data for tuning.

Algorithms:
1. Bootstrapping
2. Monosemous Relatives
1. BOOTSTRAPPING

[Figure: an example of Yarowsky's algorithm. At each iteration, new examples are labeled with class a or b and added to the set A of sense-tagged examples. (Courtesy Navigli, 2009)]
UNSUPERVISED APPROACHES

Input data: circles of different sizes and colors, with no associated background knowledge; the implicit features are the size and color of the balls.
• Unsupervised Approach I: clustering based on the size of the balls.
• Unsupervised Approach II: clustering based on the color of the balls.
Hyperlex (1/2)

Example showing a graph for the context of the word वीज (electricity/lightning):
• For each high-density component, the highest-degree node is selected as hub.
• The procedure is iterated by removing the hub with its neighbors.
• For this example, the hubs will be ज्वलन (combustion) and चमक (shine).

[Graph nodes: धन (positive), मुक्तता (discharge), प्रभार (charge), चमक (shine), वादळ (thunder), ऋण (negative), ऊर्जा (energy), उष्णता (heat), इंधन (fuel), वाफ (steam), ज्वलन (combustion), जनित्र (turbine), निर्माण (produce)]
Hyperlex (2/2)

Example: जनित्रे वाफ वापरून वीज प्रभार निर्माण करतात. (Turbines use steam to produce electricity.)

Scores of context words for वीज found using the earlier graph:

Context word       | ज्वलन (combustion) | चमक (shine)
जनित्र (turbine)    | 0.70               | 0.00
वाफ (steam)        | 1.00               | 0.00
निर्माण (produce)   | 0.55               | 0.00
प्रभार (charge)     | 0.00               | 0.75

Total              | 2.25               | 0.75

ज्वलन (combustion) becomes the winner sense in this case.
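A minimal sketch of this final Hyperlex step. The precomputed weights from the table stand in for the graph-distance computation, and the transliterated names are ours; building the co-occurrence graph and selecting hubs are omitted.

```python
# Score each context word against each hub and pick the hub (sense) with the
# highest total. Weights are the values from the table above.
hub_weights = {
    "janitra (turbine)": {"combustion": 0.70, "shine": 0.00},
    "vaph (steam)":      {"combustion": 1.00, "shine": 0.00},
    "nirman (produce)":  {"combustion": 0.55, "shine": 0.00},
    "prabhar (charge)":  {"combustion": 0.00, "shine": 0.75},
}

def hyperlex_winner(context_words):
    totals = {}
    for word in context_words:
        for hub, w in hub_weights.get(word, {}).items():
            totals[hub] = totals.get(hub, 0.0) + w
    return max(totals, key=totals.get), totals

print(hyperlex_winner(list(hub_weights)))
# -> ('combustion', {'combustion': 2.25, 'shine': 0.75})
```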
SUMMARY

Supervised algorithms:
1. Based on human supervision, hence the name.
2. Use corpus evidence instead of relying on knowledge bases.
3. Build classifiers to classify words, where senses are classes.

Semi-supervised algorithms:
4. Use less information than supervised approaches.
5. Create the required information as a part of the algorithm.

Unsupervised algorithms:
6. Cluster instances based on inherent features.
SUMMARY (CONTD.)

Supervised algorithms:
1. Perform better than all other approaches, especially knowledge based ones; e.g., they can pick up clues from components like proper nouns, unlike knowledge based approaches.
2. Depend heavily on large amounts of tagged data.
3. Suffer from data sparsity.

Semi-supervised algorithms:
4. Tend to partially eradicate the knowledge acquisition bottleneck.
5. Work on par with supervised approaches.

Unsupervised algorithms:
6. Performance is good only for a limited set of target words.
REFERENCES
1. AGIRRE, E., AND MARTINEZ, D. Exploring automatic word sense disambiguation with decision lists and the web. In Proceedings of COLING-2000 (2000).
2. BOSER, B. E., GUYON, I. M., AND VAPNIK, V. N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (1992), pp. 144-152.
3. COST, S., AND SALZBERG, S. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10, 1 (1993), 57-78.
4. ESCUDERO, G., MARQUEZ, L., AND RIGAU, G. Naive Bayes and exemplar-based approaches to word sense disambiguation revisited. ArXiv preprint cs/0007011 (2000).
5. FELLBAUM, C., ET AL. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.
6. FREUND, Y., SCHAPIRE, R., AND ABE, N. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14 (1999), 771-780.
7. KHAPRA, M. M., BHATTACHARYYA, P., CHAUHAN, S., NAIR, S., AND SHARMA, A. Domain specific iterative word sense disambiguation in a multilingual setting.
8. KILGARRIFF, A., AND GREFENSTETTE, G. Introduction to the special issue on the web as corpus. Computational Linguistics 29, 3 (2003), 333-347.
9. KILGARRIFF, A., AND YALLOP, C. What's in a thesaurus? In Proceedings of the Second International Conference on Language Resources and Evaluation (2000), pp. 1371-1379.
10. LITTLESTONE, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning 2, 4 (1988), 285-318.
11. MALLERY, J. C. Thinking about foreign policy: Finding an appropriate role for artificially intelligent computers. Master's thesis, MIT Political Science Department, Cambridge (1988).
12. MCCULLOCH, W. S., AND PITTS, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology 5, 4 (1943), 115-133.
13. MILLER, G., BECKWITH, R., FELLBAUM, C., GROSS, D., AND MILLER, K. J. WordNet: An on-line lexical database. International Journal of Lexicography 3, 4 (1990), 235-312.
14. NAVIGLI, R. Word sense disambiguation: A survey. ACM Computing Surveys 41, 2 (2009).
15. NAVIGLI, R., AND VELARDI, P. Learning domain ontologies from document warehouses and dedicated web sites. Computational Linguistics 30, 2 (2004), 151-179.
16. NG, H. T., ET AL. Exemplar-based word sense disambiguation: Some recent improvements. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (1997), pp. 208-213.
17. PEDERSEN, T. A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (2000), pp. 63-69.
18. QUINLAN, J. R. Induction of decision trees. Machine Learning 1, 1 (1986), 81-106.
19. QUINLAN, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
20. ROGET, P. M. Roget's International Thesaurus, 1st ed. Cromwell, New York, 1911.
21. ROTH, D., YANG, M., AND AHUJA, N. A SNoW-based face detector. In Neural Information Processing (2000), vol. 12.
22. SCHAPIRE, R. E., AND SINGER, Y. Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 3 (1999), 297-336.
23. YAROWSKY, D. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (1994), pp. 88-95.
24. YAROWSKY, D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (1995), pp. 189-196.
THANK YOU
APPENDIX
1. WSD: VARIANTS

Lexical sample (targeted WSD): the system is required to disambiguate a restricted set of target words, usually occurring one per sentence. Employs supervised techniques using hand-labeled instances as the training set and an unlabeled test set.

All-words WSD: systems are expected to disambiguate all open-class words in a text (i.e., nouns, verbs, adjectives, and adverbs), i.e., wide-coverage systems. Suffers from the data sparseness problem, as large knowledge sources are not available.
2. COLLOCATION VECTOR

• Set of words around the target word.
• Typically consists of the next word (+1), the next-to-next word (+2), -2, -1 and their POS's:
  [w_{i-2}, POS_{i-2}, w_{i-1}, POS_{i-1}, w_{i+1}, POS_{i+1}, w_{i+2}, POS_{i+2}]
• For example, the sentence "I usually have grilled bass on Sunday" and the target word bass would yield the following vector:
  [have, VB, grilled, ADJ, on, PREP, Sunday, NN]
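A minimal sketch of building exactly this vector; the POS tags are hand-supplied to mirror the slide's example rather than produced by a tagger.

```python
# Build the collocation vector [w-2, POS-2, w-1, POS-1, w+1, POS+1, w+2, POS+2].
def collocation_vector(tagged, i):
    vec = []
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(tagged):
            vec.extend(tagged[j])   # appends word_j, POS_j
    return vec

tagged = [("I", "PRP"), ("usually", "RB"), ("have", "VB"), ("grilled", "ADJ"),
          ("bass", "NN"), ("on", "PREP"), ("Sunday", "NN")]
print(collocation_vector(tagged, 4))
# -> ['have', 'VB', 'grilled', 'ADJ', 'on', 'PREP', 'Sunday', 'NN']
```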
3. DECISION TREES
1. Feature vectors are represented in the form of a tree.
2. The tree is built using the ID3 (C4.5) algorithm.
3. Corresponding to the input sentence, the tree is traversed.
4. The sense at the leaf node reached is the winner sense.
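For flavor, a minimal tree-based WSD sketch assuming scikit-learn is installed. Note that sklearn's DecisionTreeClassifier implements CART rather than the ID3/C4.5 algorithm the slide names; the toy features and senses are ours.

```python
# Train a decision tree on toy context features and traverse it at test time.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

train = [({"prev": "river", "next": "of"},   "bank/RIVER"),
         ({"prev": "my",    "next": "account"}, "bank/FINANCE"),
         ({"prev": "blood", "next": "for"},   "bank/SUPPLY")]

vec = DictVectorizer()
X = vec.fit_transform([f for f, _ in train])
clf = DecisionTreeClassifier().fit(X, [s for _, s in train])

print(clf.predict(vec.transform([{"prev": "river", "next": "of"}])))
# -> ['bank/RIVER']
```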
4. NAÏVE BAYES

Applying Bayes' rule and the naive independence assumption on the features:

$$\hat{s} = \operatorname*{argmax}_{s \in \mathrm{Senses}} \Pr(s) \cdot \prod_{i=1}^{n} \Pr(V_w^i \mid s)$$
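A minimal sketch of this decision rule in log space. The priors and likelihoods are illustrative hand-picked numbers, not estimates from a real corpus.

```python
# Naive Bayes decision rule: argmax over senses of log Pr(s) + sum of log Pr(f|s).
import math

prior = {"bank/FINANCE": 0.7, "bank/RIVER": 0.3}
likelihood = {  # Pr(feature | sense), illustrative values
    "bank/FINANCE": {"account": 0.30, "river": 0.01},
    "bank/RIVER":   {"account": 0.02, "river": 0.40},
}

def naive_bayes(features):
    def log_score(s):
        return math.log(prior[s]) + sum(
            math.log(likelihood[s].get(f, 1e-6)) for f in features)  # smoothing floor
    return max(prior, key=log_score)

print(naive_bayes(["river"]))  # -> 'bank/RIVER'
```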
5. EXEMPLAR BASED APPROACH

• Also known as memory based or instance based learning.
• Unlike other supervised approaches, it builds the classification model by keeping all the training instances in memory.
• Typically implemented using the kNN algorithm: instances are represented as points in feature space.
• New examples are classified by computing the distance to all training set examples; the k nearest neighbors are found, and the class contributing the largest number of neighbors is selected as the winner sense.
EXEMPLAR BASED APPROACH (CONTD.)

The Hamming distance between the points is calculated using:

$$\Delta(x, x_i) = \sum_{j=1}^{m} w_j \, \delta(x_j, x_{ij})$$

where:
• x is the instance to be classified.
• x_i is the i-th training example.
• w_j is the weight of the j-th feature, calculated using the gain ratio measure [Quinlan, 1993] or the modified value difference metric [Cost & Salzberg, 1993].
• δ(x_j, x_{ij}) is zero if x_j = x_{ij} and 1 otherwise.
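A minimal sketch of this weighted Hamming distance, using uniform feature weights for illustration rather than gain-ratio weights.

```python
# Weighted Hamming distance over symbolic feature vectors.
def hamming_distance(x, xi, weights=None):
    weights = weights or [1.0] * len(x)
    # (a != b) is 0 or 1, i.e., the delta function from the formula above.
    return sum(w * (a != b) for w, a, b in zip(weights, x, xi))

print(hamming_distance(["NN", "river", "of"], ["NN", "money", "of"]))  # -> 1.0
```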
6. NEURAL NETWORKS

• WSD is treated as a sequence labeling task.
• The class space is reduced by using WordNet's supersenses instead of actual senses.
• A discriminative HMM is trained using the following features:
  - POS of w as well as POS of neighboring words.
  - Local collocations.
  - Shape of the word and neighboring words, e.g., for s = "Merrill Lynch & Co", shape(s) = Xx*Xx*&Xx.
• Lends itself well to NER, as labels like "person", "location", "time", etc. are included in the supersense tag set.
7. MONOSEMOUS RELATIVES

• Uses the web as a corpus.
• Selects a seed of data from the web; the seed data is minimal.
• Then bootstraps and builds large annotated data.
8. AN ITERATIVE APPROACH TO WSD

• Uses semantic relations (synonymy and hypernymy) from WordNet.
• Extracts collocational and contextual information from WordNet (gloss) and a small amount of tagged data.
• Monosemous words in the context serve as a seed set of disambiguated words.
• In each iteration, new words are disambiguated based on their semantic distance from already disambiguated words.
• It would be interesting to exploit other semantic relations available in WordNet.
9. RESULTS: SUPERVISED
10. RESULTS: SEMI-SUPERVISED
11. RESULTS: HYBRID

[Results tables not preserved in this transcript.]
INTRODUCTION

Q: What is Word Sense Disambiguation (WSD)?

Example: "John has a bank account"
• Target word: bank; context word: account.
• Senses of the word "bank": Domain 1: FINANCE, Domain 2: GEOGRAPHY, Domain 3: SUPPLY.
• Winner sense: FINANCE.

WSD: Definitions
1. Generally: WSD is the ability to identify the sense (meaning) of words in context in a computational manner.
2. Formally: WSD is a mapping A from words to senses, such that A(i) ⊆ SensesD(wi), where SensesD(wi) is the set of senses encoded in a dictionary D for word wi, and A(i) is that subset of the senses of wi which are appropriate in the context T.
3. As a classification problem: senses are classes.
MOTIVATION

1. WSD as the heart of NLP: applications such as MT, NER, SA, SP, SRL, CLIR and TE all depend on it.
2. WSD is an AI-complete problem: it is as hard as the hardest problems in AI, like the representation of common sense.

Abbreviations: SRL: Semantic Role Labeling; TE: Text Entailment; CLIR: Cross Lingual Information Retrieval; NER: Named Entity Recognition; MT: Machine Translation; SP: Shallow Parsing; SA: Sentiment Analysis; WSD: Word Sense Disambiguation.
D. ADABOOST

i. Constructs a strong classifier as a linear combination of two or more weak classifiers.
ii. The method is adaptive because it adjusts the weak classifiers so that they correctly classify previously misclassified instances.
iii. The algorithm iterates m times if there are m classifiers.

STEPS
1. Each instance is assigned an equal weight initially.
2. In each pass of the iteration, the weights of misclassified instances are increased.
3. A value α_j is calculated for each classifier, which is a function of the classification error for classifier C_j.
4. The classifiers are then combined by the function H for instance x. H is the strong classifier: a sign function of the linear combination of the weak classifiers.

$$H(x) = \operatorname{sign}\left(\sum_{j=1}^{m} \alpha_j C_j(x)\right)$$
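A minimal sketch of the final combination H(x): the weak classifiers and the α values are illustrative stand-ins (the boosting iterations that learn them are omitted).

```python
# AdaBoost combination step: weak classifiers output +1/-1, weighted by alpha_j;
# the sign of the weighted sum is the strong classifier's decision.
weak_classifiers = [lambda x: 1 if x > 0.3 else -1,
                    lambda x: 1 if x > 0.5 else -1,
                    lambda x: 1 if x > 0.7 else -1]
alphas = [0.4, 0.8, 0.2]   # illustrative weights, normally learned from error rates

def H(x):
    total = sum(a * c(x) for a, c in zip(alphas, weak_classifiers))
    return 1 if total >= 0 else -1

print(H(0.6), H(0.2))  # -> 1 -1
```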
FUTURE DIRECTIONS

1. Development of better sense recognition systems.
2. Eradication of the knowledge acquisition bottleneck.
3. More attention needs to be paid to domain-specific approaches in WSD.
4. If larger annotated corpora can be built, the accuracy of supervised approaches will increase further.
2. SUPPORT VECTOR MACHINES

• SVM is a binary classifier which finds a hyperplane with the largest margin that separates training examples into 2 classes.
• As SVMs are binary classifiers, a separate classifier is built for each sense of the word.
• Training phase: using a tagged corpus, an SVM is trained for every sense of the word using the features.
• Testing phase: given a test sentence, a test example is constructed using the features and fed as input to each binary classifier.
• The correct sense is selected based on the label returned by each classifier.
• In case of a clash, the sense of the SVM with the higher confidence score is returned.
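A minimal one-SVM-per-sense sketch assuming scikit-learn is installed; LinearSVC trains one-vs-rest classifiers internally, and its decision_function supplies the per-sense confidence scores used to break clashes. The toy features and senses are ours.

```python
# One-vs-rest SVM WSD sketch: one binary classifier per sense.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

train = [({"ctx": "account"}, "FINANCE"), ({"ctx": "money"},  "FINANCE"),
         ({"ctx": "river"},   "RIVER"),   ({"ctx": "water"},  "RIVER"),
         ({"ctx": "blood"},   "SUPPLY"),  ({"ctx": "plasma"}, "SUPPLY")]

vec = DictVectorizer()
X = vec.fit_transform([f for f, _ in train])
clf = LinearSVC().fit(X, [s for _, s in train])   # one-vs-rest under the hood

test = vec.transform([{"ctx": "river"}])
scores = clf.decision_function(test)[0]           # one confidence score per sense
print(dict(zip(clf.classes_, scores)), clf.predict(test))
```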
HYBRID APPROACHES
Knowledge base + human supervision (annotated data) → hybrid approach.

Unifying thread of operation:
1. Combine information obtained from multiple knowledge sources.
2. Use a very small amount of tagged data.

Algorithms:
1. Sense Learner
2. Iterative WSD
1. SENSE LEARNER

• Uses some tagged data to build a semantic language model for words seen in the training corpus.
• Uses WordNet to derive semantic generalizations for words which are not observed in the corpus.

Semantic language model:
• Each training example is represented as a feature vector and a class label, which is a word & sense pair.
• In the testing phase, a similar feature vector is constructed for each test sentence.
• The trained classifier is used to predict the word and the sense.
• If the predicted word is the same as the observed word, the predicted sense is selected as the correct sense.
1. SENSE LEARNER (CONTD.)

Semantic generalizations:
• Uses semantic dependencies from WordNet.
• Labels a more general concept, higher in the WordNet hierarchy, so that more training data can be found.
• For example, if "drink water" is observed in the corpus, then using the hypernymy tree we can derive the dependency "take-in liquid".
• "take-in liquid" can then be used to disambiguate an instance of the word tea, as in "take tea", by using the hypernymy-hyponymy relations.
1. BOOTSTRAPPING

I. Based on Yarowsky's supervised algorithm that uses decision lists.
II. Uses two heuristics:
1. "One sense per discourse": a word is referred to by the same sense within a discourse (document).
2. "One sense per collocation": nearby words provide strong and consistent clues to the sense of a target word.
III. Co-training: the classifiers are alternated between iterations. Self-training: only one classifier is used (Yarowsky).
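A minimal sketch of the self-training loop in Yarowsky's style: start from a small seed of labeled examples, train a classifier, label the unlabeled pool, and absorb only high-confidence predictions each iteration. Here `train` and `classify` are hypothetical stand-ins for a decision-list learner and its scoring function, not the paper's exact procedure.

```python
# Yarowsky-style self-training loop (schematic).
def yarowsky(seed, unlabeled, train, classify, threshold=0.9, iterations=10):
    labeled = list(seed)                        # set A of sense-tagged examples
    for _ in range(iterations):
        model = train(labeled)
        newly_labeled = []
        for example in unlabeled:
            sense, confidence = classify(model, example)
            if confidence >= threshold:         # keep only confident labels
                newly_labeled.append((example, sense))
        if not newly_labeled:
            break                               # nothing confident left; stop
        labeled.extend(newly_labeled)
        absorbed = {x for x, _ in newly_labeled}
        unlabeled = [e for e in unlabeled if e not in absorbed]
    return train(labeled)
```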