Supervised, Semi-supervised and Unsupervised Approaches for Word Sense Disambiguation


TRANSCRIPT

Page 1: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SUPERVISED, SEMI-SUPERVISED AND UNSUPERVISED APPROACHES FOR WORD SENSE DISAMBIGUATION

Slides by Arindam Chatterjee & Salil Joshi
Under the guidance of Prof. Pushpak Bhattacharyya
May 01, 2010

Page 2: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

ROADMAP
1. Bird's Eye View
2. Supervised Approaches
3. Semi-supervised Approaches
4. Unsupervised Approaches
5. Summary

Page 3: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

BIRD'S EYE VIEW

[Diagram: taxonomy of WSD approaches. WSD approaches divide into Machine Learning approaches (Supervised, Semi-supervised, Unsupervised), Knowledge Based approaches, and Hybrid approaches.]

For each class of approaches, we discuss:
• The unifying thread of operation.
• Distinguishing features of the algorithms.

Page 4: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SUPERVISED APPROACHES

Page 5: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SUPERVISED APPROACHES

[Diagram: Training phase: a model is trained from sense-annotated data; the training instances (words) are grouped into classes, where classes = senses. Class 1 (SENSE 1): water, river; Class 2 (SENSE 2): money, finance; Class 3 (SENSE 3): blood, plasma. Testing phase: a new instance such as "money, finance" is classified into one of these classes based on its feature vector, using the model trained from the training data.]

In WSD, classes = senses.

Page 6: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

FEATURE VECTOR FOR WSD

In supervised WSD, the feature vector of a target word w consists of four features:
1. Part of speech (POS) of w.
2. Semantic & syntactic features of w.
3. Collocation vector (set of words around it): typically consists of the next word (+1), the next-to-next word (+2), the words at -2 and -1, and their POS's.
4. Co-occurrence vector (number of times w occurs in a bag of words around it).
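As an aside (not on the slide), feature 4 can be made concrete with a small Python sketch; the window size and the fixed vocabulary are assumptions:

from collections import Counter

def cooccurrence_vector(tokens, target_index, vocab, window=5):
    # Bag of words around the target: `window` words on each side, target excluded.
    lo, hi = max(0, target_index - window), target_index + 1 + window
    bag = Counter(tokens[lo:target_index] + tokens[target_index + 1:hi])
    return [bag[w] for w in vocab]  # count of each vocabulary word in the bag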

Page 7: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SUPERVISED APPROACHES

Unifying thread of operation:
1. Use of annotated corpora.
2. They are all target-word WSD approaches.
3. Representation of words as feature vectors.

Algorithms:
1. Decision List
2. Decision Tree
3. Naïve Bayes
4. Exemplar Based Approach
5. Support Vector Machines
6. Neural Networks
7. Ensemble Methods

Page 8: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. DECISION LISTS

1. Based on the 'one sense per collocation' property: nearby words provide strong and consistent clues to the sense of a target word.
2. A decision list is an ordered set of if-then-else rules: if (feature X) then sense (S_i).
3. Each rule is weighted by a score.
4. In the training phase, the decision list is built from evidence in the corpus.
5. In the testing phase, the sense with the highest score wins.

Page 9: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. DECISION LISTS (CONTD.)

TRAINING PHASE

For a particular word:
1. Features are extracted from the corpus.
2. An ordered decision list of the form {feature-value, sense, score} is created.
3. The score of a sense S_i is the log-likelihood ratio of the sense given the feature:

   score(S_i) = max_f log( P(S_i | f) / Σ_{j≠i} P(S_j | f) )

Page 10: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. DECISION LISTS (CONTD.)

The decision list for the word bank (courtesy Navigli, 2009):

Feature               Prediction      Score
account with bank     bank/FINANCE    4.83
standing in bank      bank/FINANCE    3.35
bank of blood         bank/SUPPLY     2.48
work in bank          bank/FINANCE    2.33
the left river bank   bank/RIVER      1.12
of the bank           -               0.01

Test sentence: I went for a walk along the river bank. The highest-scoring rule whose feature matches the sentence fires; here the river collocation matches, so bank/RIVER is the winner sense.
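To make the testing phase concrete, here is a small illustrative Python sketch (not the slides' code); the rules mirror the table above, with the river rule shortened to the substring "river bank" so that plain substring matching finds it in the test sentence:

rules = [  # (feature, sense, score), from the decision list above
    ("account with bank", "bank/FINANCE", 4.83),
    ("standing in bank", "bank/FINANCE", 3.35),
    ("bank of blood", "bank/SUPPLY", 2.48),
    ("work in bank", "bank/FINANCE", 2.33),
    ("river bank", "bank/RIVER", 1.12),
]

def disambiguate(sentence, rules, default=None):
    # The first rule (in decreasing score order) whose feature occurs wins.
    for feature, sense, score in sorted(rules, key=lambda r: -r[2]):
        if feature in sentence:
            return sense
    return default

print(disambiguate("I went for a walk along the river bank", rules))
# -> bank/RIVER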

Page 11: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

2. SUPPORT VECTOR MACHINES

E.g., if a word has 4 senses, one binary SVM is trained per sense (A = the positive class, B = the remaining senses):

SVM   A    B
1     S1   S2, S3, S4
2     S2   S1, S3, S4
3     S3   S1, S2, S4
4     S4   S1, S2, S3

The distance from the separating hyperplane gives the confidence score for each SVM. The SVM with the highest confidence score becomes the winner sense.
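A compact sketch of this one-vs-rest setup, assuming scikit-learn (the slides name no library) and numpy arrays X (feature vectors) and y (sense labels):

import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_rest(X, y, senses):
    # One binary SVM per sense: that sense (A) vs. all other senses (B).
    return {s: LinearSVC().fit(X, (y == s).astype(int)) for s in senses}

def winner_sense(svms, x):
    # Signed distance to each hyperplane acts as the confidence score.
    scores = {s: float(clf.decision_function(x.reshape(1, -1))[0])
              for s, clf in svms.items()}
    return max(scores, key=scores.get)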

Page 12: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

3. ENSEMBLE METHODS

A collection of classifiers (C1, C2, ..., Cn) is combined to improve the overall accuracy of the WSD system.

[Diagram: the ensemble components (classifiers C1, C2, C3) each score the senses S1 and S2; a score function combines them into Total_Score(S1) and Total_Score(S2).]

For each approach, the score function varies.

Page 13: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

A. MAJORITY VOTING

Here the score function is a vote function: each ensemble component votes for one sense of the target word, and the sense with the largest number of votes is selected as the winner sense:

   Ŝ = argmax_{S_i ∈ Senses_D(w)} |{ j : vote(C_j) = S_i }|

[Diagram: C1, C2, C3 each vote for S1 or S2; the majority sense is the winner sense.]
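A one-function Python sketch of this vote (illustrative):

from collections import Counter

def majority_vote(votes):
    # votes: the sense predicted by each ensemble component.
    return Counter(votes).most_common(1)[0][0]

print(majority_vote(["S1", "S2", "S1"]))  # -> S1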

Page 14: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

B. PROBABILITY MIXTURE

The score function is a confidence score. Each classifier's confidence scores are normalized by that classifier's maximum score:

   P_{C_j}(S_i) = score(C_j, S_i) / max_k score(C_j, S_k)

The normalized scores are summed up, and the sense with the maximum sum is selected as the winner sense (normalized scores rounded to one decimal place):

Classifier   Sense   Confidence score   Normalized score
C1           S1      0.6                0.6/0.6 = 1.0
C1           S2      0.4                0.4/0.6 = 0.7
C2           S1      0.7                0.7/0.7 = 1.0
C2           S2      0.3                0.3/0.7 = 0.4
C3           S1      0.8                0.8/0.8 = 1.0
C3           S2      0.2                0.2/0.8 = 0.3

Total_Score(S1) = 1.0 + 1.0 + 1.0 = 3.0
Total_Score(S2) = 0.7 + 0.4 + 0.3 = 1.4
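The same computation as a short illustrative sketch; the confidence scores are taken from the table above:

def probability_mixture(scores):
    # scores: {classifier: {sense: confidence}}
    totals = {}
    for conf in scores.values():
        top = max(conf.values())
        for sense, c in conf.items():
            totals[sense] = totals.get(sense, 0.0) + c / top
    return max(totals, key=totals.get)

scores = {"C1": {"S1": 0.6, "S2": 0.4},
          "C2": {"S1": 0.7, "S2": 0.3},
          "C3": {"S1": 0.8, "S2": 0.2}}
print(probability_mixture(scores))
# -> S1 (3.0 vs. about 1.35; 1.4 with the slide's rounded scores)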

Page 15: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

B. PROBABILITY MIXTURE (CONTD.)

[Diagram: C1, C2, C3 with confidence/normalized score pairs (C1: 0.6/1.0 for S1, 0.4/0.7 for S2; C2: 0.7/1.0, 0.3/0.4; C3: 0.8/1.0, 0.2/0.3) feeding senses S1 and S2. S1 scores 3.0, S2 scores 1.4, so S1 is the winner sense.]

Page 16: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

C. RANK BASED COMBINATION

The score function is the rank of each sense under each classifier. The ranks are negated and summed up; the sense with the highest sum wins:

   Ŝ = argmax_{S_i ∈ Senses_D(w)} Σ_{j=1}^{m} -Rank_{C_j}(S_i)

Classifier   Sense   Rank   Negated Rank
C1           S1      1      -1
C1           S2      2      -2
C2           S1      2      -2
C2           S2      1      -1
C3           S1      1      -1
C3           S2      2      -2

Total_Score: S1 = (-1) + (-2) + (-1) = -4, S2 = (-2) + (-1) + (-2) = -5
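An illustrative sketch of negated-rank summation, reproducing the totals above:

def rank_combination(rankings):
    # rankings: {classifier: [senses ordered best-first]}
    totals = {}
    for order in rankings.values():
        for rank, sense in enumerate(order, start=1):
            totals[sense] = totals.get(sense, 0) - rank
    return max(totals, key=totals.get)

rankings = {"C1": ["S1", "S2"], "C2": ["S2", "S1"], "C3": ["S1", "S2"]}
print(rank_combination(rankings))  # -> S1 (-4 vs. -5)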

Page 17: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

C. RANK BASED COMBINATION (CONTD.)

[Diagram: C1, C2, C3 with rank/negated-rank pairs (1/-1 and 2/-2) feeding senses S1 and S2. S1 totals -4, S2 totals -5, so S1 is the winner sense.]

Page 18: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SEMI-SUPERVISED APPROACHES

Page 19: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SEMI-SUPERVISED APPROACHES

Where supervised approaches need large amounts of annotated data, semi-supervised approaches use minimal annotated data: the amount of data required is reduced.

Page 20: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SEMI-SUPERVISED APPROACHES

Unifying thread of operation:
1. Use of minimal annotated corpora.
2. Use of unannotated data for tuning.

Algorithms:
1. Bootstrapping
2. Monosemous Relatives

Page 21: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. BOOTSTRAPPING

Page 22: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. BOOTSTRAPPING

[Figure: An example of Yarowsky's algorithm. At each iteration, new examples are labeled with class a or b and added to the set A of sense-tagged examples. Courtesy Navigli, 2009.]
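A skeleton of this iteration in Python, as a sketch under assumptions: train and classify are user-supplied (e.g., a decision-list learner), examples are hashable, and a fixed confidence threshold decides which newly labeled examples join the sense-tagged set A:

def self_train(seed, unlabeled, train, classify, threshold=0.9, max_iters=10):
    # seed: [(example, sense)], the initial sense-tagged set A.
    # train(labeled) -> classifier; classify(clf, x) -> (sense, confidence).
    labeled = list(seed)
    for _ in range(max_iters):
        clf = train(labeled)
        newly = {}
        for x in unlabeled:
            sense, conf = classify(clf, x)
            if conf >= threshold:
                newly[x] = sense              # confidently labeled this round
        if not newly:
            break
        labeled += list(newly.items())        # grow the sense-tagged set A
        unlabeled = [x for x in unlabeled if x not in newly]
    return train(labeled)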

Page 23: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

UNSUPERVISED APPROACHES

Page 24: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

UNSUPERVISED APPROACHES

Input data:
• Circles of different sizes and colors.
• No associated background knowledge.
• The implicit features are the size and color of the balls.

Unsupervised Approach I: clustering based on the size of the balls yields size clusters.
Unsupervised Approach II: clustering based on the color of the balls yields color clusters.

Page 25: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

HYPERLEX (1/2)

Example showing the co-occurrence graph for the context of the word वीज (electricity/lightning).

• For each high-density component, the highest-degree node is selected as a hub.
• The procedure is iterated by removing the hub with its neighbors.
• For this example, the hubs will be ज्वलन (combustion) and चमक (shine).

[Graph nodes: धन (positive), मुक्तता (discharge), प्रभार (charge), चमक (shine), वादळ (thunder), ऋण (negative), ऊर्जा (energy), उष्णता (heat), इंधन (fuel), वाफ (steam), ज्वलन (combustion), जनित्र (turbine), निर्माण (produce).]

Page 26: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

HYPERLEX (2/2)

Example:
जनित्रे वाफ वापरून वीज प्रभार निर्माण करतात.
(Turbines use steam to produce electricity.)

Scores of the context words for वीज, found using the earlier graph:

Context word         ज्वलन (combustion)   चमक (shine)
जनित्र (turbine)       0.70                 0.00
वाफ (steam)           1.00                 0.00
निर्माण (produce)      0.55                 0.00
प्रभार (charge)        0.00                 0.75
Total                 2.25                 0.75

ज्वलन becomes the winner sense in this case.
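An illustrative sketch of this scoring step, with the table's hub-to-context-word scores hard-coded as an assumed dictionary:

hub_scores = {  # score of each context word with respect to each hub
    "ज्वलन": {"जनित्र": 0.70, "वाफ": 1.00, "निर्माण": 0.55, "प्रभार": 0.00},
    "चमक":  {"जनित्र": 0.00, "वाफ": 0.00, "निर्माण": 0.00, "प्रभार": 0.75},
}

def best_hub(context_words, hub_scores):
    # Sum each hub's scores over the context words; the highest total wins.
    totals = {hub: sum(s.get(w, 0.0) for w in context_words)
              for hub, s in hub_scores.items()}
    return max(totals, key=totals.get)

print(best_hub(["जनित्र", "वाफ", "निर्माण", "प्रभार"], hub_scores))
# -> ज्वलन (total 2.25 vs. 0.75)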

Page 27: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SUMMARY

Supervised algorithms:
1. Based on human supervision, hence the name.
2. Use corpus evidence instead of relying on knowledge bases.
3. Build classifiers to classify words, where senses are classes.

Semi-supervised algorithms:
1. Use less information than supervised approaches.
2. Create the required information as a part of the algorithm.

Unsupervised algorithms:
1. Cluster instances based on inherent features.

Page 28: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

SUMMARY

Supervised algorithms:
1. Perform better than all other approaches, especially knowledge based ones. E.g., they can pick up clues from components like proper nouns, unlike knowledge based approaches.
2. Depend heavily on large amounts of tagged data.
3. Suffer from data sparsity.

Semi-supervised algorithms:
1. Tend to partially eradicate the knowledge acquisition bottleneck.
2. Work at par with supervised approaches.

Unsupervised algorithms:
1. Performance is good only for a limited set of target words.

Page 29: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

REFERENCES
1. AGIRRE, E., AND MARTINEZ, D. Exploring automatic word sense disambiguation with decision lists and the web. In Proc. of COLING-2000 (2000).
2. BOSER, B. E., GUYON, I. M., AND VAPNIK, V. N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (1992), pp. 144-152.
3. COST, S., AND SALZBERG, S. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10, 1 (1993), 57-78.
4. ESCUDERO, G., MARQUEZ, L., AND RIGAU, G. Naive Bayes and exemplar-based approaches to word sense disambiguation revisited. Arxiv preprint cs/0007011 (2000).
5. FELLBAUM, C., ET AL. WordNet: An electronic lexical database. MIT Press, Cambridge, MA, 1998.
6. FREUND, Y., SCHAPIRE, R., AND ABE, N. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14 (1999), 771-780.
7. KHAPRA, M. M., BHATTACHARYYA, P., CHAUHAN, S., NAIR, S., AND SHARMA, A. Domain specific iterative word sense disambiguation in a multilingual setting.
8. KILGARRIFF, A., AND GREFENSTETTE, G. Introduction to the special issue on the web as corpus. Computational Linguistics 29, 3 (2003), 333-347.

Page 30: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

REFERENCES (CONTD.)
9. KILGARRIFF, A., AND YALLOP, C. What's in a thesaurus? In Proceedings of the Second International Conference on Language Resources and Evaluation (2000), pp. 1371-1379.
10. LITTLESTONE, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning 2, 4 (1988), 285-318.
11. MALLERY, J. C. Thinking about foreign policy: Finding an appropriate role for artificially intelligent computers. Master's Thesis, MIT Political Science Department, Cambridge (1988).
12. MCCULLOCH, W. S., AND PITTS, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology 5, 4 (1943), 115-133.
13. MILLER, G., BECKWITH, R., FELLBAUM, C., GROSS, D., AND MILLER, K. J. WordNet: an on-line lexical database. International Journal of Lexicography 3, 4 (1990), 235-312.
14. NAVIGLI, R. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2 (2009).
15. NAVIGLI, R., AND VELARDI, P. Learning domain ontologies from document warehouses and dedicated web sites. Computational Linguistics 30, 2 (2004), 151-179.

Page 31: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

REFERENCES (CONTD.)
16. NG, H. T., ET AL. Exemplar-based word sense disambiguation: Some recent improvements. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (1997), pp. 208-213.
17. PEDERSEN, T. A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (2000), pp. 63-69.
18. QUINLAN, J. R. Induction of decision trees. Machine Learning 1, 1 (1986), 81-106.
19. QUINLAN, J. R. C4.5: Programs for machine learning. Morgan Kaufmann, 1993.
20. ROGET, P. M. Roget's International Thesaurus, 1st ed. Cromwell, New York, 1911.
21. ROTH, D., YANG, M., AND AHUJA, N. A SNoW-based face detector. In Neural Information Processing (2000), vol. 12.
22. SCHAPIRE, R. E., AND SINGER, Y. Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 3 (1999), 297-336.
23. YAROWSKY, D. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (1994), pp. 88-95.
24. YAROWSKY, D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (1995), pp. 189-196.

Page 32: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

THANK YOU

Page 33: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

APPENDIX

Page 34: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. WSD: VARIANTS

Lexical Sample (Targeted WSD): the system is required to disambiguate a restricted set of target words, usually occurring one per sentence. Employs supervised techniques using hand-labeled instances as the training set and then an unlabeled test set.

All-words WSD: systems are expected to disambiguate all open-class words in a text (i.e., nouns, verbs, adjectives, and adverbs); these are wide-coverage systems. Suffers from the data sparseness problem, as large knowledge sources are not available.

Page 35: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

2. COLLOCATION VECTOR

• Set of words around the target word.
• Typically consists of the next word (+1), the next-to-next word (+2), the words at -2 and -1, and their POS's:
  [w_i-2, POS_i-2, w_i-1, POS_i-1, w_i+1, POS_i+1, w_i+2, POS_i+2]
• For example, the sentence "I usually have grilled bass on Sunday" and the target word bass would yield the following vector:
  [have, VB, grilled, ADJ, on, PREP, Sunday, NN]
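A minimal illustrative sketch that builds this ±2 collocation vector from a POS-tagged sentence; the tags are hard-coded to match the slide's example:

def collocation_vector(tagged, i):
    # tagged: list of (word, POS) pairs; i: index of the target word.
    feats = []
    for off in (-2, -1, +1, +2):
        j = i + off
        word, pos = tagged[j] if 0 <= j < len(tagged) else (None, None)
        feats += [word, pos]  # pad with None at sentence boundaries
    return feats

tagged = [("I", "PRON"), ("usually", "ADV"), ("have", "VB"),
          ("grilled", "ADJ"), ("bass", "NN"), ("on", "PREP"), ("Sunday", "NN")]
print(collocation_vector(tagged, 4))
# -> ['have', 'VB', 'grilled', 'ADJ', 'on', 'PREP', 'Sunday', 'NN']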

Page 36: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

3. DECISION TREES
1. Feature vectors are represented in the form of a tree.
2. The tree is built using the ID3 (C4.5) algorithm.
3. Corresponding to the input sentence, the tree is traversed.
4. The sense at the leaf node reached is the winner sense.

4. NAÏVE BAYES
Applying Bayes' rule and the naive independence assumption on the features:

   ŝ = argmax_{s ∈ senses} Pr(s) · Π_{i=1}^{n} Pr(v_i | s)

where v_1, ..., v_n are the feature values of the target word w.
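A toy sketch of this rule (illustrative, not the slides' code), with add-one smoothing added as an assumption to avoid zero probabilities:

import math
from collections import Counter, defaultdict

def train_nb(examples):
    # examples: list of (feature_values, sense) pairs.
    prior = Counter(s for _, s in examples)
    cond = defaultdict(Counter)                 # cond[sense][value] = count
    for feats, s in examples:
        cond[s].update(feats)
    return prior, cond, len(examples)

def classify_nb(feats, prior, cond, n):
    best, best_lp = None, float("-inf")
    for s in prior:
        lp = math.log(prior[s] / n)             # log Pr(s)
        total = sum(cond[s].values())
        vocab = len(cond[s]) + 1
        for v in feats:                         # log Pr(v_i | s), smoothed
            lp += math.log((cond[s][v] + 1) / (total + vocab))
        if lp > best_lp:
            best, best_lp = s, lp
    return best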

Page 37: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

5. EXEMPLAR BASED APPROACH

• Also known as the Memory Based or Instance Based Learning approach.
• Unlike other supervised approaches, it builds a classification model by keeping all the training instances in memory.
• Typically implemented using the kNN algorithm.
• Instances are represented as points in feature space.
• New examples are classified by computing the distance to all training set examples.
• The k nearest neighbors are found; the class contributing the largest number of neighbors is selected as the winner sense.

Page 38: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

5. EXEMPLAR BASED APPROACH (CONTD.)

The Hamming distance between the points is calculated using:

   Δ(x, x_i) = Σ_{j=1}^{m} w_j · δ(x_j, x_ij)

where:
• x is the instance to be classified.
• x_i is the i-th training example.
• w_j is the weight of the j-th feature, calculated using the gain ratio measure [Quinlan, 1993] or the modified value difference metric [Cost & Salzberg, 1993].
• δ(x_j, x_ij) is zero if x_j = x_ij and 1 otherwise.
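An illustrative sketch of the weighted Hamming distance and the kNN step; the feature weights are assumed to be given:

from collections import Counter

def hamming(x, xi, weights):
    # Weighted Hamming distance: sum of the weights of differing features.
    return sum(w for a, b, w in zip(x, xi, weights) if a != b)

def knn_sense(x, training, weights, k=3):
    # training: [(feature_vector, sense)]; majority sense among the k nearest.
    nearest = sorted(training, key=lambda t: hamming(x, t[0], weights))[:k]
    return Counter(s for _, s in nearest).most_common(1)[0][0]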

Page 39: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

6. NEURAL NETWORKS

• WSD is treated as a sequence labeling task.
• The class space is reduced by using WordNet's super senses instead of actual senses.
• A discriminative HMM is trained using the following features:
  – POS of w as well as the POS of neighboring words.
  – Local collocations.
  – Shape of the word and neighboring words, e.g. for s = "Merrill Lynch & Co", shape(s) = Xx*Xx*&Xx.
• Lends itself well to NER, as labels like "person", "location", "time" etc. are included in the super sense tag set.

Page 40: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

7. MONOSEMOUS RELATIVES

• Uses the web as corpus.
• Selects a seed of data from the web; the seed data is minimal.
• Then bootstraps and builds large annotated data.

Page 41: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

8. AN ITERATIVE APPROACH TO WSD

• Uses semantic relations (synonymy and hypernymy) from WordNet.
• Extracts collocational and contextual information from WordNet (gloss) and a small amount of tagged data.
• Monosemous words in the context serve as a seed set of disambiguated words.
• In each iteration, new words are disambiguated based on their semantic distance from already disambiguated words.
• It would be interesting to exploit other semantic relations available in WordNet.

Page 42: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

9. RESULTS: SUPERVISED

Page 43: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

10. RESULTS: SEMI-SUPERVISED

Page 44: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

11. RESULTS: HYBRID

Page 45: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

INTRODUCTION

Q: What is Word Sense Disambiguation (WSD)?

Example: "John has a bank account". Senses of the word "bank": Domain 1: FINANCE, Domain 2: GEOGRAPHY, Domain 3: SUPPLY. Target word: bank; context word: account. Winner sense: bank/FINANCE (cued by the context word account).

WSD: Definitions
1. Generally: WSD is the ability to identify the sense (meaning) of words in context in a computational manner.
2. Formally: WSD is a mapping A from words to senses, such that A(i) ⊆ Senses_D(w_i), where Senses_D(w_i) is the set of senses encoded in a dictionary D for word w_i, and A(i) is the subset of the senses of w_i which are appropriate in the context T.
3. As a classification problem: senses are classes.

Page 46: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

MOTIVATION

1. WSD as the heart of NLP: it feeds applications such as MT, NER, SA, SP, SRL, CLIR and TE. [Diagram: WSD at the center, linked to these applications.]

2. WSD is an AI-complete problem: it is as hard as the hardest problems in AI, like the representation of common sense.

SRL: Semantic Role Labeling; TE: Text Entailment; CLIR: Cross Lingual Information Retrieval; NER: Named Entity Recognition; MT: Machine Translation; SP: Shallow Parsing; SA: Sentiment Analysis; WSD: Word Sense Disambiguation.

Page 47: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

D. ADABOOST

i. Constructs a strong classifier as a linear combination of two or more weak classifiers.
ii. The method is adaptive because it adjusts the weak classifiers so that they correctly classify previously misclassified instances.
iii. The algorithm iterates m times if there are m classifiers.

STEPS
1. Each instance is assigned equal weight initially.
2. In each pass of the iteration, the weights of misclassified instances are increased.
3. A value α_j is calculated for each classifier C_j, as a function of its classification error.

Page 48: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

D. ADABOOST (CONTD.)

STEPS (CONTD.)
4. The classifiers are then combined by the function H for instance x:

   H(x) = sign( Σ_{j=1}^{m} α_j · C_j(x) )

• H is the strong classifier: the sign of a linear combination of the weak classifiers.
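An illustrative sketch of steps 1-4, assuming binary labels in {-1, +1} and a pool of candidate weak classifiers to choose from each round (both assumptions):

import math

def adaboost(X, y, weak_pool, rounds):
    # X: feature vectors; y: labels in {-1, +1}.
    # weak_pool: candidate weak classifiers, callables h(x) -> -1 or +1.
    n = len(X)
    w = [1.0 / n] * n                                  # step 1: equal weights
    ensemble = []
    for _ in range(rounds):                            # slide: m iterations
        # choose the weak classifier with the lowest weighted error
        h = min(weak_pool, key=lambda h: sum(
            wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi))
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)        # step 3: alpha_j
        ensemble.append((alpha, h))
        # step 2: raise the weights of misclassified instances, renormalize
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    # step 4: H(x) = sign(sum_j alpha_j * C_j(x))
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1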

Page 49: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

FUTURE DIRECTIONS

1. Development of better sense recognition systems.
2. Eradication of the knowledge acquisition bottleneck.
3. More attention needs to be paid to domain-specific approaches in WSD.
4. If larger annotated corpora can be built, the accuracy of supervised approaches will rise further.

Page 50: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

2. SUPPORT VECTOR MACHINES

• An SVM is a binary classifier which finds the hyperplane with the largest margin that separates the training examples into 2 classes.
• As SVMs are binary classifiers, a separate classifier is built for each sense of the word.
• Training phase: using a tagged corpus, an SVM is trained for every sense of the word using the features.
• Testing phase: given a test sentence, a test example is constructed using the features and fed as input to each binary classifier.
• The correct sense is selected based on the label returned by each classifier.
• In case of a clash, the sense of the SVM with the higher confidence score is returned.

Page 51: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

HYBRID APPROACHES

Page 52: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

HYBRID APPROACHES

[Diagram: a knowledge base combined with human supervision (annotated data) yields the hybrid approach.]

Page 53: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

HYBRID APPROACHES

Unifying thread of operation:
1. Combine information obtained from multiple knowledge sources.
2. Use a very small amount of tagged data.

Algorithms:
1. Sense Learner
2. Iterative WSD

Page 54: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. SENSE LEARNER

• Uses some tagged data to build a semantic language model for words seen in the training corpus.
• Uses WordNet to derive semantic generalizations for words which are not observed in the corpus.

Semantic Language Model
• Each training example is represented as a feature vector and a class label, which is a (word, sense) pair.
• In the testing phase, a similar feature vector is constructed for each test sentence.
• The trained classifier is used to predict the word and the sense.
• If the predicted word is the same as the observed word, the predicted sense is selected as the correct sense.

Page 55: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. SENSE LEARNER (CONTD.)

Semantic Generalizations
• Uses semantic dependencies from WordNet.
• Labels a more general concept, higher in the WordNet hierarchy, so that more training data can be found.
• For example, if "drink water" is observed in the corpus, then using the hypernymy tree we can derive the syntactic dependency "take-in liquid".
• "take-in liquid" can then be used to disambiguate an instance of the word tea, as in "take tea", by using the hypernymy-hyponymy relations.

Page 56: Supervised, semi-supervised and unsupervised approaches for word sense disambiguation

1. BOOTSTRAPPING

I. Based on Yarowsky's supervised algorithm that uses decision lists.
II. Uses two heuristics:
  1. 'One sense per discourse': a word is referred to by the same sense throughout a discourse (document).
  2. 'One sense per collocation': nearby words provide strong and consistent clues to the sense of a target word.
III. Co-training: the classifiers are alternated between iterations. Self-training: only one classifier is used (Yarowsky).