Chapter 20, Part 3: Computational Lexical Semantics

Acknowledgements: these slides include material from Dan Jurafsky, Rada Mihalcea, Ray Mooney, Katrin Erk, and Ani Nenkova

Upload: blake-lynch

Post on 12-Jan-2016


TRANSCRIPT

Page 1

Chapter 20, Part 3

Computational Lexical Semantics

Acknowledgements: these slides include material from Dan Jurafsky, Rada Mihalcea, Ray Mooney, Katrin Erk, and Ani Nenkova

Page 2

Similarity Metrics

• Similarity metrics are useful not just for word sense disambiguation, but also for:
– Finding the topics of documents
– Representing word meanings without reference to a fixed sense inventory

• We will start with dictionary based methods and then look at vector space models

Page 3

Thesaurus-based word similarity

• We could use anything in the thesaurus:
– Meronymy
– Glosses
– Example sentences

• In practice, by “thesaurus-based” we just mean using the is-a/subsumption/hypernym hierarchy

• We can define similarity between words or between senses

Page 4

Path-based similarity

• Two senses are similar if they are nearby in the thesaurus hierarchy (i.e., there is a short path between them)

Page 5

Path-based similarity

• pathlen(c1, c2) = number of edges in the shortest path between the sense nodes c1 and c2

• sim_path(c1, c2) = −log pathlen(c1, c2), so shorter paths give higher similarity

• wordsim(w1, w2) = max over c1 ∈ senses(w1), c2 ∈ senses(w2) of sim_path(c1, c2)
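As a sketch of the definitions above (the hierarchy is a toy stand-in, not real WordNet data): shortest-path length via breadth-first search, with word similarity taken from the closest sense pair, since the maximum-similarity pair is exactly the minimum-path-length pair.

```python
from collections import deque

# Toy is-a hierarchy (child -> parent), loosely modeled on the coin
# example in these slides; the nodes are illustrative, not real WordNet.
PARENT = {
    "coin": "currency", "currency": "medium_of_exchange",
    "medium_of_exchange": "standard", "nickel": "coin", "dime": "coin",
    "budget": "fund", "fund": "money", "money": "medium_of_exchange",
}

def pathlen(c1, c2):
    """Number of edges on the shortest path between sense nodes c1 and c2."""
    # Build an undirected adjacency list from the child->parent edges.
    adj = {}
    for child, parent in PARENT.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    # Breadth-first search from c1 until c2 is reached.
    dist = {c1: 0}
    queue = deque([c1])
    while queue:
        node = queue.popleft()
        if node == c2:
            return dist[node]
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None  # no path

def wordsim(senses1, senses2):
    """Reports the minimum path length, i.e. the most-similar sense pair."""
    return min(pathlen(c1, c2) for c1 in senses1 for c2 in senses2)
```

For example, `pathlen("nickel", "dime")` is 2 (via their common hypernym coin), while `pathlen("nickel", "budget")` is much longer, reflecting the lower similarity.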

Page 6

Problem with basic path-based similarity

• Assumes each link represents a uniform distance

• But some areas of WordNet are more developed than others; coverage depended on the people who created it

• Also, links deep in the hierarchy intuitively span narrower distinctions than links higher up [on slide 4, e.g., nickel to money vs. nickel to standard]

Page 7

Information content similarity metrics

• Let’s define P(c) as:
– The probability that a randomly selected word in a corpus is an instance of concept c
– A word is an instance of a concept if it appears below the concept in the WordNet hierarchy
– We saw this idea when we covered selectional preferences

Page 8

In particular

– If there is a single node that is the ancestor of all nodes, then its probability is 1

– The lower a node in the hierarchy, the lower its probability

– An occurrence of the word dime would count towards the frequency of coin, currency, standard, etc.

Page 9

Information content similarity

• Train by counting in a corpus
– 1 instance of “dime” could count toward the frequency of coin, currency, standard, etc.

• More formally:

P(c) = ( Σ_{w ∈ words(c)} count(w) ) / N

where words(c) is the set of words subsumed by concept c, and N is the total number of words (tokens) in the corpus that are also in the thesaurus
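A minimal sketch of this counting scheme, on a made-up hierarchy with made-up corpus counts: each occurrence of a word counts toward the word’s own concept and all of its ancestors.

```python
# Toy is-a hierarchy (child -> parent) and toy corpus frequencies;
# both are illustrative, not real WordNet or corpus data.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "currency",
          "currency": "standard"}
CORPUS_COUNTS = {"nickel": 3, "dime": 2, "coin": 4, "currency": 1}
N = sum(CORPUS_COUNTS.values())  # tokens that are also in the thesaurus

def concept_counts():
    """Propagate each word's count up to every ancestor concept."""
    counts = {}
    for word, freq in CORPUS_COUNTS.items():
        node = word
        while node is not None:          # the word's concept plus all hypernyms
            counts[node] = counts.get(node, 0) + freq
            node = PARENT.get(node)      # None once we pass the root
    return counts

def P(c):
    """P(c): probability a random (thesaurus) token is an instance of c."""
    return concept_counts()[c] / N
```

With these counts the root gets probability 1, as the slide requires, and lower nodes get lower probabilities (e.g. P("coin") = 9/10, P("nickel") = 3/10).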

Page 10

Information content similarity

WordNet hierarchy augmented with probabilities P(c)

Page 11

Information content: definitions

• Information content: IC(c) = −log P(c)

• Lowest common subsumer LCS(c1, c2): the lowest node in the hierarchy that subsumes (is a hypernym of) both c1 and c2

Page 12

Resnik method

• The similarity between two senses is related to their common information

• The more two senses have in common, the more similar they are

• Resnik: measure the common information as:

– The information content of the lowest common subsumer of the two senses

– sim_resnik(c1, c2) = −log P(LCS(c1, c2))
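A sketch of the Resnik measure on a toy hierarchy. The nodes and probabilities below are illustrative; a real implementation would estimate P(c) from a corpus as described on the previous slides.

```python
import math

# Resnik similarity: sim(c1, c2) = -log2 P(LCS(c1, c2)).
# Toy hierarchy (child -> parent; None marks the root) and toy P values.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "currency",
          "budget": "fund", "fund": "currency", "currency": None}
P = {"nickel": 0.05, "dime": 0.05, "coin": 0.2, "budget": 0.1,
     "fund": 0.3, "currency": 1.0}

def ancestors(c):
    """c together with all of its hypernyms, nearest first."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: the nearest ancestor of c1 that also subsumes c2."""
    anc2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in anc2:
            return a

def sim_resnik(c1, c2):
    return -math.log2(P[lcs(c1, c2)])
```

Note that two senses whose only common subsumer is the root get similarity −log 1 = 0, matching the intuition that they share no information.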

Page 13

Example Use:

• Yaw Gyamfi, Janyce Wiebe, Rada Mihalcea, and Cem Akkaya (2009). Integrating Knowledge for Subjectivity Sense Labeling. HLT-NAACL 2009.


Page 14

What is Subjectivity?

• The linguistic expression of somebody’s opinions, sentiments, emotions, evaluations, beliefs, and speculations (private states)

• This particular use of “subjectivity” was adapted from literary theory (Banfield 1982; Wiebe 1990)

Page 15

Examples of Subjective Expressions

• References to private states
– She was enthusiastic about the plan

• Descriptions
– That would lead to disastrous consequences
– What a freak show

Page 16

Subjectivity Analysis

• Automatic extraction of subjectivity (opinions) from text or dialog

Page 17

Subjectivity Analysis: Applications

• Opinion-oriented question answering: How do the Chinese regard the human rights record of the United States?

• Product review mining: What features of the ThinkPad T43 do customers like and which do they dislike?

• Review classification: Is a review positive or negative toward the movie?

• Tracking sentiments toward topics over time: Is anger ratcheting up or cooling down?

• Etc.

Page 18

Subjectivity Lexicons

• Most approaches to subjectivity and sentiment analysis exploit subjectivity lexicons
– Lists of keywords that have been gathered together because they have subjective uses

• Examples: Brilliant, Difference, Hate, Interest, Love

Page 19

Automatically Identifying Subjective Words

• Much work in this area:
Hatzivassiloglou & McKeown ACL97
Wiebe AAAI00
Turney ACL02
Kamps & Marx 2002
Wiebe, Riloff, & Wilson CoNLL03
Yu & Hatzivassiloglou EMNLP03
Kim & Hovy IJCNLP05
Esuli & Sebastiani CIKM05
Andreevskaia & Bergler EACL06
Etc.

• Subjectivity lexicon available at http://www.cs.pitt.edu/mpqa (entries from several sources)

Page 20

However…

• Consider the keyword “interest”

• It is in the subjectivity lexicon

• But, what about “interest rate,” for example?

Page 21

WordNet Senses

Interest, involvement -- (a sense of concern with and curiosity about someone or something; “an interest in music”)

Interest -- (a fixed charge for borrowing money; usually a percentage of the amount borrowed; “how much interest do you pay on your mortgage?”)

Page 22

WordNet Senses

S: Interest, involvement -- (a sense of concern with and curiosity about someone or something; “an interest in music”)

O: Interest -- (a fixed charge for borrowing money; usually a percentage of the amount borrowed; “how much interest do you pay on your mortgage?”)

Page 23

Senses

• Even in subjectivity lexicons, many senses of the keywords are objective

• Thus, many appearances of keywords in texts are false hits

Page 24

WordNet (Miller 1995; Fellbaum 1998)

Page 25

Examples

• “There are many differences between African and Asian elephants.”

• “… dividing by the absolute value of the difference from the mean …”

• “Their differences only grew as they spent more time together …”

• “Her support really made a difference in my life”

• “The difference after subtracting X from Y …”

Page 26

Our Task: Subjectivity Sense Labeling

• Automatically classifying senses as subjective or objective

• Purpose: exploit the labels to improve
– Word sense disambiguation (Wiebe and Mihalcea ACL06)
– Automatic subjectivity and sentiment analysis systems (Akkaya, Wiebe, Mihalcea 2009, 2010, 2011, 2012, 2014)

Page 27

Subjectivity Tagging using Subjectivity WSD

[Diagram: a Subjectivity or Sentiment Classifier must decide S or O for each occurrence of “difference” below. The SWSD system uses the sense labels (Difference: sense#1 O, sense#2 O, sense#3 S, sense#4 S, sense#5 O), grouped as Sense O = {1, 2, 5} and Sense S = {3, 4}.

“There are many differences between African and Asian elephants.”
“Their differences only grew as they spent more time together …”]

Page 28

Subjectivity Tagging using Subjectivity WSD

[Diagram, continued: the labels are now resolved. The elephant sentence, which uses an objective sense of “difference” (Sense O = {1, 2, 5}), is tagged O; the “differences only grew” sentence, which uses a subjective sense (Sense S = {3, 4}), is tagged S.

“There are many differences between African and Asian elephants.”
“Their differences only grew as they spent more time together …”]

Page 29

Using Hierarchical Structure

[Diagram: a seed sense and a target sense in the hierarchy, with their LCS]

Page 30

Using Hierarchical Structure

[Diagram: voice#1 (objective) in the hierarchy, with its LCS]

Page 31

• If you are interested in the entire approach and experiments, please see the paper (it is on my website)


Page 32

Dekang Lin method

• Intuition: the similarity between A and B depends not just on what they have in common
– Commonality: the more A and B have in common, the more similar they are
– Difference: the more differences between A and B, the less similar they are

• Commonality: IC(common(A, B))

• Difference: IC(description(A, B)) − IC(common(A, B))

Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. ICML

Page 33

Dekang Lin similarity theorem

• The similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are

• Lin (altering Resnik) defines:

sim_lin(c1, c2) = 2 · log P(LCS(c1, c2)) / ( log P(c1) + log P(c2) )

Page 34

Lin similarity function
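A sketch of the Lin function on a toy hierarchy, using the standard formulation sim_lin(c1, c2) = 2·log P(LCS(c1, c2)) / (log P(c1) + log P(c2)); the nodes and probabilities below are illustrative.

```python
import math

# Toy hierarchy (child -> parent; None marks the root) and toy P values.
PARENT = {"nickel": "coin", "dime": "coin", "coin": "currency",
          "currency": None}
P = {"nickel": 0.05, "dime": 0.05, "coin": 0.2, "currency": 1.0}

def ancestors(c):
    """c together with all of its hypernyms, nearest first."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def lcs(c1, c2):
    """Lowest common subsumer of c1 and c2."""
    anc2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in anc2)

def sim_lin(c1, c2):
    # Ratio of (information in the commonality) to (information in the
    # full descriptions); both terms are negative logs, so the ratio is
    # between 0 and 1.
    return 2 * math.log(P[lcs(c1, c2)]) / (math.log(P[c1]) + math.log(P[c2]))
```

A sense compared with itself scores 1.0 (its own description is its commonality), and the score falls toward 0 as the LCS moves up toward the root.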

Page 35

Summary: thesaurus-based similarity between senses

• There are many metrics (you don’t have to memorize these)

Page 36

Using Thesaurus-Based Similarity for WSD

• One specific method (Banerjee & Pedersen 2003):

• For sense k of target word t:
– SenseScore[k] = 0
– For each word w appearing within −N and +N of t:
• For each sense s of w:
– SenseScore[k] += similarity(k, s)

• The sense with the highest SenseScore is assigned to the target word
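The loop above can be sketched as follows. The sense inventory and the sense-to-sense similarity table here are toy stand-ins (not Banerjee & Pedersen’s actual gloss-overlap measure); any of the thesaurus-based similarity functions from the earlier slides could be plugged in.

```python
# Toy sense inventory: each word maps to its list of sense identifiers.
SENSES = {
    "bank": ["bank_river", "bank_money"],
    "water": ["water_liquid"],
    "loan": ["loan_money"],
}
# Toy sense-to-sense similarity scores; unlisted pairs score 0.
SIM = {
    ("bank_river", "water_liquid"): 0.9,
    ("bank_money", "water_liquid"): 0.1,
    ("bank_river", "loan_money"): 0.05,
    ("bank_money", "loan_money"): 0.8,
}

def similarity(s1, s2):
    return SIM.get((s1, s2), SIM.get((s2, s1), 0.0))

def disambiguate(target, context_words):
    """Score each sense of the target against all senses of the context."""
    scores = {}
    for k in SENSES[target]:                 # each sense k of the target t
        scores[k] = 0.0
        for w in context_words:              # each word w within the window
            for s in SENSES.get(w, []):      # each sense s of w
                scores[k] += similarity(k, s)
    return max(scores, key=scores.get)       # highest SenseScore wins
```

With this toy table, “bank” near “water” resolves to the river sense and “bank” near “loan” to the money sense.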


Page 37

Problems with thesaurus-based meaning

• We don’t have a thesaurus for every language
• Even if we do, thesauri have problems with recall
– Many words are missing
– Most (if not all) phrases are missing
– Some connections between senses are missing
– Thesauri work less well for verbs and adjectives
• Adjectives and verbs have less structured hyponymy relations

Page 38

Distributional models of meaning

• Also called vector-space models of meaning
• Offer much higher recall than hand-built thesauri
– Although they tend to have lower precision

• Zellig Harris (1954): “oculist and eye-doctor … occur in almost the same environments…. If A and B have almost identical environments we say that they are synonyms.”

• Firth (1957): “You shall know a word by the company it keeps!”

Page 39

Intuition of distributional word similarity

• Nida example:
A bottle of tesgüino is on the table
Everybody likes tesgüino
Tesgüino makes you drunk
We make tesgüino out of corn

• From the context words, humans can guess that tesgüino means an alcoholic beverage like beer

• Intuition for the algorithm:
– Two words are similar if they have similar word contexts

Page 40

Reminder: Term-document matrix

• Each cell: the count of term t in document d, tf_t,d
– Each document is a count vector: a column below

Page 41

Reminder: Term-document matrix

• Two documents are similar if their vectors are similar


Page 42

The words in a term-document matrix

• Each word is a count vector: a row below


Page 43

The words in a term-document matrix

• Two words are similar if their vectors are similar


Page 44

The Term-Context matrix

• Instead of using entire documents, use smaller contexts
– A paragraph
– A window of 10 words

• A word is now defined by a vector of counts over context words

Page 45

Sample contexts: 20 words (Brown corpus)

• equal amount of sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of clove and nutmeg,

• on board for their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened to that of


• of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of

• substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this

Page 46

Term-context matrix for word similarity

• Two words are similar in meaning if their context vectors are similar
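Cosine similarity is the usual way to compare such context vectors. A sketch, with counts echoing the apricot/pineapple/digital/information contexts sampled on the previous slide (the exact numbers are illustrative):

```python
import math

# Toy term-context matrix: each word is a vector of counts over contexts.
CONTEXTS = ["computer", "data", "pinch", "sugar"]
VEC = {
    "apricot":     [0, 0, 1, 1],
    "pineapple":   [0, 0, 1, 1],
    "digital":     [2, 1, 0, 0],
    "information": [1, 6, 0, 0],
}

def cosine(u, v):
    """Cosine of the angle between vectors u and v (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Here apricot and pineapple (food contexts) come out maximally similar, while apricot and digital share no contexts and score 0.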


Page 47

Should we use raw counts?

• For the term-document matrix
– We used tf-idf instead of raw term counts

• For the term-context matrix
– Positive Pointwise Mutual Information (PPMI) is common

Page 48

Pointwise Mutual Information

• Pointwise mutual information:
– Do events x and y co-occur more often than if they were independent?

PMI(x, y) = log2 [ P(x, y) / ( P(x) P(y) ) ]

• PMI between two words (Church & Hanks 1989):
– Do words x and y co-occur more often than if they were independent?

• Positive PMI between two words (Niwa & Nitta 1994):
– Replace all PMI values less than 0 with zero

Page 49

Computing PPMI on a term-context matrix

• Matrix F with W rows (words) and C columns (contexts)

• f_ij = number of times word w_i occurs in context c_j
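A sketch of the PPMI computation. The toy counts below are chosen to match the worked numbers on the next slides (N = 19 tokens, p(information, data) = 6/19, p(information) = 11/19, p(data) = 7/19), though the full matrix itself is an assumption.

```python
import math

WORDS = ["apricot", "pineapple", "digital", "information"]
CONTEXTS = ["computer", "data", "pinch", "result", "sugar"]
F = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [2, 1, 0, 1, 0],
    [1, 6, 0, 4, 0],
]

def ppmi(F):
    """Positive PMI for every cell of the count matrix F."""
    total = sum(sum(row) for row in F)                       # N (19 here)
    row_p = [sum(row) / total for row in F]                  # p(w_i)
    col_p = [sum(row[j] for row in F) / total
             for j in range(len(F[0]))]                      # p(c_j)
    out = []
    for i, row in enumerate(F):
        out.append([])
        for j, f in enumerate(row):
            p_ij = f / total
            if p_ij > 0:
                val = max(math.log2(p_ij / (row_p[i] * col_p[j])), 0.0)
            else:
                val = 0.0   # zero count: PMI is -inf, clipped to 0
            out[i].append(val)
    return out
```

The (information, data) cell comes out at roughly 0.57, matching the slide’s rounded value of .58.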

Page 50

p(w=information, c=data) = 6/19 = .32

p(w=information) = 11/19 = .58

p(c=data) = 7/19 = .37

Page 51


• pmi(information,data)= log2 (.32/(.37*.58)) =.58

Page 52

Weighing PMI

• PMI is biased toward infrequent events
• Various weighting schemes help alleviate this
– See Turney and Pantel (2010)
– Add-one smoothing can also help

Page 53

Summary: vector space models

• Representing meaning through counts
– Represent a document/sentence/context through its content words

• Proximity in semantic space ~ similarity between words

Page 54

Summary: vector space models

• Uses:
– Search
– Inducing ontologies
– Modeling human judgments of word similarity
– Improving supervised word sense disambiguation
– Word-sense discrimination: cluster words based on their vectors; the clusters may not correspond to any particular sense inventory

Page 55

SenseEval

• Standardized international “competition” on WSD

• Organized by the Association for Computational Linguistics (ACL) Special Interest Group on the Lexicon (SIGLEX)
– Senseval 1: 1998
– Senseval 2: 2001
– Senseval 3: 2004
– Senseval 4: 2007

Page 56

Senseval 1: 1998

• Datasets for
– English
– French
– Italian

• Lexical sample in English
– Nouns: accident, behavior, bet, disability, excess, float, giant, knee, onion, promise, rabbit, sack, scrap, shirt, steering
– Verbs: amaze, bet, bother, bury, calculate, consume, derive, float, invade, promise, sack, scrap, seize
– Adjectives: brilliant, deaf, floating, generous, giant, modest, slight, wooden
– Indeterminate: band, bitter, hurdle, sanction, shake

• Total number of ambiguous English words tagged: 8,448

Page 57

Senseval 1 English Sense Inventory

• Senses from the HECTOR lexicography project

• Multiple levels of granularity
– Coarse grained (avg. 7.2 senses per word)
– Fine grained (avg. 10.4 senses per word)

Page 58

Senseval Metrics

• Fixed training and test sets, the same for each system
• A system can decline to provide a sense tag for a word if it is sufficiently uncertain
• Measured quantities:
– A: number of words assigned senses
– C: number of words assigned correct senses
– T: total number of test words
• Metrics:
– Precision = C/A
– Recall = C/T
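The scoring above can be sketched directly; because a system may decline to tag a word, A ≤ T and precision and recall differ. The example values in the comment are illustrative.

```python
def senseval_scores(assigned, correct, total):
    """Senseval metrics: assigned = A, correct = C, total = T.

    Precision = C/A (quality on the words the system tagged),
    Recall    = C/T (coverage over all test words).
    """
    precision = correct / assigned
    recall = correct / total
    return precision, recall

# e.g. a system that tags 80 of 100 test words and gets 60 right
# has precision 0.75 but recall only 0.60.
```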

Page 59

Senseval 1 Overall English Results

Fine-grained precision (recall) / Coarse-grained precision (recall)

Human lexicographer agreement: 97% (96%) / 97% (97%)
Most-common-sense baseline: 57% (50%) / 63% (56%)
Best system: 77% (77%) / 81% (81%)

Page 60

Senseval 2: 2001

• More languages: Chinese, Danish, Dutch, Czech, Basque, Estonian, Italian, Korean, Spanish, Swedish, Japanese, English

• Includes an “all-words” task as well as lexical sample.

• Includes a “translation” task for Japanese, where senses correspond to distinct translations of a word into another language.

• 35 teams competed with over 90 systems entered.

Page 61

Senseval 2 Results

Page 62

Senseval 2 Results

Page 63

Senseval 2 Results

Page 64

Ensemble Models

• Systems that combine results from multiple approaches seem to work very well

[Diagram: training data feeds System 1, System 2, System 3, …, System n; each produces Result 1 … Result n; the results are combined by weighted voting into a final result]

Page 65

Senseval 3: 2004

• Some new languages: English, Italian, Basque, Catalan, Chinese, Romanian

• Some new tasks
– Subcategorization acquisition
– Semantic role labelling
– Logical form

Page 66

Senseval 3 English Lexical Sample

• Volunteers over the web were used to annotate senses of 60 ambiguous nouns, adjectives, and verbs

• Non-expert lexicographers achieved only 62.8% inter-annotator agreement on fine-grained senses

• The best results were again in the low-70% accuracy range

Page 67

Senseval 3: English All Words Task

• 5,000 words from the Wall Street Journal and the Brown corpus (editorial, news, and fiction)
• 2,212 words tagged with WordNet senses
• Inter-annotator agreement of 72.5% for people with advanced linguistics degrees
– Most disagreements were on a smaller group of difficult words; only 38% of word types had any disagreement at all
• Most-common-sense baseline: 60.9% accuracy
• Best result from the competition: 65% accuracy

Page 68

Other Approaches to WSD

• Active learning
• Unsupervised sense clustering
• Semi-supervised learning (Yarowsky 1995)
– Bootstrap from a small number of labeled examples to exploit unlabeled data
– Exploit “one sense per collocation” and “one sense per discourse” to create the labeled training data

Page 69

Issues in WSD

• What is the right granularity of a sense inventory?

• Integrating WSD with other NLP tasks
– Syntactic parsing
– Semantic role labeling
– Semantic parsing

• Does WSD actually improve performance on some real end-user task?
– Information retrieval
– Information extraction
– Machine translation
– Question answering
– Sentiment analysis