Measuring Similarity Between Contexts and Concepts

Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth
http://www.d.umn.edu/~tpederse


DESCRIPTION

Invited talk from 2005 given at Yahoo and West Publishing...

TRANSCRIPT

Page 1: Measuring Similarity Between Contexts and Concepts

Measuring Similarity Between Concepts and Contexts

Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth
http://www.d.umn.edu/~tpederse

Page 2: Measuring Similarity Between Contexts and Concepts

The problems…

Recognize similar (or related) concepts
  frog : amphibian
  Duluth : snow

Recognize similar contexts
  I bought some food at the store :
  I purchased something to eat at the market

Page 3: Measuring Similarity Between Contexts and Concepts

Similarity and Relatedness

Two concepts are similar if they are connected by is-a relationships.
  A frog is-a-kind-of amphibian
  An illness is-a health_condition

Two concepts can be related in many ways…
  A human has-a-part liver
  Duluth receives-a-lot-of snow

…similarity is one way to be related

Page 4: Measuring Similarity Between Contexts and Concepts

The approaches…

Measure conceptual similarity using a structured repository of knowledge
  Lexical database WordNet

Measure contextual similarity using knowledge-lean methods that are based on co-occurrence information from large corpora

Page 5: Measuring Similarity Between Contexts and Concepts

Why measure conceptual similarity?

A word will take the sense that is most related to the surrounding context.
  I love Java, especially the beaches and the weather.
  I love Java, especially the support for concurrent programming.
  I love java, especially first thing in the morning with a bagel.

Page 6: Measuring Similarity Between Contexts and Concepts

Word Sense Disambiguation

…can be performed by finding the sense of a word most related to its neighbors.

Here, we define similarity and relatedness with respect to WordNet.
  WordNet::Similarity
    http://wn-similarity.sourceforge.net
  WordNet::SenseRelate
    AllWords – assign a sense to every content word
    TargetWord – assign a sense to a given word
    http://senserelate.sourceforge.net

Page 7: Measuring Similarity Between Contexts and Concepts

SenseRelate

For each sense of a target word in context
  For each content word in the context
    For each sense of that content word
      Measure similarity/relatedness between the sense of the target word and the sense of the content word with WordNet::Similarity
      Keep a running sum for the score of each sense of the target

Pick the sense of the target word with the highest score with the words in context.
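The nested loop above can be sketched as follows. Here `senses` and `relatedness` are hypothetical stand-ins for WordNet sense lookup and a WordNet::Similarity measure; in the sketch they are a toy dictionary and function so the code is runnable.

```python
# Sketch of the SenseRelate loop: score each sense of the target by its
# summed relatedness to every sense of every context word.

def sense_relate(target, context_words, senses, relatedness):
    """Return the sense of `target` with the highest total relatedness
    to the senses of the surrounding content words."""
    best_sense, best_score = None, float("-inf")
    for t_sense in senses[target]:
        score = 0.0
        for word in context_words:
            for c_sense in senses.get(word, []):
                score += relatedness(t_sense, c_sense)
        if score > best_score:
            best_sense, best_score = t_sense, score
    return best_sense

# Toy example: "java" near "bagel" should resolve to the coffee sense.
senses = {"java": ["java#coffee", "java#island"], "bagel": ["bagel#food"]}
rel = lambda a, b: 1.0 if (a == "java#coffee" and b == "bagel#food") else 0.1
sense_relate("java", ["bagel"], senses, rel)  # → "java#coffee"
```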

Page 8: Measuring Similarity Between Contexts and Concepts

WordNet::Similarity

Path based measures
  Shortest path (path)
  Wu & Palmer (wup)
  Leacock & Chodorow (lch)
  Hirst & St-Onge (hso)

Information content measures
  Resnik (res)
  Jiang & Conrath (jcn)
  Lin (lin)

Gloss based measures
  Banerjee and Pedersen (lesk)
  Patwardhan and Pedersen (vector, vector_pairs)

Page 9: Measuring Similarity Between Contexts and Concepts

[Figure: a fragment of an is-a hierarchy containing object, artifact, instrumentality, conveyance, vehicle, motor-vehicle, car, watercraft, boat, ark, article, ware, table-ware, cutlery, and fork; from Jiang and Conrath (1997)]

Page 10: Measuring Similarity Between Contexts and Concepts

Path Finding

Find the shortest is-a path between two concepts
  Rada et al. (1989)

Scaled by the depth of the hierarchy
  Leacock & Chodorow (1998)

Depth of the subsuming concept scaled by the sum of the depths of the individual concepts
  Wu and Palmer (1994)
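As a sketch, the three path-based scores can be written out under their standard formulations; path lengths, concept depths, the depth of the least common subsumer, and the maximum taxonomy depth D are assumed to be precomputed from the is-a hierarchy.

```python
import math

# Sketch of the path-based measures named above, given precomputed
# quantities from the is-a hierarchy.

def path_similarity(path_len):
    # Rada et al.: shorter path, higher score
    return 1.0 / path_len

def lch_similarity(path_len, max_depth):
    # Leacock & Chodorow: shortest path scaled by taxonomy depth D
    return -math.log(path_len / (2.0 * max_depth))

def wup_similarity(depth_lcs, depth_c1, depth_c2):
    # Wu & Palmer: depth of the subsuming concept scaled by the
    # sum of the depths of the two concepts
    return 2.0 * depth_lcs / (depth_c1 + depth_c2)
```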

Page 11: Measuring Similarity Between Contexts and Concepts

[Figure: the same is-a hierarchy fragment, repeated from Jiang and Conrath (1997)]

Page 12: Measuring Similarity Between Contexts and Concepts

Information Content

Measure of specificity in an is-a hierarchy (Resnik, 1995)
  -log (probability of concept)
  High information content values mean very specific concepts (like pitch-fork and basketball shoe)

Count how often a concept occurs in a corpus
  Increment the count associated with that concept, and propagate the count up!
  If based on word forms, increment all concepts associated with that form
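The counting scheme can be sketched as follows, with a toy `hypernym` map standing in for the is-a hierarchy; each observation credits the concept and every ancestor up to the root.

```python
import math

# Sketch of concept counting with upward propagation, and the
# resulting information content.

def observe(concept, counts, hypernym):
    """Credit `concept` and every ancestor up to the root."""
    while concept is not None:
        counts[concept] = counts.get(concept, 0) + 1
        concept = hypernym.get(concept)

def information_content(concept, counts, root="*root*"):
    # IC = -log p(concept); the worked example on pages 13-15
    # appears to use base-10 logs.
    return -math.log10(counts[concept] / counts[root])
```

With the counts shown on page 15 (motor vehicle 329, *root* 32785), `information_content` gives ≈ 1.998, matching the slide.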

Page 13: Measuring Similarity Between Contexts and Concepts

Observed “car”...

  *root* (32783 + 1)
  motor vehicle (327 + 1)
  car (73 + 1)
  cab (23)
  minicab (6)
  bus (17)
  stock car (12)

Page 14: Measuring Similarity Between Contexts and Concepts

Observed “stock car”...

  *root* (32784 + 1)
  motor vehicle (328 + 1)
  car (74 + 1)
  cab (23)
  minicab (6)
  bus (17)
  stock car (12 + 1)

Page 15: Measuring Similarity Between Contexts and Concepts

After Counting Concepts...

  *root* (32785)
  motor vehicle (329)   IC = 1.998
  car (75)
  cab (23)
  minicab (6)
  bus (17)
  stock car (13)   IC = 3.402

Page 16: Measuring Similarity Between Contexts and Concepts

Similarity and Information Content

Resnik (1995) uses the information content of the least common subsumer to express similarity between two concepts.

Lin (1998) scales the information content of the least common subsumer by the sum of the information content of the two concepts.

Jiang & Conrath (1997) find the difference between the least common subsumer's information content and the sum of the two individual concepts.
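The three IC-based scores can be sketched directly from those descriptions, given the information content of two concepts and of their least common subsumer (lcs); the formulas follow the standard definitions used in WordNet::Similarity.

```python
# Sketch of the information-content measures described above.

def res_similarity(ic_lcs):
    # Resnik: similarity is the IC of the least common subsumer
    return ic_lcs

def lin_similarity(ic_lcs, ic_c1, ic_c2):
    # Lin: IC of the lcs scaled by the sum of the concepts' IC
    return 2.0 * ic_lcs / (ic_c1 + ic_c2)

def jcn_distance(ic_lcs, ic_c1, ic_c2):
    # Jiang & Conrath: a distance (smaller means more similar)
    return ic_c1 + ic_c2 - 2.0 * ic_lcs
```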

Page 17: Measuring Similarity Between Contexts and Concepts

Why doesn't this solve the problem?

Concepts must be organized in a hierarchy, and connected in that hierarchy.
  Limited to comparing nouns with nouns, or maybe verbs with verbs
  Limited to similarity measures (is-a)

What about mixed parts of speech?
  Murder (noun) and horrible (adjective)
  Tobacco (noun) and drinking (verb)

Page 18: Measuring Similarity Between Contexts and Concepts

Using Dictionary Glosses to Measure Relatedness

Lesk (1986) Algorithm – measure relatedness of two concepts by counting the number of shared words in their definitions:

  Cold – a mild viral infection involving the nose and respiratory passages (but not the lungs)
  Flu – an acute febrile highly contagious viral disease

Adapted Lesk (Banerjee & Pedersen, 2003) – expand glosses to include those of concepts directly related:

  Cold – a common cold affecting the nasal passages and resulting in congestion and sneezing and headache; mild viral infection involving the nose and respiratory passages (but not the lungs); a disease affecting the respiratory system
  Flu – an acute and highly contagious respiratory disease of swine caused by the orthomyxovirus thought to be the same virus that caused the 1918 influenza pandemic; an acute febrile highly contagious viral disease; a disease that can be communicated from one person to another
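The core overlap count can be sketched in a few lines. Tokenization here is a simple lowercase split; the real measure also scores multi-word overlaps and filters function words, which this sketch omits.

```python
# Sketch of the Lesk overlap idea: relatedness of two concepts as the
# number of word types shared by their glosses.

def lesk_overlap(gloss1, gloss2):
    words1 = set(gloss1.lower().split())
    words2 = set(gloss2.lower().split())
    return len(words1 & words2)

cold = "a mild viral infection involving the nose and respiratory passages"
flu = "an acute febrile highly contagious viral disease"
lesk_overlap(cold, flu)  # → 1 ("viral" is the only shared word)
```

With the unexpanded glosses above the overlap is tiny, which is exactly the motivation for gloss expansion.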

Page 19: Measuring Similarity Between Contexts and Concepts

Context/Gloss Vectors

Leskian approaches require exact matches in glosses.
  Glosses are short, and may use related but not identical words.

Solution? Expand glosses by replacing each content word with a co-occurrence vector derived from corpora.
  Rows are words found in glosses, columns represent their co-occurring words in a corpus, cell values are their log-likelihood.
  Average the word vectors to create a single vector that represents the gloss/sense.
  Patwardhan & Pedersen, 2003

Measure relatedness using cosine rather than exact match!
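The gloss-vector step can be sketched as follows; the co-occurrence table here is a toy hand-made dictionary (real cell values would be corpus log-likelihood scores), and vectors are sparse dicts.

```python
import math

# Sketch: a gloss is the average of its words' co-occurrence vectors,
# and relatedness is the cosine between two gloss vectors.

def gloss_vector(words, cooc):
    """Average the co-occurrence vectors of the known words in a gloss."""
    known = [w for w in words if w in cooc]
    if not known:
        return {}
    dims = {d for w in known for d in cooc[w]}
    return {d: sum(cooc[w].get(d, 0.0) for w in known) / len(known)
            for d in dims}

def cosine(u, v):
    dot = sum(u.get(d, 0.0) * v.get(d, 0.0) for d in set(u) | set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Because cosine rewards shared co-occurrence dimensions rather than identical words, two short glosses with no words in common can still score as related.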

Page 20: Measuring Similarity Between Contexts and Concepts

2020

Gloss/Context VectorsGloss/Context Vectors

Page 21: Measuring Similarity Between Contexts and Concepts

Experiment

Senseval-2 data consists of 73 nouns, verbs, and adjectives, approximately 8,600 “training” examples and 4,300 “test” examples.
  Best supervised system: 64%
  SenseRelate: 53% (lesk, vector)
  Most frequent sense: 48%

Page 22: Measuring Similarity Between Contexts and Concepts

Results

SenseRelate achieves disambiguation accuracy better than the most frequent sense!
  This is more unusual than you would think.

The window of context is defined by position: it includes 2 content words to both the left and right, which are measured against the word being disambiguated.
  Positional proximity is not always associated with semantic similarity.

Page 23: Measuring Similarity Between Contexts and Concepts

Why this doesn't solve the problem…

WordNet
  Nouns – 80,000 concepts
  Verbs – 13,000 concepts
  Adjectives – 18,000 concepts
  Adverbs – 4,000 concepts

Words not found in WordNet can't be disambiguated by SenseRelate.

Page 24: Measuring Similarity Between Contexts and Concepts

Knowledge Lean Methods

Can measure the similarity between two words by comparing co-occurrence vectors created for each.

Can measure the similarity of two contexts by representing them as 2nd order co-occurrence vectors and comparing.

Page 25: Measuring Similarity Between Contexts and Concepts

Word Sense Discrimination

Cluster different senses of words like line or interest based on contextual similarity.
  Pedersen & Bruce, 1997
  Schutze, 1998
  Purandare & Pedersen, 2004

Hard to evaluate: senses of words are somewhat ill defined, and distinctions made by clustering methods may or may not correspond with human intuitions.

http://senseclusters.sourceforge.net

Page 26: Measuring Similarity Between Contexts and Concepts

Name Discrimination

Names that occur in similar contexts may refer to the same person.
  George Miller is an eminent psychologist.
  George Miller is one of the founders of modern cognitive science.
  George Miller is a member of the US House of Representatives.

Page 27: Measuring Similarity Between Contexts and Concepts

Page 28: Measuring Similarity Between Contexts and Concepts

Page 29: Measuring Similarity Between Contexts and Concepts

Page 30: Measuring Similarity Between Contexts and Concepts

Page 31: Measuring Similarity Between Contexts and Concepts

Objective

Given some number of contexts containing “John Smith”, identify those that are similar to each other.

Group similar contexts together; assume they are associated with a single individual.

Generate an identifying label from the content of the different clusters.

Page 32: Measuring Similarity Between Contexts and Concepts

Similarity of Context? Second order Co-occurrences

He drives his car fast / Jim speeds in his auto

Car -> motor, garage, gasoline, insurance
Auto -> motor, insurance, gasoline, accident

Car and Auto occur with many of the same words. They are therefore similar!

A less direct relationship, more resistant to sparsity!
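The car/auto example can be sketched directly: compare the two words' first-order co-occurrence sets rather than the words themselves. Jaccard overlap is used here as a simple stand-in for the cosine over log-likelihood vectors used later in the talk.

```python
# Sketch of the second-order idea: "car" and "auto" never occur
# together, but their co-occurrence sets overlap heavily.

def second_order_similarity(cooc_a, cooc_b):
    """Jaccard overlap between two words' co-occurrence sets."""
    return len(cooc_a & cooc_b) / len(cooc_a | cooc_b)

car = {"motor", "garage", "gasoline", "insurance"}
auto = {"motor", "insurance", "gasoline", "accident"}
second_order_similarity(car, auto)  # → 0.6 (3 shared of 5 total words)
```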

Page 33: Measuring Similarity Between Contexts and Concepts

Feature Selection

Bigrams – two word sequences that may have one intervening word between them
  Frequency > 1
  Log-likelihood ratio > 3.841
  OR stop list

Must occur within Ft positions of the target; Ft is typically set to 5 or 20.
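The log-likelihood test above can be sketched as follows: for each candidate bigram, build the 2x2 contingency table of (word1 present?, word2 present?) counts and compute the log-likelihood ratio G²; 3.841 is the 95% chi-squared critical value with one degree of freedom. The contingency counts are assumed to be gathered elsewhere.

```python
import math

# Sketch of bigram feature selection by log-likelihood ratio.

def log_likelihood_ratio(n11, n12, n21, n22):
    """G^2 for a 2x2 contingency table of observed bigram counts,
    where n11 is the count of the bigram itself."""
    n = n11 + n12 + n21 + n22
    row = [n11 + n12, n21 + n22]
    col = [n11 + n21, n12 + n22]
    total = 0.0
    for obs, r, c in [(n11, 0, 0), (n12, 0, 1), (n21, 1, 0), (n22, 1, 1)]:
        expected = row[r] * col[c] / n
        if obs > 0:
            total += obs * math.log(obs / expected)
    return 2.0 * total

def keep_bigram(n11, n12, n21, n22, min_freq=1, critical=3.841):
    """Keep a bigram if it is frequent enough and its words are
    associated more strongly than chance."""
    return n11 > min_freq and log_likelihood_ratio(n11, n12, n21, n22) > critical
```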

Page 34: Measuring Similarity Between Contexts and Concepts

Second Order Context Representation

Bigrams used to create a matrix
  Cell values = log-likelihood of the word pair
  Rows are the co-occurrence vector for a word

Represent a context by averaging the vectors of the words in that context.
  The context includes the Cxt positions around the target, where Cxt is typically 5 or 20.
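The averaging step can be sketched with a hypothetical toy vocabulary and matrix; real cell values would be bigram log-likelihood scores, and the vocabulary would come from the feature selection step.

```python
import numpy as np

# Sketch: rows of a word-by-feature log-likelihood matrix are
# co-occurrence vectors; a context is the average of the rows of
# the words it contains.

vocab = {"won": 0, "oscar": 1, "guy": 2}
matrix = np.array([[8.55, 24.98, 0.00, 0.00],
                   [0.00, 29.58, 36.04, 0.00],
                   [0.00, 34.51, 5.55, 18.19]])

def context_vector(context_words):
    """Average the co-occurrence row vectors of the known context words."""
    rows = [matrix[vocab[w]] for w in context_words if w in vocab]
    return np.mean(rows, axis=0)
```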

Page 35: Measuring Similarity Between Contexts and Concepts

2nd Order Context Vectors

He won an Oscar, but Tom Hanks is still a nice guy.

[Table: co-occurrence vectors for the context words “won”, “Oscar”, and “guy”, and their average (the O2 context vector), over the features baseball, football, actor, movie, war, family, and needle; the cell values are not recoverable from this transcript]

Page 36: Measuring Similarity Between Contexts and Concepts

Limitations of 2nd order

[Table: co-occurrence vectors for “Kill”, “Murder”, “Destroy”, “Fire”, “Shoot”, “Missile”, and “Weapon” over features including Burn, CD, Fire, Pipe, Bomb, Command, and Execute; the cell values are not recoverable from this transcript]

Page 37: Measuring Similarity Between Contexts and Concepts

Singular Value Decomposition

What it does (for sure):
  Smoothes out zeroes
  Finds Principal Components

What it might do:
  Capture Polysemy
  Word Space to Semantic Space
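The reduction step can be sketched as a rank-k truncated SVD of the context-by-feature matrix; the matrix values below are toy numbers.

```python
import numpy as np

# Sketch of dimensionality reduction by truncated SVD: keep only the
# top k singular values, which smooths zero cells and retains the
# principal components.

def truncated_svd(matrix, k):
    """Return the rank-k reconstruction of `matrix`."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

m = np.array([[2.0, 0.0, 1.0],
              [1.8, 0.0, 0.9],
              [0.0, 3.0, 0.1]])
reduced = truncated_svd(m, 2)  # zero cells may become small nonzeros
```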

Page 38: Measuring Similarity Between Contexts and Concepts

After context representation…

The second order vector is an average of the word vectors that make up the context, and captures indirect relationships.
  Reduced by SVD to principal components

Now, cluster the vectors!
  We use the method of repeated bisections
  CLUTO
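Repeated bisections can be sketched as below; this is a minimal stand-in for CLUTO, not its actual criterion-driven implementation: start with all vectors in one cluster and repeatedly split the largest cluster in two with a simple 2-means step until k clusters remain.

```python
import numpy as np

# Sketch of clustering by repeated bisections.

def two_means(vectors, iters=10, seed=0):
    """Split `vectors` into two groups with a few k-means iterations."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), 2, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmin([np.linalg.norm(v - c) for c in centers])
                           for v in vectors])
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return labels

def repeated_bisections(vectors, k):
    """Repeatedly bisect the largest cluster until k clusters remain."""
    clusters = [np.arange(len(vectors))]
    while len(clusters) < k:
        clusters.sort(key=len)
        idx = clusters.pop()                      # largest cluster
        labels = two_means(vectors[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters
```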

Page 39: Measuring Similarity Between Contexts and Concepts

Evaluation (before mapping)

[Table: counts of contexts by discovered cluster (C1–C4) and underlying name; the cell values are not recoverable from this transcript]

Page 40: Measuring Similarity Between Contexts and Concepts

Evaluation (after mapping)

[Table: the same counts after each cluster is mapped to its best-matching name]

Page 41: Measuring Similarity Between Contexts and Concepts

Majority Sense Classifier

Page 42: Measuring Similarity Between Contexts and Concepts

Experimental Data

Created from the AFE GigaWord corpus
  170,969,000 words
  May 1994 - May 1997
  December 2001 - June 2002

Created name conflated pseudo words
  25 words to the left and right of the target

Page 43: Measuring Similarity Between Contexts and Concepts

Name Conflated Data

  Name            Count      Name                 Count      New      Total      Maj.
  Japan           118,712    France               112,357    JapAnce  231,069    51.4%
  Jordan          25,539     Egyptian             21,762     JorGypt  46,431     53.9%
  Shimon Peres    7,846      Slobodan Milosevic   6,176      MonSlo   13,734     56.0%
  Microsoft       3,401      IBM                  2,406      MSIIBM   5,807      58.6%
  Tajik           3,002      Rolf Ekeus           1,071      JikRol   4,073      73.7%
  Ronaldo         1,652      David Beckham        740        RoBeck   2,452      69.3%

Page 44: Measuring Similarity Between Contexts and Concepts

                              Cxt 5             Cxt 20
  Name      #         Maj.    Ft 5     Ft 20    Ft 5     Ft 20
  RoBeck    2,452     69.3    57.3     72.7     85.9     54.7
  JikRol    4,073     73.7    94.7     96.2     91.0     90.4
  MSIIBM    5,807     58.6    47.7     51.3     68.0     60.0
  MonSlo    13,734    56.0    62.8     96.6     54.6     91.4
  JorGypt   46,431    53.9    56.6     59.1     57.0     53.0
  JapAnce   231,069   51.4    51.1     51.1     50.3     50.3

Page 45: Measuring Similarity Between Contexts and Concepts

Conclusions

Tradeoff between size of context and feature selection space
  Context small – Feature large: narrow window around the target word where many possible features are represented
  Context large – Feature small: large window around the target word where a selective set of features is represented

SVD didn't help/hurt
  Results shown are without SVD

Page 46: Measuring Similarity Between Contexts and Concepts

Ongoing work

Creating Path Finding Measures of Relatedness

Stopping Clustering Automatically

Cluster labeling

…Bring together finding conceptual similarity and contextual similarity

Page 47: Measuring Similarity Between Contexts and Concepts

Thanks to…

WordNet::Similarity and SenseRelate
  http://wn-similarity.sourceforge.net
  http://senserelate.sourceforge.net
  Siddharth Patwardhan
  Satanjeev Banerjee
  Jason Michelizzi

SenseClusters
  http://senseclusters.sourceforge.net
  Anagha Kulkarni
  Amruta Purandare