The Semantic Quilt

Posted on 11-May-2015


DESCRIPTION

A talk on the Semantic Quilt, which combines various methods of "doing semantics" into a more unified framework.

TRANSCRIPT

1. The Semantic Quilt: Contexts, Co-occurrences, Kernels, and Ontologies
   Ted Pedersen, University of Minnesota, Duluth
   http://www.d.umn.edu/~tpederse

2. Create by stitching together

3. Sew together different materials

4. [The quilt: Ontologies, Co-Occurrences, Kernels, Contexts]

5. Semantics in NLP
   - Potentially useful for many applications
     - Machine Translation
     - Document or Story Understanding
     - Text Generation
     - Web Search
   - Can come from many sources
   - Not well integrated
   - Not well defined?

6. What do we mean by "semantics"? It depends on our resources
   - Ontologies: relationships among concepts
     - similar / related concepts are connected
   - Dictionary: definitions of senses / concepts
     - similar / related senses have similar / related definitions
   - Contexts: short passages of words
     - similar / related words occur in similar / related contexts
   - Co-occurrences
     - a word is defined by the company it keeps
     - words that occur with the same kinds of words are similar / related

7. What level of granularity?
   - words
   - terms / collocations
   - phrases
   - sentences
   - paragraphs
   - documents
   - books

8. The Terrible Tension: Ambiguity versus Granularity
   - Words are potentially very ambiguous
     - but we can list them (sort of)
     - we can define their meanings (sort of)
     - often not ambiguous to a human reader, but hard for a computer to know which meaning is intended
   - Terms / collocations are less ambiguous
     - difficult to enumerate because there are so many, but it can be done for a domain (e.g., medicine)
   - Phrases (short contexts) can still be ambiguous, but not to the same degree as words or terms / collocations

9. The Current State of Affairs
   - Most resources and methods focus on word or term semantics
     - makes it possible to build resources (manually or automatically) with reasonable coverage, but
     - techniques become very resource dependent
     - resources become language dependent
     - introduces a lot of ambiguity
     - not clear how to bring the resources together
   - Similarity is a useful organizing principle, but
     - there are lots of ways to be similar

10. 
Similarity as Organizing Principle
   - Measure word association using knowledge-lean methods based on co-occurrence information from large corpora
   - Measure contextual similarity using knowledge-lean methods based on co-occurrence information from large corpora
   - Measure conceptual similarity / relatedness using a structured repository of knowledge
     - the lexical database WordNet
     - the Unified Medical Language System (UMLS)

11. Things we can do now
   - Identify associated words
     - fine wine
     - baseball bat
   - Identify similar contexts
     - I bought some food at the store
     - I purchased something to eat at the market
   - Assign meanings to words
     - I went to the bank [financial institution] to deposit my check
   - Identify similar (or related) concepts
     - frog : amphibian
     - Duluth : snow

12. Things we want to do
   - Integrate different resources and methods
   - Solve bigger problems
     - some of what we do now is a means to an unclear end
   - Be language independent
   - Offer broad coverage
   - Reduce dependence on manually built resources
     - ontologies, dictionaries, labeled training data

13. Semantic Patches to Sew Together
   - Contexts
     - SenseClusters: measures similarity between written texts (i.e., contexts)
   - Co-Occurrences
     - Ngram Statistics Package: measures association between words; identifies collocations or terms
   - Kernels
     - WSD-Shell: supervised learning for word sense disambiguation, in the process of adding SVMs with user-defined kernels
   - Ontologies
     - WordNet-Similarity: measures similarity between concepts found in WordNet
     - UMLS-Similarity
   - All of these are projects at the University of Minnesota, Duluth

14. [The quilt: Ontologies, Co-Occurrences, Kernels, Contexts]

15. Ngram Statistics Package (Co-Occurrences): http://ngram.sourceforge.net

16. Things we can do now (same list as slide 11)

17. Co-occurrences and semantics?
   - individual words (especially common ones) are very ambiguous
     - bat
     - line
   - pairs of words disambiguate each other
     - baseball bat
     - vampire ... Transylvania
     - product line
     - speech ... line

18. Why pairs of words?
   - Zipf's Law
     - most words are rare; most bigrams are even rarer; most ngrams are rarer still
     - the more common a word, the more senses it will have
   - Co-occurrences are less frequent than individual words, and they tend to be less ambiguous as a result
     - mutually disambiguating

19. Bigrams
   - Window size of 2: baseball bat, fine wine, apple orchard, bill clinton
   - Window size of 3: house of representatives, bottle of wine
   - Window size of 4: president of the republic, whispering in the wind
   - Selected using a small window size (2-4 words)
   - The objective is to capture a regular or localized pattern between two words (a collocation?)
   - If order doesn't matter, then these are co-occurrences
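The window idea on slide 19 is easy to make concrete. Below is a minimal sketch in plain Python (not part of the Ngram Statistics Package itself) of counting word pairs that fall within a small window; the example sentence and whitespace tokenization are invented for illustration.

```python
# Hypothetical illustration of slide 19: count word pairs that occur within a
# small window (window=2 keeps only adjacent words, i.e., ordinary bigrams;
# window=3 also pairs words separated by one intervening word, and so on).
from collections import Counter

def window_pairs(tokens, window=2):
    """Count unordered word pairs whose positions differ by less than `window`."""
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window]:
            pairs[tuple(sorted((w1, w2)))] += 1
    return pairs

tokens = "he swung the baseball bat and poured a glass of fine wine".split()
print(window_pairs(tokens, window=2).most_common(3))
```

Treating the pairs as unordered, as in the last line of slide 19, turns the bigrams into co-occurrences.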
20. Occur together more often than expected by chance
   - Observed frequencies for two words occurring together and alone are stored in a 2x2 contingency table
   - Expected values are calculated from the observed values under a model of independence
     - How often would you expect these words to occur together if they only occurred together by chance?
     - If two words occur together significantly more often than the expected value, then they do not occur together by chance.

21. Measures and Tests of Association (http://ngram.sourceforge.net)
   - Log-likelihood ratio
   - Mutual information
   - Pointwise mutual information
   - Pearson's chi-squared test
   - Phi coefficient
   - Fisher's exact test
   - T-test
   - Dice coefficient
   - Odds ratio
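To make the first measure on slide 21 concrete, here is a minimal sketch of the log-likelihood ratio (G^2) computed from a 2x2 contingency table. NSP itself is a Perl package; this Python version and its counts are illustrative only, and the n11/n1p/np1/npp names follow NSP's usual labels for the joint count, the two marginal counts, and the total number of bigrams.

```python
# A minimal sketch of the log-likelihood ratio for a 2x2 contingency table,
# one of the association measures listed on slide 21 (illustrative counts).
import math

def log_likelihood(n11, n1p, np1, npp):
    """n11: pair count; n1p, np1: marginal counts of each word; npp: total bigrams."""
    # observed cell counts of the 2x2 table
    obs = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    # expected counts under independence: row total * column total / grand total
    exp = [n1p * np1 / npp, n1p * (npp - np1) / npp,
           (npp - n1p) * np1 / npp, (npp - n1p) * (npp - np1) / npp]
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

# e.g., "baseball bat" seen 20 times; "baseball" 60 times, "bat" 45 times,
# in a corpus of 100,000 bigrams (hypothetical numbers)
print(round(log_likelihood(20, 60, 45, 100_000), 2))
```

A large score means the observed joint count is far above what independence predicts, which is exactly the "more often than expected by chance" test of slide 20.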
22. What do we get at the end?
   - A list of bigrams or co-occurrences that are significant or interesting (meaningful?)
     - automatic
     - language independent
   - These can be used as building blocks for systems that do semantic processing
     - relatively unambiguous
     - often very informative about the topic or domain
     - can serve as a fingerprint for a document or book

23. [The quilt: Ontologies, Co-Occurrences, Kernels, Contexts]

24. SenseClusters (Contexts): http://senseclusters.sourceforge.net

25. Things we can do now (same list as slide 11)

26. Identify Similar Contexts
   - Find phrases that say the same thing using different words
     - I went to the store
     - Ted drove to Wal-Mart
   - Find words that have the same meaning in different contexts
     - The line is moving pretty fast
     - I stood in line for 12 hours
   - Find different words that have the same meaning in different contexts
     - The line is moving pretty fast
     - I stood in the queue for 12 hours

27. SenseClusters Methodology
   - Represent contexts using first- or second-order co-occurrences
   - Reduce the dimensionality of the vectors
     - singular value decomposition
   - Cluster the context vectors
     - find the number of clusters
     - label the clusters
   - Evaluate and/or use the contexts!

28. Second Order Features
   - Second-order features encode something extra about a feature that occurs in a context, something not available in the context itself
     - Native SenseClusters: each feature is represented by a vector of the words with which it occurs
     - Latent Semantic Analysis: each feature is represented by a vector of the contexts in which it occurs

29. Similar contexts may have the same meaning
   - Context 1: He drives his car fast
   - Context 2: Jim speeds in his auto
   - car -> motor, garage, gasoline, insurance
   - auto -> motor, insurance, gasoline, accident
   - "car" and "auto" share many co-occurrences

30. Second Order Context Representation
   - Bigrams are used to create a word-by-word matrix
     - cell values = log-likelihood of the word pair
   - Each row is the first-order co-occurrence vector for a word
   - Represent a context by averaging the vectors of the words in that context
     - the context includes the Cxt positions around the target word, where Cxt is typically 5 or 20

31. 2nd Order Context Vectors
   - He won an Oscar, but Tom Hanks is still a nice guy.

                  needle     family      war      movie    actor   football   baseball
     won               0          0   8.7399    51.7812   30.520    3324.98    18.5533
     Oscar             0          0        0   136.0441   29.576          0          0
     guy               0   18818.55        0          0        0   205.5469   134.5102
     O2 context        0    6272.85   2.9133    62.6084   20.032    1176.84     51.021

32. After context representation
   - The second-order vector is the average of the word vectors that make up the context; it captures indirect relationships
     - reduced by SVD to its principal components
   - Now, cluster the vectors!
     - many methods; we often use k-means or repeated bisections
     - CLUTO
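Slides 30-32 can be sketched in a few lines. The tiny matrix below reuses the first-order vectors from slide 31 (rounded); everything else (the vocabulary, the toy contexts, the scikit-learn k-means call) is an illustrative stand-in for what SenseClusters and CLUTO do at much larger scale, and the SVD step is omitted.

```python
# A rough sketch of the second-order representation: each word has a
# first-order co-occurrence vector, a context is the average of the vectors
# of its words, and the resulting context vectors are clustered.
import numpy as np
from sklearn.cluster import KMeans

vocab = ["needle", "family", "war", "movie", "actor", "football", "baseball"]
word_vectors = {                       # rows of the word-by-word matrix (slide 31)
    "won":   np.array([0, 0, 8.74, 51.78, 30.52, 3324.98, 18.55]),
    "oscar": np.array([0, 0, 0, 136.04, 29.58, 0, 0]),
    "guy":   np.array([0, 18818.55, 0, 0, 0, 205.55, 134.51]),
}

def context_vector(words):
    """Average the first-order vectors of the words found in a context."""
    rows = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(rows, axis=0)

contexts = [["won", "oscar", "guy"], ["won", "guy"], ["oscar"]]  # toy contexts
X = np.vstack([context_vector(c) for c in contexts])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

The last row of the slide-31 table ("O2 context") is exactly what `context_vector` computes: the per-column average of the rows for "won", "Oscar", and "guy".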
33. What do we get at the end?
   - Contexts organized into some number of clusters based on the similarity of their co-occurrences
   - Contexts which share words that tend to co-occur with the same other words are clustered together
     - 2nd-order co-occurrences

34. [The quilt: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters)]

35. Oh... we also get plenty of these
   - Similarity matrices
     - word by word
     - ngram by ngram
     - word by context
     - ngram by context
     - context by word
     - context by ngram
     - context by context

36. The WSD-Shell (Kernels): http://www.d.umn.edu/~tpederse/supervised.html

37. Things we can do now (same list as slide 11)

38. Machine Learning Approach
   - Annotate text with sense tags
     - must select a sense inventory
   - Find interesting features
     - bigrams and co-occurrences are quite effective
   - Learn a model
   - Apply the model to untagged data
   - Works very well, given sufficient quantities of training data and sufficient coverage of your sense inventory

39. Kernel Methods
   - The challenge for any learning algorithm is to separate the training data into groups by finding a boundary (hyperplane)
   - Sometimes this boundary is hard to find in the original space
   - Transform the data via a kernel function into a different, higher-dimensional representation where boundaries are easier to spot

40. Kernels are similarity matrices
   - NSP produces word-by-word similarity matrices for use by SenseClusters
   - SenseClusters produces various sorts of similarity matrices based on co-occurrences
   - these can be used as kernels
     - Latent Semantic kernel
     - Bigram Association kernel
     - Co-occurrence Association kernel
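Slide 40's idea of plugging a similarity matrix directly into a learner can be sketched with scikit-learn's precomputed-kernel SVM. The features, labels, and plain linear kernel below are made up for illustration; in the talk's setting, a latent semantic or association kernel built from co-occurrences would take the place of these Gram matrices, and the WSD-Shell itself is a separate package.

```python
# Illustrative mechanics only: train an SVM on a precomputed similarity matrix.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((20, 50))          # 20 training contexts, 50 toy features
X_test = rng.random((5, 50))            # 5 test contexts
y_train = np.array([0, 1] * 10)         # toy binary sense labels

# a similarity (Gram) matrix; here a plain linear kernel stands in for a
# latent semantic, bigram association, or co-occurrence association kernel
K_train = X_train @ X_train.T           # shape (20, 20): train vs. train
K_test = X_test @ X_train.T             # shape (5, 20): test vs. train

clf = SVC(kernel="precomputed").fit(K_train, y_train)
print(clf.predict(K_test))
```

The only requirement is that the matrix be a valid kernel (symmetric and positive semi-definite), which is why slide 35's similarity matrices are natural candidates.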
41. What do we get at the end?
   - More accurate supervised classifiers that potentially require less training data
   - The kernel improves the ability to find boundaries between training examples by transforming the feature space into a higher-dimensional, cleaner space

42. [The quilt: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters)]

43. WordNet-Similarity (Ontologies): http://wn-similarity.sourceforge.net

44. Things we can do now (same list as slide 11)

45. Similarity and Relatedness
   - Two concepts are similar if they are connected by is-a relationships
     - a frog is-a-kind-of amphibian
     - an illness is-a health_condition
   - Two concepts can be related in many ways
     - a human has-a-part liver
     - Duluth receives-a-lot-of snow
   - Similarity is one way to be related

46. WordNet-Similarity: http://wn-similarity.sourceforge.net
   - Path-based measures
     - shortest path (path)
     - Wu & Palmer (wup)
     - Leacock & Chodorow (lch)
     - Hirst & St-Onge (hso)
   - Information content measures
     - Resnik (res)
     - Jiang & Conrath (jcn)
     - Lin (lin)
   - Gloss-based measures
     - Banerjee and Pedersen (lesk)
     - Patwardhan and Pedersen (vector, vector_pairs)

47. Path Finding
   - Find the shortest is-a path between two concepts
     - Rada et al. (1989)
   - Scaled by the depth of the hierarchy
     - Leacock & Chodorow (1998)
   - Depth of the subsuming concept, scaled by the sum of the depths of the individual concepts
     - Wu and Palmer (1994)

48. [Figure: a fragment of the WordNet is-a hierarchy (object, artifact, instrumentality, conveyance, vehicle, motor-vehicle, car, watercraft, boat, ark, article, ware, table-ware, cutlery, fork), from Jiang and Conrath (1997)]

49. Information Content
   - A measure of specificity in an is-a hierarchy (Resnik, 1995)
     - IC(concept) = -log(probability of the concept)
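For readers who want to try the measures on slides 46-49, here is a minimal sketch using NLTK's WordNet interface rather than the Perl WordNet-Similarity package from the talk. It assumes the NLTK "wordnet" and "wordnet_ic" data have been downloaded; the car/boat pair is taken from the hierarchy fragment on slide 48.

```python
# Path-based and information-content similarity between two WordNet concepts,
# using NLTK as a stand-in for the WordNet-Similarity package from the slides.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

car, boat = wn.synset("car.n.01"), wn.synset("boat.n.01")
brown_ic = wordnet_ic.ic("ic-brown.dat")   # concept counts from the Brown corpus

print("path:", car.path_similarity(boat))          # shortest is-a path
print("lch: ", car.lch_similarity(boat))           # Leacock & Chodorow
print("wup: ", car.wup_similarity(boat))           # Wu & Palmer
print("res: ", car.res_similarity(boat, brown_ic))  # Resnik: IC of the subsumer
print("jcn: ", car.jcn_similarity(boat, brown_ic))  # Jiang & Conrath
print("lin: ", car.lin_similarity(boat, brown_ic))  # Lin
```

The information-content measures need a corpus-derived IC file because, as slide 49 says, IC(concept) = -log P(concept); the path-based measures need only the is-a hierarchy itself.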