The Semantic Quilt

Posted on 11-May-2015


DESCRIPTION

A talk on the Semantic Quilt, which combines various methods of "doing semantics" into a more unified framework.

TRANSCRIPT

1. The Semantic Quilt: Contexts, Co-occurrences, Kernels, and Ontologies
   Ted Pedersen, University of Minnesota, Duluth
   http://www.d.umn.edu/~tpederse

2. Create by stitching together

3. Sew together different materials

4. [The quilt: Ontologies, Co-Occurrences, Kernels, Contexts]

5. Semantics in NLP
   - Potentially useful for many applications
     - Machine Translation
     - Document or Story Understanding
     - Text Generation
     - Web Search
   - Can come from many sources
   - Not well integrated
   - Not well defined?

6. What do we mean by "semantics"? It depends on our resources
   - Ontologies: relationships among concepts
     - similar / related concepts are connected
   - Dictionary: definitions of senses / concepts
     - similar / related senses have similar / related definitions
   - Contexts: short passages of words
     - similar / related words occur in similar / related contexts
   - Co-occurrences
     - a word is defined by the company it keeps
     - words that occur with the same kinds of words are similar / related

7. What level of granularity?
   - words
   - terms / collocations
   - phrases
   - sentences
   - paragraphs
   - documents
   - books

8. The Terrible Tension: Ambiguity versus Granularity
   - Words are potentially very ambiguous
     - but we can list them (sort of)
     - we can define their meanings (sort of)
     - often not ambiguous to a human reader, but hard for a computer to know which meaning is intended
   - Terms / collocations are less ambiguous
     - difficult to enumerate because there are so many, but it can be done for a domain (e.g., medicine)
   - Phrases (short contexts) can still be ambiguous, but not to the same degree as words or terms / collocations

9. The Current State of Affairs
   - Most resources and methods focus on word or term semantics
     - makes it possible to build resources (manually or automatically) with reasonable coverage, but
     - techniques become very resource dependent
     - resources become language dependent
     - introduces a lot of ambiguity
     - not clear how to bring the resources together
   - Similarity is a useful organizing principle, but
     - there are lots of ways to be similar

10. 
Similarity as Organizing Principle
   - Measure word association using knowledge-lean methods based on co-occurrence information from large corpora
   - Measure contextual similarity using knowledge-lean methods based on co-occurrence information from large corpora
   - Measure conceptual similarity / relatedness using a structured repository of knowledge
     - the lexical database WordNet
     - the Unified Medical Language System (UMLS)

11. Things we can do now
   - Identify associated words
     - fine wine
     - baseball bat
   - Identify similar contexts
     - I bought some food at the store
     - I purchased something to eat at the market
   - Assign meanings to words
     - I went to the bank [financial institution] to deposit my check
   - Identify similar (or related) concepts
     - frog : amphibian
     - Duluth : snow

12. Things we want to do
   - Integrate different resources and methods
   - Solve bigger problems
     - some of what we do now is a means to an unclear end
   - Be language independent
   - Offer broad coverage
   - Reduce dependence on manually built resources
     - ontologies, dictionaries, labeled training data

13. Semantic Patches to Sew Together
   - Contexts
     - SenseClusters: measures similarity between written texts (i.e., contexts)
   - Co-Occurrences
     - Ngram Statistics Package: measures association between words; identifies collocations or terms
   - Kernels
     - WSD-Shell: supervised learning for word sense disambiguation, in the process of adding SVMs with user-defined kernels
   - Ontologies
     - WordNet-Similarity: measures similarity between concepts found in WordNet
     - UMLS-Similarity
   - All of these are projects at the University of Minnesota, Duluth

14. [The quilt: Ontologies, Co-Occurrences, Kernels, Contexts]

15. Ngram Statistics Package (Co-Occurrences): http://ngram.sourceforge.net

16. Things we can do now (same list as slide 11)

17. Co-occurrences and semantics?
   - individual words (especially common ones) are very ambiguous
     - bat
     - line
   - pairs of words disambiguate each other
     - baseball bat
     - vampire ... Transylvania
     - product line
     - speech ... line

18. Why pairs of words?
   - Zipf's Law
     - most words are rare; most bigrams are even rarer; most ngrams are rarer still
     - the more common a word, the more senses it will have
   - Co-occurrences are less frequent than individual words, and they tend to be less ambiguous as a result
     - mutually disambiguating

19. Bigrams
   - Window size of 2: baseball bat, fine wine, apple orchard, bill clinton
   - Window size of 3: house of representatives, bottle of wine
   - Window size of 4: president of the republic, whispering in the wind
   - Selected using a small window size (2-4 words)
   - The objective is to capture a regular or localized pattern between two words (a collocation?)
   - If order doesn't matter, then these are co-occurrences
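The window idea on slide 19 is easy to make concrete. Below is a minimal sketch in plain Python (not part of the Ngram Statistics Package itself) of counting word pairs that fall within a small window; the example sentence and whitespace tokenization are invented for illustration.

```python
# Hypothetical illustration of slide 19: count word pairs that occur within a
# small window (window=2 keeps only adjacent words, i.e., ordinary bigrams;
# window=3 also pairs words separated by one intervening word, and so on).
from collections import Counter

def window_pairs(tokens, window=2):
    """Count unordered word pairs whose positions differ by less than `window`."""
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window]:
            pairs[tuple(sorted((w1, w2)))] += 1
    return pairs

tokens = "he swung the baseball bat and poured a glass of fine wine".split()
print(window_pairs(tokens, window=2).most_common(3))
```

Treating the pairs as unordered, as in the last line of slide 19, turns the bigrams into co-occurrences.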
20. Occur together more often than expected by chance
   - Observed frequencies for two words occurring together and alone are stored in a 2x2 contingency table
   - Expected values are calculated from the observed values under a model of independence
     - How often would you expect these words to occur together if they only occurred together by chance?
     - If two words occur together significantly more often than the expected value, then they do not occur together by chance.

21. Measures and Tests of Association (http://ngram.sourceforge.net)
   - Log-likelihood ratio
   - Mutual information
   - Pointwise mutual information
   - Pearson's chi-squared test
   - Phi coefficient
   - Fisher's exact test
   - T-test
   - Dice coefficient
   - Odds ratio
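To make the first measure on slide 21 concrete, here is a minimal sketch of the log-likelihood ratio (G^2) computed from a 2x2 contingency table. NSP itself is a Perl package; this Python version and its counts are illustrative only, and the n11/n1p/np1/npp names follow NSP's usual labels for the joint count, the two marginal counts, and the total number of bigrams.

```python
# A minimal sketch of the log-likelihood ratio for a 2x2 contingency table,
# one of the association measures listed on slide 21 (illustrative counts).
import math

def log_likelihood(n11, n1p, np1, npp):
    """n11: pair count; n1p, np1: marginal counts of each word; npp: total bigrams."""
    # observed cell counts of the 2x2 table
    obs = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    # expected counts under independence: row total * column total / grand total
    exp = [n1p * np1 / npp, n1p * (npp - np1) / npp,
           (npp - n1p) * np1 / npp, (npp - n1p) * (npp - np1) / npp]
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

# e.g., "baseball bat" seen 20 times; "baseball" 60 times, "bat" 45 times,
# in a corpus of 100,000 bigrams (hypothetical numbers)
print(round(log_likelihood(20, 60, 45, 100_000), 2))
```

A large score means the observed joint count is far above what independence predicts, which is exactly the "more often than expected by chance" test of slide 20.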
22. What do we get at the end?
   - A list of bigrams or co-occurrences that are significant or interesting (meaningful?)
     - automatic
     - language independent
   - These can be used as building blocks for systems that do semantic processing
     - relatively unambiguous
     - often very informative about the topic or domain
     - can serve as a fingerprint for a document or book

23. [The quilt: Ontologies, Co-Occurrences, Kernels, Contexts]

24. SenseClusters (Contexts): http://senseclusters.sourceforge.net

25. Things we can do now (same list as slide 11)

26. Identify Similar Contexts
   - Find phrases that say the same thing using different words
     - I went to the store
     - Ted drove to Wal-Mart
   - Find words that have the same meaning in different contexts
     - The line is moving pretty fast
     - I stood in line for 12 hours
   - Find different words that have the same meaning in different contexts
     - The line is moving pretty fast
     - I stood in the queue for 12 hours

27. SenseClusters Methodology
   - Represent contexts using first- or second-order co-occurrences
   - Reduce the dimensionality of the vectors
     - singular value decomposition
   - Cluster the context vectors
     - find the number of clusters
     - label the clusters
   - Evaluate and/or use the contexts!

28. Second Order Features
   - Second-order features encode something extra about a feature that occurs in a context, something not available in the context itself
     - Native SenseClusters: each feature is represented by a vector of the words with which it occurs
     - Latent Semantic Analysis: each feature is represented by a vector of the contexts in which it occurs

29. Similar contexts may have the same meaning
   - Context 1: He drives his car fast
   - Context 2: Jim speeds in his auto
   - car -> motor, garage, gasoline, insurance
   - auto -> motor, insurance, gasoline, accident
   - "car" and "auto" share many co-occurrences

30. Second Order Context Representation
   - Bigrams are used to create a word-by-word matrix
     - cell values = log-likelihood of the word pair
   - Each row is the first-order co-occurrence vector for a word
   - Represent a context by averaging the vectors of the words in that context
     - the context includes the Cxt positions around the target word, where Cxt is typically 5 or 20

31. 2nd Order Context Vectors
   - He won an Oscar, but Tom Hanks is still a nice guy.

                  needle     family      war      movie    actor   football   baseball
     won               0          0   8.7399    51.7812   30.520    3324.98    18.5533
     Oscar             0          0        0   136.0441   29.576          0          0
     guy               0   18818.55        0          0        0   205.5469   134.5102
     O2 context        0    6272.85   2.9133    62.6084   20.032    1176.84     51.021

32. After context representation
   - The second-order vector is the average of the word vectors that make up the context; it captures indirect relationships
     - reduced by SVD to its principal components
   - Now, cluster the vectors!
     - many methods; we often use k-means or repeated bisections
     - CLUTO
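Slides 30-32 can be sketched in a few lines. The tiny matrix below reuses the first-order vectors from slide 31 (rounded); everything else (the vocabulary, the toy contexts, the scikit-learn k-means call) is an illustrative stand-in for what SenseClusters and CLUTO do at much larger scale, and the SVD step is omitted.

```python
# A rough sketch of the second-order representation: each word has a
# first-order co-occurrence vector, a context is the average of the vectors
# of its words, and the resulting context vectors are clustered.
import numpy as np
from sklearn.cluster import KMeans

vocab = ["needle", "family", "war", "movie", "actor", "football", "baseball"]
word_vectors = {                       # rows of the word-by-word matrix (slide 31)
    "won":   np.array([0, 0, 8.74, 51.78, 30.52, 3324.98, 18.55]),
    "oscar": np.array([0, 0, 0, 136.04, 29.58, 0, 0]),
    "guy":   np.array([0, 18818.55, 0, 0, 0, 205.55, 134.51]),
}

def context_vector(words):
    """Average the first-order vectors of the words found in a context."""
    rows = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(rows, axis=0)

contexts = [["won", "oscar", "guy"], ["won", "guy"], ["oscar"]]  # toy contexts
X = np.vstack([context_vector(c) for c in contexts])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

The last row of the slide-31 table ("O2 context") is exactly what `context_vector` computes: the per-column average of the rows for "won", "Oscar", and "guy".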
33. What do we get at the end?
   - Contexts organized into some number of clusters based on the similarity of their co-occurrences
   - Contexts which share words that tend to co-occur with the same other words are clustered together
     - 2nd-order co-occurrences

34. [The quilt: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters)]

35. Oh... we also get plenty of these
   - Similarity matrices
     - word by word
     - ngram by ngram
     - word by context
     - ngram by context
     - context by word
     - context by ngram
     - context by context

36. The WSD-Shell (Kernels): http://www.d.umn.edu/~tpederse/supervised.html

37. Things we can do now (same list as slide 11)

38. Machine Learning Approach
   - Annotate text with sense tags
     - must select a sense inventory
   - Find interesting features
     - bigrams and co-occurrences are quite effective
   - Learn a model
   - Apply the model to untagged data
   - Works very well, given sufficient quantities of training data and sufficient coverage of your sense inventory

39. Kernel Methods
   - The challenge for any learning algorithm is to separate the training data into groups by finding a boundary (hyperplane)
   - Sometimes this boundary is hard to find in the original space
   - Transform the data via a kernel function into a different, higher-dimensional representation where boundaries are easier to spot

40. Kernels are similarity matrices
   - NSP produces word-by-word similarity matrices for use by SenseClusters
   - SenseClusters produces various sorts of similarity matrices based on co-occurrences
   - these can be used as kernels
     - Latent Semantic kernel
     - Bigram Association kernel
     - Co-occurrence Association kernel
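Slide 40's idea of plugging a similarity matrix directly into a learner can be sketched with scikit-learn's precomputed-kernel SVM. The features, labels, and plain linear kernel below are made up for illustration; in the talk's setting, a latent semantic or association kernel built from co-occurrences would take the place of these Gram matrices, and the WSD-Shell itself is a separate package.

```python
# Illustrative mechanics only: train an SVM on a precomputed similarity matrix.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((20, 50))          # 20 training contexts, 50 toy features
X_test = rng.random((5, 50))            # 5 test contexts
y_train = np.array([0, 1] * 10)         # toy binary sense labels

# a similarity (Gram) matrix; here a plain linear kernel stands in for a
# latent semantic, bigram association, or co-occurrence association kernel
K_train = X_train @ X_train.T           # shape (20, 20): train vs. train
K_test = X_test @ X_train.T             # shape (5, 20): test vs. train

clf = SVC(kernel="precomputed").fit(K_train, y_train)
print(clf.predict(K_test))
```

The only requirement is that the matrix be a valid kernel (symmetric and positive semi-definite), which is why slide 35's similarity matrices are natural candidates.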
41. What do we get at the end?
   - More accurate supervised classifiers that potentially require less training data
   - The kernel improves the ability to find boundaries between training examples by transforming the feature space into a higher-dimensional, cleaner space

42. [The quilt: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters)]

43. WordNet-Similarity (Ontologies): http://wn-similarity.sourceforge.net

44. Things we can do now (same list as slide 11)

45. Similarity and Relatedness
   - Two concepts are similar if they are connected by is-a relationships
     - a frog is-a-kind-of amphibian
     - an illness is-a health_condition
   - Two concepts can be related in many ways
     - a human has-a-part liver
     - Duluth receives-a-lot-of snow
   - Similarity is one way to be related

46. WordNet-Similarity: http://wn-similarity.sourceforge.net
   - Path-based measures
     - shortest path (path)
     - Wu & Palmer (wup)
     - Leacock & Chodorow (lch)
     - Hirst & St-Onge (hso)
   - Information content measures
     - Resnik (res)
     - Jiang & Conrath (jcn)
     - Lin (lin)
   - Gloss-based measures
     - Banerjee and Pedersen (lesk)
     - Patwardhan and Pedersen (vector, vector_pairs)

47. Path Finding
   - Find the shortest is-a path between two concepts
     - Rada et al. (1989)
   - Scaled by the depth of the hierarchy
     - Leacock & Chodorow (1998)
   - Depth of the subsuming concept, scaled by the sum of the depths of the individual concepts
     - Wu and Palmer (1994)

48. [Figure: a fragment of the WordNet is-a hierarchy (object, artifact, instrumentality, conveyance, vehicle, motor-vehicle, car, watercraft, boat, ark, article, ware, table-ware, cutlery, fork), from Jiang and Conrath (1997)]

49. Information Content
   - A measure of specificity in an is-a hierarchy (Resnik, 1995)
     - IC(concept) = -log(probability of the concept)
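For readers who want to try the measures on slides 46-49, here is a minimal sketch using NLTK's WordNet interface rather than the Perl WordNet-Similarity package from the talk. It assumes the NLTK "wordnet" and "wordnet_ic" data have been downloaded; the car/boat pair is taken from the hierarchy fragment on slide 48.

```python
# Path-based and information-content similarity between two WordNet concepts,
# using NLTK as a stand-in for the WordNet-Similarity package from the slides.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

car, boat = wn.synset("car.n.01"), wn.synset("boat.n.01")
brown_ic = wordnet_ic.ic("ic-brown.dat")   # concept counts from the Brown corpus

print("path:", car.path_similarity(boat))          # shortest is-a path
print("lch: ", car.lch_similarity(boat))           # Leacock & Chodorow
print("wup: ", car.wup_similarity(boat))           # Wu & Palmer
print("res: ", car.res_similarity(boat, brown_ic))  # Resnik: IC of the subsumer
print("jcn: ", car.jcn_similarity(boat, brown_ic))  # Jiang & Conrath
print("lin: ", car.lin_similarity(boat, brown_ic))  # Lin
```

The information-content measures need a corpus-derived IC file because, as slide 49 says, IC(concept) = -log P(concept); the path-based measures need only the is-a hierarchy itself.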