WordNet and Extended WordNet
Sriram Rajaraman
23-November-09, University of Texas at Dallas, Erik Jonsson School of Engineering and Computer Science
Objective
Introduce the idea of a semantic lexicon ontology, especially WordNet and eXtended WordNet
Focus
– Introduction
– WordNet
– eXtended WordNet
– Summary
References
1. WordNet: http://wordnet.princeton.edu/
2. eXtended WordNet: http://xwn.hlt.utdallas.edu/
3. Christiane Fellbaum, "WordNet: An Electronic Lexical Database", MIT Press, 1999, c1998.
4. George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, "Introduction to WordNet: An On-line Lexical Database", core working paper.
5. Rada Mihalcea, Dan I. Moldovan, "eXtended WordNet: Progress Report", Proceedings of NAACL Workshop on WordNet and Other Lexical Resources, 2001.
6. Sanda M. Harabagiu, George A. Miller, Dan I. Moldovan, "WordNet 2 - A Morphologically and Semantically Enhanced Resource", SIGLEX 1999.
Introduction
Traditional Dictionary
What is available:
– spelling
– pronunciation
– inflected and derivative forms
– etymology
– part of speech
– definitions
– illustrative uses of alternative senses
– synonyms and antonyms
– special usage notes
Tree
Ref: http://www.merriam-webster.com/dictionary/Tree
Main Entry: tree
Pronunciation: \ˈtrē\
Function: noun
Etymology: Middle English, from Old English trēow; akin to Old Norse trē tree, Greek drys, Sanskrit dāru wood
Date: before 12th century
Definition: a woody perennial plant having a single usually elongate main stem generally with few or no branches on its lower part
Drawback of traditional dictionary
What is missing:
– It does not say, for example, that trees have roots, or that they consist of cells having cellulose walls, or even that they are living organisms
– The "sense" of the superordinate term, aka hypernym (living plant or industrial plant)
– Coordinate terms (bushes, shrubs, ...)
– Hyponyms, i.e. types of trees (pine, tropical, deciduous, ...)
– Information assumed to be known to everyone (trees have bark and leaves, they grow from seeds, they make their own food by photosynthesis); this is probably information for an encyclopedia
How can we improve?
The missing information is structural: every word points upward to its superordinate (hypernym), but not sideways to its coordinate terms or downward to its hyponyms.
These restrictions come from alphabetical ordering and from budget and size constraints, all of which can be overcome in an electronic lexical database.
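To illustrate the structural point, here is a minimal sketch of how an electronic database could recover the sideways and downward links from the upward ones. The word list and the function names are made up for illustration; this is not WordNet's actual data or API.

```python
from collections import defaultdict

# Toy data, made up for illustration: like a printed dictionary entry,
# each word records only its superordinate (hypernym).
HYPERNYM = {
    "woody plant": "plant",
    "tree": "woody plant",
    "shrub": "woody plant",
    "pine": "tree",
    "oak": "tree",
}


def build_hyponym_index(hypernym_of):
    """Invert the upward pointers to obtain downward (hyponym) pointers."""
    index = defaultdict(list)
    for word, parent in hypernym_of.items():
        index[parent].append(word)
    return index


HYPONYMS = build_hyponym_index(HYPERNYM)


def hyponyms(word):
    """Words directly below `word` in the hierarchy."""
    return sorted(HYPONYMS.get(word, []))


def coordinates(word):
    """Words that share the same hypernym as `word` (its siblings)."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return sorted(w for w in HYPONYMS[parent] if w != word)
```

With this toy data, `hyponyms("tree")` gives the types of trees and `coordinates("tree")` gives its sibling terms, neither of which is reachable from the upward pointer alone.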
What is WordNet?
WordNet is a lexical database for the English language.
WordNet 3.0 has [1]:
– 117,097 nouns (the average noun has 1.23 senses)
– 11,488 verbs (the average verb has 2.16 senses)
– 22,141 adjectives
– 4,601 adverbs
Created and maintained at the Cognitive Science Laboratory of Princeton University
Accessible online at http://wordnetweb.princeton.edu/perl/webwn (also downloadable)
Interfaces are available in C, .NET, Java, Perl, PHP, Python, SQL, etc. (JWNL, WordNet.Net, RTiA wordNet, pywordne ..)
WordNet Structure
Words are organized as synsets in WordNet
There are four disjoint kinds of synsets, containing either nouns, verbs, adjectives, or adverbs
What is a synset?
– Basic unit of WordNet
– A group of synonymous words which refer to a common semantic concept
– Words may belong to more than one synset; the first sense is the most frequent sense
– Words also include collocations ("eye contact", "mix up")
Synset example
"car" as in
– {car, auto, automobile, machine, motorcar}
– {car, railcar, railway car, railroad car}
“Chocolate” as in-
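The word-to-synset mapping can be sketched as a small index. The two "car" synsets are the ones listed on the slide; the function names and the assumption that list order equals sense order are illustrative only.

```python
# The two "car" synsets from the slide, in sense order
# (first sense = most frequent sense).
SYNSETS = [
    ("car", "auto", "automobile", "machine", "motorcar"),
    ("car", "railcar", "railway car", "railroad car"),
]


def build_sense_index(synsets):
    """Map each word to the synsets that contain it, keeping sense order."""
    index = {}
    for synset in synsets:
        for word in synset:
            index.setdefault(word, []).append(synset)
    return index


SENSES = build_sense_index(SYNSETS)


def synonyms(word, sense=1):
    """Other members of the synset for the given 1-based sense of `word`."""
    synset = SENSES[word][sense - 1]
    return [w for w in synset if w != word]
```

Here "car" belongs to two synsets while "railcar" belongs to one, so `synonyms("car")` and `synonyms("car", sense=2)` return different word groups.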
How are synsets related?
A list of pointers is associated with each synset to express its relationships with other synsets
WordNet defines 17 relations:
– 10 between synsets
– 5 between word senses
– "gloss" (between a synset and a sentence, i.e. a textual definition for each synset)
– "frame" (between a synset and a verb construction pattern)
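One way to picture the pointer lists is the sketch below. The class names, synset ids, and gloss texts are hypothetical and chosen for illustration; they do not reproduce WordNet's database format. Only the distinction between synset-level pointers and word-sense pointers comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Pointer:
    relation: str                      # e.g. "hypernym" or "antonym"
    target: str                        # id of the target synset
    source_word: Optional[str] = None  # set only for word-sense pointers
    target_word: Optional[str] = None

    @property
    def is_lexical(self):
        """True when the pointer links two word senses, not two synsets."""
        return self.source_word is not None


@dataclass
class Synset:
    words: tuple
    gloss: str                         # textual definition of the synset
    pointers: list = field(default_factory=list)


# Hypothetical ids and glosses, for illustration only.
DB = {
    "motor_vehicle": Synset(("motor vehicle",), "a self-propelled wheeled vehicle"),
    "car": Synset(
        ("car", "auto", "automobile", "machine", "motorcar"),
        "a motor vehicle with four wheels",
        [Pointer("hypernym", "motor_vehicle")],
    ),
    "hot": Synset(("hot",), "having a high temperature",
                  [Pointer("antonym", "cold", "hot", "cold")]),
    "cold": Synset(("cold",), "having a low temperature",
                   [Pointer("antonym", "hot", "cold", "hot")]),
}


def follow(db, synset_id, relation):
    """Ids of the synsets reached from `synset_id` via `relation`."""
    return [p.target for p in db[synset_id].pointers if p.relation == relation]
```

In this sketch the hypernym pointer connects whole synsets, while the antonym pointer connects the specific words "hot" and "cold".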
WordNet relations
Applications of WordNet
– Information Extraction
– Information Retrieval
– Question Answering
– Word Sense Disambiguation
– Text Inference
– Coreference, coherence and metonymy
– Knowledge acquisition
– Internet search engines
Limitations of WordNet
Designed as a semantic lexicon, not a knowledge base
Limited connections between topically related words
Lack of morphological relations (a separate algorithm handles morphology)
Lack of selectional restrictions
And more [6]
eXtended WordNet [2]
A project at the Human Language Technology Research Institute at The University of Texas at Dallas (http://xwn.hlt.utdallas.edu)
Provides several important enhancements (over WordNet 2.0) intended to remedy the present limitations of WordNet
Current Version: eXtended WordNet 2.0 (xwn 2.0-1.1)
Objective of eXtended WordNet
– Exploit the rich information available in synset glosses (a gloss is a sentence, i.e. a textual definition for each synset)
– Make semantic and logical enhancements to WordNet
– Increase the connectivity among the synsets by at least one order of magnitude
– Enable access to a broader context for each concept
What does eXtended WordNet do? [5]
– Preprocessing and Parsing: separation of glosses into definitions and examples, tokenization, and identification of compound words
– Word Sense Disambiguation: all words in a gloss are tagged with appropriate senses and linked to the corresponding synsets
– Logical Form Transformation: glosses are transformed into logical forms
– Topical Relations: connections are established between words based on context/topic
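The first two steps can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the eXtended WordNet pipeline: it assumes a gloss is written as a definition followed by quoted examples separated by semicolons, and the compound list, sense tags, and synset ids are hypothetical stand-ins for what a real parser and disambiguator would produce.

```python
import re


def split_gloss(gloss):
    """Split a gloss into its definition and its quoted example sentences."""
    examples = re.findall(r'"([^"]*)"', gloss)
    definition = re.sub(r'"[^"]*"', "", gloss).strip(" ;")
    return definition, examples


def tokenize(definition, compounds):
    """Lowercase word tokens, merging known two-word compounds into one token."""
    words = re.findall(r"[A-Za-z]+", definition.lower())
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in compounds:
            tokens.append(" ".join(pair))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def gloss_links(synset_id, tokens, sense_of):
    """New synset-to-synset links from already disambiguated gloss words."""
    return [(synset_id, sense_of[t]) for t in tokens if t in sense_of]


# Hypothetical inputs, for illustration only.
GLOSS = 'a motor vehicle with four wheels; "he needs a car to get to work"'
COMPOUNDS = {("motor", "vehicle")}
SENSE_OF = {"motor vehicle": "motor_vehicle#1", "wheels": "wheel#1"}

definition, examples = split_gloss(GLOSS)
tokens = tokenize(definition, COMPOUNDS)
links = gloss_links("car#1", tokens, SENSE_OF)
```

Each disambiguated gloss word yields one additional link out of the defined synset, which is how gloss tagging can raise the connectivity among synsets.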
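eXtended WordNet
"Tennis court: A court on which tennis is played."
[Figure: the gloss of "tennis court" drawn as a graph. Node labels: tennis court, court, play, tennis. Edge labels: def, location-of, object. Synset shown: {"tennis", "lawn tennis"}]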
eXtended WordNet format
Consists of four XML files, one for each part of speech: noun, verb, adjective, adverb
The XML tags contain attributes that specify the relationships
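As a rough sketch of how such attribute-bearing XML could be read in Python, consider the example below. The element and attribute names (`xwn`, `gloss`, `wf`, `synsetID`, `lemma`, `sense`) and the sample values are invented for illustration and should not be taken as the real eXtended WordNet schema; check the distributed files for the actual tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names, attribute names, and values are made up.
SAMPLE = """
<xwn pos="NOUN">
  <gloss synsetID="n001">
    <text>a court on which tennis is played</text>
    <wf lemma="court" pos="NN" sense="1"/>
    <wf lemma="tennis" pos="NN" sense="1"/>
    <wf lemma="play" pos="VBN" sense="1"/>
  </gloss>
</xwn>
"""


def read_glosses(xml_text):
    """Return {synset id: {"text": gloss text, "words": [(lemma, pos, sense)]}}."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for gloss in root.iter("gloss"):
        words = [
            (wf.get("lemma"), wf.get("pos"), int(wf.get("sense")))
            for wf in gloss.iter("wf")
        ]
        result[gloss.get("synsetID")] = {
            "text": gloss.findtext("text"),
            "words": words,
        }
    return result


glosses = read_glosses(SAMPLE)
```

The relationships live in the attributes: each word element carries the lemma and sense number that tie it to a synset.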
eXtended WordNet: Applications
Core knowledge base for applications:
– Question Answering
– Information Retrieval
– Information Extraction
– Summarization
– Natural Language Generation
– Inferences
– Other knowledge-intensive applications
Further Reading
– W3C RDF/OWL Representation of WordNet: http://www.w3.org/TR/wordnet-rdf/
– eXtended WordNet format/algorithm: http://xwn.hlt.utdallas.edu/wsd.html
– Current research at Princeton: http://wordnet.cs.princeton.edu/projects.html
– Related projects (APIs, web interfaces, extensions): http://wordnet.princeton.edu/wordnet/related-projects/