WordNet: A Database of Lexical Relations


Upload: ahmed-abd-elwasaa

Post on 14-Apr-2017


Page 1: WORDNET: A Database of Lexical Relations

Structured lexicons and Lexical semantics

Especially WordNet®

See D Jurafsky & JH Martin: Speech and Language Processing, Upper Saddle River NJ (2000): Prentice Hall, Chapter 16.

and http://en.wikipedia.org/wiki/WordNet and explore WordNet: http://wordnet.princeton.edu/

Structured lexicons
• Alternative to alphabetical dictionary
• List of words grouped according to meaning
• Classic example: Roget’s Thesaurus
• Hierarchical organization is important
• Hierarchies familiar as taxonomies, eg in natural sciences
  – Daughters are “types of” and share certain properties, inherited from the mother
• Similar idea for ordinary words: hyponymy and synonymy

[Figure: two example hierarchies illustrating hyponymy and synonymy. Animal taxonomy: animal > bird, fish, ...; bird > canary, eagle; fish > trout, shark; eagle > bald eagle, golden eagle, hawk eagle, bateleur. Thesaurus: space > in general, dimensions, form, motion; next level: size, expansion, distance, interval, contiguity; entry words: reduction, deflation, shrinkage, curtailment, condensation, ...]
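The idea that daughters share properties inherited from the mother can be sketched as a small lookup that walks up the hierarchy. This is a minimal illustration only: the node names follow the animal example, while the property strings and the names `MOTHER`, `OWN_PROPERTIES` and `inherited_properties` are assumptions made for this sketch and come from no real lexicon.

```python
# Toy taxonomy: each node names its mother (hypernym); None marks the root.
MOTHER = {
    "animal": None,
    "bird": "animal",
    "fish": "animal",
    "canary": "bird",
    "eagle": "bird",
    "trout": "fish",
    "shark": "fish",
    "bald eagle": "eagle",
    "golden eagle": "eagle",
}

# Properties stated directly on a node (invented values, for illustration only).
OWN_PROPERTIES = {
    "animal": {"is alive"},
    "bird": {"has feathers"},
    "fish": {"has gills"},
    "eagle": {"is a bird of prey"},
}


def inherited_properties(node):
    """Collect the node's own properties plus those of all its ancestors."""
    properties = set()
    while node is not None:
        properties |= OWN_PROPERTIES.get(node, set())
        node = MOTHER[node]
    return properties


print(sorted(inherited_properties("bald eagle")))
```

Storing each property once, at the highest node where it holds, is what makes the hierarchy economical: nothing is repeated on the daughters.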

Thesaurus
• A way to show the structure of (lexical) knowledge
• Much used for technical terminology
• Can be enriched by having other lexical relations:
  – Antonyms (as well as synonyms)
  – Different hyponymy relations, not just is-a-type-of, but has-as-part/member
• Thesaurus can be explored in any direction
  – across, up, down
  – Some obvious distance metrics can be used to measure similarity between words
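One obvious distance metric is the number of links on the path between two words through their lowest common ancestor. The sketch below assumes a toy tree-shaped hierarchy; the function names and the `1 / (1 + distance)` similarity score are choices made for this illustration, not a metric prescribed by the slides.

```python
# Toy hierarchy: each node names its mother; None marks the root.
MOTHER = {
    "animal": None,
    "bird": "animal",
    "fish": "animal",
    "canary": "bird",
    "eagle": "bird",
    "trout": "fish",
    "shark": "fish",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while MOTHER[node] is not None:
        node = MOTHER[node]
        path.append(node)
    return path


def path_distance(a, b):
    """Count the links between a and b via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same hierarchy")


def similarity(a, b):
    """Map distance to a score in (0, 1]; identical words score 1.0."""
    return 1.0 / (1 + path_distance(a, b))


print(path_distance("canary", "eagle"))  # siblings under bird
print(path_distance("canary", "trout"))  # related only via animal
```

Under these assumptions canary is closer to eagle than to trout, which matches the intuition that words sharing a near ancestor are more similar.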

WordNet: History

• 1985: a group of psychologists and linguists start to develop a “lexical database”
  – Princeton University
  – theoretical basis: results from

• WordNet organizes lexical information in terms of word meanings, rather than word forms.

Global organisation

• division of the lexicon into five categories:
  – Nouns
  – Verbs
  – Adjectives
  – Adverbs
  – function words (“probably stored separately as part of the syntactic component of language” [Miller et al.])

Global organization

• nouns: organized as topical hierarchies
• verbs: entailment relations
• adjectives: multi-dimensional hyperspaces
• adverbs: multi-dimensional hyperspaces

Lexical semantics
• How are word meanings represented in WordNet?
  – synsets (synonym sets) as basic units
  – a word ‘meaning’ is represented by simply listing the word forms that can be used to express it
• example: senses of board
  – a piece of lumber vs. a group of people assembled for some purpose
  – synsets as unambiguous designators:
  – {board, plank, ...} vs. {board, committee, ...}
• Members of synsets are rarely true synonyms
  – WordNet does not attempt to capture subtle distinctions among members of the synset
  – may be due to specific details, or simply connotation, collocation
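A synset can be modelled as a set of word forms, so that one form belonging to several sets is what makes a word ambiguous. The sketch below uses only the two senses of board given above; the `SYNSETS` layout and the helper names `senses` and `synonyms` are assumptions for illustration, not WordNet's actual storage format.

```python
# Two toy synsets for "board"; members and glosses are taken from the example above.
SYNSETS = [
    {"members": frozenset({"board", "plank"}),
     "gloss": "a piece of lumber"},
    {"members": frozenset({"board", "committee"}),
     "gloss": "a group of people assembled for some purpose"},
]


def senses(word_form):
    """Return every synset that contains the given word form."""
    return [s for s in SYNSETS if word_form in s["members"]]


def synonyms(word_form):
    """For each sense of the word form, list the other members of that synset."""
    return [sorted(s["members"] - {word_form}) for s in senses(word_form)]


print(len(senses("board")))   # "board" is ambiguous: it sits in two synsets
print(synonyms("board"))      # each sense is picked out by its other members
print(len(senses("plank")))   # "plank" belongs to one synset only
```

The other members of each set are what distinguish the senses, which is the sense in which a synset acts as an unambiguous designator.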

Synsets

• synsets often sufficient for differential purposes

• Synsets are linked by semantic relations, word forms are linked by lexical relations.

• Preferable for cardinality of synset to be >1
  – WordNet also gives a gloss for each word meaning, and (often) an example

16.2 WORDNET: A Database of Lexical Relations

• WordNet:
  – The most well-developed and widely used lexical DB for English
  – Handcrafted from scratch, rather than mined from existing dictionaries and thesauri
  – Consists of three separate DBs:
    • One each for nouns and verbs, and
    • A third for adjectives and adverbs

[Figure: Scope of current WordNet 1.6 release in terms of unique entries and total numbers of senses for the four databases.]

[Figure: A portion of the WordNet 1.6 entry for the noun bass.]

Lexical relations in WordNet

• WordNet is organized by semantic relations.
  – It is characteristic of semantic relations that they are reciprocated
  – if there is a semantic relation R between meaning {x1, x2, ...} and meaning {y1, y2, ...}, then there is a relation R′ between {y1, y2, ...} and {x1, x2, ...}
  – Individual relations may or may not be
    • Symmetric: R(A,B) ⇒ R(B,A) (eg synonymy, not hyponymy)
    • Transitive: R(A,B) & R(B,C) ⇒ R(A,C) (eg synonymy may be)
    • Reflexive: R(A,A) is true (synonymy is, antonymy isn’t)
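These properties can be checked mechanically when a relation is stored as a set of ordered pairs. The sketch below is illustrative: the word pairs (including hot/cold) are toy data chosen for this example, and `inverse` models the reciprocal relation, for instance hypernymy as the inverse of hyponymy.

```python
def is_symmetric(relation):
    """R(A,B) implies R(B,A) for every pair in the relation."""
    return all((b, a) in relation for (a, b) in relation)


def is_transitive(relation):
    """R(A,B) and R(B,C) imply R(A,C) for every chained pair."""
    return all((a, d) in relation
               for (a, b) in relation
               for (c, d) in relation
               if b == c)


def is_reflexive(relation, domain):
    """R(A,A) holds for every item in the domain."""
    return all((x, x) in relation for x in domain)


def inverse(relation):
    """The reciprocal relation: swap the two sides of every pair."""
    return {(b, a) for (a, b) in relation}


# Toy data, invented for illustration.
HYPONYM_OF = {("canary", "bird"), ("bird", "animal"), ("canary", "animal")}
ANTONYM_OF = {("hot", "cold"), ("cold", "hot")}

print(is_symmetric(HYPONYM_OF), is_transitive(HYPONYM_OF))
print(is_symmetric(ANTONYM_OF), is_reflexive(ANTONYM_OF, {"hot", "cold"}))
print(("bird", "canary") in inverse(HYPONYM_OF))
```

In this toy data hyponymy comes out transitive but not symmetric, while antonymy is symmetric but not reflexive, in line with the slide.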

Lexical relations
• Nouns
  – Synonym ~ antonym (opposite of)
  – Hypernyms (is a kind of) ~ hyponym (for example)
  – Holonym (is part of) ~ meronym (has as part)
• Verbs
  – Synonym ~ antonym
  – Hypernym ~ troponym (eg lisp – talk)
  – Entailment (eg snore – sleep)
• Adjectives/Adverbs in addition to above
  – Related nouns
  – Verb participles
  – Derivational information

[Figure: Noun relations in WordNet]

[Figure: Verb relations in WordNet]

[Figure: Adjective and adverb relations in WordNet]

WordNet’s noun hierarchy

• noun hierarchy partitioned into separate hierarchies with unique top hypernyms

• vague abstractions would be semantically empty, e.g. {entity} with immediate hyponyms {object, thing} and {idea}

• {act,action,activity}
• {animal,fauna}
• {artifact}
• {attribute,property}
• {body,corpus}
• {cognition,knowledge}
• {communication}
• {event,happening}
• {feeling,emotion}
• {food}
• {group,collection}
• {location,place}
• {motive}
• {natural object}
• {natural phenomenon}
• {person,human being}
• {plant,flora}
• {possession}
• {process}
• {quantity,amount}
• {relation}
• {shape}
• {state,condition}
• {substance}
• {time}

Nouns in WordNet

• noun hierarchy as lexical inheritance system
  – seldom goes more than ten levels deep
  – the deepest examples usually contain technical levels that are not part of everyday vocabulary
  – shallowest levels are too vague
  – “Inherited hypernym” option shows full hierarchy

[Figure: example hierarchies labelled “deep” and “shallow”]

Nouns in WordNet

• man-made artefacts: sometimes six or seven levels deep
  – roadster → car → motor vehicle → wheeled vehicle → vehicle → conveyance → artefact
• hierarchy of persons: about three or four levels
  – televangelist → evangelist → preacher → clergyman → spiritual leader → person
• As in all thesaurus structures, words can have multiple hypernyms
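The chains above can be reproduced by following hypernym links upwards. Because a word may have several hypernyms, the sketch stores a list of hypernyms per word and returns every path to a top node. The chains are copied from the slide; the dictionary layout, the function name `hypernym_paths` and the extra "broadcaster" link are assumptions made purely for illustration.

```python
# Each word maps to a list of hypernyms, so a word may have more than one.
HYPERNYMS = {
    "roadster": ["car"],
    "car": ["motor vehicle"],
    "motor vehicle": ["wheeled vehicle"],
    "wheeled vehicle": ["vehicle"],
    "vehicle": ["conveyance"],
    "conveyance": ["artefact"],
    "televangelist": ["evangelist"],
    "evangelist": ["preacher"],
    "preacher": ["clergyman"],
    "clergyman": ["spiritual leader"],
    "spiritual leader": ["person"],
}


def hypernym_paths(word):
    """Return every path from `word` up to a node that has no hypernym."""
    parents = HYPERNYMS.get(word, [])
    if not parents:
        return [[word]]
    return [[word] + path
            for parent in parents
            for path in hypernym_paths(parent)]


for path in hypernym_paths("roadster"):
    print(" → ".join(path), f"({len(path) - 1} levels)")

# Hypothetical second hypernym, added only to show that paths can branch.
HYPERNYMS["televangelist"].append("broadcaster")
print(len(hypernym_paths("televangelist")))
```

With more than one hypernym per word the structure is a graph rather than a strict tree, so depth is a property of a particular path, not of the word.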

WordNets for other languages

• Idea has been widely copied
• Sometimes by “translating” Princeton WordNet
  – Lexical relations in general are universal ...
  – But are they in practice?
  – Are synsets universal?
• EuroWordNet: combining multilingual WordNets to include cross-language equivalence
  – Inherent difficulties, as above

What can WordNet be used for?

• As a lexical resource, an online dictionary, for human use
• Word-sense disambiguation (including homophone correction)
  – neighbouring words will be more closely related to correct sense (desert/dessert ~ camel)
• Document classification
  – What is this text about? Look for recurring hypernyms
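The desert/dessert example can be sketched as an overlap count between the neighbouring words and a set of words related to each candidate. This is a simplified, Lesk-style stand-in rather than a method described in the slides: the `RELATED` sets are invented here and merely stand for what could be collected from a sense's synset members, gloss and hypernyms.

```python
# Hypothetical related-word sets, invented for illustration only.
RELATED = {
    "desert": {"sand", "dry", "dune", "camel", "oasis", "region"},
    "dessert": {"sweet", "course", "meal", "cake", "pudding", "dish"},
}


def pick_candidate(candidates, context_words):
    """Choose the candidate whose related words overlap most with the context.

    On a tie, the candidate listed first wins (behaviour of max()).
    """
    context = {w.lower() for w in context_words}
    return max(candidates, key=lambda c: len(RELATED[c] & context))


print(pick_candidate(["desert", "dessert"], "the camel crossed the".split()))
print(pick_candidate(["desert", "dessert"], "we had cake for".split()))
```

A fuller version would score relatedness through the hierarchy instead of requiring an exact word match, but the decision rule stays the same: prefer the sense closest to its neighbours.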

What can WordNet be used for?

• Document retrieval
  – eg looking for texts about sports cars, search for synonyms and hyponyms of sports car
• Open-domain Q/A
  – Searching texts (eg WWW) to answer questions expressed in natural language
  – eg http://uk.ask.com/ [example]
• Textual entailment
  – Answering questions implied by text
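For document retrieval, searching on synonyms and hyponyms amounts to expanding the query term before matching. The sketch below assumes toy `SYNONYMS` and `HYPONYMS` tables whose entries are invented placeholders, not actual WordNet content, and uses plain substring matching in place of a real search engine.

```python
# Hypothetical lexicon fragments, invented for illustration only.
SYNONYMS = {"sports car": ["sport car"]}
HYPONYMS = {
    "sports car": ["two-seater sports car", "track-day car"],
    "track-day car": ["club racer"],
}


def expand_query(term):
    """Return the term, its synonyms, and all hyponyms reachable below it."""
    expanded = {term} | set(SYNONYMS.get(term, []))
    stack = list(HYPONYMS.get(term, []))
    while stack:
        word = stack.pop()
        if word not in expanded:
            expanded.add(word)
            stack.extend(HYPONYMS.get(word, []))
    return expanded


def search(documents, term):
    """Keep the documents that mention any expanded term (case-insensitive)."""
    terms = expand_query(term)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


docs = ["Club racer for sale", "Bread recipes", "A sport car buyer's guide"]
print(search(docs, "sports car"))
```

Expanding downwards through hyponyms finds texts that name a specific kind of sports car without using the phrase itself; expanding upwards through hypernyms would usually make the query too broad.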
