1 From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning Piek Vossen VU University Amsterdam


Post on 27-Mar-2015


TRANSCRIPT

Page 1: 1 From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning Piek Vossen VU University Amsterdam

From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning

Piek Vossen

VU University Amsterdam

Page 2:

What kind of resource is wordnet?

• Most widely used database in language technology

• Enormous impact in language technology development

• Large

• Free and downloadable

• English

Page 3:

WordNet http://wordnet.princeton.edu/

• Developed by George Miller and his team at Princeton University, as the implementation of a mental model of the lexicon

• Organized around the notion of a synset: a set of synonyms in a language that represent a single concept

• Semantic relations between concepts

• Covers over 117,000 concepts and over 150,000 English words

Page 4:

Relational model of meaning

[Figure: network of related word meanings: man, woman, boy, girl, cat, kitten, dog, puppy, animal]

Page 5:

Wordnet: a network of semantically related words

{conveyance;transport}

{vehicle}

{motor vehicle; automotive vehicle}

{car; auto; automobile; machine; motorcar}

{bumper}

{car door}

{car window}

{car mirror}

{armrest}

{doorlock}

{hinge; flexible joint}

{cruiser; squad car; patrol car; police car; prowl car}

{cab; taxi; hack; taxicab}

Page 6:

Wordnet Semantic Relations

WN 1.5 starting point

The ‘synset’ as a weak notion of synonymy: “two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value.” (Miller et al. 1993)

Relations between synsets:

Relation     POS-combination          Example
ANTONYMY     adjective-to-adjective   good / bad
             verb-to-verb             open / close
HYPONYMY     noun-to-noun             car / vehicle
             verb-to-verb             walk / move
MERONYMY     noun-to-noun             head / nose
ENTAILMENT   verb-to-verb             buy / pay
CAUSE        verb-to-verb             kill / die

Page 7:

Wordnet Data Model

[Figure: three layers: Vocabulary of a language, Concepts, Relations. Words point to numbered senses (1, 2), each sense points to a concept record, and concept records are linked by relations.]

Vocabulary of a language: bank; fiddle, violin; violist, fiddler; string

Concepts:

rec: 12345 - financial institute
rec: 54321 - side of a river
rec: 9876 - small string instrument
rec: 65438 - musician playing violin
rec: 42654 - musician
rec: 25876 - string instrument
rec: 35576 - string of instrument
rec: 29551 - underwear

Relations: type-of, type-of, part-of
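The three layers of the data model can be sketched in a few lines of Python. This is only an illustration: the record numbers and glosses come from the slide, but which word sense points to which record, and which records the type-of and part-of links connect, are assumptions read off the flattened figure.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Concept:
    rec: int                      # record number as shown on the slide
    gloss: str
    # (relation name, target record); the concrete links are assumed
    relations: list[tuple[str, int]] = field(default_factory=list)


concepts = {
    12345: Concept(12345, "financial institute"),
    54321: Concept(54321, "side of a river"),
    9876: Concept(9876, "small string instrument", [("type-of", 25876)]),
    65438: Concept(65438, "musician playing violin", [("type-of", 42654)]),
    42654: Concept(42654, "musician"),
    25876: Concept(25876, "string instrument"),
    35576: Concept(35576, "string of instrument", [("part-of", 25876)]),
    29551: Concept(29551, "underwear"),
}

# Vocabulary layer: each word lists its senses in order (sense 1, sense 2, ...).
vocabulary = {
    "bank": [12345, 54321],
    "fiddle": [9876],
    "violin": [9876],
    "violist": [65438],
    "fiddler": [65438],
    "string": [35576, 29551],
}


def senses(word: str) -> list[str]:
    """Glosses of all concepts a word can refer to."""
    return [concepts[rec].gloss for rec in vocabulary.get(word, [])]


def synonyms(word: str) -> list[str]:
    """Other words that share at least one concept record with `word`."""
    recs = set(vocabulary.get(word, []))
    return sorted(w for w, rs in vocabulary.items() if w != word and recs & set(rs))
```

In this layout a synset is simply the set of words pointing to the same record, so `synonyms("violin")` yields `["fiddle"]`, while the two senses of "bank" stay apart because they point to different records.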

Page 8:

Some observations on Wordnet

• synsets are more compact representations for concepts than word meanings in traditional lexicons

• synonyms and hypernyms are substitutional variants:
– begin – commence
– I once had a canary. The bird got sick. The poor animal died.

• hyponymy and meronymy chains are important transitive relations for predicting properties and explaining textual properties:
object -> artifact -> vehicle -> 4-wheeled vehicle -> car

• strict separation of part of speech although concepts are closely related (bed – sleep) and are similar (dead – death)

• lexicalization patterns reveal important mental structures
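The transitivity of hyponymy can be made concrete with a small sketch. The `HYPERNYM` table below is a hypothetical toy fragment built from the chain and the canary example on this slide; it is not WordNet's actual data or API.

```python
# Toy hypernym table: each word maps to its direct hypernym (assumed fragment).
HYPERNYM = {
    "car": "4-wheeled vehicle",
    "4-wheeled vehicle": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
    "canary": "bird",
    "bird": "animal",
}


def hypernym_chain(word: str) -> list[str]:
    """Follow direct hypernym links upward until the top is reached."""
    chain = []
    seen = {word}
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word in seen:          # guard against accidental cycles
            break
        seen.add(word)
        chain.append(word)
    return chain


def is_a(word: str, ancestor: str) -> bool:
    """True if `ancestor` is reachable from `word` through hypernym links."""
    return ancestor in hypernym_chain(word)
```

Because the relation is transitive, `is_a("canary", "animal")` holds although only canary -> bird and bird -> animal are stored, which is what licenses "the poor animal" as a reference to the canary.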

Page 9:

Lexicalization patterns

[Figure: hierarchy running from the 25 unique beginners down to basic level concepts and more specific terms. Labels: entity, object, organism, artifact, animal, plant, building, bird, tree, flower, dog, crocodile, church, canary, common canary, rose, abbey, garbage, waste, threat]

Basic level concepts:

• balance of two principles:
– predict most features
– apply to most subclasses

• where most concepts are created

• amalgamate most parts

• most abstract level at which a picture can be drawn

Page 10:

Wordnet top level

Page 11:

Meronymy & pictures

[Figure: picture with labelled parts: beak, tail, leg]

Page 12:

Meronymy & pictures

Page 13:

Co-reference constraint in wordnet: Cats cannot be a kind of cats

• S: (n) cat, true cat (feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats)

• S: (n) guy, cat, hombre, bozo (an informal term for a youth or man) "a nice guy"; "the guy's only doing it for some doll"

• S: (n) cat (a spiteful woman gossip) "what a cat she is!"

• S: (n) kat, khat, qat, quat, cat, Arabian tea, African tea (the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant) "in Yemen kat is used daily by 85% of adults"

• S: (n) cat-o'-nine-tails, cat (a whip with nine knotted cords) "British sailors feared the cat"

• S: (n) Caterpillar, cat (a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work)

• S: (n) big cat, cat (any of several large cats typically able to roar and living in the wild)

• S: (n) computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography, CAT (a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis)

• S: (n) domestic cat, house cat, Felis domesticus, Felis catus (any domesticated member of the genus Felis)


Page 15:

Wordnet 3.0 statistics

POS         Unique Strings   Synsets   Total Word-Sense Pairs
Noun        117,798          82,115    146,312
Verb        11,529           13,767    25,047
Adjective   21,479           18,156    30,002
Adverb      4,481            3,621     5,580
Totals      155,287          117,659   206,941

Page 16:

Wordnet 3.0 statistics

POS         Monosemous Words and Senses   Polysemous Words   Polysemous Senses
Noun        101,863                       15,935             44,449
Verb        6,277                         5,252              18,770
Adjective   16,503                        4,976              14,399
Adverb      3,748                         733                1,832
Totals      128,391                       26,896             79,450

Page 17:

Wordnet 3.0 statistics

POS         Average Polysemy Including Monosemous Words   Average Polysemy Excluding Monosemous Words
Noun        1.24                                          2.79
Verb        2.17                                          3.57
Adjective   1.4                                           2.71
Adverb      1.25                                          2.5
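The "including monosemous words" column appears to be the number of word-sense pairs divided by the number of unique strings from the first statistics table. The sketch below assumes that reading; the name `STATS` and the rounding to two decimals are choices made for this illustration.

```python
# (unique strings, total word-sense pairs) per part of speech,
# copied from the WordNet 3.0 statistics table above.
STATS = {
    "Noun": (117_798, 146_312),
    "Verb": (11_529, 25_047),
    "Adjective": (21_479, 30_002),
    "Adverb": (4_481, 5_580),
}


def average_polysemy(pos: str) -> float:
    """Average number of senses per word, monosemous words included."""
    strings, pairs = STATS[pos]
    return round(pairs / strings, 2)


for pos in STATS:
    print(pos, average_polysemy(pos))
```

Under this assumption the computed values agree with the first column of the table (1.24, 2.17, 1.4 and 1.25).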

Page 18:

http://www.visuwords.com

Page 20:

Usage of Wordnet

• Improve recall of text-based analysis:
– Query -> Index
  • Synonyms: commence – begin
  • Hypernyms: taxi -> car
  • Hyponyms: car -> taxi
  • Meronyms: trunk -> elephant
  • Lexical entailments: gun -> shoot

• Inferencing:
– what things can burn?

• Expression in language generation and translation:
– alternative words and paraphrases
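Query expansion along these relations can be sketched as follows. The `RELATED` table is a hypothetical hand-made lexicon containing only the word pairs listed above; a real system would read the links from a wordnet.

```python
# Toy lexicon: word -> relation -> related words (pairs taken from the slide).
RELATED = {
    "commence": {"synonym": ["begin"]},
    "taxi": {"hypernym": ["car"]},
    "car": {"hyponym": ["taxi"]},
    "trunk": {"meronym": ["elephant"]},
    "gun": {"entailment": ["shoot"]},
}


def expand_query(terms, relations=("synonym", "hypernym", "hyponym")):
    """Return the query terms plus related words, keeping first-seen order."""
    expanded = []
    for term in terms:
        candidates = [term]
        for relation in relations:
            candidates.extend(RELATED.get(term, {}).get(relation, []))
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["commence", "car"]))
```

Which relations to follow is left as a parameter because each one trades recall against precision differently: synonyms are usually safe, while meronyms and entailments can pull in loosely related documents.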

Page 21:

Improve recall

• Information retrieval:
– small databases without redundancy, e.g. image captions, video text

• Text classification:
– small training sets

• Question & Answer systems
– query analysis: who, whom, where, what, when

Page 22:

Improve recall

• Anaphora resolution:
– The girl fell off the table. She....
– The glass fell off the table. It...

• Coreference resolution:
– When he moved the furniture, the antique table got damaged.

• Information extraction (unstructured text to structured databases):
– generic forms or patterns "vehicle" -> text with specific cases "car"

Page 23:

Improve recall

• Summarizers:
– Sentence selection based on word counts -> concept counts
– Avoid repetition in summary -> language generation

• Limited inferencing: detect locations, organisations, etc.

Page 24:

Many others

• Data sparseness for machine learning: hapaxes can be replaced by semantic classes

• Use redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using Wordnet and make better choices

• Sentiment and opinion mining

• Natural language learning

Page 25:

Recall & Precision

query: “cell”

[Figure: two overlapping sets, found and relevant, with their intersection. Example documents: “cell phone”, “mobile phones”, “nerve cell”, “police cell”, “jail”, “neuron”]

recall = intersection / relevant

precision = intersection / found

Recall < 20% for basic search engines! (Blair & Maron 1985)
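The two measures can be written out as a small function. The example sets are hypothetical: they assume the user of the query “cell” wants documents about prison cells and that the engine matched on the literal word.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for a set of found and a set of relevant documents."""
    found, relevant = set(found), set(relevant)
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


# Assumed scenario: literal matching on "cell" finds three documents,
# only one of which is about a prison cell; "jail" is relevant but missed.
found = {"cell phone", "nerve cell", "police cell"}
relevant = {"police cell", "jail"}
recall, precision = recall_precision(found, relevant)
print(recall, precision)
```

Here recall is 1/2 because “jail” is never found, and precision is 1/3 because two of the three hits concern other senses of “cell”. Expanding the query with synonyms targets the first problem, sense disambiguation the second.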

Page 26:

EuroWordNet

• The development of a multilingual database with wordnets for several European languages

• Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328

• March 1996 - September 1999

• 2.5 Million EURO.

• http://www.hum.uva.nl/~ewn

• http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html

Page 27:

EuroWordNet

• Languages covered:
– EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
– EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.

• Size of vocabulary:
– EuroWordNet-1: 30,000 concepts - 50,000 word meanings.
– EuroWordNet-2: 15,000 concepts - 25,000 word meanings.

• Type of vocabulary:
– the most frequent words of the languages
– all concepts needed to relate more specific concepts

Page 28:

EuroWordNet Model

I = Language Independent link
II = Link from Language Specific to Inter lingual Index
III = Language Dependent Link

[Figure: four language-specific wordnets, each with its own Lexical Items Table and language-dependent links (III), are connected by links of type II to a central Inter-Lingual-Index holding the ILI-record {drive}. The index is connected by language-independent links (I) to an Ontology (2OrderEntity; Location, Dynamic) and to Domains (Traffic; Air, Road).]

Italian wordnet: cavalcare, andare, muoversi, guidare
Dutch wordnet: bewegen, gaan, rijden, berijden
English wordnet: drive, ride, move, go
Spanish wordnet: cabalgar, jinetear, conducir, mover, transitar

Page 29:

Differences in relations between EuroWordNet and WordNet

• Added Features to relations

• Cross-Part-Of-Speech relations

• New relations to differentiate shallow hierarchies

• New interpretations of relations

Page 30:

EWN Relationship Labels

{airplane}  HAS_MERO_PART: conj1 {door}
            HAS_MERO_PART: conj2 disj1 {jet engine}
            HAS_MERO_PART: conj2 disj2 {propeller}

{door}      HAS_HOLO_PART: disj1 {car}
            HAS_HOLO_PART: disj2 {room}
            HAS_HOLO_PART: disj3 {entrance}

{dog}       HAS_HYPERONYM: conj1 {mammal}
            HAS_HYPERONYM: conj2 {pet}

{albino}    HAS_HYPERONYM: disj1 {plant}
            HAS_HYPERONYM: disj2 {animal}

Default Interpretation: non-exclusive disjunction

Page 31:

EWN Relationship Labels

Factive/Non-factive CAUSES (Lyons 1977)

factive (default interpretation):

“to kill causes to die”: {kill} CAUSES {die}

non-factive: E1 probably or likely causes event E2 or E1 is intended to cause some event E2:

“to search may cause to find”: {search} CAUSES {find} non-factive

Page 32:

Cross-Part-Of-Speech relations

WordNet1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy:

adornment 2 change of state-- (the act of changing something)
adorn 1 change, alter-- (cause to change; make different)

EuroWordNet: words of different parts of speech can be inter-linked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:

{adorn V} XPOS_NEAR_SYNONYM {adornment N}

{size N} XPOS_NEAR_HYPONYM {tall A}{short A}

Page 33:

Role relations

In the case of many verbs and nouns the most salient relation is not the hyperonym but the relation between the event and the involved participants. These relations are expressed as follows:

{knife} ROLE_INSTRUMENT {to cut}
{to cut} INVOLVED_INSTRUMENT {knife} reversed
{school} ROLE_LOCATION {to teach}
{to teach} INVOLVED_LOCATION {school} reversed

These relations are typically used when other relations, mainly hyponymy, do not clarify the position of the concept in the network, but the word is still closely related to another word.

Page 34:

Co_Role relations

guitar player   HAS_HYPERONYM player
                CO_AGENT_INSTRUMENT guitar

player          HAS_HYPERONYM person
                ROLE_AGENT to play music
                CO_AGENT_INSTRUMENT musical instrument

to play music   HAS_HYPERONYM to make
                ROLE_INSTRUMENT musical instrument

guitar          HAS_HYPERONYM musical instrument
                CO_INSTRUMENT_AGENT guitar player

ice saw         HAS_HYPERONYM saw
                CO_INSTRUMENT_PATIENT ice

saw             HAS_HYPERONYM saw
                ROLE_INSTRUMENT to saw

ice             CO_PATIENT_INSTRUMENT ice saw REVERSED

Page 35:

Co_Role relations

Examples of the other relations are:

criminal CO_AGENT_PATIENT victim
novel writer/ poet CO_AGENT_RESULT novel/ poem
dough CO_PATIENT_RESULT pastry/ bread
photographic camera CO_INSTRUMENT_RESULT photo

Page 36:

Overview of the Language Internal relations in EuroWordnet

Same Part of Speech relations:

NEAR_SYNONYMY              apparatus - machine
HYPERONYMY/HYPONYMY        car - vehicle
ANTONYMY                   open - close
HOLONYMY/MERONYMY          head - nose

Cross-Part-of-Speech relations:

XPOS_NEAR_SYNONYMY         dead - death; to adorn - adornment
XPOS_HYPERONYMY/HYPONYMY   to love - emotion
XPOS_ANTONYMY              to live - dead
CAUSE                      die - death
SUBEVENT                   buy - pay; sleep - snore
ROLE/INVOLVED              write - pencil; hammer - hammer
STATE                      the poor - poor
MANNER                     to slurp - noisily
BELONG_TO_CLASS            Rome - city

Page 37:

Horizontal & vertical semantic relations

[Figure: semantic network around "patient". Vertical links are labelled HYPONYM; horizontal links are labelled with roles: ρ-PATIENT, ρ-AGENT, ρ-PROCEDURE, ρ-LOCATION, ρ-CAUSE, STATE and co-ρ-AGENT-PATIENT. Nodes: chronic patient; mental patient; patient; cure; treat; doctor; child doctor; child; disease; disorder; stomach disease, kidney disorder; physiotherapy, medicine, etc.; hospital, etc.]

Page 38:

The Multilingual Design

• Inter-Lingual-Index: unstructured fund of concepts to provide an efficient mapping across the languages;

• Index-records are mainly based on WordNet synsets and consist of synonyms, glosses and source references;

• Various types of complex equivalence relations are distinguished;

• Equivalence relations from synsets to index records: not on a word-to-word basis;

• Indirect matching of synsets linked to the same index items;
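Indirect matching through the index can be sketched as a lookup over shared index records. The links in `EQ_LINKS` are illustrative assumptions: the words and the ILI-record {drive} appear in the EuroWordNet model figure, but the record `ride` and the exact synset-to-record links are invented for this example.

```python
# language -> synset (represented by one word) -> linked ILI records (assumed links)
EQ_LINKS = {
    "nl": {"rijden": ["drive"], "berijden": ["ride"]},
    "it": {"guidare": ["drive"], "cavalcare": ["ride"]},
    "es": {"conducir": ["drive"], "cabalgar": ["ride"]},
}


def equivalents(lang: str, synset: str, target_lang: str) -> list[str]:
    """Synsets in `target_lang` that share at least one ILI record with `synset`."""
    records = set(EQ_LINKS[lang].get(synset, []))
    return sorted(
        candidate
        for candidate, candidate_records in EQ_LINKS[target_lang].items()
        if records & set(candidate_records)
    )


print(equivalents("nl", "rijden", "it"))
```

No wordnet stores a direct link to another wordnet in this design. Each language only maintains its own links to the index, so adding a language requires no changes to the existing ones.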

Page 39:

Equivalent Near Synonym

1. Multiple Targets (1:many)

Dutch wordnet: schoonmaken (to clean) matches with 4 senses of clean in WordNet1.5:
• make clean by removing dirt, filth, or unwanted substances from
• remove unwanted substances from, such as feathers or pits, as of chickens or fruit
• remove in making clean; "Clean the spots off the rug"
• remove unwanted substances from - (as in chemistry)

2. Multiple Sources (many:1)

Dutch wordnet: versiersel near_synonym versiering
ILI-Record: decoration.

3. Multiple Targets and Sources (many:many)

Dutch wordnet: toestel near_synonym apparaat
ILI-records: machine; device; apparatus; tool

Page 40:

Equivalent Hyperonymy

Typically used for gaps in English WordNet:

• genuine, cultural gaps for things not known in English culture:

– Dutch: klunen, to walk on skates over land from one frozen water to the other

• pragmatic, in the sense that the concept is known but is not expressed by a single lexicalized form in English:

– Dutch: kunststof = artifact substance <=> artifact object

Page 41:

Equivalent Hyponymy

has_eq_hyponym

Used when wordnet1.5 only provides more narrow terms. In this case there can only be a pragmatic difference, not a genuine cultural gap, e.g.: Spanish dedo = either finger or toe.

Page 42:

Complex mappings across languages

[Figure: index records linked to words in four wordnets (EN-Net, NL-Net, IT-Net, ES-Net). Three link types are distinguished in the legend: normal equivalence, eq_has_hyponym and eq_has_hyperonym.]

Index records:

{ toe : part of foot }
{ finger : part of hand }
{ dedo , dito : finger or toe }
{ head : part of body }
{ hoofd : human head }
{ kop : animal head }

EN-Net: toe, finger, head
NL-Net: hoofd, kop
IT-Net: dito
ES-Net: dedo

Page 43:

Typical gaps in the (English) ILI

• Dutch:
doodschoppen (to kick to death): eq_hyperonym {kill}V and to {kick}V
aardig (Adjective, to like): eq_near_synonym {like}V
cassière (female cashier): eq_hyperonym {cashier}, {woman}
kunstproduct (artifact substance): eq_hyperonym {artifact} and to {product}

• Spanish:
alevín (young fish): eq_hyperonym {fish} and eq_be_in_state {young}
cajera (female cashier): eq_hyperonym {cashier}, {woman}

Page 44:

Wordnets as semantic structures

• Wordnets are unique language-specific structures:
– different lexicalizations
– differences in synonymy and homonymy
– different relations between synsets
– same organizational principles: synset structure and same set of semantic relations.

• Language independent knowledge is assigned to the ILI and can thus be shared by all languages linked to the ILI: both an ontology and a domain hierarchy

Page 45:

Autonomous & Language-Specific

[Figure: two hierarchies shown side by side, Wordnet1.5 and the Dutch Wordnet.]

Wordnet1.5: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; block; body; container; device; implement; tool; instrument; bag; spoon; box

Dutch Wordnet: voorwerp {object}; lepel {spoon}; werktuig {tool}; tas {bag}; bak {box}; blok {block}; lichaam {body}

Page 46:

Linguistic versus Artificial Ontologies

Artificial ontology:
• better control or performance, or a more compact and coherent structure.
• introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool),
• neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise).

What properties can we infer for spoons?
spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

Page 47:

Linguistic versus Artificial Ontologies

Linguistic ontology:
• Exactly reflects the relations between all the lexicalized words and expressions in a language.
• Captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language.

What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery,

Page 48:

Wordnets versus ontologies

• Wordnets:
  • autonomous language-specific lexicalization patterns in a relational network.
  • Usage: to predict substitution in text for information retrieval, text generation, machine translation, word-sense-disambiguation.

• Ontologies:
  • data structure with formally defined concepts.
  • Usage: making semantic inferences.

Page 49:

Sharing world knowledge

• All wordnets in the world can be linked to the same ontology

• All wordnets in the world can be linked to the same thesaurus

Page 50:

Wordnet: Domain information

[Figure: the Wordnet data model (Vocabularies of languages, Concepts, Relations) extended with a layer of Domains to which concepts are linked.]

Vocabularies of languages: bank; violin; violist; string

Concepts:

rec: 12345 - financial institute
rec: 54321 - river side
rec: 9876 - small string instrument
rec: 65438 - musician playing a violin
rec: 42654 - musician
rec: 25876 - string instrument
rec: 35576 - string of an instrument
rec: 29551 - underwear

Relations: type-of, type-of, part-of

Domains: Culture, Music, Finance, Clothing, Sport, Ball sports, Winter sports

Page 51:

How to harmonize wordnets?

• Wordnets are unique language-specific lexicalization patterns

• Define universal sets of concepts that play a major role in many different wordnets: so-called Base Concepts

• Define base concepts in each language wordnet
– High level in the hierarchy
– Many hyponyms

• Provide the closest equivalent in English wordnet

• Determine the intersection of English equivalences
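The final step, intersecting the English equivalents chosen by each language, can be sketched with a counter over the per-language selections. The data in `SELECTIONS` is hypothetical and far smaller than the real selections; the `min_languages` threshold is a parameter introduced for this illustration.

```python
from collections import Counter

# Hypothetical base-concept selections, already translated to English equivalents.
SELECTIONS = {
    "NL": {"person", "animal", "vehicle"},
    "ES": {"person", "animal", "food"},
    "IT": {"person", "plant", "food"},
    "EN": {"person", "animal", "plant"},
}


def common_base_concepts(selections, min_languages=2):
    """Concepts selected as base concept by at least `min_languages` languages."""
    counts = Counter(
        concept for concepts in selections.values() for concept in set(concepts)
    )
    return {concept for concept, n in counts.items() if n >= min_languages}


print(sorted(common_base_concepts(SELECTIONS)))
print(sorted(common_base_concepts(SELECTIONS, min_languages=len(SELECTIONS))))
```

Requiring all languages to agree leaves only "person" in this toy data, while a threshold of two keeps four concepts. Lowering the threshold is one way to obtain a usable common set when the strict intersection turns out to be very small.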

Page 52:

Lexicalization patterns

[Figure: the lexicalization hierarchy from the 25 unique beginners down to basic level concepts, with a layer of 1024 base concepts marked. Labels: entity, object, organism, artifact, animal, plant, building, bird, tree, flower, dog, crocodile, church, canary, common canary, rose, abbey, garbage, threat]

Page 53:

Base Concept Intersection

                              Nouns   Verbs
Intersection EN, NL, IT, ES   24      6
Intersection FR, DE, EE, CZ   70      30
Intersection All              13      2

{cause 6; get#9; have#7; induce#2; make#12; stimulate#3}
{create 2; make#13}
{go 14; locomote#1; move#15; travel#4}
{be 4; have the quality of being#1}

{human 1; individual#1; mortal#1; person#1; someone#1; soul#1}
{animal 1; animate being#1; beast#1; brute#1; creature#1; fauna#1}
{flora 1; plant#1; plant life#1}
{matter 1; substance#1}
{food 1; nutrient#1}
{feeling 1}
{act 1; human action#1; human activity#1}

Page 54:

Explanations for low intersection of Base Concepts

• The individual selections are not representative enough.

• There are major differences in the way meanings are classified, which have an effect on the frequency of the relations.

• The translations of the selection to WordNet1.5 synsets are not reliable

• The resources cover very different vocabularies

Page 55:

Concepts selected by at least two languages: intersections of pairs

NOUNS
       NL     ES     IT     EN
NL   1027    103    182    333
ES    103    523     45    284
IT    182     45    334    167
EN    333    284    167   1296

VERBS
       NL     ES     IT     EN
NL    323     36     42     86
ES     36    128     18     43
IT     42     18    104     39
EN     86     43     39    236

Page 56:

Common Base Concepts

                                Nouns   Verbs   Total
Physical objects & substances   491             491
Processes and states            272     228     500
Mental objects                  33              33
Total                           796     228     1024

Page 57:

Table 4: Number of Common BCs represented in the local wordnets

     Related to CBCs   Eq_synonym   Eq_near   CBCs Without Direct Equivalent
NL   992               725          269       97
ES   1012              1009         0         15
IT   878               759          191       9

Table 5: BC4 Gaps in at least two wordnets (10 synsets)

body covering#1
body substance#1
social control#1
change of magnitude#1
contractile organ#1
psychological feature#1
mental object#1; cognitive content#1; content#2
natural object#1
place of business#1; business establishment#1
plant organ#1
plant part#1
spatial property#1; spatiality#1

Page 58:

Table 6: Local senses with complex equivalence relations to CBCs

Columns: NL ES IT (rows with fewer than three values have empty cells whose position is not recoverable from the transcript)

Eq_has_hyperonym 61 40 4
eq_has_hyponym 34 14 20
Eq_has_holonym 2 0
Eq_has_meronym 3 2
Eq_involved 3
Eq_is_caused_by 3
Eq_is_state_of 1

Example of complex relation

CBC: cause to feel unwell#1, Verb

Closest Dutch concept: {onwel#1}, Adjective (sick)

Equivalence relation: eq_is_caused_by

Page 59:

EuroWordNet data

           Synsets   No. of senses   Sens./syns.   Entries   Sens./entry   LIRels.   LIRels/syns   EQRels-ILI   EQRels/syn   Synsets without ILI
Dutch      44015     70201           1.59          56283     1.25          111639    2.54          53448        1.21         7203
Spanish    23370     50526           2.16          27933     1.81          55163     2.36          21236        0.91         0
Italian    40428     48499           1.20          32978     1.47          117068    2.90          71789        1.78         1561
French     22745     32809           1.44          18777     1.75          49494     2.18          22730        1.00         20
German     15132     20453           1.35          17098     1.20          34818     2.30          16347        1.08         0
Czech      12824     19949           1.56          12283     1.62          26259     2.05          12824        1.00         0
Estonian   7678      13839           1.80          10961     1.26          16318     2.13          9004         1.17         0
English    16361     40588           2.48          17320     2.34          42140     2.58          n.a.         n.a.         n.a.
WN15       94515     187602          1.98          126617    1.48          211375    2.24          n.a.         n.a.         n.a.

Page 60:

From EuroWordNet to Global WordNet

• Currently, wordnets exist for more than 50 languages, including:

• Arabic, Bantu, Basque, Chinese, Bulgarian, Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean, Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, Zulu...

• Many languages are genetically and typologically unrelated

• http://www.globalwordnet.org

Page 61:

Global Wordnet Association

http://www.globalwordnet.org

EuroWordNet:
• English
• German
• Spanish
• French
• Italian
• Dutch
• Czech
• Estonian

BalkaNet: Romanian, Bulgarian, Turkish, Slovenian, Greek, Serbian

Other wordnets:
• Danish
• Norwegian
• Swedish
• Portuguese
• Korean
• Russian
• Basque
• Catalan
• Thai

Arabic, Polish, Welsh, Chinese, 20 Indian Languages, Brazilian Portuguese, Hebrew, Latvian, Persian, Kurdish, Avestan, Baluchi, Hungarian

Page 62:

Some downsides of the EuroWordnet model

• Construction is not done uniformly

• Coverage differs

• Not all wordnets can communicate with one another

• Proprietary rights restrict free access and usage

• A lot of semantics is duplicated

• Complex and obscure equivalence relations due to linguistic differences between English and other languages


[Figure: the wordnets of eight languages, each linked to a shared Inter-Lingual Ontology containing the concepts Object, Device and TransportDevice. Word nodes shown per language:]

English Words: vehicle; car, train

Czech Words: dopravní prostředník; auto, vlak

French Words: véhicule; voiture, train

Estonian Words: liiklusvahend; auto, killavoor

German Words: Fahrzeug; Auto, Zug

Spanish Words: vehículo; auto, tren

Italian Words: veicolo; auto, treno

Dutch Words: voertuig; auto, trein

Next step: Global WordNet Grid


GWNG: Main Features

• Construct separate wordnets for each Grid language

• Contributors from each language encode the same core set of concepts plus culture/language-specific ones

• Synsets (concepts) can be mapped crosslinguistically via an ontology


The Ontology: Main Features

• Formal ontology serves as universal index of concepts

• List of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations

• Ontology contains only upper and mid-level concepts

• Concepts are related in a type hierarchy

• Concepts are defined with axioms


The Ontology: Main Features

• In addition to high-level (“primitive”) concepts, the ontology needs to express low-level concepts lexicalized in the Grid languages

• Additional concepts can be defined with expressions in Knowledge Interchange Format (KIF), which is based on first-order predicate calculus and atomic elements


The Ontology: Main Features

• Minimal set of concepts (Reductionist view):
– to express equivalence across languages
– to support inferencing

• Ontology must be powerful enough to encode all concepts that are lexically expressed in any of the Grid languages

• Ontology need not and cannot provide a linguistic encoding for all concepts found in the Grid languages
– Lexicalization in a language is not sufficient to warrant inclusion in the ontology
– Lexicalization in all or many languages may be sufficient

• Ontological observations will be used to define the concepts in the ontology


Ontological observations

• Identity criteria as used in OntoClean (Guarino & Welty 2002):

– rigidity: to what extent are properties true for entities in all worlds? You are always a human, but you can be a student for a short while.

– essence: what properties are essential for an entity? Shape is essential for a statue but not for the clay it is made of.

– unicity: what represents a whole and what entities are parts of these wholes? An ocean is a whole but the water it contains is not.


Type-role distinction

• Current WordNet treatment:
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)

• What’s wrong? (2) is defeasible, (1) is not:
*This husky is not a dog
This husky is not a working dog

Other roles: watchdog, sheepdog, herding dog, lapdog, etc….


Ontology and lexicon

• Hierarchy of disjunct types:
Canine → PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky

• Lexicon:
– NAMES for TYPES:
{poodle}EN, {poedel}NL, {pudoru}JP
(instance x Poodle)

– LABELS for ROLES:
{watchdog}EN, {waakhond}NL, {banken}JP
((instance x Canine) and (role x GuardingProcess))
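The split between names for types and labels for roles can be sketched in a few lines of Python. This is only an illustrative toy, not part of the Grid design: the `Entity` class, the `LEXICON` layout and the helper names are assumptions, and the type names follow the hierarchy line above.

```python
from dataclasses import dataclass

# Hypothetical miniature type hierarchy: child type -> parent type.
TYPE_PARENT = {
    "PoodleDog": "Canine",
    "NewfoundlandDog": "Canine",
    "GermanShepherdDog": "Canine",
    "Husky": "Canine",
}

@dataclass(frozen=True)
class Entity:
    rigid_type: str                 # holds for the entity's whole existence
    roles: frozenset = frozenset()  # processes it currently takes part in

def is_a(type_name, ancestor):
    """Walk up the hierarchy to test whether type_name falls under ancestor."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = TYPE_PARENT.get(type_name)
    return False

# A synset either names a type (role is None) or labels a role played by a type.
LEXICON = {
    "poodle": {"type": "PoodleDog", "role": None},
    "watchdog": {"type": "Canine", "role": "GuardingProcess"},
}

def denotes(word, entity):
    """True if the word applies to the entity in its current state."""
    entry = LEXICON[word]
    if not is_a(entity.rigid_type, entry["type"]):
        return False
    return entry["role"] is None or entry["role"] in entity.roles

guard = Entity("Husky", frozenset({"GuardingProcess"}))
retired = Entity("Husky")
print(denotes("watchdog", guard), denotes("watchdog", retired))
```

The same husky stops being a watchdog once it no longer takes part in a guarding process, while its type never changes; that is the defeasibility contrast between roles and types.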


Ontology and lexicon

• Hierarchy of disjunct types:
River; Clay; etc…

• Lexicon:
– NAMES for TYPES:
{river}EN, {rivier, stroom}NL
(instance x River)

– LABELS for dependent concepts:
{rivierwater}NL (water from a river => water is not a unit)
((instance x Water) and (instance y River) and (portion x y))
{kleibrok}NL (irregularly shaped piece of clay => non-essential)
((instance x Object) and (instance y Clay) and (portion x y) and (shape x Irregular))


Rigidity

• The “primitive” concepts represented in the ontology are rigid types

• Entities with non-rigid properties will be represented with KIF statements

• But: ontology may include some universal, core concepts referring to roles like father, mother


Properties of the Ontology

• Minimal: terms are distinguished by essential properties only

• Comprehensive: includes all distinct concept types of all Grid languages

• Allows definitions via KIF of all lexemes that express non-rigid, non-essential properties of types

• Logically valid, allows inferencing


Mapping Grid Languages onto the Ontology

• Explicit and precise equivalence relations among synsets in different languages:
– type hierarchy is minimal
– subtle differences can be encoded in KIF expressions

• Grid database contains wordnets with synsets that label
– either “primitive” types in the hierarchies,
– or words relating to these types in ways made explicit in KIF expressions

• If two languages create the same KIF expression, this is a statement of equivalence!
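As a rough illustration of "same KIF expression means equivalence", the sketch below groups synsets by their defining expression. It assumes, purely for illustration, that each expression is a flat conjunction stored as a list of tuples and that all languages use the same variable names; real KIF would need proper parsing and variable renaming. The example entries are taken from the watchdog and straathond slides.

```python
from collections import defaultdict

def normalize(expr):
    """Treat a conjunction as an unordered set of atoms, so clause order is ignored."""
    return frozenset(expr)

# (lemma, language) -> conjunction of atoms, written as tuples instead of KIF text.
GRID = {
    ("watchdog", "EN"): [("instance", "x", "Canine"), ("role", "x", "GuardingProcess")],
    ("waakhond", "NL"): [("role", "x", "GuardingProcess"), ("instance", "x", "Canine")],
    ("banken", "JP"): [("instance", "x", "Canine"), ("role", "x", "GuardingProcess")],
    ("straathond", "NL"): [("instance", "x", "Canine"), ("habitat", "x", "Street")],
}

def equivalence_classes(grid):
    """Group synsets whose normalized expressions are identical."""
    classes = defaultdict(set)
    for synset, expr in grid.items():
        classes[normalize(expr)].add(synset)
    return list(classes.values())

for group in equivalence_classes(GRID):
    print(sorted(group))
```

No language pair is ever compared directly: equivalence falls out of both synsets pointing to the same expression over the shared ontology.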


How to construct the GWNG

• Take an existing ontology as starting point;

• Use English WordNet to maximize the number of disjunct types in the ontology;

• Link English WordNet synsets as names to the disjunct types;

• Provide KIF expressions for all other English words and synsets

• Copy the relation to the ontology to other languages, including KIF statements built for English

• Revise KIF statements to make the mapping more precise

• Map all words and synsets that are not, and cannot be, mapped to English WordNet to the ontology:
– propose extensions to the type hierarchy
– create KIF expressions for all non-rigid concepts


Initial Ontology: SUMO (Niles and Pease)

SUMO = Suggested Upper Merged Ontology

--consistent with good ontological practice

--fully mapped to WordNet(s): 1000 equivalence mappings, the rest through subsumption

--freely and publicly available

--allows data interoperability

--allows NLP

--allows reasoning/inferencing


SUMO

• 1,000 generic, abstract, high-level terms

• 4,000 definitional statements

• MILO (Mid-Level Ontology)

closer to lexicon, WordNet


Mapping Grid languages onto the Ontology

• Check existing SUMO mappings to Princeton WordNet -> extend the ontology with rigid types for specific concepts

• Extend it to many other WordNet synsets

• Observe OntoClean principles! (Synsets referring to non-rigid, non-essential, non-unicitous concepts must be expressed in KIF)


Lexicalizations not mapped to WordNet

• Not added to the type hierarchy:
{straathond}NL (a dog that lives in the streets)
((instance x Canine) and (habitat x Street))

• Added to the type hierarchy:
{klunen}NL (to walk on skates from one frozen body of water to the next over land)
WalkProcess → KluunProcess
Axioms: (and (instance x Human) (instance y Walk) (instance z Skates) (wear x z) (instance s1 Skate) (instance s2 Skate) (before s1 y) (before y s2) etc…

• National dishes, customs, games,....


Most mismatching concepts are not new types

• They refer to sets of types in specific circumstances or to concepts that are dependent on these types; next to {rivierwater}NL there are many others:

{theewater}NL (water used for making tea)

{koffiewater}NL (water used for making coffee)

{bluswater}NL (water used for extinguishing fire)

• They relate to linguistic phenomena:
– gender, perspective, aspect, diminutives, politeness, pejoratives, part-of-speech constraints


KIF expression for gender marking

• {teacher}EN ((instance x Human) and (agent x TeachingProcess))

• {Lehrer}DE ((instance x Man) and (agent x TeachingProcess))

• {Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))
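Once the three synsets are defined over the same ontology, their mutual relations can be derived rather than stipulated. The Python sketch below shows one way this might look. It assumes that Man and Woman are subtypes of Human (the slides do not spell this out), and the `SYNSETS` layout and `relation` function are hypothetical.

```python
# Assumed fragment of the type hierarchy: subtype -> supertype.
SUBTYPE_OF = {"Man": "Human", "Woman": "Human"}

def type_subsumes(general, specific):
    """True if `specific` equals `general` or lies below it in the hierarchy."""
    while specific is not None:
        if specific == general:
            return True
        specific = SUBTYPE_OF.get(specific)
    return False

# synset -> (type of x, process in which x is the agent)
SYNSETS = {
    "teacher_EN": ("Human", "TeachingProcess"),
    "Lehrer_DE": ("Man", "TeachingProcess"),
    "Lehrerin_DE": ("Woman", "TeachingProcess"),
}

def relation(a, b):
    """Return how synset a relates to synset b: equivalent, broader, narrower or unrelated."""
    type_a, process_a = SYNSETS[a]
    type_b, process_b = SYNSETS[b]
    if process_a != process_b:
        return "unrelated"
    if type_a == type_b:
        return "equivalent"
    if type_subsumes(type_a, type_b):
        return "broader"
    if type_subsumes(type_b, type_a):
        return "narrower"
    return "unrelated"

print(relation("teacher_EN", "Lehrer_DE"))
print(relation("Lehrer_DE", "Lehrerin_DE"))
```

Under these assumptions the English synset comes out as broader than either German one, so the gender difference is carried by the type restriction and no extra teaching process is needed in the ontology.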


KIF expression for perspective

sell: subj(x), direct obj(z), indirect obj(y)
versus
buy: subj(y), direct obj(z), indirect obj(x)

(and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient z e))

The same process seen from a different perspective, realized through the choice of subject and object. Similar cases: Russian has two verbs for marry; apprendre in French can mean both teach and learn.
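A small Python sketch may make the point concrete: both verbs build the same event and differ only in how grammatical functions map to participant roles. The class, the `PERSPECTIVE` table and the example participants are illustrative assumptions, and treating the transferred entity z as the patient is one reading of the expression above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FinancialTransaction:
    source: str       # x: the party giving up the entity
    destination: str  # y: the party receiving the entity
    patient: str      # z: the entity that changes hands

# Each verb maps grammatical functions to roles in the same event type.
PERSPECTIVE = {
    "sell": {"subj": "source", "indirect_obj": "destination", "direct_obj": "patient"},
    "buy": {"subj": "destination", "indirect_obj": "source", "direct_obj": "patient"},
}

def interpret(verb, subj, direct_obj, indirect_obj):
    """Build the language-neutral event from a verb and its grammatical arguments."""
    frame = PERSPECTIVE[verb]
    args = {"subj": subj, "direct_obj": direct_obj, "indirect_obj": indirect_obj}
    roles = {frame[function]: filler for function, filler in args.items()}
    return FinancialTransaction(**roles)

# "Ann sells a book to Bob" and "Bob buys a book from Ann" describe one event.
sold = interpret("sell", subj="Ann", direct_obj="book", indirect_obj="Bob")
bought = interpret("buy", subj="Bob", direct_obj="book", indirect_obj="Ann")
print(sold == bought)
```

Only the perspective table is language-specific; the event itself stays a single concept in the ontology.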


Aspectual variants

• Slavic languages: two members of a verb pair for an ongoing event and a completed event.

• English: can mark perfectivity with particles, as in the phrasal verbs eat up and read through.

• Romance languages: mark aspect by verb conjugations on the same verb.

• Dutch: verbs with marked aspect can be created by prefixing a verb with door: doorademen, dooreten, doorfietsen, doorlezen, doorpraten (continue to breathe/eat/bike/read/talk).

• These verbs are restrictions on phases of the same process

• This does NOT warrant the extension of the ontology with separate processes for each aspectual variant


Kinship relations in Arabic

• عَمّ (Eam~) father's brother, paternal uncle.

• خَال (xaAl) mother's brother, maternal uncle.

• عَمَّة (Eam~ap) father's sister, paternal aunt.

• خَالَة (xaAlap) mother's sister, maternal aunt.


Kinship relations in Arabic

• .........

• شَقِيقَة ($aqiyqap) full sister, sister on the paternal and maternal side (as distinct from أُخْت (>uxot): 'sister', which may refer to a sister from the paternal or maternal side, or both sides).

• ثَكْلان (vakolAna) father bereaved of a child (as opposed to يَتِيم (yatiym), or يَتِيمَة (yatiymap) for feminine: 'orphan', a person whose father or mother died, or both father and mother died).

• ثَكْلَى (vakolaYa) mother bereaved of a child (as opposed to يَتِيم, or يَتِيمَة for feminine: 'orphan', a person whose father or mother died, or both father and mother died).


Complex Kinship concepts

father's brother, paternal uncle

WORDNET
paternal uncle => uncle => brother of ....????

ONTOLOGY
(=> (paternalUncle ?P ?UNC) (exists (?F) (and (father ?P ?F) (brother ?F ?UNC))))
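The ontology axiom can be tried out on a toy fact base. The Python sketch below is only an illustration: the fact sets and names are invented, and it reads the axiom as a full definition, whereas the KIF implication as written states only a necessary condition for paternalUncle.

```python
# Toy fact base (all individuals are invented for illustration).
FATHER = {("Layla", "Omar")}    # (person, that person's father)
BROTHER = {("Omar", "Karim")}   # (person, that person's brother)

def paternal_uncles(person, father_facts, brother_facts):
    """Return every UNC for which some F exists with father(person, F) and brother(F, UNC)."""
    return {
        uncle
        for (child, father) in father_facts
        if child == person
        for (sibling, uncle) in brother_facts
        if sibling == father
    }

print(paternal_uncles("Layla", FATHER, BROTHER))
```

A maternal uncle would follow the same pattern with a mother relation in place of `FATHER`. That is how the ontology can keep the two Arabic words apart, while a hyponymy link to uncle cannot.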


Universality as evidence

• The English verb cut abstracts from the precise process, but there are troponyms that imply the manner:
– snip and clip imply scissors; chop and hack imply a large knife or an axe

• In Dutch there is no general verb, only specific verbs: knippen “clip, snip, cut with scissors or a scissor-like tool”, snijden “cut with a knife or knife-like tool”, hakken “chop, hack, cut with an axe or similar tool”.

• If lexicalization of the specific processes is more universal, this can be seen as evidence that the specific processes, and not the generic verb, should be listed in the ontology


Open Questions/Challenges

• What is a word, i.e., a lexical unit?

• What is the status of complex lexemes like English lightning rod, word of mouth, find out, kick the bucket?

• What is a semantic unit, i.e. a concept?


Open Questions/Challenges

• Is there a core inventory of concepts that are universally encoded?

• If so, what are these concepts?

• How can crosslinguistic equivalence be verified?

• Is there systematicity to the language-specific extensions?

• What are the lexicalization patterns of individual languages?

• Are lexical gaps accidental or systematic?


Coverage: what belongs in a universal lexical database?

• Formal, linguistic criteria for inclusion

• Informal, cultural criteria

• Both are difficult to define and apply!


Advantages of the Global Wordnet Grid

• Shared and uniform world knowledge:
– universal inferencing
– uniform text analysis and interpretation

• More compact and less redundant databases

• Clearer notion of how languages map to the knowledge
– better criteria for expressing knowledge
– better criteria for understanding variation


[Figure: the synset dog with the related synsets watchdog, poodle, street dog, dachshund, lapdog, short hair dachshund, long hair dachshund, hunting dog, puppy and bitch. Two expansions are contrasted: expansion from a type to roles, and expansion with pure hyponymy relations.]


[Figure: the same dog synsets (dog, watchdog, poodle, street dog, dachshund, lapdog, short hair dachshund, long hair dachshund, hunting dog, puppy, bitch). Two expansions are contrasted: expansion from a role to types and other roles, and expansion with pure hyponymy relations.]


Automotive ontology: (http://www.ontoprise.de)


Who uses ontologies?
