
Division of semantic labor in
the Global WordNet Grid
Piek Vossen, VU University Amsterdam
German Rigau, University of the Basque Country
5th Global Wordnet Conference
Mumbai, India, Jan 30 – Feb 5, 2010

Overview
• KYOTO as a domain implementation of the Global Wordnet Grid
• Scope of knowledge integration
• Division of linguistic labor
• How to integrate resources?
• How to make inferences?

KYOTO – some statistics
• European-Asian project
• March 2008 – March 2011
• 7 countries (The Netherlands, Italy, Germany, Spain, Taiwan, Japan, Czech Republic)
• 12 sites
– Universities & research institutes: VUA, CNR-ILC, CNR-IIT, BBAW, EHU, AS, NICT, Masaryk
– Companies: Synthema, Irion
– User organizations: ECNC, WWF
• 7 languages (English, Italian, Japanese, Dutch, Spanish, Basque, Chinese)

KYOTO – Overall architecture: overview of the KYOTO process
[Figure: components of the KYOTO process: Linguistic Processor; Kyoto Annotation Format (semantic & syntactic representation); Term Extractor (Tybot); Fact Extractor (Kybot); Wiki Editor (Wikyoto); Multilingual Knowledge Base (Wordnets & Ontology, Term Base, Fact Base).]

GWC2010, Mumbai
Applying ontology mappings

Global Wordnet Grid
[Figure: wordnets (Wn) for several languages linked to shared base concepts in the ontology (DOLCE/SUMO, OntoWordnet), with domain-specific extensions.]

Available repositories in KYOTO
Environment domain
• Term database: 500,000 terms per 1,000 documents per language
• Open data project:
– DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples)
– GeoNames: 8 million geographical names covering 6.5 million unique features, of which 2.2 million are populated places, plus 1.8 million alternate names
• Domain thesauri and taxonomies: Species 2000: 2.1 million species
• Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
• Ontologies: SUMO, DOLCE, SIMPLE

Kyoto Knowledge Base
[Figure: the Global Wordnet Grid (wordnets linked to DOLCE/SUMO and OntoWordnet base concepts in the ontology) extended per domain with term databases (T, 500K terms), vocabularies (V), DBPedia and Species 2000 (2,100K species).]

Species in the ontology
- This implies storing the 2.1 million species twice in the ontology.

Should all knowledge be stored in the central ontology?
• Vocabularies are too large for full inferencing with current reasoners
• Vocabularies are linguistically too diverse to be represented in an ontology
• The inferencing capabilities of formal ontologies are not needed for all levels of knowledge

Modeling knowledge in a domain
• Knowledge needs to be divided over different lexical and ontological layers:
– Precisely define the relations between lexical and ontological layers
– Precisely define the inferencing based on the distributed knowledge layers

Division of linguistic labor principle
• Putnam 1975:
– No need to know all the necessary and sufficient properties to determine if something is "gold"
– Assume that there is a way to determine these properties and that domain experts know how to recognize instances of these concepts.
– Speakers can still use the word "gold" and communicate useful information

Division of semantic labor principle
• Digital version of Putnam (1975):
– The computer does not need to have all the necessary and sufficient properties to determine if something is a "European tree frog"
– Computer assumes that there is a way to determine this and that domain experts (people) know how to recognize instances of these concepts.
– Computers can still reason with semantics and do useful stuff with textual data

What does the computer need to know?
• Distinction between rigid and non-rigid (Welty & Guarino 2002):
– being a "cat" is essential to an individual's existence and therefore rigid
– being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
– Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while he continues to exist as a cat
• All 2.1 million species are rigid concepts
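The rigid/non-rigid distinction can be made concrete with a small data structure. The Python sketch below only illustrates the idea and is not part of KYOTO: the class `Individual` and its methods are hypothetical. The rigid type is fixed at creation and exposed read-only, while roles can be added and dropped, as in the Felix example.

```python
from typing import Set


class Individual:
    """Sketch of the rigid / non-rigid distinction (Welty & Guarino 2002).

    Hypothetical class: the rigid type is set once and cannot be reassigned;
    non-rigid roles can be added and removed while the individual persists.
    """

    def __init__(self, name: str, rigid_type: str) -> None:
        self.name = name
        self._rigid_type = rigid_type   # e.g. "cat": essential, never changes
        self.roles: Set[str] = set()    # e.g. "pet": temporary, may change

    @property
    def rigid_type(self) -> str:
        # Read-only property without a setter: assigning to it raises AttributeError.
        return self._rigid_type

    def take_role(self, role: str) -> None:
        self.roles.add(role)

    def drop_role(self, role: str) -> None:
        self.roles.discard(role)


felix = Individual("Felix", "cat")
felix.take_role("pet")    # Felix becomes a pet ...
felix.drop_role("pet")    # ... and stops being one, but remains a cat
print(felix.rigid_type, felix.roles)
```

Modeling the rigid type as immutable and the roles as a mutable set mirrors the claim that roles, not species, are what changes over an individual's existence.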

What does the computer need to know?
• Roles and processes in documents have more information value than the defining properties of species:
– Species are defined in terms of physical properties already known to the expert
– Roles such as "invasive species", "migration species" and "threatened species" express THE important properties of instances of species
• Roles are typically the terms we learn from the text not the species!

Wordnet-ontology-relations
• Rigid synset relations to ontology:
– Synset:Endurant(Object); Synset:Perdurant(Event); Synset:Quality
– sc_equivalenceOf (= relation in WN-SUMO) or sc_subclassOf (+ relation in WN-SUMO)
• Non-rigid synset relations to ontology:
– Synset:Role; Synset:Endurant(Object); Synset:Perdurant(Event)
– sc_domainOf: range of ontology types that restricts a role
– sc_playRole: role that is being played
– sc_participantOf: the process in which the role is played
• Rigidity can be detected automatically (Rudify, 80% precision, IAG 80%) and is stored in wordnets as attributes of synsets
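A minimal sketch of how the two groups of mapping relations could be checked against a synset's rigidity value. The `Synset` and `OntologyMapping` classes and the `check_mappings` function are hypothetical. The rule that a non-rigid synset needs all three of sc_domainOf, sc_playRole and sc_participantOf is an assumption drawn from the examples in these slides, not a documented KYOTO constraint. Both spellings of the equivalence relation that occur in the slides (sc_equivalenceOf, sc_equivalentOf) are accepted.

```python
from dataclasses import dataclass, field
from typing import List

# Relation names as used in the slides (both spellings of equivalence occur).
RIGID_RELATIONS = {"sc_equivalenceOf", "sc_equivalentOf", "sc_subclassOf"}
NON_RIGID_RELATIONS = {"sc_domainOf", "sc_playRole", "sc_participantOf"}


@dataclass
class OntologyMapping:
    relation: str   # an sc_* relation
    target: str     # ontology type, role or process


@dataclass
class Synset:
    synset_id: str
    rigid: bool
    mappings: List[OntologyMapping] = field(default_factory=list)


def check_mappings(synset: Synset) -> List[str]:
    """Return the problems found; an empty list means the mappings fit the rigidity value."""
    allowed = RIGID_RELATIONS if synset.rigid else NON_RIGID_RELATIONS
    problems = [f"{m.relation} not allowed" for m in synset.mappings
                if m.relation not in allowed]
    if not synset.rigid:
        # Assumption of this sketch: a non-rigid synset needs domain, role and process.
        present = {m.relation for m in synset.mappings}
        problems += [f"missing {r}" for r in sorted(NON_RIGID_RELATIONS - present)]
    return problems


bird = Synset("bird_1_N", True, [OntologyMapping("sc_equivalentOf", "bird")])
migration_bird = Synset("migration_bird_1_N", False, [
    OntologyMapping("sc_domainOf", "bird"),
    OntologyMapping("sc_playRole", "done-by"),
    OntologyMapping("sc_participantOf", "migration"),
])
print(check_mappings(bird), check_mappings(migration_bird))
```

A non-rigid synset given only an equivalence mapping would be reported both for the disallowed relation and for each missing role relation.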

Global Wordnet Grid Model
[Figure: the English Wordnet in WN-LMF mapped to the KYOTO Ontology in OWL-DL (an extension of DOLCE LT).
Ontology: perdurant > change-of-location > migration (with done-by, has-source, has-destination, has-path); endurant > object > organism > bird; role > done-by.
Wordnet mappings: bird_1_N sc_equivalentOf bird (rigid); duck_1_N (rigid) is a hyponym of bird_1_N; migration_bird_1_N sc_domainOf bird, sc_playRole done-by, sc_participantOf migration (non-rigid); migration_4_N sc_equivalentOf migration; migrate_1_V sc_equivalentOf migration.]

Global Wordnet Grid Model
[Figure: the same model extended with other wordnets.
Dutch Wordnet: migrerende dieren_1_N (migrating species), non-rigid: sc_domainOf organism, sc_playRole done-by, sc_participantOf migration; equivalent_hypernym eng-30-02356039-n (bird). eend_1_N (duck): equivalent eng-30-01254614-n (duck).
Further wordnets: Spanish Wn, Basque Wn, Italian Wn, Japanese Wn, Chinese Wn, ...]
Cross-lingual equivalence mappings are expressed through wordnet mappings.
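The sketch below shows one way a Dutch synset could obtain ontology mappings through its link to the English wordnet. Readable synset names (e.g. `duck_1_N`, and the hypothetical entry `vogel_1_N`) stand in for the real identifiers, and the rule that an inherited sc_equivalentOf is weakened to sc_subclassOf when it comes from a more general synset is an assumption of this sketch, not a stated KYOTO rule.

```python
from typing import Dict, List, Optional, Tuple

Mapping = Tuple[str, str]  # (sc_* relation, ontology target)

# English wordnet fragment: hypernym links and direct ontology mappings.
ENG_HYPERNYM: Dict[str, str] = {"duck_1_N": "bird_1_N"}
ENG_ONTOLOGY: Dict[str, List[Mapping]] = {"bird_1_N": [("sc_equivalentOf", "bird")]}

# Dutch synsets: link type to English plus the English synset (vogel_1_N is hypothetical).
NLD_LINKS: Dict[str, Tuple[str, str]] = {
    "eend_1_N": ("equivalent", "duck_1_N"),
    "vogel_1_N": ("equivalent", "bird_1_N"),
    "migrerende_dieren_1_N": ("equivalent_hypernym", "bird_1_N"),
}
# Language-specific mappings stored in the Dutch wordnet itself.
NLD_ONTOLOGY: Dict[str, List[Mapping]] = {
    "migrerende_dieren_1_N": [("sc_domainOf", "organism"),
                              ("sc_playRole", "done-by"),
                              ("sc_participantOf", "migration")],
}


def ontology_mappings(nld_synset: str) -> List[Mapping]:
    # Mappings stored directly on the Dutch synset take precedence.
    if nld_synset in NLD_ONTOLOGY:
        return NLD_ONTOLOGY[nld_synset]
    link, eng = NLD_LINKS[nld_synset]
    exact = link == "equivalent"   # only an exact equivalent may inherit equivalence
    node: Optional[str] = eng
    while node is not None:
        if node in ENG_ONTOLOGY:
            if exact:
                return ENG_ONTOLOGY[node]
            # Inherited from a more general synset: equivalence weakens to subclass.
            return [("sc_subclassOf", t) if r == "sc_equivalentOf" else (r, t)
                    for r, t in ENG_ONTOLOGY[node]]
        node = ENG_HYPERNYM.get(node)   # climb the English hypernym chain
        exact = False
    return []


print(ontology_mappings("eend_1_N"))
```

Under these assumptions eend_1_N reaches the ontology through duck_1_N and its hypernym bird_1_N, so it ends up as a subclass of bird rather than an equivalent.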

Wordnet to ontology mappings
{create, produce, make}Verb, English
-> sc_equivalenceOf construction
{artifact, artefact}Noun, English
-> sc_domainOf physical_object
-> sc_playRole result-existence
-> sc_participantOf construction
{kunststof}Noun, Dutch // lit. artifact substance
-> sc_domainOf amount_of_matter
-> sc_playRole result-existence
-> sc_participantOf construction
{meat}Noun, English
-> sc_domainOf cow, sheep, pig
-> sc_playRole patient
-> sc_participantOf eat
{ 名 肉 , 食物 , 餐 }Noun, Chinese
-> sc_domainOf animal
-> sc_playRole patient
-> sc_participantOf eat
{لحم, طعام, غذاء}Noun, Arabic
-> sc_domainOf cow, sheep
-> sc_playRole patient
-> sc_participantOf eat
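The English and Arabic entries above share the role (patient) and the process (eat) and differ only in their sc_domainOf restriction. A small sketch that makes this comparison explicit; the `RoleMapping` tuple and the `differences` function are hypothetical, and only the English and Arabic data from the slide are encoded.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple


class RoleMapping(NamedTuple):
    domain: FrozenSet[str]   # sc_domainOf
    role: str                # sc_playRole
    process: str             # sc_participantOf


MEAT: Dict[str, RoleMapping] = {
    "eng": RoleMapping(frozenset({"cow", "sheep", "pig"}), "patient", "eat"),
    "ara": RoleMapping(frozenset({"cow", "sheep"}), "patient", "eat"),
}


def differences(a: RoleMapping, b: RoleMapping) -> Dict[str, Tuple[object, object]]:
    """Return the fields on which two role mappings disagree."""
    return {name: (getattr(a, name), getattr(b, name))
            for name in RoleMapping._fields
            if getattr(a, name) != getattr(b, name)}


diff = differences(MEAT["eng"], MEAT["ara"])
print(sorted(diff))                                  # ['domain']
print(set(MEAT["eng"].domain - MEAT["ara"].domain))  # {'pig'}
```

Keeping the restriction in the lexical mapping, rather than creating separate "meat" classes per language, leaves the ontology untouched while still recording the language-specific difference.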

Wordnet to ontology mappings
{teacher}Noun, English
-> sc_domainOf human
-> sc_playRole done-by
-> sc_participantOf teach
{leraar}Noun, Dutch // lit. male teacher
-> sc_domainOf man
-> sc_playRole done-by
-> sc_participantOf teach
{lerares}Noun, Dutch // lit. female teacher
-> sc_domainOf woman
-> sc_playRole done-by
-> sc_participantOf teach
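The Dutch entries restrict the same role and process to narrower ontology types. Assuming a type hierarchy in which man and woman are subclasses of human (the `PARENT` table below is hypothetical), a subsumption check can decide whether one lexicalization is a restriction of another. `subsumes` and `more_specific` are illustrative helpers, not KYOTO functions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fragment of the ontology type hierarchy (child -> parent).
PARENT: Dict[str, str] = {"man": "human", "woman": "human", "human": "organism"}

# (sc_domainOf, sc_playRole, sc_participantOf) per lexicalization, from the slide.
MAPPINGS: Dict[str, Tuple[str, str, str]] = {
    "teacher (eng)": ("human", "done-by", "teach"),
    "leraar (nld)": ("man", "done-by", "teach"),
    "lerares (nld)": ("woman", "done-by", "teach"),
}


def subsumes(general: str, specific: Optional[str]) -> bool:
    """True if `specific` equals `general` or lies below it in the hierarchy."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


def more_specific(a: str, b: str) -> bool:
    """True if word a has the same role and process as word b but a domain at or below b's."""
    domain_a, role_a, process_a = MAPPINGS[a]
    domain_b, role_b, process_b = MAPPINGS[b]
    return (role_a, process_a) == (role_b, process_b) and subsumes(domain_b, domain_a)


print(more_specific("leraar (nld)", "teacher (eng)"))   # True
```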

Wordnet-LMF
<LexicalEntry id="footmark">
<Lemma writtenForm="footmark" partOfSpeech="n"/>
<Sense id="footmark_1" synset="eng-30-06645039-n">
<MonolingualExternalRefs>
<MonolingualExternalRef externalSystem="Wordnet3.0" externalReference="" />
</MonolingualExternalRefs>
</Sense>
</LexicalEntry>
<Synset/>
<SenseAxis/>
<SenseAxis id="sa_ita16-eng30_001" relType="eq_synonym">
<Target ID="ita-16-1251-n" />
<Target ID="eng-30-13480848-n"/>
</SenseAxis>

WN-LMF Synset relations
<Synset id="eng-30-06645039-n" baseConcept="0"> <!-- footprint -->
<Definition gloss="mark of a foot or shoe on a surface">
<Statement example="the police made casts of the footprints in the soft earth outside the window" />
</Definition>
<OntologicalMetaProperties rigidValue="1">
<rigid score="0.57" author="Rudify1.0" date="2008-07-01"/>
<non-rigid score="0.09" author="Rudify1.0" date="2008-07-01"/>
</OntologicalMetaProperties>
<SynsetRelations/>
<MonolingualExternalRefs>
<MonolingualExternalRef externalSystem="SUMO" reference="superficialPart" relType="at"/>
<MonolingualExternalRef externalSystem="KYO" reference="mark" relType="sc_subclassOf"/>
</MonolingualExternalRefs>
</Synset>

WN-LMF Synset relations
<Synset id="eng-30-02356039-n" baseConcept="0"> <!-- migration bird -->
<Definition gloss="birds that migrate in winter to warmer regions"/>
<OntologicalMetaProperties rigidValue="0">
<rigid score="0.00" author="Rudify1.0" date="2008-07-01"/>
<non-rigid score="0.69" author="Rudify1.0" date="2008-07-01"/>
</OntologicalMetaProperties>
<SynsetRelations/>
<MonolingualExternalRefs>
<Statement>
<MonolingualExternalRef externalSystem="KYO" reference="bird" relType="sc_domainOf"/>
<MonolingualExternalRef externalSystem="KYO" reference="done-by" relType="sc_playRole"/>
<MonolingualExternalRef externalSystem="KYO" reference="migration" relType="sc_participantOf"/>
</Statement>
</MonolingualExternalRefs>
</Synset>
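Because the rigidity values and ontology mappings are stored as ordinary WN-LMF elements, they can be read with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the synset is well-formed (straight quotes, self-closing `<rigid/>` and `<non-rigid/>` elements). `read_synset` is a hypothetical helper, and it keeps only references whose externalSystem is "KYO".

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SYNSET_XML = """
<Synset id="eng-30-02356039-n" baseConcept="0">
  <Definition gloss="birds that migrate in winter to warmer regions"/>
  <OntologicalMetaProperties rigidValue="0">
    <rigid score="0.00" author="Rudify1.0" date="2008-07-01"/>
    <non-rigid score="0.69" author="Rudify1.0" date="2008-07-01"/>
  </OntologicalMetaProperties>
  <SynsetRelations/>
  <MonolingualExternalRefs>
    <Statement>
      <MonolingualExternalRef externalSystem="KYO" reference="bird" relType="sc_domainOf"/>
      <MonolingualExternalRef externalSystem="KYO" reference="done-by" relType="sc_playRole"/>
      <MonolingualExternalRef externalSystem="KYO" reference="migration" relType="sc_participantOf"/>
    </Statement>
  </MonolingualExternalRefs>
</Synset>
"""


def read_synset(xml_text: str) -> Tuple[str, bool, List[Tuple[str, str]]]:
    synset = ET.fromstring(xml_text.strip())
    meta = synset.find("OntologicalMetaProperties")
    rigid = meta is not None and meta.get("rigidValue") == "1"
    # iter() finds the references whether or not they are wrapped in <Statement>.
    refs = [(ref.get("relType"), ref.get("reference"))
            for ref in synset.iter("MonolingualExternalRef")
            if ref.get("externalSystem") == "KYO"]
    return synset.get("id"), rigid, refs


print(read_synset(SYNSET_XML))
```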

Division of labor in knowledge sources
[Figure: four knowledge sources, each chain running from specific to general.
Term database (500,000 terms): endemic frog, endangered frog, poisonous frog, alien frog.
SKOS database (2.1 million species): Eleutherodactylus augusti, Eleutherodactylus atrabracus (common name: barking frog) > Eleutherodactylus > Leptodactylidae > Anura > Amphibia > Chordata > Animalia.
Wordnet-LMF (100,000 synsets): frog:1, toad:1, toad frog:1, anuran:1, batrachian:1, salientian:1 > amphibian:3 > vertebrate:1, craniate:1 > chordate:1 > animal:1 (Base Concept).
Ontology-OWL-DL (2,000 types): endurant > physical-object > organism; perdurant > endanger.]
endanger

How to make inferences?
• SPARQL queries to large Virtuoso databases: aligned Species 2000, DBPedia
• SQL queries to the term database
• Graph matching on wordnets stored in DebVisDic
• Reasoning on a small ontology
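As an illustration of the SQL route, a term database with parent relations can be searched for the nearest wordnet mapping with a recursive query. The table layout, the term names and the synset label `frog_1_N` below are hypothetical, and SQLite (Python's built-in `sqlite3`) stands in for whatever database KYOTO actually uses.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    term_id   TEXT PRIMARY KEY,
    parent    TEXT,   -- parent term in the term hierarchy (NULL at the top)
    wn_synset TEXT    -- wordnet mapping, if the term has one
);
INSERT INTO term VALUES
    ('endangered European tree frog', 'European tree frog', NULL),
    ('European tree frog', 'frog', NULL),
    ('frog', NULL, 'frog_1_N');
""")

# Walk up the parent relation, remembering the depth, and keep the closest mapping.
FIRST_MAPPING = """
WITH RECURSIVE up(term_id, parent, wn_synset, depth) AS (
    SELECT term_id, parent, wn_synset, 0 FROM term WHERE term_id = ?
    UNION ALL
    SELECT t.term_id, t.parent, t.wn_synset, up.depth + 1
    FROM term AS t JOIN up ON t.term_id = up.parent
)
SELECT wn_synset FROM up WHERE wn_synset IS NOT NULL ORDER BY depth LIMIT 1
"""


def first_wordnet_mapping(term: str) -> Optional[str]:
    row = conn.execute(FIRST_MAPPING, (term,)).fetchone()
    return row[0] if row else None


print(first_wordnet_mapping("endangered European tree frog"))   # frog_1_N
```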

KYOTO Project meeting, Jan 13-14th 2010, PolyU Hong Kong
Ontotagger applied to KAF
Apply WSD to every term in the KAF representation of a text. Then, for each term:
(a) If the term has a wordnet synset (from WSD), check the synset for ontology mappings; if there are none, traverse the wordnet hierarchy to find the first mapping.
(b) Else, check the SKOS database for a wordnet mapping; if necessary, traverse the broader relations up to the first wordnet mapping and go to (a).
(c) Else, check the term database for wordnet mappings; if necessary, traverse the parent relations up to the first wordnet mapping and go to (a).
Finally, collect all mappings from the ontology and all (relevant) ontological implications and insert them into the KAF representation of the text.
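The lookup order of steps (a) to (c) can be sketched with in-memory dictionaries standing in for the wordnet, the SKOS database and the term database. All resource contents and names below are hypothetical, and the sketch stops at the first ontology mapping instead of collecting all mappings and implications.

```python
from typing import Dict, Optional

# Hypothetical toy resources.
WN_HYPERNYM: Dict[str, str] = {"duck_1_N": "bird_1_N", "bird_1_N": "animal_1_N"}
WN_ONTOLOGY: Dict[str, str] = {"bird_1_N": "bird", "animal_1_N": "animal"}
SKOS_BROADER: Dict[str, str] = {"Anas platyrhynchos": "Anas"}   # species -> broader taxon
SKOS_WORDNET: Dict[str, str] = {"Anas": "duck_1_N"}             # taxon -> wordnet synset
TERM_PARENT: Dict[str, str] = {"endangered duck": "duck"}       # term -> parent term
TERM_WORDNET: Dict[str, str] = {"duck": "duck_1_N"}             # term -> wordnet synset


def climb(start: str, parent: Dict[str, str], mapping: Dict[str, str]) -> Optional[str]:
    """Follow `parent` links from `start` until a node with an entry in `mapping` is found."""
    node: Optional[str] = start
    while node is not None:
        if node in mapping:
            return mapping[node]
        node = parent.get(node)
    return None


def ontotag(term: str, synset: Optional[str] = None) -> Optional[str]:
    if synset is None:
        # (b) SKOS broader relations, then (c) term-database parent relations.
        synset = (climb(term, SKOS_BROADER, SKOS_WORDNET)
                  or climb(term, TERM_PARENT, TERM_WORDNET))
    if synset is None:
        return None
    # (a) first ontology mapping on the wordnet hypernym chain.
    return climb(synset, WN_HYPERNYM, WN_ONTOLOGY)


print(ontotag("duck", synset="duck_1_N"), ontotag("Anas platyrhynchos"), ontotag("endangered duck"))
```

All three routes converge on the same wordnet synset and therefore on the same ontology type, which is the point of letting each resource handle only its own layer.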

Examples
1. Migration birds in the Humber Estuary.
2. The migration of birds to the Humber Estuary
3. Bird migration in the Humber Estuary
4. Birds that migrate to the Humber Estuary

Annotation of ontological implications in KAF
<!-- Migration birds in the Humber Estuary -->
<term lemma="migration bird" pos="N.pl">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_domainOf" reference="bird"/>
<externalRef resource="ontology" relation="sc_participantOf" reference="migration"/>
<externalRef resource="ontology" relation="sc_playRole" reference="done-by"/>
<externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
<externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
</externalReference>
</term>
<term lemma="in" pos="P"/>
<term lemma="Humber Estuary"><externalRef resource="ontology" reference="location"/></term>

<!-- Bird migration in the Humber Estuary -->
<term lemma="bird" pos="N.pl">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalentOf" reference="bird"/>
</externalReference>
</term>
<term lemma="migration" pos="N">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalentOf" reference="migration"/>
<externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
<externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
</externalReference>
</term>
<term lemma="in"/>
<term lemma="Humber Estuary"><externalRef resource="ontology" reference="location"/></term>

<!-- Birds that migrate to the Humber Estuary -->
<term lemma="bird" pos="N.pl">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalentOf" reference="bird"/>
</externalReference>
</term>
<term lemma="migrate" pos="V">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalentOf" reference="migration"/>
<externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
<externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
</externalReference>
</term>
<term lemma="to"/>
<term lemma="Humber Estuary"><externalRef resource="ontology" reference="location"/></term>
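The implied roles in these annotations follow from the ontology definition of migration, whatever the surface form (noun compound, noun, or verb). A sketch of how such `<term>` elements could be generated with `xml.etree.ElementTree`, assuming a hypothetical `PROCESS_ROLES` table that lists each process's roles with their "some" restriction; `kaf_term` is not a KYOTO function.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical ontology fragment: roles of a process and their "some" restriction.
PROCESS_ROLES: Dict[str, List[Tuple[str, str]]] = {
    "migration": [("done-by", "physical-plurality"),
                  ("has-destination", "particular"),
                  ("has-source", "particular"),
                  ("has-path", "particular")],
}


def kaf_term(lemma: str, pos: str, mappings: List[Tuple[str, str]]) -> ET.Element:
    """Build a KAF <term> with its wordnet-based mappings plus the implied roles."""
    term = ET.Element("term", lemma=lemma, pos=pos)
    refs = ET.SubElement(term, "externalReference")
    for relation, reference in mappings:
        ET.SubElement(refs, "externalRef", resource="ontology",
                      relation=relation, reference=reference)
    # A process the term is equivalent to, or participates in, contributes its roles.
    processes = [ref for rel, ref in mappings if rel in ("sc_equivalentOf", "sc_participantOf")]
    for process in processes:
        for role, filler in PROCESS_ROLES.get(process, []):
            ET.SubElement(refs, "externalRef", resource="ontology",
                          relation="implied", reference=role, some=filler)
    return term


migration_bird = kaf_term("migration bird", "N.pl", [("sc_domainOf", "bird"),
                                                     ("sc_participantOf", "migration"),
                                                     ("sc_playRole", "done-by")])
print(ET.tostring(migration_bird, encoding="unicode"))
```

Calling `kaf_term("migrate", "V", [("sc_equivalentOf", "migration")])` would yield the same four implied roles, while `bird` (mapped to a type, not a process) gets none.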

Kybot profiles
IF <! bird migration to HE>
T1 + to + T2 &
T1.impliedType="change_of_location" & T1.impliedRole="has-target" &
T2.Type="location"
THEN
<location-target, T1, T2>
IF <! species migration from HE>
T1 + from + T2 &
T1.impliedType="change_of_location" & T1.impliedRole="has-source" &
T2.Type="location"
THEN
<location-source, T1, T2>
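A minimal Python sketch of how the two profiles could be matched over a sequence of KAF terms. The term dictionaries and `apply_profiles` are hypothetical. The sketch uses the profiles' type name change_of_location, but the role name has-destination as it appears in the KAF annotations, where the first profile writes has-target.

```python
from typing import Dict, List, Tuple

Term = Dict[str, object]   # lemma plus ontological annotations (hypothetical shape)

# (preposition, implied role required on T1, output relation)
PROFILES: List[Tuple[str, str, str]] = [
    ("to", "has-destination", "location-target"),
    ("from", "has-source", "location-source"),
]


def apply_profiles(terms: List[Term]) -> List[Tuple[str, str, str]]:
    """Match the pattern T1 + preposition + T2 over every window of three terms."""
    facts = []
    for t1, prep, t2 in zip(terms, terms[1:], terms[2:]):
        for word, role, output in PROFILES:
            if (prep["lemma"] == word
                    and "change_of_location" in t1.get("impliedTypes", ())
                    and role in t1.get("impliedRoles", ())
                    and t2.get("type") == "location"):
                facts.append((output, t1["lemma"], t2["lemma"]))
    return facts


migrate = {"lemma": "migrate",
           "impliedTypes": {"change_of_location"},
           "impliedRoles": {"done-by", "has-destination", "has-source", "has-path"}}
sentence = [{"lemma": "bird"}, migrate, {"lemma": "to"},
            {"lemma": "Humber Estuary", "type": "location"}]
print(apply_profiles(sentence))   # [('location-target', 'migrate', 'Humber Estuary')]
```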

Kybot Knowledge Patterns
<events>
<event eid="e1" target="t2" lemma="feed" pos="V" tense="PAST" aspect="NONE" polarity="POS"/>
<event eid="e2" target="t20" lemma="migrate" pos="V" tense="PRESENT" aspect="NONE" polarity="POS"/>
<role rid="r1" event="e1" target="t1" rtype="agent"/>
<role rid="r2" event="e1" target="t3" rtype="patient"/>
<role rid="r3" event="e1" target="t9" rtype="theme"/>
<role rid="r4" event="e2" target="t21" rtype="agent"/>
<role rid="r5" event="e2" target="t22" rtype="source"/>
<role rid="r6" event="e2" target="t24" rtype="goal"/>
</events>
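The events and roles can be regrouped into one frame per event. A sketch using `xml.etree.ElementTree` on the events above (with unique rid values); `roles_by_event` is hypothetical, and it keeps only one target per role type within an event.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict

EVENTS_XML = """<events>
  <event eid="e1" target="t2" lemma="feed" pos="V" tense="PAST" aspect="NONE" polarity="POS"/>
  <event eid="e2" target="t20" lemma="migrate" pos="V" tense="PRESENT" aspect="NONE" polarity="POS"/>
  <role rid="r1" event="e1" target="t1" rtype="agent"/>
  <role rid="r2" event="e1" target="t3" rtype="patient"/>
  <role rid="r3" event="e1" target="t9" rtype="theme"/>
  <role rid="r4" event="e2" target="t21" rtype="agent"/>
  <role rid="r5" event="e2" target="t22" rtype="source"/>
  <role rid="r6" event="e2" target="t24" rtype="goal"/>
</events>"""


def roles_by_event(xml_text: str) -> Dict[str, Dict[str, str]]:
    root = ET.fromstring(xml_text)
    lemmas = {e.get("eid"): e.get("lemma") for e in root.iter("event")}
    frames: Dict[str, Dict[str, str]] = defaultdict(dict)
    for role in root.iter("role"):
        # Group each role under the lemma of its event: rtype -> target term id.
        frames[lemmas[role.get("event")]][role.get("rtype")] = role.get("target")
    return dict(frames)


print(roles_by_event(EVENTS_XML))
```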

Conclusion: Should all knowledge be stored in the central ontology?
• Vocabularies are too large for full inferencing
• Vocabularies are linguistically too diverse to be represented in an ontology
• The inferencing capabilities of formal ontologies are not needed for all levels of knowledge
• A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
– SKOS vocabularies and term databases
– wordnet (WN-LMF)
– ontology (OWL-DL)
• Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning.
• Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.

Conclusions
• Ontologies are abstract and minimal; lexicons are large and rich
• Semantic relations in lexicons are complementary to ontological relations
• Semantic relations expressed in a language system should be compatible with ontologies
• Large vocabularies of types (rigid things in the world) can be mapped to the ontology through combinations of lexical relations and basic ontological mappings
• Lexicalizations of contextual and subjective concepts need to be expressed through more complex relations
• Equivalences across languages are expressed partly through ontological expressions and partly through mappings across lexicons

Applying WSD to terms
<terms>
<term tid="t1" type="open" lemma="bird" pos="N">
<span id="w1"/>
<externalReferences> <!-- inserted by word-sense-disambiguation -->
<externalRef resource="wn30g" ref="eng-30-01855672-n" conf="0.38"/>
<externalRef resource="wn30g" ref="eng-30-10157744-n" conf="0.31"/>
<externalRef resource="wn30g" ref="eng-30-07646821-n" conf="0.30"/>
</externalReferences>
</term>
</terms>
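A sketch of selecting the highest-confidence sense from this WSD output with `xml.etree.ElementTree`. `best_sense` is a hypothetical helper, and keeping only the top sense is a simplification of this sketch; the other scored candidates remain available in the KAF term.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TERM_XML = """<term tid="t1" type="open" lemma="bird" pos="N">
  <span id="w1"/>
  <externalReferences>
    <externalRef resource="wn30g" ref="eng-30-01855672-n" conf="0.38"/>
    <externalRef resource="wn30g" ref="eng-30-10157744-n" conf="0.31"/>
    <externalRef resource="wn30g" ref="eng-30-07646821-n" conf="0.30"/>
  </externalReferences>
</term>"""


def best_sense(term_xml: str, resource: str = "wn30g") -> Optional[str]:
    term = ET.fromstring(term_xml)
    # (confidence, synset id) pairs; max() picks the highest confidence.
    candidates = [(float(ref.get("conf")), ref.get("ref"))
                  for ref in term.iter("externalRef")
                  if ref.get("resource") == resource]
    return max(candidates)[1] if candidates else None


print(best_sense(TERM_XML))   # eng-30-01855672-n
```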