Yielding Ontologies for Transition-Based Organization
ICT-211423, February 2008
Intelligent Content and Semantics
General presentation, February 2008, ICT-211423
KYOTO (ICT-211423) Overview
• Title: Yielding Ontologies for Transition-Based Organization
• Funded:
– 7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants
• Goal:
– Platform for knowledge sharing across languages and cultures
– Transfer of knowledge and information across different target groups, crossing linguistic, cultural and geographic boundaries
– Open text mining and deep semantic search
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
• Duration:
– March 2008 – March 2011
• Effort:
– 364 person months of work
KYOTO (ICT-211423) Overview
• Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
• Domain:
– Environmental domain, but usable in any domain
• Global:
– Both European and non-European languages
• Available:
– Free: as open source system and data (GPL)
• Future perspective:
– Content standardization that supports worldwide communication
– Global Wordnet Grid
Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
• Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)
[Diagram: KYOTO system overview. Labels shown: Capture; source types (Docs, URLs, Images, Experts); Concept Mining; Fact Mining; Kybots; Index; Search; Dialogue; users (Citizens, Governors, Companies, Environmental organizations); Domain Wiki; Wordnets and the Global Wordnet Grid; Universal Ontology with Top (Θ, Abstract, Physical), Middle (Substance, Process) and Domain (water, CO2) levels; example topics "CO2 emission" and "water pollution".]
Generic Knowledge & Language Layer
[Diagram: language-independent ontology sources (SUMO, MILO, DOLCE, GEMET, GEO DB, Wikipedia) and language-dependent ontology sources (wordnets, Wikipedia, Meaning, others) feed a central ontology with Top, Middle and Domain levels. Operations shown: merge, ontologize, map & parse. The central ontology holds a type hierarchy and axioms.]
Ontologize synsets
• (Semi-)rigid type hierarchy in the ontology:
– Canine => PoodleDog; NewfoundlandDog; DalmatianDog, etc.
• Wordnet consists of names for (semi-)rigid dog-types and other words for dogs with roles:
– NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP ⇔ (instance x PoodleDog)
– LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP ⇒ ((instance x Canine) and (role x GuardingProcess))
• Type hierarchy remains compact and pure
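The difference between names for types and labels for roles can be illustrated with a small sketch. This is only an illustration under assumptions: the `Synset` class and the functions `to_expression` and `extend_hierarchy` are hypothetical names, not part of the KYOTO software, and the output strings simply mirror the notation on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Synset:
    """A synset with its words per language and its ontology mapping (toy structure)."""
    words: Dict[str, str]        # language code -> word
    onto_type: str               # rigid type in the ontology
    role: Optional[str] = None   # process in which the instance plays a role, if any


def to_expression(synset: Synset) -> str:
    """Render the mapping in the notation used on the slide."""
    if synset.role is None:
        # Name for a type: equivalent to a rigid ontology type.
        return f"<=> (instance x {synset.onto_type})"
    # Label for a role: an existing rigid type plus a role restriction.
    return f"=> ((instance x {synset.onto_type}) and (role x {synset.role}))"


def extend_hierarchy(types: Set[str], synset: Synset) -> Set[str]:
    """Only type-naming synsets may add a type; role labels reuse existing types."""
    if synset.role is None:
        return types | {synset.onto_type}
    return set(types)


poodle = Synset({"EN": "poodle", "NL": "poedel", "JP": "pudoru"}, "PoodleDog")
watchdog = Synset({"EN": "watchdog", "NL": "waakhond", "JP": "banken"},
                  "Canine", role="GuardingProcess")

types = {"Canine"}
for synset in (poodle, watchdog):
    types = extend_hierarchy(types, synset)
    print(synset.words["EN"], to_expression(synset))
```

In this sketch "watchdog" adds no node to the type set, which is one way to read the claim that the type hierarchy remains compact.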
Ontologize
– "theewater" (water for making tea), Dutch
(exists (?A ?W)
  (and
    (instance ?W Water)
    (hasPurposeForAgent ?W
      (exists (?T)
        (and
          (instance ?T Tea)
          (part ?W ?T))))))
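To make the nesting of the "theewater" expression easier to inspect, it can be read into nested lists. The following is a minimal sketch, assuming the expression is treated as a plain parenthesized list; `parse` is a hypothetical helper and not a full parser for any ontology language.

```python
AXIOM = """
(exists (?A ?W)
  (and
    (instance ?W Water)
    (hasPurposeForAgent ?W
      (exists (?T)
        (and
          (instance ?T Tea)
          (part ?W ?T))))))
"""


def parse(text):
    """Parse one parenthesized expression into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if pos >= len(tokens):
            raise ValueError("unbalanced expression: missing ')'")
        if tokens[pos] == ")":
            raise ValueError("unexpected ')'")
        if tokens[pos] != "(":
            return tokens[pos], pos + 1          # atom
        items, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("unbalanced expression: missing ')'")
        return items, pos + 1                    # skip the closing ')'

    tree, end = read(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


tree = parse(AXIOM)
print(tree[0], tree[1])   # quantifier and its variables
print(tree[2][1])         # first conjunct
```

The parser raises `ValueError` on unbalanced input, which gives a quick check that a hand-written axiom is at least well-formed.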
Ontologize
• Ontologize concepts from a specific wordnet:
– Only disjoint types need to be added (Fellbaum and Vossen 2007).
– For example, CO2 is a type of substance, but greenhouse gas does not represent a different type of gas or substance; it refers to substances that play a specific role in specific circumstances.
• All languages can contribute
• Knowledge is shared among all participating languages through the mapping of the different wordnets to the ontology.
Knowledge mining
• Concept mining:
– Extract terms and relations in a language
– Map the terms to an existing wordnet
– Ontologize terms to concepts and axioms
• Fact mining:
– Define logical patterns
– Define expression rules in a language
Concept mining
[Diagram: source documents pass through linguistic processors for morpho-syntactic analysis, e.g. [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. Concept miners build a term hierarchy (emission; gas > greenhouse gas; area > agricultural area; relations "of" and "in") and map the terms to the English wordnet. Wordnet senses shown: emission:2, emission:3, gas:1, greenhouse gas:1, area:1, rural area:1, geographical area:1, region:3, location:3, substance:1, farmland:2, natural process:1.]
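The term hierarchy for the example phrase "the emission of greenhouse gases in agricultural areas" can be derived with a simple head-based heuristic. The sketch below assumes that the linguistic processors already deliver lemmatized chunks without determiners and that the head of an English term is its last word; `term_hierarchy` and `term_relations` are hypothetical names, not KYOTO components.

```python
# Each chunk: (preposition or None for the head NP, list of lemmas).
CHUNKS = [
    (None, ["emission"]),
    ("of", ["greenhouse", "gas"]),
    ("in", ["agricultural", "area"]),
]


def term_hierarchy(chunks):
    """Map every term to its parent term (None for terms without a parent)."""
    parents = {}
    for _prep, lemmas in chunks:
        head = lemmas[-1]                 # assumption: the last word is the head
        parents.setdefault(head, None)
        term = " ".join(lemmas)
        if term != head:
            parents[term] = head          # multiword term is placed under its head
    return parents


def term_relations(chunks):
    """Relate the head NP to every following PP through the PP's preposition."""
    (_, first), *rest = chunks
    head_term = " ".join(first)
    return [(head_term, prep, " ".join(lemmas)) for prep, lemmas in rest]


print(term_hierarchy(CHUNKS))
print(term_relations(CHUNKS))
```

Mapping the resulting terms to wordnet senses (for example choosing between emission:2 and emission:3) would be a separate disambiguation step that this sketch does not attempt.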
Concept integration
[Diagram: the English wordnet extended for the domain (emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, natural process:1) is linked to the ontology through the steps Ontologize and Axiomatize. Ontology labels shown: Θ; Abstract, Physical; Substance (H2O, CO2); Process (Chemical Reaction, GlobalWarming, CO2Emission, WaterPollution). Example axioms shown with CO2 and GreenhouseGas: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1).]
Fact mining
• KYBOT = Knowledge Yielding Robot
• Logical expression
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1)
• Expression rules per language:
– [N[s1]V[e1]]S
– [N[e1]N[s1]]N
– [[N[e1]][prep][N[s2]]]NP
• Ontology * Wordnets
– Capabilities
– Conditions: WNT -> adjectives, WNT -> nouns
– Causes: WNT -> verbs, WNT -> nouns
– Process: DamageProcess, ProduceProcess
• Kybot compiler
– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
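The formula "kybots = logical pattern + ontology + WN[Lx] + ER[Lx]" can be read as: one language-independent pattern, combined with a per-language wordnet and a per-language expression rule. The sketch below is a toy reading of that formula, not the Kybot compiler itself. The `Kybot` class, the ontology fragment, the two tiny lexicons and the pre-tagged input are all assumptions made for illustration; the relation name `katalyist` is copied from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Toy ontology fragment: concept -> parent (None at the top of this fragment).
ONTOLOGY: Dict[str, Optional[str]] = {
    "CO2": "Substance", "Substance": None,
    "Warming": "Process", "Process": None,
}

# Toy wordnets: lemma -> ontology concept (hypothetical entries).
WN_EN = {"co2": "CO2", "warm": "Warming"}
WN_NL = {"co2": "CO2", "opwarmen": "Warming"}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it in the ontology."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


@dataclass
class Kybot:
    var_types: Dict[str, str]          # logical pattern: variable -> required type
    relation: Tuple[str, str, str]     # logical pattern: (relation, arg1, arg2)
    rule: List[Tuple[str, str]]        # expression rule: (part of speech, variable)
    wordnet: Dict[str, str]            # wordnet of the target language

    def match(self, tagged: List[Tuple[str, str]]):
        """Return the instantiated facts, or None if the rule does not apply."""
        if len(tagged) != len(self.rule):
            return None
        binding = {}
        for (lemma, pos), (rule_pos, var) in zip(tagged, self.rule):
            concept = self.wordnet.get(lemma)
            if pos != rule_pos or not is_a(concept, self.var_types[var]):
                return None
            binding[var] = concept
        rel, a, b = self.relation
        return [("instance", a, binding[a]), ("instance", b, binding[b]), (rel, a, b)]


# One shared logical pattern, two language-specific Kybots.
PATTERN = dict(var_types={"s1": "Substance", "e1": "Warming"},
               relation=("katalyist", "s1", "e1"))
kybot_en = Kybot(**PATTERN, rule=[("N", "s1"), ("V", "e1")], wordnet=WN_EN)
kybot_nl = Kybot(**PATTERN, rule=[("N", "s1"), ("V", "e1")], wordnet=WN_NL)

print(kybot_en.match([("co2", "N"), ("warm", "V")]))
print(kybot_nl.match([("co2", "N"), ("opwarmen", "V")]))
```

Both Kybots yield the same facts because they share the pattern and the ontology; only the wordnet (and, in general, the expression rule) differs per language.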
Fact mining
[Diagram: source documents pass through linguistic processors for morpho-syntactic analysis, e.g. [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. Fact analysis draws on the ontology (Generic: Θ, Abstract, Physical, Substance, Process, Chemical Reaction; Domain: H2O, CO2, CO2 emission, water pollution), on wordnets and linguistic expressions, and on logical expressions. Result for the example: [the emission]NP = Process: e1; [of greenhouse gases]PP = Patient: s2; [in agricultural areas]PP = Location: a3.]
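The role assignment for the example phrase (Process: e1, Patient: s2, Location: a3) can be mimicked with a very small rule table. This is a sketch under strong assumptions: the mapping from prepositions to roles and the variable prefixes are invented here to reproduce the example, and real expression rules would be conditioned on the ontology types of the words rather than on prepositions alone.

```python
# Hypothetical rule table: preposition ("" for the head NP) -> role.
ROLE_BY_PREP = {"": "Process", "of": "Patient", "in": "Location"}
# Hypothetical variable prefixes chosen to match the example on the slide.
VAR_PREFIX = {"Process": "e", "Patient": "s", "Location": "a"}


def assign_roles(chunks):
    """Assign a role and a numbered variable to each chunk of the phrase."""
    facts = []
    for index, (prep, text) in enumerate(chunks, start=1):
        role = ROLE_BY_PREP.get(prep)
        if role is None:
            continue                      # no rule for this preposition
        facts.append((text, role, f"{VAR_PREFIX[role]}{index}"))
    return facts


PHRASE = [("", "the emission"), ("of", "greenhouse gases"), ("in", "agricultural areas")]
for text, role, var in assign_roles(PHRASE):
    print(f"[{text}] {role}: {var}")
```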
Wiki for knowledge sharing
• Uses the XFLOW workflow engine as underlying mechanism
• Easy interface tailored to domain experts who do not know the underlying complex data model (ontology plus multi-grid wordnet)
• Simplified wiki syntax that is much easier for non-technical users than e.g. HTML
• Web-based interface
• Rollback mechanism: each change to the content is versioned
• Search functions: synset
• Automatic downloading of information from web resources, e.g. Wikipedia
• Support for collaborative editing and consensus achievement, such as discussion forums and a list of last updates
• Role-based user management
Wiki for knowledge sharing
• Manage the underlying complex data model in order to keep it consistent:
– "water pollution" is inserted into a language-specific wordnet by a domain expert
– a new entry will be automatically inserted in the ontology extension and in every wordnet
– list all dummy entries to be filled in
– English is used as the common ground language to support the extension and propagation of changes between the different wordnets and the ontology
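The propagation step described above (one insertion creates an ontology entry plus dummy entries in every other wordnet) can be sketched as follows. This is an illustration only: `WordnetGrid`, its methods, the language codes, the concept label `WaterPollution` used as key and the Dutch word are assumptions, and the role of English as common ground language is not modelled.

```python
class WordnetGrid:
    """Toy model: one wordnet per language plus an ontology extension."""

    def __init__(self, languages):
        # language -> {concept label -> word, or None for a dummy entry}
        self.wordnets = {lang: {} for lang in languages}
        self.ontology_extension = set()

    def insert(self, lang, concept, word):
        """Insert a word for a concept and propagate dummy entries elsewhere."""
        self.ontology_extension.add(concept)
        for wordnet in self.wordnets.values():
            wordnet.setdefault(concept, None)     # dummy entry until filled in
        self.wordnets[lang][concept] = word

    def dummy_entries(self):
        """List all (language, concept) pairs that still need a word."""
        return sorted(
            (lang, concept)
            for lang, wordnet in self.wordnets.items()
            for concept, word in wordnet.items()
            if word is None
        )


grid = WordnetGrid(["en", "nl", "it", "es", "eu", "zh", "ja"])
grid.insert("en", "WaterPollution", "water pollution")
print(grid.dummy_entries())          # six languages still to be filled in
grid.insert("nl", "WaterPollution", "waterverontreiniging")
print(len(grid.dummy_entries()))
```

Listing the dummy entries gives each language team a to-do list, which matches the "list all dummy entries to be filled in" item.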
Evaluation
• Wordnets and ontologies are evaluated across linguistic partners
• Language and ontology experts will use the Wiki system to build the basic ontology and wordnet layers needed for the extension to the domain
• Domain experts will use the top layer and middle layer of wordnets and ontologies plus the Wiki system to encode the knowledge in their domains and reach consensus
• The system is tested by integration in a retrieval system
Evaluation
• Cross-lingual portal:
– show the effects of deep semantic processing for user scenarios
– match queries across languages and cultures
• User queries processed by Kybots and matched with deep semantic patterns:
– polluting substance and polluted substance
Knowledge sharing
• Domains share the generic:
– Generic knowledge from the wordnets and the ontology is re-used and shared in various domains
– Generic Kybots (knowledge yielding miners) are re-used and shared in various domains
• Languages share the knowledge:
– Ontologies (both generic and domain-specific) are shared across languages
– Kybots (both generic and domain-specific) are re-used and shared across languages
Kybot sharing
[Diagram: Kybots combine logical expressions with the ontology (Generic: Θ, Abstract, Physical, Substance, Process, Chemical Reaction; Domain: H2O, CO2, CO2 Emission, Water pollution) and with wordnets and linguistic expressions that link to the words of each language.]
Sharing Kybots
• General conceptual patterns using a simple logical expression: concentrations of substances, causal relations between processes or conditional states for processes
• Domain text:
– People usually do not use special words in a language to refer to the causal relation itself but they use general words such as “cause” or “factor”.
– Certain valid conditions can be specified in addition to the general ones, as they are relevant for the users.
• CO2 emissions can be derived from a certain process involving certain amounts of the substance CO2 but critical levels can be defined in the text miner as a conceptual constraint.
• Limit the ambiguity of interpretation that arises at the generic levels to only one interpretation at the domain level.
Major Innovations
• Specific knowledge acquired from different textual sources, domains and languages is grounded to a shared ontology: the specific is anchored in the generic.
• Specific text miners developed for different languages and domains are shared through logical expressions based on the shared ontology.
• Language-based knowledge is anchored to universal knowledge so that all languages can contribute to and benefit from acquisition.
• Community software allows for maintenance, fine-tuning and customization of the wordnets and ontology and consequently of the information system.
Results of KYOTO
• Open knowledge sharing and anchoring system.
• Ontologies:
– High-level and mid-level concepts needed to accommodate the information in the environmental domain.
– Most generic level to maximize re-usability.
– Precise enough to yield useful constraints in detecting relations in the domain.
– Database and XML data free for the whole community.
• Wordnets:
– Existing wordnets extended and harmonized with the ontology.
– Database and XML data free for the whole community.
• Acquisition tools:
– Software in all 7 languages to automatically extract synsets and synset-relations from text within a domain.
• Linguistic processors:
– Tokenization, segmentation, tagging, parsing and word-sense disambiguation.
– Use existing technology and resources.
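Table 3: Work package list
WP No | Work package title | Lead partic. | PM | Start | End
WP0 | Management | VUA | 9 | 1 | 36
WP1 | User requirements | VUA | 5 | 1 | 6
WP2 | System design | SYNTHEMA | 12 | 1 | 6
WP3 | Capture | IRION | 10 | 1 | 9
WP4 | Indexing | IRION | 11 | 4 | 12
WP5 | Knowledge mining | EHU | 120 | 7 | 30
WP6 | Knowledge integration | BBAW | 106 | 4 | 24
WP7 | Database systems and wiki | CNR-ILC-IIT | 25 | 1 | 24
WP8 | Domain extension | ECNC | 12 | 13 | 30
WP9 | Evaluation | ECNC | 20 | 4 | 33
WP10 | Exploitation | SYNTHEMA | 8 | 19 | 36
WP11 | Dissemination | VUA | 26 | 1 | 36
TOTAL | | | 364 | |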
[Diagram: work package data flow. Work packages shown: WP1 User requirements, WP2 System Design, WP3 Capture, WP4 Index, WP5 Knowledge mining, WP6 Knowledge Integration, WP7 Databases & wiki, WP8 Domain extension, WP9 Evaluation. Other labels: source data; Capture; text & metadata in XML format; Concept Miners; term hierarchy; term relations; Kybots; data & facts in XML format; Indexing; Index; access for end users; wordnet; ontology; domain wordnet; domain ontology; manual revision; Wiki; DEB Client; DEB Server; user scenarios; manual test; benchmark data; benchmarking.]
Milestone Overview
Mil. | Description | WPs no's | Lead | Delivery date
M1 | System architecture and design | WP1, WP2, WP9 | VUA | month 6
M2 | Generic knowledge layer | WP3, WP4, WP5, WP6 | BBAW | month 12
M3 | Intermediate evaluation | WP3, WP4, WP5, WP6, WP7, WP8, WP9 | ECNC | month 21
M4 | Final evaluation | WP3, WP4, WP5, WP6, WP7, WP8, WP9 | VUA | month 33
Complex questions in the cross-lingual environmental portal
Original question | Translation
Welke bedrijven stoten veel schadelijke stoffen uit | Which companies emit a lot of harmful substances?
Waar wordt de lucht gemeten | Where is the air being measured?
Akzo Nobel schuim Apeldoorns Kanaal | Akzo Nobel foam in the Apeldoorn Canal
Luchtvervuiling door verkeer | air pollution by traffic
Op hoeveel manieren wordt de luchtkwaliteit gemeten? | In how many ways is air quality measured?
ziek door luchtverontreiniging | sick because of air pollution
zware metalen in grondwater | heavy metals in ground water
milieu klacht afval batterij | environmental complaint waste batteries
welke bedrijven zijn grote luchtvervuilers | which companies are the biggest air polluters?
oorzaak luchtverontreining | cause of air pollution
groente uit tuin | vegetables from garden
luchtverontreiniging vanuit het ruhrgebied | air pollution from the Ruhr area
geluidsreducerende maatregelen | measures to reduce noise
Complex questions for Aarhus-registered documents of government permits
Original question | Translation
Waar wordt de lucht gemeten | Where is the air being measured?
Akzo Nobel schuim Apeldoorns Kanaal | Akzo Nobel foam in the Apeldoorn Canal
fijn stof emissies Electrabel | fine dust emissions Electrabel
Luchtvervuiling door verkeer | air pollution by traffic
ziek door luchtverontreiniging | sick because of air pollution
zware metalen in grondwater | heavy metals in ground water
milieu klacht afval batterij | environmental complaint waste batteries
welke bedrijven zijn grote luchtvervuilers | which companies are the biggest air polluters?
oorzaak luchtverontreining | cause of air pollution
groente uit tuin | vegetables from garden
luchtverontreiniging vanuit het ruhrgebied | air pollution from the Ruhr area
geluidsreducerende maatregelen | measures to reduce noise