
Yielding Ontologies for Transition-Based Organization
ICT-211423
February, 2008
Intelligent Content and Semantics


General presentation, February 2008, ICT-211423

KYOTO (ICT-211423) Overview
• Title: Yielding Ontologies for Transition-Based Organization
• Funded:
– 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants
• Goal:
– Platform for knowledge sharing across languages and cultures
– Knowledge transition and information across different target groups, transgressing linguistic, cultural and geographic boundaries.
– Open text mining and deep semantic search
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
• Duration:
– March 2008 – March 2011
• Effort:
– 364 person months of work.


KYOTO (ICT-211423) Overview
• Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
• Domain:
– Environmental domain, BUT usable in any domain
• Global:
– Both European and non-European languages
• Available:
– Free: as open source system and data (GPL)
• Future perspective:
– Content standardization that supports world wide communication
– Global Wordnet Grid


Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
• Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)


[Figure: KYOTO system overview. Labels in the diagram: Capture (Images, Docs, URLs, Experts); Index; Concept Mining; Fact Mining; Kybots; Search; Dialogue; users (Citizens, Governors, Companies, Environmental organizations); Domain Wiki; Wordnets; Global Wordnet Grid; Universal Ontology with Top (Abstract, Physical), Middle (Process, Substance) and Domain (water, CO2) levels; example topics: CO2 emission, water pollution.]


Generic Knowledge & Language Layer

[Figure: construction of the generic knowledge and language layer. Labels in the diagram: Top Ontology, Middle Ontology, Domain Ontology; Central Ontology; Wordnets; language-independent ontology sources and language-dependent ontology sources (SUMO, MILO, DOLCE, GEMET, GEO DB, Wikipedia, Meaning, others); operations: parse, merge, map, ontologize; outputs: type hierarchy, axioms.]


Ontologize synsets
• (Semi-)rigid type hierarchy in the ontology:
– Canine => PoodleDog; NewfoundlandDog; DalmatianDog, etc.
• Wordnet consists of names for (semi-)rigid dog-types and other words for dogs with roles:
– NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP ⇔ (instance x PoodleDog)
– LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP ⇒ ((instance x Canine) and (role x GuardingProcess))
• Type hierarchy remains compact and pure
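The distinction between type names and role labels can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Synset` class, its field names and the `type_hierarchy_additions` helper are hypothetical and not part of the KYOTO software; the words and ontology classes are the ones on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """Hypothetical container for one cross-lingual synset (not a KYOTO class)."""
    words: dict            # language code -> word, e.g. {"en": "poodle"}
    ontology_class: str    # rigid type the synset is anchored to
    role: str = ""         # process whose role the referent plays; empty for type names

    def to_expression(self, var="x"):
        """Render the ontology mapping in the notation used on the slide."""
        typing = f"(instance {var} {self.ontology_class})"
        if not self.role:
            return typing
        return f"({typing} and (role {var} {self.role}))"


def type_hierarchy_additions(synsets, known_types):
    """Only names for types add classes; role labels reuse an existing type."""
    new_types = {s.ontology_class for s in synsets if not s.role}
    return sorted(new_types - set(known_types))


poodle = Synset({"en": "poodle", "nl": "poedel", "ja": "pudoru"}, "PoodleDog")
watchdog = Synset({"en": "watchdog", "nl": "waakhond", "ja": "banken"},
                  "Canine", "GuardingProcess")

print(poodle.to_expression())
print(watchdog.to_expression())
print(type_hierarchy_additions([poodle, watchdog], {"Canine"}))
```

In this sketch "watchdog" never creates a new class, which is one way to read the claim that the type hierarchy remains compact.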


Ontologize
– "theewater" (water for making tea), Dutch

(exists (?A ?W)
  (and
    (instance ?W Water)
    (hasPurposeForAgent ?W
      (exists (?T)
        (and
          (instance ?T Tea)
          (part ?W ?T))))))
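To make the nesting of the "theewater" expression easier to inspect, the following Python sketch parses it into nested lists and reports which variables are bound and which are used. The parser and the helper names are illustrative assumptions, not a KYOTO or SUMO tool, and the expression is copied as it appears on the slide.

```python
def parse_sexpr(text):
    """Parse one parenthesised expression into nested Python lists of symbols."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing parenthesis
        return items

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def bound_and_used(tree, bound=None, used=None):
    """Collect variables bound by `exists` and variables used in the body."""
    bound = set() if bound is None else bound
    used = set() if used is None else used
    if isinstance(tree, str):
        if tree.startswith("?"):
            used.add(tree)
    elif tree and tree[0] == "exists":
        bound.update(tree[1])
        for sub in tree[2:]:
            bound_and_used(sub, bound, used)
    else:
        for sub in tree:
            bound_and_used(sub, bound, used)
    return bound, used


THEEWATER = """
(exists (?A ?W) (and (instance ?W Water)
  (hasPurposeForAgent ?W (exists (?T) (and (instance ?T Tea) (part ?W ?T))))))
"""

tree = parse_sexpr(THEEWATER)
bound, used = bound_and_used(tree)
print(sorted(bound), sorted(used), sorted(bound - used))
```

On the expression as transcribed, `?A` is bound but never used; the sketch only reports this and does not try to supply the missing argument.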


Ontologize
• Ontologize concepts from a specific wordnet:
– Only disjoint types need to be added (Fellbaum and Vossen 2007).
– For example, CO2 is a type of substance, but greenhouse gas does not represent a different type of gas or substance; it refers to substances that play a specific role in specific circumstances.
• All languages can contribute
• Knowledge is shared among all participating languages through the mapping of the different wordnets to the ontology.


Knowledge mining
• Concept mining:
– Extract terms and relations in a language
– Map the terms to an existing wordnet
– Ontologize terms to concepts and axioms
• Fact mining:
– Define logical patterns
– Define expression rules in a language


Concept mining

[Figure: concept mining pipeline. Source Documents pass through Linguistic Processors to a morpho-syntactic analysis, e.g. [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. Concept Miners derive a term hierarchy (emission; gas, greenhouse gas; area, agricultural area) with the relations "of" and "in", and link the terms to the English Wordnet (emission:2, emission:3, natural process:1, gas:1, greenhouse gas:1, substance:1, area:1, rural area:1, geographical area:1, region:3, location:3, farmland:2).]
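The term hierarchy in the concept mining example (gas above greenhouse gas, area above agricultural area) can be approximated with a simple head-word heuristic. The Python sketch below is an assumption about how such a hierarchy could be derived, not the method of the KYOTO concept miners; it relies on the head being the rightmost word, which fits English compounds but not every language in the project.

```python
def build_term_hierarchy(terms):
    """Attach each multiword term to the longest other term that is its suffix.

    Returns a dict mapping every term to its parent term, or None for roots.
    """
    term_set = set(terms)
    parents = {}
    for term in terms:
        words = term.split()
        parent = None
        # Try progressively shorter right-hand suffixes: "greenhouse gas" -> "gas".
        for start in range(1, len(words)):
            candidate = " ".join(words[start:])
            if candidate in term_set:
                parent = candidate
                break
        parents[term] = parent
    return parents


terms = ["emission", "gas", "greenhouse gas", "area", "agricultural area"]
hierarchy = build_term_hierarchy(terms)
for term, parent in hierarchy.items():
    print(f"{term} -> {parent}")
```

Mapping the resulting terms to wordnet senses such as gas:1 is a separate word-sense disambiguation step that this sketch does not attempt.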


Concept integration

[Figure: concept integration. Terms from the English Wordnet extended for the domain (emission:2, emission:3, natural process:1, gas:1, greenhouse gas:1, substance:1) are ontologized and axiomatized against the ontology. Ontology labels in the diagram: Θ, Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2, CO2Emission, WaterPollution, GlobalWarming, GreenhouseGas. Axiom shown for GreenhouseGas: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1).]


Fact mining
• KYBOT = Knowledge Yielding Robot
• Logical expression
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1)
• Expression rules per language:
– [N[s1]V[e1]]S
– [N[e1]N[s1]N
– [[N[e1]][prep][N[s2]]NP
• Ontology * Wordnets
– Capabilities
– Conditions: WNT -> adjectives, WNT -> nouns
– Causes: WNT -> verbs, WNT -> nouns
– Process: DamageProcess, ProduceProcess
• Kybot compiler
– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
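The compiler formula "kybots = logical pattern + ontology + WN[Lx] + ER[Lx]" can be read as: one language-independent pattern and ontology, combined with the wordnet and expression rules of each language Lx. The Python sketch below illustrates that reading only; the `Kybot` class, `compile_kybots`, the toy ontology links and the Dutch and English word lists are all assumptions made for the example, not the KYOTO compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kybot:
    """Hypothetical bundle of the four ingredients named on the slide."""
    language: str
    pattern: tuple    # logical pattern, shared by all languages
    ontology: dict    # shared ontology (toy: class -> parent class)
    wordnet: dict     # WN[Lx]: word -> ontology class for this language
    rules: tuple      # ER[Lx]: expression rules for this language


def compile_kybots(pattern, ontology, wordnets, expression_rules):
    """Build one Kybot per language that has both a wordnet and expression rules."""
    return {
        lang: Kybot(lang, pattern, ontology, wordnets[lang], expression_rules[lang])
        for lang in wordnets
        if lang in expression_rules
    }


# Pattern taken from the slide; everything below it is illustrative.
PATTERN = (("instance", "e1", "Burn"),
           ("instance", "e2", "Warming"),
           ("cause", "e1", "e2"))
ONTOLOGY = {"Burn": "Process", "Warming": "Process"}
WORDNETS = {
    "en": {"burn": "Burn", "warming": "Warming"},
    "nl": {"verbranden": "Burn", "opwarming": "Warming"},
}
RULES = {"en": ("[N[s1]V[e1]]S",), "nl": ("[N[s1]V[e1]]S",)}

kybots = compile_kybots(PATTERN, ONTOLOGY, WORDNETS, RULES)
for lang, kybot in kybots.items():
    print(lang, sorted(kybot.wordnet), kybot.rules)
```

The design point the sketch tries to show is that adding a language only requires a new wordnet and rule set; the pattern and ontology objects are reused unchanged.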


Fact mining

[Figure: fact mining pipeline. Source Documents pass through Linguistic Processors to a morpho-syntactic analysis, e.g. [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. Generic and domain knowledge is applied in three columns: Ontology (Θ, Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2, CO2 emission, water pollution), Wordnets & Linguistic Expressions, and Logical Expressions.]

Fact analysis:
– [the emission]NP: Process: e1
– [of greenhouse gases]PP: Patient: s2
– [in agricultural areas]PP: Location: a3
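The fact analysis of the example phrase (Process, Patient, Location) can be mimicked with a toy role assigner. This Python sketch is an assumption for illustration: the regular expression, the preposition-to-role table and the one-entry head lexicon are hypothetical stand-ins for the ontology, wordnets and expression rules that the Kybots would use.

```python
import re

# Innermost chunks of the form [text]NP or [text]PP.
CHUNK = re.compile(r"\[([^\[\]]+)\](NP|PP)")

# Toy lexicon: ontology class of a chunk's head noun (assumption).
HEAD_CLASS = {"emission": "Process"}
# Toy expression rules: preposition -> semantic role (assumption).
ROLE_BY_PREPOSITION = {"of": "Patient", "in": "Location"}


def analyse(bracketed):
    """Return (role, filler) pairs for each innermost chunk of the analysis."""
    facts = []
    for text, label in CHUNK.findall(bracketed):
        words = text.split()
        if label == "PP":
            role = ROLE_BY_PREPOSITION.get(words[0], "Unknown")
            filler = " ".join(words[1:])
        else:
            role = HEAD_CLASS.get(words[-1], "Unknown")
            filler = text
        facts.append((role, filler))
    return facts


phrase = "[[the emission]NP[of greenhouse gases]PP[in agricultural areas]PP]NP"
facts = analyse(phrase)
for role, filler in facts:
    print(f"{role}: {filler}")
```

A fixed preposition table is of course too coarse for real text ("in 2008" is not a location); the slides indicate that such ambiguity is meant to be constrained by the ontology rather than by the surface form alone.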


Wiki for knowledge sharing
• Uses XFLOW workflow engine as underlying mechanism;
• Easy interface tailored to domain experts who don't know the underlying complex data model (ontology plus multi grid wordnet);
• Simplified wiki syntax that is much easier to use for non technical users than e.g. HTML;
• Web based interface;
• Rollback mechanism: each change to the content is versioned;
• Search functions: synset;
• Automatic downloading of information from web resources e.g. Wikipedia;
• Support for collaborative editing and consensus achievement such as discussion forums, and list of last updates;
• Role based user management.


Wiki for knowledge sharing
• Manage the underlying complex data model in order to keep it consistent:
– "water pollution" is inserted into a language specific wordnet by a domain expert
– a new entry will be automatically inserted in the ontology extension and in every wordnet.
– list all dummy entries to be filled in.
– English used as the common ground language to support the extension and propagation of changes between the different wordnets and the ontology.
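The propagation behaviour described above (one insertion creates an ontology entry plus placeholder entries in every other wordnet) can be sketched as follows. The `Grid` class, its methods, the language codes and the Dutch example term are assumptions for illustration and do not reflect the actual wiki or database implementation.

```python
LANGUAGES = ["en", "nl", "it", "es", "eu", "zh", "ja"]  # the seven project languages


class Grid:
    """Toy model of an ontology extension linked to one wordnet per language."""

    def __init__(self, languages):
        self.ontology = {}  # concept id -> English label (common ground language)
        self.wordnets = {lang: {} for lang in languages}  # concept id -> term or None

    def insert(self, lang, term, english_label):
        """Insert a term in one wordnet and propagate dummy entries elsewhere."""
        concept = english_label.title().replace(" ", "")
        self.ontology.setdefault(concept, english_label)
        for code, wordnet in self.wordnets.items():
            if code == lang:
                wordnet[concept] = term
            else:
                wordnet.setdefault(concept, None)  # dummy entry, to be filled in
        return concept

    def dummy_entries(self):
        """List (language, concept) pairs that still lack a term."""
        return sorted(
            (lang, concept)
            for lang, wordnet in self.wordnets.items()
            for concept, term in wordnet.items()
            if term is None
        )


grid = Grid(LANGUAGES)
concept = grid.insert("nl", "watervervuiling", "water pollution")
print(concept, grid.dummy_entries())
```

Using `setdefault` for the other wordnets means a later insertion for the same concept in another language fills its own slot without overwriting terms that experts have already supplied.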


Evaluation
• Wordnets and ontologies are evaluated across linguistic partners;
• Language and ontology experts will use the Wiki system to build the basic ontology and wordnet layers needed for the extension to the domain;
• Domain experts will use the top layer and middle layer of wordnets and ontologies plus the Wiki system to encode the knowledge in their domains and reach consensus;
• The system is tested by integration in a retrieval system.


Evaluation
• Cross-lingual portal:
– show the effects of deep semantic processing for user-scenarios
– match queries across languages and cultures.
• User queries processed by Kybots and matched with deep semantic patterns:
– polluting substance and polluted substance


Knowledge sharing
• Domains share the generic:
– Generic knowledge from the wordnets and the ontology is re-used and shared in various domains
– Generic Kybots (knowledge yielding miners) are re-used and shared in various domains
• Languages share the knowledge:
– Ontologies (both generic and domain-specific) are shared across languages
– Kybots (both generic and domain-specific) are re-used and shared across languages


Kybot sharing

[Figure: Kybot sharing. Generic and domain levels are shown across three columns: Ontology (Θ, Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2, CO2 Emission, Water pollution), Wordnets and Linguistic Expressions (words), and Logical Expressions, with Kybots connecting them.]


Sharing Kybots
• General conceptual patterns using a simple logical expression: concentrations of substances, causal relations between processes or conditional states for processes
• Domain text:
– people usually do not use special words in a language to refer to the causal relation itself but they use general words such as "cause" or "factor".
– Certain valid conditions can be specified in addition to the general ones, as they are relevant for the users.
• CO2 emissions can be derived from a certain process involving certain amounts of the substance CO2 but critical levels can be defined in the text miner as a conceptual constraint.
• Limit the ambiguity of interpretation that arises at the generic levels to only one interpretation at the domain level.


Major Innovations
• Specific knowledge acquired from different textual sources, domains and languages is grounded to a shared ontology: the specific is anchored in the generic.
• Specific text miners developed for different languages and domains are shared through logical expressions based on the shared ontology.
• Language-based knowledge is anchored to universal knowledge so that all languages can contribute to and benefit from acquisition.
• Community software allows for maintenance, fine-tuning and customization of the wordnets and ontology and consequently of the information system.


Results of Kyoto
• Open knowledge sharing and anchoring system.
• Ontologies:
– high-level and mid-level concepts needed to accommodate the information in the environmental domain.
– Most generic level to maximize the re-usability
– Precise enough to yield useful constraints in detecting relations in the domain
– Database and XML data free for the whole community.
• Wordnets:
– Existing wordnets extended and harmonized with the ontology
– Database and XML data free for the whole community.
• Acquisition tools:
– Software in all 7 languages to automatically extract synsets and synset-relations from text within a domain.
• Linguistic processors:
– tokenization, segmentation, tagging, parsing and word-sense disambiguation.
– Use existing technology and resources.


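Table 3: Work package list

WP No | Work package title | Lead partic. | PM | Start | End
WP0 | Management | VUA | 9 | 1 | 36
WP1 | User requirements | VUA | 5 | 1 | 6
WP2 | System design | SYNTHEMA | 12 | 1 | 6
WP3 | Capture | IRION | 10 | 1 | 9
WP4 | Indexing | IRION | 11 | 4 | 12
WP5 | Knowledge mining | EHU | 120 | 7 | 30
WP6 | Knowledge integration | BBAW | 106 | 4 | 24
WP7 | Database systems and wiki | CNR-ILC-IIT | 25 | 1 | 24
WP8 | Domain extension | ECNC | 12 | 13 | 30
WP9 | Evaluation | ECNC | 20 | 4 | 33
WP10 | Exploitation | SYNTHEMA | 8 | 19 | 36
WP11 | Dissemination | VUA | 26 | 1 | 36
TOTAL | | | 364 | |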
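The person-month (PM) column of Table 3 can be cross-checked against the stated total of 364. The figures in the Python sketch below are read from the flattened table in this transcript, where the End, Start and PM digits are fused; the split used here is an assumption, supported by the fact that it reproduces the total.

```python
from collections import defaultdict

# (WP, lead participant, person months, start month, end month), as read from Table 3.
WORK_PACKAGES = [
    ("WP0", "VUA", 9, 1, 36),
    ("WP1", "VUA", 5, 1, 6),
    ("WP2", "SYNTHEMA", 12, 1, 6),
    ("WP3", "IRION", 10, 1, 9),
    ("WP4", "IRION", 11, 4, 12),
    ("WP5", "EHU", 120, 7, 30),
    ("WP6", "BBAW", 106, 4, 24),
    ("WP7", "CNR-ILC-IIT", 25, 1, 24),
    ("WP8", "ECNC", 12, 13, 30),
    ("WP9", "ECNC", 20, 4, 33),
    ("WP10", "SYNTHEMA", 8, 19, 36),
    ("WP11", "VUA", 26, 1, 36),
]

total_pm = sum(pm for _, _, pm, _, _ in WORK_PACKAGES)

# Person months summed per lead participant of each work package.
pm_by_lead = defaultdict(int)
for _, lead, pm, _, _ in WORK_PACKAGES:
    pm_by_lead[lead] += pm

print(total_pm)
print(dict(pm_by_lead))
```

Note that the per-lead sums group effort by the partner leading each work package, which is not necessarily the effort each partner contributes overall.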


[Figure: work package architecture. Work packages shown: WP1 User requirements; WP2 System Design; WP3 Capture; WP4 Index; WP5 Knowledge mining; WP6 Knowledge Integration; WP7 Databases & wiki; WP8 Domain extension; WP9 Evaluation. Components and data in the diagram: source data, Capture, Text & Meta data in XML Format, Indexing, Index, Concept Miners, term hierarchy, term relations, Kybots, Data & Facts in XML Format, wordnet, ontology, domain wordnet, domain ontology, Manual Revision, Wiki, DEB Client, DEB Server, Access end-users, User scenarios, Manual Test, Benchmark data, Benchmarking.]


Milestone Overview

Mil. | Description | WPs no's | Lead | Delivery date
M1 | System architecture and design | WP1, WP2, WP9 | VUA | month 6
M2 | Generic knowledge layer | WP3, WP4, WP5, WP6 | BBAW | month 12
M3 | Intermediate evaluation | WP3, WP4, WP5, WP6, WP7, WP8, WP9 | ECNC | month 21
M4 | Final evaluation | WP3, WP4, WP5, WP6, WP7, WP8, WP9 | VUA | month 33


Complex questions in the cross-lingual environmental portal

Original question | Translation
geluidsreducerende maatregelen | measurements to reduce noise
luchtverontreiniging vanuit het ruhrgebied | air pollution from the Ruhr area
groente uit tuin | vegetables from garden
oorzaak luchtverontreining | cause of air pollution
welke bedrijven zijn grote luchtvervuilers | what companies are the biggest polluters?
milieu klacht afval batterij | environmental complaint waste batteries
zware metalen in grondwater | heavy metals in ground water
ziek door luchtverontreiniging | sick because of air pollution
Op hoeveel manieren wordt de luchtkwaliteit gemeten? | On how many different ways can you measure air quality
Luchtvervuiling door verkeer | air pollution by traffic
Akzo Nobel schuim Apeldoorns Kanaal | Akzo Nobel foam in Apeldoorns Channel
Waar wordt de lucht gemeten | Where is air being measured
Welke bedrijven stoten veel schadelijke stoffen uit | What companies produce a lot of damaging substances?


Complex questions for Aarhus-registered documents of government permits

Original question | Translation
geluidsreducerende maatregelen | measurements to reduce noise
luchtverontreiniging vanuit het ruhrgebied | air pollution from the Ruhr area
groente uit tuin | vegetables from garden
oorzaak luchtverontreining | cause of air pollution
welke bedrijven zijn grote luchtvervuilers | what companies are the biggest polluters?
milieu klacht afval batterij | environmental complaint waste batteries
zware metalen in grondwater | heavy metals in ground water
ziek door luchtverontreiniging | sick because of air pollution
Luchtvervuiling door verkeer | air pollution by traffic
fijn stof emissies Electrabel | Fine dust emissions Electrabel
Akzo Nobel schuim Apeldoorns Kanaal | Akzo Nobel foam in Apeldoorns Channel
Waar wordt de lucht gemeten | Where is air being measured
