
Kunal Narsinghani, Ashwini Lahane

Ontology Mapping and link discovery

Agenda

Introduction

Levels of heterogeneity

Previous work in the field

PROMPT suite of tools

PROMPT on Protégé

The Web of Data

CRS: managing co-references

Silk: a link discovery framework

Introduction

Can a single ontology suffice for various applications?

Definition: ontology mapping is the task of relating the vocabularies of two ontologies that share the same domain of discourse

A mapping is a morphism: a collection of functions assigning the symbols used in one vocabulary to the symbols in the other [1]

Mappings provide a common layer through which ontologies can be accessed and can exchange information

Translation is different from mapping
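To make the definition concrete, here is a minimal sketch of a mapping as a function on symbols. The two toy vocabularies, the MAPPING table and the preserves_subclass helper are all hypothetical and do not come from the slides; the check only illustrates one way a mapping can be required to respect structure.

```python
# Hypothetical toy ontologies, each given as a set of (subclass, superclass) pairs.
O1_SUBCLASS = {("Car", "Vehicle"), ("Truck", "Vehicle")}
O2_SUBCLASS = {("Automobile", "Transport"), ("Lorry", "Transport"), ("Bike", "Transport")}

# The mapping assigns every symbol of the first vocabulary to a symbol of the second.
MAPPING = {"Car": "Automobile", "Truck": "Lorry", "Vehicle": "Transport"}


def preserves_subclass(mapping, source_pairs, target_pairs):
    """Return True if every subclass pair of the source is also one in the target after mapping."""
    return all((mapping[child], mapping[parent]) in target_pairs
               for child, parent in source_pairs)


if __name__ == "__main__":
    print(preserves_subclass(MAPPING, O1_SUBCLASS, O2_SUBCLASS))
```

A mapping that sent "Vehicle" to "Bike" would fail this check, because ("Automobile", "Bike") is not a subclass pair in the second ontology.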

Introduction

An analogy to the problem: clocks

Levels of Heterogeneity in Ontologies

Syntactic

Structural

Semantic

Mapping discovery

The first approach is to use a reference ontology

Example – the upper Ontologies SUMO and DOLCE

What if a shared ontology is not available?

Structural & definitional information can be used to discover mappings

Example tools: IF-Map, QOM, MAFRA and PROMPT

IF-MAP architecture

Fig: The steps in IF-MAP

PROMPT Suite of Tools

Interactive tools for ontology merging and mapping

Ontology: a formal specification of domain information that facilitates knowledge sharing and reuse

Different ontologies may overlap and need to be reconciled:

Determine correlation

Find all concepts

Determine similarities

Change source ontologies or remove overlap

Record mapping for future reference

Ontology Management

Tasks:

Finding correlations

Merging ontologies

Version management

Factoring ontologies

Tools benefit from being tightly integrated into a single framework:

Uniform user interface

Same interaction paradigms

Easy access from one tool to another

PROMPT Knowledge Model

Based on the knowledge model of Protégé

Frame-based

Types of frames:

Class: a set of entities specifying a concept

Slots: attributes of a class; each slot has a domain and a range and must have a unique name

Instances: elements of a class

PROMPT Framework

Tools for multiple-ontology management

Extension to the Protégé ontology-editing environment

Open architecture allows easy extension with plugins

Tools in PROMPT:

IPROMPT: an interactive ontology merging tool

ANCHORPROMPT: a graph-based tool for finding similarities between ontologies

PROMPTDIFF: a tool for finding a diff between two versions of the same ontology

PROMPTFACTOR: a tool for extracting a part of an ontology

PROMPT Framework

IPROMPT

Interactive ontology merging tool

Leads the user through the merging process

Makes suggestions for merging

Identifies inconsistencies and potential problems

Suggests strategies for resolving them

Uses the structure of concepts and their relations, along with user input

Decisions are based on local context

Iterative

IPROMPT Algorithm

Creates initial suggestions based on lexical similarity of names

The merged ontology contains frames that are similar to frames in the input ontologies

Two ontologies O1 and O2 are merged to form Om

Merging decisions are designer- and task-dependent

A set of knowledge-based operations is defined

For each operation, IPROMPT determines:

Changes performed automatically

New merging suggestions

Inconsistencies and potential problems
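The first step, initial suggestions from lexical similarity of names, can be sketched as follows. This is only an illustration: it uses Python's difflib as a stand-in string measure, and the function names and the 0.8 threshold are assumptions, not the matcher PROMPT actually ships.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity of two frame names, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def initial_suggestions(frames_o1, frames_o2, threshold=0.8):
    """Propose (frame in O1, frame in O2, score) merge candidates above the threshold."""
    suggestions = []
    for f1 in frames_o1:
        for f2 in frames_o2:
            score = name_similarity(f1, f2)
            if score >= threshold:
                suggestions.append((f1, f2, score))
    # Best candidates first, so the user reviews the most likely merges early.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


if __name__ == "__main__":
    for f1, f2, score in initial_suggestions(["TRIAL", "PERSON"], ["Trial", "Person", "Design"]):
        print(f1, f2, round(score, 2))
```

In the interactive loop described above, each accepted suggestion would trigger automatic changes and a fresh round of suggestions; that part is not modelled here.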

Class hierarchies

Suggestion for merging

IPROMPT Operations

Merge classes

Merge slots

Merge instances

Shallow copy of a class: copies the class from the source ontology to the merged ontology

Deep copy of a class: also copies all the parents of the class, up to the root of the hierarchy
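The difference between the two copy operations can be sketched with a toy representation in which an ontology is a dictionary from class name to its list of parents. The representation and the helper names are assumptions made for illustration; the sketch also assumes a shallow copy keeps the parent links, which is what makes dangling references possible.

```python
def shallow_copy(name, source, merged):
    """Copy only the class itself; its parent links may dangle in the merged ontology."""
    merged[name] = list(source[name])


def deep_copy(name, source, merged):
    """Copy the class and every ancestor reachable through parent links."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        merged[current] = list(source[current])
        stack.extend(source[current])


def dangling_references(merged):
    """Parents that are referenced by some class but are not present themselves."""
    return {p for parents in merged.values() for p in parents if p not in merged}


if __name__ == "__main__":
    source = {"Thing": [], "Agent": ["Thing"], "Person": ["Agent"]}
    shallow, deep = {}, {}
    shallow_copy("Person", source, shallow)
    deep_copy("Person", source, deep)
    print(dangling_references(shallow), dangling_references(deep))
```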

Inconsistencies & Potential Problems

Name conflicts

Dangling references

Redundancy in the class hierarchy

Slot values violating slot-value restrictions

Additional features

Setting up preferred ontology

Maintaining user focus

Providing feedback to user

Logging of ontology merging and editing operations

ANCHORPROMPT

Graph-based tool for finding similarities

Compares larger portions of the ontologies

Goal: augment IPROMPT by determining additional points of similarity

Input: anchors, a set of pairs of related terms

Anchor identification: manual or automatic

Each ontology is viewed as a directed labeled graph

ANCHORPROMPT representation

ANCHORPROMPT algorithm

Algorithm

Begins with anchor pairs:

(TRIAL, Trial)

(PERSON, Person)

Path 1: TRIAL -> PROTOCOL -> STUDY-SITE -> PERSON

Path 2: Trial -> Design -> Blinding -> Person

Determines a similarity score for each pair of related terms

If two pairs of terms from the source ontologies are similar and there are paths connecting the terms, then the elements on those paths are often similar as well
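The path heuristic can be sketched as a scoring pass over paths that connect the same anchor pairs. This is a simplified illustration: the paths are given as input rather than enumerated from the graphs, only equal-length paths are compared position by position, and the function name score_paths is an assumption.

```python
from collections import Counter

# The two example paths between the anchors (TRIAL, Trial) and (PERSON, Person).
PATHS_O1 = [["TRIAL", "PROTOCOL", "STUDY-SITE", "PERSON"]]
PATHS_O2 = [["Trial", "Design", "Blinding", "Person"]]


def score_paths(paths_1, paths_2):
    """Add one to the score of each term pair found at the same position on two paths."""
    scores = Counter()
    for p1 in paths_1:
        for p2 in paths_2:
            if len(p1) != len(p2):
                continue  # this sketch only aligns paths of equal length
            # Skip the first and last elements: they are the anchors themselves.
            for t1, t2 in zip(p1[1:-1], p2[1:-1]):
                scores[(t1, t2)] += 1
    return scores


if __name__ == "__main__":
    for pair, score in score_paths(PATHS_O1, PATHS_O2).items():
        print(pair, score)
```

Pairs that keep appearing at matching positions across many paths accumulate high scores and become candidates for additional similarity points.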

PROMPTDIFF

Tool for comparing ontology versions

Version comparison for software code is based on comparing text files

The same ontology can have different text representations, so a text diff is unreliable

Heuristic algorithm that produces a structural diff between two versions

Compares the structure of the two ontology versions

Identifies which frames changed and what changes were made

PromptDiff Algorithm

An extensible set of heuristic matchers

A fixed-point algorithm combines the results of the matchers to produce a structural diff between two versions
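The fixed-point idea can be sketched as a loop that keeps running every matcher until none of them contributes a new match. The two matchers below are illustrative assumptions, not PromptDiff's actual heuristics, and each ontology version is modelled simply as a dictionary from frame name to parent name.

```python
def same_name_matcher(v1, v2, matches):
    """Match still-unmatched frames that carry the same name in both versions."""
    taken = set(matches.values())
    return {n: n for n in v1 if n in v2 and n not in matches and n not in taken}


def single_unmatched_child_matcher(v1, v2, matches):
    """If two matched parents each have exactly one unmatched child, match the children."""
    taken = set(matches.values())
    new = {}
    for p1, p2 in matches.items():
        kids1 = [c for c, parent in v1.items() if parent == p1 and c not in matches]
        kids2 = [c for c, parent in v2.items() if parent == p2 and c not in taken]
        if len(kids1) == 1 and len(kids2) == 1:
            new[kids1[0]] = kids2[0]
    return new


def structural_diff(v1, v2, matchers):
    """Run all matchers repeatedly until a full pass adds nothing (the fixed point)."""
    matches = {}
    changed = True
    while changed:
        changed = False
        for matcher in matchers:
            found = matcher(v1, v2, matches)
            new = {k: v for k, v in found.items() if k not in matches}
            if new:
                matches.update(new)
                changed = True
    return matches


if __name__ == "__main__":
    old = {"Thing": None, "Wine": "Thing", "Red wine": "Wine"}
    new = {"Thing": None, "Wine": "Thing", "Red_wine": "Wine"}
    print(structural_diff(old, new, [same_name_matcher, single_unmatched_child_matcher]))
```

Here the rename of "Red wine" is only found because the name matcher first pairs up "Wine", which is why the matchers need to be rerun on each other's results.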

PROMPTFACTOR

Tool for factoring out a semantically independent part of a large ontology into a new sub-ontology

Ensures that severed links do not introduce ill-defined concepts in the sub-ontology

The user can specify concepts of interest

Performs the transitive closure of the superclass relation and of all the relations defined by slots

Target ontology works as stand-alone

PromptFactor Algorithm

The user specifies the concepts of interest

PromptFactor traverses the ontology from each selected term

Determines the transitive closure of all relations, including the subclass-of relation

Determines all the parents of the selected term in the hierarchy

Interactive with the user

Determines inconsistencies
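The traversal can be sketched as a transitive closure over superclass links and slot ranges. The frame layout (a dictionary with "superclasses" and "slots" keys) and the function name factor are assumptions for illustration; the interactive and consistency-checking parts are left out.

```python
def factor(ontology, concepts_of_interest):
    """Return the sub-ontology reachable from the chosen concepts.

    ontology maps a class name to {"superclasses": [...], "slots": {slot name: range class}}.
    """
    needed = set()
    stack = list(concepts_of_interest)
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        frame = ontology[current]
        stack.extend(frame["superclasses"])    # follow the superclass relation
        stack.extend(frame["slots"].values())  # follow relations defined by slots
    return {name: ontology[name] for name in needed}


if __name__ == "__main__":
    onto = {
        "Thing": {"superclasses": [], "slots": {}},
        "Person": {"superclasses": ["Thing"], "slots": {}},
        "Trial": {"superclasses": ["Thing"], "slots": {"investigator": "Person"}},
        "Drug": {"superclasses": ["Thing"], "slots": {}},
    }
    print(sorted(factor(onto, ["Trial"])))
```

Because every referenced class is pulled in, the extracted sub-ontology has no severed links and can stand alone.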

Prompt Demo

PROMPT is available as a plug-in for Protégé 3.4

Uses linguistic similarity matches between concepts

Also matches slot names and slot value types

In cases where automation is not possible, user intervention is needed; possible actions are suggested

Alignment is followed by merging

Alignment is establishing links between the ontologies

Merging is the creation of a single coherent ontology

Prompt Demo

The Web of Data

Data sources span a large range of domains

RDF data model is used to publish structured data on the web

Explicit RDF links exist between entities in different data sources

However, there is a lack of tools to set RDF links to other data sources

Silk

Silk is a link discovery framework built around a link specification language (Silk LSL)

It allows specification of the types of links that should be discovered between data sources, as well as the conditions that entities must fulfil in order to be linked

Link conditions are specified using similarity metrics; they can use aggregation functions to combine similarity scores

Data access is performed using SPARQL

Silk Features

Support for owl:sameAs links and other types of RDF links

Provides a declarative language to specify link conditions

Datasets need not be replicated locally

Caching, indexing and entity pre-selection are used to enhance performance

Silk LSL example

Silk LSL example (continued)

Silk similarity metrics

Similarity metrics can be combined using aggregation functions

Sets of resources can be selected using Silk RDF path selector language
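Combining similarity scores with an aggregation function can be sketched in a few lines. This is plain Python, not Silk LSL syntax; the function name aggregate and the three modes shown (weighted average, maximum, minimum) are illustrative assumptions.

```python
def aggregate(scores, weights=None, how="avg"):
    """Combine several similarity scores (each in 0.0 to 1.0) into a single score."""
    if how == "max":
        return max(scores)
    if how == "min":
        return min(scores)
    if how == "avg":
        if weights is None:
            weights = [1.0] * len(scores)
        # Weighted average: metrics with a larger weight contribute more.
        return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    raise ValueError(f"unknown aggregation: {how}")


if __name__ == "__main__":
    label_score, date_score = 0.9, 0.5
    combined = aggregate([label_score, date_score], weights=[3, 1])
    print(combined >= 0.7)  # a link would be generated if the threshold is met
```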

Silk Pre-Matching

Comparing all entities in source S with all entities in target T would need O(|S|*|T|) comparisons

Using pre-matching a limited set of target entities that are likely to match a given source entity is found

Performed by indexing the target resources based on their property values

Using this scheme reduces runtime to O(|S| + |T|)
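The pre-matching step can be sketched as an inverted index over target property values. The tokenisation by whitespace, the use of a single label property and the function names are assumptions for illustration; they are not Silk's actual implementation.

```python
from collections import defaultdict


def build_index(targets):
    """Index target URIs by each lower-cased token of their label (targets: URI -> label)."""
    index = defaultdict(set)
    for uri, label in targets.items():
        for token in label.lower().split():
            index[token].add(uri)
    return index


def candidates(source_label, index):
    """Return only the target URIs that share at least one token with the source label."""
    found = set()
    for token in source_label.lower().split():
        found |= index.get(token, set())
    return found


if __name__ == "__main__":
    targets = {"t:1": "Berlin", "t:2": "New York", "t:3": "York"}
    index = build_index(targets)          # one pass over the target set
    print(sorted(candidates("New York City", index)))
```

The full similarity metrics then run only on the returned candidates instead of on every target entity, at the risk of missing matches that share no indexed token.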

Silk Implementation

Managing coreferences

Semantic web vision: large quantities of information that are readily available, interlinked and machine readable

Fragmented web

Significant overlap

Need to identify ‘duplicates’

Co-reference resolution: determining “equivalent” URIs

Co-reference Resolution Service (CRS)

Systematic analysis and heuristic-based approach to:

Identifying

Publishing

Managing

Using co-reference information

Most prevalent way: owl:sameAs

Equivalence is context dependent

CRSes

Maintain sets of equivalent URIs

Store co-reference data separately

URI definitions and synonyms are kept separate

Management techniques: history, rollback, annotation

Applications can use multiple CRSes

Core functionality in PHP for easy integration

Backed by MySQL

Data representation in CRS

Equivalent URIs are stored in bundles

One URI in each bundle is considered the canon, the preferred URI

Formation of bundles:

Check whether the URI already exists in any bundle

If not, create a ‘singleton’ bundle for the new URI

Perform a merge: the union of bundles with “equivalent” URIs

Constituent bundles that were merged are marked inactive
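The bundle-formation steps can be modelled in memory as follows. This is a Python sketch only (the real service is described as PHP backed by MySQL); the class name, method names and the rule that the merged bundle keeps the first bundle's canon are assumptions.

```python
class CoreferenceStore:
    """Toy in-memory model of bundles of equivalent URIs."""

    def __init__(self):
        self.bundle_of = {}  # URI -> id of its current (active) bundle
        self.bundles = {}    # bundle id -> {"canon": URI, "uris": set, "active": bool}
        self._next_id = 0

    def _new_bundle(self, uris, canon):
        bundle_id = self._next_id
        self._next_id += 1
        self.bundles[bundle_id] = {"canon": canon, "uris": set(uris), "active": True}
        for uri in uris:
            self.bundle_of[uri] = bundle_id
        return bundle_id

    def add(self, uri):
        """Create a singleton bundle if the URI is not yet in any bundle."""
        if uri not in self.bundle_of:
            self._new_bundle({uri}, uri)
        return self.bundle_of[uri]

    def assert_equivalent(self, a, b):
        """Merge the bundles of a and b; the constituent bundles become inactive."""
        id_a, id_b = self.add(a), self.add(b)
        if id_a == id_b:
            return id_a
        union = self.bundles[id_a]["uris"] | self.bundles[id_b]["uris"]
        canon = self.bundles[id_a]["canon"]  # assumption: keep the first bundle's canon
        self.bundles[id_a]["active"] = False
        self.bundles[id_b]["active"] = False
        return self._new_bundle(union, canon)

    def canon(self, uri):
        return self.bundles[self.bundle_of[uri]]["canon"]


if __name__ == "__main__":
    store = CoreferenceStore()
    store.assert_equivalent("acm:person-102898", "dblp:people-27aedbcb")
    print(store.canon("dblp:people-27aedbcb"))
```

Keeping the inactive bundles around, rather than deleting them, is what makes history and rollback possible.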

Examples of bundle formation

Data representation

Data storage – Indexed tables of hashed URIs

Permits fast lookup to find:

Canon of a given URI

All URIs in a bundle

Deprecate URIs by flags

Finding all equivalences: coref:coreferenceData links to the bundle for that URI; recursively repeat the process for each URI in that bundle

<rdf:RDF xmlns:coref="http://www.rkbexplorer.com/ontologies/coref#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <coref:Bundle>
    <coref:canon rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://acm.rkbexplorer.com/id/person-102898"/>
    <coref:duplicate rdf:resource="http://citeseer.rkbexplorer.com/id/resource-CSP109002"/>
    <coref:duplicate rdf:resource="http://dblp.rkbexplorer.com/id/people-27aedbcb"/>
    <coref:duplicate rdf:resource="http://eprints.rkbexplorer.com/id/kfupm/person-27aed0c1"/>
    <coref:duplicate rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://wiki.rkbexplorer.com/id/hugh_glaser"/>
    <coref:lastUpdated>2009-01-16 11:11:40</coref:lastUpdated>
  </coref:Bundle>
</rdf:RDF>

RDF description of equivalent URIs in a bundle
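A client could read such a bundle description with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a shortened copy of the description; the function name read_bundle is an assumption, and a real client would more likely use an RDF library than raw XML parsing.

```python
import xml.etree.ElementTree as ET

NS = {
    "coref": "http://www.rkbexplorer.com/ontologies/coref#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
# ElementTree exposes namespaced attributes under the "{namespace}name" key.
RESOURCE = "{%s}resource" % NS["rdf"]

SAMPLE = """<rdf:RDF xmlns:coref="http://www.rkbexplorer.com/ontologies/coref#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <coref:Bundle>
    <coref:canon rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://acm.rkbexplorer.com/id/person-102898"/>
    <coref:duplicate rdf:resource="http://wiki.rkbexplorer.com/id/hugh_glaser"/>
  </coref:Bundle>
</rdf:RDF>"""


def read_bundle(rdf_xml):
    """Return (canon URI, list of duplicate URIs) from a bundle description."""
    root = ET.fromstring(rdf_xml)
    bundle = root.find("coref:Bundle", NS)
    canon = bundle.find("coref:canon", NS).get(RESOURCE)
    duplicates = [d.get(RESOURCE) for d in bundle.findall("coref:duplicate", NS)]
    return canon, duplicates


if __name__ == "__main__":
    canon, duplicates = read_bundle(SAMPLE)
    print(canon)
    print(len(duplicates), "duplicates")
```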

Ways to speed up

Look up only one URI from each CRS

Follow only the coref:canon predicate

Lookup would need O(log|S| + log|T|)

References

[1] Natalya F. Noy and Mark A. Musen. The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping. Stanford Medical Informatics, Stanford University.

[2] Hugh Glaser, Afraz Jaffri and Ian C. Millard. Managing Co-reference on the Semantic Web. School of Electronics and Computer Science, University of Southampton, Southampton, Hampshire, UK.

[3] Yannis Kalfoglou and Marco Schorlemmer. Ontology Mapping: The State of the Art.

[4] Yannis Kalfoglou and Marco Schorlemmer (2003a). IF-Map: An Ontology Mapping Method Based on Information Flow Theory. Journal on Data Semantics, 1(1):98–127.

[5] Julius Volz, Christian Bizer et al. Silk - A Link Discovery Framework for the Web of Data.