data integration ontology mapping

Pradeep Pillai and Michael Kandefer Department of Computer Science and Engineering University at Buffalo Buffalo, NY, 14260 {pbpillai,mwk3}@cse.buffalo.edu Schema Matching and Ontology Mapping: A Comparison


Post on 11-May-2015


Page 1: Data Integration Ontology Mapping

Pradeep Pillai and Michael Kandefer

Department of Computer Science and Engineering

University at Buffalo

Buffalo, NY, 14260

{pbpillai,mwk3}@cse.buffalo.edu

Schema Matching and Ontology Mapping: A Comparison

Page 2: Data Integration Ontology Mapping

Interoperability problem

Problem of combining heterogeneous and distributed data sources

Two solutions: schema matching and ontology mapping

W3C converging on standards for publishing web ontologies (e.g. OWL)

Distributed ontologies are still an issue

Intuition: Schema matching approaches are applicable to the ontology domain

Introduction

Page 3: Data Integration Ontology Mapping

Schema Matching

Ontology Mapping

Comparison

Ontology mapping using schema matching

Conclusion

AGENDA

Page 4: Data Integration Ontology Mapping

Distinction between matching and mapping isn’t clear

Schema matching: process of “establishing [logical] correspondences between elements of the source and target schemas” [Cho08]

Schema mapping: process of generating the assertions from schema matching

Sometimes called “instance mapping”

Schema Matching Definition

Page 5: Data Integration Ontology Mapping

Two general categories [ShvEuz05, MadBerRah01]

Element-based: Mappings created based on analysis of the schema elements

- String-based
- Language-based
- Constraint-based

Structure-based: Mappings created based on analysis of the elements and schema structure

- Tree-based
- Graph-based

Matching approaches aren’t mutually exclusive

Hybrid systems employ multiple methodologies

Other properties

- Mappings need not be 1:1
- Auxiliary information can be utilized

Schema matching topology

Page 6: Data Integration Ontology Mapping

Utilizes string comparisons between elements to establish mappings

Prefix/Suffix: Look for similar prefixes/suffixes

Edit distance: How many substitutions, insertions, or deletions it takes to convert one element name into the other

N-gram: Count the substrings of length n that the two element names have in common

Ex. COMA, S-Match

Element-based: String mappings
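
The three string matchers above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by COMA or S-Match; the function names, the case-insensitive comparison, and the default n = 3 are assumptions made for the example.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix (case-insensitive)."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x by y
        prev = curr
    return prev[-1]


def shared_ngrams(a: str, b: str, n: int = 3) -> int:
    """Number of distinct length-n substrings that both names contain."""
    def grams(s):
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    return len(grams(a.lower()) & grams(b.lower()))
```

A matcher would turn these raw counts into a score (for example by dividing by the longer name's length) before comparing them against a threshold; that normalization is left out here.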

Page 7: Data Integration Ontology Mapping

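[Figure: two purchase-order schemas with the mappings found by each string matcher. First schema elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Item, Street, City, ItemCount, ItemNumber, Quantity, UnitOfMeasure. Second schema elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, Count, Line, Qty, UoM. Legend: Prefix (3), 3-Gram (2), Edit distance (5).]

Element-based: String mappings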

Page 8: Data Integration Ontology Mapping

Utilizes properties of language in order to find elements with a common word sense

Normalization

- Tokenization: Punctuation used to divide an element into tokens.
- Expansion: Expand acronym and shorthand tokens.
- Elimination: Remove undesirable tokens, such as prepositions, before comparison.
- Lemmatization: Tokens converted to their basic form (e.g. remove pluralization) and compared.

Auxiliary information: Utilize external sources to aid matching

- WordNet, thesauri, or dictionaries

Ex. Cupid, S-Match

Element-based: Language mappings
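
The normalization steps can be sketched as a small pipeline. This is a hedged illustration only: the stop-word set, the abbreviation table, and the related-form table are hypothetical stand-ins for real auxiliary sources such as WordNet, and the camel-case tokenizer is an assumption about how element names are written.

```python
import re

# Hypothetical lookup tables standing in for real auxiliary sources
# (an abbreviation list and a thesaurus such as WordNet).
STOP_WORDS = {"to", "of", "the", "for", "in"}
EXPANSIONS = {"po": ["purchase", "order"], "qty": ["quantity"]}
RELATED_FORMS = {"invoice": "bill", "deliver": "ship"}


def tokenize(name: str) -> list:
    """Split an element name on case changes, digits, and punctuation."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", name)


def normalize(name: str) -> list:
    tokens = [t.lower() for t in tokenize(name)]
    tokens = [t for t in tokens if t not in STOP_WORDS]           # elimination
    tokens = [w for t in tokens for w in EXPANSIONS.get(t, [t])]  # expansion
    return [RELATED_FORMS.get(t, t) for t in tokens]              # related form


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the normalized token sets, in [0, 1]."""
    ta, tb = set(normalize(a)), set(normalize(b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
```

With these tables, `normalize("POBillTo")` yields `["purchase", "order", "bill"]` and `normalize("InvoiceTo")` yields `["bill"]`, so the two names share a token even though their raw strings have little in common.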

Page 9: Data Integration Ontology Mapping

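[Figure: the same two purchase-order schemas, annotated with language-based mappings. First schema elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Item, Street, City, ItemCount, ItemNumber, Quantity, UnitOfMeasure. Second schema elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, Count, Line, Qty, UoM.]

Element-based: Language mappings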

Example: POBillTo vs. InvoiceTo

Tokenize: <PO,Bill,To> <Invoice,To>

Elimination: <PO,Bill> <Invoice>

Expansion: <Purchase,Order,Bill> <Invoice>

Related form: <Purchase,Order,Bill> <Bill>

Page 10: Data Integration Ontology Mapping

Represents schemas as graphs/trees

Nodes are elements and attributes

Arcs are relationships

Assumes matched elements between two graphs should have related elements that can be matched

Ex. Similarity flooding, Cupid

Structure-based: Graph/tree mappings
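
The assumption that matched elements should have matchable neighbours can be illustrated with one round of similarity propagation. This is a deliberately simplified sketch, not Similarity Flooding or Cupid: the function names, the use of child lists as the only relationship, and the 0.5 blending weight are all assumptions for the example.

```python
def structural_score(s, t, children_s, children_t, sim):
    """Average, over the children of s and t, of each child's best match
    on the other side."""
    ns, nt = children_s.get(s, []), children_t.get(t, [])
    if not ns or not nt:
        return 0.0
    best_s = [max(sim.get((a, b), 0.0) for b in nt) for a in ns]
    best_t = [max(sim.get((a, b), 0.0) for a in ns) for b in nt]
    return (sum(best_s) + sum(best_t)) / (len(ns) + len(nt))


def propagate(pairs, children_s, children_t, sim, weight=0.5):
    """One round: blend each pair's own similarity with the support it
    receives from its children."""
    return {
        (s, t): (1 - weight) * sim.get((s, t), 0.0)
                + weight * structural_score(s, t, children_s, children_t, sim)
        for (s, t) in pairs
    }
```

Two elements whose names barely match, such as Address and POShipTo, gain similarity when their children (Street, City) match well, while a pair without matching children stays low.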

Page 11: Data Integration Ontology Mapping

Ontology definition:

“Specification of a conceptualization.” [Gru93]

“Explicit formal specification of the terms in the domain and relations among them.”

Ontology Mapping Definition:

“Given two ontologies O1 and O2, mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find a corresponding entity, which has the same intended meaning, in ontology O2” [Ehrig and Staab]

Ontology Mapping Problem

Page 12: Data Integration Ontology Mapping

Research Classification of Ontology Mapping [Noy04]

Mapping discovery: Finding the similarities between two ontologies. How do we determine which concepts and properties represent similar notions?

Declarative formal representation of mappings: Identifying how the mappings between two ontologies can be represented so that we can reason with them.

Reasoning with mappings: Performing reasoning based on the mappings between ontologies. Once the mappings are defined, what types of reasoning can we perform with them, and how?

Ontology Mapping Research

Page 13: Data Integration Ontology Mapping

Snoggle

An interactive visual ontology mapping tool.

Users define mappings between the two ontologies, which are expressed in SWRL (Semantic Web Rule Language).

The mappings are converted into Jena rules, which are applied by the Jena inference engine to produce instances that can be queried.

Survey: State of the Art

Page 14: Data Integration Ontology Mapping

GLUE [Doa+3]: Uses machine learning techniques to find mappings.

Given two ontologies, GLUE finds, for each concept in one ontology, the most similar concept in the other ontology.

The GLUE architecture consists of:

- Distribution Estimator
- Similarity Estimator
- Relaxation Labeler

GLUE outputs one-to-one correspondences between the taxonomies of the ontologies.

- Uses string similarity, structure, and machine learning strategies.

GLUE
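
To illustrate the "most similar concept" step, here is a minimal sketch that scores concept pairs with the Jaccard coefficient over their instance sets, one of the similarity measures described in the GLUE paper. The slide does not say which measure was used, so treat that choice as an assumption. The sketch also assumes instance membership is already known for both ontologies; in GLUE that information comes from learned classifiers, and the relaxation labeling step is omitted entirely. Function names are hypothetical.

```python
def jaccard(instances_a: set, instances_b: set) -> float:
    """Share of instances that belong to both concepts, in [0, 1]."""
    union = instances_a | instances_b
    return len(instances_a & instances_b) / len(union) if union else 0.0


def most_similar(concepts_1: dict, concepts_2: dict) -> dict:
    """For each concept in ontology 1, pick the ontology-2 concept with the
    highest Jaccard score. Both arguments map concept name -> instance set."""
    return {
        name_1: max(concepts_2,
                    key=lambda name_2: jaccard(inst_1, concepts_2[name_2]))
        for name_1, inst_1 in concepts_1.items()
    }
```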

Page 15: Data Integration Ontology Mapping

PROMPT [Noy04]

Input: Two ontologies in OWL or OKBC

Output: Suggested mappings and a merged ontology based on the choices made by the user.

1. iPROMPT: Interactive ontology merging tool.

2. AnchorPROMPT: Graph-based mappings to provide additional information for iPROMPT.

3. PROMPTDiff: Compares different ontology versions by combining matchers in a fixed-point manner.

4. PROMPTFactor: Tool for extracting a part of an ontology.

PROMPT

Page 16: Data Integration Ontology Mapping

Lucene Ontology Mapper

The source ontology is indexed into Lucene documents (fields) using the Lucene search engine.

Each field in the target ontology is provided as a search argument, which is in turn compared with the fields in the source documents, and hit scores are computed.

Fields with the maximum hit scores are said to be similar and hence mapped.

PowerMap also uses Lucene as part of its Ontology Mapping Framework

IR Approaches
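
The index-then-query idea can be sketched without Lucene. The following is a plain-Python stand-in, not Lucene's API or its scoring formula: the tokenizer, the rarity-weighted score, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(source_fields: dict) -> dict:
    """Map each source field name to a bag of tokens from its text."""
    return {name: Counter(tokens(text)) for name, text in source_fields.items()}


def search(index: dict, query_text: str) -> list:
    """Rank source fields for one target field; rarer shared tokens weigh more."""
    n_docs = len(index)
    scores = {}
    for name, bag in index.items():
        score = 0.0
        for tok in set(tokens(query_text)):
            if tok in bag:
                df = sum(1 for other in index.values() if tok in other)
                score += bag[tok] * (1.0 + math.log(n_docs / df))
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def map_fields(source_fields: dict, target_fields: dict) -> dict:
    """Map each target field to the source field with the highest hit score."""
    index = build_index(source_fields)
    mapping = {}
    for target, text in target_fields.items():
        ranked = search(index, text)
        if ranked and ranked[0][1] > 0:
            mapping[target] = ranked[0][0]
    return mapping
```

Target fields that hit nothing in the index are left unmapped rather than forced onto the least-bad candidate.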

Page 17: Data Integration Ontology Mapping

QOM

String similarity, structure, and instances.

Input: Two OWL or RDFS ontologies and their elements (e.g., classes, properties, instances)

Output: One-to-one or one-to-none correspondences.

1. Heuristics are used to lower the number of candidate mappings.
2. It avoids the complete pairwise comparison of trees in favor of a top-down strategy.
3. Sigmoid functions are applied, which emphasize high individual similarities and de-emphasize low individual similarities.
4. A threshold is used to discard spurious evidence of similarity.

QOM
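
Steps 3 and 4 can be sketched as follows. The steepness of 5, the midpoint of 0.5, the weights, and the 0.6 threshold are illustrative assumptions, not values confirmed by the slides, and the function names are hypothetical.

```python
import math


def sigmoid(x: float, steepness: float = 5.0, midpoint: float = 0.5) -> float:
    """Stretch similarities away from the midpoint, toward 0 or 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def aggregate(similarities: dict, weights: dict) -> float:
    """Weighted average of the sigmoid-adjusted individual similarities."""
    total = sum(weights[k] for k in similarities)
    return sum(weights[k] * sigmoid(v) for k, v in similarities.items()) / total


def accept(candidates: dict, weights: dict, threshold: float = 0.6) -> dict:
    """Keep only the candidate pairs whose aggregated score reaches the threshold."""
    scored = {pair: aggregate(sims, weights) for pair, sims in candidates.items()}
    return {pair: score for pair, score in scored.items() if score >= threshold}
```

Each candidate pair carries one similarity per feature (for example label and structure); pairs whose combined evidence falls below the threshold are dropped.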

Page 18: Data Integration Ontology Mapping

Schemas [Cho08,UscGru03]

Specify database structure

- Relationships
- Attributes

Typically relational or XML

Ontologies [UscGru03, ShvEuz05]

Formal semantic specification of a shared conceptualization

- Concepts
- Relationships

Typically encoded with formal languages

Description logics

Most utilize taxonomic structure

Schemas and Ontologies

Page 19: Data Integration Ontology Mapping

Both are forms of meta-data

Both utilized for domain description

Both utilize constraints (but in different ways)

Similarities

Page 20: Data Integration Ontology Mapping

Few differences

The essential (and trivial) difference is what each specifies and their uses

DB for querying

Ontologies for search and derivation

Lines are blurring (e.g. SPARQL)

Schemas don’t have semantics

Relational schemas lack generality

Ontologies use constraints to establish meaning

Schemas use constraints to establish integrity

Differences

Page 21: Data Integration Ontology Mapping

Element matching approaches [Wac+6]

Top-level ontologies

- Shared ontology utilized for common language and semantics for subsumed ontologies
- Ontologies that inherit the top-level ontology can be mapped more easily

Semantic correspondence

- Utilizes top-level ontologies for automatic ontology mapping
- Formal concept analysis: Produces a common concept lattice between ontologies through object-attribute analysis

Structure level [ShvEuz05]

Topology matching

- Utilizes sub-/super-class semantics
- Assumes the superclasses and subclasses of matched elements are more likely to be related

Model matching

- Utilizes semantic interpretations of ontologies to construct logical representations of potential mappings
- Utilizes background “knowledge” to provide axioms for the representation
- Runs a SAT/validity checker to determine “correct” mappings

Consequences of Differences

Page 22: Data Integration Ontology Mapping

Due to similarities, and few differences

Applications can be made that translate DB Schemas to Ontologies [XuZhaDon06]

Methodologies developed with both in mind will benefit both

Algorithms for schema matching applicable to ontology mapping

Some approaches that rely on semantics prevent the opposite [Hess06]

Schema vocabularies and forced taxonomic structure could eliminate this

Schema → Ontology

Page 23: Data Integration Ontology Mapping

Implementing an algorithm for OWL ontology mapping based on Cupid

Cupid [MadBerRah01]

- Hybrid approach
- Uses linguistic and data-type constraint matching followed by tree structure mapping
- “Derives” mappings as a result of coefficient computation

Our approach

- Parse two OWL ontologies
- Use a simple string matcher for initial similarities
- Utilize tree structure methodology on known OWL semantics

Schema Matching Algorithm

Page 24: Data Integration Ontology Mapping

Assumptions

- Leaf nodes are structurally similar (ssim) if they have lexical and data-type similarity
  - lsim(s,t) [0-1]: Lexical similarity uses substring, normalization, and hypernymy and synonymy matching
  - data-type-similarity(s,t) [0-.5]: Lookup table of data types and their similarity
- Non-leaf nodes are ssim if they are lsim and their leaf nodes are weighted similarly (wsim); immediate children do not influence ssim.
  - wsim(s,t) [0-1]: Measure of the lexical and structural similarity. Preference to one or the other is controlled by a modifying constant.

Constants

- wstruct: Modifies the influence of each matcher
- thaccept: When to accept two leaf nodes as strongly linked
- thhigh/thlow: When to increase/decrease structural similarity
- cinc/cdec: How much to increase/decrease structural similarity

Algorithm: TreeMatch(S,T)

1. Initialize ssim(s,t) = data-type-similarity(s,t) for every leaf node in S and T
2. Using post-order traversal, for every node s in S, and node t in T
   i. wsim(s,t) = wstruct * ssim(s,t) + (1 - wstruct) * lsim(s,t)
   ii. if wsim(s,t) > thhigh, increase ssim for all leaf nodes of s and t by cinc
   iii. if wsim(s,t) < thlow, decrease ssim for all leaf nodes of s and t by cdec

Tree Matcher
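
A compact sketch of TreeMatch follows. Several details are assumptions rather than something the slide states: the constant values are illustrative, cinc/cdec are treated as multiplicative factors with ssim capped at 1, and the ssim of a non-leaf pair is taken to be the share of leaves that have a strong link (wsim at or above thaccept) into the other subtree. The `Node` class and the `lsim`/`dtype_sim` callables are hypothetical placeholders supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    dtype: str = ""
    children: list = field(default_factory=list)


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def post_order(node):
    for child in node.children:
        yield from post_order(child)
    yield node


def tree_match(S, T, lsim, dtype_sim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.35, c_inc=1.2, c_dec=0.9):
    """Return (s.name, t.name, wsim) for every node pair, visited bottom-up."""
    # Step 1: ssim of every leaf pair starts as their data-type similarity.
    ssim = {(id(x), id(y)): dtype_sim(x.dtype, y.dtype)
            for x in leaves(S) for y in leaves(T)}

    def wsim_of(s, t, structural):
        return w_struct * structural + (1 - w_struct) * lsim(s.name, t.name)

    def strongly_linked(x, y):
        return wsim_of(x, y, ssim[(id(x), id(y))]) >= th_accept

    results = []
    # Step 2: post-order traversal over both trees.
    for s in post_order(S):
        for t in post_order(T):
            ls, lt = leaves(s), leaves(t)
            if not s.children and not t.children:
                structural = ssim[(id(s), id(t))]
            else:
                # Assumption: share of leaves with a strong link across subtrees.
                linked = sum(any(strongly_linked(x, y) for y in lt) for x in ls)
                linked += sum(any(strongly_linked(x, y) for x in ls) for y in lt)
                structural = linked / (len(ls) + len(lt))
            w = wsim_of(s, t, structural)
            # Steps ii/iii: reward or penalize the leaf pairs under s and t.
            factor = c_inc if w > th_high else c_dec if w < th_low else 1.0
            for x in ls:
                for y in lt:
                    ssim[(id(x), id(y))] = min(1.0, ssim[(id(x), id(y))] * factor)
            results.append((s.name, t.name, w))
    return results
```

Because leaf similarities are adjusted while the traversal is still in progress, the score of an inner pair such as Address/POShipTo depends on how its leaves fared earlier, which is how structure can compensate for labels that share no text.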

Page 25: Data Integration Ontology Mapping

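[Figure: the two purchase-order schemas with the mappings produced by Cupid. First schema elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Item, Street, City, ItemCount, ItemNumber, Quantity, UnitOfMeasure. Second schema elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, Count, Line, Qty, UoM. Legend: High lsim (A1), High wsim (A2), Matches.]

Cupid Mappings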

Page 26: Data Integration Ontology Mapping

Schema Matching and Ontology Matching address similar problems

Schema matching approaches are applicable to ontology mapping

– Doesn’t utilize semantic information

– The opposite doesn’t hold.

Hybrid approaches are the best methodologies for automatic, generic schema matching and ontology mapping

Systems that employ schema matching might be capable of working with ontologies given minimal adjustment (e.g. Cupid)

– Additional experimentation is needed

Conclusions

Page 27: Data Integration Ontology Mapping

[Cho08] – J. Chomicki. Data Integration: Schema Mapping. February 2008. http://www.cse.buffalo.edu/~chomicki/636/handout-mapping.pdf

[Doa+3] – A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to Map between Ontologies on the Semantic Web. Proceedings of the 11th International Conference on World Wide Web. 2002.

[Gru93] – T. R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2). 1993.

[Hess06] – A. Hess. An Iterative Algorithm for Ontology Mapping Capable of Using Training Data. Proceedings of ESWC '06. 2006.

[MadBerRah01] – J. Madhavan, P. Bernstein, and E. Rahm. Generic Schema Matching with Cupid. Proceedings of the 27th VLDB Conference. 2001.

[Noy04] – N. Noy. Semantic Integration: A Survey of Ontology-based Approaches. SIGMOD Record, Special Issue on Semantic Integration. 2004.

[ShvEuz05] – P. Shvaiko and J. Euzenat. A Survey of Schema-based Matching Approaches. Journal on Data Semantics. 2005.

[UscGru03] – M. Uschold and M. Gruninger. Ontologies and Semantics for Seamless Connectivity. SIGMOD Record 33(4). 2004.

[Wac+6] – H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, and S. Hubner. Ontology-based Integration of Information: A Survey of Existing Approaches. IJCAI-01 Workshop: Ontologies and Information Sharing. 2001.

[XuZhaDon06] – Z. Xu, S. Zhang, and Y. Dong. Mapping between Relational Database Schema and OWL Ontology for Deep Annotation. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. 2006.

References