Data Integration Ontology Mapping
Pradeep Pillai and Michael Kandefer
Department of Computer Science and Engineering
University at Buffalo
Buffalo, NY, 14260
{pbpillai,mwk3}@cse.buffalo.edu
Schema Matching and Ontology Mapping: A Comparison
Interoperability problem
Problem of combining heterogeneous and distributed data sources
Two solutions: schema matching and ontology mapping
W3C converging on standards for publishing web ontologies (e.g. OWL)
Distributed ontologies are still an issue
Intuition: Schema matching approaches are applicable to the ontology domain
Introduction
Schema Matching
Ontology Mapping
Comparison
Ontology mapping using schema matching
Conclusion
AGENDA
Distinction between matching and mapping isn’t clear
Schema matching: process of “establishing [logical] correspondences between elements of the source and target schemas” [Cho08]
Schema mapping: process of generating the assertions from schema matching
Sometimes called “instance mapping”
Schema Matching Definition
Two general categories [ShvEuz05, MadBerRah01]
Element-based: Mappings created based on analysis of the schema elements
String-based
Language-based
Constraint-based
Structure-based: Mappings created based on analysis of the elements and schema structure
Tree-based
Graph-based
Matching approaches aren’t mutually exclusive
Hybrid systems employ multiple methodologies
Other properties
Mappings need not be 1:1
Auxiliary information can be utilized
Schema matching topology
Utilizes string comparisons between elements to establish mappings
Prefix/Suffix: Look for similar prefixes/suffixes
Edit distance: How many substitutions, insertions, or deletions it takes to convert one element into the other
N-gram: Compute the number of common substrings of length n
Ex. COMA, S-Match
Element-based: String mappings
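The three string measures above can be sketched in a few lines of Python. This is a minimal illustration, not the COMA or S-Match implementation; the function names and the case-insensitive comparison are assumptions made for the example.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix, ignoring case."""
    count = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        count += 1
    return count


def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest shared suffix, ignoring case."""
    return common_prefix_len(a[::-1], b[::-1])


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def common_ngrams(a: str, b: str, n: int = 3) -> int:
    """Number of distinct length-n substrings shared by both names."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    return len(grams(a) & grams(b))


print(common_prefix_len("POShipTo", "POBillTo"))  # shared prefix "PO"
print(edit_distance("Qty", "Quantity"))           # five insertions
print(common_ngrams("Item", "Items"))             # "ite" and "tem"
```

A matcher would typically normalize these raw counts (for example by the length of the longer name) before comparing them against a threshold.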
[Figure: Element-based string mappings between two purchase-order schemas]
Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address (x2), Item, Street (x2), City (x2), ItemCount, ItemNumber, Quantity, UnitOfMeasure
Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street (x2), City (x2), Count, Line, Qty, UoM
Legend: Prefix (3), 3-Gram (2), Edit distance (5)
Utilizes properties of language in order to find elements with a common word sense
Normalization
Tokenization: Punctuation used to divide an element into tokens.
Expansion: Expand acronym and short-hand tokens.
Elimination: Remove undesirable tokens, such as prepositions, before comparison.
Lemmatization: Tokens converted to their basic form (e.g. remove pluralization) and compared.
Auxiliary information: Utilize external sources to aid matching
WordNet, thesauri, or dictionaries
Ex. Cupid, S-Match
Element-based: Language mappings
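The normalization steps can be sketched as a small pipeline. This is an illustration only, not the Cupid or S-Match code: the abbreviation, stop-word, and synonym tables are hypothetical stand-ins for a real thesaurus such as WordNet, the plural stripping is deliberately naive, and the tokenizer also splits on case changes (an assumption, so that names like POBillTo break into tokens).

```python
import re

# Hypothetical lookup tables; a real system would consult a thesaurus.
ABBREVIATIONS = {"po": ["purchase", "order"], "qty": ["quantity"]}
STOP_WORDS = {"to", "of", "the", "for"}
RELATED_FORMS = {"invoice": "bill"}


def tokenize(name):
    """Split on punctuation and on case changes, then lowercase."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def eliminate(tokens):
    """Drop undesirable tokens such as prepositions."""
    return [t for t in tokens if t not in STOP_WORDS]


def expand(tokens):
    """Replace acronyms and short-hand tokens with their long forms."""
    out = []
    for t in tokens:
        out.extend(ABBREVIATIONS.get(t, [t]))
    return out


def lemmatize(tokens):
    """Naive lemmatization: strip a single plural 's'."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
            for t in tokens]


def related_form(tokens):
    """Map each token to a canonical related word when one is known."""
    return [RELATED_FORMS.get(t, t) for t in tokens]


def normalize(name):
    return related_form(lemmatize(expand(eliminate(tokenize(name)))))


def token_overlap(a, b):
    """Jaccard overlap of the normalized token sets of two names."""
    sa, sb = set(normalize(a)), set(normalize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


print(normalize("POBillTo"))   # ['purchase', 'order', 'bill']
print(normalize("InvoiceTo"))  # ['bill']
```

With these tables the two names share the token "bill", so a language-based matcher can relate them even though the strings have little in common.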
[Figure: Element-based language mappings between two purchase-order schemas]
Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address (x2), Item, Street (x2), City (x2), ItemCount, ItemNumber, Quantity, UnitOfMeasure
Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street (x2), City (x2), Count, Line, Qty, UoM
Example: POBillTo, InvoiceTo
Tokenize: <PO,Bill,To> <Invoice,To>
Elimination: <PO,Bill> <Invoice>
Expansion: <Purchase,Order,Bill> <Invoice>
Related form: <Purchase,Order,Bill> <Bill>
Represents schemas as graphs/trees
Nodes are elements and attributes
Arcs are relationships
Assumes matched elements between two graphs should have related elements that can be matched
Ex. Similarity flooding, Cupid
Structure-based: Graph/tree mappings
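The assumption that matched elements should have matchable related elements can be illustrated with a small sketch. This is neither Similarity Flooding nor Cupid; it is a simplified neighbor-based refinement in which a pair of nodes gets credit when its children match well. The function name, the `alpha` weight, and the example tree fragments are all hypothetical.

```python
def refine_with_children(children_a, children_b, base_sim, alpha=0.5):
    """Blend each pair's own similarity with how well its children match.

    children_a / children_b: dict mapping every node to its list of children
    base_sim: function (a, b) -> element-level similarity in [0, 1]
    alpha: weight given to the structural (child) evidence
    """
    memo = {}

    def sim(a, b):
        if (a, b) in memo:
            return memo[(a, b)]
        kids_a, kids_b = children_a.get(a, []), children_b.get(b, [])
        own = base_sim(a, b)
        if not kids_a or not kids_b:
            result = own  # no structure to compare, keep the element score
        else:
            # For each child of a, take its best match among b's children.
            best = [max(sim(x, y) for y in kids_b) for x in kids_a]
            result = (1 - alpha) * own + alpha * sum(best) / len(best)
        memo[(a, b)] = result
        return result

    return {(a, b): sim(a, b) for a in children_a for b in children_b}


def same_name(a, b):
    return 1.0 if a.lower() == b.lower() else 0.0


# Hypothetical fragments of two schemas.
tree_a = {"DeliverTo": ["Street", "City"], "Street": [], "City": []}
tree_b = {"POShipTo": ["Street", "City"], "Street": [], "City": []}

refined = refine_with_children(tree_a, tree_b, same_name)
print(refined[("DeliverTo", "POShipTo")])
```

Here DeliverTo and POShipTo share no name similarity, but their matching children raise the pair's score above zero.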
Ontology definition:
“Specification of a conceptualization.” [Gru93]
“Explicit formal specification of the terms in the domain and relations among them.”
Ontology Mapping Definition:
“Given two ontologies O1 and O2, mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find a corresponding entity, which has the same intended meaning, in ontology O2” [Ehrig and Staab]
Ontology Mapping Problem
Research Classification of Ontology Mapping [Noy04]
Mapping discovery: Aims to find the similarities between two ontologies. How do we determine which concepts and properties represent similar notions?
Declarative formal representation of mappings: Identifies the ways we can represent the mappings between two ontologies to enable reasoning with those mappings.
Reasoning with mappings: Concerned with performing reasoning based on the mappings between ontologies. Once the mappings are defined, what types of reasoning can we perform with them, and how?
Ontology Mapping Research
Snoggle
An interactive, visual ontology mapping tool.
Users define mappings between the two ontologies, which are expressed in SWRL (Semantic Web Rule Language).
The mappings are converted into Jena rules, which are applied by the Jena inference engine to produce instances that can be queried.
Survey: State of the Art
GLUE [Doa+3]: Machine learning techniques to find mappings.
If the system is provided with two ontologies, for each concept in one ontology it finds the most similar concept in the other ontology.
The GLUE architecture consists of
- Distribution Estimator
- Similarity Estimator
- Relaxation Labeler
GLUE outputs one-to-one correspondences between the taxonomies of the ontologies.
- Uses string similarity, structure, and machine learning strategies.
GLUE
PROMPT [Noy04]
Input: Two ontologies in OWL/OKBC
Output: Mapping suggestions and a merged ontology based on the choices made by the user.
1. iPROMPT: Interactive ontology merging tool.
2. AnchorPROMPT: Graph-based mappings to provide additional information for iPROMPT.
3. PROMPTDiff: Compares different ontology versions by combining matchers in a fixed-point manner.
4. PROMPTFactor : Tool for extracting a part of an ontology.
PROMPT
Lucene Ontology Mapper
The source ontology is indexed into Lucene documents (fields) using the Lucene search engine.
Each field in the target ontology is provided as a search argument, which is in turn compared with the fields in the source documents, and the hit scores are computed.
Fields with the maximum hit scores are said to be similar and hence mapped.
PowerMap also uses Lucene as part of its Ontology Mapping Framework
IR Approaches
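The index-then-query idea can be sketched without Lucene. The following is a pure-Python stand-in, not Lucene's API or its scoring formula: it builds a small inverted index over the source elements and scores each hit by summing a simple inverse-document-frequency weight for every shared token. All function names and the token lists are hypothetical.

```python
import math
from collections import defaultdict


def build_index(source_docs):
    """Inverted index: token -> set of source elements containing it."""
    index = defaultdict(set)
    for name, tokens in source_docs.items():
        for tok in tokens:
            index[tok].add(name)
    return index


def search(index, n_docs, query_tokens):
    """Rank source elements by summed IDF weight of the matching tokens."""
    scores = defaultdict(float)
    for tok in set(query_tokens):
        hits = index.get(tok, ())
        if not hits:
            continue
        idf = math.log(1 + n_docs / len(hits))  # rarer tokens count more
        for name in hits:
            scores[name] += idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def map_fields(source_docs, target_docs):
    """Map each target element to the source element with the top hit score."""
    index = build_index(source_docs)
    mapping = {}
    for target, tokens in target_docs.items():
        ranked = search(index, len(source_docs), tokens)
        if ranked:
            mapping[target] = ranked[0][0]
    return mapping


# Hypothetical token lists, e.g. produced by a normalization step.
source = {
    "InvoiceTo": ["invoice", "bill", "address"],
    "DeliverTo": ["deliver", "ship", "address"],
    "Items": ["item", "line"],
}
target = {
    "POBillTo": ["bill", "address"],
    "POShipTo": ["ship", "address"],
    "POLines": ["line"],
}

mapping = map_fields(source, target)
print(mapping)
```

The shared token "address" hits two source elements and so contributes less than "bill" or "ship", which is what lets the rarer token decide the mapping.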
QOM
Uses string similarity, structure, and instances.
Input: Two OWL or RDFS ontologies with their elements (e.g., classes, properties, instances)
Output: One-to-one or one-to-none correspondences.
1. Heuristics are used to lower the number of candidate mappings.
2. It avoids the complete pairwise comparison of trees in favor of a top-down strategy.
3. Sigmoid functions are applied, which emphasize high individual similarities and de-emphasize low individual similarities.
4. A threshold is used to discard spurious evidence of similarity.
QOM
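Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions rather than QOM's actual code: the steepness and midpoint of the sigmoid, the unweighted averaging, the threshold value, and the greedy one-to-one selection are all choices made for the example.

```python
import math


def sigmoid_weight(sim, steepness=10.0, midpoint=0.5):
    """Push high similarities toward 1 and low similarities toward 0."""
    return 1.0 / (1.0 + math.exp(-steepness * (sim - midpoint)))


def aggregate(similarities):
    """Average of the sigmoid-adjusted individual similarities."""
    adjusted = [sigmoid_weight(s) for s in similarities]
    return sum(adjusted) / len(adjusted)


def select_mappings(candidates, threshold=0.7):
    """Greedy one-to-one selection; pairs below the threshold are discarded.

    candidates: dict mapping (entity1, entity2) -> aggregated similarity
    Entities left without a partner correspond to the one-to-none case.
    """
    chosen, used = {}, set()
    for (e1, e2), score in sorted(candidates.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # everything after this is weaker evidence
        if e1 in chosen or e2 in used:
            continue
        chosen[e1] = e2
        used.add(e2)
    return chosen


print(sigmoid_weight(0.9), sigmoid_weight(0.1))
print(select_mappings({("a", "x"): 0.95, ("a", "y"): 0.90,
                       ("b", "y"): 0.80, ("c", "z"): 0.30}))
```

In the usage above, "c" ends up unmapped because its only candidate falls below the threshold.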
Schemas [Cho08, UscGru03]
Specify database structure
Relationships
Attributes
Typically relational or XML
Ontologies [UscGru03, ShvEuz05]
Formal semantic specification of a shared conceptualization
Concepts
Relationships
Typically encoded with formal languages
Description logics
Most utilize taxonomic structure
Schemas and Ontologies
Both are forms of meta-data
Both utilized for domain description
Both utilize constraints (but in different ways)
Similarities
Few differences
The essential (and trivial) difference is what each specifies and how each is used
DB for querying
Ontologies for search and derivation
Lines are blurring (e.g. SPARQL)
Schemas don’t have semantics
Relational schemas lack generality
Ontologies use constraints to establish meaning
Schemas use constraints to establish integrity
Differences
Element matching approaches [Wac+6]
Top-level ontologies
Shared ontology utilized for common language and semantics for subsumed ontologies
Ontologies that inherit the top-level ontology can be mapped more easily
Semantic correspondence
Utilizes top-level ontologies for automatic ontology mapping
Formal concept analysis: Produces a common concept lattice between ontologies through object-attribute analysis
Structure level [ShvEuz05]
Topology matching
Utilizes sub-/super-class semantics
Assumes the superclasses and subclasses of matched elements are more likely to be related
Model matching
Utilizes semantic interpretations of ontologies to construct logical representations of potential mappings
Utilizes background “knowledge” to provide axioms for the representation
Runs a SAT/validity checker to determine “correct” mappings
Consequences of Differences
Due to the similarities and the few differences:
Applications can be made that translate DB schemas to ontologies [XuZhaDon06]
Methodologies developed with both in mind will benefit both
Algorithms for schema matching are applicable to ontology mapping
Some ontology mapping approaches rely on semantics, which prevents the opposite [Hess06]
Schema vocabularies and forced taxonomic structure could eliminate this
Schema → Ontology
Implementing an algorithm for OWL ontology mapping based on Cupid
Cupid [MadBerRah01]
Hybrid approach
Uses linguistic and data-type constraint matching followed by tree structure mapping
“Derives” mappings as a result of coefficient computation
Our approach
Parse two OWL ontologies
Use a simple string matcher for initial similarities
Utilize tree structure methodology on known OWL semantics
Schema Matching Algorithm
Assumptions
Leaf nodes are structurally similar (ssim) if they have lexical and data-type similarity
lsim(s,t) in [0, 1]: Lexical similarity uses substring, normalization, and hypernymy and synonymy matching
data-type-similarity(s,t) in [0, 0.5]: Lookup table of data types and their similarity
Non-leaf nodes are ssim if they are lsim and their leaf nodes are weighted similarly (wsim); immediate children do not influence ssim.
wsim(s,t) in [0, 1]: Measure of the lexical and structural similarity. Preference to one or the other is controlled by a modifying constant.
Constants
w_struct: Modifies the influence of each matcher
th_accept: When to accept two leaf nodes as strongly linked
th_high/th_low: When to increase/decrease structural similarity
c_inc/c_dec: How much to increase/decrease structural similarity
Algorithm: TreeMatch(S,T)
1. Initialize ssim(s,t) = data-type-similarity(s,t) for every leaf node s in S and t in T
2. Using post-order traversal, for every node s in S and node t in T
i. wsim(s,t) = w_struct * ssim(s,t) + (1 - w_struct) * lsim(s,t)
ii. if wsim(s,t) > th_high, increase ssim for all leaf nodes of s and t by c_inc
iii. if wsim(s,t) < th_low, decrease ssim for all leaf nodes of s and t by c_dec
Tree Matcher
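The TreeMatch steps can be sketched in Python as follows. This is a hedged reading of the slide, not the Cupid implementation. Several details are assumptions: c_inc and c_dec are treated as multiplicative factors capped at 1.0, the ssim of a pair involving a non-leaf node is taken as the fraction of leaves in the two subtrees that have a strong link (weighted similarity above th_accept) to a leaf on the other side, and all default constants are illustrative. The `name_match` and `same_type` functions are placeholder matchers.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, since element names can repeat
class Node:
    name: str
    dtype: str = ""  # data type, only meaningful for leaves
    children: list = field(default_factory=list)


def post_order(node):
    for child in node.children:
        yield from post_order(child)
    yield node


def leaves(node):
    return [n for n in post_order(node) if not n.children]


def tree_match(S, T, lsim, dtype_sim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.35, c_inc=1.2, c_dec=0.9):
    ssim, wsim = {}, {}

    def weighted(x, y):
        return w_struct * ssim[(x, y)] + (1 - w_struct) * lsim(x.name, y.name)

    # Step 1: leaf pairs start from their data-type similarity.
    for s in leaves(S):
        for t in leaves(T):
            ssim[(s, t)] = dtype_sim(s.dtype, t.dtype)

    # Step 2: post-order traversal over both trees.
    for s in post_order(S):
        s_leaves = leaves(s)
        for t in post_order(T):
            t_leaves = leaves(t)
            if s.children or t.children:
                # Assumed: share of leaves with a strong link to the other side.
                linked_s = sum(any(weighted(x, y) > th_accept for y in t_leaves)
                               for x in s_leaves)
                linked_t = sum(any(weighted(x, y) > th_accept for x in s_leaves)
                               for y in t_leaves)
                ssim[(s, t)] = (linked_s + linked_t) / (len(s_leaves) + len(t_leaves))
            wsim[(s, t)] = weighted(s, t)
            if wsim[(s, t)] > th_high:
                factor = c_inc
            elif wsim[(s, t)] < th_low:
                factor = c_dec
            else:
                continue
            for x in s_leaves:
                for y in t_leaves:
                    ssim[(x, y)] = min(1.0, ssim[(x, y)] * factor)
    return ssim, wsim


def name_match(a, b):  # placeholder lexical matcher
    return 1.0 if a.lower() == b.lower() else 0.0


def same_type(a, b):  # placeholder data-type lookup, range [0, 0.5]
    return 0.5 if a == b else 0.0


deliver_to = Node("DeliverTo", children=[Node("Street", "string"), Node("City", "string")])
po_ship_to = Node("POShipTo", children=[Node("Street", "string"), Node("City", "string")])
ssim, wsim = tree_match(deliver_to, po_ship_to, name_match, same_type)
print(wsim[(deliver_to, po_ship_to)])
```

In this small example the two parent elements have no lexical similarity under the placeholder matcher, yet their weighted similarity is lifted by the strongly linked Street and City leaves.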
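[Figure: mappings produced by Cupid between two purchase-order schemas]
Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address (x2), Item, Street (x2), City (x2), ItemCount, ItemNumber, Quantity, UnitOfMeasure
Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street (x2), City (x2), Count, Line, Qty, UoM
Legend: High lsim (A1), High wsim (A2), Matches
Cupid Mappings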
Schema matching and ontology mapping address similar problems
Schema matching approaches are applicable to ontology mapping
– They don’t utilize semantic information
– The opposite doesn’t hold
Hybrid approaches are the best methodologies for automatic, generic schema matching and ontology mapping
Systems that employ schema matching might be capable of working with ontologies with minimal adjustment (e.g. Cupid)
– Additional experimentation is needed
Conclusions
[Cho08] – J. Chomicki. Data Integration: Schema Mapping. February 2008. http://www.cse.buffalo.edu/~chomicki/636/handout-mapping.pdf
[Doa+3] – A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to Map between Ontologies on the Semantic Web. Proceedings of the 11th International Conference on World Wide Web. 2002.
[Gru93] – T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2). 1993.
[Hess06] – A. Hess. An Iterative Algorithm for Ontology Mapping Capable of Using Training Data. Proceedings of ESWC '06. 2006.
[MadBerRah01] – J. Madhavan, P. Bernstein, and E. Rahm. Generic Schema Matching with Cupid. Proceedings of the 27th VLDB Conference. 2001.
[Noy04] – N. Noy. Semantic Integration: A Survey of Ontology-based Approaches. SIGMOD Record, Special Issue on Semantic Integration. 2004.
[ShvEuz05] – P. Shvaiko and J. Euzenat. A Survey of Schema-based Matching Approaches. Journal on Data Semantics. 2005.
[UscGru03] – M. Uschold and M. Gruninger. Ontologies and Semantics for Seamless Connectivity. SIGMOD Record 33(4). 2004.
[Wac+6] – H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, and S. Hubner. Ontology-based Integration of Information: A Survey of Existing Approaches. IJCAI-01 Workshop: Ontologies and Information Sharing. 2001.
[XuZhaDon06] – Z. Xu, S. Zhang, and Y. Dong. Mapping between Relational Database Schema and OWL Ontology for Deep Annotation. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. 2006.
References