Data Integration Ontology Mapping

Post on 11-May-2015

DESCRIPTION

semantic web ontology mapping

TRANSCRIPT

  • 1. Pradeep Pillai and Michael Kandefer, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, {pbpillai,mwk3}@cse.buffalo.edu. Schema Matching and Ontology Mapping: A Comparison

2.

  • Interoperability problem
    • Problem of combining heterogeneous and distributed data sources
      • Two solutions:
        • Schema matching
        • Ontology mapping
  • W3C converging on standards for publishing web ontologies (e.g. OWL)
    • Distributed ontologies are still an issue
  • Intuition: Schema matching approaches are applicable to the ontology domain

Introduction 3.

  • Schema Matching
  • Ontology Mapping
  • Comparison
  • Ontology mapping using schema matching
  • Conclusion

AGENDA 4.

  • Distinction between matching and mapping isn't clear
    • Schema matching: process of establishing [logical] correspondences between elements of the source and target schemas [Cho08]
    • Schema mapping: process of generating the assertions from schema matching
        • Sometimes called instance mapping

Schema Matching Definition 5.

  • Two general categories [ShvEuz05,MadBerRah01]
  • Element-based: Mappings created based on analysis of the schema elements
    • String-based
    • Language-based
    • Constraint-based
  • Structure-based: Mapping created based on analysis of the elements and schema structure
    • Tree-based
    • Graph-based
  • Matching approaches aren't mutually exclusive
    • Hybrid systems employ multiple methodologies
  • Other properties
    • Mappings need not be 1:1
    • Auxiliary information can be utilized

Schema matching topology 6.

  • Utilizes string comparisons between elements to establish mappings
    • Prefix/Suffix: Look for similar prefixes/suffixes
    • Edit distance: How many substitutions, insertions, or deletions it takes to convert one element into the other
    • N-gram: Count the common substrings of length n
    • Ex. COMA, S-Match
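
The three string comparisons listed above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by COMA or S-Match; the function names are hypothetical, and the comparison is case-sensitive for simplicity.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest shared suffix, computed on the reversed strings."""
    return common_prefix_len(a[::-1], b[::-1])


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def ngrams(s: str, n: int) -> set:
    """Set of all substrings of s with length n."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def common_ngrams(a: str, b: str, n: int = 3) -> int:
    """Number of distinct length-n substrings shared by a and b."""
    return len(ngrams(a, n) & ngrams(b, n))


prefix = common_prefix_len("POShipTo", "POBillTo")   # shared prefix "PO"
distance = edit_distance("Qty", "Quantity")          # five insertions
shared = common_ngrams("ItemCount", "Count", 3)      # "Cou", "oun", "unt"
```

In practice each raw count would be normalized (for example, by the longer string's length) before being compared against a threshold.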

Element-based: String mappings 7.

  • Figure: string-based correspondences between the two example schemas, with legend: Prefix (3), 3-Gram (2), Edit distance (5)
    • Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Address, Item, Street, City, City, Street, ItemCount, ItemNumber, Quantity, UnitOfMeasure
    • Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, City, Street, Count, Line, Qty, UoM

Element-based: String mappings 8.

  • Utilizes properties of language in order to find elements with a common word sense
    • Normalization
      • Tokenization: Punctuation used to divide an element into tokens.
      • Expansion: Expand acronym and short-hand tokens.
      • Elimination: Remove undesirable tokens, such as prepositions, before comparison
      • Lemmatization: Tokens converted to their basic form (e.g. remove pluralization) and compared
  • Auxiliary information: Utilize external sources to aid matching
    • WordNet, thesauri, or dictionaries
  • Ex. Cupid, S-Match
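
The normalization steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not how Cupid or S-Match do it: the abbreviation table, stop-word list, and synonym table are hypothetical stand-ins for auxiliary sources such as WordNet, tokenization also splits on camel-case boundaries (the slide mentions only punctuation), and lemmatization is reduced to naive plural stripping.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"po": ["purchase", "order"], "qty": ["quantity"]}
STOP_WORDS = {"to", "of", "the", "for"}
SYNONYMS = {"bill": "invoice", "ship": "deliver"}  # word -> canonical form


def tokenize(name):
    """Split on punctuation and camel-case boundaries, keeping acronym runs."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def expand(tokens):
    """Replace known acronyms and short-hand tokens with full words."""
    return [word for t in tokens for word in ABBREVIATIONS.get(t, [t])]


def eliminate(tokens):
    """Drop undesirable tokens such as prepositions."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Naive lemmatization: strip a trailing plural 's' only."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
            for t in tokens]


def normalize(name):
    return lemmatize(eliminate(expand(tokenize(name))))


def language_similarity(a, b):
    """Jaccard overlap of the normalized tokens after synonym lookup."""
    ta = {SYNONYMS.get(t, t) for t in normalize(a)}
    tb = {SYNONYMS.get(t, t) for t in normalize(b)}
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


tokens_bill = normalize("POBillTo")
sim_related = language_similarity("POBillTo", "InvoiceTo")
sim_unrelated = language_similarity("POBillTo", "DeliverTo")
```

With these tables, POBillTo normalizes to purchase, order, bill, and shares the canonical token invoice with InvoiceTo but nothing with DeliverTo.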

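Element-based: Language mappings 9.

  • Figure: worked example comparing POBillTo with InvoiceTo through the steps Tokenize, Elimination, Expansion, and Related form (the per-step values did not survive in the transcript), shown over the two example schemas
    • Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Address, Item, Street, City, City, Street, ItemCount, ItemNumber, Quantity, UnitOfMeasure
    • Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, City, Street, Count, Line, Qty, UoM

Element-based: Language mappings 10.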

  • Represents schemas as graphs/trees
    • Nodes are elements and attributes
    • Arcs are relationships
  • Assumes matched elements between two graphs should have related elements that can be matched
  • Ex. Similarity flooding, Cupid
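
The assumption that matched elements should have matchable related elements can be illustrated with a single refinement round over two small trees. This is a simplified sketch, not the Similarity flooding or Cupid algorithm; the function name `refine`, the blending weight `alpha`, and the tiny trees are all hypothetical.

```python
def refine(sim, children_a, children_b, alpha=0.5):
    """One round: blend each pair's score with how well their children match.

    sim maps (node_a, node_b) -> similarity in [0, 1]; missing pairs count as 0.
    children_a / children_b map a node name to the list of its child names.
    """
    new = {}
    for (a, b), score in sim.items():
        ca, cb = children_a.get(a, []), children_b.get(b, [])
        if ca and cb:
            # For each child of a, take its best score against any child of b.
            child_score = sum(max(sim.get((x, y), 0.0) for y in cb) for x in ca) / len(ca)
            new[(a, b)] = (1 - alpha) * score + alpha * child_score
        else:
            new[(a, b)] = score  # leaves keep their element-level score
    return new


# Hypothetical trees: the parent names differ, but one pair shares its children.
children_a = {"Address": ["Street", "City"]}
children_b = {"POShipTo": ["Street", "City"], "Item": ["Line", "Qty"]}
initial = {
    ("Address", "POShipTo"): 0.1,
    ("Address", "Item"): 0.1,
    ("Street", "Street"): 1.0,
    ("City", "City"): 1.0,
}
refined = refine(initial, children_a, children_b)
```

After one round the Address/POShipTo pair rises from 0.1 to 0.55 because its children match, while Address/Item drops to 0.05. Iterating until the scores stabilize is the general idea behind propagation-style matchers.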

Structure-based: Graph/tree mappings 11.

  • Ontology definition:
  • Specification of a conceptualization [Gru93]
  • Explicit formal specification of the terms in the domain and relations among them
  • Ontology Mapping Definition:
  • Given two ontologies O1 and O2, mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find a corresponding entity, which has the same intended meaning, in ontology O2 [Ehrig and Staab]

Ontology Mapping Problem 12.

  • Research Classification of Ontology Mapping [Noy04]
  • Mapping discovery: finding the similarities between two ontologies, i.e., determining which concepts and properties represent similar notions.
  • Declarative formal representation of mappings: identifying how mappings between two ontologies can be represented so that reasoning with them is possible.
  • Reasoning with mappings: performing reasoning based on the mappings once they are defined, i.e., what types of reasoning are involved and how they are carried out.

Ontology Mapping Research 13.

  • Snoggle
  • An interactive visual ontology mapping tool.
  • Users define mappings between the two ontologies, which are expressed in SWRL (Semantic Web Rule Language).
  • The mappings are converted into Jena rules, which are applied by the Jena inference engine to produce instances that can be queried.

Survey : State of the Art 14.

  • GLUE [Doa+3]: uses machine learning techniques to find mappings.
  • Given two ontologies, for each concept in one ontology it finds the most similar concept in the other ontology.
  • The GLUE architecture consists of:
    • Distribution Estimator
    • Similarity Estimator
    • Relaxation Labeler
  • GLUE outputs one-to-one correspondences between the taxonomies of the ontologies.
  • It combines string similarity, structure, and machine learning strategies.

GLUE 15.

  • PROMPT [Noy04]
  • Input: Two ontologies in OWL or OKBC
  • Output: Mapping suggestions and a merged ontology based on the choices made by the user.
  • iPROMPT: Interactive ontology merging tool.
  • AnchorPROMPT: Graph-based mappings to provide additional information for iPROMPT.
  • PROMPTDiff: Compares different ontology versions by combining matchers in a fixed-point manner.
  • PROMPTFactor: Tool for extracting a part of an ontology.

PROMPT 16.

  • Lucene Ontology Mapper
  • The source ontology is indexed into Lucene Documents (fields) using the Lucene search engine
  • Each field in the target ontology is provided as a search argument, which is in turn compared with the fields in the source documents, and the hit scores are computed.
  • Fields with the maximum hit scores are said to be similar and hence mapped.
  • PowerMap also uses Lucene as part of its Ontology Mapping Framework
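
The index-then-search idea above can be sketched without Lucene itself. The following is a stand-in that uses a tiny in-memory index with IDF-weighted term overlap; it does not reproduce Lucene's API or its scoring formula, and the field descriptions are hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def build_index(source_fields):
    """Index each source field (name -> descriptive text) as a bag of terms."""
    docs = {name: Counter(tokens(text)) for name, text in source_fields.items()}
    df = Counter(term for bag in docs.values() for term in bag)  # document frequency
    idf = {term: math.log(1 + len(docs) / df[term]) for term in df}
    return docs, idf


def search(index, query):
    """Score every indexed field against the query; highest score first."""
    docs, idf = index
    terms = set(tokens(query))
    scores = {name: sum(bag[t] * idf[t] for t in terms if t in bag)
              for name, bag in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def map_fields(source_fields, target_fields):
    """Map each target field to the source field with the maximum hit score."""
    index = build_index(source_fields)
    mapping = {}
    for name, text in target_fields.items():
        best, score = search(index, text)[0]
        if score > 0:  # leave the field unmapped when nothing matches
            mapping[name] = best
    return mapping


# Hypothetical field descriptions, for illustration only.
source = {
    "DeliverTo": "deliver to shipping address",
    "InvoiceTo": "invoice to billing address",
    "Quantity": "item quantity",
}
target = {
    "POShipTo": "purchase order shipping address",
    "POBillTo": "purchase order billing address",
    "Qty": "quantity ordered",
}
mapping = map_fields(source, target)
```

Rare terms such as "shipping" and "billing" outweigh the shared term "address", which is what separates DeliverTo from InvoiceTo here.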

IR Approaches 17.

  • QOM
  • Uses string similarity, structure, and instances.
  • Input: Two OWL or RDFS ontologies with their elements (e.g., classes, properties, instances)
  • Output: One-to-one or one-to-none correspondences.
      • Heuristics are used to lower the number of candidate mappings.
      • It avoids the complete pairwise comparison of trees in favor of a top-down strategy
      • A sigmoid function is applied to emphasize high individual similarities and de-emphasize low individual similarities
      • A threshold is used to discard spurious evidence of similarity.
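
The sigmoid weighting and threshold steps can be sketched as follows. This is only an illustration of the idea, not QOM's code: the steepness and midpoint of the sigmoid, the feature weights, the threshold value, and the candidate scores are all assumptions.

```python
import math


def sigmoid(x, steepness=5.0, midpoint=0.5):
    """Squash a similarity in [0, 1], pushing values away from the midpoint."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def aggregate(similarities, weights):
    """Weighted average of the sigmoid-adjusted individual similarities."""
    total = sum(weights[name] for name in similarities)
    return sum(weights[name] * sigmoid(value)
               for name, value in similarities.items()) / total


def accept(candidates, weights, threshold=0.6):
    """Keep only the candidate pairs whose aggregated score reaches the threshold."""
    accepted = {}
    for pair, similarities in candidates.items():
        score = aggregate(similarities, weights)
        if score >= threshold:
            accepted[pair] = score
    return accepted


# Hypothetical individual similarities for two candidate pairs.
weights = {"label": 0.6, "structure": 0.4}
candidates = {
    ("Quantity", "Qty"): {"label": 0.9, "structure": 0.8},
    ("Quantity", "Line"): {"label": 0.1, "structure": 0.3},
}
accepted = accept(candidates, weights)
```

Only the Quantity/Qty pair survives the threshold; the weak evidence for Quantity/Line is pushed further down by the sigmoid and then discarded.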

QOM 18.

  • Schemas [Cho08,UscGru03]
    • Specify database structure
      • Relationships
      • Attributes
    • Typically relational or XML
  • Ontologies [UscGru03, ShvEuz05]
    • Formal semantic specification of a shared conceptualization
      • Concepts
      • Relationships
    • Typically encoded with formal languages
      • Description logics
    • Most utilize taxonomic structure

Schemas and Ontologies 19.

  • Both are forms of meta-data
  • Both utilized for domain description
  • Both utilize constraints (but in different ways)

Similarities 20.

  • Few differences
  • The essential (and trivial) difference is what each specifies and how each is used
    • DB for querying
    • Ontologies for search and derivation
    • Lines are blurring (e.g. SPARQL)
  • Schemas don't have semantics
    • Relational schemas lack generality
  • Ontologies use constraints to establish meaning
  • Schemas use constraints to establish integrity

Differences 21.

  • Element matching approaches [Wac+6]
    • Top-level ontologies
      • Shared ontology utilized for common language and semantics for subsumed ontologies
      • Ontologies that inherit the top-level ontology can be mapped more easily
    • Semantic Correspondence
      • Utilizes top-level ontologies for automatic ontology mapping
      • Formal concept analysis: Produces a common concept lattice between ontologies through object-attribute analysis
  • Structure level [ShvEuz05]
    • Topology matching
      • Utilizes sub-/super- class semantics
      • Assumes the superclasses and subclasses of matched elements are more likely to be related
    • Model matching
      • Utilizes semantic interpretations of ontologies to construct logical representations of potential mappings
      • Utilizes background knowledge to provide axioms for the representation
      • Runs a SAT/Validity checker to determine correct mappings

Consequences of Differences 22.

  • Due to similarities, and few differences
    • Applications can be made that translate DB Schemas to Ontologies [XuZhaDon06]
    • Methodologies developed with both in mind will benefit both
    • Algorithms for schema matching applicable to ontology mapping
      • Some approaches that rely on semantics prevent the opposite [Hess06]
      • Schema vocabularies and forced taxonomic structure could eliminate this

Schema -> Ontology 23.

  • Implementing an algorithm for OWL ontology mapping based on Cupid
  • Cupid [MadBerRah01]
    • Hybrid approach
    • Uses linguistic and data-type constraint matching followed by tree structure mapping
    • Derives mappings as a result of coefficient computation
  • Our approach
    • Parse two OWL ontologies
    • Use a simple string matcher for initial similarities
    • Utilize tree structure methodology on known OWL semantics

Schema Matching Algorithm 24.

  • Assumptions
    • Leaf nodes are structurally similar (ssim) if they have lexical and data-type similarity
      • lsim(s,t), in [0, 1]: Lexical similarity uses substring, normalization, and hypernymy and synonymy matching
      • data-type-similarity(s,t), in [0, 0.5]: Look-up table of data types and their similarity
    • Non-leaf nodes are ssim if they are lsim and their leaf nodes are weighted similarly (wsim); immediate children do not influence ssim.
      • wsim(s,t), in [0, 1]: Measure of the lexical and structural similarity. Preference to one or the other is controlled by a modifying constant.
  • Constants
    • w_struct: Modifies the influence of each matcher
    • th_accept: When to accept two leaf nodes as strongly linked
    • th_high / th_low: When to increase/decrease structural similarity
    • c_inc / c_dec: How much to increase/decrease structural similarity
  • Algorithm TreeMatch(S, T)
    • Initialize ssim(s,t) = data-type-similarity(s,t) for every leaf node s in S and t in T
    • Using post-order traversal, for every node s in S and node t in T
      • wsim(s,t) = w_struct * ssim(s,t) + (1 - w_struct) * lsim(s,t)
      • if wsim(s,t) > th_high, increase ssim for all leaf nodes of s and t by c_inc
      • if wsim(s,t) < th_low, decrease ssim for all leaf nodes of s and t by c_dec
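
The TreeMatch steps above can be sketched in Python as follows. This is a hedged reading of the slide, not the authors' or Cupid's implementation. Assumptions: the `Node` class, the toy `lsim` (exact name match) and `dtype_sim` functions, and the default constants are hypothetical; c_inc and c_dec are treated as multiplicative factors; and the structural similarity of non-leaf pairs is taken to be the fraction of leaves with at least one strong link (wsim >= th_accept) on the other side, which the slide does not spell out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps nodes hashable by identity
class Node:
    name: str
    dtype: str = ""
    children: list = field(default_factory=list)

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def post_order(node):
    for child in node.children:
        yield from post_order(child)
    yield node


def tree_match(S, T, lsim, dtype_sim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.35, c_inc=1.2, c_dec=0.9):
    # Leaf pairs start from their data-type similarity.
    ssim = {(s, t): dtype_sim(s.dtype, t.dtype)
            for s in S.leaves() for t in T.leaves()}
    wsim = {}

    def weighted(s, t):
        return w_struct * ssim[(s, t)] + (1 - w_struct) * lsim(s.name, t.name)

    for s in post_order(S):
        for t in post_order(T):
            ls, lt = s.leaves(), t.leaves()
            if s.children or t.children:
                # Assumption: fraction of leaves that are strongly linked.
                strong_s = sum(any(weighted(x, y) >= th_accept for y in lt) for x in ls)
                strong_t = sum(any(weighted(x, y) >= th_accept for x in ls) for y in lt)
                ssim[(s, t)] = (strong_s + strong_t) / (len(ls) + len(lt))
            wsim[(s, t)] = weighted(s, t)
            if wsim[(s, t)] > th_high:
                factor = c_inc
            elif wsim[(s, t)] < th_low:
                factor = c_dec
            else:
                continue
            # Adjust the structural similarity of all leaf pairs under s and t.
            for x in ls:
                for y in lt:
                    ssim[(x, y)] = min(1.0, ssim[(x, y)] * factor)
    return wsim


# Toy matchers and trees, for illustration only.
def exact_lsim(a, b):
    return 1.0 if a == b else 0.0


def same_type_sim(a, b):
    return 0.5 if a == b else 0.0


street_s, city_s = Node("Street", "string"), Node("City", "string")
street_t, city_t = Node("Street", "string"), Node("City", "string")
S = Node("Address", children=[street_s, city_s])
T = Node("ShipTo", children=[street_t, city_t])
result = tree_match(S, T, exact_lsim, same_type_sim)
```

Even though the names Address and ShipTo share nothing lexically, the pair ends up with wsim 0.5 because all of their leaves are strongly linked, which is the structural effect the algorithm is after.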

Tree Matcher 25.

  • Figure: Cupid mappings between the two example schemas, with legend: High lsim (A1), High wsim (A2), Matches
    • Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Address, Item, Street, City, City, Street, ItemCount, ItemNumber, Quantity, UnitOfMeasure
    • Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, City, Street, Count, Line, Qty, UoM

Cupid Mappings 26.

  • Schema Matching and Ontology Matching address similar problems
  • Schema matching approaches are applicable to ontology mapping
    • Doing so doesn't utilize semantic information
    • The opposite doesn't hold.
  • Hybrid approaches are the best methodologies for automatic, generic schema matching and ontology mapping
  • Systems that employ schema matching might be capable of working with ontologies given minimal adjustment (e.g. Cupid)
    • Additional experimentation is needed

Conclusions 27.

  • [Cho08] J. Chomicki. Data Integration: Schema Mapping. February 2008. http://www.cse.buffalo.edu/~chomicki/636/handout-mapping.pdf
  • [Doa+3] A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to Map between Ontologies on the Semantic Web. Proceedings of the 11th International Conference on World Wide Web. 2002.
  • [Gru93] T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2). 1993.
  • [Hess06] A. Hess. An Iterative Algorithm for Ontology Mapping Capable of Using Training Data. Proceedings of ESWC '06. 2006.
  • [MadBerRah01] J. Madhavan, P. Bernstein, and E. Rahm. Generic Schema Matching with Cupid. Proceedings of the 27th VLDB Conference. 2001.
  • [Noy04] N. Noy. Semantic Integration: A Survey of Ontology-based Approaches. SIGMOD Record, Special Issue on Semantic Integration. 2004.
  • [ShvEuz05] P. Shvaiko and J. Euzenat. A Survey of Schema-based Matching Approaches. Journal on Data Semantics. 2005.
  • [UscGru03] M. Uschold and M. Gruninger. Ontologies and Semantics for Seamless Connectivity. SIGMOD Record 33(4). 2004.
  • [Wac+6] H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, and S. Hubner. Ontology-based Integration of Information: A Survey of Existing Approaches. IJCAI-01 Workshop: Ontologies and Information Sharing. 2001.
  • [XuZhaDon06] Z. Xu, S. Zhang, and Y. Dong. Mapping between Relational Database Schema and OWL Ontology for Deep Annotation. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. 2006.

References