[Lecture Notes in Electrical Engineering] Advances in Computer Science and its Applications Volume 279 || A Survey on Ontology Mapping Techniques


Post on 21-Dec-2016







<ul><li><p>H.-Y. Jeong et al. (eds.), Advanced in Computer Science and Its Applications, Lecture Notes in Electrical Engineering 279, </p>
<p>829</p>
<p>DOI: 10.1007/978-3-642-41674-3_118, © Springer-Verlag Berlin Heidelberg 2014 </p>
<p>A Survey on Ontology Mapping Techniques </p>
<p>Yew Kwang Hooi<sup>1,*</sup>, M. Fadzil Hassan<sup>1</sup>, and Azmi M. Shariff<sup>2</sup> </p>
<p><sup>1</sup> Computer and Information Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Perak, Malaysia </p>
<p><sup>2</sup> Chemical Engineering Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Perak, Malaysia </p>
<p>yewkwanghooi@gmail.com, {m_fadzil,azmish}@petronas.com.my </p>
<p>Abstract. This paper surveys existing ontology mapping techniques for data interoperability. Existing matcher algorithms and strategies are discussed in general terms and the research gaps are highlighted. The study concludes that semantic mapping has the biggest share of unresolved problems. Bridging the gap in semantic mapping may help improve the mapping of ostensibly different domain knowledge and disparate data sources. </p>
<p>Keywords: Enterprise interoperability, ontology mapping, basic matcher, semantics. </p>
<p>1 Introduction </p>
<p>Software systems can achieve more sophisticated abilities by accessing ontologies that provide computer-processable representations of a knowledge domain. The use of Knowledge Work Systems (KWS) to convert facts and knowledge into ontologies is already a growing trend in enterprises. Ontologies provide a layer of abstraction on top of data, taxonomies or database schemas [1] and offer richer semantics through concept mapping [2]. Various studies [2-10] indicate the potential of ontology mapping for data interoperability between heterogeneous sources. </p>
<p>2 Mapping Ontologies in Enterprises </p>
<p>Mapping multiple ontologies is envisaged as a future necessity to support complex analysis, decision making and collaborative information systems in achieving a common business objective. 
Concepts [11] from separate database schemas [5], XML, and other data sources can be represented by ontologies. Future systems using ontologies may share and merge ontologies. However, this interoperability is not easy to achieve because ontologies differ in vocabulary and granularity [5, 12]. Aligning ontologies has been the focus of a variety of works originating from diverse communities over the years [2, 13]. </p>
<p>* Corresponding author. </p></li><li><p>Ontology mapping can resolve complex information exchange through a consensual understanding between concepts [14, 15]. This is achieved by identifying the exchange points between different representations [15]. The exchange points may share a common layer [4], a mediated schema that facilitates sharing of heterogeneous sources [5, 12] and tackles the incomplete data issue in Knowledge Systems [16]. This leads to integration of evolving context, information or data [6] for inferring new knowledge and for accurate search [1, 17]. </p>
<p>3 Literature Review </p>
<p>3.1 Definitions </p>
<p>An ontology can be described as a pair of signature and axiom sets, O = (S, A), where O is the ontology, S is the ontological signature, and A is a set of ontological axioms that restrict the meaning of the terms in the signature [4, 8]. </p>
<p>Ontology mapping finds correspondences between entities of multiple ontologies [15]. Given ontologies O<sub>1</sub> = (S<sub>1</sub>, A<sub>1</sub>) and O<sub>2</sub> = (S<sub>2</sub>, A<sub>2</sub>), an ontology mapping is a morphism f: S<sub>1</sub> → S<sub>2</sub> such that A<sub>2</sub> ⊨ f(A<sub>1</sub>), i.e. all interpretations that satisfy the axioms of O<sub>2</sub> also satisfy the translated axioms of O<sub>1</sub> [4]. A correspondence is a function that assigns a symbol of one vocabulary to a symbol of another vocabulary [1, 2]. A set of such correspondences between a pair of ontologies is an alignment. A mapping is a directed alignment [3]. 
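</p>
<p>The definitions above can be illustrated with a minimal sketch. It rests on a strong simplifying assumption: axioms are restricted to subclass statements, so that the entailment of the translated axioms can be checked through the transitive closure of the target axioms. The encoding, the function names (<code>subclass_closure</code>, <code>translate</code>, <code>is_ontology_mapping</code>) and the example ontologies are hypothetical and are not taken from the cited works. </p>

```python
# Hypothetical toy encoding (an assumption, not from the surveyed papers):
# a signature is a set of symbols, and an axiom is a pair (sub, sup)
# read as "sub is a subclass of sup".


def subclass_closure(axioms):
    """Return the transitive closure of a set of subclass pairs."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def translate(axioms, f):
    """Apply the symbol mapping f to every axiom, giving f(A1)."""
    return {(f[a], f[b]) for (a, b) in axioms}


def is_ontology_mapping(s1, a1, s2, a2, f):
    """Check that f maps every symbol of S1 into S2 and that A2 entails f(A1).

    Entailment is approximated by membership in the transitive closure
    of A2, which is only adequate for subclass axioms.
    """
    if set(f) != set(s1):
        return False
    if not set(f.values()).issubset(s2):
        return False
    return translate(a1, f).issubset(subclass_closure(a2))


# Example ontologies O1 = (s1, a1) and O2 = (s2, a2).
s1 = {"Car", "Vehicle"}
a1 = {("Car", "Vehicle")}
s2 = {"Automobile", "MotorVehicle", "Transport"}
a2 = {("Automobile", "MotorVehicle"), ("MotorVehicle", "Transport")}

f = {"Car": "Automobile", "Vehicle": "Transport"}
print(is_ontology_mapping(s1, a1, s2, a2, f))
```

<p>In this example the translated axiom (Automobile, Transport) is not stated in the target ontology but follows from it by transitivity, which is why the check is made against the closure and not against the stated axioms alone. </p>
<p>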
</p>
<p>A partial ontology mapping from O<sub>1</sub> to O<sub>2</sub> exists if there is a sub-ontology O'<sub>1</sub> = (S'<sub>1</sub>, A'<sub>1</sub>), with S'<sub>1</sub> ⊆ S<sub>1</sub> and A'<sub>1</sub> ⊆ A<sub>1</sub>, such that there is a total mapping from O'<sub>1</sub> to O<sub>2</sub> [18]. </p>
<p>3.2 Mapping Framework </p>
<p>A high-level view of the mapping process is depicted, in simplified form, in Figure 1 [3, 15]: </p>
<p>[Figure 1 box labels: Input; Select feature; Search candidates; Use matcher(s); Aggregate results; Determine mapping status; Output; with an Iterate loop] </p>
<p>Fig. 1. A simple high level view of a mapping process </p>
<p>The core component of mapping is the matcher. The matcher builds correspondences between ontologies by first selecting a suitable attribute or feature (entity label, structural description of concepts, range for relations, instantiated attributes or extensional descriptions) from the ontologies [3]. Feature selection transforms the resource into a lightweight ontology. </p></li><li><p>Common mapping approaches [3, 4] link candidate ontologies to a common ontology using anchors. Anchors are entities that are declared to be equivalent (based on identity and user input). </p>
<p>Finally, matchers evaluate the similarity criteria of both ontologies. Often, functions based on heuristic similarity rather than exact logical similarity are used to avoid a costly exhaustive search. Results from multiple matchers for the same entity pair can be aggregated. Recent advances introduce autonomous combination of multiple matchers. The tie-breaker that determines the mapping of each pair may use a threshold method, relaxation labeling, or a combination of structural and similarity criteria using learning algorithms and/or user input. Iteration stops after a predefined number of loops or when no new mapping is proposed [3, 8]. </p>
<p>3.3 Categories of Matchers </p>
<p>A basic matcher is a similarity function over a pair of entities, σ: O × O → R, where R is the interval [0, 1]. 
In the point-to-point approach, matching uses lexical or structural similarity of labels or instances. Various techniques have been developed and can generally be categorized as follows [4]: </p>
<p>Terminological Mapping. Mapping uses token analysis to reduce a word to a common format and establishes its importance through weighting of relations and comparison of paths. The techniques, each specializing in particular features of the concepts analyzed, are: </p>
<p>String-based. The analysis quantifies edit distance by counting and normalizing the editing operations (insertion, deletion or substitution of affixes or substrings) required to transform the first word into the second. </p>
<p>Language-based. The analysis tokenizes a string using punctuation and letter case, then uses lemmatization to find the possible basic forms of each word. </p>
<p>Linguistic resource. The analysis refers to an extrinsic source such as WordNet [3] for linguistic knowledge to interpret strings. The sense-based approach determines the relationship of one word to another as hyponym, hypernym, synonym or antonym, whereas the gloss-based approach counts the words shared by a pair of phrases or sentences. </p>
<p>Terminological mapping faces difficulty in processing word variations within the same ontology or across ontologies [3]. </p>
<p>Structural Mapping [3]. Structural mapping looks at the relationships (adjacency and path sharing) between concepts within the ontology structure. The two approaches are: </p>
<p>Taxonomy mapping: The mapping uses the super-concept rule and bounded paths. Super-concept matching assumes that two concepts are similar if both share the same parent concept. Bounded-path matching compares two paths to identify similar concepts along the paths. </p>
<p>Tree-based mapping: Similarity is based on analysis of positions within the graphs. If two nodes from two ontologies are similar, their neighbour nodes are assumed to be somewhat similar as well. </p></li><li><p>Like terminological mapping, structural mapping faces difficulty in processing the many kinds of variations that occur in ontologies [3]. </p>
<p>Semantic Technique. Semantic mapping is the most challenging area and a key research area [8, 9]. The key feature of semantic mapping is the use of model-theoretic semantics to define well-formed formulas (wff) that express meaning without ambiguity. Its advantage lies in deductive methods for amplifying or cropping the mappings in anchored ontologies, which ensure mapping completeness and eliminate bad correspondences [3]. The semantic technique depends on anchored ontologies, which contain mapping candidates that serve as a common ground for comparison [3, 19]. Anchoring works by matching ontologies to a background ontology to extract the meanings of concepts using domain knowledge [3]. The two approaches are external ontologies and deductive techniques. </p>
<p>External Ontologies. The semantic technique uses a mediated approach. An intermediate reference ontology can provide general concepts and axioms for clarifying the meaning of domain concepts and their relations. The intermediate reference ontology can be an external ontology or a hidden intermediate reference ontology built on the fly using lexicons, as proposed by Kotis [1]. The use of an external ontology is more common. An external ontology is a general, top-level and formal ontology for conceptual modelling. Examples are the General Formal Ontology (GFO), WordNet [20], Cyc, the Suggested Upper Merged Ontology (SUMO) [21] and the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [22]. Using an external ontology has a higher probability of finding a result by exploiting existing mappings, but possibly at a lower accuracy [3]. A formal ontology provides reasoning and deduction methods. </p>
<p>Deductive Techniques. These techniques merge two ontologies and search for correspondences through the subsumption relation. 
Subsumption testing checks the mappings and discards those that fare poorly in the satisfiability test. The three types of satisfiability techniques are propositional satisfiability (SAT), modal satisfiability and description logic (DL) [3]. </p>
<p>The SAT technique builds a theory (the domain knowledge, or a group of axioms obtained using matchers that refer to external sources) as a premise for establishing a relationship between concepts, as described by the following: </p>
<p>Axioms → r(c, c'), where c and c' are a pair of concepts and r is a relation in {equivalent, subsumption, subclass, not equal} </p>
<p>Validation is done by an exhaustive check to ensure that the negation of the formula cannot be satisfied by any instance. </p>
<p>The DL technique is a purely terminological formal knowledge representation technique. It uses subsumption reasoning to establish relations between different ontologies in a purely semantic manner [3, 5]. The two ontologies are merged and each pair of concepts and roles is tested for subsumption. The relationships are expressed in minimal description logic syntax. Relation inference is then conducted on the description logic to check whether the subsumption rules of the components are consistent. </p></li><li><p>3.4 Aggregating Matchers </p>
<p>As mentioned earlier, ontology mapping often combines multiple basic matchers. The generic compositions are [1]: </p>
<p>Sequential Composition - Matchers are arranged sequentially. The initial matchers in the order focus on smaller granularity, such as linguistic matching. The subsequent matchers use higher granularity matching, such as structure matching. </p>
<p>Parallel Composition - The matchers evaluate the same set of ontologies simultaneously and each generates its own result; the results are then combined into one. </p>
<p>Mixed Composition - The composition of matchers can be a mixture of sequential and parallel compositions, see Figure 2. 
AUTOMS, for example, uses a sequential arrangement for multiple passes of mapping and parallel composition to aggregate the results of multiple methods. The multiple passes include simple and iterative structural, instance-based or property-based matching; lexical matching; and semantic matching. Aggregation includes result matrices of concepts, properties and relations [3, 23]. </p>
<p>[Figure 2 labels: O; O'; M; M'; M"; M"'; M""; A'; A"; Basic matcher 1; Basic matcher 2; Semantic amplifier; Aggregation; Select correspondences; Extract mapping] </p>
<p>Note: M is a matrix of concepts, properties and/or relations </p>
<p>Fig. 2. A global matching system (adapted from [3]) </p>
<p>4 Research Gap </p>
<p>According to Euzenat, semantics "provides the rules for interpreting the syntax which do not provide the meaning directly but constrains the possible interpretations of what is declared" [24]. Very few semantic techniques have been developed for ontology mapping despite their potential. A major barrier is the difficulty of combining the deductive technique of semantics with the inductive structure of ontologies. </p>
<p>Most matcher designs focus on a specific application domain or ontology type (DTD, relational schemas, OWL), which reduces their reusability. One reason is that an ontology itself is designed for a specific application [3], owing to the intricacy of data in niche domains. Because rewriting a matcher for a new domain is inconvenient, modularized and general matcher designs are desirable. The few mapping techniques designed to handle general ontologies include S-Match, Similarity Flooding, COMA++ and Cupid. </p></li><li><p>Table 1. Summary of Ontology Matching Techniques </p>
<table>
<tr><th>Techniques</th><th>Objective</th><th>Limitations and gaps</th><th>Ref.</th></tr>
<tr><td>String-based: synonyms, homonyms, normalization, string equality, substring test, edit distance, token-based distance, path comparison.</td><td>Simple matching, effective when very similar strings are used as schema names to denote the same concept.</td><td>Unable to distinguish synonyms or homonyms effectively.</td><td>p. 84 [3], [23]</td></tr>
<tr><td>Language-based. 1. Intrinsic methods / linguistic normalization: reduce words to a standardized form using tokenization, lemmatization, term extraction or stop-word elimination. 2. Extrinsic methods: use external resources.</td><td>To improve interpretation and apprehension of the terms used, by means of natural language processing (NLP).</td><td>Very dependent on linguistic resources such as stemmers, part-of-speech taggers, lexicons and thesauri. Effectiveness is hindered by the presence of foreign languages and syntactic variations of the same word (spellings, abbreviations, prefixes, suffixes). It does not take into account the structure of ontology entities to find the most coherent match.</td><td>p. 92 [3], [20]</td></tr>
<tr><td>Structure-based / constraint-based (internal structure). Keys are the most useful identifier. Works by comparing the structure and the properties of entities.</td><td>To match schemas to determine whether classes are equivalent. It is often used to quickly find possible matches with shallow accuracy.</td><td>Lacks accuracy. Ineffective when two equivalent entities have different data types for their properties. Different entities may also have similar properties.</td><td>p. 92 [3]</td></tr>
<tr><td>Structure-based (relational structure): Wu-Palmer similarity, upward cotopic similarity. Compares the structure of entities using relations. Similarity is based on counting edges in the graphs.</td><td>To match concepts in taxonomies, formal ontologies and semantic networks.</td><td>Difficulty in detecting (using iterative algorithm) and inability to handle mutual influence between related parts. Using edge count is inconc...</td><td></td></tr>
</table></li></ul>
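<p>The string-based matcher of Section 3.3, the parallel composition of Section 3.4 and the threshold method of Section 3.2 can be sketched together as follows. This is an illustrative sketch only: the function names, the token-overlap matcher, the equal weights and the 0.7 threshold are assumptions made for the example and do not come from any of the surveyed systems. </p>

```python
import re


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def string_similarity(a, b):
    """Edit distance normalized into [0, 1]; 1.0 means identical labels."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def tokens(label):
    """Split a label on underscores, hyphens, spaces and case changes."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return {t.lower() for t in re.split(r"[\s_\-]+", spaced) if t}


def token_similarity(a, b):
    """Jaccard overlap of the token sets of two labels."""
    ta, tb = tokens(a), tokens(b)
    union = ta.union(tb)
    if not union:
        return 1.0
    return len(ta.intersection(tb)) / len(union)


def aggregate(a, b, matchers, weights):
    """Parallel composition: weighted average of the basic matcher scores."""
    total = sum(weights)
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total


def align(labels1, labels2, threshold=0.7):
    """For each label of the first ontology keep its best-scoring match,
    provided the aggregated score reaches the threshold.
    Assumes labels2 is not empty."""
    matchers = [string_similarity, token_similarity]
    weights = [0.5, 0.5]  # assumed equal weights
    alignment = []
    for a in labels1:
        best = max(labels2, key=lambda b: aggregate(a, b, matchers, weights))
        score = aggregate(a, best, matchers, weights)
        if score >= threshold:
            alignment.append((a, best, round(score, 2)))
    return alignment


print(align(["hasAuthor", "Publication_Year", "Car"],
            ["has_author", "publicationYear", "Vehicle"]))
```

<p>In this example the label pair Car and Vehicle receives a low score from both matchers and is left unmapped, which reflects the limitation recorded in Table 1: string-based matching cannot recognise synonyms, so a linguistic resource or a semantic technique would be needed for such a pair. </p>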