
H.-Y. Jeong et al. (eds.), Advanced in Computer Science and Its Applications, Lecture Notes in Electrical Engineering 279,


DOI: 10.1007/978-3-642-41674-3_118, © Springer-Verlag Berlin Heidelberg 2014

A Survey on Ontology Mapping Techniques

Yew Kwang Hooi1,*, M. Fadzil Hassan1, and Azmi M. Shariff2

1 Computer and Information Sciences Department Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Perak, Malaysia

2 Chemical Engineering Department

Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Perak, Malaysia [email protected],

{m_fadzil,azmish}@petronas.com.my

Abstract. This paper surveys existing ontology mapping techniques for data interoperability. Existing matcher algorithms and strategies are discussed in general terms and research gaps are highlighted. The study concludes that semantic mapping has the largest share of unresolved problems. Bridging the gap in semantic mapping may help improve the mapping of ostensibly different domain knowledge and disparate data sources.

Keywords: Enterprise interoperability, ontology mapping, basic matcher, semantics.

1 Introduction

Software systems can achieve more sophisticated abilities by accessing ontologies that provide computer-processable representations of a knowledge domain. The use of Knowledge Work Systems (KWS) to convert facts and knowledge into ontologies is already a growing trend in enterprises. Ontologies provide a layer of abstraction on top of data, taxonomies or database schemas [1] and offer richer semantics through concept mapping [2]. Several studies [2-10] indicate the potential of ontology mapping for data interoperability between heterogeneous sources.

2 Mapping Ontologies in Enterprises

Mapping multiple ontologies is envisaged as a future necessity for supporting complex analysis, decision making and collaborative information systems that work towards a common business objective. Concepts [11] from separate database schemas [5], XML, and other data sources can be represented by ontologies. Future systems using ontologies may share and merge them. However, this interoperability is hard to achieve because ontologies differ in vocabulary and granularity [5, 12]. Aligning ontologies has been the focus of a variety of works originating from diverse communities over the years [2, 13].

* Corresponding author.


Ontology mapping can resolve complex information exchange through a consensual understanding between concepts [14, 15]. This is achieved by identifying the exchange points between different representations [15]. The exchange points may share a common layer [4], a mediated schema that facilitates sharing of heterogeneous sources [5, 12] and tackles the issue of incomplete data in Knowledge Systems [16]. This leads to the integration of evolving context, information or data [6] for inferring new knowledge and for accurate search [1, 17].

3 Literature Review

3.1 Definitions

An ontology can be described as a pair of a signature and a set of axioms, O = (S, A), where O is the ontology, S is the ontological signature, and A is a set of ontological axioms that restrict the meaning of the terms in the signature [4, 8].

Ontology mapping finds correspondences between entities of multiple ontologies [15]. Given ontologies O1 = (S1, A1) and O2 = (S2, A2), an ontology mapping is a morphism f: S1 → S2 such that A2 ⊨ f(A1), that is, all interpretations that satisfy O2's axioms also satisfy O1's translated axioms [4]. A correspondence is a function that assigns a symbol of one vocabulary to a symbol of another vocabulary [1, 2]. A set of such correspondences between a pair of ontologies is an alignment. A mapping is a directed alignment [3].

A partial ontology mapping from O1 to O2 exists if there is a sub-ontology O'1 = (S'1, A'1), where S'1 ⊆ S1 and A'1 ⊆ A1, such that there is a total mapping from O'1 to O2 [18].
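The distinction between a total and a partial mapping can be illustrated with a minimal sketch. It rests on strong simplifying assumptions that are not part of the definitions above: axioms are reduced to (relation, subject, object) triples, and the entailment A2 ⊨ f(A1) is approximated by plain set inclusion of the translated axioms in A2. All function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: an axiom is simplified to a (relation, subject, object) triple
# whose subject and object are symbols of the ontology's signature.
Axiom = Tuple[str, str, str]


def translate(axioms: Set[Axiom], f: Dict[str, str]) -> Set[Axiom]:
    """Rename the symbols of each axiom through f (the 'translated axioms')."""
    return {(rel, f[a], f[b]) for (rel, a, b) in axioms}


def is_total_mapping(s1: Set[str], a1: Set[Axiom],
                     s2: Set[str], a2: Set[Axiom],
                     f: Dict[str, str]) -> bool:
    """f must be defined on all of S1, land in S2, and preserve the axioms."""
    if set(f) != s1 or not set(f.values()) <= s2:
        return False
    # Set inclusion is a crude stand-in for the entailment A2 |= f(A1).
    return translate(a1, f) <= a2


def is_partial_mapping(s1: Set[str], a1: Set[Axiom],
                       s2: Set[str], a2: Set[Axiom],
                       f: Dict[str, str]) -> bool:
    """True if f is a total mapping from the sub-ontology it covers."""
    sub_signature = s1 & set(f)
    sub_axioms = {ax for ax in a1
                  if ax[1] in sub_signature and ax[2] in sub_signature}
    restricted = {sym: f[sym] for sym in sub_signature}
    return bool(sub_signature) and is_total_mapping(
        sub_signature, sub_axioms, s2, a2, restricted)
```

A real implementation would replace the set-inclusion test with a reasoner that checks entailment.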

3.2 Mapping Framework

A high-level view of the mapping process is depicted, in simplified form, in Figure 1 [3, 15]:

Fig. 1. A simple high level view of a mapping process

The core component of mapping is the matcher. A matcher builds correspondences between ontologies by first selecting a suitable attribute or feature (entity label, structural description of concepts, range for relations, instantiated attributes or extensional descriptions) from the ontologies [3]. Feature selection transforms the resource into a light-weight ontology.

[Figure 1 diagram labels: Input; Select feature; Search candidates; Use matcher(s); Aggregate results; Determine mapping status; Output; with an Iterate loop]


Common mapping approaches [3, 4] link candidate ontologies to a common ontology using anchors. Anchors are entities that are declared to be equivalent (based on identity and user input).

Finally, matchers evaluate the similarity criteria of both ontologies. Often, functions based on heuristic similarity rather than exact logical similarity are used to avoid a costly exhaustive search. Multiple results from multiple matchers for the same entity pair can be aggregated. Recent advances introduce autonomous combination of multiple matchers. The tie-breaker that determines the mapping of each pair may use a threshold method, relaxation labeling, or a combination of structural and similarity criteria using learning algorithms and/or user input. Iteration stops after a predefined number of loops or when no new mapping is proposed [3, 8].
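The threshold method and the stopping rule described above can be sketched as follows. This is an illustrative simplification, not the procedure of any particular system: the greedy one-to-one selection, the default threshold of 0.5 and the function names are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]


def select_mappings(similarity: Dict[Pair, float],
                    threshold: float = 0.5) -> Dict[str, str]:
    """Greedy threshold method: keep the best-scoring pairs above the
    threshold, using each source and each target entity at most once."""
    chosen: Dict[str, str] = {}
    used_targets = set()
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    for (source, target), score in ranked:
        if score < threshold:
            break  # all remaining pairs score lower still
        if source in chosen or target in used_targets:
            continue
        chosen[source] = target
        used_targets.add(target)
    return chosen


def iterate(propose: Callable[[Dict[str, str]], Dict[str, str]],
            max_rounds: int = 10) -> Dict[str, str]:
    """Repeat matching until a round yields no new mapping proposal
    or the predefined number of rounds is reached."""
    mappings: Dict[str, str] = {}
    for _ in range(max_rounds):
        proposals = propose(mappings)
        new = {s: t for s, t in proposals.items() if s not in mappings}
        if not new:
            break
        mappings.update(new)
    return mappings
```

Here `propose` stands for one full pass of feature selection, matching and aggregation that may use the mappings found so far; relaxation labeling or a learned tie-breaker would replace `select_mappings`.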

3.3 Categories of Matchers

A basic matcher is a similarity function over a pair of entities, σ : O × O → R, where R = [0, 1]. In the point-to-point approach, matching uses lexical or structural similarity of labels or instances. Various techniques have been developed and can generally be categorized as follows [4]:

Terminological Mapping. Mapping uses token analysis to reduce a word to a common format and establishes its importance through weighting of relations and comparison of paths. The techniques, each specializing in particular features of the concepts analyzed, are:

• String-based. The analysis quantifies edit distance by counting and normalizing the required editing operations (insertion, deletion or substitution of affixes or substrings between two words) to transform the first word to the second.

• Language-based. The analysis tokenizes a string using punctuation and letter case, then uses lemmatization to find the possible basic forms of each word.

• Linguistic resource. The analysis refers to an extrinsic source such as WordNet [3] for linguistic knowledge to interpret strings. Sense-based approach determines relationships of a word as hyponym, hypernym, synonym or antonym; whereas gloss-based approach counts the same words in a pair of phrases or sentences.

Terminological mapping faces difficulty in processing word variations within the same ontology or across ontologies [3].
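As an illustration of the string-based and language-based techniques listed above, the following sketch computes a normalized edit distance and tokenizes a label on punctuation and case changes. It uses the standard Levenshtein distance (single-character insertion, deletion and substitution) and normalizes by the longer string's length, which is one of several possible normalizations. Lemmatization is omitted because it needs a linguistic resource. The function names are hypothetical.

```python
import re
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance normalized into a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def tokenize(label: str) -> List[str]:
    """Split a label on punctuation and on lower-to-upper case changes."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return [token.lower()
            for token in re.split(r"[^A-Za-z0-9]+", spaced) if token]
```

For example, `tokenize("hasAuthor_Name")` yields the tokens `has`, `author` and `name`, which can then be compared token by token with `string_similarity`.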

Structural Mapping [3]. Structural mapping looks at the relationship (adjacency and path sharing) between concepts within the ontology structure. Two approaches are:

• Taxonomy mapping: The mapping uses the super-concept rule and bounded paths. Super-concept matching assumes that two concepts are similar if both share the same parent concept. Bounded-path matching compares two paths to identify similar concepts along the paths.

• Tree-based mapping: Similarity is based on the analysis of positions within the graphs. If two nodes from two ontologies are similar, their neighbouring nodes are assumed to be similar to some degree.


Like terminological mapping, structural mapping faces difficulty in processing the many kinds of variation that occur in ontologies [3].
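The super-concept rule can be sketched as a candidate generator. The sketch assumes each taxonomy is given as a child-to-parent dictionary and that some parent concepts have already been aligned as anchors; both representations and the function name are assumptions for illustration. It only proposes candidate pairs, which other matchers would still need to score.

```python
from typing import Dict, Set, Tuple


def super_concept_candidates(parents1: Dict[str, str],
                             parents2: Dict[str, str],
                             anchors: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Propose (concept1, concept2) pairs whose parent concepts are
    already aligned, following the super-concept rule."""
    candidates = set()
    for concept1, parent1 in parents1.items():
        for concept2, parent2 in parents2.items():
            if anchors.get(parent1) == parent2:
                candidates.add((concept1, concept2))
    return candidates
```

With `Vehicle` anchored to `Transport`, every child of `Vehicle` becomes a candidate match for every child of `Transport`, which shows why the rule is usually combined with a terminological matcher to rank the candidates.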

Semantic Technique. Semantic mapping is the most challenging area and a key research area [8, 9]. Its key feature is the use of model-theoretic semantics to define well-formed formulas (wff) that express meaning without ambiguity. Its advantage lies in deductive methods for amplifying or cropping the mappings in anchored ontologies to ensure mapping completeness and to eliminate bad correspondences [3]. The semantic technique depends on anchored ontologies, which contain mapping candidates that serve as a common ground for comparison [3, 19]. Anchoring works by matching ontologies to a background ontology to extract meanings for concepts using domain knowledge [3]. The two approaches are external ontologies and deductive techniques.

External Ontologies. The semantic technique uses a mediated approach. An intermediate reference ontology can provide general concepts and axioms for clarifying the meaning of domain concepts and their relations. The intermediate reference ontology can be an external ontology or a hidden intermediate reference ontology built on the fly using lexicons, as proposed by Kotis [1]. The use of an external ontology is more common. An external ontology is a general, top-level and formal ontology for conceptual modelling. Examples are General Formal Ontology (GFO), WordNet [20], Cyc, Suggested Upper Merged Ontology (SUMO) [21] and Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [22]. This approach has a higher probability of finding a result by exploiting existing mappings, but possibly at a lower accuracy [3]. A formal ontology provides reasoning and deduction methods.

Deductive Techniques. These techniques merge two ontologies and search for correspondences through the subsumption relation. Subsumption tests the mappings and discards those that fare poorly in the satisfiability test. Three types of satisfiability techniques are Propositional Satisfiability (SAT), modal satisfiability and Description Logic (DL) [3].

The SAT technique builds a theory (the domain knowledge, or a group of axioms obtained by matchers referring to external sources) as a premise to establish a relationship between concepts, as described by the following:

Axioms → r(c, c'), where c and c' are a pair of concepts and r is a relation in {equivalent, subsumption, subclass, not equal}

Validation is done by an exhaustive check to ensure that there is no possible negation of the formula using instances.

The DL technique is a purely terminological formal knowledge representation technique. It uses subsumption reasoning to establish relations between different ontologies in a purely semantic manner [3, 5]. Two ontologies are merged and the pairs of concepts and roles are tested for subsumption. The relationships are expressed in minimal description logic syntax. Relation inference is conducted on the description logic to check whether the subsumption rules of the components are consistent.
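The exhaustive check used by the SAT technique can be illustrated with a brute-force sketch. It assumes that each concept is encoded as a propositional variable and that axioms and candidate relations are Boolean functions over a truth assignment; this encoding and the helper names are assumptions for illustration, and a real system would call a SAT solver instead of enumerating assignments.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def implies(a: str, b: str) -> Formula:
    """Subsumption a ⊑ b, encoded propositionally as a → b."""
    return lambda env: (not env[a]) or env[b]


def equivalent(a: str, b: str) -> Formula:
    """Equivalence a ≡ b, encoded propositionally as a ↔ b."""
    return lambda env: env[a] == env[b]


def is_valid(variables: List[str], axioms: List[Formula],
             relation: Formula) -> bool:
    """Check Axioms → r(c, c') under every truth assignment.
    The formula is valid if no assignment satisfies all axioms
    while falsifying the candidate relation."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(axiom(env) for axiom in axioms) and not relation(env):
            return False  # counter-model found: the relation is rejected
    return True
```

Given the axioms "car is equivalent to automobile" and "automobile is subsumed by vehicle", the check accepts "car is subsumed by vehicle" and rejects the converse.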


3.4 Aggregating Matchers

As mentioned earlier, ontology mapping often combines multiple basic matchers. The two generic compositions, together with a mixture of the two, are [1]:

• Sequential Composition - Matchers are arranged sequentially. The initial matchers in the order focus on smaller granularity, such as linguistic matching. The subsequent matchers use higher-granularity matching, such as structure matching.

• Parallel Composition - The matchers evaluate the same set of ontologies simultaneously and each generates its own result; the results are then combined into one.

• Mixed Composition - The composition of matchers can be a mixture of sequential and parallel compositions, see Figure 2. AUTOMS, for example, uses a sequential arrangement for multiple passes of mapping and parallel composition to aggregate the results of multiple methods. The multiple passes include simple and iterative structural, instance-based or property-based matching; lexical matching; and semantic matching. Aggregation includes result matrices of concepts, properties and relations [3, 23].
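The sequential and parallel compositions listed above can be sketched with matchers modelled as functions from candidate pairs to scores in [0, 1]. The two toy matchers, the averaging rule and the cutoff value are hypothetical stand-ins chosen for illustration; they are not the matchers of AUTOMS or of any other cited system.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Matcher = Callable[[List[Pair]], Dict[Pair, float]]


def label_matcher(pairs: List[Pair]) -> Dict[Pair, float]:
    """Toy linguistic matcher: 1.0 for case-insensitive equal labels."""
    return {(a, b): 1.0 if a.lower() == b.lower() else 0.0 for a, b in pairs}


def length_matcher(pairs: List[Pair]) -> Dict[Pair, float]:
    """Toy stand-in for a second matcher: ratio of label lengths."""
    return {(a, b): min(len(a), len(b)) / max(len(a), len(b), 1)
            for a, b in pairs}


def parallel(matchers: List[Matcher], pairs: List[Pair]) -> Dict[Pair, float]:
    """Every matcher scores the same pairs; results are averaged into one."""
    results = [matcher(pairs) for matcher in matchers]
    return {pair: sum(r.get(pair, 0.0) for r in results) / len(results)
            for pair in pairs}


def sequential(matchers: List[Matcher], pairs: List[Pair],
               cutoff: float = 0.5) -> Dict[Pair, float]:
    """Each matcher only sees the pairs that survived the previous one."""
    survivors = list(pairs)
    scores: Dict[Pair, float] = {}
    for matcher in matchers:
        scores = matcher(survivors)
        survivors = [p for p in survivors if scores.get(p, 0.0) >= cutoff]
    return {pair: scores[pair] for pair in survivors}
```

A mixed composition then follows by passing a parallel group as one stage of a sequential chain, for example `sequential([label_matcher, lambda ps: parallel([label_matcher, length_matcher], ps)], pairs)`.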

Note: M is a matrix of concepts, properties and/or relations

Fig. 2. A global matching system (adapted from [3])

4 Research Gap

According to Euzenat, semantics "provides the rules for interpreting the syntax which do not provide the meaning directly but constrains the possible interpretations of what is declared" [24]. Very few semantic techniques have been developed for ontology mapping despite their potential. A major barrier is the difficulty of combining the deductive nature of semantic techniques with the inductive structure of ontologies.

Most matcher designs focus on a specific application domain or ontology type (DTD, relational schemas, OWL), which reduces their reusability. One reason is that an ontology is itself designed for a specific application [3], owing to the intricacy of data in niche domains. Because rewriting a matcher for each new domain is inconvenient, modularized and general matcher designs are desirable. The few mapping techniques designed to handle general ontologies are S-Match, Similarity Flooding, COMA++ and Cupid.

[Figure 2 diagram labels: ontologies O and O'; matrices M, M', M'', M''' and M''''; alignments A' and A''; Basic matcher 1; Basic matcher 2; Semantic amplifier; Aggregation; Select correspondences; Extract mapping]


Table 1. Summary of Ontology Matching Techniques

| Techniques | Objective | Limitations and gaps | Ref. |
|---|---|---|---|
| String-based: synonyms, homonyms, normalization, string equality, substring test, edit distance, token-based distance, path comparison. | Simple matching, effective when very similar strings are used in schema names to denote the same concept. | Unable to distinguish synonyms or homonyms effectively. | p. 84 [3], [23] |
| Language-based: (1) Intrinsic methods / linguistic normalization: reduce words to a standardized form using tokenization, lemmatization, term extraction or stop-word elimination. (2) Extrinsic methods: use external resources. | To improve interpretation and apprehension of the terms used, by means of natural language processing (NLP). | Very dependent on linguistic resources such as stemmers, part-of-speech taggers, lexicons and thesauri. Effectiveness is hindered by the presence of foreign languages and syntactic variations of the same word (spellings, abbreviations, prefixes, suffixes). Does not take into account the structure of ontology entities to find the most coherent match. | p. 92 [3], [20] |
| Structure-based / constraint-based (internal structure): works by comparing the structure and properties of entities. Keys are the most useful identifier. | To match schemas to determine whether classes are equivalent. Often used to quickly find possible matches with shallow accuracy. | Lacks accuracy. Ineffective when two equivalent entities have different data types for their properties. Different entities may also have similar properties. | p. 92 [3] |
| Structure-based (relational structure): Wu-Palmer similarity, upward cotopic similarity. Compares the structure of entities using relations. Similarity is based on counting edges in the graphs. | To match concepts in taxonomies, formal ontologies and semantic networks. | Difficulty in detecting (using an iterative algorithm) and inability to handle mutual influence between related parts. Edge counting is semantically inconclusive, because the same class hierarchy may be summarized by a shorter alternative. More work has been done on taxonomical relations than on mereological (part-of) relations. | p. 98 [3] |
| Extensional: Hamming distance, Jaccard similarity, Formal Concept Analysis (FCA), concept lattice, instance identification, disjoint extension, statistical approach, similarity-based, matching-based. | Relatively accurate matching of classes when both ontologies use a set of common individuals/instances, or tangible and non-changing indices. | Not applicable when instance information is not available. | p. 110 [3] |
| Semantics: external ontology, deductive, propositional, description logic. | Used to find all inconsistent correspondences, to complete mappings and to generate justifications for mapping results. | Ontologies require inductive inputs but semantic techniques are deductive in nature. Currently, there is a lack of interoperability between inductive techniques and deductive semantic techniques. | p. 110 [3] |


Other areas noted for more research are:

• Relationships between entities are mostly quantitative, expressed as a confidence value in the range [0, 1]. A matcher that determines qualitative relationships between entities using logical relations (e.g. equivalence or subsumption) may be more expressive.

• Most matchers analyze schemas despite the abundance of data repositories and instances. The very few instance-based solutions use Naive Bayesian classifiers and common value patterns.

• Only a few matchers (DCM, Wise-Integrator) handle more than one pair of ontologies. The results are usually one-to-one mappings, although it may be possible to find one-to-many or many-to-many mappings. Most matchers process ontologies with a tree structure. Few matchers (COMA++, Cupid and OLA) process graphs.

Table 1 is a non-exhaustive list of existing basic mapping techniques.

5 Conclusion

This paper investigates the advancements made in the technologies and theoretical foundations of ontology mapping. Matchers find correspondences between ontology entities from narrow perspectives. Therefore, the results of multiple matchers are combined for better accuracy and mapping probability. Semantic mapping extends existing matching techniques through enrichment and formalism of anchored ontologies for higher accuracy. Future challenges are to enhance the deductive-inductive interoperability, generalization and modularity of semantic mapping methods.

References

1. Kotis, K.: Ontology Matching, Ai-Lab, ICS Eng. University of the Aegean (2007)

2. Kalfoglou, Y.A.M.S.: Ontology mapping: the state of the art. The Knowledge Engineering Review 18(1), 1–31 (2008)

3. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Heidelberg (2007)

4. Kalfoglou, Y., Schorlemmer, M.: Ontology mapping: the state of the art. The Knowledge Engineering Review 18(1), 1–31 (2003)

5. Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., Hübner, S.: Ontology-Based Integration of Information - A Survey of Existing Approaches, pp. 108–117 (2001)

6. Kasabov, N.: ECOS: The Knowledge Engineering Approach. In: ICANN, Porto

7. Cui, L., Zhao, J., Zhang, R.: The Integration of HAZOP Expert System and Piping and Instrumentation Diagrams. Process Safety and Environmental Protection 88, 327–334 (2010)

8. Kotis, K., Vouros, G.A., Stergiou, K.: Towards automatic merging of domain ontologies: The HCONE-merge approach. Web Semant. 4(1), 60–79 (2006)

9. Uschold, M.: Creating, Integrating and Maintaining Local and Global Ontologies. In: ECAI 2000, Berlin, Germany (2000)

10. Beneventano, D., et al.: Ontology-driven Semantic Mapping. In: Mertins, K., et al. (eds.) Enterprise Interoperability III, pp. 329–341. Springer, London (2008)


11. Gottgtroy, P.: Building Evolving Ontology Maps for Data Mining and Knowledge Discovery in Biomedical Informatics. In: BIOMAT - Brazilian Symposium of Mathematical and Computational Biology, Rio de Janeiro (2003)

12. Goh, C.H.: Representing and reasoning about semantic conflicts in heterogeneous information systems (1997)

13. Kalfoglou, Y., Schorlemmer, M.: Ontology mapping: the state of the art. The Knowledge Engineering Review 18(1), 1–31 (2003)

14. Gómez-Pérez, A., Fernández-López, M., Corcho, O.: Ontological Engineering. In: Wu, X., Jain, L. (eds.). Springer-Verlag London Limited, London (2004)

15. Davies, J., Studer, R., Warren, P.: Semantic Web Technologies - Trends and Research in Ontology-based Systems. The Atrium, Southern Gate, Chichester, West Sussex. John Wiley and Sons Ltd. (2006)

16. Smith, R.G.: Knowledge-Based Systems - Concepts, Techniques, Examples. Canadian High Technology Show, Lansdowne Park, Ottawa (1985)

17. Kuo, Y.-T., et al.: Domain ontology driven data mining: a medical case study. In: Proceedings of the 2007 International Workshop on Domain Driven Data Mining. ACM, San Jose (2007)

18. Meseguer, J.: General logics. Logic Colloquium 87, 275–329 (1989)

19. Giunchiglia, F., Shvaiko, P., Yatskevich, M.: Discovering missing background knowledge in ontology matching. In: Proceedings of ECAI (2006)

20. Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of semantic distance. Computational Linguistics 32(1), 13–47 (2006)

21. Niles, I., Pease, A.: Towards a standard upper ontology. In: Proceedings of the International Conference on Formal Ontology in Information Systems, vol. 2001, pp. 2–9. ACM, Ogunquit (2001)

22. Gangemi, A., et al.: Sweetening WORDNET with DOLCE. AI Mag. 24(3), 13–24 (2003)

23. Lasek, I.: Ontology Matching Techniques (2011)

24. The Knowledge-based Economy. General Distribution. Organization for Economic Co-operation and Development, Paris (1996)

