

A Hybrid Method for Integrating Multiple Ontologies

Trong Hai Duong 1, Ngoc Thanh Nguyen 2, and Geun Sik Jo 1

1 School of Computer and Information Engineering, Inha University, Korea [email protected], [email protected]

2 Institute of Computer Science, Wroclaw University of Technology, Poland [email protected]

Abstract.

While a variety of research has focused on ontology integration based on simple techniques (e.g., element- or structure-level techniques), hybrid approaches combining these simple techniques have not been explored. In this paper, we describe a hybrid method to integrate multiple ontologies at several levels, such as the element level, the internal structure, and the relational structure. A semantic supporting environment (SSE), which combines special domains (e.g., WordNet) and a text corpus, is defined in the proposed approach. An enriched ontology model (EOM) is proposed to reduce the initial complexity of the ontology integration process. Subsequently, a semantic network called OnConceptSNet is provided. The relations between the concepts in the OnConceptSNet are derived from the SSE. An Enhanced Algorithm (EA) is proposed to enhance the OnConceptSNet.

Keywords. Knowledge integration, Ontology integration, Semantic network, Meta-rules.

1 Introduction

Ontology has become a “buzz word” in the semantic web and semantic data processing, and its importance is being recognized in a multiplicity of research fields and application areas, such as knowledge engineering, database design and integration, information retrieval and extraction, standard search (e.g., Yahoo and Lycos), e-commerce (e.g., Amazon and eBay), configuration (e.g., Dell and PC-Order), and government intelligence (e.g., DARPA’s High Performance Knowledge Base (HPKB) program). Ontologies play a central role in facilitating data exchange between several sources.

In general, the problem of ontology integration can be formulated as follows: for given ontologies $O_1, \dots, O_n$, one should determine an ontology $O$ which could replace them (Gangemi et al. 1998, Pinto and Martins 2001). Ontology integration is a complex task, since the ontologies may differ from each other in characteristics and forms such as languages, domains, and structures. Therefore, the authors of (Lee et al. 2006) have suggested an ontology-based architecture, which provides a solid basis for existing studies of the ontology integration task. Pinto and Martins (2001)

identified the activities which should be performed in the ontology integration process. Recently, there has been an increased interest in creating tools that support ontology integration. PROMPT (Noy and Musen 2003) is a semi-automatic and interactive tool suitable for performing ontology mapping, alignment, versioning, and merging, based on the Frame paradigm. Noy and Musen have also developed ANCHORPROMPT (2001) for ontology mapping and PROMPTDIFF (2002) for ontology merging. The limitation of PROMPT is that the two ontologies taking part in the mapping (and merging) process must be different versions of the same ontology. MAFRA (Maedche et al. 2002) is an ontology mapping framework using a Semantic Bridge Ontology (SBO). In MAFRA, the similarity between two concepts is calculated mainly using lexical analysis via WordNet, domain glossaries, bilingual dictionaries, and corpora. There are no explicit deterministic heuristics other than lexical heuristics (or synonyms) in the semantic bridge construction. ONION (Mitra and Wiederhold 2002) is a heuristic-based ontology composition system that resolves terminological heterogeneity using two matching approaches: linguistic matching via WordNet and instance-based matching via databases. Chimaera (McGuinness et al. 2000) is an ontology merging and diagnosis tool developed by the Stanford University Knowledge Systems Laboratory (KSL). With this tool, two semantically identical terms from different ontologies are coalesced so that they are referred to by the same name in the resulting ontology. The tool then identifies the terms that should be related to each other by subsumption, disjointness, or instance relationships and provides support for introducing those relationships. GLUE (Doan et al. 2001, Doan et al. 2002) is a system that employs a multi-strategy machine learning technique with joint probability distributions. First, GLUE identifies the similarities of instances. Second, it compares the relations, based on the similarity results of the instances. GLUE uses two kinds of base learners: a name learner and a number of content learners.

The purpose of the above mapping tools is not to create a new ontology from multiple ontologies. In this paper, we propose a new method to integrate multiple ontologies. Our main contributions consist of the following elements:

- An Enriched Ontology Model (EOM) is proposed to enrich the semantics of the concepts in ontologies. It reduces the initial complexity by directly matching concepts of the same type, instead of matching blindly or exhaustively among all concepts.

- A Semantic Supporting Environment (SSE) is defined. It provides the semantic relations between concepts, acquired by combining the knowledge of a special domain (e.g., WordNet) with text corpus discovery. It also enhances the special domain itself, for example by supplementing it with new relations between concepts. Moreover, the similarity analysis used in the SSE combines instance-based, lexical-based, schema-based, and taxonomy-based techniques.


- A semantic network called OnConceptSNet is also provided. It allows two concepts to hold many relations during the process of ontology integration. The OnConceptSNet provides a rich semantic environment in which the relations between concepts can enhance one another.

- An Enhanced Algorithm (EA) is proposed, in which the OnConceptSNet is initiated by the static rules and the knowledge included in the SSE, next enhanced by the meta-rules, and finally reduced by the dynamic rules. The final OnConceptSNet is the one that represents the candidate ontologies.

2 Basic Notions

We assume a real world $(A, V)$, where $A$ is the finite set of attributes and $V$ is the domain of $A$. $V$ can also be regarded as the set of attribute values, $V = \bigcup_{a \in A} V_a$, where $V_a$ is the domain of attribute $a$. In this paper, we accept the following assumptions:

Definition 1 (Ontology). An ontology is a quintuplet:

$O = (C, \Sigma, I, R, Z)$

where,

– $C$: set of concepts (the classes);

– $I$: set of instances of the concepts;

– $R$: set of binary relations between the concepts from $C$, or between the concepts from $C$ and values defined in a standard or user-defined data type;

– $Z$: set of axioms, which can be interpreted as integrity constraints or relationships between instances and concepts. It means that $Z$ is a set of restrictions or conditions (necessary & sufficient) to define the concepts in $C$;

– $\langle C, \Sigma \rangle$: the taxonomic structure of the concepts from $C$, where $\Sigma$ is the collection of subsumption relationships ($\sqsubseteq$) between any two concepts from $C$. For two concepts $c_1, c_2 \in C$, $c_1 \sqsubseteq c_2$ if and only if every instance that is a member of concept $c_1$ is also a member of concept $c_2$, and not vice versa.

$R$ is known as the set of properties. For every $p \in R$, there is a specific domain $\mathcal{D}$ and range $\mathcal{R}$ such that $p: \mathcal{D} \rightarrow \mathcal{R}$, where $\mathcal{D} \subseteq C$. If $\mathcal{R} \subseteq C$, then $p$ is called an object property; if $\mathcal{R}$ is a set of standard or user-defined data types, then $p$ is called a data type property. We assume that concepts $c$ and $c'$ correspond to the domain and range of property $p$ respectively, where $p$ is also known as an attribute of the concept $c$. Given two instances $i$ and $i'$ that belong to the concepts $c$ and $c'$ respectively, we denote by $R_p(i, i')$ the relation from instance $i$ to $i'$ via the property (attribute) $p$. If $p$ is an inversely functional property, the relation from instance $i'$ to $i$ via the property $p$ is denoted by $R_p^{-}(i', i)$.

Definition 2 (Concept). A concept of an $(A, V)$-based ontology is defined as a quadruple:

$(c, A^c, V^c, f^c)$

where $c$ is the unique identifier for the instances of the concept, $A^c \subseteq A$ is a set of attributes describing the concept, and $V^c \subseteq V$ is the attributes’ domain: $V^c = \bigcup_{a \in A^c} V_a$. $f^c \subseteq Z$ is a set of restrictions or conditions (necessary & sufficient) to define concept $c$. It can be presented as a constraint function $f^c: A^c \rightarrow Z$ such that $f^c(a) \in Z$ for all $a \in A^c$.

The pair $(A^c, V^c)$ is called the possible world of concept $c$, and $A^c$ is called the structure of concept $c$. Note that within an ontology there may be two or more concepts with the same structure. In this situation, the constraint function $f^c$ is useful to express the relationship between them. For example, the two concepts RedWine and WhiteWine have the same structure {hasMaker, hasColor}, but $f^{RedWine}(hasColor) = \{\exists\, hasColor = Red\}$ and $f^{WhiteWine}(hasColor) = \{\exists\, hasColor = White\}$.

Definition 3 (Instance). An instance of a concept $c$ is described by the attributes from set $A^c$ with the values from set $V^c$. Thus, an instance of a concept $c$ is defined as a pair:

$(i, v)$

where $i$ is the unique identifier of the instance in world $(A, V)$ and $v$ is the value of the instance, which is a tuple of type $A^c$ that satisfies the constraint function $f^c$. The value $v$ can be presented as a function $v: A^c \rightarrow V^c$ such that $v(a) \in V_a$ for all $a \in A^c$.

Value $v$ is also called the description of the instance within concept $c$. A concept may be interpreted as the set of all instances described by its structure. By $Ins(O, c)$ we denote the set of instances belonging to concept $c$ in ontology $O$, and we have $I = \bigcup_{c \in C} Ins(O, c)$.

Definition 4 (Key Identity). A Key Identity (KI) of a concept is an attribute from the concept's attribute set $A^c$ which provides a unique value to each individual of the concept in the real world $(A, V)$. Formally, if $a$ is a KI of concept $c$, it satisfies the following conditions, where $R_a(x, v)$ states that instance $x$ has value $v$ for attribute $a$:

- $a \in A^c$,

- $\forall x \in Ins(O, c),\ \forall v, v' \in V_a:\ R_a(x, v) \wedge R_a(x, v') \rightarrow v = v'$, and

- $\forall x, x' \in Ins(O, c),\ \forall v \in V_a:\ R_a(x, v) \wedge R_a(x', v) \rightarrow x = x'$.

The first two conditions mean that the KI of a concept must necessarily provide the same KI value for the same instance of the concept. The third condition means that the KI must be sufficient to recognize two instances that both actually exist and have the same KI value as the same instance. Together, these conditions imply that the KI of a concept should be globally identifiable for instances inside the real world $(A, V)$. A KI is also known as a rigid property (Guarino and Welty 2000), that is, a property that is essential to all its instances.

Example 1. Consider the concept Person, which owns the attribute hasFingerprint as its KI. Suppose that the instance Jean has the hasFingerprint value 000155BDC and the instance Peggy also has the hasFingerprint value 000155BDC. Because hasFingerprint is a KI, we can deduce that Jean and Peggy must be the same instance. Note that, because hasFingerprint is a KI, there always exists the inverse relation isFingerprintOf. If two instances 000155BDC and 000155BEF are both isFingerprintOf the instance Jean, then 000155BDC and 000155BEF must be the same instance. However, if 000155BDC and 000155BEF were explicitly stated to be two different instances, the above statements would lead to an inconsistency.
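As a minimal illustration of how a KI could be used during integration, the following Python sketch groups instances that share a KI value, as in Example 1. The function name `merge_by_key_identity`, the dictionary layout, and the extra instance Anna are assumptions made only for this illustration; they are not part of the proposed method.

```python
from collections import defaultdict


def merge_by_key_identity(instances, ki):
    """Group instance identifiers that share the same value of the KI attribute.

    `instances` maps an instance identifier to its attribute/value dictionary.
    Under the KI assumption, each resulting group is one real-world individual.
    """
    groups = defaultdict(set)
    for inst_id, values in instances.items():
        if ki in values:
            groups[values[ki]].add(inst_id)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


# Jean and Peggy come from Example 1; Anna is a hypothetical extra instance.
people = {
    "Jean": {"hasFingerprint": "000155BDC"},
    "Peggy": {"hasFingerprint": "000155BDC"},
    "Anna": {"hasFingerprint": "000155BEF"},
}
merged = merge_by_key_identity(people, "hasFingerprint")
```

Here Jean and Peggy end up in the same group because they share a fingerprint value, while Anna stays in a group of her own.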

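Definition 5 (Local Identity). A Local Identity (LI) of a concept is an attribute from the concept's attribute set $A^c$ which provides a unique value to each individual of the concept in the possible world $(A^c, V^c)$. Formally, if $a$ is an LI of concept $c$, it satisfies the following conditions within the possible world $(A^c, V^c)$, where $R_a(x, v)$ states that instance $x$ has value $v$ for attribute $a$:

- $a \in A^c$,

- $\forall x \in Ins(O, c),\ \forall v, v' \in V^c:\ R_a(x, v) \wedge R_a(x, v') \rightarrow v = v'$, and

- $\forall x, x' \in Ins(O, c),\ \forall v \in V^c:\ R_a(x, v) \wedge R_a(x', v) \rightarrow x = x'$.

The difference between a KI and an LI is that an LI of a concept can be only locally identifiable for instances inside the possible world $(A^c, V^c)$.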

3 Enriched Ontology Integration

There are two possible relationships in the semantic matching between concepts: the subsumption relationship (⊑) and equality (⇔). Most previous mapping tools try to find matches among all concepts in the different ontologies; therefore, the complexity increases rapidly when mapping between large ontologies or integrating multiple ontologies. A novel feature of the Enriched Ontology Model defined in this paper is that it can flatten the iterations of the matching process and therefore reduce the complexity. That is its advantage over other existing mapping methods.

3.1 A Classification of Concepts


Definition 6 (Defined Concept). A Defined Concept (DC) is a concept which has at least one KI. Formally, if $c$ is a DC, its constraint function $f^c$ satisfies the following conditions.

- $\exists a \in A^c$ such that $f^c(a)$ holds a necessary & sufficient condition, and

- the attribute $a$ is a KI.

Example 2. Referring to Example 1, the concept Person is an example of a DC. A DC is also known as a rigid sort (Guarino and Welty 2000), which supplies a principle of identity for its individuals.

Definition 7 (Partition Concept). A Partition Concept (PC) is a part of a DC. Formally, if $c$ is a PC, it satisfies the following conditions.

- $\exists a \in A^c,\ \forall x \in Ins(O, c)$: $x(a)$ is a constant value, and

- the concept $c$ is a defined concept satisfying $f^c(a)$.

Example 3. Consider two concepts MalePerson and FemalePerson with the same structure {hasGender}. MalePerson is defined as the concept Person that satisfies $f^{MalePerson}(hasGender) = \{\exists\, hasGender = Male\}$. FemalePerson is defined as the concept Person that satisfies $f^{FemalePerson}(hasGender) = \{\exists\, hasGender = Female\}$. Thus the concepts MalePerson and FemalePerson are PCs.

Definition 8 (Inherited Concept). An Inherited Concept (IC) is a sub-concept of either a defined concept, a partition concept, or another inherited concept. It has at least one LI. Formally, if $c$ is an IC, then its constraint function $f^c$ satisfies the following conditions.

- $\exists a \in A^c$ such that $f^c(a)$ holds a necessary & sufficient condition, and

- the attribute $a$ is an LI.

Example 4. If the two concepts Student, with its LI hasIdStudent, and Employee, with its LI hasIdEmployee, are sub-concepts of the concept Person, we can infer that Student and Employee must be ICs.

Definition 9 (Primitive Concept). A Primitive Concept (PvC) is a concept which has neither a KI nor an LI. Formally, if a concept is a PvC, then its constraint function does not have any set of necessary & sufficient conditions (it only has necessary conditions).

Example 5. Consider the concepts UndergraduateStudent, MasterStudent, and DoctoralStudent, defined through the concept Student. They do not have any set of necessary & sufficient conditions, so we can infer that these concepts must be PvCs. It is important to notice that no concept will ever be placed as a sub-concept of a PvC.

Proposition 1 (Concept Classification). For a given ontology $O$ belonging to the real world $(A, V)$, we denote the four different sets of DCs, PCs, ICs, and PvCs by $C_{DC}$, $C_{PC}$, $C_{IC}$, and $C_{PvC}$ respectively. Then:

1. $C_{DC} \cup C_{PC} \cup C_{IC} \cup C_{PvC} = C$,

2. $C_{DC} \cap C_{PC} \cap C_{IC} \cap C_{PvC} = \emptyset$,

3. the levels of concepts increase in the order of PvC, IC, PC, and DC respectively.

Proposition 2 (Axiom). Consider two given concepts: $(c, A^c, V^c, f^c)$ belonging to ontology $O$, and $(c', A^{c'}, V^{c'}, f^{c'})$ belonging to ontology $O'$.

1. The necessary condition for $c$ to be equivalent to $c'$ is that $c$ and $c'$ are of the same type,

2. For any two concepts $c \in O$ and $c' \in O'$, if $c'$ is in $C_{PvC}$, then $c$ cannot be placed as a sub-concept of $c'$,

3. For any two concepts $c \in O$ and $c' \in O'$, $c$ subsumes $c'$ if and only if the level of $c$ is higher than that of $c'$.
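The classification of Definitions 6 to 9 and the checks of Proposition 2 can be sketched in Python as follows. This is only an illustrative reading: the function names, the boolean flags, and the numeric levels are assumptions, and the sketch follows Proposition 2 literally, so the case of an IC placed under another IC (allowed by Definition 8) is not modeled.

```python
# Levels increase in the order PvC, IC, PC, DC (Proposition 1, item 3).
LEVEL = {"PvC": 0, "IC": 1, "PC": 2, "DC": 3}


def classify(has_ki, has_li, is_partition=False):
    """Return the concept type from hypothetical flags describing a concept."""
    if is_partition:
        return "PC"   # a part of a DC fixed by a constant attribute value
    if has_ki:
        return "DC"   # at least one key identity
    if has_li:
        return "IC"   # at least one local identity
    return "PvC"      # neither KI nor LI


def may_be_equivalent(type_a, type_b):
    """Proposition 2, item 1: equivalence requires the same concept type."""
    return type_a == type_b


def may_subsume(parent_type, child_type):
    """Proposition 2, items 2 and 3: nothing goes under a PvC, and the
    subsuming concept must be at a higher level."""
    return parent_type != "PvC" and LEVEL[parent_type] > LEVEL[child_type]


person = classify(has_ki=True, has_li=False)    # Person from Example 1
student = classify(has_ki=False, has_li=True)   # Student from Example 4
master = classify(has_ki=False, has_li=False)   # MasterStudent from Example 5
```

With these checks, a matcher only needs to test equivalence between concepts of the same type and subsumption from a lower level to a higher one.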

3.2 Enriched Ontology Model

The Enriched Ontology Model (EOM) enriches the semantics of the concepts in ontologies by classifying them into the four types of concept given in the propositions above. We define the Enriched Ontology Model as follows:

Definition 10 (Enriched Ontology Model). The Enriched Ontology Model (EOM) is a sextuple:

$EOM = (O, C_{DC}, C_{PC}, C_{IC}, C_{PvC}, \Re)$

where $O$ is an ontology, and the sets of concepts DC, PC, IC, and PvC correspond to $C_{DC}$, $C_{PC}$, $C_{IC}$, and $C_{PvC}$ respectively, which satisfy $\Re$, a set of ontological axioms and constraints defined as follows:

ℜ=

⎩⎪

(cs) ∈,∃ ∈ ;(cs) ∈,∃ ∈ ,∀ ∈( ,),() ∈ , =; (cs) ∈,∃ ∈ ⋃ ⋃ , ⊑,∃ ∈ (cs) ∈,∀ ∈ ()≠∆∎;(ax) ∈⋃ , ∈( ,) ℎ (ax) ∈, ∈( ,) ℎ (ax) ∈( , ), ℎ ⊑ (ax) ∈, ∀∈ ℎ ⋢;(ax) ,∈ ∈, ⊑ ⊑ ℎ ⋈ ;

Here ∆ is the necessary condition, ∎ is the sufficient condition, ⋈ is the disjoint relationship, and c = c′ ⊓ a means that concept c is defined by the intersection between the concept c′ and the attribute a.

Figure 1 illustrates our proposed method for heuristic matching. According to the above propositions, equality is matched directly only between concepts asserted in the same classification type, and subsumption is matched only from lower-level concepts to higher-level concepts in the classification, instead of matching against all concepts by traversing the taxonomies completely. Moreover, when calculating the similarity between concepts, most existing mapping methods compare all properties belonging to each concept as well as its name/label. Our methodology does not require that. Instead of comparing blindly or exhaustively among all properties belonging to each concept, we focus only on the properties that identify the concept. For example, when computing the similarity between two concepts of the DC type, we only compare their key identity properties. This is called key identity-based matching, as shown in Figure 1.

Our approach depends on analyzing the internal structure of concepts, so we refer to it as internal structure-based or EOM-based matching (similarity).
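To make the reduction of the search space concrete, the following Python sketch compares the number of candidate equality pairs under exhaustive matching with the number obtained when only concepts of the same type are paired. The two small ontologies and the concept names Human and Pupil are hypothetical, and the sketch covers only the equality case, not the subsumption case.

```python
from itertools import product


def exhaustive_pairs(types_1, types_2):
    """Every concept of the first ontology paired with every concept of the second."""
    return list(product(types_1, types_2))


def typed_equality_pairs(types_1, types_2):
    """Only pairs whose concepts have the same classification type."""
    return [
        (c1, c2)
        for c1, c2 in product(types_1, types_2)
        if types_1[c1] == types_2[c2]
    ]


# Hypothetical ontologies: concept name -> concept type.
ontology_1 = {"Person": "DC", "Student": "IC", "MasterStudent": "PvC"}
ontology_2 = {"Human": "DC", "Employee": "IC", "Pupil": "IC"}

all_pairs = exhaustive_pairs(ontology_1, ontology_2)
typed_pairs = typed_equality_pairs(ontology_1, ontology_2)
```

In this toy case the typed matching keeps three of the nine exhaustive pairs, and each remaining pair would then be compared only on its identifying properties.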

4 Semantic Supporting Environment

4.1 Similarity between Two Words

Many previous works have focused on measuring the similarity between words based on WordNet. We can distinguish two basic approaches. (1) Similarity measures based on the path lengths between concepts, such as Lch (Leacock and Chodorow 1998) and Wup (Wu and Palmer 1994). Most of these measures are subject to the is-a hierarchy in which the concepts occur. However, is-a relations in WordNet do not cross part-of-speech boundaries, so these WordNet-based similarity measures are limited to making judgments between noun pairs (e.g., cat and dog) and verb pairs (e.g., run and walk). Although WordNet includes adjectives and adverbs, they are not organized into is-a hierarchies. (2) Similarity measures based on information content, which is a corpus-based measure of the specificity of a concept. These measures include Res (Resnik 1995), Lin (Lin 1998), and Jcn (Jiang and Conrath 1997). The calculation of information content intrinsically relies on tagged corpora; the intuition is that the more often a concept appears in a corpus, the less specific it is. This strategy has two well-known and somewhat discouraging problems: manually tagging corpora is a wearisome and highly time-consuming burden, and it is very difficult to obtain a statistically valid and reliable corpus that truly reflects word usage, since many relatively common words may not appear even in very large corpora. The latter problem is usually referred to as the sparse data problem.

Figure 1. Heuristic matching

Therefore, we have proposed a new WordNet-based method (Duong et al. 2008b) to measure the similarity, which has advantages over the above methods. Moreover, the similarity works across parts of speech. WordNet is limited by its database and by the lack of relations between some of the concepts existing in it. For this reason, we have proposed another method (Duong et al. 2008a) to acquire relations between entities from a text corpus. We also combine WordNet and the text corpus to provide the relations between concepts; this combination is called the Semantic Supporting Environment (SSE).
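For reference, the path-based Wup measure mentioned above scores two concepts by the depth of their lowest common subsumer relative to their own depths, $2 \cdot depth(lcs) / (depth(a) + depth(b))$. The Python sketch below applies this formula to a tiny hand-made is-a hierarchy. The hierarchy, the function names, and the convention that the root has depth 1 are assumptions for illustration; the sketch does not query WordNet and is not the method of (Duong et al. 2008b).

```python
# Hypothetical is-a hierarchy: child -> parent (the root has no parent).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "cat": "animal",
    "dog": "animal",
}


def ancestors(node):
    """Return the path from the node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def wup(a, b):
    """Wu-Palmer style similarity on the toy hierarchy."""
    ancestors_of_b = set(ancestors(b))
    # The first ancestor of a that also subsumes b is the lowest common subsumer.
    lcs = next(n for n in ancestors(a) if n in ancestors_of_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


cat_dog = wup("cat", "dog")
cat_vehicle = wup("cat", "vehicle")
```

Because cat and dog share the more specific subsumer animal, their score is higher than that of cat and vehicle, whose only common subsumer is the root.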

4.2 Combining Text Corpus and WordNet-Based

Similarity between Two Concepts. Most instances of a concept can be regarded as hyponyms of the concept. For example, when the concept Country has the instances Vietnam, Korea, and Poland, it is considered the hypernym of Vietnam, Korea, and Poland. For this reason, a method to compute the similarity between two concepts via their instances is proposed as follows:

$N_c = \{n_1, n_2, \dots, n_k\}$, where $n_i$ is the name/label of an instance of concept $c$.

$T_c = \{t_1, t_2, \dots, t_m\}$ is the set of tokens resulting from two processes: demarcating and possibly classifying sections of each string $n \in N_c$, and determining the lemma of each word of the tokens. For example, the name Hands_Free_Kits is parsed into the tokens {hands, free, kits} by recognizing the punctuation, and determining the part of speech of each word in the tokens yields the final set {hand, free, kit}.
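The token example above can be reproduced with a few lines of Python. This is a rough sketch: the plural-stripping rule in `naive_lemma` is only an assumed stand-in for the part-of-speech-based lemmatization described in the text, and both function names are hypothetical.

```python
import re


def tokenize(label):
    """Split a name/label on punctuation and other non-alphanumeric characters."""
    parts = re.split(r"[^A-Za-z0-9]+", label)
    return [part.lower() for part in parts if part]


def naive_lemma(token):
    """Very rough stand-in for lemmatization: strip a single plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


tokens = tokenize("Hands_Free_Kits")
lemmas = [naive_lemma(token) for token in tokens]
```

A real implementation would replace `naive_lemma` with a proper lemmatizer, since the simple rule fails on irregular plurals and on singular words ending in "s".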

$G_{t_i} = \{g_1, g_2, \dots, g_l\}$ is the set of more general words of $t_i \in T_c$, $i = 1..m$. The words of $G_{t_i}$ are generated from WordNet’s relations and from text corpus discovery (Duong et al. 2008).

= () where ⊆and if ∈, exist at least nsets ,i=1,2,…,n

= () where ⊆and if ∈,there exists at least ∈ and is more general word of .

We define the similarity between two structures of two concepts as follows:

is the representation of the structure of the concept c. ={, ,… } , where ,i=1,2,…,nare the properties of

the concept:


, =∑( (∈( ,)))∈ ()

The similarity between two concepts c and c′ is defined as follows:

( ,)= (, )+ , + (( , )+ ( , )) where 0≤ , , ≤1, + + =1and (, )is the similarity between two labels of concepts ,and

Combining Acquisition Algorithm. As the initial step, the representation of the OnConceptSNet is built or extended by acquiring knowledge from WordNet and from text corpus discovery. We suppose that a relation $R(c, c')$ exists between two concepts $c$ and $c'$ that come from the OnConceptSNet. When comparing a result $R(c, c')$ with WordNet, three cases are possible:

1. Both concepts $c$ and $c'$ are in WordNet, and their relation $R(c, c')$ is already in the WordNet database; it is suggested to update the OnConceptSNet.

2. Both concepts $c$ and $c'$ are in WordNet, but their relation $R(c, c')$ is not; it is suggested to update the OnConceptSNet and WordNet.

3. The concepts $c$ and $c'$ are not present in WordNet; it is suggested to add these concepts and the corresponding relation $R(c, c')$ to the Knowledge Base of Assistant WordNet and to update the OnConceptSNet (with just the relation $R(c, c')$).

Here we sketch the collaborative acquisition algorithm, which combines WordNet and the text corpus to discover new relations between the entities of ontologies for ontology integration tasks (see Figure 2):

- Knowledge Base of Assistant WordNet is a concept net based on the ontology, with the relations is kind of and is equivalent of. It receives messages from the Feedback component and then updates the relations between the entities of ontologies that do not exist in WordNet.

- Mining from Text Corpus is the procedure described in (Duong et al. 2008a). It discovers new relations between the entities of ontologies through the text corpus.

- Ontology Integration Task, which will be presented in the next section, receives the relation $R(c, c')$ and updates the OnConceptSNet.

- Feedback is a cache of new relations and marks (a mark identifies a new relation which should be updated in the Knowledge Base of Assistant WordNet or in WordNet).

Figure 2. Combined acquisition algorithm
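The three cases for a discovered relation can be sketched as a small routing function in Python. The sets `wordnet_concepts` and `wordnet_relations` are hypothetical in-memory stand-ins rather than a real WordNet API, the target names are illustrative, and the sketch assumes that case 3 applies whenever at least one of the two concepts is missing from WordNet.

```python
def route_relation(c1, c2, relation, wordnet_concepts, wordnet_relations):
    """Return the knowledge sources that should be updated with (c1, relation, c2)."""
    targets = ["OnConceptSNet"]  # the network is updated in all three cases
    if c1 in wordnet_concepts and c2 in wordnet_concepts:
        if (c1, relation, c2) not in wordnet_relations:
            targets.append("WordNet")           # case 2: concepts known, relation new
    else:
        targets.append("AssistantWordNet")      # case 3: concepts not in WordNet
    return targets


# Hypothetical contents of the lexical database.
wordnet_concepts = {"car", "vehicle", "dog", "animal"}
wordnet_relations = {("car", "is kind of", "vehicle")}

case_1 = route_relation("car", "vehicle", "is kind of", wordnet_concepts, wordnet_relations)
case_2 = route_relation("dog", "animal", "is kind of", wordnet_concepts, wordnet_relations)
case_3 = route_relation("hands_free_kit", "accessory", "is kind of",
                        wordnet_concepts, wordnet_relations)
```

In the described architecture the returned targets would correspond to the marks kept by the Feedback component.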

5 Ontology Integration Strategies

5.1 The OnConceptSNet

In this section, we present a semantic network of the ontologies’ concepts, called OnConceptSNet, which serves to integrate multiple ontologies and reconcile semantic conflicts between them. The OnConceptSNet builds or extends the concept representations by acquiring knowledge from WordNet, the text corpus, and meta-rules. This knowledge may change the old network by adding or deleting nodes and arcs, or by modifying the numerical values, called weights, associated with the arcs (relations) between nodes.

An OnConceptSNet is a directed loop graph defined as a quadruple:

$G = (C^*, R^*, N, M)$

where,

– $C^*$ is a set of nodes representing the concepts that come from $O_1, \dots, O_n$,

– $R^*$ is a set of arcs representing the relations between concepts: semantic equivalent (⇔), more general (⊑), disjoint (⊥), and overlap (≍). Each arc is associated with a numerical value, the weight ($w$) of the relation represented by that arc.

– $N$ is the adjacency matrix of $G$, written $N(G)$, an $n$-by-$n$ matrix in which $n$ is the number of nodes in $G$. Entry $n_{ij}$ is the number of arcs in $G$ with endpoints $(c_i, c_j)$ if $i \neq j$; otherwise, entry $n_{ii}$ identifies the ontology to which $c_i$ belongs.

– $M$ is the incidence matrix of $G$, written $M(G)$, an $n$-by-$m$ matrix in which $m$ is the number of edges (relations) in $G$. If $c_i$ is the start point of $e_j$, entry $m_{ij}$ is equal to $-1$. If $c_i$ is the end point of $e_j$, entry $m_{ij}$ is equal to $w$ ($w > 0$), the weight of the arc. Entry $m_{ij}$ is equal to 0 in all other cases.

– If vertex $v$ is the start point of edge $e$, then $v$ and $e$ are incident.

– The degree of vertex $v$, written $d(v)$, is the number of edges incident to $v$.

– The local matrix of vertex $v_i$, written $L(v_i)$, is $M(G)$ limited by the left column $\sum_{k \in [1..i)} d(v_k) + 1$ and the right column $\sum_{k \in [1..i)} d(v_k) + d(v_i)$, where $v_i$ is the vertex at row $i$ of matrix $M(G)$.

Example 6. Consider the OnConceptSNet shown in Figure 3. From matrix N(G), we can see that the concepts a and d are in ontology 1 and the concepts b and c are in ontology 2, because N(a, a) = N(d, d) = 1 and N(b, b) = N(c, c) = 2. Moreover, the number of edges with endpoints (b, c) is 3, because N(b, c) = 3. From M(G), we can distinguish the relationships between the nodes and their corresponding weights. For instance, b is more general than a, with weight 0.5. Furthermore, the search space of concepts is reduced by focusing on the local matrix of each concept. The gray window of matrix M(G) is the local matrix of concept c.
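The matrices of the OnConceptSNet can be built with plain Python lists, as in the sketch below. Since Figure 3 is not reproduced here, the arc list is hypothetical: only the ontology memberships, the three arcs between b and c, and the weight 0.5 come from the example. The sketch further assumes that arcs are directed from start to end, that columns are ordered by start vertex, and that column ranges are 0-based and half-open.

```python
def build_matrices(nodes, ontology_of, arcs):
    """Return the adjacency matrix N, the incidence matrix M, and the ordered arcs."""
    index = {v: i for i, v in enumerate(nodes)}
    # Order arcs by start vertex so each vertex owns a contiguous block of columns.
    ordered = sorted(arcs, key=lambda arc: index[arc[0]])
    n, m = len(nodes), len(ordered)
    N = [[0] * n for _ in range(n)]
    M = [[0] * m for _ in range(n)]
    for v in nodes:
        N[index[v]][index[v]] = ontology_of[v]   # diagonal stores the ontology id
    for j, (start, end, weight) in enumerate(ordered):
        N[index[start]][index[end]] += 1         # count arcs between the endpoints
        M[index[start]][j] = -1                  # start point of arc j
        M[index[end]][j] = weight                # end point carries the weight
    return N, M, ordered


def local_columns(nodes, ordered, v):
    """Column range [left, right) of M that forms the local matrix of v."""
    degree = {u: 0 for u in nodes}
    for start, _end, _weight in ordered:
        degree[start] += 1
    left = sum(degree[u] for u in nodes[:nodes.index(v)])
    return left, left + degree[v]


nodes = ["a", "b", "c", "d"]
ontology_of = {"a": 1, "b": 2, "c": 2, "d": 1}
# Hypothetical arcs (start, end, weight); only the weight 0.5 is from the example.
arcs = [("a", "b", 0.5), ("b", "c", 0.9), ("b", "c", 0.4), ("b", "c", 0.2), ("c", "d", 0.7)]

N, M, ordered = build_matrices(nodes, ontology_of, arcs)
left, right = local_columns(nodes, ordered, "c")
local_c = [row[left:right] for row in M]
```

Restricting a search to `local_c` touches only the columns of the arcs that start at c, which is how the local matrix narrows the search space.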

5.2 Example of Meta-rules for Generating New Probability Distributions

In the process of determining the relations between the concepts of the candidate ontologies, two concepts can have many relations, each with its own weight. It may be incorrect to depend only on the weight to determine the best relation between two concepts, because the relation should be considered from many different aspects: it depends not only on similar terms or similar structures, but also on the interaction of the relationships among neighboring concepts. Therefore, in this section, we discuss how to generate new probability distributions from the existing ones, which can change and enhance the old network of the OnConceptSNet.

We use the following notation conventions through the rest of this section:

- The concepts from $O'$ are written with a prime (′), and conversely the concepts from $O$ are written without a prime,

- Lower-case q with or without a subscript denotes a property,

- $q(c_1, c_2)$ indicates that q is the relation between $c_1$ and $c_2$, where $c_1$ is q’s domain and $c_2$ is q’s range,

- $(c\ r\ c', x)$ indicates that the probability of the match $(c\ r\ c')$ is x, where r is the match between $c$ and $c'$ and r is either subsumption or equivalence,

- max and δ are expert-provided constants less than 1.

Figure 3. An example of OnConceptSNet.

The equality meta-rules are as follows:

1. (⇔,)∧( >)∧( , )∧( , )∧( > , ⇔1..2)→( , )∧ ( , ).

2. (⇔,)∧( >)∧( , )∧( , )∧( > , =1..2)→( , )∧ ( , ).

3. (⇔,)∧( >)∧( , )∧( , )∧( > , =1..2)∧( ≠⊥)→(⇔,) , where

= (1,min (1,+ )), x’is the previous probability of the match ( ⇔).

4. (⇔,)∧( >)∧( , )∧( , )∧( > , =1..2)∧( ≠=)→(≍,), where

= (1,+ ), x’is the previous probability of the match ( ≍).

5. (⇔,)∧( , )∧( , )∧( ≠⊥)∧ (⇔, )∧( > , =1..3)→(⇔,),where = (1, + ).

6. (⇔,)∧( ⇔,1)∧ (, )∧ ′(, )→(⇔, (1, + )) 7. q( , )∧( , )∧(⇔, )∧(⇔, )∧(> , =1..2)→ ( ⇔ ′, (( , )+) Here we present three main steps of the enhanced algorithm to build OnConceptSNet :

a. Initial step: combine the static rules from our previous work (Duong et al. 2008) with EOM-based matching to find the relations between the nodes,

b. Enhanced step: meta-rules 1 and 2 are first used to enhance the initial OnConceptSNet; meta-rules 3, 4, and 5 are then used to enhance the neighbor matching by analyzing the subsumption relationships between the concepts, such as generalization, specialization, and siblings. Finally, meta-rules 6 and 7 are used to enhance OnConceptSNet by analyzing the relations of the OnConceptSNet together with the properties (relations) between concepts. This step repeats until the meta-rules have been applied to all edges.

c. Reduced step: rules that are not presented here are used to determine the best representative among the relations between the same two nodes of OnConceptSNet. The OnConceptSNet is then reduced by the dynamic rules in (Duong et al. 2008). The final OnConceptSNet represents an ontology that best replaces the candidate ontologies.
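The enhanced step can be pictured as a loop that keeps applying rules until nothing changes. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the data layout (a dictionary from concept pairs to the probability of an equivalence match), the single simplified rule (raise a match by δ, capped at 1, when the match between the parent concepts exceeds a threshold), and the names `enhance`, `parents`, `threshold`, and `delta` are all hypothetical.

```python
def enhance(matches, parents, threshold=0.6, delta=0.1, max_rounds=10):
    """Boost equivalence probabilities that are supported by neighbor matches.

    matches: dict mapping (c, c_prime) -> probability that c is equivalent to c_prime
    parents: dict mapping a concept to its parent concept (hypothetical taxonomy)
    threshold, delta: stand-ins for the expert-provided constants (assumed values)
    """
    matches = dict(matches)   # work on a copy, leave the input untouched
    boosted = set()           # each edge is enhanced at most once
    for _ in range(max_rounds):
        changed = False
        for (c, c_prime), x in list(matches.items()):
            if (c, c_prime) in boosted:
                continue
            # Simplified neighbor evidence: the match between the two parents.
            parent_pair = (parents.get(c), parents.get(c_prime))
            support = matches.get(parent_pair, 0.0)
            if support > threshold:
                # Raise the previous probability by delta, never above 1.
                matches[(c, c_prime)] = min(1.0, x + delta)
                boosted.add((c, c_prime))
                changed = True
        if not changed:       # fixed point reached: no rule fired in this round
            break
    return matches


initial = {("Person", "Human"): 0.9, ("Student", "Pupil"): 0.5, ("Car", "Tree"): 0.2}
taxonomy = {"Student": "Person", "Pupil": "Human"}
result = enhance(initial, taxonomy)
```

In this toy network only the pair (Student, Pupil) is raised, because it is the only pair whose parents are already matched with a probability above the threshold.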


Note that the meta-rules above are only some examples of equality meta-rules. Other meta-rules, such as the subsumption, overlap, and disjoint meta-rules, are not presented here. Moreover, because these meta-rules enhance the relations between the concepts of OnConceptSNet by analyzing the relation structure between the concepts, this approach is called relation structure-based similarity.

6 Multiple Ontologies Integration Process

Figure 4 below illustrates the ontology integration process, most of whose components have already been presented in the previous sections. Therefore, in this section, we only discuss how to recognize the identities of concepts, which are the clue for classifying concepts in the EOM.

Here we show some methods to recognize identities. We assume that all candidate ontologies have been transformed into OWL ontologies. First, we collect all the necessary and sufficient properties of the concept. Second, we represent an identity as a property of the concept and distinguish it from other properties by the one-to-one functional characteristic between its domain and range, using two different methods as follows:

1. Identities can be written in OWL by using owl:DatatypeProperty with three restrictions: owl:FunctionalProperty, owl:InverseFunctionalProperty, and owl:cardinality = 1. We use the following heuristic to distinguish a DC: if a concept is one of the top-most concepts in the taxonomy of a given ontology and it contains at least one identity, it must be a DC.

Figure 4. Ontology Integration Processing.


Matching Methods  Instance-based  Lexical-based  Schema-based  Taxonomy-based  Internal Structure  Relation Structure
PROMPT            Y               Y              Y             Y               N                   N
MAFRA             Y               Y              Y             Y               N                   N
ONION             Y               Y              Y             Y               N                   N
GLUE              Y               Y              Y             Y               N                   N
HYBRID            Y               Y              Y             Y               Y                   Y

2. We consider the concept in its own ontology as the possible domain, and the other candidate ontologies in which the concept occurs as the global domain. We then check the one-to-one functional characteristic between each of these domains (possible domain, global domain) and the range to recognize the KIs and LIs. We use the following heuristic to distinguish a KI from an LI: if a property is one-to-one functional between its global domain and its range, it is a KI; if a property is one-to-one functional between its possible domain and its range, it is an LI.
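The second heuristic can be sketched on instance data. The snippet below is a minimal illustration under stated assumptions rather than the method as implemented by the authors: a property is given as a list of (instance, value) pairs, the global domain is assumed to contain the pairs of every candidate ontology in which the concept occurs (including the local one), and the function names `is_one_to_one` and `classify_property` are hypothetical.

```python
from collections import defaultdict


def is_one_to_one(pairs):
    """True if each instance has exactly one value and each value one instance."""
    values_of = defaultdict(set)
    instances_of = defaultdict(set)
    for instance, value in pairs:
        values_of[instance].add(value)
        instances_of[value].add(instance)
    functional = all(len(v) == 1 for v in values_of.values())
    inverse_functional = all(len(i) == 1 for i in instances_of.values())
    return functional and inverse_functional


def classify_property(local_pairs, global_pairs):
    """Return 'KI', 'LI', or None for one property of a concept.

    local_pairs:  (instance, value) pairs from the concept's own ontology
                  (the possible domain)
    global_pairs: pairs from all candidate ontologies containing the concept
                  (the global domain, assumed here to include local_pairs)
    """
    if global_pairs and is_one_to_one(global_pairs):
        return "KI"
    if local_pairs and is_one_to_one(local_pairs):
        return "LI"
    return None


local = [("p1", "123"), ("p2", "456")]
clash = local + [("q1", "123")]    # the value "123" is reused in another ontology
unique = local + [("q1", "789")]   # values stay unique across ontologies
```

With these toy pairs, `classify_property(local, unique)` yields a KI, while `classify_property(local, clash)` yields only an LI, because the value "123" identifies two different instances once the other ontology is taken into account.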

7 Experiments

In this section, we discuss three aspects: the first compares the similarity analysis techniques of our method with those of existing mapping methods; the second compares the complexity of our method with that of GLUE's content-based matching; the last concerns the evaluation of the experimental results.

In Table 1, we compare the similarity analysis techniques of existing mapping tools with our combined approach, which adds the novel internal structure-based and relation structure-based techniques. Note that taxonomy-based similarity is the similarity between two concepts determined by analyzing the subsumption relationships between them, such as generalization, specialization, and siblings. Relation structure-based similarity, however, is based not only on analyzing the structural relationships between the concepts in the taxonomy but also on their properties.

Table 1. Comparative techniques of similarity analysis

We compare the complexity of our method with that of GLUE's content-based matching as follows. Suppose that $N_n$, $N_p$, and $N_i$ are the maximum numbers of nodes, properties (attributes), and instances, respectively. Let us assume that the complexity of comparing two attribute values between two instances is $O(1)$. Then the complexity of calculating the similarity between two instances is $O(N_p)$, and the complexity of determining the similarity between two nodes is $O(N_i \times N_p)$. Finally, the matching between two ontologies costs $O(\log N_n \times N_i \times N_p)$. In order to compare GLUE with our matching method, let us substitute $N$ for every parameter; the cost of GLUE becomes $O(N^2 \times \log N)$, while our matching method costs $O(N \times \log N)$, because it does not require comparing all the properties belonging to each class. Figure 5 illustrates the difference in complexity between our matching method and GLUE's content-based matching as a line chart. The chart shows how the difference depends on the number of properties, assuming that the numbers of concepts and instances are equal in each case: as the number of properties belonging to the concepts increases, the difference in complexity increases proportionally.
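The growth of the gap with the number of properties can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not a measurement: it treats the big-O expressions as exact comparison counts, uses a base-2 logarithm, and assumes that the EOM-based cost is the content-based cost with the property factor removed. The function names are hypothetical.

```python
import math


def content_based_cost(n_nodes, n_props, n_insts):
    """Comparison count assumed for content-based matching."""
    return math.log2(n_nodes) * n_insts * n_props


def eom_based_cost(n_nodes, n_insts):
    """Comparison count assumed for EOM-based matching (no property factor)."""
    return math.log2(n_nodes) * n_insts


def cost_gap(n_nodes, n_props, n_insts):
    """Difference between the two counts for a given number of properties."""
    return content_based_cost(n_nodes, n_props, n_insts) - eom_based_cost(n_nodes, n_insts)


# Nodes and instances are held fixed while the number of properties varies.
for n_props in range(1, 11):
    print(n_props, eom_based_cost(16, 10), content_based_cost(16, n_props, 10))
```

Under these assumptions the gap is linear in the number of properties, which matches the proportional increase described above.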

We collected a large number of ontologies from the Internet and composed samples from them. Each sample includes at least three ontologies. Let $N_{total}$ be the total number of pairs of matching concepts between the candidate ontologies identified by experts, and let $N_{correct}$ and $N_{incorrect}$ be the numbers of correct and incorrect pairs of matching concepts found by our system, respectively.

Precision is used to evaluate the proportion of the matching pairs found by the system that are correct:

$$\mathrm{Precision} = \frac{N_{correct}}{N_{correct} + N_{incorrect}}$$

Figure 5. The difference in complexity between EOM-based and Content-based matching (x-axis: number of properties, 1 to 10; y-axis: number of comparisons, 0 to 1500; series: EOM-based, Content-based).

Figure 6. The evaluation of experimental results (precision and recall, 0% to 100%; series: EOM-based, EOM & Content-based).

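Recall is used to evaluate the proportion of the matching pairs identified by experts that are found by the system:

$$\mathrm{Recall} = \frac{N_{correct}}{N_{total}}$$

where $N_{correct}$ is the number of correct pairs found by the system and $N_{total}$ is the number of pairs identified by experts.

Figure 6 illustrates some comparative experimental results between EOM-based matching and the combination of EOM-based and content-based matching.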
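The two measures can be computed directly from the expert pairs and the pairs returned by the system. The following sketch assumes that both are available as sets of concept pairs; the function name `precision_recall` and the toy data are hypothetical and do not reproduce the experimental results.

```python
def precision_recall(expert_pairs, system_pairs):
    """Return (precision, recall) for matching pairs found by a system.

    expert_pairs: pairs of matching concepts identified by experts
    system_pairs: pairs of matching concepts returned by the system
    """
    expert_pairs = set(expert_pairs)
    system_pairs = set(system_pairs)
    n_correct = len(system_pairs & expert_pairs)    # found and confirmed by experts
    n_incorrect = len(system_pairs - expert_pairs)  # found but not confirmed
    n_total = len(expert_pairs)
    precision = n_correct / (n_correct + n_incorrect) if system_pairs else 0.0
    recall = n_correct / n_total if n_total else 0.0
    return precision, recall


expert = {("a", "a'"), ("b", "b'"), ("c", "c'"), ("d", "d'")}
system = {("a", "a'"), ("b", "b'"), ("x", "y")}
precision, recall = precision_recall(expert, system)
```

Here the system finds two of the four expert pairs and one spurious pair, so precision is 2/3 and recall is 1/2.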

8 Conclusion

According to our study of ontology integration, methods for integrating multiple ontologies have not yet been explored. The hybrid method presented in this paper is an approach to multiple ontologies integration in which the OnConceptSNet serves as a semantic network for reconciling multiple ontologies. The relations between the concepts of the OnConceptSNet are derived from a semantic supporting environment (SSE) combining special domains and a text corpus. The OnConceptSNet is enhanced by the meta-rules. EOM-based matching is a heuristic whose advantage lies in the initial reduction of complexity through direct matching between concepts of the same type. In future work, we will further explore EOM-based matching to enable more accurate classification of concepts.

References

Doan, A., Domingos, P., and Halevy, A. 2001. Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach. In Proceedings of ACM SIGMOD Conference. pp. 509-520.

Doan, A., Madhavan, J., Domingos, P., and Halevy, A. 2002. Learning to Map between Ontologies on the Semantic Web. World Wide Web Consortium 2002-WWW2002, ACM. pp. 662-673.

Duong, T.H., Nguyen, N.T., and Jo, G.S. 2008a. A Method for Integration across Text Corpus and WordNet-based Ontologies. To appear in: Proceedings of Workshop CISWSN 2008. IEEE Computer Society Press.

Duong, T.H., Nguyen, N.T., and Jo, G.S. 2008b. A Method for Integration of WordNet-based Ontologies Using Distance Measures. In: Proceedings of KES 2008. Lecture Notes in Artificial Intelligence 5177. pp. 210-219.

Ehrig, M., and Sure, Y. 2004. Ontology Mapping - An Integrated Approach. First European Semantic Web Symposium. pp. 76-91.

Gangemi, A., Pisanelli, D.M., and Steve, G. 1998. Ontology Integration: Experiences with Medical Terminologies. In Nicola Guarino, editor, Formal Ontology in Information Systems. IOS Press. pp. 163-178.

Guarino, N., and Welty, C. 2000. Ontological Analysis of Taxonomic Relationships. In Laender, A. and Storey, V., eds., Proceedings of ER-2000: The 19th International Conference on Conceptual Modeling. Springer-Verlag. pp. 210-224.


Jiang, J., and Conrath, D. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics. pp. 19–33.

Leacock, C., and Chodorow, M. 1998. Combining local context and WordNet similarity for word sense identification. In Fellbaum, C., ed., WordNet: An electronic lexical database. MIT Press. pp. 265–283.

Lee, J., Chae, H., Kim, K., and Kim, C.H. 2006. An Ontology Architecture for Integration of Ontologies. Processing The Semantic Web – ASWC. pp. 205-211.

Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning. Morgan Kaufmann Publishers. pp. 296–304.

Maedche, A., Motik, B., Silva, N., and Volz, R. 2002. MAFRA-An ontology MApping FRAmework in the context of the semantic web. In: Proceedings of the EKAW 2002, Siguenza, Spain. pp. 235-250.

McGuinness, D., Fikes, R., Rice, J., and Wilder, S. 2000. An environment for merging and testing large ontologies. In Proceedings of the 7th International Conference on Principles of Knowledge Representation and Reasoning, Colorado, USA. pp. 483-493.

Mitra, P., and Wiederhold, G. 2002. Resolving Terminology Heterogeneity in Ontologies. In Proceedings of ECAI’02 Workshop on Ontologies and Semantic Interoperability, Lyon, France. pp. 45-50.

Noy, N. F., and Musen, M. A. 2001. Anchor-PROMPT: Using Non-Local Context for Semantic Matching. In Workshop on Ontologies and Information Sharing at the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001), Seattle, WA. pp. 242–258.

Noy, N. F., and Musen, M. A. 2002. PROMPTDIFF: A Fixed-Point Algorithm for Comparing Ontology Versions. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI’02). pp. 744–750.

Noy, N. F., and Musen, M. A. 2003. The PROMPT Suite: Interactive Tools For Ontology Merging And Mapping. In International Journal of Human-Computer Studies, vol. 59. pp. 983-1024.

Pinto, H.S., and Martins, J.P. 2001. A Methodology for Ontology Integration. In: Proceedings of the First International Conference on Knowledge Capture. ACM Press. pp. 131-138.

Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence. pp. 448–453.

Wu, Z., and Palmer, M. 1994. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics. pp. 133–138.