a mapping-based tree similarity algorithm and its application to ontology alignment



Knowledge-Based Systems xxx (2013) xxx–xxx

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

A mapping-based tree similarity algorithm and its application to ontology alignment

0950-7051/$ - see front matter © 2013 Published by Elsevier B.V.
http://dx.doi.org/10.1016/j.knosys.2013.11.002

⇑ Corresponding author. Tel.: +86 13589121231.
E-mail addresses: [email protected] (J. Wang), [email protected] (H. Liu), [email protected] (H. Wang).

Please cite this article in press as: J. Wang et al., A mapping-based tree similarity algorithm and its application to ontology alignment, Knowl. Based Syst. (2013), http://dx.doi.org/10.1016/j.knosys.2013.11.002

JiHua Wang ⇑, Hong Liu, HuaYu Wang

Shandong Normal University, Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong, China

Article info

Article history: Received 5 April 2013; Received in revised form 31 October 2013; Accepted 1 November 2013; Available online xxxx

Keywords: Tree similarity algorithm; Concept similarity; Concept tree mapping; Ontology alignment; Ontology integration

Abstract

A mapping-based tree similarity algorithm is proposed for matching concept trees in ontology alignment to integrate various information sources in the Semantic Web. Concepts regarding classes and properties are the most critical ontological elements and metadata. First, the similarity between the individual concepts of each type is defined. These concept systems, which are considered as the foundation of ontology, are described as tree modes for overall comparison. Based on the minimal cost of edit operations, previous tree similarity measuring approaches are extremely complicated because three or four edit operations are involved. Moreover, such approaches ignore the similarity among single nodes. In the proposed algorithm, node similarity, instead of changing operation, is adopted and the inserting and deleting operation is omitted. The proposed algorithm is more concise and effective because it satisfies the maximum mapping theorem without damaging tree isomorphism. The algorithm is resolved and realized by a dynamic programming scheme. Then, the algorithm is independently used to compare class and property trees, and their mapping concept sets are regarded as the main part of the ontology alignment. Demonstration examples are used to prove the effectiveness and feasibility of the algorithm in ontology alignment.

© 2013 Published by Elsevier B.V.

1. Introduction

Ontology is one of the foundations of knowledge management. Therefore, the search for methods for unambiguously integrating ontologies from different domains is an important issue in knowledge reuse and integration. In Semantic Web-oriented information integration, data from different data sources are transformed into a resource description framework for storage. However, these integrated data are difficult to use because the information is still represented by a customized concept vocabulary, and the relationship between concepts from different sources remains unclear. Domain ontology refers to the description of concepts and their relationships. Integration of Semantic Web information is achieved by establishing the connection between the concepts of the source ontology and those of the domain ontology. That is, one ontological representation must be converted into another for the concept to be understood. The mapping process between two ontologies is called ontology alignment. The focus of this research is the mapping framework of a massive ontology with hundreds or even thousands of classes and properties. Such a framework is extremely complex and difficult to deal with [1]. To date, this task has been performed manually, and ontology alignment can be achieved using two manual methods: (i) mapping the two ontologies to a third-party shared ontology and (ii) mapping the two ontologies directly. However, manual methods limit dynamic sharing between knowledge and service. Moreover, research on automatic ontology alignment must be considered because the Ontology Alignment Evaluation Initiative (OAEI) has previously launched an alignment algorithm competition [2].

Ontology is a concept system representing certain domain knowledge [3]. A commonly used form is a tree pattern in which each concept is labeled as a node [4–6]. A large number of scholars have conducted research on tree similarity and tree pattern matching. In these studies, the similarity between two trees is measured by edit distance, which is regarded as the cost of transforming one tree into another. Such algorithms do not reveal the semantic similarity of the concepts in each node pair. Instead, they are concerned merely with whether two nodes are equal, which is not sufficient for ontology alignment.

In this paper, ontology concepts are initially divided into two types: class and property. The similarity of individual concepts within each type is defined separately, and its range is limited to between 0 and 1. Maximum mapping and similarity between two trees, whose values are generally greater than 1, are proposed according to the mapping theorem. The mapping-based algorithm is analyzed and set up using a dynamic programming scheme. The algorithm is applied to mapping two types of concept trees to achieve ontology alignment. It appears that, besides ontology alignment, the algorithm is also suitable for other applications related to tree comparison, such as web page comparison and product model comparison [7].

The main innovations of this paper are as follows. Firstly, the mapping-based algorithm only calculates the similarities among matched nodes and removes the cost computations of inserting and deleting nodes. Thus, running time is reduced and the key element, the paired nodes, is highlighted. Secondly, the algorithm emphasizes node similarity rather than node equality. Thirdly, the concept similarity of a class or a property is defined, and ontology alignment is realized according to the hierarchical structures to preserve the overall isomorphism.

In Section 2, studies related to tree similarity computation, ontology alignment, and integration are introduced. In Section 3, the two types of concepts in ontology (classes and properties) as well as their systems (class trees and property trees) are described. The similarity among individual concepts is also defined. In Section 4, tree similarity based on the mapping theorem is presented. The algorithm and mapping theorem are analyzed in Section 5. Algorithm implementation and its complexity are discussed in Section 6. In Section 7, the algorithm is applied to ontology alignment. The limitation of the algorithm and the final conclusions are presented in Section 8.

2. Related studies

A tree is the most common data structure in the field of computer science and studies on tree similarity algorithms have been conducted for decades. Computation of the tree edit distance between two ordered or unordered labeled trees is one of the methods used for measuring the similarity between trees, i.e. if the edit distance is smaller, then the two trees are closer. Computation of edit distance is focused on finding the minimum cost of a sequence of deletions, insertions, and relabelings in both trees. Thus, it allows the trees to be transformed into isomorphic trees.

The edit distance method originated in 1974 from the string-to-string correction problem proposed by Wagner [8]. In 1979, this method was developed by Tai into a tree-to-tree correction problem with a time complexity of O(n^6) [9]. Zhang proposed simple and fast algorithms for the edit distance between trees to reduce the complexity to O(n^4) [10,11]. Klein used the heavy path decomposition algorithm to calculate the edit distance with a time complexity of O(n^3 log n) [12]. In 2005, a method proposed by Dulucq further reduced the complexity to O(n^2 log^2 n) [13]. In 2009, Demaine analyzed the family of decomposition strategies and presented the optimal decomposition algorithm with a time complexity of O(n^3) [14].

Ontology integration refers to the development of a larger and more complete ontology from available ontologies. This process can be achieved by calculating concept similarity and concept mapping.

Numerous articles on ontology mapping have investigated matching at the element level as a means of comparing two individual concepts or properties. Mitra estimated the similarity between two concepts using a linguistic matcher [15]. Warin computed similarity using a lexicon [16]. A syntax-based mapping method computed the appearance similarity of concepts based on the edit distance of their names rather than their semantics [17]. In instance-based methods, concepts are considered the same if they have the same instances, whereas concepts are considered similar if they have an equal proportion of the same instances [18]. In rule-based mapping methods, multiple similarity values obtained through a number of heuristic rules are weighted to obtain the average similarity of a pair of concepts [19]. Liu and Knappe considered the path length between two concepts in the thesaurus (a vocabulary tree) as the semantic distance [20,21]. Wu synthesized semantic distance, semantic overlap ratio, and layered depth to calculate the similarity between two concepts [22].

Comparison and mapping among individual concepts does not accurately reflect overall ontology alignment. Thus, a number of scholars have studied concept matching at the structural level. A mapping method based on conceptual hierarchy structure primarily involves hyponymy and whole–part relationships. Here, parent and child nodes significantly affect the similarity among current concepts, whereas grandchild nodes have minimal effect [23]. In the clustering method, ontology mapping is divided into class, property, and relationship mapping, which are completed hierarchically [24]. Cupid matching by Madhavan is a hybrid approach at the element and structural levels. In this technique, element mapping is calculated first, and then structure mapping is completed from bottom to top in a tree, i.e. the conceptual system [25]. The COMA matching system was developed to transform a conceptual system into a directed acyclic graph and then perform ontology mapping at the structural level using graph matching [26]. Dieng, Melnik, and Fausto also computed the similarity between two ontologies using graph pattern matching [27–29]. These structural level algorithms primarily reflect the influence of adjacent nodes, such as siblings and ancestors, on current node similarity. That is, if neighbor nodes are more similar, then current node similarity is further strengthened. Thus, the result of this ontology alignment is the set of node pairs with maximum accumulated similarity.

A number of concept mappings still have to be manually specified in the aforementioned alignment methods and the degree of automation is relatively low. In addition, several methods are limited to specific areas, whereas other methods are focused on local (e.g. the neighborhood) rather than global mapping. Algorithms specifically developed for ontology mapping include the tree comparison algorithm based on edit distance [4] and the algorithm for hierarchical case trees [30]. In these algorithms, the semantic similarity of individual concepts is considered at the element level, whereas the entire correspondence of the ontology is considered at the structural level. However, these algorithms are more complicated because numerous operations are involved at the element level. The proposed algorithm is more concise because it only involves similarity calculations among matched nodes.

3. Concept trees and similarity of individual concepts

The Semantic Web identifies data as unique and addressable and standardizes the connections between interrelated information through statements. The unique data elements are connected to one another to form a large-scale context. That is, an ontology is formed with the statement as its basic unit.

A statement consists of a subject, a predicate, and an object. The subject is an instance of a class. The predicate is considered as a property, which is further subdivided into three types: object, data, and annotation. The object is an instance of a class or text. Both classes and properties are concepts, and such concepts are represented by the vocabulary or terminology of a field. A concept has abstraction and generalization meanings, whereas an instance only has one specific meaning. The instances of each concept are specifically expressed by properties and their values. In ontology, the concept systems of classes and properties are individually organized as tree patterns called concept trees. These trees are rooted, unordered, and labeled. All instances appear as a graph pattern. Accordingly, these concept trees consist of a class tree, an object property tree, a data property tree, and an annotation property tree, which have to be individually aligned from the source ontology to achieve ontology alignment.

Thus, ontology, which is denoted by O, primarily has three parts: class (C), property (P), and instance (I), i.e. O(C, P, I). Therefore, the three main ontological elements are class concepts, property concepts, and instances. Class instances perform subject and object (property value) roles in one statement, whereas properties link all instances. Formulae for determining the semantic similarity in these three elements are presented in the following. These formulae enable machines to carry out the computations automatically.

3.1. Similarity of property concepts

The concepts of data, object, and annotation properties are the minimum units of meaning with a single connotation. The vocabulary or terms that can express property concepts in the source and target ontology can be manually projected on the same word library (e.g. WordNet) where similarities of terms can be computed using a path-based method. The idea of the algorithm can be expressed as follows: if the path between two terms in WordNet is longer, then their similarity is smaller.

The similarity simP ∈ [0, 1] between property concepts pc1 ∈ O1(P) and pc2 ∈ O2(P) can be expressed by:

$$\mathrm{sim}_P(pc_1, pc_2) = \frac{1}{1 + \exp[\mathrm{dep}(pc_1)/D]} + \frac{1}{1 + \exp[\mathrm{dep}(pc_2)/D]}, \qquad (1)$$

where cp(pc1, pc2) denotes the least common parent of pc1 and pc2, and dep(pc) denotes the depth of concept pc from cp(pc1, pc2) in WordNet. D, with range 1 ≤ D ≤ ∞, is a parameter adjusted according to user experience.
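As an illustration of Eq. (1), the following minimal Python sketch evaluates simP from two depths that are assumed to have been looked up already. The function name `sim_p` and its arguments are hypothetical, and the WordNet lookup that would produce the depths is not shown.

```python
import math


def sim_p(dep1: float, dep2: float, D: float = 1.0) -> float:
    """Evaluate Eq. (1) for two property concepts.

    dep1 and dep2 are the depths of the two concepts below their least
    common parent (assumed to be computed elsewhere, e.g. from WordNet).
    D >= 1 is the user-tuned parameter of Eq. (1).
    """
    if D < 1:
        raise ValueError("D is expected to be at least 1")
    if dep1 < 0 or dep2 < 0:
        raise ValueError("depths must be non-negative")
    term1 = 1.0 / (1.0 + math.exp(dep1 / D))
    term2 = 1.0 / (1.0 + math.exp(dep2 / D))
    return term1 + term2


if __name__ == "__main__":
    # Identical concepts (both depths 0) give the maximum similarity.
    print(sim_p(0, 0))
    # Longer paths to the common parent give a smaller similarity.
    print(sim_p(1, 2), sim_p(3, 4))
```

Each term equals 1/2 when the depth is 0 and decreases toward 0 as the depth grows, so the sum stays within [0, 1]. A larger D flattens the decay.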

3.2. Similarity of instances

If two instances include identical properties, then they are the same. The property overlap ratio of the instances, which is the percentage of the same or similar properties in both instances, can be used to measure their similarity. The expression for the similarity simI ∈ [0, 1] between instances I1 ∈ O1(I) and I2 ∈ O2(I) can be written as follows:

$$\mathrm{sim}_I(I_1, I_2) = \frac{|\mathrm{Ipset}(I_1) \cap \mathrm{Ipset}(I_2)|}{|\mathrm{Ipset}(I_1) \cup \mathrm{Ipset}(I_2)|}, \qquad (2)$$

where Ipset(I) denotes the property pc set of an instance I and |Ipset(I)| denotes the number of properties of I. If simP(I1·pci, I2·pcj) ≥ α, where α is an adjustable constant, then I1·pci = I2·pcj. That is, property i of instance 1 is the same as property j of instance 2.
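Eq. (2) together with the threshold rule can be sketched as follows. This is only an illustration: the names `sim_i` and `toy_sim` are hypothetical, the property similarity is passed in as a function, and pairing each property with the first unused partner that reaches α is an assumption made here, not something the text prescribes.

```python
from typing import Callable, Sequence


def sim_i(props1: Sequence[str], props2: Sequence[str],
          sim_p: Callable[[str, str], float], alpha: float = 0.8) -> float:
    """Property overlap ratio of two instances, in the spirit of Eq. (2).

    Two properties count as the same when sim_p(p, q) >= alpha.
    Matching is greedy and one-to-one (an assumption of this sketch).
    """
    a = list(dict.fromkeys(props1))   # drop duplicates, keep order
    b = list(dict.fromkeys(props2))
    unused = list(b)
    matched = 0
    for p in a:
        for q in unused:
            if sim_p(p, q) >= alpha:
                unused.remove(q)      # each property is matched at most once
                matched += 1
                break
    union = len(a) + len(b) - matched
    return matched / union if union else 0.0


def toy_sim(p: str, q: str) -> float:
    """Stand-in for simP: 1.0 for case-insensitive equality, else 0.0."""
    return 1.0 if p.lower() == q.lower() else 0.0


if __name__ == "__main__":
    print(sim_i(["name", "age", "email"], ["Name", "Age", "phone"], toy_sim))
```

In the usage example two of the properties match and the union holds four distinct properties, so the ratio is 2/4.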

3.3. Similarity of class concepts

Class concepts have complex connotations. The accuracy of a similarity comparison among class concepts cannot be guaranteed by a synonym-based or path-based approach. If two class concepts have the same properties, then the two concepts are the same. If two concepts have an equal or approximately equal proportion of the same properties, then the two concepts are similar. In addition, the instances of a class concept can clearly characterize its essential connotations. Therefore, the similarity between two class concepts can be computed using their instance sets.

The similarity simC ∈ [0, 1] between class concepts cc1 ∈ O1(C) and cc2 ∈ O2(C) can be expressed via Eqs. (3) and (4),

$$\mathrm{sim}_C(cc_1, cc_2) = \frac{|\mathrm{pset}(cc_1) \cap \mathrm{pset}(cc_2)|}{|\mathrm{pset}(cc_1) \cup \mathrm{pset}(cc_2)|}, \qquad \mathrm{pset}(cc) = \bigcup_{I_i \in cc} \mathrm{Ipset}(I_i), \qquad (3)$$

$$\mathrm{sim}_C(cc_1, cc_2) = \mathrm{sim}_I(cc_1 \cdot I_i,\ cc_2 \cdot I_j), \quad \forall I_i \in cc_1,\ \forall I_j \in cc_2, \qquad (4)$$

where pset(cc) denotes the property pc set of a class cc, and |pset(cc)| denotes the number of properties belonging to cc. If simP(cc1·pci, cc2·pcj) ≥ β, where β is an adjustable constant, then cc1·pci = cc2·pcj, i.e. property i of class 1 is the same as property j of class 2. Eq. (4) is a special (but regular) version of Eq. (3) wherein every instance in cc1 or every instance in cc2 includes the same properties.
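A minimal sketch of Eq. (3) follows, under the simplifying assumption that two properties are the same only when their names are identical (the β threshold on simP is left out). The names `pset` and `sim_c` mirror the notation of the text, but the data layout (a class as a list of per-instance property sets) is hypothetical.

```python
from typing import Iterable, Set


def pset(instances: Iterable[Set[str]]) -> Set[str]:
    """Union of the property sets of all instances of a class (Eq. (3))."""
    result: Set[str] = set()
    for props in instances:
        result |= set(props)
    return result


def sim_c(instances1: Iterable[Set[str]], instances2: Iterable[Set[str]]) -> float:
    """Overlap ratio of the two class property sets.

    Simplification of this sketch: properties are compared by exact name
    rather than by a thresholded simP.
    """
    p1, p2 = pset(instances1), pset(instances2)
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


if __name__ == "__main__":
    bus = [{"name", "price"}, {"name", "route"}]   # two instances of class 1
    coach = [{"name", "route", "seats"}]           # one instance of class 2
    print(sim_c(bus, coach))
```

Here pset of the first class is {name, price, route} and that of the second is {name, route, seats}, so two of four distinct properties are shared.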

3.4. Progressive relationship in the three similarities

In general, the similarities between property concepts and between class concepts are calculated using the three aforementioned formulae concomitantly. simP is used for mapping between two property concept trees and simC is used for mapping between two class concept trees. simI, which is used to compare two individual instances, is calculated using simP. simC, which is used to compare two class concepts, is calculated by simI or simP.

The primary goal of ontology alignment is to establish the mapping of classes and properties from the source ontology to the target ontology. Instances do not need to be mapped because they are dynamically extending. Moreover, the connotations of instances are effectively expressed by class and property concepts which act as metadata. However, mapping among concept systems cannot be achieved by merely calculating the similarity between each pair of concepts. Therefore, overall isomorphic correspondence between class and property trees must be established.

For convenience in the calculation, but without losing generality, the following assumptions are made: simC and simP are denoted by sim(i, j) ∈ [0, 1], and sim(i, ø) = sim(ø, j) = 0, sim(ø, ø) = 0, where i and j represent a class or property concept, and ø is a null concept.

4. Mapping and similarity between concept trees

In previous studies, tree similarity was calculated by determining edit distance, and the two nodes were clearly defined to be either equal or unequal. The nodes of two concept trees, which are considered as abstract representations of property or class connotation, can no longer be represented by a simple equal relationship, but by a similarity relationship. Therefore, their similarity cannot be described simply by a 1 or 0, but by the range [0, 1].

In the present study, the similarity among unordered concept trees with regard to a class or a property is measured merely on the basis of the mapping pairs or similar nodes in the isomorphic condition, without considering deleted and inserted nodes. The proposed method and the edit distance method are based on the same mapping theorem for maintaining tree isomorphism. The former computes the sum of the similarities among mapping pairs and considers it as the tree similarity, whereas the latter calculates the cost of three kinds of edit operations, viz. insertion, deletion, and relabeling. The mapping of node pairs to compute the similarity between two concept trees is just the alignment of concept trees.

Let T and T′ be individually rooted, unordered concept trees in the source and target ontologies, respectively. T[i] denotes the node of T with position i in pre-order or post-order. The number of nodes in T is denoted by |T| = m. T′[j] and |T′| = n are defined similarly for the target ontology. If the label of node T[i] is f(i), and the label of node T′[j] is k(j), then f(i) and k(j) represent property or class concepts. Therefore, the similarity sim(i, j) between two individual concepts f(i) and k(j) is simP(pc1, pc2) according to Eq. (1), or simC(cc1, cc2) according to Eqs. (3) or (4).

A mapping among concept trees produced by edit operations is a graphical specification of the application of edit operations to each node in the two trees. A diagram of a mapping is presented in Fig. 1.

Fig. 1. Mapping between trees.

In the figure, the dotted line from T[i] to T′[j] indicates that δ ≤ sim(i, j) ≤ 1, where δ is a constant, and (T[i], T′[j]) or (i, j) is a mapping pair or is sufficiently similar. The nodes of T and T′ untouched by the dotted lines are not matched and, thus, remain unchanged when the two concept trees are compared. Such a diagram is called a tree mapping and is expressed as the triplet (M, T, T′), where M is any set of mapping pairs of (i, j) (1 ≤ i ≤ |T|, 1 ≤ j ≤ |T′|) from T to T′ that satisfies the mapping theorem.

For any pair (i1, j1) and (i2, j2) in M:

(a) i1 = i2 in T if, and only if, j1 = j2 in T′ (one-to-one);
(b) i1 is an ancestor (or descendant) of i2 in T if, and only if, j1 is an ancestor (or descendant) of j2 in T′ (ancestor–descendant relationship).

Based on this theorem, when tree structures are kept intact without inserting and deleting operations, the accumulative similarity of mapping pairs, i.e. the sum of sim(i, j) (the value of which is generally more than 1) can adequately reflect the similarity between two trees.

The sum of sim(i, j) is defined as the mapping similarity S_MAP(M) or S_MAP(T, T′) between T and T′. That is:

$$S\_MAP(M) = S\_MAP(T, T') = \sum_{(i,j) \in M} \mathrm{sim}(i, j). \qquad (5)$$
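The two conditions of the mapping theorem and the sum in Eq. (5) can be checked mechanically. The sketch below is illustrative only: trees are given as hypothetical child-to-parent dictionaries (the root maps to `None`), and the helper names `ancestors`, `is_valid_mapping`, and `s_map` are not from the paper.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Parent = Dict[Hashable, Optional[Hashable]]
Pairs = List[Tuple[Hashable, Hashable]]


def ancestors(parent: Parent, node: Hashable) -> Set[Hashable]:
    """All proper ancestors of node in a tree given as child -> parent."""
    found: Set[Hashable] = set()
    while parent.get(node) is not None:
        node = parent[node]
        found.add(node)
    return found


def is_valid_mapping(pairs: Pairs, parent_t: Parent, parent_u: Parent) -> bool:
    """Check conditions (a) one-to-one and (b) ancestor preservation."""
    left = [i for i, _ in pairs]
    right = [j for _, j in pairs]
    if len(set(left)) != len(left) or len(set(right)) != len(right):
        return False
    for i1, j1 in pairs:
        for i2, j2 in pairs:
            if (i1 in ancestors(parent_t, i2)) != (j1 in ancestors(parent_u, j2)):
                return False
    return True


def s_map(pairs: Pairs, sim: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """Eq. (5): sum of the similarities of all mapping pairs."""
    return sum(sim[(i, j)] for i, j in pairs)


if __name__ == "__main__":
    t = {1: 3, 2: 3, 3: None}   # root 3 with children 1 and 2
    u = {1: 2, 2: 3, 3: None}   # chain 3 -> 2 -> 1
    sim = {(3, 3): 0.9, (1, 1): 0.6, (2, 2): 0.7}
    print(is_valid_mapping([(3, 3), (1, 1)], t, u), s_map([(3, 3), (1, 1)], sim))
    print(is_valid_mapping([(3, 3), (1, 1), (2, 2)], t, u))
```

In the example, adding the pair (2, 2) breaks condition (b): nodes 1 and 2 are siblings in the first tree, but node 2 is an ancestor of node 1 in the second.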

In any sequence of edit operations transforming T into T′, a mapping M exists such that S_MAP(M) is maximum and nodes with inserting and deleting operations have minimal contributions to the similarity of the mapping. If every sim(i, j) in M is 1, then the maximum S_MAP(M) is the maximum number of mapping pairs. Thus, the similarity between two trees S_TREE(T, T′) is determined by the maximum S_MAP(M) from T to T′.

S_TREE(T, T′) and its set of mapping pairs MAP, which is also regarded as the largest common sub-tree of both trees, can be calculated by determining the maximum mapping similarity,

$$S\_TREE(T, T') = \mathrm{MAX}\{S\_MAP(M)\} = \mathrm{MAX}\Big\{\sum_{(i,j) \in M} \mathrm{sim}(i, j)\Big\}, \qquad (6)$$

$$MAP = \{\cdots (i, j) \cdots\} \mid \mathrm{MAX}\{S\_MAP(M)\}. \qquad (7)$$

Fig. 2. Four determined cases.

5. Analysis of the mapping computation between concept trees

In this study, a dynamic programming scheme based on the mapping theorem is used to compute S_TREE(T, T′). To number the nodes in an unordered tree, an arbitrary order is fixed among the children of each interior node to yield an ordered tree. The nodes in T and T′ are numbered post-order, as shown below. Let T(i1:i2) denote the portion of T consisting of nodes T[i1], T[i1+1], ..., T[i2]. The portion will also be expressed as T(i2). When T[i1] is the leftmost leaf node of the tree rooted at T[i2], T[i1] is also represented as L(i2). T(i1:i2) or T(L(i2):i2) is called a sub-tree and is denoted as ST(i2). SF(i2) or SF(L(i2):i2) is the unordered forest obtained by deleting the root node T[i2] from ST(i2). The child nodes of T[i2] are represented by C1(i2), C2(i2), ..., Ck(i2).

T′(j1:j2), ST′(j2), L(j2), SF(j2), and C1(j2), C2(j2), ..., Cl(j2) are similarly defined.
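The post-order numbering and the leftmost-leaf index L(i) can be sketched as follows. The tree encoding (a `(label, children)` tuple) and the function name `postorder` are assumptions of this sketch, and the small example tree only borrows a few labels for illustration; its shape is hypothetical.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, List["Tree"]]  # (label, children in the fixed arbitrary order)


def postorder(tree: Tree) -> Tuple[Dict[int, str], Dict[int, int]]:
    """Number nodes 1..|T| in post-order.

    Returns (labels, leftmost), where labels[i] is the label of T[i] and
    leftmost[i] is L(i), the index of the leftmost leaf of the sub-tree
    rooted at T[i]. The sub-tree ST(i) then spans indices L(i)..i.
    """
    labels: Dict[int, str] = {}
    leftmost: Dict[int, int] = {}
    counter = 0

    def visit(node: Tree) -> int:
        nonlocal counter
        label, children = node
        first = None
        for child in children:
            idx = visit(child)
            if first is None:
                first = leftmost[idx]     # leftmost leaf comes from the first child
        counter += 1
        labels[counter] = label
        leftmost[counter] = first if first is not None else counter
        return counter

    visit(tree)
    return labels, leftmost


if __name__ == "__main__":
    t: Tree = ("travel", [("traffic", [("bus", []), ("train", [])]),
                          ("visitor", [])])
    labels, leftmost = postorder(t)
    print(labels)
    print(leftmost)
```

For this example the sub-tree rooted at "traffic" is T(1:3), and deleting its root leaves the forest made of nodes 1 and 2.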

Let S_TREE(i, j) or S_TREE(i1:i, j1:j) be the similarity of trees T(i1:i) and T′(j1:j), which are more likely to be forests. MAP_TREE(i, j) is its mapping pair set, i1 ∈ {1, 2, ..., i − 1}, j1 ∈ {1, 2, ..., j − 1}. Thus,

$$S\_TREE(i, j) = \mathrm{MAX}\Big[\sum_{(i,j) \in MAP\_TREE} \mathrm{sim}(i, j)\Big], \qquad (8)$$

$$MAP\_TREE(i, j) = \{\ldots, (i, j), \ldots\} \mid S\_TREE(i, j). \qquad (9)$$

S_SUBTREE(ST(i), ST′(j)), also written S_SUBTREE(L(i):i, L(j):j) or S_SUBTREE(i, j), is the similarity between sub-trees ST(i) or T(L(i):i) and ST′(j) or T′(L(j):j). MAP_SUBTREE(ST(i), ST′(j)) or MAP_SUBTREE(i, j) is its mapping pair set.

In dynamic programming, at least one of the following four cases must hold (Fig. 2):

Case 1: T[i + 1] is untouched by a dotted line. Then T(i) is compared with T′(j + 1), S_TREE(i + 1, j + 1) = S_TREE(i, j + 1) + sim(i + 1, ø) = S_TREE(i, j + 1).

Case 2: T′[j + 1] is untouched by a dotted line. Then T(i + 1) is compared with T′(j), S_TREE(i + 1, j + 1) = S_TREE(i + 1, j) + sim(ø, j + 1) = S_TREE(i + 1, j).

Case 3: Both T[i + 1] and T′[j + 1] are in MAP_TREE(i + 1, j + 1). Suppose (i + 1, t) and (s, j + 1) are in MAP_TREE(i + 1, j + 1). Then, node s is in T(i) and node t is in T′(j). As a special case, (i + 1, j + 1) in MAP_TREE(i + 1, j + 1) are touched by the same dotted line. So, S_TREE(i + 1, j + 1) = S_TREE(i, j) + sim(i + 1, j + 1) = S_TREE(L(i + 1) − 1, L(j + 1) − 1) + S_SUBTREE(i + 1, j + 1), MAP_TREE(i + 1, j + 1) = MAP_TREE(i, j) + (i + 1, j + 1) = MAP_TREE(L(i + 1) − 1, L(j + 1) − 1) + MAP_SUBTREE(i + 1, j + 1).

Case 4: T[i + 1] and T′[j + 1] are not in MAP_TREE(i + 1, j + 1). Then, S_TREE(i + 1, j + 1) = S_TREE(i, j), MAP_TREE(i + 1, j + 1) = MAP_TREE(i, j).

Therefore, based on the four determined cases:

$$\begin{aligned} S\_TREE(i+1, j+1) &= \mathrm{MAX}[S\_TREE(i, j) + \mathrm{sim}(i+1, j+1),\ S\_TREE(i+1, j),\ S\_TREE(i, j+1),\ S\_TREE(i, j)] \\ &= \mathrm{MAX}[S\_TREE(i, j) + \mathrm{sim}(i+1, j+1),\ S\_TREE(i+1, j),\ S\_TREE(i, j+1)]. \end{aligned} \qquad (10)$$

Fig. 3. Sub-trees and ancestor–descendant relationship.

Fig. 5. Class tree matching in a travel ontology.

Given that sim(i, j) ≥ 0, Case 4 is omitted.

When (i + 1, j + 1) is in MAP_TREE, all their descendants can be matched mutually according to the mapping theorem. In Fig. 3, if (i2, j2) is in MAP_TREE, then any (i1, j1) may be in MAP_TREE, where i1 is in SF(i2) and j1 is in SF(j2), but (i1, j'1) must not be in MAP_TREE, where j'1 is in SF(j'2). Therefore, if only ST(i + 1) and ST'(j + 1) are compared, then S_TREE(i + 1, j + 1) = S_SUBTREE(ST(i + 1), ST'(j + 1)), and MAP_TREE(i + 1, j + 1) = MAP_SUBTREE(i + 1, j + 1).

The following typical situations are analyzed to further illustrate the aforementioned ideas (see Fig. 4).

A tree with one node is called a single-node tree. A two-level tree with a parent node and child nodes is called a basic tree. A tree with three levels or more is called a complex tree.

Fig. 4. Typical situations.


(1) When T and T' are single-node trees, (i, j) in MAP_TREE can hold, and S_TREE(T, T') = MAX[sim(i, j), sim(ø, j), sim(i, ø)] = sim(i, j).

(2) When T or T' is a single-node tree and the other is a complex tree, (i, j) in MAP_TREE can hold, and S_TREE(T, T') = MAX[sim(1, j), sim(2, j), ..., sim(i, j)].

(3) When T and T' are basic trees, i and j represent the parent nodes of T and T', respectively. The four cases exist, and Case 4 is omitted.

In Case 2, (i, 1) or ... or (i, t) or ... or (i, j − 1) in MAP_TREE holds. S_TREE(T, T') = MAX[sim(i, 1), sim(i, 2), ..., sim(i, j − 1)] = sim(i, x), and (i, x) is in MAP_TREE, which is the same as in Case 1.

In Case 3, (i, j) must be in MAP_TREE. S_TREE(T, T') = S_TREE(i − 1, j − 1) + sim(i, j), where S_TREE(i − 1, j − 1) is the maximum sum of the similarities among paired nodes by crosswise comparison of their child node sets CSET = {C1(i), C2(i), ..., Ck(i)} and CSET' = {C1(j), C2(j), ..., Cl(j)}. This case is the optimal matching problem of a weighted bipartite graph on CSET ∪ CSET', with |CSET| = |CSET'| once null nodes are inserted if k ≠ l [31]. The weight is the similarity between a pair of nodes, and the optimal pairs are in MAP_TREE.

(4) When T and T' are ST(i) and ST(j), the four cases exist, and Case 4 is omitted.

In Case 2, sub-tree ST(i) is compared with the sub-trees rooted at the children of node j. S_TREE(i, j) = S_TREE(ST(i), SF(j)) = MAX[S_TREE(ST(i), ST(C1(j))), S_TREE(ST(i), ST(C2(j))), ..., S_TREE(ST(i), ST(Cl(j)))] = S_TREE(ST(i), ST(Cx(j))), and MAP_TREE(i, j) = MAP_TREE(ST(i), SF(j)) = MAP_TREE(ST(i), ST(Cx(j))).

Case 1 is the same as Case 2, in which S_TREE(SF(i), ST(j)) = S_TREE(ST(Cy(i)), ST(j)).

In Case 3, (i, j) must be in MAP_TREE according to the mapping theorem. S_TREE(T, T') = S_TREE(i − 1, j − 1) + sim(i, j). In fact, S_TREE(i − 1, j − 1) = S_TREE(SF(i), SF(j)). The computation of S_TREE(SF(i), SF(j)) has three cases:

Case S1: SF(i) is one part of SF(j). S_TREE(SF(i), SF(j)) = MAX(S_TREE(SF(i), SF(C1(j))), S_TREE(SF(i), SF(C2(j))), ..., S_TREE(SF(i), SF(Cl(j)))) = S_TREE(SF(i), SF(Cxf(j))).

Case S2: SF(j) is one part of SF(i). S_TREE(SF(i), SF(j)) = MAX(S_TREE(SF(C1(i)), SF(j)), S_TREE(SF(C2(i)), SF(j)), ..., S_TREE(SF(Ck(i)), SF(j))) = S_TREE(SF(Cyf(i)), SF(j)).

Case S3: SF(i) and SF(j) are mutually matched. S_TREE(SF(i), SF(j)) is the maximum sum of the similarities among the paired sub-trees rooted at their children CSET = {C1(i), C2(i), ..., Ck(i)} and CSET' = {C1(j), C2(j), ..., Cl(j)}. This case is the optimal matching problem of a weighted bipartite graph on CSET ∪ CSET', with |CSET| = |CSET'| once null trees are inserted if k ≠ l. The weight is the similarity of a pair of sub-trees, and the optimal pairs are in MAP_TREE. Thus, S_TREE(SF(i), SF(j)) is represented as KM(SF(i), SF(j)).

Based on Cases S1, S2, and S3, S_TREE(SF(i), SF(j)) = MAX(S_TREE(SF(i), SF(Cxf(j))), S_TREE(SF(Cyf(i)), SF(j)), KM(SF(i), SF(j))), and MAP_TREE(SF(i), SF(j)) corresponds to S_TREE(SF(i), SF(j)).

Based on Cases 1, 2, 3, and 4, S_SUBTREE(ST(i), ST(j)) = MAX(S_TREE(ST(i), ST(Cx(j))), S_TREE(ST(Cy(i)), ST(j)), S_TREE(SF(i), SF(j)) + sim(i, j)), and MAP_SUBTREE(ST(i), ST(j)) corresponds to S_SUBTREE(ST(i), ST(j)).

(5) When T and T' are complex trees, one sub-tree in T is compared with any sub-tree in T', and vice versa. Finally, T and T', which serve as the biggest sub-trees, are recursively compared by their sub-trees and forests from leaf nodes to root.

Thus, trees are decomposed into sub-trees recursively. The technique for computing mapping-based similarity between two unordered trees is as follows.

(1) T and T' are decomposed into sub-trees by each node.
(2) Each similarity between one sub-tree in T and one sub-tree in T' is computed following a certain order.
(3) T and T' are the biggest sub-trees; S_TREE(T, T') and MAP_TREE(T, T') are then computed.
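The recursion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: trees are given as hypothetical dictionaries mapping a node label to its children, memoized recursion replaces the post-order tables, and a brute-force pairing stands in for the Kuhn–Munkres step (adequate only for small node degrees). The sample trees keep only the matched concepts of Example 1 and their similarities; the full trees of Fig. 5 contain further nodes.

```python
from functools import lru_cache
from itertools import permutations


def tree_similarity(kids_a, kids_b, root_a, root_b, sim):
    """Return (score, mapping) for two unordered trees.

    kids_a / kids_b map each node label to a tuple of child labels;
    sim(a, b) is the non-negative node similarity.
    """

    def best_pairing(xs, ys):
        # Brute-force stand-in for Kuhn-Munkres: pair every node of the
        # shorter child list with a distinct node of the longer one.
        swapped = len(xs) > len(ys)
        short, long_ = (ys, xs) if swapped else (xs, ys)
        best = (0.0, ())
        for perm in permutations(long_, len(short)):
            total, pairs = 0.0, ()
            for s, l in zip(short, perm):
                score, mapping = subtree(l, s) if swapped else subtree(s, l)
                total += score
                pairs += mapping
            if total > best[0]:
                best = (total, pairs)
        return best

    @lru_cache(maxsize=None)
    def forest(a, b):
        # S_TREE(SF(a), SF(b)): Cases S1, S2 and S3.
        options = [(0.0, ())]
        options += [forest(a, c) for c in kids_b[b]]
        options += [forest(c, b) for c in kids_a[a]]
        options.append(best_pairing(kids_a[a], kids_b[b]))
        return max(options, key=lambda o: o[0])

    @lru_cache(maxsize=None)
    def subtree(a, b):
        # S_SUBTREE(ST(a), ST(b)): Cases 1, 2 and 3.
        options = [subtree(a, c) for c in kids_b[b]]
        options += [subtree(c, b) for c in kids_a[a]]
        f_score, f_map = forest(a, b)
        pair = ((a, b),) if sim(a, b) > 0 else ()
        options.append((f_score + sim(a, b), f_map + pair))
        return max(options, key=lambda o: o[0])

    return subtree(root_a, root_b)


# Simplified trees (assumption: only the matched concepts of Example 1).
T = {"travel": ("traffic", "visitor"), "traffic": ("land",),
     "land": ("bus",), "bus": (), "visitor": ()}
T2 = {"tour": ("transport", "tourist"), "transport": ("road",),
      "road": ("bus",), "bus": (), "tourist": ()}
SIM = {("travel", "tour"): 1.0, ("visitor", "tourist"): 0.9,
       ("traffic", "transport"): 0.85, ("land", "road"): 0.7,
       ("bus", "bus"): 1.0}

score, mapping = tree_similarity(T, T2, "travel", "tour",
                                 lambda a, b: SIM.get((a, b), 0.0))
```

On these simplified trees the sketch yields a score of about 4.45 and the five pairs listed for the travel ontology alignment in Example 1.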

6. Implementation of the mapping-based tree similarity algorithm

Let Ø be a null tree, so S_TREE(T, Ø) = S_TREE(Ø, T') = 0 and S_TREE(Ø, Ø) = 0 are used as initial conditions for the following computations.

The nodes in T and T' are numbered post-order and placed, respectively, in two structure-type arrays T[m] and T'[n] sequentially by a child–sibling link. Each element in T[m] and T'[n] contains four fields:

{
Code /* post-order number */
Data /* concept term or label */
Child /* the leftmost son pointer */
Sibling /* the right sibling pointer */
}.

L(i) is the function for obtaining the leftmost leaf node of sub-tree ST(i). C(i) is the function for obtaining the child nodes of node i. Thus, C(i) in T is placed in Q[k] and C(j) in T' is placed in Q'[l]. If k > l, then |k − l| null nodes Ø are inserted into Q'[l], and vice versa.
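The node record and the helper functions L(i) and C(i) can be illustrated with a short Python sketch. The names `Node`, `number_postorder`, `children`, and `leftmost_leaf` are hypothetical and chosen only for this illustration; the field names follow the four fields listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    data: str                          # concept term or label
    code: int = 0                      # post-order number, filled in later
    child: Optional["Node"] = None     # leftmost son
    sibling: Optional["Node"] = None   # right sibling


def children(node: Node) -> List[Node]:
    """C(i): follow the child pointer, then the sibling chain."""
    out, cur = [], node.child
    while cur is not None:
        out.append(cur)
        cur = cur.sibling
    return out


def number_postorder(root: Node) -> List[Node]:
    """Assign post-order numbers 1..m and return the nodes in that order."""
    order: List[Node] = []

    def visit(node: Node) -> None:
        for c in children(node):
            visit(c)
        node.code = len(order) + 1
        order.append(node)

    visit(root)
    return order


def leftmost_leaf(node: Node) -> Node:
    """L(i): descend along leftmost sons until a leaf is reached."""
    while node.child is not None:
        node = node.child
    return node


# Small illustrative tree: travel -> (traffic -> land -> bus, visitor)
bus = Node("bus")
land = Node("land", child=bus)
visitor = Node("visitor")
traffic = Node("traffic", child=land, sibling=visitor)
travel = Node("travel", child=traffic)
nodes = number_postorder(travel)
```

With this layout the sub-tree rooted at a node occupies the contiguous post-order range from `leftmost_leaf(node).code` to `node.code`, which is what the notation T(L(i):i) expresses.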

The optimal matching of Q[] and Q'[] is computed using the function KM(Q[], Q'[]) to obtain S_TREE(SF(i), SF(j)) = S_KM(Q[], Q'[]) and their mapping node set MAP_TREE(SF(i), SF(j)) = MAP_KM(Q[], Q'[]). KM(Q[], Q'[]) is a kind of Kuhn–Munkres algorithm [31].
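The padding with null nodes and the optimal matching step can be sketched as below. This is an assumption-laden illustration: the function name `pad_and_match` is hypothetical, and an exhaustive search over permutations is used in place of a real Kuhn–Munkres routine, so it is only practical for small child sets.

```python
from itertools import permutations


def pad_and_match(weights):
    """weights[r][c] >= 0 for k rows and l columns; returns (best_sum, pairs)."""
    k = len(weights)
    l = len(weights[0]) if weights else 0
    size = max(k, l)
    # Null nodes: extra rows or columns whose similarity to everything is 0.
    square = [[weights[r][c] if r < k and c < l else 0.0
               for c in range(size)] for r in range(size)]
    best_sum, best_pairs = 0.0, []
    for cols in permutations(range(size)):
        total = sum(square[r][cols[r]] for r in range(size))
        if total > best_sum:
            best_sum = total
            # Report only pairs of real nodes, not pairs with a null node.
            best_pairs = [(r, cols[r]) for r in range(size)
                          if r < k and cols[r] < l]
    return best_sum, best_pairs


# Three children on one side, two on the other (illustrative weights).
example = [[0.9, 0.1],
           [0.2, 0.8],
           [0.5, 0.7]]
best_sum, best_pairs = pad_and_match(example)
```

Here the third child on the left is matched with the inserted null node, so it contributes nothing to the sum and does not appear in the mapping.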

Algorithm:

For i = 1 to m
  For j = 1 to n
    C(i) is placed in Q[k];
    C(j) is placed in Q'[l];
    A = B = ASUB = BSUB = 0;  /* running maxima are reset for each pair (i, j) */
    MAP_A = MAP_B = MAP_ASUB = MAP_BSUB = Ø;
    For j' = 1 to l
      { If A < S_TREE(SF(i), SF(Q'[j']))
          A = S_TREE(SF(i), SF(Q'[j']));
          MAP_A = MAP_TREE(SF(i), SF(Q'[j'])); }
    For i' = 1 to k
      { If B < S_TREE(SF(Q[i']), SF(j))
          B = S_TREE(SF(Q[i']), SF(j));
          MAP_B = MAP_TREE(SF(Q[i']), SF(j)); }
    { k' = MAX(k, l)
      C = S_KM(Q[k'], Q'[k']);
      MAP_C = MAP_KM(Q[k'], Q'[k']); }
    { If A > B
        S_TREE(SF(i), SF(j)) = A;
        MAP_TREE(SF(i), SF(j)) = MAP_A;
      Else
        S_TREE(SF(i), SF(j)) = B;
        MAP_TREE(SF(i), SF(j)) = MAP_B;
      If S_TREE(SF(i), SF(j)) < C
        S_TREE(SF(i), SF(j)) = C;
        MAP_TREE(SF(i), SF(j)) = MAP_C;
      /* S_TREE(SF(i), SF(j)) put in array S_F[i][j], MAP_TREE(SF(i), SF(j)) put in M_F[i][j] */
    }
    For j' = 1 to l
      { If ASUB < S_SUBTREE(ST(i), ST(Q'[j']))
          ASUB = S_SUBTREE(ST(i), ST(Q'[j']));
          MAP_ASUB = MAP_SUBTREE(ST(i), ST(Q'[j'])); }
    For i' = 1 to k
      { If BSUB < S_SUBTREE(ST(Q[i']), ST(j))
          BSUB = S_SUBTREE(ST(Q[i']), ST(j));
          MAP_BSUB = MAP_SUBTREE(ST(Q[i']), ST(j)); }
    { If ASUB > BSUB
        S_SUBTREE(ST(i), ST(j)) = ASUB;
        MAP_SUBTREE(ST(i), ST(j)) = MAP_ASUB;
      Else
        S_SUBTREE(ST(i), ST(j)) = BSUB;
        MAP_SUBTREE(ST(i), ST(j)) = MAP_BSUB;
      If S_SUBTREE(ST(i), ST(j)) < S_TREE(SF(i), SF(j)) + sim(i, j)
        S_SUBTREE(ST(i), ST(j)) = S_TREE(SF(i), SF(j)) + sim(i, j);
        MAP_SUBTREE(ST(i), ST(j)) = MAP_TREE(SF(i), SF(j)) + (i, j)
      /* S_SUBTREE(ST(i), ST(j)) put in array S_T[i][j], MAP_SUBTREE(ST(i), ST(j)) put in M_T[i][j] */
    }
End.

The final output of the algorithm is as follows: S_TREE(T, T') = S_TREE(m, n) = S_SUBTREE(m, n), MAP_TREE(T, T') = MAP_TREE(m, n) = MAP_SUBTREE(m, n).

The complexity of the algorithm is primarily affected by the maximum number of nodes, N, in the trees and the tree degree, D. If the complexity of KM() is D^2 + D, then the complexity of the algorithm is N^2 × (D^2 + 5D), and therefore O(N^4) in the worst case.
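As a rough check of the worst-case bound, assume the degree D can be as large as N − 1 (for example, a tree whose root has every other node as a child). Substituting this assumption into the stated cost gives:

```latex
N^2\,(D^2 + 5D) \;\le\; N^2\big((N-1)^2 + 5(N-1)\big) \;=\; N^2\,(N^2 + 3N - 4) \;=\; O(N^4).
```

For trees of bounded degree the same expression stays proportional to N^2.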

Though the complexity of the algorithm is the same as that of the existing algorithms, the mapping-based algorithm saves running time at each step by removing the cost functions of insertion and deletion. Thus, the total running time can be reduced by about 66%.

7. Application to ontology alignment

In most cases, the ontology can be organized into a tree structure in which each node represents one concept about a class or property. In the Protégé software package, for example, the ontology includes property trees, class trees, and instance graphs. The mapping-based tree similarity algorithm is applied to ontology alignment and integration to obtain the overall isomorphic mapping of the concept trees instead of merely the mapping of individual concepts. In practice, class trees and all types of property trees must be individually aligned.

Example 1. The trees in Fig. 1 are instantiated as class trees in the travel ontology shown in Fig. 5.


Table A.1
Matrix of the similarity of trees (S_TREE).

S_F     j = 1   j = 2   j = 3   j = 4   j = 5   j = 6   j = 7
i = 1   0       0       0       0       0       0       0
i = 2   0       0       0       0       0       0       0
i = 3   1       1       1       1       1       1       1
i = 4   1       1       1.7     1.7     1.7     1.7     1.7
i = 5   1       1       1.7     2.55    2.55    2.55    2.55
i = 6   1       1       1.7     2.55    2.55    3.45    3.45
i = 7   1       1       1.7     2.55    2.55    3.45    3.45
i = 8   1       1       1.7     2.55    2.55    3.45    4.45


Eq. (3) is used to calculate simC(visitor, tourist). Their properties are distinguished by their instances, as follows: visitor (name, sex, ID, age, job, hobby, telephone, e-mail, and address) and tourist (name, sex, ID, age, job, hobby, telephone, e-mail, address, and salary). Therefore,

$$
\mathrm{simC}(\text{visitor}, \text{tourist}) = \mathrm{simC}(cc_1, cc_2) = \frac{|\mathrm{pset}(cc_1) \cap \mathrm{pset}(cc_2)|}{|\mathrm{pset}(cc_1) \cup \mathrm{pset}(cc_2)|} = \frac{9}{10} = 0.9.
$$

For the sake of simplicity, the other results are given while omitting the computation process. The similarity of the roots is sim(8, 7) = simC(travel, tour) = 1, and the other non-zero similarities are sim(6, 6) = simC(visitor, tourist) = 0.9, sim(5, 4) = simC(traffic, transport) = 0.85, sim(4, 3) = simC(land, road) = 0.7, and sim(3, 1) = simC(bus, bus) = 1. The rest are equal to zero. The results for S_TREE and MAP_TREE obtained using the mapping-based tree similarity algorithm are shown in Tables A.1 and A.2. The similarity is S_TREE(8, 7) = 4.45 and MAP_TREE(8, 7) = MAP_SUBTREE(8, 7). The travel ontology alignment is {(bus, bus) (land, road) (traffic, transport) (visitor, tourist) (travel, tour)}.

Table A.2
Mapping set of trees (MAP_TREE), columns 1 to 5.

M_T | 1 | 2 | 3 | 4 | 5
1 | / | / | / | / | /
2 | / | / | / | / | /
3 | (bus, bus) | / | (bus, bus) | (bus, bus) | /
4 | (bus, bus) | / | (bus, bus) (land, road) | (bus, bus) (land, road) | /
5 | (bus, bus) | / | (bus, bus) (land, road) | (bus, bus) (land, road) (traffic, transport) | /
6 | / | / | / | / | /
7 | / | / | / | / | /
8 | (bus, bus) | / | (bus, bus) (land, road) | (bus, bus) (land, road) (traffic, transport) | /

Fig. 6. Property and class tree matching in the function design ontology.
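The property-overlap computation of simC(visitor, tourist) in Example 1 can be reproduced with a few lines of Python. The function name `property_overlap` is hypothetical; the property lists are the ones given in the text.

```python
def property_overlap(props_a, props_b):
    """|A ∩ B| / |A ∪ B| for two collections of property names."""
    a, b = set(props_a), set(props_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


visitor = ["name", "sex", "ID", "age", "job", "hobby",
           "telephone", "e-mail", "address"]
tourist = visitor + ["salary"]

sim_visitor_tourist = property_overlap(visitor, tourist)  # 9 shared / 10 total
```

The nine shared properties out of ten distinct ones give the value 0.9 used for sim(6, 6).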

Example 2. The function ontology and property vocabulary for this example were set up for product design according to the function taxonomy from the National Institute of Standards and Technology [32] in the Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology. The class and the property tree branches in the function ontology are shown in Fig. 6.

The similarity between property concepts simP() is computed using Eq. (1):

$$
\mathrm{dep}(\text{Input}) = 1, \quad \mathrm{dep}(\text{Device}) = 2, \quad D = 2,
$$

$$
\mathrm{simP}(\text{Input}, \text{Device}) = \frac{1}{1 + \exp[\mathrm{dep}(\text{Input})/D]} + \frac{1}{1 + \exp[\mathrm{dep}(\text{Device})/D]} = 0.65.
$$

Table A.2 (continued)
Mapping set of trees (MAP_TREE), columns 6 and 7.

M_T | 6 | 7
1 | / | /
2 | / | /
3 | / | (bus, bus)
4 | / | (bus, bus) (land, road)
5 | / | (bus, bus) (land, road) (traffic, transport)
6 | (visitor, tourist) | (visitor, tourist)
7 | / | /
8 | (visitor, tourist) | (bus, bus) (land, road) (traffic, transport) (visitor, tourist) (travel, tour)


Table B.1
Matrix of the similarity of individual property concepts (simP()).

         j = 0  j = 1  j = 2  j = 3  j = 4  j = 5  j = 6  j = 7  j = 8  j = 9  j = 10  j = 11  j = 12
i = 0    0.36   0.36   0.3    0.3    0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 1    0.3    0.3    0.24   0.24   0.3    0.3    0.39   0.3    0.3    0.39   0.39    0.39    0.5
i = 2    0.3    0.3    0.24   0.24   0.3    0.3    0.39   0.3    0.3    0.39   0.39    0.39    0.5
i = 3    0.3    0.3    0.24   0.24   0.3    0.3    0.39   0.3    0.3    0.39   0.39    0.39    0.5
i = 4    0.36   0.36   0.3    0.3    0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 5    0.45   0.45   0.39   0.39   0.45   0.45   0.54   0.45   0.45   0.54   0.54    0.54    0.65
i = 6    0.36   0.36   0.3    0.3    0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 7    0.36   0.36   0.3    0.3    0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 8    0.36   0.36   0.3    0.3    0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 9    0.36   0.36   0.3    0.3    0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 10   0.45   0.45   0.39   0.39   0.45   0.45   0.54   0.45   0.45   0.54   0.54    0.54    0.65
i = 11   0.56   0.56   0.5    0.5    0.56   0.56   0.65   0.56   0.56   0.65   0.65    0.65    0.76

Table B.2
Matrix of the similarity of property trees (S_TREE).

         j = 0  j = 1  j = 2  j = 3  j = 4  j = 5  j = 6  j = 7  j = 8  j = 9  j = 10  j = 11  j = 12
i = 0    0.36   0.36   0      0      0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 1    0      0      0      0      0      0      0.39   0      0      0.39   0.39    0.39    0.5
i = 2    0      0      0      0      0      0      0.39   0      0      0.39   0.39    0.39    0.5
i = 3    0      0      0      0      0      0      0.39   0      0      0.39   0.39    0.39    0.5
i = 4    0.36   0.36   0      0      0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    1.73
i = 5    0.45   0.45   0.39   0.39   0.45   0.45   1.26   0.45   0.45   1.26   0.54    0.54    1.55
i = 6    0.36   0.36   0      0      0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 7    0.36   0.36   0      0      0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 8    0.36   0.36   0      0      0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 9    0.36   0.36   0      0      0.36   0.36   0.45   0.36   0.36   0.45   0.45    0.45    0.56
i = 10   0.45   0.45   0.39   0.39   0.45   0.45   1.98   0.45   0.45   1.26   0.54    0.54    2.45
i = 11   0.56   0.56   0.5    0.5    1.34   0.56   1.55   0.56   0.56   1.55   0.65    0.65    3.28

Table B.3
Mapping set of property trees (MAP_TREE).

      | j = 0 | j = 1 | j = 2 | j = 3 | j = 4 | j = 5 | j = 6 | j = 7 | j = 8 | j = 9 | j = 10 | j = 11 | j = 12
i = 0 | (Artificial, Capacity) | (Artificial, Texture) | 0 | 0 | (Artificial, Color) | (Artificial, High) | (Artificial, Device) | (Artificial, Power) | (Artificial, Redirect) | (Artificial, Parameter) | (Artificial, Form) | (Artificial, Method) | (Artificial, Input)
i = 1 | 0 | 0 | 0 | 0 | 0 | 0 | (Woody, Device) | 0 | 0 | (Woody, Parameter) | (Woody, Form) | (Woody, Method) | (Woody, Input)
i = 2 | 0 | 0 | 0 | 0 | 0 | 0 | (Petrous, Device) | 0 | 0 | (Petrous, Parameter) | (Petrous, Form) | (Petrous, Method) | (Petrous, Input)
i = 3 | 0 | 0 | 0 | 0 | 0 | 0 | (Streams, Device) | 0 | 0 | (Streams, Parameter) | (Streams, Form) | (Streams, Method) | (Streams, Input)
i = 4 | (Nature, Capacity) | (Nature, Texture) | 0 | 0 | (Nature, Color) | (Nature, High) | (Nature, Device) | (Nature, Power) | (Nature, Redirect) | (Nature, Parameter) | (Nature, Form) | (Nature, Method) | (Nature, Input) (Woody, Device) (Petrous, Parameter) (Streams, Form)
i = 5 | (Data resource, Capacity) | (Data resource, Texture) | (Data resource, Bright) | (Data resource, Dark) | (Data resource, Color) | (Data resource, High) | (Data resource, Device) (Artificial, Capacity) (Nature, Texture) | (Data resource, Power) | (Data resource, Redirect) | (Data resource, Parameter) (Artificial, Power) (Nature, Redirect) | (Data resource, Form) | (Data resource, Method) | (Data resource, Input) (Artificial, Device) (Nature, Parameter)
i = 6 | (Literal, Capacity) | (Literal, Texture) | 0 | 0 | (Literal, Color) | (Literal, High) | (Literal, Device) | (Literal, Power) | (Literal, Redirect) | (Literal, Parameter) | (Literal, Form) | (Literal, Method) | (Literal, Input)
i = 7 | (Digital, Capacity) | (Digital, Texture) | 0 | 0 | (Digital, Color) | (Digital, High) | (Digital, Device) | (Digital, Power) | (Digital, Redirect) | (Digital, Parameter) | (Digital, Form) | (Digital, Method) | (Digital, Input)
i = 8 | (Flux, Capacity) | (Flux, Texture) | 0 | 0 | (Flux, Color) | (Flux, High) | (Flux, Device) | (Flux, Power) | (Flux, Redirect) | (Flux, Parameter) | (Flux, Form) | (Flux, Method) | (Flux, Input)
i = 9 | (Table, Capacity) | (Table, Texture) | 0 | 0 | (Table, Color) | (Table, High) | (Table, Device) | (Table, Power) | (Table, Redirect) | (Table, Parameter) | (Table, Form) | (Table, Method) | (Table, Input)


Table C.1
Matrix of the similarity of individual class concepts (simC()).

         j = 0  j = 1  j = 2  j = 3  j = 4  j = 5  j = 6  j = 7  j = 8  j = 9  j = 10
i = 0    0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 1    0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 2    0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 3    0.29   0.29   0.55   0.67   0.18   0.18   0.18   1      0.29   0.6    0.5
i = 4    0.22   0.22   0.91   0.9    0.13   0.13   0.13   0.6    0.22   1      0.83
i = 5    0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 6    0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 7    0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 8    0.32   0.32   1      0.82   0.125  0.125  0.125  0.55   0.32   0.91   0.92
i = 9    0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 10   0.25   0.25   0.73   0.89   0.15   0.15   0.15   0.75   0.25   0.89   0.67
i = 11   0.24   0.24   0.82   1      0.14   0.14   0.14   0.7    0.24   0.9    0.75
i = 12   0.4    0.4    0.92   0.75   0.35   0.35   0.35   0.5    0.4    0.83   1

Table C.2
Matrix of the similarity of class trees (S_TREE).

         j = 0  j = 1  j = 2  j = 3  j = 4  j = 5  j = 6  j = 7  j = 8  j = 9  j = 10
i = 0    0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 1    0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 2    0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 3    0.29   0.29   0.55   0.67   0      0      0      1      0.29   1      1
i = 4    0      0      1.41   0.9    0      0      0      0.6    0      2.25   3.34
i = 5    0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 6    0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 7    0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 8    0.32   0.32   1.5    0.82   0      0      0      0.55   0.32   1.91   3.43
i = 9    0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 10   0.25   0.25   0.73   0.89   0      0      0      0.75   0.25   0.89   0.67
i = 11   0      0      1.32   1      0      0      0      0.7    0      1.9    2.53
i = 12   0.4    0.4    1.5    1      0.35   0.35   0.35   1      0.4    2.25   5.75

Table B.3 (continued)

       | j = 0 | j = 1 | j = 2 | j = 3 | j = 4 | j = 5 | j = 6 | j = 7 | j = 8 | j = 9 | j = 10 | j = 11 | j = 12
i = 10 | (Input parameter, Capacity) | (Input parameter, Texture) | 0 | 0 | (Input parameter, Color) | (Input parameter, High) | (Input parameter, Device) (Literal, Capacity) (Digital, Texture) (Flux, Color) (Table, High) | (Input parameter, Power) | (Input parameter, Redirect) | (Input parameter, Parameter) (Literal, Power) (Digital, Redirect) | (Input parameter, Form) | (Input parameter, Method) | (Input parameter, Input) (Literal, Device) (Digital, Parameter) (Flux, Form) (Table, Method)
i = 11 | (Input, Capacity) | (Input, Texture) | (Input, Bright) | (Input, Dark) | (Input, Color) (Data resource, Bright) (Input parameter, Dark) | (Input, High) | (Input, Device) (Data resource, Capacity) (Input parameter, Texture) | (Input, Power) | (Input, Redirect) | (Input, Parameter) (Data resource, Power) (Input parameter, Redirect) | (Input, Form) | (Input, Method) | (Input, Input) (Data resource, Device) (Artificial, Capacity) (Nature, Texture) (Input parameter, Parameter) (Literal, Power) (Digital, Redirect)


The other similarities of the property concepts are found similarly, as shown in Table B.1.
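The depth-based computation of simP() can be sketched as follows. The form of the expression is taken from the worked value simP(Input, Device) in Example 2, since Eq. (1) itself is defined earlier in the paper; the function names `depth_term` and `sim_p` are hypothetical.

```python
import math


def depth_term(depth, D):
    """One summand of the worked example: 1 / (1 + exp(depth / D))."""
    return 1.0 / (1.0 + math.exp(depth / D))


def sim_p(dep_a, dep_b, D):
    """Depth-based similarity of two property concepts."""
    return depth_term(dep_a, D) + depth_term(dep_b, D)


# dep(Input) = 1, dep(Device) = 2, D = 2, as in Example 2.
input_device = round(sim_p(1, 2, 2), 2)
# Two concepts at depth 1, e.g. the pair (Input, Input).
input_input = round(sim_p(1, 1, 2), 2)
```

Rounded to two decimals, these evaluate to 0.65 and 0.76, matching the entries for (Input, Device) and (Input, Input) in Table B.1.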

The similarity of the property trees S_TREE() is 3.28. The alignment of the property trees MAP_TREE() is {(Input, Input) (Data resource, Device) (Artificial, Capacity) (Nature, Texture) (Input parameter, Parameter) (Literal, Power) (Digital, Redirect)}. The detailed results from the mapping-based tree similarity algorithm are shown in Tables B.2 and B.3.

The property pair (pci, pcj) in the above alignment of property trees means that pci = pcj is used in the following computation of the class similarity. The similarity between class concepts simC() is computed using Eq. (4):

$$
\mathrm{simC}(\text{Storage}, \text{Sink}) = \mathrm{simI}(\text{Storage} - \mathrm{I1},\ \text{Sink} - \mathrm{I1}) = \frac{9}{10} = 0.9.
$$

Similarly, the other similarities are as shown in Table C.1. The similarity of the class trees S_TREE() is 5.75. The alignment of the class trees MAP_TREE() is {(Usage function, Usage function) (Sink, Sink) (Source, Source) (Storage, Storage) (Empty, Empty) (Absorb, Consume) (Add, Extract) (Create, Emit)}. The detailed results obtained using the mapping-based tree similarity algorithm are given in Tables C.2 and C.3.

Table C.3
Mapping set of class trees (MAP_TREE).

       | j = 0 | j = 1 | j = 2 | j = 3 | j = 4 | j = 5 | j = 6 | j = 7 | j = 8 | j = 9 | j = 10
i = 0 | (Absorb, Extract) | (Absorb, Emit) | (Absorb, Source) | (Absorb, Storage) | 0 | 0 | 0 | (Absorb, Empty) | (Absorb, Consume) | (Absorb, Sink) | (Absorb, Usage function)
i = 1 | (Destroy, Extract) | (Destroy, Emit) | (Destroy, Source) | (Destroy, Storage) | 0 | 0 | 0 | (Destroy, Empty) | (Destroy, Consume) | (Destroy, Sink) | (Destroy, Usage function)
i = 2 | (Consume, Extract) | (Consume, Emit) | (Consume, Source) | (Consume, Storage) | 0 | 0 | 0 | (Consume, Empty) | (Consume, Consume) | (Consume, Sink) | (Consume, Usage function)
i = 3 | (Empty, Extract) | (Empty, Emit) | (Empty, Source) | (Empty, Storage) | 0 | 0 | 0 | (Empty, Empty) | (Empty, Consume) | (Empty, Empty) | (Empty, Empty)
i = 4 | 0 | 0 | (Sink, Source) (Absorb, Extract) (Destroy, Emit) | (Sink, Storage) | 0 | 0 | 0 | (Sink, Empty) | 0 | (Sink, Sink) (Absorb, Consume) (Empty, Empty) | (Sink, Usage function) (Absorb, Storage) (Destroy, Sink) (Consume, Source)
i = 5 | (Add, Extract) | (Add, Emit) | (Add, Source) | (Add, Storage) | 0 | 0 | 0 | (Add, Empty) | (Add, Consume) | (Add, Sink) | (Add, Usage function)
i = 6 | (Create, Extract) | (Create, Emit) | (Create, Source) | (Create, Storage) | 0 | 0 | 0 | (Create, Empty) | (Create, Consume) | (Create, Sink) | (Create, Usage function)
i = 7 | (Export, Extract) | (Export, Emit) | (Export, Source) | (Export, Storage) | 0 | 0 | 0 | (Export, Empty) | (Export, Consume) | (Export, Sink) | (Export, Usage function)
i = 8 | (Source, Extract) | (Source, Emit) | (Source, Source) (Add, Extract) (Create, Emit) | (Source, Storage) | 0 | 0 | 0 | (Source, Empty) | (Source, Consume) | (Source, Sink) (Add, Empty) (Create, Consume) | (Source, Usage function) (Add, Storage) (Create, Sink) (Export, Source)
i = 9 | (Store, Extract) | (Store, Emit) | (Store, Source) | (Store, Storage) | 0 | 0 | 0 | (Store, Empty) | (Store, Consume) | (Store, Sink) | (Store, Usage function)
i = 10 | (Collect, Extract) | (Collect, Emit) | (Collect, Source) | (Collect, Storage) | 0 | 0 | 0 | (Collect, Empty) | (Collect, Consume) | (Collect, Sink) | (Collect, Usage function)
i = 11 | 0 | 0 | (Storage, Source) (Store, Extract) (Collect, Emit) | (Storage, Storage) | 0 | 0 | 0 | (Storage, Empty) | 0 | (Storage, Sink) (Store, Empty) (Collect, Consume) | (Storage, Usage function) (Store, Storage) (Collect, Sink)
i = 12 | (Usage function, Extract) | (Usage function, Emit) | (Source, Source) (Add, Extract) (Create, Emit) | (Usage function, Storage) | (Usage function, Resign) | (Usage function, Excavate) | (Usage function, Fatuous) | (Empty, Empty) | (Usage function, Consume) | (Sink, Sink) (Empty, Empty) (Absorb, Consume) | (Usage function, Usage function) (Sink, Sink) (Source, Source) (Storage, Storage) (Empty, Empty) (Absorb, Consume) (Add, Extract) (Create, Emit)

8. Discussion and conclusions

Concepts in ontology are divided into two categories: class and property. Property and class trees are the most important parts of an ontology, and matching them through metadata alignment achieves most of the ontology alignment. In this study, the similarity between individual property concepts is initially set up using depth, and the similarity between individual class concepts is set up using instance or property overlap. Then, the set of matched nodes (i.e. the output of the mapping-based algorithm applied to the concept trees) is regarded as the ontology alignment.


The analysis of the proposed algorithm and the examples given shows that using the mapping-based algorithm for concept tree alignment exhibits the following features. (1) It considers the semantic similarities of nodes. (2) It considers the ancestor–descendant relations among concepts. (3) It reduces edit operations. (4) It considers overall isomorphic matching between two concept systems.

Compared with the edit distance method, the sum of similarities among matched nodes based on the maximum mapping theorem is a simpler means of comparing concept trees. This is because it considers matched nodes and omits inserted and deleted nodes. The method conforms more to intuitive human thinking with regard to comparing things. Therefore, this method is more concise and realistic.

However, the algorithm has to individually align class and property trees from the source ontology to the target ontology. Its application is also limited to class and property types; the algorithm is not suitable for other ontological elements such as instances and relations.

In future studies, the similarity among individual concepts will be modeled more accurately. Also, mappings between other ontological elements, such as instances, relationships, and rules, will be considered in addition to classes and properties.


Acknowledgments

This work was supported by the Shandong Provincial Science and Technology Development Plan (2010G0020807), the National Natural Science Foundation of China (61272094), the Natural Science Foundation of Shandong Province (ZR2010QL01), and the Shandong Provincial Key Laboratory Project.

References

[1] John Hebeler, Matthew Fisher, Ryan Blace, Andrew Perez-Lopez, Semantic Web Programming, Wiley Publishing, Indiana, 2009.
[2] OAEI, 2013. <http://oaei.ontologymatching.org>.
[3] Cesar Sanin, Edward Szczerbicki, Carlos Toro, An OWL ontology of set of experience knowledge structure, Journal of Universal Computer Science 2 (2007) 209–223.
[4] Yunjiao Xue, Chun Wang, Hamada H. Ghenniwa, Weiming Shen, A tree similarity measuring method and its application to ontology comparison, The Journal of Universal Computer Science 15 (2009) 1766–1781.
[5] Miyoung Cho, Hanil Kim, Pankoo Kim, A new method for ontology merging based on concept using WordNet, in: International Conference on Advanced Communication Technology (ICACT), 2006, pp. 1573–1576.
[6] Sheng Li, Heping Hu, Xian Hu, An ontology mapping method based on tree structure, Semantics, Knowledge and Grids (SKG) (2006) 87–98.
[7] Wu Dianshuang, Lu Jie, Guangquan Zhang, Hua Lin, A fuzzy matching based recommendation approach for mobile products/services, Intelligent Systems Design and Applications (ISDA) (2010) 645–650.
[8] Robert A. Wagner, The string-to-string correction problem, Journal of the ACM 21 (1974) 168–173.
[9] Kuochung Tai, The tree-to-tree correction problem, Journal of the ACM 26 (1979) 422–433.
[10] Kaizhong Zhang, A new editing based distance between unordered labeled trees, Combinatorial Pattern Matching (CPM) (1993) 254–265.
[11] Kaizhong Zhang, Dennis Shasha, Simple fast algorithms for the editing distance between trees and related problems, SIAM Journal on Computing 18 (1989) 1245–1262.
[12] Philip N. Klein, Computing the edit-distance between unrooted ordered trees, in: European Symposium on Algorithms (ESA), 1998, pp. 91–102.
[13] Serge Dulucq, Hélène Touzet, Analysis of tree edit distance algorithms, Journal of Discrete Algorithms 3 (2005) 448–471.
[14] Erik D. Demaine, Shay Mozes, Benjamin Rossman, Oren Weimann, An optimal decomposition algorithm for tree edit distance, ACM Transactions on Algorithms 6 (2009) 1–19.
[15] Prasenjit Mitra, Gio Wiederhold, Resolving terminological heterogeneity in ontologies, in: European Conference on Artificial Intelligence (ECAI), 2002, pp. 45–50.
[16] Martin Warin, Henrik Oxhammar, Martin Volk, Enriching an ontology with WordNet based on similarity measures, in: MEANING–2005 Workshop, 2005.
[17] Alexander Maedche, Steffen Staab, Measuring similarity between ontologies, Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web (EKAW) 2473 (2002) 251–263.
[18] Cao Zewen, Qian Jie, Zhang Weiming, Su Deng, A composite approach for concept similarity computation, Computer Science 3 (2007) 171–175.
[19] Sushama Prasad, Yun Peng, Timothy Finin, A tool for mapping between two ontologies using explicit information, in: Autonomous Agents and Multi-agent Systems (AAMAS), 2002.
[20] Qun Liu, Sujian Li, Word similarity computing based on How-net, Computational Linguistics and Chinese Language Processing 7 (2002) 59–76.
[21] Rasmus Knappe, Henrik Bulskov, Troels Andreasen, Similarity graphs, Foundations of Intelligent Systems 2871 (2003) 668–672.
[22] Wu Jian, Wu Zhaohui, Li Ying, Deng Shuiguang, Web service discovery based on ontology and similarity of words, Chinese Journal of Computers 4 (2005) 595–602.
[23] Satoshi Sekine, Kiyoshi Sudo, Takano Ogino, Statistical matching of two ontologies, Special Interest Group on the Lexicon of the Association for Computational Linguistics (SIGLEX ACL) (1999) 69–73.
[24] Pepijn Visser, Valentina Tamma, An experience with ontology clustering for information integration, in: International Joint Conference on Artificial Intelligence (IJCAI), 1999.
[25] Jayant Madhavan, Philip A. Bernstein, Erhard Rahm, Generic schema matching with Cupid, Proceedings of the VLDB (2001) 49–58.
[26] Hong-Hai Do, Erhard Rahm, COMA – a system for flexible combination of schema matching approaches, Proceedings of the VLDB (2002) 610–621.
[27] Rose Dieng, Stefan Hug, Comparison of personal ontologies represented through conceptual graphs, European Conference on Artificial Intelligence (ECAI) (1998) 341–345.
[28] Sergey Melnik, Hector Garcia-Molina, Erhard Rahm, Similarity flooding: a versatile graph matching algorithm and its application to schema matching, International Conference of Data Engineering (ICDE) (2002) 117–128.
[29] Fausto Giunchiglia, Mikalai Yatskevich, Enrico Giunchiglia, Semantic matching, Knowledge Engineering Review 3 (2007) 265–280.
[30] Wu Dianshuang, Lu Jie, Guangquan Zhang, Similarity measure models and algorithms for hierarchical cases, Expert Systems with Applications 38 (2011) 15049–15056.
[31] Gary Chartrand, Ping Zhang, A First Course in Graph Theory, Dover Publications, New York, 2012.
[32] Simon Szykman, Janusz W. Racz, Ram D. Sriram, The representation of function in computer-based design, ASME Design Engineering Technical Conferences (1999) 1–14.