

Information Sciences 180 (2010) 3248–3257

Contents lists available at ScienceDirect

Information Sciences

journal homepage: www.elsevier.com/locate/ins

Reusing ontology mappings for query routing in semantic peer-to-peer environment

Jason J. Jung *

Knowledge Engineering Laboratory, Department of Computer Engineering, Yeungnam University, Dae-Dong, Gyeongsan 712-749, Republic of Korea

Article info

Article history:
Received 23 July 2009
Received in revised form 1 April 2010
Accepted 17 April 2010

Keywords:
Query transformation
Semantic peer-to-peer networks
Ontology mapping
Mapping composition

Abstract

To efficiently support automated interoperability between ontology-based information systems in distributed environments, the semantic heterogeneity problem has to be dealt with. To do so, traditional approaches have acquired and employed explicit mappings between the corresponding ontologies. Usually, these mappings can only be obtained from human domain experts. However, it is too expensive and time-consuming to collect all possible mapping results on distributed information systems. More seriously, as the number of systems in a large-scale peer-to-peer (P2P) network increases, the efficiency of the ontology mapping decreases exponentially. Therefore, in this paper, we propose a novel semantic P2P system, which is capable of (i) sharing and exchanging existing mappings among peers, and (ii) composing shared mappings to build a path between two systems. Given two arbitrary peers (i.e., source and destination), the proposed system can provide indirect ontology mappings to make them interoperable. In particular, we have focused on query-based communication for evaluating the proposed ontology mapping composition system. Once direct ontology mappings are collected from candidate peers, a given query can be (i) segmented into a set of sub-queries, and (ii) transformed into another query. With respect to precision, our experiments have shown an improvement of about 42.5% compared to the keyword-based query searching method.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Knowledge in many information systems should be efficiently managed by various knowledge processes, i.e., storage, archiving, retrieval and integration. As users participate in ever more specialized and diverse tasks, many studies have proposed various methodologies and architectures for designing and implementing knowledge management (KM) systems. In particular, along with the semantic web paradigm, such KM systems in distributed environments have shown the high importance of knowledge sharing. Interlinking of KM systems is being attempted in order to collect the maximal amount of relevant resources from other available sources.

One of the most well-known approaches is to construct and exploit domain ontologies. Ontology-based information systems have been developed and employed in various domains, e.g., e-learning [20] and telecommunication [13], to efficiently manage local resources and information.

For many purposes [18,21,16,20], such knowledge-based systems have been interlinked. In such a distributed environment, these information systems have to be able to interact automatically in order to obtain the maximal amount of resources [19]. There have been several projects such as Piazza [7] and Bibster [6]. In this study, we focus on ontology-based information systems where such mutual interactions can be based on ontology mapping processes.

0020-0255/$ - see front matter © 2010 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2010.04.018

* Tel.: +82 53 810 3534.
E-mail addresses: [email protected], j2jung@intelligent.pe.kr


Fig. 1. Automatic semantic interoperability between heterogeneous information systems A and B. (Each information system is annotated with its own ontology; the mapping between Ontology A and Ontology B enables interoperability.)


However, local ontologies, as domain knowledge, are generally constructed by domain experts and are also based on local database schemas [8]. The semantics of each information system is unique, and the systems are therefore heterogeneous. In [9], we have already noted that each information system tends to include its own information, which is

• related to specific and unique topics,
• represented as consistent linguistic terminologies,
• organized by local database schema, and
• annotated with local metadata.

In summary, the resources in an information system are semantically encoded by referring to the corresponding local ontologies.

Because of semantic heterogeneity problems among local ontologies, it is difficult for information systems in distributed environments to be made semantically interoperable, even though these systems are supposed to interact automatically by sharing their own resources and knowledge.

The most well-known solution to these problems is to find the mapping information between the ontologies. As shown in Fig. 1, given two ontologies $O_A$ and $O_B$, an ontology mapping result $M(O_A, O_B)$ is a prerequisite for interoperability between systems A and B.

More importantly, we have already realized that manual ontology mappings by human users are much more precise than automatic mapping results by software tools, but also more expensive. These problems result from the lack of domain expertise as well as the complex internal structure of the ontologies (e.g., a large number of concepts and properties). Thus, a number of ontology mapping algorithms (e.g., manual, automatic, and semi-automatic approaches) for efficiently discovering correspondences between ontologies have been proposed [4].

However, most of the mapping tools and algorithms have a scalability problem with a large number of distributed ontologies. While they can obtain explicit and direct mapping results from two given source ontologies, they scale poorly as the number of ontologies in a general distributed environment increases.

To deal with this problem, we focus on composing an indirect mapping by reusing the mapping information previously obtained and available on the network. That is, given two ontologies $O_A$ and $O_B$, we can make two information systems $S_A$ and $S_B$ interoperable by composing two existing mappings $M(O_A, O_C)$ and $M(O_C, O_B)$, instead of finding a direct mapping $M(O_A, O_B)$.

Therefore, in this paper, the ontologies and mapping information are formalized with additional definitions for the mapping composition. To this end, novel performance measures for ontology mapping (i.e., a semantic coverage ratio and two different heuristics for selecting transformation paths) are introduced. To evaluate the performance of sharing and composing mapping results, a multi-agent platform has been employed. Any two heterogeneous agents on the multi-agent platform can communicate by a query-answering process, so we can measure how precisely the proposed mapping composition is conducted on the distributed network.

The outline of this paper is as follows. In Section 2, the major components for building ontology-based information systems are proposed and a similarity-based ontology mapping algorithm is investigated. Section 3 shows how to share and compose the existing mapping information. Section 4 presents the experimental results collected in the evaluation. Finally, Section 5 gives the concluding remarks of this work and the future research plan.

2. Ontology-based information systems

Ontology-based information systems are supposed to be machine processable. In this study, the system is mainly composed of two parts: (i) a number of distributed ontologies $O$, and (ii) a mapping set with neighbors. The number of ontologies is the same as the number of information systems on the network, because we assume that each information system has only one ontology.


Definition 1 (Ontology). An ontology $O$ is represented as

$$ O := (C, R, E_R, I_C) \tag{1} $$

where $C$ and $R$ are a set of classes (or concepts) and a set of relations (e.g., equivalence, subsumption, disjunction, etc.), respectively. $E_R \subseteq C \times C$ is a set of relationships between classes, represented as a set of triples $\{\langle c_i, r, c_j \rangle \mid c_i, c_j \in C, r \in R\}$. $I_C$ is a power set of instance sets of a class $c_i \in C$.

These ontologies are grounded with a set of instances. In terms of description logic, $I_C$ can be replaced with an A-Box.

Using any mapping tools and algorithms, each mapping result can be represented as a set of correspondences between ontology entities with confidence values. Here, the ontologies are defined in a simplified manner, because we want to apply the ontologies and correspondences to various practical applications.

Definition 2 (Correspondences). Given two ontologies $O$ and $O'$, a set of correspondences is given by

$$ M(O, O') = \{\langle e, e', r_M, CF \rangle \mid e \in O, e' \in O', r_M, CF \in [0, 1]\} \tag{2} $$

where $e$ and $e'$ are a pair of matched entities. The mapping relationship is $r_M = \{\equiv, \sqsubseteq, \sqsupseteq, \perp\}$, where the elements indicate equivalence, subset, superset, and disjoint relationships, respectively. $CF$ indicates a confidence value of the pair.

The confidence value expresses how precisely the two ontology entities match.
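As an illustration of Definition 2, a correspondence can be held in a small record type. The following is a minimal sketch only: the class name `Correspondence`, the ASCII relation names, and the helper `best_match` are hypothetical choices made for this example, and the two sample entries are rows taken from Table 1.

```python
from dataclasses import dataclass

# ASCII stand-ins for the four mapping relations of Definition 2
# (these names are an assumption of this sketch).
RELATIONS = {"equivalent", "subset", "superset", "disjoint"}


@dataclass(frozen=True)
class Correspondence:
    """One element <e, e', r_M, CF> of a mapping M(O, O')."""
    source: str        # entity e in ontology O
    target: str        # entity e' in ontology O'
    relation: str      # one of RELATIONS
    confidence: float  # CF in [0, 1]

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def best_match(mapping, entity):
    """Return the highest-confidence correspondence for `entity`, or None."""
    candidates = [c for c in mapping if c.source == entity]
    return max(candidates, key=lambda c: c.confidence, default=None)


# Two rows of Table 1, encoded as correspondences.
mapping_ab = {
    Correspondence("Secretary", "Researcher", "equivalent", 0.3),
    Correspondence("Prof", "Professor", "equivalent", 0.48),
}
```

Keeping the record immutable (`frozen=True`) lets a mapping be stored as a plain set, which is convenient when mappings are later intersected or compared.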

Definition 3 (Confidence value). A confidence value can be computed in several different ways. For example, it can be measured by a string matching function $Dist$:

$$ CF_{\langle e, e' \rangle} = 1 - \frac{Dist(L(e), L(e'))}{\max(L(e), L(e'))} \tag{3} $$

where $L$ is a function that returns the label of an ontology entity.

There are several well-known ontology mapping tools [1,2,15,17]. In this paper, we prefer the similarity-based ontology mapping approach proposed in [5]. It defines similarities (e.g., $Sim_C$, $Sim_R$, $Sim_A$) between classes, relationships, attributes, and instances. It is based on the principle that the greater the similarities between the features of two entities, the greater the similarity between these entities themselves. Given a pair of classes from two different ontologies, the similarity measure $Sim_C$ takes a value in [0, 1]. The similarity $Sim_C$ between $c$ and $c'$ is defined as

$$ Sim_C(c, c') = \sum_{E \in N(C)} p_E^C \, MSim_Y(E(c), E(c')) \tag{4} $$

where $N(C) \subseteq \{E_1, \ldots, E_n\}$ is the set of all relationships in which classes participate (for instance, subclasses, instances, or attributes). The weights $p_E^C$ are normalized (i.e., $\sum_{E \in N(C)} p_E^C = 1$).
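The label-based confidence value of Eq. (3) can be sketched as follows. The paper does not fix the string matching function, so this example assumes that $Dist$ is the Levenshtein edit distance and that the maximum in the denominator is taken over the label lengths; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def confidence(label_e: str, label_e2: str) -> float:
    """Eq. (3) under the assumptions above: 1 - Dist / max label length."""
    longest = max(len(label_e), len(label_e2))
    if longest == 0:
        return 1.0  # two empty labels are treated as identical
    return 1.0 - levenshtein(label_e, label_e2) / longest
```

Because the edit distance never exceeds the length of the longer label, the result always stays within [0, 1]. Other string measures would give different values, so this sketch is not expected to reproduce the confidence values reported by any particular mapping tool.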

If we restrict ourselves to class labels ($L$) and the three relationships in $N(C)$, which are the superclass ($E_{sup}$), the subclass ($E_{sub}$) and the sibling class ($E_{sib}$), Eq. (4) can be rewritten as

$$ Sim_C(c, c') = p_L^C \, sim_L(L(c), L(c')) + p_{sup}^C \, MSim_C(E_{sup}(c), E_{sup}(c')) + p_{sub}^C \, MSim_C(E_{sub}(c), E_{sub}(c')) + p_{sib}^C \, MSim_C(E_{sib}(c), E_{sib}(c')) \tag{5} $$

where the set functions $MSim_C$ compute the similarity of two entity collections.

As a matter of fact, the distance between two sets of classes can be established by finding the maximal matching that maximizes the summed similarity between the classes:

$$ MSim_C(S, S') = \frac{\max\left(\sum_{\langle c, c' \rangle \in Pairing(S, S')} Sim_C(c, c')\right)}{\max(|S|, |S'|)} \tag{6} $$

in which $Pairing$ provides a matching of the two sets of classes. Methods such as the Hungarian method allow us to directly find the pairing which maximizes similarity. The Hungarian method is a combinatorial optimization algorithm which solves the assignment problem in polynomial time, and it anticipated later primal–dual methods [14]. The OLA algorithm is an iterative algorithm that computes the similarities between two ontological entities [5]. This measure is normalized because, if $Sim_C$ is normalized, the divisor is always greater than or equal to the dividend.
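To make Eq. (6) concrete, the following sketch computes the set similarity by exhaustively trying every one-to-one pairing. This brute-force search is only a stand-in for the Hungarian method mentioned above and is practical for very small sets; in practice a polynomial-time assignment solver (for example `scipy.optimize.linear_sum_assignment`) would replace it. The function name `msim` and the `sim` callback are hypothetical.

```python
from itertools import permutations


def msim(set_a, set_b, sim):
    """Eq. (6): best one-to-one pairing score divided by max(|S|, |S'|).

    `sim(x, y)` is any normalized similarity between a member of `set_a`
    and a member of `set_b`.
    """
    a, b = list(set_a), list(set_b)
    if not a or not b:
        return 0.0
    # Let `a` be the smaller collection so that each of its members
    # receives exactly one partner from `b`.
    swapped = len(a) > len(b)
    if swapped:
        a, b = b, a
    score = (lambda x, y: sim(y, x)) if swapped else sim
    best = max(
        sum(score(x, y) for x, y in zip(a, chosen))
        for chosen in permutations(b, len(a))
    )
    return best / max(len(a), len(b))


def exact(x, y):
    """Toy similarity used for the usage example: 1 for equal labels, else 0."""
    return 1.0 if x == y else 0.0
```

For instance, `msim({"a", "b"}, {"a", "b", "c"}, exact)` pairs two of the three members perfectly and divides by the larger set size, giving 2/3.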

A normalized similarity measure can be turned into a distance measure by taking its complement to 1 ($E_C^{dist}(x, y) = 1 - Sim_C(x, y)$). Such a distance introduces a new relation $E_C^{dist}$ in the concept network $C$.

As a simple example, once the two ontologies $O_A$ and $O_B$ in Fig. 2 are mapped, we can obtain the mapping results (indicated as blue arrows¹) listed in Table 1.

Table 1
Mapping results between ontologies $O_A$ and $O_B$.

e          e′              R   CF
Person     People          ≡   0.33
Faculty    Assistant Prof  ≡   0.05
Secretary  Researcher      ≡   0.3
Prof       Professor       ≡   0.48
Full_Prof  Full Prof       ≡   1.0

Fig. 2. An example of similarity-based ontology mapping and three ontology-based information systems on distributed environment.

¹ For interpretation of color in Fig. 2, the reader is referred to the web version of this article.

Finally, given a number of ontology-based systems, we can formulate a distributed ontology-based information system, as follows:

Definition 4 (Distributed ontology-based information system). A distributed ontology-based information system $G$ consists of $N_G$ ontology-based information systems $\{S_1, \ldots, S_{N_G}\}$. Some of the information systems are interlinked with each other. A linkage between $S_i$ and $S_j$ means that mapping information $M(O_i, O_j)$ exists between the corresponding ontologies. Thus, a distributed ontology-based information system $G$ is represented as

$$ G = \{ M(O_i, O_j) \mid T_G(S_i, S_j) \} \tag{7} $$

where the function $T_G$ returns the topological feature indicating whether $S_i$ and $S_j$ are linked or not.

In practice, given a certain distributed ontology-based information system, we can easily understand the topology of the network by $T_G$, because it returns a two-dimensional matrix.

Example 1. Suppose that a distributed ontology-based information system $G_1$ is constructed as shown in Fig. 2. There are only two links, which represent the mapping results, between $S_A$ and $S_B$ and between $S_B$ and $S_C$. Thus, we can obtain a two-dimensional matrix, as follows:

$$ G_1 = \left\{ M(O_A, O_B), M(O_B, O_C) \;\middle|\; T_G = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \right\} \tag{8} $$

Queries can be sent for information sharing between $S_A$ and $S_B$. By referring to the direct mapping $M(O_A, O_B)$, a query q = 'Secretary', which is not understandable in $S_B$, can be rewritten as 'Researcher'.

However, it is still difficult for SA and SC to be made interoperable, because there is no direct mapping between them.

3. Mapping composition for query transformation

In this work, we want to estimate indirect mappings between information systems for which there are no ontology mapping results. To do so, we focus on reusing the existing mapping results and composing them properly along the way from the source information system to the destination. For example, in Fig. 2, even though there is no direct mapping between $S_A$ and $S_C$ (i.e., a query from system $S_A$ is not understandable in $S_C$), we can compose the two mapping results $M(O_A, O_B)$ and $M(O_B, O_C)$.

Definition 5 (Indirect mapping). An indirect mapping $\widetilde{M}$ in a distributed ontology-based information system $G$ is represented as

$$ \widetilde{M}(S_{Src}, S_{Dest}) = \sum_{S_{Src}}^{S_{Dest}} M(O_i, O_j) \tag{9} $$

where $S_{Src}$ and $S_{Dest}$ are the source and destination information systems in $G$, respectively. Here, we can find out whether there is a path between them by repeatedly multiplying the topology matrix $T_G$.
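The path check by repeated multiplication of the topology matrix can be sketched as below. This is an illustrative reading, not the author's implementation: the matrix is treated as a Boolean adjacency matrix, the row and column order ($S_A$, $S_B$, $S_C$) is an assumption of the example, and the sample matrix encodes the two links described in Example 1 (between $S_A$ and $S_B$ and between $S_B$ and $S_C$). The function names are hypothetical.

```python
def bool_matmul(x, y):
    """Boolean product of two square 0/1 matrices."""
    n = len(x)
    return [[int(any(x[i][k] and y[k][j] for k in range(n)))
             for j in range(n)]
            for i in range(n)]


def shortest_hops(topology, src, dest):
    """Smallest k for which the k-th power of the topology matrix links
    src to dest, or None if no composition path exists."""
    n = len(topology)
    power = topology
    # A path without repeated systems has at most n - 1 hops.
    for hops in range(1, n):
        if power[src][dest]:
            return hops
        power = bool_matmul(power, topology)
    return None


# Rows and columns ordered S_A, S_B, S_C (assumed for this sketch).
TOPOLOGY = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
```

With this matrix, `shortest_hops(TOPOLOGY, 0, 2)` reports that $S_C$ is reachable from $S_A$ in two hops, which corresponds to composing $M(O_A, O_B)$ with $M(O_B, O_C)$.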

Now we want to show a query transformation by using the indirect mapping between the ontologies in a distributed environment.


Fig. 3. Mapping composition with semantic coverage: $S_A = \{c_1, \ldots, c_8\}$, $S_B = \{c_9, \ldots, c_{12}\}$, and $S_C = \{c_{13}, \ldots, c_{18}\}$. Two sets of query-activated classes: $C_Q(q_1)$ = {c10 = 'Secretary', c11 = 'Full_Prof'} and $C_Q(q_2)$ = {c3 = 'Researcher', c7 = 'Full Prof'}.

Table 2
Example on query transformation.

Step    Query      Mapping                                Query′
First   c10 ∧ c11  ⟨c10, c3, ≡, 0.45⟩, ⟨c11, c7, ≡, 1.0⟩  c3 ∧ c7
Second  c3 ∧ c7    ⟨c7, c13, ≡, 0.5⟩, ⟨c6, c16, ≡, 0.33⟩  c13


3.1. Semantic query transformation

A query from a source system has to be transformed to make it understandable to the destination system by referring to the composed mapping results. Therefore, we first have to identify the set of query-activated classes $C_Q$.

Definition 6 (Query). A query from an ontology-based information system $S_A$ is represented as

$$ q ::= c \mid \neg q \mid q \wedge q' \mid q \vee q' \tag{10} $$

where $c \in C_A$.

Definition 7 (Query-activated class). Given a query traveling to an ontology-based information system $S_k$, a set of query-activated classes $C_Q(q)$ can be extracted as

$$ C_Q(q) = \{c \mid c \in q, c \in C_A\} \tag{11} $$

For example, suppose that the following query $q_1$, which is written in SparQL², is sent from $S_A$ to $S_C$ in Fig. 2.

    PREFIX abc: <http://intelligent.pe.kr/TestOntology#>
    SELECT ?Secretary ?Full_Prof
    WHERE {
      ?Full_Prof abc:Teach abc:Course .
      ?Secretary abc:Assist abc:Prof .
    }

Thus, as shown in Fig. 3, we can extract the set of query-activated classes $C_Q(q_1)$ = {c10 = 'Secretary', c11 = 'Full_Prof'}, whereas 'Teach' and 'Course' are not query-activated. Consequently, the query q = c10 ∧ c11 can be transformed in two steps, as shown in Table 2.

Here, we can see information loss caused by mismatching during the second step (i.e., c3, which has no correspondence in the second mapping). Discovering the optimal state (i.e., minimizing error propagation) is an important issue, and it will be discussed later.
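The two-step transformation of Table 2, including the loss of c3, can be traced with a short sketch. It assumes that a conjunctive query is represented simply as a list of class identifiers and that each class is rewritten through its highest-confidence correspondence; the function names and the triple layout `(source, target, confidence)` are hypothetical.

```python
def transform(query_classes, mapping):
    """Rewrite each class through its best correspondence.

    Classes without any correspondence cannot be carried over and are
    reported as lost.
    """
    rewritten, lost = [], []
    for cls in query_classes:
        candidates = [(cf, target) for (src, target, cf) in mapping if src == cls]
        if candidates:
            rewritten.append(max(candidates)[1])  # highest confidence wins
        else:
            lost.append(cls)
    return rewritten, lost


def transform_along(query_classes, mappings):
    """Apply a sequence of mappings, collecting every class lost on the way."""
    current, lost_total = list(query_classes), []
    for mapping in mappings:
        current, lost = transform(current, mapping)
        lost_total.extend(lost)
    return current, lost_total


# Correspondences of Table 2 as (source, target, confidence) triples.
STEP_1 = [("c10", "c3", 0.45), ("c11", "c7", 1.0)]
STEP_2 = [("c7", "c13", 0.5), ("c6", "c16", 0.33)]

result, lost = transform_along(["c10", "c11"], [STEP_1, STEP_2])
```

Here `result` holds only c13, and `lost` records c3, the class that has no correspondence in the second step.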

3.2. Semantic coverage

More importantly, we have to take into account more general cases. As shown in Fig. 4, when there are a large number of ontology-based information systems, the network of the distributed ontology-based information system can be more complex. This means that there can be more than one path between two arbitrary information systems (i.e., source and destination). The shortest path is not always the best choice; the choice should instead be based on the ontology mapping condition and the semantics of a given query. For example, if a query should travel from A to E, we have to decide the best composition path $\widetilde{M}(A, E)$ out of the following candidates.

• $P(M(A, C), M(C, E))$,
• $P(M(A, D), M(D, F), M(F, E))$,
• and more.

Fig. 4. A general case with a large number of information systems (nodes A to G).

Table 3
Measuring semantic coverage ratio by two heuristics.

Heuristics  Path1 ($S_A \to S_B$)            Path2 ($S_A \to S_C$)
H1          $s_{q_2}^{H1} = 2/2 = 1$         $s_{q_2}^{H1} = (0.45 + 1)/2 = 0.725$
H2          $s_{q_2}^{H2} = 1/2 = 0.5$       $s_{q_2}^{H2} = 0.5/1 = 0.5$

² SparQL, http://www.w3.org/TR/rdf-sparql-query/.

Thus, we need to decide which path is best for composing mapping results. In this paper, a heuristic approach is exploited, and we want to empirically justify these heuristics. We introduce a semantic coverage ratio for representing two different heuristics (H1 and H2).

Definition 8 (Semantic coverage ratio). A semantic coverage ratio $s_Q$ measures how well the correspondences of a mapping match a given set of query-activated classes. This $s_Q$ can be defined by the following two heuristics:

• H1: the more correspondences are mapped to query-activated classes, the higher the semantic coverage ratio:

$$ s_Q^{H1}(S_{Src}, S_{Dest}) = \frac{|\{c \mid c \in e_{Src}, M(O_{Src}, O_{Dest})\}|}{|C_Q|} \tag{12} $$

• H2: the higher the confidence values of the correspondences, the higher the semantic coverage ratio:

$$ s_Q^{H2}(S_{Src}, S_{Dest}) = \frac{\sum_{e_k \in C_Q} CF_k}{|\{e_k \in C_Q\}|} \tag{13} $$

As an example in Fig. 3, let a query $q_2$ be sent from $S_A$ through either $S_B$ or $S_C$. To decide the better mapping path, we can measure the semantic coverage ratios by using the two heuristics H1 and H2, as shown in Table 3. The implication is that the query should be transformed via $S_B$.
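One possible reading of the two heuristics in Eqs. (12) and (13) is sketched below. It is an interpretation rather than the author's code: H1 is taken as the share of query-activated classes that appear as the source of some correspondence, and H2 as the mean confidence over the query-activated classes that are actually mapped (using the best correspondence per class). The function names are hypothetical, and the sample data are the correspondences listed in Table 2.

```python
def coverage_h1(query_classes, mapping):
    """Eq. (12), as read here: fraction of query-activated classes that
    have at least one correspondence in the mapping."""
    if not query_classes:
        return 0.0
    mapped = {src for (src, _target, _cf) in mapping}
    return sum(1 for c in query_classes if c in mapped) / len(query_classes)


def coverage_h2(query_classes, mapping):
    """Eq. (13), as read here: mean confidence over the query-activated
    classes that are mapped, keeping the best correspondence per class."""
    best = {}
    for src, _target, cf in mapping:
        if src in query_classes:
            best[src] = max(cf, best.get(src, 0.0))
    return sum(best.values()) / len(best) if best else 0.0


# Correspondences of Table 2 as (source, target, confidence) triples.
FIRST_STEP = [("c10", "c3", 0.45), ("c11", "c7", 1.0)]
SECOND_STEP = [("c7", "c13", 0.5), ("c6", "c16", 0.33)]

h1_first = coverage_h1(["c10", "c11"], FIRST_STEP)
h2_first = coverage_h2(["c10", "c11"], FIRST_STEP)
h1_second = coverage_h1(["c3", "c7"], SECOND_STEP)
h2_second = coverage_h2(["c3", "c7"], SECOND_STEP)
```

Under this reading, a mapping that covers every query-activated class scores 1 on H1 regardless of confidence, while H2 rewards mappings whose matched correspondences are reliable even when some classes remain uncovered.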

3.3. Transformation path selection

The best path for transforming a query is selected by serial aggregation of the semantic coverage ratio. Given two information systems $S_{Src}$ and $S_{Dest}$, the aggregated semantic coverage ratio is computed by

$$ s_Q(S_{Src}, S_{Dest}) = \max_{Path_k} \prod_{S_{Src}}^{S_{Dest}} s_Q(S_i, S_j) \tag{14} $$

where $Path_k$ is a set of all possible paths from $S_{Src}$ to $S_{Dest}$.
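The selection rule of Eq. (14) can be illustrated by enumerating every cycle-free path and multiplying the per-hop coverage ratios. The sketch below is a plain exhaustive search under stated assumptions: the link structure covers only the two candidate paths from A to E named in Section 3.2, and all coverage values are made-up numbers chosen for the example, not measurements from the paper. The function names are hypothetical.

```python
from math import prod


def simple_paths(links, src, dest, visited=None):
    """Yield every path from src to dest that visits no system twice."""
    visited = (visited or []) + [src]
    if src == dest:
        yield visited
        return
    for nxt in links.get(src, ()):
        if nxt not in visited:
            yield from simple_paths(links, nxt, dest, visited)


def best_path(links, coverage, src, dest):
    """Eq. (14): the path maximizing the product of per-hop coverage ratios.

    Returns (score, path), or (0.0, None) when dest is unreachable.
    """
    scored = [
        (prod(coverage[(a, b)] for a, b in zip(path, path[1:])), path)
        for path in simple_paths(links, src, dest)
    ]
    return max(scored, default=(0.0, None))


# Hypothetical links and coverage ratios for the two candidate paths A -> E.
LINKS = {"A": ["C", "D"], "C": ["E"], "D": ["F"], "F": ["E"]}
COVERAGE = {
    ("A", "C"): 0.5, ("C", "E"): 0.6,
    ("A", "D"): 0.9, ("D", "F"): 0.8, ("F", "E"): 0.9,
}

score, path = best_path(LINKS, COVERAGE, "A", "E")
```

With these illustrative numbers the three-hop path through D and F is preferred over the two-hop path through C, which matches the observation that the shortest path is not always the best choice. Exhaustive enumeration grows quickly with network size, so a larger network would need pruning or a best-first search.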

4. Experimental results and discussion

In order to evaluate the proposed distributed ontology-based information system, we have built seven ontology-based information systems (i.e., $S_A$ to $S_G$) with linkages, as shown in Fig. 4. All of the mapping results have been manually collected by human experts as a reference mapping [12]. They are used for a comparison with the indirect mappings composed from the existing mappings.

In this work, we have focused on two evaluation issues, which are (i) mapping composition and (ii) transformation path selection. Each of the experimental results was compared with the reference alignments.

4.1. Evaluation on mapping composition

There have been a variety of ontology mapping tools to find the correspondences between two ontologies. We have selected OLA API [3], which is a string similarity-based matching tool. Justifying our choice of matching tool is not critical, because we are only interested in composing the existing mappings obtained from any matching tool. One important practical reason why we used OLA API is that we expect it to be more useful in a large-scale distributed environment, because OLA API has been shown to be highly scalable [10].

By using OLA API [3], we have automatically collected the direct mapping results (i.e., $M$). The mapping results have been composed in all possible cases (i.e., $\widetilde{M}$). The performance of mapping composition has been tested by precision and recall:

$$ Precision = \frac{|M \cap \widetilde{M}|}{|\widetilde{M}|} \quad \text{and} \quad Recall = \frac{|M \cap \widetilde{M}|}{|M|} \tag{15} $$

The results of three cases (i.e., $M_{AB}$, $M_{BC}$, and $M_{CD}$) are shown in Table 4. On average, we have obtained relatively good results (73% recall and 79% precision).

Table 4
Recall and precision of mapping composition. $M(O_A, O_B)$ is simply rewritten as $M_{AB}$.

Direct mapping  Indirect mapping                         Recall R  Precision P
M_AB            M_AC ∘ M_CB                              0.76      0.65
                M_AD ∘ M_DC ∘ M_CB                       0.73      0.62
                M_AE ∘ M_ED ∘ M_DC ∘ M_CB                0.66      0.6
                M_AF ∘ M_FE ∘ M_ED ∘ M_DC ∘ M_CB         0.53      0.57
                M_AG ∘ M_GF ∘ M_FE ∘ M_ED ∘ M_DC ∘ M_CB  0.51      0.56
M_BC            M_BD ∘ M_DC                              0.86      0.74
                M_BE ∘ M_ED ∘ M_DC                       0.74      0.72
                M_BF ∘ M_FE ∘ M_ED ∘ M_DC                0.72      0.65
                M_BG ∘ M_GF ∘ M_FE ∘ M_ED ∘ M_DC         0.69      0.64
                M_BA ∘ M_AG ∘ M_GF ∘ M_FE ∘ M_ED ∘ M_DC  0.66      0.62
M_CD            M_CE ∘ M_ED                              0.73      0.66
                M_CF ∘ M_FE ∘ M_ED                       0.67      0.63
                M_CG ∘ M_GF ∘ M_FE ∘ M_ED                0.66      0.56
                M_CA ∘ M_AG ∘ M_GF ∘ M_FE ∘ M_ED         0.54      0.52
                M_CB ∘ M_BA ∘ M_AG ∘ M_GF ∘ M_FE ∘ M_ED  0.51      0.52
M_DE            M_DF ∘ M_FE                              0.64      0.72
                M_DG ∘ M_GF ∘ M_FE                       0.63      0.73
                M_DA ∘ M_AG ∘ M_GF ∘ M_FE                0.58      0.63
                M_DB ∘ M_BA ∘ M_AG ∘ M_GF ∘ M_FE         0.52      0.55
                M_DC ∘ M_CB ∘ M_BA ∘ M_AG ∘ M_GF ∘ M_FE  0.45      0.48
M_EF            M_EG ∘ M_GF                              0.79      0.75
                M_EA ∘ M_AG ∘ M_GF                       0.75      0.72
                M_EB ∘ M_BA ∘ M_AG ∘ M_GF                0.74      0.71
                M_EC ∘ M_CB ∘ M_BA ∘ M_AG ∘ M_GF         0.69      0.7
                M_ED ∘ M_DC ∘ M_CB ∘ M_BA ∘ M_AG ∘ M_GF  0.68      0.66
M_FG            M_FA ∘ M_AG                              0.77      0.75
                M_FB ∘ M_BA ∘ M_AG                       0.72      0.7
                M_FC ∘ M_CB ∘ M_BA ∘ M_AG                0.68      0.62
                M_FD ∘ M_DC ∘ M_CB ∘ M_BA ∘ M_AG         0.67      0.58
                M_FE ∘ M_ED ∘ M_DC ∘ M_CB ∘ M_BA ∘ M_AG  0.63      0.52
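The precision and recall of Eq. (15) compare a directly obtained mapping with a composed one. A minimal sketch follows, assuming that each mapping is reduced to a set of (source entity, target entity) pairs so that set intersection applies; the function name and the sample pairs are hypothetical and are not taken from the experiments.

```python
def precision_recall(direct, composed):
    """Eq. (15): precision and recall of a composed (indirect) mapping
    measured against the direct mapping."""
    direct, composed = set(direct), set(composed)
    common = direct & composed
    precision = len(common) / len(composed) if composed else 0.0
    recall = len(common) / len(direct) if direct else 0.0
    return precision, recall


# Illustrative entity pairs only.
DIRECT = {
    ("Person", "People"),
    ("Prof", "Professor"),
    ("Secretary", "Researcher"),
    ("Full_Prof", "Full Prof"),
}
COMPOSED = {
    ("Person", "People"),
    ("Prof", "Professor"),
    ("Faculty", "Researcher"),
}

precision, recall = precision_recall(DIRECT, COMPOSED)
```

In this toy case two of the three composed pairs are correct (precision 2/3) and two of the four direct pairs are recovered (recall 1/2).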

Fig. 5. Recall measurement with respect to the length of transformation path.


We note that, in all cases, as more mapping results are composed (i.e., as the number of mapping compositions increases), the recall and precision inherently decrease. The results are depicted in Figs. 5 and 6, respectively. This is the information loss caused by the mismatching problem of ontology mapping algorithms.

4.2. Evaluation on transformation path selection

Regarding the second issue, we have tested the performance of transformation path selection resulting from the two heuristics (i.e., H1 and H2) by inviting real users to participate. The link topology of the distributed ontology-based information system has been built in a simple manner, as shown in Fig. 7. Thirty users were asked to generate 10 queries with SparQL in order to search for a certain piece of information. These queries could only be sent to three systems, $S_A$, $S_D$, and $S_G$, so that multiple paths along the linkages could be considered.

Fig. 6. Precision measurement with respect to the length of transformation path.

Fig. 7. Finding the best transformation path in a distributed ontology-based information system (a query routed over systems A to G).

Table 5
Performance of transformation path selection for two users.

Heuristics  Users  Information systems  Recall  Precision
H1          U1     B                    0.78    0.67
                   C                    0.68    0.63
                   E                    0.72    0.65
                   F                    0.63    0.73
            U2     B                    0.67    0.57
                   C                    0.69    0.67
                   E                    0.73    0.75
                   F                    0.79    0.82
H2          U1     B                    0.67    0.64
                   C                    0.74    0.37
                   E                    0.83    0.7
                   F                    0.73    0.62
            U2     B                    0.63    0.63
                   C                    0.75    0.77
                   E                    0.67    0.48
                   F                    0.47    0.52


Table 5 shows the performance of transformation path selection for two users. On average, for all invited users, heuristic H1 has shown 65.3% recall and 74.2% precision, while H2 has shown 59.5% recall and 68.3% precision. Hence, we found that H1 outperforms H2 by about 12.4%.

5. Concluding remarks

As more ontology-based information systems participate in the global network, we have to establish an efficient interoperability platform to semantically understand resources from remote and heterogeneous systems. This work has focused on mapping composition with any ontology mapping algorithm. We emphasize that the mapping composition is independent of the ontology mapping algorithms. More importantly, this system relies on the scalability of ontology mapping processes in a large-scale distributed environment. In this paper, we have proposed a query rewriting application on an ontology-based peer-to-peer environment. The queries from certain information systems have been successfully transformed by aggregating and composing possible mapping results. An additional contribution of this work is that we have taken into account the information loss that occurs whenever queries are repeatedly transformed.

Similar to semantic social network studies [11], we may consider only concepts applied to specific knowledge. Because dynamic CoP identification is an NP-hard and APX-hard problem, we have to evaluate our heuristics by computing the similarity. Furthermore, this similarity can be extended to contextual centrality, which means the potential power of bridging among blogs on social networks.

For future work on semantic centrality, we have three main plans that involve investigating the following issues: (i) semantic subgroup discovery, to organize sophisticated user groups by enhancing the designed discovery methods, (ii) query propagation, to determine the ordering (or route) of potential peers to which the queries will be sent, and (iii) semantic synchronization, to maximize the efficiency of interoperability by information diffusion. Furthermore, we have to consider enhancing the semantic centrality measurement $C_{\diamond}$ by combining it with (i) authoritative and hub centrality measurements, and (ii) the modified shortest paths $spd(n, t) = \frac{1}{C_{\diamond}(n) + C_{\diamond}(t)}$. Finally, as another important experimental issue, we are planning to investigate the query transformation process in terms of the time taken by the system. Even though we have reduced the computational cost of ontology matching, it is possible that users will experience more waiting time for relevant results.

Acknowledgement

This work was supported by the Korean Science and Engineering Foundation (KOSEF) Grant funded by the Korean Government (MEST) (2009-0066751).

References

[1] Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy, Pedro Domingos, Imap: discovering complex semantic matches between databaseschemas, in: Gerhard Weikum, Arnd Christian König, Stefan Deßloch (Eds.), Proceedings of the ACM SIGMOD International Conference onManagement of Data, June 13–18, Paris, France, ACM, 2004, pp. 383–394.

[2] Marc Ehrig, York Sure, Foam – framework for ontology alignment and mapping – results of the ontology alignment evaluation initiative, in: BenjaminAshpole, Marc Ehrig, Jérôme Euzenat, Heiner Stuckenschmidt (Eds.), Proceedings of the K-CAP 2005 Workshop on Integrating Ontologies, October 2,Banff, Canada, CEUR Workshop Proceedings, vol. 156, CEUR-WS.org, 2005.

[3] Jérôme Euzenat, An API for ontology alignment, in: Sheila A. McIlraith, Dimitris Plexousakis, Frank van Harmelen (Eds.), Proceedings of the 3rd International Semantic Web Conference, Lecture Notes in Computer Science, vol. 3298, Springer, 2004, pp. 698–712.

[4] Jérôme Euzenat, Pavel Shvaiko, Ontology Matching, Springer, Heidelberg, Germany, 2007.

[5] Jérôme Euzenat, Petko Valtchev, Similarity-based ontology alignment in OWL-Lite, in: Ramon López de Mántaras, Lorenza Saitta (Eds.), Proceedings of the 16th European Conference on Artificial Intelligence (ECAI’2004), August 22–27, 2004, Valencia, Spain, IOS Press, 2004, pp. 333–337.

[6] Peter Haase, Björn Schnizler, Jeen Broekstra, Marc Ehrig, Frank van Harmelen, Maarten Menken, Peter Mika, Michal Plechawski, Pawel Pyszlak, Ronny Siebes, Steffen Staab, Christoph Tempich, Bibster – a semantics-based bibliographic peer-to-peer system, Journal of Web Semantics 2 (1) (2004) 99–103.

[7] Zachary G. Ives, Alon Y. Halevy, Peter Mork, Igor Tatarinov, Piazza: mediation and integration infrastructure for semantic web data, Journal of Web Semantics 1 (2) (2004) 155–175.

[8] Xing Jiang, Ah-Hwee Tan, Learning and inferencing in user ontology for personalized semantic web search, Information Sciences 179 (16) (2009) 2794–2808.

[9] Jason J. Jung, Ontological framework based on contextual mediation for collaborative information retrieval, Information Retrieval 10 (1) (2007) 85–109.

[10] Jason J. Jung, Query transformation based on semantic centrality in semantic social network, Journal of Universal Computer Science 14 (7) (2008) 1031–1047.

[11] Jason J. Jung, Social grid platform for collaborative online learning on blogosphere: a case study of eLearning@BlogGrid, Expert Systems with Applications 36 (2) (2009) 2177–2186.

[12] Jason J. Jung, An empirical study on optimizing query transformation on semantic peer-to-peer networks, Journal of Intelligent & Fuzzy Systems 21 (3) (2010) 187–195.

[13] Jason J. Jung, Hojin Lee, Kwang Sun Choi, Contextualized recommendation based on reality mining from mobile subscribers, Cybernetics and Systems 40 (2) (2009) 160–175.

[14] Harold W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly 2 (1955) 83–97.

[15] Alexander Maedche, Boris Motik, Nuno Silva, Raphael Volz, Mafra – a mapping framework for distributed ontologies in the semantic web, in: Asunción Gómez-Pérez, V. Richard Benjamins (Eds.), Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2002), October 1–4, Siguenza, Spain, Lecture Notes in Computer Science, vol. 2473, Springer, 2002, pp. 235–250.

[16] Jan Morbach, Aidong Yang, Wolfgang Marquardt, Ontocape – a large-scale ontology for chemical process engineering, Engineering Applications of Artificial Intelligence 20 (2) (2007) 147–161.



[17] Natalya Fridman Noy, Mark A. Musen, Prompt: algorithm and tool for automated ontology merging and alignment, in: Proceedings of the 17th National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, July 30 – August 3, 2000, Austin, Texas, USA, AAAI Press/The MIT Press, 2000, pp. 450–455.

[18] Philip R.O. Payne, Eneida A. Mendonça, Stephen B. Johnson, Justin B. Starren, Conceptual knowledge acquisition in biomedicine: a methodological review, Journal of Biomedical Informatics 40 (5) (2007) 582–602.

[19] Biao Qin, Shan Wang, Xiaoyong Du, Qiming Chen, Qiuyue Wang, Graph-based query rewriting for knowledge sharing between peer ontologies, Information Sciences 178 (18) (2008) 3525–3542.

[20] Wen-Chung Shih, Chao-Tung Yang, Shian-Shyong Tseng, Ontology-based content organization and retrieval for scorm-compliant teaching materials in data grids, Future Generation Computer Systems 25 (6) (2009) 687–694.

[21] Natalia Villanueva-Rosales, Michel Dumontier, yOWL: an ontology-driven knowledge base for yeast biologists, Journal of Biomedical Informatics 41 (5) (2008) 779–789.