Transcript

CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2011; 23:505–524
Published online 5 November 2010 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.1652

A context-aware semantic similarity model for ontology environments

Hai Dong∗,†, Farookh Khadeer Hussain and Elizabeth Chang

Digital Ecosystems and Business Intelligence Institute, Curtin University of Technology, Perth WA 6845, Australia

SUMMARY

While many researchers have contributed to the field of semantic similarity models so far, we find that most of the models are designed for the semantic network environment. When applying a semantic similarity model within the semantic-rich ontology environment, two issues are observed: (1) most of the models ignore the context of ontology concepts and (2) most of the models ignore the context of relations. Therefore, in this paper, we present a solution for the two issues, comprising a novel ontology conversion process and a context-aware semantic similarity model that considers both the context of concepts and relations and the ontology structure. Furthermore, in order to evaluate this model, we compare its performance with that of several existing models in a large-scale knowledge base, and the evaluation results preliminarily prove the technical advantage of our model in ontology environments. Conclusions and future work are described in the final section.

Received 6 July 2010; Accepted 9 July 2010

KEY WORDS: ontology; OWL; semantic network; semantic similarity model

1. INTRODUCTION

Semantic relatedness refers to human judgment about the extent to which a given pair of concepts are related to each other [1]. Studies have shown that most people agree on the relative semantic relatedness of most pairs of concepts [2, 3]. Therefore, many technologies have been developed to date in order to precisely measure the extent of relatedness and similarity between concepts in multiple disciplines, such as information retrieval (IR) [4–9], natural language processing (NLP) [10–13], linguistics [14], health informatics [15], bioinformatics [1, 16–19], web services [20], ontology extraction/matching [21–23] and other fields. In the fields of IR and NLP, the research primarily focuses on word sense disambiguation [9, 10], multimodal document retrieval [24], text segmentation [7, 12] and query preciseness enhancement [5, 6]. In the linguistic area, the research emphasizes computing semantic similarity between uncertain or imprecise concept labels [14]. In the health domain, the researchers are mainly concerned with seeking similar health science terms. In the field of bioinformatics, the focus is on measuring the similarity between concepts from the Gene Ontology [16–19]. In the field of web services, the research concentrates on semantic service discovery [20]. In the field of ontology extraction/matching, semantic similarity models are used in the process of ontology similarity measurement [21–23].

∗Correspondence to: Hai Dong, Digital Ecosystems and Business Intelligence Institute, Curtin University of Technology, Perth WA 6845, Australia.

†E-mail: [email protected]


Moreover, the semantic similarity models can also be used to estimate the similarity between land use and land cover classification systems [25].

However, when exploring these semantic similarity models, we observe that most of the existing models focus only on the semantic network environment and ignore the special features of the ontology environment. For example, most of the models do not have specific solutions for processing the context of concept attributes and the context of relations when estimating similarity between concepts. Based on this finding, we develop a novel context-aware solution for semantic similarity measurement in the ontology environment. This solution contains an ontology conversion process and a hybrid semantic similarity model, which assesses concept similarity from the perspectives of both the ontology structure and the context of ontology concepts and relations.

The remainder of the paper is organized as follows. In Section 2 we conduct a detailed comparison between ontologies and semantic networks, and then review and analyze the existing semantic similarity models in order to discover the issues that arise when applying them within the ontology environment. In Section 3, we provide an ontology conversion process to preliminarily address the issues found in Section 2. In Section 4, we present the proposed hybrid semantic similarity model. In Section 5, to thoroughly validate the model, we implement a series of experiments and perform scientific evaluations. The conclusion is drawn and future work is proposed in the final section.

2. RELATED WORKS

2.1. Ontology and semantic network

In the field of information science, an ontology is defined by Gruber [26] as 'an explicit specification of a conceptualization'. An ontology primarily consists of the following components:

• Classes that define a group of individuals that share the same features.

• Properties that describe relations between classes. In OWL, there are two sorts of properties:

◦ ObjectProperty, which defines relations between two or more classes, and

◦ DatatypeProperty, which defines relations between instances of classes and RDF literals or XML Schema datatypes [27].

• Restrictions and characteristics that describe constraints on relations. In OWL, restrictions include allValuesFrom (∀), someValuesFrom (∃), hasValue (∋), cardinality (=), minCardinality (≥) and maxCardinality (≤); characteristics include FunctionalProperty (one property has a unique value), InverseOf (one property is the inverse of another property), InverseFunctionalProperty (the inverse of one property is functional), TransitiveProperty (properties are transitive) and SymmetricProperty (properties are symmetric) [27].

• Axioms that describe the rules followed by an ontology when applying it to a domain. In OWL, the class axioms include oneOf (enumerated classes), disjointWith (classes are disjoint with each other), equivalentClass (two classes are equivalent) and subClassOf (one class is a specification of another class) [27].

A semantic network is defined as 'a graphic notation for representing knowledge in patterns of interconnected nodes and arcs' [28]. WordNet is a typical example of a semantic network, in which words or phrases are represented as nodes and are linked by multiple relations. The most common relations are meronymy (A is a part of B), holonymy (B is a part of A), hyponymy (A is a subordinate of B), hypernymy (A is a superordinate of B), synonymy (A is a synonym of B) and antonymy (A is the opposite of B).

In Table I, we make a general comparison between ontologies and semantic networks based on their components. The main difference is that ontology concepts and relations can be defined with more attributes, restrictions and characteristics, compared with their single-word/phrase-composed counterparts in semantic networks.


Table I. Comparison between ontologies and semantic networks.

Components                       | Ontologies                                      | Semantic networks
Classes                          | Have individuals                                | Do not have individuals
Properties                       | Have object properties and datatype properties | Do not have datatype properties
Restrictions and characteristics | Have restrictions and characteristics          | Do not have restrictions and characteristics
Axioms                           | Have axioms                                    | Do not have oneOf and disjointWith

Therefore, it can be concluded that ontologies can express more semantic information than semantic networks.

2.2. Semantic similarity models

In the literature, there are many similarity measures. For the purpose of discussion, we divide them into three main categories according to the information utilized: edge (distance)-based models [4, 6, 9, 29–32], node (information content)-based models [10, 33, 34] and hybrid models [11, 35–37]. In the remainder of this section, we briefly introduce the three categories and the typical models in each category, and analyze their limitations when applying them within the ontology environment.

Edge (distance)-based models. Edge-based models are based on the shortest path between two nodes in a definitional network. Definitional networks are a type of hierarchical/taxonomic semantic network, in which all nodes are linked by is–a relations [28]. The models are based on the assumption that all nodes are evenly distributed and of similar densities, and that the distance between any two nodes is equal. They can also be applied to a network structure.

One typical edge-based model was provided by Rada et al. [4], and is described as follows. For two nodes C1 and C2 in a semantic network,

Distance(C1, C2) = minimum number of edges separating C1 and C2   (1)

and the similarity between C1 and C2 is given by

sim_Rada(C1, C2) = 2 × Max − Distance(C1, C2)   (2)

where Max is the maximum depth of the definitional network. In order to ensure that the interval of sim_Rada is between 0 and 1, Equation (2) can also be expressed as

sim_Rada(C1, C2) = 1 − Distance(C1, C2) / (2 × Max)   (3)

Leacock and Chodorow [29] considered that the number of edges on the shortest path between two nodes should be normalized by the depth of the taxonomic structure, which is expressed mathematically as

Distance(C1, C2) = (minimum number of edges separating C1 and C2) / (2 × Max)   (4)

and the similarity between C1 and C2 is given by

sim_Leacock(C1, C2) = −log(Distance(C1, C2))   (5)

Wu and Palmer [30] take into account the node that subsumes two nodes when computing the similarity between the two nodes, which can be expressed mathematically as follows:

sim_Wu&Palmer(C1, C2) = 2 × N3 / (N1 + N2 + 2 × N3)   (6)


where C3 is the most informative node that subsumes C1 and C2, N1 is the minimum number of edges from C1 to C3, N2 is the minimum number of edges from C2 to C3, and N3 is the depth of C3.
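To make the edge-based measures concrete, the sketch below (not from the paper; the toy taxonomy, function and variable names are illustrative assumptions) finds the shortest is–a path with a breadth-first search and then applies Equations (3), (5) and (6).

```python
from collections import deque
import math

def shortest_path_length(edges, c1, c2):
    """Breadth-first search over the undirected is-a graph; returns the
    minimum number of edges separating c1 and c2 (Equation (1))."""
    if c1 == c2:
        return 0
    adjacency = {}
    for child, parent in edges:
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    seen, queue = {c1}, deque([(c1, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == c2:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two nodes are not connected

def sim_rada(distance, max_depth):
    """Equation (3): 1 - Distance(C1, C2) / (2 * Max)."""
    return 1 - distance / (2 * max_depth)

def sim_leacock_chodorow(distance, max_depth):
    """Equations (4)-(5): -log(Distance(C1, C2) / (2 * Max));
    a zero distance is clamped to one edge to keep the logarithm defined."""
    return -math.log(max(distance, 1) / (2 * max_depth))

def sim_wu_palmer(n1, n2, n3):
    """Equation (6): 2 * N3 / (N1 + N2 + 2 * N3)."""
    return 2 * n3 / (n1 + n2 + 2 * n3)

# e.g. a three-level taxonomy: thing -> vehicle -> {car, bicycle}
edges = [("vehicle", "thing"), ("car", "vehicle"), ("bicycle", "vehicle")]
d = shortest_path_length(edges, "car", "bicycle")          # 2 edges via "vehicle"
print(sim_rada(d, max_depth=2), sim_wu_palmer(1, 1, 1))    # 0.5 and 0.5
```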

Node (information content)-based models. Information content-based models judge the semantic similarity between concepts in a definitional network or in a corpus by taking information content into account, namely term occurrences in corpora or the subsumed nodes in taxonomies. These models can avoid the disadvantage of the edge-counting approaches, which cannot control variable distances in a dense definitional network [10].

Resnik [10] developed such a model whereby the information shared by two concepts can be indicated by the concept that subsumes the two concepts in a taxonomy. Then, the similarity between the two concepts C1 and C2 can be mathematically expressed as follows:

sim_Resnik(C1, C2) = max_{C ∈ S(C1,C2)} [−log(P(C))]   (7)

where S(C1, C2) is the set of concepts that subsume both C1 and C2, and P(C) is the probability of encountering an instance of concept C.

Lin's [33] semantic similarity model is an extension of Resnik's model; it measures the similarity between two nodes as the ratio between the amount of information the two nodes share and the total amount of information of the two nodes, which can be mathematically expressed as follows:

sim_Lin(C1, C2) = 2 × sim_Resnik(C1, C2) / (IC(C1) + IC(C2))   (8)

Pirro [34] proposed a feature-based similarity model, which is based on Tversky's theory that the similarity between two concepts is a function of the features common to the two concepts minus those that belong to one concept but not the other [38]. By integrating Resnik's model, the similarity model can be mathematically expressed as follows:

sim_P&S(C1, C2) = 3 × sim_Resnik(C1, C2) − IC(C1) − IC(C2)   if C1 ≠ C2
sim_P&S(C1, C2) = 1                                          if C1 = C2   (9)
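A minimal Python sketch of these node-based measures (an assumption-laden illustration, not from the paper; all names are hypothetical) could look as follows:

```python
import math

def information_content(probability):
    """IC(C) = -log P(C), where P(C) is the probability of encountering
    an instance of concept C in the corpus or taxonomy."""
    return -math.log(probability)

def sim_resnik(subsumer_probabilities):
    """Equation (7): the information content of the most informative common
    subsumer, given the P(C) of every concept subsuming both C1 and C2."""
    return max(information_content(p) for p in subsumer_probabilities)

def sim_lin(ic_c1, ic_c2, resnik_value):
    """Equation (8): 2 * sim_Resnik(C1, C2) / (IC(C1) + IC(C2))."""
    return 2 * resnik_value / (ic_c1 + ic_c2)

def sim_pirro(ic_c1, ic_c2, resnik_value, identical=False):
    """Equation (9): 3 * sim_Resnik(C1, C2) - IC(C1) - IC(C2),
    or 1 when the two concepts are the same."""
    return 1.0 if identical else 3 * resnik_value - ic_c1 - ic_c2

# e.g. two concepts whose shared subsumers have P = 0.5 and P = 0.9
resnik = sim_resnik([0.5, 0.9])
print(sim_lin(2.0, 1.5, resnik), sim_pirro(2.0, 1.5, resnik))
```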

Hybrid models. Hybrid models combine multiple factors in the similarity measure. Jiang and Conrath [35] developed a hybrid model that uses the node-based theory to enhance the edge-based model. Their method takes into account the factors of local density, node depth and link types. The weight between a child concept C and its parent concept P can be measured as

wt(C, P) = (β + (1 − β) × E / E(P)) × ((d(P) + 1) / d(P))^α × (IC(C) − IC(P)) × T(C, P)   (10)

where d(P) is the depth of node P, E(P) is the number of edges in the child links, E is the average density of the whole hierarchy, T(C, P) represents the link type, and α and β (α ≥ 0, 0 ≤ β ≤ 1) are the parameters controlling the effect of node depth and node density, respectively, on the weight.

The distance between two concepts is defined as follows:

Distance(C1, C2) = Σ_{C ∈ {path(C1,C2) − LS(C1,C2)}} wt(C, p(C))   (11)

where path(C1, C2) is the set that contains all the nodes on the shortest path from C1 to C2, and LS(C1, C2) is the most informative concept that subsumes both C1 and C2.

In some special cases, such as when only the link type is considered as the factor in the weight computation (α = 0, β = 1 and T(C, P) = 1), the distance algorithm can be simplified as follows:

Distance(C1, C2) = IC(C1) + IC(C2) − 2 × sim_Resnik(C1, C2)   (12)

where IC(C) = −log P(C). Finally, the similarity value between two concepts C1 and C2 is measured by converting the semantic distance as follows:

sim_Jiang&Conrath(C1, C2) = 1 − Distance(C1, C2)   (13)


Table II. Comparison of the typical semantic similarity models.

Category   | Models                    | Working environment              | Measure factors
Edge-based | Rada et al. [4]           | Definitional networks            | Shortest path
Edge-based | Leacock and Chodorow [29] | Definitional networks            | Shortest path
Edge-based | Wu and Palmer [30]        | Definitional networks            | Shortest path and node depth
Node-based | Resnik [10]               | Definitional networks or corpora | Subsumed nodes in definitional networks or word occurrences in corpora
Node-based | Lin [33]                  | Definitional networks or corpora | Subsumed nodes in definitional networks or word occurrences in corpora
Node-based | Pirro [34]                | Definitional networks or corpora | Subsumed nodes in definitional networks or word occurrences in corpora
Hybrid     | Jiang and Conrath [35]    | Semantic networks                | Shortest path, subsumer, local density, node depth and link types
Hybrid     | Li et al. [11]            | Semantic networks                | Shortest path, node depth and local density

In addition, Seco's [39] research showed that the similarity equation can also be expressed as

sim_Jiang&Conrath(C1, C2) = 1 − Distance(C1, C2) / 2   (14)

The test results show that the parameters α and β do not heavily influence the similarity computation [35].
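A short sketch of the simplified Jiang and Conrath measure (illustrative only; function names and the example IC values are assumptions) is given below:

```python
def jiang_conrath_distance(ic_c1, ic_c2, resnik_value):
    """Equation (12): IC(C1) + IC(C2) - 2 * sim_Resnik(C1, C2)."""
    return ic_c1 + ic_c2 - 2 * resnik_value

def sim_jiang_conrath(distance, halve=False):
    """Equation (13), or Seco's variant (Equation (14)) when halve=True."""
    return 1 - (distance / 2 if halve else distance)

# e.g. IC(C1) = 0.6, IC(C2) = 0.8, IC of the most informative subsumer = 0.5
d = jiang_conrath_distance(0.6, 0.8, 0.5)
print(sim_jiang_conrath(d), sim_jiang_conrath(d, halve=True))   # 0.6 and 0.8
```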

Li et al. [11] proposed a hybrid semantic similarity model combining structural semantic information in a nonlinear model. The factors of path length, depth and density are considered in the assessment, which can be mathematically expressed as

sim_Li(C1, C2) = e^(−αl) × (e^(βh) − e^(−βh)) / (e^(βh) + e^(−βh))   if C1 ≠ C2
sim_Li(C1, C2) = 1                                                   if C1 = C2   (15)

where l is the shortest path length between C1 and C2, h is the depth of the subsumer of C1 and C2, and α and β control the effects of l and h on the similarity measure.
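Noting that the second factor of Equation (15) is simply tanh(βh), a minimal sketch (the parameter values shown are illustrative assumptions, not values fixed by the paper) is:

```python
import math

def sim_li(path_length, subsumer_depth, alpha, beta, identical=False):
    """Equation (15): e^(-alpha*l) * tanh(beta*h), where l is the shortest path
    length, h the depth of the subsumer, and alpha/beta weight the two factors."""
    if identical:
        return 1.0
    return math.exp(-alpha * path_length) * math.tanh(beta * subsumer_depth)

# illustrative parameter values only
print(sim_li(path_length=2, subsumer_depth=3, alpha=0.2, beta=0.6))
```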

In order to analyze the features of the models described above, we present a horizontal comparison of these semantic similarity models in Table II. By combining this comparison with the comparison between ontologies and semantic networks displayed in Table I, we conclude that there are two limitations when applying these models in an ontology environment, as follows:

First, the edge-based and node-based models primarily focus on estimating similarity for nodes in definitional networks. Since there is only one type of relation in definitional networks, the factors of relation types and relation contexts are ignored when calculating similarity. However, as introduced in Section 2.1, in an ontology environment the types of relations are various, and relations can be defined with multiple restrictions. Obviously, these two factors cannot be ignored when computing similarity between ontology concepts.

Second, all these models ignore the factor of the context of nodes when computing semantic similarity, owing to the nature of nodes in semantic networks, where a node is composed of a single word or phrase without adequate properties. In contrast, in the ontology environment, ontology concepts are defined with sufficient datatype and object properties, and the combinations of these properties can be regarded as crucial identifications of the concepts. Obviously, the contexts of ontology concepts cannot be ignored when computing similarity between ontology concepts.

Consequently, in order to address the two limitations of these semantic similarity models, in the remainder of this paper we present an ontology conversion process and a context-aware semantic similarity model for the ontology environment.


3. ONTOLOGY CONVERSION PROCESS

3.1. Lightweight ontology space

In order to address the limitations of the semantic similarity models, we introduce the concept of a lightweight ontology space, which includes two basic definitions as follows:

Definition 1 (Pseudo-concept)
We define a pseudo-concept π for an ontology concept C, which can be represented as a tuple as follows:

π = {C, [δ_i, γ_{δ_i}], [o_j, γ_{o_j}], C^x_{o_j}, b^y_{o_j}}   (16)

where, in OWL-annotated semantic web documents, C is the name (or Uniform Resource Identifier (URI)) of the concept C, each [ ] is a property tuple including a property and its restriction (if available), δ_i (i = 1…n) is a datatype property of the concept C, γ_{δ_i} is a restriction for the datatype property δ_i, o_j (j = 1…m) is an object property of the concept C, γ_{o_j} is a restriction for the object property o_j, C^x_{o_j} (x = 1…k) is a concept related by the object property o_j and b^y_{o_j} (y = 1…k−1) is a Boolean operation between the concepts C^x_{o_j}.

The aim of defining the pseudo-concept is to encapsulate all properties, and the restrictions and characteristics of the properties, of a concept into a corpus for the concept, which makes it feasible to assess similarity between concepts based on the contexts of their pseudo-concepts.
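As one possible in-memory shape for such a tuple (a sketch under our own assumptions, not a structure defined by the paper), the pseudo-concept could be held as a small record:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class PseudoConcept:
    """One possible representation of the tuple of Equation (16): the concept
    name (URI), its [property, restriction] tuples, the concepts reached through
    object properties and any Boolean operations between them."""
    name: str                                                              # C
    datatype_properties: List[Tuple[str, Optional[str]]] = field(default_factory=list)
    object_properties: List[Tuple[str, Optional[str], str]] = field(default_factory=list)
    boolean_operations: List[str] = field(default_factory=list)

# e.g. a concept with one restricted datatype property and one object property
example = PseudoConcept(
    name="C1",
    datatype_properties=[("d", "minCardinality 5")],
    object_properties=[("o", "someValuesFrom", "C2")],
)
```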

Definition 2 (Lightweight ontology space)
Based on the definition of the pseudo-concept, we define a lightweight ontology space as a space of pseudo-concepts in which the pseudo-concepts are linked only by is–a relations [40]. An is–a relation is a generalization/specification relationship between an upper generic pseudo-concept and a lower specific pseudo-concept. In OWL documents, the is–a relation is represented by subClassOf. The aim of constructing the pseudo-concept space is to simplify the complicated ontology structure and hence to obtain a definitional network-like taxonomy. This taxonomy makes it feasible to measure concept similarity based on the existing semantic similarity models.

3.2. Theorems for ontology conversion process

In order to convert an ontology into a lightweight ontology space, we need a conversion process. It should be noted that the proposed ontology conversion process applies only to OWL Lite or OWL DL-annotated semantic web documents. Additionally, from the definitions above, it can be observed that the conversion process concerns only the schema (concept) level and not the instance level, because the information about instances is special to some degree and cannot completely represent the concepts to which the instances belong. In order to accommodate the complexity and flexibility in defining restrictions and characteristics for object properties and datatype properties, a set of theorems, aligned with the conversion process, needs to be defined. The theorems can be divided into six categories in accordance with the components of a pseudo-concept, namely the theorems regarding the conversion of concepts, datatype properties, object properties, property restrictions, property characteristics and Boolean operations. In the remainder of this section, we introduce and illustrate these theorems based on the six divisions.

Theorem 1
If C is the name (URI) of a concept, then C is a component of its pseudo-concept.

For example, for the concept C1 shown in Figure 1, its pseudo-concept π1 = {C1}.

Theorem 2.1
If C is the name (URI) of a concept and δ is a datatype property of C, then δ is a component of its pseudo-concept.


Figure 1. Example of an ontology concept.

Figure 2. Example of an ontology concept with a datatype property.

Figure 3. Example of an inherited ontology concept with a datatype property.

For example, the concept C1 shown in Figure 2 has a datatype property δ. According to Theorem 2.1, its pseudo-concept π1 = {C1, δ}.

Theorem 2.2
If C1 is the name (URI) of a concept, δ is a datatype property of C1 and C2 is the name (URI) of a subclass of C1, then δ is a component of the pseudo-concept of C2.

For example, for the concepts C1 and C2 shown in Figure 3, C1 has a datatype property δ and C2 is a subclass of C1. According to Theorem 2.2, the pseudo-concept π2 for C2 is a tuple that can be expressed as {C2, δ}.

Theorem 3.1
If C1 is the name (URI) of a concept, o is an object property of C1 and C2 is the name (URI) of a concept that relates to C1 through o, then o and C2 are components of the pseudo-concept of C1.

For example, for the concepts C1 and C2 shown in Figure 4, C1 has an object property o that connects C1 to C2. According to Theorem 3.1, the pseudo-concept π1 for C1 is a tuple that can be expressed as {C1, o, C2}.

Theorem 3.2
If C1 is the name (URI) of a concept, o is an object property of C1, C2 is the name (URI) of a concept that relates to C1 through o and C3 is the name (URI) of a subclass of C1, then o and C2 are components of the pseudo-concept of C3.

For example, for the concepts C1, C2 and C3 shown in Figure 5, C1 has an object property o that connects C1 to C2, and C3 is a subclass of C1. According to Theorem 3.2, the pseudo-concept π3 for C3 is a tuple that can be expressed as {C3, o, C2}.


Figure 4. Example of an ontology concept with an object property.

Figure 5. Example of an inherited ontology concept with an object property.

Figure 6. Example of an ontology concept with a restricted datatype property.

Theorem 4.1
If C is the name (URI) of a concept, δ is a datatype property of C and γ is a restriction for δ, then the tuple [δ, γ] is a component of the pseudo-concept of C.

For example, the concept C1 shown in Figure 6 has a datatype property δ, which has a value restriction hasValue and a cardinality restriction minCardinality 5. According to Theorem 4.1, its pseudo-concept π1 = {C1, [δ, hasValue minCardinality 5]}.

Theorem 4.2
If C1 is the name (URI) of a concept, o is an object property of C1, C2 is the name (URI) of a concept that relates to C1 through o and γ is a restriction for the object property o, then the tuple [o, γ] is a component of the pseudo-concept of C1.

For example, for the concepts C1 and C2 shown in Figure 7, C1 has an object property o that connects C1 to C2, and o has a property restriction someValuesFrom and a cardinality restriction cardinality 1. According to Theorem 4.2, the pseudo-concept π1 for C1 is a tuple that can be expressed as {C1, [o, someValuesFrom], C2, [o, cardinality 1], C2}.


Figure 7. Example of an ontology concept with a restricted object property.

Figure 8. Example of an ontology concept with a functional datatype property.

Figure 9. Example of an ontology concept with a functional object property.

Theorem 5.1
If C is the name (URI) of a concept and δ is a functional datatype property of C, then the tuple [δ, cardinality 1] is a component of the pseudo-concept of C.

For example, the concept C1 shown in Figure 8 has a functional datatype property δ. According to Theorem 5.1, its pseudo-concept π1 = {C1, [δ, cardinality 1]}.

Theorem 5.2
If C1 is the name (URI) of a concept, o is a functional object property of C1 and C2 is the name (URI) of a concept that relates to C1 through o, then the tuple [o, cardinality 1] is a component of the pseudo-concept of C1.

For example, for the concepts C1 and C2 shown in Figure 9, C1 has a functional object property o that connects C1 to C2. According to Theorem 5.2, the pseudo-concept π1 for C1 is a tuple that can be expressed as {C1, [o, cardinality 1], C2}.

Theorem 5.3
If C1 is the name (URI) of a concept, o is a transitive object property of C1, C2 is the name (URI) of a concept that relates to C1 through o and C3 is the name (URI) of a concept that relates to C2 through o, then o, C2 and C3 are components of the pseudo-concept of C1.

For example, for the concepts C1, C2 and C3 shown in Figure 10, C1 has a transitive object property o that connects C1 to C2, and C2 has o connecting C2 to C3. According to Theorem 5.3, the pseudo-concept π1 for C1 is a tuple that can be expressed as {C1, o, C2, o, C3}.


Figure 10. Example of ontology concepts with a transitive object property.

Figure 11. Example of ontology concepts with a symmetric object property.

Theorem 5.4
If C1 is the name (URI) of a concept, o is a symmetric object property of C1 and C2 is the name (URI) of a concept that relates to C1 through o, then o and C2 are components of the pseudo-concept of C1, and o and C1 are components of the pseudo-concept of C2.

For example, for the concepts C1 and C2 shown in Figure 11, C1 has a symmetric object property o that connects C1 to C2. According to Theorem 5.4, the pseudo-concept π1 for C1 is a tuple that can be expressed as {C1, o, C2} and the pseudo-concept π2 for C2 is a tuple that can be expressed as {C2, o, C1}.

Theorem 5.5
If C1 is the name (URI) of a concept, o1 is an inverse functional object property of C1, C2 is the name (URI) of a concept that relates to C1 through o1 and o2 is the inverse property of o1, then the tuple [o2, cardinality 1] is a component of the pseudo-concept of C2.

For example, for the concepts C1 and C2 shown in Figure 12, C1 has an inverse functional object property o1 that connects C1 to C2, and o2 is the inverse property of o1. According to Theorem 5.5, the pseudo-concept π2 for C2 is a tuple that can be expressed as {C2, [o2, cardinality 1], C1}.

Theorem 6.1
If C1 is the name (URI) of a concept, o is an object property of C1, C2 and C3 are the names (URIs) of concepts that relate to C1 through o and b is a Boolean operation (unionOf or intersectionOf) between C2 and C3 for o, then o, C2, b and C3 are components of the pseudo-concept of C1.

For example, for the concepts C1, C2 and C3 shown in Figure 13, C1 has an object property o that connects C1 to C2 and C3, and C2 and C3 are connected with intersectionOf. According to Theorem 6.1, the pseudo-concept π1 for C1 is a tuple that can be expressed as {C1, o, C2, intersectionOf, C3}.

Theorem 6.2
If C1 is the name (URI) of a concept, o is an object property of C1 and C2 is the name (URI) of a concept that relates to C1 through the complement of o, then complementOf C2 is a component of the pseudo-concept of C1.


Figure 12. Example of ontology concepts with an inverse functional object property.

Figure 13. Example of ontology concepts connected with a Boolean operation (unionOf or intersectionOf ).

Figure 14. Example of ontology concepts connected with a complementOf operation.

For example, for the concepts C1 and C2 shown in Figure 14, C1 has an object property o that connects C1 to the complement of C2. According to Theorem 6.2, the pseudo-concept π1 for C1 is a tuple that can be expressed as {C1, o, complementOf C2}.
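The toy sketch below walks through a few of these theorems for a single concept. It is our own illustration, not the authors' implementation: the input is assumed to be a plain dictionary already prepared by an OWL parser, and inherited properties are assumed to have been copied down from the superclasses beforehand (Theorems 2.2 and 3.2).

```python
def build_pseudo_concept(concept):
    """A toy application of Theorems 1, 2.1, 3.1 and 4.1/4.2: the concept name,
    each (property, restriction) pair and each related concept are appended in
    order to the pseudo-concept tuple."""
    components = [concept["name"]]                                        # Theorem 1
    for prop, restriction in concept.get("datatype_properties", []):
        components.append([prop, restriction] if restriction else prop)  # Theorems 2.1, 4.1, 5.1
    for prop, restriction, target in concept.get("object_properties", []):
        components.append([prop, restriction] if restriction else prop)  # Theorems 3.1, 4.2, 5.2
        components.append(target)
    return components

# the concept of Figure 7: an object property o restricted by someValuesFrom and cardinality 1
print(build_pseudo_concept({
    "name": "C1",
    "object_properties": [("o", "someValuesFrom", "C2"), ("o", "cardinality 1", "C2")],
}))   # -> ['C1', ['o', 'someValuesFrom'], 'C2', ['o', 'cardinality 1'], 'C2']
```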


4. CONTEXT-AWARE SEMANTIC SIMILARITY MODEL

As described in the previous section, the ontology conversion process has the following two advantages:

• Each ontology concept is converted to a pseudo-concept, which is a tuple of plain texts. Since the pseudo-concepts include almost all the features of the ontology concepts, it is possible to measure the similarity between concepts based on the contexts of their pseudo-concepts.

• An ontology with a complicated structure can be simplified to a lightweight ontology by means of the conversion process. The taxonomic lightweight ontology enables the adoption of existing semantic similarity models to measure the similarity between concepts.

In this section, we propose a hybrid semantic similarity model that assesses concept similarity from the two perspectives above. This model integrates a pseudo-concept-based semantic similarity model and a lightweight ontology structure-based semantic similarity model, which are introduced, respectively, in Sections 4.1 and 4.2.

4.1. Pseudo-concept-based semantic similarity model

In the IR field, a usual method to measure the similarity between two corpora is the cosine correlation, which can be mathematically expressed as follows:

sim_cosθ(x, y) = x · y / (‖x‖ ‖y‖)   (17)

where each corpus is represented by a vector in which each dimension corresponds to a separate term, and the weight of each term in the vector can be obtained by the TF-IDF scheme.
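A minimal sketch of Equation (17) over sparse term-weight vectors (our own illustration; the dictionaries shown are assumed inputs) is:

```python
import math

def cosine_similarity(weights1, weights2):
    """Equation (17): cosine of the angle between two term-weight vectors,
    each given as a dict mapping a term to its TF-IDF weight."""
    dot = sum(w * weights2.get(term, 0.0) for term, w in weights1.items())
    norm1 = math.sqrt(sum(w * w for w in weights1.values()))
    norm2 = math.sqrt(sum(w * w for w in weights2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

print(cosine_similarity({"a": 0.6, "b": 0.8}, {"a": 0.6, "c": 0.8}))   # 0.36
```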

In this research, in order to measure the similarity between two pseudo-concepts, we adopt the cosine correlation aligned with the pseudo-concept model displayed in Equation (16). The pseudo-concept model has some special features, which can be described as follows:

• Each component is separated by a comma and is viewed as a basic unit for the measure. For example, in the property tuple [o, cardinality 1], cardinality 1 is treated as a whole in the measure.

• The property tuples have the following features:

◦ Each property tuple contains no more than two items, namely a property and a restriction (if necessary).

◦ The weights of the terms occurring in each property tuple should be averaged, as a property tuple should be treated the same as the other single items in a pseudo-concept tuple in the measure. For example, in the tuple [o, someValuesFrom], if the TF-IDF weight of o is 0.56 and that of someValuesFrom is 0.44, then their actual weights should be 0.28 and 0.22, so that the combined weight of the tuple is 0.5.

◦ In each tuple, the property has priority over its affiliated restriction in the measure, since the restriction is a modifier of the property. In other words, if there are two property tuples whose properties are different and whose restrictions are the same, then there is no similarity between the two property tuples. For example, if a pseudo-concept π1 has a tuple [o1, someValuesFrom] and another pseudo-concept π2 has a tuple [o2, someValuesFrom], the similarity value between the two tuples is 0 as o1 ≠ o2.

In accordance with the features of the pseudo-concept model, we design an enhanced cosine correlation model to implement the similarity measure, which is displayed in Figure 15.
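Since the pseudo-code of Figure 15 is not reproduced in this transcript, the following is only one hedged reading of the rules listed above, with all function names and data shapes assumed by us: plain components keep their TF-IDF weight, the two items of a property tuple are rescaled so the tuple together carries the weight of a single component (0.5, as in the worked example), and a restriction is keyed by its property so that identical restrictions under different properties contribute nothing to the cosine.

```python
def pseudo_concept_vector(components, tfidf):
    """Build a term-weight vector for one pseudo-concept under the assumptions
    stated above; the result can be fed to cosine_similarity() of Equation (17)."""
    vector = {}
    for component in components:
        if isinstance(component, list):                     # a [property, restriction] tuple
            prop, restriction = component
            total = tfidf.get(prop, 0.0) + tfidf.get(restriction, 0.0)
            scale = 0.5 / total if total else 0.0            # tuple weighs as much as one component
            vector[prop] = vector.get(prop, 0.0) + scale * tfidf.get(prop, 0.0)
            key = (prop, restriction)                        # property has priority over restriction
            vector[key] = vector.get(key, 0.0) + scale * tfidf.get(restriction, 0.0)
        else:                                                # concept name, related concept, Boolean operation
            vector[component] = vector.get(component, 0.0) + tfidf.get(component, 0.0)
    return vector

tfidf = {"C1": 0.3, "o": 0.56, "someValuesFrom": 0.44, "C2": 0.3}
print(pseudo_concept_vector(["C1", ["o", "someValuesFrom"], "C2"], tfidf))
# -> {'C1': 0.3, 'o': 0.28, ('o', 'someValuesFrom'): 0.22, 'C2': 0.3}
```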

4.2. Lightweight ontology structure-based semantic similarity model

As mentioned previously, the lightweight ontology structure enables the use of existing semantic similarity models in the ontology environment. Here we adopt Resnik's node-based model (Equation (7)) for the lightweight ontology-based semantic similarity measure.


Figure 15. Pseudo-code of the pseudo-concept-based semantic similarity model.

Nevertheless, one limitation of Resnik's model is that its interval is [0, ∞). To make it consistent with the interval of the cosine correlation, we normalize Resnik's model as

|sim_Resnik(π1, π2)| = max_{π ∈ S(π1,π2)} [−log(P(π))] / max_{π ∈ Π} [−log(P(π))]   if π1 ≠ π2
|sim_Resnik(π1, π2)| = 1                                                            if π1 = π2   (18)

where Π is the collection of concepts in the lightweight ontology.
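A sketch of Equation (18) over a toy lightweight ontology (the data structures are assumptions made for illustration; in practice they would be derived from the converted taxonomy) might look as follows:

```python
import math

def normalised_resnik(p1, p2, ancestors, probability):
    """Equation (18): Resnik similarity divided by the largest information content
    in the lightweight ontology, so the value falls in [0, 1]. `ancestors` maps each
    pseudo-concept to the set of pseudo-concepts subsuming it (including itself);
    `probability` maps each pseudo-concept to P(pi)."""
    if p1 == p2:
        return 1.0
    ic = lambda c: -math.log(probability[c])
    shared = ancestors[p1] & ancestors[p2]            # S(p1, p2)
    max_ic = max(ic(c) for c in probability)          # largest IC in the whole ontology
    return max(ic(c) for c in shared) / max_ic if shared else 0.0

# toy lightweight ontology: the root subsumes a and b
ancestors = {"a": {"a", "root"}, "b": {"b", "root"}}
probability = {"root": 1.0, "a": 0.4, "b": 0.6}
print(normalised_resnik("a", "b", ancestors, probability))   # 0.0: only the root is shared
```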

4.3. Hybrid semantic similarity model

Here we combine the two semantic similarity models above by means of a weighted arithmetic mean, which can be expressed as

sim(C1, C2) = (1 − λ) × sim_cosθ(π1, π2) + λ × |sim_Resnik(π1, π2)|   (19)

where 0 ≤ λ ≤ 1.
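The combination itself is a one-liner; the sketch below (function name assumed, `lam` standing for the weighting parameter λ) applies Equation (19) to the two component similarities:

```python
def hybrid_similarity(cosine_value, resnik_value, lam):
    """Equation (19): weighted arithmetic mean of the pseudo-concept-based cosine
    similarity and the normalised Resnik similarity; lam is the parameter λ, 0 <= lam <= 1."""
    assert 0.0 <= lam <= 1.0
    return (1 - lam) * cosine_value + lam * resnik_value

# e.g. with the weighting reported for Table III (lambda = 0.4)
print(hybrid_similarity(0.8, 0.6, lam=0.4))   # 0.72
```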


5. EVALUATION

5.1. Performance indicators

In order to empirically compare our proposed model with the existing models, we utilize six of the most widely used performance indicators from the IR field as the evaluation metrics. The performance indicators in this experiment are defined as follows:

Precision. Precision in the IR field is used to measure the preciseness of a search system [41]. Precision for a single concept refers to the proportion of matched and logically similar concepts among all concepts matched to this concept, which can be represented by the following equation:

Precision(S) = Number of matched and logically similar concepts / Number of matched concepts   (20)

With regard to the whole collection of concepts in an ontology, the total precision is the sum of the precision values for each concept normalized by the number of concepts in the collection, which can be represented by the following equation:

Precision(T) = Σ_{i=1}^{n} Precision(S_i) / n   (21)

Mean average precision. Before we introduce the definition of the mean average precision, the concept of average precision should be defined. Average precision for a single concept is the average of the precision values obtained after truncating the ranked concept list matched to this concept after each of its logically similar concepts [41]. This indicator rewards returning more logically similar concepts earlier, and can be represented as:

Average precision(S) = Sum(Precision @ each logically similar concept in a list) / Number of matched and logically similar concepts in a list   (22)

Mean average precision refers to the average of the average precision values over the collection of concepts in an ontology, which can be represented as:

Mean average precision = Σ_{i=1}^{n} Average precision(S_i) / n   (23)

Recall. Recall in the IR field is used to measure the effectiveness of a search system [41]. Recall for a single concept is the proportion of matched and logically similar concepts among all concepts that are logically similar to this concept, which can be represented by the following equation:

Recall(S) = Number of matched and logically similar concepts / Number of logically similar concepts   (24)

With regard to the whole collection of concepts in an ontology, the total recall is the sum of the recall values for each concept normalized by the number of concepts in the collection, which can be represented by the following equation:

Recall(T) = Σ_{i=1}^{n} Recall(S_i) / n   (25)

F-measure. F-measure in the IR field is used as an aggregated performance scale for a search system [41]. In this experiment, F-measure is the harmonic mean of precision and recall, which can be represented as:

F-measure = 2 × Precision × Recall / (Precision + Recall)   (26)


When the F-measure value reaches its highest level, it means that the aggregated value of precision and recall reaches its highest level at the same time.

F-measure_β. F-measure_β is another measure that combines precision and recall; the difference is that users can specify a preference for recall or precision by configuring different weights [42]. In this experiment, we employ F-measure(β = 2), which weights recall twice as much as precision. This is close to the fact that most search engines are concerned more with recall than precision, as a result of most users' purposes in obtaining information [43]. F-measure(β = 2) can be represented as:

F-measure(β = 2) = (1 + β²) × Precision × Recall / (β² × Precision + Recall) = 5 × Precision × Recall / (4 × Precision + Recall)   (27)

All of the above indicators share the same limitation: they do not consider the number of non-logically similar concepts in the matched concept collection of a concept. Furthermore, if there is no logically similar concept in the matched collection, recall cannot be defined. To resolve this issue, we need another performance indicator: fallout. In this experiment, fallout for a single concept is the proportion of non-logically similar concepts matched by this concept in the whole collection of non-logically similar concepts for this concept [41], which can be represented as:

Fallout(S) = Number of matched and non-logically similar concepts / Number of non-logically similar concepts   (28)

With regard to the whole collection of concepts, the total fallout value is the sum of the fallout values for each concept normalized by the number of concepts in the ontology, which can be represented as:

Fallout(T) = Σ_{i=1}^{n} Fallout(S_i) / n   (29)

In contrast to the other performance indicators, the lower the fallout value, the better the search performance.
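The per-concept indicators can be computed directly from the matched and relevant sets; the sketch below is our own illustration of Equations (20), (24), (26), (27) and (28), with all names assumed.

```python
def evaluate_concept(matched, relevant, all_concepts):
    """Per-concept precision, recall, F-measure, F-measure(beta=2) and fallout:
    `matched` is the set of concepts returned for a concept, `relevant` the set of
    its logically similar concepts and `all_concepts` the whole collection."""
    hits = matched & relevant
    precision = len(hits) / len(matched) if matched else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    f2 = 5 * precision * recall / (4 * precision + recall) if precision + recall else 0.0
    non_relevant = all_concepts - relevant
    fallout = len(matched - relevant) / len(non_relevant) if non_relevant else 0.0
    return precision, recall, f1, f2, fallout

# e.g. 3 concepts matched, 2 of them logically similar, 4 similar concepts in total
print(evaluate_concept({"a", "b", "c"}, {"a", "b", "d", "e"}, set("abcdefgh")))
```

The totals of Equations (21), (23), (25) and (29) are then simply the averages of these per-concept values over all concepts in the ontology.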

5.2. Experiments

In this experiment, we empirically evaluate the proposed model by comparing its performance with that of the existing semantic similarity models, in terms of the performance indicators introduced above. For the evaluation we choose several typical semantic similarity models, including Rada's model (Equation (3)) from the edge-based models, Resnik's model (Equation (7)) and Lin's model (Equation (8)) from the node-based models, and Jiang and Conrath's model (Equation (14)) from the hybrid models. In order to obtain precise data, we implement the subsequent experiments in a large-scale knowledge base: a health service ontology, which is a conceptualization and shared vocabulary of the available health services. The ontology consists of more than 200 concepts and around 10 000 instances, and its details can be found in [44].

In the IR field, when a query is sent to a search system, a list of results with similarity values is returned from the system. The search system then needs to decide an optimal threshold value that is used to filter out irrelevant results with lower similarity values, in order to obtain the best performance [45–47]. Analogously, in our subsequent experiments, because the performance of each model differs at different threshold values, we need to find the optimal threshold value at which each model achieves its best performance. Consequently, for each model, we start the threshold value at 0 and increase it by 0.05 each time until 0.95, since the intervals of all the models are between 0 and 1 except for Resnik's model, whose interval is between 0 and infinity. To deal with this problem, we adopt the normalized Resnik model (Equation (18)) in place of Resnik's model (Equation (7)), because the former behaves the same as the latter but its interval is between 0 and 1. Subsequently, we obtain the performance data for each model at each threshold value.
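The threshold sweep described above can be sketched as follows (our own illustration; `match_at` and `f_measure_of` are hypothetical callbacks standing in for the matching and scoring steps of the experiment):

```python
def best_threshold(match_at, f_measure_of, step=0.05):
    """Sweep the threshold from 0 to 0.95 in steps of 0.05 and keep the value
    giving the highest F-measure. `match_at(t)` is assumed to return the
    matched-concept sets obtained with threshold t, and `f_measure_of(matches)`
    the resulting overall F-measure."""
    best_t, best_f = None, -1.0
    t = 0.0
    while t <= 0.95 + 1e-9:
        f = f_measure_of(match_at(t))
        if f > best_f:
            best_t, best_f = t, f
        t = round(t + step, 2)
    return best_t, best_f
```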


Figure 16. Variation of F-measure values of the four models on threshold values.

Since the F-measure and F-measure(β = 2) are two aggregated metrics, we decide to use them as the primary benchmarks for seeking the optimal threshold value. Figures 16 and 17, respectively, show the variation of the F-measure values and the F-measure(β = 2) values of the four candidate models over different threshold values.

Based on the two figures above, we choose the optimal threshold value for each candidate model, at which each model obtains its highest F-measure value and F-measure(β = 2) value. Following that, we need to acquire the optimal threshold value for the proposed model. Since our model is based on a weighted arithmetic mean, we also need to find the optimal λ value at which the model achieves its best performance. Figures 18 and 19, respectively, show the variation of the F-measure values and the F-measure(β = 2) values of our model over the threshold value and the λ value.

Eventually, we choose the optimal threshold values for each model based on the highest F-measure value and the highest F-measure(β = 2) value, respectively, which are shown in Tables III and IV. Subsequently, we horizontally compare their performance based on the six indicators.

First of all, the performance of the five models at the highest F-measure value is depicted in Table III. It is observed that our model has a significant advantage over the other models in terms of precision, recall and F-measure, in addition to holding second position on mean average precision and fallout.

Second, the performance of the five models at the highest F-measure(β = 2) value is displayed in Table IV. Similar to Table III, our model stands in first position on precision, recall and F-measure(β = 2), and in second position on mean average precision and fallout.

Based on the two comparisons, it can be deduced that our model performs better than the other models in this experiment. Therefore, these experiments preliminarily prove the proposed model.

The reason that the statistical figures are relatively low for these models is that we determine the answer set for each concept based on human judgment. For a large number of concepts within the health service ontology, the answer sets are empty, since these concepts are unique and there are no logically similar concepts for them. These concepts lower the average performance of the models.


Figure 17. Variation of F-measure(β = 2) values of the four models on threshold values.

Figure 18. Variation of F-measure values of Dong et al.'s model on threshold values and λ values.

6. CONCLUSION

In this paper, by observing the features of the existing semantic similarity models, we find two limitations in these models when applying them in the ontology environment: (1) the models ignore the context of relations and (2) they ignore the context of ontology concepts. In order to resolve the two issues, we design a novel solution, including an ontology conversion process and a hybrid semantic similarity model.


Figure 19. Variation of F-measure(β = 2) values of Dong et al.'s model on threshold values and λ values.

Table III. Performance of the five models on the highest F-measure values.

Model names                   | Optimal threshold values | Precision (%) | Mean average precision (%) | Recall (%) | Fallout (%) | F-measure (%)
Rada's model                  | >0.5                     | 13.57         | 44.00                      | 52.41      | 11.89       | 21.55
Resnik's model                | >0.9                     | 25.60         | 67.25                      | 34.50      | 2.82        | 29.39
Lin's model                   | >0.35                    | 18.79         | 61.55                      | 43.13      | 5.86        | 26.17
Jiang and Conrath's model     | >0.15                    | 22.97         | 90.68                      | 19.55      | 0.80        | 21.12
Dong et al.'s model (λ = 0.4) | >0.25                    | 40.23         | 73.44                      | 54.26      | 2.02        | 46.20

Table IV. Performance of the five models on the highest F-measure(β = 2) values.

Model names                   | Optimal threshold values | Precision (%) | Mean average precision (%) | Recall (%) | Fallout (%) | F-measure(β = 2) (%)
Rada's model                  | >0.5                     | 13.57         | 44.00                      | 52.41      | 11.89       | 33.33
Resnik's model                | >0.4                     | 16.32         | 46.45                      | 53.70      | 11.99       | 36.83
Lin's model                   | >0.25                    | 14.31         | 47.30                      | 54.58      | 12.46       | 22.80
Jiang and Conrath's model     | >0                       | 17.81         | 82.67                      | 24.52      | 1.69        | 34.92
Dong et al.'s model (λ = 0.3) | >0.15                    | 30.64         | 68.17                      | 71.12      | 3.51        | 56.26

The ontology conversion process aims at encapsulating the context of relations and ontology concepts into the body of a pseudo-concept, and at transforming an ontology with a complicated structure into a simple lightweight ontology. In order to cope with the various properties, restrictions and characteristics of properties in OWL Lite/DL-annotated semantic web documents, we define a set of theorems for the conversion process. Next, we provide a hybrid semantic similarity model, which includes an enhanced cosine correlation model to compute the similarity between two concepts from the perspective of the pseudo-concept context, and a normalized Resnik model to calculate the similarity from the perspective of the lightweight ontology structure. Eventually, we use a weighted arithmetic mean to combine


the two similarity measures. In order to validate the model, we implement it in a large-scale knowledge base: a health service ontology. Based on the six performance indicators adopted from the IR field, we compare our model with four other typical models: Rada's model from the edge-based models, Resnik's model and Lin's model from the node-based models, and Jiang and Conrath's model from the hybrid models. The experimental results show that our model performs better than the other four models, which preliminarily proves its feasibility.

Future work will concentrate mainly on the following three aspects: (1) we will evaluate our model using other large-scale knowledge bases; (2) we will enhance the semantic similarity model by considering more factors in the similarity computation; and (3) we will enhance the ontology conversion process to better represent the features of the context of properties and the restrictions and characteristics of properties.

REFERENCES

1. Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics 2006; 40(3):288–299.
2. Miller G, Charles W. Contextual correlates of semantic similarity. Language and Cognitive Processes 1991; 6(1):1–28.
3. Rubenstein H, Goodenough JB. Contextual correlates of synonymy. Communications of the ACM 1965; 8(10):627–633.
4. Rada R, Mili H, Bicknell E, Blettner M. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics 1989; 19(1):17–30.
5. Hliaoutakis A, Varelas G, Voutsakis E, Petrakis EGM, Milios EE. Information retrieval by semantic similarity. International Journal on Semantic Web and Information Systems 2006; 2(3):55–73.
6. Lee J, Kim M, Lee Y. Information retrieval based on conceptual distance in IS-A hierarchies. Journal of Documentation 1993; 49(2):188–207.
7. Song W, Li CH, Park SC. Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures. Expert Systems with Applications 2009; 36(5):9095–9104.
8. Srihari RK, Zhang ZF, Rao AB. Intelligent indexing and semantic retrieval of multimodal documents. Information Retrieval 2000; 2:245–275.
9. Sussna M. Word sense disambiguation for free-text indexing using a massive semantic network. Proceedings of the Second International Conference on Information and Knowledge Management (CIKM '93). ACM: Washington, 1993; 67–74.

10. Resnik P. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research 1999; 11:95–130.
11. Li Y, Bandar ZA, McLean D. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering 2003; 15(4):871–882.
12. Lin D. Automatic retrieval and clustering of similar words. Proceedings of the 17th International Conference on Computational Linguistics (COLING 98), Montreal, Que., Canada. ACM: New York, 1998; 768–774.
13. Rosenfield R. A maximum entropy approach to adaptive statistical modelling. Computer Speech and Language 1996; 10:187–228.
14. Tang Y, Zheng J. Linguistic modelling based on semantic similarity relation among linguistic labels. Fuzzy Sets and Systems 2006; 157(12):1662–1673.
15. Steichen O, Bozec CD, Thieu M, Zapletal E, Jaulent MC. Computation of semantic similarity within an ontology of breast pathology to assist inter-observer consensus. Computers in Biology and Medicine 2006; 36(7):768–788.
16. Chiang J-H, Ho S-H, Wang W-H. Similar genes discovery system (SGDS): application for predicting possible pathways by using GO semantic similarity measure. Expert Systems with Applications 2008; 35(3):1115–1121.
17. Couto FM, Silva MJ, Coutinho PM. Measuring semantic similarity between Gene Ontology terms. Data and Knowledge Engineering 2007; 61(1):137–152.
18. Othman RM, Deris S, Illias RM. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences. Journal of Biomedical Informatics 2008; 41(1):65–81.
19. Sevilla JL, Segura VC, Podhorski A, Guruceaga E, Mato JM, Martinez-Cruz LA, Corrales FJ, Rubio A. Correlation between gene expression and GO semantic similarity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2005; 2(4):330–338.
20. Liu M, Shen W, Hao Q, Yan J. An weighted ontology-based semantic similarity algorithm for web service. Expert Systems with Applications 2009; 36(10):12480–12490.
21. Bhatt M, Flahive A, Wouters C, Rahayu JW, Taniar D, Dillon TS. A distributed approach to sub-ontology extraction. Proceedings of the 18th International Conference on Advanced Information Networking and Applications (AINA 2004), Fukuoka, Japan. IEEE Computer Society: Silver Spring, MD, 2004; 636–641.
22. Bhatt M, Wouters C, Flahive A, Rahayu JW, Taniar D. Semantic completeness in sub-ontology extraction using distributed methods. Computational Science and Its Applications—ICCSA 2004, Assisi, Italy, Laganà A, Gavrilova ML, Kumar V, Mun Y, Tan CJK, Gervasi O (eds.). Springer: Berlin, 2004; 508–517.


23. Bhatt M, Flahive A, Wouters C, Rahayu JW, Taniar D. MOVE: A distributed framework for materialized ontology view extraction. Algorithmica 2006; 45(3):457–481.
24. Sahami M, Heilman T. A web-based kernel function for measuring the similarity of short text snippets. Proceedings of the 15th International World Wide Web Conference (WWW 2006), Edinburgh, U.K. ACM: New York, 2006; 377–386.
25. Feng CC, Flewelling DM. Assessment of semantic similarity between land use/land cover classification systems. Computers, Environment and Urban Systems 2004; 28(3):229–246.
26. Gruber T. A translation approach to portable ontology specifications. Knowledge Acquisition 1995; 5(2):199–220.
27. McGuinness DL, Harmelen Fv. OWL web ontology language overview: W3C recommendation, W3C, 10 February 2004. Available at: http://www.w3.org/TR/2004/REC-owl-features-20040210/ [10 October 2009].
28. Sowa JF. Semantic networks. Encyclopedia of Artificial Intelligence, Shapiro SC (ed.). Wiley: New York, 1992.
29. Leacock C, Chodorow M. Combining local context and WordNet similarity for word sense identification. WordNet: An Electronic Lexical Database, Fellbaum C (ed.). Wiley: New York, 1995; 265–283.
30. Wu Z, Palmer M. Verb semantics and lexical selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics: Las Cruces, New Mexico, U.S.A., 1994; 133–138.

31. Hirst G, St-Onge D. Lexical chains as representations of context for the detection and correction of malapropisms. WordNet: An Electronic Lexical Database, Fellbaum C (ed.). The MIT Press: Cambridge, MA, U.S.A., 1998; 305–331.
32. Richardson R, Smeaton AF. Using WordNet in a knowledge-based approach to information retrieval. Dublin City University, Dublin, 1995.
33. Lin D. An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML '98), Madison, WI, U.S.A. Morgan Kaufmann: Los Altos, CA, 1998; 296–304.
34. Pirro G. A semantic similarity metric combining features and intrinsic information content. Data and Knowledge Engineering 2009; 68(11):1289–1308.
35. Jiang JJ, Conrath DW. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X), Taiwan, 1997; 19–33.
36. Maguitman A, Menczer F, Roinestad H, Vespignani A. Algorithmic detection of semantic similarity. Proceedings of the 14th International Conference on World Wide Web (WWW 2005), Chiba, Japan. ACM: New York, 2005; 107–116.
37. Zuber VS, Faltings B. OSS: A semantic similarity function based on hierarchical ontologies. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India. Morgan Kaufmann: Los Altos, CA, 2007; 551–556.
38. Tversky A. Features of similarity. Psychological Review 1977; 84(2):327–352.
39. Seco N. Computational models of similarity in lexical ontologies. Master's Thesis, University College Dublin, Dublin, 2005.
40. Dong H, Hussain FK, Chang E. A hybrid concept similarity measure model for ontology environment. On the Move to Meaningful Internet Systems: OTM 2009 Workshops, Vilamoura, Portugal, Meersman R, Tari Z, Herrero P (eds.). Springer: Berlin, 2009; 848–857.
41. Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval. ACM Press: Harlow, 1999.
42. van Rijsbergen CJ. Information Retrieval. Butterworths: London, 1979.
43. Su LT. The relevance of recall and precision in user evaluation. Journal of the American Society for Information Science and Technology 1999; 45(3):207–217.
44. Dong H, Hussain FK, Chang E. A framework for discovering and classifying ubiquitous services in digital health ecosystems. Journal of Computer and System Sciences 2010; DOI: 10.1016/j.jcss.2010.02.009.
45. Fahringer T, Jugravu A, Pllana S, Prodan R, Junior CS, Truong H-L. ASKALON: A tool set for cluster and grid computing. Concurrency and Computation: Practice and Experience 2005; 17(2–4):143–169.
46. Schwartz C. Web search engines. Journal of the American Society for Information Science 1998; 49(11):973–982.
47. Silvestri F, Puppin D, Laforenza D, Orlando S. Toward a search architecture for software components. Concurrency and Computation: Practice and Experience 2006; 18(10):1317–1331.


