

Information Systems

Information Systems 38 (2013) 745–770


journal homepage: www.elsevier.com/locate/infosys

A semantic metrics suite for evaluating modular ontologies

Faezeh Ensan a,*, Weichang Du b

a Sauder School of Business, University of British Columbia, Vancouver, BC, Canada
b Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada

Article info

Available online 3 January 2013

Keywords:

Ontologies

Modular ontologies

Ontology design

Ontology evaluation

Measurements

Semantic metrics

Cohesion

Coupling

Description logics

Reasoning performance

0306-4379/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.is.2012.11.012

* Correspondence to: Suite 303, 18 Pemberton Avenue, Toronto, ON, Canada. Tel.: +1 647 348 8750.
E-mail addresses: [email protected] (F. Ensan), wdu@unb.ca (W. Du).

Abstract

Ontologies, which are formal representations of knowledge within a domain, can be used for designing and sharing conceptual models of enterprise information for the purpose of enhancing understanding, communication and interoperability. For representing a body of knowledge, different ontologies may be designed. Recently, designing ontologies in a modular manner has emerged as a way of achieving better reasoning performance and more efficient ontology management and change handling. One of the important challenges in the employment of ontologies and modular ontologies in modeling information within enterprises is the evaluation of the suitability of an ontology for a domain and of the performance of inference operations over it. In this paper, we present a set of semantic metrics for evaluating ontologies and modular ontologies. These metrics measure cohesion and coupling of ontologies, which are two important notions in the process of assessing ontologies for enterprise modeling. The proposed metrics are based on semantic definitions of relativeness and of dependencies between local symbols, and also between local and external symbols, of ontologies. Based on these semantic definitions, not only the explicitly asserted knowledge in ontologies but also the implied knowledge, which is derived through inference, is considered for the sake of ontology assessment. We present several empirical case studies for investigating the correlation between the proposed metrics and reasoning performance, which is an important issue for the applicability of ontologies in real-world information systems.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Semantic Web techniques have been increasingly applied to real-world problems in recent years. The Ontology for Biomedical Investigations (OBI) [45], the Gene and Gene products ontology [1] and the Earth and environmental terminologies (SWEET Ontologies) [43] are among the large ontologies used for describing complex domains. So far, numerous techniques, frameworks and methodologies for ontology development and management [15,17,14] and ontology description [37,30] have been proposed. Among ontology description languages, those based on Description Logics (DLs) have been shown to provide a strong basis for defining and formalizing ontologies. OWL-DL is an instance of DL-based languages, which has been widely used for representing ontologies. Various reasoning algorithms [4], query answering techniques [21] and reasoning engines [27,44] facilitate the employment of DL-based ontologies.

For describing a given domain of discourse, different ontology designs are often feasible. One approach to ontology design is monolithic design, i.e., all captured concepts, roles, axioms and assertions of the domain are gathered in one monolithic ontology. For instance, all the ontology concepts and roles are placed in a single OWL file in the case of OWL-DL ontologies. The second approach is to use a modularization strategy, i.e., the domain is collectively described using a set of ontologies, called ontology modules, each of which describes a sub-domain of discourse. Sub-domains of a domain may have dependencies on one another. Hence, their corresponding ontology modules cannot be represented independently. An ontology module may need to refer to symbols (classes, roles and individuals) of other modules. OWL-DL has an "import" feature that can be used for integrating different ontology modules and allows them to have references to external symbols from other modules. Using the import feature, an ontology module is able to import the whole knowledge base of one or more ontology modules. Modular ontology formalisms such as DDL [8], E-connections [33], P-DL [5] and IBF [16] provide alternative ways for defining and integrating ontology modules. These formalisms provide new extensions to existing Description Logics syntax and semantics. The objective of these formalisms is to allow ontology modules to evolve independently and the reasoning to be performed only on the relevant parts of the knowledge base.

The emergence of new technologies for developing modular ontologies [49] highlights the increasing need for appropriate metrics for evaluating modular ontologies. A clear set of such metrics for ontology evaluation can facilitate the comparison and selection of ontologies for different applications. It also enhances the process of modular ontology design and development by providing unambiguous guidelines for ontology developers. As yet, no widely accepted metric suites have been introduced for assessing modular ontologies.

1.1. Related works

A considerable amount of research has been conducted on the evaluation of ontologies [9]. One important approach for evaluating ontologies is to investigate their completeness, correctness and coverage with regard to a domain of discourse. This investigation can be accomplished by comparing ontologies with a gold standard, which is manually designed for the domain of discourse [36,10]. In Spyns et al. [46], ontology triples are evaluated by analyzing how appropriately they cover the topic of a corpus. The notions of precision and recall are employed for measuring domain coverage. The work of Brewster et al. [11] follows a similar approach for investigating how well a given ontology fits the domain knowledge. In their work, the concepts and properties of the ontology are compared with the terms in domain-specific documents, and the overlaps are analyzed. In addition, a probabilistic method is employed for finding the best ontology for a given domain-specific corpus.

Another approach for ontology evaluation is task-based evaluation. According to this approach, an ontology is analyzed to find out how effective it is in the performance of a well-defined task, i.e., whether the use of a given ontology in a specific task affects the efficiency of the task in that context or not [41]. In Lozano-Tello and Gomez-Perez [34], a different approach for ontology evaluation is proposed. There, a set of characteristics of ontologies, which should be considered by human experts for assessment and evaluation, is introduced and discussed. These characteristics concern the content of an ontology, the language in which the ontology is implemented, the methodology and software tool that are employed for developing the ontology, and the cost of using the ontology.

The importance of metrics for measuring the quality of ontologies is widely acknowledged in the literature. Gangemi et al. [20] propose measures for assessing the structural, functional and usability-profiling dimensions of ontologies. The functional dimension focuses on the application of an ontology in a context and the usability-profiling dimension focuses on profiling and metadata. The structural dimension is concerned with the syntax and formal semantics of ontologies and is based on the representation of ontologies as graphs. Some of the proposed structural metrics are the depth and breadth of the ontology graph, the cardinality of sibling nodes and leaf nodes, the average rate of axioms per class, consistency ratio, cycle ratios and inverse relation ratio.

In Burton-Jones et al. [12], a metric suite is proposed for ontology assessment. These metrics measure the syntactic, semantic, pragmatic and social qualities of an ontology. Syntactic quality is concerned with the proportion of the features of the ontology language that are utilized for representing an ontology and the degree of incorrect syntax within that ontology. Semantic quality focuses on the meaning of the ontology terms. Through semantic quality metrics, the meanings of the ontology terms are looked up in a lexical database such as WordNet to see if the utilized terms are meaningful and clear. In addition, the inconsistent usage of terms is measured. Pragmatic quality evaluates the usefulness of a given ontology through its true statements, its relevance to the application requirements and also its size. Finally, social qualities measure the number of times that a given ontology is accessed by other ontologies.

In Tartir et al. [50], a set of metrics is introduced for assessing the schema and the entire knowledge base (including individuals) of an ontology. Some of these metrics are: relationship richness, which measures the placement of the defined roles in the ontology; inheritance richness, which measures the average number of subclasses per class; attribute richness, which measures the number of attributes per class; and class richness, which measures the distribution of instances between classes. Tartir et al. [50] also introduce a metric for measuring cohesion, which is equal to the number of separate connected components of the graph that represents a given ontology. It is assumed that two nodes (corresponding to two instances) are connected if there is a relationship between their instances. Following a similar approach in considering ontology graphs for the sake of assessment, the work by Zhang et al. [55] proposes a set of metrics for measuring ontology complexity. These metrics measure the number of concepts, relations and paths, the average number of relations per concept, the average path per concept, and the average connectivity degree of each concept.

The work in Wang et al. [53] is among the few existing studies on modular ontology evaluation in the literature. It introduces a set of characteristics that modular ontologies need to have. These characteristics include reusability, encapsulation, loose coupling, authorization, self-containment, scalability, and reasoning support. It analyzes different modular ontology formalisms to find out how they support these characteristics.

Cohesion and coupling are two important measures that can be employed for evaluating modular ontologies. Cohesion refers to the uniformity of an ontology module and coupling refers to the interdependencies that exist between ontology modules. Cohesion and coupling are well-known measurements for evaluating object-oriented designs. In Chidamber and Kemerer [13], two objects are defined as coupled if at least one of them affects the other. Cohesion for an object is defined based on the degree of similarity of its methods. In Kramer and Kaindl [32], a set of metrics is introduced for measuring coupling and cohesion in knowledge-based systems. These metrics measure the cohesion of a frame and the coupling of two or more frames by means of the relations between the slots of the frames that are induced through their common references in rules.

There are a few proposals in the literature that investigate the notions of cohesion and coupling for ontologies. In Yao et al. [54], the cohesion of an ontology is investigated through some structural metrics such as the number of root classes, the number of leaf classes, and the average depth of inheritance. In Stuckenschmidt and Klein [47], the authors introduce the notion of coherent modules and propose a methodology for partitioning a monolithic ontology into a set of coherent modules, where a coherent module contains a set of concepts that are dependent on each other. In this specific work, only those dependencies are considered that can be derived from the structure of a given ontology (not from its semantics). The authors claim that this structural method, unlike semantic methods, can scale up to large ontologies.

In Orme et al. [40], the coupling of an ontology is measured by counting the number of external classes that are used for defining classes and properties in the ontology, the number of references to external classes, and the number of includes in the ontology. In Ma et al. [35], a set of metrics is introduced for measuring the cohesion of ontologies. These metrics are: Number of Ontology Partitions (NOP), Number of Minimally Inconsistent Subsets (NMIS) and Average Value of Axiom Inconsistencies (AVAI). These metrics are obtained based on the semantics of a given ontology rather than its syntax. Since these metrics are introduced for changing ontologies, their focus is mostly on inconsistencies that may be induced by ontology axioms.

1.2. Motivation

Despite the increasing demand for developing ontologies in a modular manner and the considerable amount of research on modular ontologies in recent years [49], only a few works have been reported on evaluating modular ontologies, and even fewer on metrics for evaluating modular ontologies. In the following, we describe the main arguments that motivate our work:

• The main contributions of a modular approach to the process of ontology development are to bring flexibility for component reuse, to support more efficient query answering and to ease component change and evolution. These important perspectives have not received enough attention in the literature on evaluating ontologies. For assessing the quality of a modular ontology, in addition to the criteria currently discussed in the literature, such as domain coverage or possible inconsistencies and syntax errors, we need to evaluate ontologies from a "design-based" perspective, i.e., attempt to perform the following analyses: (understandability) How well can ontologies be understood by a third-party observer? (reusability) How well can a portion of the ontology be understood and shared for being reused in developing ontologies for other domains of discourse? How much non-related knowledge is protected when shared with external users? (evolvability and maintainability) When a portion of the ontology evolves or is versioned, how much are the other parts of the ontology affected? (reasoning performance) How efficient is query answering over the ontology when queries are only concerned with a small portion of the ontology?

• For representing the same body of knowledge, ontologies may be designed in monolithic or modular ways. There may even be more than one modular design, where the number of modules, the distribution of concepts, roles and axioms, and the relationships between ontology modules are organized differently. Although the advantages of modular designs for ontologies over monolithic designs have been well discussed in the literature [48], it is hard to find measurements and metrics for evaluating and comparing various ontology designs. For example, assume that there are two modular designs with different numbers of modules, which represent the same body of knowledge. It is desirable to have a set of metrics that enable us to evaluate these designs and to investigate which one is better with regard to design-based perspectives such as understandability or evolvability.

• Most existing metrics for assessing ontologies are syntax based, i.e., they are based on the graph representation of ontologies or only consider the explicitly asserted axioms in the ontology knowledge bases and do not take the semantics of ontologies into account properly [52]. For evaluating modular ontologies and ontology designs, it would be desirable to have a set of semantic metrics that are independent of the representation and the syntax of the knowledge base. Semantic metrics have the considerable advantage that they consider not only the explicitly asserted axioms in an ontology but also the implied axioms, which are obtained by reasoning.

1.3. Contributions

The objective of this paper is to propose a set of metrics for evaluating DL-based modular and monolithic ontologies from a design-based perspective. For this purpose, we define a set of metrics for ontologies, modular ontologies and also ontology designs. We investigate the notions of cohesion and coupling, which are closely tied to the design-based perspective for the evaluation of ontologies: highly cohesive and loosely coupled ontology modules can be understood, reused and evolved more easily. We define a set of metrics for assessing cohesion and coupling in ontologies and ontology designs. These metrics are defined based on the definitions of dependencies and relativeness between local symbols of ontologies (as a basis for cohesion) and between their local and external symbols (as a basis for coupling). Because the local and external dependencies are defined semantically, all relationships between ontological symbols, whether explicitly asserted or not, are taken into account for the sake of assessment and measurement.

The main contributions of this paper can be enumerated as follows:

• We provide a comprehensive basis for evaluating DL-based ontologies in modular and monolithic designs. We formalize the notion of metrics for evaluating ontology modules and comparing ontology designs.

• We define a set of metrics for measuring cohesion and coupling in ontologies. These metrics are based on precise semantic-based definitions for dependencies between local and external symbols in ontologies. Based on the cohesion and coupling metrics, we define metrics for evaluating different ontology designs.

• We present several studies for analyzing the association between the introduced metrics and the time required for consistency checking and running conjunctive queries over ontology modules. We show that consistency checking and query execution are considerably more efficient in highly cohesive ontology modules.

1.4. Outline

The rest of this paper is organized as follows: Section 2 gives some preliminaries about DL-based ontologies, the Interface-Based modular ontology Formalism (IBF) [16,18], and the query dispatching and integrating policy that we employed for running and testing conjunctive queries over modular designs. We use IBF and also the "owl:imports" mechanism to model modular designs in our case studies. Section 5 is the only section that depends on the content of Sections 2.2 and 2.3, which it uses for presenting the case studies. In Section 3, we provide definitions for ontology modules, monolithic and modular ontology designs, and metrics for evaluating ontology modules and comparing ontology designs. Section 4 introduces cohesion and coupling for evaluating ontology modules and provides a set of metrics for measuring cohesion and coupling in ontology modules and ontology designs. In Section 5 the proposed metrics are investigated with regard to reasoning performance. Section 6 provides some discussion on the metrics and directions for future work. Finally, Section 7 concludes the paper.

2. Background

2.1. DL-based ontologies

An ontology (a DL knowledge base) is defined as $\mathcal{M} = \langle \mathcal{T}, \mathcal{A} \rangle$, where $\mathcal{T}$ denotes the TBox, which comprises a set of General Concept Inclusion axioms (GCIs), and $\mathcal{A}$ stands for the ABox, which is the assertional part of the knowledge base. A GCI is of the form $A \sqsubseteq B$, where $A$ and $B$ can be named concepts (also called atomic concepts) or complex concept descriptions [2]. Complex descriptions are built from concept constructors, role constructors, and named concepts and roles. Different Description Logics provide different sets of concept and role constructors. For example, the Description Logic $\mathcal{ALC}$ has these concept constructors: concept conjunction ($\sqcap$), concept disjunction ($\sqcup$), concept complement ($\neg$), universal restriction ($\forall$), and existential restriction ($\exists$). For a more complete discussion of DL constructors see Baader et al. [3].

The signature of an ontology $\mathcal{M}$, denoted $\mathrm{Sig}(\mathcal{M})$, is the set of all concept names, role names, and individual names in its TBox and ABox. The semantics of an ontology is given by means of interpretations. An interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ consists of a non-empty set $\Delta^{\mathcal{I}}$ and a mapping function $\cdot^{\mathcal{I}}$, which maps each concept $C \in \mathrm{Sig}(\mathcal{M})$ to $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, each role $R \in \mathrm{Sig}(\mathcal{M})$ to $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$ and each individual $a \in \mathrm{Sig}(\mathcal{M})$ to $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$. The function $\cdot^{\mathcal{I}}$ is extended to complex concepts and roles that are built with the constructors of the respective Description Logic [3].

An interpretation $\mathcal{I}$ satisfies a general concept inclusion axiom $C \sqsubseteq D$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$, satisfies an assertion $C(a)$ iff $a^{\mathcal{I}} \in C^{\mathcal{I}}$, and satisfies an assertion $R(x, y)$ iff $\langle x^{\mathcal{I}}, y^{\mathcal{I}} \rangle \in R^{\mathcal{I}}$. The satisfaction relation between $\mathcal{I}$ and the other kinds of axioms, which are defined by more expressive Description Logics such as those that support role hierarchies and inverse roles, can be found in [3,31]. An interpretation $\mathcal{I}$ is a model of a knowledge base $\mathcal{M}$ if it satisfies every axiom and assertion in the TBox and ABox of $\mathcal{M}$. An axiom $\alpha$ is implied by a knowledge base $\mathcal{M}$, denoted $\mathcal{M} \models \alpha$, iff all models of $\mathcal{M}$ satisfy $\alpha$.
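The satisfaction conditions above can be illustrated with a minimal Python sketch. It is only an illustration under strong assumptions: concepts are restricted to atomic names, the interpretation is finite, and the class `Interpretation`, the helper functions and the sample individuals `h1` and `b1` are hypothetical names introduced here. It checks whether one given interpretation is a model; deciding entailment would require reasoning over all models, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """A finite interpretation I = (domain, .^I) over atomic names only."""
    domain: set
    concepts: dict      # concept name -> subset of the domain (C^I)
    roles: dict         # role name -> set of (x, y) pairs (R^I)
    individuals: dict   # individual name -> domain element (a^I)


def satisfies_gci(interp, sub, sup):
    """I satisfies C ⊑ D iff C^I is a subset of D^I (atomic C and D only)."""
    return interp.concepts[sub] <= interp.concepts[sup]


def satisfies_concept_assertion(interp, concept, individual):
    """I satisfies C(a) iff a^I is an element of C^I."""
    return interp.individuals[individual] in interp.concepts[concept]


def satisfies_role_assertion(interp, role, x, y):
    """I satisfies R(x, y) iff the pair (x^I, y^I) is an element of R^I."""
    pair = (interp.individuals[x], interp.individuals[y])
    return pair in interp.roles[role]


def is_model(interp, gcis, concept_assertions, role_assertions):
    """I is a model iff it satisfies every TBox axiom and ABox assertion."""
    return (
        all(satisfies_gci(interp, c, d) for c, d in gcis)
        and all(satisfies_concept_assertion(interp, c, a)
                for c, a in concept_assertions)
        and all(satisfies_role_assertion(interp, r, x, y)
                for r, x, y in role_assertions)
    )


# Hypothetical knowledge base: Hotel ⊑ Accommodation, Hotel(h1), isCloseTo(h1, b1)
gcis = [("Hotel", "Accommodation")]
concept_assertions = [("Hotel", "h1")]
role_assertions = [("isCloseTo", "h1", "b1")]

good = Interpretation(
    domain={1, 2},
    concepts={"Hotel": {1}, "Accommodation": {1}},
    roles={"isCloseTo": {(1, 2)}},
    individuals={"h1": 1, "b1": 2},
)
bad = Interpretation(
    domain={1, 2},
    concepts={"Hotel": {1}, "Accommodation": set()},  # violates the GCI
    roles={"isCloseTo": {(1, 2)}},
    individuals={"h1": 1, "b1": 2},
)

print(is_model(good, gcis, concept_assertions, role_assertions))  # True
print(is_model(bad, gcis, concept_assertions, role_assertions))   # False
```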

2.2. Interface-based modular ontology formalism

The Interface-Based modular ontology Formalism (IBF) [16,18] is a formalism for developing modular ontologies. In this formalism, a modular ontology is defined as a set of ontology modules and interfaces, and the dependencies between modules are modeled through interfaces. An ontology module can either realize or utilize an interface. Realizer modules export the public part of their TBox through interfaces to be reused by utilizer modules. A utilizer module augments its knowledge base with the definitions and assertions provided by the realizer modules. The definition of a modular ontology includes a configuration function that specifies which utilizer and realizer modules should be connected to each other. In IBF, ontology modules communicate through interfaces and not directly, which is an important step towards reducing tight coupling between ontologies. IBF follows an augmentation approach for sharing ABox assertions between ontology modules. Accordingly, IBF uses conjunctive or epistemic queries to retrieve all individuals of interface concepts and roles from a realizer module and augments the domain of the utilizer module with these individuals.

Fig. 1. An IBF modular ontology for academic people and publications.

Formal definitions for IBF syntax and semantics can be found in Ensan and Du [18]. Here, we just describe its application for defining modular ontologies through an example. Fig. 1 shows an IBF modular ontology. This modular ontology comprises two ontology modules, OM-Publication and OM-Person, and one interface, Inf-Person. OM-Person is an ontology module that describes people in an academic context. OM-Publication describes academic publications such as conference papers, books and manuals.¹ OM-Person realizes the interface while OM-Publication utilizes it. OM-Publication uses the symbol in Inf-Person in its TBox in the axiom $\top \sqsubseteq \forall\, \text{publicationAuthor}.\text{Inf-Person:Person}$. According to IBF, for augmentation, OM-Publication should pose queries over OM-Person for the concepts and roles of the interface (in this case there is only one interface concept, which is Person). After retrieving the individuals, the following assertions are inserted into the OM-Publication ontology module: Inf-Person:Person(John), Inf-Person:Person(Mary), Inf-Person:Person(Elizabeth).
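The augmentation step in this example can be sketched in a few lines of Python. This is a simplified illustration and not the IBF implementation: the function `augment` and the representation of an ABox as a set of (concept, individual) pairs are assumptions made here, and the sketch only looks up asserted membership, whereas IBF retrieves individuals by querying the realizer module, so inferred members would be included as well.

```python
def augment(utilizer_abox, realizer_abox, interface, interface_concepts):
    """Return the utilizer ABox extended with interface-concept individuals.

    ABoxes are modeled as sets of (concept, individual) pairs. Only asserted
    membership in the realizer ABox is considered (a simplifying assumption).
    """
    augmented = set(utilizer_abox)
    for concept, individual in realizer_abox:
        if concept in interface_concepts:
            # Interface symbols are prefixed with the interface name.
            augmented.add((f"{interface}:{concept}", individual))
    return augmented


# OM-Person realizes Inf-Person; OM-Publication utilizes it.
om_person_abox = {("Person", "John"), ("Person", "Mary"), ("Person", "Elizabeth")}
om_publication_abox = set()

result = augment(om_publication_abox, om_person_abox, "Inf-Person", {"Person"})
for concept, individual in sorted(result):
    print(f"{concept}({individual})")
```

Running the sketch prints the three assertions named in the example, in alphabetical order of the individuals.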

¹ The content of OM-Person and OM-Publication is inspired by the Univ-Bench ontology of the LUBM [26] framework.

2.3. Query execution

Conjunctive queries are an expressive means of querying the ABoxes of knowledge bases and can be formulated in different languages such as SPARQL [42]. In Section 5, we will analyze the time required for answering conjunctive queries over modular ontologies. Here, we briefly explain our methodology for executing conjunctive queries over modular ontologies. Consider the following simple query:

SELECT ?X
WHERE {
  ?X rdf:type Publication.
}

This query has one symbol, Publication, which is a local symbol of the OM-Publication ontology module. Hence, this query is posed to OM-Publication. Now consider the following, more complex conjunctive query, whose symbols are local symbols of different ontology modules.

SELECT ?X ?Y
WHERE {
  ?X rdf:type Publication.
  ?X publicationAuthor ?Y.
  ?Y rdf:type Person
}

This query has three symbols: Publication and publicationAuthor, which are local symbols of the OM-Publication ontology module, and Person, which is a local concept of the OM-Person module. In this case, the query is broken down into two smaller queries:

SELECT ?X ?Y
WHERE {
  ?X rdf:type Publication.
  ?X publicationAuthor ?Y
}

SELECT ?Y
WHERE {
  ?Y rdf:type Person
}

where the first query is posed to OM-Publication and the second query is posed to OM-Person. The result set of the original query is formed by joining the result sets of these sub-queries.
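The dispatch-and-join policy described above can be sketched in Python as follows. The sketch makes several assumptions that are not part of the paper: triple patterns are plain tuples, each module's ABox is a list of asserted triples (no reasoning), every pattern refers to exactly one symbol owned by exactly one module, and the individuals `paper1` and `book1` are hypothetical sample data.

```python
RDF_TYPE = "rdf:type"


def is_var(term):
    return term.startswith("?")


def pattern_symbol(pattern):
    """The ontology symbol of a pattern: the class for rdf:type, else the role."""
    _, predicate, obj = pattern
    return obj if predicate == RDF_TYPE else predicate


def split_by_module(patterns, local_symbols):
    """Group triple patterns by the module whose local symbols they use."""
    parts = {}
    for pattern in patterns:
        symbol = pattern_symbol(pattern)
        owner = next(m for m, syms in local_symbols.items() if symbol in syms)
        parts.setdefault(owner, []).append(pattern)
    return parts


def evaluate(patterns, triples):
    """Evaluate a conjunction of patterns over asserted triples."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        bindings = extended
    return bindings


def join(left, right):
    """Natural join of two result sets on their shared variables."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[k] == r[k] for k in l.keys() & r.keys())
    ]


local_symbols = {
    "OM-Publication": {"Publication", "publicationAuthor"},
    "OM-Person": {"Person"},
}
aboxes = {  # hypothetical sample assertions
    "OM-Publication": [
        ("paper1", RDF_TYPE, "Publication"),
        ("paper1", "publicationAuthor", "John"),
        ("book1", RDF_TYPE, "Publication"),
        ("book1", "publicationAuthor", "Mary"),
    ],
    "OM-Person": [
        ("John", RDF_TYPE, "Person"),
        ("Mary", RDF_TYPE, "Person"),
        ("Elizabeth", RDF_TYPE, "Person"),
    ],
}
query = [
    ("?X", RDF_TYPE, "Publication"),
    ("?X", "publicationAuthor", "?Y"),
    ("?Y", RDF_TYPE, "Person"),
]

parts = split_by_module(query, local_symbols)
answers = join(
    evaluate(parts["OM-Publication"], aboxes["OM-Publication"]),
    evaluate(parts["OM-Person"], aboxes["OM-Person"]),
)
print(sorted((a["?X"], a["?Y"]) for a in answers))
```

With the sample data, Elizabeth is returned by the OM-Person sub-query but is dropped by the join because she authors no publication.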

3. Evaluation of ontologies: modular and monolithic designs

In this section, we first clarify our interpretation of the notions of ontology modules and ontology designs. We then give precise definitions of metrics for evaluating ontology modules and ontology designs.

Up until now, ontology modules and modular ontologies have been defined differently in the literature [48,18,6,24]. The basic notion on which most existing definitions agree is that ontology modules have references to external symbols from other ontologies. More formally, an ontology module $\mathcal{M}$ is a DL knowledge base whose signature consists of two disjoint sets of local symbols ($\mathrm{Loc}(\mathcal{M})$) and external symbols ($\mathrm{Ext}(\mathcal{M})$). External symbols are the local symbols of other ontology modules, and $\mathrm{Sig}(\mathcal{M}) = \mathrm{Loc}(\mathcal{M}) \uplus \mathrm{Ext}(\mathcal{M})$. This definition also covers ordinary ontologies that are developed based on the monolithic approach: monolithic ontologies are ontology modules whose set of external symbols is empty.

In the following, we present a definition for numerical metrics that measure ontology modules. Since the definition of ontology modules includes cases where ontologies are designed monolithically, the following definition can also be used for defining metrics for evaluating monolithic ontologies. For defining ontology metrics, we follow the approach that is employed in Chidamber and Kemerer [13] for defining software metrics for object-oriented systems.

Definition 1. Let $\{\mathcal{M}_1, \ldots, \mathcal{M}_n\}$ be a set of ontology modules. Further, assume $P$ is a binary relation on these ontology modules that corresponds to a specific perspective for measuring ontology modules. Further, let $\{q_1, \ldots, q_n\}$ be a set of numbers and $S$ be a binary relation on these numbers, such as $>$, $<$ or $=$. A numerical metric $m$ maps a tuple of ontology modules and the relationship between them, $\langle \{\mathcal{M}_1, \ldots, \mathcal{M}_n\}, P \rangle$, to a set of numbers and a binary relation on numbers, $\langle \{q_1, \ldots, q_n\}, S \rangle$.

As an explanatory example, assume there are three ontology modules $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$. Further, assume $P$ is a relationship between these ontology modules from the reusability perspective such that $P(\mathcal{M}_1, \mathcal{M}_2)$ means that $\mathcal{M}_1$ is more reusable than $\mathcal{M}_2$. Consider three arbitrary numbers $q_1 = 10$, $q_2 = 5$ and $q_3 = 2$, and let the relation $S$ on these numbers be "less-than" ($<$). Let $m$ be a numerical metric that assigns $\mathcal{M}_1$ to $q_1$, $\mathcal{M}_2$ to $q_2$, $\mathcal{M}_3$ to $q_3$ and $P$ to $S$. Based on $m$, since $q_3 < q_2 < q_1$, $\mathcal{M}_3$ is more reusable than $\mathcal{M}_2$ and $\mathcal{M}_2$ is more reusable than $\mathcal{M}_1$.
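The explanatory example can be replayed with a short Python sketch. The function name `induced_relation` is an assumption introduced here; it simply lists the pairs of modules whose assigned numbers stand in the relation $S$, which under the metric are exactly the pairs related by $P$.

```python
import operator


def induced_relation(values, number_relation):
    """Return the pairs (a, b) of modules such that S(q_a, q_b) holds."""
    return {
        (a, b)
        for a in values
        for b in values
        if a != b and number_relation(values[a], values[b])
    }


# Numbers assigned by the metric m in the example; S is "less-than".
q = {"M1": 10, "M2": 5, "M3": 2}
more_reusable = induced_relation(q, operator.lt)

for a, b in sorted(more_reusable):
    print(f"{a} is more reusable than {b}")
```

Swapping `operator.lt` for `operator.gt` would model a perspective in which larger numbers are better, which is why the definition treats the choice of $S$ as part of the metric.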

For introducing a numerical metric for measuring ontology modules, it should be shown how the corresponding numbers are produced for each ontology module. In addition, it should be justified how the relation between numbers ($S$) corresponds to the relationship between ontology modules from a specific perspective ($P$).

Example 1. Different ontology modules have different levels of understandability. Hence, there is a relationship $P$ between ontology modules from the understandability perspective such that $P(\mathcal{M}_1, \mathcal{M}_2)$ means that ontology module $\mathcal{M}_1$ is more understandable than $\mathcal{M}_2$. As a simple example of a numerical metric, assume that the size of ontology modules is defined for measuring ontology modules from the understandability perspective. The justification would be that ontology modules with smaller sizes represent a more focused portion of the domain of discourse and can be understood more easily. The size of an ontology module is defined as the number of symbols in its signature, i.e., for an ontology module $\mathcal{M}$, $\mathrm{Size}(\mathcal{M}) = |\mathrm{Sig}(\mathcal{M})|$. For ontology modules $\mathcal{M}_i$, $\mathrm{Size}(\mathcal{M}_i)$ is the number based on which the relationship $S$ ("less-than-or-equal", $\leq$) can be defined. The numerical metric Size assigns a number to each ontology module. The relationship $S$ that exists between these numbers corresponds to the relationship $P$ between ontology modules.
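As a minimal sketch of the Size metric, the following Python snippet counts the symbols in a module's signature, taken as the union of its local and external symbols. The two modules and their symbol names are hypothetical and serve only to show how the numbers are compared.

```python
def size(local_symbols, external_symbols=frozenset()):
    """Size(M) = |Sig(M)|, where Sig(M) is the union of Loc(M) and Ext(M)."""
    return len(set(local_symbols) | set(external_symbols))


# Hypothetical modules: one focused, one broader with an external symbol.
focused = size({"Person", "Student", "Professor"})
broad = size({"Publication", "Book", "Manual", "publicationAuthor"}, {"Person"})

# Under the Size metric, the smaller module is taken to be at least as
# understandable as the larger one.
print(focused, broad, focused <= broad)  # 3 5 True
```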

Definition 2 formally defines ontology designs:

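Definition 2. An ontology design $\mathcal{D}$ is defined as a tuple $\langle \mathbf{M}, \mathrm{Sym} \rangle$, where $\mathbf{M}$ is the set of ontology modules. For each $\mathcal{M} \in \mathbf{M}$, there is a set of ontology modules $N \subseteq \mathbf{M}$ such that $\mathcal{M} \notin N$ and $\mathrm{Ext}(\mathcal{M}) \subseteq \bigcup_{\mathcal{M}_i \in N} \mathrm{Loc}(\mathcal{M}_i)$. $\mathrm{Sym}$ is the set of all local symbols of the ontology modules in the design: $\mathrm{Sym} = \bigcup_{\mathcal{M}_i \in \mathbf{M}} \mathrm{Loc}(\mathcal{M}_i)$.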

The following example shows an ontology design and its components for representing the Tourism domain.

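Example 2. Let OM-Accommodation and OM-Attraction be two ontology modules for describing accommodations and attractions of the Tourism domain, respectively. The knowledge base of OM-Accommodation is $\{\text{Hotel} \sqsubseteq \text{Accommodation},\ \text{Motel} \sqsubseteq \text{Accommodation}\}$. The knowledge base of OM-Attraction is $\{\text{Beach} \sqsubseteq \text{Sightseeing},\ \text{Museum} \sqsubseteq \text{Sightseeing},\ \text{AppropriateSightseeing} \sqsubseteq \text{Sightseeing} \sqcap \exists\, \text{isCloseTo}.\text{OM-Accommodation:Accommodation}\}$. $\mathcal{D} = \langle \mathbf{M}, \mathrm{Sym} \rangle$ is an ontology design for the Tourism domain, where $\mathbf{M} = \{\text{OM-Accommodation}, \text{OM-Attraction}\}$ and $\mathrm{Sym} = \{\text{Hotel}, \text{Motel}, \text{Accommodation}, \text{Beach}, \text{Sightseeing}, \text{Museum}, \text{AppropriateSightseeing}, \text{isCloseTo}\}$.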
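The two conditions of Definition 2 can be checked mechanically for the Tourism design. The Python sketch below is an illustration under assumptions made here: a design is a dictionary from module names to their local (`loc`) and external (`ext`) symbol sets, and the function names `sym` and `is_well_formed` are hypothetical.

```python
def sym(design):
    """Sym: the union of the local symbols of all modules in the design."""
    return set().union(*(module["loc"] for module in design.values()))


def is_well_formed(design):
    """Check that each module's external symbols are local to other modules."""
    for name, module in design.items():
        others = set().union(
            *(m["loc"] for n, m in design.items() if n != name)
        )
        if not module["ext"] <= others:
            return False
    return True


# The design of Example 2, with symbols read off the two knowledge bases.
tourism = {
    "OM-Accommodation": {
        "loc": {"Hotel", "Motel", "Accommodation"},
        "ext": set(),
    },
    "OM-Attraction": {
        "loc": {"Beach", "Museum", "Sightseeing",
                "AppropriateSightseeing", "isCloseTo"},
        "ext": {"Accommodation"},
    },
}

print(is_well_formed(tourism))  # True
print(sorted(sym(tourism)))
```

A monolithic design passes the check trivially, because its single module has no external symbols.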

In Definition 2, $|\mathbf{M}|$ denotes the number of ontology modules in the ontology design $\mathcal{D}$. If $|\mathbf{M}|$ is equal to one, $\mathcal{D}$ is a monolithic design and its only module does not have any references to external ontologies.

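Example 3. Consider the Tourism domain that is described in Example 2. This domain can be represented through the monolithic ontology design $\mathcal{D}' = \langle \mathbf{M}', \mathrm{Sym}' \rangle$, where $\mathbf{M}'$ includes just one ontology module: OM-Tourism. The knowledge base of OM-Tourism is $\{\text{Hotel} \sqsubseteq \text{Accommodation},\ \text{Motel} \sqsubseteq \text{Accommodation},\ \text{Beach} \sqsubseteq \text{Sightseeing},\ \text{Museum} \sqsubseteq \text{Sightseeing},\ \text{AppropriateSightseeing} \sqsubseteq \text{Sightseeing} \sqcap \exists\, \text{isCloseTo}.\text{Accommodation}\}$. Here $\mathrm{Sym}'$ is equal to $\mathrm{Sym}$ in Example 2.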

Similar to the definition of metrics for ontology modules, we can define metrics for ontology designs as follows:

Definition 3. Let $\{\mathcal{D}_1, \ldots, \mathcal{D}_n\}$ be a set of ontology designs. Assume $P$ to be a binary relation on ontology designs that corresponds to a specific perspective such as maintainability, reusability and reasoning performance. Further, let $\{q_1, \ldots, q_n\}$ be a set of numbers and $S$ be a binary relation on these numbers, e.g., $>$, $<$ or $=$. A numerical metric $m$ maps a tuple of ontology designs and the relationships between them, $\langle \{\mathcal{D}_1, \ldots, \mathcal{D}_n\}, P \rangle$, to a set of numbers and a binary relation on numbers, $\langle \{q_1, \ldots, q_n\}, S \rangle$.

For example, assume that metric $m$ measures the average size of ontology modules in an ontology design for evaluating their understandability. For an ontology design $\mathcal{D} = \langle \mathbf{M}, \mathrm{Sym} \rangle$, $m(\mathcal{D}) = |\mathrm{Sym}| / |\mathbf{M}|$. This metric is based on the hypothesis that an ontology design that includes ontology modules with smaller average size can be understood better by a third-party observer. Based on this metric, $m(\mathcal{D}_1) < m(\mathcal{D}_2)$ implies that ontology design $\mathcal{D}_1$ and its ontology modules are more understandable than ontology design $\mathcal{D}_2$ and its constituent ontology modules. For instance, with the eight symbols listed in Example 2, the values of $m$ for the ontology designs $\mathcal{D}$ and $\mathcal{D}'$ described in Examples 2 and 3 are $|\mathrm{Sym}| / |\mathbf{M}| = 8/2 = 4$ and $|\mathrm{Sym}'| / |\mathbf{M}'| = 8/1 = 8$, respectively. This metric implies that ontology design $\mathcal{D}$ is more understandable than $\mathcal{D}'$.
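The average-size metric can be computed directly from the symbol set and the module set of a design. The following Python sketch uses the eight symbols enumerated in Example 2; the function name `average_module_size` is an assumption introduced here for illustration.

```python
def average_module_size(symbols, modules):
    """m(D) = |Sym| / |M| for an ontology design D = <M, Sym>."""
    return len(symbols) / len(modules)


# Symbols as enumerated in Example 2 (the same set is used in Example 3).
symbols = {
    "Hotel", "Motel", "Accommodation", "Beach",
    "Sightseeing", "Museum", "AppropriateSightseeing", "isCloseTo",
}
modular = average_module_size(symbols, {"OM-Accommodation", "OM-Attraction"})
monolithic = average_module_size(symbols, {"OM-Tourism"})

print(modular, monolithic)   # 4.0 8.0
print(modular < monolithic)  # True: the modular design scores as more understandable
```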

4. A set of metrics for evaluating ontology modules

In this section, we introduce cohesion and coupling measurements for evaluating ontology modules and ontology designs. Cohesion and coupling are two important concepts for evaluating software designs that have been thoroughly investigated in the Software Engineering community [13,29]. Analogously, these notions can be employed for evaluating ontologies and ontology designs. Good ontology designs should aim at minimizing coupling and maximizing cohesion.

An ontology module is considered to be highly cohesive if all the information conveyed by that module describes a very specific sub-domain of discourse, i.e., it focuses on the description or employment of a focused group of related concepts. An ontology module is considered to be lowly coupled if it has the least amount of dependency on external information. In other words, the concepts and roles of a low-coupled ontology module are defined based on, or use, the minimum possible number of external concepts and roles from other ontology modules.

4.1. Semantic dependencies

In this section, we provide methods for measuring cohesion and coupling in ontology modules and ontology designs. Cohesion in an ontology module is concerned with how "related" and "dependent" its concepts and roles are. Coupling is concerned with how "related" and "dependent" its concepts and roles are on the external symbols from external ontology modules. For a better explanation, consider the following three ontology modules:

$M_1 = \{A, B\}$, where $A$ and $B$ are local concepts.
$M_2 = \{A \sqsubseteq B\}$, where $A$ and $B$ are local concepts.
$M_3 = \{A \sqsubseteq B\}$, where $A$ is a local concept and $B$ is an external concept from an external ontology module $M'_3$.

In the first situation, $A$ and $B$ are two unrelated concepts that are collocated in the ontology module $M_1$. They may be diverse pieces of information that describe unrelated subjects. Hence, ontology module $M_1$ is not highly cohesive. In the second situation, $A$ is defined as a subclass of $B$ and hence $A$ and $B$ are strongly related to each other. Here, ontology module $M_2$ is highly cohesive, i.e., it describes a related sub-domain of discourse. In the third situation, $A$ is strongly dependent on $B$, which is an external symbol from the foreign ontology module $M'_3$. Hence, in this ontology design, $M_3$ is highly coupled on $M'_3$. In conclusion, the more dependencies there are between the local symbols of an ontology module, the more cohesive it is. Further, the more dependencies there are between the local symbols of an ontology module and external symbols from foreign modules, the more coupled it is on foreign ontology modules.

In the following, in order to make the definitions of cohesion and coupling precise, we first provide formal definitions for dependencies between concepts and roles in ontology modules. Based on these definitions, we can quantify the dependencies between local symbols and between local and external symbols in an ontology module. Subsequently, based on the quantities of the dependency relationships, we measure the cohesion and coupling of ontology modules. Finally, we define cohesion and coupling for ontology designs based on the values of the coupling and cohesion of their ontology modules.

A concept or a role in an ontology module is dependent on another if its semantics are limited or bounded once the semantics of the other have been set. For example, assume an ontology module with two concepts $A$, $B$ and an axiom that asserts $A \sqsubseteq B$. In this knowledge base, $A$ is dependent on $B$, i.e., once $B$ has been interpreted as a subset of the domain elements, the valid interpretations of $A$ are limited to those that interpret $A$ as a subset of $B$. As another example, consider an ontology module with the axiom $A \sqsubseteq \exists R.C$. In this ontology, the interpretations of $A$ are limited once the semantics of $C$ have been established. The dependency of a concept on a role arises when the semantics of the concept are limited by the domain or the range of that role. For instance, returning to the ontology with the axiom $A \sqsubseteq \exists R.C$, once $R$ has been interpreted as a set of ordered pairs of elements, the semantics of $A$ are bounded by the set of domain elements of $R$. Roles can depend on each other as well. For example, in an ontology with the role axiom $R_1 \sqsubseteq R_2$, $R_1$ is dependent on $R_2$. The dependency of a role on a concept arises when its interpretations are limited once the semantics of the concept are specified. For example, in an ontology with the axiom $\exists R.\top \sqsubseteq C$, $R$ is dependent on $C$.

In order to technically formalize dependencies in ontology modules, we define two types of dependencies: strong dependencies and moderate dependencies. Definition 4 provides a definition for strong dependencies in ontology modules:

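Definition 4. Let $A$ and $B$ be two concepts and $P$ and $R$ two roles where $A, B, P, R \in Sig(M)$ and $M$ is an ontology module. Further, assume $M \not\models A \equiv \bot$, $M \not\models A \equiv \top$, $M \not\models B \equiv \bot$, $M \not\models B \equiv \top$, $M \not\models \exists P.\top \equiv \bot$, and $M \not\models \exists P.\top \equiv \top$. Let $M^{-B}$ be obtained by adding the axiom $B \sqsubseteq \bot$ to $M$. Further, let $M^{+B}$ be obtained by adding the axiom $\top \sqsubseteq B$ to $M$. $A$ is strongly dependent on $B$ if one of the following happens:

1. $M^{-B} \models A \equiv \bot$
2. $M^{+B} \models A \equiv \bot$
3. $M^{-B} \models A \equiv \top$
4. $M^{+B} \models A \equiv \top$

Moreover, the role $R$ is strongly dependent on $B$ if either $M^{-B} \models \exists R.\top \equiv \bot$, $M^{-B} \models \exists R.\top \equiv \top$, $M^{+B} \models \exists R.\top \equiv \bot$, or $M^{+B} \models \exists R.\top \equiv \top$. Let $M^{-P}$ be obtained by adding the axiom $\exists P.\top \sqsubseteq \bot$ to $M$ and $M^{+P}$ be obtained by adding the axiom $\top \sqsubseteq \exists P.\top$ to $M$. $A$ is strongly dependent on $P$ if either $M^{-P} \models A \equiv \bot$, $M^{-P} \models A \equiv \top$, $M^{+P} \models A \equiv \bot$, or $M^{+P} \models A \equiv \top$. Further, the role $R$ is strongly dependent on $P$ if either $M^{-P} \models \exists R.\top \equiv \bot$, $M^{-P} \models \exists R.\top \equiv \top$, $M^{+P} \models \exists R.\top \equiv \bot$, or $M^{+P} \models \exists R.\top \equiv \top$.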

Definition 5. We define $SDep_M(A)$ as the set of all symbols in $M$ on which $A$ is strongly dependent.

Intuitively, a strong dependency occurs on the extreme boundaries: when a concept $B$ or the domain of a role $P$ is interpreted as an empty set or as the top concept. In these situations, a concept $A$ that is strongly dependent on $B$ or $P$ is interpreted as an empty set or the top concept.

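Example 4. In the following, we investigate strong dependencies in ontology modules in different situations based on Definition 4:

1. Let $M = \{A \sqsubseteq B\}$, $M^{-B} = \{A \sqsubseteq B, B \sqsubseteq \bot\}$, $M^{-B} \models A \sqsubseteq \bot$; consequently, $A$ is strongly dependent on $B$.
2. Let $M = \{A \sqsubseteq B\}$, $M^{+A} = \{\top \sqsubseteq A, A \sqsubseteq B\}$, $M^{+A} \models \top \sqsubseteq B$; consequently, $B$ is strongly dependent on $A$.
3. Let $M = \{A \sqsubseteq \neg B\}$, $M^{+B} = \{A \sqsubseteq \neg B, \top \sqsubseteq B\}$, $M^{+B} \models A \sqsubseteq \bot$; consequently, $A$ is strongly dependent on $B$.
4. Let $M = \{A \sqsubseteq \exists R.C\}$, $M^{-C} = \{A \sqsubseteq \exists R.C, C \sqsubseteq \bot\}$, $M^{-C} \models A \sqsubseteq \bot$; consequently, $A$ is strongly dependent on $C$.
5. Let $M = \{A \sqsubseteq \exists R.C\}$, $M^{-R} = \{A \sqsubseteq \exists R.C, \exists R.\top \sqsubseteq \bot\}$, $M^{-R} \models A \sqsubseteq \bot$; consequently, $A$ is strongly dependent on $R$.
6. Let $M = \{R_1 \sqsubseteq R_2\}$, $M^{-R_2} = \{R_1 \sqsubseteq R_2, \exists R_2.\top \sqsubseteq \bot\}$, $M^{-R_2} \models \exists R_1.\top \sqsubseteq \bot$; consequently, $R_1$ is strongly dependent on $R_2$.
7. Let $M = \{\exists R.\top \sqsubseteq \neg A\}$, $M^{+A} = \{\exists R.\top \sqsubseteq \neg A, \top \sqsubseteq A\}$, $M^{+A} \models \exists R.\top \sqsubseteq \bot$; consequently, $R$ is strongly dependent on $A$.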
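The first three items of Example 4 can be checked mechanically with a small brute-force sketch. It is not the reasoning procedure used in the paper: it assumes a role-free module whose axioms are Boolean combinations of concept names, so that entailment can be decided by enumerating the truth assignments ("element types") that satisfy every axiom. The helper names (`allowed_types`, `is_fixed`, `strongly_dependent`) and the encoding of an axiom as a pair of Python predicates are illustrative assumptions, and a symbol is not counted as dependent on itself.

```python
from itertools import product


def allowed_types(symbols, axioms):
    """Truth assignments (element types) that satisfy every axiom lhs ⊑ rhs."""
    result = []
    for bits in product([False, True], repeat=len(symbols)):
        v = dict(zip(symbols, bits))
        if all((not lhs(v)) or rhs(v) for lhs, rhs in axioms):
            result.append(v)
    return result


def is_fixed(symbols, axioms, name):
    """True if the axioms entail name ≡ ⊥ or name ≡ ⊤."""
    return len({v[name] for v in allowed_types(symbols, axioms)}) <= 1


def strongly_dependent(symbols, axioms, a, b):
    """Definition 4, restricted to concept names in a role-free module."""
    if a == b or is_fixed(symbols, axioms, a) or is_fixed(symbols, axioms, b):
        return False  # self-dependency and trivial symbols are excluded here
    minus_b = axioms + [(lambda v: v[b], lambda v: False)]  # M^{-B}: B ⊑ ⊥
    plus_b = axioms + [(lambda v: True, lambda v: v[b])]    # M^{+B}: ⊤ ⊑ B
    return is_fixed(symbols, minus_b, a) or is_fixed(symbols, plus_b, a)


subsumption = [(lambda v: v["A"], lambda v: v["B"])]        # A ⊑ B
disjointness = [(lambda v: v["A"], lambda v: not v["B"])]   # A ⊑ ¬B
union = [(lambda v: v["A"], lambda v: v["B"] or v["C"])]    # A ⊑ B ⊔ C

print(strongly_dependent(["A", "B"], subsumption, "A", "B"))   # True (item 1)
print(strongly_dependent(["A", "B"], subsumption, "B", "A"))   # True (item 2)
print(strongly_dependent(["A", "B"], disjointness, "A", "B"))  # True (item 3)
print(strongly_dependent(["A", "B", "C"], union, "A", "B"))    # False
```

Items 4 to 7 involve roles and would need a description logic reasoner rather than this propositional enumeration.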
Moderate dependency is concerned with situations where the interpretations of a concept or a role are conditionally limited by the interpretations of other symbols. For instance, assume an ontology module with three concepts $A$, $B$ and $C$ and the axiom $A \sqsubseteq B \sqcup C$. In this knowledge base, once the concepts $B$ and $C$ have been interpreted as sets of domain elements, the valid interpretations of $A$ are limited to those that interpret $A$ as a subset of the union of $B$ and $C$. Now, assume that $C$ is interpreted as an empty set. Under this condition, the semantics of $A$ are bounded by the semantics of $B$. We say $A$ is moderately dependent on $B$ and $C$.

Definition 6 defines moderate dependencies between symbols of an ontology module.

Definition 6. Let $P$ and $Q$ be two symbols in ontology module $M$, and let $M^{-Q}$ and $M^{+Q}$ be defined similarly to Definition 4. $P$ is moderately dependent on $Q$ if $P$ is not strongly dependent on $Q$ and either $SDep_M(P)$ differs from $SDep_{M^{-Q}}(P)$ or $SDep_M(P)$ differs from $SDep_{M^{+Q}}(P)$.

Definition 7. We define $MDep_M(P)$ as the set of all symbols in $M$ on which $P$ is moderately dependent.

Intuitively, when a concept or the domain of a role is interpreted as an empty set or as the top concept, the interpretations of some concepts and roles may be affected in such a way that they acquire new strong dependencies on other symbols. For instance, in an ontology module $M$ with $A \sqsubseteq B \sqcup C$, $SDep_M(A) = \emptyset$, i.e., $A$ does not have any strong dependency on other symbols. However, in this knowledge base, if $B$ is interpreted as an empty set, $A$ will have a strong dependency on $C$, so $SDep_{M^{-B}}(A) = \{C\}$, which differs from $SDep_M(A)$. Hence, according to Definition 6, $A$ has a moderate dependency on $B$. The following examples show moderate dependencies in other knowledge bases.

Example 5. Let $M$ be an ontology module with the axiom $A \sqcap C \sqsubseteq D$. In this ontology module, $A$ is strongly dependent on no symbol ($SDep_M(A) = \emptyset$). In this knowledge base, $M^{+C}$ implies that $A \sqsubseteq D$. Accordingly, $SDep_{M^{+C}}(A)$ is equal to $\{D\}$, which differs from $SDep_M(A)$; hence $A$ is moderately dependent on $C$. Further, $M^{-D}$ implies that $A \sqsubseteq \neg C$. Accordingly, $SDep_{M^{-D}}(A)$ is equal to $\{C\}$, which differs from $SDep_M(A)$; hence $A$ is moderately dependent on $D$. In this ontology module, $MDep_M(A) = \{C, D\}$.
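Definitions 5 to 7 can be traced on Example 5 with a brute-force sketch. It rests on the same simplifying assumption as any propositional check: the module is role-free, so entailment can be decided by enumerating truth assignments over the concept names. The function names (`fixed`, `minus`, `plus`, `sdep`, `mdep`) and the predicate-pair encoding of axioms are hypothetical, symbols that are already trivial in a module are skipped to respect the preconditions of Definition 4, and a symbol is not counted as dependent on itself.

```python
from itertools import product


def fixed(symbols, axioms, name):
    """True if the axioms force `name` to be empty or to be the top concept."""
    seen = set()
    for bits in product([False, True], repeat=len(symbols)):
        v = dict(zip(symbols, bits))
        if all((not lhs(v)) or rhs(v) for lhs, rhs in axioms):
            seen.add(v[name])
    return len(seen) <= 1


def minus(axioms, q):
    """M^{-Q}: add the axiom Q ⊑ ⊥."""
    return axioms + [(lambda v: v[q], lambda v: False)]


def plus(axioms, q):
    """M^{+Q}: add the axiom ⊤ ⊑ Q."""
    return axioms + [(lambda v: True, lambda v: v[q])]


def sdep(symbols, axioms, p):
    """SDep_M(P): symbols on which P is strongly dependent (Definition 5)."""
    if fixed(symbols, axioms, p):
        return set()
    return {q for q in symbols
            if q != p and not fixed(symbols, axioms, q)
            and (fixed(symbols, minus(axioms, q), p)
                 or fixed(symbols, plus(axioms, q), p))}


def mdep(symbols, axioms, p):
    """MDep_M(P): symbols on which P is moderately dependent (Definition 7)."""
    base = sdep(symbols, axioms, p)
    result = set()
    for q in symbols:
        if q == p or q in base or fixed(symbols, axioms, q):
            continue
        if (sdep(symbols, minus(axioms, q), p) != base
                or sdep(symbols, plus(axioms, q), p) != base):
            result.add(q)
    return result


symbols = ["A", "C", "D"]
example5 = [(lambda v: v["A"] and v["C"], lambda v: v["D"])]  # A ⊓ C ⊑ D

print(sdep(symbols, example5, "A"))               # set()
print(sdep(symbols, plus(example5, "C"), "A"))    # {'D'}
print(sdep(symbols, minus(example5, "D"), "A"))   # {'C'}
print(sorted(mdep(symbols, example5, "A")))       # ['C', 'D']
```

The enumeration is exponential in the number of symbols, so it is only meant for tracing small examples by hand.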

Example 6. Let $M$ be an ontology module with the following axioms:

$A \sqsubseteq B$
$A \sqsubseteq (B \sqcap \exists R.C) \sqcup D$
$A \sqsubseteq B \sqcup \neg K$

In this ontology, $A$ is strongly dependent on $B$, so $SDep_M(A) = \{B\}$. Now, we investigate the moderate dependency of $A$ on $C$, $R$, $K$ and $D$.

- $M^{-C}$ implies that $A \sqsubseteq B$, $A \sqsubseteq D$, $A \sqsubseteq B \sqcup \neg K$. Accordingly, $SDep_{M^{-C}}(A) = \{B, D\}$, which differs from $SDep_M(A)$. Consequently, $A$ is moderately dependent on $C$.
- $M^{-R}$ implies that $A \sqsubseteq B$, $A \sqsubseteq D$, $A \sqsubseteq B \sqcup \neg K$. Accordingly, $SDep_{M^{-R}}(A) = \{B, D\}$, which differs from $SDep_M(A)$. Consequently, $A$ is moderately dependent on $R$.
- $M^{-K}$ implies that $A \sqsubseteq B$, $A \sqsubseteq (B \sqcap \exists R.C) \sqcup D$, $A \sqsubseteq \top$. Accordingly, $SDep_{M^{-K}}(A) = \{B\}$, which does not differ from $SDep_M(A)$. Additionally, $M^{+K}$ implies that $A \sqsubseteq B$, $A \sqsubseteq (B \sqcap \exists R.C) \sqcup D$. Accordingly, $SDep_{M^{+K}}(A) = \{B\}$, which does not differ from $SDep_M(A)$. Consequently, $A$ is not moderately dependent on $K$.
- $M^{-D}$ implies that $A \sqsubseteq B$, $A \sqsubseteq (B \sqcap \exists R.C)$, $A \sqsubseteq B \sqcup \neg K$. Accordingly, $SDep_{M^{-D}}(A) = \{B, R, C\}$, which differs from $SDep_M(A)$. Consequently, $A$ is moderately dependent on $D$.

The notion of dependency between symbols of an ontology module, in the sense that we defined earlier in this section, is more general than the dependencies captured by the strong and moderate definitions. For example, in an ontology module with the axiom $A \sqsubseteq B \sqcup C \sqcup D$, the semantics of $A$ are affected by the semantics of $B$. Nonetheless, there is neither a strong nor a moderate dependency between $A$ and $B$ according to Definitions 4 and 6, respectively. In fact, in this situation, $A$ has a "moderately" moderate dependency on $B$, i.e., the semantics of $A$ are limited by the semantics of $B$ if the semantics of both $C$ and $D$ have already been set. Theoretically, it is possible to find more semantic dependencies in an ontology module than those captured by Definitions 4 and 6. Definition 8 provides a basis for finding more complex moderate dependencies, e.g., a dependency between $A$ and $B_i$ in a module with the axiom $A \sqsubseteq B_1 \sqcup B_2 \sqcup \cdots \sqcup B_i \sqcup \cdots \sqcup B_n$.

Definition 8. Let $P$ and $Q$ be two symbols in ontology module $M$, and let $M^{-Q}$ and $M^{+Q}$ be defined similarly to Definition 4. The moderate dependency defined in Definition 6 is a first-degree moderate dependency and is denoted as $P \rightsquigarrow_1 Q$. Furthermore, $MDep^1_M(P)$ is the set of all symbols in $M$ on which $P$ is first-degree moderately dependent. For every $n \geq 2$, we define $P \rightsquigarrow_n Q$ if $P$ is not strongly dependent on $Q$, $P$ is not moderately dependent on $Q$ with any degree $m < n$, and either $MDep^{n-1}_M(P)$ differs from $MDep^{n-1}_{M^{-Q}}(P)$ or $MDep^{n-1}_M(P)$ differs from $MDep^{n-1}_{M^{+Q}}(P)$.

Example 7. Let $M$ be an ontology module with the axiom $A \sqcap B \sqcap C \sqsubseteq D$. In this ontology module, $MDep^1_M(A) = \emptyset$. However, $MDep^1_{M^{+B}}(A) = \{C, D\}$ (see Example 5), which differs from $MDep^1_M(A)$, so $A$ is second-degree moderately dependent on $B$, i.e., $A \rightsquigarrow_2 B$.

Obviously, Definition 8 induces a considerably high computational cost. In this paper, we restrict the definition of the evaluation metrics to the dependencies that are defined in Definitions 4 and 6 and leave the investigation of low-cost algorithms for finding more complex semantic dependencies between ontological elements for future work.

4.2. Semantic metrics

In order to define semantic metrics based on semantic dependencies, we distinguish between the dependencies between local symbols and the dependencies of local symbols on external ones. The former represent the cohesion of an ontology module, while the latter show its coupling on other modules. In the following, we formally define local and external strong and moderate dependencies.

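Definition 9. For ontology module $M$:

$SDepLoc(M) = \{\langle P, Q \rangle \mid P \in Loc(M),\ Q \in SDep_M(P) \text{ and } Q \in Loc(M)\}$.
$SDepExt(M) = \{\langle P, Q \rangle \mid P \in Loc(M),\ Q \in SDep_M(P) \text{ and } Q \in Ext(M)\}$.
$MDepLoc(M) = \{\langle P, Q \rangle \mid P \in Loc(M),\ Q \in MDep_M(P) \text{ and } Q \in Loc(M)\}$.
$MDepExt(M) = \{\langle P, Q \rangle \mid P \in Loc(M),\ Q \in MDep_M(P) \text{ and } Q \in Ext(M)\}$.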

Based on the definitions of dependency sets, we introduce two metrics for measuring cohesion and two metrics for measuring coupling in ontology modules.

Definition 10. For an ontology module $M$, NSLD and NMLD are two metrics for measuring its cohesion that are defined as follows:

NSLD is the Number of Strong Local Dependencies that exist between local symbols of an ontology module. $NSLD(M) = |SDepLoc(M)|$, i.e., the size of the SDepLoc set for ontology module $M$.

NMLD is the Number of Moderate Local Dependencies that exist between local symbols of an ontology module. $NMLD(M) = |MDepLoc(M)|$, i.e., the size of the MDepLoc set for ontology module $M$.

NSLD and NMLD are numerical metrics that assign numbers to ontology modules according to Definition 10. There is a "greater-than-or-equal" ($\geq$) relationship between the numbers that have been assigned to ontology modules, and this relationship corresponds to the cohesion of the modules: the larger the values assigned by NSLD and NMLD to an ontology module, the more cohesive the module is.

Definition 11 introduces two metrics for measuring coupling in ontology modules.

Definition 11. For an ontology module $M$, NSED and NMED are two metrics for measuring its coupling on external ontology modules and are defined as follows:

NSED is the Number of Strong External Dependencies of the local symbols of an ontology module on its external symbols. $NSED(M) = |SDepExt(M)|$, i.e., the size of the SDepExt set for ontology module $M$.

NMED is the Number of Moderate External Dependencies of the local symbols of an ontology module on its external symbols. $NMED(M) = |MDepExt(M)|$, i.e., the size of the MDepExt set for ontology module $M$.

NSED and NMED are numerical metrics that assign numbers to ontology modules according to Definition 11. There is a "less-than-or-equal" ($\leq$) relationship between the numbers assigned by these metrics, and this relationship corresponds to the coupling of ontology modules: the lower the values assigned by these metrics to an ontology module, the less coupled the module is on foreign ontology modules.

An important point is that SDepLoc, MDepLoc, SDepExt and MDepExt are defined as sets and therefore do not contain any redundant ordered pairs of symbols. Consequently, the metrics for cohesion and coupling do not count any dependency twice:

Remark 1. Definitions 10 and 11 for cohesion and coupling metrics prevent double counting.
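As an illustration of Definitions 9 to 11, the four counts can be obtained from precomputed dependency sets with a short sketch. The function `dependency_metrics` and the dictionary encoding of $SDep_M$ and $MDep_M$ are assumptions made for this example; the dependency sets themselves would come from a reasoner. Because the pairs are collected in Python sets, each ordered pair is counted once, in line with Remark 1.

```python
def dependency_metrics(loc, ext, sdep, mdep):
    """Count NSLD, NMLD, NSED and NMED for one ontology module.

    loc, ext: sets of local and external symbols.
    sdep, mdep: dicts mapping a symbol P to the set of symbols on which
    P is strongly / moderately dependent.
    """
    def pairs(dep, targets):
        # Ordered pairs <P, Q> with P local and Q in the requested target set.
        return {(p, q) for p in loc for q in dep.get(p, set()) if q in targets}

    return {
        "NSLD": len(pairs(sdep, loc)),
        "NMLD": len(pairs(mdep, loc)),
        "NSED": len(pairs(sdep, ext)),
        "NMED": len(pairs(mdep, ext)),
    }


# Strong dependencies for the axiom A ⊑ B (Example 4, items 1 and 2).
sdep_a_sub_b = {"A": {"B"}, "B": {"A"}}

# Module M2 of Section 4.1: A and B are both local.
print(dependency_metrics({"A", "B"}, set(), sdep_a_sub_b, {}))
# Module M3 of Section 4.1: A is local, B is external.
print(dependency_metrics({"A"}, {"B"}, sdep_a_sub_b, {}))
```

For the second call only the pair with the local symbol $A$ on the left is counted, since Definition 9 requires $P \in Loc(M)$.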

In addition to the number of strong and moderate dependencies, Definition 12 introduces another metric for measuring cohesion in ontology modules. According to this definition, MLD is the Maximum number of Local Dependencies that can exist between local symbols of an ontology module.

Definition 12. Let $n = |Loc(M)|$. MLD is equal to $n \times (n - 1)$.

MLD is a numerical metric that assigns numbers to ontology modules. Ontology modules with fewer local symbols, and hence a smaller value for MLD, are more likely to describe a focused sub-domain of discourse. They are more understandable and more appropriate for reuse. There is therefore a "less-than-or-equal" ($\leq$) relationship between the numbers assigned by MLD that corresponds to the cohesion of ontology modules.

Definition 13 provides an inclusive metric for measuring the cohesion of ontology modules based on the values of the NSLD, NMLD and MLD metrics.

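Definition 13. Let $M$ be an ontology module that has at least one local symbol and let $\gamma$ and $\delta$ be two real numbers such that $\gamma, \delta > 0$. COH is a metric for measuring the cohesion of ontology module $M$ and is defined as follows:

$$COH(M) = \begin{cases} 1 & \text{if } |Loc(M)| = 1 \\ \dfrac{\gamma \times NSLD + \delta \times NMLD}{(\gamma + \delta) \times MLD} & \text{otherwise} \end{cases} \qquad (1)$$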

In Definition 13, cohesion is measured by a weighted average of the number of all strong and moderate dependencies between local symbols in the ontology module over the number of all potential local dependencies that could possibly exist. A highly cohesive ontology module has a larger number of local dependencies relative to the maximum number of local dependencies that may exist. When there is just one local symbol, i.e., when MLD is zero, the value of the metric is defined to be equal to 1. COH is an absolute metric that assigns a value between zero and one to ontology modules. Since neither NSLD nor NMLD can exceed MLD, the upper bound of COH is one.
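Equation (1) can be evaluated directly once the counts are known. The sketch below is illustrative: the function name `coh` is hypothetical, and the default weights 2 and 1 are the coefficient values that the paper reports using in its experiments.

```python
def coh(nsld, nmld, n_local, gamma=2.0, delta=1.0):
    """Cohesion of a module following Eq. (1).

    nsld, nmld: numbers of strong / moderate local dependencies.
    n_local: number of local symbols, |Loc(M)| (at least one).
    gamma, delta: positive weights for strong / moderate dependencies.
    """
    if n_local < 1:
        raise ValueError("the module needs at least one local symbol")
    if n_local == 1:
        return 1.0  # MLD would be zero, so the value is fixed to 1
    mld = n_local * (n_local - 1)  # Definition 12
    return (gamma * nsld + delta * nmld) / ((gamma + delta) * mld)


# M1 = {A, B}: two unrelated local concepts, no dependencies.
print(coh(nsld=0, nmld=0, n_local=2))  # 0.0
# M2 = {A ⊑ B}: A and B strongly depend on each other (Example 4).
print(coh(nsld=2, nmld=0, n_local=2))  # about 0.667
```

With these weights, $M_2$ does not reach the upper bound of one because a pair can carry a strong or a moderate dependency but not both, while the denominator weights every potential pair by $\gamma + \delta$.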

An inclusive metric for measuring the coupling of ontology modules, based on the NSED and NMED metrics, is defined as follows:

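Definition 14. Let $M$ be an ontology module and let $\gamma$ and $\delta$ be two real numbers such that $\gamma, \delta > 0$. $COP(M)$, the coupling of ontology module $M$, is defined as follows:

$$COP(M) = \begin{cases} 0 & \text{if } |Ext(M)| = 0 \\ \dfrac{\gamma \times NSED + \delta \times NMED}{(\gamma + \delta) \times (|Loc(M)| \times |Ext(M)|)} & \text{otherwise} \end{cases} \qquad (2)$$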

Coupling is calculated as a weighted average of the number of all strong and moderate dependencies of local symbols on external ones in the ontology module over the number of all potential external dependencies that could possibly exist. Given an ontology module $M$, at most $|Loc(M)| \times |Ext(M)|$ dependencies can exist between its local symbols and the external ones. COP has a value between 0 and 1.
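As a worked instance of Eq. (2), consider the module $M_3 = \{A \sqsubseteq B\}$ from Section 4.1, in which $A$ is local and $B$ is external. By Example 4, $A$ is strongly dependent on $B$, so $NSED = 1$ and $NMED = 0$, with $|Loc(M_3)| = |Ext(M_3)| = 1$. The coefficient values 2 for strong and 1 for moderate dependencies (written $\gamma$ and $\delta$ here) are the ones the paper reports using in its experiments; other choices would give a different value.

```latex
\begin{aligned}
COP(M_3) &= \frac{\gamma \times NSED + \delta \times NMED}
                 {(\gamma + \delta) \times (|Loc(M_3)| \times |Ext(M_3)|)} \\
         &= \frac{2 \times 1 + 1 \times 0}{(2 + 1) \times (1 \times 1)}
          = \frac{2}{3}
\end{aligned}
```

The dependency of $B$ on $A$ is not counted, because only dependencies whose left-hand symbol is local enter SDepExt.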

According to these definitions, strong and moderate dependencies have different impacts on the cohesion and coupling of an ontology module. This difference is driven by the different values of their corresponding coefficients in Eqs. (1) and (2). Since strong dependencies (measured by NSLD and NSED) represent a stronger type of relationship between ontological symbols (such as concept subsumption) compared with the moderate type (measured by NMLD and NMED), it is reasonable to assign a higher value to $\gamma$ than to $\delta$. In different domains and applications, the exact values of the coefficients for strong and moderate dependencies are chosen at the discretion of domain experts and ontology evaluators. In the following, for our examples and case studies, we take $\gamma$ to be twice as large as $\delta$ (in our experiments, we assigned $\gamma = 2$ and $\delta = 1$).

Based on the introduced measures, we now define a cohesion and a coupling metric for evaluating different ontology designs as follows:

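Definition 15. For an ontology design $D = \langle \mathcal{M}, Sym \rangle$:

$$COHDes(D) = \frac{\sum_{M \in \mathcal{M}} COH(M) \times |Loc(M)|}{\sum_{M \in \mathcal{M}} |Loc(M)|} \qquad (3)$$

$$COPDes(D) = \frac{\sum_{M \in \mathcal{M}} COP(M) \times |Loc(M)|}{\sum_{M \in \mathcal{M}} |Loc(M)|} \qquad (4)$$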

Intuitively, the cohesion of an ontology design depends on the cohesion values of its contributing ontology modules. Similarly, the coupling of an ontology design depends on the coupling values of its ontology modules. COHDes and COPDes are numerical metrics for assessing ontology designs; they assign a number to each ontology design. There is a "greater-than-or-equal" relationship between the numbers assigned by COHDes, which corresponds to the cohesion of an ontology design: a higher value for the metric means that the ontology design is more cohesive. There is a "less-than-or-equal" relationship between the numbers assigned by COPDes: a lower value for the metric shows that the ontology design is less coupled with foreign ontology modules.
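The design-level metrics are averages of module-level values weighted by the number of local symbols of each module. A minimal sketch follows; the function name `design_metric` and the sample values are hypothetical and serve only to show the weighting. The same function can be applied to cohesion values to obtain COHDes and to coupling values to obtain COPDes.

```python
def design_metric(modules):
    """Size-weighted average of a module-level metric over a design.

    modules: iterable of (metric_value, n_local_symbols) pairs, one per
    ontology module of the design.
    """
    modules = list(modules)
    total_local = sum(n for _, n in modules)
    if total_local == 0:
        raise ValueError("the design has no local symbols")
    return sum(value * n for value, n in modules) / total_local


# Hypothetical design: one module with cohesion 0.5 and 4 local symbols,
# and one module with cohesion 1.0 and a single local symbol.
print(design_metric([(0.5, 4), (1.0, 1)]))  # 0.6
```

Larger modules dominate the result, so a single small but perfectly cohesive module cannot mask a large module with low cohesion.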

5. Empirical study of the modular ontology metrics w.r.t. query answering performance

In this section, we empirically investigate the association between the metrics introduced in the previous sections and the time that a reasoning engine needs for consistency checking and answering conjunctive queries over ontology modules.

5.1. Cohesion and coupling metrics and query answering time

Cohesion and coupling can be considered to be associated with the time that reasoning engines need for query answering and reasoning. The performance of reasoning in ontologies can be affected by various parameters. Different DLs, such as $\mathcal{ALC}$, $\mathcal{SHOIN}$, $\mathcal{SHIQ}$, and $\mathcal{SHOIQ}$, induce different time complexities. In addition, the number of classes, roles, axioms and assertions in ontologies affects the time required for query answering and reasoning tasks. Nonetheless, all these parameters being equal, the design of an ontology can also influence the reasoning performance. As an explanatory example, assume two designs for an ontology: in the first, the ontology comprises a couple of highly cohesive, low-coupled ontology modules; the second is a monolithic design, where there is just one complex ontology module that holds the whole knowledge base. In the first case, an incoming query may be answered by applying reasoning algorithms on one or a limited number of interrelated ontology modules. In the second case, on the other hand, the whole knowledge base has to be processed for answering every query, even when the query is related to only a small portion of the ontology module.

The metrics introduced in this paper represent the cohesion of ontology modules and the coupling between them in the sense that is important for measuring query answering time. NSLD and NMLD are measures of the number of strong and moderate dependencies that exist between local symbols of an ontology module. NSED and NMED are measures of the number of strong and moderate dependencies that exist between local and external symbols in an ontology module. The more dependent the local symbols of an ontology are on each other, the larger the values of NSLD and NMLD are. Moreover, the more dependent the local symbols of an ontology are on its external symbols, the larger the values of NSED and NMED are. Hence, the more cohesive an ontology module is, i.e., the more the related knowledge of a specific sub-domain is gathered in one or a few modules and is not spread all over the knowledge base, the larger the values of NSLD and NMLD are.

MLD is a metric that measures the maximum number of strong and moderate dependencies that can exist between local symbols in an ontology module. A large value for a fraction such as $NSLD / MLD$ for a given module implies that the local symbols of the module are highly interdependent (there is a large number of dependencies between them compared with all the dependencies that could exist in the best case), which means that the module describes a focused sub-domain.

Finally, COH, COP, COHDes and COPDes are compound metrics, which are defined based on the previously mentioned metrics. COHDes and COPDes rest on the theoretical model that an ontology design is more cohesive when its constituent ontology modules are more cohesive, and more coupled when its constituent ontology modules are more coupled on each other.

5.2. Hypotheses

Cohesion represents the degree of similarity between local concepts and roles in an ontology module. Consider two ontology designs that describe the same domain and represent the same set of concepts, axioms and assertions, while they differ in the number of ontology modules and the distribution of domain knowledge among their modules. In this setting, we expect the time required for answering a query over a highly cohesive ontology module from one ontology design to be less than the time required when the query is posed to a lowly cohesive ontology module from the other design. By assuming that the ontology designs represent the same domain information, we ensure that query answering time is affected only by the quality of the design and is not influenced by other criteria such as the size and complexity of the entire domain. We refer to ontology designs that represent the same body of knowledge as comparable designs. Hypothesis 1 formally describes the association between the cohesion metric and the time required for answering queries over ontology modules.

Hypothesis 1. Assume $D_1 = \langle \mathcal{M}_1, Sym_1 \rangle$ and $D_2 = \langle \mathcal{M}_2, Sym_2 \rangle$ to be two comparable ontology designs, where $M_1 \in \mathcal{M}_1$ and $M_2 \in \mathcal{M}_2$. Assume $Q$ to be a query that can be answered by both ontology modules $M_1$ and $M_2$. If $COH(M_1) > COH(M_2)$, the time required for answering $Q$ over $M_1$ is less than the time required for answering $Q$ over $M_2$ when the system and tool settings are the same in both cases.


The exact definition of comparable designs depends on the modular formalisms that are employed for representing modular ontologies. In the case of OWL imports and IBF, Definition 16 gives a method for finding comparable designs. There are other works in the literature that can be used to define comparable designs in other formalisms such as E-Connections [25].

Definition 16. Let $D_1 = \langle \mathcal{M}_1, Sym_1 \rangle$ and $D_2 = \langle \mathcal{M}_2, Sym_2 \rangle$ be two ontology designs. Assume $O_1$ to be an ontology that includes all symbols, axioms and assertions in all ontology modules in $D_1$. Further, assume $O_2$ to be an ontology that includes all symbols, axioms and assertions in all ontology modules in $D_2$. We say $D_1$ and $D_2$ are comparable designs if $Sym_1 \equiv Sym_2$, for each axiom $\alpha$ in the TBox or ABox of $O_1$, $O_2 \models \alpha$, and for each axiom $\beta$ in the TBox or ABox of $O_2$, $O_1 \models \beta$.

For this definition, it is assumed that the sets of local symbols of the ontology modules in an ontology design are disjoint. Based on Definition 16, all symbols of the different ontology modules are mapped into one monolithic ontology, and then the monolithic ontologies related to the different designs are compared with each other.

For running a conjunctive query over an ontology module, its ABox consistency should be checked first. We expect that an ontology module with low cohesion, which represents various concepts, roles and ABox assertions besides the subject of the query, needs more time for ABox consistency checking than a highly cohesive one, which focuses on representing the subject of the query.

Hypothesis 2. For the conjunctive query $Q$ and ontology modules $M_1$ and $M_2$ that are described in Hypothesis 1, the time required for answering query $Q$, including ABox consistency checking, over $M_1$ is less than this time over $M_2$ when the system and tool settings are the same in both cases.

Coupling represents how dependent an ontology module is on others for representing a sub-domain of discourse. The more coupled a modular design is, the higher the probability that a larger number of ontology modules is involved in answering a given query. The number of involved modules is especially important in distributed designs, where ontology modules are deployed on different servers and the cost of the communication between modules and of the integration of results is considerable.

5.3. Evaluation of ontologies and queries

For the sake of evaluation, we analyze query answering over three ontologies in different designs as follows:

1. Lehigh University Benchmark (LUBM) [26]: LUBM is a benchmark for the evaluation of Semantic Web techniques. LUBM includes an ontology called Univ-Bench, which describes universities, departments, and academic people and activities. Based on Univ-Bench and using the IBF formalism, we designed two comparable modular ontology designs for describing universities and academic activities. LUBM provides 14 extensional queries for test and evaluation. In order to make the difference between query answering times over different ontology modules more significant, we modified the LUBM queries. We also analyzed two more queries over LUBM from Tzoganis et al. [51] (Query 15 and Query 16). Appendix A gives the queries that we used for the evaluation.

2. VICODI (http://www.vicodi.org): VICODI is an ontology about European history, which was developed as a component of the VICODI contextualization system [39]. Using the Neon [28] toolkit and its plugins for ontology modularization, we designed two comparable modular designs for VICODI. For analyzing query answering time, we evaluated the following five queries over VICODI, some of which are taken from Motik and Sattler [38]:

Query1 (x): Location(x)
Query2 (x,y,z): Military-Person(x), hasRole(y,x), related(x,z)
Query3 (x,y): Time-Dependent-Relation(x), hasRelationMember(x,y), Event(y)
Query4 (x,y): Object(x), hasRole(x,y), Symbol(y)
Query5 (x): Individual(x), hasRole(x,y), Scientist(y), hasRole(y,z), Discoverer(z), hasRole(z,m), Inventor(m)

3. DBPedia [7]: DBPedia is an ontology for making the structured information of Wikipedia available on the Web. Using the Neon toolkit, we designed two comparable modular designs for DBPedia. Appendix B gives the four queries that we designed based on the sample queries provided by the DBPedia project (see http://wiki.dbpedia.org/Datasets and http://wiki.dbpedia.org/OnlineAccess).
In developing modular designs based on the above-mentioned ontologies, we checked that the modular designs are comparable. For this purpose, we verified the following conditions after creating each modular design:

- Each symbol in a monolithic ontology is represented in exactly one module of its corresponding modular design.
- Each TBox axiom in a monolithic ontology is either explicitly or implicitly represented in at least one module of its corresponding modular design.
- Each ABox assertion in a monolithic ontology is represented in at least one module of its corresponding modular design.
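The first of these conditions is a purely set-based check and can be sketched as follows. The function `symbol_partition_issues`, the module names and the symbols in the sample call are hypothetical; the checks for TBox axioms and ABox assertions require entailment tests by a reasoner and are not covered here.

```python
from collections import Counter


def symbol_partition_issues(monolithic_symbols, modules):
    """Check that every symbol is local to exactly one module.

    monolithic_symbols: set of symbols of the monolithic ontology.
    modules: dict mapping a module name to its set of local symbols.
    Returns (missing, duplicated, extra) symbol sets; all three are empty
    when the condition holds.
    """
    counts = Counter(s for symbols in modules.values() for s in symbols)
    missing = set(monolithic_symbols) - set(counts)
    duplicated = {s for s, c in counts.items() if c > 1}
    extra = set(counts) - set(monolithic_symbols)
    return missing, duplicated, extra


# Hypothetical data: "Person" is local to two modules, which violates
# the condition.
issues = symbol_partition_issues(
    {"Person", "Publication", "University"},
    {"OM-A": {"Person", "University"}, "OM-B": {"Publication", "Person"}},
)
print(issues)  # (set(), {'Person'}, set())
```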

5.3.1. Ontology designs

In this section, we describe the modular ontology designs that we developed for the TBox of LUBM Univ-Bench, VICODI, and DBPedia. Figs. 2 and 3 show two modular ontology designs for representing the TBox of the Univ-Bench ontology. In the rest of this paper, we refer to the Univ-Bench ontology as the LUBM monolithic design, the modular ontology in Fig. 2 as LUBM Modular Design 1, and the modular ontology in Fig. 3 as LUBM Modular Design 2.

Fig. 2. LUBM modular ontology design 1: an IBF modular ontology for representing the represented knowledge of Univ-Bench ontology.

In Modular Design 1, there are two ontology modules: OM-Publication and OM-Person-Organization. OM-Publication has all concepts, roles and TBox axioms related to the publication notion. OM-Person-Organization includes all concepts, roles and TBox axioms of Univ-Bench except those that have already been represented in OM-Publication. These two ontology modules are connected through two interfaces: Inf-Person-Org and Inf-Pub. Inf-Person-Org is realized by OM-Person-Organization and has two concepts: Person and Research. Person and Research are used by the OM-Publication ontology module for defining the range of the publicationAuthor and publicationResearch properties, respectively. The prefix Inf-Person-Org indicates that the concepts Research and Person are utilized from an external ontology module through the Inf-Person-Org interface. Inf-Pub has the concept Publication, which is used by the OM-Person-Organization ontology module for defining the range of the orgPublication property.

In Modular Design 2, there are three ontology modules: OM-Organization, OM-Person, and OM-Publication. OM-Publication in this design has the same knowledge base as it has in Design 1, except that it utilizes the concepts Person and Research from OM-Person through the interface Inf-Person, and these concepts are preceded by the prefix Inf-Person. OM-Organization has all concepts and properties related to organizations, such as University, Program, College and so on, and OM-Person includes all remaining concepts and properties of Univ-Bench, which are mostly related to the notion of person. OM-Person and OM-Organization are related through two interfaces: Inf-Org and Inf-Person. OM-Person realizes Inf-Person and utilizes Inf-Org, while OM-Organization realizes Inf-Org and utilizes Inf-Person. Additionally, OM-Organization utilizes the concept Publication from OM-Publication through Inf-Pub.

In order to create modular designs for the TBoxes of the VICODI and DBPedia ontologies, we used the Neon toolkit and its Ontology Partitioning and Module Extraction plugins. The Ontology Partitioning plugin supports decomposing an ontology into smaller modules. Using this plugin, a user can select an ontology, specify some parameters and execute a partitioning algorithm. The result of the algorithm is a set of OWL ontologies whose dependencies are modeled by means of "owl:imports". The algorithm parameters are the minimum size of modules (Min-size) and the level of transitive co-inclusions (level). The Module Extraction plugin supports the extraction of smaller modules from an ontology or from a module that is created by the ontology partitioning algorithm.

Fig. 4 shows two modular ontology designs for VICODI, which are created by Neon. In the first design, which is referred to as VICODI Modular Design 1, there are 10 modules. This design is created by executing the Neon partitioning algorithm with these parameters: Min-size = 5 and level = 3. In this design, the largest module has 40 symbols and 424 axioms, and seven other modules are directly dependent on it, i.e., they import this ontology. The smallest module has five symbols and five axioms. The second design, which is referred to as VICODI Modular Design 2, is created by executing the Neon partitioning algorithm with these parameters: Min-size = 3 and level = 10. This design has 22 modules. The largest ontology module has 25 symbols and 42 axioms

Fig. 3. LUBM modular ontology design 2: an IBF modular ontology for representing the represented knowledge of Univ-Bench ontology.

Fig. 4. VICODI modular ontology designs: two modular designs for VICODI ontology that are created by Neon toolkit.

4 All modular designs and ontology modules can be found at http://www.filedropper.com/isdata


and 11 other modules are dependent on it, while it is not dependent on any module. The smallest module has three symbols and three axioms and is dependent on another module, while no module is dependent on it.

Fig. 5 shows two ontology designs for the DBPedia TBox. The first one, which is referred to as DBPedia Modular Design 1, is created by the Neon partitioning algorithm initialized with the parameters Min-size = 10 and level = 4, followed by a slight modification of ontology module #10 to make it more focused on the notions of Film and Movie Artists. This design has 10 ontology modules, the largest of which has 1358 symbols and 2181 axioms. This module depends on no other module, while nine other modules depend on it. The smallest module has 10 symbols and 12 axioms. For the second modular design, DBPedia Modular Design 2, we used the Module Extraction plugin in Neon and extracted symbols related to these notions: Organization, Person, Place, Work, Biology, and Event. The corresponding modules are Org_Module, Person_Module, Place_Module, Work_Module, Bio_Module, and Event_Module. The links in Fig. 5 show how these ontologies import the others (see footnote 4).

5.3.2. Metric values

Table 1 shows the ontology modules to which test queries are posed in the LUBM, VICODI, and DBPedia datasets. Obviously, in monolithic designs, all queries are posed to


Fig. 5. Two DBPedia modular ontology designs, which are created by Neon toolkit.

Table 1. Ontology modules over which queries are posed.

| Dataset | Query | Monolithic design | Modular design 1 | Modular design 2 |
| --- | --- | --- | --- | --- |
| LUBM | Queries 1, 4, 5, 6, 7, 9, 10, 14, 15, 16 | Univ-Bench | OM-Person-Organization | OM-Person |
| LUBM | Queries 2, 8, 12, 13 | Univ-Bench | OM-Person-Organization | OM-Person; OM-Organization |
| LUBM | Query 3 | Univ-Bench | OM-Publication | OM-Publication |
| LUBM | Query 11 | Univ-Bench | OM-Person-Organization | OM-Organization |
| VICODI | Query 1 | VICODI | Module 0 | Module 0; Module 7; Module 9 |
| VICODI | Query 2 | VICODI | Module 1 | Module 1 |
| VICODI | Query 3 | VICODI | Module 0; Module 7 | Module 0; Module 19 |
| VICODI | Query 4 | VICODI | Module 2; Module 3; Module 8 | Module 2; Module 12; Module 18; Module 20 |
| VICODI | Query 5 | VICODI | Module 0; Module 1 | Module 1; Module 6 |
| DBPedia | Query 1 | DBPedia | Module 0; Module 4; Module 10 | Person-Module |
| DBPedia | Query 2 | DBPedia | Module 0 | Person-Module; Org-Module; Place-Module |
| DBPedia | Query 3 | DBPedia | Module 1 | Work-Module |
| DBPedia | Query 4 | DBPedia | Module 0; Module 7 | Org-Module |

5 http://wiki.dbpedia.org/Ontology


the only ontology that exists. In modular designs, the queries are posed to one or more ontology modules, depending on the symbols that constitute each query.
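The routing rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `route_query` is hypothetical, each module is represented by a small fragment of its local symbols rather than its full signature, the placement of GraduateStudent in OM-Person is assumed, and the overlap test is a simplification rather than the exact procedure used in the experiments.

```python
from typing import Dict, List, Set


def route_query(query_symbols: Set[str], modules: Dict[str, Set[str]]) -> List[str]:
    """Return (sorted) the modules whose local symbols overlap the query's symbols."""
    return sorted(name for name, local in modules.items() if query_symbols & local)


# Illustrative fragment of LUBM Modular Design 2 (not the full symbol sets).
design_2 = {
    "OM-Publication": {"Publication", "publicationAuthor"},
    "OM-Organization": {"ResearchGroup", "subOrganizationOf", "University"},
    "OM-Person": {"GraduateStudent"},  # assumed placement, for illustration only
}

query_3 = {"Publication", "publicationAuthor"}
query_11 = {"ResearchGroup", "subOrganizationOf", "University"}
mixed_query = {"GraduateStudent", "University"}  # hypothetical two-module query

print(route_query(query_3, design_2))
print(route_query(query_11, design_2))
print(route_query(mixed_query, design_2))
```

A query whose symbols are all local to one module is routed to that module alone, while a query that mixes symbols from two modules is routed to both, which is the case where result sets later have to be combined.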

Table 2 in Appendix C shows the value of the cohesion metric (COH) for monolithic ontologies and also for those ontology modules in modular designs over which the queries are posed. For calculating cohesion and coupling, we set $\delta = 1$ and $\gamma = 2$. In this table, W-Avg stands for a weighted average of the values of COH and is calculated as

$$\text{W-Avg}(COH(M_1), \ldots, COH(M_n)) = \frac{\sum_{i=1}^{n} COH(M_i) \times |Loc(M_i)|}{\sum_{i=1}^{n} |Loc(M_i)|}.$$

This table also shows the value of the coupling metric for the ontology designs.
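As a quick check of how the weighted average behaves, the sketch below recomputes the W-Avg entry reported in Table 2 for OM-Organization and OM-Person in LUBM Modular Design 2. The function name `weighted_avg_coh` is ours and purely illustrative; the COH and |Loc| values are taken from Table 2.

```python
from typing import Sequence


def weighted_avg_coh(coh: Sequence[float], loc_sizes: Sequence[int]) -> float:
    """Average the COH values, weighting each module by the size of its Loc set."""
    if not coh or len(coh) != len(loc_sizes):
        raise ValueError("need one |Loc| value per COH value, and at least one module")
    total = sum(loc_sizes)
    if total == 0:
        raise ValueError("the |Loc| values must not all be zero")
    return sum(c * n for c, n in zip(coh, loc_sizes)) / total


# OM-Organization (COH 0.0879, |Loc| 14) and OM-Person (COH 0.1102, |Loc| 39).
print(round(weighted_avg_coh([0.0879, 0.1102], [14, 39]), 4))  # 0.1043
```

Because the weights are the |Loc| sizes, a large module dominates the average, which is why a query that touches one very large, weakly cohesive module keeps a low W-Avg even when the other modules involved are small and cohesive.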

5.3.3. Datasets and test environment

LUBM has a data generator for creating scalable extensional data for its ontology. LUBM(n, s) denotes a dataset that has n universities and is generated using a seed value of s. The LUBM dataset is represented through a set of small files. For the sake of our evaluation, we created a dataset, LUBM(5,0), with 5 universities and 129,533 individuals. We developed a Java program that assigns to the modular designs exactly the same individuals that the LUBM benchmark creates for the monolithic design. Using this application, we generated a set of files corresponding to the ABox assertions of the monolithic design, two sets of files for the ABox assertions corresponding to OM-Person-Organization and OM-Publication concepts and roles, and three sets of files for the ABox assertions of OM-Person, OM-Organization, and OM-Publication concepts and roles. Each individual in each of the files of the modular designs has a corresponding instance, which has been asserted in a file in the set of files for the monolithic design. Hence, the monolithic and modular designs have the same extensional data and can be compared to each other.

We used the ontology provided by the VICODI project as the monolithic dataset. For DBPedia, we used the datasets provided on the project website (see footnote 5) and created a single monolithic dataset. Since the monolithic ontology files were large, the Neon toolkit was not able to modularize the VICODI and DBPedia ontologies. Hence, we used Neon for modularizing only the TBox of these ontologies, and we developed a Java application that assigned the appropriate instances to each module. For an instance a and an ontology module M, the application inserts the axiom A(a) into M if the class A is a symbol in the TBox of M and A(a) is in the monolithic ontology. In addition, for a property R, R(a, b) is inserted into M if R is a symbol in the TBox of M. Following the Neon approach, we used the same namespace for all classes, roles, and individuals, even if they are in different modules.
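The assignment rule above can be sketched as follows. The original application was written in Java and is not reproduced here; this Python version is a minimal illustration in which the tuple encoding of assertions, the function name `assign_abox`, and the sample individuals are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# ("A", "a") encodes the class assertion A(a);
# ("R", "a", "b") encodes the property assertion R(a, b).
Assertion = Tuple[str, ...]


def assign_abox(
    assertions: Iterable[Assertion],
    tbox_symbols: Dict[str, Set[str]],
) -> Dict[str, List[Assertion]]:
    """Copy each monolithic assertion into every module whose TBox has its symbol."""
    abox: Dict[str, List[Assertion]] = {module: [] for module in tbox_symbols}
    for assertion in assertions:
        symbol = assertion[0]  # the class A or the property R
        for module, symbols in tbox_symbols.items():
            if symbol in symbols:
                abox[module].append(assertion)
    return abox


# Hypothetical module signatures and monolithic ABox, for illustration only.
tbox = {
    "Person_Module": {"Person", "birthPlace"},
    "Place_Module": {"Place"},
}
monolithic_abox = [
    ("Person", "alice"),
    ("Place", "berlin"),
    ("birthPlace", "alice", "berlin"),
    ("Film", "some_film"),  # symbol in no module, so it is not copied anywhere
]

for module, facts in assign_abox(monolithic_abox, tbox).items():
    print(module, facts)
```

Note that an assertion whose symbol appears in the TBox of several modules is copied into each of them, so the modular ABoxes taken together can be larger than the monolithic one.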

We employed Pellet 1.0 [44] as the OWL-DL reasoner for running queries over ontology modules. The machine that we used is a PC with a 2.13 GHz Intel Core 2 Duo, 2 GB of memory and Windows XP 2002 Service Pack 3. We set the maximum heap size to 1 GB for running queries. In our evaluation, we distinguish between the time required for loading the knowledge base (the time needed for loading ontologies), the time required for ABox consistency checking, and the time required for running a query. Pellet checks ABox consistency in the first run of a query, so we take the time of the first run as the time for both consistency checking and query execution. For the time required for running a query (excluding ABox consistency checking), we run each query 10 times consecutively and compute the average time disregarding the first run. In our evaluation, we deployed all modules on one server. In this deployment, the time needed for joining the result sets of queries that are posed over more than one module is negligible compared with the query execution time.
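The measurement protocol above can be summarized by the following sketch. It is not the harness used in the experiments: `run_query` is a hypothetical stand-in for whatever call executes a query against the reasoner, and no Pellet API is assumed.

```python
import time
from statistics import mean
from typing import Callable, Dict


def measure_query(run_query: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Time `runs` consecutive executions of a query, in milliseconds.

    The first run is reported separately, since it also covers ABox
    consistency checking; the remaining runs are averaged.
    """
    if runs < 2:
        raise ValueError("at least two runs are needed to discard the first one")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "consistency_and_query_ms": timings[0],
        "query_ms": mean(timings[1:]),
    }


# Stand-in workload; a real harness would call the reasoner here.
print(measure_query(lambda: sum(range(100_000))))
```

Separating the first run from the rest keeps the one-off consistency check from inflating the reported query execution time.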

5.4. Results

In this section, we present the test results for the LUBM, VICODI, and DBPedia ontologies and their modular designs.

5.4.1. LUBM

All instance retrieval queries over the monolithic design are posed to the Univ-Bench ontology, the only ontology module that exists in this design. In modular design 1, conjunctive queries are posed to either OM-Publication or OM-Person-Organization, whereas in modular design 2, queries are posed over OM-Publication, OM-Person, OM-Organization, or both OM-Person and OM-Organization. For instance, consider Queries 3 and 11. In modular design 1, Query 3, whose constituent symbols Publication and publicationAuthor are both local symbols of the OM-Publication ontology module, is posed over OM-Publication. Similarly, in modular design 2, this query is posed to OM-Publication. The constituent symbols of Query 11 are ResearchGroup, subOrganization, and University. In modular design 1, all these symbols belong to OM-Person-Organization, and in modular design 2 all these symbols belong to OM-Organization. Hence, Query 11 is posed to OM-Person-Organization and OM-Organization in modular designs 1 and 2, respectively. Fig. 6 shows the time required for running Query 3 and Query 11 over the monolithic and two modular designs for the LUBM(5,0) dataset.

Fig. 6. The time required for running query 3 over the LUBM(5,0) dataset in the monolithic and modular designs.

OM-Publication is dedicated to representing the notion of academic publications and has the same knowledge base in both modular designs. It is more cohesive than Univ-Bench, which represents academic activities and publications. The values of the COH metric for OM-Publication and Univ-Bench confirm this observation: COH(Univ-Bench) is 0.056, which is less than COH(OM-Publication), which is 0.098. As Fig. 6 shows, the time required for running Query 3 over the monolithic design is more than the time required for running Query 3 over the modular designs.

According to Fig. 6, the time required for running Query 11 over the monolithic and modular designs is less than 10 ms, which is too low to show a meaningful correlation with the values of the cohesion metric.

Fig. 7 shows the time required for running, and the time required for consistency checking and running, Queries 1, 4, 5, 6, 7, 9, 10, 14, 15 and 16. In modular design 1, all these queries are posed to OM-Person-Organization. In modular design 2, all of them are posed to OM-Person. As the figure shows, the query execution time and the combined consistency checking and query execution time over the monolithic design, with a COH of 0.056, are considerably higher than over modular designs 1 and 2, with COH values of 0.085 and 0.11, respectively. In addition, the running time of queries in modular design 1 is higher than in modular design 2, whose COH metric has a larger value.

Fig. 7. ABox consistency checking and query execution time over LUBM monolithic and modular designs.


Fig. 8 shows the time required for running, and the time required for consistency checking and running, Queries 2, 8, 12 and 13. In modular design 1, all these queries are posed to OM-Person-Organization, while in modular design 2 they are more complex: they are first broken into two sub-queries that are posed to OM-Person and OM-Organization, and then the result sets of the sub-queries are integrated. As the figure shows, the query execution time and the combined consistency checking and query execution time over the monolithic design, with a COH of 0.056, are more than three times those over modular designs 1 and 2, with COH values of 0.085 and 0.1043, respectively. In addition, modular design 2, which has a larger value for its COH metric, has better performance in both query execution and ABox consistency checking.
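The integration step for queries that span two modules amounts to joining the result sets of the sub-queries on their shared variables. The sketch below illustrates this with a plain nested-loop natural join; the function name `join_results`, the way the query is split, and the individuals are all hypothetical and loosely modeled on Query 12.

```python
from typing import Dict, List

Binding = Dict[str, str]  # maps a variable name to the individual bound to it


def join_results(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Natural join: merge pairs of bindings that agree on every shared variable."""
    joined: List[Binding] = []
    for lb in left:
        for rb in right:
            shared = lb.keys() & rb.keys()
            if all(lb[var] == rb[var] for var in shared):
                joined.append({**lb, **rb})
    return joined


# Assumed split: the person-related patterns (?X is a Chair who worksFor ?Y) go to
# OM-Person, and the organization patterns (?Y subOrganizationOf ?M) to OM-Organization.
person_part = [{"X": "chair1", "Y": "dept1"}, {"X": "chair2", "Y": "dept2"}]
organization_part = [{"Y": "dept1", "M": "univ0"}]

print(join_results(person_part, organization_part))
```

With all modules on one server this join runs in memory over the two result sets, which is consistent with its cost being small relative to query execution.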

5.4.2. VICODI

Fig. 9 shows the time that the reasoning engine needed for running five VICODI queries, for loading the knowledge base and running queries, and for ABox consistency checking and running queries over the VICODI dataset in the monolithic and modular designs. This figure shows that these times are considerably higher in the monolithic design, whose value for the COH metric is noticeably lower than the COH values in the modular designs. In addition, this figure shows that modular design 2, whose COH metric has a larger value for all queries compared with modular design 1, shows roughly better performance in all aspects of query execution, ABox consistency checking and knowledge base loading.

Fig. 8. ABox consistency checking and query execution time over LUBM monolithic and modular designs.


5.4.3. DBPedia

Fig. 10 shows the time that the reasoning engine needed for running Queries 1, 2, and 4, and for ABox consistency checking and running these queries, over the DBPedia dataset in the monolithic and modular designs. DBPedia is a very large dataset including about 1,478,000 instances. Checking consistency and running queries over this huge ontology was impossible on our computer. Hence, the execution time of all queries is unmeasurable in the monolithic design. Modular design 1 has a large ontology module (Module 0) to which the other, smaller modules are connected (see Fig. 5). The size of this module is not much smaller than that of the DBPedia ontology in the monolithic design. Hence, all queries that are posed to Module 0 in this design are unmeasurable in our environment. On the other hand, these queries were successfully run in modular design 2, which has ontology modules with higher cohesion.

Fig. 11 shows the running time of Query 3 over the two DBPedia modular designs. Query 3 is not posed to Module 0 (the largest ontology module in DBPedia modular design 1), but instead it is posed to the small cohesive Module 1 with COH = 0.0375. Query 3 in modular design 2 is posed to Person_Module and Work_Module. The weighted average of the COH metrics of these modules is 0.011246, which is lower than the COH of Module 1 in modular design 1. As the figure shows, query execution has significantly better performance in modular design 1 compared to design 2 for this query.

Hypotheses 1 and 2 are supported by the results presented in this section. Accordingly, the time required for running queries over LUBM, VICODI and DBPedia and also the time needed for ABox consistency checking are less in the ontology modules that have larger values for their COH metric.

Fig. 9. The time required for query execution, for ABox consistency checking and query execution, and for knowledge base loading and query execution over VICODI dataset in the monolithic and modular designs.


Fig. 10. The time required for query execution and for ABox consistency checking and query execution of queries 1, 2, and 4 over DBPedia dataset in the monolithic and modular designs.


5.5. Discussion

As we mentioned earlier in this section, coupling can intuitively be correlated with the number of modules that are involved in a given query on a specific subject. In order to investigate this hypothesis, we need a wide range of queries for each dataset so that we can find a reliable value for the probable number of involved modules for each query. Among the datasets that we explored, LUBM has a wider range of queries.

Fig. 11. The time required for query execution and for ABox consistency checking and query execution of query 3 over DBPedia dataset in modular designs.

Fig. 12. The relationship between COP of a design and the number of modules to be involved for queries.


Fig. 12 shows the relationship between the number of involved modules for queries and the value of the COP metric for two modular designs of the LUBM dataset.

Fig. 12 shows that queries over LUBM modular design 2 (which has a larger value for its COPDes metric) are more likely to be posed to more than one module. Obviously, in a distributed deployment, where modules are deployed on different servers, this design faces more complexity in posing queries and integrating their results.
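To make this comparison concrete, the sketch below counts, for the 16 LUBM queries, how many modules each query is posed to according to Table 1, and sets the result next to the COPDes values from Table 2. It is our own illustrative summary and does not claim to reproduce the exact quantity plotted in Fig. 12; the variable names are hypothetical.

```python
# Number of modules each LUBM query is posed to, read off Table 1.
# In design 2, only Queries 2, 8, 12 and 13 are posed to two modules.
two_module_queries = {2, 8, 12, 13}
modules_per_query = {
    "LUBM Modular Design 1": {q: 1 for q in range(1, 17)},
    "LUBM Modular Design 2": {q: 2 if q in two_module_queries else 1 for q in range(1, 17)},
}
cop_des = {"LUBM Modular Design 1": 0.0245, "LUBM Modular Design 2": 0.16596}  # Table 2

summary = {}
for design, counts in modules_per_query.items():
    average = sum(counts.values()) / len(counts)
    multi_share = sum(1 for c in counts.values() if c > 1) / len(counts)
    summary[design] = (average, multi_share)
    print(f"{design}: COPDes={cop_des[design]}, "
          f"avg modules per query={average}, share of multi-module queries={multi_share}")
```

On this query set, the design with the larger COPDes value is also the only one in which some queries have to be split across modules, in line with the intuition stated above.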

Unfortunately, we were not able to find a query log, i.e., a set of queries that are posed to the ontologies in real-world applications, for any of the datasets, and hence we leave a more general investigation of the correlation between coupling metrics and the number of involved modules for queries for future work.

For calculating the cohesion and coupling metrics for different ontology modules in this section, we considered only their TBoxes, i.e., we considered the semantic dependencies between roles and concepts but not individuals. Our intuition was that the TBox of an ontology module can be a good representative of the semantic dependencies that exist between the symbols of the whole knowledge base, including its ABox. Obviously, for ontologies whose ABoxes have asymmetrically broadened, we need to consider dependencies between all symbols.

6. Discussion and future work

There is an extensive body of work in the literature on formalizing and finding the axioms of an ontology that are relevant to a set of terms, in order to provide solutions for the extraction and integration of ontology modules [22–24]. These solutions mostly focus on finding the relevancy of axioms to symbols. Nonetheless, we can find some commonalities between their approach and the approach that is employed in this paper for finding semantic dependencies between symbols. Given a set of symbols S, Grau et al. [22] define local(S) as all axioms that are local w.r.t. S. In this definition, an axiom α is local w.r.t. S if it is possible to take any interpretation for the symbols in S and extend it to a model of α while the interpretation interprets the additional symbols as the empty set. Based on this approach, a local axiom w.r.t. S must be a tautology when all its additional symbols are replaced by ⊥.
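As a small worked illustration of this notion of locality (our own example, with hypothetical concept names A and B, not taken from [22]), consider a single subsumption axiom and two choices of S:

```latex
\begin{align*}
\alpha &= (A \sqsubseteq B) \\[4pt]
S = \{B\}: &\quad \alpha[A \mapsto \bot] = (\bot \sqsubseteq B)
  && \text{a tautology, so } \alpha \text{ is local w.r.t. } S \\
S = \{A\}: &\quad \alpha[B \mapsto \bot] = (A \sqsubseteq \bot)
  && \text{not a tautology, so } \alpha \text{ is not local w.r.t. } S
\end{align*}
```

In the second case the axiom fails in every interpretation in which A is non-empty, so an arbitrary interpretation of the symbols in S cannot always be extended to a model of the axiom by interpreting B as the empty set.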


Both approaches utilize the idea of replacing some symbols in the ontology with ⊥ (or alternatively ⊤) for finding dependencies. The approach that is employed in this paper analyzes and formalizes the impact of the replacement of every symbol on the other ontological symbols for finding symbolic interdependencies, while the other approach focuses on extracting a subset of the axioms of an ontology that is related to a set of symbols.

For future work, we would like to analyze the use of the metrics presented in this paper in the process of module extraction, in order to extract more cohesive and less coupled modules, related to a set of symbols, from an ontology.

Furthermore, in this paper we showed the relationship between the cohesion and coupling metrics and reasoning performance. Intuitively, these metrics can also be employed for evaluating ontologies with respect to other design-based criteria such as understandability, reusability and maintainability. Highly cohesive ontology modules can be easily understood by a third-party observer, because the intent and the information presented by the module are limited to a specific subject. Similarly, low coupling facilitates the understandability and reusability of ontology modules: to understand and reuse a loosely coupled ontology module, only a small amount of external knowledge has to be explored and understood.

Highly cohesive modules enable more effective reuse and maintenance by allowing their users to utilize a related group of concepts and roles and avoid importing unrelated subjects and sub-domains of discourse. Low coupling facilitates change propagation and inconsistency resolution. When a loosely coupled ontology module is modified, the smallest possible number of other ontology modules and the smallest portion of their knowledge bases are affected. Consequently, with loosely coupled ontology modules, less effort is needed for resolving inconsistencies and applying new revisions.

We leave the precise analysis of the relationship between the cohesion and coupling metrics and these design-based criteria for future work. For this purpose, we intend to analyze the cohesion and coupling metrics in the context of different empirical studies with different domains and different groups of domain experts and ontology designers.

Even though we defined higher-degree moderate dependencies, we based the definitions of the coupling and cohesion metrics on the strong and first-degree moderate dependencies. The reason is that finding all higher-degree moderate dependencies is computationally very expensive and hardly achievable for most ontologies and systems. Theoretically, the definitions of the cohesion and coupling metrics match the intuitive notion of dependencies between ontological terms when they capture all dependencies. However, we kept their definitions as computationally achievable as possible for now and left the exploration of efficient algorithms and methods for capturing other types of semantic dependencies for future work.

7. Conclusion

In this paper, we proposed a set of semantic metrics for evaluating cohesion and coupling of ontologies in monolithic and modular designs. Through these metrics, we are able to compare different ontology modules and also different ontology designs that may possibly exist for representing a body of knowledge. We empirically investigated several case studies of conjunctive query answering time for both monolithic and modular ontology designs. The investigations showed that the time required for answering queries and ABox consistency checking over ontology modules with higher values for their cohesion metric is less than over ontology modules with lower values for their cohesion metric.

The main aspects of our work for evaluating modular ontologies are as follows:

• The introduced metrics for assessing modular ontologies are based on semantic definitions of dependencies between local symbols and between local and external symbols of ontology modules. This semantic approach has the advantage that it considers all asserted and implied axioms for ontology evaluation.

• Our work for assessing ontologies focuses on both internal and external attributes of ontologies (internal and external attributes are introduced in Fenton [19]). We introduced metrics for measuring cohesion and coupling, which are internal attributes of ontologies. We also investigated reasoning performance, an external attribute of ontologies, and we showed the association between the metrics and this external attribute.

• The evaluation framework that is introduced in this paper supports evaluating ontology modules and ontology designs. Based on this framework, different ontology designs that may exist for representing the same body of knowledge can be evaluated and compared with regard to their cohesion and coupling.

Appendix A

The following are the conjunctive queries that we used for evaluating reasoning performance over LUBM knowledge bases.

Query 1
SELECT ?X ?C WHERE {
  ?X rdf:type ub:GraduateStudent .
  ?X ub:takesCourse ?C
}

Query 2
SELECT ?X ?Y ?Z WHERE {
  ?X rdf:type ub:GraduateStudent .
  ?Y rdf:type ub:University .
  ?Z rdf:type ub:Department .
  ?X ub:memberOf ?Z .
  ?Z ub:subOrganizationOf ?Y .
  ?X ub:undergraduateDegreeFrom ?Y
}

Query 3
SELECT ?X ?Y WHERE {
  ?X rdf:type ub:Publication .
  ?Y rdf:type ub:Professor .
  ?X ub:publicationAuthor ?Y
}

Query 4
SELECT ?X ?Y1 ?Y2 ?Y3 WHERE {
  ?X rdf:type ub:Professor .
  ?X ub:name ?Y1 .
  ?X ub:emailAddress ?Y2 .
  ?X ub:telephone ?Y3
}

Query 5
SELECT ?X ?Y WHERE {
  ?X rdf:type ub:Person .
  ?X ub:memberOf ?Y
}

Query 6
SELECT ?X WHERE {
  ?X rdf:type ub:Student
}

Query 7
SELECT ?X ?Y ?Z WHERE {
  ?X rdf:type ub:Student .
  ?Y rdf:type ub:Course .
  ?X ub:takesCourse ?Y .
  ?Z ub:teacherOf ?Y
}

Query 8
SELECT ?X ?Y ?Z ?M WHERE {
  ?X rdf:type ub:Student .
  ?Y rdf:type ub:Department .
  ?Y ub:subOrganizationOf ?M .
  ?X ub:emailAddress ?Z
}

Query 9
SELECT ?X ?Y ?Z WHERE {
  ?Y rdf:type ub:Faculty .
  ?Z rdf:type ub:Course .
  ?X ub:advisor ?Y .
  ?Y ub:teacherOf ?Z .
  ?X ub:takesCourse ?Z
}

Query 10
SELECT ?X WHERE {
  ?Y rdf:type ub:GraduateCourse .
  ?X ub:takesCourse ?Y
}

Query 11
SELECT ?X WHERE {
  ?X rdf:type ub:ResearchGroup .
  ?X ub:subOrganizationOf ?Y .
  ?Y rdf:type ub:University
}

Query 12
SELECT ?X ?Y ?M WHERE {
  ?X rdf:type ub:Chair .
  ?Y rdf:type ub:Department .
  ?X ub:worksFor ?Y .
  ?Y ub:subOrganizationOf ?M
}

Query 13
SELECT ?X ?Y WHERE {
  ?X rdf:type ub:Person .
  ?Y ub:hasAlumnus ?X
}

Query 14
SELECT ?X WHERE {
  ?X rdf:type ub:UndergraduateStudent
}

Query 15
SELECT ?X ?C WHERE {
  ?X rdf:type ?C .
  ?C rdfs:subClassOf ub:Employee
}

Query 16
SELECT ?X ?C WHERE {
  ?X rdf:type ?C .
  ?C directSubClassOf ub:Employee
}

Appendix B

The following are the queries that we used for evaluating reasoning performance over DBPedia knowledge bases.

Query 1
SELECT ?name ?birth ?death ?person WHERE {
  ?person dbo:birthPlace <http://dbpedia.org/resource/Berlin> .
  ?person dbo:birthDate ?birth .
  ?person dbo:deathDate ?death
  FILTER (?birth < "1900-01-01"^^xsd:date) .
}
ORDER BY ?name

Query 2
SELECT ?player ?place ?cap ?pop
WHERE {
  ?s foaf:page ?player .
  ?s rdf:type <http://dbpedia.org/ontology/SoccerPlayer> .
  ?s <http://dbpedia.org/property/position> ?position .
  ?s <http://dbpedia.org/property/clubs> ?club .
  ?club <http://dbpedia.org/ontology/capacity> ?cap .
  ?s <http://dbpedia.org/ontology/birthPlace> ?place .
  ?place ?population ?pop .
  OPTIONAL { ?s <http://dbpedia.org/ontology/number> ?tricot . }
  FILTER (xsd:int(?pop) > 10000000) .
  FILTER (xsd:int(?cap) < 40000) .
  FILTER (?position = "Goalkeeper"@en ||
          ?position = <http://dbpedia.org/resource/Goalkeeper_%28association_football%29> ||
          ?position = <http://dbpedia.org/resource/Goalkeeper_%28football%29>)
} LIMIT 1000

Query 3
SELECT ?subject ?label ?released ?abstract
WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film> .
  ?subject <http://dbpedia.org/property/starring> <http://dbpedia.org/resource/Tom_Cruise> .
  ?subject rdfs:comment ?abstract .
  ?subject rdfs:label ?label .
  FILTER (lang(?abstract) = "en" && lang(?label) = "en") .
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released .
  FILTER (xsd:date(?released) < "2000-01-01"^^xsd:date) .
} ORDER BY ?released
LIMIT 20

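Table 2. Cohesion and coupling of ontology modules.

| Dataset | Ontology design | Module | COH | COP | \|Loc\| | COH W-Avg | COPDes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LUBM | Monolithic design | Univ-Bench | 0.0566 | 0 | 68 | – | 0 |
| LUBM | Modular design 1 | OM-Publication | 0.0984 | 0.0666 | 15 | – | 0.0245 |
| LUBM | Modular design 1 | OM-Person-Organization | 0.0851 | 0.0125 | 53 | – | 0.0245 |
| LUBM | Modular design 2 | OM-Publication | 0.0984 | 0.0666 | 15 | – | 0.16596 |
| LUBM | Modular design 2 | OM-Person | 0.1102 | 0.2051 | 39 | – | 0.16596 |
| LUBM | Modular design 2 | OM-Organization | 0.0879 | 0.1632 | 14 | – | 0.16596 |
| LUBM | Modular design 2 | OM-Organization; OM-Person | – | – | 14; 39 | 0.1043 | 0.16596 |
| VICODI | Monolithic design | VICODI | 0.03044 | 0 | 194 | – | 0 |
| VICODI | Modular design 1 | Module 0 | 0.1003 | 0.3333 | 30 | – | 0.5257 |
| VICODI | Modular design 1 | Module 1 | 0.0606 | 0.6666 | 40 | – | 0.5257 |
| VICODI | Modular design 1 | Module 0; Module 7 | 0.1003; 0.0634 | – | 30; 22 | 0.08477 | 0.5257 |
| VICODI | Modular design 1 | Module 2; Module 3; Module 8 | 0.202; 0.2666; 0.01111 | – | 12; 5; 25 | 0.0960 | 0.5257 |
| VICODI | Modular design 1 | Module 0; Module 1 | 0.202; 0.266; 0.0111 | – | 30; 40 | 0.08053 | 0.5257 |
| VICODI | Modular design 2 | Module 0; Module 7; Module 9 | 0.1341; 0.4444; 0.333 | – | 22; 3; 4 | 0.1937 | 0.58169 |
| VICODI | Modular design 2 | Module 1 | 0.0920 | 0.6666 | 21 | – | 0.58169 |
| VICODI | Modular design 2 | Module 0; Module 19 | 0.1341; 0.0634 | – | 22; 22 | 0.0988 | 0.58169 |
| VICODI | Modular design 2 | Module 2; Module 12; Module 18; Module 20 | 0.1676; 0.266; 0.266; 0.222 | – | 19; 5; 5; 6 | 0.2052 | 0.58169 |
| VICODI | Modular design 2 | Module 1; Module 6 | 0.0920; 0.444 | – | 21; 3 | 0.1366 | 0.58169 |
| DBPedia | Monolithic design | DBPedia | 0.0025 | 0 | 861 | – | 0 |
| DBPedia | Modular design 1 | Module 0; Module 4; Module 10 | 0.0025; 0.2222; 0.037 | – | 674; 3; 28 | 0.00488 | 0.0436 |
| DBPedia | Modular design 1 | Module 0 | 0.0025 | 0 | 674 | – | 0.0436 |
| DBPedia | Modular design 1 | Module 1 | 0.1538 | 0.666 | 13 | – | 0.0436 |
| DBPedia | Modular design 1 | Module 0; Module 7 | 0.0025; 0.0888 | – | 674; 6 | 0.0033 | 0.0436 |
| DBPedia | Modular design 2 | Person-Module | 0.0061 | 0.063 | 204 | – | 0.04788 |
| DBPedia | Modular design 2 | Person-Module; Org-Module; Place-Module | 0.0061; 0.0089; 0.022 | – | 204; 192; 49 | 0.00914 | 0.04788 |
| DBPedia | Modular design 2 | Person-Module; Work-Module | 0.0061; 0.0318 | – | 204; 65 | 0.01234 | 0.04788 |
| DBPedia | Modular design 2 | Org-Module | 0.0089 | 0.0243 | 192 | – | 0.04788 |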


Query 4
SELECT *
WHERE {
  ?company a <http://dbpedia.org/ontology/Organisation> .
  ?company <http://dbpedia.org/ontology/foundationPlace> <http://dbpedia.org/resource/California> .
  ?product <http://dbpedia.org/ontology/developer> ?company .
  ?product a <http://dbpedia.org/ontology/Software>
}

Appendix C

Cohesion and coupling of ontology modules are shown in Table 2.

References

[1] M. Ashburner, C. Ball, J. Blake, D. Botstein, H. Butler, J. Cherry, A. Davis, K. Dolinski, S. Dwight, J. Eppig, et al., Gene ontology: tool for the unification of biology, Nature Genetics 25 (1) (2000) 25–29.

[2] F. Baader, Appendix: description logic terminology, The Description Logic Handbook: Theory, Implementation, and Applications (2003) 485–495.

[3] F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, P.F. Patel-Schneider (Eds.), The Description Logic Handbook: Theory, Implementation, and Applications, Cambridge University Press, 2003.

[4] F. Baader, U. Sattler, An overview of tableau algorithms for description logics, Studia Logica 69 (1) (2001) 5–40.

[5] J. Bao, D. Caragea, V. Honavar, Modular ontologies - a formal investigation of semantics and expressivity, in: R. Mizoguchi, Z. Shi, F. Giunchiglia (Eds.), ASWC, vol. 4185, 2006, pp. 616–631.

[6] J. Bao, G. Slutzki, V. Honavar, A semantic importing approach to knowledge reuse from multiple ontologies, in: Proceedings of the National Conference on Artificial Intelligence, vol. 22, 1999, AAAI Press, MIT Press, Menlo Park, CA, Cambridge, MA, London, 2007, p. 1304.

[7] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, S. Hellmann, Dbpedia-a crystallization point for the web of data, Web Semantics: Science, Services and Agents on the World Wide Web, 2009.

[8] A. Borgida, L. Serafini, Distributed description logics: assimilating information from peer sources, Journal of Data Semantics 1 (2003) 153–184.

[9] J. Brank, M. Grobelnik, D. Mladenic, A survey of ontology evaluation techniques, in: Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005), Citeseer, 2005.

[10] J. Brank, D. Mladenic, M. Grobelnik, Gold standard based ontology evaluation using instance assignment, in: Proceedings of the EON 2006 Workshop, Citeseer, 2006.


[11] C. Brewster, H. Alani, S. Dasmahapatra, Y. Wilks, Data drivenontology evaluation, in: Proceedings of LREC, vol. 2004, Citeseer,2004.

[12] A. Burton-Jones, V. Storey, V. Sugumaran, P. Ahluwalia, A semioticmetrics suite for assessing the quality of ontologies, Data & Knowl-edge Engineering 55 (1) (2005) 84–102.

[13] S. Chidamber, C. Kemerer, A metrics suite for object orienteddesign, IEEE Transactions on Software Engineering 20 (6) (1994)476–493.

[14] O. Corcho, M. Fernandez-Lopez, A. Gomez-Perez, Methodologies,tools and languages for building ontologies: where is their meetingpoint, Data & Knowledge Engineering 46 (1) (2003) 41–64.

[15] F. Ensan, W. Du, Towards domain-centric ontology developmentand maintenance frameworks, in: Proceedings of the NineteenthInternational Conference on Software Engineering & KnowledgeEngineering (SEKE 2007), Citeseer, 2007.

[16] F. Ensan, W. Du, An interface-based ontology modularizationframework for knowledge encapsulation, in: Proceedings of the7th International Conference on the Semantic Web, Springer, 2008,p. 532.

[17] F. Ensan, W. Du, Formalizing the role of goals in the development ofdomain-specific ontological frameworks, in: Proceedings of the41st Annual Hawaii International Conference on System Sciences,IEEE Computer Society, 2008, p. 120.

[18] F. Ensan, W. Du, A knowledge encapsulation approach to ontologymodularization, Knowledge and Information Systems 20 (3) (2009)249–283.

[19] N. Fenton, Software measurement: a necessary scientific basis, IEEETransactions on Software Engineering 20 (3) (1994) 199–206.

[20] A. Gangemi, C. Catenacci, M. Ciaramita, J. Lehmann, Modellingontology evaluation and validation, in: Proceedings of the 3rdEuropean Semantic Web Conference (ESWC2006), vol. 4011,Springer, 2006.

[21] B. Glimm, I. Horrocks, C. Lutz, U. Sattler, Conjunctive queryanswering for the description logic, Journal of Artificial IntelligenceResearch 31 (2008) 157–204.

[22] B. Grau, I. Horrocks, Y. Kazakov, U. Sattler, Just the right amount:extracting modules from ontologies, in: Proceedings of the 16th Inter-national Conference on World Wide Web, ACM, 2007, pp. 717–726.

[23] B. Grau, I. Horrocks, Y. Kazakov, U. Sattler, Modular reuse of ontologies: theory and practice, Journal of Artificial Intelligence Research 31 (1) (2008) 273–318.

[24] B. Grau, Y. Kazakov, I. Horrocks, U. Sattler, A logical framework for modular integration of ontologies, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Citeseer, 2007, pp. 298–303.

[25] B. Cuenca-Grau, B. Parsia, E. Sirin, A. Kalyanpur, Automatic partitioning of OWL ontologies using e-connections, Tech. rep., UMIACS, available at http://www.mindswap.org/2004/multipleOnt/papers/Partition.pdf (2005).

[26] Y. Guo, Z. Pan, J. Heflin, LUBM: a benchmark for OWL knowledge base systems, Web Semantics: Science, Services and Agents on the World Wide Web 3 (2–3) (2005) 158–182.

[27] V. Haarslev, R. Moller, Racer: a core inference engine for the semantic web, in: Proceedings of the 2nd International Workshop on Evaluation of Ontology-based Tools, Citeseer, 2003, pp. 27–36.

[28] P. Haase, H. Lewen, R. Studer, T. Tran, M. Erdmann, M. d’Aquin, E. Motta, The NeOn ontology engineering toolkit, in: WWW, 2008.

[29] M. Hitz, B. Montazeri, Measuring coupling and cohesion in object-oriented systems, in: Proceedings of the International Symposium on Applied Corporate Computing, vol. 50, 1995, pp. 75–76.

[30] I. Horrocks, DAML+OIL: a description logic for the semantic web, IEEE Data Engineering Bulletin 25 (1) (2002) 4–9.

[31] I. Horrocks, U. Sattler, S. Tobies, Practical reasoning for expressive description logics, in: Proceedings of the 6th International Conference on Logic Programming and Automated Reasoning, LPAR ’99, Springer-Verlag, London, UK, 1999, pp. 161–180.

[32] S. Kramer, H. Kaindl, Coupling and cohesion metrics for knowledge-based systems using frames and rules, ACM Transactions on Software Engineering and Methodology 13 (July) (2004) 332–358. URL: http://doi.acm.org/10.1145/1027092.1027094.

[33] O. Kutz, C. Lutz, F. Wolter, M. Zakharyaschev, E-connections of abstract description systems, Artificial Intelligence 156 (1) (2004) 1–73.

[34] A. Lozano-Tello, A. Gomez-Perez, Ontometric: a method to choose the appropriate ontology, Journal of Database Management 15 (2) (2004) 1–18.

[35] Y. Ma, B. Jin, Y. Feng, Semantic oriented ontology cohesion metrics for ontology-based systems, The Journal of Systems and Software (2009).

[36] A. Maedche, S. Staab, Measuring similarity between ontologies, in: Lecture Notes in Computer Science, 2002, pp. 251–263.

[37] D. McGuinness, F. Van Harmelen, OWL web ontology language overview, W3C Recommendation 10 (2004) 2004-03.

[38] B. Motik, U. Sattler, A comparison of reasoning techniques for querying large description logic aboxes, in: Logic for Programming, Artificial Intelligence, and Reasoning, Springer, 2006, pp. 227–241.

[39] G. Nagypal, R. Deswarte, J. Oosthoek, Applying the Semantic Web: the VICODI experience in creating visual contextualization for history, Literary and Linguistic Computing 20 (3) (2005) 327.

[40] A. Orme, H. Tao, L. Etzkorn, Coupling metrics for ontology-based system, IEEE Software 23 (2) (2006) 102–108.

[41] R. Porzel, R. Malaka, A task-based approach for ontology evaluation, in: ECAI Workshop on Ontology Learning and Population, Valencia, Spain, Citeseer, 2004.

[42] E. Prud-Hommeaux, A. Seaborne, et al., SPARQL query language for RDF, W3C Working Draft 4, 2006.

[43] R. Raskin, M. Pan, Semantic Web for Earth and Environmental Terminology (SWEET), in: Proceedings of the Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data, Citeseer, 2003.

[44] E. Sirin, B. Parsia, B.C. Grau, A. Kalyanpur, Y. Katz, Pellet: a practical OWL-DL reasoner, Web Semantics: Science, Services and Agents on the World Wide Web 5 (2) (2007) 51–53.

[45] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. Goldberg, K. Eilbeck, A. Ireland, C. Mungall, et al., The OBO foundry: coordinated evolution of ontologies to support biomedical data integration, Nature Biotechnology 25 (11) (2007) 1251–1255.

[46] P. Spyns, et al., EvaLexon: Assessing Triples Mined from Texts, Technical Report 09, 2005.

[47] H. Stuckenschmidt, M. Klein, Structure-based partitioning of large concept hierarchies, The Semantic Web-ISWC 2004, 2004, pp. 289–303.

[48] H. Stuckenschmidt, M. Klein, Reasoning and change management in modular ontologies, Data & Knowledge Engineering 63 (2) (2007) 200–223.

[49] H. Stuckenschmidt, C. Parent, S. Spaccapietra, Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization, Springer-Verlag, New York, Inc., 2009.

[50] S. Tartir, I. Arpinar, M. Moore, A. Sheth, B. Aleman-Meza, OntoQA: metric-based ontology quality analysis, in: Proceedings of the Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources (KADASH), Citeseer, 2006.

[51] G. Tzoganis, D. Koutsomitropoulos, T. Papatheodorou, Querying Ontologies: Retrieving Knowledge from Semantic Web Documents, Tech-Report, available at: http://www.hpclab.ceid.upatras.gr/viografika/kotsomit/pubs/eureka09.pdf, 2009.

[52] D. Vrandecic, Y. Sure, How to design better ontology metrics, Lecture Notes in Computer Science, vol. 4519, 2007, p. 311.

[53] Y. Wang, J. Bao, P. Haase, G. Qi, Evaluating formalisms for modular ontologies in distributed information systems, Lecture Notes in Computer Science, vol. 4524, 2007, pp. 178–193.

[54] H. Yao, A. Orme, L. Etzkorn, Cohesion metrics for ontology design and application, Journal of Computer Science 1 (1) (2005) 107–113.

[55] D. Zhang, C. Ye, An evaluation method for ontology complexity analysis in ontology evolution, Lecture Notes in Computer Science, vol. 4248, 2006, p. 214.
