
© British Computer Society 2002

Concept Similarity in SymOntos: An Enterprise Ontology Management Tool¹

ANNA FORMICA AND MICHELE MISSIKOFF

LEKS, IASI-CNR, Viale Manzoni 30, I-00185, Rome, Italy
Email: [email protected]

The possibility of assessing the similarity between concepts is growing in importance. One of the primary reasons for this is the development of the Net Economy that requires a high level of computer support and flexibility in doing business. In business transactions, similarity plays an important role. It is constantly used whenever certain goods or services are not available with the required characteristics. Then a substitute may be accepted, as far as it is sufficiently close to what was originally required. In this paper we propose a method for evaluating concept similarity. The work has been performed within the SymOntos project concerning the development of a symbolic ontology management system, where concepts are defined in accordance with a frame-oriented approach.

Received 23 October 2001; revised 9 April 2002

1. INTRODUCTION

The possibility of assessing the similarity between concepts is growing in importance. Among the primary reasons, we may cite the development of the so-called Net Economy, which requires flexibility in doing business and the possibility of co-operation between national and international organizations, creating unplanned, often temporary, partnerships. In business transactions, similarity plays an important role. Flexibility requires enterprises to be able to cope with (often unexpected) different situations with respect to what was originally planned. For instance, in an e-procurement transaction, it may be the case that the required goods are not available with the desired characteristics (e.g. with the expected price, quality or delivery date); therefore the production plan must be adjusted to use a ‘similar’ part, although not exactly the one originally planned. (If the new part is ‘very similar’, the production plans do not need to be adjusted.) A similarity evaluation method is also required in other different areas, such as ontology integration, integration of multiple heterogeneous information sources for mediation and data warehousing, virtual enterprises and component-based information systems development. It is also important in another, very different, context, such as tourism services. When you start planning a holiday, it is very difficult to find exactly what you are looking for. Often, it is necessary to accept a hotel close to the original choice (but not exactly the one you wanted) and a flight with different dates or price. Again, similarity evaluations appear to be a fundamental activity, although we often establish a similarity threshold below which we simply decide to stop since the trip is no longer what we originally wanted.

On a more general ground, similarity reasoning, like taxonomic reasoning [1], represents one of the key mechanisms that humans use in order to organize their thoughts and plan their actions. However, similarity is a notion very difficult to precisely and exhaustively define. Objects can be similar from certain points of view and very different from others. According to [2], if we consider (the notion of) a pig, a donkey and a car, the first two exhibit a greater affinity being both animals but, in another perspective, the last two are similar as vehicles. The first similarity is due to a natural affinity, the second to a functional affinity. In this paper we consider concept similarity from an informational point of view. Given two concepts, e.g. car and truck, with their respective definitions, we would like to have a method to assess their similarity. In e-commerce, e-procurement is performed automatically by machines, so that a similarity reasoning facility would be extremely useful in performing automatic transactions [3]. The work presented in this paper is a first solution that has been adopted in SymOntos [4], an enterprise ontology management system developed at LEKS (Laboratory for Enterprise Knowledge and Systems), IASI-CNR, within two European projects, namely FETISH (Federated European Tourism Information System Harmonization) and, currently, Harmonise. SymOntos is based on the Object, Process, Actor language (OPAL) methodology [5] that allows concepts to be defined according to a frame-oriented approach. Notice that frame theory is a paradigm for representing real world knowledge, originally introduced by Minsky in [6], from which numerous research tracks on intelligent systems have originated, such as natural languages and recognition [7], hybrid systems [8], object-oriented languages [9], ISA-hierarchies and subsumption [10], F-logic [11].

¹ This work has been partially supported by the European Project IST-2000-29329 Harmonise.

THE COMPUTER JOURNAL, Vol. 45, No. 6, 2002

1.1. The knowledge representation method

According to OPAL, an ontology is constructed by defining a set of concepts and establishing semantic relations between them. OPAL supplies a set of predefined concept categories (referred to as metaconcepts) and semantic relations that form the OPAL framework. The definition of a domain concept takes place by filling a concept template (conceived according to a frame-slot approach), supplying first the OPAL category it belongs to, then filling the specified slots. The OPAL concept categories on which we focus in this paper are: Actor, Object and Process.²

• Actor: this metaconcept allows the ontology engineer to define the active concepts of the domain (e.g. Customer or Travel Agency). A concept of this category is able to activate or perform one or more processes.

• Object: this metaconcept is used to model passive concepts, on which processes operate (e.g. Flight seat, Hotel room).

• Process: this metaconcept is used to model activities that are performed to achieve actors’ goals (e.g. Hotel room reserving, Flight booking).

Therefore, according to OPAL, a SymOntos concept is defined by specifying, besides the label and a description (d) in natural language, the following slots:

Kind (k), which specifies the category of the concept being defined (i.e. Actor, Object, or Process);

Broader (B), which gathers a set of references to more general concepts;

Part (Pa), which gathers a set of references to concepts representing components;

Related (R), which gathers a set of references to related concepts;

Predicate (Pr), which gathers a set of references to concepts that can be seen as attributes;

Similar (S), which gathers a (possibly empty) set of terms that represent similar concepts. Each term is associated with a similarity degree (a positive decimal less than or equal to 1.0; in the latter case we have a synonym).

² Notice that the examples provided in this paper have been taken from the tourism domain. In particular, with regard to the descriptions of the tourism concepts, we considered the work developed in [12], within the FETISH European Project.

EXAMPLE 1.1. A SymOntos concept, whose label is GuestHouse, is defined as follows:

GuestHouse := (
    d = “Private house where accommodation and in most cases breakfast are provided”,
    k = Object,
    B = {Accommodation},
    Pa = {DiningRoom},
    R = {Customer, Breakfast},
    Pr = {Price},
    S = {〈Hotel, 0.7〉})

It is important to note that the similarity degree is not judged by the user but it is established, by means of a consensus system [13], by a panel of experts in a preliminary phase. We will refer to it as tentative similarity (tsim), to distinguish it from concept similarity (csim), which is evaluated on the basis of the concept structure and is only partially influenced by tsim.

The above concept structure allows a complex semantic net [8] to be defined. A few interesting subgraphs can be identified. One is the inheritance hierarchy, constructed by means of the Broader declarations; another is the similarity graph, constructed by means of the Similar declarations. The remaining sections of a concept definition (i.e. Part, Related and Predicate) represent the structural form, since they determine the information structure of the related instances. The aim of this work is to use (i) the inheritance hierarchy, (ii) the similarity graph and (iii) the concept structural forms to derive the concept similarity csim.

1.2. The essence of the proposed method

The proposed method is divided into two phases. The first is a preparation phase, where the concepts are pre-elaborated, in order to make their structures fully explicit. The second is the evaluation phase, where concept similarity is actually computed.

Phase 1: Expanding the ontology

In this phase, the goal is to analyze the concept definitions to build two graphs. The first is the inheritance graph (indeed, a directed acyclic graph (DAG) if it is correctly defined), which is built by starting from the Broader section of the concept definitions, which organizes the ontology concepts according to a generalization hierarchy. In this phase, the inheritance process is performed, therefore the structural sections of the concept definitions (i.e. Part, Related and Predicate) are augmented with the concept labels inherited from more general concepts (for a full treatment of structural inheritance, please refer to [14]). This operation is referred to as ‘expansion’.

The second is the similarity graph, built starting from the Similar slot, where nodes are concepts and arcs are labeled with their tentative similarity degree. Since similarity enjoys reflexive, symmetric and transitive properties, the similarity graph is obtained by starting from the original definitions (referred to as the signature for similarity) and operating the reflexive, symmetric and transitive closures.

The output of this phase is an expanded (i.e. all the definitions have been expanded exploiting inheritance) set of concepts and two graphs: an inheritance DAG and a similarity graph.

Phase 2: Deriving concept similarity

Starting from the ontology transformed according to the previous phase, concept similarity is evaluated by using the expanded structure. In our approach we consider four notions of similarity. The first is the tentative similarity (tsim) declared in the concept definition. Then, we have the following.

• Flat structural similarity (fss). This is computed by analyzing the three structural slots (Pa, R, Pr) and evaluating the similarity of every concept referred to therein.

• Hierarchical structural similarity (hss). This sort of similarity only pertains to concept pairs that are hierarchically related. It is computed starting from the flat structural similarity (fss) defined above, by taking into consideration a further element related to the hierarchical relationship. In particular, a factor is introduced that represents the probability for an instance of the more general concept to also be an instance of the specialized concept.

• Concept similarity (csim). This is the final figure that is produced by combining the structural similarity (either flat or hierarchical, depending on the case) and the tentative similarity supplied in the original concept definition.

The rest of this paper is organized as follows. In Section 2 the preliminary definitions of the SymOntos concept and ontology, with the related notions of structural forms, are given. This allows us to formally address, in Section 3, Phase 1 by defining the structures (essentially, the ontology in expanded form, the inheritance DAG, and the similarity graph) on which the similarity evaluation analysis is performed. In Section 4 the actual method is described, as are the steps of Phase 2 that allow the three mentioned kinds of similarities (fss, hss, csim) to be derived. Section 5 contains related work and Section 6 mentions our conclusions and future lines of research.

2. FORMAL BASIS

In SymOntos, the fundamental modeling notion is that of a concept, specified by a concept expression. A concept expression has a left-hand side, which is the identifying label of the concept (essentially, its name), and a right-hand side, the concept definition, which specifies the structure of the concept. For instance, Hotel and Customer are concept labels. We now formally introduce the notion of a SymOntos concept.

DEFINITION 2.1. (SymOntos concept) A SymOntos concept (concept for short) is a concept expression

$$c := (d, k, B, Pa, R, Pr, S)$$

where the left-hand side, i.e. $c$, is a label that uniquely identifies the concept, whereas the right-hand side, the concept definition, is a 7-tuple defined as follows:

• $d$ is a string expressing the description (i.e. the intuitive meaning) of the concept name, in natural language;

• $k$ is the kind of the concept (i.e. its category, such as Actor, Object or Process);

• $B$ is the set of names of the Broader concepts of $c$, i.e. labels denoting generalizations of $c$;

• $Pa$ is the set of names of the concepts that represent components (Part) of $c$;

• $R$ is the set of names of the concepts that are somehow Related to $c$;

• $Pr$ is the set of names of the concepts that are in the Predicate relation with $c$, i.e. those denoting the attributes of the concept being defined;

• $S$ is the set of pairs $\langle b, tsim \rangle$, where $b$ is the name of a concept that is Similar to $c$ and $tsim$ is a decimal number in the interval $[0.0, \ldots, 1.0]$ standing for the tentative similarity degree.

Notice that, in the cases where confusion may arise, the components of the 7-tuple and the similarity degree will be properly indexed with the names of the related concept. For instance, the $k$ element will be marked as $k_c$ and the similarity degree between the concepts $c$ and $b$ will be indicated as $tsim_{c,b}$.
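As an illustration, the 7-tuple of Definition 2.1 can be mirrored by a small record type. The following is a minimal Python sketch; the class name `Concept`, the field names and the validation rules are assumptions made for this example and are not part of SymOntos itself.

```python
from dataclasses import dataclass, field

KINDS = {"Actor", "Object", "Process"}  # the OPAL categories considered here


@dataclass
class Concept:
    """Hypothetical container for c := (d, k, B, Pa, R, Pr, S)."""
    label: str                              # left-hand side c
    d: str                                  # natural language description
    k: str                                  # kind of the concept
    B: set = field(default_factory=set)     # broader concepts
    Pa: set = field(default_factory=set)    # parts
    R: set = field(default_factory=set)     # related concepts
    Pr: set = field(default_factory=set)    # predicates (attributes)
    S: dict = field(default_factory=dict)   # similar concept -> tsim

    def __post_init__(self):
        # Assumed checks: a known kind and a degree inside [0.0, 1.0].
        if self.k not in KINDS:
            raise ValueError(f"unknown kind: {self.k}")
        for name, tsim in self.S.items():
            if not 0.0 <= tsim <= 1.0:
                raise ValueError(f"tsim for {name} outside [0.0, 1.0]")


# The GuestHouse concept of Example 1.1.
guest_house = Concept(
    label="GuestHouse",
    d="Private house where accommodation and in most cases breakfast are provided",
    k="Object",
    B={"Accommodation"},
    Pa={"DiningRoom"},
    R={"Customer", "Breakfast"},
    Pr={"Price"},
    S={"Hotel": 0.7},
)
```

Storing $S$ as a mapping from label to degree is one possible encoding of the set of pairs $\langle b, tsim \rangle$; it presumes at most one tentative degree per similar concept.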

We now present the notion of a SymOntos ontology.

DEFINITION 2.2. (SymOntos ontology) A SymOntos ontology (ontology for short) $O$ is a set of interrelated SymOntos concepts. In particular, if $T_O$ is the set of all the concept labels appearing in $O$, then it is partitioned by the sets $N_O$ and $W_O$, i.e.

$$T_O = N_O \cup W_O$$

$$N_O \cap W_O = \emptyset$$

where $N_O$ is the set of concept labels that are left-hand sides of concept expressions in $O$ and $W_O$ is the set of all the remaining terms appearing in $O$ that are referred to as known words of the ontology.

Known words represent ‘boundary concepts’ that are intentionally left undefined, i.e. they denote concepts that do not belong to the application domain that is modeled, but are used in some definitions.

A concept label is referred to as a reference when it is defined in the right-hand side of a concept expression, that is, it is used in a concept definition.

EXAMPLE 2.1. Consider the concept GuestHouse previously defined, together with the following two concepts:

• Accommodation := (
    d = “A place where at least sleeping and sanitary facilities are provided”,
    k = Object,
    B = {},
    Pa = {Room},
    R = {Country},
    Pr = {NofRooms},
    S = {〈Hotel, 0.8〉})

• RuralHouse := (
    d = “GuestHouse in the countryside”,
    k = Object,
    B = {GuestHouse},
    Pa = {Court},
    R = {RusticLand},
    Pr = {NofRecrServ},
    S = {})

This is a very simple example of ontology where, for instance, Customer (in the concept definition of GuestHouse) and Court (in the concept definition of RuralHouse) are known words, i.e. they are not left-hand sides of any concept expressions, and Hotel, in the Accommodation concept definition, is a reference.
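The partition of Definition 2.2 can be computed mechanically from the concept definitions. The sketch below is only illustrative: the dictionary encoding of the ontology and the function name `partition_labels` are assumptions of this example. Descriptions and kinds are omitted because they do not contain labels.

```python
# The three concepts of Examples 1.1 and 2.1, restricted to label-bearing slots.
ontology = {
    "GuestHouse": {"B": {"Accommodation"}, "Pa": {"DiningRoom"},
                   "R": {"Customer", "Breakfast"}, "Pr": {"Price"},
                   "S": {"Hotel": 0.7}},
    "Accommodation": {"B": set(), "Pa": {"Room"}, "R": {"Country"},
                      "Pr": {"NofRooms"}, "S": {"Hotel": 0.8}},
    "RuralHouse": {"B": {"GuestHouse"}, "Pa": {"Court"},
                   "R": {"RusticLand"}, "Pr": {"NofRecrServ"}, "S": {}},
}


def partition_labels(ontology):
    """Return (N_O, W_O): the defined labels and the known words."""
    defined = set(ontology)                # left-hand sides, i.e. N_O
    referenced = set()
    for slots in ontology.values():
        for value in slots.values():
            referenced |= set(value)       # for S this takes the dict keys
    all_labels = defined | referenced      # T_O
    return defined, all_labels - defined   # W_O = T_O minus N_O


N_O, W_O = partition_labels(ontology)
```

With only these three concepts, Customer and Court fall into `W_O`, as stated in the example. Under this encoding Hotel also lands in `W_O`, since no concept expression with Hotel on its left-hand side has been given at this point.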

Indeed, we are not interested in any ontology, rather in the ontologies that satisfy some formal properties, also referred to as correct ontologies. Such a notion, which will be formally introduced in Section 3, is based on some properties defined over the structure of the concepts and, in particular, on the mutual references that the concepts of the ontology exhibit.

To this end, we start by addressing the notion of a structural form of a concept, that consists of components (Pa), associations (R) and attributes (Pr) of a concept definition. Since we focus on structural similarity, such a notion is fundamental in computing concept similarity, as defined in Section 4.

DEFINITION 2.3. (Structural form of a concept) The structural form of a concept $c$ is the concept expression whose name is indicated as $c^-$ and whose definition is given by the three structural slots $Pa$, $R$ and $Pr$ of $c$, i.e.

$$c^- := (Pa, R, Pr).$$

EXAMPLE 2.2. The structural form of the GuestHouse concept of Example 1.1 is defined as follows:

GuestHouse⁻ := (
    Pa = {DiningRoom},
    R = {Customer, Breakfast},
    Pr = {Price})

In the following, given an ontology $O$, the set of structural forms of the concepts defined in $O$ will be denoted as $O^-$.

The slots of a concept definition that are not present in the structural form, i.e. $B$ and $S$, are used to define the signature for inheritance and the signature for similarity of the structural form of an ontology, respectively, as defined below. Notice that the notion of a signature for inheritance was originally introduced in [15]. In this paper, such a notion will be used in accordance with [14], where it represents the DirectDesc relation (i.e. the relation between a concept and its immediate specializations).

DEFINITION 2.4. (Structural form of an ontology) Given an ontology $O$, let $\overline{O}$ be the triple $(O^-, \mathit{Inh}_O, \mathit{Sim}_O)$, where $O^-$ is the set of structural forms of the concepts in $O$ and $\mathit{Inh}_O$, $\mathit{Sim}_O$ are two relations defined as follows:

• $\mathit{Inh}_O$ is a set of ordered pairs defined according to the inheritance hierarchy (the sets of broader concepts) of $O$:

$$\mathit{Inh}_O = \{\langle d, c \rangle \mid c, d \in T_O \text{ and } d \in B_c\};$$

• $\mathit{Sim}_O$ is a set of ordered triples defined according to the sets of similar concepts of $O$:

$$\mathit{Sim}_O = \{\langle c, d, tsim \rangle \mid c, d \in T_O \text{ and } \langle d, tsim \rangle \in S_c\}.$$

Then, $\overline{O} = (O^-, \mathit{Inh}_O, \mathit{Sim}_O)$ is the structural form of the ontology $O$, where $\mathit{Inh}_O$ and $\mathit{Sim}_O$ are referred to as the signature for inheritance and signature for similarity of $O$, respectively.

EXAMPLE 2.3. Suppose we enrich the ontology given by the concepts of Examples 1.1 and 2.1 with the further concepts:

• FarmHouse := (
    d = “GuestHouse located on an operating farm”,
    k = Object,
    B = {GuestHouse},
    Pa = {Dairy},
    R = {Countryside, Milk, Cheese},
    Pr = {NofAnimals},
    S = {})

• Hotel := (
    d = “Establishment with reception, services and additional facilities where accommodation and in most cases meals are provided”,
    k = Object,
    B = {Accommodation},
    Pa = {Restaurant},
    R = {Tourist, CreditCard},
    Pr = {Cost, NofCreditCards},
    S = {})

• GrandHotel := (
    d = “Hotel where accommodation is provided in rooms or suites”,
    k = Object,
    B = {Hotel},
    Pa = {Suite, SwimmingPool},
    R = {Limousine, Airline},
    Pr = {NofSuites, LimoService},
    S = {〈Hotel, 0.9〉}).

FIGURE 1. Signature for inheritance of Example 2.3. [Diagram: Accommodation is the root, with children GuestHouse and Hotel; GuestHouse has children RuralHouse and FarmHouse; Hotel has child GrandHotel.]

FIGURE 2. Signature for similarity of Example 2.3. [Diagram: Hotel is linked to Accommodation (0.8), GrandHotel (0.9) and GuestHouse (0.7).]

The structural form of this ontology is given by the structural forms of the concepts defined within it and the signatures for inheritance and similarity graphically represented in Figures 1 and 2, respectively. Notice that the evaluation of the similarity degrees between, for instance, GuestHouse and GrandHotel, or GrandHotel and Accommodation, will be addressed in the next section.
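The two signatures of this example can be derived directly from the $B$ and $S$ slots, following Definition 2.4. The following Python sketch is illustrative only; the dictionaries `broader` and `similar` and the two function names are assumptions introduced here.

```python
# B slot of each concept defined in Examples 1.1, 2.1 and 2.3.
broader = {
    "Accommodation": set(),
    "GuestHouse": {"Accommodation"},
    "RuralHouse": {"GuestHouse"},
    "FarmHouse": {"GuestHouse"},
    "Hotel": {"Accommodation"},
    "GrandHotel": {"Hotel"},
}

# S slot of each concept: similar label -> tentative similarity degree.
similar = {
    "Accommodation": {"Hotel": 0.8},
    "GuestHouse": {"Hotel": 0.7},
    "RuralHouse": {},
    "FarmHouse": {},
    "Hotel": {},
    "GrandHotel": {"Hotel": 0.9},
}


def inheritance_signature(broader):
    """Pairs <d, c> such that d is in the Broader set of c."""
    return {(d, c) for c, parents in broader.items() for d in parents}


def similarity_signature(similar):
    """Triples <c, d, tsim> such that <d, tsim> is in the Similar set of c."""
    return {(c, d, tsim)
            for c, pairs in similar.items()
            for d, tsim in pairs.items()}


inh = inheritance_signature(broader)
sim = similarity_signature(similar)
```

Here `inh` contains the five arcs of Figure 1 and `sim` the three weighted arcs of Figure 2.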

3. CORRECT ONTOLOGY

In this section the conditions that an ontology has to satisfy to be correct are presented. As we will see, the notion of a correct ontology concerns the signatures for inheritance and similarity of such an ontology. In order to address the formal properties regarding the signature for similarity, in the following subsection some definitions about the similarity of terms (i.e. tentative similarity between concept labels) are first introduced.

3.1. Formal properties about similarity of terms

In this subsection a few definitions concerning the similarity among a general set of terms $T$ (i.e. concept labels, defined or undefined) are given.

In the following, for the sake of simplicity, given two concept names $c_i$, $c_j$, the tentative similarity degree $tsim_{c_i,c_j}$ will be indicated as $tsim_{i,j}$.

DEFINITION 3.1. (Tentative similarity) Given a set of terms $T$, the tentative similarity (similarity for short) is a relation on $T \times T \times [0.0, \ldots, 1.0]$ where, if $\langle c_i, c_j, tsim_{i,j} \rangle \in T \times T \times [0.0, \ldots, 1.0]$, the decimal number $tsim_{i,j}$ is referred to as the tentative similarity degree.

We now give the notions of reflexive, symmetric and transitive similarity, together with their closures.

DEFINITION 3.2. (Reflexive similarity) A similarity $S$ on $T \times T \times [0.0, \ldots, 1.0]$ is reflexive if and only if

$$\forall c_i \in T \Rightarrow \langle c_i, c_i, tsim_{i,i} \rangle \in S \text{ and } tsim_{i,i} = 1.0.$$

Furthermore, given two similarity relations $S$ and $R$ on $T \times T \times [0.0, \ldots, 1.0]$, $S$ is the reflexive similarity closure of $R$ if and only if $S$ is the smallest subset of $T \times T \times [0.0, \ldots, 1.0]$ such that $S$ contains $R$ and $S$ is reflexive.

Of course, $S$ is obtained from $R$ by adding all the elements $\langle c_i, c_i, 1.0 \rangle$, for all $c_i \in T$.

DEFINITION 3.3. (Symmetric similarity) A similarity $S$ on $T \times T \times [0.0, \ldots, 1.0]$ is symmetric if and only if

$$\forall \langle c_i, c_j, tsim_{i,j} \rangle \in S \Rightarrow \langle c_j, c_i, tsim_{j,i} \rangle \in S \text{ and } tsim_{i,j} = tsim_{j,i}.$$

Furthermore, given two similarity relations $S$ and $R$ on $T \times T \times [0.0, \ldots, 1.0]$, $S$ is the symmetric similarity closure of $R$ if and only if $S$ is the smallest subset of $T \times T \times [0.0, \ldots, 1.0]$ such that $S$ contains $R$ and $S$ is symmetric.

Of course, $S$ is obtained from $R$ by adding all the elements $\langle c_j, c_i, tsim_{i,j} \rangle$, for all $\langle c_i, c_j, tsim_{i,j} \rangle$ in $R$.

DEFINITION 3.4. (Transitive similarity) A similarity $S$ on $T \times T \times [0.0, \ldots, 1.0]$ is transitive if and only if

$$\forall \langle c_i, c_j, tsim_{i,j} \rangle, \langle c_j, c_h, tsim_{j,h} \rangle \in S \Rightarrow \langle c_i, c_h, tsim_{i,h} \rangle \in S$$

where $tsim_{i,h}$ is a value depending on $tsim_{i,j}$ and $tsim_{j,h}$, i.e.

$$tsim_{i,h} = f(tsim_{i,j}, tsim_{j,h})$$

such that $tsim_{i,h} \le tsim_{i,j}$ and $tsim_{i,h} \le tsim_{j,h}$.

Furthermore, given two similarity relations $S$ and $R$ on $T \times T \times [0.0, \ldots, 1.0]$, $S$ is the transitive similarity closure of $R$ if and only if $S$ is the smallest subset of $T \times T \times [0.0, \ldots, 1.0]$ such that $S$ contains $R$ and $S$ is transitive.

Of course, $S$ is obtained from $R$ by adding all the elements $\langle c_i, c_h, tsim_{i,h} \rangle$, for all $\langle c_i, c_j, tsim_{i,j} \rangle$, $\langle c_j, c_h, tsim_{j,h} \rangle$ in $R$.

Notice that the above definition has been conceived in order to give maximum flexibility to the method. In fact, the $f$ function can be defined by the user according to the specific application domain addressed. For instance, in this paper, we assume that

$$f(tsim_{i,j}, tsim_{j,h}) = tsim_{i,j} \cdot tsim_{j,h}$$

but more sophisticated choices are compatible with the method, e.g. ‘fuzzy functions’.

EXAMPLE 3.1. In our example, if we assume that $T = T_O$, let $R$ be the similarity represented in Figure 2, i.e.

〈GuestHouse, Hotel, 0.7〉,
〈GrandHotel, Hotel, 0.9〉,
〈Accommodation, Hotel, 0.8〉.

By adding to it the following triples,

〈GuestHouse, GuestHouse, 1.0〉,
〈GrandHotel, GrandHotel, 1.0〉,
〈FarmHouse, FarmHouse, 1.0〉,
〈RuralHouse, RuralHouse, 1.0〉,
. . . (for all the concept names defined in $T_O$),

we have the reflexive similarity closure of $R$. Furthermore, by adding

〈Hotel, GuestHouse, 0.7〉,
〈Hotel, GrandHotel, 0.9〉,
〈Hotel, Accommodation, 0.8〉,

we get the symmetric similarity closure of $R$.

Notice that the transitive similarity closure of $R$ is $R$ itself, since it is not possible to derive triples by transitivity in it. However, if the transitive similarity closure is applied to the symmetric similarity closure of $R$, it is possible to derive

〈GuestHouse, Accommodation, 0.56〉,
〈GrandHotel, Accommodation, 0.72〉,
〈GuestHouse, GrandHotel, 0.63〉.

Therefore, in order to obtain all possible triples that can be derived by transitivity, the symmetric similarity closure will be applied first. This is illustrated in the next subsection.
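The derivation above can be reproduced with a short program. The following Python sketch applies the reflexive and symmetric closures first and then composes degrees with the product function $f$ assumed in this paper. Two choices in it are assumptions of this example and are not prescribed by the definitions: the function name `similarity_closure`, and keeping only the largest degree whenever a pair can be derived in more than one way.

```python
def similarity_closure(signature, terms, f=lambda x, y: x * y):
    """Reflexive, symmetric, then transitive closure of a similarity signature.

    Returns a dict mapping (c, d) to one degree. When several derivations
    exist, the largest degree is kept (an assumed selection policy). The
    triple loop is a max-product variant of the Floyd-Warshall scheme and
    relies on f never exceeding either of its arguments.
    """
    deg = {(t, t): 1.0 for t in terms}                 # reflexive closure
    for c, d, tsim in signature:                       # symmetric closure
        deg[(c, d)] = max(tsim, deg.get((c, d), 0.0))
        deg[(d, c)] = max(tsim, deg.get((d, c), 0.0))
    for k in terms:                                    # transitive closure
        for i in terms:
            for j in terms:
                if (i, k) in deg and (k, j) in deg:
                    candidate = f(deg[(i, k)], deg[(k, j)])
                    if candidate > deg.get((i, j), 0.0):
                        deg[(i, j)] = candidate
    return deg


terms = ["Accommodation", "GuestHouse", "Hotel",
         "GrandHotel", "RuralHouse", "FarmHouse"]
R = [("GuestHouse", "Hotel", 0.7),
     ("GrandHotel", "Hotel", 0.9),
     ("Accommodation", "Hotel", 0.8)]

closure = similarity_closure(R, terms)
```

Up to floating-point rounding, `closure` associates GuestHouse and Accommodation with 0.56, GrandHotel and Accommodation with 0.72, and GuestHouse and GrandHotel with 0.63. Pairs such as RuralHouse and GrandHotel are absent because no chain of similarity declarations connects them.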

3.2. Inheritance DAG and similarity graph

As already mentioned, the formal definition of a correct ontology is related to some formal properties that the signatures for inheritance and similarity of the ontology have to satisfy. To this end, the notions of inheritance DAG and similarity graph are first introduced.

DEFINITION 3.5. (Inheritance DAG) Given an ontology $O$, consider its structural form $\overline{O} = (O^-, \mathit{Inh}_O, \mathit{Sim}_O)$. Let $\mathit{Inh}_O^{+}$ be the transitive closure of $\mathit{Inh}_O$. Consider the following conditions:

(1) $\mathit{Inh}_O^{+}$ is antireflexive;
(2) $\mathit{Inh}_O^{+}$ is antisymmetric;
(3) $\forall \langle c, d \rangle \in \mathit{Inh}_O^{+} \Rightarrow k_c = k_d$, i.e. the concepts have the same kind.

Then, if all the above conditions are fulfilled, $\mathit{Inh}_O^{+}$ is referred to as the inheritance DAG of $O$.

In fact, it is well known that the inheritance hierarchy of a set of concepts must be free of cycles and, in particular, the inheritance relation has to be antireflexive, antisymmetric, and transitive, i.e. $(T_O, \mathit{Inh}_O^{+})$ must be a strict partially ordered set (POSET) [16].

For instance, the transitive closure of the signature for inheritance represented in Figure 1 fulfills all the three conditions given in the previous definition. Therefore, it is the inheritance DAG of the ontology described in Example 2.3.

FIGURE 3. The similarity subgraph of Example 2.3. [Diagram: Accommodation–Hotel (0.8), GuestHouse–Hotel (0.7), GrandHotel–Hotel (0.9), GuestHouse–Accommodation (0.56), GrandHotel–Accommodation (0.72), GuestHouse–GrandHotel (0.63).]
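The three conditions of Definition 3.5 can be checked on the signature for inheritance of Figure 1 as follows. This is a minimal Python sketch; the function names and the `kind` dictionary are assumptions of this example, and the closure is computed with a simple fixed-point loop rather than an optimized algorithm.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_inheritance_dag(signature, kind):
    """Check conditions (1)-(3) on the transitive closure of the signature."""
    closure = transitive_closure(signature)
    antireflexive = all(a != b for a, b in closure)
    antisymmetric = all((b, a) not in closure for a, b in closure if a != b)
    same_kind = all(kind[a] == kind[b] for a, b in closure)
    return antireflexive and antisymmetric and same_kind


# Signature for inheritance of Example 2.3 (Figure 1), as <broader, narrower>.
signature = {
    ("Accommodation", "GuestHouse"),
    ("Accommodation", "Hotel"),
    ("GuestHouse", "RuralHouse"),
    ("GuestHouse", "FarmHouse"),
    ("Hotel", "GrandHotel"),
}
kind = {name: "Object" for pair in signature for name in pair}

ok = is_inheritance_dag(signature, kind)
# A hypothetical extra arc that closes a cycle violates condition (1).
cyclic = is_inheritance_dag(signature | {("GrandHotel", "Accommodation")}, kind)
```

In this sketch `ok` is true, whereas `cyclic` is false because the closure of the extended signature relates Accommodation to itself.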

DEFINITION 3.6. (Similarity graph) Given an ontology $O$, consider its structural form $\overline{O} = (O^-, \mathit{Inh}_O, \mathit{Sim}_O)$. According to Definition 2.4, $\mathit{Sim}_O$ is a similarity on $T_O \times T_O \times [0.0, \ldots, 1.0]$. Let $\mathit{Sim}_O^{*}$ be the transitive closure of the reflexive and symmetric closures of $\mathit{Sim}_O$. Consider the following conditions:

(1) $\forall \langle c, d, tsim_{c,d} \rangle \in \mathit{Sim}_O^{*} \Rightarrow k_c = k_d$, i.e. the concepts have the same kind;
(2) $\forall c, d \in T_O$, $\langle c, d, as_{c,d} \rangle \in \mathit{Sim}_O^{*}$, where $as_{c,d}$ is defined as follows:
    - $as_{c,d} = tsim_{c,d}$ if $tsim_{c,d}$ is the similarity degree defined in $\mathit{Sim}_O^{*}$ and it is unique;
    - $as_{c,d} = \{tsim^{i}_{c,d}\}_{\mathit{Choice}}$ in the presence of multiple (transitively derived) similarity degrees defined in $\mathit{Sim}_O^{*}$;
    - $as_{c,d} = 0.0$ otherwise.

If all the above conditions are fulfilled, $\mathit{Sim}_O^{*}$ is referred to as the similarity graph of $O$. In particular, $as_{c,d}$ will be referred to as the axiomatic similarity degree of the concepts $c$, $d$.

For instance, the transitive similarity closure of the symmetric similarity closure of the signature for similarity represented in Figure 2 is shown in Figure 3.

Notice that in this case, if we consider the symmetric closure of Figure 2, for each pair of concept names the similarity degrees derived by transitivity are unique. Therefore, by extending the graph of Figure 3 with reflexivity and the following triples

〈GuestHouse, FarmHouse, 0.0〉
〈RuralHouse, GrandHotel, 0.0〉
. . . (for all pairs not involved in any similarity),

we have the similarity graph of the ontology described in Example 2.3.

Finally, we have the notion of a correct SymOntos ontology.

DEFINITION 3.7. (Correct ontology) Given an ontology O, consider its structural form O = (O−, �O, �O).


Then, the ontology O is correct iff �O is the inheritance DAG and �O is the similarity graph of O.

If we extend the similarity subgraph of Figure 3 as mentioned above, the ontology of Example 2.3 is correct.

3.3. Concept inheritance

As already mentioned in the introduction, the goal of the paper is the definition of a method that allows similarity among ontology concepts to be evaluated on the basis of the concept definitions. In order to perform this evaluation, the ‘expansion’ step must be performed. Such a step concerns the inheritance of the concept definitions, a problem widely investigated in the literature, see for instance [14]. The inheritance process is a necessary step for the evaluation of structural similarity, since in the structural form of a concept all the concept labels declared in the slots of its ancestors, in the inheritance DAG of the ontology, must be present. The inheritance process is performed by applying to the ontology concepts the Expand function that will be illustrated in the next subsection. Such a function is a revised version of the Expand function defined in [14], modified in order to deal with the richer knowledge model used in OPAL to construct concept expressions.

Below, given a correct ontology, the notion of Ancestors of a concept is introduced. Such a notion allows all the concept names that are generalizations of a given concept, in the inheritance DAG, to be identified.

DEFINITION 3.8. (The Ancestors function) Consider the structural form O = (O−, �O, �O) of a correct ontology with a non-empty inheritance DAG �O. Then, the Ancestors (A) function is defined as

A : T_O → ℘(T_O),

and, given a concept name c ∈ T_O,

A(c) = {d ∈ T_O | 〈c, d〉 ∈ �O}.

Notice that, for any c ∈ T_O, the set A(c) is always finite since T_O is finite and �O is a DAG. For instance, in our example, we have

A(RuralHouse) = {GuestHouse, Accommodation}.

In order to evaluate the similarity among concepts, we have to expand the concept definitions by inheriting all the concept names that are related to the ancestors of the concepts, in the inheritance hierarchy. To this end, the Expand function is presented below. Such a function, essentially, returns a concept whose structural components are defined as the union of the corresponding components of the ancestor concepts.
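The Ancestors function of Definition 3.8 amounts to collecting everything reachable upward in the inheritance DAG. A minimal sketch follows; the name `ancestors` and the `PARENTS` table are hypothetical, and the direct links are an assumption chosen to agree with A(RuralHouse) as given above.

```python
def ancestors(concept, parents):
    """Collect every generalization of `concept` by walking parent links."""
    found = set()
    stack = list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, ()))
    return found


# Assumed direct inheritance links (child -> list of direct parents).
PARENTS = {
    "GuestHouse": ["Accommodation"],
    "Hotel": ["Accommodation"],
    "RuralHouse": ["GuestHouse"],
    "FarmHouse": ["GuestHouse"],
    "GrandHotel": ["Hotel"],
}

print(sorted(ancestors("RuralHouse", PARENTS)))
```

Because the hierarchy is finite and acyclic, the loop always terminates, which mirrors the finiteness remark in the definition.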

DEFINITION 3.9. (The Expand function) Consider the structural form O = (O−, �O, �O) of a correct ontology with a non-empty signature for inheritance. Let C_e be the set of all possible concept expressions. Then, the Expand (E) function is defined as

$$E : N_O \to C_e,$$

and, given a concept name c ∈ N_O,

$$E(c) = c' := (R'_c,\ \mathit{Pa}'_c,\ \mathit{Pr}'_c)$$

where

$$R'_c = \bigcup_{g \in A(c)} R_g \cup R_c, \qquad \mathit{Pa}'_c = \bigcup_{d \in A(c)} \mathit{Pa}_d \cup \mathit{Pa}_c, \qquad \mathit{Pr}'_c = \bigcup_{e \in A(c)} \mathit{Pr}_e \cup \mathit{Pr}_c.$$

Notice that, in the above definition, for a known word w ∈ W_O, the sets R_w, Pa_w and Pr_w are assumed to be empty since they are not concept names known to the ontology.

By using the E function, we are able to present the notion of the expanded form of an ontology. Such a form allows us to present, in the next sections, the method for similarity evaluations.

DEFINITION 3.10. (Expanded form of an ontology) Given an ontology in structural form O = (O−, �O, �O), let O ′ be defined as follows

$$O' = \bigcup_{c_i \in N_O} E(c_i)$$

where E is the Expand function defined above. Then, the triple

O′ = (O ′, �O, �O)

is the expanded form of the ontology O.

In essence, the expanded form is composed of two signatures (for inheritance and similarity) and the set of concepts in expanded structural form.

EXAMPLE 3.2. Consider again Examples 2.1 and 2.2. The Expand function applied to the concepts GuestHouse and RuralHouse returns:

• GuestHouse′ := (
    Pa = {Room, DiningRoom},
    R = {Country, Customer, Breakfast},
    Pr = {NofRooms, Price})

• RuralHouse′ := (
    Pa = {Room, DiningRoom, Court},
    R = {Country, Customer, Breakfast, RusticLand},
    Pr = {NofRooms, Price, NofRecrServ})
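The expansion step can be sketched as a slot-wise union over the ancestors. This is only an illustration: the `Concept` class, the helper names and, in particular, the unexpanded slot contents are assumptions (Examples 2.1 and 2.2 are not reproduced here), chosen so that the result agrees with the expanded forms listed in Example 3.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    pa: frozenset = frozenset()  # Part slot
    r: frozenset = frozenset()   # Related slot
    pr: frozenset = frozenset()  # Predicate slot


def ancestors(name, parents):
    """All generalizations of `name` in the inheritance hierarchy."""
    found, stack = set(), list(parents.get(name, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, ()))
    return found


def expand(name, concepts, parents):
    """Union each slot of `name` with the same slot of all its ancestors."""
    own = concepts[name]
    pa, r, pr = set(own.pa), set(own.r), set(own.pr)
    for anc in ancestors(name, parents):
        inherited = concepts.get(anc, Concept())  # unknown names add nothing
        pa |= inherited.pa
        r |= inherited.r
        pr |= inherited.pr
    return Concept(frozenset(pa), frozenset(r), frozenset(pr))


# Assumed hierarchy and assumed local (unexpanded) definitions.
PARENTS = {"GuestHouse": ["Accommodation"], "RuralHouse": ["GuestHouse"]}
CONCEPTS = {
    "Accommodation": Concept(frozenset({"Room"}), frozenset({"Country"}),
                             frozenset({"NofRooms"})),
    "GuestHouse": Concept(frozenset({"DiningRoom"}),
                          frozenset({"Customer", "Breakfast"}),
                          frozenset({"Price"})),
    "RuralHouse": Concept(frozenset({"Court"}), frozenset({"RusticLand"}),
                          frozenset({"NofRecrServ"})),
}

rural = expand("RuralHouse", CONCEPTS, PARENTS)
print(sorted(rural.pa), sorted(rural.r), sorted(rural.pr))
```

Names that are not concepts of the ontology contribute empty slots, in line with the remark on known words in Definition 3.9.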

In the following subsections, the three notions of structural similarity are addressed. In all three cases, they are defined for concepts of a correct ontology in expanded form.


4. DERIVING CONCEPT SIMILARITY

As pointed out in the introduction, the goal of the paper is the definition of a method that allows similarity among ontology concepts to be derived on the basis of their definitions. In this approach, the following three kinds of similarity evaluation are proposed, depending on the definitions of concepts to be compared:

• flat structural similarity degree, for concepts that are not hierarchically related;
• hierarchical structural similarity degree, for concepts that are hierarchically related;
• concept similarity degree, which represents the final concept similarity evaluation, obtained by composing the tentative (axiomatic) similarity and the derived similarity, flat or hierarchical.

We will show that the axiomatic similarity (as) degree, introduced with the similarity graph (see Definition 3.6), plays a fundamental role in all three kinds of evaluations, not only in the last one.

4.1. Flat structural similarity degree

The flat structural similarity (fss) degree is computed on the basis of the expanded structural forms of the concepts and the axiomatic similarity degree defined according to the similarity graph. The method presented in this subsection was inspired by the maximum weighted matching problem in bipartite graphs, which can be solved in polynomial time [17]. Informally, it is illustrated as follows.

Consider two concepts whose names are c_i and c_j, and one of the three slots of their structural form, for instance Part (Pa). Then:

• consider the Cartesian product Pa_{c_i} × Pa_{c_j};
• within the above set, consider all the sets of pairs such that no two pairs in the set share an element. Such sets will be referred to as candidate sets of pairs; for instance, if Pa_{c_i} and Pa_{c_j} represent a set of boys and a set of girls, respectively, a candidate set of pairs defines a possible set of marriages (when polygamy is not allowed) [17];
• for each candidate set of pairs, consider the sum of the axiomatic similarity degrees of the concept pairs in it;
• the candidate set with the maximal sum among all those computed is chosen.

Therefore, for each slot, elements of c_i are paired with elements of c_j in order to give the maximal sum. The fss of the concepts c_i, c_j is then computed starting from the three maximal values determined for each of the slots Pa, R and Pr, up to a normalization factor.

DEFINITION 4.1. (The set C_R of candidate sets of pairs) Consider two concepts c_i, c_j of a correct ontology and let R be one of the three concept slots Pa (Part), R (Related) or Pr (Predicate). Let n_R, m_R be the cardinalities of the sets R_{c_i}, R_{c_j}, respectively, i.e. n_R = |R_{c_i}|, m_R = |R_{c_j}|, and suppose that n_R ≤ m_R.

Then, the set C_R(c_i, c_j) of candidate sets of pairs is defined by all possible sets of n_R pairs of concept names defined as follows:

$$C_R(c_i, c_j) = \bigl\{ \{\langle a_1, b_1 \rangle, \ldots, \langle a_{n_R}, b_{n_R} \rangle\} \;\bigm|\; a_h \in R_{c_i},\ b_h \in R_{c_j},\ \forall h = 1, \ldots, n_R,\ \text{and}\ a_h \neq a_k,\ b_h \neq b_l,\ \forall k, l \neq h \bigr\}.$$
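To make the size of C_R concrete, the candidate sets of Definition 4.1 can be enumerated by brute force. The sketch below is illustrative only; `candidate_sets` is a hypothetical helper, and it swaps the roles of the two slots when the first one is larger, so that every element of the smaller slot is always paired.

```python
from itertools import permutations


def candidate_sets(slot_i, slot_j):
    """Yield every candidate set of pairs between two slots.

    Each candidate set pairs all elements of the smaller slot with distinct
    elements of the larger one, so no two pairs share an element.
    """
    a, b = sorted(slot_i), sorted(slot_j)
    if len(a) <= len(b):
        for chosen in permutations(b, len(a)):
            yield frozenset(zip(a, chosen))
    else:
        for chosen in permutations(a, len(b)):
            yield frozenset(zip(chosen, b))


# Part slots of GuestHouse' and RuralHouse' from Example 3.2.
guest_pa = {"Room", "DiningRoom"}
rural_pa = {"Room", "DiningRoom", "Court"}

sets_of_pairs = list(candidate_sets(guest_pa, rural_pa))
print(len(sets_of_pairs))
```

With n_R = 2 and m_R = 3 there are 3!/(3 − 2)! = 6 candidate sets; in general the count is m_R!/(m_R − n_R)!, which is why an assignment algorithm such as those surveyed in [17] is preferable to enumeration for large slots.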

The definition of fss between concepts of a given ontology now follows.

DEFINITION 4.2. (Flat structural similarity (fss)) Consider a correct ontology in expanded form, O′ = (O ′, �O, �O).

Then, the flat structural similarity (fss) of two concepts whose names are c_i, c_j ∈ N_O is defined as follows:

$$\mathrm{fss}(c_i, c_j) = \sum_{R \in S} \left[ \frac{w_R}{m_R} \max_{P \in C_R(c_i, c_j)} \left( \sum_{\langle a, b \rangle \in P} \mathrm{as}(a, b) \right) \right]$$

where S = {Pa, R, Pr} (i.e. R stands for one of the three concept slots defining the structural form of a concept), C_R(c_i, c_j) and m_R are defined as in the previous definition and as(a, b) is the axiomatic similarity degree of the concept names a, b, as defined according to the similarity graph �O. Furthermore, w_R is a weight such that

$$\sum_{R \in S} w_R \le 1.$$

Notice that fss(c_i, c_j) is always a value between zero and one and, given two concepts c_i, c_j, fss(c_i, c_j) = fss(c_j, c_i).
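Definition 4.2 can be illustrated with a brute-force sketch that tries every candidate set for each slot. This is not the SymOntos implementation: the function names are hypothetical, concepts are modelled as plain dictionaries of sets, identical names are assumed to have degree 1.0, and equal weights of 1/3 are assumed. The two concepts are the expanded forms of Example 3.2.

```python
from itertools import permutations


def make_axiomatic(degrees):
    """Build as(a, b): 1.0 for identical names, a listed degree, else 0.0."""
    def axiomatic(a, b):
        if a == b:
            return 1.0
        return degrees.get(frozenset((a, b)), 0.0)
    return axiomatic


def best_sum(slot_i, slot_j, axiomatic):
    """Maximal sum of degrees over all candidate sets of pairs (brute force)."""
    small, large = sorted((sorted(slot_i), sorted(slot_j)), key=len)
    return max(
        sum(axiomatic(a, b) for a, b in zip(small, chosen))
        for chosen in permutations(large, len(small))
    )


def fss(concept_i, concept_j, axiomatic, weights):
    """Weighted, normalized sum of the best matching of each slot."""
    total = 0.0
    for slot, weight in weights.items():
        m = max(len(concept_i[slot]), len(concept_j[slot]))
        if m:  # a slot that is empty in both concepts contributes nothing
            total += weight / m * best_sum(concept_i[slot], concept_j[slot],
                                           axiomatic)
    return total


GUEST_HOUSE = {
    "Pa": {"Room", "DiningRoom"},
    "R": {"Country", "Customer", "Breakfast"},
    "Pr": {"NofRooms", "Price"},
}
RURAL_HOUSE = {
    "Pa": {"Room", "DiningRoom", "Court"},
    "R": {"Country", "Customer", "Breakfast", "RusticLand"},
    "Pr": {"NofRooms", "Price", "NofRecrServ"},
}
WEIGHTS = {"Pa": 1 / 3, "R": 1 / 3, "Pr": 1 / 3}  # assumed equal weights
AS = make_axiomatic({})  # no non-trivial degrees are needed for this pair

print(round(fss(RURAL_HOUSE, GUEST_HOUSE, AS, WEIGHTS), 2))
```

Since as is symmetric, matching the smaller slot against the larger one gives the same maximum whichever concept is passed first, which reflects the symmetry of fss noted above.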

EXAMPLE 4.1. In order to provide a more complex example, suppose that the signature for similarity of Example 2.3 has been extended as shown in Figure 4. Consider the expanded concepts of Example 3.2, together with the following ones:

• FarmHouse′ := (
    Pa = {Room, DiningRoom, Dairy},
    R = {Country, Customer, Breakfast, Countryside, Milk, Cheese},
    Pr = {NofRooms, Price, NofAnimals})

• GrandHotel′ := (
    Pa = {Room, Restaurant, Suite, SwimmingPool},
    R = {Country, Tourist, CreditCard, Limousine, Airline},
    Pr = {NofRooms, Cost, NofCreditCards, NofSuites, LimoService})

FIGURE 4. Extended signature for similarity of Example 2.3. (Nodes: Customer, Tourist, Suite, DiningRoom, Restaurant, Room, Countryside, RusticLand, Price, Cost, Accommodation, GuestHouse, Hotel, GrandHotel; edge weights: 0.8, 0.4, 0.4, 0.7, 0.7, 0.9, 0.9, 0.9, 0.8.)

Furthermore, for the sake of simplicity, assume that, for any R, w_R = 1/3. According to Definition 4.2, the following hold:

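$$\mathrm{fss}(\text{RuralHouse}, \text{FarmHouse}) = \frac{1}{3}\left(\frac{2}{3} + \frac{3.7}{6} + \frac{2}{3}\right) = 0.64$$

$$\mathrm{fss}(\text{RuralHouse}, \text{GrandHotel}) = \frac{1}{3}\left(\frac{1.8}{4} + \frac{1.9}{5} + \frac{1.9}{5}\right) = 0.40$$

$$\mathrm{fss}(\text{FarmHouse}, \text{GrandHotel}) = \frac{1}{3}\left(\frac{1.8}{4} + \frac{1.9}{6} + \frac{1.9}{5}\right) = 0.38,$$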

where, for instance, for the concepts RuralHouse and FarmHouse the candidate sets of pairs with maximal sum are as follows:

{〈Room, Room〉, 〈DiningRoom, DiningRoom〉, 〈Court, Dairy〉} ∈ C_Pa(RuralHouse, FarmHouse)

{〈Country, Country〉, 〈Customer, Customer〉, 〈Breakfast, Breakfast〉, 〈RusticLand, Countryside〉} ∈ C_R(RuralHouse, FarmHouse)

{〈NofRooms, NofRooms〉, 〈Price, Price〉, 〈NofRecrServ, NofAnimals〉} ∈ C_Pr(RuralHouse, FarmHouse).

Intuitively, in order to obtain the maximal sum, it is reasonable to pair the same concept names, leaving the remaining ones to match each other. For instance, in the case of Related (R), RusticLand has been paired with Countryside rather than Milk or Cheese, since the axiomatic similarity between them is 0.7 rather than 0.0. In the case of Predicate (Pr), NofRecrServ has been paired with NofAnimals since, although their axiomatic similarity is 0.0, the sum of the axiomatic similarity degrees obtained from the other two pairs is maximal.
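As a worked check of the Related slot for RuralHouse and FarmHouse, using only the degrees quoted above (identical names are taken to have degree 1.0, the pair RusticLand/Countryside has degree 0.7, and m_R = 6 because FarmHouse lists six Related names), the maximal sum and its weighted, normalized contribution to fss are:

$$\max_{P \in C_R} \sum_{\langle a, b \rangle \in P} \mathrm{as}(a, b) = 1 + 1 + 1 + 0.7 = 3.7, \qquad \frac{w_R}{m_R} \cdot 3.7 = \frac{1}{3} \cdot \frac{3.7}{6} \approx 0.206.$$

Pairing RusticLand with Milk or Cheese instead would lower the sum to 3.0, which is why that candidate set is not chosen.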

4.2. Hierarchical structural similarity degree

The hierarchical structural similarity (hss) degree is computed for concepts that are hierarchically related. The hss is essentially defined as the fss increased by a value defined according to the inheritance DAG of the ontology. In particular, such a value is computed under specific assumptions that are related to the extensional notion of inheritance, i.e. the distribution of concept instances along the hierarchy. This proposal has been formulated under the following assumptions. In the inheritance DAG:

• the concepts are organized according to a specialization as partition: in the hierarchy, the instances populate the leaves of the DAG and the population of an intermediate node is the union of the populations of the children (recursively);
• for any concept, the distribution of the instances among the specialized concepts is uniform, i.e. the children are equally populated.

Such assumptions can be easily relaxed by introducing appropriate coefficients that take into account the actual distribution of instances of the different concepts. Very often, especially in e-business, an ontology is related to a database and, therefore, distribution coefficients can be obtained by means of simple data mining operations (a further elaboration on this point is beyond the scope of this paper). Then, the corrector we propose in order to compute the structural similarity of two hierarchically related concepts is given by the specialization probability defined below. It is, essentially, the probability for an instance of a more general concept to be an instance of one of its specialized concepts, under the assumptions above.

DEFINITION 4.3. (Specialization probability) Consider an inheritance DAG and two hierarchically related concepts c_i, c_j. Let (c_1, . . . , c_n) be the path connecting such concepts, where c_1 = c_i, which we assume to be more general than c_j, and c_n = c_j. Then, if g_h is the outdegree of the concept c_h in the inheritance DAG, for h = 1, . . . , n − 1, the specialization probability, say p(c_i, c_j), is defined as follows:

$$p(c_i, c_j) = \prod_{h=1}^{n-1} \frac{1}{g_h}.$$
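The product in Definition 4.3 can be sketched as follows. The helper name and the `CHILDREN` table are hypothetical; the table is an assumed hierarchy in which both Accommodation and GuestHouse have two children, and the path is supplied by the caller, ordered from the more general concept to the more specific one.

```python
from fractions import Fraction


def specialization_probability(path, children):
    """Multiply 1/outdegree over every node on the path except the last."""
    prob = Fraction(1)
    for node in path[:-1]:
        prob *= Fraction(1, len(children[node]))
    return prob


# Assumed inheritance DAG (parent -> direct children).
CHILDREN = {
    "Accommodation": ["GuestHouse", "Hotel"],
    "GuestHouse": ["RuralHouse", "FarmHouse"],
    "Hotel": ["GrandHotel"],
}

print(specialization_probability(["GuestHouse", "RuralHouse"], CHILDREN))
print(specialization_probability(
    ["Accommodation", "GuestHouse", "RuralHouse"], CHILDREN))
```

Exact fractions are used here only to keep the products readable; under the uniform-distribution assumption each step down the path divides the probability by the outdegree of the node being left.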

Then, the hss can be defined as follows.

DEFINITION 4.4. (Hierarchical structural similarity (hss)) Consider a correct ontology in expanded form O′ = (O ′, �O, �O) and two concepts c_i, c_j ∈ N_O that are hierarchically related (i.e. connected by a path) in the signature for inheritance �O. The hierarchical structural similarity (hss) of c_i, c_j is then defined starting from the flat structural similarity fss(c_i, c_j) as follows:

$$\mathrm{hss}(c_i, c_j) = \mathrm{fss}(c_i, c_j) + (1 - \mathrm{fss}(c_i, c_j)) \cdot p(c_i, c_j)$$

where p(c_i, c_j) is the specialization probability as defined above.

EXAMPLE 4.2. For instance, consider the hierarchically related concepts RuralHouse and GuestHouse. Their flat structural similarity degree is

$$\mathrm{fss}(\text{RuralHouse}, \text{GuestHouse}) = \frac{1}{3}\left(\frac{2}{3} + \frac{3}{4} + \frac{2}{3}\right) = 0.69.$$

Now, since the outdegree of the node labeled with GuestHouse in the inheritance DAG is two, then p(c_i, c_j) = 1/2. Therefore, the hierarchical structural similarity degree between RuralHouse and GuestHouse is

$$\mathrm{hss}(\text{RuralHouse}, \text{GuestHouse}) = 0.69 + \frac{1 - 0.69}{2} = 0.84.$$
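The correction of Definition 4.4 is a one-line computation. The sketch below uses a hypothetical function name and takes the fss value and the specialization probability as plain numbers; fed with the two-decimal values of this example it gives 0.845, which the example reports as 0.84.

```python
def hss(fss_value, spec_prob):
    """Raise the flat similarity toward 1 in proportion to spec_prob."""
    return fss_value + (1.0 - fss_value) * spec_prob


# RuralHouse / GuestHouse, with fss = 0.69 and p = 1/2 as in the example.
print(hss(0.69, 0.5))
```

Two boundary cases follow directly from the formula: a specialization probability of zero leaves the fss unchanged, and a probability of one (a single-child chain) lifts the result to 1.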

4.3. Concept similarity degree

After the introduction of the fss and the hss degrees, we are able to define the concept similarity (csim) degree. It is essentially given by the average of the axiomatic similarity degree as and the hss or fss degrees, if the concepts are hierarchically related or not, respectively.

DEFINITION 4.5. (Concept similarity (csim)) Consider a correct ontology in expanded form O′ = (O ′, �O, �O) and two concepts c_i, c_j ∈ N_O. Then, the concept similarity (csim) of c_i, c_j is defined as follows. Assume that:

$$\mathrm{ss}(c_i, c_j) = \begin{cases} \mathrm{fss}(c_i, c_j), & \text{if } c_i, c_j \text{ are not hierarchically related,} \\ \mathrm{hss}(c_i, c_j), & \text{otherwise.} \end{cases}$$

Then

$$\mathrm{csim}(c_i, c_j) = \frac{\mathrm{ss}(c_i, c_j) + \mathrm{as}(c_i, c_j)}{2}$$

where as(c_i, c_j) is the axiomatic similarity degree of the concepts c_i, c_j.

EXAMPLE 4.3. In our example, consider the concepts GuestHouse and GrandHotel that are not related in the inheritance DAG, but are related in the similarity graph with non-null axiomatic similarity degree. Then

$$\mathrm{csim}(\text{GuestHouse}, \text{GrandHotel}) = \frac{0.40 + 0.63}{2} = 0.51,$$

since

$$\mathrm{fss}(\text{GuestHouse}, \text{GrandHotel}) = \frac{1}{3}\left(\frac{1.8}{4} + \frac{1.9}{5} + \frac{1.9}{5}\right) = 0.40.$$

In the case of RuralHouse and FarmHouse we have two concepts that are again not related in the inheritance DAG, but this time with null axiomatic similarity degree. Therefore,

$$\mathrm{csim}(\text{RuralHouse}, \text{FarmHouse}) = \frac{0.64 + 0.0}{2} = 0.32.$$

Consider now the concept Hotel, whose expanded form is

Hotel′ := (
    Pa = {Room, Restaurant},
    R = {Country, Tourist, CreditCard},
    Pr = {NofRooms, Cost, NofCreditCards})

Then, consider Accommodation that is hierarchically related to Hotel, with non-null axiomatic similarity degree. The following holds:

$$\mathrm{csim}(\text{Hotel}, \text{Accommodation}) = \frac{0.69 + 0.80}{2} = 0.75,$$

since

$$\mathrm{fss}(\text{Hotel}, \text{Accommodation}) = \frac{1}{3}\left(\frac{1}{2} + \frac{1}{3} + \frac{1}{3}\right) = 0.38$$

$$\mathrm{hss}(\text{Hotel}, \text{Accommodation}) = 0.38 + (1 - 0.38) \cdot \frac{1}{2} = 0.69.$$

Finally, as an example of hierarchically related concepts with null axiomatic similarity degree, consider RuralHouse and GuestHouse, for which the following holds

$$\mathrm{csim}(\text{RuralHouse}, \text{GuestHouse}) = \frac{0.84 + 0.0}{2} = 0.42.$$
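The four cases of this example can be replayed with a small sketch of Definition 4.5. The function name and its signature are hypothetical: the caller passes the fss value, the axiomatic degree and, only for hierarchically related concepts, the specialization probability. The inputs are the two-decimal figures quoted in the example, so the outputs agree with the reported values to within 0.01.

```python
def csim(fss_value, axiomatic, spec_prob=None):
    """Average the structural similarity with the axiomatic degree.

    spec_prob is None when the concepts are not hierarchically related,
    in which case the structural similarity is the fss itself.
    """
    if spec_prob is None:
        ss = fss_value
    else:
        ss = fss_value + (1.0 - fss_value) * spec_prob  # hss
    return (ss + axiomatic) / 2.0


cases = {
    "GuestHouse/GrandHotel": csim(0.40, 0.63),
    "RuralHouse/FarmHouse": csim(0.64, 0.0),
    "Hotel/Accommodation": csim(0.38, 0.80, spec_prob=0.5),
    "RuralHouse/GuestHouse": csim(0.69, 0.0, spec_prob=0.5),
}
for pair, value in cases.items():
    print(pair, round(value, 3))
```

Passing the probability rather than a precomputed hss keeps the choice between the flat and the hierarchical case in one place.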

5. RELATED WORK

Similarity has been tackled in different fields of computer science and a number of significant results are available. Other disciplines, such as linguistics and cognitive psychology, have addressed the same problem producing interesting results, but with a limited impact for us, due to the completely different methodological ground [18]. The method proposed in this paper is the result of the analysis of different solutions that are present in the literature, and our aim is to overcome a number of limitations that we found therein. It must be noted that the large majority of existing results have not been conceived in e-commerce and business-to-business interoperability contexts, but rather in data integration for distributed query processing and/or data warehousing [19, 20, 21, 22]. Therefore, what we have perceived as a limitation for our aim may be valid for different applications.

The first difference of the proposed approach with respect to existing results is the way we treat similarity between hierarchically related concepts. For instance, in [19, 23] a constant value (specifically 0.5) is associated with any pair of hierarchically related concepts. In our opinion, a constant value does not properly reflect the level of specialization and, on the contrary, it is important to evaluate this coefficient by considering the degree of refinement of the specialized concept: the greater the refinement, the greater the distance between the concepts. Therefore, by introducing the notion of hss, we take into account the probability of an instance of a general concept to also be an instance of a specialization (e.g. the probability that a vehicle is a car, in a given application domain). We believe that this method produces better results than merely associating a constant factor with any pair of hierarchically related concepts. For instance, instead of axiomatically assigning 0.5 to the pair of the concepts (Hotel, Accommodation) of Examples 2.1 and 2.3, three different values can be derived according to the proposed approach:

(i) the first value, fss(Hotel, Accommodation) = 0.38, takes into account the structures of the concepts and, in particular, the fact that Hotel has, for each slot, a few concepts that are not present in the corresponding slots of Accommodation;
(ii) the second value, hss(Hotel, Accommodation) = 0.69, is obtained by considering the hierarchy of Figure 1 and, in particular, the outdegree of Accommodation;
(iii) finally, the average of the previous value with the similarity degree axiomatically given in Figure 4 (in this case 0.8) leads to the final result: csim(Hotel, Accommodation) = 0.75.

Similarity among hierarchically related concepts has been investigated within Semantic Nets and logic-based Knowledge Representation. In [24], where a metric on the power set of nodes in a semantic net has been proposed, the conceptual distance of concepts that are hierarchically related has been defined by considering the length of the shortest path connecting them. Furthermore, in [25] the Semantic-Distance Metric (SDM) has been defined, which is based on weighted paths. In particular, in that paper concepts are connected by hyperonym/hyponym and synonym links. With respect to [24], in this paper the hss allows a more refined similarity evaluation that takes into account not only the distance but also the outdegrees of the concepts in the inheritance hierarchy. With respect to [25], in this work not only have synonyms been considered, but so have concepts with similarity degrees strictly less than one. Furthermore, according to the fss, in our proposal structural links have also been addressed, such as the ones related to attributes or components.

The second main difference of our proposal with respect to the existing literature is the partitioning of the structural definition of a concept into different slots (essentially, attributes (Pr), parts (Pa) and related (R) concepts), so that only elements of concept definitions that belong to the same partition are compared. Conversely, the majority of methods found in the literature consider one kind of slot only, namely property names (i.e. attributes). In particular, in our approach these three slots are addressed separately (since the relationship of a car with the attribute color is inherently different from its relationship with a garage where it is repaired).

In [26], a richer set of distinguishing characteristics has been proposed, which includes both the intensional (classes) and extensional (tokens) levels. However, there are a number of limitations, such as the necessity that two concepts are at the same ISA level to be compared.

On a more technical ground, we did not adopt the popular Dice’s function [27], as was done for instance in [19, 28], which allows concept similarity to be evaluated on the basis of the number of similar concept components divided by the total number of concept components of the two concepts, without explicitly considering in the computation their similarity degree. Analogously, in [29] semantic relatedness (similarity) evaluation is based on the aggregation of the interconnections between concepts, that is, the more properties two concepts have in common, the more closely related they are.

Finally, it is worth mentioning that the fss evaluation between concepts defined in this paper can be seen as a form of co-occurrence strategy as defined in [18], for which a SymOntos concept is a context and similarity is established on the basis of the amount of overlap of the contexts. Furthermore, in [30], general forms of distance metrics for the computation of similarity measures have been defined, although with more emphasis on the evaluation of similarity between instances, rather than concepts.

6. CONCLUSION

In this paper a method for the evaluation of concept similarity has been presented. The problem of concept similarity is a complex one; therefore we have addressed it from a specific angle: that of structural similarity. Structural similarity, although being a partial view of a more general problem, represents an important issue in the emerging applications of e-commerce. In fact, the structural aspect of a concept determines the structure of data that commercial institutions exchange in doing business. Another field where structural similarity is relevant is that of information integration in query processing of heterogeneous data sources and data warehousing. Even if we consider the structural components of concepts only, the problem appears quite complex. For this reason, in this paper we have not elaborated on a number of tuning parameters, such as the specialization probability factor.

The similarity evaluation method proposed in this paper has been included in the SymOntos system [4], developed within the European projects FETISH and Harmonise, aiming at the construction and maintenance of tourism ontologies. The method will be used within various tasks, such as semantic data reconciliation and approximate query processing.

REFERENCES

[1] Bergamaschi, S. and Sartori, C. (1992) On taxonomic reasoning in conceptual design. ACM Trans. Database Syst., 17, 385–422.

[2] Kasahara, K., Matsuzawa, K., Ishikawa, T. and Kawaoka, T. (1995) Viewpoint-based measurement of semantic similarity between words. In Proc. 5th Int. Workshop on Artificial Intelligence and Statistics, Fort Lauderdale, FL, January 4–7, pp. 292–302.

[3] Uschold, M., King, M., Moralee, S. and Zorgios, Y. (1998) The enterprise ontology. Knowledge Eng. Rev., 13, 31–89.

[4] SymOntos: an enterprise ontology management system. IASI-CNR, www.symontos.org.

[5] Formica, A. and Missikoff, M. (2000) Design and Specification of an Integrated Knowledge Base. First Guidelines to the Tourism Organisations for Enhancing their Interoperability in Doing Business by Using a Knowledge-Based Software Environment. Deliverable D1.2 of the European Project IST 13015—FETISH (Federated European Tourism Information System Harmonization), IASI-CNR, Rome, Italy.

[6] Minsky, M. (1974) A Framework for Representing Knowledge. Artificial Intelligence Memo 306, MIT AI Laboratory.

[7] Charniak, E. (1981) A common representation for problem solving and language comprehension information. Artificial Intell., 16, 225–255.

[8] Brachman, R. J. (1979) On the epistemological status of semantic networks. In Findler, N. V. (ed.), Associative Networks—Representation and Use of Knowledge by Computers. Academic Press, New York.

[9] Khoshafian, S. and Abnous, R. (1990) Object-Orientation—Concepts, Languages, Databases, User Interfaces. Wiley, New York.

[10] Brachman, R. J. (1983) What IS-A is and isn’t: an analysis of taxonomic links in semantic networks. IEEE Computer, 16, 30–36.

[11] Kifer, M. and Lausen, G. (1989) F-Logic: a higher-order language for reasoning about objects, inheritance, and scheme. In Proc. ACM SIGMOD Int. Conf. on Management of Data, Portland, OR, May 31–June 2, pp. 134–146.

[12] Comite Europeen de Normalisation (CEN) (2000) Tourism services—Hotel and other types of tourism accommodation—Terminology. http://www.cenorm.be/

[13] Missikoff, M. and Wang, X. F. (2001) A group decision system for collaborative ontology building. In Proc. Int. Conf. on Group Decision and Negotiation, La Rochelle, France, June 4–7, pp. 153–160.

[14] Beeri, C., Formica, A. and Missikoff, M. (1999) Inheritance hierarchy design in object-oriented databases. Data Knowledge Eng., 30, 191–216.

[15] Ait-Kaci, H. and Podelski, A. (1993) Towards a meaning of life. J. Logic Program., 16, 195–234.

[16] Horowitz, E. and Sahni, S. (1983) Fundamentals of Computer Algorithms. Computer Science Press, MD.

[17] Galil, Z. (1986) Efficient algorithms for finding maximum matching in graphs. ACM Comput. Surveys, 18, 23–38.

[18] Miller, G. A. and Charles, W. G. (1991) Contextual correlate of semantic similarity. Lang. Cognitive Process., 6, 1–28.

[19] Castano, S., De Antonellis, V., Fugini, M. G. and Pernici, B. (1998) Conceptual schema analysis: techniques and applications. ACM Trans. Database Syst., 23, 286–332.

[20] Jarke, M., Lenzerini, M. and Vassiliou, Y. (1999) Fundamentals of Data Warehouses. Springer, Berlin.

[21] Inmon, W. H. (1996) Building the Data Warehouses. Wiley, New York.

[22] Cohen, W. W. (2000) Data integration using similarity joins and a word-based information representation language. ACM Trans. Inform. Syst., 18, 288–321.

[23] Damiani, E., Formica, A., Fugini, M. G., Missikoff, M. and Pizzicannella, R. (1997) Reusing analysis schemas in ODB applications: a chart based approach. In Proc. 1st East-European Symp. on Advances in Databases and Information Systems, St. Petersburg, Russia, September 2–5, pp. 406–415. Nevsky Dialect, Russia.

[24] Rada, R., Mili, H., Bicknell, E. and Blettner, M. (1989) Development and application of a metric on semantic nets. IEEE Trans. Syst., Man, Cybernetics, 19, 17–30.

[25] Bright, M., Hurson, A. and Pakzad, S. (1994) Automated resolution of semantic heterogeneity in multidatabases. ACM Trans. Database Syst., 19, 212–253.

[26] Spanoudakis, G. and Constantopoulos, P. (1994) Similarity for analogical software reuse: a computational model. In Proc. 11th Eur. Conf. on Artificial Intelligence, Amsterdam, The Netherlands, August 8–12, pp. 18–22. Wiley, New York.

[27] Maarek, Y. S., Berry, D. M. and Kaiser, G. E. (1991) An information retrieval approach for automatically constructing software libraries. IEEE Trans. Software Eng., 17, 800–813.

[28] Bergamaschi, S., Castano, S., De Capitani di Vimercati, S., Montanari, S. and Vicini, M. (1998) An intelligent approach to information integration. In Guarino, N. (ed.), Formal Ontology in Information Systems, pp. 253–268. IOS Press, Amsterdam.

[29] Collins, A. and Loftus, E. (1975) A spreading activation theory on semantic processing. Psychol. Rev., 82, 407–428.

[30] Bisson, G. (1992) Learning in FOL with a similarity measure. In Proc. 10th Natl Conference on Artificial Intelligence, San Jose, CA, July 12–16, pp. 82–87. The AAAI Press/The MIT Press, CA.
