
International Journal of Automation and Computing 9(3), June 2012, 306-314

DOI: 10.1007/s11633-012-0649-x

SVM-based Ontology Matching Approach

Lei Liu1, Feng Yang1,2, Peng Zhang1, Jing-Yi Wu1, Liang Hu1

1 College of Computer Science and Technology, Jilin University, Changchun 130012, PRC
2 Department of Information, Jilin Teachers′ Institute of Engineering and Technology, Changchun 130052, PRC

Abstract: There are many heterogeneous ontologies in the semantic web, and the task of ontology mapping is to find their semantic relationships. The integrated methods used in current multi-strategy ontology mapping only simply combine the similarity values; they do not include semantic information and need a lot of manual intervention, so some factual mapping relations are missed. Addressing this issue, the work presented in this paper puts forward an ontology matching approach which uses a multi-strategy mapping technique to carry out iterative similarity computation and explores both linguistic and structural similarity. Our approach combines the different similarities into one whole, a similarity cube. By a cutting operation, similarity vectors are obtained, which form the similarity space, and in this way mapping discovery can be converted into binary classification. Support vector machine (SVM) has good generalization ability and can obtain the best compromise between model complexity and learning capability when solving small-sample and nonlinear problems; for this reason, we employ SVM in our approach. Making full use of the information of the ontologies, our implementation and experimental results on a common dataset demonstrate the effectiveness of the mapping approach: it ensures the recall ratio while improving the quality of the mapping results.

Keywords: Semantic web, ontology engineering, ontology mapping, similarity cube, support vector machine (SVM).

1 Introduction

Berners-Lee et al.[1] proposed the concept of the semantic web in 1998. The semantic web enables computers to understand knowledge in the world wide web, and it achieves semantic interoperation between information systems. Ontology describes the semantic information of documents in the semantic web and provides a universal standard for knowledge representation. In recent years, research on ontology has been very active, and many different service-oriented ontologies have appeared as this research progresses, such as the amino acid ontology[2], CContology[3], the plant ontology[4], and ontologies transformed from general knowledge libraries and thesauri[5, 6]. These ontologies have dissimilarities in definition, application and service, so their intrinsic structures and contents are different. To solve the heterogeneity problems in the ontology domain, we need to find the semantic mapping relationships between different ontologies. Ontology mapping has already become an effective means to combine distributed heterogeneous ontologies, and it provides service to many ontology engineering tasks, such as ontology integration, ontology merging, and ontology alignment.

At present, there is a lot of research on ontology mapping methods, and researchers have developed many kinds of mapping discovery systems, such as ASCO[7], OLA[8], and HMatch[9]. Most of these systems use multiple strategies to find the similarity between ontology elements, and then find suitable mapping results by integrating the different strategies. Integrated approaches use multi-strategy similarity weighted means and combined methods, such as hybrid and composite[10, 11]. These methods have made some achievements, but still have many deficiencies.

Manuscript received August 17, 2010; revised December 2, 2010. This work was supported by National Natural Science Foundation of China (No. 60873044), Science and Technology Research of the Department of Jilin Education (Nos. 2009498, 2011394), and Opening Fund of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial Colleges at Zhejiang Normal University of China (No. ZSDZZZZXK11).

First, the combined methods used by these systems are simply combinations of multi-strategy similarities, so they do not show the influence on the final mapping result exerted by each strategy at the semantic level, which leads to mappings of low quality. Second, these methods need a lot of manual intervention, such as the weighted-average method; because of the excessive intervention by users and domain experts, some of the mapping relationships are missed. Moreover, such an approach does not allow effective measures to be taken to process large scale ontology mapping tasks.

In order to solve these problems, this paper presents a new ontology mapping method. This method jointly uses WordNet and the Jaccard coefficient to compute the similarity of concept labels, and exploits nearest neighbor level concepts to capture the structural similarity of ontology elements, while the similarities of ontology instances and entities are found by using vector space model methods. A new approach to combining multi-strategy similarity results is carried out in this method: it proposes the similarity cube concept and extracts similarity vectors by cutting and slicing. Through these operations, the method applies a support vector machine (SVM)-based mapping discovery strategy, and the mapping discovery problem is transformed into a binary classification problem in a vector space. Experiments show that the method is not only effective but also applicable to large scale ontology mapping tasks.

2 Ontology and ontology mapping

2.1 Ontology

The ontology concept originates from philosophy, where ontology is "a systematic description of the objective, concerned with abstract entities of objective reality." In the domains of artificial intelligence, knowledge systems and information systems, an ontology is "a formal, clear and detailed explanation of a shared concept system"[12]. Ontology provides a shared word list. It includes the object types, concepts, attributes and


relationships that exist in particular domains[13]. Perez and Benjamins use the taxonomy relationship to organize ontology, summarize 5 basic modeling primitives[14], and propose the following definition of ontology.

Definition 1. Ontology
Ontology O = {C, R, F, A, I}, where C is a class or set of concepts; c is a concept (c ∈ C), which can indicate anything, such as a work specification, function, behavior, strategy or reasoning process. R is the set of relationships, the interactions between concepts in the domain; formally, a relationship is defined as a subset of an n-dimensional Cartesian product, R : C1 × C2 × · · · × Cn; r is a relationship (r ∈ R), and the basic relationships are subclass-of, part-of, kind-of and attribute-of. F is a function, a special kind of relationship, formally F : C1 × C2 × · · · × Cn−1 → Cn; for example, Mother-of is a function, where Mother-of(x, y) means y is the mother of x. A is an axiom, representing a tautological assertion, such as "concept B belongs to the range of concept A". I is the set of instances, and i is an instance (i ∈ I). Fig. 1 shows an ontology sample.

Fig. 1 Simple ontology sample

2.2 Ontology heterogeneity

The development of the semantic web produces more and more ontologies. Different organizations define ontologies to serve their own applications. Because there is no unified standard for ontology construction, the contents and structures of these ontologies differ, and these differences have become an obstacle to knowledge sharing and reuse. Ontology heterogeneity widely exists in the semantic web and manifests in various ways. For example, when two concepts from two ontologies share the same semantics but are defined with different name labels, this is called name heterogeneity; conversely, if two concepts are defined with the same name label but their semantics are very different, this is semantic heterogeneity. In addition, the same parent class may have different numbers of sub-classes in different ontologies, namely structural heterogeneity. To achieve distributed information integration and knowledge reuse, we must find the semantic relationships between different ontologies by ontology mapping, and thereby resolve ontology heterogeneity.

2.3 Ontology mapping

Ontology mapping is a procedure that takes two independent ontologies as input and creates the interrelated semantic relationships of all the elements (concepts, relations, entities) in the two ontologies[15].

Definition 2. Ontology mapping
Ontology mapping is to find the semantic relationship between two ontologies; the mapping function[16] is expressed as Map({e1i, e2j}, O1, O2) = f. Given two ontologies O1 and O2, mapping from ontology O1 to O2 means that every entity in ontology O1 finds an appropriate entity in ontology O2, together with their corresponding relation. This corresponding relation is usually determined by the similarity between the two entities. O1 and O2 are called the source ontology and the target ontology, respectively. Here e1i ∈ O1, e2j ∈ O2, and {e1i} -Map-> {e2j}; they are all element sets. f is a kind of mapping type or null; when f is null, there is no mapping relationship between {e1i} and {e2j}.

3 Mapping strategies

The method in this paper uses various kinds of ontology information: concept labels, properties, instances, taxonomic structure and constraints, and designs a corresponding strategy for each. Before introducing each specific strategy, we first give the definition of the similarity matrix.

Definition 3. Similarity matrix Mk

To find the mapping relationship between ontologies O1 and O2, we use g kinds of strategies to compute their element similarity. The value range of each element is [0, 1]. The similarity matrix obtained with strategy k (1 ≤ k ≤ g) is Mk = (sim(e1i, e2j))n×m.

As shown in Table 1, e1i (1 ≤ i ≤ n) and e2j (1 ≤ j ≤ m) are elements in ontologies O1 and O2, respectively, and sim(e1i, e2j) is the similarity of e1i and e2j under strategy k.

Table 1 Similarity matrix Mk

        e21             · · ·   e2j             · · ·   e2m
e11     sim(e11, e21)   · · ·   sim(e11, e2j)   · · ·   sim(e11, e2m)
· · ·
e1i     sim(e1i, e21)   · · ·   sim(e1i, e2j)   · · ·   sim(e1i, e2m)
· · ·
e1n     sim(e1n, e21)   · · ·   sim(e1n, e2j)   · · ·   sim(e1n, e2m)

3.1 Concept label similarity computing strategy

Using concept label similarity to find mapping relationships is one of the most basic methods in mapping discovery, and there is a lot of research in this area, such as [17–19]. These methods mostly depend merely on the text of concepts and use string matching algorithms or edit distance to measure the similarity of two concept labels. Because they fail to take the semantic information of concept labels into account, these methods are not very effective.

This paper uses a concept label similarity computing method based on WordNet[20] and the Jaccard coefficient[21], in which both the semantic and the grammatical information of the labels are taken into consideration. Experiments show that this method is effective not only when elements have the same or partly the same names, but also when the names differ yet have some semantic relation.

WordNet is an English-based semantic vocabulary system. It organizes the English vocabulary into thesaurus synsets, which represent lexical concepts, and creates different pointers between concepts to express semantic relationships such as hyponymy, synonymy and antonymy.


Pantel and Lin[22] defined the similarity of two acceptations (word senses) according to WordNet:

simd(s1, s2) = 2 × log p(s) / (log p(s1) + log p(s2))        (1)

where s is the public parent node of s1 and s2 (see Definition 6), p(s) = count(s)/total is the proportion of the number of words contained in acceptation node s and its sub-nodes to the whole dictionary, and total is the total number of words in WordNet.

A concept label in an ontology is a vocabulary item and may correspond to multiple acceptations, so we give the definition of concept label acceptation.

Definition 4. Concept label acceptation s(ci)
For concept ci in ontology O, its label acceptation is the set of all its acceptations s in WordNet, i.e., s(ci) = {si | i = 1, 2, · · · , m}. The semantic similarity of two concept labels can be obtained from their concept label acceptations.

Definition 5. Concept label acceptation similarity
Given two ontologies O1 and O2, the semantic similarity of two concept labels c1i ∈ O1, c2j ∈ O2 is the average of all their concept label acceptation similarities in WordNet, i.e.,

WNSim(c1i, c2j) = ( Σ_{i=1}^{m} Σ_{j=1}^{n} sim(si, sj) ) / (m × n)        (2)

where m and n are the numbers of acceptations of c1i and c2j, respectively.

Definition 6. Public parent node of acceptations n(s)
Given acceptations si, sj, their public parent node is denoted n(s), n(s) = {s | s ∈ WordNet ∧ (s, si) ∈ H ∧ (s, sj) ∈ H}, where H is the hyponymy relation in WordNet.

The following Jaccard coefficient is used to compute the syntax similarity of two concept labels.

Definition 7. Concept label syntax similarity
Given two ontologies O1 and O2, the syntax similarity of two concept labels c1i ∈ O1, c2j ∈ O2 is the ratio of the number of shared characters to the total number of characters in them, i.e., JSSim(c1i, c2j) = (c1i · c2j) / (‖c1i‖² + ‖c2j‖² − c1i · c2j).

Definition 8. Concept label similarity
Given two ontologies O1 and O2, the similarity of two concept labels c1i ∈ O1, c2j ∈ O2 is the weighted average of their WordNet similarity and their Jaccard similarity, i.e., NLSim(c1i, c2j) = β·WNSim(c1i, c2j) + (1 − β)·JSSim(c1i, c2j), where β is a weight.

Algorithm 1. Concept label similarity computation

Input: two concept labels c1i ∈ O1, c2j ∈ O2, weight β, and maximum search depth MaxDep in WordNet
Output: matrix of concept label similarity MNL(i, j)
Begin
  If c1i = c2j   //* c1i and c2j are the same concept *//
    NLSim(c1i, c2j) = 1
  Else If c1i, c2j lie on the same node in WordNet
    NLSim(c1i, c2j) = 1
  Else
    backtrack at most MaxDep levels
    If n(s) does not exist   //* c1i and c2j have no public parent node *//
      NLSim(c1i, c2j) = 0
    Else
      For each c1i
        For each c2j
          NLSim(c1i, c2j) = β·WNSim(c1i, c2j) + (1 − β)·JSSim(c1i, c2j)
          MNL(i, j) = NLSim(c1i, c2j)
        EndFor
      EndFor
  Return MNL(i, j)
End
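For concreteness, the following Python sketch approximates this strategy, assuming NLTK with the WordNet corpus and the Brown information-content file are installed; NLTK's Lin measure implements (1), the Jaccard part here is computed over character sets, and the weight beta = 0.6 is an assumed value. It is an illustration under these assumptions, not the authors' implementation.

from nltk.corpus import wordnet as wn, wordnet_ic

ic = wordnet_ic.ic('ic-brown.dat')  # information content used by the Lin measure

def wn_sim(label1, label2):
    # WNSim (2): average pairwise Lin similarity over all noun senses
    s1 = wn.synsets(label1, pos=wn.NOUN)
    s2 = wn.synsets(label2, pos=wn.NOUN)
    if not s1 or not s2:
        return 0.0
    total = 0.0
    for a in s1:
        for b in s2:
            try:
                total += a.lin_similarity(b, ic)
            except Exception:  # pair has no shared information-content value
                pass
    return total / (len(s1) * len(s2))

def js_sim(label1, label2):
    # JSSim (Definition 7): Jaccard coefficient over character sets
    a, b = set(label1.lower()), set(label2.lower())
    return len(a & b) / len(a | b) if a | b else 0.0

def nl_sim(label1, label2, beta=0.6):
    # NLSim (Definition 8); beta is an assumed weight, not the paper's value
    if label1 == label2:
        return 1.0
    return beta * wn_sim(label1, label2) + (1 - beta) * js_sim(label1, label2)

print(nl_sim('car', 'automobile'))  # high: the two labels share a WordNet synset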

3.2 Property similarity computing strategy

Ontology formally expresses the objective world mainly through concepts, which are an important formalizing tool, and the relationships between concepts are reflected by properties. Within a specific range, properties connect individuals in one domain with those in another. A concept has many properties, such as subClassOf, equivalentClass, disjointWith, unionOf, complementOf, intersectionOf, oneOf, etc. Ontologies expressed in the formal language OWL have two kinds of properties: object properties and data properties. Object properties connect two classes; data properties connect classes with particular data types or RDF character types. Table 2 shows a simple ontology properties sample.

Ontology properties provide useful information for the mapping task. Both object properties and data properties exist in text form, so when finding mapping relationships our method uses text classification techniques. Text classification is a mature technique widely used in natural language processing, pattern recognition and knowledge discovery; specific details can be found in [23–25]. In our method, we use the same strategy to measure property and instance similarity; refer to Section 3.4 for more details.

Table 2 Ontology properties sample

Class name   Object property                                                      Data property
Disease      Complication (hasComplication) :: Type disease example               English name :: Type string
             Therapeutic (indicationsOf) :: Type medicine example                 Cause of disease :: Type string
             Doctor (expertiseOf) :: Type doctor example                          symptom :: Type string
Medicine     Adaptation disease (hasIndications) :: Type disease example          English name :: Type string
             Contraindication (hasContraindications) :: Type disease example
Doctor       Hospital (memberOf) :: Type hospital example                         English name :: Type string
             speciality (hasExpertise) :: Type disease example                    Other name :: Type string
                                                                                  Former name :: Type string
                                                                                  Doctor introduction :: Type string
Hospital     Doctor (memberOf) :: Type doctor example


3.3 Structure similarity computing strategy

Ontology structure provides potential semantic information for mapping discovery. In an ontology structure, every concept is a refinement of its parent node, and every child node inherits information from its parent node. For concepts from two different ontologies, their mapping relationship is affected by the semantic relations of the neighboring nodes at their level; that is to say, in the ontology mapping process, the semantic relations of nodes are transitive. The semantic relations of parent and child nodes exert an influence on concept nodes: if there is a mapping relationship between their parent nodes and between their child nodes, the concepts themselves have a mapping relationship. In other words, if they have the same or similar contexts, they may have a mapping relationship.

So, the structure similarity of two concepts can be found from the information of their nearest parent nodes and their nearest child nodes. The related definition and algorithm follow.

Definition 9. Nearest neighbor level concept
In ontology O, the direct parent nodes and direct child nodes of concept c ∈ C are defined as the nearest neighbor level concepts of c, denoted IM(c): IM(c) = {ci | ci ∈ O ∧ (parent(c, ci) ∨ son(c, ci))}.

Algorithm 2. Structure similarity computation
Input: concepts c1i ∈ O1, c2j ∈ O2
Output: matrix of structure similarity MST(i, j)
Begin
  Get the nearest neighbor level concepts IM(c1i), IM(c2j) of concepts c1i, c2j
  For each c1i
    For each c2j
      STSim(c1i, c2j) = ( Σ_{k=1}^{m} Σ_{l=1}^{n} NLSim(IMk(c1i), IMl(c2j)) ) / (m × n)
      MST(i, j) = STSim(c1i, c2j)
    EndFor
  EndFor
  Return MST(i, j)
End

where m and n are the numbers of nearest neighbor level concepts of c1i and c2j, respectively.
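A minimal sketch of Algorithm 2, assuming each ontology is given as a dict that maps a concept to its direct parents and children, and that label_sim is a label similarity function such as NLSim from Section 3.1; the data representation is a hypothetical choice, not the paper's:

from typing import Callable, Dict, List

Ontology = Dict[str, Dict[str, List[str]]]  # assumed adjacency representation

def neighbors(onto: Ontology, c: str) -> List[str]:
    # IM(c): direct parent and child concepts of c (Definition 9)
    return onto[c]['parents'] + onto[c]['children']

def st_sim(o1: Ontology, o2: Ontology, c1: str, c2: str,
           label_sim: Callable[[str, str], float]) -> float:
    # average label similarity over all pairs of nearest neighbor level concepts
    im1, im2 = neighbors(o1, c1), neighbors(o2, c2)
    if not im1 or not im2:
        return 0.0
    return sum(label_sim(a, b) for a in im1 for b in im2) / (len(im1) * len(im2))

# toy usage with an exact-match stand-in for the label similarity
o1 = {'Car': {'parents': ['Vehicle'], 'children': ['Sedan']}}
o2 = {'Auto': {'parents': ['Vehicle'], 'children': ['Sedan']}}
print(st_sim(o1, o2, 'Car', 'Auto', lambda a, b: float(a == b)))  # 1.0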

3.4 Instance similarity computing strategy

In an ontology, every concept includes extension information, namely its instances. Usually, every instance includes a name and documents related to it. We use the words and their frequency information in the corresponding "text" of instances to find mapping relationships. We regard the names and documents related to an instance as text content; each instance is thereby associated with one text content, so we can create a text set for it. Using the vector space model (VSM), we treat this text set as a document in the VSM. Every element in the instance text set is regarded as a feature term, and the weight of each feature term is obtained by computing its frequency information, specifically with the tf-idf method[26]. In this way, we get a vector expression of the text sets, and the similarity between instances can be described by the similarity between two text vectors. To keep the results normalized, when measuring the similarity between two text vectors we compute the cosine of the angle

between the two text vectors as follows:

sim(di, dj) = ( Σ_{k=1}^{n} wik × wjk ) / sqrt( (Σ_{k=1}^{n} wik²) (Σ_{k=1}^{n} wjk²) ).        (3)

Thus, we transform instance similarity computation into text similarity discovery. Finally, the instance similarity matrix MIS(i, j) is obtained.
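A minimal sketch of this strategy using scikit-learn, where each instance's name and related documents have already been merged into one text string; the texts below are made-up placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts_o1 = ['aspirin analgesic tablet pain relief']        # instance texts of O1
texts_o2 = ['aspirin pain killer tablet', 'stethoscope']   # instance texts of O2

vec = TfidfVectorizer()                        # tf-idf weights for feature terms
tfidf = vec.fit_transform(texts_o1 + texts_o2)

# MIS(i, j): cosine (3) between instance i of O1 and instance j of O2
m_is = cosine_similarity(tfidf[:len(texts_o1)], tfidf[len(texts_o1):])
print(m_is)  # row 0 is high for the aspirin text, near 0 for the other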

4 SVM-based mapping discovery

How to combine the similarity results produced by the multiple strategies is the key step of mapping discovery. Unlike existing mapping discovery strategies, such as the Dempster-Shafer evidence theory used in DSSim[27], the relaxation labeling selection strategy used in GLUE[28], and the predicted maximum selection strategy[29], we use the support vector machine (SVM) method.

4.1 SVM theory

The SVM method is a classification technique based on statistical learning and structural risk minimization theory[30]. Its core principle is to use a preselected kernel function to map the input vectors into a higher dimensional space, and then construct the optimal separating hyperplane in this space to maximize the margin of classification. SVM can solve practical problems like nonlinearity, high dimensionality and local minimum points, and it has many applications in the fields of pattern recognition, regression estimation, probability density function estimation, etc.[31−35]

Our approach views matching discovery as binary classification in a vector space: there are two kinds of points in the similarity vector space, those for which a matching relation is present and those for which it is not, and the key of matching discovery is how to classify these two kinds of points reasonably. Since SVM has good generalization ability and can obtain the best compromise between model complexity and learning capability when solving small-sample and nonlinear problems, it is used here to carry out the task of matching discovery.

4.2 SVM-based mapping discovery strategy

Before using SVM to discover mapping pairs, our approach combines the different similarities into one whole, a similarity cube. By a cutting operation on this similarity cube, similarity vectors are obtained; these similarity vectors are the objects processed by SVM-based mapping discovery. The related definitions and algorithm are given below.

Definition 10. Similarity cube
The similarity cube of two ontologies O1, O2 is a triple SC(O1, O2) = (D, M, F), where D = {D1, D2, · · · , Dn} is the dimensionality set of the similarity cube, Di = {t1, t2, · · · , tm} is one of the dimensionalities, tm is a member of dimensionality Di, and m is the total number of members in Di; M = {M1, M2, · · · , Mk} are the facts in the similarity cube, a numerical measure denoting the set of similarity matrices obtained by using the g kinds of strategies; F = {F1(M), · · · , Fs(M)} is the aggregate function over the facts measure.

In our realization, the dimensionality set is D = {D1, D2, D3}, where dimensionality D1 = E1 = {e11, e12, · · · , e1n} comprises all elements in ontology O1; dimensionality D2 = E2 = {e21, e22, · · · , e2m} comprises all elements in ontology O2; and dimensionality D3 = S = {S1, S2, · · · , Sg} comprises the similarity computing strategies S1, S2, · · · , Sg used in this method. The aggregate function F combines the similarity matrices M1, M2, · · · , Mg obtained by the multiple strategies:

F = ∪_{k=1}^{g} Mk.        (4)

Each dimensionality junction is the numerical measure M^k_ij of ontology element similarity under the multiple strategies, so we obtain a three-dimensional similarity cube, as shown in Fig. 2.

Fig. 2 Similarity cube

Definition 11. Similarity cube cutting
For a given similarity cube SC(O1, O2), cutting is the process of choosing a single dimension member, and the corresponding facts measure, in two or more of its dimensionalities. For example, if three dimensionalities Di, Dj and Dk are selected in a similarity cube, their members ti, tj and tk (ti ∈ Di, tj ∈ Dj, tk ∈ Dk) are chosen separately to perform the cutting action. The result of this action is called a sub-block (ti, tj, tk, M^k_ij) of the similarity cube. The similarity vector of two ontologies can be obtained by cutting the similarity cube according to specific conditions.

Definition 12. Similarity vector
For ontology elements e1i, e2j (e1i ∈ O1, e2j ∈ O2), the cutting operation is performed according to the given conditions (E1 = e1i), (E2 = e2j) and S. A sub-cube of the similarity cube is obtained, which is called the similarity vector. It is expressed as Xij = (M^1_ij, M^2_ij, · · · , M^g_ij) (M^d_ij ∈ Md), where M^d_ij is one feature similarity of e1i and e2j, and n (here equal to g) is the total number of features. Rn denotes the set of similarity vectors.

All the similarity information of two ontologies is expressed by the similarity cube, and cutting is an analysis action on this cube. By cutting, similarity vectors can be obtained, each showing the similarity of two entities from different ontologies under the multiple measurement strategies. All of these similarity vectors together make up a similarity vector space, and every similarity vector is a point in this space. Each point admits two possible mapping results, match or no match; correspondingly, similarity vectors can be labeled "+" and "−", respectively. The label of a vector cannot be determined directly. Therefore, our approach uses SVM to complete the mapping procedure, which means transforming mapping discovery into binary classification of a given data set by constructing the optimal separating hyperplane of the similarity vector space. In other words, a number of similarity vectors are preselected as sample points; with these points an optimal separating hyperplane is constructed so that the distance between the two classes of points is maximized, and the two classes of points are labeled "+" and "−". Using the classifier, the labels of the remaining vectors can be obtained. The element pairs whose vectors are labeled "+" are chosen as mapping results, and the mapping procedure completes.
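A minimal sketch of the cube and the cutting operation with NumPy: the g strategy matrices are stacked along a third axis, and a cut at (E1 = e1i, E2 = e2j) yields the similarity vector Xij. The matrices here are random placeholders for the strategy outputs of Section 3.

import numpy as np

n, m, g = 4, 5, 3                      # |O1| elements, |O2| elements, strategies
rng = np.random.default_rng(0)
matrices = [rng.random((n, m)) for _ in range(g)]   # stand-ins for M1, ..., Mg

cube = np.stack(matrices, axis=2)      # similarity cube, shape (n, m, g)

i, j = 1, 2
x_ij = cube[i, j, :]                   # cutting: similarity vector of (e1i, e2j)
print(x_ij.shape)                      # (g,), one feature per strategy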

Algorithm 3. Mapping result discovery
Input: training set T, unlabeled similar feature vectors, penalty factor C
Output: signed values of the similar feature vectors, mapping result pairs
Begin
  //* classifier construction process *//
  do
    Given training set T = {(x^1_ij, y1), · · · , (x^l_ij, yl)} ∈ (Rn × Y)^l, where xij ∈ Rn, yi ∈ Y = {−1, 1}, i = 1, · · · , l;
    Select kernel function K(x, x′) and αi (0 ≤ αi ≤ C);
    Solve
      min_α (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} αi αj yi yj (K(xi, xj) + 1) − Σ_{i=1}^{l} αi
    and get α* = (α*1, · · · , α*l)T;
    Get the discriminant function
      f(x) = sgn( Σ_{i=1}^{l} α*i yi (K(x, xi) + 1) )
  until the optimal separating hyperplane is obtained.
  //* mapping discovery with the classifier *//
  Input an unlabeled similar feature vector xij (xij ∈ Rn), and get its label value yi;
  If yi = 1, select the element pair corresponding to the similarity vector xij and insert it into the mapping result set;
End
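A minimal sketch of Algorithm 3 with scikit-learn's SVC standing in for the classifier construction step; the cube, the labeled training pairs and the parameter values are placeholders, not the authors' data:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
cube = rng.random((4, 5, 3))           # similarity cube from the previous sketch

# a few pre-labeled similarity vectors: +1 = match, -1 = no match
train_idx = [(0, 0), (1, 1), (0, 4), (2, 3)]
X_train = np.array([cube[i, j, :] for i, j in train_idx])
y_train = np.array([1, 1, -1, -1])

clf = SVC(kernel='rbf', C=1.0)         # penalty factor C as in Algorithm 3
clf.fit(X_train, y_train)

# classify every remaining pair; pairs labeled +1 become mapping results
mappings = [(i, j) for i in range(4) for j in range(5)
            if (i, j) not in train_idx
            and clf.predict(cube[i, j, :].reshape(1, -1))[0] == 1]
print(mappings)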

5 Mapping process

Given two ontologies, ontology mapping finds the semantic relationships between them by proper operations and processing. During mapping discovery, comprehensive ontology information must be taken into account to obtain high quality results. Fig. 3 shows the mapping architecture used in this paper. The input is two heterogeneous ontologies, and the task of the mapping process is to create the mapping relationships from the source ontology to the target ontology. It is an iterative process, and each iteration includes five major steps.

Fig. 3 Mapping architecture

1) Preprocessing
The input is two ontologies O1 and O2, both of which can be described in the OWL language. After input, the ontologies should be preprocessed; reasonable preprocessing can improve the precision of ontology element similarity computation. Preprocessing in this paper has two parts, element picking up and element processing. The former picks up all the information about the ontology (concepts, properties, relations and instances), which is then processed. The processing covers analyzing the textual vocabulary of the picked-up information and handling numbers, hyphens, punctuation, capital and small letters, and suffixes and prefixes. Apart from these, it uses a stop-word list to remove stop-words, extracts stems, restores vocabulary variants, etc. Accordingly, the words and phrases which express the semantic information of the ontology are obtained; they are the input of the later similarity computation. For example, the concept name "2-Wheel Drive" is preprocessed as follows: first we convert the numbers, hyphens, and capital and small letters, then perform word segmentation, and get {two, wheel, drive}; the property name "has vertebra" is processed similarly, giving {has, vertebra}. A minimal sketch of this step appears below.
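The sketch assumes an illustrative digit map and stop-word list; neither is specified in the paper.

import re

DIGITS = {'2': 'two', '3': 'three', '4': 'four'}  # illustrative; extend as needed
STOP_WORDS = {'the', 'of', 'a'}                   # illustrative stop-word list

def preprocess(label: str) -> list:
    # break camelCase, then split on hyphens, underscores and spaces
    label = re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', label)
    tokens = re.split(r'[-_\s]+', label.lower())
    tokens = [DIGITS.get(t, t) for t in tokens if t]
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess('2-Wheel Drive'))  # ['two', 'wheel', 'drive']
print(preprocess('has vertebra'))   # ['has', 'vertebra']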

2) Similarity computation
To find the semantic mapping relationships between the two ontologies, all available information should be used for mapping discovery. This paper measures the similarity of concept labels, properties, instances, taxonomic structure and constraints separately. The resulting similarity matrices are the operational objects of the next step.

3) Similarity cube cutting
The multi-strategy similarity results obtained in the similarity computation step are aggregated into the similarity cube, which expresses all the similarity information between the two ontologies. According to preset conditions, we perform the cutting operation and pick up the similarity vectors. These vectors represent all the information of the elements in the two ontologies.

4) SVM-based mapping discovery
Mapping discovery proceeds based on the similarity of every independent strategy and the predicted label value of the merged similarity vector.

5) Mapping iteration
The whole mapping procedure is iterative. Each iteration includes all the steps above except preprocessing, and iteration continues until no new mappings are found.

In contrast, using a single method can achieve high mapping efficiency, but the mapping quality is often lower than ideal. Our approach is an iterative process in which a variety of strategies are used to calculate similarity. After each iteration, new mapped entity pairs may be found, and the next iteration is carried out only on the ontology with these entity pairs removed; therefore, each iteration is faster than the previous one. Iteration stops when no new mapping entities can be found (see the sketch below). In this way, the approach ensures recall while improving the quality of the mapping results.
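A minimal sketch of this iteration, where find_mappings stands in for steps 2)-4) applied to the not-yet-mapped elements; its interface is an assumption:

def iterative_mapping(elems1: set, elems2: set, find_mappings):
    # find_mappings(elems1, elems2) -> set of (e1, e2) pairs; assumed interface
    result = set()
    while True:
        new_pairs = find_mappings(elems1, elems2)
        if not new_pairs:
            break                            # no new mappings: iteration stops
        result |= new_pairs
        elems1 -= {a for a, _ in new_pairs}  # remove mapped elements, so each
        elems2 -= {b for _, b in new_pairs}  # later iteration works on less data
    return result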

6 Experiments and evaluation

6.1 Experimental dataset

The method proposed in this paper is verified with datasets from the ontology alignment evaluation initiative (OAEI), which can be obtained from the OAEI website[36]. The dataset "Benchmark" contains a reference ontology, comprising 33 concepts, 59 properties and 96 instances, together with 25 other ontologies with differing elements; the mapping task is to create the mapping relationships from the 25 ontologies to the target (reference) ontology. We also use two subsets of the dataset "Adult Mouse Anatomy": the mouse anatomy and nci anatomy ontologies. The former includes 2763 concepts with a maximum depth of 5; the latter includes 6514 concepts with a maximum depth of 12. The dataset "Directory" is derived from the web page directories of Google and Yahoo; it includes 2265 ontology document pairs, where every ontology document is made up of one class-level tree, and every node mapping task is made up of a group of routes through the root node in the web page directories. We pick up the elements in the directory "Health" of each website, where the source ontology google.owl includes 260 concepts and the target ontology yahoo.owl includes 274 concepts.

6.2 Experimental evaluation standard

In order to evaluate our approach and compare it with others, three criteria are adopted: precision, recall and overall. Suppose the mapping result set produced by the system is W = {(x1, y1), (x2, y2), · · · , (xn, yn)} and the factual mapping result set is V = {(x1, y1), (x2, y2), · · · , (xm, ym)}. Then c = |W ∩ V| is the number of correct mapping results, n − c is the number of wrong results produced by the mapping system, and m − c is the number of missed results. Accordingly, define

Precision = c / n,        Recall = c / m.

To evaluate the mapping results objectively, overall must be used to reflect the interaction between recall and precision:

Overall = 1 − ((n − c) + (m − c)) / m = Recall × (2 − 1/Precision).
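A minimal sketch of the three measures, with W the system's mapping set and V the factual mapping set:

def evaluate(w: set, v: set):
    c = len(w & v)                         # number of correct mapping results
    precision = c / len(w) if w else 0.0   # c / n
    recall = c / len(v) if v else 0.0      # c / m
    overall = 1 - ((len(w) - c) + (len(v) - c)) / len(v) if v else 0.0
    return precision, recall, overall

print(evaluate({('a', 'x'), ('b', 'y')}, {('a', 'x'), ('c', 'z')}))
# (0.5, 0.5, 0.0); note 0.0 = Recall × (2 − 1/Precision)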


6.3 Experimental results and analysis

The experimental results are given in Tables 3–5. Table 3 shows the experimental results on Benchmark, Table 4 shows those on Anatomy and Directory, and Table 5 shows a comparison between our method and other mapping systems.

It can be seen clearly that our method performs better in these mapping tasks than the other experiments. The precision on dataset Benchmark is between 55% and 94% and the recall is between 62% and 94%, which shows that the mapping method proposed in this paper is effective; the precision on datasets Anatomy and Directory is 93% and 89%, and the recall is 86% and 84%, respectively, which is a better result.

In addition, it can be seen from the experimental results that the precision of the method in this paper is obviously higher than that of GLUE and basically the same as that of ASMOV, while it is only slightly lower than that of DSSim.

GLUE depends heavily on plentiful ontology entities and is very sensitive to the data: when the number of ontology entities is very large it works well, but when entities are limited the results are not good. In contrast, our approach uses all the information about the ontology to find mapping relationships; it does not depend on only one aspect of the ontology, so it obtains better results. Compared with the ASMOV and OntoDNA systems, both recall and precision are a little higher because our method uses SVM for mapping result discovery, which means that using SVM to combine the multi-strategy similarities is satisfactory. DSSim uses multiple strategies to compute the similarity of ontology elements and applies evidence theory to combine the similarity results during the mapping discovery procedure; it works better than our method on 1:n mapping tasks. In practice, the method in this paper can adjust the weight β and the penalty factor C according to the situation, which can further improve the mapping accuracy.

Table 3 Experimental results on Benchmark

Dataset      Mapping task        Precision   Recall   Overall
Benchmark    101 to reference    0.86        0.85     0.71
             102 to reference    0.86        0.85     0.71
             103 to reference    0.86        0.85     0.71
             104 to reference    0.91        0.91     0.82
             201 to reference    0.94        0.92     0.86
             202 to reference    0.93        0.92     0.85
             203 to reference    0.56        0.62     0.31
             204 to reference    0.93        0.91     0.84
             205 to reference    0.74        0.74     0.48
             206 to reference    0.89        0.86     0.75
             207 to reference    0.93        0.87     0.80
             208 to reference    0.89        0.90     0.78
             209 to reference    0.91        0.93     0.91
             210 to reference    0.89        0.91     0.81
             221 to reference    0.85        0.89     0.73
             222 to reference    0.93        0.91     0.84
             223 to reference    0.94        0.92     0.87
             224 to reference    0.93        0.92     0.85
             225 to reference    0.92        0.94     0.86
             226 to reference    0.93        0.94     0.90
             301 to reference    0.55        0.67     0.18
             302 to reference    0.60        0.62     0.16
             303 to reference    0.85        0.73     0.61
             304 to reference    0.65        0.80     0.37

Table 4 Experimental results on Anatomy and Directory

Dataset     Mapping task                     Precision   Recall   Overall
Anatomy     mouse anatomy to nci anatomy     0.93        0.86     0.79
Directory   Google to Yahoo                  0.89        0.84     0.73

Table 5 Comparison with other mapping systems

            GLUE    DSSim   ASMOV   OntoDNA   Our method
Precision   0.81    0.91    0.88    0.85      0.89
Recall      0.77    0.87    0.89    0.85      0.87


7 Conclusions and future works

This paper proposes a new ontology mapping method, which achieves multi-strategy mapping and combines the similarity results provided by all the strategies into a similarity cube. By cutting the similarity cube, the method picks up similarity vectors, and then uses a support vector machine to perform binary classification on the members of the similarity vector space in order to complete the mapping discovery task. Experiments show that the method is not only effective but also applicable to large scale ontology mapping tasks. Aside from striving to improve the accuracy of our approach, in the future we intend to extend the method to process m:n mappings.

References

[1] T. Berners-Lee, J. Hendler, O. Lassila. The semantic web. Scientific American, vol. 284, no. 5, pp. 34–43, 2001.

[2] Amino Acid Ontology v1.3, [Online], Available: http://www.co-ode.org/ontologies/amino-acid/2009/02/16/, March 4, 2012.

[3] Customer Complaint Ontology, [Online], Available:http://www.jarrar.info/CContology/, March 4, 2012.

[4] Search or Browse the Plant Ontology Database, [Online],Available: http://www.plantontology.org/, March 4, 2012.

[5] Gellish, [Online], Available: http://sourceforge.net/apps/trac/gellish/, March 4, 2012.

[6] The NeOn project and the NeOn Foundation, [On-line], Available: http://aims.fao.org/website/NeON/sub2,March 4, 2012.

[7] B. T. Le, R. Dieng-Kuntz, F. Gandon. On ontology matching problems for building a corporate semantic web in a multi-communities organization. In Proceedings of the 6th International Conference on Enterprise Information Systems, PubZone, Porto, Portugal, pp. 236–243, 2004.

[8] J. Euzenat, P. Valtchev. Similarity-based ontology alignment for OWL-Lite. In Proceedings of the European Conference on Artificial Intelligence, pp. 333–337, 2004. [Online], Available: http://disi.unitn.it/ accord/RelatedWork/Matching/align-ECAI04-FSub.pdf, March 4, 2012.

[9] S. Castano, A. Ferrara, S. Montanelli. Matching ontologies in open networked systems: Techniques and applications. Journal on Data Semantics, vol. 3870, pp. 25–63, 2006.

[10] H. Do, E. Rahm. COMA: A system for flexible combination of schema matching approaches. In Proceedings of the 28th International Conference on Very Large Data Bases, PubZone, Hong Kong, PRC, pp. 610–621, 2002.

[11] A. H. Doan, P. Domingos, A. Y. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, ACM, New York, USA, pp. 509–520, 2001.

[12] T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, vol. 5, no. 2, pp. 199–220, 1993.

[13] R. Studer, V. R. Benjamins, D. Fensel. Knowledge engineering: Principles and methods. Data and Knowledge Engineering, vol. 25, no. 1–2, pp. 161–197, 1998.

[14] A. G. Perez, V. R. Benjamins. Overview of knowledge sharing and reuse components: Ontologies and problem solving methods. In Proceedings of the IJCAI'99 Workshop on Ontologies and Problem-solving Methods, pp. 1-1–1-15, 1999.

[15] E. Rahm, P. A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, vol. 10, no. 4, pp. 334–350, 2001.

[16] Z. J. Wang, Y. L. Wang, S. S. Zhang. Effective large scale ontology mapping. In Proceedings of the 1st International Conference on Knowledge Science, Engineering and Management, Springer, Guilin, PRC, pp. 454–465, 2006.

[17] D. Thau, S. Bowers, B. Ludascher. Merging sets of taxonomically organized data using concept mappings under uncertainty. In Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on the Move to Meaningful Internet Systems: Part II, ACM, Berlin, Germany, pp. 1103–1120, 2009.

[18] P. Bouquet, J. Euzenat, E. Franconi, L. Serafini, G. Stamou, S. Tessaris. Specification of a Common Framework for Characterizing Alignment, Technical Report, University of Karlsruhe, Germany, 2004.

[19] A. Locoro, V. Mascardi. A correspondence repair algorithm based on word sense disambiguation and upper ontologies. In Proceedings of KEOD, 2009. [Online], Available: http://www.disi.unige.it/person/LocoroA/download/LocoroMascardiKeod2009.pdf, March 4, 2012.

[20] WordNet, [Online], Available: http://wordnet.princeton.edu/, March 4, 2012.

[21] A. E. Monge, C. P. Elkan. The field matching problem: Algorithms and applications. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, AAAI, pp. 267–270, 1996.

[22] P. Pantel, D. Lin. Discovering word senses from text. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Edmonton, Canada, pp. 613–619, 2002.

[23] A. Genkin, D. D. Lewis, D. Madigan. Large-scale Bayesian logistic regression for text categorization. Technometrics, vol. 49, no. 3, pp. 291–304, 2007.

[24] A. N. Srivastava, M. Sahami (eds.). Text Mining: Classification, Clustering, and Applications, Boca Raton, USA: CRC Press, 2009.

[25] H. Kim, P. Howland, H. Park. Dimension reduction in text classification with support vector machines. Journal of Machine Learning Research, vol. 6, pp. 37–53, 2005.

[26] F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, vol. 34, no. 1, pp. 1–47, 2002.

[27] M. Nagy, M. Vargas-Vera, E. Motta. DSSim — Managing uncertainty on the semantic web. In Proceedings of the 2nd International Workshop on Ontology Matching, 2007. [Online], Available: http://wenku.baidu.com/view/3916881052d380eb62946d7a.html, March 4, 2012.

[28] A. Doan, J. Madhavan, P. Domingos, A. Halevy. Learning to map between ontologies on the semantic web. In Proceedings of the 11th World Wide Web Conference, pp. 662–673, 2002. [Online], Available: http://wenku.baidu.com/view/23e2f1000740be1e650e9ae3.html, March 4, 2012.

[29] N. F. Noy, M. A. Musen. Algorithm and tool for automatedontology merging and alignment. In Proceedings of the2000 National Conference on Artificial Intelligence, AAAI,Austin, USA, pp. 450–455, 2000.

[30] V. N. Vapnik. The Nature of Statistical Learning Theory,2nd ed., New York, USA: Springer Verlag, 1999.


[31] E. Osuna, R. Freund, F. Girosi. Training support vector machines: An application to face detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, San Juan, Puerto Rico, pp. 130–136, 1997.

[32] C. Y. Lu, P. F. Yan, C. S. Zhang, J. Zhou. Face recognition using support vector machine. In Proceedings of ICNNB′98, Beijing, PRC, pp. 652–655, 1998.

[33] V. Blanz, B. Schölkopf, H. Bülthoff, C. Burges, V. Vapnik, T. Vetter. Comparison of view-based object recognition algorithms using realistic 3D models. [Online], Available: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/pdfs/pdf445.pdf, March 4, 2012.

[34] A. Alias, B. Subramanian, S. Pramala, B. Rajalakshmi, R. Rajaram. Improving decision tree performance by exception handling. International Journal of Automation and Computing, vol. 7, no. 3, pp. 372–380, 2010.

[35] X. H. Huang, X. J. Zeng, M. Wang. SVM-based identification and un-calibrated visual servoing for micro-manipulation. International Journal of Automation and Computing, vol. 7, no. 1, pp. 47–54, 2010.

[36] Ontology Alignment Evaluation Initiative, [Online], Avail-able: http://oaei.ontologymatching.org/2009/, March 4,2012.

Lei Liu received his B.Sc. and M.Sc. degrees in computer software and theory from Jilin University, PRC in 1982 and 1985, respectively. Currently, he is a professor and doctoral supervisor in the Department of Computer Science at Jilin University, PRC. He received research awards from the Science Foundation and the Specialized Research Foundation for the Doctoral Program of Higher Education of China in 2006 and 2009, respectively.

His research interests include computer software theory, semantic web, formal methods and compiler theory.

E-mail: [email protected]

Feng Yang received her B.Sc. and M.Sc. degrees in computer software and theory from Northeast Normal University, PRC in 1998 and 2006, respectively. Currently, she is a Ph.D. candidate in computer software and theory at Jilin University, PRC. Since 1998, she has been a faculty member at the Department of Information, Jilin Teachers′ Institute of Engineering and Technology.

Her research interests include ontology engineering, data mining, and control theory.

E-mail: [email protected]

Peng Zhang graduated from the College of Computer Science and Technology (CCST), Jilin University (JLU), PRC in 2009. Currently, he is a doctoral candidate at CCST, JLU.

His research interests include semantic web and ontology engineering.

E-mail: [email protected]

Jing-Yi Wu graduated from the College of Computer Science and Technology (CCST), Jilin University (JLU), PRC in 2009. Currently, she is pursuing her master degree at JLU.

Her research interests include ontology mapping and software engineering.

E-mail: [email protected]

Liang Hu received his B.Sc. degree from Harbin Institute of Technology in 1990, and his M.Sc. and Ph.D. degrees in computer software and theory from Jilin University, PRC in 1993 and 1999, respectively. Currently, he is a professor and doctoral supervisor in the Department of Computer Science at Jilin University, PRC.

His research interests include computer networks and information security.

E-mail: [email protected] (Corresponding author)


Top Related