SVM-based ontology matching approach


Post on 25-Aug-2016






  • International Journal of Automation and Computing 9(3), June 2012, 306-314

    DOI: 10.1007/s11633-012-0649-x

    SVM-based Ontology Matching Approach

    Lei Liu1, Feng Yang1,2, Peng Zhang1, Jing-Yi Wu1, Liang Hu1

    1 College of Computer Science and Technology, Jilin University, Changchun 130012, PRC

    2 Department of Information, Jilin Teachers Institute of Engineering and Technology, Changchun 130052, PRC

    Abstract: The semantic web contains many heterogeneous ontologies, and the task of ontology mapping is to find the semantic relationships between them. Current multi-strategy ontology mapping methods integrate their strategies by simply combining similarity values. They do not take semantic information into account and require a lot of manual intervention, so some true mapping relations are missed. To address this issue, this paper puts forward an ontology matching approach that uses a multi-strategy mapping technique to compute similarity iteratively and explores both linguistic and structural similarity. Our approach gathers the different similarities into one whole, a similarity cube. A cutting operation yields similarity vectors, which form the similarity space, and in this way mapping discovery is converted into binary classification. A support vector machine (SVM) has good generalization ability and can achieve the best compromise between model complexity and learning capability when solving small-sample and nonlinear problems, which is why we employ SVM in our approach. Our implementation makes full use of the information in the ontologies, and experimental results on a common dataset demonstrate the effectiveness of the mapping approach. It ensures the recall ratio while improving the quality of mapping results.

    Keywords: Semantic web, ontology engineering, ontology mapping, similarity cube, support vector machine (SVM).

    1 Introduction

    Berners-Lee et al.[1] proposed the concept of the semantic web in 1998. The semantic web enables computers to understand knowledge on the world wide web and achieves semantic interoperation between information systems. An ontology describes the semantic information of documents in the semantic web and provides a universal standard for knowledge representation. In recent years, research on ontologies has been very active. As this research progresses, many different service-oriented ontologies have appeared, such as the amino acid ontology[2], CContology[3], the plant ontology[4], and ontologies transformed from general knowledge libraries and thesauri[5, 6]. These ontologies differ in definition, application and service, so their intrinsic structure and contents are different. To solve the heterogeneity problems in the ontology domain, we need to find the semantic mapping relationships between different ontologies. Ontology mapping has already become an effective means of combining distributed heterogeneous ontologies, and it serves many ontology engineering tasks, such as ontology integration, ontology merging, and ontology alignment.

    At present, there is a lot of research on ontology mapping methods, and researchers have developed many kinds of mapping discovery systems, such as ASCO[7], OLA[8], and HMatch[9]. Most of these systems use multiple strategies to find the similarity between ontology elements, and then derive suitable mapping results by integrating the different strategies. Integrated approaches use the weighted mean of multi-strategy similarities and combined methods, such as hybrid and composite[10, 11]. These methods have made some achievements, but still have many deficiencies. First, the combined methods used by these systems are simple combinations of multi-strategy similarities. They do not reflect, at the semantic level, the influence that the strategies themselves have on the final mapping result, which leads to mappings of low quality. Second, these methods, such as the weighted-average method, need a lot of manual intervention. Because of the excessive intervention by users and domain experts, some of the mapping relationships are missed. Moreover, such approaches cannot effectively process large-scale ontology mapping tasks.

    Manuscript received August 17, 2010; revised December 2, 2010

    This work was supported by National Natural Science Foundation of China (No. 60873044), Science and Technology Research of the Department of Jilin Education (Nos. 2009498, 2011394), and Opening Fund of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial Colleges at Zhejiang Normal University of China (No. ZSDZZZZXK11).

    To solve these problems, this paper presents a new method for ontology mapping. The method jointly uses Wordnet and the Jaccard coefficient to compute the similarity of concept labels, and uses neighbor-level concepts to capture the structural similarity of ontology elements, while the similarity of ontology instances and entities is found using vector space model methods. The method also introduces a new approach to combining multi-strategy similarity results: it proposes the concept of a similarity cube and extracts similarity vectors from it by stripping and slicing. On top of these operations, the method applies a mapping discovery strategy based on the support vector machine (SVM), so that the mapping discovery problem is transformed into a binary classification problem in a vector space. Experiments show that the method is not only effective but can also be applied to large-scale ontology mapping tasks.
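The cube-and-slice idea can be illustrated with a minimal sketch. It assumes that each strategy produces one n x m similarity matrix, that the cube is simply the stack of those matrices, and that a slice at position (i, j) gives one similarity vector with one component per strategy. The function names `build_cube` and `slice_vectors` and the sample values are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_cube(matrices: List[Matrix]) -> List[Matrix]:
    """Stack g similarity matrices (one per strategy) into a cube.

    Assumption: every matrix has the same n x m shape.
    """
    if not matrices:
        raise ValueError("at least one similarity matrix is required")
    n, m = len(matrices[0]), len(matrices[0][0])
    for mat in matrices:
        if len(mat) != n or any(len(row) != m for row in mat):
            raise ValueError("all similarity matrices must be n x m")
    return matrices


def slice_vectors(cube: List[Matrix]) -> Dict[Tuple[int, int], List[float]]:
    """Cut the cube along the strategy axis.

    Each element pair (i, j) yields one g-dimensional similarity vector.
    """
    n, m = len(cube[0]), len(cube[0][0])
    return {
        (i, j): [mat[i][j] for mat in cube]
        for i in range(n)
        for j in range(m)
    }


# Two hypothetical strategies over 2 source and 2 target elements.
label_sim = [[1.0, 0.2], [0.1, 0.8]]
struct_sim = [[0.9, 0.3], [0.0, 0.7]]
vectors = slice_vectors(build_cube([label_sim, struct_sim]))
```

Each vector in `vectors` could then be labeled as "mapping" or "no mapping" and passed to a binary classifier such as an SVM; the classifier itself is left out of this sketch.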

    2 Ontology and ontology mapping

    2.1 Ontology

    The concept of ontology originates from philosophy, where an ontology is a systematic description of objective existence, concerned with the abstract entities of objective reality. In the domains of artificial intelligence, knowledge systems and information systems, an ontology is a formal, clear and detailed specification of a shared conceptual system[12]. An ontology provides a shared vocabulary. It includes the object types, concepts, attributes and relationships that exist in a particular domain[13]. Perez and Benjamins use taxonomic relationships to organize an ontology, summarize five basic modeling primitives[14], and propose the following definition of ontology.

    Definition 1. Ontology

    An ontology is $O = \{C, R, F, A, I\}$, where:

    $C$ is the class, or set of concepts, and $c$ is a concept ($c \in C$). A concept can denote anything, such as a work specification, function, behavior, strategy or reasoning process.

    $R$ is the set of relationships, i.e., the interactions between concepts in the domain. Formally, it defines a subset of an n-dimensional Cartesian product, $R : C_1 \times C_2 \times \cdots \times C_n$, and $r$ is a relationship ($r \in R$). The basic relationships are subclass-of, part-of, kind-of and attribute-of.

    $F$ is the set of functions, a special kind of relationship. Formally, $F : C_1 \times C_2 \times \cdots \times C_{n-1} \rightarrow C_n$. For example, Mother-of is a function, and Mother-of$(x, y)$ means that $y$ is the mother of $x$.

    $A$ is the set of axioms, which represent tautological assertions, e.g., that concept B belongs to the range of concept A.

    $I$ is the set of instances, and $i$ is an instance ($i \in I$).

    Fig. 1 shows an ontology sample.
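As an illustration only, the five-tuple of Definition 1 can be held in a small data structure. The sketch below is not the authors' implementation: the class name `Ontology`, its field names, and the restriction to binary relations and single-argument functions are simplifying assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Ontology:
    """Hypothetical container for O = {C, R, F, A, I}."""

    concepts: Set[str] = field(default_factory=set)                    # C
    # R, simplified to binary relations: (relation name, source, target)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)
    # F, simplified to one argument: (function name, x) -> y
    functions: Dict[Tuple[str, str], str] = field(default_factory=dict)
    axioms: Set[str] = field(default_factory=set)                      # A
    # I: instance name -> concept it belongs to
    instances: Dict[str, str] = field(default_factory=dict)

    def add_relation(self, name: str, source: str, target: str) -> None:
        """Add a relation between two concepts that are already in C."""
        if source not in self.concepts or target not in self.concepts:
            raise ValueError("both ends of a relation must be known concepts")
        self.relations.add((name, source, target))


onto = Ontology(concepts={"Person", "Student"})
onto.add_relation("subclass-of", "Student", "Person")
onto.instances["alice"] = "Student"
# Mother-of(x, y) means y is the mother of x.
onto.functions[("Mother-of", "alice")] = "carol"
```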

    Fig. 1 Simple ontology sample

    2.2 Ontology heterogeneity

    The development of the semantic web produces more and more ontologies. Different organizations define ontologies to serve their own applications. Because there is no unified standard for ontology construction, the content and structure of these ontologies differ. These differences have become an obstacle to knowledge sharing and reuse. Ontology heterogeneity exists widely in the semantic web and appears in various forms. For example, when concepts from two ontologies have the same meaning but are given different name labels, this is called name heterogeneity. When two concepts are given the same name label but their meanings are very different, this is semantic heterogeneity. In addition, when the same parent class has different numbers of sub-classes in different ontologies, this is structural heterogeneity. To achieve distributed information integration and knowledge reuse, we must find the semantic relationships between different ontologies by ontology mapping, and thereby resolve ontology heterogeneity.

    2.3 Ontology mapping

    Ontology mapping is a procedure that takes two independent ontologies as input and establishes the semantic relationships between all the elements (concepts, relations, entities) of the two ontologies[15].

    Definition 2. Ontology mapping

    Ontology mapping finds the semantic relationship between two ontologies. The mapping function[16] is expressed as $\mathrm{Map}(\{e_{1i}, e_{2j}\}, O_1, O_2) = f$. Given two ontologies $O_1$ and $O_2$, mapping from ontology $O_1$ to $O_2$ means that every entity in ontology $O_1$ finds an appropriate entity in ontology $O_2$, together with their corresponding relation. This corresponding relation is usually determined by the similarity between the two entities. $O_1$ and $O_2$ are called the source ontology and the target ontology, respectively. Here $e_{1i} \in O_1$, $e_{2j} \in O_2$, and $\{e_{1i}\} \xrightarrow{\mathrm{Map}} \{e_{2j}\}$; both are element sets. $f$ is a mapping type or null. When $f$ is null, there is no mapping relationship between $\{e_{1i}\}$ and $\{e_{2j}\}$.

    3 Mapping strategies

    The method in this paper uses various kinds of ontology information (concept labels, properties, instances, taxonomic structure and constraints) and designs a corresponding strategy for each of them. Before introducing each specific strategy, we first define the similarity matrix.

    Definition 3. Similarity matrix $M_k$

    To find the mapping relationship between ontologies $O_1$ and $O_2$, we use $g$ kinds of strategies to compute the similarity of their elements. The value of each element lies in $[0, 1]$. The similarity matrix obtained with strategy $k$ ($1 \le k \le g$) is $M_k = (\mathrm{sim}(e_{1i}, e_{2j}))_{n \times m}$.

    The matrix is shown in Table 1, where $e_{1i}$ ($1 \le i \le n$) and $e_{2j}$ ($1 \le j \le m$) are elements of ontologies $O_1$ and $O_2$, respectively, and $\mathrm{sim}(e_{1i}, e_{2j})$ is the similarity of $e_{1i}$ and $e_{2j}$ under strategy $k$.

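    Table 1 Similarity matrix $M_k$

    $$
    \begin{array}{c|ccccc}
           & e_{21} & \cdots & e_{2j} & \cdots & e_{2m} \\ \hline
    e_{11} & \mathrm{sim}(e_{11}, e_{21}) & \cdots & & \cdots & \mathrm{sim}(e_{11}, e_{2m}) \\
    \vdots & \vdots & & & & \vdots \\
    e_{1i} & & & \mathrm{sim}(e_{1i}, e_{2j}) & & \\
    \vdots & \vdots & & & & \vdots \\
    e_{1n} & \mathrm{sim}(e_{1n}, e_{21}) & \cdots & & \cdots & \mathrm{sim}(e_{1n}, e_{2m})
    \end{array}
    $$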
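A similarity matrix of this shape can be filled in by applying one strategy to every pair of source and target elements. The sketch below assumes the strategy is available as a plain function that returns a value in [0, 1]; `similarity_matrix` and the toy strategy `exact_match` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence


def similarity_matrix(
    source: Sequence[str],
    target: Sequence[str],
    sim: Callable[[str, str], float],
) -> List[List[float]]:
    """Build the n x m matrix M_k for one strategy `sim`.

    Row i corresponds to source element e_1i, column j to target element e_2j.
    """
    matrix = []
    for e1 in source:
        row = []
        for e2 in target:
            value = sim(e1, e2)
            if not 0.0 <= value <= 1.0:
                raise ValueError("similarity values must lie in [0, 1]")
            row.append(value)
        matrix.append(row)
    return matrix


def exact_match(a: str, b: str) -> float:
    """Toy strategy: 1.0 for case-insensitive equality, otherwise 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


M = similarity_matrix(["Car", "Person"], ["person", "Vehicle", "car"], exact_match)
```

Running one such call per strategy gives the g matrices that the approach later combines.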

    3.1 Concept label similarity computing strategy

    Using concept label similarity to find mapping relationships is one of the most basic methods in mapping discovery. There is a lot of research in this area, such as [17-19]. These methods mostly depend only on the text of the concepts and use a string matching algorithm or edit distance to measure the similarity of two concept labels. Because they fail to take the semantic information of the concept labels into account, these methods are not very effective.

    This paper uses a concept label similarity computing method based on Wordnet[20] and the Jaccard coefficient[21]. Both the semantic and the grammatical information of the labels are taken into consideration. Experiments show that this method is effective not only when elements have the same or partly the same names, but also when the names differ but are semantically related.

    Wordnet is a semantic vocabulary system for English. It organizes the English vocabulary as a thesaurus of synsets, each of which denotes a lexical concept, and links the concepts with different kinds of pointers to express semantic relationships such as hyponymy, synonymy and antonymy. Pantel and Lin[22] defined the similarity of two acceptations (word senses) according to Wordnet:

    $$\mathrm{sim}_d(s_1, s_2) = \frac{2 \times \log p(s)}{\log p(s_1) + \log p(s_2)} \tag{1}$$

    where $p(s) = \mathrm{count}(s)/\mathrm{total}$ is the proportion of the words contained in acceptation node $s$ and its sub-nodes relative to the whole dictionary, and $\mathrm{total}$ is the total number of words in Wordnet.
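As a worked illustration of (1), the sketch below computes the similarity directly from word counts. It assumes that the node s in the numerator is the common parent of s1 and s2, and that the counts are already known; the function name `lin_similarity` and the sample counts are hypothetical, and no real Wordnet data is read.

```python
import math


def lin_similarity(count_s1: int, count_s2: int, count_parent: int, total: int) -> float:
    """Evaluate sim_d = 2 * log p(s) / (log p(s1) + log p(s2)).

    p(x) = count(x) / total, where count(x) is the number of words under
    node x (including its sub-nodes) and total is the dictionary size.
    """
    for count in (count_s1, count_s2, count_parent):
        if not 0 < count <= total:
            raise ValueError("every count must lie in (0, total]")
    denominator = math.log(count_s1 / total) + math.log(count_s2 / total)
    if denominator == 0.0:
        # Assumption: both nodes cover the whole dictionary, so they coincide.
        return 1.0
    return 2.0 * math.log(count_parent / total) / denominator


# Hypothetical counts: two senses with 10 words each, common parent with 100.
score = lin_similarity(10, 10, 100, 1000)
# Identical nodes give the maximum similarity.
same = lin_similarity(10, 10, 10, 1000)
# A common parent that covers the whole dictionary carries no information.
unrelated = lin_similarity(10, 10, 1000, 1000)
```

With these counts, `score` is about 0.5, `same` is 1 and `unrelated` is 0, which matches the intuition that a more specific common parent means more similar senses.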

    A concept label in an ontology is a word, and a word may have several acceptations, so we define the acceptation of a concept label.

    Definition 4. Concept label acceptation $s(c_i)$

    For a concept $c_i$ in ontology $O$, its label acceptation is the set of all its acceptations $s$ in Wordnet, i.e., $s(c_i) = \{s_i \mid i = 1, 2, \cdots, m\}$. The semantic similarity of two concept labels can be obtained from their concept label acceptations.

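    Definition 5. Concept label acceptation similarity

    Given two ontologies $O_1$ and $O_2$, the semantic similarity of two concept labels $c_{1i} \in O_1$, $c_{2j} \in O_2$ is the average of all their concept label acceptation similarities in Wordnet, i.e.,

    $$\mathrm{WNSim}(c_{1i}, c_{2j}) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{sim}(s_i, s_j)}{m \times n} \tag{2}$$

    where $m$ and $n$ are the numbers of acceptations of $c_{1i}$ and $c_{2j}$, respectively.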
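The averaging in (2) can be sketched as follows, assuming the acceptations of each label are given as lists and the pairwise acceptation similarity is supplied as a function. The name `wn_sim`, the toy similarity table and the sense identifiers are hypothetical; returning 0.0 for a label without acceptations is an assumption of this sketch, not something the definition states.

```python
from typing import Callable, Sequence


def wn_sim(
    senses_1: Sequence[str],
    senses_2: Sequence[str],
    sense_sim: Callable[[str, str], float],
) -> float:
    """Average the acceptation similarity over all m x n acceptation pairs."""
    if not senses_1 or not senses_2:
        # Assumption: a label with no known acceptation matches nothing.
        return 0.0
    total = sum(sense_sim(a, b) for a in senses_1 for b in senses_2)
    return total / (len(senses_1) * len(senses_2))


# Toy pairwise similarities standing in for a real Wordnet lookup.
toy_table = {
    ("car.sense1", "auto.sense1"): 1.0,
    ("car.sense2", "auto.sense1"): 0.4,
}


def toy_sense_sim(a: str, b: str) -> float:
    return toy_table.get((a, b), 0.0)


score = wn_sim(["car.sense1", "car.sense2"], ["auto.sense1"], toy_sense_sim)
```

Here the first label has two acceptations and the second has one, so `score` is the mean of the two pairwise values, about 0.7.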

    Definition 6. Public parent node of acceptations $n(s)$

    Given acceptations $s_i$ and $s_j$, their public parent node is denoted $n(s)$, where $n(s) = \{s \mid s \in \mathrm{Wordnet} \wedge (s, s_i) \in H \wedge (s, s_j) \in H\}$ and $H$ is the hyponymy relation in Wordnet.

    Next, the Jaccard coefficient is used to compute the syntax similarity of two concept labels.

    Definition 7. Concept label syntax similarity

    Given two ontologies $O_1$ and $O_2$, the syntax similarity of two concept labels $c_{1i} \in O_1$, $c_{2j} \in O_2$ is the ratio of the number of characters they share to the total number of characters in them, i.e.,

    $$\mathrm{JSSim}(c_{1i}, c_{2j}) = \frac{c_{1i} \cdot c_{2j}}{c_{1i} \cdot c_{1i} + c_{2j} \cdot c_{2j} - c_{1i} \cdot c_{2j}}$$
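One way to read Definition 7 is as the extended Jaccard (Tanimoto) coefficient over character-count vectors. That reading, the case-insensitive comparison and the name `js_sim` are assumptions of this sketch, since the definition does not say how a label is turned into a vector.

```python
from collections import Counter


def js_sim(label_1: str, label_2: str) -> float:
    """Extended Jaccard coefficient of two labels.

    Each label is treated as a vector of character counts, and the result is
    a.b / (a.a + b.b - a.b), where '.' is the dot product.
    """
    v1 = Counter(label_1.lower())
    v2 = Counter(label_2.lower())
    dot_12 = sum(count * v2[ch] for ch, count in v1.items())
    dot_11 = sum(count * count for count in v1.values())
    dot_22 = sum(count * count for count in v2.values())
    denominator = dot_11 + dot_22 - dot_12
    # Two empty labels give a zero denominator; treat them as dissimilar.
    return dot_12 / denominator if denominator else 0.0


identical = js_sim("Book", "book")
partial = js_sim("cat", "car")
disjoint = js_sim("abc", "xyz")
```

For "cat" and "car" the shared characters are c and a, so the dot product is 2 and the denominator is 3 + 3 - 2 = 4, giving 0.5. Character counts ignore order, so anagrams score 1.0 under this reading.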

    Definition 8. Concept label similarity

    Given two ontologies $O_1$ and $O_2$, the similarity of two concept labels $c_{1i} \in O_1$, $c_{2j} \in O_2$ is the weighted average of their Wordnet similarity and their Jaccard similarity, i.e., $\mathrm{NLSim}(c_{1i}, c_{2j}) = \alpha\,\mathrm{WNSim}(c_{1i}, c_{2j}) + (1 - \alpha)\,\mathrm{JSSim}(c_{1i}, c_{2j})$, where $\alpha$ is the weight.

    Algorithm 1. Concept label similarity computation

    Input: two concept labels c1i ∈ O1, c2j ∈ O2, weight α and maximum search depth MaxDep in Wordnet
    Output: matrix of concept label similarity MNL(i, j)

    Begin
      If c1i = c2j                          // c1i and c2j are the same concept
        NLSim(c1i, c2j) = 1
      Else
        If c1i, c2j are on the same node in Wordnet
          NLSim(c1i, c2j) = 1
        Else
          backtrack up to MaxDep
          If n(s) does not exist            // c1i and c2j have no public parent node
            NLSim(c1i, c2j) = 0
          Else
            For each c1i
              For each c2j
                NLSim(c1i, c2j) = α WNSim(c1i, c2j) + (1 − α) JSSim(c1i, c2j)
                MNL(i, j) = NLSim(c1i, c2j)
      Return MNL(i, j)
    End
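The control flow of Algorithm 1 can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: the Wordnet lookups (`same_node`, `has_common_parent`) and the two similarity functions are passed in as hypothetical callables, the loops over labels are moved to the outside so that every pair goes through the same checks, and the stub data in the example is invented.

```python
from typing import Callable, List, Sequence


def label_similarity_matrix(
    labels_1: Sequence[str],
    labels_2: Sequence[str],
    alpha: float,
    max_depth: int,
    same_node: Callable[[str, str], bool],
    has_common_parent: Callable[[str, str, int], bool],
    wn_sim: Callable[[str, str], float],
    js_sim: Callable[[str, str], float],
) -> List[List[float]]:
    """Fill the concept label similarity matrix pair by pair."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    matrix = []
    for c1 in labels_1:
        row = []
        for c2 in labels_2:
            if c1 == c2 or same_node(c1, c2):
                score = 1.0   # same label, or same node in Wordnet
            elif not has_common_parent(c1, c2, max_depth):
                score = 0.0   # no public parent node within the search depth
            else:
                score = alpha * wn_sim(c1, c2) + (1.0 - alpha) * js_sim(c1, c2)
            row.append(score)
        matrix.append(row)
    return matrix


# Stub lookups standing in for real Wordnet access.
vehicles = {"car", "automobile", "truck"}
M = label_similarity_matrix(
    ["car", "truck"],
    ["automobile", "truck", "poem"],
    alpha=0.5,
    max_depth=3,
    same_node=lambda a, b: {a, b} == {"car", "automobile"},
    has_common_parent=lambda a, b, depth: {a, b} <= vehicles,
    wn_sim=lambda a, b: 0.6,
    js_sim=lambda a, b: 0.2,
)
```

Passing the lookups in as functions keeps the sketch independent of any particular Wordnet library.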


    3.2 Properties of similarity computing

    An ontology formally expresses the objective world mainly through concepts, which are an important formalizing tool, and the relationships between concepts are reflected by properties. Within a specific range, properties connect individuals in one domain with those in another. A concept has many properties, such as subClassOf, equivalentClass, disjointWith, unionOf, complementOf, intersectionOf, and oneOf. Ontologies expressed in the formal language OWL have two kinds of properties: object properties and data properties. Object properties connect two classes; data properties connect classes with particular data types or RDF character types. Table 2 shows a simple sample of ontology properties.

    Ontology properties provide useful information for the mapping task. Both object properties and data properties exist in text form, so our method uses text classification techniques to find mapping relationships. Text classification is a mature technique that is widely used in natural language processing, pattern recognition and knowledge discovery. Specific details can be found in [23-25]. Our method uses the same strategy to measure property similarity and instance similarity; refer to Section 3.4 for more details.

    Table 2 Ontology properties sample

    Class name   Object property                                           Data property
    ----------   -------------------------------------------------------   -----------------------------
    Disease      Complication (hasComplication)::Type disease example      English name::Type string
                 Therapeutic (indicationsOf)::Type medicine example        Cause of disease::Type string
                 Doctor (expertiseOf)::Type doctor example                 sympto...

