
    A Survey on Large Scale Schema and Ontology Matching Techniques

    K. Saruladha, Dr. G. Aghila, and B. Sathiya

    Abstract: With the rapid growth and increasing use of a large variety of data such as XML schemas and ontologies, heterogeneity among the data has increased enormously in many domains. Resolving this heterogeneity requires matching techniques. In areas such as e-business, the web, and the life sciences, XML schemas and ontologies are large, but most existing matching techniques address only small match tasks. We therefore present an overview of recently proposed matching techniques, namely early pruning of the search space, divide-and-conquer strategies, and parallel matching, together with some well-known ontology matching tools that achieve high match efficiency and/or high quality for large-scale matching. In addition, the pros and cons of the various matching techniques are summarized.

    Index Terms: Similarity Measure, Schema Matching, Ontology Matching, Ontology Alignment.

    1 INTRODUCTION

    In general, relational and XML databases are referred to as schemas, and databases represented in a semantic language such as the Web Ontology Language (OWL) or the Resource Description Framework (RDF) are referred to as ontologies. A matching technique finds semantic correspondences among the entity pairs in the Cartesian product of the entities of two given ontologies. A correspondence may express an equivalence, subsumption, or disjointness relation between the ontology entities. The result of a matching technique is a set of semantic correspondences called an alignment. Fig. 1 depicts the general framework for schema matching techniques, which also applies to ontology matching techniques. As shown in the figure, the input to the matching technique is two schemas, which are first imported into a format suitable for processing. For faster computation the schemas may be preprocessed, e.g., by preparing a neighbour list for each entity to speed up the structural similarity calculation or by removing redundant information from the schema. A similarity value is then calculated for every entity pair in the Cartesian product, one entity from each of the two preprocessed schemas, using a match workflow consisting of a set of matchers (e.g., lexical, structural, and semantic matchers). The similarity value ranges from 0 to 1, where 1 indicates equivalence and 0 indicates a disjoint relation. The matcher workflow can be sequential, parallel, iterative, or some mixture of these. The similarity values obtained from the match workflow are aggregated to obtain the final alignment. For large-scale schema matching the workflow varies slightly to incorporate techniques such as early pruning and partitioning of the schema to reduce the search space, parallelization, etc.
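
    The framework described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any surveyed system: the two matchers (`lexical_matcher`, `token_matcher`), the plain average used for aggregation, and the 0.7 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import product


def lexical_matcher(a, b):
    """Similarity in [0, 1] from the standard-library SequenceMatcher ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_matcher(a, b):
    """Jaccard overlap of the underscore-separated name tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def match(source, target, matchers, threshold=0.7):
    """Score every pair of the Cartesian product, average the matcher
    scores (assumed aggregation), and keep pairs at or above the threshold."""
    alignment = {}
    for a, b in product(source, target):
        score = sum(m(a, b) for m in matchers) / len(matchers)
        if score >= threshold:
            alignment[(a, b)] = round(score, 3)
    return alignment


# Hypothetical entity names standing in for two small schemas.
source = ["customer_name", "order_date"]
target = ["customer_name", "date_of_order", "price"]
alignment = match(source, target, [lexical_matcher, token_matcher])
```

    Every one of the 2 x 3 pairs is scored here, which is exactly the cost that the large-scale techniques in Section 2 try to avoid.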

    According to Shvaiko and Euzenat [11], one of the toughest challenges for matching systems is handling large-scale schemas or ontologies. Based on the work of Rahm [12], the domains in which large-scale matching is necessary include e-business [13], web data [14], life science ontologies and medicine [15], and web directories [16]. Although advances have been made in current matching systems, they are still unable to achieve both good effectiveness (correctness and completeness of the alignment) and good efficiency (time and space efficiency). Effectiveness is measured by precision and recall, while efficiency is measured by execution time and memory used.
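
    Precision and recall can be computed directly from a produced alignment and a reference alignment. The sketch below uses the standard definitions; the entity pairs are invented for illustration only.

```python
def precision_recall(found, reference):
    """Return (precision, recall) of a found alignment against a reference.

    precision = correct / found pairs; recall = correct / reference pairs.
    """
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical alignments: two of the three found pairs are correct,
# and two of the four reference pairs were discovered.
found = {("author", "writer"), ("title", "name"), ("isbn", "price")}
reference = {("author", "writer"), ("title", "name"),
             ("year", "date"), ("isbn", "isbn")}
p, r = precision_recall(found, reference)
```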

    The remainder of this paper is organized as follows: Section 2 discusses various large-scale matching techniques. Section 3 gives a brief comparison of the large-scale matching techniques. Finally, Section 4 concludes the survey of ontology matching techniques.

    Fig. 1 General Framework for Matching Process

    Mrs. K. Saruladha is with the Computer Science Department, Pondicherry Engineering College, Puducherry, Pin 605014, India.

    Dr. G. Aghila is with the Computer Science Department, Pondicherry University, Puducherry, Pin 605014, India.

    Ms. B. Sathiya is with the Computer Science Department, Pondicherry Engineering College, Puducherry, Pin 605014, India.


    2011 Journal of Computing Press, NY, USA, ISSN 2151-9617

    http://sites.google.com/site/journalofcomputing/

    [Fig. 1 diagram: Input Schemas → Preprocessor / Preprocessor → Similarity Computation → Similarity Aggregation → Alignments]

    JOURNAL OF COMPUTING, VOLUME 3, ISSUE 10, OCTOBER 2011, ISSN 2151-9617

    HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING

    WWW.JOURNALOFCOMPUTING.ORG 55


    2 APPROACHES USED FOR LARGE SCALE MATCHING TECHNIQUE

    In this section, we discuss various large-scale matching techniques. The matching techniques are grouped into four major categories, as shown below.

    1) Early pruning strategy based matching technique
       a. Quick Ontology Matching algorithm (QOM)
       b. Eric Peukert et al. schema and ontology matching algorithm
    2) Partition based matching technique
       a. COMA++ schema and ontology matching algorithm
       b. Falcon-AO ontology matching algorithm
       c. Taxomap ontology matching algorithm
       d. AnchorFlood ontology matching algorithm
    3) Parallel matching technique
       a. Gross et al. ontology matching algorithm
    4) Other matching tools
       a. RiMOM ontology matching tool
       b. ASMOV (Automated Semantic Matching of Ontologies with Verification) ontology matching tool
       c. AgreementMaker schema and ontology matching tool

    2.1 Early Pruning Strategy Matching Technique

    The aim of the early pruning technique is to reduce the search space for matching. For example, one matcher can prune entity pairs whose semantic correspondence value is very low, thus reducing the search space for the subsequent matcher. This idea is suitable for both sequential and iterative matcher workflows. QOM [1] was the first system to implement this idea, using an iterative matcher workflow, and the system of Peukert et al. [2] implemented both matcher workflows.
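
    The pruning idea for a sequential workflow can be sketched as follows. This is an illustrative sketch only: the two matchers (`first_letter` as the cheap filter, `char_overlap` as the costlier matcher) and the cut-off `prune_below` are assumptions, not taken from QOM or from Peukert et al.

```python
from itertools import product


def first_letter(a, b):
    """Cheap matcher: 1.0 if both names start with the same letter."""
    return 1.0 if a[:1].lower() == b[:1].lower() else 0.0


def char_overlap(a, b):
    """Costlier stand-in matcher: Jaccard overlap of the character sets."""
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def prune_then_match(source, target, cheap, expensive, prune_below=0.3):
    """Run the cheap matcher on all pairs, drop low-scoring pairs, and run
    the expensive matcher only on the survivors."""
    candidates = list(product(source, target))
    survivors = [(a, b) for a, b in candidates if cheap(a, b) >= prune_below]
    scores = {(a, b): expensive(a, b) for a, b in survivors}
    return scores, len(candidates), len(survivors)


scores, total, kept = prune_then_match(
    ["price", "phone"], ["Price", "Phone", "address"],
    first_letter, char_overlap)
```

    In this toy run the second matcher evaluates four pairs instead of six; the saving grows with the size of the inputs, at the risk of pruning a pair that a later matcher would have scored highly.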

    2.1.1 QOM

    QOM is an ontology matching algorithm adapted from Naive Ontology Matching (NOM). The basic difference between NOM and QOM is that QOM uses heuristics and a dynamic programming approach to reduce the search space, at the cost of a marginal loss in effectiveness. QOM is an iterative matcher consisting of the following six steps: feature engineering, search step selection, similarity computation, similarity aggregation, interpretation, and iteration.

    Feature Engineering. Whether two entities are the same is determined from their features. RDF features such as an entity's uniform resource identifier (URI), its label, its parent entities, similar entities, children entities, and domain-specific features are used for matching. This step constructs the search space of ontology entities by computing the Cartesian product of entity pairs using the entity features of both ontologies.

    Search Step Selection. The entity pairs obtained are pruned using heuristic strategies to reduce the search space constructed in the previous step. The selected candidate pairs are then processed for similarity computation and interpretation. After each iteration the system again has to determine which candidate pairs to add to the agenda for the next iteration. The heuristic strategies are based on the labels of entities, the neighbours of entities aligned in the previous iteration, the hierarchy of entities, a random pick, or a combination of these strategies.

    Similarity Computation. The similarity between the selected entity pairs can be measured by a wide range of similarity functions such as Levenshtein's edit distance, the Dice coefficient, and cosine similarity. Any of these similarity functions can be chosen depending on the feature of the entity being compared.
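
    Two of the similarity functions named above can be written compactly. The sketch below uses the textbook definitions; normalizing the edit distance by the longer string length and computing the Dice coefficient over sets of character bigrams are common conventions assumed here, not details stated for QOM.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Map the distance to [0, 1] by dividing by the longer length (assumed)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice(a, b):
    """Dice coefficient over the sets of character bigrams of both strings."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))
```

    For example, `levenshtein("kitten", "sitting")` is 3, and `dice("night", "nacht")` is 0.25 because only the bigram "ht" is shared among eight bigrams in total.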

    Similarity Aggregation. QOM uses a weighted average of the similarity values for a given entity pair. The weights of the similarity values are calculated by a sigmoid function, which emphasizes high individual similarities and de-emphasizes low individual similarities.
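
    One possible reading of this step is sketched below: each individual similarity is passed through a sigmoid centred on 0.5 before the weighted average is taken. The exact function and parameters used by QOM are not given in this survey, so `steepness`, `midpoint`, and the way the sigmoid enters the average are assumptions.

```python
import math


def sigmoid_adjust(sim, steepness=10.0, midpoint=0.5):
    """Push similarities above the midpoint towards 1 and below it towards 0."""
    return 1.0 / (1.0 + math.exp(-steepness * (sim - midpoint)))


def aggregate(sims, weights):
    """Weighted average of the sigmoid-adjusted feature similarities."""
    total_weight = sum(weights)
    adjusted = sum(w * sigmoid_adjust(s) for s, w in zip(sims, weights))
    return adjusted / total_weight


# Hypothetical feature similarities (label, parents, children) and weights.
combined = aggregate([0.9, 0.4, 0.1], [2.0, 1.0, 1.0])
```

    With these defaults a similarity of 0.9 is raised to about 0.98 and a similarity of 0.1 is lowered to about 0.02, which matches the emphasizing and de-emphasizing behaviour described above.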

    Interpretation. First, candidate pairs with a similarity value below a threshold are discarded. Next, pairs that violate the bijectivity of the matching (the one-to-one and onto constraint) are discarded.
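
    A simple way to enforce both conditions is a greedy pass over the pairs in order of decreasing similarity. The greedy strategy and the threshold value are assumptions made for this sketch; the text above only states that below-threshold and non-bijective pairs are discarded.

```python
def interpret(scores, threshold=0.5):
    """Keep pairs at or above the threshold so that every source entity and
    every target entity appears in at most one kept pair (greedy, best first)."""
    alignment = {}
    used_source, used_target = set(), set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), sim in ranked:
        if sim < threshold:
            break  # all remaining pairs score lower still
        if a in used_source or b in used_target:
            continue  # would violate the one-to-one constraint
        alignment[(a, b)] = sim
        used_source.add(a)
        used_target.add(b)
    return alignment


# Hypothetical aggregated similarities.
scores = {("Car", "Automobile"): 0.9, ("Car", "Vehicle"): 0.7,
          ("Truck", "Vehicle"): 0.6, ("Truck", "Automobile"): 0.2}
final = interpret(scores)
```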

    Iteration. The initial iteration uses lexical knowledge, while later iterations use knowledge of the ontology structure for matching. Based on the authors' experiments, ten iterations are required irrespective of ontology size.

    Based on the evaluation, QOM is nearly twenty times faster than NOM. The drawback of QOM is that its effectiveness is marginally lower than that of NOM, since not all entity pairs are evaluated.

    2.1.2 Peukert et al. Method

    Peukert et al. proposed a schema and ontology matching algorithm. It is a rule-based optimization technique that rewrites the match workflow to improve performance. Filter operators within the match workflow eliminate dissimilar entity pairs (those whose similarity value is below some threshold) from intermediate match results, thus reducing the search space for subsequent matchers. The threshold is either statically predetermined or dynamically derived from the similarity threshold used in the previous match workflow to select the alignment.

    The input of this process is XML schemas, metamodels, or ontologies. These input formats are converted into a standard internal format. The user must design the matching process, which is represented as a graph whose vertices represent matchers from the library and whose edges represent execution order and data flow. Since the graph is designed by the user, the match workflow can be parallel, serial, or iterative. The user can utilize the rewrite recommendation system to increase the efficiency of the matching workflow. This rewrite recommendation system is similar to cost-based rewrites in database query optimization; e.g., the use of rewrite rules is similar to the use of predicate push-down rules in database query optimization, which reduce the number of input tuples for joins and other expensive operators. A simple cost model is used to choose which parts of a matching workflow to rewrite. These rewrites can be accepted or rejected by the user. Once the matching process graph is finalised, the graph can be executed to obtain the alignment.

    The drawbacks of this method are as follows. The rewrite rules focus only on efficiency improvements and not on effectiveness improvements. The matcher library contains only a small number of matchers.

    2.2 Partition Based Matching Technique

    The idea of this approach is to partition the ontology and perform partition-wise matching. The partitioning is performed in such a way that each partition of the first ontology is matched with only a small subset of the partitions (ideally, with only one partition) of the second ontology. This method reduces the search space and thus achieves better efficiency. The space complexity of the matching process is also reduced. Four partition-based methods, COMA++ [5], Falcon-AO [3], Taxomap [4], and AnchorFlood [6], are discussed below.

    2.2.1 COMA++

    COMA++ is a generic schema and ontology matching tool with a library of matchers and a flexible option to combine the matchers to refine the matching results. It can process XML schemas, relational databases, and OWL ontologies. First, the schema is processed to identify the relevant components (nodes, paths, or fragments) for matching. Then, depending on the component chosen, a matcher workflow is used to compute component similarities. Finally, similarity combination methods are used to find the correspondences between components.

    Fragment-based matching is capable of handling large-scale schemas. A rooted subgraph down to the leaf level in the schema graph is referred to as a fragment. In general, fragments should have little or no overlap to avoid repeated similarity computations and overlapping match results. The three automatically determined fragment types are schema, subschema, and shared fragments. The user can also select a substructure as a fragment. Based on the fragment type, each schema is fragmented, and fragment pairs, one fragment from each schema, are compared to find the most similar fragment pairs. The search for similar fragments is a kind of lightweight matching, e.g., based on the similarity of the fragment roots. Next, each similar fragment pair is matched element-wise to find the alignments.
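
    The two-stage idea (lightweight root comparison, then element-wise matching inside the chosen fragment pair) can be sketched as below. This is not COMA++ code: the fragment representation (a mapping from root name to element names), the `name_sim` matcher, and the threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def name_sim(a, b):
    """Stand-in string matcher from the Python standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_fragments(source_frags, target_frags, threshold=0.8):
    """For each source fragment pick the target fragment with the most similar
    root, then compare elements only within that fragment pair."""
    alignment = []
    for s_root, s_elems in source_frags.items():
        # Lightweight step: only the fragment roots are compared.
        t_root = max(target_frags, key=lambda r: name_sim(s_root, r))
        # Detailed step: element-wise matching inside the selected pair.
        for s in s_elems:
            for t in target_frags[t_root]:
                if name_sim(s, t) >= threshold:
                    alignment.append((f"{s_root}.{s}", f"{t_root}.{t}"))
    return alignment


# Hypothetical fragments: root name -> element names.
source_frags = {"Customer": ["name", "city"], "Order": ["date", "total"]}
target_frags = {"CustomerInfo": ["name", "city", "total"],
                "OrderInfo": ["date", "total"]}
alignment = match_fragments(source_frags, target_frags)
```

    Note that `Order.total` is never compared with `CustomerInfo.total`, which saves work but also shows how a wrong root-level decision would hide correct element matches.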

    The drawbacks of this method are as follows. COMA++ was developed for XML schemas and hence is not suitable for complex ontology graphs. It uses relatively simple heuristic rules to partition the input schemas, often resulting in too few or too many partitions, and only the root node features are used to find similar fragment pairs, which leads to lower matching quality.

    2.2.2 Falcon-AO

    Falcon-AO is an ontology matching tool that aims at finding alignments between two given OWL/RDF ontologies. The matcher library of Falcon-AO consists of V-Doc [17] and I-Sub [19], which are lightweight linguistic matchers; GMO [18], an iterative structural matcher; and PBM (Partition Based Matcher), which adopts a divide-and-conquer strategy to find block mappings between large-scale ontologies. The three phases of PBM are explained below.

    Partitioning ontologies: Structural proximities between entities are calculated based on how closely they are related in the hierarchies. Ontology clusters are formed based on these structural proximities using a modified ROCK [20] clustering algorithm. RDF blocks are constructed from the clusters by assigning each RDF triple to a cluster in which at least two of its entities are contained.

    Matching blocks: Alignments with high similarity are referred to as anchors. A lightweight string comparison technique, I-Sub, is first employed to find anchors between the two full ontologies, and then the blocks from the two ontologies are matched based on the distribution of the anchors.
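
    Matching blocks by the distribution of anchors can be illustrated as follows. The selection rule used here (keep a block pair when it holds at least `min_share` of the anchors that fall into the source block) is an assumption for the sketch; the survey does not state the criterion that Falcon-AO actually applies.

```python
from collections import Counter


def match_blocks(anchors, block_of_a, block_of_b, min_share=0.5):
    """Count the anchors falling into each (source block, target block) pair
    and keep the pairs that hold a large enough share of a block's anchors."""
    counts = Counter((block_of_a[a], block_of_b[b]) for a, b in anchors)
    per_source = Counter()
    for (source_block, _), n in counts.items():
        per_source[source_block] += n
    return sorted(pair for pair, n in counts.items()
                  if n / per_source[pair[0]] >= min_share)


# Hypothetical anchors and block assignments for two partitioned ontologies.
anchors = [("a1", "b1"), ("a2", "b2"), ("a3", "b7"), ("a4", "b8")]
block_of_a = {"a1": "A1", "a2": "A1", "a3": "A1", "a4": "A2"}
block_of_b = {"b1": "B1", "b2": "B1", "b7": "B2", "b8": "B2"}
block_pairs = match_blocks(anchors, block_of_a, block_of_b)
```

    Block A1 shares two of its three anchors with B1 and only one with B2, so only the pairs (A1, B1) and (A2, B2) are passed on to the detailed matchers.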

    Discovering alignments: V-Doc adopts a linguistic approach to ontology matching. GMO is an iterative structural matcher. Similarity combination is a heuristic strategy to tune the thresholds of the above two matchers.

    The drawbacks of this method are as follows. The entire ontology needs to be processed to find anchors, which reduces the efficiency of Falcon-AO. The maximum number of entities in a cluster is determined by the GMO matcher, and the clustering algorithm terminates abruptly if it reaches this maximum, which leads to poor clustering.

    2.2.3 Taxomap

    Taxomap is an ontology matching algorithm consisting of two partitioning algorithms, namely Partition-Anchor-Partition (PAP) and Anchor-Partition-Partition (APP), which have been designed to take the alignment objective into account in the partitioning process. The more structured ontology is referred to as the target ontology and the less structured one as the source ontology. PAP is suitable for matching a structured ontology against an unstructured ontology, and APP is suitable for matching a structured ontology against a structured ontology. Entity pairs, one entity from each of the two ontologies, that have identical labels are called anchors and are used in both PAP and APP. The alignment is based on lexical and structural (subclass) similarity measures.

    Partition-Anchor-Partition (PAP):

    1. Use the PBM matcher to partition the target ontology into a set of blocks.

    2. Identify the set of anchors between the two ontologies. This set will be the centers of the future blocks that will be generated from the source ontology.

    3. Use the PBM matcher to partition the source ontology around the identified centers.

    4. Align each block with the corresponding block.


    Anchor-Partition-Partition (APP):

    1. Identify the set of anchors across the ontologies.

    2. Partition both the target ontology and the source ontology by modifying the PBM matcher in order to take shared anchors into account.

    3. Align the blocks that share the maximal number of anchors.

    The drawbacks of this method are as follows. The effectiveness of this method depends on the availability of identical labels across ontologies. Only labels and the hierarchy structure are used for matching, and hence recall is comparatively low.

    2.2.4 AnchorFlood

    AnchorFlood is an ontology matching algorithm that implements dynamic partition-based matching, which avoids a priori partitioning of the ontologies. The similarity measure for alignment is based on internal structure, external structure, and the Jaro-Winkler string distance. The process starts off with a set of anchors, where an anchor is defined as an exact string match between a pair of concepts, properties, or instances.

    The algorithm preprocesses the ontologies to normalize the textual contents of entities. It starts with an anchor and collects the neighboring entities of the chosen anchor across the ontologies. The segment pair constructed from the collected neighbors is processed to produce aligned pairs. The process repeats until either all the collected entities are explored or no new aligned pair is found. The anchor is discarded as a misalignment if the segment constructed from it does not produce sufficient alignments.
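
    The flooding behaviour can be sketched as a breadth-first expansion from one anchor. This is a simplified illustration, not the AnchorFlood implementation: the neighbour maps, the threshold, and the use of the standard-library `SequenceMatcher` in place of the Jaro-Winkler and structural measures are all assumptions.

```python
from collections import deque
from difflib import SequenceMatcher


def sim(a, b):
    """Stand-in string similarity (the source uses Jaro-Winkler instead)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def anchor_flood(anchor, neighbors_a, neighbors_b, threshold=0.8):
    """Start from one anchor pair and repeatedly align the neighbours of
    already aligned pairs until no new aligned pair is found."""
    aligned = {anchor}
    queue = deque([anchor])
    while queue:
        a, b = queue.popleft()
        for na in neighbors_a.get(a, []):
            for nb in neighbors_b.get(b, []):
                pair = (na, nb)
                if pair not in aligned and sim(na, nb) >= threshold:
                    aligned.add(pair)
                    queue.append(pair)
    return aligned


# Hypothetical neighbourhoods (entity -> directly related entities).
neighbors_a = {"Person": ["Author", "Reviewer"], "Author": ["Paper"]}
neighbors_b = {"Person": ["Authors", "Chair"], "Authors": ["Papers"]}
aligned = anchor_flood(("Person", "Person"), neighbors_a, neighbors_b)
```

    Only entities reachable from the anchor are ever compared, so a matching pair that lies far from every anchor is never examined.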

    The drawbacks of this method are as follows. Semantic similarity between entities is not explored. It ignores certain distantly placed aligned pairs, since segments are constructed from the neighbors of anchors. Only exact string matches of concepts, properties, and instances are considered as anchors, which leads to inefficient alignments.

    2.3 Parallel Matching Technique

    A straightforward method to increase the efficiency of large-scale matching is to execute matchers in parallel on several processors. The two kinds of parallel matching are inter-matcher and intra-matcher parallelization. Inter-matcher parallelization deals with the parallel execution of independently executable matchers, while intra-matcher parallelization deals with the internal decomposition of individual matchers or matcher parts into several match tasks that can be executed in parallel. The matching system of Gross et al. [7] implements both inter-matcher and intra-matcher parallelization and is discussed below.

    Gross et al. proposed a parallel ontology matching system with a distributed infrastructure that incorporates intra-matcher and inter-matcher parallelism. Element-level and structure-level matching are also parallelized. For intra-matcher parallelism, the system proposes a size-based partitioning algorithm that leads to better load balancing, limited memory consumption, and scalability without reducing the effectiveness of the match results.

    The system first generates the context attributes for each entity. Then each ontology is partitioned into a set of partitions by the size-based partitioning algorithm, achieving intra-matcher parallelism. Next, partition pairs are constructed, one partition from each of the two ontologies. Each partition pair is assigned to a processor. Within each processor the partition pair is processed in parallel by the element-level, structure-level, and instance-based matchers so as to achieve inter-matcher parallelism. The alignments from all the matchers are aggregated to output the final alignment.
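
    Intra-matcher parallelism over size-based partitions can be sketched with the Python standard library. This is an illustration under simplifying assumptions, not the system of Gross et al.: partitions are fixed-size slices of an entity list, the matcher is a case-insensitive name comparison, and a thread pool stands in for the distributed infrastructure (a CPU-bound matcher in Python would more likely use processes or separate machines).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def partition(entities, size):
    """Size-based partitioning: consecutive slices of at most `size` entities."""
    return [entities[i:i + size] for i in range(0, len(entities), size)]


def match_task(task):
    """One match task: compare all entities of one partition pair."""
    part_a, part_b = task
    return [(a, b) for a, b in product(part_a, part_b)
            if a.lower() == b.lower()]


def parallel_match(onto_a, onto_b, size=2, workers=4):
    """Build all partition pairs, run them concurrently, merge the results."""
    tasks = list(product(partition(onto_a, size), partition(onto_b, size)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(match_task, tasks))
    return sorted(pair for chunk in results for pair in chunk)


# Hypothetical entity names from two small ontologies.
onto_a = ["Heart", "Lung", "Liver", "Kidney", "Brain"]
onto_b = ["brain", "heart", "Spleen", "liver"]
alignment = parallel_match(onto_a, onto_b)
```

    Because every partition pair is still compared, the result is the same as for sequential matching; the partition size only bounds the memory and the running time of a single task.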

    The drawbacks of this method are as follows. The matcher library contains only a small number of matchers. The partitioning algorithm uses only simple strategies for partitioning, which can be improved.

    2.4 Other Matching Tools

    RiMOM [8] is the first ontology matching algorithm developed with automatic and dynamic selection of matchers for ontology alignment tasks. The input ontologies should be in OWL format. It considers lexical, structural, and instance similarities. Based on the features of the input ontologies and predefined rules, appropriate matchers are chosen for the matching task. RiMOM consists of six steps, which are explained below.

    Ontology Preprocessing and Feature Factors Estimation. For each entity of both ontologies, the features of the entity, such as its name, label, and children, are generated. Then the label and structure similarity of the entity pairs are calculated, which will be used in the following step.

    Strategy (Matcher) Selection. The basic idea of strategy selection is that if two ontologies have some feature in common, then matchers based on this feature information are employed with high weight, and if some feature factors are too low, then these matchers may not be employed. The entities are first matched linguistically; structural matching is applied only if the schemas exhibit sufficient structural similarity.

    Single strategy execution. Each strategy selected in the above step is used to find an alignment independently. Each strategy outputs an alignment result.

    Alignment combination. In this phase, RiMOM combines the alignment results obtained by the selected strategies. The combination is conducted by a linear interpolation method.
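
    Linear interpolation of several alignment results amounts to a weighted sum of the per-strategy similarities for each entity pair. In the sketch below the weights and the treatment of a pair missing from one strategy (counted as 0.0) are assumptions; the survey does not state how RiMOM sets them.

```python
def combine(alignments, weights):
    """Linearly interpolate the similarity of every pair over all strategies.

    alignments: list of dicts mapping (source, target) -> similarity.
    weights:    one non-negative weight per strategy.
    """
    total = sum(weights)
    pairs = set().union(*alignments)
    return {
        pair: sum(w * al.get(pair, 0.0)
                  for al, w in zip(alignments, weights)) / total
        for pair in pairs
    }


# Hypothetical results of a lexical and a structural strategy.
lexical = {("Paper", "Article"): 0.4, ("Author", "Writer"): 0.2}
structural = {("Paper", "Article"): 0.8}
combined = combine([lexical, structural], [0.5, 0.5])
```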

    Similarity propagation (optional). If the two ontologies have a high structure similarity factor, RiMOM employs similarity propagation to find new alignments according to the structural information.

    Alignment refinement. This step refines the alignment results from the previous steps. Several heuristic rules are defined to remove unreliable alignments.


    TABLE 1
    Comparison of Matching Techniques

    * Not available   ** Optional

    The drawback of RiMOM is its inefficiency in dealing with large-scale ontologies. Even though it shows very good effectiveness for large-scale ontologies, it consumes a long time and a large amount of memory, since it does not incorporate search space reduction techniques such as early pruning, partitioning of the ontology, or parallel techniques.

    ASMOV (Automated Semantic Matching of Ontologies with Verification) [9] is an iterative ontology matching algorithm. The strength of ASMOV lies in the post-processing of the alignment to remove correspondences that are semantically inconsistent. It uses WordNet and the Unified Medical Language System (UMLS) to increase effectiveness. The input ontologies should be in OWL-DL format.

    The input of ASMOV is two ontologies and an optional predetermined alignment set. The lexical, structural, and instance-based similarity values between all possible pairs of entities, one from each of the two ontologies, are calculated, and the optional input alignment is used to adjust the calculated measures. This step yields a similarity matrix containing the calculated similarity values for every pair of entities. For each entity, the pair with the maximum similarity value in the similarity matrix is chosen as a pre-alignment.
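
    Extracting the pre-alignment from a similarity matrix is a row-wise maximum. The nested-dictionary matrix below is an assumed representation with invented values; ASMOV's actual data structures are not described in this survey.

```python
def pre_alignment(matrix):
    """For each source entity return the target with the highest similarity.

    matrix: dict mapping source entity -> {target entity: similarity}.
    """
    return {
        source: max(row.items(), key=lambda kv: kv[1])
        for source, row in matrix.items()
        if row
    }


# Hypothetical similarity matrix for two source and two target entities.
matrix = {
    "Car": {"Automobile": 0.9, "Bike": 0.2},
    "Cycle": {"Automobile": 0.1, "Bike": 0.7},
}
pre = pre_alignment(matrix)
```

    The result maps `Car` to `("Automobile", 0.9)` and `Cycle` to `("Bike", 0.7)`; these candidate pairs are what the subsequent verification step inspects.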

    This pre-alignment must undergo a process of semantic verification, which is an extensive post-processing step to eliminate potential inconsistencies among the set of pre-alignments. Five different kinds of inconsistencies are checked. One such inconsistency rule is multiple-entity correspondences: e.g., if two alignments (a, b) and (b, c) exist, then there must also be an alignment (a, c); if not, the two alignments cannot be verified and are hence removed. The output of this step is the semantically verified similarity matrix, which is then tested against a termination condition. If this condition is true, no more iterations are needed and the process stops. The resulting alignment is the final alignment set.

    The drawbacks of this system are as follows. When the two given ontologies are dissimilar, effectiveness decreases. Sometimes the semantic verification eliminates too many alignments, leading to lower recall. Each iteration of ASMOV needs polynomial time, which leads to inefficiency.

    AgreementMaker [10] is a schema and ontology matching algorithm comprising a wide range of matchers for the lexical and structural features of the ontology. It provides both serial and parallel matcher workflows. The strength of the system lies in its GUI, which enables the user to choose, control, and execute the iterative matchers and their results. Through the GUI the user can choose matchers from the matcher library based on the matching granularity (ele

                                QOM  Peukert et al.  COMA++  Falcon-AO  Taxomap  AnchorFlood  Gross et al.  RiMOM  ASMOV  AgreementMaker
    Input as XML schemas        No   Yes             Yes     No         No       No           No            No     No     (Yes)
    Input as ontologies         Yes  Yes             Yes     Yes        Yes      Yes          Yes           Yes    Yes    Yes
    GUI                         No   Yes             Yes     (Yes)**    No       No           No            No     No     Yes
    Linguistic matcher          Yes  Na*             Yes     Yes        Yes      Yes          Yes           (Yes)  Yes    Yes
    Structural matcher          Yes  Na              Yes     Yes        Yes      Yes          Yes           (Yes)  Yes    Yes
    Instance matcher            Yes  Na              Yes     No         No       No           Yes           (Yes)  Yes    Yes
    Early pruning               Yes  Yes             No      No         No       No           No            No     No     No
    Schema partition            No   No              Yes     Yes        Yes      Yes          Yes           No     No     No
    Parallel matching           No   (Yes)           No      No         No       No           Yes           No     No     No
    Dyn. matcher selection      No   No              No      No         No       No           No            Yes    No     No
    Mapping reuse               No   Yes             Yes     No         No       No           No            No     No     No
    OAEI participation          No   No              Yes     Yes        Yes      Yes          No            Yes    Yes    Yes
    Use of external dictionary  Yes  No              Yes     No         No       No           No            Yes    Yes    Yes


    in distributed computing, information security, and ontology-based information retrieval. She is currently pursuing her Ph.D. in ontology-based information retrieval systems.

    Dr. G. Aghila, working as Professor at Pondicherry University, India, has a total of 21 years of teaching experience. She graduated from Anna University, Chennai, India. She has published nearly 40 research papers on web crawlers and ontology-based information retrieval. She is currently guiding 8 Ph.D. scholars. She was in receipt of schrneiger award. She is an expert in ontology development. Her areas of interest include artificial intelligence, text mining, and semantic web technologies.

    B. Sathiya is a postgraduate student pursuing her M.Tech in Computer Science and Engineering (Distributed Computing Systems specialization) at Pondicherry Engineering College, Puducherry, India.
