Ontology Alignment. Ontology alignment Ontology alignment Ontology alignment strategies Evaluation of ontology alignment strategies Ontology alignment.

Download Ontology Alignment. Ontology alignment Ontology alignment Ontology alignment strategies Evaluation of ontology alignment strategies Ontology alignment.

Post on 19-Jan-2016

216 views

Category:

Documents

3 download

Embed Size (px)

TRANSCRIPT

  • Ontology Alignment

  • Ontology Alignment

    Ontology alignmentOntology alignment strategiesEvaluation of ontology alignment strategiesOntology alignment challenges

  • Ontologies in biomedical researchmany biomedical ontologiese.g. GO, OBO, SNOMED-CT

    practical use of biomedical ontologiese.g. databases annotated with GO

  • Ontologies with overlapping information

  • Ontologies with overlapping informationUse of multiple ontologies custom-specific ontology + standard ontology different views over same domain overlapping domains

    Bottom-up creation of ontologiesexperts can focus on their domain of expertise

    important to know the inter-ontology relationships

  • Ontology AlignmentDefining the relations between the terms in different ontologies

  • Ontology Alignment

    Ontology alignment Ontology alignment strategiesEvaluation of ontology alignment strategiesOntology alignment challenges

  • An Alignment Framework

  • According to inputKR: OWL, UML, EER, XML, RDF, components: concepts, relations, instance, axiomsAccording to processWhat information is used and how?According to output1-1, m-nSimilarity vs explicit relations (equivalence, is-a)confidence

    Classification

  • Preprocessing

  • Preprocessing

    For example,Selection of featuresSelection of search space

  • Matchers

  • Strategies based on linguistic matchingStructure-based strategiesConstraint-based approachesInstance-based strategiesUse of auxiliary informationMatcher StrategiesStrategies based on linguistic matching

  • Example matchersEdit distanceNumber of deletions, insertions, substitutions required to transform one string into anotheraaaa baab: edit distance 2

    N-gramN-gram : N consecutive characters in a stringSimilarity based on set comparison of n-gramsaaaa : {aa, aa, aa}; baab : {ba, aa, ab}

  • Matcher StrategiesStrategies based on linguistic matchingStructure-based strategiesConstraint-based approachesInstance-based strategiesUse of auxiliary information

  • Example matchersPropagation of similarity valuesAnchored matching

  • Example matchersPropagation of similarity valuesAnchored matching

  • Example matchersPropagation of similarity valuesAnchored matching

  • Matcher StrategiesStrategies based on linguistic matchingStructure-based strategiesConstraint-based approachesInstance-based strategiesUse of auxiliary informationO1O2BirdMammalMammalFlyingAnimal

  • Matcher StrategiesStrategies based on linguistic matchingStructure-based strategiesConstraint-based approachesInstance-based strategiesUse of auxiliary informationO1O2BirdMammalMammalStone

  • Example matchersSimilarities between data typesSimilarities based on cardinalities

  • Matcher StrategiesStrategies based on linguistic matchingStructure-based strategiesConstraint-based approachesInstance-based strategiesUse of auxiliary informationOntologyinstancecorpus

  • Example matchersInstance-basedUse life science literature as instances

    Structure-based extensions

  • Learning matchers instance-based strategiesBasic intuition A similarity measure between concepts can be computed based on the probability that documents about one concept are also about the other concept and vice versa.Intuition for structure-based extensionsDocuments about a concept are also about their super-concepts.(No requirement for previous alignment results.)

  • Learning matchers - stepsGenerate corporaUse concept as query term in PubMedRetrieve most recent PubMed abstractsGenerate text classifiersOne classifier per ontology / One classifier per conceptClassificationAbstracts related to one ontology are classified by the other ontologys classifier(s) and vice versaCalculate similarities

  • Basic Nave Bayes matcherGenerate corporaGenerate classifiersNaive Bayes classifiers, one per ontologyClassificationAbstracts related to one ontology are classified to the concept in the other ontology with highest posterior probability P(C|d)Calculate similarities

  • Basic Support Vector Machines matcherGenerate corporaGenerate classifiersSVM-based classifiers, one per conceptClassificationSingle classification variant: Abstracts related to concepts in one ontology are classified to the concept in the other ontology for which its classifier gives the abstract the highest positive value.Multiple classification variant: Abstracts related to concepts in one ontology are classified all the concepts in the other ontology whose classifiers give the abstract a positive value.Calculate similarities

  • Structural extension ClGenerate classifiersTake (is-a) structure of the ontologies into account when building the classifiersExtend the set of abstracts associated to a concept by adding the abstracts related to the sub-conceptsC1C3C4C2

  • Structural extension SimCalculate similaritiesTake structure of the ontologies into account when calculating similaritiesSimilarity is computed based on the classifiers applied to the concepts and their sub-concepts

  • Matcher StrategiesStrategies based linguistic matchingStructure-based strategiesConstraint-based approachesInstance-based strategiesUse of auxiliary information

  • Example matchersUse of WordNetUse WordNet to find synonymsUse WordNet to find ancestors and descendants in the is-a hierarchyUse of Unified Medical Language System (UMLS)Includes many ontologies Includes many alignments (not complete)Use UMLS alignments in the computation of the similarity values

  • Ontology Alignment and Mergning Systems

  • Combinations

  • Combination Strategies

    Usually weighted sum of similarity values of different matchersMaximum of similarity values of different matchers

  • Filtering

  • Threshold filtering Pairs of concepts with similarity higher or equal than threshold are alignment suggestionsFiltering techniques( 2, B )( 3, F )( 6, D )( 4, C )( 5, C )( 5, E ) sim

  • Filtering techniqueslower-th( 2, B )( 3, F )( 6, D )( 4, C )( 5, C )( 5, E ) upper-thDouble threshold filtering(1) Pairs of concepts with similarity higher than or equal to upper threshold are alignment suggestions(2) Pairs of concepts with similarity between lower and upper thresholds are alignment suggestions if they make sense with respect to the structure of the ontologies and the suggestions according to (1)

  • Example alignment system SAMBO matchers, combination, filter

  • Example alignment system SAMBO suggestion mode

  • Example alignment system SAMBO manual mode

  • Ontology Alignment

    Ontology alignment Ontology alignment strategiesEvaluation of ontology alignment strategies Ontology alignment challenges

  • Evaluation measuresPrecision: # correct mapping suggestions # mapping suggestionsRecall: # correct mapping suggestions # correct mappings F-measure: combination of precision and recall

  • Ontology AlignmentEvaluation Initiative

  • OAEISince 2004Evaluation of systemsDifferent tracks (2014)benchmark expressive: anatomy, conference, large biomedical ontologiesmultilingual: multifarm (8 languages)directories and thesauri: library interactiveinstances: identity, similarity

  • OAEI

    Evaluation measures Precision/recall/f-measure recall of non-trivial mappings

    full / partial golden standard

  • OAEI 200717 systems participated benchmark (13)ASMOV: p = 0.95, r = 0.90 anatomy (11) AOAS: f = 0.86, r+ = 0.50SAMBO: f =0.81, r+ = 0.58 library (3)Thesaurus merging: FALCON: p = 0.97, r = 0.87Annotation scenario: FALCON: pb =0.65, rb = 0.49, pa = 0.52, ra = 0.36, Ja = 0.30Silas: pb = 0.66, rb= 0.47, pa = 0.53, ra = 0.35, Ja = 0.29 directory (9), food (6), environment (2), conference (6)

  • OAEI 2008 anatomy trackAlign Mouse anatomy: 2744 terms NCI-anatomy: 3304 terms Mappings: 1544 (of which 934 trivial)Tasks 1. Align and optimize f 2-3. Align and optimize p / r 4. Align when partial reference alignment is given and optimize f

  • OAEI 2008 anatomy track#19 systems participatedSAMBOp=0.869, r=0.836, r+=0.586, f=0.852 SAMBOdtfp=0.831, r=0.833, r+=0.579, f=0.832Use of TermWN and UMLS

  • OAEI 2008 anatomy track#1Is background knowledge (BK) needed?

    Of the non-trivial mappings:Ca 50% found by systems using BK and systems not using BKCa 13% found only by systems using BKCa 13% found only by systems not using BKCa 25% not foundProcessing time: hours with BK, minutes without BK

  • OAEI 2008 anatomy track#4Can we use given mappings when computing suggestions? partial reference alignment given with all trivial and 50 non-trivial mappings

    SAMBOp=0.6360.660, r=0.6260.624, f=0.6310.642 SAMBOdtfp=0.5630.603, r=0.6220.630, f=0.5910.616

    (measures computed on non-given part of the reference alignment)

  • OAEI 2007-2008Systems can use only one combination of strategies per task systems use similar strategies text: string matching, tf-idf structure: propagation of similarity to ancestors and/or descendants thesaurus (WordNet) domain knowledge important for anatomy task?

  • OAEI 201414 systemsAnatomy: best system f=0.944, p=0.956, r=0.932, r+=0.822, 28 secondsmany systems produce coherent mappings

  • Evaluation of algorithms

  • CasesGO vs. SigO

    MA vs. MeSHGO-behaviorMA-eyeMA-noseMA-ear

  • Evaluation of matchersMatchersTerm, TermWN, Dom, Learn (Learn+structure), Struc

    ParametersQuality of suggestions: precision/recall Threshold filtering : 0.4, 0.5, 0.6, 0.7, 0.8Weights for combination: 1.0/1.2

    KitAMO (http://www.ida.liu.se/labs/iislab/projects/KitAMO)

  • ResultsTerminological matchers

    Chart1

    0.070.070.150.180.2

    0.120.140.260.280.3

    0.310.330.860.550.67

    0.670.7110.760.88

    10.6710.920.95

    B

    ID

    nose

    ear

    eye

    threshold

    precision

    Sheet1

    0.40.50.60.70.8

    B0.070.120.310.671

    ID0.070.140.330.710.67

    nose0.150.260.8611

    ear0.180.280.550.760.92

    eye0.20.30.670.880.95

    Sheet1

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet2

    Sheet3

    Chart3

    10.87510.960.96

    10.87510.960.85

    10.62510.960.82

    10.6250.860.930.78

    10.50.860.890.67

    B

    ID

    nose

    ear

    eye

    threshold

    recall

    Sheet1

    0.40.50.60.70.80.40.50.60.70.8

    B0.070.120.310.671B11111

    ID0.070.140.330.710.67ID0.8750.8750.6250.6250.5

    nose0.150.260.8611nose1110.860.86

    ear0.180.280.550.760.92ear0.960.960.960.930.89

    eye0.20.30.670.880.95eye0.960.850.820.780.67

    0.40.50.60.70.8

    B0.50.50.50.50.25

    ID0.750.630.250.1250

    nose0.720.720.720.720.43

    ear0.590.520.40.370.11

    eye0.670.630.520.310.11

    Sheet1

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet2

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet3

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

  • ResultsBasic learning matcher (Nave Bayes)

    Naive Bayes slightly better recall, but slightly worse precision than SVM-single SVM-multiple (much) better recall, but worse precision than SVM-single

    Chart2

    0.50.750.720.590.67

    0.50.630.720.520.63

    0.50.250.720.40.52

    0.50.1250.720.370.31

    0.2500.430.110.11

    B

    ID

    nose

    ear

    eye

    threshold

    recall

    Sheet1

    0.40.50.60.70.8

    B0.070.120.310.671

    ID0.070.140.330.710.67

    nose0.150.260.8611

    ear0.180.280.550.760.92

    eye0.20.30.670.880.95

    0.40.50.60.70.8

    B0.50.50.50.50.25

    ID0.750.630.250.1250

    nose0.720.720.720.720.43

    ear0.590.520.40.370.11

    eye0.670.630.520.310.11

    Sheet1

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet2

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet3

    Chart4

    0.50.670.840.890.72

    110.840.930.95

    1110.921

    1110.911

    11111

    B

    ID

    nose

    ear

    eye

    threshold

    precision

    Sheet1

    0.40.50.60.70.80.40.50.60.70.8

    B0.070.120.310.671B11111

    ID0.070.140.330.710.67ID0.8750.8750.6250.6250.5

    nose0.150.260.8611nose1110.860.86

    ear0.180.280.550.760.92ear0.960.960.960.930.89

    eye0.20.30.670.880.95eye0.960.850.820.780.67

    0.40.50.60.70.80.40.50.60.70.8

    B0.50.50.50.50.25B0.51111

    ID0.750.630.250.1250ID0.671111

    nose0.720.720.720.720.43nose0.840.84111

    ear0.590.520.40.370.11ear0.890.930.920.911

    eye0.670.630.520.310.11eye0.720.95111

    Sheet1

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet2

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet3

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

  • ResultsDomain matcher (using UMLS)

    Chart9

    1110.880.95

    1110.880.95

    1110.880.95

    1110.920.96

    1110.920.96

    B

    ID

    nose

    ear

    eye

    threshold

    precision

    Sheet1

    0.40.50.60.70.80.40.50.60.70.8

    B0.070.120.310.671B11111

    ID0.070.140.330.710.67ID0.8750.8750.6250.6250.5

    nose0.150.260.8611nose1110.860.86

    ear0.180.280.550.760.92ear0.960.960.960.930.89

    eye0.20.30.670.880.95eye0.960.850.820.780.67

    0.40.50.60.70.80.40.50.60.70.8

    B0.50.50.50.50.25B0.51111

    ID0.750.630.250.1250ID0.671111

    nose0.720.720.720.720.43nose0.840.84111

    ear0.590.520.40.370.11ear0.890.930.920.911

    eye0.670.630.520.310.11eye0.720.95111

    0.40.50.60.70.80.40.50.60.70.8

    B11111B11111

    ID11111ID0.50.50.50.50.5

    nose11111nose1110.860.86

    ear0.880.880.880.920.92ear0.850.850.850.820.82

    eye0.950.950.950.960.96eye0.780.780.780.710.71

    Sheet1

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet2

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet3

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    threshold

    precision

    Chart10

    10.510.850.78

    10.510.850.78

    10.510.850.78

    10.50.860.820.71

    10.50.860.820.71

    B

    ID

    nose

    ear

    eye

    threshold

    recall

    Sheet1

    0.40.50.60.70.80.40.50.60.70.8

    B0.070.120.310.671B11111

    ID0.070.140.330.710.67ID0.8750.8750.6250.6250.5

    nose0.150.260.8611nose1110.860.86

    ear0.180.280.550.760.92ear0.960.960.960.930.89

    eye0.20.30.670.880.95eye0.960.850.820.780.67

    0.40.50.60.70.80.40.50.60.70.8

    B0.50.50.50.50.25B0.51111

    ID0.750.630.250.1250ID0.671111

    nose0.720.720.720.720.43nose0.840.84111

    ear0.590.520.40.370.11ear0.890.930.920.911

    eye0.670.630.520.310.11eye0.720.95111

    0.40.50.60.70.80.40.50.60.70.8

    B11111B11111

    ID11111ID0.50.50.50.50.5

    nose11111nose1110.860.86

    ear0.880.880.880.920.92ear0.850.850.850.820.82

    eye0.950.950.950.960.96eye0.780.780.780.710.71

    Sheet1

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet2

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    Sheet3

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    threshold

    precision

    00000

    00000

    00000

    00000

    00000

    B

    ID

    nose

    ear

    eye

    threshold

    recall

  • ResultsComparison of the matchers

    CS_TermWN CS_Dom CS_Learn

    Combinations of the different matchers

    combinations give often better resultsno significant difference on the quality of suggestions for different weight assignments in the combinations (but: did not check for large variations for the weights)

    Structural matcher did not find (many) new correct alignments(but: good results for systems biology schemas SBML PSI MI)

  • Evaluation of filteringMatcherTermWN

    ParametersQuality of suggestions: precision/recall Double threshold filtering using structure: Upper threshold: 0.8Lower threshold: 0.4, 0.5, 0.6, 0.7, 0.8

  • ResultsThe precision for double threshold filtering with upper threshold 0.8 and lower threshold T is higher than for threshold filtering with threshold T

    Chart8

    0.1920.51

    0.310.733

    0.6670.875

    0.8750.91

    TermWN

    filtered

    (lower) threshold

    precision

    eye

    Sheet1

    0.40.50.60.7

    TermWN0.0690.1140.3070.667

    filtered0.1380.2220.4441

    0.40.50.60.7

    TermWN0.0730.1430.3130.714

    filtered0.2220.4610.6670.667

    0.40.50.60.7

    TermWN0.1460.250.751

    filtered0.4670.58311

    0.40.50.60.7

    TermWN0.1680.2620.5530.765

    filtered0.5530.650.7650.896

    0.40.50.60.7

    TermWN0.1920....