Ontology mapping needs context & approximation

Download Ontology mapping  needs  context & approximation

Post on 14-Feb-2016

50 views

Category:

Documents

7 download

DESCRIPTION

Ontology mapping needs context & approximation. Frank van Harmelen Vrije Universiteit Amsterdam. Or: . How to make ontology-mapping less like data-base integration. and more like a social conversation. Three. Two obvious intuitions. Ontology mapping needs background knowledge. - PowerPoint PPT Presentation

TRANSCRIPT

  • Ontology mapping needs context & approximationFrank van HarmelenVrije Universiteit Amsterdam

  • Or: and more like a social conversationHow to make ontology-mapping less like data-base integration

  • Two obvious intuitionsOntology mapping needs background knowledgeOntology mapping needs approximationThe Semantic Web needs ontology mapping

  • Which Semantic Web?Version 1: "Semantic Web as Web of Data" (TBL) recipe: expose databases on the web, use RDF, integratemeta-data from:expressing DB schema semantics in machine interpretable waysenable integration and unexpected re-use

  • Which Semantic Web?Version 2: Enrichment of the current Web recipe: Annotate, classify, indexmeta-data from:automatically producing markup: named-entity recognition, concept extraction, tagging, etc.enable personalisation, search, browse,..

  • Which Semantic Web?Version 1: Semantic Web as Web of DataVersion 2: Enrichment of the current Web Different use-cases Different techniques Different usersdata-orienteduser-oriented

  • Which Semantic Web?Version 1: Semantic Web as Web of Databetween source & userbetween sources Version 2: Enrichment of the current Web But both need ontologies for semantic agreement

  • Ontology research is almost done..we know what they are consensual, formalised models of a domainwe know how to make and maintain them (methods, tools, experience)we know how to deploy them (search, personalisation, data-integration, )

    Main remaining open questionsAutomatic construction (learning)Automatic mapping (integration)

  • Three obvious intuitionsOntology mapping needs background knowledgeOntology mapping needs approximationThe Semantic Web needs ontology mappingPh.D. studentyoung researcher? =? AIOpost-doc

  • This work withZharko Aleksovski & Michel Klein

  • Does context knowledge help mapping?

  • The general ideasourcetargetbackground knowledgemappinginference

  • a realistic exampleTwo Amsterdam hospitals (OLVG, AMC)Two Intensive Care Units, different vocabsWant to compare quality of careOLVG-1400:1400 terms in a flat listused in the first 24 hour of staysome implicit hierarchy e.g.6 types of Diabetes Mellitus)some reduncy (spelling mistakes)AMC: similar list, but from different hospital

  • Context ontology usedDICE:2500 concepts (5000 terms), 4500 linksFormalised in DLfive main categories:tractus (e.g. nervous_system, respiratory_system)aetiology (e.g. virus, poising)abnormality (e.g. fracture, tumor)action (e.g. biopsy, observation, removal)anatomic_location (e.g. lungs, skin)

  • Baseline: Linguistic methodsCombine lexical analysis with hierarchical structure

    313 suggested matches, around 70 % correct209 suggested matches, around 90 % correct

    High precision, low recall (the easy cases)

  • Now use background knowledgeOLVG (1400, flat)AMC (1400, flat)DICE(2500 concepts, 4500 links)mappinginference

  • Example found with context knowledge (beyond lexical)

  • Example 2

  • Anchoring strengthAnchoring = substring + trivial morphology

    anchored on N aspectsOLVGAMCN=5N=4N=3N=2N=10041444012198711285208total nr. of anchored termstotal nr. of anchorings549129839%1404581696%

  • ResultsExample matchings discoveredOLVG:Acute respiratory failure AMC: Asthma cardialeOLVG:Aspergillus fumigatus AMC:AspergilloomOLVG:duodenum perforation AMC:Gut perforationOLVG:HIV AMC:AIDSOLVG:Aorta thoracalis dissectie type B AMC:Dissection of artery

  • Experimental resultsSource & target = flat lists of 1400 ICU terms eachBackground = DICE (2300 concepts in DL)Manual Gold Standard (n=200)

  • Does more context knowledge help?

  • Adding more context

    Only lexical DICE (2500 concepts) MeSH (22000 concepts) ICD-10 (11000 concepts)Anchoring strength:

    DICEMeSHICD104 aspects0803 aspects08902 aspects13520101 aspect41369480total54899280

  • Results with multiple ontologies

    Monotonic improvement Independent of orderLinear increase of cost

    Joint

    SeparateLexicalICD-10DICEMeSHRecallPrecision64%95%64%95%76%94%88%89%

  • does structured context knowledge help?

  • Exploiting structureCRISP: 700 concepts, broader-thanMeSH: 1475 concepts, broader-thanFMA: 75.000 concepts, 160 relation-types (we used: is-a & part-of)

  • Using the structure or not ?(S
  • Using the structure or not ?(S
  • Examples

  • Examples

  • Matching results (CRISP to MeSH)(Golden Standard n=30)

    Recall=totalincr.Exp.1:DirectExp.2:Indir. is-a + part-ofExp.3:Indir. separate closuresExp.4:Indir. mixed closuresExp.5:Indir. part-of before is-a448395395395395417516933151197215640514022228180010211316273041433167-29%167%306%210%

    Precision=totalcorrectExp.1:DirectExp.4:Indir. mixed closuresExp.5:Indir. part-of before is-a1714141839373595038112101100%94%100%

  • Three obvious intuitionsOntology mapping needs background knowledgeOntology mapping needs approximationThe Semantic Web needs ontology mappingyoung researcher? post-doc

  • This work withZharko Aleksovski Risto Gligorov Warner ten Kate

  • Approximating subsumptions(and hence mappings)query: A v B ?B = B1 u B2 u B3 A v B1, A v B2, A v B3 ? B1B3A

  • Approximating subsumptionsUse Google distance to decide which subproblems are reasonable to focus onGoogle distance

    wheref(x) is the number of Google hits for xf(x,y) is the number of Google hits for the tuple of search items x and yM is the number of web pages indexed by Google

    symmetric conditional probability of co-occurrence estimate of semantic distance estimate of contribution to B1 u B2 u B3

  • Google distanceHIDDEN

  • Google distanceanimalsheepcowmadcowvegeterianplant

  • Google for sloppy matchingAlgorithm for A v B (B=B1 u B2 u B3)

    determine NGD(B, Bi)=i, i=1,2,3incrementally:increase sloppyness threshold allow to ignore A v Bi with i match if remaining A v Bj hold

  • Properties of sloppy matchingWhen sloppyness threshold goes up, set of matches grows monotonically=0: classical matching=1: trivial matching

    Ideally: compute i such that:desirable matches become true at low undesirable matches become true only at high Use random selection of Bi as baseline?

  • Experiments in music domainSize: 465 classesDepth: 2 levelsvery sloppy terms good

  • Experiment7 16-05-2006 precisionrecallclassicalrandomNGDManual Gold Standard, N=50, random pairs20 =0.539760 =0.5

  • wrapping up

  • Three obvious intuitionsOntology mapping needs background knowledgeOntology mapping needs approximationThe Semantic Web needs ontology mapping

  • So that shared context & approximation make ontology-mapping a bit more like a social conversation

  • Future: Distributed/P2P settingsourcetargetbackground knowledgemappinginference

  • Vragen & discussieFrank.van.Harmelen@cs.vu.nlhttp://www.cs.vu.nl/~frankh

    developed in daily practiceflatsloppycarefully designed, formalised, preciseacademic workhierarchically organised- notice: Dutch terms