Ontology mapping needs context & approximation

Post on 14-Feb-2016

DESCRIPTION

Ontology mapping needs context & approximation. Frank van Harmelen, Vrije Universiteit Amsterdam. Or: how to make ontology-mapping less like data-base integration and more like a social conversation. Three obvious intuitions. Ontology mapping needs background knowledge.

TRANSCRIPT

<ul><li><p>Ontology mapping needs context &amp; approximation</p><p>Frank van Harmelen</p><p>Vrije Universiteit Amsterdam</p></li>
<li><p>Or: how to make ontology-mapping less like data-base integration and more like a social conversation</p></li>
<li><p>Two obvious intuitions</p><p>Ontology mapping needs background knowledge</p><p>Ontology mapping needs approximation</p><p>The Semantic Web needs ontology mapping</p></li>
<li><p>Which Semantic Web?</p><p>Version 1: "Semantic Web as Web of Data" (TBL)</p><p>recipe: expose databases on the web, use RDF, integrate</p><p>meta-data from: expressing DB schema semantics in machine interpretable ways</p><p>enable integration and unexpected re-use</p></li>
<li><p>Which Semantic Web?</p><p>Version 2: Enrichment of the current Web</p><p>recipe: annotate, classify, index</p><p>meta-data from: automatically producing markup (named-entity recognition, concept extraction, tagging, etc.)</p><p>enable personalisation, search, browse, ...</p></li>
<li><p>Which Semantic Web?</p><p>Version 1: Semantic Web as Web of Data (data-oriented)</p><p>Version 2: Enrichment of the current Web (user-oriented)</p><p>Different use-cases</p><p>Different techniques</p><p>Different users</p></li>
<li><p>Which Semantic Web?</p><p>Version 1: Semantic Web as Web of Data</p><p>Version 2: Enrichment of the current Web</p><p>But both need ontologies for semantic agreement (between source &amp; user, between sources)</p></li>
<li><p>Ontology research is almost done...</p><p>we know what they are: consensual, formalised models of a domain</p><p>we know how to make and maintain them (methods, tools, experience)</p><p>we know how to deploy them (search, personalisation, data-integration, ...)</p><p>Main remaining open questions</p><p>Automatic construction (learning)</p><p>Automatic mapping (integration)</p></li>
<li><p>Three obvious intuitions</p><p>Ontology mapping needs background knowledge</p><p>Ontology mapping needs approximation</p><p>The Semantic Web needs ontology mapping</p><p>Ph.D. student / young researcher ? =? AIO / post-doc</p></li>
<li><p>This work with Zharko Aleksovski &amp; Michel Klein</p></li>
<li><p>Does context knowledge help mapping?</p></li>
<li><p>The general idea</p><p>[Diagram labels: source, target, background knowledge, mapping, inference]</p></li>
<li><p>A realistic example</p><p>Two Amsterdam hospitals (OLVG, AMC)</p><p>Two Intensive Care Units, different vocabularies</p><p>Want to compare quality of care</p><p>OLVG-1400:</p><p>1400 terms in a flat list</p><p>used in the first 24 hours of stay</p><p>some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)</p><p>some redundancy (spelling mistakes)</p><p>AMC: similar list, but from a different hospital</p></li>
<li><p>Context ontology used</p><p>DICE:</p><p>2500 concepts (5000 terms), 4500 links</p><p>Formalised in DL</p><p>five main categories:</p><p>tractus (e.g. nervous_system, respiratory_system)</p><p>aetiology (e.g. virus, poisoning)</p><p>abnormality (e.g. fracture, tumor)</p><p>action (e.g. biopsy, observation, removal)</p><p>anatomic_location (e.g. lungs, skin)</p></li>
<li><p>Baseline: Linguistic methods</p><p>Combine lexical analysis with hierarchical structure</p><p>313 suggested matches, around 70% correct</p><p>209 suggested matches, around 90% correct</p><p>High precision, low recall (the easy cases)</p></li>
<li><p>Now use background knowledge</p><p>[Diagram labels: OLVG (1400, flat), AMC (1400, flat), DICE (2500 concepts, 4500 links), mapping, inference]</p></li>
<li><p>Example found with context knowledge (beyond lexical)</p></li>
<li><p>Example 2</p></li>
<li><p>Anchoring strength</p><p>Anchoring = substring + trivial morphology</p>
<table>
<tr><th>anchored on N aspects</th><th>OLVG</th><th>AMC</th></tr>
<tr><td>N=5</td><td>0</td><td>2</td></tr>
<tr><td>N=4</td><td>0</td><td>198</td></tr>
<tr><td>N=3</td><td>4</td><td>711</td></tr>
<tr><td>N=2</td><td>144</td><td>285</td></tr>
<tr><td>N=1</td><td>401</td><td>208</td></tr>
<tr><td>total nr. of anchored terms</td><td>549 (39%)</td><td>1404 (96%)</td></tr>
<tr><td>total nr. of anchorings</td><td>1298</td><td>5816</td></tr>
</table></li>
<li><p>Results</p><p>Example matchings discovered</p><p>OLVG: Acute respiratory failure / AMC: Asthma cardiale</p><p>OLVG: Aspergillus fumigatus / AMC: Aspergilloom</p><p>OLVG: duodenum perforation / AMC: Gut perforation</p><p>OLVG: HIV / AMC: AIDS</p><p>OLVG: Aorta thoracalis dissectie type B / AMC: Dissection of artery</p></li>
<li><p>Experimental results</p><p>Source &amp; target = flat lists of 1400 ICU terms each</p><p>Background = DICE (2300 concepts in DL)</p><p>Manual Gold Standard (n=200)</p></li>
<li><p>Does more context knowledge help?</p></li>
<li><p>Adding more context</p><p>Only lexical</p><p>DICE (2500 concepts)</p><p>MeSH (22000 concepts)</p><p>ICD-10 (11000 concepts)</p><p>Anchoring strength:</p>
<table>
<tr><th></th><th>DICE</th><th>MeSH</th><th>ICD10</th></tr>
<tr><td>4 aspects</td><td>0</td><td>8</td><td>0</td></tr>
<tr><td>3 aspects</td><td>0</td><td>89</td><td>0</td></tr>
<tr><td>2 aspects</td><td>135</td><td>201</td><td>0</td></tr>
<tr><td>1 aspect</td><td>413</td><td>694</td><td>80</td></tr>
<tr><td>total</td><td>548</td><td>992</td><td>80</td></tr>
</table></li>
<li><p>Results with multiple ontologies</p><p>Monotonic improvement</p><p>Independent of order</p><p>Linear increase of cost</p><p>Joint: [values not preserved in this transcript]</p><p>Separate:</p>
<table>
<tr><th></th><th>Recall</th><th>Precision</th></tr>
<tr><td>Lexical</td><td>64%</td><td>95%</td></tr>
<tr><td>ICD-10</td><td>64%</td><td>95%</td></tr>
<tr><td>DICE</td><td>76%</td><td>94%</td></tr>
<tr><td>MeSH</td><td>88%</td><td>89%</td></tr>
</table></li>
<li><p>Does structured context knowledge help?</p></li>
<li><p>Exploiting structure</p><p>CRISP: 700 concepts, broader-than</p><p>MeSH: 1475 concepts, broader-than</p><p>FMA: 75,000 concepts, 160 relation-types (we used: is-a &amp; part-of)</p></li>
<li>Using the structure or not ?(S </li>
<li><p>Examples</p></li>
<li><p>Matching results (CRISP to MeSH)</p><p>(Golden Standard n=30)</p>
<table>
<tr><th>Recall</th><th>=</th><th></th><th></th><th>total</th><th>incr.</th></tr>
<tr><td>Exp.1: Direct</td><td>448</td><td>417</td><td>156</td><td>1021</td><td>-</td></tr>
<tr><td>Exp.2: Indir. is-a + part-of</td><td>395</td><td>516</td><td>405</td><td>1316</td><td>29%</td></tr>
<tr><td>Exp.3: Indir. separate closures</td><td>395</td><td>933</td><td>1402</td><td>2730</td><td>167%</td></tr>
<tr><td>Exp.4: Indir. mixed closures</td><td>395</td><td>1511</td><td>2228</td><td>4143</td><td>306%</td></tr>
<tr><td>Exp.5: Indir. part-of before is-a</td><td>395</td><td>972</td><td>1800</td><td>3167</td><td>210%</td></tr>
</table>
<table>
<tr><th>Precision</th><th>=</th><th></th><th></th><th>total</th><th>correct</th></tr>
<tr><td>Exp.1: Direct</td><td>17</td><td>18</td><td>3</td><td>38</td><td>100%</td></tr>
<tr><td>Exp.4: Indir. mixed closures</td><td>14</td><td>39</td><td>59</td><td>112</td><td>94%</td></tr>
<tr><td>Exp.5: Indir. part-of before is-a</td><td>14</td><td>37</td><td>50</td><td>101</td><td>100%</td></tr>
</table></li>
<li><p>Three obvious intuitions</p><p>Ontology mapping needs background knowledge</p><p>Ontology mapping needs approximation</p><p>The Semantic Web needs ontology mapping</p><p>young researcher ? post-doc</p></li>
<li><p>This work with Zharko Aleksovski, Risto Gligorov, Warner ten Kate</p></li>
<li><p>Approximating subsumptions (and hence mappings)</p><p>query: A ⊑ B ?</p><p>B = B1 ⊓ B2 ⊓ B3</p><p>A ⊑ B1, A ⊑ B2, A ⊑ B3 ?</p></li>
<li><p>Approximating subsumptions</p><p>Use Google distance to decide which subproblems are reasonable to focus on</p><p>Google distance:</p><p>NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log M - min{log f(x), log f(y)})</p><p>where</p><p>f(x) is the number of Google hits for x</p><p>f(x, y) is the number of Google hits for the tuple of search items x and y</p><p>M is the number of web pages indexed by Google</p><p>symmetric conditional probability of co-occurrence</p><p>estimate of semantic distance</p><p>estimate of contribution to B1 ⊓ B2 ⊓ B3</p></li>
<li><p>Google distance</p><p>[Example terms: animal, sheep, cow, mad cow, vegetarian, plant]</p></li>
<li><p>Google for sloppy matching</p><p>Algorithm for A ⊑ B (B = B1 ⊓ B2 ⊓ B3)</p><p>determine NGD(B, Bi) for i = 1, 2, 3</p><p>incrementally:</p><p>increase the sloppiness threshold</p><p>allow A ⊑ Bi to be ignored when its value falls within the threshold</p><p>match if the remaining A ⊑ Bj hold</p></li>
<li><p>Properties of sloppy matching</p><p>When the sloppiness threshold goes up, the set of matches grows monotonically</p><p>threshold = 0: classical matching</p><p>threshold = 1: trivial matching</p><p>Ideally, compute the values for the Bi such that:</p><p>desirable matches become true at a low threshold</p><p>undesirable matches become true only at a high threshold</p><p>Use random selection of Bi as baseline?</p></li>
<li><p>Experiments in music domain</p><p>Size: 465 classes</p><p>Depth: 2 levels</p><p>Very sloppy terms: good</p></li>
<li><p>Experiment</p><p>[Plot: precision and recall for classical, random, and NGD-based matching]</p><p>Manual Gold Standard, N=50, random pairs</p></li>
<li><p>Wrapping up</p></li>
<li><p>Three obvious intuitions</p><p>Ontology mapping needs background knowledge</p><p>Ontology mapping needs approximation</p><p>The Semantic Web needs ontology mapping</p></li>
<li><p>So that shared context &amp; approximation make ontology-mapping a bit more like a social conversation</p></li>
<li><p>Future: Distributed/P2P setting</p><p>[Diagram labels: source, target, background knowledge, mapping, inference]</p></li>
<li><p>Questions &amp; discussion</p><p>Frank.van.Harmelen@cs.vu.nl</p><p>http://www.cs.vu.nl/~frankh</p><p>[Slide callouts: developed in daily practice, flat, sloppy; carefully designed, formalised, precise, academic work, hierarchically organised; notice: Dutch terms]</p></li></ul>
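<p>The Google-distance and sloppy-matching slides can be illustrated with a minimal sketch. It assumes the distance meant is the standard Normalized Google Distance, computed from the hit counts f(x), f(y), f(x, y) and the index size M that the slides define. The function names <code>ngd</code> and <code>sloppy_match</code>, the per-conjunct scores, and the rule "ignore a conjunct when its score is at most the threshold" are illustrative assumptions, chosen only so that threshold 0 gives classical matching and threshold 1 gives trivial matching as the slides state. This is not the authors' implementation.</p>

```python
import math


def ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """Normalized Google Distance from raw hit counts (hypothetical inputs).

    fx, fy -- hits for each term alone; fxy -- hits for both terms together;
    m -- number of pages in the index.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No co-occurrence evidence: treat the terms as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))


def sloppy_match(holds: dict, scores: dict, threshold: float) -> bool:
    """Decide A ⊑ B for B = B1 ⊓ B2 ⊓ ... while ignoring some conjuncts.

    holds[name]  -- whether the subproblem A ⊑ Bi was proven (assumed given)
    scores[name] -- score of conjunct Bi in [0, 1], e.g. derived from NGD
    Assumption: a conjunct is ignored when its score is <= threshold.
    """
    remaining = [name for name, score in scores.items() if score > threshold]
    # With nothing left to check, the match is trivially accepted.
    return all(holds[name] for name in remaining)


if __name__ == "__main__":
    holds = {"B1": True, "B2": True, "B3": False}
    scores = {"B1": 0.8, "B2": 0.6, "B3": 0.2}
    for t in (0.0, 0.3, 1.0):
        print(t, sloppy_match(holds, scores, t))
```

<p>Under these assumptions, raising the threshold can only shrink the set of conjuncts that must hold, so the set of accepted matches grows monotonically, which is the property the slides claim for sloppy matching.</p>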