Ontology mapping needs context & approximation
Frank van Harmelen, Vrije Universiteit Amsterdam

Slide 1
Ontology mapping needs context & approximation
Frank van Harmelen, Vrije Universiteit Amsterdam

Slide 2
Or: how to make ontology-mapping less like data-base integration, and more like a social conversation

Slide 3
Three obvious intuitions
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
- The Semantic Web needs ontology mapping

Slide 4
Which Semantic Web?
Version 1: "Semantic Web as Web of Data" (TBL)
- recipe: expose databases on the web, use RDF, integrate
- meta-data from: expressing DB schema semantics in machine-interpretable ways
- enables integration and unexpected re-use

Slide 5
Which Semantic Web?
Version 2: enrichment of the current Web
- recipe: annotate, classify, index
- meta-data from: automatically produced markup (named-entity recognition, concept extraction, tagging, etc.)
- enables personalisation, search, browsing, ...

Slide 6
Which Semantic Web?
- Version 1: Semantic Web as Web of Data (data-oriented)
- Version 2: enrichment of the current Web (user-oriented)
- different use-cases, different techniques, different users

Slide 7
Which Semantic Web?
- Version 1: Semantic Web as Web of Data: agreement between sources
- Version 2: enrichment of the current Web: agreement between source & user
- but both need ontologies for semantic agreement

Slide 8
Ontology research is almost done...
- we know what they are: consensual, formalised models of a domain
- we know how to make and maintain them (methods, tools, experience)
- we know how to deploy them (search, personalisation, data-integration, ...)
Main remaining open questions:
- automatic construction (learning)
- automatic mapping (integration)

Slide 9
Three obvious intuitions
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
- The Semantic Web needs ontology mapping
(example mapping question: Ph.D. student ?= young researcher, AIO ?= post-doc)

Slide 10
This work with Zharko Aleksovski & Michel Klein

Slide 11
Does context knowledge help mapping?

Slide 12
The general idea: anchor the source and the target vocabulary into a background-knowledge ontology, then derive the mapping between source and target by inference inside the background knowledge.
(diagram: source and target, each anchored into the background knowledge; mapping by inference)

Slide 13
A realistic example
- two Amsterdam hospitals (OLVG, AMC), two Intensive Care Units, different vocabularies
- want to compare quality of care
- OLVG-1400:
  - 1400 terms in a flat list
  - used in the first 24 hours of stay
  - some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)
  - some redundancy (spelling mistakes)
- AMC: a similar list, but from the other hospital

Slide 14
Context ontology used: DICE
- 2500 concepts (5000 terms), 4500 links
- formalised in DL
- five main categories:
  - tractus (e.g. nervous_system, respiratory_system)
  - aetiology (e.g. virus, poisoning)
  - abnormality (e.g. fracture, tumor)
  - action (e.g. biopsy, observation, removal)
  - anatomic_location (e.g. lungs, skin)
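The anchoring-and-inference scheme of slides 12-14 can be made concrete with a few lines of code. What follows is only a sketch under simplifying assumptions: the term lists and the tiny is-a relation are invented stand-ins for the OLVG/AMC lists and for DICE, and anchor() is just the substring-plus-trivial-morphology anchoring described later on slide 19.

```python
# Illustrative sketch only (not the original system): anchor two flat term
# lists into a small background ontology, then infer mappings by subsumption.

from itertools import product

# Hypothetical toy background knowledge: concept -> set of more general concepts.
BACKGROUND_ISA = {
    "duodenum perforation": {"gut perforation"},
    "diabetes mellitus type 2": {"diabetes mellitus"},
}

def anchor(term, concepts):
    """Anchoring = substring match plus trivial morphology (lower-case, strip plural 's')."""
    t = term.lower().rstrip("s")
    return {c for c in concepts if t in c or c in t}

def ancestors(concept, isa):
    """Transitive closure of the is-a links above one concept."""
    seen, todo = set(), [concept]
    while todo:
        for parent in isa.get(todo.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen

def infer_mappings(source_terms, target_terms, isa):
    concepts = set(isa) | {p for ps in isa.values() for p in ps}
    mappings = []
    for s, t in product(source_terms, target_terms):
        # Map s onto t if some anchor of s lies below (or equals) some anchor of t.
        if any(a == b or b in ancestors(a, isa)
               for a in anchor(s, concepts) for b in anchor(t, concepts)):
            mappings.append((s, t))
    return mappings

print(infer_mappings(["Duodenum perforation", "Diabetes Mellitus type 2"],
                     ["Gut perforation", "Diabetes Mellitus"],
                     BACKGROUND_ISA))
# -> [('Duodenum perforation', 'Gut perforation'),
#     ('Diabetes Mellitus type 2', 'Diabetes Mellitus')]
```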
Slide 15
Baseline: linguistic methods
- combine lexical analysis with hierarchical structure
- 313 suggested matches, around 70% correct
- 209 suggested matches, around 90% correct
- high precision, low recall (the easy cases)

Slide 16
Now use background knowledge: OLVG (1400 terms, flat) and AMC (1400 terms, flat) are both anchored into DICE (2500 concepts, 4500 links); the mapping follows by inference.

Slide 17
Example found with context knowledge (beyond lexical)

Slide 18
Example 2

Slide 19
Anchoring strength
Anchoring = substring + trivial morphology

  anchored on N aspects          OLVG         AMC
  N=5                               0           2
  N=4                               4         198
  N=3                             144         711
  N=2                             401         285
  N=1                               -         208
  total nr. of anchored terms     549 (39%)  1404 (96%)
  total nr. of anchorings        1298        5816

Slide 20
Results: example matchings discovered
- OLVG: Acute respiratory failure ↔ AMC: Asthma cardiale
- OLVG: Aspergillus fumigatus ↔ AMC: Aspergilloom
- OLVG: duodenum perforation ↔ AMC: Gut perforation
- OLVG: HIV ↔ AMC: AIDS
- OLVG: Aorta thoracalis dissectie type B ↔ AMC: Dissection of artery

Slide 21
Experimental results
- source & target = flat lists of 1400 ICU terms each
- background = DICE (2300 concepts in DL)
- manual Gold Standard (n=200)

Slide 22
Does more context knowledge help?

Slide 23
Adding more context
- only lexical
- DICE (2500 concepts)
- MeSH (22000 concepts)
- ICD-10 (11000 concepts)

Anchoring strength:

  anchored on     DICE   MeSH   ICD-10
  4 aspects          0      8        0
  3 aspects          0     89        0
  2 aspects        135    201        0
  1 aspect         413    694       80
  total            548    992       80

Slide 24
Results with multiple ontologies
- monotonic improvement
- independent of order
- linear increase of cost

  (each background used separately)   Lexical   ICD-10   DICE   MeSH
  Recall                                  64%      64%    76%    88%
  Precision                               95%      95%    94%    89%

Slide 25
Does structured context knowledge help?

Slide 26
Exploiting structure
- CRISP: 700 concepts, broader-than
- MeSH: 1475 concepts, broader-than
- FMA: 75,000 concepts, 160 relation-types (we used: is-a & part-of)
Setup: CRISP (738) and MeSH (1475) anchored into FMA (75,000); mapping by inference.

Slide 27
Using the structure or not?
(S ⊑a B) ∧ (B ⊑ B') ∧ (B' ⊑a T) → (S ⊑i T)
where ⊑a is an anchoring step, ⊑ is subsumption inside the background knowledge, and ⊑i is the inferred mapping.

Slide 28
Using the structure or not?
The middle step (B ⊑ B') of the rule above can be derived in different ways:
1. no use of structure
2. only stated is-a & part-of links
3. transitive chains of is-a, and transitive chains of part-of (separately)
4. transitive chains of is-a and part-of (mixed)
5. one chain of part-of before one chain of is-a
(a code sketch of these variants follows after the results on slide 31)

Slide 29
Examples

Slide 30
Examples

Slide 31
Matching results (CRISP to MeSH)

Recall (number of matches found, and increase over direct matching):

  Exp. 1: Direct                            1021       -
  Exp. 2: Indirect, is-a + part-of          1316     +29%
  Exp. 3: Indirect, separate closures       2730    +167%
  Exp. 4: Indirect, mixed closures          4143    +306%
  Exp. 5: Indirect, part-of before is-a     3167    +210%

Precision (manual Gold Standard, n=30):

  Exp. 1: Direct                            100%
  Exp. 4: Indirect, mixed closures           94%
  Exp. 5: Indirect, part-of before is-a     100%
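The variants of slide 28 differ only in how far is-a and part-of links may be chained before accepting the middle step (B ⊑ B'). The sketch below illustrates that difference on an invented two-relation fragment; the concept names and links are not taken from FMA, they only serve to make the closure variants concrete.

```python
# Illustrative sketch of the closure variants of Exp. 2-5; the two-relation
# graph below is an invented stand-in for the FMA fragment used in the talk.

ISA = {          # stated is-a links: concept -> more general concepts
    "left lung": {"lung"},
    "lung": {"respiratory organ"},
}
PART_OF = {      # stated part-of links: part -> wholes
    "lung": {"respiratory system"},
    "respiratory system": {"body"},
}

def closure(start, *relations):
    """All concepts reachable from start by chaining the given relations (freely mixed)."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for rel in relations:
            for nxt in rel.get(node, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return seen

def below(b, b_prime, variant):
    """Is (B below B') derivable under the given structural variant?"""
    if variant == "stated":              # Exp. 2: only stated is-a & part-of
        return b_prime in ISA.get(b, set()) | PART_OF.get(b, set())
    if variant == "separate":            # Exp. 3: a chain of is-a OR a chain of part-of
        return b_prime in closure(b, ISA) | closure(b, PART_OF)
    if variant == "mixed":               # Exp. 4: chains mixing is-a and part-of freely
        return b_prime in closure(b, ISA, PART_OF)
    if variant == "part-of-then-is-a":   # Exp. 5: one part-of chain, then one is-a chain
        return any(b_prime == w or b_prime in closure(w, ISA)
                   for w in closure(b, PART_OF) | {b})
    raise ValueError(variant)

# 'left lung' reaches 'body' only when a part-of step may follow an is-a step,
# which is exactly what distinguishes the mixed closure (Exp. 4) from Exp. 5:
for v in ("stated", "separate", "mixed", "part-of-then-is-a"):
    print(v, below("left lung", "body", v))
```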
Slide 32
Three obvious intuitions
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
- The Semantic Web needs ontology mapping
(young researcher ?? post-doc)

Slide 33
This work with Zharko Aleksovski, Risto Gligorov, Warner ten Kate

Slide 34
Approximating subsumptions (and hence mappings)
- query: A ⊑ B ?
- with B = B1 ⊓ B2 ⊓ B3
- so: do A ⊑ B1, A ⊑ B2, A ⊑ B3 hold?

Slide 35
Approximating subsumptions
Use Google distance to decide which subproblems are reasonable to focus on.
Normalized Google distance:
  NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log M - min(log f(x), log f(y)))
where
- f(x) is the number of Google hits for x
- f(x, y) is the number of Google hits for the tuple of search items x and y
- M is the number of web pages indexed by Google
It is:
- a symmetric conditional probability of co-occurrence
- an estimate of semantic distance
- an estimate of the contribution of each Bi to B1 ⊓ B2 ⊓ B3

Slide 36
Google distance (hidden slide)

Slide 37
Google distance
(figure: example terms such as animal, sheep, cow, mad cow, vegetarian, plant)

Slide 38
Google for sloppy matching
Algorithm for A ⊑ B (with B = B1 ⊓ B2 ⊓ B3):
- determine NGD(B, Bi) = di for i = 1, 2, 3
- incrementally increase the sloppiness threshold
- allow conjuncts A ⊑ Bi to be ignored, based on di and the current threshold
- match if the remaining A ⊑ Bj hold
(a code sketch of this procedure follows at the end of the transcript)

Slide 39
Properties of sloppy matching
- when the sloppiness threshold goes up, the set of matches grows monotonically
- threshold = 0: classical matching
- threshold = 1: trivial matching
- ideally, the di are such that desirable matches become true at a low threshold, and undesirable matches become true only at a high threshold
- use random selection of the Bi as a baseline

Slide 40
Experiments in the music domain
- seven music classification schemas: CDNow (Amazon.com), All Music Guide, MusicMoz, ArtistGigs, Artist Direct Network, CD Baby, Yahoo
- sizes: 96, 222, 382, 403, 465, 1073 and 2410 classes; depths: between 2 and 7 levels
- terms range from very sloppy to good

Slide 41
Experiment
(figure: precision/recall of classical, random and NGD-based sloppy matching; manual Gold Standard, N=50 random pairs)

Slide 42
Wrapping up

Slide 43
Three obvious intuitions
- Ontology mapping needs background knowledge
- Ontology mapping needs approximation
- The Semantic Web needs ontology mapping

Slide 44
So: shared context & approximation make ontology-mapping a bit more like a social conversation.

Slide 45
Future: distributed/P2P setting
(diagram as before: source and target anchored into background knowledge, mapping by inference, now in a distributed/peer-to-peer setting)

Slide 46
Questions & discussion
Frank.van.Harmelen@cs.vu.nl
http://www.cs.vu.nl/~frankh
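As a closing illustration of the approximation part (slides 34-39), here is a minimal sketch of NGD-weighted sloppy matching. Everything concrete in it is an assumption made for the example: the hit counts are a stubbed-in dictionary rather than real search-engine counts, the term names are invented, and the ignore-criterion (drop the conjuncts least related to B first, i.e. those with NGD(B, Bi) at least 1 - threshold) is one plausible reading of slide 38, not the talk's exact rule.

```python
# Illustrative sketch of sloppy matching weighted by Normalized Google Distance.
# Hit counts below are made up; a real implementation would query a search
# engine for f(x), f(x, y) and use (an estimate of) the index size M.

from math import log

M = 8e9                                   # assumed number of indexed pages
HITS = {                                  # hypothetical hit counts
    ("jazz",): 2_000_000,
    ("music",): 50_000_000,
    ("guitar",): 9_000_000,
    ("saxophone",): 1_200_000,
    ("jazz", "music"): 1_800_000,
    ("guitar", "jazz"): 100_000,
    ("jazz", "saxophone"): 500_000,
}

def f(*terms):
    return HITS[tuple(sorted(terms))]

def ngd(x, y):
    """NGD(x,y) = (max(log f(x), log f(y)) - log f(x,y)) / (log M - min(log f(x), log f(y)))."""
    fx, fy, fxy = log(f(x)), log(f(y)), log(f(x, y))
    return (max(fx, fy) - fxy) / (log(M) - min(fx, fy))

def sloppy_match(holds, b, conjuncts, threshold):
    """Accept A below B (B = B1 and ... and Bn) if every conjunct Bi either holds
    or may be ignored; assumption: Bi is ignorable when NGD(B, Bi) >= 1 - threshold,
    so the conjuncts least related to B are the cheapest to drop."""
    return all(holds[bi] or ngd(b, bi) >= 1.0 - threshold for bi in conjuncts)

# holds[Bi] records whether A below Bi could be established; in this made-up
# case the failing conjunct ('guitar') is also the one least related to 'jazz',
# so the match appears once the threshold is sloppy enough.
holds = {"music": True, "saxophone": True, "guitar": False}
for t in (0.0, 0.3, 0.5, 1.0):
    print(t, sloppy_match(holds, "jazz", ["music", "guitar", "saxophone"], t))
# -> 0.0 False / 0.3 False / 0.5 True / 1.0 True
```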