Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam.

Download Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam.

Post on 20-Dec-2015

212 views

Category:

Documents

0 download

TRANSCRIPT

  • Slide 1
  • Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam
  • Slide 2
  • 2 Or: and more like a social conversation How to make ontology-mapping less like data-base integration
  • Slide 3
  • 3 Two obvious intuitions Ontology mapping needs background knowledge Ontology mapping needs approximation The Semantic Web needs ontology mapping Three
  • Slide 4
  • 4 Which Semantic Web? Version 1: "Semantic Web as Web of Data" (TBL) recipe: expose databases on the web, use RDF, integrate meta-data from: l expressing DB schema semantics in machine interpretable ways enable integration and unexpected re-use
  • Slide 5
  • 5 Which Semantic Web? Version 2: Enrichment of the current Web recipe: Annotate, classify, index meta-data from: l automatically producing markup: named-entity recognition, concept extraction, tagging, etc. enable personalisation, search, browse,..
  • Slide 6
  • 6 Which Semantic Web? Version 1: Semantic Web as Web of Data Version 2: Enrichment of the current Web Different use-cases Different techniques Different users data-oriented user-oriented
  • Slide 7
  • 7 between source & user Which Semantic Web? Version 1: Semantic Web as Web of Data between sources Version 2: Enrichment of the current Web But both need ontologies for semantic agreement
  • Slide 8
  • 8 Ontology research is almost done.. we know what they are consensual, formalised models of a domain we know how to make and maintain them (methods, tools, experience) we know how to deploy them (search, personalisation, data-integration, ) Main remaining open questions Automatic construction (learning) Automatic mapping (integration)
  • Slide 9
  • 9 Three obvious intuitions Ontology mapping needs background knowledge Ontology mapping needs approximation The Semantic Web needs ontology mapping Ph.D. student young researcher ?=?= ?? AIO post-doc
  • Slide 10
  • This work with Zharko Aleksovski & Michel Klein
  • Slide 11
  • Does context knowledge help mapping?
  • Slide 12
  • 12 The general idea sourcetarget background knowledge anchoring mapping inference
  • Slide 13
  • 13 a realistic example Two Amsterdam hospitals (OLVG, AMC) Two Intensive Care Units, different vocabs Want to compare quality of care OLVG-1400: l 1400 terms in a flat list l used in the first 24 hour of stay l some implicit hierarchy e.g.6 types of Diabetes Mellitus) l some reduncy (spelling mistakes) AMC: similar list, but from different hospital
  • Slide 14
  • 14 Context ontology used DICE: l 2500 concepts (5000 terms), 4500 links l Formalised in DL l five main categories: tractus (e.g. nervous_system, respiratory_system) aetiology (e.g. virus, poising) abnormality (e.g. fracture, tumor) action (e.g. biopsy, observation, removal) anatomic_location (e.g. lungs, skin)
  • Slide 15
  • 15 Baseline: Linguistic methods Combine lexical analysis with hierarchical structure 313 suggested matches, around 70 % correct 209 suggested matches, around 90 % correct High precision, low recall (the easy cases)
  • Slide 16
  • 16 Now use background knowledge OLVG (1400, flat) AMC (1400, flat) DICE (2500 concepts, 4500 links) anchoring mapping inference
  • Slide 17
  • 17 Example found with context knowledge (beyond lexical)
  • Slide 18
  • 18 Example 2
  • Slide 19
  • 19 Anchoring strength Anchoring = substring + trivial morphology anchored on N aspectsOLVGAMC N=5 N=4 N=3 N=2 N=1 0 4 144 401 2 198 711 285 208 total nr. of anchored terms total nr. of anchorings 549 1298 39%1404 5816 96%
  • Slide 20
  • 20 Results Example matchings discovered l OLVG:Acute respiratory failure AMC: Asthma cardiale l OLVG:Aspergillus fumigatus AMC:Aspergilloom l OLVG:duodenum perforation AMC:Gut perforation l OLVG:HIV AMC:AIDS l OLVG:Aorta thoracalis dissectie type B AMC:Dissection of artery
  • Slide 21
  • 21 Experimental results Source & target = flat lists of 1400 ICU terms each Background = DICE (2300 concepts in DL) Manual Gold Standard (n=200)
  • Slide 22
  • Does more context knowledge help?
  • Slide 23
  • 23 Adding more context Only lexical DICE (2500 concepts) MeSH (22000 concepts) ICD-10 (11000 concepts) Anchoring strength: DICEMeSHICD10 4 aspects080 3 aspects0890 2 aspects1352010 1 aspect41369480 total54899280
  • Slide 24
  • 24 Results with multiple ontologies Monotonic improvement Independent of order Linear increase of cost SeparateLexicalICD-10DICEMeSH Recall Precision 64% 95% 64% 95% 76% 94% 88% 89% Joint
  • Slide 25
  • does structured context knowledge help?
  • Slide 26
  • 26 Exploiting structure CRISP: 700 concepts, broader-than MeSH: 1475 concepts, broader-than FMA: 75.000 concepts, 160 relation-types (we used: is-a & part-of ) CRISP (738) MeSH (1475) FMA (75.000) anchoring mapping inference
  • Slide 27
  • 27 (S < a B) & (B < B) & (B < a T) ! (S < i T) a a i Using the structure or not ?
  • Slide 28
  • 28 (S < a B) & (B < B) & (B < a T) ! (S < i T) No use of structure Only stated is-a & part-of Transitive chains of is-a, and transitive chains of part-of Transitive chains of is-a and part-of One chain of part-of before one chain of is-a Using the structure or not ?
  • Slide 29
  • 29 Examples
  • Slide 30
  • 30 Examples
  • Slide 31
  • 31 Matching results (CRISP to MeSH) Recall= totalincr. Exp.1:Direct Exp.2:Indir. is-a + part-of Exp.3:Indir. separate closures Exp.4:Indir. mixed closures Exp.5:Indir. part-of before is-a 448 395 417 516 933 1511 972 156 405 1402 2228 1800 1021 1316 2730 4143 3167 - 29% 167% 306% 210% Precision= totalcorrect Exp.1:Direct Exp.4:Indir. mixed closures Exp.5:Indir. part-of before is-a 17 14 18 39 37 3 59 50 38 112 101 100% 94% 100% (Golden Standard n=30)
  • Slide 32
  • 32 Three obvious intuitions Ontology mapping needs background knowledge Ontology mapping needs approximation The Semantic Web needs ontology mapping young researcher ?? post-doc
  • Slide 33
  • This work with Zharko Aleksovski Risto Gligorov Warner ten Kate
  • Slide 34
  • 34 B Approximating subsumptions (and hence mappings) query: A v B ? B = B 1 u B 2 u B 3 A v B 1, A v B 2, A v B 3 ? B1B1 B3B3 B2B2 A
  • Slide 35
  • 35 Approximating subsumptions Use Google distance to decide which subproblems are reasonable to focus on Google distance where f(x) is the number of Google hits for x f(x,y) is the number of Google hits for the tuple of search items x and y M is the number of web pages indexed by Google symmetric conditional probability of co-occurrence estimate of semantic distance estimate of contribution to B1 u B2 u B3 symmetric conditional probability of co-occurrence estimate of semantic distance estimate of contribution to B1 u B2 u B3
  • Slide 36
  • 36 Google distance HIDDEN
  • Slide 37
  • 37 Google distance animal sheepcow madcow vegeterian plant
  • Slide 38
  • 38 Google for sloppy matching Algorithm for A v B (B=B 1 u B 2 u B 3 ) determine NGD(B, B i )= i, i=1,2,3 incrementally: increase sloppyness threshold allow to ignore A v B i with i match if remaining A v B j hold
  • Slide 39
  • 39 Properties of sloppy matching When sloppyness threshold goes up, set of matches grows monotonically =0: classical matching =1: trivial matching Ideally: compute i such that: desirable matches become true at low undesirable matches become true only at high Use random selection of B i as baseline ?
  • Slide 40
  • 40 CDNow (Amazon.com) All Music Guide MusicMoz ArtistGigs Artist Direct Network CD baby Yahoo Size: 96 classes Depth: 2 levels Size: 2410 classes Depth: 5 levels Size: 382 classes Depth: 4 levels Size: 222 classes Depth: 2 levels Size: 1073 classes Depth: 7 levels Size: 465 classes Depth: 2 levels Size: 403 classes Depth: 3 levels Experiments in music domain very sloppy terms good
  • Slide 41
  • 7 16-05-2006 Experiment precision recall classical random NGD Manual Gold Standard, N=50, random pairs 20 =0.53 97 60 =0.5
  • Slide 42
  • wrapping up
  • Slide 43
  • 43 Three obvious intuitions Ontology mapping needs background knowledge Ontology mapping needs approximation The Semantic Web needs ontology mapping
  • Slide 44
  • 44 So that shared context & approximation make ontology-mapping a bit more like a social conversation
  • Slide 45
  • 45 Future: Distributed/P2P setting sourcetarget background knowledge anchoring mapping inference
  • Slide 46
  • 46 Vragen & discussie Frank.van.Harmelen@cs.vu.nl http://www.cs.vu.nl/~frankh

Recommended

View more >