ontology mapping - out of the babel tower
Post on 20-Jan-2015
Description: Keynote at the AI in Medicine Conference (AIME 2005), giving an overview of the work in Ontology Mapping to people in Medical Informatics (which includes explaining the what and why of ontologies in general).
1. Ontology mapping: a way out of the medical tower of Babel?
  Frank van Harmelen, Vrije Universiteit Amsterdam, The Netherlands Antilles
2. Before we start
  - a talk on ontology mappings is a difficult talk to give:
  - no consensus in the field on the merits of the different approaches
  - no consensus in the field on classifying the different approaches
  - no one can speak with authority on the solution
  - this is a personal view, with a sell-by date
  - other speakers will entirely disagree (or disapprove)

3. Good overviews of the topic
  - Knowledge Web D2.2.3: State of the art on ontology alignment
  - Ontology Mapping Survey, talk by Siyamed Seyhmus SINIR
  - ESWC'05 Tutorial on Schema and Ontology Matching, by Pavel Shvaiko and Jerome Euzenat
  - KER 2003 paper, Kalfoglou & Schorlemmer
  - These are all different & incompatible

4. Ontology mapping: a way out of the medical tower of Babel?

5. The Medical tower of Babel
  - MeSH: Medical Subject Headings, National Library of Medicine, 22,000 descriptions
  - EMTREE: commercial (Elsevier), drugs and diseases, 45,000 terms, 190,000 synonyms
  - UMLS: integrates 100 different vocabularies
  - SNOMED: 200,000 concepts, College of American Pathologists
  - Gene Ontology: 15,000 terms in molecular biology
  - NCI Cancer Ontology: 17,000 classes (about 1M definitions)

6. Ontology mapping: a way out of the medical tower of Babel?

7. What are ontologies & what are they used for
  - [Diagram: world, concept, language]
  - No shared understanding: conceptual and terminological confusion
  - Agree on a conceptualization
  - Make it explicit in some language
  - Actors: both humans and machines

8. Ontologies come in very different kinds
  - From lightweight to heavyweight:
    - Yahoo topic hierarchy
    - Open directory (400,000 general categories)
    - Cyc, 300,000 axioms
  - From very specific to very general:
    - METAR code (weather conditions at air terminals)
    - SNOMED (medical concepts)
    - Cyc (common sense knowledge)

9. What's inside an ontology?
  - terms + specialisation hierarchy
  - classes + class-hierarchy
  - instances
  - slots/values
  - inheritance (multiple? defaults?)
  - restrictions on slots (type, cardinality)
  - properties of slots (symm., trans., ...)
  - relations between classes (disjoint, covers)
  - reasoning tasks: classification, subsumption
  (the items of this slide are listed in order of increasing semantic weight)

10. In short (for the duration of this talk)
  - Ontologies are not definitive descriptions of what exists in the world (= philosophy)
  - Ontologies are models of the world, constructed to facilitate communication
  - Yes, ontologies exist (because we build them)

11. Ontology mapping: a way out of the medical tower of Babel?

12. Ontology mapping is old & inevitable
  - Ontology mapping is old:
    - db schema integration
    - federated databases
  - Ontology mapping is inevitable:
    - the ontology language is standardised
    - don't even try to standardise contents

13. Ontology mapping is important
  - database integration, heterogeneous database retrieval (traditional)
  - catalog matching (e-commerce)
  - agent communication (theory only)
  - web service integration (urgent)
  - P2P information sharing (emerging)
  - personalisation (emerging)

14. Ontology mapping is now urgent
  - Ontology mapping has acquired new urgency:
    - physical and syntactic integration is solved (open world, web)
    - automated mappings are now required (P2P)
    - shift from off-line to run-time matching
  - Ontology mapping has new opportunities:
    - larger volumes of data
    - richer schemas (relational vs. ontology)
    - applications where partial mappings work

15. Different aspects of ontology mapping
  - how to discover a mapping
  - how to represent a mapping:
    - subset/equal/disjoint/overlap/is-somehow-related-to
    - logical/equational/category-theoretical
    - atomic/complex arguments, confidence measure
  - how to use it
  We only talk about how to discover.

16. Many experimental systems (non-exhaustive!)
  - Prompt (Stanford SMI)
  - Coma (ULeipzig)
  - Anchor-Prompt (Stanford SMI)
  - Buster (UBremen)
  - Chimaera (Stanford KSL)
  - MULTIKAT (INRIA S.A.)
  - Rondo (Stanford U./ULeipzig)
  - ASCO (INRIA S.A.)
  - MoA (ETRI)
  - OLA (INRIA R.A.)
  - Cupid (Microsoft research)
  - Dogma's Methodology
  - Glue (U of Washington)
  - ArtGen (Stanford U.)
  - FCA-merge (UKarlsruhe)
  - Alimo (ITI-CERTH)
  - IF-Map
  - Bibster (UKarlsruhe)
  - Artemis (UMilano)
  - QOM (UKarlsruhe)
  - T-tree (INRIA Rhone-Alpes)
  - KILT (INRIA Lorraine)
  - S-MATCH (UTrento)

17. Different approaches to ontology matching
  - Linguistics & structure
  - Shared vocabulary
  - Instance-based matching
  - Shared background knowledge

18. Linguistic & structural mappings
  - normalisation (case, blanks, digits, diacritics)
  - lemmatization, N-grams, edit-distance, Hamming distance
  - distance = fraction of common parents
  - elements are similar if their parents/children/siblings are similar
  (listed in decreasing order of boredom)

19. Different approaches to ontology matching
  - Linguistics & structure
  - Shared vocabulary
  - Instance-based matching
  - Shared background knowledge

20. Matching through shared vocabulary
  [Figure: concept Q between its lower and upper approximations, Low(Q) and Up(Q)]

21. Matching through shared vocabulary
  - Used in mapping geospatial databases from German land-registration authorities (small)
  - Used in mapping bio-medical and genetic thesauri (large)

22. Different approaches to ontology matching
  - Linguistics & structure
  - Shared vocabulary
  - Instance-based matching
  - Shared background knowledge

23. Matching through shared instances

24. Matching through shared instances
  - Used by Ichise et al. (IJCAI'03) to successfully map parts of Yahoo to parts of Google
  - Yahoo = 8,402 classes, 45,000 instances
  - Google = 8,343 classes, 82,000 instances
  - Only 6,000 shared instances
  - 70% - 80% accuracy obtained (!)
  - Conclusion from the authors: semantics is needed to improve on this ceiling

25. Different approaches to ontology matching
  - Linguistics & structure
  - Shared vocabulary
  - Instance-based matching
  - Shared background knowledge

26. Matching using shared background knowledge
  [Diagram: shared background knowledge, linked to ontology 1 and ontology 2]

27. Ontology mapping using background knowledge: Case study 1
  Work with Zharko Aleksovski @ Philips, Michel Klein @ VU, KIK @ AMC

28. Overview of test data
  Two terminologies from the intensive care domain:
  - OLVG list: list of reasons for ICU admission
  - AMC list: list of reasons for ICU admission
  - DICE hierarchy: additional hierarchical knowledge describing the reasons for ICU admission

29. OLVG list
  - developed by clinician
  - 3,000 reasons for ICU admission
  - 1,390 used in first 24 hours of stay
  - 3,600 patients since 2000
  - based on ICD9 + additional material
  - List of problems for patient admission
  - Each reason for admission is described with one label
  - Labels consist of 1.8 words on average
  - redundancy because of spelling mistakes
  - implicit hierarchy (e.g. many fractures)

30. AMC list
  - List of 1,460 problems for ICU admission
  - Each problem is described using 5 aspects from the DICE terminology (2,500 concepts (5,000 terms), 4,500 links):
    - Abnormality (size: 85)
    - Action taken (size: 55)
    - Body system (size: 13)
    - Location (size: 1512)
    - Cause (size: 255)
  - expressed in OWL
  - allows for subsumption & part-of reasoning

31. Why mapping AMC list <-> OLVG list?
  - allow easy entering of OLVG data
  - re-use of data in epidemiology
  - quality of care assessment
  - data-mining (patient prognosis)

32. Linguistic mapping
  - Compare each pair of concepts
  - Use labels and synonyms of concepts
  - Heuristic method to discover equivalence and subclass relations
    - Example: "Long brain tumor" is more specific than "Long tumor"
  - First round: compare with complete DICE
    - 313 suggested matches, around 70% correct
  - Second round: only compare with the "reasons for admission" subtree
    - 209 suggested matches, around 90% correct
  - High precision, low recall ("the easy cases")

33. Using background knowledge
  - Use properties of concepts
  - Use other ontologies to discover relations between properties

34. Semantic match
  [Diagram. Labels: OLVG problem list; DICE problem list; DICE aspect taxonomies (Abnormality, Action, Body system, Location, Cause); lexical match; given; implicit matching: property match; semantic match]

35. Semantic match
  - Taxonomy of body parts: Blood vessel is more general than Vein and Artery; Artery is more general than Aorta
  - Lexical match: "Aorta thoracalis dissection" has location Aorta
  - Lexical match: "Dissection of artery" has location Artery
  - Reasoning: Artery is more general than Aorta, which implies a location match: "Dissection of artery" has a more general location than "Aorta thoracalis dissection"

36. Example: Heroin intoxication / Drugs overdose
  - Cause taxonomy: Drugs is more general than Heroine
  - Lexical match (cause): "Heroin intoxication" has cause Heroine; "Drugs overdosis" has cause Drugs
  - Cause match: "Heroin intoxication" has the more specific cause
  - Abnormality taxonomy: Intoxicatie is more general than Overdosis
  - Lexical match (abnormality): "Heroin intoxication" has abnormality Intoxicatie; "Drugs overdosis" has abnormality Overdosis
  - Abnormality match: "Heroin intoxication" has the more general abnormality

37. Example results (aspect labels as they appear on the slide)
  - OLVG: Acute respiratory failure | DICE: Asthma cardiale | abnormality
  - OLVG: Aspergillus fumigatus | DICE: Aspergilloom | cause
  - OLVG: duodenum perforation | DICE: Gut perforation | abnormality, cause
  - OLVG: HIV | DICE: AIDS | cause
  - OLVG: Aorta thoracalis dissectie type B | DICE: Dissection of artery | location, abnormality

38. Extension: approximate matching
  - Terms are not precisely defined
  - Terms are not precisely used
  - Exact reasoning will not be useful
  [Figure: classes A and B, and the question of how they relate]

39. Approximate matching
  - Translate every class-name into a propositional formula (both DNF and CNF versions)
  - $A \sqsubseteq B \;=\; \bigl(\bigvee_i A_i\bigr) \rightarrow \bigl(\bigwedge_k B_k\bigr) \;=\; \bigwedge_{i,k} (A_i \rightarrow B_k)$
  - ignore an increasing number of (i,k)-subsumption pairs
  - the result varies from classical to trivial

40. Results (obtained on different domain)
  [Plot: counts from 0 to 600,000 on the vertical axis against values from 0 to 1.0 on the horizontal axis; series: B subClass of A, A subClass of B, equivalences]

41. Ontology mapping using background knowledge: Case study 2
  Work with Heiner Stuckenschmidt @ VU

42. Case Study: Map GALEN & Tambis, using UMLS as background knowledge
  - Select three topics with sufficient overlap:
    - Substances
    - Structures
    - Processes
  - Define some partial & ad-hoc manual mappings between individual concepts
  - Represent mappings in C-OWL
  - Use semantics of C-OWL to verify and complete mappings

43. Case Study: verification & derivation
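
Editorial sketch for slide 35 (semantic match via a background taxonomy). The Python below is only a rough illustration of the two steps named on that slide: a lexical match that anchors each label to a taxonomy concept, followed by reasoning over "is more general" links. The taxonomy fragment, the function names (`normalise`, `anchors`, `ancestors`, `location_relation`) and the simple token-based anchoring are all assumptions made for illustration; they are not taken from the talk or from the system it describes.

```python
import re
import unicodedata

# Hypothetical fragment of a body-part taxonomy (child -> parent),
# modelled on the example of slide 35.
PARENT = {
    "aorta": "artery",
    "artery": "blood vessel",
    "vein": "blood vessel",
}


def normalise(label):
    """Lower-case a label, strip diacritics and split it into word tokens."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.findall(r"[a-z]+", text.lower())


def anchors(label, taxonomy=PARENT):
    """Lexical match: taxonomy concepts whose name occurs in the label."""
    concepts = set(taxonomy) | set(taxonomy.values())
    phrase = " " + " ".join(normalise(label)) + " "
    return {c for c in concepts if f" {c} " in phrase}


def ancestors(concept, taxonomy=PARENT):
    """All strictly more general concepts, following child -> parent links."""
    found = []
    while concept in taxonomy:
        concept = taxonomy[concept]
        found.append(concept)
    return found


def location_relation(source_label, target_label):
    """Relate two labels through the taxonomy positions of their anchors."""
    for s in sorted(anchors(source_label)):
        for t in sorted(anchors(target_label)):
            if s == t:
                return "same location"
            if t in ancestors(s):
                return "target has more general location"
            if s in ancestors(t):
                return "target has more specific location"
    return "no location match"


print(location_relation("Aorta thoracalis dissection", "Dissection of artery"))
# prints: target has more general location
```

A real matcher would need lemmatisation, synonyms and one taxonomy per DICE aspect; the sketch only shows how a lexical anchor plus a subsumption lookup yields a relation that neither label states explicitly.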