
Ontology mapping: a way out of

the medical tower of Babel?

Frank van Harmelen, Vrije Universiteit Amsterdam

The Netherlands Antilles

Before we start… a talk on ontology mappings is a difficult talk to give: there is no consensus in the field
• on the merits of the different approaches
• on classifying the different approaches

no one can speak with authority on the solution

this is a personal view, with a sell-by date; other speakers will entirely disagree (or disapprove)

Good overviews of the topic:
• Knowledge Web D2.2.3: “State of the art on ontology alignment”
• Ontology Mapping Survey, talk by Siyamed Seyhmus SINIR
• ESWC'05 Tutorial on Schema and Ontology Matching by Pavel Shvaiko & Jerome Euzenat
• KER 2003 paper, Kalfoglou & Schorlemmer

These are all different & incompatible…

Ontology mapping: a way out of

the medical tower of Babel?

The Medical tower of Babel
• MeSH: Medical Subject Headings, National Library of Medicine; 22.000 descriptors
• EMTREE: commercial, Elsevier; drugs and diseases; 45.000 terms, 190.000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCI Cancer Ontology: 17.000 classes (about 1M definitions)

Ontology mapping: a way out of

the medical tower of Babel?

no shared understanding

Conceptual and terminological confusion

Actors: both humans and machines

Agree on a conceptualization

Make it explicit in some language.

[Diagram: world, concept, language]

What are ontologies & what are they used for?

Ontologies come in very different kinds.

From lightweight to heavyweight:
• Yahoo topic hierarchy
• Open Directory (400.000 general categories)
• Cyc, 300.000 axioms

From very specific to very general:
• METAR code (weather conditions at air terminals)
• SNOMED (medical concepts)
• Cyc (common sense knowledge)

What’s inside an ontology?

• terms + specialisation hierarchy
• classes + class hierarchy
• instances
• slots/values
• inheritance (multiple? defaults?)
• restrictions on slots (type, cardinality)
• properties of slots (symm., trans., …)
• relations between classes (disjoint, covers)
• reasoning tasks: classification, subsumption

(in order of increasing semantic “weight”)
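The heavier end of this list ends in reasoning tasks such as classification and subsumption. As a rough illustration only (not from the talk), the Python sketch below computes subsumption over nothing but a specialisation hierarchy with multiple inheritance allowed. The class names and the `IS_A` table are hypothetical, and real ontology reasoners also handle slot restrictions, slot properties and disjointness, which this ignores.

```python
from collections import deque

# Hypothetical toy hierarchy: each class maps to its direct superclasses.
# A class may list several parents (multiple inheritance).
IS_A = {
    "Aorta": ["Artery"],
    "Artery": ["BloodVessel"],
    "Vein": ["BloodVessel"],
    "BloodVessel": ["BodyPart"],
    "BodyPart": [],
}


def superclasses(cls, is_a=IS_A):
    """Return all direct and indirect superclasses of cls (excluding cls)."""
    seen = set()
    queue = deque(is_a.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def subsumes(general, specific, is_a=IS_A):
    """True if `general` subsumes `specific` (reflexive, transitive is-a)."""
    return general == specific or general in superclasses(specific, is_a)


print(subsumes("BloodVessel", "Aorta"))  # True
print(subsumes("Vein", "Aorta"))         # False
```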

In short (for the duration of this talk):

Ontologies are not definitive descriptions of what exists in the world (= philosophy).

Ontologies are models of the world, constructed to facilitate communication.

Yes, ontologies exist (because we build them).

Ontology mapping: a way out of

the medical tower of Babel?

Ontology mapping is old & inevitable

Ontology mapping is old:
• db schema integration
• federated databases

Ontology mapping is inevitable:
• ontology language is standardised,
• don't even try to standardise contents

Ontology mapping is important
• database integration, heterogeneous database retrieval (traditional)
• catalog matching (e-commerce)
• agent communication (theory only)
• web service integration (urgent)
• P2P information sharing (emerging)
• personalisation (emerging)

Ontology mapping is now urgent

Ontology mapping has acquired new urgency:
• physical and syntactic integration is ± solved (open world, web)
• automated mappings are now required (P2P)
• shift from off-line to run-time matching

Ontology mapping has new opportunities:
• larger volumes of data
• richer schemas (relational vs. ontology)
• applications where partial mappings work

Different aspects of ontology mapping
• how to discover a mapping
• how to represent a mapping
  – subset/equal/disjoint/overlap/is-somehow-related-to
  – logical/equational/category-theoretical
  – atomic/complex arguments, confidence measure
• how to use it

We only talk about “how to discover”.
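Although the talk focuses on discovery, one possible way to hold the “how to represent” dimension in code is a small record per correspondence. This is a hypothetical Python sketch, not a standard alignment format: the relation names come from the slide, while the `Correspondence` type, the concept identifiers and the [0, 1] confidence range are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # Relation types as listed on the slide.
    SUBSET = "subset"
    EQUAL = "equal"
    DISJOINT = "disjoint"
    OVERLAP = "overlap"
    RELATED = "is-somehow-related-to"


@dataclass(frozen=True)
class Correspondence:
    """One atomic mapping element between two concepts (hypothetical format)."""
    source: str          # concept id in ontology 1
    target: str          # concept id in ontology 2
    relation: Relation   # how source relates to target
    confidence: float    # assumed to lie in [0, 1]

    def __post_init__(self):
        # Reject confidence values outside the assumed range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def above_threshold(alignment, threshold):
    """Keep only correspondences whose confidence reaches the threshold."""
    return [c for c in alignment if c.confidence >= threshold]


alignment = [
    Correspondence("onto1:LungTumor", "onto2:Tumor", Relation.SUBSET, 0.9),
    Correspondence("onto1:HIV", "onto2:AIDS", Relation.RELATED, 0.6),
]
print([c.source for c in above_threshold(alignment, 0.7)])  # ['onto1:LungTumor']
```

Keeping a confidence on each atomic correspondence lets the using application decide how strict to be, which suits applications where partial mappings work.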

Many experimental systems (non-exhaustive!): Prompt (Stanford SMI), Anchor-Prompt (Stanford SMI), Chimaera (Stanford KSL), Rondo (Stanford U./U Leipzig), MoA (ETRI), Cupid (Microsoft Research), Glue (U of Washington), FCA-merge (U Karlsruhe), IF-Map, Artemis (U Milano), T-tree (INRIA Rhone-Alpes), S-MATCH (U Trento), Coma (U Leipzig), Buster (U Bremen), MULTIKAT (INRIA S.A.), ASCO (INRIA S.A.), OLA (INRIA R.A.), Dogma's Methodology, ArtGen (Stanford U.), Alimo (ITI-CERTH), Bibster (U Karlsruhe), QOM (U Karlsruhe), KILT (INRIA LORRAINE)

Different approaches to ontology matching

Linguistics & structure

Shared vocabulary

Instance-based matching

Shared background knowledge

Linguistic & structural mappings
• normalisation (case, blanks, digits, diacritics)
• lemmatization, N-grams, edit distance, Hamming distance
• distance = fraction of common parents
• elements are similar if their parents/children/siblings are similar

(in decreasing order of boredom)
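The linguistic and structural measures above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions: `normalise` covers only the normalisation steps listed on the slide, `edit_distance` is plain Levenshtein distance, trigrams are one arbitrary choice of N, and “fraction of common parents” is read here as a Jaccard ratio over parent sets (computed as a similarity; one minus it would give a distance).

```python
import unicodedata


def normalise(label: str) -> str:
    """Lower-case, strip diacritics and digits, collapse blanks."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_digits = "".join(ch for ch in no_marks if not ch.isdigit())
    return " ".join(no_digits.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def ngrams(s: str, n: int = 3) -> set:
    """Character n-grams of s, padded with one blank on each side."""
    s = f" {s} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def parent_overlap(parents_a: set, parents_b: set) -> float:
    """Structural similarity: fraction of parents the two elements share."""
    union = parents_a | parents_b
    return len(parents_a & parents_b) / len(union) if union else 0.0


print(normalise("Héroïne  Intoxicatie 2"))        # heroine intoxicatie
print(edit_distance("overdosis", "overdose"))     # 2
print(parent_overlap({"Artery", "BloodVessel"}, {"Vein", "BloodVessel"}))
```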

Different approaches to ontology matching

Linguistics & structure

Shared vocabulary

Instance-based matching

Shared background knowledge

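[Figure: a concept Q approximated from below and from above by concepts of the shared vocabulary]

$$\mathrm{Low}(Q) \subseteq Q \subseteq \mathrm{Up}(Q)$$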

Matching through shared vocabulary
• Used in mapping geospatial databases from German land-registration authorities (small)
• Used in mapping bio-medical and genetic thesauri (large)
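The bounds Low(Q) ⊆ Q ⊆ Up(Q) become concrete if every shared-vocabulary term is given an extension. The Python sketch below assumes exactly that (each shared term is a set of hypothetical “atoms”). It takes Low(Q) as the union of the shared terms known to lie below Q and Up(Q) as the intersection of those known to lie above it, then uses the bounds to prove a cross-ontology subsumption. All names and data are invented for illustration.

```python
from functools import reduce

# Hypothetical shared vocabulary. Each shared term is modelled extensionally as
# the set of elementary "atoms" it covers (a simplifying assumption).
SHARED = {
    "Water": {"lake", "river", "sea"},
    "StandingWater": {"lake", "sea"},
    "FlowingWater": {"river"},
    "Lake": {"lake"},
}
UNIVERSE = set().union(*SHARED.values())


def lower_bound(terms_below_q):
    """Low(Q): union of the shared terms known to be subsumed by Q."""
    return set().union(*(SHARED[t] for t in terms_below_q))


def upper_bound(terms_above_q):
    """Up(Q): intersection of the shared terms known to subsume Q.
    With no such terms, the bound is the whole universe."""
    return reduce(set.intersection, (SHARED[t] for t in terms_above_q), set(UNIVERSE))


def provably_subsumed(q1_terms_above, q2_terms_below):
    """Q1 ⊆ Q2 follows if Up(Q1) ⊆ Low(Q2), since Q1 ⊆ Up(Q1) and Low(Q2) ⊆ Q2."""
    return upper_bound(q1_terms_above) <= lower_bound(q2_terms_below)


# Q1 (ontology 1) is known to lie below StandingWater;
# Q2 (ontology 2) is known to lie above StandingWater.
print(provably_subsumed(["StandingWater"], ["StandingWater"]))  # True
# Knowing only that Q1 is some Water and that Q2 contains Lake proves nothing.
print(provably_subsumed(["Water"], ["Lake"]))                   # False
```

The test is sound but incomplete: when it fails, Q1 may still be subsumed by Q2, and the bounds are simply too loose to show it.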

Different approaches to ontology matching

Linguistics & structure

Shared vocabulary

Instance-based matching

Shared background knowledge

Matching through shared instances

Used by Ichise et al. (IJCAI’03) to successfully map parts of Yahoo to parts of Google
• Yahoo = 8402 classes, 45.000 instances
• Google = 8343 classes, 82.000 instances
• Only 6000 shared instances
• 70% - 80% accuracy obtained (!)

Conclusions from the authors:
• semantics is needed to improve on this ceiling
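A minimal sketch of instance-based matching, assuming both hierarchies identify shared instances the same way (for web directories, for example by URL). It scores class pairs by the Jaccard overlap of their instance sets and keeps the best partner above a threshold. Jaccard is a generic measure chosen for illustration, not the statistical method used by Ichise et al., and the directory data are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Shared instances divided by all instances of the two classes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_by_instances(classes_a: dict, classes_b: dict, threshold: float = 0.5):
    """For each class in A, return the B class with the highest instance
    overlap, provided that overlap reaches `threshold`."""
    matches = {}
    for name_a, inst_a in classes_a.items():
        best, score = None, 0.0
        for name_b, inst_b in classes_b.items():
            s = jaccard(inst_a, inst_b)
            if s > score:
                best, score = name_b, s
        if best is not None and score >= threshold:
            matches[name_a] = (best, score)
    return matches


# Hypothetical directories; instances are identified by shared ids (e.g. URLs).
dir_a = {"Arts/Music": {"u1", "u2", "u3"}, "Science/Biology": {"u4", "u5"}}
dir_b = {"Music": {"u1", "u2", "u6"}, "Biology": {"u4", "u5", "u7"}}

for name, (best, score) in match_by_instances(dir_a, dir_b).items():
    print(name, "->", best, round(score, 2))
# Arts/Music -> Music 0.5
# Science/Biology -> Biology 0.67
```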


Different approaches to ontology matching

Linguistics & structure

Shared vocabulary

Instance-based matching

Shared background knowledge

Matching using shared background knowledge

[Diagram: ontology 1 and ontology 2, each linked to shared background knowledge]

Ontology mapping using background knowledge: Case study 1

Work with Zharko Aleksovski @ Philips, Michel Klein @ VU, KIK @ AMC

Overview of test data

Two terminologies from the intensive care domain:
• OLVG list: list of reasons for ICU admission
• AMC list: list of reasons for ICU admission
• DICE hierarchy: additional hierarchical knowledge describing the reasons for ICU admission

OLVG list
• developed by a clinician
• 3000 reasons for ICU admission
• 1390 used in first 24 hours of stay
  – 3600 patients since 2000
• based on ICD9 + additional material

List of problems for patient admission
• Each reason for admission is described with one label
• Labels consist of 1.8 words on average
• redundancy because of spelling mistakes
• implicit hierarchy (e.g. many fractures)

AMC list
• List of 1460 problems for ICU admission
• Each problem is described using 5 aspects from the DICE terminology:
  – Abnormality (size: 85)
  – Action taken (size: 55)
  – Body system (size: 13)
  – Location (size: 1512)
  – Cause (size: 255)

DICE terminology: 2500 concepts (5000 terms), 4500 links; expressed in OWL, which allows for subsumption & part-of reasoning

Why map AMC list ↔ OLVG list?
• allow easy entering of OLVG data
• re-use of data in
  – epidemiology
  – quality of care assessment
  – data-mining (patient prognosis)

Linguistic mapping
• Compare each pair of concepts
• Use labels and synonyms of concepts
• Heuristic method to discover equivalence and subclass relations

[Figure: example labels (“tumor”, “brain”, “Long tumor”; Dutch “long” = lung) related by “more specific than”]

First round:
• compare with complete DICE
• 313 suggested matches, around 70 % correct

Second round:
• only compare with “reasons for admission” subtree
• 209 suggested matches, around 90 % correct

High precision, low recall (“the easy cases”)
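The slides do not spell out the heuristic, so the following Python sketch is a hypothetical stand-in. Two concepts count as equivalent when some pair of their labels or synonyms has the same set of words, and one counts as more specific when one of its labels adds words to a label of the other. It also shows why lexical rules of this kind give high precision but low recall: pairs such as HIV and AIDS share no words at all.

```python
def tokens(label: str) -> frozenset:
    """Word set of a label, lower-cased (hypothetical tokenisation)."""
    return frozenset(label.lower().split())


def compare_concepts(labels_a, labels_b):
    """Compare two concepts, each given by its label plus synonyms.
    Returns the relation of A to B, or None if no lexical evidence."""
    ta = [tokens(l) for l in labels_a]
    tb = [tokens(l) for l in labels_b]
    if any(x == y for x in ta for y in tb):
        return "equivalent"
    if any(y < x for x in ta for y in tb):   # A's label adds words to B's
        return "more specific than"
    if any(x < y for x in ta for y in tb):   # B's label adds words to A's
        return "more general than"
    return None


print(compare_concepts(["lung tumor", "long tumor"], ["tumor"]))  # more specific than
print(compare_concepts(["HIV"], ["AIDS"]))                        # None
```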

Using background knowledge
• Use properties of concepts
• Use other ontologies to discover relations between properties

[Figure: two concepts described by their property values, with the relation between them unknown (?)]

DICE aspect taxonomies
• Action taxonomy
• Abnormality taxonomy
• Body system taxonomy
• Location taxonomy
• Cause taxonomy

Semantic match

[Figure: OLVG problem list and DICE problem list; the DICE aspect taxonomies are given, the matches between the two lists are unknown (???)]

Implicit matching: property match

Example: “Aorta thoracalis dissection” – “Dissection of artery”
• Lexical match: “Aorta thoracalis dissection” has location Aorta
• Lexical match: “Dissection of artery” has location Artery
• Location match (taxonomy of body parts: Blood vessel is more general than Artery and Vein; Artery is more general than Aorta): “Dissection of artery” has a more general location
• Reasoning: this implies that “Aorta thoracalis dissection” is more specific than “Dissection of artery” (semantic match)
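The reasoning step in this example can be sketched as a rule over aspect descriptions: problem A is at least as specific as problem B if, for every aspect that B specifies, A's value lies at or below B's value in that aspect's taxonomy. The Python below encodes only the body-part fragment from the slide; the dictionary format, the “Dissection” abnormality value and the rule itself are illustrative assumptions, not the reasoning procedure used in the case study.

```python
# Per-aspect taxonomies as child -> parent links (single inheritance assumed).
# The location links follow the body-part taxonomy on the slide.
TAXONOMIES = {
    "location": {"Aorta": "Artery", "Artery": "Blood vessel", "Vein": "Blood vessel"},
    "abnormality": {},
}


def is_subsumed(specific, general, parent_of):
    """Walk up the child -> parent links from `specific` looking for `general`."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = parent_of.get(node)
    return False


def problem_subsumed(desc_a, desc_b, taxonomies=TAXONOMIES):
    """A is more specific than (or equal to) B if every aspect value of B
    subsumes A's value for the same aspect."""
    return all(
        aspect in desc_a and is_subsumed(desc_a[aspect], value, taxonomies.get(aspect, {}))
        for aspect, value in desc_b.items()
    )


aorta_dissection = {"location": "Aorta", "abnormality": "Dissection"}
artery_dissection = {"location": "Artery", "abnormality": "Dissection"}

print(problem_subsumed(aorta_dissection, artery_dissection))  # True
print(problem_subsumed(artery_dissection, aorta_dissection))  # False
```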

Example: “Heroin intoxication” – “Drugs overdose”
• Lexical match (cause): “Heroin intoxication” has cause Heroine; “Drugs overdosis” has cause Drugs
• Lexical match (abnormality): “Heroin intoxication” has abnormality Intoxicatie; “Drugs overdosis” has abnormality Overdosis
• Cause match (cause taxonomy: Drugs is more general than Heroine): “Heroin intoxication” has a more specific cause
• Abnormality match (abnormality taxonomy, relating Intoxicatie and Overdosis): has more general abnormality

Example results
• OLVG: Acute respiratory failure – DICE: Asthma cardiale
• OLVG: Aspergillus fumigatus – DICE: Aspergilloom
• OLVG: duodenum perforation – DICE: Gut perforation
• OLVG: HIV – DICE: AIDS
• OLVG: Aorta thoracalis dissectie type B – DICE: Dissection of artery

Aspects involved: cause; abnormality, cause; cause; location, abnormality; abnormality

Ontology mapping using background knowledge: Case study 2

Work with Heiner Stuckenschmidt @ VU

Case study:
1. Map GALEN & Tambis, using UMLS as background knowledge
2. Select three topics with sufficient overlap:
   • Substances
   • Structures
   • Processes
3. Define some partial & ad-hoc manual mappings between individual concepts
4. Represent mappings in C-OWL
5. Use semantics of C-OWL to verify and complete mappings

[Diagram: GALEN (medical ontology) and Tambis (genetic ontology), each linked to UMLS (medical terminology) by a lexical mapping; verification & derivation via UMLS yields a derived mapping between GALEN and Tambis]

Ad hoc mappings: Substances (UMLS – GALEN)

Notice: mappings high and low in the hierarchy, few in the middle

Ad hoc mappings: Substances (UMLS – Tambis)

Notice the different grain size: UMLS coarse, Tambis fine

Verification of mappings

[Figure: UMLS:Chemicals, UMLS:Chemicals_viewed_structurally, UMLS:Chemicals_viewed_functionally and UMLS:enzyme alongside Tambis:Chemical and Tambis:enzyme; two mappings are marked “=” and one relation is marked “?”]

Deriving new mappings

[Figure: UMLS:substance, Galen:ChemicalSubstance, UMLS:Phenomenon_or_process, UMLS:Chemicals and UMLS:OrganicChemical, with one mapping marked “=”]
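The verification and derivation steps can be imitated, very loosely, with plain hierarchy reachability. The Python sketch below is not C-OWL and does not implement its semantics. It assumes equivalence mappings from two ontologies into a background hierarchy, derives ⊑/≡/⊒ correspondences between the two ontologies by comparing the mapped concepts, and flags local is-a links that the background does not confirm. The `bg:`, `o1:` and `o2:` names and their hierarchy are invented and are not taken from UMLS, GALEN or Tambis.

```python
from collections import deque


def ancestors(c, parents):
    """All direct and indirect parents of c in a child -> parents table."""
    seen, queue = set(), deque(parents.get(c, []))
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(parents.get(p, []))
    return seen


def subsumed(a, b, parents):
    """True if a ⊑ b in the hierarchy (reflexive)."""
    return a == b or b in ancestors(a, parents)


# Hypothetical background hierarchy (child -> parents).
BG = {
    "bg:Chemical": ["bg:Substance"],
    "bg:OrganicChemical": ["bg:Chemical"],
    "bg:Enzyme": ["bg:Chemical"],
}
# Hypothetical manual equivalence mappings into the background.
O1_TO_BG = {"o1:ChemicalSubstance": "bg:Chemical", "o1:Enzyme": "bg:Enzyme"}
O2_TO_BG = {"o2:Substance": "bg:Substance", "o2:OrganicMolecule": "bg:OrganicChemical"}


def derive_mappings(m1, m2, bg):
    """Derive o1/o2 correspondences by comparing mapped background concepts."""
    derived = []
    for c1, b1 in m1.items():
        for c2, b2 in m2.items():
            if b1 == b2:
                derived.append((c1, "≡", c2))
            elif subsumed(b1, b2, bg):
                derived.append((c1, "⊑", c2))
            elif subsumed(b2, b1, bg):
                derived.append((c1, "⊒", c2))
    return derived


def verify(local_parents, m, bg):
    """Flag local is-a links whose mapped images are not subsumed in bg."""
    problems = []
    for child, parents in local_parents.items():
        for parent in parents:
            if child in m and parent in m and not subsumed(m[child], m[parent], bg):
                problems.append((child, parent))
    return problems


for triple in derive_mappings(O1_TO_BG, O2_TO_BG, BG):
    print(*triple)
print(verify({"o1:ChemicalSubstance": ["o1:Enzyme"]}, O1_TO_BG, BG))
```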

Ontology mapping: a way out of

the medical tower of Babel?

“Conclusions”
• Ontology mapping is (still) hard & open
• Many different approaches will be required:
  – linguistic
  – structural
  – statistical
  – semantic
  – …
• Currently no roadmap theory on what's good for which problems

Challenges
• roadmap theory
• run-time matching
• “good-enough” matches
• large-scale evaluation methodology
• hybrid matchers (needs roadmap theory)

Ontology mapping: a way out of

the medical tower of Babel?

?