
OpenGALEN

Scale and Context: Issues in Ontologies to link Health- and Bio-Informatics

Alan Rector, Jeremy Rogers, Angus Roberts, Chris Wroe

Bio and Health Informatics Forum/Medical Informatics Group

Department of Computer Science, University of Manchester

[email protected]

www.cs.man.ac.uk/mig   img.man.ac.uk   www.clinical-escience.org

www.opengalen.org

Organisation of Talk

• Informal presentation, motivation & examples

• Intro to logic based ontologies

• How to use logic based ontologies to represent scales and context
  – Making context modular: normalisation
  – Recurrent distinctions and tests for those distinctions

• Making logic based ontologies usable
  – Views and Intermediate Representations

• Summary

Example Problems of Context

• Classification by multiple axes
  – e.g. Molecular action, physiological, and pathological effects
    • Chloride transport & Cystic fibrosis

• Biological Scope
  – e.g. Normal/Abnormal, Human/Mouse

• Conceptual view
  – e.g. the Digital Anatomist Foundational Model of organs vs clinical convention
    • Is the pericardium a part of the heart?

Basic Approach

• Separate information into independent modules
  – Normalise the ontology
    • “The truth, the whole truth, and nothing but the truth”

• Add explicit contextual information
  – Don’t distort the structure
    • Add context to it explicitly

Why use Logic-based Ontologies?

Because

Knowledge is Fractal!

& Requirements are Diverse

Coherence without Uniformity!

Logic-based Ontologies: Conceptual Lego

[Figure: “Lego” building blocks labelled hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, lung, expression]

Logic-based Ontologies: Conceptual Lego

“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…”

“Hand which is anatomically normal”

Logic based ontologies

• A formalisation of semantic nets, frame systems, and object hierarchies via KL-ONE and KRL

• “is-kind-of” = “implies” (“logical subsumption”)
  – “Dog is a kind of wolf” means “All dogs are wolves”

• Modern examples: DAML+OIL (“OWL”?)

• Older variants: LOOM, CLASSIC, BACK, GRAIL, K-REP, …
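To make “is-kind-of = implies” concrete, here is a minimal Python sketch that reads each concept as the set of individuals it denotes in one toy interpretation, so that subsumption becomes set inclusion. The concept and individual names are invented for illustration; a real classifier proves the inclusion for every model of the axioms rather than checking one hand-built interpretation.

```python
from typing import Dict, Set

# HYPOTHETICAL toy interpretation: each concept denotes a set of individuals.
INTERPRETATION: Dict[str, Set[str]] = {
    "Dog": {"rex", "fido"},
    "Wolf": {"rex", "fido", "akela"},
    "Cat": {"tom"},
}


def is_kind_of(a: str, b: str, interp: Dict[str, Set[str]]) -> bool:
    """'a is a kind of b' holds here iff every instance of a is an instance of b."""
    return interp[a] <= interp[b]


print(is_kind_of("Dog", "Wolf", INTERPRETATION))  # True: all dogs are wolves
print(is_kind_of("Wolf", "Dog", INTERPRETATION))  # False: akela is a wolf but not a dog
```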

Logic Based Ontologies: The basics

[Figure, built up in five stages: Primitives, Descriptions, Definitions, Reasoning, Validating (constraining cross products). Primitives: Thing > Structure > Heart, MitralValve, Encrustation; Feature > pathological. Assertions: MitralValve * ALWAYS partOf: Heart; Encrustation * ALWAYS feature: pathological. Descriptions: “Thing + feature: pathological”; “Encrustation + involves: MitralValve”; “Structure + feature: pathological + involves: Heart”. Highlighted in red in the reasoning step: “+ (feature: pathological)” and “+ partOf: Heart”.]
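The reasoning step on this slide can be sketched as structural subsumption over descriptions. This minimal Python sketch uses the slide's names; the encoding of a description as a primitive base plus a set of (property, value) criteria, and the rule that involving a part also involves its wholes, are assumptions made so that the example reproduces the slide's inference. It is not a statement of how GRAIL or any OWL reasoner is implemented.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Primitive hierarchy from the slide: child -> parent.
PARENT: Dict[str, str] = {
    "Structure": "Thing",
    "Heart": "Structure",
    "MitralValve": "Structure",
    "Encrustation": "Structure",
}
# Necessary ("ALWAYS") statements from the slide.
ALWAYS_PART_OF: Dict[str, Set[str]] = {"MitralValve": {"Heart"}}
ALWAYS_FEATURE: Dict[str, Set[str]] = {"Encrustation": {"pathological"}}

Criterion = Tuple[str, str]                     # (property, value)
Description = Tuple[str, FrozenSet[Criterion]]  # (primitive base, criteria)


def ancestors(concept: str) -> Set[str]:
    """The concept plus all of its primitive ancestors."""
    result = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        result.add(concept)
    return result


def wholes(concept: str) -> Set[str]:
    """The concept plus everything it is (transitively) always part of."""
    result, stack = {concept}, [concept]
    while stack:
        for whole in ALWAYS_PART_OF.get(stack.pop(), set()):
            if whole not in result:
                result.add(whole)
                stack.append(whole)
    return result


def inferred_criteria(desc: Description) -> Set[Criterion]:
    base, criteria = desc
    inferred = set(criteria)
    for a in ancestors(base):  # inherit "ALWAYS feature" statements
        inferred |= {("feature", f) for f in ALWAYS_FEATURE.get(a, set())}
    for prop, value in criteria:  # ASSUMPTION: involving a part involves its wholes
        if prop == "involves":
            inferred |= {("involves", w) for w in wholes(value)}
    return inferred


def subsumes(general: Description, specific: Description) -> bool:
    g_base, g_criteria = general
    s_base, _ = specific
    return g_base in ancestors(s_base) and g_criteria <= inferred_criteria(specific)


encrustation_of_valve = ("Encrustation", frozenset({("involves", "MitralValve")}))
pathological_heart_structure = (
    "Structure", frozenset({("feature", "pathological"), ("involves", "Heart")}))
print(subsumes(pathological_heart_structure, encrustation_of_valve))  # True
print(subsumes(encrustation_of_valve, pathological_heart_structure))  # False
```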

Bridging Bio and Health Informatics

• Define concepts with ‘pieces’ from different scales and disciplines and then combine them
  – “Polymorphism which causes defect which causes disease”

• Use concepts which make context explicit
  – “ ‘Hand which is anatomically normal’ has five fingers”
  – “ ‘Normal human prostate’ has three lobes”

• Use different subproperties for different contexts
  – “Abnormalities of clinical parts of the heart”

Bridging Scales with Ontologies

[Figure: levels Genes/Species, Protein, Function and Disease, each linked to the next by a composed concept]

CFTRGene in humans

Protein coded by (CFTRgene & in humans)

Membrane transport mediated by (Protein coded by (CFTRgene in humans))

Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTRgene & in humans))))
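The nesting on this slide is plain composition: each level reuses the concept built at the level below. A minimal Python sketch, assuming a concept is represented as a base plus (property, filler) refinements; the property names inSpecies, codedBy, mediatedBy, causedBy and in are hypothetical stand-ins for the slide's English phrasing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    """A primitive base refined by (property, filler) criteria: one 'Lego' piece."""
    base: str
    criteria: Tuple[Tuple[str, Concept], ...] = ()

    def which(self, prop: str, filler: Concept) -> Concept:
        """Return a new, more specific concept; the original piece is unchanged."""
        return Concept(self.base, self.criteria + ((prop, filler),))

    def render(self) -> str:
        if not self.criteria:
            return self.base
        refinements = " & ".join(f"{p} ({f.render()})" for p, f in self.criteria)
        return f"{self.base} which {refinements}"


# HYPOTHETICAL property names; each line adds one level of scale.
gene = Concept("CFTRGene").which("inSpecies", Concept("Human"))
protein = Concept("Protein").which("codedBy", gene)
transport = Concept("MembraneTransport").which("mediatedBy", protein)
disease = Concept("Disease").which("causedBy", Concept("Abnormality").which("in", transport))

print(disease.render())
```

Because the pieces are immutable values, the same `protein` concept can be reused unchanged in any number of larger definitions.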

Use composition to express context

• Normal and abnormal
  – Hand isSubdivisionOf some UpperExtremity
  – Hand & AnatomicallyNormal hasSubdivision exactly 5 fingers

• Homologies and Orthologies
  – Thumb of Hand of Human hasFeature Opposable
  – Thumb of Hand of NonHumanPrimate ¬hasFeature Opposable
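A minimal Python sketch of why the context is composed into the concept: the five-finger constraint is attached to “Hand & AnatomicallyNormal”, so a hand that lacks that feature does not conflict with it. The encoding and the closed-world counting check are assumptions for illustration; an OWL reasoner works under the open-world assumption and would not report a missing count as a violation in this way.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# (base concept, context features, counted relation, exact cardinality).
# The constraint is attached to "Hand & AnatomicallyNormal", not to every Hand.
CONSTRAINTS: List[Tuple[str, FrozenSet[str], str, int]] = [
    ("Hand", frozenset({"AnatomicallyNormal"}), "hasSubdivision Finger", 5),
]


@dataclass
class Individual:
    kind: str
    features: FrozenSet[str]
    counts: Dict[str, int] = field(default_factory=dict)


def violations(ind: Individual) -> List[str]:
    """Closed-world check: report constraints whose context applies but whose count differs."""
    problems = []
    for base, context, relation, expected in CONSTRAINTS:
        found = ind.counts.get(relation, 0)
        if ind.kind == base and context <= ind.features and found != expected:
            problems.append(f"{relation}: expected exactly {expected}, found {found}")
    return problems


normal_hand = Individual("Hand", frozenset({"AnatomicallyNormal"}), {"hasSubdivision Finger": 5})
polydactylous_hand = Individual("Hand", frozenset(), {"hasSubdivision Finger": 6})
print(violations(normal_hand))         # []
print(violations(polydactylous_hand))  # []: the normal-hand constraint does not apply
```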

More detailed example

[Figure: a Body hierarchy (Body; Body & mammal; Body & mammal & male; Body & human & male; Body & mouse & male) linked to Prostate with cardinality annotations “some”, “=1”, “=5” (P1–P5) and “Prostate =3 Lobe” (L1–L3)]


Represent context and views by variant properties

[Figure: Organ (Heart, Pericardium) and OrganPart (CardiacValve); the property is_part_of with subproperties is_structurally_part_of and is_clinically_part_of; the concepts “Disease of part_of Heart” and “Disease of Pericardium”]
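A minimal Python sketch of the variant-property idea: a query such as “Disease of part_of Heart” can be asked with the structural or the clinical subproperty and get different answers. The facts are assumptions chosen to mirror the pericardium question raised earlier in the talk, and the lookup only follows asserted facts plus the subproperty hierarchy (it does not compute transitive parthood).

```python
from typing import Dict, List, Set, Tuple

# Subproperty -> superproperty, following the slide.
SUPER: Dict[str, str] = {
    "is_structurally_part_of": "is_part_of",
    "is_clinically_part_of": "is_part_of",
}
# ASSUMED facts: the pericardium counts as part of the heart only clinically.
FACTS: Set[Tuple[str, str, str]] = {
    ("CardiacValve", "is_structurally_part_of", "Heart"),
    ("CardiacValve", "is_clinically_part_of", "Heart"),
    ("Pericardium", "is_clinically_part_of", "Heart"),
}


def specialisations(prop: str) -> Set[str]:
    """The property plus all of its (transitive) subproperties."""
    result = {prop}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUPER.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def parts_of(whole: str, prop: str) -> List[str]:
    """Everything asserted to stand in `prop` (or a subproperty) to `whole`."""
    props = specialisations(prop)
    return sorted({x for x, p, y in FACTS if y == whole and p in props})


print(parts_of("Heart", "is_structurally_part_of"))  # ['CardiacValve']
print(parts_of("Heart", "is_clinically_part_of"))    # ['CardiacValve', 'Pericardium']
```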

What we want to avoid: combinatorial explosions

• The “Exploding Bicycle”: from “phrase book” to “dictionary + grammar”
  – 1980: ICD-9 (E826), 8 codes
  – 1990: READ-2 (T30..), 81 codes
  – 1995: READ-3, 87 codes
  – 1996: ICD-10 (V10-19, Australian), 587 codes

• V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
  – and meanwhile elsewhere in ICD-10:
    • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
    • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
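The arithmetic behind the “exploding bicycle” is a product versus a sum. In this minimal Python sketch the axes and their sizes are hypothetical, not taken from ICD or Read; it only shows why enumerating every combination grows multiplicatively while a dictionary of parts plus a grammar grows additively.

```python
from math import prod

# HYPOTHETICAL axes for transport-accident codes; the sizes are invented.
AXES = {
    "victim_vehicle": 9,
    "counterpart": 9,
    "victim_role": 6,
    "traffic_status": 2,
    "activity": 5,
}

# "Phrase book": one pre-coordinated code per combination of values.
phrase_book = prod(AXES.values())
# "Dictionary + grammar": one term per value, combined on demand.
dictionary = sum(AXES.values())
print(phrase_book, dictionary)  # 4860 31

# One more activity multiplies the phrase book but adds a single dictionary entry.
AXES["activity"] += 1
print(prod(AXES.values()), sum(AXES.values()))  # 5832 32
```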

The Cost 1: Normalising (untangling) Ontologies

[Figure: a tangled hierarchy mixing Structure, Function and Part-whole, separated into three independent trees: Structure, Function, Part-whole]

The Cost 1: Normalising (untangling) Ontologies

Making each meaning explicit and separate

[Figure: tangled hierarchy: PhysSubstance, Protein, ProteinHormone, Insulin, Enzyme, Steroid, SteroidHormone, Hormone, ProteinHormone^, Insulin^, SteroidHormone^, Catalyst, Enzyme^ (^ marks a concept repeated under a second parent)]

Hormone = Substance & playsRole-HormoneRole
ProteinHormone = Protein & playsRole-HormoneRole
SteroidHormone = Steroid & playsRole-HormoneRole
Catalyst = Substance & playsRole-CatalystRole
Insulin playsRole HormoneRole

…build it all by combining simple trees

Enzyme ?=? Protein & playsRole-CatalystRole

[Figure: the same hierarchy recomputed by the classifier, with the quoted names ‘ProteinHormone’, ‘SteroidHormone’, ‘Hormone’, ‘Catalyst’ and ‘Enzyme’ as defined concepts]

… ActionRole PhysiologicRole HormoneRole CatalystRole …

… Substance BodySubstance Protein Insulin Steroid …
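A minimal Python sketch of “build it all by combining simple trees”: only single-parent primitives and role assertions are stated, and the multiple classification of Insulin under both Hormone and ProteinHormone is computed. The simplified tree and the treatment of a definition as one base plus one required role are assumptions; Enzyme is left out because the slide itself marks its definition as open (“?=?”).

```python
from typing import Dict, List, Optional, Set, Tuple

# SIMPLIFIED primitive tree (the slide also has BodySubstance between
# Substance and Protein): every primitive has exactly one parent.
PRIMITIVE_PARENT: Dict[str, Optional[str]] = {
    "Substance": None,
    "Protein": "Substance",
    "Steroid": "Substance",
    "Insulin": "Protein",
}
# Defined concepts from the slide: name -> (primitive base, role played).
DEFINED: Dict[str, Tuple[str, str]] = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
}
# Asserted fact from the slide: Insulin playsRole HormoneRole.
PLAYS_ROLE: Dict[str, Set[str]] = {"Insulin": {"HormoneRole"}}


def primitive_ancestors(concept: str) -> List[str]:
    chain: List[str] = []
    current: Optional[str] = concept
    while current is not None:
        chain.append(current)
        current = PRIMITIVE_PARENT[current]
    return chain


def roles(concept: str) -> Set[str]:
    """Roles asserted for the concept or inherited from its primitive ancestors."""
    return set().union(*(PLAYS_ROLE.get(a, set()) for a in primitive_ancestors(concept)))


def classify(primitive: str) -> List[str]:
    """Defined concepts whose base is an ancestor and whose role is played."""
    ancestors = primitive_ancestors(primitive)
    return sorted(name for name, (base, role) in DEFINED.items()
                  if base in ancestors and role in roles(primitive))


def defined_subsumes(general: str, specific: str) -> bool:
    g_base, g_role = DEFINED[general]
    s_base, s_role = DEFINED[specific]
    return g_role == s_role and g_base in primitive_ancestors(s_base)


print(classify("Insulin"))                            # ['Hormone', 'ProteinHormone']
print(defined_subsumes("Hormone", "ProteinHormone"))  # True
```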

Normalisation: Building ontologies from orthogonal trees

• Each tree is homogeneous and based on subsumption
  – One principle: one of function, structure, cause, …

• Every primitive has exactly 1 primitive parent
  – All multiple classification done by the logic

• All self-standing primitives disjoint
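The second and third rules are mechanical enough to check. A minimal Python sketch, assuming a hierarchy given as a map from each primitive to its asserted primitive parents and disjointness given as a set of declared pairs; these input formats are illustrative, not taken from any particular tool.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def normalisation_problems(parents: Dict[str, List[str]],
                           disjoint: Set[FrozenSet[str]]) -> List[str]:
    """Check the single-parent and disjointness rules on a primitive hierarchy.

    ASSUMPTIONS: every primitive is treated as self-standing, and a root
    (a primitive with no parent) is allowed.
    """
    problems: List[str] = []
    children: Dict[str, List[str]] = defaultdict(list)
    for concept, ps in parents.items():
        if len(ps) > 1:
            problems.append(f"{concept} has {len(ps)} primitive parents: {sorted(ps)}")
        for p in ps:
            children[p].append(concept)
    # Sibling primitives must be declared pairwise disjoint.
    for parent, kids in children.items():
        for a, b in combinations(sorted(kids), 2):
            if frozenset((a, b)) not in disjoint:
                problems.append(f"{a} and {b} (under {parent}) are not declared disjoint")
    return problems


tangled = {"Substance": [], "Protein": ["Substance"], "Hormone": ["Substance"],
           "Insulin": ["Protein", "Hormone"]}
print(normalisation_problems(tangled, {frozenset({"Protein", "Hormone"})}))

normalised = {"Substance": [], "Protein": ["Substance"], "Steroid": ["Substance"],
              "Insulin": ["Protein"]}
print(normalisation_problems(normalised, {frozenset({"Protein", "Steroid"})}))  # []
```

In the normalised version Hormone is no longer a primitive at all; it becomes a defined concept that the classifier places.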

The Cost 2: Clean Distinctions & Tests

• Repeating patterns within levels
  – Structures vs Substances
  – Flavours of part-whole
  – Part-whole vs containment, connection, branching
  – Process/Event vs Thing (“Perdurant” vs “Endurant”)
  – …

• Repeating patterns across levels
  – Multiples at one level act as substances at the next
  – Substances span levels; structures are specific to a level

Repeating Patterns within each level

• Structures vs Substances (Discrete vs Mass)
  – Structures are made of substances
    • Organs are made of tissue
  – Parts & portions
    • Structures have parts & subdivisions, …
    • Substances have portions
  – Portions can have proportions & concentrations

Tests

• Structures (Discrete)
  – Can you count it? Is one part different from another? Is it made of something(s)?
    • Books, organs, ideas, individual cells, organisations, …

• Substance (Mass)
  – Are all bits the same? Can something be made of it? Can you talk about “A piece of it”? “A lump of it”? “A stream of it”? …
    • Water, sodium, tissue, blood, …

Repeating Patterns within each level

• Part-whole vs containment
  – Parthood is organisational
    • The wall is part of the cell
    • The cornea is part of the eye
  – Containment is physical
    • The inclusion is contained in the cell
    • The marrow is contained in the bone
  – Often occur together
    • The nucleus is part of and contained in the cell
    • The retina is part of and contained in the eye

Tests

• Parts
  – If I take the part away, is the whole incomplete?
  – If the part is damaged, is the whole damaged?
  – If I do something to the part, do I do something to the whole?

• Containment
  – Is the contained thing inside the container?
  – Is the relationship spatial/physical (or temporal)?

Repeating Patterns bridging levels

• Multiples of structures at one level behave as substances at the next
  – “Blood is made, in part, of a multiple of red cells”
  – “Tissue is made, in part, of a multiple of cells”
  – “A rash is a multiple of spots”
  – “Polyposis is a multiple of polyps”
  – “A flock is a multiple of birds”

• Multiples are not Sets
  – Not defined by members
    • Membership can change (intensional rather than extensional)
  – Action on the singleton is not action on the multiple; action on the whole is (usually) action on the singletons
    • If I treat a spot, I do not treat the rash
    • If I treat the rash, I treat the spots
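A minimal Python sketch of the multiple/singleton pattern, using the rash-and-spots example. The class names and the `treated` flag are hypothetical; the point is that the multiple has its own identity (so, unlike a set, it survives changes of membership) and that action propagates downwards only. The sketch ignores the “(usually)” exceptions and always propagates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Spot:
    """A singleton: countable, individually treatable."""
    name: str
    treated: bool = False


@dataclass(eq=False)  # identity, not current membership, decides which rash this is
class Rash:
    """A multiple: its membership can change without it becoming another rash."""
    name: str
    spots: List[Spot] = field(default_factory=list)
    treated: bool = False


def treat_spot(spot: Spot) -> None:
    # Action on a singleton is not action on the multiple.
    spot.treated = True


def treat_rash(rash: Rash) -> None:
    # Action on the multiple reaches the singletons it currently has.
    rash.treated = True
    for spot in rash.spots:
        spot.treated = True


a, b = Spot("a"), Spot("b")
rash = Rash("rash-1", [a, b])
treat_spot(a)
print(rash.treated, b.treated)  # False False
rash.spots.append(Spot("c"))    # membership changes; still the same rash
treat_rash(rash)
print(all(s.treated for s in rash.spots), rash.treated)  # True True
```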

Tests

• Multiples
  – Is there a name for the singleton (“grain”, “cell”, “bird”)?
  – Are the singletons countable?
  – Is the multiple measurable rather than countable?
  – Is it odd to say part-of (“This cell is part of the arm”)?

But make it simple

• Intermediate representations and views
  – OWL + Detailed Schema is the Assembler Language
    • FaCT/SHIQ/… is the machine code
    • Almost no one writes in assembler
      – let alone machine code

• Separate “terms” and “concepts”
  – Language/labels from concepts

Layered Architecture

[Figure: layers DL, Indexed KB (Frame Like) and Intermed Rep, alongside Language, Metadata, Provenance, Versioning, Links to Resources and Tools; tools: Protégé + “OilEd-II” + …?]

Example: An Intermediate Representation for Surgery

"Open fixation of a fracture of the neck of the left femur"

MAIN fixing
  ACTS_ON fracture
    HAS_LOCATION neck of long bone
      IS_PART_OF femur
        HAS_LATERALITY left
  HAS_APPROACH open
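Because the intermediate representation is a small nested structure, reading it mechanically is straightforward. A minimal Python sketch, assuming the IR is written with indentation marking which phrase each link modifies; the parser and the triple output are illustrative only, and expanding the result into the formal “assembler” form would need GALEN's own transformation rules, which are not shown here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

IR = """
MAIN fixing
  ACTS_ON fracture
    HAS_LOCATION neck of long bone
      IS_PART_OF femur
        HAS_LATERALITY left
  HAS_APPROACH open
"""


@dataclass
class Node:
    link: str   # e.g. ACTS_ON; MAIN for the root
    value: str  # e.g. "neck of long bone"
    children: List["Node"] = field(default_factory=list)


def parse_ir(text: str) -> Optional[Node]:
    """Build a tree in which each line modifies the nearest less-indented line."""
    root: Optional[Node] = None
    stack: List[Tuple[int, Node]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        link, _, value = line.strip().partition(" ")
        node = Node(link, value)
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            root = node
        stack.append((indent, node))
    return root


def triples(node: Node) -> Iterator[Tuple[str, str, str]]:
    """Yield (modified phrase, link, modifier) in depth-first order."""
    for child in node.children:
        yield (node.value, child.link, child.value)
        yield from triples(child)


tree = parse_ir(IR)
if tree is not None:
    for t in triples(tree):
        print(t)
```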

The formal “assembler” version

hasSpecificSubprocess (‘SurgicalAccessing’
    hasSurgicalOpenClosedness (SurgicalOpenClosedness which
        hasAbsoluteState surgicallyOpen))

(‘SurgicalProcess’ which isMainlyCharacterisedBy (performance which
    isEnactmentOf (‘SurgicalFixing’ which
        actsSpecificallyOn (PathologicalBodyStructure which <
            involves Bone
            hasUniqueAssociatedProcess FracturingProcess
            hasSpecificLocation (Collum which
                isSpecificSolidDivisionOf (Femur which
                    hasLeftRightSelector leftSelection))>))))

Result

• Training time: 3 months → 3 days + 3 days

• Productivity: 25/day → 100/day

• Central reconciliation: 50%+ → 10%

• Local cycle time: 3 months → <1 week

• “Dependencies”: High → Low

• Author satisfaction: Low → High

• Disputes: Frequent → Rare

• Repeatability: Low → High

Even Pre Web!

Navigation vs Retrieval/Reference: “Access terminology” & “Reference terminology”

• Access follows model of use
  – e.g. MeSH, MEDCIN
    • Hierarchy is what is needed next “to hand”
    • People find easy; software hard

• Retrieval follows model of meaning
  – Logic based ontologies
    • Hierarchy means “is-kind-of” / subsumption
    • People may find odd; software is easy

• Need both, & visualisations of both
  – The logic based structure isn’t enough
    • Views and intermediate representations

What’s in a View/Intermediate Representation?

[Figure: Explicit Context in Ontology “Assembler” → semantic transformations & filters → User Oriented Structures → linguistic generation & search → Language]

Summary: Let the logic engine do the work

• Logic based ontologies can bridge granularities & represent context explicitly
  – And manage the potential combinatorial explosions

• To do so
  – Views and Interface: usable, flexible & easy to learn
    • Entry, Navigation, & Use are different
  – Structure: explicit & modular, “Normalised”
  – Conception: clean testable distinctions
  – Tools & Architecture: layered & comprehensive

• The logic is the assembly language


Some Healthcare Terminologies

• ICD 9/10
  – Traditional paper thesauri
  – -CM versions essential for billing (and -AM)

• CPT: Current Procedural Terminology
  – “Simple” list

• Clinical Terms (Read Codes) V2
  – Simple hierarchy
  – Still dominant in UK general practice

• SNOMED-CT
  – At least “logic assisted”
  – Political questions…

• NCI Cancer Ontology
  – “Logic based in parts”: work in progress

Others

• Standards Related
  – LOINC: laboratory data
    • Increasingly structured; “logic assisted” aspirations
  – HL7 Vocabulary TC
    • Specialised vocabularies
    • Inspiration for OHT
    • Links to RxNorm
  – SNOMED DICOM Microglossary (SDM)
    • Image related information; not related to SNOMED

• Open Source
  – OpenGALEN Common Reference Model
    • Logic based, multilingual; a resource rather than a terminology
    • Basis of UK Drug Ontology
  – Open Health Terminology
    • Watch this space
    • Focusing on UMLS
    • Likely to be at least “logic assisted”

Special Purpose

• Anatomy
  – Digital Anatomist Foundational Model of Anatomy (FMA)
    • Principled frame based representation
      – Superb reference point for structural anatomy
        » Needs functional and clinical supplements
      – http://sig.biostr.washington.edu/projects/da/

• Drugs
  – RxNorm and VA projects
    • See Steve Brown & Stuart Nelson
  – UK Primary Care Drug Dictionary; UKCPRS (Secondary Care); Drug Ontology (OpenGALEN based)
  – MedDRA, FDA, Proprietary, …, …, …

Unified Medical Language System (UMLS)

• Common reference point and link to MeSH Terms and literature
  – De facto standard for universal identifiers
    • Concept Unique Identifiers (CUIs)
    • Lexical Unique Identifiers (LUIs)
    • String Unique Identifiers (SUIs)
  – Valuable in itself: huge resource for mining and restructuring
    • Udo Hahn and Stefan Schulz, “CoMMeT – Conceptual Model of Medical Terminology”
      – http://www.coling.uni-freiburg.de/pub/schulz/commet/

• Alexa McCray is speaking next
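The three UMLS identifier levels listed above nest: a concept (CUI) groups lexical variants (LUI), and each lexical variant groups exact strings (SUI). A minimal Python sketch of that containment, in which every identifier and the grouping of strings into terms are invented for illustration rather than taken from the UMLS Metathesaurus files.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# HYPOTHETICAL identifiers and strings, shaped like UMLS CUIs/LUIs/SUIs.


@dataclass
class Term:
    lui: str                                               # one normalised lexical form
    strings: Dict[str, str] = field(default_factory=dict)  # SUI -> exact string


@dataclass
class Concept:
    cui: str  # one meaning
    terms: List[Term] = field(default_factory=list)


cf = Concept("C9999001", [
    Term("L9999001", {"S9999001": "Cystic fibrosis", "S9999002": "cystic fibrosis"}),
    Term("L9999002", {"S9999003": "Mucoviscidosis"}),
])


def build_index(concepts: List[Concept]) -> Dict[str, str]:
    """Map every exact string to the CUI of the concept it names."""
    return {s: c.cui for c in concepts for t in c.terms for s in t.strings.values()}


index = build_index([cf])
print(index["Mucoviscidosis"], index["cystic fibrosis"])  # C9999001 C9999001
```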