Introduction to Biomedical Ontology
Barry Smith, University at Buffalo


Post on 19-Dec-2015


On June 22, 1799, in Paris, everything changed


International System of Units


Multiple kinds of data in multiple kinds of silos

Lab / pathology data

EHR data

Clinical trial data

Patient histories

Medical imaging

Microarray data

Model organism data

Flow cytometry

Mass spec

Genotype / SNP data


How to find data?

How to find other people’s data?

How to reason with data when you find it?

How to work out what data does not yet exist?


How to solve the problem of making the data we find queryable and reusable by others?

Part of the solution must involve: standardized terminologies and coding schemes


But there are multiple kinds of standardization for biomedical data, and they do not work well together

Terminologies (SNOMED, UMLS)

CDEs (Clinical research)

Information Exchange Standards (HL7 RIM)

LIMS (LOINC)

MGED standards for microarray data, etc.

top-down grid frameworks (caBIG)


most successful, thus far: UMLS (Unified Medical Language System)

collection of separate terminologies built by trained experts

massively useful for information retrieval and information integration

UMLS Metathesaurus: a system of post hoc mappings between overlapping source vocabularies developed according to different and sometimes conflicting standards


for UMLS:

local usage respected

regimentation frowned upon

cross-framework consistency not important

no concern to establish consistency with basic science

different grades of formal rigor, different degrees of completeness, different update policies, capricious policies for empirical testing


A good solution to the silo problem must be:

• modular

• incremental

• bottom-up

• evidence-based

• revisable

• incorporate a strategy for motivating potential developers and users


ontologies = standardized labels designed for use in annotations

to make the data cognitively accessible to human beings

and algorithmically accessible to computers


ontologies = high quality controlled structured vocabularies for the annotation (description) of data


Ramirez et al. Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology. Syst. Biol. 56(2):283–294, 2007


what cellular component?

what molecular function?

what biological process?

ontologies used in curation of literature


Ontologies

help integrate complex representations of reality

help human beings find things in complex representations of reality

help computers reason with complex representations of reality


The Gene Ontology


Ontologies facilitate grouping of annotations

brain 20
hindbrain 15
rhombomere 10

Query brain without ontology: 20
Query brain with ontology: 45

but they succeed in this only if there is one consensus ontology for each domain
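The arithmetic behind the two query results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parent links (rhombomere under hindbrain, hindbrain under brain) are inferred from the slide's example, the relation type is left unspecified, and the names `PARENT`, `DIRECT_COUNTS`, and the two query functions are hypothetical.

```python
from collections import defaultdict

# Assumed hierarchy (child -> parent), inferred from the slide's example.
PARENT = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Number of annotations attached directly to each term (from the slide).
DIRECT_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def descendants(term, parent=PARENT):
    """Return every term that sits below `term` in the hierarchy."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    found, stack = [], [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            found.append(child)
            stack.append(child)
    return found


def query_without_ontology(term):
    """Count only annotations that use the query term itself."""
    return DIRECT_COUNTS.get(term, 0)


def query_with_ontology(term):
    """Count annotations on the term and on all of its descendants."""
    terms = [term] + descendants(term)
    return sum(DIRECT_COUNTS.get(t, 0) for t in terms)


print(query_without_ontology("brain"))  # 20
print(query_with_ontology("brain"))     # 45
```

The grouping only works if every annotator draws "brain", "hindbrain", and "rhombomere" from the same hierarchy, which is the point of the consensus requirement.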


People are extending the GO methodology to other domains of biology and of clinical and translational medicine?


The standard engineering methodology

• It is easier to write useful software if one works with a simplified model

• (“…we can’t know what reality is like in any case; we only have our concepts…”)

• This looks like a useful model to me

• (One week goes by:) This other thing looks like a useful model to him

• Data in Pittsburgh does not interoperate with data in Vancouver

• Science is siloed


an analogue of the UMLS problem

proliferation of tiny ontologies by different groups with urgent annotation needs


the solution

establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement


First step (2001)

a shared portal for (so far) 58 ontologies (low regimentation)

http://obo.sourceforge.net NCBO BioPortal


OBO builds on the principles successfully implemented by the GO

recognizing that ontologies need to be developed in tandem


The methodology of cross-products

compound terms in ontologies to be defined as cross-products of simpler terms. E.g., elevated blood glucose is a cross-product of PATO: increased concentration with FMA: blood and ChEBI: glucose.

= factoring out of ontologies into discipline-specific modules (orthogonality)
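One way to picture a cross-product definition is as a compound label paired with the simpler terms it is built from, each tagged with its source ontology. The sketch below is a hypothetical data model, not the OBO Foundry's own representation: the class names `Term` and `CrossProduct` and their fields are assumptions, and no real term identifiers are used.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    ontology: str  # source ontology, e.g. "PATO"
    label: str     # human-readable term label


@dataclass(frozen=True)
class CrossProduct:
    label: str                   # the compound term being defined
    components: Tuple[Term, ...]  # the simpler terms it is composed of

    def modules(self):
        """Return the discipline-specific ontologies the definition draws on."""
        return sorted({t.ontology for t in self.components})

    def render(self):
        """Return a one-line textual form of the definition."""
        parts = " x ".join(f"{t.ontology}: {t.label}" for t in self.components)
        return f"{self.label} = {parts}"


# The example from the slide.
elevated_blood_glucose = CrossProduct(
    label="elevated blood glucose",
    components=(
        Term("PATO", "increased concentration"),
        Term("FMA", "blood"),
        Term("ChEBI", "glucose"),
    ),
)

print(elevated_blood_glucose.render())
print(elevated_blood_glucose.modules())
```

Because the compound term stores only references to its components, each component stays under the control of its own ontology, which is the factoring into modules that the slide describes.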


The methodology of cross-products

enforcing use of common relations in linking terms drawn from Foundry ontologies serves

• to ensure that the ontologies are maintained and revised in tandem

• to bind terms in different ontologies together, through logically defined relations, to create a network


Third step (2006)

The OBO Foundry

http://obofoundry.org/


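Building out from the original GO

Ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, divided into INDEPENDENT and DEPENDENT, and OCCURRENT)

ORGAN AND ORGANISM
- INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
- DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
- OCCURRENT: Biological Process (GO)

CELL AND CELLULAR COMPONENT
- INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
- DEPENDENT CONTINUANT: Cellular Function (GO)

MOLECULE
- INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
- DEPENDENT CONTINUANT: Molecular Function (GO)
- OCCURRENT: Molecular Process (GO)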


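initial OBO Foundry coverage

Ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, divided into INDEPENDENT and DEPENDENT, and OCCURRENT)

ORGAN AND ORGANISM
- INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
- DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
- OCCURRENT: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
- INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
- DEPENDENT CONTINUANT: Cellular Function (GO)
- OCCURRENT: Cellular Process (GO)

MOLECULE
- INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
- DEPENDENT CONTINUANT: Molecular Function (GO)
- OCCURRENT: Molecular Process (GO)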


CRITERIA

openness

common formal language

collaborative development

evidence-based maintenance

identifiers

versioning

textual and formal definitions


Orthogonality = modularity

• one ontology for each domain

• no need for mappings (which are in any case too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change)

• everyone knows where to look to find out how to annotate each kind of data
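The "one ontology for each domain" rule lends itself to a simple mechanical check. The sketch below is a hypothetical illustration, not an OBO Foundry tool: the `CLAIMS` list, the function names, and the entry `LocalCellList` are all invented for the example, while the GO, CL, and PaTO domain assignments follow the coverage table shown earlier in the deck.

```python
from collections import defaultdict

# (ontology, domain it claims to cover). "LocalCellList" is a made-up
# ontology added only to show what an overlap looks like.
CLAIMS = [
    ("GO", "molecular function"),
    ("CL", "cell"),
    ("PaTO", "phenotypic quality"),
    ("LocalCellList", "cell"),
]


def ontologies_by_domain(claims):
    """Group the claiming ontologies under each domain."""
    by_domain = defaultdict(list)
    for ontology, domain in claims:
        by_domain[domain].append(ontology)
    return dict(by_domain)


def orthogonality_violations(claims):
    """Return the domains claimed by more than one ontology."""
    grouped = ontologies_by_domain(claims)
    return {d: owners for d, owners in grouped.items() if len(owners) > 1}


def where_to_annotate(domain, claims):
    """Return the single ontology for a domain, or fail if there is not exactly one."""
    owners = ontologies_by_domain(claims).get(domain, [])
    if len(owners) != 1:
        raise LookupError(f"expected one ontology for {domain!r}, found {owners}")
    return owners[0]


print(orthogonality_violations(CLAIMS))
print(where_to_annotate("molecular function", CLAIMS))
```

When the violations dictionary is empty, every lookup in `where_to_annotate` has exactly one answer, and no mappings between competing ontologies are needed.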


CRITERIA

COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the Basic Formal Ontology (BFO)


OBO Foundry

provides guidelines (traffic laws) to new groups of ontology developers in ways which can counteract current dispersion of effort


Basic Formal Ontology

continuant
- independent continuant: cellular component
- dependent continuant: molecular function

occurrent: biological processes


BFO: The Very Top

continuant
- independent continuant
- dependent continuant: quality, function, role, disposition

occurrent
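The top of BFO as listed on this slide can be written down as a small parent table, which makes "is a" questions mechanical. This is a minimal sketch, assuming that quality, function, role, and disposition all sit directly under dependent continuant, as the slide layout suggests; the names `PARENT`, `ancestors`, and `is_a` are hypothetical helpers, not part of BFO.

```python
# child -> parent links for the categories named on the slide.
# "continuant" and "occurrent" are the two roots and have no parent here.
PARENT = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "quality": "dependent continuant",
    "function": "dependent continuant",
    "role": "dependent continuant",
    "disposition": "dependent continuant",
}


def ancestors(term):
    """Return the chain of parents from `term` up to its root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, ancestor):
    """True if `term` is `ancestor` or lies somewhere below it."""
    return term == ancestor or ancestor in ancestors(term)


print(ancestors("role"))               # ['dependent continuant', 'continuant']
print(is_a("function", "continuant"))  # True
print(is_a("role", "occurrent"))       # False
```

A domain ontology that hangs its own terms under one of these categories inherits the same checks, for instance that nothing classified as a function can also be a process.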


function

- of liver: to store glycogen

- of birth canal: to enable transport

- of eye: to see

- of mitochondrion: to produce ATP

not optional; reflection of physical makeup of bearer


role

optional: exists because the bearer is in some special natural, social, or institutional set of circumstances in which the bearer does not have to be


role

- bearers can have more than one role

person as student and staff member

- roles often form systems of mutual dependence

husband / wife

first in queue / last in queue

doctor / patient

host / pathogen


role

- of some chemical compound: to serve as analyte in an experiment

- of a dose of penicillin in this human child: to treat a disease

- of this bacterium in a primary host: to cause infection


A good solution to the silo problem must be:

• modular

• incremental

• bottom-up

• evidence-based

• revisable

• incorporate a strategy for motivating potential developers and users


Because the ontologies in the Foundry are built as orthogonal modules which form an incrementally evolving network

• scientists are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network

• users are motivated by the assurance that the ontologies they turn to are maintained by experts


More benefits of orthogonality

• helps those new to ontology to find what they need

• to find models of good practice

• ensures mutual consistency of ontologies (trivially)

• and thereby ensures additivity of annotations


More benefits of orthogonality

• it rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes

• thereby brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness
