envo: the environment ontology (presentation at the genomics standards consortium meeting),...


Page 1: ENVO: The Environment Ontology (Presentation at the Genomics Standards Consortium Meeting), September 2011

An Introduction to Ontology as a Strategy for Data Integration

Barry Smith

Page 2

The problem

• legacy idiosyncrasies in handling data

complicated progressively by
• changes in available hardware and software
• turnover of personnel and of collaborations
• explosion of data
• need to get funding (inhibits reuse)

Page 3

The result: balkanization

• systems are poorly integrated
• deliver redundant capabilities
• foster error and waste
• prevent comparison and aggregation
• prevent secondary use of data
• lower ROI on software

Page 4

The proposed solution

• vocabulary and meanings change more slowly than hardware and software (and scientific theory*)

• semantic interoperability has high initial cost (governance, commitment) but considerable long-term value

*atom, electron, cell, bacteria, organism …

Page 5

How to do it right?

• how to create an incremental, evolutionary process, where what is good survives and what is bad fails

• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

Page 6

Uses of ‘ontology’ in PubMed abstracts

Page 7

By far the most successful: GO (Gene Ontology)

Page 8

GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data

• multi-species, multi-disciplinary, open source

• contributing to the cumulativity of scientific results obtained by distinct research communities

• compare use of kilograms, meters, seconds in formulating experimental results

• natural language and logical definitions for all terms to support consistent human application and computational exploitation

Page 9

What is the key to GO’s success?

• multi-species, multi-disciplinary, open source

• clear rules for ontology development and maintenance

• over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
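The idea of an annotation, a link from a gene product in some database to a term in the GO, can be sketched as a simple record. This is a minimal illustration only: the `Annotation` class, the `index_by_term` helper and the gene product identifiers are hypothetical placeholders, not the GO consortium's file format or data. It shows how a shared term lets entries from different databases be aggregated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # identifier in a source database (placeholder values below)
    go_term: str       # identifier of a term in the controlled vocabulary


# Illustrative records only; the gene product identifiers are made up.
ANNOTATIONS = [
    Annotation("UniProt:EXAMPLE1", "GO:0006915"),
    Annotation("UniProt:EXAMPLE2", "GO:0006915"),
    Annotation("Ensembl:EXAMPLE3", "GO:0003677"),
    Annotation("Ensembl:EXAMPLE4", "GO:0006915"),
]


def index_by_term(annotations):
    """Group gene products by the term they are annotated with."""
    index = defaultdict(set)
    for annotation in annotations:
        index[annotation.go_term].add(annotation.gene_product)
    return index


index = index_by_term(ANNOTATIONS)
# Gene products from two different databases meet under one shared term.
print(sorted(index["GO:0006915"]))
```

Because every record points at the same vocabulary, results from distinct communities can be compared without any mapping between local naming schemes.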

Page 10

Extending GO’s success to other fields

Open Biological and Biomedical Ontologies (OBO) Foundry

• Best practice principles
• Governance
• Review process
• Two-tier membership

http://obofoundry.org

Page 11

http://ontology.buffalo.edu/smith

Page 12

Columns, by RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows, by GRANULARITY:

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)

Page 13

Columns, by RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows, by GRANULARITY:

COMPLEX OF ORGANISMS
• independent continuant: Family, Community, Deme, Population
• dependent continuant: Population Phenotype
• occurrent: Population Process

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

Population-level ontologies

Page 14

Columns, by RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows, by GRANULARITY:

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

Environment Ontology: environments

Page 15

The Environment Ontology

Barry Smith
http://ontology.buffalo.edu/smith

Page 16

The Spatial-Structural Niche
A Hole Story

Page 17
Page 18

Places are holes

Page 19

Page 20
Page 21

Page 22

Page 23

DIGESTIVE SYSTEM

the interior of your gut: an environment for more than 10^13 microorganisms

Page 24

Positive and negative parts

• positive part (made of matter)
• negative part, or hole (not made of matter)

Page 25

A site

intuitively: a spatial entity that can contain a material entity

Page 26

A spatial environment

is a site that

1. contains a medium (air, water)

2. can contain an organism or a population of organisms

Some sites are supported and demarcated by some solid object
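The two conditions above can be read as a simple membership test. The following is a minimal sketch under stated assumptions: the `Site` class, its field names and the example values (including the sealed vacuum chamber) are hypothetical and are not part of EnvO itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """A spatial entity that can contain a material entity (hypothetical model)."""
    name: str
    medium: Optional[str] = None               # e.g. "air" or "water"; None if no medium
    can_contain_organism: bool = False         # organism or population of organisms
    demarcating_object: Optional[str] = None   # solid object supporting the site, if any


def is_spatial_environment(site: Site) -> bool:
    """Condition 1: the site contains a medium.
    Condition 2: it can contain an organism or a population of organisms."""
    return site.medium is not None and site.can_contain_organism


# Example values are assumptions chosen for illustration.
rabbit_hole = Site("rabbit hole", medium="air", can_contain_organism=True,
                   demarcating_object="surrounding soil")
vacuum_chamber = Site("sealed vacuum chamber", medium=None,
                      can_contain_organism=False,
                      demarcating_object="steel wall")

for site in (rabbit_hole, vacuum_chamber):
    print(site.name, is_spatial_environment(site))
```

The demarcating object is optional in the model because only some sites are supported and demarcated by a solid object.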

Page 27

Stationary Sites

1: your office when the door is closed; a closed mouth

2: a rabbit hole; an open mouth

3: the surface of a leaf

4: the Klingon Empire

Page 28

Mobile Sites

1: a womb; a spaceship

2: a snail’s shell; a …

3: the home range of a migrating herd of buffalo

4: the niche around a flying buzzard

Page 29

At any given instant a site is coincident with some spatial region

But because there are mobile sites, a site cannot be identified with a spatial region

For stationary sites we can associate latitude/longitude specifications
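One way to picture the difference is to treat the coincident region as a function of time. This is a sketch only: `TimeIndexedSite`, the reduction of a region to a single latitude/longitude pair, and all coordinates are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# Simplifying assumption: a region is reduced to one (latitude, longitude) point.
Region = Tuple[float, float]


@dataclass
class TimeIndexedSite:
    name: str
    region_at: Callable[[float], Region]  # region the site coincides with at time t

    def is_stationary(self, times: Iterable[float]) -> bool:
        """True if the site coincides with the same region at every sampled time."""
        return len({self.region_at(t) for t in times}) <= 1


# Hypothetical coordinates, chosen only to make the example concrete.
office = TimeIndexedSite("office", lambda t: (42.0, -78.0))
herd_range = TimeIndexedSite("home range of a migrating herd",
                             lambda t: (42.0 + 0.1 * t, -78.0))

sample_times = [0.0, 1.0, 2.0]
print(office.is_stationary(sample_times))      # same region at every instant
print(herd_range.is_stationary(sample_times))  # region changes over time
```

A fixed latitude/longitude can be attached to the office, whereas the herd's range only has a region relative to a given instant.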

Page 30

Double hole structure of a Spatial Environment

Medium (filling the environing hole)

Tenant (occupying the central hole)

Retainer (a boundary of some surrounding structure)

Page 31

……… (soil, cheese …)

Page 32

Extension Strategy + Modular Organization

top level
• Basic Formal Ontology (BFO)

mid-level
• Information Artifact Ontology (IAO)
• Ontology for Biomedical Investigations (OBI)
• Spatial Ontology (BSPO)

domain level
• Anatomy Ontology (FMA*, CARO)
• Environment Ontology (EnvO)
• Infectious Disease Ontology (IDO*)
• Biological Process Ontology (GO*)
• Cell Ontology (CL)
• Cellular Component Ontology (FMA*, GO*)
• Phenotypic Quality Ontology (PaTO)
• Subcellular Anatomy Ontology (SAO)
• Sequence Ontology (SO*)
• Molecular Function (GO*)
• Protein Ontology (PRO*)

Page 33

How to fit EnvO under BFO

• http://www.ifomis.org/bfo/

Page 34

Populating downwards from BFO

• Continuant
  • Independent Continuant
  • Dependent Continuant
• Occurrent (Process, Event)

Page 35

Basic Formal Ontology

• Continuant
  • Independent Continuant
    • organism
  • Dependent Continuant
• Occurrent (Process, Event)

Page 36

Columns, by RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT); OCCURRENT
Rows, by GRANULARITY:

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)
• occurrent: Cellular Process (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

obofoundry.org

Page 37

Hydraulic System

Page 38

CIRCULATORY SYSTEM (Principal Organs)

Page 39

Page 40

Genus-species definitions

System =def. an independent continuant which is composed of interacting material entities forming an integrated whole

Ecosystem =def. a system which includes organisms and the site in which they live as components

Page 41

Biome =def. An ecosystem which contains populations adapted to the environmental conditions conserved over its spatial extent.

Microbiome =def. A biome which contains the totality of microscopic organisms, their genetic elements, and interactions in a given environment.
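Each of these definitions names a parent term (the genus) and says what marks the defined term out within it (the differentia), so the definitions chain into an is_a hierarchy. The following is a minimal sketch: the `Definition` record and the `ancestors` helper are hypothetical, and the definition texts are lightly abridged from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Definition:
    term: str
    genus: Optional[str]  # parent term; None for a term treated as primitive here
    differentia: str      # what distinguishes the term within its genus


DEFINITIONS: Dict[str, Definition] = {d.term: d for d in [
    Definition("independent continuant", None, "(treated as primitive in this sketch)"),
    Definition("system", "independent continuant",
               "is composed of interacting material entities forming an integrated whole"),
    Definition("ecosystem", "system",
               "includes organisms and the site in which they live as components"),
    Definition("biome", "ecosystem",
               "contains populations adapted to the environmental conditions "
               "conserved over its spatial extent"),
    Definition("microbiome", "biome",
               "contains the totality of microscopic organisms, their genetic "
               "elements, and interactions in a given environment"),
]}


def ancestors(term: str) -> List[str]:
    """Follow genus links upwards; every ancestor's differentia is inherited."""
    chain = []
    current = DEFINITIONS[term].genus
    while current is not None:
        chain.append(current)
        current = DEFINITIONS[current].genus
    return chain


print(ancestors("microbiome"))
```

Reading the chain from the bottom up, anything that is a microbiome is thereby also a biome, an ecosystem and a system.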

Page 42

Aligning EnvO to the Basic Formal Ontology

Page 43

habitat

Habitat =def. An ecosystem which can support the life of a given organism, population, or community

Realized niche =def. An ecosystem which is that part of a habitat which supports the life of a given organism, population or community

Page 44

Aligning EnvO to the Basic Formal Ontology

Page 45

Hutchinsonian niche (niche as volume in a functionally defined hyperspace)

=def. an n-dimensional hyper-volume whose dimensions correspond to resource gradients over which species are distributed: degree of slope, exposure to sunlight, soil fertility, foliage density, salinity...
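A rough computational reading of this definition treats the hyper-volume as a box: one tolerated interval per resource gradient. That is a simplifying assumption (a real hyper-volume need not be box-shaped), and the dimension names, units and ranges below are invented for illustration.

```python
from typing import Dict, Tuple

# One (low, high) interval per resource gradient; an axis-aligned box
# is assumed here as the simplest stand-in for an n-dimensional hyper-volume.
Niche = Dict[str, Tuple[float, float]]


def in_niche(conditions: Dict[str, float], niche: Niche) -> bool:
    """True if the conditions fall inside the niche on every one of its dimensions."""
    return all(
        dim in conditions and low <= conditions[dim] <= high
        for dim, (low, high) in niche.items()
    )


# Hypothetical ranges, not measurements for any real species.
example_niche: Niche = {
    "slope_degrees": (0.0, 15.0),
    "sunlight_hours": (4.0, 10.0),
    "salinity_ppt": (0.0, 5.0),
}

meadow = {"slope_degrees": 10.0, "sunlight_hours": 6.0, "salinity_ppt": 1.0}
salt_marsh = {"slope_degrees": 2.0, "sunlight_hours": 8.0, "salinity_ppt": 30.0}

print(in_niche(meadow, example_niche))
print(in_niche(salt_marsh, example_niche))
```

A single out-of-range gradient is enough to place a location outside the niche, which is the sense in which the niche is a volume rather than a place.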

Page 46

G.E. Hutchinson (1957, 1965)

Page 47
Page 48

Aligning EnvO to the Basic Formal Ontology

part_of

Page 49

Page 50

Page 51

GAZ. An open source gazetteer based on ontological principles

http://gensc.org/gc_wiki/index.php/GAZ_Project

Page 52

Applications of EnvO in biology