EnvO: The Environment Ontology (presentation at the Genomics Standards Consortium meeting),...
An Introduction to Ontology as a Strategy for Data Integration
Barry Smith
The problem
• legacy idiosyncrasies in handling data, progressively complicated by
  • changes in available hardware and software
  • turnover of personnel and of collaborations
  • explosion of data
  • need to get funding (inhibits reuse)
The result: balkanization
• systems are poorly integrated
• deliver redundant capabilities
• foster error and waste
• prevent comparison and aggregation
• prevent secondary use of data
• lower ROI on software
The proposed solution
• vocabulary and meanings change more slowly than hardware and software (and scientific theory*)
• semantic interoperability has high initial cost (governance, commitment) but considerable long-term value
*atom, electron, cell, bacteria, organism …
How to do it right?
• how to create an incremental, evolutionary process, in which what is good survives and what is bad fails
• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested
Uses of ‘ontology’ in PubMed abstracts
By far the most successful: GO (Gene Ontology)
GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data
• multi-species, multi-disciplinary, open source
• contributing to the cumulativity of scientific results obtained by distinct research communities
• compare the use of kilograms, meters and seconds in formulating experimental results
• natural language and logical definitions for all terms to support consistent human application and computational exploitation
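The following is a minimal Python sketch of the annotation idea. It is not GO's actual data model or file format, and everything in it is hypothetical: the `EX:` identifiers, labels, gene-product names and database names are invented for illustration.

It shows two things a shared vocabulary with `is_a` links supports:
• Annotations made by different communities to the same identifiers can be pooled.
• A query for a general term also retrieves annotations made to its more specific subtypes.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: str                     # stable identifier shared by all annotators
    label: str                  # human-readable name
    definition: str             # natural-language definition for curators
    is_a: tuple[str, ...] = ()  # parent term IDs (the computable skeleton)


# Hypothetical vocabulary: these are NOT real GO identifiers.
VOCAB = {t.id: t for t in [
    Term("EX:1", "biological process", "Root term (hypothetical)."),
    Term("EX:2", "cell death", "A process ending in the death of a cell (hypothetical).", ("EX:1",)),
    Term("EX:3", "programmed cell death", "Cell death carried out by a regulated program (hypothetical).", ("EX:2",)),
]}

# (source database, gene product, term ID) from two independent communities; all invented.
ANNOTATIONS = [
    ("DB_A", "geneA1", "EX:3"),
    ("DB_B", "geneB7", "EX:2"),
    ("DB_B", "geneB9", "EX:1"),
]


def ancestors(term_id: str) -> set[str]:
    """Return term_id plus every term reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = [term_id]
    while stack:
        tid = stack.pop()
        if tid not in seen:
            seen.add(tid)
            stack.extend(VOCAB[tid].is_a)
    return seen


def annotated_to(term_id: str) -> list[tuple[str, str]]:
    """(database, gene product) pairs annotated to term_id or to any subtype of it."""
    return sorted((db, gp) for db, gp, tid in ANNOTATIONS if term_id in ancestors(tid))


print(annotated_to("EX:2"))  # [('DB_A', 'geneA1'), ('DB_B', 'geneB7')]
```

The records from `DB_A` and `DB_B` can be compared only because both point at the same identifiers. This is the role the slide compares to kilograms, meters and seconds.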
What is the key to GO’s success?
• multi-species, multi-disciplinary, open source
• clear rules for ontology development and maintenance
• over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
Extending GO’s success to other fields
Open Biological and Biomedical Ontologies (OBO) Foundry
• Best practice principles
• Governance
• Review process
• Two-tier membership
http://obofoundry.org
http://ontology.buffalo.edu/smith
OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
Rows: granularity. Columns: relation to time (continuant, independent | continuant, dependent | occurrent).
• Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
• Cell and cellular component: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
• Molecule: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
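One way to read the matrix is as a lookup table keyed by granularity and relation to time. The sketch below encodes only the cells recoverable from the slide text. The dictionary layout and the helper names `acronyms` and `cells_covered_by` are our own hypothetical choices. It asks in which cells a given ontology appears; running it for GO shows how many cells the Gene Ontology spans.

```python
from __future__ import annotations

# Cells of the matrix recoverable from the slide text (hypothetical encoding).
# Key: (granularity, relation to time); value: ontologies in that cell.
MATRIX = {
    ("organ and organism", "independent continuant"): ["Organism (NCBI Taxonomy)", "Anatomical Entity (FMA, CARO)"],
    ("organ and organism", "dependent continuant"): ["Organ Function (FMP, CPRO)", "Phenotypic Quality (PaTO)"],
    ("organ and organism", "occurrent"): ["Biological Process (GO)"],
    ("cell and cellular component", "independent continuant"): ["Cell (CL)", "Cellular Component (FMA, GO)"],
    ("cell and cellular component", "dependent continuant"): ["Cellular Function (GO)"],
    ("molecule", "independent continuant"): ["Molecule (ChEBI, SO, RnaO, PrO)"],
    ("molecule", "dependent continuant"): ["Molecular Function (GO)"],
    ("molecule", "occurrent"): ["Molecular Process (GO)"],
}


def acronyms(entry: str) -> list[str]:
    """Extract the ontology acronyms from an entry such as 'Cellular Component (FMA, GO)'."""
    inside = entry[entry.index("(") + 1 : entry.rindex(")")]
    return [a.strip() for a in inside.split(",")]


def cells_covered_by(acronym: str) -> list[tuple[str, str]]:
    """All (granularity, relation-to-time) cells in which the given ontology appears."""
    return sorted(
        cell for cell, entries in MATRIX.items()
        if any(acronym in acronyms(e) for e in entries)
    )


for cell in cells_covered_by("GO"):
    print(cell)
```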
Population-level ontologies
Rows: granularity. Columns: relation to time (continuant, independent | continuant, dependent | occurrent).
• Complex of organisms: Family, Community, Deme, Population | Population Phenotype | Population Process
• Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
• Cell and cellular component: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
• Molecule: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Rows: granularity. Columns: relation to time (continuant, independent | continuant, dependent | occurrent).
• Environments: Environment Ontology
• Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
• Cell and cellular component: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
• Molecule: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
The Environment Ontology
Barry Smith
http://ontology.buffalo.edu/smith
The Spatial-Structural Niche: A Hole Story
Places are holes
DIGESTIVE SYSTEM
the interior of your gut: an environment for more than 10^13 microorganisms
Positive and negative parts
• positive part (made of matter)
• negative part, or hole (not made of matter)
A site
intuitively: a spatial entity that can contain a material entity
A spatial environment
is a site that
1. contains a medium (air, water)
2. can contain an organism or a population of organisms
Some sites are supported and demarcated by some solid object
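Here is a minimal sketch of the two-clause definition of a spatial environment, written as a predicate over sites. The `Site` fields, the predicate name and the two example sites are hypothetical illustrations, not EnvO classes or properties.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    medium: Optional[str]           # e.g. "air" or "water"; None if the site holds no medium
    can_contain_organisms: bool     # could an organism or population be located here?
    retainer: Optional[str] = None  # solid object supporting/demarcating the site, if any


def is_spatial_environment(site: Site) -> bool:
    """Hypothetical predicate: both clauses of the slide's definition must hold."""
    return site.medium is not None and site.can_contain_organisms


# Illustrative examples only.
rabbit_hole = Site("rabbit hole", medium="air", can_contain_organisms=True, retainer="soil")
evacuated_flask = Site("interior of an evacuated flask", medium=None,
                       can_contain_organisms=True, retainer="glass")

print(is_spatial_environment(rabbit_hole))      # True
print(is_spatial_environment(evacuated_flask))  # False: no medium
```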
Stationary Sites
1: your office when the door is closed; a closed mouth
2: a rabbit hole; an open mouth
3: the surface of a leaf
4: the Klingon Empire
Mobile Sites
1: a womb; a spaceship
2: a snail’s shell
3: the home range of a migrating herd of buffalo
4: the niche around a flying buzzard
At any given instant a site is coincident with some spatial region.
But because there are mobile sites, a site is not a spatial region.
For stationary sites we can associate latitude/longitude specifications.
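The argument about mobile sites can be sketched by giving each site a function from times to the spatial region it coincides with at that time. Several simplifying assumptions apply:
• Regions are crudely reduced to a hypothetical (latitude, longitude, radius) triple.
• The coordinates and sampling times are invented.
• Stationarity is only checked at the sampled instants, so this is an approximation.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for a spatial region: (latitude, longitude, radius in km).
Region = Tuple[float, float, float]


@dataclass
class Site:
    name: str
    region_at: Callable[[float], Region]  # region the site coincides with at time t

    def is_stationary(self, times: Sequence[float]) -> bool:
        """Stationary (at the sampled instants) if it always coincides with the same region."""
        return len({self.region_at(t) for t in times}) == 1

    def lat_lon(self, times: Sequence[float]) -> Optional[Tuple[float, float]]:
        """Only a stationary site gets a single latitude/longitude."""
        if not self.is_stationary(times):
            return None
        lat, lon, _radius = self.region_at(times[0])
        return lat, lon


# Invented coordinates: one fixed site, one site that moves with its bearer.
rabbit_hole = Site("rabbit hole", lambda t: (42.9, -78.8, 0.001))
buzzard_niche = Site("niche around a flying buzzard", lambda t: (42.9 + 0.01 * t, -78.8, 0.05))

times = [0.0, 1.0, 2.0]
print(rabbit_hole.lat_lon(times))    # (42.9, -78.8)
print(buzzard_niche.lat_lon(times))  # None: a different region at each instant
```

The mobile site keeps its identity while coinciding with different regions. This is why it cannot be identified with any one of them.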
Double hole structure of a Spatial Environment
Medium (filling the environing hole)
Tenant (occupying the central hole)
Retainer (a boundary of some surrounding structure)
……… (soil, cheese …)
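The double-hole structure can be sketched as a record with three roles. The field names and the `enclosing` link are our own hypothetical modelling choices. The `enclosing` link lets the bearer of a retainer itself be the tenant of a larger environment, as with gut microorganisms inside a person inside an office.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpatialEnvironment:
    tenant: str    # occupies the central hole
    medium: str    # fills the environing hole
    retainer: str  # boundary of the surrounding structure
    # Hypothetical link: the environment in which the retainer's bearer is itself a tenant.
    enclosing: Optional[SpatialEnvironment] = None


def nesting(env: Optional[SpatialEnvironment]) -> List[str]:
    """Describe an environment and every environment enclosing it, innermost first."""
    out: List[str] = []
    while env is not None:
        out.append(f"{env.tenant} in {env.medium}, retained by {env.retainer}")
        env = env.enclosing
    return out


office = SpatialEnvironment(tenant="a person", medium="air", retainer="office walls")
gut = SpatialEnvironment(tenant="gut microorganisms", medium="gut contents",
                         retainer="gut wall", enclosing=office)

for line in nesting(gut):
    print(line)
```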
Extension Strategy + Modular Organization
• top level: Basic Formal Ontology (BFO)
• mid-level: Information Artifact Ontology (IAO), Ontology for Biomedical Investigations (OBI), Spatial Ontology (BSPO)
• domain level: Anatomy Ontology (FMA*, CARO), Environment Ontology (EnvO), Infectious Disease Ontology (IDO*), Biological Process Ontology (GO*), Cell Ontology (CL), Cellular Component Ontology (FMA*, GO*), Phenotypic Quality Ontology (PaTO), Subcellular Anatomy Ontology (SAO), Sequence Ontology (SO*), Molecular Function (GO*), Protein Ontology (PRO*)
How to fit EnvO under BFO
• http://www.ifomis.org/bfo/
Populating downwards from BFO
Basic Formal Ontology
• Continuant
  • Independent Continuant (e.g., organism)
  • Dependent Continuant
• Occurrent (Process, Event)
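Populating downwards means every new class is attached beneath an existing class, so its position under BFO is fixed from the start. Below is a minimal sketch under these assumptions:
• Each class has a single parent.
• `entity` is used as the root.
• The dictionary `PARENT` and the helpers `add_term`, `path_to_root` and `is_a` are hypothetical names, not part of any BFO release.

```python
from __future__ import annotations
from typing import Dict, List

ROOT = "entity"

# child -> parent, seeded with the upper-level classes shown on the slide.
PARENT: Dict[str, str] = {
    "continuant": ROOT,
    "occurrent": ROOT,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "process": "occurrent",
    "event": "occurrent",
}


def add_term(child: str, parent: str) -> None:
    """Populate downwards: a new class may only be attached beneath an existing one."""
    if parent != ROOT and parent not in PARENT:
        raise ValueError(f"unknown parent class: {parent!r}")
    if child == ROOT or child in PARENT:
        raise ValueError(f"class already placed: {child!r}")
    PARENT[child] = parent


def path_to_root(term: str) -> List[str]:
    """The chain of is_a links from term up to the root."""
    path = [term]
    while path[-1] != ROOT:
        path.append(PARENT[path[-1]])
    return path


def is_a(sub: str, sup: str) -> bool:
    return sup in path_to_root(sub)


add_term("organism", "independent continuant")
print(path_to_root("organism"))
# ['organism', 'independent continuant', 'continuant', 'entity']
print(is_a("organism", "occurrent"))  # False
```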
Rows: granularity. Columns: relation to time (continuant, independent | continuant, dependent | occurrent).
• Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Organism-Level Process (GO)
• Cell and cellular component: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
• Molecule: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
obofoundry.org
Hydraulic System
CIRCULATORY SYSTEM (Principal Organs)
Genus-species definitions
System =def. an independent continuant which is composed of interacting material entities forming an integrated whole
Ecosystem =def. a system which includes organisms and the site in which they live as components
Biome =def. An ecosystem which contains populations adapted to the environmental conditions conserved over its spatial extent.
Microbiome =def. A biome which contains the totality of microscopic organisms, their genetic elements, and interactions in a given environment.
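Genus-species definitions form a chain: each defined term inherits its genus's definition plus its own differentia. The sketch below encodes the four definitions above. The dictionary `DEFS`, the set `UPPER` and both helpers are hypothetical names. The helper walks from microbiome up to the BFO genus and fails loudly if a genus is undefined or the chain loops.

```python
from __future__ import annotations
from typing import Dict, List, Tuple

# term -> (genus, differentia), transcribed from the definitions above.
DEFS: Dict[str, Tuple[str, str]] = {
    "system": ("independent continuant",
               "composed of interacting material entities forming an integrated whole"),
    "ecosystem": ("system",
                  "includes organisms and the site in which they live as components"),
    "biome": ("ecosystem",
              "contains populations adapted to the environmental conditions conserved over its spatial extent"),
    "microbiome": ("biome",
                   "contains the totality of microscopic organisms, their genetic elements, "
                   "and interactions in a given environment"),
}
UPPER = {"independent continuant"}  # genera taken from BFO rather than defined here


def genus_chain(term: str) -> List[str]:
    """Follow genus links from term up to a BFO class."""
    chain = [term]
    while chain[-1] not in UPPER:
        if chain[-1] not in DEFS:
            raise KeyError(f"no definition for genus {chain[-1]!r}")
        genus = DEFS[chain[-1]][0]
        if genus in chain:
            raise ValueError(f"circular definition via {genus!r}")
        chain.append(genus)
    return chain


def inherited_differentiae(term: str) -> List[str]:
    """Every differentia a term inherits, its own first."""
    return [DEFS[t][1] for t in genus_chain(term)[:-1]]


print(genus_chain("microbiome"))
# ['microbiome', 'biome', 'ecosystem', 'system', 'independent continuant']
```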
Aligning EnvO to the Basic Formal Ontology
habitat
Habitat =def. An ecosystem which can support the life of a given organism, population, or community
Realized niche =def. An ecosystem which is that part of a habitat which supports the life of a given organism, population or community
Aligning EnvO to the Basic Formal Ontology
Hutchinsonian niche (niche as a volume in a functionally defined hyperspace)
=def. an n-dimensional hyper-volume whose dimensions correspond to resource gradients over which species are distributed: degree of slope, exposure to sunlight, soil fertility, foliage density, salinity ...
G.E. Hutchinson (1957, 1965)
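As a simplifying assumption, Hutchinson's hypervolume can be approximated by an axis-aligned box: one tolerance interval per dimension. Hutchinson's own construction does not require a box shape. The dimension names, units, bounds and the two example species are invented, and `in_niche` and `overlap` are hypothetical helpers.

```python
from __future__ import annotations
from typing import Dict, Optional, Tuple

# dimension -> (lower tolerance, upper tolerance): an axis-aligned simplification.
Niche = Dict[str, Tuple[float, float]]


def in_niche(niche: Niche, conditions: Dict[str, float]) -> bool:
    """True if the point lies inside the hypervolume on every dimension."""
    return all(lo <= conditions[dim] <= hi for dim, (lo, hi) in niche.items())


def overlap(a: Niche, b: Niche) -> Optional[Niche]:
    """Intersection of two box-shaped niches over their shared dimensions, or None if disjoint."""
    shared = a.keys() & b.keys()
    box = {d: (max(a[d][0], b[d][0]), min(a[d][1], b[d][1])) for d in shared}
    if any(lo > hi for lo, hi in box.values()):
        return None
    return box


# Invented tolerances for two hypothetical plant species.
species_x: Niche = {"slope_deg": (0, 15), "sunlight_h": (4, 10), "salinity_ppt": (0.0, 0.5)}
species_y: Niche = {"slope_deg": (10, 30), "sunlight_h": (6, 12), "salinity_ppt": (0.0, 2.0)}

print(in_niche(species_x, {"slope_deg": 5, "sunlight_h": 6, "salinity_ppt": 0.1}))  # True
print(in_niche(species_x, {"slope_deg": 5, "sunlight_h": 6, "salinity_ppt": 30}))   # False
print(overlap(species_x, species_y))
```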
Aligning EnvO to the Basic Formal Ontology
part_of
GAZ: an open-source gazetteer based on ontological principles
http://gensc.org/gc_wiki/index.php/GAZ_Project
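A gazetteer becomes useful for integration when its places are linked by a transitive part-of relation. A sample annotated with a small place can then be retrieved by a query about a larger one. The minimal sketch below uses invented sample IDs, a hand-picked fragment of place hierarchy, and helper names (`located_in`, `samples_in`) that are all hypothetical. They do not reflect GAZ's actual identifiers or relations.

```python
from __future__ import annotations
from typing import Dict, List, Tuple

# Hypothetical gazetteer fragment: place -> place it is directly part of.
PART_OF: Dict[str, str] = {
    "Buffalo": "Erie County",
    "Erie County": "New York State",
    "New York State": "United States of America",
    "Toronto": "Ontario",
    "Ontario": "Canada",
}

# Hypothetical samples, each annotated with the most specific place known.
SAMPLES: List[Tuple[str, str]] = [
    ("sample-1", "Buffalo"),
    ("sample-2", "Toronto"),
    ("sample-3", "New York State"),
]


def located_in(place: str, region: str) -> bool:
    """Reflexive, transitive part_of: walk up the hierarchy from place."""
    while True:
        if place == region:
            return True
        if place not in PART_OF:
            return False
        place = PART_OF[place]


def samples_in(region: str) -> List[str]:
    """All samples whose annotated place lies within region."""
    return [sid for sid, place in SAMPLES if located_in(place, region)]


print(samples_in("New York State"))  # ['sample-1', 'sample-3']
```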
Applications of EnvO in biology