the obo foundry


Post on 03-Jan-2016


TRANSCRIPT

Page 1: The OBO Foundry

The OBO Foundry

Barry Smith

1

Page 2: The OBO Foundry

History of Ontology as Computational Artifact

1970s: AI (based on FOL: McCarthy, Hayes)

1980s: KR, Knowledge Interchange Formats (Gruber, Hobbs ...)

1999: GO, OBO format (Ashburner, ...)

2000s: Semantic Web (based on OWL; Horrocks, Hendler, 1000 lite ontologies)

2009: Reconciliation of OBO with OWL; but still 2 methodologies: OBO Foundry; NCBO Bioportal

2

Page 3: The OBO Foundry

Ontology and the Semantic Web

• HTML demonstrated the power of the Web to allow sharing of information

• can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information based on XML, RDF and above all OWL (Web Ontology Language)?

• can we use RDF and OWL to break down silos, and create useful integration of on-line data and information?

3

Page 4: The OBO Foundry

people tried, but the more they succeeded, the more they failed

OWL breaks down data silos via controlled vocabularies for the description of data dictionaries

Unfortunately the very success of this approach led to the creation of multiple, new, semantic silos – because multiple ontologies are being created in ad hoc ways

4

Page 5: The OBO Foundry

reasons for this effect

• Semantic Web (original) idea: if a million ‘lite ontologies bloom’, then somehow intelligence will be created

• let’s all build new ones (shrink-wrapped software mentality – you will not get paid for reusing existing ontologies)

• requirements-driven software development promotes forking, reduces potential for secondary uses

5

Page 6: The OBO Foundry

Ontology success stories, and some reasons for failure

A fragment of the “Linked Open Data” in the biomedical domain

6

Page 7: The OBO Foundry

What you get with ‘mappings’

HPO: all phenotypes (excess hair loss, duck feet ...)

7

Page 8: The OBO Foundry

What you get with ‘mappings’

HPO: all phenotypes (excess hair loss, duck feet ...)

NCIT: all organisms

8

Page 9: The OBO Foundry

What you get with ‘mappings’

all phenotypes (excess hair loss, duck feet)

all organisms

allose (a form of sugar)

9

Page 10: The OBO Foundry

What you get with ‘mappings’

all phenotypes (excess hair loss, duck feet)

all organisms

allose (a form of sugar)

Acute Lymphoblastic Leukemia (A.L.L.)
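
One way to read the four ‘mappings’ slides is that a purely lexical matcher, asked about the string "all", links terms that share nothing but their spelling. The sketch below illustrates that reading only; the function name, the prefix-matching rule and the placeholder source names `source_3` and `source_4` are assumptions, not the tool that produced the mappings on the slides.

```python
def naive_prefix_mappings(query, labels_by_source):
    """Return every (source, label) whose normalized label starts with the query.

    Normalization lowercases the text and drops punctuation and spaces,
    which is exactly what makes "A.L.L." collide with "all".
    """
    def normalize(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())

    key = normalize(query)
    hits = []
    for source, labels in labels_by_source.items():
        for label in labels:
            if normalize(label).startswith(key):
                hits.append((source, label))
    return hits


# Labels taken from the slides; only HPO and NCIT are named there,
# so the other two source names are hypothetical placeholders.
labels = {
    "HPO": ["all phenotypes"],
    "NCIT": ["all organisms"],
    "source_3": ["allose"],
    "source_4": ["A.L.L."],
}

hits = naive_prefix_mappings("all", labels)
for source, label in hits:
    print(source, "->", label)
```

All four labels are returned as matches for one query, although they denote phenotypes, organisms, a sugar and a leukemia.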

10

Page 11: The OBO Foundry

Mappings are hard

They are fragile, and expensive to maintain

Need new authorities to maintain (one for each pair of mapped ontologies), yielding new risk of forking – who will police the mappings?

The goal should be to minimize the need for mappings, by avoiding redundancy in the first place

Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
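
The "one for each pair of mapped ontologies" point can be made concrete with a little arithmetic: if every pair of n overlapping ontologies needs its own mapping, the number of mappings is n(n-1)/2. The sketch below is only an illustration; the function name and the choice of example ontologies are assumptions.

```python
from itertools import combinations
from math import comb


def mapping_pairs(ontologies):
    """List every unordered pair of ontologies that would need its own mapping."""
    return list(combinations(sorted(ontologies), 2))


# Four ontology names that appear in this deck, used purely as an example.
names = ["GO", "HPO", "NCIT", "ChEBI"]
pairs = mapping_pairs(names)
print(len(pairs), "pairwise mappings for", len(names), "ontologies")

# Growth of the maintenance burden if all ontologies overlap pairwise.
for n in (4, 10, 100):
    print(n, "ontologies ->", comb(n, 2), "mappings")
```

With disjoint modules the count of required mappings falls toward zero, which is the case the slide argues for.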

11

Page 12: The OBO Foundry

Why should you care?

• you need to create systems for data mining and text processing which will yield useful digitally coded output

• if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted

• serious investment in annotation will be defeated from the start

• relevant data will not be found, because it will be lost in multiple semantic cemeteries

12

Page 13: The OBO Foundry

How to do it right?

• how to create an incremental, evolutionary process, where what is good survives, and what is bad fails

• where the number of ontologies needing to be linked is small

• where links are stable

• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

13

Page 14: The OBO Foundry

Reasons why GO has been successful

It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists

Based on community consensus

Updated every night

Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value

Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though GO community still recommending caution)
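
One common way for prior annotations to keep their value in OBO-format ontologies is to mark a retired term as obsolete and point to its successor with a `replaced_by` tag, instead of deleting it. The sketch below assumes that convention; the slides do not say which mechanism GO's versioning principles rely on, and the `EX:` identifiers, gene names and function name are hypothetical.

```python
def resolve_current_id(term_id, replaced_by):
    """Follow replaced_by links until a term with no successor is reached."""
    seen = set()
    current = term_id
    while current in replaced_by:
        if current in seen:
            # Guard against a malformed cycle of replacements.
            raise ValueError(f"replacement cycle at {current}")
        seen.add(current)
        current = replaced_by[current]
    return current


# Hypothetical replacement table: obsolete id -> successor id.
replaced_by = {
    "EX:0000001": "EX:0000002",
    "EX:0000002": "EX:0000003",
}

# Old annotations remain usable: each is resolved to the current term.
annotations = [("geneA", "EX:0000001"), ("geneB", "EX:0000003")]
updated = [(gene, resolve_current_id(term, replaced_by)) for gene, term in annotations]
print(updated)
```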

14

Page 15: The OBO Foundry

GO has learned the lessons of successful cooperation

• Clear documentation

• The terms chosen are already familiar

• Fully open source (allows thorough testing in manifold combinations with other ontologies)

• Subjected to considerable third-party critique

• Tracker for user input with rapid turnaround and help desk

15

Page 16: The OBO Foundry

GO has been amazingly successful in overcoming the data balkanization problem

but it covers only generic biological entities of three sorts:

– cellular components

– molecular functions

– biological processes

no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …

16

Page 17: The OBO Foundry

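OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)

RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY (rows): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)

MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)

17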

Page 18: The OBO Foundry

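RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY (rows): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)

MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)

Environment Ontology: environments are here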

18

Page 19: The OBO Foundry

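Population-level ontologies

RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY (rows): COMPLEX OF ORGANISMS; ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

COMPLEX OF ORGANISMS
• Independent continuant: Family, Community, Deme, Population
• Dependent continuant: Population Phenotype
• Occurrent: Population Process

ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)

MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)

19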

Page 20: The OBO Foundry

Ontology success stories, and some reasons for failure

20

Page 21: The OBO Foundry

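RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY (rows): COMPLEX OF ORGANISMS; ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

COMPLEX OF ORGANISMS
• Independent continuant: Family, Community, Deme, Population
• Dependent continuant: Population Phenotype
• Occurrent: Population Process

ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)

MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)

http://obofoundry.org

21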

Page 22: The OBO Foundry

The OBO Foundry: a step-by-step, evidence-based approach to expand the GO

Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology, and agree in advance to collaborate with developers of ontologies in adjacent domains.

http://obofoundry.org

22

Page 23: The OBO Foundry

OBO Foundry Principles

Common governance (coordinating editors)

Common training

Common architecture to overcome Tim Berners Lee-ism:

• simple shared top level ontology

• shared Relation Ontology: www.obofoundry.org/ro

23

Page 24: The OBO Foundry

Open Biomedical Ontologies Foundry

Seeks to create high quality, validated terminology modules across all of the life sciences which will be

• one ontology for each domain, so no need for mappings

• close to language use of experts

• evidence-based

• incorporate a strategy for motivating potential developers and users

• revisable as science advances

24

Page 25: The OBO Foundry

Principles

http://obofoundry.org/wiki/index.php/OBO_FoundryPrinciples

25

Page 26: The OBO Foundry

Pistoia Alliance

Open standards for data and technology interfaces in the life science research industry

consortium of major pharmaceutical and life science companies

can we address the data silo problems created by multiplicity of proprietary terminologies by declaring terminology ‘pre-competitive’?

require shared use of something like OBO Foundry ontologies in presentation of information?

26

Page 27: The OBO Foundry

27

Page 28: The OBO Foundry

Virtual Physiological Human

28

Page 29: The OBO Foundry

Only with a prospective standard like that of the OBO Foundry could something like the VPH work

designed to guarantee interoperability of ontologies from the very start (and to keep out weeds)

initial set of 10 criteria tested in the annotation of

scientific literature

model organism databases

life science experimental results

29

Page 30: The OBO Foundry

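OBO Foundry coverage

RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY (rows): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)
• Occurrent: Cellular Process (GO)

MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)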

30

Page 31: The OBO Foundry

ORTHOGONALITY

modularity ensures

• annotations can be additive

• division of labor amongst domain experts

• high value of training in any given module

• lessons learned in one module can benefit work on other modules

• incentivization of those responsible for individual modules

31

Page 32: The OBO Foundry

Benefits of coordination

• Can more easily reuse what is made by others

• Can more easily inspect and criticize what is made by others

• Leads to innovations (e.g. MIREOT strategy for importing terms into ontologies)
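
MIREOT (Minimum Information to Reference an External Ontology Term) imports a single term by recording three things: the URI of the source ontology, the URI of the source term, and the URI of the direct superclass under which the term is placed in the importing ontology. The sketch below models only that record and a simple acceptance check; the class name, function name and all `example.org` URIs are hypothetical, and real MIREOT tooling is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTermReference:
    """The minimal information kept about one imported term."""
    source_ontology_uri: str
    source_term_uri: str
    target_superclass_uri: str


def split_importable(references, known_target_classes):
    """Separate references whose target superclass exists from those where it does not."""
    accepted, rejected = [], []
    for ref in references:
        if ref.target_superclass_uri in known_target_classes:
            accepted.append(ref)
        else:
            rejected.append(ref)
    return accepted, rejected


# Hypothetical importing ontology with one class available as a parent.
known = {"http://example.org/target#Process"}

refs = [
    ExternalTermReference(
        "http://example.org/source.owl",
        "http://example.org/source#TERM_1",
        "http://example.org/target#Process",
    ),
    ExternalTermReference(
        "http://example.org/source.owl",
        "http://example.org/source#TERM_2",
        "http://example.org/target#Missing",
    ),
]

accepted, rejected = split_importable(refs, known)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

The design choice being illustrated is that a term can be reused without importing the whole source ontology.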

32

Page 33: The OBO Foundry

8 Foundry members (2010)

CHEBI: Chemical Entities of Biological Interest

GO: Gene Ontology

PATO: Phenotypic Quality Ontology

PRO: Protein Ontology

XAO: Xenopus Anatomy Ontology

ZFA: Zebrafish Anatomy Ontology

33

Page 34: The OBO Foundry

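Current Foundry members in yellow

RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY (rows): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); XAO; ZFA
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)

MOLECULE
• Independent continuant: Molecule (SO, RnaO); ChEBI; PRO
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)

34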

Page 35: The OBO Foundry

ORGAN AND ORGANISM: Organism (NCBI Taxonomy); CARO; FMA; XAO; ZFA; Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)

CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)

MOLECULE: SO; RnaO; ChEBI; PRO; Molecular Function (GO); Molecular Process (GO)

Prospective Foundry ontologies (in green):
• Foundational Model of Anatomy Ontology (FMA)
• Cell Ontology (CL)
• Sequence Ontology (SO)
• RNA Ontology (RnaO)

35

Page 36: The OBO Foundry

OBO Foundry Modular Organization

top level
• Basic Formal Ontology (BFO)

mid-level
• Information Artifact Ontology (IAO)
• Ontology for Biomedical Investigations (OBI)
• Ontology of General Medical Science (OGMS)

domain level
• Anatomy Ontology (FMA*, CARO)
• Environment Ontology (EnvO)
• Infectious Disease Ontology (IDO*)
• Biological Process Ontology (GO*)
• Cell Ontology (CL)
• Cellular Component Ontology (FMA*, GO*)
• Phenotypic Quality Ontology (PaTO)
• Subcellular Anatomy Ontology (SAO)
• Sequence Ontology (SO*)
• Molecular Function (GO*)
• Protein Ontology (PRO*)

36

Page 37: The OBO Foundry

Problem cases

Common Anatomy Reference Ontology

Disease Ontology

Function Ontologies
• Cellular Component Function
• Cellular Function
• Organ Function
• Artifact Function (pumping, transporting ...)

Environment Ontology

Species Ontology (NCBI Taxonomy)

37

Page 38: The OBO Foundry

IDO (Infectious Disease Ontology) Core

Follows GO strategy of providing a canonical ontology of what is involved in every infectious disease – host, pathogen, vector, virulence, vaccine, transmission – accompanied by IDO Extensions for specific diseases, pathogens and vectors

Provides common terminology resources and tested common guidelines for a vast array of different disease communities

38

Page 39: The OBO Foundry

IDO (Infectious Disease Ontology) Consortium

• MITRE, Mount Sinai, UT Southwestern – Influenza

• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)

• Colorado State University – Dengue Fever

• Duke University – Tuberculosis, Staph. aureus

• Cleveland Clinic – Infective Endocarditis

• University of Michigan – Brucellosis

• Duke University, University at Buffalo – HIV

39

Page 40: The OBO Foundry

Ontology for General Medical Science

http://code.google.com/p/ogms/

(OBO) http://purl.obolibrary.org/obo/ogms.obo

(OWL) http://purl.obolibrary.org/obo/ogms.owl
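
The OBO serialization is a plain-text format of `tag: value` lines grouped into stanzas such as `[Term]`. As a rough sketch of what reading such a file involves, the code below parses an inline sample instead of downloading `ogms.obo`; the sample terms, the `EX:` identifiers and the function name are hypothetical, and the parser ignores escaping, trailing modifiers and most other details of the format.

```python
def parse_obo_terms(text):
    """Collect the tag/value pairs of every [Term] stanza in OBO-format text."""
    terms = []
    current = None
    for raw in text.splitlines():
        # Drop trailing "! comment" text; naive, so values containing "!" would break.
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines and non-Term stanzas are skipped
        tag, value = line.split(":", 1)
        current.setdefault(tag.strip(), []).append(value.strip())
    return terms


# Hypothetical sample; not taken from the real OGMS file.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: example disease

[Term]
id: EX:0000002
name: example infectious disease
is_a: EX:0000001 ! example disease

[Typedef]
id: part_of
"""

terms = parse_obo_terms(SAMPLE)
for term in terms:
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

For real work a maintained OBO or OWL library would be a safer choice than a hand-written parser like this one.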

40

Page 41: The OBO Foundry

OGMS-based initiatives

Vital Signs Ontology (VSO) (Welch Allyn)

EHR / Demographics Ontology

Infectious Disease Ontology

Mental Health Ontology

Emotion Ontology

41

Page 42: The OBO Foundry

Ontology for General Medical Science

Jobst Landgrebe (then Co-Chair of the HL7 Vocabulary Group):

“the best ontology effort in the whole biomedical domain by far”

42

Page 43: The OBO Foundry

EXPERIMENTAL ARTIFACTS Ontology for Biomedical Investigations (OBI)

CLINICAL MEDICINE Ontology of General Medical Science (OGMS)

INFORMATION ARTIFACTS Information Artifact Ontology (IAO)

How to keep clear about the distinction between

• processes of observation,

• results of such processes (measurement data)

• the entities observed

43