Computational Biology and Informatics Laboratory
Development of an Application Ontology for Beta Cell Genomics Based On the Ontology for Biomedical Investigations
Jie Zheng, Elisabetta Manduchi and Christian J. Stoeckert Jr
Department of Genetics, Perelman School of Medicine, University of Pennsylvania
ICBO July 2013, Montreal


Page 1

Computational Biology and Informatics Laboratory

Development of an Application Ontology for Beta Cell Genomics Based On the Ontology for Biomedical Investigations

Jie Zheng, Elisabetta Manduchi and Christian J. Stoeckert Jr
Department of Genetics, Perelman School of Medicine, University of Pennsylvania

ICBO July 2013, Montreal

Page 2

Beta Cell Genomics Database

• http://genomics.betacell.org/gbco/
• A functional genomics resource focused on pancreatic beta cell research, supporting a consortium of 62 investigators and their groups
• 128 studies (version 4.1) addressing the biology of beta cells, aspects of diabetes, and the production of functional beta cells from
  – embryonic stem cells
  – mature cells of other types, such as exocrine cells

Page 3

Desired Features of a Beta Cell Genomics Ontology

• Support semantic annotation of beta cell studies with enough granularity to cover both biological and experimental aspects
  – Specimen characteristics, species, strain, anatomical entity, cell type, etc.
  – Assay, protocol, data analysis methods, etc.
• Enable queries of increasing complexity (competency questions)
  – Find gene expression data of endocrine cells
  – Find studies using cells which develop from either mesoderm or endoderm
  – Find high throughput sequencing gene expression data in samples obtained during the embryo stage from mouse strains with genetic background C57BL/6J
• Enable knowledge discovery based on computable definitions
  – Automated cell type classification based on cell phenotype/functions and/or genetic signatures using reasoners
• Leverage existing efforts covering the domains of investigations, cells, anatomy, proteins, and genes
  – OBO Foundry ontologies
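A competency question such as "find studies using cells which develop from either mesoderm or endoderm" amounts to following develops-from links transitively. The following is a minimal sketch on a toy in-memory graph. The cell types, links, and study identifiers are illustrative assumptions, not BCGO content, and a real query would run against the OWL file with a reasoner or SPARQL endpoint.

```python
from collections import deque

# Hypothetical toy data: cell type -> what it directly develops from.
DEVELOPS_FROM = {
    "type B pancreatic cell": ["pancreatic endocrine progenitor"],
    "pancreatic endocrine progenitor": ["endoderm"],
    "cardiac muscle cell": ["mesoderm"],
    "neuron": ["ectoderm"],
}

# Hypothetical study annotations: study id -> cell type used in the study.
STUDY_CELL_TYPE = {
    "study_A": "type B pancreatic cell",
    "study_B": "cardiac muscle cell",
    "study_C": "neuron",
}


def origins(cell_type):
    """Return everything reachable from cell_type via develops-from links."""
    seen, queue = set(), deque([cell_type])
    while queue:
        current = queue.popleft()
        for parent in DEVELOPS_FROM.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def studies_developing_from(germ_layers):
    """Find studies whose cell type develops from any of the given layers."""
    wanted = set(germ_layers)
    return sorted(
        study
        for study, cell in STUDY_CELL_TYPE.items()
        if origins(cell) & wanted
    )


print(studies_developing_from(["mesoderm", "endoderm"]))
```

The transitive walk is what makes the question hard to answer with flat keyword annotation: the study is annotated with a cell type, while the germ layer only appears several links away.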

Page 4

OBO Foundry Reference Ontologies

• Share a common upper level ontology, the Basic Formal Ontology (BFO), and common relations
• Orthogonal, interoperable ontologies: reuse existing terms defined in OBO Foundry ontologies
• Each reference ontology covers a specific domain:
  – Cell type ontology (CL): cell type
  – Gene ontology (GO): biological process, molecular function, cellular component
  – Protein ontology (PR): protein (cross-species)
  – Uber anatomy ontology (UBERON): cross-species anatomy
  – Ontology for biomedical investigations (OBI): all aspects of an experiment

Facilitate ontology integration

Page 5

Motivation for Developing an Application Ontology for Beta Cell Genomics Research

• No single OBO Foundry ontology can meet our needs
• No available ontology provides the granularity needed by beta cell genomics research
• The typical use of multiple disconnected ontologies loses semantic power

Page 6

Principles of Beta Cell Genomics Ontology (BCGO) Development

• Reuse terms existing in the OBO Foundry ontologies if possible
• Reuse existing ontology design patterns
• Use OBI as the ontology framework and integrate subsets of other OBO Foundry ontologies into it
• Enrich the ontology with additional axioms when needed

Page 7

Ontology for Biomedical Investigations (OBI)

• Covers all aspects of an investigation
• Contains classes that connect OBI with other OBO Foundry reference ontologies, such as CL, UBERON, and GO, and serve as the parents of referenced external terms

[Figure: OBI classes (gross anatomical entity, cellular_component, molecular entity, material entity, specimen, cell, cultured cell, data transformation, biological_process, process, assay, data item, measurement unit label, information content entity, protocol) linked by subClass of (is a) to terms from OBI, UBERON, GO, CL, CLO, UO, ChEBI, ...]

Page 8

Development of BCGO

1. Identification of terms defined in OBO Foundry Ontologies

2. Extraction of terms from OBO Foundry ontologies

3. Integration of terms from different OBO Foundry ontologies

4. Enrichment of BCGO by adding additional terms and axioms

Page 9

Step 1: Identification of Terms Defined in OBO Foundry Ontologies

1. Draw terms from the MO to OBI mapping list
  – Beta Cell Genomics Database was annotated using multiple controlled vocabularies and ontologies including the MGED Ontology (MO)
2. BioPortal Annotation Tool
  – High accuracy (>95%)
  – May not include the latest version of ontologies
3. BioPortal Search Tool
  – Includes partial and exact matches of input text
  – Requires more manual review as compared to the BioPortal Annotation Tool
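The difference between exact and partial matching, and why partial matches need more manual review, can be illustrated with a small local sketch. It does not call BioPortal. The label table is a hypothetical stand-in for an ontology index, and the identifiers shown are illustrative and should be checked against the current ontology releases.

```python
def normalize(text):
    """Lowercase and collapse whitespace so labels compare consistently."""
    return " ".join(text.lower().split())


# Hypothetical label -> ID table standing in for an ontology index.
ONTOLOGY_LABELS = {
    "pancreas": "UBERON:0001264",
    "islet of Langerhans": "UBERON:0000006",
    "type B pancreatic cell": "CL:0000169",
}


def match_term(term):
    """Return (exact, partial) lists of (label, id) matches for one term."""
    query = normalize(term)
    exact, partial = [], []
    for label, term_id in ONTOLOGY_LABELS.items():
        candidate = normalize(label)
        if candidate == query:
            exact.append((label, term_id))
        elif query in candidate or candidate in query:
            # Substring overlap only: a curator still has to confirm it.
            partial.append((label, term_id))
    return exact, partial


for term in ["Pancreas", "pancreatic cell", "hepatocyte"]:
    exact, partial = match_term(term)
    print(term, "| exact:", exact, "| partial:", partial)
```

In this toy setup "pancreatic cell" only partially matches "type B pancreatic cell", which is the kind of candidate that a person would have to accept or reject by hand.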

Page 10

Most Terms Needed Could Be Matched to Small Subsets of Many Ontologies

Ontology    Version      Total Classes   Matched Terms
OBI         2012-07-01   2042            200
BTO*        12/20/2012   5391            2
CARO        N/A          50              1
EnVO        2013-01-08   1557            1
ERO*        2012-10-03   1579            2
FMA         3.1          83281           1
GAZ         1.512        518195          1
MP          07/14/2012   9164            1
OGMS        2011-09-20   81              3
RS          1/14/2013    3361            1
SO          11/1/2012    2151            1
SWO         0.5          661             1
EFO*        2.31         4057            40
ChEBI       100          38901           12
CLO         2.1.03       35436           11
GO          2012-12-18   38747           2
NCBITaxon   2013-01-24   981148          1
PR          31.0.        35488           1
UO          2012-08-30   313             67
CL          2013-01-31   2120            46
PATO        01/09/2013   2426            19
UBERON      2013-01-07   7318            126

• 852 terms used in the Beta Cell Genomics database
• 644 terms were matched to 543 ontology terms
• Mapped terms are defined in 24 OBO Foundry ontologies including BFO and IAO

*: application ontology
BTO: BRENDA tissue / enzyme source
CARO: Common Anatomy Reference Ontology
EnVO: Environment Ontology
ERO: eagle-i resource ontology
FMA: Foundational Model of Anatomy
GAZ: Gazetteer
MP: Mammalian Phenotype
OGMS: Ontology for General Medical Science
RS: Rat Strain ontology
SO: Sequence types and features
SWO: Software Ontology
EFO: Experimental Factor Ontology
ChEBI: Chemical entities of biological interest
CLO: cell line ontology
NCBITaxon: NCBI organismal classification
PR: protein ontology
UO: Units of measurement
PATO: Phenotypic quality

Page 11

Step 2: Extraction of Terms from OBO Foundry Ontologies

• Ontodog tool: OBI subset extraction
  – Generates a community view including all related terms and axioms
  Reference: Zheng et al. International Conference on Biomedical Ontology (ICBO), Graz, Austria, July 2012
• OntoFox tool for extracting terms from all other OBO Foundry ontologies
  – Option 1: MIREOT
  – Option 2: include minimal intermediate ontology terms
  – Option 3: all related terms and axioms
  Reference: Xiang et al. (2010) BMC Research Notes, 3:175

Page 12

Extraction Option 1

• Applied when five or fewer terms in an ontology were used by BCGO
• MIREOT: minimum information to reference an external ontology term
  – IRI of the term
  – IRI of the source ontology
  – IRI of the term's parent in the target ontology
  – Can be done manually

Reference: Courtot et al. (2011) Applied Ontology, 6:23
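The three IRIs that MIREOT requires are enough to write a stub for the external term by hand. The sketch below shows one possible way to hold them and emit a small Turtle fragment. The choice of human (NCBITaxon_9606) placed under OBI 'organism' (OBI_0100026), and the use of the IAO 'imported from' property (IAO_0000412) to record the source ontology, are assumptions made for illustration rather than something stated on the slide.

```python
from dataclasses import dataclass

OBO = "http://purl.obolibrary.org/obo/"
# Assumption: IAO 'imported from' is used to record the source ontology.
IMPORTED_FROM = OBO + "IAO_0000412"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


@dataclass(frozen=True)
class MireotTerm:
    """The minimum information needed to reference one external term."""

    term_iri: str             # IRI of the term
    source_ontology_iri: str  # IRI of the source ontology
    target_parent_iri: str    # IRI of the term's parent in the target ontology

    def to_turtle(self):
        """Render the term as a class stub placed under the target parent."""
        return (
            f"<{self.term_iri}> a owl:Class ;\n"
            f"    rdfs:subClassOf <{self.target_parent_iri}> ;\n"
            f"    <{IMPORTED_FROM}> <{self.source_ontology_iri}> .\n"
        )


# Illustrative example: human from NCBITaxon under an assumed OBI parent.
human = MireotTerm(
    term_iri=OBO + "NCBITaxon_9606",
    source_ontology_iri=OBO + "ncbitaxon.owl",
    target_parent_iri=OBO + "OBI_0100026",
)

print(PREFIXES)
print(human.to_turtle())
```

Because only three values per term are involved, this is practical to do manually for a handful of terms, which matches the "five or fewer" rule of thumb.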

Page 13

Extraction Option 2

• Keep hierarchical structure with minimal intermediates
• Example: reference human, mouse, rat in NCBITaxon

[Figure: NCBITaxon hierarchy for human, mouse, and rat under three extraction settings: MIREOT; Include all intermediate classes; Include computed intermediate classes (Option 2). One panel elides "… 14 intermediate classes".]
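One plausible reading of "computed intermediate classes" is to keep only the selected terms, the root, and the branch points where their lineages diverge, then re-attach each kept class to its nearest kept ancestor. The sketch below implements that reading on a heavily simplified taxonomy. It is not OntoFox's actual implementation, and the parent links omit most real NCBITaxon ranks.

```python
# Simplified, illustrative lineage: child -> parent (None marks the root).
PARENT = {
    "Homo sapiens": "Primates",
    "Primates": "Euarchontoglires",
    "Mus musculus": "Muridae",
    "Rattus norvegicus": "Muridae",
    "Muridae": "Rodentia",
    "Rodentia": "Glires",
    "Glires": "Euarchontoglires",
    "Euarchontoglires": None,
}


def path_to_root(term):
    """Return the term followed by all of its ancestors up to the root."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def minimal_hierarchy(selected):
    """Map each kept class to its nearest kept ancestor (None for the root)."""
    selected = set(selected)

    # Every class lying on a path from a selected term to the root.
    induced = set()
    for term in selected:
        induced.update(path_to_root(term))

    # Count children inside the induced hierarchy to find branch points.
    child_count = {term: 0 for term in induced}
    for term in induced:
        parent = PARENT[term]
        if parent is not None:
            child_count[parent] += 1

    # Keep selected terms, the root, and classes with more than one child.
    kept = {
        term
        for term in induced
        if term in selected or PARENT[term] is None or child_count[term] > 1
    }

    # Skip over dropped intermediates when assigning the new parent.
    new_parent = {}
    for term in kept:
        ancestor = PARENT[term]
        while ancestor is not None and ancestor not in kept:
            ancestor = PARENT[ancestor]
        new_parent[term] = ancestor
    return new_parent


result = minimal_hierarchy(["Homo sapiens", "Mus musculus", "Rattus norvegicus"])
for term in sorted(result):
    print(term, "->", result[term])
```

In this toy lineage the single-child intermediates (Primates, Glires, Rodentia) are dropped, while Muridae survives because mouse and rat diverge there.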

Page 14

Extraction Option 3

• Reuse logical axioms of terms defined in source ontologies
• Example: ontology design pattern of cell in CL

Meehan et al. BMC Bioinformatics 2011, 12:6

Page 15

Summary of Extraction Methods And Results

Page 16

Step 3: Integration of Terms Extracted From Different OBO Ontologies (1)

Import retrieved terms into the OBI subset (BCGO community view) under the corresponding parent classes

[Figure: terms of interest in other OBO Foundry ontologies (OntoFox output files; subsets of UBERON, GO, CL, CLO, UO, ChEBI, ...) attached by subClass of (is a) to classes in the Beta Cell Genomics view of OBI (subset of OBI): gross anatomical entity, cellular_component, molecular entity, material entity, specimen, cell, cultured cell, data transformation, biological_process, process, assay, data item, measurement unit label, information content entity, protocol]

- Using owl:imports
- Retrieved terms that belong to the same source ontology are kept in one OWL file
- Contains 2389 classes
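Keeping one OWL file per source ontology and pulling the files in through owl:imports can be sketched as a grouping step followed by one import declaration per group. The base IRI and the '<prefix>_import.owl' file naming below are hypothetical, chosen only to make the example concrete.

```python
from collections import defaultdict

OBO = "http://purl.obolibrary.org/obo/"


def source_prefix(term_iri):
    """Return the ID prefix of an OBO-style IRI, e.g. 'CL' for .../CL_0000169."""
    local_name = term_iri.rsplit("/", 1)[-1]
    return local_name.rsplit("_", 1)[0]


def group_by_source(term_iris):
    """Group retrieved term IRIs by the ontology their ID prefix points to."""
    groups = defaultdict(list)
    for iri in term_iris:
        groups[source_prefix(iri)].append(iri)
    return dict(groups)


def import_lines(groups, base_iri):
    """Build one owl:imports element per source ontology file."""
    # Hypothetical naming scheme: '<prefix>_import.owl' under base_iri.
    return [
        f'<owl:imports rdf:resource="{base_iri}{prefix.lower()}_import.owl"/>'
        for prefix in sorted(groups)
    ]


retrieved = [
    OBO + "CL_0000169",
    OBO + "UBERON_0001264",
    OBO + "UBERON_0000006",
    OBO + "NCBITaxon_9606",
]

groups = group_by_source(retrieved)
for line in import_lines(groups, "http://example.org/bcgo/"):
    print(line)
```

Splitting by source makes it possible to regenerate a single import file when one source ontology is updated, without touching the others.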

Page 17

Step 3: Integration of Terms Extracted From Different OBO Ontologies (2)

To avoid inconsistencies caused by integrating terms that arrive by different paths, we remove the textual and logical definitions of terms that reference external ontologies.

[Figure: PATO terms retrieved from OBI, some marked deprecated, compared with PATO. Steps shown: removal of definitions of PATO terms in the retrieved OBI subset; retrieval of definitions from PATO.]
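The two steps, stripping definitions from external terms carried in the OBI subset and then taking the definitions from the source ontology, can be sketched with plain dictionaries. The records and definition strings below are placeholders invented for illustration, and the field names are assumptions rather than OWL annotation properties.

```python
def strip_external_definitions(subset, home_prefix="OBI"):
    """Return a copy in which non-home terms lose their definitions."""
    cleaned = {}
    for term_id, record in subset.items():
        record = dict(record)  # copy so the input is left untouched
        if not term_id.startswith(home_prefix + ":"):
            record.pop("definition", None)
            record.pop("logical_definition", None)
        cleaned[term_id] = record
    return cleaned


def merge_definitions(cleaned, source_records):
    """Fill in definitions for external terms from their source ontology."""
    merged = {term_id: dict(record) for term_id, record in cleaned.items()}
    for term_id, record in source_records.items():
        if term_id in merged:
            merged[term_id].update(record)
    return merged


# Placeholder records: an OBI term and a PATO term carried in the OBI subset.
obi_subset = {
    "OBI:0000070": {"label": "assay", "definition": "placeholder OBI definition"},
    "PATO:0000001": {
        "label": "quality",
        "definition": "stale copy carried in the OBI subset",
    },
}

# Placeholder record extracted directly from the source ontology.
pato_source = {
    "PATO:0000001": {"definition": "placeholder definition from the PATO release"},
}

cleaned = strip_external_definitions(obi_subset)
merged = merge_definitions(cleaned, pato_source)
print(merged["PATO:0000001"]["definition"])
```

With this ordering each external term ends up with exactly one definition, the one maintained by its own ontology, so a stale copy cannot conflict with it.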

Page 18

Summary of Extraction Methods And Results

Page 19

Step 4: Enrichment of BCGO

• 208 terms could not be matched to OBO Foundry ontologies
• 42 new terms have been added into BCGO
• Example: ‘insulin-expressing mature beta cell’

Meehan et al. BMC Bioinformatics 2011, 12:6

[Figure: definition of ‘insulin-expressing mature beta cell’ in terms of: mature, insulin, islet of Langerhans, insulin secretion, detection of glucose, type B pancreatic cell]

Page 20

Ontology Validation

• Annotation: 83% of terms covered by BCGO
• Competency questions can be answered:
  – Find gene expression data of endocrine cells
  – Find studies using cells which develop from either mesoderm or endoderm
  – Find high throughput sequencing gene expression data in samples obtained during the embryo stage from mouse strains with genetic background C57BL/6J
• Automated cell type classification: ongoing

Page 21

Challenges

• OBO Foundry ontologies use different versions of the upper level ontology, BFO
• Inconsistent representation of the same entities in different OBO Foundry ontologies
  – Example: ‘cell line cell’; alignment work has been done by CL, CLO and OBI developers
  – Resolution: alignment work presented in the ICBO poster session under the title ‘Alignment of Cultured Cell Modeling Across OBO Foundry Ontologies: Key Outcomes and Insights’ by Dr. Matthew Brush

Page 22

Summary

• BCGO is available at: http://purl.obolibrary.org/obo/bcgo.owl
• All related documents are available at: http://code.google.com/p/bcgo-ontology/
• Development of a cross-domain application ontology
  – based on the OBI framework
  – reuses existing reference ontologies and ontology design patterns
• The approach should be generally applicable when using interoperable source ontologies
• Orthogonal, interoperable OBO Foundry ontologies facilitate ontology integration

Page 23

Acknowledgements

• Emily Greenfest-Allen
• Matthew Brush
• OBI, CLO, and CL developers
• Oliver He and Allen Xiang
• Supported by NIH grants 1R01GM093132-01 and 5 U01 DK 072473

Page 24

Questions?

Page 25

Advantages of Using OntoFox

• Provides many different options for ontology term extraction
• Backend RDF store contains all OBO Foundry ontologies and is reloaded daily when they are updated
• Input settings can be saved as a text file and reused