biocuration2012 eugeni belda

From bacterial genome annotation to metabolic pathway curation Eugenio Belda Laboratory of Bioinformatic Analysis in Genomic and Metabolism (LABGeM team) CEA/DSV/IG/Genoscope & CNRS UMR8030

Upload: eugenibc

Post on 11-May-2015


DESCRIPTION

Presentation of Eugeni Belda (LABGeM-Genoscope) at the Biocuration 2012 conference (Georgetown University, Washington DC): From bacterial genome annotation to metabolic pathway curation

TRANSCRIPT

Page 1: Biocuration2012 Eugeni Belda

From bacterial genome annotation to metabolic pathway curation

Eugenio Belda

Laboratory of Bioinformatic Analysis in Genomic and Metabolism (LABGeM team)

CEA/DSV/IG/Genoscope & CNRS UMR8030

Page 2: Biocuration2012 Eugeni Belda

Introduction

Advances in sequencing technologies have allowed an exponential accumulation of complete genome sequences in public databases in recent years.

However, a wide gap exists between the rapid advances in genome sequencing and the slow progress in the characterization of new protein functions.

Genoscope (French National Sequencing Center) has as one fundamental research objective the extension of in silico sequence annotations with experimental characterization of new enzymatic functions (Metabolic Genomics).

Lab. of Genomics & Biochemistry of Metabolism (LGBM)
Lab. of Organic Chemistry and Biocatalysis (LCOB)
Lab. for Enzymatic Cloning and Screening (LCAB)
Lab. of Bioinformatic Analysis in Genomic and Metabolism (LABGeM)

[Figure: 12273 protein families (Pfam) and 4712 enzymatic activities (EC number); callouts: 26% of unknown functions, 25% of orphan reactions]

Page 3: Biocuration2012 Eugeni Belda

Three MicroScope components

[Figure: MicroScope architecture, organised in three components: Data Management, Process Management and Visualization.

Databases and workflow labels: Primary Databanks, Internal Genomic Objects, Computational results, Pathway Genome DataBases, PkGDB, MicroCyc, DB Release, JBPM Database, Functional / relational Analyses, Primary Databank Update, Syntactic Annotations.

MaGe Web Interface labels: Login, Genome browser and Synteny maps, Tutorial, Artemis, Data Export, CGView, LinePlot, Genome overview, Keyword search, Blast and Pattern, Phylogenetic Profile, Fusion / Fission, Tandem duplications, Minimal Gene Set, RGPfinder, SNPs / InDels, KEGG, MicroCyc, Metabolic Profile, Pathway / Synteny, Synton display, Gene editor, Job History, Gene card.]

Vallenet D. et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006

Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009

More than 25 methods, integrated in a workflow management system

=> Full automation of:
• genome annotation
• primary data updates

Page 4: Biocuration2012 Eugeni Belda

EC / reaction correspondence

Pathway Tools (P. Karp, SRI, USA): a metabolic database is built for each annotated microbial genome. PGDB = Pathway/Genome Database (orgname_Cyc)

• Experimentally elucidated metabolic pathways
• 1800 pathways from 2216 organisms

Today: 1233 organisms (of which 676 public genomes)

http://www.genoscope.cns.fr/agc/microcyc

Database Management: Relational DataBase PkGDB (Prokaryotic Genome DataBase)

Mapping on the KEGG metabolic maps (http://www.kegg.jp/)

Page 5: Biocuration2012 Eugeni Belda

www.genoscope.cns.fr/agc/microscope

MicroScope Web site

«guest» access

More than 30 tools are made available to the community

Since 2005, more than 50,000 expert annotations per year

> 1,000 users, 300 active

Page 6: Biocuration2012 Eugeni Belda

Curation of metabolic data in MicroScope

CanOE (Candidate genes for Orphan Enzymes): a method for the automatic integration of genomic and metabolic contexts that assists expert functional annotation, especially in the case of orphan enzymes. It is based on the concept of the metabolon (“close” genes in the genome sequence associated with “close” metabolic reactions):

[Figure: a metabolon, pairing genes on the genome with reactions and compounds in the metabolic network. Labels: gene gaps, reaction gap, orphan functional annotations.]

The method provides candidate genes for global/local orphan enzymatic activities that are located in the “gaps” of metabolons

https://www.genoscope.cns.fr/agc/microscope/metabolism/canoe.php

Boyer et al.; Bioinformatics 2005 Dec 1;21(23):4209-15.
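The metabolon idea can be illustrated with a minimal Python sketch. This is not the published CanOE algorithm, only a toy reading of "candidate genes located in the gaps of metabolons". It assumes a genome given as an ordered list of gene identifiers, a map from annotated genes to reactions, and a map listing which reactions are "close" to the orphan reaction in the network. The identifiers (`g1`..`g5`, `R1`..`R3`), the function name and the `window` parameter are all hypothetical.

```python
from typing import Dict, List, Set


def candidate_genes(
    gene_order: List[str],
    gene_to_reaction: Dict[str, str],
    reaction_neighbours: Dict[str, Set[str]],
    orphan_reaction: str,
    window: int = 1,
) -> List[str]:
    """Return unannotated genes that sit next to genes whose reactions
    are close to the orphan reaction in the metabolic network."""
    neighbours = reaction_neighbours.get(orphan_reaction, set())
    candidates = []
    for i, gene in enumerate(gene_order):
        if gene in gene_to_reaction:
            continue  # already annotated, so not a "gene gap"
        lo = max(0, i - window)
        hi = min(len(gene_order), i + window + 1)
        nearby = gene_order[lo:i] + gene_order[i + 1:hi]
        # Keep the gene if any genomic neighbour catalyses a reaction
        # that is close to the orphan reaction.
        if any(gene_to_reaction.get(g) in neighbours for g in nearby):
            candidates.append(gene)
    return candidates


# Toy data (hypothetical identifiers, for illustration only).
gene_order = ["g1", "g2", "g3", "g4", "g5"]
gene_to_reaction = {"g1": "R1", "g3": "R3"}
reaction_neighbours = {"R2": {"R1", "R3"}}  # R2 is the orphan reaction

result = candidate_genes(gene_order, gene_to_reaction, reaction_neighbours, "R2")
print(result)
```

In this toy genome, `g2` and `g4` are reported because each lies beside a gene annotated with a reaction close to `R2`, while `g5` is not. A real implementation would also need to score candidates and handle strand, operon and multi-genome conservation, none of which is modelled here.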

Page 7: Biocuration2012 Eugeni Belda

Curation of metabolic data in Microscope

CanOE (Candidate genes for Orphan Enzymes)

Example: allantoin degradation metabolon in E. coli K12. EC:2.1.3.5 is a global orphan reaction (not associated with any gene in any organism)

Three candidate genes for EC:2.1.3.5 reaction

None shares any significant similarity with known carbamoyltransferases. Protein expression and biochemical assays are under way

Smith AAT, Belda E., Viari A., Médigue C., and Vallenet D. “The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes” (PLoS Computational Biology, in revision)

Page 8: Biocuration2012 Eugeni Belda

GPR curation interface: In the context of network reconstruction, it is essential to define Gene-Protein-Reaction (GPR) associations, that is, the genes encoding the enzymes, complexes or isozymes that catalyze a particular metabolic reaction:

Thiele & Palsson; Nat Protoc. 2010;5(1):93-121
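As an illustration of what a GPR association encodes, the sketch below stores each rule as an "OR of ANDs": every inner set is one enzyme complex (all genes required), and alternative sets are isozymes. This is a common way to represent GPR rules, not a description of how MicroScope stores them. The reaction and gene names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# One GPR rule: a list of alternative complexes (isozymes);
# each complex is the set of genes that must all be present.
GPR = List[FrozenSet[str]]


def reaction_is_supported(gpr: GPR, present_genes: Set[str]) -> bool:
    """True if at least one complete complex is encoded in the genome."""
    return any(complex_genes <= present_genes for complex_genes in gpr)


# Hypothetical rules, for illustration only.
gprs: Dict[str, GPR] = {
    "RXN_A": [frozenset({"geneA"})],                       # single enzyme
    "RXN_B": [frozenset({"geneB1", "geneB2"})],            # two-subunit complex
    "RXN_C": [frozenset({"geneC"}), frozenset({"geneD"})], # two isozymes
}

genome = {"geneA", "geneB1", "geneD"}
supported = {rxn: reaction_is_supported(rule, genome) for rxn, rule in gprs.items()}
print(supported)
```

Here `RXN_B` is not supported because one subunit of the complex is missing, whereas `RXN_C` is supported through one of its two isozymes.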

Curation of metabolic data in Microscope

Page 9: Biocuration2012 Eugeni Belda

GPR curation interface: The gene curation interface of MicroScope allows the validation of Gene-Reaction associations based on curated gene annotations. Two reference reaction resources are available: MetaCyc (functional) and Rhea (under development):

Automatic retrieval of MetaCyc/Rhea reactions based on EC number (example: 4.1.3.27, 2.4.2.18)

Keyword search

Curation of metabolic data in Microscope

Page 10: Biocuration2012 Eugeni Belda

Pathway validation interface: Validation/curation of automatically projected MetaCyc pathways based on Gene-Reaction associations:

Curation of metabolic data in Microscope

Page 11: Biocuration2012 Eugeni Belda

Microme project: www.microme.eu

A Knowledge-Based Bioinformatics Framework for Microbial Pathway Genomics

Purpose: develop bioinformatics infrastructures, together with a projection and curation process, in order to generate:
- complete metabolic pathways from genome annotations
- whole-cell metabolic models from pathway assemblies

Experimental validation of metabolic models using growth phenotype data (e.g. BIOLOG experiments) generated within the project for a subset of selected species.

Analytical tools are integrated for comparative and phylogenetic analysis based on projected pathways and metabolic models

AMAbiotics

CEA-Genoscope

Center for research and Technology Hellas

ISTHMUS

Molecular Networks

Swiss Institute of Bioinformatics

Wellcome Trust Sanger Institute

Wageningen University

Université Libre de Bruxelles

Tel-Aviv University

Spanish National Cancer Centre

German Collection of Microorganisms and Cell Cultures

European Bioinformatics Institute

Centro Nacional de Biotecnología

Page 12: Biocuration2012 Eugeni Belda

Microme WP2: Objectives

Unification of existing metabolic resources:

Pivot resources: ChEBI (chemical compounds) and Rhea (chemical reactions)

Cross-referenced external resources (compounds, reactions, pathways): KEGG, MetaCyc, metabolic models

Alcantara R., Axelsen K.B., Morgat A., Belda E., Coudert E., Bridge A., Cao H., de Matos P., Ennis M., Turner S., Owen G., Bougueleret L., Xenarios I., and Steinbeck C. (2012) Rhea - a manually curated resource of biochemical reactions. Nucleic Acids Research. 40, D754-D760, Database issue.

Provide EU with a curated microbial metabolic resource

Implement a unique cyclic and collaborative curation process for metabolic data

MicroScope and Microme

Use MicroScope as the reference resource of curated GPR (Gene-Protein-Reaction) associations for the microbial genomes included in the Microme project

Development of novel interfaces for GPR curation in the MicroScope environment. Retrieval of MetaCyc and Rhea reactions for a particular gene object from EC number annotations

Page 13: Biocuration2012 Eugeni Belda

MicroScope and Microme

Development of web-services to provide Microme partners with curated Gene-Reaction associations from the MicroScope platform

[Figure labels: Curation tool; PkGDB; MicroCyc; Reconstruction; Each night; Web-services]

Page 14: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

Second most intensively studied bacterium after Escherichia coli, being a model organism for Gram-positive bacteria

Genome sequenced in 1997: 4.214 megabases, 4000 CDSs (Nature 1997 Nov 20;390(6657):249-56)

Re-sequencing and first re-annotation of the genome in 2009 (Microbiology (2009), 155, 1758-1775)

Re-annotation of the genome in the context of the Microme project, with a special focus on the curation of Gene-Reaction associations using the MicroScope metabolic tools and curation interface. Collaborative work: LABGeM (CEA), SIB, AMAbiotics (Antoine Danchin)

Page 15: Biocuration2012 Eugeni Belda

531 CDSs: Predicted MetaCyc reaction; BBH relationship with E. coli CDSs

378 CDSs: Predicted MetaCyc reaction; no BBH relationship with E. coli CDSs

508 CDSs: "Putative enzymes" in Product type annotation; no predicted MetaCyc reaction

310 CDSs: "Enzymes" in Product type annotation; no predicted MetaCyc reaction

Starting data for curation of Gene-Reaction associations

Test-case: Bacillus subtilis 168 re-annotation

Page 16: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

From the 909 CDS with predicted reaction:

531 with BBH in E. coli:
- 416 with same GPR in B. subtilis and E. coli (EcoCyc)
- 115 with different GPR in B. subtilis and E. coli (EcoCyc)

378 without BBH in E. coli:
- 254 with GPR predicted from the curated EC number
- 124 with GPR predicted from “product” annotation

310 CDS with “enzyme” annotation and without predicted reaction

508 CDS with “putative enzyme” annotation and without predicted reaction: filter by Catalytic activity field in SwissProt annotations (41 CDSs)

Automatic validation of Gene-Reaction associations

Manual curation of Gene-Reaction associations in the MicroScope environment

Sequence similarity profiles

Genomic context conservation

Integration of genomic and metabolic context (CanOE strategy)

Co-evolution patterns of functionally related genes
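The starting classes used for this curation can be summarised as a small triage function. The sketch below is only an illustration of the partition described on these slides; the `CDS` record, its field names and the `triage` function are hypothetical, and the counts are copied from the slides to check that the sub-classes add up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CDS:
    label: str
    has_predicted_reaction: bool
    has_ecoli_bbh: bool
    product_type: Optional[str] = None  # "enzyme", "putative enzyme" or None


def triage(cds: CDS) -> str:
    """Assign a CDS to one of the starting classes for GPR curation."""
    if cds.has_predicted_reaction:
        return "predicted, BBH" if cds.has_ecoli_bbh else "predicted, no BBH"
    if cds.product_type in ("enzyme", "putative enzyme"):
        return f"{cds.product_type}, no predicted reaction"
    return "out of scope"


# Counts reported on the slides.
with_bbh = {"same GPR as E. coli": 416, "different GPR": 115}
without_bbh = {"from curated EC number": 254, "from product annotation": 124}

total_with_bbh = sum(with_bbh.values())
total_without_bbh = sum(without_bbh.values())
total_predicted = total_with_bbh + total_without_bbh
print(total_with_bbh, total_without_bbh, total_predicted)

example = triage(CDS("hypothetical_cds", False, False, "putative enzyme"))
print(example)
```

The sub-class counts sum to 531 and 378, which together give the 909 CDS with a predicted reaction.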

Page 17: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

Problems associated with automatic predictions of Gene-Reaction associations. Example: a generic EC number definition associated with multiple specific reaction instances in MetaCyc

No experimental evidence of activity; generic product annotation

17 predicted reactions based on the EC:1.2.1.3 annotation, which is problematic for modelling purposes

Without experimental evidence of specific substrates, only the generic reaction has been validated
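The curation rule stated here (keep only the generic reaction unless specific substrates are experimentally supported) can be written as a short filter. This is a sketch of that rule under simple assumptions, not the MicroScope implementation: the `Reaction` record, the `select_reactions` function and the reaction identifiers are hypothetical placeholders, and only three specific instances are listed instead of the full set predicted for the EC number.

```python
from typing import List, NamedTuple, Set


class Reaction(NamedTuple):
    reaction_id: str
    is_generic: bool  # True for the class-level reaction of the EC number


def select_reactions(candidates: List[Reaction], supported_ids: Set[str]) -> List[str]:
    """Keep specific reactions that have experimental support;
    if none has, fall back to the generic reaction only."""
    specific = [
        r.reaction_id
        for r in candidates
        if not r.is_generic and r.reaction_id in supported_ids
    ]
    if specific:
        return specific
    return [r.reaction_id for r in candidates if r.is_generic]


# Hypothetical candidates predicted from one generic EC number.
candidates = [
    Reaction("GENERIC_RXN", True),
    Reaction("SPECIFIC_RXN_1", False),
    Reaction("SPECIFIC_RXN_2", False),
    Reaction("SPECIFIC_RXN_3", False),
]

no_evidence = select_reactions(candidates, set())
with_evidence = select_reactions(candidates, {"SPECIFIC_RXN_2"})
print(no_evidence, with_evidence)
```

With no supported substrate, only `GENERIC_RXN` is retained, so the metabolic model is not inflated with unverified specific reactions.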

Page 18: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

Stats of curation of Gene-Reaction associations in MicroScope

[Bar chart, values recovered from the figure]

Nº Gene-Reaction associations: 1549 initial predictions (Pathway Tools); 1406 (715) current associations (manually curated)

Nº CDS: 901 initial; 1006 (517) current

Nº reactions: 1022 initial; 985 (388) current

105 CDS without automatically predicted reaction in initial projections

147 new reactions added (not originally predicted)

184 originally predicted reactions removed
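The chart values on this slide can be cross-checked with simple arithmetic. The sketch below assumes that 1549, 901 and 1022 are the initial Pathway Tools predictions and that 1406, 1006 and 985 are the manually curated values. That reading is an inference from the chart legend, supported by the fact that the reported additions and removals reconcile the two series. The parenthesised figures on the chart are not interpreted here.

```python
# Values read from the slide; the series assignment is an assumption.
initial = {"associations": 1549, "cds": 901, "reactions": 1022}
current = {"associations": 1406, "cds": 1006, "reactions": 985}

cds_without_initial_prediction = 105
reactions_added = 147
reactions_removed = 184

# CDS: the 105 newly associated CDS account for the whole increase.
cds_check = initial["cds"] + cds_without_initial_prediction

# Reactions: additions minus removals account for the net decrease.
reactions_check = initial["reactions"] + reactions_added - reactions_removed

net_change = {key: current[key] - initial[key] for key in initial}
print(cds_check, reactions_check, net_change)
```

Both checks reproduce the curated totals (1006 CDS and 985 reactions), and the net effect of curation is fewer associations and reactions spread over more CDS.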

Page 19: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

13 possible new metabolic pathways/pathway variants not present in MetaCyc

New pathway variants:
• Biotin biosynthesis pathway variant
• Lipoate biosynthesis pathway variant
• Myoinositol catabolism pathway variant
• Rhamnogalacturonan type I degradation pathway variant
• Acetoin dehydrogenase pathway variant
• Methionine salvage pathway variant
• Aerobic respiration pathway variants

New metabolic pathways:
• Aromatic polyketide biosynthesis pathway
• 2-methylthio-N6-threocarbamoyladenosine biosynthesis
• Bacilysocin biosynthesis
• Archaeal-type ether lipid biosynthesis
• Bacillaene biosynthesis pathway
• Methionine-Cysteine interconversion

17 possible updates of SwissProt annotations

6 possible new EC numbers

Reported to SwissProt/IUBMB curators

Page 20: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

Biotin biosynthesis pathway variant: update of the DAP aminotransferase pathway variant (EC:2.6.1.62)

KEGG pathway (map00780) and MetaCyc pathway (PWY-5005): S-adenosyl-L-methionine as amino group donor

Bacillus subtilis BioA enzyme: L-lysine instead of S-adenosyl-L-methionine as amino group donor

Page 21: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

Biotin biosynthesis pathway variant: link with fatty acid metabolism. Improvement of genome-scale metabolic models

iBsu1103: Most up-to-date B. subtilis 168 metabolic model (SEED methodology; 1437 reactions, 1103 genes). Henry CS, Zinner JF, Cohoon MP, Stevens RL. Genome Biol. 2009;10(6):R69

[Figure labels: Dead-end metabolite; Not included in Biomass equation; Auxotrophic for Biotin biosynthesis; exchange reactions EX_pimelate and EX_biotin]

FBA simulations, iBsu1103 model (biomass production rate):

iBsu1103: 122.97
iBsu1103; Biotin in Biomass: 0.00
iBsu1103; External influx Pimelate: 122.97
iBsu1103; External influx Biotin: 122.97

Page 22: Biocuration2012 Eugeni Belda

Test-case: Bacillus subtilis 168 re-annotation

BioI enzyme of B. subtilis 168: cytochrome P450 protein that catalyzes the oxidative cleavage of acyl-ACP/free fatty acid molecules generated in the context of fatty acid biosynthesis yielding pimeloyl-ACP as primary product.

[Figure: fatty acid metabolism supplies an acyl-ACP or a fatty acid, which BioI (BSU30190) converts to pimeloyl-ACP. The BioF (BSU30220) step is shown with L-alanine + H+ as co-substrates and CO2 + holo-ACP as by-products.]
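The gap described on the last two slides can be illustrated qualitatively without a full flux balance analysis. The sketch below uses a simple network expansion: starting from a set of available metabolites, it repeatedly fires every reaction whose substrates are all available. This is a stand-in for FBA, not the method used for the iBsu1103 simulations. The network is a deliberately tiny toy: the metabolite names are simplified, and the steps after BioF are lumped into a single hypothetical `downstream` reaction.

```python
from typing import Dict, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (substrates, products)


def producible(seeds: Set[str], reactions: Dict[str, Reaction]) -> Set[str]:
    """Network expansion: metabolites reachable from the seed set."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


# Toy network (simplified, for illustration only).
toy: Dict[str, Reaction] = {
    "BioF": ({"pimeloyl_ACP", "L_alanine"}, {"biotin_precursor", "CO2", "holo_ACP"}),
    "downstream": ({"biotin_precursor"}, {"biotin"}),  # lumped later steps
}
bioI: Dict[str, Reaction] = {"BioI": ({"acyl_ACP"}, {"pimeloyl_ACP"})}

seeds = {"acyl_ACP", "L_alanine"}

without_bioI = "biotin" in producible(seeds, toy)
with_bioI = "biotin" in producible(seeds, {**toy, **bioI})
with_biotin_influx = "biotin" in producible(seeds | {"biotin"}, toy)
print(without_bioI, with_bioI, with_biotin_influx)
```

Without a source of pimeloyl-ACP the toy network cannot reach biotin, which mirrors the auxotrophy seen in the model. Adding the BioI reaction, or supplying biotin externally, closes the gap.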

Page 23: Biocuration2012 Eugeni Belda

Future work

Extension of the reference set of Microme species to:
• Acinetobacter sp. ADP1
• Pseudomonas putida KT2440
• Bacillus subtilis 168

Second version of the Gene-Reaction curation interface in the MicroScope environment:
• Curation of protein complexes / isozyme sets
• Management of Rhea reactions in addition to MetaCyc reactions

Definition of strategies for vertical annotation and propagation of curated GPR across multiple microbial genomes

Use UniPathway as the reference resource of metabolic pathways in MicroScope; species-specific pathway representations based on combinations of pathway modules (http://www.unipathway.org)

Page 24: Biocuration2012 Eugeni Belda

Contributions

Claudine Médigue (Group Leader)
David Vallenet (Researcher)
Damien Monrico (Engineer)
François Lefèvre (Engineer)
Alexander T. Smith (PhD)
Eugeni Belda (Post doc)

IT team: Claude Scarpelli, Ludovic Fleury

External partners: Anne Morgat, Antoine Danchin

Funding

EU Framework Programme 7 Collaborative Project. Grant Agreement Number 222886-2