Post on 20-Jan-2016


Peter D. Karp, SRI International (ecocyc.org, biocyc.org, metacyc.org)

Page 1: Metabolomics Applications of the BioCyc Databases and Pathway Tools Software

© 2014 SRI International

Metabolomics Applications of the BioCyc Databases and Pathway Tools Software


Overview

• Overview of MetaCyc family of Pathway/Genome Databases (PGDBs)
  – BioCyc collection: EcoCyc, MetaCyc, HumanCyc, etc.
  – Curated PGDBs for Arabidopsis, Yeast, Mouse, Fly, etc.

• Overview of Pathway Tools software

• Automatic generation of metabolic-flux models


MetaCyc Family of Pathway/Genome Databases

• 6,000+ databases from many institutions
• All domains of life with microbial emphasis
• Genomes plus predicted metabolic pathways

• DBs derived from MetaCyc via computational pathway prediction

Common schema

Common controlled vocabularies

Managed using Pathway Tools software

Archives of Toxicology 85:1015 2011

BioCyc.org: 3,500

MetaCyc Family: 6,000+


Curated Databases Within the MetaCyc Family

Database   Organism       Organization               Publications Curated From
MetaCyc    Multiorganism  SRI                        40,000
EcoCyc     E. coli        SRI                        25,000
HumanCyc   H. sapiens     SRI
AraCyc     A. thaliana    TAIR/Carnegie Institution  2,282
YeastCyc   S. cerevisiae  SGD/Stanford/SRI           565
MouseCyc   M. musculus    MGD/Jackson Laboratory

http://biocyc.org/otherpgdbs.shtml


Pathway Tools Software

• Comprehensive systems biology software environment
• Create and maintain an organism database integrating genome, pathway, regulatory information
  – Computational inference tools
  – Interactive editing tools
• Query and visualize that database
• Generate steady-state metabolic flux models
  – Flux-balance analysis
• Interpret omics datasets
• Comparative analysis tools
• Licensed by 5,000+ groups


Motivations: Management of Metabolic Pathway Data

• Organize growing corpus of data on metabolic pathways
  – Experimentally elucidated pathways in the biomedical literature
  – Computationally predicted pathways derived from genome data

• Provide software tools for querying and comprehending this complex information space

• Multiorganism view: MetaCyc
  – Unique, experimentally elucidated pathways across all organisms
  – Reference database for computational pathway prediction

• Organism-specific view:
  – Organism-specific Pathway/Genome Databases
  – Detailed qualitative models of metabolic networks
  – Combine computational predictions with experimentally determined pathways


Model Organism Databases / Organism-Specific Databases

• DBs that describe the genome and other information about an organism

• Every sequenced organism with an active experimental community requires a MOD
  – Integrate genome data with information about the biochemical and genetic network of the organism
  – Integrate literature-based information with computational predictions

• Accurate metabolic modeling requires a curation effort


Rationale for MODs

• Each “complete” genome is incomplete in several respects:
  – 40%-60% of genes have no assigned function
  – Roughly 7% of those assigned functions are incorrect
  – Many assigned functions are non-specific

• Need continuous updating of annotations with respect to new experimental data and computational predictions
  – Gene positions, sequence, gene functions, regulatory sites, pathways

• MODs are platforms for global analyses of an organism
  – Interpret omics data in a pathway context
  – In silico prediction of essential genes
  – Characterize systems properties of metabolic and genetic networks


Pathway/Genome Database

[Diagram: contents of a Pathway/Genome Database for a cell]

• Chromosomes, Plasmids
• Genes
• Proteins, RNAs
• Reactions
• Pathways
• Compounds
• Regulation: Operons, Promoters, DNA Binding Sites, Regulatory Interactions
• Sequence Features


BioCyc Collection of 3,000 Pathway/Genome Databases

• Pathway/Genome Database (PGDB): combines information about
  – Pathways, reactions, substrates
  – Enzymes, transporters
  – Genes, replicons
  – Transcription factors/sites, promoters, operons

• Tier 1: Literature-Derived PGDBs
  – MetaCyc, HumanCyc, YeastCyc
  – EcoCyc: Escherichia coli K-12
  – AraCyc: Arabidopsis thaliana

• Tier 2: Computationally-derived DBs, Some Curation (34 PGDBs)
  – Bacillus subtilis, Mycobacterium tuberculosis

• Tier 3: Computationally-derived DBs, No Curation (~3,000 PGDBs)


Obtaining a PGDB for Organism of Interest

• Find existing PGDB in BioCyc
• Find existing PGDB from larger MetaCyc family of PGDBs
  – http://biocyc.org/otherpgdbs.shtml
• Download from PGDB registry
  – http://biocyc.org/registry.html

• Create your own PGDB


Pathway Tools Software: PGDBs Created Outside SRI

• 4,000+ licensees: 250 groups applying software to 1,700 organisms

• Saccharomyces cerevisiae, SGD project, Stanford University
  – 135 pathways / 565 publications -- BioCyc.org
• FungiCyc, Broad Institute
• Candida albicans, CGD project, Stanford University
• dictyBase, Northwestern University
• Mouse, MGD, Jackson Laboratory -- BioCyc.org
• Drosophila, FlyBase, Harvard University -- BioCyc.org
• Under development:
  – C. elegans, WormBase

• Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
  – 288 pathways / 2,282 publications -- BioCyc.org
• PlantCyc: Poplar, Cassava, Corn, Grape, Soy, Carnegie Institution
• Six Solanaceae species, Cornell University
• GrameneDB: Rice, Sorghum, Maize, Cold Spring Harbor Laboratory
• Medicago truncatula, Samuel Roberts Noble Foundation
• ChlamyCyc, GoFORSYS


Pathway Tools Software: PGDBs Created Outside SRI

• M. Bibb, John Innes Centre, Streptomyces coelicolor
• F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
• Genoscope, Acinetobacter
• R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
• Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
• Sergio Encarnacion, UNAM, Sinorhizobium meliloti
• Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis


Pathway Tools Software: PGDBs Created Outside SRI

• Large scale users:
  – C. Medigue, Genoscope, 500+ PGDBs
  – J. Zucker, Broad Inst, 94 PGDBs
  – G. Sutton, J. Craig Venter Institute, 80+ PGDBs
  – G. Burger, U Montreal, 60+ PGDBs
  – E. Uberbacher, ORNL, 33 Bioenergy-related organisms
  – Bart Weimer, UC Davis, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes

• Partial listing of outside PGDBs at http://biocyc.org/otherpgdbs.shtml


EcoCyc Project – EcoCyc.org

• E. coli Encyclopedia
  – Review-level Model-Organism Database for E. coli
  – Derived from 25,000 publications

• “Multi-dimensional annotation of the E. coli K-12 genome”
  – Gene product summaries and literature citations
  – Evidence codes
  – Gene Ontology terms
  – Protein features (active sites, metal ion binding sites)
  – Multimeric complexes
  – Metabolic pathways
  – Regulation of gene expression and of protein activity
  – Gene essentiality data
  – Growth under alternative nutrient conditions

Nuc. Acids Res. 41:D605 2013

Karp, Gunsalus, Collado-Vides, Paulsen


EcoCyc = E.coli Dataset + Pathway/Genome Navigator

Genes: 4,499

Monomers: 4,389
Complexes: 976
RNAs: 301

Reactions:
  Metabolic: 1,600
  Transport: 370

Pathways: 312

Compounds: 2,400

Regulation:
  Operons: 4,500
  Trans Factors: 222
  Promoters: 3,770
  TF Binding Sites: 2,700
  Reg Interactions: 5,900

Citations: 24,000

EcoCyc v17.0

URL: EcoCyc.org


Perspective 1: EcoCyc as Online Encyclopedia

• All gene products for which experimental literature exists are curated with a minireview summary
  – 3,730 gene products contain summaries
  – Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more

• Additional summaries and other data found in pages for genes, operons, pathways

• Quick Search


Perspective 2: EcoCyc as Queryable Database

• High-fidelity knowledge representation amenable to structured queries
• 333 database fields capture object properties and relationships
• Each molecular species defined as a DB object
  – Genes, proteins, small molecules
• Each molecular interaction defined as a DB object
  – Metabolic and transport reactions, regulation
• Extensive search tools
  – Object-specific search: Search menu
  – Advanced search: Search -> Advanced


Perspective 3: EcoCyc as Predictive Metabolic Model

• A steady-state quantitative model of E. coli metabolism can be generated from EcoCyc

• Predicts phenotypes of E. coli knock-outs, and growth/no-growth of E. coli on different nutrients

• Model is updated on each EcoCyc release

• Serves as a quality check on the EcoCyc data


EcoCyc Accelerates Science

• Experimentalists
  – E. coli experimentalists
  – Experimentalists working with other microbes
  – Analysis of expression data
• Computational biologists
  – Biological research using computational methods
  – Genome annotation
  – Study properties of E. coli metabolic and regulatory networks
• Bioinformaticists
  – Training and validation of new bioinformatics algorithms: predict operons, promoters, protein functional linkages, protein-protein interactions
• Metabolic engineers
  – “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
• Educators
  – Microbiology and metabolism education


Recent Developments in EcoCyc

• EcoCyc contains six knock-out datasets for E. coli containing 13,000 growth observations

Reference  Medium                      No Growth  Growth  Indeterminate
Gerdes03   LB enriched, aerobic        614        3082    2
Baba06     LB Lennox, aerobic          300        3906    1
Baba06     MOPS+0.4% glucose, aerobic  460        4823    92
Feist07    MOPS+0.4% glucose, aerobic  460        4823    92
Patrick07  M9+0.4% glucose, aerobic    107
Joyce06    M9+1% glucose, aerobic      118        3763    1


Recent Developments in EcoCyc: Growth-Observation Data

• EcoCyc contains 1831 growth observations under 522 conditions for E. coli

• Substantial number of discrepancies
  – 45 cases remain where growth status is unclear

Data source                          Aerobic  Anaerobic
Low throughput data from literature  23       0
Low throughput data from our group   20       0
PM data from literature              1244     191
PM data from our group               353      0


MetaCyc: Metabolic Encyclopedia

• Describes experimentally determined metabolic pathways, reactions, enzymes, and compounds

• Literature-based DB with extensive references and commentary

• MetaCyc vs BioCyc: Experimentally elucidated pathways

• Jointly developed by
  – P. Karp, R. Caspi, C. Fulcher, SRI International
  – L. Mueller, A. Pujar, Boyce Thompson Institute
  – S. Rhee, P. Zhang, Carnegie Institution

Nucleic Acids Research 2012 Database Issue


MetaCyc Data -- Version 18.0

Pathways         2,200
Reactions        11,700
Enzymes          9,700
Small Molecules  11,000
Organisms        2,500
Citations        40,600

“A Systematic Comparison of the MetaCyc and KEGG Pathway Databases,” BMC Bioinformatics 2013 14(1):112


Taxonomic Distribution of MetaCyc Pathways (Version 17.5)

Bacteria 1,130

Green Plants 830

Fungi 300

Metazoa 275

Archaea 148


Comparison with KEGG

• KEGG vs MetaCyc: Reference pathway collections
  – KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
    • KEGG maps contain multiple biological pathways
    • KEGG maps are composites of pathways in many organisms -- do not identify what specific pathways elucidated in what organisms
    • Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than from a KEGG pathway
  – KEGG has few literature citations, few comments, less enzyme detail

• KEGG vs organism-specific PGDBs
  – KEGG does not curate or customize pathway networks for each organism
  – Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis


Pathway Tools


Pathway Tools Software

[Diagram of Pathway Tools components: Annotated Genome + MetaCyc, PathoLogic, Pathway/Genome Database, Pathway/Genome Editors, Pathway/Genome Navigator, MetaFlux]

Briefings in Bioinformatics 11:40-79 2010


Pathway Tools Enables Multi-Use Metabolic Databases

• Zoomable Metabolic Map
• Queryable Database
• Omics Data Analysis
• Metabolic Model
• Encyclopedia


Pathway Tools Software: PathoLogic

• Computational creation of new Pathway/Genome Databases

• Transforms genome into Pathway Tools schema and layers inferred information above the genome

• Predicts operons
• Predicts metabolic network
• Predicts which genes code for missing enzymes in metabolic pathways
• Infers transport reactions from transporter names


Pathway Tools Software: Pathway/Genome Editors

• Interactively update PGDBs with graphical editors

• Support geographically distributed teams of curators with object database system

• Gene and protein editor
• Reaction editor
• Compound editor
• Pathway editor
• Operon editor
• Publication editor


Pathway Tools Software: Pathway/Genome Navigator

• Querying and visualization of:
  – Pathways
  – Reactions
  – Metabolites
  – Genes/Proteins/RNA
  – Regulatory interactions
  – Chromosomes

• Modes of operation:
  – Web mode
  – Desktop mode
  – Most functionality shared


Pathway Tools Software: MetaFlux

• Speeds development of genome-scale metabolic flux models
• Steady-state quantitative flux-models generated directly from PGDBs
• Computed reaction fluxes can be painted onto metabolic overview diagram
• Multiple gap filler accelerates model development by suggesting model completions:
  – Reactions to add from MetaCyc
  – Additional nutrients and secreted compounds


Pathway Tools Schema / Ontology

• 1064 classes
  – Datatype classes such as:
    • Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  – Taxonomies for Pathways, Reactions, Compounds
  – Cell Component Ontology
  – Evidence Ontology

• 308 attributes and relationships
• Span genome, metabolism, regulatory information
  – Meta-data: Creator, Creation-Date
  – Comment, Citations, Common-Name, Synonyms
  – Attributes: Molecular-Weight, DNA-Footprint-Size
  – Relationships: Catalyzes, Component-Of, Product


Pathway Prediction

• Pathway prediction is useful because
  – Pathways organize the metabolic network into mentally tractable units
  – Pathways guide us to search for missing enzymes
  – Pathway inference fills in holes in the metabolic network
  – Pathways can be used for analysis of high-throughput data
    • Visualization, enrichment analysis

• Pathway prediction is hard because
  – Reactome inference is imperfect
  – Some reactions present in multiple pathways
  – Pathway variants share many reactions in common
  – Increasing size of MetaCyc


Reactome Inference

• For each protein in the organism, infer reaction(s) it catalyzes

• Protein functions can be specified in three ways:
  – Enzyme names (protein functions) (uncontrolled vocabulary)
  – EC numbers
  – Gene Ontology terms

• Detect conflicts among this information
  – Example:
    • Yersinia pseudotuberculosis PB1
    • 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase / EC 4.1.1.71


Enzyme Name Matching

• Extraneous information found in gene product names

• Putative carbamate kinase, alpha subunit
• Carbamate kinase (abcD)
• Carbamate kinase (3.2.1.4)
• Monoamine oxidase B

• bifunctional proline dehydrogenase/pyrroline-5-carboxylate dehydrogenase
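The example names above suggest one way to clean a gene product name before looking it up in a list of known enzyme names. The following is a minimal sketch and not PathoLogic's actual matcher: the function name `candidate_enzyme_names`, the qualifier and subunit word lists, and the stripping rules are all assumptions chosen to cover the five examples on this slide.

```python
import re

# Hedging words that say nothing about the catalyzed reaction (assumed list).
_QUALIFIERS = r"(?:putative|probable|possible|predicted|bifunctional)"
# Subunit designations such as ", alpha subunit" (assumed list).
_SUBUNIT = r",?\s*\b(?:alpha|beta|gamma|delta|large|small)\s+(?:subunit|chain)\b"


def candidate_enzyme_names(product_name):
    """Return lower-cased enzyme names with extraneous information removed."""
    s = product_name.strip()
    # Remove parenthesized gene names or numbers that follow a space,
    # e.g. " (abcD)" or " (3.2.1.4)"; a name such as "NAD(P)H" is left alone.
    s = re.sub(r"\s+\([^)]*\)", "", s)
    s = re.sub(_SUBUNIT, "", s, flags=re.IGNORECASE)
    s = re.sub(rf"^(?:{_QUALIFIERS}\s+)+", "", s, flags=re.IGNORECASE)
    # A multifunctional protein lists its activities separated by "/".
    return [part.strip().lower() for part in s.split("/") if part.strip()]
```

Note that the trailing "B" in "Monoamine oxidase B" is kept on purpose, since it is part of the enzyme name rather than extraneous information.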


Inference of Metabolic Pathways

• For each pathway in MetaCyc consider
  – What fraction of its reactions are present in the just-inferred reactome of the organism?
  – Are enzymes present for reactions unique to the pathway?
  – Are enzymes present for designated “key reactions” within MetaCyc pathways?
    • Calvin cycle / ribulose bisphosphate carboxylase
  – Is a given pathway outside its designated taxonomic range?
    • Calvin cycle: green plants, green algae, etc.
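The four questions above can be collected into one evidence record per reference pathway. This is a sketch under stated assumptions: the `Pathway` class, the `pathway_evidence` function, the reaction identifiers, and the taxon labels are hypothetical, and the slide does not say how PathoLogic weighs these pieces of evidence, so no decision rule is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    name: str
    reactions: set
    key_reactions: set = field(default_factory=set)
    taxonomic_range: set = field(default_factory=set)  # empty means unrestricted


def pathway_evidence(pathway, all_pathways, reactome, lineage):
    """Collect the four kinds of evidence for one reference pathway."""
    present = pathway.reactions & reactome
    # Reactions that occur in no other reference pathway.
    elsewhere = set()
    for other in all_pathways:
        if other is not pathway:
            elsewhere |= other.reactions
    unique = pathway.reactions - elsewhere
    in_range = not pathway.taxonomic_range or bool(pathway.taxonomic_range & lineage)
    return {
        "fraction_present": len(present) / len(pathway.reactions),
        "unique_present": len(unique & reactome),
        "key_reactions_present": pathway.key_reactions <= reactome,
        "in_taxonomic_range": in_range,
    }


# Toy data: reaction identifiers and taxon names are illustrative only.
calvin = Pathway("Calvin cycle", {"RUBISCO", "PRK", "GAPDH", "PGK"},
                 key_reactions={"RUBISCO"},
                 taxonomic_range={"green plants", "green algae"})
glycolysis = Pathway("glycolysis", {"PFK", "GAPDH", "PGK"})
evidence = pathway_evidence(calvin, [calvin, glycolysis],
                            reactome={"PFK", "GAPDH", "PGK"},
                            lineage={"Bacteria"})
```

In this toy case half of the Calvin cycle reactions are present only because they are shared with glycolysis; the key reaction is missing and the organism lies outside the taxonomic range, which is the situation the last three questions are meant to catch.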

Standards in Genomic Sciences 5:424-429 2011


Evaluation of Pathway Inference

• Define gold-standard pathway prediction set
  – E. coli, Yeast, Arabidopsis, Synechococcus, Mouse
  – Positive and negative pathways

• PathoLogic achieved 91% accuracy

BMC Bioinformatics 11:15 2010


Comparison with KEGG

• KEGG vs MetaCyc: Reference pathway collections
  – KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
    • KEGG maps contain multiple biological pathways
    • KEGG maps are composites of pathways in many organisms -- do not identify what specific pathways elucidated in what organisms
  – KEGG modules are incomplete
  – KEGG has few literature citations, few comments, less enzyme detail

• KEGG vs organism-specific PGDBs
  – KEGG does not curate or customize pathway networks for each organism
  – Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis

• KEGG algorithms
  – Not published; accuracy unknown


Pathway Analysis of Metagenomes

• Bin the metagenome data and create separate PGDBs for each organism
  – Hallam lab

• Compute list of all pathways present in the metagenome


Analysis of High Throughput Datasets

Genome-scale visualizations of cellular networks

• Generated automatically from PGDB
• Magnify, interrogate
• Omics viewers paint omics data onto overview diagrams
  – Different perspectives on same dataset
  – Use animation for multiple time points or conditions


Cellular Overview Diagram

• Combines metabolic map and transporters
• Automatically generated, organism-specific
• Zoomable, queryable


E. coli Cellular Overview


Omics Data Graphing on Cellular Overview


Genome Overview


Genome Poster


Genome Overview


Regulatory Overview

• Show regulatory relationships among gene groups


Regulatory Omics Viewer


The Atom-Mapping Problem

• Definition: An atom mapping is a bijection from reactant atoms to product atoms that specifies the terminus of each reactant atom, that is, the product atom it becomes

• MetaCyc v17.5 contains 10,300 atom mappings
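The definition can be checked mechanically. The sketch below assumes a simple representation that is not taken from MetaCyc: atoms are numbered, each side of the reaction is a dictionary from atom index to element symbol, and the mapping is a dictionary from reactant index to product index. The function name `is_atom_mapping` is hypothetical.

```python
def is_atom_mapping(reactant_atoms, product_atoms, mapping):
    """Check that `mapping` is an element-preserving bijection.

    reactant_atoms / product_atoms: dict of atom index -> element symbol.
    mapping: dict of reactant atom index -> product atom index.
    """
    if set(mapping) != set(reactant_atoms):
        return False  # every reactant atom needs exactly one image
    if sorted(mapping.values()) != sorted(product_atoms):
        return False  # every product atom must be hit exactly once
    return all(reactant_atoms[r] == product_atoms[p] for r, p in mapping.items())


# Toy example with three heavy atoms on each side.
reactant = {1: "C", 2: "C", 3: "O"}
product = {1: "O", 2: "C", 3: "C"}
```

With these toy atoms, `{1: 2, 2: 3, 3: 1}` is a valid mapping, `{1: 2, 2: 2, 3: 1}` is rejected because two atoms share a terminus, and `{1: 1, 2: 3, 3: 2}` is rejected because a carbon would become an oxygen.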


Applications of Atom Mapping

• Speed visual comprehension of reactions and pathways


Applications of Atom Mapping

• Improve evaluation of computer-generated metabolic pathways
  – Do any feedstock atoms reach target compound?
  – What fraction of feedstock atoms reach target compound?

• Facilitate design and interpretation of isotope-labeling experiments
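The "what fraction of feedstock atoms reach the target" question amounts to composing atom mappings along a pathway. A minimal sketch follows, assuming each reaction's mapping is given as a dictionary over `(compound, atom_index)` pairs; that representation and the function name `fraction_reaching_target` are illustrative, not the Pathway Tools data model. Only carbon atoms are followed in the example.

```python
def fraction_reaching_target(feedstock_atoms, step_mappings, target):
    """Follow feedstock atoms through consecutive reactions.

    feedstock_atoms: list of (compound, atom_index) pairs to follow.
    step_mappings: one dict per reaction, (compound, atom) -> (compound, atom).
    target: name of the target compound.
    """
    location = {atom: atom for atom in feedstock_atoms}  # origin -> current position
    for mapping in step_mappings:
        # Atoms that ended up in a by-product are not substrates of the
        # next reaction, so they drop out here.
        location = {origin: mapping[pos]
                    for origin, pos in location.items() if pos in mapping}
    reached = [origin for origin, pos in location.items() if pos[0] == target]
    return len(reached) / len(feedstock_atoms)


# Carbon atoms only: pyruvate -> acetaldehyde + CO2 -> ethanol.
decarboxylation = {("pyruvate", 1): ("CO2", 1),
                   ("pyruvate", 2): ("acetaldehyde", 1),
                   ("pyruvate", 3): ("acetaldehyde", 2)}
reduction = {("acetaldehyde", 1): ("ethanol", 1),
             ("acetaldehyde", 2): ("ethanol", 2)}
pyruvate_carbons = [("pyruvate", i) for i in (1, 2, 3)]
fraction = fraction_reaching_target(pyruvate_carbons,
                                    [decarboxylation, reduction], "ethanol")
```

Here two of the three pyruvate carbons reach ethanol; the carboxyl carbon is lost as CO2.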


Atom Mapping: Our Approach

• Weighted Minimal Bond-Edit Distance
  – Edit distance weighted by bond type and atom species
  – Computed using MILP for 9,390 MetaCyc reactions
  – Average time per reaction:
    • 73% are solved in less than 1 second
    • 96% are solved in less than 60 seconds
  – 96% of reactions had 1 or 2 solutions (with symmetries removed): different bonds made/broken

• Solution times are a function of the solver
  – SCIP vs CPLEX
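To make the objective concrete, the sketch below scores a mapping by the number of bonds broken plus bonds made and finds the minimal mappings by exhaustive search. It is a deliberately simplified stand-in: it uses unit weights instead of weights by bond type and atom species, and brute-force enumeration instead of the MILP formulation named on the slide, so it is only practical for toy molecules. The function names are hypothetical.

```python
from itertools import permutations


def bond_edit_distance(reactant_bonds, product_bonds, mapping):
    """Count bonds broken plus bonds made under a given atom mapping."""
    image = {frozenset((mapping[a], mapping[b])) for a, b in reactant_bonds}
    target = {frozenset(bond) for bond in product_bonds}
    return len(image - target) + len(target - image)


def minimal_mappings(reactant_atoms, product_atoms, reactant_bonds, product_bonds):
    """Exhaustively find all element-preserving mappings of minimal cost."""
    r_ids, p_ids = sorted(reactant_atoms), sorted(product_atoms)
    best_cost, best = None, []
    for perm in permutations(p_ids):
        mapping = dict(zip(r_ids, perm))
        if any(reactant_atoms[r] != product_atoms[p] for r, p in mapping.items()):
            continue  # an atom may only map to an atom of the same element
        cost = bond_edit_distance(reactant_bonds, product_bonds, mapping)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [mapping]
        elif cost == best_cost:
            best.append(mapping)
    return best_cost, best


# Toy reaction: a C-C-O chain loses its C-O bond.
atoms = {1: "C", 2: "C", 3: "O"}
cost, mappings = minimal_mappings(atoms, atoms, [(1, 2), (2, 3)], [(1, 2)])
```

The toy reaction has two minimal mappings of cost 1 that differ in which carbon loses its bond to oxygen, a small instance of the "1 or 2 solutions" situation described above.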

J Chem Inf Model. 2012 52:2970-82.


Accuracy of Our Atom Mappings

• Use KEGG RPAIR as a gold standard
  – Caveats: Not clear which RPAIRs are curated; accuracy of RPAIR unclear

• We implemented software to
  – Import KEGG and RPAIR into a Pathway Tools PGDB
  – Map atoms in KEGG reactions to corresponding atoms in MetaCyc reactions

• 2,446 atom mappings from KEGG RPAIR could be compared to MetaCyc mappings
  – 25 disagreements:
    • 1 reaction: experimental evidence our mapping is correct
    • 2 reactions: similar to preceding
    • 22 reactions: KEGG is correct


RouteSearch Software --- Metabolism->Metabolic Route Search

• User specifies feedstock compound and target compound

• Software computes minimal-cost paths from feedstock to target based on reactions from
  – Current PGDB, plus, optionally
  – MetaCyc

• Optimality criteria: minimize
  – Number of reactions
  – Number of lost atoms based on atom mappings
  – Number of reactions foreign to the organism

• User interface guides exploration of solution pathways
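One way to combine the three criteria is a weighted cost per reaction followed by a shortest-path search. The sketch below uses Dijkstra's algorithm over a compound graph; it is a generic illustration, not the published RouteSearch algorithm, and the edge tuple layout, the function name `cheapest_route`, and the default weights are assumptions.

```python
import heapq


def cheapest_route(edges, feedstock, target,
                   w_reaction=1.0, w_lost_atom=1.0, w_foreign=5.0):
    """Return (cost, [reaction ids]) of a minimal-cost route, or None.

    edges: iterable of (substrate, product, reaction_id, lost_atoms, is_foreign).
    """
    graph = {}
    for sub, prod, rxn, lost, foreign in edges:
        cost = w_reaction + w_lost_atom * lost + (w_foreign if foreign else 0.0)
        graph.setdefault(sub, []).append((prod, rxn, cost))

    best = {feedstock: 0.0}
    heap = [(0.0, feedstock, [])]
    while heap:
        cost, compound, route = heapq.heappop(heap)
        if compound == target:
            return cost, route
        if cost > best.get(compound, float("inf")):
            continue  # stale heap entry
        for nxt, rxn, step_cost in graph.get(compound, []):
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, route + [rxn]))
    return None


# Toy network: a two-step native route versus a one-step foreign reaction
# that also loses one atom.
edges = [("A", "B", "r1", 0, False),
         ("B", "T", "r2", 0, False),
         ("A", "T", "r3", 1, True)]
```

With the default weights the native two-step route wins; setting `w_lost_atom` and `w_foreign` to zero makes the single foreign reaction cheapest, which shows how the weights trade the three criteria against each other.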

Latendresse et al., Bioinformatics 2014


Results: Sample Metabolic Engineering Problems

• Five engineered pathways obtained from literature
  – 2-oxoisovalerate -> 3-methylbutanol (3-methylbutanol biosynthesis)
  – pyruvate -> isopropanol (isopropanol biosynthesis)
  – pyruvate -> n-butanol (pyruvate fermentation to butanol II)
  – 3-phospho-D-glycerate -> n-butanol (1-butanol autotrophic biosynthesis)
  – 3-dehydroquinate -> vanillin (vanillin biosynthesis)

• Given feedstock and target compounds, our system found the literature pathway in all five cases


Pyruvate -> Isopropanol

• Two highest-ranked pathways shown
• Best corresponds to pathway from literature
• Search engine can continue to generate alternatives


Metabolomics Applications

• MetaCyc contains extensive multiorganism metabolite database
• Organism-specific metabolite databases in each PGDB
  – Genome+pathway context for interpreting metabolomics data

• Monoisotopic mass searches
• Paint metabolomics data onto pathway maps
• Group transformations
• Enrichment analysis
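A monoisotopic mass search compares an observed mass with masses computed from each metabolite's chemical formula. The sketch below is an illustration, not the BioCyc implementation: it handles only simple formulas without parentheses or charges, searches neutral masses (adduct handling is omitted), and the function names, the three-compound table, and the 5 ppm default tolerance are assumptions.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                "O": 15.99491462, "P": 30.97376163, "S": 31.97207100}


def monoisotopic_mass(formula):
    """Mass of a neutral molecule from a simple formula such as 'C6H12O6'."""
    return sum(MONOISOTOPIC[element] * (int(count) if count else 1)
               for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def mass_search(observed_mass, compounds, ppm=5.0):
    """Return names whose mass lies within `ppm` of the observed neutral mass."""
    hits = []
    for name, formula in compounds.items():
        mass = monoisotopic_mass(formula)
        if abs(observed_mass - mass) / mass * 1e6 <= ppm:
            hits.append(name)
    return hits


compounds = {"D-glucose": "C6H12O6", "pyruvic acid": "C3H4O3", "L-alanine": "C3H7NO2"}
hits = mass_search(180.0634, compounds)
```

Because isomers share a formula, a mass match alone cannot distinguish, for example, glucose from fructose; the pathway context of an organism-specific PGDB is what helps narrow the candidates.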


Object Groups

• Collect and save lists of metabolites, genes, pathways, …

• Share groups with colleagues


Object Groups

• Create manually, from files, from query results
• Explore gene list interactively
• Combine (union, intersection, subtraction)


Manipulate Genes and Sequences


Object Group Transformations

• Transform metabolite group into group of metabolic pathways, then into gene group

• Create group containing transcriptional regulator; transform to all genes it regulates

• Transform gene group into group of regulators of those genes

• Transform gene group into list of TF binding sites controlling those genes; into list of sequences

• Create group of nucleotide positions; transform to closest genes; paint to cellular overview


Object Groups: Enrichment Analysis

“My experiments yielded a set of genes/metabolites. What do they have in common?”

• Given a group of genes:
  – What GO terms are statistically over-represented in that set?
  – What metabolic pathways are over-represented?
  – What transcriptional regulators are over-represented?

• Given a set of metabolites:
  – What metabolic pathways are statistically over-represented in that set?
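Over-representation is commonly scored with a one-sided hypergeometric (Fisher exact) test. The slide does not say which statistic Pathway Tools uses, so the sketch below is an assumption; the function name `enrichment_p_value` and its parameter names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: number of metabolites (or genes) in the database
    annotated:  how many of those belong to the pathway being tested
    sample:     size of the experimental set
    hits:       how many members of the set belong to the pathway
    """
    upper = min(annotated, sample)
    favorable = sum(comb(annotated, i) * comb(population - annotated, sample - i)
                    for i in range(hits, upper + 1))
    return favorable / comb(population, sample)


# Toy numbers: 10 metabolites, 4 in the pathway, 3 observed, 2 of them in the pathway.
p = enrichment_p_value(population=10, annotated=4, sample=3, hits=2)
```

For the toy numbers the probability is 40/120, or one third, so two hits out of three would not count as enriched in such a small population.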


Automated Generation of Metabolic Flux Models from PGDBs

Joint work with Mario Latendresse


Marriage of Systems Biology and Model-Organism Databases

• Systems biology
  – Qualitative system-level analysis
  – Quantitative system-level modeling
• Hypothesis: Strong synergies between MODs and SB
• Curation is critical to SB and to MODs
  – Biological models undergo long periods of updating and refinement
  – Common curation effort for MOD and systems-biology model
• MOD provides data needed for SB construction and validation
• SB identifies errors and omissions in MOD, directs curation
• Methodologies from MODs can benefit systems-biology models
  – Evidence codes
  – Mini-review summaries
  – Literature citations


Flux-Balance Analysis

• Steady state, constraint-based quantitative models of metabolism
• Starting information for organism of interest:
  – Metabolic Reaction List
  – Nutrients
  – Secretions
  – Biomass

[Diagram: example reaction network over metabolites A, B, C, D, and X]


Flux Balance Analysis

• Define system of linear equations encoding fluxes on each metabolite M
  – R1 + R2 = R3 + R4 + R5

• Boundary reactions:
  – Exchange fluxes for nutrients and secretions
  – Biomass reaction: L-arginine … + GTP … + … -> biomass

• Submit to linear optimization package
  – Optimize biomass production
  – Optimize ATP production
  – Optimize production of desired end product

[Diagram: metabolite M with reactions R1 and R2 on one side and R3, R4, and R5 on the other]
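The balance on metabolite M is one row of the usual matrix form of flux-balance analysis. The notation below (stoichiometric matrix S, flux vector v, bounds l and u, objective weights c) is standard FBA notation rather than something shown on the slide, and reading R1 and R2 as the reactions that produce M is an assumption based on the equation.

```latex
\begin{align*}
\frac{d[M]}{dt} &= R_1 + R_2 - R_3 - R_4 - R_5 = 0 \\
\text{maximize}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \qquad l \le v \le u
\end{align*}
```

Each row of S holds the stoichiometric coefficients of one metabolite, so S v = 0 states the steady-state balance for every metabolite at once; choosing c to select the biomass reaction, ATP production, or a desired end product gives the three optimization targets listed on the slide.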


Example

[Figure: example flux network with labels O2, glucose, glucose -> 2 pyruvate, alanine, and ATP; flux values shown: 100, 100, 40, 160, 160]

Biomass: ATP:alanine 4:1

Page 79: Metabolomics Applications of the BioCyc Databases and Pathway Tools Software

© 2014 SRI International

Pathway Prediction

• Pathway prediction is useful because
– Pathways organize the metabolic network into mentally tractable units
– Pathways guide us to search for missing enzymes
– Pathway inference fills in holes in the metabolic network
– Pathways can be used for analysis of high-throughput data: visualization, enrichment analysis
• Pathway prediction is hard because
– Reactome inference is imperfect
– Some reactions present in multiple pathways
– Pathway variants share many reactions in common
– Increasing size of MetaCyc

FBA Results

• FBA predicts steady-state reaction fluxes for the metabolic network

• Remove reactions from model to predict knock-out phenotypes

• Supply alternative nutrient sets to predict growth phenotypes

• Predict growth rates, nutrient uptake rates

Approach: Generate FBA Models from Pathway/Genome Databases

• Store and update metabolic model within PGDB
– All query and visualization tools applicable to FBA model
– FBA model is tightly coupled to genome and regulatory information
• MetaFlux generates linear programming problem from PGDB reactions
• Submit to constraint solver for model execution/solving
• Tools to accelerate model refinement:
– Reaction balance checking
– Dead-end metabolite analysis
– Visualize reaction flux using cellular overview
– Multiple gap filling

MetaFlux: Latendresse et al, Bioinformatics 2012 28:388-96

MetaFlux FBA Model Execution

• MetaFlux creates .lp file and executes SCIP solver
– Konrad-Zuse-Zentrum für Informationstechnik Berlin

• Interpret SCIP output
– Determine if SCIP found a solution
– Map fluxes to PGDB reactions

• Display resulting fluxes on the Cellular Overview
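To give a feel for what a linear-programming file for an FBA model looks like, here is a small Python sketch that writes one. The slides do not show the text MetaFlux actually emits, so the layout below is an assumption: it follows the widely used CPLEX LP text format, which SCIP can read. The toy network, the variable names (`v_...`), and the helper `build_lp` are all hypothetical.

```python
def build_lp(objective_var, balances, upper_bounds):
    """Render a maximize-one-flux FBA problem as CPLEX LP format text.

    balances maps a metabolite name to {flux variable: stoichiometric coefficient}.
    upper_bounds maps a flux variable to its maximum value; unlisted variables
    keep the LP-format default bounds of 0 <= v < +infinity.
    """
    lines = ["Maximize", f" obj: {objective_var}", "Subject To"]
    for metabolite, terms in balances.items():
        expr = " ".join(
            f"{'+' if coeff >= 0 else '-'} {abs(coeff):g} {var}"
            for var, coeff in terms.items()
        )
        # One steady-state constraint per metabolite: net production is zero.
        lines.append(f" bal_{metabolite}: {expr} = 0")
    lines.append("Bounds")
    for var, upper in upper_bounds.items():
        lines.append(f" 0 <= {var} <= {upper:g}")
    lines.append("End")
    return "\n".join(lines) + "\n"


# Hypothetical toy network, for illustration only.
balances = {
    "glucose": {"v_glc_uptake": 1, "v_glycolysis": -1},
    "pyruvate": {"v_glycolysis": 2, "v_ala_syn": -1, "v_atp_syn": -1},
    "o2": {"v_o2_uptake": 1, "v_atp_syn": -1},
    "atp": {"v_atp_syn": 1, "v_biomass": -4},
    "alanine": {"v_ala_syn": 1, "v_biomass": -1},
}
lp_text = build_lp("v_biomass", balances, {"v_glc_uptake": 100})
print(lp_text)
```

The resulting text could be saved with a `.lp` extension and handed to a solver. Reading the solver's output and mapping fluxes back to reactions is a separate step that this sketch does not cover.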

Model Debugging Via Dead End Metabolite Finder

• A small molecule C is a dead-end if:

– C is only produced (never consumed) by the metabolic reactions in Compartment, and no transporter acts on C in Compartment, OR

– C is only consumed (never produced) by the metabolic reactions in Compartment, and no transporter acts on C in Compartment
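The dead-end definition above translates almost directly into set operations. The following Python sketch is an illustration under simplifying assumptions, not the Pathway Tools implementation: it handles a single compartment, treats every reaction as irreversible in the direction written, and uses a hypothetical function name and toy data.

```python
def find_dead_ends(reactions, transported):
    """Return the dead-end metabolites of one compartment, sorted by name.

    reactions: iterable of (substrates, products) pairs, each a set of names.
    transported: set of metabolites acted on by some transporter.
    """
    produced, consumed = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    only_produced = produced - consumed  # made but never used
    only_consumed = consumed - produced  # used but never made
    # A transporter on the metabolite rescues it from being a dead end.
    return sorted((only_produced | only_consumed) - set(transported))


# Toy network: A -> B, B -> C, B -> D, with transporters for A and C.
toy_reactions = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"B"}, {"D"})]
dead_ends = find_dead_ends(toy_reactions, transported={"A", "C"})
print(dead_ends)
```

In this toy network D is produced but has no consuming reaction and no transporter, so it is reported. A and C are spared because transporters act on them.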

Dead-End Metabolite Analysis of EcoCyc

148 dead-end metabolites total
16 dead-end metabolites in EcoCyc pathways:

• (2R,4S)-2-methyl-2,3,3,4-tetrahydroxytetrahydrofuran
• 3-hydroxypropionate
• 4-methyl-5-(beta-hydroxyethyl)thiazole
• 5,6-dimethylbenzimidazole
• aminoacetaldehyde
• cis-vaccenate
• cobinamide
• ethanolamine
• methanol
• oxamate
• S-adenosyl-4-methylthio-2-oxobutanoate
• S-methyl-5-thio-D-ribose
• S2-
• tetrahydromonapterin
• urate
• urea

Model Debugging via Multiple Gap Filling

• Most FBA models are not initially solvable because of incomplete or incorrect information

• MetaFlux uses meta-optimization to postulate alterations to a model to render it solvable

• Each alteration has an associated cost; minimize cost of alterations

• Formulate as MILP and submit to SCIP

Multiple Gap Filling of FBA Models

• Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212):
– Reverse directionality of selected reactions
– Add a minimal number of reactions from MetaCyc to the model to enable a solution
– Reaction cost is a function of reaction taxonomic range
• Metabolite gap filling: Postulate additional nutrients and secretions
• Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
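The idea of reaction gap filling (add the cheapest set of candidate reactions that makes the targets producible) can be illustrated with a deliberately simplified Python sketch. It is not the MetaFlux method. Instead of a flux-based MILP it uses a reachability test (a metabolite counts as producible once all substrates of some reaction are reachable), and instead of a solver it enumerates candidate subsets by brute force, which is practical only for toy inputs. All names, reactions, and costs are hypothetical.

```python
from itertools import combinations


def producible(reactions, nutrients):
    """Expand the set of reachable metabolites until no reaction adds anything."""
    reachable = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= reachable and not set(products) <= reachable:
                reachable |= set(products)
                changed = True
    return reachable


def gap_fill(model, candidates, nutrients, targets):
    """Return (cost, names) for the cheapest candidate subset that makes every
    target reachable, or None if no subset works.

    candidates maps a reaction name to (substrates, products, cost).
    """
    best = None
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            added = [candidates[name][:2] for name in subset]
            if set(targets) <= producible(model + added, nutrients):
                cost = sum(candidates[name][2] for name in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best


# Toy model that cannot make alanine until one candidate reaction is added.
model = [(("glucose",), ("pyruvate",))]
candidates = {
    "in_range": (("pyruvate",), ("alanine",), 1),      # low cost: expected taxon
    "out_of_range": (("pyruvate",), ("alanine",), 5),  # high cost: distant taxon
    "detour": (("glucose",), ("lactate",), 1),         # irrelevant to the target
}
result = gap_fill(model, candidates, {"glucose"}, {"alanine"})
print(result)
```

The two costs stand in for the slide's point that reaction cost depends on taxonomic range: both candidates would fix the gap, and the cheaper one is selected.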

MILP Objective Function for Gap Filling

$$\sum_i w_b B_i + \sum_a w_r R_a + \sum_b w_t R_b + \sum_c w_m R_c + \sum_k w_s S_k + \sum_p w_n N_p$$

Where
• $w_b > 0$ and $w_r, w_t, w_m, w_s, w_n < 0$ are weights for biomass, reactions (2), secretions, and nutrients
• $B_i, R_a, R_b, R_c, S_k, N_p$ are binary variables

Results – FBA Model of Human Metabolism

• 46 biomass compounds
• 13 nutrients
• 2 secretions
• 207 reactions carry non-zero flux

MetaFlux Gap Filler Suggestions

• Addition of 8 new reactions from MetaCyc; 4 supported by literature research
• Reversal of 4 reactions confirmed by literature searches
• Enzyme curated into wrong compartment
• FBA analysis identified an amino-acid biosynthetic pathway that should not have been present in HumanCyc
• Further issues identified by dead-end metabolite analysis and reachability analysis

Other Capabilities

• Display and editing of protein features
• BLAST sequences against PGDBs
• Retrieve nucleotide and amino acid sequences
• Define Web links from PGDB objects to other web sites
• Active community of contributors
– JavaCyc, PerlCyc
– SBML and BioPAX export tools

Pathway Tools Implementation Details

• Platforms:
– Macintosh, PC/Linux, and PC/Windows

• Same binary can run as desktop app or Web server

• PGDBs can be stored in files, MySQL, Oracle

• Production-quality software
– Two regular releases per year
– Extensive quality assurance
– Extensive documentation
– Auto-patch
– Automatic DB-upgrade

Accessing PGDB Data

• Export to GenBank, SBML, BioPAX
• Export to tab-delimited files
• Export to attribute-value files
• Attribute-value files can be imported into SRI’s BioWarehouse
– Relational database system for bioinformatics database integration
• APIs
– Web services: http://biocyc.org/web-services.shtml
– Lisp
– PerlCyc
– JavaCyc

Summary

• Pathway/Genome Databases
– MetaCyc non-redundant DB of literature-derived pathways
– MetaCyc family of ~4,000 PGDBs
• Pathway Tools software
– Extract pathways from genomes
– Distributed curation tools for PGDB development
– Query, visualization, WWW publishing
– Omics data analysis
– Quantitative metabolic models

BioCyc and Pathway Tools Availability

• BioCyc.org Web site and database files freely available to all

• Pathway Tools freely available to non-profits
– Macintosh, PC/Windows, PC/Linux

Acknowledgements

• SRI
– Suzanne Paley, Ron Caspi, Mario Latendresse, Ingrid Keseler, Carol Fulcher, Tim Holland, Markus Krummenacker, Tomer Altman, Richard Billington, Pallavi Kaipa, Deepika Brito
• EcoCyc Collaborators
– Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
• MetaCyc Collaborators
– Sue Rhee, Peifen Zhang, Kate Dreher
– Lukas Mueller, Hartmut Foerster
• Funding sources:
– NIH National Institute of General Medical Sciences
– Department of Energy

http://www.ai.sri.com/pkarp/talks/

BioCyc webinars: biocyc.org/webinar.shtml

Learn More

• Pathway Tools Tutorial
– April 25-27

• http://bioinformatics.ai.sri.com/ptools/tutorial/