1 SRI International Bioinformatics
Metabolic Network Analysis Algorithms in Pathway Tools
Peter D. Karp, Ph.D.
Bioinformatics Research Group
BioCyc.org
EcoCyc.org, MetaCyc.org, HumanCyc.org
Systems Biology
Def 1: System-scale descriptions and analyses of biological systems
Def 2: Predictive modeling of biological systems
Overview
Pathway/Genome Databases
  BioCyc collection
  EcoCyc, MetaCyc
Pathway Tools software
  Visualization, Editing, Analysis
  Inference tools
Analyzing biological networks to identify gaps and inconsistencies
Prediction of growth media from metabolic network
What to do When Theories Become Larger than Minds can Grasp?
Example: E. coli metabolic network
  244 pathways involving 1,029 reactions and 895 substrates
Example: E. coli genetic network
  Control by 97 transcription factors of 1,174 genes in 630 transcription units
Past solutions:
  Experts specialize
  Publish theories in textual form
We cannot compute with theories in those forms
  Evaluate theories for consistency with new data: microarrays
  Refine theories with respect to new data
  Compare theories describing different organisms
Databases of Metabolic Pathway Data
Organize growing corpus of data on metabolic pathways
  Experimentally elucidated pathways in the biomedical literature
  Computationally predicted pathways derived from genome data
Provide software tools for querying and comprehending this complex information space
Multiorganism view: MetaCyc
  Unique, experimentally elucidated pathways across all organisms
  Reference database for computational pathway prediction
Organism-specific view:
  Organism-specific Pathway/Genome Databases
  Detailed qualitative models of metabolic networks
  Combine computational predictions with experimentally determined pathways
Pathway Tools Capabilities
Create and maintain an organism database integrating genome, pathway, regulatory information
  Computational inference tools
  Interactive editing tools
Query and visualize that database
Use the database to interpret omics data
Metabolic network analysis tools
Comparative analysis tools
BioCyc Collection of 507 Pathway/Genome Databases
Pathway/Genome Database (PGDB): combines information about
  Pathways, reactions, substrates
  Enzymes, transporters
  Genes, replicons
  Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
  MetaCyc
  EcoCyc -- Escherichia coli K-12
Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs
  HumanCyc
  Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- 481 DBs
Pathway Tools Software
PathoLogic
  Predicts operons, metabolic network, pathway hole fillers, from genome
  Computational creation of new Pathway/Genome Databases
Pathway/Genome Editors
  Distributed curation of PGDBs
  Distributed object database system, interactive editing tools
Pathway/Genome Navigator
  WWW publishing of PGDBs
  Querying, visualization of pathways, chromosomes, operons
  Analysis operations
    Pathway visualization of gene-expression data
    Global comparisons of metabolic networks
Bioinformatics 18:S225 2002
EcoCyc Project: EcoCyc.org
E. coli Encyclopedia
Review-level Model-Organism Database for E. coli
Tracks evolving annotation of the E. coli genome and cellular networks
The two paradigms of EcoCyc
“Multi-dimensional annotation of the E. coli K-12 genome”
  Positions of genes; functions of gene products – 76% / 66% exp
  Gene Ontology terms; MultiFun terms
  Gene product summaries and literature citations
  Evidence codes
  Multimeric complexes
  Metabolic pathways
  Cellular regulation
Nuc. Acids Res. 35:7577 2007; ASM News 70:25 2004; Science 293:2040
Karp, Gunsalus, Collado-Vides, Paulsen
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
Genes: 4,478
Proteins: 4,479
  Complexes: 880
RNAs: 285
Reactions:
  Metabolic: 975
  Transport: 272
Pathways: 237
Compounds: 1,373
URL: EcoCyc.org
Gene Regulation:
  Operons: 3,359
  Trans Factors: 196
  Promoters: 1,766
  TF Binding Sites: 2,105
EcoCyc v13.5
Citations: 19,000
Paradigm 1: EcoCyc as Textual Review Article
All gene products for which experimental literature exists are curated with a minireview summary
  Found on protein and RNA pages, not gene pages!
  3,257 gene products contain summaries
Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
Additional summaries found in pages for operons, pathways
EcoCyc cites 17,300 publications
Paradigm 2: EcoCyc as Computational Symbolic Theory
Highly structured, high-fidelity knowledge representation provides computable information
Each molecular species defined as a DB object
  Genes, proteins, small molecules
Each molecular interaction defined as a DB object
  Metabolic reactions
  Transport reactions
  Transcriptional regulation of gene expression
220 database fields capture extensive properties and relationships
EcoCyc Accelerates Science
Experimentalists
  E. coli experimentalists
  Experimentalists working with other microbes
  Analysis of expression data
Computational biologists
  Biological research using computational methods
  Genome annotation
  Study connectivity of E. coli metabolic network
  Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
Bioinformaticists
  Training and validation of new bioinformatics algorithms: predict operons, promoters, protein functional linkages, protein-protein interactions
Metabolic engineers
  “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
Educators
MetaCyc: Metabolic Encyclopedia
Describe a representative sample of every experimentally determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and commentary
Pathways, reactions, enzymes, substrates
Jointly developed by
  P. Karp, R. Caspi, C. Fulcher, SRI International
  L. Mueller, A. Pujar, Cornell Univ
  S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008
Applications of MetaCyc
Reference source on metabolic pathways
Metabolic engineering
  Find enzymes with desired activities, regulatory properties
  Determine cofactor requirements
Predict pathways from genomes
Systematic studies of metabolism
Computer-aided education
MetaCyc Data -- Version 13.5
Pathways 1,400
Reactions 8,100
Enzymes 5,900
Small Molecules 8,200
Organisms 1,800
Citations 20,800
Taxonomic Distribution of MetaCyc Pathways – version 13.1
Bacteria 883
Green Plants 607
Fungi 199
Mammals 159
Archaea 112
Pathway Tools Overviews and Omics Viewers
Designed to avoid the hairball effect
Generated automatically from PGDB
Magnify, interrogate
Omics viewers paint omics data onto overview diagrams
  Different perspectives on same dataset
  Use animation for multiple time points or conditions
  Paint any data that associates numbers with genes, proteins, reactions, or metabolites
Genome-scale visualizations of cellular networks
Harness human visual system to interpret patterns in biological contexts
Regulatory Overview and Omics Viewer
Show regulatory relationships among gene groups
Genome Poster
Dead End Metabolites
Clues to extra/missing reactions
A small molecule C is a dead-end if:
(Def 1 easier to compute; Def 2 more accurate)
Definition 1:
  C is a substrate in only one reaction of the set of SMM (small-molecule metabolism) reactions occurring in Compartment AND
  No reactions exist containing parent classes of C AND
  No transporter acts on C in Compartment, nor on parent classes of C
Definition 2:
  C is produced only by SMM reactions in Compartment, and no transporter acts on C in Compartment OR
  C is consumed only by SMM reactions in Compartment, and no transporter acts on C in Compartment
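A minimal sketch of Definition 2, assuming a single compartment and ignoring compound class hierarchies. The `Reaction` type and the function name `dead_end_metabolites` are hypothetical names for illustration, not part of Pathway Tools.

```python
from typing import Iterable, NamedTuple, Set


class Reaction(NamedTuple):
    """Hypothetical reaction record: compounds consumed and compounds produced."""
    reactants: frozenset
    products: frozenset


def dead_end_metabolites(reactions: Iterable[Reaction],
                         transported: Iterable[str]) -> Set[str]:
    """Return compounds that are produced-only or consumed-only (Definition 2).

    Assumption: all reactions occur in one compartment, and `transported`
    lists every compound that some transporter acts on in that compartment.
    """
    produced: Set[str] = set()
    consumed: Set[str] = set()
    for rxn in reactions:
        consumed |= rxn.reactants
        produced |= rxn.products
    # Symmetric difference: compounds on exactly one side of the network.
    one_sided = produced ^ consumed
    return one_sided - set(transported)


example = [
    Reaction(frozenset({"A"}), frozenset({"B"})),
    Reaction(frozenset({"B"}), frozenset({"C"})),
    Reaction(frozenset({"D"}), frozenset({"B"})),
]
result = dead_end_metabolites(example, transported={"A"})
```

Here `C` is produced only and `D` is consumed only, so both are reported; `A` is also consumed only but is excluded because a transporter acts on it.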
Dead-End Metabolite Analysis of E. coli
36 22 dead ends in metabolic pathways
174 dead ends in full metabolic network
GDP-L-fucose
  Produced only
  Literature research supported addition of a reaction producing colanic acid from GDP-L-fucose
D-galactarate and D-glucarate
  Degraded only
  Literature indicates both can be used as C sources
  Hypothetical transport reactions added
  Probable gene identified through knock-out study
Reachability Analysis of Metabolic Networks
Given:
  A PGDB for an organism
  A set of initial metabolites
Infer:
  What set of products can be synthesized by the small-molecule metabolism of the organism
Motivations:
  Quality control for PGDBs
  Verify that a known E. coli growth medium yields known essential compounds of E. coli
Romero and Karp, Pacific Symposium on Biocomputing, 2001
Algorithm: Forward Propagation Through Production System
Each reaction becomes a production rule
Each of the 21 metabolites in the nutrient set becomes an axiom
[Figure: the nutrient set seeds the metabolite pool; reactions from the PGDB reaction set (e.g., A + B → C) "fire" when their reactants are in the pool, and their products are added to the pool.]
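The forward-propagation loop can be sketched as below. This is an illustrative fixed-point iteration, not the production-system implementation used in the original work; the function name `forward_propagate` and the representation of a reaction as a `(reactants, products)` pair are assumptions.

```python
from typing import Iterable, List, Set, Tuple

ReactionPair = Tuple[Set[str], Set[str]]  # (reactants, products); hypothetical shape


def forward_propagate(reactions: List[ReactionPair],
                      nutrients: Iterable[str]) -> Set[str]:
    """Return every compound reachable from the nutrient set.

    Each reaction acts as a production rule: it fires once all of its
    reactants are in the metabolite pool, and its products join the pool.
    """
    pool: Set[str] = set(nutrients)   # nutrients are the axioms
    fired: Set[int] = set()           # indices of reactions already fired
    changed = True
    while changed:
        changed = False
        for idx, (reactants, products) in enumerate(reactions):
            if idx not in fired and reactants <= pool:
                fired.add(idx)
                pool |= products
                changed = True
    return pool


network = [
    ({"A", "B"}, {"C"}),   # A + B -> C
    ({"C"}, {"D"}),        # C -> D
    ({"E"}, {"F"}),        # never fires: E is not reachable
]
reachable = forward_propagate(network, nutrients={"A", "B"})
```

Comparing `reachable` against a list of essential compounds gives the quality-control check described in the motivations: any essential compound missing from the pool points to a gap in the network or in the nutrient set.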
Results from EcoCyc Reachability Analysis in 2001
Phase I: Forward propagation
  21 initial compounds yielded only half of the 41 essential compounds for E. coli
Phase II: Manually identify
  Bugs in EcoCyc (e.g., two objects for tryptophan)
    A → B    B’ → C
  Incomplete knowledge of E. coli metabolic network
    A + B → C + D
  “Bootstrap compounds”
  Missing initial protein substrates (e.g., ACP)
  Protein synthesis not represented
Phase III: Forward propagation with 11 more initial metabolites
  Yielded all 41 essential compounds
Minimal Nutrient Sets
Carolyn Talcott, Markus Krummenacker, Steven Eker, and Peter Karp
Computer Science Laboratory and Bioinformatics Research Group
SRI International
October 21st, 2009
The Problem
Given a model of metabolism for an organism, determine minimal sets of nutrients that will support growth.
Model -- network of metabolic reactions (R)
Nutrients -- transportables (T), compounds that have transport reactions
Growth -- production of essential compounds (E)
A subset N of T is a nutrient set if E is R-producible from N
N is minimal if no proper subset is a nutrient set
Mathematical Approach
S = stoichiometric matrix for R; $S_{ij}$ is the coefficient of $C_i$ in $R_j$
r = vector of reaction fluxes
p = S × r -- production; $p_i$ is the production rate of $C_i$
$p_i = S_{i1} r_1 + \dots + S_{ik} r_k$
Basic constraints
  $r_j \ge 0$ -- reactions run forward
  $p_i > 0$ if $C_i$ in E
  $p_i \ge 0$ if $C_i$ not in E or N
  If a compound $C_j$ not in E or T is used, it must be produced ($p_j > 0$)
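The constraints can be checked for a given flux vector as in the sketch below. It only verifies a candidate r; it does not search for one (the original work delegates that to a constraint solver). The function names and the use of row indices to identify compounds are assumptions made for illustration.

```python
from typing import List, Sequence, Set


def production(S: Sequence[Sequence[float]], r: Sequence[float]) -> List[float]:
    """Compute p = S x r, i.e. p_i = S_i1*r_1 + ... + S_ik*r_k for each compound i."""
    return [sum(s_ij * r_j for s_ij, r_j in zip(row, r)) for row in S]


def satisfies_constraints(S: Sequence[Sequence[float]], r: Sequence[float],
                          essential: Set[int], transportable: Set[int],
                          nutrients: Set[int]) -> bool:
    """Check the basic constraints for flux vector r and nutrient set N.

    Compounds are identified by their row index in S (an assumption of this sketch).
    """
    if any(flux < 0 for flux in r):           # reactions run forward
        return False
    p = production(S, r)
    for i, p_i in enumerate(p):
        if i in essential:
            ok = p_i > 0                      # essentials must be net-produced
        elif i in nutrients:
            ok = True                         # supplied nutrients may be net-consumed
        else:
            ok = p_i >= 0                     # nothing else may be depleted
        # A compound is "used" if some active reaction consumes it.
        used = any(S[i][j] < 0 and r[j] > 0 for j in range(len(r)))
        if used and i not in essential and i not in transportable:
            ok = ok and p_i > 0               # used intermediates must be produced
        if not ok:
            return False
    return True


# Rows: 0 = nutrient, 1 = intermediate, 2 = essential.
# Columns: reaction 0 (nutrient -> intermediate), reaction 1 (intermediate -> essential).
S_example = [[-1, 0],
             [1, -1],
             [0, 1]]
p_example = production(S_example, [2, 1])
feasible = satisfies_constraints(S_example, [2, 1], {2}, {0}, {0})
```

With fluxes `[1, 1]` the intermediate has zero net production, so the strict inequality on used intermediates rejects that flux vector.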
Problem Simplification
Impossibility elimination
  Drop reactions that have reactants that cannot be produced (or transported)
  (Uses forward collection)
Uselessness elimination
  Drop useless compounds and reactions whose products are all useless
  The useful compounds are found by backwards propagation from E
  (Uses backwards collection)
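Both simplification passes can be sketched as set closures. This is an illustration under simple assumptions (reactions as `(reactants, products)` pairs of frozensets, hypothetical function name `simplify`), not the authors' code.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Rxn = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products); hypothetical shape


def simplify(reactions: List[Rxn], transportables: Iterable[str],
             essentials: Iterable[str]) -> List[Rxn]:
    """Apply impossibility elimination, then uselessness elimination."""
    # Forward collection: everything producible from the transportables.
    reachable: Set[str] = set(transportables)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    # Impossibility elimination: a reaction with an unreachable reactant can never run.
    possible = [rxn for rxn in reactions if rxn[0] <= reachable]

    # Backward collection: compounds that can contribute to some essential compound.
    useful: Set[str] = set(essentials)
    changed = True
    while changed:
        changed = False
        for reactants, products in possible:
            if products & useful and not reactants <= useful:
                useful |= reactants
                changed = True
    # Uselessness elimination: drop reactions whose products are all useless.
    return [rxn for rxn in possible if rxn[1] & useful]


r_import = (frozenset({"T1"}), frozenset({"A"}))
r_make = (frozenset({"A"}), frozenset({"E1"}))
r_blocked = (frozenset({"X"}), frozenset({"E1"}))   # X is never producible
r_waste = (frozenset({"A"}), frozenset({"W"}))      # W feeds nothing essential
kept = simplify([r_import, r_make, r_blocked, r_waste],
                transportables={"T1"}, essentials={"E1"})
```

Shrinking the network this way before calling the solver reduces the number of variables and constraints without changing which nutrient sets are feasible.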
Searching for Minimal Nutrient Sets
Define nutset(N) for N a subset of T by
  nutset(N) = true if the constraints for N are satisfiable
            = false otherwise
Use a constraint solver (Yices) to determine if there is a solution
Find one minimal N: Start with N = T and eliminate elements until no more can be eliminated.
Finding all requires some cleverness to do it feasibly. Our approach uses a representation of Boolean functions called BDDs (binary decision diagrams) to search for extensions of a set of minimal solutions.
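The "find one minimal N" step can be sketched as a single elimination pass. Note the substitution: the slides decide nutset(N) with the Yices constraint solver, whereas this sketch stands in a simple reachability check (`reachability_nutset`, a hypothetical helper) so that it runs without external tools. Any monotone predicate can be passed in its place.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

ReactionPair = Tuple[Set[str], Set[str]]  # (reactants, products); hypothetical shape


def reachability_nutset(reactions: List[ReactionPair],
                        essentials: Set[str]) -> Callable[[Set[str]], bool]:
    """Build a stand-in nutset(N) predicate based on forward reachability.

    Assumption: this replaces the solver-based satisfiability check.
    """
    def nutset(nutrients: Set[str]) -> bool:
        pool = set(nutrients)
        changed = True
        while changed:
            changed = False
            for reactants, products in reactions:
                if reactants <= pool and not products <= pool:
                    pool |= products
                    changed = True
        return essentials <= pool
    return nutset


def one_minimal_nutrient_set(transportables: Iterable[str],
                             nutset: Callable[[Set[str]], bool]) -> Optional[Set[str]]:
    """Start with N = T and drop elements while nutset(N) stays true."""
    current = set(transportables)
    if not nutset(current):
        return None                      # even the full set T does not support growth
    for candidate in sorted(current):    # fixed order keeps the result deterministic
        trial = current - {candidate}
        if nutset(trial):
            current = trial
    return current


toy_reactions = [
    ({"glucose"}, {"carbon"}),
    ({"fructose"}, {"carbon"}),
    ({"carbon", "ammonium"}, {"amino_acid"}),
]
check = reachability_nutset(toy_reactions, essentials={"amino_acid"})
minimal = one_minimal_nutrient_set({"glucose", "fructose", "ammonium", "junk"}, check)
```

One pass suffices because the predicate is monotone: if removing an element fails at some point, it would also fail from any smaller set reached later. Which minimal set is returned depends on the elimination order; enumerating all of them is the harder problem the slide addresses with BDDs.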
E. coli Case Study
160 Transportables
1378 Compounds
2251 Reactions
36 Essentials
1156 Solutions
9 Reduced solutions
Some Minimal Nutrient Sets
Solution 5
  Taurine
  Phosphate
  L-alanine
Solution 6
  Taurine
  Phosphate
  L-aspartate
Equivalence and Reduced Solutions
Problem: Large number of minimal nutrient sets (1156) is hard to understand and evaluate
Solution: Nutrient equivalence classes
Define two nutrients A, B to be equivalent if whenever A appears in a minimal nutrient set, then replacing A by B yields another minimal nutrient set, and conversely
Benefits:
  Small number of solutions
  Insights into the role of each nutrient
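The equivalence definition can be applied directly to a list of minimal nutrient sets, as in this sketch. The function names are hypothetical, the grouping assumes the relation is transitive, and the example uses only Solutions 5 and 6 from the earlier slide rather than the full 1156 solutions.

```python
from typing import FrozenSet, Iterable, List, Set


def equivalent(a: str, b: str, minimal_sets: Set[FrozenSet[str]]) -> bool:
    """True if swapping a for b (and b for a) always maps a minimal set to a minimal set."""
    def swaps_ok(x: str, y: str) -> bool:
        return all((s - {x}) | {y} in minimal_sets for s in minimal_sets if x in s)
    return swaps_ok(a, b) and swaps_ok(b, a)


def equivalence_classes(solutions: Iterable[Iterable[str]]) -> List[List[str]]:
    """Group nutrients into classes by comparing each one to a class representative."""
    minimal_sets = {frozenset(s) for s in solutions}
    nutrients = sorted(set().union(*minimal_sets))
    classes: List[List[str]] = []
    for nutrient in nutrients:
        for cls in classes:
            if equivalent(nutrient, cls[0], minimal_sets):
                cls.append(nutrient)
                break
        else:
            classes.append([nutrient])
    return classes


# Only the two solutions listed on the slide; not the complete solution set.
slide_solutions = [
    {"taurine", "phosphate", "L-alanine"},
    {"taurine", "phosphate", "L-aspartate"},
]
classes = equivalence_classes(slide_solutions)
```

On these two solutions L-alanine and L-aspartate fall into one class while taurine and phosphate each stand alone, which mirrors how a reduced solution lists one representative per class.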
One Reduced Solution and its Equivalence Classes
Reduced solution 5
  Cytidine
  Sulfate
  Phosphate
Equivalence Classes:
  (CN): cytidine, 32 other compounds, L-alanine, L-aspartate
  (S): taurine, sulfate
  (P): phosphate
Lessons Learned
Analysis is a great way to debug a knowledge base
  Gaps in network
  Missing participants
  Incorrect reaction directions
Ten Equivalence Classes
2 Unitary:
  HPO4 (P)
  nicotinamide mononucleotide (CNP)
3 with two compounds:
  Sulfate / taurine (S)
  L-methionine / glutathione (CNS)
  beta-D-glucose-6-phosphate / sn-glycerol-3-phosphate (CP)
1 Medium (9 cpds)
  L-valine/NH4+/ … (N)
2 Very large
  fumarate/malate/ ... (C) -- 50 cpds
  cytidine/L-aspartate/ ... (CN) -- 35 cpds
C Sources Equivalence Class (C)
fumarate, malate, deoxyuridine, 3-(3-hydroxyphenyl)propionate, D-fructuronate, succinate, lactose, L-fucose, 2-oxoglutarate, 2-dehydro-3-deoxy-D-gluconate, L-tartrate, D-fructose, trehalose, D-mannose, D-galactitol, arbutin, 3-phenylpropionate, D-glucarate, D-gluconate, L-galactonate, glyoxylate, citrate, mannosylglycerate, L-idonate, acetate, L-ascorbate, 2,3-diketo-L-gulonate,
L-lyxose, 5-ketogluconate, D-galactarate, beta-D-glucose, acetoacetate, psicoselysine, glycerol, beta-D-ribopyranose, D-allose, D-sorbitol, salicin, D-mannitol, uridine, D-galacturonate, beta-D-galactose, glycolate, D-xylose, L-rhamnose, D-glucuronate, thymidine, D-galactonate, melibiose, L-lysine
N Sources Equivalence Class
L-valine, nitrite, NH4+, pyridoxamine, L-phenylalanine, L-tyrosine, L-leucine, L-isoleucine, cytosine
CN Sources Equivalence Class
cytidine, deoxycytidine, L-proline, putrescine, L-serine, glycine, 4-aminobutyrate, cyanate, xanthosine, N-acetylmuramate, glucosamine, L-arginine, phenylethylamine, GlcNAc-1,6-anhMurNAc-L-Ala-gamma-D-Glu-DAP-D-Ala, GlcNAc-1,6-anhMurNAc, xanthine, D-serine, 1,6-anhydro-N-acetylmuramate,
L-ornithine, L-glutamine, N-acetyl-D-glucosamine, chitobiose, inosine, D-alanine, N-acetylneuraminate, L-glutamate, orotate, L-asparagine, L-threonine, L-tryptophan, deoxyinosine, deoxyadenosine, adenosine, L-aspartate, L-alanine
Summary
Pathway/Genome Databases
  MetaCyc non-redundant DB of literature-derived pathways
  400 organism-specific PGDBs available through SRI at BioCyc.org
  Computational theories of biochemical machinery
Pathway Tools software
  Extract pathways from genomes
  Morph annotated genome into structured ontology
  Distributed curation tools for MODs
  Query, visualization, WWW publishing
Acknowledgements
SRI
  Suzanne Paley, Ron Caspi, Ingrid Keseler, Carol Fulcher, Markus Krummenacker, Alex Shearer, Tomer Altman, Joe Dale, Fred Gilham, Pallavi Kaipa
EcoCyc Collaborators
  Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
MetaCyc Collaborators
  Sue Rhee, Peifen Zhang, Kate Dreher
  Lukas Mueller, Anuradha Pujar
Funding sources:
  NIH National Institute of General Medical Sciences
  NIH National Center for Research Resources
BioCyc.org
Learn more from BioCyc webinars: biocyc.org/webinar.shtml