pathway databases


Upload: stella

Post on 09-Jan-2016


Page 1: Pathway databases

Pathway databases

Goto S, Bono H, Ogata H, Fujibuchi W, Nishioka T, Sato K, Kanehisa M. (1997) Organizing and computing metabolic pathway data in terms of binary relations. Pac Symp Biocomput, 175-86.

Romero PR, Karp P. (2001) Nutrition-related analysis of pathway/genome databases. Pac Symp Biocomput, 6:470-82.

(KEGG vs EcoCyc)

Page 2

seems pretty obvious

• As compared to a sequence database
  – (DNA or amino acid sequences)
  – Prior organization based on sequence homologies

• Pathways as an alternative approach to the whole-cell model
  – But similar: based on systemic understanding
  – Building from the bottom up instead of top-down
  – Also similarly still in the initial phases

• Technical details of implementation will be irritatingly vague

• Links constructed from functional interactions
  – Fairly logical next step for biologists
  – Moving from diagrams to graph/network structures

• So…metabolic networks and regulatory networks

Page 3

Apoptosis

Page 4

Citric acid cycle

Page 5

Pain: the Boehringer Mannheim wall charts

Page 6

more pain

Page 7

Metabolic networks

• Each enzyme/reaction can be a path (edge) between nodes
  – Each node is an enzyme substrate (reactant or product)

• Converting individual reactions to paths and nodes
  – Produces directed graphs

• Classification of biochemical reactions
  – EC numbering system (Enzyme Commission)
  – Hierarchical numerical system, e.g. 1.5.3.1
  – Based on the chemistry of the reaction catalyzed, not on the proteins

• How to translate from function to specific proteins?
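Here is a minimal Python sketch of the reaction-to-graph conversion described above. It assumes each reaction is given as an EC number plus lists of reactants and products. The three steps are simplified citric acid cycle reactions: cofactors are omitted and every reaction is treated as irreversible. The data layout and the `build_graph` helper are illustrative choices and do not reflect KEGG's actual format.

```python
from collections import defaultdict

# Simplified citric acid cycle steps (cofactors such as CoA, NADP+, H2O omitted).
# Each entry: (EC number, reactants, products). Illustrative layout only.
REACTIONS = [
    ("2.3.3.1", ["oxaloacetate", "acetyl-CoA"], ["citrate"]),
    ("4.2.1.3", ["citrate"], ["isocitrate"]),
    ("1.1.1.42", ["isocitrate"], ["2-oxoglutarate"]),
]

def build_graph(reactions):
    """Map each compound (node) to its outgoing (product, EC number) edges.

    Every reactant gets a directed edge to every product of the same reaction,
    so the result is a directed graph labeled by EC number.
    """
    graph = defaultdict(list)
    for ec, reactants, products in reactions:
        for reactant in reactants:
            for prod in products:
                graph[reactant].append((prod, ec))
    return graph

graph = build_graph(REACTIONS)
for node, edges in sorted(graph.items()):
    for prod, ec in edges:
        print(f"{node} -> {prod}  [EC {ec}]")
```

A reversible reaction would need an edge in each direction. That is one reason the direction of each reaction matters when a real database is converted into a graph.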

Page 8

Function to Enzyme mapping in KEGG

• Original network based on biochemical studies
  – Boehringer Mannheim and Japanese Biochemical Society

• How to assign function to enzyme? Do it by hand
  – Then use FASTA sequence comparisons on each new genome
  – Why does this irritate me? (feel free to shoot me down here)

• Which assignments are supported by experimental evidence?

• But anyway…
  – Binary relations linking various pieces of data
    • EC number to a particular enzyme
    • Organism to gene
    • Reactant to enzyme
    • Enzyme to product
    • Can produce derivative structures like reactant to product

• Lots of opportunities for relational database fun
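Binary relations map naturally onto two-column tables. The sketch below uses Python's built-in `sqlite3` module with a hypothetical schema (`reactant_enzyme`, `enzyme_product`). It shows how a derived reactant-to-product relation falls out of a join on the shared EC number. This illustrates the concept only and is not how KEGG actually stores its data. Relations such as organism to gene or EC number to enzyme would be further tables of the same shape.

```python
import sqlite3

# Hypothetical schema: each binary relation is a two-column table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reactant_enzyme (compound TEXT, ec TEXT);
CREATE TABLE enzyme_product (ec TEXT, compound TEXT);
""")
conn.executemany("INSERT INTO reactant_enzyme VALUES (?, ?)", [
    ("citrate", "4.2.1.3"),
    ("isocitrate", "1.1.1.42"),
])
conn.executemany("INSERT INTO enzyme_product VALUES (?, ?)", [
    ("4.2.1.3", "isocitrate"),
    ("1.1.1.42", "2-oxoglutarate"),
])

# Derived relation: reactant -> product, obtained by joining on the EC number
# that links "reactant to enzyme" with "enzyme to product".
reactant_product = conn.execute("""
    SELECT r.compound, p.compound
    FROM reactant_enzyme AS r
    JOIN enzyme_product AS p ON r.ec = p.ec
    ORDER BY r.compound
""").fetchall()
print(reactant_product)
# [('citrate', 'isocitrate'), ('isocitrate', '2-oxoglutarate')]
conn.close()
```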

Page 9

Examples of relational links

Page 10

The join function in KEGG

• Query relaxation
  – Relaxing constraints on the lower two numbers in the EC number system, e.g. 5.1.x.x
  – Can search up the EC hierarchy for sequences in the same family

• Path computation
  – Construction based on substrate-product relations
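The sketch below covers both operations under stated assumptions. EC numbers are four dot-separated fields, `'x'` serves as a wildcard, and substrate-product relations are held in a plain dictionary. The helper names (`relax`, `ec_matches`, `find_path`) are hypothetical. Breadth-first search is just one reasonable way to compute a path and is not necessarily what KEGG's join function does.

```python
from collections import deque

def relax(ec, levels):
    """Replace the last `levels` fields of an EC number with 'x' wildcards."""
    parts = ec.split(".")
    return ".".join(parts[: len(parts) - levels] + ["x"] * levels)

def ec_matches(pattern, ec):
    """True if `ec` matches a pattern such as '5.1.x.x' ('x' matches any field)."""
    p_parts, e_parts = pattern.split("."), ec.split(".")
    return len(p_parts) == len(e_parts) and all(
        p in ("x", e) for p, e in zip(p_parts, e_parts)
    )

def find_path(edges, start, goal):
    """Breadth-first search for a shortest substrate -> product path."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable from start

# Illustrative enzyme list and substrate -> product relations (hypothetical subset).
ENZYMES = ["5.1.3.2", "5.1.3.1", "5.1.1.1", "4.2.1.3"]
EDGES = {"citrate": ["isocitrate"], "isocitrate": ["2-oxoglutarate"]}

relaxed = relax("5.1.3.2", 2)  # drop the lower two numbers
family = [ec for ec in ENZYMES if ec_matches(relaxed, ec)]
path = find_path(EDGES, "citrate", "2-oxoglutarate")
print(relaxed, family, path)
```

Relaxing one level (`relax(ec, 1)`) gives a narrower match than relaxing two. In this sketch, widening the query is a matter of choosing how many trailing fields to wildcard.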

Page 11

KEGG example

Page 12

EcoCyc (E. coli encyclopedia)

• Takes an approach based on single enzyme reactions
  – All assignments based on hand annotation
  – Integrated with complete genome data

• Allows for building of a metabolic network
  – You can start with known required compounds (growth media and inside the cell)
  – Feed them through the network to get all required organic compounds (amino acids, nucleotides, etc.)
  – Focus on small molecules, i.e. no polymers:
    • Proteins (amino acid polymers)
    • DNA and RNA (nucleic acid polymers)
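Starting with known compounds and feeding them through the network amounts to a forward closure. You repeatedly fire any reaction whose reactants are all available until nothing new appears. Below is a minimal sketch on a hypothetical toy network; letters stand in for metabolites, and this is not EcoCyc data. It also reports any required compounds the network cannot reach.

```python
def producible(seeds, reactions):
    """Forward closure: every compound reachable from `seeds` by firing reactions
    whose reactants are all already available."""
    known = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= known and not products <= known:
                known |= products
                changed = True
    return known

# Hypothetical toy network (not EcoCyc data): each entry is (reactants, products).
REACTIONS = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"C", "E"}, {"D"}),
    ({"F"}, {"G"}),
]
MEDIUM = {"A", "E"}         # assumed present in the growth medium / cell
REQUIRED = {"C", "D", "G"}  # assumed to be needed by the cell

reached = producible(MEDIUM, REACTIONS)
missing = REQUIRED - reached  # required but not produced: candidates for backtracking
print(sorted(reached), sorted(missing))
```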

Page 13

What they did

• List of metabolites produced by the network:
  – Starting compound A, A -> B
  – B -> C
  – Therefore A -> C; repeat as necessary

• Finding precursors
  – Product C is not produced by the above computation
  – Find all reactions that produce C, e.g. A + B -> C
  – Backtrack from A and B to find their precursors
  – Repeat as necessary until no producing reaction can be found
  – This identifies the earliest precursors, whose origin is unknown
    » outputs every possible combination of precursors
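The backtracking step can be sketched as a recursive search over producing reactions, using a toy network with the same (reactants, products) representation. A compound with no producing reaction counts as an earliest precursor of unknown origin, and each alternative route contributes one combination. To guarantee termination, this sketch drops any route that loops back onto the current path. That is an assumption made here, and the paper's own handling of cycles may differ. The number of combinations can grow combinatorially on larger networks.

```python
from itertools import product

def precursor_sets(compound, reactions, visiting=frozenset()):
    """Return every combination of earliest precursors that could yield `compound`.

    A compound with no producing reaction is its own (unknown-origin) precursor.
    Routes that revisit a compound already on the current path are dropped, so a
    compound reachable only through a loop yields no combinations at all.
    """
    if compound in visiting:
        return []
    producers = [reactants for reactants, products in reactions if compound in products]
    if not producers:
        return [frozenset({compound})]
    visiting = visiting | {compound}
    combos = set()
    for reactants in producers:
        # One list of alternatives per reactant; pick one alternative from each.
        options = [precursor_sets(r, reactions, visiting) for r in sorted(reactants)]
        for choice in product(*options):
            combos.add(frozenset().union(*choice))
    return sorted(combos, key=sorted)

# Hypothetical toy network: A + B -> C, with two alternative routes to B.
REACTIONS = [
    ({"A", "B"}, {"C"}),
    ({"P"}, {"A"}),
    ({"Q"}, {"B"}),
    ({"R"}, {"B"}),
]
print(precursor_sets("C", REACTIONS))
```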

Page 14

Results (sort of)

• Known inputs and outputs
  – Necessary for cell survival and growth
  – "Bootstrapping" elements: already present in the cell
    • Are required for their own synthesis
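To see why bootstrap compounds must be seeded, consider a hypothetical toy network (letters as metabolites) in which X is needed for its own synthesis, much as proteins are needed to make proteins. Forward closure from the medium alone never reaches X or anything downstream of it, and adding X as a seed fixes that. The `self_requiring` helper is an illustrative name. It only flags direct self-dependence within a single reaction, not longer loops.

```python
def producible(seeds, reactions):
    """Forward closure from `seeds`: fire reactions whose reactants are all known."""
    known = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= known and not products <= known:
                known |= products
                changed = True
    return known

def self_requiring(reactions):
    """Compounds that appear as both a reactant and a product of the same reaction."""
    found = set()
    for reactants, products in reactions:
        found |= reactants & products
    return found

# Hypothetical toy network: A + X -> more X (stoichiometry dropped), X + B -> C.
REACTIONS = [
    ({"A", "X"}, {"X"}),
    ({"X", "B"}, {"C"}),
]
MEDIUM = {"A", "B"}

without_seed = producible(MEDIUM, REACTIONS)        # X and C never appear
with_seed = producible(MEDIUM | {"X"}, REACTIONS)   # seeding X unlocks C
print(sorted(without_seed), sorted(with_seed), self_requiring(REACTIONS))
```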

Page 15

Results (2)

• Essential elements
  – DNA, RNA, amino acids
  – Membrane components (phospholipids)
  – Extracellular components
    • Peptidoglycan: cross-linked sugar and amino acid chains that form the cell wall (surrounds the bacterial cell)

Page 16

Results (3)

– Identifies unknown synthesis pathways and oddities in the simulation
– Cob(I)alamin
  • Already known that it can't be synthesized
  • Only used in anaerobic growth
  • But it was simulated under aerobic conditions
  • Aerobic version of the reaction requires 5-…
    – Synthesis of which is unknown
– Proteins are treated as bootstrap compounds
– Some looped compounds
– Others have unknown synthesis pathways