pah res-potentia-netsci emailable-stagebuild

Res Potentia as a route to understanding function and evolution of cellular networks

Adam Pah

NetSci

June 21, 2012


Where do we stand and how can we do better?


We are generating biological data faster than ever


But generating data is only one part; we still have to convert it into usable knowledge

[Figure: Data and Knowledge]

Why study metabolism?


• My goal is to create a generalizable framework for understanding cellular networks

• I use metabolism because:





• The data fidelity, while not perfect, is far better than for other cellular networks






• We can use metabolism as a test case to help develop an understanding of cellular networks



• There is also the ability to produce metabolites or chemicals that are of interest




How do we construct a metabolic network?

Metabolic networks are constructed from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for each organism, where:

• Metabolites are connected if they are a part of the main reaction pair




• Substrates are connected to Products only.







Example reaction:

UDP-Glucose + H2O + 2 NAD+ → UDP-Glucuronate + 2 NADH + 2 H+

Main reaction pairs that become links in the network:

UDP-Glucose → UDP-Glucuronate

2 NAD+ → 2 NADH
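The construction rule above can be sketched in a few lines. This is a minimal illustration, not the code behind the talk: the function name `build_metabolic_network` and the choice of a directed substrate-to-product mapping are assumptions, and the pairs are typed in by hand from the UDP-Glucose example rather than read from KEGG.

```python
def build_metabolic_network(main_reaction_pairs):
    """Build a metabolite network as a mapping: substrate -> set of products.

    Assumption: links are directed from substrate to product. The slides only
    say that substrates are connected to products.
    """
    network = {}
    for substrate, product in main_reaction_pairs:
        network.setdefault(substrate, set()).add(product)
        # Keep the product as a node even if it has no outgoing links yet.
        network.setdefault(product, set())
    return network


# Main reaction pairs of the UDP-Glucose example, entered by hand.
pairs = [("UDP-Glucose", "UDP-Glucuronate"), ("NAD+", "NADH")]
network = build_metabolic_network(pairs)
```

H2O and H+ never enter the network here because they are not part of a main reaction pair.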

Looking at one organism

[Figure: metabolic network of Methanococcus maripaludis]

How do we construct a framework?

• Escherichia coli, Homo sapiens, Arabidopsis thaliana: current knowledge of the realm of actuals, ‘Res Extensa’

• Realm of possibles, ‘Res Potentia’
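The slides do not spell out how the realm of possibles is assembled. One plausible reading, assumed here and not confirmed by the talk, is that it pools every link observed in any single organism's network. The sketch below shows that pooling; the function name `pool_networks` and the toy links between placeholder metabolites A, B, C, D are hypothetical.

```python
def pool_networks(organism_networks):
    """Return the set of links that appear in at least one organism.

    Assumption: the pooled network is a plain union of per-organism links.
    """
    pooled = set()
    for links in organism_networks.values():
        pooled |= set(links)
    return pooled


# Toy per-organism links (hypothetical, for illustration only).
organism_networks = {
    "Escherichia coli": {("A", "B"), ("B", "C")},
    "Homo sapiens": {("B", "C"), ("C", "D")},
    "Arabidopsis thaliana": {("A", "B")},
}
possible_links = pool_networks(organism_networks)
```

Under this reading, each organism's network is a subset of the pooled network, so any link in the pool that an organism lacks is a candidate for that organism.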


It can identify new features

• Increased emphasis on metabolite roles

• Putative metabolic ‘devices’

We can use this network to revise our knowledge



Helping to sort out the bigger picture


How much of a need exists to correct databases?


Over the course of one year, for 979 organisms in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database:

• 88,000 metabolites have been added as annotations





• 31,000 metabolites that were annotated have been removed



• Resulting in over 100 changes per organism
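The per-organism figure follows from the counts above. A quick check, assuming each added and each removed annotation counts as one change and that the totals are spread over all 979 organisms:

```python
added = 88_000      # metabolite annotations added in one year
removed = 31_000    # metabolite annotations removed in one year
organisms = 979

# Average number of annotation changes per organism.
changes_per_organism = (added + removed) / organisms
print(round(changes_per_organism, 1))
```

This comes to roughly 120 changes per organism, consistent with "over 100".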




How can we make predictions?


For every reaction there is a set of enzyme sequences that we can compare against each organism's set of proteins to see how well that reaction ‘fits’


[Figure: the Organism1 proteins (Protein1 to Protein4) are compared by protein BLAST with the Reaction1 (annotated) enzymes, that is Enzyme1 from Organism1, Organism2, Organism3 and Organism4. Each comparison yields a match E-value, e.g. 0.0, 10^-3, 10^-2]

[Figure: fraction of matches (0.0 to 1.0) for Reaction1, on a scale running from excellent matches to poor matches]
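Turning a list of BLAST E-values into a "fraction of matches" profile can be sketched as below. The bin cutoffs and the function name `fraction_of_matches` are assumptions chosen for illustration; the slides do not state how match quality was binned, and the E-values in the usage lines are made up.

```python
def fraction_of_matches(e_values, cutoffs=(1e-40, 1e-20, 1e-5)):
    """Return the fraction of E-values in each bin, best matches first.

    Bin 0 holds E-values <= cutoffs[0] (excellent matches); the last bin
    holds E-values above the largest cutoff (poor matches).
    The cutoffs are hypothetical, not taken from the talk.
    """
    counts = [0] * (len(cutoffs) + 1)
    for e in e_values:
        # The bin index is the number of cutoffs this E-value exceeds.
        counts[sum(e > c for c in cutoffs)] += 1
    total = len(e_values)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


# Made-up E-values for one reaction's enzymes against one organism.
profile = fraction_of_matches([0.0, 1e-3, 1e-2, 1e-45])
```

A reaction whose profile is concentrated in the first bin "fits" the organism well; one concentrated in the last bin does not.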


[Figure: fraction of matches (0.0 to 1.0) for Reaction2 (unannotated), on a scale running from excellent matches to poor matches]





Repeat this for all 3328 reactions using 5.94 million enzyme sequences in 873 organisms

[Figure: fraction of matches (0.0 to 1.0), on a scale running from excellent matches to poor matches]

Picking an optimal threshold

[Figure: fraction of matches (0.0 to 1.0), on a scale running from excellent matches to poor matches]
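The slides do not say which criterion defines the optimal threshold. As one illustration only, the sketch below scores each reaction by its fraction of excellent matches and picks the cutoff that best separates annotated from unannotated reactions using Youden's J statistic (true positive rate minus false positive rate). Youden's J is a stand-in chosen here, not the method from the talk, and the scores are made up.

```python
def pick_threshold(annotated_scores, unannotated_scores):
    """Return (threshold, J) maximizing J = TPR - FPR over observed scores.

    A reaction is called 'present' when its score is >= threshold.
    Ties keep the lowest threshold. Both score lists must be non-empty.
    """
    candidates = sorted(set(annotated_scores) | set(unannotated_scores))
    best_threshold, best_j = None, -1.0
    for threshold in candidates:
        tpr = sum(s >= threshold for s in annotated_scores) / len(annotated_scores)
        fpr = sum(s >= threshold for s in unannotated_scores) / len(unannotated_scores)
        if tpr - fpr > best_j:
            best_threshold, best_j = threshold, tpr - fpr
    return best_threshold, best_j


# Made-up scores: fraction of excellent matches per reaction.
threshold, j_statistic = pick_threshold(
    annotated_scores=[0.9, 0.8, 0.7, 0.4],
    unannotated_scores=[0.1, 0.2, 0.5, 0.3],
)
```

Treating unannotated reactions as negatives is itself an assumption, since some of them may be genuinely present and simply missing from the database.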

How do we validate our results?

• We have one starting dataset: metabolic networks from KEGG 2009

• We have our predicted networks and their changes to this dataset (Predicted Changes)





• I also have the entire KEGG dataset for 2 years following that date (KEGG Changes)






• We can then compare how well each set of changes does in correcting the networks




• Ideally the networks should make sense and be as connected as reasonably possible
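A set of changes, whether KEGG Changes or Predicted Changes, can be expressed as the additions and removals between two versions of each organism's reaction list. A minimal sketch, assuming each version is a mapping from organism to a set of reaction identifiers; the function name `annotation_changes`, the organism code and the identifiers R1 to R4 are placeholders.

```python
def annotation_changes(before, after):
    """Return per-organism additions and removals between two versions."""
    changes = {}
    for organism in set(before) | set(after):
        old = before.get(organism, set())
        new = after.get(organism, set())
        changes[organism] = {"added": new - old, "removed": old - new}
    return changes


# Placeholder data: a starting version and a later version.
starting = {"org1": {"R1", "R2", "R3"}}
later = {"org1": {"R1", "R3", "R4"}}
kegg_changes = annotation_changes(starting, later)
```

Running the same function with the predicted networks as `after` gives a set of changes that can be compared directly with the database's own.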




Validate by promoting connectedness


We can test how well the actual changes in the database complete the networks by filling in gaps

[Figure: fraction of gaps filled (0.00 to 0.12) versus gap size (1 to 5), for KEGG Changes, Predicted Changes and Random]


We can also test how the actual changes in the database create gaps

[Figure: relative fraction of removed reactions that create additional components (axis from -0.1 to 0.1), for RPF Predicted Deletions and KEGG 2011 Deletions]
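Whether a removed reaction creates additional components can be checked by counting connected components before and after the removal. The sketch below treats links as undirected and drops metabolites that are left with no links; both are assumptions, and the reaction identifiers and metabolites are placeholders. It computes the raw fraction only, not the relative fraction plotted on the slide.

```python
def count_components(links):
    """Count connected components of an undirected graph given as link pairs."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node] - seen)
    return components


def creates_additional_components(reaction_links, removed):
    """True if deleting one reaction splits the network into more components."""
    all_links = [l for links in reaction_links.values() for l in links]
    kept = [l for r, links in reaction_links.items() if r != removed for l in links]
    return count_components(kept) > count_components(all_links)


def fraction_creating_components(reaction_links, deletions):
    """Fraction of deleted reactions that each, on their own, split the network."""
    hits = sum(creates_additional_components(reaction_links, r) for r in deletions)
    return hits / len(deletions)


# Placeholder chain A-B-C-D built from three reactions.
reaction_links = {"R1": [("A", "B")], "R2": [("B", "C")], "R3": [("C", "D")]}
fraction = fraction_creating_components(reaction_links, ["R2", "R3"])
```

Removing R2 cuts the chain in two, while removing R3 only trims the end, so half of these deletions create an additional component.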

What did we learn?

Considering reactions in the context of the Res Potentia enhances the ability to correct and close gaps in organismal networks

Now we can begin to analyze and understand more complex features of these networks

Acknowledgements

• Luis Amaral

• Irmak Sirer, Pat McMullen, Sam Seaver, Erin Sawardecker

With financial support from:

• Northwestern/NIH Biotechnology Training Grant

• Chicago Biomedical Consortium
