NetSci 2012 talk about using a global metabolic network to predict organismal networks.
Res Potentia as a route to understanding function and evolution of cellular networks
Adam Pah
NetSci
June 21, 2012
Where do we stand and how can we do better?
We are generating biological data faster than ever.
But generating data is only one part; we still have to convert it into actual, usable knowledge.
[Figure: Data and Knowledge]
Why study metabolism?
• My goal is to create a generalizable framework for understanding cellular networks
• I use metabolism because:
  • The data fidelity, while not perfect, is far better than for other cellular networks
  • Metabolism can serve as a test case for developing an understanding of cellular networks
  • It also offers the ability to produce metabolites or chemicals of interest
How do we construct a metabolic network?
Metabolic networks are constructed from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for each organism, where:
• Metabolites are connected if they are part of the main reaction pair
• Substrates are connected only to products
Example:
UDP-Glucose + H2O + 2 NAD+ → UDP-Glucuronate + 2 NADH + 2 H+
UDP-Glucose → UDP-Glucuronate (main reaction pair)
2 NAD+ → 2 NADH
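The two construction rules above can be sketched in a few lines of Python. This is a minimal sketch, assuming each reaction record already states which metabolites form its main reaction pair; the `Reaction` class and `network_edges` function are hypothetical names, not part of any KEGG interface. Stoichiometric coefficients (the "2" in 2 NAD+) are ignored because they do not affect connectivity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical reaction record; field names are illustrative only."""
    substrates: frozenset[str]
    products: frozenset[str]
    main_pair: frozenset[str]  # metabolites forming the main reaction pair


def network_edges(reactions: list[Reaction]) -> set[tuple[str, str]]:
    """Connect substrates to products, keeping only main-pair metabolites."""
    edges: set[tuple[str, str]] = set()
    for rxn in reactions:
        for substrate in rxn.substrates & rxn.main_pair:
            for product in rxn.products & rxn.main_pair:
                # Substrates link only to products: no substrate-substrate
                # or product-product edges are ever created.
                edges.add((substrate, product))
    return edges


udp_glucose_dehydrogenase = Reaction(
    substrates=frozenset({"UDP-Glucose", "H2O", "NAD+"}),
    products=frozenset({"UDP-Glucuronate", "NADH", "H+"}),
    main_pair=frozenset({"UDP-Glucose", "UDP-Glucuronate"}),
)
print(network_edges([udp_glucose_dehydrogenase]))
# {('UDP-Glucose', 'UDP-Glucuronate')}
```

Because NAD+ and NADH are not in the main pair, the cofactor conversion contributes no edge, which keeps ubiquitous cofactors from artificially short-circuiting the network.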
Looking at one organism
Methanococcus maripaludis
How do we construct a framework?
[Figure: organismal networks of Methanococcus maripaludis, Escherichia coli, Arabidopsis thaliana, and Homo sapiens]
Current knowledge of the realm of actuals: 'Res Extenta'
Realm of possibles: 'Res Potentia', the global metabolic network
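One simple way to picture the realm of possibles is sketched below, assuming the Res Potentia is assembled as the union of the reactions annotated in any organism. Under that assumption, the reactions an organism could have but is not annotated with are the set difference. The function names and the reaction labels are hypothetical.

```python
from __future__ import annotations


def res_potentia(networks: dict[str, set[str]]) -> set[str]:
    """Union of the reactions annotated in any organism (hypothetical helper)."""
    combined: set[str] = set()
    for reactions in networks.values():
        combined |= reactions
    return combined


def possible_but_unannotated(organism: str, networks: dict[str, set[str]]) -> set[str]:
    """Reactions in the Res Potentia that this organism's network lacks."""
    return res_potentia(networks) - networks[organism]


# Hypothetical reaction labels; real networks would use database reaction identifiers.
networks = {
    "organism_a": {"rxn1", "rxn2"},
    "organism_b": {"rxn2", "rxn3"},
    "organism_c": {"rxn3", "rxn4"},
}
print(sorted(res_potentia(networks)))                            # ['rxn1', 'rxn2', 'rxn3', 'rxn4']
print(sorted(possible_but_unannotated("organism_a", networks)))  # ['rxn3', 'rxn4']
```

The difference set is only a pool of candidates; deciding which of them actually belong to the organism is the job of the prediction step.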
It can identify new features
• Increased emphasis on metabolite roles
• Putative metabolic 'devices'
We can use this network to revise our knowledge
Methanococcus maripaludis
Helping to sort out the bigger picture
How much of a need exists to correct databases?
In the course of one year, across 979 organisms in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database:
• 88,000 metabolites were added as annotations
• 31,000 previously annotated metabolites were removed
• This amounts to over 100 changes per organism
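As a quick check of the last bullet, the two (rounded) totals above average out to roughly 120 changes per organism:

```python
added = 88_000    # metabolites added as annotations in one year
removed = 31_000  # previously annotated metabolites removed
organisms = 979

changes_per_organism = (added + removed) / organisms
print(f"{changes_per_organism:.1f}")  # 121.6
```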
How can we make predictions?
For every reaction, there is a set of enzyme sequences that we can compare to each organism's set of proteins to see how well that reaction 'fits' the organism.
[Figure: Reaction 1 (annotated); the Organism 1 proteins (Proteins 1-4) are compared with the Reaction 1 enzymes (Enzyme 1 from Organisms 1-4)]
[Figure: Organism 1 proteins (Proteins 1-4) compared with Reaction 1 enzymes (Enzyme 1 from Organisms 1-4) using protein BLAST for enzyme sequences]
[Figure: match E-values from comparing the Reaction 1 (annotated) enzymes with the Organism 1 proteins]
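The slides do not spell out how the BLAST E-values become a "fraction of matches". One possible reading, sketched here purely as an assumption, takes each enzyme sequence's best (lowest) E-value against the organism's proteins and bins it from excellent to poor. The bin edges in `BIN_EDGES`, the function `match_profile`, and the example E-values are all hypothetical, and parsing BLAST output is left out.

```python
from __future__ import annotations

import bisect

# Hypothetical E-value bin edges, ordered from excellent to poor matches.
# Bins: E <= 1e-50, 1e-50 < E <= 1e-10, 1e-10 < E <= 1e-3, E > 1e-3.
BIN_EDGES = [1e-50, 1e-10, 1e-3]


def match_profile(evalues_by_enzyme: dict[str, list[float]]) -> list[float]:
    """Fraction of a reaction's enzyme sequences whose best hit falls in each bin."""
    counts = [0] * (len(BIN_EDGES) + 1)
    for evalues in evalues_by_enzyme.values():
        # Best hit = lowest E-value; an enzyme with no hit goes to the poorest bin.
        best = min(evalues) if evalues else float("inf")
        counts[bisect.bisect_left(BIN_EDGES, best)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


# Hypothetical E-values of each Reaction 1 enzyme against Organism 1's proteins.
evalues = {
    "enzyme_org1": [0.0, 3.2],
    "enzyme_org2": [1e-12],
    "enzyme_org3": [0.5],
    "enzyme_org4": [],
}
print(match_profile(evalues))  # [0.25, 0.25, 0.0, 0.5]
```

A reaction that truly belongs to the organism would be expected to pile its mass into the leftmost (excellent) bins, while a poorly fitting one shifts toward the right.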
[Figure: fraction of matches, from excellent matches to poor matches, for Reaction 1 (annotated) and Reaction 2 (unannotated)]
Repeat this for all 3,328 reactions, using 5.94 million enzyme sequences in 873 organisms.
[Figure: fraction of matches, from excellent matches to poor matches]
Picking an optimal threshold
[Figure: fraction of matches, from excellent matches to poor matches]
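The slides do not state how the optimal threshold is chosen. As one illustrative possibility (a swapped-in technique, not necessarily the one used in this work), a cutoff on a per-reaction fit score could be chosen to maximize Youden's J statistic (true positive rate minus false positive rate), treating annotated reactions as positives and unannotated ones as negatives. The score, the function name, and the example values are hypothetical.

```python
from __future__ import annotations


def best_threshold(annotated: list[float], unannotated: list[float]) -> tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR over observed scores.

    A reaction is called a 'fit' when its score is >= the cutoff. Ties in J
    are broken toward the lowest cutoff, since candidates are scanned upward.
    """
    if not annotated or not unannotated:
        raise ValueError("both score lists must be non-empty")
    best_cutoff, best_j = float("nan"), float("-inf")
    for cutoff in sorted(set(annotated) | set(unannotated)):
        tpr = sum(score >= cutoff for score in annotated) / len(annotated)
        fpr = sum(score >= cutoff for score in unannotated) / len(unannotated)
        if tpr - fpr > best_j:
            best_cutoff, best_j = cutoff, tpr - fpr
    return best_cutoff, best_j


# Hypothetical per-reaction scores, e.g. the fraction of excellent matches.
annotated_scores = [0.9, 0.8, 0.7, 0.6]
unannotated_scores = [0.1, 0.2, 0.3, 0.65]
print(best_threshold(annotated_scores, unannotated_scores))  # (0.6, 0.75)
```

Some unannotated reactions may in fact be present in the organism (finding them is the point of the predictions), so the "negatives" here are noisy and J should be read as a rough guide rather than a true accuracy.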
How do we validate our results?
• We have one starting dataset: the metabolic networks from KEGG 2009
• We have our predicted networks and their changes to this dataset (Predicted Changes)
• I also have the entire KEGG dataset for the 2 years following that date (KEGG Changes)
• We can then compare how well each set of changes does at correcting the networks
• Ideally, the networks should make sense and be as connected as reasonably possible
Validate by promoting connectedness
We can test how well the actual changes in the database complete the networks and fill in their gaps, and compare them with our predicted changes.
[Figure: fraction of gaps filled versus gap size (1-5) for Predicted Changes, KEGG Changes, and Random]
We can also test whether deletions create gaps, comparing the actual changes in the database with our predicted deletions.
[Figure: relative fraction of removed reactions that create additional components, for RPF Predicted Deletions and KEGG 2011 Deletions]
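The figure reports a relative fraction of removed reactions that create additional components; the baseline it is relative to is not given on the slide. The raw count behind such a measure can be sketched as follows, assuming each deleted reaction is removed on its own from the full network and connectivity is treated as undirected. `count_components`, `fraction_creating_components`, and the example network are hypothetical.

```python
from __future__ import annotations


def count_components(nodes: set[str], edges: set[tuple[str, str]]) -> int:
    """Count connected components, treating edges as undirected."""
    adjacency: dict[str, set[str]] = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: set[str] = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            for neighbor in adjacency[stack.pop()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


def fraction_creating_components(
    edges_by_reaction: dict[str, set[tuple[str, str]]], deletions: list[str]
) -> float:
    """Fraction of deleted reactions whose removal alone increases the component count."""
    if not deletions:
        raise ValueError("deletions must be non-empty")
    all_edges = set().union(*edges_by_reaction.values())
    # Keep every metabolite as a node so one stranded by a deletion counts as a component.
    nodes = {metabolite for edge in all_edges for metabolite in edge}
    baseline = count_components(nodes, all_edges)
    splitting = 0
    for deleted in deletions:
        # Edges also supplied by another reaction survive the deletion.
        remaining = set().union(
            *(edges for rxn, edges in edges_by_reaction.items() if rxn != deleted)
        )
        if count_components(nodes, remaining) > baseline:
            splitting += 1
    return splitting / len(deletions)


# Hypothetical network: a triangle A-B-C plus a branch C-D.
edges_by_reaction = {
    "R1": {("A", "B")},
    "R2": {("B", "C")},
    "R3": {("C", "A")},
    "R4": {("C", "D")},
}
print(fraction_creating_components(edges_by_reaction, ["R1", "R4"]))  # 0.5
```

Removing R1 leaves the triangle connected through C, whereas removing R4 strands D, so one of the two deletions splits the network.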
What did we learn?
Considering reactions in the context of the Res Potentia enhances our ability to correct organismal networks and close their gaps.
Now we can begin to analyze and understand more complex features of these networks.
Acknowledgements
• Luis Amaral
• Irmak Sirer, Pat McMullen, Sam Seaver, Erin Sawardecker
With financial support from:
• Northwestern/NIH Biotechnology Training Grant
• Chicago Biomedical Consortium