pah res-potentia-netsci emailable-stagebuild

Res Potentia as a route to understanding function and evolution of cellular networks. Adam Pah, NetSci, June 21, 2012

Upload: adam-pah

Post on 24-Jun-2015


DESCRIPTION

NetSci 2012 Talk about using a global metabolic network to predict organismal networks

TRANSCRIPT

Page 1: Pah res-potentia-netsci emailable-stagebuild

Res Potentia as a route to understanding function and evolution of cellular networks

Adam Pah

NetSci

June 21, 2012

Page 5: Pah res-potentia-netsci emailable-stagebuild

Where do we stand and how can we do better?

We are generating biological data faster than ever

But generating data is only one part; we still have to convert it into actual, usable knowledge

[Figure labels: Data, Knowledge]

Page 10: Pah res-potentia-netsci emailable-stagebuild

Why study metabolism?

• My goal is to create a generalizable framework for understanding cellular networks

• I use metabolism because:

  • The data fidelity, while not perfect, is far better

  • We can use metabolism as a test case to help develop an understanding of cellular networks

  • There is also the ability to produce metabolites or chemicals that are of interest

Page 16: Pah res-potentia-netsci emailable-stagebuild

How do we construct a metabolic network?

Metabolic networks are constructed from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for each organism, where:

• Metabolites are connected if they are part of the main reaction pair

• Substrates are connected to products only

Example reaction: UDP-Glucose + H2O + 2 NAD+ → UDP-Glucuronate + 2 NADH + 2 H+

Reaction pairs shown: UDP-Glucose and UDP-Glucuronate; 2 NAD+ and 2 NADH
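The construction rule above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the talk: `reaction_pairs` and `build_network` are hypothetical names, the pairs are typed in by hand rather than read from KEGG, and deciding which pairs count as "main" is left to whoever supplies the input.

```python
from collections import defaultdict

# Hypothetical input: (substrate, product) pairs taken from the example reaction.
# Which pairs qualify as "main" reaction pairs is an input decision, not shown here.
reaction_pairs = [
    ("UDP-Glucose", "UDP-Glucuronate"),
    ("NAD+", "NADH"),
]

def build_network(pairs):
    """Return an adjacency map linking each substrate to its products only."""
    network = defaultdict(set)
    for substrate, product in pairs:
        network[substrate].add(product)
        network.setdefault(product, set())  # products are nodes even with no outgoing links
    return dict(network)

network = build_network(reaction_pairs)
```

Metabolites that appear in the reaction but in no supplied pair (H2O and H+ here) never enter the network, which is the point of restricting links to reaction pairs.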

Page 17: Pah res-potentia-netsci emailable-stagebuild

Looking at one organism

5

Methanococcus maripaludis

Page 21: Pah res-potentia-netsci emailable-stagebuild

How do we construct a framework?

Current knowledge of the realm of actuals, ‘Res Extensa’: Methanococcus maripaludis, Escherichia coli, Homo sapiens, Arabidopsis thaliana

Realm of possibles, ‘Res Potentia’
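One plausible reading of this framework, offered only as an assumption because the slides do not spell it out, is that the realm of possibles pools the reactions known across all organisms into a single global network. The sketch below uses made-up metabolite names and hypothetical helpers (`global_network`, `candidate_edges`) to show that idea.

```python
# Hypothetical organism networks: organism name -> set of (substrate, product) edges.
# The metabolite labels A-D are placeholders, not real KEGG compounds.
organism_networks = {
    "Methanococcus maripaludis": {("A", "B"), ("B", "C")},
    "Escherichia coli": {("B", "C"), ("C", "D")},
}

def global_network(networks):
    """Union of all organism edge sets (the assumed meaning of the global network)."""
    edges = set()
    for organism_edges in networks.values():
        edges |= organism_edges
    return edges

def candidate_edges(networks, organism):
    """Edges present in the global network that the given organism currently lacks."""
    return global_network(networks) - networks[organism]
```

Under this reading, `candidate_edges` lists the reactions that could be tested as possible additions to one organism's network.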

Page 25: Pah res-potentia-netsci emailable-stagebuild

It can identify new features

Increased emphasis on metabolite roles

Putative metabolic ‘devices’

Page 26: Pah res-potentia-netsci emailable-stagebuild

We can use this network to revise our knowledge

8

Methanococcus maripaludis

Page 29: Pah res-potentia-netsci emailable-stagebuild

Helping to sort out the bigger picture

9

Page 33: Pah res-potentia-netsci emailable-stagebuild

How much of a need exists to correct databases?

In the course of 1 year, for 979 organisms in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database:

• 88,000 metabolites have been added as annotations

• 31,000 metabolites that were annotated have been removed

• Resulting in over 100 changes per organism
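The per-organism figure follows from the two counts on this slide. A quick check, assuming every addition and every removal counts as one change and using the rounded totals as given:

```python
organisms = 979
added = 88_000      # metabolite annotations added in one year (rounded, from the slide)
removed = 31_000    # metabolite annotations removed in one year (rounded, from the slide)

# Average number of annotation changes per organism over the year.
changes_per_organism = (added + removed) / organisms
print(f"{changes_per_organism:.0f} changes per organism on average")
```

The average comes out at a little over 120, consistent with the "over 100" stated on the slide.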

Page 37: Pah res-potentia-netsci emailable-stagebuild

How can we make predictions?

For every reaction there is a set of enzyme sequences that we can compare to each organismal set of proteins to see how well that reaction ‘fits’

Reaction 1 (annotated)

Reaction 1 enzymes: Enzyme 1 from Organism 1, Enzyme 1 from Organism 2, Enzyme 1 from Organism 3, Enzyme 1 from Organism 4

Organism 1 proteins: Protein 1, Protein 2, Protein 3, Protein 4

Page 40: Pah res-potentia-netsci emailable-stagebuild

How can we make predictions?

[Figure: Protein BLAST for enzyme sequences, comparing the Reaction 1 enzymes (Enzyme 1 from Organisms 1 to 4) with the Organism 1 proteins (Proteins 1 to 4)]

Page 41: Pah res-potentia-netsci emailable-stagebuild

How can we make predictions?

For every reaction there is a set of enzyme sequences that we can compare to each organismal set of proteins to see how well that reaction ‘fits’

[Figure: the Reaction 1 (annotated) enzymes (Enzyme 1 from Organisms 1 to 4) are matched against the Organism 1 proteins (Proteins 1 to 4). Match E-values shown: 0.0, 10^-3, 10^-4, 5.0, 10^-2]

Page 42: Pah res-potentia-netsci emailable-stagebuild

How can we make predictions?

For every reaction there is a set of enzyme sequences that we can compare to each organismal set of proteins to see how well that reaction ‘fits’

[Figure: match E-values (0.0, 10^-3, 10^-4, 5.0, 10^-2) between the Reaction 1 enzymes and the Organism 1 proteins, summarized as a histogram. Axes: fraction of matches (0.0 to 1.0), from excellent matches to poor matches]

Page 43: Pah res-potentia-netsci emailable-stagebuild

How can we make predictions?

For every reaction there is a set of enzyme sequences that we can compare to each organismal set of proteins to see how well that reaction ‘fits’

[Figure: histograms for Reaction 1 (annotated) and Reaction 2 (unannotated). Axes: fraction of matches (0.0 to 1.0), from excellent matches to poor matches]

Page 45: Pah res-potentia-netsci emailable-stagebuild

How can we make predictions?

For every reaction there is a set of enzyme sequences that we can compare to each organismal set of proteins to see how well that reaction ‘fits’

Repeat this for all 3328 reactions using 5.94 million enzyme sequences in 873 organisms

[Figure: histogram. Axes: fraction of matches (0.0 to 1.0), from excellent matches to poor matches]
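The histogram on these slides can be read as the fraction of BLAST matches falling into quality bins that run from excellent to poor. A minimal sketch of that summary follows. The bin edges, the `match_fractions` helper, and the sample E-values are illustrative assumptions and are not taken from the talk.

```python
import math

# Hypothetical bin edges on log10(E-value), ordered from excellent to poor matches.
# These cutoffs are illustrative assumptions, not values from the talk.
LOG_EDGES = [-50.0, -5.0, -1.0]

def log_evalue(evalue, floor=1e-300):
    """log10 of an E-value, clamping 0.0 (reported for the strongest hits) to a floor."""
    return math.log10(max(evalue, floor))

def match_fractions(evalues, edges=LOG_EDGES):
    """Fraction of matches in each bin; bin 0 holds the best matches, the last bin the worst."""
    counts = [0] * (len(edges) + 1)
    for e in evalues:
        x = log_evalue(e)
        idx = sum(1 for edge in edges if x > edge)  # number of edges that x exceeds
        counts[idx] += 1
    total = len(evalues)
    return [c / total for c in counts] if total else counts

# Made-up E-values for one reaction's enzymes against one organism's proteins.
fractions = match_fractions([0.0, 1e-60, 1e-20, 1e-4, 5.0])
```

Working on the log scale keeps the bins meaningful, since E-values span many orders of magnitude.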

Page 46: Pah res-potentia-netsci emailable-stagebuild

Picking an optimal threshold

[Figure: histogram. Axes: fraction of matches (0.0 to 1.0), from excellent matches to poor matches]
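The slides do not say how the threshold was chosen. One common approach, swapped in here purely for illustration, is to pick the cutoff that best separates annotated from unannotated reactions by maximizing the true positive rate minus the false positive rate (Youden's J). The scores and the `best_threshold` helper are hypothetical.

```python
def best_threshold(annotated_scores, unannotated_scores):
    """Pick the score cutoff maximizing (true positive rate - false positive rate)."""
    candidates = sorted(set(annotated_scores) | set(unannotated_scores))
    best_cut, best_j = None, float("-inf")
    for cut in candidates:
        # A reaction is predicted present when its score is at or above the cutoff.
        tpr = sum(s >= cut for s in annotated_scores) / len(annotated_scores)
        fpr = sum(s >= cut for s in unannotated_scores) / len(unannotated_scores)
        if tpr - fpr > best_j:
            best_cut, best_j = cut, tpr - fpr
    return best_cut, best_j

# Hypothetical per-reaction scores, e.g. the fraction of excellent matches.
annotated = [0.9, 0.8, 0.7, 0.4]
unannotated = [0.5, 0.3, 0.2, 0.1]
cut, j = best_threshold(annotated, unannotated)
```

Ties are resolved in favor of the lowest cutoff, which errs toward keeping more reactions; a different tie rule would be equally defensible.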

Page 53: Pah res-potentia-netsci emailable-stagebuild

How do we validate our results?

• We have one starting dataset, metabolic networks from KEGG 2009

• We have our predicted networks and their changes to this dataset (Predicted Changes)

• I also have the entire KEGG dataset for the 2 years following that date (KEGG Changes)

• We can then compare how well each set of changes does in correcting the networks

• Ideally the networks should make sense and be as connected as reasonably possible

Page 59: Pah res-potentia-netsci emailable-stagebuild

Validate by promoting connectedness

We can test how well the actual changes in the database complete the networks and fill in their gaps

[Figure: fraction of gaps filled (0.00 to 0.12) versus gap size (1 to 5) for KEGG Changes, Predicted Changes, and Random]
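The slides do not define a gap precisely. As a sketch only, assume each gap is the set of reactions missing from an organism's network, that its size is the number of missing reactions, and that a gap counts as filled when a set of changes adds all of them. The helper name and the reaction labels below are hypothetical.

```python
from collections import defaultdict

def fraction_filled_by_size(gaps, changes):
    """For each gap size, the fraction of gaps whose missing reactions are all added."""
    totals, filled = defaultdict(int), defaultdict(int)
    for missing in gaps:
        size = len(missing)  # assumed definition: size = number of missing reactions
        totals[size] += 1
        if set(missing) <= changes:
            filled[size] += 1
    return {size: filled[size] / totals[size] for size in sorted(totals)}

# Hypothetical gaps: each is the tuple of reactions missing from an organism network.
gaps = [("R1",), ("R2",), ("R3", "R4"), ("R5", "R6")]
kegg_changes = {"R1", "R3", "R4"}  # made-up set of added reactions
result = fraction_filled_by_size(gaps, kegg_changes)
```

Running the same function on the KEGG changes, the predicted changes, and a random set of additions would give the three curves being compared.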

Page 64: Pah res-potentia-netsci emailable-stagebuild

Validate by promoting connectedness

We can test how the actual changes in the database create gaps

[Figure: relative fraction of removed reactions that create additional components (axis from -0.1 to 0.1) for RPF Predicted Deletions and KEGG 2011 Deletions]
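The quantity on this axis can be approximated by checking, for each removed reaction, whether deleting it splits the network into more connected components. The sketch below treats the network as undirected and computes only the raw fraction. Both choices are assumptions, as is the toy network; the slide reports a relative fraction whose baseline is not stated.

```python
def count_components(nodes, edges):
    """Number of connected components in an undirected graph."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components

def fraction_creating_components(nodes, edges, deletions):
    """Fraction of deleted edges whose individual removal adds a component."""
    base = count_components(nodes, edges)
    splitting = 0
    for deleted in deletions:
        remaining = [e for e in edges if e != deleted]
        if count_components(nodes, remaining) > base:
            splitting += 1
    return splitting / len(deletions) if deletions else 0.0

# Toy network: a triangle A-B-C with D hanging off C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
frac = fraction_creating_components(nodes, edges, [("A", "B"), ("C", "D")])
```

In the toy network, removing the link between A and B leaves the triangle connected through C, while removing the link between C and D isolates D, so half of the deletions create a new component.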

Page 66: Pah res-potentia-netsci emailable-stagebuild

What did we learn?

Considering reactions in the context of the Res Potentia enhances the ability to correct and close gaps in organismal networks

Now we can begin to analyze and understand more complex features of these networks

Page 67: Pah res-potentia-netsci emailable-stagebuild

Acknowledgements

• Luis Amaral

• Irmak Sirer, Pat McMullen, Sam Seaver, Erin Sawardecker

With financial support from:

• Northwestern/NIH Biotechnology Training Grant

• Chicago Biomedical Consortium