pah res-potentia-netsci emailable-stagebuild

Post on 24-Jun-2015

DESCRIPTION

NetSci 2012 Talk about using a global metabolic network to predict organismal networks

TRANSCRIPT

Res Potentia as a route to understanding function and evolution of cellular networks

Adam Pah

NetSci

June 21, 2012

Where do we stand and how can we do better?

We are generating biological data faster than ever

But generating data is only one part; we still have to convert it into usable knowledge

[Figure: Data and Knowledge]

Why study metabolism?

• My goal is to create a generalizable framework for understanding cellular networks

• I use metabolism because:

• The data fidelity, while not perfect, is far better

• We can use metabolism as a test case to help develop an understanding of cellular networks

• There is also the ability to produce metabolites or chemicals that are of interest

How do we construct a metabolic network?

Metabolic networks are constructed from the Kyoto Encyclopedia of Genes and Genomes database for each organism where:

• Metabolites are connected if they are a part of the main reaction pair

• Substrates are connected to Products only.

UDP-Glucose + H2O + 2 NAD+ → UDP-Glucuronate + 2 NADH + 2 H+

UDP-Glucose → UDP-Glucuronate

2 NAD+ → 2 NADH
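The construction rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reaction record is hand-written rather than parsed from KEGG, the field names (`substrates`, `products`, `main_pairs`) are hypothetical, treating UDP-Glucose/UDP-Glucuronate as the main pair is assumed for the example, and the network is stored as undirected, which the talk does not specify.

```python
from collections import defaultdict

# Hypothetical, hand-written reaction record; real KEGG data would need parsing.
reactions = [
    {
        "id": "example_reaction",
        "substrates": ["UDP-Glucose", "H2O", "NAD+"],
        "products": ["UDP-Glucuronate", "NADH", "H+"],
        # Assumed for illustration: the pair treated as the main reaction pair.
        "main_pairs": [("UDP-Glucose", "UDP-Glucuronate")],
    },
]


def build_network(reactions):
    """Return an adjacency map linking the substrate and product of each main pair."""
    graph = defaultdict(set)
    for rxn in reactions:
        substrates = set(rxn["substrates"])
        products = set(rxn["products"])
        for a, b in rxn["main_pairs"]:
            # Only substrate-to-product links are kept; pairs on the same
            # side of the reaction are ignored.
            if a in substrates and b in products:
                graph[a].add(b)
                graph[b].add(a)
    return graph


network = build_network(reactions)
print(dict(network))
```

With this rule, H2O and H+ never enter the network, so heavily shared small molecules do not create shortcuts between unrelated metabolites.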

Looking at one organism

Methanococcus maripaludis

How do we construct a framework?

Methanococcus maripaludis, Escherichia coli, Homo sapiens, Arabidopsis thaliana

Current knowledge of Realm of Actuals: ‘Res Extenta’

Realm of Possibles: ‘Res Potentia’

It can identify new features

Increased emphasis on metabolite roles

Putative metabolic ‘devices’

We can use this network to revise our knowledge

Methanococcus maripaludis

Helping to sort out the bigger picture


How much of a need exists to correct databases?

In the course of 1 year for 979 organisms in the Kyoto Encyclopedia of Genes and Genomes Database:

• 88,000 metabolites have been added as annotations

• 31,000 metabolites that were annotated have been removed

• Resulting in over 100 changes per organism
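The per-organism figure follows from the two counts on the slide. A quick check, assuming the rounded values shown (88,000 and 31,000) and counting an addition and a removal each as one change:

```python
added = 88_000      # metabolite annotations added in one year (rounded, from the slide)
removed = 31_000    # metabolite annotations removed in one year (rounded, from the slide)
organisms = 979

# Every addition and every removal counts as one change.
changes_per_organism = (added + removed) / organisms
print(round(changes_per_organism, 1))  # 121.6
```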

How can we make predictions?

For every reaction there is a set of enzyme sequences that we can compare to each organismal set of proteins to see how well that reaction ‘fits’

[Figure: Organism 1 proteins (Protein 1 to Protein 4) compared by protein BLAST against the enzyme sequences of Reaction 1 (annotated), Enzyme 1 from Organisms 1 to 4, giving match E-values such as 0.0, 10^-3 and 10^-2]

[Figure: fraction of matches (0.0 to 1.0), from excellent matches to poor matches, for Reaction 1 (annotated) and Reaction 2 (unannotated)]

Repeat this for all 3328 reactions using 5.94 million enzyme sequences in 873 organisms
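One way to turn a reaction's BLAST results into a single ‘fit’ score is the fraction of its enzyme sequences that find an excellent match in the organism. The sketch below assumes the BLAST step has already been run and reduced to one best-hit E-value per enzyme sequence; the E-values are made up, and the `cutoff` of 1e-10 for "excellent" is an assumption for illustration, not a value given in the talk.

```python
def fraction_excellent(e_values, cutoff=1e-10):
    """Fraction of enzyme sequences whose best hit has an E-value at or below cutoff."""
    if not e_values:
        return 0.0
    hits = sum(1 for e in e_values if e <= cutoff)
    return hits / len(e_values)


# Hypothetical best-hit E-values for the enzymes of two reactions in one organism.
reaction1_evalues = [0.0, 1e-45, 1e-30, 1e-3]  # e.g. a reaction annotated for the organism
reaction2_evalues = [5.0, 1e-2, 1e-3, 1e-12]   # e.g. a reaction not annotated for the organism

score1 = fraction_excellent(reaction1_evalues)  # 3 of 4 sequences match well
score2 = fraction_excellent(reaction2_evalues)  # 1 of 4 sequences match well
print(score1, score2)
```

Lower E-values mean better matches, so a reaction that truly occurs in the organism would be expected to score closer to 1.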

Picking an optimal threshold

[Figure: fraction of matches (0.0 to 1.0), from excellent matches to poor matches]
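The transcript does not say which criterion was used to pick the threshold. As one possible illustration only, the sketch below scans candidate thresholds and keeps the one that best separates scores of annotated reactions from scores of unannotated ones, using Youden's J statistic (true positive rate minus false positive rate). The criterion, the function name `pick_threshold`, and the scores are all assumptions, not the method from the talk.

```python
def pick_threshold(annotated_scores, unannotated_scores):
    """Return (threshold, J) maximising TPR - FPR over the observed scores."""
    candidates = sorted(set(annotated_scores) | set(unannotated_scores))
    best_threshold, best_j = None, -1.0
    for t in candidates:
        # A reaction is predicted present when its score is at least t.
        tpr = sum(s >= t for s in annotated_scores) / len(annotated_scores)
        fpr = sum(s >= t for s in unannotated_scores) / len(unannotated_scores)
        j = tpr - fpr
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical 'fit' scores (fraction of excellent matches per reaction).
annotated = [0.9, 0.8, 0.75, 0.4]
unannotated = [0.1, 0.2, 0.5, 0.05]

threshold, j = pick_threshold(annotated, unannotated)
print(threshold, j)
```

Other criteria, such as fixing an acceptable false positive rate, would fit the same loop with a different scoring line.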

How do we validate our results?

• We have one starting dataset, metabolic networks from KEGG 2009

• We have our predicted networks and their changes to this dataset (Predicted Changes)

• I also have the entire KEGG dataset for 2 years following that date (KEGG Changes)

• We can then compare how well each set of changes does in correcting the networks

• Ideally the networks should make sense and be as connected as reasonably possible

Validate by promoting connectedness

We can test and see how the actual changes in the database do at completing and filling in gaps in the networks

[Figure: fraction of gaps filled (0.00 to 0.12) versus gap size (1 to 5) for KEGG Changes, Random, and Predicted Changes]

We can test and see how the actual changes in the database create gaps

[Figure: relative fraction of removed reactions that create additional components (-0.1 to 0.1) for RPF Predicted Deletions and KEGG 2011 Deletions]
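The quantity behind this comparison can be sketched as follows: remove each deleted reaction in turn and check whether the network splits into more connected components. The sketch assumes each reaction corresponds to a single metabolite-metabolite edge and reports a plain fraction; the figure reports a relative fraction, and its baseline is not described in the transcript. The graph and the removed edges are made up.

```python
def count_components(nodes, edges):
    """Count connected components of an undirected graph with an iterative DFS."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def fraction_creating_components(nodes, edges, removed):
    """Fraction of removed edges whose individual removal adds a component."""
    if not removed:
        return 0.0
    base = count_components(nodes, edges)
    creating = 0
    for edge in removed:
        remaining = [e for e in edges if e != edge]
        if count_components(nodes, remaining) > base:
            creating += 1
    return creating / len(removed)


# Hypothetical network: a triangle A-B-C with D hanging off C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
removed = [("A", "B"), ("C", "D")]  # hypothetical deletions

fraction = fraction_creating_components(nodes, edges, removed)
print(fraction)
```

Removing A-B leaves the triangle connected through C, while removing C-D isolates D, so one of the two deletions creates an additional component.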

What did we learn?

Considering reactions in the context of the Res Potentia enhances the ability to correct and close gaps in organismal networks

Now we can begin to analyze and understand more complex features of these networks

Acknowledgements

• Luis Amaral

• Irmak Sirer, Pat McMullen, Sam Seaver, Erin Sawardecker

With financial support from:

• Northwestern/NIH Biotechnology Training Grant

• Chicago Biomedical Consortium
