Fault Tolerance in Protein Interaction Networks: Stable Bipartite Subgraphs and Redundant Pathways Lenore Cowen Tufts University


Post on 17-Dec-2015




Protein-protein interaction



PPI: A simple graph model

vertices ↔ genes/proteins

edges ↔ physical interactions

simplifications:

• undirected

• loses temporal information

• difficult to decompose into separate processes

• conflates different PPI types into one class of "physical interactions"


Current data

• High-throughput methods are allowing us to fill in many edges in our simple model, often between unannotated proteins.


What we want:

What we have:

Question: Can we infer anything about "real" pathways from the low-resolution graph model of pairwise interactions?


Interaction types

• We distinguish here between two types of interaction:

– physical interactions

– genetic interactions


Genetic interactions (epistasis)

Only 18% of yeast genes are essential (the yeast dies when they’re removed).

[Figure: yeast with an essential gene; the gene is deleted; the yeast dies.]


Genetic interaction: synthetic lethality

[Figure: with either of two nonessential genes deleted alone, the yeast lives; with both genes deleted at once, the yeast dies.]

Some pairs of nonessential genes exhibit interesting correlative relationships.


Nonessential Genes

– Some genes are non-essential because they are only required under certain conditions (e.g. an enzyme to metabolize a particular nutrient).

– Other genes are non-essential because the network has some built-in redundancy.

• One gene (completely or partially) compensates for the loss of another.

• One functional pathway (completely or partially) compensates for the loss of another.


Redundant pathways and synthetic lethality


Kelley and Ideker (2005): Between-Pathway Model (BPM)


In reality, the data are very incomplete: Between-Pathway Model (BPM)


Kelley and Ideker (2005) and Ulitsky and Shamir (2007)

• Goal: detect putative BPMs in yeast interactome

• Method:

1) find densely-connected subsets of the physical protein-protein interaction (PI) network (putative pathways)

2) check the genetic interaction (GI) network to see if patterns in density of genetic interactions correlate with these putative pathways

3) check resulting structures for overrepresentation of biological function (gene set enrichment)


Kelley and Ideker (2005) and Ulitsky and Shamir (2007)

[Figure: steps (1), (2), and (3) of the method; in (3), the labels read “enriched for function X” and “enriched for function Y”.]


Kelley and Ideker (2005) and Ulitsky and Shamir (2007)

• Problems:

– Sparse data limits the potential scope of discovery

– independent validation is difficult


Our method

• We show how to systematically search for stable bipartite subgraphs (putative BPMs)

• We use only synthetic lethality interactions to search for BPMs:

– allows the use of PIs for independent statistical validation of putative BPMs

– scope of potential discovery is greater than when using PIs as seed structures


How should we look for bipartite subgraphs?


Maximum bipartition

• Definition: Given any graph G, a maximum bipartition of G is an assignment of each node of G to one of two sets, A and B, in such a way that the number of edges that CROSS the partition is maximized.


• Fact: Maximum bipartition is NP-hard.
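To make the definition concrete, here is a minimal Python sketch that counts crossing edges and finds a maximum bipartition by trying every assignment. The function names and the toy graph are illustrative assumptions, not from the slides. The exhaustive search visits 2^n assignments, so it is only usable on very small graphs, which is consistent with the NP-hardness noted above.

```python
from itertools import product


def cut_size(edges, side):
    """Count the edges whose two endpoints lie in different sets."""
    return sum(1 for u, v in edges if side[u] != side[v])


def brute_force_max_bipartition(nodes, edges):
    """Try all 2^n assignments of nodes to sets 'A'/'B' (tiny graphs only)."""
    best_side, best_cut = None, -1
    for labels in product("AB", repeat=len(nodes)):
        side = dict(zip(nodes, labels))
        c = cut_size(edges, side)
        if c > best_cut:
            best_side, best_cut = side, c
    return best_side, best_cut


# Toy graph (hypothetical): a 4-cycle 0-1-2-3 plus the chord 0-2.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
side, cut = brute_force_max_bipartition(nodes, edges)
print(side, cut)
```

The toy graph contains a triangle, so no assignment can make all five edges cross; the best assignment puts nodes 0 and 2 on one side and 1 and 3 on the other.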


We don’t want a maximum bipartition anyway!

We don’t want to force a choice of sides!


Maximal bipartition

• Definition: Given any graph G, a maximal bipartition of G is an assignment of each node of G to one of two sets, in such a way that moving any single node from one set to the other does not increase the number of edges of G which cross between the two sets.



Algorithm

• Randomly assign a set-label to each node in G.

• Call a node v “happy” if at least half of its neighbors are in the opposite set from v, and “unhappy” otherwise.

• While there exists an unhappy node:

– Pick one such node at random.

– Flip its set label.
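The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the adjacency-dictionary representation, the function name, and the toy graph are assumptions, and the graph is assumed to have no self-loops.

```python
import random


def maximal_bipartition(adj, rng=None):
    """Local search: flip random unhappy nodes until every node is happy.

    adj maps each node to the set of its neighbors (undirected, no self-loops).
    Returns a dict mapping each node to set-label 0 or 1.
    """
    if rng is None:
        rng = random.Random()
    # Randomly assign a set-label to each node.
    side = {v: rng.choice((0, 1)) for v in adj}

    def unhappy(v):
        same = sum(1 for u in adj[v] if side[u] == side[v])
        # Happy means at least half the neighbors are opposite,
        # so unhappy means strictly more than half are on v's own side.
        return 2 * same > len(adj[v])

    while True:
        bad = [v for v in adj if unhappy(v)]
        if not bad:
            return side
        v = rng.choice(bad)      # pick one unhappy node at random
        side[v] = 1 - side[v]    # flip its set label


# Toy graph (hypothetical): a 4-cycle 0-1-2-3 plus the chord 0-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
side = maximal_bipartition(adj, random.Random(0))
print(side)
```

Recomputing the list of unhappy nodes on every pass keeps the sketch short; a larger network would call for maintaining per-node counts incrementally.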

[Figure: an “unhappy” node flips to “happy.”]


Claim: This procedure terminates in at most |E| steps, where |E| is the number of edges in G.

Proof: While a particular node may switch its affiliation many times over the course of the algorithm, notice that each time a flip is performed, the number of edges crossing between the two partitions increases by at least one. So there can be at most |E| steps.
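The “at least one” in the proof can be spelled out. The symbols s(v) and o(v) below are notation introduced here, not taken from the slides: they denote the number of neighbors of v in the same set as v and in the opposite set, so that deg(v) = s(v) + o(v).

```latex
\[
v \text{ unhappy}
\iff o(v) < \tfrac{1}{2}\deg(v)
\iff o(v) < s(v),
\qquad
\Delta_{\text{cut}} = s(v) - o(v) \ge 1 .
\]
```

Flipping v turns its s(v) same-side edges into crossing edges and its o(v) crossing edges into same-side edges, and both counts are integers, which gives the gain of at least one. Since the number of crossing edges always lies between 0 and |E|, at most |E| flips can occur.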

Claim: On termination, every node is “happy.”

Proof: This is just the termination condition of the while-loop.

Observe that the partition generated in this way is maximal: flipping any single node cannot increase the number of edges crossing between partitions, because all nodes are happy.


Stable Bipartite Subgraph: Motivation

If a gene exists within a BPM, then we expect the two pathways of the BPM to fall into opposite sets within most maximal partitions (because the partitioning algorithm is looking to maximize the number of edges crossing between sets).

So in a maximal partition, genes in the same pathway as a BPM gene g should tend to be assigned to the same set as g; those in the opposite pathway should wind up in the opposite set; and those in neither pathway should bounce around with little or no correlation to g’s set-assignment.


Stable Bipartite Subgraph

Definition: For a node m, repeat this procedure k times to find maximal bipartite subgraphs. Let A be the set of nodes that occur in the same partition as m at least r percent of the time. Let B be the set of nodes that occur in the opposite partition of m at least r percent of the time. Return A and B as m’s stable bipartite subgraph.


The stable bipartite subgraphs are our BPMs! (k=250; r= 70 percent)
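The definition can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' code: the graph is an adjacency dictionary without self-loops, the function names are hypothetical, and r is taken as an integer percentage so the threshold comparison stays in integer arithmetic.

```python
import random


def maximal_bipartition(adj, rng):
    """Flip random unhappy nodes until none remain (see the Algorithm slide)."""
    side = {v: rng.choice((0, 1)) for v in adj}
    while True:
        bad = [v for v in adj
               if 2 * sum(side[u] == side[v] for u in adj[v]) > len(adj[v])]
        if not bad:
            return side
        v = rng.choice(bad)
        side[v] = 1 - side[v]


def stable_bipartite_subgraph(adj, m, k=250, r=70, seed=0):
    """Return (A, B) for node m over k runs with threshold r percent."""
    rng = random.Random(seed)
    same = {v: 0 for v in adj}   # runs in which v shared a set with m
    for _ in range(k):
        side = maximal_bipartition(adj, rng)
        for v in adj:
            if side[v] == side[m]:
                same[v] += 1
    A = {v for v in adj if 100 * same[v] >= r * k}
    B = {v for v in adj if 100 * (k - same[v]) >= r * k}
    return A, B


# Toy graph (hypothetical): complete bipartite graph between {a,b,c} and {x,y,z}.
left, right = ["a", "b", "c"], ["x", "y", "z"]
adj = {v: set(right) for v in left}
adj.update({v: set(left) for v in right})
A, B = stable_bipartite_subgraph(adj, "a", k=20)
print(sorted(A), sorted(B))
```

Node m always lands in A, and with r above 50 the sets A and B cannot overlap. On this toy graph the only partitions in which every node is happy are the two labelings of the true bipartition, so A and B recover the two sides regardless of the seed.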


Test Datasets

• original physical + genetic interaction data used in Kelley + Ideker (2005)

• up-to-date set of physical + genetic interactions taken from BioGRID database (October 2007)

1,678 genes (nodes), 6,818 edges (SL interactions)

682 genes (nodes), 1,858 edges (SL interactions)


Return Stable BPMs?


Example BPM


How do we know it is meaningful?

Biological validation: Enrichment results. We find things that are known to be functionally related in our putative pathways. [GO Enrichment]

Statistical validation:

- Location of known PI edges

- Prediction of new SL edges


Results

Method            Network  BPMs   SL edges covered  Enriched pathways  % enriched
Kelley&Ideker     G        360    687               251/720            34.9%
Our Results       G        602    1,526             643/1204           53.4%
Ulitsky&Shamir A  G’       140    <3,765            100/280            35.7%
Ulitsky&Shamir B  G’       270    <3,765            177/540            32.8%
Our Results       G*       1,510  4,949             1528/3020          50.6%


Results

SGD GO-SLIM coverage

Ulitsky + Shamir: 46.3%

Us: 79.8%


Results: Dually-enriched BPMs


Results: Differentially-enriched BPMs


Example BPM



Website: http://bcb.cs.tufts.edu/.yeast.bpm/



Results: BPM Validation

In addition to validation based on coherence of biological function, we can also statistically validate our methods directly from the structure of the network!

Method 1: Examine the distribution of known PIs within each BPM.


Goal: estimate the probability of seeing as many or fewer physical interactions between the two sets as were actually observed.
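The transcript does not say which null model was used to estimate this probability. As a rough illustration only, the sketch below assumes the simplest one: each of the n known physical interactions among the genes of A and B falls on a uniformly random gene pair, so the number landing between the two sets is binomial. The function names and the example numbers are hypothetical.

```python
from math import comb


def between_pair_fraction(size_a, size_b):
    """Fraction of all gene pairs in A and B combined that cross between the sets."""
    total_pairs = comb(size_a + size_b, 2)
    return size_a * size_b / total_pairs


def lower_tail(n, k, p):
    """P(X <= k) for X ~ Binomial(n, p): as many or fewer crossing edges."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


# Hypothetical BPM: two pathways of 5 genes each, 12 physical interactions
# among them, of which only 2 run between the pathways.
p = between_pair_fraction(5, 5)      # 25 of the 45 pairs cross
prob = lower_tail(12, 2, p)
print(round(p, 4), prob)
```

A small value says that physical interactions avoid the between-pathway pairs more than chance would predict under this assumed model. The same binomial, summed over the upper tail instead, would give the "as many or more" probability for new SL edges.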


Method 2: Examine the distribution of new SL interactions appearing within each BPM in the Kelley/Ideker network.


Goal: estimate the probability of seeing as many or more new synthetic-lethality interactions appearing between the two sets as were actually observed.

• Results: Across the set of 175 candidate BPMs from G which contained at least 20 new SL edges in G+, the average probability that the observed between-pathway bias would occur by chance was 0.017.

• Since these new edges were not used to construct candidate BPMs in G, their distribution bias provides independent support for the hypothesis that stable subgraphs do indeed correspond to biologically meaningful structures.


Validation: Microarray Data

• Rosetta compendium (Hughes et al., 2000): contains yeast expression profiles of 276 deletion mutants, i.e. for each gene in the yeast genome, it measures how that gene’s expression level changes when a particular gene g is deleted, as compared to wildtype yeast.


Delete a gene in pathway 1; see if the changes in pathway 2 are coherent


[Figure labels: log10 ratio; BPM; deleted gene; pathway restriction; sort.]


At step i, for i from N down to 1:

Calculate the weighted percent of genes in the pathway seen so far and the percent of genes not in the pathway.

The score is the maximum difference.


How to validate a pathway

• Using a permutation test, we sample 99 random subsets of genes the same size as the pathway.

• We calculate the cluster rank score for each of these 99 sets.

• We sort the test scores together with the pathway score.

• The p-value is the percentile.

• A pathway is validated if its p-value is <= 0.1.
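The permutation test can be sketched in Python as follows. The sketch takes the scoring function as a parameter because the cluster rank score formula is not reproduced in this transcript; the mean absolute log ratio used in the example is a stand-in, not the CRS. It also assumes that a larger score means a more coherent response, and all names are hypothetical.

```python
import random


def permutation_p_value(all_genes, pathway, score, n_perm=99, seed=0):
    """Rank the pathway's score among scores of random same-size gene subsets.

    all_genes must be a list; score maps a list of genes to a number,
    with larger values assumed to indicate a stronger signal.
    """
    rng = random.Random(seed)
    pathway = list(pathway)
    observed = score(pathway)
    null_scores = [score(rng.sample(all_genes, len(pathway)))
                   for _ in range(n_perm)]
    # Position of the observed score among all n_perm + 1 scores.
    at_least = sum(1 for s in null_scores if s >= observed)
    return (at_least + 1) / (n_perm + 1)


# Hypothetical expression data: 50 genes, 5 pathway genes with a strong response.
expr = {f"g{i}": 0.1 for i in range(50)}
pathway = [f"g{i}" for i in range(5)]
for g in pathway:
    expr[g] = 2.0


def mean_abs_ratio(genes):
    """Stand-in score (not the CRS): mean absolute log10 expression ratio."""
    return sum(abs(expr[g]) for g in genes) / len(genes)


p_value = permutation_p_value(list(expr), pathway, mean_abs_ratio)
print(p_value, p_value <= 0.1)
```

With 99 random subsets the smallest attainable p-value is 0.01, and every p-value is a multiple of 0.01.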


Delete a gene in pathway 1; see if the changes in pathway 2 are coherent

We call a pathway “Validated” if its Cluster Rank Score has p-value <= 0.1


Kelley-Ideker Histogram of the Lowest CRS per Pathway per BPM

This histogram shows the CRS results for Kelley and Ideker’s BPMs, bucketed by their lowest p-value. A p-value <= 0.10 indicates a validated BPM.


Ulitsky Histogram of the Lowest CRS per Pathway per BPM

This histogram shows the CRS results for Ulitsky’s BPMs, bucketed by their lowest p-value. A p-value <= 0.10 indicates a validated BPM.


Ma Histogram of the Lowest CRS per Pathway per BPM

This histogram shows the CRS results for Ma’s BPMs, bucketed by their lowest p-value. A p-value <= 0.10 indicates a validated BPM.


Brady Histogram of the Lowest CRS per BPM

This histogram shows the CRS results for Brady’s BPMs, bucketed by their lowest p-value. A p-value <= 0.10 indicates a validated BPM. Clearly, Brady’s BPMs are disproportionately represented in the lower p-value range.


Results

BPM dataset          # paths hit knockouts  # validated pathways  % validated pathways
Kelley-Ideker (05)   160                    16                    10%
Ulitsky-Shamir (07)  36                     5                     14%
Ma et al. (08)       54                     6                     11%
Our results          959                    230                   24%


A Tantalizing Peek of What We can Do With More Data!

• A heat map of the differential expression of yeast genes in pathway 2 in response to the deletion of two different genes (SHE4 and GAS1) from pathway 1 in a validated BPM of Ma et al.


A random-gene validation test couples the two pathways together


Co-authors and collaborators

• Arthur Brady

• Noah Daniels

• Ben Hescott

• Max Leiserson

• Kyle Maxwell

• Donna Slonim


thanks.


A Graph Theory Problem

• Our algorithm samples from the maximal bipartite subgraphs. With what distribution? Is it uniform? Proportional to the number of edges that cross the cut?

• What are the properties of the stable bipartite subgraphs of the synthetic lethal network? Are they conserved across species?


Approach

• Run the partitioning algorithm 250 times on the yeast SL network (G).

• For each gene g in G,

– Construct a set A consisting of g and all nodes in G which wind up in the same set as g at least 70% of the time.

– Construct another set B consisting of all nodes in G which wind up in the opposite set from g at least 70% of the time.

• We call the subgraph of G defined by A and B the “stable bipartite subgraph of g”, and designate it as a candidate BPM.