qnet: a tool for querying protein interaction networks banu dost +, tomer shlomi*, nitin gupta +,...

40
QNET: A tool for querying protein interaction networks Banu Dost + , Tomer Shlomi*, Nitin Gupta + , Eytan Ruppin*, Vineet Bafna + , Roded Sharan* + University of California, San Diego *Tel Aviv University, Israel contact: [email protected]

Upload: clement-snow

Post on 12-Jan-2016

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

QNET: A tool for querying protein

interaction networks

Banu Dost+, Tomer Shlomi*, Nitin Gupta+, Eytan Ruppin*, Vineet Bafna+, Roded Sharan*

+University of California, San Diego*Tel Aviv University, Israel

contact: [email protected]

Page 2: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Protein Interaction Networks Proteins rarely function in isolation, protein interactions affect all

processes in a cell.

Forms of protein-protein interactions: Modification, complexation [Cardelli, 2005].

e.g. phosphorylation

e.g. protein complex

Page 3: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Protein Interaction Networks

High-throughput methods are available to find all interactions, “PPI network”, of a species.

an undirected graph nodes: protein, edges: interactions

Yeast DIP network: ~5K proteins, ~18K interactions

Fly DIP network: ~7K proteins, ~20K interactions

Proteins rarely function in isolation, protein interactions affect all processes in a cell.

Forms of protein-protein interactions: Modification, complexation [Cardelli, 2005]

PPI network

Page 4: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Motivation: Conservation of Subnetworks

Subnetworks can denote cellular processes, signaling pathways, metabolic pathways, etc.

Many “subnetworks” are conserved across species. Sequences are conserved Interactions are conserved

Yeast FlyWorm

Sharan, Roded et al. (2005), PNAS

Page 5: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network Querying Problem Species A

well studied protein interaction sub-

networks defined by extensive experimentation

Species B less studied little knowledge of sub-

networks protein interaction network

known using high-throughput technologies

Can we use the knowledge of A to discover corresponding sub-networks in B if it is “present”?

Page 6: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network Querying Problem: Homeomorphic Alignment

homeomorphic to Q

insertion

match

match

match

match

match

match

Match of homologous proteins and deletion/insertion of degree-2 nodes

deletion

Species BSpecies A

Q

Page 7: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network Querying Problem: Score of Alignment

ScoreSequence similarity score for matches

Penalty for deletions&insertions

Interaction reliabilities score

+ +=

h(q1,v1)q1 v1

h(q2,v2)

h(q3,v3)

h(q4,v4)

h(q6,v6)

h(q5,v5)

del pen

ins pen

v2

w(v

1,v2)

),()(#)(#),( jiid vvwInsDelvqhScore

Page 8: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network Querying Problem

Given a query graph Q and a network G, find the sub-network of G that is homeomorphic to Q aligned with maximal score

Query Q

Network G

Page 9: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Complexity

Network querying problem is NP-complete. (for general n and k) by reduction from sub-graph

isomorphism problem

Naïve algorithm has O(nk) complexity n = size of the PPI network, k=size of the query Intractable for realistic values of n and k n ~5000, k~10

We use randomized “color coding” technique developed by [Alon et al, JACM, 1995] to find a tractable solution. Reduces O(nk) to n22O(k).

Page 10: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Previous Work

Current Tools: PathBlast [Kelley et al., 2003] MaWish [Koyuturk et al., 2006] Graemlin [Flannick et al., 2006]

Different alignment interpretation Some heuristics to search for the optimal solution

Page 11: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

QNET

Implemented for tree-like queries.

Color coding approach to search for the global optimal sub-network.

Extension of QPATH [Shlomi et al., 2006]

Solves the problem of querying chains using color coding approach.

Page 12: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Query

Color Coded Querying - TreesNetwork

Query has k nodes.

Page 13: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network

Color Coded Querying - Trees

Query has k nodes.Randomly color the network with k distinct colors.Suppose optimal sub-network is “colorful”.

(all of its vertices colored with distinct colors)Use the colors to remember the visited nodes.

Page 14: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Query Network

DP solution for Color Coded Querying - Trees

q1

q2

q3

q4

q5

q6

q7

v1

v2 v3

v6

v7

v4

Page 15: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network

Probability of failure

The optimal alignment can be found only if the optimal sub-network is “colorful”.

Repeat color-coded search multiple times until probability of failure ≤ ε.

v1

v2 v3

v6

v7

v4

v5

kk

ek

kfailureP 1

!1)(

Page 16: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Number of Repeats

keN )./1(ln

Necessary number of repeats to guarantee a failure ≤ ?

Repeat times, then

k=9 and =0.01 => N ~ 30K We reduce N by a new approach “restricted color coding”.

)1ln(. trials)allin P(failure

1per trial) P(failure

ee

eeNe

ek

k

k

Page 17: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Restricted Color CodingNetwork

q1

q2 q3

q4 q5

q6 q7 q8

Idea: take advantage of queries whose proteins tend to have non-overlapping sets of homologs.

Ideal case: no insertions & no common homolog of query proteins => P(failure per trial) = 0.On average, N is reduced by 10-fold.

Page 18: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network Querying with Color Coding Approach

Net

wo

rk

Gra

ph

high scoringsubnetwork

query

randomly color

DP algorithm

repeatN times

Page 19: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Querying General Graphs

We have extended the algorithm for also general graphs.

Idea: Map the original graph into a tree, i.e. tree

decomposition. (Polynomial time for bounded-tree-width graphs)

Solve the querying problem on this tree using DP.

Page 20: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Color Coded Querying – General Graphs

Map the original query into a tree using tree-decomposition.

G

T

vertex

node=set of vertices

u

v z

Page 21: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Color Coded Querying – General Graphs

Width(T) = size of its largest node – 1.Tree-width(G) = minimum width among all possible tree decompositions of G.

G

T

Page 22: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network

Color Coded Querying – General Graphs

Original query has k nodes and tree-width t.Randomly color the network with k distinct colors.. q1

q2

q2

q3

q3

q4

q4q5

q5

q6 q7q8

T

Page 23: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Network

Color Coded Querying – General Graphs

Original query has k nodes and tree-width t.Randomly color the network with k distinct colors.

q1

q2

q2

q3

q3

q4

q4q5

q5

q6 q7q8

v2 v3

v7

v6

v5

v8

v1

v4

O(n(t+1))

T

Page 24: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Running time

n=size of network, k=size of query.

Tree queries: Reduces O(nk) to n22O(k).

Tractable for realistic values of n and k. n ~5000, k~10

Bounded-tree-width graphs: t : tree-width n(t+1)2O(k)

Page 25: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Heuristic for Color Coded Querying - General Graphs

G

1. Extract several spanning trees from the original query.

Page 26: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

1. Extract several spanning trees from the original query.2. Query each spanning tree in the network.

Heuristic for Color Coded Querying - General Graphs

Page 27: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

1. Extract several spanning trees from the original query.2. Query each spanning tree in the network.

Heuristic for Color Coded Querying - General Graphs

Page 28: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

1. Extract several spanning trees from the original query.2. Query each spanning tree in the network.

Heuristic for Color Coded Querying - General Graphs

Page 29: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

1. Extract several spanning trees from the original query.2. Query each spanning tree in the network. 3. Merge the matching trees to obtain matching graph.

Heuristic for Color Coded Querying - General Graphs

Page 30: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Testing

Time

Quality of solutions

Page 31: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

QNET: timing

Handles queries with upto 9 proteins in seconds.

Query

size (k)

#Iterations Avg. time (sec)

Standard color coding

Restricted

color coding

Standard

color coding

Restricted

color coding

5 752 603 1.71 1.58

6 1916 917 6.36 4.73

7 4916 1282 20.46 6.24

8 12690 1669 61.17 9.08

9 32916 2061 173.88 11.03

Page 32: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Test 1: Importance of Topology Motivation: Is sequence similarity enough to find

corresponding sub-network?

Queries: Random tree queries from yeast DIP network [Salwinski,

2004] Topology perturbed (≤2 ins-dels).

Network: Yeast PPI Protein sequences mutated (50-70 percent)

How distant is the result from the original extracted tree?

Page 33: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Test 1: Importance of Topology

Ave

rage

dis

tanc

e

Ave

rage

dis

tanc

e

#ins+#del #ins+#del

QNETBLAST

Distance = #missing proteins + #extra proteins

Outperforms sequence-based searches.

Page 34: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Test 2: Cross-species comparison of MAPK pathways

Motivation: finding conserved pathways.

Query: human MAPK pathway involved in cell proliferation and differentiation.

Network: fly PPI network ~7K proteins ~20K interactions

Match: a known fly MAPK pathway involved in dorsal pattern formation.

Query from

human

Match in fly

Page 35: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Test 3: Cross-species comparison of protein complexes

Motivation: conserved protein complexes between yeast and fly.

Queries: Hand-curated yeast MIPS complexes []. Project onto yeast DIP network Extract several spanning trees

Page 36: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Test 3: Cross-species comparison of protein complexes

Motivation: conserved protein complexes between yeast and fly.

Queries: Hand-curated yeast MIPS complexes []. Project onto yeast DIP network Extract several spanning trees

Network: Fly DIP network

Match Consensus matching graph for each query complex.

Page 37: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Test 3: Cross-species comparison of protein complexes

Result: ~40 of the queries resulted in a match with >1 protein. 72% of the consensus matches are functionally enriched.

(p-value < 0.05) 17% of the random trees extracted from network are functionally

enriched.

YeastCdc28p complex

Fly

Page 38: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Summary

QNET: a tool for querying protein interaction networks Tree-like queries

Randomized algorithm and heuristic proposed for querying general graphs.

Page 39: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Future Work

Development of appropriate score functions to better identify conserved pathways.

Extending QNET for queries with more general structure. bounded-tree-width graphs.

Page 40: QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University

Thank you

University of California, San Diego [email protected]