Structure discovery in PPI networks using pattern-based network decomposition. Philip Bachman and Ying Liu. BIOINFORMATICS, Systems biology, Vol. 25 no. 14, 2009. May 15, 2009.


Page 1: Structure discovery in PPI networks using pattern-based network decomposition

Structure discovery in PPI networks using pattern-based network decomposition

Philip Bachman and Ying Liu
BIOINFORMATICS, Systems biology

Vol. 25 no. 14 2009
May 15, 2009

Page 2: Structure discovery in PPI networks using pattern-based network decomposition

Outline

• Introduction
• Graph-Theoretic concepts
• The algorithm
• Algorithm and results
• Pattern-based network decomposition
• Conclusion

Page 3: Structure discovery in PPI networks using pattern-based network decomposition

Introduction

• The large, complex networks of interactions between proteins provide a lens through which one can examine the structure and function of biological systems.

• Previous analyses of these networks:
– Large-scale statistical analysis of holistic network properties
– Small-scale analysis of local topological features

Page 4: Structure discovery in PPI networks using pattern-based network decomposition

Introduction

• Investigation of meso-scale network structure has been hindered by the computational complexity of structure search in networks.

• In this article, the authors present an efficient algorithm for performing sub-graph isomorphism queries on a network and show its computational advantage.

Page 5: Structure discovery in PPI networks using pattern-based network decomposition

Graph-Theoretic concepts

• A graph G is defined as G=(V,E)
• Given an ordering o of the vertices, one can produce the adjacency matrix M_o
• If two graphs Gi and Gj are isomorphic, then there exist orderings oi of Gi and oj of Gj such that M_oi = M_oj
• A canonical labeling:

Page 6: Structure discovery in PPI networks using pattern-based network decomposition

Graph-Theoretic concepts

– One canonical label: consider all possible orderings of a graph’s vertices and select the ordering omax such that M_omax is maximized.
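As a rough illustration of this definition, the Python sketch below brute-forces a canonical label by trying every vertex ordering and keeping the largest adjacency matrix. Reading the matrix row by row as a bit string is an assumption about what "maximized" means here, and the name canonical_label is hypothetical. The factorial cost limits this to very small graphs.

```python
from itertools import permutations

def canonical_label(vertices, edges):
    """Largest adjacency-matrix bit string over all vertex orderings.

    Assumption: the matrix is compared as a row-major string of 0s and 1s.
    """
    edge_set = {frozenset(e) for e in edges}
    best = None
    for order in permutations(vertices):
        # Flatten the adjacency matrix induced by this ordering.
        bits = "".join(
            "1" if frozenset((u, v)) in edge_set else "0"
            for u in order
            for v in order
        )
        if best is None or bits > best:
            best = bits
    return best

# Two differently named three-vertex paths, and a triangle.
path_a = canonical_label([0, 1, 2], [(0, 1), (1, 2)])
path_b = canonical_label(["x", "y", "z"], [("y", "x"), ("x", "z")])
triangle = canonical_label([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
```

Isomorphic graphs receive the same label regardless of how their vertices are named, so path_a equals path_b while the triangle label differs.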

Page 7: Structure discovery in PPI networks using pattern-based network decomposition

Graph-Theoretic concepts

• An automorphism of a graph G
– A mapping of the graph’s vertices onto each other
– Any two orderings oi and oj such that M_oi = M_oj
• Aut(G): the set of all automorphisms of G
• The automorphism orbits, A, of a graph G
– The maximal sets of the vertices in G that are closed under all mappings in Aut(G)
– The automorphism orbit to which a vertex v belongs is referred to as AutOrb(v).
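To make Aut(G) and the orbits concrete, here is a minimal brute-force sketch in Python. It treats an automorphism as a vertex permutation that maps the edge set onto itself and collects, for each vertex, the set of vertices it can be sent to. The function names are hypothetical, and the exhaustive enumeration is only practical for small graphs.

```python
from itertools import permutations

def automorphisms(vertices, edges):
    """All vertex permutations that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for image in permutations(vertices):
        mapping = dict(zip(vertices, image))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if mapped == edge_set:
            found.append(mapping)
    return found

def orbits(vertices, edges):
    """Group vertices that some automorphism maps onto each other."""
    autos = automorphisms(vertices, edges)
    seen, result = set(), []
    for v in vertices:
        if v not in seen:
            orbit = {m[v] for m in autos}  # every image of v under Aut(G)
            result.append(orbit)
            seen |= orbit
    return result

# A three-vertex path a-b-c: the two end vertices are interchangeable.
path_vertices = ["a", "b", "c"]
path_edges = [("a", "b"), ("b", "c")]
path_orbits = orbits(path_vertices, path_edges)
```

For the path a-b-c there are two automorphisms (the identity and the end-to-end flip), giving one orbit for the two ends and one for the middle vertex.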

Page 8: Structure discovery in PPI networks using pattern-based network decomposition

Graph-Theoretic concepts

– Define a canonical ID for each automorphism orbit of a graph G: examine the canonical matrix for G and number the orbits by their order of first appearance in the matrix

Page 9: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm

• To solve the following problem: given a query graph Gq, find all sub-graphs in a source graph Gs that are isomorphic to Gq
• Three components of this algorithm:
– First, the basic backtracking search used in previous motif discovery algorithms.
– Second, a symmetry-breaking technique presented by Grochow and Kellis (2007), which enhances the backtracking search.
– Third, a constraint set added for each vertex v.

Page 10: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm

• Constraint set:
– For each vertex v in Gq, the set of all vertices in Gs that are potentially mappable onto v under some mapping of Gq onto an isomorphic sub-graph of Gs.

– Allows quick elimination of candidate vertex pairs during the backtracking search.

– Generating these constraint sets requires a constraint database that is populated by performing an initial set of specific sub-graph queries.

Page 11: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm : Backtracking

• Find all sub-graphs in a source graph Gs that are isomorphic to a query graph Gq

• For each vertex v in Gs and each vertex u in Gq

– A mapping, I, of Gq onto Gs is initialized for each such (v,u) pair

– Recursively extend I using each pair of compatible vertices v’ in Gs and u’ in Gq such that v’ and u’ are both adjacent to vertices already in I.

– An isomorphism has been found once I maps all vertices in Gq onto compatible vertices in Gs.
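The backtracking steps above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: graphs are assumed to be dictionaries mapping each vertex to a set of neighbours, the query is assumed to be connected, and "compatible" is taken to mean that every query edge is present between the mapped source vertices (extra source edges are tolerated). The names connected_order and find_mappings are hypothetical.

```python
def connected_order(adj):
    """Order vertices so each one after the first touches an earlier one."""
    order = [next(iter(adj))]
    while len(order) < len(adj):
        nxt = next(
            (u for u in adj if u not in order and adj[u] & set(order)), None
        )
        if nxt is None:
            raise ValueError("query graph must be connected")
        order.append(nxt)
    return order

def find_mappings(query_adj, source_adj):
    """Return every edge-preserving mapping {query vertex: source vertex}."""
    order = connected_order(query_adj)
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))  # all query vertices are mapped
            return
        u = order[len(mapping)]
        mapped_nbrs = [mapping[w] for w in query_adj[u] if w in mapping]
        if mapped_nbrs:
            # Candidates must be adjacent to the images of u's mapped neighbours.
            candidates = set.intersection(*(source_adj[x] for x in mapped_nbrs))
        else:
            candidates = set(source_adj)  # first query vertex: try everything
        for v in candidates - set(mapping.values()):
            mapping[u] = v
            extend(mapping)
            del mapping[u]  # backtrack

    extend({})
    return results

triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
triangle_in_k4 = find_mappings(triangle, k4)
```

Searching for a triangle in a four-vertex clique returns 24 mappings: each of the four triangles is found six times, once per automorphism of the query.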

Page 12: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm : symmetry-breaking

• For each matching sub-graph, the number of mappings found by the basic search is equal to the number of automorphisms in Aut(Gq), which can grow factorially with the number of vertices in Gq.

• We can avoid these ‘repeated’ mappings by using symmetry-breaking constraints.
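The effect of symmetry breaking can be illustrated with a simplified Python sketch. Rather than pruning during the search, as a real symmetry-breaking scheme would, it filters finished mappings and keeps only the one that is lexicographically smallest among its symmetric variants. This post-filter is a stand-in chosen for brevity, not the Grochow and Kellis technique itself, and the function names are hypothetical. Source vertices are assumed to be mutually comparable.

```python
from itertools import permutations

def automorphisms(adj):
    """All permutations of the query vertices that preserve its edge set."""
    vertices = list(adj)
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    found = []
    for image in permutations(vertices):
        m = dict(zip(vertices, image))
        if {frozenset((m[u], m[v])) for u in adj for v in adj[u]} == edges:
            found.append(m)
    return found

def is_representative(mapping, autos, order):
    """True if mapping is the smallest among its automorphic variants."""
    own = tuple(mapping[u] for u in order)
    # Composing the mapping with an automorphism gives a 'repeated' mapping
    # onto the same sub-graph; keep only the lexicographically smallest one.
    return all(own <= tuple(mapping[a[u]] for u in order) for a in autos)

triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
order = ["a", "b", "c"]
autos = automorphisms(triangle)

# The six mappings of the triangle query onto source vertices 0, 1, 2.
repeated = [dict(zip(order, p)) for p in permutations([0, 1, 2])]
kept = [m for m in repeated if is_representative(m, autos, order)]
```

Of the six mappings onto the same source triangle, only one survives the filter.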

Page 13: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm : Generating constraint sets for the source graph

• The database used for per-vertex constraint generation contains an entry for each query graph Gq:
– The entry for Gq is referenced by lc(Gq)

• The entry for each Gq is a list of constraint sets, one for each automorphism orbit of Gq

– The entries for its automorphism orbits are referenced by their respective canonical IDs.

Page 14: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm : Generating constraint sets for the source graph

• The constraint sets for each query graph Gq can be generated by performing a slightly modified version of the basic backtracking search:
– Check each vertex vs in Gs against a representative vq from each automorphism orbit A of Gq

– If an isomorphic mapping exists which maps vq onto vs, then vs is added to the constraint set for A

• This allows the skipping of any future checks for some vs against some vq where vs is already in the constraint set for AutOrb(vq).
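A minimal Python sketch of this constraint-set generation is given below, under several assumptions: graphs are dictionaries of neighbour sets, the query is connected, a mapping only needs to preserve query edges, and the orbit of each query vertex is supplied by the caller in a hypothetical orbit_of dictionary. Whenever a mapping is found, every vertex in it is recorded, which is what lets later checks be skipped.

```python
def find_mapping(query_adj, source_adj, vq, vs):
    """Return one edge-preserving mapping with vq mapped to vs, or None."""
    order = [vq]
    while len(order) < len(query_adj):
        nxt = next(
            (u for u in query_adj
             if u not in order and query_adj[u] & set(order)),
            None,
        )
        if nxt is None:
            raise ValueError("query graph must be connected")
        order.append(nxt)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        u = order[len(mapping)]
        candidates = set.intersection(
            *(source_adj[mapping[w]] for w in query_adj[u] if w in mapping)
        )
        for v in candidates - set(mapping.values()):
            mapping[u] = v
            found = extend(mapping)
            if found is not None:
                return found
            del mapping[u]  # backtrack
        return None

    return extend({vq: vs})

def build_constraint_sets(query_adj, source_adj, orbit_of):
    """Map each orbit ID to the source vertices mappable onto that orbit."""
    sets = {orbit_id: set() for orbit_id in orbit_of.values()}
    representatives = {}
    for u, orbit_id in orbit_of.items():
        representatives.setdefault(orbit_id, u)
    for orbit_id, vq in representatives.items():
        for vs in source_adj:
            if vs in sets[orbit_id]:
                continue  # already known to be mappable: skip the check
            mapping = find_mapping(query_adj, source_adj, vq, vs)
            if mapping is not None:
                # Every vertex of the found mapping is valid for its orbit.
                for u, v in mapping.items():
                    sets[orbit_of[u]].add(v)
    return sets

# Query: a path a-b-c (orbit 0 = middle, orbit 1 = ends; IDs are illustrative).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path_orbits = {"a": 1, "b": 0, "c": 1}
# Source: a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
source = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
constraints = build_constraint_sets(path, source, path_orbits)
```

In this example the pendant vertex 3 can serve as an end of the path but never as its middle, so it appears only in the constraint set for the end-vertex orbit.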

Page 15: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm : Generating constraint sets for the source graph

• The database can be bootstrapped: if sub-graphs with n vertices are to be sampled, start with all connected graphs of some small size k, then use the generated database to accelerate the generation of the database entries for all connected graphs of size k+1, and so on, until size n is reached.

Page 16: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm : Vertex constraints for a query sub-graph

Page 17: Structure discovery in PPI networks using pattern-based network decomposition

The algorithm : Vertex constraints for a query sub-graph

• The intersection of constraints performed in step 4 does not preclude any viable mapping of Gq onto Gs.

• Source graph vertices fundamentally incompatible with a query graph vertex will fail to appear in one or more of the constraint sets associated with that vertex.
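The numbered steps themselves are not reproduced in this transcript, so the following Python fragment only sketches the intersection idea under an assumption: for one query vertex, a list of constraint sets has already been retrieved from the database, and the candidates for that vertex are the source vertices present in all of them. The function name and arguments are hypothetical.

```python
def intersect_constraints(source_vertices, retrieved_sets):
    """Source vertices that appear in every constraint set for one query vertex."""
    allowed = set(source_vertices)
    for constraint_set in retrieved_sets:
        # A vertex missing from any one set cannot take part in a full mapping.
        allowed &= constraint_set
    return allowed

candidates = intersect_constraints(
    range(6),
    [{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 5}],
)
```

Because each retrieved set contains every source vertex that could possibly match, intersecting them can only remove vertices that are incompatible with the query vertex.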

Page 18: Structure discovery in PPI networks using pattern-based network decomposition

Algorithm and results

The ratio between the times for unconstrained and constrained searches across sets of 100 randomly generated queries at each edge density and query size

Queries comprising from 8 to 19 vertices

Page 19: Structure discovery in PPI networks using pattern-based network decomposition

Algorithm and results

The cumulative times for processing all 400 queries of various densities at each query size

Page 20: Structure discovery in PPI networks using pattern-based network decomposition

Algorithm and results

• A limit on search time was imposed due to the presence of pathological queries that required an impractical time to complete.
• In this plot, the cumulative time for unconstrained searches appears to plateau due to our imposed limit on search time, with the maximum possible cumulative time at each graph size being 400,000 s (400 queries at 1000 s each), as we set the individual query time-out to 1000 s.

• It was common for queries with 15 or more nodes and with edge densities of 0.6 or 0.8 to time out during unconstrained search.

• Only one constrained search out of the 3600 performed timed out.
• Unconstrained search time was strongly affected by the edge density of a query. The effects of density were drastically reduced when using our constraints.

Page 21: Structure discovery in PPI networks using pattern-based network decomposition

Algorithm and results

• The time required to fill the database
– CPU time for exhaustive search of all seven- and eight-vertex graphs.

– The database was bootstrapped, and the time required to do so was 4477 s. This is significantly less than 37361 s.

Page 22: Structure discovery in PPI networks using pattern-based network decomposition

Pattern-based network decomposition

• An application of sub-graph queries that allows a PPI network to be decomposed into sub-networks exhibiting specific structural patterns.
– Create generalizations of their topological structure
– Search for all appearances of each generalization in the source network

Page 23: Structure discovery in PPI networks using pattern-based network decomposition

Generating and using query patterns

• Given a specific form of inter-module interaction, derive generalized query patterns:
– A query pattern may be created at the smallest size which adequately represents some desired structural features

– The pattern is allowed to ‘expand’ to cover larger sub-graphs that still fit the interaction pattern it was designed to represent.

Page 24: Structure discovery in PPI networks using pattern-based network decomposition

Generating and using query patterns

Top row : a four-node clique covering all edges and nodes in a dense eight-node graph

Bottom row : overlapping four-node cliques expanding to fully cover overlapping dense six-node graphs

Page 25: Structure discovery in PPI networks using pattern-based network decomposition

Generating and using query patterns

• How a dense module may be covered by a small clique query pattern and how a pair of interacting/overlapping modules may be covered by a small pair of overlapping cliques

• A pattern generalization created using this property of graphs can be used to filter the source network.

Illustration of four sample pattern generalizations

Page 26: Structure discovery in PPI networks using pattern-based network decomposition

Generating and using query patterns

• Filtering a source network using a generalized query pattern:
– All instances of the query in the source network are found
– All edges and vertices from the source network that do not appear in some instance of the query are removed

– Inspect all regions of the source network that have topologies matching the pattern that the query was designed to represent
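The filtering steps can be sketched in Python for the simplest pattern, a k-node clique. This is an illustrative stand-in rather than the authors' method: it enumerates vertex groups by brute force instead of using the constrained search, assumes the source network is a dictionary of neighbour sets, and uses the hypothetical name filter_by_clique.

```python
from itertools import combinations

def filter_by_clique(source_adj, k=4):
    """Keep only the vertices and edges that lie in some k-node clique."""
    kept_edges = set()
    for group in combinations(list(source_adj), k):
        # The group is an instance of the query if all pairs are connected.
        if all(v in source_adj[u] for u, v in combinations(group, 2)):
            kept_edges.update(frozenset(p) for p in combinations(group, 2))
    filtered = {}
    for edge in kept_edges:
        u, v = tuple(edge)
        filtered.setdefault(u, set()).add(v)
        filtered.setdefault(v, set()).add(u)
    return filtered

# A four-node clique 0-1-2-3 with a tail 3-4-5 hanging off it.
source = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
extracted = filter_by_clique(source)
```

The tail vertices 4 and 5 and their edges are removed, leaving only the clique. In a dense module larger than four nodes, overlapping four-node cliques would together retain the whole module, which is the "expansion" behaviour described earlier in the slides.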

Page 27: Structure discovery in PPI networks using pattern-based network decomposition

An application of pattern-based network decomposition

• Core PPI network for Yeast
– 17000 interactions between 5000 proteins
– All of the generalizations that we searched for appeared as significantly enriched motifs with respect to an ensemble of 100 random networks, indicating potential biological significance.

– We occasionally had to determine boundaries between overlapping groups of proteins
• Determined these boundaries heuristically, by looking for ‘bottlenecks’ between groups of densely interacting proteins.

Page 28: Structure discovery in PPI networks using pattern-based network decomposition
Page 29: Structure discovery in PPI networks using pattern-based network decomposition

An application of pattern-based network decomposition

• The most immediate results and the largest extracted network (EN) came from using a four-node clique as the query pattern.

Page 30: Structure discovery in PPI networks using pattern-based network decomposition

An application of pattern-based network decomposition

• These components likely represent the most evolutionarily conserved core of the Yeast PPI network. It was shown by Wuchty et al. (2003) that proteins participating in four-node cliques have an evolutionary conservation rate that is over 400 times higher than that which would be expected.

Page 31: Structure discovery in PPI networks using pattern-based network decomposition
Page 32: Structure discovery in PPI networks using pattern-based network decomposition

Conclusion

• An algorithm was presented for one such problem, sub-graph isomorphism, that is more efficient than previous algorithms.

• In concert with suitable query patterns that exploit some simple properties of graphs, query-based graph search can be used to examine network structure at a scale that reveals relationships within and between groups of interacting proteins.