

Interactome

Teresa M. Przytycka* and Dong-Yeon Cho

National Center for Biotechnology Information, NIH

8600 Rockville Pike, Bethesda MD, 20894 USA

Corresponding author e-mail: [email protected]

A cell’s interactome consists of all the molecular interactions occurring in the cell. Among the diverse

types of molecular interactions, genome-wide protein-protein interactions are the most broadly

studied. In this chapter we focus on genome-scale methods to infer and study protein-protein

interactions. Following a brief introduction to the subject (Section 1), we outline experimental

(Section 2) and computational (Section 3) techniques to uncover interactions between proteins. In

the context of computational methods, we also discuss methods for predicting interactions

between protein domains. In Section 4, we describe basic topological properties of protein

interaction networks such as node degree distribution, modularity, and network motifs. In Section

5, we discuss the motivation and approaches for comparing biological networks. In Section 6,

we briefly introduce protein and domain interaction databases. Section 7 provides sample

applications that take advantage of interactome data, while the final Section 8 points to possible

future directions of interactome-related research.

Keyword Glossary

Co-evolution: Correlated evolutionary changes in the genetic composition of biological objects such as genes. Typically it is assumed that the change in one object is triggered by a genetic change in the second. However, in the context of this article, we do not make the causality assumption.

Co-expression: Analogous spatial or temporal expression patterns of two or more genes. Typically, co-expression can be identified by calculating correlation coefficients and assessing these correlations through statistical tests.

Graph/Network: A representation of a binary relation between a set of objects. The objects are called vertices (or nodes). The relation between two vertices is represented by an edge (or link). Specifically, a graph whose edges are directed is called a directed graph. Directed graphs can be used to represent an asymmetric relation such as gene regulation.

Gene Regulatory Network: A directed network of genes where one gene (a transcription factor) regulates the expression of another gene (target gene).

Homologs and Orthologs: Homologs are genes which originated from a common ancestor. Two homologs found in different species are called orthologs if their least common ancestor in the gene evolutionary tree corresponds to a speciation event. Orthologs typically perform closely related roles in the corresponding organisms.

Protein-Protein Interaction Network: A network whose nodes denote proteins and whose links represent binding of proteins to carry out their biological function.

Transcription Factor: A protein which binds to specific DNA sequences to regulate the transcription or expression of genes. By controlling RNA polymerase’s access to the genes, transcription factors can activate or repress the expression (transfer of genetic information from DNA to mRNA) of the corresponding gene.

Yeast Two-Hybrid (Y2H): A yeast-based biological technique used to test protein interaction. One of the tested proteins is fused to a DNA-binding domain and the other is fused to an activation domain. If the two proteins interact, a reporter gene is expressed.

1. Introduction

Within a cell, diverse biomolecules work together in a coordinated fashion to provide specific

cellular functions. This coordinated action is achieved, in large part, by a variety of

intermolecular interactions including protein-protein interactions, protein-DNA interactions,

RNA interactions, and many others. Here we focus mainly on interactions between proteins,

including those corresponding to physical interactions (here protein-protein binding) as well as

more abstract “functional” interactions between them. However, we briefly mention other

interaction types as well.

Biomolecular interactions are conveniently represented as networks (graphs) with nodes

(vertices) representing molecules, and links (edges) representing interactions between them.

Depending on the type of interaction, the corresponding edge might be directed or not. For

example, a binding of two proteins is typically represented by an undirected edge while an

interaction between a transcription factor and a gene whose expression is regulated by the given

transcription factor is usually represented by a directed edge. A representation of the interactome

as a network/graph provides not only a convenient visualization but also enables using graph

theoretical concepts and tools in studies of biological networks. For example, the Cytoscape suite

has emerged as one of the leading tools for visualization and network analysis. It also allows

for the development of third-party tools (plugins) that take advantage of its functionality [1]. An

example of a yeast protein-protein interaction network visualized with the help of Cytoscape is

shown in Figure 1.

Figure 1. Yeast protein-protein interaction network drawn using Cytoscape [1], constructed from complex purification experiments [2, 3] with statistical scores from [4]. Modules, that is, groups of two or more proteins that are simultaneously present in multiple protein complexes from the complex purification experiments [2], are color-coded.

2. Experimental techniques for detecting protein interactions

The two main technologies that are used in high-throughput protein-protein interaction detection

experiments are yeast two-hybrid assay (Y2H) and protein complex purification followed by

identification by mass spectrometry (CoAP/MS). These two techniques are vastly different, each

having its own strengths and limitations. In the yeast two-hybrid (Y2H) experiment, pioneered

by Fields and Song [5], one of the tested proteins, the so-called bait, is fused with a DNA-

binding domain (usually GAL4 or LexA) while the second protein (prey) is fused with a

transcriptional activation domain for a transcription factor that can activate expression of a

reporter gene (such as beta-galactosidase). Both chimeras are then expressed in a yeast cell and if

they interact, their interaction prompts expression of the reporter gene. The two important

properties of this experimental technique that one has to keep in mind are: (i) it detects binary

interactions only and thus might miss interacting proteins that require additional proteins, such as

scaffold proteins or other members of a protein complex, to facilitate the interaction, (ii) it

uncovers the potential for interaction – that is, whether or not the two proteins actually interact

in a cell depends upon spatial, temporal, and contextual constraints such as cell cycle phase,

stress, presence (or lack) of a particular nutrient, etc. Currently, high-throughput Y2H

interaction networks (maps) are available for many organisms including Saccharomyces

cerevisiae [6-9], C. elegans [10, 11], Drosophila melanogaster [12], and humans [13, 14].

The second key technology used to obtain high-throughput interaction maps is co-complex

identification (CoIP/MS), including tandem affinity purification (TAP) followed by mass

spectrometry (TAP/MS) [15-17]. In contrast to the Y2H procedure, this approach reveals one-to-

many interactions in a particular experimental condition. Specifically, a bait protein is tested for

interaction with all other proteins (preys) expressed in the given condition. This is done by

allowing complex formation of the bait protein with other proteins in the cell, retrieving and

purifying the corresponding complexes, and identifying co-complexed proteins with mass

spectrometry. Retrieval of the corresponding complexes requires antibodies for the bait protein

or tagging the bait protein with a peptide for which antibodies are available. Consequently, one

should keep in mind that (i) since the experiment recovers whole complexes, the bait protein

does not necessarily interact directly with all proteins in the complex, and (ii) transient interactions

are often difficult to capture using this approach. As with Y2H experiments, a number of high-

throughput co-complex interaction maps have been obtained [2, 3, 18-20].

The two experimental methods described above are, in many ways, complementary and have

unique strengths and limitations [21-23]. Therefore, the results of both types of experiments are

commonly combined into one protein-protein interaction network. While such a combined

network provides a more complete interactome, it is important to keep in mind that if treated

individually, the two types of experiments used to obtain such a network define two rather

different networks in terms of their biological and topological properties (e.g., [9, 24]).

3. Computational Prediction of Protein Interactions

Experimental procedures detecting protein-protein interactions are complemented by

computational approaches. Computational methods explore a variety of techniques which can

roughly be divided into three categories: evolution-based approaches, statistical methods, and

machine learning techniques. Evolution-based approaches for predicting protein interactions

typically explore the idea that interacting proteins are subject to common evolutionary

constraints. Such constraints can impact the spatial organization of interacting genes in the

genome, position in protein-protein interaction network, or the amino-acid sequence. While

some of these approaches are designed to predict physical interactions, many do not attempt to

distinguish physical and functional interactions, as both types of interactions might be subject to

similar evolutionary constraints.

3.1. Interaction prediction from the gene patterns across genomes.

3.1.1. Gene Fusion

The gene fusion method is an evolution-based approach for predicting physical interactions. The

main idea follows from the observation that if a pair of proteins, A and B, present in one

organism are in another organism fused together into a single protein, then these two proteins are

likely to interact (Figure 2a) [25, 26]. A natural explanation for this observation is that if A and

B interact then bringing A and B together in the fused protein facilitates interaction between

them. Marcotte et al. coined the term “the Rosetta Stone protein” for the fused protein. Using

this approach, they identified 6,809 such putative protein-protein interactions in Escherichia coli

and 45,502 in yeast (Figure 2a). In a larger study, Enright and Ouzounis uncovered 7,224

component and 2,365 unique composite proteins across 24 species [27].

Figure 2. Prediction of protein interaction from gene patterns in the genomes: a) the “Rosetta stone” method predicts interaction between proteins P1 and P2 based on the fact that they are fused together into one protein in genome 3; b) the gene order method predicts interaction between P1 and P3 based on conservation of their order; c) the phylogenetic profile method predicts (functional) interaction based on similar patterns of presence and absence in genomes G1-G5.

3.1.2. Gene Order

Whole genome sequencing of large numbers of genomes allows us to observe patterns of

genome organization and evolution [28]. In the context of bacterial and archaeal genomes, it has

been observed that, while in general the ordering of genes along genomes is not well conserved

between species, the order of genes encoding interacting proteins tends to be conserved [29, 30].

Conversely, proteins encoded by gene pairs with conserved gene order often interact physically.

Based on this observation, Dandekar et al. proposed a method for predicting interacting proteins

[30] as proteins encoded by genes with a conserved gene order (Figure 2b). Note that

this so-called gene order method and its variants [31-34] are most suitable in the context

of bacterial and archaeal genomes, where groups of genes are organized into operon structures.

3.1.3. Phylogenetic profiling

The phylogenetic profiling method is based on the premise that functional interactions are conserved

across a range of species. Consistent with this assumption, there should be a correlation

between the patterns of presence and absence of functionally interacting genes in various genomes.

Such a presence/absence pattern is referred to as the phylogenetic profile of a gene [35-38]. More

formally, the phylogenetic profile of a gene within a set of n reference genomes is a vector of

length n where the i-th element of the vector is set to one if the given gene is present in the i-th genome

and zero otherwise. Similarity between phylogenetic profiles can be measured using metrics such

as Hamming distance, correlation coefficient, or mutual information. Presence/absence of a gene

within a genome can also be quantified using probability scores instead of binary values.

Similarities between phylogenetic profiles can be used to predict functional linkage between

proteins (Figure 2c) [35, 36]. Grouping genes with common evolutionary patterns also makes it possible to

predict functional associations for genes with unknown function and to discover previously

uncharacterized cellular pathways and functional network modules [39-54]. It has been

demonstrated that the reliability of the phylogenetic profile method depends on the selection of the

reference genomes [53, 55].
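
As an illustration of the profile comparison described above, the following minimal Python sketch computes the Hamming distance and the mutual information between binary phylogenetic profiles. The profiles, protein names, and function names are hypothetical and chosen only for illustration; a real analysis would derive the profiles from homology searches against the chosen reference genomes.

```python
from math import log2


def hamming_distance(p, q):
    """Number of reference genomes in which two binary profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same reference genomes")
    return sum(a != b for a, b in zip(p, q))


def mutual_information(p, q):
    """Mutual information (in bits) between two binary presence/absence profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same reference genomes")
    n = len(p)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            joint = sum(1 for x, y in zip(p, q) if x == a and y == b) / n
            pa = sum(1 for x in p if x == a) / n
            pb = sum(1 for y in q if y == b) / n
            if joint > 0:  # terms with zero joint probability contribute nothing
                mi += joint * log2(joint / (pa * pb))
    return mi


# Hypothetical profiles over five reference genomes G1..G5 (1 = gene present).
profiles = {
    "P1": [1, 0, 1, 1, 0],
    "P2": [1, 0, 1, 1, 0],
    "P3": [0, 1, 1, 0, 1],
    "P4": [1, 1, 1, 1, 1],  # present in every genome
}

for name in ("P2", "P3", "P4"):
    print(name,
          hamming_distance(profiles["P1"], profiles[name]),
          round(mutual_information(profiles["P1"], profiles[name]), 3))
```

In this toy example P1 and P2 have identical profiles (Hamming distance 0), whereas P4, which is present in every genome, has zero mutual information with any other profile.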

Bowers et al. extended the phylogenetic profiling method so that it considers three proteins at a

time. They searched for instances in which the combined logical patterns embodied by two

proteins determined the behavior of a third [56].

Phylogenetic profiling predicts functional associations rather than physical interactions. One

important drawback of this method is that it cannot be used to predict interaction between

proteins which are present in all (or nearly all) reference genomes.

3.2. Predicting interaction from sequence co-evolution

So far we have looked at approaches utilizing patterns of gene organization within genomes. The

mirror tree method, discussed in this section, takes the next step and zooms in on the details of

sequence features. This approach is based on the premise that sequences of interacting proteins

are expected to co-evolve to maintain their interactions [57-60]. Motivated by this supposition,

the mirror tree method predicts protein-protein interactions by assessing the extent of agreement

between evolutionary trees, or more precisely, distance matrices used to infer such trees as

illustrated in Figure 3 [57, 59-67].

Figure 3. The basic variant of the mirror tree method. The co-evolution between proteins A and B is assessed by comparing evolutionary rates of a family of sequences orthologous to A and the corresponding family of sequences orthologous to B. Subsequent variants of the method attempt to account for the common speciation history (illustrated by the evolutionary tree on the right) of both families.

In its simplest form, the mirror tree method assesses the co-evolution of two proteins by the

correlation between the distance matrices constructed individually for the sets of sequences

orthologous to each of the two proteins. That is, given two proteins A and B, it considers n

sequences orthologous with A and the same number of sequences orthologous with B coming

from the same set of species. Subsequently, for each pair (i,j) of orthologs of A (respectively

B), it estimates the evolutionary distance A(i,j) (respectively B(i,j) ) between them. The degree

of co-evolution between the two families of orthologs is then assessed by computing the

correlation coefficient between the distance matrices (Figure 3).
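
The basic mirror tree score described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the two distance matrices are symmetric and that their rows and columns are ordered by the same set of species; the matrices and function names are hypothetical, and no correction for the shared speciation history is attempted.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def upper_triangle(matrix):
    """Entries above the diagonal; the matrix is assumed symmetric."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mirror_tree_score(dist_a, dist_b):
    """Correlation between the ortholog distance matrices of families A and B."""
    if len(dist_a) != len(dist_b):
        raise ValueError("both families must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical evolutionary distances among orthologs from the same four species.
dist_a = [[0.0, 0.1, 0.4, 0.5],
          [0.1, 0.0, 0.3, 0.6],
          [0.4, 0.3, 0.0, 0.2],
          [0.5, 0.6, 0.2, 0.0]]
dist_b = [[0.0, 0.2, 0.7, 0.9],
          [0.2, 0.0, 0.6, 1.1],
          [0.7, 0.6, 0.0, 0.5],
          [0.9, 1.1, 0.5, 0.0]]

print(mirror_tree_score(dist_a, dist_b))
```

Because each matrix is symmetric with a zero diagonal, only the entries above the diagonal enter the correlation.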

As described above, the mirror tree method measures the correlation between rates of evolutionary

changes (rates of divergence). There are two reasons for which such correlation might occur:

common speciation history and common evolutionary constraints imposed by physical and/or

functional interaction. Thus one of the challenges related to the mirror tree approach is to

separate these two sources of correlation. Recently, several methods have been developed to

address this problem and have been shown to improve interaction prediction accuracy [64, 65,

68]. The approaches of Pazos et al. and Sato et al. subtract the estimated organism divergence

rates from the combined co-evolution signal, while Kann et al. show that the performance of the

mirror tree method can be further improved by restricting the co-evolution analysis to the more

conserved regions in the protein domain sequences and disregarding highly divergent regions.

Such regions are likely to have diverged by neutral evolution and are not expected to contain a

functional co-evolution signal.

It is natural to hypothesize that the driving mechanism of sequence co-evolution is

compensating mutations, where mutations in one binding partner are compensated by

complementary mutations in the other partner to maintain amino acid interactions. However, it has

been shown that compensating mutations are not the only, and perhaps not even the dominant,

contributor to the correlation of evolutionary rates [69, 70]. Instead, the co-evolution signal is

more likely a composite of many other factors such as co-expression of interacting proteins, similar

codon usage, interaction with other proteins in a complex, etc. This suggests that it should be

meaningful to consider co-evolution (or co-adaptation) of proteins in a broader network-level

context. This idea has been explored by Juan et al. who replaced the vector of evolutionary

distances by the vector of co-evolutionary correlations between all proteins within a genome

and, in this way, obtained a reliable interaction network of Escherichia coli [71].

There are several other variants of the mirror tree method. For example, Tillier and Charlebois

replaced simple matrix correlation by a more sophisticated search for the most similar common

subtree, which can also include paralogs [72]. A variant of the mirror tree approach has been used

to predict interaction specificity. Namely, given two families of proteins, which are known to

interact, the objective is to establish a mapping defining interaction partners between the

members of one family with the members of the other family [62, 73, 74]. Finally, Jothi et al.

used the mirror tree method to identify interacting domains within interacting proteins [61].

3.3. Domain interactions

A large fraction of prokaryotic and most eukaryotic proteins are composed of more than one

domain [75]. Protein interaction typically involves binding between two or more specific

domains. Indeed, the domain composition of two proteins can be used for predicting interaction

between them [76-84]. Conversely, the knowledge of a protein interaction network can be used

to infer interacting domains [85-90].

The idea of using a protein-protein interaction network to predict domain-domain interactions was

first explored by Sprinzak and Margalit who proposed a simple statistical approach, referred to

as the Association Method [82]. The idea behind this approach is to score each domain pair by

the log ratio of the frequency of occurrences in interacting proteins to the expected frequency of

independent occurrences of these two domains [82]. That is, if $P_i$ is the observed frequency of

domain i in the interaction network and $P_{ij}$ is the observed frequency of domain pair (i, j) in

interacting protein pairs, then

\[ \mathrm{Association\_Score}(i, j) = \log \frac{P_{ij}}{P_i P_j}. \]

Interacting domain pairs are then

predicted as the pairs with high association score. Following the work of Sprinzak and Margalit,

several related methods have been proposed (reviewed in [86]). For example, Deng et al.

developed a maximum likelihood approach to estimate the probability of domain-domain

interactions [81]. The main idea is to estimate, for each domain pair, the probability of

interaction between domains so that the likelihood of the interaction network is maximized. An

elegant feature of this approach is an explicit modeling of errors in the high-throughput data that

constitutes the protein interaction network.
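
To make the association score of Sprinzak and Margalit concrete, here is a minimal Python sketch. It rests on stated assumptions rather than on the published implementation: the frequency of a domain is taken as the fraction of proteins in the network that contain it, the frequency of a domain pair is the fraction of interacting protein pairs in which the two domains occur on opposite partners, and the toy proteins, domains, and the function name `association_scores` are hypothetical.

```python
from itertools import product
from math import log


def association_scores(domains, interactions):
    """Score each domain pair by log(P_ij / (P_i * P_j)).

    domains:      dict mapping protein name -> set of domain names
    interactions: list of (protein, protein) pairs
    """
    proteins = {p for pair in interactions for p in pair}
    n_prot = len(proteins)

    # P_i: fraction of network proteins that contain domain i (an assumption).
    dom_count = {}
    for p in proteins:
        for d in domains[p]:
            dom_count[d] = dom_count.get(d, 0) + 1

    # P_ij: fraction of interacting protein pairs with i on one side, j on the other.
    pair_count = {}
    for a, b in interactions:
        seen = {tuple(sorted(dp)) for dp in product(domains[a], domains[b])}
        for key in seen:
            pair_count[key] = pair_count.get(key, 0) + 1

    n_int = len(interactions)
    scores = {}
    for (i, j), count in pair_count.items():
        p_ij = count / n_int
        p_i = dom_count[i] / n_prot
        p_j = dom_count[j] / n_prot
        scores[(i, j)] = log(p_ij / (p_i * p_j))
    return scores


# Hypothetical toy network: six proteins, three interactions.
domains = {"A": {"d1"}, "B": {"d2"}, "C": {"d1", "d3"},
           "D": {"d2"}, "E": {"d3"}, "F": {"d4"}}
interactions = [("A", "B"), ("C", "D"), ("E", "F")]

scores = association_scores(domains, interactions)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Domain pairs that never co-occur in an interacting protein pair receive no score here; a fuller treatment would also have to decide how to handle such pairs and how to account for noise in the interaction data.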

These early domain interaction prediction methods were successful in uncovering domains that

interact in a constitutive way; however, they were challenged in situations where a domain pair (i,

j) interacts in the context of some protein pairs but at the same time there are also many proteins

containing i and j, respectively, that do not interact. To be able to discover such context-specific

interactions, Riley et al. introduced Domain Pair Exclusion Analysis (DPEA) [91]. This

approach is a clever utilization of the maximum likelihood approach. Namely, the likelihood

score of a network can be viewed as a measure of how well the probabilities assigned to putative

domain interactions explain the network. Thus, if domain pair (i, j) mediates some protein-

protein interactions in a context-specific way, then excluding such a domain pair as a possible

interacting pair should decrease the likelihood score of the network. Consistent with this

premise, DPEA predicts interaction between domains by measuring the drop in the likelihood

score when a particular domain pair is not allowed to interact. This approach was able to detect

interacting domain pairs missed by the previous approaches. More recently, Wang et al. [92]

improved this idea further and suggested a scoring method that accounts more fully for the

context in which the interaction occurs. To achieve this, rather than globally disallowing all

interactions between two specific domains as proposed by Riley et al., they disallow only

interactions in the context of a specific pair of interacting proteins.

The idea of recovering interacting domains by examining how well the potential domain contacts

explain the protein interaction network was also the basis of the method proposed by Guimarães

and colleagues [85]. Building on the assumption that protein interactions evolved in the most

parsimonious way, they proposed the Parsimonious Explanation method, which finds the smallest

weighted set of domain interactions that can “explain” the protein interaction network. In other

words, the method finds the smallest set of domain pairs such that, if one considers them interacting,

each interacting protein pair contains at least one interacting domain pair. This model is

formalized as an optimization problem and solved with a linear programming procedure. The

variables of the linear program represent the potential domain contacts derived from the protein

interaction network, and the constraints are defined by the protein-protein interactions (edges). The

construction is illustrated in Figure 4.

Figure 4. Domain-domain interaction inference from protein interaction networks via a linear programming construction. In this toy network the rectangles correspond to proteins and the colored squares to domains. For each domain pair (i, j) there is a variable $x_{ij}$ taking real values between 0 and 1. For each pair (A, B) of interacting proteins A and B, there is one constraint $\sum_{i \in A,\, j \in B} x_{ij} \geq 1$, where the summation is over all domain pairs (i, j) such that i is a domain of protein A and j is a domain of protein B. Thus the constraint for each interacting protein pair enforces that the values of the variables representing the potentially interacting domain pairs add up to at least 1. Following the parsimony principle, the objective function minimizes the overall sum of the variables.
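
A small worked instance may help to see how the linear program behaves. The network below is hypothetical (it is not the one drawn in Figure 4) and uses the unweighted objective described in the caption: protein A has domains a and b, protein B has domain c, protein C has domain b, and the observed protein interactions are (A, B) and (C, B).

```latex
% Hypothetical toy instance of the parsimony linear program.
% Variables: x_{ac} and x_{bc}, one per potentially interacting domain pair.
\begin{align*}
\text{minimize}\quad   & x_{ac} + x_{bc} \\
\text{subject to}\quad & x_{ac} + x_{bc} \ge 1 && \text{(interaction } A\text{--}B\text{)} \\
                       & x_{bc} \ge 1          && \text{(interaction } C\text{--}B\text{)} \\
                       & 0 \le x_{ac},\ x_{bc} \le 1
\end{align*}
% Optimum: x_{bc} = 1, x_{ac} = 0, with objective value 1.
```

The second constraint forces $x_{bc} = 1$, which already satisfies the first constraint, so the minimization sets $x_{ac} = 0$: the single domain pair (b, c) explains both protein interactions.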

The methods described so far used only the protein

interaction network and the domain composition of each protein to predict interacting domains. This basic information

can be enriched in many ways [92, 93]. For example, by analogy to the Rosetta Stone approach

(Section 3.1.1), it has been observed that two domains that co-occur in one protein chain are also

likely to mediate interactions between different proteins. Incorporating such additional

information has the potential of improving domain interaction prediction [89, 94, 95].

A formidable challenge faced by the methods for predicting interacting domains is evaluation of

the method’s quality. Typically such evaluation is based on the knowledge of domain

interactions derived from crystal structures. Unfortunately, this data is biased towards particular

types of interactions, so our ability to test these methods in full is limited [95].

3.4. Co-expression networks

Starting from the assumption that proteins from the same protein complex are likely to be co-

expressed, expression data has been utilized to predict new and validate known protein-protein

interactions [96-102]. Co-expression networks are usually inferred by computing the Pearson

correlation coefficients or mutual information between every pair of gene expression profiles

across a variety of experimental conditions. It has been shown that genes with similar expression

patterns across a set of samples tend to be functionally related [103]. Therefore, co-expression

data is frequently combined with other types of data to predict protein-protein interactions and

build functional networks. Such networks built using expression information have been

constructed for a variety of organisms [96-98, 104-108].
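
The construction of a co-expression network from pairwise Pearson correlations can be sketched as follows. This is a minimal illustration only: the expression values, gene names, and the cutoff of 0.8 are hypothetical, and in practice the significance of each correlation would be assessed statistically rather than with a fixed threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant profile."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def coexpression_edges(expression, threshold=0.8):
    """Link every gene pair whose profiles correlate at or above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles measured across five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}

for g1, g2, r in coexpression_edges(expression):
    print(g1, g2, round(r, 3))
```

Only positively correlated pairs are linked in this sketch; whether strongly anti-correlated pairs should also be linked is a modeling choice that depends on the application.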

Expression profiles are often used to infer regulatory networks. The underlying assumption

explored in gene regulatory network reconstruction programs such as ARACNE [109] is that

a change in the expression of a transcription factor should be mirrored in a change in the

expression of the genes regulated by that transcription factor. Co-expression alone does not

provide information on the direction of a regulatory relationship. However, expression data can

also be used to construct Bayesian networks, which can represent conditional dependence of

expression levels (for a primer on Bayesian network analysis utilizing expression data see [110];

for a recent review see [111]). A related approach to orient the edges was proposed by Schadt et

al. [112, 113].

4. Exploring the topology of the interactome

Graph theory provides a unifying language to describe relations within complex systems and has

played an increasingly important role in understanding biological systems. It enables the use of

graph-theoretical tools and concepts to interrogate properties of interaction networks. Currently, a

number of packages for visualizing, modeling, and analyzing various types of networks are

available, including the popular Cytoscape package [1, 114].

4.1 Global properties

Studies of protein interaction networks from several model organisms revealed that such

networks often have interesting topological properties. One of the most celebrated properties is a

particular distribution of node degrees (the number of immediate network neighbors for a given

node). Specifically, it has been argued that for these networks the degree distribution is

consistent with a power law [115-117]. In other words, it has been suggested that interaction

networks are scale-free. Formally, in a scale-free network, the fraction P(k) of nodes in the

network having degree k is proportional to $k^{-\gamma}$, that is, $P(k) \sim k^{-\gamma}$, where γ is a constant whose

value is typically in the range 2 < γ < 3. While the accuracy of the supposition that protein

interaction networks are scale-free has been questioned [118-120], the main property of protein

interaction networks consistent with a scale-free network, namely a small number of highly connected

nodes (hubs) and a large number of weakly connected nodes, is generally not disputed.
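
The empirical degree distribution P(k) that is compared against a power law can be computed directly from an edge list. The following minimal Python sketch assumes a simple undirected network (no self-loops or duplicate edges) and ignores isolated nodes, since they do not appear in an edge list; the toy edges and the function name are hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes having degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {k: c / n_nodes for k, c in sorted(counts.items())}


# Hypothetical toy network: one hub "h" linked to four proteins, plus one extra edge.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]

for k, p in degree_distribution(edges).items():
    print(k, p)
```

For a real interactome, one would typically plot P(k) against k on log-log axes, where a power law appears as an approximately straight line with slope −γ.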

4.2. Network centrality and protein essentiality

Network centrality is a measure of the topological prominence of a node within a network. Is

network centrality related to protein function? There are several ways to measure node

centrality emphasizing different aspects of network topology [121]. In the context of biological

networks, the most studied centrality indexes are degree centrality and betweenness centrality;

however, many other indexes have also been considered [24, 122, 123]. Degree centrality

evaluates the node’s centrality by the number of its immediate neighbors in the network (Figure

5). The nodes with high degree are referred to as hubs. In contrast, in the shortest-path

betweenness centrality, the node’s centrality value is proportional to the fraction of shortest paths

between all pairs of nodes that go through a given node (Figure 5). Thus betweenness centrality

measures whether a node might be central for the information flow within the network.

Alternatively, one can model the interactome as an electric circuit and use current flow to model

propagation of information between all pairs of nodes in such a network and define current flow

centrality as the total amount of current going through a given node [24, 124-128].

Figure 5. Illustration of centrality measures. Vertices a, a’, b, b’ tie for the highest degree centrality of 4. The orange vertices c and c’ have the highest betweenness centrality since every shortest path between the green component and the blue component has to pass through these vertices.
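
The two centrality measures discussed above can be sketched in plain Python. The betweenness computation below follows the standard breadth-first accumulation scheme of Brandes for unweighted, undirected graphs and returns unnormalized values (each unordered pair of nodes is counted once). The toy graph, two triangles joined by a single edge, is hypothetical and is not the graph of Figure 5.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Number of immediate neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized shortest-path betweenness for an unweighted undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy graph: triangles {a, b, c} and {d, e, f} joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = build_adjacency(edges)
print(degree_centrality(adj))
print(betweenness_centrality(adj))
```

In this toy graph the bridging nodes c and d have both the highest degree and the highest betweenness, since every shortest path between the two triangles passes through them; in larger networks the two measures can rank nodes quite differently.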

In their influential paper, Jeong et al. [115] demonstrated that in a protein interaction network of

Saccharomyces cerevisiae, high-degree nodes contain more essential proteins than would be

expected by chance. (In that study a gene was called essential if its deletion prevented yeast

growth in optimal laboratory conditions). Similarly, nodes with high betweenness centrality

(bottlenecks) have been correlated with gene essentiality [129]. The correlation between vertex

degree and essentiality has been studied extensively [24, 130-132]. Interestingly, this relation

has not been as clearly observed in Y2H networks [9, 24, 131].

It has been proposed that enrichment of essential proteins among high-degree nodes is an

implication of the central role hubs play in mediating interactions among other proteins. Indeed,

the removal of hubs disrupts the connectivity of the network, as measured by the network

diameter or the size of the largest connected component, more than the removal of an equivalent

number of random nodes [115, 133, 134]. However, the lack of a clear relation between

essentiality and vertex degree in the Y2H-derived networks, as well as computational evidence

[24], suggests that the correlation between vertex degree and essentiality is related to the enrichment in essential

proteins of large dense subnetworks, typically corresponding to complexes. This observation was

further confirmed by Wang et al., who additionally showed that the enrichment of complexes

with essential proteins increases with complex size [135]. In contrast, in yeast binary interactions

(Y2H), Yu et al. uncovered a relation between the essentiality of a protein and the number of cellular

processes in which it participates.

Combining vertex degree information with expression data, Han et al. distinguished two groups of

hubs called ‘party’ and ‘date’ hubs [136]. For a party hub, the expression of the hub node is

correlated with that of its neighbors, suggesting that all of the interactions may take place

simultaneously or under similar conditions. In contrast, for date hubs, the correlation between the

expression of a hub and its neighbors is, on average, low. Thus these two types of hubs are

proposed to play distinct roles in the interactome: party hubs are members of protein complexes

or functional modules, while date hubs correspond to global regulators possibly linking

various functional modules.

4.3. Network modules

In their landmark paper, Hartwell et al. proposed functional modules as a critical level of

biological organization [137]. They defined a functional module as an entity, composed of many

types of interacting molecules, whose function is separable from those of other modules. Currently,

modularity of biological systems is a widely accepted phenomenon. Indeed, by analyzing an

early yeast protein-protein interaction network, Schwikowski et al. observed that proteins of

known function and cellular location tend to cluster together. Genome-scale reconstructions of

biological networks that are enabled by current technologies provide context for identifying such

modules. However there is no unique way to mathematically define functional modules.

Computationally, most methods search for densely connected subgraphs or clusters using a

variety of heuristics ranging from growing modules from seed clusters, clustering based on

graph-theoretical distance measures, to Monte Carlo Markov Clustering (MCMC) approaches

[138-155]. Additionally, gene expression information can be utilized to obtain more reliable

modules [156-158].

A characteristic property that one naturally expects modules to satisfy is that molecules within

the module are more strongly connected among themselves than they are to molecules outside

the module. This intuition is formalized by the following

concept of modularity. Given a partition of the nodes of a network with m edges (links) into groups

C1, C2, …, Cr, the modularity of such a partition can be defined as

$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) C_{ij}$$

where Aij equals one if i and j are connected in the network and zero otherwise, ki is the degree of node i, and Cij equals one

if i and j are in the same group and zero otherwise. So defined, Q takes values between -1 and 1,

where a positive value of Q indicates that the number of edges within groups is higher than expected

by chance. Following this (or a related) definition, some methods identify modules as partitions

into “communities” that maximize modularity [151, 159].
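The modularity Q of a given partition can be evaluated directly from its definition. The sketch below is a minimal illustration for an undirected, unweighted network without self-loops; the function name `modularity`, the toy network, and the interpretation of ki as the degree of node i are assumptions made for this example. It evaluates a partition and does not search for the partition that maximizes Q.

```python
def modularity(edges, group_of):
    """Evaluate Q = (1/2m) * sum_{i,j} (A_ij - k_i*k_j/(2m)) * C_ij.

    edges    -- iterable of (u, v) pairs of an undirected network, no self-loops
    group_of -- dict mapping each node to its group label
    """
    edge_set = {frozenset(e) for e in edges}   # removes duplicate edges
    m = len(edge_set)
    degree = {}
    for e in edge_set:
        for node in e:
            degree[node] = degree.get(node, 0) + 1
    nodes = list(degree)
    total = 0.0
    for i in nodes:
        for j in nodes:
            if group_of[i] != group_of[j]:      # C_ij = 0
                continue
            a_ij = 1 if frozenset((i, j)) in edge_set else 0
            total += a_ij - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)

# Hypothetical toy network: two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_module = {n: 1 for n in two_modules}

q_two = modularity(edges, two_modules)   # 5/14, about 0.357
q_one = modularity(edges, one_module)    # 0 when all nodes share one group
```

Splitting the toy network into its two triangles gives a positive Q, while placing all nodes in a single group gives Q equal to zero, in line with the interpretation of positive values given above.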

Most module-finding algorithms assign each node to at most one module. However, in

practice, biological modules can overlap and/or form modular hierarchies. Furthermore, a given

component may belong to different modules at different times. Thus some of the more recent

approaches focus on identifying overlapping modules [140, 160], their hierarchy [161], or

dynamics along activity pathways [162].

Given the wealth and diversity of module finding algorithms, it is important to provide some

means of establishing the biological relevance of uncovered modules. The most commonly

applied strategy is to evaluate how well various methods perform in uncovering known

complexes [163], potentially overlapping functional modules [164], or how well they are

conserved through evolution [165]. In a recent evaluation of module finding algorithms, Song

and Singh observed that the performances of various algorithms in uncovering functional

modules can differ substantially when run on the same network and that their relative

performances change depending upon the topological characteristics of the network under

consideration, indicating that there is currently no single best approach to this problem.

4.4. Network motifs and related concepts

In section 4.1 we observed that vertices in protein-protein interaction networks tend to have a

characteristic vertex degree distribution manifested by a small number of high degree nodes and

a large number of nodes with a very small degree. As a natural extension of node degree we can

consider the distribution of small subnetworks such as triangles, squares, etc. Along this line,

Milo et al. defined network motifs as subgraphs that occur in a network much more often than

expected by chance [166]. The “by chance” occurrence in a given type of network is usually

estimated by constructing a set of random networks with the same basic properties as the tested

network - for example the same degree distribution. In their landmark paper, Milo et al.

discovered that various networks are characterized by overrepresentation of certain network

motifs. Focusing on directed networks, Milo et al. showed, among other findings,

overrepresentation of feed-forward loop and bi-fan

motifs within gene regulatory networks (Figure 6a). Network motifs have been demonstrated to support specific regulatory

functions [167-171]. In the context of undirected protein-protein interaction networks, Przulj et

al. studied the distribution of graphlets, which are small induced subgraphs [172]. As a somewhat related

concept but taking advantage of protein functional annotations, Banks et al. introduced network

schemas to describe patterns of labeled subgraphs [173]. A network schema consists of

descriptions of proteins (e.g. their molecular functions or putative domains) along with the

desired topology and types of interactions (e.g. physical, phosphorylation or regulatory) (Figure

6b). In addition to searching for matches to particular network schemas, it is also possible to

infer which network schemas are frequent and over-represented in networks [174] and thereby to

uncover general recurring patterns underlying a range of biological processes.
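The motif-detection idea described above, comparing subgraph counts in the real network with counts in randomized networks of the same degree distribution, can be sketched as follows. This is a simplified illustration for one motif (the feed-forward loop) in a directed network; the function names, the toy network, and the edge-swap randomization are assumptions made for this example and do not reproduce the exact procedure of Milo et al.

```python
import random

def count_feed_forward_loops(edges):
    """Count triples (x, y, z) with edges x->y, y->z and x->z (no self-loops)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                if z != x and z in targets:
                    count += 1
    return count

def randomize_preserving_degrees(edges, n_swaps, rng):
    """Shuffle a directed network by swapping edge targets.

    Each accepted swap turns a->b, c->d into a->d, c->b, so every node keeps
    its in-degree and out-degree. Swaps creating self-loops or duplicate
    edges are skipped.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Hypothetical toy regulatory network containing two feed-forward loops.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("A", "D"), ("D", "E"), ("A", "E"),
         ("C", "F"), ("E", "F")]

observed = count_feed_forward_loops(edges)

rng = random.Random(0)
random_counts = [
    count_feed_forward_loops(randomize_preserving_degrees(edges, 10 * len(edges), rng))
    for _ in range(1000)
]
# Fraction of randomized networks with at least as many loops as observed.
p_empirical = sum(c >= observed for c in random_counts) / len(random_counts)
```

A network this small is not expected to yield a statistically meaningful result; the sketch only shows the structure of the comparison. A subgraph would be called a motif when the observed count is rarely reached in the randomized ensemble.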

Figure 6. a) Examples of network motifs in regulatory networks: feed-forward and bi-fan motifs. The

arrows indicate the direction of regulation. b) Example of a network schema associated with signaling: the

nodes are labeled with specific feature descriptions, such as GTPase, protein kinase, etc., and the edges are

labeled with the interaction type.

5. Comparing protein-protein interaction networks

The basis of interaction prediction methods discussed in section 3.3 was the assumption that if

two proteins interact in one organism then the orthologs of these two proteins in another organism are very

likely to interact as well. Walhout et al. coined the term interologs to describe such orthologous pairs

of interacting proteins [175]. Indeed, it has been shown that a protein interaction map generated

in one species can be used to predict interactions in another species [175-177].
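The interolog idea can be expressed as a simple mapping step. The sketch below is a minimal illustration, assuming that an ortholog table between the two species is already available; the function name `predict_interologs` and all protein identifiers are hypothetical.

```python
def predict_interologs(source_interactions, orthologs):
    """Transfer interactions from a source species to a target species.

    source_interactions -- iterable of (p, q) interacting pairs in the source
    orthologs           -- dict mapping a source protein to a list of its
                           orthologs in the target species
    Returns a set of unordered target-species pairs (as frozensets).
    """
    predicted = set()
    for p, q in source_interactions:
        for p2 in orthologs.get(p, ()):
            for q2 in orthologs.get(q, ()):
                predicted.add(frozenset((p2, q2)))
    return predicted

# Hypothetical yeast interactions and yeast-to-human ortholog table.
yeast_interactions = [("yA", "yB"), ("yB", "yC")]
yeast_to_human = {"yA": ["hA"], "yB": ["hB1", "hB2"]}  # yC has no ortholog

human_predictions = predict_interologs(yeast_interactions, yeast_to_human)
```

In this toy example the interaction between `yA` and `yB` yields two predicted human interactions, one for each ortholog of `yB`, while the interaction involving `yC` cannot be transferred. Such predictions remain hypotheses that require experimental confirmation.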

The idea of transferring interaction annotation between organisms can be extended further by

comparing whole interaction networks between different organisms. Such network comparison

allows addressing a number of fundamental biological questions related to the evolution of

protein interaction networks and predicting new functional information about proteins and

interactions that are poorly characterized [178]. Just as sequence alignment is the cornerstone of

sequence comparison, network comparison demands methods for alignment of biological

networks. Indeed, aligning interaction networks from different organisms allows one to discover

evolutionarily conserved pathways and functional orthologs [179-183].

From the algorithmic-theoretical perspective, network alignment is a difficult problem and, in

the most general formulation, reduces to finding a maximal common subnetwork of two (or

more) networks, a classical problem in graph theory known to be NP-complete [184]. Thus it is

unlikely that a fast algorithm solving the problem in full generality exists. However, over the

years, a number of insightful and efficient algorithms have been proposed that take advantage of

various specific properties of biomolecular networks [181, 185-192]. These algorithms allow

for global network alignment, local alignments, as well as finding alignments of a subnetwork

within a larger network.

With these tools at hand one can search for conserved network regions such as conserved protein

complexes and pathways, identify proteins that, despite a lack of sequence similarity, perform the

same function in the network, or even use network similarities to infer evolutionary trees [193,

194].

6. Databases of protein and domain interactions

As the data about protein and domain interactions has been steadily increasing, various databases

and public repositories have been constructed to share the knowledge and to support subsequent

studies. Recent reviews of these databases can be found in [195-197]. In this section, we provide

representative examples of such databases. The Database of Interacting Proteins (DIP) catalogs

experimentally determined protein-protein interactions which are obtained from various

resources including the literature, the Protein Data Bank (PDB), and high-throughput

experiments [198, 199]. IntAct [200] and BioGRID [201] contain not only protein-protein

interactions but also other types of interactions such as protein-small molecule interactions and

genetic interactions. MINT [202] annotates each interaction with a score which ranges from 0 to

1 for quantifying interaction support. Since none of the current databases can provide

complete information about interactions for all species, some groups have tried to extract and unify

interaction data from different repositories. APID [203] and PINA [204] are two representatives

of such meta-databases. Furthermore, some databases (e.g., STRING [54, 205, 206] and I2D

[207]) include protein-protein interactions predicted by computational approaches. Finally, some

dedicated protein interaction data focusing on specific model organisms are incorporated as a

part of organism-specific resources such as FlyBase [208] for D. melanogaster and SGD

[209] for yeast.

Analysis of domain interactions often provides important insights into the role and the

mechanism of an interaction. The databases 3did [210], iPfam [211], and PIBASE [212]

allow for exploring the details of domain interactions by studying three-dimensional structures

of interacting domains extracted from the Protein Data Bank [213]. The Conserved Binding

Mode (CBM) database [214] categorizes interacting domains by the Conserved Domain

Database (CDD) family type and interaction mode. DOMINE contains both known and predicted

domain interactions obtained from two 3D-structure based databases (iPfam and 3did) and eight

different computational approaches [215].

7. Applications

Knowledge of protein interactions can provide important clues about the functioning of cells and

organisms. In this section, we provide several examples of how interaction networks can be

explored to empower biomedical research. Obviously the examples presented below are only a

small sampling of the diverse applications of biomolecular networks.

7.1. Predicting protein function

The observation that the majority of interactions occur between proteins with common functional

assignment [216] paved the way to several approaches to predict protein function based on the

protein’s location in the network in relation to functionally annotated proteins. Sharan et al.

divide the emerging methods into two types: direct and module-assisted schemes [217]. In the

direct annotation schemes, individual links in the network are used for inferring functions of

proteins. In contrast, module-based methods first detect modules of interconnected proteins and

then assign protein functions based on functional annotation of other proteins in the module.

Thus the key step of module-based methods is utilization of module finding approaches

discussed in section 4.3. Here we focus on the direct annotation approaches. The pioneering

method of Schwikowski et al. predicts the biological process of a non-annotated protein by

considering its neighboring interactions and assigning to this protein the annotations that are

most frequent among the neighbors [216]. This strategy works very well for biological networks

with a high proportion of annotated proteins where un-annotated proteins have many annotated

neighboring proteins [218]. Hishigaki et al. further extended this neighbor-derived annotation

method: rather than considering only direct neighbors of a given protein, they consider all

proteins within a particular radius [219]. Subsequently, Nabieva et al. argued that in such

transfer of functional annotation from more distant neighbors, one should take into account not

only the distance to annotated proteins but also the topology of the network. This view is utilized

in their “functional flow” algorithm [218].
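The neighbor-based direct annotation schemes can be sketched in a few lines. The example below assigns to an un-annotated protein the annotations that are most frequent among proteins within a given radius; radius 1 corresponds to the direct-neighbor majority rule, and a larger radius mimics the extension to more distant proteins. Plain frequency counting is a simplification chosen for this illustration and is not claimed to be the scoring used in the cited methods; the function names, the toy network, and the annotations are hypothetical.

```python
from collections import Counter, deque

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def nodes_within_radius(adj, start, radius):
    """Return all nodes at graph distance 1..radius from `start` (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[start]
    return set(dist)

def predict_functions(adj, annotations, protein, radius=1, top=1):
    """Return the `top` most frequent annotations among nearby proteins."""
    votes = Counter()
    for nb in nodes_within_radius(adj, protein, radius):
        votes.update(annotations.get(nb, ()))
    return [function for function, _ in votes.most_common(top)]

# Hypothetical network: U is un-annotated.
adj = build_adjacency([("U", "A"), ("U", "B"), ("U", "C"), ("C", "D"), ("C", "E")])
annotations = {
    "A": {"transport"}, "B": {"transport"},
    "C": {"signaling"}, "D": {"signaling"}, "E": {"signaling"},
}

direct = predict_functions(adj, annotations, "U", radius=1)
extended = predict_functions(adj, annotations, "U", radius=2)
```

In this toy network the direct neighbors of `U` favor "transport", whereas including proteins at distance two shifts the prediction to "signaling". This sensitivity to the chosen radius is one reason why methods that also account for network topology, such as functional flow, were proposed.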

Integrating network topology with other types of data, such as gene expression profiles, domain

context, and text mining, can further inform functional annotation [217, 220, 221]. Kourmpetis et al.

proposed Markov random field analysis for integrating protein interaction networks with multiple

data sources [222].

7.2. Application to human diseases

It is now increasingly recognized that complex diseases should be studied from the perspective

of dys-regulated pathways and processes rather than individual genes. The availability of

genome scale protein-protein interaction and other interaction maps has made it possible to begin

such systems-level investigation of human diseases. Following this principle, in their elegant

work, Chuang et al. proposed a network-based method for classification of breast cancer

metastasis [223]. The main idea of this approach is to combine gene expression profiles with

network/pathway information and look for disease altered subnetworks. The underlying

assumption is that disease related perturbations, manifested by gene expression change,

propagate over the interaction network and lead to clusters of perturbed nodes. Importantly,

some abnormally expressed genes might differ between disease cases but many clusters are

expected to be common. Such a perspective has proven to be helpful in disease

classification [224, 225], identification of disease dys-regulated pathways [226, 227], and for

finding disease associated genes [228-234].

Recent studies have also begun to connect disease perturbed networks to genetic variations.

Integrating expression and genotypic data from an intercross population, Cheng et al. identified

a liver and adipose macrophage enriched sub-network that is associated with metabolic disease

traits [235]. More recently, Kim et al. proposed a novel computational method for genome-wide

identification of putatively causal gene copy number variations and affected pathways. Their method

utilizes gene expression profiles, copy number variation information, and diverse interaction

networks [125, 127, 128].

8. Looking ahead - towards the dynamic interactome

The availability of genome-scale interactomes has made it possible to ask general questions about

the organization of biological systems and information flow within such systems. In this chapter, we

have focused mostly on protein-protein interaction networks. However in practical applications,

including some of the applications discussed in this chapter, integrative networks combining

various types of interactions are increasingly being used. There are a growing number of tools

that allow interrogating existing networks to generate testable hypotheses. However, most of the

current studies treat molecular networks as hardwired structures. Yet real biological networks are

dynamic, and their functioning depends upon this. In a recent review, Przytycka et al. roughly

categorize interactome dynamics as spatial, temporal, and contextual [236]. One should

expect that after the initial decade of interactome studies, the next decade will constitute a shift

from static to dynamic network analysis, an essential paradigm change necessary for a deeper

understanding of molecular systems.

Acknowledgments

This work is supported by the Intramural Program of the National Institutes of Health, National

Library of Medicine.

References

1. Shannon, P., A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, D. Ramage, N. Amin, B. Schwikowski, and T. Ideker, Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 2003. 13(11): p. 2498-504.

2. Gavin, A.C., P. Aloy, P. Grandi, R. Krause, M. Boesche, M. Marzioch, C. Rau, L.J. Jensen, S. Bastuck, B. Dumpelfeld, A. Edelmann, M.A. Heurtier, V. Hoffman, C. Hoefert, K. Klein, M. Hudak, A.M. Michon, M. Schelder, M. Schirle, M. Remor, T. Rudi, S. Hooper, A. Bauer, T. Bouwmeester, G. Casari, G. Drewes, G. Neubauer, J.M. Rick, B. Kuster, P. Bork, R.B. Russell, and G. Superti-Furga, Proteome survey reveals modularity of the yeast cell machinery. Nature, 2006. 440(7084): p. 631-6.

3. Krogan, N.J., G. Cagney, H. Yu, G. Zhong, X. Guo, A. Ignatchenko, J. Li, S. Pu, N. Datta, and A.P. Tikuisis, Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 2006. 440(7084): p. 637 - 643.

4. Collins, S.R., P. Kemmeren, X.C. Zhao, J.F. Greenblatt, F. Spencer, F.C. Holstege, J.S. Weissman, and N.J. Krogan, Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics, 2007. 6(3): p. 439-50.

5. Fields, S. and O. Song, A novel genetic system to detect protein-protein interactions. Nature, 1989. 340(6230): p. 245-6.

6. Fromont-Racine, M., J.C. Rain, and P. Legrain, Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens. Nat Genet, 1997. 16(3): p. 277-82.

7. Ito, T., T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki, A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. U. S. A., 2001. 98(8): p. 4569-74.

8. Uetz, P., L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, and J.M. Rothberg, A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 2000. 403(6770): p. 623-7.

9. Yu, H., P. Braun, M.A. Yildirim, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li, N. Simonis, T. Hao, J.F. Rual, A. Dricot, A. Vazquez, R.R. Murray, C. Simon, L. Tardivo, S. Tam, N. Svrzikapa, C. Fan, A.S. de Smet, A. Motyl, M.E. Hudson, J. Park, X. Xin, M.E. Cusick, T. Moore, C. Boone, M. Snyder, F.P. Roth, A.L. Barabasi, J. Tavernier, D.E. Hill, and M. Vidal, High-quality binary protein interaction map of the yeast interactome network. Science, 2008. 322(5898): p. 104-10.

10. Reboul, J., P. Vaglio, J.F. Rual, P. Lamesch, M. Martinez, C.M. Armstrong, S. Li, L. Jacotot, N. Bertin, R. Janky, T. Moore, J.R. Hudson, Jr., J.L. Hartley, M.A. Brasch, J. Vandenhaute, S. Boulton, G.A. Endress, S. Jenna, E. Chevet, V. Papasotiropoulos, P.P. Tolias, J. Ptacek, M. Snyder, R. Huang, M.R. Chance, H. Lee, L. Doucette-Stamm, D.E. Hill, and M. Vidal, C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nat Genet, 2003. 34(1): p. 35-41.

11. Li, S., C.M. Armstrong, N. Bertin, H. Ge, S. Milstein, M. Boxem, P.O. Vidalain, J.D. Han, A. Chesneau, T. Hao, D.S. Goldberg, N. Li, M. Martinez, J.F. Rual, P. Lamesch, L. Xu, M. Tewari, S.L. Wong, L.V. Zhang, G.F. Berriz, L. Jacotot, P. Vaglio, J. Reboul, T. Hirozane-Kishikawa, Q. Li, H.W. Gabel, A. Elewa, B. Baumgartner, D.J. Rose, H. Yu, S. Bosak, R. Sequerra, A. Fraser, S.E. Mango, W.M. Saxton, S. Strome, S. Van Den Heuvel, F. Piano, J. Vandenhaute, C. Sardet, M. Gerstein, L. Doucette-Stamm, K.C. Gunsalus, J.W. Harper, M.E. Cusick, F.P. Roth, D.E. Hill, and M. Vidal, A map of the interactome network of the metazoan C. elegans. Science, 2004. 303(5657): p. 540-3.

12. Giot, L., J.S. Bader, C. Brouwer, A. Chaudhuri, B. Kuang, Y. Li, Y.L. Hao, C.E. Ooi, B. Godwin, E. Vitols, G. Vijayadamodar, P. Pochart, H. Machineni, M. Welsh, Y. Kong, B. Zerhusen, R. Malcolm, Z. Varrone, A. Collis, M. Minto, S. Burgess, L. McDaniel, E. Stimpson, F. Spriggs, J. Williams, K. Neurath, N. Ioime, M. Agee, E. Voss, K. Furtak, R. Renzulli, N. Aanensen, S. Carrolla, E. Bickelhaupt, Y. Lazovatsky, A. DaSilva, J. Zhong, C.A. Stanyon, R.L. Finley, Jr., K.P. White, M. Braverman, T. Jarvie, S. Gold, M. Leach, J. Knight, R.A. Shimkets, M.P. McKenna, J. Chant, and J.M. Rothberg, A protein interaction map of Drosophila melanogaster. Science, 2003. 302(5651): p. 1727-36.

13. Rual, J.F., K. Venkatesan, T. Hao, T. Hirozane-Kishikawa, A. Dricot, N. Li, G.F. Berriz, F.D. Gibbons, M. Dreze, N. Ayivi-Guedehoussou, N. Klitgord, C. Simon, M. Boxem, S. Milstein, J. Rosenberg, D.S. Goldberg, L.V. Zhang, S.L. Wong, G. Franklin, S. Li, J.S. Albala, J. Lim, C. Fraughton, E. Llamosas, S. Cevik, C. Bex, P. Lamesch, R.S. Sikorski, J. Vandenhaute, H.Y. Zoghbi, A. Smolyar, S. Bosak, R. Sequerra, L. Doucette-Stamm, M.E. Cusick, D.E. Hill, F.P. Roth, and M. Vidal, Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062): p. 1173-8.

14. Stelzl, U., U. Worm, M. Lalowski, C. Haenig, F.H. Brembeck, H. Goehler, M. Stroedicke, M. Zenkner, A. Schoenherr, S. Koeppen, J. Timm, S. Mintzlaff, C. Abraham, N. Bock, S. Kietzmann, A. Goedde, E. Toksoz, A. Droege, S. Krobitsch, B. Korn, W. Birchmeier, H. Lehrach, and E.E. Wanker, A human protein-protein interaction network: a resource for annotating the proteome. Cell, 2005. 122(6): p. 957-68.

15. Puig, O., F. Caspary, G. Rigaut, B. Rutz, E. Bouveret, E. Bragado-Nilsson, M. Wilm, and B. Seraphin, The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods, 2001. 24(3): p. 218-29.

16. Rigaut, G., A. Shevchenko, B. Rutz, M. Wilm, M. Mann, and B. Seraphin, A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol., 1999. 17(10): p. 1030-2.

17. Collins, M.O. and J.S. Choudhary, Mapping multiprotein complexes by affinity purification and mass spectrometry. Curr Opin Biotechnol, 2008. 19(4): p. 324-30.

18. Gavin, A.C., M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, and C.M. Cruciat, Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 2002. 415(6868): p. 141 - 147.

19. Ho, Y., A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A. Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, and M. Tyers, Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 2002. 415(6868): p. 180-3.

20. Butland, G., J.M. Peregrin-Alvarez, J. Li, W. Yang, X. Yang, V. Canadien, A. Starostine, D. Richards, B. Beattie, and N. Krogan, Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature, 2005. 433(7025): p. 531 - 537.

21. Deane, C.M., L. Salwinski, I. Xenarios, and D. Eisenberg, Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol. Cell. Proteomics, 2002. 1(5): p. 349-56.

22. Deng, M., F. Sun, and T. Chen, Assessment of the reliability of protein-protein interactions and protein function prediction. Pac Symp Biocomput, 2003. 8: p. 140-51.

23. Goll, J. and P. Uetz, The elusive yeast interactome. Genome Biol., 2006. 7(6): p. 214.1-6.

24. Zotenko, E., J. Mestre, D.P. O'Leary, and T.M. Przytycka, Why do hubs in the yeast protein interaction network tend to be essential: reexamining the connection between the network topology and essentiality. PLoS Comput Biol, 2008. 4(8): p. e1000140.

25. Marcotte, E.M., M. Pellegrini, H.L. Ng, D.W. Rice, T.O. Yeates, and D. Eisenberg, Detecting protein function and protein-protein interactions from genome sequences. Science, 1999. 285(5428): p. 751-3.

26. Enright, A.J., I. Iliopoulos, N.C. Kyrpides, and C.A. Ouzounis, Protein interaction maps for complete genomes based on gene fusion events. Nature, 1999. 402(6757): p. 86-90.

27. Enright, A.J. and C.A. Ouzounis, Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol, 2001. 2(9): p. research0034.1-research0034.7.

28. Koonin, E.V. and Y.I. Wolf, Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res, 2008. 36(21): p. 6688-719.

29. Mushegian, A.R. and E.V. Koonin, Gene order is not conserved in bacterial evolution. Trends Genet, 1996. 12(8): p. 289-90.

30. Dandekar, T., B. Snel, M. Huynen, and P. Bork, Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci, 1998. 23(9): p. 324-8.

31. Overbeek, R., M. Fonstein, M. D'Souza, G.D. Pusch, and N. Maltsev, The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A, 1999. 96(6): p. 2896-901.

32. Tamames, J., G. Casari, C. Ouzounis, and A. Valencia, Conserved clusters of functionally related genes in two bacterial genomes. J Mol Evol, 1997. 44(1): p. 66-73.

33. Huynen, M.A. and B. Snel, Gene and context: integrative approaches to genome analysis. Adv Protein Chem, 2000. 54: p. 345-79.

34. Snel, B., G. Lehmann, P. Bork, and M.A. Huynen, STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res, 2000. 28(18): p. 3442-4.

35. Gaasterland, T. and M.A. Ragan, Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. Microb Comp Genomics, 1998. 3(4): p. 199-217.

36. Pellegrini, M., E.M. Marcotte, M.J. Thompson, D. Eisenberg, and T.O. Yeates, Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A, 1999. 96(8): p. 4285-8.

37. Tatusov, R.L., E.V. Koonin, and D.J. Lipman, A genomic perspective on protein families. Science, 1997. 278(5338): p. 631-7.

38. Huynen, M.A. and P. Bork, Measuring genome evolution. Proc Natl Acad Sci U S A, 1998. 95(11): p. 5849-56.

39. Zheng, Y., R.J. Roberts, and S. Kasif, Genomic functional annotation using co-evolution profiles of gene clusters. Genome Biol, 2002. 3(11): p. research0060.1-research0060.9.

40. De Las Rivas, J., J.J. Lozano, and A.R. Ortiz, Comparative analysis of chloroplast genomes: functional annotation, genome-based phylogeny, and deduced evolutionary patterns. Genome Res, 2002. 12(4): p. 567 - 583.

41. Levesque, M., D. Shasha, W. Kim, M.G. Surette, and P.N. Benfey, Trait-to-gene: a computational method for predicting the function of uncharacterized genes. Curr Biol, 2003. 13(2): p. 129 - 133.

42. Martin, M.J., J. Herrero, A. Mateos, and J. Dopazo, Comparing bacterial genomes through conservation profiles. Genome Res, 2003. 13(5): p. 991 - 998.

43. Peregrin-Alvarez, J.M., S. Tsoka, and C.A. Ouzounis, The phylogenetic extent of metabolic enzymes and pathways. Genome Res, 2003. 13(3): p. 422 - 427.

44. Wu, H., F. Mao, Z. Su, V. Olman, and Y. Xu, Prediction of functional modules based on gene distributions in microbial genomes. Genome Inform, 2005. 16(2): p. 247 - 259.

45. Gutierrez, R.A., M.D. Larson, and C. Wilkerson, The plant-specific database. Classification of Arabidopsis proteins based on their phylogenetic profile. Plant Physiol, 2004. 135(4): p. 1888 - 1892.

46. Mikkelsen, T.S., J.E. Galagan, and J.P. Mesirov, Improving genome annotations using phylogenetic profile anomaly detection. Bioinformatics, 2005. 21(4): p. 464 - 470.

47. Wu, J., Z. Hu, and C. DeLisi, Gene annotation and network inference by phylogenetic profiling. BMC Bioinformatics, 2006. 7: p. 80.

48. Date, S.V. and E.M. Marcotte, Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat Biotechnol, 2003. 21(9): p. 1055 - 1062.

49. Glazko, G.V. and A.R. Mushegian, Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns. Genome Biol, 2004. 5(5): p. R32.

50. Marcotte, E.M., I. Xenarios, A.M. van Der Bliek, and D. Eisenberg, Localizing proteins in the cell from their phylogenetic profiles. Proc Natl Acad Sci USA, 2000. 97(22): p. 12115 - 12120.

51. Snel, B. and M.A. Huynen, Quantifying modularity in the evolution of biomolecular systems. Genome Res, 2004. 14(3): p. 391 - 397.

52. Yamada, T., M. Kanehisa, and S. Goto, Extraction of phylogenetic network modules from the metabolic network. BMC Bioinformatics, 2006. 7: p. 130.

53. Jothi, R., T. Przytycka, and L. Aravind, Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics, 2007. 8(1): p. 173.

54. von Mering, C., E.M. Zdobnov, S. Tsoka, F.D. Ciccarelli, J.B. Pereira-Leal, C.A. Ouzounis, and P. Bork, Genome evolution reveals biochemical networks and functional modules. Proc Natl Acad Sci USA, 2003. 100(26): p. 15428 - 15433.

55. Sun, J., J. Xu, Z. Liu, Q. Liu, A. Zhao, T. Shi, and Y. Li, Refined phylogenetic profiles method for predicting protein-protein interactions. Bioinformatics, 2005. 21(16): p. 3409 - 3415.

56. Bowers, P.M., S.J. Cokus, D. Eisenberg, and T.O. Yeates, Use of logic relationships to decipher protein network organization. Science, 2004. 306(5705): p. 2246 - 2249.

57. Goh, C.S., A.A. Bogan, M. Joachimiak, D. Walther, and F.E. Cohen, Co-evolution of proteins with their interaction partners. J Mol Biol, 2000. 299(2): p. 283 - 293.

58. Pazos, F., M. Helmer-Citterich, G. Ausiello, and A. Valencia, Correlated mutations contain information about protein-protein interaction. J Mol Biol, 1997. 271(4): p. 511-23.

59. Pazos, F. and A. Valencia, Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng, 2001. 14(9): p. 609-14.

60. Goh, C.S. and F.E. Cohen, Co-evolutionary analysis reveals insights into protein-protein interactions. J Mol Biol, 2002. 324(1): p. 177 - 192.

61. Jothi, R., P.F. Cherukuri, A. Tasneem, and T.M. Przytycka, Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein protein interactions. J Mol Biol, 2006. 362(4): p. 861-875.

62. Gertz, J., G. Elfond, A. Shustrova, M. Weisinger, M. Pellegrini, S. Cokus, and B. Rothschild, Inferring protein interactions from phylogenetic distance matrices. Bioinformatics, 2003. 19(16): p. 2039 - 2045.

63. Pazos, F. and A. Valencia, In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins, 2002. 47(2): p. 219-27.

64. Sato, T., Y. Yamanishi, M. Kanehisa, and H. Toh, The inference of protein-protein interactions by co-evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics, 2005. 21(17): p. 3482 - 3489.

65. Pazos, F., J.A. Ranea, D. Juan, and M.J. Sternberg, Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome. J Mol Biol, 2005. 352(4): p. 1002 - 1015.

66. Tan, S.H., Z. Zhang, and S.K. Ng, ADVICE: Automated Detection and Validation of Interaction by Co-Evolution. Nucleic Acids Res, 2004. 32(Web Server issue): p. W69-72.

67. Craig, R.A. and L. Liao, Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices. BMC Bioinformatics, 2007. 8: p. 6.

68. Kann, M.G., R. Jothi, P.F. Cherukuri, and T.M. Przytycka, Predicting protein domain interactions from coevolution of conserved regions. Proteins, 2007. 67(4): p. 811-20.

69. Hakes, L., S.C. Lovell, S.G. Oliver, and D.L. Robertson, Specificity in protein interactions and its relationship with sequence diversity and coevolution. Proc Natl Acad Sci U S A, 2007. 104(19): p. 7999-8004.

70. Kann, M.G., B.A. Shoemaker, A.R. Panchenko, and T.M. Przytycka, Correlated evolution of interacting proteins: looking behind the mirrortree. J Mol Biol, 2009. 385(1): p. 91-8.

71. Juan, D., F. Pazos, and A. Valencia, High-confidence prediction of global interactomes based on genome-wide coevolutionary networks. Proc Natl Acad Sci U S A, 2008. 105(3): p. 934-9.

72. Tillier, E.R. and R.L. Charlebois, The human protein coevolution network. Genome Res, 2009. 19(10): p. 1861-71.

73. Jothi, R., M.G. Kann, and T.M. Przytycka, Predicting protein-protein interaction by searching evolutionary tree automorphism space. Bioinformatics, 2005. 21(Suppl 1): p. i241-i250.

74. Ramani, A.K. and E.M. Marcotte, Exploiting the co-evolution of interacting proteins to discover interaction specificity. J Mol Biol, 2003. 327(1): p. 273-284.

75. Apic, G., J. Gough, and S.A. Teichmann, An insight into domain combinations. Bioinformatics, 2001. 17 Suppl 1: p. S83-9.

76. Huang, C., F. Morcos, S.P. Kanaan, S. Wuchty, D.Z. Chen, and J.A. Izaguirre, Predicting protein-protein interactions from protein domains using a set cover approach. IEEE/ACM Trans Comput Biol Bioinform, 2007. 4(1): p. 78-87.

77. Kim, W.K., J. Park, and J.K. Suh, Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair. Genome Inform, 2002. 13: p. 42-50.

78. Nguyen, T.P. and T.B. Ho, Discovering signal transduction networks using signaling domain-domain interactions. Genome Inform, 2006. 17(2): p. 35-45.

79. Singhal, M. and H. Resat, A domain-based approach to predict protein-protein interactions. BMC Bioinformatics, 2007. 8: p. 199.

80. Wojcik, J. and V. Schachter, Protein-protein interaction map inference using interacting domain profile pairs. Bioinformatics, 2001. 17 Suppl 1: p. S296-305.

81. Deng, M., S. Mehta, F. Sun, and T. Chen, Inferring domain-domain interactions from protein-protein interactions. Genome Res, 2002. 12(10): p. 1540-8.

82. Sprinzak, E. and H. Margalit, Correlated sequence-signatures as markers of protein-protein interaction. J Mol Biol, 2001. 311(4): p. 681-92.

83. Wuchty, S., Topology and weights in a protein domain interaction network--a novel way to predict protein interactions. BMC Genomics, 2006. 7: p. 122.

84. Liu, M., X.W. Chen, and R. Jothi, Knowledge-guided inference of domain-domain interactions from incomplete protein-protein interaction networks. Bioinformatics, 2009. 25(19): p. 2492-9.

85. Guimaraes, K.S., R. Jothi, E. Zotenko, and T.M. Przytycka, Predicting domain-domain interactions using a parsimony approach. Genome Biol, 2006. 7(11): p. R104.

86. Guimaraes, K.S. and T.M. Przytycka, Protein Interaction Network Based Prediction of Domain-Domain and Domain-Peptide Interactions, in Protein-Protein Interactions and Networks, A.R. Panchenko and T.M. Przytycka, Editors. 2008, Springer-Verlag. p. 83-98.

87. Ng, S.K., Z. Zhang, and S.H. Tan, Integrative approach for computationally inferring protein domain interactions. Bioinformatics, 2003. 19(8): p. 923-9.

88. Kluger, Y., R. Basri, J.T. Chang, and M. Gerstein, Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res, 2003. 13(4): p. 703-16.

89. Zhao, X.M., L. Chen, and K. Aihara, A discriminative approach for identifying domain-domain interactions from protein-protein interactions. Proteins, 2010. 78(5): p. 1243-53.

90. Wang, H., E. Segal, A. Ben-Hur, Q.R. Li, M. Vidal, and D. Koller, InSite: a computational method for identifying protein-protein interaction binding sites on a proteome-wide scale. Genome Biol, 2007. 8(9): p. R192.

91. Riley, R., C. Lee, C. Sabatti, and D. Eisenberg, Inferring protein domain interactions from databases of interacting proteins. Genome Biol, 2005. 6(10): p. R89.

92. Wang, H., E. Segal, A. Ben-Hur, Q. Li, M. Vidal, and D. Koller, InSite: a computational method for identifying protein-protein interaction binding sites on a proteome-wide scale. Genome Biol, 2007. 8(9): p. R192.

93. Lee, H., M. Deng, F. Sun, and T. Chen, An integrated approach to the prediction of domain-domain interactions. BMC Bioinformatics, 2006. 7: p. 269.

94. Wang, R.S., Y. Wang, L.Y. Wu, X.S. Zhang, and L. Chen, Analysis on multi-domain cooperation for predicting protein-protein interactions. BMC Bioinformatics, 2007. 8: p. 391.

95. Guimaraes, K.S. and T.M. Przytycka, Interrogating domain-domain interactions with parsimony based approaches. BMC Bioinformatics, 2008. 9: p. 171.

96. Jansen, R., H. Yu, D. Greenbaum, Y. Kluger, N.J. Krogan, S. Chung, A. Emili, M. Snyder, J.F. Greenblatt, and M. Gerstein, A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science, 2003. 302(5644): p. 449-53.

97. Rhodes, D.R., S.A. Tomlins, S. Varambally, V. Mahavisno, T. Barrette, S. Kalyana-Sundaram, D. Ghosh, A. Pandey, and A.M. Chinnaiyan, Probabilistic model of the human protein-protein interaction network. Nat Biotechnol, 2005. 23(8): p. 951-9.

98. Tirosh, I. and N. Barkai, Computational verification of protein-protein interactions by orthologous co-expression. BMC Bioinformatics, 2005. 6: p. 40.

99. Soong, T.T., K.O. Wrzeszczynski, and B. Rost, Physical protein-protein interactions predicted from microarrays. Bioinformatics, 2008. 24(22): p. 2608-14.

100. Jansen, R., D. Greenbaum, and M. Gerstein, Relating Whole-Genome Expression Data with Protein-Protein Interactions. Genome Res, 2002. 12(1): p. 37-46.

101. Kemmeren, P., N.L. van Berkum, J. Vilo, T. Bijma, R. Donders, A. Brazma, and F.C. Holstege, Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol Cell, 2002. 9(5): p. 1133-43.

102. von Mering, C., R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork, Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 2002. 417(6887): p. 399-403.

103. Eisen, M.B., P.T. Spellman, P.O. Brown, and D. Botstein, Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A, 1998. 95(25): p. 14863-14868.

104. Marcotte, E.M., M. Pellegrini, M.J. Thompson, T.O. Yeates, and D. Eisenberg, A combined algorithm for genome-wide prediction of protein function. Nature, 1999. 402(6757): p. 83-6.

105. Greenbaum, D., N.M. Luscombe, R. Jansen, J. Qian, and M. Gerstein, Interrelating Different Types of Genomic Data, from Proteome to Secretome: 'Oming in on Function. Genome Res, 2001. 11(9): p. 1463-1468.

106. Ramani, A.K., Z. Li, G.T. Hart, M.W. Carlson, D.R. Boutz, and E.M. Marcotte, A map of human protein interactions derived from co-expression of human mRNAs and their orthologs. Mol Syst Biol, 2008. 4(180): p. 1-16.

107. Costello, J.C., M.M. Dalkilic, S.M. Beason, J.R. Gehlhausen, R. Patwardhan, S. Middha, B.D. Eads, and J.R. Andrews, Gene networks in Drosophila melanogaster: integrating experimental data to predict gene function. Genome Biol, 2009. 10(9): p. R97.

108. Lee, I., S.V. Date, A.T. Adai, and E.M. Marcotte, A probabilistic functional network of yeast genes. Science, 2004. 306(5701): p. 1555-1558.

109. Margolin, A.A., I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R. Dalla Favera, and A. Califano, ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 2006. 7 Suppl 1: p. S7.

110. Pe'er, D., Bayesian network analysis of signaling networks: a primer. Sci STKE, 2005. 2005(281): p. pl4.

111. Alterovitz, G., J. Liu, E. Afkhami, and M.F. Ramoni, Bayesian methods for proteomics. Proteomics, 2007. 7(16): p. 2843-55.

112. Schadt, E.E., Exploiting naturally occurring DNA variation and molecular profiling data to dissect disease and drug response traits. Curr Opin Biotechnol, 2005. 16(6): p. 647-54.

113. Sieberts, S.K. and E.E. Schadt, Moving toward a system genetics view of disease. Mamm Genome, 2007. 18(6-7): p. 389-401.

114. Cline, M.S., M. Smoot, E. Cerami, A. Kuchinsky, N. Landys, C. Workman, R. Christmas, I. Avila-Campilo, M. Creech, B. Gross, K. Hanspers, R. Isserlin, R. Kelley, S. Killcoyne, S. Lotia, S. Maere, J. Morris, K. Ono, V. Pavlovic, A.R. Pico, A. Vailaya, P.L. Wang, A. Adler, B.R. Conklin, L. Hood, M. Kuiper, C. Sander, I. Schmulevich, B. Schwikowski, G.J. Warner, T. Ideker, and G.D. Bader, Integration of biological networks and gene expression data using Cytoscape. Nat Protoc, 2007. 2(10): p. 2366-82.

115. Jeong, H., S.P. Mason, A.L. Barabasi, and Z.N. Oltvai, Lethality and centrality in protein networks. Nature, 2001. 411(6833): p. 41-2.

116. Wagner, A., The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol, 2001. 18(7): p. 1283-92.

117. Yook, S.H., Z.N. Oltvai, and A.L. Barabasi, Functional and topological characterization of protein interaction networks. Proteomics, 2004. 4(4): p. 928-942.

118. Przulj, N., D.G. Corneil, and I. Jurisica, Modeling interactome: scale-free or geometric? Bioinformatics, 2004. 20(18): p. 3508-15.

119. Han, J.D., D. Dupuy, N. Bertin, M.E. Cusick, and M. Vidal, Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol, 2005. 23(7): p. 839-44.

120. Tanaka, R., T.M. Yi, and J. Doyle, Some protein interaction data do not exhibit power law statistics. FEBS Lett, 2005. 579(23): p. 5140-4.

121. Brandes, U. and T. Erlebach, eds. Network Analysis: Methodological Foundations. 2005, Springer: Berlin, New York.

122. Zielinski, R., P.F. Przytycki, J. Zheng, D. Zhang, T.M. Przytycka, and J. Capala, The crosstalk between EGF, IGF, and Insulin cell signaling pathways--computational and experimental analysis. BMC Syst Biol, 2009. 3: p. 88.

123. Lohmann, G., D.S. Margulies, A. Horstmann, B. Pleger, J. Lepsien, D. Goldhahn, H. Schloegl, M. Stumvoll, A. Villringer, and R. Turner, Eigenvector centrality mapping for analyzing connectivity patterns in FMRI data of the human brain. PLoS One, 2010. 5(4): p. e10232.

124. Missiuro, P.V., K. Liu, L. Zou, B.C. Ross, G. Zhao, J.S. Liu, and H. Ge, Information flow analysis of interactome networks. PLoS Comput Biol, 2009. 5(4): p. e1000350.

125. Kim, Y.-A., S. Wuchty, and T.M. Przytycka, Simultaneous Identification of Causal Genes and Dysregulated Pathways in Complex Diseases. LNBI, 2010. 6044(RECOMB 2010): p. 263-281.

126. Suthram, S., A. Beyer, R.M. Karp, Y. Eldar, and T. Ideker, eQED: an efficient method for interpreting eQTL associations using protein networks. Mol Syst Biol, 2008. 4: p. 162.

127. Kim, Y.-A., J.H. Przytycki, S. Wuchty, and T.M. Przytycka, Modeling Information Flow in Molecular Networks. Physical Biology, 2011. 8(3): p. 035012.

128. Kim, Y.-A., S. Wuchty, and T.M. Przytycka, Identifying Causal Genes and Dysregulated Pathways in Complex Diseases. PLoS Comput Biol, 2011. 7(3): p. e1001095.

129. Yu, H., P.M. Kim, E. Sprecher, V. Trifonov, and M. Gerstein, The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol, 2007. 3(4): p. e59.

130. He, X. and J. Zhang, Why do hubs tend to be essential in protein networks? PLoS Genet, 2006. 2(6): p. e88.

131. Batada, N.N., L.D. Hurst, and M. Tyers, Evolutionary and physiological importance of hub proteins. PLoS Comput Biol, 2006. 2(7): p. e88.

132. Hahn, M.W. and A.D. Kern, Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol, 2005. 22(4): p. 803-6.

133. Albert, R., H. Jeong, and A.L. Barabasi, Error and attack tolerance of complex networks. Nature, 2000. 406(6794): p. 378-82.

134. Yu, H., D. Greenbaum, H. Xin Lu, X. Zhu, and M. Gerstein, Genomic analysis of essentiality within protein networks. Trends Genet, 2004. 20(6): p. 227-31.

135. Wang, H., B. Kakaradov, S.R. Collins, L. Karotki, D. Fiedler, M. Shales, K.M. Shokat, T.C. Walther, N.J. Krogan, and D. Koller, A complex-based reconstruction of the Saccharomyces cerevisiae interactome. Mol Cell Proteomics, 2009. 8(6): p. 1361-81.

136. Han, J.D., N. Bertin, T. Hao, D.S. Goldberg, G.F. Berriz, L.V. Zhang, D. Dupuy, A.J. Walhout, M.E. Cusick, F.P. Roth, and M. Vidal, Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature, 2004. 430(6995): p. 88-93.

137. Hartwell, L.H., J.J. Hopfield, S. Leibler, and A.W. Murray, From molecular to modular cell biology. Nature, 1999. 402(6761 Suppl): p. C47-52.

138. Bader, G.D. and C.W. Hogue, An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 2003. 4: p. 2.

139. Spirin, V. and L.A. Mirny, Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci U S A, 2003. 100(21): p. 12123-12128.

140. Adamcsek, B., G. Palla, I.J. Farkas, I. Derényi, and T. Vicsek, CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 2006. 22(8): p. 1021-1023.

141. Altaf-Ul-Amin, M., Y. Shinbo, K. Mihara, K. Kurokawa, and S. Kanaya, Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics, 2006. 7: p. 207.

142. Rives, A.W. and T. Galitski, Modular organization of cellular networks. Proc Natl Acad Sci U S A, 2003. 100(3): p. 1128-1133.

143. Arnau, V., S. Mars, and I. Marín, Iterative Cluster Analysis of Protein Interaction Data. Bioinformatics, 2005. 21(3): p. 364-378.

144. Bader, J.S., Greedily building protein networks with confidence. Bioinformatics, 2003. 19(15): p. 1869-74.

145. Asthana, S., O.D. King, F.D. Gibbons, and F.P. Roth, Predicting protein complex membership using probabilistic network reliability. Genome Res, 2004. 14(6): p. 1170-5.

146. Brun, C., F. Chevenet, D. Martin, J. Wojcik, A. Guenoche, and B. Jacq, Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biol, 2003. 5(1): p. R6.

147. King, A.D., N. Przulj, and I. Jurisica, Protein complex prediction via cost-based clustering. Bioinformatics, 2004. 20(17): p. 3013-20.

148. Luo, F., Y. Yang, C.F. Chen, R. Chang, J. Zhou, and R.H. Scheuermann, Modular organization of protein interaction networks. Bioinformatics, 2007. 23(2): p. 207-14.

149. Pereira-Leal, J.B., A.J. Enright, and C.A. Ouzounis, Detection of functional modules from protein interaction networks. Proteins, 2004. 54(1): p. 49-57.

150. Wang, C., C. Ding, Q. Yang, and S.R. Holbrook, Consistent dissection of the protein interaction network by combining global and local metrics. Genome Biol, 2007. 8(12): p. R271.

151. Newman, M.E., Modularity and community structure in networks. Proc Natl Acad Sci U S A, 2006. 103(23): p. 8577-82.

152. Dunn, R., F. Dudbridge, and C.M. Sanderson, The use of edge-betweenness clustering to investigate biological function in protein interaction networks. BMC Bioinformatics, 2005. 6: p. 39.

153. Qi, Y., F. Balem, C. Faloutsos, J. Klein-Seetharaman, and Z. Bar-Joseph, Protein complex identification by supervised graph local clustering. Bioinformatics, 2008. 24(13): p. i250-8.

154. Navlakha, S., M.C. Schatz, and C. Kingsford, Revealing biological modules via graph summarization. J Comput Biol, 2009. 16(2): p. 253-64.

155. Jiang, P. and M. Singh, SPICi: a fast clustering algorithm for large biological networks. Bioinformatics, 2010. 26(8): p. 1105-11.

156. Chen, J. and B. Yuan, Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics, 2006. 22(18): p. 2283-90.

157. Maraziotis, I.A., K. Dimitrakopoulou, and A. Bezerianos, Growing functional modules from a seed protein via integration of protein interaction and gene expression data. BMC Bioinformatics, 2007. 8: p. 408.

158. Feng, J., R. Jiang, and T. Jiang, A max-flow based approach to the identification of protein complexes using protein interaction and microarray data. IEEE/ACM Trans Comput Biol Bioinform, 2011. 8(3): p. 621-634.

159. Rosvall, M. and C.T. Bergstrom, An information-theoretic framework for resolving community structure in complex networks. Proc Natl Acad Sci U S A, 2007. 104(18): p. 7327-31.

160. Palla, G., I. Derenyi, I. Farkas, and T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005. 435(7043): p. 814-818.

161. Sales-Pardo, M., R. Guimera, A.A. Moreira, and L.A. Amaral, Extracting the hierarchical organization of complex systems. Proc Natl Acad Sci U S A, 2007. 104(39): p. 15224-9.

162. Zotenko, E., K.S. Guimaraes, R. Jothi, and T.M. Przytycka, Decomposition of overlapping protein complexes: a graph theoretical method for analyzing static and dynamic protein associations. Algorithms Mol Biol, 2006. 1(1): p. 7.

163. Brohee, S. and J. van Helden, Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 2006. 7: p. 488.

164. Song, J. and M. Singh, How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics, 2009. 25(23): p. 3143-50.

165. Poyatos, J.F. and L.D. Hurst, How biologically relevant are interaction-based modules in protein networks? Genome Biol, 2004. 5(11): p. R93.

166. Milo, R., S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs: simple building blocks of complex networks. Science, 2002. 298(5594): p. 824-7.

167. Goentoro, L., O. Shoval, M.W. Kirschner, and U. Alon, The incoherent feedforward loop can provide fold-change detection in gene regulation. Mol Cell, 2009. 36(5): p. 894-9.

168. Rosenfeld, N., J.W. Young, U. Alon, P.S. Swain, and M.B. Elowitz, Accurate prediction of gene feedback circuit behavior from component properties. Mol Syst Biol, 2007. 3: p. 143.

169. Alon, U., Network motifs: theory and experimental approaches. Nat Rev Genet, 2007. 8(6): p. 450-61.

170. Mangan, S., A. Zaslaver, and U. Alon, The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol, 2003. 334(2): p. 197-204.

171. Mangan, S. and U. Alon, Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci U S A, 2003. 100(21): p. 11980-5.

172. Przulj, N., D.G. Corneil, and I. Jurisica, Efficient estimation of graphlet frequency distributions in protein-protein interaction networks. Bioinformatics, 2006. 22(8): p. 974-80.

173. Banks, E., E. Nabieva, B. Chazelle, and M. Singh, Organization of physical interactomes as uncovered by network schemas. PLoS Comput Biol, 2008. 4(10): p. e1000203.

174. Banks, E., E. Nabieva, R. Peterson, and M. Singh, NetGrep: fast network schema searches in interactomes. Genome Biol, 2008. 9(9): p. R138.

175. Walhout, A.J.M., R. Sordella, X. Lu, J.L. Hartley, G.F. Temple, M.A. Brasch, N. Thierry-Mieg, and M. Vidal, Protein Interaction Mapping in C. elegans Using Proteins Involved in Vulval Development. Science, 2000. 287(5450): p. 116-122.

176. Yu, H., N.M. Luscombe, H.X. Lu, X. Zhu, Y. Xia, J.D. Han, N. Bertin, S. Chung, M. Vidal, and M. Gerstein, Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res, 2004. 14(6): p. 1107-18.

177. Matthews, L.R., P. Vaglio, J. Reboul, H. Ge, B.P. Davis, J. Garrels, S. Vincent, and M. Vidal, Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". Genome Res, 2001. 11(12): p. 2120-6.

178. Sharan, R. and T. Ideker, Modeling cellular machinery through biological network comparison. Nat Biotechnol, 2006. 24(4): p. 427-33.

179. Kelley, B.P., B. Yuan, F. Lewitter, R. Sharan, B.R. Stockwell, and T. Ideker, PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res, 2004. 32(Web Server issue): p. W83-8.

180. Kelley, B.P., R. Sharan, R.M. Karp, T. Sittler, D.E. Root, B.R. Stockwell, and T. Ideker, Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A, 2003. 100(20): p. 11394-9.

181. Sharan, R., S. Suthram, R.M. Kelley, T. Kuhn, S. McCuine, P. Uetz, T. Sittler, R.M. Karp, and T. Ideker, Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci U S A, 2005. 102(6): p. 1974-9.

182. Bandyopadhyay, S., R. Sharan, and T. Ideker, Systematic identification of functional orthologs based on protein network comparison. Genome Res, 2006. 16(3): p. 428-35.

183. Ogata, H., W. Fujibuchi, S. Goto, and M. Kanehisa, A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Res, 2000. 28(20): p. 4021-8.

184. Garey, M.R. and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. 1979: W.H. Freeman and Co.

185. Flannick, J., A. Novak, B.S. Srinivasan, H.H. McAdams, and S. Batzoglou, Graemlin: general and robust alignment of multiple large interaction networks. Genome Res, 2006. 16(9): p. 1169-81.

186. Koyuturk, M., Y. Kim, U. Topkara, S. Subramaniam, W. Szpankowski, and A. Grama, Pairwise alignment of protein interaction networks. J Comput Biol, 2006. 13(2): p. 182-99.

187. Berg, J. and M. Lassig, Cross-species analysis of biological networks by Bayesian alignment. Proc Natl Acad Sci U S A, 2006. 103(29): p. 10967-72.

188. Kalaev, M., V. Bafna, and R. Sharan, Fast and accurate alignment of multiple protein networks. J Comput Biol, 2009. 16(8): p. 989-99.

189. Singh, R., J. Xu, and B. Berger, Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci U S A, 2008. 105(35): p. 12763-8.

190. Pinter, R.Y., O. Rokhlenko, E. Yeger-Lotem, and M. Ziv-Ukelson, Alignment of metabolic pathways. Bioinformatics, 2005. 21(16): p. 3401-8.

191. Cheng, Q., R. Harrison, and A. Zelikovsky, MetNetAligner: a web service tool for metabolic network alignments. Bioinformatics, 2009. 25(15): p. 1989-90.

192. Dost, B., T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan, QNet: a tool for querying protein interaction networks. J Comput Biol, 2008. 15(7): p. 913-25.

193. Kuchaiev, O., T. Milenkovic, V. Memisevic, W. Hayes, and N. Przulj, Topological network alignment uncovers biological function and phylogeny. J R Soc Interface, 2010. 7(50): p. 1341-54.

194. Erten, S., X. Li, G. Bebek, J. Li, and M. Koyuturk, Phylogenetic analysis of modularity in protein interaction networks. BMC Bioinformatics, 2009. 10: p. 333.

195. Shoemaker, B.A. and A.R. Panchenko, Deciphering protein-protein interactions. Part I. Experimental techniques and databases. PLoS Comput Biol, 2007. 3(3): p. e42.

196. Lehne, B. and T. Schlitt, Protein-protein interaction databases: keeping up with growing interactomes. Hum Genomics, 2009. 3(3): p. 291-7.

197. De Las Rivas, J. and C. Fontanillo, Protein-protein interactions essentials: key concepts to building and analyzing interactome networks. PLoS Comput Biol, 2010. 6(6): p. e1000807.

198. Salwinski, L., C.S. Miller, A.J. Smith, F.K. Pettit, J.U. Bowie, and D. Eisenberg, The Database of Interacting Proteins: 2004 update. Nucleic Acids Res, 2004. 32(Database issue): p. D449-51.

199. Xenarios, I., D.W. Rice, L. Salwinski, M.K. Baron, E.M. Marcotte, and D. Eisenberg, DIP: the database of interacting proteins. Nucleic Acids Res, 2000. 28(1): p. 289-91.

200. Aranda, B., P. Achuthan, Y. Alam-Faruque, I. Armean, A. Bridge, C. Derow, M. Feuermann, A.T. Ghanbarian, S. Kerrien, J. Khadake, J. Kerssemakers, C. Leroy, M. Menden, M. Michaut, L. Montecchi-Palazzi, S.N. Neuhauser, S. Orchard, V. Perreau, B. Roechert, K. van Eijk, and H. Hermjakob, The IntAct molecular interaction database in 2010. Nucleic Acids Res, 2010. 38(Database issue): p. D525-31.

201. Stark, C., B.J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, BioGRID: a general repository for interaction datasets. Nucleic Acids Res, 2006. 34(Database issue): p. D535-9.

202. Ceol, A., A. Chatr Aryamontri, L. Licata, D. Peluso, L. Briganti, L. Perfetto, L. Castagnoli, and G. Cesareni, MINT, the molecular interaction database: 2009 update. Nucleic Acids Res, 2010. 38(Database issue): p. D532-9.

203. Prieto, C. and J. De Las Rivas, APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Res, 2006. 34(Web Server issue): p. W298-302.

204. Wu, J., T. Vallenius, K. Ovaska, J. Westermarck, T.P. Makela, and S. Hautaniemi, Integrated network analysis platform for protein-protein interactions. Nat Methods, 2009. 6(1): p. 75-7.

205. Jensen, L.J., M. Kuhn, M. Stark, S. Chaffron, C. Creevey, J. Muller, T. Doerks, P. Julien, A. Roth, M. Simonovic, P. Bork, and C. von Mering, STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res, 2009. 37(Database issue): p. D412-6.

206. von Mering, C., M. Huynen, D. Jaeggi, S. Schmidt, P. Bork, and B. Snel, STRING: a database of predicted functional associations between proteins. Nucleic Acids Res, 2003. 31: p. 258-261.

207. Brown, K.R. and I. Jurisica, Unequal evolutionary conservation of human protein interactions in interologous networks. Genome Biol, 2007. 8(5): p. R95.

208. FlyBase Consortium, The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res, 2003. 31(1): p. 172-5.

209. Cherry, J.M., C. Ball, S. Weng, G. Juvik, R. Schmidt, C. Adler, B. Dunn, S. Dwight, L. Riles, R.K. Mortimer, and D. Botstein, Genetic and physical maps of Saccharomyces cerevisiae. Nature, 1997. 387(6632 Suppl): p. 67-73.

210. Stein, A., A. Panjkovich, and P. Aloy, 3did Update: domain-domain and peptide-mediated interactions of known 3D structure. Nucleic Acids Res, 2009. 37(Database issue): p. D300-4.

211. Finn, R.D., M. Marshall, and A. Bateman, iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics, 2005. 21(3): p. 410-2.

212. Davis, F.P. and A. Sali, PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics, 2005. 21(9): p. 1901-7.

213. Berman, H.M., J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne, The Protein Data Bank. Nucleic Acids Res, 2000. 28(1): p. 235-42.

214. Shoemaker, B.A., A.R. Panchenko, and S.H. Bryant, Finding biologically relevant protein domain interactions: conserved binding mode analysis. Protein Sci, 2006. 15(2): p. 352-61.

215. Raghavachari, B., A. Tasneem, T.M. Przytycka, and R. Jothi, DOMINE: a database of protein domain interactions. Nucleic Acids Res, 2008. 36(Database issue): p. D656-61.

216. Schwikowski, B., P. Uetz, and S. Fields, A network of protein-protein interactions in yeast. Nat Biotechnol, 2000. 18(12): p. 1257-61.

217. Sharan, R., I. Ulitsky, and R. Shamir, Network-based prediction of protein function. Mol Syst Biol, 2007. 3: p. 88.

218. Nabieva, E., K. Jim, A. Agarwal, B. Chazelle, and M. Singh, Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics, 2005. 21 Suppl 1: p. i302-10.

219. Hishigaki, H., K. Nakai, T. Ono, A. Tanigami, and T. Takagi, Assessment of prediction accuracy of protein function from protein--protein interaction data. Yeast, 2001. 18(6): p. 523-31.

220. Zhang, S., H. Chen, K. Liu, and Z. Sun, Inferring protein function by domain context similarities in protein-protein interaction networks. BMC Bioinformatics, 2009. 10: p. 395.

221. Jaeger, S., S. Gaudan, U. Leser, and D. Rebholz-Schuhmann, Integrating protein-protein interactions and text mining for protein function prediction. BMC Bioinformatics, 2008. 9 Suppl 8: p. S2.

222. Kourmpetis, Y.A., A.D. van Dijk, M.C. Bink, R.C. van Ham, and C.J. ter Braak, Bayesian Markov Random Field analysis for protein function prediction based on network data. PLoS One, 2010. 5(2): p. e9293.

223. Chuang, H.Y., E. Lee, Y.T. Liu, D. Lee, and T. Ideker, Network-based classification of breast cancer metastasis. Mol Syst Biol, 2007. 3: p. 140.

224. Lee, E., H.Y. Chuang, J.W. Kim, T. Ideker, and D. Lee, Inferring pathway activity toward precise disease classification. PLoS Comput Biol, 2008. 4(11): p. e1000217.

225. Daemen, A., M. Signoretto, O. Gevaert, J.A. Suykens, and B. De Moor, Improved microarray-based decision support with graph encoded interactome data. PLoS One, 2010. 5(4): p. e10225.

226. Ulitsky, I., R.M. Karp, and R. Shamir, Detecting Disease-Specific Dysregulated Pathways Via Analysis of Clinical Expression Profiles, in Research in Computational Molecular Biology, M. Vingron and L. Wong, Editors. 2008, Springer. p. 347-359.

227. Mani, K.M., C. Lefebvre, K. Wang, W.K. Lim, K. Basso, R. Dalla-Favera, and A. Califano, A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol Syst Biol, 2008. 4: p. 169.

228. Navlakha, S. and C. Kingsford, The power of protein interaction networks for associating genes with diseases. Bioinformatics, 2010. 26(8): p. 1057-63.

229. Kohler, S., S. Bauer, D. Horn, and P.N. Robinson, Walking the interactome for prioritization of candidate genes. Am J Hum Genet, 2008. 82: p. 949-958.

230. Vanunu, O. and R. Sharan, A propagation-based algorithm for inferring gene-disease associations, in German Conference on Bioinformatics. 2008: Dresden. p. 54-62.

231. Wu, X., R. Jiang, M.Q. Zhang, and S. Li, Network-based global inference of human disease. Mol Syst Biol, 2008. 4: p. 189.

232. Karni, S., H. Soreq, and R. Sharan, A network-based method for predicting disease-causing genes. J Comput Biol, 2009. 16(2): p. 181-9.

233. Vanunu, O., O. Magger, E. Ruppin, T. Shlomi, and R. Sharan, Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol, 2010. 6(1): p. e1000641.

234. Lee, E., H. Jung, P. Radivojac, J.W. Kim, and D. Lee, Analysis of AML genes in dysregulated molecular networks. BMC Bioinformatics, 2009. 10 Suppl 9: p. S2.

235. Chen, Y., J. Zhu, P.Y. Lum, X. Yang, S. Pinto, D.J. MacNeil, C. Zhang, J. Lamb, S. Edwards, S.K. Sieberts, A. Leonardson, L.W. Castellini, S. Wang, M.F. Champy, B. Zhang, V. Emilsson, S. Doss, A. Ghazalpour, S. Horvath, T.A. Drake, A.J. Lusis, and E.E. Schadt, Variations in DNA elucidate molecular networks that cause disease. Nature, 2008. 452(7186): p. 429-35.

236. Przytycka, T.M., M. Singh, and D.K. Slonim, Toward the dynamic interactome: it's about time. Brief Bioinform, 2010. 11(1): p. 15-29.