Network Properties
1. Global Network Properties (Chapter 3 of the course textbook “Analysis of
Biological Networks” by Junker and Schreiber)
1) Degree distribution
2) Clustering coefficient and spectrum
3) Average diameter
4) Centralities
1) Degree Distribution
[Figure: example network G]
2) Clustering Coefficient and Spectrum
[Figure: example network G]
• Cv – Clustering coefficient of node v
E.g.: CA = 1/1 = 1, CB = 1/3 = 0.33, CC = 0, CD = 2/10 = 0.2, …
• C = Avg. clust. coefficient of the whole network = avg {Cv over all nodes v of G}
• C(k) – Avg. clust. coefficient of all nodes of degree k
E.g.: C(2) = (CA + CC)/2 = (1+0)/2 = 0.5
=> Clustering spectrum
[Figure: example clustering spectrum (not for G)]
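The definitions of Cv, C, and C(k) can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the toy graph (a triangle A-B-C with a pendant node D attached to C) is hypothetical and is not the slide's network G, and setting Cv = 0 for nodes with fewer than two neighbours is an assumed convention.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def clustering_coefficients(adj):
    """C_v = (edges among the neighbours of v) / (number of neighbour pairs)."""
    coeffs = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[v] = 0.0  # assumed convention for degree 0 or 1
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs[v] = links / (k * (k - 1) / 2)
    return coeffs


def clustering_spectrum(adj):
    """C(k): average C_v over all nodes of degree k."""
    by_degree = defaultdict(list)
    for v, c in clustering_coefficients(adj).items():
        by_degree[len(adj[v])].append(c)
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}


# Hypothetical toy graph: triangle A-B-C plus pendant node D attached to C.
toy_adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
coeffs = clustering_coefficients(toy_adj)      # A: 1, B: 1, C: 1/3, D: 0
avg_c = sum(coeffs.values()) / len(coeffs)     # C of the whole network
spectrum = clustering_spectrum(toy_adj)        # C(1) = 0, C(2) = 1, C(3) = 1/3
print(coeffs, avg_c, spectrum)
```

Plotting the values of `spectrum` against k gives the clustering spectrum.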
3) Average Diameter
[Figure: example network G with marked nodes u and v]
• Distance between a pair of nodes u and v:
Du,v = min {length of all paths between u and v} = min {3, 4, 3, 2} = 2 = dist(u,v)
• Average diameter of the whole network:
D = avg {Du,v for all pairs of nodes {u,v} in G}
• Spectrum of the shortest path lengths
[Figure: example spectrum (not for G)]
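For an unweighted network, the distances Du,v can be computed with breadth-first search. The sketch below is one possible way to obtain the spectrum of shortest path lengths and the average diameter D. The toy graph is hypothetical (not the slide's G), and skipping unreachable pairs is an assumption made here for simplicity.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def shortest_path_spectrum(adj):
    """Counter {d: number of unordered node pairs {u, v} with D_uv = d}."""
    nodes = list(adj)
    spectrum = Counter()
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # assumption: unreachable pairs are skipped
                spectrum[dist[v]] += 1
    return spectrum


def average_diameter(adj):
    """D = average of D_uv over all connected pairs of nodes."""
    spectrum = shortest_path_spectrum(adj)
    pairs = sum(spectrum.values())
    return sum(d * count for d, count in spectrum.items()) / pairs


# Hypothetical toy graph: triangle A-B-C plus pendant node D attached to C.
toy_adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
path_spectrum = shortest_path_spectrum(toy_adj)  # 4 pairs at distance 1, 2 at distance 2
avg_d = average_diameter(toy_adj)                # (4*1 + 2*2) / 6
print(dict(path_spectrum), avg_d)
```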
Network Properties
2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)
1) Network motifs
2) Graphlets:
2.1) Relative Graphlet Frequency Distance between 2 networks
2.2) Graphlet Degree Distribution Agreement between 2 networks
1) Network motifs (Uri Alon’s group, ’02-’04)
• Small subgraphs that are overrepresented in a network when compared to randomized networks
• Network motifs:
– Reflect the underlying evolutionary processes that generated the network
– Carry functional information
– Define superfamilies of networks
- Zi is the statistical significance (Z-score) of subgraph i; SP is the vector of Zi values normalized to length 1
• But:
– Functionally important but not statistically significant patterns could be missed
– The choice of the appropriate null model is crucial, especially across “families”
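The significance scores can be illustrated with a short sketch. It assumes the usual definitions from the network-motif literature: Zi = (Nreal_i - mean(Nrand_i)) / std(Nrand_i), and SP_i = Zi / sqrt(sum of Zj^2). The subgraph counts below are made-up numbers for illustration only, and the use of the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def z_scores(real_counts, random_counts):
    """Z_i for each subgraph type i.

    real_counts[i]   : count of subgraph i in the real network
    random_counts[i] : counts of subgraph i across the randomized networks
    """
    zs = []
    for n_real, n_rand in zip(real_counts, random_counts):
        sd = pstdev(n_rand)  # assumption: population standard deviation
        if sd == 0:
            raise ValueError("zero variance in randomized counts")
        zs.append((n_real - mean(n_rand)) / sd)
    return zs


def significance_profile(zs):
    """Normalize the vector of Z-scores to unit length."""
    norm = math.sqrt(sum(z * z for z in zs))
    return [z / norm for z in zs]


# Hypothetical counts for two subgraph types over four randomized networks.
real = [16, 7]
randomized = [[10, 12, 8, 10], [10, 10, 12, 8]]
zs = z_scores(real, randomized)
sp = significance_profile(zs)
print(zs, sp)
```

A positive entry marks a subgraph that is overrepresented relative to the null model, and a negative entry marks an underrepresented one.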
• Also – generation of random graphs is an issue:
– Random graphs with the same in- and out-degree distribution as the data are constructed
– But this might not be the best network null model
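One common way to build random graphs with the same in- and out-degrees as the data is repeated edge switching. The sketch below is an illustration under that assumption and is not necessarily the procedure used in the cited work. The function name, the toy edge list, and the number of swap attempts are all hypothetical.

```python
import random
from collections import Counter


def randomize_directed(edges, n_swaps, seed=None):
    """Degree-preserving randomization of a simple directed graph.

    Repeatedly picks two edges a->b and c->d and rewires them to a->d and
    c->b, skipping any swap that would create a self-loop or duplicate edge.
    Every node keeps its in-degree and its out-degree.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):  # n_swaps counts attempts, not successful swaps
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard((a, b))
        edge_set.discard((c, d))
        edge_set.add((a, d))
        edge_set.add((c, b))
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def in_out_degrees(edges):
    """Return (in-degree Counter, out-degree Counter) of a directed edge list."""
    return Counter(v for _, v in edges), Counter(u for u, _ in edges)


# Hypothetical toy directed network.
data_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
random_edges = randomize_directed(data_edges, n_swaps=100, seed=0)
print(random_edges)
```

Counting each subgraph in many such randomized networks gives the null distribution against which the real counts are compared.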
http://www.weizmann.ac.il/mcb/UriAlon/
2) Graphlets (Przulj, ’04-’09)
Different from network motifs:
• Induced subgraphs
• Of any frequency
N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
2.1) Relative Graphlet Frequency (RGF) distance between networks G and H:
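A minimal sketch of the RGF distance, assuming the definition in the cited Przulj, Corneil, and Jurisica (2004) paper: with Ni(G) the number of graphlets of type i in G and T(G) the total number of graphlets, Fi(G) = -log(Ni(G)/T(G)) and D(G,H) = sum over i of |Fi(G) - Fi(H)|. The count vectors below are hypothetical and have only three graphlet types rather than the 29 graphlets on 3 to 5 nodes. The natural logarithm and the requirement that all counts be positive are assumptions of this sketch.

```python
import math


def rgf_distance(counts_g, counts_h):
    """Relative graphlet frequency distance between two networks.

    counts_g[i], counts_h[i]: number of graphlets of type i in G and in H.
    Assumes every count is positive (log(0) is undefined).
    """
    if len(counts_g) != len(counts_h):
        raise ValueError("count vectors must have the same length")
    total_g = sum(counts_g)
    total_h = sum(counts_h)
    distance = 0.0
    for n_g, n_h in zip(counts_g, counts_h):
        f_g = -math.log(n_g / total_g)  # F_i(G)
        f_h = -math.log(n_h / total_h)  # F_i(H)
        distance += abs(f_g - f_h)
    return distance


# Hypothetical graphlet counts for three graphlet types.
counts_g = [10, 30, 60]
counts_h = [20, 30, 50]
d_gh = rgf_distance(counts_g, counts_h)
print(d_gh)
```

Because only relative frequencies enter the formula, two networks whose graphlet counts differ by a constant factor are at distance 0.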
2.2) Graphlet Degree Distributions
Generalize node degree
N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.
T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.
Network structure vs. biological function & disease
Graphlet Degree (GD) vectors, or “node signatures”
Similarity measure between “node signature” vectors
Signature Similarity Measure between nodes u and v
Later we will see how to use this and other techniques to link network structure with biological function.
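A sketch of the signature similarity between two nodes u and v, assuming the form given in the cited Milenkovic and Przulj (2008) paper: Di(u,v) = wi * |log(ui + 1) - log(vi + 1)| / log(max(ui, vi) + 2), D(u,v) = (sum of Di) / (sum of wi), and S(u,v) = 1 - D(u,v), where ui is the number of times node u touches orbit i. The orbit weights wi are taken as an input here. The uniform default weights and the three-entry signatures are hypothetical simplifications (a full signature has 73 entries).

```python
import math


def signature_similarity(sig_u, sig_v, weights=None):
    """Similarity in [0, 1] between two graphlet degree vectors."""
    if len(sig_u) != len(sig_v):
        raise ValueError("signatures must have the same length")
    if weights is None:
        weights = [1.0] * len(sig_u)  # assumption: uniform orbit weights
    total = 0.0
    for u_i, v_i, w_i in zip(sig_u, sig_v, weights):
        diff = abs(math.log(u_i + 1) - math.log(v_i + 1))
        total += w_i * diff / math.log(max(u_i, v_i) + 2)
    return 1.0 - total / sum(weights)


# Hypothetical three-orbit signatures for two nodes.
sig_u = [3, 1, 0]
sig_v = [3, 0, 2]
s_uv = signature_similarity(sig_u, sig_v)
print(s_uv)
```

Identical signatures give similarity 1, and the measure is symmetric in u and v.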
Generalize Degree Distribution of a network
The degree distribution measures:
• the number of nodes “touching” k edges for each value of k.
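As a small illustration, the degree distribution of a network stored as adjacency sets can be tallied as follows. The toy graph is hypothetical.

```python
from collections import Counter


def degree_distribution(adj):
    """Counter {k: number of nodes touching exactly k edges}."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical toy graph: triangle A-B-C plus pendant node D attached to C.
toy_adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
deg_dist = degree_distribution(toy_adj)  # one node of degree 1, two of degree 2, one of degree 3
print(dict(deg_dist))
```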
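The per-orbit distance between the normalized graphlet degree distributions of G and H is divided by sqrt(2) (to make it between 0 and 1).
The resulting measure is called the Graphlet Degree Distribution (GDD) Agreement between networks G and H.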
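A sketch of the GDD agreement, assuming the construction in the cited Przulj (2007) paper: for each orbit j, the number of nodes touching the orbit k times is scaled by 1/k and normalized to sum to 1; the Euclidean distance between the normalized distributions of G and H is divided by sqrt(2); the agreement for orbit j is 1 minus that distance; and the overall agreement is the arithmetic mean over orbits. The input format (one list of per-node orbit counts per orbit) and the example values are hypothetical, and counting the orbits themselves is not shown.

```python
import math
from collections import Counter


def normalized_distribution(orbit_degrees):
    """Scale the count of nodes with orbit degree k by 1/k, then normalize.

    Assumes at least one node touches the orbit (k > 0).
    """
    counts = Counter(k for k in orbit_degrees if k > 0)
    scaled = {k: c / k for k, c in counts.items()}
    total = sum(scaled.values())
    return {k: s / total for k, s in scaled.items()}


def orbit_agreement(degrees_g, degrees_h):
    """Agreement for one orbit: 1 - (Euclidean distance / sqrt(2))."""
    n_g = normalized_distribution(degrees_g)
    n_h = normalized_distribution(degrees_h)
    ks = set(n_g) | set(n_h)
    squared = sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in ks)
    return 1.0 - math.sqrt(squared) / math.sqrt(2)


def gdd_agreement(orbits_g, orbits_h):
    """Arithmetic mean of the per-orbit agreements."""
    scores = [orbit_agreement(g, h) for g, h in zip(orbits_g, orbits_h)]
    return sum(scores) / len(scores)


# Hypothetical example using only orbit 0 (plain node degree) of two toy graphs.
orbits_g = [[2, 2, 3, 1]]  # degrees in a triangle with a pendant node
orbits_h = [[1, 2, 2, 1]]  # degrees in a path on four nodes
agreement = gdd_agreement(orbits_g, orbits_h)
print(agreement)
```

Identical distributions give agreement 1, and distributions with disjoint support give agreement 0.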
Software that implements many of these network properties and compares networks with respect to them: GraphCrunch
http://www.ics.uci.edu/~bio-nets/graphcrunch/
Network models
Network                                                     | Degree distribution | Clustering coefficient | Diameter
Real-world (e.g., PPI) networks                             | Power-law           | High                   | Small
Erdos-Renyi graphs                                          | Poisson             | Low                    | Small
Random graphs with the same degree distribution as the data | Power-law           | Low                    | Small
Small-world networks                                        | Poisson             | High                   | Small
Scale-free networks                                         | Power-law           | Low                    | Small
Geometric random graphs                                     | Poisson             | High                   | Small
Stickiness network model                                    | Power-law           | High                   | Small
Network models
Geometric Gene Duplication and Mutation Networks
• Intuitive “geometricity” of PPI networks:
– Genes exist in some bio-chemical space
– Gene duplications and mutations
– Natural selection = “evolutionary optimization”
N. Przulj, O. Kuchaiev, A. Stevanovic, and W. Hayes “Geometric Evolutionary Dynamics of Protein Interaction Network”, Pacific Symposium on Biocomputing (PSB’10), Hawaii, 2010.
Network models
Stickiness-index-based model (“STICKY”)
N. Przulj and D. Higham “Modelling protein-protein interaction networks via a stickiness index”, Journal of the Royal Society Interface 3, pp. 711-716, 2006.