Comparison of Networks Across Species, CS374 Presentation, October 26, 2006, Chuan Sheng Foo
![Page 2: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/2.jpg)
Comparison of Networks Across Species
CS374 Presentation, October 26, 2006
Chuan Sheng Foo
![Page 3: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/3.jpg)
In the beginning there was DNA…
Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides, NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D332-334
![Page 4: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/4.jpg)
…then came protein interactions
Arabidopsis PPI network
E. coli PPI network
Yeast PPI network
![Page 5: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/5.jpg)
Comparative Genomics to Comparative Interactomics
• Evolutionary conservation implies functional relevance
• Sequence conservation implies functional conservation
• Network conservation implies functional conservation too!
What new insights might we gain from network comparisons? (Why should we care?)
![Page 6: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/6.jpg)
Network comparisons allow us to:
• Identify conserved functional modules
• Query for a module, à la BLAST
• Predict functions of a module
• Predict protein functions
• Validate protein interactions
• Predict protein interactions
Only possible with network comparisons
Possible with existing techniques, but improved with network comparisons
![Page 7: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/7.jpg)
What is a Protein Interaction Network?
• Proteins are nodes
• Interactions are edges
• Edges may have weights
Yeast PPI network
H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
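As a minimal sketch of the node/edge/weight picture above, an undirected weighted graph can be stored as a dictionary of dictionaries. The class name `PPINetwork`, its methods, and the protein labels are hypothetical and chosen only for illustration; they do not come from either paper.

```python
from collections import defaultdict


class PPINetwork:
    """Undirected protein interaction graph with optional edge weights."""

    def __init__(self):
        # adj[u][v] holds the weight of the interaction between u and v.
        self.adj = defaultdict(dict)

    def add_interaction(self, u, v, weight=1.0):
        # Interactions are symmetric, so store the edge in both directions.
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def neighbors(self, u):
        # .get avoids creating an empty entry for unknown proteins.
        return set(self.adj.get(u, {}))

    def weight(self, u, v):
        # A missing edge is reported as weight 0.0.
        return self.adj.get(u, {}).get(v, 0.0)


net = PPINetwork()
net.add_interaction("proteinA", "proteinB", weight=0.9)
net.add_interaction("proteinB", "proteinC")  # unweighted edge defaults to 1.0
```

A dictionary of dictionaries keeps neighbor lookups and weight lookups constant time, which is what the seed-and-extend algorithms later in the talk rely on.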
![Page 8: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/8.jpg)
The Network Alignment Problem
Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks
Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)
![Page 9: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/9.jpg)
Example Network Alignment
Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
![Page 10: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/10.jpg)
General Framework For Network Alignment Algorithms
Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
Network construction
Scoring function
Alignment algorithm
Covered in lecture on network integration
![Page 11: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/11.jpg)
Two Algorithms Discussed Today
NetworkBLAST: Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.
Græmlin: Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: 1169-1181, 2006.
![Page 12: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/12.jpg)
Overview of NetworkBLAST
Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.
![Page 13: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/13.jpg)
Estimation of Interaction Probabilities
In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features:
1. Number of times an interaction was observed
2. Pearson correlation coefficient between expression profiles
3. Proteins’ small world clustering coefficient
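A logistic regression over the three features listed above might look like the following sketch. The function name, the coefficient values and the bias are placeholders invented for illustration; the fitted values used by Sharan et al. are not given in the slides.

```python
import math


def interaction_reliability(n_observations, expression_corr, clustering_coeff,
                            weights=(1.0, 1.0, 1.0), bias=-2.0):
    """Return a reliability score in (0, 1) for one interaction.

    The coefficients in `weights` and `bias` are illustrative placeholders,
    not fitted values from the paper.
    """
    w_obs, w_corr, w_clust = weights
    z = (bias
         + w_obs * n_observations
         + w_corr * expression_corr
         + w_clust * clustering_coeff)
    # Logistic (sigmoid) link maps the linear score to a probability.
    return 1.0 / (1.0 + math.exp(-z))


# An interaction seen three times with correlated expression scores higher
# than one seen once with uncorrelated expression.
strong = interaction_reliability(3, 0.8, 0.5)
weak = interaction_reliability(1, 0.0, 0.1)
```

In practice the coefficients would be fitted on sets of trusted and random interactions; here they only show how the three features combine.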
![Page 14: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/14.jpg)
Network Alignment Graphs
Construct a Network Alignment Graph to represent the alignment
Nodes contain groups of sequence similar proteins from the k organisms
Edges represent conserved interactions. An edge between two nodes is present if:
1. One pair of proteins directly interacts; the rest are at distance at most 2
2. All protein pairs are at distance exactly 2
3. At least max(2, k – 1) protein pairs directly interact
Tries to account for interaction deletions
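The three edge conditions can be checked mechanically once the shortest-path distance between the two proteins is known in each species. The sketch below assumes those distances have already been computed (for example by breadth-first search) and that the two proteins in each species are distinct; the function name `has_conserved_edge` is hypothetical.

```python
import math


def has_conserved_edge(distances):
    """Decide whether two alignment-graph nodes are joined by an edge.

    distances[i] is the shortest-path distance between the two proteins
    of species i in that species' PPI network (math.inf if unreachable).
    """
    k = len(distances)
    direct = sum(1 for d in distances if d == 1)

    # Condition 1: one pair interacts directly, the rest are within distance 2.
    cond1 = direct >= 1 and all(d <= 2 for d in distances)
    # Condition 2: every pair is at distance exactly 2.
    cond2 = all(d == 2 for d in distances)
    # Condition 3: at least max(2, k - 1) pairs interact directly.
    cond3 = direct >= max(2, k - 1)
    return cond1 or cond2 or cond3


# Three species: direct in two of them, unreachable in the third.
print(has_conserved_edge([1, 1, math.inf]))
```

Allowing distance 2 is what lets the alignment graph tolerate an interaction that is missing in one species.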
![Page 15: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/15.jpg)
Example Network Alignment Graph
[Figure: individual species' PPI networks for Species X (proteins a, b, c), Species Y (a', b', c') and Species Z (a'', b'', c''), shown next to the network alignment graph built from them.]
![Page 16: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/16.jpg)
Scoring Function
Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure
High scoring subgraphs correspond to structured sub-networks (cliques or pathways)
Only network topology is scored; node similarity is not
![Page 17: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/17.jpg)
Log Likelihood Ratio Model
Compares the likelihood that a subgraph occurs under a conserved network model with the likelihood that it occurs in a randomly constructed network
Randomly constructed network preserves degree distribution for nodes
$$\log \frac{\Pr(\text{Subgraph occurs} \mid \text{Conserved Network})}{\Pr(\text{Subgraph occurs} \mid \text{Random Network})}$$
![Page 18: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/18.jpg)
Likelihood Ratio Scoring of a Protein Complex in a Single Species
U : a subset of vertices (proteins) in the PPI graph
O_U : collection of all observations on vertex pairs in U
O_uv : interaction between proteins u, v observed
M_s : conserved network model
M_n : random network (null) model
T_uv : proteins u, v interact
F_uv : proteins u, v do not interact
β : probability that proteins u, v interact in conserved model
p_uv : probability that edge u, v exists in a random model
Probability of complex being observed in a conserved network model
Probability of subgraph being observed in a random network model
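The equations on this slide did not survive extraction. Written with the symbols defined above, and assuming that observations on different protein pairs are independent (an assumption of this sketch, in line with the likelihood model of Sharan et al.), the two probabilities would take the following form:

```latex
\Pr(O_U \mid M_s) = \prod_{(u,v) \in U \times U}
  \left[ \beta \, \Pr(O_{uv} \mid T_{uv}) + (1 - \beta) \, \Pr(O_{uv} \mid F_{uv}) \right]

\Pr(O_U \mid M_n) = \prod_{(u,v) \in U \times U}
  \left[ p_{uv} \, \Pr(O_{uv} \mid T_{uv}) + (1 - p_{uv}) \, \Pr(O_{uv} \mid F_{uv}) \right]
```

The only difference between the two models is the prior on an interaction: a fixed high value β in the conserved model, and the degree-dependent p_uv in the random model.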
![Page 19: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/19.jpg)
Likelihood Ratio Scoring of a Protein Complex in a Single Species
Hence, log likelihood for a complex occurring in a single species is given by
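The formula itself is missing from the extracted slide. Taking the log of the ratio between the conserved-model and random-model probabilities, and assuming independence across protein pairs (an assumption of this reconstruction), the score of a vertex set U would be:

```latex
L(U) = \log \frac{\Pr(O_U \mid M_s)}{\Pr(O_U \mid M_n)}
     = \sum_{(u,v) \in U \times U}
       \log \frac{\beta \, \Pr(O_{uv} \mid T_{uv}) + (1 - \beta) \, \Pr(O_{uv} \mid F_{uv})}
                 {p_{uv} \, \Pr(O_{uv} \mid T_{uv}) + (1 - p_{uv}) \, \Pr(O_{uv} \mid F_{uv})}
```

A pair contributes a positive term when the observations are better explained by the conserved model than by the null model.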
For multiple complexes across different species, it is the sum of the log likelihoods
L(A, B, C) = L(A) + L(B) + L(C)
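As a rough sketch of how such a score could be computed, the function below sums a log likelihood ratio over all protein pairs in a complex, using the symbols β and p_uv from the definitions above. The argument names (`pr_obs_given_true`, `pr_obs_given_false`, `p_random`) and the toy numbers are hypothetical, and pairwise independence is assumed.

```python
import math
from itertools import combinations


def complex_log_likelihood(proteins, pr_obs_given_true, pr_obs_given_false,
                           beta, p_random):
    """Log likelihood ratio of one complex in one species.

    Each dictionary is keyed by a sorted protein pair (u, v):
      pr_obs_given_true[(u, v)]  ~ Pr(O_uv | T_uv)
      pr_obs_given_false[(u, v)] ~ Pr(O_uv | F_uv)
      p_random[(u, v)]           ~ p_uv in the random model
    """
    score = 0.0
    for pair in combinations(sorted(proteins), 2):
        t = pr_obs_given_true[pair]
        f = pr_obs_given_false[pair]
        p = p_random[pair]
        conserved = beta * t + (1 - beta) * f   # conserved network model
        null = p * t + (1 - p) * f              # random network model
        score += math.log(conserved / null)
    return score


def multi_species_score(per_species_scores):
    """L(A, B, C) = L(A) + L(B) + L(C): scores add across species."""
    return sum(per_species_scores)


pair = ("a", "b")
score_x = complex_log_likelihood({"a", "b"}, {pair: 0.9}, {pair: 0.1},
                                 beta=0.9, p_random={pair: 0.1})
total = multi_species_score([score_x, score_x, score_x])
```

Because the score is a sum of logs, adding a protein to a complex only requires adding the terms for its new pairs, which keeps greedy search cheap.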
![Page 20: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/20.jpg)
Example of Complex Scoring
[Figure: conserved complex A in the network alignment graph, corresponding to complex X1 in Species X, complex Y1 in Species Y and complex Z1 in Species Z in the individual species' PPI networks.]
L(A) = L(X1) + L(Y1) + L(Z1)
![Page 21: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/21.jpg)
Alignment algorithm
Problem of identifying conserved sub-networks reduces to finding high scoring subgraphs
• NP-complete problem
• Heuristic solution:
  • Greedy extension of high scoring seeds (Does this sound familiar? BLAST?)
  • Common to both papers discussed
![Page 22: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/22.jpg)
Alignment algorithm
1. Find seeds for each node v in the alignment graph
a. Find high scoring paths of 4 nodes by exhaustive search
b. Greedily add 3 other nodes one by one, that maximally increase the score of the seed
![Page 23: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/23.jpg)
Alignment algorithm
2. Iteratively add or remove nodes to increase the overall score of the subgraph
• Original seeds are preserved
• Limit size of discovered subgraphs to 15 nodes
• Record up to 4 highest scoring subgraphs discovered around each node
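The add-or-remove refinement in step 2 can be sketched as a simple local search. This is an illustration under stated assumptions, not the NetworkBLAST implementation: `neighbors` and `score` are hypothetical callables supplied by the caller, and the sketch does not check that the subgraph stays connected after a removal.

```python
def local_search(seed, neighbors, score, max_size=15):
    """Refine a seed by the single add/remove move that most improves score.

    seed      : iterable of nodes that must stay in the result
    neighbors : function node -> iterable of adjacent nodes
    score     : function set_of_nodes -> float
    """
    seed = frozenset(seed)
    current = set(seed)
    best = score(current)
    while True:
        candidates = []
        if len(current) < max_size:
            # Nodes adjacent to the current subgraph may be added.
            frontier = set().union(*(neighbors(v) for v in current)) - current
            candidates += [current | {v} for v in frontier]
        # Any node outside the original seed may be removed.
        candidates += [current - {v} for v in current if v not in seed]
        if not candidates:
            return current, best
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best:
            return current, best  # no move improves the score
        current, best = top, top_score


# Toy example: a path 1-2-3-4-5 where the score is the sum of node values.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
values = {1: 1, 2: 2, 3: -1, 4: 5, 5: -3}
result = local_search({1}, lambda v: path[v], lambda s: sum(values[v] for v in s))
```

In the toy run the search stops at {1, 2} because adding node 3 lowers the score, even though node 4 beyond it is valuable. That is the usual price of a greedy heuristic.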
![Page 24: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/24.jpg)
Alignment algorithm
3. Filter subgraphs with a high degree of overlap
Iteratively find high scoring subgraph and remove all highly overlapping ones remaining
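The overlap filter in step 3 can be sketched as follows. The overlap measure (shared nodes divided by the size of the smaller subgraph) and the 0.8 cutoff are assumptions made for this illustration, and the function name is hypothetical.

```python
def filter_overlapping(subgraphs, max_overlap=0.8):
    """Keep high scoring subgraphs and drop those that overlap a kept one.

    subgraphs   : list of (score, non-empty set_of_nodes)
    max_overlap : assumed cutoff on |A & B| / min(|A|, |B|)
    """
    kept = []
    # Visit subgraphs from highest to lowest score.
    for score, nodes in sorted(subgraphs, key=lambda s: s[0], reverse=True):
        overlaps = (len(nodes & other) / min(len(nodes), len(other))
                    for _, other in kept)
        if all(o <= max_overlap for o in overlaps):
            kept.append((score, nodes))
    return kept


found = [(4.0, {1, 2, 3, 4}), (5.0, {1, 2, 3}), (3.0, {7, 8})]
unique = filter_overlapping(found)
```

Processing in score order guarantees that whenever two subgraphs overlap heavily, the higher scoring one is the survivor.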
![Page 25: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/25.jpg)
Results
Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks.
![Page 26: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/26.jpg)
Results
Prediction of protein function
• ‘Guilt by association’
• If a conserved cluster or path is significantly enriched in a functional annotation
Prediction of protein interactions
Predictions based on 2 strategies:
• Evidence that proteins with similar sequences interact
• Co-occurrence of proteins in the same conserved cluster or path
• Experimental verification of Yeast interactions using Y2H yielded 40-62% success rate
![Page 27: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/27.jpg)
Overview of Græmlin
• Fast, scalable network alignment
  • Scales linearly in number of networks compared
  • NetworkBLAST scales exponentially
• Supports efficient querying of modules
• Speed-sensitivity control via user-defined parameter
  • Not supported in NetworkBLAST
![Page 28: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/28.jpg)
Input to the Algorithm
• Weighted protein interaction graphs
  • Weights represent probability that proteins interact
  • Constructed via network integration algorithm covered in a later lecture
• A phylogenetic tree relating the species in the desired alignment
  • Used for progressive alignment
![Page 29: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/29.jpg)
Definition of an alignment
A set of subgraphs chosen from the interaction networks of different species, together with a mapping between aligned proteins
• Aligned proteins form equivalence classes
  • Each class was derived from a common ancestral protein
  • Can contain multiple proteins from the same species
a a’ a’’ b’’
Equivalence class showing paralogs
![Page 30: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/30.jpg)
Scoring Function
• Log likelihood ratio model based on
  • Alignment model M: modules are subject to evolutionary constraint
  • Random model R: modules are not subject to any constraints
• Scores equivalence classes and alignment edges separately
![Page 31: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/31.jpg)
Log Likelihood Ratio Model (Recap)
• Measures the likelihood that a module occurs if it is subject to evolutionary constraint vs. the likelihood that it occurs in a randomly constructed network
• Randomly constructed network preserves degree distribution for nodes
$$\log \frac{\Pr(\text{Module occurs} \mid \text{Alignment Model } M)}{\Pr(\text{Module occurs} \mid \text{Random Model } R)}$$
![Page 32: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/32.jpg)
Scoring Equivalence Classes
Reconstruct most parsimonious ancestral history of an equivalence class using Dynamic Programming based on five types of evolutionary events
Alignment model and random model give probabilities for each of these events, combined to give a log likelihood score
![Page 33: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/33.jpg)
Scoring Alignment Edges
Alignment scores should reflect both network conservation and high connectivity – difficult to strike a balance
• Introduction of a novel scoring approach
  • Edge Scoring Matrix: indexed by labels
  • Algorithm assigns a label to each equivalence class, scores according to distribution function in cells referenced by labels
![Page 34: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/34.jpg)
Scoring: ESM
![Page 35: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/35.jpg)
Alignment Algorithm: d-Clusters for Seed Generation
• A d-cluster consists of d proteins close together in a network
• “Close” means edge weights are high, so interaction is highly likely
• Intuition is that high scoring alignments will have high scoring d-clusters
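One way to turn "close together" into code is to treat the negative log of an edge's interaction probability as its length and collect the d nearest proteins with Dijkstra's algorithm. That distance definition is an assumption of this sketch, and the function name and toy graph are hypothetical.

```python
import heapq
import math


def d_cluster(graph, start, d):
    """Return up to d proteins nearest to `start` (including `start`).

    graph : dict node -> dict neighbor -> interaction probability in (0, 1]
    Edge length is taken to be -log(probability), so a path's length is
    the negative log of the product of its edge probabilities.
    """
    dist = {start: 0.0}
    cluster, done = [], set()
    heap = [(0.0, start)]
    while heap and len(cluster) < d:
        d_cur, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        cluster.append(node)
        for nbr, prob in graph.get(node, {}).items():
            nd = d_cur - math.log(prob)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return cluster


toy = {
    "a": {"b": 0.9, "c": 0.5},
    "b": {"a": 0.9, "d": 0.9},
    "c": {"a": 0.5},
    "d": {"b": 0.9},
}
cluster = d_cluster(toy, "a", 3)
```

In the toy graph, protein d is reached through two reliable edges (0.9 × 0.9 = 0.81) and so counts as closer to a than the direct but unreliable neighbor c (0.5).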
![Page 36: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/36.jpg)
Alignment Algorithm: d-Clusters for Seed Generation
• Identify pairs of d-clusters that score higher than a threshold T
• Score is defined by greedily matching nodes from each d-cluster to obtain a high score
• Uses these pairs as seeds
• Allows for speed-sensitivity tradeoff
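The greedy matching score and the threshold test can be sketched as below. `pair_score` stands in for whatever node similarity score is used (for example a sequence similarity score); it, the function names and the rule of skipping non-positive pairs are assumptions of this illustration.

```python
def seed_score(cluster_a, cluster_b, pair_score):
    """Greedily match nodes across two d-clusters and sum the pair scores."""
    pairs = sorted(((pair_score(a, b), a, b)
                    for a in cluster_a for b in cluster_b),
                   key=lambda t: t[0], reverse=True)
    used_a, used_b, matches, total = set(), set(), [], 0.0
    for s, a, b in pairs:
        # Take the best remaining pair whose nodes are both still unmatched.
        if s > 0 and a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            matches.append((a, b))
            total += s
    return total, matches


def find_seeds(clusters_a, clusters_b, pair_score, threshold):
    """Return all pairs of d-clusters whose greedy score exceeds threshold T."""
    return [(ca, cb) for ca in clusters_a for cb in clusters_b
            if seed_score(ca, cb, pair_score)[0] > threshold]


sim = {("a1", "b1"): 5, ("a1", "b2"): 4, ("a2", "b1"): 3, ("a2", "b2"): 1}
total, matches = seed_score(["a1", "a2"], ["b1", "b2"], lambda a, b: sim[(a, b)])
```

Raising the threshold T yields fewer seeds and a faster but less sensitive search; lowering it does the opposite, which is the speed-sensitivity tradeoff mentioned above. Greedy matching is not optimal (here it scores 6 where the best matching scores 7), but it is cheap.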
![Page 37: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/37.jpg)
Alignment Algorithm: Generating An Initial Alignment From The Seed
• Determine highest scoring pair of nodes (one from each d-cluster) when aligned
• Align these nodes and place these nodes, as well as their neighbors, into a frontier
[Figure labels: 3.0, 1.5, 5.0]
![Page 38: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/38.jpg)
Alignment Algorithm: Greedy Seed Extension Phase
• Examine all pairs of nodes in frontier for the pair that maximally increases score when added to alignment
• Stops when no pair can further increase the score
• Remove equivalence classes if doing so can further increase the score
Frontier
Current alignment
![Page 39: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/39.jpg)
Alignment Algorithm: Multiple Alignment
• Progressive alignment technique using the phylogenetic tree
• Successively aligns closest pair of networks
• Places each aligned network at the parent node of the two aligned species
• Linear scaling in number of species
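Progressive alignment along a phylogenetic tree can be sketched as a bottom-up traversal. The nested-tuple tree encoding, the function name and the `align_pair` callable are hypothetical; `align_pair` stands in for a full pairwise network aligner, which is not implemented here.

```python
def progressive_align(tree, networks, align_pair):
    """Align networks bottom-up along a binary phylogenetic tree.

    tree       : a species name (leaf) or a pair (left_subtree, right_subtree)
    networks   : dict species name -> network object
    align_pair : function (network, network) -> aligned network
    """
    if isinstance(tree, str):
        return networks[tree]  # leaf: the species' own network
    left, right = tree
    # The result of aligning both children is placed at their parent node.
    return align_pair(progressive_align(left, networks, align_pair),
                      progressive_align(right, networks, align_pair))


# Toy run: "networks" are sets of protein labels and alignment is set union.
calls = []

def toy_align(a, b):
    calls.append((a, b))
    return a | b

toy_networks = {"X": {"a"}, "Y": {"a'"}, "Z": {"a''"}}
aligned = progressive_align((("X", "Y"), "Z"), toy_networks, toy_align)
```

A binary tree with k leaves has k - 1 internal nodes, so the traversal performs exactly k - 1 pairwise alignments, which is where the linear scaling in the number of species comes from.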
![Page 40: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/40.jpg)
Performance Comparison: Speed-sensitivity / Linear Scaling
![Page 41: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/41.jpg)
Performance Comparison: Multiple Alignment
![Page 42: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/42.jpg)
Performance Comparison: Module Querying
![Page 43: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/43.jpg)
Results
Functional module identification using network alignment
Functional module for transformation?
![Page 44: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/44.jpg)
Results
Functional annotation using network alignment
Pairwise alignment
Multiple alignment of 9 networks
Conserved DNA replication module
![Page 45: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/45.jpg)
Results
Multiple alignment of 10 networks showing possible cell division module
Functional annotation using network alignment
![Page 46: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/46.jpg)
The Future of Network Comparison
Græmlin
Græmlin?
Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
![Page 47: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/47.jpg)
That’s all folks!
Thank you!
Questions?
![Page 49: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/49.jpg)
Performance Comparison: Sensitivity
![Page 50: Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d3f5503460f94a182f7/html5/thumbnails/50.jpg)
Scoring Sequence Mutations
Weighted sum of pairs scoring