a double spectral approach to disease module identificationjjc2718.github.io/rsg_dream_talk.pdf ·...
TRANSCRIPT
![Page 1: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/1.jpg)
A Double Spectral Approach to Disease Module Identification
Jake Crawford, Junyuan Lin, Xiaozhe Hu, Benjamin Hescott, Donna Slonim, and Lenore Cowen
Tufts University
![Page 2: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/2.jpg)
Our Goal
• Identify module structure in biological networks, enriched for a collection of disease gene sets.
![Page 3: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/3.jpg)
PPI network: issue with defining local neighborhoods: “small-world” network where everything is nearby!
![Page 4: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/4.jpg)
Dream Network 1
![Page 5: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/5.jpg)
Dream Network 2
![Page 6: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/6.jpg)
Dream Network 3
![Page 7: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/7.jpg)
Dream Network 4
![Page 8: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/8.jpg)
Dream Network 5
![Page 9: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/9.jpg)
Dream Network 6
![Page 10: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/10.jpg)
Our Approach
Cluster the resulting distance matrix
Correct cluster sizesLook for bipartite
subgraphs
Return valid clusters
DSDDiffusion State Distance
![Page 11: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/11.jpg)
Our Approach
Cluster the resulting distance matrix
Correct cluster sizesLook for bipartite
subgraphs
Return valid clusters
DSDDiffusion State Distance
![Page 12: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/12.jpg)
Our insight: not all short paths give equal indications of similarity
Define He (A,B) to be the expected number of times a k-step random walk starting at A reaches B (including 0-length walk)
k
![Page 13: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/13.jpg)
DSD: A Spectral Distance Metric
• Hek (A) = (Hek (A, v1), Hek (A, v2), …, Hek (A, vn))
• DSDk (A, B) = | Hek (A) - Hek (B) |1
• We prove DSD is a metric, and converges as k→∞ : call the converged version DSD (A, B) .
![Page 14: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/14.jpg)
DSD spreads out distances
![Page 15: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/15.jpg)
Our Approach
Cluster the resulting distance matrix
Correct cluster sizesLook for bipartite
subgraphs
Return valid clusters
DSDDiffusion State Distance
![Page 16: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/16.jpg)
Speeding Up DSD
For an ergodic network, the DSD distance between two nodes u and v can also be written as
where bi is the ith basis vector, I is the identity matrix, P is the transition matrix of the network, and W is a matrix where each row is a copy of πT = the steady state distribution of the network.
(proof of this in Cao et al. 2013)
![Page 17: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/17.jpg)
Speeding Up DSD
• Computing the matrix inverse (I - P + W)-1 is the most computationally expensive part
• We developed an efficient algorithm to reduce the amortized time for sparse networks, using algebraic multigrid methods
![Page 18: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/18.jpg)
The method only computes approximate DSD; some artifacts, good enough
![Page 19: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/19.jpg)
Our Approach
Cluster the resulting distance matrix
Correct cluster sizesLook for bipartite
subgraphs
Return valid clusters
DSDDiffusion State Distance
![Page 20: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/20.jpg)
Genetic interactions (epistasis)
• For non-essential genes, we can compare the growth of the double knockout to its component single knockouts
Picture: Ulitsky
![Page 21: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/21.jpg)
Finding Bipartite Subgraphs
• “Between Pathway Model” (Kelley and Ideker, 2005)
• Regions having many negative genetic interactions between them (and physical interactions within regions)
• May indicate compensatory biological pathways
• Use Brady et. al 2009 definition; based only on negative genetic interactions (ignore physical interactions)
![Page 22: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/22.jpg)
What is the Quality of a BPM?
Once we obtain a candidate BPM we can score it using
interaction data.
Sum interactions within
Sum interactions between
Take the difference and normalize to create an
interaction score
-0.664347
0.553838
-7.321556
-6.315511
3.685398
-5.252571
-3.365368
3.236723
-1.366879
2.13473
0.13342
Look for a collection of BPMs in half the networks
![Page 23: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/23.jpg)
Our Approach
Cluster the resulting distance matrix
Correct cluster sizesLook for bipartite
subgraphs
Return valid clusters
DSDDiffusion State Distance
![Page 24: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/24.jpg)
Subchallenge 1, Step 1: Get the DSD matrix for each network
![Page 25: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/25.jpg)
What about the directed network?
• We wanted to preserve edge direction, but keep the network (strongly) connected to run DSD in a sensible way
• Add low-weight back edges if none exists already
• Gave better results than treating all edges as undirected
u v
0.8
u v
0.8
u v
0.8
0.001
![Page 26: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/26.jpg)
We have a distance matrix. Now what?
• Many pre-existing methods to cluster a pairwise distance (or similarity) matrix
• Spectral clustering performed the best in the leaderboard rounds
![Page 27: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/27.jpg)
Spectral Clustering
• Convert distance matrix to similarity matrix using radial basis function (RBF) kernel (high distance -> low similarity and vice-versa)
• Dimension reduction + k-means clustering (used default scikit-learn implementation)
• Tested different values of k in the training rounds
Pedregosa et al, JMLR 12:2825-2830 (2011)
![Page 28: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/28.jpg)
Network 1 - Choosing k
Submission NS (train) NS for different FDR (1%, 2.5%, 10%) NS (test)
k = 1000 14 (8, 10, 22) 16
k = 1200 14 (3, 7, 19) N/A
k = 500, altered weighting scheme 14 (5, 9, 23) N/A
k = 500 12 (3, 7, 20) N/A
![Page 29: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/29.jpg)
Network 5 - Choosing k
Submission NS (train) NS for different FDR (1%, 2.5%, 10%) NS (test)
GC and k=100 overlap 6 (1, 2, 8) N/A
GC and k=200 overlap 5 (3, 3, 7) 5
k=100 5 (1, 3, 6) N/A
![Page 30: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/30.jpg)
So far we have:
Cluster the resulting distance matrix
Correct cluster sizesLook for bipartite
subgraphs
Return valid clusters
DSDDiffusion State Distance
![Page 31: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/31.jpg)
• Small clusters (size < 3): throw out
• Large clusters (size > 100): partition recursively (again using spectral clustering on the DSD matrix), until all clusters are of size < 100
• In practice, splitting large clusters improved most of the results we tested, and only hurt one result (out of ~15 comparisons)
Correcting Cluster Sizes
![Page 32: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/32.jpg)
• For networks 1, 4, and 6, we’re done.
• For networks 2, 3, and 5, we tried one more trick…looking for BPMs!
![Page 33: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/33.jpg)
Finding Bipartite Subgraphs
• We can identify many candidates by looking only at negative genetic interactions (Brady et al., 2009)
• We suspected there might be bipartite subgraphs in networks 2, 3, and 5
• We used the Genecentric software package to identify bipartite substructure
![Page 34: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/34.jpg)
Clusters from bipartite subgraphs
Clusters from spectral clustering
Single set of final clusters
But, we’re not quite finished
(for networks 2, 3, and 5)
![Page 35: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/35.jpg)
Merging Clusters: Approach 1 (greedy)
1
1 2 3
4 5
2 3 4
5 6
7 8
1 2 3 4 5 7 8
9
9
Spectral clusters Genecentric clusters
![Page 36: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/36.jpg)
Merging Clusters: Approach 2 (overlap)
1 1 2 3
4 5
2 3 4
5 6
7 8
1 2 3 7 8
6 7 8
1
2
3
1
2
3
4 5 6
(1, 1) (1, 1) (1, 1) (2, 2)(1, 2) (2, 3) (3, 3) (3, 3)
9 9
9
(3, 3)
Spectral clusters Genecentric clusters
![Page 37: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/37.jpg)
Looking back:
Cluster the resulting distance matrix
Correct cluster sizesLook for bipartite
subgraphs
Return valid clusters
DSDDiffusion State Distance
![Page 38: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/38.jpg)
Subchallenge 2
• Took union of networks
• For overlapping edges with weights w1, w2, create edge with weight max(w1, w2) + 0.05
• Same DSD + spectral clustering approach as before
• Union of networks 1, 2, and 4 performed the best across FDR levels
![Page 39: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/39.jpg)
Subchallenge 2: Step 0: Superimpose networks 1,2 and 4
Step 1: Get the DSD matrix for the combined network
![Page 40: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/40.jpg)
Future Directions
• Test modifications to edge weights
• Modify other clustering algorithms to run on top of DSD, there are many possible alternatives
• We could also test additional methods of combining edges (or adding networks) for Subchallenge 2
![Page 41: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/41.jpg)
Acknowledgements
Thanks to the Tufts Bioinformatics and Computational Biology group
for helpful ideas and critiques
![Page 42: A Double Spectral Approach to Disease Module Identificationjjc2718.github.io/rsg_dream_talk.pdf · Finding Bipartite Subgraphs • “Between Pathway Model” (Kelley and Ideker,](https://reader030.vdocuments.net/reader030/viewer/2022041115/5f24a81c9c7fa651c970bef0/html5/thumbnails/42.jpg)
Looking back
Cluster the resulting distance matrix(spectral clustering)
Correct cluster sizes(recursive spectral
clustering)
Look for bipartite subgraphs (Genecentric)
Return valid clusters (putative disease modules)
Define a network distance measure (DSD)