center-piece subgraphs: problem definition and fast solutions hanghang tong christos faloutsos...
TRANSCRIPT
![Page 1: Center-Piece Subgraphs: Problem definition and Fast Solutions Hanghang Tong Christos Faloutsos Carnegie Mellon University](https://reader035.vdocuments.net/reader035/viewer/2022062515/56649c755503460f9492944e/html5/thumbnails/1.jpg)
Center-Piece Subgraphs: Problem definition and
Fast Solutions
Hanghang Tong
Christos Faloutsos
Carnegie Mellon University
Center-Piece Subgraph (Ceps)
• Given Q query nodes
• Find Center-piece
• Input of Ceps
  – Q query nodes
  – Budget b
  – k, the softAND coefficient
• Applications
  – Social Network
  – Law Enforcement
  – Gene Network
  – …
[Figure: three panels, each showing query nodes A, B, and C; the last panel carries the label b]
Challenges in Ceps
• Q1: How to measure the importance?
• Q2: How to extract connection subgraph?
• Q3: How to do it efficiently?
Roadmap
• Ceps Overview
• Q1: Goodness Score Calculation
• Q2: Extract Alg.
• Q3: Efficiency Issue
• Experimental Results
• Conclusion
Ceps Overview
• Individual Score Calculation
  – Measure importance wrt individual query: $[r(i,j)]_{n \times Q}$
• Combine Individual Scores
  – Measure importance wrt query set: $[r(Q,j)]_{n \times 1}$
• “Extract” Alg.
  – … the connection subgraphs: $\arg\max_{H} g(H)$
[Figure: query nodes A, B, and C]
Roadmap
• Ceps Overview
• Q1: Goodness Score Calculation
• Q2: “Extract” Alg.
• Q3: Efficiency Issue
• Experimental Results
• Conclusion
RWR: Individual Score Calculation
• Goal
  – Individual importance score $r(i,j) = r_{i,j}$
  – For each node j wrt each query i
• How to
  – Random walk with restart
  – Steady State Prob.

$\vec{r} = (1-c)\,P\,\vec{r} + c\,\vec{e}$
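The steady-state equation above can be solved by simple iteration. The following is a minimal sketch, not the authors' implementation: it assumes P is the column-normalized adjacency matrix, c is the restart probability, and e is the indicator vector of the query node. The function name `rwr` and the default values are illustrative choices.

```python
import numpy as np

def rwr(adj, query, c=0.15, tol=1e-12, max_iter=1000):
    """Random walk with restart: iterate r = (1 - c) * P r + c * e."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    P = adj / col_sums                 # assumption: column-normalized transitions
    e = np.zeros(n)
    e[query] = 1.0                     # restart vector for this query node
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * (P @ r) + c * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Small path graph 0 - 1 - 2, with the walk restarting at node 0.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
scores = rwr(adj, query=0)
```

Running `rwr` once per query node and stacking the results column by column gives the n × Q individual score matrix. The same vector could also be obtained by solving the linear system (I − (1 − c)P) r = c e directly, which is practical only for small graphs.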
An Illustrating Example
[Figure: example graph with 13 nodes, numbered 1 to 13]
• Start from node 1
• Move randomly to a neighbor
• With some probability p, return to node 1
Score of node j = Prob(the random walk finally stays at j)
Individual Score Calculation
| Node | Q1 | Q2 | Q3 |
|---|---|---|---|
| Node 1 | 0.5767 | 0.0088 | 0.0088 |
| Node 2 | 0.1235 | 0.0076 | 0.0076 |
| Node 3 | 0.0283 | 0.0283 | 0.0283 |
| Node 4 | 0.0076 | 0.1235 | 0.0076 |
| Node 5 | 0.0088 | 0.5767 | 0.0088 |
| Node 6 | 0.0076 | 0.0076 | 0.1235 |
| Node 7 | 0.0088 | 0.0088 | 0.5767 |
| Node 8 | 0.0333 | 0.0024 | 0.1260 |
| Node 9 | 0.1260 | 0.0024 | 0.0333 |
| Node 10 | 0.1260 | 0.0333 | 0.0024 |
| Node 11 | 0.0333 | 0.1260 | 0.0024 |
| Node 12 | 0.0024 | 0.1260 | 0.0333 |
| Node 13 | 0.0024 | 0.0333 | 0.1260 |

[Figure: the 13-node example graph, each node annotated with its individual score for one query]
Individual Score Calculation
Individual Score matrix: the 13 × 3 matrix of r(i,j) values from the previous slide, with one row per node (Node 1 to Node 13) and one column per query (Q1, Q2, Q3).
AND: Combine Scores
• Q: How to combine scores?
• A: Multiply
• … = prob. that 3 random particles coincide on node j
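As a small illustration of the multiply rule, the sketch below takes rows of the individual score matrix and multiplies across the query columns. The helper name `and_score` is hypothetical, and the two rows are the Node 1 and Node 3 entries from the individual score table.

```python
import numpy as np

def and_score(R):
    """AND score per node: product of its individual scores over all queries."""
    return np.prod(np.asarray(R, dtype=float), axis=1)

# Rows: Node 1 and Node 3; columns: Q1, Q2, Q3 (values from the score table).
R = np.array([[0.5767, 0.0088, 0.0088],
              [0.0283, 0.0283, 0.0283]])
scores = and_score(R)
```

For Node 3 the product is 0.0283³, roughly 2.27e-5, which agrees with the 0.2267 (× 1e-4) reported for the AND query.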
K_SoftAnd: Combine Scores
Generalization – softAND:
We want nodes close to k of Q (k<Q) query nodes.
Q: How to do that?
A: Prob(at least k out of the Q random particles meet each other at node j)
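One way to compute this meeting probability is sketched below. It assumes the Q particles are independent, so the number of particles at node j follows a Poisson-binomial distribution. The function name `k_softand` is hypothetical and this is not necessarily the recursion used by the authors.

```python
def k_softand(probs, k):
    """Probability that at least k of the independent events in `probs` occur."""
    dist = [1.0]                       # dist[m] = P(exactly m events so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, w in enumerate(dist):
            nxt[m] += w * (1 - p)      # this particle is not at node j
            nxt[m + 1] += w * p        # this particle is at node j
        dist = nxt
    return sum(dist[k:])

# Node 3 of the example: individual score 0.0283 for each of Q1, Q2, Q3.
node3 = [0.0283, 0.0283, 0.0283]
and_like = k_softand(node3, 3)         # k = Q reduces to the AND product
soft2 = k_softand(node3, 2)            # 2_SoftAnd
or_like = k_softand(node3, 1)          # k = 1 behaves like an OR query
```

With k = Q the result is the plain product of the individual scores, and with k = 1 it is one minus the probability that no particle is at node j.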
K_SoftAnd: Relaxation of AND
Asking AND query? No Answer!
Disconnected Communities
Noise
AND query vs. K_SoftAnd query
[Figure: two panels on the 13-node example graph, each node annotated with its combined score. Left: And Query (scores × 1e-4). Right: 2_SoftAnd Query.]
1_SoftAnd query = OR query
[Figure: the 13-node example graph, each node annotated with its 1_SoftAnd (OR) score]
Measuring Importance
• Individual Scores: random walk with restart (Steady State Prob)
• Combining Scores: And, OR, K_SoftAnd (Meeting Prob)

Individual scores of Node 1 to Node 13 wrt Q1, Q2, Q3: the same 13 × 3 matrix as on the “Individual Score Calculation” slide.
And scores (× 1e-4), in slide order: 0.4505, 0.0710, 0.2267, 0.0710, 0.4505, 0.0710, 0.4505, 0.1010, 0.1010, 0.1010, 0.1010, 0.1010, 0.1010
2_SoftAnd scores, in slide order: 0.0103, 0.0019, 0.0103, 0.0019, 0.0103, 0.0019, 0.0024, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046
Roadmap
• Ceps Overview
• Q1: Goodness Score Calculation
• Q2: “Extract” Alg.
• Q3: Efficiency
• Experimental Results
• Conclusion
“Extract” Alg.
• Goal
  – Maximize total scores and
  – ‘Appropriate’ connections
• How to… “Extract” Alg.
  – Dynamic Programming
  – Greedy Alg.
    • Pick up promising node
    • Find ‘best’ path
[Figure: two example graphs, one with nodes 1 to 16 and one with nodes 1 to 13]
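The greedy idea (pick a promising node, then connect it to the query nodes) can be sketched as follows. This is a simplified illustration rather than the authors' “Extract” algorithm: it swaps the dynamic-programming search for the ‘best’ path with a plain breadth-first shortest path, and it assumes the budget counts only non-query nodes. All function names are hypothetical.

```python
from collections import deque

def shortest_path(graph, src, dst):
    """Breadth-first path from src to dst as a list of nodes, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def greedy_extract(graph, scores, queries, budget):
    """Add high-score nodes plus connecting paths until the budget is used."""
    chosen = set(queries)
    # Visit candidates from highest to lowest combined score.
    for node in sorted(scores, key=scores.get, reverse=True):
        if node in chosen:
            continue
        paths = [shortest_path(graph, q, node) for q in queries]
        if any(p is None for p in paths):
            continue                   # unreachable from some query node
        new_nodes = set().union(*paths) - chosen
        # Assumption: budget b limits the number of non-query nodes.
        if len(chosen) - len(queries) + len(new_nodes) > budget:
            continue
        chosen |= new_nodes
    return chosen

# Toy graph as adjacency lists; nodes 1, 2, 3 are the query nodes.
graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
scores = {0: 0.9, 1: 0.2, 2: 0.2, 3: 0.2, 4: 0.5, 5: 0.1}
subgraph_nodes = greedy_extract(graph, scores, [1, 2, 3], budget=1)
```

With a budget of 1, only the hub node 0 is added to the three query nodes; raising the budget to 2 also admits node 4.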
Roadmap
• Ceps Overview
• Q1: Goodness Score Calculation
• Q2: “Extract” Alg.
• Q3: Efficiency
• Experimental Results
• Conclusion
Graph Partition: Efficiency Issue
• Straightforward way
  – Solve Q linear systems, one per query node
  – Cost is linear in the number of edges
• Observation
  – Skewed distribution
• How to…
  – Graph partition
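A rough sketch of the partition idea follows: restrict the random walk to the partitions that contain query nodes, and treat every other node's score as zero. The partition labels are assumed to come from a graph partitioner run beforehand. The function name, the fixed iteration count, and the column normalization are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def rwr_on_query_partitions(adj, labels, queries, c=0.15, n_iter=200):
    """Run RWR only inside the partitions that contain at least one query."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    # Keep the nodes whose partition label matches some query's label.
    keep = np.flatnonzero(np.isin(labels, labels[queries]))
    sub = adj[np.ix_(keep, keep)]      # induced subgraph, cross edges dropped
    col_sums = sub.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    P = sub / col_sums                 # assumption: column-normalized transitions
    R = np.zeros((adj.shape[0], len(queries)))
    for col, q in enumerate(queries):
        e = (keep == q).astype(float)  # restart vector inside the subgraph
        r = e.copy()
        for _ in range(n_iter):
            r = (1 - c) * (P @ r) + c * e
        R[keep, col] = r               # nodes outside the subgraph stay at 0
    return R

# Two triangles joined by one edge (2, 3), split into partitions 0 and 1.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
labels = [0, 0, 0, 1, 1, 1]
R = rwr_on_query_partitions(adj, labels, [0])
```

The saving comes from iterating over a much smaller matrix; the price is that scores for nodes outside the kept partitions are approximated as zero, which is tolerable when the score distribution is skewed.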
Roadmap
• Ceps Overview
• Q1: Goodness Score Calculation
• Q2: “Extract” Alg.
• Q3: Efficiency Issue
• Experimental Results
• Conclusion
Experimental Setup
• Dataset
  – DBLP/authorship
  – Author-Paper
  – 315k nodes
  – 1,800k edges
• Evaluation Criteria
  – Important Node Ratio
  – Important Edge Ratio
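The slide does not spell out how the ratios are computed. The sketch below assumes that the node ratio is the share of the total combined score captured by the extracted nodes; that definition, the function name `node_ratio`, and the toy scores are all assumptions made for illustration.

```python
def node_ratio(scores, extracted):
    """Share of the total combined score that falls on the extracted nodes."""
    total = sum(scores.values())
    kept = sum(scores[j] for j in extracted)
    return kept / total if total else 0.0

# Hypothetical combined scores for three nodes; nodes 1 and 2 were extracted.
toy_scores = {1: 0.5, 2: 0.3, 3: 0.2}
ratio = node_ratio(toy_scores, {1, 2})
```

An edge ratio could be defined in the same way over edge scores, again as an assumption.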
Experimental Setup
• We want to check
  – Do the goodness criteria make sense?
  – Does the “Extract” alg. capture most of the important nodes/edges?
  – Efficiency
Case Study: AND query
[Figure: center-piece subgraph for the AND query, with numeric edge labels. Authors shown: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, Corinna Cortes, Daryl Pregibon]
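2_SoftAnd query
[Figure: center-piece subgraph for the 2_SoftAnd query, with numeric edge labels and two groups labeled “Statistic” and “database”. Authors shown: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Umeshwar Dayal, Bernhard Scholkopf, Peter L. Bartlett, Alex J. Smola]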
Evaluation of “Extract” Alg.
• 20 nodes
• 90%+ preserved
[Plot: Node Ratio vs. Budget (b), with one curve for 2 query nodes and one for 3 query nodes]
Running Time vs. Quality for Fast Ceps
[Plot: Quality vs. Running Time]
~90% quality
6:1 speedup
Roadmap
• Ceps Overview
• Q1: Goodness Score Calculation
• Q2: “Extract” Alg.
• Q3: Efficiency Issue
• Experimental Results
• Conclusion
Conclusion
• Q1: How to measure the importance?
• A1: RWR + K_SoftAnd
• Q2: How to find connection subgraph?
• A2: “Extract” Alg.
• Q3: How to do it efficiently?
• A3: Graph Partition (Fast Ceps)
  – ~90% quality
  – 6:1 speedup