TRANSCRIPT
Betweenness Centrality: Algorithms and Implementations
Dimitrios Prountzos
Keshav Pingali
The University of Texas at Austin
2
Focus of this Talk
• A novel formulation of Betweenness Centrality
  – Based on the Operator Formulation (Pingali et al. 2011)
  – Can express existing parallel solutions
  – Basis for a new class of asynchronous parallel solutions
• Systematic derivation of parallel implementations from the operator formulation
  – Ideas applicable to other irregular algorithms
3
Warm-up: Single-Source Shortest-Path
• Basic ingredient in Betweenness Centrality
• Problem formulation
  – Compute the shortest distance from source node S to every other node
• Many algorithms
  – Bellman-Ford (1957)
  – Dijkstra (1959)
  – Chaotic relaxation (Miranker 1969)
  – Delta-stepping (Meyer et al. 1998)
• Common structure
  – Each node has a label d with the currently known shortest distance from S
• Key operation
  – relax-edge(u,v): if d(u) + w(u,v) < d(v) then d(v) := d(u) + w(u,v)
[Figure: example weighted graph with source S and nodes A through G; relaxing edge (A,C) updates d(C) when d(A) + w(A,C) < d(C).]
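As a sketch of the relax-edge operation described above, here is a minimal chaotic-relaxation SSSP in Python; the graph layout and node names are illustrative, not the slide's example:

```python
import math

def sssp(graph, source):
    """Chaotic-relaxation SSSP: repeatedly pull a node from the worklist
    and relax its outgoing edges until no distance can improve.
    graph maps each node to a list of (neighbor, weight) pairs."""
    d = {u: math.inf for u in graph}
    d[source] = 0
    worklist = [source]
    while worklist:
        u = worklist.pop()
        for v, w in graph[u]:
            # relax-edge(u, v): the common key operation of all the
            # SSSP algorithms listed above
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                worklist.append(v)
    return d

# Illustrative graph (not the slide's example)
graph = {'S': [('A', 2), ('B', 5)], 'A': [('C', 1)], 'B': [('C', 1)], 'C': []}
print(sssp(graph, 'S'))  # {'S': 0, 'A': 2, 'B': 5, 'C': 3}
```

The worklist order is deliberately unspecified: any order converges, which is exactly the freedom the scheduling discussion later exploits.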
4
Operator Formulation Concepts
• Operator: conditional rewrite rule on the graph
  – Example (edge relaxation): on edge (u,v) with weight w, if du + w < dv, rewrite dv to du + w
• Activity: an application of the operator at a particular location in the graph
• Parallel Graph Algorithm = Operators + Schedule
  – Operators: what should be done; identify new activities
  – Schedule: how it should be done; order activity processing
(“TAO of parallelism”, PLDI 2011)
5
Betweenness Centrality
• Identifies important nodes in a network
  – BC(v) = Σ_{s≠v≠t} σst(v) / σst, where σst is the number of shortest paths from s to t and σst(v) is the number of those that pass through v
• Brandes’ Algorithm (2001)
  – Forward Pass:
    • Compute shortest-path DAG for a given source S
    • Compute shortest-path count σ(u) for each node u
  – Backward Pass:
    • Traverse DAG and compute BC(u)
• Parallel Implementations
  – Bader et al. (2006)
  – Madduri et al. (2009)
  – Edmonds et al. (2010)
  – …
[Figure: example DAG over nodes A through E with labels (d, σ): source (0,1); middle nodes (1,1); sink (2,2).]
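To make Brandes' two passes concrete, here is a minimal Python sketch for unweighted graphs; the adjacency-dict representation is an assumption for illustration, not the talk's implementation:

```python
from collections import deque

def brandes_bc(graph):
    """Brandes' algorithm: for each source s, a BFS forward pass builds the
    shortest-path DAG and the path counts sigma; a backward pass over the
    DAG accumulates the dependencies delta into BC."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # forward pass: shortest-path DAG (pred) and path counts (sigma)
        pred = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:   # u precedes v in the DAG
                    sigma[v] += sigma[u]
                    pred[v].append(u)
        # backward pass: traverse the DAG in reverse BFS order
        delta = {v: 0.0 for v in graph}
        for v in reversed(order):
            for u in pred[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return bc

# On a 3-node path A-B-C, only B lies on shortest paths between other nodes.
print(brandes_bc({'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}))
# {'A': 0.0, 'B': 2.0, 'C': 0.0}
```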
6
BC Operator Formulation
Each node u carries a label (du, σu) and predecessor/successor sets Pu, Su; each edge (u,v) carries (duv, w, σuv): its weight w and the distance and path count last propagated across it.

Shortest Path (SP)
  before: u:(du, σu) --(duv, w, σuv)--> v:(dv, σv)
  guard:  dv > du + w
  after:  v:(du+w, 0); Pv = ∅, Sv = ∅

First Update (FU)
  before: u:(du, σu) --(duv, w, σuv)--> v:(du+w, σv)
  guard:  duv ≠ du
  after:  v:(du+w, σv+σu); edge:(du, w, σu); Pv ∪= {u}, Su ∪= {v}

Update Sigma (US)
  before: u:(du, σu) --(du, w, σuv)--> v:(du+w, σv)
  guard:  σu ≠ σuv
  after:  v:(du+w, σv+σu−σuv); edge:(du, w, σu)

Correct Node (CN)
  before: u:(du, σu) --(du, w, σuv)--> v:(dv, σv)
  guard:  du ≠ ∞ ⋀ du+w > dv
  after:  edge:(∞, w, σuv); Su −= {v}

[Figure: example graph over A, C, D, E with node labels (d, σ), annotated with predecessor and successor relationships as the operators fire.]
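A minimal Python rendering of the Shortest Path (SP) rule above may help; the state layout (plain dicts for d/σ and predecessor sets) is a hypothetical encoding for illustration, not the paper's data structures:

```python
import math

def sp_enabled(d, u, v, w):
    """Guard of the Shortest Path (SP) operator: the edge (u, v) with
    weight w can lower v's tentative distance."""
    return d[v] > d[u] + w

def sp_apply(d, sigma, preds, succs, u, v, w):
    """SP rewrite: v takes the shorter distance; its path count and its
    predecessor/successor sets are reset (FU later rebuilds them)."""
    d[v] = d[u] + w
    sigma[v] = 0
    preds[v].clear()
    succs[v].clear()

# Tiny illustration of guard-then-apply on one edge
d = {'u': 1, 'v': math.inf}
sigma = {'u': 2, 'v': 0}
preds, succs = {'v': {'x'}}, {'v': set()}
if sp_enabled(d, 'u', 'v', 3):
    sp_apply(d, sigma, preds, succs, 'u', 'v', 3)
print(d['v'], sigma['v'], preds['v'])  # 4 0 set()
```

The guard/apply split mirrors the conditional-rewrite-rule reading of each operator: the guard is checked (under a lock, in the parallel setting) and the rewrite fires only if it holds.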
7
BC Operator Formulation (continued)
(Same four operators SP, FU, US, CN as on the previous slide.)
[Figure: the running example after further operator applications; node labels (d, σ) updated, e.g. one node now (2,1).]
8
Operator Scheduling for Parallel Algorithm Derivation

Unordered semantics:
  while ∃ op ∈ {SP, FU, US, CN} and (u,v) such that op(u,v) is enabled:
    apply op(u,v)

Worklist refinement:
  Wl = { (u,v) : op(u,v) enabled, op ∈ {SP, FU, US, CN} }
  while ¬Wl.empty:
    apply operator(s)
    Wl ∪= …

Parallel Graph Algorithm = Operators + Schedule (identify new activities; order activity processing). The schedule can be bound with a Static Ordering or a Dynamic Ordering.
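The worklist loop above can be sketched generically; here it is instantiated with a single relaxation operator on a toy graph (all names and the graph itself are illustrative):

```python
import math

def run_unordered(initial, guard, apply_op, delta):
    """Unordered scheduling skeleton: pull any activity, apply the
    operator if its guard is enabled, and add the activities it creates."""
    wl = list(initial)
    while wl:
        act = wl.pop()
        if guard(act):
            apply_op(act)
            wl.extend(delta(act))

# Toy instantiation: SSSP edge relaxation as the only operator.
graph = {'S': [('A', 1)], 'A': [('B', 2)], 'B': []}
d = {v: math.inf for v in graph}
d['S'] = 0

def guard(edge):
    u, v, w = edge
    return d[u] + w < d[v]

def apply_op(edge):
    u, v, w = edge
    d[v] = d[u] + w

def delta(edge):
    _, v, _ = edge          # new activities: v's outgoing edges
    return [(v, x, w) for x, w in graph[v]]

run_unordered([('S', v, w) for v, w in graph['S']], guard, apply_op, delta)
print(d)  # {'S': 0, 'A': 1, 'B': 3}
```

Swapping the `wl.pop()` policy (stack, queue, priority buckets) changes the schedule without touching the operators, which is the separation the slide argues for.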
9
Dynamic Operator Scheduling
[Figure: shared worklist of activities (nodes A through J) drained in parallel by threads T1 … Tk.]
• Variation of Delta-Stepping (Meyer et al. 1998)
• Our operators are general enough to enable this scheme
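Delta-stepping orders the dynamic worklist by binning activities into distance buckets; a sequential Python sketch follows (the bucket width, graph, and function name are illustrative, and a parallel runtime would drain each bucket with many threads):

```python
import math
from collections import defaultdict

def delta_stepping(graph, source, delta=2):
    """Delta-stepping-style scheduling sketch: nodes are binned by
    tentative distance into buckets of width delta, and the lowest
    non-empty bucket is drained first."""
    d = {v: math.inf for v in graph}
    d[source] = 0
    buckets = defaultdict(list)
    buckets[0].append(source)
    while buckets:
        i = min(buckets)                 # lowest non-empty bucket
        for u in buckets.pop(i):
            for v, w in graph[u]:
                nd = d[u] + w
                if nd < d[v]:            # relax and re-bucket v
                    d[v] = nd
                    buckets[nd // delta].append(v)
    return d

graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 1)], 'B': []}
print(delta_stepping(graph, 'S'))  # {'S': 0, 'A': 1, 'B': 2}
```

Stale bucket entries (a node re-bucketed after its distance dropped) are processed harmlessly, since relaxation is monotone.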
10
Static Operator Scheduling
Bind scheduling decisions at compile time by committing to a particular code structure.
• Operator Grouping
  ⊕ Exploits locality, reduces worklist pressure
  ⊖ Load balancing
• Operator Merging
  – E.g. combine SP(u,v); FU(u,v) into SP⨀FU
  ⊕ Optimizes computation + locking
• Context-based Operator Inlining
  ⊕ Reduces worklist pressure
  ⊖ Load balancing
[Figure: node A with neighbors B1 … Bn, illustrating grouping over a node’s edges.]
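Operator merging can be illustrated by fusing SP and FU: instead of SP resetting (d, σ, Pv) and a separate FU activity re-examining the same edge to propagate σu, one combined update runs under a single guard check. The state encoding below is hypothetical, as before:

```python
import math

def sp_fu(d, sigma, preds, u, v, w):
    """Merged SP(.)FU operator: SP's distance update and reset fused with
    FU's sigma propagation and predecessor insertion, so the edge is
    examined (and, in a parallel setting, locked) only once."""
    if d[u] + w < d[v]:              # SP guard
        d[v] = d[u] + w              # SP: shorter distance for v
        sigma[v] = sigma[u]          # SP resets sigma to 0, FU adds sigma[u]
        preds[v] = {u}               # SP clears Pv, FU inserts u
        return True
    return False

d = {'u': 1, 'v': math.inf}
sigma = {'u': 2, 'v': 7}
preds = {'v': {'old'}}
sp_fu(d, sigma, preds, 'u', 'v', 3)
print(d['v'], sigma['v'], preds['v'])  # 4 2 {'u'}
```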
11
Algorithm Encodings
• Async1 : (SP|FU|US|CN)*
  – SP⨀FU
• Async2 : (SP|FU|US|CN)*
  – Group (u,v*)
  – SP⨀FU
  – SP⨀FU inline CN
• Leveled : level-by-level (SP|FU)*
  – Group (u,v*)
  – SP⨀FU
12
Experimental Evaluation
13
Experiments on Unweighted Graphs
24-core Intel Xeon @ 2 GHz
• Scale-free RMAT graph: 33 M nodes, 268 M edges
• Random graph: 67 M nodes, 268 M edges
[Charts: running time (sec) vs. number of threads (1–24) for Leveled1, Leveled2, Async2, and Leveled2-Serial on each graph.]
Leveled1 (Bader et al. 2006), Leveled2 (Madduri et al. 2009)
14
Experiments on Weighted Graphs
• USA road network: 24 M nodes, 58 M edges
• USA central road network: 14 M nodes, 34 M edges
Machines: 24-core Intel Xeon @ 2 GHz; Sun T5440 UltraSPARC T2+ @ 1.4 GHz
[Charts: running time (sec) vs. number of threads (up to 24 on the Xeon, up to 128 on the T2+) for Async1 against Boost-Serial; speedups of 9.5x and 38x, respectively.]
15
Conclusion
• New BC formulation
  – Expresses existing solutions
  – Basis for new asynchronous solutions
• Systematic derivation of parallel implementations
  – Dynamic + static schedule transformations
• Enables automatic synthesis of parallel programs
  – Elixir [OOPSLA 2012]
Thank You
[Figure: recap: Parallel Graph Algorithm = Operators + Schedule]
16
Backup
17
Betweenness Centrality
• Identifies important nodes in a network
• Brandes’ Algorithm (2001)
  – Compute shortest-path DAG
  – Update δ(v), BC(v)
[Figure: DAG from source S over nodes A, B, C, E, F, H with labels (L(u), σ(u)): (0,1), (1,1), (1,1), (2,2), (3,2), (3,2), (4,4).]
18
BC Operator Formulation
(The four operators SP, FU, US, CN as defined earlier, with node labels written (Lu, σu).)
[Figure: example graph over A, C, D, E with predecessor/successor annotations.]
19
BC Operator Formulation (continued)
(Same operators; the figure shows successive states of the running example.)
[Figure: running example with node labels (L, σ) evolving through operator applications, e.g. (0,1), (1,1), (2,1), (2,2), (3,2).]
20
Deriving Algorithm Variants

Async1 (edge-based worklist):
  Worklist Wl = { (src,w) : (src,w) ∈ G(V,E) }
  foreach (u,v) ∈ Wl {
    lock(u,v)
    if grd[SP⨀FU,u,v] {
      apply SP⨀FU ; unlock(u,v)
      Wl ∪= { (v,w) : w ∈ outNbrs(v) }
      if vHasPreds
        Wl ∪= { (w,u) : w ∈ inNbrs(v) }
    }
    else-if grd[CN,u,v] { apply CN ; unlock(u,v) }
    else-if grd[FU,u,v] { … }
    else-if grd[US,u,v] { … }
  }

Async2 (node-based, CN inlined):
  Worklist Wl = { src }
  foreach u ∈ Wl {
    forall v ∈ outNbrs(u) {
      lock(u,v)
      if grd[SP⨀FU,u,v] {
        …
        Wl ∪= { v }
        if vHasPreds {
          forall w ∈ inNbrs(v) {
            lock(v,w)
            if grd[CN,w,v] { … }
            unlock(v,w)
          }
        }
      }
      else-if …
    }
  }
21
Insights Behind Elixir
• Parallel Graph Algorithm = Operators + Schedule
  – Operators: what should be done; the operator delta identifies new activities
  – Schedule: how it should be done; order activity processing
• Handles unordered/ordered algorithms
• Supports static and dynamic schedules
(“TAO of parallelism”, PLDI 2011)