SC 2015
Data Partitioning Strategies for Graph Workloads on Heterogeneous Clusters
Michael LeBeane, Shuang Song, Reena Panda, Jee Ho Ryoo, Lizy K. John
The University of Texas at [email protected]
▪ Heterogeneity is pervasive in
modern data centers [][]
▪ Graph analytics are a pervasive
workload in the data center []
– Many frameworks available to
efficiently and easily perform graph
analytics [][][][]
▪ Most frameworks are not
equipped to deal with
heterogeneity in the data center
Motivation
2 Michael LeBeane 11/18/2015
[Figure: six compute nodes, each holding its own data partition, connected through a network]
▪ Online vs. Offline Partitioning
Background
▪ All work performed on
PowerGraph[] framework
▪ Three relevant graph partitioning
topics:
– Online vs. Offline Partitioning
– Vertex vs. Edge Cut
– Gather/Apply/Scatter
▪ Vertex vs. Edge Cut
Background
[Figure: (a) Vertex Cut and (b) Edge Cut of a four-vertex graph across Machine X and Machine Y, marking master and ghost copies]
[Figure: (a) Gather, (b) Apply, (c) Scatter]
▪ Gather/Apply/Scatter
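The three Gather/Apply/Scatter phases can be illustrated with a small single-machine sketch. This is not PowerGraph code: the function name `gas_pagerank`, the damping value, and the tolerance are assumptions chosen for illustration, and PageRank is used only because it is one of the workloads evaluated later.

```python
def gas_pagerank(edges, num_vertices, damping=0.85, tol=1e-6, max_iters=100):
    """Toy synchronous Gather/Apply/Scatter loop (illustrative, single machine)."""
    in_nbrs = [[] for _ in range(num_vertices)]
    out_nbrs = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        in_nbrs[dst].append(src)
        out_nbrs[src].append(dst)

    rank = [1.0] * num_vertices
    active = set(range(num_vertices))
    for _ in range(max_iters):
        if not active:
            break
        # Gather: each active vertex sums contributions from its in-neighbors.
        gathered = {
            v: sum(rank[u] / len(out_nbrs[u]) for u in in_nbrs[v])
            for v in active
        }
        # Apply: update the vertex value from the gathered sum.
        changed = []
        for v, total in gathered.items():
            new_rank = (1 - damping) + damping * total
            if abs(new_rank - rank[v]) > tol:
                changed.append(v)
            rank[v] = new_rank
        # Scatter: vertices whose value changed activate their out-neighbors.
        active = {w for v in changed for w in out_nbrs[v]}
    return rank


ranks = gas_pagerank([(0, 1), (1, 2), (2, 0)], num_vertices=3)
print(ranks)
```

In a distributed setting the gather and scatter phases are where replicas of a vertex on different machines must exchange data, which is why the partitioning choice affects communication.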
▪ Skewed Data Partitioning
Workload Skew in Heterogeneous Data Centers
[Figure: timeline of Compute and Communication phases on a fast node and a slow node between barriers, annotated "Runtime Improvement"]
▪ Normal Data Partitioning
[Figure: timeline of Compute and Communication phases on a fast node and a slow node between barriers, with Idle periods before each barrier]
▪ Local node computation time
dependent on data
distribution
▪ To properly balance work, we
need:
– Estimation of each node’s
computational capacity
– Partitioning algorithms that
account for skewed
computational capacity
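Why balancing by capacity matters can be seen with a deliberately simple model. The sketch below assumes that compute time on a node is its edge count divided by a relative capacity, and that a barrier makes every iteration as slow as the slowest node. It ignores communication entirely, and the capacities and edge counts are made-up numbers.

```python
def split_edges(total_edges, weights):
    """Split an edge count across nodes in proportion to the given weights."""
    total_weight = sum(weights)
    return [total_edges * w / total_weight for w in weights]


def iteration_time(edges_per_node, capacity):
    """With a barrier, the iteration ends when the slowest node finishes."""
    return max(e / c for e, c in zip(edges_per_node, capacity))


capacity = [1, 3]                        # hypothetical slow node and fast node
uniform = split_edges(1200, [1, 1])      # equal shares, regardless of capacity
skewed = split_edges(1200, capacity)     # shares proportional to capacity

print(iteration_time(uniform, capacity))
print(iteration_time(skewed, capacity))
```

Under these assumptions the uniform split leaves the fast node idle while the slow node finishes, and the proportional split lets both nodes reach the barrier at the same time.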
Heterogeneous Graph Analytics
[Figure: processing pipeline (Loading Files, Partitioning Graph, Finalizing Graph, App Execution) that reads File 1 … File N and distributes the graph data across Node 1 … Node n; the Baseline Partitioner is shown alongside a Heterogeneity Aware Partitioner that takes Computation Capacity as input]
▪ Computation capacity is
complex
▪ Dependent on many factors:
– Hardware of the node
– Nature of the graph
– Nature of the algorithm
– Communication patterns
▪ Can we determine a simple,
static estimate?
Heterogeneous Computation Capacity
Skew Factor Calculation
▪ Static estimate of node computational capacity could be based on:
– Threads: Logical compute threads on node (default N – 2, where N is the node's hardware thread count)
– Memory: Physical memory assigned to a node
– Profiling: Local throughput of graph subset and algorithm
▪ We will refer to the estimated ratios of computation capacity as the
skew factor of the heterogeneous data center
Name        HW Threads  Memory  Network                 Thread Skew Factor  Memory Skew Factor
c4.xlarge   4           7.5 GB  100 Mbps to 1.86 Gbps   1                   1
c4.2xlarge  8           15 GB   100 Mbps to 1.86 Gbps   3                   2
c4.4xlarge  16          30 GB   100 Mbps to 1.86 Gbps   7                   4
c4.8xlarge  36          60 GB   up to 8.86 Gbps         17                  8
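The skew factors in the table can be reproduced from the hardware columns. The sketch below reads "default N – 2" as the hardware thread count minus two and normalizes each capacity by the smallest one. The helper names are hypothetical, and the normalization step is an assumption that happens to match the table.

```python
# name: (hardware threads, memory in GB), copied from the table above
INSTANCES = {
    "c4.xlarge": (4, 7.5),
    "c4.2xlarge": (8, 15),
    "c4.4xlarge": (16, 30),
    "c4.8xlarge": (36, 60),
}


def skew_factors(capacity):
    """Normalize capacities so that the smallest node has skew factor 1."""
    base = min(capacity.values())
    return {name: value / base for name, value in capacity.items()}


def compute_threads(hw_threads, reserved=2):
    """Assumed reading of 'default N - 2': two threads are not used for compute."""
    return hw_threads - reserved


thread_skew = skew_factors({n: compute_threads(t) for n, (t, _) in INSTANCES.items()})
memory_skew = skew_factors({n: m for n, (_, m) in INSTANCES.items()})

print(thread_skew)
print(memory_skew)
```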
▪ Online partitioning algorithms
must be modified to support
skew factor
▪ Easy to modify current online
partitioning algorithms
▪ We have modified 5 popular
algorithms from multiple
sources
Heterogeneous Partitioning Algorithm
Problem Formulation
▪ Statically estimated based on:
– Threads: Logical compute threads on node (default N – 2)
– Memory: Physical memory assigned to a node
– Profiling: Local throughput of graph subset and algorithm
Random Skewed Partitioner
[Figure: Original vs. Skewed. An edge goes through Random Assignment to Node 0 … Node n; the skewed version adds the Skew Factor as an input to the assignment]
▪ Random assignment of edges to nodes
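One way to add a skew factor to random assignment is to draw the target node with probability proportional to its skew factor. The sketch below is an assumption about how that could look, not the modified PowerGraph code. The function name and the fixed seed are illustrative.

```python
import random
from collections import Counter


def random_skewed_assign(num_edges, skew, seed=0):
    """Pick a node for each edge with probability proportional to its skew factor."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    nodes = list(range(len(skew)))
    return rng.choices(nodes, weights=skew, k=num_edges)


skew = [1, 3, 7, 17]  # thread skew factors from the Skew Factor Calculation slide
assignment = random_skewed_assign(28_000, skew)
counts = Counter(assignment)
shares = [counts[node] / len(assignment) for node in range(len(skew))]
print(shares)
```

Passing equal weights, for example `[1, 1, 1, 1]`, gives back the uniform behavior of the original partitioner.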
Greedy Skewed Partitioner
[Figure: Original vs. Skewed. An edge goes through Heuristic Assignment, informed by the current Balance, to Node 0 … Node n; the skewed version adds the Skew Factor as an input to the assignment]
▪ Greedy decision using current distribution of edges
– Either locally or coordinated
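A greedy decision of this kind can be sketched as a score per node that combines locality (does the node already hold an endpoint of the edge?) with a balance term. The version below is loosely modeled on score-based greedy heuristics and is an assumption, not the authors' implementation. To account for skew it measures each node's load relative to its skew factor, so a node with factor 3 is considered balanced when it holds three times as many edges.

```python
def greedy_skewed_assign(edges, skew, epsilon=1.0):
    """Greedy edge placement with a skew-normalized balance term (illustrative)."""
    n = len(skew)
    load = [0] * n
    placed = {}  # vertex -> set of nodes that already hold one of its edges
    assignment = []
    for u, v in edges:
        norm = [load[i] / skew[i] for i in range(n)]  # load relative to capacity
        lo, hi = min(norm), max(norm)
        scores = []
        for i in range(n):
            balance = (hi - norm[i]) / (epsilon + hi - lo)  # always below 1
            locality = (i in placed.get(u, ())) + (i in placed.get(v, ()))
            scores.append(balance + locality)
        node = max(range(n), key=lambda i: scores[i])  # ties go to the lowest index
        load[node] += 1
        placed.setdefault(u, set()).add(node)
        placed.setdefault(v, set()).add(node)
        assignment.append(node)
    return assignment, load


# 400 edges with no shared endpoints, so only the balance term matters.
disjoint_edges = [(2 * i, 2 * i + 1) for i in range(400)]
_, load = greedy_skewed_assign(disjoint_edges, skew=[1, 3])
print(load)
```

Because locality outweighs balance in this sketch, a graph dominated by one hub vertex would still pile up on a single node. The example uses disjoint edges to isolate the effect of the skew factor.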
Grid Skewed Partitioner
▪ Greedy decision using current distribution of edges
– Either locally or coordinated
[Figure: Original vs. Skewed. An edge goes through a Grid Hash, and Random Selection over the Grid picks one of Node 0 … Node n; the skewed version adds the Skew Factor as an input]
Hybrid Skewed Partitioner
[Figure: Original vs. Skewed. Panels show Random Assignment of an edge/vertex to Node 0 … Node n and Heuristic Assignment of a vertex when Degree > Threshold; the skewed version adds a Skew Factor input to each]
▪ Random assignment of edges/vertices to nodes based on degree
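A degree-based split can be sketched as follows. This follows the general hybrid-cut idea of keeping the in-edges of a low-degree vertex together and spreading the in-edges of a high-degree vertex by source. Both that rule and the way skew is applied here (a slot table in which each node appears once per unit of skew factor) are assumptions for illustration, as are the helper names, the hash constant, and the threshold.

```python
from collections import Counter


def weighted_slots(skew):
    """Slot table in which node i appears skew[i] times (integer skew factors)."""
    return [node for node, weight in enumerate(skew) for _ in range(weight)]


def vertex_to_node(vertex, slots):
    """Hash a vertex ID into the slot table (multiplicative hash, illustrative)."""
    return slots[(vertex * 2654435761 % 2**32) % len(slots)]


def hybrid_skewed_assign(edges, skew, threshold=100):
    """Place each edge by its target if the target is low-degree, else by its source."""
    slots = weighted_slots(skew)
    in_degree = Counter(dst for _, dst in edges)  # needs a pass over all edges first
    assignment = []
    for src, dst in edges:
        anchor = src if in_degree[dst] > threshold else dst
        assignment.append(vertex_to_node(anchor, slots))
    return assignment


edges = [(1, 9), (2, 9), (3, 9)]
print(hybrid_skewed_assign(edges, skew=[1, 3, 7, 17]))               # grouped by target 9
print(hybrid_skewed_assign(edges, skew=[1, 3, 7, 17], threshold=2))  # spread by source
```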
Ginger Skewed Partitioner
▪ Random assignment of edges/vertices to nodes based on degree
[Figure: Original vs. Skewed. Panels show Random Assignment of an edge/vertex to Node 0 … Node n and Heuristic Assignment of a vertex when Degree > Threshold, with a Balance input; the skewed version adds a Skew Factor input to each]
Experimental Setup
▪ Algorithms
– Graph: PageRank (PR), Connected Components (CC), Triangle Count (TC)
– Matrix: Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS)
▪ Data Sets
Name            Vertices    Edges          Size (Uncompressed)  Type              Algorithms
amazon          403,394     3,384,388      46 MB                Directed Graph    PR, CC, TC
citation        3,774,768   16,518,948     268 MB               Directed Graph    PR, CC, TC
netflix         NA          NA             100 MB               Sparse Matrix     ALS, SGD
road-map        1,379,917   1,921,660      84 MB                Undirected Graph  PR, CC, TC
social-network  4,847,571   68,993,773     1.1 GB               Directed Graph    PR, CC, TC
twitter         41,000,000  1,400,000,000  25 GB                Directed Graph    PR, CC, TC
wiki            2,394,385   5,021,410      64 MB                Directed Graph    PR, CC, TC
Experimental Setup
▪ Data Center
– Graph: PageRank (PR), Connected Components (CC), Triangle Count (TC)
– Matrix: Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS)
▪ Skew Factor
– Results use Thread Based Skew Factor
Execution Time
▪ PageRank
[Figure: runtime (s), broken down into Transmit, Receive, Gather, Apply, and Scatter, for the baseline and skewed versions of the Random, Greedy, Grid, Hybrid, and Ginger partitioners on the social_network, amazon, citation, road_map, and wiki data sets]
Execution Time
▪ Connected Components
[Figure: runtime (s), broken down into Receive, Gather, Apply, Scatter, and Transmit, for the baseline and skewed versions of the Random, Greedy, Grid, Hybrid, and Ginger partitioners on the social_network, amazon, citation, road_map, and wiki data sets; some results are plotted against a separate right axis]
Execution Time
▪ Triangle Count
[Figure: runtime (s), broken down into Receive, Gather, Apply, Scatter, and Transmit, for the baseline and skewed versions of the Random, Greedy, Grid, Hybrid, and Ginger partitioners on the social_network, amazon, citation, road_map, and wiki data sets; some results are plotted against a separate right axis]
Execution Time
▪ Stochastic Gradient Descent
[Figure: runtime (s), broken down into TX, RX, G, A, and S, for the baseline and skewed versions of the Random, Greedy, Grid, Hybrid, and Ginger partitioners on the netflix data set]
▪ Alternating Least Squares
[Figure: runtime (s), broken down into TX, RX, G, A, and S, for the baseline and skewed versions of the Random, Greedy, Grid, Hybrid, and Ginger partitioners on the netflix data set]
Data distribution
▪ Ideal distribution 17-7-3-1
[Figure: relative edge distribution (0 to 1) across Node (1), Node (3), Node (7), and Node (17) for the skewed partitioners SRandom, SGreedy, SGrid, SHybrid, and SGinger on the social_network, amazon, citation, road_map, and wiki data sets, next to the non-skewed and target-skew reference distributions]
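The ideal 17-7-3-1 distribution translates into per-node edge shares by dividing each skew factor by their sum. The sketch below computes those shares and one possible way to measure how far an achieved distribution is from the target. The error metric and function names are illustrative assumptions, not a metric taken from the slides.

```python
def target_shares(skew):
    """Fraction of all edges each node should hold under the given skew factors."""
    total = sum(skew)
    return [s / total for s in skew]


def max_share_error(edge_counts, skew):
    """Largest absolute gap between a node's actual and target edge share."""
    total = sum(edge_counts)
    return max(abs(c / total - t) for c, t in zip(edge_counts, target_shares(skew)))


skew = [17, 7, 3, 1]
print(target_shares(skew))                        # 17/28, 7/28, 3/28, 1/28
print(max_share_error([700, 700, 700, 700], skew))  # uniform split, far from target
```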
Results
▪ Skewed approach generally decreases network communication
[Figure: replication factor (0 to 4) for the baseline and skewed versions of the Random, Greedy, Grid, Hybrid, and Ginger partitioners on the social_network, amazon, citation, road_map, and wiki data sets]
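Replication factor, the metric on this slide, is commonly defined for vertex-cut partitioning as the average number of nodes on which each vertex has a copy. Fewer copies generally means less data to synchronize over the network. A minimal sketch of that calculation, with a hypothetical function name and a toy input:

```python
from collections import defaultdict


def replication_factor(edges, assignment):
    """Average number of nodes holding a copy of each vertex under a vertex cut."""
    replicas = defaultdict(set)
    for (u, v), node in zip(edges, assignment):
        replicas[u].add(node)
        replicas[v].add(node)
    return sum(len(nodes) for nodes in replicas.values()) / len(replicas)


edges = [(0, 1), (0, 2)]
print(replication_factor(edges, [0, 0]))  # both edges on one node
print(replication_factor(edges, [0, 1]))  # vertex 0 is replicated on two nodes
```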
Results
▪ Data Ingress Time
[Figure: ingress time (s) for the baseline and skewed versions of the Random, Greedy, Grid, Hybrid, and Ginger partitioners on the social_network, amazon, citation, road_map, and wiki data sets; the axis runs from 0 to 40 s and one off-scale bar is labeled 58]
Scale-out Results
Configuration Name  c4.2xlarge  c4.4xlarge  c4.8xlarge
Config 1            12          8           4
Config 2            8           8           8
Config 3            4           8           12
Config 4            3           5           16
[Figure: percentage improvement of the Random, Greedy, Grid, Hybrid, and Ginger partitioners for Config 1 through Config 4]
[Figure: runtime (s) versus cluster size (0 to 60 nodes) for Random, Greedy, Grid, Hybrid, and Ginger and their skewed counterparts SRandom, SGreedy, SGrid, SHybrid, and SGinger]
▪ Extremely large Twitter graph
▪ No benefits after 36 nodes
Future Work
▪ Incorporate better network model
▪ Profile based partitioning scheme
– How do we sample graph inputs?
Conclusion
▪ Simple, static throughput estimation can greatly improve
performance
▪ We modify 5 existing online graph partitioning strategies for
heterogeneous environments
▪ Our modified algorithms improve runtime by as much as 64% and
on average 32% on Amazon EC2
▪ We show that our strategies also work up to 48 nodes, achieving
18% performance improvement on scale-out
Thank You!
References
[1] S. Garg, S. Sundaram, and H. D. Patel. Robust heterogeneous data center design: A principled approach. SIGMETRICS Perform. Eval. Rev., 39(3):28–30, Dec. 2011.
[2] B.-G. Chun, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini. An energy case for hybrid datacenters. SIGOPS Oper. Syst. Rev., 44(1):76–80, Mar. 2010.
[3] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, pages 17–30. USENIX Association, 2012.