presentation on "mizan: a system for dynamic load balancing in large-scale graph...
Post on 18-Oct-2014
1.008 Views
Preview:
DESCRIPTION
TRANSCRIPT
Mizan: A System for Dynamic LoadBalancing in Large-scale Graph Processing
Zuhair Khayyat1 Karim Awara1 Amani Alonazi1
Hani Jamjoom2 Dan Williams2 Panos Kalnis1
1King Abdullah University of Science and TechnologyThuwal, Saudi Arabia
2IBM Watson Research CenterYorktown Heights, NY
1/34
Importance of Graphs
Graphs abstract application-specific algorithms into genericproblems represented as interactions using vertices and edges
Max flow in road network Diameter of WWW
Ranking in social networks Simulating protein interactions
Algorithms vary in their computational requirements
2/34
How to Scale Graph Processing
Pregel was introduced by Google as a scalable abstraction forgraph processing
Overcomes the limitations of processing graphs on MapReduce
Based on vertex-centric computation
Utilizes bulk synchronous parallel (BSP)
System Programming Data ComputationsAbstraction Exchange
map() Key basedMapReduce combine() grouping Disk based
reduce()
compute() MessagePregel combine() passing In memory
aggregate()
3/34
Pregel’s BSP Graph Processing
Superstep 1 Superstep 2 Superstep 3
Worker 3
Worker 2
Worker 1
Worker 3
Worker 2
Worker 1
Worker 3
Worker 2
Worker 1
BSP Barrier
Balanced computation and communication is fundamental toPregel’s efficiency
4/34
Optimizing Graph Processing
Existing work focus on optimizing for graph structure (staticoptimization):
Optimize graph partitioning:
Simple graph partitioning schemes (e.g., hash or range): Giraph
User-defined partitioning function: Pregel
Sophisticated partitioning techniques (e.g., min-cuts): GraphLab,PowerGraph, GoldenOrb and Surfer
Optimize graph data access:
Distributed data stores and graph indexing: GoldenOrb, Hamaand Trinity
What about algorithm behavior?
Pregel provides coarse-grained load balancing, is it enough?
5/34
Optimizing Graph Processing
Existing work focus on optimizing for graph structure (staticoptimization):
Optimize graph partitioning:
Simple graph partitioning schemes (e.g., hash or range): Giraph
User-defined partitioning function: Pregel
Sophisticated partitioning techniques (e.g., min-cuts): GraphLab,PowerGraph, GoldenOrb and Surfer
Optimize graph data access:
Distributed data stores and graph indexing: GoldenOrb, Hamaand Trinity
What about algorithm behavior?
Pregel provides coarse-grained load balancing, is it enough?
5/34
Types of Algorithms
Algorithms behaves differently, we classify algorithms in twocategories depending on their behavior
Algorithm Type In/Out Vertices ExamplesMessages State
PageRank,Stationary Predictable Fixed Diameter estimation,
WCC
Distributed minimumspanning tree,
Non-stationary Variable Variable Belief propagation,Graph queries,Ad propagation
6/34
Types of Algorithms – First Superstep
PageRank
V1
V3
V6
V8
DMST
V5
V2
V4
V7
V1
V3
V6
V8
V5
V2
V4
V7
7/34
Types of Algorithms – Superstep k
PageRank
V1
V3
V6
V8
DMST
V5
V2
V4
V7
V1
V3
V6
V8
V5
V2
V4
V7
8/34
Types of Algorithms – Superstep k +m
PageRank
V1
V3
V6
V8
DMST
V5
V2
V4
V7
V1
V3
V6
V8
V5
V2
V4
V7
9/34
What Causes Computation Imbalance inNon-stationary Algorithms?
Superstep 1 Superstep 2 Superstep 3
-Vertex response time
-Time to send out messages
-Time to receive in messages
Worker 3
Worker 2
Worker 1
Worker 3
Worker 2
Worker 1
Worker 3
Worker 2
Worker 1
BSP Barrier
10/34
Why to Optimize for Algorithm Behavior?
Difference between stationary and non-stationary algorithms
0.001
0.01
0.1
1
10
100
1000
0 10 20 30 40 50 60
In M
ess
ages
(Mill
ions)
SuperSteps
PageRank - TotalPageRank - Max/W
DMST - TotalDMST - Max/W
11/34
What is Mizan?
BSP-based graph processing framework
Uses runtime fine-grained vertex migrations to balancecomputation and communication
Follows the Pregel programming model
Open source, written in C++
12/34
PageRank’s compute() in Mizan
void compute(messageIterator * messages, userVertexObject * data,
messageManager * comm) {
double currVal = data->getVertexValue().getValue();
double newVal = 0; double c = 0.85;
if (data->getCurrentSS() > 1) {
while (messages->hasNext()) {
newVal = newVal + messages->getNext().getValue();
}
newVal = newVal * c + (1.0 - c) / ((double) vertexTotal);
data->setVertexValue(mDouble(newVal));
} else {newVal = currVal;}
if (data->getCurrentSS() <= maxSuperStep) {
mDouble outVal(newVal / ((double) data->getOutEdgeCount()));
for (int i = 0; i < data->getOutEdgeCount(); i++) {
comm->sendMessage(data->getOutEdgeID(i), outVal);
}
} else {data->voteToHalt();}
}
13/34
Migration Planning Objectives
Decentralized
Simple
Transparent
No need to change Pregel’s APIDoes not assume any a priori knowledge to graph structure oralgorithm
14/34
Mizan’s Migration Barrier
Mizan performs both planning and migrations after all workersreach the BSP barrier
Superstep 1 Superstep 2 Superstep 3
Worker 3
Worker 2
Worker 1
Worker 3
Worker 2
Worker 1
Worker 3
Worker 2
Worker 1
BSP BarrierMigration Barrier
Migration Planner
15/34
Mizan’s Migration Planning Steps
1 Identify the source of imbalance: By comparing the worker’sexecution time against a normal distribution and flagging outliers
Mizan monitors for each vertex:
Remote outgoing messages
All incoming messages
Response time
High level summaries are broadcast to each worker
V1
Worker 2Worker 1Remote Incoming Messages
Remote Outgoing Messages
VertexResponse Time
V3V2
V4
Mizan
V5V6
Mizan
Local Incoming Messages
16/34
Mizan’s Migration Planning Steps
1 Identify the source of imbalance: By comparing the worker’sexecution time against a normal distribution and flagging outliers
Mizan monitors for each vertex:
Remote outgoing messages
All incoming messages
Response time
High level summaries are broadcast to each worker
V1
Worker 2Worker 1Remote Incoming Messages
Remote Outgoing Messages
VertexResponse Time
V3V2
V4
Mizan
V5V6
Mizan
Local Incoming Messages
16/34
Mizan’s Migration Planning Steps
1 Identify the source of imbalance
2 Select the migration objective:
Mizan finds the strongest cause of workload imbalance bycomparing statistics for outgoing messages and incomingmessages of all workers with the worker’s execution time
The migration objective is either:
Optimize for outgoing messages, or
Optimize for incoming messages, or
Optimize for response time
17/34
Mizan’s Migration Planning Steps
1 Identify the source of imbalance
2 Select the migration objective:
Mizan finds the strongest cause of workload imbalance bycomparing statistics for outgoing messages and incomingmessages of all workers with the worker’s execution time
The migration objective is either:
Optimize for outgoing messages, or
Optimize for incoming messages, or
Optimize for response time
17/34
Mizan’s Migration Planning Steps
1 Identify the source of imbalance
2 Select the migration objective
3 Pair over-utilized workers with under-utilized ones:
All workers create and execute the migration plan in parallelwithout centralized coordination
Each worker is paired with one other worker at most
W7 W2 W1 W5 W8 W4 W0 W6 W3
0 1 2 3 4 5 6 7 8
W9
18/34
Mizan’s Migration Planning Steps
1 Identify the source of imbalance
2 Select the migration objective
3 Pair over-utilized workers with under-utilized ones
4 Select vertices to migrate: Depending on the migrationobjective
19/34
Mizan’s Migration Planning Steps
1 Identify the source of imbalance
2 Select the migration objective
3 Pair over-utilized workers with under-utilized ones
4 Select vertices to migrate
5 Migrate vertices:
How to migrate vertices freely across workers while maintainingvertex ownership and fast updates?
How to minimize migration costs for large vertices?
20/34
Vertex Ownership
Mizan uses distributed hash table (DHT) to implement adistributed lookup service:
V can execute at any worker
V ’s home worker ID = (hash(ID) mod N)
Workers ask the home worker of V for its current location
The home worker is notified with the new location as V migrates
V6
Worker 2Worker 1
V4V1
V1:W1 V4:W2 V7:W3
V3
V2
V2:W1 V5:W2 V8:W3 V3:W1 V6:W2 V9:W3
V9
V8
Worker 3
V5
V7
V5
V8
V9 V3V2 V1
DHT
Compute
Where is V5 & V8
21/34
Migrating Vertices with Large Message Size
Introduce delayed migration for very large vertices:At SS t: only ownership of vertex v is moved to workernew
At SS t + 1:workernew receives vertex 7 messages
workerold process vertex 7
After SS t + 1: workerold moves state of vertex 7 to workernew
1
2
3
4
5 6
7
1
2
3
4
5 6
7
'7
1
2
3
4
5 6
7
Superstep t Superstep t+1 Superstep t+2
Worker_new
Worker_old Worker_old
Worker_new Worker_new
Worker_old
22/34
Migrating Vertices with Large Message Size
Introduce delayed migration for very large vertices:At SS t: only ownership of vertex v is moved to workernew
At SS t + 1:workernew receives vertex 7 messages
workerold process vertex 7
After SS t + 1: workerold moves state of vertex 7 to workernew
1
2
3
4
5 6
7
1
2
3
4
5 6
7
'7
1
2
3
4
5 6
7
Superstep t Superstep t+1
Worker_new
Worker_old Worker_old
Worker_new Worker_new
Worker_old
22/34
Migrating Vertices with Large Message Size
Introduce delayed migration for very large vertices:At SS t: only ownership of vertex v is moved to workernew
At SS t + 1:workernew receives vertex 7 messages
workerold process vertex 7
After SS t + 1: workerold moves state of vertex 7 to workernew
1
2
3
4
5 6
7
1
2
3
4
5 6
7
'7
1
2
3
4
5 6
7
Superstep t Superstep t+1 Superstep t+2
Worker_new
Worker_old Worker_old
Worker_new Worker_new
Worker_old
22/34
Experimental Setup
Mizan is implemented on C++ with MPI with three variations:
Static Mizan: Emulates Giraph, disables dynamic migrationWork stealing (WS): Emulates Pregel’s coarse-graineddynamic load balancingMizan: Activates dynamic migration
Local cluster of 21 machines:
Mix of i5 and i7 processors with 16GB RAM Each
IBM Blue Gene/P supercomputer with 1024 compute nodes:
Each is a 4 core PowerPC450 CPU at 850MHz with 4GB RAM
23/34
Datasets
Experiments on public datasets:
Stanford Network Analysis Project (SNAP)The Laboratory for Web Algorithmics (LAW)Kronecker generator
name |nodes| |edges|
kg1 (synthetic) 1,048,576 5,360,368
kg4m68m (synthetic) 4,194,304 68,671,566
web-Google 875,713 5,105,039
LiveJournal1 4,847,571 68,993,773
hollywood-2011 2,180,759 228,985,632
arabic-2005 22,744,080 639,999,458
24/34
Giraph vs. Static Mizan
0
5
10
15
20
25
30
35
40
web-Googlekg1 LiveJournal1
kg4m68m
Run T
ime (
Min
)
GiraphStatic Mizan
Figure: PageRank on social networkand random graphs
0
2
4
6
8
10
12
1 2 4 8 16R
un T
ime (
Min
)Vertex count (Millions)
GiraphStatic Mizan
Figure: PageRank on regular randomgraphs, each has around 17M edge
25/34
Effectiveness of Dynamic Vertex Migration
PageRank on a social graph (LiveJournal1)
The shaded columns: algorithm runtime
The unshaded columns: initial partitioning cost
0
5
10
15
20
25
30
Sta
tic
WS
Miz
an
Sta
tic
WS
Miz
an
Sta
tic
WS
Miz
an
Run T
ime (
Min
)
MetisRangeHash
26/34
Effectiveness of Dynamic Vertex Migration
Comparing the performance on highly variable messaging patternalgorithms, which are DMST and ad propagation simulation, ona metis partitioned social graph (LiveJournal1)
0
50
100
150
200
250
300
Advertisment
DMST
Run T
ime (
Min
)
StaticWork Stealing
Mizan
27/34
Mizan’s Overhead with Scaling
0
1
2
3
4
5
6
7
8
0 2 4 6 8 10 12 14 16
Sp
eed
up
Compute Nodes
Mizan - hollywood-2011
Figure: Speedup on Linux Cluster
0
2
4
6
8
10
64
12
8
25
6
51
2
10
24
Sp
eed
up
Compute Nodes
Mizan - Arabic-2005
Figure: Speedup on IBMBlue Gene/P supercomputer
28/34
Future Work
Work with skewed graphs
Improving Pregel’s fault tolerance
29/34
Conclusion
Mizan is a Pregel system that uses fine-grained vertex migrationto load balance computation and communication acrosssupersteps
Mizan is an open source project developed within InfoCloudgroup in KAUST in collaboration with IBM, programmed inC++ with MPI
Mizan scales up to thousands of workers
Mizan improves the overall computation cost between 40% up totwo order of magnitudes with less than 10% migration overhead
Download it at: http://cloud.kaust.edu.sa
Try it on EC2 (ami-52ed743b)
30/34
Extra: Migration Details and Costs
0
0.5
1
1.5
2
2.5
3
3.5
4
5 10 15 20 25 30
Run T
ime (
Min
ute
s)
Supersteps
Superstep RuntimeAverage Workers Runtime
Migration CostBalance Outgoing MessagesBalance Incoming Messages
Balance Response Time
31/34
Extra: Migrated Vertices per Superstep
0
100
200
300
400
500
5 10 15 20 25 30 0
1
2
3
4M
igra
ted V
ert
ices
(x1
00
0)
Mig
rati
on C
ost
(M
inute
s)
Supersteps
Migration CostTotal Migrated VerticesMax Vertices Migrated
by Single Worker
32/34
Extra: Mizan’s Migration Planning
Compute()
No
Yes
No Yes
SuperstepImbalanced?
MigrationBarrier
BSPBarrier
WorkerOverutilized?
Pair withUnderutilized
Worker
Select Vertices
and MigrateIdentify Sourceof Imbalance
33/34
Extra: Architecture of Mizan
Communicator - DHT
Vertex Compute()
BSP Processor
Storage Manager
Migration Planner
HDFS/Local Disks
IO
Mizan Worker
34/34
top related