PowerGraph
TRANSCRIPT
22.06.2015 DIMA – TU Berlin 1
Fachgebiet Datenbanksysteme und Informationsmanagement Technische Universität Berlin
http://www.dima.tu-berlin.de/
Hot Topics in Information Management PowerGraph: Distributed Graph-Parallel
Computation on Natural Graphs
Igor Shevchenko
Mentor: Sebastian Schelter
Agenda
1. Natural Graphs: Properties and Problems;
2. PowerGraph: Vertex Cut and Vertex Programs;
3. GAS Decomposition;
4. Vertex Cut Partitioning;
5. Delta Caching;
6. Applications and Evaluation;
Paper: Gonzalez et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.
■ Natural graphs are graphs derived from real-world or natural phenomena;
■ Graphs are big: billions of vertices and edges and rich metadata;
Natural graphs have
Power-Law Degree Distribution
Natural Graphs
Power-Law Degree Distribution
(Andrei Broder et al. Graph structure in the web)
■ We want to analyze natural graphs;
■ Essential for Data Mining and Machine Learning;
Goal
Identify influential people and information; Identify special nodes and communities; Model complex data dependencies;
Target ads and products; Find communities; Flow scheduling;
■ Existing distributed graph computation systems
perform poorly on natural graphs (Gonzalez et al.
OSDI ’12);
■ The reason is the presence of high-degree vertices;
Problem
High Degree Vertices: Star-like motif
Possible problems with high degree vertices:
■ Limited single-machine resources;
■ Work imbalance;
■ Sequential computation;
■ Communication costs;
■ Graph partitioning;
Applicable to:
■ Hadoop; GraphLab; Pregel (Piccolo);
Problem Continued
■ High degree vertices can exceed the memory capacity of a single machine;
■ Store edge meta-data and adjacency information;
Problem: Limited Single-Machine Resources
■ The power-law degree distribution can lead to significant work imbalance at synchronous barriers;
■ For ex. with synchronous execution (Pregel):
Problem: Work Imbalance
■ No parallelization of individual vertex-programs;
■ Edges are processed sequentially;
■ Locking does not scale well to high degree vertices (for ex. in GraphLab);
Problem: Sequential Computation
Sequentially process edges;
Asynchronous execution requires heavy locking;
■ Generate and send large amounts of identical messages (for ex. in Pregel);
■ This results in communication asymmetry;
Problem: Communication Costs
■ Natural graphs are difficult to partition;
■ Pregel and GraphLab use random (hashed) partitioning on natural graphs thus maximizing the network communication;
Problem: Graph Partitioning
Expected fraction of edges cut: 1 − 1/(number of machines);
Examples:
■ 10 machines: 90% of edges cut;
■ 100 machines: 99% of edges cut;
Problem: Graph Partitioning Continued
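Under random (hashed) placement an edge is cut exactly when its two endpoints land on different machines, which happens with probability 1 − 1/|machines|. A quick numeric check (plain Python, illustrative):

```python
def expected_cut_fraction(machines: int) -> float:
    # An edge survives only if both endpoints hash to the same
    # machine, which happens with probability 1/machines.
    return 1.0 - 1.0 / machines

print(expected_cut_fraction(10))   # 0.9  -> 90% of edges cut
print(expected_cut_fraction(100))  # 0.99 -> 99% of edges cut
```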
■ GraphLab and Pregel are not well suited for computations on natural graphs;
Reasons:
■ Challenges of high-degree vertices;
■ Low quality partitioning;
Solution:
■ PowerGraph: a new abstraction;
In Summary
PowerGraph
Two approaches for partitioning the graph in a distributed environment:
■ Edge Cut;
■ Vertex Cut;
Partition Techniques
■ Used by Pregel and GraphLab abstractions;
■ Evenly assign vertices to machines;
Edge Cut
■ Used by PowerGraph abstraction;
■ Evenly assign edges to machines;
Vertex Cut The strong point of the paper
(Figure: the cut vertex's edges are split evenly, 4 edges per machine)
Think like a Vertex
[Malewicz et al. SIGMOD’10]
User-defined Vertex-Program:
1. Runs on each vertex;
2. Interactions are constrained by graph structure;
Pregel and GraphLab also use this concept, where
parallelism is achieved by running multiple vertex
programs simultaneously;
Vertex Programs
■ Vertex cut distributes a single vertex-program across several machines;
■ Allows the work of high-degree vertices to be parallelized;
GAS Decomposition The strong point of the paper
Generalize the vertex-program into three phases:
1. Gather
Accumulate information about neighborhood;
2. Apply
Apply accumulated value to center vertex;
3. Scatter
Update adjacent edges and vertices;
GAS Decomposition
Gather, Apply and Scatter are user-defined functions;
The strong point of the paper
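As a concrete illustration, PageRank can be phrased in GAS form. The class and method names below are a minimal Python sketch, not the actual PowerGraph (C++) API:

```python
# Illustrative GAS vertex-program for PageRank; names are hypothetical.
class PageRankProgram:
    DAMPING = 0.85

    def gather(self, nbr_rank, nbr_out_degree):
        # Gather: run on each in-edge; collect the neighbor's contribution.
        return nbr_rank / nbr_out_degree

    def sum(self, a, b):
        # Commutative, associative combiner for partial gather results.
        return a + b

    def apply(self, acc):
        # Apply: run exactly once on the center vertex (its master copy).
        return (1.0 - self.DAMPING) + self.DAMPING * acc

    def scatter(self, old_rank, new_rank):
        # Scatter: signal neighbors only if the rank changed noticeably.
        return abs(new_rank - old_rank) > 1e-3

prog = PageRankProgram()
new_rank = prog.apply(prog.sum(0.5, 0.5))  # accumulated contribution of 1.0
```

The engine runs gather over the in-edges in parallel, folds the partial results with sum, and calls apply exactly once; this separation is what lets a single vertex-program be spread across machines.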
■ Executed on the edges in parallel;
■ Accumulate information about neighborhood;
Gather Phase
■ Executed on the central vertex;
■ Apply accumulated value to center vertex;
Apply Phase
■ Executed on the neighboring vertices in parallel;
■ Update adjacent edges and vertices;
Scatter Phase
■ Vertex-programs written using the GAS decomposition automatically scale to several machines;
How does it work?
GAS Decomposition
GAS in a Distributed Environment
■ Case with 2 machines;
GAS in a Distributed Environment
■ Compute partial sums on each machine;
Gather Phase
■ Send partial sum to the master machine;
■ Master machine computes the total sum;
Gather Phase
■ Apply accumulated value to center vertex;
■ Replicate value to the mirrors;
Apply Phase
■ Update adjacent edges and vertices;
■ Initiate neighboring vertex-programs if necessary;
Scatter Phase
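The master/mirror walkthrough above can be condensed into a toy simulation (plain Python; the function name and structure are illustrative, not the actual system):

```python
# Toy simulation of the master/mirror protocol for one high-degree
# vertex whose edges are split across several machines.
def distributed_gather_apply(partitions, combine, apply_fn):
    # Gather: each machine folds its local edge values into a partial sum.
    partials = []
    for local_values in partitions:  # assumes each partition is non-empty
        acc = local_values[0]
        for value in local_values[1:]:
            acc = combine(acc, value)
        partials.append(acc)

    # Only the partial sums travel over the network to the master,
    # which combines them into the total accumulator.
    total = partials[0]
    for p in partials[1:]:
        total = combine(total, p)

    # Apply on the master, then replicate the new value to the mirrors.
    new_value = apply_fn(total)
    return [new_value for _ in partitions]  # one replica per machine

# Machine 0 holds edge values [1, 2]; machine 1 holds [3, 4].
replicas = distributed_gather_apply([[1, 2], [3, 4]], lambda a, b: a + b, lambda s: s)
print(replicas)  # [10, 10]
```

Note that only one partial value per machine crosses the network, regardless of how many edges each machine holds.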
■ During the Gather Phase the partial results are
combined using commutative and associative
user-defined SUM operation;
■ Examples:
sum(a, b): return a + b
sum(a, b): return union(a, b)
sum(a, b): return min(a, b)
■ Also a requirement for Pregel combiners;
■ What if not commutative and associative?
SUM Operation
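A quick property check (illustrative Python) shows why these combiners admit machine-local partial sums, and why a non-commutative operation such as subtraction does not:

```python
# The three example combiners from the slide; each is commutative and
# associative, so partial results can be folded in any order and grouping.
combiners = {
    "add":   lambda a, b: a + b,
    "union": lambda a, b: a | b,  # set union
    "min":   min,
}

def is_commutative_associative(f, samples):
    # Brute-force check over a small sample of values.
    return all(f(a, b) == f(b, a) and f(f(a, b), c) == f(a, f(b, c))
               for a in samples for b in samples for c in samples)

print(is_commutative_associative(combiners["add"], [1, 2, 3]))          # True
print(is_commutative_associative(combiners["union"], [set(), {1}, {2}]))  # True
print(is_commutative_associative(combiners["min"], [1, 2, 3]))          # True
print(is_commutative_associative(lambda a, b: a - b, [1, 2, 3]))        # False
```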
■ If the sum is not commutative and associative:
■ Send each edge's data to the master machine;
■ This increases the amount of communication in the Gather Phase:
Gather Phase: no partial sums
Vertex Cut Partitioning
The strong point of the paper
Three distributed approaches for Vertex Cut:
■ Random Edge Placement;
■ Coordinated Greedy Edge Placement;
■ Oblivious Greedy Edge Placement;
Vertex Cut Partitioning
Minimizing the number of machines spanned by each vertex = minimizing communication and storage overhead;
■ Randomly assign edges to machines;
■ Edge data is uniquely assigned to one machine;
Random Edge Placement
■ Only 3 network communication channels;
■ Can predict network communication usage;
■ Significantly less communication compared to the Edge Cut placement;
■ Can improve upon random placement!
Communication Overhead
■ Place edges on machines that already hold one of the edge's vertices;
Greedy Edge Placement
■ If several choices are possible, assign to the least loaded machine;
Greedy Edge Placement
■ Greedy Edge Placement is a de-randomization of random placement;
■ Minimizes the number of machines spanned;
Coordinated Greedy Edge Placement:
■ Requires coordination to place each edge;
■ Maintains global distributed placement table;
■ Slower but produces higher quality cuts;
Oblivious Greedy Edge Placement:
■ Approx. greedy objective without coordination;
■ Faster but produces lower quality cuts;
Greedy Edge Placement
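A minimal sketch of the greedy heuristic (simplified from the paper's case analysis; all names are illustrative). It prefers machines that already host an endpoint of the edge and breaks ties by load:

```python
# Simplified greedy edge placement: favor machines already hosting an
# endpoint, pick the least loaded machine among the candidates.
def greedy_place(edges, num_machines):
    load = [0] * num_machines
    hosts = {}       # vertex -> set of machines already holding a replica
    placement = {}   # edge -> machine
    for (u, v) in edges:
        # Prefer machines hosting both endpoints, then either endpoint.
        candidates = hosts.get(u, set()) & hosts.get(v, set())
        if not candidates:
            candidates = hosts.get(u, set()) | hosts.get(v, set())
        if not candidates:
            candidates = set(range(num_machines))
        # Break ties by current edge load.
        m = min(candidates, key=lambda i: load[i])
        placement[(u, v)] = m
        load[m] += 1
        hosts.setdefault(u, set()).add(m)
        hosts.setdefault(v, set()).add(m)
    return placement

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]
placement = greedy_place(edges, 2)
print(placement)  # the a-b-c triangle shares one machine; (d, e) balances load
```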
■ Twitter Follower Graph: 41M vertices, 1.4B edges;
■ Oblivious Greedy Edge Placement balances cost (replication factor) and construction time;
Vertex Cut Partitioning: Comparison
■ Greedy Edge Placement improves computation performance;
Vertex Cut Partitioning: Comparison
Delta Caching
Execution Modes
■ A vertex-program can be triggered by a change in only a few of its neighbors;
■ The Gather Phase will nevertheless accumulate information over the entire neighborhood;
Delta Caching The strong point of the paper
■ Accelerate the process by caching the neighborhood accumulator from the previous Gather Phase;
Delta Caching The strong point of the paper
Delta Caching can speed up:
■ Gather Phase;
■ Scatter Phase;
Requires an Abelian group:
■ sum (+): commutative and associative;
■ inverse (−);
Examples:
■ PageRank – applicable;
■ Graph Coloring – not applicable;
Delta Caching
The strong point of the paper
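For an Abelian accumulator, a changed neighbor can patch the cached value with a delta instead of forcing a full re-gather; a toy sketch (illustrative names, assuming an additive accumulator such as PageRank contributions):

```python
# Delta caching for an additive accumulator: the scatter side sends a
# delta (requires an inverse, -, as well as a sum, +) that patches the
# cached accumulator without touching the other neighbors.
class CachedVertex:
    def __init__(self, contributions):
        # Full gather once; cache the accumulated sum.
        self.cache = sum(contributions)

    def on_neighbor_change(self, old_contribution, new_contribution):
        # Patch the cache with the delta instead of re-gathering.
        self.cache += new_contribution - old_contribution
        return self.cache

v = CachedVertex([0.2, 0.3, 0.5])      # cached accumulator = 1.0
print(v.on_neighbor_change(0.3, 0.4))  # ~1.1, only one neighbor touched
```

Graph Coloring cannot use this shortcut because its accumulator (the set of neighbor colors) has no inverse operation.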
Supports three execution modes:
■ Synchronous: Bulk-Synchronous GAS Phases;
■ Asynchronous: Interleave GAS Phases;
■ Asynchronous Serializable: Prevent neighboring vertices from running simultaneously;
Different tradeoffs:
■ Algorithm performance;
■ System performance;
■ Determinism;
Execution Modes
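The synchronous mode can be pictured as a loop of barrier-separated GAS supersteps; a toy single-machine sketch (illustrative only, ignoring distribution, scheduling, and the scatter phase):

```python
# Minimal bulk-synchronous GAS engine: all active vertices run Gather,
# then Apply, with a conceptual global barrier between phases.
def synchronous_engine(graph, values, gather, combine, apply_fn, steps):
    # graph: {vertex: [in-neighbors]}; assumes every vertex has >= 1 neighbor.
    for _ in range(steps):  # each iteration is one superstep
        # Gather phase (conceptually parallel; barrier at the end).
        accs = {}
        for v, nbrs in graph.items():
            parts = [gather(values[u]) for u in nbrs]
            acc = parts[0]
            for p in parts[1:]:
                acc = combine(acc, p)
            accs[v] = acc
        # Apply phase: all vertices update from their accumulators at once.
        values = {v: apply_fn(accs[v]) for v in graph}
    return values

# Toy run: each vertex adopts its single neighbor's value in one superstep.
graph = {"a": ["b"], "b": ["a"]}
result = synchronous_engine(graph, {"a": 1.0, "b": 3.0},
                            gather=lambda x: x,
                            combine=lambda p, q: p + q,
                            apply_fn=lambda acc: acc,
                            steps=1)
print(result)  # {'a': 3.0, 'b': 1.0}
```

The asynchronous modes drop the barriers and interleave the phases, trading determinism for system performance.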
Evaluation
PowerGraph on natural graphs shows:
■ Reduced network communication;
■ Reduced runtime;
■ Reduced storage;
On many examples
Evaluation
PageRank on the Twitter Follower Graph (41M vertices, 1.4 billion edges)
■ Collaborative Filtering
Alternating Least Squares
Stochastic Gradient Descent
SVD
Non-negative MF
■ Statistical Inference
Loopy Belief Propagation
Max-Product Linear Programs
Gibbs Sampling
Applicability
■ Graph Analytics
PageRank
Triangle Counting
Shortest Path
Graph Coloring
K-core Decomposition
■ Computer Vision
Image stitching
■ Language Modeling
LDA
■ Vertex Cut;
■ GAS Decomposition;
■ Delta Caching;
■ Three modes of execution;
Synchronous;
Asynchronous;
Asynchronous + Serializable;
Strong Points of the Paper
■ “In all cases the system is entirely symmetric with no single coordinating instance or scheduler”;
How do they deal with Synchronous execution?
Evaluation mess:
■ Evaluated Synchronous execution using PageRank;
■ Evaluated Asynchronous execution using Graph Coloring;
■ Evaluated Asynchronous+Serializable execution using Graph Coloring;
■ Compared PowerGraph with published results using PageRank and Triangle Counting, but not Graph Coloring;
■ Oblivious Greedy Edge Placement is poorly explained;
Weak Points of the Paper
■ Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, Carlos Guestrin. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2012);
■ Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J., Horn, I., Leiser, N., and Czajkowski, G. Pregel: a system for large-scale graph processing. In SIGMOD (2010).
■ Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., and Hellerstein, J. M. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. in PVLDB (2012).
■ http://graphlab.org
References
Questions?
1. Natural Graphs: Properties and Problems;
2. PowerGraph: Vertex Cut and Vertex Programs;
3. GAS Decomposition;
4. Vertex Cut Partitioning;
5. Delta Caching;
6. Applications and Evaluation;
Paper: Gonzalez et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.