TRANSCRIPT
A Framework for Processing Large Graphs in Shared Memory
Julian Shun, UC Berkeley (Miller Institute, AMPLab). Based on joint work with Guy Blelloch, Laxman Dhulipala, Kimon Fountoulakis, Michael Mahoney, and Farbod Roosta-Khorasani.
Graphs are everywhere!
• Can contain up to billions of vertices and edges
• Need a simple, efficient, and scalable way to analyze them
Ligra Graph Processing Framework
Core operators: EdgeMap, VertexMap
Example applications: Breadth-first search, Betweenness centrality, Connected components, Triangle counting, K-core decomposition, Maximal independent set, Single-source shortest paths, Eccentricity estimation, (Personalized) PageRank, Local graph clustering, Biconnected components, Collaborative filtering, …
Simplicity, Performance, Scalability
Graph Processing Systems
• Existing: Pregel/Giraph/GPS, GraphLab, PRISM, Pegasus, Knowledge Discovery Toolbox, GraphChi, GraphX, and many others…
• Our system: Ligra - Lightweight graph processing system for shared memory
Takes advantage of “frontier-based” nature of many algorithms (active set is dynamic and often small)
Breadth-first Search (BFS)
• Compute a BFS tree rooted at source r containing all vertices reachable from r
• Can process each level in parallel
• Challenges: race conditions, load balancing
Applications: Betweenness centrality, Eccentricity estimation, Maximum flow, Web crawlers, Network broadcasting, Cycle detection, …
Steps for Graph Traversal
• Operate on a subset of vertices
• Map computation over subset of edges in parallel
• Return new subset of vertices
• Map computation over subset of vertices in parallel
Core abstractions: Graph, VertexSubset, EdgeMap, VertexMap
Think with flat data-parallel operators
We built the Ligra abstraction for these kinds of computations
Optimizations: • hybrid traversal • graph compression
Many graph traversal algorithms do this!
Shared memory parallelism
Breadth-first Search in Ligra

parents = {-1, …, -1};  //-1 indicates "unexplored"
procedure UPDATE(s, d):
    return compare_and_swap(parents[d], -1, s);
procedure COND(v):
    return parents[v] == -1;  //checks if "unexplored"
procedure BFS(G, r):
    parents[r] = r;
    frontier = {r};  //VertexSubset
    while (size(frontier) > 0):
        frontier = EDGEMAP(G, frontier, UPDATE, COND);
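The pseudocode above can be sketched as a small runnable program. This is a simplified sequential sketch, not Ligra's actual code: the `Graph` adjacency-list type and the explicit loops are stand-ins, and real Ligra processes the frontier's edges in parallel, which is why UPDATE must use compare_and_swap.

```cpp
#include <atomic>
#include <utility>
#include <vector>

// Simplified stand-in for Ligra's graph: out-neighbor adjacency lists.
using Graph = std::vector<std::vector<int>>;

// Frontier-based BFS following the pseudocode: parents[v] == -1 means
// "unexplored". The edge loop is sequential here; real Ligra runs it in
// parallel, so UPDATE must claim a vertex atomically.
std::vector<int> bfs(const Graph& G, int r) {
    std::vector<std::atomic<int>> parents(G.size());
    for (auto& p : parents) p.store(-1);
    parents[r].store(r);
    std::vector<int> frontier = {r};        // VertexSubset
    while (!frontier.empty()) {
        std::vector<int> next;
        for (int s : frontier) {            // EdgeMap over frontier's out-edges
            for (int d : G[s]) {
                int expected = -1;          // COND: only "unexplored" targets
                // UPDATE: compare_and_swap(parents[d], -1, s)
                if (parents[d].compare_exchange_strong(expected, s))
                    next.push_back(d);      // d joins the next frontier
            }
        }
        frontier = std::move(next);
    }
    std::vector<int> result;
    for (auto& p : parents) result.push_back(p.load());
    return result;
}
```

On the graph {{1,2},{3},{},{}} rooted at 0, this yields parents {0, 0, 0, 1}.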
Actual BFS code in Ligra
Sparse or Dense EdgeMap?
Frontier
• Dense method better when frontier is large and many vertices have been visited
• Sparse (traditional) method better for small frontiers
• Switch between the two methods based on frontier size [Beamer et al. SC '12]
Limited to BFS?
EdgeMap
Sparse: loop through outgoing edges of frontier vertices in parallel.
Dense: loop through incoming edges of "unexplored" vertices (in parallel), breaking early if possible.

procedure EDGEMAP(G, frontier, Update, Cond):
    if (size(frontier) + sum of out-degrees > threshold) then:
        return EDGEMAP_DENSE(G, frontier, Update, Cond);
    else:
        return EDGEMAP_SPARSE(G, frontier, Update, Cond);
• More general than just BFS!
• Generalized to many other problems
• Users need not worry about this
Frontier-based approach enables sparse/dense traversal
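The sparse/dense switch can be made concrete with a sketch. This is a simplified sequential version under assumed types: a Graph holding both out- and in-neighbor lists (the dense method needs in-edges), a VertexSubset as a plain vector, and illustrative function names; real Ligra parallelizes both traversal methods.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Simplified graph storing both edge directions; num_edges feeds the
// |E|/20 threshold. All names here are stand-ins, not Ligra's actual API.
struct Graph {
    std::vector<std::vector<int>> out_nbrs, in_nbrs;
    std::size_t num_edges = 0;
};
using VertexSubset = std::vector<int>;
using UpdateFn = std::function<bool(int, int)>;
using CondFn = std::function<bool(int)>;

// Sparse: loop through outgoing edges of frontier vertices.
VertexSubset edgemap_sparse(const Graph& G, const VertexSubset& frontier,
                            const UpdateFn& update, const CondFn& cond) {
    VertexSubset next;
    for (int s : frontier)
        for (int d : G.out_nbrs[s])
            if (cond(d) && update(s, d)) next.push_back(d);
    return next;
}

// Dense: loop through incoming edges of vertices satisfying cond, breaking
// early once the vertex has been added to the output.
VertexSubset edgemap_dense(const Graph& G, const VertexSubset& frontier,
                           const UpdateFn& update, const CondFn& cond) {
    std::vector<bool> in_frontier(G.in_nbrs.size(), false);
    for (int v : frontier) in_frontier[v] = true;
    VertexSubset next;
    for (std::size_t d = 0; d < G.in_nbrs.size(); d++) {
        if (!cond(static_cast<int>(d))) continue;
        for (int s : G.in_nbrs[d]) {
            if (in_frontier[s] && update(s, static_cast<int>(d))) {
                next.push_back(static_cast<int>(d));
                break;  // early exit: d is already in the next frontier
            }
        }
    }
    return next;
}

// Pick a representation from the frontier size plus its out-degrees.
VertexSubset edgemap(const Graph& G, const VertexSubset& frontier,
                     const UpdateFn& update, const CondFn& cond) {
    std::size_t work = frontier.size();
    for (int s : frontier) work += G.out_nbrs[s].size();
    if (work > G.num_edges / 20)  // default threshold of |E|/20
        return edgemap_dense(G, frontier, update, cond);
    return edgemap_sparse(G, frontier, update, cond);
}
```

Both methods return the same VertexSubset; only the traversal cost differs, which is why the switch can stay hidden from users.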
[Figure: 40-core running time (seconds) on the Twitter graph (41M vertices, 1.5B edges) for BFS, Betweenness Centrality, Connected Components, and Eccentricity Estimation, comparing Dense-only, Sparse-only, and hybrid Sparse/Dense EdgeMap (using the default threshold of |E|/20); bars exceeding the axis are labeled 30.7 and 20.7.]
PageRank
VertexMap
[Figure: VERTEXMAP applied to the VertexSubset {0, 4, 6, 8} with the function
bool f(v) { data[v] = data[v] + 1; return (data[v] == 1); }
returning a new VertexSubset containing the vertices for which f returned true.]
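A minimal sketch of the VertexMap primitive, assuming a VertexSubset is just a vector of vertex ids; real Ligra applies f to the subset in parallel.

```cpp
#include <functional>
#include <vector>

using VertexSubset = std::vector<int>;

// Sketch of VertexMap: apply f to every vertex in the subset (real Ligra
// does this in parallel) and return the vertices where f returned true.
VertexSubset vertexmap(const VertexSubset& U, const std::function<bool(int)>& f) {
    VertexSubset out;
    for (int v : U)
        if (f(v)) out.push_back(v);
    return out;
}
```

With the slide's f over zero-initialized data, the first application over {0, 4, 6, 8} returns all four vertices (each data entry becomes 1), and a second application returns none.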
PageRank in Ligra

p_curr = {1/|V|, …, 1/|V|}; p_next = {0, …, 0}; diff = {}; error = ∞;
procedure UPDATE(s, d):
    atomic_increment(p_next[d], p_curr[s] / degree(s));
    return 1;
procedure COMPUTE(i):
    p_next[i] = α · p_next[i] + (1 − α) · (1/|V|);
    diff[i] = abs(p_next[i] − p_curr[i]);
    p_curr[i] = 0;
    return 1;
procedure PageRank(G, α, ε):
    frontier = {0, …, |V|−1};
    while (error > ε):
        frontier = EDGEMAP(G, frontier, UPDATE, CONDtrue);
        frontier = VERTEXMAP(frontier, COMPUTE);
        error = sum of diff entries;
        swap(p_curr, p_next);
    return p_curr;
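The PageRank loop above can be sketched sequentially. This sketch replaces the parallel EdgeMap/VertexMap with plain loops and assumes every vertex has at least one outgoing edge, so degree(s) is never zero; it is an illustration, not Ligra's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Stand-in graph: out-neighbor adjacency lists.
using Graph = std::vector<std::vector<int>>;

std::vector<double> pagerank(const Graph& G, double alpha, double eps) {
    std::size_t n = G.size();
    std::vector<double> p_curr(n, 1.0 / n), p_next(n, 0.0);
    double error = std::numeric_limits<double>::infinity();
    while (error > eps) {
        // EdgeMap with UPDATE: add p_curr[s] / degree(s) into p_next[d]
        for (std::size_t s = 0; s < n; s++)
            for (int d : G[s])
                p_next[d] += p_curr[s] / G[s].size();
        // VertexMap with COMPUTE: damp, accumulate |diff|, reset p_curr
        error = 0.0;
        for (std::size_t i = 0; i < n; i++) {
            p_next[i] = alpha * p_next[i] + (1.0 - alpha) / n;
            error += std::fabs(p_next[i] - p_curr[i]);
            p_curr[i] = 0.0;
        }
        std::swap(p_curr, p_next);  // p_next is now the zeroed array
    }
    return p_curr;
}
```

On the two-vertex cycle {{1},{0}} the ranks converge to 0.5 each.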
PageRank
• Sparse version?
• PageRank-Delta: only update vertices whose PageRank value has changed by more than some Δ-fraction (discussed in GraphLab and McSherry WWW '05)
PageRank-Delta in Ligra

PR = {1/|V|, …, 1/|V|}; nghSum = {0, …, 0}; Change = {}; //store changes in PageRank values
procedure UPDATE(s, d): //passed to EdgeMap
    atomic_increment(nghSum[d], Change[s] / degree(s));
    return 1;
procedure COMPUTE(i): //passed to VertexMap
    Change[i] = α · nghSum[i];
    PR[i] = PR[i] + Change[i];
    return (abs(Change[i]) > Δ); //check if absolute value of change is big enough
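The PageRank-Delta procedures above can be sketched as a sequential loop. The driver loop, the seeding of Change with the initial mass 1/|V|, and the iteration cap are assumptions filling in parts the slide leaves out; the key behavior shown is the shrinking frontier, since COMPUTE keeps a vertex active only when its change exceeds Δ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in graph: out-neighbor adjacency lists (every vertex assumed to
// have outgoing edges so degree(s) is never zero).
using Graph = std::vector<std::vector<int>>;

std::vector<double> pagerank_delta(const Graph& G, double alpha, double delta,
                                   int max_iters) {
    std::size_t n = G.size();
    std::vector<double> pr(n, 1.0 / n), change(n, 1.0 / n), ngh_sum(n, 0.0);
    std::vector<int> frontier(n);
    for (std::size_t i = 0; i < n; i++) frontier[i] = static_cast<int>(i);
    for (int it = 0; it < max_iters && !frontier.empty(); it++) {
        std::fill(ngh_sum.begin(), ngh_sum.end(), 0.0);
        // UPDATE: push each frontier vertex's change to its out-neighbors
        for (int s : frontier)
            for (int d : G[s])
                ngh_sum[d] += change[s] / G[s].size();
        // COMPUTE: new change; a vertex stays active only if it moved enough
        std::vector<int> next;
        for (std::size_t i = 0; i < n; i++) {
            change[i] = alpha * ngh_sum[i];
            pr[i] += change[i];
            if (std::fabs(change[i]) > delta) next.push_back(static_cast<int>(i));
        }
        frontier = std::move(next);
    }
    return pr;
}
```

On a symmetric graph such as the two-vertex cycle, the two values remain equal throughout; note this sketch leaves PR unnormalized.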
Performance of Ligra
Ligra BFS Performance
[Figure: BFS running time (seconds) and lines of code on the Twitter graph (41M vertices, 1.5B edges), comparing Ligra (40-core machine) against hand-written Cilk/OpenMP (40-core machine).]
• Comparing against direction-optimizing code by Beamer et al.
Ligra PageRank Performance
[Figure: PageRank (1 iteration) running time (seconds) and lines of code on the Twitter graph (41M vertices, 1.5B edges), comparing GraphLab (64 x 8-cores), GraphLab (40-core machine), Ligra (40-core machine), and hand-written Cilk/OpenMP (40-core machine).]
• Easy to implement “sparse” version of PageRank in Ligra
Ligra Connected Components Performance
[Figure: Connected Components running time (seconds) and lines of code on the Twitter graph (41M vertices, 1.5B edges), comparing GraphLab (16 x 8-cores), GraphLab (40-core machine), Ligra (40-core machine), and hand-written Cilk/OpenMP (40-core machine); lines-of-code bars labeled 120 and 244.]
• Performance close to hand-written code
• Faster than existing high-level frameworks at the time
• Shared-memory graph processing can be very efficient
• Requires graph to fit on a single machine
Large Graphs
6.6 billion edges
128 billion edges
~1 trillion edges [VLDB 2015]
• Most can fit on a commodity shared-memory machine (e.g., on Amazon EC2)
Ongoing Work: Local Graph Clustering
Graph clustering (community detection): local algorithms only touch a small part of the graph.
• Significant savings if output cluster is much smaller than entire graph
• We parallelized local algorithms using Ligra
• Algorithms are frontier-based, processing small active sets on each iteration
• Ligra is crucial to getting local running time
Ligra Summary
Core abstractions: VertexSubset, EdgeMap, VertexMap
Example applications: Breadth-first search, Betweenness centrality, Connected components, Triangle counting, K-core decomposition, Maximal independent set, Single-source shortest paths, Eccentricity estimation, (Personalized) PageRank, Local graph clustering, Biconnected components, Collaborative filtering, …
Simplicity, Performance, Scalability
Optimizations: hybrid traversal and graph compression
Thank you!
Code: https://github.com/jshun/ligra/