TRANSCRIPT
A Framework for Processing Large Graphs in Shared Memory
Julian Shun, UC Berkeley (Miller Institute, AMPLab). Based on joint work with Guy Blelloch, Laxman Dhulipala, Kimon Fountoulakis, Michael Mahoney, and Farbod Roosta-Khorasani.
Graphs are everywhere!
• Can contain up to billions of vertices and edges
• Need a simple, efficient, and scalable way to analyze them
Ligra Graph Processing Framework
Core operators: EdgeMap, VertexMap
Example applications: Breadth-first search, Betweenness centrality, Connected components, Triangle counting, K-core decomposition, Maximal independent set, Single-source shortest paths, Eccentricity estimation, (Personalized) PageRank, Local graph clustering, Biconnected components, Collaborative filtering, …
Simplicity, Performance, Scalability
Graph Processing Systems
• Existing: Pregel/Giraph/GPS, GraphLab, PRISM, Pegasus, Knowledge Discovery Toolbox, GraphChi, GraphX, and many others…
• Our system: Ligra - Lightweight graph processing system for shared memory
Takes advantage of “frontier-based” nature of many algorithms (active set is dynamic and often small)
Breadth-first Search (BFS)
• Compute a BFS tree rooted at source r containing all vertices reachable from r
• Can process each level in parallel
• Challenges: race conditions, load balancing
Applications: Betweenness centrality, Eccentricity estimation, Maximum flow, Web crawlers, Network broadcasting, Cycle detection, …
Steps for Graph Traversal
• Operate on a subset of vertices
• Map computation over subset of edges in parallel
• Return new subset of vertices
• Map computation over subset of vertices in parallel
Core abstractions: Graph, VertexSubset, EdgeMap, VertexMap
Think with flat data-parallel operators
We built the Ligra abstraction for these kinds of computations
Optimizations: • hybrid traversal • graph compression
Many graph traversal algorithms do this!
Shared memory parallelism
Breadth-first Search in Ligra

parents = {-1, …, -1};  //-1 indicates "unexplored"
procedure UPDATE(s, d):
    return compare_and_swap(parents[d], -1, s);
procedure COND(v):
    return parents[v] == -1;  //checks if "unexplored"
procedure BFS(G, r):
    parents[r] = r;
    frontier = {r};  //VertexSubset
    while (size(frontier) > 0):
        frontier = EDGEMAP(G, frontier, UPDATE, COND);
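The pseudocode above can be sketched as a small runnable program. This is a simplified sequential sketch, not Ligra's actual code: the `Graph` adjacency-list type and the explicit loops are stand-ins, and real Ligra processes the frontier's edges in parallel, which is why UPDATE must use compare_and_swap.

```cpp
#include <atomic>
#include <utility>
#include <vector>

// Simplified stand-in for Ligra's graph: out-neighbor adjacency lists.
using Graph = std::vector<std::vector<int>>;

// Frontier-based BFS following the pseudocode: parents[v] == -1 means
// "unexplored". The edge loop is sequential here; real Ligra runs it in
// parallel, so UPDATE must claim a vertex atomically.
std::vector<int> bfs(const Graph& G, int r) {
    std::vector<std::atomic<int>> parents(G.size());
    for (auto& p : parents) p.store(-1);
    parents[r].store(r);
    std::vector<int> frontier = {r};        // VertexSubset
    while (!frontier.empty()) {
        std::vector<int> next;
        for (int s : frontier) {            // EdgeMap over frontier's out-edges
            for (int d : G[s]) {
                int expected = -1;          // COND: only "unexplored" targets
                // UPDATE: compare_and_swap(parents[d], -1, s)
                if (parents[d].compare_exchange_strong(expected, s))
                    next.push_back(d);      // d joins the next frontier
            }
        }
        frontier = std::move(next);
    }
    std::vector<int> result;
    for (auto& p : parents) result.push_back(p.load());
    return result;
}
```

On the graph {{1,2},{3},{},{}} rooted at 0, this yields parents {0, 0, 0, 1}.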
Actual BFS code in Ligra
Sparse or Dense EdgeMap?
Frontier
• Dense method better when frontier is large and many vertices have been visited
• Sparse (traditional) method better for small frontiers
• Switch between the two methods based on frontier size [Beamer et al. SC '12]
Limited to BFS?
EdgeMap
Sparse: loop through outgoing edges of frontier vertices in parallel.
Dense: loop through incoming edges of "unexplored" vertices (in parallel), breaking early if possible.

procedure EDGEMAP(G, frontier, Update, Cond):
    if (size(frontier) + sum of out-degrees > threshold) then:
        return EDGEMAP_DENSE(G, frontier, Update, Cond);
    else:
        return EDGEMAP_SPARSE(G, frontier, Update, Cond);
• More general than just BFS!
• Generalized to many other problems
• Users need not worry about this
Frontier-based approach enables sparse/dense traversal
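The sparse/dense switch can be made concrete with a sketch. This is a simplified sequential version under assumed types: a Graph holding both out- and in-neighbor lists (the dense method needs in-edges), a VertexSubset as a plain vector, and illustrative function names; real Ligra parallelizes both traversal methods.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Simplified graph storing both edge directions; num_edges feeds the
// |E|/20 threshold. All names here are stand-ins, not Ligra's actual API.
struct Graph {
    std::vector<std::vector<int>> out_nbrs, in_nbrs;
    std::size_t num_edges = 0;
};
using VertexSubset = std::vector<int>;
using UpdateFn = std::function<bool(int, int)>;
using CondFn = std::function<bool(int)>;

// Sparse: loop through outgoing edges of frontier vertices.
VertexSubset edgemap_sparse(const Graph& G, const VertexSubset& frontier,
                            const UpdateFn& update, const CondFn& cond) {
    VertexSubset next;
    for (int s : frontier)
        for (int d : G.out_nbrs[s])
            if (cond(d) && update(s, d)) next.push_back(d);
    return next;
}

// Dense: loop through incoming edges of vertices satisfying cond, breaking
// early once the vertex has been added to the output.
VertexSubset edgemap_dense(const Graph& G, const VertexSubset& frontier,
                           const UpdateFn& update, const CondFn& cond) {
    std::vector<bool> in_frontier(G.in_nbrs.size(), false);
    for (int v : frontier) in_frontier[v] = true;
    VertexSubset next;
    for (std::size_t d = 0; d < G.in_nbrs.size(); d++) {
        if (!cond(static_cast<int>(d))) continue;
        for (int s : G.in_nbrs[d]) {
            if (in_frontier[s] && update(s, static_cast<int>(d))) {
                next.push_back(static_cast<int>(d));
                break;  // early exit: d is already in the next frontier
            }
        }
    }
    return next;
}

// Pick a representation from the frontier size plus its out-degrees.
VertexSubset edgemap(const Graph& G, const VertexSubset& frontier,
                     const UpdateFn& update, const CondFn& cond) {
    std::size_t work = frontier.size();
    for (int s : frontier) work += G.out_nbrs[s].size();
    if (work > G.num_edges / 20)  // default threshold of |E|/20
        return edgemap_dense(G, frontier, update, cond);
    return edgemap_sparse(G, frontier, update, cond);
}
```

Both methods return the same VertexSubset; only the traversal cost differs, which is why the switch can stay hidden from users.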
[Figure: 40-core running time (seconds) on the Twitter graph (41M vertices, 1.5B edges) for BFS, Betweenness Centrality, Connected Components, and Eccentricity Estimation, comparing Dense-only, Sparse-only, and hybrid Sparse/Dense EdgeMap (using the default threshold of |E|/20); bars exceeding the axis are labeled 30.7 and 20.7.]
PageRank
VertexMap
[Figure: VERTEXMAP applied to the VertexSubset {0, 4, 6, 8} with the function
bool f(v) { data[v] = data[v] + 1; return (data[v] == 1); }
returning a new VertexSubset containing the vertices for which f returned true.]
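A minimal sketch of the VertexMap primitive, assuming a VertexSubset is just a vector of vertex ids; real Ligra applies f to the subset in parallel.

```cpp
#include <functional>
#include <vector>

using VertexSubset = std::vector<int>;

// Sketch of VertexMap: apply f to every vertex in the subset (real Ligra
// does this in parallel) and return the vertices where f returned true.
VertexSubset vertexmap(const VertexSubset& U, const std::function<bool(int)>& f) {
    VertexSubset out;
    for (int v : U)
        if (f(v)) out.push_back(v);
    return out;
}
```

With the slide's f over zero-initialized data, the first application over {0, 4, 6, 8} returns all four vertices (each data entry becomes 1), and a second application returns none.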
PageRank in Ligra

p_curr = {1/|V|, …, 1/|V|}; p_next = {0, …, 0}; diff = {}; error = ∞;
procedure UPDATE(s, d):
    atomic_increment(p_next[d], p_curr[s] / degree(s));
    return 1;
procedure COMPUTE(i):
    p_next[i] = α · p_next[i] + (1 − α) · (1/|V|);
    diff[i] = abs(p_next[i] − p_curr[i]);
    p_curr[i] = 0;
    return 1;
procedure PageRank(G, α, ε):
    frontier = {0, …, |V|−1};
    while (error > ε):
        frontier = EDGEMAP(G, frontier, UPDATE, CONDtrue);
        frontier = VERTEXMAP(frontier, COMPUTE);
        error = sum of diff entries;
        swap(p_curr, p_next);
    return p_curr;
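The PageRank loop above can be sketched sequentially. This sketch replaces the parallel EdgeMap/VertexMap with plain loops and assumes every vertex has at least one outgoing edge, so degree(s) is never zero; it is an illustration, not Ligra's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Stand-in graph: out-neighbor adjacency lists.
using Graph = std::vector<std::vector<int>>;

std::vector<double> pagerank(const Graph& G, double alpha, double eps) {
    std::size_t n = G.size();
    std::vector<double> p_curr(n, 1.0 / n), p_next(n, 0.0);
    double error = std::numeric_limits<double>::infinity();
    while (error > eps) {
        // EdgeMap with UPDATE: add p_curr[s] / degree(s) into p_next[d]
        for (std::size_t s = 0; s < n; s++)
            for (int d : G[s])
                p_next[d] += p_curr[s] / G[s].size();
        // VertexMap with COMPUTE: damp, accumulate |diff|, reset p_curr
        error = 0.0;
        for (std::size_t i = 0; i < n; i++) {
            p_next[i] = alpha * p_next[i] + (1.0 - alpha) / n;
            error += std::fabs(p_next[i] - p_curr[i]);
            p_curr[i] = 0.0;
        }
        std::swap(p_curr, p_next);  // p_next is now the zeroed array
    }
    return p_curr;
}
```

On the two-vertex cycle {{1},{0}} the ranks converge to 0.5 each.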
PageRank
• Sparse version?
• PageRank-Delta: only update vertices whose PageRank value has changed by more than some Δ-fraction (discussed in GraphLab and McSherry WWW '05)
PageRank-Delta in Ligra

PR = {1/|V|, …, 1/|V|}; nghSum = {0, …, 0}; Change = {}; //store changes in PageRank values
procedure UPDATE(s, d): //passed to EdgeMap
    atomic_increment(nghSum[d], Change[s] / degree(s));
    return 1;
procedure COMPUTE(i): //passed to VertexMap
    Change[i] = α · nghSum[i];
    PR[i] = PR[i] + Change[i];
    return (abs(Change[i]) > Δ); //check if absolute value of change is big enough
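The PageRank-Delta procedures above can be sketched as a sequential loop. The driver loop, the seeding of Change with the initial mass 1/|V|, and the iteration cap are assumptions filling in parts the slide leaves out; the key behavior shown is the shrinking frontier, since COMPUTE keeps a vertex active only when its change exceeds Δ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in graph: out-neighbor adjacency lists (every vertex assumed to
// have outgoing edges so degree(s) is never zero).
using Graph = std::vector<std::vector<int>>;

std::vector<double> pagerank_delta(const Graph& G, double alpha, double delta,
                                   int max_iters) {
    std::size_t n = G.size();
    std::vector<double> pr(n, 1.0 / n), change(n, 1.0 / n), ngh_sum(n, 0.0);
    std::vector<int> frontier(n);
    for (std::size_t i = 0; i < n; i++) frontier[i] = static_cast<int>(i);
    for (int it = 0; it < max_iters && !frontier.empty(); it++) {
        std::fill(ngh_sum.begin(), ngh_sum.end(), 0.0);
        // UPDATE: push each frontier vertex's change to its out-neighbors
        for (int s : frontier)
            for (int d : G[s])
                ngh_sum[d] += change[s] / G[s].size();
        // COMPUTE: new change; a vertex stays active only if it moved enough
        std::vector<int> next;
        for (std::size_t i = 0; i < n; i++) {
            change[i] = alpha * ngh_sum[i];
            pr[i] += change[i];
            if (std::fabs(change[i]) > delta) next.push_back(static_cast<int>(i));
        }
        frontier = std::move(next);
    }
    return pr;
}
```

On a symmetric graph such as the two-vertex cycle, the two values remain equal throughout; note this sketch leaves PR unnormalized.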
Performance of Ligra
Ligra BFS Performance
[Figure: BFS running time (seconds) and lines of code on the Twitter graph (41M vertices, 1.5B edges), comparing Ligra (40-core machine) against hand-written Cilk/OpenMP (40-core machine).]
• Comparing against direction-optimizing code by Beamer et al.
Ligra PageRank Performance
[Figure: PageRank (1 iteration) running time (seconds) and lines of code on the Twitter graph (41M vertices, 1.5B edges), comparing GraphLab (64 x 8-cores), GraphLab (40-core machine), Ligra (40-core machine), and hand-written Cilk/OpenMP (40-core machine).]
• Easy to implement “sparse” version of PageRank in Ligra
Ligra Connected Components Performance
[Figure: Connected Components running time (seconds) and lines of code on the Twitter graph (41M vertices, 1.5B edges), comparing GraphLab (16 x 8-cores), GraphLab (40-core machine), Ligra (40-core machine), and hand-written Cilk/OpenMP (40-core machine); lines-of-code bars labeled 120 and 244.]
• Performance close to hand-written code
• Faster than existing high-level frameworks at the time
• Shared-memory graph processing can be very efficient
• Requires graph to fit on a single machine
Large Graphs
6.6 billion edges
128 billion edges
~1 trillion edges [VLDB 2015]
• Most can fit on a commodity shared-memory machine (e.g., on Amazon EC2)
Ongoing Work: Local Graph Clustering
Graph clustering (community detection): local algorithms only touch a small part of the graph.
• Significant savings if output cluster is much smaller than entire graph
• We parallelized local algorithms using Ligra
• Algorithms are frontier-based, processing small active sets on each iteration
• Ligra is crucial to getting local running time
Ligra Summary
Core abstractions: VertexSubset, EdgeMap, VertexMap
Example applications: Breadth-first search, Betweenness centrality, Connected components, Triangle counting, K-core decomposition, Maximal independent set, Single-source shortest paths, Eccentricity estimation, (Personalized) PageRank, Local graph clustering, Biconnected components, Collaborative filtering, …
Simplicity, Performance, Scalability
Optimizations: hybrid traversal and graph compression
Thank you!
Code: https://github.com/jshun/ligra/