X-Stream: A Case Study in Building a Graph Processing System

Amitabha Roy (LABOS)


X-Stream

• Graph processing system

• Single Machine

• Works on graphs stored
  • Entirely in RAM
  • Entirely on SSD
  • Entirely on magnetic disk

• Generic
  • Can run all kinds of graph algorithms, from BFS to triangle counting

• Paper, presentation slides and talk video from SOSP 2013 are online


This talk …

• A brief history of X-Stream
  • November 2012 to SOSP camera ready

• Cover the details not in the SOSP text
  • Including bad design decisions


Preliminary Ideas (~ Nov 2012)

• Toying with graph processing from an algorithms perspective

• Observed graph processing as an instance of SpMV: $Y = X^T A$
  • X, Y are vertex state vectors; A is the adjacency matrix
  • Operators are algorithm specific
    • Numerical operations for pagerank
    • Reachability (and tree parent assignment) for BFS

• Do we know how to do sparse matrix multiplication efficiently?
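The SpMV view of graph processing can be sketched in a few lines. This is only an illustration under stated assumptions, not X-Stream's code: `spmv`, `combine` and `reduce_op` are hypothetical names, the adjacency matrix is represented by its non-zeros (an edge list), and the three-vertex graph is made up.

```python
def spmv(edges, x, combine, reduce_op, identity):
    """Compute y = x^T A for a graph given as an edge list.

    edges     : iterable of (src, dst) pairs, the non-zeros of A
    x         : dict vertex -> state, the input vector
    combine   : the "multiply" operator applied along an edge
    reduce_op : the "add" operator that merges contributions at a vertex
    identity  : starting value of every entry of y
    """
    y = {v: identity for v in x}
    for src, dst in edges:  # one pass over the non-zeros, in any order
        y[dst] = reduce_op(y[dst], combine(x[src]))
    return y


# Hypothetical example graph: 1 -> 2, 1 -> 3, 2 -> 3
edges = [(1, 2), (1, 3), (2, 3)]

# BFS-style reachability: "or" plays the role of addition.
frontier = {1: True, 2: False, 3: False}
reached = spmv(edges, frontier, lambda s: s, lambda a, b: a or b, False)

# Pagerank-style numerical step: ordinary addition of rank / out-degree.
contribution = {1: 0.5, 2: 1.0, 3: 0.0}  # rank 1.0 divided by out-degree
summed = spmv(edges, contribution, lambda s: s, lambda a, b: a + b, 0.0)
```

Only the two operators change between the algorithms; the loop over edges stays the same.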


Preliminary Ideas (~ Nov 2012)

• Yes! The algorithms community had beaten the problem to death
  • "Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model", Bender et al.

• Regretted not paying attention to complexity theory in grad school

• Isolated the good ideas from that paper
  • Cache the essentials (upper level of memory hierarchy, random access)
  • Stream everything else (lower level of memory hierarchy, block transfer)
  • Stream, don't sort, the edge list


Preliminary Ideas (~ Nov 2012)

Shard vertices so that each chunk fits in cache: V(1), V(2), V(3), …, V(P)

I/Os to memory: $O(P) = O\left(\frac{V}{M}\right)$


Preliminary Ideas (~ Nov 2012)

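Also partition edges (inverse merge sort): the edge list E is split into E(1), E(2), …, with E(i) holding the edges of vertex chunk V(i)

I/Os: $O\left(\frac{E}{B}\log_{\frac{M}{B}}\frac{V}{M}\right)$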


Preliminary Ideas (~ Nov 2012)

Do multiplication: $E(1) \times V(1) = U(1)$

I/Os: $O\left(\frac{E}{B} + \frac{V}{B} + \frac{U}{B}\right)$


Preliminary Ideas (~ Nov 2012)

Do shuffle

(Figure: update streams U(1) and U(2) next to vertex chunk V(2))

I/Os: $O\left(\frac{U}{B}\log_{\frac{M}{B}}\frac{V}{M}\right)$


Preliminary Ideas (~ Nov 2012)

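Do shuffle (x-split.hpp)

(Figure: the update streams U(1) and U(2) are shuffled into U'(1) and U'(2), lined up with vertex chunks V(1) and V(2))

I/Os: $O\left(\frac{U}{B}\log_{\frac{M}{B}}\frac{V}{M}\right)$

Origin of the name X-Stream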


Preliminary Ideas (~ Nov 2012)

Do additions: $U'(1) + V(1) = V'(1)$

I/Os: $O\left(\frac{U}{B} + \frac{V}{B}\right)$
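The multiply, shuffle and add steps can be put together in one small in-memory sketch. It is a toy illustration of the data flow only: `iterate` and `partition_of` are hypothetical names, vertices are numbered from 0 (the slides count partitions from 1), and nothing here models block transfers or the actual X-Stream code.

```python
from collections import defaultdict


def partition_of(vertex, chunk):
    """Map a 0-based vertex id to its partition (chunk vertices each)."""
    return vertex // chunk


def iterate(edge_parts, state, chunk, multiply, add):
    """One multiply / shuffle / add round over partitioned edges."""
    # Multiply: stream E(p) against V(p) to produce the update stream U(p).
    updates = defaultdict(list)
    for p, part in enumerate(edge_parts):
        for src, dst in part:
            updates[p].append((dst, multiply(state[src])))
    # Shuffle: route every update to the partition owning its destination.
    shuffled = defaultdict(list)
    for p in sorted(updates):
        for dst, val in updates[p]:
            shuffled[partition_of(dst, chunk)].append((dst, val))
    # Add: fold U'(q) into V(q) to obtain V'(q).
    new_state = dict(state)
    for q in sorted(shuffled):
        for dst, val in shuffled[q]:
            new_state[dst] = add(new_state[dst], val)
    return new_state


# Hypothetical 4-vertex graph, 2 vertices per partition, edges grouped by source.
edge_parts = [[(0, 1), (0, 2), (1, 3)], [(2, 3)]]
inf = float("inf")
levels = {0: 0, 1: inf, 2: inf, 3: inf}  # BFS levels, root is vertex 0

step1 = iterate(edge_parts, levels, 2, lambda lvl: lvl + 1, min)
step2 = iterate(edge_parts, step1, 2, lambda lvl: lvl + 1, min)
```

With `min` as the add operator and `level + 1` as the multiply operator, each round extends the BFS frontier by one hop.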


Preliminary Ideas (~ Nov 2012)

Total I/Os: $\frac{V+E}{B} + \frac{E}{B}\log_{\frac{M}{B}}\frac{V}{M}$

Bender et al. tell us this is very close to the most efficient solution
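The total can be recovered by summing the per-step costs from the preceding slides. The derivation assumes that the update stream is no larger than the edge list, U = O(E), which the slides do not state explicitly; the O(V/M) cost of sharding the vertices is dominated by V/B whenever B ≤ M.

```latex
\underbrace{\frac{E}{B}\log_{\frac{M}{B}}\frac{V}{M}}_{\text{partition edges}}
+ \underbrace{\frac{E+V+U}{B}}_{\text{multiply}}
+ \underbrace{\frac{U}{B}\log_{\frac{M}{B}}\frac{V}{M}}_{\text{shuffle}}
+ \underbrace{\frac{U+V}{B}}_{\text{add}}
= O\!\left(\frac{V+E}{B} + \frac{E}{B}\log_{\frac{M}{B}}\frac{V}{M}\right)
\quad\text{when } U = O(E)
```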


Preliminary Ideas (~ Dec 2012)

• Yes, but …
  • Algorithmic complexity theory ignores constants

• Systems research
  • Hypothesize
  • Build
  • Measure

• Quickly prototyped an SpMV implementation in C++

• Compared to Graphchi


Preliminary Ideas (~ Jan 2013)

• Results (BFS and pagerank) looked good

• Beat Graphchi by a huge margin

• Often finished faster than Graphchi finished producing shards!

• Now what?

• Write a “systems” paper from an “algorithms” idea


Preliminary Ideas (~ Jan 2013)

• HotOS submission (Jan 10, 2013, 6 page paper)

• “Pitch” ?

• Graph processing systems spend a lot of time indexing data before processing it

• Here is a system that produces results from unordered “big-data”

• It works from main memory and disk

• Sketch of the system (minimal “complexity theory”)

• Results: Beats Graphchi for graphs on disk

• Results: Beats sorting the edge list for graphs in memory


The next stage (~February 2013)

• X-Stream seems like a good idea

• Let's try to build and evaluate the full system

• Only thought about SOSP very vaguely

• Loads of code written that month (a month of code)

• Made some arbitrary decisions that (we hoped) would not impact end result


Arbitrary Decision 1

• I/O path to disk

• Chose direct I/O: great performance, controlled memory footprint

• Nightmare to implement properly (look at core/disk_io.hpp)

Option            Buffers controlled by    Overhead
read()/write()    OS (pagecache)           Copy
mmap              OS (pagecache)           Minor fault
Direct I/O        You                      None
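One commonly cited source of difficulty with direct I/O (for example `O_DIRECT` on Linux) is that file offsets, transfer lengths and buffer addresses typically have to be aligned to the device block size. The helper below sketches just the offset arithmetic; `aligned_span` is a hypothetical name, the 4096-byte block size is an assumption, and the code is not taken from core/disk_io.hpp.

```python
def aligned_span(offset, length, block=4096):
    """Widen the byte range [offset, offset + length) to block boundaries.

    Returns (start, size): start is offset rounded down to a multiple of
    block, and size is a multiple of block covering the requested range.
    """
    start = (offset // block) * block
    end = -(-(offset + length) // block) * block  # round up to a block
    return start, end - start


# A 100-byte read at offset 5000 must be issued as one whole 4096-byte block.
start, size = aligned_span(5000, 100)
```

The caller then has to copy the requested bytes out of the larger aligned buffer, which is part of the bookkeeping the page cache would otherwise hide.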


Arbitrary Decision 2

• Shuffle entirely in memory
  • Greatly simplifies implementation

• However, this means …
  • One buffer per partition should fit in memory (at least 16 MB)
  • Number of partitions is bounded
    • Below: have to fit the vertex data of a partition into memory
    • Above: have to fit one buffer from each partition into memory
  • The intersection covers large enough graphs (see Sec. 3.4 of the SOSP paper)
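The two bounds on the number of partitions can be sketched as a small calculation. This is a simplification that treats the two constraints separately; the analysis in the SOSP paper may combine them differently. `partition_bounds` and its parameters are hypothetical names, and the 16 MB default is the buffer size mentioned on the slide.

```python
def partition_bounds(vertex_bytes, memory_bytes, buffer_bytes=16 * 2**20):
    """Return (lowest, highest) feasible partition counts, or None.

    lowest : vertex data of one partition must fit in memory
    highest: one buffer from every partition must fit in memory
    """
    lowest = max(1, -(-vertex_bytes // memory_bytes))  # ceiling division
    highest = memory_bytes // buffer_bytes
    return (lowest, highest) if lowest <= highest else None


GIB = 2**30
# Hypothetical machine: 64 GiB of vertex data, 16 GiB of RAM.
feasible = partition_bounds(64 * GIB, 16 * GIB)
# Hypothetical tiny memory where the two bounds no longer intersect.
infeasible = partition_bounds(2**40, 2**24)
```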


Arbitrary Decision 3

• X-Stream targets any two level memory hierarchy

1. Disk/SSD + Main memory

2. Main memory + CPU cache

• Correct approach is to build two ‘X-Streams’ as independent pieces of software

• We instead decided to implicitly deal with a three level memory hierarchy in the code

• Disk/SSD + Main memory + CPU cache

• Creates in-memory partitions of disk partitions!


Arbitrary Decision 3

• Why?
  • Algorithmically elegant: same I/O complexity for any combination of two levels in the hierarchy
  • User does not need to worry about whether the graph fits in memory
  • In the distant future PCM cache connections would be handled gracefully

• Why not?
  • HORRIBLY complex (look at x-lib.hpp)
  • Elegant complexity theory is useless for a systems paper
  • PCM is yet to arrive


SOSP Submission ~ March 2013

• HotOS results arrived in March
  • Paper got rejected, but …
  • Reviews and PC explicitly said
    • Great set of ideas
    • Almost got in
  • Felt it was mature enough for a full conference rather than HotOS

• Decided to submit to SOSP at that point
  • Code base was stable, experiments were running, results were good


SOSP Submission ~ March 2013

• Reworked “pitch” for SOSP submission

• De-emphasized algorithmic contributions

• De-emphasized ability to process unordered data

• Emphasized difference between sequential and random access bandwidth

• Called the execution model “edge-centric”

• Justified saying that it results in more sequential access

• Paper became very evaluation heavy


SOSP Submission ~ March 2013

• Experimental evaluation critical to strength of a systems paper

• Carefully planned and executed experiments (~ 500 hours)

• Figure placeholders in the paper with expected results

• Tried to duplicate configurations in the cluster
  • ~4 machines with 2 x 3 TB drives each
  • 1 machine with SSD

• 4x experimental throughput for the magnetic disk experiments

• SSD experiments slower as there was only one SSD

• Hence more magnetic disk results than SSD results


April 2013

• Vacation
  • Burnt out
  • Zero work


May 2013

• Started thinking about more algorithms over X-Stream

• SOSP submission had
  • BFS, CC, SSSP, MIS, Pagerank, ALS
  • Could all be cast as SpMV and therefore fitted our execution model

• Wanted to go further: show that the X-Stream model is not limited
  • SCC
  • Belief propagation

• Solution was to allow the algorithm to generate new sparse matrices


May 2013

• X-Stream implemented $Y = X^T A$ efficiently

• A was static
  • For graph G = (V, E): A = E, X = V

• Allowed X-Stream to generate matrices instead of vectors

B = X I A

• Very similar to SpMV

• Similar algorithmic complexity

• Equivalent to generating new edge list


May 2013

• Divided algorithms on top of X-Stream into two categories

• Standard: BFS, CC, SSSP, Pagerank

• Special: BP, triangle counting, SCC, MCST

• Special algorithms use a lower level interface that lets them create, manage and manipulate sparse matrices of O(E) non-zeros.

• Had to completely rewrite core X-Stream to support this


June 2013

• Started preparing for possible resubmission to ASPLOS (July deadline)

• Added in more "systemsy" features
  • Primarily compression

• Added zlib compression

• Bad idea in retrospect
  • Zlib too slow to keep up with streaming speeds from RAIDed magnetic disks!
  • Software decompression < 200 MB/s
  • RAID array, sequential access > 300 MB/s


July – August 2013

• SOSP paper accepted

• SOSP camera ready deadline was September

• Diverted July and August to doing strategic extensions to X-Stream

• Worked with two summer interns
  • Intern 1: added support to express algorithms in Python on X-Stream
  • Intern 2: added more algorithms: triangle counting, BC, K-Cores, HyperANF


August 2013 - September 2013

• SOSP camera ready

• Re-ran experiments

• Completely re-wrote paper, made it far clearer

• Interesting points:

• Yahoo webgraph did not work well, left it as such

• Kept in complexity analysis (hat-tip to X-Stream’s roots)

• Camera ready deadline 15 Sep

• Conference presentation Nov 3 (video online)


Conclusion

• Overview of a large systems project from concept to publication

• Many mistakes made, not apparent from finished paper

• Lots of people contributed
  • Willy Zwaenepoel, Ivo Mihailovic, Mia Primorac, Aida Amini

• What next?
  • X-Stream could get us to a billion-plus edges
  • How about a trillion edges?
  • X-1: scale-out version


BACKUP (SOSP slides)


X-Stream

Process large graphs on a single machine

1U server = 64 GB RAM + 2 x 200 GB SSD + 3 x 3TB drive


Approach

• Problem: graph traversal = random access

• Random access is inefficient for storage
  • Disk (500X slower)
  • SSD (20X slower)
  • RAM (2X slower)

• Solution: X-Stream makes graph accesses sequential


Contributions

• Edge-centric scatter gather model

• Streaming partitions


Standard Scatter Gather

• Edge-centric scatter gather based on Standard Scatter gather

• Popular graph processing model

Pregel [Google, SIGMOD 2010]

Powergraph [OSDI 2012]


Standard Scatter Gather

• State stored in vertices

• Vertex operations
  • Scatter updates along outgoing edges
  • Gather updates from incoming edges

(Figure: scatter and gather at a vertex V)


Standard Scatter Gather

(Figure: BFS on an example graph with vertices 1 to 8)


Vertex-Centric Scatter Gather

• Iterates over vertices

Scatter:

for each vertex v
    if v has update
        for each edge e from v
            scatter update along e

• Standard scatter gather is vertex-centric
  • Does not work well with storage


Vertex-Centric Scatter Gather

(Figure: BFS on the example graph. The vertex array V, vertices 1 to 8, reaches the edge list through a lookup index.)

SOURCE DEST
1 3
1 5
2 7
2 4
3 2
3 8
4 3
4 7
4 8
5 6
6 1
8 5
8 6


Transformation

Vertex-centric scatter:

for each vertex v
    if v has update
        for each edge e from v
            scatter update along e

Edge-centric scatter:

for each edge e
    if e.src has update
        scatter update along e
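The two scatter loops can be compared directly on the example graph from the slides. The functions below are an illustrative sketch (hypothetical names, BFS levels as vertex state, everything held in memory), not X-Stream's implementation.

```python
from collections import defaultdict

# Edge list of the 8-vertex example graph shown on the slides.
EDGES = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
         (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]


def bfs_vertex_centric(edges, root):
    index = defaultdict(list)  # lookup index: vertex -> outgoing edges
    for src, dst in edges:
        index[src].append(dst)
    level = {root: 0}
    frontier = [root]
    while frontier:
        nxt = []
        for v in frontier:  # only vertices that have an update
            for dst in index[v]:  # random access into the edge list
                if dst not in level:
                    level[dst] = level[v] + 1
                    nxt.append(dst)
        frontier = nxt
    return level


def bfs_edge_centric(edges, root):
    level = {root: 0}
    depth = 0
    changed = True
    while changed:
        changed = False
        updates = []
        for src, dst in edges:  # sequential scan over every edge
            if level.get(src) == depth:  # e.src has an update
                updates.append(dst)
        for dst in updates:  # gather
            if dst not in level:
                level[dst] = depth + 1
                changed = True
        depth += 1
    return level


by_vertex = bfs_vertex_centric(EDGES, 1)
by_edge = bfs_edge_centric(EDGES, 1)
by_edge_reversed = bfs_edge_centric(list(reversed(EDGES)), 1)
```

The edge-centric version needs no index and gives the same levels for any edge order, at the price of scanning all edges in every iteration.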


Edge-Centric Scatter Gather

(Figure: BFS on the example graph, showing the same SOURCE/DEST edge list next to the vertex array V, vertices 1 to 8, with no lookup index)


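(Figure: the edge list sorted by source and an arbitrarily ordered copy of it are marked as equivalent inputs)

Unordered edge list from the slide, as (SOURCE, DEST) pairs: (1,3), (8,6), (5,6), (2,4), (3,2), (4,7), (4,3), (3,8), (4,8), (2,7), (6,1), (8,5), (1,5)

• No index
• No clustering
• No sorting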


Tradeoff

• Edge-centric scatter-gather: $\frac{\text{Scatters} \times \text{Edge Data}}{\text{Sequential Access Bandwidth}}$

• Vertex-centric scatter-gather: $\frac{\text{Edge Data}}{\text{Random Access Bandwidth}}$

• Sequential access bandwidth >> random access bandwidth

• Few scatter-gather iterations for real-world graphs
  • Well connected; a variety of datasets is covered in the paper
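The tradeoff can be turned into a back-of-the-envelope calculation. The function names and all numbers below are hypothetical; the bandwidths are chosen only so that sequential access is 500 times faster than random access, not measured on any device.

```python
def edge_centric_time(scatters, edge_bytes, seq_bw):
    """Time to stream the whole edge list once per scatter iteration."""
    return scatters * edge_bytes / seq_bw


def vertex_centric_time(edge_bytes, rand_bw):
    """Time to touch every edge once through random accesses."""
    return edge_bytes / rand_bw


def break_even_scatters(seq_bw, rand_bw):
    """Number of scatter iterations at which both models cost the same."""
    return seq_bw / rand_bw


# Assumed figures: 1 GB of edges, 100 MB/s sequential, 0.2 MB/s random.
EDGE_BYTES = 1_000_000_000
SEQ_BW = 100_000_000
RAND_BW = 200_000

streaming = edge_centric_time(10, EDGE_BYTES, SEQ_BW)  # 10 iterations
random_access = vertex_centric_time(EDGE_BYTES, RAND_BW)
limit = break_even_scatters(SEQ_BW, RAND_BW)
```

Under these assumed numbers the edge-centric model stays ahead as long as the algorithm needs fewer scatter iterations than the bandwidth ratio.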


Contributions

• Edge-centric scatter gather model

• Streaming partitions


Streaming Partitions

• Problem: still have random access to the vertex set V (vertices 1 to 8 in the example)

• Solution: partition the graph into streaming partitions


Streaming Partitions

• A streaming partition is
  • A subset of the vertices that fits in RAM
  • All edges whose source vertex is in that subset
  • No requirement on quality of the partition
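This definition can be sketched in code as follows. `streaming_partitions` is a hypothetical helper: it cuts the vertex set into fixed-size chunks (standing in for "fits in RAM") and groups edges by the chunk that owns their source vertex, without sorting.

```python
def streaming_partitions(edges, vertices, capacity):
    """Split vertices into chunks of at most capacity and group the edges
    by the chunk owning their source vertex. Edge order inside a
    partition is simply the order of arrival."""
    vertices = list(vertices)
    chunks = [vertices[i:i + capacity]
              for i in range(0, len(vertices), capacity)]
    owner = {v: p for p, chunk in enumerate(chunks) for v in chunk}
    parts = [[] for _ in chunks]
    for src, dst in edges:
        parts[owner[src]].append((src, dst))
    return chunks, parts


# Edge list of the 8-vertex example graph shown on the slides.
EDGES = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
         (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]

chunks, parts = streaming_partitions(EDGES, range(1, 9), capacity=4)
```

Any assignment of vertices to chunks would do; contiguous ranges are used here only because they are the simplest choice.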


Partitioning the Graph

Streaming partition 1: subset of vertices V1 = {1, 2, 3, 4}

SOURCE DEST
1 5
4 7
2 7
4 3
4 8
3 8
2 4
1 3
3 2

Streaming partition 2: subset of vertices V2 = {5, 6, 7, 8}

SOURCE DEST
5 6
8 6
8 5
6 1


Random Accesses for Free

(Figure: the vertex subset V1 = {1, 2, 3, 4} shown next to the edge list of streaming partition 1)


Generalization

(Figure: the vertex subset V1 in fast storage and the edge list of its streaming partition in slow storage)

Applies to any two-level memory hierarchy


Generally Applicable

Slow storage    Fast storage
Disk            RAM
SSD             RAM
RAM             CPU cache


Parallelism

• Simple Parallelism

• State is stored in vertex

• Streaming partitions have disjoint vertices

• Can process streaming partitions in parallel
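Because each streaming partition touches only its own vertex state during scatter, the partitions can be handed to independent workers. The sketch below shows the structure only, with hypothetical names and a made-up two-partition input; CPython threads would not give a real CPU speedup, and X-Stream itself is written in C++.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_partition(partition):
    """Scatter one streaming partition and return its update stream."""
    vertex_state, edges = partition
    # Reads only this partition's own vertex state, so no locking is needed.
    return [(dst, vertex_state[src] + 1) for src, dst in edges
            if vertex_state[src] is not None]


def scatter_all(partitions, workers=4):
    """Run scatter over all partitions in parallel, keeping their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scatter_partition, partitions))


# Hypothetical input: BFS levels (None = not reached) plus each partition's edges.
partitions = [
    ({1: 0, 2: None}, [(1, 2), (1, 3), (2, 4)]),
    ({3: None, 4: None}, [(3, 4)]),
]
update_streams = scatter_all(partitions)
```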


Gathering Updates

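(Figure: edges and vertices of source partitions feed update streams X and Y into a shuffler, which delivers them to the vertices of the destination partitions; labelled Partition 1 to Partition 100)

• Minimize random access for a large number of partitions
• Multi-round copying akin to merge sort but cheaper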


Performance

• Focus on SSD results in this talk
  • Similar results with in-memory graphs


Baseline

• Graphchi [OSDI 2012]

• First to show that graph processing on a single machine
  • Is viable
  • Is competitive

• Also targets larger sequential bandwidth of SSD and Disk


Different Approaches

• Fundamentally different approaches to same goal

• Graphchi uses "shards"
  • Partitions edges into sorted shards

• X-Stream uses sequential scans
  • Partitions edges into unsorted streaming partitions


Baseline to Graphchi

• Replicated OSDI 2012 experiments on our SSD

Graphchi: Input → Create shards → Shards → Run algorithm → Answer

X-Stream: Input → Run algorithm → Answer


X-Stream Speedup over Graphchi

(Bar chart: speedup axis 0 to 6; benchmarks Netflix/ALS, Twitter/Pagerank, Twitter/Belief Propagation, RMAT27/WCC)

Mean speedup = 2.3


X-Stream Speedup over Graphchi (+ sharding)

(Bar chart: speedup axis 0 to 6; benchmarks Netflix/ALS, Twitter/Pagerank, Twitter/Belief Propagation, RMAT27/WCC)

Mean speedup: previously 2.3, now 3.7


Preprocessing Impact

(Bar chart: time in seconds, axis 0 to 3000; series Graphchi sharding and X-Stream runtime)

X-Stream returns answers before Graphchi finishes sharding


Sequential Access Bandwidth

• Graphchi shard
  • All vertices and edges must fit in memory

• X-Stream partition
  • Only vertices must fit in memory

• More Graphchi shards than X-Stream partitions

• Makes access more random for Graphchi


SSD Read Bandwidth (Pagerank on Twitter)

(Line chart: read bandwidth in MB/s, axis 0 to 1000, over a 5 minute window; series X-Stream and Graphchi)


SSD Write Bandwidth (Pagerank on Twitter)

(Line chart: write bandwidth in MB/s, axis 0 to 800, over a 5 minute window; series X-Stream and Graphchi)


Disk Transfers (Pagerank on Twitter)

Metric           X-Stream       Graphchi
Data moved       224 GB         322 GB
Time taken       398 seconds    2613 seconds
Transfer rate    578 MB/s       126 MB/s

• SSD can sustain reads = 667 MB/s, writes = 576 MB/s
• X-Stream uses all available bandwidth from the storage device
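The transfer rates follow from dividing data moved by time taken. The check below assumes 1 GB = 1024 MB; under that assumption the Graphchi figure matches the table and the X-Stream figure comes out slightly lower than 578 MB/s, presumably because the data sizes in the table are rounded. `transfer_rate_mb_s` is a hypothetical helper name.

```python
def transfer_rate_mb_s(gigabytes, seconds, mb_per_gb=1024):
    """Average transfer rate in MB/s for the given volume and duration."""
    return gigabytes * mb_per_gb / seconds


xstream_rate = transfer_rate_mb_s(224, 398)    # a little over 576 MB/s
graphchi_rate = transfer_rate_mb_s(322, 2613)  # a little over 126 MB/s
time_ratio = 2613 / 398                        # Graphchi time / X-Stream time
```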


Scaling up

Weakly Connected Components

(Chart: time in HH:MM:SS on a logarithmic axis from 0:00:01 to 24:00:00, against input edge data; regions for 16 GB RAM, 400 GB SSD and 6 TB disk)

• 8 million V, 128 million E: 8 sec
• 256 million V, 4 billion E: 33 mins
• 4 billion V, 64 billion E: 26 hours


Conclusion

• Big graphs
• X-Stream
• Good performance: RAM, SSD, disk
• Edge-centric processing + streaming partitions = sequential access

Download from http://labos.epfl.ch/xstream


BACKUP


API Restrictions

• Updates must be commutative

• Cannot access all edges from a vertex in single step
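One way to read the first restriction: updates reach a vertex in no particular order, so the gather result must not depend on that order. The sketch below is an interpretation rather than X-Stream's API; `gather` is a hypothetical helper and the update values are made up.

```python
import itertools


def gather(initial, updates, apply):
    """Fold a stream of updates into a vertex state, one at a time."""
    state = initial
    for update in updates:
        state = apply(state, update)
    return state


updates = [3, 1, 2]
orders = list(itertools.permutations(updates))

# min is commutative and associative: every arrival order gives one result.
min_results = {gather(float("inf"), order, min) for order in orders}


def overwrite(state, update):
    """Last writer wins: the result depends on arrival order."""
    return update


overwrite_results = {gather(None, order, overwrite) for order in orders}
```

An order-dependent rule such as the overwrite above yields a different answer for each arrival order, which is why it would not fit the model.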


Applications

• X-Stream can solve a variety of problems

BFS, SSSP, Weakly connected components, Strongly connected components, Maximal independent sets, Minimum cost spanning trees, Belief propagation, Alternating least squares, Pagerank, Betweenness centrality, Triangle counting, Approximate neighborhood function, Conductance, K-Cores

Q. Average distance between people on a social network ?

A. Use approximate neighborhood function.


Edge-centric Scatter Gather

• Real world graphs have low diameter

(Figure: two example graphs on vertices 1 to 8, one well connected and one a chain)

• D = 3: BFS in 3 steps (most real-world graphs)
• D = 7: BFS in 7 steps


X-Stream Main Memory Performance

BFS (32M vertices/256M edges)

(Line chart: runtime in seconds, axis 0 to 100, lower is better, against threads 1, 2, 4, 8, 16; series BFS-1 [HPC 2010], BFS-2 [PACT 2011], X-Stream)


Runtime impact of Graphchi Sharding

Graphchi Runtime Breakdown

(Stacked bar chart: fraction of runtime, axis 0 to 1, per benchmark: Netflix/ALS, Twitter/Pagerank, Twitter/Belief Propagation, RMAT27/WCC; series Compute + I/O and Re-sort shard)


Pre-processing Overhead

• Low overhead for producing streaming partitions

• Strictly cheaper than sorting edges by source vertex
