Dependency-Driven Disk-based Graph Processing

Keval Vora

Simon Fraser University

https://github.com/pdclab/lumos

USENIX ATC’19 - Renton, WA

July 11, 2019

Graph Processing

Iterative Graph Processing

PageRank computation

[Figure: example graph over vertices a, b, c, d, e, f, shown at iteration t and iteration t+1]
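As a minimal sketch of what an iterative, synchronous PageRank computation looks like, the code below computes every value of iteration t+1 strictly from the values of iteration t. The edge list, the damping factor of 0.85 and the function names are illustrative assumptions; the edges drawn on the slide are not recoverable from the transcript.

```python
# Hypothetical edge list over the six vertices a-f (not the slide's actual graph).
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "a"), ("e", "f"), ("f", "a")]
VERTICES = ["a", "b", "c", "d", "e", "f"]


def pagerank_step(values, edges, damping=0.85):
    """Compute iteration t+1 values using only iteration t values."""
    out_degree = {v: 0 for v in values}
    for src, _ in edges:
        out_degree[src] += 1
    # Aggregate the contributions of in-neighbors from the old values.
    incoming = {v: 0.0 for v in values}
    for src, dst in edges:
        incoming[dst] += values[src] / out_degree[src]
    n = len(values)
    return {v: (1.0 - damping) / n + damping * incoming[v] for v in values}


def pagerank(edges, vertices, iterations=10):
    """Run a fixed number of synchronous iterations from a uniform start."""
    values = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(iterations):
        # Replacing the whole dictionary acts as the barrier between iterations.
        values = pagerank_step(values, edges)
    return values
```

Because every vertex in this example has at least one outgoing edge, the values keep summing to 1 across iterations.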

Out-of-Core Graph Processing

[Figure: vertex values V = a b c d e f; the edges are split into partitions P0 and P1, which are processed in iteration t and again in iteration t+1]

• Partitions reside on disk
• Each partition is loaded (I/O) and then processed (compute, C) in every iteration
• I/O = 69-90%

Lumos

• Out-of-Order processing
• Amortize I/O across multiple iterations
• Guarantee Bulk Synchronous Parallel Semantics

• Dependency-Driven Cross-Iteration value propagation
• Safely compute future values

[Figure: partitions P0 and P1 processed in iteration t, with vertex values V = a b c d e f]

Computing Future Values

[Figure: partitions P0 and P1 processed in iteration t and iteration t+1, with vertex values V = a b c d e f and a second array VF of future vertex values; the I/O and compute (C) timeline is shown for both iterations]

• VF = future vertex values
• Future dependencies not met
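The dependency check behind "future dependencies not met" can be sketched as follows. This is a simplified illustration under stated assumptions: partitions are processed in increasing index order, each partition holds the in-edges of its own vertices, and a vertex may compute its future value only if all of its in-neighbors live in partitions processed earlier. The graph, the partition assignment and the function name are hypothetical.

```python
# Hypothetical in-neighbor lists and partition assignment for vertices a-f.
IN_NEIGHBORS = {
    "a": ["e", "f"],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
    "e": ["d"],
    "f": ["e"],
}
PARTITION_OF = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}


def can_compute_future(vertex, in_neighbors, partition_of):
    """True if every in-neighbor was finalized before this vertex's partition.

    An in-neighbor in the same or a later partition does not yet have its
    final value for the current iteration, so the dependency is not met.
    """
    return all(partition_of[u] < partition_of[vertex]
               for u in in_neighbors[vertex])


FUTURE_READY = [v for v in sorted(IN_NEIGHBORS)
                if can_compute_future(v, IN_NEIGHBORS, PARTITION_OF)]
```

In this example only d qualifies: both of its in-neighbors (b and c) belong to partition 0, which is processed before partition 1.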

Computing Future Values

[Figure: % Vertices (0-45) and % Edges (0-4) for UK, TW, TT and FT; panels 16, 64, 256; series Chunk, Cyclic, Random]

Graph   Edges   Vertices
UK      1B      39.5M
TW      1.5B    41.7M
TT      2B      52.6M
FT      2.5B    68.3M

30-45% of vertices compute future values, but they account for less than 4% of edges.

Graph Computation

f

Partial Aggregation


Cross-Iteration Value Propagation

[Figure: partitions P0 and P1 processed in iteration t and iteration t+1, with vertex values V = a b c d e f; Partial Aggregation links the values of iteration t to those of iteration t+1; the I/O and compute (C) timeline is shown for both iterations]
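A minimal sketch of cross-iteration value propagation through partial aggregation, using PageRank as the example. It is written under several assumptions that the transcript does not confirm: partitions are processed in index order, partition p holds the in-edges of its vertices, and propagation is applied in alternating iterations. All names are hypothetical, and the list filter stands in for reading fewer edge blocks from disk. The reference function is included so the two results can be compared.

```python
def bsp_pagerank(vertices, edges, iterations, damping=0.85):
    """Reference: synchronous PageRank that reads every edge each iteration."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for u, _ in edges:
        out_degree[u] += 1
    values = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        acc = {v: 0.0 for v in vertices}
        for u, v in edges:
            acc[v] += values[u] / out_degree[u]
        values = {v: (1.0 - damping) / n + damping * acc[v] for v in vertices}
    return values


def propagating_pagerank(vertices, edges, partition_of, num_partitions,
                         iterations, damping=0.85):
    """Same values as bsp_pagerank, but every second iteration loads fewer edges."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for u, _ in edges:
        out_degree[u] += 1
    # Partition p holds the in-edges of the vertices assigned to p.
    in_edges = [[] for _ in range(num_partitions)]
    for u, v in edges:
        in_edges[partition_of[v]].append((u, v))
    members = [[v for v in vertices if partition_of[v] == p]
               for p in range(num_partitions)]

    values = {v: 1.0 / n for v in vertices}
    carried = None      # partial aggregates prepared for the current iteration
    edges_loaded = 0
    for _ in range(iterations):
        new_values = {}
        future = {v: 0.0 for v in vertices} if carried is None else None
        for p in range(num_partitions):
            if carried is None:
                loaded = in_edges[p]
                acc = {v: 0.0 for v in members[p]}
            else:
                # Edges from partitions <= p were already folded into `carried`.
                loaded = [(u, v) for u, v in in_edges[p] if partition_of[u] > p]
                acc = {v: carried[v] for v in members[p]}
            edges_loaded += len(loaded)
            for u, v in loaded:
                acc[v] += values[u] / out_degree[u]
            for v in members[p]:
                new_values[v] = (1.0 - damping) / n + damping * acc[v]
            if future is not None:
                # Sources in partitions <= p already hold their final value for
                # this iteration, so their next-iteration contribution is safe.
                for u, v in loaded:
                    if partition_of[u] <= p:
                        future[v] += new_values[u] / out_degree[u]
        values, carried = new_values, future
    return values, edges_loaded
```

The carried partial sums only ever contain values that are final for their iteration, which is why the result matches the synchronous reference up to floating-point summation order.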

Cross-Iteration Value Propagation

[Figure: % Edges (0-45) for UK, TW, TT and FT; panels 16, 64, 256; series Chunk, Cyclic, Random]

Future values computed across >45% of edges

Intra-Partition Propagation

[Figure: partitions P0 and P1 processed in iteration t and iteration t+1, with vertex values V = a b c d e f; the I/O and compute (C) timeline is shown for both iterations]

Lumos: Cross-Iteration Value Propagation

• Propagation across multiple future iterations
• Synergy with Dynamic Shards [ATC’16]
• Detailed comparison with out-of-order techniques
• Generalized graph layout & partitioning strategies
• Cross-iteration propagation across different partition sizes
• Interplay with selective scheduling

• Lumos for Asynchronous Algorithms

Lumos for Asynchronous Algorithms

Shortest paths computation

[Figure: partitions processed in iteration t and iteration t+1, with vertex values V = a b c d e f]

• Monotonic selection (KickStarter [ASPLOS’17])
• Partial aggregations merged directly

• Partial values safe to be propagated
• Any change can be propagated immediately

• Intra-partition propagation based on degree of asynchrony (ASPIRE [OOPSLA’14])
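To illustrate why monotonic selection lets partial values propagate immediately, here is a minimal sketch of an asynchronous shortest paths computation where min is the selection function. It is not the Lumos, KickStarter or ASPIRE implementation; the graph and the function name are hypothetical. A distance only ever decreases, so a value written in the middle of a sweep is safe for later edges to read in that same sweep.

```python
import math

# Hypothetical weighted edges (source, target, weight).
VERTICES = ["a", "b", "c", "d"]
WEIGHTED_EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]


def async_sssp(vertices, weighted_edges, source):
    """Single-source shortest paths with in-place (asynchronous) updates."""
    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        for u, v, w in weighted_edges:
            candidate = dist[u] + w
            # Monotonic selection: keep the minimum, so values never increase.
            if candidate < dist[v]:
                # The new value is visible to later edges in this same sweep.
                dist[v] = candidate
                changed = True
    return dist, sweeps
```

The final distances do not depend on the order in which edges are visited; only the number of sweeps needed to converge does.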

Lumos System

• Grid layout based on GridGraph [ATC’15]

[Figure: grid of edge blocks indexed by source and target partition, with the I/O and compute (C) timeline and vertex values V for iteration t and iteration t+1]

• Iteration t: future values propagated for upper triangle + diagonal edges
• Iteration t+1: only lower triangle edges loaded

Light-weight Degree-Aware Partitioning

• HOF: Highest Out-Degree First
• HIL: Highest In-Degree Last
• HRF: Highest Out-Degree to In-Degree Ratio First

Maximize edges in upper triangle

[Figure: % Edges (0-100) vs # Partitions (16, 32, 64, 128, 256) on TT, FT and YH for HOF, HIL and HRF]

Graph   Edges   Vertices
TT      2B      52.6M
FT      2.5B    68.3M
YH      6.6B    1.4B

Cross-iteration propagation across 72-93% of edges
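The three orderings can be sketched on a toy graph as follows. This is an illustrative reading of the strategy names, not the partitioner shipped with Lumos: vertices are sorted by a degree-based key, cut into equally sized contiguous partitions, and an edge is counted as "upper triangle or diagonal" when its source partition index is not larger than its target partition index. The graph, the helper names and that orientation of the triangle are assumptions.

```python
from collections import Counter

# Hypothetical graph, listed in a deliberately unhelpful vertex order.
VERTICES = ["f", "e", "d", "c", "b", "a"]
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
         ("b", "c"), ("b", "d"), ("c", "e"), ("f", "a")]


def order_vertices(vertices, edges, strategy):
    """Return the vertices sorted according to HOF, HIL or HRF."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    if strategy == "HOF":      # highest out-degree first
        key = lambda v: -out_deg[v]
    elif strategy == "HIL":    # highest in-degree last
        key = lambda v: in_deg[v]
    elif strategy == "HRF":    # highest out-degree to in-degree ratio first
        key = lambda v: -(out_deg[v] / in_deg[v]) if in_deg[v] else float("-inf")
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(vertices, key=key)  # sorted() is stable, ties keep input order


def assign_partitions(ordered, num_partitions):
    """Cut the ordered vertex list into contiguous, equally sized partitions."""
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return {v: i // size for i, v in enumerate(ordered)}


def upper_triangle_fraction(edges, partition_of):
    """Fraction of edges whose source partition index <= target partition index."""
    hits = sum(1 for u, v in edges if partition_of[u] <= partition_of[v])
    return hits / len(edges)


def fraction_for(strategy, num_partitions=3):
    ordered = order_vertices(VERTICES, EDGES, strategy)
    return upper_triangle_fraction(EDGES, assign_partitions(ordered, num_partitions))
```

On this toy graph with three partitions, the unsorted input order places 2 of 8 edges in the upper triangle or diagonal, while HOF places 7 of 8 there. The numbers say nothing about real graphs; they only show the mechanism.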

Experimental Setup

• Graph algorithms
  • PageRank (weighted and unweighted), Belief Propagation, Co-Training Expectation Maximization, Dispersion, Label Propagation

• Performance on single disk
  • h1.2xlarge: 278MB/sec HDD read bandwidth

• Scaling I/O
  • d2.4xlarge: 195-768MB/sec read bandwidth over 1-4 HDDs
  • i3.8xlarge: 1.2-4.1GB/sec read bandwidth over 1-4 SSDs

Performance

[Figure: Normalized Time (0-1), split into Read and Compute, for GridGraph vs LUMOS on TT and YH across PR, CoEM, DP, BP, WPR and LP]

h1.2xlarge: 278MB/sec HDD read bandwidth

Graph   Edges   Vertices
TT      2B      52.6M
YH      6.6B    1.4B

Scaling I/O

[Figure: Execution Time (0-4K) vs # Drives (1-4) for HDD and SSD]

Graph    Edges   Vertices
RMAT29   8.6B    537M

Instance     Device Type   Single Drive   RAID-0, k = 2   RAID-0, k = 3   RAID-0, k = 4
d2.4xlarge   HDD           195MB/s        368MB/s         590MB/s         768MB/s
i3.8xlarge   SSD           1.2GB/s        3.8GB/s         4.1GB/s         3.9GB/s

Detailed Evaluation of Lumos

[Figure panels:
• % Edges (0-100) vs # Partitions (16, 32, 64, 128, 256) on TT, FT and YH for HOF, HIL and HRF
• Reads (MB/s) over Time, series 1, 2, 4, 8, 16
• I/O Wait Time (ms) over Time, series 1, 2, 4, 8, 16
• Normalized Time for GG vs LBL on TW, TT, FT and YH (Degrees, Order, Layouts)
• Normalized Space for GG vs LBL on TW, TT, FT and YH (Primary, Secondary)
• Normalized Time (Read, Compute) for GG vs LBL on TT, FT and YH across PR, CoEM, DP, BP, WPR and LP]

Conclusion

• Dependency-Driven Cross-iteration value propagation
• Out-of-Order processing to reduce I/O
• Guarantees Bulk Synchronous Parallel semantics

• Generic technique that fundamentally eliminates the performance barrier

github.com/pdclab/lumos