natcha simsiri, kanat tangwongsan, srikanta tirthapura ... · natcha simsiri, kanat tangwongsan,...

28
Work-efficient parallel union-find Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering Spring 2019, MIT

Upload: others

Post on 15-Jun-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Work-efficient parallel union-find

Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-LungWu

Presenter: Jessica Shi

6.886 Algorithm EngineeringSpring 2019, MIT

Page 2: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Introduction

Introduction

Jessica Shi Work-efficient parallel union-find 1 / 27

Page 3: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Introduction

Union-find

Union-find: Maintain a collection of disjoint sets supporting:

union(u, v): Combine sets containing u and v

find(v): Return set containing v

If u and v are in the same set, find(u) = find(v)

4

8 93

6

2 50

1 7

Jessica Shi Work-efficient parallel union-find 2 / 27

Page 4: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Introduction

Goal: Incremental graph connectivity

Incremental graph connectivity: Graph connectivity as edges areadded over time

find(0) = 1 =⇒ union(0, 3) =⇒ find(0) = 4

1

20

4

8 93

=⇒ 1

2 0

4

8 93

Jessica Shi Work-efficient parallel union-find 3 / 27

Page 5: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Introduction

Goal: Parallelization

Shared-memory parallelization:

Communication overhead in distributed setting

Multicore machines can store large graphs

Work-efficiency:

Guarantee worst-case performance

Jessica Shi Work-efficient parallel union-find 4 / 27

Page 6: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Introduction

Previous work

McColl et al. [1]: Parallel alg for fully dynamic connectivity

No theoretical bound

Manne and Patwary [2]: Parallel union-find alg for distributed setting

Patwary et al. [3]: Shared-memory parallel union-find alg

No theoretical bound

Shun et al. [4] and Gazit [5]: Work-efficient parallel alg forconnectivity

Only for static graphs

[1] McColl, Green, and Bader. 2013.[2] Manne and Patwary. 2010.[3] Patwary, Refsnes, and Manne. 2012.[4] Shun, Dhulipala, and Blelloch. 2014.[5] Gazit. 1991.

Jessica Shi Work-efficient parallel union-find 5 / 27

Page 7: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Introduction

Main results: Union-find

Simple parallel algorithm

b union/find: O(b log n

)work, O

(polylog(n)

)depth

O(n)

memory

Work-efficient parallel algorithm

m union, q find: O((m + q)α(m + q, n)

)total work,

O(polylog(m, n)

)depth

α = inverse Ackermann’s function

O(n)

memory

Implementation of simple parallel algorithm

Jessica Shi Work-efficient parallel union-find 6 / 27

Page 8: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Preliminaries

Preliminaries

Jessica Shi Work-efficient parallel union-find 7 / 27

Page 9: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Preliminaries

Preliminaries

Discretized stream input: Sequence of minibatches

Each minibatch consists of either union queries or find queries

Parallel subroutines:

Filter, prefix sum, map, pack: O(n)

work, O(log2 n

)depth

Duplicate removal: O(n)

work, O(log(2n)

)depth

Integer sort: ai ∈ [0,O(1)· n]: O

(n)

work, O(polylog(n)

)depth

Connectivity (static) [6]: O(|V |+ |E |

)work, O

(polylog(|V |, |E |)

)depth

[6] Shun, Dhulipala, and Blelloch. 2014.Jessica Shi Work-efficient parallel union-find 8 / 27

Page 10: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Preliminaries

Union by size

Always link tree with fewer vertices to tree with more vertices

Tree height O(log n

)Each union/find O

(log n

)size 4 size 6 =⇒ size 10

4

8 93

6

2 50

1 7

=⇒

6

2 50

1 7

4

8 93

Jessica Shi Work-efficient parallel union-find 9 / 27

Page 11: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Simple parallel algorithm

Simple parallel algorithm

Jessica Shi Work-efficient parallel union-find 10 / 27

Page 12: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Simple parallel algorithm

Simple parallel algorithm: Find

Parallel find: Perform finds in parallel

Read-only = no conflicts

Work: O(b log n

)Depth: O

(polylogn

)

Jessica Shi Work-efficient parallel union-find 11 / 27

Page 13: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Simple parallel algorithm

Simple parallel algorithm: Union

Safe to run multiple unions in parallel if they belong to different trees

Worst case: Star minibatch: (0, 1), (0, 2), (0, 3), . . . , (0, 7)

0 1 2 3

4 5 6 7

=⇒0

2 31 4 5 6 7

Jessica Shi Work-efficient parallel union-find 12 / 27

Page 14: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Simple parallel algorithm

Simple parallel algorithm: Union

Main idea: Doesn’t matter how we connect {0, . . . , 7}3 parallel rounds:

(0, 1), (2, 3), (4, 5), (6, 7) =⇒ (0, 2), (4, 6) =⇒ (0, 4)

0 2 4 6

1 3 5 7

=⇒

0

21

3

4

65

7

=⇒

0

2 41

5 6

7

3

Jessica Shi Work-efficient parallel union-find 13 / 27

Page 15: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Simple parallel algorithm

Simple parallel algorithm: Union

Parallel join: Recursively join tree roots, so that they are allconnected at the end

u ← parallel join on first half of roots

v ← parallel join on second half of roots

Return union(u, v)

Parallel union:

Relabel each (u, v) with the roots of u and v

Remove self-loops

Compute the connected components among our edge pairs

For each connected component (in parallel):

Parallel join the roots

Jessica Shi Work-efficient parallel union-find 14 / 27

Page 16: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Simple parallel algorithm

Simple parallel algorithm: Union

Parallel join:

Work: W (k) = 2W (k/2) + O(1)⇒ O

(k)

Depth: D(k) = D(k/2) + O(1)⇒ D(k) = O

(log k

)Parallel union: b unions:

Work: O(b log n

)Depth: O

(log max(b, n)

)

Jessica Shi Work-efficient parallel union-find 15 / 27

Page 17: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Preliminaries 2.0

Preliminaries 2.0

Jessica Shi Work-efficient parallel union-find 16 / 27

Page 18: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Preliminaries 2.0

Path compression

find(8)

6

1214

2 5

1 3 10

117

4

9

8

=⇒

6

1214

2 5

1 3

10 117

4 9 8

Path compression & union by size: Amortized O(α(n)

)union/find

Jessica Shi Work-efficient parallel union-find 17 / 27

Page 19: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Work-efficient algorithm

Work-efficient algorithm

Jessica Shi Work-efficient parallel union-find 18 / 27

Page 20: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Work-efficient algorithm

Work-efficient algorithm: Path compression

Parallel union: Same as in the simple parallel algorithm

Parallel find:

Find roots for all queries

BFS: When flows meet up, only one moves on (use removeduplicates)

Distribute roots back along BFS path for path compression

Response distributor

Jessica Shi Work-efficient parallel union-find 19 / 27

Page 21: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Work-efficient algorithm

Response distributor

Save all (from, to) pairs on BFS (F = set of all from values)

Must construct function that finds all pairs from f

Response distributor:

Hash all from values to range [3 · |F|]Integer sort ordered pairs by hashed from value

Create array A of length 3 · |F|+ 1 s.t. i th entry marks beginningof pairs where hashed from value is i

Work: O(|F|

), Depth: O

(polylog(|F|)

)Distributor function:

Hash f and use A to find all pairs from f

Work: O(|F|

), Depth: O

(log |F|

)Jessica Shi Work-efficient parallel union-find 20 / 27

Page 22: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Work-efficient algorithm

Work-efficient algorithm: Path compression

Parallel find:

Work: Given by number of nodes encountered in BFS

Depth: O(polylog(n)

)Note: There exists an ordering of find queries s.t. serial find producesthe same forest as parallel find, and traversal cost is equal

∴ work-efficient!

Jessica Shi Work-efficient parallel union-find 21 / 27

Page 23: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Implementation

Implementation

Jessica Shi Work-efficient parallel union-find 22 / 27

Page 24: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Implementation

Implementation

Simple parallel algorithm:

Simple path compression: After union minibatch, traverse treeone more time to distribute roots

Note: Does not give all benefits of path compression, esp withinminibatch

Connected components: Use alg by Blelloch et al. [7] (worsetheoretical bounds, good real-world perf)

[7] Blelloch et al. 2012.Jessica Shi Work-efficient parallel union-find 23 / 27

Page 25: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Implementation

Evaluation

Amazon EC2 instance, 20 cores (40 hyperthreaded)

Parallel overhead: 1.01x – 2.5x compared to seq w/o pathcompression

Speedup: 4.6x with b =500K, 9.4x with b =20M

Figure: Average throughput (edges per second) of batch union over number of threads, oflocal16 (left) and rMat16 (right)

Jessica Shi Work-efficient parallel union-find 24 / 27

Page 26: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Conclusion

Conclusion

Jessica Shi Work-efficient parallel union-find 25 / 27

Page 27: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Conclusion

Conclusion

Simple parallel algorithm

Work-efficient parallel algorithm

Implementation of simple parallel algorithm

Future work:

Implementation of work-efficient parallel algorithm

Switch algorithms depending on batch size:

Linear work in # of edges given large union batch (e.g.,DFS if all edges given in one batch – our alg is superlinear)

Fall back to union-find algorithm for smaller minibatch

Jessica Shi Work-efficient parallel union-find 26 / 27

Page 28: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura ... · Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu Presenter: Jessica Shi 6.886 Algorithm Engineering

Conclusion

Thank you!

Jessica Shi Work-efficient parallel union-find 27 / 27