
From Theory to Practice: Efficient Join Query Processing in a Parallel Database System

Shumo Chu, Magdalena Balazinska and Dan Suciu

Database Group, CSE, University of Washington

Motivation

In industry and science, users need to analyze large datasets

Myria: Parallel DBMS developed at UW

New class of queries:
• Knowledge base exploration
• Social network analysis: find all triangles

Two key differences:
• Multiple tables need to be joined
• Query structure may be cyclic

Traditional Parallel Join Evaluation

Query: A ⋈ B ⋈ C

Solution 1: Shuffle on joined attributes

[Figure: three workers, each starting with fragments of A and B. Step 1: shuffle A, B on y, then every worker computes A' ⋈ B'. Step 2: shuffle A ⋈ B, C on (x, z), then every worker computes (A' ⋈ B') ⋈ C'.]

• Large intermediate result
• Skew on shuffle

Solution 2: Keep the largest table, broadcast the others
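
The two-step plan of Solution 1 can be sketched in a few lines of Python. This is a toy, single-process simulation, not Myria code: the helper names (`shuffle`, `local_hash_join`, `triangles_regular_shuffle`), the three-worker setup and the four-edge input are all assumptions made for illustration. It returns the size of the intermediate result A ⋈ B alongside the final answer, which is the quantity that grows large in practice.

```python
from collections import defaultdict


def shuffle(rows, key, workers):
    """Hash-partition rows across workers on key(row)."""
    parts = [[] for _ in range(workers)]
    for row in rows:
        parts[hash(key(row)) % workers].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key, combine):
    """Join two fragments that live on the same worker."""
    index = defaultdict(list)
    for l in left:
        index[left_key(l)].append(l)
    return [combine(l, r) for r in right for l in index.get(right_key(r), ())]


def triangles_regular_shuffle(A, B, C, workers=3):
    """T(x, y, z) :- A(x, y), B(y, z), C(z, x) with two regular shuffles."""
    # Step 1: shuffle A and B on y, then join locally on every worker.
    a_parts = shuffle(A, lambda r: r[1], workers)
    b_parts = shuffle(B, lambda r: r[0], workers)
    ab = []
    for a, b in zip(a_parts, b_parts):
        ab += local_hash_join(a, b, lambda r: r[1], lambda r: r[0],
                              lambda l, r: (l[0], l[1], r[1]))
    # Step 2: shuffle the intermediate result and C on (x, z), join again.
    ab_parts = shuffle(ab, lambda r: (r[0], r[2]), workers)
    c_parts = shuffle(C, lambda r: (r[1], r[0]), workers)
    out = []
    for l, c in zip(ab_parts, c_parts):
        out += local_hash_join(l, c, lambda r: (r[0], r[2]),
                               lambda r: (r[1], r[0]), lambda l, r: l)
    return len(ab), sorted(out)


# Toy graph (an assumption): one triangle 1 -> 2 -> 3 -> 1 plus a dangling edge.
edges = [(1, 2), (2, 3), (3, 1), (1, 4)]
intermediate_size, triangles = triangles_regular_shuffle(edges, edges, edges)
```

Even on this tiny input the intermediate result (4 tuples) is larger than the final result (3 tuples); on a real social graph the gap is far larger.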

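Background: HyperCube (Shares) Shuffle

Afrati and Ullman, EDBT 2010; Beame et al., PODS 2013

T(x, y, z) :- A(x, y), B(y, z), C(z, x)

[Figure: the P workers are arranged as a P^(1/3) x P^(1/3) x P^(1/3) cube with one axis per join variable (x, y, z); every worker receives fragments of A, B and C and joins them locally.]

Each tuple is sent to the workers whose coordinates match the hashes of its attributes; * stands for every coordinate on that axis:

A(x1, y1) → (h1(x1), h2(y1), *)    P^(1/3) replication
B(y1, z1) → (*, h2(y1), h3(z1))    P^(1/3) replication
C(z1, x1) → (h1(x1), *, h3(z1))    P^(1/3) replication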
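
The routing rule for the triangle query can be sketched as follows. This is an illustrative sketch rather than Myria's implementation: the function name `destinations`, the use of `value % share` as the hash functions h1, h2, h3, and the restriction to integer keys are all assumptions.

```python
from itertools import product

# Join variables in axis order, and the variables each relation binds.
VARIABLES = ("x", "y", "z")
SCHEMAS = {"A": ("x", "y"), "B": ("y", "z"), "C": ("z", "x")}


def destinations(relation, row, shares):
    """Return the set of worker coordinates that must receive this row.

    shares is the size of the cube along each axis, e.g. (4, 4, 4) for
    64 workers. Hashing is plain modulo on integer keys (an assumption).
    """
    bound = dict(zip(SCHEMAS[relation], row))
    axes = []
    for var, share in zip(VARIABLES, shares):
        if var in bound:
            axes.append([bound[var] % share])  # fixed coordinate
        else:
            axes.append(range(share))          # '*': replicate along this axis
    return set(product(*axes))


shares = (4, 4, 4)
a = destinations("A", (1, 2), shares)  # A(x=1, y=2)
b = destinations("B", (2, 3), shares)  # B(y=2, z=3)
c = destinations("C", (3, 1), shares)  # C(z=3, x=1)
```

With 64 workers each tuple is replicated 4 = 64^(1/3) times, and the three tuples that form a triangle meet on exactly one worker, so a single shuffle followed by a local multiway join is enough.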

Single Node Multiway Join

• Join algorithms with optimality guarantees
  • Leapfrog TrieJoin by Veldhuizen, 2014
  • Minesweeper by Ngo et al., 2014

• Pipeline of joins → single multiway join

• Tributary Join: Leapfrog TrieJoin in Myria
  • A multiway sort-merge join on steroids
  • Avoids constructing tries, unlike Leapfrog TrieJoin

T(x, y, z) :- A(x, y), B(y, z), C(z, x)

Example input, each relation sorted:

  A        B        C
x  y     y  z     x  z
2  0     0  1     0  2
2  1     2  0     1  0
2  3     2  3     2  4
3  4     3  4     3  2
4  2     4  2     4  3
5  6     5  6     6  5
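
To make the idea of a multiway sort-merge join concrete, here is a minimal sketch that runs on the three example tables from this slide. It is not the Tributary Join implementation in Myria: the function names are invented for illustration, the attribute order x, y, z is fixed by hand, and C is assumed to be stored as (x, z) pairs, as its column header suggests. The relations are kept as sorted lists and searched with binary search instead of being turned into tries.

```python
from bisect import bisect_left


def leapfrog_intersect(lists):
    """Intersect sorted, duplicate-free lists, skipping ahead by binary search."""
    if any(not lst for lst in lists):
        return []
    out, pos = [], [0] * len(lists)
    candidate = max(lst[0] for lst in lists)
    while True:
        agreed = True
        for i, lst in enumerate(lists):
            pos[i] = bisect_left(lst, candidate, pos[i])
            if pos[i] == len(lst):
                return out                      # one list is exhausted
            if lst[pos[i]] != candidate:
                candidate, agreed = lst[pos[i]], False
        if agreed:
            out.append(candidate)
            pos[0] += 1                         # move past the match
            if pos[0] == len(lists[0]):
                return out
            candidate = lists[0][pos[0]]


def next_values(rows, prefix):
    """Distinct values of the column after `prefix` in a sorted list of tuples."""
    i, k, out = bisect_left(rows, prefix), len(prefix), []
    while i < len(rows) and rows[i][:k] == prefix:
        if not out or out[-1] != rows[i][k]:
            out.append(rows[i][k])
        i += 1
    return out


def triangle_join(A, B, C):
    """A holds (x, y), B holds (y, z), C holds (x, z); all sorted."""
    out = []
    for x in leapfrog_intersect([next_values(A, ()), next_values(C, ())]):
        for y in leapfrog_intersect([next_values(A, (x,)), next_values(B, ())]):
            for z in leapfrog_intersect([next_values(B, (y,)), next_values(C, (x,))]):
                out.append((x, y, z))
    return out


A = [(2, 0), (2, 1), (2, 3), (3, 4), (4, 2), (5, 6)]
B = [(0, 1), (2, 0), (2, 3), (3, 4), (4, 2), (5, 6)]
C = [(0, 2), (1, 0), (2, 4), (3, 2), (4, 3), (6, 5)]
result = triangle_join(A, B, C)  # [(2, 3, 4), (3, 4, 2), (4, 2, 3)]
```

No intermediate A ⋈ B result is materialized: each variable is bound by intersecting the sorted candidate lists of the relations that mention it.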


Questions

Empirical study of HyperCube shuffle and Tributary join

HyperCube configuration optimization

Tributary join cost model and attribute order optimization

Empirical Study

Myria deployment with 64 workers.

Shuffle paradigms:
• Regular shuffles
• HyperCube shuffle
• Broadcast

Local join algorithms:
• Symmetric hash join
• Tributary join

Parallel semi-join

Evaluate 8 queries on the Twitter social graph and Freebase

Triangle Query on Twitter

Query: T(x, y, z) :- A(x, y), B(y, z), C(z, x)

Dataset: Sampled Twitter social network graph with 1 million edges (follower:int, followee:int)

Triangle Query: Data Shuffling

T(x, y, z) :- A(x, y), B(y, z), C(z, x)

Regular Shuffle (54M tuples total)
• A and B: 1M tuples each, skew 1.35 and 1.72
• A ⋈ B: 51M tuples, skew 20.8
• C: 1M tuples, skew 1.01

HyperCube Shuffle (12M tuples total)
• A, B and C: 4M tuples each, skew 1.06

Broadcast (142M tuples, no skew)

Triangle Query: Runtime

T(x, y, z) :- A(x, y), B(y, z), C(z, x)

[Figure: query runtime (sec) for the HyperCube, Broadcast and Regular shuffles]

Shuffle paradigm: HyperCube < Broadcast < Regular

Sequential join: Tributary Join < Hash Join

Query 2: Knowledge Base Exploration Query

Query: Show the full cast of all films starring both Joe Pesci and Robert de Niro

• Dataset: Freebase RDF data, partitioned into separate tables by predicate

CastMember(cast) :-
    ActorName(a1, “Joe Pesci”), ActorPerform(a1, p1), PerformFilm(p1, film),
    ActorName(a2, “Robert de Niro”), ActorPerform(a2, p2), PerformFilm(p2, film),
    PerformFilm(p, film), ActorPerform(p, cast)

Freebase Query: Data Shuffling

• Regular shuffles: 7M tuples
• HyperCube shuffle: 105M tuples (16x replication)
• Broadcast: 351M tuples (50x replication)

[Figure: regular-shuffle plan for the 8-way join on Freebase, a tree of binary joins over relations R1 to R8 annotated with tuple counts]

Knowledge Exploration in Freebase

[Figure: query runtime (sec) of the 8-way join on Freebase]

Comparing shuffle paradigms: Regular < HyperCube < Broadcast

Comparing sequential join algorithms: Hash join < Tributary join

Empirical Study Summary

• The best query plan depends on the query, the data and the cluster
  • Size of intermediate results
  • Replication factor of HyperCube

• Large intermediate results favor HyperCube and Tributary Join
  • Small communication
  • Small input, reducing sorting time

Optimizing HyperCube Shuffle

Optimization goal: minimize the maximum load of any single worker

Example: for Q1 with 64 workers, 4x4x4 is better than 2x4x8

What if we have 63 workers or a 7-way join?

State of the art: linear programming (Beame, Koutris and Suciu, PODS 2013)
• If |A| = |B| = |C| = N with 63 servers, the optimum is 3.98 x 3.98 x 3.98

The penalty of rounding down is non-negligible
• 3x3x3 uses only 27 of the 63 servers

A Simple Yet Effective Algorithm for HyperCube Configuration

Algorithm:
1. Enumerate all HyperCube configurations that use at most P servers
2. Find the configuration with minimal shuffle cost

Tie-breaking heuristic: 1x16 vs 4x4

Best configuration for the previous example: 3x4x5
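
The two steps above can be sketched for the triangle query as follows. Two things here are assumptions rather than statements from the slides: the shuffle cost is modeled as the expected number of tuples per worker under uniform hashing, |A|/(p1·p2) + |B|/(p2·p3) + |C|/(p3·p1), and ties are broken in favor of the smaller largest dimension, which is one possible reading of the 1x16 vs 4x4 example. The function names are hypothetical.

```python
from fractions import Fraction


def expected_load(shares, sizes):
    """Expected tuples per worker for T(x,y,z) :- A(x,y), B(y,z), C(z,x).

    Assumes uniform hashing (no skew). A is hashed on (x, y) into p1*p2
    buckets, B on (y, z) into p2*p3 buckets, C on (z, x) into p3*p1 buckets.
    """
    p1, p2, p3 = shares
    a, b, c = sizes
    return Fraction(a, p1 * p2) + Fraction(b, p2 * p3) + Fraction(c, p3 * p1)


def best_configuration(sizes, workers):
    """Enumerate every p1 x p2 x p3 with p1*p2*p3 <= workers; keep the cheapest."""
    best, best_key = None, None
    for p1 in range(1, workers + 1):
        for p2 in range(1, workers // p1 + 1):
            for p3 in range(1, workers // (p1 * p2) + 1):
                shares = (p1, p2, p3)
                # Tie-break on the largest dimension (an assumption).
                key = (expected_load(shares, sizes), max(shares))
                if best_key is None or key < best_key:
                    best, best_key = shares, key
    return best


N = 1_000_000
sizes = (N, N, N)
config_64 = best_configuration(sizes, 64)  # (4, 4, 4)
config_63 = best_configuration(sizes, 63)  # (3, 4, 5)
```

Under this cost model, 3x4x5 gives each of its 60 workers N/5 tuples, while the rounded-down 3x3x3 gives each of its 27 workers N/3 tuples. Exact fractions are used so that ties between configurations are detected reliably.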

Evaluation of HyperCube Optimization

Compare different configuration algorithms:
• Our algorithm
• Rounding down
• Random (many virtual servers → real servers)

Opt. ratio: max load / optimal load (from the LP solution)

Our algorithm outperforms rounding down and random, with an optimality ratio of at most 1.06


More in the paper

Tributary join cost model and attribute order optimization

Evaluation of more queries

Comparison with parallel semi-join plans

Open source implementation in Myria: https://github.com/uwescience/myria

Conclusions

Efficient parallel join query evaluation: bridging the gap between theory and practice

• Select the best parallel query plan
  • Shuffle paradigm
  • Sequential join algorithm

• Optimal HyperCube configuration

• Optimizing Tributary join attribute order


Thanks! Myria Team



Query execution profiling

PerfOpticon: the visual query profiling tool used in Myria


Cost Model Explained query:

Number of binary searches in first attribute:

Number of binary searches in a joined attribute:

The total cost

Why is random HyperCube cell allocation bad?

Query: A(x, y, z, p) :- S(x, y), R(y, z), T(z, p)

64 cells arranged as an 8 x 8 hypercube, with cells randomly allocated to 4 servers

Server 1 will receive:
• 7/8 of S (1/2 if optimal)
• 1/4 of R
• 7/8 of T (1/2 if optimal)
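
The effect can be reproduced with a small calculation. The sketch below is an illustration under stated assumptions, not code from Myria: rows of the 8 x 8 grid are taken to be hash buckets of y and columns hash buckets of z, hashing is assumed uniform, and the function names are made up. An S tuple is replicated along its row and a T tuple along its column, so a server receives it as soon as it owns any cell in that row or column; an R tuple lands in exactly one cell.

```python
import random

GRID = 8  # 8 x 8 cells: rows = hash buckets of y, columns = hash buckets of z


def received_fractions(cells):
    """Fractions of S, R and T sent to a server owning the given (row, col) cells."""
    rows = {r for r, _ in cells}
    cols = {c for _, c in cells}
    return len(rows) / GRID, len(cells) / GRID**2, len(cols) / GRID


def block_allocation(server):
    """Give server 0..3 one contiguous 4 x 4 block of cells."""
    r0, c0 = (server // 2) * 4, (server % 2) * 4
    return {(r, c) for r in range(r0, r0 + 4) for c in range(c0, c0 + 4)}


def random_allocation(seed=0):
    """Deal the 64 cells to 4 servers at random, 16 cells each."""
    cells = [(r, c) for r in range(GRID) for c in range(GRID)]
    random.Random(seed).shuffle(cells)
    return [set(cells[i * 16:(i + 1) * 16]) for i in range(4)]


block = received_fractions(block_allocation(0))       # (0.5, 0.25, 0.5)
scattered = [received_fractions(c) for c in random_allocation()]
```

A contiguous block touches only 4 rows and 4 columns, which yields the 1/2, 1/4, 1/2 split quoted as optimal. Sixteen randomly scattered cells usually touch most of the 8 rows and 8 columns, so the server receives a much larger share of S and T while its share of R stays at 1/4.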

Myria: new generation parallel DBMS

[Figure: MyriaX architecture. A coordinator with a REST server and a catalog sends JSON query plans and other instructions to the workers of a shared-nothing cluster. Each worker has a worker catalog and a local RDBMS, which is the primary data store; workers can also ingest data from HDFS.]
