From Theory to Practice: Efficient Join Query Processing in a Parallel Database System

Shumo Chu, Magdalena Balazinska and Dan Suciu
Database Group, CSE, University of Washington

Post on 19-Jan-2016

Page 2

Motivation

In industry and science, users need to analyze large datasets

Myria: Parallel DBMS developed at UW

New class of queries:
• Knowledge base exploration
• Social network analysis: find all triangles

Two key differences:
• Multiple tables need to be joined
• Query structure may be cyclic

Page 3

Traditional Parallel Join Evaluation

[Figure: A⋈B⋈C over relations A, B, C on three workers. A and B are shuffled on y into partitions A’ and B’; each worker computes A’⋈B’; then A⋈B and C are shuffled on (x, z).]

Solution 1: Shuffle on joined attributes
• Large intermediate result
• Skew on shuffle

Solution 2: Keep largest table, broadcast others
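A minimal Python sketch of Solution 1 for one binary join: hash-partition A and B on the join attribute y, then join each partition locally. The worker count, the use of Python's built-in `hash` modulo the number of workers, and all function names are illustrative assumptions, not Myria's implementation.

```python
from collections import defaultdict

def shuffle(tuples, key_index, num_workers):
    """Hash-partition tuples on one attribute; returns one list per worker."""
    partitions = [[] for _ in range(num_workers)]
    for t in tuples:
        partitions[hash(t[key_index]) % num_workers].append(t)
    return partitions

def local_hash_join(a_part, b_part):
    """Join A(x, y) with B(y, z) on y inside a single worker."""
    index = defaultdict(list)
    for y, z in b_part:
        index[y].append(z)
    return [(x, y, z) for x, y in a_part for z in index.get(y, [])]

def regular_shuffle_join(a, b, num_workers=3):
    a_parts = shuffle(a, 1, num_workers)  # A(x, y): y is column 1
    b_parts = shuffle(b, 0, num_workers)  # B(y, z): y is column 0
    result = []
    for a_part, b_part in zip(a_parts, b_parts):
        result.extend(local_hash_join(a_part, b_part))
    return result

# Toy data, not taken from the slides.
A = [(1, 2), (3, 2), (4, 5)]
B = [(2, 7), (2, 8), (5, 9)]
print(sorted(regular_shuffle_join(A, B)))
# [(1, 2, 7), (1, 2, 8), (3, 2, 7), (3, 2, 8), (4, 5, 9)]
```

Every tuple with the same y value lands on the same worker, so a frequent y value concentrates work there, and the joined output has to be shuffled again before it can meet C.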

Page 4

Background: HyperCube (Shares) Shuffle

Afrati and Ullman, EDBT 2010; Beame et al., PODS 2013

T(x, y, z) :- A(x, y), B(y, z), C(z, x)

[Figure: P workers arranged as a P^(1/3) x P^(1/3) x P^(1/3) cube with axes x, y, z; each worker receives fragments of A, B and C.]

A(x1, y1) → (h1(x1), h2(y1), *): P^(1/3) replication

B(y1, z1) → (*, h2(y1), h3(z1)): P^(1/3) replication

C(z1, x1) → (h1(x1), *, h3(z1)): P^(1/3) replication
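The routing rule for the triangle query can be sketched in Python as follows. This is an illustration only: `value % share` stands in for the hash functions h1, h2, h3, and the function name and sample tuples are assumptions.

```python
from itertools import product

def hypercube_destinations(relation, tup, shares):
    """Return the (i, j, k) cells that receive one tuple of the triangle query.

    shares = (px, py, pz) is the number of hash buckets per attribute.
    `value % share` is an assumed stand-in for the hash functions h1, h2, h3.
    """
    px, py, pz = shares
    if relation == "A":    # A(x, y): the z coordinate is unconstrained
        x, y = tup
        coords = ([x % px], [y % py], range(pz))
    elif relation == "B":  # B(y, z): the x coordinate is unconstrained
        y, z = tup
        coords = (range(px), [y % py], [z % pz])
    elif relation == "C":  # C(z, x): the y coordinate is unconstrained
        z, x = tup
        coords = ([x % px], range(py), [z % pz])
    else:
        raise ValueError(f"unknown relation: {relation}")
    return list(product(*coords))

shares = (4, 4, 4)  # 64 workers, P^(1/3) = 4 per dimension
a_cells = hypercube_destinations("A", (5, 6), shares)
b_cells = hypercube_destinations("B", (6, 7), shares)
c_cells = hypercube_destinations("C", (7, 5), shares)

print(len(a_cells))  # 4 copies of the A tuple
meeting = set(a_cells) & set(b_cells) & set(c_cells)
print(meeting)       # {(1, 2, 3)}
```

Each tuple is replicated along the one dimension it does not mention, and the three tuples that form a triangle meet in exactly one cell, so every worker can compute its part of the join locally after a single shuffle.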

Page 5

Single Node Multiway Join

• Join algorithms with optimal guarantees
  • Leapfrog TrieJoin by Veldhuizen, 2014
  • Minesweeper by Ngo et al., 2014
• Pipeline of joins → single multiway join
• Tributary Join: Leapfrog TrieJoin in Myria
  • A multiway sort-merge join on steroids
  • Avoids constructing tries, compared with Leapfrog TrieJoin

A        B        C
x  y     y  z     x  z
2  0     0  1     0  2
2  1     2  0     1  0
2  3     2  3     2  4
3  4     3  4     3  2
4  2     4  2     4  3
5  6     5  6     6  5

T(x, y, z) :- A(x, y), B(y, z), C(z, x)
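A simplified Python sketch in the spirit of a multiway sort-merge join over sorted arrays, run on the three example tables. It is not Myria's Tributary join: the helper names are hypothetical, the attribute order x, y, z is assumed, and the columns of C are read in the order printed in the table, (x, z).

```python
from bisect import bisect_left

def values_with_prefix(rows, prefix):
    """Sorted distinct values of the column that follows `prefix` in sorted rows.

    The matching range is located by binary search, so no trie is built.
    """
    k = len(prefix)
    out = []
    for row in rows[bisect_left(rows, prefix):]:
        if row[:k] != prefix:
            break
        if not out or out[-1] != row[k]:
            out.append(row[k])
    return out

def intersect_sorted(a, b):
    """Intersect two sorted lists, seeking forward in b by binary search."""
    out, pos = [], 0
    for v in a:
        pos = bisect_left(b, v, pos)
        if pos == len(b):
            break
        if b[pos] == v:
            out.append(v)
    return out

def triangle_join(A, B, C):
    """All (x, y, z) with A(x, y), B(y, z) and C(x, z), one attribute at a time."""
    A, B, C = sorted(A), sorted(B), sorted(C)
    b_ys = values_with_prefix(B, ())
    result = []
    for x in intersect_sorted(values_with_prefix(A, ()), values_with_prefix(C, ())):
        for y in intersect_sorted(values_with_prefix(A, (x,)), b_ys):
            zs = intersect_sorted(values_with_prefix(B, (y,)),
                                  values_with_prefix(C, (x,)))
            result.extend((x, y, z) for z in zs)
    return result

A = [(2, 0), (2, 1), (2, 3), (3, 4), (4, 2), (5, 6)]
B = [(0, 1), (2, 0), (2, 3), (3, 4), (4, 2), (5, 6)]
C = [(0, 2), (1, 0), (2, 4), (3, 2), (4, 3), (6, 5)]  # columns (x, z) as printed
print(triangle_join(A, B, C))
# [(2, 3, 4), (3, 4, 2), (4, 2, 3)]
```

All three relations are joined in one pass, so the binary join A⋈B is never materialized as an intermediate result.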

Page 6

Questions

Empirical study of HyperCube shuffle and Tributary join

HyperCube configuration optimization

Tributary join cost model and attribute order optimization

Page 7

Empirical Study

Myria deployment with 64 workers.

Shuffle paradigms:
• Regular shuffles
• HyperCube shuffle
• Broadcast

Local join algorithms:
• Symmetric hash join
• Tributary join

Parallel semi-join

Evaluate 8 queries on Twitter social graph and Freebase

Page 8

Triangle Query on Twitter

Query: T(x, y, z) :- A(x, y), B(y, z), C(z, x)

Dataset: Sampled Twitter social network graph with 1 million edges (follower:int, followee:int)

Page 9

Triangle Query: Data Shuffling

T(x, y, z) :- A(x, y), B(y, z), C(z, x)

Regular Shuffle (54M Total), plan (A⋈B)⋈C; tuples shuffled and skew per shuffle, in the order listed on the slide:
• #: 1M, Skew: 1.35
• #: 1M, Skew: 1.72
• #: 51M, Skew: 20.8 (the only shuffle larger than the 1M-edge inputs, so the intermediate result A⋈B)
• #: 1M, Skew: 1.01

HyperCube Shuffle (12M Total), A, B and C joined in a single step:
• #: 4M, Skew: 1.06
• #: 4M, Skew: 1.06
• #: 4M, Skew: 1.06

Broadcast (142M, no skew)

Page 10

Triangle Query: Runtime

T(x, y, z) :- A(x, y), B(y, z), C(z, x)

[Figure: query runtime (sec) for HyperCube, Broadcast and Regular shuffles]

Shuffle paradigm: HyperCube < Broadcast < Regular

Sequential join: Tributary Join < Hash Join

Page 11

Query 2: Knowledge Base Exploration Query

• Query: Show the full cast of all films starring both Joe Pesci and Robert de Niro

• Dataset: Freebase RDF; the data is partitioned into separate tables by predicate

CastMember(cast) :-
    ActorName(a1, “Joe Pesci”), ActorPerform(a1, p1), PerformFilm(p1, film),
    ActorName(a2, “Robert de Niro”), ActorPerform(a2, p2), PerformFilm(p2, film),
    PerformFilm(p, film), ActorPerform(p, cast)

Page 12

Freebase Query: Data Shuffling

Regular shuffles: 7M tuples

HyperCube shuffle: 105M tuples (16x replication)

Broadcast: 351M tuples (50x replication)

[Figure: regular-shuffle plan for the 8-way join on Freebase, a tree of joins over the input relations annotated with per-shuffle tuple counts ranging from 2 to 1.10M]

Page 13

Knowledge Exploration in Freebase

[Figure: query runtime (sec) for the 8-way join on Freebase]

Comparing shuffle paradigms: Regular < HyperCube < Broadcast

Comparing sequential join algorithms: Hash join < Tributary join

Page 14

Empirical Study Summary

The best query plan depends on query, data and cluster:
• Size of intermediate result
• Replication factor of HyperCube

Large intermediate results favor HyperCube and Tributary Join:
• Small communication
• Small input, reducing sorting time

Page 15

Optimizing HyperCube Shuffle

Optimization goal: minimize the maximum load of any single worker

Example: for Q1 with 64 workers, 4x4x4 is better than 2x4x8

What if we have 63 workers or a 7-way join?

State of the art: Linear Programming (Beame, Koutris and Suciu, PODS 2013)
• If |A| = |B| = |C| = N with 63 servers, the optimal (fractional) configuration is 3.98 x 3.98 x 3.98

The penalty of rounding down is non-negligible
• 3x3x3 only uses 27 servers out of 63

Page 16

A Simple Yet Effective Algorithm for HyperCube Configuration

Algorithm:
1. Enumerate all the hypercube configurations with number of servers ≤ P
2. Find the configuration with minimal shuffle cost

Tie-breaking heuristic: 1x16 vs 4x4

Best configuration of previous example: 3x4x5
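The two steps can be sketched in Python as below. The cost function is an assumption: it estimates the expected number of tuples per worker under skew-free hashing, where each relation is split only along the dimensions of its own attributes. The function names are hypothetical and the tie-breaking heuristic is not modeled.

```python
from itertools import product

def expected_load(shares, relations, sizes):
    """Expected tuples received per worker, assuming hashing spreads data evenly.

    relations maps a relation name to the indexes of the attributes it contains.
    """
    load = 0.0
    for name, attrs in relations.items():
        cells = 1
        for i in attrs:
            cells *= shares[i]
        load += sizes[name] / cells
    return load

def best_configuration(num_attrs, relations, sizes, max_workers):
    """Brute-force search over integer configurations using at most max_workers."""
    best, best_load = None, float("inf")
    for shares in product(range(1, max_workers + 1), repeat=num_attrs):
        workers = 1
        for s in shares:
            workers *= s
        if workers > max_workers:
            continue
        load = expected_load(shares, relations, sizes)
        if load < best_load:
            best, best_load = shares, load
    return best, best_load

# Triangle query with attributes x=0, y=1, z=2 and equal-sized relations.
triangle = {"A": (0, 1), "B": (1, 2), "C": (2, 0)}
sizes = {"A": 1_000_000, "B": 1_000_000, "C": 1_000_000}

best63, load63 = best_configuration(3, triangle, sizes, 63)
print(sorted(best63), round(load63))               # [3, 4, 5] 200000
print(best_configuration(3, triangle, sizes, 64))  # ((4, 4, 4), 187500.0)
print(expected_load((2, 4, 8), triangle, sizes))   # 218750.0
```

Under this cost model the search reproduces the figures on the slides: a permutation of 3x4x5 for 63 workers, and 4x4x4 beating 2x4x8 for 64 workers. The enumeration grows exponentially with the number of attributes, so a practical version would prune configurations whose product already exceeds P.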

Page 17

Evaluation of HyperCube Optimization

Compare different configuration algorithms:
• Our algorithm
• Rounding down
• Random (many virtual servers → real servers)

Opt. ratio: max load / optimal (by LP solution)

Our algorithm outperforms rounding down and random, with an optimality ratio of at most 1.06

Page 18

More in the paper

Tributary join cost model and attribute order optimization

Evaluation of more queries

Comparison with parallel semi-join plans

Open source implementation in Myria: https://github.com/uwescience/myria

Page 19

Conclusions

Efficient parallel join query evaluation: bridging the gap between theory and practice

• Select the best parallel query plan
  • Shuffle paradigm
  • Sequential join algorithm
• Optimal HyperCube configuration
• Optimizing Tributary join attribute order

Page 20

Thanks! Myria Team

Page 22

Query execution profiling

PerfOpticon: the visual query profiling tool used in Myria

Page 23

Cost Model Explained

Query:

Number of binary searches in first attribute:

Number of binary searches in a joined attribute:

The total cost

Page 24

Why is random HyperCube cell allocation bad?

Query: A(x, y, z, p) :- S(x, y), R(y, z), T(z, p)

64 cells in an 8 x 8 hypercube, with cells randomly allocated to 4 servers

Server 1 will receive:
• 7/8 of S (1/2 if optimal)
• 1/4 of R
• 7/8 of T (1/2 if optimal)
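A back-of-the-envelope check of these fractions in Python, under an assumed allocation model: each of the 64 cells is given to one of the 4 servers independently and uniformly at random, an S tuple is sent to all 8 cells of the row chosen by its y value, and a server receives the tuple if it owns at least one of those cells. The names are illustrative.

```python
from fractions import Fraction

CELLS_PER_DIM = 8  # 8 x 8 hypercube over the join attributes y and z
NUM_SERVERS = 4

def random_fraction_of_s():
    """Expected share of S seen by one server under independent random allocation."""
    miss_one_cell = Fraction(NUM_SERVERS - 1, NUM_SERVERS)
    # The server misses a tuple only if it owns none of the 8 cells in its row.
    return 1 - miss_one_cell ** CELLS_PER_DIM

def block_fraction_of_s(blocks_per_dim=2):
    """Share of S seen by one server when each server owns a contiguous block."""
    rows_owned = CELLS_PER_DIM // blocks_per_dim
    return Fraction(rows_owned, CELLS_PER_DIM)

print(float(random_fraction_of_s()))  # about 0.90
print(block_fraction_of_s())          # 1/2
```

Under this model the random share comes out at about 0.90, in the same ballpark as the 7/8 quoted above, while giving each server a 4 x 4 block of cells yields exactly 1/2. By symmetry the same reasoning applies to T along the z dimension. R is hashed on both y and z, so each R tuple goes to a single cell and each server sees about 1/4 of R either way.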

Page 25

Myria: new generation parallel DBMS

[Figure: MyriaX architecture. A coordinator (REST server, catalog) sends JSON query plans and other instructions to the workers of a shared-nothing cluster; each worker has a worker catalog and an RDBMS, with HDFS shown alongside each worker.]

Primary data store:

Can also ingest data from: