
Page 1: Selectivity-Based Partitioning

Selectivity-Based Partitioning

Alkis Polyzotis, UC Santa Cruz

Page 2: Selectivity-Based Partitioning

Query Optimization

• Integral component of declarative query processing
• Key problem: join ordering
• Most important (and most complex!) module of a DBMS

[Figure: the query R1 ⋈ R2 ⋈ R3 ⋈ R4 flows through Parser → Optimizer → Execution Engine; the optimizer outputs the plan ((R2 ⋈ R3) ⋈ R1) ⋈ R4]

Page 3: Selectivity-Based Partitioning

“Monolithic” Query Optimization

• Output: a single join order based on join selectivities between tables

Plan: (P ⋈ E) ⋈ D

Page 4: Selectivity-Based Partitioning

Partition-Based Query Optimization

• Output: multiple join orders based on selectivities between fragments of tables

Plan: ((P ⋈ D2) ⋈ E) ∪ ((E ⋈ D1) ⋈ P)

Page 5: Selectivity-Based Partitioning

Selectivity-Based Partitioning

• Divide-and-Union paradigm
• Optimization problem and analysis
• Partitioning algorithm
• Experimental results

Page 6: Selectivity-Based Partitioning

Roadmap

• Preliminaries
• Problem Definition
• Partitioning Algorithm
  • Optimal Splits
  • Iterative Partitioning
• Experimental Results
• Conclusions

Page 7: Selectivity-Based Partitioning

Data and Query Model

• Chain-join queries
  • Example: R1 ⋈ R2 ⋈ R3 ⋈ R4
• Relations may have optional selections
• Relation Frequency matrix
• Left-deep evaluation plans
  • Example: R3 ⋈ R2 ⋈ R4 ⋈ R1, evaluated as ((R3 ⋈ R2) ⋈ R4) ⋈ R1

Page 8: Selectivity-Based Partitioning

Problem Definition

• Given: query Q, maximum partition count N
• Goal: find a partitioning of Q into n ≤ N partitions that minimizes query cost
• On-the-fly partitioning vs. off-line partitioning
• Difficult optimization problem!
  • Determine the pivot relation
  • Determine the number of partitions
  • Compute a partitioning of the pivot
  • Determine the orderings of partitioned plans

[Figure: the query R1 ⋈ R2 ⋈ R3 ⋈ R4 rewritten as two partitioned plans over fragments R21 and R22 of R2, drawn with relation orders (R1, R21, R4, R3) and (R3, R22, R1, R4)]

Page 9: Selectivity-Based Partitioning

Query Cost Function

• One possibility: optimizer’s cost model
  • Accurate cost estimation
  • Solution depends on low-level system details
  • Difficult to gain intuitions
• Our approach: query cost = number of intermediate results
  • Simple function that admits analysis
  • Sound connections to realistic cost models (Cluet and Moerkotte, ICDT’95)

Cost(R3 ⋈ R2 ⋈ R4 ⋈ R1) = |R3 ⋈ R2| + |R3 ⋈ R2 ⋈ R4|
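The cost function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the talk's implementation: each relation of the chain is modeled as a frequency matrix whose entry [a][b] counts tuples with left join value a and right join value b (the first and last relations are a single row and a single column), plans are left-deep orders without cross products, and the names `matmul`, `join_size`, `plan_cost` and the sample numbers are all hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def join_size(freqs, lo, hi):
    """Number of tuples in R_lo JOIN ... JOIN R_hi (0-based, inclusive)."""
    prod = freqs[lo]
    for k in range(lo + 1, hi + 1):
        prod = matmul(prod, freqs[k])
    return sum(map(sum, prod))


def plan_cost(freqs, order):
    """Sum of intermediate result sizes of a left-deep join order.

    The final join is skipped because it is the query answer, not an
    intermediate result (as in the Cost(...) formula on the slide).
    """
    lo = hi = order[0]
    cost = 0
    for r in order[1:-1]:
        if r == lo - 1:
            lo = r
        elif r == hi + 1:
            hi = r
        else:
            raise ValueError("order would require a cross product")
        cost += join_size(freqs, lo, hi)
    return cost


# Hypothetical frequency matrices for the chain R1 - R2 - R3 - R4.
freqs = [
    [[1, 10]],         # R1: one row, columns = values joining with R2
    [[1, 0], [3, 2]],  # R2
    [[1, 4], [0, 1]],  # R3
    [[1], [5]],        # R4: one column, rows = values joining with R3
]

cost_r3_first = plan_cost(freqs, [2, 1, 3, 0])  # R3, R2, R4, R1
cost_r1_first = plan_cost(freqs, [0, 1, 2, 3])  # R1, R2, R3, R4
print(cost_r3_first, cost_r1_first)
```

With these made-up numbers the order that starts from R3 produces fewer intermediate tuples than the order that starts from R1, which is the kind of difference the join-ordering problem is about.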

Page 10: Selectivity-Based Partitioning

Roadmap

• Preliminaries
• Problem Definition
• Partitioning Algorithm
  • Optimal Splits
  • Iterative Partitioning
• Experimental Results
• Conclusions

Page 11: Selectivity-Based Partitioning

Partitioning Algorithm - Overview

• State space: partitioned join orders
• Partitioning algorithm:
  • Explore a set of states
  • Compute optimal partitioning for each state
  • Return global optimum
• Our approach: order joins then partition
• Another possibility: partition then order joins

Page 12: Selectivity-Based Partitioning

Distributing Tuples

• Goal: Distribute tuples to minimize cost

• Optimal distribution depends on:
  • Frequency matrices of other relations
  • Position (m,l)

Page 13: Selectivity-Based Partitioning

Optimal Split Theorem

• Distribute each value (m,l) independently

• Place (m,l) in partition that minimizes g(L,T,m,l)
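The transcript does not define g(L,T,m,l), so the following Python sketch substitutes an assumed stand-in: under the intermediate-result cost model, the weight of cell (m,l) for a given plan is taken to be the number of intermediate tuples that one pivot tuple with values (m,l) contributes. Because that contribution is linear in the pivot frequencies, each cell can be assigned on its own to the plan with the smallest weight. The frequency-matrix model, the function names and the sample data are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def chain(freqs, lo, hi):
    """Product of the frequency matrices F_lo ... F_hi (0-based, inclusive)."""
    prod = freqs[lo]
    for k in range(lo + 1, hi + 1):
        prod = matmul(prod, freqs[k])
    return prod


def cell_weights(freqs, pivot, order):
    """Assumed stand-in for g: intermediate tuples per pivot tuple at (m, l).

    `order` is a left-deep order without cross products, so the relations
    joined so far always form an interval [lo, hi] of the chain.
    """
    rows, cols = len(freqs[pivot]), len(freqs[pivot][0])
    g = [[0] * cols for _ in range(rows)]
    lo = hi = order[0]
    for r in order[1:-1]:  # the final join is not an intermediate result
        lo, hi = min(lo, r), max(hi, r)
        if not lo <= pivot <= hi:
            continue  # this intermediate result does not involve the pivot
        # Matches per left value m from the relations left of the pivot.
        left = [sum(c) for c in zip(*chain(freqs, lo, pivot - 1))] if lo < pivot else [1] * rows
        # Matches per right value l from the relations right of the pivot.
        right = [sum(row) for row in chain(freqs, pivot + 1, hi)] if hi > pivot else [1] * cols
        for m in range(rows):
            for l in range(cols):
                g[m][l] += left[m] * right[l]
    return g


def optimal_split(freqs, pivot, orders):
    """For every cell (m, l), the index of the plan with the smallest weight."""
    weights = [cell_weights(freqs, pivot, o) for o in orders]
    rows, cols = len(freqs[pivot]), len(freqs[pivot][0])
    return [[min(range(len(orders)), key=lambda j: weights[j][m][l])
             for l in range(cols)] for m in range(rows)]


# Hypothetical chain R1 - R2 - R3 - R4 with pivot R2 (index 1).
freqs = [[[1, 10]], [[1, 0], [3, 2]], [[1, 4], [0, 1]], [[1], [5]]]
orders = [[0, 1, 2, 3], [2, 1, 3, 0]]  # plan 0 starts at R1, plan 1 at R3
weights = [cell_weights(freqs, 1, o) for o in orders]
assignment = optimal_split(freqs, 1, orders)
print(assignment)
```

In this toy instance the pivot rows with the rare R1 value go to the plan that joins R1 first, and the rows with the frequent R1 value go to the plan that postpones R1, which mirrors the dependence on the other relations' frequency matrices and on the position (m,l).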

Page 14: Selectivity-Based Partitioning

Partitioning Algorithm - Overview

• State space: partitioned join orders
• Partitioning algorithm:
  • Explore a set of states
  • Compute optimal partitioning for each state
  • Return global optimum

Page 15: Selectivity-Based Partitioning

Search Algorithm

• Exhaustive search is impractical
  [ Pivot, Leading orders, Trailing orders ]
• Search heuristics:
  • Tighter search space: [ Pivot, Optimal Leading orders ]
  • Iterative Partitioning
  • Guided search by using lower bounds on cost of partitions

Page 16: Selectivity-Based Partitioning

Encoding of State Space

• State: [ Pivot , Optimal leading orders ]

• Transition: insert relation in a leading order

Page 17: Selectivity-Based Partitioning

Iterative Partitioning

• Key idea: (Partition, Optimize)+
  • Compute optimal split for leading/trailing orders
  • Optimize trailing orders for the current split
• Theorem: query cost can only decrease
• Idea extended to more detailed cost models

[Figure: pivot R2 split into fragments R21 and R22, each with its own leading and trailing join order over R1, R3, R4, R5]
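The (Partition, Optimize)+ loop can be sketched as an alternating minimization. The Python below is a rough illustration under several assumptions that are not in the slides: relations are frequency matrices along a chain, cost is the number of intermediate tuples, the per-cell weight is obtained from the linearity of that cost in the pivot frequencies, trailing orders are optimized by brute force, and the loop stops as soon as a round fails to lower the cost (the slide states monotonic decrease as a theorem; here it is enforced by an explicit check). All names and numbers are hypothetical.

```python
from itertools import product


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def plan_cost(freqs, order):
    """Intermediate-result count of a cross-product-free left-deep order."""
    lo = hi = order[0]
    cost = 0
    for r in order[1:-1]:  # the final join is not an intermediate result
        lo, hi = min(lo, r), max(hi, r)
        prod = freqs[lo]
        for k in range(lo + 1, hi + 1):
            prod = matmul(prod, freqs[k])
        cost += sum(map(sum, prod))
    return cost


def with_pivot(freqs, pivot, frag):
    """Copy of the chain with the pivot replaced by one of its fragments."""
    return freqs[:pivot] + [frag] + freqs[pivot + 1:]


def total_cost(freqs, pivot, frags, orders):
    """Cost of all partitioned plans; empty fragments need no plan."""
    return sum(plan_cost(with_pivot(freqs, pivot, f), o)
               for f, o in zip(frags, orders) if any(map(any, f)))


def split(freqs, pivot, orders):
    """Partition step: send every pivot cell to the plan where it is cheapest."""
    piv = freqs[pivot]
    rows, cols = len(piv), len(piv[0])
    zero = [[0] * cols for _ in range(rows)]
    frags = [[row[:] for row in zero] for _ in orders]
    for m, l in product(range(rows), range(cols)):
        unit = [row[:] for row in zero]
        unit[m][l] = 1
        # Cost is linear in the pivot counts, so a unit cell yields its weight.
        weights = [plan_cost(with_pivot(freqs, pivot, unit), o)
                   - plan_cost(with_pivot(freqs, pivot, zero), o) for o in orders]
        frags[weights.index(min(weights))][m][l] = piv[m][l]
    return frags


def extensions(lo, hi, n):
    """All ways to grow the joined interval [lo, hi] to the whole chain."""
    if lo == 0 and hi == n - 1:
        yield []
    if lo > 0:
        for rest in extensions(lo - 1, hi, n):
            yield [lo - 1] + rest
    if hi < n - 1:
        for rest in extensions(lo, hi + 1, n):
            yield [hi + 1] + rest


def best_trailing(freqs, pivot, frag, order):
    """Optimize step: keep the order up to the pivot, re-pick what follows."""
    lead = order[:order.index(pivot) + 1]
    fr = with_pivot(freqs, pivot, frag)
    cands = [lead + t for t in extensions(min(lead), max(lead), len(freqs))]
    return min(cands, key=lambda o: plan_cost(fr, o))


def iterative_partitioning(freqs, pivot, orders, max_rounds=20):
    best = None
    for _ in range(max_rounds):
        frags = split(freqs, pivot, orders)
        new_orders = [best_trailing(freqs, pivot, f, o) for f, o in zip(frags, orders)]
        cost = total_cost(freqs, pivot, frags, new_orders)
        if best is not None and cost >= best[2]:
            break
        best, orders = (frags, new_orders, cost), new_orders
    return best


# Hypothetical chain R1 - R2 - R3 - R4, pivot R2, two initial plans.
freqs = [[[1, 10]], [[1, 0], [3, 2]], [[1, 4], [0, 1]], [[1], [5]]]
frags, orders, cost = iterative_partitioning(freqs, 1, [[0, 1, 2, 3], [2, 1, 0, 3]])
print(frags, orders, cost)
```

On this toy input the second plan starts with a poor trailing order; the optimize step replaces it, and the next round finds nothing further to improve, so the loop stops.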

Page 18: Selectivity-Based Partitioning

Search Algorithm

• Initial states: single-relation leading orders

• Search process:
  • Compute partitions with Iterative Partitioning (IP)
  • Open more states with transition function

• Transitions are guided by lower bound on cost function

• Same lower bound can also prune states

• Stopping criteria:
  • Search space is exhausted
  • Time budget is exhausted
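The search loop described on this slide resembles a best-first search with pruning. The Python skeleton below is a generic sketch, not the talk's algorithm: `expand`, `evaluate` and `lower_bound` are hypothetical callbacks standing in for the transition function, the partition computation and the cost lower bound, and pruning is only safe under the assumption that `lower_bound(state)` bounds the cost of the state and of everything reachable from it. The toy problem at the end exists only to exercise the loop.

```python
import heapq
import itertools
import time


def guided_search(initial_states, expand, evaluate, lower_bound, time_budget):
    """Best-first search ordered by lower bound, with pruning and a time budget."""
    best_state, best_cost = None, float("inf")
    tie = itertools.count()  # tie-breaker so the heap never compares states
    seen = set(initial_states)
    frontier = [(lower_bound(s), next(tie), s) for s in initial_states]
    heapq.heapify(frontier)
    deadline = time.monotonic() + time_budget
    # Stop when the search space or the time budget is exhausted.
    while frontier and time.monotonic() < deadline:
        bound, _, state = heapq.heappop(frontier)
        if bound >= best_cost:
            continue  # prune: this state cannot beat the best one found so far
        cost = evaluate(state)
        if cost < best_cost:
            best_state, best_cost = state, cost
        for nxt in expand(state):
            if nxt not in seen and lower_bound(nxt) < best_cost:
                seen.add(nxt)
                heapq.heappush(frontier, (lower_bound(nxt), next(tie), nxt))
    return best_state, best_cost


# Toy problem: states are integers, successors are s+1 and s+2 up to 10,
# and the cost is the squared distance from 7.
result = guided_search(
    initial_states=[0],
    expand=lambda s: [t for t in (s + 1, s + 2) if t <= 10],
    evaluate=lambda s: (s - 7) ** 2,
    lower_bound=lambda s: 0 if s <= 7 else (s - 7) ** 2,
    time_budget=1.0,
)
print(result)
```

Using one priority queue keyed on the lower bound covers both uses named on the slide: the bound decides which state to open next, and the same bound discards states that can no longer improve on the best cost.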

Page 19: Selectivity-Based Partitioning

System Integration

[Figure: two pipelines side by side. Monolithic: Parser → Optimizer → Execution Engine. Partition-based: the same pipeline with an added Partitioner component]

Page 20: Selectivity-Based Partitioning

Roadmap

• Preliminaries
• Problem Definition
• Partitioning Algorithm
  • Optimal Splits
  • Iterative Partitioning
• Experimental Results
• Conclusions

Page 21: Selectivity-Based Partitioning

Effect of Skew

[Figure: Avg. Reduction (%) vs. Data Skew (0.5 to 2) on synthetic data; series: ComputePartition, OptimalPartition]

Page 22: Selectivity-Based Partitioning

Execution Time

[Figure: Execution Time (sec, log scale from 0.1 to 1000) vs. Maximum Partition Count (2 to 4) on synthetic data (Skew=1.5); series: Iterative, Optimal]

Page 23: Selectivity-Based Partitioning

Varying Time Budget

[Figure: Avg. Reduction (%) vs. Time (seconds, 0 to 2) on synthetic data (Skew=1.5); series: Q_H, Q_L]

Page 24: Selectivity-Based Partitioning

Results on Real-Life Data

[Figure: Avg. Reduction (%) per query (Q0, Q1, Q2, Q3) on the SwissProt data set; data labels on the chart: 7.64E+05, 3.19E+05, 7.20E+05, 1.08E+06]

Page 25: Selectivity-Based Partitioning

Conclusions

• Monolithic optimization → Missed opportunities
• Selectivity-Based Partitioning
  • Divide & Union approach
  • Multiple join orders per query
  • Join selectivity between relation fragments
• Partitioning Algorithm
  • Iterative Partitioning
• Experimental Results
  • Significant reduction of intermediate results

Page 26: Selectivity-Based Partitioning

Future Work

• Extension to multiple pivots
• Partition-then-order optimization

• Efficient execution of partitioned plans

• Off-line workload-aware partitioning

Page 27: Selectivity-Based Partitioning

Thank you!

Page 29: Selectivity-Based Partitioning

Partitioning Model

• General case: Multi-relation partitioning

• Our approach: Single-relation partitioning

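[Figure: the query R1 ⋈ R2 ⋈ R3 ⋈ R4 rewritten into partitioned plans drawn with relation orders (R1, R21, R4, R3), (R31, R22, R1, R4) and (R1, R22, R32, R4), where Rij denotes fragment j of Ri]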