
Multi-Query Optimization and Applications

Prasan Roy
Indian Institute of Technology - Bombay

May 2000 Multi-Query Optimization and Applications 2

Motivation
Queries often involve repeated computation
– Queries on overlapping views, stored procedures, nested queries, etc.
– Update expressions for a set of overlapping materialized views
– Automatically generated queries
• XML-QL complex path expressions → SQL query batches
Our focus: Faster query processing by avoiding repeated computation

Outline
Multi-query optimization
Application to related problems
– Query result caching
– Materialized view selection and maintenance
Conclusions and future work

Multi-Query Optimization

Prasan Roy, S. Seshadri, S. Sudarshan and Siddhesh Bhobe, Efficient and Extensible Algorithms for Multi-Query Optimization, ACM SIGMOD 2000

Motivating Example
[Figure: best plan for A JOIN B JOIN C and best plan for B JOIN C JOIN D, each optimized independently; operators are labeled with costs of 10 or 100.]
Foreign Key Dependency: A → B → C → D
Total Cost = 460

Motivating Example
[Figure: combined plan for A JOIN B JOIN C and B JOIN C JOIN D in which BC is computed once and shared; operators are labeled with costs of 10 or 100.]
Foreign Key Dependency: A → B → C → D
Total Cost = 370
Benefit = 90
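The arithmetic behind the two totals can be checked with a short sketch. The unit costs below (scan 10, join 100, write 10, read 10) are an assumed breakdown chosen to agree with the slide totals of 460 and 370; the slides themselves only show the per-operator labels 10 and 100.

```python
# Assumed unit costs (not stated explicitly on the slides).
SCAN, JOIN, WRITE, READ = 10, 100, 10, 10

def three_way_join_cost():
    """One query optimized alone: scan three relations, perform two joins."""
    return 3 * SCAN + 2 * JOIN

# Without sharing, each query computes B JOIN C on its own.
no_sharing = 2 * three_way_join_cost()

# With sharing, B JOIN C is computed once, written out, and read twice.
compute_bc = 2 * SCAN + JOIN
materialize_bc = compute_bc + WRITE
use_bc = SCAN + READ + JOIN  # scan A (or D), read BC, join
sharing = materialize_bc + 2 * use_bc

benefit = no_sharing - sharing
print(no_sharing, sharing, benefit)
```

Under these assumptions the benefit is positive because the join is much more expensive than writing and re-reading its result.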

Problem Statement
[Figure: shared plan over A, B, C and D with the common subexpression labeled.]
Find the cheapest plan exploiting transiently materialized common subexpressions (CSEs)
– Assumption: No shared pipelines

Problems
Locally optimal subplans may not be globally optimal
Mutually exclusive alternatives
– Queries: (A JOIN B JOIN C), (B JOIN C JOIN D), (C JOIN D JOIN E)
– What to share: (B JOIN C) or (C JOIN D)?
Materializing and sharing a CSE is not necessarily cheaper

Example
[Figure: best plan for A JOIN B JOIN C and best plan for B JOIN C JOIN D, each optimized independently; operators are labeled with costs of 1, 10 or 100.]
Foreign Key Dependency: A → B → C → D
Total Cost = 154

Example
[Figure: combined plan for the same two queries in which BC is materialized and shared; operators are labeled with costs of 1, 10 or 100.]
Foreign Key Dependency: A → B → C → D
Total Cost = 172
Benefit = -18

Approach
1. Set up the search space of execution plans
2. Explore the search space to find the best execution plan

Representation of Plan Space
AND/OR Query DAG
– Equivalence Class (OR node)
– Operation (AND node)
[Figure: AND/OR query DAG with equivalence nodes A, B, C, D, AB, BC, CD, ABC and BCD; one example plan (solution graph) is highlighted.]
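A minimal sketch of the AND/OR DAG as a data structure follows. The class names `EqNode` and `OpNode`, the helper functions, and the costs (scan 10, join 100) are illustrative assumptions, not the optimizer's actual types. `best_cost` picks the cheapest alternative at each OR node and sums all inputs at each AND node, with no sharing.

```python
from dataclasses import dataclass, field

@dataclass
class OpNode:
    """AND node: an operation that needs all of its inputs."""
    name: str
    local_cost: float
    inputs: list = field(default_factory=list)  # EqNode children

@dataclass
class EqNode:
    """OR node: an equivalence class; any one alternative computes it."""
    name: str
    ops: list = field(default_factory=list)  # OpNode alternatives

def best_cost(eq, memo=None):
    """Cheapest plan for eq without sharing: min over alternatives."""
    memo = {} if memo is None else memo
    if eq.name not in memo:
        memo[eq.name] = min(
            op.local_cost + sum(best_cost(i, memo) for i in op.inputs)
            for op in eq.ops
        )
    return memo[eq.name]

def scan(name):
    # Hypothetical base relation with an assumed scan cost of 10.
    return EqNode(name, [OpNode(f"scan {name}", 10)])

def join(name, left, right):
    # Hypothetical join operator with an assumed local cost of 100.
    return OpNode(name, 100, [left, right])

a, b, c = scan("A"), scan("B"), scan("C")
ab = EqNode("AB", [join("A JOIN B", a, b)])
bc = EqNode("BC", [join("B JOIN C", b, c)])
# ABC has two alternatives, so it is an OR over two AND nodes.
abc = EqNode("ABC", [join("AB JOIN C", ab, c), join("A JOIN BC", a, bc)])
abc_cost = best_cost(abc)
print(abc_cost)
```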

DAG Generation Modifications
Unification
Volcano: Duplicate subexpressions → No CSEs!
[Figure: separate DAGs for ABC (nodes A, B, C, AB, BC) and BCD (nodes B, C, D, BC, CD), each containing its own copy of BC.]
Modification: Duplicate subexpressions unified
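One common way to unify duplicate subexpressions is to intern every expression under a canonical key. The sketch below assumes join-only expressions, so that the set of base relations identifies an equivalence node; the class `UnifiedDag` and its method are hypothetical names, and a real optimizer would need a richer canonical form.

```python
class UnifiedDag:
    """Interns equivalence nodes so equal subexpressions share one node."""

    def __init__(self):
        self.node_ids = {}  # canonical key -> equivalence node id

    def equivalence_node(self, relations):
        # Assumption: a join expression is identified by its set of base
        # relations, since joins are commutative and associative.
        key = frozenset(relations)
        if key not in self.node_ids:
            self.node_ids[key] = len(self.node_ids)
        return self.node_ids[key]

dag = UnifiedDag()
q1 = dag.equivalence_node(["A", "B", "C"])
bc_in_q1 = dag.equivalence_node(["B", "C"])
q2 = dag.equivalence_node(["B", "C", "D"])
bc_in_q2 = dag.equivalence_node(["C", "B"])  # same subexpression, other order

print(bc_in_q1 == bc_in_q2, len(dag.node_ids))
```

Because both queries resolve B JOIN C to the same node, the subexpression becomes visible as a sharing candidate.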

DAG Generation Modifications
Subsumption
Volcano: No expression subsumption → Missed CSEs
[Figure: selections (A<10) and (A>50) shown with the disjunction (A<10 or A>50), and selection (A>50) shown with (A>10); the added edges are labeled "Subsumption derivation".]
Modification: Subsumption derivations introduced
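A subsumption derivation can be sketched for simple range selections on one attribute. The tuple encoding of predicates and the functions `subsumes` and `select` are assumptions made for this illustration only. The idea is that (A>50) can be computed by filtering the result of (A>10) instead of rescanning the input.

```python
import operator

OPS = {"<": operator.lt, ">": operator.gt}

def subsumes(general, specific):
    """True if every row satisfying `specific` also satisfies `general`."""
    (g_op, g_const), (s_op, s_const) = general, specific
    if g_op != s_op:
        return False
    return g_const <= s_const if g_op == ">" else g_const >= s_const

def select(values, predicate):
    """Apply a predicate such as (">", 10) to a list of attribute values."""
    op, const = predicate
    return [v for v in values if OPS[op](v, const)]

values = list(range(0, 100, 5))
general, specific = (">", 10), (">", 50)

# Derive the specific selection from the general one.
derived = select(select(values, general), specific)
direct = select(values, specific)
print(subsumes(general, specific), derived == direct)
```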

Exploring the Search Space
An Exhaustive Algorithm
Input: DAG for query Q
Output: Set of nodes to materialize, and the corresponding best plan
1. Y = set of equivalence nodes in DAG
2. Pick X ⊆ Y which minimizes BestCost(Q, X)
3. Return X
BestCost(Q, X) = cost of the best plan for Q given that the nodes in X are transiently materialized
Too expensive! Need heuristics.
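The exhaustive search can be sketched on a toy DAG. Everything concrete here is an assumption: the dictionary encoding of the DAG, the unit costs, and the simplified cost model in which every node in X is computed once, written at cost `WRITE`, and then read at cost `READ`. The enumeration over all subsets is what makes the approach exponential.

```python
from itertools import combinations

READ, WRITE = 10, 10  # assumed costs of reading / writing a materialized node

# node -> list of alternatives (local operator cost, input nodes)
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "AB": [(100, ["A", "B"])],
    "BC": [(100, ["B", "C"])],
    "CD": [(100, ["C", "D"])],
    "ABC": [(100, ["AB", "C"]), (100, ["A", "BC"])],
    "BCD": [(100, ["BC", "D"]), (100, ["B", "CD"])],
}
ROOTS = ["ABC", "BCD"]

def node_cost(n, X, memo):
    """Cost of computing n, reading any inputs that are in X."""
    if n not in memo:
        memo[n] = min(local + sum(use_cost(k, X, memo) for k in kids)
                      for local, kids in DAG[n])
    return memo[n]

def use_cost(n, X, memo):
    """Cost of obtaining n as an input: read it if materialized and cheaper."""
    cost = node_cost(n, X, memo)
    return min(cost, READ) if n in X else cost

def best_cost(X):
    """BestCost(Q, X) under the simplified model described above."""
    memo = {}
    plans = sum(use_cost(r, X, memo) for r in ROOTS)
    materialization = sum(node_cost(m, X, memo) + WRITE for m in X)
    return plans + materialization

def exhaustive():
    Y = list(DAG)  # all equivalence nodes
    subsets = (frozenset(s) for r in range(len(Y) + 1)
               for s in combinations(Y, r))
    return min(subsets, key=best_cost)

best = exhaustive()
print(sorted(best), best_cost(best))
```

With 9 equivalence nodes the search already evaluates 512 subsets, which illustrates why a heuristic is needed for realistic DAGs.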

Exploring the Search Space
A Greedy Heuristic
Input: DAG for query Q
Output: Set of nodes to materialize, and the corresponding best plan
1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
     Pick z ∈ Y which maximizes Benefit(z | Q, X)
     If (Benefit(z | Q, X) > 0)
       Y = Y – {z}; X = X ∪ {z}
     Else Y = {}
3. Return X
Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X ∪ {z})
Appeared in [Gupta, ICDT97]. Our Contribution: improve efficiency
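The greedy loop translates almost line for line into code. In this sketch `best_cost` is passed in as a function of the materialized set X; the toy cost function is hypothetical, with the BC saving matching the earlier example (460 to 370) and the AB and CD penalties invented for illustration.

```python
def greedy(candidates, best_cost):
    """Greedy selection of nodes to materialize, following the steps above."""
    X, Y = frozenset(), set(candidates)
    while Y:
        current = best_cost(X)
        # Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X U {z})
        z = max(Y, key=lambda n: current - best_cost(X | {n}))
        if current - best_cost(X | {z}) > 0:
            Y.remove(z)
            X = X | {z}
        else:
            Y = set()  # no remaining candidate helps; stop
    return X

def toy_best_cost(X):
    # Hypothetical cost table: sharing BC saves 90, while materializing
    # AB or CD only adds overhead.
    cost = 460
    if "BC" in X:
        cost -= 90
    if "AB" in X:
        cost += 20
    if "CD" in X:
        cost += 20
    return cost

chosen = greedy({"AB", "BC", "CD"}, toy_best_cost)
print(sorted(chosen), toy_best_cost(chosen))
```

Each iteration calls `best_cost` once per remaining candidate, which is the cost that the efficiency improvements target.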

Improving Efficiency
Summary
Input: DAG for query Q
Output: Set of nodes to materialize, and the corresponding best plan
1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
     Pick z ∈ Y which maximizes Benefit(z | Q, X)
     If (Benefit(z | Q, X) > 0)
       Y = Y – {z}; X = X ∪ {z}
     Else Y = {}
3. Return X
Three improvements:
– Restrict the set of materialization candidates
– Compute Benefit efficiently
– Heuristically avoid computing Benefit for some nodes

Improving Efficiency
Only CSEs Materialized
CSEs identified in a bottom-up traversal
[Figure: query DAG over A, B, C and D with nodes AB, BC, CD, ABC and BCD; the common subexpression is highlighted.]
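A simplified stand-in for CSE identification is to look for non-base equivalence nodes that have more than one distinct parent. This is an assumption-laden approximation, not the bottom-up traversal named on the slide: it ignores whether the parents can appear in the same plan. The dictionary encoding of the DAG and the function name are hypothetical.

```python
from collections import defaultdict

# node -> list of alternatives (local operator cost, input nodes)
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "AB": [(100, ["A", "B"])],
    "BC": [(100, ["B", "C"])],
    "CD": [(100, ["C", "D"])],
    "ABC": [(100, ["AB", "C"]), (100, ["A", "BC"])],
    "BCD": [(100, ["BC", "D"]), (100, ["B", "CD"])],
}

def shared_candidates(dag):
    """Non-base nodes used by more than one distinct parent node."""
    parents = defaultdict(set)
    for node, alternatives in dag.items():
        for _local, kids in alternatives:
            for kid in kids:
                parents[kid].add(node)

    def is_base(n):
        # Base relations have no inputs; they are already stored.
        return all(not kids for _local, kids in dag[n])

    return {n for n, ps in parents.items() if len(ps) > 1 and not is_base(n)}

candidates = shared_candidates(DAG)
print(sorted(candidates))
```

Restricting the greedy loop to this set shrinks Y from nine nodes to one in the toy DAG.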


Efficient Benefit Computation
Incremental Re-optimization
X : Set of CSEs already materialized
z : unmaterialized CSE
Best plan given X materialized → Best plan given X ∪ {z} materialized
Observation: Best plans change only for the ancestors of z

Incremental Re-optimization
Example
[Figure: query DAG over A, B, C and D with nodes AB, BC, CD, ABC and BCD, annotated with best-plan costs. For X = {} the labels are 10, 100 and 230; after materializing z = (B JOIN C) the updated labels are 10, 120 and 130.]

Incremental Re-optimization
Efficient Propagation
Ancestor nodes visited bottom-up in a topological order
– Guarantees no revisits
Propagation path pruned if the current node’s best cost remains unchanged
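The propagation step can be sketched with a priority queue keyed by topological position. The dictionary DAG, the unit costs, and the function `propagate` are assumptions for illustration; in particular, the sketch stores a single cost per node and simply overwrites the cost of z with its read cost.

```python
import heapq
from collections import defaultdict

# node -> list of alternatives (local operator cost, input nodes),
# listed in a topological order (inputs before the nodes that use them)
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "AB": [(100, ["A", "B"])],
    "BC": [(100, ["B", "C"])],
    "CD": [(100, ["C", "D"])],
    "ABC": [(100, ["AB", "C"]), (100, ["A", "BC"])],
    "BCD": [(100, ["BC", "D"]), (100, ["B", "CD"])],
}
TOPO = {n: i for i, n in enumerate(DAG)}

PARENTS = defaultdict(set)
for node, alternatives in DAG.items():
    for _local, kids in alternatives:
        for kid in kids:
            PARENTS[kid].add(node)

def propagate(cost, z, read_cost):
    """Update `cost` in place after z is materialized; return visited nodes."""
    cost[z] = read_cost
    queued = set(PARENTS[z])
    heap = [(TOPO[p], p) for p in queued]
    heapq.heapify(heap)
    visited = []
    while heap:
        _, n = heapq.heappop(heap)  # lowest topological position first
        visited.append(n)
        new = min(local + sum(cost[k] for k in kids) for local, kids in DAG[n])
        if new == cost[n]:
            continue  # prune: ancestors reached only via n are unaffected
        cost[n] = new
        for p in PARENTS[n]:
            if p not in queued:
                queued.add(p)
                heapq.heappush(heap, (TOPO[p], p))
    return visited

# Best costs with nothing materialized (X = {}).
cost = {"A": 10, "B": 10, "C": 10, "D": 10,
        "AB": 120, "BC": 120, "CD": 120, "ABC": 230, "BCD": 230}
visited = propagate(cost, "BC", read_cost=10)
print(visited, cost["ABC"], cost["BCD"])
```

Only the two ancestors of BC are revisited; AB, CD and the base relations keep their old costs untouched.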


Avoiding Benefit Computation
Monotonicity Assumption
– Benefit of a node does not increase due to materialization of other nodes
• Often true
An earlier benefit of a node is an upper bound on its current benefit
Do not recompute a node’s benefit if another node’s current benefit is greater
Optimization costs decrease by 90%
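Under the monotonicity assumption, the greedy loop can keep stale benefits in a max-heap and recompute only the entry at the top. The sketch below is one way to do this; the function `lazy_greedy`, the `CountingBenefit` helper, and all benefit values are hypothetical. Each heap entry carries a stamp recording the size of X when its benefit was computed, so a stale entry is refreshed before it can be chosen.

```python
import heapq

def lazy_greedy(candidates, benefit):
    """Greedy selection that treats earlier benefits as upper bounds."""
    X = frozenset()
    heap = [(-benefit(z, X), z, 0) for z in sorted(candidates)]
    heapq.heapify(heap)
    while heap:
        neg_b, z, stamp = heapq.heappop(heap)
        if stamp != len(X):
            # Stale upper bound: recompute for the current X and requeue.
            heapq.heappush(heap, (-benefit(z, X), z, len(X)))
            continue
        if -neg_b <= 0:
            break  # the best current benefit is not positive; stop
        # Fresh benefit that beats every other node's upper bound.
        X = X | {z}
    return X

class CountingBenefit:
    """Hypothetical benefit function that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, z, X):
        self.calls += 1
        base = {"BC": 90, "CD": 40, "AB": 10, "EF": 5}
        # Assumed interaction: CD loses its value once BC is materialized.
        penalty = 60 if (z == "CD" and "BC" in X) else 0
        return base[z] - penalty

benefit = CountingBenefit()
chosen = lazy_greedy({"AB", "BC", "CD", "EF"}, benefit)
print(sorted(chosen), benefit.calls)
```

In this run the benefit of EF is not recomputed while AB's current benefit exceeds EF's earlier one, which is the saving the monotonicity assumption allows.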

Experimental Results
TPCD-0.1 on Microsoft SQL Server 6.5 – using SQL rewriting for MQO
[Chart: execution time (secs), axis from 0 to 1000, for queries Q2, Q2-D, Q11 and Q15; series: No-MQO, MQO (Greedy).]

Alternatives to Greedy
Volcano-SH
A lightweight post-pass heuristic
1. Compute the best plan for each query independently, using Volcano
2. Find the set of nodes in the best plans to materialize (cost-based)
Similar previous work [Subramanian and Venkataraman, SIGMOD 1998]

Alternatives to Greedy
Volcano-RU
A lightweight extension of Volcano
1. Batched queries optimized in sequence Q1, Q2, …, Qn
2. Find the best plan for query Qi given the best plans for queries Qj, j < i
3. Cost-based materialization of nodes in best plans of Qj, j < i
Plan quality sensitive to the query sequence

Experimental Results
TPCD-0.1 query batches
[Chart: estimated execution time (secs), axis from 0 to 800, for batches BQ1 to BQ5; series: Volcano, Volcano-SH, Volcano-RU, Greedy.]

Experimental Results
TPCD-0.1 query batches
[Chart: optimization time (secs), logarithmic axis from 0.01 to 10, for batches BQ1 to BQ5; series: Volcano, Volcano-SH, Volcano-RU, Greedy.]

Features
Easily implemented
– First MQO implementation integrated with a state-of-the-art optimizer (as far as we know)
– Also partially prototyped on Microsoft SQL-Server
Support for index selection
– Index modeled as physical property (like “interesting order”)
Extensible and flexible
– New operators, data models
– Readily adapts to other problems
• Query result caching
• Materialized view selection/maintenance

Query Result Caching

P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy and S. Sudarshan, Don’t Trash Your Intermediate Results, Cache ’em, Submitted for publication

Problem Statement
Minimize the total execution time of an online workload by
– Caching intermediate/final results of individual queries, and
– Using these cached results to answer later queries

System Model
[Figure: system model.]

Contributions
Intermediate as well as final results cached
– Optimizer-driven cache management
– Adapts to workload changes
Cache-aware cost-based optimization
– Novel framework for cached result matching

Experimental Results
Overheads negligible
Performance on a 900-query, TPCD-1-based uniform cube-point workload

Materialized View Selection and Maintenance

Hoshi Mistry, Prasan Roy, K. Ramamritham and S. Sudarshan, Materialized View Selection and Maintenance Using Multi-Query Optimization, Submitted for publication

Problem Statement
Speed up maintenance of a set of materialized views by
– Exploiting CSEs between different view maintenance expressions
– Selecting additional views to be materialized

Contributions
Optimization of maintenance expressions
– Support for transiently materialized “delta” views
Nicely integrates transient vs permanent view materialization choices

Experimental Results
Overheads negligible
Performance benefit for maintenance of two TPCD-0.1 based SPJA views

Conclusion
MQO is practical
– Low overheads, high benefits
– Easily implemented and integrated
Leads to novel solutions to related problems
– Query result caching
– Materialized view selection and maintenance

Future Work
Further extensions of MQO
– Shared execution pipelines
Query result caching in presence of updates
Other problems
– Continuous queries, XML view caching, etc.

Other Contributions
Garbage Collection in Object Oriented Databases
– Developed a “transaction-aware” cyclic reference counting algorithm
– Provided a formal proof of correctness

S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz and S. Sudarshan, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, VLDB 1997

Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan and S. Ashwin, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, Invited Paper, VLDB Journal, August 1998