Multi-Query Optimization and Applications
Prasan Roy
Indian Institute of Technology - Bombay
May 2000 Multi-Query Optimization and Applications 2
Motivation
Queries often involve repeated computation
– Queries on overlapping views, stored procedures, nested queries, etc.
– Update expressions for a set of overlapping materialized views
– Automatically generated queries
• XML-QL complex path expressions → SQL query batches
Our focus: Faster query processing by avoiding repeated computation
Outline
Multi-query optimization
Application to related problems
– Query result caching
– Materialized view selection and maintenance
Conclusions and future work
Multi-Query Optimization
Prasan Roy, S. Seshadri, S. Sudarshan and Siddhesh Bhobe, Efficient and Extensible Algorithms for Multi-Query Optimization, ACM SIGMOD 2000
Motivating Example
[Figure: best plan for A JOIN B JOIN C and best plan for B JOIN C JOIN D, optimized independently. Node cost labels: four of 100 and six of 10.]
Foreign Key Dependency: A → B → C → D
Total Cost = 460
Motivating Example
[Figure: combined plan for both queries in which the result of B JOIN C is computed once and shared. Node cost labels: three of 100 and seven of 10.]
Foreign Key Dependency: A → B → C → D
Total Cost = 370
Benefit = 90
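The two totals can be reproduced with simple arithmetic. The sketch below assumes a cost breakdown that the slides do not spell out: a scan of a base relation, a write of a shared result, and a read of a shared result each cost 10, and a join costs 100. These unit costs are inferred from the cost labels and totals in the figures, not stated by the author.

```python
# Assumed unit costs (inferred from the slide totals, not stated in the source).
SCAN = 10         # scan one base relation
JOIN = 100        # evaluate one join
MATERIALIZE = 10  # write a shared result
REUSE = 10        # read a materialized result

# No sharing: each query scans three relations and performs two joins.
per_query = 3 * SCAN + 2 * JOIN
no_sharing = 2 * per_query

# Sharing (B JOIN C): compute it once, materialize it, reuse it in both queries.
shared_bc = 2 * SCAN + JOIN + MATERIALIZE
per_query_shared = SCAN + REUSE + JOIN
with_sharing = shared_bc + 2 * per_query_shared

benefit = no_sharing - with_sharing
print(no_sharing, with_sharing, benefit)  # 460 370 90
```

Under these assumptions the shared plan saves one join and two scans but pays for one write and two reads, which gives the benefit of 90.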
Problem Statement
Find the cheapest plan exploiting transiently materialized common subexpressions (CSEs)
– Assumption: No shared pipelines
[Figure: combined plan over A, B, C and D with the common subexpression highlighted.]
Problems
Locally optimal subplans may not be globally optimal
Mutually exclusive alternatives
– Queries: (A JOIN B JOIN C), (B JOIN C JOIN D), (C JOIN D JOIN E)
– What to share: (B JOIN C) or (C JOIN D)?
Materializing and sharing a CSE is not necessarily cheaper
Example
[Figure: best plan for A JOIN B JOIN C and best plan for B JOIN C JOIN D, optimized independently. Node cost labels: one of 100, five of 10 and four of 1.]
Foreign Key Dependency: A → B → C → D
Total Cost = 154
Example
[Figure: combined plan for both queries sharing the result of B JOIN C. Node cost labels: one of 100, seven of 10 and two of 1.]
Foreign Key Dependency: A → B → C → D
Total Cost = 172
Benefit = -18
Approach
1. Set up the search space of execution plans
2. Explore the search space to find the best execution plan
Representation of Plan Space
AND/OR Query DAG
– Equivalence Class (OR node)
– Operation (AND node)
[Figure: AND/OR query DAG with equivalence nodes A, B, C, D, AB, BC, CD, ABC and BCD; an example plan (solution graph) is highlighted.]
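A minimal sketch of such an AND/OR DAG follows. The class names, the `best_cost` helper and the unit costs (10 per scan, 100 per join) are assumptions made for illustration, not the author's implementation. An OR node picks its cheapest alternative; an AND node needs all of its inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OpNode:
    """AND node: one operator; every input must be computed."""
    local_cost: float
    inputs: List["EqNode"]


@dataclass
class EqNode:
    """OR node: alternative operators that all produce the same result."""
    name: str
    ops: List[OpNode] = field(default_factory=list)


def best_cost(node: EqNode, memo: Dict[str, float]) -> float:
    """Cheapest cost of computing `node`, choosing one operator per OR node."""
    if node.name not in memo:
        memo[node.name] = min(
            op.local_cost + sum(best_cost(i, memo) for i in op.inputs)
            for op in node.ops
        )
    return memo[node.name]


def scan(name: str) -> EqNode:
    return EqNode(name, [OpNode(10, [])])  # assumed scan cost


def join(name: str, *alternatives) -> EqNode:
    # Each alternative is a (left, right) pair of input equivalence nodes.
    return EqNode(name, [OpNode(100, [l, r]) for l, r in alternatives])


a, b, c, d = (scan(n) for n in "ABCD")
ab, bc, cd = join("AB", (a, b)), join("BC", (b, c)), join("CD", (c, d))
abc = join("ABC", (ab, c), (a, bc))
bcd = join("BCD", (bc, d), (b, cd))

memo: Dict[str, float] = {}
print(best_cost(abc, memo), best_cost(bcd, memo))  # 230 230
```

This cost ignores sharing: a subexpression used twice is paid for twice, which is the single-query view that multi-query optimization improves on.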
DAG Generation Modifications: Unification
Volcano: Duplicate subexpressions → No CSEs!
[Figure: separate DAGs for ABC and BCD, each containing its own copy of the node BC.]
Modification: Duplicate subexpressions unified
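One way to picture unification is a lookup table keyed by a canonical form of each subexpression. The sketch below is an illustration only: the `QueryDag` class is hypothetical, and it assumes join-only queries, so that the set of joined relations identifies an equivalence node.

```python
from typing import Dict, FrozenSet, Iterable


class QueryDag:
    """Keeps one equivalence node per distinct set of joined relations."""

    def __init__(self) -> None:
        self.nodes: Dict[FrozenSet[str], dict] = {}

    def eq_node(self, relations: Iterable[str]) -> dict:
        key = frozenset(relations)  # join order does not matter
        if key not in self.nodes:
            self.nodes[key] = {"relations": key, "used_by": set()}
        return self.nodes[key]

    def add_query(self, name: str, subexpressions) -> None:
        for relations in subexpressions:
            self.eq_node(relations)["used_by"].add(name)


dag = QueryDag()
dag.add_query("Q1", [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}])
dag.add_query("Q2", [{"B", "C"}, {"C", "D"}, {"B", "C", "D"}])

# Nodes reached from more than one query are the common subexpressions.
shared = [sorted(k) for k, n in dag.nodes.items() if len(n["used_by"]) > 1]
print(shared)  # [['B', 'C']]
```

Because both queries resolve B JOIN C to the same node, the common subexpression becomes visible as a node with more than one user.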
DAG Generation Modifications: Subsumption
Volcano: No expression subsumption → Missed CSEs
[Figure: selections with predicates (A<10), (A>50), (A<10 or A>50) and (A>10), connected by subsumption derivation edges.]
Modification: Subsumption derivations introduced
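A subsumption derivation exists when one selection result can be computed from another. The sketch below checks this for the predicates in the figure by modelling each predicate on A as a union of open intervals. The representation and the `derivable_from` helper are assumptions for illustration; the test is sufficient rather than complete, since it does not merge overlapping intervals.

```python
import math

# A predicate on attribute A is modelled as a union of open intervals (lo, hi).
A_LT_10 = [(-math.inf, 10)]
A_GT_50 = [(50, math.inf)]
A_GT_10 = [(10, math.inf)]
A_LT_10_OR_GT_50 = A_LT_10 + A_GT_50


def derivable_from(p, q):
    """True if every interval of p lies inside a single interval of q."""
    return all(
        any(q_lo <= p_lo and p_hi <= q_hi for q_lo, q_hi in q)
        for p_lo, p_hi in p
    )


print(derivable_from(A_GT_50, A_GT_10))           # True
print(derivable_from(A_LT_10, A_LT_10_OR_GT_50))  # True
print(derivable_from(A_GT_10, A_GT_50))           # False
```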
Exploring the Search Space: An Exhaustive Algorithm
Input: DAG for query Q
Output: Set of nodes to materialize, corresponding best plan
1. Y = set of equivalence nodes in DAG
2. Pick X ⊆ Y which minimizes BestCost(Q, X)
3. Return X
BestCost(Q, X) = cost of the best plan for Q given that the nodes in X are transiently materialized
Too expensive! Need heuristics.
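The exhaustive algorithm can be sketched as a loop over all subsets of Y. In the sketch below `best_cost` stands in for BestCost(Q, X) and is supplied by the caller; the lookup table is hypothetical, with only 460 and 370 taken from the motivating example.

```python
from itertools import combinations


def exhaustive(candidates, best_cost):
    """Try every subset X of the candidates; return the cheapest (X, cost)."""
    nodes = sorted(candidates)
    best_set, best = frozenset(), best_cost(frozenset())
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            cost = best_cost(frozenset(subset))
            if cost < best:
                best_set, best = frozenset(subset), cost
    return best_set, best


# Hypothetical BestCost table for A JOIN B JOIN C and B JOIN C JOIN D.
# 460 and 370 come from the motivating example; the other entries are made up.
TOY_COST = {
    frozenset(): 460,
    frozenset({"AB"}): 470,
    frozenset({"BC"}): 370,
    frozenset({"AB", "BC"}): 380,
}

chosen, cost = exhaustive({"AB", "BC"}, TOY_COST.__getitem__)
print(sorted(chosen), cost)  # ['BC'] 370
```

With n candidate nodes the loop calls `best_cost` 2^n times, and each call is itself an optimization, which is why the slide calls this too expensive.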
Exploring the Search Space: A Greedy Heuristic
Input: DAG for query Q
Output: Set of nodes to materialize, corresponding best plan
1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
     Pick z ∈ Y which maximizes Benefit(z | Q, X)
     If (Benefit(z | Q, X) > 0)
         Y = Y - {z}; X = X ∪ {z}
     Else Y = {}
3. Return X
Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X ∪ {z})
Appeared in [Gupta, ICDT97]. Our contribution: improve efficiency
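The greedy loop can be sketched directly from the pseudocode. As before, `best_cost` stands in for BestCost(Q, X). The cost table is hypothetical: it models the three queries from the Problems slide under assumed unit costs (10 per scan, write and read; 100 per join), where B JOIN C and C JOIN D are mutually exclusive choices for the middle query.

```python
def greedy(candidates, best_cost):
    """Repeatedly materialize the node with the largest positive benefit."""
    chosen = frozenset()
    remaining = set(candidates)
    current = best_cost(chosen)
    while remaining:
        # Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X ∪ {z})
        benefit = {z: current - best_cost(chosen | {z}) for z in sorted(remaining)}
        z = max(benefit, key=benefit.get)
        if benefit[z] <= 0:
            break
        remaining.remove(z)
        chosen = chosen | {z}
        current -= benefit[z]
    return chosen, current


# Hypothetical costs for queries ABC, BCD and CDE (not from the slides).
TOY_COST = {
    frozenset(): 690,
    frozenset({"BC"}): 600,
    frozenset({"CD"}): 600,
    frozenset({"BC", "CD"}): 620,
}

chosen, cost = greedy({"BC", "CD"}, TOY_COST.__getitem__)
print(sorted(chosen), cost)  # ['BC'] 600
```

Once B JOIN C is chosen, materializing C JOIN D as well has a negative benefit in this toy table, so the loop stops after one pick.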
Improving Efficiency: Summary
Input: DAG for query Q
Output: Set of nodes to materialize, corresponding best plan
1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
     Pick z ∈ Y which maximizes Benefit(z | Q, X)
     If (Benefit(z | Q, X) > 0)
         Y = Y - {z}; X = X ∪ {z}
     Else Y = {}
3. Return X
Restrict the set of materialization candidates
Compute Benefit efficiently
Heuristically avoid computing Benefit for some nodes
Improving Efficiency: Only CSEs Materialized
CSEs identified in a bottom-up traversal
[Figure: query DAG with equivalence nodes A, B, C, D, AB, BC, CD, ABC and BCD; the common subexpression BC is highlighted.]
Efficient Benefit Computation: Incremental Re-optimization
X : Set of CSEs already materialized
z : unmaterialized CSE
Best plan given X materialized → Best plan given X ∪ {z} materialized
Observation: Best plans change only for the ancestors of z
Incremental Re-optimization: Example
[Figure: query DAG over A, B, C and D with nodes AB, BC, CD, ABC and BCD, annotated with best-plan costs. With X = {} the cost labels are 10, 100 and 230; after materializing z = (B JOIN C) the updated labels are 10, 120 and 130.]
Incremental Re-optimization: Efficient Propagation
Ancestor nodes visited bottom-up in a topological order
– Guarantees no revisits
Propagation path pruned if the current node’s best cost remains unchanged
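The propagation step can be sketched with a priority queue ordered by topological rank. Everything named below (`DAG`, `propagate`, the unit costs) is an assumption for illustration. The sketch tracks only the cost of using each node; the one-time cost of computing and writing the materialized node is left out.

```python
import heapq

# Each equivalence node maps to its alternative operators: (operator cost, inputs).
# Costs are assumptions for illustration: 10 per scan, 100 per join.
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "AB": [(100, ["A", "B"])],
    "BC": [(100, ["B", "C"])],
    "CD": [(100, ["C", "D"])],
    "ABC": [(100, ["AB", "C"]), (100, ["A", "BC"])],
    "BCD": [(100, ["BC", "D"]), (100, ["B", "CD"])],
}
TOPO = list(DAG)  # the dictionary is already listed bottom-up
RANK = {n: i for i, n in enumerate(TOPO)}
PARENTS = {n: set() for n in DAG}
for node, ops in DAG.items():
    for _, inputs in ops:
        for child in inputs:
            PARENTS[child].add(node)
REUSE = 10  # assumed cost of reading a materialized result


def node_cost(node, cost, materialized):
    def use(child):
        return REUSE if child in materialized else cost[child]
    return min(op + sum(use(c) for c in inputs) for op, inputs in DAG[node])


def propagate(cost, materialized, z):
    """Update `cost` in place after materializing z; return the nodes visited."""
    materialized = materialized | {z}
    heap = [(RANK[p], p) for p in PARENTS[z]]
    heapq.heapify(heap)
    queued, visited = set(PARENTS[z]), []
    while heap:
        _, node = heapq.heappop(heap)
        visited.append(node)
        new_cost = node_cost(node, cost, materialized)
        if new_cost == cost[node]:
            continue  # unchanged: prune this propagation path
        cost[node] = new_cost
        for parent in PARENTS[node] - queued:
            queued.add(parent)
            heapq.heappush(heap, (RANK[parent], parent))
    return visited


cost = {}
for n in TOPO:  # initial optimization with X = {}
    cost[n] = node_cost(n, cost, frozenset())
visited = propagate(cost, frozenset(), "BC")
print(visited, cost["ABC"], cost["BCD"])  # ['ABC', 'BCD'] 120 120
```

Only the two ancestors of BC are recomputed; AB, CD and the base relations are never touched, and each queued node is popped exactly once because the heap releases nodes in topological order.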
Avoiding Benefit Computation
Monotonicity Assumption
– Benefit of a node does not increase due to materialization of other nodes
• Often true
An earlier benefit of a node is an upper bound on its current benefit
Do not recompute a node’s benefit if another node’s current benefit is greater
Optimization costs decrease by 90%
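The monotonicity idea can be sketched as a lazy variant of the greedy loop: stale benefits are kept in a max-heap as upper bounds, and a node's benefit is recomputed only when its bound reaches the top. The function name, the toy cost model and the evaluation counter are assumptions for illustration, and the result matches plain greedy only when the monotonicity assumption holds.

```python
import heapq


def lazy_greedy(candidates, best_cost):
    chosen = frozenset()
    current = best_cost(chosen)
    rnd = 0  # incremented every time a node is added to `chosen`
    # Heap entries: (-benefit bound, node, round in which the bound was computed).
    heap = [(-(current - best_cost(frozenset({z}))), z, rnd)
            for z in sorted(candidates)]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap:
        neg_bound, z, stamp = heapq.heappop(heap)
        if -neg_bound <= 0:
            break  # even the largest upper bound is not positive
        if stamp != rnd:
            # Stale bound: recompute only this node's benefit and re-queue it.
            benefit = current - best_cost(chosen | {z})
            evaluations += 1
            heapq.heappush(heap, (-benefit, z, rnd))
            continue
        # Fresh benefit on top of the heap: no other node can beat it.
        chosen = chosen | {z}
        current += neg_bound
        rnd += 1
    return chosen, current, evaluations


def toy_best_cost(x):
    """Hypothetical cost model: P and Q overlap, R never pays off."""
    saving = 0
    if "P" in x:
        saving += 300
    if "Q" in x:
        saving += 200
    if "P" in x and "Q" in x:
        saving -= 100
    if "R" in x:
        saving -= 5
    return 1000 - saving


chosen, cost, evaluations = lazy_greedy({"P", "Q", "R"}, toy_best_cost)
print(sorted(chosen), cost, evaluations)  # ['P', 'Q'] 600 4
```

On this toy input a plain greedy loop would evaluate 3 + 2 + 1 = 6 benefits, while the lazy version evaluates 4, because R's old bound is never large enough to justify recomputing it.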
Experimental Results
TPCD-0.1 on Microsoft SQL Server 6.5 – using SQL rewriting for MQO
[Figure: bar chart of execution time (secs), axis from 0 to 1000, for queries Q2, Q2-D, Q11 and Q15; series: No-MQO and MQO (Greedy).]
Alternatives to Greedy: Volcano-SH
A lightweight post-pass heuristic
1. Compute the best plan for each query independently, using Volcano
2. Find the set of nodes in the best plans to materialize (cost-based)
Similar previous work [Subramanian and Venkataraman, SIGMOD 1998]
Alternatives to Greedy: Volcano-RU
A lightweight extension of Volcano
1. Batched queries optimized in sequence Q1, Q2, …, Qn
2. Find the best plan for query Qi given the best plans for queries Qj, j < i
3. Cost-based materialization of nodes in best plans of Qj, j < i
Plan quality sensitive to the query sequence
Experimental Results
TPCD-0.1 query batches
[Figure: bar chart of estimated execution time (secs), axis from 0 to 800, for batches BQ1 to BQ5; series: Volcano, Volcano-SH, Volcano-RU and Greedy.]
Experimental Results
TPCD-0.1 query batches
[Figure: bar chart of optimization time (secs), logarithmic scale from 0.01 to 10, for batches BQ1 to BQ5; series: Volcano, Volcano-SH, Volcano-RU and Greedy.]
Features
Easily implemented
– First MQO implementation integrated with a state-of-the-art optimizer (as far as we know)
– Also partially prototyped on Microsoft SQL-Server
Support for index selection
– Index modeled as physical property (like “interesting order”)
Extensible and flexible
– New operators, data models
– Readily adapts to other problems
• Query result caching
• Materialized view selection/maintenance
Query Result Caching
P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy and S. Sudarshan, Don’t Trash Your Intermediate Results, Cache ‘em, Submitted for publication
Problem Statement
Minimize the total execution time of an online workload by
– Caching intermediate/final results of individual queries, and
– Using these cached results to answer later queries
Contributions
Intermediate as well as final results cached
– Optimizer-driven cache management
– Adapts to workload changes
Cache-aware cost-based optimization
– Novel framework for cached result matching
Experimental Results
Overheads negligible
Performance on 900 query TPCD-1 based uniform cube-point workload
Materialized View Selection and Maintenance
Hoshi Mistry, Prasan Roy, K. Ramamritham and S. Sudarshan, Materialized View Selection and Maintenance Using Multi-Query Optimization, Submitted for publication
Problem Statement
Speed up maintenance of a set of materialized views by
– Exploiting CSEs between different view maintenance expressions
– Selecting additional views to be materialized
Contributions
Optimization of maintenance expressions
– Support for transiently materialized “delta” views
Nicely integrates transient vs permanent view materialization choices
Experimental Results
Overheads negligible
Performance benefit for maintenance of two TPCD-0.1 based SPJA views
Conclusion
MQO is practical
– Low overheads, high benefits
– Easily implemented and integrated
Leads to novel solutions to related problems
– Query result caching
– Materialized view selection and maintenance
Future Work
Further extensions of MQO
– Shared execution pipelines
Query result caching in presence of updates
Other problems
– Continuous queries, XML view caching, etc.
Other Contributions
Garbage Collection in Object Oriented Databases
– Developed a “transaction-aware” cyclic reference counting algorithm
– Provided a formal proof of correctness
S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz and S. Sudarshan, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, VLDB 1997
Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan and S. Ashwin, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, Invited Paper, VLDB Journal, August 1998