university of pennsylvania, database research group 1 chase & backchase: a versatile tool for...

48
University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa , Alin Deutsch, Arnaud Sahuguet, Val Tannen University of Pennsylvania

Post on 21-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

1

Chase & Backchase:A Versatile Tool for Query

Optimization

Lucian Popa, Alin Deutsch, Arnaud Sahuguet, Val Tannen

University of Pennsylvania

Page 2: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

2

Logical vs. Physical

• A separation is necessary for physical data independence:

Logical schem

a

Physical

schema

Logical optimization

Physical optimization

Network

Unfortunately, limited cooperation !

Page 3: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

3

Optimization Techniques

1 Some rewriting (unnesting)

2 Acces path (index) use (ad-hoc)

3 Join ordering (dynamic programming)

4 Semantic optimization (more rewriting)

5 Use of materialized views (rewriting, too)

6 Object-oriented techniques (everything above, but OO! )

Tradition

al

More

“exotic”

Logical

Physical

Physical

Logical

Both ?!

Mixed

• Limited cooperation, so far, because of :– different foundations

– different affiliation: logical or physical

Page 4: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

4

Bridges

• What is needed: interaction between all these

techniques. – can produce better plans, by enabling each other:

• This work connects together some of these techniques,

by finding a common foundation

Semantic optimizati

on

Use of materialized

views

Use of indexes

Page 5: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

5

In a (Coco)Nutshell

• The Unifier: Constraints (Dependencies):– logical constraints:

• semantic relationships among elements of the logical schema– physical constraints:

• semantic relationships between elements of the physical schema and elements of the logical schema

• Chase with constraints produces the universal plan UP:– UP incorporates all relevant access paths– subqueries of UP provide the search space (candidate plans)

• Backchase: – search among candidate plans for scan-minimal queries, checking for

equivalence using (reverse) chase

– search space can be pruned using cost-based optimization

Page 6: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

6

Talk Outline

• Motivation and Overview

• Logical and Physical Constraints, Chase and Backchase (VLDB’99)

• Theoretical Results

(ICDT’99, VLDB’99, recent improvements)

• Using Cost Information

• Experimental Results (SIGMOD’00, recent improvements)

• Conclusion

Page 7: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

7

Proj : Set < Struct { string PName; string PDept; string CustName; string Budget;} >

class Dept (extent depts){ attribute string DName; attribute Set<string> DProjs;}

Logical Schema (ODMG syntax)

Dept : Dict < Doid, Struct { string DName; Set <string> DProjs; }>

fresh base type for oids

• To formally describe classes and their operations:

use dictionaries (finite functions).

Page 8: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

8

Translating OO Queries into Dictionary Form

select struct (PN: s, DN: Dept [d].DName)

from dom Dept d , Dept [d].DProjs s

“domain” as extent “lookup” for oid dereferencing

select struct (PN: s, DN: d.DName)from depts d , d.DProjs s OO

• Dictionary operations: dom M, M[k]

• Constraints are translated in a similar way

Dict

Page 9: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

9

Logical Schema Constraints

(RIC1) (d dom Dept) (s Dept [d].DProjs)

(p Proj) s = p.PName

(INV2) (p Proj) (d dom Dept)

p.PDept = Dept [d].DName (s Dept [d].DProjs) p.Pname = s

+ two others …

• Describe semantic relationships among elements of the logical schema

• Example: an inverse relationship between Proj and

Dept:

Page 10: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

10Physical Schema

Primary Index on PName I : Dict <string, T>

Secondary Index on CustName SI : Dict <string, Set < T >>

(where T is the type of tuples in Proj)

• Two indexes for relation Proj :

• A materialized view (a la “join index”) :

JI : Set< Struct{Doid JDoid; string JPN} >

JI = select struct (JDoid: d, JPN: p.PName)

from dom Dept d, Dept [d].DProjs s, Proj p

where s = p.PName

Page 11: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

11

Physical Schema Constraints

Describe semantic relationship between elements of the logical schema and elements of the physical schema.

(JI1) (d dom Dept) (s Dept [d].DProjs) (p Proj)

[ s = p. PName ( j JI) j .JDoid = d and j.JPN = p.Pname ]

(SI1) (p Proj) (k dom SI)

(t SI[k]) [ k = p.CustName and p

= t ]

One of the constraints for SI :

One of the constraints for JI :

Page 12: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

12

An Example of Interaction in Optimization

select struct (PN: s, PB: p.Budg, DN: Dept [d].DName)from dom Dept d, Dept [d].DProjs s , Proj pwhere s = p.PName and p.CustName = “CitiBank”

A user (logical) query

• The query is chased with all the applicable logical and physical constraints (eg., RIC1, SI1, JI1)

• The result of the chase is the universal plan.

Page 13: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

13

The Generalized Chase

select O(r1,…,rm)from R1 r1,…,Rm rm

where B1(r1,…,rm)

(r1R1) … (rmRm) [ B1(r1,…,rm) (s1S1) … (snSn) B2(r1,…,rm,s1,

…,sn) ]

Let d be the constraint:

select O(r1,…,rm) from R1 r1,…,Rm rm,

S1 s1,…,Sn sn

where B1(r1,…,rm) and B2(r1,…,rm,s1,…,sn)

d

distinct !

distinct !

Page 14: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

14

A Universal Plan

select struct (PN: s, DN: Dept [d].DName)

from dom Dept d , Dept [d].DProjs s ,

Proj p, JI j, dom SI k, SI [k] t, dom I i

where s = p.PName and p.CustName = “CitiBank”

and Dept[d].DName = p.PDept and j.DOID = d and j.PN =

p.PName

and p.CustName = k and p = t and i = p.PName and p =

I[i]

U:

• U gathers those elements from both logical and physical schema, that are relevant for alternative implementations

• U is redundant

Added by chase

Page 15: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

15

(Top-Down) Backchase Minimization

• Eliminates the redundancies from the universal plan

• Enumerates subqueries of UP:

eliminates scans top-down, as long as equivalence is

preserved;

equivalence is verified by a (reverse) chase with

applicable constraints (eg., INV2)

• Outputs several scan-minimal subqueries

Page 16: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

16Chase & (Top-Down) Backchase

Q0Original Query

U

Chase

dn

d1

Universal Plan

...

d1

dn

More minimization possible ...

. . .

...

. . .Minimal subqueries of U

QnQ1

BackChased’m

d’1

...

Cost-based optimization still needed

Page 17: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

17

Some of the Generated Candidate Plans

select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)

from SI[“Citibank”] p

select struct (PN: j.JPN, PB: I[j.JPN].Budg, DN: Dept[j.JDoid].DName)from JI j

where I[j.JPN].CustName = “CitiBank”

Plan 3: Using the Join Index and the Primary Index

Plan 2: Secondary Index Lookup

select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)from Proj pwhere p.CustName = “CitiBank”

Plan 1: Relation Scan

Different access paths, but equivalent

Page 18: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

18

Talk Outline

• Motivation and Overview

• Constraints, Chase and Backchase

• Theoretical Results

• Using Cost

• Experimental Results

• Conclusion

Page 19: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

19

Path-Conjunctive Language

Paths: P ::= x | c | R | P.A | dom P | P[x]

Path-Conjunctions: B ::= P1 = P1’ and ... and Pk = Pk’

Path-Conjunctive (PC) Queries: select struct (A1: P1’, ..., An: Pn’) from P1 x1, ..., Pm xm where B

Embedded PC Dependencies (EPCDs): (r1P1) … (rmPm) [ B1(r1,…,rm) (s1P’1) … (snP’n) B2(r1,…,rm,s1 ,…, sn) ]

No set/dictionary type here

Still very expressive!

Page 20: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

20

Overview of PC Query Containment Results

Chase Complexity

PC query containment/ EPCD t riviality

NP-complete

PC query containmentunder f ull EPCDs

Decisionprocedure EXP-complete

PC query containmentunder EPCDs

Complete proofprocedure

r .e.

• The chase is also complete for EPCD implication

• Strengthen results of Chandra & Merlin, Beeri & Vardi (70’s & 80’s)

Page 21: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

21

Completeness of C&B

Assume the following:

Logical constraints: D a set of EPCDs

Physical schema:

• primary indexes

• materialized views that are PC queries (includes join indexes and access suport relations)

• access structures representable as dictionary expressions with PC query domain and entry (includes secondary indexes and gmaps [Tsatalos et al] )

Physical constraints: C (eg., SI1, JI1)

C is a set of EPCDs. (In general the constraints in C are not full.)

Page 22: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

22

Completeness of C&B (continued)

Theorem (Completeness)

Let Q be a PC query such that some chasing sequence of Q with D terminates (with chase D

(Q) ). Then:

(a) chase C (chase D (Q) ) terminates

(b) any scan-minimal candidate plan equivalent to Q under D is a subquery of chase C (chase D (Q) )

Strengthens results of Levy at al, new search space

Page 23: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

23

Talk Outline

• Motivation and Overview

• Constraints, Chase and Backchase

• Theoretical Results

• Interaction with Cost-Based Optimization

• Experimental Results

• Conclusion

Page 24: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

24

Using Cost

Want to– reduce the size of the backchase search space– output a scan-minimal subquery that is also cost-

minimal

Cost monotonicity assumption (made implicitly earlier): subqueries are cheaper

Hence the top-down backchase cannot exploit cost!

Instead, a bottom-up backchase enumerates queriesbuilt from the scans of the universal plan, checking

for equivalence with the universal plan and cost minimality

and pruning all superqueries

Page 25: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

25Bottom-Up Backchase with Cost-Based Pruning

Q0

Original Query

U

Chase

dn

d1

Universal Plan

...

S1, S2, S3, S4

Scans of U:

S1

S2

S4

S3

subqueries of size 1

subqueries of size 2

S3, S4

...S1, S2

S1, S3

S1, S4

Equivalent to U ?

If YES then minimal rewriting.

Becomes best plan so far.

Prune all superqueries

Cost higher than the min cost so

far ?

If YES then prune it together with all

superqueries

S1, S2, S3

S1, S3, S4

S1, S2, S4

...

Page 26: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

26

Interaction with a Cost-Based Component

Cost-basedoptimization

Logical schema

=

relations + OO classes

Physical schema

= views + indexes

Rewriting with logical and

physical constraints (C&B)

Cost-based pruning

Query

Best physical plan

Candidate plan q

Physical plan and cost for q

Cost information

Page 27: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

27

Cost-Based Optimization of a Candidate Plan(the usual suspects)

• Scan order (“join reordering”)

generalized dynamic programming technique– beyond relational– global”, i.e., interacting with backchase pruning

• Placement of selections and projections– because the chase works with queries in “normalized”

form

• Join methods: nested loops, index join , others

Page 28: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

28

Talk Outline

• Motivation and Overview

• Chase and Backchase (as in proposal): Example of candidate plans enumeration

• Chase and Backchase (as in proposal): Completeness of path-conjunctive (PC) case

• Cost-based C&B

• Experimental results

• Future Research and Conclusion

Page 29: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

29

Is the C&B Technique Practical ?

• Is the chase feasible ? – tries exponentially many mappings for each step

• Is the backchase feasible ? – explores exponentially many subqueries of UP– what is the effect of cost-based pruning ?

• Is it worthwhile ? – does the quality of the generated plans outweigh

the cost of optimization ?

Page 30: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

30 Experiments

• We have developed a prototype in Java

• Several experimental configurations that: • cover important practical cases• are scalable

• Not shown here:

– Exper. Config. 1 : relational queries and indexes.

– Exper. Config. 3 : OO techniques

interaction between semantic optimization and path

indexes

Page 31: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

31Experimental Configuration 2 : Relational Views

• Input query: chain of stars

• Schemas: views and key constraints

– In addition: indexes on the corner relations

• Scale-up parameters:– # stars, size of stars, # views/star, #indexes/star

R1 R2

S11

S12

S13

S21

S22

S23

V11

V12

V21

V22

key constraints

Interaction between semantic optimization and views • No rewriting with views in the absence of key constraints

Page 32: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

32

Time to Chase

Time to chase [EC2]

0

0.5

1

1.5

2

10 15 20 25

Size of the query

Tim

e in

sec

on

ds

9 view s + 3 key = 21 constraints

6 view s + 3 key = 15 constraints

• Chasing alone is fast !

• For queries with more than 15 joins and more than 15 constraints it takes seconds

Page 33: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

33

Stratification

• For UP of size 12-15 joins, FB (top-down or bottom-up) may become impractical

• OQF (On-line Query Fragmentation): UP can be decomposed, prior to backchase, into smaller univ. plans.

Querychase

UP

backchase, no cost

Plans (minimal

subqueries)

R1 R2

S11

S12

S13

S21

S22

S23

V11

V12

V21

V22

Full backchase (FB):

Complexity: 2k1 + … + 2km << 2k1+…+km

Page 34: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

34 Backchase Strategies

• We compare several C&B optimizers obtained by

combining in various ways: – top-down/bottom-up full backchase

– cost-based pruning

– stratification

– TopDownFB (top-down full, no cost pruning)

– BottomUpFB (bottom-up full, no cost pruning)

– BottomUpFB+Prune (bottom-up full, cost pruning)

– OQF (OQF fragmentation, BottomUpFB within each fragment)

– OQF+Prune (OQF, BottomUpFB+Prune within each fragment)

– DP (dynamic programming, i.e. no C&B) • also used for cost-evaluation in all the other strategies

Page 35: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

35One Star Query (No Stratification Applicable)

One star query (s ize = 6)

0

10

20

30

40

50

60

70

80

90

100

110

1 2 3 4

Num ber of View s

To

tal O

pti

miz

atio

n T

ime

[s]

TopDow n FB

BottomUp FB

BottomUp FB + Prune

Dynamic-Programming

One star query (s ize = 7)

0

10

20

30

40

50

60

70

80

90

100

110

120

130

140

1 2 3 4 5

Num ber of View s

To

tal O

pti

miz

atio

n T

ime

[s]

TopDow n FB

BottomUp FB

BottomUp FB + Prune

Dynamic-Programming

• BottomUpFB+Prune outperforms its full counterparts that do not use cost pruning:– total optimization time, for query size 6, and 4 views:

• BottomUpFB+Prune : 6.4s • TopDownFB : 108.1 s• BottomUpFB : 82.2s

Page 36: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

36

Adding IndexesQuery size = 6, #Views = 2

0

10

20

30

40

50

60

70

80

90

100

[6,2,0] [6,2,1] [6,2,2] [6,2,3] [6,2,4] [6,2,5]

(query size, #views, #idx)

TopDow n FB

BottomUpFB

BottomUp FB +Prune

Query size = 6, #Views = 4

0

10

20

30

40

50

60

70

80

90

100

[6,4,0] [6,4,1] [6,4,2] [6,4,3] [6,4,4]

(query size, #views, #idx)

BottomUp FB + Prune

• Total optimization time (query size=6, views=2, indexes=3):• BottomUpFB+Prune : 10.7s • TopDownFB : 68.3 s• BottomUpFB : 63.6s

• BottomUpFB+Prune can go to larger configurations: – (query size=6, views=4, indexes=4): 53.2s

Page 37: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

37

Chain of Two Stars (OQF Stratification is Applicable)

1

10

100

1000

[6,2] [8,2] [8,4] [10,2] [10,4]

[query size, number of views]

To

tal

Op

tim

iza

tio

n T

ime

[s

]

Bottom Up + Prune

OQF

pruneOQF

Dynam ic-Program m ing

TopDownFB

Bottom UpFB

• BottomUpFB+Prune is similar to OQF: 72s for query size 10 and 4 views

• OQF+Prune scales best: only 8.6s (faster than DP !) – with dictionaries (indexes), it may miss good plans

• DP becomes expensive at large queries: 14.6s for query size 10– DP’s performance directly affects BottomUpFB+Prune

Page 38: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

38

Is C&B Worthwhile ?

• Configuration: chain of two stars

• Execution time measured with DB2 on a medium database (15,000 tuples) – unoptimized query vs. C&B optimized query (produced with BottomUpFB+Prune)

Querysize

Exec Time Views OptimizedExec Time

OptimizationTime

Total ProcesingTime (C&B)

6 20min 40s 2 2min 16s 1.8s 2min 18s

2 5min 44s 7.3s 5min 51s 8 24min 30s

4 19s 20.2s 39.2s

2 7min 52s 30s 8min 22s 10 28min 10s

4 1min 34s 1min 12s 2min 46s

Page 39: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

39

Summary

• C&B integrates, flexibly, many aspects of logical and physical optimization.

• Good theoretical foundations based on constraints and chase

• The technique is practical (feasible + worthwhile)– cost-based pruning is good !– stratification is good !– even better when we can combine them

Page 40: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

40

Future Work

• Better stratification techniques:– complete for arbitrary constraints – complete, when combined with cost-based

pruning

• Other query languages:– union / disjunction – grouping / aggregates– bag and list semantics

Page 41: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

41

Page 42: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

42

Tableaux Chase

A B C D

RR

w x y z’w’ x y’ z

w x y z

d

A B C D

RRR

w x y z’w’ x y’ zw x y’’ z’’

w x y z

d

A B C D

RR

a b c da’ b c’ d’

a b c’’ d’’

(r1R)(r2R) [ r1.B = r2.B

(r3R) r3.A = r1.A and r3.B =

r1.B ]

Think conjunctive query syntax ...

• Used to check equivalence of tableaux (conjunctive queries) under dependencies.

• A chase step:

Page 43: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

43

Interaction of Indexes with Views

• (Levy et al ’95) For conjunctive queries and views:

– finitely many minimal equivalent rewritings

• However, the optimal plan may not be among them!

– Scenario:

• relations R(A,B), S(B, C) and view V = A (R S)

• input query Q = R S

• P = V R S is equivalent, but not minimal thrown away

• but, if V is small and R has an index on A, then P can be better than Q !

No interaction captured: indexes not in the language

Page 44: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

44C&B Captures Interaction of Indexes with

Views• The index IR on A is explicit in our framework.

• Chase Q with constraints describing V and IR to obtain U:

U = select struct (A=r.A, B=r.B, C=s.C) from R r, S s, V v, dom IR k

where r.B = s.B and v.A = r.A and k = r.A and IR

[k] = r

select struct (A=r.A, B=r.B, C=s.C)from R r, S s, V vwhere r.B = s.B and v.A = r.A

Elim k

select struct (A=r.A, B=r.B, C=s.C)from R r, S swhere r.B = s.B

Elim v

P1

select struct (A=v.A, B=s.B,

C=s.C)from V v, S s

where IR [v.A].B = s.BP2

Interaction! More minimal rewritings, incorporating indexes

Page 45: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

45

Path-Conjunctive Language (cont’d)

The following query plan:

select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)

from SI[“Citibank”] p

is not PC. However, the following is PC:

select struct (PN: p.PName, PB: p.Budg, DN: p.PDept)

from dom SI k, SI[k] pwhere p = “CitiBank”

In general, PC cannot express navigation OO queries, index-based joins or index-based selections.

We will rediscover them when we translate PC queries into physical plans.

Page 46: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

46

PC Containment

Theorem (Containment). The PC containment problem, Q1

Q2 (under all instances), is in NP (and equivalent to the

existence of a PC-containment mapping).

Theorem (Containment under EPCDs). Let:

• D a set of EPCDs

• Q1, Q2 PC queries s.t. some chasing sequence of Q1 with D terminates (with chase D (Q1) ).

The following are equivalent : (a) Q1 D Q2 (b) chase D (Q1) Q2

Page 47: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

47

OQF (On-line Query Fragmentation)

INPUT: Query Q, Constraints C

F1

OQF

Fm

C&B C&B C&B

...

All plans

OUTPUT: Q = …

C1 C2 Cm … C =

partial plans P1 partial plans Pm

...

F2

Page 48: University of Pennsylvania, Database Research Group 1 Chase & Backchase: A Versatile Tool for Query Optimization Lucian Popa, Alin Deutsch, Arnaud Sahuguet,

University of Pennsylvania, Database Research Group

48

OQF + Prune

INPUT: Query Q, Constraints C

F1

OQF

Fm

BottomUpFB+ Prune

...

Plan P

OUTPUT: Q = …

C1 C2 Cm … C =

partial plan P1 partial plan Pm

...

F2

BottomUpFB+ Prune

BottomUpFB+ Prune