dbms 2001notes 6: query compilation1 principles of database management systems 6: query compilation...

48
DBMS 2001 Notes 6: Query Compilatio n 1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

Upload: curtis-reynold-lynch

Post on 31-Dec-2015

231 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 1

Principles of Database Management Systems

6: Query Compilation and Optimization

Pekka Kilpeläinen(partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff

Ullman and Jennifer Widom)

Page 2: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 2

Overview• We have studied recently:

– Algorithms for selections and joins, and their costs (in terms of disk I/O)

• Next: A closer look at query compilation– Parsing– Algebraic optimization of logical query plans– Estimating sizes of intermediate results

• Focus still on selections and joins

• Remember the overall process of query execution:

Page 3: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 3

{P1,P2,…..}

{P1,C1>...}

parse

convert

apply laws

estimate result sizes

consider physical plans estimate costs

pick best

execute

Pi

answer

SQL query

parse tree

logical query plan

“improved” l.q.p

l.q.p. +sizesstatistics

Page 4: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 4

Step 1: Parsing• Check syntactic correctness of the query,

and transform it into a parse tree– Based on the formal grammar for SQL– Inner nodes nonterminal symbols

(syntactic categories for things like <Query>, <RelName>, or <Condition>)

– Leaves terminal symbols: names of relations or attributes, keywords (SELECT, FROM, …), operators (+, AND, OR, LIKE, …), operands (10, '%1960', …)

– Also check semantic correctness: Relations and attributes exist, operands compatible with operators, …

Page 5: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 5

Example: SQL queryConsider querying a movie database with relations

StarsIn(title, year, starName)MovieStar(name, address, gender, birthdate)

SELECT titleFROM StarsIn, MovieStarWHERE starName = name AND birthdate LIKE

‘%1960’ ;

(Find the titles for movies with stars born in 1960)

Page 6: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 6

Example: Parse Tree<Query>

<SFW>

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> , <FromList> AND

title StarsIn <RelName>

<Condition> <Condition>

<Attribute> = <Attribute> <Attribute> LIKE <Pattern>

starName name birthdate ‘%1960’

MovieStar

Page 7: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 7

Step 2: Parse Tree Logical Query

Plan• Basic strategy:

SELECT A, B, CFROM R1, R2WHERE Cond ;

becomes

A,B,C[Cond(R1 x R2)]

Page 8: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 8

Example: Logical Query Plan

title

starName=name and birthdate LIKE ‘%1960’

StarsIn MovieStar

Page 9: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 9

Step 3: Improving the L.Q.P

• Transform the logical query plan into an equivalent form expected to lead to better performance– Based on laws of relational algebra– Normal to produce a single optimized

form, which acts as input for the generation of physical query plans

Page 10: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 10

Example: Improved Logical Query Plan

title

starName=name

StarsIn

birthdate LIKE ‘%1960’

MovieStar

Page 11: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 11

Step 4: Estimate Result Sizes

• Cost estimates of database algorithms depend on sizes of input relations Need to estimate sizes of intermediate results– Estimates based on statistics about

relations, gathered periodically or incrementally

Page 12: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 12

Example: Estimate Result Sizes

Need expected size

StarsIn

MovieStar

Page 13: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 13

Steps 5, 6, ...

• Generate and compare query plans– generate alternate query plans P1, …, Pk

by selecting algorithms and execution orders for relational operations

– Estimate the cost of each plan

– Choose the plan Pi estimated to be "best"

• Execute plan Pi and return its result

Page 14: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 14

Example: One Physical Plan

Parameters: join order,

memory size, project attributes,...

Hash join

SEQ scan index scan Parameters:Select Condition,...

StarsIn MovieStar

Page 15: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 15

Next: Closer look at ...

• Transformation rules

• Estimating result sizes

Page 16: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 16

Relational algebra optimization

• Transformation rules(preserve equivalence)

• What are good transformations?

Page 17: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 17

Rules: Natural joins

R S = S R (commutative)

(R S) T = R (S T) (associative) • Carry attribute names in results, so

order is not important Can evaluate in any order

Page 18: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 18

(R x S) x T = R x (S x T)R x S = S x R

R U (S U T) = (R U S) U TR U S = S U R

Rules: Cross products & union similarly (both associative & commutative):

Page 19: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 19

Rules: Selects

p1p2(R) =

p1vp2(R) =

p1 [ p2 (R)]

[ p1 (R)] U [ p2 (R)]

Especially useful (applied left-to-right): Allows compound select conditions to be split and moved to suitable positions (See next)

NB: = AND; v = OR

Page 20: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 20

Rules: Products to JoinsDefinition of Natural Join:

R S = L[C(R x S)] ; Condition C equates attributes common to R

and S, and L projects one copy of them out

Applied right-to-left- definition of general join applied

similarly

Page 21: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 21

Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with attribs common to R,S

p (R S) =

q (R S) =

Rules: combined

[p (R)] S

R [q (S)]

More rules can be derived...

Page 22: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 22

pq (R S) = [p (R)] [q (S)]

pqm (R S) =

m [(p R) (q S)]pvq (R S) =

[(p R) S] U [R (q S)]

Derived rules for

Page 23: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 23

Derivation for first one; Others for homework:

pq (R S) =

p [q (R S) ] =

p [(R q (S) ] =

[p (R)] [q (S)]

Page 24: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 24

Rules for combined with X

similar...

e.g., p (R X S) = ?

Page 25: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 25

p1p2 (R) p1 [p2 (R)]

p (R S) [p (R)] S

R S S R

Some “good” transformations:

• No transformation is always good• Usually good: early selections

Page 26: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 26

Outline - Query Processing

• Relational algebra level– transformations– good transformations

• Detailed query plan level– estimate costs– generate and compare plans

next

Page 27: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 27

• Estimating cost of query plan

(1) Estimating size of intermediate results

(2) Estimating # of I/Os

(considered last week)

Page 28: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 28

Estimating result size

• Maintain statistics for relation R– T(R) : # tuples in R– L(R) : Length of rows,

# of bytes in each R tuple– B(R): # of blocks to hold all R tuples– V(R, A) :

# distinct values in R for attribute A

Page 29: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 29

Example R A: 20 byte string

B: 4 byte integerC: 8 byte dateD: 5 byte string

A B C D

cat 1 10 a

cat 1 20 b

dog 1 30 a

dog 1 40 c

bat 1 50 d

T(R) = 5 L(R) = 37V(R,A) = 3 V(R,C) = 5V(R,B) = 1 V(R,D) = 4

Page 30: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 30

Size estimates for product W = R1xR2

T(W) =

L(W) =

T(R1) T(R2)

L(R1) + L(R2)

Page 31: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 31

L(W) = L(R)

T(W) = ?

Estimates for selection W =

A=a(R)

= AVG number of tuples that satisfy an equality condition on R.A

= …

Page 32: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 32

Example R V(R,A)=3

V(R,B)=1V(R,C)=5V(R,D)=4

AVG size of, say, A=val(R)? T(A=cat(R))=2, T(A=dog(R))=2, T(A=bat(R))=1

=> (2+2+1)/3 = 5/3

A B C D

cat 1 10 a

cat 1 20 b

dog 1 30 a

dog 1 40 c

bat 1 50 d

Page 33: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 33

Size of W = Z=a(R) in general:

Assume: Only existing Z values a1, a2, … are used in a select expression Z= ai, each with equal probalility 1/V(R,Z)

=> the expected size of Z=val(R) is

E(T(W)) = i 1/V(R,Z) * T(Z=ai (R))

= T(R)/V(R,Z)

Page 34: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 34

What about selection W=z val (R) ?

T(W) = ?

• Estimate 1: T(W) = T(R)/2– Rationale: On avg, 1/2 of tuples satisfy the

condition

• Estimate 2: T(W) = T(R)/3– Rationale: Acknowledges the tendency of selecting

"interesting" (e.g., rare tuples) more frequently

<, and < similary

Page 35: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 35

Size estimates for W = R1 R2

Let X = attributes of R1 Y = attributes of R2

X Y =

Same as R1 x R2

Case 1

Page 36: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 36

W = R1 R2 X Y = AR1 A B C R2 A D

Case 2

Assumption:

V(R1,A) V(R2,A) A(R1) A(R2)

V(R2,A) V(R1,A) A(R2) A(R1)

“Containment of value sets” [Sec. 7.4.4]

Page 37: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 37

Why should “containment of value sets” hold?

Consider joining relationsFaculties(FName, …) andDepts(DName, FName, …)

where Depts.FName is a foreign key, and faculties can have 0,…,n departments.

Now V(Depts, FName) V(Faculties, FName), and referential integrity requires that

FName(Depts) FName(Faculties)

Page 38: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 38

R1 A B C R2 A D

a

Estimating T(W) when V(R1,A) V(R2,A)

Take 1 tuple Match

Each estimated to match with T(R2)/V(R2,A) tuples ...

so T(W) = T(R1)T(R2) /V(R2, A)

Page 39: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 39

• V(R1,A) V(R2,A): T(W) = T(R1)T(R2)/V(R2,A)

• V(R2,A) V(R1,A): T(W) = T(R2)T(R1)/V(R1,A)

[A is the join attribute]=> in general: T(W) = T(R1)T(R2)/max{V(R1,A), V(R2,A)

}

Page 40: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 40

With similar ideas, can estimate sizes of:• W = AB (R) ….. [Sec. 7.4.2]

• W =A=aB=b (R) = A=a (B=b (R));Ass. A and B independent =>

T(W) = T(R)/(V(R, A) x V(R, B))

• W=R(A,X,Y) S(X,Y,B); [Sec. 7.4.5]

Ass. X and Y independent => … T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x

max{V(R,Y), V(S,Y)})

• Union, intersection, diff, …. [Sec. 7.4.7]

Page 41: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 41

Note: for complex expressions, need

intermediate estimates.

E.g. W = [A=a (R1) ] R2

Treat as relation U

T(U) = T(R1)/V(R1,A) L(U) = L(R1)

Also need an estimate for V(U, Ai) !

Page 42: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 42

To estimate V (U, Ai)

E.g., U = A=a (R) Say R has attributes A,B,C,D

V(U, A) = ?V(U, B) = ?V(U, C) = ?V(U, D) = ?

Page 43: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 43

Example R V(R,A)=3

V(R,B)=1V(R,C)=T(R)=5V(R,D)=3

U = A=x (R)

A B C D

cat 1 10 10

cat 1 20 20

dog 1 30 10

dog 1 40 30

bat 1 50 10

V(U, D) = V(R,D)/T(R,A) ... V(R,D)

V(U,A) =1 V(U,B) =1 V(U,C) = T(R) V(R,A)

Page 44: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 44

V(U,Ai) for Joins U = R1(A,B)

R2(A,C)

V(U,A) = min { V(R1, A), V(R2, A) }V(U,B) = V(R1, B)V(U,C) = V(R2, C)

[Assumption called “preservation of value sets”, sect. 7.4.4]

Values of non-join attributes preserved

Page 45: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 45

Example:

Z = R1(A,B) R2(B,C) R3(C,D)

• R1: T(R1) = 1000 V(R1,A)=50 V(R1,B)=100

• R2: T(R2) = 2000 V(R2,B)=200 V(R2,C)=300

• R3: T(R3) = 3000 V(R3,C)=90 V(R3,D)=500

Page 46: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 46

T(U) = T(R1) T(R2)/max{V(R1,B), V(R2,B)}= 10002000/200 = 10,000;

V(U,A) = V(R1,A) = 50V(U,C) = V(R2,C) = 300V(U,B) = min{V(R1,B), V(R2,B)}= 100

Intermediate Result: U(A,B,C)= R1(A,B)

R2(B,C)

Page 47: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 47

Z = U(A,B,C) R3(C,D)

T(Z) = T(U)T(R3)/ max{V(U,C), V(R3,C)}

= 10,0003000/300= 100,000

V(Z,A) = V(U,A) = 50V(Z,B) = V(U,B) = 100V(Z,C) = min{V(U,C), V(R3,C)} = 90V(Z,D) = V(R3,D) =500

Page 48: DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based

DBMS 2001 Notes 6: Query Compilation 48

Outline/Summary

• Estimating cost of query plan– Estimating size of results done!– Estimating # of IOs last week

• Generate and compare plans– skip this

• Execute physical operations of query plan– Sketched last week