
Query Evaluation

Partially using Prof. Hector Garcia-Molina’s slides (Notes06, Notes07)
http://www-db.stanford.edu/~ullman/dscb.html

Donghui Zhang
Northeastern University

Query Evaluation

SQL Query → Server → Query Result

SQL query:
SELECT E.Name
FROM Emp E
WHERE E.SSN<5000 AND E.Age>50

Server:
• Check the data and meta data;
• Produce query result

Query result:
Michael Jordan
Donghui Zhang

Query Evaluation Steps

• Query Compiling: get logical Q.P.
• Query Optimization: choose a physical Q.P.
• Query Execution: execute

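[Figure: the query evaluation pipeline]
• query compiling: SQL query → (parse) → parse tree → (convert) → logical query plan → (apply laws) → “improved” l.q.p
• query optimization: “improved” l.q.p → (estimate result sizes, using statistics) → l.q.p. + sizes → (consider physical plans) → {P1,P2,…} → (estimate costs) → {(P1,C1),(P2,C2),…} → (pick best)
• query execution: chosen plan → (execute) → answer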

Query Compiling Parse

• Background knowledge: Grammar.
• Input: SQL query.
• Output: a parse tree.

• Start with a simple grammar:
– Only SFW (no group by, having, nested query)
– Simple AND condition (no OR, UNION, EXISTS, IN, …)
– One table (no conditions like E.did=D.did)

SELECT E.Name
FROM Emp E
WHERE E.SSN<5000 AND E.Age>50

Query Compiling Parse Grammar

• <SFW> := SELECT <SelList> FROM <Table> WHERE <CondList>
• <SelList> := <Attribute> | <Attribute>, <SelList>
• <CondList> := <Condition> | <Condition> AND <CondList>
• <Condition> := <Attribute> <op> <value>
• <op> := > | < | = | >= | <=

Query Compiling Parse Parse Tree

<SFW>
  SELECT
  <SelList>
    <Attribute>: E.Name
  FROM
  <Table>: Emp E
  WHERE
  <CondList>
    <Condition>: <Attribute> E.SSN, <op> <, <value> 5000
    AND
    <CondList>
      <Condition>: <Attribute> E.Age, <op> >, <value> 50
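The grammar above is small enough to parse by recursive descent, with one function per non-terminal. The following is a minimal sketch, not the parser of any real DBMS: the tokenizer regex, the names `tokenize` and `parse_sfw`, the nested-tuple tree shape, and the assumption that <Table> is a table name followed by an alias (as in "Emp E") are all choices made here for illustration.

```python
import re

# Assumed token shapes: two-char operators, one-char operators, commas,
# and runs of letters/digits/underscores/dots (keywords, names, values).
TOKEN = re.compile(r"\s*(>=|<=|[<>=,]|[\w.]+)")
OPS = {">", "<", "=", ">=", "<="}

def tokenize(sql):
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_sfw(sql):
    """Parse <SFW> and return the parse tree as nested tuples."""
    toks, pos = tokenize(sql), 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected and tok.upper() != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def sel_list():  # <SelList> := <Attribute> | <Attribute>, <SelList>
        attr = ("Attribute", take())
        if peek() == ",":
            take(",")
            return ("SelList", attr, sel_list())
        return ("SelList", attr)

    def cond_list():  # <CondList> := <Condition> | <Condition> AND <CondList>
        attr, op, value = take(), take(), take()
        if op not in OPS:
            raise SyntaxError(f"unknown operator {op!r}")
        cond = ("Condition", ("Attribute", attr), ("op", op), ("value", value))
        if peek() is not None and peek().upper() == "AND":
            take("AND")
            return ("CondList", cond, cond_list())
        return ("CondList", cond)

    take("SELECT")
    sel = sel_list()
    take("FROM")
    table = ("Table", f"{take()} {take()}")  # assumed: table name plus alias
    take("WHERE")
    conds = cond_list()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return ("SFW", sel, table, conds)

tree = parse_sfw("SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50")
```

Each nested `CondList` tuple mirrors one application of the recursive grammar rule, so the returned tree has the same shape as the parse tree on the slide.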

Query Compiling Convert

• Input: a parse tree.
• Output: a logical query plan.

• Algorithm: σ followed by π.
π_{E.Name}(σ_{E.SSN<5000 AND E.Age>50}(E))

• Alternatively, a l.q.p tree:

π_{E.Name}
   |
σ_{E.SSN<5000 AND E.Age>50}
   |
   E

Query Compiling Apply Laws

• Replace σ over × with ⋈, push σ [and π] down.

• Only used for multiple tables. So skip.


Query Optimization Estimate Result Sizes

• The size of each input table is stored as meta data.

• Intermediate result: size not known, but needed to estimate the I/O cost of a physical plan.

• But for the simple case, π can be evaluated on the fly. So no need to estimate the size of the σ result. So skip.

Query Optimization Consider Physical Plans

• Associate each RA operator with an implementation scheme.

• Multiple implementation schemes? Enumerate all.

Plan 1 (always works!)
• Scan E; evaluate σ and π_{E.Name} on-the-fly.

• For the other physical plans, need to know what indices exist.

• Primary index: controls the actual storage of a table.
– Suppose a primary B+-tree index exists on SSN.

• Secondary index: built on some other attribute. Does not store the actual record. Each leaf entry stores a set of page IDs in the primary index.
– Suppose a secondary B+-tree index exists on Age.

e.g. entry in Age index: Age=50, pageIDs={1, 4, 6}
[Figure: the Age index entry points to leaf pages 1, 4 and 6 of the SSN index, whose leaf pages are numbered 1 to 6]

Plan 2
• Range search in SSN index for E.SSN<5000; evaluate E.Age>50 and π_{E.Name} on-the-fly.

Plan 3
• Range search in Age index for E.Age>50, follow pointers to SSN index; evaluate E.SSN<5000 and π_{E.Name} on-the-fly.

Query Optimization Estimate Costs

• Estimate #I/Os for each physical plan.
• Pick the cheapest one.

• Input: physical plan.
• Additional input:
– meta data (e.g. how many levels a B+-tree has)
– assumptions (e.g. the root node of every B+-tree is pinned)
– memory buffer size.

Query Optimization Estimate Costs Meta Data

• All the database tables.
• For each table R:
– Schema
– T(R): #records in R
– For every attribute A:
  • V(R, A): #distinct values of A
  • min(R, A): minimum value of A
  • max(R, A): maximum value of A
– Primary index: #levels, #leaf nodes.
– Secondary index: #levels, #leaf nodes, average #pageIDs per leaf entry.

Query Optimization Estimate Costs sample input

• Assume for table E:
– Schema = (SSN: int, Name: string, Age: int, Salary: int)
– T(E) = 100 tuples.
– For attribute SSN:
  • V(E, SSN)=100, min(E, SSN)=0000, max(E, SSN)=9999
– For attribute Age:
  • V(E, Age)=20, min(E, Age)=21, max(E, Age)=60
– Primary index on SSN: 3 level B+-tree, 50 leaf nodes.
– Secondary index on Age: 2 level B+-tree, 10 leaf nodes, every leaf entry points to 3.5 pageIDs (on average).

• Assumptions: all B+-tree roots are pinned. Can reach the first leaf page of a B+-tree directly.

• Memory buffer size: 2 pages.

Plan 1 (always works!): scan, on-the-fly

• Cost = 50. (The primary index has 50 leaf nodes. Assume we can reach the first leaf page of a B+-tree directly.)

Plan 2: range search in SSN index, on-the-fly

• Cost = 25. SSN<5000 selects half of the employees, so 50/2=25 leaf nodes.

• Note: if the condition is E.SSN>5000, it needs 1 more I/O.

Plan 3: range search in Age index, follow pointers to SSN index, on-the-fly

• Cost = 10/4 + 20/4 * 3.5 = 21, rounding each term up to whole I/Os (3 + 18).
– #I/Os in the Age index: 10/4. The Age index has 10 leaf nodes. Check 1/4 of them, since [51,60] is 1/4 of [21,60].
– #I/Os in the SSN index: 20/4 * 3.5. 20 distinct ages divided by 4 to get #ages in [51,60], times 3.5 (#pageIDs per leaf entry).

Query Optimization Pick Best

physical plan                         I/O cost
Plan 1: scan                          50
Plan 2: range search SSN index        25
Plan 3: range search Age index        21
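The three cost estimates can be reproduced from the sample meta data with a few lines of arithmetic. This is a sketch under stated assumptions: the function name `plan_costs` and its parameter names are invented here, and rounding each term up to a whole number of I/Os is an assumption chosen because it reproduces the figure of 21 for Plan 3.

```python
import math

def plan_costs(ssn_leaves=50, ssn_fraction=0.5,
               age_leaves=10, age_fraction=0.25,
               distinct_ages=20, page_ids_per_entry=3.5):
    """Estimated #I/Os for the three single-table plans (roots pinned)."""
    # Plan 1: read every leaf page of the primary (SSN) index.
    scan = ssn_leaves
    # Plan 2: read only the leaf pages that cover SSN < 5000.
    ssn_range = math.ceil(ssn_leaves * ssn_fraction)
    # Plan 3: read the qualifying part of the Age index leaves...
    age_index = math.ceil(age_leaves * age_fraction)
    # ...then fetch the SSN-index pages each qualifying age entry points to.
    follow = math.ceil(distinct_ages * age_fraction * page_ids_per_entry)
    return {"Plan 1": scan, "Plan 2": ssn_range, "Plan 3": age_index + follow}

costs = plan_costs()
best = min(costs, key=costs.get)  # plan with the fewest estimated I/Os
```

With the sample input, `costs` holds 50, 25 and 21 for the three plans, so `best` is "Plan 3".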


Another case study: two tables.

• Extended grammar:
– Only SFW (no group by, having, nested query)
– Simple AND condition (no OR, UNION, EXISTS, IN, …)
– Allow two tables (allow conditions like E.did=D.did)

• Example query:
SELECT E.Name, D.Dname
FROM Emp E, Dept D
WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000

Query Compiling Parse Grammar

• <SFW> := SELECT <SelList> FROM <TableList> WHERE <CondList>
• <SelList> := <Attribute> | <Attribute>, <SelList>
• <TableList> := <Table> | <Table>, <Table>
• <CondList> := <Condition> | <Condition> AND <CondList>
• <Condition> := <Attribute> <op> <value> | <Attribute> = <Attribute>
• <op> := > | < | = | >= | <=

Query Compiling Parse Parse Tree

<SFW>
  SELECT
  <SelList>
    <Attribute>: E.Name
    ,
    <SelList>
      <Attribute>: D.Dname
  FROM
  <TableList>
    <Table>: Emp E
    ,
    <Table>: Dept D
  WHERE
  <CondList>
    <Condition>: <Attribute> E.Did = <Attribute> D.Did
    AND
    <CondList>
      <Condition>: E.SSN < 5000
      AND
      <CondList>
        <Condition>: D.budget = 1000

Query Compiling Convert

• Algorithm: × then σ then π.

π_{E.Name, D.Dname}(σ_{E.Did=D.Did AND E.SSN<5000 AND D.budget=1000}(E × D))

• The l.q.p tree:

π_{E.Name, D.Dname}
   |
σ_{E.Did=D.Did AND E.SSN<5000 AND D.budget=1000}
   |
   ×
  / \
Emp E   Dept D

• Always, always: (try to) replace σ over × with ⋈!

Before: π_{E.Name, D.Dname}(σ_{E.Did=D.Did AND E.SSN<5000 AND D.budget=1000}(Emp E × Dept D))
After: π_{E.Name, D.Dname}(σ_{E.SSN<5000 AND D.budget=1000}(Emp E ⋈ Dept D))

• Also, push σ down.

Step 1: π_{E.Name, D.Dname}(σ_{D.budget=1000}(σ_{E.SSN<5000}(Emp E) ⋈ Dept D))
Step 2: π_{E.Name, D.Dname}(σ_{E.SSN<5000}(Emp E) ⋈ σ_{D.budget=1000}(Dept D))

Query Compiling Apply Laws Theory Behind

• Let
p = predicate with only E attributes
q = predicate with only D attributes
m = E & D’s common attributes are equal
• We have:

σ_{p ∧ q ∧ m}(E × D) = σ_p(E) ⋈ σ_q(D)
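The law can be checked on a toy instance by evaluating both sides and comparing the results. The rows below are made up for illustration and are not from the slides; the helper names (`select`, `product`, `join_on_did`) are likewise hypothetical. A check on one instance is a sanity test, not a proof.

```python
def select(pred, rows):
    """σ: keep the rows that satisfy pred."""
    return [r for r in rows if pred(r)]

def product(left, right):
    """×: every pairing of a left row with a right row."""
    return [(l, r) for l in left for r in right]

def join_on_did(left, right):
    """⋈ on the common attribute Did."""
    return [(l, r) for l in left for r in right if l["Did"] == r["Did"]]

# Made-up sample rows.
E = [{"Name": "Ann", "SSN": 1200, "Did": 1},
     {"Name": "Bob", "SSN": 7300, "Did": 1},
     {"Name": "Cid", "SSN": 4100, "Did": 2}]
D = [{"Dname": "Toys", "Did": 1, "budget": 1000},
     {"Dname": "Shoes", "Did": 2, "budget": 500}]

p = lambda e: e["SSN"] < 5000          # predicate with only E attributes
q = lambda d: d["budget"] == 1000      # predicate with only D attributes
m = lambda e, d: e["Did"] == d["Did"]  # common attributes are equal

# Left side: select over the full cross product.
lhs = [(e, d) for (e, d) in product(E, D) if p(e) and q(d) and m(e, d)]
# Right side: push the selections down, then join.
rhs = join_on_did(select(p, E), select(q, D))

names = [(e["Name"], d["Dname"]) for (e, d) in rhs]
```

Both sides produce the same pairs, but the right side builds only 2 × 1 candidate pairs instead of the 3 × 2 pairs of the cross product.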


• Because join is so important, let’s skip result size estimation for now, and let’s assume selections are not pushed down.

π_{E.Name, D.Dname}(σ_{E.SSN<5000 AND D.budget=1000}(Emp E ⋈ Dept D))

Four Join Algorithms

• Iteration join (nested loop join)
• Merge join
• Hash join
• Join with index

Example: E ⋈ D over common attribute Did

• E:
– T(E)=10,000
– primary index on SSN, 3 levels.
– |E| = 1,000 leaf nodes.

• D:
– T(D)=5,000
– primary index on Did, 3 levels.
– |D| = 500 leaf nodes.

• Memory available = 101 blocks

Iteration Join

1. for every block in E
2.   scan through D;
3.   join records in the E block with records in the D block.

• I/O cost = |E| + |E| * |D| = 1000 + 1000*500 = 501,000.

• Works well for a small buffer (e.g. two blocks).

• Can we do better? Use our memory:
(1) Read 100 blocks of E
(2) Read all of D (using 1 block) + join
(3) Repeat until done

• I/O cost = |E| + |E|/100 * |D| = 1000 + 10*500 = 6,000.

• Can we do better? Reverse the join order: D ⋈ E, i.e. for every 100 D blocks, go through E.

• I/O cost = |D| + |D|/100 * |E| = 500 + 5*1000 = 5,500.
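All three iteration-join costs above follow one formula: read the outer table once, and scan the inner table once per chunk of outer blocks held in memory. A small sketch, assuming one buffer block is reserved for the inner table (the function name `iteration_join_cost` is invented here):

```python
import math

def iteration_join_cost(outer_blocks, inner_blocks, memory_blocks):
    """#I/Os of a block nested-loop join.

    Assumes memory_blocks - 1 blocks hold a chunk of the outer table
    and the remaining block streams the inner table.
    """
    chunk = memory_blocks - 1
    inner_scans = math.ceil(outer_blocks / chunk)
    return outer_blocks + inner_scans * inner_blocks

small_buffer = iteration_join_cost(1000, 500, memory_blocks=2)    # E outer
use_memory = iteration_join_cost(1000, 500, memory_blocks=101)    # E outer
reversed_order = iteration_join_cost(500, 1000, memory_blocks=101)  # D outer
```

Putting the smaller table on the outside reduces the number of inner scans, which is why the reversed order is cheapest.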

• Merge join (conceptually)
(1) if R1 and R2 not sorted, sort them
(2) i ← 1; j ← 1;
    While (i ≤ T(R1)) ∧ (j ≤ T(R2)) do
      if R1{ i }.C = R2{ j }.C then Output-Tuples
      else if R1{ i }.C > R2{ j }.C then j ← j+1
      else if R1{ i }.C < R2{ j }.C then i ← i+1

Procedure Output-Tuples
    While (i ≤ T(R1)) ∧ (R1{ i }.C = R2{ j }.C) do
      [ jj ← j;
        while (jj ≤ T(R2)) ∧ (R1{ i }.C = R2{ jj }.C) do
          [ output pair R1{ i }, R2{ jj };
            jj ← jj+1 ]
        i ← i+1 ]

Example

i    R1{i}.C    R2{j}.C    j
1    10         5          1
2    20         20         2
3    20         20         3
4    30         30         4
5    40         30         5
                50         6
                52         7
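The merge join above can be written as a short in-memory Python function. This is a sketch of the conceptual algorithm only: it uses 0-based lists instead of disk blocks, and unlike the pseudocode it skips j past the whole run of equal values after emitting the pairs. The name `merge_join` and the `key` parameter are choices made here.

```python
def merge_join(r1, r2, key=lambda t: t):
    """Join two lists on key(t) by sorting and merging."""
    r1, r2 = sorted(r1, key=key), sorted(r2, key=key)
    out, i, j = [], 0, 0
    while i < len(r1) and j < len(r2):
        a, b = key(r1[i]), key(r2[j])
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the end of the run of r2 tuples with this join value.
            j_end = j
            while j_end < len(r2) and key(r2[j_end]) == a:
                j_end += 1
            # Pair every r1 tuple in the matching run with that r2 run.
            while i < len(r1) and key(r1[i]) == a:
                for jj in range(j, j_end):
                    out.append((r1[i], r2[jj]))
                i += 1
            j = j_end
    return out

# The C values from the example table.
pairs = merge_join([10, 20, 20, 30, 40], [5, 20, 20, 30, 30, 50, 52])
```

On the example, the two 20s in R1 each match the two 20s in R2, and the single 30 in R1 matches both 30s in R2, for six output pairs.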

Merge Join Cost

• Recall that |E|=1000, |D|=500. And D is already sorted on Did.

• External sort E: pass 0, by reading and writing E, produces a file with 10 sorted runs. One more read is enough to merge them.

• No need to write the sorted result! Can pipeline it to the join operator.

• Cost = 3*1000 + 500 = 3,500.

• Hash join (conceptual)
– Hash function h, range 0 → k
– Buckets for R1: G0, G1, ... Gk
– Buckets for R2: H0, H1, ... Hk

Algorithm
(1) Hash R1 tuples into G buckets
(2) Hash R2 tuples into H buckets
(3) For i = 0 to k do
      match tuples in Gi, Hi buckets

Simple example hash: even/odd

R1: 2, 4, 3, 5, 8, 9
R2: 5, 4, 12, 3, 13, 8, 11, 14

Buckets    R1          R2
Even:      2 4 8       4 12 8 14
Odd:       3 5 9       5 3 13 11
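The three steps of the hash join can be sketched in memory as follows. The hash function h(x) = x mod k on integer join values is an assumption matching the even/odd example (k = 2); the names `partition` and `hash_join` are invented here.

```python
from collections import defaultdict

def partition(rows, k, key=lambda t: t):
    """Steps (1)/(2): hash every tuple into one of k buckets."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row) % k].append(row)
    return buckets

def hash_join(r1, r2, k=2, key=lambda t: t):
    g, h = partition(r1, k, key), partition(r2, k, key)
    out = []
    # Step (3): only tuples in buckets with the same number can match.
    for i in range(k):
        for a in g[i]:
            for b in h[i]:
                if key(a) == key(b):
                    out.append((a, b))
    return out

R1 = [2, 4, 3, 5, 8, 9]
R2 = [5, 4, 12, 3, 13, 8, 11, 14]
even_odd_r1 = partition(R1, 2)
even_odd_r2 = partition(R2, 2)
matches = hash_join(R1, R2)
```

The buckets reproduce the even/odd table from the example, and matching within each bucket pair finds the common values 4, 8, 3 and 5.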

Hash Join Cost

• Read + write both E and D for partitioning, then read to join.

• Cost = 3 * (1000 + 500) = 4,500.

• Join with index (Conceptually)

For each r ∈ E do
    Find the corresponding D tuple by probing index.

• Assuming the root is pinned in memory,
Cost = |E| + T(E)*2 = 1000 + 10,000*2 = 21,000.
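The merge, hash and index join costs quoted so far can be recomputed side by side. This is a sketch of the slides' formulas only; the function names are invented here, `merge_join_cost` assumes one input needs a two-pass sort while the other is already sorted, and `index_join_cost` assumes the index root is pinned so each probe costs (levels − 1) I/Os.

```python
def merge_join_cost(unsorted_blocks, sorted_blocks):
    # Read + write sorted runs, read again to merge (pipelined into the
    # join), plus one read of the already sorted input.
    return 3 * unsorted_blocks + sorted_blocks

def hash_join_cost(r1_blocks, r2_blocks):
    # Read + write both inputs for partitioning, then read to join.
    return 3 * (r1_blocks + r2_blocks)

def index_join_cost(outer_blocks, outer_tuples, index_levels):
    # Scan the outer table; probe the inner index once per outer tuple.
    return outer_blocks + outer_tuples * (index_levels - 1)

costs = {
    "merge join": merge_join_cost(1000, 500),
    "hash join": hash_join_cost(1000, 500),
    "index join": index_join_cost(1000, 10_000, index_levels=3),
}
```

For this example, with 101 memory blocks, the merge join is the cheapest of the three because D is already sorted on Did.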

• The costs are different if we integrate the selection conditions!

• E.g. for the index join, only check half of E (E.SSN<5000). So the cost should be 500+5,000*2=10,500.

• A selection condition which is not used during the join should be evaluated to filter the join result. E.g. the index join checked D without evaluating the selection condition on D.

physical plan with selections being pushed down

• Finally, let’s consider pushing down selections.
• Now that the join operator takes intermediate results (which could be written to disk), we need to estimate their sizes…

π_{E.Name, D.Dname}(σ_{E.SSN<5000}(Emp E) ⋈ σ_{D.budget=1000}(Dept D))


Estimating result size

• Keep statistics for relation R
– T(R) : # tuples in R
– S(R) : # of bytes in each R tuple
– V(R, A) : # distinct values in R for attribute A
– min(R, A)
– max(R, A)

Example R
A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string

A      B    C     D
cat    1    10    a
cat    1    20    b
dog    1    30    a
dog    1    40    c
bat    1    50    d

T(R) = 5      S(R) = 37
V(R,A) = 3    V(R,C) = 5
V(R,B) = 1    V(R,D) = 4
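The statistics for the example relation can be recomputed directly from its rows. A minimal sketch: the function name `relation_stats` and the dictionary layout are assumptions made here, and a real system would keep these numbers in its catalog instead of rescanning the table.

```python
# Attribute widths in bytes and rows, copied from the example relation R.
WIDTHS = {"A": 20, "B": 4, "C": 8, "D": 5}
ROWS = [("cat", 1, 10, "a"),
        ("cat", 1, 20, "b"),
        ("dog", 1, 30, "a"),
        ("dog", 1, 40, "c"),
        ("bat", 1, 50, "d")]

def relation_stats(rows, widths):
    attrs = list(widths)
    return {
        "T": len(rows),            # number of tuples
        "S": sum(widths.values()),  # bytes per tuple
        # number of distinct values per attribute
        "V": {a: len({row[i] for row in rows}) for i, a in enumerate(attrs)},
    }

stats = relation_stats(ROWS, WIDTHS)
```

The result agrees with the values listed above: T(R) = 5, S(R) = 37, and V = 3, 1, 5, 4 for A, B, C, D.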

Size estimates for W = R1 × R2

T(W) = T(R1) × T(R2)

S(W) = S(R1) + S(R2)

Size estimate for W = σ_{A=a}(R)

S(W) = S(R)

T(W) = ?

Example R

A      B    C     D
cat    1    10    a
cat    1    20    b
dog    1    30    a
dog    1    40    c
bat    1    50    d

V(R,A)=3    V(R,B)=1    V(R,C)=5    V(R,D)=4

W = σ_{Z=val}(R)      T(W) = T(R) / V(R,Z)

Assumption:

Values in select expression Z = val are uniformly distributed over possible V(R,Z) values.

What about W = σ_{Z ≥ val}(R) ?

T(W) = ?

• T(W) = T(R)/2?

• Solution: Estimate values in range

Example R: Z has Min=1, Max=20, V(R,Z)=10

W = σ_{Z ≥ 16}(R)

f = 5/20 (fraction of range)

T(W) = f × T(R)
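Both selection estimates reduce to one-line formulas. A hedged sketch: the function names are invented here, the range estimate assumes an integer domain [min, max] with the bound included (which gives 5/20 for Z ≥ 16 on [1, 20]), and T(R) = 100 in the usage line is a made-up figure since the example does not state T(R).

```python
def estimate_equality(t_r, v_r_z):
    """T(W) for W = σ_{Z=val}(R), assuming uniformly distributed values."""
    return t_r / v_r_z

def estimate_range_ge(t_r, z_min, z_max, val):
    """T(W) for W = σ_{Z>=val}(R) via the fraction of the value range."""
    # Number of integer values in [val, z_max] over those in [z_min, z_max],
    # clamped to [0, 1] for bounds outside the range.
    f = (z_max - val + 1) / (z_max - z_min + 1)
    f = max(0.0, min(1.0, f))
    return f * t_r

equality = estimate_equality(t_r=5, v_r_z=3)  # σ_{A=a} on the 5-row example
in_range = estimate_range_ge(t_r=100, z_min=1, z_max=20, val=16)
```

Note that the range estimate ignores V(R,Z): it uses only the position of the bound within [Min, Max].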

Size estimate for W = R1 ⋈ R2

Let X = attributes of R1, Y = attributes of R2

Case 1: X ∩ Y = ∅

Same as R1 × R2

Case 2: W = R1 ⋈ R2, X ∩ Y = A

R1(A, B, C)    R2(A, D)

Assumption:

V(R1,A) ≤ V(R2,A) ⇒ Every A value in R1 is in R2

V(R2,A) ≤ V(R1,A) ⇒ Every A value in R2 is in R1

Computing T(W) when V(R1,A) ≤ V(R2,A)

Take 1 tuple of R1 and match it against R2:

1 tuple matches with T(R2) / V(R2,A) tuples...

so T(W) = T(R2) × T(R1) / V(R2,A)

• V(R1,A) ≤ V(R2,A): T(W) = T(R2) × T(R1) / V(R2,A)

• V(R2,A) ≤ V(R1,A): T(W) = T(R2) × T(R1) / V(R1,A)

[A is common attribute]

In general, for W = R1 ⋈ R2:

T(W) = T(R2) × T(R1) / max{ V(R1,A), V(R2,A) }

S(W) = S(R1) + S(R2) − S(A), where S(A) is the size of attribute A

Note: for complex expressions, need intermediate T, S, V results.

E.g. W = [σ_{A=a}(R1)] ⋈ R2

Treat σ_{A=a}(R1) as relation U

T(U) = T(R1)/V(R1,A)    S(U) = S(R1)

Also need V(U, *) !!

To estimate Vs

E.g., U = σ_{A=a}(R1). Say R1 has attribs A, B, C, D

V(U, A) = ?    V(U, B) = ?    V(U, C) = ?    V(U, D) = ?

Example R1

A      B    C     D
cat    1    10    10
cat    1    20    20
dog    1    30    10
dog    1    40    30
bat    1    50    10

V(R1,A)=3    V(R1,B)=1    V(R1,C)=5    V(R1,D)=3

U = σ_{A=a}(R1)

V(U,A) = 1    V(U,B) = 1    V(U,C) = T(R1) / V(R1,A)

V(U,D) ... somewhere in between

For an arbitrary attribute D other than A (the attribute being selected), V(R1,D) ranges from 1 to T(R1), and V(U,D) ranges from 1 to T(R1)/V(R1,A).

Let’s make V(U,D) / T(U) = V(R1,D) / T(R1), where T(U) = T(R1)/V(R1,A).

Or, V(U,D) = V(R1,D)/V(R1,A)

For Joins U = R1(A,B) ⋈ R2(A,C)

V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)

Example:

Z = R1(A,B) ⋈ R2(B,C) ⋈ R3(C,D)

T(R1) = 1000    V(R1,A)=50     V(R1,B)=100
T(R2) = 2000    V(R2,B)=200    V(R2,C)=300
T(R3) = 3000    V(R3,C)=90     V(R3,D)=500

Partial Result: U = R1 ⋈ R2

T(U) = 1000×2000 / 200    V(U,A) = 50    V(U,B) = 100    V(U,C) = 300

Z = U ⋈ R3

T(Z) = 1000×2000×3000 / (200×300)    V(Z,A) = 50    V(Z,B) = 100    V(Z,C) = 90    V(Z,D) = 500
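The three-way example can be recomputed by applying the join estimates twice, carrying T and V forward for the intermediate result. A sketch with an invented helper `join_stats` and an assumed dictionary layout of {"T": ..., "V": {...}} per relation:

```python
def join_stats(left, right, attr):
    """Estimated T and V for left ⋈ right on the common attribute attr."""
    v_left, v_right = left["V"][attr], right["V"][attr]
    t = left["T"] * right["T"] / max(v_left, v_right)
    v = {**left["V"], **right["V"]}   # non-join attributes keep their V
    v[attr] = min(v_left, v_right)    # join attribute takes the smaller V
    return {"T": t, "V": v}

R1 = {"T": 1000, "V": {"A": 50, "B": 100}}
R2 = {"T": 2000, "V": {"B": 200, "C": 300}}
R3 = {"T": 3000, "V": {"C": 90, "D": 500}}

U = join_stats(R1, R2, "B")  # partial result U = R1 ⋈ R2
Z = join_stats(U, R3, "C")   # Z = U ⋈ R3
```

This gives T(U) = 10,000 and T(Z) = 100,000, with the V values matching those listed in the example.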

Example

• E:
– T(E)=10,000
– primary index on SSN, 3 levels.
– |E| = 1,000 leaf nodes.
– V(E,SSN)=10,000: from 0000 to 9999.

• D:
– T(D)=5,000
– primary index on Did, 3 levels.
– |D| = 500 leaf nodes.
– V(D,budget)=20: from 100 to 10,000.

• Memory available = 11 blocks
• ?? What’s the best physical plan?

π_{E.Name, D.Dname}(σ_{E.SSN<5000}(Emp E) ⋈ σ_{D.budget=1000}(Dept D))

Note: with E’ = σ_{E.SSN<5000}(E) and D’ = σ_{D.budget=1000}(D), |E’| = 500 and |D’| = 25.

p.q.p #1

• σ_{E.SSN<5000}(Emp E): range search
• σ_{D.budget=1000}(Dept D): scan
• ⋈: iteration join; D’ is the outer table

Cost = 500 (read D) + 25 (write D’) + 25 (read D’) + ceiling(25/10)*500 (range search E’ once per 10 blocks of D’)
= 2050

p.q.p #2

• σ_{E.SSN<5000}(Emp E): range search
• σ_{D.budget=1000}(Dept D): scan
• ⋈: sort merge

Cost = 5*500 (sort E’; no write) + 500 (read D)
= 3000

p.q.p #3

• σ_{E.SSN<5000}(Emp E): range search
• σ_{D.budget=1000}(Dept D): scan
• ⋈: hash join

Cost = 3*500 (for E’) + 500 (read D) + 25 (write D’) + 3*25 (for D’)
= 2100

Note: M should be bigger than sqrt(min{|E’|, |D’|})+1.
- Why?
- What if not?

p.q.p #4

• σ_{E.SSN<5000}(Emp E): range search
• σ_{D.budget=1000}: not pushed down; filter the join result
• ⋈: index nested loop join, probing the primary index of D

Cost = 500 (scan E’) + 5000*(3-1) (for D)
= 10,500
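The four plan costs can be recomputed term by term and compared. This sketch only adds up the terms listed for each p.q.p; the constant names are invented here. Summing the terms listed for p.q.p #3 (3*500 + 500 + 25 + 3*25) gives 2100.

```python
import math

E_SEL_BLOCKS = 500    # |E'|: blocks of E with SSN < 5000
E_SEL_TUPLES = 5000   # tuples of E with SSN < 5000
D_BLOCKS = 500        # |D|
D_SEL_BLOCKS = 25     # |D'|: blocks of D with budget = 1000
MEMORY = 11           # buffer blocks
INDEX_LEVELS = 3      # levels of the primary index on D.Did (root pinned)

plans = {
    # read D, write D', read D', one range search of E' per 10 blocks of D'
    "p.q.p #1 iteration join": D_BLOCKS + D_SEL_BLOCKS + D_SEL_BLOCKS
        + math.ceil(D_SEL_BLOCKS / (MEMORY - 1)) * E_SEL_BLOCKS,
    # sort E' in 5 passes over its 500 blocks (final write pipelined), read D
    "p.q.p #2 sort merge": 5 * E_SEL_BLOCKS + D_BLOCKS,
    # partition and join E', read D, write D', partition and join D'
    "p.q.p #3 hash join": 3 * E_SEL_BLOCKS + D_BLOCKS + D_SEL_BLOCKS
        + 3 * D_SEL_BLOCKS,
    # scan E', probe the D index once per qualifying E tuple
    "p.q.p #4 index join": E_SEL_BLOCKS + E_SEL_TUPLES * (INDEX_LEVELS - 1),
}

best = min(plans, key=plans.get)
```

Under these estimates the iteration join with D’ as the outer table is the cheapest plan, since D’ needs only three chunks of 10 blocks.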

Some notes

• For BNL (block nested loop, i.e. iteration join), merge, and hash joins: always push selection!

• For index join, do not push selection on the inner table (the one whose primary key is involved in the join condition).

• For BNL, make the smaller table be the outer table – join could be free if it fits in memory!
