Generating Efficient Plans for Queries Using Views
Chen Li, Stanford University
with Foto Afrati (National Technical University of Athens)
and Jeff Ullman (Stanford University)
SIGMOD, Santa Barbara, CA, May 23, 2001
Answering queries using views
How to answer a query using only the results of views? [LMSS95]
Many applications:
– Data warehouses
– Data integration
– Query optimization
– …
[Figure: a query Q is answered using views V1, V2, …, Vn, which are defined over base relations R1, …, Rm.]
An example
View: V1(M, D, C) :- car(M, D), loc(D, C)
Query Q: Q(M, C) :- car(M, anderson), loc(anderson, C)
Rewriting P1: Q(M, C) :- V1(M, anderson, C)
car(Make, Dealer): (BMW, Alison), (Honda, Anderson), …, (Ford, Varsity)
loc(Dealer, City): (Anderson, Palo Alto), (Varsity, Redwood City), (Alison, Mountain View), …
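A minimal Python sketch of what P1 computes, using only the sample rows listed in the car and loc tables (the rows elided with "…" are left out, and the constant anderson is written "Anderson" to match the tables). Set comprehensions stand in for joins; this illustrates the semantics, not how a real system would evaluate the plan.

```python
# Sample rows from the car and loc tables; elided rows are omitted.
car = {("BMW", "Alison"), ("Honda", "Anderson"), ("Ford", "Varsity")}
loc = {("Anderson", "Palo Alto"), ("Varsity", "Redwood City"),
       ("Alison", "Mountain View")}

# V1(M, D, C) :- car(M, D), loc(D, C), fully materialized.
v1 = {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}

# Q(M, C) :- car(M, anderson), loc(anderson, C), evaluated on the base relations.
q_base = {(m, c) for (m, d) in car if d == "Anderson"
          for (d2, c) in loc if d2 == "Anderson"}

# Rewriting P1: Q(M, C) :- V1(M, anderson, C), evaluated on the view alone.
p1 = {(m, c) for (m, d, c) in v1 if d == "Anderson"}

print(sorted(p1))  # [('Honda', 'Palo Alto')]
assert p1 == q_base
```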
Existing algorithms
Bucket algorithm [LRO96], Inverse-rule algorithm [DG97], MiniCon algorithm [PL00], …
However, instead of generating
P1: Q(M, C) :- V1(M, anderson, C)
they generate the rewriting
P2: Q(M, C) :- V1(M, anderson, C1), V1(M1, anderson, C)
Why P2, not P1?
– These algorithms take the Open-World Assumption (OWA): “P2 ⊇ P1.”
– However, under the Closed-World Assumption (CWA): “P1 = P2.”
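The difference can be seen in a small Python sketch with hypothetical data (Toyota and Menlo Park do not appear in the talk). Under the CWA, V1 holds every tuple its definition produces, and P1 and P2 return the same answers. Under the OWA, a source may expose only some of those tuples; P2 then pairs makes and cities that P1 cannot, and each extra pair is still a correct answer because both halves are supported by some V1 tuple.

```python
# Hypothetical base data: one dealer, two makes, two cities.
car = {("Honda", "Anderson"), ("Toyota", "Anderson")}
loc = {("Anderson", "Palo Alto"), ("Anderson", "Menlo Park")}

def v1_complete(car, loc):
    """V1(M, D, C) :- car(M, D), loc(D, C), fully materialized (CWA)."""
    return {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}

def p1(v1, dealer):
    """P1: Q(M, C) :- V1(M, dealer, C)"""
    return {(m, c) for (m, d, c) in v1 if d == dealer}

def p2(v1, dealer):
    """P2: Q(M, C) :- V1(M, dealer, C1), V1(M1, dealer, C)"""
    makes = {m for (m, d, _) in v1 if d == dealer}
    cities = {c for (_, d, c) in v1 if d == dealer}
    return {(m, c) for m in makes for c in cities}

# CWA: V1 holds every tuple of its definition, so P1 and P2 agree.
full = v1_complete(car, loc)
assert p1(full, "Anderson") == p2(full, "Anderson")

# OWA: the source exposes only some V1 tuples (a hypothetical subset).
partial = {("Honda", "Anderson", "Palo Alto"),
           ("Toyota", "Anderson", "Menlo Park")}
print(sorted(p1(partial, "Anderson")))  # 2 answers
print(sorted(p2(partial, "Anderson")))  # 4 answers, a superset of P1's
assert p1(partial, "Anderson") < p2(partial, "Anderson")
```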
Differences between OWA and CWA
W1(Make, Dealer) :- car(Make, Dealer)
W2(Make, Dealer) :- car(Make, Dealer)
CWA
– W1 and W2 have all car tuples: W1 = W2 = all car tuples.
– E.g.: W1 and W2 are computed from the same car table in a database.
OWA
– W1 and W2 have some car tuples.
– E.g.: W1 and W2 are from two different web sites.
Our problem: generating efficient plans using views under CWA
[Figure: a query Q is answered using materialized views V1, V2, …, Vn, which are defined over base relations R1, R2, …, Rm.]
Existing algorithms work under both assumptions.
Our study:
– takes the CWA;
– considers the efficiency of rewritings.
Efficient plans?
Challenge: in what space should we generate rewritings?
Base relations: car(Make, Dealer), loc(Dealer, City), part(Store, Make, City)
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
Views:
V1(M, D, C) :- car(M, D), loc(D, C)
V2(S, M, C) :- part(S, M, C)
V3(S) :- car(M, a), loc(a, C), part(S, M, C)
where a = ‘anderson’
Rewritings:
P1: Q(S, C) :- V1(M, a, C), V2(S, M, C)
P2: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
P2 could be more efficient than P1!
Focus
Given views V1, V2, …, Vn, a query Q, and a cost model CM:
Step 1: generate a rewriting P (logical plan).
Step 2: generate an efficient physical plan from P.
We focus on the logical level (step 1):
– Prune the rewriting space to generate “good” rewritings.
– This differs from the one-step approach of [CKPS95, ZCLPU00].
Both steps are cost-based.
We consider select-project-join queries, i.e., conjunctive queries.
Rest of the talk
Three cost models:
– CM1: number of subgoals in a physical plan
– CM2: sizes of views and intermediate relations
– CM3: CM2 + dropping attributes in intermediate relations
Experimental results
Conclusion and future directions
Cost model CM1
CM1: number of subgoals in a physical plan
– Goal: generate rewritings with the minimum number of subgoals
Motivations:
– Reduce the number of joins
– Reduce the number of view accesses
Example:
– P1: Q(S, C) :- V1(M, a, C), V2(S, M, C) (more efficient)
– P2: Q(S, C) :- V1(M1, a, C), V1(M, a, C1), V2(S, M, C)
A view can appear more than once in different “forms.”
Results under CM1
Analyze the rewriting space:
– Find an interesting structure of the space;
– Show a procedure to reduce the number of subgoals in a rewriting.
Develop an algorithm CoreCover:
– Input: a query Q, views V1, …, Vn
– Output: rewritings with the minimum number of subgoals
Optimality: if a rewriting exists, CoreCover is guaranteed to find a rewriting with the minimum number of subgoals.
CoreCover: example
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
Construct database D = { car(m0, a), loc(a, c0), part(s0, m0, c0) }
Evaluate views on D:
V1(M, D, C) :- car(M, D), loc(D, C) → V1(m0, a, c0)
V2(S, M, C) :- part(S, M, C) → V2(s0, m0, c0)
V3(S) :- car(M, a), loc(a, C), part(S, M, C) → V3(s0)
View tuples: V1(M, a, C), V2(S, M, C), V3(S)
Intuition: translate the problem into a set-covering problem.
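The first two steps (build D by freezing the query body, then evaluate the views on D) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation: atoms are (predicate, arguments) pairs, a term counts as a variable when it starts with an uppercase letter, each variable X is frozen to the constant x0, and views are evaluated by naive backtracking. Unfreezing the answers gives the view tuples listed above.

```python
def is_var(term):
    # Convention for this sketch: a term is a variable if it starts uppercase.
    return term[:1].isupper()

def freeze(atoms):
    """Build the canonical database: each variable X becomes a constant x0."""
    return {(pred, tuple(t.lower() + "0" if is_var(t) else t for t in args))
            for pred, args in atoms}

def matches(body, db, binding=None):
    """Yield every variable binding that maps all body atoms onto facts in db."""
    binding = {} if binding is None else binding
    if not body:
        yield binding
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in db:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        new, ok = dict(binding), True
        for t, v in zip(args, fact_args):
            if is_var(t):
                ok = ok and new.setdefault(t, v) == v
            else:
                ok = ok and t == v
        if ok:
            yield from matches(rest, db, new)

def unfreeze(value):
    """Map a frozen constant such as m0 back to M (sketch only: assumes no
    genuine constant ends in "0")."""
    return value[:-1].upper() if value.endswith("0") else value

query_body = [("car", ("M", "a")), ("loc", ("a", "C")), ("part", ("S", "M", "C"))]
views = {  # name -> (head arguments, body atoms)
    "V1": (("M", "D", "C"), [("car", ("M", "D")), ("loc", ("D", "C"))]),
    "V2": (("S", "M", "C"), [("part", ("S", "M", "C"))]),
    "V3": (("S",), [("car", ("M", "a")), ("loc", ("a", "C")),
                    ("part", ("S", "M", "C"))]),
}

D = freeze(query_body)  # {car(m0, a), loc(a, c0), part(s0, m0, c0)}

view_tuples = set()
for name, (head, body) in views.items():
    for b in matches(body, D):
        view_tuples.add((name, tuple(unfreeze(b[h]) for h in head)))

for vt in sorted(view_tuples):
    print(vt)
# ('V1', ('M', 'a', 'C'))
# ('V2', ('S', 'M', 'C'))
# ('V3', ('S',))
```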
CoreCover: example (cont.)
Views:
V1(M, D, C) :- car(M, D), loc(D, C)
V2(S, M, C) :- part(S, M, C)
V3(S) :- car(M, a), loc(a, C), part(S, M, C)
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
View tuples: V1(M, a, C), V2(S, M, C), V3(S)
Find query subgoals “covered” by each view tuple:
– V1(M, a, C) covers car(M, a) and loc(a, C)
– V2(S, M, C) covers part(S, M, C)
– V3(S) covers no query subgoal (C is in the answer of Q, but V3 does not expose it)
Find minimal covers of query subgoals using view tuples:
Q(S, C) :- V1(M, a, C), V2(S, M, C)
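Once each view tuple's covered subgoals are known, the last step is a set-cover search. A brute-force Python sketch, with the coverage sets copied from the example above; the helper name minimum_covers is hypothetical, and enumerating all combinations is exponential, so this says nothing about how CoreCover itself organizes the search. It returns the covers with the fewest view tuples, which is what CM1 asks for.

```python
from itertools import combinations

# Coverage of each view tuple, as in the example above.
query_subgoals = {"car(M, a)", "loc(a, C)", "part(S, M, C)"}
covers = {
    "V1(M, a, C)": {"car(M, a)", "loc(a, C)"},
    "V2(S, M, C)": {"part(S, M, C)"},
    "V3(S)": set(),
}

def minimum_covers(universe, covers):
    """Return all smallest sets of view tuples whose coverage is the universe."""
    names = sorted(covers)
    for k in range(1, len(names) + 1):
        found = [set(combo) for combo in combinations(names, k)
                 if set().union(*(covers[n] for n in combo)) == universe]
        if found:
            return found
    return []

for cover in minimum_covers(query_subgoals, covers):
    print("Q(S, C) :- " + ", ".join(sorted(cover)))
# Q(S, C) :- V1(M, a, C), V2(S, M, C)
```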
Algorithm: CoreCover
Step 1: Construct database D from query Q.
Step 2: Evaluate the views on D to obtain the “view tuples” T1, T2, …, Tk.
Step 3: Find the query subgoals (among G1, G2, …, Gm) “answered” by each view tuple.
Step 4: Find minimal covers of the query subgoals using view tuples; the covers give the rewritings.
Cost model CM2: considering sizes of views and intermediate relations
Motivation: the cost of V1 ⋈ V2 is related to size(V1) and size(V2).
Cost = size(V1) + size(V2) + … + size(Vn) + size(IR1) + size(IR2) + … + size(IRn)
Physical plan: Q( ) :- V1, V2, V3, …, Vn, where IR1, IR2, …, IRn (“IR”: intermediate relation) are produced as the views are joined in order.
Results under CM2
Observation: Adding more views may make a rewriting more efficient.
P1: Q(S, C) :- V1(M, a, C), V2(S, M, C)
P2: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
If V3(S) is very selective, P2 can be more efficient than P1.
Larger search space: the rewritings using view tuples are guaranteed to include one that produces an optimal physical plan under CM2.
– Modify CoreCover to find these rewritings.
– We discuss how to condense rewritings.
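The observation that adding V3 can lower the cost is easy to check numerically. The Python sketch below uses made-up data in which dealer a sells 20 makes in 20 cities and only store s0 carries a matching part. It assumes (none of this is spelled out in the talk) that size(Vi) is the number of view tuples matching the subgoal after selecting on constants, that IRi is the join of the first i subgoals in a left-deep order, and that the optimizer tries every join order.

```python
from itertools import permutations

def is_var(term):
    return term[:1].isupper()

def subgoal_rows(relation, args):
    """Rows of one rewriting subgoal: select on constants, name columns by variables."""
    rows = []
    for tup in relation:
        row, ok = {}, True
        for a, v in zip(args, tup):
            ok = ok and (row.setdefault(a, v) == v if is_var(a) else a == v)
        if ok:
            rows.append(row)
    return rows

def join(left, right):
    """Natural join of two lists of variable bindings (nested loops)."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]

def cm2_cost(order):
    """CM2 for one left-deep order: subgoal sizes plus sizes of IR1..IRn,
    taking IRi to be the join of the first i subgoals (an assumption)."""
    cost, ir = 0, None
    for rows in order:
        ir = rows if ir is None else join(ir, rows)
        cost += len(rows) + len(ir)
    return cost

def best_cost(subgoals):
    return min(cm2_cost(order) for order in permutations(subgoals))

# Hypothetical data: dealer a sells 20 makes and is listed in 20 cities;
# store s0 is the only store with a part matching a (make, city) pair of a.
car = {(f"m{i}", "a") for i in range(20)}
loc = {("a", f"c{i}") for i in range(20)}
part = ({("s0", "m0", "c0")}
        | {("s0", f"m{i}", f"z{i}") for i in range(1, 5)}
        | {(f"s{k}", "other", "z") for k in range(1, 100)})

V1 = {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}
V2 = part
V3 = {(s,) for (s, m, c) in part if (m, "a") in car and ("a", c) in loc}

g1 = subgoal_rows(V1, ("M", "a", "C"))  # V1(M, a, C)
g2 = subgoal_rows(V2, ("S", "M", "C"))  # V2(S, M, C)
g3 = subgoal_rows(V3, ("S",))           # V3(S)

print("P1:", best_cost([g1, g2]))       # P1: 609
print("P2:", best_cost([g3, g1, g2]))   # P2: 512
```

With these numbers the cheapest P2 plan joins V3 with V2 first, which keeps every intermediate relation small; the cheapest P1 plan has no such filter.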
Cost model CM3: dropping nonrelevant attributes
CM2: assumes all attributes are kept in IRs.
CM3: assumes attributes can be dropped in IRs to reduce sizes.
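Bad news: we did not find a search space that is guaranteed to contain an optimal physical plan.
Good news: we found a heuristic that helps the optimizer drop more attributes.
[Figure: plan Q( ) :- … Vi Vi+1 …, with attribute Y of the intermediate relation IRi as a candidate for dropping.]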
Drop what attributes?
Drop Y if: (1) Y is not used in later joins, and
(2) Y is not in the answers.
Called the “supplementary-relation approach.” [BR87]
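A small Python sketch of the supplementary-relation rule, assuming a left-deep plan that joins the subgoals in the order written; the function name kept_attributes is hypothetical. It is applied to two rewritings of the same query, Q(A) :- V1(A, B), V2(A, B) and Q(A) :- V1(A, C), V2(A, B).

```python
def kept_attributes(subgoals, head_vars):
    """For a left-deep plan, list the variables kept in each IRi.

    Supplementary-relation rule: drop Y from IRi if (1) no later subgoal
    uses Y and (2) Y is not in the answers (the head of Q).
    """
    kept, seen = [], set()
    for i, (_, args) in enumerate(subgoals):
        seen |= {a for a in args if a[:1].isupper()}  # variables so far
        later = {a for _, rest in subgoals[i + 1:] for a in rest}
        kept.append(sorted(v for v in seen if v in later or v in head_vars))
    return kept

# Q(A) :- V1(A, B), V2(A, B)   versus   Q(A) :- V1(A, C), V2(A, B)
p1 = [("V1", ("A", "B")), ("V2", ("A", "B"))]
p2 = [("V1", ("A", "C")), ("V2", ("A", "B"))]
print(kept_attributes(p1, {"A"}))  # [['A', 'B'], ['A']]
print(kept_attributes(p2, {"A"}))  # [['A'], ['A']]
```

In the first plan B must stay in IR1 because V2 still joins on it; in the second, C is used by no later subgoal and is not in the answer, so IR1 keeps only A.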
Search space under CM3?
Base relations: r(A, B), s(C, D), t(E, F)
Query: Q(A) :- r(A, A), t(A, B), s(B, B)
Views:
V1(A, B) :- r(A, A), s(B, B)
V2(A, B) :- t(A, B), s(B, B)
Rewriting using view tuples: P1: Q(A) :- V1(A, B), V2(A, B)
A more efficient rewriting: P2: Q(A) :- V1(A, C), V2(A, B)
Note: P1 and P2 both compute the answers to Q.
Rewritings using view tuples may not produce optimal physical plans!
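The claim that P1 and P2 both compute Q can be checked mechanically with the containment-mapping test [Chandra & Merlin 77]: expand each rewriting with the view definitions, then look for a containment mapping in each direction. The Python sketch below does this by brute force over all variable assignments, which is exponential but fine for queries this small; the helper names are hypothetical.

```python
from itertools import product

def is_var(term):
    return term[:1].isupper()

# View definitions: name -> (head arguments, body atoms)
views = {
    "V1": (("A", "B"), [("r", ("A", "A")), ("s", ("B", "B"))]),
    "V2": (("A", "B"), [("t", ("A", "B")), ("s", ("B", "B"))]),
}

def expand(head, body):
    """Replace each view subgoal by its definition; non-head view variables
    get fresh names (none occur in this example)."""
    atoms = []
    for i, (name, args) in enumerate(body):
        vhead, vbody = views[name]
        sub = dict(zip(vhead, args))
        for pred, pargs in vbody:
            atoms.append((pred, tuple(
                sub.get(a, f"{a}_{i}") if is_var(a) else a for a in pargs)))
    return head, atoms

def has_containment_mapping(src, dst):
    """Is there a variable mapping sending src's head to dst's head and
    every atom of src onto an atom of dst?"""
    (shead, sbody), (dhead, dbody) = src, dst
    svars = sorted({a for _, args in sbody for a in args if is_var(a)}
                   | {a for a in shead if is_var(a)})
    dterms = sorted({a for _, args in dbody for a in args} | set(dhead))
    dset = set(dbody)
    for image in product(dterms, repeat=len(svars)):
        h = dict(zip(svars, image))
        if tuple(h.get(t, t) for t in shead) != tuple(dhead):
            continue
        if all((p, tuple(h.get(t, t) for t in args)) in dset for p, args in sbody):
            return True
    return False

def equivalent(q1, q2):
    return has_containment_mapping(q1, q2) and has_containment_mapping(q2, q1)

Q = (("A",), [("r", ("A", "A")), ("t", ("A", "B")), ("s", ("B", "B"))])
P1 = expand(("A",), [("V1", ("A", "B")), ("V2", ("A", "B"))])
P2 = expand(("A",), [("V1", ("A", "C")), ("V2", ("A", "B"))])

print(equivalent(Q, P1), equivalent(Q, P2))             # True True
print(equivalent(Q, expand(("A",), [("V1", ("A", "B"))])))  # False
```

The last line is a sanity check: V1 alone loses the t subgoal, so it is not a rewriting of Q.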
Targeting rewritings to facilitate dropping of attributes
Goal: after the transformation, we may drop more attributes.
Main idea: given a sequence of subgoals, rename variables.
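If, after renaming Y to Y’ in Vi, the new rewriting is still equivalent to Q, then drop Y’ in IRi even if Y appears in later joins.
Example: renaming B to C in the V1 subgoal of P1 gives P2, which is still equivalent to Q, so C can be dropped from IR1:
P1: Q(A) :- V1(A, B), V2(A, B)
P2: Q(A) :- V1(A, C), V2(A, B)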
Experimental study
Purpose:
– Test how fast CoreCover generates rewritings (cost model CM1).
– Analyze its efficiency and scalability.
Experiment setup:
– A query generator (in Java). Input parameters:
• Number of base relations
• Number of attributes in a relation
• Number of views (1–1000), queries (5)
• Number of subgoals in a view and a query
• Shape of queries and views (star, chain, …)
– Implemented in Java on a dual-processor Sun Ultra 2 workstation running SunOS 5.6 with 256 MB of memory
Star queries and views
Each query has 8 subgoals, and each view has 1, 2, or 3 subgoals.
No attribute projection in the head of the queries/views.
Chain queries and views
Each query has 8 subgoals, and each view has 1, 2, or 3 subgoals.
One variable is projected in the head of the queries/views.
Conclusion
Generating efficient plans using views under CWA:
– Cost model CM1: number of subgoals in a plan
• Analysis of the rewriting space
• A search space for rewritings
• CoreCover: finding rewritings with the minimum number of subgoals
– Cost model CM2: sizes of views and IRs
• A search space for rewritings
• Condense rewritings
– Cost model CM3: dropping irrelevant attributes in IRs
• A heuristic to help the optimizer drop attributes
Future work
More complicated queries and views:
– Arithmetic comparisons (<=, >=, …)
– Aggregations
Different assumptions:
– Open-world assumption
– Maximally-contained rewritings
Constraints:
– Functional dependencies
– Foreign-key constraints
Thank you!
Questions?
Differences between CoreCover and MiniCon
CoreCover assumes the CWA, while MiniCon assumes the OWA.
MiniCon tries to minimize the number of subgoals in a rewriting, but it does not guarantee a minimum.
Technical differences:
– CoreCover is more “aggressive” than MiniCon about finding query subgoals answered by a view tuple.
– When finding set covers of query subgoals, CoreCover allows the covers to overlap; MiniCon does not.
Difference from earlier studies
Given views V1, V2, …, Vn, a query Q, and a cost model CM:
Step 1: generate a rewriting P (logical plan).
Step 2: generate an efficient physical plan from P.
One-step approach: [CKPS95, ZCLPU00].
We focus on the logical level (step 1):
– Prune the rewriting space to generate “good” rewritings.
– Cost-based.
Rewriting space
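[Figure: the space of rewritings, with regions for all rewritings, minimal rewritings, locally minimal rewritings, containment-minimal rewritings, and globally minimal rewritings; a rewriting P is transformed into P’.]
Rewriting P → P’: remove its redundant subgoals [Chandra & Merlin 77].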
Rewriting space (cont.)
P’ → P’’: remove subgoals while retaining equivalence to Q:
P3: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
V3(S) can still be removed.
Rewriting space (cont.)
P’’ → P*: transform P’’ using the mapping from the expansion of P’’ to the query:
P’’: Q(S, C) :- V1(M1, a, C), V1(M, a, C1), V2(S, M, C)
P*: Q(S, C) :- V1(M, a, C), V2(S, M, C)
Concise representation of rewritings
Problem: as the number of views increases, the number of rewritings could be large!
Solution:
– Group views into equivalence classes
– Group view tuples into equivalence classes based on their covered query subgoals.
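Grouping view tuples by the query subgoals they cover amounts to keying a dictionary on the covered set. A minimal Python sketch; the coverage sets below are hypothetical, chosen so that the classes match the example classes {T2, T5}, {T1, T6, T9}, {T3} used in the talk.

```python
from collections import defaultdict

# Hypothetical coverage sets: view tuple -> query subgoals it covers.
coverage = {
    "T1": {"g1", "g2"},
    "T2": {"g3"},
    "T3": {"g4"},
    "T5": {"g3"},
    "T6": {"g1", "g2"},
    "T9": {"g1", "g2"},
}

def group_by_coverage(coverage):
    """Put view tuples that cover exactly the same query subgoals in one class."""
    classes = defaultdict(list)
    for tup, covered in sorted(coverage.items()):
        classes[frozenset(covered)].append(tup)
    return list(classes.values())

print(group_by_coverage(coverage))  # [['T1', 'T6', 'T9'], ['T2', 'T5'], ['T3']]
```

Any member of a class can stand in for the others when building a representative rewriting.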
Advantages
– The number of equivalence classes is bounded by the number of query subgoals.
– The optimizer finds efficient physical plans by considering the “representative rewritings,” then decides how to make them more efficient by adding more view tuples.
– The optimizer can replace a view tuple in a rewriting by another view tuple in the same equivalence class to obtain another rewriting.
[Figure: views V1, V2, …, Vn grouped into equivalence classes such as {V1, V3}, {V4, V10, V15}, {V2, V9}, …; view tuples T1, T2, …, Tn grouped into equivalence classes such as {T2, T5}, {T1, T6, T9}, {T3}, ….]
Main results of experiments
CoreCover has good efficiency and scalability.
By grouping views and view tuples into equivalence classes, we can reduce the number of views and view tuples used by CoreCover.
Star queries and views: Number of equivalence classes
Star queries and views: Number of view tuples