Materializing Views with Minimal Size to Answer Queries


  • Materializing Views with Minimal Size to Answer Queries

    Rada Chirkova (North Carolina State University) and Chen Li (University of California, Irvine)

    Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003

  • Materializing Minimal-Size Views

    Context: relational databases

    The problem: minimize the amount of data required to answer queries, by:
    - automatically designing new relations (views), and
    - precomputing and storing (materializing) the new relations

    Central issue: inventing new views to materialize

    Applications include:
    - Mediators in data-integration systems
    - Database as a service in enterprise computing

  • Example: Modified TPC-H Query

    Q(name, o_date, priority, comment, o_key, quantity, shipmode) :-
        customer(c_key, name, building),
        order(o_key, c_key, o_date, priority, comment),
        lineitem(lineno, o_key, quantity, shipmode).

  • Partial Answer to the Query Q

    Name  O_Date    Priority  Comment  O_Key   Quantity  Shipmode
    Tom   3/14/95   0         close    134721  26        REG AIR
    Tom   3/14/95   0         close    134721  75        REG AIR
    Tom   3/14/95   0         close    134721  43        AIR
    Jack  12/21/94  0         final    571683  43        MAIL
    Jack  12/21/94  0         final    571683  33        AIR

  • Minimal-Size Views for the Query Q

    Q(name, o_date, priority, comment, o_key, quantity, shipmode) :-
        customer(c_key, name, building),
        order(o_key, c_key, o_date, priority, comment),
        lineitem(lineno, o_key, quantity, shipmode).

    V1(name, o_date, priority, comment, o_key) :-
        customer(c_key, name, building),
        order(o_key, c_key, o_date, priority, comment),
        lineitem(lineno, o_key, quantity, shipmode).

    V2(o_key, quantity, shipmode) :-
        customer(c_key, name, building),
        order(o_key, c_key, o_date, priority, comment),
        lineitem(lineno, o_key, quantity, shipmode).
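
    To make the size comparison concrete, the sketch below projects the five rows of the partial answer to Q onto the heads of V1 and V2, joins the two projections on o_key, and compares sizes. It rests on two assumptions that are not in the slides: size is measured as the number of stored attribute values (rows times columns), and only the five rows shown in the partial answer are used, so the numbers illustrate the idea rather than a real TPC-H instance.

```python
# Partial answer to Q, copied from the slide "Partial Answer to the Query Q".
Q_HEAD = ("name", "o_date", "priority", "comment", "o_key", "quantity", "shipmode")
Q_ROWS = [
    ("Tom", "3/14/95", 0, "close", 134721, 26, "REG AIR"),
    ("Tom", "3/14/95", 0, "close", 134721, 75, "REG AIR"),
    ("Tom", "3/14/95", 0, "close", 134721, 43, "AIR"),
    ("Jack", "12/21/94", 0, "final", 571683, 43, "MAIL"),
    ("Jack", "12/21/94", 0, "final", 571683, 33, "AIR"),
]
V1_HEAD = ("name", "o_date", "priority", "comment", "o_key")
V2_HEAD = ("o_key", "quantity", "shipmode")


def project(rows, head, attrs):
    """Project rows onto attrs under set semantics (duplicates collapse)."""
    idx = [head.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def cells(rows, attrs):
    """Assumed size measure: number of stored attribute values."""
    return len(rows) * len(attrs)


# V1 and V2 have the same body as Q and heads inside Q's head,
# so on these rows they are projections of Q's answer.
v1 = project(Q_ROWS, Q_HEAD, V1_HEAD)
v2 = project(Q_ROWS, Q_HEAD, V2_HEAD)

# Rewriting of Q: join V1 and V2 on o_key (last column of V1, first of V2).
rejoined = {a + b[1:] for a in v1 for b in v2 if a[4] == b[0]}

q_size = cells(set(Q_ROWS), Q_HEAD)
views_size = cells(v1, V1_HEAD) + cells(v2, V2_HEAD)
print(rejoined == set(Q_ROWS), q_size, views_size)  # True 35 25
```

    On these rows the two views store 25 values instead of 35, because the customer and order attributes are no longer repeated for every line item.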

  • Questions

    How do we know that views V1 and V2 are minimal-size views for the query Q? On what databases?

    How do we find a set of minimal-size views, given a set of queries and a database?
    - Is the problem decidable? For what inputs?
    - What is the complexity of the problem?
    - Are there good, efficient algorithms for finding minimal-size views?

  • Preliminaries

    Two queries are equivalent if they return the same answers on every database.

    An equivalent rewriting of a query Q in terms of views V is a query that:
    - is defined using the relations in V only, and
    - is equivalent to Q

    A conjunctive query (view) can be defined using only equality selections, projections, and joins.

    A disjunctive query (view) can be defined as a union of a finite number of conjunctive queries (views).
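
    Equivalence of conjunctive queries is commonly tested with containment mappings: Q1 is contained in Q2 exactly when some mapping of Q2's variables sends Q2's head to Q1's head and every subgoal of Q2 to a subgoal of Q1. The sketch below is a brute-force version of that test. The (head, body) encoding, the function names, and the example queries A, B, and C are hypothetical and chosen for illustration only; constants and built-in predicates are not handled.

```python
def containment_mapping(src, dst):
    """Return a mapping from src's variables to dst's variables that sends
    src's head to dst's head and every src subgoal to some dst subgoal,
    or None if no such mapping exists. Queries are (head, body) pairs and
    every argument is a variable (no constants)."""
    (src_head, src_body), (dst_head, dst_body) = src, dst
    if len(src_head) != len(dst_head):
        return None
    mapping = {}
    for s, d in zip(src_head, dst_head):
        if mapping.setdefault(s, d) != d:
            return None

    def extend(i, mapping):
        # Try to map subgoal i of src onto each compatible subgoal of dst.
        if i == len(src_body):
            return mapping
        rel, args = src_body[i]
        for drel, dargs in dst_body:
            if drel != rel or len(dargs) != len(args):
                continue
            trial = dict(mapping)
            if all(trial.setdefault(a, b) == b for a, b in zip(args, dargs)):
                found = extend(i + 1, trial)
                if found is not None:
                    return found
        return None

    return extend(0, mapping)


def contained_in(q1, q2):
    """q1 is contained in q2 iff a containment mapping from q2 to q1 exists."""
    return containment_mapping(q2, q1) is not None


def equivalent(q1, q2):
    return contained_in(q1, q2) and contained_in(q2, q1)


# Hypothetical example queries.
A = (("X",), [("p", ("X", "Y")), ("p", ("X", "Z"))])  # A(X) :- p(X,Y), p(X,Z).
B = (("X",), [("p", ("X", "Y"))])                     # B(X) :- p(X,Y).
C = (("X",), [("p", ("X", "X"))])                     # C(X) :- p(X,X).

print(equivalent(A, B))    # True: the second subgoal of A is redundant
print(contained_in(C, B))  # True
print(contained_in(B, C))  # False
```

    The search is exponential in the worst case, which is acceptable for queries with a handful of subgoals.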

  • Problem Specification

    Input:
    - Database instance D with schema R
    - Workload Q of queries on D

    Output (optimal solution): a set V of views, such that:
    - each query in Q has an equivalent rewriting in terms of V, and
    - the total size of the views, $\sum_{V_i \in V} \mathrm{size}(V_i)$, is minimal on D

  • Assumptions

    - Single database instance
    - Set semantics
    - Finite query workloads
    - Conjunctive queries
    - Disjunctive views and rewritings

  • Main Results

    - Decidability and upper bounds on the complexity of the problem
    - Relationship between a restriction on the language of the queries and the language of optimal views
    - Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case)

  • Conjunctive Views and Rewritings

    Theorem. Given a query workload Q and a database D, it is possible to construct a finite search space of views that includes all views in all optimal solutions for Q on D. The number of views in the search space is at most doubly-exponential in the size of the input query workload Q.

    Corollary. The problem of finding a minimal-size conjunctive viewset is decidable for finite workloads of conjunctive queries, assuming all rewritings are conjunctive.

  • Self-Joins in Queries

    Q1(X,Y) :- p(X,Z), p(Z,T), s(Z,Y).  // self-join
    Q2(X,Y) :- p(X,Z), r(Z,T), s(Z,Y).  // no self-joins

    Result 1. For some databases and queries, there is a set of disjunctive views that is better than any conjunctive solution. (Example: a single query with self-joins.)

    Result 2. The problem of finding an optimal solution in the space of disjunctive views is decidable, assuming conjunctive rewritings.

    Result 3. It is not necessary to consider disjunctive rewritings.

    Result 4. The size of the search space of views is at most triply-exponential in the size of the input query workload.
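
    Whether a conjunctive query has a self-join can be read off its body: a self-join is present when the same relation name appears in more than one subgoal. A minimal sketch of that check on Q1 and Q2 follows; the list-of-subgoals encoding and the function name are assumptions made for illustration.

```python
from collections import Counter

# Bodies of Q1 and Q2 from the slide, as (relation, arguments) subgoals.
Q1_BODY = [("p", ("X", "Z")), ("p", ("Z", "T")), ("s", ("Z", "Y"))]
Q2_BODY = [("p", ("X", "Z")), ("r", ("Z", "T")), ("s", ("Z", "Y"))]


def self_joined_relations(body):
    """Return the relation names that occur in more than one subgoal."""
    counts = Counter(rel for rel, _ in body)
    return sorted(rel for rel, n in counts.items() if n > 1)


print(self_joined_relations(Q1_BODY))  # ['p']
print(self_joined_relations(Q2_BODY))  # []
```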

  • Queries Without Self-Joins: The Problem Is in NP

    [Figure: diagram with the labels "disjunctive views", "conjunctive views", "subexpression views", and "full-reducer views"]

  • 1. Conjunctive Views Are Enough

    Theorem. Given a database D and a set of queries Q without self-joins, suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V' for (D,Q), such that:
    - all views in V' are conjunctive, and
    - size(V') ≤ size(V).

    Corollary. For any database and any set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive views.

  • What We Have Shown

    [Figure: diagram with the labels "disjunctive views" and "conjunctive views"]

  • Idea of the Proof

    Given: Q() :- S1(), S2(), ..., Sn(); a rewriting P of Q that uses V, where V = V1 ∪ V2 ∪ ... ∪ Vt

    Then there exists V' = V1' ∪ V2' ∪ ... ∪ Vt' such that:
    - for some mapping m, each Vi' is an image of Vi, and
    - each Vi' alone can replace any Vj in the rewriting of Q

  • Details of the Proof (1)

    P ≡ Q, P = P1 ∪ P2 ∪ ... ∪ Ps

    There exists a conjunctive query Pi such that Pi ≡ Q:
    Pi() :- Vi1(), ..., Vij(), ..., Vim(), G().

    Fix any Vij in Pi; consider, in P,
    Pr() :- Vij(), ..., Vij(), ..., Vij(), G().

    Because Pr is contained in Q, there exists a mapping b from Q to the expansion of Pr.

    We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr.


  • Details of the Proof (2)

    We can always change b, to redirect all subgoals of Q that map into subgoals of more than one Vij in Pr.

    Then, we can replace Pr with Pr':
    Pr() :- Vij(), ..., Vij(), ..., Vij(), G().
    Pr'() :- Vij(), G().

    And Pr' ⊆ Q.

  • Details of the Proof (3)

    Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr:

    Q() :- ..., Sk(...,W,...), ...

    Pr^exp() :- ..., Sk(...,Y,...), ..., Sk(...,Y,...), ...

    Pr() :- Vij(), Vij(), ..., Vij(), G()


  • Details of the Proof (4)

    Thus, we can replace Pr with Pr':
    Pr() :- Vij(), ..., Vij(), ..., Vij(), G().
    Pr'() :- Vij(), G().

    And Pr' ⊆ Q.

  • 2. Subexpression Views Are Enough

    Theorem. Given a database D and a set of queries Q without self-joins, suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V' for (D,Q), such that:
    - all views in V' are conjunctive subexpression-type views, and
    - size(V') ≤ size(V).

    Corollary. For any database and any set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive subexpression-type views. The size of the search space of views is at most singly-exponential in the size of the input query workload.

  • 3. Full-Reducer Views Are Enough

    A view V is a full-reducer view for a query Q if V and Q have the same body.

    Theorem. Given a database D and a single query Q without self-joins, suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V' for (D,Q), such that:
    - all views in V' are conjunctive full-reducer views for Q, and
    - size(V') ≤ size(V).

    Corollary. For any database and any query without self-joins, some optimal disjunctive solution is a set of conjunctive full-reducer views.

  • Using Full-Reducer Views To Rewrite Sets of Queries

    For query workloads with more than one query, we can merge optimal full-reducer views for individual queries in the workload, and the number of subgoals in the merged views never exceeds the number of subgoals in full-reducer views.

  • What We Have Shown

    [Figure: diagram with the labels "disjunctive views", "conjunctive views", "subexpression views", and "full-reducer views"]

  • The Problem Is in NP

    Theorem. Given a database instance, for any finite workload of conjunctive queries without self-joins, the problem of finding a minimal-size disjunctive viewset is in NP.

  • Generating Minimal-Size Views

    Input: a conjunctive query without self-joins and a database

    Output: a minimal-size disjunctive viewset for the query on the database

    Method: produce a minimal-size set of conjunctive full-reducer views, by doing exhaustive search in the space of the views using a dynamic-programming algorithm (cf. query optimization in System R)

    - The algorithm returns an optimal solution
    - Can be modified to work for non-singleton query workloads
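
    The slide names a dynamic-programming algorithm but does not spell it out, so the sketch below is not that algorithm. It is a plain brute-force search, under several simplifying assumptions: candidate views are full-reducer views whose heads are subsets of Q's head (so each is a projection of Q's answer), at most two views are materialized, size is counted as rows times columns, and a pair is accepted when its natural join reproduces the answer on this one instance. That last check is weaker than the equivalent rewriting the problem statement requires. The data is the partial answer to Q from the earlier slide.

```python
from itertools import combinations

HEAD = ("name", "o_date", "priority", "comment", "o_key", "quantity", "shipmode")
ANSWER = {
    ("Tom", "3/14/95", 0, "close", 134721, 26, "REG AIR"),
    ("Tom", "3/14/95", 0, "close", 134721, 75, "REG AIR"),
    ("Tom", "3/14/95", 0, "close", 134721, 43, "AIR"),
    ("Jack", "12/21/94", 0, "final", 571683, 43, "MAIL"),
    ("Jack", "12/21/94", 0, "final", 571683, 33, "AIR"),
}


def project(rows, attrs, keep):
    """Project rows (with schema attrs) onto keep, under set semantics."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(r1, a1, r2, a2):
    """Natural join on shared attribute names; returns (rows, schema)."""
    shared = [a for a in a1 if a in a2]
    extra = [a for a in a2 if a not in a1]
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[a1.index(a)] == t2[a2.index(a)] for a in shared):
                joined.add(t1 + tuple(t2[a2.index(a)] for a in extra))
    return joined, tuple(a1) + tuple(extra)


def smallest_pair(rows, head):
    """Search all pairs of projections; fall back to storing Q itself."""
    heads = [c for k in range(1, len(head) + 1) for c in combinations(head, k)]
    best_size, best_views = len(rows) * len(head), (head,)
    for h1, h2 in combinations(heads, 2):
        if set(h1) | set(h2) != set(head):
            continue  # the two views must cover every attribute of Q
        v1, v2 = project(rows, head, h1), project(rows, head, h2)
        size = len(v1) * len(h1) + len(v2) * len(h2)
        if size >= best_size:
            continue
        joined, attrs = natural_join(v1, h1, v2, h2)
        if project(joined, attrs, head) == rows:  # lossless on this instance
            best_size, best_views = size, (h1, h2)
    return best_size, best_views


best_size, best_views = smallest_pair(ANSWER, HEAD)
print(best_size, best_views)
```

    On these five rows the smallest pair stores 25 values, the same total as V1 and V2 from the example. Several pairs tie at that size, because on such a small instance attributes like name and o_key determine each other.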

  • Heuristics for Generating Views

    - Consider only those views that cover up to a fixed number of subgoals of the query
    - Consider only those views that have up to a fixed number of head attributes
    - Apply the algorithm separately to several subsets of subgoals of the query, then combine the solutions
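
    To give a feel for how much the second heuristic prunes, the sketch below counts candidate heads for full-reducer views of the example query Q. It assumes that a candidate head is any non-empty subset of the ten variables in Q's body; the cap of three head attributes is a hypothetical value picked for illustration.

```python
from itertools import combinations

# Distinct variables in the body of the example query Q.
BODY_VARS = (
    "c_key", "name", "building",
    "o_key", "o_date", "priority", "comment",
    "lineno", "quantity", "shipmode",
)


def candidate_heads(variables, max_attrs):
    """Yield every head with between 1 and max_attrs attributes."""
    for k in range(1, max_attrs + 1):
        yield from combinations(variables, k)


all_heads = sum(1 for _ in candidate_heads(BODY_VARS, len(BODY_VARS)))
capped_heads = sum(1 for _ in candidate_heads(BODY_VARS, 3))
print(all_heads, capped_heads)  # 1023 175
```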

  • Main Results

    - Decidability and upper bounds on the complexity of the problem
    - Relationship between a restriction on the language of the queries and the language of optimal views
    - Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case)

  • Some Directions of Future Work

    - Rewriting queries in more expressive languages: built-in predicates, disjunctive queries
    - Using more expressive languages of views and rewritings
    - Maximally-contained rewritings of queries in terms of views

  • Reference

    Jia Li, Rada Chirkova, and Chen Li. Minimizing Data-Communication Costs by Decomposing Query Results in Client-Server Environments. UCI ICS Technical Report, 2003. http://www-db.ics.uci.edu/pages/raccoon/
