optimizing queries using materialized views qiang wang cs848

21
Optimizing Queries Using Materialized Views Qiang Wang CS848

Upload: deirdre-dawson

Post on 27-Dec-2015

247 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Optimizing Queries Using Materialized Views Qiang Wang CS848

Optimizing Queries Using Materialized Views

Qiang Wang

CS848

Page 2: Optimizing Queries Using Materialized Views Qiang Wang CS848

Answering queries using views

Cost-based rewriting(query optimization and physical data independence)

Logical rewriting(data integration and ontology integration)

System-R style Transformational

approaches

Rewriting algorithms

Query answering algorithms

Page 3: Optimizing Queries Using Materialized Views Qiang Wang CS848

Transformational approach

Transformation-based optimizer Related work

Oracle 8i DBMS IBM DB2Microsoft SQL Server

Page 4: Optimizing Queries Using Materialized Views Qiang Wang CS848

A practical and scalable solution Microsoft SQL Server 2000 Limitations on the materialized view

A single-level SQL statement containing selections, inner joins and an optional group-by

It must reference base tables Aggregation view must output all grouping columns and

a count column Aggregation functions are limited to SUM and COUNT

A solution by Goldstein and Larson An efficient view-matching algorithm A novel index structure on view definitions

q6wang
1.experiment results;2.the limitations on the materialized view is in fact imposd on the indexable view3.because of such limitations on the aggregation expressions, it's hard to answer the MAX or MIN
Page 5: Optimizing Queries Using Materialized Views Qiang Wang CS848

Problem definition View matching with single-view

substitution: Given a relational expression in SPJG form, find all materialized views from which the expression can be computed and, for each view found, construct a substitute expression equivalent to the given expression.

Page 6: Optimizing Queries Using Materialized Views Qiang Wang CS848

Create view v1 as

select p_partkey,p_name,p_retailprice,

count_big(*) as cnt

sum(l_extendedprice*l_quantity) as

gross_revenue

from dbo.lineitem, dbo.part

where p_partkey < 1000

and p_partkey = l_partkey

and p_name like ‘%steel%’

group by p_partkey,p_name,p_retailprice

Create unique clustered index v1_cidx on v1(p_partkey)

select p_partkey,p_name,

sum(l_extendedprice*l_quantity),

from dbo.lineitem, dbo.part

where p_partkey < 500

and p_partkey = l_partkey

and p_name like ‘%steel%’

group by p_partkey,p_retailprice

Materialized View

Query

q6wang
Bag semanticsthe selection predicate of view and query expressions is in CNF format
Page 7: Optimizing Queries Using Materialized Views Qiang Wang CS848

substitutability requirements

1. The view contains all rows needed by the query expression

2. All required rows can be selected from the view

3. All output expressions can be computed from the output of the view

4. All output rows occur with the correct duplication factor

Page 8: Optimizing Queries Using Materialized Views Qiang Wang CS848

Do all required rows exist in the view(1) Check whether the source tables in the view are

superset of the source tables in the query Check whether the query selection predicates

can imply the view selection predicates

)()()( vqvqv PUPUPEPRPRPEPEPEqqq

)()()( vqqvqqvqq PUPUPRPEPRPUPRPEPEPUPRPEqqq

vvvqq PUPRPEPUPRPEq

Equijoin subsumption test

Range subsumption test

Residual subsumption test

q6wang
1.selection predicate is in CNF, so that it's possible to partition the selection predicates into three categories: equality, range, and residual
Page 9: Optimizing Queries Using Materialized Views Qiang Wang CS848

Equijoin subsumption test

TestStep1: equivalence class construction

{p_partkey,l_partkey} as Vc1 and Qc1Step2: check whether every nontrivial view

equivalence class is a subset of some query equivalence class

Compensating equality constraints Missing cases

Page 10: Optimizing Queries Using Materialized Views Qiang Wang CS848

Range subsumption test

Test Associate (view/query) equivalence classes with

lower and upper bound according to the range predicates; uninitialized bounds are assigned infinitum

Upperbound(Vc1) = 1000 Upperbound(Qc1) = 500

Check whether the view is more tightly constrained than the query on ranges

Compensating range constraints Missing cases

Page 11: Optimizing Queries Using Materialized Views Qiang Wang CS848

Residual subsumption test

Test (a shallow matching algorithm) An expression is represented by a text string and a

list of equivalence class referenced Text string: like ‘%steel%’ Attached equivalence class: Vc1

Check whether every conjunct in the view residual predicate matches a conjunct in the query residual predicate

Compensating residual constraints Missing cases

Page 12: Optimizing Queries Using Materialized Views Qiang Wang CS848

Can the required rows be selected (2)

Check whether the view covers output columns for compensating equality, range, and residual predicates

Page 13: Optimizing Queries Using Materialized Views Qiang Wang CS848

Can output expressions be computed (3)

Check whether all output expressions of the query can be computed from the view“l_expendedprice * l_price”

Page 14: Optimizing Queries Using Materialized Views Qiang Wang CS848

Do rows occur with correct duplication factor(4) Cardinality-preserving joins (table

extension joins) A stronger condition is employed: an

equijoin between all columns in a non-null foreign key in T and a unique key in S

q6wang
under set semantics, it doesn't matter whether there are extra join tables, but under a bag semantics, it affects
Page 15: Optimizing Queries Using Materialized Views Qiang Wang CS848

TestGiven the query references tables T1,…Tn and

the view reference tables T1,…Tn,Tn+1,…Tn+m

Construct foreign key join graphBuild cardinality-preserving relationships

among tablesCheck whether the extra tables can be

eliminated Compensation (not needed)

q6wang
But to facilitate the whole test, it’s needed to conceptually modify the query source table list and also modify the query equivalence classesleave the picture for Hub use
Page 16: Optimizing Queries Using Materialized Views Qiang Wang CS848

Substitution with aggregation views

Special requirements The view contains no aggregation or is less

aggregated than the query Exact subsumption Functionally determination

All columns required to perform further grouping are available in the view output

COUNT,SUM,AVG Compensation

Add extra grouping and aggregation expression computation

q6wang
AVG(E): SUM(E)/Count_big(*)Count(*):SUM(Count_big(*))
Page 17: Optimizing Queries Using Materialized Views Qiang Wang CS848

A clever index ---- filter tree

A1 An…..

G1 Gm….. G1’ Gm’…..

…..….. …..

….. View definition listView definition list

q6wang
1.what is the structure: the (key,pointer) pair and the partitioning idea and the subset/superset relationship employed; also the key of the query.2.motivation : just the discard, more careful check on possible candidates will still be conducted in a later phase
Page 18: Optimizing Queries Using Materialized Views Qiang Wang CS848

Internal structure: lattice index

Tops

Roots

Q:{A,B}

superset

subset

{A,B,C} {A,B,F} {B,C,D,E}

{A,B}

{A} {B}

{B,E}

{D}

q6wang
algorithm:if check whether view is subset, from roots, otherwise from tops;then discard all unqualified ones; access the next level of the tree pointed by the qualified ones and then iterate the check on the lattice index;after the check is completed on the filter tree, all the view definitions left will be checked by the view-matching and compensation will be generated
Page 19: Optimizing Queries Using Materialized Views Qiang Wang CS848

Partitioning conditions

Hub condition Source table condition Output expression Output column Residual constraint Range constraint Grouping expression Grouping column

Page 20: Optimizing Queries Using Materialized Views Qiang Wang CS848

Experiment & extension work

Experiment Concerning 1000 views, candidate set is reduced to

less than 0.4% of the views on average; Optimization time is 0.15 second per query where around 10 different substitutions are generated

Extension Constraint test refinement Relaxation on single-table substitutes

q6wang
TPC-H/R
Page 21: Optimizing Queries Using Materialized Views Qiang Wang CS848

Consideration

How about a consolidated lattice structure over views?Construction && maintenance Introducing DL for reasoning, but how about

the aggregation support?