
Post on 28-Dec-2015


Context

Tailoring the DBMS
– To support particular applications
  • Beyond alphanumerical data
  • Beyond retrieve + process
– To support particular hardware
  • New storage devices
– To incorporate novel techniques
  • New join implementations

Extensibility

• Language extensions
  – Abstract data types (ADT)
  – User defined functions (UDF)
• Data management extensions
  – New access methods
  – New storage methods
• Query processing extensions
  – New join methods
  – New optimization techniques

Starburst Contributions

• Revisited internal data structures
  – Query graph model
  – Query execution plan: low-level operators and STARs
• Mechanisms for extensibility
  – Rules for query rewrite and plan optimization

Predator Contributions

• Enhanced abstract data types
  – Encapsulation principle applied to storage, optimization and evaluation
  – Type-centric DBMS design

Outline

• Introduction
• Starburst
  – Language extensions
  – Data management extensions
  – Query processing extensions
• Predator
  – E-ADT processing
• Summary

Starburst - Language Extensions

• User defined functions (1)
  – Scalar functions
    • In: one or more field values from a single tuple
    • Out: a single value
  – Aggregate functions
    • In: one or more field values from several tuples
    • Out: a single value

Starburst - Language Extensions

• User defined functions (2)
  – Set predicate functions
    • In: a simple predicate and a subquery (defines the range for the predicate)
    • Out: a boolean value
  – Table functions
    • In: one or several table expressions as well as field values
    • Out: a relation
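The four kinds of user defined functions differ only in the shape of their inputs and outputs. The sketch below illustrates those shapes in Python; it is not Starburst's registration interface. All function names, and the modeling of a tuple as a `dict`, are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # assumption: a tuple is modeled as a dict of field name -> value


def scalar_area(width: float, height: float) -> float:
    """Scalar function: field values from a single tuple in, one value out."""
    return width * height


def aggregate_range(values: Iterable[float]) -> float:
    """Aggregate function: field values from several tuples in, one value out."""
    vals = list(values)
    return max(vals) - min(vals)


def set_predicate_majority(pred: Callable[[Row], bool],
                           subquery: Iterable[Row]) -> bool:
    """Set predicate function: a simple predicate and a subquery in, a boolean out.

    True when more than half of the subquery's rows satisfy the predicate.
    """
    rows = list(subquery)
    return sum(1 for r in rows if pred(r)) * 2 > len(rows)


def table_sample(table: Iterable[Row], step: int) -> Iterator[Row]:
    """Table function: a table expression and a field value in, a relation out."""
    for i, row in enumerate(table):
        if i % step == 0:
            yield row


rects = [{"w": 2, "h": 3}, {"w": 4, "h": 5}, {"w": 1, "h": 1}]
areas = [scalar_area(r["w"], r["h"]) for r in rects]
spread = aggregate_range(areas)
mostly_wide = set_predicate_majority(lambda r: r["w"] > 1, rects)
sampled = list(table_sample(rects, 2))
```

Built-in SQL constructs map onto the same shapes: `SUM` is an aggregate function, and `> ALL (subquery)` is a set predicate.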

Starburst – Language Extensions

• Abstract data types
  – Considered useful for:
    • Type checking
    • Structuring of users' data
  – Add-on to the system design

Starburst – Data Management Extensions

• Uniform record structure:
  – Header + offset directory + data area
  – Advantages:
    • Support for nested records
    • Treatment of null values and variable length fields
  – Drawbacks:
    • Overhead per record due to the offset directory
• Core system services
  – Logging, recovery manager, predicate evaluator, event queues, lock manager, interface to OS services, debugging, tracing, error reporting.
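A small sketch can make the header + offset directory + data area layout concrete. The byte layout below is an assumption chosen for illustration, not Starburst's actual on-disk format: a 2-byte field count as the header, one (offset, length) pair per field as the directory, and a length of -1 as a hypothetical NULL marker.

```python
import struct

NULL_LEN = -1  # hypothetical marker: a directory length of -1 means NULL


def encode_record(fields):
    """Pack fields (bytes or None) as header + offset directory + data area."""
    header = struct.pack("<H", len(fields))  # header: number of fields
    directory = b""
    data = b""
    for f in fields:
        if f is None:
            directory += struct.pack("<Hh", 0, NULL_LEN)
        else:
            # offset is relative to the start of the data area
            directory += struct.pack("<Hh", len(data), len(f))
            data += f
    return header + directory + data


def decode_field(record, i):
    """Fetch field i directly through the directory, without scanning the others."""
    (count,) = struct.unpack_from("<H", record, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    offset, length = struct.unpack_from("<Hh", record, 2 + 4 * i)
    if length == NULL_LEN:
        return None
    start = 2 + 4 * count + offset
    return record[start:start + length]


rec = encode_record([b"Ada", None, b"", b"Lovelace"])
```

In this layout the four-field record carries 18 bytes of header and directory for 11 bytes of data, which is the per-record overhead the slide lists as a drawback. In exchange, NULL and the empty string stay distinguishable and variable length fields need no terminators.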

Starburst – Data Management Extensions

• Storage methods [associated with a relation]
  – Run-time methods for accessing relations: scan, fetch, insert, update, delete, destroy
  – Implementation: the run-time methods are registered in vector lists
  – Compile-time cost estimates
• Attachments [associated with a relation]
  – Access methods, integrity constraints and trigger extensions

Starburst – Data Management Extensions

• Advantages
  – New storage methods and attachments can be added without modifying existing code
• Limitations
  – Attachments are only called after storage methods
  – Attachments are called in a fixed order
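The dispatch scheme described on the last two slides can be sketched as follows. This is a simplified illustration, not Starburst code: the class names, the `on_insert` hook, and the use of a Python list as the "vector" are all assumptions, and only the insert path is shown.

```python
class HeapStorage:
    """Hypothetical storage method: keeps records in an in-memory list."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)
        return len(self.rows) - 1  # record id

    def fetch(self, rid):
        return self.rows[rid]


class HashIndex:
    """Hypothetical attachment: an access method maintained on every insert."""

    def __init__(self, key):
        self.key = key
        self.buckets = {}

    def on_insert(self, rid, row):
        self.buckets.setdefault(row[self.key], []).append(rid)


class AuditTrigger:
    """Hypothetical attachment: a trigger that records each insert."""

    def __init__(self):
        self.log = []

    def on_insert(self, rid, row):
        self.log.append(("insert", rid))


# Vector of registered storage methods; a new one is added by appending,
# without touching the dispatch code in Relation.
STORAGE_METHODS = [HeapStorage]


class Relation:
    def __init__(self, storage_id, attachments):
        self.storage = STORAGE_METHODS[storage_id]()
        self.attachments = list(attachments)

    def insert(self, row):
        rid = self.storage.insert(row)       # the storage method runs first
        for attachment in self.attachments:  # then attachments, in fixed order
            attachment.on_insert(rid, row)
        return rid


emp = Relation(0, [HashIndex("dept"), AuditTrigger()])
emp.insert({"name": "Ada", "dept": "R&D"})
emp.insert({"name": "Bob", "dept": "R&D"})
```

The loop in `Relation.insert` makes both limitations visible: an attachment cannot run before the storage method, and the order of attachments is fixed by the list.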

Starburst – Query Processing Extensions

Internal representation of queries
– Query graph model
  • Beyond parse trees for the low-level plan operators
  • Used for query rewrite
– Query execution plan
  • Operator based representation
  • Strategy alternative rules (STARs) to represent execution plan
  • Used for query plan generation

Query Graph Model

• Boxes
  – Stored relations
  – Derived relations
• Vertices
  – Setformer iterators: produce tuples for a derived relation
  – Quantifier iterators: restrict tuples for a derived relation
• Edges
  – Range edges connecting a vertex and a box: access to a stored or a derived relation
  – Qualifier edges connecting one or more vertices: conjunction of predicates
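The vocabulary above (boxes, vertices, range edges, qualifier edges) maps naturally onto a few record types. The sketch below is a simplified reading of this slide, not Starburst's actual QGM: every class and field name is an assumption, and predicates are kept as plain strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    name: str
    kind: str  # "stored" or "derived"


@dataclass(frozen=True)
class Vertex:
    name: str
    kind: str  # "setformer" or "quantifier"


@dataclass(frozen=True)
class RangeEdge:
    vertex: Vertex  # the iterator ...
    box: Box        # ... ranges over this stored or derived relation


@dataclass(frozen=True)
class QualifierEdge:
    vertices: tuple  # one or more vertices
    predicate: str   # one conjunct of the query's predicates


@dataclass
class QGM:
    boxes: list = field(default_factory=list)
    vertices: list = field(default_factory=list)
    range_edges: list = field(default_factory=list)
    qualifier_edges: list = field(default_factory=list)

    def boxes_read_by(self, vertex):
        return [e.box for e in self.range_edges if e.vertex == vertex]

    def predicates_on(self, vertex):
        return [e.predicate for e in self.qualifier_edges if vertex in e.vertices]


# Names of employees whose department is located in NY (illustrative query).
emp, dept = Box("emp", "stored"), Box("dept", "stored")
result = Box("result", "derived")
e, d = Vertex("e", "setformer"), Vertex("d", "quantifier")
qgm = QGM(
    boxes=[emp, dept, result],
    vertices=[e, d],
    range_edges=[RangeEdge(e, emp), RangeEdge(d, dept)],
    qualifier_edges=[
        QualifierEdge((e, d), "e.dno = d.dno"),
        QualifierEdge((d,), "d.loc = 'NY'"),
    ],
)
```

Because predicates are attached to edges rather than buried in a parse tree, a rewrite rule can inspect or move a single conjunct without re-parsing the query.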

Query Rewrite

• Objectives:
  – Equivalent representation for alternative phrasings of a query
  – Only the DBMS can rewrite queries involving views
• Example rules:
  – Views may be merged
  – Redundant joins may be eliminated
  – Selections may be pushed down

Query Rewrite Rules

• A rule transforms a QGM into another QGM
• Condition / action: IF THEN rules
• Rule engine
  – Forward chaining
  – Various control strategies for rule application
• Search strategy
  – Top down (depth first / breadth first) / bottom up
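A condition / action rule engine with forward chaining can be sketched in a few lines. To stay short, the example below works on a plain operator tree of nested tuples rather than a real QGM, and implements only one of the example rules (pushing a selection below a join). The node shapes and all function names are assumptions.

```python
# Node shapes (assumed for this sketch):
#   ("table", name, columns)
#   ("select", column, value, child)   meaning: column = value
#   ("join", left, right)


def columns(node):
    """Columns available in the output of a node."""
    if node[0] == "table":
        return set(node[2])
    if node[0] == "select":
        return columns(node[3])
    return columns(node[1]) | columns(node[2])  # join


def push_select_condition(node):
    """IF a selection sits directly on top of a join ..."""
    return node[0] == "select" and node[3][0] == "join"


def push_select_action(node):
    """... THEN move it to the join input that provides its column."""
    _, col, value, (_, left, right) = node
    if col in columns(left):
        return ("join", ("select", col, value, left), right)
    return ("join", left, ("select", col, value, right))


RULES = [(push_select_condition, push_select_action)]


def rewrite_once(node):
    """Top-down, depth-first search for one applicable rule."""
    for condition, action in RULES:
        if condition(node):
            return action(node), True
    if node[0] == "select":
        child, fired = rewrite_once(node[3])
        return ("select", node[1], node[2], child), fired
    if node[0] == "join":
        left, fired = rewrite_once(node[1])
        if fired:
            return ("join", left, node[2]), True
        right, fired = rewrite_once(node[2])
        return ("join", node[1], right), fired
    return node, False


def rewrite(node):
    """Forward chaining: keep firing rules until none applies."""
    fired = True
    while fired:
        node, fired = rewrite_once(node)
    return node


emp = ("table", "emp", ("name", "dno"))
proj = ("table", "proj", ("pname", "lead"))
dept = ("table", "dept", ("dept_no", "loc"))
query = ("select", "name", "Ada", ("join", ("join", emp, proj), dept))
rewritten = rewrite(query)
```

Here the rule fires twice: the selection on `name` first drops below the outer join, then below the inner one, and ends up directly on `emp`. A different control strategy would change only `rewrite_once`, not the rules.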

How to Choose Between Alternative Rules?

• Cost based decision

• Problem: cost estimates are only known at the query execution plan level

• Approach: several alternatives are kept in the QGM
  – CHOOSE operation

Query Execution Plan

Execution plan represented using production rules:
– Terminals: low-level plan operators
  • In: 0 or more streams of tuples
  • Out: 0 or more streams of tuples
  • Each stream of tuples is tagged with properties
    – Relational: schema information
    – Operational: order, location
    – Estimated:
– Non terminals: STAR
  • Name
  • Alternative definitions in terms of low-level plan operators or other STARs

Query Execution Plan

• A query execution plan is a tree of low-level plan operators
• STAR production rules are used for generating query execution plans
  – General purpose STAR evaluator
  – Search strategy to choose next STAR to apply
  – Vector list of STARs
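STARs can be read as grammar productions whose expansions are candidate plans. The sketch below hard-codes two STARs as Python generators instead of interpreting rules with a general purpose evaluator, so it shows the idea only. The operator names, the cost formulas, and the choice of cost and cardinality as estimated properties are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A low-level plan operator plus the properties of its output stream."""
    op: str
    inputs: tuple = ()
    card: int = 0      # assumed estimated property: cardinality
    cost: float = 0.0  # assumed estimated property: cumulative cost
    order: str = ""    # operational property: column the stream is ordered on


# Terminals: low-level plan operators (cost formulas are made up).
def scan(table, card, order=""):
    return Plan("SCAN " + table, (), card, float(card), order)


def sort(p, col):
    return Plan("SORT", (p,), p.card, p.cost + 2.0 * p.card, col)


def nested_loop(l, r):
    return Plan("NLJOIN", (l, r), max(l.card, r.card), l.cost + l.card * r.cost)


def merge_join(l, r, col):
    cost = l.cost + r.cost + l.card + r.card
    return Plan("MGJOIN", (l, r), max(l.card, r.card), cost, col)


# Non-terminals: each STAR yields its alternative definitions.
def star_ordered(p, col):
    if p.order == col:
        yield p             # the stream already has the required order
    else:
        yield sort(p, col)  # otherwise add a SORT operator


def star_join(l, r, col):
    yield nested_loop(l, r)
    for ls in star_ordered(l, col):      # defined in terms of another STAR
        for rs in star_ordered(r, col):
            yield merge_join(ls, rs, col)


def best_plan(l, r, col):
    return min(star_join(l, r, col), key=lambda p: p.cost)


emp = scan("emp", 1000)
dept = scan("dept", 100, order="dno")
best = best_plan(emp, dept, "dno")
```

With these made-up costs the merge join wins: `emp` needs a SORT, while `dept` is reused as is because its order property already matches. Adding a new join method means adding one more alternative to `star_join`.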

Starburst Contributions

• Revisited internal data structures
  – Query graph model
  – Query execution plan: low-level operators and STARs
• Mechanisms for extensibility
  – Rules for query rewrite and plan optimization

Outline

• Introduction
• Starburst
  – Language extensions
  – Data management extensions
  – Query processing extensions
• Predator
  – E-ADT processing
• Summary

Basic Techniques for ADTs

• Vector list of ADTs
• Each ADT implements:
  – Common internal interface for access to ADT values
  – Functions for storage and indexed retrieval
• Methods associated with ADT
  – ADT methods can be composed
  – DBMS understands minimal semantics about each method

“Black box” ADT Approach

Motivation for E-ADTs

• Basic observation:
  – ADT methods can be expensive!

• Need to identify optimizations on ADT methods

• Need to define a framework for applying these optimizations systematically

Possible Optimizations

• Algorithmic:
  – Using different algorithms for each method depending on data characteristics
• Transformational:
  – Changing the order of methods
• Constraint:
  – Pushing physical constraints through a method
• Pipelining:
  – Avoiding materialization of intermediate results

Architectural Framework

Each E-ADT supports some of the following enhancements:
– Optimization: transforms a method expression into a query execution plan expression
– Evaluation: routines to execute the query execution plan expression
– Catalog management: routines to store schema information and maintain statistics
– Storage management: physical representation of values of its type

E-ADT Rewrite Rules

• Some of the optimizations for ADT methods can be applied on a logical representation of queries using rewrite rules
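As an illustration of a transformational rewrite owned by the type itself, the sketch below defines a hypothetical image E-ADT with an `optimize` routine and an `evaluate` routine. It is not PREDATOR's interface: the class, the method names `brighten` and `clip`, and the representation of an image as a list of pixel rows are assumptions. `brighten` is chosen because it works pixel by pixel, so it commutes with `clip`.

```python
class ImageEADT:
    """Hypothetical E-ADT: images are lists of rows of integer pixels."""

    PIXELWISE = {"brighten"}  # methods assumed to commute with clip

    @staticmethod
    def optimize(expr):
        """Reorder a method expression so that each clip runs as early as possible."""
        plan = list(expr)
        changed = True
        while changed:
            changed = False
            for i in range(len(plan) - 1):
                if plan[i][0] in ImageEADT.PIXELWISE and plan[i + 1][0] == "clip":
                    plan[i], plan[i + 1] = plan[i + 1], plan[i]
                    changed = True
        return plan

    @staticmethod
    def evaluate(plan, image):
        """Run the plan; also return how many pixels brighten had to touch."""
        touched = 0
        for method, arg in plan:
            if method == "clip":
                rows, cols = arg
                image = [row[:cols] for row in image[:rows]]
            elif method == "brighten":
                touched += sum(len(row) for row in image)
                image = [[min(255, p + arg) for p in row] for row in image]
            else:
                raise ValueError(f"unknown method: {method}")
        return image, touched


img = [[10 * r + c for c in range(4)] for r in range(4)]
expr = [("brighten", 5), ("clip", (2, 2))]  # i.e. clip(brighten(img, 5), 2, 2)
plan = ImageEADT.optimize(expr)
naive_result, naive_touched = ImageEADT.evaluate(expr, img)
fast_result, fast_touched = ImageEADT.evaluate(plan, img)
```

Both orders return the same 2×2 image, but the rewritten plan brightens 4 pixels instead of 16. A black box ADT could not do this, because the DBMS would not know that the two methods commute.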

Predator Contributions

• Enhanced abstract data types
  – Encapsulation principle applied to storage, optimization and evaluation
  – Type-centric DBMS design
