
Upload: myra-marshall

Post on 28-Dec-2015



Context

Tailoring the DBMS
– To support particular applications
  • Beyond alphanumerical data
  • Beyond retrieve + process
– To support particular hardware
  • New storage devices
– To incorporate novel techniques
  • New join implementations


Extensibility

• Language extensions
  – Abstract data types (ADT)
  – User defined functions (UDF)
• Data management extensions
  – New access methods
  – New storage methods
• Query processing extensions
  – New join methods
  – New optimization techniques


Starburst Contributions

• Revisited internal data structures
  – Query graph model
  – Query execution plan: low-level operators and STARs
• Mechanisms for extensibility
  – Rules for query rewrite and plan optimization


Predator Contributions

• Enhanced abstract data types
  – Encapsulation principle applied to storage, optimization and evaluation
  – Type centric DBMS design


Outline

• Introduction
• Starburst
  – Language extensions
  – Data management extensions
  – Query processing extensions
• Predator
  – E-ADT processing
• Summary


Starburst – Language Extensions

• User defined functions (1)
  – Scalar functions
    • In: one or more field values from a single tuple
    • Out: a single value
  – Aggregate functions
    • In: one or more field values from several tuples
    • Out: a single value


Starburst – Language Extensions

• User defined functions (2)
  – Set predicate functions
    • In: a simple predicate and a subquery (defines the range for the predicate)
    • Out: a boolean value
  – Table functions
    • In: one or several table expressions as well as field values
    • Out: a relation
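
The four kinds of user defined functions differ only in what they take in and what they return. The sketch below illustrates those signatures in plain Python; it is not Starburst's registration interface, and the function names (`full_name`, `value_range`, `majority`, `sample`) and the `employees` data are made up for illustration.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # one tuple, modelled as a column-name -> value mapping


def full_name(first: str, last: str) -> str:
    """Scalar function: field values of ONE tuple in, a single value out."""
    return f"{first} {last}"


def value_range(values: Iterable[float]) -> float:
    """Aggregate function: a field taken from SEVERAL tuples in, one value out."""
    vals = list(values)
    return max(vals) - min(vals)


def majority(predicate: Callable[[Row], bool], subquery: Iterable[Row]) -> bool:
    """Set predicate function: a simple predicate plus the subquery that
    defines its range in, a boolean out (true if most rows satisfy it)."""
    rows = list(subquery)
    return sum(1 for r in rows if predicate(r)) * 2 > len(rows)


def sample(table: Iterable[Row], step: int) -> Iterator[Row]:
    """Table function: a table expression plus a field value in, a relation out."""
    for i, row in enumerate(table):
        if i % step == 0:
            yield row


# Hypothetical data used only to exercise the four signatures.
employees = [
    {"first": "Ada", "last": "Lovelace", "salary": 120.0},
    {"first": "Alan", "last": "Turing", "salary": 100.0},
    {"first": "Grace", "last": "Hopper", "salary": 90.0},
]
```

For instance, `majority(lambda r: r["salary"] >= 100.0, employees)` evaluates the predicate over every row produced by the subquery, while `sample(employees, 2)` yields a new relation that could feed another table expression.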


Starburst – Language Extensions

• Abstract data types
  – Considered useful for:
    • Type checking
    • Structuring of users' data
  – Add-on to the system design


Starburst – Data Management Extensions

• Uniform record structure:
  – Header + offset directory + data area
  – Advantages:
    • Support for nested records
    • Treatment of null values and variable length fields
  – Drawbacks:
    • Overhead per record due to the offset directory
• Core system services
  – Logging, recovery manager, predicate evaluator, event queues, lock manager, interface to OS services, debugging, tracing, error reporting.
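
To make the header + offset directory + data area layout concrete, here is a minimal sketch of such a record format. The byte widths, the little-endian encoding and the `NULL` sentinel are assumptions of this sketch, not Starburst's actual on-disk format, and nested records are left out.

```python
import struct
from typing import List, Optional

NULL = 0xFFFF  # sentinel offset marking a null field (assumption of this sketch)


def encode_record(fields: List[Optional[bytes]]) -> bytes:
    """Pack fields as: header (field count) + offset directory + data area."""
    header = struct.pack("<H", len(fields))
    directory_size = 2 * (len(fields) + 1)  # one extra entry marks the end of data
    base = len(header) + directory_size
    offsets, data = [], b""
    for field in fields:
        if field is None:
            offsets.append(NULL)          # null: directory entry only, no data
        else:
            offsets.append(base + len(data))
            data += field                 # variable-length value, stored as-is
    offsets.append(base + len(data))      # end-of-record offset
    directory = struct.pack(f"<{len(offsets)}H", *offsets)
    return header + directory + data


def decode_field(record: bytes, i: int) -> Optional[bytes]:
    """Fetch field i directly through the offset directory."""
    (n,) = struct.unpack_from("<H", record, 0)
    offsets = struct.unpack_from(f"<{n + 1}H", record, 2)
    start = offsets[i]
    if start == NULL:
        return None
    # The field ends where the next non-null directory entry starts.
    end = next(o for o in offsets[i + 1:] if o != NULL)
    return record[start:end]


record = encode_record([b"42", None, b"hello"])
overhead = len(record) - len(b"42" + b"hello")  # bytes spent on header + directory
```

Here the three-field record carries 7 bytes of data and 10 bytes of header and directory, which illustrates the per-record overhead listed as a drawback.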


Starburst – Data Management Extensions

• Storage methods [associated with a relation]
  – Run-time methods for accessing relations: scan, fetch, insert, update, delete, destroy
  – Implementation: the run-time methods are registered in vector lists
  – Compile-time cost estimates
• Attachments [associated with a relation]
  – Access methods, integrity constraints and trigger extensions


Starburst – Data Management Extensions

• Advantages
  – New storage methods and attachments can be added without modifying existing code
• Limitations
  – Attachments are only called after storage methods
  – Attachments are called in a fixed order
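
The dispatch scheme behind storage methods and attachments can be sketched as two vector lists indexed by slot number. This is an illustration only: `register`, `Relation` and the heap, index and trigger routines are hypothetical names, and only the `insert` operation is shown.

```python
from typing import Callable, Dict, List

# Vector lists: one slot per registered extension (names are hypothetical).
storage_methods: List[Dict[str, Callable]] = []
attachment_types: List[Dict[str, Callable]] = []


def register(vector: List[Dict[str, Callable]], ops: Dict[str, Callable]) -> int:
    """Add an extension to a vector list and return its slot number."""
    vector.append(ops)
    return len(vector) - 1


class Relation:
    def __init__(self, storage_id: int, attachment_ids: List[int]):
        self.storage_id = storage_id
        self.attachment_ids = attachment_ids  # fixed calling order
        self.rows: List[tuple] = []
        self.log: List[str] = []

    def insert(self, row: tuple) -> None:
        # The storage method always runs first ...
        storage_methods[self.storage_id]["insert"](self, row)
        # ... then every attachment, in the fixed order.
        for attachment_id in self.attachment_ids:
            attachment_types[attachment_id]["insert"](self, row)


def heap_insert(rel: Relation, row: tuple) -> None:
    rel.rows.append(row)
    rel.log.append("heap")


def index_insert(rel: Relation, row: tuple) -> None:
    rel.log.append("index")


def trigger_insert(rel: Relation, row: tuple) -> None:
    rel.log.append("trigger")


# Adding an extension touches only the vector lists, not Relation.insert.
HEAP = register(storage_methods, {"insert": heap_insert})
INDEX = register(attachment_types, {"insert": index_insert})
TRIGGER = register(attachment_types, {"insert": trigger_insert})

emp = Relation(HEAP, [INDEX, TRIGGER])
emp.insert(("Ada", 120))
```

The sketch also makes both limitations visible: `Relation.insert` hard-codes that attachments run after the storage method, and their order is whatever the relation's attachment list says.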


Starburst – Query Processing Extensions

Internal representation of queries
– Query graph model
  • Beyond parse trees for the low-level plan operators
  • Used for query rewrite
– Query execution plan
  • Operator based representation
  • Strategy alternative rules (STARs) to represent execution plan
  • Used for query plan generation


Query Graph Model

• Boxes
  – Stored relations
  – Derived relations
• Vertices
  – Setformer iterators: produce tuples for a derived relation
  – Quantifier iterators: restrict tuples for a derived relation
• Edges
  – Range edges connecting a vertex and a box: access to a stored or a derived relation
  – Qualifier edges connecting one or more vertices: conjunction of predicates


Query Rewrite

• Objectives:
  – Equivalent representation for alternative phrasings of a query
  – Only the DBMS can rewrite queries involving views
• Example rules:
  – Views may be merged
  – Redundant joins may be eliminated
  – Selections may be pushed down


Query Rewrite Rules

• A rule transforms a QGM into another QGM
• Condition / action: IF-THEN rules
• Rule engine
  – Forward chaining
  – Various control strategies for rule application
• Search strategy
  – Top down (depth first / breadth first) / bottom up


How to Choose Between Alternative Rules?

• Cost based decision

• Problem: cost estimates are only known at the query execution plan level

• Approach: several alternatives are kept in the QGM – CHOOSE operation


Query Execution Plan

Execution plan represented using production rules:
– Terminals: low-level plan operators
  • In: 0 or more streams of tuples
  • Out: 0 or more streams of tuples
  • Each stream of tuples is tagged with properties
    – Relational: schema information
    – Operational: order, location
    – Estimated:
– Non-terminals: STARs
  • Name
  • Alternative definitions in terms of low-level plan operators or other STARs


Query Execution Plan

• A query execution plan is a tree of low-level plan operators
• STAR production rules are used for generating query execution plans
  – General purpose STAR evaluator
  – Search strategy to choose next STAR to apply
  – Vector list of STARs
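
STARs behave like the non-terminals of a grammar whose terminals are low-level plan operators. The sketch below expands a STAR into every operator tree it can derive and picks the cheapest one. The operator names, the STAR definitions and the costs are invented for illustration, and exhaustive enumeration stands in for a real search strategy.

```python
from itertools import product
from typing import Dict, List

# Terminals: low-level plan operators with a made-up cost (assumption).
OPERATOR_COST: Dict[str, float] = {
    "SCAN": 10.0, "INDEX_SCAN": 4.0, "SORT": 6.0,
    "NL_JOIN": 20.0, "MERGE_JOIN": 5.0,
}

# Non-terminals: each STAR lists alternative definitions; a definition is an
# operator applied to arguments that are operators or other STARs.
STARS: Dict[str, List[tuple]] = {
    "AccessPlan": [("SCAN",), ("INDEX_SCAN",)],
    "SortedAccess": [("SORT", "AccessPlan"), ("INDEX_SCAN",)],
    "JoinPlan": [("NL_JOIN", "AccessPlan", "AccessPlan"),
                 ("MERGE_JOIN", "SortedAccess", "SortedAccess")],
}

Plan = tuple  # (operator, child_plan, ...): a tree of low-level operators


def expand(symbol: str) -> List[Plan]:
    """Enumerate every operator tree derivable from a STAR or an operator."""
    if symbol in OPERATOR_COST:
        return [(symbol,)]
    plans = []
    for op, *args in STARS[symbol]:
        # Combine every alternative of every argument.
        for children in product(*(expand(a) for a in args)):
            plans.append((op, *children))
    return plans


def cost(plan: Plan) -> float:
    """Sum the operator costs over the whole plan tree."""
    op, *children = plan
    return OPERATOR_COST[op] + sum(cost(c) for c in children)


def best_plan(star: str) -> Plan:
    return min(expand(star), key=cost)
```

Adding a new join method in this sketch means adding one alternative to `JoinPlan` and one entry to `OPERATOR_COST`; `expand` and `best_plan` stay unchanged, which is the extensibility argument for rule-based plan generation.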


Starburst Contributions

• Revisited internal data structures
  – Query graph model
  – Query execution plan: low-level operators and STARs
• Mechanisms for extensibility
  – Rules for query rewrite and plan optimization


Outline

• Introduction
• Starburst
  – Language extensions
  – Data management extensions
  – Query processing extensions
• Predator
  – E-ADT processing
• Summary


Basic Techniques for ADTs

• Vector list of ADTs
• Each ADT implements:
  – Common internal interface for access to ADT values
  – Functions for storage and indexed retrieval
• Methods associated with each ADT
  – ADT methods can be composed
  – DBMS understands minimal semantics about each method

“Black box” ADT approach


Motivation for E-ADTs

• Basic observation:
  – ADT methods can be expensive!

• Need to identify optimizations on ADT methods

• Need to define a framework for applying these optimizations systematically


Possible Optimizations

• Algorithmic:
  – Using different algorithms for each method depending on data characteristics
• Transformational:
  – Changing the order of methods
• Constraint:
  – Pushing physical constraints through a method
• Pipelining:
  – Avoiding materialization of intermediate results


Architectural Framework

Each E-ADT supports some of the following enhancements:
– Optimization: transforms a method expression into a query execution plan expression
– Evaluation: routines to execute the query execution plan expression
– Catalog management: routines to store schema information and maintain statistics
– Storage management: physical representation of values of its type


E-ADT Rewrite Rules

• Some of the optimizations for ADT methods can be applied on a logical representation of queries using rewrite rules
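
As an illustration of a rewrite rule on a logical method expression, the sketch below reorders two methods of a hypothetical image type so that the expensive one runs on less data. The type, its `brighten` and `clip` methods and the `stats` counter are assumptions made for this example, not Predator's API; `brighten` is chosen as a purely per-pixel operation so that swapping the two methods gives exactly the same result.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Expr = tuple  # ("image", name) | ("brighten", expr) | ("clip", box, expr)

stats = {"pixels": 0}  # counts the work done by the expensive method


def brighten(img: Image) -> Image:
    """Expensive per-pixel method (hypothetical)."""
    stats["pixels"] += sum(len(row) for row in img)
    return [[min(255, p + 10) for p in row] for row in img]


def clip(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cheap method: keep rows r0..r1 and columns c0..c1 (end exclusive)."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in img[r0:r1]]


def optimize(expr: Expr) -> Expr:
    """Rewrite rule: clip(brighten(x)) becomes brighten(clip(x))."""
    if expr[0] == "clip" and expr[2][0] == "brighten":
        _, box, (_, inner) = expr
        return ("brighten", ("clip", box, optimize(inner)))
    if expr[0] == "brighten":
        return ("brighten", optimize(expr[1]))
    if expr[0] == "clip":
        return ("clip", expr[1], optimize(expr[2]))
    return expr


def evaluate(expr: Expr, images: Dict[str, Image]) -> Image:
    """Execute a method expression against named images."""
    if expr[0] == "image":
        return images[expr[1]]
    if expr[0] == "brighten":
        return brighten(evaluate(expr[1], images))
    return clip(evaluate(expr[2], images), expr[1])


images = {"photo": [list(range(100)) for _ in range(100)]}
expr = ("clip", (0, 0, 10, 10), ("brighten", ("image", "photo")))
```

Evaluating `expr` as written brightens the full 100 x 100 image before clipping, whereas `optimize(expr)` brightens only the 10 x 10 clipped region. A black-box ADT could not apply this reordering because the DBMS would not know that the two methods commute.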


Predator Contributions

• Enhanced abstract data types
  – Encapsulation principle applied to storage, optimization and evaluation
  – Type centric DBMS design