Context
Tailoring the DBMS
– To support particular applications
  • Beyond alphanumerical data
  • Beyond retrieve + process
– To support particular hardware
  • New storage devices
– To incorporate novel techniques
  • New join implementations
Extensibility
• Language extensions
  – Abstract data types (ADT)
  – User defined functions (UDF)
• Data management extensions
  – New access methods
  – New storage methods
• Query processing extensions
  – New join methods
  – New optimization techniques
Starburst Contributions
• Revisited internal data structures
  – Query graph model
  – Query execution plan: low-level operators and STARs
• Mechanisms for extensibility
  – Rules for query rewrite and plan optimization
Predator Contributions
• Enhanced abstract data types
  – Encapsulation principle applied to storage, optimization and evaluation
  – Type centric DBMS design
Outline
• Introduction
• Starburst
  – Language extensions
  – Data management extensions
  – Query processing extensions
• Predator
  – E-ADT processing
• Summary
Starburst - Language Extensions
• User defined functions (1)
  – Scalar functions
    • In: one or more field values from a single tuple
    • Out: a single value
  – Aggregate functions
    • In: one or more field values from several tuples
    • Out: a single value
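The difference between the two function kinds above can be sketched in a few lines of Python. This is only an illustration, not Starburst's registration interface: the tuple layout and the names `total_pay` and `pay_spread` are made up for the example.

```python
from typing import Iterable

# Hypothetical tuple layout: (name, salary, bonus)
Row = tuple[str, float, float]

def total_pay(salary: float, bonus: float) -> float:
    """Scalar UDF: field values from ONE tuple in, a single value out."""
    return salary + bonus

def pay_spread(salaries: Iterable[float]) -> float:
    """Aggregate UDF: field values from SEVERAL tuples in, a single value out."""
    values = list(salaries)
    return max(values) - min(values) if values else 0.0

rows: list[Row] = [("ann", 100.0, 10.0), ("bob", 80.0, 0.0), ("cy", 120.0, 5.0)]

# The scalar function is applied once per tuple ...
per_row = [total_pay(salary, bonus) for _, salary, bonus in rows]
# ... while the aggregate consumes one field from every tuple at once.
spread = pay_spread(salary for _, salary, _ in rows)
```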
Starburst - Language Extensions
• User defined functions (2)
  – Set predicate functions
    • In: a simple predicate and a subquery (defines the range for the predicate)
    • Out: a boolean value
  – Table functions
    • In: one or several table expressions as well as field values
    • Out: a relation
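A rough Python analogue of the two function kinds above, under the assumption that a subquery and a table expression can both be modelled as iterables. The names `majority` and `sample` are invented for illustration; the familiar built-in set predicates would be ALL and ANY.

```python
from typing import Callable, Iterable, Iterator

def majority(pred: Callable[[float], bool], subquery: Iterable[float]) -> bool:
    """Set predicate function: a simple predicate plus the subquery that
    defines its range; returns a boolean."""
    values = list(subquery)
    hits = sum(1 for v in values if pred(v))
    return hits * 2 > len(values)

def sample(table: Iterable[tuple], step: int) -> Iterator[tuple]:
    """Table function: a table expression plus a field value in, a relation out."""
    for i, row in enumerate(table):
        if i % step == 0:
            yield row

salaries = [80.0, 100.0, 120.0]
mostly_high = majority(lambda s: s > 90.0, salaries)

every_other = list(sample([("a",), ("b",), ("c",), ("d",)], 2))
```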
Starburst – Language Extensions
• Abstract data types
  – Considered useful for:
    • Type checking
    • Structuring of users’ data
  – Add-on to the system design
Starburst – Data Management Extensions
• Uniform record structure:
  – Header + offset directory + data area
  – Advantages:
    • Support for nested records
    • Treatment of null values and variable length fields
  – Drawbacks:
    • Overhead per record due to the offset directory
• Core system services
  – Logging, recovery manager, predicate evaluator, event queues, lock manager, interface to OS services, debugging, tracing, error reporting
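To make the header + offset directory + data area layout concrete, here is a minimal sketch. The byte layout is an assumption chosen for the example (a 2-byte field count as header, one 4-byte offset/length entry per field, and a reserved length value marking null); the slides do not give Starburst's actual format.

```python
import struct
from typing import Optional

NULL = 0xFFFF  # assumed marker: a directory length of 0xFFFF means "field is null"

def encode(fields: list[Optional[bytes]]) -> bytes:
    header = struct.pack("<H", len(fields))           # header: field count
    directory, data = b"", b""
    for f in fields:
        if f is None:
            directory += struct.pack("<HH", 0, NULL)  # null: no bytes in data area
        else:
            directory += struct.pack("<HH", len(data), len(f))
            data += f                                 # variable-length value
    return header + directory + data

def decode(record: bytes) -> list[Optional[bytes]]:
    (count,) = struct.unpack_from("<H", record, 0)
    data_start = 2 + 4 * count                        # data area follows the directory
    fields: list[Optional[bytes]] = []
    for i in range(count):
        offset, length = struct.unpack_from("<HH", record, 2 + 4 * i)
        if length == NULL:
            fields.append(None)
        else:
            start = data_start + offset
            fields.append(record[start:start + length])
    return fields

values = [b"ann", None, b"a longer variable-length value"]
record = encode(values)
# Per-record overhead: everything that is not field data.
overhead = len(record) - sum(len(v) for v in values if v is not None)
```

With this layout the overhead is 2 bytes of header plus 4 bytes per field, which is the cost named as the drawback above.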
Starburst – Data Management Extensions
• Storage methods [associated with a relation]
  – Run-time methods for accessing relations: scan, fetch, insert, update, delete, destroy
  – Implementation: the run-time methods are registered in vector lists
  – Compile-time cost estimates
• Attachments [associated with a relation]
  – Access methods, integrity constraints and trigger extensions
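One way to picture "run-time methods registered in vector lists" is a set of parallel vectors, one per operation, indexed by a storage-method id. The sketch below assumes that reading; the `register` helper, the dictionary-based relation descriptor and the toy heap method are all hypothetical.

```python
from typing import Callable

# One vector per run-time operation, indexed by storage-method id.
INSERT: list[Callable[[dict, tuple], int]] = []
FETCH: list[Callable[[dict, int], tuple]] = []
SCAN_COST: list[Callable[[dict], float]] = []   # compile-time cost estimate

def register(insert, fetch, scan_cost) -> int:
    """Append one storage method's routines and return its id."""
    INSERT.append(insert)
    FETCH.append(fetch)
    SCAN_COST.append(scan_cost)
    return len(INSERT) - 1

# A toy heap storage method.
def heap_insert(rel: dict, row: tuple) -> int:
    rel["rows"].append(row)
    return len(rel["rows"]) - 1                 # record id

def heap_fetch(rel: dict, rid: int) -> tuple:
    return rel["rows"][rid]

def heap_scan_cost(rel: dict) -> float:
    return float(len(rel["rows"]))

HEAP = register(heap_insert, heap_fetch, heap_scan_cost)

# The relation descriptor records which storage method it uses;
# generic code dispatches through the vectors and never names the method.
emp = {"method": HEAP, "rows": []}
rid = INSERT[emp["method"]](emp, ("ann", 100.0))
row = FETCH[emp["method"]](emp, rid)
```

Adding another storage method means one more `register` call; the dispatching code stays untouched.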
Starburst – Data Management Extensions
• Advantages
  – New storage methods and attachments can be added without modifying existing code
• Limitations
  – Attachments are only called after storage methods
  – Attachments are called in a fixed order
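The two limitations can be seen in a small sketch. It assumes a relation descriptor that carries a list of attachments; the attachment names and the dictionary layout are invented for the example.

```python
from typing import Callable

Attachment = Callable[[dict, tuple], None]

def index_attachment(rel: dict, row: tuple) -> None:
    rel["index"][row[0]] = row                 # toy access method keyed on field 0

def non_negative_salary(rel: dict, row: tuple) -> None:
    if row[1] < 0:                             # toy integrity constraint
        raise ValueError("integrity constraint violated")

def insert(rel: dict, row: tuple) -> None:
    rel["rows"].append(row)                    # the storage method runs first
    for attachment in rel["attachments"]:      # then attachments, in a fixed order
        attachment(rel, row)

emp = {
    "rows": [],
    "index": {},
    "attachments": [index_attachment, non_negative_salary],
}
insert(emp, ("ann", 100.0))

stored_before_check = False
try:
    insert(emp, ("bob", -5.0))
except ValueError:
    # The constraint fired only after the storage method had stored the row.
    stored_before_check = ("bob", -5.0) in emp["rows"]
```

In this toy version the rejected row is left behind; a real system would presumably rely on its logging and recovery services to undo the insert.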
Starburst – Query Processing Extensions
Internal representation of queries
– Query graph model
  • Beyond parse trees for the low-level plan operators
  • Used for query rewrite
– Query execution plan
  • Operator based representation
  • Strategy alternative rules (STARs) to represent execution plans
  • Used for query plan generation
Query Graph Model
• Boxes
  – Stored relations
  – Derived relations
• Vertices
  – Setformer iterators: produce tuples for a derived relation
  – Quantifier iterators: restrict tuples for a derived relation
• Edges
  – Range edges connecting a vertex and a box: access to a stored or a derived relation
  – Qualifier edges connecting one or more vertices: conjunction of predicates
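The boxes, vertices and edges listed above map naturally onto a small data structure. The following is a sketch using the slide's terminology only; the class and field names are assumptions, and the SQL query in the comment is an invented example.

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    name: str
    stored: bool                               # True: stored relation, False: derived
    vertices: list["Vertex"] = field(default_factory=list)
    qualifiers: list["Qualifier"] = field(default_factory=list)

@dataclass
class Vertex:
    name: str
    kind: str                                  # "setformer" or "quantifier"
    ranges_over: Box                           # range edge: vertex -> box

@dataclass
class Qualifier:
    predicate: str                             # one conjunct of the WHERE clause
    vertices: list[Vertex]                     # qualifier edge: one or more vertices

# SELECT e.name FROM emp e, dept d WHERE e.dno = d.dno AND d.city = 'Paris'
emp = Box("emp", stored=True)
dept = Box("dept", stored=True)
query = Box("query", stored=False)

e = Vertex("e", "setformer", emp)
d = Vertex("d", "setformer", dept)
query.vertices += [e, d]
query.qualifiers += [
    Qualifier("e.dno = d.dno", [e, d]),        # connects two vertices: a join predicate
    Qualifier("d.city = 'Paris'", [d]),        # connects one vertex: a local predicate
]

join_predicates = [q.predicate for q in query.qualifiers if len(q.vertices) > 1]
```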
Query Rewrite
• Objectives:
  – Equivalent representation for alternative phrasings of a query
  – Only the DBMS can rewrite queries involving views
• Example rules:
  – Views may be merged
  – Redundant joins may be eliminated
  – Selections may be pushed down
Query Rewrite Rules
• A rule transforms a QGM into another QGM
• Condition / action: IF THEN rules
• Rule engine
  – Forward chaining
  – Various control strategies for rule application
• Search strategy
  – Top down (depth first / breadth first) / bottom up
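A forward-chaining engine over IF/THEN rules can be sketched as follows. To stay short, the sketch stands in nested tuples for the QGM and uses a top-down, depth-first search; both are simplifications, and the two rules are toy versions of view merging and selection push-down.

```python
from typing import Optional

Node = tuple  # ("op", *children); leaves look like ("table", name)

def merge_view(n: Node) -> Optional[Node]:
    # IF the node is a view over some input THEN replace it by that input.
    return n[1] if n[0] == "view" else None

def push_select(n: Node) -> Optional[Node]:
    # IF a selection on table t sits above a join whose left input is t
    # THEN push the selection below the join.
    if n[0] == "select" and n[2][0] == "join" and n[2][1] == ("table", n[1]):
        _, t, (_, left, right) = n
        return ("join", ("select", t, left), right)
    return None

RULES = [merge_view, push_select]

def rewrite_once(n: Node) -> tuple[Node, bool]:
    """Apply the first rule that fires, searching top-down, depth first."""
    for rule in RULES:
        out = rule(n)
        if out is not None:
            return out, True
    changed = False
    children = []
    for child in n[1:]:
        if isinstance(child, tuple):
            child, child_changed = rewrite_once(child)
            changed = changed or child_changed
        children.append(child)
    return (n[0], *children), changed

def rewrite(n: Node) -> Node:
    """Forward chaining: keep firing rules until none applies."""
    changed = True
    while changed:
        n, changed = rewrite_once(n)
    return n

query = ("select", "emp", ("view", ("join", ("table", "emp"), ("table", "dept"))))
rewritten = rewrite(query)
```

Here the view is merged first, which is what exposes the join and lets the selection be pushed down on the next pass.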
How to Choose Between Alternative Rules?
• Cost based decision
• Problem: cost estimates are only known at the query execution plan level
• Approach: several alternatives are kept in the QGM
  – CHOOSE operation
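The idea of deferring the decision can be sketched as a node that simply holds the equivalent alternatives until costs are available. The `Choose` class, the labels and the cost figures below are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Choose:
    """Keeps several equivalent QGM fragments (plain labels here)."""
    alternatives: list[str]

def resolve(node: Choose, cost: Callable[[str], float]) -> str:
    """Called once plan-level cost estimates exist: keep the cheapest alternative."""
    return min(node.alternatives, key=cost)

# Query rewrite cannot rank these, so it keeps all of them.
node = Choose(["original", "view merged", "subquery as join"])

# Toy cost estimates that only become known at plan generation.
plan_costs = {"original": 120.0, "view merged": 45.0, "subquery as join": 60.0}
best = resolve(node, plan_costs.__getitem__)
```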
Query Execution Plan
Execution plan represented using production rules:
– Terminals: low-level plan operators
  • In: 0 or more streams of tuples
  • Out: 0 or more streams of tuples
  • Each stream of tuples is tagged with properties
    – Relational: schema information
    – Operational: order, location
    – Estimated:
– Non terminals: STAR
  • Name
  • Alternative definitions in terms of low-level plan operators or other STARs
Query Execution Plan
• A query execution plan is a tree of low-level plan operators
• STAR production rules are used for generating query execution plans
  – General purpose STAR evaluator
  – Search strategy to choose next STAR to apply
  – Vector list of STARs
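Plan generation from STARs behaves like expanding a grammar: non-terminals are replaced by their alternative definitions until only low-level operators remain. The sketch below enumerates every derivable plan and picks the cheapest. The STAR names, operator names and costs are invented, and exhaustive enumeration stands in for whatever search strategy a real evaluator would use.

```python
from itertools import product

# Non-terminals (STARs): name -> alternative definitions. Each alternative is a
# sequence of symbols; a symbol is another STAR or a terminal plan operator.
STARS = {
    "JoinRoot": [["NestedLoop"], ["SortMerge"]],
    "NestedLoop": [["nl_join", "Access", "Access"]],
    "SortMerge": [["merge_join", "Sorted", "Sorted"]],
    "Sorted": [["sort", "Access"], ["index_scan"]],
    "Access": [["table_scan"], ["index_scan"]],
}

# Toy per-operator costs (assumed).
COST = {"table_scan": 10, "index_scan": 4, "sort": 8, "nl_join": 30, "merge_join": 5}

def expand(symbol: str) -> list:
    """Return every plan derivable from a symbol, as nested tuples of operators."""
    if symbol not in STARS:
        return [symbol]                        # terminal: a low-level plan operator
    plans = []
    for alternative in STARS[symbol]:
        for parts in product(*(expand(s) for s in alternative)):
            plans.append(parts[0] if len(parts) == 1 else parts)
    return plans

def cost(plan) -> int:
    """Sum the operator costs over the plan tree."""
    if isinstance(plan, str):
        return COST[plan]
    return sum(cost(p) for p in plan)

plans = expand("JoinRoot")
best = min(plans, key=cost)
```

Each resulting plan is a tree of terminals only, matching the statement that a query execution plan is a tree of low-level plan operators.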
Starburst Contributions
• Revisited internal data structures
  – Query graph model
  – Query execution plan: low-level operators and STARs
• Mechanisms for extensibility
  – Rules for query rewrite and plan optimization
Outline
• Introduction
• Starburst
  – Language extensions
  – Data management extensions
  – Query processing extensions
• Predator
  – E-ADT processing
• Summary
Basic Techniques for ADTs
• Vector List of ADTs
• Each ADT implements:
  – Common internal interface for access to ADT values
  – Functions for storage and indexed retrieval
• Methods associated with ADT
  – ADT methods can be composed
  – DBMS understands minimal semantics about each method
“Black box” ADT Approach
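A black-box ADT table can be sketched as a list of type descriptors whose methods the engine calls without understanding them. Everything here is an assumption made for the example: the `ADT` class, the toy image type stored as a list of pixel values, and the `sharpen` and `clip` methods.

```python
class ADT:
    """Common internal interface that every registered type implements (assumed)."""
    def __init__(self, name, to_bytes, from_bytes):
        self.name = name
        self.to_bytes = to_bytes               # storage function
        self.from_bytes = from_bytes           # retrieval function
        self.methods = {}                      # method name -> callable

ADT_TABLE: list[ADT] = []                      # the vector list of ADTs

def register(adt: ADT) -> int:
    ADT_TABLE.append(adt)
    return len(ADT_TABLE) - 1

# A toy image type: a list of pixel values in the range 0..255.
image = ADT("image", to_bytes=bytes, from_bytes=list)
image.methods["sharpen"] = lambda px: [min(255, 2 * p) for p in px]
image.methods["clip"] = lambda px, n: px[:n]
IMAGE = register(image)

def invoke(type_id: int, value, calls):
    """Evaluate composed methods strictly in the order written.
    The engine knows only the method names, nothing about what they do."""
    adt = ADT_TABLE[type_id]
    for name, *args in calls:
        value = adt.methods[name](value, *args)
    return value

result = invoke(IMAGE, [10, 20, 30], [("sharpen",), ("clip", 2)])
```

Because the engine treats each method as opaque, it sharpens all three pixels before clipping to two; it has no basis for reordering the calls.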
Motivation for E-ADTs
• Basic observation:
  – ADT methods can be expensive!
• Need to identify optimizations on ADT methods
• Need to define a framework for applying these optimizations systematically
Possible Optimizations
• Algorithmic:
  – Using different algorithms for each method depending on data characteristics
• Transformational:
  – Changing the order of methods
• Constraint:
  – Pushing physical constraints through a method
• Pipelining:
  – Avoiding materialization of intermediate results
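As an illustration of the transformational case, the sketch below compares two orders of the same pair of methods on a toy image and counts how many pixels the expensive method touches. The `sharpen` and `clip` functions are stand-ins invented for the example. The reordering is valid here only because this toy `sharpen` works pixel by pixel; a sharpening filter that reads neighbouring pixels would need a padded clip region.

```python
counter = {"pixels": 0}                        # work done by the expensive method

def sharpen(pixels: list[int]) -> list[int]:
    counter["pixels"] += len(pixels)
    return [min(255, 2 * p) for p in pixels]

def clip(pixels: list[int], n: int) -> list[int]:
    return pixels[:n]

image = list(range(100))

# As written by the user: sharpen the whole image, then clip.
counter["pixels"] = 0
naive = clip(sharpen(image), 10)
naive_work = counter["pixels"]

# After changing the order of methods: clip first, then sharpen.
counter["pixels"] = 0
reordered = sharpen(clip(image, 10))
reordered_work = counter["pixels"]
```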
Architectural Framework
Each E-ADT supports some of the following enhancements:
– Optimization: transforms a method expression into a query execution plan expression
– Evaluation: routines to execute the query execution plan expression
– Catalog management: routines to store schema information and maintain statistics
– Storage management: physical representation of values of its type
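The four enhancements can be pictured as hooks on a single type object. The class below is a hypothetical sketch, not Predator's interface: the toy image type, the method names and the single reordering rule are all assumptions.

```python
class ImageEADT:
    """Toy E-ADT for images held as lists of pixel values (0..255)."""

    def __init__(self) -> None:
        # Catalog management: statistics maintained by the type itself.
        self.stats = {"values": 0, "total_pixels": 0}

    # Storage management: physical representation of values of this type.
    def store(self, pixels: list[int]) -> bytes:
        self.stats["values"] += 1
        self.stats["total_pixels"] += len(pixels)
        return bytes(pixels)

    def load(self, raw: bytes) -> list[int]:
        return list(raw)

    # Optimization: method expression -> execution plan expression.
    def optimize(self, expr: list[tuple]) -> list[tuple]:
        plan = list(expr)
        for i in range(len(plan) - 1):
            # Type-specific knowledge: clipping first makes sharpening cheaper.
            if plan[i][0] == "sharpen" and plan[i + 1][0] == "clip":
                plan[i], plan[i + 1] = plan[i + 1], plan[i]
        return plan

    # Evaluation: routines that execute the plan expression.
    def evaluate(self, plan: list[tuple], raw: bytes) -> list[int]:
        pixels = self.load(raw)
        for name, *args in plan:
            pixels = getattr(self, "_" + name)(pixels, *args)
        return pixels

    def _sharpen(self, pixels: list[int]) -> list[int]:
        return [min(255, 2 * p) for p in pixels]

    def _clip(self, pixels: list[int], n: int) -> list[int]:
        return pixels[:n]

eadt = ImageEADT()
raw = eadt.store([10, 20, 30, 40])
plan = eadt.optimize([("sharpen",), ("clip", 2)])
result = eadt.evaluate(plan, raw)
```

The point of the design is that the knowledge needed to reorder the methods lives inside the type, so the rest of the engine can stay generic.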
E-ADT Rewrite Rules
• Some of the optimizations for ADT methods can be applied on a logical representation of queries using rewrite rules
Predator Contributions
• Enhanced abstract data types
  – Encapsulation principle applied to storage, optimization and evaluation
  – Type centric DBMS design