A Variability Model for Query Optimizers
Michael Soffner (1), Norbert Siegmund (1), Marko Rosenmüller (1), Janet Siegmund (1), Thomas Leich (2), Gunter Saake (1)
(1) University of Magdeburg, Germany
(2) METOP GmbH, Germany
Michael Soffner, 09.07.2012
Outline
• Motivation
• Variability Approach
• System Analysis
• Unified Variability Model
Motivation
• Database vendors continuously extend functionality to fit new application domains
• This leads to bloated systems with decreased performance and manageability
• Specialized systems outperform RDBMSs, e.g., for sensor networks and data warehouses (Stonebraker2005)
Driving factors for query optimizer extensions
• SQL conformity to the standard
• New indexes, operations, statistics
Result: Increased search space and reduced performance
Our Approach
• Goal: Specialized query processors by introducing variability
• Select only the needed functionality and omit the rest
• Variability through Software Product Lines (SPLs)
Fig. 1: Benefits of tailored query optimizers
Software Product Lines (SPLs)
Use Features to describe a concept in a domain model
Product Derivation
[Figure: SPL product derivation. Domain Engineering: Feature Model, Reusable Implementation Artifacts. Application Engineering: Configuration, Program Generator, Final Product.]
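The derivation step can be illustrated with a small sketch: a feature model with mandatory, optional, alternative, and requires relations, plus a check that a configuration is valid before a product is generated. The class and function names (`Feature`, `validate`, `derive_product`) and the toy model are assumptions made for illustration; they are not taken from the slides or from any SPL tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    """One node of a feature model (all names here are illustrative)."""
    name: str
    mandatory: bool = False
    alternative: bool = False  # exactly one child must be selected
    requires: list[str] = field(default_factory=list)
    children: list[Feature] = field(default_factory=list)


def validate(root: Feature, selected: set[str]) -> list[str]:
    """Return the rule violations of a configuration (empty list = valid)."""
    errors: list[str] = []
    if root.name not in selected:
        errors.append(f"root feature '{root.name}' must be selected")

    def visit(node: Feature) -> None:
        chosen = [c for c in node.children if c.name in selected]
        if node.name in selected:
            for child in node.children:
                if child.mandatory and child.name not in selected:
                    errors.append(f"'{child.name}' is mandatory under '{node.name}'")
            if node.alternative and len(chosen) != 1:
                errors.append(f"exactly one child of '{node.name}' must be selected")
            for needed in node.requires:
                if needed not in selected:
                    errors.append(f"'{node.name}' requires '{needed}'")
        elif chosen:
            errors.append(f"children of '{node.name}' selected without their parent")
        for child in node.children:
            visit(child)

    visit(root)
    return errors


def derive_product(root: Feature, selected: set[str]) -> list[str]:
    """Stand-in for a program generator: returns the features to build."""
    problems = validate(root, selected)
    if problems:
        raise ValueError("; ".join(problems))
    return sorted(selected)


# Toy model loosely based on feature names that appear in the slides.
model = Feature("QueryOptimizer", mandatory=True, children=[
    Feature("EvaluationAlgorithm", mandatory=True, alternative=True, children=[
        Feature("Recursive"),
        Feature("Genetic"),
    ]),
    Feature("Histogram", requires=["Statistics"]),
    Feature("Statistics"),
])
```

Calling `derive_product(model, {"QueryOptimizer", "EvaluationAlgorithm", "Recursive"})` succeeds, while adding `"Histogram"` without `"Statistics"` raises an error because the requires relation is violated.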
Overall Process
• 3 steps to a unified model: Coarse Model, System Analysis, Unification
[Figure: coarse model of SQLite/Optimizer. Logical: Standardization, Simplification, Amelioration. Physical: Evaluation Algorithm, Operations, Strategy.]
[Figure: SQLite/Optimizer feature diagram from the system analysis; "s" marks statically optional features. Node labels, grouped approximately:
 Logical (Standardization, Simplification, Amelioration): WhereClause Optimization, OrderBy Optimization, Or Optimization, Or-In-Rewriting, Truncate Optimization, Between Optimization, Like Optimization, MIN/MAX Optimization, Subquery Flattening (From-Clause)
 Physical, Evaluation Algorithm: Greedy, Left Deep Tree
 Physical, Operations: Join Nested Loop; AccessPaths (Table, RowID, Index, Full Table Scan, Multi-OR, Index Range, Index Equal)
 Physical, Strategy: Cost-Based Selection, Costs, Cardinality (Analyzed Value, Default Value), Selectivity (Estimated Selectivity, Histogram Based, Default Value), Statistics (Analyze, Histogram, Frequency, No. of Entries (Index))]
[Figure: unified feature model "Query Optimizer". Step captions: Coarse Model = classification by Jarke (1984); System Analysis = Oracle, PostgreSQL, SQLite; Unification = Unified Model. Legend: mandatory, optional, optional (static, marked "s"), alternative, requires. Node labels, grouped approximately:
 Logical Optimizer (Standardization, Simplification, Amelioration): View Merging, Inline set returning functions, Expression Preprocessing, Pull Up Subqueries, Reduce Outer Joins, Predicate Pushing, Rewrite with materialized views, Between Optimization, Truncate Optimization, Like Optimization
 Physical Optimizer, Evaluation Algorithms: RecursiveAlgorithm (Left Deep, Bushy, Right Deep), GeneticAlgorithm (Left Deep)
 Physical Optimizer, Operations: Join (NestedLoop, Hash, Merge); Access Path (Table Scan, Index Scan, Sequential Scan, Full Table Scan, Full Index Scan, Index Unique Scan, TID/RowID-Scan, Index Range Scan, Index Skip Scan, Fast Full Index Scan, Index Joins, Bitmap Index, Sample Table Scan, Hash Scan, Cluster Scan, BitmapHeap Scan, Multi-OR)
 Physical Optimizer, Strategy: Cost-based Selection, Statistic Based, Selectivity (Estimated Selectivity, Default Values, StatisticBased, HistogramBased), Cardinality (Default Values), Cost (CPU Usage, Memory Usage, Disk I/O)
 Statistics: Analyze, No. of Entries, No. of Blocks, No. of Distinct Values, Dynamic Sampling, Histogram (Frequency, Most Common Values, Most Common Frequencies, Height Balanced)]
Optimizer Functionality Classification (Jarke)
• Generally distinguishes logical and physical optimization
Logical
• Standardization: transformation into a standardized representation (e.g., predicate normalization)
• Simplification: elimination of redundancies (e.g., idempotency rules)
• Amelioration: generating semantically equal queries with better performance (e.g., heuristics)
Physical
• Evaluation Algorithm: general algorithm that generates the program for a given query (e.g., recursive search)
• Operations: physical implementations of logical operations (e.g., nested loop join)
• Strategy: concepts to find the best query plan (e.g., cost-based approach)
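As a small illustration of the Simplification class, the sketch below removes redundant operands using the idempotency rules (A AND A = A, A OR A = A). The tuple encoding of predicates and the function name `simplify` are assumptions made for this example; they do not come from the slides or from any of the analyzed systems.

```python
def simplify(expr):
    """Apply idempotency rules: (A AND A) -> A and (A OR A) -> A.

    Expressions are nested tuples ("and" | "or", operand, ...) or plain
    strings for atomic predicates. This encoding is hypothetical.
    """
    if isinstance(expr, str):
        return expr
    op, *operands = expr
    unique = []
    for operand in map(simplify, operands):
        # Flatten nested nodes of the same operator (associativity),
        # then drop operands that were already seen (idempotency).
        same_op = isinstance(operand, tuple) and operand[0] == op
        parts = operand[1:] if same_op else (operand,)
        for part in parts:
            if part not in unique:
                unique.append(part)
    return unique[0] if len(unique) == 1 else (op, *unique)
```

For example, `simplify(("and", "a > 5", "a > 5"))` reduces to the single predicate `"a > 5"`.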
SQLite
• Customizable through #ifdef compiler flags (static configuration)
• All logical optimization features are optional
• Only B-Tree indexes
• Allows statistics to be omitted statically
[Figure: SQLite/Optimizer feature diagram excerpts; "s" marks statically optional features. Logical (Standardization, Simplification, Amelioration): Truncate Optimization, Between Optimization, Like Optimization, Or-In-Rewriting, Subquery Flattening (From-Clause). Strategy: Cost-Based Selection, Selectivity (Estimated Selectivity, Histogram Based, Default Value), Statistics (Analyze, Histogram, Frequency, No. of Entries (Index)). Operations.]
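The static configuration of SQLite can be inspected at run time. The sketch below uses Python's standard `sqlite3` module and SQLite's `PRAGMA compile_options` to check whether the linked library was built with any of the documented `SQLITE_OMIT_*` flags that remove optimizer features. The helper names and the selection of flags are assumptions for illustration; a build may also hide this information entirely, in which case the result is simply empty.

```python
import sqlite3
from contextlib import closing


def sqlite_compile_options() -> set[str]:
    """Return the compile-time options reported by the linked SQLite library."""
    with closing(sqlite3.connect(":memory:")) as conn:
        rows = conn.execute("PRAGMA compile_options").fetchall()
    return {row[0] for row in rows}


# Documented SQLITE_OMIT_* flags that remove optimizer features at build time.
# PRAGMA compile_options reports option names without the "SQLITE_" prefix.
OPTIMIZER_OMIT_FLAGS = [
    "OMIT_BETWEEN_OPTIMIZATION",
    "OMIT_LIKE_OPTIMIZATION",
    "OMIT_OR_OPTIMIZATION",
    "OMIT_TRUNCATE_OPTIMIZATION",
]


def omitted_optimizations() -> list[str]:
    """List the optimizer features that this SQLite build has compiled out."""
    options = sqlite_compile_options()
    return [flag for flag in OPTIMIZER_OMIT_FLAGS if flag in options]


if __name__ == "__main__":
    print("Optimizer features omitted in this build:", omitted_optimizations())
```

On a typical stock build the list is expected to be empty, since these optimizations are enabled unless explicitly compiled out.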
PostgreSQL
• Most logical optimization features aim to standardize the input query
• No features for special heuristics
• Includes inline set-returning functions
• Two evaluation algorithms: exhaustive search, genetic algorithm
• Four index types: B-tree, hash-based and multi-dimension-based indexes (GIS support)
[Figure: PostgreSQL feature diagram excerpts. Logical (Standardization, Simplification): Rewrite Rule System, Pull Up Sublinks, Inline set-returning functions, Expression Preprocessing, Pull Up Subqueries, Reduce Outer Joins. Evaluation Algorithm: Recursive Near Exhaustive Search, Genetic Query Optimizer. Cost-based strategy: Cost (Seq Page Cost, Random Page Cost, CPU Tuple Cost, CPU Operator Cost); Statistics (No. of Table Entries, No. of Index Entries, No. of Blocks per Table, No. of Blocks per Index, Histogram: Frequency, Most Common Values).]
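To show how the cost features listed for PostgreSQL (Seq Page Cost, CPU Tuple Cost, CPU Operator Cost) combine with the statistics (number of blocks and table entries), here is a minimal sketch of a sequential-scan cost estimate. It follows the formula given in the PostgreSQL documentation on EXPLAIN and uses the documented default values of the planner cost constants; the class and function names are assumptions, and the real planner considers many more cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Planner cost constants with PostgreSQL's documented defaults."""
    seq_page_cost: float = 1.0
    random_page_cost: float = 4.0
    cpu_tuple_cost: float = 0.01
    cpu_operator_cost: float = 0.0025


def seq_scan_cost(pages: int, rows: int, filter_ops: int = 0,
                  params: CostParams = CostParams()) -> float:
    """Estimate the total cost of a sequential scan.

    pages      -- number of disk blocks of the table (a statistic)
    rows       -- number of table entries (a statistic)
    filter_ops -- operators evaluated per row in the WHERE clause
    """
    io_cost = pages * params.seq_page_cost
    cpu_cost = rows * (params.cpu_tuple_cost + filter_ops * params.cpu_operator_cost)
    return io_cost + cpu_cost
```

With 358 pages and 10,000 rows, the estimate comes to roughly 458 cost units without a filter and roughly 483 with one filter operator, which matches the worked example in the PostgreSQL documentation.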
Oracle
• Special features: predicate pushing, rewriting with materialized views
• Most access paths of the three systems
• Configuration through hints
[Figure: Oracle feature diagram excerpts. Logical Optimization (Standardization, Amelioration): Subquery Unnesting (From-Clause, Where-Clause), View Merging, Predicate Pushing, Rewrite with Materialized Views. Access Paths: Table Scan, Index Scan, Sequential Scan, Full Index Scan, Full Table Scan, Index Unique Scan, RowID-Scan, Index Range Scan, Index Skip Scan, Fast Full Index Scan, Index Joins, Bitmap Index, Sample Table Scan, Hash Scan, Cluster Scan. Estimation: Selectivity, Cardinality, Cost (No. of Disk I/O, Amount of CPU Usage, Amount of Memory Usage); Statistics (No. of Distinct Values, No. of Rows, Histogram: Height Balanced, Frequency; Dynamic Sampling, Internal Default Values).]
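The estimation features listed for Oracle (statistics, selectivity, cardinality) build on each other. The sketch below shows the textbook relationship under a uniform-distribution assumption and with a frequency histogram. It is not Oracle's actual implementation, and all function names are hypothetical.

```python
def equality_selectivity(num_distinct: int) -> float:
    """Selectivity of `col = const`, assuming uniformly distributed values."""
    if num_distinct <= 0:
        raise ValueError("num_distinct must be positive")
    return 1.0 / num_distinct


def frequency_selectivity(histogram: dict, value) -> float:
    """Selectivity from a frequency histogram mapping value -> row count."""
    total = sum(histogram.values())
    return histogram.get(value, 0) / total if total else 0.0


def cardinality(num_rows: int, selectivity: float) -> float:
    """Estimated number of result rows for a predicate."""
    return num_rows * selectivity
```

For a column with 4 distinct values in a table of 1,000 rows, the uniform estimate is a selectivity of 0.25 and a cardinality of 250 rows; a histogram replaces the uniform assumption with observed frequencies.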
Variability Model: Unification Process
• Goal: System-independent variability model
• Identification of features that implement the same functionality
1. Integration
• A1: Same functionality but different names
• A2: Same names but different functionality
• Only semantic descriptions allow a decision
• Basis: Documentation and source code
• Example: Nested Loop
       | SQLite              | PostgreSQL
Name   | Nested Loop         | Nestpath
Source | Source-code comment | typedef definition and cost calculation algorithm
Variability Model: Unification Process 2
2. Unification
• 1:1 mapping (mapping of one feature onto one unified feature)
• 1:n mapping (composing multiple system-dependent features into one unified feature)
Feature              | SQLite              | PostgreSQL           | Oracle
No. of Table Entries | N/A                 | No. of Table Entries | No. of Rows
Pull Up Subqueries   | Subquery Flattening | Pull Up Subqueries   | Subquery Unnesting
                     | From Clause         | N/A                  | From Clause
                     | N/A                 | N/A                  | Where Clause
                     | N/A                 | Pull Up Sublinks     | N/A
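The two mapping kinds can be sketched as a lookup table from unified features to system-specific feature names. The entries follow one reading of the table on this slide; the data structure and the helper names `mapping_kind` and `unify` are assumptions made for illustration.

```python
from typing import Optional

# Unified feature -> system -> system-specific feature names.
# The contents are one reading of the slide's table, not an official mapping.
UNIFIED_FEATURES: dict[str, dict[str, list[str]]] = {
    "No. of Table Entries": {
        "SQLite": [],
        "PostgreSQL": ["No. of Table Entries"],
        "Oracle": ["No. of Rows"],
    },
    "Pull Up Subqueries": {
        "SQLite": ["Subquery Flattening", "From Clause"],
        "PostgreSQL": ["Pull Up Subqueries", "Pull Up Sublinks"],
        "Oracle": ["Subquery Unnesting", "From Clause", "Where Clause"],
    },
}


def mapping_kind(unified: str, system: str) -> str:
    """Classify a mapping as 'N/A', '1:1', or '1:n'."""
    count = len(UNIFIED_FEATURES[unified].get(system, []))
    if count == 0:
        return "N/A"
    return "1:1" if count == 1 else "1:n"


def unify(system: str, feature: str) -> Optional[str]:
    """Return the unified feature for a system-specific name, if any."""
    for unified, per_system in UNIFIED_FEATURES.items():
        if feature in per_system.get(system, []):
            return unified
    return None
```

For instance, `unify("Oracle", "No. of Rows")` resolves to `"No. of Table Entries"`, which is a 1:1 mapping, while PostgreSQL's two subquery features are composed into `"Pull Up Subqueries"` as a 1:n mapping.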
Variability Model
[Figure: unified variability model of the query optimizer. Node labels, grouped approximately:
 Logical Optimizer (Standardization, Simplification, Amelioration): View Merging, Inline set returning functions, Expression Preprocessing, Pull Up Subqueries, Reduce Outer Joins, Predicate Pushing, Rewrite with materialized views, Between Optimization, Truncate Optimization, Like Optimization
 Evaluation Algorithms: RecursiveAlgorithm (Left Deep, Bushy, Right Deep), GeneticAlgorithm (Left Deep)
 Operations: Join (NestedLoop, Hash, Merge); Access Path (Table Scan, Index Scan, Sequential Scan, Full Table Scan, Full Index Scan, Index Unique Scan, TID/RowID-Scan, Sample Table Scan)
 Strategy: Cost-based Selection, Statistic Based, Selectivity (Estimated Selectivity, Default Values, StatisticBased, HistogramBased), Cardinality (Default Values), Cost (CPU Usage, Memory Usage, Disk I/O)
 Statistics: Analyze, No. of Entries, No. of Blocks, No. of Distinct Values, Dynamic Sampling, Histogram (Frequency, Most Common Values, Most Common Frequencies, Height Balanced)]
Conclusion
• Provides a basis for implementing configurable query optimizers
• Unified semantic description of query optimizer functionality (taxonomy/ontology)
• Provides a foundation for (semi-)automatic configuration of query optimizers based on application requirements
• Provides a basis for modeling dependencies between query optimizers and deeper layers of DBMSs, e.g., the storage engine
References
[Stonebraker2005] M. Stonebraker and U. Cetintemel. One Size Fits All: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering (ICDE), pages 2-11, 2005.
[Jarke84] M. Jarke and J. Koch. Query Optimization in Database Systems. ACM Computing Surveys (CSUR), 16:111-152, June 1984. ACM ID: 356928.