Firebird: Cost-based Optimization and Statistics, by Dmitry Yemanov (in English)

Upload: nataly-polyanskaya

Post on 16-Jun-2015

Category:

Technology


DESCRIPTION

A basic introduction to the internal mechanisms of the Firebird optimizer: how it works, how it decides which index to use, why it sometimes fails, and what you can do to improve performance. This presentation will not answer all of these questions, but it gives you basic knowledge of the Firebird optimizer internals. It is not aimed at all developers and requires some prior expertise.

TRANSCRIPT

Page 1: Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)

Cost-based Optimization and Statistics in Firebird

Dmitry Yemanov

The Firebird Project
http://www.firebirdsql.org

Page 2

Introduction

- The optimizer decides how to find all the required information in the most efficient way it can
- Different queries and/or fetch strategies may benefit from different data access paths
- Some information should exist to help the optimizer guess the best access path
- Optimization strategies
  - Rule-based (heuristics)
  - Cost-based (statistics)

Page 3

Rule-based Optimization

- Heuristic definitions
  - Indexed retrieval is better than a full table scan (and an indexed loop join is better than a merge join)
  - A B-tree has three levels of depth
  - Compound indices are better than simple ones
- Drawbacks
  - Indices could be bad for some operations
  - User intentions are not taken into account
  - Not ready for "ad hoc" queries

Page 4

Cost-based Optimization

- Key points
  - Every operation has an associated cost value
  - Cost value is calculated using statistical data
  - Cost is aggregated from bottom up in the access path
- Drawbacks
  - Complex implementation
  - Slow optimization process
  - Requires up-to-date statistics

Page 5

Basic Terms

- Selectivity
  - Represents a fraction of rows from a row set
  - Lies in the value range 0.0 to 1.0
- Cardinality
  - Represents the number of rows in a row set
  - Base cardinality is the number of rows in a base table
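
The two terms combine naturally: multiplying a cardinality by a selectivity gives the estimated size of the filtered row set. The following is a minimal sketch of that arithmetic; the function name and the sample figures are illustrative and do not come from the slides.

```python
def estimated_cardinality(base_cardinality: float, selectivity: float) -> float:
    """Estimated number of rows left after applying a predicate."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must lie in the range 0.0 to 1.0")
    return base_cardinality * selectivity


# Hypothetical figures: a 10,000-row table and a predicate matching 2% of rows.
rows = estimated_cardinality(10_000, 0.02)
print(f"estimated rows: {rows:.0f}")
```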

Page 6

Understanding of Cost

- Cost
  - Is a function of the estimated cardinalities
  - Represents the computational complexity of the retrieval
- Measurement
  - The cost value depends linearly on the number of logical reads required to perform an operation
  - A logical read is equal to a single page fetch
  - The cost value may also take into account auxiliary steps such as external sorting

Page 7

Cost Measurement (example)

- Full table scan: cost = base cardinality
- Unique index scan: cost = b-tree level + 1
- Range index scan: cost = b-tree level + N + selectivity * base cardinality
  (N represents the number of required leaf page fetches and thus depends on the average key length)
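
The three formulas above can be written down directly. This is only a sketch of the slide's arithmetic, not Firebird's actual implementation; the function names and the sample numbers (table size, tree depth, leaf page count) are assumptions chosen for illustration.

```python
def full_scan_cost(base_cardinality: float) -> float:
    # Full table scan: cost = base cardinality
    return base_cardinality


def unique_index_scan_cost(btree_level: int) -> float:
    # Unique index scan: cost = b-tree level + 1
    return btree_level + 1


def range_index_scan_cost(btree_level: int, leaf_pages: int,
                          selectivity: float, base_cardinality: float) -> float:
    # Range index scan: cost = b-tree level + N + selectivity * base cardinality,
    # where N (leaf_pages) is the number of leaf page fetches.
    return btree_level + leaf_pages + selectivity * base_cardinality


# Hypothetical table: 100,000 rows, a 3-level index, 20 leaf pages touched.
table_rows = 100_000
selective = range_index_scan_cost(3, 20, 0.01, table_rows)
non_selective = range_index_scan_cost(3, 2000, 0.95, table_rows)

print("full scan:          ", full_scan_cost(table_rows))
print("unique index scan:  ", unique_index_scan_cost(3))
print("range scan (1%):    ", selective)
print("range scan (95%):   ", non_selective)
```

With these made-up figures, the 1% range scan is far cheaper than a full scan, while the 95% range scan is not, which is the comparison a cost-based optimizer makes.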

Page 8

Cost Aggregation (example)

Final Row Set (cost = 9000)
  Sort (cost = 9000)
    Filter (cost = 7000)
      Loop Join (cost = 6000)
        Full Scan (cost = 1000)
        Index Scan (cost = 5)

SELECT *
FROM T1 JOIN T2 ON T1.PK = T2.FK
WHERE T1.VAL + T2.VAL < 100
ORDER BY T1.NUM
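
The figures in this example can be reproduced with a small bottom-up calculation. The sketch below rests on assumptions the slide does not state: that the loop join costs the outer scan plus one inner index scan per outer row, that the outer stream yields 1000 rows (consistent with full scan cost = base cardinality), and that the filter and sort steps add 1000 and 2000 respectively (simply the differences between the slide's totals).

```python
def loop_join_cost(outer_cost: float, outer_rows: float, inner_cost: float) -> float:
    # Assumed model: read the outer stream once, probe the inner one per outer row.
    return outer_cost + outer_rows * inner_cost


FULL_SCAN_COST = 1000    # from the slide
INDEX_SCAN_COST = 5      # from the slide
OUTER_ROWS = 1000        # assumption: full scan cost equals base cardinality
FILTER_OWN_COST = 1000   # assumption: 7000 - 6000 on the slide
SORT_OWN_COST = 2000     # assumption: 9000 - 7000 on the slide

# Aggregate from the leaves of the access path up to the final row set.
join_cost = loop_join_cost(FULL_SCAN_COST, OUTER_ROWS, INDEX_SCAN_COST)
filter_cost = join_cost + FILTER_OWN_COST
sort_cost = filter_cost + SORT_OWN_COST
final_cost = sort_cost

for name, cost in [("Loop Join", join_cost), ("Filter", filter_cost),
                   ("Sort", sort_cost), ("Final Row Set", final_cost)]:
    print(f"{name:14} cost = {cost}")
```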

Page 9

Statistics

- Information describing data amounts and distribution of values on different levels (table, index, column)
- Stored in a database or estimated at runtime
- Collected by request or automatically

Page 10

Core Statistics

- Number of Rows in a Table (Base Cardinality)
  - Small tables: number of used record slots on the data pages
  - Large tables: number of used data pages / average record length
  - Estimated at runtime via scanning pointer or data pages
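
One plausible reading of the large-table estimate is that the page count is first converted into bytes and then divided by the average record length. The sketch below follows that reading; the page size, the fill factor parameter, and the function name are assumptions, and the real engine may account for page headers and record overhead differently.

```python
def estimate_base_cardinality(data_pages: int, avg_record_length: float,
                              page_size: int = 8192, fill_factor: float = 1.0) -> float:
    """Rough row count derived from page usage (illustrative only)."""
    if avg_record_length <= 0:
        raise ValueError("average record length must be positive")
    # Bytes assumed to hold records across all used data pages.
    usable_bytes = data_pages * page_size * fill_factor
    return usable_bytes / avg_record_length


# Hypothetical table: 1,000 data pages of 8 KB, 64-byte records on average.
print(estimate_base_cardinality(1000, 64))
```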

Page 11

Core Statistics (continued)

- Index Selectivity
  - 1 / number of distinct keys in the index
  - Maintained per segment: (A), (A, B), (A, B, C)
  - Assumes uniform distribution of values
  - Calculated during index creation or upon request (SET STATISTICS statement)
  - Stored on the index root page
  - Visible in RDB$INDICES and RDB$INDEX_SEGMENTS
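
To make the per-segment idea concrete, the sketch below computes 1 / (number of distinct keys) for each leading prefix of a compound key. It works on an in-memory list of key tuples, which is an assumption made purely for illustration; the engine derives these values from the index itself.

```python
def segment_selectivities(keys: list[tuple], segment_count: int) -> list[float]:
    """Selectivity for each key prefix: (A), (A, B), (A, B, C), ..."""
    if not keys:
        raise ValueError("at least one key is required")
    result = []
    for n in range(1, segment_count + 1):
        # Count distinct values of the first n segments.
        distinct = len({key[:n] for key in keys})
        result.append(1.0 / distinct)
    return result


# Hypothetical compound index on (A, B, C) with four entries.
index_keys = [(1, "x", 10), (1, "x", 20), (1, "y", 10), (2, "x", 10)]
print(segment_selectivities(index_keys, 3))
```

Each additional segment can only keep or increase the number of distinct keys, so the selectivity value never grows as the prefix gets longer.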

Page 12

Decisions Based on Core Statistics

- Full Table Scan over Indexed Retrieval
  - Selectivity close to 1.0 suggests a full scan
- What Indices to Use
  - Compare index selectivities and index scan costs
  - Consider segment operations for compound indices
  - Calculate selectivities for AND and OR operations
- Order of Streams in Loop Joins
  - Calculate costs for different join orders and choose the best one
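
Two of these decisions lend themselves to a short sketch. The AND/OR formulas below are the textbook ones and assume the predicates are independent; the slides do not say which formulas Firebird uses. The join-order part enumerates every permutation and costs it with a simple loop-join model; the `Stream` fields and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


def and_selectivity(s1: float, s2: float) -> float:
    return s1 * s2                 # both predicates must hold


def or_selectivity(s1: float, s2: float) -> float:
    return s1 + s2 - s1 * s2       # either predicate may hold


@dataclass(frozen=True)
class Stream:
    name: str
    cardinality: float      # rows produced when read as the outer stream
    scan_cost: float        # cost of reading it as the outer stream
    lookup_cost: float      # cost of one indexed probe as an inner stream
    rows_per_lookup: float  # rows matched by one probe


def join_order_cost(order: tuple) -> float:
    first, *rest = order
    cost, rows = first.scan_cost, first.cardinality
    for stream in rest:
        cost += rows * stream.lookup_cost   # one probe per row produced so far
        rows *= stream.rows_per_lookup
    return cost


def best_join_order(streams: list) -> tuple:
    # Exhaustive search; fine for a handful of streams.
    return min(permutations(streams), key=join_order_cost)


t1 = Stream("T1", cardinality=1000, scan_cost=1000, lookup_cost=4, rows_per_lookup=1)
t2 = Stream("T2", cardinality=50000, scan_cost=50000, lookup_cost=5, rows_per_lookup=50)

best = best_join_order([t1, t2])
print([s.name for s in best], join_order_cost(best))
```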

Page 13

Advanced Statistics

- Table level
  - Average page fill factor
  - Average row length
    (both help with a better base cardinality estimation)
  - Number of rows
    (makes it possible to avoid the runtime page scan)

Page 14

Advanced Statistics (continued)

- Index level
  - B-tree depth
  - Average key length
    (both help with a better cost estimation for index scans)
  - Clustering factor
    (makes it possible to prefer index navigation over an external sort under some conditions; could also be used to avoid filling the sparse bitmap)

Page 15

Clustering Factor

[Figure: index keys 1 to 5 mapped to data pages. Bad clustering factor: the keys point to scattered data pages (12, 25, 28, 57, 44). Good clustering factor: the keys point to adjacent data pages (12, 13, 14).]
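
A common way to quantify this idea is to walk the index in key order and count how often the referenced data page changes. The sketch below uses that definition as an assumption; the slides do not spell out how Firebird would compute the value. The "bad" page list comes from the figure, while the assignment of five keys to pages 12 to 14 in the "good" case is hypothetical.

```python
def clustering_factor(pages_in_key_order: list[int]) -> int:
    """Number of data page switches while reading rows in index key order."""
    switches = 0
    previous = None
    for page in pages_in_key_order:
        if page != previous:
            switches += 1
            previous = page
    return switches


bad = [12, 25, 28, 57, 44]    # every key lands on a different page
good = [12, 12, 13, 13, 14]   # hypothetical: neighbouring keys share pages

print("bad: ", clustering_factor(bad))
print("good:", clustering_factor(good))
```

A lower count means that navigating the index touches fewer distinct pages in sequence, which is what makes index navigation competitive with an external sort.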

Page 16

Advanced Statistics (continued)

- Column level
  - Selectivity
    (core feature, required to estimate costs)
  - Number of NULLs
    (useful for selectivity estimations for IS [NOT] NULL)
  - Value distribution histogram
    (allows selectivity estimations for non-uniform value distributions)
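
The difference between these statistics shows up on a skewed column. The sketch below compares the uniform estimate (1 / distinct values) with a per-value frequency estimate and a NULL-count estimate. The column contents are invented for illustration, and a plain frequency count stands in for whatever histogram layout a real implementation would use.

```python
from collections import Counter


def uniform_selectivity(values: list) -> float:
    # Assumes every distinct non-NULL value is equally frequent.
    distinct = {v for v in values if v is not None}
    return 1.0 / len(distinct)


def histogram_selectivity(values: list, key) -> float:
    # Uses the observed frequency of one specific value.
    return Counter(values)[key] / len(values)


def null_selectivity(values: list) -> float:
    # Fraction of rows matched by IS NULL.
    return sum(v is None for v in values) / len(values)


# Hypothetical 500-row column dominated by the value 'A'.
column = ["A"] * 400 + ["B"] * 50 + ["C"] * 30 + ["D"] * 10 + [None] * 10

print("uniform estimate:  ", uniform_selectivity(column))
print("histogram for 'A': ", histogram_selectivity(column, "A"))
print("histogram for 'D': ", histogram_selectivity(column, "D"))
print("IS NULL:           ", null_selectivity(column))
```

Under the uniform assumption both 'A' and 'D' get the same estimate, although one matches most of the table and the other almost none of it.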

Page 17

Sample Histograms

[Figure: two sample histograms. 1. Non-Selective Column: bars for the values 'A', 'B', 'C', 'D' on a row-count axis from 0 to 500. 2. Selective Column: value labels 1, 5, 5, 5, 10, 20, 50, 50, 80, 100.]

Page 18

Decisions Based on Advanced Statistics

- Sort Aggregation vs Hash Aggregation
  - Selectivity of columns being grouped by
- Loop Join vs Merge Join vs Hash Join
  - Cardinality of tables and filtering predicates
- Index Usage
  - Number of NULLs or histogram
- Index Navigation vs External Sorting
  - Clustering factor

Page 19

The Firebird Project
www.firebirdsql.org