05/01/04, Spring 2004, CSE8330 Presentation
Statistics Profile for Query Optimization
WENYI NI
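Introduction
What is a statistics profile?
• Every database object has its own status (its size, value range, data distribution, and so on).
• To know that status, we need statistics.
• The relation between a statistics profile and statistics: a profile is the set of statistics kept for one object.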
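When does the DBMS use a statistics profile?
[Figure: cost model, from M. Tamer Özsu]
The query optimizer feeds the statistics into its cost model to estimate the cost of alternative execution plans.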
What does a statistics profile collect?
• The central tendency of the data
• The range of the data
• The size of the data
• The distribution of the data
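As a rough illustration, the four kinds of statistics listed above can be computed for a single column in Python. This is a minimal sketch, not how any particular DBMS stores a profile; the function name `column_statistics` and the score values are hypothetical.

```python
import statistics
from collections import Counter


def column_statistics(values):
    """Compute the four kinds of statistics a profile collects for one column.

    `values` is assumed to be a non-empty list of numbers (hypothetical input).
    """
    return {
        # Central tendency of the data
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Range of the data
        "min": min(values),
        "max": max(values),
        # Size of the data (number of values)
        "count": len(values),
        # Distribution of the data: how often each value occurs
        "frequencies": Counter(values),
    }


# Hypothetical score column
scores = [55, 62, 62, 71, 78, 78, 78, 85, 90, 100]
profile = column_statistics(scores)
print(profile["mean"], profile["median"], profile["min"], profile["max"])
```

A real profile would usually keep a compact summary such as a histogram instead of a full frequency table; the `Counter` is only there to make "distribution" concrete.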
Common types of statistics profiles
• Table profile
• Attribute profile
• Index profile
Typical profiles
Table profile
  Cardinality: 500
  Row size: 30
  Pages: 100
  Number of attributes: 6
Attribute profile
  Values: 100
  Max value: 100
  Min value: 0
  Size: 5
  Data distribution: skewed
Index profile
  Pages: 50
  Size: 5
  Distinct values: 50
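One way to picture these profiles is as plain records. The sketch below is a hypothetical Python layout (class and field names are assumptions, and units are left unspecified, as on the slide), filled with the example values above. It also derives one extra number from the table profile, the average number of rows per page.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    cardinality: int      # number of rows
    row_size: int         # size of one row (unit not given on the slide)
    pages: int            # pages occupied by the table
    num_attributes: int   # number of columns

    @property
    def rows_per_page(self) -> float:
        # Derived statistic: average number of rows stored on one page
        return self.cardinality / self.pages


@dataclass
class AttributeProfile:
    values: int
    max_value: int
    min_value: int
    size: int
    distribution: str     # e.g. "skewed"


@dataclass
class IndexProfile:
    pages: int
    size: int
    distinct_values: int


table = TableProfile(cardinality=500, row_size=30, pages=100, num_attributes=6)
attribute = AttributeProfile(values=100, max_value=100, min_value=0, size=5,
                             distribution="skewed")
index = IndexProfile(pages=50, size=5, distinct_values=50)
print(table.rows_per_page)
```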
Three ways to collect statistics
• Exhaustive accumulation
• Sampling
• Piggyback
Exhaustive accumulation
Compute every statistical descriptor by scanning the related object exhaustively.
Advantage: most accurate
Disadvantage: heavy system load
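A minimal sketch of exhaustive accumulation, assuming a table is given as a list of pages and each page as a list of row dictionaries (a hypothetical in-memory stand-in for disk pages). Every page is visited once, so the statistics are exact, but the whole table has to be read.

```python
def exhaustive_profile(pages):
    """Scan every row on every page once and compute exact statistics.

    `pages` is assumed to be a list of pages, each a list of row dicts.
    """
    cardinality = 0
    col_min, col_max, col_distinct = {}, {}, {}
    for page in pages:            # every page is read: full I/O cost
        for row in page:
            cardinality += 1
            for col, val in row.items():
                col_min[col] = min(col_min.get(col, val), val)
                col_max[col] = max(col_max.get(col, val), val)
                col_distinct.setdefault(col, set()).add(val)
    return {
        "cardinality": cardinality,
        "pages": len(pages),
        "min": col_min,
        "max": col_max,
        # exact distinct counts need every value, another source of load
        "distinct": {col: len(vals) for col, vals in col_distinct.items()},
    }


# Hypothetical two-page student table
student_pages = [
    [{"age": 20, "score": 65}, {"age": 21, "score": 80}],
    [{"age": 20, "score": 45}],
]
result = exhaustive_profile(student_pages)
print(result["cardinality"], result["distinct"])
```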
Sampling
Scan only part of the related object and estimate the statistics from the sample data.
Advantage: low system overhead
Disadvantage: still some overhead, and the statistics are not 100% accurate
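A minimal sampling sketch, assuming the rows fit in a Python list and a simple random sample is drawn with `random.Random.sample`. The function name `sample_profile` and its parameters are hypothetical; the estimated row count is the matching fraction in the sample scaled up to the table's cardinality.

```python
import random


def sample_profile(rows, column, sample_size, predicate, seed=0):
    """Estimate statistics for `column` from a simple random sample of rows."""
    rng = random.Random(seed)
    sample = rng.sample(rows, sample_size)   # read only part of the table
    values = [row[column] for row in sample]
    hits = sum(1 for v in values if predicate(v))
    return {
        "est_min": min(values),              # may miss the true extremes
        "est_max": max(values),
        "est_mean": sum(values) / len(values),
        # matching fraction in the sample, scaled to the whole table
        "est_matching_rows": hits * len(rows) / sample_size,
    }


# Hypothetical table of 1000 scores between 0 and 100
rows = [{"score": i % 101} for i in range(1000)]
estimate = sample_profile(rows, "score", 100, lambda s: s > 60)
print(estimate)
```

With a sample of only 100 rows the estimates carry sampling error, which is the accuracy cost noted above; sampling every row reproduces the exact answer.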
Piggyback
Collect statistics from data that query processing has already brought into memory, slightly changing the SQL statement to make full use of this data.
Types of piggyback:
1. Vertical piggyback
2. Horizontal piggyback
3. Mixed piggyback
Vertical piggyback
Include extra columns during query processing.
Example:
Select student.name from student;
is rewritten to:
Select student.name, student.age from student;
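The effect of the vertical rewrite can be simulated in Python: the extra column `age` travels with the rows the query already reads, and its values feed a statistics collector while the user only sees `name`. This is a sketch under the assumption that rows are dictionaries in memory; `run_with_vertical_piggyback` is a hypothetical name, not a DBMS API.

```python
def run_with_vertical_piggyback(rows, requested_cols, extra_cols, stats):
    """Answer a projection query and piggyback statistics on extra columns.

    The rows are already in memory for the query, so reading the extra
    columns costs CPU time but no additional I/O.
    """
    result = []
    for row in rows:
        # The user only sees the columns the original query asked for...
        result.append({c: row[c] for c in requested_cols})
        # ...while the extra columns feed the statistics collector.
        for c in extra_cols:
            s = stats.setdefault(c, {"count": 0, "min": row[c], "max": row[c]})
            s["count"] += 1
            s["min"] = min(s["min"], row[c])
            s["max"] = max(s["max"], row[c])
    return result


# Hypothetical student rows
student = [{"name": "Ann", "age": 20}, {"name": "Bob", "age": 23}]
stats = {}
names = run_with_vertical_piggyback(student, ["name"], ["age"], stats)
print(names, stats)
```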
No extra I/O, but extra CPU load. Solution: set a piggyback level:
1. AC1 = { x | x is a column in table Ri referenced by query Q }
2. AC2 = { x | x is an index column in table Ri } - AC1
3. AC3 = { x | x is a column in table Ri and x is part of the primary key or a foreign key, or is referenced by a foreign key } - AC2
4. AC4 = { x | x is a column in table Ri } - AC3
Advantage: the piggyback level can be chosen according to the current CPU load.
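The levels can be computed with set operations. This sketch makes two assumptions not stated on the slide: each ACk excludes every column already placed in a lower level (so the sets are disjoint), and level k collects AC1 through ACk. Under the second assumption the union for each level is the same whichever subtraction is used. The function name and the example schema are hypothetical.

```python
def piggyback_columns(level, all_cols, query_cols, index_cols, key_cols):
    """Return the columns of table Ri collected at piggyback level 1-4.

    Assumes levels are cumulative: level k collects AC1 through ACk.
    """
    if not 1 <= level <= 4:
        raise ValueError("piggyback level must be between 1 and 4")
    ac1 = set(query_cols)                      # referenced by query Q
    ac2 = set(index_cols) - ac1                # index columns
    ac3 = set(key_cols) - ac1 - ac2            # primary/foreign key columns
    ac4 = set(all_cols) - ac1 - ac2 - ac3      # everything else
    chosen = set()
    for ac in [ac1, ac2, ac3, ac4][:level]:
        chosen |= ac
    return chosen


# Hypothetical student table
schema = dict(
    all_cols=["pid", "name", "age", "score", "dept_id", "email"],
    query_cols=["name"],
    index_cols=["pid", "score"],
    key_cols=["pid", "dept_id"],
)
for level in range(1, 5):
    print(level, sorted(piggyback_columns(level, **schema)))
```

A higher level reads more columns per row and so costs more CPU, which is why the level would be picked from the current CPU load.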
Horizontal piggyback
Include extra rows during query processing.
Example:
Select student.name, student.score from student where score > 60;
is rewritten to:
Select student.name, student.score from student where score > 60 or
student.pid in (Select student.pid from student where score > 60);
Advantage
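A rough Python picture of horizontal piggyback, swapping in a simpler mechanism than the slide's subquery: the scan uses a relaxed predicate (`extra_pred`, hypothetical) so that it fetches extra rows, statistics are computed over every fetched row, and only rows that satisfy the user's original predicate are returned. The function name and data are assumptions for illustration.

```python
def run_with_horizontal_piggyback(rows, user_pred, extra_pred, column):
    """Fetch rows matching user_pred OR extra_pred, collect statistics on
    every fetched row, and return only the rows the user asked for."""
    fetched = [r for r in rows if user_pred(r) or extra_pred(r)]
    values = [r[column] for r in fetched]
    stats = {
        "count": len(values),
        "min": min(values, default=None),
        "max": max(values, default=None),
    }
    answer = [r for r in fetched if user_pred(r)]   # extra rows filtered out
    return answer, stats


# Hypothetical student rows; the relaxed predicate is score > 50
rows = [{"name": n, "score": s}
        for n, s in [("Ann", 45), ("Bob", 65), ("Cy", 90), ("Di", 55)]]
answer, stats = run_with_horizontal_piggyback(
    rows, lambda r: r["score"] > 60, lambda r: r["score"] > 50, "score")
print(answer, stats)
```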
Mixed piggyback
Use both vertical and horizontal piggyback method
Advantage
Value distribution
Why do we need it?
Example: Select * from Student where score > 60;
What is the size of the result?
Attribute profile: score
  Max: 100
  Min: 0
  Size: 10
  Values: 101
  Distribution table:
    0~9: 1%
    10~19: 1%
    20~29: 1%
    30~39: 3%
    40~49: 6%
    50~59: 10%
    60~69: 10%
    70~79: 31%
    80~89: 30%
    90~100: 10%
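Answer:
Size = 500 * 0.81 * 30 = 12150
where 500 is the cardinality of the student table, 30 is the size of each record, and 0.81 is the fraction of scores above 60 read from the distribution table (10% + 31% + 30% + 10%, counting the whole 60~69 bucket). The query is thus estimated to return 405 rows.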
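The arithmetic above can be checked with a short Python sketch that reads the selectivity from the distribution table. Like the slide, it treats every bucket whose lower bound is at least 60 as fully selected; the function name and the tuple layout of the table are assumptions.

```python
# Distribution table for score as (low, high, percent) buckets
DISTRIBUTION = [
    (0, 9, 1), (10, 19, 1), (20, 29, 1), (30, 39, 3), (40, 49, 6),
    (50, 59, 10), (60, 69, 10), (70, 79, 31), (80, 89, 30), (90, 100, 10),
]


def estimate_result_size(cardinality, row_size, distribution, threshold):
    """Estimate (percent, rows, size) for `WHERE score > threshold`.

    Every bucket whose lower bound is >= threshold counts as fully selected.
    """
    percent = sum(p for low, high, p in distribution if low >= threshold)
    rows = cardinality * percent / 100
    return percent, rows, rows * row_size


percent, rows, size = estimate_result_size(500, 30, DISTRIBUTION, 60)
print(percent, rows, size)
```

Counting the 60~69 bucket in full slightly overestimates, since a score of exactly 60 does not satisfy `score > 60`; interpolating within the bucket would refine that.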
How do we get the distribution table?
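Histograms:
1. Equal width
2. Equal height
[Figure: equal-width histogram of scores; x-axis: Score, buckets at 10, 20, 30, 40, 50, 60, 70, 80, 90, 100; y-axis: Percentage]
[Figure: equal-height histogram of scores; x-axis: Score, bucket boundaries at 45, 56, 63, 68, 73, 76, 78, 85, 90, 100; y-axis: Percentage]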
Bucket number
B = 1 + log2(n) [Sturges' rule, 1926]
Example: student table (500 records): 1 + log2(500) ≈ 9.97, so B = 10.
For equal width, put each value into the bucket whose range contains it.
For equal height, sort the values; if the sample size is m, set the bucket height k = m / B and fill the buckets in order, k values per bucket.
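The bucket count and both histogram types can be sketched in Python. This assumes the logarithm in Sturges' rule is base 2 (which gives 10 buckets for 500 records) and that the result is rounded up; the function names are hypothetical, and the equal-height version returns the upper boundary of each bucket.

```python
import math


def sturges_buckets(n):
    """Number of buckets by Sturges' rule, 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


def equal_width(values, num_buckets, lo, hi):
    """Count values per bucket; every bucket spans the same value range.

    Assumes all values lie in [lo, hi]; the value hi goes in the last bucket.
    """
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        i = min(int((v - lo) / width), num_buckets - 1)
        counts[i] += 1
    return counts


def equal_height(values, num_buckets):
    """Sort the sample and cut it into buckets of about k = m / num_buckets
    values each; return the upper boundary value of every bucket."""
    ordered = sorted(values)
    m = len(ordered)
    k = m / num_buckets
    # index of the last value in bucket b, clamped to the end of the list
    return [ordered[min(math.ceil((b + 1) * k) - 1, m - 1)]
            for b in range(num_buckets)]


print(sturges_buckets(500))
print(equal_width([0, 5, 15, 100], 10, 0, 100))
print(equal_height(list(range(1, 21)), 4))
```

Equal-width buckets are cheap to maintain, while equal-height buckets put more boundaries where values are dense, which helps with skewed data such as the score distribution above.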
Sampling
How many samples do we need? A sample size of 1064 gives an error rate below 10% with 99% probability (Mannino 1988).
To keep the same error rate across tables of different sizes, the sampling rate drops as the table grows.
Drop rate: log(n)/n
Example: 20 samples give a 2% error rate on a table with 100 records, a sampling rate of 0.2. To reach a 2% error rate on a table with 1000 records, we need 1000*0.2*(1-log(1000)/1000) samples.
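The slide does not say which logarithm the drop-rate formula uses, so this Python sketch assumes base 10 by default and lets the caller pass another. `samples_needed` and `base_rate` (the 20/100 rate from the example) are hypothetical names; the code only evaluates the slide's formula and does not validate it.

```python
import math


def samples_needed(n, base_rate=20 / 100, log=math.log10):
    """Evaluate the slide's formula n * base_rate * (1 - log(n) / n).

    The log base is an assumption: base 10 by default.
    """
    return n * base_rate * (1 - log(n) / n)


print(samples_needed(1000))                  # assuming a base-10 log
print(samples_needed(1000, log=math.log))    # assuming a natural log
```

Either way the result is just under 200 samples for 1000 records, so the sampling rate falls slightly below 0.2.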
Summary & Future work
• Low overhead
• Low error rate, though there is still room to improve
• Estimating the size of projection and join operations from statistics still needs to be improved.
The end