Statistics Profile for Query Optimization

05/01/04, Spring 2004, CSE8330 Presentation

WENYI NI

Introduction

What is a statistics profile?

•Every object has its own status.

•To know an object's status, we need statistics.

•A statistics profile is the collection of statistics that describes an object's status.

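When does a DBMS use a statistics profile?

During query optimization, the cost model uses the profile to estimate the cost of alternative execution plans.

[Figure: Cost Model, from M. Tamer Özsu]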
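The cost model is where the profile pays off. The following minimal Python sketch compares two access paths using numbers like those in a statistics profile. The cost formulas are simplified textbook assumptions, not the model in the cited figure. A full scan is assumed to read every table page. An unclustered index scan is assumed to read the matching fraction of the index plus one data page per qualifying row. `full_scan_cost`, `index_scan_cost`, and `choose_plan` are hypothetical names.

```python
# Hypothetical, simplified I/O cost model driven by statistics-profile values.
# Assumption: cost is counted in page reads only; CPU cost is ignored.


def full_scan_cost(table_pages: int) -> float:
    """A full table scan reads every page of the table once."""
    return float(table_pages)


def index_scan_cost(index_pages: int, cardinality: int, selectivity: float) -> float:
    """Unclustered index scan (assumption): read the matching fraction of the
    index, then fetch one data page per qualifying row."""
    return index_pages * selectivity + cardinality * selectivity


def choose_plan(table_pages: int, index_pages: int, cardinality: int,
                selectivity: float) -> str:
    """Return the access path with the lower estimated cost."""
    scan = full_scan_cost(table_pages)
    index = index_scan_cost(index_pages, cardinality, selectivity)
    return "index scan" if index < scan else "full table scan"


# Profile values from the "Typical profiles" slide:
# table pages = 100, index pages = 50, cardinality = 500.
print(choose_plan(100, 50, 500, selectivity=0.01))  # index scan
print(choose_plan(100, 50, 500, selectivity=0.81))  # full table scan
```

The index path's cost grows with selectivity. The selectivity estimate taken from the profile therefore decides which plan wins.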

What does a statistics profile collect?

•The central tendency of the data

•The range of the data

•The size of the data

•The distribution of the data
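Here is a minimal sketch of how each kind of statistic above could be computed for one column, assuming the column is available as a Python list. The function `describe_column` and its output keys are hypothetical. Mean and median stand in for central tendency, and a value-frequency `Counter` stands in for the distribution.

```python
# Sketch: the four kinds of statistics from the slide, for one column.
from collections import Counter
from statistics import mean, median


def describe_column(values: list[float]) -> dict:
    """Summarize a column: central tendency, range, size, and distribution."""
    if not values:
        raise ValueError("cannot profile an empty column")
    return {
        "mean": mean(values),            # central tendency
        "median": median(values),        # central tendency, robust to skew
        "min": min(values),              # range (lower end)
        "max": max(values),              # range (upper end)
        "count": len(values),            # size (number of values)
        "distinct": len(set(values)),    # size (number of distinct values)
        "frequencies": Counter(values),  # distribution (value -> occurrences)
    }


scores = [45, 56, 63, 68, 73, 76, 78, 78, 85, 90]  # hypothetical sample column
profile = describe_column(scores)
print(profile["mean"], profile["median"], profile["min"], profile["max"])
```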

Common types of statistics profiles

•Table profile

•Attribute profile

•Index profile

Typical profiles

Table profile
  Cardinality: 500
  Row size: 30
  Pages: 100
  Number of attributes: 6

Attribute profile
  Values: 100
  Max value: 100
  Min value: 0
  Size: 5
  Data distribution: skew

Index profile
  Pages: 50
  Size: 5
  Distinct values: 50
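The three profile types can be modeled as simple records. This sketch uses Python dataclasses with hypothetical field names and the example values from the slide. `rows_per_page` is an illustrative derived value and is not part of the original profile.

```python
# Sketch of the three profile types as plain data records.
from dataclasses import dataclass


@dataclass
class TableProfile:
    cardinality: int     # number of rows
    row_size: int        # size of one row (units not given on the slide)
    pages: int           # number of pages the table occupies
    num_attributes: int  # number of columns

    def rows_per_page(self) -> float:
        """Average rows per page, derived from cardinality and pages."""
        return self.cardinality / self.pages


@dataclass
class AttributeProfile:
    values: int
    max_value: float
    min_value: float
    size: int
    distribution: str    # e.g. "uniform" or "skew"


@dataclass
class IndexProfile:
    pages: int
    size: int
    distinct_values: int


table = TableProfile(cardinality=500, row_size=30, pages=100, num_attributes=6)
attribute = AttributeProfile(values=100, max_value=100, min_value=0, size=5,
                             distribution="skew")
index = IndexProfile(pages=50, size=5, distinct_values=50)
print(table.rows_per_page())  # 5.0
```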

Three ways to collect statistics

•Exhaustive accumulation

•Sampling

•Piggyback

Exhaustive accumulation

Compute every statistic by scanning the entire related object.

Advantage: most accurate

Disadvantage: heavy system load
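Here is a sketch of exhaustive accumulation. It assumes the table fits in memory as a list of dicts; a real DBMS would stream pages from disk. Every row is visited once, which is why the result is exact and also why the load is heavy. `exhaustive_profile` is a hypothetical name.

```python
# Sketch of exhaustive accumulation: one full pass over every row.


def exhaustive_profile(rows: list[dict]) -> dict:
    """Scan all rows once and return exact table and column statistics."""
    columns: dict[str, dict] = {}
    for row in rows:  # every row is visited: accurate but costly on large tables
        for name, value in row.items():
            stats = columns.setdefault(
                name, {"min": value, "max": value, "distinct": set()})
            stats["min"] = min(stats["min"], value)
            stats["max"] = max(stats["max"], value)
            stats["distinct"].add(value)
    return {
        "cardinality": len(rows),
        "columns": {
            name: {"min": s["min"], "max": s["max"], "distinct": len(s["distinct"])}
            for name, s in columns.items()
        },
    }


student = [  # hypothetical rows
    {"pid": 1, "age": 20, "score": 72},
    {"pid": 2, "age": 21, "score": 58},
    {"pid": 3, "age": 20, "score": 91},
]
print(exhaustive_profile(student))
```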

Sampling

Scan only part of the related object and estimate the statistics from the sample data.

Advantage: low system overhead

Disadvantage: there is still some overhead, and the statistics are not 100% accurate.
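Here is a minimal sketch of sampling, assuming simple random sampling without replacement; the slide does not name a scheme. It counts how many sampled rows satisfy a predicate and scales that up to the table size. `estimate_matching_rows` and the score data are hypothetical.

```python
# Sketch of sampling: read only part of the table and scale up.
import random
from typing import Callable


def estimate_matching_rows(rows: list, predicate: Callable[[object], bool],
                           sample_size: int, seed: int = 0) -> float:
    """Scale the matching fraction observed in the sample up to the full table."""
    rng = random.Random(seed)               # seeded so results are repeatable
    sample = rng.sample(rows, sample_size)  # read only part of the object
    hits = sum(1 for row in sample if predicate(row))
    return hits * len(rows) / sample_size


scores = [i % 101 for i in range(5000)]     # hypothetical score column
estimate = estimate_matching_rows(scores, lambda s: s > 60, sample_size=500)
exact = sum(1 for s in scores if s > 60)
print(round(estimate), exact)
```

Raising `sample_size` lowers the error but increases the overhead. Sampling every row reproduces the exact count.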

Piggyback

Collect statistics from data that is already in memory during query processing. Slightly modify the SQL statement to make full use of this data.

Types of piggyback:

1. Vertical piggyback

2. Horizontal piggyback

3. Mixed piggyback

Vertical piggyback

Include extra columns during query processing.

Example:

Select student.name from student;

Rewrite to:

Select student.name, student.age from student;
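Here is a sketch of what the vertical rewrite buys. The same rows are read, so there is no extra I/O. The extra column is profiled in memory, which costs extra CPU. The user still sees only the requested columns. The row format and `run_with_vertical_piggyback` are hypothetical.

```python
# Sketch of vertical piggyback: profile an extra column while answering the query.


def run_with_vertical_piggyback(rows, user_columns, extra_columns):
    result = []
    stats = {c: {"count": 0, "min": None, "max": None} for c in extra_columns}
    for row in rows:  # same rows, same I/O as the original query
        result.append({c: row[c] for c in user_columns})
        for c in extra_columns:  # extra CPU work only
            v = row[c]
            s = stats[c]
            s["count"] += 1
            s["min"] = v if s["min"] is None else min(s["min"], v)
            s["max"] = v if s["max"] is None else max(s["max"], v)
    return result, stats


student = [{"name": "Ann", "age": 20}, {"name": "Bob", "age": 23}]
result, stats = run_with_vertical_piggyback(student, ["name"], ["age"])
print(result)  # [{'name': 'Ann'}, {'name': 'Bob'}]
print(stats)   # {'age': {'count': 2, 'min': 20, 'max': 23}}
```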

No extra I/O, but extra CPU load. Solution: set a piggyback level.

1. AC1 = { x | x is a column in table Ri referenced by query Q }

2. AC2 = { x | x is an index column in table Ri } − AC1

3. AC3 = { x | x is a column in table Ri and x is part of the primary key or a foreign key, or is referenced by a foreign key } − AC2

4. AC4 = { x | x is a column in table Ri } − AC3

Advantage: choose the piggyback level according to the CPU load.
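The level definitions can be checked with Python sets. This sketch follows the slide's formulas literally. The student schema, index, and keys are hypothetical. It assumes that running at level k means collecting statistics on the union of AC1 through ACk.

```python
# Sketch of the piggyback levels AC1..AC4, following the slide literally.


def piggyback_levels(all_cols, query_cols, index_cols, key_cols):
    ac1 = set(query_cols) & set(all_cols)  # columns referenced by Q
    ac2 = set(index_cols) - ac1            # index columns minus AC1
    ac3 = set(key_cols) - ac2              # key-related columns minus AC2
    ac4 = set(all_cols) - ac3              # all columns minus AC3
    return [ac1, ac2, ac3, ac4]


def columns_for_level(levels, k):
    """Assumption: level k collects statistics on AC1 ∪ ... ∪ ACk."""
    if not 1 <= k <= len(levels):
        raise ValueError("level must be between 1 and 4")
    return set().union(*levels[:k])


all_cols = {"pid", "name", "age", "score", "dept_id"}  # hypothetical table Ri
levels = piggyback_levels(
    all_cols,
    query_cols={"name"},          # referenced by Q
    index_cols={"pid", "score"},  # indexed columns
    key_cols={"pid", "dept_id"},  # primary key pid, foreign key dept_id
)
for k in range(1, 5):
    print(k, sorted(columns_for_level(levels, k)))
```

Each higher level adds columns and therefore CPU work, which is the trade-off the level setting controls.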

Horizontal piggyback

Include extra rows during query processing.

Example:

Select student.name, student.score from student where score > 60;

Rewrite to:

Select student.name, student.score from student where score > 60 or student.pid in (Select student.pid from student where score > 60);

Advantage
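Here is a sketch of horizontal piggyback under two assumptions: the scan is widened to every row, and the collected statistic is a 10-point score histogram. Rows outside `score > 60` are profiled but not returned. `run_with_horizontal_piggyback` is a hypothetical name.

```python
# Sketch of horizontal piggyback: read and profile extra rows, return only matches.
from collections import Counter


def run_with_horizontal_piggyback(rows, predicate, bucket_width=10):
    result = []
    histogram = Counter()
    for row in rows:  # extra rows are read and profiled...
        histogram[row["score"] // bucket_width * bucket_width] += 1
        if predicate(row):  # ...but only the original answer is returned
            result.append({"name": row["name"], "score": row["score"]})
    return result, histogram


student = [
    {"name": "Ann", "score": 72},
    {"name": "Bob", "score": 55},
    {"name": "Cat", "score": 91},
]
result, histogram = run_with_horizontal_piggyback(student, lambda r: r["score"] > 60)
print([r["name"] for r in result])  # ['Ann', 'Cat']
print(dict(histogram))              # {70: 1, 50: 1, 90: 1}
```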

Mixed piggyback

Use both the vertical and horizontal piggyback methods.

Advantage

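Value distribution

Why do we need it?

Example: Select * from Student where score > 60;

What is the size of the result?

Attribute profile: score
  Max: 100
  Min: 0
  Size: 10
  Values: 101

Distribution table
  0~9: 1%
  10~19: 1%
  20~29: 1%
  30~39: 3%
  40~49: 6%
  50~59: 10%
  60~69: 10%
  70~79: 31%
  80~89: 30%
  90~100: 10%

With only Min and Max, an optimizer would have to assume a uniform distribution and estimate (100 − 60)/(100 − 0) = 40% of the rows. The distribution table shows that the scores are skewed toward the high end.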

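Answer:

Size = 500 * 0.81 * 30 = 12150

Here 0.81 = 10% + 31% + 30% + 10% is the fraction of scores in the buckets 60~69 through 90~100. The cardinality of the student table is 500, so about 405 rows are returned, and 30 is the size of each record.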
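The same estimate can be reproduced in Python with integer percentages, which avoids floating-point rounding. The bucket list copies the distribution table. Counting the whole 60~69 bucket toward `score > 60` is the same approximation the answer makes. The uniform estimate, for comparison, uses only Min and Max. The function names are hypothetical.

```python
# Worked size estimate from the distribution table (integer percentages).
BUCKETS = [  # (lower bound, upper bound, percent of rows)
    (0, 9, 1), (10, 19, 1), (20, 29, 1), (30, 39, 3), (40, 49, 6),
    (50, 59, 10), (60, 69, 10), (70, 79, 31), (80, 89, 30), (90, 100, 10),
]


def histogram_percent_above(buckets, threshold):
    """Percent of rows in buckets whose lower bound is >= threshold."""
    return sum(pct for low, _high, pct in buckets if low >= threshold)


def estimated_size(cardinality, row_size, percent):
    """Return (estimated rows, estimated size) for a given selectivity percent."""
    rows = cardinality * percent // 100
    return rows, rows * row_size


pct = histogram_percent_above(BUCKETS, 60)    # 81
rows, size = estimated_size(500, 30, pct)     # 405 rows, size 12150
# Uniform assumption using only Min = 0 and Max = 100, for comparison:
uniform_rows = 500 * (100 - 60) // (100 - 0)  # 200
print(pct, rows, size, uniform_rows)
```

The uniform guess of 200 rows is about half the histogram-based estimate of 405. That gap is why the distribution matters.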

How do we get the distribution table?

Histogram:

1. Equal width

2. Equal height

[Figure: two example histograms of Percentage (y-axis) against Score (x-axis). Equal width: bucket boundaries at 10, 20, 30, 40, 50, 60, 70, 80, 90, 100. Equal height: bucket boundaries at 45, 56, 63, 68, 73, 76, 78, 85, 90, 100.]

Bucket number

Number of buckets = 1 + log2 n [Sturges' rule, 1926]

Example: student table (500 records): 1 + log2 500 ≈ 9.97, which rounds to 10 buckets.

For equal width, put each value into the bucket whose range contains it.

For equal height, sort the values. If the sample size is m, set the height k = m / (number of buckets) and fill the buckets in order, k values per bucket.
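Here is a sketch of the bucket-number rule and both histogram types. It assumes a base-2 logarithm in Sturges' rule, rounded up. It returns an equal-height histogram as the upper boundary of each bucket. The function names are hypothetical.

```python
# Sketch: Sturges' bucket count, equal-width counts, equal-height boundaries.
import math


def sturges_buckets(n: int) -> int:
    """Number of buckets by Sturges' rule: 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


def equal_width(values, buckets, low, high):
    """Split [low, high] into `buckets` equal ranges; count values per range."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        i = min(int((v - low) / width), buckets - 1)  # the maximum goes in the last bucket
        counts[i] += 1
    return counts


def equal_height(values, buckets):
    """Sort the m values and fill buckets in order with about k = m / buckets
    values each; return the upper boundary (largest value) of each bucket."""
    ordered = sorted(values)
    m = len(ordered)
    if m == 0:
        raise ValueError("need at least one value")
    bounds = []
    for b in range(buckets):
        end = ((b + 1) * m + buckets - 1) // buckets  # ceil((b + 1) * k)
        bounds.append(ordered[end - 1])
    return bounds


print(sturges_buckets(500))                 # 10
print(equal_width(range(100), 10, 0, 100))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print(equal_height(range(1, 21), 4))        # [5, 10, 15, 20]
```

Equal-width buckets are cheap to build. Equal-height buckets need a sort but give finer resolution where the data is dense, as in the score figure.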

Sampling

How many samples do we need? A sample size of 1064 gives an error rate below 10% with 99% probability (Mannino 1988).

To achieve the same error rate on tables of different sizes, the sampling rate can drop as the table grows. Drop rate: log(n)/n.

Example: 20 samples give a 2% error rate on a table with 100 records, which is a sampling rate of 0.2. To reach a 2% error rate on a table with 1000 records, we need 1000 * 0.2 * (1 − log(1000)/1000) ≈ 199 samples (using a base-10 logarithm).
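The sample-size arithmetic can be written as a small function. The slide does not give the logarithm base, so this sketch defaults to base 10 and also shows the natural log. `samples_needed` is a hypothetical name.

```python
# Sample-size arithmetic from the slide: n * base_rate * (1 - log(n) / n).
import math


def samples_needed(n: int, base_rate: float, log=math.log10) -> float:
    """Samples for a table of n records, given the base sampling rate."""
    return n * base_rate * (1 - log(n) / n)


base_rate = 20 / 100                              # 20 samples on a 100-record table
print(samples_needed(1000, base_rate))            # about 199.4
print(samples_needed(1000, base_rate, math.log))  # about 198.6
```

Either base gives roughly 199 samples for 1000 records. That is a sampling rate just under 0.2.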

Summary & Future work

•Low overhead

•Low error rate, though there is still room to improve

•Estimating the size of projection and join results with statistics still needs improvement.

The end