icicles: self-tuning samples for approximate query answering by venkatesh ganti , mong li lee,...

Post on 25-Feb-2016

Page 1: ICICLES: Self-tuning Samples for Approximate Query Answering By  Venkatesh Ganti ,  Mong  Li Lee, and  Raghu Ramakrishnan

Harikrishnan Karunakaran Sulabha Balan

ICICLES: Self-tuning Samples for Approximate Query Answering

By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan

CSE 6339

Page 2: Outline

Introduction

Icicles

Icicle Maintenance

Icicle-Based Estimators

Quality & Performance

Conclusion

Page 3: Introduction

Analysis of data in data warehouses is useful in decision support

Users of decision support systems want interactive systems

OLAP: Online Analytical Processing

Approximate query answering (AQUA) systems were developed to reduce response time to desirable levels

Decision support applications are tolerant of approximate results

Page 4: Approximate Querying

Various approaches:

Sampling-based

Histogram-based

Clustering

Probabilistic

Wavelet-based

Page 5: Uniform Random Sampling

Sales (full relation)

Branch  State  Sales
1       CA     80K
2       TX     42K
3       CA     40K
4       CA     42K
5       TX     75K
6       CA     48K
7       TX     55K
8       TX     38K
9       CA     40K
10      CA     41K

S_sales (50% sample)

Branch  State  Sales
2       TX     42K
4       CA     42K
6       CA     48K
8       TX     38K
10      CA     41K

Query on the sample, where 2 is the scale factor:

SELECT SUM(sales) * 2 AS cnt
FROM s_sales
WHERE state = 'TX';
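As a rough check of the scale-factor idea, the following Python sketch recomputes the Texas total from the two tables on this slide. The names `SALES`, `S_SALES`, `exact_sum` and `scaled_sum` are illustrative and do not come from the slides; sales figures are in thousands.

```python
# Illustrative in-memory copy of the slide's Sales table: (branch, state, sales in $K).
SALES = [
    (1, "CA", 80), (2, "TX", 42), (3, "CA", 40), (4, "CA", 42), (5, "TX", 75),
    (6, "CA", 48), (7, "TX", 55), (8, "TX", 38), (9, "CA", 40), (10, "CA", 41),
]

# The 50% sample shown on the slide keeps branches 2, 4, 6, 8 and 10.
S_SALES = [row for row in SALES if row[0] in {2, 4, 6, 8, 10}]


def exact_sum(rows, state):
    """Sum of sales over all rows of the given state."""
    return sum(sales for _, st, sales in rows if st == state)


def scaled_sum(sample, state, scale_factor):
    """Sum over the sample, multiplied by the scale factor."""
    return scale_factor * exact_sum(sample, state)


SCALE = len(SALES) / len(S_SALES)          # 10 / 5 = 2.0

print(exact_sum(SALES, "TX"))              # 210
print(scaled_sum(S_SALES, "TX", SCALE))    # 160.0
```

On this small table the estimate (160K) undershoots the exact answer (210K) because the sample happens to contain only two of the four Texas branches.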

Page 6: Biased Sampling

Sample relation for an aggregation query workload regarding Texas branches

Sales (full relation)

Branch  State  Sales
1       CA     80K
2       TX     42K
3       CA     40K
4       CA     42K
5       TX     75K
6       CA     48K
7       TX     55K
8       TX     38K
9       CA     40K
10      CA     41K

S_sales (biased sample)

Branch  State  Sales
2       TX     42K
4       CA     42K
5       TX     75K
7       TX     55K
8       TX     38K
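The slide does not say how a biased sample should be scaled. One possible treatment, shown here purely as an assumption, is to give each state its own scale factor (tuples in the relation divided by tuples kept in the sample). The helper names below are hypothetical.

```python
from collections import Counter

# Illustrative copy of the slide's Sales table: (branch, state, sales in $K).
SALES = [
    (1, "CA", 80), (2, "TX", 42), (3, "CA", 40), (4, "CA", 42), (5, "TX", 75),
    (6, "CA", 48), (7, "TX", 55), (8, "TX", 38), (9, "CA", 40), (10, "CA", 41),
]

# The biased sample on the slide keeps branches 2, 4, 5, 7 and 8.
BIASED = [row for row in SALES if row[0] in {2, 4, 5, 7, 8}]


def scale_factors(relation, sample):
    """Assumed per-state scale factor: relation tuples / sampled tuples."""
    total = Counter(state for _, state, _ in relation)
    kept = Counter(state for _, state, _ in sample)
    return {state: total[state] / kept[state] for state in kept}


def estimate_sum(sample, state, factors):
    """Scaled sum of sales for one state, using that state's factor."""
    return sum(sales * factors[st] for _, st, sales in sample if st == state)


factors = scale_factors(SALES, BIASED)
print(estimate_sum(BIASED, "TX", factors))   # 210.0 (all four TX branches are sampled)
print(estimate_sum(BIASED, "CA", factors))   # 252.0 (exact answer is 291)
```

The Texas query is answered exactly because every Texas tuple is in the sample, while the California estimate rests on a single tuple. This is the trade-off a workload-biased sample accepts.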

Page 7: Methodology

All tuples in a uniform random sample are treated as equally important for answering queries

The sample needs to be tuned to contain the tuples that are more relevant for answering the queries in a workload

Need for a dynamic algorithm that changes the sample to suit the queries being executed in the workload

Page 8: Join Synopsis

Join of a uniform random sample of a fact table with a set of accompanying dimension tables

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey
  AND O_Orderkey = LI_Orderkey
  AND LI_Suppkey = S_Suppkey
  AND C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = 'North America'
  AND O_Orderdate >= '01-01-1998'
  AND O_Orderdate <= '12-31-1998';
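A join synopsis can be pictured as "sample the fact table first, then attach the matching dimension rows". The Python sketch below illustrates that idea on toy data. The function `join_synopsis`, the table contents and the column names are all made up for illustration, and the dimension tables must be listed in an order where each foreign-key column is already available.

```python
import random


def join_synopsis(fact_rows, dimensions, sample_size, seed=0):
    """Uniformly sample the fact table, then follow foreign keys.

    dimensions is a list of (foreign_key_column, {key: dimension_row}) pairs.
    """
    rng = random.Random(seed)
    synopsis = []
    for fact in rng.sample(fact_rows, sample_size):
        joined = dict(fact)
        for fk_column, dim_table in dimensions:
            # Each fact row matches exactly one dimension row (foreign-key join).
            joined.update(dim_table[joined[fk_column]])
        synopsis.append(joined)
    return synopsis


# Toy stand-ins for a line-item fact table and two dimension tables.
LINEITEM = [{"orderkey": k % 3 + 1, "extendedprice": 100.0 + 10 * k} for k in range(12)]
ORDERS = {
    1: {"custkey": 10, "orderdate": "1998-02-14"},
    2: {"custkey": 11, "orderdate": "1998-06-30"},
    3: {"custkey": 10, "orderdate": "1997-11-05"},
}
CUSTOMER = {10: {"nation": "United States"}, 11: {"nation": "Canada"}}

synopsis = join_synopsis(
    LINEITEM, [("orderkey", ORDERS), ("custkey", CUSTOMER)], sample_size=4
)

# Queries on the synopsis need no joins: every row already carries its dimensions.
prices_1998 = [r["extendedprice"] for r in synopsis if r["orderdate"].startswith("1998")]
```

Because each fact row joins with exactly one row of each dimension table, the synopsis has as many rows as the fact-table sample.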

Page 9: Need for Icicles

Any aggregate query on the fact table can be answered approximately using exactly one of a small number of synopses

A uniform random sample of the relation wastes memory on tuples that the workload rarely accesses

OLAP queries exhibit locality in their data access

Page 10: Icicles

A class of samples that captures the data locality of aggregate queries over foreign-key joins

Identify the focus of a query workload and sample accordingly

An icicle is a uniform random sample of a multiset of tuples L, which is the union of R and all sets of tuples that were required to answer queries in the workload (an extension of R)

An icicle is therefore a non-uniform sample of the original relation R
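To see why a uniform sample of L is a non-uniform sample of R, the multiset can be built explicitly for a toy relation. The sketch below is only an illustration: `build_l`, `selection_probabilities` and the two-query workload are hypothetical, and a real system would not materialize L.

```python
from collections import Counter
from fractions import Fraction

# Toy relation R: (branch, state) pairs, four Texas and six California branches.
R = [(1, "CA"), (2, "TX"), (3, "CA"), (4, "CA"), (5, "TX"),
     (6, "CA"), (7, "TX"), (8, "TX"), (9, "CA"), (10, "CA")]


def build_l(relation, workload):
    """L = R plus, for every query, the tuples of R that the query needed."""
    multiset = list(relation)
    for predicate in workload:
        multiset.extend(t for t in relation if predicate(t))
    return multiset


def selection_probabilities(multiset):
    """Chance that one uniform draw from L returns each distinct tuple."""
    counts = Counter(multiset)
    return {t: Fraction(c, len(multiset)) for t, c in counts.items()}


# Assumed workload: two queries, both restricted to Texas branches.
workload = [lambda t: t[1] == "TX", lambda t: t[1] == "TX"]

L = build_l(R, workload)
probs = selection_probabilities(L)
print(len(L))            # 18
print(probs[(2, "TX")])  # 1/6
print(probs[(1, "CA")])  # 1/18
```

Each Texas tuple appears three times in L (once from R and once per query), so a uniform draw from L is three times as likely to return it as a California tuple.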

Page 11: Icicle L

Page 12: Icicle Maintenance Algorithm

Page 13: Icicle Maintenance Algorithm

The algorithm is efficient because:

A uniform random sample of L ensures that a tuple's chance of selection into the icicle is proportional to its frequency

Incremental maintenance of the icicle requires only the segment of R that satisfies the new query from the workload

It is based on the reservoir sampling algorithm
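The points above can be sketched with the classic reservoir sampling scheme (Algorithm R): the icicle is a fixed-size reservoir over the stream of tuples that make up L, and each new query simply appends its qualifying tuples to that stream. This is a minimal illustration under that assumption, not the algorithm from the slides. The class name `Icicle` and its methods are hypothetical.

```python
import random


class Icicle:
    """Fixed-size uniform sample of the growing multiset L."""

    def __init__(self, relation, size, seed=0):
        self.rng = random.Random(seed)
        self.size = size
        self.sample = []
        self.seen = 0              # number of tuples of L streamed so far
        self._stream(relation)     # L starts out as R itself

    def _stream(self, tuples):
        """Algorithm R: the i-th streamed tuple is kept with probability size / i."""
        for t in tuples:
            self.seen += 1
            if len(self.sample) < self.size:
                self.sample.append(t)
            else:
                j = self.rng.randrange(self.seen)   # uniform in [0, seen - 1]
                if j < self.size:
                    self.sample[j] = t

    def add_query_result(self, segment):
        """Incremental step: only the segment of R that satisfied the query is read."""
        self._stream(segment)


R = list(range(100))                           # toy relation of 100 tuples
icicle = Icicle(R, size=10)
icicle.add_query_result([t for t in R if t < 20])
print(icicle.seen)                             # 120
```

Tuples that satisfy many queries are streamed many times, so they are more likely to end up in the reservoir. The original relation is never rescanned.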

Page 14: Icicle Maintenance Example

SELECT average(*)
FROM widget-tuners
WHERE date.month = 'April'

Page 15: Icicle-Based Estimators

• Although uniform sampling is used (over L), the result is a biased sample of the relation

• A frequency relation is maintained over all tuples in the relation

• Different estimation mechanisms are used for Average, Count and Sum

Page 16: Estimators

Average: the average taken over the set of distinct sample tuples that satisfy the query predicate is a good estimate of the average

Count: the sum of the expected contributions of all tuples in the sample that satisfy the given query

Sum: the estimate is given by the product of the average and count estimates
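The three estimators can be sketched in Python as follows. The slide does not spell out what the "expected contribution" of a tuple is. The sketch assumes that each sampled copy stands for |L| / n copies in L and that a tuple with frequency f owns f of those copies, so one sampled copy contributes |L| / (n * f). All function and parameter names are hypothetical.

```python
def estimate_average(icicle, predicate, value):
    """Average over the distinct icicle tuples that satisfy the predicate."""
    distinct = {t for t in icicle if predicate(t)}
    if not distinct:
        return None                      # no qualifying tuple in the sample
    return sum(value(t) for t in distinct) / len(distinct)


def estimate_count(icicle, predicate, frequency, size_of_l):
    """Assumed form: every qualifying sampled copy contributes |L| / (n * f)."""
    scale = size_of_l / len(icicle)
    return sum(scale / frequency[t] for t in icicle if predicate(t))


def estimate_sum(icicle, predicate, value, frequency, size_of_l):
    """Product of the average and count estimates."""
    average = estimate_average(icicle, predicate, value)
    if average is None:
        return 0.0
    return average * estimate_count(icicle, predicate, frequency, size_of_l)


# Toy icicle of n = 4 tuples (branch, state, sales in $K); branch 5 was drawn twice.
icicle = [(2, "TX", 42), (5, "TX", 75), (5, "TX", 75), (4, "CA", 42)]
frequency = {(2, "TX", 42): 3, (5, "TX", 75): 3, (4, "CA", 42): 1}
SIZE_OF_L = 18                           # 4 TX tuples * 3 + 6 CA tuples * 1


def is_tx(t):
    return t[1] == "TX"


def sales(t):
    return t[2]


avg_tx = estimate_average(icicle, is_tx, sales)
count_tx = estimate_count(icicle, is_tx, frequency, SIZE_OF_L)
sum_tx = estimate_sum(icicle, is_tx, sales, frequency, SIZE_OF_L)
```

Dividing by the frequency undoes the bias introduced by sampling from L rather than R, which is why the frequency relation has to be kept alongside the icicle.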

Page 17: Maintaining Frequency Relation

A frequency attribute is added to the relation

The starting frequency is set to 1 for all tuples

The frequency is incremented each time a tuple is used to answer a query

Frequencies of relevant tuples are updated only when the icicle is updated with a new query
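A minimal sketch of this bookkeeping, assuming the frequency attribute is held in a dictionary keyed by tuple. The class `FrequencyRelation` and its methods are hypothetical names.

```python
class FrequencyRelation:
    """Frequency attribute for every tuple of the relation."""

    def __init__(self, relation):
        # Starting frequency is 1 for all tuples.
        self.frequency = {t: 1 for t in relation}

    def record_query(self, predicate):
        """Increment the frequency of each tuple used by the query.

        Meant to be called when the icicle is updated with that query.
        """
        used = [t for t in self.frequency if predicate(t)]
        for t in used:
            self.frequency[t] += 1
        return used

    def size_of_l(self):
        """|L| equals the sum of all frequencies."""
        return sum(self.frequency.values())


R = [(1, "CA"), (2, "TX"), (3, "CA"), (4, "CA"), (5, "TX"),
     (6, "CA"), (7, "TX"), (8, "TX"), (9, "CA"), (10, "CA")]

freq = FrequencyRelation(R)
used = freq.record_query(lambda t: t[1] == "TX")
print(freq.frequency[(2, "TX")])   # 2
print(freq.size_of_l())            # 14
```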

Page 18: Quality Guarantees

When queries exhibit data locality, the icicle contains more tuples from frequently accessed subsets of the relation

The accuracy of an approximate answer improves as the number of sample tuples used to compute it increases

Queries that are 'focused' with respect to the workload obtain more accurate approximate answers from the icicle
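The link between accuracy and the number of tuples follows the usual behaviour of sample means. As a hedged illustration (a textbook fact about simple random samples, not the guarantee stated in the slides): if an average is estimated from n sample tuples that satisfy the predicate, and the attribute has standard deviation σ over the qualifying tuples, the standard error of the estimate is approximately

```latex
\operatorname{SE}(\bar{x}) \approx \frac{\sigma}{\sqrt{n}}
```

so an icicle that holds four times as many relevant tuples roughly halves the error of the answer.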

Page 19: Quality Guarantees contd...

Page 20: Performance Evaluation

Qworkload: template for generating workloads

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey
  AND O_Orderkey = LI_Orderkey
  AND LI_Suppkey = S_Suppkey
  AND C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <= '12-31-1998'

Template for obtaining approximate answers

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey
  AND N_Regionkey = R_Regionkey
  AND R_Name = [region]
  AND O_Orderdate >= Date[startdate]
  AND O_Orderdate <= '12-31-1998'

Page 21: Performance Evaluation contd...

Page 22: Performance Evaluation contd...

The error plots compare:

Static uniform random sample on the join synopsis

Icicle as it evolves with the workload

Icicle-Complete, which is formed after the entire workload has been executed once

Page 23: Performance Evaluation contd...

Focused Queries

Page 24: Performance Evaluation contd...

Mixed Workload

Page 25: Observations (focused)

The relative error of query answers from icicles decreases rapidly when queries are focused on a set of core tuples

The Icicle plot converges to the Icicle-Complete plot

Quick convergence of the Icicle plot towards the Icicle-Complete plot means that the icicle adapts fast

Page 26: Observations (mixed)

The improvement due to the use of icicles is not significant

It can be concluded that icicles are, at worst, as good as static samples

Page 27: Conclusion

Icicles provide a class of samples that adapt to the characteristics of the workload

They are never worse than static sampling

They focus on relatively small subsets of the relation

Page 28: Inferences

There are no significant gains in the case of a uniform workload

There is a trade-off between accuracy and cost

Benefits are restricted to scenarios where the queries in the workload are focused on a subset of the relation

Page 29: References

V. Ganti, M. L. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference, 2000.

S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. ACM SIGMOD Record, 1999.

Page 30: Thank You

Questions?