developing a characterization of business intelligence workloads for sizing new database systems

19
Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems Ted J. Wasserman (IBM Corp. / Queen’s University) Pat Martin (Queen’s University) David B. Skillicorn (Queen’s University) Haider Rizvi (IBM Canada)

Upload: haracha

Post on 05-Jan-2016

37 views

Category:

Documents


3 download

DESCRIPTION

Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems. Ted J. Wasserman (IBM Corp. / Queen’s University) Pat Martin (Queen’s University) David B. Skillicorn (Queen’s University) Haider Rizvi (IBM Canada). Outline. Background and Motivation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

Developing a Characterization of Business Intelligence Workloads for Sizing New Database SystemsTed J. Wasserman (IBM Corp. / Queen’s University)

Pat Martin (Queen’s University)David B. Skillicorn (Queen’s University)Haider Rizvi (IBM Canada)

Page 2: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

2

Outline

Background and Motivation Workload Characterization Analysis Results Future work

Page 3: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

3

What is Sizing?

Estimating the amount of physical computing resources needed to support a new workload

Processor (CPU), disk, memory

Use of simplifying assumptions, extrapolations, estimations, projections, rules-of-thumb, prior experience, etc.

Due to lack of available information about new application

Sizing is more of an art, than a science

Page 4: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

4

Our Sizing Approach

1) Collect the required high-level input data from the customer2) Cross-check and verify input data, making assumptions and

estimates if needed3) Determine the required system resource demands for each

workload class and type4) Aggregate the different workload types and classes’ resource

demands to determine the overall requirements5) Determine which hardware configurations will meet the required

resource demands6) Produce a ranked list of hardware configurations

* For more details, see: Wasserman, T.J., Martin, P., Rizvi, H. Sizing DB2 Servers for Business Intelligence Workloads. In Proc. of CASCON2004, October 2004, Toronto, Canada.

Page 5: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

5

Motivation

Problem: How does a customer describe the workload of their new application? May not know the exact queries yet No production-level measurements available Only vague, high-level information

Solution: Study the characteristics of a proxy workload (TPC-H) and have the customer describe their workload in terms of the approximate performance goals and mix of the different classes of queries inherent in the proxy workload

Page 6: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

6

Workload Characterization Analysis Partition queries into a few general classes

based on their resource usage Need to keep simple so that customer can

understand and relate to partitions Each class will comprise the queries that are

similar to each other based on resource usage and other characteristics

Page 7: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

7

Data Collection

Data from 5 recent TPC-H benchmark power runs used (*pre-audited runs)

Hardware configurations and database scales varied across benchmarks balanced system configurations were used

Data collected using standard OS monitoring tools at 5 second intervals for each query

Page 8: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

8

Parameter Selection

Query parameters monitored and used in analysis: Query response time (seconds) Average (user) CPU utilization Average MB/second rate Average IO/second rate Size of largest n-way table join

The above set was sufficient for our analysis

Page 9: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

9

Data Normalization

Within each benchmark run, data normalized Each benchmark data set transformed to one with

a 0-mean and std. dev. of 1 Normalized query data from each benchmark

combined and used for analysis

Page 10: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

10

Partitioning Techniques

Goal: Partition workload into classes or clusters so that objects within a cluster are similar to each other, but are dissimilar to objects in other clusters

Singular Value Decomposition (SVD) & SemiDiscrete Decomposition (SDD) Matrix decomposition techniques Unsupervised data mining Good at revealing underlying or ‘hidden’ factors in

data typical of real-world processes

Page 11: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

11

Singular Value Decomposition (SVD) A matrix, A, can be decomposed as:

A = U S VT

where U is n x r, V is r x r, S is an r x r diagonal matrix whose entries are decreasing (the singular values), U and V are orthogonal

The singular values indicate how important each new dimension is in representing the structure of A

Page 12: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

12

Singular Value Decomposition (2) Can be regarded as transforming the original

space to new axes such that as much variation as possible is expressed along the first axis, as much as possible of what remains along the second, and so on

Page 13: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

13

SemiDiscrete Decomposition (SDD) The SDD of A is given by

A = X D Y

where X is n x k, D is a k x k diagonal matrix, and Y is k x m (for arbitrary k)

The matrices X and Y have entries that are only -1, 0, or +1

Page 14: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

14

SemiDiscrete Decomposition (2) For any decomposition, the product of the ith

column of X, the ith diagonal element of D and the ith row of Y is a matrix of the same shape as A

In SDD, each of these layer matrices describes a `bump’ in the data, a region (not necessarily contiguous) of large magnitude

Page 15: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

15

Analysis Results

C1

C4

C2

C3

Page 16: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

16

Results

Four clusters of queries Cluster 1: Q1, Q3, Q4, Q5, Q6, Q11, Q14, Q14, Q19

“Moderate Complexity”

Cluster 2: Q2, Q20 “Simple Complexity”

Cluster 3: Q7, Q8, Q9, Q18, Q21 “High Complexity”

Cluster 4: Q10, Q13, Q15, Q16, Q22 “Trivial Complexity”

Page 17: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

17

Results (2)

Queries appear to scale well across different system architectures and database sizes

Attempt to understand meaning of the new “dimensions” of the SVD analysis U1 – CPU vs. IO-bound queries U2 – Query Response Times U3 – Sequential-IO intensive vs.

Random-IO Intensive

Page 18: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

18

Future Work

Perform analysis on larger set of data Use more robust/representative workload Extend to other workload types (e.g. OLTP)

Page 19: Developing a Characterization of Business Intelligence Workloads for Sizing New Database Systems

19

Fin

Thank you.