benchmarking dbms’s for communication cost analysis

36
Benchmarking DBMS’s for Communication Cost Analysis A Work Term Report Presentation Tony Young M.Math Candidate May 27 th , 2005

Upload: upton-suarez

Post on 31-Dec-2015

23 views

Category:

Documents


1 download

DESCRIPTION

Benchmarking DBMS’s for Communication Cost Analysis. A Work Term Report Presentation Tony Young M.Math Candidate May 27 th , 2005. Introduction. What is a federated system? Travelocity Remote searches of airline databases Performs bookings, adds payment details, etc. Google Scholar - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Benchmarking DBMS’s for Communication Cost Analysis

Benchmarking DBMS’s for Communication Cost Analysis

A Work Term Report Presentation

Tony Young

M.Math Candidate

May 27th, 2005

Page 2: Benchmarking DBMS’s for Communication Cost Analysis

Introduction What is a federated system? Travelocity

Remote searches of airline databases Performs bookings, adds payment details, etc.

Google Scholar Remote searches of ACM, IEEE, etc. databases Presents consolidated view of papers matching

common search criteria

Page 3: Benchmarking DBMS’s for Communication Cost Analysis

Outline Introduction Organization Optimization Global Cost Modeling Experiments Experimental Procedure Results Conclusion Future Work

Page 4: Benchmarking DBMS’s for Communication Cost Analysis

Organization

Multidatabase Language Approach Pass-through Querying Global Schema Approach

Page 5: Benchmarking DBMS’s for Communication Cost Analysis

Organization

Global schema approach Burden of integration is on global DBA Logical global schema Functional compensation Possibly high maintenance

Page 6: Benchmarking DBMS’s for Communication Cost Analysis

Organization

Global Schema Approach

Physical Org. Logical Org.

Page 7: Benchmarking DBMS’s for Communication Cost Analysis

Optimization Optimization challenges for the FDBS

Remote site autonomy Remote parameters Translation Heterogeneous capabilities Additional costs

From the perspective of the remote source, the FDBS is just another application requesting data!

Page 8: Benchmarking DBMS’s for Communication Cost Analysis

Optimization

Omni module in iAnywhere ASA Supports GS approach and pass-through

querying Performance of global queries is not as good

as local queries

Page 9: Benchmarking DBMS’s for Communication Cost Analysis

Global Cost Modeling Many factors must be taken into account

Optimization Cost (OPT) Communication Cost (COMM) Execution Cost (EXEC) Sub-query/Method Call Costs (SM) Reformatting Costs (RF)

Working Cost Model

Page 10: Benchmarking DBMS’s for Communication Cost Analysis

Global Cost Modeling Interest for this project is communication cost

LS = Link Speed S = Source/DBMS DS = Data Size DT = Data Type PF = Prefetch Status PS = Packet Size R = Processor Speed

Page 11: Benchmarking DBMS’s for Communication Cost Analysis

Experiments Goal

Determine if communication cost can be modeled using simple network applications

Determine what factors affect communication cost

Two sets of experiments Pure network benchmarking DBMS benchmarking

Varied each factor mentioned previously, one at a time

Page 12: Benchmarking DBMS’s for Communication Cost Analysis

Experimental Procedure

Hot cache 30 trials Experimental error below 5% Parameters varied during both sets of

experiments Semantics of prefetching for network

benchmarking

Page 13: Benchmarking DBMS’s for Communication Cost Analysis

Experimental Procedure

Applications DBCreate NetBench DBBench ResultParse

Page 14: Benchmarking DBMS’s for Communication Cost Analysis

Experimental Procedure Recall the working cost model

Used two types of queries SELECT * ROW SELECT MAX(COLUMN) MAX

Ensure no indexes were created Determining communication cost

Page 15: Benchmarking DBMS’s for Communication Cost Analysis

Experimental Procedure Recording query execution time

Page 16: Benchmarking DBMS’s for Communication Cost Analysis

Experimental Procedure Many ways to calculate

Similar overhead in both types of queries

Assumptions Hot cache Transfer of max() value negligible Loop evaluation is negligible

Page 17: Benchmarking DBMS’s for Communication Cost Analysis

Results Results Table

DBMS (S)

Source PS CPU LS PF … MIN MED MAX (bytes) (MHz) (Mb/s) (ms) (ms) (ms) System 1 1460 450 10 Off 0.8028 1.0730 2.6962 System 2 4096 450 10 Off … 1.2644 1.5728 4.3286 System 3 32767 450 10 Off 0.7996 1.0190 3.2946 System 4 2048 450 10 Off 0.9986 1.2402 2.7398 … … … … … … … … System 1 1460 450 10 On 0.1032 0.2896 1.5936 System 2 4096 450 10 On … 0.1270 0.2362 2.2414 System 3 327 67 450 10 On 0.14 06 0.33 24 2.74 04 System 4 2048 450 10 On 0.22 40 0.44 06 1.32 36

Page 18: Benchmarking DBMS’s for Communication Cost Analysis

Results

Link Speed (LS)

Source Avg LS

(% Reduction)

System 1 23.79

System 2 12.34

System 3 36.37

System 4 20.61

NetBench 48.90

Page 19: Benchmarking DBMS’s for Communication Cost Analysis

Results

Link Speed (LS)

Page 20: Benchmarking DBMS’s for Communication Cost Analysis

Results

Data Size (DS)

Page 21: Benchmarking DBMS’s for Communication Cost Analysis

Results

Data Type (DT)

Page 22: Benchmarking DBMS’s for Communication Cost Analysis

Results

Prefetch Status (PF)

Source Avg PF

(% Reduction)

System 1 84.14

System 2 87.90

System 3 79.66

System 4 75.82

NetBench 99.58

Page 23: Benchmarking DBMS’s for Communication Cost Analysis

Results

Packet Size (PS)

Source Avg PS

(% Reduction)

System 1 2.30

System 2 1.08

System 3 0.76

System 4 -2.52

NetBench 1.02

Page 24: Benchmarking DBMS’s for Communication Cost Analysis

Results

Server CPU Speed (CPU)

Source Avg CPU

(% Reduction)

System 1 11.29

System 2 6.37

System 3 10.69

System 4 6.56

NetBench 12.54

Page 25: Benchmarking DBMS’s for Communication Cost Analysis

Results Other notes

Dominant Factors Consistency

Source Avg Time

(% of NetBench)

System 1 173.04

System 2 239.14

System 3 177.44

System 4 261.86

NetBench 100.00

Source Avg Rel

Error (%)

System 1 0.0274

System 2 0.2026

System 3 0.5756

System 4 0.5043

NetBench 0.0023

Efficiency of Link Usage

Page 26: Benchmarking DBMS’s for Communication Cost Analysis

Conclusion

Many factors need to be included in cost models Dominant Factors Affecting Factors

Communication cost is not a pure networking problem

Page 27: Benchmarking DBMS’s for Communication Cost Analysis

Conclusion Each DBMS is different in added

overhead Systems are consistent in overhead Efficiency of link use could improve Ease of control of the factors

Easily controllable Not easily controllable

Much work still to be done!

Page 28: Benchmarking DBMS’s for Communication Cost Analysis

Future Work

Collection of additional data Generation and testing of a

communication cost model Gathering and analysis of other global

cost model parameters

Page 29: Benchmarking DBMS’s for Communication Cost Analysis

Acknowledgements iAnywhere for their support

Glenn and Ivan• Support and countless questions

Mike, Anil, Ani, Dan, Matthew• Help and guidance

Mark, Scott and Dave• Hardware loans

Karim, Graham and Ian• Software help

Frank Arranging the work term and help with the report and talk

Page 30: Benchmarking DBMS’s for Communication Cost Analysis

Want More?

Check out the work term report at http://www.tonyyoung.ca/wtr.pdf

Page 31: Benchmarking DBMS’s for Communication Cost Analysis

Optimization Semijoin algorithm

Site selection Remote reduction Global reduction Assembly

Minimizes communication costs Exploits heterogeneous capabilities

Page 32: Benchmarking DBMS’s for Communication Cost Analysis

Optimization Replicate algorithm

Site selection Data transfer Query execution Assembly

Minimizes query response time Exploits varying hardware configurations

Page 33: Benchmarking DBMS’s for Communication Cost Analysis

Optimization

Difference between semijoin and replicate Assumptions made Execution location

Page 34: Benchmarking DBMS’s for Communication Cost Analysis

Optimization Garlic

Fire access STAR’s Fire join STAR’s Fire FinishRoot STAR

Hybrid of semijoin and replicate algorithms

Large amount of overhead

Page 35: Benchmarking DBMS’s for Communication Cost Analysis

Motivation

Proliferation of heterogeneous DBMS’s Data sharing within organizations Differing rates of technology adoption Mergers and acquisitions Geographic separation of teams

Page 36: Benchmarking DBMS’s for Communication Cost Analysis

Want More?

Check out the work term report at http://www.tonyyoung.ca/wtr.pdf