1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)

Post on 19-Dec-2015

Page 1: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)

1

Thesis Proposal

Zachary Kurmas

(v4.0– 24 April 03)


2

Outline

• Motivation and discussion of problem
• Overview of solution
• Contributions
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)


3

Typical disk array

[Figure: typical disk array. Labels: Hosts, Fibre Channel, Controller A (cache), Controller B (cache), SCSI buses.]


4

Motivation

• Potential storage system designs and automated configuration algorithms must be evaluated with respect to some set of workloads.
• Ideally, these workloads are actual production workloads.
• This is usually impossible
• Two alternatives
  • Replay traces of production workloads
  • Construct and use synthetic workloads


5

Problem

• The currently available set of workload traces and synthetic workloads is not sufficient
• Can’t get enough of the right traces
  • Companies don’t like to give them out
  • No traces of future workloads
• Quality of synthetic workloads too low
  • High-quality synthetic workload must share certain key properties with production workload
  • These properties are currently found by trial-and-error and domain expertise


6

Solution

• Improve quality of synthetic I/O workloads
• Automatically determine what properties a synthetic workload must share with the production workload on which it is based

[Figure: Production Workload, List of Properties, Synthetic Workload. Both workloads are shown as request lists of the form (R,1024,120932,124)(W,8192,120834,126)(W,8192,120844,127)(R,2048,334321,131 ... and are compared by CDF of Response Time.]


7

Contributions

• Prototype system to automatically determine what properties a synthetic workload must share with the production workload on which it is based
  • Library of possible properties and corresponding generation techniques
  • Algorithm for searching through library

• Examination of tradeoffs between size and complexity of properties and quality of synthetic workloads

• Evaluation of whether improved synthetic workloads enable us to make better design decisions

• Exploration of workload scaling using identified properties


16

Review of problem

• Not enough input for evaluations of storage systems
  • Too few workload traces
  • Traces not always right answer
• Synthetic workloads are not practical
  • Don’t know precisely what makes synthetic workloads representative
  • Trial-and-error too cumbersome
  • Can’t maintain every conceivable attribute-value


17

Outline

• Discussion of problem
• Overview of solution
• Contributions
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)


18

Attributes / Attribute-Values

• An attribute is the name or description of a property
  • Read percentage
  • Mean interarrival time
• An attribute-value is the actual value of the measurement (i.e., the actual property)
  • Read percentage of 67
  • Mean interarrival time of 0.8 ms
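To make the distinction concrete, here is a minimal Python sketch that measures the two attributes named above on a small trace and returns their attribute-values. The tuple layout (operation, request size, location, arrival time) follows the request format used elsewhere in these slides; the function names are hypothetical and are not part of the Distiller.

```python
from typing import List, Tuple

# One I/O request: (operation, request size, location, arrival time).
Request = Tuple[str, int, int, float]

def read_percentage(trace: List[Request]) -> float:
    """Attribute-value for the attribute 'read percentage'."""
    reads = sum(1 for op, _, _, _ in trace if op == "R")
    return 100.0 * reads / len(trace)

def mean_interarrival_time(trace: List[Request]) -> float:
    """Attribute-value for the attribute 'mean interarrival time'."""
    times = [t for _, _, _, t in trace]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

# The four requests shown on the "Solution" slide.
trace = [("R", 1024, 120932, 124), ("W", 8192, 120834, 126),
         ("W", 8192, 120844, 127), ("R", 2048, 334321, 131)]
```

Both functions depend only on the trace, which is the requirement the next slide places on attributes.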


19

Requirements of attributes

• Attribute-values are properties of only the workload.
  • Response time is not an attribute because its attribute-value depends on both workload and disk array
• Attributes must be quantifiable
  • “Locality” and “burstiness” are qualitative concepts. “runCount” and “Hurst parameter” are attributes


20

The Distiller

• Automate process of choosing necessary attribute-values

• Input: workload trace and large set of attributes

• Output: set of attributes that identifies those attribute-values that synthetic workload must share with target

• Helps identify type of any necessary attribute missing from library
  • (if no known set of attribute-values leads to a representative synthetic workload)


21

Basic Idea

• Begin with simple attribute-values
  • (distributions of I/O request parameters)
• Iteratively add attribute-values until evaluation of original and synthetic workloads is similar

[Figure: Production Workload, Attribute-value List, Synthetic Workload. Both workloads are shown as request lists of the form (R,1024,120932,124)(W,8192,120834,126)(W,8192,120844,127)(R,2048,334321,131 ... and are compared by CDF of Response Time.]
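The iterative idea can be sketched as a loop. The version below is a deliberately naive greedy search, not the Distiller's actual algorithm: it tries every remaining attribute on each pass, which the later slides rule out as too slow and replace with group-based evaluation. The names `distill` and `error_of` are hypothetical; `error_of` stands in for "generate a synthetic workload with these attributes, evaluate it, and compare against the original".

```python
from typing import Callable, List, Sequence

def distill(candidates: Sequence[str],
            initial: Sequence[str],
            error_of: Callable[[List[str]], float],
            threshold: float,
            max_iterations: int = 10) -> List[str]:
    """Add attributes one at a time until the error is small enough."""
    chosen = list(initial)
    for _ in range(max_iterations):
        if error_of(chosen) <= threshold:
            break  # synthetic and original workloads evaluate similarly
        remaining = [a for a in candidates if a not in chosen]
        if not remaining:
            break  # library exhausted: a needed attribute may be missing
        # Naive step: keep the single attribute that lowers the error most.
        best = min(remaining, key=lambda a: error_of(chosen + [a]))
        chosen.append(best)
    return chosen

# Toy stand-in for the evaluation: error = number of needed attributes missing.
needed = {"a", "c"}
toy_error = lambda attrs: float(len(needed - set(attrs)))
result = distill(["a", "b", "c"], [], toy_error, threshold=0.0)
```

The `max_iterations` cap reflects the concern that each real iteration takes many minutes.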


23

Challenges: Which attribute to add

• Each iteration takes many minutes; therefore, we must limit the number of iterations

• Addition of necessary attribute-values does not always result immediately in improvement

• Fewer attributes better
  • Smaller compact representation
  • Less complex generation techniques
  • Generation techniques for some attributes can interfere with each other
    • E.g., distribution of location and jump distance


26

Outline

• Discussion of problem
• Overview of solution
• Contributions
  • The Distiller itself
  • An analysis of the key attributes for many different workloads
  • An analysis of the potential uses of synthetic workloads
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)


27

The Distiller itself

• Distiller makes generating representative synthetic workloads practical
  • Encourage companies to make evaluation workloads available
  • More accurate / relevant research results
• Basis for “what-if” evaluations
  • Future estimations
  • Stability estimates
  • Improved relevance of old evaluation workloads (possibly)

• Distiller provides library of attributes and corresponding generation techniques


28

Analysis of key attributes for many workloads

• Attribute-values that lead to representative synthetic workloads describe what makes the workload behave like or unlike other workloads
  • The “essence” of the workload

• Possible to study essence of many different (workload, storage system) pairs and look for interesting trends or patterns


29

Potential benefits of analyses

• Help us learn about how workloads and storage systems interact
  • Attribute-values contain all info necessary to predict behavior
  • Focus researchers’ attention on concentrated information
• Help development of analytical models
• Identify potential areas of improvement


30

Outline

• Discussion of problem
• Overview of solution
• Contributions
• Proposal
  • Evaluate the correctness of the Distiller
  • Examine the attributes chosen for different workload/storage system pairs
  • Show that the resulting synthetic workloads are useful
• Future Work
• Timeline
• Details of solution (time permitting)


31

Evaluate correctness of Distiller

• Show that the Distiller works for:
  • One definition of “representative”: response time distribution
  • Up to three storage systems: FC-60, FC-30, and JBOD (Just a Bunch Of Disks)
  • Several artificial workloads
  • Five production workloads: Open Mail, TPC-C, TPC-H, file system trace
• Stopping Condition:
  • Distiller can correctly identify key attributes for artificial workloads


32

Definition of “representative”

• Design decisions are almost always based on performance. Thus, matching response time distributions should be a stronger condition than most design decisions require

• Distribution of response time stronger condition than mean response time

• Many decisions decide between competing configurations. Showing applicability across storage system configurations is my next evaluation

• Workloads are considered representative when RMS difference between distributions of response time is sufficiently small

[Figure: distributions of response time. Axes: Seconds, Number of IOs. Example curves are labeled “representative” and “Not representative”.]
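One plausible reading of this criterion is sketched below in Python: build an empirical CDF of response time for each workload and take the root-mean-square of their differences at a set of evaluation points. The slides do not say where the distributions are sampled or what threshold counts as "sufficiently small", so the `points` argument and both function names are assumptions.

```python
import math
from bisect import bisect_right
from typing import Sequence

def cdf_at(sorted_samples: Sequence[float], x: float) -> float:
    """Empirical CDF: fraction of samples less than or equal to x."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)

def rms_cdf_difference(original: Sequence[float],
                       synthetic: Sequence[float],
                       points: Sequence[float]) -> float:
    """RMS of the gap between two response-time CDFs at the given points."""
    a, b = sorted(original), sorted(synthetic)
    squared = [(cdf_at(a, x) - cdf_at(b, x)) ** 2 for x in points]
    return math.sqrt(sum(squared) / len(squared))
```

Identical response-time samples give a difference of 0; two samples that never overlap give a difference of 1 at any point lying between them.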


38

Outline

• Discussion of problem
• Overview of solution
• Contributions
• Proposal
  • Demonstrate that the Distiller works
  • Examine the attributes chosen for different workload/storage system pairs
  • Show that the resulting synthetic workloads are useful
• Future Work
• Timeline
• Details of solution (time permitting)


39

Learn about attributes (1)

• Determine if attributes depend on the workload.
  • My guess: Yes. Locality attributes are probably different for write-only workload on FC-60
• Determine if attributes depend on the storage system.
  • Answer: They must. Storage system with constant 2 min response time has no important attributes
  • Better objective: Compare attributes chosen for similar storage systems / system configurations.
    • How much overlap?


40

Learn about attributes (2)

• Determine which set of attributes does best overall (for a given storage system configuration)
  • average over all workloads
  • best worst-case
  • Can either of these be used in practice for all workloads?
• Attempt to find a single set of attributes that works for almost all workloads (e.g., take union of all chosen attributes)
  • Examine complexity (e.g., number of attributes) of such a set
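The three selection rules above can be written down directly. In this Python sketch, `errors` maps each candidate attribute set to its per-workload error (for example, an RMS difference), and `chosen_per_workload` maps each workload to the attributes picked for it. All names and numbers are illustrative assumptions, not measured results.

```python
from typing import Dict, List, Set

def best_on_average(errors: Dict[str, List[float]]) -> str:
    """Attribute set with the lowest mean error over all workloads."""
    return min(errors, key=lambda s: sum(errors[s]) / len(errors[s]))

def best_worst_case(errors: Dict[str, List[float]]) -> str:
    """Attribute set whose largest error on any workload is smallest."""
    return min(errors, key=lambda s: max(errors[s]))

def union_of_chosen(chosen_per_workload: Dict[str, Set[str]]) -> Set[str]:
    """Single candidate set: every attribute chosen for any workload."""
    return set().union(*chosen_per_workload.values())

# Made-up errors for two attribute sets on three workloads.
errors = {"set A": [0.1, 0.1, 0.9], "set B": [0.4, 0.4, 0.4]}
chosen = {"workload 1": {"x", "y"}, "workload 2": {"y", "z"}}
```

With these made-up numbers the two rules disagree: set A wins on average while set B has the better worst case. The size of the union is one simple measure of the complexity the last bullet asks about.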


41

Learn about attributes (3)

• Examine changes in attributes and attribute-values over time.
  • Compare traces of a file system taken in 1992, 1996, 1999, and 2002.
  • Attempt to develop scaling rules.
• Examine tradeoffs between accuracy and complexity.
• Attempt to assign a “percent contribution” to each attribute and/or attribute group


42

Outline

• Discussion of problem
• Overview of solution
• Contributions
• Proposal
  • Demonstrate that the Distiller works
  • Examine the attributes chosen for different workload/storage system pairs
  • Show that the resulting synthetic workloads are useful
• Future Work
• Timeline
• Details of solution (time permitting)


43

Apply to real life

• Show that synthetic workloads can be used to make design decisions

• Show that currently available traces not adequate

• Show usefulness of “knobs”


44

Synthetic workloads useful

• Show that synthetic workloads can be used in place of real workloads to make simple design decisions
  • Cache size
  • Prefetch length
  • High-water mark of write-back cache
• Complex design decisions are the basis for entire Ph.D. theses. Can’t practically reproduce at Tech.

• Use Pantheon disk simulator to simulate effects of changing above parameters


45

Synthetic workloads useful (2)

• Take production workload trace
  • Simulate performance given different prefetch lengths.
  • Choose best
• Take synthetic workload based on production workload
  • Simulate performance given different prefetch lengths
  • Compare best to best for production workload

• For cache size, find best performance/$ mark.
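The comparison described above reduces to checking whether both workloads lead to the same choice. The Python sketch below assumes the simulator has already produced a mean response time for each prefetch length. The function names and all numbers are hypothetical, and no simulator is invoked.

```python
from typing import Dict

def best_setting(mean_response_time: Dict[int, float]) -> int:
    """Setting (e.g. prefetch length) with the lowest mean response time."""
    return min(mean_response_time, key=lambda k: mean_response_time[k])

def same_decision(production: Dict[int, float],
                  synthetic: Dict[int, float]) -> bool:
    """True if the synthetic workload leads to the same design choice."""
    return best_setting(production) == best_setting(synthetic)

# Illustrative numbers only: prefetch length -> mean response time.
production = {0: 9.1, 32: 7.4, 64: 6.8, 128: 7.9}
synthetic = {0: 9.6, 32: 7.9, 64: 7.0, 128: 8.3}
```

The synthetic workload does not need to reproduce the production numbers exactly; for this test it only needs to rank the settings so that the same one comes out best.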


47

Show available traces inadequate

• Use Pantheon disk simulator to show that using the cello92 and cello02 traces to evaluate simple design decisions results in different answers.

• From this we infer that using cello92 traces to justify more complex design decisions also produces incorrect answer


48

Turning “knobs” useful

• Show that turning “knobs” of compact representation is better than ad-hoc modifications to workload traces
  • Show that turning arrival time knob is better than contracting interarrival times
  • Show that turning request size knob is better than ad-hoc doubling of request size and location values.

• Evaluate turning of knobs versus removing ½ of cello I/Os based on process ID
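For reference, the ad-hoc baseline of "contracting interarrival times" might look like the following Python sketch, which divides every gap between arrivals by a constant factor. This is the modification the knobs are meant to improve on, not a knob itself; the function name and trace layout are assumptions.

```python
from typing import List, Tuple

# One I/O request: (operation, request size, location, arrival time).
Request = Tuple[str, int, int, float]

def contract_interarrival_times(trace: List[Request],
                                factor: float) -> List[Request]:
    """Ad-hoc speed-up: shrink every gap between arrivals by `factor`."""
    if not trace:
        return []
    start = trace[0][3]
    return [(op, size, loc, start + (t - start) / factor)
            for op, size, loc, t in trace]

trace = [("R", 1024, 100, 0.0), ("W", 2048, 200, 2.0), ("R", 1024, 300, 6.0)]
twice_as_fast = contract_interarrival_times(trace, 2.0)
```

Uniform contraction preserves the relative shape of the arrival pattern but changes every gap by the same ratio, which may not match how a real workload behaves under higher load.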


49

Future Work

• Optimality
  • Find “smallest” set of attributes per workload (e.g., set of attributes with smallest compact representation)
  • Find smallest set of attributes per storage system (if possible)
• Use chosen attributes to develop analytical model of performance
  • Formula for performance, not simulation


50

Timeline

• April–June: Run Distiller on many workloads
  • Submit results to MASCOTS
• June–July: Analyze changes over different workloads / storage systems.
  • Submit results to CMG conference
• July–August: Find best overall set of attributes. Find best worst-case
• September–October: Attempt to develop set of attributes that works for all workloads on a given storage system
  • Submit results to SIGMETRICS and/or FAST
• November–December: Evaluate different what-if scenarios.
• February 2004: Defense
• January 2004–February 2004: write
• March 2004–April 2004: interview
• May 2004–July 2004: write
• August 2004: graduate


51

Outline

• Discussion of problem
• Overview of solution
• Contributions
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)


52

Generating Synthetic Workload

• To generate synthetic workload, randomly choose value for each element in table

• Attribute-values put restrictions on values chosen

• Adding attribute-values reduces the difference between synthetic and production workloads

Read/Write  Request Size  Location  Arrival Time
R           1024          42912     10
W           8192          12493     12
W           2048          20938     15
R           2048          43943     2
W           8192          98238     11
W           8192          76232     23
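A minimal Python sketch of this generation step follows. Each row of the table is filled with random values, and three attribute-values restrict the choices: the read fraction, a request-size distribution, and a mean interarrival time. The uniform location choice and exponentially distributed gaps are assumptions made to keep the example short, and the function and parameter names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# One I/O request: (operation, request size, location, arrival time).
Request = Tuple[str, int, int, float]

def generate_workload(n: int,
                      read_fraction: float,
                      size_weights: Dict[int, float],
                      max_location: int,
                      mean_gap: float,
                      seed: int = 0) -> List[Request]:
    """Fill each table column with random values that respect the constraints."""
    rng = random.Random(seed)
    sizes, weights = zip(*size_weights.items())
    now = 0.0
    rows = []
    for _ in range(n):
        op = "R" if rng.random() < read_fraction else "W"   # read/write ratio
        size = rng.choices(sizes, weights=weights)[0]        # request size dist.
        location = rng.randrange(max_location)               # unconstrained here
        now += rng.expovariate(1.0 / mean_gap)               # mean interarrival
        rows.append((op, size, location, now))
    return rows

synthetic = generate_workload(6, 0.67, {1024: 1, 2048: 2, 8192: 3}, 100000, 0.8)
```

Adding another attribute-value, such as a distribution of locations, would replace one of the unconstrained random choices with a constrained one.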


53

Choosing Attribute Wisely

• Challenge
  • Not all attributes useful
  • Some attributes partially redundant
  • Can’t test all attributes
• My Solution
  • Group attributes
  • Evaluate whole groups at once

[Figure: grid of candidate attributes. Entries: Mean Arrival Time, Arrival Time Dist., Hurst Parameter, COV of Arrival Time, Mean Request Size, Request Size Dist., Request Size Attrib 3, Request Size Attrib 4, Dist. of Locations, Mean run length, Jump Distance, Proximity Munge, Read/Write ratio, Markov Read/Write, R/W Attrib. #3, R/W Attrib #4, Mean Read Size, Read Rqst. Size Dist., Mean (R, W) Sizes, (R, W) Size Dists., D. of (R,W) Locations, Mean R,W run length, R/W Jump Distance, R/W Proximity Munge.]


54

Attribute Groups

• Attributes measure one or more parameters
  • Mean Request Size → Request Size
  • Distribution of Location → Location
  • Burstiness → Interarrival Time
  • Distribution of Read Size → Request Size, Read/Write
• Attributes grouped by parameter(s) measured
  • Location = {mean location, distribution of location, locality, mean jump distance, mean run length, ...}
  • Arrival Time = {mean interarrival time, Markov model of interarrival time, Hurst parameter, etc.}
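The grouping rule can be sketched in a few lines of Python: record which parameters each attribute measures, then collect attributes that measure exactly the same parameters. The attribute names come from this slide; the `MEASURES` table and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

# Which request parameter(s) each attribute measures.
MEASURES: Dict[str, Set[str]] = {
    "mean request size": {"Request Size"},
    "distribution of location": {"Location"},
    "mean jump distance": {"Location"},
    "mean interarrival time": {"Arrival Time"},
    "Hurst parameter": {"Arrival Time"},
    "distribution of read size": {"Request Size", "Read/Write"},
}

def group_attributes(measures: Dict[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group attributes by the exact set of parameters they measure."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for attribute, parameters in measures.items():
        groups[frozenset(parameters)].append(attribute)
    return dict(groups)

groups = group_attributes(MEASURES)
```

Using a frozen set as the key means an attribute that measures two parameters at once, such as the distribution of read size, lands in its own multi-parameter group rather than in either single-parameter group.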


55

Attribute Groups

• Each group corresponds to a column or set of columns
  • Operation Type
  • Request Size
  • {Arrival Time, Location}
• Measures patterns within column(s)

Workload:

Read/Write  Request Size  Location  Arrival Time
R           1024          42912     10
W           8192          12493     12
W           2048          20938     15
R           2048          43943     2
W           8192          98238     11
W           8192          76232     23


56

Do I need (more) attributes from the {Arrival Time} group?

• Idea #1: Add “best” attribute from {Arrival Time} and measure improvement
  • Amount of improvement implies potential benefit

[Figure: two workloads, “Current Attributes” and “Attributes for Test”, each with columns R/W, RS, Loc, AT. Columns are labeled “Current” or “Perfect”.]


57

Problem with idea #1

• Errors involving other parameters can interfere
  • Very random reads can overshadow moderate queuing effects

[Figure: two workloads, “Current Attributes” and “Attributes for Test”, each with columns R/W, RS, Loc, AT. Columns are labeled “Current” or “Perfect”.]


58

Idea #2 --- Idea #1 “backwards”

• Look at a synthetic workload in which everything except Arrival Time is “perfect”.
  • Change in performance implies importance of group.

[Figure: two workloads with columns R/W, RS, Loc, AT. One combines “Perfect” columns from the workload trace with “Current Arrival Time Attributes”; the other, “Everything Perfect”, is the production workload trace.]


59

Problem with idea #2

• Workload on left is missing not only {Arrival Time}
  • Also missing {Arrival Time, Request Size}, {Arrival Time, Location} and {Arrival Time, Operation Type}
  • Cause of any difference not clear

[Figure: two workloads with columns R/W, RS, Loc, AT, labeled “Current Operation Type Attributes” and “Workload Trace”. Column labels: “Current”, “Production Workload”.]


60

Solution

• Remove {Arrival Time, Request Size}, {Arrival Time, Location} and {Arrival Time, Operation Type} from workload trace by “rotating” arrival times.
  • Only difference between workloads is {Arrival Time}

[Figure: two workloads with columns R/W, RS, Loc, AT, labeled “Current Operation Type Attributes” and “‘Rotated’ Arrival Time”. Column labels: “Current”, “Production Workload”.]
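The slides do not define "rotating". The Python sketch below assumes it means a circular shift of the interarrival-time column relative to the other columns, so that patterns within the arrival-time column survive while its alignment with request size, location, and operation type is broken. Treating the first arrival time as a gap from time zero is a further assumption, and the function name is hypothetical.

```python
from typing import List, Tuple

# One I/O request: (operation, request size, location, arrival time).
Request = Tuple[str, int, int, float]

def rotate_arrival_times(trace: List[Request], k: int) -> List[Request]:
    """Circularly shift the interarrival-time column by k rows."""
    times = [t for _, _, _, t in trace]
    # Treat the first arrival time as a gap measured from time zero.
    gaps = [times[0]] + [b - a for a, b in zip(times, times[1:])]
    k %= len(gaps)
    rotated = gaps[k:] + gaps[:k]
    out, now = [], 0.0
    for (op, size, loc, _), gap in zip(trace, rotated):
        now += gap
        out.append((op, size, loc, now))
    return out

trace = [("R", 1024, 100, 1.0), ("W", 2048, 200, 3.0), ("R", 1024, 300, 6.0)]
rotated = rotate_arrival_times(trace, 1)
```

The set of gaps is unchanged, only their pairing with the other three columns, which is the property the slide relies on to isolate the {Arrival Time} group.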


61

Process

• Add {Operation Type} attributes until two workloads below are representative

• Repeat for other attribute groups

[Figure: two workloads with columns R/W, RS, Loc, AT, labeled “Current Operation Type Attributes” and “‘Rotated’ Operation Types”. Column labels: “Current”, “Production Workload”.]


62

Hints

• If Distiller is unable to find attributes for a particular group, it identifies the deficiency
  • Helps people develop new attributes
• Attributes for multi-parameter groups must be compatible with single-parameter groups
  • {Operation Type, Location} attribute must maintain same properties as chosen {Operation Type} and {Location} parameters


63

End Of Talk


64

Problem:

• Lack of traces for researchers
  • …. Papers use same … traces

• Traces used may or may not be representative of actual production workloads

• When traces not sufficient, really bad synthetic workloads used instead

• We don’t know how to easily produce representative synthetic workload
  • Lack of synthetic workload generation ability suggests lack of understanding of disk array and storage system interactions


65

Proposed Solution

• Improve our ability to generate synthetic workloads

• (Discuss previous work)


66

Workload Characteristic

• Characteristic: A property of a workload (or workload trace) that can be measured.
  • 27% reads
  • Mean request size of 8KB
• Must be property of workload alone
  • Response time is not a workload characteristic, but a characteristic of both workload and storage system
• Must be concrete measurable property.
  • “burstiness” and “locality” too vague.

• Also called “attribute-values”


67

Attributes

• Attribute: The “name” of a characteristic

  Attribute            Characteristic
  eye color            blue eyes
  Read percentage      27% reads
  mean request size    mean size: 8KB

• Hence, characteristics also called “attribute-values”


68

How the Distiller works

• Partition known attributes into groups
  • All characteristics in each group contain similar information
• Choose a “complete” set of characteristics from each group.
  • i.e., choose a set of characteristics that contains all the necessary information from the group


69

$21,000 question

• What attribute-values must a synthetic workload share with the production workload in order to be representative?
  • Do the attributes depend on the workload?
  • Do the attributes depend on the storage system?
  • If so, how can we find them easily?
  • If not, what are they?


70

Trivial “solution” doesn’t work

• Trivial solution: Use many attribute-values
• Problems with trivial solution
  • Many attribute-values contain irrelevant info.
  • Many attribute-values contain duplicate info.
  • High-level description too large and complex
    • Negates advantages of synthetic workload
  • Generating synthetic workload too difficult
    • Obvious algorithms for generating attribute-values often interfere with each other.


71

Challenges of Useful Solution

• Solution: Choose small set of “important” attribute-values
  • That is, attribute-values that have the most impact on evaluation
• Challenges
  • Estimating impact of single attribute-value on evaluation
  • Finding small set of attribute-values with “disjoint” information


72

Goal of Distiller

• Given a workload and storage system,
• automatically find a set of attributes, so
• synthetic workloads with the same values
• will have evaluation similar to original.

[Figure: Original Workload, Attribute List, Synthetic Workload. Both workloads are shown as request lists of the form (R,1024,120932,124)(W,8192,120834,126)(W,8192,120844,127)(R,2048,334321,131 ... and are compared by CDF of Response Time.]


73

High-level approach

• Divide and Conquer
  • Partition attributes into groups according to “type of information”
    • Recall some attributes describe similar info.
  • Find a set of attribute-values that contains all the information for a particular group
• (No, it’s not that simple …)


74

Workload

• I/O request has four parameters
  • Read/Write type
  • Request Size
  • Location
  • Arrival Time
    • shown in ms
• Workload is a series of I/O requests
  • Trace can be viewed as a table with four columns

Read/Write  Request Size  Location  Arrival Time
R           1024          42912     10
W           8192          12493     12
W           2048          20938     15
R           2048          43943     2
W           8192          98238     11
W           8192          76232     23


75

“Engineering” Contributions

• Finding representative synthetic workloads becomes practical
  • Basis for evaluations when traces are unavailable
  • Basis for “what-if” evaluations
• Provides basis for workload similarity metric
  • “Table-based” models
• Highlight what workload features a storage system handles best
  • Help configure storage system for new workload


77

Apply to “real life”

• Attempt to generate a representative synthetic workload when no trace exists
  • Choose workload trace and “hide” it
  • Use lessons from previous slides to choose attributes based on similar workloads
  • Compare synthetic workload to trace
• Compare “what-if” workload based on chosen attributes to ad-hoc “what-if” workload
  • Play workload twice as fast
  • “bootstrapping”
• Attempt to find attributes that determine whether to use RAID 1/0 or RAID 5


78

Proposal: Apply to Real Life (2)

• Attempt to build “table-based” model of performance
  • n-dimensional table
  • Each axis represents one attribute
  • Fill element (w, x, y, …) with performance of workload with attribute-values w, x, y, …
• Given new workload
  • Compute attribute-values w, x, y, …
  • Value in corresponding table element is estimate of performance
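A table-based model of this kind can be sketched as a dictionary keyed by binned attribute-values. The slide does not say how continuous attribute-values map to table elements, so the bin edges below are an assumption, and the class and method names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

class TableModel:
    """n-dimensional lookup table: one axis per attribute."""

    def __init__(self, bin_edges: List[List[float]]) -> None:
        self.bin_edges = bin_edges              # one sorted edge list per axis
        self.table: Dict[Tuple[int, ...], float] = {}

    def _element(self, values: Sequence[float]) -> Tuple[int, ...]:
        # Map attribute-values (w, x, y, ...) to a table element.
        return tuple(bisect_right(edges, v)
                     for edges, v in zip(self.bin_edges, values))

    def record(self, values: Sequence[float], performance: float) -> None:
        """Fill the element for a measured workload."""
        self.table[self._element(values)] = performance

    def estimate(self, values: Sequence[float]) -> Optional[float]:
        """Look up the element for a new workload; None if never filled."""
        return self.table.get(self._element(values))

# Axes: read percentage (split at 50) and mean request size (split at 4096).
model = TableModel([[50.0], [4096.0]])
model.record((27.0, 8192.0), 12.5)   # illustrative performance value
```

A new workload with attribute-values that fall in the same element, for example `model.estimate((30.0, 16384.0))`, gets the recorded value as its estimate; an element that was never filled returns `None`.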


79

Problem

• Both alternatives have problems:
  • Workload traces
    • Companies don’t like to give them out
    • Don’t always meet the researcher’s needs
  • Synthetic workloads
    • Difficult and tedious to generate correctly


80

Overview

• Motivation: Storage system design studies and automated management systems require workloads to drive evaluations

• Problem: Neither traces of production workloads nor simple synthetic workloads are sufficient to drive experimental evaluation

• My solution: Improve the quality of synthetic storage workloads by automatically determining what properties synthetic workloads must share with the production workloads they model