
Page 1: COMPUTER METRICS: MEASURING AND MANAGING THE PERFORMANCE

COMPUTER METRICS: MEASURING AND MANAGING THE PERFORMANCE, RESOURCES, AND COSTS OF LARGE COMPUTER SYSTEMS

ABSTRACT

Dr. H.W. Barry Merrill

Merrill Consultants

Managing and measuring the performance, resources, and costs of a large complex of central and distributed computers with distributed interactive users requires the establishment of repeatable, meaningful, and measurable service objectives (such as interactive response time, batch turnaround, and availability). It requires accurate measures of resource consumption (processor time, I/O device activity, real memory utilization) by which the capacity to meet those service objectives can be acquired in the most timely fashion and at the most cost-effective rate. A case study describes a cost-effective measurement system that captures and analyzes data and answers typical management questions with interactive graphical reporting. The configuration and workload of the computer system are also described.

INTRODUCTION

Computer metrics, often called computer performance evaluation, capacity planning, or system tuning, is an emerging technology in large, multi-user computer systems. Motivated by the conflicting goals of increased user productivity and reduced costs, the data center management of these large systems requires hard data to justify requests for resource acquisition. Most of the research to date has been ad hoc and unique to each specific computer installation. Reference (1) is regarded as the landmark reference in the field; references (2) and (3) contain 164 of the best current papers ranging from tutorial to technical; and reference (4) describes SAS software.

This paper describes a case study of an approach to computer metrics used in over 1400 installations worldwide. The paper, which is also a brief tutorial on the subject, first describes goals and methods of the technology and then quantifies the environment of the case study. The capabilities and costs of the measurement systems are described, and examples of graphical management reporting are discussed. The cost of the measurement system is shown to be effective, and the conclusions are presented.

GOALS AND METHODS

The basic problem in large installations is the management of shared resources, balanced by the service requirements of users of this network. The most important facet of the solution is the establishment of service objectives. Only when the supplier of computing has quantified the service to be delivered in a measurable fashion can the users of computing evaluate services received relative to the cost of computing.

To be successful, service objectives must:

- be measurable

- be repeatable

- be understandable by the typical user

- correlate directly with the user's perception of service

- allow reflection of true exception conditions

- be directly controllable by the resources applied by the computer installation.



Successful service objectives for the four most important subsystems are described in Table 42.1.

A key additional ingredient in these objectives is the manner in which they are expressed. The use of average (mean) values has been found to be quite misleading. In general, the mean is an unsatisfactory expression of a service objective. Since the primary purpose of the service objective is to allow the supplier to communicate service to the user, the metric used must be human oriented. Mean values, in spite of their strong mathematical heritage, do not relate to human perceptions. A human wants to know what happens most of the time. The average value, which is the sum of all observations divided by the number of observations, is never actually observed by the user. By recognizing this need in computer service objectives and by expressing objectives as frequency of occurrence ("94% of the time this will happen"), we have found not only a metric that relates to the user, but also one that meets the other criteria for effective service objectives. Specific techniques for establishing service objectives are addressed in (1). The actual measure of service used (internal response, turnaround, queue time, and so forth) is dependent on the hardware and software architecture that provides the computing and is a function of the nature and purpose of the specific computing installation. The method of exposition (percentage of occurrences meeting a stated goal), however, appears almost invariant in well-managed computing facilities.
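The contrast between the two expressions can be sketched in a few lines. The response times here are hypothetical; the 4-second threshold follows the TSO and CICS objectives:

```python
# Sketch: a service objective expressed as frequency of occurrence
# rather than as a mean. Response times are hypothetical.
response_seconds = [0.8, 1.1, 1.3, 2.0, 2.4, 3.1, 3.5, 3.9, 45.0, 60.0]

mean = sum(response_seconds) / len(response_seconds)
met_goal = sum(1 for r in response_seconds if r < 4.0)
pct_within_goal = 100.0 * met_goal / len(response_seconds)

# The mean (distorted by two long transactions) suggests terrible
# service, yet 80% of the time the user saw sub-4-second response.
print(f"mean = {mean:.2f} s, within goal = {pct_within_goal:.0f}%")
```

The two long transactions dominate the mean but are invisible to the frequency-of-occurrence metric, which is exactly what a user "most of the time" perceives.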

Table 42.1 Service Objectives

Subsystem  Measure                                                       Goal

batch      Percentage of jobs meeting requested IWT (Initiation          94%
           Wait Time). Users submit jobs requesting initiation wait
           times of 15 min., 30 min., 1 hour, 2 hours, or 4 hours.
           Time to initiate is measured by SMF.

TSO        Trivial transactions meeting 4-second internal response.      92%
           TSO/MON name table defines trivial. Internal response
           measured by TSO/MON.

IMS        Queue met expected response.                                  95%
           Service time met expected response.                           98%
           CONTROL/IMS measures input queue time and service time
           separately. Expected queue time is calculated based on
           transaction class or priority. Expected service time is
           calculated based on resources measured by CONTROL/IMS.

CICS       FAST transactions met 4-second response. Internal response    92%
           measured by PAII. Transactions classified as FAST if AMCT
           (I/O count) is less than five and transaction name is not
           in a table of "bad guys".

Therefore, capturing the service and resource data becomes a crucial element in managing the facility. Without measurable service objectives, when users complain, perhaps the wrong resources are expanded. By correlating service delivered with resources consumed, however, the limiting resource can be identified and options evaluated for cost-effectiveness. Additional resources can be acquired, the application can be rescheduled to a time when that particular resource is plentiful, or the application can be redesigned in light of the limited resource.



Measuring and managing service objectives is necessary for system tuning to identify and eliminate bottlenecks to performance. Management is not interested in the raw power of the configuration to process data but, rather, in knowing the capacity in terms of how much work can be delivered while meeting service objectives. This is called goal level capacity. The specific techniques described in (1) are summarized below.

Analysis of capacity by workload measurement requires these preconditions:

- The system must be tuned. Known bottlenecks to performance have been eliminated, and the I/O configuration has been implemented to minimize contention. An untuned system displays erratic response, causing inaccurate capacity measurement.

- Work must execute when needed by the user. The shape of the workload represents real demand required by the business and not an artificial shape created by the supplier's arbitrary resource or scheduling constraints. A batch scheduling system that relates directly to users' requests based on timeliness guarantees this condition. Batch scheduling systems based only on resource requirements generally fail this test since they place arbitrary constraints on when various classes of work are actually executed.

With these preconditions met, hourly resource data are analyzed. Since not all resource utilization is accurately attributed to the workload, linear regression is used to distribute the unattributed (but measured) overhead to the workloads that generated that overhead. The service objectives achieved during each hour are then plotted against workload to measure the knee of the response curve and to quantify the relationship between work and service. The initial result is the hourly capacity in work units per hour of the system (hardware, software, memory, and I/O configuration-dependent) to deliver work and concurrently meet service objectives.
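The overhead-distribution step amounts to an ordinary least-squares fit of measured total CPU against attributed work. A minimal sketch with hypothetical hourly figures (the actual analysis is performed in SAS against the performance data base):

```python
# Sketch: distributing unattributed overhead to workloads with linear
# regression. Hourly figures below are hypothetical.
hours_work = [100, 150, 200, 250, 300]   # work units executed each hour
total_cpu  = [130, 185, 240, 295, 350]   # measured CPU minutes, incl. overhead

n = len(hours_work)
mx = sum(hours_work) / n
my = sum(total_cpu) / n
slope = sum((x - mx) * (y - my) for x, y in zip(hours_work, total_cpu)) \
      / sum((x - mx) ** 2 for x in hours_work)
intercept = my - slope * mx

# slope: CPU minutes per work unit, including that work's share of the
# overhead it generates; intercept: fixed overhead not driven by workload.
print(f"slope = {slope:.2f} CPU min/work unit, intercept = {intercept:.2f}")
```

The slope apportions the measured-but-unattributed overhead back to the workload that generated it, which is the quantity needed for the work-versus-service plot.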

The extension from hourly capacity to daily prime shift capacity is accomplished by first plotting the actual hourly workload profile, hour by hour of prime shift. The shape of the profile is preserved, and that curve with the same shape is raised until the peak value of the profile for any hour equals the hourly capacity value. Integrating under the raised curve then provides the real daily capacity. This technique simply redefines real capacity in terms of the present configuration and the present demand by users. It is a stable measure unless the configuration is changed (by adding resources or by changing system or application software) or the demand profile changes. The shape is usually constant unless personnel's working hours are changed.
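The raise-and-integrate step can be sketched as follows; the hourly profile and the hourly capacity figure are hypothetical:

```python
# Sketch: extending hourly capacity to daily prime-shift capacity.
# The profile and hourly_capacity values are hypothetical.
hourly_profile = [40, 55, 70, 90, 100, 95, 80, 75, 85, 60, 50]  # 11-hour shift
hourly_capacity = 120.0   # work units/hour from the knee-of-the-curve analysis

# Preserve the shape: scale every hour by the same factor until the
# peak hour just reaches the hourly capacity.
scale = hourly_capacity / max(hourly_profile)
raised = [h * scale for h in hourly_profile]

# "Integrating under the raised curve" with hourly samples is a sum.
daily_capacity = sum(raised)
print(f"scale = {scale:.2f}, daily capacity = {daily_capacity:.0f} work units")
```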



DESCRIPTION OF THE CONFIGURATION AND WORKLOAD

The Sun Company Information Systems Division manages a computer systems network with an annual budget of $36,000,000 for hardware, systems software, communications equipment, and personnel.

Figure 42.2 Configuration and Description of the Five Processors

NOS/BE
Processor: CYBER 760; MIPS Rate: 10; MIPS Capacity: 10; I/O Channels: 4; Central Memory: 262K; Extended Memory: 1000K; 463 telecommunications lines

MVS Batch
Processor: IBM 3033-MP; MIPS Rate: 5; MIPS Capacity: 10; I/O Channels: 16; Real Memory: 28MB; Paging Memory: 850MB; Paging Devices: 10

MVS On-line
Processor: IBM 3081-G; MIPS Rate: 5; MIPS Capacity: 10; I/O Channels: 24; Real Memory: 32MB; Paging Memory: 750MB; Paging Devices: 13

MVS TSO
Processor: IBM 3081-K; MIPS Rate: 7; MIPS Capacity: 14; I/O Channels: 24; Real Memory: 32MB; Paging Memory: 700MB; Paging Devices: 14

VM/CMS
Processor: IBM 3033-U; MIPS Rate: 5; MIPS Capacity: 5; I/O Channels: 16; Real Memory: 16MB; Paging Memory: 100MB; Paging Devices: 2

11 Spool Devices, 3500 MB (shared)

Note: 1 MB = 1024 x 1024 bytes = 1,048,576 bytes

Figure 42.2 describes the 5 central processors that support this network and the associated paging subsystems for the virtual memory systems. The 5 processors share 11 direct-access storage devices of SPOOL that are used for staging and exchanging jobs' input and output between systems. Once selected for execution by the job entry system (JES), the job executes completely on 1 of the 5 processors, placing its printed output on the SPOOL. When the job is completed, output is transmitted to the remote locations for printing or display. Users communicate interactively through the telecommunications network to the processor that hosts their particular application.



Table 42.3 I/O Configuration - MVS Systems

ON-LINE DISK VOLUMES

Quantity  Type    Transfer Rate            Storage Capacity per volume
                  (kilobytes per second)   (megabytes)
60        3330-1  806                      100
400       3350    1198                     317
12        3380    3000                     630

Total On-Line Disk Storage Megabytes: 140,360

TAPE DRIVES

Quantity  Type          Transfer Rate (kilobytes per second)
23        3420-Model 8  1250
40        3420-Model 4  470

Table 42.3 describes the I/O configuration shared by the three processors using the multiple virtual storage operating system, which accounts for 90% of the workload. Tape drives are fully shareable among processors, with software allocating each drive to a single task when needed. Disk drives are fully shareable among processors so that the failure of a processor does not prevent access to data on a particular disk. However, to minimize contention delays, data on a single disk are application specific; that is, the disk contains data only for a specific application, such as TSO. The path (the control unit and channel) is logically isolated to the processor in which that application normally executes. (Although there always exist some data, such as catalogs, that must be shared, logical isolation is the design objective in the placement of data and is maintained to the highest degree possible.)
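The pairing of volume counts to device types in Table 42.3 can be cross-checked arithmetically against the quoted storage total:

```python
# Cross-check of Table 42.3: per-volume capacities times volume counts
# should reproduce the quoted 140,360 MB of on-line disk storage.
disk_volumes = {            # type: (quantity, megabytes per volume)
    "3330-1": (60, 100),
    "3350":   (400, 317),
    "3380":   (12, 630),
}

total_mb = sum(qty * mb for qty, mb in disk_volumes.values())
print(total_mb)   # → 140360
```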

Table 42.4 quantifies the telecommunications environment that supports the 4099 terminals that connect to the processor complex. Although host processors are located in two sites in Dallas, Texas, users of the network are located all across the United States and Canada, with heavy concentrations in Dallas, Philadelphia, California, and Illinois.

Table 42.4 Telecommunications Environment

COMMUNICATIONS CONTROLLERS

Quantity: 4

Type    Memory (kilobytes)  Protocol       Number of lines per controller
3705    512                 SDLC           60
Comten  512                 Bisync, Async  240


TELECOMMUNICATIONS LINES

Quantity  Line speed (bits per second)
8         56000
150       9600
203       4800
102       300/1200

Table 42.5 describes the workloads and systems that execute on the three MVS processors. The acronyms may be unfamiliar, so a brief description follows:

Initiators - one batch job owns an initiator during its execution.

IMS - on-line system used for data base inquiry and update, especially with complex data.

CICS - on-line system used for update and inquiry that is simpler but faster than IMS.

TSO - interactive system used heavily for program development and execution of management (end-user oriented) decision-support systems.

WYLBUR - edit and submit interactive system.

ADABAS - data base manager accessed by TSO and batch users.

VTAM - the primary terminal access manager.

JES - job entry subsystem; controls batch and all printing.

Table 42.5 MVS Workloads

3033-MP Batch: 40 Initiators; 3 WYLBURs; TCAM.

3081-D TSO: 250 TSO Users; Test IMS Control; 2 Test IMS Regions; 2 Test CICS; ADABAS Nucleus.

3081-G On-line: Production IMS Control; 8 IMS Message Regions; 3 Production CICS; Backup IMS; Order Entry; VTAM Applications: VTAMPRNT.


Table 42.6 quantifies daily workload executed on the three systems. These counts of tasks and concurrent users clearly describe a very large system; there are about fifty installations of similar size in the United States alone.

Table 42.6 Daily Workload Volumes

MVS    Batch Steps                 22365
       CICS Transactions          223510
       IMS Transactions            98436
       TSO Prime Transactions     147719
       Concurrent TSO Users          213
       Concurrent IMS Users          500
       Concurrent CICS Users         476
       Concurrent WYLBUR Users        10
       Concurrent JES2 Remotes       147

CMS    Session Intervals            2123
       Concurrent Users               28

CYBER  Batch Jobs                    592
       Time-Sharing Sessions          80

Table 42.7 quantifies distribution of the budget at a high level. Only the staff that actually operates and manages the hardware and software described before (approximately 150 people) is included in the personnel cost. Application programmers and end users of these systems are excluded from this figure.

Table 42.7 Cost Distribution

Salaries                          23 %
Local taxes                        2 %
Electric power                     2 %
Software rental                    2 %
Maintenance of facilities          7 %
IBM CPUs and channels             12 %
CYBER CPU, disk                   10 %
Voice network                      9 %
Disk drives and controllers        8 %
Dedicated lines                    6 %
Dial-in lines                      4 %
Tape drives and controllers        3 %
3705 communications controllers    2 %
Modems and so forth                  %
Miscellaneous                      8 %

Annual Budget: $36,000,000

CAPABILITIES AND COST OF THE MEASUREMENT SYSTEM

To manage an installation of this size, we have found that it is not only mandatory to measure service and resources, but it can be done in an extremely economical fashion, provided some intelligent choices are made. Table 42.8 quantifies volumetrics of the performance data produced that are thought to be required for effective management and measurement in this facility. (A volumetric is a generic term for data elements that describe service or resources.)

Table 42.8 Daily Record Volumes

Source             Record count  Average record length  K bytes of data
MVS SMF            780564        248                    194214
PAII               223510        88                     19668
CYBER Dayfile      10131         80                     810
IMS Account Cards  13391         80                     1071

More specifically, detailed event records written on the MVS systems (Table 42.9) show both the quantity and quality of the data that are automatically created by the operating system's accounting, resource measurement, and service measurement routines. In spite of the breadth of vendor-created event records, we have found it necessary to use the operating system exit facilities to create the additional event records listed in Table 42.10.



Table 42.9 Systems Management Facility (SMF) Vendor-Created Records Written Daily

Type                    Logical records   K bytes   SAS observations
Type 0 (Sys startup)                  1         1
Type 2, 3 Dump SMF                   12        12
Type 4, 34 Step term              22365      7092              25302
Type 5, 35 Job term                8200      1286               8200
Type 6 File print                  8035       824               7474
Type 7 Lost SMF data                  0         0
Type 14 Input file               129115     38310                138
Type 15 Output file               89968     25981
Type 17 Scratch                   11885      1140                  1
Type 18 Rename                      104        14
Type 20 Initiation                 8759       930               2338
Type 21 Tape mount                 8172       360               8172
Type 26 Job purge                  9316      3205               8257
Type 30 Workload                  45031     34057
Type 40 Allocation               115736      8711              44252
Type 47-48 RJE Ses                  970        69               1116
Type 50 VTAM buffers                314        18
Type 52-53 RJE Ses                  574        42
Type 62-69 VSAM open              31338      8656
Type 70 RMF CPU                      75        47                150
Type 71 RMF paging                   75        31                 75
Type 72 RMF workload               8475      1686               2045
Type 73 RMF channels                 75       154               2751
Type 74 RMF DASD I/O                150      3442              14548
Type 75 RMF Page I/O                788       123                788
Type 90 Operator acts                 8
Type 110 CICS trans                  49      1128               6648

Record Totals                    499584    137370


Table 42.10 Systems Management Facility (SMF) Installation-Created SMF Records Written Daily

Type      Description      SMF record count  Total SMF K bytes  SAS observations
Type 129  Job initiation               8205               1148              8205
Type 130  Interval                      199                105
Type 175  VTAM terminal               45334               1587             31794
Type 201  Security                     1749                182
Type 210  WYLBUR session                195                 21               195
Type 214  Archival                    14199               1349              9713
Type 217  TSO/MON system                653               1730              1560
Type 218  TSO/MON call                 2777                690              2353
Type 225  JES operator                 4569                258
Type 229  Tape mount                   7529                482              7529
Type 230  Audit tape                    166                 18               166
Type 231  Application                    20                                    20
Type 249  IMS program                 68367              12852             68367
Type 250  IMS transaction             98436              34156             98436
Type 254  ADABAS trans                  876                140               876

Installation Totals                  253953              54723

These event records are written to the System Management Facility (SMF) file as their events occur. However, to measure response or resources requires that these event records be processed and synthesized into humanly-perceived events, such as command response time, program memory usage, processor active time for a computer, and so forth. Conversion from raw SMF data to information is a complex software problem because of the complexity of the possible event records that might occur and because of the variety of data formats in the records written by the operating system. The solution was made feasible by a high-level language system, the SAS System (4), that is powerful enough to handle this variety of data forms and is so efficient in processing this large volume of data records that it is the de facto standard language for processing SMF data. The algorithms (1) that map the raw data to information are written in SAS software. The data are in use at some 1400 installations worldwide, and they are referred to as the PDB (performance data base, after their end product, a SAS data library of information and reports). The execution resources to process the 200,000 kilobytes of daily event records into the PDB are quantified in Table 42.11. Clearly, this processing of over 1 million records (typically 200 bytes long) daily in 60 CPU minutes demonstrates the remarkable power of the SAS System and the PDB algorithms.
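The synthesis of raw event records into per-job observations can be illustrated with a toy record stream. The record layout and field names here are invented for illustration; the real PDB algorithms are SAS programs handling binary SMF records of many types:

```python
# Sketch: synthesizing per-job information from a stream of raw event
# records. Layout and field names are hypothetical; real SMF records
# are binary, typed, and far richer.
raw_events = [
    {"type": "init",      "job": "PAYROLL1", "time": 100.0},
    {"type": "step_term", "job": "PAYROLL1", "time": 160.0, "cpu_sec": 12.4},
    {"type": "step_term", "job": "PAYROLL1", "time": 310.0, "cpu_sec": 41.0},
    {"type": "job_term",  "job": "PAYROLL1", "time": 312.0},
]

jobs = {}
for ev in raw_events:
    j = jobs.setdefault(ev["job"], {"cpu_sec": 0.0})
    if ev["type"] == "init":
        j["start"] = ev["time"]
    elif ev["type"] == "step_term":
        j["cpu_sec"] += ev["cpu_sec"]          # accumulate step CPU into the job
    elif ev["type"] == "job_term":
        j["elapsed"] = ev["time"] - j["start"]  # humanly-perceived turnaround

# One observation per job: total CPU consumed and elapsed time.
print(jobs["PAYROLL1"])   # {'cpu_sec': 53.4, 'start': 100.0, 'elapsed': 212.0}
```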



Table 42.11 Monthly Resource Costs to Build the Performance Data Base

Job description                            Total monthly 3081-D CPU minutes

CYBER system - total                                                  46
VM system - total                                                     16
MVS systems - total                                                1,760

Major subsystems within MVS:
  Dumping accounting data to tape                                    476
  Build daily PDB from accounting data                               992
  Daily reports and backups                                          154
  Weekly reports                                                      73
  Monthly reports                                                     37
  Build customer-splitout data base                                   28

Grand total of all systems                                         1,822

These 1,822 CPU minutes per month are equivalent to only 60 minutes per day.

Goal level capacity analysis described earlier was performed on these three MVS machines. That study found the prime-time (11-hour shift) goal level capacity to be 2,271 CPU minutes deliverable to batch each day, with batch service goals being met, or a total monthly (prime and nonprime) capacity of 148,653 CPU minutes. If we compare the total cost of 1,822 minutes monthly to build the PDB and to execute all the performance and capacity analysis reports with the configuration's monthly capacity, the cost of executing the total PDB measurement and reporting system represents only 1.2% of the total capacity! Not only does this demonstrate cost-effectiveness of the measurement system, but since the daily running can be scheduled in the least busy time of day, true cost is essentially zero because capacity cost is set by peak time requirements.
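The quoted 1.2% follows directly from the two totals in the text:

```python
# Checking the measurement-system cost ratio quoted in the text.
pdb_cpu_minutes_per_month = 1822       # monthly cost to build the PDB and reports
monthly_capacity_cpu_minutes = 148653  # total monthly goal level capacity

overhead_pct = 100.0 * pdb_cpu_minutes_per_month / monthly_capacity_cpu_minutes
print(f"{overhead_pct:.1f}%")   # → 1.2%
```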

EXAMPLES OF MANAGEMENT REPORTING

The presentation of tables of numbers is often useful during analysis and is appropriate for presentations to technical audiences. However, communication of computer capacity and performance measures to senior, nontechnical management requires graphical presentation. ("A picture is worth a thousand words.") The graphical capabilities of the SAS System make it simple to create a graphical display of the performance data base.

The graphs are used to show management the quality of service delivered to the computing users. The use of capacity is also mapped to the internal business organizations that consumed resources. Management can then determine if the business purpose served by a part of the company justifies that part's consumption of computer capacity.


The performance data base contains trend data for each week for the past several years. This permits simple graphical display versus time so that not only are the current service and resource consumption measured and managed (tactical performance management), but also the long-range trends (strategic capacity planning) are tracked.

Management of many businesses is based on monthly data, but we have found the week a far more stable measure. Not only does weekly reporting provide more timely precursors of trouble, but also the trends are significantly more accurate and robust since each week has the same number of days. Monthly resource data points suffer from too much variability because there are as few as 18 and as many as 23 working days in a month. Even weekly data must be cleaned of outliers before mathematical analysis is applied; the 6 weeks during which major holidays occur in the U.S. must be deleted from analysis if typical trends are to be observed.
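The cleaning step, removing holiday weeks before fitting a trend, can be sketched as follows (week numbers and CRU figures are hypothetical):

```python
# Sketch: cleaning weekly trend data of holiday outliers before fitting.
# Weekly figures are hypothetical; the text notes that the 6 U.S.
# major-holiday weeks must be deleted before trend analysis.
weekly_cru = {1: 510, 2: 530, 3: 540, 4: 560, 5: 575, 6: 590,
              7: 310,               # holiday week: a misleading dip
              8: 620, 9: 635}
holiday_weeks = {7}

cleaned = {w: v for w, v in weekly_cru.items() if w not in holiday_weeks}

# Least-squares slope over the cleaned weeks gives the weekly growth trend.
n = len(cleaned)
mw = sum(cleaned) / n
mv = sum(cleaned.values()) / n
slope = sum((w - mw) * (v - mv) for w, v in cleaned.items()) \
      / sum((w - mw) ** 2 for w in cleaned)
print(f"growth = {slope:.1f} CRU/week")
```

Leaving the holiday week in would drag the fitted slope down and misstate the growth rate that capacity planning depends on.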

Using this weekly trend data, management is kept informed of the productivity and efficiency of the computing facility. The following graphs are a subset of the graphs used in presentations to senior management and are available on-line to all levels of management. The manager simply logs on to the TSO application with his CRT terminal, enters a single command, and is presented with a menu from which graphs are selected. Graphs are created on-line in response to the menu selection. Each graph requires approximately 10 seconds elapsed time for creation and transmission on a 9600 BPS line.

Management first asks two questions of the computer facility: how good was the service, and how much work did we deliver? Figures 42.12 through 42.15 answer these quality-of-service questions:

Figure 42.12 CICS (Prime Time) Performance. Percentage of fast CICS transactions in prime time (7 a.m. to 6 p.m.) that received internal response time of less than 4 seconds is tracked. The step decrease in August 1981 was the result of a change in response measurement due to software maintenance. The step increase in October 1982 was the result of additional resources (a new CPU).

Figure 42.13 TSO (Prime Time) Performance. Percentage of fast TSO transactions in prime time that received internal response time of less than 4 seconds is tracked. In spite of a substantial growth in number of users, tactical management of resources has kept TSO response very stable. Without measures of actual response, tactical movement of work to other processors and the incremental addition of resources (especially memory to meet growth) would not have been possible.

Figure 42.14 Batch (Prime Time) Performance. Percentage of batch jobs that met user-requested initiation wait time (IWT) of 15 minutes is seen to be very consistent except for 5 weeks. The failed weeks correspond to over-capacity weeks because outages usually reduced available capacity. Even bad weeks show 95% satisfaction for this critical 15-minute IWT category, which accounts for over half of prime time batch work.

[Figure 42.12. CICS (Prime Time) Performance: percentage of fast transactions responding in less than 4 seconds, plotted by week, June 1982 through June 1984.]


[Figure 42.13. TSO (Prime Time) Performance: percentage of fast transactions responding in less than 4 seconds, plotted by week, June 1982 through June 1984.]

[Figure 42.14. Batch (Prime Time) Performance: percentage of jobs meeting the requested 15-minute IWT, plotted by week, June 1982 through June 1984.]


Figure 42.15 IMS (Prime Time) Performance. The IMS transaction service time goal shows service degrading until September 1982, when tactical management applied more resources to support IMS, with the resultant step increase in service. Service remained stable for 4 months until a new application again overloaded IMS, and resources were acquired to restore service levels.

[Figure 42.15. IMS (Prime Time) Performance: percentage of responses that met service time goals, plotted by week, June 1982 through June 1984.]

Management concern with operating system overhead (which is not usually directly billed but is distributed through the pricing mechanism of the computer facility) is addressed in Figures 42.16 and 42.17, shown in the color section.

Figure 42.16 Prime Time Hardware CRU Totals. The unit of work, the computer resource unit (CRU), is a composite of CPU time and I/O counts, weighting the processor time more heavily. The three lines show the total growth of work (top), work directly measurable and attributable to a task (middle), and the difference in total and identifiable work (bottom). Thus, the bottom line measures the operating system overhead that is not directly attributable to individual tasks. Downward spikes in the three graphs are the decrease in workload during weeks with a major holiday.
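The paper does not publish the actual CRU weighting formula, but the idea of a composite unit weighted toward processor time can be illustrated with hypothetical weights:

```python
# Sketch of a computer resource unit (CRU) as a weighted composite of
# CPU time and I/O activity. The weights are invented for illustration;
# the actual Sun Company formula is not given in the text.
CPU_WEIGHT = 10.0   # CRUs per CPU second (hypothetical)
IO_WEIGHT = 0.01    # CRUs per I/O operation (hypothetical)

def cru(cpu_seconds: float, io_count: int) -> float:
    """Composite work unit weighting processor time more heavily."""
    return CPU_WEIGHT * cpu_seconds + IO_WEIGHT * io_count

# A CPU-bound task earns more CRUs than an I/O-bound one.
print(cru(120.0, 500))    # CPU-heavy task  → 1205.0
print(cru(5.0, 40000))    # I/O-heavy task  → 450.0
```

A single composite unit lets total delivered work, attributable work, and the overhead residual all be plotted on one axis, which is what Figures 42.16 and 42.17 do.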

Figure 42.17 Plot of Overhead and Identified Hardware CRU. Data of Figure 42.16 are expressed as percentages of total work. The middle line reflects identified work and the bottom reflects operating system overhead. Excluding a brief spike in 1981, which was due to improper installation of software maintenance, the system overhead has remained a consistent percentage of total work, as a well-designed operating system should.

Management is concerned with overall capacity while meeting these goals.

Figure 42.18 Hardware CRU Growth Toward Capacity. Total work delivered is plotted against the installed capacity in absolute computer resource units.



HARDWARE CRU GROWTH TOWARD CAPACITY, PRIME SHIFT ONLY

[Plot: total CRUs delivered (150,000 to 850,000 scale) by week, June 1982 through June 1984, against installed capacity.]

Figure 42.18

Figure 42.19 Weekly Percentage of Current Prime Capacity. Work delivered is expressed as a percentage of installed capacity.

WEEKLY PERCENT OF CURRENT PRIME CAPACITY

[Plot: work delivered as a percentage of installed capacity by week, June 1982 through June 1984.]

Figure 42.19




Figures 42.18 and 42.19 show that capacity was exceeded for a 7-week period in 1982. This was known in advance, and management chose to ride out the capacity shortage until the new processor was installed because of financial pressure at that time. Because they received advance notice of the decision to ride out, users cooperated, and complaints were minimal.

Management wanted to know the relative usage of prime and nonprime shifts.

Figure 42.20 Distribution of Sun Company Batch by Shift. Percentage of total CRU (nonprime slightly higher than prime) is plotted, showing a consistent distribution. Unfortunately, management had hoped to persuade customers to migrate from prime to nonprime time during this period. This plot shows that the particular incentive pricing strategy chosen was ineffective, since there was no significant change in the percentage of work in nonprime time. A new incentive pricing was then chosen, which did work.

DISTRIBUTION OF SUN COMPANY BATCH BY SHIFT

[Plot: percentage of total batch CRU by week, 1982 through early 1983. Legend (SHIFT): ALL BATCH CRU, PRIME BATCH, NON-PRIME BATCH.]

Figure 42.20

Management needs to know how capacity is being used. What categories of work use what portion of the capacity? What business elements (regions or companies) are driving the growth in usage? How are individual companies within the organization using the computing facility? These types of questions are answered in the following graphs, shown in the color section.



Figure 42.21 Overall Category Breakouts. The percentage of work is distributed among the six major workload categories (batch, TSO, CICS, ADABAS, IMS, and errors, the non-IWT scheduled batch), in relative order from top to bottom, showing a slight decrease in batch and a slight increase in TSO work.

Figure 42.22 Overall Regional Breakouts. Total CRU is distributed among the seven major business elements of the Sun Company (Corporate Financial Division, Exploration and Production, Human Resources, Information Systems Division Staff, Network Services, Refining and Marketing, and Sun Information Commercial Services).

Figure 42.23 Regional Breakout by Work Type and Shift. The total CRU of a particular region, SIS, is further plotted by work type to expose the type of work and relative growth within this business element. This level of detail is necessary in order to project future requirements from past history. It allows planning for the capacity change that will occur when SIS leaves the network in 1985.

Figure 42.24 Category Breakout by Region and Shift. This is the inverse of the prior graph. A work category, CICS, is decomposed into the business elements that consume CRU in the CICS subsystems, and highly dissimilar patterns are seen for different divisions.

Figure 42.25 Regional Percentage Breakout by Work Type and Shift. This graph shows the usage of one business element, E&P Company, by work type. The upper lines separate batch and TSO. The double-humped bottom line is most interesting because it demonstrates a failed application project. A large application was designed and tested (the first and smaller peak). The commitment was made to implement based on this test, but the actual execution costs were not evaluated during the testing phase. As the system entered production, its resource consumption grew (the second and larger peak), and the design was recognized as too expensive to execute and was terminated. This graph was effective in instituting a new policy requiring that execution costs be considered an inherent part of new applications during their testing phase. No table of data was as effective as this graph.

SUMMARY

This paper has shown that a wide range of management questions can be easily answered with data in the performance data base, which is produced from accounting and performance records created by the operating system. As a brief introduction to computer metrics, the goal was not to present specific results so much as to demonstrate management's need for these analyses and to show that the analysis can be done with minimal expense. The use of a common data source for daily performance evaluation and management, system tuning, billing, and capacity planning has significantly reduced conflicts between different operating groups that heretofore reported their own data in their own fashion. By managing the system based on measured service objectives, user and supplier have both agreed on what constitutes acceptable service, and the supplier can manage resource acquisition to ensure that acceptable service is delivered.

Perhaps the best demonstration of the value of computer metrics comes from the continued use of the PDB at Sun Company since 1976 as the single source of data for managing service, capacity planning, and cost recovery of resources while that data center grew from two 2.5 MIP processors to the present 8 CPUs, totaling over 37 MIPs, without any increase in the 5-person staff of the computer metrics group.



PRIME TIME HARDWARE CRU TOTALS

[Plot: CRU totals (50,000 to 800,000 scale) by week, June 1982 through June 1984. Legend (TYPE): IDENTIFIABLE CRU, TOTAL CRUs, OVERHEAD CRUs.]

Figure 42.16

PLOT OF OVERHEAD AND IDENTIFIED HARDWARE CRU

[Plot: percentage of total CRUs (20 to 100 percent scale) by week, June 1982 through June 1984. Legend (TYPE): IDENTIFIABLE CRU, TOTAL CRUs, OVERHEAD CRUs.]

Figure 42.17


OVERALL CATEGORY BREAKOUTS

[Plot: percent CRU by week, 31DEC82 through 16DEC83. Legend (CATEGORY): TOTAL, TSO, CICS, IMS, BATCH IWT, ADABAS NUC, ERRORS, ONLINE.]

Figure 42.21

OVERALL REGIONAL BREAKOUTS, SHIFT=PRIME

[Plot: total CRU by week, 12/31/82 through 01/10/84. Legend (REGION): TOTAL, CNF, ERRORS, NS, SIS, CFD, E&P, ISDOVR, R&M.]

Figure 42.22


CATEGORY BREAKOUT BY REGION AND SHIFT

[Plot: percent CICS CRU by week, 31DEC82 through 16DEC83. Legend (REGION): TOTAL, ERRORS, CFD, NS, E&P, SIS.]

Figure 42.24

REFERENCES

Because research in computer metrics has been primarily ad hoc, pragmatic, and specific to an installation's needs, references are found primarily in the annual proceedings of the Computer Measurement Group or in the proceedings of user groups of specific vendors' hardware (such as IBM's SHARE and GUIDE, DEC's DECUS, and CDC's VIM).

1. Merrill, H.W. "Barry" (1980), Merrill's Guide to Computer Performance Evaluation, Cary, NC: SAS Institute Inc., 336 pp.

2. Dodson, George W., et al. (1983), Proceedings of the 1983 Computer Measurement Group International Conference, Phoenix, AZ: The Computer Measurement Group, P.O. Box 26063, 481 pp.

3. Heidel, Ruth (April 1980 - July 1982), Computer Management and Evaluation: Selected Papers from the SHARE Project, Chicago, IL: SHARE Inc., Volume VI, 662 pp.

4. SAS Institute Inc. (1982), SAS User's Guide: Basics, 1982 Edition, Cary, NC: SAS Institute Inc., 923 pp.



ABSTRACT

II. REDUCING CPU CONSUMPTION WITH PROPER I/O BLOCKSIZE AND BUFFERING

Dr.H.W. Barry Merrill

Merrill Consultants

This paper postulates that CPU cost, real memory cost, and DASD storage cost are jointly optimized when the blocksize and buffer number are chosen in such a way as to minimize BUFNO while moving one track of data at a time. Furthermore, through exits in DF/DS at open, it appears possible to override poor blocksize choices to reduce the CPU time, real memory, and elapsed time without reblocking the data. Finally, since the true optimum requires maximization of blocksize (and, hence, a potential recompile), another exit is discussed that can allow identification of programs that will need recompiling. These exits allow an installation to migrate user data safely, increase its blocksize, and specify the optimal buffer number, with total user and application transparency.

INTRODUCTION

The real resources that a task consumes when performing sequential input or output operations are processor execution time (CPU seconds), real memory pages (average working set size), real memory occupancy time (page-seconds), the number of blocks of data transferred (EXCPs), the number of physical operations necessary to transfer the data (SIOs), and the resultant elapsed run time. These real resources can be reduced when the physical characteristics of the data transfer operations are matched to the hardware and operating system design.

The present sequential I/O design has not changed in principle since OS/360. The user's program requests records that the operating system combines into blocks. A block, when stored on a medium, exists as a contiguous physical entity, and the length of a block is its BLKSIZE value. Blocksizes can be as small as 16 bytes or as large as 32,768 (actually, 32,760 is a software limit in MVS), if the medium can support that much data.

When I/O is transferred between media and memory, the smallest unit of transfer is a block. However, sequential access methods provide for movement of more than one block in a single operation by allowing you to specify (in JCL or by system default) some number of buffers using the BUFNO parameter.

Although the user requests records that are decoded from a block by the access method, BUFNO of these blocks are transferred by one physical operation called a start I/O, or SIO. Thus, the blocksize is an attribute of the file, and the buffer number is set when the file is opened.

For many years, performance analysts have shown appreciable savings in real resources by increasing the blocksize and buffer number, but few installations have taken aggressive action to correct poor choices.

There are several reasons for their inaction:

- The real cost, in management-understandable terms, had not been shown.



- A systems programmer claimed that more real memory would be needed, and no immediate contradictory evidence was demonstrated.

- The installation was willing to change the blocksize but could not guarantee that the change would be transparent; the installation's programs can have internally specified blocksizes, which would require identification and recompilation.

- The perceived personnel costs of making the changes outweighed the perceived cost savings.

- The application personnel have insufficient knowledge of JCL to be sure of themselves and will not change something that is working now.

This paper shows the real cost of poor choices of the blocksize and buffer attributes, proves that the systems programmer's claim is false, and discusses the use of open exits that permit resolution of the final three objections.

THE EXPERIMENT

Twelve pairs of sequential files containing the same 69,000 80-byte (nonrepeating character) logical records were built with blocksizes of 800, 1680, 2480, 3360, 4800, 5440, 6320, 7440, 9040, 11440, 15440, and 23440 bytes on 3380 disks.

A QSAM assembler program (written by Carol Toll and shown in Appendix I), which did nothing but OPEN, GET, and PUT between each pair of files, was executed repeatedly, iterating the number of buffers from 1 to as many as were necessary for BLKSIZE times BUFNO to exceed the track size. Note that BUFNO is limited to a maximum of 30 by MVS and, thus, only if the blocksize is greater than 1076 can full-track I/O be performed on 3380s.

The SMF step termination records were collected and analyzed with the SAS System to determine the impact on real resource costs. Linear regression was used to determine the CPU cost (using total CPU TCB plus SRB time) of each block and each SIO. The regression results are provided in Table 42.26. All runs were executed on a 3033-MP under MVS/370 SP1.3.1 in the fall of 1983.

Table 42.26 Regression Results, 3033 MP

PROC SYSREG DATA=STEPS; MODEL CPUTM = SIO EXCP;

SSE   3.767        F ratio    3823.4
DFE   273          Prob > F   .0001
MSE   .0138        R-square   .9655

Variable     Parameter estimate   Standard error   T ratio
Intercept    1.1944               0.0133           89.88
SIO          0.001389             0.000020         68.11
EXCP         0.0001516            0.0000046        32.73
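The fit in Table 42.26 can be reproduced in outline. The sketch below substitutes Python for SAS PROC SYSREG, and synthetic noise-free observations generated from the published equation stand in for the original SMF step records, so the solver recovers the published coefficients; it illustrates the method, not the original analysis:

```python
def ols_two_predictors(rows):
    """Fit y = b0 + b1*x1 + b2*x2 by solving the 3x3 normal equations."""
    # Accumulate X'X and X'y with an intercept column of ones.
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x1, x2, y in rows:
        x = (1.0, x1, x2)
        for i in range(3):
            for j in range(3):
                a[i][j] += x[i] * x[j]
            b[i] += x[i] * y
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * 3
    for r in (2, 1, 0):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, 3))) / a[r][r]
    return coef  # [intercept, b_SIO, b_EXCP]

# Synthetic steps (SIOs, EXCPs, CPU seconds) drawn from the published model.
model = lambda sio, excp: 1.1944 + 0.001389 * sio + 0.0001516 * excp
steps = [(s, s * bufno, model(s, s * bufno))
         for bufno in (1, 2, 5, 10, 30) for s in (50, 200, 1000, 4000)]
b0, b_sio, b_excp = ols_two_predictors(steps)
```

With real, noisy SMF data the same solver would also yield the residual statistics (SSE, MSE, R-square) that SAS reports.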

The equation for total CPU time (TCB + SRB) in seconds as a function of SIOs and EXCPs for QSAM I/O on an MP3033 with MVS/370 is:

CPU seconds = 1.1944 + 0.001389 * SIOs + 0.0001516 * EXCPs



INTERPRETATION OF REGRESSION RESULTS

The INTERCEPT is the asymptotic CPU time as the data length transferred per SIO (BLKSIZE times BUFNO) grows infinitely large. The total data in the file were (69,000 times 80) 5.52 million bytes. Thus, the CPU cost to process 1 byte of data (excluding the CPU cost of the actual I/O operations, which is described by the SIO and EXCP coefficients) is:

CPU seconds per byte = 1.1944 / 5,520,000 = 216 nanoseconds per byte

Note: if the processor is rated at 5 MIPS, 1 instruction requires 200 nanoseconds. Thus, it appears that 1 machine instruction, on the average, is required to process 1 byte.

The SIO coefficient of 0.001389 seconds, or 1.389 milliseconds, is the CPU cost of each physical I/O operation and is independent of how many blocks were transferred. This is the cost of the physical transfer.

The EXCP (I/O units) coefficient of 0.0001516 seconds, or 151.6 microseconds, is the CPU cost of each block transferred within a QSAM SIO. This is the cost of managing each buffer's data.

USING THE EQUATION

With this equation for the CPU cost of performing QSAM I/O, it is easy to calculate the impact of changing the blocksize. For example, a 1000-byte blocksize using the default QSAM BUFNO of 5 can be compared to half-track blocking of 23000 with a BUFNO of 2.

BLKSIZE = 1000, BUFNO = 5, EXCPs = 10000 (assumed)

Then SIOs = EXCPs/BUFNO = 10000/5 = 2000

and CPU = 0.001389 * 2000 + 0.0001516 * 10000 = 4.294 seconds.

Now if BLKSIZE = 23000 and BUFNO = 2,

then EXCPs = 10000/23 = 435 and SIOs = 435/2 = 218

and CPU = 0.001389 * 218 + 0.0001516 * 435 = 0.3687 seconds.

PERCENT CPU WASTED = (4.294 - 0.3687)/4.294 = 91.4%
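The arithmetic of this comparison can be checked mechanically. The sketch below encodes the two fitted I/O-cost terms (the function names are mine, not the paper's) and reproduces the worked example, assuming the 10,000 EXCPs at a 1000-byte blocksize correspond to about 10 million bytes of data:

```python
import math

CPU_PER_SIO = 0.001389    # seconds per physical I/O (Table 42.26)
CPU_PER_EXCP = 0.0001516  # seconds per block transferred (Table 42.26)

def io_cpu_seconds(sios, excps):
    """CPU cost of the I/O alone, excluding the 1.1944 s intercept."""
    return CPU_PER_SIO * sios + CPU_PER_EXCP * excps

def io_counts(total_bytes, blksize, bufno):
    """Blocks (EXCPs) and physical operations (SIOs) for one sequential pass."""
    excps = math.ceil(total_bytes / blksize)
    sios = math.ceil(excps / bufno)
    return sios, excps

sios, excps = io_counts(10_000_000, 1000, 5)    # 2000 SIOs, 10000 EXCPs
small = io_cpu_seconds(sios, excps)             # about 4.294 seconds

sios, excps = io_counts(10_000_000, 23000, 2)   # 218 SIOs, 435 EXCPs
large = io_cpu_seconds(sios, excps)             # about 0.3687 seconds

wasted = (small - large) / small                # about 91.4 percent
```

The same two functions apply to any candidate blocksize/BUFNO pair, which is how the daily savings in Table 42.27 were projected.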

The CPU cost equation was then applied to all QSAM activity for a day from two 3081Ks and a 3033 MP. Then two possible blocksize choices were compared: IBM's 6K recommendation or the optimum 23K half-track I/O (Table 42.27).

Table 42.27 Daily Cost Savings with Increased Blocksize

DISK
Resource            Present count   6144 blocksize   Half-track blocksize
EXCP (x 1000)       8190            2885             1004
SIO (x 1000)        1711            507              502
CPU seconds         3618            1141             785
CPU seconds saved                   2447             2833

TAPE
Resource            Present count   32760 blocksize
EXCP (x 1000)       10860           3254
SIO (x 1000)        2181            3254*
CPU seconds         4675            2724
CPU seconds saved                   1951

Total daily CPU saved (QSAM) = 4784 seconds

* Increased because the buffer was limited to one, whereas there is currently no limit on BUFNO.

COST EXTENSION BEYOND QSAM

If the true cost of an SIO from the QSAM analysis can be applied to all non-VSAM SIOs, the total processor seconds spent in I/O can be estimated. The daily SIO counts by access method were estimated from the measured EXCP count, and the QSAM SIO cost was applied to the estimate (Table 42.28).

Table 42.28 Estimated CPU Costs Attributed to I/O Operations

Access      EXCP          Estimated     Estimated
method      count         SIO count     CPU consumed
QSAM        19,050,000     3,892,000     8,293
BSAM         6,564,000     6,564,000     9,117
BISAM        2,050,000     2,050,000     2,847
EXCP         6,186,000     6,186,000     8,592
QISAM        1,635,000     1,635,000     2,271
BPAM         1,735,000     1,735,000     2,409
BDAM         3,684,000     3,684,000     5,117
SPOOL       25,484,000       509,000       707
I/O Total                 25,746,000    39,455

Total daily 3330 seconds recorded        340,000
Total daily MVS seconds recorded          92,349
Total daily TCB seconds                  215,793
Total daily SRB seconds                   31,858
Total daily TCB + SRB seconds            247,651

Non-VSAM I/O cost is 39,455/247,651 = 16%
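The CPU column of Table 42.28 follows directly from the regression coefficients. The reading sketched below (the 1.389 ms SIO cost applied to every access method, with the 151.6 microsecond per-block buffer cost added only for QSAM) is an inference from the table's numbers, not something the text states explicitly:

```python
SIO_COST = 0.001389    # CPU seconds per SIO (Table 42.26)
EXCP_COST = 0.0001516  # CPU seconds per block of QSAM buffer management

# (access method, EXCP count, estimated SIO count) from Table 42.28
DAILY_IO = [
    ("QSAM",  19_050_000, 3_892_000),
    ("BSAM",   6_564_000, 6_564_000),
    ("BISAM",  2_050_000, 2_050_000),
    ("EXCP",   6_186_000, 6_186_000),
    ("QISAM",  1_635_000, 1_635_000),
    ("BPAM",   1_735_000, 1_735_000),
    ("BDAM",   3_684_000, 3_684_000),
    ("SPOOL", 25_484_000,   509_000),
]

def estimated_io_cpu(method, excps, sios):
    """Daily CPU seconds attributed to I/O for one access method."""
    cpu = SIO_COST * sios
    if method == "QSAM":  # the per-block buffer cost appears only for QSAM
        cpu += EXCP_COST * excps
    return cpu

total = sum(estimated_io_cpu(m, e, s) for m, e, s in DAILY_IO)
```

Each row of the table matches this estimate to within a second of rounding, and the grand total against the 247,651 TCB + SRB seconds gives the 16% figure.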



OPTIMUM PERFORMANCE

The QSAM DASD experiments were graphically analyzed. Knowing the cost per I/O did not motivate users or management, nor did it answer the system programmer's concern for memory. Each of the nine resources was plotted against the blocksize for all runs. The number of buffers used for that run is printed at the observed intersection, allowing three-dimensional analysis in two dimensions. These nine plots show the range of each resource from maximum to minimum, as well as the shape of the trend.

Figure   Title                                       Variable   Ratio of maximum to minimum value
42.29    Elapsed Run Time                            ELAPSTM    32:1
42.30    Step CPU TCB Time                           CPUTCBTM   4:1
42.31    Step CPU SRB Time                           CPUSRBTM   23:1
42.32    Step Total CPU Time                         CPUTM      5:1
42.33    Physical I/O Operations                     SIO        70:1
42.34    Blocks of Data Transferred                  EXCPS      14:1
42.35    Real Memory Page Occupancy                  PAGESECS   10:1
42.36    Average Real Memory Working Set Size (K)    AVGWKSET   4:1
42.37    Private Area Virtual Size (K)               MAXADRSP   1.5:1

Figures 42.29 through 42.37 clearly show monotonic, dramatic reductions of resources as blocksize is increased.

The second set of figures shows three of the preceding resources versus data length per SIO (BUFNO times BLKSIZE). Here the effect of full-track I/O per SIO is clear.

QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: ELAPSTM (elapsed seconds) versus BLKSIZE, buffer number printed at each point.]

Figure 42.29 Elapsed Run Time



QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: CPUTCBTM versus BLKSIZE, buffer number printed at each point.]

Figure 42.30 Step CPU TCB Time

QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: CPUSRBTM versus BLKSIZE, buffer number printed at each point.]

Figure 42.31 Step CPU SRB Time



QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: CPUTM versus BLKSIZE, buffer number printed at each point.]

Figure 42.32 Step Total CPU Time

QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: SIO versus BLKSIZE, buffer number printed at each point.]

Figure 42.33 Physical I/O Operations



QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: EXCPS versus BLKSIZE, buffer number printed at each point.]

Figure 42.34 Blocks of Data Transferred

QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: PAGESECS versus BLKSIZE, buffer number printed at each point.]

Figure 42.35 Real Memory Page Occupancy



QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: AVGWKSET versus BLKSIZE, buffer number printed at each point.]

Figure 42.36 Average Real Memory Working Set Size (K)

QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: MAXADRSP versus BLKSIZE, buffer number printed at each point.]

Figure 42.37 Private Area Virtual Size (K)



Figure   Title                    Variable
42.38    Total CPU Time           CPUTM
42.39    Real Memory Occupancy    PAGESECS
42.40    Average Working Set      AVGWKSET

QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: CPUTM versus DATA LENGTH (BUFNO times BLKSIZE), buffer number printed at each point.]

Figure 42.38 Total CPU Time



QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: PAGESECS versus DATA LENGTH, buffer number printed at each point.]

Figure 42.39 Real Memory Occupancy

QSAM BLOCKSIZE AND BUFFER NUMBER IMPACT

[Plot: AVGWKSET versus DATA LENGTH, buffer number printed at each point.]

Figure 42.40 Average Working Set



IDENTIFYING PROGRAMS THAT SPECIFY DCB ATTRIBUTES

Implementation of optimal I/O always requires JCL changes. If the installation is wise and demands that all DCB attributes are specified externally to the program, then only JCL changes are required. However, since few installations have standards enforcement stringent enough for this to be assumed, implementation of optimal blocksize can be accelerated by using the DFDS OPEN EXIT described in the excellent Washington Systems Center Technical Bulletin GG22-9306-00, "Using Data Facility Device Support for DASD Space Management Assistance," by P. Henning.

The open exit presents two DCB areas: one is the unmodified user's DCB, and the other is open's DCB area with the JFCB merged into it. You can write your own code, executed in this exit, that examines the user's DCB area. If the DCB attribute BLKSIZE is specified in the user's DCB, then an SMF record can be written from the exit, identifying those programs that require recompilation before their files can be reblocked. Thus, by use of the DFDS OPEN EXIT, the installation can reblock data sets and guarantee that doing so will create no ABENDs. A write to programmer (WTO ROUTCDE=11) can also be issued from DFDS OPEN EXIT IFG0EX0B so that (if programmers read their SYSMSG) you can advise them that they are violating standards.

ALTERING BUFFER NUMBER IN DFDS OPEN EXIT

Although not yet tested at Sun Company, the use of the DFDS OPEN EXIT to alter the BUFNO parameter is under serious investigation. Even though the preceding analysis clearly shows that the true optimum requires increased blocksize, there is sufficient return in processor utilization alone to justify the programming and testing time to code this exit. In the exit, the number of buffers can be expanded to the correct number for full-track data transfer, without alteration of either the user's JCL or the user's programs. It is hoped that this work will be presented at a future meeting of SHARE. These additional references are useful for technical examples and discussion of the exit:

Data Facility/Device Support User's Guide, SC26-3952-0
Technical Newsletter to SC26-3952-0, SN26-0888
Search INFO/SYSTEM File A, Keywords DFDS EXIT

APPENDIX I

Assembler program written by Carol Toll to perform QSAM I/O

WRITELOOP ENTER REGEQUS=YES
          OPEN  (DD1,,DD2,(OUTPUT))
LOOP      GET   DD1,RECORD
          PUT   DD2,RECORD
          B     LOOP
EOF       CLOSE (DD1,,DD2)
          LEAVE
DD1       DCB   DDNAME=IN,MACRF=GM,DSORG=PS,EODAD=EOF
DD2       DCB   DDNAME=OUT,MACRF=PM,DSORG=PS
RECORD    DS    XL32760
          END



APPENDIX II

Evolution of the Workload Efficiencies Project at Sun Company

Gary Miley

An opportunity was perceived within Sun Company to improve the efficiency of resource utilization at the business data center. Management support initiated a project team to analyze and study the workload profile and to recommend and implement, where possible, actions to reduce resource consumption of data-processing services.

Initial project activities focused on the preceding analysis of access method I/O usage and performance. Analysis of various blocksizes and numbers of buffers for sequential processing supported the concept of full-track blocking on 3350 storage and half-track blocking on 3380 storage.

Results of the comparison showed a dramatic need to develop installation recommendations for both tape and disk data sets. The resulting blocksize recommendation for DASD sequential data sets was a compromise between the optimum and the actual data center data management environment of mixed device types (3350, 3380). The compromise for DASD was 9080, with no compromise on tapes at a 32,760 blocksize. Recognize that even the 9080 DASD blocksize yields near-optimum data transfer on 3380, with the QSAM BUFNO default of 5 buffers.

Further analysis of the SMF data revealed some surprises that would dictate a data set approach for Sun's workload: tape data sets should be the primary target for improving blocksize performance; the second category of data sets to review was temporary disk files (an analysis of proclibs would be in order here); and finally, permanent disk data sets should be moved to larger blocksizes.

The efficiencies project then confronted these issues: how to communicate recommendations to the user community and how to identify the best candidates for reblocking.

The issue of user communication was addressed by the following strategy. An on-line information base was created and referenced by an article in the corporate information systems periodic newsletter announcing the existence of the efficiencies project. After a pilot effort internal to the information systems function, personal visits and presentations would follow in the user organizations.

The issue of identifying reblocking candidates was addressed by the use of UCC's TMS product and Software Module Marketing's DMS/OS product to identify the number of accesses (opens) since data set creation. This strategy allows the information systems function to approach the user community with intelligent information that quantifies the benefit to the user in the form of reduced resource cost. The project acknowledged that not all owners of sequential data sets would increase blocksize to the project recommendation; to improve system performance for those data sets, a DFDS open exit module will be implemented to increase the number of buffers for sequential access. This exit would calculate the number of buffers required for full-track data transfer, subject to the SAM-E restriction of 30 buffers per SIO.
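The buffer calculation such an exit would perform can be sketched as follows. The function name is mine, the nominal 3380 track capacity of 47,476 bytes is assumed, and a production exit would also have to allow for inter-block gaps, which reduce how many blocks really fit on a track:

```python
import math

TRACK_3380 = 47_476   # nominal 3380 track capacity, bytes (gaps ignored)
SAM_E_MAX_BUFNO = 30  # SAM-E limit on buffers per SIO

def full_track_bufno(blksize, track=TRACK_3380):
    """Smallest BUFNO with BLKSIZE * BUFNO >= one track, capped at 30."""
    return min(SAM_E_MAX_BUFNO, math.ceil(track / blksize))
```

With the 9080-byte compromise blocksize, this gives 6 buffers (and 9080 times the default BUFNO of 5 is 45,400 bytes, just under a track, which is why the default is already near-optimal); small blocksizes hit the 30-buffer cap and cannot reach full-track transfer.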

To summarize Sun's approach: quantify and communicate to user organizations the benefits, in both user productivity and resource cost reduction, of improving the performance of sequential data access, and implement the DFDS OPEN exit to gain system improvements even when users fail to reblock their data sets. In either case, system performance gains will be realized.
