COMPUTER METRICS: MEASURING AND MANAGING THE PERFORMANCE, RESOURCES, AND COSTS OF LARGE COMPUTER SYSTEMS
ABSTRACT
Dr. H.W. Barry Merrill
Merrill Consultants
Managing and measuring the performance, resources, and costs of a large complex of central and distributed computers with distributed interactive users requires the establishment of repeatable, meaningful, and measurable service objectives (such as interactive response time, batch turnaround, and availability). It requires accurate measures of resource consumption (processor time, I/O device activity, real memory utilization) by which the capacity to meet those service objectives can be acquired in the most timely fashion and at the most cost-effective rate. A case study describes a cost-effective measurement system that captures and analyzes data and answers typical management questions with interactive graphical reporting. The configuration and workload of the computer system are also described.
INTRODUCTION
Computer metrics, often called computer performance evaluation, capacity planning, or system tuning, is an emerging technology in large, multi-user computer systems. Motivated by the conflicting goals of increased user productivity and reduced costs, the data center management of these large systems requires hard data to justify requests for resource acquisition. Most of the research to date has been ad hoc and unique to each specific computer installation. Reference (1) has been reviewed as the landmark reference in the field; references (2) and (3) contain 164 of the best current papers ranging from tutorial to technical; and reference (4) describes SAS software.
This paper describes a case study of an approach to computer metrics used in over 1400 installations worldwide. The paper, which is also a brief tutorial on the subject, first describes goals and methods of the technology and then quantifies the environment of the case study. The capabilities and costs of the measurement systems are described, and examples of graphical management reporting are discussed. The cost of the measurement system is shown to be effective, and the conclusions are presented.
GOALS AND METHODS
The basic problem in large installations is the management of shared resources, balanced by the service requirements of users of this network. The most important facet of the solution is the establishment of service objectives. Only when the supplier of computing has quantified the service to be delivered in a measurable fashion can the users of computing evaluate services received relative to the cost of computing.
To be successful, service objectives must:
- be measurable
- be repeatable
- be understandable by the typical user
- correlate directly with the user's perception of service
- allow reflection of true exception conditions
- be directly controllable by the resources applied by the computer installation.
Successful service objectives for the four most important subsystems are described in Table 42.1.
A key additional ingredient in these objectives is the manner in which they are expressed. The use of average (mean) values has been found to be quite misleading. In general, the mean is an unsatisfactory expression of a service objective. Since the primary purpose of the service objective is to allow the supplier to communicate service to the user, the metric used must be human oriented. Mean values, in spite of their strong mathematical heritage, do not relate to human perceptions. A human wants to know what happens most of the time. The average value, which is the sum of all observations divided by the number of observations, is never actually observed by the user. By recognizing this need in computer service objectives and by expressing objectives as frequency of occurrence ("94% of the time this will happen"), we have found not only a metric that relates to the user, but also one that meets the other criteria for effective service objectives. Specific techniques for establishing service objectives are addressed in (1). The actual measure of service used (internal response, turnaround, queue time, and so forth) is dependent on the hardware and software architecture that provides the computing and is a function of the nature and purpose of the specific computing installation. The method of exposition (percentage of occurrences meeting a stated goal), however, appears almost invariant in well-managed computing facilities.
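The difference between a mean-based metric and the frequency-of-occurrence metric recommended above is easy to demonstrate. The sketch below is illustrative only: the installation's actual reporting is written in SAS software, and the sample response times here are invented.

```python
# Illustrative only: compare a mean-based metric with the
# frequency-of-occurrence metric described above.
# The response times below are invented sample data.

def pct_meeting_goal(times, goal):
    """Percentage of observations at or under the goal."""
    return 100.0 * sum(1 for t in times if t <= goal) / len(times)

# 18 snappy interactions and 2 very slow ones (seconds)
response_times = [1.0] * 18 + [40.0, 60.0]

mean = sum(response_times) / len(response_times)   # 5.9 s -- looks terrible
met = pct_meeting_goal(response_times, 4.0)        # 90 % -- matches what users saw

print(f"mean response: {mean:.1f} s")
print(f"met 4-second goal: {met:.0f} % of the time")
```

The mean is dominated by two outliers no single user experienced repeatedly, while the percentage metric states plainly what happened most of the time.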
Table 42.1 Service Objectives

Subsystem  Measure                                                       Goal

batch      Percentage of jobs meeting requested IWT                      94 %
           (Initiation Wait Time). Users submit jobs requesting
           initiation wait times of 15 min., 30 min., 1 hour,
           2 hours, or 4 hours. Time to initiate is measured by SMF.

TSO        Trivial transactions meeting 4-second internal response.     92 %
           TSO/MON name table defines trivial. Internal response
           measured by TSO/MON.

IMS        Queue met expected response.                                 95 %
           Service time met expected response.                          98 %
           CONTROL/IMS measures input queue time and service time
           separately. Expected queue time is calculated based on
           transaction class or priority. Expected service time is
           calculated based on resources measured by CONTROL/IMS.

CICS       FAST transactions met 4-second response. Internal            92 %
           response measured by PAII. Transactions classified as
           FAST if AMCT (I/O count) is less than five and
           transaction name is not in a table of "bad guys".
Therefore, capturing the service and resource data becomes a crucial element in managing the facility. Without measurable service objectives, when users complain, perhaps the wrong resources are expanded. By correlating service delivered with resources consumed, however, the limiting resource can be identified and options evaluated for cost-effectiveness. Additional resources can be acquired, the application can be rescheduled to a time when that particular resource is plentiful, or the application can be redesigned in light of the limited resource.
Measuring and managing service objectives is necessary for system tuning to identify and eliminate bottlenecks to performance. Management is not interested in the raw power of the configuration to process data but, rather, in knowing the capacity in terms of how much work can be delivered while meeting service objectives. This is called goal level capacity. The specific techniques described in (1) are summarized below.
Analysis of capacity by workload measurement requires these preconditions:
- The system must be tuned. Known bottlenecks to performance have been eliminated and the I/O configuration has been implemented to minimize contention. An untuned system displays erratic response, causing inaccurate capacity measurement.
- Work must execute when needed by the user. The shape of the workload represents real demand required by the business and not an artificial shape created by the supplier's arbitrary resource or scheduling constraints. A batch scheduling system that relates directly to users' requests based on timeliness guarantees this condition. Batch scheduling systems based only on resource requirements generally fail this test since they place arbitrary constraints on when various classes of work are actually executed.
With these preconditions met, hourly resource data are analyzed. Since not all resource utilization is accurately attributed to the workload, linear regression is used to distribute the unattributed (but measured) overhead to the workloads that generated that overhead. The service objectives achieved during each hour are then plotted against workload to measure the knee of the response curve and to quantify the relationship between work and service. The initial result is the hourly capacity in work units per hour of the system (hardware, software, memory, and I/O configuration-dependent) to deliver work and concurrently meet service objectives.
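The overhead-distribution step can be sketched as an ordinary least-squares fit of measured overhead against hourly workload volumes. This is only an illustration of the idea, not the paper's SAS implementation; the two workload names, the hourly samples, and the no-intercept model are all invented for the example.

```python
# Sketch (not the installation's SAS code): distribute measured but
# unattributed overhead to the workloads that generated it, by
# least-squares regression of hourly overhead on hourly work volumes.

def fit_two_workloads(hours):
    """Least squares for overhead = a*batch + b*tso (no intercept).
    hours is a list of (batch_units, tso_units, overhead) tuples."""
    sxx = sum(b * b for b, t, o in hours)
    syy = sum(t * t for b, t, o in hours)
    sxy = sum(b * t for b, t, o in hours)
    sxo = sum(b * o for b, t, o in hours)
    syo = sum(t * o for b, t, o in hours)
    det = sxx * syy - sxy * sxy
    a = (sxo * syy - syo * sxy) / det
    b = (sxx * syo - sxy * sxo) / det
    return a, b

# Hourly samples: (batch work units, TSO work units, overhead CPU min)
samples = [(100, 50, 17.5), (80, 120, 26.0), (150, 60, 24.0), (60, 200, 36.0)]
a, b = fit_two_workloads(samples)

# Overhead attributed back to each workload in one sample hour:
batch_units, tso_units = 100, 50
print(f"batch overhead share: {a * batch_units:.1f} CPU min")
print(f"TSO overhead share:   {b * tso_units:.1f} CPU min")
```

Each hour's unattributed overhead is thereby charged back to workloads in proportion to the work they presented, so total resource figures per workload become complete rather than understated.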
The extension from hourly capacity to daily prime shift capacity is accomplished by first plotting the actual hourly workload profile, hour by hour of prime shift. The shape of the profile is preserved, and that curve with the same shape is raised until the peak value of the profile for any hour equals the hourly capacity value. Integrating under the raised curve then provides the real daily capacity. This technique simply redefines real capacity in terms of the present configuration and the present demand by users. It is a stable measure unless the configuration is changed (by adding resources or by changing system or application software) or the demand profile changes. The shape is usually constant unless personnel's working hours are changed.
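The raise-the-profile computation above reduces to a few lines. The sketch below assumes an 11-hour prime shift and invented hourly work-unit figures; the actual analysis was done with SAS software.

```python
# Sketch of the goal-level daily-capacity calculation described above:
# preserve the shape of the observed hourly profile, raise it until its
# peak hour equals the measured hourly capacity, then sum (integrate)
# under the raised curve.  All numbers are invented for illustration.

def daily_capacity(hourly_profile, hourly_capacity):
    scale = hourly_capacity / max(hourly_profile)
    raised = [h * scale for h in hourly_profile]
    return sum(raised)

# Observed prime-shift workload, work units per hour (11 hours)
profile = [40, 55, 70, 85, 90, 80, 75, 85, 95, 70, 50]
cap_per_hour = 100  # work units/hour at which service goals are still met

print(f"daily goal-level capacity: {daily_capacity(profile, cap_per_hour):.0f} work units")
```

Because the observed shape is preserved, the result is the most work this user population could present in a day while every hour still meets service objectives.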
DESCRIPTION OF THE CONFIGURATION AND WORKLOAD

The Sun Company Information Systems Division manages a computer systems network with an annual budget of $36,000,000 for hardware, systems software, communications equipment, and personnel.

Figure 42.2 describes the 5 central processors that support this network and the associated paging subsystems for the virtual memory systems. The 5 processors share 11 direct-access storage devices of SPOOL that are used for staging and exchanging jobs' input and output between systems. Once selected for execution by the job entry system (JES), the job executes completely on 1 of the 5 processors, placing its printed output on the SPOOL. When the job is completed, output is transmitted to the remote locations for printing or display. Users communicate interactively through the telecommunications network to the processor that hosts their particular application.

Figure 42.2 Configuration and Description of the Five Processors

NOS/BE
  Processor: CYBER 760; MIPS Rate: 10; MIPS Capacity: 10; I/O Channels: 4; Central Memory: 262K; Extended Memory: 1000K

MVS Batch
  Processor: IBM 3033MP; MIPS Rate: 5; MIPS Capacity: 10; I/O Channels: 16; Real Memory: 28MB; Paging Memory: 850MB; Paging Devices: 10

MVS On-line
  Processor: IBM 3081G; MIPS Rate: 5; MIPS Capacity: 10; I/O Channels: 24; Real Memory: 32MB; Paging Memory: 750MB

MVS TSO
  Processor: IBM 3081K; MIPS Rate: 7; MIPS Capacity: 14; I/O Channels: 24; Real Memory: 32MB; Paging Memory: 700MB; Paging Devices: 14

VM/CMS
  Processor: IBM 3033U; MIPS Rate: 5; MIPS Capacity: 5; I/O Channels: 16; Real Memory: 16MB; Paging Memory: 100MB; Paging Devices: 2

Shared: 11 Spool Devices, 3500 MB

Note: 1 MB = 1024 x 1024 bytes = 1,048,576 bytes
Table 42.3 I/O Configuration - MVS Systems

TAPE DRIVES

Quantity  Type          Transfer Rate (kilobytes per second)
23        3420-Model 8  1250
40        3420-Model 4  470

ON-LINE DISK VOLUMES

Quantity  Type    Transfer Rate            Storage Capacity
                  (kilobytes per second)   per volume (megabytes)
60        3330-1  806                      100
400       3350    1198                     317
12        3380    3000                     630

Total On-Line Disk Storage Megabytes: 140,360
Table 42.3 describes the I/O configuration shared by the three processors using the multiple virtual storage operating system, which accounts for 90% of the workload. Tape drives are fully shareable among processors, with software allocating each drive to a single task when needed. Disk drives are fully shareable among processors so that the failure of a processor does not prevent access to data on a particular disk. However, to minimize contention delays, data on a single disk are application specific; that is, the disk contains data only for a specific application, such as TSO. The path (the control unit and channel) is logically isolated to the processor in which that application normally executes. (Although there always exist some data, such as catalogs, that must be shared, logical isolation is the design objective in the placement of data and is maintained to the highest degree possible.)
Table 42.4 quantifies the telecommunications environment that supports the 4099 terminals that connect to the processor complex. Although host processors are located in two sites in Dallas, Texas, users of the network are located all across the United States and Canada, with heavy concentrations in Dallas, Philadelphia, California, and Illinois.
Table 42.4 Telecommunications Environment

COMMUNICATIONS CONTROLLERS

Quantity  Type    Memory (kilobytes)  Protocol       Lines per controller
4         3705    512                 SDLC           60
          Comten  512                 Bisync, Async  240

TELECOMMUNICATIONS LINES

Quantity  Line speed (bits per second)
8         56000
150       9600
203       4800
102       300/1200
Table 42.5 describes the workloads and systems that execute on the three MVS processors. The acronyms may be unfamiliar, so a brief description follows:
Initiators - one batch job owns an initiator during its execution.
IMS - on-line system used for data base inquiry and update, especially with complex data.
CICS - on-line system used for update and inquiry that is simpler but faster than IMS.
TSO - interactive system used heavily for program development and execution of management (end-user oriented) decision-support systems.
WYLBUR - edit and submit interactive system.
ADABAS - data base manager accessed by TSO and batch users.
VTAM - the primary terminal access manager.
JES - job entry subsystem; controls batch and all printing.
Table 42.5 MVS Workloads

3033-MP Batch        3081-D TSO        3081-G On-line
40 Initiators        250 TSO Users     Production IMS Control
Test IMS Control     3 WYLBURs         8 IMS Message Regions
2 Test IMS Regions   4 ADABAS Nuclei   3 Production CICS
2 Test CICS          Backup IMS        VTAM Applications:
Order Entry          TCAM                VTAMPRNT
Table 42.6 quantifies daily workload executed on the three systems. These counts of tasks and concurrent users clearly describe a very large system; there are about fifty installations of similar size in the United States alone.
Table 42.6 Daily Workload Volumes

MVS
  Batch Steps                22365
  CICS Transactions         223510
  IMS Transactions           98436
  TSO Prime Transactions    147719
  Concurrent TSO Users         213
  Concurrent IMS Users         500
  Concurrent CICS Users        476
  Concurrent WYLBUR Users       10
  Concurrent JES2 Remotes      147

CMS
  Session Intervals           2123
  Concurrent Users              28

CYBER
  Batch Jobs                   592
  Time-Sharing Sessions         80
Table 42.7 quantifies distribution of the budget at a high level. Only the staff that actually operates and manages the hardware and software described before (approximately 150 people) is included in the personnel cost. Application programmers and end users of these systems are excluded from this figure.
Table 42.7 Cost Distribution

Salaries                           23 %
Local taxes                         2 %
Electric power                      2 %
Software rental                     2 %
Maintenance of facilities           7 %
IBM CPUs and channels              12 %
CYBER CPU, disk                    10 %
Voice network                       9 %
Disk drives and controllers         8 %
Dedicated lines                     6 %
Dial-in lines                       4 %
Tape drives and controllers         3 %
3705 communications controllers     2 %
Modems and so forth                 2 %
Miscellaneous                       8 %

Annual Budget $ 36,000,000
CAPABILITIES AND COST OF THE MEASUREMENT SYSTEM

To manage an installation of this size, we have found that it is not only mandatory to measure service and resources, but that it can be done in an extremely economical fashion, provided some intelligent choices are made. Table 42.8 quantifies volumetrics of the performance data produced that are thought to be required for effective management and measurement in this facility. (A volumetric is a generic term for data elements that describe service or resources.)
Table 42.8 Daily Record Volumes

Source               Record count   Average record length   K bytes of data
MVS SMF                    780564   248                     194214
PAII                       223510   88                      19668
CYBER Dayfile               10131   80                      810
IMS Account Cards           13391   80                      1071
More specifically, detailed event records written on the MVS systems (Table 42.9) show both the quantity and quality of the data that are automatically created by the operating system's accounting, resource measurement, and service measurement routines. In spite of the breadth of vendor-created event records, we have found it necessary to use the operating system exit facilities to create the additional event records listed in Table 42.10.
Table 42.9
Systems Management Facility (SMF)
Vendor-Created Records Written Daily

Type        Description      Logical records   K bytes   SAS observations
Type 0      Sys startup                    1         1
Type 2,3    Dump SMF                      12        12
Type 4,34   Step term                  22365      7092              25302
Type 5,35   Job term                    8200      1286               8200
Type 6      File print                  8035       824               7474
Type 7      Lost SMF data                  0         0
Type 14     Input file                129115     38310
Type 15     Output file                89968     25981
Type 17     Scratch                    11885      1140
Type 18     Rename                       104        14
Type 20     Initiation                  8759       930               2338
Type 21     Tape mount                  8172       360               8172
Type 26     Job purge                   9316      3205               8257
Type 30     Workload                   45031     34057
Type 40     Allocation                115736      8711              44252
Type 47-48  RJE Ses                      970        69               1116
Type 50     VTAM buffers                 314        18
Type 52-53  RJE Ses                      574        42
Type 62-69  VSAM open                  31338      8656
Type 70     RMF CPU                       75        47                150
Type 71     RMF paging                    75        31                 75
Type 72     RMF workload                8475      1686               2045
Type 73     RMF channels                  75       154               2751
Type 74     RMF DASD I/O                 150      3442              14548
Type 75     RMF Page I/O                 788       123                788
Type 90     Operator acts                  8
Type 110    CICS trans                    49      1128               6648

Record Totals                         499584    137370
Table 42.10 Systems Management Facility (SMF)
Installation-Created SMF Records Written Daily

Type        Description       SMF record count   Total SMF K bytes   SAS observations
Type 129    Job initiation                8205                1148               8205
Type 130    Interval                       199                 105
Type 175    VTAM terminal                45334                1587              31794
Type 201    Security                      1749                 182
Type 210    WYLBUR session                 195                  21                195
Type 214    Archival                     14199                1349               9713
Type 217    TSO/MON system                 653                1730               1560
Type 218    TSO/MON call                  2777                 690               2353
Type 225    JES operator                  4569                 258
Type 229    Tape mount                    7529                 482               7529
Type 230    Audit tape                     166                  18                166
Type 231    Application                     20                                      20
Type 249    IMS program                  68367               12852              68367
Type 250    IMS transaction              98436               34156              98436
Type 254    ADABAS trans                   876                 140                876

Installation Totals                     253953               54723
These event records are written to the System Management Facility (SMF) file as their events occur. However, to measure response or resources requires that these event records be processed and synthesized into humanly perceived events, such as command response time, program memory usage, processor active time for a computer, and so forth. Conversion from raw SMF data to information is a complex software problem because of the complexity of the possible event records that might occur and because of the variety of data formats in the records written by the operating system. The solution was made feasible by a high-level language system, the SAS System (4), that is powerful enough to handle this variety of data forms and is so efficient in processing this large volume of data records that it is the de facto standard language for processing SMF data. The algorithms (1) that map the raw data to information are written in SAS software. The data are in use at some 1400 installations worldwide, and they are referred to as the PDB (performance data base, after their end product, a SAS data library of information and reports). The execution resources to process the 200,000 kilobytes of daily event records into the PDB are quantified in Table 42.11. Clearly, this processing of over 1 million records (typically 200 bytes long) daily in 60 CPU minutes per day demonstrates the remarkable power of the SAS System and the PDB algorithms.
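The synthesis step can be illustrated with a toy sketch. The real PDB is built with SAS software and the algorithms of (1); the Python fragment below, with invented record layouts and field names, only shows the idea of combining separate event records into one humanly perceived record per job.

```python
# Illustrative sketch only: the actual PDB is built with SAS software.
# Combine separate event records (initiation, step termination, purge),
# keyed by job name, into one summary record per job.
# Record layouts and field names here are invented.

from collections import defaultdict

raw_events = [
    {"type": "init",  "job": "PAY001", "time": 100},
    {"type": "step",  "job": "PAY001", "cpu_sec": 12.5},
    {"type": "step",  "job": "PAY001", "cpu_sec": 30.0},
    {"type": "purge", "job": "PAY001", "time": 460},
]

jobs = defaultdict(lambda: {"cpu_sec": 0.0, "init": None, "purge": None})
for ev in raw_events:
    rec = jobs[ev["job"]]
    if ev["type"] == "init":
        rec["init"] = ev["time"]       # when the job started executing
    elif ev["type"] == "step":
        rec["cpu_sec"] += ev["cpu_sec"]  # accumulate CPU over all steps
    elif ev["type"] == "purge":
        rec["purge"] = ev["time"]      # when output left the system

for name, rec in jobs.items():
    elapsed = rec["purge"] - rec["init"]
    print(f"{name}: {rec['cpu_sec']:.1f} CPU sec, {elapsed} sec in system")
```

The real problem is harder than the sketch suggests chiefly because of the dozens of record types and data formats involved, which is exactly why a high-level language was decisive.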
Table 42.11 Monthly Resource Costs to Build the Performance Data Base

Job description                          Total monthly 3081-D CPU minutes
CYBER system - total                                                  46
VM system - total                                                     16
MVS systems - total                                                 1760

Major subsystems within MVS:                                     CPU min
Dumping accounting data to tape                                      476
Build daily PDB from accounting data                                 992
Daily reports and backups                                            154
Weekly reports                                                        73
Monthly reports                                                       37
Build customer-splitout data base                                     28

Grand total of all systems                                         1,822

These 1,822 CPU minutes per month are equivalent to only 60 minutes per day.
Goal level capacity analysis described earlier was performed on these three MVS machines. That study found the prime-time (11-hour shift) goal level capacity to be 2,271 CPU minutes deliverable to batch each day, with batch service goals being met, or a total monthly (prime and nonprime) capacity of 148,653 CPU minutes. If we compare the total cost of 1,822 minutes monthly to build the PDB and to execute all the performance and capacity analysis reports with the configuration's monthly capacity, the cost of executing the total PDB measurement and reporting system represents only 1.2% of the total capacity! Not only does this demonstrate cost-effectiveness of the measurement system, but since the daily running can be scheduled in the least busy time of day, true cost is essentially zero because capacity cost is set by peak time requirements.
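The 1.2% and 60-minutes-per-day figures follow directly from the numbers quoted; the short check below simply reproduces that arithmetic (the 30.4 days-per-month average is an assumption of this sketch).

```python
# Arithmetic behind the cost-effectiveness claim above.
pdb_monthly_cpu_min = 1822          # monthly cost to build PDB and reports
monthly_capacity_cpu_min = 148653   # total monthly goal-level capacity
days_per_month = 30.4               # average days per month (assumed)

pct_of_capacity = 100.0 * pdb_monthly_cpu_min / monthly_capacity_cpu_min
per_day = pdb_monthly_cpu_min / days_per_month

print(f"measurement overhead: {pct_of_capacity:.1f} % of total capacity")
print(f"daily cost: {per_day:.0f} CPU minutes per day")
```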
EXAMPLES OF MANAGEMENT REPORTING

The presentation of tables of numbers is often useful during analysis and is appropriate for presentations to technical audiences. However, communication of computer capacity and performance measures to senior, nontechnical management requires graphical presentation. ("A picture is worth a thousand words.") The graphical capabilities of the SAS System make it simple to create a graphical display of the performance data base data.
The graphs are used to show management the quality of service delivered to the computing users. The use of capacity is also mapped to the internal business organizations that consumed resources. Management can then determine if the business purpose served by a part of the company justifies that part's consumption of computer capacity.
The performance data base contains trend data for each week for the past several years. This permits simple graphical display versus time so that not only are the current service and resource consumption measured and managed (tactical performance management), but also the long-range trends (strategic capacity planning) are tracked.
Management of many businesses is based on monthly data, but we have found the week a far more stable measure. Not only does weekly reporting provide more timely precursors of trouble, but also the trends are significantly more accurate and robust since each week has the same number of days. Monthly resource data points suffer from too much variability because there are as few as 18 and as many as 23 working days in a month. Even weekly data must be cleaned of outliers before mathematical analysis is applied; the 6 weeks during which major holidays occur in the U.S. must be deleted from analysis if typical trends are to be observed.
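The outlier-cleaning step above can be sketched as dropping the known holiday weeks before fitting a trend line. The week numbers, workload values, and the plain least-squares trend fit below are all invented for illustration; the installation's analysis is done in SAS.

```python
# Sketch of the weekly trend hygiene described above: delete the
# holiday weeks, then fit a trend to what remains.
# Week numbers and workload values are invented.

def clean_weeks(weekly, holiday_weeks):
    """Drop (week, value) points falling in known holiday weeks."""
    return [(w, v) for w, v in weekly if w not in holiday_weeks]

def linear_trend(points):
    """Ordinary least-squares slope and intercept over (week, value)."""
    n = len(points)
    sx = sum(w for w, v in points)
    sy = sum(v for w, v in points)
    sxx = sum(w * w for w, v in points)
    sxy = sum(w * v for w, v in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

weekly = [(w, 1000 + 25 * w) for w in range(1, 13)]  # steady growth
weekly[6] = (7, 400)                # holiday week: workload collapses
slope, _ = linear_trend(clean_weeks(weekly, {7}))
print(f"weekly growth: {slope:.1f} work units per week")
```

Left in the data, the holiday week would drag the fitted slope well away from the true growth rate; removed, the trend reflects typical demand.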
Using this weekly trend data, management is kept informed of the productivity and efficiency of the computing facility. The following graphs are a subset of the graphs used in presentations to senior management and are available on-line to all levels of management. The manager simply logs on to the TSO application with his CRT terminal, enters a single command, and is presented with a menu from which graphs are selected. Graphs are created on-line in response to the menu selection. Each graph requires approximately 10 seconds elapsed time for creation and transmission on a 9600 BPS line.
Management first asks two questions of the computer facility: how good was the service, and how much work did we deliver? Figures 42.12 through 42.15 answer these quality-of-service questions:
Figure 42.12 CICS (Prime Time) Performance. Percentage of fast CICS transactions in prime time (7 a.m. to 6 p.m.) that received internal response time of less than 4 seconds is tracked. The step decrease in August 1981 was the result of a change in response measurement due to software maintenance. The step increase in October 1982 was the result of additional resources (a new CPU).
Figure 42.13 TSO (Prime Time) Performance. Percentage of fast TSO transactions in prime time that received internal response time of less than 4 seconds is tracked. In spite of a substantial growth in number of users, tactical management of resources has kept TSO response very stable. Without measures of actual response, tactical movement of work to other processors and the incremental addition of resources (especially memory to meet growth) would not have been possible.
Figure 42.14 Batch (Prime Time) Performance. Percentage of batch jobs that met user-requested initiation wait time (IWT) of 15 minutes is seen to be very consistent except for 5 weeks. The failed weeks correspond to over-capacity weeks because outages usually reduced available capacity. Even bad weeks show 95% satisfaction for this critical 15-minute IWT category, which accounts for over half of prime time batch work.
CICS (PRIME TIME) PERFORMANCE
PERCENTAGE OF FAST TRANSACTIONS RESPONDING IN LESS THAN 4 SECONDS
[Weekly line plot; y-axis from 90 to 100 percent.]
Figure 42.12
TSO (PRIME TIME) PERFORMANCE
PERCENTAGE OF FAST TRANSACTIONS RESPONDING IN LESS THAN 4 SECONDS
[Weekly line plot; y-axis from 90.0 to 100.0 percent.]
Figure 42.13
BATCH (PRIME TIME) PERFORMANCE
PERCENTAGE OF JOBS MEETING REQUESTED 15 MINUTE IWT
[Weekly line plot; y-axis from 85.0 to 100.0 percent.]
Figure 42.14
Figure 42.15 IMS (Prime Time) Performance. The IMS transaction service time goal shows service degrading until September 1982, when tactical management applied more resources to support IMS, with the resultant step increase in service. Service remained stable for 4 months until a new application again overloaded IMS, and resources were acquired to restore service levels.
IMS (PRIME TIME) PERFORMANCE
PERCENTAGE OF RESPONSES THAT MET SERVICE TIME GOALS
[Weekly line plot; y-axis from 90 to 100 percent.]
Figure 42.15
Management concern with operating system overhead (which is not usually directly billed but is distributed through the pricing mechanism of the computer facility) is addressed in Figures 42.16 and 42.17, shown in the color section.
Figure 42.16 Prime Time Hardware CRU Totals. The unit of work, the computer resource unit (CRU), is a composite of CPU time and I/O counts, weighting the processor time more heavily. The three lines show the total growth of work (top), work directly measurable and attributable to a task (middle), and the difference in total and identifiable work (bottom). Thus, the bottom line measures the operating system overhead that is not directly attributable to individual tasks. Downward spikes in the three graphs are the decrease in workload during weeks with a major holiday.
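A composite work unit of this kind is straightforward to compute once weights are fixed. The paper states only that CPU time is weighted more heavily than I/O counts and does not give the coefficients, so the weights, task names, and figures below are invented for illustration.

```python
# Sketch of a computer-resource-unit (CRU) composite.  The text above
# says CRU weights processor time more heavily than I/O counts but does
# not give the weights; the coefficients below are invented.

CPU_WEIGHT = 10.0   # CRU per CPU second (assumed)
IO_WEIGHT = 0.01    # CRU per I/O operation (assumed)

def cru(cpu_seconds, io_count):
    """Composite work unit from CPU time and I/O activity."""
    return CPU_WEIGHT * cpu_seconds + IO_WEIGHT * io_count

# Invented sample tasks: (name, CPU seconds, I/O count)
tasks = [("batch job", 120.0, 40000), ("TSO session", 15.0, 2500)]
for name, cpu, io in tasks:
    print(f"{name}: {cru(cpu, io):.0f} CRU")
```

A single composite unit lets dissimilar workloads (CPU-bound batch, I/O-bound on-line) be summed, trended, and billed on one scale.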
Figure 42.17 Plot of Overhead and Identified Hardware CRU. Data of Figure 42.16 are expressed as percentages of total work. The middle line reflects identified work and the bottom reflects operating system overhead. Excluding a brief spike in 1981, which was due to improper installation of software maintenance, the system overhead has remained a consistent percentage of total work, as a well-designed operating system should.
Management is concerned with overall capacity while meeting these goals.
Figure 42.18 Hardware CRU Growth Toward Capacity. Total work delivered is plotted against the installed capacity in absolute computer resource units.
HARDWARE CRU GROWTH TOWARD CAPACITY
PRIME SHIFT ONLY
[Weekly line plot of total CRU delivered versus installed capacity.]
Figure 42.18
Figure 42.19 Weekly Percentage of Current Prime Capacity. Work delivered is expressed as a percentage of installed capacity.
WEEKLY PERCENT OF CURRENT PRIME CAPACITY
[Weekly line plot; y-axis in percent of installed capacity.]
Figure 42.19
Figures 42.18 and 42.19 show that capacity was exceeded for a 7-week period in 1982. This was known in advance, and management chose to ride out the capacity shortage until the new processor was installed because of financial pressure at that time. Because they received advance notice of the decision to ride out, users cooperated, and complaints were minimal.
Management wanted to know the relative usage of prime and nonprime shifts.
Figure 42.20 Distribution of Sun Company Batch by Shift. Percentage of total CRU (nonprime slightly higher than prime) is plotted, showing a consistent distribution. Unfortunately, management had hoped to persuade customers to migrate from prime to nonprime time during this period. This plot shows that the particular incentive pricing strategy chosen was ineffective since there was no significant change in the percentage of work in nonprime time. A new incentive pricing was then chosen, which did work.
DISTRIBUTION OF SUN COMPANY BATCH BY SHIFT
[Weekly line plot of percent of total; legend: ALL BATCH CRU, PRIME BATCH, NON-PRIME BATCH.]
Figure 42.20
Management needs to know how capacity is being used. What categories of work use what portion of the capacity? What business elements (regions or companies) are driving the growth in usage? How are individual companies within the organization using the computing facility? These types of questions are answered in the following graphs, shown in the color section.
Figure 42.21 Overall Category Breakouts. Percentage of work is distributed by the six major workload categories (batch, TSO, CICS, ADABAS, IMS, and non-IWT scheduled batch), in relative order from top to bottom, showing a slight decrease in batch and a slight increase in TSO work.
Figure 42.22 Overall Regional Breakouts. Total CRU is distributed by the seven major business elements of the Sun Company (Corporate Financial Division, Exploration and Production, Human Resources, Information Systems Division Staff, Network Services, Refining and Marketing, and Sun Information Commercial Services).
Figure 42.23 Regional Breakout by Work Type and Shift. Total CRU of a particular region, SIS, is further plotted as CRU to expose the type of work and relative growth within this business element. This level of detail is necessary in order to project future requirements from past history. It allows planning for the capacity change that will occur when SIS leaves the network in 1985.
Figure 42.24 Category Breakout by Region and Shift. This is the inverse of the prior graph. A work category, CICS, is decomposed into the business elements that consume CRU in the CICS subsystems, and highly dissimilar patterns are seen for different divisions.
Figure 42.25 Regional Percentage Breakout by Work Type and Shift. This graph shows the usage of one business element, E&P Company, by work type. The upper lines separate batch and TSO. The double-humped bottom line is most interesting because it demonstrates a failed application project. A large application was designed and tested (the first and smaller peak). The commitment was made to implement based on this test, but the actual execution costs were not evaluated during the testing phase. As the system entered production, its resource consumption grew (the second and larger peak), and the design was recognized as too expensive to execute and was terminated. This graph was effective in instituting a new policy requiring that execution costs be considered an inherent part of new applications during their testing phase. No table of data was as effective as this graph.
SUMMARY
This paper has shown that a wide range of management questions can be easily answered with data in the performance data base, which is produced from accounting and performance records created by the operating system. As a brief introduction to computer metrics, the goal was not so much to present specific results as to demonstrate management's need for these analyses and to show that the analysis can be done with minimal expense. The use of a common data source for daily performance evaluation and management, system tuning, billing, and capacity planning has significantly reduced conflicts between operating groups that heretofore reported their own data in their own fashion. By managing the system based on measured service objectives, user and supplier have both agreed on what constitutes acceptable service, and the supplier can manage resource acquisition to ensure that acceptable service is delivered.
Perhaps the best demonstration of the value of computer metrics comes from the continued use of the PDB at Sun Company since 1976 as the single source of data for managing service, capacity planning, and cost recovery of resources while that data center grew from two 2.5 MIP processors to the present 8 CPUs, totaling over 37 MIPs, without any increase in the 5-person staff of the computer metrics group.
Figure 42.16 Prime Time Hardware CRU Totals. [Plot: weekly identifiable, overhead, and total CRUs during prime time, June 1982 through June 1984.]
Figure 42.17 Plot of Overhead and Identified Hardware CRU. [Plot: identifiable, overhead, and total CRUs as percentages of the total, by week, June 1982 through June 1984.]
Figure 42.21 Overall Category Breakouts. [Plot: percent CRU by week, 31DEC82 through 16DEC83; legend: TOTAL, BATCH IWT, TSO, ADABAS NUC, CICS, ERRORS, IMS, ONLINE.]
Figure 42.22 Overall Regional Breakouts, SHIFT=PRIME. [Plot: total CRU by week, 12/31/82 through 01/10/84; legend: TOTAL, CFD, CNF, E&P, ERRORS, ISDOVR, NS, R&M, SIS.]
Figure 42.24 Category Breakout by Region and Shift. [Plot: percent of CICS CRU by week, 31DEC82 through 16DEC83; legend: TOTAL, CFD, E&P, ERRORS, NS, SIS.]
REFERENCES
Because research in computer metrics has been primarily ad hoc, pragmatic, and specific to an installation's needs, references are found primarily in the annual proceedings of the Computer Measurement Group or in the proceedings of user groups of specific vendors' hardware (such as IBM's SHARE and GUIDE, DEC's DECUS, and CDC's VIM).
1. Merrill, H.W. "Barry" (1980), Merrill's Guide to Computer Performance Evaluation, Cary, NC: SAS Institute Inc., 336 pp.
2. Dodson, George W., et al. (1983), Proceedings of the 1983 Computer Measurement Group International Conference, Phoenix, AZ: The Computer Measurement Group, P.O. Box 26063, 481 pp.
3. Heidel, Ruth (April 1980 - July 1982), Computer Management and Evaluation: Selected Papers from the SHARE Project, Chicago, IL: SHARE Inc., Volume VI, 662 pp.
4. SAS Institute Inc. (1982), SAS User's Guide: Basics, 1982 Edition, Cary, NC: SAS Institute Inc., 923 pp.
ABSTRACT
II. REDUCING CPU CONSUMPTION WITH PROPER I/O BLOCKSIZE AND BUFFERING
Dr. H.W. Barry Merrill
Merrill Consultants
This paper postulates that CPU cost, real memory cost, and DASD storage cost are jointly optimized when the blocksize and buffer number are chosen in such a way as to minimize BUFNO while moving one track of data at a time. Furthermore, through exits in DF/DS at open, it appears possible to override poor blocksize choices to reduce the CPU time, real memory, and elapsed time without reblocking the data. Finally, since the true optimum requires maximization of blocksize (and, hence, a potential recompile), another exit is discussed that can allow identification of programs that will need recompiling. These exits allow an installation to migrate user data safely, increase its blocksize, and specify the optimal buffer number, with total user and application transparency.
INTRODUCTION
The real resources that a task consumes when performing sequential input or output operations are processor execution time (CPU seconds), real memory pages (average working set size), real memory occupancy time (page-seconds), the number of blocks of data transferred (EXCPs), the number of physical operations necessary to transfer the data (SIOs), and the resultant elapsed run time. These real resources can be reduced when the physical characteristics of the data transfer operations are matched to the hardware and operating system design.
The present sequential I/O design has not changed in principle since OS/360. The user's program requests records that the operating system combines into blocks. A block, when stored on a medium, exists as a contiguous physical entity, and the length of a block is its BLKSIZE value. Blocksizes can be as small as 16 bytes or as large as 32,768 (actually, 32,760 is a software limit in MVS), if the medium can support that much data.
When I/O is transferred between media and memory, the smallest unit of transfer is a block. However, sequential access methods provide for movement of more than one block in a single operation by allowing you to specify (in JCL or by system default) some number of buffers using the BUFNO parameter.
Although the user requests records that are decoded from a block by the access method, BUFNO of these blocks are transferred by one physical operation called a start I/O, or SIO. Thus, the blocksize is an attribute of the file, and the buffer number is set when the file is opened.
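The record/block/buffer arithmetic just described can be sketched in a short modern script. This is purely an illustration (Python, the function name, and the fixed-length-record assumption are mine, not the paper's):

```python
import math

def io_counts(n_records, lrecl, blksize, bufno):
    """Estimate blocks transferred (EXCPs) and physical I/Os (SIOs)
    for a fixed-length-record sequential file handled by QSAM."""
    recs_per_block = blksize // lrecl               # records packed per block
    excps = math.ceil(n_records / recs_per_block)   # one EXCP per block
    sios = math.ceil(excps / bufno)                 # BUFNO blocks move per SIO
    return excps, sios

# 69,000 80-byte records, as in the experiment described below
print(io_counts(69_000, 80, 800, 5))     # small blocks, default buffering
print(io_counts(69_000, 80, 23_440, 2))  # near half-track blocks on 3380
```

Larger blocks cut EXCPs, and more buffers cut SIOs, which is exactly the lever the rest of the paper quantifies.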
For many years, performance analysts have shown appreciable savings in real resources by increasing the blocksize and buffer number, but few installations have taken aggressive action to correct poor choices.
There are several reasons for their inaction:
- The real cost, in management-understandable terms, had not been shown.
- A systems programmer claimed that more real memory would be needed, and no immediate contradictory evidence was demonstrated.
- The installation was willing to change the blocksize but could not guarantee that the change would be transparent. The installation's programs can have internally specified blocksizes, which would require identification and recompilation.
- The perceived personnel costs of making the changes outweighed the perceived cost savings.
- The application personnel have insufficient knowledge of JCL to be sure of themselves and will not change something that is working now.
This paper shows the real cost of poor choices of the blocksize and buffer attributes, proves that the systems programmer's claim is false, and discusses the use of open exits that permit resolution of the final three objections.
THE EXPERIMENT
Twelve pairs of sequential files containing the same 69,000 80-byte (nonrepeating character) logical records were built with blocksizes of 800, 1680, 2480, 3360, 4800, 5440, 6320, 7440, 9040, 11440, 15440, and 23440 bytes on 3380 disks.
A QSAM assembler program (written by Carol Toll and shown in Appendix I), which did nothing but OPEN, GET, and PUT between each pair of files, was executed repeatedly, iterating the number of buffers from 1 to as many as were necessary for BLKSIZE times BUFNO to exceed the track size. Note that BUFNO is limited to a maximum of 30 by MVS and, thus, only if the blocksize is greater than 1076 can full-track I/O be performed on 3380s.
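The buffer counts used in the experimental grid can be reconstructed with a short illustrative script. The 3380 track capacity of 47,476 bytes used here is an assumption of this sketch, not a figure stated in the paper:

```python
import math

TRACK_3380 = 47_476   # assumed 3380 track capacity in bytes
MAX_BUFNO = 30        # MVS limit on BUFNO cited in the text

def full_track_bufno(blksize):
    """Smallest BUFNO whose BLKSIZE * BUFNO reaches the track size,
    capped at the MVS maximum of 30 buffers."""
    return min(math.ceil(TRACK_3380 / blksize), MAX_BUFNO)

blocksizes = [800, 1680, 2480, 3360, 4800, 5440, 6320,
              7440, 9040, 11440, 15440, 23440]
for blksize in blocksizes:
    print(blksize, full_track_bufno(blksize))
```

For the smallest blocksizes the cap of 30 buffers is reached before a full track fits in one SIO, which is why very small blocks can never achieve full-track I/O.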
The SMF step termination records were collected and analyzed with the SAS System to determine the impact on real resource costs. Linear regression was used to determine the CPU cost (using total CPU TCB plus SRB time) of each block and each SIO. The regression results are provided in Table 42.26. All runs were executed on a 3033-MP under MVS/370 SP1.3.1 in the fall of 1983.
Table 42.26 Regression Results, 3033 MP

PROC SYSREG DATA=STEPS; MODEL CPUTM = SIO EXCP;

SSE   3.767       F ratio    3823.4
DFE   273         Prob > F   .0001
MSE   .0138       R-square   .9655

Variable    Parameter estimate   Standard error   T ratio
Intercept   1.1944               0.133            89.88
SIO         0.001389             .000020          68.11
EXCP        .0001516             .0000046         32.73
The equation for total CPU time (TCB + SRB) in seconds as a function of SIOs and EXCPs for QSAM I/O on an MP3033 with MVS/370 is:

CPU seconds = 1.1944 + 0.001389 * SIOs + 0.0001516 * EXCPs
INTERPRETATION OF REGRESSION RESULTS
The INTERCEPT is the asymptotic CPU time as the data length transferred per SIO (BLKSIZE times BUFNO) grows infinitely large. The total data in the file were (69,000 times 80) 5.52 million bytes. Thus, the CPU cost to process 1 byte of data (excluding the CPU cost of the actual I/O operations, which is described by the SIO and EXCP coefficients) is:

CPU seconds per byte = 1.1944 / (5.52 * 10^6) = 216 nanoseconds/byte
Note: if the processor is rated at 5 MIPS, 1 instruction requires 200 nanoseconds. Thus, it appears that 1 machine instruction on the average is required to process 1 byte.
The SIO coefficient of 0.001389 seconds, or 1.389 milliseconds, is the CPU cost of each physical I/O operation and is independent of how many blocks were transferred. This is the cost of the physical transfer.
The EXCP coefficient of 0.0001516 seconds, or 151.6 microseconds, is the CPU cost of each block transferred within a QSAM SIO. This is the cost of managing each buffer's data.
USING THE EQUATION
With this equation for the CPU cost to perform QSAM I/O, it is easy to calculate the impact of changing the blocksize. For example, a 1000-byte blocksize using the default QSAM BUFNO of 5 can be compared to half-track blocking of 23000 and BUFNO of 2.
BLKSIZE = 1000, BUFNO = 5, EXCPs = 10000 (assumed)
SIOs = EXCPs/BUFNO = 10000/5 = 2000
CPU = 0.001389 * 2000 + 0.0001516 * 10000 = 4.294 seconds

Now if BLKSIZE = 23000 and BUFNO = 2,
then EXCPs = 10000/23 = 435 and SIOs = 435/2 = 218,
and CPU = 0.001389 * 218 + 0.0001516 * 435 = 0.3687 seconds.

PERCENT CPU WASTED = (4.294 - 0.3687)/4.294 = 91.4%
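The worked comparison can be verified with a short script implementing the regression equation. This is an illustrative sketch (the function name is mine, and the intercept is omitted from the comparison, as in the text):

```python
def qsam_cpu_seconds(sios, excps, include_intercept=False):
    """CPU (TCB + SRB) seconds from the paper's regression equation
    for QSAM I/O on a 3033-MP under MVS/370."""
    cpu = 0.001389 * sios + 0.0001516 * excps
    if include_intercept:
        cpu += 1.1944  # fixed per-step cost from the regression intercept
    return cpu

small = qsam_cpu_seconds(sios=2000, excps=10_000)  # BLKSIZE=1000, BUFNO=5
large = qsam_cpu_seconds(sios=218, excps=435)      # BLKSIZE=23000, BUFNO=2
wasted = (small - large) / small
print(round(small, 3), round(large, 4), round(100 * wasted, 1))
# → 4.294 0.3687 91.4
```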
The CPU cost equation was then applied to all QSAM activity for a day from two 3081Ks and a 3033 MP. Then two possible blocksize choices were compared: IBM's 6K recommendation and the optimum 23K half-track I/O (Table 42.27).
Table 42.27 Daily Cost Savings with Increased Blocksize

DISK
Resource        Present count   6144 blocksize   Half-track blocksize
EXCP (x 1000)   8190            2885             1004
SIO (x 1000)    1711            507              502
CPU seconds     3618            1141             785
                                (6K saved 2447)  (half track saved 2833 seconds)

TAPE
Resource        Present count   32760 blocksize
EXCP (x 1000)   10860           3254
SIO (x 1000)    2181            3254*
CPU seconds     4675            2724
                                (saved 1951 seconds)

Total daily CPU saved (QSAM) = 4784 seconds

* Increased because the buffer was limited to one, whereas there is currently no limit on BUFNO.
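The bottom-line totals follow from simple arithmetic on the CPU-seconds rows of the table:

```python
# Daily CPU seconds from Table 42.27: present usage vs. recommended blocksizes
disk_present, disk_halftrack = 3618, 785
tape_present, tape_32760 = 4675, 2724

disk_saved = disk_present - disk_halftrack
tape_saved = tape_present - tape_32760
print(disk_saved, tape_saved, disk_saved + tape_saved)  # → 2833 1951 4784
```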
Cost extension beyond QSAM

If the true cost of an SIO from the QSAM analysis can be applied to all non-VSAM SIOs, the total processor seconds spent in I/O can be estimated. The daily SIO counts by access method were estimated from the measured EXCP count, and the QSAM SIO cost was applied to the estimate (Table 42.28).
Table 42.28 Estimated CPU Costs Attributed to I/O Operations

Access     EXCP          Estimated     Estimated
method     count         SIO count     CPU consumed
QSAM       19,050,000    3,892,000     8,293
BSAM       6,564,000     6,564,000     9,117
BISAM      2,050,000     2,050,000     2,847
EXCP       6,186,000     6,186,000     8,592
QISAM      1,635,000     1,635,000     2,271
BPAM       1,735,000     1,735,000     2,409
BDAM       3,684,000     3,684,000     5,117
SPOOL      25,484,000    509,000       707

I/O Total                25,746,000    39,455

Total daily 3330 seconds recorded      340,000
Total daily MVS seconds recorded        92,349
Total daily TCB seconds                215,793
Total daily SRB seconds                 31,858
Total daily TCB + SRB seconds          247,651

Non-VSAM I/O cost is 39,455/247,651 = 16%
OPTIMUM PERFORMANCE
The QSAM DASD experiments were graphically analyzed. Knowing the cost per I/O did not motivate users or management, nor did it answer the systems programmer's concern for memory. Each of the nine resources was plotted against the blocksize for all runs. The number of buffers used for that run is printed at the observed intersection, allowing three-dimensional analysis in two dimensions. These nine plots show the range of each resource from maximum to minimum, as well as the shape of the trend.
Figure   Title                                       Variable   Ratio of maximum to minimum value
42.29    Elapsed Run Time                            ELAPSTM    32:1
42.30    Step CPU TCB Time                           CPUTCBTM   4:1
42.31    Step CPU SRB Time                           CPUSRBTM   23:1
42.32    Step Total CPU Time                         CPUTM      5:1
42.33    Physical I/O Operations                     SIO        70:1
42.34    Blocks of Data Transferred                  EXCPS      14:1
42.35    Real Memory Page Occupancy                  PAGESECS   10:1
42.36    Average Real Memory Working Set Size (K)    AVGWKSET   4:1
42.37    Private Area Virtual Size (K)               MAXADRSP   1.5:1
Figures 42.29 through 42.37 clearly show monotonic, dramatic reductions of resources as blocksize is increased.
The second set of figures shows three of the preceding resources versus data length per SIO (BUFNO times BLKSIZE). Here the effect of full-track I/O per SIO is clear.
Figure 42.29 Elapsed Run Time. [Plot: ELAPSTM vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.30 Step CPU TCB Time. [Plot: CPUTCBTM vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.31 Step CPU SRB Time. [Plot: CPUSRBTM vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.32 Step Total CPU Time. [Plot: CPUTM vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.33 Physical I/O Operations. [Plot: SIO vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.34 Blocks of Data Transferred. [Plot: EXCPS vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.35 Real Memory Page Occupancy. [Plot: PAGESECS vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.36 Average Real Memory Working Set Size (K). [Plot: AVGWKSET vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure 42.37 Private Area Virtual Size (K). [Plot: MAXADRSP vs. BLKSIZE, QSAM blocksize and buffer number impact.]
Figure   Title                   Variable
42.38    Total CPU Time          CPUTM
42.39    Real Memory Occupancy   PAGESECS
42.40    Average Working Set     AVGWKSET
Figure 42.38 Total CPU Time. [Plot: CPUTM vs. DATA LENGTH, QSAM blocksize and buffer number impact.]
Figure 42.39 Real Memory Occupancy. [Plot: PAGESECS vs. DATA LENGTH, QSAM blocksize and buffer number impact.]
Figure 42.40 Average Working Set. [Plot: AVGWKSET vs. DATA LENGTH, QSAM blocksize and buffer number impact.]
IDENTIFYING PROGRAMS THAT SPECIFY DCB ATTRIBUTES
Implementation of optimal I/O always requires JCL changes. If the installation is wise and demands that all DCB attributes be specified externally to the program, then only JCL changes are required. However, since few installations have such stringent enforcement of standards that it can be assumed to be in effect, implementation of optimal blocksize can be accelerated by using the DFDS OPEN EXIT described in the excellent Washington Systems Center Technical Bulletin GG22-9306, "Using Data Facility Device Support for DASD Space Management Assistance," by P. Henning.
The open exit presents two DCB areas: one is the unmodified user's DCB, and the other is open's DCB area with the JFCB merged into it. You can write your own code to execute in this exit that examines the user's DCB area. If the DCB attribute BLKSIZE is specified in the user's DCB, then an SMF record can be written from the exit, identifying those programs that require recompilation before their files can be reblocked. Thus, by use of the DFDS OPEN EXIT, the installation can reblock data sets and guarantee that doing so will create no ABENDs. A write to programmer (WTO ROUTCDE=11) can also be issued from DFDS OPEN EXIT IFGOEXOB so that (if programmers read their SYSMSG) you can advise them that they are violating standards.
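The exit itself must be coded in assembler against the DFDS control blocks, but its decision logic can be sketched abstractly. Everything below (the names, the types, the Python rendering) is illustrative only and is not the real exit interface:

```python
from dataclasses import dataclass

@dataclass
class OpenRequest:
    # Hypothetical stand-ins for the two DCB areas the exit sees
    program: str
    user_blksize: int    # BLKSIZE in the unmodified user DCB (0 = unspecified)
    merged_blksize: int  # BLKSIZE in open's DCB after the JFCB merge

def open_exit(req, recommended_blksize):
    """Sketch of the decision: flag programs whose DCBs hard-code
    BLKSIZE; otherwise a blocksize override is safe."""
    if req.user_blksize != 0:
        # Program specifies BLKSIZE internally: reblocking the data set
        # would require a recompile, so record it (the real exit would
        # cut an SMF record and could issue a WTO ROUTCDE=11).
        return ("flag-for-recompile", req.program)
    return ("override-ok", recommended_blksize)

print(open_exit(OpenRequest("PAYROLL", 800, 800), 23_440))
print(open_exit(OpenRequest("REPORTS", 0, 6_144), 23_440))
```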
ALTERING BUFFER NUMBER IN DFDS OPEN EXIT
Although not yet tested at Sun Company, the use of the DFDS OPEN EXIT to alter the BUFNO parameter is under serious investigation. Even though the preceding analysis clearly shows that only increased blocksize achieves the true optimum, there is sufficient return in processor utilization alone to justify the programming and testing time to code this exit. In the exit, the number of buffers can be expanded to the correct number for full-track data transfer, without alteration of either the user's JCL or the user's programs. It is hoped that this work will be presented at a future meeting of SHARE. These additional references are useful for technical examples and discussion of the exit:
Data Facility/Device Support User's Guide, SC26-3952-0
Technical Newsletter to SC26-3952-0, SN26-0888
Search INFO/SYSTEM File A, Keywords DFDS EXIT

APPENDIX I

Assembler program written by Carol Toll to perform QSAM I/O
WRITELOOP ENTER REGEQUS=YES
          OPEN  (DD1,,DD2,(OUTPUT))
LOOP      GET   DD1,RECORD
          PUT   DD2,RECORD
          B     LOOP
EOF       CLOSE (DD1,,DD2)
          LEAVE
DD1       DCB   DDNAME=IN,MACRF=GM,DSORG=PS,EODAD=EOF
DD2       DCB   DDNAME=OUT,MACRF=PM,DSORG=PS
RECORD    DS    XL32760
          END
APPENDIX II
Evolution of the Workload Efficiencies Project at Sun Company
Gary Miley
An opportunity was perceived within Sun Company to improve the efficiency of resource utilization at the business data center. Management support initiated a project team to analyze and study the workload profile and to recommend and implement, where possible, actions to reduce the resource consumption of data-processing services.
Initial project activities focused on the preceding analysis of access method I/O usage and performance. Analysis of various blocksizes and numbers of buffers for sequential processing supported the concept of full-track blocking on 3350 storage and half-track blocking on 3380 storage.
Results of the comparison showed a dramatic need to develop installation recommendations for both tape and disk data sets. The resulting blocksize recommendation for DASD sequential data sets was a compromise between the optimum and the actual data center data management environment of mixed device types (3350, 3380). The compromise for DASD was 9080, with no compromise on tapes at a 32,760 blocksize. Recognize that even the 9080 DASD blocksize yields near-optimum data transfer to 3380, with the QSAM BUFNO default of 5 buffers.
Further analysis of the SMF data revealed some surprises that would dictate the data set approach for Sun's workload: tape data sets should be the primary target for improving blocksize performance; the second category of data sets to review was temporary disk files (an analysis of proclibs would be in order here); and finally, permanent disk data sets should be moved to a larger blocksize.
The efficiencies project then confronted these issues: how to communicate recommendations to the user community and how to identify the best candidates for reblocking.
The issue of user communication was addressed by the following strategy. An on-line information base was created and referenced by an article in the corporate information systems periodic newsletter announcing the existence of the efficiencies project. After a pilot effort internal to the information systems function, personal visits and presentations would follow in the user organizations.
The issue of identifying reblocking candidates was addressed by the use of UCC's TMS product and Software Module Marketing's DMS/OS product to identify the number of accesses (opens) since data set creation. This strategy allows the information systems function to approach the user community with intelligent information that quantifies the benefit to the user in the form of reduced resource cost. The project acknowledged that not all owners of sequential data sets would increase blocksize to the project recommendation; to improve system performance for those data sets, a DFDS open exit module will be implemented to increase the number of buffers for sequential access. This exit would calculate the number of buffers required for full-track data transfer, subject to the SAM-E restriction of 30 buffers per SIO.
To summarize Sun's approach: quantify and communicate to user organizations the benefits, in both user productivity and resource cost reduction, of improving the performance of accessing sequential data, and implement the DFDS OPEN exit to gain system improvements even when users fail to reblock their data sets. In either case, system performance gains will be realized.