Upload: erik-harmon

Post on 26-Dec-2015


©Copyright 1998-2010, Computer Management Sciences, Inc., Alexandria, VA www.cpexpert.com

An Expert System

designed to evaluate

IBM z/OS systems


Product Overview

• Helps analyze performance of z/OS systems.

• Written in SAS (only SAS/BASE is required).

• Runs as a batch job on mainframe (or on PC).

• Processes data in a standard performance data base (either MXG, SAS/ITRM, or MICS).

• Produces narrative reports describing the results of the analysis.

• Product is updated every six months.

• 45-day trial is available (see license agreement for details).


Components Delivered

• SRM Component * March 1991

• TSO Component * April 1991

• MVS Component * June 1991

* These legacy components apply only to Compatibility Mode

• DASD Component October 1991

• CICS Component May 1992

• WLM Component April 1995

• DB2 Component October 1999

• WMQ Component June 2004


Product Documentation

Each component has an extensive User Manual, available in hard copy, on CD, and web-enabled

• Describes the likely impact of each finding

• Discusses the performance issues associated with each finding

• Suggests ways to improve performance and describes alternative solutions

• Provides specific references to IBM or other documents relating to the findings

• More than 4,000 pages for all components


WLM Component

• Checks for problems in the service definition

• Identifies reasons performance goals were missed

• Analyzes general system problems:

• Coupling facility/XCF

• Paging subsystem

• System logger

• WLM-managed initiators

• Excessive CPU use by SYSTEM or SYSSTC

• IFA/zAAP, zIIP, and IOP/SAP processors

• PR/SM, LPAR, and HiperDispatch problems

• Intelligent Resource Director (IRD) problems


WLM Component - sample report

RULE WLM103: SERVICE CLASS DID NOT ACHIEVE VELOCITY GOAL

DB2HIGH (Period 1): Service class did not achieve its velocity goal during the measurement intervals shown below. The velocity goal was 50% execution velocity, with an importance level of 2. The '% USING' and '% TOTAL DELAY' percentages are computed as a function of the average address space ACTIVE time. The 'PRIMARY,SECONDARY CAUSES OF DELAY' are computed as a function of the execution delay samples on the local system.

                          ------LOCAL SYSTEM-------
                              %   % TOTAL   EXEC  PERF  PLEX  PRIMARY,SECONDARY
MEASUREMENT INTERVAL       USING    DELAY  VELOC  INDX    PI  CAUSES OF DELAY
21:15-21:30,08SEP1998       16.6     83.4    17%  3.02  2.36  DASD DELAY(99%)
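The velocity and performance-index figures in WLM103 can be checked by hand. A minimal Python sketch (function names and rounding are my assumptions; CPExpert works from raw samples, which is presumably why the reported PI of 3.02 differs slightly from 50/16.6 ≈ 3.01):

```python
def execution_velocity(using_pct: float, total_delay_pct: float) -> float:
    """WLM execution velocity: time using resources as a percent of
    time spent either using resources or delayed."""
    return 100.0 * using_pct / (using_pct + total_delay_pct)

def velocity_performance_index(goal_pct: float, achieved_pct: float) -> float:
    """For a velocity goal, PI = goal / achieved; PI > 1.0 means the goal was missed."""
    return goal_pct / achieved_pct

velocity = execution_velocity(16.6, 83.4)        # 16.6, reported (rounded) as 17%
pi = velocity_performance_index(50.0, velocity)  # about 3.01
```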

RULE WLM361: NON-PAGING DASD I/O ACTIVITY CAUSED SIGNIFICANT DELAYS

DB2HIGH (Period 1): A significant part of the delay to the service class can be attributed to non-paging DASD I/O delay. The data below show the intervals when non-paging DASD delay caused DB2HIGH to miss its performance goal:

                        AVG DASD  AVG DASD   --AVERAGE DASD I/O TIMES--
MEASUREMENT INTERVAL    I/O RATE  USING/SEC   RESP   WAIT   DISC   CONN
21:15-21:30,08SEP1998      31       1.405    0.010  0.003  0.004  0.002


WLM Component - sample report

RULE WLM601: TRANSPORT CLASS MAY NEED TO BE SPLIT

You should consider whether the DEFAULT transport class should be split. A large percentage of the messages were too small, while a significant percentage of messages were too large. Storage is wasted when buffers are used by messages that are too small, while unnecessary overhead is incurred when XCF must expand the buffers to fit a message. The CLASSLEN parameter establishes the size of each message buffer, and CLASSLEN was specified as 16,316 for this transport class. This finding applies to the following RMF measurement intervals:

                        SENT  SMALL     MESSAGES  MESSAGES   TOTAL
MEASUREMENT INTERVAL     TO   MESSAGES  THAT FIT  TOO BIG   MESSAGES
10:00-10:30,26MAR1996   JA0     4,296       0        57       4,353
12:00-12:30,26MAR1996   Z0      2,653       6       762       3,421
12:30-13:00,26MAR1996   Z0      2,017       0       109       2,126
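One way to read the WLM601 figures is to express the too-big messages as a share of total traffic per interval; the helper below is mine, not CPExpert's, and the cutoff for a "significant percentage" is not stated on the slide:

```python
def pct_too_big(messages_too_big: int, total_messages: int) -> float:
    """Percent of XCF messages that overflowed the CLASSLEN buffer."""
    return round(100.0 * messages_too_big / total_messages, 1)

print(pct_too_big(57, 4353))   # 1.3  (10:00-10:30 interval)
print(pct_too_big(762, 3421))  # 22.3 (12:00-12:30 interval)
```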

RULE WLM316: PEAK BLOCKED WORK WAS MORE THAN GUIDANCE

The SMF statistics showed that blocked workload waited longer than specified by the BLWLINTHD parameter in IEAOPTxx. At peak, more than 2 address spaces and enclaves were concurrently blocked during the interval.

                       BLWLINTHD  BLWLTRPCT  --BLOCKED WORKLOAD--
MEASUREMENT INTERVAL   IN IEAOPT  IN IEAOPT   AVERAGE      PEAK
7:14- 7:29,01OCT2010       20         5        0.002        63
7:29- 7:44,01OCT2010       20         5        0.000        22
7:44- 7:59,01OCT2010       20         5        0.001        49
7:59- 8:14,01OCT2010       20         5        0.001        63
8:14- 8:29,01OCT2010       20         5        0.002        62


WLM Component - sample report

RULE WLM893: LOGICAL PROCESSORS IN LPAR HAD SKEWED ACCESS TO CAPACITY

LPAR SYSC: HiperDispatch was specified for one or more LPARs in this CPC, and at least one LPAR used one or more high polarity central processors. LPAR SYSC was not operating in HiperDispatch Management Mode, and experienced skewed access to physical processors because high polarity and medium polarity processors were used by LPARs running in HiperDispatch Management Mode. The information below shows the number of logical processors assigned to LPAR SYSC and each logical processor's share of a physical processor. The CPU activity skew is shown for each RMF interval as the minimum, average, and maximum CPU busy for the logical processors assigned to LPAR SYSC.

                       LOGICAL CPUS  % PHYSICAL  --CPU ACTIVITY SKEW--
MEASUREMENT INTERVAL     ASSIGNED    CPU SHARE    MIN    AVG    MAX
13:59-14:14,15SEP2009       2           45.5     28.2   43.3   58.4

RULE WLM537: ZAAP-ELIGIBLE WORK HAD HIGH GOAL IMPORTANCE

Rule WLM530 or Rule WLM535 was produced for this system, indicating that a relatively large amount of zAAP-eligible work was processed on a central processor. One possible cause is that the zAAP-eligible work was assigned a relatively high Goal Importance (Importance 1 or Importance 2). Please see the discussion in the WLM Component User Manual for an explanation of this issue.


DB2 Component

• Analyzes standard DB2 interval statistics

• Applies analysis from the DB2 Administration Guide and DB2 Performance Guide (with DB2 9.1)

• Analyzes DB2 Versions 3, 4, 5, 6, 7, 8, and 9

• Evaluates overall DB2 constraints, buffer pools, EDM pool, RID list processing, Lock Manager, Log Manager, DDF, and data sharing

• All analysis can be tailored to your site


DB2 Component
Typical DB2 local buffer constraints

• There might be insufficient buffers for work files

• There were insufficient buffers for work files in merge passes

• Buffer pool was full

• Hiperpool read requests failed (pages stolen by system)

• Hiperpool write requests failed (expanded storage not available)

• Buffer pool page fault rate was high

• Data Management Threshold (DMTH) was reached

• DWQT and VDWQT might be too large

• DWQT, VDWQT, or VPSEQT might be too small


DB2 Component
Typical DB2 I/O prefetch constraints

• Sequential prefetch was disabled, buffer shortage

• Sequential prefetch was disabled, unavailable read engine

• Sequential prefetch not scheduled, prefetch quantity = 0

• Synchronous read I/O and sequential prefetch were high

• Dynamic sequential prefetch was high (before DB2 8.1)

• Synchronous read I/O was high


DB2 Component
Typical DB2 parallel processing constraints

• Parallel groups fell back to sequential mode

• Parallel groups reduced due to buffer shortage

• Prefetch quantity reduced to one-half of normal

• Prefetch quantity reduced to one-quarter of normal

• Prefetch I/O streams were denied, shortage of buffers

• Page requested for a parallel query was unavailable


DB2 Component
Typical DB2 EDM pool constraints

• Failures were caused by full EDM pool

• Low percent of DBDs found in EDM pool

• Low percent of CT Sections found in EDM pool

• Low percent of PT Sections found in EDM pool

• Size of EDM pool could be reduced

• Excessive Class 24 (EDM LRU) latch contention


DB2 Component
Typical DB2 Lock Manager constraints

• Work was suspended because of lock conflict

• Locks were escalated to shared mode

• Locks were escalated to exclusive mode

• Lock escalation was not effective

• Work was suspended for longer than time-out value

• Deadlocks were detected


DB2 Component
Typical DB2 Log Manager constraints

• Archive log read allocations exceeded guidance

• Archive log write allocations exceeded guidance

• Waits were caused by unavailable output log buffer

• Log reads satisfied from active log data set

• Log reads were satisfied from archive log data set

• Failed look-ahead tape mounts


DB2 Component
Typical DB2 Data Sharing constraints

• Group buffer pool is too small

• Incorrect directory entry/data entry ratio

• Directory reclaims resulting in cross-invalidations

• Castout processing occurring in “spurts”

• Excessive lock contention or false lock contention

• GBPCACHE ALL inappropriately specified

• GBPCACHE CHANGED inappropriately specified

• Conflicts between applications


DB2 Component - sample report

RULE DB2-208: VIRTUAL BUFFER POOL WAS FULL

Buffer Pool 2: A usable buffer could not be located in virtual Buffer Pool 2, because the virtual buffer pool was full. This condition should not normally occur, as there should be ample buffers. You should consider using the -ALTER BUFFERPOOL command to increase the virtual buffer pool size (VPSIZE) for the virtual buffer pool. This situation occurred during the intervals shown below:

                         BUFFERS    NUMBER OF TIMES
MEASUREMENT INTERVAL     ALLOCATED  POOL WAS FULL
10:54-11:24, 15SEP1999      100          12
11:24-11:54, 15SEP1999      100          13

RULE DB2-216: BUFFER POOLS MIGHT BE TOO LARGE

Buffer Pool 1: The page fault rates for read and write I/O indicated that the buffer pools might be too large for the available processor storage. This situation occurred for Buffer Pool 1 during the intervals shown below:

                         BUFFERS    PAGE-IN FOR  PAGE-IN FOR  PAGE
MEASUREMENT INTERVAL     ALLOCATED   READ I/O     WRITE I/O   RATE
11:15-11:45, 16SEP1999    25,000      36,904          195     41.2
11:45-12:15, 16SEP1999    25,000      30,892          563     35.0
12:45-13:15, 16SEP1999    25,000      23,890          170     26.7


DB2 Component - sample report

RULE DB2-230: SEQUENTIAL PREFETCH WAS DISABLED - BUFFER SHORTAGE

Buffer Pool BP1: Sequential prefetch is disabled when there is a buffer shortage, as controlled by the Sequential Prefetch Threshold (SPTH). Ideally, sequential prefetch should not be disabled, since performance is adversely affected. If sequential prefetch is disabled a large number of times, the buffer pool size might be too small. The sequential prefetch threshold was reached for Buffer Pool BP1 during the intervals shown below.

                         BUFFERS    TIMES SEQUENTIAL PREFETCH
MEASUREMENT INTERVAL     ALLOCATED  DISABLED (BUFFER SHORTAGE)  POOL
5:00- 5:15, 15MAY2009     268,000            125                BP1
5:15- 5:30, 15MAY2009     268,000          1,533                BP1

RULE DB2-234: WRITE ENGINES WERE NOT AVAILABLE FOR ASYNCHRONOUS I/O

Buffer Pool BP13: DB2 has 600 deferred write engines available for asynchronous I/O operations. When all 600 write engines are used, synchronous writes are performed. The application is suspended during synchronous writes, and performance is adversely affected. This situation occurred for Buffer Pool BP13 during the intervals shown below:

                         BUFFERS    TIMES WRITE ENGINES
MEASUREMENT INTERVAL     ALLOCATED  WERE NOT AVAILABLE   POOL
5:45- 6:00, 15MAY2009     12,800            44           BP13


DB2 Component - sample report

RULE DB2-423: DATABASE ACCESS THREAD WAS QUEUED, ZPARM LIMIT WAS REACHED

Database access threads were queued because the ZPARM maximum for active remote threads was reached. You should consider increasing the maximum number of database access threads allowed. This situation occurred during the intervals shown below:

                         DATABASE ACCESS THREADS QUEUED
MEASUREMENT INTERVAL     ZPARM LIMIT REACHED
11:24-11:54, 01OCT2010               9

RULE DB2-512: LOG READS WERE SATISFIED FROM ACTIVE LOG DATA SET

The DB2 Log Manager statistics revealed that more than 25% of the log reads were satisfied from the active log data set. It is preferable that the data be in the output buffer, but this is not always possible in an active DB2 environment. However, if a large percentage of reads is satisfied from the active log, you should ensure that the output buffer is as large as possible. This finding occurred during the intervals shown below:

                         TOTAL LOG  LOG READS FROM
MEASUREMENT INTERVAL       READS    ACTIVE LOG DATA SET  PERCENT
14:24-14:54, 01OCT2010     6,554          4,678            71.4
14:54-15:24, 01OCT2010     7,274          3,695            50.8
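The PERCENT column in DB2-512 is simply active-log reads over total log reads; a quick check (the function name is mine):

```python
def pct_from_active_log(total_reads: int, active_log_reads: int) -> float:
    """Percent of DB2 log reads satisfied from the active log data set."""
    return round(100.0 * active_log_reads / total_reads, 1)

print(pct_from_active_log(6554, 4678))  # 71.4, matching the first interval
print(pct_from_active_log(7274, 3695))  # 50.8, matching the second
```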


DB2 Component - sample report

RULE DB2-601: COUPLING FACILITY READ REQUESTS COULD NOT COMPLETE

Group Buffer Pool 6: Coupling facility read requests could not be completed because of a lack of coupling facility storage resources. This situation occurred for Group Buffer Pool 6 during the intervals shown below:

                         GROUP BUFFER POOL  TIMES CF READ
MEASUREMENT INTERVAL      ALLOCATED SIZE    REQUESTS NOT COMPLETE
11:01-11:31, 14OCT1999         38M                 130

RULE DB2-610: GBPCACHE(NO) OR GBPCACHE NONE MIGHT BE APPROPRIATE

Group Buffer Pool 4: This buffer pool had a very small amount of read activity relative to write activity. Pages read were less than 1% of the pages written. Since so few pages were read from this group buffer pool, you should consider specifying GBPCACHE(NO) for the group buffer pool or specifying GBPCACHE NONE for the page sets using the group buffer pool. This situation occurred for Group Buffer Pool 4 during the intervals shown below:

                         GROUP BUFFER POOL  PAGES  PAGES     READ
MEASUREMENT INTERVAL      ALLOCATED SIZE    READ   WRITTEN  PERCENT
10:34-11:04, 14OCT1999         38M            14    18,268   0.07%


CICS Component

• Processes CICS Interval Statistics contained in the MXG Performance Data Base (standard SMF 110)

• Analyzes all releases of CICS (CICS/ESA, CICS/TS for OS/390, and CICS/TS for z/OS)

• Applies most analysis techniques contained in IBM's CICS Performance Guides

• Produces specific suggestions for improving CICS performance


CICS Component
(Major areas analyzed)

• Virtual and real storage (MXG/AMXT/TCLASS)
• VSAM and File Control (NSR and LSR pools)
• Database management (DL/I, IMS, DB2)
• Journaling (System and User journals)
• Network and VTAM (RAPOOL, RAMAX)
• CICS Facilities (temp storage, transient data)
• ISC/IRC (MRO, LU6.1, LU6.2 modegroups)
• System logger
• Temporary Storage
• Coupling Facility Data Tables (CFDT)
• CICS-DB2 Interface
• Open TCB pools
• TCP/IP and SSL


CICS Component - sample report

RULE CIC101: CICS REACHED MAXIMUM TASKS TOO OFTEN

The CICS statistics revealed that the number of attached tasks was restricted by the MXT operand, but storage did not appear to be constrained. CPExpert suggests that you consider increasing the MXT value in the System Initialization Table (SIT) for this region. This finding applies to the following CICS statistics intervals:

STATISTICS                  MXT    -PEAK TASKS-  TIMES MAXTASK  PEAK MAXTASK  TIME WAITING
COLLECTION TIME   APPLID    VALUE  TOTAL  USER     REACHED         QUEUE        MAXTASK
0:00,01OCT2010    CICSIDG.   20      46    20        36              8          0:02:29.0

RULE CIC140: THE NUMBER OF TRANSACTION ERRORS IS HIGH

The CICS statistics revealed that more than 5 transaction errors were related to terminals. These transaction errors may indicate an attempted security breach, problems with the terminal, or a need for additional operator training. This finding applies to the following CICS statistics intervals:

STATISTICS
COLLECTION TIME   APPLID    TERMINAL  NUMBER OF ERRORS
0:00,01OCT2010    CICSPROD  T2M1            348
0:00,01OCT2010    CICSPROD  T2M2             60
0:00,01OCT2010    CICSPROD  T2M6            348


CICS Component - sample report

RULE CIC170: MORE THAN ONE STRING SPECIFIED FOR WRITE-ONLY ESDS FILE

More than one string was specified for a VSAM ESDS file that was used exclusively for write operations. Specifying more than one string can significantly affect performance because of the exclusive control conflicts that can occur. If this finding occurs for all normal CICS processing, you should consider specifying only one string in the ESDS file definition.

STATISTICS                                    NUMBER OF
COLLECTION TIME   APPLID   VSAM FILE    WRITE OPERATIONS
0:00,16MAR2010    CICSYA   LNTEMSTR          431,436

RULE CIC267: INSUFFICIENT SESSIONS MAY HAVE BEEN DEFINED

CPExpert believes that an insufficient number of sessions may have been defined for the CICS DAL1 connection, or the application system could have been issuing ALLOCATE requests too often. CPExpert suggests you consider increasing the number of sessions defined for the connection, or increasing the ALLOCQ guidance variable so that CPExpert signals a potential problem only when you view the problem as serious. For APPC modegroups, this finding applies only to generic ALLOCATE requests. This finding applies to the following CICS statistics intervals:

STATISTICS                  ALLOCATE REQUESTS
COLLECTION TIME   APPLID    RETURNED TO USERS
10:00,26MAR2008   CICSDTL1         335


CICS Component - sample report

RULE CIC267: INSUFFICIENT SESSIONS MAY HAVE BEEN DEFINED

CPExpert believes that an insufficient number of sessions may have been defined for the CICS DAL1 connection, or the application system could have been issuing ALLOCATE requests too often. The number of ALLOCATE requests returned was greater than the value specified for the ALLOCQ guidance variable in USOURCE(CICGUIDE). CPExpert suggests you consider increasing the number of sessions defined for the connection, or increasing the ALLOCQ guidance variable so that CPExpert signals a potential problem only when you view the problem as serious. For APPC modegroups, this finding applies only to generic ALLOCATE requests. This finding applies to the following CICS statistics intervals:

STATISTICS                  ALLOCATE REQUESTS
COLLECTION TIME   APPLID    RETURNED TO USERS
10:00,26MAR2008   CICSDTL1         335
11:00,26MAR2008   CICSDTL1          12
12:00,26MAR2008   CICSDTL1          27


CICS Component - sample report

RULE CIC307: FREQUENT LOG STREAM DASD-SHIFTS OCCURRED

CICS75.A075CICS.DFHLOG: More than 1 log stream DASD-shift was initiated for this log stream during the intervals shown below. A DASD-shift event occurs when the system logger determines that a log stream must stop writing to one log data set and start writing to a different data set. You normally should allocate sufficiently large log data sets so that DASD-shifts occur infrequently.

                  ------NUMBER OF DASD LOG SHIFTS------
SMF INTERVAL      DURING INTERVAL      DURING PAST HOUR
14:45,16MAR2010          1                    2

RULE CIC650: CICS EVENT PROCESSING WAS DISABLED IN CICS EVENTBINDING

Event Processing was disabled in the EVENTBINDING, with the result that events defined in the EVENTBINDING were not captured by CICS Event Processing. You should investigate the Event Binding to determine whether the Binding should be enabled or disabled for the region. This finding applies to the following CICS statistics intervals:

STATISTICS COLLECTION TIME
0:00,12MAR2009
3:00,12MAR2009
6:00,12MAR2009


DASD Component

• Processes SMF Type 70 (series) records to automatically build a model of your I/O configuration.

• Identifies performance problems with the devices which have the most potential for improvement:
  • PEND delays
  • Disconnect delays
  • Connect delays
  • IOSQ delays
  • Shared DASD conflicts

• Analyzes SMF Type 42 (DS) and Type 64 records to identify VSAM performance problems.


DASD Component - sample report

RULE DAS100: VOLUME WITH WORST OVERALL PERFORMANCE

VOLSER DB2327 (device 2A1F) had the worst overall performance during the entire measurement period (10:00, 16FEB2001 to 11:00, 16FEB2001). This volume had an overall average of 56.8 I/O operations per second, was busy processing I/O for an average of 361% of the time, and had I/O operations queued for an average of 1% of the time. Please note that percentages greater than 100% and Average Per Second Delays greater than 1 indicate that multiple I/O operations were concurrently delayed. This can happen, for example, if multiple I/O operations were queued or if multiple I/O operations were PENDing. The following summarizes significant performance characteristics of VOLSER DB2327:

                        I/O   ---- AVERAGE PER SECOND DELAYS ----  MAJOR
MEASUREMENT INTERVAL    RATE   RESP   CONN   DISC   PEND   IOSQ    PROBLEM
10:00-10:30,16FEB2001   59.1  1.308  0.316  0.004  0.988  0.000    PEND TIME
10:30-11:00,16FEB2001   57.2  3.792  0.300  0.004  3.483  0.006    PEND TIME
11:00-11:30,16FEB2001   54.2  5.769  0.279  0.004  5.464  0.023    PEND TIME
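The RESP column in DAS100 is the sum of the four delay components, which makes a handy consistency check when reading these reports. The IOSQ + PEND + DISC + CONN decomposition is standard z/OS I/O accounting; the function below is my sketch, and the later rows differ from the printed RESP by 0.001 because the published components are rounded:

```python
def dasd_response(conn: float, disc: float, pend: float, iosq: float) -> float:
    """DASD response = IOSQ + PEND + DISC + CONN (here, average per-second delays)."""
    return iosq + pend + disc + conn

print(round(dasd_response(0.316, 0.004, 0.988, 0.000), 3))  # 1.308, the first RESP value
```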


DASD Component - sample report

RULE DAS130: PEND TIME WAS MAJOR CAUSE OF I/O DELAY.

A major cause of the I/O delay with VOLSER DB2327 was PEND time. The average per-second PEND delay for I/O is shown below:

                        PEND   PEND      PEND     PEND    PEND   TOTAL
MEASUREMENT INTERVAL    CHAN   DIR PORT  CONTROL  DEVICE  OTHER  PEND
10:00-10:30,16FEB2001   0.492  0.000     0.000    0.000   0.495  0.988
10:30-11:00,16FEB2001   1.927  0.000     0.000    0.000   1.556  3.483
11:00-11:30,16FEB2001   2.840  0.000     0.000    0.000   2.624  5.464

RULE DAS160: DISCONNECT TIME WAS MAJOR CAUSE OF I/O DELAY.

A major cause of the I/O delay with VOLSER DB26380 was DISCONNECT time. DISC time for modern systems is a result of cache read miss operations, potentially back-end staging delay for cache write operations, peer-to-peer remote copy (PPRC) operations, and other miscellaneous reasons.

                                       ---PERCENT---  DASD TO  CACHE TO
MEASUREMENT INTERVAL  READS   WRITES   READ    WRITE   CACHE     DASD    PPRC  BPCR  ICLR
                                       HITS    HITS
8:30- 8:45,22OCT2001  14615     932    19.2    100.0   11825      903      0     0     0
8:45- 9:00,22OCT2001  14570     921    20.7    100.0   11567      907      0     0     0


DASD Component - sample report

RULE DAS300: PERHAPS SHARED DASD CONFLICTS CAUSED PERFORMANCE PROBLEMS

Access conflicts caused by sharing VOLSER DB2700 between systems might have caused performance problems for the device during the measurement intervals shown below. Conflicting systems had the indicated I/O rate, average CONN time per second, average DISC time per second, average PEND time per second, and average RESERVE time to the device. Even moderate CONN, DISC, or RESERVE can cause delays to shared devices.

                      I/O   MAJOR    OTHER   -------OTHER SYSTEM DATA--------
MEASUREMENT INTERVAL  RATE  PROBLEM  SYSTEM  I/O RATE  CONN   DISC   PEND   RESV
8:30- 8:45,22OCT2001  31.3  QUEUING  SY1        35.0   0.041  0.001  0.455  0.000
                                     SY2        88.2   0.100  0.003  0.714  0.000
                                     SY3       109.0   0.123  0.003  0.723  0.000
                                     TOTAL     232.2   0.264  0.006  1.892  0.000
8:45- 9:00,22OCT2001  25.7  QUEUING  SY1        46.4   0.054  0.001  0.565  0.000
                                     SY2        98.2   0.112  0.003  0.836  0.000
                                     SY3       119.0   0.136  0.003  0.846  0.000
                                     TOTAL     263.5   0.303  0.007  2.247  0.000


DASD Component - sample report

RULE DAS607: VSAM DATA SET IS CLOSE TO MAXIMUM NUMBER OF EXTENTS

VOLSER: RLS003. More than 225 extents were allocated for the VSAM data sets listed below. These data sets are approaching the maximum number of extents allowed. The table below shows the number of extents and the primary and secondary space allocations:

                                                         TOTAL    EXTENTS    ---ALLOCATIONS---
SMF TIME STAMP   JOB NAME  VSAM DATA SET                 EXTENTS  THIS OPEN  PRIMARY  SECONDARY
10:30,11MAR2002  CICS2ABA  RLSADSW.VF01D.DATAENDB.DATA     229        4      30 CYL    1 CYL

RULE DAS625: NSR WAS USED, BUT LARGE PERCENT OF ACCESS WAS DIRECT

VOLSER: MVS902. Non-Shared Resources (NSR) was specified as the buffering technique for the VSAM data sets below, but more than 75% of the I/O activity was direct access. NSR is not designed for direct access, and many of the advantages of NSR are not available for direct access. You should consider Local Shared Resources (LSR) for these VSAM data sets (perhaps using System Managed Buffers to facilitate the use of LSR). The I/O RATE is for the time the data set was open. The SMF TIME STAMP and JOB NAME are from the last record for the data set.

                                                            I/O   OPEN      -ACCESS TYPE (PCT)-
SMF TIME STAMP   JOB NAME  VSAM DATA SET                    RATE  DURATION  SEQUENTIAL  DIRECT
13:19,19SEP2002  NRXX807.  SDPDPA.PK.MVSP.RT.NDMGIX.DATA     8.4  0:07:08      0.0      100.0
13:19,19SEP2002  NRXX807.  SDPDPA.PR.MVSP.RT.NDMGIXD.DATA   11.2  0:06:42      0.0      100.0
13:33,19SEP2002  TSJHM...  SDPDPA.PR.MVSP.RT.NDMRQFDA.DATA   0.3  2:21:58      0.0      100.0
13:33,19SEP2002  TSJHM...  SDPDPA.PR.MVSP.RT.NDMRQF.DATA     2.8  3:37:53      0.0      100.0
13:33,19SEP2002  TSJHM...  SDPDPA.PK.MVSP.RT.NDMTCF.DATA    11.1  6:24:10      0.1       99.9


DASD Component
(Application Analysis)

• Requires simple modification to MXG or MICS

• Modification collects job step data while processing SMF Type 30 (Interval) records

• Typically requires less than 10 cylinders

• Data is correlated with Type 74 information

• CPExpert associates performance problems to specific applications (jobs and job steps)

• CPExpert can perform “Loved one” analysis of DASD performance problems


WMQ Component

Analyzes SMF Type 115 statistics, as processed by MXG or MICS and placed into the performance data base.

• MQMLOG - Log manager statistics

• MQMMSGDM - Message/data manager statistics

• MQMBUFER - Buffer Manager statistics

• MQMCFMGR - Coupling Facility Manager stats

Type 115 records should be synchronized with the SMF recording interval.

IBM says the overhead to collect statistics data is negligible.


WMQ Component

Optionally analyzes SMF Type 116 accounting data, as processed by MXG or MICS and placed into the performance data base.

• MQMACCTQ - Thread-level accounting data

• MQMQUEUE - Queue-level accounting data

Type 116 records should be synchronized with the SMF recording interval.

IBM says the overhead to collect accounting data is 5-10%.


WebSphere MQ
Typical queue manager problems

Assignment of queues to page sets

Assignment of page sets to buffer pools

Queue manager parameters

Index characteristics of queues

Characteristics of messages in queues

Characteristics of MQ calls

CPExpert analysis uses SMF Type 116 records


WebSphere MQ
Typical buffer manager problems

Buffer thresholds exceeded for pool

Buffers assigned per pool (too few/too many)

Message traffic

Message characteristics

Application design

CPExpert analysis uses SMF Type 115 records


WebSphere MQ
Typical log manager problems

Log buffers assigned

Active log use characteristics

Archive log use characteristics

Tasks backing out

System paging of log buffers

Excessive checkpoints taken

CPExpert analysis uses SMF Type 115 records


WebSphere MQ
Typical DB2-interface problems

Thread delays

DB2 server processing delays

Server requests queued

Server tasks experienced ABENDs

Deadlocks in DB2

Maximum request queue depth was too large

CPExpert analysis uses SMF Type 115 records


WebSphere MQ
Typical shared queue problems

Structure was full

Large number of application structures defined

MINSIZE is less than SIZE for CSQ.ADMIN

SIZE is more than double MINSIZE

ALLOWAUTOALT(YES) not specified

FULLTHRESHOLD value might be incorrect

CPExpert analysis uses SMF Type 115 records and Type 74 (Coupling Facility) records


WebSphere MQ – sample report

RULE WMQ100: MESSAGES WERE WRITTEN TO PAGE SET ZERO

More than 0 messages were written to Page Set Zero during the intervals shown below. Messages should not be written to Page Set Zero, since serious WebSphere MQ system problems could occur if Page Set Zero should become full. This finding relates to queue SYSTEM.COMMAND.INPUT.

                         MESSAGES WRITTEN
STATISTICS INTERVAL      TO PAGE SET ZERO
13:16-14:45, 28AUG2003         624

RULE WMQ122: DEAD.LETTER QUEUE IS INAPPROPRIATE FOR PAGE SET ZERO

Buffer Pool 0. The DEAD.LETTER queue was assigned to Page Set Zero. A dead-letter queue stores messages that cannot be routed to their correct destinations. If the DEAD.LETTER queue grows large unexpectedly, Page Set Zero can become full, and WebSphere MQ can enter a serious stress condition. You should redefine the DEAD.LETTER queue to a page set other than Page Set Zero. This finding relates to queue SYSTEM.DEAD.LETTER.QUEUE.


WebSphere MQ – sample report

RULE WMQ110: EXPYRINT VALUE IS OFF OR TOO SMALL

Buffer Pool 3. There were more than 25 expired messages skipped when scanning a queue for a specific message. Processing expired messages adds both CPU time and elapsed time to the message processing. With WebSphere 5.3, the EXPYRINT keyword was introduced to allow the queue manager to automatically determine whether queues contained expired messages and to eliminate expired messages at the interval specified by the EXPYRINT value. This finding applies to queue: DPS.REPLYTO.RCB.IVR04

                         GET       BROWSE    EXPIRED MESSAGES
STATISTICS INTERVAL      SPECIFIC  SPECIFIC  PROCESSED
13:41-13:41, 03JUL2003      0         0          313

RULE WMQ320: APPLICATIONS WERE SUSPENDED FOR LOG WRITE BUFFERS

Applications were suspended while in-storage log buffers were being written to the active log. This finding normally means that too few log buffers were assigned. However, the finding could mean that there is an I/O configuration problem and the log buffer writes to the active log are delayed for I/O reasons. This finding applies to the following statistics intervals.

STATISTICS INTERVAL       NUMBER OF SUSPENSIONS WAITING ON OUTPUT BUFFERS
14:19-14:44, 12SEP2003    139


WebSphere MQ – sample report

RULE WMQ201: BUFFER POOL ENCOUNTERED SYNCHRONOUS (5%) THRESHOLD

Buffer Pool 0. This buffer pool encountered the Synchronous Write threshold (less than 5% of the pages in the buffer pool were "stealable" or more than 95% of the pages were on the Deferred Write queue). While the Synchronous Page Writer is executing, updates to any page cause the page to be written immediately to the page set (the page is not placed on the Deferred Write Queue, but is written immediately to the page set as a synchronous write operation). This situation harms performance of applications, and is an indicator that the buffer pool is in danger of encountering a Short on Storage condition.

STATISTICS INTERVAL       BUFFERS ASSIGNED   TIMES AT 5% THRESHOLD   IMMEDIATE WRITES
17:08-17:09, 07OCT2003    1,050              19                      19
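The 5% test described above can be expressed directly: the Synchronous Write threshold is reached once fewer than 5% of the pool's pages remain stealable. A hedged sketch in Python (the function name and interface are illustrative; CPExpert's actual SAS logic is not shown here):

```python
# Sketch of the 5% synchronous-write threshold check described above.
# Function name and interface are illustrative, not CPExpert's.
def hit_sync_threshold(buffers_assigned: int, stealable_pages: int) -> bool:
    """True when fewer than 5% of the pool's pages are stealable,
    i.e. the Synchronous Write threshold has been reached."""
    return stealable_pages < 0.05 * buffers_assigned

# With the 1,050 buffers from the sample report, the threshold is
# crossed once fewer than 52.5 pages (5% of 1,050) remain stealable.
print(hit_sync_threshold(1050, 52))   # True  -> synchronous writes begin
print(hit_sync_threshold(1050, 500))  # False -> pool is healthy
```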

RULE WMQ205: HIGH I/O RATE TO PAGE SETS WITH SHORT-LIVED MESSAGES

Buffer Pool 0. This buffer pool had short-lived messages assigned. The total I/O rate (read and write activity) to page sets for the short-lived messages was more than 0.5 pages per second. Writing pages to the page set and subsequently reading the pages from the page set cause I/O overhead and delay to the application. This finding applies to the following intervals:

STATISTICS INTERVAL       BUFFERS ASSIGNED   PAGES WRITTEN   PAGES READ   I/O RATE WITH DASD
11:32-11:32, 24JUL2006    50,000             101             0            50.5
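The I/O rate in the table is simply total page-set I/O (reads plus writes) divided by the interval length. A sketch of that computation in Python, assuming the sample interval spanned about two seconds (which is consistent with the 50.5 pages/second shown, but is an assumption, not documented fact):

```python
# Sketch of the I/O-rate computation behind the WMQ205 finding.
# The two-second interval length is an assumption for illustration.
def pageset_io_rate(pages_written: int, pages_read: int,
                    interval_seconds: float) -> float:
    """Total page-set I/O (reads + writes) per second."""
    return (pages_written + pages_read) / interval_seconds

# 101 pages written + 0 read over ~2 seconds = 50.5 pages/second,
# well above the 0.5 pages/second reporting threshold.
rate = pageset_io_rate(101, 0, 2.0)
print(rate)        # 50.5
print(rate > 0.5)  # True -> finding is produced
```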


WebSphere MQ – sample report

RULE WMQ300: ARCHIVE LOGS WERE USED FOR BACKOUT

WebSphere MQ applications issued log reads to the archive log file for backout more than 0 times during the WebSphere MQ statistics intervals shown below. Most log read requests should come from the output buffer or the active log. Using archive logs for backout purposes often indicates that either the active log files were too small or long-running applications were backing out work.

STATISTICS INTERVAL       NUMBER OF LOG READS FROM ARCHIVE LOG
4:30- 5:00, 12SEP2003     192

RULE WMQ611: LARGE NUMBER OF APPLICATION STRUCTURES WERE DEFINED

SMF TYPE74 (Structure) statistics showed that more than 5 application structures were defined to a coupling facility. IBM suggests that you should have as few application structures as possible. Having multiple application structures in a coupling facility can degrade performance.

COUPLING FACILITY   WEBSPHERE MQ STRUCTURES DEFINED
CF1                 8
CF2                 9
CF3                 8
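The WMQ611 check reduces to counting application structures per coupling facility and flagging any facility over the suggested limit. A minimal sketch in Python, using the counts from the report above (the data layout is illustrative, not the SMF Type 74 record format):

```python
# Sketch of the WMQ611 check: flag coupling facilities with more than
# five WebSphere MQ application structures. Counts mirror the report
# above; the dict layout is hypothetical.
structures_per_cf = {"CF1": 8, "CF2": 9, "CF3": 8}

LIMIT = 5  # IBM suggests as few application structures as possible

for cf, count in structures_per_cf.items():
    if count > LIMIT:
        print(f"RULE WMQ611: {cf} has {count} application structures "
              f"(more than {LIMIT})")
```

Here all three coupling facilities exceed the limit, so all three appear in the finding.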


CPExpert Release 18.1 (Issued April 2008)

Major enhancements with this update:

• Provided support for z10 server

• Provided analysis of HiperDispatch problems

• Provided new reports to help analysis of DB2 buffer pool problems

• Expanded the CPExpert email feature to the DASD Component

• Provided additional analysis features for the WebSphere MQ Component


Major enhancements with this update:

• Provided support for z/OS Version 1, Release 10

• Provided additional analysis of z/OS performance problems (in WLM Component), including reduced CPU speed caused by cooling unit failure

• Provided new reporting of rules based on History information kept by CPExpert (applies to all components except DB2 Component)

• Added masking technique to select CICS regions (by region Group), DASD volumes (including SMS Storage Groups), and WebSphere MQ subsystems

CPExpert Release 18.2 (Issued October 2008)


Major enhancements with this update:

• Enhanced WLM Component with analysis of more z/OS performance problems, including Enqueue Promoted Dispatching Priority analysis

• Project the amount of zAAP-eligible work that could be offloaded to a zAAP processor, if a zAAP processor were assigned to the LPAR

• Provided more analysis of CICS temporary storage in CICS Component

• Added Resource Enqueue analysis to DASD Component

CPExpert Release 19.1 (Issued April 2009)


Major enhancements with this update:

• Provided support for z/OS Version 1, Release 11

• Provide support for CICS/TS Release 4.1.

• Added analysis of Resource Enqueue contention between different levels of Goal Importance to WLM Component

• Added analysis of CICS Event Processing to the CICS Component (applicable to CICS/TS 4.1)

• Allow users to specify narrative descriptions of individual DB2 buffer pools in CPExpert reports

CPExpert Release 19.2 (Issued October 2009)


Major enhancements with this update:

• Enhanced WLM Component with analysis of SMF buffer specifications and other SMF performance constraints

• Support analysis of VSAM performance problems when analyzing a MICS performance data base, but using MXG TYPE42DS and MXG TYPE64 files

• Allow selection of up to 20 unique DB2 subsystems while analyzing performance problems with DB2 subsystems, and add logic to handle the case where an installation has multiple identical DB2 subsystem names defined in z/OS images

CPExpert Release 20.1 (Issued April 2010)


Major enhancements with this update:

• Provided support for z/OS Version 1 Release 12

• Provided support for the zEnterprise System (z196)

• Enhanced WLM Component to provide analysis of dropped SMF records and analysis of SMF flood facility (available with z/OS V1R12)

• Enhanced WLM Component to provide Management Overview of CPExpert findings, with web-enabled documentation links

• Enhanced the WebSphere MQ Component to provide analysis of a non-indexed request/reply-to queue

CPExpert Release 20.2 (Issued October 2010)


Major enhancements with this update:

• Provided new “analysis area” to include:

• Analysis of address spaces queued for logical processor

• Analysis of work units queued for logical processor

• Include analysis of queuing due to “power steering” option

• Provided additional analysis of HiperDispatch

CPExpert Release 21.1 (Issued April 2011)


Major enhancements with this update:

• Provided support for z/OS Version 1 Release 13

• Provided support for CICS/Transaction Server Version 4 Release 2 (CICS/TS 4.2), including:

• CICS/TS shortage and critical shortage above the Bar

• IARV64 macro with CONVERT=FROMGUARD failed

• Main temporary storage issues with TSMAINLIMIT

• CICS-DB2 thread REUSELIMIT analysis

• CICS/TS: routing unsupported function across IPIC

CPExpert Release 21.2 (Issued October 2011)


Major enhancements with this update:

• Provided support for z/OS Version 1 Release 13

• Provided support for CICS/Transaction Server Version 4 Release 2 (CICS/TS 4.2), including:

• CICS/TS shortage and critical shortage above the Bar

• IARV64 macro with CONVERT=FROMGUARD failed

• Main temporary storage issues with TSMAINLIMIT

• CICS-DB2 thread REUSELIMIT analysis

• CICS/TS: routing unsupported function across IPIC

CPExpert Release 22.1 (Issued April 2012)


Major enhancements with this update:

• Provided support for the zEnterprise EC12.

• Provide additional analysis in the WLM Component

• Analysis of IBM’s CPU Measurement Facility data (Type 113)

• Excessive Average Penalty Cycles Per Instruction

• SMF Type 30 data for intervals with high penalty cycles

• Processor degradation due to penalty cycle delays

• Unused Special processors were defined to the LPAR

CPExpert Release 22.2 (Issued October 2012)


Major enhancements with this update:

• Provided support for CICS Release 5.1

• Provided additional analysis, CICS Event Processing

• Event Capture Queue

• Peak events dispatched

• Synchronous events

• Transactional events

• Modified analysis to support changes with CICS/TS 5.1

• Added analysis of SMF Type 113 data placed in a MICS performance data base

CPExpert Release 23.1 (Issued April 2013)


Major enhancements with this update:

• Provided additional analysis, WLM Component (analysis of new z/OS Interrupt Delay Time)

• Provided additional analysis, CICS Component (analysis of Buffer Management Facility – VSAM/RLS)

• Provided additional analysis, DASD Component (analysis of Interrupt Delay Time)

CPExpert Release 23.2 (Issued October 2013)


Major enhancements with this update:

• Provided support for CICS Release 5.1

• Provided additional analysis, CICS Event Processing

• Event Capture Queue

• Peak events dispatched

• Synchronous events

• Transactional events

• Modified analysis to support changes with CICS/TS 5.1

• Added analysis of SMF Type 113 data placed in a MICS performance data base

CPExpert Release 24.1 (Issued April 2014)


Major enhancements with this update:

• Provided support for CICS Release 5.2

• Provided support for DB2 Release 11

• Provide option to summarize DB2 Interval Statistics with DB2 Release 10 and DB2 Release 11

• User-specified summarization interval

• DB2 performance analysis and reporting done based on summarized interval data

CPExpert Release 24.2 (Issued November 2014)


License fees (Site license)

Component                   First Year   Additional Year

WLM Component               7,500        5,000
DB2 Component               7,500        5,000
CICS Component (see note)   5,000        3,000
WMQ Component               5,000        3,000
DASD Component              3,000        1,500

Note: Fees shown for the CICS Component are for analyzing no more than 50 CICS regions.


Summary

• The major objective is to share solutions and provide insight into new z/OS features.

• CPExpert is updated every six months; support for new versions of z/OS has been available within 30 days after General Availability of the new z/OS release.

• CPExpert is offered at a low cost (affordable by all z/OS shops).

• 45-day no-obligation trial is available (see license agreement for details).

• Free no-obligation performance analysis is available


For more information, please contact

Don Deese
Computer Management Sciences, Inc.
634 Lakeview Drive
Hartfield, VA 23071-3113

Phone: (804) 776-7109
Email: [email protected]

Visit www.cpexpert.com for more information, to review sample output, to review documentation in SAS ODS “point-and-click” format, to download license agreements in .pdf “form” mode, etc.