
IBM z14 and FICON Express16S+ Performance

December 2017

Brian Murphy
Dinesh Kumar
Yamil Rivera

IBM Systems
© 2017, International Business Machines Corporation


IBM Z - z14 and FICON Express16S+ Performance

Table of Contents

Introduction
The High Performance FICON for IBM Z
    Figure 1 – FICON Express Channel Maximum Throughput
High Performance FICON for System z (zHPF) architecture performance features
    Figure 2 – Link Protocols
zHPF Exploitation
FEx16S+ channel performance using zHPF and FICON protocols
    Figure 3 – FEx16S+ Maximum IO/sec
    Figure 4 – FICON Aggregation
    Figure 5 – zHPF Maximum MB/sec
    Figure 6 – FICON Maximum MB/sec
    Figure 7 – FEx16S+ Maximum MB/sec
    Figure 8 – FEx16S+ Response Time
    Figure 9 – FEx16S+ Channel Processor Utilization
    Figure 10 – FEx16S+ Channel Utilizations
Fibre Channel Protocol (FCP) for FICON Express
    Figure 11 – Maximum Small Block I/O Rate
    Figure 12 – Maximum Large Block Throughput
FEx16S+ FCP Performance
FCP Hardware Data Router Performance
    Figure 13 – Maximum Transaction Rate
    Figure 14 – Single Read I/O Response Times
    Figure 15 – 64KB Maximum Throughput
FEx16S+ Performance Improvements Over FEx16S
    Figure 16 – 4KB Read Maximum Transaction Rate
    Figure 17 – 4KB Read Data Router Response Times
    Figure 18 – 64KB Maximum Throughput
Commercial Batch Workload Performance
    Figure 19 – Batch Elapsed Time
FEx16S+ Channel Performance using zHPF and FICON Protocols at up to 100km of Distance
    Figure 20 – Distance Diagram
    Figure 21 – Distance zHPF Large Data Transfer
    Table 1 - Average Frame Sizes
    Figure 22 – Distance FICON Large Data Transfer
    Figure 23 – Local vs Remote Distance
zHyperLink Express
    Figure 24 – zHyperLink Diagram
    Figure 25 – FEx16S+ vs zHyperLink
RMF Channel Activity report
    Figure 26 – RMF Diagram
RMF Synchronous I/O Device Activity
    Figure 27 – RMF Synchronous I/O
FEx16S+ Card and I/O Subsystem (IOSS) performance with zHPF
    Figure 28 – z14 IOSS Diagram
    Figure 29 – FEx16S+ Channel vs Card Bandwidth
    Figure 30 – z14 IOSS Bandwidth
z14 SAP capacity


    Figure 31 – Fully interconnected 4 drawer system
    Figure 32 – Available models of z14
    Figure 33 – RMF I/O Queueing Activity Report
    Figure 34 – z14 SAP Capacity
Summary
Acknowledgments


Introduction

This technical paper was developed to help technical representatives understand the I/O performance characteristics of the FICON Express16S+ channels available on the IBM z14™.

In this paper FICON Express16S+ may be referred to as FEx16S+, FICON Express16S as FEx16S, and FICON Express8S as FEx8S. When zHPF is used by itself, it refers to the zHPF protocol. When the term FICON is used by itself, it refers to the FICON protocol. When the term FCP is used by itself, it refers to the FCP protocol. Most FICON and zHPF performance results presented in this paper were obtained using versions of z/OS® available at the time of the measurement, while most FCP measurements were produced using versions of Linux on Z. In particular, the FEx16S+ channel measurement results summarized in this paper were collected using z/OS V2.2 and SUSE Linux Enterprise Server 12.

For more information on the performance of earlier generation FICON channels, refer to the material available on the System z I/O connectivity website at:

http://www.ibm.com/systems/z/connectivity/

The z14 supports the following FICON features:

• FICON Express 16S+ (FEx16S+)
• FICON Express 16S (FEx16S) (carry-forward only)
• FICON Express 8S (FEx8S) (carry-forward only)

The FEx16S+, FEx16S, and FEx8S features conform to the following architectures:

• Fibre Connection (FICON)
• High Performance FICON on Z (zHPF)
• Fibre Channel Protocol (FCP)

They provide connectivity between any combination of servers, directors, switches, and devices (control units, disks, tapes, and printers) in a SAN.

Each FEx16S+, FEx16S, and FEx8S feature occupies one I/O slot in the PCIe I/O drawer. Each feature has two ports, with one PCHID and one CHPID associated with each port.


This document contains performance information. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the numbers stated here. The information herein is provided “AS IS” with no warranties, express or implied. This information does not constitute a specification or form part of the warranty for any IBM product.

The High Performance FICON for IBM Z:

High Performance FICON for IBM Z (zHPF) is designed to improve throughput and latency, which it does by reducing the number of information units (IUs) that are processed. Enhancements to the z/Architecture and the FICON protocol provide optimizations for online transaction processing (OLTP) workloads. In experiments using FEx16S+ within an IBM z14, I/O-intensive batch workloads realized a 22% improvement in batch elapsed time with FEx16S+ compared to z13 FEx16S, and a 37% improvement compared to z13 FEx8S.

Reflected in the bar charts in Figure 1 below are the "best can do" capabilities of each of the generations of FICON Express channels supported on z14. For each of the generations of FICON Express channels, there are two bars, one which displays the maximum capability using the FICON protocol exclusively and another that shows the maximum capability using the zHPF protocol exclusively.


[Chart 1: "z14 Supported Channels Small Block Size". Y-axis: IO/sec, 0 to 350,000. Bars: FICON and zHPF for FEx8S, FEx16S, and FEx16S+. Data labels as extracted: 20,000; 92,000; 23,000; 98,000; 23,800; 314,000. Annotation: 220% increase.]

[Chart 2: "z14 Supported Channels Large Block Size". Y-axis: MB/sec, 0 to 3,500. Bars: FICON and zHPF for FEx8S, FEx16S, and FEx16S+. Data labels as extracted: 620; 1,600; 620; 2,560; 630; 3,200. Annotation: 25% increase.]

Figure 1 – FICON Express Channel Maximum Throughput

The first chart displays the maximum (100% channel utilization) 4K I/O rates for each channel as measured at a point in time around the general availability (GA) dates of each product using an I/O driver benchmark program for 4K byte read hits. The size of most online database transaction processing (OLTP) workload I/O operations is 4k bytes. In laboratory measurements, using FEx16S+ in a z14 with the zHPF protocol and small data transfer I/O operations, FEx16S+ achieved a maximum of 314,000 IO/sec, compared to the maximum of 98,000 IO/sec achieved with FEx16S. This represents approximately a 220% increase.

The second chart displays the maximum READ+WRITE MB/sec for each channel. In laboratory measurements, using FEx16S+ in a z14 with the zHPF protocol and a mix of large sequential read and update write data transfer I/O operations, FEx16S+ achieved a maximum throughput of 3,200 READ+WRITE MB/sec compared to the maximum of 2,560 READ+WRITE MB/sec achieved with FEx16S. This represents approximately a 25% increase.

The maximum zHPF 4k IO/sec measured on a FEx16S+ channel was approximately 12 times the maximum FICON protocol capability. Response time curves for these results will be described later in this document. As shown in Figure 1, with zHPF, it is possible to achieve an improvement in both small block IO/sec processing for OLTP workloads and large sequential READ+WRITE I/O processing compared to previous FICON offerings.
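The percentage increases quoted for Figure 1 follow directly from the measured maxima. As a minimal sketch (the function name `percent_increase` is ours, not from the paper), the arithmetic can be checked like this:

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# zHPF maxima quoted in the text: (FEx16S, FEx16S+).
small_block_io_per_sec = (98_000, 314_000)   # 4K read hits, IO/sec
large_block_mb_per_sec = (2_560, 3_200)      # READ+WRITE mix, MB/sec

print(f"small block: {percent_increase(*small_block_io_per_sec):.1f}% increase")
print(f"large block: {percent_increase(*large_block_mb_per_sec):.1f}% increase")
```

The results, about 220.4% and 25.0%, agree with the "approximately 220%" and "approximately 25%" stated above.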

High Performance FICON for System z (zHPF) architecture performance features

zHPF is an extension to the FICON architecture designed to improve the execution of small block I/O requests. zHPF streamlines the FICON architecture and reduces the overhead on the channel processors, control unit ports, switch ports, and links by improving the way channel programs are written and processed. To understand how zHPF improves upon FICON, one needs to review the relevant characteristics of FICON channel processing.

A FICON channel program consists of a series of Channel Command Words (CCWs) which form a chain. The command code indicates whether the I/O operation is going to be a read or an update write from disk, and the count field specifies the number of bytes to transfer. When the channel finishes processing one CCW and either a command chaining or data chaining flag is turned on, it processes the next CCW, and the CCWs belonging to such a series are said to be chained. Each one of these CCWs is a FICON channel Information Unit (IU) which requires separate processing on the FICON channel processor and separate commands to be sent across the link from the channel to the control unit. The zHPF architecture defines a single command block to replace a series of FICON CCWs as illustrated in Figure 2 below.
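The chaining rule described above can be illustrated with a small model. This is a sketch under stated assumptions, not the real CCW layout: the `CCW` class, its field names, and the command names and byte counts in the sample program are hypothetical and chosen only to show how the chaining flags decide where a chain ends.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CCW:
    command: str                 # illustrative command name, e.g. "READ"
    count: int                   # number of bytes to transfer
    chain_command: bool = False  # command-chaining flag
    chain_data: bool = False     # data-chaining flag


def chained_ccws(program: List[CCW]) -> List[CCW]:
    """Walk a channel program, following CCWs while a chaining flag is on."""
    processed = []
    for ccw in program:
        processed.append(ccw)
        if not (ccw.chain_command or ccw.chain_data):
            break  # no chaining flag set: the chain ends with this CCW
    return processed


# Hypothetical program: the first CCW chains to the second, which ends the chain.
program = [
    CCW("PREFIX", 64, chain_command=True),
    CCW("READ", 4096),
    CCW("READ", 4096),  # not processed: the previous CCW has no chaining flag
]
chain = chained_ccws(program)
print([ccw.command for ccw in chain])
```

In this model each processed CCW stands for one unit of separate channel work, which is the per-CCW overhead that the single zHPF command block is meant to remove.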

zHPF improves upon FICON by providing a Transport Control Word (TCW) that facilitates the processing of an I/O request by the channel and the control unit. The TCW has a capability that enables multiple channel commands to be sent to the control unit as a single entity instead of being sent as separate commands as is done with FICON CCWs. In addition, the channel is no longer expected to process and keep track of each individual channel command word. Instead, the channel forwards a chain of commands to the control unit to execute. The reduction of this overhead increases the maximum I/O rate possible on the channel and improves the utilization of the various sub-components along the path traversed by the I/O request.

zHPF provides a much simpler link protocol than FICON. Figure 2 below shows an example of a 4k read FICON channel program, where three IUs are sent from the channel to the control unit plus three IUs from the control unit to the channel. In this example, zHPF reduces the total number of IUs sent in half, using one IU from the channel to the control unit and two IUs from the control unit to the channel.


[Diagram: "Link Protocol Comparison for a 4KB READ", showing the exchanges between CHANNEL and CONTROL UNIT.
FICON: OPEN EXCHANGE, PREFIX CMD & DATA; READ COMMAND; CMR; 4K of DATA; STATUS; CLOSE EXCHANGE.
zHPF: OPEN EXCHANGE, send a Transport Command IU; 4K OF DATA; Send Transport Response IU, CLOSE EXCHANGE.
Text in figure: zHPF provides a much simpler link protocol than FICON.]

Figure 2 – Link Protocols
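The IU counts for the 4k read example can be tallied from the labels in Figure 2. The sketch below is only a bookkeeping aid: the direction assigned to each IU is our reading of the figure, chosen to match the counts stated in the text (three plus three for FICON, one plus two for zHPF).

```python
from collections import Counter

# (direction, information unit) pairs for a 4 KB read, as read from Figure 2.
FICON_4K_READ = [
    ("channel->CU", "open exchange, prefix command and data"),
    ("channel->CU", "read command"),
    ("CU->channel", "CMR"),
    ("CU->channel", "4K of data"),
    ("CU->channel", "status"),
    ("channel->CU", "close exchange"),
]
ZHPF_4K_READ = [
    ("channel->CU", "open exchange, transport command IU"),
    ("CU->channel", "4K of data"),
    ("CU->channel", "transport response IU, close exchange"),
]


def iu_counts(sequence):
    """Count IUs per direction for one link protocol sequence."""
    return Counter(direction for direction, _ in sequence)


for name, sequence in (("FICON", FICON_4K_READ), ("zHPF", ZHPF_4K_READ)):
    print(name, len(sequence), dict(iu_counts(sequence)))
```

The totals are six IUs for FICON and three for zHPF, which is the halving described in the text.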

With zHPF, “well constructed” CCW strings are collapsed into a single new Control Word. Conceptually this is similar to the Modified Indirect Data Address Word (MIDAW) facility enhancement to FICON, which allowed a chain of data CCWs to be collapsed into one CCW. zHPF now allows the collapsing of both Command Chained as well as Data Chained CCW strings into one Control Word. zHPF-capable channels and devices support both FICON and zHPF protocols simultaneously.

The maximum number of open exchanges or the number of I/Os that can be simultaneously active on FEx16S+ channels is designed to be significantly higher with zHPF compared to FICON. An open exchange is an I/O that is active between the channel and the control unit, and it includes I/Os that are cache hits, which begin transferring data back to the channel immediately, and those that are cache misses, which might experience a delay of several milliseconds before the data can begin transferring back to the channel. Since higher I/O activity levels are both possible now and expected to increase in the future with zHPF, the maximum number of open exchanges allowed per channel has been increased with zHPF. More information on how to find the actual average number of open exchanges used in a production workload environment is provided in the “zHPF fields on the RMF Channel Activity report” section of this document.

zHPF Exploitation

zHPF can be turned on or off. For z/OS exploitation, there is a parameter in the IECIOSxx member of SYS1.PARMLIB (ZHPF=YES/NO) and on the SETIOS command to control whether zHPF is enabled or disabled. The default is ZHPF=NO.

For zHPF exploitation, FEx16S+ (CHPID type FC) on a z14 requires at a minimum:

z/OS

• Version 2 Release 3
• Version 2 Release 2
• Version 2 Release 1
• Version 1 Release 13 (compatibility support only, with extended support agreement)

zHPF-capable channels and devices support both FICON and zHPF protocols simultaneously. The Media Manager component of DFSMS™ detects whether the device supports zHPF or not and builds the appropriate channel programs. Media Manager will build the zHPF Transport Mode channel programs for DB2, PDSE, VSAM, QSAM, BPAM, BSAM, zFS and Extended Format SAM that could benefit from the improved transfer technique.


FEx16S+ channel performance using zHPF and FICON protocols

With the introduction of the FEx16S+ channel, improvements can be seen in both response times and maximum throughput for IO/sec and MB/sec for workloads using the zHPF protocol on FEx16S+ channels.

One primary performance improvement in a FEx16S+ channel with zHPF compared to FICON is the maximum number of small block I/Os per second (4k bytes per I/O) that can be processed. As displayed in Figure 3 below, the maximum number of IO/sec that was measured on a FEx16S+ channel running an I/O driver benchmark with a 4k bytes per I/O workload exploiting zHPF is 314,000, which is approximately twelve times what was measured with FICON. In this case, two separate experiments were conducted; one where the channel was executing all zHPF channel programs and another where the channel was executing all FICON channel programs.

[Chart: "z14 Single Channel FEx16S+ 4K Read". X-axis: IO/sec, 0 to 350,000. Y-axis: Response Time (msec), 0.00 to 3.00. Series: FICON, zHPF.]

Figure 3 – FEx16S+ Maximum IO/sec

Figure 3 shows that the total response time for a 4k read I/O operation is approximately 100 microseconds better for zHPF compared to FICON channel programs at I/O activity levels less than 20,000 IO/sec per channel. For FICON, response times increase gradually to approximately 0.3ms for activity levels up to 18,000 IO/sec per channel and then increase sharply at a level of activity between 20,000 and 23,000 IO/sec. In contrast, response times for zHPF continue at a moderate rate of increase up to approximately 0.19ms at 300,000 IO/sec, followed by the sharp increase indicative of reaching the maximum capability of the channel. Please note that these measurements were done with a single z14 channel connected to multiple host adapter ports, each on a different IBM System Storage DS8870. Just as is the case with FICON, the maximum capability of any individual control unit (CU) port running zHPF channel programs is different than the maximum capability of the zHPF channel. Contact an IBM representative for more information on the performance of a DS8870.

Since the difference in maximum capability with 100% zHPF activity compared to 100% FICON activity is so great on the FEx16S+ channel, it is worthwhile to note how the maximum capability changes for I/O benchmarks that read 4K bytes of data using a mixture of zHPF and FICON protocols. As shown in the chart below, with a mixture of 90% zHPF and 10% FICON, the maximum capability is 280,000 IO/sec. With a mix of 50% zHPF and 50% FICON, the maximum capability is 170,000 IO/sec on a FEx16S+ channel. The higher the percentage of zHPF activity, the greater the performance improvement. For information on how to enable zHPF, refer to the “zHPF Exploitation” section of this paper.

[Chart: "z14 FEx16S+ FICON Aggregation 4K Read". X-axis: IO/sec, 0 to 350,000. Y-axis: Response Time (msec), 0.000 to 0.400. Series: 100% zHPF; 90% zHPF + 10% FICON; 75% zHPF + 25% FICON; 50% zHPF + 50% FICON; 25% zHPF + 75% FICON; 10% zHPF + 90% FICON; 100% FICON.]


Figure 4 – FICON Aggregation
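As a rough cross-check of the mixed-protocol maxima quoted above, a simple weighted average of the two single-protocol maxima happens to land close to the reported values. This is only an observation about the quoted numbers, not a model given in this paper. The FICON maximum used here is an assumption derived from the statement that zHPF reached approximately twelve times the FICON rate.

```python
ZHPF_MAX = 314_000         # 100% zHPF, 4K reads, IO/sec (from the text)
FICON_MAX = ZHPF_MAX / 12  # assumption: zHPF is "approximately twelve times" FICON


def blended_max(zhpf_fraction: float) -> float:
    """Weighted average of the single-protocol maxima (a rough heuristic)."""
    return zhpf_fraction * ZHPF_MAX + (1.0 - zhpf_fraction) * FICON_MAX


# Mixed-protocol maxima reported in the text, keyed by zHPF fraction.
reported = {0.90: 280_000, 0.50: 170_000}

for fraction, measured in reported.items():
    estimate = blended_max(fraction)
    difference = (estimate - measured) / measured * 100.0
    print(f"{fraction:.0%} zHPF: estimate {estimate:,.0f} IO/sec, "
          f"reported {measured:,} IO/sec ({difference:+.1f}%)")
```

Both estimates fall within a few percent of the reported values. The heuristic should not be relied on for the other mixes shown in Figure 4, whose maxima are not quoted in the text.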

Improvements in maximum megabytes per second (MB/sec) can also be realized with FEx16S+ channels compared to FEx16S channels, as displayed in Figure 5 below.

[Chart: "z14 zHPF Maximum MB/sec". Y-axis: MB/sec, 0 to 3,500. Categories: FEx16S, FEx16S+. Series: Read, Write, Read/Write (mix). Annotations: 9%, 6%, 25%.]

Figure 5 – zHPF Maximum MB/sec

Figure 5 above shows the "best can do" capabilities of both FEx16S+ and FEx16S channels, measured using an I/O driver benchmark program that executes channel programs which use the zHPF protocol to either read multiple sequential 4k records (32x4k = 128k bytes per I/O) of data from disk, write multiple sequential 4k records of data to disk, or do a 50/50 mix of read and write I/O operations. The maximum MB/sec for a single FEx16S+ channel is approximately 1,600 MB/sec for READs or update WRITEs and 3,200 MB/sec for a mix of READ and update WRITE I/O operations. For reference, the maximum MB/sec measured for a single FEx16S channel was approximately 1,500 MB/sec for READs or update WRITEs and 2,560 MB/sec for a mix of READ and update WRITE I/O operations. Due to the improvement of the 16 Gigabits per second (Gbps) link, the maximum FEx16S+ channel measurement results are approximately 9% better for all READs and 6% better for all update WRITEs. With a 50/50 mix of READ and update WRITE I/O operations, the maximum capability of the FEx16S+ channel is 25% greater than the maximum READ+WRITE MB/sec of a single FEx16S channel using the zHPF protocol. In each case using the zHPF protocol, the FEx16S+ channel is capable of driving the maximum MB/sec that a 16 Gbps link can support.
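One way to read these maxima is to compare the mixed READ+WRITE result with the sum of the one-way maxima. The sketch below assumes that reads and writes travel in opposite directions on a full-duplex link, so a mixed workload can at best approach the read maximum plus the write maximum. The helper name `duplex_efficiency` is ours, and the inputs are the approximate values quoted above.

```python
# Approximate zHPF maxima quoted in the text, in MB/sec.
channels = {
    "FEx16S":  {"read": 1_500, "write": 1_500, "mix": 2_560},
    "FEx16S+": {"read": 1_600, "write": 1_600, "mix": 3_200},
}


def duplex_efficiency(maxima: dict) -> float:
    """Mixed READ+WRITE throughput as a fraction of read max plus write max."""
    return maxima["mix"] / (maxima["read"] + maxima["write"])


for name, maxima in channels.items():
    print(f"{name}: mix reaches {duplex_efficiency(maxima):.0%} "
          f"of the sum of the one-way maxima")
```

With these rounded inputs, the FEx16S mix reaches about 85% of the sum of its one-way maxima while the FEx16S+ mix reaches 100%, which is consistent with the statement that FEx16S+ can drive the full capability of the link.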

The FICON measurements displayed in Figure 6 below are representative of large sequential I/O operations that do not use the zHPF protocol, e.g. those that use EXCP to execute channel programs. These experiments were done using channel programs that transferred six half tracks worth of data or 6x27k bytes for each READ or update WRITE I/O. The maximum MB/sec for a single FEx16S+ channel using the FICON protocol is approximately 830 MB/sec for READs, 910 MB/sec for update WRITEs and 860 MB/sec for a mix of READ and WRITE I/O operations. The maximum MB/sec for a single FEx16S channel using the FICON protocol is approximately 650 MB/sec for READs, 630 MB/sec for update WRITEs and 630 MB/sec for a mix of READ and WRITE I/O operations.

[Chart: "z14 FICON Maximum MB/sec". Y-axis: MB/sec, 0 to 1,000. Categories: FEx16S, FEx16S+. Series: Read, Write, Read/Write (mix). Annotations as extracted: 36%, 43%, 28%.]

Figure 6 – FICON Maximum MB/sec


As shown in Figure 7 below, zHPF on a FEx16S+ channel achieved from 400 MB/sec for a single 4k block to over 1,500 READ MB/sec when 64 or more 4k blocks are transferred per I/O operation. In contrast, the FICON protocol with the MIDAW facility measured from 90 MB/sec for a single 4k block to over 550 READ MB/sec when 64 or more 4k blocks are transferred per I/O operation on a FEx16S+ channel. The maximum MB/sec possible with zHPF ranges from 3 times FICON for large data transfers (256k bytes) to 5 times FICON for smaller data transfers (8k bytes per I/O). The graph below also demonstrates the performance benefits of both the new FEx16S+ channel and the efficiencies of the zHPF protocol, which increases the average bytes per frame and reduces the number of frames transferred per I/O.

[Chart: "FEx16S+ Maximum Read MB/sec zHPF vs FICON". X-axis: K Bytes of Data Transfer per I/O Operation (4, 8, 16, 32, 64, 128). Y-axis: MB/sec, 0 to 1,800. Series: FICON, zHPF.]

Figure 7 – FEx16S+ Maximum MB/sec

Examples of the response time benefits of both the new FEx16S+ channel and the efficiencies of the zHPF protocol are shown in Figure 8 below. Simple benchmark experiments were conducted using a FEx16S+ channel to READ or update WRITE 32 4k records or 128K bytes of data in total to a single device. The PEND and CONN response time components are displayed for zHPF compared to the FICON protocol. For I/O operations that READ 128K bytes of data, zHPF response times are 30% faster than FICON, a difference that appears in the CONN time component. For I/O operations that update WRITE 128K bytes of data, zHPF response times are 20% faster than FICON.

[Chart: "z14 FEx16S+ Single Stream 32x4K". Horizontal bars: FICON Read, zHPF Read, FICON Write, zHPF Write. X-axis: Response Time (msec), 0 to 0.9. Components: PEND, CONN. Annotations: 20%, 30%.]

Figure 8 – FEx16S+ Response Time

Benefits of both the new FEx16S+ channel and the efficiencies of the zHPF protocol can also be observed in the channel processor utilization reported on the Resource Measurement Facility (RMF) Channel Activity Report. Figure 9 below shows the FEx16S+ channel processor utilization reported for both zHPF and FICON measurements for 32x4k READ I/O operations. Since the hardware data router is handling the movement of data for zHPF, the maximum FEx16S+ channel processor utilization reported for these large data transfer zHPF I/O operations is less than 10%. In contrast, with the FICON protocol the FEx16S+ channel microprocessor is involved in processing both the Channel Command Words (CCWs) and the transfer of each of the 8k byte data information units (IUs). The channel utilization reported for equivalent amounts of MB/sec processed is more than 20 times what is used for zHPF large data transfers. For large data transfer I/O operations using the zHPF protocol it is therefore more appropriate to observe either channel bus or link utilizations.

[Chart: "z14 FEx16S+ Large Data Transfer". X-axis: MB/sec, 0 to 1,600. Y-axis: Channel Processor Utilization, 0 to 100. Series: zHPF, FICON.]

Figure 9 – FEx16S+ Channel Processor Utilization

In general, IBM recommends the following guidelines to achieve good response times:

• Keep Channel Processor Utilizations less than 50%
• Keep Channel BUS Utilizations less than 70%

[Figure: callouts marking "Channel Processor Utilization" and "Channel BUS Utilization"; image not reproduced.]

Figure 10 – FEx16S+ Channel Utilizations


When using either the FICON or the zHPF protocol, the FEx16S+ channel processor utilization is the most important factor to observe for small data transfers. The link or bus utilization is a more important factor to observe for large data transfers using the zHPF protocol. For large data transfers using the FICON protocol, both the channel processor and link utilization guidelines should be observed.
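The guidelines above can be sketched as a small check. This is an illustration only: the function names and the sample utilization readings are hypothetical, and the thresholds are the 50% and 70% guideline values given in the text.

```python
PROCESSOR_LIMIT_PCT = 50.0  # channel processor guideline from the text
BUS_LIMIT_PCT = 70.0        # channel bus guideline from the text


def guideline_warnings(processor_pct, bus_pct):
    """Return the guideline violations for one channel's utilization figures."""
    warnings = []
    if processor_pct >= PROCESSOR_LIMIT_PCT:
        warnings.append("channel processor utilization is not below 50%")
    if bus_pct >= BUS_LIMIT_PCT:
        warnings.append("channel bus utilization is not below 70%")
    return warnings


def metrics_to_watch(protocol, large_transfer):
    """Pick the utilization metrics the text singles out for a workload type."""
    if not large_transfer:
        return ["channel processor"]      # small transfers, either protocol
    if protocol == "zHPF":
        return ["link or bus"]            # large zHPF transfers
    return ["channel processor", "link"]  # large FICON transfers


# Hypothetical utilization readings, for illustration only.
print(guideline_warnings(processor_pct=35.0, bus_pct=82.0))
print(metrics_to_watch("FICON", large_transfer=True))
```

The utilization values themselves would come from the RMF Channel Activity Report; how they are collected is outside the scope of this sketch.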

In summary, the FEx16S+ channel offers performance improvements in both response times and maximum throughput for IO/sec and MB/sec for workloads using the zHPF protocol.

Fibre Channel Protocol (FCP) for FICON Express

FICON Express hardware supports the Fibre Channel Protocol (FCP) when defined as CHPID type FCP, allowing z/VM, z/VSE, KVM and Linux on IBM Z to connect to industry-standard Small Computer System Interface (SCSI) storage controllers and devices.

The maximum performance capabilities of the last few generations of FICON Express FCP channels are displayed below, highlighting the improvements achieved with the latest FEx16S+ on z14.

Maximum Small Block I/O Rate

Figure 11 – Maximum Small Block I/O Rate

Maximum small block performance was measured by driving each channel with multiple concurrent 4KB FCP read operations. Using FEx16S+ in an IBM z14 with the FCP protocol for small data transfer operations, FEx16S+ operating at 16 Gbps achieved a maximum of 380,000 IO/sec, compared to the maximum of 110,000 IO/sec achieved with FEx16S. This represents 3.45 times the maximum IO/sec achieved with the previous generation card.
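The generation-over-generation ratio quoted above can be reproduced with a one-line calculation. The sketch below is illustrative; the helper name `speedup` is an assumption introduced here, and the two rates are the ones stated in the paragraph above.

```python
def speedup(new_rate, old_rate):
    """Ratio of two maximum rates (hypothetical helper)."""
    return new_rate / old_rate


fex16s_plus_iops = 380_000  # FEx16S+ maximum 4KB read IO/sec, from the text
fex16s_iops = 110_000       # FEx16S maximum 4KB read IO/sec, from the text

ratio = speedup(fex16s_plus_iops, fex16s_iops)
print(f"FEx16S+ delivers {ratio:.2f}x the FEx16S small block rate")
```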

Maximum Large Block Throughput

Figure 12 – Maximum Large Block Throughput

Maximum large block performance was then measured by driving each channel with a mixture of 64KB FCP read and write operations. Using FEx16S+ in an IBM z14 with the FCP protocol and FEx16S+ operating at 16 Gbps, FEx16S+ achieved a maximum throughput of 3,200 MB/sec (reads + writes). This represents a 25% increase in throughput over FEx16S.

These results represent the “best can do” capabilities of these channels when running at 100% utilization. To produce these numbers, a test program was used to initiate FCP protocol operations from within a Linux on IBM Z partition, driving work through a single FICON Express FCP channel. Several SCSI storage devices were spread across two DS8870 boxes with multiple Fibre Channel ports, avoiding saturating any single storage device or link.

FEx16S+ FCP Performance

The new FEx16S+ channel brings significant performance improvements over previous generation cards, especially for FCP data router small block operations. This section provides information and examples on these improvements, as well as the overall performance characteristics of this channel.

FCP Hardware Data Router Performance

FCP hardware data router support has been available on several generations of the FICON Express Channels, starting with FEx8S. This feature offers improved throughput and transaction processing capacity, in addition to reduced response times, by providing a shorter path when transferring data from system memory and reducing the channel’s internal processing costs.

To make use of the hardware data router, the following requirements must be met:

FCP Hardware Data Router Minimum Requirements

- zEnterprise 196 GA2 and zEnterprise 114 with FEx8S.
- z/VM V6.3 for guest exploitation.
- Linux on IBM Z:
  – SLES 12 and SLES 11 SP3.
  – RHEL 7 and RHEL 6.4.
  – Ubuntu 16.04 LTS (or higher).
- KVM hypervisor, which is offered with the following Linux distributions: SLES-12 SP2 or higher, and Ubuntu 16.04 LTS or higher.

For minimal and recommended distribution levels refer to the IBM Z website.

Enabling the FCP Hardware Data Router

The use of the FCP hardware data router can be manually enabled or disabled by the user on Linux on IBM Z:

By modifying a kernel parameter:

    ON:  ‘zfcp.datarouter=1’
    OFF: ‘zfcp.datarouter=0’

Or by passing an additional parameter when loading the zfcp driver:

    ON:  modprobe zfcp datarouter=1
    OFF: modprobe zfcp datarouter=0

To verify the status of the FCP data router, the following commands can be used:

    # systool -m zfcp -v
    Module = "zfcp"

      Attributes:
        initstate    = "live"
        refcnt       = "48"
        srcversion   = "52F4E65D11CC52E53FD2D0D"
        supported    = "Yes"

      Parameters:
        ...
        datarouter   = "Y"
        ...

Or

    # cat /sys/module/zfcp/parameters/datarouter
    Y

For additional information, see the ‘Linux on IBM Z and LinuxONE – FCP Hardware Data Router support’ online document: https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaag/wecm/l0wecm00_fcp_hw_data_router_support.htm
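The sysfs check shown above can also be scripted. The following is a minimal sketch, not an IBM-provided utility: the function names are hypothetical, and accepting "N", "0" and "1" in addition to the "Y" shown above is an assumption about how the parameter may be reported.

```python
from pathlib import Path

# Path quoted in the verification step above.
DATAROUTER_PARAM = Path("/sys/module/zfcp/parameters/datarouter")


def parse_datarouter(value):
    """Interpret the parameter text; 'Y' is the enabled value shown above."""
    text = value.strip().upper()
    if text in ("Y", "1"):
        return True
    if text in ("N", "0"):  # assumed spellings of the disabled state
        return False
    raise ValueError(f"unexpected datarouter value: {value!r}")


def datarouter_enabled(path=DATAROUTER_PARAM):
    """Return True or False, or None when the parameter file is absent."""
    try:
        return parse_datarouter(Path(path).read_text())
    except FileNotFoundError:
        # The zfcp module is not loaded, or this is not Linux on IBM Z.
        return None
```

Returning `None` rather than raising when the file is missing keeps the helper usable on systems without the zfcp driver.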

FEx16S+ small block performance greatly benefits from running with the FCP data router enabled. An example is provided below, which displays the effects of this parameter on the average transaction response times and IO/sec achieved with a single FCP channel. In this case, a curve was built by driving the channel with an increasing number of concurrent 4KB read operations, with and without the FCP hardware data router:

[Chart callout: 3.8x maximum 4KB Read IO/sec with the use of the FCP Hardware Data Router]

Figure 13 – Maximum Transaction Rate

When the hardware data router is enabled, FEx16S+ can process a maximum of 380,000 4KB read IO/sec, which is 3.8 times the 100,000 IO/sec limit of non-data router operations. In addition, the hardware data router offers latency benefits, which can result in 28% lower response times when executing a single FCP 4KB read operation: a decrease from approximately 93us (non-data router) to 65us (data router). As the amount of work in the channel increases, data router response times increase at a lower rate than non-data router response times, showing a sharp increase only once the maximum capacity of the channel has been reached.

Response times can be further reduced for hardware data router operations with larger block sizes, benefitting from the lower latency provided by this feature. The following chart highlights this behavior, displaying the response time associated with a single read operation as the block size increases from 512B to 64KB. In this experiment, a 16KB read operation was measured to take roughly 35% less time to complete when the hardware data router was enabled, while a 64KB read saw a larger delta, at 46%:

[Chart: FCP - FEx16S+ Single Read I/O Response Times. X axis: FCP Data Block Size (KB). Left Y axis: Average Transaction Response Time (ms) for Data Router OFF and Data Router ON. Right Y axis: Delta %.]

    Block size (KB)   0.5    1      2      4      8      16     32     64
    Delta %           -26%   -25%   -26%   -28%   -30%   -35%   -41%   -46%

Figure 14 – Single Read I/O Response Times

FEx16S+ large block maximum throughput also benefits from using the hardware data router. Running a mixture of large block FCP read and write operations results in a maximum of 3200 MB/s full duplex throughput with the hardware data router enabled, which is over twice the maximum achieved when the hardware data router is not in use. On the other hand, maximum large block throughput when transferring data in a single direction sees a smaller gain, between 4% and 5%, reaching over 1600 MB/s.

[Chart: FCP - FEx16S+ 64KB Maximum Throughput, Data Router ON vs OFF. Y axis: Millions of Bytes Per Second (0 to 3,600). Series: 64KB Read, 64KB Write and 64KB Mixed, each with Data Router OFF and Data Router ON. Annotated gains with Data Router ON: +110% for mixed operations; +5% and +4% for operations in a single direction.]

Figure 15 – 64KB Maximum Throughput

FEx16S+ Performance Improvements Over FEx16S

Small Block Performance

One of the main improvements brought by FEx16S+ over FEx16S is the significant increase in small block processing capacity. As described in the ‘Fibre Channel Protocol (FCP) for FICON Express’ section, when running hardware data router operations, a maximum of 380,000 4KB read IO/sec is achieved. This is 3.45x the maximum transaction rate that can be processed with the FEx16S channel, which can produce 110,000 IO/sec under the same conditions. Not only does FEx16S+ outperform the maximum I/O rate of FEx16S, it also offers lower transaction response times for these operations, as displayed below:

[Chart: FCP - FEx16S+ vs FEx16S 4KB Read Transaction Response Times. X axis: Transaction Rate (0 to 450,000). Y axis: Average Transaction Response Time (ms, 0 to 0.8). Series: FEx16S and FEx16S+, each with Data Router OFF and Data Router ON.]

Data Router ON Performance:
- FEx16S+ achieves 3.45x the maximum 4KB Read throughput of FEx16S
- offers significantly lower transaction response times across comparable points.

Data Router OFF Performance:
- 10% increase in small block transactions over FEx16S

Figure 16 – 4KB Read Maximum Transaction Rate

This chart represents the 4KB read transaction rates and response times achieved with both FEx16S+ and FEx16S as the number of concurrent data router operations increases. Notice how, throughout most of the curve, FEx16S+ maintains lower response times than FEx16S, even when it surpasses the maximum IO/sec limit of the previous generation card. At 380,000 IO/sec, FEx16S+ average response times are 0.083 ms, an 85% drop from FEx16S running at 110,000 IO/sec.
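The transaction rate and average response time at a given point on the curve can be related to the average number of operations in flight by Little's law (concurrency = rate x time). This relationship is not stated in the paper; the sketch below applies it as an assumption to the FEx16S+ maximum-rate data point quoted above, and the helper name is hypothetical.

```python
def outstanding_ios(io_per_sec, response_time_ms):
    """Little's law: average concurrency = arrival rate x time in system."""
    return io_per_sec * (response_time_ms / 1000.0)


# FEx16S+ at its maximum 4KB read rate; both values are from the text.
concurrency = outstanding_ios(380_000, 0.083)
print(f"about {concurrency:.1f} operations in flight")
```

Under these assumptions the channel sustains a few dozen concurrent 4KB reads at its maximum rate, which is consistent with the curve having been built by increasing the number of concurrent operations.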

Comparing the data points where both channels handled a similar amount of IO/sec further highlights the FEx16S+ latency improvements. The chart below offers a slightly modified view of this data, comparing the average transaction response times between these cards at specific transaction rate levels:

[Chart: FCP - FEx16S+ vs FEx16S 4KB Read Data Router Response Times. X axis: Transaction Rate. Left Y axis: Average Transaction Response Time (ms, 0 to 0.8) for FEx16S and FEx16S+. Right Y axis: Response Time Delta %.]

    Transaction rate       Single I/O   @ 30K TX/s   @ 60K TX/s   @ 100K TX/s   @ 110K TX/s
    Response Time Delta    -25%         -36%         -52%         -79%          -89%

Figure 17 – 4KB Read Data Router Response Times

Executing a single 4KB read operation with FEx16S+ took 0.065 ms, a 25% reduction compared to FEx16S. As the number of concurrent operations increases, the gap between FEx16S+ and FEx16S increases as well. When both channels are executing 110,000 transactions per second (the maximum supported by the previous generation card), FEx16S+ can drive this work with approximately 89% lower response times.

FEx16S+ non-data router small block performance offers more modest gains over FEx16S. FEx16S+ can achieve approximately 100,000 transactions per second, which represents a 10% increase in 4KB read IO/sec when compared to FEx16S.

Large Block Performance

FEx16S+ also offers improvements to maximum large block throughput when running FCP operations. Full duplex bandwidth benefits the most here, but operations moving data in a single direction can also see some performance gains.

[Chart: FCP - FEx16S+ vs FEx16S 64KB Maximum Throughput. Y axis: Millions of Bytes Per Second (0 to 3,600). Series: 64KB Read, 64KB Write and 64KB Mixed, each with Data Router OFF and Data Router ON, for FEx16S and FEx16S+. Annotated FEx16S+ gains over FEx16S: Data Router ON: +10% read, +5% write, +25% mixed. Data Router OFF: +70% read, +100% write, +75% mixed.]

Figure 18 – 64KB Maximum Throughput

As described in the ‘Fibre Channel Protocol (FCP) for FICON Express’ section, when the hardware data router is enabled, FEx16S+ supports up to full duplex 16Gb bandwidth at 3200 MB/s when executing a mixture of large block 64KB read and write FCP operations. This is a 25% increase over FEx16S, which supported up to 2560 MB/s. When issuing only 64KB operations in a single direction, FEx16S+ offers a 10% increase for FCP read throughput and a 5% increase for FCP writes, each reaching over 1600 MB/s.

On the other hand, without the use of the hardware data router, FEx16S+ offers significant large block performance gains when compared to FEx16S under the same conditions. The maximum full duplex throughput of an FEx16S+ channel was measured at around 1500 MB/s for a mixture of 64KB read and write FCP operations. Compared to the 850 MB/s maximum supported by FEx16S, this is a 75% increase in throughput. When sending data in a single direction, maximum throughput sees similar gains when running FEx16S+: 70% higher throughput for FCP read only operations (1550 MB/s) and a 100% increase for FCP write only operations (1550 MB/s).

In summary, FEx16S+ offers several performance improvements when compared to FEx16S, including overall higher small block and large block throughput, as well as reduced transaction response times. However, hardware data router operations benefit the most, with FEx16S+ substantially outperforming FEx16S’ small block processing capacity while maintaining lower response times, even at higher load levels.

Commercial Batch Workload Performance

The workload chosen for this experiment is the CB-L workload, which consists of a mix of Commercial Batch Jobs. The batch jobs include various combinations of C, COBOL, FORTRAN, and PL/I compile, link-edit, and execute steps. Sorting, DFSMS utilities (e.g. dump/restore and IEBCOPY), VSAM and DB2 utilities, SQL processing, GDDM® graphics, and FORTRAN engineering/scientific subroutine library processing are also included. The expectation is that this workload would provide ITRR values consistent with the Average Relative Nest Intensity (RNI) column of the Large System Performance Reference (LSPR):
https://www-304.ibm.com/servers/resourcelink/lib03060.nsf/pages/lsprindex?OpenDocument

Based on experiments done using FEx16S+ within an IBM z14, I/O intensive batch workloads will realize a 22% improvement in Batch Elapsed time due to FEx16S+ as compared to z13 FEx16S. I/O intensive batch workloads will realize a 37% improvement in Batch Elapsed time due to FEx16S+ as compared to z13 FEx8S.

[Chart: Commercial Batch Elapsed Time for z13 FEx8S, z13 FEx16S and z14 FEx16S+. Annotated reductions with z14 FEx16S+: -37% versus z13 FEx8S and -22% versus z13 FEx16S.]

Figure 19 – Batch Elapsed Time

Every workload is different in terms of the amount of I/O response time and overall transaction response time variation that can be tolerated but connecting two (or more) CU ports to each channel is one example of a reasonable way to consolidate channels.

The FICON Aggregation tool function in zCP3000 can be used to help with channel consolidation and migration from older generation FICON channels to FEx16S+ channels so that the conversion to zHPF is possible. The Disk Magic tool is also available to help assess the optimum number of channels needed for a production workload on IBM storage subsystems. Both of these tools use RMF data as input to the analysis process for a specific production environment.

FEx16S+ Channel Performance using zHPF and FICON Protocols at up to 100km of Distance

IBM recommends the use of Cascaded FICON directors with 16Gbps InterSwitch Links (ISLs) between the two directors to support distances up to 100km.

The FEx16S+ channel has 90 buffer credits for each individual channel port which is enough to support a 16 Gbps link speed at distances up to 10 km between each channel and its nearest neighbor, which could be a port on a director or tape or disk storage subsystem.

To evaluate FEx16S+ channel performance using zHPF and FICON protocols at 100 km of distance, a series of performance measurements were conducted with multiple FEx16S+ channels on z14 connected to multiple storage subsystems using cascaded FICON directors with a 16 Gbps InterSwitch Link (ISL) between the two directors extended over a Dense Wavelength Division Multiplexer (DWDM) as illustrated in Figure 20 below.

[Diagram: two FICON Directors joined by 16Gb ISLs extended through a pair of DWDMs over 100KM of fiber, with 16Gb links on either side.]

Figure 20 – Distance Diagram

The amount of throughput that can be achieved on an ISL at distances up to 100 km is dependent on a number of variables:

- Workload characteristics such as data transfer sizes and mix of READ and WRITE activity
- zHPF vs FICON protocols
- Link speed and the bit encoding scheme used on the ISL
- Buffer credits available on the ISL ports on each of the FICON directors
- Actual distance between the two ISL ports on each of the FICON directors

Figures 21 and 22 below display the maximum MB/sec achieved on 16 Gbps ISLs between two FICON directors separated by 100 km over DWDM, with both 720 and 960 Buffer-to-Buffer (B2B) credits available on each of the ISL ports. Previously, the general recommendation was to use 0.5 buffers for each 1 km of distance and each 1 Gbps of link speed. At 100 km, this implies 50, 100, 200, 400 and 500 B2B credits for 1, 2, 4, 8 and 10 Gbps link speeds respectively. However, in contrast to the 1, 2, 4 and 8 Gbps FICON link speeds, which use 8/10 bit encoding schemes, the 16 Gbps ISL link uses a 64/66 bit encoding scheme. This means that, whereas the 1, 2, 4 and 8 Gbps FICON channel link speeds are capable of a maximum of approximately 100, 200, 400 and 800 MB/sec, a 16 Gbps ISL link is capable of a maximum of approximately 1600 MB/sec in each direction. These measurements used 960 B2B credits to ensure the maximum was achieved.
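The earlier rule of thumb (0.5 buffers per km per Gbps) can be written as a short calculation. The sketch below is illustrative only; the function name and the rounding-up behavior are assumptions made here. As the paragraph above notes, the 16 Gbps ISL measurements used 720 and 960 credits rather than the value this rule would give, so the rule is shown for the 8/10 bit encoded link speeds it was described for.

```python
import math


def b2b_credits(distance_km, link_gbps, credits_per_km_per_gbps=0.5):
    """Rule-of-thumb buffer-to-buffer credits for a link (rounded up)."""
    return math.ceil(distance_km * link_gbps * credits_per_km_per_gbps)


# Reproduces the 100 km figures quoted above for 1 to 10 Gbps links.
for gbps in (1, 2, 4, 8, 10):
    print(gbps, "Gbps:", b2b_credits(100, gbps), "credits")
```

Applied to a 16 Gbps link at 10 km, the same rule gives 80 credits, which is within the 90 buffer credits that each FEx16S+ channel port provides.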

For 128k byte READ or update WRITE I/O operations with 960 B2B credits, a maximum of 3,120 MB/sec was achieved using zHPF protocols compared to 2,990 MB/sec with FICON protocols. For a 50/50 mix of 128k byte READ and WRITE I/O operations with 960 B2B credits, a maximum of 6,000 MB/sec was achieved using zHPF protocols compared to 5,200 MB/sec with FICON protocols.

For 128k byte READ or update WRITE I/O operations with 720 B2B credits, a maximum of 2,800 MB/sec was achieved using zHPF protocols compared to 2,700 MB/sec with FICON protocols. For a 50/50 mix of 128k byte READ and WRITE I/O operations with 720 B2B credits, a maximum of 5,400 MB/sec was achieved using zHPF protocols compared to 4,900 MB/sec with FICON protocols.

For the same number of buffer credits, this represents a 5 to 16% improvement in throughput for large data transfers using zHPF protocols compared to FICON protocols. This improvement corresponds to the increase in average bytes per frame using zHPF compared to FICON as shown in Table 1 below. This table also shows the much higher improvements in link efficiency and throughput that are possible with small data transfers of 4k bytes per I/O using zHPF compared to FICON. The increase from 720 to 960 B2B credits yielded an improvement in throughput on the 16 Gbps ISL for large data transfers as shown in Figure 21 and Figure 22.

[Chart: z14 zHPF 32x4K. Y axis: MB/sec (0 to 7,000). Categories: 100km B2B=720 and 100km B2B=960. Series: Read, Write, Read/Write (mix). Annotations: 12%, 10%, 12%.]

Figure 21 – Distance zHPF Large Data Transfer

Table 1 below summarizes the improvement in link efficiency for zHPF compared to FICON for small, medium, and large data transfer sizes. For example, when the frames from 4k byte READ or WRITE I/O operations are flowing across an ISL link, the average bytes per frame using zHPF protocols is approximately 1080 bytes compared to 880 bytes for FICON protocols. This represents a 23% improvement, assuming the same number of IO/sec. For READ I/O operations, the average frame size shown is for the data frames flowing from the ISL port adjacent to the storage subsystem to the ISL port closest to the z14. For update WRITE I/O operations, the average frame size shown is for the data frames flowing to the ISL port adjacent to the storage subsystem from the ISL port closest to the z14.

    Data Transfer              zHPF Average   FICON Average   zHPF Improvement
                               Frame Size     Frame Size      in Link Efficiency
    4K Read or Write           1,084          884             23%
    4K Read/Write (mix)        895            619             45%
    32x4K Read or Write        1,991          1,917           4%
    32x4K Read/Write (mix)     1,948          1,739           12%
    128x4K Read or Write       2,029          1,965           3%
    128x4K Read/Write (mix)    2,004          1,830           10%

Note:

** Average of bytes per frame coming into and going out of ISL port closest to z14.

Table 1 - Average Frame Sizes
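The improvement column of Table 1 follows directly from the two frame size columns. The sketch below recomputes it as a consistency check; the dictionary and function names are hypothetical, and the frame sizes are copied from the table.

```python
# (zHPF, FICON) average bytes per frame, copied from Table 1.
FRAME_SIZES = {
    "4K Read or Write": (1084, 884),
    "4K Read/Write (mix)": (895, 619),
    "32x4K Read or Write": (1991, 1917),
    "32x4K Read/Write (mix)": (1948, 1739),
    "128x4K Read or Write": (2029, 1965),
    "128x4K Read/Write (mix)": (2004, 1830),
}


def improvement_pct(zhpf_bytes, ficon_bytes):
    """Relative gain in bytes per frame, rounded to a whole percent."""
    return round((zhpf_bytes / ficon_bytes - 1) * 100)


for workload, (zhpf, ficon) in FRAME_SIZES.items():
    print(f"{workload}: {improvement_pct(zhpf, ficon)}%")
```

The rounded results match the percentages listed in the table, with the largest gain for the 4K mixed workload and the smallest for the large single-direction transfers.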

[Chart: z14 FICON 32x4K. Y axis: MB/sec (0 to 6,000). Categories: 100km B2B=720 and 100km B2B=960. Series: Read, Write, Read/Write (mix). Annotations: 12%, 17%, 7%.]

Figure 22 – Distance FICON Large Data Transfer

From a response time perspective, most workloads will experience a 1 millisecond (ms) adder at 100 km of distance compared to local response times due to the speed of light through a fiber. Figure 23 below summarizes this 1ms adder to response times at reasonable levels of activity for READ I/O operations with small data transfers (4k bytes per I/O), large data transfers (32x4k or 128k bytes per I/O) and very large data transfers (128x4k or approximately 512k bytes per I/O) at 100 km compared to Local distance.
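The size of this adder can be approximated from the propagation delay of light in fiber. The sketch below assumes roughly 5 microseconds per km in fiber, a commonly used approximation that is not taken from this paper, and assumes one round trip per I/O; the constant and function names are hypothetical.

```python
# Assumption: light travels through fiber at roughly 5 microseconds per km
# (about two thirds of its speed in vacuum). This value is not from the paper.
FIBER_DELAY_US_PER_KM = 5.0


def round_trip_delay_ms(distance_km, round_trips=1):
    """Propagation delay in milliseconds for the given number of round trips."""
    one_way_us = distance_km * FIBER_DELAY_US_PER_KM
    return 2 * one_way_us * round_trips / 1000.0


print(round_trip_delay_ms(100))
```

Under these assumptions, one round trip over 100 km adds about 1 ms, in line with the adder described above.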

[Chart: z14 FEx16S+ Local Distance vs 100km Distance. Y axis: Response Time (msec, 0.00 to 1.60). Categories: zHPF 4K, zHPF 32x4K, FICON 4K, FICON 32x4K. Series: Local, 100km.]

Figure 23 – Local vs Remote Distance

zHyperLink Express

zHyperLink Express is a new feature introduced on the z14 machine that will enable Synchronous I/O. zHyperLink provides a low latency direct connection between a z14 system and a DS8886. This is a point-to-point connection using PCIe Gen3 x8 physical and link layers. The optical MPO cables and transceivers are the same as those defined for sysplex coupling links, with a maximum cable length of 150m being supported. A new transport protocol is defined for reading ECKD data records.

z14 hardware support for this feature consists of PBU (PCI-Express Bridge Unit) and ZHB (zSystem Host Bridge) changes on the z14 processor and a new adapter, named zHyperLink Express. The PBU changes consist of CRC generation and checking for individual data records and also some performance enhancements for dynamically monitoring each data transfer and determining when an operation completes. The zHyperLink Express adapter provides 2 zHyperLink Express ports and is managed using native z/PCI commands, with some minor enhancements for Synchronous I/O. A maximum of 8 Montauk adapters are supported in each z14 system.

The following key assumptions apply for z14 GA1:

- Only ECKD supported
- FICON/HPF paths required in addition to zHyperLink Express path for backup/recovery
- Only native LPAR supported
- All data transfers must be 16-byte aligned and with length being a multiple of 16 bytes
- ECKD Reads supported
  o Maximum Read Record length 4K
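The alignment and length rules listed above can be expressed as a simple predicate. This is a sketch under stated assumptions, not the operating system's actual eligibility logic: the function name is hypothetical, "4K" is assumed to mean 4096 bytes, and the other listed conditions (ECKD, native LPAR, available FICON/zHPF paths) are not modeled.

```python
ALIGNMENT = 16          # bytes, from the z14 GA1 assumptions above
MAX_READ_LENGTH = 4096  # bytes, assuming "4K" means 4096


def sync_read_eligible(address, length):
    """Hypothetical check of the alignment and length rules listed above."""
    return (
        address % ALIGNMENT == 0
        and length > 0
        and length % ALIGNMENT == 0
        and length <= MAX_READ_LENGTH
    )
```

For example, a 4096-byte read into a buffer at address 0x1000 passes the check, while the same read into a buffer at 0x1008 does not because the address is not 16-byte aligned.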

This low latency interface provides the opportunity for the operating system to read the data records synchronously, thus avoiding the scheduling and interrupt overhead associated with asynchronous operations. Consequently, a new Synchronous I/O command has been defined to allow the operating system to synchronously read one data record. Access to the zHyperLink Express adapter links is provided using z/PCI constructs such as a PCI Function Identifier. Up to 64 virtual functions are supported for each zHyperLink Express interface (2 per adapter).

The zHyperLink Express Synchronous I/O paradigm is illustrated in Figure 24 below.

[Diagram: storage reached through the SAN using FICON/zHPF and directly using zHyperLink. Callout: No I/O Interrupts, No Dispatches.]

Figure 24 – zHyperLink Diagram

As illustrated in Figure 25, using an IBM z14 zHyperLink Express attached to a DS8886, Multi-Stream 4K Read link latency is reduced by up to 5x as compared to z14 FEx16S+ attached to a DS8886 with a single 16Gb Host Adapter.

[Chart: z14 4K Read. Y axis: Response Time (msec, 0.000 to 0.120).]

                          Single Stream   Multi (Four) Streams   Multi (Eight) Streams
    FEx16S+               0.1             0.098                  0.098
    zHyperLink Express    0.019           0.019                  0.019

Figure 25 – FEx16S+ vs zHyperLink

RMF Channel Activity report

There are six fields on the RMF Channel Activity report that can be used to distinguish between FICON and zHPF traffic.

The RATE field refers to the number of FICON or zHPF I/Os per second initiated at the total physical channel level (not by LPAR)

The ACTIVE field refers to what we have previously called the "open exchanges", i.e. the number of I/Os that are simultaneously active within a channel.

The DEFER field refers to the number of deferred FICON or zHPF I/O operations per second. This is the number of operations that could not be immediately initiated by the channel due to a temporary lack of resources.

Figure 26 – RMF Diagram

The example in Figure 26 above shows a mix of FICON and zHPF I/O operations on each channel, with the link speed of each channel shown in the “SPEED” field.

- CHPID 90 (FEx8S), the channel link is operating at a speed of 8Gbps.
- CHPID 91 (FEx8S), the channel link is operating at a speed of 4Gbps.
- CHPID 60 (FEx16S), the channel link is operating at 4Gbps.
- CHPID 6C (FEx16S), the channel link is operating at 8Gbps.
- CHPID 92 (FEx16S+), the channel link is operating at 4Gbps.
- CHPID 93 (FEx16S+), the channel link is operating at 8Gbps.
- CHPID 94 (FEx16S+), the channel link is operating at 16Gbps.

The FEx16S+ channels support the following generation field (“G” column) of the RMF Channel Activity:

- “1F” the channel link has autonegotiated to a speed of 4 Gbps
- “20” the channel link has autonegotiated to a speed of 8 Gbps
- “21” the channel link is operating at a speed of 16 Gbps
- “22” the channel link is operating at a speed of 16Gbps with FEC (Forward Error Correction)

The FEx16S channels support the following generation field (“G” column) of the RMF Channel Activity:

- “15” the channel link has autonegotiated to a speed of 4 Gbps
- “16” the channel link has autonegotiated to a speed of 8 Gbps
- “17” the channel link is operating at a speed of 16 Gbps
- “18” the channel link is operating at a speed of 16Gbps with FEC (Forward Error Correction)

The FEx8S channels support the following generation field (“G” column) of the RMF Channel Activity:

- “12” the channel link has autonegotiated to a speed of 4 Gbps
- “13” the channel link is operating at a speed of 8 Gbps

For additional descriptions please refer to RMF User’s Guide: https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.erbb500/device.htm
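The generation codes listed above can be collected into a small lookup table when post-processing RMF output. The following is a minimal sketch in Python; the dictionary layout and the function name `describe_generation` are illustrative assumptions and are not part of RMF. Only the codes and speeds come from the lists above.

```python
# Generation ("G" column) codes as listed in this paper, keyed by channel type.
# Each value is (link speed in Gbps, whether FEC is active).
# The structure and names here are hypothetical, chosen for illustration.
G_CODES = {
    "FEx16S+": {"1F": (4, False), "20": (8, False), "21": (16, False), "22": (16, True)},
    "FEx16S": {"15": (4, False), "16": (8, False), "17": (16, False), "18": (16, True)},
    "FEx8S": {"12": (4, False), "13": (8, False)},
}


def describe_generation(channel_type, g_code):
    """Return a readable description of a G-column value, or None if unknown."""
    entry = G_CODES.get(channel_type, {}).get(g_code.upper())
    if entry is None:
        return None
    speed_gbps, fec = entry
    suffix = " with FEC" if fec else ""
    return f"{channel_type} link at {speed_gbps} Gbps{suffix}"


print(describe_generation("FEx16S+", "22"))
print(describe_generation("FEx8S", "13"))
```

A code that is not in the table returns `None`, so values introduced by other channel generations are not silently misreported.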

RMF Synchronous I/O Device Activity

The RMF report has a new Synchronous I/O Device Activity section that can be used to distinguish between synchronous and asynchronous device activity.

DIRECT ACCESS DEVICE ACTIVITY
o Device Activity Rate: synchronous I/O activity is signified by “S”.

SYNCHRONOUS I/O DEVICE ACTIVITY
o Synchronous I/O Device Activity

For additional descriptions please refer to RMF User’s Guide: https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.erbb500/device.htm

Figure 27 – RMF Synchronous I/O

FEx16S+ Card and I/O Subsystem (IOSS) performance with zHPF

z14 I/O Infrastructure

Figure 28 – z14 IOSS Diagram

The z14 supports the I/O drawer and form factor I/O cards that use Peripheral Component Interconnect Express Generation 3 (PCIe Gen3) links, providing increased capacity, granularity, and infrastructure bandwidth, as well as increased reliability, availability, and serviceability. This section summarizes the results of performance measurements made at each level of the new I/O Infrastructure shown in Figure 28.

At the card level, each new FEx16S+ channel card has two channels.

Measurements were done with both FEx16S+ channels on the same card active (Figure 29) to determine the maximum IO/sec and MB/sec capability of the FEx16S+ card with zHPF. When each I/O transfers 4k bytes of data, the maximum for two channels on a card is over 600,000 IO/sec, or about 2 times the maximum IO/sec of a single FEx16S+ channel exploiting the zHPF protocol.

When each I/O transfers 128k bytes of data, the maximum for two channels on a single card is approximately 3,200 MB/sec for READs or update WRITEs and over 6,400 MB/sec for a mix of READ and WRITE I/O operations. These card measurement results are about 2 times the maximum MB/sec of a single FEx16S+ channel using the zHPF protocol.
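To relate the small-block and large-block card maxima, an I/O rate can be converted to a data rate. The sketch below assumes that 4k means 4,096 bytes, 128k means 131,072 bytes, and 1 MB means 1,000,000 bytes; the paper does not state these conventions, and the function names are illustrative.

```python
def data_rate_mb_per_sec(io_per_sec, bytes_per_io):
    """Convert an I/O rate to MB/sec, assuming 1 MB = 1,000,000 bytes."""
    return io_per_sec * bytes_per_io / 1_000_000


def io_rate_per_sec(mb_per_sec, bytes_per_io):
    """Convert a data rate in MB/sec to IO/sec under the same assumption."""
    return mb_per_sec * 1_000_000 / bytes_per_io


# Card-level small-block maximum: over 600,000 IO/sec at 4k bytes per I/O.
small_block_mb = data_rate_mb_per_sec(600_000, 4 * 1024)

# Card-level large-block READ maximum: about 3,200 MB/sec at 128k bytes per I/O.
large_block_io = io_rate_per_sec(3_200, 128 * 1024)

print(f"4k workload:   {small_block_mb:.1f} MB/sec")
print(f"128k workload: {large_block_io:.0f} IO/sec")
```

Under these assumptions the 4k workload moves roughly 2,460 MB/sec, below the 3,200 MB/sec measured for 128k READs, while the 128k workload needs only about 24,000 IO/sec. This is consistent with small-block workloads being limited by I/O rate and large-block workloads by bandwidth.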

[Charts: FEx16S+ Maximum IO/sec and FEx16S+ Maximum MB/sec. Each compares Single Channel with Single Card (2 Channels) for Read, Write, and Read/Write (mix) workloads.]

Figure 29 – FEx16S+ Channel vs Card Bandwidth

Figure 30 summarizes the expected I/O bandwidth available at each level of the new z14 I/O Infrastructure: a single FEx16S+ channel, a single FEx16S+ card with two FEx16S+ channels, a single PCIe Interconnect domain with up to eight FEx16S+ channel cards, and a PCIe fanout card with two PCIe Interconnects that allow connections to as many as 16 FEx16S+ channel cards.

[Chart: z14 IOSS Maximum MB/sec for Single Channel, Single Card (2 Channels), PCIe Interconnect, and PCIe Fanout, with Read, Write, and Read/Write (mix) series.]

Figure 30 – z14 IOSS Bandwidth

The results achieved at the channel and card levels of the I/O Infrastructure were described earlier in this document. For the PCIe Interconnect level with up to eight FEx16S+ channel cards, approximately 12 GB/sec was measured for READs or update WRITEs and over 19 GB/sec was measured for a mix of READ and WRITE I/O operations. For the PCIe fanout card with two PCIe Interconnects or up to sixteen FEx16S+ channel cards, approximately 25 GB/sec was measured for READs or update WRITEs and over 38 GB/sec for a mix of READ and WRITE I/O operations. The throughput measured with two PCIe Interconnects is 11% higher than what was measured at the corresponding PCIe Gen3 fanout card level on the z13 server.
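One way to read these figures is to compare the sum of the individual card maxima with the throughput measured at each higher level. The sketch below uses the approximate values quoted in this paper and assumes 1 GB = 1,000 MB; the dictionary layout and the name `oversubscription` are illustrative, and the resulting ratios are a derived planning aid rather than a measured quantity.

```python
# Approximate card-level maxima quoted in this paper, in MB/sec.
CARD_MAX = {"read_or_write": 3_200, "mix": 6_400}

# Approximate throughput measured at higher levels, in MB/sec
# (GB/sec figures from the text, taking 1 GB as 1,000 MB).
MEASURED = {
    "PCIe Interconnect": {"cards": 8, "read_or_write": 12_000, "mix": 19_000},
    "PCIe fanout": {"cards": 16, "read_or_write": 25_000, "mix": 38_000},
}


def oversubscription(level, workload):
    """Ratio of the summed card maxima to the throughput measured at a level."""
    info = MEASURED[level]
    return info["cards"] * CARD_MAX[workload] / info[workload]


for level in MEASURED:
    for workload in CARD_MAX:
        ratio = oversubscription(level, workload)
        print(f"{level:18s} {workload:14s} {ratio:.2f}x")
```

Ratios above 1 indicate that a fully populated domain cannot drive every card at its individual maximum at the same time, which matters when placing several high bandwidth workloads behind the same fanout.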

The z14 supports the Redundant I/O Interconnect feature, which is designed to help avoid unplanned outages by maintaining critical connections to I/O devices during path failures, or during upgrades or repairs of a multi-drawer server. The z14 uses an alternate PCIe link which operates at the same speed as the primary PCIe link. The results achieved using two alternate path PCIe links into a single PCIe fanout card were similar to what was achieved using the primary PCIe links, namely 12 GB/sec READ or update WRITE.

This information about the maximum capabilities of various levels of the z14 IOSS is provided to help customers plan appropriately for high bandwidth demand workloads.

z14 SAP capacity

The z14 system architecture incorporates several enhancements due to increased system size and evolving usage patterns of Service Element (SE) and Hardware Management Console (HMC) functions by clients. The Central Processing Complex (CPC) and SE/HMC firmware for z14 has evolved for optimized performance in terms of system responsiveness and scaling. Some of the new features that are part of this evolution include:

o a new wait() state instead of a dispatcher idle loop for the System Assist Processors (SAPs)
o a new lock handling architecture
o optimized control blocks for cache line alignment
o the next stage of system automation for performance improvements
o a new dashboard for monitoring and tracking SE performance
o a greatly enhanced bus architecture and system topology for much higher efficiency compared to the previous machine generation

The new system bus architecture provides direct connectivity from any given drawer to any other drawer, leading to a fully interconnected 4 drawer system as illustrated in Figure 31.

Figure 31 – Fully interconnected 4 drawer system

In particular, the System Assist Processors (SAPs) of z14 now also employ the Simultaneous Multi-Threading (SMT) feature that was only employed by the specialty engines in the previous machine generation. Figure 32 shows a table of various configurations of z14 available to customers.

Figure 32 – Available models of z14

z14 models M01 to M04 are 1, 2, 3 and 4 drawer configurations, respectively, that use 41 cores per drawer. The M05 model has 4 drawers with 49 cores each. Every system configuration has 1 XSAP core (which is not SMT enabled) and multiple SMT enabled SAP cores. The standard SAPs listed in Figure 32 include the 1 XSAP for each of the models. For example, the single drawer model M01 has 1 non-SMT XSAP core and 4 SMT enabled SAP cores, for a total of 5 standard SAPs. Additional SAPs can be purchased by the customer for each drawer. Each SMT enabled SAP core shows as 2 I/O Processor (IOP) threads in the RMF I/O Queueing Activity Report. The XSAP (non-SMT) shows as a single IOP thread. Thus a model with n SAP cores shows 2n – 1 threads in the RMF report. Figure 33 shows a snapshot for the 5 SAP model M01 with 9 IOP threads.

Figure 33 – RMF I/O Queueing Activity Report
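The 2n – 1 rule can be checked against the standard SAP counts quoted in this paper for each model. The following is a minimal sketch; the function name `iop_threads` is illustrative, and the counts are the standard configurations only, without additionally purchased SAPs.

```python
def iop_threads(sap_cores):
    """IOP threads shown by RMF for a model with `sap_cores` SAP cores.

    One core is the non-SMT XSAP (1 thread); every other SAP core is
    SMT enabled (2 threads), which gives 2n - 1 threads in total.
    """
    if sap_cores < 1:
        raise ValueError("every configuration has at least the XSAP core")
    return 2 * (sap_cores - 1) + 1


# Standard SAP counts per model as quoted in this paper.
STANDARD_SAPS = {"M01": 5, "M02": 10, "M03": 15, "M04": 20, "M05": 23}

for model, saps in STANDARD_SAPS.items():
    print(f"{model}: {saps} SAP cores, {iop_threads(saps)} IOP threads")
```

For the M01 this gives the 9 IOP threads seen in Figure 33.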

With the advancements discussed above, including SMT for the SAPs, the maximum I/O processing capability of a z14™ machine with 23 standard System Assist Processors (SAPs) is now over 4,300,000 IO/sec, a 34% increase compared to the previous z13™ with 24 SAPs (non-SMT).

These increases in I/O capacity and in the number of CPs per drawer give IBM customers substantially more room for I/O growth and workload consolidation.

An activity rate of over 4,300,000 IO/sec was measured on a z14™ using z/OS V2.2, seven LPARs, and a standard configuration of Central Processors (CPs), 23 System Assist Processors (SAPs), and a mix of FEx16S+, FEx16S, and FEx8S channels, each executing zHPF small block (4k bytes) I/O operations. The maximum ideal SAP capacity for 23 SAPs at 100% SAP utilization is 4.3 million IO/sec. However, for production environments, IBM recommends that SAP utilization be kept at or below 70%. Figure 34 below shows the I/O capacity measured when running 1, 2, 3 and 4 drawer models of the z13™ and z14™ with the SAPs capped at 70% utilization.

Figure 34 – z14 SAP Capacity

As shown in Figure 34 above:

A z14 Model M01 with 5 SAPs on 1 drawer has an IO capacity of 1,060,600 IO/sec obtained with 2 LPARs. This is 13% higher than that of z13 with 6 SAPs on 1 drawer.

A z14 Model M02 with 10 SAPs on 2 drawers has an IO capacity of 1,712,800 IO/sec obtained with 4 LPARs. This is 12% higher than that of z13 with 12 SAPs on 2 drawers.

A z14 Model M03 with 15 SAPs on 3 drawers has an IO capacity of 2,566,700 IO/sec obtained with 6 LPARs. This is 24% higher than that of z13 with 18 SAPs on 3 drawers.

A z14 Model M04 with 20 SAPs on 4 drawers has an IO capacity of 3,218,400 IO/sec obtained with 7 LPARs. This is 42% higher than that of z13 with 24 SAPs on 4 drawers.

A z14 Model M05 with 23 SAPs on 4 drawers has an IO capacity of 3,459,800 IO/sec obtained with 7 LPARs. This is 53% higher than that of z13 with 24 SAPs on 4 drawers.
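For rough planning, the capacities listed above can be used to find the smallest standard model whose 70% capped capacity covers a target I/O rate. This is only a sketch: the values apply to the zHPF 4k workload measured in this paper, additional SAPs can change the picture, and the names `CAPACITY_AT_70_PERCENT` and `smallest_model_for` are illustrative.

```python
# IO/sec measured with SAPs capped at 70% utilization, from the list above.
# Models are entered in ascending order of size.
CAPACITY_AT_70_PERCENT = {
    "M01": 1_060_600,
    "M02": 1_712_800,
    "M03": 2_566_700,
    "M04": 3_218_400,
    "M05": 3_459_800,
}


def smallest_model_for(target_io_per_sec):
    """Return the first model whose capped capacity covers the target, else None."""
    for model, capacity in CAPACITY_AT_70_PERCENT.items():
        if capacity >= target_io_per_sec:
            return model
    return None


print(smallest_model_for(1_000_000))
print(smallest_model_for(2_000_000))
print(smallest_model_for(4_000_000))
```

A result of `None` means the target exceeds what was measured for any standard configuration at the recommended utilization level.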

Summary

The z14 FEx16S+ channel offers many benefits over previous generations of FICON channels. Improvements can be seen in response times, maximum IO/sec, and maximum MB/sec for workloads using zHPF on FEx16S+ channels. A single FEx16S+ channel can support up to either 314,000 zHPF IO/sec for OLTP workloads or 3,200 READ+WRITE MB/sec for high bandwidth applications using the zHPF protocol.

The maximum zHPF 4k IO/sec measured on a FEx16S+ channel was approximately twelve times the maximum FICON capability.

zHPF streamlines the FICON architecture and reduces the overhead on the channel processors, control unit ports, switch ports, and links by improving the way channel programs are written and processed. In general, for a variety of workloads, FEx16S+ channels with zHPF allow either the number of channels to be reduced while maintaining similar I/O response times or, with the same number of channels, the channel utilization to be reduced or the I/O response time to be improved.

zHPF and FICON rates for FEx16S+ channels on a z14 are displayed on the RMF Channel Activity report.

The z14 supports the new form factor I/O cards which use PCIe Gen3 links with increased capacity, granularity, and infrastructure bandwidth.

At the card level, the new FEx16S+ channel card has two channels on a channel card. The z14 FEx16S+ card capacity is 1.95 to 2.0 times the maximum capability of a single FEx16S+ channel using the zHPF protocol.

The increased capability of the FEx16S+ channels is complemented by improved performance at many levels of the z14 I/O Subsystem.

The z14 PCIe fanout card with two PCIe Interconnects or up to sixteen FEx16S+ channel cards supports up to 38 GB/sec Read+Write, 11% more than what was measured at the corresponding level on the z13 server.

The System Assist Processors (SAPs) of z14 now also employ the Simultaneous Multi-Threading (SMT) feature that was only employed by the specialty engines in the previous machine generation.

Each SMT enabled SAP core will show as 2 IO Processor (IOP) threads in the RMF I/O Queueing Activity Report. The XSAP (non-SMT) will show as a single IOP thread and not as 2 threads. Thus a model with n SAP cores will show 2n – 1 threads in the RMF report.

The maximum I/O processing capability on a z14 with 23 SMT enabled SAPs is over 4,300,000 IO/sec, a 34% increase compared to a z13™ with 24 non-SMT SAPs.

Additional zHPF and FICON product information is available on the IBM Z I/O connectivity Web site at http://www.ibm.com/systems/z/connectivity/.

Acknowledgments

The data presented in this technical paper is based upon measurements using a mixture of IBM internal tools and non-IBM I/O driver programs, specifically PAI/O™ Driver for z/OS.

The following people contributed to the measurement results presented in this technical paper: Brian Murphy, Su KR Han, Andrea Harris, Shelly Black, Yamil Rivera, and Fei Fei Li.

We would also like to thank all of the reviewers of this paper for their helpful comments.

Copyright IBM Corporation 2017
IBM Corporation
New Orchard Rd.
Armonk, NY 10504
U.S.A
Produced in the United States of America
12/2017
All Rights Reserved.
...
PAI/O is a trademark of Performance Associates, Inc.
