
IBM Linux and Technology Center

© 2009 IBM Corporation

Linux on System z -Performance UpdatePart 2 of 2

Mario HeldIBM Lab Boeblingen, Germany


TrademarksThe following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.

The following are trademarks or registered trademarks of other companies.

* All other products may be trademarks or registered trademarks of their respective companies.

Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:

*, AS/400®, e business(logo)®, DBE, ESCO, eServer, FICON, IBM®, IBM (logo)®, iSeries®, MVS, OS/390®, pSeries®, RS/6000®, S/30, VM/ESA®, VSE/ESA, WebSphere®, xSeries®, z/OS®, zSeries®, z/VM®, System i, System i5, System p, System p5, System x, System z, System z9®, BladeCenter®

Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market.

Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.


Agenda

Disk I/O
– Linux system
  – File system
  – I/O options
  – I/O scheduler
  – Logical volumes
– Mainframe
  – FICON/ECKD and FCP/SCSI specialties
  – Channels
– Storage server
  – Disk configuration

Network
– General network performance
– Setup recommendations
– Network options comparison


Linux file system

Use ext3 instead of reiserfs
Tune your ext3 file system:
– Select the appropriate journaling mode (journal, ordered, writeback)
– Consider turning off atime
– Enable directory indexing
Temporary files:
– Don't place them on journaling file systems
– Consider a RAM disk instead of a disk device
If possible, use direct I/O to avoid double buffering of files in memory
If possible, use async I/O to continue program execution while data is fetched
– Important for read operations
– Applications usually do not depend on write completions
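The tuning steps above can be sketched as administration fragments; the device and mount point names (/dev/dasdb1, /mnt/data, /tmp/scratch) are placeholders for your own setup:

```shell
# Enable directory indexing on an existing ext3 file system
# (hypothetical DASD partition /dev/dasdb1 - unmount it first)
tune2fs -O dir_index /dev/dasdb1
# Mount with the "ordered" journaling mode and atime disabled
mount -o data=ordered,noatime /dev/dasdb1 /mnt/data
# Keep temporary files off journaling file systems: use a RAM disk
mount -t tmpfs -o size=256m tmpfs /tmp/scratch
```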


Linux 2.6 disk I/O options

Direct I/O (DIO)
– Transfers the data directly from the application buffers to the device driver, avoiding a copy to the page cache
– Advantages:
  – Saves page cache memory and avoids caching the same data twice
  – Allows larger buffer pools for databases
– Disadvantage:
  – Make sure that no utility works through the file system (page cache) --> danger of data corruption
Asynchronous I/O (AIO)
– The application is not blocked for the duration of the I/O operation; it resumes processing and is notified when the I/O is completed
– Advantages:
  – The issuer of a read/write operation no longer waits until the request finishes
  – Reduces the number of I/O processes (saves memory and CPU)
Recommendation: use both, if possible with your application
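As an illustration (not from the original slides), direct I/O can be exercised from the command line with dd; the target path is hypothetical and the block size must match the file system alignment:

```shell
# Buffered write: data passes through the page cache
dd if=/dev/zero of=/mnt/data/buffered.img bs=4096 count=25600
# Direct write: O_DIRECT bypasses the page cache entirely
dd if=/dev/zero of=/mnt/data/direct.img bs=4096 count=25600 oflag=direct
# Observe the difference in page cache growth with: free -m
```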


Linux 2.6 disk I/O options - results

The combination of direct I/O and async I/O (setall) shows the best results when using a Linux file system. The best throughput overall, however, was seen with raw I/O plus async I/O.

ext2 and ext3 lead to identical throughput

[Chart: Oracle 10g R1 - I/O options; normalized throughput for ext2, ext2 direct_io, ext2 async, ext2 setall, ext3 setall, raw, and raw-async]


Linux 2.6 I/O schedulers

Four different I/O schedulers are available:
– noop scheduler: only request merging
– deadline scheduler: avoids read request starvation
– anticipatory (as) scheduler: designed for use with physical disks, not intended for storage subsystems
– complete fair queuing (cfq) scheduler: all users of a particular drive can execute about the same number of I/O requests over a given time
deadline is the default in the current distributions

h05lp39:~ # dmesg | grep scheduler
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered (default)
io scheduler cfq registered
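The active scheduler can also be inspected and changed per device through sysfs; the DASD name dasda below is a placeholder:

```shell
# The scheduler shown in brackets is the active one
cat /sys/block/dasda/queue/scheduler
# Switch this device to the deadline scheduler at runtime
echo deadline > /sys/block/dasda/queue/scheduler
# Or select it for all devices at boot time with the kernel parameter
#   elevator=deadline
# in the parameters line of /etc/zipl.conf (then rerun zipl)
```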


Linux 2.6 I/O schedulers - results

The “as” scheduler is not a good choice for System z
All other schedulers show results similar to the kernel 2.4 scheduler

[Chart: Informix single server, I/O scheduler; normalized throughput for as, noop, cfq, deadline, and kernel 2.4]


Logical volumes

Linear logical volumes allow an easy extension of the file system
Striped logical volumes:
– Provide the capability to perform simultaneous I/O on different stripes
– Allow load balancing
– Are also extendable
Don't use logical volumes for /root or /usr
– If the logical volume gets corrupted, your system is gone
Logical volumes require more CPU cycles than physical disks
– The overhead increases with the number of physical disks used for the logical volume
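A striped logical volume of the kind described above could be created as follows; the volume group name, size, and stripe parameters are hypothetical:

```shell
# 4 stripes across the physical volumes of datavg, 64KB stripe size
lvcreate --size 20G --stripes 4 --stripesize 64 --name datalv datavg
mkfs.ext3 /dev/datavg/datalv
mount /dev/datavg/datalv /mnt/data
```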


I/O processing characteristics

FICON/ECKD:
– 1:1 mapping of host subchannel to DASD
– Serialization of I/Os per subchannel
– I/O request queue in Linux
– Disk blocks are 4KB
– High availability by FICON path groups
– Load balancing by FICON path groups and Parallel Access Volumes
FCP/SCSI:
– Several I/Os can be issued against a LUN immediately
– Queuing in the FICON Express card and/or in the storage server
– Additional I/O request queue in Linux
– Disk blocks are 512 bytes
– High availability by Linux multipathing, type failover
– Load balancing by Linux multipathing, type multibus
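The two FCP/SCSI multipathing modes mentioned above map to the path_grouping_policy setting in /etc/multipath.conf; this fragment is a sketch, not a complete configuration:

```
# /etc/multipath.conf (fragment)
defaults {
    # "failover": one active path, others on standby -> high availability
    path_grouping_policy    failover
    # "multibus": all paths active -> load balancing
    # path_grouping_policy  multibus
}
```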


Configuration 4Gbps disk I/O measurements

[Diagram: System z (2094) connected through a switch to a DS8K storage server; 8 FICON and 8 FCP channels on the host side, 4 Gbps FICON and FCP ports on the switch and storage server]


Disk I/O performance with 4Gbps links - FICON

Strong throughput increase (average x 1.6)
Doubles with sequential read

[Chart: Compare FICON 4 Gbps - 2 Gbps; throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write]


Disk I/O performance with 4Gbps links - FCP

Moderate throughput increase
Highest increase with sequential read at x 1.25

[Chart: Compare FCP 4 Gbps - 2 Gbps; throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write]


Disk I/O performance with 4Gbps links – FICON versus FCP

Throughput for sequential write is similar
FCP throughput for random I/O is 40% higher

[Chart: Compare FICON to FCP at 4 Gbps; throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write]


Disk I/O performance with 4Gbps links – FICON versus FCP / direct I/O

Bypassing the Linux page cache improves throughput for FCP up to 2x, for FICON up to 1.6x
Read operations are much faster on FCP

[Chart: Compare FICON to FCP at 4 Gbps with direct I/O; throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write, each with page cache and with dio]


DS8000 disk setup

Don't treat a storage server as a black box; understand its structure
Enable storage pool striping if available
The principles apply to other storage vendors' products as well
You ask for 16 disks and your system administrator gives you addresses 5100-510F
– From a performance perspective this is close to the worst case
So - what's wrong with that?


DS8000 architecture

The structure is complex – disks are connected via two internal FCP switches for higher bandwidth
The DS8000 is divided into two parts, named processor complexes or simply servers – caches are organized per server
One device adapter pair addresses 4 array sites
One array site is built from 8 disks – the disks are distributed over the front and rear storage enclosures and have the same color in the chart
One RAID array is defined using one array site
One rank is built using one RAID array
Ranks are assigned to an extent pool
Extent pools are assigned to one of the servers – this also assigns the caches
One disk range resides in one extent pool


Rules for selecting disks

The goal is to get a balanced load on all paths and physical disks
Use as many paths as possible (CHPID -> host adapter)
– ECKD: switching of the paths is done automatically
– FCP needs a fixed relation between disk and path
  – Establish a fixed mapping between path and rank in our environment
  – Taking a disk from another rank will then use another path
Switch the rank for each new disk in a logical volume
Switch the ranks used between servers and device adapters
Select disks from as many ranks as possible!
Avoid reusing the same resource (path, server, device adapter, and disk) as long as possible


Disk setup

FICON/ECKD (the rows of 8s in the original chart mark the 8 FICON channels and the storage server ports):
Server#   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
DA pair#  0  1  2  3  0  1  2  3  0  1  2  3  0  1  2  3
DA#       0  3  4  7  1  2  5  6  0  3  4  7  1  2  5  6
Rank#     0  5  8 13  1  4  9 12  2  7 10 15  3  6 11 14
Disk#     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
         16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
         32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
         48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

FCP/SCSI:
FCP channel#  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
WWPN#         0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
Server#       0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1
DA pair#      0  1  2  3  0  1  2  3  0  1  2  3  0  1  2  3
DA#           0  2  4  6  1  3  5  7  0  2  4  6  1  3  5  7
Rank#        16 20 24 28 18 22 26 30 16 20 24 28 18 22 26 31
Disk#         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
             16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
             32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
             48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63


General recommendations

FICON/ECKD:
– Storage pool striped disks (no disk placement)
– HyperPAV (SLES11)
– Large volumes (future distributions)
FCP/SCSI:
– Linux LV with striping (disk placement)
– Multipath with failover


Agenda

Disk I/O
– Linux system
  – File system
  – I/O options
  – I/O scheduler
  – Logical volumes
– Mainframe
  – FICON/ECKD and FCP/SCSI specialties
  – Channels
– Storage server
  – Disk configuration

Network
– General network performance
– Setup recommendations
– Network options comparison


General network performance (1)

Which connectivity to use for best performance:
– External connectivity:
  – LPAR: 10 GbE cards
  – z/VM: VSWITCH with 10 GbE card(s) attached
  – z/VM: for maximum throughput and minimal CPU utilization, attach the OSA directly to the Linux guest
– Internal connectivity:
  – LPAR: HiperSockets for LPAR-LPAR communication
  – z/VM: VSWITCH for guest-guest communication
For highly utilized networks:
– use z/VM VSWITCH with link aggregation
– use channel bonding


General network performance (2)

If network performance problems are observed, the buffer count can be increased up to 128
– The default of 16 buffers limits the throughput of a HiperSockets connection with 10 parallel sessions
– Check the current buffer count with the lsqeth -p command
– A buffer count of 128 leads to 8MB memory consumption
  – One buffer consists of 16x4KB pages, which yields 64KB, so 128x64KB=8MB
Set the inbound buffer count in the appropriate config file:
– SUSE SLES10: in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200 add
  QETH_OPTIONS="buffer_count=128"
– SUSE SLES11: in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
  ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{buffer_count}="128"
– Red Hat: in /etc/sysconfig/network-scripts/ifcfg-eth0 add
  OPTIONS="buffer_count=128"
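The memory cost follows directly from the 16x4KB buffer layout; the sketch below checks the arithmetic and shows a non-persistent runtime change through sysfs (device 0.0.f200 is the hypothetical example id used above):

```shell
# One inbound buffer = 16 pages x 4KB = 64KB
buffer_count=128
echo "$((buffer_count * 64)) KB"   # prints: 8192 KB (= 8MB)
# Runtime change (requires root; the device must be offline):
#   echo 128 > /sys/bus/ccwgroup/drivers/qeth/0.0.f200/buffer_count
```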


General network performance (3)

Consider switching on priority queueing if an OSA Express adapter in QDIO mode is shared amongst several LPARs. How to achieve this:
– SUSE SLES10: in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200 add
  QETH_OPTIONS="priority_queueing=no_prio_queueing:0"
– SUSE SLES11: in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
  ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{priority_queueing}="no_prio_queueing:0"
– Red Hat: in /etc/sysconfig/network-scripts/ifcfg-eth0 add
  OPTIONS="priority_queueing=no_prio_queueing:0"

Note: priority queueing on one LPAR may impact the performance on all other LPARs sharing the same OSA card.


General network performance (4)

Choose your MTU size carefully. Set it to the maximum size supported by all hops on the path to the final destination to avoid fragmentation.
– Use tracepath <destination> to check the MTU size
– If the application sends in chunks of <=1400 bytes, use MTU 1492
  – 1400 bytes of user data plus protocol overhead
– If the application is able to send bigger chunks, use MTU 8992
  – Sending packets > 1400 bytes with MTU 8992 will increase throughput and save CPU cycles
TCP uses the MTU size for the window size calculation, not the actual application send size.
For VSWITCH, MTU 8992 is recommended:
– Synchronous operation, SIGA required for every packet
– No packing like normal OSA cards
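A sketch of checking and setting the MTU from the command line; the destination address and interface name are placeholders:

```shell
# Report the path MTU to the destination (the last line shows pmtu)
tracepath -n 192.0.2.10
# Set the interface MTU to the OSA jumbo frame size (requires root)
ip link set dev eth0 mtu 8992
ip link show dev eth0   # verify the new MTU
```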


General network performance (5)

The following system-wide sysctl settings can be changed temporarily with the sysctl command or permanently in the appropriate config file, /etc/sysctl.conf
Set the device queue length from the default of 1000 to at least 2500 using sysctl:
  sysctl -w net.core.netdev_max_backlog=2500
Adapt the inbound and outbound window size to suit the workload
– Recommended values for OSA devices:
  sysctl -w net.ipv4.tcp_wmem="4096 16384 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– The system-wide window size applies to all network devices
  – Applications can use setsockopt to adjust the window size individually
  – This has no impact on other network devices
– More than ten parallel sessions benefit from the recommended default and maximum window sizes
– A bigger window size can be advantageous for up to five parallel sessions


General network performance (6)

By default, numerous daemons are started that might be unnecessary.
– Check whether SELinux and auditing are required for your business. If not, switch them off:
  dmesg | grep -i "selinux"
  chkconfig --list | grep -i "auditd"
  In /etc/zipl.conf:
  parameters="audit_enable=0 audit=0 audit_debug=0 selinux=0 ..."
  chkconfig --set auditd off
– Check which daemons are running:
  chkconfig --list
– Examples of daemons which are not required under Linux on System z:
  – alsasound
  – avahi-daemon
  – resmgr
  – postfix


General network performance (7)

A controlled environment without an external network connection (“network in a box”) usually doesn't require a firewall.
– The Linux firewall (iptables) serializes the network traffic because it can utilize just one CPU
– If there is no business need, switch off the firewall (iptables), which is enabled by default
– For example, in SLES distributions:
  chkconfig --set SuSEfirewall2_init off
  chkconfig --set SuSEfirewall2_setup off
  chkconfig --set iptables off


Networking - HiperSockets Linux to z/OS, recommendations for z/OS and Linux (1)

Frame size and MTU size are determined by the chparm parameter of the IOCDS
– MTU size = frame size – 8KB
Select the MTU size to suit the workload. If the application is mostly sending packets < 8KB, an MTU size of 8KB is sufficient.
If the application is capable of sending big packets, a larger MTU size will increase throughput and save CPU cycles.
MTU size 56KB is recommended only for streaming workloads with packets > 32KB.

chparm   frame size   MTU
00       16KB          8KB
40       24KB         16KB
80       40KB         32KB
C0       64KB         56KB
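The MTU column follows from the rule MTU = frame size – 8KB; a quick check of the table values, assuming nothing beyond the table itself:

```shell
# chparm 00/40/80/C0 -> frame sizes 16/24/40/64 KB
for frame_kb in 16 24 40 64; do
    echo "frame ${frame_kb}KB -> MTU $((frame_kb - 8))KB"
done
# prints: frame 16KB -> MTU 8KB ... frame 64KB -> MTU 56KB
```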


Networking - HiperSockets Linux to z/OS, recommendations for z/OS and Linux (2)

HiperSockets doesn't require checksumming because it is a memory-to-memory operation.
– The default is sw_checksumming
– To save CPU cycles, switch checksumming off:
  – SUSE SLES10: in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200 add
    QETH_OPTIONS="checksumming=no_checksumming"
  – SUSE SLES11: in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
    ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{checksumming}="no_checksumming"
  – Red Hat RHEL5: in /etc/sysconfig/network-scripts/ifcfg-eth0 add
    OPTIONS="checksumming=no_checksumming"


Networking - HiperSockets Linux to z/OS, recommendations for z/OS

Always set:
  TCPCONFIG DELAYACKS
If the MTU size is 8KB or 16KB, set:
  TCPCONFIG TCPRCVBUFRSIZE 32768
  TCPCONFIG TCPSENDBFRSIZE 32768
If the MTU size is 32KB, set:
  TCPCONFIG TCPRCVBUFRSIZE 65536
  TCPCONFIG TCPSENDBFRSIZE 65536
If the MTU size is 56KB, set:
  TCPCONFIG TCPRCVBUFRSIZE 131072
  TCPCONFIG TCPSENDBFRSIZE 131072

For HiperSockets MTU 56KB, the CSM fixed storage limit is important. The default is currently 120MB and should be adjusted to 250MB if MSGIVT5592I is observed.


Networking - HiperSockets Linux to z/OS, recommendations for Linux (1)

The settings of rmem / wmem under Linux on System z determine the minimum, default, and maximum window size, which has the same meaning as the buffer size under z/OS.
Linux window size settings are system-wide and apply to all network devices.
– Applications can use setsockopt to adjust the window size individually
  – This has no impact on other network devices
– HiperSockets and OSA devices have contradictory demands
  – More than 10 parallel OSA sessions suffer from a send window size > 32KB
  – The suggested default send/receive window sizes for HiperSockets MTU 8KB and 16KB are also adequate for OSA devices


Networking - HiperSockets Linux to z/OS, recommendations for Linux (2)

As a rule of thumb, the default send window size should be twice the MTU size, e.g.:
– MTU 8KB:
  sysctl -w net.ipv4.tcp_wmem="4096 16384 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– MTU 16KB:
  sysctl -w net.ipv4.tcp_wmem="4096 32768 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– MTU 32KB:
  sysctl -w net.ipv4.tcp_wmem="4096 65536 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– MTU 56KB:
  sysctl -w net.ipv4.tcp_wmem="4096 131072 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 131072 174760"
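The tcp_wmem defaults above follow the 2 x MTU rule for MTU 8KB through 32KB; a small sketch of the arithmetic:

```shell
# Default send window = 2 x MTU (MTU in KB, result in bytes)
for mtu_kb in 8 16 32; do
    echo "MTU ${mtu_kb}KB -> tcp_wmem default $((mtu_kb * 2 * 1024))"
done
# prints 16384, 32768, 65536 - the defaults listed above
# MTU 56KB: the settings above use 131072 (the maximum) rather than 2 x 56KB
```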


Networking - Linux to z/OS, SAP recommendations

The SAP Enqueue Server requires a default send window size of 4 x MTU size.
HiperSockets:
– SAP networking is a transactional type of workload with a packet size < 8KB. HiperSockets MTU 8KB is sufficient:
  sysctl -w net.ipv4.tcp_wmem="4096 32768 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
OSA:
– If the SAP Enqueue Server connection runs over an OSA adapter, the MTU size should be 8192. If MTU 8992 is used, the default send window size must be adjusted to 4 x 8992.
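For the OSA case, the 4 x MTU requirement works out as follows; the resulting sysctl line is one way to apply it (an assumption, not from the slides):

```shell
mtu=8992
echo "required default send window: $((4 * mtu))"   # prints: 35968
# e.g.: sysctl -w net.ipv4.tcp_wmem="4096 35968 131072"
```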


Network – benchmark description

AWM - an IBM benchmark which simulates network workload
– All our tests run with 10 simultaneous connections
– Transactional workloads - 2 types:
  – RR - request/response: a connection to the server is opened once for a 5-minute time frame
  – CRR - connect/request/response: a connection is opened and closed for every request/response
  – RR 200/1000 (send 200 bytes from client to server, get a 1000-byte response): simulates online transactions
  – RR 200/32k (send 200 bytes from client to server, get a 32KB response): simulates website access
  – CRR 64/8k (send 64 bytes from client to server, get an 8KB response): simulates a database query
– Streaming workloads - 2 types:
  – STRP - "stream put" (send 20MB to the server, get a 20-byte response)
  – STRG - "stream get" (send 20 bytes to the server, get a 20MB response)
  – Simulates large file transfers


Network throughput - database query (crr64x8k)

[Chart: transactions per second for the database query workload, SLES10 SP2 / z10; 64-byte request, 8KB response, 50 connections; larger is better. Devices compared: VSWITCH OSA Express2 10Gbps MTU 1492 and 8992, OSA Express2 10Gbps MTU 1492 and 8992, VSWITCH virtual MTU 1492 and 8992, HiperSockets MTU 32K]

Choosing a device:
→ Use a large MTU size
→ HiperSockets for LPAR-LPAR
→ VSWITCH inside z/VM
→ 10 GbE / OSA Express3


Processor consumption per transaction - database query (crr64x8k)

[Chart: CPU utilization per transaction for the database query workload, SLES10 SP2 / z10; 64-byte request, 8KB response, 50 connections; smaller is better. Devices compared: VSWITCH OSA Express2 10Gbps MTU 1492 and 8992, OSA Express2 10Gbps MTU 1492 and 8992, VSWITCH virtual MTU 1492 and 8992, HiperSockets MTU 32K]

Choosing a device:
→ Use a large MTU size
→ HiperSockets for LPAR-LPAR
→ VSWITCH inside z/VM
→ outgoing VSWITCH costs
→ 10 GbE / OSA Express3


Networking throughput overview (SLES10)

Workload | Website access (crr64x8k) | Online transaction (rr200x1000) | Database query (rr200x32k) | File transfer (strp, strg 20Mx20)
Advantage of large MTU size over default MTU size | 1.5x | equal | 1.4x (1GbE), 2.3x (10GbE) | 3.2x (only 10GbE)
Advantage of 10GbE over 1GbE | 1.2x | 1.2x | 2.1x (large MTU) | 3.4x (large MTU)
Advantage of virtual networks over OSA | 2.5x (1GbE), 2.0x (10GbE) | 2.8x (1GbE), 2.2x (10GbE) | 3.7x (1GbE), 1.8x (10GbE) | 4.8x (1GbE), 1.4x (10GbE)
Fastest connection | HiperSockets LPAR | HiperSockets LPAR | HiperSockets LPAR | HiperSockets LPAR


Visit us!

Linux on System z: Tuning Hints & Tips
http://www.ibm.com/developerworks/linux/linux390/perf/
Linux-VM Performance Website:
http://www.vm.ibm.com/perf/tips/linuxper.html
IBM Redbooks
http://www.redbooks.ibm.com/
IBM Techdocs
http://www.ibm.com/support/techdocs/atsmastr.nsf/Web/Techdocs


Questions