Linux on System z - Performance Update, Part 2 of 2
TRANSCRIPT
IBM Linux and Technology Center
© 2009 IBM Corporation

Mario Held, IBM Lab Boeblingen, Germany
Trademarks

The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:
*, AS/400®, e business(logo)®, DBE, ESCO, eServer, FICON, IBM®, IBM (logo)®, iSeries®, MVS, OS/390®, pSeries®, RS/6000®, S/30, VM/ESA®, VSE/ESA, WebSphere®, xSeries®, z/OS®, zSeries®, z/VM®, System i, System i5, System p, System p5, System x, System z, System z9®, BladeCenter®
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market.
Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.

The following are trademarks or registered trademarks of other companies:
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
* All other products may be trademarks or registered trademarks of their respective companies.

Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Agenda

Disk I/O
– Linux system
  – File system
  – I/O options
  – I/O scheduler
  – Logical volumes
– Mainframe
  – FICON/ECKD and FCP/SCSI specialties
  – Channels
– Storage server
  – Disk configuration

Network
– General network performance
– Setup recommendations
– Network options comparison
Linux file system
Use ext3 instead of reiserfs
Tune your ext3 file system
– Select the appropriate journaling mode (journal, ordered, writeback)
– Consider turning off atime
– Enable directory indexing
Temporary files
– Don't place them on journaling file systems
– Consider a ram disk instead of a disk device
If possible, use direct I/O to avoid double buffering of files in memory
If possible, use async I/O to continue program execution while data is fetched
– Important for read operations
– Applications usually do not depend on write completions
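As an illustration of the tuning points above, assuming a hypothetical DASD partition /dev/dasdb1 mounted on /data (device names, mount points and sizes are examples only):

```shell
# /etc/fstab entry: ordered journaling mode, no atime updates
/dev/dasdb1  /data  ext3  defaults,noatime,data=ordered  0 2

# Enable directory indexing on an existing ext3 file system
# (e2fsck -fD rebuilds the indexes for existing directories):
tune2fs -O dir_index /dev/dasdb1
e2fsck -fD /dev/dasdb1

# Keep temporary files off journaling file systems, e.g. a tmpfs for /tmp:
# /etc/fstab entry:
tmpfs  /tmp  tmpfs  defaults,size=512m  0 0
```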
Linux 2.6 disk I/O options
Direct I/O (DIO)
– Transfers the data directly from the application buffers to the device driver, avoiding a copy into the page cache
– Advantages:
  – Saves page cache memory and avoids caching the same data twice
  – Allows larger buffer pools for databases
– Disadvantage:
  – Make sure that no utility is working through the file system (page cache) --> danger of data corruption

Asynchronous I/O (AIO)
– The application is not blocked for the duration of the I/O operation
– It resumes processing and gets notified when the I/O is completed
– Advantages:
  – The issuer of a read/write operation no longer waits until the request finishes
  – Reduces the number of I/O processes (saves memory and CPU)

Recommendation: use both if possible with your application
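The benefit of not blocking on I/O can be illustrated in user space; a minimal sketch (note this uses a worker thread for simplicity, not the kernel AIO interface the slide refers to):

```python
import concurrent.futures
import os
import tempfile

def read_file(path):
    """Blocking read, executed in the background."""
    with open(path, "rb") as f:
        return f.read()

# Prepare a sample file (contents are illustrative).
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"x" * 4096)
tmp.close()

with concurrent.futures.ThreadPoolExecutor() as pool:
    future = pool.submit(read_file, tmp.name)  # I/O starts in the background
    partial = sum(range(1000))                 # ...application keeps computing
    data = future.result()                     # block only when data is needed

os.unlink(tmp.name)
print(len(data))  # 4096
```

The application overlaps its own computation with the read and only synchronizes when the result is actually required, which is the same idea kernel AIO provides without the extra thread.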
Linux 2.6 disk I/O options - results
The combination of direct I/O and async I/O (setall) shows the best results when using the Linux file system. The best overall throughput, however, was seen with raw I/O and async I/O.
ext2 and ext3 lead to identical throughput
[Chart: "Oracle 10g R1 - I/O options" - normalized throughput for ext2, ext2 direct_io, ext2 async, ext2 setall, ext3 setall, raw, raw-async]
Linux 2.6 I/O schedulers
Four different I/O schedulers are available:
– noop scheduler: only request merging
– deadline scheduler: avoids read request starvation
– anticipatory (as) scheduler: designed for use with physical disks, not intended for storage subsystems
– complete fair queuing (cfq) scheduler: all users of a particular drive are able to execute about the same number of I/O requests over a given time

deadline is the default in the current distributions

h05lp39:~ # dmesg | grep scheduler
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered (default)
io scheduler cfq registered
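The active scheduler can also be inspected and changed per device at runtime through sysfs (the device name sda is illustrative):

```shell
# Show the available schedulers; the active one is shown in brackets:
cat /sys/block/sda/queue/scheduler

# Switch this device to a different scheduler at runtime:
echo cfq > /sys/block/sda/queue/scheduler

# Or select the default scheduler for all devices at boot time via the
# kernel parameter line, e.g. in zipl.conf:
#   elevator=deadline
```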
Linux 2.6 I/O schedulers - results
The "as" scheduler is not a good choice for System z. All other schedulers show results similar to the kernel 2.4 scheduler.
[Chart: "Informix single server, I/O scheduler" - normalized throughput for as, noop, cfq, deadline and the kernel 2.4 scheduler]
Logical volumes
Linear logical volumes allow an easy extension of the file system
Striped logical volumes
– Provide the capability to perform simultaneous I/O on different stripes
– Allow load balancing
– Are also extendable
Don't use logical volumes for /root or /usr
– If the logical volume gets corrupted, your system is gone
Logical volumes require more CPU cycles than physical disks
– The cost increases with the number of physical disks used for the logical volume
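A striped logical volume as described above could be created like this (device names, sizes and the volume group name are hypothetical):

```shell
# Prepare four physical volumes and group them into one volume group:
pvcreate /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1
vgcreate datavg /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1

# -i 4: stripe across all four physical volumes; -I 64: 64 KiB stripe size
lvcreate -i 4 -I 64 -L 20G -n datalv datavg

mkfs.ext3 /dev/datavg/datalv
```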
I/O processing characteristics
FICON/ECKD:
– 1:1 mapping of host subchannel to DASD
– Serialization of I/Os per subchannel
– I/O request queue in Linux
– Disk blocks are 4KB
– High availability by FICON path groups
– Load balancing by FICON path groups and Parallel Access Volumes

FCP/SCSI:
– Several I/Os can be issued against a LUN immediately
– Queuing in the FICON Express card and/or in the storage server
– Additional I/O request queue in Linux
– Disk blocks are 512 bytes
– High availability by Linux multipathing, type failover
– Load balancing by Linux multipathing, type multibus
Configuration of the 4Gbps disk I/O measurements

[Diagram: a System z (2094) with 8 FICON and 8 FCP channels, connected through a switch with 4 Gbps FICON and FCP ports to a DS8K storage server over 4 Gbps FICON/FCP links]
Disk I/O performance with 4Gbps links - FICON
Strong throughput increase (x 1.6 on average); it doubles with sequential read.
[Chart: "Compare FICON 4 Gbps - 2 Gbps" - throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write]
Disk I/O performance with 4Gbps links - FCP
Moderate throughput increase; the highest increase, x 1.25, is seen with sequential read.
[Chart: "Compare FCP 4 Gbps - 2 Gbps" - throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write]
Disk I/O performance with 4Gbps links – FICON versus FCP
Throughput for sequential write is similar; FCP throughput for random I/O is 40% higher.
[Chart: "Compare FICON to FCP 4 Gbps" - throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write]
Disk I/O performance with 4Gbps links – FICON versus FCP / direct I/O

Bypassing the Linux page cache improves throughput for FCP up to 2x, for FICON up to 1.6x. Read operations are much faster on FCP.
[Chart: "Compare FICON to FCP 4 Gbps direct I/O" - throughput in MB/sec for SQ Write, SQ RWrite, SQ Read, R Read, R Write, comparing FICON 4 Gbps and FCP 4 Gbps, each with page cache and with direct I/O (dio)]
(DS8000) disk setup
Don't treat a storage server as a black box; understand its structure.
Enable storage pool striping if available.
The principles apply to other storage vendors' products as well.
You ask for 16 disks and your system administrator gives you addresses 5100-510F.
– From a performance perspective this is close to the worst case
So - what's wrong with that?
DS8000 architecture
The structure is complex:
– Disks are connected via two internal FCP switches for higher bandwidth
– The DS8000 is divided into two parts, named processor complexes or simply servers; the caches are organized per server
– One device adapter pair addresses 4 array sites
– One array site is built from 8 disks; the disks are distributed over the front and rear storage enclosures (shown in the same color in the chart)
– One RAID array is defined using one array site
– One rank is built using one RAID array
– Ranks are assigned to an extent pool
– Extent pools are assigned to one of the servers; this also assigns the caches
– One disk range resides in one extent pool
Rules for selecting disks
The goal is to get a balanced load on all paths and physical disks.
Use as many paths as possible (CHPID -> host adapter)
– ECKD: switching of the paths is done automatically
– FCP needs a fixed relation between disk and path
  – Establish a fixed mapping between path and rank in our environment
  – Taking a disk from another rank will then use another path
Switch the rank for each new disk in a logical volume.
Switch the ranks used between servers and device adapters.
Select disks from as many ranks as possible!
Avoid reusing the same resource (path, server, device adapter, and disk) as long as possible.
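The selection rule above - switch the rank for every new disk so that consecutive disks of a logical volume land on different ranks, servers and device adapters - can be sketched as a simple round-robin (the rank-to-disk mapping below is hypothetical):

```python
# Hypothetical inventory: rank number -> list of free disk addresses.
ranks = {
    0: ["5100", "5104"], 1: ["5101", "5105"],
    2: ["5102", "5106"], 3: ["5103", "5107"],
}

def pick_disks(ranks, count):
    """Pick `count` disks, switching the rank for every new disk.

    Assumes enough free disks exist in total; cycles through the ranks
    in a fixed order so adjacent picks come from different ranks.
    """
    order = sorted(ranks)
    picked = []
    i = 0
    while len(picked) < count:
        rank = order[i % len(order)]
        if ranks[rank]:                       # skip exhausted ranks
            picked.append((rank, ranks[rank].pop(0)))
        i += 1
    return picked

print(pick_disks(ranks, 4))
# [(0, '5100'), (1, '5101'), (2, '5102'), (3, '5103')]
```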
Disk setup

FICON/ECKD (via FICON channels and storage server ports):
Server#   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
DA pair#  0  1  2  3  0  1  2  3  0  1  2  3  0  1  2  3
DA#       0  3  4  7  1  2  5  6  0  3  4  7  1  2  5  6
Rank#     0  5  8 13  1  4  9 12  2  7 10 15  3  6 11 14
Disk#     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
         16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
         32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
         48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

FCP/SCSI:
FCP channel#  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
WWPN#         0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
Server#       0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1
DA pair#      0  1  2  3  0  1  2  3  0  1  2  3  0  1  2  3
DA#           0  2  4  6  1  3  5  7  0  2  4  6  1  3  5  7
Rank#        16 20 24 28 18 22 26 30 16 20 24 28 18 22 26 31
Disk#         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
             16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
             32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
             48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
General recommendations
FICON/ECKD
– Storage pool striped disks (no disk placement)
– HyperPAV (SLES11)
– Large volumes (future distributions)

FCP/SCSI
– Linux LV with striping (disk placement)
– Multipath with failover
Agenda

Network
– General network performance
– Setup recommendations
– Network options comparison
General network performance (1)

Which connectivity to use for best performance:
– External connectivity:
  – LPAR: 10 GbE cards
  – z/VM: VSWITCH with 10GbE card(s) attached
  – z/VM: for maximum throughput and minimal CPU utilization, attach the OSA directly to the Linux guest
– Internal connectivity:
  – LPAR: HiperSockets for LPAR-LPAR communication
  – z/VM: VSWITCH for guest-guest communication

For a highly utilized network:
– use z/VM VSWITCH with link aggregation
– use channel bonding
General network performance (2)

If network performance problems are observed, the buffer count can be increased up to 128
– The default of 16 buffers limits the throughput of a HiperSockets connection with 10 parallel sessions
– Check the current buffer count with the lsqeth -p command
– A buffer count of 128 leads to 8MB memory consumption
  – One buffer consists of 16x4KB pages, which yields 64KB, so 128x64KB=8MB

Set the inbound buffer count in the appropriate config file:
– SUSE SLES10: in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200 add
  QETH_OPTIONS="buffer_count=128"
– SUSE SLES11: in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
  ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{buffer_count}="128"
– Red Hat: in /etc/sysconfig/network-scripts/ifcfg-eth0 add
  OPTIONS="buffer_count=128"
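On a running system the same attribute can also be inspected and changed through sysfs; a sketch, assuming the qeth device address 0.0.f200 used in the examples above (the exact sysfs path is an assumption and may differ between distributions):

```shell
# Show current qeth settings, including the buffer count:
lsqeth -p

# Change the buffer count at runtime; the device must be offline:
echo 0   > /sys/bus/ccwgroup/drivers/qeth/0.0.f200/online
echo 128 > /sys/bus/ccwgroup/drivers/qeth/0.0.f200/buffer_count
echo 1   > /sys/bus/ccwgroup/drivers/qeth/0.0.f200/online
```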
General network performance (3)

Consider switching on priority queueing if an OSA Express adapter in QDIO mode is shared amongst several LPARs. How to achieve this:
– SUSE SLES10: in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200 add
  QETH_OPTIONS="priority_queueing=no_prio_queueing:0"
– SUSE SLES11: in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
  ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{priority_queueing}="no_prio_queueing:0"
– Red Hat: in /etc/sysconfig/network-scripts/ifcfg-eth0 add
  OPTIONS="priority_queueing=no_prio_queueing:0"

Note: Priority queueing on one LPAR may impact the performance on all other LPARs sharing the same OSA card.
General network performance (4)

Choose your MTU size carefully. Set it to the maximum size supported by all hops on the path to the final destination to avoid fragmentation.
– Use tracepath <destination> to check the MTU size
– If the application sends in chunks of <=1400 bytes, use MTU 1492
  – 1400 bytes of user data plus protocol overhead
– If the application is able to send bigger chunks, use MTU 8992
  – Sending packets > 1400 bytes with MTU 8992 will increase throughput and save CPU cycles

TCP uses the MTU size for the window size calculation, not the actual application send size.

For VSWITCH, MTU 8992 is recommended.
– Synchronous operation; a SIGA is required for every packet
– No packing as with normal OSA cards
General network performance (5)

The following system-wide sysctl settings can be changed temporarily with the sysctl command, or permanently in the appropriate config file: /etc/sysctl.conf

Set the device queue length from the default of 1000 to at least 2500 using sysctl:
sysctl -w net.core.netdev_max_backlog=2500

Adapt the inbound and outbound window size to suit the workload
– Recommended values for OSA devices:
  sysctl -w net.ipv4.tcp_wmem="4096 16384 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– The system-wide window size applies to all network devices
  – Applications can use setsockopt to adjust the window size individually
    – This has no impact on other network devices
– More than ten parallel sessions benefit from the recommended default and maximum window sizes
– A bigger window size can be advantageous for up to five parallel sessions
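A per-application adjustment via setsockopt, as mentioned above, looks like this (the 128 KiB value is illustrative; note that Linux reports back roughly twice the requested buffer size, capped by net.core.rmem_max):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request a 128 KiB receive window for this socket only; the system-wide
# tcp_rmem defaults stay untouched for all other connections.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 131072)
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(rcvbuf)
s.close()
```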
General network performance (6)

By default, numerous daemons are started that might be unnecessary.
– Check whether selinux and auditing are required for your business. If not, switch them off:
  dmesg | grep -i "selinux"
  chkconfig --list | grep -i "auditd"
  In /etc/zipl.conf:
  parameters="audit_enable=0 audit=0 audit_debug=0 selinux=0 ..."
  chkconfig --set auditd off
– Check which daemons are running:
  chkconfig --list
– Examples of daemons which are not required under Linux on System z:
  – alsasound
  – avahi-daemon
  – resmgr
  – postfix
General network performance (7)

A controlled environment without an external network connection ("network in a box") usually doesn't require a firewall.
– The Linux firewall (iptables) serializes the network traffic because it can utilize just one CPU.
– If there is no business need, switch off the firewall (iptables), which is enabled by default.
– For example, in SLES distributions:
  chkconfig --set SuSEfirewall2_init off
  chkconfig --set SuSEfirewall2_setup off
  chkconfig --set iptables off
Networking - HiperSockets Linux to z/OS, recommendations for z/OS and Linux (1)

Frame size and MTU size are determined by the chparm parameter of the IOCDS
– MTU size = frame size - 8KB

Select the MTU size to suit the workload. If the application is mostly sending packets < 8KB, an MTU size of 8KB is sufficient.
If the application is capable of sending big packets, a larger MTU size will increase throughput and save CPU cycles.
MTU size 56KB is recommended only for streaming workloads with packets > 32KB.

chparm   frame size   MTU
00       16KB          8KB
40       24KB         16KB
80       40KB         32KB
C0       64KB         56KB
Networking - HiperSockets Linux to z/OS, recommendations for z/OS and Linux (2)

HiperSockets doesn't require checksumming because it is a memory-to-memory operation.
– The default is sw_checksumming.
– To save CPU cycles, switch checksumming off:
  SUSE SLES10: in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200 add
  QETH_OPTIONS="checksumming=no_checksumming"
  SUSE SLES11: in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
  ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{checksumming}="no_checksumming"
  Red Hat RHEL5: in /etc/sysconfig/network-scripts/ifcfg-eth0 add
  OPTIONS="checksumming=no_checksumming"
Networking - HiperSockets Linux to z/OS, recommendations for z/OS

Always set:
TCPCONFIG DELAYACKS

If the MTU size is 8KB or 16KB, set:
TCPCONFIG TCPRCVBUFRSIZE 32768
TCPCONFIG TCPSENDBFRSIZE 32768

If the MTU size is 32KB, set:
TCPCONFIG TCPRCVBUFRSIZE 65536
TCPCONFIG TCPSENDBFRSIZE 65536

If the MTU size is 56KB, set:
TCPCONFIG TCPRCVBUFRSIZE 131072
TCPCONFIG TCPSENDBFRSIZE 131072

For HiperSockets MTU 56KB, the CSM fixed storage limit is important. The default is currently 120MB and should be adjusted to 250MB if MSGIVT5592I is observed.
Networking - HiperSockets Linux to z/OS, recommendations for Linux (1)

The setting of rmem / wmem under Linux on System z determines the minimum, default and maximum window size, which has the same meaning as buffer size under z/OS.

Linux window size settings are system wide and apply to all network devices.
– Applications can use setsockopt to adjust the window size individually.
  – This has no impact on other network devices.
– HiperSockets and OSA devices have contradictory demands
  – More than 10 parallel OSA sessions suffer from a send window size > 32KB
  – The suggested default send/receive window sizes for HiperSockets of 8KB and 16KB are also adequate for OSA devices.
Networking - HiperSockets Linux to z/OS, recommendations for Linux (2)

As a rule of thumb, the default send window size should be twice the MTU size, e.g.:
– MTU 8KB
  sysctl -w net.ipv4.tcp_wmem="4096 16384 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– MTU 16KB
  sysctl -w net.ipv4.tcp_wmem="4096 32768 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– MTU 32KB
  sysctl -w net.ipv4.tcp_wmem="4096 65536 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
– MTU 56KB
  sysctl -w net.ipv4.tcp_wmem="4096 131072 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 131072 174760"
Networking - Linux to z/OS, SAP recommendations

The SAP Enqueue Server requires a default send window size of 4 * MTU size.

HiperSockets
– SAP networking is a transactional type of workload with a packet size < 8KB. HiperSockets MTU 8KB is sufficient.
  sysctl -w net.ipv4.tcp_wmem="4096 32768 131072"
  sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"

OSA
– If the SAP Enqueue Server connection is run over an OSA adapter, the MTU size should be 8192. If MTU 8992 is used, the default send window size must be adjusted to 4 * 8992.
Network – benchmark description

AWM - an IBM benchmark which simulates network workload
– All our tests run with 10 simultaneous connections
– Transactional workloads – 2 types
  – RR – request/response
    – A connection to the server is opened once for a 5 minute time frame
  – CRR – connect/request/response
    – A connection is opened and closed for every request/response
  – RR 200/1000 (send 200 bytes from client to server and get a 1000 byte response)
    – Simulating online transactions
  – RR 200/32k (send 200 bytes from client to server and get a 32KB response)
    – Simulating website access
  – CRR 64/8k (send 64 bytes from client to server and get an 8KB response)
    – Simulating database queries
– Streaming workloads – 2 types
  – STRP - "stream put" (send 20MB to the server and get a 20 byte response)
  – STRG - "stream get" (send 20 bytes to the server and get a 20MB response)
    – Simulating large file transfers
Network throughput - database query (crr64x8k)

[Chart: "Database query, SLES10 SP2 / z10" - 64 byte request, 8KB response, 50 connections; the x axis is the number of transactions, larger is better. Configurations compared: VSWITCH OSA Express2 10Gbps (MTU 1492 and 8992), OSA Express2 10Gbps (MTU 1492 and 8992), VSWITCH virtual (MTU 1492 and 8992), HiperSockets (MTU 32K)]

Choosing a device:
→ Use a large MTU size
→ HiperSockets for LPAR-to-LPAR
→ VSWITCH inside z/VM
→ 10 GbE / OSA Express3
Processor consumption per transaction - database query (crr64x8k)

[Chart: "Database query, SLES10 SP2 / z10" - 64 byte request, 8KB response, 50 connections; the x axis is CPU utilization per transaction, smaller is better. Configurations compared: VSWITCH OSA Express2 10Gbps (MTU 1492 and 8992), OSA Express2 10Gbps (MTU 1492 and 8992), VSWITCH virtual (MTU 1492 and 8992), HiperSockets (MTU 32K)]

Choosing a device:
→ Use a large MTU size
→ HiperSockets for LPAR-to-LPAR
→ VSWITCH inside z/VM
→ Note the extra cost of outgoing VSWITCH traffic
→ 10 GbE / OSA Express3
Networking throughput overview (SLES10)

Database query (crr64x8k):
– Advantage of large MTU size over default MTU size: 1.5x
– Advantage of 10GbE over 1GbE: 1.2x
– Advantage of virtual networks over OSA: 2.5x (1GbE), 2.0x (10GbE)
– Fastest connection: HiperSockets LPAR

Online transaction (rr200x1000):
– Advantage of large MTU size over default MTU size: equal
– Advantage of 10GbE over 1GbE: 1.2x
– Advantage of virtual networks over OSA: 2.8x (1GbE), 2.2x (10GbE)
– Fastest connection: HiperSockets LPAR

Website access (rr200x32k):
– Advantage of large MTU size over default MTU size: 1.4x (1GbE), 2.3x (10GbE)
– Advantage of 10GbE over 1GbE: 2.1x (large MTU)
– Advantage of virtual networks over OSA: 3.7x (1GbE), 1.8x (10GbE)
– Fastest connection: HiperSockets LPAR

File transfer (strp, strg 20Mx20):
– Advantage of large MTU size over default MTU size: 3.2x (only 10GbE)
– Advantage of 10GbE over 1GbE: 3.4x (large MTU)
– Advantage of virtual networks over OSA: 4.8x (1GbE), 1.4x (10GbE)
– Fastest connection: HiperSockets LPAR
Visit us!

Linux on System z: Tuning Hints & Tips
http://www.ibm.com/developerworks/linux/linux390/perf/

Linux-VM Performance Website:
http://www.vm.ibm.com/perf/tips/linuxper.html

IBM Redbooks
http://www.redbooks.ibm.com/

IBM Techdocs
http://www.ibm.com/support/techdocs/atsmastr.nsf/Web/Techdocs