Post on 29-Nov-2014

Page 1: Aix Performance Updates 010609

© 2008 IBM Corporation

AIX Performance Updates
Tools & Tunables
AIX 5.3 TL07, TL08, TL09
AIX 6.1 TL01, TL02

Steve Nasypany

[email protected]

IBM Advanced Technical Support

Page 2: Aix Performance Updates 010609

Agenda

• SMT POWER5 vs POWER6
• AIX 5 vs AIX 6
  – Tunables Framework
  – VMM Tunings
• AIX 5.3 Tunables Updates
• Shared Ethernet
• Dedicated Processor Donation
• Virtual Shared Pools
• AIX 5.3 TL-09
  – ‘nmon’ in AIX
  – Topas VIOS/Adapter/MPIO
  – svmon Reports
• POWER6 p575 & 595

Page 3: Aix Performance Updates 010609

Agenda

• AIX 6.1 TL01
  – Workload Partitions Support
    • ps, ipcs, netstat, proc*, trace, vmstat, topas, tprof, filemon, netpmon, pprof, curt
    • Separate presentations available to cover WPAR specifics
  – Restricted Tunables
  – IO pacing
  – AIO
  – CIO
  – NFS biod
  – JFS2 nolog
  – Multiple Page Size Segments - svmon
  – iostat/topas - Filesystem and Workload Partition breakdowns (AIX 6)
• AIX 6.1 TL02
  – topas Memory Pool and Shared Ethernet monitoring
  – svmon Reports
  – filemon Reports
  – mpstat/sar WPAR support
  – tprof Large Page and Data profiling

Page 4: Aix Performance Updates 010609

ST vs SMT in Micro partitions

• Dedicated processor partitions switch from simultaneous multi-threaded mode (SMT) to single-threaded mode (ST) automatically at low multi-programming levels
  – On POWER5, Micro partitions do not switch SMT/ST modes automatically
  – Micro partitions may be configured to run in ST mode through the AIX smtctl command
• On POWER5, long-running single-threaded tasks can see their response time elongated in Micro partitions
  – Effects of processor folding
  – Effects of the secondary (idle) thread creating some interference for processor core resources
• POWER6 has a key technical improvement over POWER5 in multi-threading which dramatically reduces this SMT effect in Micro partitions
  – On POWER6, Micro partitions do switch SMT/ST modes automatically
  – On POWER6, on each cycle the hardware core may dispatch instructions for both hardware threads

Page 5: Aix Performance Updates 010609

ST vs SMT in Micro partitions – POWER6 example

• Generally, expect perhaps a 1% impact from running in SMT mode in Micro partitions on POWER6
• Example code from Northwestern University Minebench 1.0
• Shows the ratio of the test running in a Micro partition in SMT mode / ST mode

[Chart: SMT/ST elapsed time ratio for ScalParc = 0.994475138; axis from 0.9 to 1.1]
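
As a small worked example of reading that ratio, the sketch below converts an SMT/ST elapsed-time ratio into a percentage difference. The function name is illustrative only; the input is the ScalParc value reported on the slide.

```python
def smt_impact_percent(smt_over_st_ratio):
    """Percent change in elapsed time when running in SMT mode instead of ST mode.

    A negative result means the SMT run finished faster than the ST run.
    """
    return (smt_over_st_ratio - 1.0) * 100.0


# ScalParc ratio taken from the slide.
scalparc = smt_impact_percent(0.994475138)
print(f"ScalParc SMT vs ST elapsed time: {scalparc:+.2f}%")
```

A ratio just below 1.0 therefore reads as a sub-1% difference in elapsed time, consistent with the "perhaps 1%" guidance.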

Page 6: Aix Performance Updates 010609

AIX 5.3 vs AIX 6.1 Framework

• AIX 6.1 adopts common tunings by default and introduces restricted tunables
  – Too many tunables, too much confusion
  – It just works
    • Don’t change restricted tunables without direction from AIX service stream
    • Carefully review software vendor specific recommendations. Often, they are just carrying over old/obsolete tunings from previous OS levels.
    • Restricted tunables not displayed by default except by -o tunable
    • Use -F to force view or change
  – If you update from AIX 5.3 to AIX 6.1, legacy tunings will be maintained
    • This is probably bad for any customer who hasn’t adopted memory tunings used in last few years (lru_file_repage=0, etc)
    • Changes will be flagged in lastboot.log and errlog files during reboot
    • If you are using a tunable outside of the norm, and are unsure what to do, open a PMR and ask
  – New set of SMIT panels to change restricted parameters
    • Existing panels only show non restricted parameters

Page 7: Aix Performance Updates 010609

AIX 5.3 vs AIX 6.1

• Performance Differences
  – You should not see significant deltas between AIX 5.3 and AIX 6.1
  – CPU usage should be no more than a couple of percent either way
  – Memory footprints may be larger for applications using 64KB pages
    • But 64KB page policy is very conservative, specifically to avoid large changes in memory utilization

Page 8: Aix Performance Updates 010609

AIX 5 vs 6 VMM Page Replacement tuning

Tunable              AIX 5.2/5.3   AIX 6.1
minperm%             20            3
maxperm%             80            90
maxclient%           80            90
strict_maxperm       0             0
strict_maxclient     1             1
lru_file_repage      1             0
page_steal_method    0             1

• The AIX 6.1 tunings (right column) are universally recommended for AIX 5.3
  – And AIX 5.2, but limiting cache to no more than 24 GB
• Set-and-forget: lru_file_repage = 0 protects computational memory, always steal from cache
• No paging to the paging space will occur unless the system memory is over committed (AVM > 97%)
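
To make the comparison between the two sets of defaults concrete, here is a minimal sketch that diffs a partition's current settings against the AIX 6.1 values listed on this slide and prints the vmo commands that would close the gap. It only builds command strings and does not run anything. The helper name and the choice of `-p` (current and next boot) versus `-r` (next boot only) are assumptions for illustration; check them against the vmo documentation for your level before use.

```python
# AIX 6.1 default values as listed on this slide.
AIX61_DEFAULTS = {
    "minperm%": 3,
    "maxperm%": 90,
    "maxclient%": 90,
    "strict_maxperm": 0,
    "strict_maxclient": 1,
    "lru_file_repage": 0,
    "page_steal_method": 1,
}

# Assumption: page_steal_method is the only tunable here that needs a
# bosboot/reboot, as stated on the list-based LRU slide.
REBOOT_ONLY = {"page_steal_method"}


def vmo_commands(current):
    """Return vmo command strings for every tunable that differs from the target."""
    commands = []
    for name, target in AIX61_DEFAULTS.items():
        if current.get(name) != target:
            flag = "-r" if name in REBOOT_ONLY else "-p"
            commands.append(f"vmo {flag} -o {name}={target}")
    return commands


# Example input: the AIX 5.2/5.3 defaults from this slide.
aix53_defaults = {
    "minperm%": 20, "maxperm%": 80, "maxclient%": 80,
    "strict_maxperm": 0, "strict_maxclient": 1,
    "lru_file_repage": 1, "page_steal_method": 0,
}
for command in vmo_commands(aix53_defaults):
    print(command)
```

Printing rather than executing keeps the sketch safe to run anywhere and leaves the review step with the administrator.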

Page 9: Aix Performance Updates 010609

lru_file_repage=0 Issues

• But now my system is ~100% memory usage…
  – New memory model results in free memory being consumed by cache
  – AIX does not actively scrub cache, as it is an expensive overhead
    • AIX only looks for memory when it needs it
  – Customers do not know how to assess whether additional workloads can be added without causing physical paging
• There is no trivial method for knowing how much cache is optimal or active for a given workload
  – Options on next slide
• If the system is paging to page space with these settings, you are memory bound
  – First, make sure you don’t have a memory leak
  – If you have to live with this workload, optimize your paging space
    • Add paging spaces, spread them out
    • Paging spaces of equal sizes

Page 10: Aix Performance Updates 010609

Minimizing/Optimizing Cache with lru_file_repage=0

• Simple
  – DLPAR memory in as needed when workloads increase and paging occurs
  – Script filesystems to unmount/remount after workloads have completed, which will clear them from cache
  – Use release-behind mechanisms
    • Tells VMM data will not be operated on (no cache benefit)
    • read, write and read+write mount options
    • You need to know a little bit about your workload’s behavior
• More work
  – Decrease maxclient/maxperm or deallocate memory to benchmark workloads
    • Baseline the current configuration’s vmstat ‘fi’ value
    • Reduce by 5%, allowing the system time to adjust
    • When the fi value sustains a significant increase, cache is likely constrained
    • Raise value 5%. Current computational (vmstat ‘avm’ or svmon ‘virtual’) and non-computational (JFS: numperm, JFS2: numclient) totals should approximate current requirements
  – If you have very different workloads, you’ll have to pick which one you want to tune to
• Difficult
  – Use svmon to identify files in cache, monitor I/O & database information
    • svmon -jcS lists/sorts client pages and file information
    • filemon will give you file activity over short periods
• Punt
  – Adopt Direct I/O or Concurrent I/O
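
The "More work" procedure can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not a tool: `measure_fi` is a hypothetical callback that would apply a cache limit, let the system settle, and return the sustained vmstat 'fi' rate; "reduce by 5%" is read here as five percentage points; and "significant increase" is modelled as 1.5 times the baseline, which is an arbitrary choice.

```python
def tune_cache_limit(start_pct, measure_fi, step=5, threshold=1.5, floor=10):
    """Step the cache limit down until file page-ins rise sharply, then back off.

    start_pct   current maxperm/maxclient percentage
    measure_fi  hypothetical callback: percentage -> sustained vmstat 'fi' rate
    step        size of each reduction, in percentage points (assumption)
    threshold   multiple of the baseline 'fi' treated as significant (assumption)
    floor       lowest percentage the loop will try
    """
    baseline = measure_fi(start_pct)
    pct = start_pct
    while pct - step >= floor:
        candidate = pct - step
        if measure_fi(candidate) > threshold * baseline:
            # Cache is likely constrained at `candidate`; raising it by one
            # step brings us back to the last setting that behaved.
            return pct
        pct = candidate
    return pct


# Usage with a synthetic workload whose file cache needs about 60% of memory.
def synthetic(pct):
    return 100 if pct >= 60 else 400


print(tune_cache_limit(90, synthetic))
```

In practice each measurement would take minutes to hours, so a loop like this is better driven by hand or by a scheduler than run in one sitting.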

Page 11: Aix Performance Updates 010609

List-based LRU page_steal_method=1

• Partition memory is broken up into page pools
  – A page pool is a set of physical pages of the same size that form a list
  – One lrud per memory pool
• When the free list is depleted, lrud scans the list for the type of pages VMM desires (in buckets of 128K pages)
• Default page_steal_method = 0
  – Working storage and file pages mixed in one list
  – lrud scans sequentially to find pages of the right type
• List-based page_steal_method = 1
  – There are two lists for a page pool, one for working storage and another for file pages
• lru_file_repage affects which pages are stolen
  – If lru_file_repage = 0, then it will steal from the file list. The higher the computational footprint, the better the scanning efficiency will be.
  – If lru_file_repage = 1, then legacy repaging counters/logic will determine which list is used
• List-based reduces CPU time due to less scanning
• This is NOT a dynamic tunable
  – Requires a bosboot/reboot to take effect
  – Is the AIX 6.1 default

[Diagram: Page Pool with page_steal_method = 1. One list of working storage (w/s) pages, scanned for w/s pages, and a separate list of file pages, scanned for file pages.]
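
The scanning difference between the two methods can be illustrated with a toy model. This is not lrud's actual algorithm; it only counts how many list entries are examined to find a given number of file pages, first in a single mixed list and then in a dedicated file list. All names are illustrative.

```python
def scan_mixed(pages, wanted, count):
    """Single mixed list: walk entries in order until `count` pages of the
    wanted type are found. Returns how many entries were examined."""
    found = 0
    examined = 0
    for kind in pages:
        if found == count:
            break
        examined += 1
        if kind == wanted:
            found += 1
    return examined


def scan_split(pages, wanted, count):
    """Per-type lists: only the list holding the wanted type is walked, so
    every examined entry is a candidate."""
    typed_list = [kind for kind in pages if kind == wanted]
    return min(count, len(typed_list))


# A pool that is 90% working storage ("w") and 10% file pages ("f").
pool = (["w"] * 9 + ["f"]) * 10
print(scan_mixed(pool, "f", 5), scan_split(pool, "f", 5))
```

The larger the computational share of the pool, the more entries the mixed scan wastes, which is the efficiency point made on the slide.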

Page 12: Aix Performance Updates 010609

New Tunables

• psm_timeout_interval = 5000
  – Determines the timeout interval, in milliseconds, to wait for page size management daemons to make forward progress before LRU page replacement is started. This setting is only valid on the 64-bit kernel. Default: 5 seconds. Possible values: 0 through 60,000 (1 minute). When page size management is working to increase the number of page frames of a particular page size, LRU page replacement is delayed for that page size for up to this amount of time. On a heavily loaded system, increasing this tunable can give the page size management daemons more time to create more page frames before LRU runs.
  – Basically, 64 KB page migrations can cause a deadlock between lrud and psmd
  – vmo tunable

Page 13: Aix Performance Updates 010609

New Tunables

• JFS2 Sync Tunables (TL08)
  – The file system sync operation can be problematic in situations where there is very heavy random I/O activity to a large file. When a sync occurs, all reads and writes from user programs to the file are blocked. With a large number of dirty pages in the file, the time required to complete the writes to disk can be large. New JFS2 tunables are provided to relieve that situation.

Page 14: Aix Performance Updates 010609

New Tunables

– j2_syncPageCount
  Limits the number of modified pages that are scheduled to be written by sync in one pass for a file. When this tunable is set, the file system will write the specified number of pages without blocking I/O to the rest of the file. The sync call will iterate on the write operation until all modified pages have been written.
  Default: 0 (off), Range: 0-65536, Type: Dynamic, Unit: 4KB pages
– j2_syncPageLimit
  Overrides j2_syncPageCount when a threshold is reached. This is to guarantee that sync will eventually complete for a given file. Not applied if j2_syncPageCount is off.
  Default: 16, Range: 1-65536, Type: Dynamic, Unit: Numeric
– If application response times are impacted by syncd, try j2_syncPageCount settings from 256 to 1024. Smaller values improve short term response times, but still result in larger syncs that impact response times over larger intervals.
– These will likely require a lot of experimentation, and detailed analysis of I/O
– Does not apply to mmap() or shmat() memory files.
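
To get a feel for the trade-off when picking a j2_syncPageCount value, the sketch below estimates how many write passes a sync needs for one file and how much data each pass covers. It models only the iteration described above (4KB pages written in batches); j2_syncPageLimit and real I/O timing are deliberately left out, and the function name is illustrative.

```python
import math

PAGE_SIZE = 4096  # j2_syncPageCount is expressed in 4KB pages


def sync_passes(dirty_pages, sync_page_count):
    """Return (passes, bytes_per_pass) for syncing one file.

    sync_page_count == 0 models the default (off): all dirty pages are
    written in a single pass.
    """
    if dirty_pages == 0:
        return 0, 0
    if sync_page_count == 0:
        return 1, dirty_pages * PAGE_SIZE
    passes = math.ceil(dirty_pages / sync_page_count)
    return passes, min(sync_page_count, dirty_pages) * PAGE_SIZE


# A file with 100,000 dirty pages (about 390 MB) under three settings.
for setting in (0, 256, 1024):
    passes, chunk = sync_passes(100_000, setting)
    print(f"j2_syncPageCount={setting}: {passes} passes of up to {chunk} bytes")
```

Smaller settings shorten each pass but multiply the number of passes, which is why the slide suggests experimenting within a range rather than naming one value.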

Page 15: Aix Performance Updates 010609

New Tunables

• proc_disk_stats (TL08)
  – There is a single process-wide structure that is updated for each I/O
  – Structure is protected by a single lock: pv_lock_d
  – The more threads doing high I/O, the higher the potential for lock contention
    • Should be easily visible by using splat lock tool
    • Default behavior not changed. Turn off when process scope disk statistics not required
    • Encountered in DB2 TPC-C benchmark tests
  – schedo tunable
  – APAR IZ12059

Page 16: Aix Performance Updates 010609

New Tunables

• large_receive (TL08)
  – Shared Ethernet
  – The 10 Gig adapter's LRO ("large receive offload") feature is enabled by default, and this may cause problems for a system configuration where a Shared Ethernet Adapter is bridging traffic for Linux LPARs (which cannot receive packets larger than their MTU).
  – SEA will provide its own "large_receive" attribute, defaulted to "no", which will disable the feature in the underlying real adapter to avoid such problems out of the box. The user has the choice to override this and set the SEA's attribute to "yes" to enable the large receive feature in the underlying device (if available), overriding the device's own large_receive attribute setting
  – SEA large_receive setting is dynamic as long as the adapter large_receive was enabled at boot. Otherwise adapter has to be recycled to support SEA change.

Page 17: Aix Performance Updates 010609

Shared Ethernet vs HEA on 10Gb

• SEA has architectural limits with 10Gb adapters
  – POWER5 limited by RIO-G/drawer bandwidth (~3 Gb/s)
  – POWER6 (1500 MTU)
    • Send
      – large_send off: 3 Gb/s
      – large_send on: 8 Gb/s
    • Receive
      – large_receive off: 3 Gb/s
      – large_receive on: ? Gb/s (no benchmark data available yet)
  – No issues with 1Gb performance, just 10Gb
  – large_receive setting should allow SEA to be more competitive with HEA, but HEA is expected to be higher performance
• Always use large_send, regardless of MTU size
  – HEA will buffer and break up packets automatically
• Use 266 MHz slots for 10Gb adapters where possible in heavy traffic environments
• Any VIOS entitlements must be increased
  – Need at least 2-3 CPUs to max out a 10Gb card
• Memory cost is ~150MB per LHEA port
• There are APARs in work for network dog-thread optimization issues (would impact customers with small packet sizes and packet counts in the 100K+/sec range). Expected in Q1/2009.

Page 18: Aix Performance Updates 010609

Shared Ethernet Tools

• seastat
  – Shared Ethernet statistics, shipped in AIX 5.3 TL08
  – Not Nigel’s tool
  – CLI script in VIOS 1.5.2.1 executes command
  – Device must be enabled for accounting statistics
• nmon 12 supports SEA reports

Page 19: Aix Performance Updates 010609

seastat

$ seastat -?
Usage: seastat -d <device name> -c
       seastat -d <device name> [-n | -s searchtype=value]
$ chdev -dev ent8 -attr accounting=enabled
ent8 changed
$ seastat -d ent8
=============================================================================
Advanced Statistics for SEA
Device Name: ent8
=============================================================================
MAC: A6:3C:00:09:33:04
----------------------
VLAN: None
VLAN Priority: None
Hostname: js22aix.aixncc.uk.ibm.com
IP: 9.69.44.177

Transmit Statistics:        Receive Statistics:
--------------------        -------------------
Packets: 8                  Packets: 18
Bytes: 646                  Bytes: 1103

Page 20: Aix Performance Updates 010609

New mount option - noatime

• Ingo Molnar (Linux kernel developer) said:
  – "It's also perhaps the most stupid Unix design idea of all times. Unix is really nice and well done, but think about this a bit: 'For every file that is read from the disk, lets do a ... write to the disk! And, for every file that is already cached and which we read from the cache ... do a write to the disk!'"
• If you have a lot of file activity, you have to update a lot of timestamps
  – File timestamps
    • File inode change time (ctime)
    • File last modified time (mtime)
    • File last access time (atime)
  – New mount option noatime disables last access time updates for JFS2
  – File systems with heavy inode access activity due to file opens can have significant performance improvements
• APARs
  – IZ11282 AIX 5.3
  – IZ13085 AIX 6.1

Page 21: Aix Performance Updates 010609

Dedicated Processor Donation (TL06 & POWER6)

• The ability of dedicated processor partitions to give unused compute cycles to the shared processor pool
• Using this feature has the effect of making the capacity of the shared pool variable
• Partitions configured in this way only donate cycles to the shared pool when physical processors in the partition are idle
  – If the partition becomes > 80% busy under AIX, the partition ceases to donate cycles to the shared pool
  – Any I/O interrupt will result in the dedicated processor partition being redispatched if it had donated capacity
  – There is a guarantee not to get phantom interrupts (interrupts for other partitions)
  – The partition keeps running on the same physical processors
  – Must be enabled on HMC
• New phyp instrumentation collects
  – donated cycles
    • voluntarily donated by an idle dedicated partition to shared pool
  – stolen cycles
    • cycles stolen by phyp from a dedicated partition to run maintenance tasks (hypervisor)
    • can happen whether donation is enabled or not (just wasn’t instrumented before)
• Tools metrics impact
  – processors belonging to donating dedicated partitions are counted in pool size
  – PURR stops on context switches
    • similar to what happens to shared partitions
    • tools will compensate so that dedicated percentages are still relative to total capacity
• Tools updated
  – lparstat, mpstat, sar, topas and topasout reports

Page 22: Aix Performance Updates 010609

Dedicated Processor Donation – how to enable

Page 23: Aix Performance Updates 010609

Dedicated Processor Donation – where it fits in

• In some cases, dedicated processor partitions are warranted
  – Licensing or customer concerns …
  – The need for extremely low I/O latency (<1 ms)
  – The need for memory affinity or usage of RSETs
  – Scalability problems in applications spread over large numbers of virtual processors
• Shared Dedicated Capacity allows the benefits of dedicated processor partitions, without locking down all of the capacity of processors in the partition
  – Idle cycles can be used by uncapped partitions in the shared pool
• Shared Dedicated Capacity does not help with the footprint problem of requiring the sum of the entitlement of Micro partitions to be less than or equal to the number of processors in the shared pool
  – Since Shared Dedicated Capacity donation to the shared pool is opportunistic, based on load

Page 24: Aix Performance Updates 010609

Dedicated Processor Donation - lparstat

$ lparstat -i
Node Name                        : va01
Partition Name                   : va
Partition Number                 : 2
Type                             : Dedicated-SMT
Mode                             : Donating
Entitled Capacity                : 1.00
Partition Group-ID               : 32770
Shared Pool ID                   : -
Online Virtual CPUs              : 1
Maximum Virtual CPUs             : 1
Minimum Virtual CPUs             : 1
Online Memory                    : 800 MB
Maximum Memory                   : 1024 MB
Minimum Memory                   : 128 MB
Variable Capacity Weight         : -
Minimum Capacity                 : 1.00
Maximum Capacity                 : 1.00
Capacity Increment               : 1.00
Maximum Physical CPUs in system  : 4
Active Physical CPUs in system   : 4
Active CPUs in Pool              : -

# lparstat 1 3
System configuration: type=Dedicated mode=Donating smt=On lcpu=2 mem=800

%user  %sys  %wait  %idle  physc    vcsw
-----  ----  -----  -----  -----  ------
  0.1   0.4    0.0   99.5   0.68  670234
  0.0   0.2    0.0   99.8   0.68  670234
  0.0   0.2    0.0   99.8   0.68  670234

Callouts:
• physc shows actual physical processor consumption: number of physical processors minus donated and stolen cycles
• vcsw: donation causes hardware context switches
• %user, %sys, %wait and %idle stay relative to partition capacity (in this case one processor)

Page 25: Aix Performance Updates 010609

Dedicated Processor Donation - lparstat details

• New -d flag shows more details
• Example with donation enabled

# lparstat -d
System configuration: type=Dedicated mode=Donating smt=On lcpu=2 mem=800

%user  %sys  %wait  %idle  %idon  %bdon  %istol  %bstol
-----  ----  -----  -----  -----  -----  ------  ------
  0.1   0.2    2.1   97.7  12.79    6.8     4.8    2.75

• Example without donation and in combination with -h

# lparstat -dh
System configuration: type=Dedicated mode=Capped smt=On lcpu=2 mem=800

%user  %sys  %wait  %idle  %hypv  hcalls  %istol  %bstol
-----  ----  -----  -----  -----  ------  ------  ------
  0.1   0.2    2.1   97.7    0.0     391     4.8    2.75

%idon, %bdon: percentages of idle and busy times donated
%istol, %bstol: percentages of idle and busy times stolen
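
When these reports are collected by script, the donation and stolen columns are easy to total. The following is a minimal sketch that parses one header line and one data line of `lparstat -d` style output and sums the idle and busy parts. It assumes whitespace-separated columns named as in the examples above; the function names are illustrative and not part of any AIX tool.

```python
def parse_lparstat_row(header, row):
    """Map column names from the header line to float values from a data line."""
    names = header.split()
    values = [float(value) for value in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


def donation_summary(stats):
    """Return (total % donated, total % stolen); missing columns count as 0."""
    donated = stats.get("%idon", 0.0) + stats.get("%bdon", 0.0)
    stolen = stats.get("%istol", 0.0) + stats.get("%bstol", 0.0)
    return donated, stolen


# Values from the donation-enabled example on this slide.
header = "%user %sys %wait %idle %idon %bdon %istol %bstol"
row = "0.1 0.2 2.1 97.7 12.79 6.8 4.8 2.75"
donated, stolen = donation_summary(parse_lparstat_row(header, row))
print(f"donated {donated:.2f}%  stolen {stolen:.2f}%")
```

The same helper works for the `-dh` form, where the donation columns are absent and only the stolen total is non-zero.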

Page 26: Aix Performance Updates 010609

Dedicated Processor Donation - sar and mpstat

• sar
  – automatically displays physc when donation is enabled
• mpstat
  – automatically displays pc and lcs if donation is enabled
  – new -h option to show more details on hypervisor related statistics

• donation enabled
System configuration: lcpu=2 mode=Donating

cpu   pc    ilcs     vlcs         idon   bdon   istol   bstol
0     0.3   50327    687231635    10.2   4.5    0.59    0.32
1     0.5   61702    684989764    10.2   4.5    0.59    0.32
ALL   0.8   112029   1372221399   20.4   9.0    1.18    0.64

• donation disabled
System configuration: lcpu=2 mode=Capped

cpu   pc     ilcs     vlcs         istol   bstol
0     0.3    503727   687231635    0.59    0.32
1     0.41   61702    684989764    0.59    0.32
ALL   0.71   565429   1372221399   1.18    0.64

• shared partition
System configuration: lcpu=2 ent=0.5 mode=Uncapped

cpu   pc    ilcs     vlcs
0     0.6   503727   687231635
1     0.6   61702    684989764
ALL   0.8   565429   1372221399

idon, bdon: percentages of idle and busy times donated
istol, bstol: percentages of idle and busy times stolen

Page 27: Aix Performance Updates 010609

Dedicated Processor Donation - topas -L

Interval: 2        Logical Partition:             Fri Sep 22 09:01:46 2006
Donating  SMT ON                                  Online Memory: 3200.0
Partition CPU Utilization    Online Virtual CPUs: 1    Online Logical CPUs: 2
%user  %sys  %wait  %idle  %hypv  hcalls  %istl  %bstl  %idon  %bdon  vcsw
    1     1      0     98      1     200      0    2.1    3.5   10.0   1.0

Page 28: Aix Performance Updates 010609

Dedicated Processor Donation - topas -C

• Example of topasout report for CEC recording

Report: Topas CEC Detailed --- hostname: ptoolsl1 version: 1.2
Start:02/09/06 06.30.00 Stop:02/09/06 07.30.00 Int:60 Min Range: 600 Min

Partition Info    Memory (GB)        Processors        Avail Pool: 1.3
Monitored  : 8    Monitored  : 0.0   Monitored  : 7    Shr Physical Busy: 2.2
UnMonitored: -    UnMonitored: 0.0   UnMonitored: 0    Ded Physical Busy: 0.4
Shared     : 6    Available  :32.0   Available  : 7    Donated Physical CPUs: 0.7
Uncapped   : 1    UnAllocated: -     UnAllocated: 1    Stolen Physical CPUs : 0.1
Capped     : 7    Consumed   : 8.7   Shared     : 4    Hypervisor
Dedicated  : 2                       Dedicated  : 3    Virt. Context Switch: 332
Donating   : 2                       Donated    : 1    Phantom Interrupts  : 2
                                     Pool Size  : 2

Host     OS   M  Mem  InU  Lp  Us  Sy  Wa  Id  PhysB  Vcsw  Ent   %EntC  PhI
--------------------------------shared----------------------------------------
ptools1  A53  u  1.1  0.4   1  15   3   0  82   1.30   200  0.50   22.0    5
ptools5  A53  U   12   10   2  12   3   0  85   0.20   121  0.25    0.3    3
ptools3  A53  C  5.0  2.6   2  10   1   0  89   0.15    52  0.25    0.3    2
ptools7  A53  c  2.0  0.4   1   0   1   0  99   0.05     2  0.10    0.3    2

Host     OS   M  Mem  InU  Lp  Us  Sy  Wa  Id  PhysB  Vcsw  %istl  %bstl  %bdon  %idon
------------------------------dedicated------------------------------------------------
ptools4  A53  D  0.6  0.3   2  12   3   0  85   0.60   110      1      2      0      5
ptools6  A52  d  1.1  0.1   1  11   7   0  82   0.50    50     10      5     10      0
ptools8  A52     1.1  0.1   1  11   7   0  82   0.50     5      0      1      -      -
ptools2  A52     1.1  0.1   1  11   7   0  82   0.50     4      0      2      -      -
Time: 07.30.00 -----------------------------------------------------------------

Callouts on the slide highlight: donating partitions, donated processors, stolen cycles, donated cycles

Page 29: Aix Performance Updates 010609

I/O Monitoring with iostat – Service Times (Review)

# iostat -D 10

hdisk1  xfer:   %tm_act      bps      tps    bread    bwrtn
                   87.7    62.5M    272.3    62.5M    823.7
        read:       rps  avgserv  minserv  maxserv  timeouts  fails
                  271.8      9.0      0.2    168.6         0      0
        write:      wps  avgserv  minserv  maxserv  timeouts  fails
                    0.5      4.0      1.9     10.4         0      0
        queue:  avgtime  mintime  maxtime  avgwqsz  avgsqsz  sqfull
                    1.1      0.0     14.1      0.2      1.2    2374

Callouts:
• avgsqsz: can’t exceed queue_depth for the disk
• sqfull: if this is often > 0, then increase queue_depth

Virtual adapter’s extended throughput report (-D)

Metrics related to transfers (xfer:)
  tps           Indicates the number of transfers per second issued to the adapter.
  recv          The total number of responses received from the hosting server to this adapter.
  sent          The total number of requests sent from this adapter to the hosting server.
  partition id  The partition ID of the hosting server, which serves the requests sent by this adapter.

Adapter Read/Write Service Metrics (read:)
  avgserv       Indicates the average time. Default is in milliseconds.
  minserv       Indicates the minimum time. Default is in milliseconds.
  maxserv       Indicates the maximum time. Default is in milliseconds.

Adapter Wait Queue Metrics (wait:)
  avgtime       Indicates the average time spent in wait queue. Default is in milliseconds.
  mintime       Indicates the minimum time spent in wait queue. Default is in milliseconds.
  maxtime       Indicates the maximum time spent in wait queue. Default is in milliseconds.
  avgwqsz       Indicates the average wait queue size.
  avgsqsz       Indicates the average service queue size – Waiting to be sent to the disk.
  sqfull        Indicates the number of times the service queue becomes full.

Notes:
• Earlier AIX 5.3 levels may report sqfull as a delta, but APAR fixes convert it to a rate, so values will be much smaller
• Default format is hard to read with many hdisks. Use the -l option for wide output

Service Time Goals
• Reads < 20 msecs
• Writes
  – with cache < 2 msecs
  – w/o cache < 10 msecs
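
The service time goals lend themselves to a simple automated check. The sketch below compares one disk's average read and write service times and its sqfull counter against the goals listed on this slide. The function name and message wording are illustrative, and a single sample stands in for what should really be a trend over many intervals.

```python
# Goals from the slide, in milliseconds.
READ_GOAL_MS = 20.0
WRITE_GOAL_CACHED_MS = 2.0
WRITE_GOAL_UNCACHED_MS = 10.0


def check_disk(read_avgserv, write_avgserv, sqfull, write_cache=True):
    """Return a list of findings for one disk; an empty list means all goals met."""
    findings = []
    if read_avgserv >= READ_GOAL_MS:
        findings.append(f"read avgserv {read_avgserv} ms misses the < {READ_GOAL_MS:g} ms goal")
    write_goal = WRITE_GOAL_CACHED_MS if write_cache else WRITE_GOAL_UNCACHED_MS
    if write_avgserv >= write_goal:
        findings.append(f"write avgserv {write_avgserv} ms misses the < {write_goal:g} ms goal")
    if sqfull > 0:
        findings.append("sqfull > 0: service queue filled; if frequent, consider a larger queue_depth")
    return findings


# hdisk1 values from the example output: reads 9.0 ms, writes 4.0 ms, sqfull 2374.
for finding in check_disk(9.0, 4.0, 2374, write_cache=True):
    print(finding)
```

Whether the write goal with or without cache applies depends on the storage behind the disk, so that choice is left as a parameter.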

Page 30: Aix Performance Updates 010609

iostat tape support (TL-07)

• Uses existing dkstat structures to store metrics
  – same as disk devices
  – includes support for service time monitoring
  – but there is no queuing, so no wait metrics
• Initially only ATAPE devices are going to be supported
• Detailed output example (-p for tapes)

# iostat -Dp 1 1
System configuration: lcpu=1 tapes=1 drives=1 paths=2 vdisks=0

rmt0    xfer:   %tm_act      bps      tps    bread    bwrtn
                    1.0     5.8K      1.4    799.0     5.0K
        read:       rps  avgserv  minserv  maxserv  timeouts  fails
                    0.1      6.6      0.1     53.8         0      0
        write:      wps  avgserv  minserv  maxserv  timeouts  fails
                    1.3      8.2      0.9    113.7         0      0

Page 31: Aix Performance Updates 010609

Virtual Shared Processor Pools (POWER6 & TL07)

• Description
  – Allows user to set capacity limits on groups of LPARs
  – A shared processor pool has two settings
    • Maximum capacity – limit on total capacity LPARs in pool can consume
    • Reserved entitled capacity – reserved uncapped entitled capacity
  – Primary motivation is reduced licensing costs
    • Uncapped partitions can be capped to virtual pool’s limit rather than total number of physical processors in pool
• Configuration
  – Up to 64 pools are supported
  – Pool 0 is default pool
    • Pool 0 is equivalent to the physical shared processor pool
  – All attributes of a pool can be changed dynamically
  – LPARs can be re-assigned to different pools dynamically

Page 32: Aix Performance Updates 010609

Virtual Shared Processor Pools

[Diagram: Server with 12 processor cores; Physical Shared Pool (9 processor cores). Partitions shown:]

Partition  OS      Workload  Mode       VP  Ent.
n1         i5/OS
n2         AIX     DB2       dedicated
n3         Linux
n4         AIX     DB2       Uncapped   4   1.80
n5         AIX     DB2       Uncapped   4   1.7
n6         Linux   WAS       Uncapped   4   2.00
n7         i5/OS   WAS       Uncapped   7   2.00
n8         AIX     WAS       Uncapped   3   1.00

Virtual Shared pool #1 Max Cap: 5 processors
Virtual Shared pool #2 Max Cap: 6 processors

POWER6 Multiple shared pools:
• Can reduce the number of software licenses by putting a limit on the amount of processors an uncapped partition can use
• Up to 64 shared pools

DB2 cores to license:
• 1 from dedicated partition n2
• 5 from pool 1
= 6

WebSphere cores to license:
• 6 from pool 2
= 6
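
The licensing arithmetic on this slide can be reproduced with a short calculation. This is a sketch under explicit assumptions: the diagram residue does not state which partitions sit in which pool, so the DB2 partitions n4 and n5 are assumed to be in pool 1 and the WAS partitions n6, n7 and n8 in pool 2; the rule used (the lesser of the summed virtual processors and the pool's maximum capacity) is a simplification, and actual license terms are set by the software vendor.

```python
def cores_to_license(product, dedicated, pools):
    """Count cores to license for `product`.

    dedicated  list of (product, cores) for dedicated partitions
    pools      list of dicts with 'max_cap' and 'partitions', where
               'partitions' is a list of (product, virtual_processors)
    """
    total = sum(cores for name, cores in dedicated if name == product)
    for pool in pools:
        vps = sum(vp for name, vp in pool["partitions"] if name == product)
        # An uncapped partition cannot use more than the pool's maximum capacity.
        total += min(vps, pool["max_cap"])
    return total


# Assumed layout (pool membership is not recoverable from the slide text).
dedicated = [("DB2", 1)]  # n2
pools = [
    {"max_cap": 5, "partitions": [("DB2", 4), ("DB2", 4)]},              # n4, n5
    {"max_cap": 6, "partitions": [("WAS", 4), ("WAS", 7), ("WAS", 3)]},  # n6, n7, n8
]
print("DB2:", cores_to_license("DB2", dedicated, pools))
print("WebSphere:", cores_to_license("WAS", dedicated, pools))
```

Without the pool caps, the same partitions would count 8 virtual processors for DB2 and 14 for WebSphere, which is the saving the slide is pointing at.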

Page 33: Aix Performance Updates 010609

Virtual Shared Processor Pools

• Hardware Requirements
  – POWER6 or later
  – HMC-managed
    • Virtual shared processor pools are not supported with IVM
• Software Requirements
  – eFW 3.2 or later
  – AIX 5.3 TL07 or later
  – AIX 6.1 or later

Page 34: Aix Performance Updates 010609

Enable Monitoring of the shared pool usage

• Surprisingly, many customers do not seem to be prepared for monitoring the shared pool
• Make sure at least one partition on the CEC can do pool monitoring!
• Required for lparstat to see free pool resources, but topas gets around this because it can collect data from remote agents and calculate it itself

Page 35: Aix Performance Updates 010609

Multiple shared pools (topas -C)

• New pool section
  – Turned on by using “p” on any topas CEC panel
    • Short, long and no header options
  – Cursor and “f” key trigger focus on single pool
    • Lists shared partitions using that virtual pool

pool  psize  ent   maxc   physb  app  mem  inu
1     8      6.5   12.0   4.8    3.2  128  80.5
2     8      5.0   8.0    2.1    5.9  64   55.3

Host      OS  M  Mem  InU  Lp  Us  Sy  Wa  Id  PhysB  Vcsw  Ent   %EntC  PhI
-------------------------------------shared-------------------------------
ptoolsl1  53  U  3.1  1.9   4   1   2   0  96   0.01   398  0.20    5.3    0

Host      OS  M  Mem  InU  Lp  Us  Sy  Wa  Id  PhysB  Vcsw  %istl  %bstl  %bdon  %idon
------------------------------------dedicated-----------------------------
ptools1   61  D  3.1  0.9   2   0   0   0  99   0.00   177      -      -      0     20
ptoolsl3  61  S  3.1  0.9   2   0   0   0  99   0.00   170      -      -      -      -

Legend:
psize = pool size (effective capacity)
ent   = entitlement
maxc  = maximum capacity
physb = shared physB
app   = available pool processors
mem   = memory
inu   = memory in use

Page 36: Aix Performance Updates 010609

Overview of topas / nmon / topasrec

• AIX 5.3 TL09 and AIX 6.1 TL02
• topas is a curses based tool used to monitor various performance parameters (statistics) of the system. Supported with the operating system since AIX 4.3.
• nmon is also a curses based tool for System Performance monitoring and also has recording capabilities. Developed by Nigel Griffiths (IBM).
• Development has integrated nmon-like functionality into AIX
  – Legacy topas and nmon options supported
  – Legacy recording formats supported (input into nmon Analyser, etc)
• topasrec is a new tool used to start topas local / CEC recording in binary format
  – AIX Local recordings previously used xmwlm agent
  – AIX CEC recordings previously used topas with -R option

Page 37: Aix Performance Updates 010609

'nmon' in AIX

• Can be started by running command 'nmon' or ‘topas_nmon’
• Can be started by pressing “~” from topas screen

./topas_nmon -h

Hint: topas_nmon [-h] [-s <seconds>] [-c <count>] [-f -d -t -r <name>] [-x]
Command: TOPAS-NMON

  -h             FULL help information - much more than here

Interactive-Mode:
  read startup banner and type: "h" once it is running

For Data-Collect-Mode (-f)
  -f             spreadsheet output format [note: default -s300 -c288]
  optional
  -s <seconds>   between refreshing the screen [default 2]
  -c <number>    of refreshes [default millions]
  -t             spreadsheet includes top processes
  -x             capacity planning (15 min for 1 day = -fdt -s 900 -c 96)

For Interactive-Mode
  -s <seconds>   between refreshing the screen [default 2]
  -c <number>    of refreshes [default millions]
  -g <filename>  User decided Disk Groups
                 - file = on each line: group_name <hdisk_list> space separated
                 - like: rootvg hdisk0 hdisk1 hdisk2
                 - upto 32 groups hdisks can appear more than once
  -b             black and white [default is colour]
  -B             no boxes [default is show boxes]

example: topas_nmon -s 1 -c 100
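
The -s and -c values for a recording follow directly from the sampling interval and the total duration. As a quick illustration, the sketch below derives them; the helper name is hypothetical, and only the -f, -s and -c flags shown in the help text above are used.

```python
def recording_flags(interval_minutes, duration_hours):
    """Build the -s/-c part of a data-collect command line.

    -s is the number of seconds between snapshots and -c the number of
    snapshots, so seconds * count equals the recording duration.
    """
    seconds = interval_minutes * 60
    total_seconds = duration_hours * 3600
    if seconds <= 0 or total_seconds % seconds != 0:
        raise ValueError("duration must be a whole number of intervals")
    return f"-f -s {seconds} -c {total_seconds // seconds}"


# 15-minute samples for one day, the case quoted for -x in the help text.
print("topas_nmon", recording_flags(15, 24))
# 5-minute samples for one day, matching the documented -f default.
print("topas_nmon", recording_flags(5, 24))
```

Both results line up with the figures in the help output: 900 seconds times 96 snapshots and 300 seconds times 288 snapshots each cover 24 hours.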

Page 38: Aix Performance Updates 010609

Initial Screen of nmon

• Shows resources

Page 39: Aix Performance Updates 010609

Help Screen in nmon

Page 40: Aix Performance Updates 010609

Top process Panel in nmon

• Enter “t” to see top processes

Page 41: Aix Performance Updates 010609

IBM Advanced Technical Support

© 2008 IBM Corporation

CPU utilization Panel in nmon

� Enter 'c' to toggle on CPU utilization panel

Page 42: Aix Performance Updates 010609

IBM Advanced Technical Support

© 2008 IBM Corporation

Disk Utilization Panel in nmon

� Enter 'd' to turn on Disk utilization panel

Page 43: Aix Performance Updates 010609

IBM Advanced Technical Support

© 2008 IBM Corporation

Partition Details Panel in nmon

� Enter 'p' to turn on partition details panel

Page 44: Aix Performance Updates 010609

IBM Advanced Technical Support

© 2008 IBM Corporation

Multiple Panels in one screen

Page 45: Aix Performance Updates 010609

IBM Advanced Technical Support

© 2008 IBM Corporation

Recording using topas / nmon

■ The following options are available for recording in nmon

Recording using topas / nmon

■ A new command, topasrec, is introduced to do local / CEC binary topas recordings

■ The naming conventions for the generated recordings are as follows:

– Nmon Style Recording (Custom recording)

• hostname_yymmdd_hhmm.nmon

– Nmon Style Recording (Persistent recording)

• hostname_yymmdd.nmon

– Binary Style Recording (Custom recording)

• hostname_yymmdd_hhmm.topas

– Binary Style Recording (Persistent recording)

• hostname_yymmdd.topas

– CEC Recording (Custom recording)

• hostname_cec_yymmdd_hhmm.topas

– CEC Recording (Persistent recording)

• hostname_cec_yymmdd.topas
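The naming conventions above can be sketched as a small helper. This is only an illustration of the patterns listed on the slide; `recording_name` and its parameters are hypothetical names, not part of topas, topasrec, or nmon.

```python
from datetime import datetime

def recording_name(hostname, when, kind="nmon", persistent=False):
    """Build a recording file name following the patterns listed on the slide.

    kind: "nmon" (nmon style), "topas" (binary local) or "cec" (CEC recording).
    """
    if kind not in ("nmon", "topas", "cec"):
        raise ValueError("unknown recording kind: %r" % kind)
    # Persistent recordings carry only the date; custom ones add the time.
    stamp = when.strftime("%y%m%d") if persistent else when.strftime("%y%m%d_%H%M")
    prefix = hostname + "_cec" if kind == "cec" else hostname
    suffix = "nmon" if kind == "nmon" else "topas"
    return "%s_%s.%s" % (prefix, stamp, suffix)

started = datetime(2008, 7, 20, 19, 21)
print(recording_name("ec03", started))
print(recording_name("ec03", started, kind="cec", persistent=True))
```

A helper like this is mainly useful in the other direction, for example when a script needs to locate the recording for a given host and day.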

Recording using topas / nmon (Contd.)

■ New SMIT panels are introduced to operate on topas recordings. Options are provided:

– To start / stop persistent recording (24x7)

– To start / stop WLE data collection

– To choose type of recording:

• Binary / Nmon Style Local recording

• CEC recording

– List Available / Completed recordings

– Generate reports on the completed recordings

VIOS Monitoring using topas

■ Run topas -C and press 'v' to show the VIOS Monitoring Panel

■ All systems must be at AIX TL09 or higher to be monitored

VIOS Monitoring using topas

■ From the topas VIOS panel, move the cursor to a particular VIOS server and press 'd' to get the detailed monitoring for that server

Topas Adapter / MPIO panel

■ From the topas Disk Panel, press 'd' to toggle the Adapter Panel on/off and press 'm' to toggle the Path panel on/off.

svmon Report Enhancements (5.3 TL09)

■ Reports

– A new option, -O, is added to change the content and presentation of the reports that the svmon command generates.

– Filtering and sorting options

– To override the default values defined for the -O option, a user can define a .svmonrc configuration file in the directory where the svmon command is launched.

– An -X option is added to generate reports in XML format

■ RBAC Enablement (AIX 6.1 TL02 only) / Non-root user access

■ Memory Affinity information

svmon Report Options (-O values)

■ The following values can be passed to the -O option:

– activeusers=[on | off], affinity=[on | detail | off], commandline=[on | off],

– filename=[on | off], filtercat=[off exclusive kernel shared unused unattached],

– filterpgsz=[off s m L S], filterprop=[off notempty data text],

– filtertype=[off working persistent client], format=[80 | 160 | nolimit], frame=[on | off],

– mapping=[on | off], mpss=[on | off], overwrite=[on | off], pgsz=[on | off],

– pidlist=[on | number | off], process=[on | off], range=[on | off],

– segment=[on | category | off],

– shmid=[on | off], sortentity=[inuse | virtual | ....] (depending on the selected summary),

– sortseg=[inuse | pin | pgsp | virtual], subclass=[on | off], summary=[basic | longreal],

– svmonalloc=[on | off], timestamp=[on | off], unit=[auto | page | KB | MB | GB]
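A minimal sketch of assembling a -O argument from the values listed above. It assumes the values are passed as a comma-separated list of key=value pairs; check the svmon man page on your level before relying on that. The helper `svmon_report_options` and the `ALLOWED` table are hypothetical, and only a few of the keys from the slide are included.

```python
# Allowed values copied from the slide for a handful of -O keys (not the full list).
ALLOWED = {
    "unit": {"auto", "page", "KB", "MB", "GB"},
    "timestamp": {"on", "off"},
    "summary": {"basic", "longreal"},
    "sortseg": {"inuse", "pin", "pgsp", "virtual"},
}

def svmon_report_options(**options):
    """Validate key=value pairs and join them (assumed comma-separated syntax)."""
    parts = []
    for key, value in options.items():
        if key not in ALLOWED:
            raise ValueError("key not covered by this sketch: %s" % key)
        if value not in ALLOWED[key]:
            raise ValueError("invalid value %r for %s" % (value, key))
        parts.append("%s=%s" % (key, value))
    return ",".join(parts)

# Print the command line rather than running it, so this works off-AIX too.
print("svmon -G -O " + svmon_report_options(unit="MB", timestamp="on"))
```

Validating against the slide's value lists catches typos such as unit=mb before the command is ever run.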

svmon Report Examples (-O option)

Unused work type segments

POWER6 p575 & p595

■ Tools adjusted to use the Scaled Processor Utilization Resource Register (SPURR)

– Measure of processor time dynamically scaled based on throttling or frequency slewing

• Caused by Thermal Power Management savings mode

• Throttling – delays instruction processing by injecting dead cycles

• Slewing – clock is able to dynamically adjust to other frequencies

– CPU tools updated to show processor rate (%npe)

• 100%: no slewing or throttling

• <100%: percentage of nominal performance

– Adds another layer of complexity to determine utilization

■ Dedicated Processor Folding

– Workloads consolidated onto as few processors as possible, equivalent to Virtual Processor Folding in shared environments

– mpstat -s is probably the only tool that can accurately detect this.

■ Memory Throttling

– Larger DIMMs will be throttled; no tools can see this

– Implemented in POWER6 p575 and p595 platforms

– Not expected to be a major issue, but lack of measurement capability is a concern

AIX 6.1

■ AIX 6.1 TL01

– Workload Partitions Support

• ps, ipcs, netstat, proc*, trace, vmstat, topas, tprof, filemon, netpmon, pprof, curt

• Separate presentations available to cover WPAR specifics

– Restricted Tunables

– IO pacing

– AIO

– CIO

– NFS biod

– JFS2 nolog

– Multiple Page Size Segments - svmon

– iostat/topas - Filesystem and Workload Partition breakdowns (AIX 6)

■ AIX 6.1 TL02

– topas Memory Pool and Shared Ethernet monitoring

– filemon Reports

– mpstat/sar WPAR support

– tprof Large Page and Data profiling

Performance Tunables

■ Tunables now in two categories

■ Restricted Tunables

– Should not be changed unless recommended by AIX development or development support

– Are not shown by tuning commands unless the -F flag is used

– Dynamic change will show a warning message

– Permanent change must be confirmed

– Permanent changes will cause an error log entry at boot time

■ Non-Restricted Tunables

– Can have restricted tunables as dependencies

Changing restricted tunables

■ Changing a restricted tunable dynamically

> ioo -o aio_sample_rate=6
Warning: a restricted tunable has been modified

A dynamic change of a restricted tunable will inform the user.

■ Changing a restricted tunable permanently

> ioo -po aio_sample_rate=6
Modification to restricted tunable aio_sample_rate, confirmation yes/no

A permanent change of a restricted tunable requires a confirmation from the user.

Note: The system will log changes to restricted tunables in the system error log at boot time.

List restricted tunables

> ioo -aF

aio_active = 0

aio_maxreqs = 65536

...

posix_aio_minservers = 3

posix_aio_server_inactivity = 300

##Restricted tunables

aio_fastpath = 1

aio_fsfastpath = 1

aio_kprocprio = 39

aio_multitidsusp = 1

aio_sample_rate = 5

aio_samples_per_cycle = 6

j2_maxUsableMaxTransfer = 512

j2_nBufferPerPagerDevice = 512

j2_nonFatalCrashesSystem = 0

j2_syncModifiedMapped = 1

j2_syncdLogSyncInterval = 1
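For scripts that audit tuning, the listing above can be split at the "##Restricted tunables" marker. The following is a minimal sketch that assumes the output looks like the excerpt on this slide ("name = value" lines); `split_tunables` is a hypothetical helper, and the sample text is abbreviated from the slide rather than captured from a live system.

```python
def split_tunables(listing):
    """Split `ioo -aF`-style text into (unrestricted, restricted) dictionaries."""
    unrestricted, restricted = {}, {}
    current = unrestricted
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("##Restricted tunables"):
            current = restricted  # everything after the marker is restricted
            continue
        if "=" not in line:
            continue  # skip blank lines and elisions such as "..."
        name, value = (part.strip() for part in line.split("=", 1))
        current[name] = value
    return unrestricted, restricted

SAMPLE = """\
aio_active = 0
aio_maxreqs = 65536
...
posix_aio_minservers = 3
##Restricted tunables
aio_fastpath = 1
aio_sample_rate = 5
"""

normal, restricted = split_tunables(SAMPLE)
print("restricted:", sorted(restricted))
```

Comparing the restricted dictionary against known defaults is one way to spot legacy tuning carried over by an upgrade install.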

TUNE_RESTRICTED Error Log Entry

LABEL: TUNE_RESTRICTED
IDENTIFIER: D221BD55

Date/Time: Thu May 24 15:05:48 2007
Sequence Number: 637
Machine Id: 000AB14D4C00
Node Id: quake
Class: O
Type: INFO
WPAR: Global
Resource Name: perftune

Description
RESTRICTED TUNABLES MODIFIED AT REBOOT

Probable Causes
SYSTEM TUNING

User Causes
TUNABLE PARAMETER OF TYPE RESTRICTED HAS BEEN MODIFIED

Recommended Actions
REVIEW TUNABLE LISTS IN DETAILED DATA

Detail Data
LIST OF TUNABLE COMMANDS CONTROLLING MODIFIED RESTRICTED TUNABLES AT REBOOT, SEE FILE /etc/tunables/lastboot.log

Why, you ask?

■ The number of tunables in AIX had grown to a ridiculously large number

– 5.3 TL06: vmo 61, ioo 27, schedo 42, no 135, plus a few others

– 6.1: vmo 29, ioo 21, schedo 15, no 133, plus a few others

■ The potential combinations are too numerous to test and document effectively

■ Many of the tunables had been created to deal with very specific customers or situations which don't apply often

■ This wasn't done in a vacuum; a survey of support and recent situations was used to identify the commonly used tunables (which remain unrestricted)

■ If a restricted tunable must be changed, a PMR should be opened to identify the issue

Implementation Considerations

■ Best Practices

– Do not apply legacy tuning since some tunables may now be restricted

– If you do an upgrade install, your old tunings will be preserved

• You may wish to undo them, but we won't make you

– This level of tuning has been applied to numerous AIX 5.3 customers through field support

• We are confident this was a good thing

• However, we try to never change defaults in the service stream, so AIX 5.3 remains as it was

– Change restricted tunables only if recommended by AIX support

Implementation Considerations (Cont'd)

■ Problem Determination

– Common problems - seen in field or lab

• Legacy VMM tuning results in error log entries (TUNE_RESTRICTED)

• Tuning scripts fail due to required confirmation for permanent changes of restricted tunables

• Install/tuning scripts fail due to missing aio0 device

– Diagnostics

• Check AIX errpt for TUNE_RESTRICTED

• Check /etc/tunables/lastboot.log

• PERFPMR

VMM File IO Pacing Enabled By Default

■ IO Pacing Enabled By Default

– Prevents system responsiveness issues due to large quantities of writes

– Limits the maximum number of pages of I/O outstanding to a file

• Without I/O pacing, a program can fill up large amounts of memory with written pages. Those "queued" I/Os can result in long waits for other programs using the storage

• Better solution than the file system write-behind techniques

– New defaults

• Not very aggressive, intended to limit one or a few programs from impacting system responsiveness. Values high enough not to impact sequential write performance

• maxpout = 8193

• minpout = 4096
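The high/low water-mark behaviour behind maxpout and minpout can be illustrated with a toy model: a writer is held off once the pending-page count for a file reaches maxpout and is released only after completions bring it down to minpout. This is a simplified sketch of the idea, not the VMM implementation; the class `PacingState` and its methods are hypothetical.

```python
MAXPOUT = 8193  # high-water mark from the slide
MINPOUT = 4096  # low-water mark from the slide

class PacingState:
    """Toy model of per-file I/O pacing with high/low water marks."""

    def __init__(self, maxpout=MAXPOUT, minpout=MINPOUT):
        self.maxpout = maxpout
        self.minpout = minpout
        self.pending = 0      # pages queued but not yet written
        self.blocked = False  # True while the writer is being held off

    def queue_write(self, pages=1):
        """Try to queue pages; return False if the writer has to wait."""
        if self.blocked or self.pending >= self.maxpout:
            self.blocked = True
            return False
        self.pending += pages
        return True

    def complete(self, pages=1):
        """Record completed writes; release the writer at the low-water mark."""
        self.pending = max(0, self.pending - pages)
        if self.blocked and self.pending <= self.minpout:
            self.blocked = False

state = PacingState()
accepted = sum(state.queue_write() for _ in range(10000))
print("accepted before blocking:", accepted)
```

The gap between the two marks provides hysteresis: a blocked writer resumes only after a large batch of I/O has drained, rather than stalling and waking on every page.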

AIO Support

■ Interface Changes

– All the AIO entries in the ODM and AIO smit panels have been removed

– The aioo command will no longer be shipped

– All the AIO tunables have current, default, minimum and maximum values that can be viewed with ioo

■ AIO kernel extension loaded at system boot

– Applications no longer fail to run because you forgot to load the kernel extension (you may applaud here)

– No AIO servers are active until requests are present

– Extremely low impact on memory requirements with this implementation

Improvements to AIO CIO

■ AIO Fast Path for CIO enabled by default

– With the fast path, the AIO server threads no longer participate in the I/O path

– By removing the AIO servers from the path, we get three things

• The removal of AIO servers as any potential resource bottleneck

• The reduction in path length for AIO read/write services, as less dispatching is required

• Potentially better coalescing of sequential I/O requests initiated through AIO or LISTIO services

■ Fast Path enabled for LVs and PVs for a long time

– No change in behavior for environments such as Oracle 10G/ASM on raw hdisks

[Figure: I/O stack diagrams labeled "FS no Fast Path" and "CIO Fast Path"; layers shown: Application, AIO Server, File System, LVM, Device Driver]

General improvements to AIO

■ The number of AIO servers varies between minservers and maxservers (times #CPUs), based on workload

– AIO servers stay active as long as they service requests

– The number of AIO servers is dynamically increased/reduced based on the demand of the workload

– aio_server_inactivity defines how many seconds of idle time pass before an AIO server exits

– Do not confuse no active servers with kernel extension not loaded. The kernel extension is always loaded

■ Changes to AIO tunables are dynamic through ioo

– Changes do not require system reboot

– minservers is changed to a per CPU tunable

– maxservers is changed to 30

– maxreqs is changed to 65536

■ Benefit

– No longer necessary to tune the minservers/maxservers/maxreqs as in the past
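Because minservers and maxservers are per-CPU values, the system-wide range scales with the processor count. The sketch below shows that arithmetic under stated assumptions: maxservers=30 is taken from the slide, minservers=3 is taken from the posix_aio_minservers value in the ioo listing shown earlier, and both function names are hypothetical. It deliberately ignores the idle case, where no servers are active until requests are present.

```python
def aio_server_bounds(ncpus, minservers=3, maxservers=30):
    """System-wide (low, high) AIO server counts: per-CPU tunables times #CPUs."""
    return minservers * ncpus, maxservers * ncpus

def servers_for_demand(demand, ncpus, minservers=3, maxservers=30):
    """Clamp a requested server count into the system-wide range (simplified)."""
    low, high = aio_server_bounds(ncpus, minservers, maxservers)
    return max(low, min(demand, high))

low, high = aio_server_bounds(4)
print("4 CPUs: between", low, "and", high, "AIO servers under load")
```

On larger partitions the upper bound grows automatically with the CPU count, which is part of why hand-tuning these values is described as no longer necessary.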

CIO Read Mode Flag

■ Allows an application to open a file for CIO such that subsequent opens without CIO avoid demotion

– In the past, a second opening of a file without CIO would cause "demotion", which removes many of the benefits of CIO

– The second, read-only opening without CIO will still result in that opening having uncached reads to the file. Thus, such programs should ensure that the I/O sizes are large enough to achieve I/O efficiency

■ Example: a backup application can access database files in read-only mode while the database has the file opened in concurrent I/O mode

■ The open() flag is O_CIOR

■ procfiles does not currently reflect O_CIO/O_CIOR

– kdb 'u <slotnumber>', then for each file listed there 'file <filepointer>' gives some info

NFS Performance Improvements

■ RFC 1323 enabled by default

– Allows for TCP window scaling beyond 64K, so more one-way packets in flight are allowed between acks for large sequential transfers. We had the nfs_rfc1323 tunable before; it just wasn't enabled by default.

■ Increase default number of biod daemons

– 32 biod daemons per NFS V3 mount point

– Very slight increase in memory (<2MB) required over previous default of 4

– Enables more I/Os to be outstanding at the same time; doesn't speed sequential operations much, but helps random access (e.g. OLTP)

■ Default read/write size increased to 64k for TCP connections

– Was 32k previously
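To see why window scaling matters for large sequential transfers, note that a single TCP connection can have at most one window of data in flight per round trip. The sketch below works through that arithmetic; the function names are hypothetical and the 1 ms round-trip time is an example value, not a measurement from this presentation.

```python
def max_window_bytes(scale_shift):
    """Largest advertised window: the 16-bit field shifted left by 0..14 (RFC 1323)."""
    if not 0 <= scale_shift <= 14:
        raise ValueError("RFC 1323 limits the shift count to 0..14")
    return 65535 << scale_shift

def throughput_ceiling_bytes_per_s(window_bytes, rtt_ms):
    """Upper bound for one connection: one full window per round trip."""
    return window_bytes * 1000 / rtt_ms

unscaled = max_window_bytes(0)
print("unscaled window:", unscaled, "bytes")
print("ceiling at 1 ms RTT:", throughput_ceiling_bytes_per_s(unscaled, 1), "bytes/s")
print("largest scaled window:", max_window_bytes(14), "bytes")
```

The ceiling falls in direct proportion to the round-trip time, so the benefit of scaling grows on longer or busier network paths.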

NFS biod changes

■ Having more biods allows better read-ahead and write-behind

■ However, measured on a single-process basis, there is no huge performance difference over the AIX 5.3 defaults

■ Results should improve in tests with multiple processes/threads operating over NFS

■ NFS client tests, p5 520 on 1Gb Ethernet with 64kB I/Os (next slide)

NFS biod changes

[Chart: NFS single process throughput, over 256MB file. Series: 32biod vs 4biod. Categories: read seq server uncached, read seq server cached, read rand server uncached, write seq overwrite, write seq create, write rand create. Y axis labeled "MB/second", scale 0 to 120000.]

NFS biod change with Kerberos krb5p

■ The increase in biods has a much more positive impact when using Kerberos DES security

■ Overlapping more compute with network traffic through more biods greatly improves throughput

■ Same model as previous chart, krb5p (full packet encryption) mount option

[Chart: NFS biod changes with Kerberos. Series: 32biod vs 4biod. Categories: read seq server uncached, read seq server cached, read rand server uncached, write seq overwrite, write seq create, write rand create. Y axis labeled "MB/sec", scale 0 to 70000.]

Enhanced JFS2 "nolog" option

■ JFS2 standard metadata logging for filesystem integrity can be disabled via a mount option

– Similar to the legacy JFS "nointegrity" option

■ Meant to enable faster migration of data to new storage

– File system operation with heavy file create/delete activity can create log bottlenecks

– Potentially useful for temporary file systems where the filesystem can be easily recreated or fsck'ed

■ mount -o log=NULL during the data migration phase, then unmount and mount with standard logging

Enhanced JFS2 "nolog" option - example

■ 4-way POWER5 p550, PHP test "Wikibench"

■ Test makes heavy use of file meta-data

■ With single disk setup, bottleneck on disk writes to Enhanced JFS2 logs

■ With "nolog", the log bottleneck is avoided

[Chart: Disk utilization over time. X axis: time. Y axis: %disk busy, 0 to 100. Series: default log, nolog.]

[Chart: PHP Wikibench. Y axis: Throughput, 0 to 90. Bars: Default log, nolog.]

Multiple Page Size Segment (MPSS) Support (6.1 TL01)

■ POWER6 provides hardware support for mixing 4kB pages and 64kB pages in the same hardware segment

■ This allows the AIX operating system to transparently promote small pages to medium pages

– This typically improves performance by reducing stress on hardware translation mechanisms

– Controlled with the vmo vmm_default_pspa parameter (-1 turns off)

■ This behavior is enabled as a default on AIX 6.1 on POWER6 hardware

– Since it is not supported on POWER5, systems running identical application conditions on POWER5 and POWER6 may differ on exact memory page usage

– In general, no increase in memory consumption should be noticed; however, the usage of 64kB pages may increase on POWER6

– System paging activity may result in 64kB pages being broken into 4kB pages

– 64kB pages that are broken by paging won't usually be reconstituted into 64kB pages later

svmon Mixed Page Sizes (6.1 TL01)

■ AIX 6.1 will dynamically collapse sets of 4K pages into 64K pages

– creates mixed page size segments

■ Short reports update

Vsid    Esid  Type  Description          PSize  Inuse  Pin  Pgsp  Virtual
1c8a6      2  work  process private          s     81    3     0       81
2869       2  work  process private          s     81    3     0       81
12881      2  work  process private          s     81    3     0       81
14842      2  work  process private          s     81    3     0       81
e7cf       f  work  shared library data     sm     69    0     0       69

■ Long (-l) reports update

Vsid    Esid  Type  Description          PSize  Inuse  Pin  Pgsp  Virtual
1c8a6      2  work  process private          s     81    3     0       81
2869       2  work  process private          s     81    3     0       81
12881      2  work  process private          s     81    3     0       81
14842      2  work  process private          s     81    3     0       81
e7cf       f  work  shared library data      s      5    0     0        5
                                             m      4    0     0        4
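One way to reconcile the two reports for segment e7cf, assuming the short report expresses the mixed "sm" total in 4 KB units: each medium (64 KB) page counts as sixteen 4 KB pages. The helper below is a hypothetical illustration of that arithmetic, not svmon code.

```python
# 4 KB units per page size code: s = 4 KB page, m = 64 KB page
UNITS_4K = {"s": 1, "m": 16}

def inuse_in_4k_units(per_size_counts):
    """Convert per-page-size counts (long report) into a single 4 KB-unit total."""
    return sum(UNITS_4K[size] * count for size, count in per_size_counts.items())

# Long report for e7cf: 5 small pages and 4 medium pages
print(inuse_in_4k_units({"s": 5, "m": 4}))
```

The result matches the Inuse and Virtual value shown for e7cf on the "sm" line of the short report.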

svmon MPSS detail

svmon -D d3a7

Segid: d3a7
Type: working
PSize: sm (4 KB - 64 KB)
Address Range: 0..4095
Size of page space allocation: 3744 pages (14.6 MB)
Virtual: 4096 frames (16.0 MB)
Inuse: 582 frames (2.3 MB)

Page  Psize   Frame  Pin  ExtSegid  ExtPage
   0      m  442176    Y         -        -
   1      m  442177    Y         -        -
   2      m  442178    Y         -        -
 382      s  362140    N         -        -
 435      s  430534    N         -        -

iostat File System (6.1 TL01)

■ Available in AIX 6.1

– -f to specify system and hdisk utilization (below)

– -F to just display file system activity

System configuration: lcpu=2 drives=2 ent=0.50 paths=2 vdisks=2 fs=9

tty:  tin  tout  avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
      0.0  72.0              39.0    4.9    53.8       2.3    0.2    46.0

Disks:    % tm_act    Kbps    tps  Kb_read  Kb_wrtn
hdisk0        37.0  3897.0   70.0        0     3897
hdisk1        50.0  3897.0   70.0        0     3897

FS Name:  % tm_act    Kbps    tps  Kb_read  Kb_wrtn
/                -     3.7    2.0        3        0
/usr             -     0.0    0.0        0        0
/var             -     0.0    0.0        0        0
/tmp             -    43.8  968.0        0       43
/home            -     0.0    0.0        0        0
/admin           -     0.0    0.0        0        0
/proc            -     0.0    0.0        0        0
/opt             -     0.0    0.0        0        0

topas File System (6.1 TL01)

■ Available in AIX 6.1

– -f to specify number of monitored file systems on main screen at startup

– -F to start file system view on startup or (F) to toggle screen display

Topas Monitor for host: ec03    Interval: 2    Sun Jul 20 19:21:21 2008
================================================================================
FileSystem   KBPS    TPS   KB-R  KB-W  Open  Create  Lock
/tmp         47.0  967.0    0.0  47.0     0       0     0
/var         10.0    0.0  202.0   0.0     0       0     0
/usr          0.0    0.0    0.0   0.0     0       0     0
/             0.0    0.0    0.0   0.0     0       0     0
/home         0.0    0.0    0.0   0.0     0       0     0
/audit        0.0    0.0    0.0   0.0     0       0     0
/admin        0.0    0.0    0.0   0.0     0       0     0
/proc         0.0    0.0    0.0   0.0     0       0     0
/opt          0.0    0.0    0.0   0.0     0       0     0

topas Memory Pool (6.1 TL02)

■ From the topas CEC panel, press 'm' to view the Memory Pool Panel. With the focus on a memory pool, press 'f' to view the partition-level usage for the selected memory pool

topas Shared Ethernet Adapter (6.1 TL02)

■ Press E to display the Shared Ethernet Adapter (SEA) on a Virtual I/O Server.

mpstat/sar WPAR Support (6.1 TL02)

■ The mpstat / sar commands are enabled to show statistics when invoked within a WPAR

■ A -@ option is added to mpstat / sar to collect and display statistics of a specified WPAR from the Global environment

■ A new field, 'rset', is added to the Configuration line of the mpstat / sar report to indicate the type of rset that a particular WPAR is associated with.

■ A new row with cpuid "R" is added to the per-processor utilization report of mpstat / sar. The "R" row shows the RSET-level utilization.

■ Disk statistics are not available inside a WPAR, so sar will not report disk statistics there

mpstat / sar (Contd.)

■ To view statistics for the processors that belong to the rset associated with a specified WPAR, run mpstat -@ <wparname>

■ Invoking mpstat inside a WPAR to view statistics for all the processors in the system

■ Invoking mpstat inside a WPAR to view SMT threads

mpstat / sar (Contd.)

■ Invoking sar inside a WPAR to view RSET processor statistics. The red-circled CPU ID ('R') provides the RSET-level utilization

■ Invoking sar inside a WPAR to view all processor statistics. A red-circled CPU ID with a '*' prefix indicates that the CPU is associated with the RSET used by the WPAR

filemon Report Enhancements (6.1 TL02)

■ New filtering options are added to the -O option of filemon to generate new types of report

– lf[=num]: monitor logical file I/O and display first num records where num > 0

– vm[=num]: monitor virtual memory I/O and display first num records where num > 0

– lv[=num]: monitor logical volume I/O and display first num records where num > 0

– pv[=num]: monitor physical volume I/O and display first num records where num > 0

– pr[=num]: display data process-wise and display first num records where num > 0

– th[=num]: display data thread-wise and display first num records where num > 0

– all[=num]: short for lf,vm,lv,pv,pr,th and display first num records where num > 0

– detailed: display detailed information other than summary report

– abbreviated: Abbreviated mode (transactions)

– collated: Collated mode (transactions)

■ New options added to make filemon run in automated offline mode

– A: Enable Automated Offline Mode

– x: Provide the user command to execute; use double quotes if you provide arguments to the command

– r: Root String for trace and gennames filenames

filemon – Abbreviated Report

# filemon -r trace -O abbreviated

filemon – Collated Report

# filemon -r trace -O collated

tprof Large Page Analysis (6.1 TL02)

■ A new option, 'a', is introduced to enable large page analysis. tprof -a collects a profile trace from a representative application run and produces performance projections for mapping different portions of the application's data space to different page sizes.

■ Large Page Analysis uses the information in the trace to project translation buffer performance when mapping any of the following four application memory regions to a different page size:

– static application data (initialized and uninitialized data)

– application heap (dynamically allocated data)

– stack

– application text

■ Performance projections are provided for each of the page sizes supported by the operating system. The first performance projection is a baseline projection for mapping all four memory regions to the default 4KB pages.

tprof Large Page Analysis (Contd.,)

Memory Reference and Allocation counts

Memory References, Allocations summary by process

Memory References by Modeled regions

Performance Projections of Memory Translation Misses by modeled regions for various page sizes

tprof Data Profiling (6.1 TL02)

■ New options 'b' and 'B' are introduced to enable basic data profiling in tprof. Basic data profiling reports data access information.

■ The summary section reports access information across kernel data, library data, user global data and stack heap sections for each process.

■ When used with -s, -u, -k and -e, tprof data profiling reports the most used data structures (exported data symbols) in shared libraries, binaries, the kernel and kernel extensions. The -B flag enables the reporting of function names that accessed the data structures

tprof Data Profiling (Contd.)

Summary section which reports the % of data access by each process

Summary section which reports the % of data access for each data region in the process

Detail by Data Structure Name and the subroutines that accessed those data structures

Kernel Data Structures Profiling

Shared Library Data Structures Profiling

Trademarks

The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.

The following are trademarks or registered trademarks of other companies.

* All other products may be trademarks or registered trademarks of their respective companies.

Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:

*, AS/400®, e business(logo)®, DBE, ESCO, eServer, FICON, IBM®, IBM (logo)®, iSeries®, MVS, OS/390®, pSeries®, RS/6000®, S/30, VM/ESA®, VSE/ESA, WebSphere®, xSeries®, z/OS®, zSeries®, z/VM®, System i, System i5, System p, System p5, System x, System z, System z9®, BladeCenter®

Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market.

Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.