![Page 1: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/1.jpg)
VMware Performance Troubleshooting
Presented by Chris Kranz
![Page 2: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/2.jpg)
Topics Covered
• Introduction
• Root Cause Analysis
• Performance Characteristics
• CPU
• Networking
• Memory
• Disk
• Virtual Machine optimisation
• ESXTop
• vm-support
• Service Console
• Resource Groups
• Design Guidelines
• Capacity Planner limitations and cautions
• Conclusion
• Reference Articles
![Page 3: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/3.jpg)
Introduction
Multiple layers of virtualisation are used to increase service levels, availability and manageability
However, multiple layers of virtualisation often mask performance and configuration issues, making them more of a challenge to troubleshoot and correct
The worst outcome is that performance issues after a virtualisation project create the perception that VMware reduces performance, which can damage future confidence in VMware
![Page 4: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/4.jpg)
Performance Basics
• Virtual Machine Resources
  – CPU
  – Memory
  – Disk
  – Networking
![Page 5: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/5.jpg)
Resource Maximums

| Resource | Host | Guest |
| --- | --- | --- |
| Logical Processors | 64 | N/A |
| Virtual CPUs | N/A | 8 |
| Virtual CPUs per Core | 20 | N/A |
| Memory | 1TB | 256GB |

http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
![Page 6: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/6.jpg)
Typical Host
vSphere 1U Host
CPUs: 2 x Quad Core
Memory: 32-64GB RAM
Typically 3 VMs per core, giving 24 VMs per host
Each VM has 2GB of RAM = 48GB of RAM
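The sizing arithmetic on this slide can be sketched as a small helper. This is only an illustration of the rule of thumb above; the function name and parameters are hypothetical, not part of any VMware tool.

```python
def host_sizing(sockets: int, cores_per_socket: int,
                vms_per_core: int, ram_per_vm_gb: int) -> tuple[int, int]:
    """Return (VM count, total guest RAM in GB) for a single host."""
    cores = sockets * cores_per_socket
    vms = cores * vms_per_core
    return vms, vms * ram_per_vm_gb


# The slide's example: 2 x quad core, 3 VMs per core, 2GB per VM.
vms, ram_gb = host_sizing(sockets=2, cores_per_socket=4,
                          vms_per_core=3, ram_per_vm_gb=2)
print(vms, ram_gb)  # 24 48
```

The 48GB result sits inside the 32-64GB range quoted for the typical host, which is why memory is often the first resource to run short.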
![Page 7: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/7.jpg)
Root Cause Analysis
http://www.vmware.com/resources/techresources/10066
![Page 8: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/8.jpg)
Root Cause ...
![Page 9: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/9.jpg)
Monitoring Performance
• Do not rely on guest tools, but they:
  – Can show high CPU and memory utilisation
  – Can measure latency and throughput of disk and network interfaces
• Use the virtualisation layer to diagnose the cause:
  – The guest is unaware of the virtualisation workload
  – Guest operating systems account for time differently
  – The guest has no visibility of available resources
![Page 10: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/10.jpg)
Performance Analysis Tools
• esxtop (service console only)
• resxtop (remote command line utilities)
• Performance graphs in vCenter
![Page 11: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/11.jpg)
esxtop
• esxtop can be run:
  – Interactively
  – In batch mode (e.g. esxtop -a -b > analysis.csv)
  – Load the batch output into Windows perfmon or MS Excel
• Two keys to remember
  – H : help
  – F : fields to display
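As an alternative to perfmon or Excel, batch output can also be summarised with a short script. The sketch below assumes the CSV uses perfmon-style column headers of the form `\\host\Group Cpu(id:name)\% Ready`; the sample data, host name and VM name are made up for illustration, so check the headers in your own file before relying on it.

```python
import csv
import io


def column_averages(csv_text: str, counter_suffix: str) -> dict[str, float]:
    """Average every column whose header ends with counter_suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    sums = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        rows += 1
        for i in wanted:
            sums[i] += float(row[i])
    return {header[i]: sums[i] / rows for i in wanted} if rows else {}


# Hypothetical two-sample extract; a real "esxtop -a -b" file is far wider.
SAMPLE = (
    '"(PDH-CSV 4.0) (UTC)(0)",'
    '"\\\\esx01\\Group Cpu(1:web01)\\% Ready",'
    '"\\\\esx01\\Group Cpu(1:web01)\\% Used"\n'
    '"01/01/2010 10:00:00","4.0","50.0"\n'
    '"01/01/2010 10:00:05","6.0","70.0"\n'
)

for name, value in column_averages(SAMPLE, "% Ready").items():
    print(name, value)
```

Matching on the end of the header keeps the helper independent of host and VM names, at the cost of averaging every world that exposes the counter.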
![Page 12: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/12.jpg)
esxtop basics
Number of Worlds
Name of Resource Pool, Virtual Machine or World
Host Resources
![Page 13: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/13.jpg)
Performance Characteristics

| CPU | Networking | Memory | Disk |
| --- | --- | --- | --- |
| Slow Processing | Packet Loss | Slow Processing | Log Stalls |
| High CPU Wait | Slow Network | Disk Swapping | Disk Queue |

Resulting symptoms:
• Slow Application Performance
• Reduced User Experience
• Data Loss and Corruption
![Page 14: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/14.jpg)
CPU: ESX Scheduler
Service Console
Virtual Machine
Limits / Shares / Reservations
Basic World States: Ready / Run / Wait
CPU States: Ready / Usage / Wait
![Page 15: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/15.jpg)
CPU: esxtop
• PCPU(%): CPU utilisation
• %USED: Utilisation
• %RDY: Ready time
• %RUN: Run time
• %WAIT: Wait and idling time
High %RDY + high %USED can imply over-commitment
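The rule of thumb above can be expressed as a simple check. The thresholds used here (80% used, 10% ready) are illustrative assumptions, not values from the presentation or from VMware; tune them to your own baseline.

```python
def cpu_flag(pct_used: float, pct_rdy: float,
             used_threshold: float = 80.0, rdy_threshold: float = 10.0) -> str:
    """Classify one esxtop CPU sample using assumed thresholds."""
    if pct_rdy >= rdy_threshold and pct_used >= used_threshold:
        return "possible over-commitment"
    if pct_rdy >= rdy_threshold:
        # High ready time without high usage: look for other causes,
        # such as a CPU limit (see %MLMTD).
        return "high ready time, check limits"
    return "ok"


print(cpu_flag(pct_used=92.0, pct_rdy=15.0))
print(cpu_flag(pct_used=30.0, pct_rdy=15.0))
print(cpu_flag(pct_used=30.0, pct_rdy=2.0))
```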
![Page 16: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/16.jpg)
CPU: VI Client
Used Time > Ready Time: Possible CPU over-commitment
Used Time
Ready Time
![Page 17: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/17.jpg)
CPU: Further Investigation
%MLMTD shows this VM has been limited
![Page 18: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/18.jpg)
CPU: Further Investigation
High ready time caused by CPU resource limit
![Page 19: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/19.jpg)
VMware Memory Management
• Transparent Page Sharing
• VMware Tools Balloon Driver to force the VM to swap to disk
• Virtual Machine Page File
![Page 20: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/20.jpg)
Memory: Ballooning vs. Swapping
The balloon driver causes the guest OS to swap pages that it chooses to disk.
ESX swapping will swap any pages to disk.
![Page 21: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/21.jpg)
Memory
• Ballooning can be disabled (value of 0) or controlled on a per-virtual-machine basis using: sched.mem.maxmemctl
• The default is 65%, which can be controlled at host level.
• It is only an issue in resource contention scenarios (or for latency-sensitive VMs, e.g. Citrix).
![Page 22: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/22.jpg)
Memory - Host
VI Client shows memory usage of the host. This is calculated as “consumed + overhead memory + Service Console”.
Performance charts are a very good way of showing the Virtual Machine memory breakdown.
• Consumed Memory
• Ballooned Memory
• Shared Memory
• Swapped Memory
![Page 23: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/23.jpg)
Memory - Guest
Host Memory = Consumed + Overhead Memory
Guest Memory = Active Memory for Guest OS
![Page 24: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/24.jpg)
Memory – Guest Overhead
![Page 25: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/25.jpg)
Memory
Virtual Machine Memory Metrics – VI Client

| Metric | Description |
| --- | --- |
| Memory Active (KB) | Physical pages touched recently by a VM |
| Memory Usage (%) | Active memory / configured memory |
| Memory Consumed (KB) | Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn’t include overhead memory. |
| Memory Granted (KB) | Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn’t include overhead memory. |
| Memory Shared (KB) | Physical pages shared with other virtual machines |
| Memory Balloon (KB) | Physical memory ballooned from a virtual machine |
| Memory Swapped (KB) | Physical memory in swap file (approx. “swap out – swap in”). Swap out and swap in are cumulative. |
| Overhead Memory (KB) | Machine pages used for virtualisation |

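Two of the metrics above are simple derived values, which can be sketched as follows. The function names and the sample figures are hypothetical; only the formulas (active / configured, and swap out minus swap in) come from the table.

```python
def memory_usage_pct(active_kb: int, configured_kb: int) -> float:
    """Memory Usage (%) = active memory / configured memory."""
    return 100.0 * active_kb / configured_kb


def approx_swapped_kb(swap_out_kb: int, swap_in_kb: int) -> int:
    """Approximate Memory Swapped from the cumulative swap counters."""
    return swap_out_kb - swap_in_kb


# A 2GB VM (2097152 KB) with 512MB (524288 KB) recently touched.
print(memory_usage_pct(active_kb=524288, configured_kb=2097152))  # 25.0
print(approx_swapped_kb(swap_out_kb=40960, swap_in_kb=10240))     # 30720
```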
![Page 26: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/26.jpg)
Memory
Host Memory Metrics – VI Client

| Metric | Description |
| --- | --- |
| Memory Active (KB) | Physical pages touched recently by the host |
| Memory Usage (%) | Active memory / configured memory |
| Memory Consumed (KB) | Total host physical memory – free memory on host. Includes Overhead and Service Console memory. |
| Memory Granted (KB) | Sum of physical pages allocated to all virtual machines. Doesn’t include overhead memory. |
| Memory Shared (KB) | Physical pages shared by virtual machines on host |
| Shared Common (KB) | Total machine pages used by shared pages |
| Memory Balloon (KB) | Machine pages ballooned from virtual machines |
| Memory Swap Used (KB) | Physical memory in swap file (approx. “swap out – swap in”). Swap out and swap in are cumulative. |
| Overhead Memory (KB) | Machine pages used for virtualisation |

![Page 27: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/27.jpg)
Memory: esxtop
• PMEM: Total physical memory breakdown
• VMKMEM: Memory managed by vmkernel
• COSMEM: Service Console memory breakdown
• PSHARE: Page sharing statistics
• SWAP: Swap statistics
• MEMCTL: Balloon driver data
![Page 28: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/28.jpg)
Memory
esxtop / VI Client metrics: Virtual Machines

| VI Client | esxtop |
| --- | --- |
| Active Memory | TCHD |
| Memory Usage | %ACTV |
| Consumed Memory | N/A |
| Memory Granted | N/A (SZTGT and CMTTGT represent memory scheduler targets) |
| Memory Shared | SHRD (+SHRDSVD per VM). Must enable COW stats in esxtop |
| Memory Balloon | MCTLSZ |
| Memory Swapped | SWCUR (SWR/s & SWW/s are rates) |
| Overhead Memory | OVHD & OVHDMAX |

![Page 29: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/29.jpg)
Memory
esxtop / VI Client metrics: Host Usage

| VI Client | esxtop |
| --- | --- |
| Memory Active | N/A (try /proc/vmware/sched/mem-verbose) |
| Memory Usage | N/A (try /proc/vmware/sched/mem-verbose) |
| Memory Consumed | PMEM total – PMEM free |
| Memory Granted | N/A (SZTGT and CMTTGT represent memory scheduler targets) |
| Memory Shared | PSHARE (shared) |
| Memory Shared Common | PSHARE (common) |
| Memory Balloon | MEMCTL |
| Memory Swap Used | SWAP (r/s and w/s are rates) |
| Overhead Memory | OVHD & OVHDMAX |

![Page 30: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/30.jpg)
Memory: VI Client memory usage graph
![Page 31: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/31.jpg)
Memory: Troubleshooting memory usage issues
![Page 32: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/32.jpg)
Networking
Network configuration is more likely to blame than resource contention
• Switch Assisted Teaming (IP Hash)
• VLAN Trunking
• Flow Control (full)
• Speed & Duplex (1000Mb / Full)
• Port Fast
• BPDU Disabled
• STP Disabled
• Link State Tracking
• Jumbo Frames
![Page 33: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/33.jpg)
Networking: esxtop
Transmit and Receive in Mb/s
Transmit and Receive in Packets
![Page 34: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/34.jpg)
Networking: esxtop
Dropped Packets Received
Dropped Packets Transmitted
![Page 35: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/35.jpg)
Disk
Varying Factors
• File system performance
• Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
• Disk caching
• Disk formats (thick, sparse, thin)
ESX Storage Stack
• Different latencies for different disks
• Queuing within the kernel
K: Kernel
D: Device
G: Guest
![Page 36: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/36.jpg)
Disk: VI Client statistics
Quite Coarse Statistics
• Disk read / write rate (KB/s)
• Disk usage: sum of read BW and write BW (KB/s)
• Disk read / write requests (per 20s interval)
• Bus resets / Command aborts (per 20s interval)
• Per LUN or aggregated stats
![Page 37: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/37.jpg)
Disk: esxtop statistics
Aggregated stats similar to VI Client
• Disk read / write per sec (READS/s, WRITES/s)
• MB read / write per sec (MBREAD/s, MBWRTN/s)
Latency Statistics
• Kernel Average / command (KAVG/cmd)
• Device Average / command (DAVG/cmd)
• Guest Average / command (GAVG/cmd)
Queuing Information
• Adapter Queue Length (AQLEN)
• LUN Queue Length (LQLEN)
• VMKernel (QUED)
• Active Queue (ACTV)
• %Used (%USD = ACTV/LQLEN)
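The %USD relationship listed above can be sketched in a few lines. The helper name and the sample values (a LUN queue length of 32 with 24 active commands) are assumptions for illustration only.

```python
def queue_used_pct(actv: int, lqlen: int) -> float:
    """%USD = ACTV / LQLEN, expressed as a percentage."""
    return 100.0 * actv / lqlen


# Hypothetical sample: 24 active commands against a LUN queue length of 32.
print(queue_used_pct(actv=24, lqlen=32))  # 75.0
```

A value that stays near 100% while QUED is above zero suggests commands are waiting in the VMkernel because the LUN queue is full.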
![Page 38: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/38.jpg)
Disk: SAN Rough Estimates
Purely looking at a single ESX host, roughly:
Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec
FC, rough maximums:
Effective Link Bandwidth = ~80-90% of Real Bandwidth
• Effective (2Gbps) = 200 – 230 MBps
• Effective (4Gbps) = 410 – 460 MBps
• Effective (8Gbps) = 820 – 920 MBps
iSCSI / NFS / FCoE, rough maximums:
Effective Link Bandwidth = ~70-80% of Real Bandwidth
• Effective (1GigE) = 90 – 100 MBps
• Effective (10GigE) = 900 – 1000 MBps
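The single-host throughput estimate can be sketched as below. The formula is the one on the slide; the function names, the sample workload (32 outstanding IOs of 32 KB at 10 msec) and the choice to cap the result at the link's effective bandwidth are illustrative assumptions.

```python
def throughput_mbps(outstanding_ios: int, block_kb: float, latency_ms: float) -> float:
    """Throughput (MBps) = (outstanding IOs * block size in KB) / latency in msec."""
    return outstanding_ios * block_kb / latency_ms


def capped_throughput_mbps(estimate_mbps: float, effective_link_mbps: float) -> float:
    """An estimate can never exceed what the link can effectively carry."""
    return min(estimate_mbps, effective_link_mbps)


estimate = throughput_mbps(outstanding_ios=32, block_kb=32, latency_ms=10)
print(estimate)                                # 102.4
print(capped_throughput_mbps(estimate, 90.0))  # 90.0, low end of the 1GigE range
```

The units work because 1 KB per msec is roughly 1 MB per second, which is the approximation the slide relies on.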
![Page 39: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/39.jpg)
Disk: Desired Latency Calculations
Desired latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host
Example:
• Number of hosts: 16
• Effective link bandwidth: 90 MBps
• Throughput per host: 90 / 16 = 5.6 MBps
• Desired latency: (32 * 32) / 5.6 = 182.86 msec

| Workload | Cached Sequential Read | Cached Sequential Write |
| --- | --- | --- |
| Desired Latency (msec) | 182.86 | 182.86 |
| Observed Latency (msec) | ~350 | ~180 |
| Throughput Drop? | Yes | No |
| Throughput (MBps) | ~45 | ~90 |

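The worked example can be reproduced with a short sketch. The function names are hypothetical, and the reading of the example as 32 outstanding IOs with a 32 KB block size is an assumption based on the (32 * 32) term. The slide rounds throughput per host to 5.6 MBps before dividing, which gives 182.86 msec; without that rounding the result is about 182.04 msec.

```python
def desired_latency_ms(outstanding_ios: int, block_kb: float,
                       link_mbps: float, hosts: int) -> float:
    """Latency each host must stay under to fill its share of the link."""
    per_host_mbps = link_mbps / hosts
    return outstanding_ios * block_kb / per_host_mbps


def aggregate_throughput_mbps(outstanding_ios: int, block_kb: float,
                              latency_ms: float, hosts: int,
                              link_mbps: float) -> float:
    """Total throughput across all hosts, capped at the effective link bandwidth."""
    return min(hosts * outstanding_ios * block_kb / latency_ms, link_mbps)


print(round(desired_latency_ms(32, 32, 90, 16), 2))
print(round(aggregate_throughput_mbps(32, 32, 350, 16, 90), 1))
print(round(aggregate_throughput_mbps(32, 32, 180, 16, 90), 1))
```

With the observed latencies from the table, the read case lands in the mid-40s MBps and the write case reaches the 90 MBps link ceiling, in line with the ~45 and ~90 figures shown.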
![Page 40: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/40.jpg)
Disk: VI Client
SAN cache disabled: poor throughput
SAN cache enabled: high throughput
![Page 41: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/41.jpg)
Disk: esxtop
Latency is quite high
After enabling cache, latency is reduced
![Page 42: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/42.jpg)
Virtual Machine Optimisation
Deploy all machines from an optimised template!
• VMware Tools MUST be installed
• The disks MUST be block aligned to the storage (even when using NFS and SAN)
• Where possible, always separate data disks from OS disks
• Windows performance settings should be optimised for application performance
• Guest operating system timeouts should be set as defined by the SAN vendor
• Pagefile should be separated where appropriate (this can impact VMware SRM however)
• Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)
• Last access update time should be disabled (unless required)
• Logging of the VM should be disabled (only enabled for troubleshooting)
• Remove any unused virtual hardware (floppy drives, USB, etc.)
• Disable screen savers and power saving features, including logon screen saver
• Enable Remote Desktop, avoid using the VI Client for remote administration
• Install standard applications into the template (bginfo, antivirus, any host agents, etc.)
• Multiple CPUs should be allocated sparingly
![Page 43: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/43.jpg)
Virtual Machine Optimisation
Block alignment is vital to good disk performance!
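A quick way to reason about alignment is to check whether a partition's starting offset is a whole multiple of the storage block size. The sketch below assumes 512-byte sectors and a 4096-byte storage block; the correct block size depends on your array, so treat both values as placeholders.

```python
SECTOR_BYTES = 512  # assumed sector size


def is_aligned(start_offset_bytes: int, block_bytes: int = 4096) -> bool:
    """True when the partition start falls on a storage block boundary."""
    return start_offset_bytes % block_bytes == 0


# Legacy MBR partitions commonly start at sector 63, which is not aligned.
print(is_aligned(63 * SECTOR_BYTES))    # False
# A partition starting at sector 2048 (1 MiB) is aligned.
print(is_aligned(2048 * SECTOR_BYTES))  # True
```

A misaligned guest partition means a single guest IO can straddle two storage blocks, which is where the extra back-end work comes from.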
![Page 44: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/44.jpg)
esxtop
Command Options when inside esxtop

| Command | Action |
| --- | --- |
| space | Update the display |
| ? | Show the help page |
| q | Quit |
| f / F | Add or remove columns from the display |
| o / O | Change the order the display is sorted |
| s | Change the update interval |
| # | Change the number of instances to display |
| W | Write configuration to file |
| e | Expand / roll up CPU stats |
| V | View only VM instances |
| L | Change the length of the NAME field |
| m | Display memory statistics |
| n | Display network statistics |
| i | Display interrupt statistics |
| d | Display disk adapter statistics |
| u | Display disk device statistics |
| v | Display disk VM statistics |

![Page 45: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/45.jpg)
esxtop
Command Line Options from the console

| Command | Action |
| --- | --- |
| -b | batch mode |
| -l | locks the objects available in the first snapshot |
| -s | enables secure mode |
| -a | show all statistics |
| -c | sets the configuration file |
| -R | enables replay mode (used with “vm-support -S”) |
| -d | sets the update interval |
| -n | runs esxtop for n iterations |

![Page 46: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/46.jpg)
esxtop
Expand the default window size for your session to get all statistics
![Page 47: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/47.jpg)
vm-support
Creates a packaged zip file containing the following sections:
• boot
  – contains the grub configuration
• etc
  – contains the Console OS configuration files (cron, tcpwrappers, syslog, etc.)
• proc
  – contains much of the hardware configuration modules and variables
• tmp
  – contains a lot of the ESX specific configuration output
• var
  – contains log files and any core dumps
• vmfs
  – contains the structure of the VMFS datastores
• esx3-installation (where appropriate)
  – contains a copy of the previous esx3 configuration variables
![Page 48: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/48.jpg)
vm-support
Using vm-support to extract performance information:
vm-support -S -d <duration> -i <interval>
<duration> and <interval> are in seconds
The output from this can then be replayed in esxtop for review after it has been extracted.
esxtop -R <path_to_vm-support_output>
![Page 49: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/49.jpg)
Service Console Performance
• Multiple Service Console networks, for network resiliency
• Increased Service Console memory, up to 800MB
• Use host agents supplied by your vendors
• Make storage recommended tweaks such as HBA Queue Depth and IO timeouts
• Minimal use of the VI Client console; use RDP or SSH instead
• Properly sized vCenter server, with a 64-bit OS where possible
![Page 50: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/50.jpg)
Resource Groups
Dynamically reallocate resource shares
Add a VM: shares allow you to over-commit resources and have a graceful re-allocation
Remove a VM: the extra resources are made available across all remaining VMs
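The re-allocation behaviour can be illustrated with a simplified proportional model. This sketch only covers the share ratio under full contention; it ignores limits and reservations, and the capacity, VM names and share values are hypothetical.

```python
def allocate_by_shares(capacity_mhz: float, shares: dict[str, int]) -> dict[str, float]:
    """Split contended capacity in proportion to each VM's shares."""
    total = sum(shares.values())
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}


# Two VMs with equal shares split the host evenly.
print(allocate_by_shares(8000, {"vm-a": 1000, "vm-b": 1000}))

# Adding a third VM with double the shares shrinks the others gracefully.
print(allocate_by_shares(8000, {"vm-a": 1000, "vm-b": 1000, "vm-c": 2000}))
```

Removing vm-c again returns the first result, which is the "exploit extra resources" case: the remaining VMs pick up the freed capacity without any reconfiguration.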
![Page 51: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/51.jpg)
Design Guidelines
• Full Resilience / Multiple paths
• Standard configuration across all aspects (ESX, Storage, Networking, etc.)
• Standard naming conventions
• Learn from others' mistakes
• Follow vendors' best-practice guidelines
• Rule out the basics before requesting support
![Page 52: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/52.jpg)
Capacity Planner & P2V Cautions and Limitations
• Peak CPU usage can sometimes be misleading
• Back-end storage system performance
• P2V machines will require block-aligning to the storage
• P2V machines will still require guest OS optimisation
![Page 53: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/53.jpg)
Conclusion
• Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)
• Performance tools help diagnose issues and help rule out non-issues
• Performance tools are useful in different contexts, not always either/or
  – Real-time data and troubleshooting: esxtop
  – Historical data: VI Client
  – Coarse resource / cluster usage: VI Client
  – Detailed resource usage: esxtop
• Combine information from various tools to get a complete picture
• Always benchmark your systems first so you know the optimal performance you can achieve
![Page 54: VMware Performance Troubleshooting](https://reader035.vdocuments.net/reader035/viewer/2022081413/54963a22b47959002d8b456d/html5/thumbnails/54.jpg)
Reference Articles
• http://www.vmware.com/pdf/esx3_memory.pdf
• http://www.vmworld.com/docs/DOC-2370
• http://blogs.vmware.com/performance/
• http://communities.vmware.com/docs/DOC-5420
• http://kb.vmware.com/kb/1008205
• http://communities.vmware.com/community/vmtn/general/performance
• http://www.vmware.com/products/vmmark/
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf
• http://www.vmware.com/pdf/GuestOS_guide.pdf
• http://www.vmware.com/resources/techresources/10066
• http://www.vmware.com/resources/techresources/10059
• http://www.vmware.com/resources/techresources/10062