Post on 26-Mar-2015
Troubleshooting Performance Issues in a Citrix Virtualized Environment
Kapil Ramlal, Sr. Software Maintenance Engineer
Daniel Lazar, Lead Escalation Engineer
Agenda
• XenServer Performance Overview
• Troubleshooting XenServer Performance
• Windows Application Architecture Primer
• Troubleshooting Virtual Machine Performance
• Citrix Performance VM Demo
• Q & A
Citrix Confidential - Do Not Distribute
XenServer Performance Overview
Citrix XenServer
XenServer Performance Overview
XenServer is designed to do one thing… consolidate machine workloads
Performance is a function of “VM density”
All hypervisors have VM density limitations
XenServer Performance Overview
How do we determine optimal VM density for a host?
• XenServer Hardware
• Infrastructure, such as network and storage
• Workload and sizing demands of the virtual machines
• Native XenServer characteristics
[Diagram: XenServer architecture. The host machine (hardware) sits at the bottom with the Xen hypervisor above it. Dom0 contains the toolstack, the native driver and netback. Each DomU contains a guest OS with its apps and a netfront driver.]
XenServer Performance Overview
External Factors
• Network
• Storage
• VM Workload and Sizing
Troubleshooting XenServer Performance
Domain 0 Memory Management
[Diagram: the Dom0 memory pool (752 MB) inside the total XenServer memory pool (ex. 12GB). Of the 752 MB, 400 MB is for Dom0 and 352 MB is for DomU. Each DomU that starts takes 6 MB, so the DomU share shrinks from 352 MB to 346 MB, and on to 328 MB as more DomUs start.]
(6 MB x n) + total Dom0 = memory required for n VMs, where 6 MB x n = total DomU “footprint”
752MB allows for about 60 VMs-per-host
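The sizing rule above can be worked through as a short calculation. This is a minimal sketch in Python, which is not part of the original deck. The function names are hypothetical, and the 752 MB, 400 MB and 6 MB defaults are simply the figures read off the slide, so treat them as assumptions for any other installation.

```python
def dom0_memory_required_mb(n_vms, dom0_own_mb=400, per_vm_mb=6):
    """Memory Dom0 needs to manage n_vms: (per_vm_mb x n) + Dom0's own share."""
    return per_vm_mb * n_vms + dom0_own_mb


def max_vms(dom0_pool_mb=752, dom0_own_mb=400, per_vm_mb=6):
    """Largest n for which dom0_memory_required_mb(n) still fits in the pool."""
    return (dom0_pool_mb - dom0_own_mb) // per_vm_mb


if __name__ == "__main__":
    print(max_vms())                    # whole VMs that fit in the default pool
    print(dom0_memory_required_mb(60))  # MB needed for the slide's "about 60"
```

With the slide's numbers, max_vms() gives 58, which is where the "about 60 VMs-per-host" figure comes from.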
• What happens when we start more VMs than Dom0 has memory to manage?
  • Slow VM performance, poor user experience.
  • Slow response from XenAPI: it takes longer to process tasks like starting, shutting down and migrating virtual machines.
  • It can cause XenServer host instability, resulting in unpredictable behavior and potentially crashing the XenServer host machine!!
Troubleshooting XenServer Performance
There are two common ways to monitor performance in XenServer:
• XenCenter Performance Tab
• XenServer Command Line Interface
Troubleshooting XenServer Performance
Using XenCenter
• Good for ‘at a glance’ monitoring
• Unwieldy for refined or customized performance testing
• Difficult to use for historical trending
• Data cannot be easily exported
• Some types of information are not gathered
Troubleshooting XenServer Performance
Performance monitoring commands
# top    # Provides a dynamic real-time view of a running system.
Tasks:  68 total,   2 running,  65 sleeping,   0 stopped,   1 zombie
Cpu(s): 13.0%us, 33.6%sy,  0.0%ni,  1.0%id, 52.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    417792k total,   302832k used,   114960k free,    68384k buffers
Swap:   524280k total,      104k used,   524176k free,    80928k cached
  PID USER   PR  NI  VIRT   RES   SHR S %CPU %MEM    TIME+  COMMAND
12857 65550  15   0 27784  3012  1280 D   19  0.6  0:00.57  qemu-dm
 4679 root   12  -3  281m   16m  5308 S   12  3.4  3:47.85  xapi
 5993 root   15  -3  6164  2276  1188 S    2  0.5  0:24.73  stunnel
 1264 root   16  -4  2244   664   384 S    0  0.1  0:24.00  udevd
 4641 root   15   0 16348  1936   952 S    0  0.4  0:01.25  xenstored
 4650 root   15   0 12304   652   544 S    0  0.1  0:00.05  blktapctrl
12722 root   15   0  2188  1052   836 R    0  0.2  0:00.03  top
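In the sample above, the Cpu(s) line shows 52.5%wa (time spent waiting on I/O) and only 1.0%id (idle). A minimal sketch of pulling those fields out of a captured summary line follows, assuming the line has the same layout as the sample. The function names and the 30% threshold are hypothetical choices for illustration, not Citrix guidance.

```python
import re

# Cpu(s) summary line copied from the sample top output.
SAMPLE = ("Cpu(s): 13.0%us, 33.6%sy, 0.0%ni, 1.0%id, "
          "52.5%wa, 0.0%hi, 0.0%si, 0.0%st")


def parse_cpu_line(line):
    """Turn a top 'Cpu(s):' line into a dict such as {'us': 13.0, 'sy': 33.6}."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


def is_io_bound(cpu, wa_threshold=30.0):
    """True when the I/O wait share reaches the (arbitrary) threshold."""
    return cpu.get("wa", 0.0) >= wa_threshold


if __name__ == "__main__":
    cpu = parse_cpu_line(SAMPLE)
    print(cpu["wa"], cpu["id"], is_io_bound(cpu))
```

A high wa value like this one points toward storage or network-attached storage rather than raw CPU shortage.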
Troubleshooting XenServer Performance
Performance monitoring commands
# xentop    # Displays real-time information about a Xen system and domains.
xentop - 17:24:33   Xen 3.3.1
4 domains: 1 running, 3 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 12580820k total, 7092880k used, 5487940k free    CPUs: 8 @ 1600MHz
     NAME  STATE   CPU(sec) CPU(%)   MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS ...
 Domain-0 -----r       9849   65.5   417792    3.9  no limit       n/a     8    0 ...
Win2K3-01 ------          1    1.5  2097020   16.7   2106164      16.7     2    1 ...
Win2K3-02 ------          1    4.3  2097020   16.7   2106164      16.7     2    1 ...
Win2K3-03 ------          0    9.8  2097020   16.7   2106164      16.7     2    1 ...
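The per-domain rows in the xentop sample can be summarized with a few lines of code. This is a sketch only: the rows are copied from the sample output, the function names are hypothetical, and only the first five columns are parsed because the "no limit" value in the Domain-0 row contains a space.

```python
# Domain rows copied from the sample xentop output (trailing "..." dropped).
SAMPLE_ROWS = """\
Domain-0 -----r 9849 65.5 417792 3.9 no limit n/a 8 0
Win2K3-01 ------ 1 1.5 2097020 16.7 2106164 16.7 2 1
Win2K3-02 ------ 1 4.3 2097020 16.7 2106164 16.7 2 1
Win2K3-03 ------ 0 9.8 2097020 16.7 2106164 16.7 2 1
"""


def parse_domains(text):
    """Parse NAME, STATE, CPU(sec), CPU(%) and MEM(k) from each domain row."""
    domains = []
    for row in text.splitlines():
        fields = row.split()
        if len(fields) < 5:
            continue  # skip blank or truncated rows
        domains.append({
            "name": fields[0],
            "state": fields[1],
            "cpu_sec": int(fields[2]),
            "cpu_pct": float(fields[3]),
            "mem_k": int(fields[4]),
        })
    return domains


def busiest(domains):
    """Domain with the highest CPU(%) value."""
    return max(domains, key=lambda d: d["cpu_pct"])


def guest_mem_k(domains):
    """Total MEM(k) across every domain except Domain-0."""
    return sum(d["mem_k"] for d in domains if d["name"] != "Domain-0")


if __name__ == "__main__":
    doms = parse_domains(SAMPLE_ROWS)
    print(busiest(doms)["name"], guest_mem_k(doms))
```

For the sample, the busiest domain is Domain-0 at 65.5% CPU, which is itself a hint that the control domain, not a guest, is the bottleneck.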
Troubleshooting XenServer Performance
What if I need to increase my VM-density-per-host?
• We can tune XenServer to increase VM density. In a scalability study conducted by Citrix to determine the maximum number of Windows XP virtual desktops per host for XenServer 5.5 running XenDesktop 4, we were able to run 130 VMs on a single host.
Troubleshooting XenServer Performance
• This was achieved by making two key configuration changes to a default XenServer 5.5 installation.
  • Increased the amount of RAM assigned to Dom0 to 2.94GB from the default 752MB; increasing it enabled us to launch more desktop clients.
  • Increased the “Xen-heap” setting to take into account the large number of VMs on this single server host. This was done by adding "xenheap_megabytes=24" to the Xen command-line in /boot/extlinux.conf, which resulted in an increase from the default of 16MB to 24MB.
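As an illustration of the second change, the sketch below adds an option to the Xen portion of a boot line held in a string. It makes a loud assumption: that the line is multiboot-style, with the hypervisor arguments first and " --- " separating them from the kernel and initrd arguments. The sample line and the function name are hypothetical, and a real /boot/extlinux.conf will differ, so follow the Citrix article on adjusting Dom0 and Xenheap settings for the supported procedure.

```python
def add_xen_option(append_line, option):
    """Return append_line with option set in the Xen (first) section.

    Assumes the hypervisor arguments come first and are separated from the
    kernel and initrd arguments by ' --- '. An existing option with the same
    key is replaced, so calling this twice gives the same result.
    """
    key = option.split("=", 1)[0]
    xen_part, sep, rest = append_line.partition(" --- ")
    tokens = [t for t in xen_part.split() if t.split("=", 1)[0] != key]
    tokens.append(option)
    return " ".join(tokens) + sep + rest


# Hypothetical line for illustration only; not copied from a real host.
SAMPLE = ("append /boot/xen.gz dom0_mem=752M --- "
          "/boot/vmlinuz-2.6-xen root=LABEL=root --- /boot/initrd-2.6-xen.img")

if __name__ == "__main__":
    print(add_xen_option(SAMPLE, "xenheap_megabytes=24"))
```

The function only transforms a string; it deliberately does not read or write the configuration file, so any real change stays a manual, reviewed step.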
Troubleshooting XenServer Performance
Additional Information
• Both the scalability study and instructions for increasing Dom0 memory limits are documented in the Citrix Knowledge Center here:
  • http://support.citrix.com/article/CTX124086 - XenServer Single Server Scalability with XenDesktop
  • http://support.citrix.com/article/CTX124259 - Adjusting Dom0 and Xenheap Setting in XenServer
Disclaimer: Your results may vary! This testing was done on very high-end equipment using Citrix best practices!
Troubleshooting XenServer Performance
Troubleshooting commands - Storage
• # iostat # Reports basic I/O stats for devices and partitions
• # hdparm # Performs timed sequential reads
• # dd # Simple, common block device copy utility
TIP: iSCSI storage throughput can usually be tied directly to network performance. If there is slow throughput for an iSCSI storage array, perform network diagnostics first!!
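The idea behind a timed sequential read can be sketched in a few lines. The example below is written in Python rather than using hdparm or dd, reads an ordinary file rather than a block device, and uses hypothetical function names. Results on a real host will also be inflated by the operating system's page cache, so treat the figure as a rough indication only.

```python
import os
import tempfile
import time


def sequential_read(path, block_size=1024 * 1024):
    """Read path start to end in fixed-size blocks; return (bytes_read, MB_per_s)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level only
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total, (total / (1024 * 1024)) / elapsed


def make_test_file(size_bytes):
    """Create a throwaway file of size_bytes and return its path (caller deletes it)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(size_bytes))
        return f.name


if __name__ == "__main__":
    path = make_test_file(8 * 1024 * 1024)
    try:
        nbytes, rate = sequential_read(path)
        print(f"read {nbytes} bytes at {rate:.1f} MB/s")
    finally:
        os.remove(path)
```

Running the same measurement against local storage and against an iSCSI-backed path gives a quick comparison before moving on to the network diagnostics the tip recommends.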
Troubleshooting XenServer Performance
Troubleshooting commands - Network
• # tcpdump # Dumps traffic on a network
  • http://support.citrix.com/article/CTX120869 - detailed instructions for using tcpdump.
• # netstat # Display network interface statistics
• # ifconfig # Display and configure network interfaces
TIP: You can always type ‘man’ followed by a Linux command name (e.g., ‘man netstat’) to get detailed help for the command.
Troubleshooting XenServer Performance
Running Shell Scripts
• Can capture customized data sets
• Can be run over defined periods of time
• Can be formatted specifically for reporting purposes
• Requires knowledge of Linux and shell scripting languages
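The three capabilities listed above (a custom data set, a defined collection period, report-friendly output) can be sketched as a small sampling loop. The slide is about shell scripts; this sketch swaps in Python purely for illustration. The function names are hypothetical, and the 1-minute load average is just one example metric, not one the deck prescribes.

```python
import csv
import os
import sys
import time


def collect_samples(sample_fn, count, interval_s, out,
                    clock=time.time, sleep=time.sleep):
    """Write `count` timestamped samples to `out` as CSV, `interval_s` apart."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "value"])
    for i in range(count):
        writer.writerow([f"{clock():.0f}", sample_fn()])
        if i < count - 1:  # no need to wait after the final sample
            sleep(interval_s)


def load_average_1min():
    """Example metric: 1-minute load average (Unix-like systems only)."""
    return os.getloadavg()[0]


if __name__ == "__main__":
    # Three samples, 0.1 s apart, written as CSV to standard output.
    collect_samples(load_average_1min, count=3, interval_s=0.1, out=sys.stdout)
```

Passing the sampler in as a function keeps the loop reusable: the same collector can record any value that a script is able to read.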
Troubleshooting XenServer Performance
Additional Information
• On the Citrix Knowledge Center you can find shell script examples, procedures and best practices for how to troubleshoot all aspects of a XenServer environment.
• Some useful links to troubleshooting articles:
  • http://support.citrix.com/article/CTX124157
  • http://support.citrix.com/article/CTX121634
  • http://support.citrix.com/article/CTX122806
  • http://support.citrix.com/article/CTX120737
Windows Application Architecture Primer
A process, in the simplest terms, is an executing program.
- Microsoft (2010)
Application Basics
• An application consists of one or more processes
• Each process provides the resources needed to execute a program
• One or more threads run in the context of the process
• Each process is started with a single thread, often called the primary thread, but can create additional threads
Application Basics
• A thread is the basic unit to which the operating system allocates processor time
• Threads carry out the work of a process
• All threads of a process share its virtual address space and system resources
• Each thread uses stack-based storage for handling data
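The relationship between a process and its threads can be shown with a short sketch. It uses Python's threading module rather than the Windows API, so it illustrates the concepts only. The function and variable names are hypothetical.

```python
import threading


def run_workers(n_threads, items_per_thread):
    """Start n_threads threads that all append to one list owned by the process."""
    shared = []              # lives in the process; visible to every thread
    lock = threading.Lock()  # serializes access to the shared list

    def worker(worker_id):
        for i in range(items_per_thread):  # worker_id and i are per-thread locals
            with lock:
                shared.append((worker_id, i))

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    results = run_workers(4, 100)
    print(len(results))  # every thread wrote into the same list
```

The list is shared because all threads run in the same address space, while each thread's loop counter is private to its own stack.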
Application Basics
• What is the Stack?
  • It’s temporary memory used by threads
  • It’s used to store a function’s parameters and local variables
A closer look
[Diagram: a thread stack made up of Frame 0 through Frame 3. Each frame holds the local variables, the saved frame pointer, the return address and the function parameters, with the frame pointer indicated.]
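The frame-per-call structure can be observed from inside a running program. The sketch below walks Python's interpreter frames with the standard inspect module. These are not native Windows stack frames (there is no saved frame pointer or return address to see here), so take it as an analogy for how each call gets its own frame holding its parameters and local variables. All function names are hypothetical.

```python
import inspect


def snapshot_stack():
    """Return (function name, sorted local names) per frame, innermost first."""
    frames = []
    frame = inspect.currentframe().f_back  # skip snapshot_stack's own frame
    while frame is not None:
        frames.append((frame.f_code.co_name, sorted(frame.f_locals)))
        frame = frame.f_back               # step to the caller's frame
    return frames


def inner(width, height):
    area = width * height      # a local variable in inner's frame
    return snapshot_stack()


def outer(scale):
    doubled = scale * 2        # a local variable in outer's frame
    return inner(doubled, 3)


if __name__ == "__main__":
    for name, local_names in outer(2)[:2]:
        print(name, local_names)
```

The first two entries are the frames for inner and outer, each listing its own parameters and locals, which mirrors the frame layout in the diagram.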
Application Basics
User & Kernel Space
• The Windows operating system can be conceptually divided into 2 parts:
  • User Space (User Mode)
  • Kernel Space (Kernel Mode)
• Applications run in User Mode
• System drivers run in Kernel Mode (Privileged Mode)
[Diagram: user applications run in user space (user mode); drivers such as keyboard.sys, win32k.sys, tcpip.sys and rusb2w2k.sys run in kernel space.]
[…]
Troubleshooting Virtual Machine Performance
• Common performance related issues inside the VM:
  • High CPU
  • Disk/registry contention
  • High network utilization
  • Memory
High CPU
• Identify offending thread(s)
• Identify the top function call and its module
• Capture user memory dump of offending process for analysis
• Engage respective application vendor
Process Explorer can be used for live stack-trace viewing!
The Windows Performance Tools
• Next generation performance monitoring from Microsoft
• Track CPU usage, application start times, boot issues etc.
• Identify common performance problems without a debugger
• Included with Windows 7 SDK Download
The Windows Performance Tools: Case Study
ISSUE:
• High CPU on wfica32.exe
Methodology
• Capture a 30-second sample of activity and compare non-working ICA, working ICA and working RDP
The Windows Performance Tools: Case Study
ICA test run where problem occurred
• Notice that on this dual-processor machine, one processor is frequently at or very close to 100%.
• Looking inside the test run to see which process was executing the most instructions during the test, the answer was wfica32.exe.
The Windows Performance Tools: Case Study
• Drilling into the calls of wfica32.exe led to the Windows function NtUserSetCursor(), which results in calls to the igdkmd32.sys driver and then into the kernel, specifically the memcpy() function.
Memory Dump Collection
• User dumps contain a snapshot of a process’ memory
• Kernel dumps contain a snapshot of kernel memory space
• A complete memory dump contains both the kernel and the entire user space
User Dump Collection
• Configure a default post-mortem debugger:
  • How to Set the NT Symbolic Debugger as a Default Windows Postmortem Debugger (CTX105888)
  • How to Set WinDbg as a Default Windows Postmortem Debugger (CTX107528)
• Use Task Manager for manual dumps
System Dump Collection
• Small Memory Dump
  • Generally we avoid
• Kernel Memory Dump
  • System crash
• Complete Memory Dump
  • System unresponsive
Control Panel -> System -> Advanced Tab -> Startup and Recovery
System Dump Collection
• Windows 7 introduced the Dedicated Dump Drive setting
• Allows a pagefile to be configured on a dedicated drive for dump capture
• Recommended for debugging VMs streamed through PVS
How to Recover Windows Kernel Level Dump Files from Provisioned Target (CTX123642)
Demo: Citrix XenServer Performance VM
Q & A
TechEdge Survey, Video Postings & PPTs
• The TechEdge survey will be emailed out to end-user customers
• If you complete the survey, you will be entered to win a $250 Amazon gift card. The winner will be announced June 1st.
• View TechEdge videos & PPTs on the Knowledge Center by Monday, May 17th: http://support.citrix.com/techedge2010