TRANSCRIPT
© 2009 VMware Inc. All rights reserved
Confidential
VMware Customer Support Day
Broomfield, Colorado
March 2, 2010
2 Confidential
Broomfield Support Day Agenda
10:00 AM Registration
10:30 AM Kick-off
10:45 AM Keynote - Eric Wansong, VP GSS Americas
11:15 AM vSphere Upgrade Best Practices
12:00 PM Lunch - Q&A with GSS Experts
12:45 PM Storage Best Practices
1:45 PM Networking Best Practices
2:45 PM Break
3:00 PM Performance Best Practices
4:00 PM Wrap-up and Give-away
3 Confidential
VMware Customer Support Day
Welcome to Broomfield’s 3rd Customer Support Day
Collaboration bringing VMware Support, Sales & our Customers
Together
Value Add
• Education: VMware Best Practices, Tips & Tricks
• Technical Support Overview
• Certification Offerings
• Product Demos
Customer Feedback – Support Day Topics
© 2009 VMware Inc. All rights reserved
Confidential
vSphere Upgrade Best Practices
Brian Pope – Install/OS Escalation Engineer, GSS
5 Confidential
Agenda
Planning
vCenter
ESX/ESXi
VMware Tools / Virtual Hardware
Licensing
6 Confidential
vSphere Upgrade Pre-planning
VMware vSphere Upgrade Center
• Collection of Docs, Videos, Best Practices, New Features, etc.
• http://www.vmware.com/products/vsphere/upgrade-center/resources.html
vSphere Upgrade Guide
• http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_upgrade_guide.pdf
Upgrading to ESX 4.0 and vCenter 4.0 Best Practices
• Knowledge Base Article 1009039
vSphere Migration Checklist
• http://vmware.com/files/pdf/vsphere-migration-prerequisites-checklist.pdf
Installing ESX 4.0 and vCenter 4.0 Best Practices
• Knowledge Base Article 1009080
VMware vCenter Install Worksheet
• Knowledge Base Article 1010023
7 Confidential
vCenter Server
Upgrade components in the following order:
• vCenter
• ESX/ESXi Hosts
• VMware Tools
• Virtual Hardware
vCenter is now supported on 64-bit operating systems; however, it still requires a 32-bit DSN
• Knowledge Base Article 1010401
Back up the vCenter database before upgrading (you should be doing this anyway; a sketch follows below)
Verify dbo permissions on the MSDB, VC, and UM databases
Allow for any new vSphere required ports
• Knowledge Base Article 1012382
TEST TEST TEST
• Set up a test environment to test critical applications and verify functionality and performance.
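A minimal backup sketch, assuming a local default SQL Server instance with Windows authentication; the database names VCDB and UMDB are placeholders for your actual vCenter and Update Manager databases, and the backup path is an example:
sqlcmd -S localhost -E -Q "BACKUP DATABASE VCDB TO DISK='D:\Backups\VCDB.bak' WITH INIT"
sqlcmd -S localhost -E -Q "BACKUP DATABASE UMDB TO DISK='D:\Backups\UMDB.bak' WITH INIT"
sqlcmd -S localhost -E -Q "BACKUP DATABASE msdb TO DISK='D:\Backups\msdb.bak' WITH INIT"
If the upgrade goes sideways, restoring any of these .bak files is a single RESTORE DATABASE away.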
8 Confidential
ESX 4.0 / ESXi 4.0
vSphere 4.0 offers two GUI-based applications and a script that you
can use to upgrade ESX 3.5 to ESX 4.0:
vSphere Host Update Utility
• For standalone hosts
VMware vCenter Update Manager
• For ESX/ESXi hosts that are managed by vCenter Server
• Use a "Host Upgrade" baseline vs. a "Host Patch" baseline
esxupgrade.sh script
• For Offline Upgrade - ESX 3.x hosts that do not have network access.
Knowledge Base Article 1009440
Several upgrade tools were supported in previous ESX releases and are no longer supported in the current release. These tools include: graphical upgrade from CD, text-mode upgrade from CD, tarball upgrade using the service console, scripted upgrade from CD or PXE server by using esxupdate, and scripted upgrade from CD or PXE server using kickstart commands.
9 Confidential
ESX 4.0 / ESXi 4.0
VMware ESX 4.0 will only install and run on servers with 64-bit x86
CPUs.
• Known 64-bit processors:
• All AMD Opterons support 64 bit.
• All Intel Xeon 3000/3200, 3100/3300, 5100/5300, 5200/5400, 7100/7300, and
7200/7400 support 64 bit.
• All Intel Nehalem processors support 64 bit.
ESX requires a ~15 GB VMFS volume for the Console VM
• The service console must be installed on a VMFS datastore that is resident on
a host's local disk or on a SAN disk that is masked and zoned to that particular
host only. The datastore cannot be shared between hosts.
Upgrading ESXi 3.5 hosts with OEM server vendor’s specific
components to ESXi 4.0
• Knowledge Base Article 1010489
10 Confidential
VMware Tools / Virtual Hardware
Upgrading an ESX 3.x virtual machine to ESX 4.0
• Knowledge Base Article 1010675
VMware Tools 4.0 is backward compatible with 3.x
• Feel free to upgrade VMware Tools immediately; you will still be able to VMotion to 3.x hosts.
• Snapshot critical VMs in case the Tools upgrade is not successful (see the sketch below).
• Clone and test VMs to ensure Tools and hardware upgrade successfully.
Virtual Hardware version 7 is NOT backward compatible
• Once upgraded, virtual hardware 7 will only run on ESX 4.0. If done before your hosts are all at 4.0, you will limit migration capability.
• Virtual hardware downgrade is NOT supported.
• Only upgrade virtual hardware for specific VMs needing the new features.
• The upgrade is a powered-off operation.
• A full reboot following the VMware Tools install is required before the hardware is upgraded.
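A quick safety-net sketch from the service console using vmware-cmd; the datastore path and VM name are examples, and the trailing 1 1 flags request quiesce and a memory snapshot:
vmware-cmd /vmfs/volumes/datastore1/MyVM/MyVM.vmx createsnapshot pre-tools-upgrade "Before Tools/HW upgrade" 1 1
# Verify, and later remove the snapshot once the upgrade checks out:
vmware-cmd /vmfs/volumes/datastore1/MyVM/MyVM.vmx hassnapshot
vmware-cmd /vmfs/volumes/datastore1/MyVM/MyVM.vmx removesnapshots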
11 Confidential
Licensing
vSphere Licensing Information Portal
• http://www.vmware.com/products/vsphere/upgrade-center/licensing.html
• What's New in Licensing
• Preparing for Your License Upgrade
• Entitlement Mapping
• Licensing Troubleshooting
Configuring a legacy license server to manage ESX/ESXi 3.x hosts
in vCenter Server 4.0
• http://kb.vmware.com/kb/1010704
© 2009 VMware Inc. All rights reserved
Confidential
Questions
© 2009 VMware Inc. All rights reserved
Confidential
Lunch – Q&A
Brian Pope
Install/OS Escalation Engineer
David Garcia
NASA L2 Escalation Engineer
Gerald Camacho
Network Escalation Engineer
Jake McDermott
BCS Engineer
Josh Newton
BCS Engineer
Paul Clark
Storage Escalation Engineer
Paul Hill
System Management Escalation Engineer
© 2009 VMware Inc. All rights reserved
Confidential
Storage Best Practices
Paul Clark – Storage Escalation Engineer, GSS
15 Confidential
Agenda
Performance
SCSI Reservations
Performance Monitoring
• esxtop
Common Storage Issues
• Snapshot LUNs
• Virtual Machine Snapshots
• iSCSI Multipathing
• All Paths Dead (APD)
16 Confidential
Disk subsystem bottlenecks cause more performance problems
than CPU or RAM deficiencies
Your disk subsystem is considered to be performing poorly if it is
experiencing:
• Average read and write latencies greater than 20 milliseconds
• Latency spikes greater than 50 milliseconds that last for more than a few seconds
Performance
17 Confidential
Performance vs. Capacity comes into play at two main levels
• Physical drive size
• Hard disk performance doesn't scale with drive size
• In most cases, the larger the drive, the lower the performance
• LUN size
• Larger LUNs hold more VMs, which can lead to contention on that particular LUN
• LUN size is often related to physical drive size, which can compound performance problems
Performance vs. Capacity
18 Confidential
You need 1 TB of space for an application
• 2 x 500GB 15K RPM SAS drives = ~300 IOPS
• Capacity needs satisfied, Performance low
• 8 x 146GB 15K RPM SAS drives = ~1168 IOPS
• Capacity needs satisfied, Performance high
Performance – Physical Drive Size
19 Confidential
SCSI Reservations – when an initiator requests/reserves exclusive use of a target (LUN)
• VMFS is a clustered file system
• It uses SCSI reservations to protect metadata
• To preserve the integrity of VMFS in multi-host deployments
• One host holds exclusive access to the LUN while the reservation is in place
• A reboot or release command will clear the reservation
• The virtual machine monitor uses SCSI-2 reservations
SCSI Reservations – Why?
20 Confidential
What causes SCSI Reservations?
• When a VMDK is created, deleted, placed in REDO mode, has a snapshot (delta) file, is migrated (reservations from both the source and the target ESX), or when the VM is suspended (since a suspend file is written)
• MEDIUM ERROR – LOGICAL UNIT NOT READY
• When a VMDK is created via a template, we get SCSI reservations on the source and target
• When a template is created from a VMDK, a SCSI reservation is generated
SCSI Reservations
21 Confidential
• Simplify/verify deployments so that virtual machines do not span more than one LUN
• This ensures SCSI reservations do not impact more than one LUN
• Determine if any operations are occurring on a LUN on which you want to perform another operation:
• Snapshots
• VMotion
• Template deployment
• Use a single ESX server as your deployment server to limit/prevent conflicts with other ESX servers attempting to perform similar operations
SCSI Reservation Best Practice
22 Confidential
• Inside vCenter, limit access to actions that initiate reservations to administrators who understand the effects of reservations, to control WHO can perform such operations
• Schedule virtual machine reboots so that only one LUN is impacted at any given time
• A power-on and a power-off are considered separate operations, and both will create a reservation
• VMotion
• Use care when scheduling backups; consult the backup provider's best-practices information
• Use care when scheduling antivirus scans and updates
SCSI Reservation Best Practice - Continued
23 Confidential
• Monitor /var/log/vmkernel for:
• 24/0 0x0 0x0 0x0
• SYNC CR messages
• In a shared environment like ESX there will be some SCSI reservations; this is normal. But when you see hundreds of them, it is not normal (see the example below).
• Check for virtual machines with snapshots
• Check for HP management agents still running the storage agent
• Check LUN presentation for host mode settings
• Call VMware support to dig into it further
SCSI Reservation Monitoring
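A quick way to eyeball this from the service console, using plain grep against the vmkernel log (paths per classic ESX defaults):
# Count SYNC CR occurrences; a handful is normal, hundreds are not
grep -c "SYNC CR" /var/log/vmkernel
# Show the most recent matching sense-data lines for context
grep "24/0 0x0 0x0 0x0" /var/log/vmkernel | tail -5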
© 2009 VMware Inc. All rights reserved
Confidential
Storage Performance Monitoring
Paul Clark – Storage Escalation Engineer, GSS
25 Confidential
esxtop
26 Confidential
DAVG = Raw response time from the device
KAVG = Amount of time spent in the VMkernel (i.e., virtualization
overhead
GAVG = Response time that would be perceived by virtual machines
D + K = G
esxtop - Continued
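A worked example with hypothetical numbers: if esxtop shows DAVG/cmd = 18 ms and KAVG/cmd = 2 ms for a device, then GAVG = 18 + 2 = 20 ms is what the guest perceives. Since 18 of those 20 ms are raw device response time, the latency in this case lives in the array or fabric, not in the VMkernel; a KAVG that is large relative to DAVG would point at the host instead.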
27 Confidential
esxtop - Continued
28 Confidential
esxtop - Continued
29 Confidential
• What are correct values for these response times?
• As with all things revolving around performance, it is subjective
• Obviously, the lower these numbers are, the better
• ESX will continue to function with nearly any response time; however, how well it functions is another issue
• Any command that is not acknowledged by the SAN within 5000 ms (5 seconds) will be aborted. This is where perceived disk performance takes a sharp dive
esxtop - Continued
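For reference, a minimal esxtop walk-through for the disk views; the interactive keys and batch-mode flags below are the standard ones in ESX 4.0:
esxtop
#   d - disk adapter view    u - disk device (LUN) view    v - disk VM view
#   f - add/remove fields (enable the latency counters)
#   s - set the refresh delay, e.g. 2 seconds
# Or capture 30 samples at 2-second intervals to CSV for offline analysis:
esxtop -b -d 2 -n 30 > esxtop-capture.csv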
© 2009 VMware Inc. All rights reserved
Confidential
Common Storage Issues
Paul Clark – Storage Escalation Engineer, GSS
31 Confidential
How a LUN is detected as a snapshot in ESX
• When an ESX 3.x server finds a VMFS-3 LUN, it compares the SCSI_DiskID
information returned from the storage array with the SCSI_DiskID information
stored in the LVM Header.
• If the two IDs do not match, the VMFS-3 volume is not mounted.
A VMFS volume on ESX can be detected as a snapshot for a number of
reasons:
• LUN ID change
• SCSI version supported by the array changed (firmware upgrade)
• Identifier type changed – Unit Serial Number vs. NAA ID
Snapshot LUNs
32 Confidential
Resignaturing Methods
ESX 3.5
Enable LVM resignaturing on the first ESX host:
Configuration > Advanced Settings > LVM > set LVM.EnableResignature to 1
ESX 4
Single-volume resignaturing:
Configuration > Storage > Add Storage > Disk/LUN
Select the volume to resignature, then choose Mount or Resignature
Snapshot LUNs - Continued
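The same operations from the command line; the volume label below is an example:
# ESX 3.5: enable resignaturing, rescan, then turn it back off
esxcfg-advcfg -s 1 /LVM/EnableResignature
esxcfg-advcfg -s 0 /LVM/EnableResignature
# ESX 4: list detected snapshot volumes, then mount or resignature one
esxcfg-volume -l
esxcfg-volume -m "lun01_vmfs"      # mount, keeping the existing signature
esxcfg-volume -r "lun01_vmfs"      # resignature instead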
33 Confidential
What is a Virtual Machine Snapshot:
• A snapshot captures the entire state of the virtual machine at the time you take
the snapshot.
• This includes:
Memory state – The contents of the virtual machine's memory.
Settings state – The virtual machine settings.
Disk state – The state of all the virtual machine's virtual disks.
Virtual Machine Snapshots
34 Confidential
Common issues:
• Snapshots filling up a datastore
• Offline commit
• Clone the VM
• Parent has changed
• Contact VMware Support
• No snapshots found
• Create a new snapshot, then commit
Virtual Machine Snapshot - Continued
35 Confidential
ESX 4, Set Up Multipathing for Software iSCSI
Prerequisites:
• Two or more NICs.
• Unique vSwitch.
• Supported iSCSI array.
• ESX 4.0 or higher
ESX4 iSCSI Multi-pathing
36 Confidential
Using the vSphere CLI, connect the software iSCSI initiator to the
iSCSI VMkernel ports.
Repeat this command for each port.
• esxcli swiscsi nic add -n <port_name> -d <vmhba>
Verify that the ports were added to the software iSCSI initiator by running the
following command:
• esxcli swiscsi nic list -d <vmhba>
Use the vSphere Client to rescan the software iSCSI initiator.
ESX4 iSCSI Multi-pathing - Continued
37 Confidential
This example shows how to connect the software iSCSI initiator
vmhba33 to VMkernel ports vmk1 and vmk2.
Connect vmhba33 to vmk1:
esxcli swiscsi nic add -n vmk1 -d vmhba33
Connect vmhba33 to vmk2:
esxcli swiscsi nic add -n vmk2 -d vmhba33
Verify vmhba33 configuration:
esxcli swiscsi nic list -d vmhba33
ESX4 iSCSI Multi-pathing - Continued
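Before the esxcli step, the VMkernel ports themselves have to exist. A sketch of that setup from the service console; the switch name, port group names, and IP values are all examples. After creating them, override NIC teaming so each iSCSI port group has exactly one active uplink (vSphere Client > port group > NIC Teaming):
esxcfg-vswitch -a vSwitch1                 # create a vSwitch for iSCSI
esxcfg-vswitch -L vmnic1 vSwitch1          # attach two uplinks
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -A iSCSI-1 vSwitch1         # one port group per path
esxcfg-vswitch -A iSCSI-2 vSwitch1
esxcfg-vmknic -a -i 10.0.0.101 -n 255.255.255.0 iSCSI-1   # becomes e.g. vmk1
esxcfg-vmknic -a -i 10.0.0.102 -n 255.255.255.0 iSCSI-2   # becomes e.g. vmk2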
38 Confidential
The Issue
You want to remove a LUN from a vSphere 4 cluster
You migrate or Storage VMotion the VMs off the datastore that is being removed
(otherwise, the VMs would hard-crash if you just yanked out the datastore)
After removing the LUN, VMs on OTHER datastores become unavailable (not
crashing, but becoming periodically unavailable on the network)
The ESX logs show a series of errors starting with "NMP"
All Paths Dead (APD)
39 Confidential
Workaround 1
In the vSphere client, vacate the VMs from the datastore being removed (migrate or
Storage vMotion)
In the vSphere client, remove the Datastore
In the vSphere client, remove the storage device
Only then, remove the LUN from the host using your array management tool.
In the vSphere client, rescan the bus.
Workaround 2
Only available in ESX/ESXi 4 U1
esxcfg-advcfg -s 1 /VMFS3/FailVolumeOpenIfAPD
All Paths Dead - Continued
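To confirm the advanced option took, or to revert it later, the matching get/set calls are:
esxcfg-advcfg -g /VMFS3/FailVolumeOpenIfAPD    # read back the current value
esxcfg-advcfg -s 0 /VMFS3/FailVolumeOpenIfAPD  # revert to the default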
© 2009 VMware Inc. All rights reserved
Confidential
Questions
© 2009 VMware Inc. All rights reserved
Confidential
vSphere Networking Overview
David Garcia – NASA L2 Escalation Engineer, GSS
42 Confidential
Agenda
Virtual Switches
Virtual Switch Capabilities
NIC Teaming
Link Aggregation
NIC Failover
New Adapter Types
VLANs
Tips & Tricks
Troubleshooting
43 Confidential
Why Do We Need a Virtual Switch?
[Diagram: non-virtualized vs. virtualized. Non-virtualized: Layer 2 access switches provide access ports (single VLAN, no tagging), giving per-host network visibility from each port, with VLAN trunks up to distribution and core. Virtualized: inside the ESX host, the L2 virtual switch provides fan-out and policy control to each VM, consistent with the non-virtualized environment; the Layer 2 virtual access switch connects via VLAN trunks to the physical Layer 2 switches, distribution, and core.]
44 Confidential
Virtual vs. Physical Network Management
• Separation of network and server provisioning and management systems
• Virtual Center manages and provisions ESX hosts and virtual switches (including the vNetwork Distributed Switch)
• Physical network managed/provisioned by the existing networking vendor's tools and applications
• Network visibility ends at the physical switch port
• Different interfaces and tools:
• IOS CLI for the physical network
• VC GUI and esxcfg CLI for vSwitches
45 Confidential
vNetwork Standard Switch
What is it?
• Virtual network living inside ESX providing
interconnectivity between VMs and the external
physical network via standard networking
protocols (Ethernet)
• Enables many VMs to share same physical NIC
and communicate directly with each other
Standard Networking Features
• L2 Ethernet switching (inter-VM traffic)
• VLAN Segmentation
• Rate limiting - restrict traffic generated by a VM
• NIC port aggregation and redundancy for
enhanced availability and load balancing of
physical network resources (VMware NIC
Teaming)
I/O Features
• Enhanced VMXNET, E1000, VLANCE
• Checksum off-loading, TSO, Jumbo Frames,
NetQueue
• 10GigE, FCoE
• IB (community support)
46 Confidential
vNetwork Standard Switch – Up Close
[Diagram: a vNetwork Standard Switch (vSwitch) on each ESX host, with port groups created per host, uplinks (physical NICs) attached to the vSwitch, and virtual machines connected to the port groups.]
47 Confidential
vNetwork Standard Switch
[Diagram: four ESX hosts (Host1-Host4), each with its own standard virtual switch and a "Virtual Machine Network" port group connecting two Windows 2003 VMs per host.]
48 Confidential
vNetwork Distributed Switch (vDS)
Aggregated cluster level (and
beyond) virtual network
management
Simplified setup and change
Easy troubleshooting, monitoring
and debugging
Additional features include:
Private VLANs
Bi-directional traffic shaping
Network VMotion
3rd party distributed switch
support
Bundled with vSphere Enterprise
Plus
[Diagram: VMware vSphere with a single vNetwork Distributed Switch spanning multiple ESX hosts, each running several application VMs.]
49 Confidential
vNetwork Distributed Switch (vDS) - Continued
[Diagram: ESX Hosts 1-3, each with Service Console and vmkernel ports, attached to the production network; vCenter Server holds the vDS representation, with distributed ports A-L spanning the hosts.]
The virtual switch control planes are aggregated in vCenter Server
The data plane remains in each ESX host and is responsible for frame forwarding, teaming, etc.
DV Port Groups are aggregated over the entire vDS and across hosts, and group ports with the same configuration and policy
50 Confidential
vNetwork Distributed Switch: Configuration View
• DV Port Groups span all hosts covered by the vDS
• The DV Uplink Port Group defines uplink policies
• DV Uplinks abstract the actual physical NICs (vmnics) on the hosts
• vmnics on each host are mapped to dvUplinks
51 Confidential
vSphere Networking - 3rd Party - Distributed Switch Style
[Diagram: a single 3rd-party distributed virtual switch spanning Host1-Host4, with one distributed port group connecting the Windows 2003 VMs on all four hosts to the 3rd-party distributed virtual machine network.]
52 Confidential
vNetwork Appliance API
• Filter driver in vmkernel to provide security features within ESX networking layer
• vNetwork Appliance APIs available to partners
• Clients of this API may inspect/alter/drop/inject any frame on a given port:
• Either directly in the IO path (fast path agent)
• Or by punting frames up to an appliance VM (slow path agent)
• State mobility for data in fast path agent and slow path agent
• Communication between slow path and fast path agents
• Bind to a VM's vNIC or to a dvswitch port
Lightweight filtering in the "fast path" agent
Heavyweight filtering in the "slow path" agent
53 Confidential
vNetwork - 3rd Party Virtual Switches – Who does what?
Roles and Responsibilities | vNetwork Distributed Switch | vNetwork (with 3rd Party virtual switching)
Associate VMs to virtual networks | vSphere Admin | vSphere Admin
Associate server NICs to virtual networks | vSphere Admin | vSphere Admin
Create Virtual Switches | vSphere Admin | Network Admin
Create Port Groups | vSphere Admin | Network Admin
Modify VLAN Settings (virtual) | vSphere Admin | Network Admin
Configure NIC Team | vSphere Admin | Network Admin
Monitor Virtual Network | vSphere Admin | Network Admin
3rd Party Virtual Switches enable end-to-end physical and virtual networking feature parity
Network admins are now able to provision and monitor the virtual network using existing physical network management tools
54 Confidential
Nexus 1000V & vCenter Server Views
[Screenshots: "show interface" and "show module" output from the Nexus 1000V VSM console, side by side with the corresponding view from the vSphere Client connected to vCenter Server; an "access" port assigned to a single VLAN, with the VSM and VEM labeled.]
55 Confidential
vDS Deployment Options
vSS, vDS and Nexus Switches
can co-exist on same host
Network VMotion only required
for Guest VMs
• Optionally leave SC, vmkernel
ports on vSS
• Note: enhanced features only on
vDS
[Diagram: three stages. Original Environment (all port groups on vSS); Partial Migration to vDS (VMs use the vDS while Service Console and vmkernel ports remain on vSS); Complete Migration to vDS.]
KB - Migrating virtual machines between vSwitch or PortGroups to vDS or dvPortgroups (1010612)
56 Confidential
vDS Deployment Options - Continued
[Diagram: deployment options with the Cisco Nexus 1000V. Original Environment (vSS only); Partial Migration to Nexus 1000V; Complete Migration to Nexus 1000V; a Multiple-vDS layout on the same hosts.]
57 Confidential
vDS Deployment Rules
vSS, vDS, Nexus 1000V can co-exist
• Multiple vSS and vDS per host
• Maximum of one Nexus 1000V per host (VEM)
Take note of deployment limits (subject to change!)
• Refer to published limits
pnics (vmnics) can only belong to one virtual switch
58 Confidential
vNetwork Solution Comparisons
Feature | VMware Standard Switch | VMware Distributed Switch | Cisco Nexus 1000V
Virtual Network Model | Per Host | Per "Datacenter" | Per "Datacenter"
L2 Forwarding | YES | YES | Cisco Catalyst / Nexus features and functionality (this cell spans all feature rows)
VLAN Segmentation | YES | YES |
802.1Q Tagging | YES | YES |
NIC Teaming | YES | YES |
TX Rate Limiting | YES | YES |
CDP Support | YES | YES |
vNetwork Appliance APIs | YES | YES |
Datacenter-level management | | YES |
RX Rate Limiting | | YES |
VM Network Port Block | | YES |
PVLAN Support | | YES |
Network VMotion | | YES |
3rd Party Distributed Switch Support | | YES |
59 Confidential
vSphere Networking Summary
• What is it?
• Virtual network (i.e., set of virtual switches) living inside ESX providing interconnectivity between VMs and the external physical network
• Enables many VMs to share physical NICs and communicate directly with each other
• Virtual Networking with vSphere 4
• L2 Switching Features and Management
Cluster level unified virtual network management
Datacenter class features including VLAN, Private VLANs, CDP, RX/TX rate limiting etc.
Built-in availability (NIC Teaming) providing pNIC redundancy, availability and load balancing
• vNetwork Platform Extensibility
3rd Party Distributed Switch Support (Cisco Nexus 1000-V)
VMsafe-Net Support
• IPv6 Support (VM, management, VC server)
• vSphere 4 I/O Features
• VMXNET Generation 3 (VMXNET3)
• HW offloading
(Checksum/TSO/LRO)
• Jumbo Frames (VM, NFS and
SW iSCSI)
• NetQueue v2
• VMDirectPath
• 10GigE
• FCoE
© 2009 VMware Inc. All rights reserved
Confidential
vSphere Networking Best Practices
David Garcia – NASA L2 Escalation Engineer, GSS
61 Confidential
ESX Virtual Switch: Capabilities
Layer 2 switch—forwards frames based
on 48-bit destination MAC address in
frame
MAC address known by registration
(it knows its VMs!)—no MAC
learning required
Can terminate VLAN trunks (VST mode)
or pass trunk through to VM (VGT mode)
Physical NICs associated with
vSwitches
NIC teaming (of uplinks)
• Availability: uplink to multiple
physical switches
• Load sharing: spread load
over uplinks
[Diagram: VM0 and VM1 attached to a vSwitch; each MAC address is assigned to a vNIC.]
62 Confidential
ESX Virtual Switch: Forwarding Rules
The vSwitch will forward frames:
• VM to VM
• VM to uplink
But will not forward:
• vSwitch to vSwitch
• Uplink to uplink
The ESX vSwitch will not create loops in the physical network
and will not affect Spanning Tree (STP) in the physical network
[Diagram: VMs with MAC a, MAC b, and MAC c on two vSwitches, each uplinked to the physical switches.]
63 Confidential
Spanning Tree Protocol (STP) Considerations
Spanning Tree Protocol used to create
loop-free L2 tree topologies
in the physical network
• Some physical links are put in a "blocking" state to construct the loop-free tree
ESX vSwitch does not participate
in Spanning Tree and will not create
loops with uplinks
• ESX Uplinks will not block and always
active (full use of all links)
[Diagram: a vSwitch with uplinks to physical switches; the switches send BPDUs every 2 s to construct and maintain the Spanning Tree topology, with one link blocked, while the vSwitch drops BPDUs.]
Recommendations for physical network config:
1. Leave Spanning Tree enabled on the physical network and ESX-facing ports (i.e., leave it as is!)
2. Use "portfast" or "portfast trunk" on ESX-facing ports (puts ports in the forwarding state immediately)
3. Use "bpduguard" to enforce the STP boundary
KB - STP may cause temporary loss of network connectivity when a failover or failback event occurs (1003804)
64 Confidential
NIC Teaming for Availability and Load Sharing
NIC Teaming aggregates multiple
physical uplinks for:
• Availability—reduce exposure
to single points of failure
(NIC, uplink, physical switch)
• Load Sharing—distribute load over
multiple uplinks (according to selected NIC
teaming algorithm)
Requirements:
• Two or more NICs on same vSwitch
• Teamed NICs on same L2
broadcast domain
[Diagram: VM0 and VM1 on a vSwitch whose uplinks form a NIC team.]
KB - NIC teaming in ESX Server (1004088)
KB - Dedicating specific NICs to portgroups while maintaining NIC teaming and failover for the vSwitch (1002722)
65 Confidential
NIC Teaming Options
Name | Algorithm (vmnic chosen based upon) | Physical Network Considerations
Originating Virtual Port ID | vnic port | Teamed ports in same L2 domain (BP: team over two physical switches)
Source MAC Address | MAC seen on vnic | Teamed ports in same L2 domain (BP: team over two physical switches)
IP Hash* | Hash(SrcIP, DstIP) | Teamed ports configured in static 802.3ad "EtherChannel" (no LACP; needs MEC to span 2 switches)
Explicit Failover Order | Highest-order uplink from the active list | Teamed ports in same L2 domain (BP: team over two physical switches)
Best Practice: use Originating Virtual Port ID for VMs
*KB - ESX Server host requirements for link aggregation (1001938)
*KB - Sample configuration of EtherChannel/Link aggregation with ESX and Cisco/HP switches (1004048)
66 Confidential
NIC Teaming with vDS
Teaming Policies Are Applied in DV Port Groups to dvUplinks
[Diagram: an "Orange" DV Port Group teaming policy applied to dvUplinks 0-3 across hosts esx09a, esx09b, esx10a, and esx10b.tml.local; each dvUplink maps to a vmnic on each host, and the mapping can differ per host (e.g., dvUplink0 maps to vmnic0 on three hosts but to vmnic2 on esx10b).]
KB - vNetwork Distributed Switch on ESX 4.x - Concepts Overview (1010555)
67 Confidential
Link Aggregation
68 Confidential
Link Aggregation - Continued
EtherChannel
EtherChannel is Cisco's port-trunking (link-aggregation) technology, used primarily on Cisco switches
Can be created from between two and eight active Fast Ethernet, Gigabit Ethernet, or 10 Gigabit Ethernet ports
LACP or IEEE 802.3ad
Link Aggregation Control Protocol (LACP) is included in the IEEE specification as a method to control the bundling of several physical ports together to form a single logical channel
Only supported on the Nexus 1000V
EtherChannel vs. 802.3ad
EtherChannel and the IEEE 802.3ad standard are very similar and accomplish the same goal
There are a few differences between the two, beyond EtherChannel being Cisco proprietary and 802.3ad an open standard
EtherChannel Best Practice
One-IP-to-one-IP connections over multiple NICs are not supported (one connection session from Host A to Host B uses only one NIC)
Supported Cisco configuration: EtherChannel Mode ON (enable EtherChannel only)
Supported HP configuration: Trunk Mode
Supported switch aggregation algorithm: IP-SRC-DST (short for IP-Source-Destination)
The only load-balancing option for a vSwitch or vDistributed Switch that can be used with EtherChannel is IP HASH (see the sample switch config below)
Do not use beacon probing with IP HASH load balancing
Do not configure standby uplinks with IP HASH load balancing
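A minimal Cisco IOS sketch of the supported static EtherChannel (mode ON, no LACP) paired with an IP-hash vSwitch; the interface numbers and description are examples:
interface range GigabitEthernet1/1 - 2
 description esx01 vmnic0-vmnic1
 switchport mode trunk
 channel-group 1 mode on
!
port-channel load-balance src-dst-ip
On the ESX side, set the vSwitch (or DV port group) load balancing to "Route based on IP hash" with both vmnics active.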
69 Confidential
Failover Configurations
• Link Status Only relies solely on the link status provided by the network adapter
• Detects failures such as cable pulls and physical switch power failures
• Cannot detect configuration errors:
• Switch port being blocked by Spanning Tree
• Switch port configured for the wrong VLAN
• Cable pulls on the far side of a physical switch
• Beacon Probing sends out and listens for beacon probes
• Ethernet broadcast frames sent by physical adapters to detect upstream network connection failures, sent on all physical Ethernet adapters in the team (see figure)
• Detects many of the failures mentioned above that are not detected by link status alone
• Should not be used as a substitute for a redundant Layer 2 network design
• Most useful for detecting failures in the switch closest to the ESX Server hosts
• Beacon Probing Best Practice:
• Use at least 3 NICs for triangulation
• If only 2 NICs are in the team, the probe cannot determine which link failed ("shotgun mode" results)
• KB - What is beacon probing? (1005577)
• KB - ESX host network flapping error when Beacon Probing is selected (1012819)
• KB - Duplicated Packets Occur when Beacon Probing Is Selected Using vmnic and VLAN Type 4095 (1004373)
• KB - Packets are duplicated when you configure a portgroup or a vSwitch to use a route that is based on IP-hash and Beacon Probing policies simultaneously (1017612)
[Figure: using beacons to detect upstream network connection failures.]
70 Confidential
Port Group Configuration
A Port Group is a template for one or more ports with a common configuration
• Assigns VLAN to port group members
• L2 Security—select "Reject" so the VM sees only frames for its own MAC address
• Promiscuous mode / MAC address change / Forged transmits
• Traffic Shaping—limit egress traffic from the VM
• Load Balancing—Originating Virtual Port ID, Source MAC, IP Hash, Explicit
• Failover Policy—Link Status & Beacon Probing
• Notify Switches—"yes" gratuitously tells switches of MAC location
• Failback—"yes" if there is no fear of blackholing traffic, or...
• ...use Failover Order in "Active Adapters"
Distributed Virtual Port Group (vNetwork Distributed Switch)
• All above plus:
• Bidirectional traffic shaping (ingress and egress)
• Network VMotion—network port state migrated upon VMotion
71 Confidential
VMXNET3—The Para-virtualized VM Virtual NIC
• Next evolution of the "Enhanced VMXNET" introduced in ESX 3.5
• Adds
• MSI/MSI-X support (subject to guest operating system kernel support)
• Receive Side Scaling (supported in Windows 2008 when explicitly enabled
through the device's Advanced configuration tab)
• Large TX/RX ring sizes (configured from within the virtual machine)
• High performance emulation mode (Default)
• Supports
• High DMA
• TSO (TCP Segmentation Offload) over IPv4 and IPv6
• TCP/UDP checksum offload over IPv4 and IPv6
• Jumbo Frames
• 802.1Q tag insertion
KB - Choosing a network adapter for your virtual machine (1001805)
72 Confidential
VMDirectPath for VMs
What is it?
• Enables direct assignment of PCI devices to a VM
Types of workloads
• I/O appliances
• High-performance VMs
Details
• Guest controls the physical H/W
Requirements
• vSphere 4
• I/O MMU, used for DMA address translation (guest physical to host physical) and protection
• Generic device reset (FLR, Link Reset, ...)
[Diagram: the guest's device driver bypasses the virtual layer and drives the I/O device directly.]
KB - Configuring VMDirectPath I/O pass-through devices on an ESX host (1010789)
73 Confidential
FCoE on ESX
VMware ESX Support
• FCoE supported since ESX 3.5u2
• Requires Converged Network Adapters "CNAs" (see HCL), e.g.:
• Emulex LP21000 Series
• Qlogic QLE8000 Series
• Appears to ESX as:
• 10GigE NIC
• FC HBA
• SFP+ pluggable transceivers
• Copper twin-ax (<10m)
• Optical
[Diagram: an ESX host with a CNA (Converged Network Adapter) presenting a 10GigE NIC to the vSwitch and an FC HBA; the CNA connects over Ethernet to an FCoE switch, which splits out Fibre Channel and Ethernet.]
74 Confidential
Using 10GigE
2x 10GigE common/expected
• 10GigE CNAs or NICs
Possible deployment method
• Active/Standby on all port groups
• VMs "sticky" to one vmnic
• SC/vmk ports sticky to the other
• Use ingress traffic shaping to control traffic type per port group
• If FCoE, use Priority Group bandwidth reservation (in the CNA utility)
[Diagram: a vSwitch with two 10GigE uplinks carrying iSCSI, NFS, VMotion, FT, and SC port groups; ingress (into switch) traffic-shaping policy is applied per port group (e.g., 1-2G low b/w, high b/w, variable/high b/w 2Gbps+), and the FCoE Priority Group bandwidth reservation is set in the CNA config utility.]
75 Confidential
Traffic Types on a Virtual Network
Virtual Machine Traffic
• Traffic sourced and received from virtual machine(s)
• Isolate from each other based on service level
VMotion Traffic
• Traffic sent when moving a virtual machine from one ESX host to another
• Should be isolated
Management Traffic
• Should be isolated from VM traffic (one or two Service Consoles)
• If VMware HA is enabled, includes heartbeats
IP Storage Traffic—NFS and/or iSCSI via vmkernel interface
• Should be isolated from other traffic types
Fault Tolerance (FT) Logging Traffic
• Low latency, high bandwidth
• Should be isolated from other traffic types
How do we maintain traffic isolation without proliferating NICs?
76 Confidential
VLAN Trunking to Server
IEEE 802.1Q VLAN Tagging
• Enables logical network partitioning
(Traffic separation)
• Scale traffic types without scaling physical NICs
• Virtual machines connect to virtual
switch ports (like access ports
on physical switch)
• Virtual switch ports are associated
with a particular VLAN (VST mode)—defined
in PortGroup
• Virtual switch tags packets exiting host
[Diagram: VM0 and VM1 attached to port group "Yellow" (VLAN 10) and port group "Blue" (VLAN 20) on a vSwitch; the uplinks are VLAN trunks carrying VLANs 10 and 20. The 802.1Q header carries EtherType 0x8100 and a 12-bit VLAN ID field (0-4095).]
77 Confidential
VLAN Tagging Options
VST – Virtual Switch Tagging: VLAN tags applied in the vSwitch; the VLAN is assigned in the port group policy
VGT – Virtual Guest Tagging: VLAN tags applied in the guest; the port group is set to VLAN "4095"
EST – External Switch Tagging: the external physical switch applies the VLAN tags
VST is the best practice and the most common method
[Diagram: each mode shown as a vSwitch/physical-switch pair, differing only in where the tag is applied.]
78 Confidential
VLAN Tagging: Further Example
KB -Sample configuration of virtual switch VLAN tagging (VST Mode) and ESX Server (1004074)
Uplinks A, B, and C connected to trunk ports on physical switch which carry four VLANs
(e.g. VLANs 10, 20, 50, 90)
Ports 1-14 emit untagged frames, and only those frames which were tagged with their respective VLAN ID (equivalent to an "access port" on a physical switch)
• Port Group VLAN ID set to one of 1-4094
Port 15 emits tagged frames for all VLANs
• Port Group VLAN ID set to 4095 (for vSS) or "VLAN Trunking" on a vDS DV Port Group
[Diagram: vSwitch ports 1-14 are access ports on VLANs 10, 20, and 50; port 15 trunks all VLANs (10, 20, 50, 90) to its VM; uplinks A, B, and C connect to VLAN trunks carrying VLANs 10, 20, 50, 90.]
interface GigabitEthernet1/2
description host32-vmnic0
switchport trunk encapsulation dot1q
switchport trunk native vlan 999
switchport trunk allowed vlan 10,20,50,90
switchport mode trunk
spanning-tree portfast trunk
(Example configuration on the physical switch.)
79 Confidential
Private VLANs: Traffic Isolation for Every VM
Solution: PVLAN
• Place VMs on the same virtual network
but prevent them from communicating
directly with each other (saves VLANs!)
• Avoids scaling issues from assigning
one VLAN and IP subnet per VM
Details
• Instead, configure a SINGLE DV port group to have a SINGLE isolated VLAN (only one)
• Attach all your VMs to this single isolated-VLAN DV port group
[Diagram: a Distributed Switch with PVLAN providing private VLAN traffic isolation between guest VMs, with a common primary VLAN on the uplinks.]
KB - Private VLAN (PVLAN) on vNetwork Distributed Switch - Concept Overview (1010691)
80 Confidential
[Diagram: without PVLANs, twelve VMs on the vNetwork Distributed Switch need twelve port groups, TOTAL COST: 12 VLANs (one per VM); with a single DV port group on an isolated PVLAN, TOTAL COST: 1 PVLAN (over 90% savings).]
Private VLANs - Continued
81 Confidential
Designing the Network
How do you design the virtual network for performance and availability while maintaining isolation between the various traffic types (e.g., VM traffic, VMotion, and management)?
• Starting point depends on:
• Number of available physical ports on server
• Required traffic types
• 2 NIC minimum for availability, 4+ NICs
per server preferred
• 802.1Q VLAN trunking highly recommended for logical
scaling (particularly with low NIC port servers)
• Previous examples are meant as guidance and do not
represent strict requirements in terms of design
• Understand your requirements and resultant traffic types and
design accordingly
82 Confidential
Tips & Tricks
• KB - Changing a MAC address in a Windows virtual machine (1008473)
• When a physical machine is converted into a virtual machine, the MAC address of the network adapter is
changed. This can pose a problem when software is installed where the licensing is tied to the MAC
address.
• KB – Configuring speed and duplex of an ESX Server host network adapter (1004089)
• ESX recommended settings for Gigabit-Ethernet speed and duplex while connecting to a physical switch
port are as following:
• Auto Negotiate <-> Auto Negotiate
• It is not recommended to mix hard-coded settings with auto-negotiate.
• KB - Sample Configuration - Network Load Balancing (NLB) Multicast mode over routed
subnet - Cisco Switch Static ARP Configuration (1006525)
• NLB Multicast Mode – Static ARP Resolution
• NLB packets are unconventional: the IP address is unicast while its MAC address is multicast, so switches and routers drop NLB packets
• Because the multicast NLB packets are dropped by routers and switches, the ARP tables of switches never get populated with the cluster IP and MAC address
• Manual ARP Resolution of NLB cluster address is required on physical switch and router interfaces
• Cluster IP and MAC static resolution is set on each switch port that connects to ESX host
83 Confidential
Tips & Tricks - Continued
• KB - Cisco Discovery Protocol (CDP) network information via command line and
VirtualCenter on an ESX host (1007069)
• Utilizing Cisco Discovery protocol (CDP) to get switch port configuration information.
• This command is utilized to troubleshoot network connectivity issues related to VLAN tagging methods on
virtual and physical port settings.
• KB - Troubleshooting network issues with the Cisco show tech-support command
(1015437)
• If you experience networking issues between vSwitch and physical switched environment, you can obtain
information about the configuration of a Cisco router or switch by running the show tech-support command
in privileged EXEC mode.
• Note: This command does not alter the configuration of the router.
84 Confidential
Troubleshooting
• KB - ESX host or virtual machines have intermittent or no network connectivity
(1004109)
• KB - Troubleshooting Nexus 1000V vDS network issues (1014977)
• KB - Cisco Nexus 1000V installation and licensing information (1013452)
• Cisco Nexus 1000V Troubleshooting Guide, Release 4.0(4)SV1(2) 20/Jan/2010
• Cisco Nexus 1000V Troubleshooting Guide, Release 4.0(4)SV(1) 21/Jan/2010
• KB - Configuring promiscuous mode on a virtual switch or portgroup (1004099)
• KB - Troubleshooting network issues by capturing and sniffing network traffic via
tcpdump (1004090)
• KB - Troubleshooting network connection issues using Address Resolution Protocol
(ARP) (1008184)
• IEEE OUI and Company id Assignments http://standards.ieee.org/regauth/oui/index.shtml
• KB - Network performance issues (1004087)
85 Confidential
Troubleshooting - Continued
• KB - Low Network Throughput in Windows Guest when Running UDP Application
(5298153)
• KB - Performance of Outgoing UDP Packets Is Poor (10172)
• KB - Poor Network File Copy performance between local VMFS and shared VMFS
(1003554)
• KB - Cannot connect to ESX 4.0 host for 30-40 minutes after boot (1012942)
• Ensure that DNS is configured and reachable from the ESX host
• KB - Identifying issues with and setting up name resolution on ESX Server (1003735)
• Note: localhost must always be present in the hosts file. Do not modify or remove the entry for localhost
• The hosts file must be identical on all ESX Servers in the cluster
• There must be an entry for every ESX Server in the cluster
• Every host must have an IP address, Fully Qualified Domain Name (FQDN), and short name
• The hosts file is case sensitive. Be sure to use lowercase throughout the environment
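A sketch of what that looks like; the host names and addresses below are examples only:
# /etc/hosts, identical on every ESX Server in the cluster, all lowercase
127.0.0.1      localhost.localdomain localhost
192.168.10.11  esx01.example.com esx01
192.168.10.12  esx02.example.com esx02
192.168.10.13  esx03.example.com esx03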
86 Confidential
Must Read…
Conclusion
This study compares performance results for e1000 and
vmxnet virtual network devices on 32-bit and 64-bit guest
operating systems using the netperf benchmark. The results
show that when a virtual machine is running with software
virtualization, e1000 is better in some cases and vmxnet is
better in others. Vmxnet has lower latency, which sometimes
comes at the cost of higher CPU utilization. When hardware
virtualization is used, vmxnet clearly provides the best
performance.
Conclusion
VMXNET3, the newest generation of virtual network adapter from
VMware, offers performance on par with or better than its previous
generations in both Windows and Linux guests. Both the driver
and the device have been highly tuned to perform better on
modern systems. Furthermore, VMXNET3 introduces new
features and enhancements, such as TSO6 and RSS. TSO6
makes it especially useful for users deploying applications that
deal with IPv6 traffic, while RSS is helpful for deployments
requiring high scalability. All these features give VMXNET3
advantages that are not possible with previous generations of
virtual network adapters. Moving forward, to keep pace with an
ever‐increasing demand for network bandwidth, we recommend
customers migrate to VMXNET3 if performance is of top concern
to their deployments.
© 2009 VMware Inc. All rights reserved
Confidential
Break
© 2009 VMware Inc. All rights reserved
Confidential
vSphere 4
Performance Best Practices
Paul Hill - System Management Escalation Engineer, GSS
89 Confidential
Agenda
vSphere 4 Performance Enhancements Overview
Virtual Center
High Availability
Distributed Resource Scheduler
Fault Tolerance
Hardware considerations and settings
CPU Performance
Memory Performance
Benchmarking
Common Support Issues
90 Confidential
vSphere 4 Performance Enhancements
Efficiency - reduced virtualization overheads, higher consolidation ratios
Control – Performance monitoring and management, dynamic resource sizing
providing better scalability
Choice – Offering several options for guest OS, virtualization technologies,
broader HCL, and integration with 3rd party management tools
Scalability Enhancements:
vCPU – 4 to 8
VM memory – 65GB to 255GB
Host CPU core – 32 to 64
Host max memory – 256GB to 1TB
Powered-on VM per host – 128 to 320
91 Confidential
Performance Enhancements (cont.)
New virtual machine HW support
• SAS for MSCS 2008 support
• IDE for older guest OSes
• VMXNET generation 3
• VM hotplug support: dynamically add vCPUs and memory
• VMDirectPath: high I/O to physical network devices
CPU enhancements
• Relaxed co-scheduler tuned for SMP VMs
• CPU locking algorithm improved to reduce overhead in situations where scheduling decisions are required
• Scheduler is aware of processor cache to optimize CPU usage
92 Confidential
Memory Enhancements
HW-Assisted Memory Virtualization
VMs differ from physical servers with respect to virtual memory address translation
Guest virtual memory addresses must be translated to guest physical addresses using the guest OS's page tables before being translated to machine physical memory addresses
Shadow page tables are required for this mapping, which incurs CPU and memory overhead
Current processor HW is now available to improve this situation with 2nd-level page table support:
• Intel EPT (Extended Page Tables)
• AMD RVI (Rapid Virtualization Indexing)
TLB-miss latency is higher with 2nd-level page tables than with a single page table, but using large pages in the guest OS will improve performance
93 Confidential
Virtual Center Scalability
Ability to support large-scale virtual datacenters
Single VC server = manage 300 ESX hosts, 3000 virtual machines
Ability to link multiple VC servers with "Linked Mode" to manage up to 10,000 virtual machines from a single console
Performance charts
• View CPU, memory, disk, and network from a single dashboard
• Identify top resource consumers from an aggregate view
• Thumbnail views for hosts, resource pools, clusters, and datastores to quickly navigate to a specific chart
• Easily drill down at many levels of the inventory to isolate a performance issue
• Datastore utilization by file type and unused capacity
Application performance
• Oracle, SQL Server, SAP, Exchange, Java
Tools – VMmark, AppSpeed
94 Confidential
VirtualCenter Performance
High CPU utilization and sluggish UI performance
Number of clients attached is high
VC needs to keep clients consistent with inventory changes
Aggressive alarm settings
DB administration
Periodic maintenance
Recovery and log settings
Appropriate VC statistics level
Use gigabit NICs for the service console to clone VMs
Assign permissions appropriately
SQL Server Express will only run well up to 5 hosts and/or 50 VMs. Past that, VC needs to run off an Enterprise-class DB.
95 Confidential
Virtual Center Best Practices
VC Database sizing
Estimate of the space required to store your performance statistics in the DB
Separate Critical Files onto Separate Drives
Make sure the database and transaction log files are placed on separate
physical drives
Place the tempdb database on a separate physical drive if possible
This arrangement distributes the I/O to the DB and dramatically improves its performance
If a third drive is not feasible, place the tempdb files on the transaction log drive
Enable Automatic Statistics
Keep vCenter logging level low, unless troubleshooting
Proper scheduling of DB backups, maintenance, monitoring
Do not run vCenter on a server that has many applications running
vCenter Heartbeat - http://www.vmware.com/products/vcenter-server-
heartbeat/
96 Confidential
High Availability (HA)
HA network configuration check – DNS, NTP, lowercase hostnames, HA advanced settings
Redundancy: server hardware, shared storage, network, management
Test network isolation from a core switch level, and host failure for expected outage behavior
Critical VMs should NOT be grouped together
Categorize VM criticality, then set the failover appropriately
Valid VM network label names required for proper failover
Failover capacity/Admission control may be too conservative when host and VM sizes vary widely – slot size calculator in VC
97 Confidential
DRS (Distributed Resource Scheduler)
Higher number of hosts => more DRS balancing options
Recommend up to 32 hosts/cluster, may vary with VC server configuration and VM/host ratio
Network configuration on all hosts - VMotion network: security policies, VMotion NIC enabled, Gigabit
Reservations, Limits, and Shares
- Shares take effect during resource contention
- Low limits can lead to wasted resources
- High VM reservations may limit DRS balancing
- Overhead memory
- Use resource pools for better manageability, do not nest too deep
Virtual CPUs and memory size
- High memory size and many virtual CPUs => fewer migration opportunities
- Configure VMs based on need (CPU, memory, network, etc.)
98 Confidential
DRS (Cont.)
Ensure hosts are CPU compatible
- Intel vs. AMD
- Similar CPU family/features
- Consistent server bios levels, and NX bit exposure
- Enhanced VMotion Compatibility (EVC)
- "VMware VMotion and CPU Compatibility" whitepaper
- CPU incompatibility => limited DRS VM migration options
Larger Host CPU and memory size preferred for VM placement (if all equal)
Differences in cache or memory architecture => inconsistency in performance
Aggressiveness threshold - Moderate threshold (default) works well for most cases
Aggressive thresholds recommended if homogenous clusters and VM demand relatively
constant and few affinity/anti-affinity rules
Use affinity/anti-affinity rules only when needed
Affinity rules: closely interacting VMs Anti-affinity rules: I/O intensive workloads, availability
Automatic DRS mode recommended (cluster-wide)
Manual/Partially automatic mode for location-critical VMs (per VM)
Per VM setting overrides cluster-wide setting
99 Confidential
FT - Fault Tolerance
FT Provides complete VM redundancy
By definition, FT doubles resource requirements
Turning on FT disables performance-enhancing features such as H/W MMU virtualization
Each time FT is enabled, it causes a live migration
Use a dedicated NIC for FT traffic
Place primaries on different hosts
Asynchronous traffic patterns
Host Failure considerations
Run FT on machines with similar characteristics
100 Confidential
HW Considerations and Settings
When purchasing new servers, target processors with MMU virtualization (EPT/RVI), or at least CPU virtualization (VT-x/AMD-V), depending on your application workloads
If your application workload creates/destroys a lot of processes or allocates a lot of memory, then MMU virtualization will help performance
Purchase uniform, high-speed, quality memory, and populate memory banks evenly in powers of 2
For better I/O performance, choose a system with MSI-X, which allows support for multiple queues across multiple processors to process I/O in parallel
The PCI slot configuration on the motherboard should support PCIe v2.0 if you intend to use 10 Gb cards; otherwise you will not utilize the full bandwidth
101 Confidential
HW Considerations and Settings (cont.)
BIOS Settings
- Make sure what you paid for is enabled in the BIOS
- Enable "Turbo Mode" if your processors support it
- Verify that hyper-threading is enabled – more logical CPUs allow more options for the VMkernel scheduler
- On NUMA systems, verify that node interleaving is disabled (ESX's NUMA optimizations are available only when node interleaving is disabled; see the Memory Performance section)
- Be sure to disable power management if you want to maximize performance, unless you are using DPM. You need to decide whether performance outweighs power savings
- The C1E halt state causes parts of the processor to shut down for short periods of time to save energy and reduce thermal loss
- Verify VT/NPT/EPT are enabled, as older Barcelona systems do not enable these by default
- Disable any unused USB or serial ports
102 Confidential
Resource Types - CPU
CPU resources are the raw processing speed of a given host or
VM
However, on a more abstract level, we are also bound by the
hosts’ ability to schedule those resources.
We also have to account for running a VM in the most optimal
fashion, which typically means running it on the same processor
that the last cycle completed on.
103 Confidential
CPU Performance
Some multi-threaded apps in an SMP VM may not perform well
Use multiple UP VMs on a multi-CPU physical machine
104 Confidential
CPU Performance
CPU virtualization adds varying amounts of overhead
Little or no overhead for the part of the workload that can run in direct
execution
Small to significant overhead for virtualizing sensitive privileged instructions
Performance reduction vs. increase in CPU utilization
CPU-bound applications: any CPU virtualization overhead results in reduced
throughput
non-CPU-bound applications: should expect similar throughput at higher CPU
utilization
105 Confidential
CPU Performance
ESX supports up to eight virtual processors per VM
• Use UP VMs for single-threaded applications
• Use UP HAL or UP kernel
• For SMP VMs, configure only as many VCPUs as needed
• Unused VCPUs in SMP VMs:
• Impose unnecessary scheduling constraints on ESX Server
• Waste system resources (idle looping, process migrations, etc.)
106 Confidential
CPU Performance
Full support for 64-bit guests
64-bit can offer better performance than 32-bit
• More registers, large kernel tables, no HIGHMEM issue in Linux
ESX Server may experience performance problems due to shared
host interrupt lines
• Can happen with any controller; most often with USB
• Disable unused controllers
• Physically move controllers
• See KB 1290 for more details
107 Confidential
Resource Types - Memory
When assigning a VM a "physical" amount of RAM, all you are really doing is telling ESX how much memory a given VM process will maximally consume, beyond the overhead.
Whether or not that memory is physical depends on a few factors: Host
configuration, DRS shares/Limits/Reservations and host load.
Generally speaking, it is better to OVER-commit than UNDER-commit.
108 Confidential
Memory Performance
ESX memory space overhead
Service Console: 272 MB
VMkernel: 100 MB+
Per-VM memory space overhead increases with:
Number of VCPUs
Size of guest memory
32 or 64 bit guest OS
ESX memory space reclamation
Page sharing
Ballooning
109 Confidential
Memory Performance
Page tables
ESX cannot use guest page tables
ESX Server maintains shadow page tables
Translate memory addresses from virtual to machine
Per process, per VCPU
VMM maintains physical (per VM) to machine maps
No overhead from "ordinary" memory references
Overhead:
Page table initialization and updates
Guest OS context switching
[Diagram: address translation from VA (virtual address) to PA (physical address) to MA (machine address).]
110 Confidential
Memory Performance
Avoid high active host memory over-commitment
• Total memory demand = active working sets of all VMs
+ memory overhead
– page sharing
• No ESX swapping: total memory demand < physical memory
Right-size guest memory
• Define adequate guest memory to avoid guest swapping
• Per-VM memory space overhead grows with guest memory
111 Confidential
Memory Performance
Increasing a VM’s memory on a NUMA machine
Will eventually force some memory to be allocated from a remote node, which
will decrease performance
Try to size the VM so both CPU and memory fit on one node
[Diagram: a VM's memory split across NUMA Node 0 and Node 1.]
112 Confidential
Memory Performance
NUMA scheduling and memory placement policies in ESX manages all VMs transparently
No need to manually balance virtual machines between nodes
NUMA optimizations available when node interleaving is disabled
Manual override controls available
Memory placement: "use memory from nodes"
Processor utilization: "run on processors"
Not generally recommended
For best performance of VMs on NUMA systems
# of VCPUs + 1 <= # of cores per node
VM memory <= memory of one node
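A worked example with hypothetical hardware: on a host with two NUMA nodes of 4 cores and 32 GB each, a VM stays node-local with at most 3 vCPUs (3 + 1 <= 4 cores per node) and at most 32 GB of configured memory; a 4-vCPU or 48 GB VM would be forced to span nodes.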
113 Confidential
Install VMware Tools
vmxnet – high speed networking driver
Memory balloon driver
Improved graphics – mks, screen resolution
Idler program – deschedules NetWare guests when idle
Timer sponge for correct accounting of time
Experimental, manually started
www.vmware.com/pdf/vi3_esx_vmdesched.pdf
Time Sync – syncs time with the host every minute
Manually started (KB 1318)
114 Confidential
Benchmarking Guidelines
Carefully select benchmarks
Represents application
Documentation
Repeatability
Define parameters being measured and their metrics
Throughput (MBps), latency (ms)
Benchmark a specific system component
Monitor specific component metrics
Ensure no other component on the system is constrained
Or document any such constraint
For comparisons, preferably vary single parameter at a time
115 Confidential
Benchmarking Guidelines
Comparing native and virtual machines
# of Physical CPUs = # of Virtual CPUs
Native Kernel/HAL = VM Kernel/HAL
Physical Memory = VM Memory
Same bitness (32 or 64) of OS and application
Timing within the VM can be inaccurate
Especially when the processor is over-committed
Use an external time source (e.g., the "ping" methodology)
Performance tools may not work accurately in a VM
116 Confidential
Benchmarking Guidelines
VMmark: A scalable benchmark for virtualized enterprise systems
• Provides meaningful measurement of virtualization performance
• Generates metric that scales with underlying system capacity
• Used to compare the performance of different hardware and virtualization platforms
• Employs realistic, diverse workloads running on multiple OSes
• Mail server: Windows 2003 / MS Exchange 2003 / LoadSim
• Java server: Windows 2003 / SPECjbb2005
• Web server: SLES10 / SPECweb2005
• Database server: SLES10 / MySQL / SysBench
• File server: SLES10 / DBench
117 Confidential
Common Support Issues – ESX/ESXi
Snapshots are not a backup tool
Running a VM on a snapshot may cause performance issues, and could fill up
the data stores
VM-support logs, archive point in time
ESXi - profile capture – syslog export – VIMA - mix with classic
Set the COS memory to 800 MB
v4 COS: custom build of RHEL 5u3; limit non-standard RPMs
Hardware health monitoring
AMD servers – NUMA memory balance
Size guest memory appropriately to avoid guest OS swapping
118 Confidential
Common Support Issues – Troubleshooting
Collect logs at the time of the problem, and before the support request is initiated on the website.
Limit configuration changes
Call right away, always better to ask first than wait
Stay current; be proactive, not reactive, or it could be too late
Keep VMware tools up to date
KB, documentation, technical papers
119 Confidential
Reference links
http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf
http://www.vmware.com/resources/techresources/10041
http://www.vmware.com/resources/techresources/10054
http://www.vmware.com/resources/techresources/10066
http://www.vmware.com/files/pdf/perf-vsphere-cpu_scheduler.pdf
http://www.vmware.com/pdf/RVI_performance.pdf
http://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf
http://www.vmware.com/files/pdf/perf-vsphere-fault_tolerance.pdf
© 2009 VMware Inc. All rights reserved
Confidential
Questions