Presentation: vSphere 5 Storage Best Practices
TRANSCRIPT
-
vSphere 5 Storage Best Practices
Chad Sakac, EMC Corporation
Vaughn Stewart, NetApp
INF-STO2980
#vmworldinf
-
The Great Protocol Debate
Every protocol can be highly available, and generally, every protocol can meet a broad performance band.
Each protocol has different configuration considerations.
In vSphere, there is core feature equality across protocols.
-
The Great Protocol Debate
Source: Virtual Geek Poll
-
The Great Protocol Debate
NetApp AutoSupport July 2012
Large-scale NetApp customers favor NAS
-
The Great Protocol Debate
[Chart: percentage of respondents that deployed each protocol (multiple selections allowed, n=158, selections=371)]
DAS 17%, NFS 51%, iSCSI 47%, FC 67%, FCoE 18%, InfiniBand 3%, AoE 1%
Source: Wikibon Survey July 2012
-
The Great Protocol Debate
Every protocol can be highly available, and generally, every protocol can meet a broad performance band.
Each protocol has different configuration considerations.
In vSphere, there is core feature equality across protocols.
Conclusion: there is no debate. Pick what works for you!
The best flexibility comes from a combination of VMFS and NFS.
-
A Packed Agenda
Six key things to do:
1. Leverage key docs
2. Set up multipathing right
3. Alignment = good hygiene
4. Leverage vCenter plug-ins, VAAI, and VASA
5. KISS guidelines for layout
6. Use SDRS and SIOC if you can
Also: what to do when you are in trouble, when to break the rules, and a peek into the future.
-
Leverage Key Docs
Key Best Practices 2012, No. 1
-
Key VMware Resources & Documents
VMware Technical Resource Center: Storage Connectivity
- Fibre Channel SAN Configuration Guide
- iSCSI SAN Configuration Guide
- Best Practices for NFS Storage
Understand storage taxonomy:
- LUN ownership: Active/Active, Active/Passive, Virtual Port
- Multipathing: SAN / NAS
"Highly Recommended" is a kind way of saying "This Is Mandatory Reading".
http://www.vmware.com/technical-resources/virtual-storage/resources.html
-
Key Partner Documents
Storage varies far more vendor to vendor than servers do. Stay current on your array's best practices. Even if you're NOT the storage team, read them.
NetApp:
- 7-Mode: Technical Report TR-3749
- Cluster-Mode: Technical Report TR-4068
EMC:
- VNX and vSphere TechBook (h8229)
- VMAX and vSphere TechBook (h2529)
- Isilon and vSphere Best Practices Guide (h10522)
http://www.emc.com/collateral/hardware/technical-documentation/h8229-vnx-vmware-tb.pdf
http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wp-ldv.pdf
http://www.emc.com/collateral/hardware/white-papers/h10522-bpg-isilon-and-vmware-vsphere5.pdf
-
Set Up Multipathing Right
Key Best Practices 2012, No. 2
-
Understanding the vSphere Pluggable Storage Architecture
-
What's out of the box in vSphere?
Path Selection Policies (PSP):
- Fixed (default for Active-Active arrays): I/O traverses the preferred path; reverts to the preferred path after failure.
- MRU (default for many Active-Passive arrays): I/O traverses the preferred path; remains on the alternative path after failure.
- Round Robin: I/O traverses all paths; default is 1000 IOPS per path; ALUA sets path preference. Notable change in vSphere 5.1: this is the default for EMC VNX (R32) and VMAX. If upgrading, claim rules are unchanged.
To change a PSP, use your vendor's vCenter plug-in (easy), or:
esxcli storage nmp device set --device <naa.id> --psp VMW_PSP_RR
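A minimal command-line sketch of the PSP change, assuming the ESXi 5.x esxcli namespace; the naa. device ID below is a made-up placeholder, not a real device:

```shell
# List storage devices with their current SATP and PSP assignments
esxcli storage nmp device list

# Set Round Robin on one device (replace the naa. ID with your own)
esxcli storage nmp device set --device naa.60060160a62a2e00 --psp VMW_PSP_RR

# Optionally make Round Robin the default PSP for a whole SATP,
# so newly claimed devices pick it up automatically
esxcli storage nmp satp set --satp VMW_SATP_ALUA --default-psp VMW_PSP_RR
```

The per-SATP default is usually preferable to per-device changes at scale, which is why the vendor vCenter plug-ins automate exactly this step.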
-
What is Asymmetric Logical Unit Access (ALUA)?
ALUA allows paths to be profiled:
- Active (optimized)
- Active (non-optimized)
- Standby
- Dead (target unreachable = APD, All Paths Down)
- Gone Away (device unreachable = PDL, Permanent Device Loss)
Ensures optimal path selection by the vSphere PSPs and 3rd-party MPPs.
[Diagram: SP A and SP B, each with paths to the same LUN]
-
Understanding SAN Multipathing
MPIO is based on initiator-target sessions, not links.
-
Multipathing with NFSv3
[Diagram: two ESXi host configurations, each with vmnic0/vmnic1 uplinked through switches to SP A and SP B]
Option 1: NIC teams with the "Route based on IP hash" load-balancing policy. Requires cross-stack EtherChannel or switch-port static/dynamic link aggregation, an active/active configuration, and a single switch or stacked switches with spanned, teamed switch ports (a feature that may not be available on all switches).
Option 2: allow the VMkernel to make the routing decisions (multiple vmknics on separate subnets).
-
Microsoft Cluster Service
Unsupported storage configurations:
- FCoE, iSCSI & NFS datastores
- Round Robin PSP
- N-Port ID Virtualization (NPIV)
Array vendors solve storage gaps: 3rd-party MPPs, guest-connected storage.
Other limits: memory overcommit, vMotion & Fault Tolerance.
vSphere 5.1 has expanded WSFC support: 4 nodes with disk quorum, 5 nodes when using MNS.
-
3rd Party Multipathing Plugins (MPP)
Storage manageability: simple provisioning, predictable & consistent behavior, optimized data-path utilization.
Performance and scale: tuned performance, predictive load balancing, automatic fault recovery.
3rd-party MPPs: EMC PowerPath/VE (now v5.8), Dell/EqualLogic PSP.
[Diagram: many app/OS VMs across hosts running PowerPath, all connected to shared storage]
-
General NFS Best Practices
Use the EMC & NetApp vCenter plug-ins; they automate best practices. Note that vCenter plug-ins from 5.0 and earlier will NOT WORK with vSphere 5.1 (more on this later).
Use multiple NFS datastores & 10GbE. 1GbE requires more complexity to address I/O scaling, due to one data session per connection with NFSv3.
-
General NFS Best Practices - Timeouts
Configure the following on each ESX server (automated by the vCenter plug-ins):
- NFS.HeartbeatFrequency = 12
- NFS.HeartbeatTimeout = 5
- NFS.HeartbeatMaxFailures = 10
Increase Guest OS time-out values to match: HKLM > System > CurrentControlSet > Services > Disk. Select TimeOutValue and set the data value to 125 (decimal).
Increase Net.TcpipHeapSize (follow vendor recommendation).
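A hedged sketch of applying these heartbeat settings from the ESXi 5.x shell (the vendor plug-ins normally do this for you; option paths assume the 5.x advanced-settings namespace):

```shell
# NFS heartbeat tuning per the recommendations above
esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10

# Matching disk timeout inside a Windows guest (run in the guest, not on ESXi):
# reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 125 /f
```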
-
What has changed in vSphere 5?
Minor change in the NFS v3 client (not NFS v4, NFS v4.1, or pNFS): an FQDN is specified in the datastore configuration.
- The DNS lookup occurs on ESXi boot
- Supports DNS round robin
Distribute NFS client logins across a vSphere 5 cluster for load balancing across multiple IPs (EMC Isilon, NetApp Data ONTAP 8 Cluster-Mode).
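A sketch of the round-robin idea, with hypothetical hostnames and addresses (the zone entries and datastore name are illustrative only):

```shell
# A single FQDN resolving to multiple array interface IPs; each ESXi host
# resolves it once at boot/mount time, spreading logins across interfaces.
#
# Example DNS zone entries:
#   nfs.lab.example.com.  IN A  192.168.10.11
#   nfs.lab.example.com.  IN A  192.168.10.12
#   nfs.lab.example.com.  IN A  192.168.10.13

# Mount the datastore by FQDN rather than by a fixed IP
esxcli storage nfs add --host nfs.lab.example.com --share /vol/ds01 --volume-name ds01
```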
-
Path Management with Scale-Out Arrays
Storage arrays have target addresses (WWN / IQN / IP). In a scale-out array these addresses are virtual & mapped to physical I/O ports.
LUNs leverage multipathing software to route as the LUN traverses controllers.
On NetApp Cluster-Mode, NFSv3 requires one IP per datastore; avoid hopping arrays to access physical disk.
On EMC Isilon, use SmartConnect.
-
iSCSI & NFS: Ethernet Jumbo Frames
What is an Ethernet jumbo frame? An Ethernet frame with more than 1500 bytes of payload (9000 is common; FCoE uses 2240). Commonly thought of as having better performance.
Should I use jumbo frames? They add complexity, and the performance gains (while existent) are relatively marginal with common block sizes. Stick with the defaults when you can.
-
IP Storage: Using iSCSI & NFS Together
iSCSI and NFS route differently:
- iSCSI uses vmknics with no Ethernet failover, using MPIO instead.
- The NFS client relies on vmknics using link aggregation/Ethernet failover, and NFS relies on the host routing table.
Best practice is to have separate subnets & virtual interfaces for each.
-
Optimize I/O, aka Alignment
Key Best Practices 2012, No. 3
-
Alignment is Optimal I/O
Misalignment of filesystems results in additional work on the storage controller to satisfy I/O requests. This affects every protocol and every storage array:
- VMFS & NFS datastores
- VMDKs & RDMs with NTFS, EXT3, etc.
Filesystems exist in the datastore and in the VMDK.
[Diagram: guest filesystem clusters (4KB-1MB) over VMFS blocks (1MB-8MB) over array chunks (4KB-64KB); all three layers must align]
-
Disk Alignment
Aligning I/O can have significant performance improvements for high disk I/O VMs
-
Alignment is Optimal I/O
VMware, Microsoft, Citrix, NetApp, and EMC all agree: align partitions.
Plug-and-play guest operating systems: Windows 2008, Vista, & Win7 (fresh installations only, no upgrades).
Guest operating systems requiring manual alignment:
- Windows NT, 2000, 2003, & XP (use diskpart to set to 1MB)
- Linux (use fdisk expert mode and align on 2048 sectors = 1MB)
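A sketch of the manual alignment steps above; disk numbers and device names are placeholders:

```shell
# Windows (NT/2000/2003/XP era), inside the guest, using diskpart:
#   diskpart
#   > select disk 1
#   > create partition primary align=1024   # align is in KB, so 1024 = 1MB

# Linux: start the first partition at sector 2048
# (2048 x 512-byte sectors = 1MB). Recent fdisk versions do this by
# default; on older fdisk use expert mode ('x', then 'b') to move the start.
fdisk -l /dev/sdb    # verify the partition's start sector is 2048
```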
-
Fixing Misalignment
If VMFS is misaligned: migrate the VMs & destroy the datastore.
If the guest OS filesystem is misaligned:
- Step 1: Take an array snapshot/backup.
- Step 2: Use offline tools to realign: EMC UBERAlign (open, works with all, scheduler, and in-guest reclaim), vSphere Migrator.
- Alternate: Use an online tool to align: NetApp Migrate & Optimize (a VSC feature).
-
Leverage Plug-ins, VAAI & VASA
Key Best Practices 2012, No. 4
-
Where Does Integration Happen? (circa 2012)
[Diagram: vCenter, the VI Client/VM, the ESX storage stack, and the storage array, with integration points:]
- Vendor-specific vCenter plug-in: view VMware-to-storage relationships, provision datastores more easily, leverage array features (compress/dedupe; file/filesystem/LUN snapshots).
- Standards-based VAAI SCSI command support (FC/FCoE/iSCSI) via the Datamover and NMP.
- Vendor-specific VAAI block module (iSCSI/FCoE SW) and vendor-specific VAAI NFS operation support (NFS VAAI module).
- VASA module, vStorage API for Multipathing, vStorage API for Data Protection (VDDK), vendor-specific vStorage API for SRM.
- VSS via VMware Tools (snap request), SvMotion requests, VM provisioning commands, thin provisioning on/off.
- VM object awareness in array/management tools, vC Ops connectors.
Several of these integration points changed in vSphere 5.1 ("Inyo") - new coolness.
-
New VAAI Stuff in vSphere 5.x
- VAAI TP (block) reclaim: used by the View 5.1 sparse VMDK format.
- VAAI TP at the datastore level: disabled by default in vSphere 5.0 U1 and vSphere 5.1 (will be back on in future vSphere releases).
- VAAI TP reclaim using vmkfstools -k.
- VAAI Fast Clone (file): used by View 5.1 and vCloud Director 5.1. Depends on file-level snaps; a hardware-accelerated linked clone.
-
VAAI NFS Demo
-
vCenter Plug-ins
- First gen was basic view/provision.
- Second gen exposed advanced array functions.
- Third gen worked on simplifying/merging multiple plug-ins.
- Fourth gen worked on initial RBAC for VMware/storage teams.
- Fifth gen is current.
- Next gen: vSphere 5.1 requires a new plug-in architecture around FLEX.
"We use EMC Virtual Storage Integrator (VSI) to dramatically accelerate and simplify storage configuration, management, and multipathing, and it has saved us days of work." - Mike Schlimenti, Lead Systems Engineer, Data Center, Experian
-
FLEX Plugin Demo
-
Keep It Simple
Key Best Practices 2012, No. 5
-
Keep Storage Simple
1. Use large-capacity datastores:
   - Avoid RDMs
   - NFS: 16TB
   - VMFS: vSphere 5 = 64TB; vSphere 4 = 2TB
   - Avoid extents
2. On the array, consider:
   - Use storage pools
   - Use thin-provisioned LUNs & volumes
   - Enable vCenter managed-datastore alerts
   - Enable array thin-provisioning alerts and auto-grow capabilities
3. Use broad data services rather than micromanage:
   - Virtual/auto-tiering & large caches
   - Enable data deduplication
-
SDRS and SIOC
Key Best Practices 2012, No. 6
-
Use SDRS/SIOC if you can
SDRS and SIOC are huge vSphere features. "If you can" equals: vSphere 4.1 or later, Enterprise Plus; VMFS, or NFS if vSphere 5.1 (not purely a qual).
Enable it (not on by default): even if you don't use shares, it will ensure no VM swamps the others.
Bonus: you will get guest-level latency alerting! The default threshold is 30ms.
- Leave it at 30ms for 10K/15K disks, increase to 50ms for 7.2K, decrease to 10ms for SSD.
- Fully supported with array auto-tiering: leave it at 30ms for FAST pools.
Hard I/O limits are handy for View use cases.
Some good recommended reading:
http://www.vmware.com/files/pdf/techpaper/VMW-vSphere41-SIOC.pdf
http://virtualgeek.typepad.com/virtual_geek/2010/07/vsphere-41-sioc-and-array-auto-tiering.html
http://virtualgeek.typepad.com/virtual_geek/2010/08/drs-for-storage.html
http://www.yellow-bricks.com/2010/09/29/storage-io-fairness/
-
Storage DRS Operations: I/O Thresholds
SDRS triggers action on either capacity and/or latency:
- Capacity stats are constantly gathered by vCenter; the default threshold is 80%.
- The I/O load trend is evaluated (by default) every 8 hours, based on the past day's history; the default threshold is 15ms.
Storage DRS will do a cost/benefit analysis! For latency, Storage DRS leverages Storage I/O Control functionality.
When using array auto-tiering, use SDRS but disable the I/O metric here. This combination gives you the simplicity benefits of SDRS for automated placement and capacity balancing, but adds:
- The economic and performance benefits of automated tiering across SSD, FC, SAS, SATA
- 10x (VNX) and 100x (VMAX) higher granularity (sub-VMDK)
-
Storage DRS: Array use-case considerations
(columns: SDRS Initial Placement / SDRS Migration)
- VMware Linked Clones: Not supported
- VMware Snapshots: Supported
- VMware SRM: Not supported
- RDM Pointer Files: Pointers supported / LUNs not
- Pre-vSphere 5 hosts: Not supported
- NFS Datastores: Supported
- Distributed Virtual Volumes: Supported
- Array-based VM Clones: Supported
- Array-based Replication: Supported; unanticipated migrations will increase WAN utilization
- Array-based Snapshots: Supported; unanticipated migrations will increase space consumed
- Array-based Compression & Deduplication: Supported; unanticipated migrations will temporarily increase space consumed
- Array-based Thin Provisioning: Supported; migration supported on VASA-enabled arrays only
- Array-based Auto-Tiering: Supported; disable I/O metrics in SDRS, but enable SIOC on datastores to handle spikes of I/O contention
-
What to do when you're in trouble...
Getting yourself out of a jam
-
My VM is not performing as expected
How do I know: the application is not meeting a pre-defined SLA, or SIOC/SDRS guest OS thresholds are being exceeded.
What do I do:
Step 1, pinpoint (thank you Scott Drummonds!):
- Use esxtop first: http://communities.vmware.com/docs/DOC-5490
- ...then vscsiStats: http://communities.vmware.com/docs/DOC-10095
Step 2, if it's the backend:
- Use Unisphere Analyzer, SPA (start with backend and CPU).
- Check VM alignment (misalignment will show as excessive stripe crossings).
- Check that cache is enabled and the FAST/FAST Cache settings on the storage pool; ensure FAST and SIOC settings are consistent.
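For step 1, esxtop can also be run in batch mode to capture data for offline analysis; a short sketch (the output path and sample counts are arbitrary):

```shell
# Batch mode: -b batch, -d delay between samples in seconds, -n sample count.
# 60 samples at 5s intervals = a 5-minute capture, graphable in perfmon/Excel.
esxtop -b -d 5 -n 60 > /tmp/esxtop-capture.csv

# Interactively: press 'd' for the disk-adapter view, 'u' for disk devices,
# 'v' for per-VM disk stats; watch DAVG (device latency) and KAVG (kernel latency).
```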
-
I see all these device events in vSphere
How do I know: a VM is not performing well and there are LUN trespass warning messages in the event log.
What do I do: ensure the right failover mode and policy are used. Ensure you have redundant paths from host to storage system. Check LUN ownership balance.
-
Datastore capacity utilization is low/high
How do I know: Managed Datastore Reports in vCenter 4.x+; array tools, e.g. the EMC Unisphere (vCenter Integration) report, EMC ProSphere, NetApp OnCommand.
What do I do:
- Consider doing a space reclaim using vmkfstools -k.
- Migrate the VM to a datastore configured over virtually provisioned storage.
- For a VMFS datastore, ESX thin provisioning/compress/dedupe can also be utilized.
-
My storage team gives me tiny devices
How do I know: the storage team controls provisioning.
What do I do: this means you have a legacy mindset in storage; cloud is on-demand.
New model: the storage admin provisions pools, and the VI admin consumes from pools via plug-ins.
- VMAX uses hyper devices, and hypers are assembled into meta devices; VNX defaults are pooled devices.
- NetApp has aggregates: pools of RAID-DP protected disks.
- Both are the basis for LUNs and FlexVols.
Engage your array vendor to move the storage team into the 21st century.
-
What? VAAI isn't working.
How do I know: testing Storage vMotion/cloning with no-offload versus offload.
What do I do: ensure the block storage initiators for the ESX host are configured with ALUA on; also ensure the ESX server recognizes the change in the SATP. Look at I/O bandwidth in the vSphere client and on the storage array.
The benefit tends to be higher when you Storage vMotion across SPs.
Google "Virtual Geek VAAI Bad News".
-
My NFS-based VM is impacted following a storage reboot or failover
How do I know: the VM freezes or, even worse, crashes.
What do I do:
- Check your ESX NFS timeout settings and compare to the TechBook recommendations (only needed if the datastore wasn't created using the plug-in).
- Review your VM and guest OS settings for resiliency. See the TechBook for the detailed procedure on VM resiliency.
-
THANK YOU
-
FILL OUT A SURVEY
EVERY COMPLETE SURVEY IS ENTERED INTO A DRAWING FOR A $25 VMWARE COMPANY STORE GIFT CERTIFICATE
-