Presentation: vSphere 5 Storage Best Practices
TRANSCRIPT
-
vSphere 5 Storage Best Practices
Chad Sakac, EMC Corporation
Vaughn Stewart, NetApp
INF-STO2980
#vmworldinf
-
The Great Protocol Debate
Every protocol can be highly available, and generally, every protocol can meet a broad performance band.
Each protocol has different configuration considerations.
In vSphere, there is core feature equality across protocols.
-
The Great Protocol Debate
Source: Virtual Geek Poll
-
The Great Protocol Debate
NetApp AutoSupport July 2012
Large-scale NetApp customers favor NAS
-
The Great Protocol Debate
[Chart: percentage of respondents that deployed each protocol (multiple selections allowed, n=158, selections=371)]
DAS 17%, NFS 51%, iSCSI 47%, FC 67%, FCoE 18%, InfiniBand 3%, AoE 1%
Source: Wikibon Survey July 2012
-
The Great Protocol Debate
Every protocol can be highly available, and generally, every protocol can meet a broad performance band.
Each protocol has different configuration considerations.
In vSphere, there is core feature equality across protocols.
Conclusion: there is no debate. Pick what works for you!
The best flexibility comes from a combination of VMFS and NFS.
-
A Packed Agenda
Six key things to do:
1. Leverage key docs
2. Set up multipathing right
3. Alignment = good hygiene
4. Leverage vCenter plug-ins, VAAI, and VASA
5. KISS guidelines for layout
6. Use SDRS and SIOC if you can
Also: what to do when you are in trouble, when to break the rules, and a peek into the future.
-
Leverage Key Docs
Key Best Practices 2012, No. 1
-
Key VMware Resources & Documents
VMware Technical Resource Center: Storage Connectivity
- Fibre Channel SAN Configuration Guide
- iSCSI SAN Configuration Guide
- Best Practices for NFS Storage
Understand storage taxonomy:
- LUN ownership: Active/Active, Active/Passive, Virtual Port
- Multipathing: SAN / NAS
"Highly Recommended" is a kind way of saying "This Is Mandatory Reading".
http://www.vmware.com/technical-resources/virtual-storage/resources.html
-
Key Partner Documents
Storage varies far more vendor to vendor than servers do. Stay current on your array's best practices. Even if you're NOT the storage team, read them.
NetApp:
- 7-Mode: Technical Report TR-3749
- Cluster-Mode: Technical Report TR-4068
EMC:
- VNX and vSphere TechBook (h8229)
- VMAX and vSphere TechBook (h2529)
- Isilon and vSphere Best Practices Guide (h10522)
http://www.emc.com/collateral/hardware/technical-documentation/h8229-vnx-vmware-tb.pdf
http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wp-ldv.pdf
http://www.emc.com/collateral/hardware/white-papers/h10522-bpg-isilon-and-vmware-vsphere5.pdf
-
Set Up Multipathing Right
Key Best Practices 2012, No. 2
-
Understanding the vSphere Pluggable Storage Architecture
-
What's out of the box in vSphere?
Path Selection Policies (PSP):
- Fixed (default for Active-Active arrays): I/O traverses the preferred path; reverts to the preferred path after failure.
- MRU (default for many Active-Passive arrays): I/O traverses the preferred path; remains on the alternative path after failure.
- Round Robin: I/O traverses all paths; default is 1000 IOPS per path; ALUA sets path preference. Notable change in vSphere 5.1: this is the default for EMC VNX (R32) and VMAX. If upgrading, claim rules are unchanged.
To change a PSP, use your vendor's vCenter plug-in (easy), or:
esxcli storage nmp device set --device <naa.id> --psp VMW_PSP_RR
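A minimal command-line sketch of the PSP change, assuming the ESXi 5.x esxcli namespace; the naa. device ID below is a made-up placeholder, not a real device:

```shell
# List storage devices with their current SATP and PSP assignments
esxcli storage nmp device list

# Set Round Robin on one device (replace the naa. ID with your own)
esxcli storage nmp device set --device naa.60060160a62a2e00 --psp VMW_PSP_RR

# Optionally make Round Robin the default PSP for a whole SATP,
# so newly claimed devices pick it up automatically
esxcli storage nmp satp set --satp VMW_SATP_ALUA --default-psp VMW_PSP_RR
```

The per-SATP default is usually preferable to per-device changes at scale, which is why the vendor vCenter plug-ins automate exactly this step.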
-
What is Asymmetric Logical Unit Access (ALUA)?
ALUA allows paths to be profiled:
- Active (optimized)
- Active (non-optimized)
- Standby
- Dead (target unreachable = APD, All Paths Down)
- Gone Away (device unreachable = PDL, Permanent Device Loss)
Ensures optimal path selection by the vSphere PSPs and 3rd-party MPPs.
[Diagram: SP A and SP B, each with paths to the same LUN]
-
Understanding SAN Multipathing
MPIO is based on initiator-target sessions, not links.
-
Multipathing with NFSv3
[Diagram: two ESXi host configurations, each with vmnic0/vmnic1 uplinked through switches to SP A and SP B]
Option 1: NIC teams with the "Route based on IP hash" load-balancing policy. Requires cross-stack EtherChannel or switch-port static/dynamic link aggregation, an active/active configuration, and a single switch or stacked switches with spanned, teamed switch ports (a feature that may not be available on all switches).
Option 2: allow the VMkernel to make the routing decisions (multiple vmknics on separate subnets).
-
Microsoft Cluster Service
Unsupported storage configurations:
- FCoE, iSCSI & NFS datastores
- Round Robin PSP
- N-Port ID Virtualization (NPIV)
Array vendors solve storage gaps: 3rd-party MPPs, guest-connected storage.
Other limits: memory overcommit, vMotion & Fault Tolerance.
vSphere 5.1 has expanded WSFC support: 4 nodes with disk quorum, 5 nodes when using MNS.
-
3rd Party Multipathing Plugins (MPP)
Storage manageability: simple provisioning, predictable & consistent behavior, optimized data-path utilization.
Performance and scale: tuned performance, predictive load balancing, automatic fault recovery.
3rd-party MPPs: EMC PowerPath/VE (now v5.8), Dell/EqualLogic PSP.
[Diagram: many app/OS VMs across hosts running PowerPath, all connected to shared storage]
-
General NFS Best Practices
Use the EMC & NetApp vCenter plug-ins; they automate best practices. Note that vCenter plug-ins from 5.0 and earlier will NOT WORK with vSphere 5.1 (more on this later).
Use multiple NFS datastores & 10GbE. 1GbE requires more complexity to address I/O scaling, due to one data session per connection with NFSv3.
-
General NFS Best Practices - Timeouts
Configure the following on each ESX server (automated by the vCenter plug-ins):
- NFS.HeartbeatFrequency = 12
- NFS.HeartbeatTimeout = 5
- NFS.HeartbeatMaxFailures = 10
Increase Guest OS time-out values to match: HKLM > System > CurrentControlSet > Services > Disk. Select TimeOutValue and set the data value to 125 (decimal).
Increase Net.TcpipHeapSize (follow vendor recommendation).
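A hedged sketch of applying these heartbeat settings from the ESXi 5.x shell (the vendor plug-ins normally do this for you; option paths assume the 5.x advanced-settings namespace):

```shell
# NFS heartbeat tuning per the recommendations above
esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10

# Matching disk timeout inside a Windows guest (run in the guest, not on ESXi):
# reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 125 /f
```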
-
What has changed in vSphere 5?
Minor change in the NFS v3 client (not NFS v4, NFS v4.1, or pNFS): an FQDN is specified in the datastore configuration.
- The DNS lookup occurs on ESXi boot
- Supports DNS round robin
Distribute NFS client logins across a vSphere 5 cluster for load balancing across multiple IPs (EMC Isilon, NetApp Data ONTAP 8 Cluster-Mode).
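A sketch of the round-robin idea, with hypothetical hostnames and addresses (the zone entries and datastore name are illustrative only):

```shell
# A single FQDN resolving to multiple array interface IPs; each ESXi host
# resolves it once at boot/mount time, spreading logins across interfaces.
#
# Example DNS zone entries:
#   nfs.lab.example.com.  IN A  192.168.10.11
#   nfs.lab.example.com.  IN A  192.168.10.12
#   nfs.lab.example.com.  IN A  192.168.10.13

# Mount the datastore by FQDN rather than by a fixed IP
esxcli storage nfs add --host nfs.lab.example.com --share /vol/ds01 --volume-name ds01
```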
-
Path Management with Scale-Out Arrays
Storage arrays have target addresses (WWN / IQN / IP). In a scale-out array these addresses are virtual & mapped to physical I/O ports.
LUNs leverage multipathing software to route as the LUN traverses controllers.
On NetApp Cluster-Mode, NFSv3 requires one IP per datastore; avoid hopping arrays to access physical disk.
On EMC Isilon, use SmartConnect.
-
iSCSI & NFS: Ethernet Jumbo Frames
What is an Ethernet jumbo frame? An Ethernet frame with more than 1500 bytes of payload (9000 is common; FCoE uses 2240). Commonly thought of as having better performance.
Should I use jumbo frames? They add complexity, and the performance gains (while existent) are relatively marginal with common block sizes. Stick with the defaults when you can.
-
IP Storage: Using iSCSI & NFS Together
iSCSI and NFS route differently:
- iSCSI uses vmknics with no Ethernet failover, using MPIO instead.
- The NFS client relies on vmknics using link aggregation/Ethernet failover, and NFS relies on the host routing table.
Best practice is to have separate subnets & virtual interfaces for each.
-
Optimize I/O, aka Alignment
Key Best Practices 2012, No. 3
-
Alignment is Optimal I/O
Misalignment of filesystems results in additional work on the storage controller to satisfy I/O requests. This affects every protocol and every storage array:
- VMFS & NFS datastores
- VMDKs & RDMs with NTFS, EXT3, etc.
Filesystems exist in the datastore and in the VMDK.
[Diagram: guest filesystem clusters (4KB-1MB) over VMFS blocks (1MB-8MB) over array chunks (4KB-64KB); all three layers must align]
-
Disk Alignment
Aligning I/O can have significant performance improvements for high disk I/O VMs
-
Alignment is Optimal I/O
VMware, Microsoft, Citrix, NetApp, and EMC all agree: align partitions.
Plug-and-play guest operating systems: Windows 2008, Vista, & Win7 (fresh installations only, no upgrades).
Guest operating systems requiring manual alignment:
- Windows NT, 2000, 2003, & XP (use diskpart to set to 1MB)
- Linux (use fdisk expert mode and align on 2048 sectors = 1MB)
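A sketch of the manual alignment steps above; disk numbers and device names are placeholders:

```shell
# Windows (NT/2000/2003/XP era), inside the guest, using diskpart:
#   diskpart
#   > select disk 1
#   > create partition primary align=1024   # align is in KB, so 1024 = 1MB

# Linux: start the first partition at sector 2048
# (2048 x 512-byte sectors = 1MB). Recent fdisk versions do this by
# default; on older fdisk use expert mode ('x', then 'b') to move the start.
fdisk -l /dev/sdb    # verify the partition's start sector is 2048
```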
-
Fixing Misalignment
If VMFS is misaligned: migrate the VMs & destroy the datastore.
If the guest OS filesystem is misaligned:
- Step 1: Take an array snapshot/backup.
- Step 2: Use offline tools to realign: EMC UBERAlign (open, works with all, scheduler, and in-guest reclaim), vSphere Migrator.
- Alternate: Use an online tool to align: NetApp Migrate & Optimize (a VSC feature).
-
Leverage Plug-ins, VAAI & VASA
Key Best Practices 2012, No. 4
-
Where Does Integration Happen? (circa 2012)
[Diagram: vCenter, the VI Client/VM, the ESX storage stack, and the storage array, with integration points:]
- Vendor-specific vCenter plug-in: view VMware-to-storage relationships, provision datastores more easily, leverage array features (compress/dedupe; file/filesystem/LUN snapshots).
- Standards-based VAAI SCSI command support (FC/FCoE/iSCSI) via the Datamover and NMP.
- Vendor-specific VAAI block module (iSCSI/FCoE SW) and vendor-specific VAAI NFS operation support (NFS VAAI module).
- VASA module, vStorage API for Multipathing, vStorage API for Data Protection (VDDK), vendor-specific vStorage API for SRM.
- VSS via VMware Tools (snap request), SvMotion requests, VM provisioning commands, thin provisioning on/off.
- VM object awareness in array/management tools, vC Ops connectors.
Several of these integration points changed in vSphere 5.1 ("Inyo") - new coolness.
-
New VAAI Stuff in vSphere 5.x
- VAAI TP (block) reclaim: used by the View 5.1 sparse VMDK format.
- VAAI TP at the datastore level: disabled by default in vSphere 5.0 U1 and vSphere 5.1 (will be back on in future vSphere releases).
- VAAI TP reclaim using vmkfstools -k.
- VAAI Fast Clone (file): used by View 5.1 and vCloud Director 5.1. Depends on file-level snaps; a hardware-accelerated linked clone.
-
VAAI NFS Demo
-
vCenter Plug-ins
- First gen was basic view/provision.
- Second gen exposed advanced array functions.
- Third gen worked on simplifying/merging multiple plug-ins.
- Fourth gen worked on initial RBAC for VMware/storage teams.
- Fifth gen is current.
- Next gen: vSphere 5.1 requires a new plug-in architecture around FLEX.
"We use EMC Virtual Storage Integrator (VSI) to dramatically accelerate and simplify storage configuration, management, and multipathing, and it has saved us days of work." - Mike Schlimenti, Lead Systems Engineer, Data Center, Experian
-
FLEX Plugin Demo
-
Keep It Simple
Key Best Practices 2012, No. 5
-
Keep Storage Simple
1. Use large-capacity datastores:
   - Avoid RDMs
   - NFS: 16TB
   - VMFS: vSphere 5 = 64TB; vSphere 4 = 2TB
   - Avoid extents
2. On the array, consider:
   - Use storage pools
   - Use thin-provisioned LUNs & volumes
   - Enable vCenter managed-datastore alerts
   - Enable array thin-provisioning alerts and auto-grow capabilities
3. Use broad data services rather than micromanage:
   - Virtual/auto-tiering & large caches
   - Enable data deduplication
-
SDRS and SIOC
Key Best Practices 2012, No. 6
-
Use SDRS/SIOC if you can
SDRS and SIOC are huge vSphere features. "If you can" equals: vSphere 4.1 or later, Enterprise Plus; VMFS, or NFS if vSphere 5.1 (not purely a qual).
Enable it (not on by default): even if you don't use shares, it will ensure no VM swamps the others.
Bonus: you will get guest-level latency alerting! The default threshold is 30ms.
- Leave it at 30ms for 10K/15K disks, increase to 50ms for 7.2K, decrease to 10ms for SSD.
- Fully supported with array auto-tiering: leave it at 30ms for FAST pools.
Hard I/O limits are handy for View use cases.
Some good recommended reading:
http://www.vmware.com/files/pdf/techpaper/VMW-vSphere41-SIOC.pdf
http://virtualgeek.typepad.com/virtual_geek/2010/07/vsphere-41-sioc-and-array-auto-tiering.html
http://virtualgeek.typepad.com/virtual_geek/2010/08/drs-for-storage.html
http://www.yellow-bricks.com/2010/09/29/storage-io-fairness/
-
Storage DRS Operations: I/O Thresholds
SDRS triggers action on either capacity and/or latency:
- Capacity stats are constantly gathered by vCenter; the default threshold is 80%.
- The I/O load trend is evaluated (by default) every 8 hours, based on the past day's history; the default threshold is 15ms.
Storage DRS will do a cost/benefit analysis! For latency, Storage DRS leverages Storage I/O Control functionality.
When using array auto-tiering, use SDRS but disable the I/O metric here. This combination gives you the simplicity benefits of SDRS for automated placement and capacity balancing, but adds:
- The economic and performance benefits of automated tiering across SSD, FC, SAS, SATA
- 10x (VNX) and 100x (VMAX) higher granularity (sub-VMDK)
-
Storage DRS: Array use-case considerations
(columns: SDRS Initial Placement / SDRS Migration)
- VMware Linked Clones: Not supported
- VMware Snapshots: Supported
- VMware SRM: Not supported
- RDM Pointer Files: Pointers supported / LUNs not
- Pre-vSphere 5 hosts: Not supported
- NFS Datastores: Supported
- Distributed Virtual Volumes: Supported
- Array-based VM Clones: Supported
- Array-based Replication: Supported; unanticipated migrations will increase WAN utilization
- Array-based Snapshots: Supported; unanticipated migrations will increase space consumed
- Array-based Compression & Deduplication: Supported; unanticipated migrations will temporarily increase space consumed
- Array-based Thin Provisioning: Supported; migration supported on VASA-enabled arrays only
- Array-based Auto-Tiering: Supported; disable I/O metrics in SDRS, but enable SIOC on datastores to handle spikes of I/O contention
-
What to do when you're in trouble...
Getting yourself out of a jam
-
My VM is not performing as expected
How do I know: the application is not meeting a pre-defined SLA, or SIOC/SDRS guest OS thresholds are being exceeded.
What do I do:
Step 1, pinpoint (thank you Scott Drummonds!):
- Use esxtop first: http://communities.vmware.com/docs/DOC-5490
- ...then vscsiStats: http://communities.vmware.com/docs/DOC-10095
Step 2, if it's the backend:
- Use Unisphere Analyzer, SPA (start with backend and CPU).
- Check VM alignment (misalignment will show as excessive stripe crossings).
- Check that cache is enabled and the FAST/FAST Cache settings on the storage pool; ensure FAST and SIOC settings are consistent.
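For step 1, esxtop can also be run in batch mode to capture data for offline analysis; a short sketch (the output path and sample counts are arbitrary):

```shell
# Batch mode: -b batch, -d delay between samples in seconds, -n sample count.
# 60 samples at 5s intervals = a 5-minute capture, graphable in perfmon/Excel.
esxtop -b -d 5 -n 60 > /tmp/esxtop-capture.csv

# Interactively: press 'd' for the disk-adapter view, 'u' for disk devices,
# 'v' for per-VM disk stats; watch DAVG (device latency) and KAVG (kernel latency).
```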
-
I see all these device events in vSphere
How do I know: a VM is not performing well and there are LUN trespass warning messages in the event log.
What do I do: ensure the right failover mode and policy are used. Ensure you have redundant paths from host to storage system. Check LUN ownership balance.
-
Datastore capacity utilization is low/high
How do I know: Managed Datastore Reports in vCenter 4.x+; array tools, e.g. the EMC Unisphere (vCenter Integration) report, EMC ProSphere, NetApp OnCommand.
What do I do:
- Consider doing a space reclaim using vmkfstools -k.
- Migrate the VM to a datastore configured over virtually provisioned storage.
- For a VMFS datastore, ESX thin provisioning/compress/dedupe can also be utilized.
-
My storage team gives me tiny devices
How do I know: the storage team controls provisioning.
What do I do: this means you have a legacy mindset in storage; cloud is on-demand.
New model: the storage admin provisions pools, and the VI admin consumes from pools via plug-ins.
- VMAX uses hyper devices, and hypers are assembled into meta devices; VNX defaults are pooled devices.
- NetApp has aggregates: pools of RAID-DP protected disks.
- Both are the basis for LUNs and FlexVols.
Engage your array vendor to move the storage team into the 21st century.
-
What? VAAI isn't working.
How do I know: testing Storage vMotion/cloning with no-offload versus offload.
What do I do: ensure the block storage initiators for the ESX host are configured with ALUA on; also ensure the ESX server recognizes the change in the SATP. Look at I/O bandwidth in the vSphere client and on the storage array.
The benefit tends to be higher when you Storage vMotion across SPs.
Google "Virtual Geek VAAI Bad News".
-
My NFS-based VM is impacted following a storage reboot or failover
How do I know: the VM freezes or, even worse, crashes.
What do I do:
- Check your ESX NFS timeout settings and compare to the TechBook recommendations (only needed if the datastore wasn't created using the plug-in).
- Review your VM and guest OS settings for resiliency. See the TechBook for the detailed procedure on VM resiliency.
-
THANK YOU
-
FILL OUT A SURVEY
EVERY COMPLETE SURVEY IS ENTERED INTO A DRAWING FOR A $25 VMWARE COMPANY STORE GIFT CERTIFICATE
-