
Page 1: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

© 2009 VMware Inc. All rights reserved


Private Cloud: Sample Architectures based on the vSphere 5 platform

Singapore, Sep 2011

Iwan ‘e1’ Rahabok | virtual-red-dot.blogspot.com | tinyurl.com/SGP-User-Group

M: +65 9119-9226 | [email protected]

VCAP-DCD

Page 2: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Purpose of This Document

There is a lot of talk about Cloud Computing. But what does it look like at a technical level?
• How do we really assure SLAs and have 3 tiers of service?
• If I'm a small company with just 50 servers, what does my architecture look like? And if I have 1,000 VMs, what does it look like?

For existing VMware customers, I go around and do a lot of "health checks" at customer sites.
• The No. 1 question is around design best practices. So this doc serves as a quick reference for me; I can pull a slide from here for discussion.

This is my personal opinion.
• Please don't take it as an official and formal VMware Inc recommendation. I'm not authorised to give one.
• Also, generally we should judge the content rather than the organisation/person behind the content. A technical fact is a technical fact, regardless of who said it.

Technology changes
• 10 Gb Ethernet, SSD disks, 8-core CPUs, FCoE, CNAs, vStorage APIs, storage virtualisation, etc. will impact the design. A lot of new innovation is coming within the next 2 years.

• New modules/products from VMware & Ecosystem Partners will also impact the design.

This is a guide
• Not a Reference Architecture, let alone a Detailed Blueprint.
• So please don't print it and follow it to the dot. It is for you to think about and tailor.

It is written for hands-on vSphere Admins who have attended the Design Workshop & ICM courses
• You should be at least a VCP 5, preferably a VCAP-DCD.
• Features are not explained here.
• A lot of the design considerations are covered in the vSphere Design Workshop.

With that, let's have a professional* discussion
• * Not an emotional, religious or political one. Let's not get angry over technical stuff; it's not worth your health.

Folks, some disclaimers, since I am an employee of VMware.

Page 3: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Acknowledgement

There is a lot of material here. The following people have provided their feedback:
• Eng Hwa, Regional Senior Architect, HP
• Belmont Chia, Asia Pacific DC Solutions Architect, Cisco
• William Lum, Asia Pacific Virtualization Channel, Consulting Architect, NetApp
• Lee Teck Meng, Asia Pacific Private Cloud Solution Architect, EMC
• Deddy Iswara, Systems Consultant, Indonesia, VMware
• Vishal Srivastava, PSO, VMware

Page 4: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Table of Contents

Introduction
• Requirements, Assumptions, Considerations, and Design Summary
vSphere Design: Data Center
• Data Center, Cluster (DRS, HA, DPM, Resource Pool)
vSphere Design: Server
• ESXi, physical host
vSphere Design: Network
vSphere Design: Storage
vSphere Design: VM
vSphere Design: Miscellaneous (Management, Security)
SRM Design
• To be added in future
Additional Info

Also see the speaker notes in the slides.

Page 5: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011


Introduction

Page 6: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

3 Sizes: Assumptions

Assumptions are needed to avoid the infamous "It depends…" answer.
• The architecture for 50 VMs differs from that for 250 VMs, which in turn differs from that for 500 VMs.
• A design for large VMs (8 vCPU, 64 GB) differs from a design for small VMs (1 vCPU, 1 GB).
• Workload for an SME is smaller than for a Large Enterprise. Exchange handling 200 staff vs 1000 staff results in a different workload.

We will provide 3 sizes
• I try to make each choice as realistic as possible. 3 sizes give you choice and show the reasoning used.
• The idea is that you take the closest size to your needs, then tailor it to the specific customer (not the project). Do not tailor to the project.

Size means the size of the entire company, not the size of Phase 1 of the journey
• A large company starting small should not use the "Small" option below; it should use the "Large" option but reduce the # of ESX hosts.
• I believe in "begin with the end in mind", projecting around 2 years out. Longer than 3 years is rather hazy, as cloud computing is not fully matured yet.

Each row below lists Small Cloud | Medium Cloud | Large Cloud.
• Company: Small | Medium | Large
• IT Staff: 2 people doing everything | 2 people on infra, 1 on desktop, 3 on apps | 2 people on server/storage, 2 on network/security, 2 on desktop, 10 on apps
• Data Center: None (hosted) | 1 (DR site is hosted) | 2 (with private connectivity)

Page 7: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

3 Sizes: Assumptions

Each row below lists Small Cloud | Medium Cloud | Large Cloud.
• # Servers currently: 25 servers, all production | ~65 servers, 45 (70%) production | ~120 servers, 65 (55%) production
• # Servers in 1 year: Prod 30, Non-Prod 15 (50%) | Prod 50, Non-Prod 40 (80%) | Prod 70, Non-Prod 70 (100%)
• # VM the design needs to cater for: 50 | 100 | 150
• # Desktops/Laptops: 200, with remote access, no need for offline VDI | 500, with remote access, needs offline VDI | 1000, with remote access + 2FA, needs offline VDI
• DR requirements: Yes | Yes | Yes
• SAN expertise: No (we also keep cost low by using IP storage) | No | Yes (RDM will be used as some DBs may be large)
• DMZ Zone / SSLF Zone: Yes / No | Yes / No | Yes / No (Intranet also zoned)
• Backup: Disk | Tape | Tape
• Network standard: No standard | No standard | Cisco
• ITIL compliance: Not applicable | A few processes are in place | Some processes are in place
• Change Management: Mostly not in place | A few processes are in place | Some processes are in place
• Overall System Mgmt SW (BMC, CA, etc.): No | No | No
• Configuration Management: No | No | Needs some tools
• Oracle RAC: No | No | Yes
• Audit team: No | External | External
• Capacity Planning: No | Manual | Needs some tools
• Oracle software (BEA, DB, etc.): No | No | Yes

Page 8: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

3 Sizes: VMs with additional consideration

We must know what types of VMs we are running. They impact the design or operation.

Type of VM | Impact on Design

Microsoft NLB (network load balancer). Typical apps: IIS, VPN, ISA
• VMware recommends Multicast mode. The VM needs its own port group, and this port group needs Forged Transmits enabled (as NLB changes the MAC address).

MSCS (Veritas Clustering is not supported by VMware; it is supported by Symantec). We are doing a V-V cluster across 2 separate ESX hosts. Typical apps: Exchange, DHCP, SQL. Consider Symantec AppHA instead.
• Needs FC. iSCSI, NFS and FCoE are not supported.
• Needs an Anti-Affinity rule (Host-to-VM mapping, not VM-VM, as VMware HA does not obey VM-VM affinity rules). As such, needs 4 nodes in a cluster.
• Needs RAM to be 100% reserved. Impacts the HA slot size if you use default settings.
• Disks have to be eagerzeroedthick, so they are full size. Thin provisioning at the array will not help, as we zero the whole disk.
• Needs 2 extra NIC ports per ESX host for the heartbeat.
• Needs RDM disks in physical compatibility mode, so the VM can't be cloned or converted to a template. vMotion is not supported as at vSphere 5 (this is not due to the physical RDM).
• Impacts ESX upgrades, as the ESX version must be the same.
• With native multipathing (NMP), the path policy can't be Round Robin.
• It also uses Microsoft NLB.
• Impacts SRM 5. It works, but needs scripting. Preferably keep the same IP, so use vShield Edge to create a stretched VLAN.

Microsoft Exchange
• If you need CCR (clustered continuous replication), then you need MSCS.

Oracle software
• Ideally, hosted on their own cluster, since Oracle charges per cluster. I'm not 100% sure whether Oracle still charges per cluster if we do not configure automatic vMotion for the VM (i.e. just Active/Passive HA, like the physical world, with DRS set to Manual for this VM). It looks like they will charge per host in that case, based on their document dated 13 July 2010. But the interpretation from Gartner is that Oracle charges for the entire cluster.

App that is licensed per cluster
• Similar to Oracle. I'm not aware of any other apps.

Apps that are not supported
• While ISVs support VMware in general, they may only support certain versions. SAP, for example, only supports from SAP NetWeaver 2004 (SAP Kernel 6.40) onwards, and only on Windows and Linux 64-bit (not on Solaris, for example).

Page 9: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

3 Sizes: VMs with additional consideration

Type of VM | Impact on Design

Peer applications (apps that scale horizontally; example: Web servers, App servers)
• They need to sit on different ESX hosts in a cluster, so you need to set up an Anti-Affinity rule, and you need to configure this per peer set. If you have 5 sets of Web servers from 5 different systems (5 pairs, 10 VMs), you need to create 5 Anti-Affinity rules. Too many rules create complexity, more so when the number of nodes is less than 4.

Pair applications (apps that protect each other for HA; example: AD, DHCP Server)
• As above.

Security VM or network packet capture tool
• Need to create another port group to separate the VMs being monitored from those that are not. Need to have Promiscuous Mode turned on.

App that depends on the MAC address for its licence
• Needs to have its own port group. May need to have MAC Address Changes set to Accept.

App that holds sensitive data
• Should encrypt the data or the entire file system. vSphere 5 can't encrypt the vmdk file yet. If you encrypt the Guest OS, the backup product may not be able to do file-level backup.
• Should ensure no access by the MS AD Administrators group. Find out how it is backed up, and who has access to the tape. If IT does not even have access to the system, then vSphere may not pass the audit requirement.
• Check partner products like Intel TXT and HyTrust.

Fault Tolerance requirements
• Impacts the HA slot size (if we use that admission-control policy), as FT uses a full reservation. Impacts Resource Pools; make sure we cater for the VM overhead (small).

App on Fault Tolerant hardware
• FT is limited to 1 core. Consider Stratus to complement vSphere 5.

Complex dependency among apps
• Be aware of the impact on the application during an HA event. If 1 VM is shut down by HA and then powered on, the other VMs in the chain may need a restart too. This should be discussed with the App Owner.

Page 10: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011


3 Sizes: VMs with additional consideration

Type of VM Impact on Design

App that requires a software dongle
• The dongle must be attached to 1 ESX host. vSphere 4.1 added this support. Best to use a network dongle. In the DR site, the same dongle must be provided too.

App with high IOPS
• May need its own datastore with dedicated spindles. There is no point having dedicated datastores if the underlying spindles are shared among multiple datastores.

Apps that use a very large block size
• SharePoint uses a 256 KB block size, so a mere 400 IOPS will already saturate a GE link (see the arithmetic sketch after this table). For such applications, FC or FCoE is a better protocol. Any application with a 1 MB block size can easily saturate a 1 GE link.

App with very large RAM
• This will impact DRS when an HA event occurs, as there needs to be a host that can house the VM. It will still boot so long as the reservation is not set to a high number.

App that needs Jumbo Frames
• This must be configured end to end (guest OS, port group, vSwitch, physical switch). Not everything supports 9000 bytes, so do a ping test and find the value.

App with >95% CPU utilisation in the physical world and a high run queue
• Find out first why it is so high. We should not virtualise an app whose performance characteristics we are completely blind to.

App that is very sensitive to time accuracy
• Time drift is a possibility in the virtual world. Find out the business or technical impact if time deviates by 10 seconds.

A group of apps with a complex power-on sequence
• I recommend setting the HA isolation response to shut down the VMs running on the isolated host. If they are shut down, powering them on may need App Owner involvement (especially if manual intervention is needed).

App that takes advantage of a specific CPU instruction set
• Mixing with an older CPU architecture is not possible. This is a small problem if you are buying new servers. EVC will not help, as it's only a mask. See speaker notes.

App that needs <0.01 ms end-to-end latency
• Use a separate cluster.
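The block-size point above is easy to sanity-check with arithmetic. A minimal sketch in Python; the 256 KB and 400 IOPS figures come from the SharePoint row, while the 1 GE raw rate of ~125 MB/s is my own assumption:

```python
def throughput_mb_s(iops, block_size_kb):
    """Sequential throughput implied by an IOPS figure at a given block size."""
    return iops * block_size_kb / 1024.0

GE_LINK_MB_S = 1000 / 8  # ~125 MB/s raw for a 1 Gigabit Ethernet link, before protocol overhead

# SharePoint example from the table: 256 KB blocks at a mere 400 IOPS
sp = throughput_mb_s(400, 256)
print(sp, sp / GE_LINK_MB_S)        # 100.0 MB/s, i.e. ~80% of a raw 1 GE link
# An app issuing 1 MB IOs needs only ~125 IOPS to fill the same link
print(throughput_mb_s(125, 1024))   # 125.0 MB/s
```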

Page 11: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Architecture: what do I consider

Architecture is an art
• Balancing a lot of things, some of which are not even technical.
• It considers the future (unknown requirements).
• Trying to stay close to best practice.
• Not in any particular order, below is what I consider in this vSphere-based architecture.

My personal principle: do not design something you cannot troubleshoot.
• A good IT Architect does not set up potential risks for the Support person down the line.
• Not all counters/metrics/info are visible in vSphere.

Considerations
• Upgradability
• This is unique to the virtual world, a key component of cloud that people have not talked about much: after all my apps run on the virtual infrastructure, how do I upgrade the virtualisation layer itself?
• Based on historical data, VMware releases a major upgrade every 2-3 years. vSphere 4.0 was released in May 2009; 5.0 in Sep 2011.
• If you are laying down an architecture, check with your VMware rep for an NDA roadmap presentation.

• Debugability
• Troubleshooting in a virtual environment is harder than in a physical one, as boundaries are blurred and physical resources are shared.
• 3 types of troubleshooting:
• Configuration. This does not normally happen in production; once it is configured, it is not normally changed.
• Stability. Stability means something hangs, crashes (BSOD, PSOD, etc.) or gets corrupted.
• Performance. This is the hardest of the 3, especially if the slow performance is short-lived and in most cases the system is performing well.
• Supportability
• This is related to, but not the same as, debugability. Support relates to things that make day-to-day support easier: monitoring counters, reading logs, setting up alerts, etc. For example, centralising logs via syslog and providing intelligent search (e.g. using Splunk or Integrien) improves supportability.
• A good design makes it harder for the Support team to make human errors. Virtualisation makes tasks easy, sometimes way too easy relative to the physical world. Consider this operational/psychological impact in your design.
• Support also means using components that are supported by the vendors. For example, SAP support starts from certain versions onwards (older versions are not supported).

Page 12: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Architecture: what do I consider

Considerations
• Security
• The vSphere Security Hardening Guide splits security into 3 levels: Production, DMZ and SSLF.
• In the Large Design, I'm physically separating the DMZ cluster. vShield should be able to do it, but for now security folks are still not convinced by virt-sec technology. Virt-sec technology is not something we can avoid.
• Availability
• Software has bugs. Hardware has faults. We mostly cater for hardware faults; what about software bugs?
• I try to cater for software bugs, which is why the design has 2 VMware clusters with 2 vCenters. This lets you test cluster-related features in one cluster while keeping your critical VMs on the other.
• Since DR is the #1 driving factor besides cost saving, the design assumes it is required even in the Small Design.
• Each cluster is always sized for 1 host failure. In a small cluster, the overhead can be high (50% in a 2-node cluster).

• Reliability
• Related to availability, but not the same. Availability is normally achieved by redundancy. Reliability is normally achieved by keeping things simple, using proven components, separating things, and standardising.
• For example, the solution for the Small Design is simpler (a lot fewer features relative to the Large Design). It also uses 1 vSwitch for 1 purpose, as opposed to a big vSwitch with many port groups and a complex NIC fail-over policy.
• You will notice a lot of standardisation in all 3 examples. The drawback of standardisation is overhead, as we have to round up to the next bracket: a VM with 6 GB RAM ends up getting 8 GB.
• Performance
• Storage, Network, VMkernel, VMM, Guest OS, etc. are considered.
• We are aiming for <1% CPU Ready Time and near-zero Memory Ballooning in Tier 1. In Tier 3, we can and should have a higher ready time and some ballooning, so long as it still meets the SLA.
• Scalability
• Includes both horizontal and vertical, and both hardware and software.
• vSphere Essentials is not used, as it can't be scaled to a higher edition or more ESXi hosts.
• Skills of the IT team
• Especially the SAN vs NAS skill. This is more important than the protocol itself.
• Skills include both internal and external (a preferred vendor who complements the IT team).
• In a Small/Medium environment, it is impossible to be an expert in all areas. Consider complementing the internal team by establishing a long-term partnership with an IT vendor. A plain vendor/vendee relationship saves cost initially, but in the long run there is a cost.

Page 13: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Architecture: what do I consider

Considerations
• Cost
• You will notice that the "Small" Design has a lot more limitations than the "Large" Design.
• An even bigger cost is ISV licensing. Some ISVs, like Oracle, charge for the entire cluster; dedicating a cluster to them is cheaper.
• The DR site serves multiple purposes.
• VMs from different Business Units are mixed in 1 cluster. If they can share the same Production LAN and SAN, the same reasoning can apply to the hypervisor.
• Windows, Linux and Solaris VMs are mixed in 1 cluster.

• Improvement
• Besides meeting current requirements, can we improve things?
• For example, almost all companies need more servers, especially in non-production. So when virtualisation happens, we get VM sprawl. As such, the design has head room.
• Another common example is moving toward "1 VM, 1 OS, 1 App". In the physical world, some servers may serve multiple purposes. In the virtual world, they can afford to run 1 app per VM, and should do so.
• We consider Desktop Virtualisation in the overall architecture. View 4.5 is not included in the detailed design.

Variations in assumptions
• Mixing Prod and Non-Prod in the same cluster
• Advantages of mixing: Prod is spread over more ESX hosts, so a host failure impacts fewer VMs. Non-Prod normally has a much lighter workload, so Prod gets more resources.
• Disadvantages of mixing: Non-Prod may spike (e.g. during load testing), so a proper Resource Pool (using a hard Limit) is required. It is also easy to make mistakes and easy to move in and out of the Production environment; Production should be more controlled.
• You can also share 1 Update Manager, 1 Template datastore, 1 vCenter, etc.

Page 14: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

3 Sizes: Design Summary

The table below provides the overall comparison, so you can easily see what was taken out in the Small or Medium design.
• Just like any other design, there is no single perfect answer. Example: you may use FC or iSCSI for Small.

This assumes 100% virtualised. It is easier to have 1 platform than 2.
• For certain things in a company, you should only have 1 (email, directory, office suite, backup). Something as big as a "platform" should be standardised. That's why it is called a platform.

Each row below lists Small | Medium | Large.
• Target VM: 50 | 100 | 150
• # FT VM: 0 – 3 (in Prod Cluster only) | 0 – 6 | 0 – 6 (Prod Cluster only)
• # ESX: 5 hosts, 2 clusters (3 + 2) | 10 hosts, 2 clusters (7 + 3) | 16 hosts, 4 clusters (2, 7, 5, 2)
• VMware products: vSphere Advanced, SRM, vShield App, View Enterprise | vSphere Ent Plus, SRM, vShield App, View Premier, vCenter Ops Advanced | vSphere Ent Plus + Nexus, SRM, vShield App, View Premier, vCenter Ops Enterprise, vShield End Point, vCenter Chargeback, vCenter Server Heartbeat
• VMware certification & skill: 1 VCP | 2 VCP | 1 VCAP-DCA + 1 VCAP-DCD, or use VMware Mission Critical Support
• Storage: NFS | NFS | FC + NFS, with snapshots
• Server: 2x Xeon 5650, 48 GB RAM | 2x Xeon 5650, 64 GB RAM | 2x Xeon (6-8 cores/socket), 96 GB RAM
• Backup: VMware Data Recovery to Array 2 | VADP + 3rd party to Tape | VADP + 3rd party to Tape

Page 15: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Other Design possibilities

What if you need to architect for a larger environment?
• Take the Large Cloud sample as a starting point. It can be scaled to 2000 VM. Above 1000 VM, you may consider a Pod approach.
• Upsize it by:
• Adding larger ESXi hosts. I'm using a 6-core socket, based on Xeon 5600. You should use the 8-core Xeon 7500 to fit larger VMs.
• Adding more ESX hosts to the existing cluster. Keep it to a maximum of 8 nodes per cluster.
• Tiering the clusters into 3: Tier 1, Tier 2 and Tier 3 clusters.
• Adding more clusters. For example, you can have multiple Tier 1 clusters.
• Adding Fault Tolerant hardware from Stratus. Make the Stratus server a member of the Tier 1 cluster. It appears as 1 ESX host, although there are 2 pieces of physical hardware. Stratus has its own hardware, so ensure consistency in your cluster design.
• Adding a Secure Cluster: a separate cluster for business-sensitive VMs.
• Adding management software: Application Discovery Manager, Configuration Manager, etc.
• Splitting the IT datastore into multiple datastores, grouped by function or criticality.
• Adding SSD for faster IOPS.
• For the >2 TB limit, see the 2 TB slide.
• If you are using blade servers and have filled 2 chassis, put the IT Cluster outside the blades and use rack mount. Separating the blades from the servers managing them minimises the chance of human error, as we avoid the "managing itself" complexity.
• No need for a "swing" ESX host to enable migration between Prod and Non-Prod, since our process requires downtime.
• I don't advocate live migration from/to the Production environment. It should be part of Change Control.
• Migrating both host and storage is possible if the VM is powered off.
• Storage migration goes through the management network. Take note of the impact, as HA is running on this same network.
• The Large Cloud is not yet architected for vCloud Director.
• vCloud Director has its own best practices for vSphere design. I am planning to add this with the next release of vCloud Director.

Page 16: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Other Design possibilities

FCoE
• Ideal for mixed environments where the customer direction is FCoE.
• Good for customers who are upgrading their switches. Getting an FCoE switch means getting a 10 GE switch for the LAN (NFS, iSCSI) too.
• The skills are relatively new in the enterprise. Skills here mean both design skill and troubleshooting skill, not just install & config.
• The physical switch must support lossless Ethernet.
• NAS certification: FCoE CNAs can be used to certify NAS storage. Existing NAS devices listed on the VMware SAN Compatibility Guide do not require recertification with FCoE CNAs.
• It is FC on Ethernet, not on IP. Obviously, you can't route it at the IP layer.

iSCSI
• I'm using NFS instead of iSCSI.
• Performance is relatively similar, but iSCSI can have multi-pathing and a lower CPU cost.
• Some servers, like HP blades, have built-in hardware iSCSI initiators.
• Some backup/DR solutions can be achieved cheaply on iSCSI vs FC, i.e. low-cost DR via iSCSI.
• Broadcom hardware iSCSI does not support Jumbo Frames or IPv6. Dependent hardware iSCSI does not support iSCSI access to the same LUN when a host uses dependent and independent hardware iSCSI adapters simultaneously.

vShield
• Changing the paradigm in security: from "hypervisor as another point to secure" to "hypervisor gives the security team an unfair advantage".
• vShield App for firewalling and vShield Endpoint for anti-virus (only Trend Micro has the product as at Sep 2011).
• No need to throw away the physical firewall first. Complement it by adding "object-based" rules that follow the VM.

Internet Proxy and Firewall
• I'm assuming they are not virtualised.
• It can be done. Best to use a physically separate ESX host to create an air-gap and avoid a catch-22 situation.

Page 17: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

The "60 - 80% Rule"

Generally, it is best to avoid >70% of the rated capacity of any component
• Except for tape drives, where it's a long-running stream.
• Except for long-running batch jobs, which tend to consume 100%.

Manufacturer performance specifications are normally not achieved in the real world
• Response time increases significantly after the 70% utilization threshold is exceeded.

Specific to ESX
• We need the extra resources for vMotion/DRS/Storage vMotion/etc.
• Target peak CPU: 80%
• Note it is a maximum, not an average. So the average is much lower, probably in the 50% range.
• It is a daily chart, not hourly, so I don't have to look at it every hour.
• Non-Production can hit 100%, so long as the hourly average is less than 70%.
• We assume we can accept some Ready Time in Non-Production.
• This does not include HA. So with HA, it's even lower. That's why a small cluster size is not good, as the overhead is bigger.
• Target peak RAM: 70%
• This is the memory.active counter, not memory.consumed.
• vCenter 5.0 still uses memory.consumed for most of its screens. vCenter Ops fixes this "bug" by using demand instead.
• Target ESXi KAVG: 0-1 ms
• Target disk response time: 10 ms
• Non-Production can hit 25 ms.
• If your Test/Dev becomes important during the final stage of UAT (just before the app goes live), then you need to be able to scale the storage to ensure infra is not a bottleneck during UAT.

[Chart: response time vs. utilisation (0-100%); response time climbs steeply once utilisation passes the ~70% mark.]
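A minimal sketch of the targets above as a checklist. The threshold numbers are taken from this slide; the observed values fed in are made-up examples:

```python
# Targets from this slide (Production tier). Non-Production is allowed to run hotter.
TARGETS = {
    "peak_cpu_pct":        80,   # daily peak, not average
    "peak_active_ram_pct": 70,   # memory.active, not memory.consumed
    "kavg_ms":             1,    # ESXi kernel latency
    "disk_latency_ms":     10,   # 25 ms is tolerable for Non-Production
}

def check_host(observed):
    """Return the metrics that exceed the 60-80% rule targets."""
    return {k: v for k, v in observed.items() if v > TARGETS[k]}

# Hypothetical observed peaks for one host
sample = {"peak_cpu_pct": 85, "peak_active_ram_pct": 55, "kavg_ms": 0.4, "disk_latency_ms": 12}
print(check_host(sample))   # {'peak_cpu_pct': 85, 'disk_latency_ms': 12}
```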

Page 18: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Design Methodology

Architecting a Private Cloud is not a sequential process
• There are 6 components.
• The components are inter-linked, like a mesh.
• In the >1000 VM category, where it takes >2 years to implement, a new vSphere release will change the design.

Even the bigger picture is not sequential
• Sometimes you may even have to leave Design and go back to Requirements or Budgeting.

Again, there is no perfect answer. Below is one example. This entire document is about Design only; Operations is another big space.
• I have not taken into account Audit, Change Control, ITIL, etc.

[Diagram: the six inter-linked design components (VM, Server, Storage, Data Center, Network, Management); the steps loop between them rather than running in sequence.]

Page 19: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Data Center Design
Data Center, Cluster, Resource Pool, DRS, DPM

Page 20: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Methodology

Define how many physical data centers are required
• DR requirements normally dictate 2.

For each physical DC, define how many vCenters are required
• Desktop and Server should be separated by vCenter.
• View 4.5 comes with a bundled vSphere (unless you are buying the add-on).
• Security requirements, not scalability, drive this one.
• In our sample scenario, it does not warrant separation.
• Different vCenter admins drive this segregation.

For each vCenter, define how many virtual data centers are required
• A Virtual Data Center serves as a naming boundary.
• Different paying customers drive this segregation.

For each vDC, define how many clusters are required.
For each cluster, define how many ESXi hosts are required
• Preferably 4 – 8. 2 is too small a size. Adjust according to workload.
• Standardise the host spec across clusters. While each cluster can have its own host type, this adds complexity.

[Diagram: inventory hierarchy. Each physical DC contains one or more vCenters; each vCenter contains one or more virtual DCs; each virtual DC contains one or more clusters; each cluster contains the ESXi hosts.]
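A minimal sketch of the same top-down decomposition as nested data. The host counts follow the Large Cloud summary (16 hosts, 4 clusters of 2, 7, 5 and 2); which cluster sits in which site, and the names, are illustrative assumptions rather than part of the design:

```python
# Physical DC > vCenter > Virtual DC > Cluster > number of ESXi hosts.
inventory = {
    "DC-1 (Production)": {
        "vCenter-1": {
            "Virtual DC-1": {"Cluster-A": 2, "Cluster-B": 7, "Cluster-C": 2},
        },
    },
    "DC-2 (DR)": {
        "vCenter-2": {
            "Virtual DC-2": {"Cluster-D": 5},
        },
    },
}

def total_hosts(tree):
    """Walk the hierarchy and add up the ESXi hosts at the leaves."""
    if isinstance(tree, int):
        return tree
    return sum(total_hosts(child) for child in tree.values())

print(total_hosts(inventory))   # 16
```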

Page 21: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Datacenter and Cluster

In our design, we will have only 1 datacenter for Production
• Keeping things simpler.
• But we will have 2 clusters for Server VMs + multiple clusters for Desktop VMs.
• Certain objects can go across clusters, but not across datacenters:
• You can vMotion from one cluster to another within a datacenter, but not to another datacenter.
• Nexus and the Distributed Switch can't go across datacenters.
• Datastore names are per datacenter, so network and storage are per datacenter.
• You can still clone a VM within a datacenter and to a different datacenter.
• Our use scenario does not require multiple vDCs.

Page 22: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

The need for a Non-Prod Cluster

This is unique to the virtual data center.
• Well, we don't have "clusters" to begin with in a physical DC.

The Non-Prod Cluster serves multiple purposes
• Running Non-Production VMs
• In our design, all Non-Production runs on the DR site to save cost.
• A consequence of our design is that migrating from/to Production can mean copying large amounts of data across the WAN.
• Disaster Recovery
• Test bed for infrastructure patching or updates
• Test bed for infrastructure upgrade or expansion

Evaluating or implementing new features
• In a virtual data centre, a lot of enhancements can impact the entire data centre, e.g. Distributed Switch, Nexus 1000V, Fault Tolerance, vShield.
• All of the above need proper testing.
• The Non-Prod Cluster should provide a sufficiently large scope to make testing meaningful.

Upgrade of the core virtual infrastructure
• e.g. from vSphere 4.1 to a future version (major release).
• This needs extensive testing and a roll-back plan.

Even with all the above…
• How are you going to test SRM properly?
• An SRM test needs 2 vCenters, 2 arrays and 2 SRM servers. If all are used in production, then where is the test environment for SRM?

[Diagram: the virtualisation layer between the Business and IT layers.] This new layer does not exist in the physical world. It is software, hence it needs its own Non-Prod environment.

Page 23: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

The need for an IT Cluster

A special-purpose cluster
• Running all the IT VMs used to manage the virtual DC or provide core services.
• The central management will reside here too.
• Separated for ease of management & security.

Sample VMs:
• vCenter + DB. Multiple instances
• Core services: AD, LDAP, DNS, DHCP, file servers, print servers, etc.
• Infra management: Update Manager, SRM Manager, Chargeback Manager, CapacityIQ Manager, DC management, Windows management, console, syslog
• Security VMs: firewall, vShield Manager, Security Manager (3rd party)
• Application management: performance monitoring
• Basic VMware tools: Converter, vMA, Admin Client
• Networking: Cisco Nexus 1000V VSM, load balancer, etc.
• Management software: EMC Manager, patch management (physical servers)
• Desktop: VMware View Manager, ThinApp master image, ThinApp Update Server, Profile Manager

This separation keeps the Business Cluster clean, "strictly for business".

Page 24: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

IT Cluster

The table below lists the VMs that reside on the IT Cluster. Each entry represents a VM.
• This is for the Production site. The DR site will have a subset of this.

Explanation of some of the servers below:
• IT DB Server = an MS SQL Server for vCenter, Update Manager, View 4.5, Chargeback, etc.
• Check compatibility across all of them! The devil lies in the details (e.g. Service Pack levels differ).
• Security Management Server = a VM to manage security (e.g. Trend Micro Deep Security).

Each row below lists Small Cloud | Medium Cloud | Large Cloud.
• VMware: vCenter (Server Cloud), IT Database Server, vCenter Update Manager, vMA + Syslog Server, Site Recovery Manager, View Manager, vCenter (Desktop Cloud) | vCenter (Server Cloud), IT Database Server, vCenter Update Manager, vMA, Site Recovery Manager, View Manager, vCenter (Desktop Cloud), ThinApp Server | vCenter (Server Cloud), IT Database Server, vCenter Update Manager, vMA, vCenter Orchestrator, Site Recovery Manager + DB, vCenter Ops + DB, Chargeback, View Managers, vCenter (Desktop Cloud)
• Storage: vCenter Data Recovery | Storage Mgmt tool | Storage Mgmt tool (may need physical RDM to get fabric info)
• Network: Network Management Tool | Network Management Tool | Network Management Tool, Nexus 1000V Manager (VSM)
• Core Infra (all sizes): MS AD 1, MS AD 2, Admin Clients (1 per admin), Syslog server
• Basic IT Services: File & Print Server (FTP Server) | File Server (FTP Server) | File Server (FTP Server)
• Security: Security Management Server | Security Management Server | Security Management Server, vShield App

Page 25: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Cluster Size

I recommend 8 nodes per cluster. Why 8, and not 4, 12, 16 or 32?
• A balance between too small (4 hosts) and too large (>12 hosts).
• DRS: 8 gives DRS sufficient hosts to "maneuver". 4 is rather small from the DRS scheduler's point of view.
• With vSphere 4.1, having 4 hosts does not give enough hosts to do "sub-clusters".
• For cost reasons, some clusters can be as small as 2 nodes, but then the DPM benefit can't be used.

Best practice for a cluster is the same hardware spec with the same CPU frequency.
• Eliminates the risk of incompatibility.
• Complies with Fault Tolerance & VMware View best practices.
• So more than 8 means it's more difficult/costly to keep them all the same: you need to buy 8 hosts at a time, and upgrading >8 servers at a time is expensive ($$) and complex. A lot of VMs will be impacted when you upgrade >8 hosts.

Manageability
• Too many hosts are harder to manage (patching, performance troubleshooting, too many VMs per cluster, HW upgrades).
• 8 allows us to isolate 1 host for VM-troubleshooting purposes. At 4 nodes, we can't afford such "luxury".

Too many paths to a LUN can be complex to manage and troubleshoot
• Normally, a LUN is shared by 2 "adjacent" clusters.
• 1 ESX host is 4 paths, so 8 ESX hosts is 32 paths, and 2 clusters is 64 paths. This is a rather high number (if you compare with the physical world).

N+2 for Tier 1 and N+1 for the others
• With 8 hosts, you can withstand 2 host failures if you design for it.
• At 4 nodes, N+2 is too expensive, as the payload is only 50%.

Small cluster sizes
• In a lot of cases, the cluster size is just 2 – 4 nodes. From an availability and performance point of view, this is rather risky.
• Say you have a 3-node cluster. You are doing maintenance on Host 1 and suddenly Host 2 goes down: you are exposed with just 1 node. Assuming HA Admission Control is enabled (which it should be), the affected VMs may not even boot. When a host is placed into maintenance mode, or disconnected for that matter, it is taken out of the admission control calculation.
• Cost: too few hosts results in overhead (the "spare" host).

See slide notes for more details.
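A minimal sketch of the overhead argument above, treating the host-failures-to-tolerate as capacity held back by admission control:

```python
def ha_overhead_pct(hosts, failures_tolerated):
    """Share of cluster capacity reserved to tolerate N host failures."""
    return 100.0 * failures_tolerated / hosts

for hosts, ftt in [(2, 1), (4, 1), (8, 1), (4, 2), (8, 2)]:
    print(f"{hosts}-node cluster, N+{ftt}: {ha_overhead_pct(hosts, ftt):.1f}% overhead")
# 2-node N+1 -> 50%, 8-node N+1 -> 12.5%, 4-node N+2 -> 50%, 8-node N+2 -> 25%
```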

Page 26: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011


Small Cloud: Cluster Design

Page 27: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Medium Cloud: Cluster Design

The architecture is very similar to the Small Cloud. The differences are:
• 100 VM as opposed to 50 VM.
• We have 7 hosts in DC 1 instead of 3.
• We have 3 hosts in DC 2 instead of 2.
• Since we have more hosts, we can do sub-clusters. We will place the following as sub-clusters:
• Host 1 – 2: DMZ SubCluster
• Host 3 – 4: DMZ SubCluster
• Host 5 – 6: Oracle BEA SubCluster
• Host 6 – 7: Oracle DB SubCluster
• Production is a soft cluster, so in a host failure it can use Hosts 1 – 2 too.

On the next page, we will show the Large Cloud.
• The architecture is relatively different from the Medium one, even though we only move up from 100 VM to 150 VM.
• We have FC SAN in addition to NFS.
• We split the DMZ into its own cluster in the Production DC.
• We split the IT VMs into their own cluster in the Production DC.
• The physical splitting is done to satisfy the Audit or Security team.

Page 28: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Medium Cloud: SubCluster

We only have 7 hosts in the Production DC, but need to cater for:
• DMZ VMs
• IT VMs, separated from the DMZ due to concern over the management LAN
• Production: Tier 1 and Tier 2
• Oracle software

[Diagram: the 7-host cluster carved into sub-clusters for DMZ VMs, IT VMs, Oracle BEA, Oracle DB and the rest of the VMs, attached to the DMZ LAN, Management LAN and Production LAN respectively.]

Page 29: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011


Large Cloud: Cluster Design

Page 30: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Large Cloud: SubCluster

We have more "room" to design in this cluster
• No DMZ VMs (the DMZ has its own cluster).
• No IT VMs (IT has its own cluster).

As a result, we can use VM affinity rules
• Example: 2 web servers are placed on different ESX hosts.
• But use them sparingly.

[Diagram: the Production cluster with sub-clusters for Oracle BEA and Oracle DB plus the rest of the VMs, all on the Production LAN.]

Page 31: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Small Cloud: Design Limitations

It is based on the vSphere Advanced edition
• A VM can't have more than 4 vCPUs.
• Storage can't be migrated live from 1 datastore to another.
• No DRS and DPM.
• Features of the Distributed Switch are not available.
• Can't use 3rd-party multi-pathing.

Does not support MSCS
• vSphere 4.1 only supports MSCS on FC for now. I use NFS in this design.
• For a 30-server environment, HA with VM monitoring should be sufficient.
• A simple script can be added that checks whether the application (service) is responding on its given port/socket; a sketch follows below.
• Alternatively, a script within the Guest OS checks whether the process is up. If not, it sends an alert.

Only 1 cluster in the primary data center
• Production, DMZ and IT all run on the same cluster.
• Networks are segregated, as they use different networks.
• Storage is separated, as they use different datastores.

Some side benefit?
• The Oracle licence is charged per host, not per cluster, when there is no automatic (read: soft) migration. I'm not 100% sure on this, so consult your legal department; the Oracle document is grey on this.

Page 32: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Medium Cloud: Design Limitations

Only 1 cluster in the primary data center
• Production, DMZ and IT all run on the same cluster.
• Networks are segregated, as they use different networks.
• Storage is separated, as they use different datastores.

Does not support MSCS
• vSphere 5 only supports MSCS on FC for now. I use NFS in this design.

Complex affinity and Host/VM rules
• Be careful in designing the VM Anti-Affinity rules.
• We are using Group Affinity, so we have extra constraints.

Possible variations
• If we can add 1 more ESX host, we can create 2 clusters of 4 nodes each.
• This would simplify the affinity rules.
• Using 1-socket ESX hosts instead of 2-socket:
• Saves on VMware licences.
• Extra cost on servers.
• Extra cooling/power operational cost.

Page 33: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Large Cloud: Design Limitations

The DMZ Cluster size is rather small.
• See the cluster size slide.
• One way to grow it is to move apps toward the DMZ for mobile access.

The IT Cluster size is rather small.

Max VM size should be kept at 6 vCPU
• We use a 6-core chip.
• See the slide on CPU selection.

Only 1 Production cluster
• No separation between Tiers.
• Example tiering:

• Tier 1: # hosts: 5 (always); node spec: always identical; failure tolerance: 2 hosts; MSCS: yes; # VM: max 18; monitoring: application level, extensive alerts; remarks: only for critical apps, no resource overcommit.
• Tier 2: # hosts: 4 – 8; node spec: maybe identical; failure tolerance: 1 host; MSCS: limited; # VM: 10 per (N-1) host; remarks: apps can be vMotioned to Tier 1 during a critical run.
• Tier 3: # hosts: 4 – 8; node spec: not identical; failure tolerance: 1 host; MSCS: no; # VM: 15 per (N-1) host; monitoring: infrastructure level, minimal alerts; remarks: some resource overcommit.

Page 34: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Extra Large Cloud

This shows an example of a Cloud for >500 VM. It also uses Active/Passive data centers. The overall architecture remains similar to the Large Cloud.

Page 35: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Cluster: Settings

For the 3 sample sizes, here are my personal recommendations
• DRS fully automated. Sensitivity: Moderate.
• Use anti-affinity or affinity rules only when needed:
• More things for you to remember.
• Gives DRS less room to maneuver.
• DPM enabled. Choose hosts that support DPM.
• Do not use WoL; use IPMI or iLO instead.
• VM Monitoring enabled.
• VM monitoring sensitivity: Medium.
• HA will restart the VM if the heartbeat between the host and the VM has not been received within a 60-second interval.
• EVC enabled. This enables you to upgrade in future.
• "Prevent VMs from being powered on if they violate availability constraints" (HA Admission Control) enabled, for better availability.
• Host isolation response: Shut down VM.
• See http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
• Compared with "Leave VM powered on", this prevents data/transaction integrity risks. The risk is rather low, as the VM itself holds a lock on its files.
• Compared with "Power off VM", this allows a graceful shutdown. Some applications need to run a consistency check after a sudden power-off.

Page 36: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Resource Pool

What they are not
• A way to organise VMs. Use folders for this.
• A way to segregate admin access to VMs. Use folders for this.

The more VMs you put into a Resource Pool, the less each one gets.
• The shares are not per VM; they are for the entire pool. See the arithmetic sketch below.
• The only way to give a per-VM guarantee is to set it per VM, which has admin overhead, as it's not easily visible.

Common mistake
• A Sys Admin creates 3 resource pools called Tier 1, Tier 2 and Tier 3.
• They follow the relative High, Normal, Low shares, so Tier 1 gets 4x the shares of Tier 3.
• Place 1 VM in each Tier: everything is fine. The Tier 1 VM does get 4x the shares.
• Place 3 more VMs in Tier 1, so Tier 1 now has 4 VMs: Tier 1 performance drops. The 4 VMs are fighting over the same pool of shares.

Don't put VMs and Resource Pools as "siblings" at the same level. Use Resource Pools sparingly.
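A minimal sketch of the arithmetic behind the common mistake: the pool's shares are divided across all the VMs in it, so adding VMs to the "High" pool dilutes each VM's entitlement. The share values are relative units only, not numbers from the slide:

```python
def per_vm_share(pool_shares, vm_count):
    """Each VM's slice of a resource pool's shares under contention."""
    return pool_shares / vm_count

# Relative shares: Tier 1 (High) gets 4x the shares of Tier 3 (Low), as on the slide
tier1_pool, tier3_pool = 4, 1

print(per_vm_share(tier1_pool, 1), per_vm_share(tier3_pool, 1))  # 4.0 vs 1.0 -> works as intended
print(per_vm_share(tier1_pool, 4), per_vm_share(tier3_pool, 1))  # 1.0 vs 1.0 -> Tier 1 VM no better off
```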

Page 37: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

DRS, DPM, EVC

In our 3 sizes, here are the settings:
• DRS: Fully Automated.
• DRS sensitivity: leave it at the default (middle, 3-star migration).
• EVC: turn on.
• It does not reduce performance.
• It is a simple mask.
• DPM: turn on, unless the HW vendor shows otherwise.
• VM affinity: use sparingly. It adds complexity, as we are already using group affinity.
• Group affinity: use (as per the diagram in the design).

Why turn on DPM
• Power cost is a real concern.
• Singapore example: S$0.24 per kWh x (600 W + 600 W) x 24 hours x 365 days x 3 years / 1000 = about S$5,100 (see the sketch below). This is quite close to the cost of buying 1 server.
• For every 1 W of power consumed, we need a minimum of 1 W of power for aircon + UPS + lighting.

Page 38: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

vApp

When to use
• When you need to control the boot order across a group of VMs within a cluster.
• There is no start-up order in a cluster, only within each ESX host, which does not help when the VMs boot on different ESX hosts.

Other things to consider:
• A vApp in vSphere 5.0 is not technically the same as a vApp in vCloud Director 1.5.

Page 39: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Application-aware HA

Differences from MSCS
• Uses 1 VM, not 2.
• No need for a quorum disk, hence no need for RDM.
• No patching complexity to keep the 2 nodes in sync.
• No need for 2 dedicated pNICs per ESX host just for the heartbeat.
• Will not help in a rolling upgrade.
• No need for VM-Host grouping.
• Compatible with vMotion, DRS, DPM and snapshots.
• Works with iSCSI and NFS, not just FC.
• Management is integrated into the admin client; no separate management.
• Only works with 64-bit Guest OSes. The console needs to be Win08 or R2.

Based on Symantec Veritas Cluster technology. See speaker notes for details.

Needs an agent on each VM. Needs 1 console VM.

Page 40: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

MSCS

Considerations
• vSphere offers HA and FT, which translates to less need for MSCS.
• In unicast mode, NLB reassigns the station (MAC) address of the network adapter for which it is enabled, and all cluster hosts are assigned the same MAC address. Because of this, you cannot have ESX send ARP or RARP to update the physical switch port with the actual MAC address of the NICs, as this would break the unicast NLB communication.

Limited OS support
• Windows Server 2003 SP2 (32- and 64-bit). Not R2; SP2 is after R2.
• Windows Server 2008 R2 (64-bit)

Storage design
• Virtual SCSI adapter:
• LSI Logic Parallel for Windows Server 2003
• LSI Logic SAS for Windows Server 2008
• Round Robin is not supported with native multipathing (NMP).

Boot from SAN for the VM
• Possible, but complex.
• Problems that you encounter in physical environments extend to virtual environments.
• For general information about booting from a SAN, see the Fibre Channel SAN Configuration Guide.

Page 41: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

MS Clustering

ESX port group properties
• Notify Switches = No
• Forged Transmits = Accept

When to use MSCS vs VMware HA
• Shorter downtime:
• HA needs to boot the OS.
• HA needs >15 seconds to detect a failure.
• More accurate detection:
• Application aware.
• vSphere 5 has an API for application awareness.

Win08 does not support NFS.

ESXi changes
• ESXi 5.0 uses a different technique to determine if RDM LUNs are used for MSCS cluster devices: it introduces a configuration flag to mark each device participating in an MSCS cluster as "perennially reserved".

Page 42: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Fault Tolerance

Design considerations
• Still limited to 1 vCPU in vSphere 5.0.
• FT impacts Reservation: it will auto-reserve at 100%.
• The reservation impacts HA Admission Control, as the slot size is bigger.
• HA does not check slot size nor actual utilisation when booting up a VM; it checks the Reservation of that affected VM.
• FT impacts Resource Pools. Make sure the RP includes the RAM overhead. It is smaller in 4.1, but you should still include it in your design.
• Cluster size is minimum 3, recommended 4.

General guidance
• Assuming a 10:1 consolidation ratio, I'd cap FT usage at just 10% of Production VMs.
• So 30 VMs means around 3 ESX hosts, which means around 3 FT VMs.
• This translates to 1 Primary VM + 1 Secondary VM per host.
• Small cluster sizes (2-4) are more affected when there is an HA event. See the picture for a 3-node example.
• Small Cloud: 3 FT VMs (with 3 Secondary VMs)
• Medium Cloud: 6 FT VMs
• Large Cloud: 6 FT VMs (1 per host)

Limitations
• Turn off FT before doing Storage vMotion.
• FT protects the infra, not the app. Use Symantec ApplicationHA to protect the app.

Page 43: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Fault Tolerance

Tune the application and the Windows HAL to use 1 CPU.

Workload Type | Application Specifics
• Databases: The most popular workloads on FT. Small to medium instances, mostly SQL Server.
• MS Exchange and Messaging: BES, Exchange. A gaming company has 750 mailboxes on 1 FT VM. See the FT load test at blogs.vmware.com.
• Web and File servers: A file server might be stateless, but the application using it may be sensitive to denial of service and may be very costly to lose. A simulation relying on a file server might have to be restarted if the file server fails.
• Manufacturing and Custom Applications: These workloads keep production lines moving. Breaks result in loss of productivity and material. Examples: propeller factory, meat factory, pharma line.
• SAP: SAP ECC 6.0 system based on the SAP NetWeaver 7.0 platform. ASCS, a message and transaction-locking service, is a SPOF.
• BlackBerry: BlackBerry Enterprise Server 4.1.6 (BES). A 1 vCPU BES can support 200 users at 100-200 emails/day.

Page 44: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Server Design
ESX Host

Page 45: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

Approach

Know the # of VMs you need to host and their sizes.
• This gives you an idea of how many ESX hosts you need.
• Think of the cluster, not 1 ESX host, when sizing the ESXi host.
• Ideal cluster size is minimum 4. If you use 4-socket hosts, you will have a 16-socket cluster (128 cores). This is large.

Define the largest VM you need to host
• The ESXi host should be >2x the largest VM.
• Plan for 1 fiscal year, not just the next 6 months.
• You should buy hosts per cluster. This ensures they are from the same batch.
• Standardising the host spec makes management easier.
• If your largest VM needs >6 cores, go for an 8-core pCPU.
• Ideally, a VM should fit inside a socket to minimise the NUMA effect. This happens in the physical world too.
• If your largest VM needs 30 GB of RAM, then each socket should have 32 GB.
• I consider RAM overhead.
• Extra RAM = slower boot. ESXi creates a swap file that matches the RAM size. You can use reservations to reduce this, so long as you use the "% based" setting in the cluster.
• Consider the 8 GB per core guidance:
• A 12-core ESX box should have 96 GB.
• This should be enough to cater for VMs with large RAM.
• Assuming your largest VM is 6 vCPU, start with a 2-socket server.
• At 24 threads (with HT turned on), this is a good start.

Decide: Blade or Rack.
Decide: IP or FC storage.
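A minimal sketch of the sizing approach above: apply the 8 GB/core guidance for host RAM, then derive the cluster size from a VM-per-host target plus an HA spare. The example inputs (50 VMs, ~10 VMs per host, 2-socket 6-core hosts) are illustrative, not the actual design:

```python
import math

def size_cluster(total_vms, vms_per_host=10, ha_spare_hosts=1):
    """Hosts needed to run the VMs at the target density, plus HA spare capacity."""
    return math.ceil(total_vms / vms_per_host) + ha_spare_hosts

def host_ram_gb(cores_per_host, gb_per_core=8):
    """RAM suggested by the 8 GB per core guidance."""
    return cores_per_host * gb_per_core

print(size_cluster(50))      # 6 hosts (5 to carry the load + 1 for HA)
print(host_ram_gb(2 * 6))    # 96 GB per 2-socket, 6-core/socket host
```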

Page 46: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

VMware VMmark

Use VMmark as the basis for CPU selection only, not for the entire box selection.
• It is the official benchmark for VMware, and it uses multiple workloads.
• Other benchmarks are not run on vSphere, and typically test 1 workload.
• VMmark does not include TCO. Consider the entire cost when choosing the HW platform.

Use it as a guide only
• Your environment is not the same.
• You need head room and HA.

How it's done
• VMmark 2.0 uses 1 - 4 vCPU VMs.
• Workloads: MS Exchange, MySQL, Apache, J2EE, File Server, plus an idle VM.

Results page:
• VMmark 2.0 is not compatible with 1.x results.
• www.vmware.com/products/vmmark/results.html

This slide needs an update.

Page 47: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011


VMware VMmark

Page 48: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

VMmark: sample benchmark result (HP only)

I'm only showing results from 1 vendor, as vendor comparison is about more than just the VMmark result. IBM, Dell, HP, Fujitsu, Cisco, Oracle and NEC all have VMmark results.

[Chart: VMmark results for HP servers on Opteron 8439 (24 cores), Xeon 5570 (8 cores), Opteron 2435 (12 cores) and Xeon 5470 (8 cores).]
• Look at the tile count: 20 tiles = 100 active VMs.
• This tells us that a Xeon 5500 box can run 17 tiles at 100% utilisation.
• Each tile has 6 VMs, but 1 is idle. 17 x 5 VMs = 85 active VMs in 1 box.
• At 80% peak utilisation, that's ~65 VMs.
• The score itself only matters when comparing results with the same # of tiles; ±10% is OK for real-life sizing, as this is a benchmark.
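A minimal sketch of the tile arithmetic in the callouts above (the slide rounds the derated figure down to ~65):

```python
def active_vms(tiles, vms_per_tile=6, idle_per_tile=1):
    """VMmark tiles to active VMs: each tile is 6 VMs, 1 of which is idle."""
    return tiles * (vms_per_tile - idle_per_tile)

benchmark_vms = active_vms(17)       # 17 tiles at 100% utilisation
print(benchmark_vms)                 # 85 active VMs in 1 box
print(round(benchmark_vms * 0.8))    # 68 -> roughly the ~65 VMs quoted at 80% peak utilisation
```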

Page 49: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

ESX Host: CPU

Xeon 5600 delivers a 9x improvement over Xeon 5100
• Clock speed only improves by 0.1x and # of cores by 3x.
• Fujitsu delivered a VMmark result of 38.39 at 27 tiles in July 2010.

AMD Opteron at 2 sockets / 24 cores is 32.44 @ 22 tiles. An impressive number too.
• Xeon is around 18% faster, not a huge margin anymore.
• But Xeon uses 12 cores, while Opteron uses 24 cores. So each Xeon core is around 2x faster.

Recommendation (for Intel)
• Use Xeon 2803 if budget is the constraint and you don't need to run VMs with >6 vCPUs.
• Use Xeon 2820 if you need to run 8 vCPU VMs.
• Use Xeon 2850 if you need to run 10 vCPU VMs.

Recommendation (for Intel, 4-socket box)
• Use the 4807, then the 4820, then the 4850.

Page 50: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

ESXi Host: CPU Sizing

ESXi Host: CPU
• 2 - 4 vCPUs per physical core.
• This is a general guideline, not meant for sizing Tier 1 applications. Tier 1 apps should be given 1:1 sizing; the ratio is more applicable to Test/Dev or Tier 3.
• A 12-core box therefore carries 24 – 48 vCPUs.
• Design with ~10 VMs per box in Production and ~15 VMs per box in Non-Production.
• ~10 VMs per box means the impact of downtime when a host fails is capped at ~10 Production VMs.
• ~10 VMs per box in an 8-node cluster means ~10 VMs may be able to boot on the remaining 7 hosts in the event of HA, hence reducing downtime.
• Based on a 10:1 consolidation ratio, if all your VMs are 3 vCPU, then you need 30 vCPUs, which means a 12-core ESX host gives a 2.5:1 CPU oversubscription.
• Based on a 15:1 consolidation ratio, if all your VMs are 2 vCPU, then you need 30 vCPUs.
• Buffer for the following:
• HA events
• Performance isolation
• Hardware maintenance
• Peaks: month end, quarter end, year end
• Future requirements: within 12 months
• DR, if your cluster needs to run VMs from the Production site.
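A minimal sketch of the oversubscription arithmetic above, using the slide's own numbers:

```python
def cpu_oversubscription(vm_count, vcpus_per_vm, physical_cores):
    """Ratio of provisioned vCPUs to physical cores on one host."""
    return (vm_count * vcpus_per_vm) / physical_cores

print(cpu_oversubscription(10, 3, 12))   # 2.5 -> 10 VMs x 3 vCPU on a 12-core box
print(cpu_oversubscription(15, 2, 12))   # 2.5 -> 15 VMs x 2 vCPU on the same box
```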

Page 51: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

ESXi Host: RAM Sizing

How much RAM? It depends on the # of cores from the previous slide.
• It is not so simple anymore; each vendor is different.
• 8 GB per core. So 12 cores means around 96 GB.
• Consider the memory-channel best practice:
• Don't leave channels empty. Populating them brings the benefits of memory interleaving.
• Performance drops as you put more DIMMs per channel:
• 1 DIMM per channel = 1333 MHz (total 6 DIMMs populated)
• 2 DIMMs per channel = 1066 MHz (total 12 DIMMs populated)
• 3 DIMMs per channel = 800 MHz (total 18 DIMMs populated)
• Check with the server vendor on the specific model.
• Some models now come with 16 slots per socket, so you might be able to use a lower DIMM size.
• Some vendors, like HP, have similar prices for 4 GB and 8 GB DIMMs.
• Dell R710 has 18 DIMM slots (?)
• IBM x3650 M3 has 18 DIMM slots
• HP DL380 G7 and BL490c G6/G7 have 18 DIMM slots
• Cisco has multiple models
• The VMkernel has a Home Node concept on NUMA systems. For ideal performance, fit a VM within 1 CPU-RAM "pair" to avoid the "remote memory" effect:
• # of vCPUs + 1 <= # of cores in 1 socket. So running a 5 vCPU VM on a quad-core socket will force a remote memory situation.
• VM memory <= memory of one node.
• Turn on Large Pages, especially for Tier 1.
• This means no TPS. That's why my RAM sizing is relatively large.
• Needs application-level support.

(Example NUMA layout: 2 nodes of 48 GB each.)
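A minimal sketch of the two NUMA-fit rules above. The 6-core, 48 GB-per-node values match the example host on this slide; the VM sizes fed in are illustrative:

```python
def fits_numa_node(vm_vcpus, vm_ram_gb, cores_per_socket, ram_per_node_gb):
    """Apply the rules: vCPUs + 1 <= cores per socket, and VM RAM <= one node's RAM."""
    return (vm_vcpus + 1 <= cores_per_socket) and (vm_ram_gb <= ram_per_node_gb)

# Host: 2 sockets x 6 cores, 96 GB total = 48 GB per NUMA node
print(fits_numa_node(4, 32, 6, 48))   # True  -> stays on one node
print(fits_numa_node(5, 32, 4, 48))   # False -> a 5 vCPU VM on a quad-core socket spills over
print(fits_numa_node(4, 64, 6, 48))   # False -> RAM larger than one node forces remote memory
```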

Page 52: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

ESXi Host: IO & Management

IO cards
• 2 FC ports and 10-12 GE NIC ports.
• Get 2x quad-port NICs. Since the built-in NICs number 2-4, the box will end up with 10 – 12 ports.

Use a network adapter that supports the following:
• Checksum offload
• Capability to handle high-memory DMA (64-bit DMA addresses)
• Capability to handle multiple scatter/gather elements per Tx frame

Management
• Lights-out management
• So you don't have to be in front of the physical server to do certain things (e.g. go into the CLI as requested by VMware Support).
• Hardware agent properly configured
• Very important to monitor hardware health, given the many VMs in 1 box.

PCI slots on the motherboard
• Since we are using 8 Gb FC HBAs, make sure the physical PCIe slot has sufficient bandwidth.
• A single dual-port FC HBA makes more sense if the saving is high and you need the slot, but there is a risk of bus failure. Also, double-check that the chip can handle the throughput of both ports.
• If you are using blades and have to settle for a single 2-port HBA (instead of two 1-port HBAs), then ensure the PCI slot has bandwidth for 16 Gb. When using a dual-port HBA, ensure the chip & bus in the HBA can handle the peak load of 16 Gb.
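A minimal sketch of the bandwidth check for the dual-port 8 Gb FC case above. The figures assume roughly 800 MB/s usable per 8 Gb FC port and PCIe 2.0 at roughly 500 MB/s usable per lane per direction; check the actual slot generation and width on your motherboard:

```python
import math

FC_8GB_MB_S = 800        # assumed usable throughput per 8 Gb FC port
PCIE2_LANE_MB_S = 500    # assumed usable bandwidth per PCIe 2.0 lane, per direction

def lanes_needed(ports, per_port_mb_s=FC_8GB_MB_S, lane_mb_s=PCIE2_LANE_MB_S):
    """Minimum PCIe 2.0 lanes to carry the peak load of all FC ports on one HBA."""
    return math.ceil(ports * per_port_mb_s / lane_mb_s)

print(lanes_needed(1))   # 2 -> a single-port 8 Gb HBA fits in an x2/x4 slot
print(lanes_needed(2))   # 4 -> a dual-port 8 Gb HBA (16 Gb peak) needs at least x4
```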

Page 53: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

53 Confidential

ESXi: Sample host specification Estimated Hardware Cost: US$ 8K per ESXi. Configuration included in the above price:

• 2x Xeon X5650. The E series has different performance & price attributes.
• 72 GB RAM (18 slots x 4 GB) or 96 GB RAM (12 slots x 8 GB)
• 10x 1 GE ports (no hardware iSCSI)
• 5 year warranty (next business day)
• 2x 50 GB SSD:
  • Swap-to-host-cache feature in ESXi 5
  • Running agent VMs that are IO intensive
  • Could be handy during troubleshooting. Only need 1 disk as it's for troubleshooting purposes.
• Embedded ESXi
• Installation service
• Lights-Out Management. Avoid using WoL; use IPMI or HP iLO.

Costs not yet included:
• FC cards. Add around S$2500 for a pair of single-port HBAs.
• LAN switches. Around S$15K for a pair of 48-port GE switches (total 96 ports).
• SAN switches.

Page 54: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

54 Confidential

Blade or Rack Both are good. Both have pros and cons. The comparison below is relative, not absolute.
• Consult the principal for the specific model. Below is just a guideline.
The comparison below is only for vSphere purposes, not for other use cases such as HPC or non-VMware workloads.

Blade – relative advantages:
• Some blades come with built-in 2x 10 GE ports. To use them, you just need to get a 10 GE switch.
• Flexibility. Some blades virtualise the 10 GE NIC and can slice it. As usual, adding another layer adds complexity.
• Less cabling.
• Better power efficiency. Better rack space efficiency.
• Better cooling efficiency. The larger fan (4 RU) is better than the small fan (2 RU) used in rack servers.
• Some blades can be stateless. The management software can clone 1 ESX to another.
• Better management.

Rack – relative advantages:
• A typical 2RU rack server normally comes with 4 built-in ports.
• Better suited for <20 ESX per site.
• More local storage.

Blade – relative disadvantages:
• Some replacements or major upgrades may require all blades to be powered off.
• Some have limited PCI slots (2 slots). Ensure the number of NIC ports and HBAs can be met.
• Best practice recommends 2 enclosures. The enclosure is passive in some models (it does not contain electronics), so there can be initial cost as each chassis needs to have switches too.
• Ownership of the SAN/LAN switches in the chassis needs to be made clear.
• Need to learn the rules of the chassis/switches. Positioning of the switch matters in some models.
• The common USB port in the enclosure may not be accessible by ESX. Check with the respective blade vendor.
• A USB dongle (which you should not use) can only be mounted in front. Make sure it's short enough that you can still close the rack door.

Rack – relative disadvantages:
• The 1 RU rack server has a very small fan, which is not as good as a larger fan.
• Less suited when each DC is big enough to have 2 chassis.
• Cabling & rewiring.

Page 55: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

55 Confidential

Server Selection All Tier 1 vendors (HP, Dell, IBM, Cisco, etc) make great ESXi hosts.
• Hence the following guidelines are relatively minor compared to the base spec.

Additional guidelines for selecting an ESXi server:
• Does it have Embedded ESXi?
• How much local SSD (capacity and IOPS) can it handle? This is useful for stateless desktop architectures, and when using local SSD as cache or virtual storage.
• Does it have built-in 2x 10 GE ports?
• Does the built-in NIC card have hardware iSCSI capability?
• Memory cost. Most ESXi servers have around 64 – 128 GB of RAM, mostly around 72 GB. With 4 GB DIMMs, that needs a lot of DIMM slots.
• What are the server's unique features for ESXi?
• Management integration. The majority of server vendors have integrated management with vCenter. Most are free. Dell is not free, although it has more features?
• DPM support?

Page 56: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

56 Confidential

SAN Boot 4 methods of ESXi boot:
• Local Compact Flash
• Local Disk
• LAN Boot (PXE) with Auto Deploy
• SAN Boot

For the 3 sample sizes, we use ESXi Embedded.
• Environments with >20 ESXi hosts should consider Auto Deploy.
• Auto Deploy is also good for environments where you need to prove to the security team that your ESXi has not been tampered with (you can simply reboot it and it is back to "normal").

Advantages of Local Disk versus SAN boot:
• No SAN complexity.
• With SAN boot, the boot LUN needs to be labelled properly.

Disadvantages of Local Disk versus SAN boot:
• Need 2 local disks, mirrored.
• Certain organisations do not like local disk.
• A disk is a moving part, with lower MTBF. Diskless hosts save power/cooling.
• SAN boot is a step toward stateless ESXi. An ideal ESXi host is just pure CPU and RAM: no disk, no PCI card, no identity.

Page 57: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

57 Confidential

Storage Design

Page 58: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

58 Confidential

Methodology

• Once mapping is done, turn on QoS if needed.
• Turn on Storage IO Control if a particular VM needs a certain guarantee.
• Turn on Storage IO Control if we want fairness among all VMs within the datastore.
• Storage IO Control is per datastore. If the underlying LUN shares spindles with all the other LUNs, it may not achieve the result. Consult with the storage vendor on this, as they have entire-array visibility/control.

[Flow diagram: SLA -> Datastore -> VM -> Mapping -> QoS]
1. SLA: define the standard (Storage Driven Profile).
2. Datastore: define the datastore profile. See next slide for detail.
3. VM: for each VM, gather capacity (GB), performance (IOPS) requirements, and importance to business (Tier 1, 2, 3).
4. Mapping: map clusters to datastores and map each VM to a datastore. Create another datastore if existing ones are insufficient (either capacity or performance).
5. QoS: turn on Storage IO Control where needed, as described above.

Page 59: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

59 Confidential

SLA: 3 Tier pools of storage Create 3 Tiers of Storage.
• This becomes the type of Storage Pool provided to VMs.
• Paves the way for standardisation.
• Choose 1 size for each Tier and keep it consistent. Choose an easy number (e.g. 800 vs 738).
• Keep 20% free capacity for VM swap files, snapshots, logs, thin volume growth, and Storage vMotion (inter-tier).
• Use Thin Provisioning at the array level, not the ESX level.
• Separate Production and Non-Production.
• "Where can I put this VM without impacting performance?" Always know your datastores.

The example below is based on the Large Cloud case. Again, this is an example, not an instruction.
• Small Cloud will have a simpler and smaller design.
• Replication is to the DR site via array replication, not the same building.
• Snapshot = protected with array-level snapshots for fast restore.
• RAID level does not matter so much if the array has sufficient cache (battery-backed, naturally).
• RDM will be used for data drives of 1 TB; virtual-compatibility mode is used unless the application says otherwise.
• VMDKs larger than 1 TB will be provisioned as RDM, virtual-compatibility mode.

Tier 1: FC; 2000 IOPS; 10 ms latency; RAID 10; RPO 1 hour; RTO 1 hour (with SRM); 1.0 TB datastore; 70% capacity limit; replicated hourly; array snapshot: yes; ~10 VM per datastore; EagerZeroedThick.

Tier 2: FC; 1500 IOPS; 15 ms latency; RAID 5; RPO 4 hours; RTO 4 hours (with SRM); 2.0 TB datastore; 80% capacity limit; replicated 4-hourly; array snapshot: no; ~20 VM per datastore; normal Thick.

Tier 3: FC; 1000 IOPS; 20 ms latency; RAID 5; RPO 8 hours; RTO 8 hours; 3.0 TB datastore; 80% capacity limit; not replicated; array snapshot: no; ~30 VM per datastore; Thin Provisioned.

Consult the storage vendor for array-specific design.
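To turn the tier sizes and capacity limits above into a datastore count, a rough sketch (the required capacity per tier is an assumption you would take from your own inventory):

```python
# Rough sketch: how many datastores a tier needs, given the per-datastore size and
# capacity limit from the table above. Numbers are examples, not a VMware formula.

import math

def datastores_needed(required_tb, datastore_tb, capacity_limit):
    usable_tb = datastore_tb * capacity_limit
    return math.ceil(required_tb / usable_tb)

# Example: 5 TB of Tier-1 VMs on 1.0 TB datastores capped at 70% usage.
print(datastores_needed(required_tb=5, datastore_tb=1.0, capacity_limit=0.70))  # 8
```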

Page 60: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

60 Confidential

SLA: Type of Datastores Not all datastores are equal.

• Always know the underlying IOPS & SLA that the array can provide for a given datastore.

You should always know where to place a VM.
• vSphere 5 introduces datastore clusters and Storage DRS, but this design does not rely on them; a datastore "failure" still does not automatically move VMs to another datastore.
• Always have a mental picture of where your Tier 1 VMs reside. It can't be "somewhere in the cloud".

Types of datastore:
• Business VM
  • Tier 1 VM, Tier 2 VM, Tier 3 VM, Single VM
  • Each Tier may have multiple datastores.
• DMZ VM
  • Mounted only by ESX hosts that have the DMZ network.
• IT VM
• Isolated VM
• Template
• Desktop VM
• SRM Placeholder
• Datastore Heartbeat?
  • Do we dedicate datastores for it?

1 datastore = 1 LUN
• Relative to "1 LUN = many VMFS", it gives better performance due to fewer SCSI reservations.

Page 61: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

61 Confidential

Special Purpose Datastore 1 low cost Datastores for ISO and Templates

• Need 1 per vCenter.
• Need 1 per physical data center, else you will transfer GBs of data across the WAN.
• Around 400 GB.
• ISO directory structure: see below.

1 staging/troubleshooting datastore
• To isolate a VM. Proof to the Apps team that the datastore is not affected by other VMs.
• For storage performance studies or issues. Makes it easier to correlate with data from the array.
• The underlying spindles should have enough IOPS & size for the single VM.
• Our sizing:
  • Small Cloud: 200 GB
  • Large Cloud: 500 GB

1 SRM Placeholder datastore
• So you always know where it is. Sharing with another datastore may confuse others.
• Used in SRM 5 to place the VMs' metadata so they can be seen in vCenter.
• 10 GB is enough. Low performance.

ISO directory structure:
\ISO\
  \OS\Windows
  \OS\Linux
  \Non OS\  (anti-virus, utilities, etc.)

Page 62: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

62 Confidential

Arrangement within an array Below is a sample diagram, showing disk grouping inside an array.
• The array has 48 disks. Hot spares are not shown for simplicity.
• This example only has 1 RAID Group (2+2) for simplicity.

Design considerations
• Datastore 1 and Datastore 2 performance can impact one another, as they share physical spindles. The only way they don't impact each other is if there are "Share" and "Reservation" concepts at the "meta slice" level.
• Datastore 3, 4, 5, 6 performance can impact one another.
• DS 1 and DS 3 can impact each other since they share the same Controller (or SP). This contention happens if the shared component becomes the bottleneck (e.g. cache, RAM, CPU). The only way to prevent this is to implement "Share" or "Reservation" at the SP level.

Page 63: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

63 Confidential

Mapping: Cluster - Datastore Always know which cluster mounts which datastores.
• Keep the diagram simple, without too much information. The idea is to have a mental picture that you can remember.
• If your diagram has too many lines, too many datastores, or too many clusters, then it may be too complex. Create a Pod when that happens. Modularisation can be good.

Page 64: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

64 Confidential

Mapping: Datastore Replication

Page 65: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

65 Confidential

Mapping: Datastore – VM Criteria to use when placing a VM into a Tier:

• How critical is the VM? Importance to business.• What are its performance and availability requirements?• What are its Point-in-Time restoration requirements?• What are its backup requirements?• What are its replication requirements?

Have a document that lists which VM resides on which datastore group.
• Content can be generated using PowerCLI or Orchestrator, which show datastores and their VMs.
• Example tool: Quest PowerGUI.
• While it rarely happens, you can't rule out datastore metadata corruption. When that happens, you want to know which VMs are affected.
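A minimal sketch of such a record, assuming the VM-to-datastore mapping has already been exported (for example with PowerCLI) into a simple dictionary; the datastore and VM names follow the naming convention used later in this document and are purely examples:

```python
# Illustrative sketch of the "which VMs live on which datastore" record mentioned above.
# The mapping is a hand-maintained (or exported) dictionary, not a live query.

DATASTORE_VMS = {
    "PROD_FC_01": ["Intranet WebServer 01", "Intranet WebServer 02"],
    "PROD_FC_02": ["Payroll DB 01"],
    "TEST_iSCSI_01": ["Test App 01"],
}

def affected_vms(datastore):
    """VMs to notify/recover if this datastore's metadata is corrupted."""
    return DATASTORE_VMS.get(datastore, [])

print(affected_vms("PROD_FC_01"))
```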

A VM normally changes tiers throughout its life cycle.
• Criticality is relative and might change for a variety of reasons, including changes in the organization, operational processes, regulatory requirements, disaster planning, and so on.
• Be prepared to do Storage vMotion.
• Always test it first so you know how long it takes in your specific environment.
• VAAI is critical, else the traffic will impact your other VMs.

[Table template: Datastore Group | VM Name | Size (GB) | IOPS. Totals in this example: 12 VM, 1 TB, 1400 IOPS.]

Page 66: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

66 Confidential

Storage Calculation We will split the System Drive and Data Drive.
• Enables changing the OS by swapping the C:\ vmdk file.
• We use 10 GB for C:\ to cater for Win08 and give space for defragmentation.
• We use Thin Provisioning, preferably at the array level.

The sample calculation below is for our Small Cloud example.
• 30 Production VMs: 26 non-DB + 3 DB + 1 File Server
  • Non-DB VM: 100 GB on average
  • DB VM: 500 GB on average
  • File server VM: 2 TB
• 15 Non-Production VMs

Production (non-DB): Average D:\ drive is 100 GB. Space needed: 2.6 TB; datastores: 3 x 1 TB each. IOPS: 100 IOPS x 26 VM = 2600 IOPS. Consult with the storage team if this is too high; what if they have a similar peak period? This is on the high side, so we don't have to add buffer for swap files, snapshots, or VMFS/NFS buffer.

Production (DB): Average D:\ drive is 500 GB. Space needed: 1.5 TB; tiering within the Production datastores. IOPS: 500 IOPS x 3 VM = 1500 IOPS. This is on the high side.

IT Cluster: Average D:\ drive is 50 GB; the File Server is 2 TB. IOPS: 100 IOPS per VM, 300 IOPS for the File Server. This is on the high side.

Non-Production: Average D:\ drive is 100 GB. Space needed: 1.5 TB; datastores: 2 x 1 TB. IOPS: 100 IOPS x 15 VM. This is on the high side.

Isolated VM: 200 GB, 200 IOPS.

Total: ~6.1 TB, ~6000 IOPS.
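The aggregation behind the table is trivial, but worth keeping in a script so it can be re-run when VM counts change; a sketch using only the rows quantified above (add your own IT-cluster figures to approach the ~6.1 TB / ~6000 IOPS totals):

```python
# Sketch of the aggregation behind the table above. Only rows with explicit figures on
# this slide are included; the IT cluster would be added from your own inventory.

rows = [                      # (label, capacity in TB, IOPS)
    ("Production non-DB", 2.6, 26 * 100),
    ("Production DB",     1.5, 3 * 500),
    ("Non-Production",    1.5, 15 * 100),
    ("Isolated VM",       0.2, 200),
]

print(f"Subtotal: {sum(tb for _, tb, _ in rows):.1f} TB, "
      f"{sum(iops for _, _, iops in rows)} IOPS")   # 5.8 TB, 5800 IOPS
```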

Page 67: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

67 Confidential

RDM Physical RDM
• Can't take snapshots.
• No Storage vMotion, but vMotion works.
• Physical mode specifies minimal SCSI virtualization of the mapped device, allowing the greatest flexibility for SAN management software.
• VMkernel passes all SCSI commands to the device, with one exception: the REPORT LUNs command is virtualized so that the VMkernel can isolate the LUN to the owning virtual machine.

Virtual RDM
• Specifies full virtualization of the mapped device. Features like snapshots work.
• VMkernel sends only READ and WRITE to the mapped device. The mapped device appears to the guest operating system exactly the same as a virtual disk file in a VMFS volume. The real hardware characteristics are hidden.

Page 68: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

68 Confidential

Multi-Pathing Different protocols have different technologies.
• NFS, iSCSI and FC all have different solutions.
• NFS uses a single path for a given datastore; there is no multi-pathing. So use multiple datastores to spread the load.

In this design, I do not go for a high-end array due to cost.
• A high-end array gives Active/Active, so we don't have to do regular load balancing.
• Most mid-range arrays are Active/Passive. Always ensure the LUNs are balanced between the 2 SPs. This is done manually within the array.

Choose an ALUA array instead of a plain Active/Passive one.
• Less manual work on balancing and selecting the optimal path.
• Both controllers can receive IO requests/commands, although only 1 owns the LUN. The path from the managing controller is the optimized path.
• Better utilization of the array storage processors (minimizes unnecessary SP failover).
• vSphere will show both paths as Active, but the preferred one is marked "Active (I/O)".
• Round Robin will issue IO across all optimized paths and will use non-optimized paths only if no optimized paths are available.
• See http://www.yellow-bricks.com/2009/09/29/whats-that-alua-exactly/

Array type: my selection
• Active/Active: Round Robin or Fixed
• ALUA: Round Robin or MRU
• Active/Passive: Round Robin or MRU
• EMC: PowerPath/VE 5.4 SP2
• Dell EqualLogic: EqualLogic MMP
• HP/HDS: PowerPath/VE 5.4 SP2?

Page 69: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

69 Confidential

FC: Multi-pathing VMware recommends minimum 4 paths

• A path is point-to-point. The switch in the middle is not part of the path as far as vSphere is concerned.
• Ideally, they are all active-active for a given datastore.
• Fixed means 1 path active, 3 idle.
• 1 zone per HBA port. The zone should see all the target ports.

If you are buying new SAN switches, consider the direction for the next 3 years.
• Whatever you choose will likely be in your data center for the next 5 years.
• If you are buying a Director-class switch, plan for the next 5 years. Upgrading a Director is major work, so plan for 5 years of usage. Consider both the EOL and EOSL dates.
• Discuss with SAN switch vendors and understand their roadmap.
• 8 Gb and FCoE are becoming common.

Round Robin
• It is per datastore, not per HBA.
  • 1 ESX host typically has multiple datastores.
  • 1 array certainly has multiple datastores.
  • All these datastores share the same SPs, cache, ports, and possibly spindles.
• It is active/passive for a given datastore.
• Leave the default setting of 1000; no need to set iooperationslimit=1.
• Choose this over MRU. MRU needs a manual fail-back after a path failure.

Page 70: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

70 Confidential

EMC PowerPath/VE Use EMC Power-Path if possible

• It works with non-EMC storage too. • It costs extra. But no longer needs Enterprise Plus since vSphere 4.1

Advantages of EMC Powerpath over VMware native multi-pathing• Higher throughput

• See http://www.emc.com/collateral/software/white-papers/h6533-performance-optimization-vmware-powerpath-ve-wp.pdf• It provides intelligent path routing with optimized load balancing policies for EMC and non-EMC arrays

• With Symmetrix Optimization and CLARiiON Optimization policies, it considers the number of I/Os, size of I/Os, type of I/Os, queuing delay, latency of recent I/Os, and throughput of I/Os when making a routing decision

• There are eight policies in total• It does path testing of idle and dead paths

• Path testing is load based. A busier system with more I/O completions requires less testing.
• Testing idle paths to a device ensures I/O success when the application begins driving I/O.
• Testing dead paths returns them to use quickly.
• It responds to decaying paths (paths that have not entirely failed) by redistributing some of the I/O to healthier paths.
• It provides path failover by proactively shifting I/O load to known viable paths for hosts sharing a common path when a failure occurs.
• It recognizes the array type and configures the optimized load balancing and failover policies.
• It provides advanced path management and monitoring:
  • Integration with vCenter Storage Viewer, providing path state details.
  • Centralized management using a remote version of the powermt CLI.
  • Path latency monitoring measures the I/O completion time on all HBA paths and provides a user-settable threshold monitor. This aids storage administrators in determining whether an application issue is a result of failing SAN paths.

Things to note• Can be installed without impacting VM (need vMotion)• See http://www.emc.com/collateral/software/white-papers/h6340-powerpath-ve-for-vmware-vsphere-wp.pdf

Page 71: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

71 Confidential

Comparison

8 paths to LUN 0. PowerPath uses all 8; Round Robin uses 4; MRU uses 1; Fixed uses 1.

Page 72: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

72 Confidential

FC: Zoning & Masking Implement zoning

• Do it before going live, or during a quiet maintenance window, due to the high risk potential.
• 1 zone per HBA port.
  • 1 HBA port does not need to know about the existence of others. This eliminates Registered State Change Notifications (RSCN).
• Use hard zoning, not soft zoning.
  • Hard zone: zone based on the SAN switch port. Any HBA connected to this switch port gets this zone. Be careful when recabling things into the SAN switch!
  • Soft zone: zone based on the HBA port. The switch port is irrelevant.
  • Situations that need rezoning with soft zones: changing the HBA, replacing the ESX server (which comes with a new HBA), upgrading the HBA.
  • Situations that need rezoning with hard zones: reassigning the ESX host to another zone, port failure in the SAN switch.
• Virtual HBAs can further reduce cost and offer more flexibility.

Implement LUN Masking
• Complements zoning. Do it at the array level, not the ESX level.
• Mask on the array, not on each ESXi host.

Page 73: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

73 Confidential

FC: Zoning & Masking

See the figure; there are 3 zones.
• Zone A has 1 initiator and 1 target.
• Zone B has two initiators and targets. This is bad.
• Zone C has 1 initiator and 1 target.
• Both SAN switches are connected via an Inter-Switch Link.
• If Host X reboots and its HBA in Zone B logs out of the SAN, an RSCN will be sent to Host Y's initiator in Zone B and cause all I/O going to that initiator to halt momentarily, recovering within seconds.
• Another RSCN will be sent to Host Y's initiator in Zone B when Host X's HBA logs back in to the SAN, causing another momentary halt in I/O.
• Initiators in Zone A and Zone C are protected from these events because there are no other initiators in those zones.
• Most recent SAN switches provide RSCN suppression methods. But suppressing RSCNs is not recommended, since RSCNs are the primary way for initiators to determine that an event has occurred and to act on it, such as loss of access to targets.

Page 74: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

74 Confidential

NFS/iSCSI with cross-stack EtherChannel: Multiple storage IPs are required. The ESX host requires only one VMkernel port. Simpler.

Page 75: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

75 Confidential

NFS/iSCSI without cross-stack EtherChannel: Multiple storage and ESX IPs are required. ESX requires two VMkernel ports, each on a different subnet. The storage node requires an IP on each subnet.

Page 76: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

76 Confidential

[Decision tree: does the physical switch support multi-switch link aggregation?
• Yes: use one VMkernel port & IP subnet, with multiple links and IP-hash load balancing on the NFS client (ESX) and on the NFS server (array). The storage needs multiple sequential IP addresses.
• No: use multiple VMkernel ports & IP subnets and the ESX routing table. The storage needs multiple sequential IP addresses.]

Page 77: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

77 Confidential

Sample Array Specification Small Cloud. Using NetApp: 5 TB usable• FAS2050A Unified Storage, Dual Controllers, 4GB Cache• 20 Internal x 450GB 15K RPM SAS Drives (4U)

• 9 TB raw (Base10), 5 TB usable (Base2) with 2 hot spares• 4TB usable assuming 20% of capacity used for snapshots • 2 x (7D+2P) RAID-DP with 2 hot spares• Scalable to 104 hard drives with 6 x 14 disk drive shelves• Up to 500 Volumes, 1024 LUNs, 32 FC connected ESX servers • ESX Host Utilities Kit x No. of ESX servers• FC, iSCSI, NFS, CIFS • SnapRestore, NearStore, FAS Deduplication• Operations Manager

Medium Cloud. Using NetApp: 16.5 TB usable• FAS3140A Unified Storage, Dual Controllers, 8GB Cache• 4 shelves x 14 x 450GB 15K RPM FC Drives (18U)

• 25.2 TB raw (Base10), 16.5 TB usable (Base2) with 2 hot spares• 13.2TB usable assuming 20% of capacity used for snapshots • 2 x (14D+2P) + 2 x (9D+2P) RAID-DP with 2 hot spares• Scalable to 420 hard drives with 30 x 14 disk drive shelves• Up to 500 Volumes, 2048 LUN, 256 FC connected ESX servers (recommended up to 64)• ESX Host Utilities Kit x No. of ESX servers• FC, iSCSI, NFS, CIFS • SnapRestore, NearStore, FAS Deduplication• Operations Manager

Page 78: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

78 Confidential

Large Cloud: Reasons for FC (partial list)
• A network issue does not create a storage issue.
• Troubleshooting storage does not mean troubleshooting the network too.
• 8 Gb vs 1 Gb; 16 Gb vs 2 Gb in redundant mode.
  • 10 GE is still expensive and needs the uplinks to change too.
  • HP or Cisco blades may provide a good alternative here.

Consider the total TCO and not just cost per box.

FC vs IP
• The FC protocol is more efficient & scalable than the IP protocol for storage.
• Path failover is <30 seconds, compared with <60 seconds for iSCSI.

Lower CPU cost
• See the chart. FC has the lowest CPU hit to process IO, followed by hardware iSCSI.

Storage vMotion
• You can estimate the time taken to move 100 GB over a 1 Gb path…
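A back-of-envelope version of that estimate, assuming a single 1 Gb/s path and ignoring array-side work; the 70% efficiency figure is an assumption, not a measurement:

```python
# Rough estimate for the Storage vMotion point above: time to copy 100 GB over a
# single 1 Gb/s path, ignoring protocol overhead and array-side work.

def transfer_minutes(size_gb, link_gbps, efficiency=1.0):
    seconds = (size_gb * 8) / (link_gbps * efficiency)
    return seconds / 60

print(f"{transfer_minutes(100, 1):.0f} min at line rate")                       # ~13 min
print(f"{transfer_minutes(100, 1, efficiency=0.7):.0f} min at 70% efficiency")  # ~19 min
```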

FC considerations
• Need SAN skills: troubleshooting skills, not just Install/Configure/Manage.
• Need to be aware of WWWWW. This can impact upgrades later on, as a new component may not work with an older component.

[Chart: relative CPU cost per I/O for NFS, software iSCSI, hardware iSCSI and FC, on ESX 3.5 vs ESX 4.0. FC has the lowest CPU cost per I/O, followed by hardware iSCSI.]

Page 79: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

79 Confidential

VMware Data Recovery We assume the following requirements:
• Back up to an external array, not the same array.
  • The external array can be used for other purposes too, so the 2 arrays back each other up.
  • How do we ensure write performance, as the array is shared?
• 1x a day backup. No need for multiple backups per day of the same VM.

Considerations
• Bandwidth: need a dedicated NIC to the Data Recovery VM.
• Performance: need to reserve CPU/RAM for the VM?
• Group like VMs together. It maximises dedupe.
• Destination: an RDM LUN presented via iSCSI to the appliance. See picture below (hard disk 2).
  • Not using VMDK format, to enable LUN-level operations.
  • Not using CIFS/SMB, as the Deduplication Store is 0.5 TB vs 1 TB on RDM/VMDK.
• Space calculation: need to find a tool to help estimate the disk requirements.

Page 80: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

80 Confidential

Partition alignment Affects every protocol, and every storage array

• VMFS on iSCSI, FC, & FCoE LUNs• NFS• VMDKs & RDMs with NTFS, EXT3, etc

VMware VMFS partitions that align to 64KB track boundaries give reduced latency and increased throughput• Check with storage vendor if there are any recommendations to follow. • If no recommendations are made, use a starting block that is a multiple of 8 KB.

Responsibility of Storage Team.• Not vSphere Team

On NetApp :• VMFS Partitions automatically aligned. Starting block in multiples of 4k• MBRscan and MBRalign tools available to detect and correct misalignment
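A tiny sketch of the alignment check, assuming you are only verifying that a partition's starting offset falls on the recommended boundary (8 KB as the generic guidance above, 4 KB on NetApp):

```python
# Small sketch of the alignment rule above: is a partition's starting offset a multiple
# of the recommended boundary?

def is_aligned(start_offset_bytes, boundary_kb=8):
    return start_offset_bytes % (boundary_kb * 1024) == 0

print(is_aligned(1048576))   # 1 MiB starting offset (vSphere 5 default) -> True
print(is_aligned(32256))     # legacy 63-sector MBR offset -> False
```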

[Diagram: alignment of guest file system clusters (FS: 4 KB – 1 MB), VMFS blocks (1 MB – 8 MB) and array chunks (4 KB – 64 KB).]

Page 81: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

81 Confidential

Tools: Array-specific integration The example below is from NetApp. Other Storage partners have integration capability too. Always check with respective product vendor for latest information.

Page 82: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

82 Confidential

Tools: Array-specific integration Management of the array can be done from the vSphere client. Below is from NetApp. Ensure storage access is not accidentally given to the vSphere admin, by using RBAC.

Page 83: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

83 Confidential

Backup with VADP This applies to the Large Cloud in our case, as it uses FC. 1 backup job per ESX host, so the impact to production is minimized. A holding tank is no longer required.

Page 84: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

84 Confidential

Backup Server A backup server is an "I/O machine".
• By far, the majority of the work done is I/O related.
• Performance of disk is key.
• A fast internal bus is key; multiple internal buses are desirable.
• No shared path: 1 port from ESX (source) and 1 port to tape (target).
• Lots of data in from clients and out to disk or tape.
• Not much CPU usage. 1 socket of 4-core Xeon 5600 is more than sufficient.
• Not much RAM usage. 4 GB is more than enough.

But deduplication uses CPU and RAM.
• Deduplication relies on CPU to compare segments (or blocks) of data to determine if they have been previously backed up or if they are unique.
• This comparison is done in RAM. Consider 32 GB RAM (64-bit Windows).

Size the concurrency properly.
• Too many simultaneous backups can actually slow the overall backup speed.
• Use backup policy to control the number of backups that occur against any one datastore. This minimizes the I/O impact on the datastore, as it must still serve production usage.

2 ways of backing up:
• Mount the VMDK file as a virtual disk (with a drive letter). The backup software can then browse the directory.
• Mount the VM as an image file.
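One way to apply the concurrency point above is to schedule backup jobs in waves so that no datastore serves more than N simultaneous jobs; the sketch below is plain scheduling logic, not a backup product's API, and the VM and datastore names are examples:

```python
# Sketch of the concurrency point above: cap the number of simultaneous backup jobs
# per datastore by scheduling VMs in waves.

from collections import defaultdict
from itertools import zip_longest

def backup_waves(vm_to_datastore, max_per_datastore=2):
    per_ds = defaultdict(list)
    for vm, ds in vm_to_datastore.items():
        per_ds[ds].append(vm)
    # chunk each datastore's VM list, then merge chunk i of every datastore into wave i
    chunks = [[vms[i:i + max_per_datastore] for i in range(0, len(vms), max_per_datastore)]
              for vms in per_ds.values()]
    return [sum((c for c in wave if c), []) for wave in zip_longest(*chunks, fillvalue=[])]

vms = {"web01": "PROD_FC_01", "web02": "PROD_FC_01", "db01": "PROD_FC_01",
       "app01": "PROD_FC_02"}
print(backup_waves(vms))   # [['web01', 'web02', 'app01'], ['db01']]
```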

Page 85: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

85 Confidential

Alternative Backup Method VMware ecosystem may provide new way of doing back up.

• Example below is from NetApp

NetApp SnapManager for Virtual Infrastructure (SMVI)
• In the Large Cloud, the SMVI server should sit on a separate VM from vCenter.
  • While it has no performance requirement, it is best from a segregation-of-duty point of view.
  • Best practice is to keep vCenter clean & simple. vCenter plays a much more critical role in larger environments where plug-ins rely on vCenter uptime.
• Allows for consistent array snapshots & replication.
• Combine with other SnapManager products (SM for Exchange, SM for Oracle, etc) for application consistency.
  • Exchange and SQL work with VMDK.
  • Oracle, SharePoint and SAP require RDM.
• Can be combined with SnapVault for vaulting to disk.
• 3 levels of data protection:
  • On-disk array snapshots for fast backup (seconds) & recovery (up to 255 snapshot copies of any datastore can be kept with no performance impact).
  • Vaulting to a separate array for better protection, slightly slower recovery.
  • SnapMirror to offsite for DR purposes.
• Serves to minimize the backup window (and the time the vmdk is frozen when changes are applied).
• Option to not create a VM snapshot, creating crash-consistent array snapshots.

Page 86: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

86 Confidential

Network Design

Page 87: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

87 Confidential

Methodology
• Define how many VLANs you need.
• Decide if you will use 10 GE or 1 GE. If you use 10 GE, define how you will use Network IO Control.
• Decide if you use IP storage or FC storage.
• Decide which vSwitch to use: local (standard), distributed, or Nexus.
• Decide when to use Load Based Teaming.
• Select blade or rack mount. This has an impact on NIC ports and switches.
• Define the detailed design with the vendor.

Page 88: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

88 Confidential

Network Architecture

Page 89: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

89 Confidential

Small Cloud: ESXi Network configuration

Page 90: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

90 Confidential

Network At least 5 vSwitches or dvPortgroups per ESX host.

vSwitch/DVUplink 1 (2-4 ports). Function: VM – Production, VM – Non Production, VM – Admin Network, VM – Infra VM. VLAN: Yes, if the VMs need to be on separate VLANs. A good rule of thumb is ~8 VMs per Gigabit. The Admin Network is used for basic network services like DNS and AD servers; use vShield App to separate it from Production; complement existing VLANs, no need to create more. The Infra VMs are not connected to the Production LAN; they are connected to the Management LAN.

vSwitch/DVUplink 3 (2 ports). Function: Management LAN – VMware Management, VMware Cluster Heartbeat, Cisco Nexus Management, Cisco Nexus Control, Cisco Nexus Packet. VLAN: Yes, needs to be a VLAN. In some cases the Nexus Control & Nexus Packet traffic needs to be physically separated from Nexus Management.

vSwitch/DVUplink 4 (2 ports). Function: vMotion. VLAN: No; non-routable, private network.

vSwitch/DVUplink 5 (2 ports). Function: Fault Tolerance. VLAN: No; non-routable, private network.

vSwitch/DVUplink 6 (1 port). Function: VM – Troubleshooting. VLAN: Maybe; same as Production. Used when we need to isolate networking performance.

vSwitch/DVUplink 7 (2 ports). Function: Host-Based Replication. VLAN: Yes. Only for ESXi hosts assigned to do vSphere Replication. From a throughput point of view, if the inter-site link is only 1 GE, then you only need 1 GE max.
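The port counts above can be sanity-checked with the ~8 VMs per Gigabit rule of thumb; a sketch, where the fixed port counts are taken from the table and the VM count is an assumption:

```python
# Sketch of the "~8 VMs per Gigabit" rule of thumb from the table above, plus the fixed
# ports listed there (management, vMotion, FT, troubleshooting, host-based replication).

import math

def ge_ports_needed(vm_count, vms_per_ge=8,
                    fixed_ports=2 + 2 + 2 + 1 + 2):      # mgmt, vMotion, FT, tshoot, HBR
    vm_ports = max(2, math.ceil(vm_count / vms_per_ge))  # at least 2 for redundancy
    return vm_ports + fixed_ports

print(ge_ports_needed(30))   # 30 VMs -> 4 VM-facing ports + 9 fixed = 13
```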

Page 91: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

91 Confidential

Explanation The diagram shows why we need 8 - 12 GE ports.
• Alternatively, we can also use 2x 10 GE ports. This gives higher performance.
• Additional NIC ports can be deployed if VMs need > 2 Gb. Choose a server with sufficient port expansion.
• 2x 10 GE ports give flexibility & higher throughput.

Future scalability
• Think 1-2 years ahead when deciding on NIC ports. You may need to give more network bandwidth to VMs as you run more VMs, or run network-demanding VMs.
• Once wired, it is hard (and expensive) to rewire. All cables are already connected and labelled properly to each physical switch port.
• Blades need more planning. If all the PCI slots are occupied, then there may be no option to expand.

The diagram does not include vShield App yet.
• It adds 1 hidden vSwitch per vSwitch serving VMs. The management network does not require vShield Zone protection.

Reasons for isolation:
• Availability, performance
• Security
• If you use vShield App, then physical isolation is less of a concern. Port Group Isolation needs a Distributed Switch.

Use of Jumbo Frames is recommended for best vMotion performance.
• The physical switch must support Jumbo Frames too.
• Jumbo Frames add complexity to network design and maintenance over time.
• Performance gains are marginal with common block sizes (4KB, 8KB, 32KB); vMotion uses a large block size.
• Note: IEEE 802 standards do not recognize jumbo frames.

Page 92: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

92 Confidential

Network Settings Load-Based Teaming

• We will not use it, as we are using 1 GE in this design.
• If you use 10 GE, the default settings are a good starting point. They give VMs 2x the share versus the hypervisor.

NIC Teaming
• If the physical switch can support it, then use IP-Hash.
  • Needs a stacked switch; basically, switches that can be managed as if they were 1 bigger switch. Multi-chassis EtherChannel switch is another name.
  • IP-Hash does not help if the source and destination addresses are constant. For example, vMotion always uses 1 path only, as the source-destination pair is constant. The connection from the VMkernel to an NFS server is also constant.
• If the physical switch can't support it, then use Source Port.
  • You need to manually balance this, so that not all VMs go via the same port.

VLAN
• We are using VST. The physical switch must support VLAN trunking.

PVLAN
• Not used in this design. Most physical switches are PVLAN aware already.
• Packets will be dropped, or security can be compromised, if the physical switch is not PVLAN aware.

Beacon Probing
• Not enabled, as my design only has 2 NICs per vSwitch. ESXi will flood both NICs if it only has 2.

Review default settings
• Change Forged Transmits to Reject.
• Change MAC Address Changes to Reject.

Page 93: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

93 Confidential

VLAN Native VLAN

• Native VLAN means the switch can receive and transmit untagged packets.
• VLAN hopping occurs when an attacker with authorized access to one VLAN creates packets that trick physical switches into transmitting the packets to another VLAN that the attacker is not authorized to access. The attacker forms an ISL or 802.1Q trunk port to the switch by spoofing DTP messages, getting access to all VLANs; or the attacker can send double-tagged 802.1Q packets to hop from one VLAN to another, sending traffic to a station it would otherwise not be able to reach.
• This vulnerability usually results from a switch being misconfigured for native VLAN, as it can then receive untagged packets.

Local vSwitches do not support native VLAN. The Distributed vSwitch does.
• All data passed on these switches is appropriately tagged. However, because physical switches in the network might be configured for native VLAN, VLANs configured with standard switches can still be vulnerable to VLAN hopping.
• If you plan to use VLANs to enforce network security, disable the native VLAN feature for all switches unless you have a compelling reason to operate some of your VLANs in native mode. If you must use native VLAN, see your switch vendor's configuration guidelines for this feature.

VLAN 0: the port group can see only untagged (non-VLAN) traffic. VLAN 4095: the port group can see traffic on any VLAN while leaving the VLAN tags intact.

Page 94: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

94 Confidential

Distributed Switch Design consideration

• Version upgrade• ?? Upgrade procedure

Page 95: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

95 Confidential

ESXi: Network configuration with HP Flex-10 If your server is an HP blade, then consider HP Flex-10.
• It gives 2x 10 GE with no extra card.
• You get "free" 10 Gb within one chassis (up to 4 enclosures if you stack the enclosures). This becomes useful for vMotion/FT/Storage without spending additional money on a 10 Gb switch.
• External communication can still be 1 GE. Not an issue if most communication is within VMware.
• 1x 10 GE port can be split into 4 ports. See the screen below.
• No changes on vSphere, as it's at the BIOS level.
• It supports TCP offloading (TSO/LSO).

With 10 GE per physical card:
• 2.0 – 2.5 Gb: management + heartbeat + vMotion
• 2.0 – 3.0 Gb: VM
• 0.7 – 1.5 Gb: Fault Tolerance
• 3.0 – 5.0 Gb: NFS

Things you need to know:
• Complexity with VST VLANs
• Extra cost

Once you decide it's HP, discuss the details with HP.

Page 96: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

96 Confidential

ESXi: Network configuration with UCS If you are using Cisco UCS blade• 2x 10G or 4x 10G depending on blade model and mezzanine card

All mezzanine card models support FCoE• Unified I/O • Low Latency

The Cisco Virtualized Adapter (VIC) supports• Multiple virtual adapters per physical adapter• Ethernet & FC on the same adapter• Up to 128 virtual adapters (vNICs)• High Performance 500K IOPS• Ideal for FC, iSCSI and NFS

Once you decide it’s Cisco,discuss the detail with Cisco.

Page 97: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

97 Confidential

VM Design

Page 98: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

98 Confidential

Standard VM sizing: Follow McDonald's 1 VM = 1 App = 1 purpose. No bundling of services.
• Having multiple applications or services in 1 OS tends to create more problems. The Apps team knows this better.

Start with the Small size, especially for CPU & RAM.
• Use as few virtual CPUs (vCPUs) as possible.
  • CPU count impacts the scheduler, hence performance.
  • It is hard to take vCPUs back once you give them. Also, the app might be configured to match the processor count (you will not know unless you ask the application team).
  • Maintaining a consistent memory view among multiple vCPUs consumes resources.
  • There is a licensing impact if you assign more CPUs. vSphere 4.1 multi-core can help (always verify with the ISV).
  • Virtual CPUs that are not used still consume timer interrupts and execute the idle loop of the guest OS.
  • In the physical world, CPU tends to be oversized. Right-size it in the virtual world.
• RAM
  • RAM starts with 1 GB, not 512 MB. Patches can be large (330 MB for XP SP3) and need RAM.
  • Size impacts vMotion, ballooning, etc, so you want to trim the fat.
  • Tier 1 Cluster should use Large Pages.
• Anything above XL needs to be discussed case by case. Utilise Hot Add to start small (needs the DC edition).
• See speaker notes for more info.

Item: Small VM / Medium VM / Large VM / Custom
CPU: 1 / 2 / 3 / 4 – 8
RAM: 1 GB / 2 GB / 4 GB / 8, 12, 16 GB, etc.
Disk: 50 GB / 100 GB / 200 GB / 300, 400 GB, etc.

Page 99: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

99 Confidential

SMP and UP HAL Design Principle• Going from 1 vCPU to many is ok.

• Windows XP and Windows Server 2003 automatically upgrade to the ACPI Multiprocessor HAL• Going from many to 1 is not ok.

To change from 1 vCPU to 2 vCPUs:
• Must change the kernel to the SMP HAL.
• "In Windows 2000, you can change to any listed HAL type. However, if you select an incorrect HAL, the computer may not start correctly. Therefore, only compatible HALs are listed in Windows Server 2003 and Windows XP. If you run a multiprocessor HAL with only a single processor installed, the computer typically works as expected, and there is little or no effect on performance."
• http://support.microsoft.com/default.aspx?scid=kb;EN-US;811366
• Steps to change: http://support.microsoft.com/kb/237556/

To change from many vCPUs to 1:
• The steps are simple, but MS recommends a reinstall.
• "In this scenario, an easier solution is to create the image on the ACPI Uniprocessor computer."
• http://kb.vmware.com/kb/1003978
• http://support.microsoft.com/kb/309283

Page 100: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

100 Confidential

Infrastructure VM

Purpose / CPU / RAM / Remarks

• Admin Client (Win 7 32-bit), for higher security – 1 vCPU, 2 GB RAM. Dedicated for vSphere management/administration purposes. The vSphere Client has plug-ins, so it's more convenient to have a ready-made client. Higher security than a typical administrator's personal notebook/desktop, which serves many other purposes (email, internet browsing, MS Office, iTunes, etc). Can be placed in the Management LAN; from your laptop, do an RDP jump to this VM. Suitable for SSLF. Useful when covering during leave, etc, but do not use a shared ID. Software installed: Microsoft PowerShell (no need to install the CLI as it's in vMA), VMware Orchestrator.

• vCenter (Win08 R2 64-bit Enterprise Edition) – 2 vCPU, 4 GB RAM. 1 vCPU is not sufficient. 2 vCPU, 4 GB RAM and a 5 GB data drive is enough for 50 ESX hosts and 500 VMs; no need to over-allocate, especially on vCPU and RAM. Ensure MS IIS is removed prior to vCenter installation. ~1.5 MB of RAM per VM and ~3 MB of RAM per managed host. Avoid installing vCenter on a Domain Controller, but deploy it on a system that is part of the AD domain; this facilitates security and flexibility in setting up vCenter roles, permissions and DB authentication.

• IT Database Server (Win08 R2 64-bit Enterprise Edition) – 2 vCPU, 4 GB RAM. SQL Server 2005 64-bit. See next slide. Needs to be planned carefully.

• IT Database Server (Win08 R2 64-bit Enterprise Edition) – 2 vCPU, 4 GB RAM. SQL Server 2008 64-bit. See next slide. Needs to be planned carefully.

• Update Manager (Win08 R2 64-bit?) – 1 vCPU, 4 GB RAM. 50 GB of D:\ drive for the patch store is sufficient. Use Thin Provisioning. See the "VMware Update Manager Performance Best Practices" VMworld session.

• vShield – 1 vCPU, 1 GB RAM. Tier 1, as traffic goes through it. 1 per ESXi host vSwitch (serving VMs, not VMkernel).

• vShield Manager – 1 vCPU, 1 GB RAM. Management console only.

• Patch Management Server – 1 vCPU, 4 GB RAM. I'm assuming the client has the tool in place and wants to continue using it.

Page 101: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

101 Confidential

Infrastructure VM

Purpose / CPU / RAM / Resource Pool Tier / Remarks

• vMA – 1 vCPU, 1 GB RAM, RP Tier 3. Management console only.

• SRM 4.1 – 1 vCPU, 2 GB RAM, RP Tier 2. Recommended to keep separate from vCenter.

• Converter – 1 vCPU, 2 GB RAM, RP Tier 1. If possible, do not run it in the Production Cluster, so it does not impact ESX utilisation. Not set to Tier 3, as you want the conversion process to be completed as soon as possible.

• vShield Security VM from partner – 2 vCPU, 2 GB RAM, RP Tier 1. Tier 1 as it's in the data path. 1 per ESXi host.

• Cisco Nexus VSM (if you use Nexus) – 1 vCPU, 2 GB RAM, RP Tier 3. Management console only, not in the data path. Requires 100% reservation, so this impacts the cluster Slot Size.

• Cisco Nexus VSM (HA) – 1 vCPU, 2 GB RAM, RP Tier 3. The HA is managed by Cisco Nexus itself, not by VMware.

Database Bit Upd Mgr SRM CapIQ 1.5 vCenter Orchestrator View 4.5SQL Server 2005 Std Ed SP2 (not SP3) 32 bit Yes Yes Yes Yes Need SP3

SQL Server 2005 Ent Ed SP2 (not SP3) 64 bit Yes Yes Yes Yes Need SP3

SQL Server 2008 Std Ed (not SP1) 64 bit No Yes Yes Yes Need SP1

SQL Server 2008 Ent Ed (not SP1) 64 bit Yes Yes Yes Yes Need SP1

Oracle 10g Enterprise Edition, R2 64 bit Yes Yes Yes Yes Yes

Oracle 11g Standard Edition, R1 (not R2) 32 bit Yes Yes Yes Yes Yes

Page 102: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

102 Confidential

Time Keeping and Time Drift Critical to have the same time for all ESX and VM. All VM & ESX to get time from the same 1 internal NTP server• Synchronize the NTP Server with an external stratum 1 time source

The Internal NTP server to get time from a reliable external server or real atomic clock• Should be 2 sources

Do not virtualise the NTP server• As a VM, it may experience time drift if ESXi host is under resource constraint

Physical candidates for NTP Server:• Back up server (with vStorage API for Data Protection)• Cisco switch

See MS AD slide for specific MS AD specific impact.

Page 103: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

103 Confidential

MS Windows: Standardisation The Datacenter edition is cheaper with >6 VMs per box. MS licensing is complex.
• The table below may not apply in your case.

Source: http://www.microsoft.com/windowsserver2008/en/us/hyperv-calculators.aspx

• Standard edition: per VM. 10 VMs means 10 licences.
• Enterprise edition: per 4 VMs. 10 VMs means 3 licences.
• Datacenter edition: per socket. 2 sockets means 2 licences. Unlimited VMs per box.
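A sketch comparing the licence counts implied by the three models above (Windows Server 2008 rules as quoted on this slide; verify against your own agreement before using it for costing):

```python
# Sketch of the licence-count comparison described above (per-VM vs per-4-VM vs per-socket),
# using the rules as quoted on this slide. Replace with your own agreement's terms.

import math

def licences(vm_count, sockets_per_host):
    return {
        "Standard (per VM)":       vm_count,
        "Enterprise (per 4 VMs)":  math.ceil(vm_count / 4),
        "Datacenter (per socket)": sockets_per_host,
    }

print(licences(vm_count=10, sockets_per_host=2))
# {'Standard (per VM)': 10, 'Enterprise (per 4 VMs)': 3, 'Datacenter (per socket)': 2}
```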

Page 104: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

104 Confidential

Design: Miscellaneous. Other things that don't fit.

Page 105: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

105 Confidential


Separation of Duties with vSphere VMware Admin >< AD Admin

• In small setup, it’s the same person doing both.• AD Admin has access to NTFS. This can be too powerful if it has confidential data

Segregate the virtual world.
• Split vSphere access into 3: Storage, Server, Network.
• Give Network access to the Network team.
• Give Storage access to the Storage team.
• A role with all access to vSphere should be rarely used.
• VM owners can be given some access that they don't have in the physical world. They will like the empowerment (self service).

[Diagram: separation of duties. Enterprise IT space: MS AD Admin, Storage Admin, Network Admin, DBA, Apps Admin. vSphere space: VMware Admin, Server Admin, Storage Admin, Operators and VM Owners.]

Page 106: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

106 Confidential

Folder Use it properly.
• Do not use Resource Pools to organise VMs.
• Caveat: the Hosts/Clusters view is the only view where you can see both ESX hosts and VMs.

Study the hierarchy on the right.
• It is folders everywhere.
• Folders are the way to limit access.
• Certain objects don't have their own access control; they rely on folders. E.g. you cannot set permissions directly on a vNetwork Distributed Switch. To set permissions, create a folder on top of it.

Page 107: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

107 Confidential

Storage related access Server Admin should not have the following access

• Initiate Storage vMotion• Rename or Move Datastore• Create • Low level file operations

Page 108: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

108 Confidential

Network related access Server Admin should not have the following access

• Move network• This can be a security concern

• Configure network• Remove network

Server Admin should have• Assign network

• To assign a network to a VM

Page 109: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

109 Confidential

Roles and Groups Create new groups for vCenter Server users.

• Avoid using MS AD built-in groups or other existing groups.
• Do not use the default user "Administrator" for any operation.
  • Each vCenter plug-in should have its own user, so you can differentiate among the plug-ins.
  • Disable the default user "Administrator".
  • Use your own personal ID. The idea is that security should be traceable to an individual.
  • Do not create another generic user (e.g. "VMware Admin"). This defeats the purpose and is practically no different from "Administrator". Creating a generic user increases the risk of sharing, since it carries no personal data.

Create 2 roles (not users) in MS AD:
• Network Admin
• Storage Admin

Create a unique ID for each of the vSphere plug-ins that you use:
• SRM, Update Manager, Chargeback, CapacityIQ, vShield Zones, Converter, Nexus, etc. E.g. SRM Admin, Chargeback Admin.
• This is the ID that the product will use to log in to vCenter. It is not the ID you use to log in to the product; use your personal ID for that purpose.
• This helps in troubleshooting. Otherwise there are too many "Administrator" logins and you are not sure who they _really_ are.
• Also, if the Administrator password has to change, you don't have to change it everywhere.

Page 110: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

110 Confidential

Management Network The entire management LAN is isolated from other LANs. Within the Management LAN, put a firewall between:
• ESXi and vCenter
• vSphere Client and ESXi/vCenter
• No firewall is needed among ESXi hosts; the diagram shows one FYI only.

Page 111: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

111 Confidential

vCenter Run vCenter Server as a VM.
• vCenter Server VM best practices:
  • Disable DRS on this VM, so you always know which ESX host is hosting it. Some operations require the VM to be powered off. Always remember where you run your vCenter.
  • Set HA to high priority.
  • Not connected to the Production LAN. Connect it to the Management LAN, so VLAN trunking is required as vSwitches are shared (assuming you do not have a dedicated IT Cluster).
  • Windows patching can't be done via Update Manager.
  • VM-level operations that require the VM to be powered off can be done via ESX: log in directly to the ESXi host that has the vCenter VM, make the changes, then boot the VM.

Security
• Protect the special-purpose local vSphere administrator account from regular usage. Instead, rely on accounts tied to specific individuals for clearer accountability.

vCenter Server startup
• Start in this order: Active Directory, DNS, vCenter DB, vCenter.

Other configuration
• Statistics Level changed from Level 1 to 2 (more data).
  • As we have fewer than 30 hosts and 300 VMs, this is fine. See the vSphere DB sizing document.
  • Level 1 is rather limited for anything >60 minutes, and sometimes you need to look at the past. Useful in manual capacity planning, troubleshooting or chargeback.
  • Level 3 is a big jump in terms of data collected.

Page 112: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

112 Confidential

Windows VM monitoring Use the new Perfmon counters provided. The built-in Windows counters are misleading in a virtual environment.

Page 113: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

113 Confidential

Guest OS Use 64-bit if possible.
• Access to > 3 GB RAM.
• The performance penalty is generally negligible, or even negative.
• In Linux VMs, Highmem can show significant overheads with 32-bit; 64-bit guests can offer better performance.
• Workloads with a large memory footprint will benefit more from 64-bit guests.
• Some Microsoft & VMware products have dropped support for 32-bit.
• Increased scalability in the VM:
  • Example: for Update Manager 4, if it is installed on 64-bit Windows it can concurrently scan 4000 VMs, but on 32-bit the concurrency drops to 200. Powered-on Windows VM scans per VUM server: 72.
  • Example: for vCenter 4, if it is installed on 64-bit Windows it can support 3000 powered-on VMs, but on 32-bit it can only do 2000. So in larger View deployments, you need more vCenter servers.
  • Most other numbers are not as drastic as the above examples.

Disable unnecessary devices from the Guest OS. Choose the right SCSI controller. Set the right IO timeout:
• On Windows VMs, increase the value of the SCSI TimeoutValue parameter to allow Windows to better tolerate delayed I/O resulting from path failover.

For Windows VMs, stagger anti-virus scans. Performance will degrade significantly if you scan all VMs simultaneously.

Page 114: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

114 Confidential

Naming convention

Component / Standard / Examples / Remarks

• Data center: named by purpose, e.g. "Production". This is the virtual data center in vCenter; normally a physical data center has 1 or many virtual data centers. As you will only have a few of these, there is no need for a cryptic naming convention. Avoid renaming it.

• Cluster: named by purpose. As above.

• ESX host name: esxi_locationcode_##.domain.name, e.g. esxi_SGP_01.vmware.com, esxi_KUL_01.vmware.com. Don't include the version number, as it may change. No spaces.

• VM: Project_Name Purpose ##, e.g. "Intranet WebServer 01". Don't include the OS name. Can include spaces.

• Datastore: Environment_Type_##, e.g. PROD_FC_01, TEST_iSCSI_01, DEV_NFS_01, Local_ESXname_01. Type is useful when we have multiple types; if you have 1 type but multiple vendors, you can use the vendor name (EMC, IBM, etc) instead. Prefix all local datastores with "Local" so they are separated easily in the dialog boxes.

• "Admin ID" for plug-ins: v_ProductName, e.g. V_CapacityIQ. All the various plug-ins to vSphere need Admin access.

Avoid special characters, as you (or other VMware and 3rd-party products or plug-ins) may need to access these names programmatically.
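If you want to enforce the convention programmatically, a sketch with regular expressions; the patterns are my reading of the table above and should be adjusted to your own standard:

```python
# Sketch validating names against the conventions above; the regexes are an interpretation
# of the table, not an official standard.

import re

PATTERNS = {
    "esx_host":  re.compile(r"^esxi_[A-Z]{3}_\d{2}\.[\w.-]+$"),       # esxi_SGP_01.vmware.com
    "datastore": re.compile(r"^(PROD|TEST|DEV|Local)_\w+_\d{2}$"),    # PROD_FC_01
}

def check(kind, name):
    return bool(PATTERNS[kind].match(name))

print(check("esx_host", "esxi_SGP_01.vmware.com"))   # True
print(check("datastore", "PROD_FC_01"))              # True
print(check("datastore", "prod fc 1"))               # False
```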

Page 115: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

115 Confidential

DR Design with SRM

Page 116: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

116 Confidential

Application Analysis Ideally, SRM is done after a Business Impact Analysis.
• The BIA will list all the apps, owners, RTO, RPO, regulatory requirements, dependencies, etc.
• Some applications are important to the business but do not have high DR priority. These are normally scheduled/batch apps, like Payroll or Employee Appraisal.

Group applications by Service.
• 1 Service has many VMs.
• Put them in the same Datastore Group, as they will need to fail over together.

For each app to be protected, document the dependencies.
• Upstream and downstream.
• A large multi-tier app can easily span > 10 VMs.
• Some apps require basic services like AD and DNS to be up.

Type of "DR"
• Sometimes there is not a big disaster but a small one. Examples:
  • The core switch is going to be out for 12 hours.
  • Annual power cycle of the entire building. This happens at Suntec City, which is considered the vertical Silicon Valley of Singapore.
• Define all the Recovery Plans.

Consider the time it takes the CIO to decide to trigger DR as part of RTO/RPO. Do you have enough CPU/RAM to boot the Production VMs during a Test Run?
• Identify DR VMs that can be suspended. Add up their total reservations.
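A small sketch for that last point: add up the reservations of the VMs you plan to suspend, to see how much headroom a test run actually frees. The VM names and figures are examples:

```python
# Sketch for the DR test-run capacity question above: total reservations of suspendable VMs.
# All names and numbers are illustrative placeholders.

suspendable = [                 # (VM, CPU reservation MHz, RAM reservation MB)
    ("Dev build server", 2000, 4096),
    ("Test web 01",      1000, 2048),
    ("Test web 02",      1000, 2048),
]

cpu_mhz = sum(cpu for _, cpu, _ in suspendable)
ram_mb  = sum(ram for _, _, ram in suspendable)
print(f"Freed by suspending: {cpu_mhz} MHz CPU, {ram_mb / 1024:.0f} GB RAM")
```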

Page 117: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

117 Confidential

Decision Trees Develop decision trees that are tailored to the organisation. Below are 2 examples.

Page 118: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

118 Confidential

Datastore Design and DR

[Diagram: Datastore Group 1 (LUN 1, VMFS A) holds the MS Exchange VMs (CAS, Hub, Mailbox) and forms Protection Group 1. Datastore Group 2 (LUN 4 and LUN 5, VMFS C and VMFS D) holds the SharePoint VMs (Web, SQL, App/OS, DB/log) and forms Protection Group 2. Recovery Plan 1 (Exchange only) contains Protection Group 1; Recovery Plan 2 (Exchange and SharePoint) contains Protection Groups 1 and 2.]

Considerations for setting up LUNs and Protection Groups
• Granularity of protection groups – what is the smallest number of apps to recover at once?
• Application dependencies – which VMs are required for full application recovery?
• Consistency groups – which VM disks need to be replicated on a consistent schedule?
• Application consistency – is a separate LUN required for the data, for example to support 3rd-party "application consistent" replication?

Page 119: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

© 2009 VMware Inc. All rights reserved

Confidential

Thank You

Page 120: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

120 Confidential

Additional Info (some are still draft). Tech notes that you may find useful as input to the design. A lot more material can be found at the Design Workshop.

Page 121: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

121 Confidential

Capacity Planner Version 2.8 does not yet have the full feature for Desktop Cap Plan. Wait for next upgrade.• But you can use it on case by case basis, to collect those demanding desktop.

Default setting of paging threashold does not take into account server RAM. • Best practice for the Paging threshold is 200 Pg/sec/GB. So, you have 48GB RAM x 200= 9600 Pgs/sec.• Reason is that this paging value provides for the lowest latency access to memory pages.• You might get high paging if back up job run.

Create a project if you need to separate results (e.g. per data center).
Win08 has the firewall on by default; it needs to be turned off using the command line.
To be verified in 2.8: you can't change prime time; it's based on the local time zone.
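
A minimal sketch of the paging-threshold arithmetic above. The 200 pages/sec/GB figure is the rule of thumb quoted on this slide; the host RAM sizes are just examples.

```python
PAGES_PER_SEC_PER_GB = 200  # rule-of-thumb paging threshold from this slide

def paging_threshold(ram_gb: int) -> int:
    """Paging threshold (pages/sec) scaled to the server's installed RAM."""
    return ram_gb * PAGES_PER_SEC_PER_GB

for ram in (32, 48, 96):
    print(f"{ram} GB host -> flag sustained paging above {paging_threshold(ram)} pages/sec")
# 48 GB -> 9600 pages/sec, matching the example above.
```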

Page 122: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

122 Confidential

VMware HA and DRS Read Duncan’s yellowbrick first.

• Done? Read it again. This time, try to internalise it. See speaker notes below for an example.

vSphere 4.1• Primary Nodes

• Primary nodes hold cluster settings and all “node states” which are synchronized between primaries. Node states hold for instance resource usage information. In case that vCenter is not available the primary nodes will have a rough estimate of the resource occupation and can take this into account when a fail-over needs to occur.

• Primary nodes send heartbeats to primary nodes and secondary nodes.
• HA needs at least 1 primary because the “fail-over coordinator” role will be assigned to this primary; this role is also described as “active primary”.
• If all primary hosts fail simultaneously, no HA-initiated restart of the VMs will take place. HA needs at least one primary host to restart VMs. This is why you can only take four host failures into account when configuring the “host failures” HA admission control policy (remember: 5 primaries).
• The first 5 hosts that join the VMware HA cluster are automatically selected as primary nodes. All the others are automatically selected as secondary nodes. A cluster of 5 will be all primaries.
• When you do a reconfigure for HA, the primary and secondary nodes are selected again, at random. The vCenter client does not show which host is a primary and which is not.

• Secondary Nodes
• Secondary nodes send their state info & heartbeats to the primary nodes only.
• HA does not know if the host is isolated or completely unavailable (down).
• The VM lock file is the safety net. In VMFS, the file is not visible. In NFS, it is the .lck file.

Nodes send a heartbeat every 1 second. This is the mechanism to detect possible outages.

Page 123: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

123 Confidential

vSphere 4.1: HA and DRS Best Practices

• Avoid using advance settings to decrease slot size as it might lead to longer down time. Admission control does not take fragmentation of slots into account when slot sizes are manually defined with advanced settings.

What can go wrong in HA: the VM network, the HA network, or the storage network can be lost.

Failed: VM Network. Not failed: HA Network, Storage Network.
• Users can’t access the VM. If there are active users, they will complain. HA does nothing, as this is not within the scope of HA in vSphere 4.1.

Failed: HA Network. Not failed: VM Network, Storage Network.
• It depends: Split Brain or Partitioned? If the host is isolated, it will execute the Isolation Response (shut down the VM). The lock is released, another host gains the lock, and that host then starts the VM.

Failed: Storage Network. Not failed: does not matter.
• The VM probably crashes as it can’t access its disk. The lock expires and the host loses its connection to the array. Another host (the first one to get the lock?) will boot the VM.

Page 124: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

124 Confidential

VMware HA and DRS: Split Brain vs Partitioned Cluster

• A large cluster that spans racks might experience partitioning. Each partition will think it is the full cluster. So long as there is no loss of the storage network, each partition will happily run its own VMs.
• Split Brain is when 2 hosts want to run the same VM.
• Partitioning can happen when the cluster is separated by multiple switches. The diagram below shows a cluster of 4 ESX hosts.

Page 125: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

125 Confidential

HA: Admission Control Policy (% of Cluster) Specify a percentage of capacity that needs to be reserved for failover

• You need to manually set it so it is at least equal to 1 host failure.
• E.g. you have an 8-node cluster and want to handle 2 node failures: set the percentage to 25% (see the sketch at the end of this slide).

Complexity arises when nodes are not equal.
• Different RAM or CPU.
• But this also impacts the other Admission Control options, so always keep node sizes equal, especially in Tier 1.

Total amount of reserved resources < (Available Resources – Reserved Resources). If no reservation is set, a default of 256 MHz is used for CPU and 0 MB + memory overhead for RAM.

Monitor the thresholds with vCenter on the Cluster’s “summary” tab
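
A minimal sketch of the percentage calculation above, assuming all nodes in the cluster are equal; the node counts and failure tolerance are example inputs.

```python
import math

def failover_capacity_pct(hosts: int, host_failures_to_tolerate: int) -> int:
    """Percentage of cluster resources to reserve so the stated number of
    equal-sized hosts can fail (rounded up to a whole percent)."""
    return math.ceil(host_failures_to_tolerate / hosts * 100)

print(failover_capacity_pct(8, 2))   # 25 -> set the policy to 25%
print(failover_capacity_pct(5, 1))   # 20
print(failover_capacity_pct(6, 1))   # 17 -> round up, never down
```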

Page 126: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

126 Confidential

MS SQL Server 2008: Licensing Always refer to the official statement from the vendor web site.
• Emails, spoken words, or SMS from a staff member (e.g. Sales Manager, SE) are not legally binding.

Licensing a Portion of the Physical Processors If you choose not to license all of the physical processors, you will need to know the number of virtual processors supporting each virtual OSE (data point A) and the number of cores per physical processor/socket (data point B). Typically, each virtual processor is the equivalent of one core

vSphere 4.1 introduced virtual multi-core. Will you save more on licensing? Check with your MS reseller and the official MS documents.

Page 127: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

127 Confidential

SQL Server 2008 R2 Get the Express edition from http://www.microsoft.com/express/Database/. In most cases, the Standard edition will be sufficient. vCenter 4.1 and Update Manager 4.1 do not support the Express edition.
• Hopefully Update 1 will?

Page 128: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

128 Confidential

Windows Support

Interesting: it is the other way around. vSphere 4.1 passed the certification for Windows 2008 R2, so Microsoft supports Windows 2003 on it too. Support is version specific; check for vSphere 5.

http://www.windowsservercatalog.com/default.aspx

Page 129: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

129 Confidential

SQL Server: General Best Practices
• Follow Microsoft best practices for SQL Server deployments.
• Defrag SQL database(s) – http://support.microsoft.com/kb/943345
• Preferably 4 vCPU, 8+ GB RAM for medium/larger deployments.
• Design the back-end to support the required workload (IOPS).
• Monitor database & log disks: disk reads/writes, disk queues.
• Separate Data, Log, TempDB, etc. I/O.

Use Dual Fibre Channel Paths to storage• Not possible in vmdk

• Use RAID 5 for database & RAID 1 for logs in read-intensive deployments.
• Use RAID 10 for database & RAID 1 for logs for larger deployments.
SQL 2005 TempDB (needs updating for 2008):
• Move TempDB files to a dedicated LUN.
• Use RAID 10.
• # of TempDB files = # of CPU cores (consolidation).
• All TempDB files should be equal in size.
• Pre-allocate TempDB space to accommodate the expected workload.

• Set file growth increment large enough to minimize TempDB expansions. • Microsoft recommends setting the TempDB files FILEGROWTH increment to 10%

Page 130: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

130 Confidential

What is SQL Database Mirroring? Database-level replication over IP…, no shared storage requirement Same advantages as failover clustering (service availability, patching, etc.) At least two copies of the data…, protection from data corruption (unlike failover clustering) Automatic failover for supported applications (DNS alias required for legacy) Works with SRM too. VMs recover according to SRM recovery plan

Page 131: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

131 Confidential

VMware HA with Database Mirroring for Faster Recovery

Highlights:• Can use Standard Windows and SQL Server editions• Does not require Microsoft clustering• Protection against HW/SW failures and DB corruption• Storage flexibility (FC, iSCSI, NFS)• RTO in few seconds (High Safety)• vMotion, DRS, and HA are fully supported!

Note:
• Must use High Safety Mode for automatic failover.
• Client applications must be mirror-aware or use a DNS alias.

Page 132: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

132 Confidential

Symantec ApplicationHA
• Can install the agent to multiple VMs simultaneously.
• Additional roles for security.
• It does not cover Oracle yet.
• Presales contact for ASEAN: Vic

Page 133: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

133 Confidential

MS SharePoint 2010 Go for 1 VM = 1 Role

Page 134: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

134 Confidential

Java Application RAM best practice

• Size the virtual machine’s memory to leave adequate space (see the sketch at the end of this slide):
• For the Java heap.
• For the other memory demands of the Java Virtual Machine code.
• For any other concurrently executing process that needs memory from the same guest operating system.
• To prevent swapping in the guest OS.

• Do not reserve RAM 100% unless HA Cluster is not based on Host Failure.• This will impact HA Slot Size

• Consider the VMware vFabric as it takes advantage of vSphere.

Others• Use the Java features for lower resolution timing as supplied by your JVM (Windows/Sun JVM example: -XX:+ForceTimeHighResolution)• Use as few virtual CPUs as are practical for your application • Avoid using /pmtimer in boot.ini for Windows with SMP HAL
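
A minimal sketch of the sizing rule above: the VM's configured memory should cover the Java heap plus JVM overhead, other processes and the guest OS, so the guest never swaps. All figures below are illustrative assumptions, not measured values.

```python
def vm_memory_gb(java_heap_gb, jvm_overhead_gb=0.75, other_processes_gb=0.5,
                 guest_os_gb=1.0, headroom_pct=0.10):
    """Suggested VM memory so the guest OS does not swap.
    jvm_overhead covers JVM code, thread stacks and native allocations;
    all defaults are illustrative assumptions, not measurements."""
    base = java_heap_gb + jvm_overhead_gb + other_processes_gb + guest_os_gb
    return round(base * (1 + headroom_pct), 1)

print(vm_memory_gb(4))   # e.g. a 4 GB heap -> roughly 6.9 GB configured memory
print(vm_memory_gb(8))   # an 8 GB heap -> roughly 11.3 GB
```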

Page 135: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

135 Confidential

VM Size for Benchmark 4 vCPU 8 vCPU

SD Users 1144 2056

Response Time (s) 0.97 0.98

SAPS 6250 11230

VM CPU Utilization 98% 97%

ESX Server CPU Utilization <30% <80 %

SAP No new benchmark data on Xeon 5600.

• Need to check latest Intel data.

Regarding the vSphere benchmark.• It’s a standard SAP SD 2-tier benchmark. In real life, we should split DB and CI instance, hence cater for more users• vSphere 4.0, not 4.1• SLES 10 with MaxDB• Xeon 5570, not 5680 or Xeon 7500 series.• SAP ERP 6.0 (Unicode) with Enhancement Package 4

Around 1,500 SAPS per core (see the sketch at the end of this slide).
• Virtual runs at 93% to 95% of native performance. For sizing, we can take 90% of the physical result.
• Older UNIX servers (2006 – 2007) are good candidates for migration to x64 due to low SAPS per core.

Central Instance can be considered for FT. • 1 vCPU is enough for most cases
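
A minimal sketch of how the ~1,500 SAPS per core and the 90% sizing factor above translate into a core count; the required SAPS figures are just examples.

```python
import math

SAPS_PER_CORE = 1500          # approximate figure quoted above for this CPU generation
VIRTUAL_SIZING_FACTOR = 0.90  # take 90% of the physical result for sizing

def cores_needed(required_saps: int) -> int:
    effective_saps_per_core = SAPS_PER_CORE * VIRTUAL_SIZING_FACTOR
    return math.ceil(required_saps / effective_saps_per_core)

print(cores_needed(6000))    # ~5 vCPUs/cores for a 6,000 SAPS workload
print(cores_needed(11000))   # ~9 for an 11,000 SAPS workload
```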

Page 136: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

136 Confidential

SAP 3-Tier SD Benchmark

Page 137: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

137 Confidential

MS AD Good candidate.

• 1 vCPU 2 GB RAM are sufficient. Use the UP HAL.• 100,000 users require up to 2.75GB of memory to cache directory (x86)• 3 Million users require up to 32GB of memory to cache entire directory (x64)

• Disk is rather small• Disk2 (D:) for Database. Around ~16GB or greater for larger directories• Disk3 (L:) for Log files. Around 25% of the database LUN size

Changes in MS AD design once all AD is virtualised:
• A VM is not a reliable source of time. Time drift may happen inside a VM.
• Instead of synchronising with the Forest PDC emulator or the “parent” AD, synchronise with an internal NTP server.

Best practices
• Set the VM to auto boot.
• Boot order: vShield VM, AD, vCenter DB, vCenter App.
• Regularly monitor Active Directory replication.
• Perform regular system state backups, as these are still very important to your recovery plan.

Page 138: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

138 Confidential

MS Exchange Exchange has become leaner and more scalable:
• Exchange 2003: 32-bit Windows, 900 MB database cache, 4 KB block size, high read/write ratio.
• Exchange 2007: 64-bit Windows, 32+ GB database cache, 8 KB block size, 1:1 read/write ratio, 70% reduction in disk I/O.
• Exchange 2010: 64-bit Windows, 32 KB block size, I/O pattern optimization, further 50% I/O reduction.

Building block CPU and RAM sizing for 150 sent/received• http://technet.microsoft.com/en-us/library/ee712771.aspx

Database Availability Group (DAG)
• The DAG feature in Exchange 2010 necessitates a different approach to sizing the Mailbox Server role, forcing the administrator to account for both active and passive mailboxes.
• Mailbox Servers that are members of a DAG can host one or more passive databases in addition to any active databases for which they may be responsible.
• Not supported by Microsoft when combined with VMware HA (see the next slide).

Building Block 1000 mail box

Profile 150 sent/received daily

Megacycle Requirement 3,000

vCPU 2 (1.3 actual)

Cache Requirement 9 GB

Total Memory Size 16 GB
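
A minimal sketch of the building-block arithmetic in the table above. The 3 megacycles per mailbox follows from the table (3,000 megacycles for 1,000 mailboxes at the 150 sent/received profile); the per-core megacycle rating is an assumption you would replace with your own CPU's figure.

```python
import math

MEGACYCLES_PER_MAILBOX = 3    # 150 sent/received per day profile (from the table)
MEGACYCLES_PER_CORE = 2330    # assumed rating of one physical core; adjust for your CPU
CACHE_MB_PER_MAILBOX = 9      # ~9 GB of database cache per 1,000 mailboxes (from the table)

def mailbox_building_block(mailboxes: int):
    megacycles = mailboxes * MEGACYCLES_PER_MAILBOX
    vcpus_actual = megacycles / MEGACYCLES_PER_CORE
    vcpus = math.ceil(vcpus_actual)           # round up to whole vCPUs
    cache_gb = mailboxes * CACHE_MB_PER_MAILBOX / 1000
    return megacycles, round(vcpus_actual, 1), vcpus, cache_gb

print(mailbox_building_block(1000))  # (3000, 1.3, 2, 9.0) - matches the table
```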

Page 139: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

139 Confidential

VMware HA + DAGs (no MS support)

Protects from hardware and application failure
• Immediate failover (~3 to 5 secs).
• HA decreases the time the database is in an ‘unprotected state’.

• No passive servers.
• Windows Enterprise edition; Exchange Standard or Enterprise editions.
• Complex configuration and capacity planning.
• 2x or more storage needed.
• Not officially supported by Microsoft.

Page 140: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

140 Confidential

Realtime Applications Overall: Extremely Latency Sensitive

• All apps are somewhat latency sensitive• RT apps break with extra latency

“Hard Realtime Systems”• Financial trading systems• Pacemakers

“Soft Realtime Systems”• Telecom: Voice over IP

• Technically Challenging, but possible. Mitel and Cisco both provide official support. Need 100% reservation.• Not life-or-death risky

Financial Desktop Apps (need hardware PCoIP)• Market News• Live Video• Stock Quotes• Portfolio Updates

Page 141: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

141 Confidential

File Server Why virtualise?

• Cheaper• Simpler.

Why not virtualise• You already have an NFS server• You don’t want additional layer.

Page 142: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

142 Confidential

Management | Network | Storage

Page 143: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

143 Confidential

Security Compliance: PCI DSS PCI applies to all systems “in scope”

• Segmentation defines scope• What is within scope? All systems that Store, Process, or Transmit

cardholder data, and all system components that are in or connected to the cardholder data environment (CDE).

The DSS is vendor agnostic• Does not seem to cover virtualisation.

Relevant statements from PCI DSS• “If network segmentation is in place and will be used to reduce the scope

of the PCI DSS assessment, the assessor must verify that the segmentation is adequate to reduce the scope of the assessment.” - (PCI DSS p.6)

• “Network segmentation can be achieved through internal network firewalls, routers with strong access control lists or other technology that restricts access to a particular segment of a network.” – PCI DSS p. 6

• “At a high level, adequate network segmentation isolates systems that store, process, or transmit cardholder data from those that do not. However, the adequacy of a specific implementation of network segmentation is highly variable and dependent upon such things as a given network's configuration, the technologies deployed, and other controls that may be implemented. “– PCI DSS p. 6

• “Documenting cardholder data flows via a dataflow diagram helps fully understand all cardholder data flows and ensures that any network segmentation is effective at isolating the cardholder data environment.” – p.6

Page 144: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

144 Confidential

Security Compliance: PCI DSS Added complexity from Virtualisation

• System boundaries are not as clear as their non-virtual counterparts• Even the simplest network is rather complicated• More components, more complexity, more areas for risk• Digital forensic risks are more complicated• More systems are required for logging and monitoring• More access control systems• Memory can be written to disk• VM Escape?• Mixed Mode environments

Sample Virtualized CDE

Page 145: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

145 Confidential

PCI: Virtualization Risks by Requirement

Requirement 3: Protect stored cardholder data.
• Risk: memory that was previously only volatile may now be written to disk (e.g. when taking snapshots of systems). How are memory and other shared resources protected from access? How do you know there are no remnants of stored data?
• How to address: apply your data retention and disposal policy to CDE VMs, snapshots, and any other components which have the possibility of storing CHD, encryption keys, passwords, etc. Document the storage configuration and SAN implementation. Document any encryption process, encryption keys, and encryption key management used to protect stored CHD. Fully isolate the vMotion network to ensure that, as VMs are moved from one physical server to another, memory and other sensitive running data cannot be sniffed or logged.

Requirement 7: Restrict access to cardholder data by business need-to-know.
• Risk: access controls are more complicated. In addition to hosts, there are now additional applications, virtual components, and storage of these components (i.e., what protects their access while they are waiting to be provisioned?).
• How to address: carefully document all the access controls in place, and ensure that there are separate access controls for different “security zones”. Document all the types of Role Based Access Control (RBAC) used for access to physical hosts, virtual hosts, physical infrastructure, virtual infrastructure, logging systems, IDS/IPS, multi-factor authentication, and console access. Ensure that physical hosts do not rely on virtual RBAC systems that they themselves host.

Requirement 9: Restrict physical access to cardholder data.
• Risk: risks are greater since physical access to the hypervisor could lead to logical access to every component.
• How to address: ensure that you are considering physical protection at your DR site. Address the risk that physical access to a single server or SAN can result in logical access to hundreds of servers.

Requirement 10: Track and monitor all access to network resources and cardholder data.
• Risk: some virtual components do not have the robust logging capabilities of their physical counterparts. Many systems are designed for troubleshooting and do not create detailed event and system logs with sufficient detail to meet PCI logging requirements and assist a digital forensic investigation.
• How to address: PCI requires logs to be stored in a central location that is independent of the systems being logged. Establish unified and centralized log management which cannot be altered or disabled by access to the hypervisor. ESX logs should not be stored on a virtual host on the same ESX server, as compromising the ESX server could compromise the logs. Be prepared to demonstrate that the logs are forensically sound.

Page 146: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

146 Confidential

vNetwork Appliances

Advantages• Flexible deployment• Scales naturally as more ESX hosts are deployed

Architecture• Fastpath agent filter packets in datapath, transparent to vSwitch• Optionally forward packets to VM (slowpath agent)

Solutions• VMware vShield, Reflex, Altor, Checkpoint, etc.

Lightweight filtering in the “Fast Path” agent; heavyweight filtering in the “Slow Path” agent.

Page 147: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

147 Confidential

vShield

Setup perimeter services: install vShield Edge (External – Internal).
Provision services:
• Firewall
• NAT, DHCP
• VPN
• Load Balancer

Setup internal trust zones: install vShield App.
• vDS / dvfilter setup
• Secure access to shared services
Create interior zones:
• Segment the internal net
• Wire up VMs

[Diagram: vShield Edge sits at the Org vDC perimeter facing the Internet; vShield App instances run on each vSphere host on the Virtual Distributed Switch, segmenting DMZ, App, DB and Shared Services zones.]

Page 148: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

148 Confidential

vShield and Fail-Safe http://www.virtualizationpractice.com/blog/?p=9436

Page 149: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

149 Confidential

Security Steps to delete “Administrator” from vCenter

• Move it to the “No Access” role. Protect it with alarm if this is modified.• All other plug-in or mgmt products that use Administrator will break

Steps to delete “root” from ESX• Replaced with another ID. Can’t be tied to AD?• Manual warns of removing this user.

Create another ID with root group membership• vSphere 4.1 now support MS AD integration

Page 150: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

150 Confidential

Managing remote sites Some small sites may not warrant its own vCenter

• Consider vSphere Essential Plus ROBO edition. Need 10 sites for best financial return.

Features that are network heavy should be avoided.
• Auto Deploy means sending around 150 MByte. If the link is a shared 10 Mbit one, it will add up (see the sketch at the end of this slide).

Best practices• Install a copy of template at remote site.• Increase vCenter Server and vSphere hosts timeout values to ~3 hours• Consider manual vCenter agent installs prior to connecting ESX hosts• Use RDP/SSH instead of Remote Console for VM console access

• If absolutely needed, reduce remote console displays to smaller values, e.g. 800x600/16-bit

vCenter 5 improvement over 4.1 on remote ESX• Can use web client if vCenter is remote, which uses less bandwidth• No other significant changes

vCenter 4.1 improvements over 4.0: • 1.5x to 4.5x improvement in operational time associated with typical vCenter management tasks• All traffic between vCenter and ESXi hosts is now compressed• Statistics data sent between hosts and vCenter Server flows over TCP, not UDP; eliminates lost metrics • Most vCenter operations fare well over 64 Kbps links

Certain vCenter operations that involve a heavier payload• E.g. Add Host, vCenter agent upgrades, HA enablement, Update Manager based host patching

Page 151: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

151 Confidential

Linux New features in ext4 filesystem:• Extents reduce fragmentation• Persistent preallocation• Delayed allocation• Journal checksumming• fsck is much faster

RHEL 6 & ext4 properly align filesystems Tips: use the latest OS• Constant Improvements

• Built-in paravirtual drivers• Better timekeeping• Tickless kernel. On-demand timer interrupts. Systems stay totally idle

• Hot-add capabilities• Reduces need to oversize “just in case”• Might need to tweak udev. See VMware KB 1015501

• Watch for jobs that happen at the same time (across VM)• Monitoring (every 5 minutes)• Log rotation (4 AM)

• Don’t need sysstat & sar running. Use vCenter metrics instead

Page 152: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

152 Confidential

Guest Optimization Swap File Location Swap file for Windows guests should be on separate dedicated drives

• Cons:• This requires another vmdk file. Management overhead as it has to be resized when RAM changes too.

• Pro:• No need to back up• Keep the application traffic and OS disk traffic separate from the page file traffic thereby increasing performance.

• Swap partition equal to 1.5x RAM.
• 1.5x is the default recommendation for best performance (knowing nothing about the application).
• Monitor page file usage to see how much of it is actually being used. In the old days, whatever memory was installed was fixed and changing it was a major exercise; leverage the virtual flexibility and adjust for best usage.
• Microsoft limits on page files: http://support.microsoft.com/kb/889654

Microsoft’s memory recommendations and definition of physical address extension explained http://support.microsoft.com/?kbid=555223


Page 153: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

153 Confidential

Snapshot Only keep for maximum 1-3 days.

• Delete or commit as soon as you are done.• A large snapshot may cause issue when committing/deleting.

For high transaction VM, delete/commit as soon as you are done verifying• E.g. databases, emails.

3rd party tool• Snapshots taken by third party software (called via API) may not show up in the vCenter Snapshot Manager. Routinely check for snapshots

via the command-line.

Increasing the size of a disk with snapshots present can lead to corruption of the snapshots and potential data loss. • Check for snapshot via CLI before you increase
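
One way to do that routine check is a small script against the vCenter API. A minimal sketch using the pyVmomi library; the vCenter address and credentials are placeholders, and it only lists the top-level snapshot of each tree.

```python
# Minimal sketch: list every VM that currently has a snapshot.
# Requires the pyVmomi package; hostname/user/password are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only; use proper certificates in production
si = SmartConnect(host="vcenter.example.com", user="readonly", pwd="***", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.snapshot is not None:                 # VM has at least one snapshot tree
            for snap in vm.snapshot.rootSnapshotList:
                print(f"{vm.name}: '{snap.name}' created {snap.createTime}")
finally:
    Disconnect(si)
```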

Page 154: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

154 Confidential

vMotion Can be encrypted. At a cost. If vMotion network is isolated, then there is no need. May lose 1 ping.

Page 155: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

155 Confidential

P2V Avoid if possible. Best practice is to install from template (which was optimised for virtual machine)

• Remove unneeded devices after P2V

MS does not support P2V of AD Domain Controller. Static servers are good candidate for P2V:

• Web servers, print servers

Servers with a retail licence/key will require Windows reactivation (too many hardware changes).
Resize after P2V:
• Do a relative CPU comparison between the old server and the new host.
• MS Domain Controller: 1 vCPU, 2 GB is enough.

Page 156: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

156 Confidential

Deployment Approach

[Diagram comparing two deployment flows.
• Classic (vCenter template style): Clone & Power On the VM, upload the customization agent and parameters, run the agent to modify the OS, then shut down.
• Modern (vApp style): create a Template (VM + OS + Tools), Clone, Deploy & Customize using deployment parameters, Power On, and the VM self-customizes.]

Studio 2.1 automates template creation
• Building templates from build profiles

• Windows- and Linux- support

• Full OVF 1.1 integration, including OVF properties and OVF environment

• Free download from http://www.vmware.com/go/studio

Page 157: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

157 Confidential

vMA: Centralised Logging Benefits

• Ability to search across ESX• convenience

Best practices• One vMA per 100 hosts with vilogger• Place vMA on management LAN• Use static IP address, FQDN and DNS• Limit use of resxtop (used for real time troubleshooting not monitoring)

Enable remote system logging for targets• vilogger (enable/disable/updatepolicy/list)

• Rotation default is 5• Maxfiles defaults to 5MB• Collection period is 10 seconds

• ESX/ESXi log files go to /var/log/vmware/<hostname>• vxpa logs are not sent to syslog

• See KB1017658

Page 158: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

158 Confidential

Feature Comparison Among Switches (partial)

Feature vSS vDS Cisco N1K

VLAN yes yes yes

Port Security yes yes yes

Multicast Support yes yes yes

Link Aggregation static static LACP

Traffic Management limited yes yes

Private VLAN no yes yes

SNMP, etc. no no yes

Management Interface vSphere Client vSphere client Cisco CLI

Netflow No yes yes

Page 159: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

159 Confidential

vNetwork Standard Switch: A Closer Look
• A vSS is defined on a per-host basis, from Home > Inventory > Hosts and Clusters.
• Port Groups are policy definitions for a set or group of ports, e.g. VLAN membership, port security policy, teaming policy, etc.
• Uplinks (physical NICs) are attached to the vSwitch.

Page 160: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

160 Confidential

vNetwork Distributed Switch: A Closer Look
• DV Port Groups span all hosts covered by the vDS and are groups of ports defined with the same policy, e.g. VLAN, etc.
• The DV Uplink Port Group defines uplink policies.
• DV Uplinks abstract the actual physical NICs (vmnics) on hosts; vmnics on each host are mapped to dvUplinks.
• The vDS operates off a local cache – no operational dependency on the vCenter server.
• The host local cache is under /etc/vmware/dvsdata.db and /vmfs/volumes/<datastore>/.dvsdata
• The local cache is a binary file. Do not hand edit.

Page 161: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

161 Confidential

Nexus 1000V: VSM VM properties

• Each requires 1 vCPU and 2 GB RAM. These must be reserved, so it will impact the cluster slot size.
• Use “Other Linux 64-bit” as the Guest OS.
• Each needs 3 vNICs.
• Requires the Intel e1000 network driver. Because no VMware Tools are installed?

Availability
• 2 VSMs are deployed in an active-standby configuration, with the first VSM functioning in the primary role and the other VSM functioning in a secondary role.
• If the primary VSM fails, the secondary VSM will take over.
• They do not use the VMware HA mechanism.

Unlike cross-bar based modular switching platforms, the VSM is not in the data path. • General data packets are not forwarded to the VSM to be processed, but rather switched by the VEM directly.

Page 162: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

162 Confidential

Nexus 1000V: VSM has 3 interfaces for “mgmt”

Control Interface
• VSM – VEM communication, and VSM – VSM communication.
• Handles low-level control packets such as heartbeats as well as any configuration data that needs to be exchanged between the VSM and VEM. Because of the nature of the traffic carried over the control interface, it is the most important interface in Nexus 1000V.
• Requires very little bandwidth (<10 KBps) but demands absolute priority.
• Always the first interface on the VSM. Usually labeled "Network Adapter 1" in the VM network properties.

Management Interface
• VSM – vCenter communication.
• Appears as the mgmt0 port on a Cisco switch. As with the management interfaces of other Cisco switches, an IP address is assigned to mgmt0.
• Does not necessarily require its own VLAN. In fact, you could use the same VLAN as vCenter.

Packet Interface
• Carries network packets that need to be coordinated across the entire Nexus 1000V. Only two types of control traffic: Cisco Discovery Protocol and Internet Group Management Protocol (IGMP) control packets.
• Always the third interface on the VSM and is usually labeled "Network Adapter 3" in the VM network properties.
• Bandwidth required for the packet interface is extremely low, and its use is very intermittent. If the Cisco Discovery Protocol and IGMP features are turned off, there is no packet traffic at all. The importance of this interface is directly related to the use of IGMP. If IGMP is not deployed, then this interface is used only for Cisco Discovery Protocol, which is not considered a critical switch function.

Page 163: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

163 Confidential

vNetwork Distributed Portgroup Binding

Port Binding: Association of a virtual adapter with a dvPort

Static Binding: Default configuration• Port bound when vnic connects to portgroup

Dynamic binding• Use when #VM adapters > #dvPorts in a portgroup and all VMs are not active

Ephemeral binding• Use when #VMs > #dvPorts and port history is not relevant• Max Ports is not enforced

[Diagram: on each VMware ESX host, a proxy switch instantiates the vDS; a DVPort is created on the proxy switch and bound to the vnic.]

Use static binding for best performance and scale.

Page 164: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

164 Confidential


Network Stack Comparison Good attributes of FCoE

• Has less overhead than FCIP or iSCSI. See the diagram below.
• FCoE is managed like FC at the initiator, target, and switch level.
• Maps FC frames over Ethernet transport.
• Enables Fibre Channel to run over a lossless Ethernet medium.
• Single adapter, less device proliferation, lower power consumption.
• No gateways required.
• NAS certification: FCoE CNAs can be used to certify NAS storage. Existing NAS devices listed on the VMware SAN Compatibility Guide do not require recertification with FCoE CNAs.

Mixing of technologies always increases complexity.

[Diagram: protocol stacks carrying SCSI over the physical wire. iSCSI: SCSI → iSCSI → TCP → IP → Ethernet. FCIP: SCSI → FCP → FCIP → TCP → IP → Ethernet. FCoE: SCSI → FCP → FCoE → Ethernet. FC: SCSI → FCP → FC.]

Page 165: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

165 Confidential

Physical Switch Setup Spanning Tree Protocol

• vSwitch won’t create loops.
• vSwitches can’t be linked together.
• A vSwitch does not take an incoming packet from one pNIC and forward it as an outgoing packet to another pNIC.

Recommendations
1. Leave STP on in the physical network.
2. Use “portfast” on ESX-facing ports.
3. Use “bpduguard” to enforce the STP boundary.

[Diagram: VMs (MAC a, b, c) on two vSwitches, uplinked to the physical switches.]

Page 166: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

166 Confidential

1 GE switch Sample from Dell.com (US site, not Singapore) Around US$5 K. Need a pair. 48 ports

• Each ESXi needs around 7 – 13 ports (inclusive of iLO port)

Page 167: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

167 Confidential

10 GE switch Sample from Dell.com (US site, not Singapore). Around US$10 – 11K each. Need a pair. 24 ports.

• Each ESXi only needs 2 ports.
• The iLO port can connect to an existing GE/FE switch.

Compared with the 1 GE switch, the price per host is very close. It might be even cheaper in TCO (see the sketch below).
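
A minimal sketch of the cost-per-host comparison implied by the last two slides, using the rough US list prices and port counts quoted above; the ports-per-host figures are the ranges given, and everything else would need your own quotes.

```python
def switch_cost_per_host(pair_price_usd, ports_per_switch, ports_per_host):
    """Switch cost attributed to one ESXi host, assuming a redundant pair of
    switches with the host's ports spread across the pair."""
    hosts_supported = (ports_per_switch * 2) // ports_per_host
    return round(pair_price_usd / hosts_supported)

# 1 GE: ~US$5K each (pair = 10K), 48 ports, 7-13 ports per host
print(switch_cost_per_host(10_000, 48, 8))    # ~US$833 per host at 8 ports
print(switch_cost_per_host(10_000, 48, 12))   # ~US$1,250 per host at 12 ports
# 10 GE: ~US$10-11K each (pair = ~21K), 24 ports, 2 ports per host
print(switch_cost_per_host(21_000, 24, 2))    # ~US$875 per host
```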

Page 168: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

168 Confidential

10 GE design Design consideration• We only have 2 physical port.

• This means we only have 1 vSwitch. This means we must choose between Local Switch or Distributed Switch (or Nexus). • This is less than ideal. Ideally, we should keep Management port on the local/simple vSwitch. • Some customers have gone with 4 physical ports as 20 GE may not be enough for both Storage and Network

• Distributed Switch relies on vCenter• Database corruption on vCenter will impact it.• vCenter availability is more critical.

• Use Load Based Teaming.
• This prevents one burst from impacting Production. For example, a large vMotion operation (vSphere 4.1 can do 8 concurrent vMotions) can send a lot of traffic.

Page 169: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

169 Confidential

Unified Fabric with Fabric Extender

Multiple points of managementFCEthernetBlade switches

High cable count

Unified fabric with Fabric extender Single point of management Reduced cables

Fiber between racksCopper in racks

End of Row Deployment Fabric Extender

Page 170: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

170 Confidential

Storage IO Control Suggested Congestion Threshold values

Rule one: avoid different settings for datastores sharing underlying resources.
• Use the same congestion threshold on A and B.
• Use comparable share values (e.g. use Low/Normal/High everywhere).

Storage Media Congestion Threshold

Solid State Disks 10 - 15 milliseconds

Fiber Channel 20 - 30 milliseconds

SAS 20 - 30 milliseconds

SATA 30 - 50 milliseconds

Auto-tiered StorageFull LUN auto - tiering

Vendor recommended value. If none provided, recommended threshold from above for the slowest storage

Auto-tiered StorageBlock level / sub-LUN auto - tiering

Vendor recommended value. If none provided, combination of thresholds from above for the fastest and the slowest media types

[Diagram: Datastore A and Datastore B, both with SIOC enabled, sharing the same physical drives.]

Page 171: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

171 Confidential

NAS & NFS Two key NAS protocols: • NFS (the “Network File System”). This is what we support.• SMB (Windows networking, also known as “CIFS”)

Things to know about NFS• “Simpler” for person who are not familiar with SAN complexity• To remove a VM lock is simpler as it’s visible.

• When ESX Server accesses a VM disk file on an NFS-based datastore, a special .lck-XXX lock file is generated in the same directory where the disk file resides to prevent other ESX Server hosts from accessing this virtual disk file.

• Don’t remove the .lck-XXX lock file, otherwise the running VM will not be able to access its virtual disk file.• No SCSI reservation. This is a minor issue• 1 Datastore will only use 1 path

• Does Load Based Teaming work with it?• For 1 GE, throughput will peak at 100 MB/s. At 16 K block size, that’s 7500 IOPS.

• The Vmkernel in vSphere 5 only supports NFS v3, not v4. Over TCP only, no support for UDP.• MSCS (Microsoft Clustering) is not supported with NAS.• NFS traffic by default is sent in clear text since ESX does not encrypt it.

• Use only NAS storage over trusted networks. Layer 2 VLANs are another good choice here.• 10 Gb NFS is supported. So is Jumbo Frames.• Deduplication can save sizeable amount. See speaker notes

Page 172: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

172 Confidential

iSCSI Use Virtual port storage system instead of plain Active/Active• I’m not sure if they cost much more.

Has 1 additional array type over traditional FC: the virtual port storage system.
• Allows access to all available LUNs through a single virtual port.
• These are active-active arrays, but they hide their multiple connections behind a single port. ESXi multipathing cannot detect the multiple connections to the storage. ESXi does not see multiple ports on the storage and cannot choose the storage port it connects to. These arrays handle port failover and connection balancing transparently. This is often referred to as transparent failover.

• The storage system uses this technique to spread the load across available ports.

Page 173: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

173 Confidential

iSCSI Limitations
• ESX/ESXi does not support iSCSI-connected tape devices.
• You cannot use virtual-machine multipathing software to perform I/O load balancing to a single physical LUN.
• A host cannot access the same LUN when it uses dependent and independent hardware iSCSI adapters simultaneously.
• Broadcom iSCSI adapters do not support IPv6 and Jumbo Frames.
• Some storage systems do not support multiple sessions from the same initiator name or endpoint. Multiple sessions to such targets can result in unpredictable behavior.

Dependent and Independent
• A dependent hardware iSCSI adapter is a third-party adapter that depends on VMware networking, and on iSCSI configuration and management interfaces provided by VMware. This type of adapter can be a card, such as a Broadcom 5709 NIC, that presents a standard network adapter and iSCSI offload functionality for the same port. The iSCSI offload functionality appears on the list of storage adapters as an iSCSI adapter.

Error correction
• To protect the integrity of iSCSI headers and data, the iSCSI protocol defines error correction methods known as header digests and data digests. These digests pertain to the header and SCSI data being transferred between iSCSI initiators and targets, in both directions.
• Both parameters are disabled by default, but you can enable them. They impact CPU. Nehalem processors offload the iSCSI digest calculations, thus reducing the impact on performance.

Hardware iSCSI
• When you use a dependent hardware iSCSI adapter, performance reporting for a NIC associated with the adapter might show little or no activity, even when iSCSI traffic is heavy. This behavior occurs because the iSCSI traffic bypasses the regular networking stack.

Page 174: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

174 Confidential

iSCSI & NFS: caveat when used together

Avoid using them together. iSCSI and NFS have different HA models:
• iSCSI uses vmknics with no Ethernet failover – it uses MPIO instead.
• The NFS client relies on vmknics using link aggregation/Ethernet failover.
• NFS relies on the host routing table.
• NFS traffic will use the iSCSI vmknic and result in links without redundancy.
• Use of multiple-session iSCSI with NFS is not supported by NetApp.
• EMC supports it, but the best practice is to have separate subnets and virtual interfaces.

Page 175: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

175 Confidential

NPIV What it is

• Allows a single Fibre Channel HBA port to register with the Fibre Channel fabric using several worldwide port names (WWPNs). This ability makes the HBA port appear as multiple virtual ports, each having its own ID and virtual port name. Virtual machines can then claim each of these virtual ports and use them for all RDM traffic.
• Note that it is WWPN, not WWNN.
• WWPN – World Wide Port Name; WWNN – World Wide Node Name.
• A single-port HBA typically has a single WWNN and a single WWPN (which may be the same).
• Dual-port HBAs may have a single WWNN to identify the HBA, but each port will typically have its own WWPN. However, they could also have an independent WWNN per port.

Design consideration
• Only applicable to RDM.
• The VM does not get its own HBA, and no FC driver is required. It just gets an N_Port, so it is visible from the fabric.
• The HBA and SAN switch must support NPIV.
• Cannot perform Storage vMotion or vMotion between datastores when NPIV is enabled. All RDM files must be in the same datastore.
• Still in place in v5.

The first one is the WW Node Name; the second one is the WW Port Name.

Page 176: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

176 Confidential

Backup: VADP vs Agent-based ESX has 23 VM. Each VM is around 40 GB.

• All VMs are idle, so this CPU/Disk load is purely from the backup.
• CPU peak is >10 GHz (just above 4 cores).
• But disk peak is >1.4 Gbps of IO, almost 50% of a 4 Gb HBA.

After VADP, both CPU and disk drop to negligible levels.

Page 177: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

177 Confidential

VADP: Adoption Status This is as at June 2010. Always check with vendor for the most accurate data

Partner Name Product Name Version Integration Status

CA ArcServe 12.5 w/patch Released

Commvault Simpana 8.0 SP5 Released

EMC Avamar 5.0 Released

EMC Networker 7.6.x Not yet

HP Data Protector 6.1.1 with patch Not yet

IBM Tivoli Storage Manager 6.2.0 Released

Symantec Backup Exec 2010 Released

Symantec Backup Exec System Recovery 2010 Released

Symantec NetBackup 7.0 Released

Vizioncore vRanger Pro 4.2 Released

Veeam Backup & Replication 4.0 Released

Page 178: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

178 Confidential

Data Recovery No integration with tape

• Can do manual

If a third-party solution is being used to backup the deduplication store, those backups must not run while the Data Recovery service is running. Do not back up the deduplication store without first powering off the Data Recovery Backup Appliance or stopping the datarecovery service using the command service datarecovery stop.

Some limits• 8 concurrent jobs on the appliance at any time (backup & restore).• An appliance can have at the most 2 dedupe store destinations due to the overhead involved in deduping.• VMDK or RDM based deduplication stores of up to 1TB or CIFS based deduplication stores of up to 500GB.• No IPv6 addresses• No multiple backup appliances on a single host.

VDR cannot back up VMs• that are protected by VMware Fault Tolerance.• with 3rd party multi-pathing enabled where shared SCSI buses are in use.• with raw device mapped (RDM) disks in physical compatibility mode.• Data Recovery can back up VMware View linked clones, but they are restored as unlinked clones.

Using Data Recovery to backup Data Recovery backup appliances is not supported. • This should not be an issue. The backup appliance is a stateless device, so there is not the same need to back it up like other types of VMs.

Page 179: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

179 Confidential

2 TB VMDK barrier You need to have > 2 TB disk within a VM.

• There are some solutions, each with pros and cons.
• Say you need a 5 TB disk in 1 Windows VM.
• RDM (even with physical compatibility) and DirectPath I/O do not increase the virtual disk limit.

Solution 1: VMFS or NFS
• Create a datastore of 5 TB.
• Create 3 VMDKs and present them to Windows.
• Windows then combines the 3 disks into 1 volume.
• Limitation: certain low-level storage software may not work, as it needs 1 disk (not combined by the OS).

Solution 3: iSCSI within the guest
• Configure the iSCSI initiator in Windows.
• Configure a 5 TB LUN. Present the LUN directly to Windows, bypassing the ESX layer. You can’t monitor it from vSphere.
• By default, it will only have 1 GE. NIC teaming requires a driver from Intel; not sure if this is supported.

Page 180: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

180 Confidential

Storage: Queue Depth When should you adjust the queue depth?

• If a VM generates more commands to a LUN than the LUN queue depth, adjust the device/LUN queue.
• Generally, with fewer, very high-IO VMs on a host, larger queues at the device driver will improve performance.
• If the VM’s queue depth is lower than the HBA’s, adjust the VMkernel.

Be cautious when setting queue depths.
• With too-large device queues, the storage array can easily be overwhelmed and its performance may suffer with high latencies.
• The device driver queue depth is a global, per-LUN setting.

• Change the device queue depth for all ESX hosts in the cluster

Calculating the queue depth:
• To verify that you are not exceeding the queue depth for an HBA, use the following formula:
• Max queue depth of the HBA = device queue setting x number of LUNs on the HBA
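
A minimal sketch of that check; the device queue depth, LUN count and HBA limit are example values you would replace with your own driver settings.

```python
def hba_queue_usage(device_queue_depth: int, luns_on_hba: int, hba_max_queue_depth: int):
    """Aggregate outstanding commands the LUN queues could push at the HBA,
    compared against the HBA's own queue depth."""
    aggregate = device_queue_depth * luns_on_hba
    return aggregate, aggregate <= hba_max_queue_depth

# Example: 32 per-LUN queue depth, 20 LUNs, HBA queue depth of 4096
agg, ok = hba_queue_usage(32, 20, 4096)
print(agg, "within HBA limit" if ok else "exceeds HBA limit")

# Raising the per-LUN depth to 128 with 64 LUNs would oversubscribe a 4096 HBA queue
agg, ok = hba_queue_usage(128, 64, 4096)
print(agg, "within HBA limit" if ok else "exceeds HBA limit")
```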


Page 181: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

181 Confidential

Sizing the Storage Array

• For RAID 1 (it has an IO penalty of 2):
• 60 drives = ((7000 x 2 x 30%) + (7000 x 70%)) / 150 IOPS, i.e. 7,000 front-end IOPS with 30% writes and 150 IOPS per drive (see the sketch after the table below).
• Why does RAID 5 have an IO penalty of 4? Each small write requires reading the data block and the parity block, then writing both back.

RAID Level IO Penalty

1 2

5 4

6 6
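
A minimal sketch of the drive-count formula above, generalised for the RAID write penalties in the table; the 7,000 IOPS, 30% write ratio and 150 IOPS-per-drive figures are the example from this slide.

```python
import math

RAID_WRITE_PENALTY = {1: 2, 5: 4, 6: 6}   # back-end IOs per front-end write

def drives_needed(frontend_iops: int, write_pct: float, raid_level: int,
                  iops_per_drive: int = 150) -> int:
    penalty = RAID_WRITE_PENALTY[raid_level]
    backend_iops = frontend_iops * write_pct * penalty + frontend_iops * (1 - write_pct)
    return math.ceil(backend_iops / iops_per_drive)

print(drives_needed(7000, 0.30, 1))   # ~61 drives - the RAID 1 example above
print(drives_needed(7000, 0.30, 5))   # ~89 drives with RAID 5's penalty of 4
```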

Page 182: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

182 Confidential

Storage: Performance Monitoring Get a baseline of your environment during a “normal” IO time frame.

• Capture as many data points as possible for analysis.• Capture data from the SAN Fabric, the storage array, and the hosts.

Which statistics should be captured• Max and average read/write IOps • Max and average read/write latency (ms)• Max and average Throughput (MB/sec)• Read and write percentages• Random vs. sequential• Capacity – total and used

Page 183: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

183 Confidential

SCSI Architecture Model (SAM)

Page 184: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

184 Confidential

Fibre Channel Multi-Switch Fabric

[Diagram: two fabric switches joined by E_Ports (an inter-switch link). Nodes A–H connect their N_Ports to F_Ports on the switches; each port has a transmitter (TR) and receiver (RC).]

Page 185: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

185 Confidential

Page 186: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

186 Confidential

Upgrade Best Practices Turn Upgrade into Migrate

• Much lower risk. Ability to roll back, and a much simpler project.
• Fewer stages: 3 stages become 1.
• Upgrade + new features + re-architecture in 1 clean stage.
• Faster overall project.
• Need to do a server tech refresh for older ESXi.

Think of both data centers• vCenter 5 can’t linked-mode to vCenter 4.

Involve App Team• Successful upgrade should result in faster performance

Involve Network and Storage team• There cooperation is required to take advantage of vSphere 5

Compare Before and After• …. and document your success!

Page 187: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

187 Confidential

Migrate: Overall Approach Document the Business Drivers and Technical Goals

• Upgrade is not simple. And you’re not doing it for fun.
• If you are going to support larger VMs, you might need to change servers.

Check compatibility
• Array to ESXi 5: is it supported? You may need a firmware upgrade to take advantage of the new vStorage APIs.
• Backup software to vCenter 5.
• Products that integrate with vCenter 5:

• VMware “integration” products: SRM, View, vCloud Director, vShield, vCenter Heartbeat• Partner integration products: TrendMicro DS, Cisco Nexus• VMware management products, partner management products.• All these products should be upgraded first

Assuming all the above is compatible, proceed to the next step. Read the Upgrade Guide. Plan and design the new architecture.
• Based on vSphere 5 + SRM 5 + vShield 5 + others.
• Decide which architectural changes you are going to implement. Examples:
• vSwitch to vDS?
• Datastore Cluster?
• Auto Deploy?
• vCenter appliance? Take note of its limitations (View, VCM, Linked Mode, etc.).

• What improvements are you implementing? Examples:• Datastore clean up or consolidation.• SAN: fabric zoning, multi-pathing, 8 Gb, FCoE• Chargeback? This will impact your design

Page 188: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

188 Confidential

Migrate: Overall Approach Upgrade vCenter Create the first ESXi cluster

• Start with IT cluster

Migrate the first 4.x cluster into vCenter 5.
• 1 cluster at a time.
• Follow the VMs’ scheduled downtime.
• Capture “before” performance, for comparison or proof.
• Back up the VMs, then migrate.
• Once the last VM is migrated, the hosts are free for reuse or decommissioning.

Repeat until the last cluster is migrated. Then upgrade VMs to the latest hardware version and upgrade VMware Tools.

Page 189: Private Cloud Sample Architectures based on vSphere  5  platform Singapore, Sep 2011

189 Confidential

New features that impact design New features with major design impact

• Datastore Cluster (Storage DRS)
• Auto Deploy: you need infrastructure to support it.
• vCenter appliance
• VMFS-5: larger datastores, so your datastore strategy might change to a “fewer but larger” one.

Other new features can wait after upgrade.• Example, Network IO Control can be turned on after upgrade.