Ahmed Abro, Staff Solutions Architect - VMware
Tim Jabaut, Staff Solutions Architect - VMware
LHC2424BE
#VMworld2017 #LHC2424BE
200 to 40,000 VMs in 24 Months: Building Highly Scalable SDDC on Hybrid Cloud: Real-World Example
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Meet your Speakers
Tim Jabaut (@timjabaut)
VMware Staff Solution Architect
Current role: Embedded Architect with IBM, working on all of their Repeatable Reference Architecture offerings involving Cloud and VMware.
I currently reside in Raleigh, North Carolina, with my beautiful wife of 18 years and two teenage children who keep us extremely busy with football and soccer.
Ahmed Abro (@ahmedabro)
VMware Staff Solution Architect
Currently embedded with Accenture for the VMware solution stack for private and hybrid cloud.
Author of two books on SDN and of multiple drafts and research papers for IEEE and IETF.
Married for 10 years, with two young children.
Let's set the stage
• What this session is
– Technical walkthrough of a real world hybrid cloud case study
– High level design discussion
– Real world challenges and potential solutions
– Lessons Learned
• What it's not
– Sales pitch
– Hybrid cloud training
– Product design
– Hands-on
Please hold all questions until the end.
Our Customer
Large US Nationwide Health Insurance Company
• Millions of registered policy holders
• Annual Revenue in the billions
• Employees over 100,000
Services
Provides quality health care at a reasonable cost to subscribers. Handles the delivery, financing and administration of health care services.
Current State
• Aging Vblock infrastructure
• Running out of capacity
• Expensive to refresh and maintain
• Not agile enough
• Hard to scale
• 90% virtualized
• 3 regions, 2 US data centers
• Stack: vSphere, SAN Storage, Vblocks
The First Step on the Journey
[Diagram: the Legacy vBlock Data Center (On-Prem SSO Domain) is connected by L2 bridging to two IBM Cloud Data Centers (Hybrid Cloud SSO Domain), which are joined by the SL Private Network. Each site holds virtual machines and storage. Flows shown: vMotion, replicate, backup/recover, failback.]
Bluemix Cloud Overview
• Streamlines and facilitates VMware deployments, from months to minutes
– Automated approach
• Designed and validated in conjunction with VMware experts
• Scalable, and easier to manage using existing VMware tools
VMware on IBM Bluemix Cloud (stack, top to bottom):
• Apps
• Management
• Compute Virtualization
• Network Virtualization
• Storage Virtualization
• Physical Infrastructure
Extending your data center into the IBM Bluemix Cloud…
[Diagram: Legacy vBlock Data Center (On-Prem SSO Domain), L2 bridging, two IBM Cloud Data Centers (Hybrid Cloud SSO Domain) joined by the SL Private Network; virtual machines and storage at each site; flows: vMotion, replicate, backup/recover, failback.]
IBM Standard Reference Architecture
• IBM Bluemix Cloud uses a VMware certified hardware BoM that ensures consistency.
• We utilize a modular approach so that we can easily calculate and scale.
• A standard building block of at least four hosts in a “collapsed cluster” model provides a fully HA cluster supporting approximately 200 VMs.
• Conservative Overcommit Ratio
– vCPU:pCPU – 6:1
– vRAM:pRAM – 1.3:1
• Reference VM:
– 2 vCPU
– 8GB RAM
Building block host specification:
ESXi Host: Dual Intel Xeon E5-2690 v3 Processor, 12 cores
RAM: 512GB
Disk Controller: Array Controller, Avago 9361-8i
Boot Disk: 2 x 1TB SATA (OS)
Network: Quad 10G NICs – RSS & TSO Support
200 VMs Achieved
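The sizing figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: it treats "12 cores" as per socket, and it reserves one of the four hosts for HA failover (an N+1 assumption that the slide does not state). The function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    sockets: int = 2             # "Dual Intel Xeon E5-2690 v3"
    cores_per_socket: int = 12   # assumption: "12 cores" is per socket
    ram_gb: int = 512            # "RAM 512GB"


def vms_per_block(host: Host, hosts: int = 4, reserved_hosts: int = 1,
                  vcpu_ratio: float = 6.0, vram_ratio: float = 1.3,
                  vm_vcpu: int = 2, vm_ram_gb: int = 8) -> int:
    """Estimate how many reference VMs fit in one building block.

    reserved_hosts models HA headroom (assumed N+1, not from the slide).
    The result is the smaller of the CPU-bound and RAM-bound counts.
    """
    usable = hosts - reserved_hosts
    vcpu_budget = usable * host.sockets * host.cores_per_socket * vcpu_ratio
    vram_budget = usable * host.ram_gb * vram_ratio
    by_cpu = int(vcpu_budget // vm_vcpu)
    by_ram = int(vram_budget // vm_ram_gb)
    return min(by_cpu, by_ram)


if __name__ == "__main__":
    print(vms_per_block(Host()))                    # with one host held back for HA
    print(vms_per_block(Host(), reserved_hosts=0))  # all four hosts usable
```

Under these assumptions the block is CPU-bound at a little over 200 reference VMs, which is in line with the "approximately 200" figure; management VMs in a collapsed cluster would consume part of the difference.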
That feeling when you achieve your goal, only to have the customer come back with:
“That’s great, now let’s go to 40,000 VMs in 24 months.”
Changing Requirements Lead to a Change in Approach
Deployment Timeline
[Chart: Number of VMs Deployed (y-axis, 0 to 45,000) by month (x-axis, 0 to 24)]
Approx. 9 Building Blocks (~36 Hosts) and 1,700 VMs per month
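The monthly figures follow from the 40,000 VM target. As a rough sketch, assuming the roughly 200 VMs and four hosts per building block from the reference architecture (the function name is hypothetical):

```python
import math


def monthly_plan(target_vms: int = 40_000, months: int = 24,
                 vms_per_block: int = 200, hosts_per_block: int = 4):
    """Return (VMs per month, building blocks per month, hosts per month)."""
    vms_per_month = math.ceil(target_vms / months)
    blocks_per_month = math.ceil(vms_per_month / vms_per_block)
    hosts_per_month = blocks_per_month * hosts_per_block
    return vms_per_month, blocks_per_month, hosts_per_month


if __name__ == "__main__":
    print(monthly_plan())
```

The exact rate is 1,667 VMs per month, which the slide rounds to about 1,700; either figure rounds up to nine building blocks, or 36 hosts, per month.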
Let's do the math
Production: 13,000
Test/Dev/QA: 12,000
DR: 13,000
On-Prem Staging: 2,000
Total: 40,000
Workload Tiering Chart
Tiers          Application Class         Total Workloads  Tier Detail
Tier-0         Mission Critical          2000             0 Data Loss – Synchronous Replication/Clustering
Prod Hi-Tier   Business Critical         4000             RPO 1hr/RTO 8hr – vSphere Replication/SRM
Prod Low-Tier  Secondary Applications    9000             RPO 24hr/RTO 72hr – Array Based Mirroring
Non-PROD       No Tiering (DEV/TEST/QA)  12000            No BC/DR
Total                                    27000            + 13000 DR Workloads
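The tiering chart can be captured as data so the totals are easy to re-check. This is a minimal sketch; the class and field names are hypothetical, and `None` marks values the slide does not give. The observation that the DR count equals the two replicated production tiers combined is an inference from the numbers, not something the slide states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    app_class: str
    workloads: int
    protection: str
    rpo_hours: Optional[int]  # None where the slide gives no RPO
    rto_hours: Optional[int]  # None where the slide gives no RTO


TIERS = [
    Tier("Tier-0", "Mission Critical", 2000,
         "Synchronous Replication/Clustering", 0, None),
    Tier("Prod Hi-Tier", "Business Critical", 4000,
         "vSphere Replication/SRM", 1, 8),
    Tier("Prod Low-Tier", "Secondary Applications", 9000,
         "Array Based Mirroring", 24, 72),
    Tier("Non-PROD", "DEV/TEST/QA", 12000, "No BC/DR", None, None),
]

DR_WORKLOADS = 13000  # from the slide


def primary_total() -> int:
    """Sum of workloads across all tiers, excluding DR copies."""
    return sum(t.workloads for t in TIERS)


def replicated_prod() -> int:
    """Workloads in the two tiers protected by replication or mirroring."""
    return sum(t.workloads for t in TIERS
               if t.name in ("Prod Hi-Tier", "Prod Low-Tier"))


if __name__ == "__main__":
    print(primary_total(), primary_total() + DR_WORKLOADS, replicated_prod())
```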
Extending your data center into the IBM Bluemix Cloud…
[Diagram: the Legacy Vblock Data Center (Vblock SSO Domain, on-prem) relocates VMs over L2 using vMotion to the Bluemix On-Prem Data Center. A Secure Dedicated Link connects on-prem to the IBM Cloud Data Centers in San Jose and Dallas, which are joined by the IBM Private Network and share the Hybrid Cloud SSO Domain and the Cross-vC NSX Domain. Each site holds virtual machines and storage. Flows shown: vMotion, replicate, backup/recover, failback.]
Logical Workload Breakdown
• OnPrem-vC1: 2000 Mixed Tier
• vC1: 3000 PROD-LowTier, 1000 NON-PROD
• vC2: 2000 PROD Hi-Tier, 3000 PROD-LowTier, 1000 NON-PROD
• vC3: 2000 PROD Hi-Tier, 3000 PROD-LowTier, 1000 NON-PROD
• DR1: 2000 PROD Hi-Tier, 4500 PROD-LowTier, 4500 NON-PROD
• DR2: 2000 PROD Hi-Tier, 4500 PROD-LowTier, 4500 NON-PROD
• PROD-HiTier workloads rely on SRM/vSphere Replication, which imposes a 2000 VM limit per vCenter
• PROD-LowTier will utilize Endurance Storage Mirroring to satisfy RPO/RTO
• NON-PROD has no BC/DR component
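A placement like this can be checked mechanically against the 2000 VM per vCenter limit quoted above. The sketch below encodes the per-vCenter counts from the slide; the dictionary keys and function names are hypothetical, and the limit is taken from the slide rather than from product documentation.

```python
SRM_LIMIT_PER_VCENTER = 2000  # limit as stated on the slide

# Per-vCenter workload counts from the breakdown above.
PLACEMENT = {
    "OnPrem-vC1": {"mixed": 2000},
    "vC1": {"prod_low": 3000, "non_prod": 1000},
    "vC2": {"prod_hi": 2000, "prod_low": 3000, "non_prod": 1000},
    "vC3": {"prod_hi": 2000, "prod_low": 3000, "non_prod": 1000},
    "DR1": {"prod_hi": 2000, "prod_low": 4500, "non_prod": 4500},
    "DR2": {"prod_hi": 2000, "prod_low": 4500, "non_prod": 4500},
}


def over_srm_limit(placement, limit=SRM_LIMIT_PER_VCENTER):
    """Return the vCenters whose SRM-protected (Hi-Tier) count exceeds the limit."""
    return [vc for vc, tiers in placement.items()
            if tiers.get("prod_hi", 0) > limit]


def total_vms(placement):
    """Total VM count across every vCenter and tier."""
    return sum(sum(tiers.values()) for tiers in placement.values())


if __name__ == "__main__":
    print(over_srm_limit(PLACEMENT))
    print(total_vms(PLACEMENT))
```

No vCenter exceeds the limit in this layout, and the counts add up to the 40,000 VM target.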
Migrate Workloads to the Cloud
Migration from the Vblock Environment to the On-Prem Staging Environment will be accomplished by vMotion and svMotion.
Once placed in the Staging Environment, VMs need to be replicated to the Cloud Environment.
As we procure hosts over time, we rebalance workloads to keep asset utilization even for proper capacity planning.
Additionally, Site Recovery Manager can orchestrate the replication and migration into the Cloud Environment.
[Diagram: Production VMs on the on-prem Vblock move by vMotion to the On-Prem Staging environment (2000 VMs), then by vSphere Replication to Staging and Production in the SJC Cloud DC (2000 VMs).]
Moving Workloads Within the Cloud
Workloads can be migrated from Staging On-Prem to the SJC DC.
Workloads can be vMotioned between Cloud DCs (this is how we balance workloads).
Replication is not enabled between Staging and the DAL DC.
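[Diagram: Staging On-Prem (Production, 2000 VMs) connects over the Secure Dedicated Link to the SJC Cloud DC (Staging and Production), which connects over the IBM Private Network to the DAL Cloud DC. The two cloud DCs are each labeled 16,000 VMs.]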
Component Level Design
[Diagram: component-level layout of three sites within the Hybrid Cloud SSO Domain. Recoverable labels follow.]
• On-Prem
– On-Prem Mgmt Cluster (4 Hosts): vCenter Mgmt-On-Prem-01
– vCenter Wrk-On-Prem-01 with NSX Manager Wrk-On-Prem-01 (Primary)
– On-Prem Compute Cluster 1 (18 Hosts) and On-Prem Compute Cluster 2 (18 Hosts)
– SAN Storage
• SJC DC, SRM Protected Site (SJC)
– Management Cluster (4 Hosts): vCenter Mgmt-SJC-01
– vCenter Wrk-SJC-01, Wrk-SJC-02 and Wrk-SJC-03, each with an NSX Secondary
– Compute Clusters
– Endurance Storage
• DAL DC, SRM Recovery Site (DAL)
– Management Cluster (4 Hosts): vCenter Mgmt-DAL-01
– vCenter Wrk-DAL-01 and Wrk-DAL-02, each with an NSX Secondary
– Compute Clusters
– Endurance Storage
• Mirror between the two Endurance Storage systems
• Universal NSX Controllers
• Six "Rep App" instances appear in the diagram
Physical Network to Hybrid Cloud
[Diagram: physical connectivity from the on-premise network to a SoftLayer POD. Recoverable labels follow.]
• ESXi hosts, including a bare metal ESXi host, each with four NICs (eth-0 through eth-3); VLANs trunked to eth-0 and eth-2
• Private VDS: Mgmt, vMotion-FT & Storage, Backup, VXLAN (native)
• Public VDS: External VLAN
• Routing labels: CE (On-Premise), SL Direct Link, NPOP, XCR, BBR, SL Backbone, DAR, MBR, BCR, BCS/BAS
• Boundary and addressing labels: Private VLAN Boundary, SoftLayer POD, SL POD Infrastructure, Other SL PODs, Backend Customer Routed Network, Customer Assigned Subnet, SL Assigned IP Address, Internal SL IPs, SL Global VRF Routed, VLANs, VXLANs
VLAN assignments shown:
VLAN 1140 | ESXi MGMT & vMotion VMK | Portable Subnet 10.255.248.48/26
VLAN 1324 | vSAN VMK VLAN | Portable Subnet 10.255.248.64/26
VLAN 1452 | VXLAN VTEP VMK | Portable Subnet 10.255.248.80/26
VLAN 301 | MGMT Services VMs | 10.255.204.0/24
VLAN 304 | VXLAN VTEP VMK | 10.255.253.0/24
VLAN 240 | vMotion VMK | 10.255.91.0/25
VLAN 303 | ESXi MGMT VMK | 10.255.141.0/25
VLAN 1032 | Backup | 10.0.117.0/24
VLAN 234 | 2nd Customer VLAN (Spare) | 10.255.240.0/24
Multisite Bluemix Networking for Hybrid Cloud
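[Diagram: the On-Prem and Bluemix – SJC sites each have an NSX ESG and a PSC. The sites are linked by a 10G circuit through routers labeled CE, XCR, BBR, DAR, MBR and BCR. Universal Transit VXLAN uplinks connect the ESGs to a Universal Distributed Logical Router (UDLR), which serves three logical switches stretched across both sites:]
Web - VXLAN 900010 - 192.168.10.0/24
App - VXLAN 900020 - 192.168.20.0/24
DB - VXLAN 900030 - 192.168.30.0/24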
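When logical switches are stretched across sites, overlapping subnets or reused VXLAN IDs are easy mistakes to make, so a small pre-deployment check can help. The following is a minimal sketch using only Python's standard `ipaddress` module; the segment values are the Web, App and DB entries from the diagram, and the function name and data layout are hypothetical.

```python
import ipaddress
from itertools import combinations

# name -> (VXLAN ID, subnet), as shown in the diagram
SEGMENTS = {
    "Web": (900010, "192.168.10.0/24"),
    "App": (900020, "192.168.20.0/24"),
    "DB": (900030, "192.168.30.0/24"),
}


def find_conflicts(segments):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    # ip_network raises ValueError for malformed CIDRs or set host bits.
    nets = {name: ipaddress.ip_network(cidr)
            for name, (_, cidr) in segments.items()}
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} and {b} overlap")
    vnis = [vni for vni, _ in segments.values()]
    if len(vnis) != len(set(vnis)):
        problems.append("duplicate VXLAN ID")
    return problems


if __name__ == "__main__":
    print(find_conflicts(SEGMENTS))
```

The same check could be extended to customer-supplied address ranges before they are brought into the cloud environment.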
Logical Network for Bluemix DR
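[Diagram: a Protected site and a Recovery site, each running SRM, share a Universal DLR and three universal logical switches (Web U-LS, App U-LS, DB U-LS). Web, App and DB VMs at both sites sit behind the universal distributed firewall (U-DFW).]
• Protected site: active north-south path. The U-DLR Control VM uses an allow prefix list for the Web, App and DB subnets.
• Recovery site: stand-by north-south path. The U-DLR Control VM uses a deny prefix list for the Web, App and DB subnets.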
Scalable Hybrid Cloud Maturity Model
• Build it: vSphere, NSX, IBM Endurance
• Scale & Protect it: vSphere Replication, SRM
• Automate it: vRealize Automation & Orchestration
Lessons Learned
• Lack of overall vision led to changing solutions on the fly
• Lack of complete requirements led to lost time and productivity in reaching a complete solution
• Customer requirements put substantial constraints on the resulting design. Not always the best approach, but a valid approach nonetheless
• Stable common services cannot be overlooked. DNS, NTP and Certificate Services need to be consistent, reliable and stable
• Careful planning is needed, especially with BYOIP
Just Announced: VMware HCX
• Extend your data center into the cloud
• Ability to migrate VMs of differing versions
• DR to the cloud
For more information, go to: https://cloud.vmware.com/vmware-hcx