Ahmed Abro, Staff Solutions Architect - VMware
Tim Jabaut, Staff Solutions Architect - VMware

LHC2424BE

#VMworld2017 #LHC242BE

200 to 40,000 VMs in 24 Months: Building Highly Scalable SDDC on Hybrid Cloud: Real-World Example

VMworld 2017 Content: Not for publication or distribution

Disclaimer

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

Meet your Speakers

Tim Jabaut

VMware Staff Solution Architect

Currently an Embedded Architect with IBM, working on all of their Repeatable Reference Architecture offerings involving Cloud and VMware.

I currently reside in Raleigh, North Carolina, with my beautiful wife of 18 years and 2 teenage children who keep us extremely busy with football and soccer.

@timjabaut

[email protected]

Ahmed Abro

VMware Staff Solution Architect

Currently embedded with Accenture for the VMware solution stack for private & hybrid cloud.

Author of two books on SDN, and of multiple drafts and research papers for IEEE and IETF.

Married for 10 years, with 2 young ones.

@ahmedabro

[email protected]

Let's set the stage

• What this session is

– Technical walkthrough of a real-world hybrid cloud case study

– High-level design discussion

– Real-world challenges and potential solutions

– Lessons learned

• What it's not

– Sales pitch

– Hybrid cloud training

– Product design

– Hands-on

Please hold all questions until the end.

Overview


Our Customer

Large US Nationwide Health Insurance Company

• Millions of registered policy holders

• Annual revenue in the billions

• Over 100,000 employees

Services

Provides quality health care at a reasonable cost to subscribers. Handles the delivery, financing, and administration of health care services.

Current State

• Aging Vblock infrastructure

• Running out of capacity

• Expensive to refresh and maintain

• Not agile enough

• Hard to scale

Environment: 90% virtualized; 3 regions, 2 US data centers; vSphere, SAN storage, Vblocks

The First Step on the Journey

[Diagram labels: a Legacy vBlock Data Center (On-Prem SSO Domain; virtual machines, storage) connected by L2 bridging to two IBM Cloud Data Centers (Hybrid Cloud SSO Domain; virtual machines, storage) linked over the SL Private Network. Flows shown: vMotion, Replicate, Backup/Recover, Failback.]

Bluemix Cloud Overview

• Streamlines VMware deployments, cutting them from months to minutes

– Automated approach

• Designed and validated in conjunction with VMware experts

• Scalable, and easier to manage using existing VMware tools

[Diagram labels: VMware on IBM Bluemix Cloud stack. Apps on top of Compute Virtualization, Network Virtualization, and Storage Virtualization, over Physical Infrastructure, with Management alongside.]

Conceptual Design


Extending your data center into the IBM Bluemix Cloud…

[Diagram labels: a Legacy vBlock Data Center (On-Prem SSO Domain; virtual machines, storage) connected by L2 bridging to two IBM Cloud Data Centers (Hybrid Cloud SSO Domain; virtual machines, storage) linked over the SL Private Network. Flows shown: vMotion, Replicate, Backup/Recover, Failback.]

Datacenter Locations

• San Jose

• Dallas

IBM Standard Reference Architecture

• IBM Bluemix Cloud uses a VMware-certified hardware BoM, which ensures consistency.

• We use a modular approach so that we can easily calculate capacity and scale.

• A standard building block of at least four hosts in a “collapsed cluster” model provides a fully HA cluster supporting approximately 200 VMs.

• Conservative overcommit ratios

– vCPU:pCPU – 6:1

– vRAM:pRAM – 1.3:1

• Reference VM:

– 2 vCPU

– 8GB RAM

Host specification:

ESXi Host         Dual Intel Xeon E5-2690 v3 Processor, 12 cores
RAM               512GB
Disk Controller   Array Controller: Avago 9361-8i
Boot Disk         Storage: 2 (1TB SATA (OS))
Network           Quad 10G NICs – RSS & TSO Support

200 VMs achieved
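The figure of approximately 200 VMs per building block can be roughly reproduced from the overcommit ratios, reference VM, and host specification on this slide. The sketch below is illustrative only. It assumes one of the four hosts is held in reserve for HA (N+1) and it ignores hypervisor and management overhead; neither detail is stated on the slide. The names `Host` and `block_capacity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    sockets: int = 2            # dual-socket host (from the slide)
    cores_per_socket: int = 12  # Xeon E5-2690 v3, 12 cores per processor
    ram_gb: int = 512           # from the slide


def block_capacity(hosts=4, ha_reserve_hosts=1, host=Host(),
                   cpu_ratio=6.0, ram_ratio=1.3,
                   vm_vcpu=2, vm_ram_gb=8):
    """Return (vm_limit, limit_by_cpu, limit_by_ram) for one building block.

    ha_reserve_hosts=1 is an assumption (N+1); the slide only says "fully HA".
    """
    usable_hosts = hosts - ha_reserve_hosts
    vcpus = usable_hosts * host.sockets * host.cores_per_socket * cpu_ratio
    vram_gb = usable_hosts * host.ram_gb * ram_ratio
    by_cpu = int(vcpus // vm_vcpu)
    by_ram = int(vram_gb // vm_ram_gb)
    return min(by_cpu, by_ram), by_cpu, by_ram


if __name__ == "__main__":
    print(block_capacity())
```

Under these assumptions the block is CPU-bound at 216 reference VMs (249 by RAM), which is in line with the "approximately 200" quoted above.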

That feeling when you achieve your goal, only to have the customer come back with:

“That’s great, now let’s go to 40,000 VMs in 24 months.”

Changing Requirements Lead to a Change in Approach

Deployment Timeline

[Chart: Number of VMs deployed (0 to 45,000) by month (0 to 24). Callouts: 1,700 VMs per month; approx. 9 building blocks (~36 hosts).]
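The callouts on the timeline follow from simple arithmetic. The sketch below assumes an even monthly ramp to 40,000 VMs and reuses the 200-VM, four-host building block from the reference architecture; the function name is hypothetical, and the real deployment curve may not have been linear.

```python
import math


def monthly_build_rate(target_vms=40_000, months=24,
                       vms_per_block=200, hosts_per_block=4):
    """Return (vms_per_month, blocks_per_month, hosts_per_month).

    Assumes a linear ramp; blocks are rounded up to whole building blocks.
    """
    vms_per_month = target_vms / months
    blocks = math.ceil(vms_per_month / vms_per_block)
    return round(vms_per_month), blocks, blocks * hosts_per_block


if __name__ == "__main__":
    print(monthly_build_rate())
```

This gives roughly 1,667 VMs per month, which rounds to the 1,700 on the chart, and 9 building blocks or 36 hosts per month.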

Let's do the math

Production          13,000
Test/Dev/QA         12,000
DR                  13,000
On-Prem Staging      2,000
Total               40,000

Workload Tiering Chart

Tiers          | Application Class        | Total Workloads | Tier Detail
Tier-0         | Mission Critical         | 2000            | 0 Data Loss – Synchronous Replication/Clustering
Prod Hi-Tier   | Business Critical        | 4000            | RPO 1hr/RTO 8hr, vSphere Replication/SRM
Prod Low-Tier  | Secondary Applications   | 9000            | RPO 24hr/RTO 72hr, Array Based Mirroring
Non-PROD       | No Tiering, DEV/TEST/QA  | 12000           | No BC/DR
Total          |                          | 27000           |

+ 13000 DR Workloads
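The totals in the tiering chart can be cross-checked in a few lines. One reading that is consistent with the numbers is that the 13,000 DR workloads are the copies of the two Prod tiers protected by SRM or array mirroring (4,000 + 9,000). That mapping is an inference from the totals, not something the slide states, and the names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    workloads: int
    protection: str
    has_dr_copy: bool  # assumption: inferred from the totals, not stated on the slide


TIERS = [
    Tier("Tier-0", 2000, "Synchronous replication/clustering", False),
    Tier("Prod Hi-Tier", 4000, "vSphere Replication/SRM", True),
    Tier("Prod Low-Tier", 9000, "Array-based mirroring", True),
    Tier("Non-PROD", 12000, "None", False),
]

primary_total = sum(t.workloads for t in TIERS)
dr_total = sum(t.workloads for t in TIERS if t.has_dr_copy)
grand_total = primary_total + dr_total

if __name__ == "__main__":
    print(primary_total, dr_total, grand_total)
```

With the chart's figures this yields 27,000 primary workloads plus 13,000 DR workloads, for the 40,000 target.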

Extending your data center into the IBM Bluemix Cloud…

[Diagram labels: Legacy Vblock Data Center (Vblock SSO Domain, on-prem; virtual machines, storage), with L2, vMotion, and Relocate toward a Bluemix On-Prem Data Center (virtual machines, storage). A Secure Dedicated Link leads to the IBM Cloud Data Center in San Jose, and the IBM Private Network connects San Jose to the IBM Cloud Data Center in Dallas. Hybrid Cloud SSO Domain and Cross-vC NSX Domain span the Bluemix sites. Flows shown: vMotion, Replicate, Backup/Recover, Failback.]

Logical Workload Breakdown

OnPrem-vC1: 2000 Mixed Tier

vC1: 3000 PROD-LowTier, 1000 NON-PROD

vC2: 2000 PROD Hi-Tier, 3000 PROD-LowTier, 1000 NON-PROD

vC3: 2000 PROD Hi-Tier, 3000 PROD-LowTier, 1000 NON-PROD

DR1: 2000 PROD Hi-Tier, 4500 PROD-LowTier, 4500 NON-PROD

DR2: 2000 PROD Hi-Tier, 4500 PROD-LowTier, 4500 NON-PROD

• PROD-HiTier workloads rely on SRM/vSphere Replication. This imposes a 2000 VM limit per vCenter.

• PROD-LowTier will use Endurance Storage Mirroring to satisfy RPO/RTO.

• NON-PROD has no BC/DR component.
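The breakdown above can be checked against both the 40,000 total and the 2000 VM per-vCenter limit quoted for SRM/vSphere Replication. The sketch below simply encodes the per-vCenter numbers from the slide; the dictionary and function names are hypothetical.

```python
SRM_VM_LIMIT_PER_VCENTER = 2000  # limit quoted on the slide

PLACEMENT = {
    "OnPrem-vC1": {"Mixed Tier": 2000},
    "vC1": {"PROD-LowTier": 3000, "NON-PROD": 1000},
    "vC2": {"PROD Hi-Tier": 2000, "PROD-LowTier": 3000, "NON-PROD": 1000},
    "vC3": {"PROD Hi-Tier": 2000, "PROD-LowTier": 3000, "NON-PROD": 1000},
    "DR1": {"PROD Hi-Tier": 2000, "PROD-LowTier": 4500, "NON-PROD": 4500},
    "DR2": {"PROD Hi-Tier": 2000, "PROD-LowTier": 4500, "NON-PROD": 4500},
}


def check_placement(placement, limit=SRM_VM_LIMIT_PER_VCENTER):
    """Return (total_vms, vCenters whose SRM-protected Hi-Tier count exceeds the limit)."""
    total = sum(sum(tiers.values()) for tiers in placement.values())
    over = [vc for vc, tiers in placement.items()
            if tiers.get("PROD Hi-Tier", 0) > limit]
    return total, over


if __name__ == "__main__":
    print(check_placement(PLACEMENT))
```

With the slide's numbers the placements sum to 40,000 VMs and no vCenter carries more than 2000 Hi-Tier VMs, which is why the Hi-Tier workloads are spread 2000 at a time.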

Migrate Workloads to the Cloud

Migration from the Vblock Environment to the On-Prem Staging Environment will be accomplished by vMotion and svMotion.

Once placed in the Staging Environment, VMs need to be replicated to the Cloud Environment.

As we procure hosts over time, we rebalance workloads to keep asset utilization even for proper capacity planning.

Additionally, Site Recovery Manager can orchestrate the replication and migration into the Cloud Environment.

[Diagram labels: Vblock On-Prem (Production), vMotion to Staging On-Prem (2000 VMs), vSphere Replication to SJC Cloud DC (Production and Staging, 2000 VMs).]

Moving Workloads Within the Cloud

Workloads can be migrated from Staging On-Prem to the SJC DC.

Workloads can be vMotioned between Cloud DCs. (This is how we balance workloads.)

Replication is not enabled between Staging and the DAL DC.

[Diagram labels: Staging On-Prem (Production, 2000 VMs), Secure Dedicated Link, SJC Cloud DC (Staging and Production, 16,000 VMs), IBM Private Network, DAL Cloud DC (16,000 VMs).]

Component Level Design

[Diagram labels, all within the Hybrid Cloud SSO Domain:

On-Prem: Mgmt Cluster (4 Hosts) with vCenter Mgmt-On-Prem-01, vCenter Wrk-On-Prem-01, NSX Manager Wrk-On-Prem-01 (Primary), and Universal NSX Controllers; On-Prem Compute Cluster 1 (18 Hosts) and Compute Cluster 2 (18 Hosts); SAN Storage.

SJC DC, SRM Protected Site (SJC): Management Cluster (4 Hosts) with vCenter Mgmt-SJC-01; vCenter Wrk-SJC-01, Wrk-SJC-02, and Wrk-SJC-03, each with an NSX Secondary; Compute Clusters; Endurance Storage.

DAL DC, SRM Recovery Site (DAL): Management Cluster (4 Hosts) with vCenter Mgmt-DAL-01; vCenter Wrk-DAL-01 and Wrk-DAL-02, each with an NSX Secondary; Compute Clusters; Endurance Storage.

Endurance Storage is mirrored between sites (Mirror). Six "Rep App" labels also appear in the diagram.]

Physical Network to Hybrid Cloud

[Diagram: physical connectivity between the on-premises environment and the SoftLayer POD. Recoverable labels follow.]

ESXi host networking (Bare Metal ESXi Host, NICs eth-0 to eth-3):

• VLANs trunked to eth-0 & eth-2

• Private VDS: Mgmt, vMotion-FT & Storage, Backup, VXLAN (native)

• Public VDS: External (External VLAN)

VLANs and subnets, first group:

VLAN 1140 | ESXi MGMT & vMotion VMK | Portable Subnet 10.255.248.48/26
VLAN 1324 | vSAN VMK VLAN | Portable Subnet 10.255.248.64/26
VLAN 1452 | VXLAN VTEP VMK | Portable Subnet 10.255.248.80/26

VLANs and subnets, second group:

VLAN 301 | MGMT Services VMs | 10.255.204.0/24
VLAN 304 | VXLAN VTEP VMK | 10.255.253.0/24
VLAN 240 | vMotion VMK | 10.255.91.0/25
VLAN 303 | ESXi MGMT VMK | 10.255.141.0/25
VLAN 1032 | Backup | 10.0.117.0/24
VLAN 234 | 2nd Customer VLAN (Spare) | 10.255.240.0/24

Network path labels: CE (On-Premise), SL Direct Link, NPOP, XCR, BBR (SL Backbone), DAR, MBR, BCR (SL POD Infrastructure), BCS/BAS, Backend Customer Routed Network, Private VLAN Boundary, SoftLayer POD, Other SL PODs, SL Global VRF Routed, Internal SL IPs, SL Assigned IP Address, Customer Assigned Subnet, MGMT Services, 2nd VLAN, VLANs, VXLANs.
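Since careful IP planning comes up again in the lessons learned, a subnet plan like the one on this slide is worth validating mechanically. The sketch below uses Python's standard `ipaddress` module to flag entries whose address is not on its prefix boundary and any pairs that overlap. The VLAN 301 value is an assumed reading of the slide text, and the `audit` helper is hypothetical.

```python
import ipaddress

SUBNETS = {
    "VLAN 1140": "10.255.248.48/26",
    "VLAN 1324": "10.255.248.64/26",
    "VLAN 1452": "10.255.248.80/26",
    "VLAN 301": "10.255.204.0/24",   # assumed reading of the slide text
    "VLAN 304": "10.255.253.0/24",
    "VLAN 240": "10.255.91.0/25",
    "VLAN 303": "10.255.141.0/25",
    "VLAN 1032": "10.0.117.0/24",
    "VLAN 234": "10.255.240.0/24",
}


def audit(subnets):
    """Return (misaligned, overlaps).

    misaligned: names whose address has host bits set for the given prefix.
    overlaps:   pairs of names whose networks overlap once normalized.
    """
    misaligned, nets = [], {}
    for name, cidr in subnets.items():
        try:
            nets[name] = ipaddress.ip_network(cidr, strict=True)
        except ValueError:
            misaligned.append(name)
            nets[name] = ipaddress.ip_network(cidr, strict=False)
    names = list(nets)
    overlaps = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
                if nets[a].overlaps(nets[b])]
    return misaligned, overlaps


if __name__ == "__main__":
    print(audit(SUBNETS))
```

As transcribed, VLAN 1140 and VLAN 1452 are not aligned to a /26 boundary, and VLAN 1324 and VLAN 1452 overlap once normalized. All three would be aligned and disjoint with a /28 mask, which may be what the slide intended, but the source does not confirm this.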

Multisite Bluemix Networking for Hybrid Cloud

[Diagram labels: On-Prem and Bluemix – SJC sites, each with an NSX ESG and a PSC, joined by a Universal Distributed Logical Router (UDLR) with Universal Transit VXLAN uplinks. Physical path over a 10G circuit: CE, XCR, BBR, DAR, MBR, BCR.]

Logical switches spanning both sites:

Web - VXLAN 900010 - 192.168.10.0/24
App - VXLAN 900020 - 192.168.20.0/24
DB - VXLAN 900030 - 192.168.30.0/24

Logical Network for Bluemix DR

[Diagram labels: Protected and Recovery sites, each with SRM, share a Universal DLR and Universal Logical Switches (Web U-LS, App U-LS, DB U-LS), with U-DFW applied to the Web, App, and DB workloads. North-south traffic is Active N-S at the Protected site and Stand-by N-S at the Recovery site. Each site has a Universal (U-DLR) Control VM:

• Protected: Allow prefix list: Web, App, DB Subnet

• Recovery: Deny prefix list: Web, App, DB Subnet]
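The allow/deny prefix lists on the two U-DLR Control VMs decide which site advertises the Web, App, and DB subnets, and therefore which site carries north-south traffic. The toy model below illustrates that idea only. It is not NSX API code, the class and function names are hypothetical, the subnets are taken from the multisite networking slide, and how the prefix lists are flipped during a failover (manually or by an SRM recovery plan) is an assumption the slides do not spell out.

```python
from dataclasses import dataclass

# Web, App, DB subnets from the multisite networking slide
PREFIXES = ("192.168.10.0/24", "192.168.20.0/24", "192.168.30.0/24")


@dataclass
class ControlVM:
    site: str
    action: str  # "allow" or "deny" for the Web/App/DB prefixes

    def advertised(self, prefixes):
        """Prefixes this site would advertise upstream."""
        return list(prefixes) if self.action == "allow" else []


def active_sites(control_vms, prefixes=PREFIXES):
    """Sites that currently attract north-south traffic for the prefixes."""
    return [vm.site for vm in control_vms if vm.advertised(prefixes)]


def failover(protected, recovery):
    """Swap the prefix-list actions so the recovery site starts advertising."""
    protected.action, recovery.action = "deny", "allow"


if __name__ == "__main__":
    protected = ControlVM("Protected", "allow")
    recovery = ControlVM("Recovery", "deny")
    print(active_sites([protected, recovery]))
    failover(protected, recovery)
    print(active_sites([protected, recovery]))
```

Keeping exactly one site in the "allow" state at a time is what keeps ingress traffic from arriving at the stand-by site.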

Execution


Scalable Hybrid Cloud Maturity Model

• Build it: vSphere, NSX, IBM Endurance

• Scale & Protect it: vSphere Replication, SRM

• Automate it: vRealize Automation & Orchestration

Lessons Learned

• Lack of an overall vision led to changing solutions on the fly.

• Lack of complete requirements led to lost time and productivity on the way to a complete solution.

• Customer requirements put substantial constraints on the resulting design. Not always the best approach, but a valid approach nonetheless.

• Stable common services cannot be overlooked. DNS, NTP, and Certificate Services need to be consistent, reliable, and stable.

• Careful planning is needed, especially with BYOIP.

Just Announced VMware HCX

• Extend your datacenter into the cloud

• Ability to migrate VMs of differing versions

• DR to the cloud

For more information, go to: https://cloud.vmware.com/vmware-hcx