vmware solutions

© 2014 VMware Inc. All rights reserved.

VMware Solutions
Mohamed El Shorbagy
Cloud Consultant @ eSky IT


Agenda

1 eSky IT Profile

2 VMware Vision

3 VMware Solutions

4 VMware vCloud Suite

5 VMware vCenter Operations Manager

6 VMware Health Check Service

History

CONFIDENTIAL 4

History Partners References Services

POC / Demo

Consultancy / Design

Deployment / Training / Support

Site Assessment


The VMware Vision

Empower people and organizations by radically simplifying IT through virtualization software


The same principles that transformed a single layer of the data center…

and delivered unprecedented value for customers…

What if…

Abstract. Pool. Automate.

were applied to the entire data center?


Software-Defined Data Center

All infrastructure is virtualized and delivered as a service, and the control of this data center is entirely automated by software.

Abstract. Pool. Automate.


Data Centers Are Silos

Windows, Linux, Databases, Mission Critical, HPC, Big Data


Abstract Pool Automate


HPC Big Data

MGMT

Network/Security

Storage/Availability

Compute


Software-Defined Data Center

Virtual Data Center (×5)

Software-Defined Data Center Services

HPC, Big Data

Abstract. Pool. Automate.


A New Standard for Agility

Storage/Availability, Servers, Networking, Security, Management/Monitoring

• 2008: Weeks
• 2012: Days/Hours
• SDDC: Minutes/Seconds

Software-Defined Data Center Services

Virtual Data Center


Real Business Results: Innovation Velocity


Two Paths to IT as a Service

Software-Defined Data Center

Virtual

Cloud IT as a Service

Managed Virtualization


VMware Solutions

• Data Center Virtualization and Cloud Infrastructure
• End User Computing
• Infrastructure as a Service
• Personal Desktop
• Network & Security
• Management

VMware vSphere Solution

VMware vSphere

• Virtualization
– VMware vSphere Hypervisor abstracts traditional physical machine resources and runs workloads as virtual machines
– Each virtual machine runs a guest operating system and applications


Cloud Computing

• IT as a Service (ITaaS)
– Abstracts complexity in the enterprise data center
– Achieves economies of scale
– Renews focus on application services
  • Availability
  • Security
  • Scalability

Enterprise Cloud

Cloud OS

Management


VMware vCloud Solution


Automating provisioning reduces IT labor requirements


vCloud Architecture

• VMware vSphere
– vCenter Server and vCenter database
– ESX/ESXi hosts, each running a vCloud Agent
– Datastores
– VMware vSphere® Web Client™

• VMware vCloud Director
– vCloud Director cells behind a load balancer
– vCloud Director database
– NFS server
– vCloud Director Web Console (end users and administrators)
– VMware vCloud® API

• vCenter Chargeback
– vCenter Chargeback server
– vCenter Chargeback database
– vCenter Chargeback web interface
– Data Collectors

• vCNS (vCloud Networking and Security) and vCNS Virtual Appliances

• vCloud Connector
– vCloud Connector Virtual Appliance
– vCC plug-in

• LDAP


Admin & User UIs Built-in

VMware vCenter Operations Manager


vSphere has transformed how companies deploy and use IT

Agility. Efficiency. Resiliency.

…but new customer challenges arise

• How much time before my current capacity runs out?
• Which virtual machines are over-provisioned?
• How can I identify emerging performance issues before they impact the business?


Virtualize Smarter with Insight to Workload Capacity and Health

vSphere vCenter Server

• Capacity planning – know how many days before capacity runs out so IT can continue to be responsive

• Optimize efficiency – know which virtual machines might be overprovisioned

• Improve performance - faster root cause identification of emerging issues

• Proven virtualization platform – provide availability for your business applications

VMware vSphere
The proven compute virtualization platform

vSphere with Operations Management

• World’s leading virtualization platform

• Insight to workload capacity and health


Gaining Visibility into Your Workload Capacity and Health

Problem / Maintenance

Ensure and Restore Service Levels
• Detect: slow performance
• Isolate: identify source
• Remediate: corrective action

Optimize for Efficiency and Cost
• Analyze: current utilization
• Forecast: future needs
• Optimize: reclaim capacity

Comprehensive visibility


vCOPs is built to complement vCenter

Is it healthy? = Health (immediate problems)
• Workload
• Anomalies
• Faults

Is it enough? = Risk (future problems)
• Time remaining
• Capacity remaining
• Stress

Is it optimised? = Efficiency (opportunities to optimize)
• What can we reclaim?
• Density, the key ratio

Daily update at midnight!


Bird's-eye view

This is a small environment:
• 1 vCenter
• 1 Datacenter
• 2 clusters
• 4 hosts
• 9 VMs (including powered-off VMs)
• 2 datastores


Visibility across vCenters


Ensuring and Restoring Service Levels



Detect: Find the Bottlenecks

[Cycle: Detect, Isolate, Remediate]


Remediate: Intelligent Tools to Resolve Problems


Recommendations on how to fix issues


Optimizing Your Capacity Efficiency



Analyze: Monitor and Plan Capacity Utilization

[Cycle: Analyze, Forecast, Optimize]

Let’s look at capacity shortfalls

Very low on capacity


Forecast: “What-If” Analysis

• Current capacity cross-over point
• Actual VMs deployed
• VM count capacity
• Capacity state today
• New capacity shortfall if I add 10 new VMs


Optimize: View Opportunities to Optimize

Let’s look at powered off, idle and oversized VMs

Reclaimable capacity


Badges – Health

Answers complex questions like:
• How is the entire virtual data center doing?
• For every cluster, host, datastore, what’s their health?

Health is the current operational state
• It represents what is wrong now and should be addressed within 1 day. Thus Health needs to be scored such that if it’s red, then it really needs attention.

Weather Map
• Simple way to check that the entire farm is healthy
• Shows health of all parent and child objects
• Each square can be a VM, ESX host, datastore, cluster, datacenter, or vCenter

Value Explanation

75 – 100 Normal behaviour

50 – 75 The object is experiencing some problems.

25 – 50 The object might have serious problems. Check, and take action as soon as possible

0 – 25 The object is either not functioning properly or will stop functioning soon
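
As a rough illustration of how the Health table could be read programmatically, here is a minimal Python sketch. The function name `health_band` and the handling of shared boundaries (75, 50, 25) are assumptions; the slide does not say which band a boundary value belongs to.

```python
def health_band(score: float) -> str:
    """Map a 0-100 Health score to the explanation in the Health table."""
    if not 0 <= score <= 100:
        raise ValueError("Health score must be between 0 and 100")
    # Assumption: a score on a shared boundary (75, 50, 25) counts as the healthier band.
    if score >= 75:
        return "Normal behaviour"
    if score >= 50:
        return "Some problems"
    if score >= 25:
        return "Possibly serious problems; act as soon as possible"
    return "Not functioning properly or will stop functioning soon"


for s in (92, 60, 10):
    print(s, "->", health_band(s))
```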


Badges – Workload

Answers complex questions like:
• For every object, how is Demand vs Supply?
• For every single VM, is it CPU/Memory/Disk/Network bound?
• Is any VM not getting what it is entitled to or requires?
• What’s the normal workload range for every object in our vDC?

Workload is not utilisation or usage
• More accurate than utilisation, as it takes more factors into account than just utilisation

Workload = (Demand/Entitlement)
• Entitlement is dynamic. Affected by shares, limit, etc.
• Demand ≠ Usage
• Usage may mean passive usage (RAM page is there but no write/read at all)
• Score is Max(CPU, RAM, Disk IO, Net IO)

Value Explanation

0 – 80 Workload is not high.

80 – 90 The object is experiencing some high resource workloads.

90 – 95 Workload on the object is approaching its capacity in ≥1 areas.

>95 Workload on the object is at or over its capacity in ≥1 areas.
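
The Workload formula can be sketched in a few lines of Python. This is only an illustration of Workload = Demand/Entitlement with the score taken as the maximum across the four resources; the function name, the dictionary keys and the sample numbers are hypothetical, and the product's real calculation may weigh additional factors.

```python
RESOURCES = ("cpu", "ram", "disk_io", "net_io")


def workload_score(demand: dict, entitlement: dict) -> float:
    """Return max over resources of demand/entitlement, as a percentage."""
    per_resource = {r: 100.0 * demand[r] / entitlement[r] for r in RESOURCES}
    # The most constrained resource drives the badge score.
    return max(per_resource.values())


# Hypothetical sample values (units only need to match per resource).
demand = {"cpu": 1800, "ram": 6, "disk_io": 200, "net_io": 50}
entitlement = {"cpu": 2000, "ram": 8, "disk_io": 1000, "net_io": 1000}

score = workload_score(demand, entitlement)
print(f"Workload score: {score:.1f}")
```

In this sample the CPU is the bound resource, so it alone determines the score even though memory, disk and network have headroom.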


Badges – Anomalies

Answers complex questions like:
• Is our vDC doing as usual? Are there any unexpected changes (as we have a dynamic environment)?
• Which VMs, ESX hosts, clusters, datastores etc. are behaving abnormally?
• … and exactly which counters are the culprits?

Identifying metric abnormalities
• It needs to learn dynamic ranges of “Normal” for each metric, so give it more than 3 cycles per metric
• A month-end job means it needs 3 months
• Normal range changes after configuration or application changes

Anomalies score
• High number of anomalies:
  • Usually an indication of a problem
  • Demand change
  • Application team changed code/app
• KPI (Key Performance Indicator) metrics impact the anomalies score more than non-KPI metrics

Value Explanation

0 – 50 Normal Anomaly range

50 – 75 The score exceeds the normal range.

75 – 90 The score is very high.

> 90 Most of the metrics are beyond their thresholds. This object might not be working properly or will stop working soon.


Badges – Faults

Answers complex questions like:
• What faults are we experiencing in our vDC?
• For every object, what faults does it have?

Specific knowledge of vCenter events
• Which events affect Availability and Performance of which object?
• Pulled from active vCenter events
• Example:
  • Loss of redundancy in NICs or HBAs
  • Memory checksum errors
  • HA failover problems
• Each fault has a default score
• Highest individual Fault Score drives the Fault object score

Best Practices
• Do not change Fault Threshold
• Use Alerts View to manage Faults. You can filter it to show just Faults.

Value Explanation

0 – 25 No fault is registered on the object

25 – 50 Faults of low importance have occurred on the object.

50 – 75 Faults of high importance have occurred on the object.

> 75 Faults of critical importance have occurred on the object
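
The rule that the highest individual fault score drives the object's Fault score can be sketched as follows. The fault names come from the examples on the slide, but the numeric scores assigned to them here are made up for illustration; the actual default scores are not given in this deck.

```python
def object_fault_score(fault_scores) -> int:
    """The object's Fault score is the highest score among its active faults."""
    return max(fault_scores, default=0)  # no active faults -> 0


# Hypothetical scores for two active faults on one host.
active_faults = {
    "Loss of redundancy in NICs": 25,
    "HA failover problem": 75,
}

print(object_fault_score(active_faults.values()))
```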


Badges – Risk

Answers complex questions like:
• Do we have risk from performance or capacity in our vDC? If yes, where are they and how serious?
• Which objects are at risk? What is the specific risk?

Risk Score takes into account
• Time Remaining
• Capacity Remaining
• Stress

Risk is an early warning system
• Identifies potential problems that could eventually hurt performance
• The Risk Chart shows the Risk score over the last 7 days, giving a view of the trend

Value Explanation

0 – 50 No problems are expected in the future.

50 – 75 There is a low chance of future problems or a potential problem might occur in the far future.

75 – 100 There is a chance of a more serious problem or a problem might occur in the medium-term future.

100 The chances of a serious future problem are high or a problem might occur in the near future


Badges – Time remaining

Answers complex questions like:
• How much time do we have before we need to buy more server, storage, or network capacity, before performance starts to degrade or we run out of capacity?
• For every cluster, VM, datastore, how much time do we have?

Measures time remaining before each resource type reaches its capacity
• CPU
• Memory
• Disk (IOPS & Space)
• Network I/O

Early warning of upcoming provisioning needs
• Based on Score Provisioning (SP) buffer. Default value is 30 days.
• Set in “Capacity & Time Remaining” section

Value Time remaining

50 – 100 > 2x SP Buffer (60 days)

25 – 50 < 2x SP Buffer

<25 Near SP Buffer

0 < SP buffer (30 days)


Badges – Capacity remaining

Answers complex questions like:
• How many more VMs can we place without impacting performance or using up capacity?
• For every cluster, VM, datastore, which components (CPU, RAM, Disk, Network) would run out first?

Early warning system
• A low score of 1 means you still have >30 days.
• Measures how many more VMs can be placed on the object

Percentage of Total VM “Slots” Remaining
• Based on the average size of the VM on the object (e.g. VM profile)
• Each object has its own VM profile size: Host, Cluster, Datacenter, etc.

From the table, notice the value is not linear
• It is also not the same as the Time Remaining threshold.
• A value of 30 means >120 days for capacity but around 40 days for time.

Value Capacity remaining

>10 >120 days

5 – 10 60 – 120 days

2 – 5 30 – 60 days

1 <30 days


Capacity remaining calculation

Determine capacity constraint resources

Deployed or Powered On VMs
• Powered off VMs only use disk space resources
• Powered on VMs use all 4 resources

Calculation example:
• The limit is 40 more VMs
• We have 9 deployed VMs
• 40/(40+9) ≈ 81.6%

You can drill down to see details
• You can check all 9 components as shown on right
• This helps to answer the question of which components have how many days or VMs left
• Summary = min (all 9 components)
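
The calculation example can be reproduced with a short Python sketch. The component names in `per_component` are placeholders (the slide mentions 9 components but does not list them here), and the function names are illustrative rather than anything from the product.

```python
def summary_vms_remaining(per_component: dict) -> int:
    """Summary = min over all components: the tightest resource sets the limit."""
    return min(per_component.values())


def capacity_remaining_pct(vms_remaining: int, vms_deployed: int) -> float:
    """Remaining VM slots as a percentage of total slots (remaining + deployed)."""
    total = vms_remaining + vms_deployed
    return 100.0 * vms_remaining / total if total else 0.0


# Hypothetical per-component headroom, expressed in "more VMs that fit".
per_component = {"cpu_demand": 55, "memory_demand": 40, "disk_space": 120}

limit = summary_vms_remaining(per_component)   # the most constrained component
pct = capacity_remaining_pct(limit, vms_deployed=9)
print(f"{limit} more VMs fit; capacity remaining = {pct:.1f}%")
```

With a limit of 40 more VMs and 9 deployed VMs, this gives 40/(40+9), matching the slide's example.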


Badges – Stress

Answers complex questions like:
• In our vDC, do we have stress points or periods? How bad is it?
• For every cluster, VM, datastore, which ones are experiencing stress and how bad is it?

Measures long-term or chronic workload (6 weeks)
• Chart shows a weekly breakdown of Stress for each day/hour, averaged over the last 6 weeks
• Workloads > 70% = “Stressed”
• Threshold is configurable

Value Explanation

0 – 1 Normal score. No action needed

1 – 5 Some of the object resources are not enough to meet the demands.

5 – 30 The object is experiencing regular resource shortage.

>30 Most of the resources on the object are constantly insufficient. The object might stop functioning properly.


Stress Calculation

Stress Score is a % and is based on the area of Workload above the “Stress Line” threshold compared to the Total Capacity of the object
• Stress Score = (Stress area / Stress Zone) * 100
• But max value can be > 100% as the workload can be > 100.

Example
• Stress Line is 70% Workload
• 12% of the area is above the 70% threshold
• Stress Score is 12

[Chart: Workload Line on a 0–100 axis, Stress Line at 70, Stress Zone above the line, 12% marked]
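
A minimal Python sketch of the Stress Score formula follows. The slide does not define the Stress Zone precisely, so this sketch assumes it is the band between the stress line and 100% workload across the whole period; that reading is consistent with the remark that the score can exceed 100 when workload exceeds 100, but it remains an assumption. The equally spaced samples and the function name are also illustrative.

```python
def stress_score(workloads, stress_line: float = 70.0) -> float:
    """Stress Score = (stress area / stress zone) * 100 over equally spaced samples."""
    if not workloads:
        return 0.0
    # Area of the workload curve that lies above the stress line.
    stress_area = sum(max(w - stress_line, 0.0) for w in workloads)
    # Assumption: the stress zone is the band from the stress line up to 100%.
    stress_zone = (100.0 - stress_line) * len(workloads)
    return 100.0 * stress_area / stress_zone


samples = [60, 70, 80, 100]          # hypothetical hourly workload percentages
print(f"{stress_score(samples):.1f}")
print(f"{stress_score([130, 130]):.1f}")  # workload above 100 pushes the score past 100
```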


Badges – Efficiency

Answers complex questions like:
• Are there optimization opportunities in our vDC?
• How well do we do in terms of VM provisioning? Do we get them right?

Efficiency Score factors
• Reclaimable waste
• Density ratio

Graph depicts VMs by percent
• Optimal – Optimally provisioned VMs
• Waste – Over-provisioned VMs
• Stress – Under-provisioned VMs
  • Not used in Efficiency calculation (see Risk)

Value Explanation

>25 The efficiency is good. The resource use on the selected object is optimal.

10 – 25 The efficiency is good, but can be improved. Some resources are not fully used.

0 – 10 The resources on the selected object are not used in the most optimal way.

0 The efficiency is bad. Many resources are wasted.


Badges – Reclaimable waste

Answers complex questions like:
• Have we over-provisioned the VMs in terms of CPU, RAM and Disk? If yes, what’s the degree of over-provisioning?
• For every cluster, VM, datastore, what can we reclaim?

It identifies the amount of reclaimable resources
• CPU
• Memory
• Disk

Reclaimable Waste = Reclaimable Capacity / Deployed Capacity
• Waste Score = Max(CPU Waste Score, RAM Waste Score, Disk Space Waste Score)
• Disk calculation can also include old snapshots and templates

Value Explanation

0 – 50 No resources are wasted on the selected object.

50 – 75 Some resources can be used better.

75 – 100 Many resources are underused

100 Most of the resources on the selected object are wasted.
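
The Reclaimable Waste formula can be illustrated with a small Python sketch. It applies Reclaimable Capacity / Deployed Capacity per resource and takes the maximum, as stated on the slide; the function name, dictionary keys and sample quantities are hypothetical.

```python
def waste_score(reclaimable: dict, deployed: dict) -> float:
    """Waste Score = max over CPU, RAM and disk of reclaimable/deployed, in percent."""
    ratios = [
        100.0 * reclaimable[r] / deployed[r]
        for r in ("cpu", "ram", "disk")
        if deployed[r] > 0  # skip resources with nothing deployed
    ]
    return max(ratios, default=0.0)


# Hypothetical cluster: vCPUs, GB of RAM, GB of disk space.
reclaimable = {"cpu": 4, "ram": 32, "disk": 500}
deployed = {"cpu": 40, "ram": 128, "disk": 4000}

print(f"Waste score: {waste_score(reclaimable, deployed):.1f}")
```

Here memory is the most over-provisioned resource, so it sets the score.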


Badges – Density

Answers complex questions like:
• How high can we push our consolidation ratio before we experience performance problems?
  • Now that’s a million dollar question!
• For every datacenter, cluster, ESXi host, what are our key ratios and how much headroom do we have?

Contrasts Actual vs Ideal Density
• Identify optimal resource deployment before contention occurs
• Ideal is based on demand, not simple configuration.
• High Density is good. 100 is not too high.

Value Explanation

>25 Good consolidation

10 – 25 Some resources are not fully consolidated

0 – 10 The consolidation for many resources is low

0 The resource consolidation is extremely low.


Using badges together

Workload High & Anomalies Low & Stress High: Add resources
• Workload – Object is running hot. Potentially starving for resources
• Anomalies – Normal behavior for this timeframe
• Stress – Object is often running under high workload.

Workload High & Anomalies Low & Stress Low: Not likely a big problem… a cyclical workload spike?
• Workload – Object is running hot. Potentially starving for resources
• Anomalies – Normal behavior for this timeframe
• Stress – Object usually has enough resources

Workload High & Anomalies High: Something is amiss! Immediate attention.
• Workload – Object is running hot. Potentially starving for resources
• Anomalies – Abnormal behavior for this timeframe

If there are Alerts and Faults too, then it is a sign of a major issue
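
The three badge combinations on this slide amount to a small decision table, which can be sketched in Python as below. The function name and the boolean inputs are assumptions made for illustration; how "high" is determined for each badge would come from the thresholds on the earlier badge slides, and combinations the slide does not cover return a placeholder.

```python
def interpret_badges(workload_high: bool, anomalies_high: bool, stress_high: bool) -> str:
    """Combine Workload, Anomalies and Stress the way the slide describes."""
    if not workload_high:
        return "Not covered on this slide"
    if anomalies_high:
        # Running hot AND behaving abnormally for this timeframe.
        return "Something is amiss: immediate attention"
    if stress_high:
        # Running hot, normal for this timeframe, and chronically loaded.
        return "Add resources"
    # Running hot, but normal behaviour and usually enough resources.
    return "Not likely a big problem (possibly a cyclical workload spike)"


print(interpret_badges(True, False, True))
print(interpret_badges(True, False, False))
print(interpret_badges(True, True, False))
```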


Quick Comparison: VMware vs Point Solution Competitors

Virtual Environment
• VMware: Best-of-breed execution of the software-defined data center
• Point competitors: ✖ Narrow focus, limited expandability

Integrated Performance and Capacity
• VMware: Performance, Capacity, and vSphere Health Models
• Point competitors: Limited to narrow use cases, incomplete visibility

Automated Operations
• VMware: Accurate root cause through behavioral analytics, dynamic thresholds, smart alerts
• Point competitors: ✖ Leverages only a limited collection of (often misinterpreted) memory & storage metrics

VMware vSphere® Health Check Service


Assessment and Health Check Report

Standardized assessment

• Virtual datacenter

• VMware ESX®/VMware ESXi™ hosts

• VMware vCenter™ Server and plug-ins

• Networking

• Storage

• Virtual machines

VMware vSphere Health Check Report

• Recommended action items

• Justification for recommendations

• Checklist of assessment performed

• Audited inventory list

• What is the optimal configuration and usage?
• How are you doing?
• What should you be doing?
• What changes should be made?


What Does Your Architecture Look Like?

• vCenter Server with vCenter Database, optionally joined to another vCenter Server (with its own vCenter Database) through vCenter Linked Mode
• “Datacenter” containing a “Cluster” of ESX/ESXi hosts, with datastores
• vCenter Server components: vCenter Orchestrator, vCenter Converter, Guided Consolidation, Update Manager (with Update Manager Database)
• vSphere Client with vCenter Converter plug-in and Update Manager plug-in
• vSphere Web Access (Browser)*
• vSphere CLI
• vSphere Management Assistant (vMA)
• vSphere PowerCLI

*ESX only (not ESXi)


Discuss Technical component specifications, configuration, and usage

• Compute resources

• Networking

• Storage

• Virtual datacenter

• Virtual machines

Topics

• Availability

• Manageability

• Performance

• Recoverability

• Security


VMware Infrastructure / vSphere Topology and Access

Have information available for ESX/ESXi and vCenter

• ESX/ESXi hosts
  • IP address and host name
  • Root login and password
• vCenter Server
  • IP address and host name
  • vCenter administrator login and password (or account with vCenter Server Read-Only+License role)


Follow-Up Interviews and Discussions

Identify key people and schedule follow-up interviews and discussions

• Technical architects

• Administrators

• Operations

• Virtual machine administrators

• Security

• Storage

• Networking


To Be Delivered – VMware vSphere Health Check Report

Identify report recipients and schedule a conference call for review

VMware vSphere Health Check Report

• Recommended action items

• Justification for recommendations

• Checklist of assessment performed

• Audited inventory list


Recommendations

Host: Avoid installing additional agents in the service console

Host: For large systems and existing systems with additional agents in the service console, allocate the maximum size for service console memory (800MB) and swap size (1600MB)

Host: Automate the ESX installation and configuration process using a combination of kickstart scripts and host profiles

Host: Avoid logging in to the ESX service console. Manage existing ESX hosts like you would VMware vSphere ESXi™, using vCenter Server and VMware vSphere Command-Line Interface (vCLI), VMware vSphere Management Assistant (vMA), or VMware vSphere PowerCLI™


Recommendations

Network: Set 1Gbps physical adaptors to autonegotiation for optimum performance

Network: Change the default port group security settings ForgedTransmits and MACAddressChange to Reject

Network: Avoid mixing NICs with different speeds and duplex settings on the same uplink for a port group/dvportgroup

Storage: Separate the space allocations on shared datastores for templates and media/ISOs from virtual machines


Recommendations

Virtual Machines: Set the memory reservation value for Java-based (JVM) virtual machines to the OS required memory plus the JVM heap size

Virtual Datacenter: Use vCenter Server roles, groups, and permissions to provide appropriate access and authorization for virtual infrastructure administration. Avoid using Windows built-in groups (Administrators)

Virtual Machines: Use as few vCPUs as possible. Do not use virtual SMP if the application is single threaded and will not benefit from additional vCPUs

Virtual Datacenter: Set up a redundant service console port group to use a separate vmnic on a separate subnet for improved HA redundancy
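
For the JVM memory reservation recommendation in the list above, the arithmetic is simply the OS required memory plus the JVM heap size. The Python sketch below only illustrates that sum; the function name and the sample sizes are hypothetical, and real sizing should use the figures measured for the specific guest OS and application.

```python
def jvm_vm_memory_reservation_mb(os_required_mb: int, jvm_heap_mb: int) -> int:
    """Memory reservation for a Java-based VM: OS required memory + JVM heap size."""
    return os_required_mb + jvm_heap_mb


# Hypothetical VM: 1 GB needed by the guest OS, 4 GB maximum JVM heap.
reservation = jvm_vm_memory_reservation_mb(os_required_mb=1024, jvm_heap_mb=4096)
print(f"Set memory reservation to {reservation} MB")
```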


Questions

Contacts

Mohamed El Shorbagy
– Cloud Consultant

– [email protected]

Thank you for your time!
