© 2014 VMware Inc. All rights reserved.
VMware Solutions
Mohamed El Shorbagy
Cloud Consultant @ eSky IT
Agenda
1 eSky IT Profile
2 VMware Vision
3 VMware Solutions
4 VMware vCloud Suite
5 VMware vCenter Operations Manager
6 VMware Health Check Service
History
CONFIDENTIAL 4
History Partners References Services
POC, Demo
Consultancy, Design
Deployment, Training, Support
Site Assessment
The VMware Vision
Empower people and organizations by radically simplifying IT through virtualization software
The same principles that transformed a single layer of the data center…
and delivered unprecedented value for customers…
What if…
Abstract. Pool. Automate.
were applied to the entire data center?
Software-Defined Data Center
All infrastructure is virtualized and delivered as a service, and the control of this data center is entirely automated by software.
Abstract. Pool. Automate.
Data Centers Are Silos
Windows, Linux, Databases, Mission Critical, HPC, Big Data
Abstract. Pool. Automate.
Windows, Linux, Databases, Mission Critical, HPC, Big Data
Management
Network/Security
Storage/Availability
Compute
Software-Defined Data Center
Virtual data centers running Windows, Linux, Databases, Mission Critical, HPC, and Big Data workloads
Software-Defined Data Center Services
Abstract. Pool. Automate.
A New Standard for Agility
Storage/Availability, Servers, Networking, Security, Management/Monitoring
2008: Weeks
2012: Days/Hours
SDDC: Minutes/Seconds
Virtual Data Center
Software-Defined Data Center Services
Real Business Results: Innovation Velocity
Two Paths to IT as a Service
Software-Defined Data Center
Virtual
Cloud IT as a Service
Managed Virtualization
Data Center Virtualization and Cloud
Infrastructure
VMware Solutions
End User Computing
Infrastructure as a Service
Personal Desktop
Network & Security
Management
VMware vSphere Solution
VMware vSphere
• Virtualization
– VMware vSphere Hypervisor abstracts traditional physical machine resources and runs workloads as virtual machines
– Each virtual machine runs a guest operating system and applications
Cloud Computing
• IT as a Service (ITaaS)
– Abstracts complexity in the enterprise data center
– Achieves economies of scale
– Renews focus on application services
• Availability
• Security
• Scalability
Enterprise Cloud
Cloud OS
Management
VMware vCloud Solution
Automating provisioning reduces IT labor requirements
vCloud Architecture
VMware vSphere: vCenter Server, ESX/ESXi hosts (each with a vCloud Agent), datastores, vCenter database, LDAP, VMware vSphere® Web Client™
vCenter Chargeback: vCenter Chargeback server, vCenter Chargeback database, vCenter Chargeback web interface, data collectors
VMware vCloud Director: vCloud Director cells behind a load balancer, vCloud Director database, NFS server, vCloud Director Web Console for end users and administrators, VMware vCloud® API
vCNS: vCloud Networking and Security and vCNS virtual appliances
vCloud Connector: vCloud Connector Virtual Appliance, vCC plug-in
Admin & User UIs Built-in
VMware vCenter Operations Manager
vSphere has transformed how companies deploy and use IT
Agility. Efficiency. Resiliency.
• How much time before my current capacity runs out?
• Which virtual machines are over-provisioned?
• How can I identify emerging performance issues before they impact the business?
…but new customer challenges arise
Virtualize Smarter with Insight into Workload Capacity and Health
vSphere vCenter Server
• Capacity planning – know how many days remain before capacity runs out so IT can continue to be responsive
• Optimize efficiency – know which virtual machines might be overprovisioned
• Improve performance – identify the root cause of emerging issues faster
• Proven virtualization platform – provide availability for your business applications
VMware vSphere
The proven compute virtualization platform
vSphere with Operations Management
• World’s leading virtualization platform
• Insight into workload capacity and health
Gaining Visibility into Your Workload Capacity and Health
Ensure and Restore Service Levels: Detect, Isolate, Remediate (problems, maintenance, slow performance: identify the source and take corrective action)
Optimize for Efficiency and Cost: Analyze, Forecast, Optimize (current utilization, future needs, reclaim capacity)
Comprehensive visibility
vCOPs is built to complement vCenter
Is it healthy = Health
• Workload
• Anomalies
• Faults
Is it enough = Risk
• Time remaining
• Capacity remaining
• Stress period
Is it optimised = Efficiency
• What can we reclaim?
• Density, key ratios
Daily update at midnight!
Immediate Problems
Future Problems
Opportunities to Optimize
Bird's-eye view
This is a small environment:
1 vCenter
1 datacenter
2 clusters
4 hosts
9 VMs (including powered-off VMs)
2 datastores
Visibility across vCenters
Ensuring and Restoring Service Levels
Detect: Find the Bottlenecks
Remediate: Intelligent Tools to Resolve Problems
Recommendations on how to fix issues
Optimizing Your Capacity Efficiency
Analyze: Monitor and Plan Capacity Utilization
Let’s look at capacity shortfalls
Very low on capacity
Forecast: “What-If” Analysis
Current capacity cross-over point
Actual VMs deployed
VM count capacity
Capacity state today
New capacity shortfall if I add 10 new VMs
Optimize: View Opportunities to Optimize
Let’s look at powered off, idle and oversized VMs
Reclaimable capacity
Badges – Health
Answers complex questions like:
• How is the entire virtual data center doing?
• For every cluster, host, and datastore, what is its health?
Health is the current operational state
• It represents what is wrong now and should be addressed within 1 day. Health therefore needs to be scored so that a red score really does need attention.
Weather Map
• A simple way to check that the entire farm is healthy
• Shows the health of all parent and child objects
• Each square can be a VM, ESX host, datastore, cluster, datacenter, or vCenter
Value Explanation
75 – 100 Normal behaviour
50 – 75 The object experiences some problems.
25 – 50 The object might have serious problems. Check, and take action as soon as possible.
0 – 25 The object is either not functioning properly or will stop functioning soon.
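To make the Health table concrete, here is a minimal sketch that maps a score to its band. It is not vC Ops code; because adjacent ranges in the table share their endpoints (e.g. 75), it assumes each band includes its lower bound.

```python
# Minimal sketch (not vC Ops code): map a Health badge score to the bands in
# the table above. Assumption: adjacent ranges share endpoints, so each band
# is treated as including its lower bound (a score of exactly 75 is Normal).

HEALTH_BANDS = [
    (75, "Normal behaviour"),
    (50, "The object experiences some problems"),
    (25, "The object might have serious problems; act as soon as possible"),
    (0, "The object is not functioning properly or will stop functioning soon"),
]


def health_band(score: float) -> str:
    """Return the table explanation for a Health score from 0 to 100."""
    if not 0 <= score <= 100:
        raise ValueError("Health score must be between 0 and 100")
    for lower_bound, explanation in HEALTH_BANDS:
        if score >= lower_bound:
            return explanation
    raise AssertionError("unreachable: the last band starts at 0")


print(health_band(92))  # Normal behaviour
```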
Badges – Workload
Answers complex questions like:
• For every object, how does demand compare with supply?
• For every single VM, is it CPU-, memory-, disk- or network-bound?
• Is any VM not getting what it is entitled to or requires?
• What is the normal workload range for every object in our vDC?
Workload is not utilisation or usage
• More accurate than utilisation, as it takes more factors into account than utilisation alone
Workload = Demand / Entitlement
• Entitlement is dynamic: it is affected by shares, limits, etc.
• Demand ≠ Usage
• Usage may mean passive usage (a RAM page is present but is not being read or written at all)
• Score is Max(CPU, RAM, Disk IO, Net IO)
Value Explanation
0 – 80 Workload is not high.
80 – 90 The object is experiencing some high resource workloads.
90 – 95 Workload on the object is approaching its capacity in ≥1 areas.
>95 Workload on the object is at or over its capacity in ≥1 areas.
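The Workload formula combines a per-resource ratio with a max, which is easy to misread, so a minimal sketch may help. It assumes a point-in-time snapshot of demand and entitlement in matching units per resource; the resource names and figures are hypothetical, and each table band is assumed to include its lower bound.

```python
# Minimal sketch (not vC Ops code) of the Workload badge as described above:
# per-resource workload = demand / entitlement, badge score = max over resources.
# Assumptions: demand and entitlement are a point-in-time snapshot in matching
# units per resource; resource names are illustrative; each band in the table
# includes its lower bound.


def workload_score(demand: dict[str, float], entitlement: dict[str, float]) -> float:
    """Return max over resources of 100 * demand / entitlement."""
    return max(100 * demand[name] / entitlement[name] for name in entitlement)


def workload_band(score: float) -> str:
    """Map a Workload score to the explanations in the table above."""
    if score > 95:
        return "At or over capacity in at least one area"
    if score >= 90:
        return "Approaching capacity in at least one area"
    if score >= 80:
        return "Some high resource workloads"
    return "Workload is not high"


# Hypothetical VM: CPU in MHz, RAM in MB, disk and network in arbitrary rate units.
demand = {"cpu": 1700, "ram": 3072, "disk_io": 50, "net_io": 20}
entitlement = {"cpu": 2000, "ram": 4096, "disk_io": 500, "net_io": 1000}

score = workload_score(demand, entitlement)
print(score, workload_band(score))  # 85.0 Some high resource workloads
```

Here CPU (85) dominates RAM (75), so the VM is reported as CPU-bound even though its average load across resources is much lower, which is the point of taking the max.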
Badges – Anomalies
Answers complex questions like:
• Is our vDC behaving as usual? Are there any unexpected changes (given that we have a dynamic environment)?
• Which VMs, ESX hosts, clusters, datastores, etc. are behaving abnormally?
• … and exactly which counters are the culprits?
Identifying metric abnormalities
• It needs to learn the dynamic range of “normal” for each metric, so give it more than 3 cycles per metric
• A metric driven by a month-end job therefore needs at least 3 months
• The normal range changes after configuration or application changes
Anomalies score
• A high number of anomalies is usually an indication of:
• A problem
• A demand change
• The application team having changed the code/app
• KPI (Key Performance Indicator) metrics affect the anomalies score more than non-KPI metrics
Value Explanation
0 – 50 Normal anomaly range
50 – 75 The score exceeds the normal range.
75 – 90 The score is very high.
> 90 Most of the metrics are beyond their thresholds. This object might not be working properly or will stop working soon.
Badges – Faults
Answers complex questions like:
• What faults do we experience in our vDC?
• For every object, what faults does it have?
Specific knowledge of vCenter events
• Which events affect the availability and performance of which object?
• Pulled from active vCenter events
• Examples:
• Loss of redundancy in NICs or HBAs
• Memory checksum errors
• HA failover problems
• Each fault has a default score
• The highest individual fault score drives the object's Fault score
Best Practices
• Do not change the Fault threshold
• Use the Alerts view to manage faults. You can filter it to show only faults.
Value Explanation
0 – 25 No fault is registered on the object.
25 – 50 Faults of low importance are registered on the object.
50 – 75 Faults of high importance are registered on the object.
> 75 Faults of critical importance are registered on the object.
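Because the object's Fault score is driven by its single worst fault rather than by a sum, a short sketch shows the behaviour. It is a minimal illustration, not product code: the fault names and scores are hypothetical (not vC Ops defaults), and each band is assumed to include its lower bound.

```python
# Minimal sketch (not vC Ops code): the object's Fault score is the highest
# individual fault score among its active faults, mapped to the table above.
# The fault names and scores below are hypothetical, not product defaults;
# each band is assumed to include its lower bound.


def fault_badge(active_fault_scores: list[float]) -> float:
    """Highest individual fault score, or 0 when no fault is active."""
    return max(active_fault_scores, default=0)


def fault_band(score: float) -> str:
    """Map a Fault score to the importance levels in the table above."""
    if score > 75:
        return "Critical importance"
    if score >= 50:
        return "High importance"
    if score >= 25:
        return "Low importance"
    return "No fault registered"


active_faults = {"nic_redundancy_lost": 60, "memory_checksum_error": 40}  # hypothetical
score = fault_badge(list(active_faults.values()))
print(score, fault_band(score))  # 60 High importance
```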
Badges – Risk
Answers complex questions like:
• Do we have performance or capacity risk in our vDC? If yes, where is it and how serious is it?
• Which objects are at risk? What is the specific risk?
Risk Score takes into account
• Time Remaining
• Capacity Remaining
• Stress
Risk is an early warning system
• Identifies potential problems that could eventually hurt performance
• The Risk chart shows the Risk score over the last 7 days, giving a view of the trend
Value Explanation
0 – 50 No problems are expected in the future.
50 – 75 There is a low chance of future problems, or a potential problem might occur in the far future.
75 – 100 There is a chance of a more serious problem, or a problem might occur in the medium-term future.
100 The chances of a serious future problem are high, or a problem might occur in the near future.
Badges – Time remaining
Answers complex questions like:
• How much time do we have before performance starts to degrade or we run out of capacity, and we need to buy more servers, storage, or network capacity?
• For every cluster, VM, and datastore, how much time do we have?
Measures the time remaining before each resource type reaches its capacity
• CPU
• Memory
• Disk (IOPS & Space)
• Network I/O
Early warning of upcoming provisioning needs
• Based on the Score Provisioning (SP) buffer. The default value is 30 days.
• Set in the “Capacity & Time Remaining” section
Value Time remaining
50 – 100 > 2x SP buffer (60 days)
25 – 50 < 2x SP buffer
< 25 Near SP buffer
0 < SP buffer (30 days)
Badges – Capacity remaining
Answers complex questions like:
• How many more VMs can we place without impacting performance or using up capacity?
• For every cluster, VM, and datastore, which component (CPU, RAM, disk, network) would run out first?
Early warning system
• Even a low score of 1 can still leave up to 30 days of capacity.
• Measures how many more VMs can be placed on the object
Percentage of total VM “slots” remaining
• Based on the average size of the VMs on the object (i.e. the VM profile)
• Each object has its OWN VM profile size: host, cluster, datacenter, etc.
From the table, notice that the value is not linear
• It also does not match the Time Remaining thresholds.
• A value of 30 means >120 days for capacity but around 40 days for time.
Value Capacity remaining
>10 >120 days
5 – 10 60 – 120 days
2 – 5 30 – 60 days
1 <30 days
Capacity remaining calculation
Determine the capacity-constrained resources
Deployed or powered-on VMs
• Powered-off (deployed) VMs use only disk space
• Powered-on VMs use ALL 4 resources
Calculation example:
• The limit is 40 more VMs
• We have 9 deployed VMs
• 40/(40+9) = 81%
You can drill down to see details
• You can check all 9 components, as shown on the right
• This helps answer how many days or VMs each component has left
• Summary = min(all 9 components)
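The worked example above (40 more VMs, 9 deployed, 81%) can be reproduced with a short sketch. It is a minimal illustration under stated assumptions, not the product's algorithm: free capacity and the average VM profile are hypothetical per-component figures in matching units, headroom per component is free capacity divided by the profile (rounded down), and the summary takes the most constrained component. The powered-on versus powered-off distinction is ignored for brevity.

```python
# Minimal sketch (not vC Ops code) of the capacity remaining calculation above.
# Assumptions: free capacity and the object's average VM profile are known per
# component in matching units; each component's headroom is free // profile
# VMs; the summary uses the most constrained component (the min). The
# powered-on vs powered-off distinction is ignored here for brevity.


def vm_slots_per_component(free: dict[str, float], profile: dict[str, float]) -> dict[str, int]:
    """How many more average-sized VMs fit, per component."""
    return {name: int(free[name] // profile[name]) for name in profile}


def capacity_remaining(free, profile, deployed_vms: int) -> tuple[str, int, float]:
    """Return (limiting component, VMs left, % of total VM slots remaining)."""
    slots = vm_slots_per_component(free, profile)
    limiting = min(slots, key=slots.get)
    vms_left = slots[limiting]
    total = vms_left + deployed_vms
    pct = 100 * vms_left / total if total else 0.0
    return limiting, vms_left, pct


# Hypothetical cluster with 9 deployed VMs (names and figures are illustrative).
free = {"cpu_mhz": 20000, "ram_mb": 131072, "disk_gb": 3000}
profile = {"cpu_mhz": 500, "ram_mb": 2048, "disk_gb": 50}

component, left, pct = capacity_remaining(free, profile, deployed_vms=9)
print(component, left, int(pct))  # cpu_mhz 40 81
```

CPU allows 40 more VMs, RAM 64, and disk 60, so CPU is the limiting component and 40/(40+9) truncates to the 81% in the example.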
Badges – Stress
Answers complex questions like:
• In our vDC, do we have stress points or periods? How bad are they?
• For every cluster, VM, and datastore, which ones are experiencing stress and how bad is it?
Measures long-term or chronic workload (6 weeks)
• The chart shows a weekly breakdown of stress for each day/hour, averaged over the last 6 weeks
• Workloads > 70% = “Stressed”
• The threshold is configurable, as per the screenshot below
Value Explanation
0 – 1 Normal score. No action needed.
1 – 5 Some of the object's resources are not enough to meet the demands.
5 – 30 The object is experiencing regular resource shortages.
>30 Most of the resources on the object are constantly insufficient. The object might stop functioning properly.
Stress Calculation
Stress Score is a percentage based on the area of workload above the “Stress Line” threshold, compared to the total capacity of the object
• Stress Score = (Stress area / Stress Zone) × 100
• The maximum value can exceed 100%, because the workload can exceed 100.
Example
• Stress Line is 70% workload
• 12% of the area is above the 70% threshold
• Stress Score is 12
[Figure: workload line on a 0–100 axis with the stress line at 70, labelled Stress Zone; 12% of the area lies above the stress line]
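As a sketch of the area calculation, the code below approximates the area above the stress line by summing sampled excess workload. It is a minimal illustration, not product code, and it assumes regularly spaced samples expressed as a percentage of capacity and a denominator equal to the object's total capacity area over the window (100% times the number of samples), following “compared to the total capacity of the object” above.

```python
# Minimal sketch (not vC Ops code) of the Stress score described above.
# Assumptions: workload is sampled at regular intervals as a % of capacity,
# and the denominator is the object's total capacity area over the window
# (100% x number of samples), per "compared to the total capacity" above.


def stress_score(workload_samples: list[float], stress_line: float = 70.0) -> float:
    """Area of workload above the stress line as a % of total capacity area."""
    if not workload_samples:
        raise ValueError("need at least one workload sample")
    area_above = sum(max(w - stress_line, 0.0) for w in workload_samples)
    total_area = 100.0 * len(workload_samples)
    return 100.0 * area_above / total_area


# Four hypothetical samples: two at 94% (24 points above the line), two at 70%.
print(stress_score([94, 94, 70, 70]))  # 12.0
```

Under this assumption, samples that only touch the stress line contribute nothing, so a brief spike and a long plateau of the same total excess produce the same score.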
Badges – Efficiency
Answers complex questions like:
• Are there optimization opportunities in our vDC?
• How well are we provisioning VMs? Do we size them right?
Efficiency Score factors
• Reclaimable waste
• Density ratio
The graph depicts VMs by percentage
• Optimal – optimally provisioned VMs
• Waste – over-provisioned VMs
• Stress – under-provisioned VMs
• Not used in the Efficiency calculation (see Risk)
Value Explanation
>25 The efficiency is good. The resource use on the selected object is optimal.
10 – 25 The efficiency is good, but can be improved. Some resources are not fully used.
0 – 10 The resources on the selected object are not used in the most optimal way.
0 The efficiency is bad. Many resources are wasted.
Badges – Reclaimable waste
Answers complex questions like:
• Have we over-provisioned the VMs in terms of CPU, RAM and disk? If yes, to what degree?
• For every cluster, VM, and datastore, what can we reclaim?
It identifies the amount of reclaimable resources
• CPU
• Memory
• Disk
Reclaimable Waste = Reclaimable Capacity / Deployed Capacity
• Waste Score = Max(CPU Waste Score, RAM Waste Score, Disk Space Waste Score)
• The disk calculation can also include old snapshots and templates
Value Explanation
0 – 50 No resources are wasted on the selected object.
50 – 75 Some resources could be used better.
75 – 100 Many resources are underused.
100 Most of the resources on the selected object are wasted.
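The Waste Score follows the same ratio-then-max pattern as Workload, applied to reclaimable versus deployed capacity. The minimal sketch below uses hypothetical figures and assumes each table band includes its lower bound; it is an illustration, not vC Ops code.

```python
# Minimal sketch (not vC Ops code) of the Reclaimable Waste score above:
# per-resource waste = reclaimable / deployed, score = max over CPU, RAM, disk.
# Figures are hypothetical; bands assume each range includes its lower bound.


def waste_score(reclaimable: dict[str, float], deployed: dict[str, float]) -> float:
    """Return max over resources of 100 * reclaimable / deployed."""
    return max(100 * reclaimable[name] / deployed[name] for name in deployed)


def waste_band(score: float) -> str:
    """Map a Waste score to the explanations in the table above."""
    if score >= 100:
        return "Most resources are wasted"
    if score >= 75:
        return "Many resources are underused"
    if score >= 50:
        return "Some resources could be used better"
    return "No resources are wasted"


reclaimable = {"cpu_vcpu": 4, "ram_gb": 40, "disk_gb": 200}
deployed = {"cpu_vcpu": 16, "ram_gb": 64, "disk_gb": 1000}

score = waste_score(reclaimable, deployed)
print(score, waste_band(score))  # 62.5 Some resources could be used better
```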
Badges – Density
Answers complex questions like:
• How high can we push our consolidation ratio before we experience performance problems?
• Now that's a million-dollar question!
• For every datacenter, cluster, and ESXi host, what are our key ratios and how much headroom do we have?
Contrasts Actual vs Ideal Density
• Identifies optimal resource deployment before contention occurs
• Ideal is based on demand, not simple configuration.
• High density is good. 100 is not too high.
Value Explanation
>25 Good consolidation
10 – 25 Some resources are not fully consolidated
0 – 10 The consolidation for many resources is low
0 The resource consolidation is extremely low.
Using badges together
Workload High & Anomalies Low & Stress High: add resources
• Workload – Object is running hot, potentially starving for resources
• Anomalies – Normal behavior for this timeframe
• Stress – Object often runs under high workload
Workload High & Anomalies Low & Stress Low: not likely a big problem… a cyclical workload spike?
• Workload – Object is running hot, potentially starving for resources
• Anomalies – Normal behavior for this timeframe
• Stress – Object usually has enough resources
Workload High & Anomalies High: something is amiss! Immediate attention.
• Workload – Object is running hot, potentially starving for resources
• Anomalies – Abnormal behavior for this timeframe
• If there are alerts and faults too, it is a sign of a major issue
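The badge combinations read like a small decision table, so they can be sketched as a function. This is a minimal illustration, not product logic: the boolean inputs are hypothetical (what counts as “high” is left to the operator), and combinations the slide does not cover return a placeholder.

```python
# Minimal sketch (not vC Ops code) encoding the badge combinations above as a
# decision function. The inputs are hypothetical booleans (e.g. "is the
# Workload badge high?"); what counts as "high" is left to the operator.


def interpret_badges(workload_high: bool, anomalies_high: bool,
                     stress_high: bool, alerts_and_faults: bool = False) -> str:
    """Return the interpretation for one combination of badge states."""
    if not workload_high:
        return "Not covered by these combinations"
    if anomalies_high:
        if alerts_and_faults:
            return "Major issue: immediate attention"
        return "Something is amiss: immediate attention"
    if stress_high:
        return "Add resources"
    return "Not likely a big problem: possibly a cyclical workload spike"


print(interpret_badges(True, False, True))  # Add resources
```

Checking Anomalies before Stress mirrors the slide: abnormal behavior calls for immediate attention regardless of the long-term Stress picture.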
Quick Comparison: VMware vs Point Solution Competitors
Virtual Environment
• VMware: best-of-breed execution of the software-defined datacenter
• Point competitors: narrow focus, limited expandability ✖
Integrated Performance and Capacity Models
• VMware: performance, capacity, vSphere health
• Point competitors: limited to narrow use cases, incomplete visibility
Automated Operations
• VMware: accurate root cause through behavioral analytics, dynamic thresholds, smart alerts
• Point competitors: leverage only a limited collection of (often misinterpreted) memory & storage metrics ✖
VMware vSphere® Health Check Service
Assessment and Health Check Report
Standardized assessment
• Virtual datacenter
• VMware ESX®/VMware ESXi™ hosts
• VMware vCenter™ Server and plug-ins
• Networking
• Storage
• Virtual machines
VMware vSphere Health Check Report
• Recommended action items
• Justification for recommendations
• Checklist of assessment performed
• Audited inventory list
What is the optimal configuration and usage?
How are you doing?
What should you be doing?
What changes should be made?
What Does Your Architecture Look Like?
vCenter Server with vCenter Database and vCenter Linked Mode
“Datacenter” and “Cluster” objects containing ESX/ESXi hosts and datastores
vCenter Orchestrator, vCenter Converter, Guided Consolidation, Update Manager (with Update Manager Database)
vSphere Client with vCenter Converter and Update Manager plug-ins
vSphere Web Access (browser)*, vSphere CLI, vSphere Management Assistant (vMA), vSphere PowerCLI
*ESX only (not ESXi)
Discuss technical component specifications, configuration, and usage
• Compute resources
• Networking
• Storage
• Virtual datacenter
• Virtual machines
Topics
• Availability
• Manageability
• Performance
• Recoverability
• Security
VMware Infrastructure / vSphere Topology and Access
Have the following information available for ESX/ESXi and vCenter:
• ESX/ESXi hosts
• IP address and host name
• Root login and password
• vCenter Server
• IP address and hostname
• vCenter administrator login and password (or account with vCenter Server Read-Only+License role)
Follow-Up Interviews and Discussions
Identify key people and schedule follow-up interviews and discussions
• Technical architects
• Administrators
• Operations
• Virtual machine administrators
• Security
• Storage
• Networking
To Be Delivered – VMware vSphere Health Check Report
Identify report recipients and schedule a conference call for review
VMware vSphere Health Check Report
• Recommended action items
• Justification for recommendations
• Checklist of assessment performed
• Audited inventory list
Recommendations
Host: Avoid installing additional agents in the service console
Host: For large systems and existing systems with additional agents in the service console, allocate the maximum size for service console memory (800MB) and swap size (1600MB)
Host: Automate the ESX installation and configuration process using a combination of kickstart scripts and host profiles
Host: Avoid logging in to the ESX service console; manage existing ESX hosts as you would VMware vSphere ESXi™, using vCenter Server and VMware vSphere Command-Line Interface (vCLI), VMware vSphere Management Assistant (vMA), or VMware vSphere PowerCLI™
Recommendations
Network: Set 1 Gbps physical adaptors to autonegotiation for optimum performance
Network: Change the default port group security settings ForgedTransmits and MACAddressChange to Reject
Network: Avoid mixing NICs with different speeds and duplex settings on the same uplink for a port group/dvportgroup
Storage: Separate the space allocations on shared datastores for templates and media/ISOs from virtual machines
Recommendations
Virtual Machines: Set the memory reservation value for Java-based (JVM) virtual machines to the OS required memory plus the JVM heap size
Virtual Datacenter: Use vCenter Server roles, groups, and permissions to provide appropriate access and authorization for virtual infrastructure administration. Avoid using Windows built-in groups (Administrators)
Virtual Machines: Use as few vCPUs as possible. Do not use virtual SMP if the application is single-threaded and will not benefit from additional vCPUs
Virtual Datacenter: Set up a redundant service console port group to use a separate vmnic on a separate subnet for improved HA redundancy
Questions