Infrastructure Optimization
Solutions for Server, SAN & Storage Optimization
May 16, 2012
Agenda
About Virtual Instruments
Your IT Infrastructure Challenges
Solutions to Your Challenges & Use Cases
Key Benefits of Virtual Instruments’ Solutions
Customer Examples
Infrastructure Maturity Model
Summary & Next Steps
Cloud challenges
VirtualWisdom 3.x, ProbeFC8, licensing, etc.
• Founded June 2008 via Finisar spinout
• Leader in SAN & Virtual Infrastructure Optimization
• Privately-held
• Headquarters: San Jose, CA
• Key relationships:
• ~140 employees, growing ~200%
Virtual Instruments Intro
Virtual Instruments ensures the performance and availability of mission-critical applications by providing a comprehensive infrastructure optimization platform.
“You can’t optimize what you can’t measure”
Virtual Instruments Mission Statement
Virtual Instruments Kudos
VirtualWisdom carries out analysis of SAN I/O traffic in real time - and can do this in relation to individual business-critical applications running within a virtualized setup. This cross-domain, vendor-neutral view facilitates rapid and precise performance tuning and fault diagnosis - such tools should be a must.
“By analyzing SAN traffic, VirtualWisdom makes it easier both to locate the real source of a problem, and to optimize the SAN for best application and cloud performance”
“IT pros should consider the VirtualWisdom SAN optimization solution as an integral part of their cost optimization and infrastructure modernization strategy”
“Application owners, virtualization admins, and storage admins can all benefit from the holistic approach to performance optimization provided by Virtual Instruments, and return real capital savings back to the business.”
The Virtualization Practice
“VirtualWisdom is the only solution that can give a real time, comprehensive, and deterministic view of every transaction that is flowing through your FC SAN and how those transaction times are impacting the performance of physical and virtual workloads.”
The Industry Challenge
“The Perfect Storm”
Healthcare, Finance/Insurance, Services, Retail & E-commerce, Manufacturing, Government
Leaders Deploying Virtual Instruments
Finance & Insurance, Services, Healthcare/Pharma, Retail & E-commerce, Manufacturing, Government
Customers Who Have Deployed Virtual Instruments
Common Large-scale SAN Challenges
• Explaining/avoiding application outages & slowdowns
• Identifying SAN problems
• Identifying physical layer problems
• Reducing vendor finger-pointing
• Tracking SLAs & compliance
• Over-provisioning and consolidation
• Storage tiering
• Environmental costs (avoiding new data centers)
• Capacity planning
• Containing rising costs of storage/SAN with a flat budget
What are YOUR Major Challenges?
• Explaining/avoiding application outages & slowdowns
• Increasing server consolidation ratios
• Reducing vendor finger-pointing
• Tracking SLAs & compliance
• I/O subsystem troubleshooting
• Deploying more mission-critical applications
• Showing adherence to performance standards
• Isolating workload peaks that cause resource conflicts and bottlenecks
Common Virtual Infrastructure Challenges
Common IT Challenges
Control exploding data growth and costs
Increase use of server virtualization
Manage risk & eliminate application outages
Bring new mission-critical projects online faster with reduced risk
Make progress on ‘green’ initiatives
Common Private Cloud Challenges
Increasing server & storage virtualization limits visibility, increases complexity
Data growth >50%/yr – leading to heavy investments in SAN infrastructure
Additional physical infrastructure increases OPEX & business risk; results in over-provisioning
Heterogeneity increases the visibility challenge
Why Does Virtual Instruments Exist?
1. The FC SAN has lacked any real I/O systems-level performance visibility
◦ Lacks self-health, diagnostics and full path transparency
FC Fabric
There’s a “perfect storm” in data management today…
Servers & Virtual Machines
Storage Arrays
SAN Cloud
Why Does Virtual Instruments Exist?
1. The SAN has lacked any real I/O systems-level performance visibility
2. Data growth at an unprecedented rate (average 30-60% CAGR)
◦ A 200 TB shop in ’11 growing 50% per year will be about 1.5 PB in 5 years
There’s a “perfect storm” in data management today…
SAN Cloud
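The 200 TB example is ordinary compound growth: capacity after n years is the starting size multiplied by (1 + growth rate)^n. A minimal Python sketch that reproduces the figure, assuming decimal units (1 PB = 1,000 TB); the function name is illustrative:

```python
def projected_capacity_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding `annual_growth` (0.5 means 50%/yr) for `years` years."""
    return start_tb * (1 + annual_growth) ** years

# 200 TB in 2011, growing 50% per year, five years out
print(projected_capacity_tb(200, 0.50, 5))  # 1518.75 TB, i.e. about 1.5 PB

# Ends of the 30-60% CAGR range quoted above, same starting point
for rate in (0.30, 0.60):
    print(f"{rate:.0%}: {projected_capacity_tb(200, rate, 5):,.0f} TB")
```

Across the quoted 30-60% range the same shop ends up anywhere from roughly 0.7 PB to 2.1 PB, which is why the growth rate dominates capacity planning.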
Why Does Virtual Instruments Exist?
1. The SAN has been a “black box”, lacking any real I/O systems-level performance visibility, and is heavily over-provisioned as a result
2. Data growth at an unprecedented rate (average 30-60% CAGR)
3. More “abstraction” being added
◦ Further limits I/O visibility
◦ Challenges performance
◦ Slows cloud deployment
There’s a “perfect storm” in data management today…
Virtual Server Cloud
SAN Cloud
Storage Virtualization Cloud
What if There Was a Way to…
Create “Predictability”: Identify / fix infrastructure problems before they occur
Reduce Risk: Ensure no loss of revenue / productivity
Reduce Costs: Optimize IT asset utilization and personnel
Improve Performance: Align to business dynamics
Why Does Virtual Instruments Exist?
There’s a “perfect solution” for performance management…
Diagram: SAN Fabric, SAN Availability Probe, Virtual Server Probe, SANInsight (TAP), Storage Virtualization, Server Cluster, SAN Performance Probe, Virtual Machines
Lack of visibility in the SAN leads to:
◦ Over-provisioning
◦ Inability to optimize system performance
◦ Inability to diagnose problems
◦ Wasted OPEX & CAPEX
Your dashboard to see through the clouds!
Instrumentation is Key
Diagram: Servers and VMs, FC Switches, Storage Arrays, SAN Availability Probe, TAPs, Virtual Server Probe
VirtualWisdom Deployment
SAN Availability Probe
TAPs
SAN Performance Probe
Rovers
Virtual Server Probe
Traffic Access Point (TAP) Patch Panel System
VirtualWisdom Server with easy-to-use dashboard & GUI
• Monitors SAN switches via SNMP for utilization info & error conditions
• Collects data on VMs from VMware’s vCenter
• Non-intrusively copies FC frames: a “light splitter”
• Enables periodic sampling for lower-cost monitoring
• Collects FC frame header data and key metrics in real time, “out-of-band”
VirtualWisdom Architecture
VirtualWisdom Deployment
SAN Availability Probe
TAPs (hardware)
SAN Performance Probe
Virtual Server Probe
SAN Availability Probe:
• Reports on link failure, link errors, dropped frames
• Immediate root cause detection
• Speed problem resolution
• Prevent outages and performance problems
• Historical trending analysis
• Find multi-path failures
• Identifies over-provisioned links
SAN Performance Probe:
• Latency and response times
• Reduce number of array ports
• Improve performance
• Queue depth performance
• Degraded device metrics
• Optimize storage tiering
• Quickly identify root cause
• Reduce power, space requirements
• “What if” modeling
Virtual Server Probe:
• Collects over 120 vCenter metrics
• Reduces the risk of implementing mission-critical applications on virtual servers
• Early detection of slowdowns
• Enables workload balancing
• Increases consolidation ratios
SAN Availability Probe: Improve Availability, Faster Troubleshooting, Improve SAN Utilization
SAN Performance Probe: Optimize Tier 1 App Performance, Optimize Storage Tiering
Virtual Server Probe: Improve VM Density, Increase Use of Server Virtualization
Value of VirtualWisdom Probes
Deployment Options and Use Cases
SAN
1. SAN / app problem diagnosis – fast troubleshooting¹
2. SAN / app problem diagnosis – proactive optimization¹
3. Reduce trouble tickets through proactive monitoring¹
4. Optimize storage tiering¹
5. Reducing OPEX through better tiering¹
6. Proving Tier II storage is not to blame¹
7. Multi-pathing and HBA utilization¹
8. Identify multi-pathing risks¹
9. Using historical reports, find slow-draining devices (buffer-to-buffer credits)¹
10. Identify effect of SAN changes / plan network capacity needs¹
11. How to measure latency to find slowdown cause¹
12. Drill down to identify physical layer problems¹
13. Storage owner view¹
14. Prevent problems by finding excessive SCSI reservations¹
15. Reducing CAPEX by reducing port over-provisioning¹
16. De-risk storage consolidation¹
17. VMware and SAN admins working together: effect of queue depths on latency¹
18. Correcting SAN switch oversubscription ratios¹
19. Performance optimization with the IBM SVC¹
20. Troubleshooting virtualized storage (HP XP and EVA)¹
21. VSP consolidation opportunity (what if)¹
22. Server owner view¹
23. Improve server consolidation with modeling¹
24. Value of frames/sec metric over IOPS or MB/s¹
25. Disaster recovery monitoring¹
Server Virtualization and Private Cloud
26. SCSI reservation conflicts on VMware estate¹
27. Reduce risk of implementing a private cloud¹
28. Increase use of Virtual Servers for critical apps¹
29. VMware load balancing with modeling¹
30. Reduce risk of VM consolidation with modeling¹
31. Improving VM consolidation
32. vSphere Admin view
33. Cross-department view of effect of queue depths¹
34. LOB-specific dashboard¹
35. Shared infrastructure view for VDI¹
36. Identify latency through granular monitoring¹
Applications
37. De-risk Oracle, SAP, similar apps¹
38. Troubleshooting Oracle – SAN fabric is not the problem¹
39. Troubleshooting Oracle – SAN fabric is not the problem¹
40. Troubleshooting applications (MS Exchange)¹
41. De-risk EPIC application¹
42. Application owner view¹
43. De-risk Oracle consolidation and migration project¹
¹ Use cases that vCenter Ops cannot perform well or at all
Sample of VirtualWisdom Use Cases
SAN Availability Probe Software to Optimize Utilization and Availability
Deployment Options: SAN Availability Probe
Reports on link failure, link errors, dropped frames
Immediate root cause detection
Speed problem resolution
Prevent outages and performance problems
Historical trending analysis
Find multi-path failures
Identifies over-provisioned links
SAN Availability Probe - Metrics
Records the following metrics, allowing users to historically trend & correlate, enabling proactive identification & remediation.
Key Metrics and Usage
CRC Errors: Immediately identify physical layer problems on links.
Link Resets: Immediately be alerted to any unplanned server reboots.
Link Failures: Quickly be alerted and identify unplanned path failures, cable pulls or server reboots.
Loss of Signal: Proactively identify light loss due to a cable/SFP problem, HBA or storage port reset.
Loss of Sync: Identify SFP or cable issues before they cause link failures.
Class 3 Discards: Proactively identify heavy traffic and port utilization as well as legacy zoning issues.
Received / Transmitted frames: Gauge bandwidth usage.
Received / Transmitted "bytes": Customize alerts to determine if a port is running, or receiving data.
Key Benefit: Ensure availability and performance of key applications, by proactively identifying physical infrastructure risks and by enabling the fastest identification of physical layer problems
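Switch port error counters like these are usually cumulative, so trending and alerting work on the change between two polls rather than on the raw value. A minimal Python sketch of that idea; the metric names, thresholds and port label are hypothetical, the counter snapshots are plain dictionaries, and no particular SNMP library or VirtualWisdom interface is implied:

```python
from typing import Dict, List

# Hypothetical per-poll thresholds: how much a counter may grow between two polls
# before an alert fires. Real values would be tuned per environment.
THRESHOLDS: Dict[str, int] = {
    "crc_errors": 0,
    "link_failures": 0,
    "loss_of_signal": 0,
    "loss_of_sync": 0,
    "class3_discards": 100,
}

def counter_deltas(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, int]:
    """Increase of each cumulative counter between two polls.

    If a counter went backwards (e.g. the switch rebooted and the counter reset),
    the current value is taken as the increase.
    """
    deltas: Dict[str, int] = {}
    for name, value in curr.items():
        before = prev.get(name, 0)
        deltas[name] = value - before if value >= before else value
    return deltas

def port_alerts(port: str, prev: Dict[str, int], curr: Dict[str, int]) -> List[str]:
    """One message per metric whose increase exceeds its threshold."""
    messages = []
    for name, delta in counter_deltas(prev, curr).items():
        limit = THRESHOLDS.get(name)
        if limit is not None and delta > limit:
            messages.append(f"{port}: {name} +{delta} (threshold {limit})")
    return messages

prev = {"crc_errors": 10, "link_failures": 2, "class3_discards": 500}
curr = {"crc_errors": 14, "link_failures": 2, "class3_discards": 550}
print(port_alerts("switch1/port7", prev, curr))
# ['switch1/port7: crc_errors +4 (threshold 0)']
```

Keeping the deltas (not just the alerts) is what makes historical trending possible: the same per-interval values can be stored and charted over time.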
Improving SAN Utilization and Mitigating Risk
SAN utilization < 2%
Some links hitting 100%
Traffic on ISLs causing contention
SFP low-light levels & flapping HBAs causing CRC issues
Categorization Summary: Count (% of Links)
Balanced: 1,228 (69%)
Passive: 85 (5%)
Active: 85 (5%)
Imbalanced: 228 (13%)
Single (not redundant): 143 (8%)
SAN Availability Probe Software Audit
Identify multi-pathing issues
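The categorization summary groups links by how multi-pathed servers spread their I/O. The slide does not state the classification rules, so the following Python sketch uses assumed ones: a server with one path is "Single", a path carrying almost none of its server's traffic is "Passive" (and its busy partner "Active"), and otherwise the busiest-to-quietest ratio decides "Balanced" versus "Imbalanced". Both thresholds are hypothetical. The last lines check that the slide's counts reproduce its percentages.

```python
from collections import Counter
from typing import Dict

IDLE_SHARE = 0.01     # hypothetical: a path carrying <1% of a server's traffic counts as idle
BALANCE_RATIO = 1.5   # hypothetical: busiest/quietest path ratio still called "balanced"

def classify_paths(paths: Dict[str, float]) -> Dict[str, str]:
    """Label each path of one server from its traffic (e.g. MB/s over the audit window)."""
    if len(paths) == 1:
        return {name: "Single (not redundant)" for name in paths}
    total = sum(paths.values())
    if total == 0:
        return {name: "Passive" for name in paths}
    idle = {name for name, load in paths.items() if load / total < IDLE_SHARE}
    if idle:
        return {name: ("Passive" if name in idle else "Active") for name in paths}
    ratio = max(paths.values()) / min(paths.values())
    label = "Balanced" if ratio <= BALANCE_RATIO else "Imbalanced"
    return {name: label for name in paths}

def summarize(servers: Dict[str, Dict[str, float]]) -> Counter:
    """Count links per category across all servers."""
    counts: Counter = Counter()
    for paths in servers.values():
        counts.update(classify_paths(paths).values())
    return counts

def percent_of_links(counts: Dict[str, int]) -> Dict[str, int]:
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}

# The slide's counts (1,769 links in total) reproduce its percentages
slide = {"Balanced": 1228, "Passive": 85, "Active": 85,
         "Imbalanced": 228, "Single (not redundant)": 143}
print(percent_of_links(slide))
# {'Balanced': 69, 'Passive': 5, 'Active': 5, 'Imbalanced': 13, 'Single (not redundant)': 8}
```

Equal "Passive" and "Active" counts, as on the slide, are what this kind of rule produces when every idle path has exactly one busy partner.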
SANInsight Traffic Access Points (TAPs) for physical layer access
Deployment Options: Traffic Access Points
Foundation for instrumentation
Passive network access
Will not interfere with network traffic
Does not add latency, non-powered
Shows all traffic, bit‐for‐bit, unlike Mirror or SPAN ports
SAN Availability Probe software, TAPs, SAN Performance Probe hardware to optimize performance and availability
Deployment Options: SAN Performance Probe
Latency and response time reports
Reduce number of array ports
Improve performance
Queue depth performance impact
Identify degraded device metrics
Optimize storage tiering
Quickly identify root cause
Reduce power, space requirements
“What if” modeling for consolidation, reconfigurations
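One standard way to reason about the queue depth performance impact listed above is Little's law: in steady state, average outstanding I/Os = IOPS × average response time. A minimal Python sketch; the figures are hypothetical and assume the array is already saturated, so raising the queue depth does not raise IOPS:

```python
def mean_latency_ms(outstanding_ios: float, iops: float) -> float:
    """Little's law (L = lambda * W) solved for W: mean response time = queue depth / throughput."""
    return outstanding_ios * 1000.0 / iops

def required_queue_depth(iops: float, latency_ms: float) -> float:
    """Little's law solved for L: outstanding I/Os needed to sustain `iops` at `latency_ms`."""
    return iops * latency_ms / 1000.0

# Hypothetical array saturated at 8,000 IOPS: deeper queues only add waiting time
print(mean_latency_ms(32, 8000))        # 4.0
print(mean_latency_ms(64, 8000))        # 8.0
print(required_queue_depth(8000, 4.0))  # 32.0
```

When the array is not saturated, a deeper queue can raise IOPS instead, which is why measured latency, not queue depth alone, is the useful signal.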
Reducing Trouble Tickets
SAN Performance Probe: SAN monitoring solution with flexible thresholds and alerts
Gathers switched fabric performance statistics
Vendor-agnostic view with no impact on switch performance
• Customer averaged 900+ trouble tickets per month
• Number of tickets dropped by more than 65% within 3 months of VirtualWisdom installation
Chart: Trouble Ticket Incident Volume by quarter, Q1 2007 to Q1 2010 (series: Urgent+High, Medium, Low, Total; scale 0 to 1,200), with the point where VirtualWisdom was installed marked
Record and play back metric recordings of intermittent problems before they build up and disrupt the SAN
Faster Troubleshooting & Root Cause Analysis
SAN Performance Probe: Continuously monitors and filters in real time
Calculates statistics based on measuring all fibre channel frame traffic
Automatically notifies staff based on exceeded policy thresholds
Real-time root-cause analysis
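Exchange Completion Time, the response-time metric named in the customer table of this deck, can be derived from frame headers: an exchange opens with a SCSI command frame and closes with the status response, and both carry the same originator exchange ID (OX_ID). A minimal Python sketch of that pairing plus a policy threshold, assuming simplified frame records; the `Frame` fields, the `"CMD"`/`"RSP"` tags and the threshold are hypothetical, carrying the LUN on the command record is an assumption, and this is not VirtualWisdom's actual algorithm:

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple

@dataclass
class Frame:
    ts_us: float   # capture timestamp, microseconds
    kind: str      # "CMD" (FCP_CMND, opens the exchange) or "RSP" (FCP_RSP, closes it)
    s_id: str      # source port ID
    d_id: str      # destination port ID
    ox_id: int     # originator exchange ID
    lun: int = 0   # assumed to be recorded with the command

Key = Tuple[str, str, int]  # (initiator, target, LUN)

def completion_times(frames: List[Frame]) -> Dict[Key, List[float]]:
    """Pair each command with its response; collect completion times per initiator/target/LUN."""
    open_cmds: Dict[Tuple[str, str, int], Frame] = {}
    times: Dict[Key, List[float]] = defaultdict(list)
    for f in sorted(frames, key=lambda fr: fr.ts_us):
        if f.kind == "CMD":
            open_cmds[(f.s_id, f.d_id, f.ox_id)] = f
        elif f.kind == "RSP":
            # The response flows target -> initiator, so swap IDs to find its command.
            cmd = open_cmds.pop((f.d_id, f.s_id, f.ox_id), None)
            if cmd is not None:
                times[(cmd.s_id, cmd.d_id, cmd.lun)].append(f.ts_us - cmd.ts_us)
    return times

def threshold_breaches(frames: List[Frame], limit_us: float) -> List[str]:
    """Report initiator/target/LUN combinations whose mean completion time exceeds limit_us."""
    report = []
    for (ini, tgt, lun), samples in completion_times(frames).items():
        avg = mean(samples)
        if avg > limit_us:
            report.append(f"{ini}->{tgt} LUN {lun}: mean {avg:.0f} us > {limit_us:.0f} us")
    return report

frames = [
    Frame(0, "CMD", "host1", "array1", 1, lun=5),
    Frame(100, "CMD", "host1", "array1", 2, lun=5),
    Frame(50, "CMD", "host2", "array1", 1, lun=3),
    Frame(250, "RSP", "array1", "host2", 1),
    Frame(800, "RSP", "array1", "host1", 1),
    Frame(1300, "RSP", "array1", "host1", 2),
]
print(threshold_breaches(frames, limit_us=500))
# ['host1->array1 LUN 5: mean 1000 us > 500 us']
```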
Avoiding Performance Problems
SAN Performance Probe:
• Identifies potential application slowdown causes
• Recommends corrective action
• Often enables fixes before the application owner is aware of the problem
Provides visibility into queue depths, CRC errors, physical link errors, protocol errors, code violations, etc.
Loss of signal and link failures identify potentially more serious problems
Accelerating Application Deployments
SAN Performance Probe: Continuously monitors infrastructure and READ/WRITE response times
Determines if any changes affect performance
Chart: baseline before change; real-time update after change
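The before/after comparison described above can be expressed as a check of response-time statistics against a baseline. A minimal Python sketch, assuming lists of READ/WRITE response times in milliseconds captured before and after a change; the 20% tolerance and the choice of median and 95th percentile are illustrative assumptions, not VirtualWisdom settings (`statistics.quantiles` needs Python 3.8+):

```python
from statistics import median, quantiles
from typing import Dict, List, Tuple

def p95(samples: List[float]) -> float:
    """95th percentile: the last of the 19 cut points that split the data into 20 groups."""
    return quantiles(samples, n=20)[-1]

def compare_to_baseline(before_ms: List[float], after_ms: List[float],
                        tolerance: float = 0.20) -> Tuple[bool, Dict[str, Tuple[float, float, float]]]:
    """Flag a change as degrading if the median or p95 response time rose by more than `tolerance`.

    Returns (degraded, {statistic: (before, after, relative_change)}).
    """
    report = {}
    for name, stat in (("median", median), ("p95", p95)):
        b, a = stat(before_ms), stat(after_ms)
        report[name] = (b, a, (a - b) / b)
    degraded = any(change > tolerance for _, _, change in report.values())
    return degraded, report

baseline = [1.8, 2.0, 2.2] * 10
print(compare_to_baseline(baseline, [2.7, 3.0, 3.3] * 10)[0])  # True: both statistics rose ~50%
print(compare_to_baseline(baseline, [1.9, 2.1, 2.3] * 10)[0])  # False: ~5%, within tolerance
```

Checking a high percentile alongside the median catches changes that only hurt the slowest I/Os, which an average can hide.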
Optimizing Application Performance
SAN Performance Probe:
• Measures all network statistics
• Proactively alerts administrator based on policies
• Enables real-time tuning for maximum performance
Improving Environmentals through Tiering
SAN Performance Probe:
• Deployment of storage tiers by application – reduce risk
• Increased use of higher-capacity, lower-power drives
• Use fewer drives by utilizing more cost-effective RAID configurations
• Reduce floor space; delay DC build-out
Moving from 15K RPM FC arrays (RAID 10) to 7.2K RPM SATA arrays (RAID 5): ~70% less power, ~70% less cooling, ~80% less floor space
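The "fewer drives" point comes from two effects: larger drives, and RAID 5's lower parity overhead compared with RAID 10's full mirroring. A back-of-the-envelope Python sketch, assuming hypothetical drive sizes (600 GB FC, 2 TB SATA) and 7+1 RAID 5 groups, and ignoring hot spares and formatting overhead; it counts drives only and does not attempt to reproduce the power, cooling and floor-space percentages above:

```python
import math

def drives_raid10(usable_tb: float, drive_tb: float) -> int:
    """RAID 10 mirrors every data drive, so it needs two drives per drive of usable capacity."""
    return 2 * math.ceil(usable_tb / drive_tb)

def drives_raid5(usable_tb: float, drive_tb: float, data_per_group: int = 7) -> int:
    """RAID 5 in groups of `data_per_group` data drives plus one drive's worth of parity."""
    data_drives = math.ceil(usable_tb / drive_tb)
    parity_drives = math.ceil(data_drives / data_per_group)  # one per (possibly partial) group
    return data_drives + parity_drives

# Hypothetical example: 100 TB usable on 600 GB 15K FC drives vs 2 TB 7.2K SATA drives
fc = drives_raid10(100, 0.6)
sata = drives_raid5(100, 2.0)
print(fc, sata)                            # 334 58
print(f"{1 - sata / fc:.0%} fewer drives")  # 83% fewer drives
```

Whether the slower tier is acceptable is exactly what the performance measurements above are meant to validate before data is moved.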
The Primary Virtual Infrastructure Challenge
We have found greater than 90 percent of the VMware-related performance issues encountered by our customers are due to the storage tier.*
Performance Specialists
* http://www.vmware.com/files/pdf/Oracle_Databases_on_vSphere_Deployment_Tips.pdf
vCenter
Virtual Server Probe Software to Optimize VMware Performance and Consolidation Ratios
Deployment Options: Virtual Server Probe
Increase use of Virtual Servers for tier 1 applications
Collects over 100 vCenter metrics
Reduces the risk of implementing mission-critical applications
Increases consolidation ratios
Offers VM to LUN correlation
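VM-to-LUN correlation is essentially a join across two data sources: VM placement from the virtualization layer and per-LUN response times from the SAN side. A minimal Python sketch using plain dictionaries; the inventories, names and latencies are hypothetical and no vCenter API is called:

```python
from typing import Dict, List, Tuple

def vm_lun_latency(vm_datastore: Dict[str, str],
                   datastore_luns: Dict[str, List[str]],
                   lun_latency_ms: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Join VM -> datastore -> LUN and attach each LUN's latency; slowest rows first."""
    rows = []
    for vm, datastore in vm_datastore.items():
        for lun in datastore_luns.get(datastore, []):
            if lun in lun_latency_ms:  # skip LUNs with no SAN-side measurement
                rows.append((vm, lun, lun_latency_ms[lun]))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical inventory: VM placement as a hypervisor manager might report it,
# LUN latency as a SAN-side probe might measure it.
vms = {"web01": "ds1", "db01": "ds2"}
luns = {"ds1": ["lun10"], "ds2": ["lun20", "lun21"]}
latency = {"lun10": 2.5, "lun20": 14.0, "lun21": 3.0}
for vm, lun, ms in vm_lun_latency(vms, luns, latency):
    print(f"{vm:6} {lun:6} {ms:5.1f} ms")
```

Sorting slowest first puts the VM most affected by storage latency at the top, which is the view a virtualization admin and a SAN admin can both act on.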
Expanding VMware to Mission-critical Applications
Virtual Server Probe:
• Monitors CPU, memory & SAN utilization and I/O response time
• Identifies performance bottlenecks & recommends vMotion transfers
• Enables “what if” load balancing simulations
• Proves consolidation ratios can be improved without performance degradation
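A "what if" load-balancing simulation can be as simple as trying VM moves on a copy of the current placement and keeping those that lower the peak. A minimal Python sketch using a greedy heuristic chosen for illustration (it is not VMware DRS or VirtualWisdom's method); host names, VM names and the CPU demand figures are hypothetical:

```python
from typing import Dict, List, Tuple

def simulate_moves(hosts: Dict[str, Dict[str, float]],
                   max_moves: int = 5) -> List[Tuple[str, str, str]]:
    """Greedy what-if: repeatedly move the VM from the busiest host to the least busy host
    that most lowers the larger of the two hosts' loads. Works on a copy; returns (vm, from, to)."""
    placement = {host: dict(vms) for host, vms in hosts.items()}  # never touch the input
    moves: List[Tuple[str, str, str]] = []
    for _ in range(max_moves):
        totals = {host: sum(vms.values()) for host, vms in placement.items()}
        src = max(totals, key=totals.get)
        dst = min(totals, key=totals.get)
        best_vm, best_peak = None, totals[src]
        for vm, demand in placement[src].items():
            peak = max(totals[src] - demand, totals[dst] + demand)
            if peak < best_peak:
                best_vm, best_peak = vm, peak
        if best_vm is None or src == dst:
            break  # no single move improves the busiest/least-busy pair
        placement[dst][best_vm] = placement[src].pop(best_vm)
        moves.append((best_vm, src, dst))
    return moves

# Hypothetical CPU demand (% of one host) per VM
cluster = {"esx1": {"vmA": 40, "vmB": 30, "vmC": 20}, "esx2": {"vmD": 10}}
print(simulate_moves(cluster))  # [('vmA', 'esx1', 'esx2')]
```

Because each accepted move strictly lowers the larger of the two hosts' loads, the loop cannot cycle; `max_moves` simply caps the size of the proposal.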
• SOS-4-SANS Troubleshooting
• Infrastructure Audit & HealthCheck
• Infrastructure Optimization
Protocol Analyzer “On Demand”
Deployment Options: Services
Portable PAK: Performance Assessment Kit
Root cause problem resolution
Find/fix all physical-layer issues
Identify performance or availability issues
Optimize application performance
Detect application issues
Deployment Options: Complete Solution
Find problem root cause
Reduce trouble tickets
Avoid performance problems
Avoid over-provisioning
Optimize storage tiering
Reduce power and space requirements
Expand VMware deployments to business-critical applications
“What if” modeling for consolidation, moves, reconfigurations
SAN Availability Probe Software, TAPs, SAN Performance Probe Hardware, ProbeVM Software and Services
Key Challenges Addressed by VirtualWisdom
How VirtualWisdom® solves each challenge, with example customers:
Reduce troubleshooting time
• Acts as an automated storage admin that constantly looks for problem areas and alerts to issues
• Provides extensive correlation of metrics to application latency to quickly identify problem areas
Example customers: PayPal, Qualcomm, Unilever, US Customs & Border Protection, Siemens, Essilor, Barclays Card Services
Reduce performance problems
• Measures Exchange Completion Time to accurately assess the effect of SAN-related configuration changes, component health, and virtualization
Example customers: Bank of America, CareFirst, Lloyds Bank, Groupama, McKesson, NASA
Reduce downtime
• Physical layer access identifies opportunities to replace failing hardware components before they affect applications
Example customers: Blue Cross/Blue Shield, Logica, Bosch, SAIC, US Patent & Trademark Office
Identify costly over-provisioning
• Measures actual effect of cache to reduce over-buying
• Identifies under-utilized links
Example customers: AT&T, Fidelity, JPS Health Services, Pentagon, US Patent & Trademark Office
Identify failed multi-pathing
• Identifies not just improper configurations but actual failed paths, and which paths are unbalanced
Example customers: Bank of America, Bureau of Public Debt, Mayo Clinic, Raymond James Financial
Reduce risk of storage tiering
• Easily identifies tiering opportunities
• Validates that tiering meets performance SLAs
Example customers: Austrian Lotteries, PayPal, SuperValu
Increase use of Server Virtualization
• Full path (VM to LUN) visibility substantially reduces risk
• Improves data for better load balancing
• Improves consolidation ratios by correlating configuration changes with application latency
Example customers: AT&T, Fidelity, Groupama, JPS Health Services
The Tangible Benefits of Virtual Instruments
Operating Income Efficiency: Speed, efficiency
• Problem remediation – Faster Mean Time To Resolution (MTTR)
• Reduced risk from infrastructure changes accelerates new deployments
• Moves infrastructure management from reactive to proactive
• Problem avoidance through better predictive analysis
• Performance assurance enabled by SLAs
Capital Efficiency: Increased consolidation and storage tiering
• More aggressive use of Virtual Server on tier 1 applications
• Reduced over-provisioning of ports
• More aggressive tiering based on performance data
OPEX and CAPEX impact
Leading Healthcare Company
Trouble tickets - OPEX
“VirtualWisdom reduced our weekly trouble tickets. It also significantly reduced the time to troubleshoot those remaining tickets”
Leading Grocery Retailer
Storage Tiering - CAPEX
“By investing in the Virtual Instruments solution we were able to garner Tier 1 performance by utilizing Tier 2 storage”
Leading Financial Institution
Downtime - OPEX
“VirtualWisdom helps us be proactive instead of waiting for an application owner to contact us about an outage”
Leading Financial Services Company
Network Utilization - CAPEX
“We were able to identify unused capacity on our switches and improve port utilization. This avoided significant unnecessary expenditures”
Quantifying Realized Value: An Example
Sample Business Benefits Overview Inputs
Baseline: Current Environment
CAGR: Storage and storage array links: 45%
Current Storage - Total (PB): 10.0
Current Storage Ports: 1,024
Annual Trouble Ticket count: 1,500
Total Switch Ports: 7,200
Outages (Major / Minor)
Incidents in last year: 2 / 8
Business Impact per Hour: $100,000 / $25,000
Average Incident Duration (Hours): 8 / 4
Admin hours per incident: 120 / 20
Sample Summary of Projected Benefits (Year 1 / Year 2 / Year 3 / 3 Year Total)
Operating Expense Savings
Reduced Downtime (25%): $686,350 / $1,392,508 / $2,380,231 / $4,459,089
Reduced Trouble Tickets (50%): $623,520 / $771,936 / $961,968 / $2,357,424
Total OPEX Savings: $1,309,870 / $2,164,444 / $3,342,199 / $6,816,513
Capital Expense Savings
Reduced SAN Links (10%): $1,921,024 / $2,799,206 / $4,478,730 / $9,198,961
Storage Re-Tiering (25%): $3,750,000 / $4,500,000 / $6,125,000 / $14,375,000
Total CAPEX Savings: $5,671,024 / $7,299,206 / $10,603,730 / $23,573,961
Total Savings (OPEX + CAPEX): $6,980,894 / $9,463,651 / $13,945,929
3-YEAR TOTAL SAVINGS (OPEX + CAPEX): $30,390,474
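The subtotal rows in the sample summary are plain sums: OPEX savings are reduced downtime plus reduced trouble tickets, and CAPEX savings are reduced SAN links plus storage re-tiering. A short Python check that recomputes them from the yearly line items copied from the sample; the variable and function names are illustrative:

```python
# Yearly line items copied from the sample summary (US dollars)
line_items = {
    "Year 1": {"downtime": 686_350, "tickets": 623_520, "san_links": 1_921_024, "retiering": 3_750_000},
    "Year 2": {"downtime": 1_392_508, "tickets": 771_936, "san_links": 2_799_206, "retiering": 4_500_000},
    "Year 3": {"downtime": 2_380_231, "tickets": 961_968, "san_links": 4_478_730, "retiering": 6_125_000},
}

def rollup(items: dict) -> tuple:
    """OPEX = downtime + trouble tickets; CAPEX = SAN links + re-tiering."""
    opex = items["downtime"] + items["tickets"]
    capex = items["san_links"] + items["retiering"]
    return opex, capex, opex + capex

totals = {year: rollup(items) for year, items in line_items.items()}
for year, (opex, capex, total) in totals.items():
    print(f"{year}: OPEX ${opex:,}  CAPEX ${capex:,}  total ${total:,}")
print(f"3-year total: ${sum(t for _, _, t in totals.values()):,}")
```

Year 1 and Year 3 match the sample exactly. Year 2, the 3-year total and the 3-year SAN-links figure come out $1 lower than printed ($9,463,650, $30,390,473 and $9,198,960), which is consistent with the sample rounding unrounded line items, although the deck does not say so.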
• Increase virtual machine to physical server ratios
• Avoid unnecessary equipment expenditures
• Match application needs with storage tier
Benefits of VirtualWisdom - Summary
Reduce OPEX by
50%+
• Faster troubleshooting
• Fewer trouble tickets
• Less downtime
• Eliminate SAN over-provisioning
• Reduce maintenance fees
• Maintain and prove SLA compliance
• Provide auditable results for regulating agencies
• Reduce mis-configuration & incompatibilities
• Validate new technologies instantly
• Avoid project delays
• Immediately validate vendor claims
• Independent view of IT infrastructure compatibilities
• Fewer SAN devices & vendors
Reduce CAPEX by >25%
Improve SLA / Compliance
Accelerate IT Deployments
Reduce Risk
Customer Examples
Challenge
• Improve service levels, while reducing costs
• Maintain flat headcount, while growing data 50% / year
• Globally align IT by building consistent processes and capabilities
Solutions
• Implemented VIO solution across server & storage tiers
Results
OPEX and Productivity
• Reduced weekly trouble tickets by 75%
• Reduced backup performance troubleshooting from weeks to “instant”
• Quickly discovered opportunities to improve multi-pathing
“VirtualWisdom is not just about real-time monitoring, it’s also great for measuring trends, so we can see how our storage capacity changes over time, which allows us to manage future storage and other hardware requirements in a far more comprehensive manner. There has been a significant reduction in IT staff required to manage the SAN, allowing us to deploy them on other tasks.”
Paul Faid, Unilever
Customer Success Story
Challenge
• Increasing production virtual server deployments
• Application performance degradation
• Inability to agree on root causes between storage/server admins & vendors
• Additional storage capacity/bandwidth failed to resolve problems
Solutions
• Implemented VIO solution across server & storage tiers
Results
• Detection of VMware configuration problems
• Diagnosis of storage I/O latency
• Identification of overloaded “hot” ports
• Correlation between VMware vMotion and performance degradation
Leading US-based Stock Brokerage Firm
Customer Success Story
Infrastructure Maturity Model
Infrastructure Maturity Model (IMM) - Introduction
Management framework designed to systematically improve infrastructure business value through the continuous improvement of people, processes, and technology
◦ Reduce the risk of infrastructure-related application outages and performance degradation
◦ Enable business-aligned performance-based SLAs
◦ Provide the fundamental metrics required to design and scale a cost-effective storage infrastructure
◦ Progress measured in Key Process Areas via Key Performance Indicators (KPIs)
Maturity levels: 1 Initial, 2 Managed, 3 Defined, 4 Quantitatively Managed, 5 Optimizing
IMM - State of the Infrastructure
1 Initial (CHAOS): HELP! You don’t want to be here
• Current state
• Pre-engagement
2 Managed (REACTIVE): SAN is cleaned up … for a point-in-time
• Point-in-time stability
• Reactive to problems
• Physical layer issues addressed
3 Defined (PROACTIVE)
• Continuous stability
• Baseline measured
• Monitoring and alerting
• Proactively avoid problems
4 Quantitatively Managed (TACTICAL OPTIMIZATION): Operational Excellence
• Applications instrumented and well-understood
• Recurring application baselines
• Current infrastructure optimized
• Performance-based SLAs
5 Optimizing (STRATEGIC OPTIMIZATION): Business Impact Measured and IT Optimized
• Business-aligned design and architecture
• Tiered storage
• Technology assessments
• Optimized spending on future purchases
Maturity levels: 2 Managed, 3 Defined, 4 Quantitatively Managed, 5 Optimizing
• Asset use
• Port reduction
• Cabling reduction
• Switch reduction
• Storage target reduction
• More improved asset use
• SLAs lead to Tiering
• More efficient architectures used
• Highest resource efficiency
• Aggressive tiering
• Cross-functional strategic planning
• Reduced number of incidents
• Critical issues resolved
• Reduced number of incidents
• Reduced downtime
• Improved MTTR
• Personnel efficiency
• Reduced downtime
• Improved MTTR
• Consistent infrastructure performance
• Maximize admin efficiency
• Minimize downtime
• Optimized disk usage & purchases
IMM - Impact on OPEX and CAPEX
Chart: CAPEX savings and OPEX savings by maturity level (1 Initial to 5 Optimizing)
IMM – VI Professional Services Assistance
Rapid roll-out and customization of VirtualWisdom
• Baseline SAN and current performance; begin tracking Key Performance Indicators
• Custom dashboard configurations
• Establish and deploy custom alert thresholds
• Create and automate custom reports (performance, utilization trends, error conditions)
• Establish custom SLA-focused dashboards and reports
• Establish custom alerts based on SLAs
THANKS
www.virtualinstruments.com