6/11/2013
Do it Yourself (DIY) and Out-tasking (Vendor Supported) Resiliency
Finding the Right Balance
Ron Martel, IBM Business Continuity and Resiliency Services
June 18, 2013
The instrumented, interconnected and intelligent smarter planet
Smart is:
Maintaining continuous business and IT operations
while rapidly adapting to risks and opportunities
SMART IS: reducing cost through proactive incident response and reduced downtime
SMART IS: ensuring resilient service delivery in a 24/7 world
SMART IS: Responding with speed and agility while minimizing risk
SMART IS: managing risk with an enterprise-wide resilience strategy
IT Risk Management Strategies
Accept: Accept the risk. The business deems the risk acceptable. Approach: Do Nothing.
Mitigate: Mitigate the risk. A strategy is required and implemented to reduce the risk. Approach: Do It Yourself.
Transfer: Transfer the risk to another entity (e.g., insurance or managed services). Approach: Out-Task.
Do It Yourself or Out-Task
IBM commissioned a study on IT’s impact on reputational risk.
#1 IT risks have a major impact on a company’s reputation.
#2 Companies have rising IT risk concerns related to emerging technology trends.
#3 Companies are integrating IT risk and reputational risk management.
“IT and reputational risk management and mitigation are… key success factors of our business and must be given due emphasis.”
C-level executive, Malaysian agriculture and agribusiness company
Unlike RTO, “reputation recovery” is measured in months
Time needed to restore reputation after each type of IT risk event (share of responses)
Risk event                       0-6 months   6-12 months   12+ months
Website outage                   78%          14%           8%
System failure                   72%          17%           10%
Workforce mobility               71%          18%           11%
Data loss                        70%          17%           12%
Inadequate continuity plans      65%          21%           13%
Insufficient DR measures         63%          24%           12%
New technology                   64%          18%           18%
Data breach                      65%          19%           16%
Compliance failure               64%          22%           14%
Poor IT skills / tech support    64%          22%           14%
Balancing Costs and Risks
The right risk for the right price
[Figure: Resilience optimization. Costs from risk events, costs of mitigation solutions, and total costs associated with risk and mitigation, plotted against level of resilience (lower to higher); the optimum resilience risk balance is where total cost is lowest.]
Potential financial elements: avoiding loss; high-risk capital allocation position; maintaining credit rating; avoiding fines and penalties; maintaining customer confidence; maintaining social responsibility; avoiding costs.
Potential mitigation elements: IT resilience architecture; IT service delivery topology; people and processes; workplace strategy; data and information protection; regulatory compliance.
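The cost and risk trade-off on this slide can be made concrete with a small calculation: for each resilience level, add the expected cost from risk events to the cost of mitigation and keep the level with the lowest total. The Python sketch below is illustrative only; the two cost curves (risk cost halving per level, mitigation rising linearly) and all function names are hypothetical assumptions, not figures from the study.

```python
"""Sketch: find the resilience level with the lowest total cost.

Both cost curves below are hypothetical placeholders, not data from the deck.
"""
from typing import Callable, Iterable, Tuple

CostCurve = Callable[[int], float]


def total_cost(level: int, risk_cost: CostCurve, mitigation_cost: CostCurve) -> float:
    """Total cost associated with risk and mitigation at one resilience level."""
    return risk_cost(level) + mitigation_cost(level)


def optimum_resilience(
    levels: Iterable[int], risk_cost: CostCurve, mitigation_cost: CostCurve
) -> Tuple[int, float]:
    """Return (level, total_cost) for the level with the lowest total cost."""
    best = min(levels, key=lambda lv: total_cost(lv, risk_cost, mitigation_cost))
    return best, total_cost(best, risk_cost, mitigation_cost)


def expected_risk_cost(level: int) -> float:
    # Hypothetical: expected cost from risk events halves with each level.
    return 1000.0 / 2 ** level


def mitigation_spend(level: int) -> float:
    # Hypothetical: cost of mitigation solutions rises linearly with level.
    return 80.0 * level


if __name__ == "__main__":
    level, cost = optimum_resilience(range(7), expected_risk_cost, mitigation_spend)
    print(level, cost)  # 3 365.0
```

With these invented curves, each level below 3 saves more in expected losses than it adds in mitigation spend, and each level above 3 does the reverse, which is the shape of the total-cost curve the slide describes.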
Critical Factors for Comparative Analysis:
1. RTO/RPO
2. Control
3. Reputational Risk
4. Capabilities and Techniques
5. Testing Capabilities
6. Process and Procedures
7. Change Management
8. Program Management
9. Monitoring and Management
10. Staffing
11. Costs
12. Complexity
Primary Cost Elements
1. Technology:
– Hardware, software, maintenance
– Purchased, leased or subscription
2. Facilities:
– Acquisition, build-out and maintenance
– Opportunity cost avoidance
3. Network:
– WAN, LAN, Internet, replication
– On demand vs. dedicated
4. Staff Support:
– Design and implementation
– Ongoing support
The Fundamental Economics of Resiliency
The tradeoff between risk and cost
[Figure: cost ($ to $$$$$) plotted against Recovery Time Objective (minutes, hours, days, weeks) for Active-Active HA Cluster, Dedicated Servers w/Data Mirroring, Vendor HotSite Servers w/Data Mirroring, Vendor HotSite w/VTL, Vendor HotSite, ColdSite and QuickShip, with Automated Restoration marked. Annotation, PERCEPTION: Dedicated = low RTO. Inset comparing DIY and Vendor: Dedicated (N/A); Vendor HotSite (N/A).]
The Fundamental Economics of Resiliency
[Figure: the same cost vs. Recovery Time Objective chart, annotated REALITY: RTO = data transfer + server restore. Inset comparing DIY and Vendor: Dedicated; Vendor HotSite (Limited).]
57% of respondents source an equal mix of “in-house” and out-tasked DR solutions*
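The REALITY annotation (RTO = data transfer + server restore) lends itself to a back-of-the-envelope estimate. The Python sketch below is an assumption-laden illustration rather than a sizing method from the deck: the function names are hypothetical, throughput is assumed constant, units are decimal (1 TB = 1,000,000 MB), and the example inputs are invented.

```python
"""Sketch: estimate RTO as data transfer time plus server restore time.

Assumes a single sustained throughput and decimal units (1 TB = 1,000,000 MB);
ignores verification, application restart and network contention.
"""


def transfer_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move data_tb terabytes at a sustained MB/s rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_tb * 1_000_000 / throughput_mb_per_s
    return seconds / 3600


def estimated_rto_hours(
    data_tb: float, throughput_mb_per_s: float, server_restore_hours: float
) -> float:
    """RTO approximated as data transfer plus server restore."""
    return transfer_hours(data_tb, throughput_mb_per_s) + server_restore_hours


if __name__ == "__main__":
    # Hypothetical example: 9 TB moved at 250 MB/s, 4 hours to rebuild servers.
    print(estimated_rto_hours(9, 250, 4))  # 14.0
```

In this model, doubling throughput halves only the transfer term; the server restore time stays as a floor. That is one way to read the slide's point that owning dedicated assets does not by itself produce a low RTO.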
Comparative Analysis
Criteria compared across Dedicated Assets, Hybrid and Vendor HotSite:
RTO: Dedicated Assets < 24 hours; Hybrid 1-3 days; Vendor HotSite 3-7 days
RPO: Dedicated Assets < 1 hour; Hybrid < 1 hour; Vendor HotSite 24 hours
Pro:
• Dedicated Assets: shortest RTO; shortest RPO; lowest risk; no tape media issues; most control of environment; access to production staff; most flexible test schedules
• Hybrid: most aggressive RPO; scalable only to server inventory; low risk; no tape media issues; 100% success rate
• Vendor HotSite: lowest cost; scalable only to inventory; 100% success rate
Con:
• Dedicated Assets: highest cost; requires most discipline
• Hybrid: compete for shared assets; difficult to schedule tests; scalable only to server inventory
• Vendor HotSite: tape restore issues; compete for all assets; most difficult to schedule tests; longest RTO and RPO
Technology:
• Dedicated Assets: dedicated technology; dedicated network; dedicated facility; CapEx
• Hybrid: dedicated disk; subscription servers; dedicated replication network; on-demand MPLS WAN
• Vendor HotSite: equipment assigned at event; production network on demand; scalable within vendor inventory; OpEx
Continuity Issues:
• Dedicated Assets: maximum control; most discipline required
• Hybrid: maximum control of data; some flexibility
• Vendor HotSite: least control; least flexibility
Fees: Dedicated Assets $$$$$; Hybrid $$$; Vendor HotSite $
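One way to use the Comparative Analysis is as a lookup: keep only the options whose worst-case RTO and RPO meet the business targets, then take the one with the lowest fee. The Python sketch below encodes the table under stated assumptions: each range is reduced to its upper end ("< 24 hours" as 24, "1-3 days" as 72, "3-7 days" as 168 hours), the count of "$" signs stands in for relative fee, and the class and function names are hypothetical.

```python
"""Sketch: pick the lowest-fee recovery option that meets RTO/RPO targets.

Values come from the Comparative Analysis table; reading each range's upper end
as the worst case and the count of "$" signs as relative fee are assumptions.
"""
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    worst_rto_hours: float  # upper end of the table's RTO range
    worst_rpo_hours: float  # upper end of the table's RPO range
    relative_fee: int       # number of "$" signs in the Fees row


OPTIONS = (
    RecoveryOption("Dedicated Assets", 24, 1, 5),
    RecoveryOption("Hybrid", 72, 1, 3),
    RecoveryOption("Vendor HotSite", 168, 24, 1),
)


def cheapest_option(
    required_rto_hours: float,
    required_rpo_hours: float,
    options: Sequence[RecoveryOption] = OPTIONS,
) -> Optional[RecoveryOption]:
    """Return the cheapest option whose worst case meets both targets, or None."""
    eligible = [
        o
        for o in options
        if o.worst_rto_hours <= required_rto_hours
        and o.worst_rpo_hours <= required_rpo_hours
    ]
    return min(eligible, key=lambda o: o.relative_fee) if eligible else None


if __name__ == "__main__":
    for rto, rpo in [(24, 1), (72, 1), (168, 24), (8, 1)]:
        choice = cheapest_option(rto, rpo)
        print(rto, rpo, choice.name if choice else "no option in the table")
```

An 8-hour RTO target returns None here because no column in the table promises better than "< 24 hours"; meeting it would require adding a tier the table does not list, with its own fee assumption.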
[Figure: cost plotted against Recovery Time Objective for three tiers: Mirrored Datacenter, 4-8 hour RTO; Data Mirrored Hot-Site, 24-36 hour RTO; Tape Based Hot-Site, 48-96 hour RTO.]
What is the maturity of your Resilience Program and where do you want it to be?
[Figure: tolerance for risk (high, medium, low) plotted against maturity level, with five levels: Unfocused, Aware, Capable, Mature, World Class.]
Characteristics shown across the maturity levels:
• Knowledgeable about disaster recovery and high availability.
• DR plans documented.
• Recovery strategies to achieve required recovery objectives.
• Reliant on various restore techniques.
• Recovery requirements are clearly understood.
• Resiliency program in place.
• Comprehensive strategies and plans.
• Regular and effective testing.
• Executive support.
• Crisis management process.
• Meets customer expectations.
• Governance model in place.
• Strategy is used as a competitive advantage.
• Validated and tested in an integrated manner.
• Effective advanced recovery solutions implemented for top tier.
• Poorly defined.
• No processes developed or followed.
• Poor change management between production and recovery.
Each maturity level has different characteristics, techniques and approaches.
Recovery Options
1. Dedicated Assets:
a. Implement data mirroring.
b. Engineer automated server and data restoration techniques.
2. Hybrid Approach (Dedicated and Subscription Assets):
a. Replace dedicated servers with subscription assets.
b. Implement a combination of data restoration techniques.
3. Subscription Assets (e.g., IBM BCRS):
a. Utilize subscription disk and server assets.
b. Implement tape restoration and/or appliances.
Option #1: Dedicated Assets
[Diagram: Production Data Center and Recovery Data Center]
• All assets, including facility, technology and network, dedicated to recovery.
• Potential for the most effective resiliency infrastructure.
• Identify facility capabilities to support space and power requirements.
• Opportunity to reduce cost by repurposing non-production (T&D, QA) assets.
• Opportunity cost to repurpose or dispose of existing dedicated facilities.
• Most costly approach.
Option #2: Hybrid Approach
• Balanced use of dedicated and subscription-based assets.
• Reduce RTO/RPO by applying dedicated assets to selected “choke points.”
• Assets (dedicated and subscription) are OpEx rather than CapEx.
• Reduce costs by repurposing selected dedicated assets to the recovery site.
• Opportunity cost of eliminating the current dedicated facility.
• Optimal use of client and vendor resources balances cost vs. risk.
[Diagram: Production Data Center and Recovery Data Center]
Option #3: Subscription Assets
• All assets, including servers, disk and network, are subscription-based.
• Must carefully plan for tape backup and restoration events.
• Assets and services are OpEx rather than CapEx.
• Opportunity cost of eliminating the current dedicated facility.
• Potential for an industry-acceptable solution.
• Scalable to the other alternatives.
• Most cost-effective approach.
[Diagram: Production Data Center and Recovery Data Center]
Other Common Recovery Topologies
[Diagrams: Reciprocal Recovery; Split Production; Local DR/HA plus Out-of-Region DR; Star Recovery]
Workload Balanced Data Centers
True Active-Active:
• High availability: in-region proximity, because of latency (covers component failure).
• Failover implementation: in-region HA clustering, GDPS, etc.
• Out of region: read-only, low-volatility or specific order-entry workloads.
• Requires sufficient capacity for both workloads.
Workload balance     Data Center A     Data Center B
By application       Appls #1-50       Appls #51-100
By region            West              East
By platform          Mainframe         Open Systems
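The "By application" split and the "requires sufficient capacity for both workloads" bullet can be checked together: each data center should be able to run the combined workload if its partner fails. The Python sketch below assumes the 1-50 / 51-100 split from the table; the function names and the capacity and load figures are hypothetical.

```python
"""Sketch: split applications across two active-active data centers and
check the "sufficient capacity for both workloads" requirement.

The 1-50 / 51-100 split mirrors the "By application" row; capacity and load
figures are hypothetical units.
"""
from typing import Dict


def assign_by_application(app_id: int) -> str:
    """Route application #1-50 to Data Center A and #51-100 to Data Center B."""
    if not 1 <= app_id <= 100:
        raise ValueError("app_id must be between 1 and 100")
    return "A" if app_id <= 50 else "B"


def can_absorb_failover(capacity: Dict[str, float], load: Dict[str, float]) -> Dict[str, bool]:
    """For each center, report whether it could run both workloads alone."""
    combined = sum(load.values())
    return {dc: capacity[dc] >= combined for dc in capacity}


if __name__ == "__main__":
    load = {"A": 60.0, "B": 50.0}        # hypothetical current workload per center
    capacity = {"A": 120.0, "B": 100.0}  # hypothetical installed capacity
    print(assign_by_application(42), assign_by_application(77))  # A B
    print(can_absorb_failover(capacity, load))  # {'A': True, 'B': False}
```

In this invented example, Data Center B could not carry both workloads after losing A. That is the gap the capacity bullet warns about, and the same check applies to the reciprocal and asymmetrical arrangements in the case studies.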
Asymmetrical Data Centers:
• Reciprocal recovery between two asymmetrical data centers.
• One center had insufficient capacity and facilities to accommodate the primary center.
• The more balanced the workloads, the more effective reciprocal centers can be.
Platform Recovery/Critical/Overspend:
• Recovery technique defined by platform, not by business process.
• Other platforms and tiers unaffordable and unachievable with given resources.
• All platforms and tiers recovered with dedicated resources and mirroring.
Recovery in the Cloud:
• Niche market, similar to production.
• Limited by platform.
• Limited by capacity.
• Limited by process.
• Growing in acceptance.
Myth Busters:
• A free data center is not free.
• Excess server capacity is not free.
• Quick Ship is not necessarily quick.
• Repurposing T&D has consequences.
• Cascading limits production capabilities.
Case Studies
Reasons firms brought DR in-house
“If you brought all or part of your DR in-house in the past five years, what was the primary reason?”
Base: 75 enterprise hardware decision makers in the US, UK, and India
Source: a commissioned study conducted by Forrester Consulting on behalf of IBM, December 2012
Organizations have brought DR “in-house” but…
• are not sure they could respond to a real disaster
• of firms face a lack of focus on in-house DR relative to other IT projects
• of firms struggle against lack of funding to keep DR infrastructure up to date
• of do-it-yourselfers have trouble running enough DR tests and exercises
• of do-it-yourselfers lack adequate in-house DR skills
Source: The Risks of “Do It Yourself” Disaster Recovery, a commissioned study conducted by Forrester Consulting on behalf of IBM, January 2013
Business Resilience requires a multi-faceted approach
[Diagram elements: Governance; Program Execution; Business Justification; Solution Design; Recovery Design; Applications; Facilities; Security; Systems Management; Production Data Center; Recovery Data Center]
IT Risk Management Strategies
Accept: Accept the risk. The business deems the risk acceptable. Approach: Do Nothing.
Mitigate: Mitigate the risk. A strategy is required and implemented to reduce the risk. Approach: Do It Yourself.
Transfer: Transfer the risk to another entity (e.g., insurance or managed services). Approach: Out-Task.
Do It Yourself and/or Out-Task
Thank You