
Page 1: Do it Yourself (DIY) and - Amazon Web Services... · 2016-06-14 · 6/11/2013 1 Do it Yourself (DIY) and Out‐tasking (Vendor Supported) Resiliency Finding the Right Balance Ron

6/11/2013

1

Do it Yourself (DIY) and Out‐tasking (Vendor Supported) Resiliency

Finding the Right Balance

Ron Martel
IBM Business Continuity and Resiliency Services

June 18, 2013

The instrumented, interconnected and intelligent smarter planet

Smart is: Maintaining continuous business and IT operations while rapidly adapting to risks and opportunities

SMART IS: reducing cost through proactive incident response and reduced downtime

SMART IS: ensuring resilient service delivery in a 24/7 world

SMART IS: Responding with speed and agility while minimizing risk

SMART IS: managing risk with an enterprise-wide resilience strategy

SMART IS: reducing cost by reducing downtime thru proactive incident response

Page 2

IT Risk Management Strategies

Accept (Do Nothing)
Accept the risk: deemed acceptable to the business to accept the risk.

or

Mitigate (Do it Yourself)
Mitigate the risk: strategy required and implemented to reduce risks.

or

Transfer (Out-Task)
Transfer the risk: transfer to another entity (i.e. insurance or managed services).

IBM commissioned a study on IT’s impact to reputational risk.

#1 IT risks have a major impact on a company’s reputation

#2 Companies have rising IT risk concerns related to emerging technology trends 

#3 Companies are integrating IT risk and reputational risk management

“IT and reputational risk management and mitigation are… key success factors of our business and must be given due emphasis.”

C-level executive, Malaysian agriculture and agribusiness company

Page 3

Unlike RTO, “reputation recovery” is measured in months

                                0-6 months   6-12 months   12+ months
Website outage                  78%          14%           8%
System failure                  72%          17%           10%
Workforce mobility              71%          18%           11%
Data loss                       70%          17%           12%
Inadequate continuity plans     65%          21%           13%
Insufficient DR measures        63%          24%           12%
New technology                  64%          18%           18%
Data breach                     65%          19%           16%
Compliance failure              64%          22%           14%
Poor IT skills / tech support   64%          22%           14%

Balancing Costs and Risks: The right risk for the right price

[Chart: costs plotted against level of resilience (lower to higher). Curves: costs from risk events, costs of mitigation solutions, and total costs associated with risk and mitigation. Marked point: optimum resilience risk balance (resilience optimization).]

Potential financial elements
• Avoiding loss
• High-risk capital allocation position
• Maintaining credit rating
• Avoiding fines and penalties
• Maintaining customer confidence
• Maintaining social responsibility
• Avoiding costs

Potential mitigation elements
• IT resilience architecture
• IT service delivery topology
• People and processes
• Workplace strategy
• Data and information protection
• Regulatory compliance
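
The balance between the cost of risk events and the cost of mitigation can be sketched numerically. The minimal Python sketch below is illustrative only: both cost curves are hypothetical assumptions (neither comes from the presentation), chosen so that expected loss falls and mitigation spend rises as the resilience level increases. It scans a range of resilience levels and reports the one with the lowest total cost.

```python
# Illustrative only: both cost curves are assumptions, not data from the slides.

def risk_event_cost(level: int) -> float:
    """Hypothetical expected annual loss; shrinks as the resilience level rises."""
    return 1000.0 / (1 + level)


def mitigation_cost(level: int) -> float:
    """Hypothetical annual mitigation spend; grows with the resilience level."""
    return 40.0 * level


def total_cost(level: int) -> float:
    """Total cost associated with risk and mitigation at a given level."""
    return risk_event_cost(level) + mitigation_cost(level)


def optimum_level(levels: range) -> int:
    """Return the resilience level with the lowest total cost."""
    return min(levels, key=total_cost)


if __name__ == "__main__":
    best = optimum_level(range(0, 11))
    print(f"optimum level: {best}, total cost: {total_cost(best):.2f}")
```

With real figures, the two functions would be replaced by an organization's own loss estimates and solution pricing; the shape of the search stays the same.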

Page 4

Critical Factors for Comparative Analysis:

1. RTO/RPO
2. Control
3. Reputational Risk
4. Capabilities and Techniques
5. Testing Capabilities
6. Process and Procedures
7. Change Management
8. Program Management
9. Monitoring and Management
10. Staffing
11. Costs
12. Complexity

Primary Cost Elements

1. Technology:
– Hardware, Software, Maintenance
– Purchased, Leased or Subscription

2. Facilities:
– Acquisition, Build out and Maintenance
– Opportunity Cost Avoidance

3. Network:
– WAN, LAN, Internet, Replication
– On Demand vs. Dedicated

4. Staff Support:
– Design and Implementation
– Ongoing Support

Page 5

The Fundamental Economics of Resiliency

The tradeoff between risk and cost

[Chart: Cost ($ to $$$$$) versus Recovery Time Objectives (RTO: Minutes, Hours, Days, Weeks). Options plotted: Active-Active HA Cluster; Dedicated Servers w/Data Mirroring; Vendor HotSite Servers w/Data Mirroring; Vendor HotSite w/VTL; Vendor HotSite; QuickShip; ColdSite. Annotation: Automated Restoration.]

PERCEPTION: Dedicated = Lo RTO

DIY Vendor

Dedicated N/A

Vendor HotSite N/A

The Fundamental Economics of Resiliency

[Chart: Cost ($ to $$$$$) versus Recovery Time Objectives (RTO: Minutes, Hours, Days, Weeks). Options plotted: Active-Active HA Cluster; Dedicated Servers w/Data Mirroring; Vendor HotSite Servers w/Data Mirroring; Vendor HotSite w/VTL; Vendor HotSite; QuickShip; ColdSite. Annotation: Automated Restoration.]

REALITY: RTO = Data Xfer and Server Restore

DIY Vendor

Dedicated

Vendor HotSite Limited
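
The "reality" relationship, RTO = data transfer plus server restore, lends itself to a quick worked estimate. The sketch below is a simplification under stated assumptions: the function name, the parameters and the sample figures are all hypothetical, and it assumes the two steps run one after the other rather than in parallel.

```python
def estimated_rto_hours(data_gb: float,
                        throughput_gb_per_hour: float,
                        server_restore_hours: float) -> float:
    """Rough RTO estimate: time to move the data plus time to restore servers.

    All inputs are hypothetical planning figures, not values from the slides.
    """
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    data_transfer_hours = data_gb / throughput_gb_per_hour
    return data_transfer_hours + server_restore_hours


if __name__ == "__main__":
    # Tape or network restore of 10,000 GB at 500 GB/hour, then 8 hours of server rebuild.
    print(estimated_rto_hours(10_000, 500, 8))
    # Data already mirrored at the recovery site: only the server restore remains.
    print(estimated_rto_hours(0, 500, 8))
```

The second call illustrates why mirroring alone does not guarantee a low RTO: the server restore time still applies whether the servers are dedicated or vendor supplied.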

57% of respondents source an equal mix of “in-house” and out-tasked DR solutions*

Page 6

Comparative Analysis

Criteria   Dedicated Assets   Hybrid       Vendor HotSite
RTO        < 24 Hours         1-3 Days     3-7 Days
RPO        < 1 Hour           < 1 Hour     24 Hours
Fees       $$$$$              $$$          $

Pro

Dedicated Assets:
• Shortest RTO
• Shortest RPO
• Lowest Risk
• No tape media issues
• Most Control of Environment
• Access to Production Staff
• Most Flexible Test Schedules

Hybrid:
• Most aggressive RPO
• Scalable only to server inventory
• Low Risk
• No tape media issues
• 100% success rate

Vendor HotSite:
• Lowest Cost
• Scalable only to inventory
• 100% success rate

Con

Dedicated Assets:
• Highest Cost
• Requires most discipline

Hybrid:
• Compete for shared assets
• Difficult to schedule tests
• Scalable only to server inventory

Vendor HotSite:
• Tape restore issues
• Compete for all assets
• Most difficult to schedule tests
• Longest RTO & RPO

Technology

Dedicated Assets:
• Dedicated Technology
• Dedicated Network
• Dedicated Facility
• CapEx

Hybrid:
• Dedicated Disk
• Subscription Servers
• Dedicated Replication Network
• On Demand MPLS WAN

Vendor HotSite:
• Equipment assigned at Event
• Production network on demand
• Scalable w/i vendor inventory
• OpEx

Continuity Issues

Dedicated Assets:
• Maximum control
• Most discipline required

Hybrid:
• Maximum control of data
• Some flexibility

Vendor HotSite:
• Least control
• Least flexibility
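
One way to use the comparison above is to filter the options by required RTO and RPO and then pick the least expensive one that qualifies. The sketch below is a hypothetical illustration: the `Option` type and `cheapest_option` function are invented for this example, each RTO and RPO is taken as the upper bound of the range shown on the slide (for example, "1-3 Days" becomes 72 hours), and the fee rank is simply the number of "$" signs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    name: str
    rto_hours: float  # upper bound of the RTO range on the slide (assumption)
    rpo_hours: float  # upper bound of the RPO on the slide (assumption)
    fee_rank: int     # number of "$" signs on the slide, used as a relative cost


OPTIONS = [
    Option("Dedicated Assets", rto_hours=24, rpo_hours=1, fee_rank=5),
    Option("Hybrid", rto_hours=72, rpo_hours=1, fee_rank=3),
    Option("Vendor HotSite", rto_hours=168, rpo_hours=24, fee_rank=1),
]


def cheapest_option(required_rto_hours: float,
                    required_rpo_hours: float) -> Optional[Option]:
    """Return the lowest-fee option that meets both objectives, or None."""
    candidates = [
        o for o in OPTIONS
        if o.rto_hours <= required_rto_hours and o.rpo_hours <= required_rpo_hours
    ]
    return min(candidates, key=lambda o: o.fee_rank, default=None)


if __name__ == "__main__":
    choice = cheapest_option(required_rto_hours=96, required_rpo_hours=4)
    print(choice.name if choice else "no option meets the objectives")
```

A real selection would also weigh the other criteria in the comparison (control, testing flexibility, tape restore risk); this sketch covers only the three quantitative rows.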

[Chart: Cost versus Recovery Time Objective. Mirrored Datacenter: 4-8 Hour RTO; Data Mirrored Hot-Site: 24-36 Hour RTO; Tape Based Hot-Site: 48-96 Hour RTO.]

What is the maturity of your Resilience Program and where do you want it to be?

Each maturity level has different characteristics, techniques and approaches.

[Chart: Tolerance for Risk (High, Medium, Low) versus Maturity Level (Unfocused, Aware, Capable, Mature, World Class)]

• Poorly defined.
• No processes developed or followed.
• Poor change management between production & recovery.

• Knowledgeable about disaster recovery and high availability.
• DR plans documented.
• Recovery strategies to achieve required recovery objectives.
• Reliant on various restore techniques.

• Recovery requirements are clearly understood.
• Resiliency Program in place.
• Comprehensive strategies and plans.
• Regular and effective testing.

• Executive Support
• Crisis Management Process
• Meet Customer Expectations
• Governance model in place
• Strategy is used as competitive advantage
• Validated and Tested in an integrated manner.
• Effective Advanced Recovery solutions implemented for top tier.

Page 7

Recovery Options

1. Dedicated Assets:
a. Implement data mirroring
b. Engineer automated server and data restoration techniques

2. Hybrid Approach (Dedicated and Subscription Assets):
a. Replace dedicated servers with subscription assets
b. Implement combination of data restoration techniques

3. Subscription Assets (i.e. IBM BCRS):
a. Utilize subscription disk and server assets
b. Implement tape restoration and/or appliances

Option #1: Dedicated Assets

[Diagram: Production Data Center and Recovery Data Center]

• All assets, including facility, technology and network, dedicated to recovery.
• Potential for most effective resiliency infrastructure.
• Identify facility capabilities to support space and power requirements.
• Opportunity to reduce cost by repurposing non‐production (T&D, QA) assets.
• Opportunity cost to repurpose or dispose of existing dedicated facilities.
• Most costly approach.

Page 8

Option #2: Hybrid Approach

[Diagram: Production Data Center and Recovery Data Center]

• Balanced use of dedicated and subscription based assets.
• Reduce RTO/RPO by applying dedicated assets to selected “choke points.”
• Assets (dedicated and subscription) are OpEx rather than CapEx.
• Reduce costs by repurposing selected dedicated assets to recovery site.
• Opportunity cost of eliminating current dedicated facility.
• Optimal use of client and vendor resources balances cost vs. risk.

Option #3: Subscription Assets

[Diagram: Production Data Center and Recovery Data Center]

• All assets, including servers, disk and network, are subscription based.
• Must carefully plan for tape backup and restoration events.
• Assets and services are OpEx rather than CapEx.
• Opportunity cost of eliminating current dedicated facility.
• Potential for industry acceptable solution.
• Scalable to the other alternatives.
• Most cost effective approach.

Page 9

Other Common Recovery Topologies

• Reciprocal Recovery
• Split Production
• Local DR/HA plus Out of Region DR
• Star Recovery

Workload Balanced Data Centers

True Active‐Active:
• High Availability: In region proximity due to latency (component failure)
• Failover implementation: In region HA clustering, GDPS, etc.
• Out of Region: Read only, low volatility or specific order entry
• Requires sufficient capacity for both workloads

Workload Balance   Data Center A   Data Center B
By Application     Appls #1‐50     Appls #51‐100
By Region          West            East
By Platform        Mainframe       Open Systems
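
The requirement that each site hold "sufficient capacity for both workloads" can be checked with simple arithmetic. The sketch below is a hypothetical illustration: the function name, the site labels and the capacity units are assumptions, and it treats capacity as a single number per site, which real sizing exercises would break down by platform and tier.

```python
def failover_headroom(capacity: dict[str, float],
                      load: dict[str, float]) -> dict[str, bool]:
    """For each site, report whether it could carry the combined load of all sites.

    Capacity and load use the same arbitrary unit (an assumption of this sketch).
    """
    total_load = sum(load.values())
    return {site: cap >= total_load for site, cap in capacity.items()}


if __name__ == "__main__":
    capacity = {"Data Center A": 100.0, "Data Center B": 60.0}
    load = {"Data Center A": 40.0, "Data Center B": 30.0}
    for site, ok in failover_headroom(capacity, load).items():
        print(site, "can absorb both workloads" if ok else "cannot absorb both workloads")
```

In this sample, only the larger site can take over the full load, which is the asymmetry that limits reciprocal recovery between unevenly sized data centers.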

Page 10

Asymmetrical Data Centers:
• Reciprocal Recovery between two asymmetrical Data Centers.
• One center had insufficient capacity and facilities to accommodate primary center.
• The more balanced the workloads, the more effective reciprocal centers can be.

Platform Recovery/Critical/Overspend:
• Referenced recovery technique for platform, not business processes.
• Other platforms and tiers unaffordable and unachievable with given resources.
• All platforms and tiers recovered with dedicated resources and mirroring.

Recovery in the Cloud:
• Niche market, similar to production.
• Limited by platform.
• Limited by capacity.
• Limited by process.
• Growing in acceptance.

Myth Busters:
• Free data center is not free.
• Excess server capacity is not free.
• Quick Ship is not necessarily quick.
• Repurposing T&D has consequences.
• Cascading limits production capabilities.

Case Studies

Reasons firms brought DR in‐house

Source: a commissioned study conducted by Forrester Consulting on behalf of IBM, December, 2012

“If you brought all or part of your DR in-house in the past five years, what was the primary reason?”

Base: 75 Enterprise Hardware decision makers in the US, UK, and India

Page 11

Organizations have brought DR “in‐house” but…

are not sure they could respond to a real disaster

of firms face a lack of focus on in-house DR relative to other IT projects

of firms struggle against lack of funding to keep DR infrastructure up to date

of do-it-yourselfers have trouble running enough DR tests and exercises

of do-it-yourselfers lack adequate in-house DR skills

Source: The Risks of “Do It Yourself” Disaster Recovery, a commissioned study conducted by Forrester Consulting on behalf of IBM, January 2013

Business Resilience requires a multi‐faceted approach

Governance

Program Execution

Applications

Facilities

Security

Systems Management

Solution Design

Business Justification

Recovery Design

Production Data Center

Recovery Data Center

Page 12

IT Risk Management Strategies

Accept (Do Nothing)
Accept the risk: deemed acceptable to the business to accept the risk.

Mitigate (Do it Yourself)
Mitigate the risk: strategy required and implemented to reduce risks.

And/Or

Transfer (Out-Task)
Transfer the risk: transfer to another entity (i.e. insurance or managed services).


Page 13

Thank You

Merci (French)
Grazie (Italian)
Gracias (Spanish)
Obrigado (Portuguese)
Danke (German)
Multumesc (Romanian)
Teşekkür ederim (Turkish)