
http://www.trianz.com/cloud

Disaster Recovery on Cloud

Disaster Recovery (DR) of on-premise workloads is emerging as one of the key use cases for Public Clouds such as AWS and Azure. Traditional models of deploying DR have often proven inadequate and have left many Enterprises vulnerable to outages.

DR Outages

The primary reasons for these outages generally follow a familiar pattern: hardware or network failure, errors during upgrades or patching, human error, application error, power outages and a few others.

DR state of the industry

Yet statistics from vendor surveys published on the web indicate the following:

Roughly 40-60% of organizations fail to meet their service availability goals

Roughly 50% of organizations are "underprepared" to recover services in the event of a disaster

Nearly 50% of organizations still use Backup (and Restore) as the primary (and often ONLY) means of DR readiness

Top challenges in meeting availability goals include budget constraints, insufficient IT resources and lack of in-house expertise

Even in organizations where DR has been implemented, DR testing is often inadequate because of insufficient testing resources, complex processes or its treatment as a non-priority task

Adopting DR on Public Cloud

Given the above scenario, it is not surprising that many organizations are willing to consider the use of Public Cloud for DR. Adopting DR on Public Cloud offers multiple advantages over traditional DR (see Fig 1).

Fig 1: Traditional DR vs DR on Public Cloud

Traditional DR | DR on Public Cloud
Pre-provisioned Capex (high costs) | On-demand (Opex) leads to lower costs (up to 70% savings potential)
Trade-offs between full-scale and limited-scale: usually only highly critical systems and 25%-50% of Production capacity are DR enabled | Can start small and auto scale; more workloads can be brought under DR
Requires careful planning for DR testing/drills | DR can be tested quickly
Building DR Geo diversity needs planning | DR on another Geo is a given
Requires a lot more work to automate recovery | Easier to automate recovery

Preparing for DR using Public Cloud

Understanding RPO/RTO


[Figure: timeline around the disaster event. RPO covers the data lost before the event; RTO covers the service downtime after it.]
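The two intervals in Fig 2 can be measured from three timestamps recorded during a drill or a real incident. The sketch below is only an illustration: the function names and the sample timestamps are assumptions, and the 15-minute target is borrowed from the sub-15 mins example mentioned elsewhere in this document.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, disaster_time: datetime) -> timedelta:
    """Data-loss window: time between the last usable copy and the disaster."""
    return disaster_time - last_recovery_point


def achieved_rto(disaster_time: datetime, service_restored: datetime) -> timedelta:
    """Downtime window: time between the disaster and service restoration."""
    return service_restored - disaster_time


def meets_target(achieved: timedelta, target: timedelta) -> bool:
    """True when the measured interval is within the agreed objective."""
    return achieved <= target


# Hypothetical drill timestamps, for illustration only.
last_replica = datetime(2024, 1, 1, 9, 50)
disaster = datetime(2024, 1, 1, 10, 0)
restored = datetime(2024, 1, 1, 10, 12)
target = timedelta(minutes=15)

rpo = achieved_rpo(last_replica, disaster)
rto = achieved_rto(disaster, restored)
print("RPO", rpo, "within target:", meets_target(rpo, target))
print("RTO", rto, "within target:", meets_target(rto, target))
```

Recording these three timestamps in every DR test makes it possible to compare the achieved values with the objectives over time.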

Added considerations for DR on Public Cloud

When planning DR on Public Cloud, one must also take into account that servers and storage in the secondary site (Public Cloud) can themselves fail; this may require running the DR services with multi-zone availability within the Public Cloud. Integrating on-premise IT systems (e.g. Active Directory, IT service management) with the Public Cloud also needs to be factored into the design and deployment.

Fig 2: RPO/RTO visualized

Source: Information derived from the websites of AWS, Cloudvelox, NTT, Evolveip, et al.

DR Strategy | Recovery Time | Costs | Best suited for
Backup | High | Minimal | No business critical services; services can afford outage of up to 48 hours
Pilot Light | Moderate | Low | Basic business critical services
Warm Standby | Minimal | Moderate | Core business critical services
Active Active | None | High | True Business Continuity

Fig 3: DR operating scenarios
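The operating scenarios in Fig 3 can be captured as a small lookup so that each workload is tagged with a strategy during planning. This is a minimal sketch: the tier keys ("non-critical", "basic", "core", "continuity") are hypothetical labels introduced here, while the strategy attributes are copied from Fig 3.

```python
from typing import NamedTuple


class DrStrategy(NamedTuple):
    name: str
    recovery_time: str
    cost: str
    best_suited_for: str


# Keys are hypothetical tier labels; values restate the rows of Fig 3.
STRATEGY_BY_TIER = {
    "non-critical": DrStrategy(
        "Backup", "High", "Minimal",
        "No business critical services; outage of up to 48 hours is acceptable"),
    "basic": DrStrategy(
        "Pilot Light", "Moderate", "Low", "Basic business critical services"),
    "core": DrStrategy(
        "Warm Standby", "Minimal", "Moderate", "Core business critical services"),
    "continuity": DrStrategy(
        "Active Active", "None", "High", "True Business Continuity"),
}


def strategy_for(tier: str) -> DrStrategy:
    """Return the Fig 3 strategy for a workload tier, or fail loudly."""
    try:
        return STRATEGY_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown workload tier: {tier!r}") from None


# Hypothetical workload inventory, for illustration only.
workloads = {"CRM": "core", "Intranet wiki": "non-critical"}
for workload, tier in workloads.items():
    chosen = strategy_for(tier)
    print(f"{workload}: {chosen.name} (recovery time {chosen.recovery_time}, cost {chosen.cost})")
```

Keeping the mapping in one place makes the cost and recovery-time trade-off explicit when more workloads are brought under DR.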

Approach for

DR on Public Cloud

(AWS illustration)

Identify RPO/RTO requirements and associated Cloud strategy

Choose one or more DR operating scenarios (see Fig 3):

    Simple Backup/Restore

    Pilot Light (e.g. only the DB is replicated for subsequent failover)

    Warm Standby (e.g. fully provisioned systems on standby for failover)

    Multi-site (e.g. Active-Active load sharing and failover)

Fail-over and Fail-back considerations integrated into DR strategy

Use of offline data seeding (AWS import) where needed

Single-Click DR recovery automation as per workload needs

Verify & Test frequently
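Where Amazon Route 53 fronts the application, fail-over and fail-back are often handled with DNS failover routing. The sketch below only builds the change batch as plain data and makes no AWS call. The host name, IP addresses and health check ID are placeholders, and the helper names are assumptions made for illustration.

```python
from typing import Optional


def failover_record(name: str, ip: str, role: str, ttl: int = 60,
                    health_check_id: Optional[str] = None) -> dict:
    """Build one UPSERT change for a Route 53 failover A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        # A health check on the primary lets Route 53 detect the outage.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name: str, onprem_ip: str, aws_ip: str,
                          health_check_id: str) -> dict:
    """Primary points at the data center, secondary at the DR site on AWS."""
    return {
        "Comment": "DR failover pair (illustrative)",
        "Changes": [
            failover_record(name, onprem_ip, "PRIMARY",
                            health_check_id=health_check_id),
            failover_record(name, aws_ip, "SECONDARY"),
        ],
    }


# Placeholder values from documentation address ranges, not real endpoints.
batch = failover_change_batch("crm.example.com.", "203.0.113.10",
                              "198.51.100.20", "placeholder-health-check-id")
print(batch["Changes"][0]["ResourceRecordSet"]["Failover"])
```

A dictionary of this shape is what boto3's Route 53 client accepts as the ChangeBatch argument of change_resource_record_sets, together with the hosted zone ID. A low TTL shortens the time clients take to follow the switch, which matters for the RTO.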


www.trianz.com | +1-732-642-2996

Follow us on www.linkedin.com/company/trianz

www.twitter.com/trianz

www.facebook.com/trianz

www.youtube.com/user/trianzinc

Example implementation of DR-on-AWS for an on-premise 3-tier CRM App (with RPO/RTO of sub-15 mins):

About Trianz

History

Founded in 2001 in Silicon Valley, with unique perspectives on Clients, Values and a core theme of Business Execution.

15+ years track record of successful Client partnerships and engagements.

Purpose

Trianz enables business and technology leaders in the formulation and execution of operational strategies.

We partner with business and technology leaders in turn-key execution of strategic initiatives.

Client Industries

We serve market-leaders and emerging clients across Technology, Finance, Insurance, Media, Manufacturing, Retail, Healthcare and Public Sector industries.

Global professional services firm enabling clients to leverage the new Cloud, Digital, Analytics and Security paradigms to transform business eco-systems and achieve performance through superior strategies and execution.

Secure Tier-1 Operations Campuses | ISO 9001, SSAE SOC2 and ISO 27001 Compliant | SAS70 Certified

Silicon Valley | Washington DC Metro | New Jersey | Dubai | Bengaluru | Mumbai | Delhi NCR | Chennai | Hyderabad

Typical challenges we face when implementing DR on AWS:

Dedicated bandwidth of at least 5 Mbps is required to start with

Volumes larger than a certain size cannot be handled by some replication solutions

Some replication solutions cannot handle more than a certain number of volumes

Different issues are encountered during Failover and Failback
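The bandwidth challenge above, together with the option of offline data seeding, can be sized with simple arithmetic: the time needed to push the initial data set over the link. The sketch below assumes decimal units (1 GB = 10^9 bytes) and a link used at full rate unless a utilization factor is given. The 1,000 GB data set and the 7-day threshold are hypothetical planning inputs.

```python
def transfer_days(data_gb: float, bandwidth_mbps: float,
                  utilization: float = 1.0) -> float:
    """Days needed to copy data_gb over a link of bandwidth_mbps."""
    if bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("bandwidth must be positive and utilization in (0, 1]")
    bits = data_gb * 8 * 1000**3                      # GB -> bits (decimal units)
    seconds = bits / (bandwidth_mbps * 1_000_000 * utilization)
    return seconds / 86_400                           # seconds -> days


def needs_offline_seeding(data_gb: float, bandwidth_mbps: float,
                          max_days: float) -> bool:
    """True when the initial copy would take longer than the allowed window."""
    return transfer_days(data_gb, bandwidth_mbps) > max_days


# Hypothetical sizing: 1,000 GB over the 5 Mbps minimum link.
days = transfer_days(1000, 5)
print(f"Initial sync: {days:.1f} days")
print("Offline seeding advisable:", needs_offline_seeding(1000, 5, max_days=7))
```

At 5 Mbps, 1,000 GB takes about 18.5 days to copy, which illustrates why offline seeding or a larger link is usually considered for the initial load.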

[Architecture diagram for the example implementation. Labels: Internet; Corporate data center with Web VM, APP VM, DB VM and a Replication Software Appliance on the Public-VLAN and Corporate-VLAN; AWS Region with AWS WAF, Amazon Route 53, C3 instances, EBS volumes for WEB, APP and DB, and a Replication Software Appliance.]
