http://www.trianz.com/cloud
Disaster Recovery
On Cloud
Given the above scenario, it is not surprising that many organizations are willing to consider the
use of Public Cloud for DR. Adopting DR on Public Cloud leads to multiple advantages over
traditional DR (see Fig 1).
The primary reasons for these outages generally follow a familiar pattern: hardware or network failure,
errors during upgrades or patching, human error, application errors, power outages and a few others.
DR Outages
Yet vendor surveys published on the web report the following statistics:
Roughly 40-60% of organizations fail to meet their service availability goals
Roughly 50% of organizations are “underprepared” to recover services in the event of a disaster
Nearly 50% of organizations still use Backup (and Restore) as the primary (and often ONLY)
means of DR readiness
Top challenges in meeting availability goals include budget constraints, insufficient IT resources
and lack of in-house expertise
Even in organizations where DR has been implemented, DR testing is seen to be inadequate due
to insufficient testing resources, complex processes or being a non-priority task.
Disaster Recovery of on-premise workloads is emerging as one of the key use-cases for Public
Clouds such as AWS and Azure. Traditional models of deploying DR have often proven to be
inadequate and left many Enterprises vulnerable to outages.
Fig 1: Traditional DR Vs DR on Public Cloud
Adopting DR on Public Cloud
Preparing for DR using Public Cloud
DR state of the industry
Traditional DR | DR on Public Cloud
Pre-provisioned Capex (high costs) | On-demand (Opex) leads to lower costs (up to 70% savings potential)
Tradeoffs on full-scale vs limited-scale: usually only high-critical systems and 25%-50% of Production capacity are DR enabled | Can start small and auto scale; more workloads can be brought under DR
Requires careful DR planning for DR testing/drills | DR can be tested quickly
Building DR Geo diversity needs planning | DR on another Geo is a given
Requires a lot more work to automate recovery | Easier to automate recovery
Understanding RPO/RTO
[Figure: timeline around a disaster event. RPO covers the data lost before the event; RTO covers the service downtime after it.]
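RPO and RTO can be made concrete with three timestamps: the last point at which data was safely replicated, the disaster event, and the moment service is restored. The sketch below is illustrative only. The function names and the incident times are hypothetical, and the 15-minute targets simply echo the "sub-15 mins" figure quoted for the CRM example in this document.

```python
from datetime import datetime, timedelta

def measure_recovery(last_replicated, disaster, service_restored):
    """Return (data_loss_window, downtime) for a single incident."""
    data_loss_window = disaster - last_replicated  # compared against the RPO
    downtime = service_restored - disaster         # compared against the RTO
    return data_loss_window, downtime

def meets_targets(data_loss_window, downtime, rpo, rto):
    """True when both the data-loss window and the downtime are within target."""
    return data_loss_window <= rpo and downtime <= rto

# Hypothetical incident timeline (illustrative values, not measured data).
last_replicated = datetime(2024, 1, 1, 9, 52)
disaster = datetime(2024, 1, 1, 10, 0)
service_restored = datetime(2024, 1, 1, 10, 12)

loss, down = measure_recovery(last_replicated, disaster, service_restored)
target = timedelta(minutes=15)
print(loss, down, meets_targets(loss, down, rpo=target, rto=target))
# prints: 0:08:00 0:12:00 True
```

In this made-up timeline eight minutes of data are lost and the service is down for twelve minutes, so both targets are met. Tightening the RPO to five minutes would make the same incident a miss.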
While planning for DR on Public Cloud, one must also take into account the fact that
servers & storage in the secondary site (Public Cloud) can themselves fail; the DR services
may therefore need to be planned for multi-zone availability within the Public Cloud.
Integration of on-premise IT systems (e.g. Active Directory, IT service management) with the
Public Cloud also needs to be factored into the design & deployment.
Added considerations for DR on Public Cloud
Fig 2: RPO/RTO visualized
1 Source: Information derived from the websites of AWS, Cloudvelox, NTT, Evolveip, et al.
DR Strategy | Recovery Time | Costs | Best suited for
Backup | High | Minimal | No business critical services; services can afford outage of up to 48 hours
Pilot Light | Moderate | Low | Basic business critical services
Warm Standby | Minimal | Moderate | Core business critical services
Active Active | None | High | True Business Continuity
Fig 3: DR operating scenarios
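One way to read Fig 3 is as a lookup: given the longest recovery time a service can tolerate, pick the lowest-cost strategy that still qualifies. The sketch below encodes only the qualitative labels from the table. Treating those labels as ordered scales, and the function name `cheapest_strategy`, are assumptions made for illustration.

```python
# Ordinal scales built from the labels in Fig 3 (earlier in the list = faster / cheaper).
RECOVERY_TIME = ["None", "Minimal", "Moderate", "High"]
COST = ["Minimal", "Low", "Moderate", "High"]

# Strategy -> (recovery time, cost), copied from Fig 3.
STRATEGIES = {
    "Backup": ("High", "Minimal"),
    "Pilot Light": ("Moderate", "Low"),
    "Warm Standby": ("Minimal", "Moderate"),
    "Active Active": ("None", "High"),
}

def cheapest_strategy(max_recovery_time):
    """Return the lowest-cost strategy whose recovery time is within the given level."""
    limit = RECOVERY_TIME.index(max_recovery_time)
    candidates = [
        (COST.index(cost), name)
        for name, (recovery, cost) in STRATEGIES.items()
        if RECOVERY_TIME.index(recovery) <= limit
    ]
    return min(candidates)[1]

for level in RECOVERY_TIME:
    print(level, "->", cheapest_strategy(level))
```

Because cost rises as recovery time shrinks in Fig 3, each tolerance level maps to exactly one row of the table; a real selection would also weigh data volume, compliance and budget.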
Approach for DR on Public Cloud (AWS illustration)
Identify RPO/RTO requirements and associated Cloud strategy
Choose one or more DR operating scenarios covering (see Fig 3):
Simple Backup/Restore
Pilot Light (e.g. only DB is replicated for subsequent failover)
Warm Standby (e.g. Fully provisioned systems on standby for failover)
Multi-site (e.g. Active-Active load sharing and failover)
Fail-over and Fail-back considerations integrated into DR strategy
Use of offline data seeding (AWS import) where needed
Single-Click DR recovery automation as per workload needs
Verify & Test frequently
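The "single-click" recovery automation in the list above can be pictured as an ordered runbook that executes each failover step in turn and stops at the first failure. The following is a minimal sketch under stated assumptions: the step names, their order (database, then application tier, then web tier, then DNS) and the placeholder functions are hypothetical and do not call any real AWS API.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]

def run_failover(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order; stop at the first failure and return a status report."""
    report = []
    for name, action in steps:
        try:
            ok = action()
        except Exception as exc:  # an exception counts as a failed step
            report.append((name, f"error: {exc}"))
            break
        report.append((name, "ok" if ok else "failed"))
        if not ok:
            break
    return report

# Hypothetical placeholders; a real runbook would call the replication tool,
# the cloud provider's API and the DNS service here.
def promote_replica_db() -> bool:
    return True

def start_app_tier() -> bool:
    return True

def start_web_tier() -> bool:
    return True

def switch_dns_to_dr_site() -> bool:
    return True

FAILOVER_STEPS: List[Step] = [
    ("promote replicated DB", promote_replica_db),
    ("start APP tier", start_app_tier),
    ("start WEB tier", start_web_tier),
    ("switch DNS to DR site", switch_dns_to_dr_site),
]

report = run_failover(FAILOVER_STEPS)
for name, status in report:
    print(f"{name}: {status}")
```

Keeping the runbook as plain data makes it easy to rehearse in frequent DR tests, and a fail-back runbook can be expressed as a second list of steps in the same form.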
www.trianz.com | +1-732-642-2996
Follow us on www.linkedin.com/company/trianz
www.twitter.com/trianz
www.facebook.com/trianz
www.youtube.com/user/trianzinc
Example implementation of DR-on-AWS for an on-premise 3-tier CRM App (with RPO/RTO of under 15 minutes):
About Trianz
History
Founded in 2001 in Silicon Valley,
with unique perspectives on Clients,
Values and a core theme of Business
Execution.
15+ years track record of successful
Client partnerships and
engagements.
Purpose
Trianz enables business and
technology leaders in the
formulation and execution of
operational strategies.
We partner with business and
technology leaders in turn-key
execution of strategic initiatives.
We serve market-leaders and
emerging clients across Technology,
Finance, Insurance, Media,
Manufacturing, Retail, Healthcare
and Public Sector industries.
Client Industries
Global professional services firm enabling clients to leverage the new Cloud, Digital, Analytics and Security paradigms to
transform business eco-systems and achieve performance through superior strategies and execution.
Secure Tier-1 Operations Campuses |
ISO 9001, SSAE SOC2 and ISO 27001
Compliant | SAS70 Certified
Silicon Valley | Washington DC Metro | New Jersey | Dubai | Bengaluru | Mumbai | Delhi NCR | Chennai | Hyderabad
Typical challenges we face when implementing DR on AWS:
A minimum of 5 Mbps of dedicated bandwidth is required to start with
Volumes larger than a certain size cannot be handled by certain replication solutions
Some replication solutions cannot handle more than a limited number of volumes
Different issues are encountered during Failover and Failback
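The bandwidth point explains why offline data seeding is sometimes needed: at the 5 Mbps minimum, an initial copy of a large data set can take weeks. The arithmetic below is a rough sketch. The 1 TB data size is a hypothetical example, and the calculation assumes the link is fully and continuously available with no protocol overhead.

```python
def transfer_days(data_bytes, link_mbps, utilization=1.0):
    """Days needed to push data_bytes over a link of link_mbps megabits per second."""
    bits = data_bytes * 8
    seconds = bits / (link_mbps * 1_000_000 * utilization)
    return seconds / 86_400  # seconds per day

one_tb = 10**12  # hypothetical initial data set: 1 TB (decimal)
days = transfer_days(one_tb, link_mbps=5)
print(f"{days:.1f} days")
# prints: 18.5 days
```

Ongoing replication only has to carry the changes made since the last sync, so a link that is too slow for the initial seed may still be adequate once the bulk copy has been shipped offline.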
[Architecture diagram: a corporate data center (Web VM, APP VM, DB VM and a Replication Software Appliance, on Public-VLAN and Corporate-VLAN) replicates over the Internet to an AWS Region containing AWS WAF, Amazon Route 53, C3 instances with EBS volumes for WEB, APP and DB, and a Replication Software Appliance.]