Dcag Training on VMware DR Process

DR Planning Project Training Document Prepared by: Thomas Bronack, CBCP (917) 673-6992 [email protected] Thomas Bronack © Initial Training Class Phone: (917) 673- 6992 / Email: [email protected]

Upload: thomas-bronack

Post on 08-Jun-2015


DESCRIPTION

A description of implementing a recovery environment with VMware, vSphere, vConnect, and RPA, intended as an initial training document for application DR Teams going through Application Recovery Certification, with links to additional materials.

TRANSCRIPT

Page 1: Dcag training on VMware DR Process

DR Planning Project

Training Document

Prepared by:

Thomas Bronack, CBCP

(917) 673-6992

[email protected]

Thomas Bronack © Initial Training Class Phone: (917) 673-6992 / Email: [email protected]

Page 2: Dcag training on VMware DR Process

What do we want to achieve?

• Fully converted Information Technology Environment.

• Savings through equipment, locations, and vendor contracts.

• Savings through better controls and efficiency.

• Continuity of Business achieved through Enterprise Resiliency.

• World-Wide Compliance achieved through Corporate Certification.

• Additional savings through integration with everyday functions.

• Improved Reputation and Higher Employee Morale.

• Better retention of staff and clients.

• More able to recruit new personnel and close client business.

• Costs go down and efficiency goes up.

• Improved Savings and Profitability.


Page 3: Dcag training on VMware DR Process

Outsourcing Project Time Line of Events

Phases: Phase I, Phase II, Phase III, Phase IV, Phase V, Phase VI

Timeline stages: Bid; Inventory; Build Regionals; Transition; Disaster Recovery; Build Recovery; Compliance

Bid:
• RFP;
• Bid;
• SOW;
• Scope;
• Goals; and
• Timeframe.

Inventory:
• What they Have;
• Infrastructure;
• Equipment;
• Software;
• Applications;
• Locations;
• Computer Sites;
• Recovery Sites;
• Applications with Recovery Plans; and
• Applications that need Recovery Plans.

Build Regionals:
• Prod 1;
• Prod 2;
• Prod 3.

Recovery Site

Transition:
• Move Applications to Regional Data Center;
• Test Successful Operation; and
• Use Virtualization.

Disaster Recovery:
• “Proof of Concept”;
• Infrastructure Readiness;
• Disaster Recovery;
• Application Recovery;
• Business Recovery;
• Workplace Safety and Violence Prevention;
• Emergency Management;
• Crisis Management;
• Protection, Salvage, and Restoration;
• Supply Chain Management;
• Insurance;
• Community Relations;
• Communications; and
• Use of Social Media.

Compliance:
• Laws and Regulations;
• Requirements to Comply with;
• Present Compliance;
• Gaps & Exceptions;
• Obstacles;
• Domestic;
• International; and
• Cross Border Requirements.


Page 4: Dcag training on VMware DR Process

[Diagram: three Regional Data Centers, Prod 1 (Americas), Prod 2 (Europe), and Prod 3 (Asia Pacific), each serving its User Sites and connected through the Cloud or WAN to one Global Recovery Site.]

User Locations connected to Regional Data Centers and Global Recovery Site

Three Regional Data Centers and One Global Recovery Site

Phase V – Perform Application Recovery Certification

Phase V – Application Recovery Certification is accomplished, initially for selected applications, to validate that Regional Sites can recover to the Global Recovery Site. The Recovery Site is built so that existing recovery sites and vendor contracts can be eliminated.


Page 5: Dcag training on VMware DR Process

[Diagram: Users 1 through n connect through the Cloud or WAN to the Production Site (old IP address) over the Production Path. On Failover, users are switched to the Recovery Site (new IP address) over the Disaster Recovery Path; Failback returns them to the Production Site.]

Failover / Failback DR Process

1. Users stay at their site, while Production is switched to the Recovery Site.
2. Users have to move to a secondary site because the User site is lost, connect to the Regional Site, and test recovery.
3. Users move to the recovery site and production is switched to Recovery.

• Declare Disaster;
• Failover to Recovery Site;
• Continue User Processing within RTO;
• Supplies are routed to Recovery Site;
• Original Site is Safeguarded, Salvaged, and Restored;
• Failback to Original Site.

• Use Existing Recovery Plan to Certify Application Recovery
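The failover and failback sequence above can be read as a small state machine in which each action is only valid at one point in the process. The following is a minimal sketch of that idea, not part of the original training material; the stage names, event names, and the `run` helper are hypothetical and chosen only for illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names that mirror the bullet sequence above.
    PRODUCTION = auto()
    DISASTER_DECLARED = auto()
    RUNNING_AT_RECOVERY_SITE = auto()
    ORIGINAL_SITE_RESTORED = auto()


# Each event is valid in exactly one stage and moves the process to the next one.
TRANSITIONS = {
    (Stage.PRODUCTION, "declare_disaster"): Stage.DISASTER_DECLARED,
    (Stage.DISASTER_DECLARED, "failover"): Stage.RUNNING_AT_RECOVERY_SITE,
    (Stage.RUNNING_AT_RECOVERY_SITE, "restore_original_site"): Stage.ORIGINAL_SITE_RESTORED,
    (Stage.ORIGINAL_SITE_RESTORED, "failback"): Stage.PRODUCTION,
}


def next_stage(current, event):
    """Return the stage reached by `event`, or raise if the event is out of order."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed while in {current.name}") from None


def run(events, start=Stage.PRODUCTION):
    """Apply a list of events in order and return the final stage."""
    stage = start
    for event in events:
        stage = next_stage(stage, event)
    return stage


if __name__ == "__main__":
    final = run(["declare_disaster", "failover", "restore_original_site", "failback"])
    print(final.name)
```

Modelling the sequence this way makes out-of-order actions, such as a failback before the original site is restored, fail loudly instead of silently.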


Page 6: Dcag training on VMware DR Process

Virtual Machines (VMs) are maintained by the VMware vSphere system, which is managed through a vCenter Server that is also used for Site Recovery Management. Virtualization can be considered a Resource Manager that separates Real Equipment (Storage, Computer, Network, etc.) into Logical Equipment Sections. Each VM can represent a Real Server, but many VMs can reside in one Real Server, which frees up real servers presently in use so they can be disposed of, reducing cost. VMs save space and power and reduce environmental concerns, all of which affect the bottom line and reputation of the company. It also takes fewer people to manage a Virtual Environment than the number of people now required to manage a real environment. Servers are rack mounted as Blades to save floor space and infrastructure. Switches can re-route servers through the Recovery Point Application to the Recovery Site when a disaster event occurs.
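The consolidation argument in this paragraph is simple arithmetic, sketched below under stated assumptions. The function names and the figures (120 existing servers, 15 VMs per host) are hypothetical and do not come from the training material; real sizing would also depend on CPU, memory, and storage demands.

```python
import math


def hosts_needed(vm_count, vms_per_host):
    """Physical hosts required to run `vm_count` VMs at a fixed VMs-per-host ratio."""
    if vm_count < 0 or vms_per_host <= 0:
        raise ValueError("vm_count must be >= 0 and vms_per_host must be > 0")
    return math.ceil(vm_count / vms_per_host)


def servers_freed(physical_servers, vms_per_host):
    """Servers that can be retired if every existing server becomes one VM."""
    return physical_servers - hosts_needed(physical_servers, vms_per_host)


if __name__ == "__main__":
    # Hypothetical example: 120 real servers, 15 VMs per virtualization host.
    print(hosts_needed(120, 15), servers_freed(120, 15))
```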

Logical DR Environment

[Diagram: DR Logical Architecture from Prod to Recovery, showing array replication over the WAN to the Recovery Site.]


Page 7: Dcag training on VMware DR Process

This diagram describes what the DR Environment will look like when completed. Remote sites are transformed and virtualized via the Avamar Virtual Environment, which will allow for the removal of remote equipment and support personnel. Windows, UNIX, and ESX Operating Systems will be housed in an EMC VNX Unified Storage facility. Network Backup Servers will protect communications, and Data Domains will protect Remote Users. A Tape Library is provided for long-term storage and electronic transfer to the Iron Mountain Tape Vault via encrypted communications. The Storage Area Network (SAN) and EMC Unified Storage Facility are connected to the Wide Area Network (WAN) through EMC Recovery Point Applications (RPAs) that can automatically switch a failing location to the recovery site to continue processing.

DR Environment Target

[Diagram: target DR environment, including the Recovery Site.]


Page 8: Dcag training on VMware DR Process

Lifecycle of a Disaster Event (Why we create Recovery Plans)

Primary Site stages:
• Disaster Event: Event; Analyze; Declare; Failover.
• Safeguard: Evacuate; Protect Site; First Responders.
• Salvage: Clean Facility; Repair; Resupply.
• Restoration: Restart; Test; Success; Failback.
• Resume: Reload Data; Restart; Continue.

Secondary Site:
• Failover Start Up: Failover Production Recovery Processing.
• Failback Shut Down: Failback from Secondary Site after Restoration.

Repair Primary Site to Resume Production via Failback.

• High Availability (HA) is an RTO / SLA based Switch.
• Continuous Availability (CA) is an immediate Switch.

[Diagram: CA and HA Production environments, each with a Flip / Flop Switch Over and Data Sync between sites. Timeline markers: RPO (Last Snapshot), Point of Failure, RTO.]

“The goal of Enterprise Resiliency is to achieve ZERO DOWNTIME by implementing Application Recovery Certification for HA and Gold Standard Recovery Certification for CA Applications”
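The RPO (Last Snapshot), Point of Failure, and RTO markers on this slide translate into two simple time differences. The sketch below shows one way to check a recovery against its targets; the function name, the timestamps, and the 30-minute and 4-hour objectives are hypothetical values chosen for illustration, not figures from the training material.

```python
from datetime import datetime, timedelta


def check_recovery(last_snapshot, point_of_failure, service_restored, rpo, rto):
    """Compare one recovery (real or test) against its RPO and RTO targets."""
    # Work done between the last snapshot and the failure is what can be lost.
    data_loss = point_of_failure - last_snapshot
    # Time between the failure and restored service is the outage users see.
    downtime = service_restored - point_of_failure
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    result = check_recovery(
        last_snapshot=datetime(2015, 6, 8, 9, 45),
        point_of_failure=datetime(2015, 6, 8, 10, 0),
        service_restored=datetime(2015, 6, 8, 13, 30),
        rpo=timedelta(minutes=30),   # hypothetical objective
        rto=timedelta(hours=4),      # hypothetical objective
    )
    print(result)
```

Under this reading, an HA application passes when both checks are true for its agreed targets, while a CA application would need both intervals to be effectively zero.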


Page 9: Dcag training on VMware DR Process

Disaster Recovery Testing Process

The DR Testing process is illustrated here and includes:

1. Select Application for DR Testing;
2. Define DR Testing Goals and Objectives;
3. Define Production Site where application resides;
4. Complete Pre-Staging form to provide DR team with the information needed to make the recovery site ready to perform DR Testing;
5. Complete DR Exercise Booklet for Application;
6. Conduct the Actual DR Exercise;
7. DR Coordinator receives Work Sheets and prepares a Report and Presentation of findings for the Post Mortem;
8. Implement recommendations for improvement.


Page 10: Dcag training on VMware DR Process

Recovery Testing process

A. Develop Recovery Objectives and Testing Schedule:

1. Create a Recovery Site for DR Site Testing;
2. Validate Production Site to Recovery Site Connectivity;
3. Disaster Recovery Plans for interruptions to Information Technology;
4. Application Recovery Certification (CA, HA, Best Effort, Deferred);
5. Business Recovery for loss of a location;
6. Emergency Management for Incidents and Natural Disasters; etc.

B. DR Testing is conducted in Five Steps, which are:

1. DR Planning Meeting – to orient the Application DR Team;
2. Infrastructure Readiness – to prepare the Recovery Site and obtain Data;
3. DR Pre-Test – to prepare the Recovery Site for the Application DR Test:
   a. Recovery Site establishes recovery environment for disaster event or test.
   b. Develop procedures for providing Recovery Site with the information they need.
4. Actual DR Recovery / Test – to DR Test the Application:
   a. Follow the “Script of Actions” contained in the Recovery Plan.
   b. Record event times, comments, and encountered problems.
5. Post Mortem Meeting – Review of DR Test Results:
   a. To discuss recovery events and recommend improvements.


Page 11: Dcag training on VMware DR Process

Where do we go from here

A. Develop Application Actual DR Testing process:

1. Application Actual DR Test Activities Sheet is completed with Estimated Times.
2. Production Servers are brought down in Production.
3. Recovery Servers are brought up in Recovery.
4. Application is connected to Recovery Facility.
5. Data is Synchronized to point just before failure.
6. Application resumes normal processing like in Production Mode.
7. Application connectivity and functionality is verified.
8. Recovery Servers are brought down.
9. Production Servers are brought up.
10. Application resumes processing at Production Site and is verified.
11. If Successful, Application receives Application Recovery Certification – otherwise Application DR problems are repaired and the Application goes through DR Testing again until Application Recovery Certification is achieved.

B. Develop Application Work Sheet:

1. Same as Activities Sheets, but is used to record Actual Times, Durations, Encountered Problems, and Comments.

C. Post Mortem Meeting is conducted to review results, go over “Lessons Learned” and make “Recommendations for Improvement”.
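Since the Activities Sheet holds Estimated Times and the Work Sheet holds Actual Times for the same steps, the Post Mortem can start from a simple comparison of the two. The following is a minimal sketch of such a comparison; the `WorkSheetRow` structure, the helper names, and all durations and problem notes are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class WorkSheetRow:
    activity: str
    estimated_minutes: int   # taken from the Activities Sheet
    actual_minutes: int      # recorded on the Work Sheet during the test
    problems: str = ""


def overruns(rows):
    """Return (activity, minutes over estimate) for every step that ran long."""
    return [
        (row.activity, row.actual_minutes - row.estimated_minutes)
        for row in rows
        if row.actual_minutes > row.estimated_minutes
    ]


def totals(rows):
    """Return (total estimated minutes, total actual minutes)."""
    return (
        sum(row.estimated_minutes for row in rows),
        sum(row.actual_minutes for row in rows),
    )


if __name__ == "__main__":
    # Hypothetical figures for four of the test steps.
    rows = [
        WorkSheetRow("Bring down Production Servers", 15, 12),
        WorkSheetRow("Bring up Recovery Servers", 30, 45, "One VM failed to power on"),
        WorkSheetRow("Synchronize data", 20, 20),
        WorkSheetRow("Verify connectivity and functionality", 25, 40, "Firewall rule missing"),
    ]
    print(overruns(rows))
    print(totals(rows))
```

The steps that overran, together with their recorded problems, are natural input for the “Lessons Learned” and “Recommendations for Improvement” discussed in the Post Mortem.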


Page 12: Dcag training on VMware DR Process

Disaster Recovery Dashboard and Documents

1. DR Planning Guide

2. DR Management Dashboard

3. DR Exercise Booklet Template

4. Planning Meeting Agenda



Page 13: Dcag training on VMware DR Process

What should be accomplished during the Planning Meetings

1. Infrastructure Readiness Information

2. Contact List

3. EMC Disaster Recovery and Business Continuity Solutions.

4. VMware vSphere, vCenter Prep

5. VMware Usage and Recovery


C:\Users\Thomas\Documents\Dashboard\Dashb
