Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design

Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design Alan McSweeney


Post on 16-May-2015



Page 1: Structured Approach To It Business System Availability And Continuity Planning, Analysis And Design

Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design

Alan McSweeney


February 18, 2010 2

Objectives

• To provide details on a structured approach to analysing and defining availability and continuity requirements for IT systems

• To provide background information on the changing landscape of availability and continuity


Agenda

• Availability and Continuity Overview

• Availability Management

• Continuity Management

• Summary


Availability and Continuity

• Availability is the ability of a system or service to perform its required function at a stated instant or over a stated period of time.

• Availability is expressed as the availability ratio

− The proportion of time that the service is actually available for use by the customers within the agreed service hours

• Continuity is concerned with preparing to address unwanted occurrences

− May relate to the recovery of IT systems or of entire business processes

• Continuity is concerned with ensuring that IT services are recovered within agreed timescales

• Availability is a superset of Continuity and encompasses the continued operation of systems in the event of a disaster

• Continuity ensures availability in extreme circumstances

• Availability defines what is to be available in these extreme circumstances
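The availability ratio can be computed directly. A minimal sketch, assuming availability is measured as agreed service time minus downtime within those hours, divided by agreed service time; the function name `availability_pct` is hypothetical.

```python
def availability_pct(agreed_service_hours: float, downtime_hours: float) -> float:
    """Percentage of agreed service hours during which the service was available.

    Only downtime that falls within the agreed service hours should be counted.
    """
    if agreed_service_hours <= 0:
        raise ValueError("agreed service hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime must lie between 0 and the agreed service hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


# 30 days of 24x7 service (720 hours) with 3.6 hours of downtime
print(round(availability_pct(720, 3.6), 2))  # 99.5
```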


Availability and Continuity Relationship

Availability Continuity

Continuity Provides Business Impact Analysis to Availability

Availability Provides Availability Criteria to Continuity


Availability and Continuity Relationships with Other IT Management Processes

Availability Continuity

• Capacity Planning and Management

− Defines the Capacity Required for Continuity and Availability

• IT Architecture

− Ensures Systems and Infrastructure are Designed to Incorporate Continuity and Availability

• Change Management

− Controls Change that May Impact Availability or Require Continuity to be Invoked

• Service Planning and Management

− Ensures that Continuity and Availability are Incorporated into Service Agreements and Provisions

• Security Management

− Controls Security that May Impact Continuity and Availability

• Finance Management

− Puts a Cost on Lack of Availability

− Controls Expenditure on Availability and Continuity

Continuity Provides Business Impact Analysis to Availability

Availability Provides Availability Criteria to Continuity


Availability and Continuity

• Availability

−Defines availability of service during operating hours

• Under normal circumstances

• Under extraordinary circumstances

• Continuity

−Defines continued operations of critical services and their availability

• Time until services are available and state of service after recovery

• Under extraordinary circumstances


Availability and Continuity

Primary IT Facilities: Availability of Services During Normal Operations

• Service 1 − Components 1, 2, 3

• Service 2 − Components 1, 4, 5

• Service 3 − Components 1, 5, 6

• Service 4 − Components 1, 2, 7

Continuity of Operations: Primary IT Facilities to Recovery IT Facilities

Recovery IT Facilities: Availability of Services After Continuity

• Service 1 − Components 1, 2, 3

• Service 3 − Components 1, 5, 6
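The diagram can be read as a simple dependency model: which services survive at the recovery facilities depends on which components are provisioned there. A minimal sketch, assuming a service is available only when every component it depends on is available (no redundancy within a service); the names `SERVICES`, `RECOVERY_COMPONENTS` and `available_services` are hypothetical.

```python
# Services and the components each depends on, as shown in the diagram
SERVICES = {
    "Service 1": {"Component 1", "Component 2", "Component 3"},
    "Service 2": {"Component 1", "Component 4", "Component 5"},
    "Service 3": {"Component 1", "Component 5", "Component 6"},
    "Service 4": {"Component 1", "Component 2", "Component 7"},
}

# Components provisioned at the recovery IT facilities
RECOVERY_COMPONENTS = {"Component 1", "Component 2", "Component 3",
                       "Component 5", "Component 6"}


def available_services(services: dict[str, set[str]], up_components: set[str]) -> list[str]:
    """Return the services whose components are all available."""
    return sorted(name for name, components in services.items()
                  if components <= up_components)


print(available_services(SERVICES, RECOVERY_COMPONENTS))  # ['Service 1', 'Service 3']
```

Running the same check against every component at the primary facilities returns all four services; the difference between the two lists is the set of services not provided after continuity is invoked.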


Availability and Continuity

Full View of Availability

Primary IT Facilities

• Service 1 − Components 1, 2, 3

• Service 2 − Components 1, 4, 5

• Service 3 − Components 1, 5, 6

• Service 4 − Components 1, 2, 7

Continuity of Operations

Recovery IT Facilities

• Service 1 − Components 1, 2, 3

• Service 3 − Components 1, 5, 6


Availability and Continuity

Business Continuity

• Continuous Operation

− Non-disruptive system maintenance such as data backup combined with continuous availability of agreed business systems

• Disaster Recovery

− Protection against unplanned outages such as disasters through reliable and predictable recovery and continuity of operations

• High Availability

− Fault-tolerant, failure-resistant infrastructure supporting continuous availability of agreed business systems


Availability and Continuity

Availability During Normal Operations

Availability During Housekeeping and Maintenance Operations

Availability After Some Component Failures

Availability After Complete Failure of Primary Facility

Availability

Continuity


Availability and Continuity Heat Map

Figure: Availability and Continuity Heat Map

− RTO axis: Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered (Instantly, Seconds, Minutes, Hours, Days)

− RPO axis: Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery (Days, Hours, Minutes, Last Transaction)

− Increasing Availability (and Continuity) Requirements


RTO and RPO

• Recovery Point Objective (RPO)

−Amount of Data Loss Tolerable After Recovery

• Either amount of data immediately available after recovery or amount of data available for some time after recovery

• Can be different

• Provide some data for minimal operations initially

• Provide more/all data

• Recovery Time Objective (RTO)

− Time to Recover Service/Time By Which Service Needs to be Recovered
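After a test or a real invocation, the achieved recovery can be compared against both objectives. A minimal sketch, assuming data loss is measured from the last recoverable copy of the data to the start of the outage, and recovery time from the start of the outage to restoration of service; it checks a single RPO, not the staged "some data first, more/all data later" case. `check_recovery` is a hypothetical name.

```python
from datetime import datetime, timedelta


def check_recovery(last_recoverable_copy: datetime, outage_start: datetime,
                   service_restored: datetime, rpo: timedelta,
                   rto: timedelta) -> tuple[bool, bool]:
    """Return (rpo_met, rto_met) for one recovery."""
    data_loss = outage_start - last_recoverable_copy   # window of lost transactions
    recovery_time = service_restored - outage_start    # time the service was unavailable
    return data_loss <= rpo, recovery_time <= rto


# Nightly backup at 02:00, outage at 09:30, service restored at 13:00
result = check_recovery(datetime(2010, 2, 18, 2, 0), datetime(2010, 2, 18, 9, 30),
                        datetime(2010, 2, 18, 13, 0),
                        rpo=timedelta(hours=4), rto=timedelta(hours=8))
print(result)  # (False, True): 7.5 hours of data lost, 3.5 hours to recover
```

Here the RTO is met but a nightly backup cannot deliver a four-hour RPO; closing that gap needs more frequent copies or replication rather than a faster restore.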


RTO and RPO With Cost of Lack of Availability

Figure: RTO and RPO heat map with an added cost dimension

− RPO axis: Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery

− RTO axis: Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered

− Added dimension: Cost of Lack of Availability of Service/Cost Benefit of Providing High Availability and High Continuity

− Highlighted region: Business Critical Services Requiring Immediate Access With Very Limited/No Data Loss and Requiring Continued Operation in the Event of a Disaster

• Add extra dimension to Availability and Continuity Heat Map to allow for explicit identification of those systems that need to be continuously available
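One way to make the extra dimension explicit is to record RTO, RPO and an estimated cost of unavailability for each system and filter on all three. A minimal sketch; the `SystemProfile` type, the threshold values and the sample systems are hypothetical, and in practice the figures would come from the business impact analysis.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemProfile:
    name: str
    rto: timedelta             # time by which service must be recovered
    rpo: timedelta             # tolerable data loss after recovery
    hourly_outage_cost: float  # estimated cost of lack of availability per hour


def needs_continuous_availability(system: SystemProfile,
                                  max_rto: timedelta = timedelta(minutes=1),
                                  max_rpo: timedelta = timedelta(0),
                                  min_hourly_cost: float = 10_000.0) -> bool:
    """Flag systems in the 'immediate access, no data loss, high cost' region.

    The default thresholds are illustrative placeholders, not recommendations.
    """
    return (system.rto <= max_rto
            and system.rpo <= max_rpo
            and system.hourly_outage_cost >= min_hourly_cost)


systems = [
    SystemProfile("Payments", timedelta(seconds=30), timedelta(0), 50_000.0),
    SystemProfile("Management reporting", timedelta(hours=24), timedelta(hours=24), 500.0),
]
print([s.name for s in systems if needs_continuous_availability(s)])  # ['Payments']
```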


What is a Business Critical Application?

• Applications deemed business/mission critical

− 2006 – 16%

− 2007 – 36%

− 2008 – 56%

− 2009 – 60%

• Availability and continuity are merging as most applications are being deemed mission critical


How Often Have You had to Invoke Continuity Plan in Last Five Years?

• None – 73%

• Once – 14%

• Twice – 6%

• Three times – 3%

• Four times – 2%

• Five or more times – 2%

• 27% of organisations have declared at least one disaster in the last five years
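The 27% figure follows from the survey distribution: it is the sum of every response other than "None". A quick check:

```python
# Survey responses: continuity plan invocations in the last five years (%)
invocations = {"None": 73, "Once": 14, "Twice": 6, "Three": 3,
               "Four": 2, "Five or More": 2}

total = sum(invocations.values())
at_least_once = total - invocations["None"]
print(total, at_least_once)  # 100 27
```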


What Were the Causes of Having to Invoke Continuity Plans?

• Power Failure – 22.5%

• Hardware Failure – 16.6%

• Network Failure – 11.2%

• Software Failure – 8.9%

• Human Error – 8.4%

• Flood – 6.3%

• Other – 6.3%

• Hurricane – 5.6%

• Fire – 3.9%

• Winter Storm – 3.5%

• Terrorism – 1.9%

• Not Specified – 1.9%

• Earthquake – 1.5%

• Tornado – 1.1%

• Chemical Spill – 0.4%
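Assuming the percentages pair with the causes in the order listed (the chart appears sorted by frequency), the shares sum to 100%, and the five technical and human causes (power, hardware, network and software failure, and human error) together account for about two thirds of invocations. A quick check:

```python
causes = {
    "Power Failure": 22.5, "Hardware Failure": 16.6, "Network Failure": 11.2,
    "Software Failure": 8.9, "Human Error": 8.4, "Flood": 6.3, "Other": 6.3,
    "Hurricane": 5.6, "Fire": 3.9, "Winter Storm": 3.5, "Terrorism": 1.9,
    "Not Specified": 1.9, "Earthquake": 1.5, "Tornado": 1.1, "Chemical Spill": 0.4,
}
technical_and_human = ["Power Failure", "Hardware Failure", "Network Failure",
                       "Software Failure", "Human Error"]

print(round(sum(causes.values()), 1))                          # 100.0
print(round(sum(causes[c] for c in technical_and_human), 1))   # 67.6
```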


Continuity Testing Seen as Disruptive

• 40% of organisations state that continuity testing impacts customers

• 32% of organisations state that continuity testing impacts sales

• Reasons for lack of testing

− Lack of time and resources

− Lack of technology

− Disruption to employees

− Budget

− Disruption to customers

− Disruption to sales

− Disruption to production systems

− Not seen as a priority


Business Impact of Lack of Availability and Continuity Increases Exponentially Over Time

Figure: Financial Loss against Duration of Loss of Continuity (Seconds, Minutes, Hour, Hours, Day, Days), with labelled elements Revenue Loss, Staff Productivity Loss, Reputational Damage and Financial Performance


Availability Design and Management

• Availability design optimises the capability of the IT infrastructure, services and supporting organisation to deliver a cost-effective and sustained level of availability that enables the business to satisfy its business objectives

− Ensures IT systems and infrastructure are designed to deliver the levels of availability required by the business

− Provides a range of availability reporting to ensure that agreed levels of Availability are continuously measured and monitored

− Optimises the availability of the IT infrastructure to deliver cost-effective improvements that deliver real benefits to the business

− Ensures shortfalls in availability are recognised and corrective actions are identified and performed

− Reduces problems and incidents that impact availability

− Creates and maintains an Availability Plan aimed at improving the overall availability of IT services and infrastructure components to ensure business availability requirements can be satisfied


Continuity Design and Management

• Continuity design is concerned with responding to and recovering business operations in the event of an outage or disaster that has a significant impact on the organisation

− Supports the business by ensuring that the required IT facilities can be recovered within required and agreed business timescales

− Provides the strategic and operational framework to review the way the organisation continues to provide its services while increasing its ability to recover from disruption, interruption or loss

− Depends on both management and operations

− Requires management commitment


People, Process, Technology

• Start availability and continuity design with a business impact analysis and risk assessment

• Technology exists to support availability and continuity design, but technology does not constitute a plan

• Focus on prevention before investing in technology

• However, availability and continuity are seen as the preserve of IT

− The business frequently does not have the required project focus or experience

• Embed availability and continuity into IT architectures


Questions

• Do you have adequate control over prevention of business process or IT infrastructure downtime?

• Do you have adequate IT capabilities to ensure continuous operations?

• Do you know the risks your business and its business systems face?

• What would the cost and impact of downtime be to your business?

• Is your current continuity plan sufficient to meet your RPO and RTO?

• Do you know how much business continuity will cost?

• What business problems will implementing availability and continuity solve even if you do not experience an unplanned IT outage?

• What is the overall business value of availability and continuity to the business?

• How should we define what level of business continuity we really need?


Availability Design and Management


Availability Design and Management Process

Availability Design and Management Consists of Two Parallel Sub-Processes

Availability Process Design and Management

1. Availability Requirements Analysis

2. Document System and Application Architecture

3. Gap Analysis and Recommendations

4. Availability Review

Availability Process Quality Control

1. Availability Reporting

2. Availability Report Evaluation and Improvement

3. Management Escalations of Service Availability Violations


Structured Approach to Availability Design and Management

• Can be applied to an individual system or application, to a service that consists of a number of systems or applications, or to the entire IT landscape

• The scope is to define a plan to implement the agreed availability


Scope of Availability Design and Management

• Planning for service availability

• Designing for service availability by anticipating disruptions, estimating and measuring reliability and maintainability

• Planning for availability within SLAs and reporting against them

• Ensuring cost effectiveness of availability solutions

• Reducing the duration of problems and incidents affecting availability

• Ensuring that security requirements are defined and incorporated within the overall availability design


Availability Design and Management Driven by Requirements

• Availability requirements are based on the needs of the business

• Requirements are gathered, defined, and validated by the key users and business management

• Includes hours of uptime as well as planned and unplanned downtime

• Includes ongoing support and procedures to address service disruptions


Benefits of a Structured Approach to Availability Design and Management

• Reduce Risks

− SLAs will incorporate availability design based on architecture

− Reduced risk of violating SLAs

• Cost Reduction

− A defined and agreed acceptable level of service prevents over-delivery

− Unnecessary expenditure on maintenance and resilience building is avoided

• Improved Service Agility

− Changing business availability requirements are addressed quickly

− The cost of changing to a different level of availability is defined or can be assessed quickly

• Improved Service Quality

− Improvement in Service Quality results from reduced Incidents as well as a reduced time to restore service


Structured Approach to Availability Design and Management

Availability Analysis and Design

1. Availability Requirements Analysis

− 1.1 Understand Service Goals

− 1.2 Document Availability Requirements

− 1.3 Validate with Service Level Management Function

2. Document System and Application Architecture

− 2.1 Define Service Critical Components

− 2.2 Document Service Critical Components and Their Relationships

− 2.3 Document and Review Components Monitoring Capability

− 2.4 Document System and Application Architecture

3. Gap Analysis and Recommendations

− 3.1 Perform Gap and Risk Analysis

− 3.2 Identify Single Points of Failure

− 3.3 Evaluate Alternative Approaches and Costs

− 3.4 Produce Gap Closure Recommendation and Specification

− 3.5 Plan and Summarise Downtime

− 3.6 Create Statement of Work to Implement

4. Availability Review

− 4.1 Define Availability Measurement Model

− 4.2 Perform Trend Analysis

− 4.3 Analyse Expanded Incident Lifecycle

− 4.4 Investigate Major Outages

− 4.5 Analyse Availability Reports


Step 1 - Availability Requirements Analysis

1. Availability Requirements Analysis

− Scope: Determine availability requirements related to supporting the needs of the business; validate with other IT management processes; create draft service agreement and assess for feasibility from availability perspective

− Inputs: Request for new service or changes to existing service; request for change to availability

− Outputs: Documented and agreed availability requirements

1.1 Understand Service Goals

− Scope: Document business goals for the service

− Inputs: Service design specification

− Outputs: Documented and agreed business goals

1.2 Document Availability Requirements

− Scope: Produce draft availability requirements based on understanding of business goals

− Inputs: Draft service level agreement

− Outputs: Documented and agreed availability requirements

1.3 Validate with Service Level Management Function

− Scope: Validate draft availability requirements with service level agreements and overall service management plan

− Inputs: Overall service management plan

− Outputs: Validated availability requirements


Step 2 - Document System and Application Architecture

2. Document System and Application Architecture

− Scope: Analyse operating environment of the individual components that comprise the service

− Inputs: Service design specification; configurations of individual components that comprise the service level agreement

− Outputs: Documented and agreed existing architecture for service delivery

2.1 Define Service Critical Components

− Scope: Define the configurations of individual components that comprise the service

− Inputs: Service design specification; configurations of individual components that comprise the service

− Outputs: Documented and agreed list of individual components that comprise the service

2.2 Document Service Critical Components and Their Relationships

− Scope: Document the structure of the service breakdown: the individual components and their relationships that deliver the service

− Inputs: Configurations of individual components, their attributes and relationships

− Outputs: Representation of individual components, their attributes and relationships

2.3 Document and Review Components Monitoring Capability

− Scope: Review existing service monitoring facilities and update or replace if required

− Inputs: Existing service monitoring procedures

− Outputs: Defined service monitoring criteria

2.4 Document System and Application Architecture

− Scope: Complete architecture document that describes how the service is delivered according to the service level agreement

− Inputs: Representation of individual components, their attributes and relationships; defined service monitoring criteria

− Outputs: Architecture document


Step 3 - Gap Analysis and Recommendations

3. Gap Analysis and Recommendations

− Scope: Perform gap analysis and recommend suitable approach, create specifications and cost justification

− Inputs: Validated availability requirements; architecture document; service problem and incident history

− Outputs: Availability design

3.1 Perform Gap and Risk Analysis

− Scope: Based on knowledge derived from incident and problem data, identify gaps in current services

− Inputs: Problem and incident data; availability requirements; architecture document

− Outputs: Gaps analysed and risks identified and documented

3.2 Identify Single Points of Failure

− Scope: Identify individual components whose failure can cause service disruption

− Inputs: Component attributes and relationships

− Outputs: Identified points of failure

3.3 Evaluate Alternative Approaches and Costs

− Scope: Explore various options within the approved range and identify a suitable approach based on requirements and cost justification

− Inputs: IT strategy and architecture; gaps analysed and risks identified and documented

− Outputs: Approach for required availability

3.4 Produce Gap Closure Recommendation and Specification

− Scope: Decide how the closure should be implemented based on financial and business reasons; develop specifications for the availability design and architecture

− Inputs: Approach for required availability; cost information

− Outputs: Decision on design and implementation; specifications for the availability design and architecture

3.5 Plan and Summarise Downtime

− Scope: Plan downtime for components and aggregate downtime across services

− Inputs: Decision on design and implementation

− Outputs: Planned downtime

3.6 Create Statement of Work to Implement

− Scope: Initiate project for implementing changes to address availability issues

− Inputs: Specifications for the availability design and architecture

− Outputs: Statement of work for project
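Step 3.2 can be partly automated once the component relationships from Step 2 are recorded. A minimal sketch, assuming each component has a known number of redundant instances and that any component with a single instance is a single point of failure for every service that uses it; the data and the function name `single_points_of_failure` are hypothetical.

```python
def single_points_of_failure(services: dict[str, set[str]],
                             instances: dict[str, int]) -> dict[str, set[str]]:
    """Map each non-redundant component to the services its failure would disrupt.

    Components missing from `instances` are treated as having a single instance.
    """
    spofs: dict[str, set[str]] = {}
    for service, components in services.items():
        for component in components:
            if instances.get(component, 1) <= 1:
                spofs.setdefault(component, set()).add(service)
    return spofs


services = {"Service 1": {"Component 1", "Component 2", "Component 3"},
            "Service 2": {"Component 1", "Component 4", "Component 5"}}
instances = {"Component 1": 1, "Component 2": 2, "Component 3": 2,
             "Component 4": 2, "Component 5": 1}
for component, affected in sorted(single_points_of_failure(services, instances).items()):
    print(component, sorted(affected))
# Component 1 ['Service 1', 'Service 2']
# Component 5 ['Service 2']
```

In this example Component 1 is the natural first candidate for Step 3.3, since removing it as a single point of failure protects both services.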


Step 4 - Availability Review

4. Availability Review

− Scope: Assess, review and update availability design if required

− Inputs: Incident, problem and fault reports

− Outputs: Identified availability concerns and amended design if required

4.1 Define Availability Measurement Model

− Scope: Define availability measurement model

− Inputs: Documented and agreed availability requirements

− Outputs: Defined data sources for availability measurement

4.2 Perform Trend Analysis

− Scope: Analyse incident and problem data to arrive at a high-level view of availability

− Inputs: Incident and problem trend reports

− Outputs: Identified availability concerns

4.3 Analyse Expanded Incident Lifecycle

− Scope: Analyse expanded incident lifecycle; analyse breakdown of incident resolution to validate and update design considerations

− Outputs: Identified specific areas which need improvement

4.4 Investigate Major Outages

− Scope: Investigate large outages and update availability design if required

− Inputs: Detailed incident analysis for specific incidents, faults, problems and performance reports

− Outputs: Identified availability concerns

4.5 Analyse Availability Reports

− Scope: Review availability reports and update infrastructure if required

− Inputs: Availability reports

− Outputs: Identified availability concerns; statement of work for identified changes
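Steps 4.1 to 4.3 rely on a measurement model built from incident data. A minimal sketch, assuming each incident record holds the time the service became unavailable and the time it was restored, and using common ITIL-style measures: mean time to restore service (MTRS), mean time between failures (MTBF, approximated here as total uptime divided by the number of incidents) and mean time between system incidents (MTBSI). The `Incident` type and `lifecycle_metrics` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    start: datetime     # service became unavailable
    restored: datetime  # service restored to users


def lifecycle_metrics(incidents: list[Incident], period_start: datetime,
                      period_end: datetime) -> dict[str, object]:
    """Summarise incidents in a reporting period (assumes incidents do not overlap)."""
    if not incidents:
        raise ValueError("no incidents in period")
    period = period_end - period_start
    downtime = sum((i.restored - i.start for i in incidents), timedelta())
    uptime = period - downtime
    n = len(incidents)
    return {
        "MTRS": downtime / n,    # mean time to restore service
        "MTBF": uptime / n,      # mean uptime per incident
        "MTBSI": period / n,     # equals MTBF + MTRS
        "availability_pct": uptime / period * 100,
    }


start = datetime(2010, 2, 1)
incidents = [Incident(datetime(2010, 2, 5, 9), datetime(2010, 2, 5, 11)),    # 2 hours
             Incident(datetime(2010, 2, 20, 14), datetime(2010, 2, 20, 18))]  # 4 hours
metrics = lifecycle_metrics(incidents, start, start + timedelta(days=30))
print(metrics["MTRS"], metrics["MTBF"], round(metrics["availability_pct"], 2))
# 3:00:00 14 days, 21:00:00 99.17
```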


Core Principles

• Core principles ensure consistency of work and outputs

• Ensure processes will meet the requirements of the business

• Work will be of a high quality

• Core principles should serve as a checklist against which all work is assessed


Availability Design and Management Core Principles

1. Availability requirements are based on the agreed and defined needs of the business

2. The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business

3. Infrastructure needs to be designed to routinely incorporate availability requirements

4. The availability design and management process must adhere to security policies and procedures

5. An availability plan will be used to track and manage availability requirements and information collected

6. Data on service reliability, maintainability, resiliency must be collected and monitored

7. The IT function will use continuous process improvement to achieve and maintain the required level of service availability

8. Planned downtime must be minimised for business-critical functions and unplanned downtime is handled by service management processes including Incident Management, Service Request Management and Continuity Management


Core Principle 1 - Availability Requirements Are Based On The Agreed And Defined Needs Of The Business

• Elements

− Conditions for availability must be aligned with the needs of the business

− Relevant availability data must be gathered and analysed

− Input and validation of requirements must be solicited from the business

− Availability requirements must be documented and distributed for agreement and approval

• Benefits

− Expectations are clearly defined and accepted

− User satisfaction is increased

− Growth can be forecast more easily

− Problem areas can be identified


Core Principle 2 - The IT Function Determines The Overall Requirement Of Availability, Performance And Recoverability Of Systems

• Elements

− Requirements are met under defined and agreed service agreements

− Good working relationships need to exist with key suppliers and vendors

− Changes to environment must be reflected in service agreements

• Benefits

− A structure of supporting contracts from suppliers and vendors is in place to meet business availability requirements


Core Principle 3 - Infrastructure Needs To Be Designed To Routinely Incorporate Availability Requirements

• Elements

− Changes in infrastructure and business needs must be reflected in availability planning and design

− Availability and recovery requirements need to be explicitly incorporated at the design stage

• Benefits

− Availability requirements and expectations are clearly defined and accepted


Core Principle 4 - Availability Design And Management Process Must Adhere To Security Policies And Procedures

• Elements

− Access to IT services must be provided in a secure environment

− Availability processes must be aligned with security policies

• Benefits

− Security measures will be followed

− There will be an ability to differentiate between security problems and availability problems


Core Principle 5 - Availability Plan Will Be Used To Track And Manage Availability Requirements And Information Collected

• Elements

− An availability plan must be developed and distributed

− Availability planning must be defined and outlined

− The availability plan must define the details of the data to be collected: what, how often, analysis, reporting, distribution and responses required

• Benefits

− Availability management goals are clearly defined and documented

− There will be a clearly communicated process for availability planning and reporting

− Data will be provided for availability reporting, analysis and forecasting


Core Principle 6 - Data On Service Reliability, Maintainability, Resiliency Must Be Collected And Monitored

• Elements

− The data to be collected and monitored must be defined, documented and communicated

− A supporting procedure to collect and monitor data, including response to potential problems must be defined

− Data needs to be reviewed on a regular and consistent basis

• Benefits

− Availability management will be proactive and responsive rather than reactive

− The expectations of the business can be set accurately

− There will be an ability to prepare for potentially increased future requirements

− Availability trends can be identified and addressed


Core Principle 7 - IT Function Will Use Continuous Process Improvement To Achieve And Maintain Level Of Service Availability

• Elements

− Collected availability data will be used to identify areas requiring improvement

− Implementation of any availability process improvement must be controlled by the change management process to limit its impact

• Benefits

− The business is enabled to make recommendations on availability improvements


Core Principle 8 - Planned Downtime Must Be Minimised For Business-Critical Functions And Unplanned Downtime Is Handled By Service Management Processes

• Elements

− Planned and unplanned downtime must be clearly notified to the business

− Acceptable versus unacceptable unplanned downtime for business-critical functions must be defined

− Escalation procedures will be developed and distributed

• Benefits

− Expectations are set with the business

− IT demonstrates commitment to supporting business-critical functions
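Minimising planned downtime, and Step 3.5 (plan and summarise downtime), both need planned downtime expressed per service rather than per component. A minimal sketch, assuming a service is unavailable whenever any component it depends on is in a planned maintenance window, and that overlapping windows are counted once; windows are given in hours from an arbitrary reference point, and all names and values are hypothetical.

```python
def merge_windows(windows: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Merge overlapping (start, end) maintenance windows."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def planned_downtime_per_service(services: dict[str, set[str]],
                                 component_windows: dict[str, list[tuple[float, float]]]
                                 ) -> dict[str, float]:
    """Total planned downtime (hours) for each service."""
    totals = {}
    for service, components in services.items():
        windows = [w for c in components for w in component_windows.get(c, [])]
        totals[service] = sum(end - start for start, end in merge_windows(windows))
    return totals


services = {"Service 1": {"Component 1", "Component 2", "Component 3"},
            "Service 2": {"Component 1", "Component 4", "Component 5"}}
component_windows = {"Component 1": [(10, 12)],   # shared component
                     "Component 2": [(11, 14)],   # overlaps Component 1's window
                     "Component 4": [(30, 31)]}
print(planned_downtime_per_service(services, component_windows))
# {'Service 1': 4, 'Service 2': 3}
```

Because Component 1 is shared, aligning its maintenance with Component 2's window keeps Service 1's total at four hours rather than five.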


Use Core Principles as Checklist for Independent Verification of Availability Design and Processes

☐ 1 Availability requirements are based on the agreed and defined needs of the business

☐ 1.1 Conditions for availability must be aligned with the needs of the business

☐ 1.2 Relevant availability data must be gathered and analysed

☐ 1.3 Input and validation of requirements must be solicited from the business

☐ 1.4 Availability requirements must be documented and distributed for agreement and approval

☐ 2 The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business

☐ 2.1 Requirements are met under defined and agreed service agreements

☐ 2.2 Good working relationships need to exist with key suppliers and vendors

☐ 2.3 Changes to the environment must be reflected in service agreements

☐ 3 Infrastructure needs to be designed to routinely incorporate availability requirements

☐ 3.1 Changes in infrastructure and business needs must be reflected in availability planning and design

☐ 3.2 Availability and recovery requirements need to be explicitly incorporated at the design stage

☐ 4 Availability Design And Management Process Must Adhere To Security Policies And Procedures

☐ 4.1 Access to IT services must be provided in a secure environment

☐ 4.2 Availability processes must be aligned with security policies
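When the checklist is used for independent verification, each item can be recorded as met or not met and the results grouped by principle. The Python sketch below assumes each item is keyed by its checklist number (for example "2.3"). The function name and data layout are illustrative only.

```python
def failed_items_by_principle(results):
    """Group unmet checklist items under their principle number.

    `results` maps an item number such as "1.2" to True (met) or False (not met).
    Every principle that appears is listed, so an empty list means it passed.
    """
    def numeric_key(item):
        # Sort "1.10" after "1.9" by comparing the numeric parts.
        return tuple(int(part) for part in item.split("."))

    failures = {}
    for item in sorted(results, key=numeric_key):
        principle = item.split(".")[0]
        failures.setdefault(principle, [])
        if not results[item]:
            failures[principle].append(item)
    return failures
```

For example, `{"1.1": True, "1.2": False}` yields `{"1": ["1.2"]}`, which shows that principle 1 is only partly met.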


Continuity Design and Management


Continuity Design and Management Process

Continuity Design and Management Consists of Two Parallel Sub-Processes

• Continuity Process Design and Management

− 1. Conduct Risk and Disaster Avoidance Assessment

− 2. Conduct Business Impact Analysis

− 3. Determine Data Backup and Recovery Options

− 4. Form Continuity and Disaster Recovery Team

− 5. Design and Develop Disaster Recovery Plan

− 6. Continuity Processing for Critical Service Components

− 7. Conduct Continuity and Disaster Recovery Rehearsal

− 8. Maintain Continuity and Disaster Recovery Plan

• Continuity Process Quality Control

− 1. Continuity Reporting

− 2. Continuity Report Evaluation and Improvement

− 3. Management Escalations of Service Continuity Violations


Structured Approach to Continuity Design and Management

• Can be applied to an individual system or application, to a service that consists of a number of systems or applications, or to the entire IT landscape

• The scope is to define a plan to implement the agreed continuity requirements


Scope of Continuity Design and Management

• Conducting impact analyses on loss of business systems

• Designing for service continuity by anticipating disruptions, estimating and measuring reliability and maintainability

• Supporting business critical functions

• Designing and developing a Disaster Recovery Plan

• Designing and developing Disaster Recovery Training

• Planning for and performing disaster mitigation and avoidance

• Assessing and managing risk


Structured Approach to Continuity Design and Management

Continuity Analysis and Design

• 1. Conduct Risk and Disaster Avoidance Assessment

− 1.1 Identify Potential Threats

− 1.2 Assess Probability of Threats

− 1.3 Evaluate Current Disaster Avoidance Measures

− 1.4 Assess Risk Controls to Mitigate Threats

− 1.5 Determine Impact of Reduced Controls

− 1.6 Determine Value of Additional Controls

• 2. Conduct Business Impact Analysis

− 2.1 Define Business Impact Analysis Methodology

− 2.2 Identify Business Functions to be Analysed

− 2.3 Define Business Function Criticality Categorisation

− 2.4 Design Questions and Conduct Interviews

− 2.5 Analyse Results of Interviews

− 2.6 Summarise and Present Results

• 3. Determine Backup and Recovery Options

− 3.1 Identify Backup and Recovery Options for Critical Functions

− 3.2 Evaluate Operation of Backup and Recovery Options

− 3.3 Determine Backup and Recovery Options for Critical Functions

− 3.4 Design Backup and Recovery Procedures

• 4. Form Continuity and Disaster Recovery Team

− 4.1 Define Recovery Team Structure

− 4.2 Define Recovery Team Functions

− 4.3 Define Team Leaders and Members

− 4.4 Define Team Charter

• 5. Design and Develop Disaster Recovery Plan (DRP)

− 5.1 Determine DRP Structure and Methodology

− 5.2 Define DRP Notification Schedule and Process

− 5.3 Define DRP Escalation Process

− 5.4 Define Key Recovery Objectives

− 5.5 Define Recovery Steps

− 5.6 Define Critical Function Restoration Process

• 6. Continuity Processing for Critical Service Components

− 6.1 Identify Critical Components for Continuity

− 6.2 Develop Options for Continuity

− 6.3 Develop Continuity Processing Steps

− 6.4 Develop Return from Continuity Process

• 7. Conduct Continuity and Disaster Recovery Rehearsal

− 7.1 Design Rehearsal Programme

− 7.2 Develop Rehearsal Scenarios

− 7.3 Plan and Schedule Rehearsals

− 7.4 Develop Rehearsal Evaluation Criteria

− 7.5 Conduct Rehearsals

− 7.6 Review and Analyse Rehearsals

• 8. Maintain Continuity and Disaster Recovery Plan

− 8.1 Assign Responsibility for DRP Maintenance

− 8.2 Establish DRP Review and Maintenance Procedures and Schedule

− 8.3 Integrate DRP Maintenance into Change Management

− 8.4 Agree and Maintain DRP Distribution List


Step 1 - Conduct Risk and Disaster Avoidance Assessment

1. Conduct Risk and Disaster Avoidance Assessment

• Scope: Identify and quantify risks and vulnerabilities to the organisation

• Inputs: Risks and threats, historical data, current environment, current policies, processes and procedures

• Outputs: Risk assessment report with recommendations for improvements

1.1 Identify Potential Threats

• Scope: Identify potential threats, internal and external, including weaknesses in the organisation that could cause IT systems to fail

• Inputs: Agreement on the scope of the continuity recovery plan

• Outputs: Identified potential threats affecting IT systems

1.2 Assess Probability of Threats

• Scope: Assess the probability of the identified potential threats affecting IT systems

• Inputs: Identified potential threats affecting IT systems

• Outputs: Assessment of the probability of the identified potential threats

1.3 Evaluate Current Disaster Avoidance Measures

• Scope: Evaluate current disaster avoidance measures

• Inputs: Identified potential threats affecting IT systems and their probability

• Outputs: Evaluation of current disaster avoidance measures

1.4 Assess Risk Controls to Mitigate Threats

• Scope: Determine the effectiveness of controls in deterring threats

• Inputs: Current avoidance measures

• Outputs: Assessment of risk controls to reduce threats

1.5 Determine Impact of Reduced Controls

• Scope: Determine how effective a control would be in deterring the threat, limiting the cost of the risk and minimising the impact of threats

• Inputs: Assessment of risk controls to reduce threats

• Outputs: Impact on the organisation without adequate disaster recovery controls

1.6 Determine Value of Additional Controls

• Scope: Determine which risks the organisation is willing to accept and which are to be controlled

• Inputs: Assessment of risk controls to reduce threats; impact on the organisation

• Outputs: Value to the organisation of additional controls
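Steps 1.5 and 1.6 weigh what a control costs against the risk it removes. Annualised loss expectancy (ALE) is one common way to quantify this, although the approach above does not prescribe it. ALE is the expected cost of a single occurrence multiplied by the expected number of occurrences per year. The Python sketch below applies that technique. The threat, its figures and the control cost are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threat:
    name: str
    single_loss: float   # estimated cost of one occurrence
    annual_rate: float   # estimated occurrences per year

    def annual_loss(self) -> float:
        """Annualised loss expectancy (ALE) without the additional control."""
        return self.single_loss * self.annual_rate


def control_net_value(threat: Threat, annual_control_cost: float,
                      rate_with_control: Optional[float] = None,
                      loss_with_control: Optional[float] = None) -> float:
    """Annual risk reduction from a control, minus what the control costs.

    A control may deter the threat (lower rate), limit its impact (lower
    loss), or both. On purely financial grounds, a positive result favours
    adding the control and a negative one favours accepting the risk.
    """
    rate = threat.annual_rate if rate_with_control is None else rate_with_control
    loss = threat.single_loss if loss_with_control is None else loss_with_control
    residual = rate * loss
    return threat.annual_loss() - residual - annual_control_cost
```

Consider a threat costing 200,000 per occurrence at 0.5 occurrences a year, which gives an ALE of 100,000. A control that cuts the rate to 0.1 and costs 30,000 a year has a net value of about 50,000.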


Step 2 - Conduct Business Impact Analysis

2. Conduct Business Impact Analysis

• Scope: Conduct a business impact analysis to determine which functions are most critical to the organisation’s survival

• Inputs: Risk and disaster avoidance assessment

• Outputs: Critical function categorisation; list of recovery requirements for processing critical functions

2.1 Define Business Impact Analysis Methodology

• Scope: Define the methodology and process to be used in the Business Impact Analysis, based on the risk and disaster avoidance assessment

• Inputs: Business systems

• Outputs: Agreed methodologies and processes to be used in the Business Impact Analysis

2.2 Identify Business Functions to be Analysed

• Scope: Identify business functions to be analysed for risks and disasters

• Inputs: Agreed methodologies and processes to be used in the Business Impact Analysis

• Outputs: Business functions identified for analysis

2.3 Define Business Function Criticality Categorisation

• Scope: Define categorisation criteria for each business function

• Inputs: Identified business functions

• Outputs: Criteria for categorising business functions

2.4 Design Questions and Conduct Interviews

• Scope: Design and validate questions and conduct interviews

• Inputs: Defined criteria for categories of business functions

• Outputs: Validation of business losses

2.5 Analyse Results of Interviews

• Scope: Analyse the data and validate findings if necessary

• Inputs: Validation of business losses

• Outputs: Analysis of data

2.6 Summarise and Present Results

• Scope: Develop conclusions and present the final report on the Business Impact Analysis

• Inputs: Analysis of data

• Outputs: Conclusions and final report of the Business Impact Analysis
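Step 2.3 defines criticality categories, and step 2.5 analyses the interview results against them. The minimal Python sketch below assumes categories are defined by the maximum downtime a function can tolerate. The tier names and hour thresholds are hypothetical. When several interviewees give different answers for the same function, the shortest (most conservative) figure is used.

```python
from __future__ import annotations

# Hypothetical categorisation criteria: (maximum tolerable downtime in hours, category).
# Checked in order, so the most critical tier comes first.
TIERS = [
    (4, "Critical"),
    (24, "Essential"),
    (72, "Necessary"),
]


def categorise(max_tolerable_hours: float) -> str:
    """Return the first tier whose threshold covers the tolerable downtime."""
    for limit, category in TIERS:
        if max_tolerable_hours <= limit:
            return category
    return "Deferrable"


def categorise_functions(responses: dict[str, list[float]]) -> dict[str, str]:
    """Categorise each business function from its interview responses.

    Uses the shortest tolerable downtime reported for each function.
    """
    return {name: categorise(min(hours)) for name, hours in responses.items()}
```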


Step 3 - Determine Data Backup and Recovery Options

3. Determine Data Backup and Recovery Options

• Scope: Determine data backup and recovery options based on the requirements for recovering critical functions and the type of disaster or interruption being catered for

• Inputs: Recovery requirements; acceptable downtime; available time to back up and recover

• Outputs: Recovery objectives; list of backup options; supporting procedures

3.1 Identify Backup and Recovery Options for Critical Functions

• Scope: Work with business units to identify possible backup options for critical business functions

• Inputs: Conclusions and final report of the Business Impact Analysis

• Outputs: Backup options for critical functions

3.2 Evaluate Operation of Backup and Recovery Options

• Scope: Evaluate the previously identified backup options for various scenarios

• Inputs: Backup options for critical functions

• Outputs: Evaluated backup options for critical business functions

3.3 Determine Backup and Recovery Options for Critical Functions

• Scope: Determine backup options for those critical business functions that currently have no backup options or whose options do not work correctly

• Inputs: Evaluated backup options for critical business functions

• Outputs: Backup options for all critical business functions

3.4 Design Backup and Recovery Procedures

• Scope: Design backup procedures for all critical business functions

• Inputs: Backup options for critical business functions

• Outputs: Backup procedures for critical business functions
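Steps 3.2 and 3.3 evaluate options against the recovery requirements and look for alternatives where no option is adequate. The Python sketch below assumes each option can be described by three figures: its recovery time, the backup window it needs and its annual cost. It selects the cheapest option that fits both the acceptable downtime and the available backup window. The option names and figures are hypothetical. A result of `None` matches the case in step 3.3, where a critical function has no workable option.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    recovery_hours: float       # time to restore the function from this option
    backup_window_hours: float  # time the backup needs in each cycle
    annual_cost: float


def choose_option(options: list[RecoveryOption],
                  acceptable_downtime_hours: float,
                  available_backup_window_hours: float) -> RecoveryOption | None:
    """Cheapest option meeting both constraints, or None if no option fits."""
    feasible = [
        o for o in options
        if o.recovery_hours <= acceptable_downtime_hours
        and o.backup_window_hours <= available_backup_window_hours
    ]
    return min(feasible, key=lambda o: o.annual_cost, default=None)
```

Choosing on cost alone is a simplification. In practice the evaluation in step 3.2 would also weigh how reliably each option works in each scenario.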


Step 4 - Form Continuity and Disaster Recovery Team

4. Form Continuity and Disaster Recovery Team

• Scope: Establish recovery teams and specify what each team is to do across a broad range of possible events

• Inputs: Business needs; recovery requirements

• Outputs: Recovery team structure; recovery team charter and members; recovery procedures

4.1 Define Recovery Team Structure

• Scope: Define the structure of the disaster recovery team

• Inputs: Decision to proceed

• Outputs: Structure of the disaster recovery team

4.2 Define Recovery Team Functions

• Scope: Define the function of each individual disaster recovery team for each business unit

• Inputs: Structure of the disaster recovery team

• Outputs: Functions for the recovery team

4.3 Define Team Leaders and Members

• Scope: Define the team leader, alternate team leader and other team members for each type of disaster and each business unit

• Inputs: Functions for the recovery team

• Outputs: Recovery team leader, alternate team leader and members

4.4 Define Team Charter

• Scope: Define the charter for each team along with its roles and responsibilities; define recovery procedures for each team relevant to its role and charter

• Inputs: Recovery team leader, alternate team leader and members

• Outputs: Charter and recovery procedures, along with roles and responsibilities, for each recovery team
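The outputs of steps 4.3 and 4.4 can be recorded in a simple structure and checked for gaps before the plan is written. The Python sketch below uses hypothetical field names. Its checks are illustrative, not a complete list:

− a charter exists

− the alternate leader is a different person from the leader

− each team has members

− no one leads more than one team

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RecoveryTeam:
    name: str
    charter: str
    leader: str
    alternate_leader: str
    members: list[str] = field(default_factory=list)


def team_issues(teams: list[RecoveryTeam]) -> list[str]:
    """Return human-readable problems found in the recovery team set-up."""
    issues = []
    for team in teams:
        if not team.charter.strip():
            issues.append(f"{team.name}: missing charter")
        if team.leader == team.alternate_leader:
            issues.append(f"{team.name}: leader and alternate leader are the same person")
        if not team.members:
            issues.append(f"{team.name}: no team members")
    # A person leading several teams may be unavailable to all of them at once.
    lead_counts = Counter(t.leader for t in teams)
    for person, count in sorted(lead_counts.items()):
        if count > 1:
            issues.append(f"{person} leads {count} teams")
    return issues
```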


Step 5 - Design and Develop Disaster Recovery Plan

5. Design and Develop Disaster Recovery Plan

• Scope: Develop and validate processes and procedures to support the critical business functions

• Inputs: Recovery objectives; scope of plan; business function classification; disaster definitions and classification; recovery team organisation

• Outputs: Recovery plan

5.1 Determine DRP Structure and Methodology

• Scope: Determine the structure and methodology of how the plan will be developed

• Inputs: Structure of the disaster recovery team

• Outputs: Structure and methodology for developing the DRP

5.2 Define DRP Notification Schedule and Process

• Scope: Define the notification schedule and the recovery process

• Inputs: Structure and methodology for developing the DRP

• Outputs: Notification schedule and recovery process

5.3 Define DRP Escalation Process

• Scope: Define the DRP escalation criteria and procedure

• Inputs: Notification schedule and recovery process

• Outputs: Escalation procedure

5.4 Define Key Recovery Objectives

• Scope: Consider the organisation’s key recovery objectives and policies while designing the DRP

• Inputs: Escalation procedure

• Outputs: Consideration of key recovery objectives and policies

5.5 Define Recovery Steps

• Scope: Define the framework for disaster recovery to ensure it contains the required recovery steps

• Inputs: Consideration of key recovery objectives and policies

• Outputs: Disaster recovery steps

5.6 Define Critical Function Restoration Process

• Scope: Discuss the DRP with business units to gain their acceptance, define the final restoration process and define the training to be provided

• Inputs: Disaster recovery steps

• Outputs: Accepted restoration process
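Steps 5.2 and 5.3 define who is notified and when an incident is escalated. The minimal Python sketch below assumes escalation is driven by the time elapsed since the disaster was declared. The roles and time thresholds are hypothetical and would come from the organisation’s own DRP.

```python
from __future__ import annotations

# Hypothetical escalation schedule: (minutes since the disaster was declared, role to notify).
ESCALATION_SCHEDULE = [
    (0, "Recovery team leader"),
    (30, "IT operations manager"),
    (120, "Business continuity coordinator"),
    (240, "Executive management"),
]


def roles_to_notify(minutes_elapsed: float) -> list[str]:
    """All roles that should have been notified by this point, in escalation order."""
    return [role for threshold, role in ESCALATION_SCHEDULE if minutes_elapsed >= threshold]


def newly_escalated(previous_minutes: float, current_minutes: float) -> list[str]:
    """Roles whose notification threshold was crossed since the last check."""
    already = set(roles_to_notify(previous_minutes))
    return [r for r in roles_to_notify(current_minutes) if r not in already]
```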


Step 6 - Alternate Processing for Critical Service Components

6. Alternate Processing for Critical Service Components

• Scope: Evaluate critical business function components to determine whether alternate processing procedures are necessary and feasible for the period between a disaster and recovery, and how recovery should be achieved

• Inputs: Critical business function components; alternatives for processing critical components

• Outputs: Critical business function component timelines; alternate procedures

6.1 Identify Critical Components for Continuity

• Scope: Work with business units to identify critical components that need alternate processing

• Inputs: Accepted restoration process

• Outputs: Identified critical components

6.2 Develop Options for Continuity

• Scope: Develop options for alternate processing of critical components in coordination with business units

• Inputs: Identified critical components

• Outputs: Options for alternate processing

6.3 Develop Continuity Processing Steps

• Scope: Develop processing steps based on the options for alternate processing of critical components

• Inputs: Options for alternate processing

• Outputs: Alternate processing steps

6.4 Develop Return from Continuity Process

• Scope: Develop the procedure to return from alternate processing to normal processing

• Inputs: Alternate processing steps

• Outputs: Steps to return critical components from alternate processing to normal processing
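Steps 6.3 and 6.4 define how a critical component moves into alternate processing and back again. The Python sketch below models this as a small state machine. The states and allowed transitions are assumptions for illustration, including the fall-back to alternate processing if the return to normal fails.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal processing"
    ALTERNATE = "alternate processing"
    RETURNING = "returning to normal processing"


# Assumed transitions; a failed return falls back to alternate processing.
ALLOWED = {
    Mode.NORMAL: {Mode.ALTERNATE},
    Mode.ALTERNATE: {Mode.RETURNING},
    Mode.RETURNING: {Mode.NORMAL, Mode.ALTERNATE},
}


class CriticalComponent:
    def __init__(self, name: str):
        self.name = name
        self.mode = Mode.NORMAL
        self.history = [Mode.NORMAL]  # audit trail for the post-incident review

    def switch(self, target: Mode) -> None:
        """Move to `target`, rejecting transitions the procedure does not allow."""
        if target not in ALLOWED[self.mode]:
            raise ValueError(f"{self.name}: cannot move from {self.mode.value} to {target.value}")
        self.mode = target
        self.history.append(target)
```

Making the return an explicit state means a component cannot jump back to normal processing without going through the return procedure.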


Step 7 - Conduct Continuity and Disaster Recovery Rehearsal

7. Conduct Continuity and Disaster Recovery Rehearsal

• Scope: Conduct rehearsals to validate the organisation’s ability to respond to and recover from a disaster

• Inputs: Recovery procedures; alternate procedures; rehearsal objectives

• Outputs: Rehearsal plan; rehearsal report; lessons learned

7.1 Design Rehearsal Programme

• Scope: Design programmes for rehearsals

• Inputs: Disaster Recovery Plan

• Outputs: Programmes for rehearsals

7.2 Develop Rehearsal Scenarios

• Scope: Develop rehearsal scenarios based on the design of the rehearsals

• Inputs: Programmes for rehearsals

• Outputs: Rehearsal scenarios

7.3 Plan and Schedule Rehearsals

• Scope: Plan and schedule rehearsals, both planned and unannounced

• Inputs: Rehearsal scenarios

• Outputs: Rehearsal schedule

7.4 Develop Rehearsal Evaluation Criteria

• Scope: Develop evaluation techniques and criteria for each rehearsal scenario

• Inputs: Rehearsal schedule

• Outputs: Evaluation techniques and criteria

7.5 Conduct Rehearsals

• Scope: Conduct rehearsals in coordination with all other members

• Inputs: Rehearsal schedule

• Outputs: Conducted rehearsals

7.6 Review and Analyse Rehearsals

• Scope: Document the outcomes of the rehearsals, along with lessons learned and review reports, and distribute them to all members

• Inputs: Conducted rehearsals

• Outputs: Reports on conducted rehearsals
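Step 7.4 develops evaluation criteria, and step 7.6 reviews the results against them. If the criterion for each critical function is its target recovery time, a rehearsal can be scored as in the Python sketch below. The function names and hours are hypothetical.

```python
from __future__ import annotations


def evaluate_rehearsal(target_hours: dict[str, float],
                       measured_hours: dict[str, float]) -> dict[str, str]:
    """Score each critical function: 'pass', 'fail' or 'not recovered'.

    A function missing from `measured_hours` was not recovered during the rehearsal.
    """
    results = {}
    for function, target in target_hours.items():
        actual = measured_hours.get(function)
        if actual is None:
            results[function] = "not recovered"
        elif actual <= target:
            results[function] = "pass"
        else:
            results[function] = "fail"
    return results
```

Functions scored "fail" or "not recovered" are natural starting points for the lessons-learned section of the rehearsal report.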


Step 8 - Maintain Continuity and Disaster Recovery Plan

8. Maintain Continuity and Disaster Recovery Plan

• Scope: Conduct scheduled reviews of the contents of the continuity plan; update the plan as part of the change management process and in line with other related changes

• Inputs: Disaster recovery plan; review schedule; list of reviewers; review criteria and objectives

• Outputs: Recommendations for improvements or changes; approval list from reviewers

8.1 Assign Responsibility for DRP Maintenance

• Scope: Identify the reviewers responsible for plan maintenance and assign responsibility

• Inputs: Rehearsal review reports; DRP; review criteria and objectives

• Outputs: Assigned responsibilities for review and maintenance of the DRP

8.2 Establish DRP Review and Maintenance Procedures and Schedule

• Scope: Establish review and maintenance procedures and schedules

• Inputs: Assigned responsibilities for review and maintenance of the DRP

• Outputs: Procedure for review and maintenance of the DRP

8.3 Integrate DRP Maintenance into Change Management

• Scope: Integrate the maintenance process with change management processes so that changes are assessed for their potential impact on the continuity plans

• Inputs: Review feedback and inputs

• Outputs: Updated DRP

8.4 Agree and Maintain DRP Distribution List

• Scope: After updating the DRP, create a list of the recipients to whom it must be distributed

• Inputs: Updated DRP

• Outputs: Distribution list
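Step 8.2 establishes a review schedule. The minimal Python sketch below assumes each plan has a date of last review and a fixed review interval in days. It flags plans whose next review date has passed. The plan names, dates and intervals are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def next_review(last_review: date, interval_days: int) -> date:
    """Date on which the plan is next due for review."""
    return last_review + timedelta(days=interval_days)


def overdue_plans(plans: dict[str, tuple[date, int]], today: date) -> list[str]:
    """Names of plans whose next scheduled review date has already passed."""
    return sorted(
        name for name, (last_review, interval_days) in plans.items()
        if next_review(last_review, interval_days) < today
    )
```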


Continuity Design and Management Core Principles

1. Scope of continuity plan must contain clear and realistic recovery objectives and recovery timeframes

2. Risk management and disaster avoidance measures should be in place and practiced

3. Continuity plan including disaster recovery should be designed and developed to support recovery of agreed critical business functions

4. Continuity plan should be rehearsed regularly

5. Continuity and recovery strategies or plans should be integrated into the design and deployment of changes to infrastructure

6. Continuity and recovery processes or plans should be reviewed and updated on a regular basis


Core Principle 1 - Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes

• Elements

− Recovery process must be aligned to support business objectives

− Recovery investments must be directly related to business impact

− Recovery times and objectives need to be communicated and validated

− The disasters that the continuity plan will and will not address must be defined

− The scope of planning efforts must be stated

• Benefits

− Clear objectives

− Defined scope of efforts

− Expectations are agreed and defined

− Coordinated recovery efforts


Core Principle 2 - Risk Management And Disaster Avoidance Measures Should Be In Place And Practiced

• Elements

− Ensure that the environment is constructed and operated to prevent potential disasters

− As infrastructure changes and business needs change, ensure risks and exposures are addressed

• Benefits

− Control of preventable, predictable disasters

− Minimising and deterring potential disasters


Core Principle 3 - Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions

• Elements

− Adequate investment in preventative, proactive and recovery methods for critical business functions

− All business functions and their criticality must be defined and communicated to the organisation

− Key customers must be reassured about the continuity management process

• Benefits

− Expectations are set and agreed upon

− Minimise significant losses to the organisation in terms of financial, legal, and operational issues


Core Principle 4 - Continuity Plan Should Be Rehearsed Regularly

• Elements

− Regular rehearsals must be conducted, both planned and unannounced

− Partial and full rehearsals must be conducted

− A variety of rehearsal techniques must be used

− Rehearsal objectives and success criteria must be clearly defined

• Benefits

− Potential for successful recovery is high

− Reinforces learning and commitment

− Demonstrates value to organisation

− Identification of potential weaknesses in plan


Core Principle 5 - Continuity And Recovery Strategies Or Plans Should Be Integrated Into Design And Deployment Of Changes To Infrastructure

• Elements

− Plans for changes to infrastructure must be considered with continuity in mind

− Recovery procedures must be requested for new applications, systems and networks

• Benefits

− Continuity is a critical component of the operating environment

− Continuity strategies and plans have an important role in design and deployment decisions and plans


Core Principle 6 - Continuity And Recovery Processes Or Plans Should Be Reviewed And Updated On A Regular Basis

• Elements

− Regular reviews of continuity plans must be defined and scheduled

− Reviewers must be objective and must not have been involved in developing the plan

− Integration into the change management process for plan updates must be ensured

− Revision, tracking and distribution lists must be defined and documented

• Benefits

− Keeps continuity plan as a living document

− Ensures the plan is kept current

− Reminder of continuing purpose of plan and its benefits to the organisation


Use Core Principles as Checklist for Independent Verification of Continuity Design and Processes

☐ 1 Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes

☐ 1.1 Recovery process must be aligned to support business objectives

☐ 1.2 Recovery investments must be directly related to business impact

☐ 1.3 Recovery times and objectives need to be communicated and validated

☐ 1.4 The disasters that the continuity plan will and will not address must be defined

☐ 2 Risk Management And Disaster Avoidance Measures Should Be In Place And Practiced

☐ 2.1 Ensure that the environment is constructed and operated to prevent potential disasters

☐ 2.2 As infrastructure changes and business needs change, ensure risks and exposures are addressed

☐ 3 Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions

☐ 3.1 Adequate investment in preventative, proactive and recovery methods for critical business functions

☐ 3.2 All business functions and their criticality must be defined and communicated to the organisation

☐ 3.3 Key customers must be reassured about the continuity management process

☐ 4 Continuity Plan Should Be Rehearsed Regularly

☐ 4.1 Regular rehearsals must be conducted, both planned and unannounced

☐ 4.2 Partial and full rehearsals must be conducted


Process Quality Control


Common Process Quality Control Procedures for Availability and Continuity

• Availability Process Quality Control

− 1. Availability Reporting

− 2. Availability Report Evaluation and Improvement

− 3. Management Escalations of Service Availability Violations

• Continuity Process Quality Control

− 1. Continuity Reporting

− 2. Continuity Report Evaluation and Improvement

− 3. Management Escalations of Service Continuity Violations


Structured Approach to Availability and Continuity Process Quality Control

Availability and Continuity Process Quality Control

• 1. Generate Report Metrics and Reports

− 1.1 Develop Management Reports Based on Agreed Metrics

− 1.2 Schedule Report

− 1.3 Generate Reports

− 1.4 Distribute Reports

− 1.5 Review Report Schedule

− 1.6 Update Reporting Schedule

• 2. Evaluation and Improvement

− 2.1 Evaluate Process for Improvement

− 2.2 Develop Improvements and Implementation Plan

− 2.3 Create and Submit Improvement Implementation Plan

− 2.4 Implement Improvement Plan

− 2.5 Review Implementation

− 2.6 Update Process Improvement Plan

• 3. Management Escalations of Service Continuity Violations


Step 1 - Generate Report Metrics and Reports

1. Generate Report Metrics and Reports

• Scope: Generate report metrics and periodic and ad hoc reports, as required or planned

• Inputs: Report schedule; requests for ad hoc reports

• Outputs: Generated or distributed reports

1.1 Develop Management Reports Based on Agreed Metrics

• Scope: Report to management the contributions made by this process to overall service management

• Inputs: Report requirements

• Outputs: Accepted reports, frequency and costs

1.2 Schedule Report

• Scope: Update the report schedule

• Inputs: Report schedule

• Outputs: Updated report schedule

1.3 Generate Reports

• Scope: Generate reports according to the schedule or in response to ad hoc requirements

• Inputs: Collected metrics

• Outputs: Generated reports

1.4 Distribute Reports

• Scope: Distribute the generated reports to the target recipients

• Inputs: Generated reports

• Outputs: Distributed reports

1.5 Review Report Schedule

• Scope: Regularly review the report requirements

• Inputs: Report schedule; report details

• Outputs: Review results

1.6 Update Reporting Schedule

• Scope: Update the report schedule with the new reports

• Inputs: Report schedule

• Outputs: Updated report schedule
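Steps 1.2 to 1.4 combine scheduled and ad hoc reporting. The Python sketch below uses a hypothetical `ScheduledReport` record with fixed reporting intervals. It lists which reports to generate on a given day: every scheduled report whose interval has elapsed, plus any ad hoc requests.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScheduledReport:
    name: str
    recipients: list[str]
    every_days: int
    last_generated: date


def reports_to_generate(schedule: list[ScheduledReport],
                        ad_hoc_requests: list[str],
                        today: date) -> list[str]:
    """Scheduled reports that are due, followed by ad hoc requests not already covered."""
    due = [r.name for r in schedule
           if today >= r.last_generated + timedelta(days=r.every_days)]
    return due + [name for name in ad_hoc_requests if name not in due]
```

The `recipients` field is carried along so that step 1.4 can distribute each generated report. Step 1.5 would revisit `every_days` when the report requirements change.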


Step 2 - Evaluation and Improvement

2. Evaluation and Improvement

• Scope: Perform periodic reviews to improve process performance

• Inputs: Process metrics; future directives; service level expectations; review schedule; improvement plan

• Outputs: Implemented improvements; reduced costs; improved process efficiency and effectiveness

2.1 Evaluate Process for Improvement

• Scope: Regularly review the effectiveness and efficiency of the continuity management process

• Inputs: Improvement plan

• Outputs: Gap analysis report

2.2 Develop Improvements and Implementation Plan

• Scope: Develop and review proposed process improvements

• Inputs: Improvement plan; gap analysis report; revised business requirements

• Outputs: Improvement strategy

2.3 Create and Submit Improvement Implementation Plan

• Scope: Create and submit the improvement implementation plan

• Inputs: Improvement strategy

• Outputs: Submitted improvement implementation plan

2.4 Implement Improvement Plan

• Scope: Manage and coordinate the implementation of the process improvement plan

• Inputs: Approved improvement implementation plan; improvement strategy

• Outputs: Implemented improvements; reduced costs; improved process efficiency and effectiveness

2.5 Review Implementation

• Scope: Monitor the implementation to ensure that the process is not disrupted and that the changes are working as intended

• Inputs: Implemented improvements

• Outputs: Review results; closed improvement implementation plan

2.6 Update Process Improvement Plan

• Scope: Update the process improvement plan with any changes

• Inputs: Process improvement plan; review cycle

• Outputs: Updated process improvement plan
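Step 2.1 produces a gap analysis report. The Python sketch below assumes that higher values are better for every process metric. It computes the shortfall for each metric that misses its target and lists metrics that were not measured. The metric names and values are hypothetical.

```python
from __future__ import annotations


def gap_analysis(targets: dict[str, float],
                 actuals: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Return (shortfall per metric that misses its target, metrics with no measurement).

    Assumes higher values are better for every metric.
    """
    gaps = {}
    missing = []
    for metric, target in targets.items():
        if metric not in actuals:
            missing.append(metric)
        elif actuals[metric] < target:
            gaps[metric] = target - actuals[metric]
    return gaps, missing
```

Unmeasured metrics are reported separately rather than treated as zero. A gap in reporting is a different finding from a gap in performance.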


Summary

• Availability and continuity are merging into a single unbroken requirement

• Availability and continuity can impose a significant overhead on an organisation, so their cost should be justified by benefits elsewhere

• Most business systems and processes are defined as business critical

• Management commitment is needed to ensure that availability and continuity receive the required attention and resources

• Use core principles for availability and continuity for independent verification of processes and designs

• Availability and continuity should be embedded into system architectures and designs rather than being an afterthought


More Information

Alan McSweeney
