Process Description – Problem Management

document.docx, May 7, 2023, Version 1.1

Contents

Document Change Control
Introduction
Objectives & Scope
  Objectives
  Scope
    Out-of-Scope
Process Flow
  Reactive Problem Management
  Proactive Problem Management
  Incident Matching Procedure
Roles & Responsibilities
  Individual Resolving an Incident
  Problem “Assigned To” Individual
  Problem Task “Assigned To” Individual
  Problem Manager
  Service Owner
Policies
Escalations
Problem Statuses
Key Performance Indicators
Field Specifications
  Problem Record
  Changes to Incident Record
  Change Management

Harvard University Information Technology – Problem Management Page 1

Document Change Control

| Version # | Date of Issue | Author(s) | Brief Description |
|---|---|---|---|
|  |  | Reg Lo | Initial Template |
| 1.0 | 11/6/13 | J. Worthington | Initial draft for HUIT Review |
| 2.0 | 11/12/2013 | J. Worthington | HUIT Review Draft |
| 3.0 | 7/10/2014 | S. Rivers | HUIT Review/modifications |


INTRODUCTION

This document describes the Problem Management process for HUIT. It is based on the Information Technology Infrastructure Library® (ITIL) and adapted to address HUIT’s specific requirements.

This document is divided into the following sections:

| Section | Description |
|---|---|
| Objectives & Scope | Specifies the objectives of the Problem Management process. |
| Process Flow | Diagrams illustrating the high-level Problem Management process. In particular, the following scenarios are covered: Reactive Problem Management, Proactive Problem Management, Incident Matching Procedure. |
| Roles & Responsibilities | Identifies the roles within the Problem Management process and the responsibilities for each role. |
| Policies | Policies that support the Problem Management process. |
| Escalations | Automatic email notifications if there has not been any activity for a given Problem. |
| Problem Statuses | Diagram illustrating the possible statuses of a Problem record, how statuses are allowed to change and what triggers the status to be automatically updated. |
| Key Performance Indicators | Specifies the metrics for measuring the success of the Problem Management process. |
| Fields on the Problem Record | Provides field specifications including drop-down values for ServiceNow. |


OBJECTIVES & SCOPE

A Problem is the underlying cause of one or more Incidents. An Incident is an unplanned interruption to an IT service or reduction in the quality of an IT service or a failure of a Configuration Item that has not yet impacted an IT service.

The purpose of Problem Management is to manage the lifecycle of Problems from identification, through investigation and documentation, to eventual removal. The objectives of Problem Management are to prevent Problems and resulting Incidents from happening, eliminate recurring Incidents, and minimize the impact of Incidents that cannot be prevented.

In contrast, the purpose of Incident Management is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring that agreed levels of service quality are maintained.

Note: One can resolve an Incident without fixing the underlying Problem. A Problem is not another name for a Major Incident, although you may want to initiate Problem Management after a Major Incident. Not every Incident turns into a Problem.

OBJECTIVES

The specific objectives of Problem Management at HUIT are:

1. Reduce the number of Incidents and the time it takes to resolve them

2. Adopt a single Problem Management process for the entire IT organization that addresses both reactive and proactive Problem Management (based on Incident trending)

3. Coordinate Problem Management activity across the IT organization, e.g. if the investigation of a specific Problem involves multiple teams, create an agreed priority for researching the Problem, create visibility and accountability into tasks, facilitate communication among the teams, etc.

4. Report on metrics to evaluate effectiveness and efficiency of the process and drive future improvement in the process

5. Base the process on industry standards while addressing HUIT’s specific requirements

6. Integrate Problem Management and Knowledge Management processes so that information about workarounds and known-errors is shared across the IT organization

7. Integrate Problem Management with all Major Incident Reviews.

SCOPE

The initial scope of the Problem Management process at HUIT will focus on improving communication and coordination of activities after a Major Incident (MI). These activities will focus largely on reactive Problem Management.

The ongoing scope of HUIT’s Problem Management process will include proactive process activities, including the development and maintenance of a Known-Error Database, Incident matching, and other analysis activities (e.g., Incident trending and Service Reviews).

The Problem Management process will cover:

1. The investigation and documentation of problems, the documentation of workarounds and known-errors and submitting them to the Knowledge base, and triggering Change Management to fix a problem, if appropriate.

2. Reactive Problem Management: solving problems in response to one or more incidents.

3. Proactive Problem Management: identifying problems based on periodic scheduled reviews and an analysis of Incident trends (enhancement – not for initial roll-out).

4. Incident Matching Procedure: a set of activities performed during Initial Incident Diagnosis in the Incident Management process to maximize the use of Knowledge / Problem Management (enhancement – not for initial roll-out).

OUT-OF-SCOPE

1. Knowledge Management – The broader Service Knowledge Management System (SKMS) is out of scope

a. General Knowledge Management (i.e., documentation retention/review) outside of the Known-Error Database

b. Errors coming out of development (i.e., Release Management integration)

c. Self-Help / Self Service

2. Configuration Management – The Problem Management process will use currently available CI data


PROCESS FLOW

The next few pages illustrate the following use case scenarios for Problem Management:

1. Reactive Problem Management

2. Proactive Problem Management

3. Incident Matching Procedure

The process flows use “swim-lane diagrams” to illustrate which role is responsible for the activity. These roles are described in more detail in the following section titled “Roles and Responsibilities”.


REACTIVE PROBLEM MANAGEMENT


PROACTIVE PROBLEM MANAGEMENT


INCIDENT MATCHING PROCEDURE


ROLES & RESPONSIBILITIES

INDIVIDUAL RESOLVING AN INCIDENT

1. Follow the Incident Matching procedure when initially diagnosing an Incident including linking Incidents to existing Problems where appropriate.

2. Determine if the Incident, or a trend of Incidents, is non-trivial and requires Problem Management.

3. After a Major Incident, create the Problem Record, ensure the Problem description is adequate, and assign it to the right group.

PROBLEM “ASSIGNED TO” INDIVIDUAL

1. Record any activity performed in the work notes.

2. If individuals from multiple teams are involved in researching the problem, coordinate the investigation efforts.

3. If a workaround is discovered, document the workaround, publish it to the knowledge base, and inform the users affected by the Incident (if appropriate).

4. If a known-error is discovered, document the known-error and publish it to the knowledge base.

5. If there is a business justification for fixing the known-error, initiate the Change Management process.

PROBLEM TASK “ASSIGNED TO” INDIVIDUAL

1. Record any activity performed in the work notes and mark tasks as “Closed Complete” when completed.

PROBLEM MANAGER

The Problem Manager is the single individual responsible for the Problem Management process across all of IT. Their responsibilities include:

1. Ensure that all of IT follows the Problem Management process.

2. Review Incident reports and trends and conduct regular reviews with Service Owners to identify trends.

3. Report on and analyze Problem metrics.

4. Sponsor improvements to the process or tool(s).

SERVICE OWNER

1. During the normal course of work, e.g. reviewing Incident reports / trends, talking to their teams, etc., if a Problem is identified, create and assign the Problem Record. (Proactive)

2. Periodically meet with the Problem Manager for formal service reviews to identify Problems.

3. Oversee Problem Management activity for their service.


POLICIES

1. All Problems must be recorded in ServiceNow.

2. All Major Incidents require the opening of a Problem Record, even if only to document that no action is required.

3. All Workarounds and Known-errors must be published in the Known Error Data Base.

4. First line support / Service Desk must perform the Incident Matching procedure.

5. If an Incident is related to a Problem, the Incident record must be linked to the Problem record.

6. [Configurable within ProblemEXCELerator] Problem Management must be initiated for all Major Incidents.

7. [Configurable within ProblemEXCELerator] Problem Management must be initiated when at least 5 Incidents have been opened in the current month for a single category/subcategory.

8. [Configurable within ProblemEXCELerator] Problem Management must be initiated when at least 5 Incidents have been opened in the current month for a single CI.

9. Any work conducted must be recorded in the work notes.

10. Only the Assigned To individual can set the status of the Problem Record to “Closed – Cancelled”. If this status is selected, the reason for cancellation must be documented in the “Close Notes”.

11. If a Problem will be fixed through a Change, the Change record must be linked with the Problem record.

12. Proactive Problem Management reviews must occur at least once a year for each service.
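The trigger rules in policies 6 to 8 can be sketched in code. The following is a minimal illustration only, not the ServiceNow or ProblemEXCELerator configuration: the `Incident` class, its field names, and the `problem_triggers` function are hypothetical, and the threshold of 5 is taken from the policy text.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    # Hypothetical minimal Incident record; field names are illustrative.
    number: str
    opened: date
    category: str
    subcategory: str
    ci: str
    major: bool = False


THRESHOLD = 5  # "at least 5 incidents", per policies 7 and 8


def problem_triggers(incidents, year, month, threshold=THRESHOLD):
    """Return the reasons Problem Management must be initiated for a month."""
    in_month = [i for i in incidents
                if i.opened.year == year and i.opened.month == month]
    reasons = []
    # Policy 6: every Major Incident triggers Problem Management.
    for inc in in_month:
        if inc.major:
            reasons.append(("major_incident", inc.number))
    # Policy 7: count Incidents per category/subcategory pair.
    by_category = Counter((i.category, i.subcategory) for i in in_month)
    reasons += [("category", key)
                for key, n in sorted(by_category.items()) if n >= threshold]
    # Policy 8: count Incidents per Configuration Item.
    by_ci = Counter(i.ci for i in in_month)
    reasons += [("ci", key)
                for key, n in sorted(by_ci.items()) if n >= threshold]
    return reasons
```

Each returned tuple names the policy that fired and the Major Incident number, category pair, or CI that caused it, so a reviewer can see why a Problem Record is expected.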


ESCALATIONS

If a Problem is in the “Open” or “Workaround” state, then after the specified period of inactivity, the following people will be emailed.

| Problem Priority | Email “Assigned To” individual every … | Email “Assigned Group” manager every … |
|---|---|---|
| 1 – Major Incident-driven | 2 business days if there is no activity | 4 days if there is no activity following the MIR meeting |
| 2 – Other (i.e., Service-Owner driven) | 1 week if there is no activity after agreed start-date | 2 weeks if there is no activity after agreed start-date |
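The escalation intervals can be expressed as a small check. This is a sketch under stated assumptions, not the actual notification rule: the function names, the `reference_date` parameter (the MIR meeting date for priority 1, the agreed start date for priority 2), and the choice to measure inactivity from the later of the last activity and that reference date are all illustrative. It only reports whether a threshold has been reached, and does not model the repeat interval of the emails.

```python
from datetime import date, timedelta

# Escalation applies only to Problems in these states.
ESCALATED_STATES = {"Open", "Workaround"}


def business_days_between(start, end):
    """Count Monday-Friday days after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
    return days


def escalation_recipients(state, priority, last_activity, reference_date, today):
    """Return the roles that are due a reminder email today."""
    if state not in ESCALATED_STATES:
        return []
    # Assumption: the inactivity clock starts at the later of the last
    # activity and the reference date (MIR meeting or agreed start date).
    clock_start = max(last_activity, reference_date)
    calendar_days = (today - clock_start).days
    if priority == 1:
        assignee_due = business_days_between(last_activity, today) >= 2
        manager_due = calendar_days >= 4
    else:
        assignee_due = calendar_days >= 7
        manager_due = calendar_days >= 14
    recipients = []
    if assignee_due:
        recipients.append("assigned_to")
    if manager_due:
        recipients.append("assigned_group_manager")
    return recipients
```

Weekends are the only non-business days handled here; university holidays would need a calendar that this document does not specify.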


PROBLEM STATUSES

The following diagram illustrates the possible statuses of a Problem record, how statuses are allowed to change and what triggers the status to be automatically updated.


KEY PERFORMANCE INDICATORS

Include the following KPIs in the Problem Overview Page:

1. Total number of Problems by category, priority, Assigned to, Assignment Group, etc.

2. Total number of Problems that have been escalated by category, priority, support group, support analyst, etc.

3. Number of workarounds and known-errors published to the knowledge base by time period.

4. Number of Incidents resolved by workarounds by time period.

5. Number of Incidents linked to known-errors by time period.

6. Number of Incidents linked to Problems by time period.

7. Number of Changes triggered by Problem Management by time period.
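As a rough illustration of how some of these KPIs could be computed from exported records, the sketch below covers KPI 1, KPI 6, and KPI 7. The record layout (dictionaries with `category`, `priority`, `opened`, `problem`, and `change_request` keys) and the function names are hypothetical, and a calendar month is assumed as the time period.

```python
from collections import Counter


def count_by(records, field):
    """KPI 1: total number of records grouped by one field."""
    return Counter(r[field] for r in records)


def linked_incidents_by_month(incidents):
    """KPI 6: Incidents linked to a Problem, grouped by opening month.

    Assumes `opened` is an ISO date string such as "2014-07-02".
    """
    return Counter(i["opened"][:7] for i in incidents if i.get("problem"))


def changes_by_month(problems):
    """KPI 7: Problems that triggered a Change, grouped by opening month.

    Grouping by the Problem's opening month is an assumption; the Change's
    own creation date may be the better key if it is available.
    """
    return Counter(p["opened"][:7] for p in problems if p.get("change_request"))
```

The same `count_by` helper works for priority, Assigned To, or Assignment Group by passing a different field name.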


FIELD SPECIFICATIONS

PROBLEM RECORD

| Field | Description | Mandatory to Open Problem | Mandatory to Close Problem |
|---|---|---|---|
| Problem Number | Automatically generated by ServiceNow. |  |  |
| Category | Same categorization scheme as Incidents. Copied from Incident Record where possible. |  |  |
| Configuration Item | Drop-down table of Service Names and Aliases. |  |  |
| Priority | Same priority drop-down as Incidents. Copied from Incident Record where possible. |  |  |
| Change Request | Automatically completed when a Change is created from the Problem. |  | Yes, when status is “Closed – Resolved” |
| Opened | Automatically populated with date/time Problem record was opened. |  |  |
| Opened by | Automatically populated with name of user that opened the Problem record. |  |  |
| Status | Drop-down: Open, Known Error, Pending Change, Closed-Resolved, Closed-Cancelled |  |  |
| Assignment Group | Pick from a lookup of all the support groups. Automatically populated when “Assigned to” is completed. |  |  |
| Assigned To | Pick from a lookup of all the support analysts within the support group. |  |  |
| Work Notes | List of support analysts that are emailed whenever a work note, status, category, or assignment changes. |  |  |
| Short Description | One-line description of the Problem. |  |  |
| Description | Multi-line text box description of the Problem. |  |  |
| Workaround | Multi-line text box for describing the workaround. Status of Problem Record changes to ‘Workaround’ when field is populated. |  |  |
| Known Error | Checkbox. Drop-down free-form text field opens. Status of Problem Record changes to ‘Known Error’. |  |  |
| Work Notes | Multi-line text box to track notes as support analyst works on Problem. |  |  |
| Root Cause Analysis | Multi-line text field. |  |  |
| Chronology of Events | Multi-line text field. |  |  |
| Contributing Factors | Multi-line text field. |  |  |
| Lessons Learned | Multi-line text field. |  |  |
| Recommendations/Next Steps | Multi-line text field. |  |  |
| Cause Codes | Drop down containing (same as root cause field on Major Incident tab) |  |  |
| Incidents tab / table | List of related Incidents. |  |  |
| Problem Task tab / table | List of tasks associated with the Problem. Possible Task Types: Investigation, Corrective Action. |  |  |
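The automatic status changes and closing rules stated in the field table and in policy 10 can be sketched as plain validation functions. This is an illustration only: the dictionary keys, function names, and ASCII status strings are hypothetical, and the precedence of ‘Known Error’ over ‘Workaround’ when both apply is an assumption, since the status diagram is not reproduced here.

```python
def derive_status(record):
    """Apply the automatic status updates described in the field table."""
    if record["status"].startswith("Closed"):
        return record["status"]  # closed records are left untouched
    if record.get("known_error"):
        return "Known Error"  # Known Error checkbox is ticked
    if record.get("workaround"):
        return "Workaround"  # Workaround field is populated
    return record["status"]


def close_errors(record, new_status, acting_user):
    """Return the reasons a requested closing status must be rejected."""
    errors = []
    if new_status == "Closed - Resolved" and not record.get("change_request"):
        # Change Request is mandatory to close as Closed - Resolved.
        errors.append("Change Request is mandatory for Closed - Resolved")
    if new_status == "Closed - Cancelled":
        # Policy 10: only the Assigned To individual may cancel,
        # and the reason must be documented in the Close Notes.
        if acting_user != record.get("assigned_to"):
            errors.append("Only the Assigned To individual can cancel")
        if not record.get("close_notes"):
            errors.append("Close Notes must give the reason for cancellation")
    return errors
```

An empty list from `close_errors` means the requested status change satisfies the rules modelled here; it does not check which transitions the status diagram allows.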


CHANGES TO INCIDENT RECORD

| Field | Description | Mandatory to Open Problem | Mandatory to Close Problem |
|---|---|---|---|
| REMOVE |  |  |  |
| Root-Cause Analysis | Future - Multi-line text; (NOTE: will be part of Problem Record) |  |  |
| Root-cause | Future - Dropdown containing (NOTE: will be part of Problem Record) |  |  |
| Knowledge Sharing opportunity | Future - Checkbox |  |  |
| First line knowledge text | Future - Multi-line text |  |  |
| ADD |  |  |  |
| Problem | Placed below Change Request in the NOTES tab of the Incident Record. Will provide the Problem Number of a related Problem |  |  |
| Create Problem | This is a menu option that will be added to the Incident Header drop-down |  |  |

CHANGE MANAGEMENT

When an RFC raised from a Problem is closed, the related Problem record must also be set to ‘closed-complete’.
