Incident Management User Guide v9 (it.emory.edu/media/itil-im-guide.pdf)




Incident Management User Guide


The goal of Incident Management is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring the best possible levels of service quality and availability are maintained.

General Guidelines

• Incidents must be categorized in Remedy with a Request Type of “Incident”.
• All other service or change requests must be categorized with any Request Type value other than “Incident”.
• When you resolve an incident, change the Remedy Status to “Resolved” as quickly as possible.
• All UTS managers are responsible for maintaining their Remedy queues.
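The categorization and status rules above can be sketched in a few lines of Python. This is only an illustration, not Remedy's actual API: the `Ticket` class, its fields, and the `"Service Request"` and `"Open"` values are hypothetical names chosen for the example; only the `"Incident"` Request Type and the `"Resolved"` Status come from the guide.

```python
from dataclasses import dataclass

INCIDENT = "Incident"    # Request Type named in the guide
RESOLVED = "Resolved"    # Status named in the guide


@dataclass
class Ticket:
    """Hypothetical stand-in for a Remedy ticket."""
    summary: str
    disrupts_service: bool      # True if a service is, or could be, disrupted
    request_type: str = ""
    status: str = "Open"        # assumed initial status, not from the guide


def categorize(ticket: Ticket, other_type: str = "Service Request") -> Ticket:
    """Service disruptions get Request Type "Incident"; everything else gets another value."""
    if other_type == INCIDENT:
        raise ValueError("non-incident requests must not use the Incident Request Type")
    ticket.request_type = INCIDENT if ticket.disrupts_service else other_type
    return ticket


def resolve(ticket: Ticket) -> Ticket:
    """Set the status to "Resolved" as soon as the work is done."""
    ticket.status = RESOLVED
    return ticket
```

For example, `categorize(Ticket("VPN down", disrupts_service=True))` yields a ticket whose `request_type` is `"Incident"`, while a request for a new mailbox would keep a non-incident type.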

Terminology & Roles

Incident Management covers any event that disrupts, or could disrupt, a service. This includes events reported directly by users, either through the Help Desk or through the web (help.emory.edu), and eventually events detected by monitoring systems such as Smarts. Technical staff (systems, applications, or network staff) also report incidents when they notice a problem with a hardware component, software application, or network component. Service requests do not represent a disruption to the agreed service; they are a way of meeting the customer’s needs and may address another service goal set in a specific Service Level Agreement (SLA).

Incident: An unplanned interruption to an IT service or a reduction in the quality of an IT service. Failure of a configuration item that has not yet impacted service is also an incident, for example the failure of one disk in a mirror set.

Incident Manager: The individual responsible for monitoring incidents and ensuring good service. The Incident Manager works to inform and include other teams in order to provide the promised level of service to customers, and provides quality control for Help Desk operations as part of this effort.

Incident Management: The process for managing the lifecycle of all incidents. Its primary objective is to return the service to users as quickly as possible.

Problem: A cause of one or more incidents. The root cause is not usually known at the time the Problem Record is created, and the Problem Management Process is responsible for further investigation.

Service Impact Report (SIR): A separate procedure with shorter timescales and greater urgency that is followed for major incidents. This process is managed by the Help Desk Crisis Manager, IT-Alert Process.

Incident Management Tips

1. Resolve incidents as quickly as possible, using a workaround if necessary.
2. Only mark an incident as Resolved if you are confident the affected users are operational.
3. Escalate to other teams quickly: route the ticket, and follow up if it is not accepted within the time set by the SLA.
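As a rough illustration of the follow-up rule in tip 3, the check below flags a routed ticket that the receiving team has not accepted within the SLA window. The function name, its parameters, and the 30-minute window in the usage note are assumptions made for the example; the guide does not state an acceptance time.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_follow_up(
    routed_at: datetime,
    accepted_at: Optional[datetime],
    now: datetime,
    sla_accept_window: timedelta,
) -> bool:
    """Return True when a routed ticket is still unaccepted past the SLA window."""
    if accepted_at is not None:
        # The receiving team has already accepted the ticket.
        return False
    return now - routed_at > sla_accept_window
```

With a hypothetical 30-minute window, a ticket routed at 09:00 and still unaccepted at 09:45 would be flagged for a follow-up, while the same ticket checked at 09:10 would not.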

Queue Management Tips

1. Assign an individual on your team to be responsible for managing the Remedy queue. This can be a permanently assigned role or one that rotates among team members.
2. Ensure all tickets are categorized properly: Request Type of “Incident” for all service disruptions.
3. When working on a ticket, assign the ticket to yourself. This is the acceptance step, and it lets your peers and customer(s) know that someone is assigned to and actively working on the incident or service request.
4. Reserve 5-10 minutes at the end of each day to quickly review any open incidents.
5. When assigning a ticket to another team’s queue, enter a Work Log entry indicating why you are forwarding the case and what troubleshooting steps have already been completed. Also, re-categorize the case appropriately for auto-routing. Be sure to assign to the group queue and not a specific individual in the group.
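The acceptance and forwarding steps in tips 3 and 5 can be sketched as follows. This is a minimal model under stated assumptions, not Remedy's data model: `QueueTicket`, `accept`, `forward`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueueTicket:
    """Hypothetical ticket record; field names are illustrative only."""
    category: str
    group_queue: str
    assignee: Optional[str] = None
    work_log: List[str] = field(default_factory=list)


def accept(ticket: QueueTicket, person: str) -> QueueTicket:
    """Acceptance step: assign the ticket to yourself so others see it is being worked."""
    ticket.assignee = person
    return ticket


def forward(ticket: QueueTicket, target_group: str, new_category: str,
            reason: str, steps_done: str) -> QueueTicket:
    """Forward to another team's group queue with a Work Log entry."""
    if not reason.strip() or not steps_done.strip():
        raise ValueError("a Work Log entry needs a reason and the troubleshooting done so far")
    ticket.work_log.append(
        f"Forwarded to {target_group}. Reason: {reason}. Troubleshooting so far: {steps_done}"
    )
    ticket.category = new_category    # re-categorize so auto-routing works
    ticket.group_queue = target_group
    ticket.assignee = None            # group queue, not a specific individual
    return ticket
```

Clearing the assignee on forward is one way to enforce the "group queue, not an individual" rule; the receiving team then performs its own acceptance step.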


Incident Management Process


Ticket Routing Process

Ticket Accept & Resolve Process