incident management “get your basics right” - … · why incident management? • knowing which...

Incident Management “Get Your Basics Right”

Introduction

•  Neil Thomas –  Industry experience in IT & IT support –  ITIL Vendor Product Management –  ITIL Consulting –  Specialised in Service Catalog & CMDB

•  Fully Accredited ITIL Training •  Fully Accredited SDI Training •  ITIL Consultancy •  eLearning •  Social Media Training & Consultancy •  Industry Webinars (ITSM & SM) •  Industry/Organizational Podcasts •  SDI Partner for Social Media Courses

Introduction

The Webinar Series

•  Service Catalog •  Developing a CMDB •  Incident Management •  Problem Management •  Change Management •  Measuring Service Desk Performance Metrics

Topics today

•  Incident Management & ITIL •  Service Desk •  Incident versus Service requests •  Other Incident Workflows •  Knowledge •  Service Level Agreements •  Incident & Problem •  Incident & Change

Service

•  If Something Goes Wrong (Incident Management)
•  If Something Keeps Going Wrong (Problem Management)
•  What Delivers it (Configuration Management)
•  Need to Improve or Resolve Problems (Change Management)
•  Managing it (Service Portfolio Management & Financial Management)
•  Ensuring it’s there in the Future (Availability Management & Capacity Management & Service Continuity Management)
•  Delivering Agreed Changes to Business (Release Management)
•  User Needs Something (Service Requests & Service Catalogue)
•  How Quickly do we Support (Service Level Management)

Incident Management

•  Restore normal services AS QUICKLY AS POSSIBLE while minimizing the impact

•  Incident definition: –  Any event that disrupts, or could disrupt, a service

Key Elements

•  Incidents – ANYTHING, including hardware and software errors
•  Reported by email, phone, self-service, Twitter etc.
•  Events detected within the IT infrastructure (Event Management in ITIL V3)
•  Normally recorded by the Service Desk to ensure compliance
•  Data vital to improve resolution of service

Key Elements

•  Incident detection & recording •  Classification & initial support •  Investigation & diagnosis •  Resolution & recovery •  Incident closure •  Ownership, monitoring, tracking, & communication

(monitoring the progress of the resolution of the incident and keeping those who are affected by the incident up to date with the status)

The Incident Management Process

Inputs: From Event Mgmt, From Web Interface, User Phone Call, Email, Technical Staff

Steps, in order:
•  Incident Identification
•  Incident Logging
•  Incident Categorization
•  Service Request? Yes: To Request Fulfilment. No: continue.
•  Incident Prioritization
•  Major Incident? Yes: Major Incident Procedure. No: continue.
•  Initial Diagnosis
•  Functional Escalation Needed? Yes: Functional Escalation 2/3 Level. No: continue.
•  Hierarchic Escalation Needed? Yes: Management Escalation. No: continue.
•  Investigation & Diagnosis
•  Resolution and Recovery
•  Incident Closure
•  End
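The decision points in this process can be read as a simple routing function. The following is a minimal Python sketch, not part of the original webinar: the `Incident` fields and the `route` function are hypothetical names, and treating the Major Incident Procedure as a hand-off that leaves the normal flow is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All fields are illustrative assumptions, not a real tool's schema.
    summary: str
    is_service_request: bool = False
    is_major: bool = False
    needs_functional_escalation: bool = False
    needs_hierarchic_escalation: bool = False


def route(incident: Incident) -> list[str]:
    """Return the ordered process steps an incident passes through."""
    steps = ["Incident Identification", "Incident Logging",
             "Incident Categorization"]
    if incident.is_service_request:
        # Service requests leave the incident process at this point.
        return steps + ["To Request Fulfilment"]
    steps.append("Incident Prioritization")
    if incident.is_major:
        # Assumption: major incidents are handed to a separate procedure.
        return steps + ["Major Incident Procedure"]
    steps.append("Initial Diagnosis")
    if incident.needs_functional_escalation:
        steps.append("Functional Escalation 2/3 Level")
    if incident.needs_hierarchic_escalation:
        steps.append("Management Escalation")
    steps += ["Investigation & Diagnosis", "Resolution and Recovery",
              "Incident Closure"]
    return steps
```

Calling `route(Incident("Printer offline"))` walks the full path to closure, while a record flagged as a service request stops at request fulfilment.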

EXAMPLE

Record

•  Normally recorded by a Service Desk
•  Record all incidents
•  Ensures compliance with SLAs
•  Records all relevant data
•  Facility for users to report incidents quickly & easily

Categorize

Effective categorization of incidents has two aspects:
•  Classification to determine incident type (for example IT Service = degraded)
•  The Configuration Item (CI) that is affected

Use standardized coding criteria.
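As an illustration of the two aspects plus standardized coding, a category can be modelled as a small validated record. This is a sketch only: the code list `INCIDENT_TYPES`, the `Category` class and the CI name in the usage note are hypothetical, and a real list would be agreed within the organization.

```python
from dataclasses import dataclass

# Hypothetical standardized codes; not taken from the webinar or from ITIL.
INCIDENT_TYPES = {
    "SERVICE_DEGRADED",
    "SERVICE_UNAVAILABLE",
    "HARDWARE_ERROR",
    "SOFTWARE_ERROR",
}


@dataclass(frozen=True)
class Category:
    incident_type: str  # aspect 1: classification of the incident
    ci: str             # aspect 2: the Configuration Item affected

    def __post_init__(self):
        # Reject free-text types so that reporting stays consistent.
        if self.incident_type not in INCIDENT_TYPES:
            raise ValueError(f"unknown incident type: {self.incident_type!r}")
        if not self.ci.strip():
            raise ValueError("a Configuration Item must be named")
```

For example, `Category("SERVICE_DEGRADED", "mail-server-01")` is accepted, while a free-text type such as `"slow"` raises `ValueError`.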

Prioritize (Severity)

•  Priority/Severity Level 4: No Business Impact. No loss of service or resources.
•  Priority/Severity Level 3: Minor Business Impact. Minor loss of service or resources.
•  Priority/Severity Level 2: Serious Business Impact. Severe loss of service or resources acceptable workaround.
•  Priority/Severity Level 1: Critical Business Impact. Complete loss of service or resources and work cannot reasonably continue - the work is considered “mission critical”.
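The four levels can be encoded so that a queue sorts with the most critical incident first. This is a minimal sketch: the `Priority` enum, the `LOSS_TO_PRIORITY` table and the one-word loss labels are hypothetical names, and they shorten the slide's wording for illustration.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Lower number means more urgent, matching the four levels listed.
    CRITICAL = 1   # complete loss of service or resources
    SERIOUS = 2    # severe loss of service or resources
    MINOR = 3      # minor loss of service or resources
    NO_IMPACT = 4  # no loss of service or resources


# Hypothetical shorthand for the "loss of service" wording on the slide.
LOSS_TO_PRIORITY = {
    "complete": Priority.CRITICAL,
    "severe": Priority.SERIOUS,
    "minor": Priority.MINOR,
    "none": Priority.NO_IMPACT,
}


def prioritize(loss_of_service: str) -> Priority:
    """Map a loss-of-service label to its priority level."""
    return LOSS_TO_PRIORITY[loss_of_service.lower()]
```

Because `Priority` is an `IntEnum`, `sorted()` on a list of priorities places `CRITICAL` first without any extra key function.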

Escalate

•  Rapidly escalate incidents according to agreed service level
•  Allocate more support resources if necessary
•  Escalation can follow two paths:
   –  Horizontal escalation is required when the incident needs to be escalated to different SME groups that are better able to perform the Incident Management function.
   –  Vertical escalation is where the incident needs to gain higher levels of priority.
•  Rules to ensure timely escalation
•  For every resolution attempt, accurate data must be attached to the incident detail to save repeating recovery procedures
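One way to express a rule for timely escalation is as a share of the agreed resolution target that has already elapsed. The sketch below is an assumption-laden example, not the webinar's rule: the function name, the 50% and 80% thresholds, and the choice to trigger horizontal before vertical escalation are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def escalation_due(opened: datetime, now: datetime, target: timedelta,
                   horizontal_at: float = 0.5,
                   vertical_at: float = 0.8) -> Optional[str]:
    """Return which escalation path is due for an unresolved incident, if any.

    The thresholds are illustrative fractions of the resolution target.
    """
    elapsed = (now - opened) / target  # timedelta / timedelta gives a float
    if elapsed >= vertical_at:
        return "vertical"    # raise the priority or involve management
    if elapsed >= horizontal_at:
        return "horizontal"  # pass to a better-placed SME group
    return None
```

With a four-hour target, an incident that is two hours old is due for horizontal escalation, and one that is three and a half hours old is due for vertical escalation.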

Resolve, Recover & Restore

•  Check for known errors and use any “workarounds”
•  Resolve the incident with solutions or workarounds
•  For some solutions, a Request for Change (RFC) will need to be submitted
•  Service Desk confirms with the user that the error has been rectified and that the incident can be closed

The goal of the Incident Management process is to restore service.

Key Functions

•  Take ownership of an incident
•  Provide a prompt recovery of the business within SLA
•  Keep the focus on the incident (no blindsiding)
•  Escalate incidents: functional (higher technical skill)
•  Escalate incidents: hierarchical (manager decision)
•  Keep the customer informed
•  Facilitate communication and act as an interface
•  Keep track of time & activity

Service Level Agreements

•  Negotiated and AGREED level of response WITH the organization
•  Different SLAs for different:
   –  Priorities
   –  Configuration Items (assets)
   –  Service
   –  User
•  Appropriate to the organization’s needs
•  Aim to RESTORE service as soon as possible given the IMPORTANCE of the service
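Having different SLAs per priority and per service amounts to a lookup with overrides. The following sketch assumes made-up targets: `DEFAULT_TARGETS`, `SERVICE_OVERRIDES`, the "payroll" service and every duration are hypothetical, and the calculation ignores business hours for simplicity.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical resolution targets per priority level (1 = most urgent).
DEFAULT_TARGETS = {
    1: timedelta(hours=1),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
    4: timedelta(days=3),
}

# Hypothetical tighter target for a service the organization rates as important.
SERVICE_OVERRIDES = {
    ("payroll", 2): timedelta(hours=2),
}


def resolution_deadline(opened: datetime, priority: int,
                        service: Optional[str] = None) -> datetime:
    """Return the time by which service should be restored."""
    # A service-specific SLA wins over the default for that priority.
    target = SERVICE_OVERRIDES.get((service, priority), DEFAULT_TARGETS[priority])
    return opened + target
```

A priority 2 incident opened at 09:00 would be due at 13:00 by default, but at 11:00 when it affects the example "payroll" service.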

Major Incident

•  A Major Incident is an unplanned or temporary interruption of service with severe negative consequences

Problem Management

•  A Problem is the cause (typically unknown) of one or more incidents. Activities include:
   –  Analyze and identify the root cause of one or more incidents
   –  Validate and publish the workaround for incidents whose cause is known (known error)
   –  Effect the systematic removal of the root cause via RFCs

Known Errors

•  Problem Management identifies the underlying causal factor
•  It might take many incidents to understand the root cause
•  When identified, the causal factor becomes a “known error”
•  If a workaround exists, it is published as the “workaround” for that known error
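Checking a new incident against the known errors could look like the sketch below. It is illustrative only: the `KnownError` record, the substring match on the symptom and the sample data in the usage note are assumptions, and a real knowledge base would use a richer search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    ci: str                           # Configuration Item the error relates to
    symptom: str                      # short text that identifies the error
    workaround: Optional[str] = None  # published workaround, if one exists


def find_workaround(known_errors: list[KnownError], ci: str,
                    description: str) -> Optional[str]:
    """Return the first published workaround matching the incident, else None."""
    text = description.lower()
    for ke in known_errors:
        # Skip known errors that have no workaround published yet.
        if ke.workaround and ke.ci == ci and ke.symptom.lower() in text:
            return ke.workaround
    return None
```

Given a known error for CI "mail-server-01" with symptom "mailbox full", an incident described as "User reports Mailbox Full warning" would return that error's workaround.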

Knowledge & Incidents

Use of Knowledge in Self Service
•  Self help (knowledge bases, FAQs etc.)
•  Script-based help
•  Record that self help has been used

Use of Incidents to Construct Knowledge
•  Incidents contain DESCRIPTION
•  Incidents contain RESOLUTION
•  INFORM Problem Management

Service Desk & Incidents

•  Incident logging
•  Customer satisfaction
•  Prioritization
•  First line support
•  Request fulfillment
•  Escalations
•  Communication
•  Operational metrics

Know when to stop!

•  Beware of over-analyzing
•  Appropriate Management Information
   –  Closed 4,000 calls
   –  Received 45,000 SNMP Traps

•  SIGNIFICANCE

•  Why Measure? •  What is IMPORTANT to the Organization •  Key Performance Indicators

–  Customer satisfaction –  Time to resolution –  Key Incident Resolution
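Time to resolution is the easiest of these indicators to compute from closed incident records. The sketch below is an assumption, not a metric definition from the webinar: both function names are hypothetical, and each input is taken to be the elapsed time between opening and resolving one incident.

```python
from datetime import timedelta
from statistics import mean


def mean_time_to_resolution(durations: list[timedelta]) -> timedelta:
    """Average open-to-resolved time; raises StatisticsError on an empty list."""
    return timedelta(seconds=mean(d.total_seconds() for d in durations))


def share_within_target(durations: list[timedelta], target: timedelta) -> float:
    """Fraction of incidents resolved within the agreed target."""
    if not durations:
        return 0.0
    return sum(d <= target for d in durations) / len(durations)
```

Reporting the share resolved within target alongside the mean helps show whether a good average hides a few very long-running incidents.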

Service Catalog & Requests

Is it a bird or…?

•  When is an Incident a Service? •  Alternate “Incidents”

•  New Hire •  Leaver •  Equipment Request •  Software Provision •  Virus Scan

•  No “right” answer

Is it a bird or…?

•  Define the Process
•  Manage by Priority
•  Set realistic SLAs

OR

•  Make Service Requests

New Hire Process

HR Tasks
•  Recruitment request signed, attached and filed
•  Recruitment offer signed, attached and filed
•  Offer letter and T&Cs sent to candidate
•  Signed letter back from candidate
•  Starter letter sent out to candidate
•  Created new employee in external systems
•  Personal details completed
•  Informed payroll / reception
•  Collected acknowledgement forms:
   –  Employee handbook
   –  H&S policy
   –  IT Policy
•  Induction arranged
•  References – requested & received
•  Healthcare cover arranged
•  Pension arranged
•  Parking permit issued
•  Business cards arranged
•  End of probation letter sent

IT Tasks
•  PC/Laptop
•  Network ID
•  Email
•  Telephony – Internal, Cell
•  Security card
•  Application access

FM Tasks
•  Seating allocation

Incident & Change

•  Accurate analysis
•  Identification of Configuration Items
•  Good Problem analysis that touches ALL Incidents
•  Link to Known Errors and Workarounds

Configuration Management

•  Defines WHAT delivers a SERVICE •  Defines the RELATIONSHIPS & dependencies •  Know WHAT is important and HOW it connects •  Change Process uses IMPACT ANALYSIS

Why Incident Management?

•  Knowing which Service is most important allows Incidents to be prioritized
•  Defines who a user/customer contacts, what the expected fix time is, etc.
•  If not, then we fight the same fires over and over again
•  Building a better and more repeatable process around this firefighting will drive efficiency, effectiveness and overall greater quality
•  Builds on the body of knowledge of a call
•  DOCUMENTS what has happened, who did what and when
•  Stops duplication of work
•  Avoids “difficult tasks” being ignored (bounce count)
•  Communication occurs (or should occur)
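The slide does not define "bounce count". One reading, assumed here, is the number of times an incident is reassigned from one support group to another. The helper below is a hypothetical sketch of that reading, and the group names in the usage note are invented.

```python
def bounce_count(assignments: list[str]) -> int:
    """Count how often an incident changed hands between support groups.

    `assignments` is the ordered history of groups that owned the incident.
    """
    return sum(1 for prev, cur in zip(assignments, assignments[1:]) if prev != cur)
```

An assignment history of Service Desk, Network, Service Desk, Network gives a bounce count of 3, which could flag the incident for management attention.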
