Incident Management “Get Your Basics Right”
Introduction
• Neil Thomas – Industry experience in IT & IT support – ITIL Vendor Product Management – ITIL Consulting – Specialised in Service Catalog & CMDB
• Fully Accredited ITIL Training • Fully Accredited SDI Training • ITIL Consultancy • eLearning • Social Media Training & Consultancy • Industry Webinars (ITSM & SM) • Industry/Organizational Podcasts • SDI Partner for Social Media Courses
Introduction
The Webinar Series
• Service Catalog • Developing a CMDB • Incident Management • Problem Management • Change Management • Measuring Service Desk Performance Metrics
Topics today
• Incident Management & ITIL • Service Desk • Incident versus Service requests • Other Incident Workflows • Knowledge • Service Level Agreements • Incident & Problem • Incident & Change
Service (centre of diagram), surrounded by:
• If Something Goes Wrong (Incident Management)
• If Something Keeps Going Wrong (Problem Management)
• What Delivers it (Configuration Management)
• Need to Improve or Resolve Problems (Change Management)
• Managing it (Service Portfolio Management & Financial Management)
• Ensuring it’s there in the Future (Availability Management & Capacity Management & Service Continuity Management)
• Delivering Agreed Changes to Business (Release Management)
• User Needs Something (Service Requests & Service Catalogue)
• How Quickly do we Support (Service Level Management)
Incident Management
• Restore normal services AS QUICKLY AS POSSIBLE while minimizing the impact
• Incident definition: – Any event that disrupts, or could disrupt, a service
Key Elements
• Incidents – ANYTHING, e.g. hardware and software errors • Reported by email, phone, self-service, Twitter etc. • Events detected within the IT infrastructure (Event Mgt V3) • Normally recorded by the Service Desk to ensure compliance • Incident data is vital to improving resolution of service
Key Elements
• Incident detection & recording • Classification & initial support • Investigation & diagnosis • Resolution & recovery • Incident closure • Ownership, monitoring, tracking, & communication
(monitoring the progress of the resolution of the incident and keeping those who are affected by the incident up to date with the status)
The Incident Management Process (EXAMPLE flowchart; arrows not preserved in the transcript)
Incident sources: From Event Mgmt • From Web Interface • User Phone Call • Email • Technical Staff
• Incident Identification
• Incident Logging
• Incident Categorization
• Service Request? Yes: To Request Fulfilment
• Incident Prioritization
• Major Incident? Yes: Major Incident Procedure
• Initial Diagnosis
• Functional Escalation Needed? Yes: Functional Escalation 2/3 Level
• Investigation & Diagnosis
• Resolution and Recovery
• Hierarchic Escalation Needed? Yes: Management Escalation
• Incident Closure
• End
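The flowchart steps can be sketched as a small routing function. This is only an illustration: the `Incident` fields and the `process` function are hypothetical names, and because the arrows did not survive in the transcript, placing both escalation checks before Investigation & Diagnosis and continuing the normal flow after the Major Incident Procedure are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All field names are hypothetical; the slides do not define a data model.
    summary: str
    is_service_request: bool = False
    is_major: bool = False
    needs_functional_escalation: bool = False
    needs_hierarchic_escalation: bool = False


def process(incident: Incident) -> list[str]:
    """Return the ordered list of flowchart steps an incident passes through."""
    steps = ["Incident Identification", "Incident Logging", "Incident Categorization"]
    if incident.is_service_request:
        # Service requests leave the incident flow at this point.
        return steps + ["To Request Fulfilment"]
    steps.append("Incident Prioritization")
    if incident.is_major:
        steps.append("Major Incident Procedure")
    steps.append("Initial Diagnosis")
    if incident.needs_functional_escalation:
        steps.append("Functional Escalation 2/3 Level")
    if incident.needs_hierarchic_escalation:
        steps.append("Management Escalation")
    steps += ["Investigation & Diagnosis", "Resolution and Recovery", "Incident Closure"]
    return steps
```

For example, `process(Incident("new laptop", is_service_request=True))` stops at "To Request Fulfilment", while an ordinary incident runs through to "Incident Closure".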
Record
• Normally recorded by a Service Desk • Record all incidents • Ensures compliance with SLAs • Records all relevant data • Facility for users to report incidents quickly & easily
Categorize
Effective categorization of incidents has two aspects: • Classification to determine incident type (for example IT Service = degraded) • The Configuration Item (CI) that is affected
• Use standardized coding criteria.
Prioritize (Severity)
• Priority/Severity Level 4: No Business Impact. No loss of service or resources
• Priority/Severity Level 3: Minor Business Impact. Minor loss of service or resources
• Priority/Severity Level 2: Serious Business Impact. Severe loss of service or resources acceptable workaround
• Priority/Severity Level 1: Critical Business Impact. Complete loss of service or resources and work cannot reasonably continue - the work is considered “mission critical”
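The four levels map naturally onto an ordered enumeration. The sketch below is an assumption about how a tool might encode them: the member names and the `classify` and `work_order` helpers are hypothetical, and only the level numbers and loss-of-service wording come from the slide.

```python
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 1  # complete loss of service, work cannot reasonably continue
    SERIOUS = 2   # severe loss of service or resources
    MINOR = 3     # minor loss of service or resources
    NONE = 4      # no loss of service or resources


def classify(loss: str) -> Severity:
    """Map a hypothetical 'loss of service' label to a severity level."""
    table = {
        "complete": Severity.CRITICAL,
        "severe": Severity.SERIOUS,
        "minor": Severity.MINOR,
        "none": Severity.NONE,
    }
    return table[loss]


def work_order(incidents: list[tuple[str, Severity]]) -> list[str]:
    """Sort incidents so the lowest level number (most critical) comes first."""
    return [name for name, sev in sorted(incidents, key=lambda item: item[1])]
```

Using an `IntEnum` keeps the slide's numbering (1 is most critical) while letting incidents be sorted directly by level.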
Escalate
• Rapidly escalate incidents according to agreed service level • Allocate more support resources if necessary • Escalation can follow two paths:
– Horizontal escalation is required when the incident needs to be escalated to different SME groups that are better able to perform the Incident Management function.
– Vertical escalation is where the incident needs to gain higher levels of priority.
• Rules to ensure timely escalation • For every resolution attempt, accurate data must be attached to the incident detail to save repeating recovery procedures
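One way to express "rules to ensure timely escalation" is a time threshold per priority. The thresholds, the `ESCALATE_AFTER` table and the `escalation_actions` function below are all hypothetical; real values would come from the agreed service levels.

```python
from datetime import timedelta

# Hypothetical thresholds: how long an incident may stay unresolved before it
# is escalated vertically. Keys are priority levels (1 = most critical).
ESCALATE_AFTER = {
    1: timedelta(minutes=15),
    2: timedelta(hours=1),
    3: timedelta(hours=4),
    4: timedelta(hours=8),
}


def escalation_actions(priority: int, elapsed: timedelta, needs_other_team: bool) -> list[str]:
    """Return which escalation paths apply to an open incident."""
    actions = []
    if needs_other_team:
        # Horizontal: hand over to a better-suited SME group.
        actions.append("horizontal")
    if elapsed > ESCALATE_AFTER[priority]:
        # Vertical: the incident needs a higher level of priority.
        actions.append("vertical")
    return actions
```

With these assumed thresholds, a priority 1 incident open for 20 minutes yields `["vertical"]`, and a priority 3 incident open for one hour that needs another team yields `["horizontal"]`.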
Resolve, Recover & Restore
• Check for known errors and use any “workarounds” • Resolving the Incident with solutions or workarounds • For some solutions, a Request for Change (RFC) will need to be submitted • Service Desk confirms with the user the error has been rectified and that the incident can be closed
Goal of the Incident Management process is to restore service.
Key Functions
• Take ownership of an incident • Provide a prompt recovery of the business within SLA • Keep the focus on the incident (no blindsiding) • Escalating incidents: functional (higher technical skill) • Escalating incidents: hierarchical (manager decision) • Keep the customer informed • Facilitate communication and act as an interface • Keep track of time & activity
Service Level Agreements
• Negotiated and AGREED level of response WITH organization
• Different SLAs for different: – Priorities – Configuration Items (assets) – Service – User
• Appropriate to the organization’s needs • Aim to RESTORE service asap given the IMPORTANCE of the service
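Since SLAs can differ by priority and by service, a lookup that prefers the most specific agreement is one possible representation. Everything here is an assumption for illustration: the `SLA_TARGETS` table, the "payroll" service, the target durations and both function names.

```python
from datetime import datetime, timedelta

# Hypothetical restore-time targets, keyed by (service, priority).
# A service of None is the default used when no service-specific SLA exists.
SLA_TARGETS = {
    (None, 1): timedelta(hours=1),
    (None, 2): timedelta(hours=4),
    (None, 3): timedelta(hours=8),
    (None, 4): timedelta(days=3),
    ("payroll", 1): timedelta(minutes=30),
}


def restore_target(service: str, priority: int) -> timedelta:
    """Prefer a service-specific target, otherwise fall back to the default."""
    return SLA_TARGETS.get((service, priority), SLA_TARGETS[(None, priority)])


def is_breached(opened: datetime, now: datetime, service: str, priority: int) -> bool:
    """True once the incident has been open longer than its restore target."""
    return now - opened > restore_target(service, priority)
```

The same pattern extends to Configuration Items or users by adding them to the lookup key.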
Major Incident
• A Major Incident is an unplanned or temporary interruption of service with severe negative consequences
Problem Management
• A Problem is the cause (typically unknown) of one or more incidents. Activities include: – Analyze and identify the root cause of one or more incidents – Validate and publish the workaround for incidents whose cause is known (known error) – Effect the systematic removal of the root cause via RFCs
Known Errors
• Problem Management identifies the underlying causal factor • It might take many incidents to understand the root cause • When identified, the causal factor becomes a “known error” • If a work-around exists, it is recorded as the “workaround” for that known error
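Checking an incident against known errors can be sketched as a simple lookup by Configuration Item and symptom. The `KnownError` record, the sample entries and `find_workaround` are hypothetical; the slides do not describe how a known error database is actually stored or searched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    ci: str                     # affected Configuration Item
    symptom: str                # keyword seen in incident descriptions
    workaround: Optional[str]   # None if no workaround has been published


# Hypothetical entries for illustration only.
KNOWN_ERRORS = [
    KnownError("mail-server", "mailbox full", "archive old mail"),
    KnownError("vpn-gateway", "timeout", None),
]


def find_workaround(ci: str, description: str) -> Optional[str]:
    """Return the published workaround for a matching known error, if any."""
    for err in KNOWN_ERRORS:
        if err.ci == ci and err.symptom in description.lower():
            return err.workaround
    return None
```

A match with no published workaround returns `None`, the same as no match at all, so the incident would continue through normal investigation in both cases.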
Knowledge & Incidents
Use of Knowledge in Self Service • Self help (knowledgebases, FAQs etc) • Script based help • Record that self help has been used
Use of Incidents to Construct Knowledge • Incidents contain DESCRIPTION • Incidents contain RESOLUTION • INFORM Problem
Service Desk & Incidents
• Incident logging • Customer satisfaction • Prioritization • First line support • Request fulfillment • Escalations • Communication • Operational metrics
Know when to stop!
• Beware over analyzing • Appropriate Management Information – Closed 4,000 calls – Received 45,000 SNMP Traps
• SIGNIFICANCE
• Why Measure? • What is IMPORTANT to the Organization • Key Performance Indicators – Customer satisfaction – Time to resolution – Key Incident Resolution
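As an illustration of the "time to resolution" indicator, the sketch below computes an average and the share of incidents resolved within a target. The sample timestamps, the two-hour target and the function names are invented for the example and are not figures from the webinar.

```python
from datetime import datetime
from statistics import mean

# Hypothetical closed incidents: (opened, resolved) timestamps.
closed = [
    (datetime(2008, 3, 1, 9, 0), datetime(2008, 3, 1, 10, 0)),
    (datetime(2008, 3, 1, 9, 30), datetime(2008, 3, 1, 12, 30)),
    (datetime(2008, 3, 2, 14, 0), datetime(2008, 3, 2, 16, 0)),
]


def mean_hours_to_resolution(incidents):
    """Average elapsed time between opening and resolving, in hours."""
    return mean((resolved - opened).total_seconds() / 3600 for opened, resolved in incidents)


def share_within_target(incidents, target_hours):
    """Fraction of incidents resolved within the target time."""
    within = sum(1 for opened, resolved in incidents
                 if (resolved - opened).total_seconds() / 3600 <= target_hours)
    return within / len(incidents)
```

A raw count such as "closed 4,000 calls" says little on its own; a ratio against a target is closer to the kind of significance the slide asks for.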
Service Catalog & Requests
Is it a bird or…?
• When is an Incident a Service? • Alternate “Incidents”
• New Hire • Leaver • Equipment Request • Software Provision • Virus Scan
• No “right” answer
Is it a bird or…?
• Define the Process • Manage by Priority • Set realistic SLAs
OR
• Make Service Requests
New Hire Process
HR Tasks • Recruitment request signed, attached and filed • Recruitment offer signed, attached and filed • Offer letter and T&C’s sent to candidate • Signed letter back from candidate • Starter letter sent out to candidate • Created new employee in external systems • Personal details completed • Informed payroll / reception • Collected acknowledgement forms: – Employee handbook – H&S policy – IT Policy • Induction arranged • References – requested & received • Healthcare cover arranged • Pension arranged • Parking permit issued • Business cards arranged • End of probation letter sent
IT Tasks • PC/Laptop • Network ID • Email • Telephony – Internal, Cell • Security card • Application access
FM Tasks • Seating allocation
Incident & Change
• Accurate analysis • Identification of Configuration Items • Good Problem analysis that touches ALL Incidents • Link to Known Errors and Work Arounds
Configuration Management
• Defines WHAT delivers a SERVICE • Defines the RELATIONSHIPS & dependencies • Know WHAT is important and HOW it connects • Change Process uses IMPACT ANALYSIS
Why Incident Management?
• Knowing which Service is most important allows Incidents to be prioritized • Defines who a user/customer contacts, what the expected fix time is etc. • If not then we fight the same fires over and over again • Building a better and more repeatable process around this firefighting will drive efficiency and effectiveness and overall greater quality • Builds on the body of knowledge of a call • DOCUMENTS what has happened, who did what and when • Stops duplication of work • Avoids “difficult tasks” being ignored (bounce count) • Communication occurs (or should occur)
Q & A Time…….
Confidential, All Rights Reserved, ServiceSphere™ 2008 http://www.servicesphere.com