Implementing Service Level Management Ian Best
White Paper
Abstract
Implementing Service Level Management can initially be seen as an impossible project with no clear starting point. It is clear that there is much more involved than simply integrating a technology solution with existing OSS systems. There are a number of other aspects that must be included as part of such a project if the implementation is to deliver real benefits to the business. This paper identifies many of the aspects of Service Level Management that need to be considered and, with reference to several industry initiatives, provides a number of pointers to assist with a successful implementation.
Table of Contents

1 Introduction
  1.1 Glossary
2 Implementing Service Level Management
  2.1 The SLA Universe
  2.2 End-to-End QoS
  2.3 Industry Initiatives
  2.4 Technology
  2.5 Conclusion
Appendix 1 - Industry References
Appendix 2 - Contribution to TMF on Measurement Hierarchy
  Introduction (Purpose, Scope)
  Section 2 (Key Quality Indicators, KQI Parameters)
2002, Comnitel Technologies
1 Introduction

Implementing Service Level Management covers a wide range of aspects, from the business processes through to identifying the measurables and implementing the solution. This paper outlines many of these aspects and also provides pointers to activities within the industry that will assist the implementer.
1.1 Glossary
eTOM eBusiness Telecommunications Operations Map
KPI Key Performance Indicator
KQI Key Quality Indicator
SLA Service Level Agreement
TMF TeleManagement Forum
2 Implementing Service Level Management
2.1 The SLA Universe
Figure 1 - The SLM Universe
(The figure shows the three SLA domains: Product SLAs towards the Customer, internal SLAs within the Network, and Supplier SLAs towards the Business Supplier / Partner, with the mSure and ServiceAssure products positioned across them.)
There are three distinct areas associated with the application of Service Level Agreements (SLAs) within the operator's business.

The most obvious, and in many cases the area currently under focus, is that of internal SLAs. As shown in Figure 4 - End-to-End QoS, internal SLAs are most often focused on managing components of the service delivery chain and are aggregated to form the end-to-end service measurable. By their nature these SLAs are often not truly customer focused, in as much as they are not couched in terms that are understandable by the end customer. The key users of these SLAs are the functions responsible for managing the service components against agreed quality objectives. Other parts of the operator's business rely on internal SLAs to drive improved efficiencies and to understand service delivery performance within the functions of the business. It should also be noted that service level management is not restricted to network-based service components but must also be applied to non-network services, e.g. billing.

As operators become more and more reliant on 3rd parties for the delivery of services and content, the need to implement SLAs against those suppliers / partners will grow in importance. Indeed, the implementation of SLAs in this area will almost certainly improve the quality of content delivery. It will also enable the operator to pass some of the financial risk of service degradation onto their 3rd parties. These supplier / partner SLAs are very similar in nature to the internal SLAs discussed above and will form part of the end-to-end SLA, but may also have an impact on the accounting processes for paying for content services.

The final area that needs to be considered is the end customer, or external, SLA. In considering this area, the nature of the products sold to the customer needs to be taken into account.
While internally the focus may be on individual services, the offering to the customer tends to be a product that consists of a number of services (both network and non-network).
Figure 2 - Product Components
(The figure lists typical product components, split between network components - Voice, SMS, Voicemail, WAP - and non-network components - Handset, Customer Care, Itemised Billing, Per Second Billing, Handset Replacement, Insurance.)
Figure 2 - Product Components above shows a typical product offering for a European mobile operator. As can be seen, some of these components are quite obviously network derived; however, a number of the product components have little or no dependence on network resources. The SLA with the customer is nevertheless likely to encompass all aspects of the product, including the non-network aspects. The external SLA is therefore likely to consist of a wide-ranging set of parameters worded in terms that are clearly understood by the customer. Deriving this set of SLA criteria requires a clear understanding of what is important to the end user and is therefore best achieved by discussion and negotiation with the users.

Figure 3 - (Simplified) eTOM Process Flow, below, shows the processes (as defined in the TMForum eTOM) involved in defining the various types of SLAs discussed above. It also attempts to identify where those processes fit in terms of the two domains: Customer / Marketing and Operations.
Figure 3 - (Simplified) eTOM Process Flow
(The figure shows eTOM processes - Product Offer, Product Development & Retirement, Service Development & Retirement, Resource Development, Manage Internal SLA, Service Quality Analysis & Reporting, Resource Data Collection, Customer QoS/SLA Management, Order Handling, Supply Chain Capability Management, Supply Chain Development & Change Management, S/P Performance Management and S/P SLA Management - together with the artefacts flowing between them: SLA requirements, SLA templates, target KQIs and KPIs, KQI / KPI mappings, actual SLAs, target and actual data, S/P SLAs and performance data, and KQIs & violations. The processes are split between the Customer / Marketing domain and the Operations domain.)
2.2 End-to-End QoS

Every day thousands of operators travel by car to work. Unsurprisingly, these drivers spend more than 95% of the time looking out of the front windscreen. Of the remaining 5%, some time will be spent checking the rear view mirror, regulating the temperature, adjusting the volume, listening to traffic reports and so on; less than 1% will be spent checking the vehicle instrumentation for problems. The drivers' focus is on getting to their destination safely, comfortably and on time. Yet when they get to work, they will spend 95% of the time checking for alarms on the network's 'engine management system'. They will spend considerably more time drinking coffee than 'looking out of the window' to see what their customers are seeing!

An analysis carried out as part of the TMForum SQM Catalyst project showed that, of the 4,000 or so possible alarms that the targeted network components were capable of generating, less than 0.1% gave any indication of the impact on the customer. So why do we manage our networks primarily on network events? Even PM data tells us more about the performance of an individual network component than it does about the performance of the end-to-end service delivery. No one would realistically suggest that an operator should ignore the alarms being generated by the network components, any more than a driver should ignore the oil pressure warning on a car engine. But surely the emphasis must be to look out of the window!

The findings of the Catalyst project highlighted the need to find more effective ways of managing service quality, and showed that this could be achieved by collecting data from multiple sources and aggregating the information to provide an overall quality 'measurement'. The project went on to prove the concept through the aggregation of alarms, performance stats, call detail records and application event logs.
Figure 4 - End-to-End QoS
(The figure decomposes an end-to-end SLA into domain SLAs: a Mobile Access Network SLA covering the GPRS/UMTS mobile access network domain, an IP Transport SLA covering the IP transport domain, and an IT Network SLA covering the ISP IT network domain. Service access points mark the boundaries - the BTS for the mobile network, the GGSN for IP transport, an ISP IP router for the IT network and an ISP e-mail server for the application - with supporting technologies such as Diff-Serv, MPLS-TE, DNS, RADIUS, DHCP, GTP, IP and firewalls shown in each domain.)
Figure 4 - End-to-End QoS above shows an example of how the end-to-end QoS for a service may be measured. It shows how a service may be broken down into service components, each of which may have an internal SLA associated with it. It is the combination of these internal SLAs (or performance indicators) that provides the total end-to-end measurement, the indicator of the customer experience.

The 'granular approach' indicated in the diagram above is an important concept that is not always appreciated (or offered in all SLM systems). While it may be possible to measure service quality through the use of passive and intrusive probes that provide a full end-to-end measurement, this approach provides little information that will enable the operator to correct degradation in service. By breaking the service down into components, and making that information available to the user, it is possible to provide root cause information that will enable faster corrective action.
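As a minimal sketch of this granular approach (the component names and availability figures below are invented for illustration, not taken from any operator), the end-to-end measurement for a serial delivery chain can be derived from per-component measurements, and the same per-component data supports root-cause drill-down:

```python
# Illustrative sketch of the 'granular approach': an end-to-end service
# measurement derived from per-component internal SLA measurements.
# Component names and availability figures are hypothetical.

def end_to_end_availability(components: dict[str, float]) -> float:
    """For components in a serial delivery chain, the end-to-end
    availability is the product of the component availabilities."""
    result = 1.0
    for availability in components.values():
        result *= availability
    return result

def worst_component(components: dict[str, float]) -> str:
    """The granular data also supports root-cause drill-down:
    find the component dragging the end-to-end figure down the most."""
    return min(components, key=components.get)

service = {
    "mobile_access_network": 0.999,   # internal SLA measurement
    "ip_transport": 0.9995,
    "isp_it_network": 0.995,          # e.g. the e-mail server domain
}

e2e = end_to_end_availability(service)
print(f"end-to-end availability: {e2e:.4f}")
print(f"drill-down suggests: {worst_component(service)}")
```

A pure end-to-end probe would report only the first figure; the component breakdown is what makes the second line, and hence faster corrective action, possible.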
2.3 Industry Initiatives

There are a number of activities within the industry that are focussed on Service Level Management; some of the key references are listed in Appendix 1 - Industry References. One of the active forums is the TeleManagement Forum (www.tmforum.org), where a number of projects are focussed on managing service quality.

The TMF SLA Handbook (Ref: 4) provides detail of the structure and application of SLAs. The TMF Wireless Services Measurements Team (WSMT) is currently defining a set of measurables required for measuring service quality from UMTS RAN resources. These new sets of measurements have been submitted to 3GPP for inclusion in the Release 5 specifications. The WSMT is also producing a handbook (Ref: 2) that defines the methodology for defining the SLA parameters. The work of this team has included mapping the processes defined in the TMF eTOM (Ref: 5) onto the process flow for defining SLAs.
2.4 Technology

There are a number of offerings from vendors that claim to provide the technology for Service Level Management. The completeness of some of these solutions is, however, open to debate.
Figure 5 - SLM Architecture
(The figure shows a Service Level Management layer drawing on multiple data sources: active service testing, passive service testing / probes, workflow monitoring, QoS policy servers, inventory, physical and logical network models, network-level fault and performance management, billing mediation, IP/ATM and SS7 link monitors, and systems management, across network elements such as BTS, BSC, MSC/VLR, HLR, SGSN, GGSN, firewalls and servers.)
Some products are based primarily on the collection of fault data and are built on top of existing fault management solutions. However, as discussed above, the ability to measure service quality on FM data alone is very restricted. This type of approach also tends towards reactive rather than proactive management.

Figure 5 - SLM Architecture above shows that a number of different data sources are required to support effective service level management. As discussed above, these data sources are not just network focussed; data also needs to be extracted from other business systems. A solution based on aggregating data from multiple sources raises questions regarding how the complexity of such a solution can be implemented and managed. There are two keys to achieving this kind of implementation:
1. Hierarchical definition of services enabling the inheritance of generic key quality indicators.
(Diagram: hierarchical definition of SLA parameters. Generic, commercial-service KPIs - time to provision, provisioning success, customer fault reports, MTTR - are inherited by GPRS service measurements - BSS/xGSN availability, Attach, PDP context - which are in turn extended for specific services such as GPRS interactive, adding round trip delay and data loss, and GPRS corporate access, adding application hits, application access time, server availability and average response time.)
The diagram above uses the definition of a GPRS service to show the 'inheritance' approach for defining SLA parameters. From the diagram, we can see that a number of measurables may be defined that are common to all services. In addition, other parameters may be identified that are applicable to all GPRS services. The approach then extends to defining additional parameters required for managing an SLA against a specific GPRS service. This hierarchical approach reduces the time and effort needed to implement SLAs for new services that are based on existing services.
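One possible way to picture this inheritance approach in code (the class representation and parameter names here are assumptions based on the diagram, not a prescribed implementation):

```python
# Hypothetical sketch of the hierarchical 'inheritance' approach to
# defining SLA parameters. Service names and parameters mirror the
# diagram above; the set-merging representation is an assumption.

GENERIC = {"time_to_provision", "provisioning_success",
           "customer_fault_reports", "mttr"}

class ServiceDefinition:
    def __init__(self, name, parent=None, extra_params=()):
        self.name = name
        self.parent = parent
        self.extra_params = set(extra_params)

    def sla_parameters(self):
        """Parameters = everything inherited from the parent chain,
        plus those specific to this service."""
        inherited = self.parent.sla_parameters() if self.parent else GENERIC
        return inherited | self.extra_params

commercial = ServiceDefinition("commercial")
gprs = ServiceDefinition("gprs", parent=commercial,
                         extra_params={"bss_xgsn_availability",
                                       "attach", "pdp_context"})
gprs_interactive = ServiceDefinition("gprs_interactive", parent=gprs,
                                     extra_params={"round_trip_delay",
                                                   "data_loss"})

# A new GPRS-based service only needs its specific parameters defined;
# the generic and GPRS-level measurables are inherited.
print(sorted(gprs_interactive.sla_parameters()))
```

The quick-win claim in the text follows directly: defining "GPRS corporate access" would require only its four specific parameters, since the other seven are inherited.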
2. Hierarchical measurements. In a similar way, the solution must be capable of supporting the use of hierarchical measurements. A copy of a contribution made to the WSMT by Comnitel Technologies is included in Appendix 2 - Contribution to TMF on Measurement Hierarchy.
Few solutions support this granular, hierarchical approach to service level management; one that does is ServiceAssure from Comnitel Technologies. The approach described above also enables the operator to implement service management in a structured way and to deliver some real benefits to the business quickly. A fully top-down approach to defining SLAs may be time consuming and in early implementations can appear quite daunting, whereas the definition of generic services tends to be quicker, easier to understand, easier to implement, and provides some quick wins.
2.5 Conclusion

Implementing effective Service Level Management requires more than a simple deployment of an SLA management system. There are a number of other aspects to the project that need to be considered:
• Impact on existing business processes
• Understanding of the mapping of services to service resources
• Definition of the Key Quality Indicators (KQIs)
• Integration with existing OSS for the collection of key performance indicators
o From multiple sources
o Of differing data types
o In 'real time'
While at first the task may seem daunting to the point of being too complex, a hierarchical approach to defining services and measurables provides a much faster time to implementation while delivering tangible benefits to the business.
Appendix 1 - Industry References (This list has been extracted from the TMForum Wireless Service Measurements Handbook - GB 923)
Ref: 1 TMForum Service Quality Management for IMT-2000 Business Agreement, TMF 506, version 1.5,
Ref: 2 TMForum Wireless Services Team Handbook GB923
Ref: 3 TMF 701. Performance Reporting Concepts & Definitions. Evaluation Version Issue 1.1 May 1999.
Ref: 4 TMF SLA and QoS Handbook, Version 1.0, November 2000
Ref: 5 TMForum eTOM, The Business Process Framework, GB921 version 2.5
Ref: 6 3G TS 23.060 V3.4.0 (2000-07), 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; General Packet Radio Service (GPRS); Service description; Stage 2; (Release 1999)
Ref: 7 Performance Reporting Concepts and Definitions, TMF 701
Ref: 8 3GPP TS 23.107 V3.4.0 (2000-10), 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; QoS Concept and Architecture (Release 1999)
Ref: 9 3GPP TS 32.015 V3.4.0 (2000-12), 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Telecommunication Management; Charging and billing; 3G call and event data for the Packet Switched (PS) domain (Release 1999)
Ref: 10 ETSI TS 122 105 V3.9.0 (2000-06), Universal Mobile Telecommunications System (UMTS); Service aspects; Services and Service Capabilities (3G TS 22.105 version 3.9.0 Release 1999)
Ref: 11 ETSI TS 100 615 V8.0.0 (2001-02). GSM 12.04 version 8.0.0 Release 1999
Appendix 2 - Contribution to TMF on Measurement Hierarchy
Comnitel Technologies
Contribution to Wireless Service Measurements Team
Key Indicators for Service Measurements
Author Ian Best
Abstract
Proposed addition to the WSMT Handbook to explain the measurement hierarchy
Document Id: TBD
Version: 0.6
Supersedes:
Date: 30th May, 2002
Comnitel Technologies Ltd.,
5300, Cork Airport Business Park,
Kinsale Road, Cork.
REVISION HISTORY
0.1 Original 7th December 2001 Ian Best
0.2 First draft for review
0.3 Second draft for review
0.4 Third draft
0.5 KQI architecture updated to reflect the requirement for Product and Service KQIs.
0.6 Minor textual changes

REFERENCES

Ref [1.] TMF SLA Handbook GB917
Ref [2.] WSMT Handbook GB923
Ref [3.] TMF e-TOM GB921

GLOSSARY OF TERMS
eTOM eBusiness Telecommunications Operations Map
KPI Key Performance Indicator
KQI Key Quality Indicator
SLA Service Level Agreement
TMF TeleManagement Forum
Introduction
Purpose

The Wireless Services Measurements Team is defining a new set of measurables that are specifically aimed at measuring the service performance delivered by the service provider. This contribution is intended to be incorporated into the WSMT Handbook to explain how these measurables will be used to provide the aggregated service quality indicator.

Scope

The contribution explains the methodology for calculating service quality in generic terms and does not identify the actual measurements; the latter is the purpose of the WSMT Handbook (Ref [2.]). The contribution also attempts to map the measurement hierarchy onto the service delivery hierarchy described in the SLA Handbook (Ref [1.]).

This contribution complements the work carried out by Carolyn Smithson (BT-Cellnet) and the author, which focussed on mapping the design and usage of service measurements against the processes described in the TMF e-TOM (Ref [3.]).
Section 2
Key Quality Indicators

Service providers have historically reported the performance of their networks against a set of business-agreed Key Performance Indicators (KPIs). These KPIs are, by their nature, network focused and provide little direct indication of the end-to-end service delivery that the network supports. Nevertheless, KPIs are an important measurement for network operations and will continue to be so for the foreseeable future, as they provide an indication of the performance of individual service components. (It should also be noted that KPIs are usually derived from a number of data sources and are not limited to network performance statistical data. For example, network fault data is also often used as a measurement of availability.)

However, the move towards service-focused management leads to a requirement for a new 'breed' of indicators that are focused on service quality rather than network performance. These new indicators, or Key Quality Indicators (KQIs), provide a measurement of a specific aspect of the performance of the product, product components (services) or service elements, and draw their data from a number of sources including the KPIs.

Two main types of KQI need to be considered. At the highest level, a KQI or group of KQIs is required to monitor the quality of the product offered to the end user. These KQIs will often form part of the contractual SLA between the provider and the customer. Product KQIs will derive some of their data from the lower-level Service KQIs, with the latter focused on monitoring the performance of individual product components (or services). In its simplest form a KQI may have one single KPI as its data source; the more likely scenario, however, is that a Service KQI will aggregate multiple KPIs to calculate the service element quality, and Product KQIs will aggregate multiple Service KQIs. Figure 6 describes the key indicator hierarchy.
Figure 6 - Key Indicator Hierarchy
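The two-level aggregation in this hierarchy can be sketched as follows. The weighted-sum transformation algorithm, the KPI names and all figures below are illustrative assumptions; the handbook deliberately leaves the transformation algorithm open:

```python
# Hedged sketch of two-level KQI aggregation: a Service KQI aggregates
# weighted KPIs, and a Product KQI aggregates weighted Service KQIs.
# The weighted sum and every number here are illustrative only.

def aggregate(parameters):
    """parameters: list of (value, weighting_factor) pairs.
    The weighting factors for one indicator should sum to 1
    (see the KQI parameter table later in this contribution)."""
    total_weight = sum(w for _, w in parameters)
    assert abs(total_weight - 1.0) < 1e-9, "weighting factors must sum to 1"
    return sum(value * w for value, w in parameters)

# Service KQI for a notional GPRS service, from two KPIs.
gprs_kqi = aggregate([(0.98, 0.6),   # attach success ratio KPI
                      (0.95, 0.4)])  # PDP context success KPI

# Product KQI drawing on the Service KQI plus a non-network KQI,
# reflecting the rule that a KQI may also utilise other KQIs.
product_kqi = aggregate([(gprs_kqi, 0.7),
                         (0.99, 0.3)])  # billing accuracy KQI
print(round(product_kqi, 4))
```

The 'contextual' information mentioned below - the weighting factors - is exactly what the second element of each pair carries.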
A key use of the KQI is the calculation of Service Level Agreement compliance. In most circumstances multiple KQIs will feed the SLA management processes, with Service KQIs typically supporting multiple internal or Supplier / Partner SLAs and Product KQIs supporting multiple Customer (external) SLAs.

As a service element may depend upon a number of network resources, it follows that a Service KQI may call upon multiple KPIs in the aggregation process. The nature of the services that may be included in products offered by service providers is likely to result in variants of a 'base' service. It is important, therefore, that a KQI may also utilise other KQIs as part of its aggregation algorithm. At all levels, additional 'contextual' information, such as the weighting factors applied to each KQI parameter, is required to complete the aggregation process.

To summarise: starting from the network elements, the network performance data is aggregated to provide KPIs that give an indication of service resource performance. Decomposition of a KPI provides the information necessary to identify the individual service resource causing a degradation in service quality. The KPIs are then used to produce the Service KQIs, the key indicators of service element performance. Service KQIs are in turn the primary input for the management of internal or supplier / partner SLAs, comparing actual service delivery quality against design targets or, in the case of a supplier / partner, contractual agreements. Service KQIs also provide the main source of data for the Product KQIs that are required to manage product quality and support the contractual SLAs with the customer.
Figure 7 describes the mapping of the key indicators onto the service decomposition described in the TMForum SLA Handbook. From the top-down view, a degradation in service is identified at the KQI level. It is then possible to drill down through the component parts of the KQI to the KPI (or KQI) that is the cause of the degradation. Further decomposition of the violating KPI provides root-cause information, enabling the user to identify the service resource that is causing the reduction in service quality.
Figure 7 - Key Indicators vs SLA Handbook
(The figure maps the key indicators onto the TMForum SLA Handbook service decomposition: Product KQIs (PKQIs) support the Customer SLA, Service KQIs (SKQIs) support the internal SLAs, and KPIs sit beneath the Service KQIs.)
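The top-down drill-down can be sketched as a walk over the key-indicator tree. The tree shape, the worst-of aggregation and the thresholds below are hypothetical; a real deployment would use each KQI's own transformation algorithm:

```python
# Hypothetical sketch of drill-down from a degraded KQI to the
# violating KPI. Aggregation here is 'worst of the children', an
# assumption chosen for simplicity of illustration.

class Indicator:
    def __init__(self, name, value=None, objective=None, children=()):
        self.name = name
        self._value = value          # leaf KPIs carry a measured value
        self.objective = objective   # minimum acceptable value
        self.children = list(children)

    def value(self):
        if self.children:            # KQIs aggregate their children
            return min(c.value() for c in self.children)
        return self._value

    def violating(self):
        return self.objective is not None and self.value() < self.objective

def drill_down(indicator):
    """Follow violating children down the tree; the deepest violating
    indicator points at the root cause."""
    for child in indicator.children:
        if child.violating():
            return drill_down(child)
    return indicator

kpi_ok = Indicator("attach_success", value=0.99, objective=0.95)
kpi_bad = Indicator("pdp_context_success", value=0.90, objective=0.97)
service_kqi = Indicator("gprs_service_kqi", objective=0.97,
                        children=[kpi_ok, kpi_bad])
product = Indicator("product_kqi", objective=0.97,
                    children=[service_kqi])

if product.violating():
    print("root cause:", drill_down(product).name)
```

Here the degradation is visible at the Product KQI, but the drill-down localises it to a single KPI, which is the behaviour the paragraph above describes.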
KQI Parameters

Based on the above description of a KQI, it is proposed that a KQI will be defined by the following parameters:

Measurement Parameter 1 (Mandatory; type: KQI or KPI; weighting factor 0 - 1)
The weighting factors for all measurement parameters should sum to 1.

Measurement Parameter 2 (Optional; type and weighting factor as above)

Measurement Parameter n (Optional; type and weighting factor as above)

TimeStamp (Mandatory; type: YYYY:MM:DD:hh:mm:ss UTC)
Information element recording when the KQI has been updated by a KPI. The time stamp comprises the year, month, day, hour, minute, second and time zone.

Transformation Algorithm (Mandatory)
The algorithm to be applied to calculate the KQI value from the combination of measurement parameters and their weighting factors. The simplest form of algorithm is '=', where a single KPI becomes a KQI.

Objective Value (Mandatory)
The value used to determine whether the calculated KQI is within the target value, and hence whether a violation should be generated.

Violation Direction (Mandatory; type: +ve or -ve)
Determines whether a breach occurs when the calculated value is above or below the objective value; e.g. if this field is -ve, a violation or warning will be generated if the calculated value drops below the objective value or warning threshold.

Warning Level (Optional; type: %)
Optionally, a % value may be specified to give a warning that a KQI objective is in danger of being violated.

Abatement Threshold (Optional; type: %)
Introduces a hysteresis factor to the warnings and alarms to prevent oscillation between an alarm and a clear condition.

Measurement Period (Mandatory; type: minutes)
The period over which the KQI performance is calculated (see Note 1).

Grace Periods (Optional; type: count)
How many times a value can remain un-updated without violation of the SLA, with reference to the Measurement Period and / or KQI reporting period.

Active/Inactive Time (Optional; type: A hh:mm, hh:mm or I hh:mm, hh:mm)
A KQI measurement period may be controlled by specifying a start and stop time for the KQI; e.g. for a KQI that is only applicable between 9 a.m. and 5 p.m. this field would be set to A 09:00,17:00. If the field is preceded by the letter A, the time period is included in the measurement; if preceded by the letter I, the time period is excluded.

KPI Time Zone (Optional; type: string)
The Coordinated Universal Time (UTC) string identifying the time zone from which the underlying KPI(s) originate.

KQI Time Zone (Optional; type: string)
The Coordinated Universal Time (UTC) string identifying the time zone from which the KQI originates. Can be derived from homogeneous KPIs, or from the TimeStamp if valid.

Active/Inactive Days (Optional; type: A day list or I day list)
List of days on which the KQI measurement is active or inactive; e.g. for a KQI measured Monday through Friday the list would be A 2,3,4,5,6. To exclude Saturdays and Sundays the list would be I 1,7.
Note 1: As a KQI may be defined by a number of measurement parameters that have differing sample periods, it must use a sliding window technique for calculating the current value. The current value will therefore be updated in 'real time' as new data becomes available, and aggregated over the time specified by the measurement period.
(Diagram: KQI sliding window reporting. Three measurement parameters deliver samples at differing rates - six samples of parameter 1, three of parameter 2 and two of parameter 3 over the same interval - while KQI reports are produced at the fixed KQI reporting period, each aggregating whichever samples fall within the current window.)
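As an illustration only, the sliding window of Note 1 can be combined with the Objective Value, a -ve Violation Direction, a Warning Level and an Abatement Threshold from the parameter table above. The class, its parameter names and all figures are assumptions for the sketch, not part of the proposal:

```python
# Hypothetical sketch combining several KQI parameters: a sliding
# measurement window, an objective with -ve violation direction, a
# warning level, and an abatement threshold providing hysteresis.

from collections import deque

class KqiWindow:
    def __init__(self, objective, warning_pct, abatement_pct, period_s):
        self.samples = deque()       # (timestamp, value) pairs
        self.objective = objective
        self.warning = objective * (1 + warning_pct / 100)
        self.clear_at = objective * (1 + abatement_pct / 100)
        self.period_s = period_s
        self.in_violation = False

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.period_s
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()   # sliding window: drop stale samples
        return self.state()

    def current(self):
        return sum(v for _, v in self.samples) / len(self.samples)

    def state(self):
        value = self.current()
        # -ve violation direction: breach when the value drops below
        # the objective; clear only once it rises past the abatement
        # threshold, preventing alarm/clear oscillation.
        if value < self.objective:
            self.in_violation = True
        elif value > self.clear_at:
            self.in_violation = False
        if self.in_violation:
            return "violation"
        return "warning" if value < self.warning else "ok"

kqi = KqiWindow(objective=0.95, warning_pct=2, abatement_pct=3, period_s=300)
print(kqi.add(0, 0.99))    # ok: 0.99 is above the warning band
print(kqi.add(60, 0.90))   # violation: window average 0.945 < 0.95
print(kqi.add(120, 0.99))  # still violation: 0.96 is back above the
                           # objective, but hysteresis holds the alarm
```

Once enough old samples slide out of the window and the average clears the abatement threshold, the alarm clears, which is exactly the oscillation-damping behaviour the Abatement Threshold row describes.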