Center for Service Enterprise Engineering (SEE)
http://ie.technion.ac.il/Labs/Serveng/
Technion - Israel Institute of Technology
The William Davidson Faculty of Industrial Engineering and Management

DataMOCCA: DATA MOdels for Call Center Analysis

Volume 6.2
Implementing the Offered-Load in SEEStat

Abir Koren
Professor Avishai Mandelbaum
Dr. Valery Trofimov

Created: May 2011



DataMOCCA: DATA MOdels for Call Center Analysis

The DataMOCCA project is an initiative of researchers from the Technion - Israel Institute of Technology and The Wharton School, University of Pennsylvania. The mission of the project is to collect, pre-process, organize and analyze data from Telephone Call/Contact Centers. The raw data obtained are call-by-call records, of at least one year's duration, from active Call Centers. Among the goals of the project are the development and distribution of Call Center databases, using a uniform schema. The data repository created, together with software tools, will be accessible through the world-wide-web and provide a resource for researchers and teachers of Service Engineering, Science and Management.

List of Documents

Volume  Title                                                          Revision Date
1       Model Description and Introduction to User Interface          July 29, 2006
2       Summary Tables Variable Definitions                           August 2006
3.1     SEEStat Guide I - Beginning User                              to be completed
3.2     SEEStat Guide II - Advanced User                              July 2008
3.3     SEEStat Guide III - Data Extraction Facility                  to be completed
4.1     The Call Center of a US Bank                                  November 2, 2006
4.2     The Call Center of IL Telecom                                 November 2, 2006
4.3     Empirical Analysis of a Call Center in an Israeli
        Commercial Company                                            July 2009
4.4     Empirical Analysis of a Call Center                           August 2009
5.1     Skills-Based-Routing in a US Bank                             February 2008
6.1     Empirical Analysis of Little's Law Using Data from the
        Call Center of a US Bank                                      May 2010
6.2     Implementing the Offered-Load in SEEStat                      May 2011

For more information concerning access to the database and materials, please contact Professor Avishai Mandelbaum: [email protected]

TABLE OF CONTENTS

1   Introduction
2   Basic Work Plan
3   Theory vs. Empirical Offered Load Comparison
    3.1  Difference Analysis
    3.2  Testing More Cases
4   Calculating Theoretical Workload Using Varying Service Time Distribution
5   Offered Load Comparison
6   Calculating Theoretical Offered Load Using Different Representations
    6.1  Estimating Se
    6.2  Estimating R(t) Using Se
7   Testing the Different Calculation Methods Using Simulation
    7.1  The Model
    7.2  First Setup
    7.3  Second Setup
    7.4  Third Setup
    7.5  Conclusion
8   Average Offered-Load
9   Future Improvements
    9.1  Abandons Service Time Generation
    9.2  Expanding the Amount of Realizations Considered
    9.3  Trying to Fit a Model to Describe S(t)
    9.4  Using the Service Times of Non-Waiting Customers
10  Appendix 1 - Teaching Note
11  Appendix 2 - Implementing the Empirical Method in SEEStat
12  Appendix 3 - Fitting a Model for S(t)
13  Bibliography


1 Introduction

The offered load measures the amount of work in a system; it reflects both the arrival rates and the service times. The offered-load process is a stochastic process that describes the offered load over a time interval, and each realization of this process is called a workload. A natural estimator of the offered load is therefore an average of workload observations considered to be drawn from the same process.

The workload during a time interval represents the number of customers in a system with an infinite number of servers. A simple way to picture this is a parallel virtual system, with an infinite number of servers, handling the same arrival process. Every arriving customer is then served immediately; hence there are no abandonments. Another element to take into account is the service time. To approximate the virtual system's service times, we assume for simplicity that service times are not affected by waiting times; in other words, service times are the same in both systems. This is one of several assumptions regarding the independence of the service times from various covariates, such as the identity (or type) of the customer/server, the part of the day and more. Accounting for these covariates would require a thorough study; hence, within the limitations of this work, we assume that the service time does not depend on any of them. Of course, since there is no abandonment in the virtual system, customers who abandoned in the original system must be assigned a virtual service time; the remaining question is how to estimate these service times. According to Reich [1], assuming independence between service times and patience allows us to use the service-time distribution of served customers to generate service times for abandoning customers.

In this paper we discuss the implementation of these concepts in SEEStat, testing them empirically and using simulations.
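The virtual infinite-server picture above can be made concrete with a small sketch (Python is used here for illustration only; SEEStat is the actual tool, and all names below are ours). The workload at time t is simply the number of virtual customers whose service interval covers t:

```python
import bisect

def empirical_workload(arrivals, services, horizon, step=30):
    """Number of customers present in a virtual infinite-server system,
    sampled every `step` seconds over [0, horizon].

    arrivals  - arrival times (seconds from midnight)
    services  - matching virtual service durations (seconds)
    """
    starts = sorted(arrivals)                        # service starts at arrival
    ends = sorted(a + s for a, s in zip(arrivals, services))
    # customers present at t = (started by t) - (departed by t)
    return [bisect.bisect_right(starts, t) - bisect.bisect_right(ends, t)
            for t in range(0, horizon + 1, step)]

# toy example: two overlapping 30-second calls
w = empirical_workload([0, 10], [30, 30], horizon=60, step=10)
```

In the toy example both calls overlap between t = 10 and t = 30, so the sampled workload rises to 2 and falls back to 0 once both calls end.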

2 Basic Work Plan

To emulate the virtual system, one needs to take some information from the actual system's data set and derive values that represent the virtual system. More specifically, the primary arrivals to the system should remain the same, while the length of stay in the system, as well as secondary arrivals (entering a queue after receiving a different type of service), should be derived from the data. To accomplish this, two issues must be resolved:

1. Shifting times.
2. Generating service times for abandons.

Shifting times refers mainly to service times. Since the virtual system has an infinite number of servers, service begins the moment a customer arrives. Practically, this is done by eliminating waiting times: for every customer, the virtual service entry time is set to his queue entry time. In addition, for secondary services (i.e., a customer entering a new queue after receiving a different kind of service), one must consider the influence of this process on the arrival time. In other words, if a customer waited a positive amount of time for his first service, then his arrival to the next queue is delayed by the same amount of time. Hence, the virtual queue entry time for secondary services is obtained by eliminating all previous waiting times.

As discussed earlier, the virtual system requires assigning service times to abandoning customers. One may consider different ways to generate these; under the stated assumptions, the assigned service time should not be affected by any covariate. In this work, the selected generation method is the empirical inverse distribution function. The concept is very simple: it uses the empirical CDF, computed from the data-set observations, together with a uniform random variable to sample from it. The technical details of the SEEStat implementation are described in Appendix 2. From now on, we refer to this workload calculation method as the "Empirical method".
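The empirical inverse-distribution generation can be sketched as follows (an illustrative Python fragment, not the SEEStat implementation of Appendix 2). Drawing U ~ Uniform(0,1) and inverting the empirical CDF amounts to drawing uniformly at random from the observed service times:

```python
import random

def sample_abandon_service_times(observed, n, rng=random):
    """Generate n virtual service times for abandoning customers by
    inverting the empirical CDF of served customers' service times:
    F_hat^{-1}(U) is the order statistic indexed by U * N."""
    xs = sorted(observed)
    return [xs[int(rng.random() * len(xs))] for _ in range(n)]
```

Every generated value is one of the observed service times, with each observation equally likely, which is exactly the empirical-CDF inversion described above.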

3 Theory vs. Empirical Offered Load Comparison

After implementing the Empirical method in SEEStat, its outcome needed to be compared to theoretical estimates. The most intuitive formulation of the time-varying offered load is

    R(t) = ∫₀ᵗ λ(u) · P(S > t − u) du.

Using this formulation, together with proper estimators of its ingredients (namely λ(t) and the service-time survival function), one can estimate the offered load. On the other hand, using this formulation with estimators based on a single realization estimates the workload of that same day; this is what was done in this work to obtain the theoretical workload estimators. The practical calculation was carried out for every tested second by multiplying every earlier arrival by the corresponding survival probability and summing up. The Telesales service during 03/08/2001 was picked randomly for the first comparison. This comparison yielded the following charts:

Figure 1 Workload comparison, 03/08/01 Telesales, all day
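The per-second theoretical calculation just described, multiplying every earlier arrival by the corresponding survival probability and summing, can be sketched as follows (illustrative Python with an empirical survival function; the naive form is O(T²), which is acceptable for a sketch):

```python
import bisect

def empirical_survival(durations):
    """Return S(x) = P(service time > x) from observed durations."""
    xs, n = sorted(durations), len(durations)
    return lambda x: 1.0 - bisect.bisect_right(xs, x) / n

def theoretical_workload(arrival_counts, surv):
    """Discretized R(t) = sum over u <= t of arrivals(u) * P(S > t - u).
    arrival_counts[u] is the number of arrivals during second u."""
    T = len(arrival_counts)
    return [sum(arrival_counts[u] * surv(t - u) for u in range(t + 1))
            for t in range(T)]
```

With a single arrival at second 0 and all observed services lasting 2 seconds, the curve stays at 1 for two seconds and then drops to 0.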

Figure 2 Workload comparison, 03/08/01 Telesales, 00:00-06:00

Figure 3 Workload comparison, 03/08/01 Telesales, 06:00-12:00

Figure 4 Workload comparison, 03/08/01 Telesales, 12:00-18:00

Figure 5 Workload comparison, 03/08/01 Telesales, 18:00-00:00

It is easy to see that the charts exhibit similar trends, and the differences are minor during most of the day. During the first part of the day, a conceptual difference between the two methods is apparent: while the method calculated via SEEStat creates a step function, the theoretical method creates a kind of saw-tooth function.

3.1 Difference analysis

The absolute difference was calculated for every workload instance. The average difference was 2.4655 and the standard deviation was 2.6815. Only 49 observations, less than 2%, fell outside the confidence interval (using 3σ).


Furthermore, the same tests were performed to analyze the difference among different parts of the day. To do this, we divided each day into four quarters. The corresponding results were:

1. First quarter: avg. 0.39, std. 0.43, relative error 0.785
2. Second quarter: avg. 2.54, std. 2.64, relative error 0.115
3. Third quarter: avg. 4.35, std. 3.2, relative error 0.129
4. Fourth quarter: avg. 2.58, std. 1.89, relative error 0.127

According to these results, even though the first quarter of the day yields the lowest absolute error, its relative error is the largest. One reasonable explanation is that, due to the low number of calls and the conceptual difference between the two types of functions (discussed earlier), the absolute difference is very low while the relative difference is significant. During the rest of the day, the relative difference is fairly stable, ranging between 11.5% and 13%.

3.2 Testing more cases

Of course, a stand-alone comparison is not enough, so the next step was to broaden the comparison basis. To achieve more accurate results, the next comparison used a service with a larger number of calls and agents: the Retail service of the same day (03/08/01) was chosen. This comparison yielded the following charts:

Figure 6 Workload comparison, 03/08/01 Retail, all day

Figure 7 Workload comparison, 03/08/01 Retail, 00:00-06:00

Figure 8 Workload comparison, 03/08/01 Retail, 06:00-12:00

Figure 9 Workload comparison, 03/08/01 Retail, 12:00-18:00

Figure 10 Workload comparison, 03/08/01 Retail, 18:00-00:00

Again we can see quite a good resemblance between the charts and the trends during the day. The parameters obtained from this comparison are:

1. All day: avg. 11.13, std. 9.62, relative error 0.082; 30 observations (approximately 1%) exceed the 3σ confidence interval.
2. First quarter: avg. 2.72, std. 2.1, relative error 0.237
3. Second quarter: avg. 17.2, std. 10.2, relative error 0.082
4. Third quarter: avg. 14.1, std. 9.94, relative error 0.055
5. Fourth quarter: avg. 10.5, std. 6.81, relative error 0.066

We can see that the relative errors here are lower than in the Telesales service, while the trends among the different parts of the day are preserved.


Although the numbers show a good match between the theoretical and the empirical workload, there is still room for improvement, especially in the last part of the day (both services show a bias there). One reasonable explanation for these discrepancies may be the method used to impute service times for unhandled customers. To reduce the effect of this issue, a comparison on a day with a low number of unhandled calls was needed. After a quick search using SEEStat, the Retail service on 29/10/01 (only 214 unhandled calls out of the 126,977 calls that arrived that day) was chosen, and the workload comparison yielded the following charts:

Figure 11 Workload comparison, 29/10/01 Retail, all day

Figure 12 Workload comparison, 29/10/01 Retail, 00:00-06:00

Figure 13 Workload comparison, 29/10/01 Retail, 06:00-12:00

Figure 14 Workload comparison, 29/10/01 Retail, 12:00-18:00

Figure 15 Workload comparison, 29/10/01 Retail, 18:00-00:00

  • 13

Again we can see quite a good resemblance between the charts and the trends during the day. The parameters obtained from this comparison are:

1. All day: avg. 8.03, std. 7.4, relative error 0.074; 32 observations (approximately 1%) exceed the 3σ confidence interval.
2. First quarter: avg. 2.33, std. 2.4, relative error 0.54
3. Second quarter: avg. 10.18, std. 6.66, relative error 0.063
4. Third quarter: avg. 12.66, std. 8.72, relative error 0.061
5. Fourth quarter: avg. 6.97, std. 5.74, relative error 0.052

Although there is a slight improvement in the relative difference, some of the absolute differences are still significant. Furthermore, the charts still show a bias during the last part of the day. This may indicate that the imputation of service times for abandons is not the source of the discrepancies.

4 Calculating Theoretical Workload Using Varying Service Time Distribution

Another possible cause of these differences is the use of a constant empirical service-time distribution. In other words, both the theoretical workload and the empirical workload were calculated under the implicit assumption that the service-time distribution is constant over the day, while in practice it is not necessarily so. This may affect the service-time generation when calculating the empirical workload, as well as the survival-probability estimation when calculating the theoretical workload. The theoretical estimator is more strongly affected by this assumption, because the survival-function estimator enters every step of its calculation. Under these circumstances, the first step toward improving the match between the two estimators was to improve the survival-function estimator of the theoretical method. A first approach was to use a discrete, time-varying service-time distribution (i.e., a finite number of service-time distributions describing the same number of time intervals). Practically, the day was divided into 4 parts (the day quarters presented earlier), and each call was multiplied by the probabilities derived from the appropriate distribution (determined by the customer's arrival time). The theoretical estimator was recalculated with these estimators, and a comparison among the three methods (the Empirical method, the theoretical method using a constant service-time distribution and the theoretical method using varying service-time distributions) was carried out. The following charts show the differences among the three methods during the last part of the day for each examined service:

Figure 16 Workload comparison using 3 methods, 03/08/01 Telesales, 18:00-00:00

Figure 17 Workload comparison using 3 methods, 03/08/01 Retail, 18:00-00:00

Figure 18 Workload comparison using 3 methods, 29/10/01 Retail, 18:00-00:00

It is easy to see that using varying service-time distributions yields better results, with almost no bias during the last part of the day. Furthermore, almost every other parameter, during all parts of the day, was improved. The results are summarized in the following table:

                                 03/08/01 Telesales  03/08/01 Retail   29/10/01 Retail
                                 Cons.     Var.      Cons.     Var.    Cons.    Var.
All day
  Difference avg.                2.4654    2.0902    11.1302   8.2865  8.0357   6.9475
  Difference std.                2.6851    2.4702    9.6217    8.6170  7.3928   7.2086
  Relative difference avg.       0.1551    0.1270    0.0818    0.0609  0.0737   0.0639
  Observations exceeding 3σ      49        60        30        48      32       38
  Proportion of exceeding obs.   0.017     0.021     0.01      0.017   0.0111   0.0132
First qtr.
  Difference avg.                0.3912    0.3075    2.7170    2.3061  2.3321   1.2842
  Difference std.                0.4277    0.4084    2.0981    2.2023  2.3992   1.1457
  Relative difference avg.       0.7845    0.8548    0.2364    0.2235  0.5370   0.3839
Second qtr.
  Difference avg.                2.5370    2.3276    17.2025   13.9360 10.1818  9.4965
  Difference std.                2.6408    2.3945    10.1992   9.4258  6.6614   7.1895
  Relative difference avg.       0.1145    0.1032    0.0821    0.0690  0.0633   0.0607
Third qtr.
  Difference avg.                4.3529    4.0840    14.1008   12.2008 12.6627  11.7919
  Difference std.                3.2003    3.0526    9.9398    9.6108  8.7177   8.3066
  Relative difference avg.       0.1286    0.1177    0.0551    0.0470  0.0609   0.0562
Fourth qtr.
  Difference avg.                2.5807    1.6418    10.5005   4.7031  6.9663   5.2173
  Difference std.                1.8910    1.3390    6.8140    3.9091  5.7394   4.5771
  Relative difference avg.       0.1274    0.0772    0.0658    0.0286  0.0520   0.0383

Table 1 Constant vs. varying service-time distribution comparison


These results clearly show that using a varying service-time distribution outperforms a constant one in every tested parameter except the number of exceptional observations. Furthermore, the difference between the two workload-calculation methods is very low, so it is reasonable to regard the results as similar.
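The quarter-dependent survival estimator described in this section can be sketched as follows (illustrative Python; all names are ours). The only change from the constant-distribution calculation is that each arrival is weighted by the survival function of its arrival quarter:

```python
import bisect

def quarter_survival(durations_by_quarter):
    """One empirical survival function P(S > x) per day quarter."""
    fns = []
    for durs in durations_by_quarter:
        xs, n = sorted(durs), len(durs)
        fns.append(lambda x, xs=xs, n=n: 1.0 - bisect.bisect_right(xs, x) / n)
    return fns

def varying_theoretical_workload(arrival_counts, surv_by_quarter, day_len=86400):
    """Discretized R(t), with each arrival weighted by the survival
    function of the quarter in which it arrived."""
    q = day_len // 4
    return [sum(arrival_counts[u] * surv_by_quarter[min(u // q, 3)](t - u)
                for u in range(t + 1))
            for t in range(len(arrival_counts))]
```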

5 Offered Load Comparison

As mentioned before, the workload is a realization of the offered-load process; hence, a good estimator is an average of several workload realizations considered to share the same characteristics. Using SEEStat, empirical workload estimators for the Wednesdays of August 2001 were generated and their average was calculated. Correspondingly, the theoretical offered load was calculated using

    R(t) = ∫₀ᵗ λ(u) · P(S > t − u) du,

with parameters estimated from all the days considered (the Wednesdays of August 2001). This comparison yielded the following charts:

Figure 19 Offered load comparison, Aug Wednesdays 2001, Retail

Figure 20 Offered load comparison, Aug Wednesdays 2001, Retail, 00:00-06:00

Figure 21 Offered load comparison, Aug Wednesdays 2001, Retail, 06:00-12:00

Figure 22 Offered load comparison, Aug Wednesdays 2001, Retail, 12:00-18:00

Figure 23 Offered load comparison, Aug Wednesdays 2001, Retail, 18:00-24:00

It is easy to see that the charts display a very good similarity during all parts of the day, excluding a gap between the two curves during the interval 18:35-19:40. This paper may not uncover the reasons for all of these discrepancies; on the other hand, some discrepancy is to be expected, since the comparison is, in practice, between two different estimators. The analysis is based on empirical data, so the actual offered load is not known and can only be estimated. Hence, the appearance of a gap between the two curves cannot tell us which estimate is wrong (or less accurate). This issue is discussed in part 7.
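The averaging step described in this section, estimating the offered load as the pointwise average of several workload realizations (e.g., all Wednesdays of a month), can be sketched as follows (illustrative; curves are assumed sampled on a common grid):

```python
def offered_load_estimate(workloads):
    """Pointwise average of several same-length workload curves,
    e.g. one curve per Wednesday of August 2001."""
    n = len(workloads)
    return [sum(vals) / n for vals in zip(*workloads)]
```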

6 Calculating the Theoretical Offered Load Using Different Representations

A different way to represent R(t) is the representation R(t) = E[λ(t − Se)] · E(S), where Se denotes the remaining service time of a customer in service. To use this representation, one needs to estimate Se. A first approach might be a simpler approximation using only E(Se), in which case R(t) ≈ λ(t − E(Se)) · E(S) is reasonable.

6.1 Estimating Se

The first step was to estimate E(Se) for every second of one day. Basically, Se is defined as the remaining service time. Hence, an observation of Se at time t is the remaining service time of a customer (in service) at time t, and a reasonable estimator of E(Se) is the average of these observations. The Retail service of 1/08/01 was chosen for this analysis. This estimation yielded the following chart:

Figure 24 Average Se during 1/08/2001, Retail service

The chart is easily divided into two parts:

- Up to approximately 27,000 sec (7:30 am) it is very noisy (probably due to the low number of observations during this part of the day);
- From 27,000 up to 86,400 sec it is less noisy and exhibits a fairly stable average with a slight upward trend.

Of course, the average does not tell the whole story, and hence I have plotted histograms for some chosen seconds during the day (its second part):

Figure 25 Se distribution on 1/08/2001 13:53:20

Figure 26 Se distribution on 1/08/2001 18:03:20

The histograms seem to exhibit an exponential distribution. To check whether the exponential distribution can describe the Se distribution, a standard statistical test was performed. The goal was to extract R² values from the regression model fitted according to

    ln(1 − F̂ₙ(x)) = −λx,

where x represents the observations at time t (during the second part of the day). This calculation yielded the following chart:

Figure 27 R² during the second part of 1/08/2001, Retail service

The chart shows that the R² values are quite high, varying between 0.9 and 1 most of the time, which indicates that the exponential distribution is a good candidate for describing the Se distribution.

6.2 Estimating R(t) using Se

After estimating Se for every second of the day, what remains in order to estimate R(t) is to calculate λ and the average service time for every second and then simply apply R(t) ≈ λ(t − E(Se)) · E(S) (from here on, this method is referred to as the Se method, and its output as the Se workload). Practically, for each second the proper arrival rate is calculated (the arrival rate at t − E(Se)) and multiplied by the average service time. The following chart displays this calculation for the Retail service during 01/08/2001:

Figure 28 Se calculated offered load, 1 sec resolution, 1/08/01 Retail

It is easy to see that the result is very noisy; furthermore, it varies between 0 and 5000, which is well out of the actual range. The reason is probably a lack of accuracy in estimating λ and average service times over small intervals (namely, 1 sec intervals). Accordingly, the next step was to fit the same model using coarser resolutions. The following charts compare the Se calculated offered load to the empirical offered load at a 30 sec resolution and at a 5 min resolution:

Figure 29 Se workload vs. empirical workload, 1/08/01 Retail, 30 sec resolution

Figure 30 Se workload vs. empirical workload, 1/08/01 Retail, 5 min resolution

It is easy to see that in both charts the Se curve and the empirical curve exhibit similar trends, but the Se curve at the 30-sec resolution is much noisier. A reasonable explanation is, again, the inaccuracy of estimating E(S) and λ over short time intervals. We have now compared two different estimators, and hence we once again confront the question of which estimate is more accurate. To answer this question, we must know the actual offered load. Part 7 describes a simulation model built for this purpose.
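The Se method described in this subsection is a one-liner once λ(·), E(S) and E(Se) have been estimated (an illustrative sketch with names of ours; λ is evaluated at 0 for t < E(Se) as an assumed boundary convention):

```python
def se_offered_load(lam, mean_service, mean_se, times):
    """Se method: R(t) ~ lambda(t - E(Se)) * E(S).

    lam          - estimated arrival-rate function lambda(t)
    mean_service - estimated E(S)
    mean_se      - estimated E(Se)
    """
    return [lam(max(t - mean_se, 0.0)) * mean_service for t in times]
```

Coarser resolutions simply mean that `lam`, `mean_service` and `mean_se` are estimated over 30-second or 5-minute bins rather than per second, which is what reduces the noise in Figures 29-30.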

    7 Testing the Different Calculation Methods Using Simulation

This paper explored several methods of offered-load calculation using empirical data. Of course, testing theory against actual observations is essential, but it has a major disadvantage: all the measures are estimated. In other words, the comparison between the empirical offered load and the theoretical offered load was based on an estimated theoretical offered load, and as shown previously, some of the discrepancies may be due to that. To overcome this issue, a simple service-system simulation model was built, in which one can control the system parameters (arrival rate, service time and number of servers). Using this model, the offered load is known and determined; thus all the estimation methods can be compared to the actual offered load.

The theoretical offered load was calculated using the representation

    R(t) = ∫_0^t λ(u)·P(S > t − u) du,

with the known (and not estimated) parameters as input. To compare the three methods (empirical, theoretical and Se), the RMSE index was calculated on the differences between each method and the theoretical offered load, at a 1-sec resolution.
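The RMSE index used for the comparison can be written down directly; the series below are made-up toy numbers, not the simulation output:

```python
import math

def rmse(estimate, truth):
    """Root-mean-square error between an estimated offered-load series and the
    known theoretical series, sampled on the same 1-sec grid."""
    assert len(estimate) == len(truth)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

theoretical = [50.0] * 10
empirical = [49.0, 51.0, 50.5, 48.5, 50.0, 51.5, 49.5, 50.0, 49.0, 51.0]
print(round(rmse(empirical, theoretical), 3))  # → 0.949
```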


7.1 The model

The model was built using the Arena simulation program. One basic model served as a basis and enabled us to define three different setups:

1. Constant arrival rate and a constant exponential service-time distribution;
2. Constant arrival rate and a constant log-normal service-time distribution;
3. Constant arrival rate and a time-varying exponential service-time distribution.

    Figure 31 The basic simulation model in Arena

Each setup was tested over a period of 11 hours. Using the Create module one controls the arrival rate to the system. In all 3 setups the arrival process was homogeneous Poisson with λ = 10 customers per minute.

After arrival, each customer is assigned a patience drawn from an exponential distribution with a 15-min average (empirical results show that patience often ranges between 2-3 service-time averages). Each patience assignment is independent of the other assignments, and furthermore independent of the service-time assignment. After the patience assignment, the customer enters the queue (called "line" due to technicalities), which delays the customer only if all the servers are busy. After exiting the line, each customer is assigned a service time; if the customer waited longer than his patience, he is counted as an abandon and assigned a 0 service time (so practically he does not affect the other customers' waiting times). One may prefer to assign the service time upon the customer's arrival to the system, but in this case the assignment was done after waiting in line (if the service time does not vary, it does not really matter).


After exiting service, the customers are divided into abandoned and served customers, and their IDs (serial numbers according to the order of arrival), queue entry times, service entry times, service exit times and patience are recorded into files. For each setup the varying parameter was the number of servers, which ranged between 4 and the highest number of servers that still produced some abandoning customers. There was also a test with an infinite number of servers for each setup. For each setup and number of servers, 20 realizations were generated.
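A minimal Python sketch of this setup (not the Arena model itself; the server-selection rule and record layout are my assumptions) can look as follows:

```python
import random

def simulate(n_servers, horizon=660.0, lam=10.0, mean_s=5.0, mean_patience=15.0, seed=0):
    """FCFS many-server queue with exponential patience; times in minutes.
    Returns (arrival, service_start, service_end, patience) per customer; an
    abandoning customer gets a 0 service time, as in the paper, with his
    'service' epochs set to the abandonment moment."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    while True:                        # homogeneous Poisson arrivals
        t += rng.expovariate(lam)
        if t >= horizon:
            break
        arrivals.append(t)
    free_at = [0.0] * n_servers        # when each server next becomes free
    records = []
    for a in arrivals:
        patience = rng.expovariate(1.0 / mean_patience)
        i = min(range(n_servers), key=lambda k: free_at[k])
        start = max(a, free_at[i])     # FCFS offered wait
        if start - a > patience:       # abandons before reaching a server
            records.append((a, a + patience, a + patience, patience))
        else:
            s = rng.expovariate(1.0 / mean_s)
            free_at[i] = start + s
            records.append((a, start, start + s, patience))
    return records

recs = simulate(50)
abandons = sum(1 for (a, st, en, p) in recs if st == en)
print(len(recs), "arrivals,", abandons, "abandons")
```

Since abandoning customers never seize a server, this arrival-ordered recursion reproduces the FCFS dynamics without a full event calendar.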

7.2 First setup

This setup was the simplest one. The arrival process was homogeneous Poisson with an average of 10 arrivals per minute. The service time was generated from an exponential distribution with a 5-min average. The numbers of servers tested were 4, 7, 13, 16, 20, 25, 30, 35, 40, 45, 50, 54, 58, 63, 68 and an infinite number of servers. The following charts present some of the comparisons done for this setup; each chart presents only a part of its test, due to the 1-sec resolution.

[Chart: Empirical method for 50 servers; x-axis: Time, y-axis: Offered load; series: Theoretical, Empirical method]

Figure 32 Comparing the empirical method to the theoretical offered load, 1st setup, 50 servers

[Chart: Se method, infinite number of servers; x-axis: Time, y-axis: Offered load; series: Theoretical, Se method]

Figure 33 Comparing the Se method to the theoretical offered load, 1st setup, infinite servers

[Chart: Theoretical method, 25 servers; x-axis: Time, y-axis: Offered load; series: Theoretical, Theoretical estimation]

Figure 34 Comparing the theoretical estimation to the theoretical offered load, 1st setup, 25 servers

Using only these charts one cannot conclude much about the differences between the three methods. The following charts summarize the relation of the RMSE calculated for each method to the number of servers and to the abandon rate.

[Chart: RMSE vs. # Servers; x-axis: # Servers, y-axis: RMSE; series: Emp, Se, Theo]

Figure 35 1st setup, RMSE to number of servers, 3-method comparison (80 servers stands for an infinite number of servers)

[Chart: RMSE vs. P(Ab); x-axis: P(Ab), y-axis: RMSE; series: Emp, Se, Theo]

Figure 36 1st setup, RMSE to P(Ab), 3-method comparison

It is easy to see that for each value (of P(Ab) or the number of servers) the empirical method provides the lowest RMSE, followed by the theoretical estimation and finally the Se method. Furthermore, the RMSE of the empirical method is fairly stable, regardless of the number of servers or the abandon rate. Therefore, for this setup, the empirical method is definitely the best.

7.3 Second setup

This setup was very similar to the first one. The arrival process was homogeneous Poisson with an average of 10 arrivals per minute. The service time was generated from a log-normal distribution with a 5-min average. The numbers of servers tested were 4, 7, 13, 16, 20, 25, 30, 35, 40, 45, 50, 54, 58, 63, 68 and an infinite number of servers. The following charts present some of the comparisons done for this setup; each chart presents only a part of its test, due to the 1-sec resolution.

[Chart: Empirical method, 16 servers; x-axis: Time, y-axis: Offered load; series: Theoretical, Empirical]

Figure 37 Comparing the empirical method to the theoretical offered load, 2nd setup, 16 servers

[Chart: Se method, 50 servers; x-axis: Time, y-axis: Offered load; series: Theoretical, Se method]

Figure 38 Comparing the Se method to the theoretical offered load, 2nd setup, 50 servers

[Chart: Theoretical method, 68 servers; x-axis: Time, y-axis: Offered load; series: Theoretical, Theoretical estimation]

Figure 39 Comparing the theoretical estimation to the theoretical offered load, 2nd setup, 68 servers

Again, this is not the full picture. The next two charts summarize the relation between the RMSE and the number of servers and the abandon rates.

[Chart: RMSE vs. # Servers; x-axis: # Servers, y-axis: RMSE; series: Emp, Se, Theo]

Figure 40 2nd setup, RMSE to number of servers, 3-method comparison (80 servers stands for an infinite number of servers)

[Chart: RMSE vs. P(Ab); x-axis: P(Ab), y-axis: RMSE; series: Emp, Se, Theo]

Figure 41 2nd setup, RMSE to P(Ab), 3-method comparison

The results for this setup lead to the same conclusions as the first setup: clearly the empirical method provides the best estimation, and it does not depend on the number of servers (or abandon rate).

7.4 Third setup

This setup differed from the first by a single change in the service-time distribution. The arrival process was again homogeneous Poisson with an average of 10 arrivals per minute. The service time was generated from an exponential distribution with a 5-min average during the first 6 hours, and then changed to exponential with a 6.5-min average. The numbers of servers tested were 4, 7, 13, 16, 20, 25, 30, 35, 40, 45, 50, 54, 58, 63, 68, 72, 76, 80, 85, 90 and an infinite number of servers. The following charts present some of the comparisons done for this setup; each chart presents only a part of its test, due to the 1-sec resolution.


[Chart: Empirical method, 85 servers; x-axis: Time (3:00-10:00), y-axis: Offered load; series: Theoretical, Empirical]

Figure 42 Comparing the empirical method to the theoretical offered load, 3rd setup, 85 servers

[Chart: Se method, 20 servers; x-axis: Time (3:00-10:00), y-axis: Offered load; series: Theoretical, Se method]

Figure 43 Comparing the Se method to the theoretical offered load, 3rd setup, 20 servers

[Chart: Theoretical method, 58 servers; x-axis: Time (3:00-10:00), y-axis: Offered load; series: Theoretical, Theoretical estimation]

Figure 44 Comparing the theoretical estimation to the theoretical offered load, 3rd setup, 58 servers

Again, this is not the full picture. On the other hand, it is easy to see the influence of the change in the service-time distribution and the way each method "handles" this change. The following two charts summarize the relation between the RMSE and the number of servers and the abandon rates.

[Chart: RMSE vs. # Servers; x-axis: # Servers, y-axis: RMSE; series: Emp, Se, Theo]

Figure 45 3rd setup, RMSE to number of servers, 3-method comparison (110 servers stands for an infinite number of servers)

[Chart: RMSE vs. P(Ab); x-axis: P(Ab), y-axis: RMSE; series: Emp, Se, Theo]

Figure 46 3rd setup, RMSE to P(Ab), 3-method comparison

For this setup, all of the methods produced a higher RMSE. The likely reason is the change in the service-time distribution: in all 3 methods, the offered load was estimated using all the observations, without dividing them between the two distributions. Again, the empirical method provided the best estimation, but here the number of servers has an influence: the RMSE increases slightly with the abandon rate. Another difference between this setup and the other two concerns the remaining two methods: starting from 20 servers (or 65% abandonment), the Se method provided a better estimation than the theoretical method. This is consistent with the chart in Figure 44. Using the theoretical method requires an estimation of P(S > t), and when using observations taken from two different distributions we get an averaged estimator, as seen in Figure 44.

7.5 Conclusion

The results of the simulation analysis indicate that the empirical method provides the best estimation of the offered load. In all the tests performed it yielded the best estimator, with RMSE values ranging from 1.5 to 1.9 (except for the third setup with fewer than 50 servers). These RMSE values correspond to APE values of around 0.02-0.03, which again indicates a good estimation. Furthermore, the empirical estimation is quite robust to changes in the abandon rate and the number of servers. The theoretical method seems to be the most sensitive to a change in the service-time distribution, and the Se method is the most sensitive to changes in the abandon rate; but both are reasonable given a constant service-time distribution and low abandon rates. These conclusions give a different point of view on the empirical results discussed earlier. Basically, we discussed discrepancies between the empirical offered load and the theoretical estimation, which was considered the more accurate one. Now it is clear that the empirical estimation is better, so comparing it with a different estimator mostly indicates the goodness of the other estimator.


8 Average Offered Load

Until now, only instances of the offered load (or workload) were discussed. This section handles the average offered load (or workload) over an interval. A routine for this calculation was implemented in SEEStat, based on the work done for the instance-workload calculations. Eventually, the average workload was calculated by counting how many customers are in the virtual system at each moment and averaging over the interval. Since we have actually created data for the virtual system, we can extract more parameters, such as the average service time or the arrival rate (which is the same as in the original system); hence we can compare the empirical average workload with the estimation calculated using R = λ·E(S). (Basically, this estimation could have been obtained by applying the theoretical estimation to each second of the interval and averaging over the interval, but that would be similar to my former comparisons of these two methods.) Comparing these two calculation methods yielded the following charts:

[Chart: USBank Av_work_load, Retail, 10.10.2001; x-axis: Time (resolution 30 min.), y-axis: Number of cases; series: Av_work_load, Lambda*E(S)]

Figure 47 Two calculations comparison, 10/10/2001, Retail.

[Chart: USBank Av_work_load, Retail, 03.08.2001; x-axis: Time (resolution 30 min.), y-axis: Number of cases; series: Av_work_load, Lambda*E(S)]

Figure 48 Two calculations comparison, 03/08/2001, Retail.

It is easy to see that there is a good match between the two curves. If we run the same comparison for 10/10 at a finer resolution (e.g. 5 min) we get:

[Chart: USBank Av_work_load, Retail, 10.10.2001; x-axis: Time (resolution 5 min.), y-axis: Number of cases; series: Av_work_load, Lambda*E(S)]

Figure 49 Two calculations comparison, 10/10/2001, Retail, 5-min resolution.

It is easy to see that the match degrades as the resolution becomes finer. This is consistent with conclusions from research on Little's law.
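The counting-and-averaging routine described above reduces to integrating segment overlaps; a minimal sketch with toy virtual segments (not SEEStat's actual routine):

```python
def average_workload(segments, lo, hi):
    """Time-average number of customers in the virtual (infinite-server) system
    over [lo, hi): sum each virtual segment's overlap with the interval and
    divide by the interval length."""
    total = 0.0
    for start, end in segments:
        total += max(0.0, min(end, hi) - max(start, lo))
    return total / (hi - lo)

# Three toy virtual service segments, interval [0, 10):
segs = [(0.0, 4.0), (2.0, 8.0), (9.0, 12.0)]
print(average_workload(segs, 0.0, 10.0))  # (4 + 6 + 1) / 10 = 1.1
```

Note that when all segments fall entirely inside the interval, this equals (total service time)/(interval length), i.e. the estimate λ·E(S), which is consistent with the close match seen at the 30-min resolution.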


9 Future Improvements

The work presented here is preliminary, and as discussed, it contains some inaccuracies. To reduce them, several improvements are worth considering.

9.1 Abandons service time generation

Currently, service times for abandoning customers are generated using the empirical service-time distribution extracted from the data of the handled day, including service times from the whole day. As shown before (for the theoretical offered-load calculation), dividing the day into smaller parts may increase accuracy. The division presented earlier was fairly arbitrary; improving the generation process may involve more sophisticated algorithms that better divide the service times according to the relevant part of the day.

9.2 Expanding the amount of realizations considered

Basically, all the calculations presented were based on parameters extracted from the data of only one realization (i.e. one day's observations). To obtain more accurate estimators, a broader database is required. Creating such a database requires using days that are statistically similar.

9.3 Trying to fit a model to describe S(t)

As discussed earlier, assuming that the distribution of S is not constant in time improves the match between the theoretical and empirical offered load. Under this assumption a rough division of the day was made, and further improvements are discussed in Section 9.1. Taking this idea one step further shrinks the time intervals to 0; in other words, we try to fit a model describing S(t).

9.4 Using the service times of non-waiting customers

According to Reich [1], the unconditional distribution of the service time can be expressed through the service times of non-waiting customers. Practically, it can be estimated from calls that enjoy very short waiting times (e.g. less than 5 seconds).
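As a sketch of the short-wait idea (hypothetical data and threshold, not the USBank records):

```python
def uncond_service_sample(calls, eps=5.0):
    """Keep service times of customers who waited less than eps seconds;
    per Reich's result, these approximate the unconditional service-time
    distribution, free of the patience/service selection effect."""
    return sorted(s for wait, s in calls if wait < eps)

# (wait, service) pairs in seconds, made up for illustration:
calls = [(0.0, 120.0), (2.5, 95.0), (30.0, 300.0), (1.0, 150.0), (60.0, 80.0)]
print(uncond_service_sample(calls))  # → [95.0, 120.0, 150.0]
```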

Using the representation

    R(t) = ∫_0^t λ(u)·P(S > t − u) du,

one can estimate the offered-load function in terms of the arrival process and the unconditional distribution of the service time, without taking into account the relationship between patience and service time. Furthermore, the empirical method may be implemented using the unconditional empirical distribution for imputing the service times of abandoning customers. This paper was written under the assumption of independence between S and the patience; using this information allows the same comparison process without that assumption.


10 Appendix 1- Teaching Note

One way to understand the offered load is to think of it as the expected number of customers in a parallel system with an infinite number of servers. So, naturally, one way to estimate it is to simulate such a system. Now, if we assume an infinite number of servers, we practically assume that an arriving customer immediately enters service; in other words there is no queue, no one waits for service, and hence there are no abandons either. To perform such a simulation we would like to use all the information we have on the original system and apply it under the virtual system's conditions. The relevant information we possess is the arrival sequence and most of the service times (except those of abandoning customers). So practically there is a need to handle 2 issues:

1. Waiting-time elimination - Under the assumption that service time is independent of waiting time, all there is to do is to consider the arrival time as the service entry time.

2. Abandons' service-time estimation - There are many different ways to generate service times for abandoning customers.

Handling these issues for a realization of the original system yields a realization in the virtual system. Now, all there is to do is to apply the same methods used on the original data to extract the desired parameters from the virtual realization. Specifically, if we would like to calculate L(t), all we have to do is to count the number of customers in the virtual system at each moment. Let us demonstrate the concept through a setup of a simple system:

Let λ be constant in time and equal to 1/3 (i.e. 1 customer every 3 time units). For each customer the service time is constant and equals 5 time units, which we denote by S. The service is given by only 1 server. One realization of this system has the first customer arriving at t = 0. Let us present it graphically:


We will calculate R(4), R(7), R(8.5) and E(R), theoretically and empirically.

Theoretically: if we use the representation R(t) = ∫_{t−D}^{t} λ(u) du (with D = 5 the service duration, and the lower limit truncated at 0), we get the following calculation:

    R(4) = ∫_0^4 (1/3) du = 4/3,    R(7) = ∫_2^7 (1/3) du = 5/3,    R(8.5) = ∫_3.5^8.5 (1/3) du = 5/3,

and generally, for every t ≥ 5, R(t) = 5/3.

Empirically: first let us understand the virtual system. Since we have assumed there are no abandons, the only issue we need to handle is the elimination of waiting times. So basically the arrival sequence remains the same, and each customer is served for 5 time units from the moment he arrives. A graphical presentation will be:

[Diagram: Original system, D|D|1; A(t) over time 0-15, arrivals at 0, 3, 6, 9, 12, 15, each bar split into wait time and a 5-unit service time]


    The easiest way to calculate the number of customers at a certain moment is to draw a vertical line at the certain moment and to count the number of rectangles it intersects. According to this calculation we get:

L(4) = 2, L(7) = 2, L(8.5) = 1.

But we aim to calculate R(t), so the next step is to calculate E(L(t)). As mentioned before, the given example is only 1 realization of the described system. One can generate a different realization simply by setting a different time for the first arrival, hence creating a different arrival sequence. It is important, though, to understand that since the time between arrivals is constant, the set of arrival sequences is bounded (i.e. the first arrival can occur in the half-open segment [0,3)). To calculate E(L(t)) we need to average L(t) over all possible realizations. Let us calculate E(L(7)): if the first arrival takes place in the segment [0,1) then L(7) = 2; if it takes place in [1,2) then L(7) = 1; and if it takes place in [2,3) then L(7) = 2.

Hence

    R(7) = E(L(7)) = (2·1 + 1·1 + 2·1)/3 = 5/3.

Furthermore, we obtain the exact same calculation for all t ≥ 3. So, to conclude: for all t ≥ 5 the theoretical calculation yields R(t) = 5/3, and for all t ≥ 3 the empirical calculation gives the same result; hence in this scenario, for t ≥ 5, the two calculation methods are equivalent.

[Diagram: Virtual system, D|D|∞; A(t) over time 0-15, each arrival immediately starting a 5-unit service rectangle; a vertical line at t = 7 intersects 2 rectangles]


    11 Appendix 2- Implementing the Empirical Method in SEEStat

We aim to be able to calculate the offered load for every desired moment. Therefore the required type of the SEEStat variable was instances (i.e. how many customers are in the virtual system at a certain moment). To implement this model, the missing attributes were "virtual segment start" and "virtual segment end"; hence the goal was to create a new table containing these new attributes. To create the new table I used the "cust subcall" table as a basis and added some columns. The first step was to define the vectors that I intended to use:

st_int_vector ^v1 = gcnew st_int_vector ();
v1 = h["virtual_segment_start"]->cintvec ();
st_int_vector ^v2 = gcnew st_int_vector ();
v2 = h["virtual_segment_end"]->cintvec ();
st_int_vector ^v3 = gcnew st_int_vector ();
v3 = h["service"]->cintvec ();
st_int_vector ^v4 = gcnew st_int_vector ();
v4 = h["cust_subcall"]->cintvec ();
st_int_vector ^v5 = gcnew st_int_vector ();
v5 = h["virtual_wait_time"]->cintvec ();
st_int_vector ^v6 = gcnew st_int_vector ();
v6 = h["virtual_service_time"]->cintvec ();
st_int_vector ^v7 = gcnew st_int_vector ();
v7 = h["outcome"]->cintvec ();
st_int_vectors ^vec_array = gcnew st_int_vectors(17);
st_double_vector ^avg_vec = gcnew st_double_vector ();
avg_vec->assign (17, 0);
st_double_vector ^size_vec = gcnew st_double_vector ();
size_vec->assign (17, 0);

    Vectors dictionary:

    o V1- virtual segment start: holds the virtual time of entering service (or system); it is shifted according to the previous calls of the same customer (if cust_subcall is bigger than 1).

    o V2- virtual segment end: holds the virtual time of exiting service (or system); it is shifted according to the previous calls of the same customer and the waiting time in the actual segment.

o V3 - service: holds the service type.
o V4 - cust subcall: holds the counter of the current service in the current call.
o V5 - virtual wait time: accumulates the waiting time of all previous subcalls.
o V6 - virtual service time: holds the actual service time for calls that were served, and a generated service time for calls that were unhandled.
o V7 - outcome: the subcall outcome.
o Vec_array - holds the service times divided by service type.
o Size_vec - holds the size of each vector in vec_array.

The next step was to eliminate all waiting times. To do that I used the attribute "virtual wait time", which initially holds each segment's actual waiting time and eventually accumulates the waiting times of all previous subcalls, using the following code:

for (int i = 0; i < bound; i++)
{
    if (v4[i] > 1)
        v5[i] += v5[i-1];
}

The virtual waiting-time elimination was then performed using:

for (int i = 0; i < bound; i++)
{
    v2[i] -= v5[i];
    if (v4[i] > 1)
        v1[i] -= v5[i-1];
}

After the waiting-time elimination was completed, the only thing left to do was to generate service times for the abandons. Many methods were considered for this generation; I chose the inverse-function method, using the empirical distribution function of the service times. To perform the above I needed to create a sorted vector of actual service times. Of course, the generation of service times depends on the type of service given; hence the first step of the service-time generation was to divide the service-time vector by service type and to sort it (the upper bound of the "unhandled" outcome-code test was lost in the source and is marked with an ellipsis):

st_int_vector ^tmp_v6 = gcnew st_int_vector ();
for (int i = 0; i < 17; i++)
{
    tmp_v6->set(v6);
    int k = 0;
    for (int j = 0; j < bound; j++)
    {
        // drop observations of other service types and unhandled calls
        if (v3[j] != i+1 || (v7[j] > 10 && v7[j] < ...))
        {
            if (tmp_v6->size() > 0)
            {
                tmp_v6->erase(k);
            }
        }
        else
        {
            k++;
        }
    }
    tmp_v6->sort();
    vec_array[i]->set(tmp_v6);
    tmp_v6->clear();
}

The actual service-time generation was performed by generating a random uniform (uni(0,1)) variable and multiplying it by the size of the service-time vector of the appropriate service. Applying the floor function to this product gives an integer between 0 and the size of the relevant vector, so it indicates a certain cell in the vector; according to the inverse-function theorem, the number in this cell is the generated service time. The last thing needed to complete the process is to update the virtual segment end for the calls with a generated service time. This is the only attribute affected by the generation, since the virtual segment start is the same no matter what the service time is, and no further calls are affected, since we assume that if a call was unhandled there are no additional segments after the unhandled one.

srand(time(NULL));

int t;
double x;
for (int i = 0; i < bound; i++)
{
    if (v7[i] > 10 && v7[i] < ...)   // upper outcome-code bound lost in the source
    {
        t = v3[i] - 1;
        x = floor (rand() * double (vec_array[t]->size()) / double (RAND_MAX));
        v6[i] = vec_array[t][x];
        v2[i] += v6[i];
    }
}

The described process eventually created a new Access table, which was imported into SEEStat for analysis.

12 Appendix 3- Fitting a Model for S(t)

My first attempt was to fit a model using every single observation of August Wednesdays. First I tried a polynomial, which yielded (using stepwise regressions):

Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  2.441e+02   8.708e+00   28.031   < 2e-16  ***
h           -1.434e-06   1.769e-07   -8.106   5.27e-16 ***
c           -3.644e-15   5.005e-16   -7.282   3.32e-13 ***
b            1.163e-10   1.493e-11    7.786   6.97e-15 ***
d            5.336e-20   7.879e-21    6.772   1.28e-11 ***
e           -3.238e-25   5.146e-26   -6.293   3.12e-10 ***
g            5.354e-36   9.832e-37    5.446   5.16e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 256.7 on 99993 degrees of freedom
Multiple R-squared: 0.002995, Adjusted R-squared: 0.002935
F-statistic: 50.06 on 6 and 99993 DF, p-value: < 2.2e-16

Table 2 Regression parameters for the S vs. t polynomial

For simplicity I denoted t by h, t² by b, t³ by c, t⁴ by d, t⁵ by e, t⁶ by f and t⁷ by g. It is very easy to see that, despite all parameters being significant, R² is very poor. Another attempt involved fitting a polynomial in ln(t) to explain S. Using the same notation, the calculation yielded:

Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  2.286e+02   8.587e+00   26.622   < 2e-16  ***
h           -1.164e-06   1.949e-07   -5.974   2.32e-09 ***
c           -3.524e-15   6.208e-16   -5.676   1.38e-08 ***
b            1.022e-10   1.727e-11    5.919   3.25e-09 ***
d            5.956e-20   1.104e-20    5.393   6.95e-08 ***
e           -4.924e-25   9.655e-26   -5.100   3.40e-07 ***
f            1.592e-30   3.311e-31    4.807   1.54e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 255.1 on 99993 degrees of freedom
Multiple R-squared: 0.002787, Adjusted R-squared: 0.002728
F-statistic: 46.58 on 6 and 99993 DF, p-value: < 2.2e-16

Table 3 Regression parameters for the S vs. ln(t) polynomial

Again, all the parameters are significant and R² is poor. Due to the presented regression results, and the fact that at every moment S can vary over a significant range, I tested polynomial regression on the service-time averages. For the same set of days (2001 August Wednesdays), using the service-time averages yielded a better fit due to the noise reduction:

Call:
lm(formula = fmla, data = train)

Residuals:
     Min       1Q   Median       3Q      Max
 -168.73   -13.64     0.06    13.86   322.91

Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  2.399e+02   6.237e+00   38.464   < 2e-16  ***
h           -1.061e-06   1.385e-07   -7.656   8.25e-14 ***
b            6.968e-11   1.041e-11    6.694   5.21e-11 ***
c           -1.410e-15   2.451e-16   -5.752   1.44e-08 ***
e            3.506e-25   8.086e-26    4.336   1.72e-05 ***
f           -4.304e-30   1.128e-30   -3.816   0.000151 ***
g            1.621e-35   4.794e-36    3.382   0.000768 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 43.02 on 569 degrees of freedom
Multiple R-squared: 0.3008, Adjusted R-squared: 0.2934
F-statistic: 40.79 on 6 and 569 DF, p-value: < 2.2e-16
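The average-then-fit idea can be sketched as follows (synthetic stand-in data; the paper fits real USBank averages with SEEStat's own regression routine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-30-sec service-time averages over 6:00-24:00 (t in hours):
t = np.arange(6.0, 24.0, 1.0 / 120.0)
signal = 240.0 + 40.0 * np.sin((t - 6.0) / 18.0 * 2.0 * np.pi)
avg_s = signal + rng.normal(0.0, 20.0, size=t.size)

# Degree-7 polynomial in t, as in the appendix (t rescaled to hours here to
# keep the least-squares problem well conditioned):
coefs = np.polyfit(t, avg_s, deg=7)
fitted = np.polyval(coefs, t)

ss_res = float(np.sum((avg_s - fitted) ** 2))
ss_tot = float(np.sum((avg_s - avg_s.mean()) ** 2))
print("R^2 =", round(1.0 - ss_res / ss_tot, 3))
```

Averaging first removes the huge call-to-call variance, which is exactly why R² jumps from roughly 0.003 on raw observations to a sizeable value on the averages.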

Due to the fact that calls between 00:00 and 06:00 are sporadic, the data for this part of the day is very noisy; hence the next step was to fit a polynomial model to the hours 6:00 to 24:00. This attempt yielded:

lm(formula = fmla, data = train)

Residuals:
      Min        1Q    Median        3Q       Max
 -148.396   -10.404     0.361     9.596    92.721

Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept) -2.022e+04   6.060e+03   -3.336   0.000925 ***
t            3.719e+00   1.097e+00    3.391   0.000763 ***
h           -2.894e-04   8.411e-05   -3.441   0.000637 ***
b            1.252e-08   3.572e-09    3.505   0.000506 ***
c           -3.286e-13   9.202e-14   -3.570   0.000397 ***
d            5.358e-18   1.475e-18    3.632   0.000315 ***
e           -5.309e-23   1.440e-23   -3.688   0.000256 ***
f            2.925e-28   7.832e-29    3.735   0.000213 ***
g           -6.878e-34   1.822e-34   -3.776   0.000182 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.23 on 423 degrees of freedom
Multiple R-squared: 0.5447, Adjusted R-squared: 0.5361
F-statistic: 63.26 on 8 and 423 DF, p-value: < 2.2e-16

It is very easy to see that this refinement improved the fit of the model. To perform the regression I used a function already in use in SEEStat, so only some adjustments were needed. At first I used the values received from the regression; in other words, for every abandon I set the average service time appropriate to its segment. I compared the new calculation to the previous one and to the theoretical offered load. This yielded the following charts:

13 Bibliography

1. Reich, M., "The Offered-Load Process: Modeling, Inference and Applications", Ph.D. Thesis, Technion, 2011.
