statistical and economic evaluation of time series models ... · statistical and economic...
TRANSCRIPT
Statistical and Economic Evaluation of Time Series
Models for Forecasting Arrivals at Call Centers
Andrea Bastianin
University of Milan, Italy
Marzio Galeotti
University of Milan
IEFE Bocconi, Italy
Matteo Manera
University of Milan-Bicocca, Italy
Fondazione Eni Enrico Mattei, Milan
April, 2018
Abstract: Call centers’ managers are interested in obtaining accurate point and distributional forecasts of
call arrivals in order to achieve an optimal balance between service quality and operating costs. We present a
strategy for selecting forecast models of call arrivals which is based on three pillars: (i) flexibility of the loss
function; (ii) statistical evaluation of forecast accuracy; (iii) economic evaluation of forecast performance
using money metrics. We implement fourteen time series models and seven forecast combination schemes on
three series of daily call arrivals. Although we focus mainly on point forecasts, we also analyze density fore-
cast evaluation. We show that second moments modeling is important both for point and density forecasting
and that the simple Seasonal Random Walk model is always outperformed by more general specifications.
Our results suggest that call center managers should invest in the use of forecast models which describe both
first and second moments of call arrivals.
Key Words: ARIMA; Call center arrivals; Loss function; Seasonality; Telecommunications forecasting.
JEL Codes: C22, C25, C53, D81, M15.
arX
iv:1
804.
0831
5v1
[ec
on.E
M]
23
Apr
201
8
1 Introduction
Hiring, staffing and scheduling are strategic decisions for the management of call centers,
which represent a highly labor-intensive and large services industry, where human resources
costs account for 60-70% of the operating budget (Gans et al., 2003). Point and density
forecasts of call arrivals are a key input for choices relating to the acquisition and deployment
of human resources, therefore accurate forecasts ultimately determine the ability of managers
to achieve an optimal balance between service quality and operating costs (Aksin et al., 2007).
We present a novel strategy to select time series models of daily call arrivals that is
based on three pillars: (i) flexibility of the loss function; (ii) statistical evaluation of fore-
cast accuracy; (iii) economic evaluation of forecast performance using money metrics. We
estimate fourteen time series models - including the Seasonal Random Walk (SRW) as a
benchmark - that capture different key features of daily call arrivals Data of this sort are
typically characterized by the presence of intra-weekly and intra-yearly seasonality, inter-day
dependency (i.e. non-zero auto-correlation), over-dispersion (i.e. the variance of the arrival
count per time period is larger than its expected value) and conditional heteroskedasticity.
These features are shared by the three series we analyze in this paper, namely daily arrivals
at call centers operated by an Italian electric utility and by two retail banks, one in the U.S.
and the other in Israel.
The use of a flexible loss function and the implementation of statistical tests to rank and
select forecasting models represent the first novelty the paper. In fact, in this strand of the
literature, most studies only provide model rankings based on symmetric loss functions or
informal forecast comparisons (see Ibrahim et al., 2016 for a survey). Since over–forecasting
leads to over–staffing, hence to unnecessarily high operating costs, while under–forecasting
results in under–staffing and in low service quality, the choice of the metric used to evaluate
competing time series models depends on the preferences of the call center management,
that are not necessarily well captured by a symmetric loss function. We rely on the loss
function put forth by Elliott et al. (2005), that nests both symmetric and asymmetric loss
functions as special cases and describes a wide range of call center managers’ preferences.
The second novelty of this study is the translation of statistical measures of forecasting
1
performance into money metrics. Money measures of performance are intimately related to
the profit maximizing behavior of economic agents and should be considered by call center
managers as more intuitive evaluation instruments to complement the information provided
by loss functions and statistical tests (Leitch and Tanner, 1991).
Moreover, since it is well documented that combined forecasts often outperform forecasts
generated by individual models (Timmermann, 2006), the third novelty of the paper is to
implement seven forecast combination methods applied to five sets of models. Overall, we
produce a total of 47 alternative forecasts.
We show that second moment modeling using Generalized Autoregressive Conditional
Heteroskedasticity (GARCH) models is the preferred approach when forecasting daily call
arrivals, which suggests that volatility modeling is useful. This result is robust, since it holds
true not only for the call center managed by the Italian utility, but also for the two retail
banks’ arrival series. The simple SRW model is always outperformed by other, more general,
easily implementable specifications, both for point and density forecasts. From the point
of view of a call center manager, our findings imply that it is worth investing in the use of
forecasting models which describe both the first and second moments of call arrivals.
Two papers are closely related to our work. The economic evaluation of the forecast
accuracy we propose is similar in spirit to Shen and Huang (2008), although with some
important distinctions. Our accuracy metric is the money a manager can earn, while their is
the staffing level; moreover, we design a compensation scheme that incorporates asymmetric
preferences, whereas they rely on a symmetric loss function; lastly, the forecast horizon is
different. The second paper which is related to our work is Taylor (2008), since, to the best
of our knowledge, this is the only contribution that evaluates a large number of time series
models and forecast combination schemes. Compared with Taylor (2008), we do not deal
with intra–day forecasts, but we evaluate the performance of models more thoroughly.
More generally, our paper belongs to the literature dealing with forecasts of daily call
volumes (Andrews and Cunningham, 1995; Antipov and Meade, 2002; Bianchi et al., 1998;
Mabert, 1985), but is also related with more recent studies focusing on density and intra–day
predictions (Kim et al., 2012; Taylor, 2008, 2012; Tych et al., 2002; Weinberg et al., 2007).1
1There are at least two ways in which our results can be used to derive intra–daily forecasts. First, daily
2
The plan of the paper is as follows. Section 2 describes data, empirical methods and our
approach to the economic evaluation of forecasts. Empirical results are discussed in Section
3. Extensions and robustness checks are presented in Section 4, while Section 5 concludes.
2 Data and methods
2.1 Data
We analyze three series of daily call arrivals received by call centers in different industries
and countries. The main results of the paper are obtained from the arrival series of a call
center managed by an anonymous Italian energy utility. Robustness checks in Section 4
extend the analysis using call arrival series at call centers operated by two anonymous retail
banks, one located in Israel, the other in the U.S.. The number of observations is 749 for
Italy, 361 for Israel and 893 for the U.S..2 To save space, descriptive statistics, plots and
statistical tests for the three series appear in an Appendix that is available from the authors
upon request.
Being daily totals, the three series exhibit large count values, but with very different
sample averages: the average daily call volume is approximately 1000, 31000, 45000 for the
Israeli, the Italian and the U.S. call centers. Call arrivals for the Italian utility are equal to
zero in correspondence of closing days associated with public holidays. These days are known
in advance and are kept in the estimation sample by substituting the zeros with the number
of calls recorded during the previous week. Forecasts for these days are subsequently set to
zero. The rest of the series values are always strictly positive. The analysis of the coefficient
of variation of the three series, calculated on their levels and on their log–transform, shows
arrivals can be thought as part of a top–down approach (Gans et al., 2003), where forecasts of daily call
arrivals are translated into hourly density forecasts by some simple methods, as illustrated in Channouf et al.
(2007). Second, we can assume that the call center manager designs intra–day staffing schedules using a
judgmental process based on his/her forecasts of daily totals.2We thank Avi Mandelbaum for providing access to the retail banks data through the Technion Service
Enterprise Engineering (SSE) Laboratory. See http://ie.technion.ac.il/Labs/Serveng for a detailed
description of these series. The call arrivals for the Italian utility have been collected by the authors, and
neither the name of company, nor the data can be disclosed.
3
that in all cases the logarithm seems to stabilize the variability of each series. Therefore all
models, except count data specifications, will rely on log–transformed data.
Count data specifications include the Poisson and the Negative Binomial models. The
Poisson model implies that the variance of call arrival volumes in each period is equal to
its expected value during the same time frame. This property — known as equi–dispersion
— is often not consistent with the features of call arrival data, which are likely to be over–
dispersed, with variance larger than the mean. The Negative Binomial distribution includes
the Poisson as a special case and allows for both under– and over–dispersion. A test of the
null hypothesis of equi–dispersion, implemented along the lines put forth by Cameron and
Trivedi (1990), provides evidence against the mean–variance equality. Moreover, estimates
of the Negative Binomial model suggest that the three series are over-dispersed.3
The call center operated by the Italian energy utility, whose employees mainly help cus-
tomers with invoicing problems, operates fourteen hours per day. A plot of this series appears
in Figure 1 and suggests the presence of both daily and monthly seasonality. Specifically,
the number of incoming calls decreases steadily from Monday to Sunday. Moreover, given
the nature of the service provided by the Italian company, the intensity of calls varies with
the season of the year. Lastly, even controlling for seasonality and autoregressive dynamics,
the LM test for ARCH effects rejects the null hypothesis of homoskedasticity.
The call arrival series for the two banks also display day-of-the-week seasonality, but
month–of–the–year seasonal factors seem less evident. Moreover, only the U.S. series exhibits
conditional heteroskedasticity. On the contrary, for Israely call arrivals we cannot reject the
null of homoskedasticity.4
3Details are reported in the Appendix4We have also implemented a test of seasonality in the variance. Since the null hypothesis of no seasonality
in the variance can be rejected for the U.S. series only, we have decided not to incorporate this type of
seasonality in the forecasting models presented in our paper.
4
Figure 1: Daily call arrivals for the Italian call center
08/09/2008 2009 2010 26/09/20100
25000
50000
75000
100000
num
ber
of c
alls
CallsAverage = 31258Closing days
Notes: the figure shows the daily call arrival series (continuous line), its sample average (horizontal dashed line) and closingdays (circles).
2.2 Forecasting models and methods
Given the nature of our series, all models have been chosen to capture different key features of
the data, such as autocorrelation, seasonality, over–dispersion and conditional heteroskedas-
ticity. The empirical specifications implemented in our analysis are shown in Table 1 and
can be divided into three families.
The first family includes the SRW model, which is used as a benchmark, and a variety
of time series models incorporating day–of–the–week dummies as exogenous variables. We
focus on well established specifications, such as the Box–Jenkins Airline model, ARMAX
and SARMAX models with and without GARCH effects, the Periodic Autoregressive (PAR)
model and multiplicative Holt–Winters exponential smoothing. Although we are aware that
the SRW model is simplistic, nonetheless we refer to this model as a benchmark, since it is
widely used not only in this strand of the literature (see e.g. Taylor, 2008, 2012), but also in
practical situations (see Mehrotra and Fama, 2003, p. 135).
The second family is formed by simple dynamic models for count data, which we think
are reasonable specifications in the present context. In fact, the total number of calls arriving
5
at call centers in a given time period is a count, which is usually modeled as a Poisson arrival
process (Gans et al., 2003). The underlying assumption is that there is large population of
potential customers, each of whom makes calls independently with a very low probability;
that is, the total number of incoming calls in a given time period is approximately Pois-
son (Ibrahim et al., 2016). The Poisson modeling approach posits that the arrival rate
is a deterministic function of the regressors. This implies that the variance of the arrival
count per time period is equal to its expected value. Since in our case there is evidence
of over–dispersion, we can deal with this feature of the data by assuming the arrival rate
to be stochastic. We consider a Poisson distribution with Gamma–distributed arrival rate,
which implies that the call arrival series follows a Negative Binomial distribution. Another
advantage of this model is that it can also be used to produce density forecasts of the arrival
rate. Following Jung and Tremayne (2011), who have shown that there is not a dominating
modeling approach when forecasting with count data, we consider three specifications based
on the Exponential, Poisson and Negative Binomial distributions.
We add to the first two families of standard models a third family of seasonal autore-
gressive specifications that have not been previously used to predict incoming calls (see
Taylor, 2008, and references therein). In particular, a linear model with smoothly changing
deterministic seasonality (i.e. the Time Varying Dummy AR, TVD–AR) is implemented to
evaluate whether taking into account changes in the deterministic seasonal pattern of the se-
ries can improve the forecasting performance of daily call arrivals. As pointed out by Franses
and van Dijk (2005), given that changes in technology, institutions, habits and tastes usually
occur gradually, shifts of the seasonal pattern can be described with a smooth function (i.e. a
logistic function) of the day–of–the–week dummy variables. The Multiplicative Error Model
(MEM) — designed for dealing with non-negative variables such as volatility, volume and
duration — has been introduced by Engle (2002) and shares many features with GARCH
models, as well as with count data models and SARMAX specifications. Lastly, we fore-
cast the series of incoming calls using a two–step modeling strategy. First, we remove from
the series the changes in its sample average with a natural cubic spline function. Second,
we apply a SARMAX model on the adjusted series, in order to capture the slowly–moving
changes in the sample average of the series due to low–frequency seasonal factors.
6
Table 1: Summary of modelsDependent
id Name Variable Explanatory Variables
M0 Seasonal Random Walk Yt Yt−7
M1 ARMAX yt AR(1), AR(7), AR(8), MA(1), Dt
M2 ARMAX-GARCH(1,1) yt AR(1), AR(7), AR(8), MA(1), Dt
M3 TVD-AR yt Dt
M4 SARMAX yt AR(1), SAR(7), MA(1), SMA(28), Dt
M5 SARMAX-GARCH(1,1) yt AR(1), SAR(7), MA(1), SMA(28), Dt
M6 PAR(2) yt yt−1, yt−2,Dt
M7 Airline ∆×∆7yt MA(1), SMA(8)
M8 Poisson Yt Yt−1,Dt
M9 NegBin Yt Yt−1,Dt
M10 Exponential Yt Yt−1,Dt
M11 MEM yt/ySR,t yt−1/ySR,t−1,Dt
M12 Spline-SARX yt/yLR,t AR(1), SAR(7), Dt
M13 Holt-Winters yt Multiplicative
Notes: Yt is the number of incoming calls; yt ≡ log Yt; Dt is a vector of dummies, one for each day of the week; ∆k = (1−Lk)where L is the lag operator; ySR,t denotes fitted values from the regression of yt on the vector of dummies; yLR,t denotes fittedvalues from the interpolation of yt with a natural cubic spline, with the number of knots that equals the number of months inthe sample; ARMA and seasonal ARMA terms are denoted as AR(.), MA(.), SAR(.) and SMA(.), where the number in bracketsrepresents their order; “multiplicative” indicates that forecasts from M13 are obtained with the multiplicative Holt-Wintersexponential smoothing, see Gardner (2006) for details.
Model estimation and forecasting is carried out recursively, that is the estimation sample
expands by including a new observation at each iteration. The first iteration relies on an
estimation sample of R = 371 days. The forecast horizon, h, ranges from one day to
one month (28 days). The recursive scheme implies that the number of predictions, Ph,
h = 1, . . . , 28, varies from P1 = 378 to P28 = 351.
For each model, the selection of the optimal specification is performed only once, using
the sample of data pertaining to the first iteration of the estimation–forecasting scheme.
We have selected most of the specifications in Table 1 using both the Schwarz Information
Criterion (SIC) to choose the optimal number of lags (or, for some models, SARMA terms)
and a stepwise regression approach.5 Models are also subject to non-rejecting at 5% critical
level the null hypothesis of absence of error autocorrelation tested with a Lagrange Multiplier
test against the alternative hypotheses of first-to-eighth order error autocorrelation. For
ARMA(p,q) models we set pmax, qmax = 28; for seasonal AR(k) and MA(l) terms, we tried
5The stepwise approach starts with a model including all the explanatory variables selected with the SIC:
Then it is repeatedly applied until all variables included in the final specification are significant at the 5%
critical level.
7
kmax, lmax = 7, 14, 21, 28.
Some models include a GARCH component because squared residuals from ARMAX
and SARMAX specifications display some neglected dynamics. Moreover, the inclusion of a
GARCH equation can help us to shed light on the usefulness of modeling second moments
for forecasting call arrivals. As for count data models, one lag of the dependent variable is
proved to be enough to remove most of the autocorrelation in the residuals. A parsimonious
specification of the dynamics involved in our count data models allows us to focus the forecast
comparison on the role played by the distributional assumptions.
2.3 Forecast combination methods
Since combined forecasts are often found to outperform individual models (see Timmermann,
2006), we implement several combination schemes to predict call arrivals.
As shown in the upper panel of Table 2, our fourteen models are collected into five
groups, which do not include the SRW benchmark model. The first group, G1, is the largest,
for it excludes the Holt-Winters exponential smoothing only. The second group of models,
G2, includes two ARMAX specifications and the TVD-AR model. Group G3 differs from G4
in that the latter excludes the PAR model from the set containing ARMAX, TVD-AR and
SARMAX models. The last group, G5, is composed by time series models for count data.
We combine forecasts from these five groups of models with average, trimmed average,
median, minimum, maximum and Approximate Bayesian Model Averaging (ABMA) com-
bining schemes (see the lower panel of Table 2). All these methods have a feature in common:
they do not require a set of out–of–sample observations, hence they can be used in real–time
by the forecast user.
Stock and Watson (2004) have shown that simple combining methods such as the aver-
age, the trimmed average and the median work well in macroeconomic forecasting. If com-
pared to the simple average method, both the median forecast approach and the trimmed
average combination method (that excludes the highest and the lowest forecasts) reduce the
impact of individual outlying forecasts. The maximum and minimum combination methods
are used to represent two opposite situations, that is a manager who is either adverse to
8
under–staffing, or a manager trying to minimize labor costs and who is not subject to any
kind of service level agreement.
Table 2: Groups of models and combining methods
id Models
G1 Mi for i = 1, ..., 12
G2 M1,M2,M3
G3 M1,M2,M3,M4,M5,M6
G4 M1,M2,M3,M4,M5
G5 M8,M9,M10
Method Description
Average (c1) fc1Gi,t = 1MGi
∑MGim=1 fm,t
Trimmed Average (c2) fc2Gi,t = 1MGi
−2
∑(MGi
−2)
m=1 fm,t
Median (c3) fc3Gi,t = median(fGi,t
)
Min (c4) fc4Gi,t = max(fGi,t
)
Max (c5) fc5Gi,t = min(fGi,t
)
ABMA-SIC (c6) fc6Gi,t =∑MGim=1 w
cjm,tfm,t
ABMA-AIC (c7) fc7Gi,t =∑MGim=1 w
cjm,tfm,t
Notes: models are described in Table 1. fcjGi,t denotes the forecast at time t obtained with combining method j on Gi, for
j = 1, . . . , 7 and i = 1, . . . , 5; fm,t is the forecast at time t from model m, for m = 1, . . . ,MGi , where MGi is the number ofmodels in the i-th group; fGi,t is a
(MGi × 1
)vector of forecasts from models in Gi; Approximate Bayesian Model Averaging
(ABMA) uses weights, wcjm,t =
exp {ζm,t}∑MGi
m=1 exp {ζm,t}, where ζm,t = ICm,t −max
(ICGi,t
), for m = 1, ...,MGi , j = 6, 7, i = 2, ..., 5
and IC = SIC, AIC; the (MGi × 1) vector, ICGi,t, contains the IC of models in the i-th group; combining method c6 is basedon the SIC, while c7 uses the AIC; both exclude G1 from ABMA.
ABMA, successfully applied to macroeconomic forecasting by Garratt et al. (2003), uses
the SIC and the Akaike Information Criterion (AIC) to approximate the posterior probability
of individual models. ABMA is applied only to models whose dependent variable is expressed
with a common unit of measure, i.e. to all groups of models, G1 excluded.
2.4 Statistical measures of forecast accuracy
The need for a loss function that can be either symmetric or asymmetric arises in many
economic and management problems. A flexible loss functions serves two purposes. First,
it allows to assign a different cost to positive and negative forecast errors. Second, it helps
the call center manager and the professional forecaster to decide the shape of the loss and
to use the same metric of predictive performance. The statistical evaluation of forecasts is
based on the loss function put forth by Komunjer and Owyang (2012) that have proposed a
multivariate generalization of the loss function due to Elliott et al. (2005, 2008). The strength
9
of these metrics is to encompass a variety of symmetric and asymmetric loss functions that
are often used in empirical applications.
Let fi,t+h|t be the h–step ahead forecast of Yt+h issued with model i (Mi), where h is the
forecast horizon. The corresponding forecast error is ui,t+h|t = Yt+h − fi,t+h|t. From now on
we drop the model subscript i for ease of notation. Let us consider a set of (P × 1) vectors
of forecasts errors, one for each forecast horizon: u(h) = [u(h)1, . . . , u(h)P ]′ for h = 1, . . . , H,
where H is the maximum forecast horizon (i.e. 28 days in our case) and P is the size of the
evaluation sample. These vectors form the rows of a (H × P ) matrix: u = [u(1) . . .u(H)]′,
where up for p = 1, . . . , P indicates its p-th column.
With this notation, we can write the multivariate loss function of Komunjer and Owyang
(2012) as:
Lp (up; ρ, τ ) =(‖up‖ρ + τ ′u
)‖up‖ρ−1
ρ (1)
where ‖up‖ρ ≡(∑H
h=1
∣∣u(h)p
∣∣ρ) 1ρ
and −1 ≤ τ ≤ 1. The shape of the function is determined
by ρ > 0 and τ ; we assume that all the elements of τ are equal to τ , so that the degree
of asymmetry is the same at all forecast horizons h. The loss is symmetric and additively
separable for τ = 0 and includes some special cases. When τ = 0 and ρ = 2, we obtain the
trace of the Mean Squared Error (MSE) loss function, while for τ = 0 and ρ = 1 equation
(1) reduces to the trace of the Mean Absolute Error (MAE) loss function. In both cases,
symmetry ensures the multivariate loss to be additively separable in univariate losses. On
the contrary, when τ 6= 0 the multivariate loss function does not correspond to the sum of
individual losses.
We also evaluate the forecasts separately at different horizons, so that H = 1 and
u = u′(1) = [u(1)1, . . . , u(1)P ]. In this case we can simplify the notation further and write
u = [u1, . . . , uP ]. Equation (1) reduces to the univariate flexible loss of Elliott et al. (2005):
Lp (up; ρ, τ) = 2 [φ+ (1− 2φ)× I(up < 0)]× |up|ρ (2)
where I(.) is the indicator function and τ = 2φ − 1. This loss function is asymmetric for
φ 6= 0.5 (τ 6= 0). Over–forecasting (negative forecast errors) is costlier than under–forecasting
10
for φ < 0.5 (τ < 0). On the contrary, when φ > 0.5 (τ > 0), positive forecast errors
(under-prediction) are more heavily weighed than negative forecast errors (over-prediction).
Equation (2) nests several special cases. For ρ = 1, the lin–lin loss is obtained. Moreover,
when φ = 0.5 (τ = 0), the function is symmetric and boils down to the MAE. For ρ = 2 we
get the quad-quad loss, that coincides with the MSE loss for φ = 0.5 (τ = 0).6
For each model and combination method, we produce a total of 28 forecast error series,
one for each forecast horizon, h. Given that presenting detailed results for each h is not a
viable option, we rank models according their overall performance using the multivariate loss
in equation (1), while we rely on equation (2) to evaluate forecast performance at horizon
h = 1, 7, 28. Hereafter we fix ρ = 2. In this case, with a common τ for all h = 1, . . . , H, non-
negativeness of Lp requires |τ | < 1√H
. Thus, for H = 28, we need |τ | < 0.18 (since φ = 0.5×
(τ + 1), which implies φ ∈ (0.41, 0.59)). For this reason, we vary the asymmetry parameter
across the following set of values: φ = {0.42, 0.50, 0.58} (i.e. τ = {0.16, 0.00,−0.16}). When
φ = 0.5, the ranking is equivalent to the MSE ranking in the univariate case and coincides
with the trace of MSE ranking in the multivariate case. When φ = 0.42, over–forecasting is
costlier than under–forecasting, and vice versa for φ = 0.58.7
2.5 Monetary measures of forecast accuracy
For the economic evaluation of forecasts, the magnitude of the asymmetry is not relevant.
In other words, it would not be realistic to state that a practitioner has a clear opinion on
the parameters of the loss function. On the contrary, we believe that it is crucial to take a
stance on the direction of the asymmetry. An asymmetric loss function might be required
when firms sign a contract with an external organization that manages a call center. These
contracts often involve some sort of Service Level Agreement (SLA), which defines quality
standards that the outsourced call centers should meet. For instance, the widely applied
6Optimal forecasts from the lin–lin and quad-quad loss are conditional quantiles and expectiles, respec-
tively (Newey and Powell, 1987; Koenker and Bassett, 1978; Gneiting, 2011). See Bastianin et al. (2014) for
the use of expectiles in a forecast horse–race.7Additional details about flexible loss functions and their graphical representations appear in the Ap-
pendix.
11
80/20 SLA implies that eighty percent of the incoming calls must be answered within twenty
seconds (Stolletz, 2003). SLAs not only set quality standards, but also dictate penalties for
failing to meet contractual terms (Milner and Olsen, 2008). The presence of penalty fees
might render a reasonable degree of over–forecasting less costly than the same amount of
under–prediction, possibly because workers becoming free at short notice might be assigned
to meetings and training (Taylor, 2008). Anecdotal evidence suggests that managers facing
a SLA do prefer over–forecasting in order to minimize their chances of incurring a penalty.8
Assumptions. We assume that the call center operates under a 80/20 SLA and is subject
to a fee when the waiting time exceeds a given threshold. Therefore, a reasonable degree of
over–staffing is less costly than the same amount of under–staffing. Moreover, we assume
that at day t the manager uses his/her forecast of inbound calls for day t+1 in an algorithm
determining the number of agents (nt) needed to comply with the company’s SLA.
The algorithm used to staff the call center is the Erlang–C queuing model. We assume
that the average call duration is three minutes and that the call center is open fourteen
hours a day. Notwithstanding its well–known limitations, the Erlang–C model is widely
used in practice, possibly because of its simplicity (Aksin et al., 2007; Gans et al., 2003).
Moreover, as shown by Jongbloed and Koole (2001), even when the assumptions underlying
the Erlang–C algorithm are not valid (e.g. in the presence of over-dispersion in the data), it
8Roubos et al. (2012) observe that “A higher than necessary service level is generally not a problem,
but managers might be penalized for failing to meet the target in too many periods. To this end some
managers deliberately opt for a higher expected service level or a lower target time in order to meet the
original target with higher likelihood”. Similarly, Thomas (2005) reports that “(. . .), some managers facing
contractual service–level commitments (. . .) “overshoot” to improve their chances of success. That is, if they
need to exceed a 95% fill rate to avoid penalty, they might set the stock level based on a 97% long-run fill
rate.” However, as emphasized by Tran and Davis (2011): “Some scholars conclude that moderate under–
staffing has positive effects on outcomes, while others prove that slight over–staffing performs better. They all
agree that, however, both great over–staffing and extreme under–staffing conditions have negative effects on
organization performance.” These comments also support our choices of restricting the degree of asymmetry
of the loss function within a relatively short interval not only to ensure that it is non-negative, but also to
avoid that the loss function could represent too extreme preferences toward under– or over–staffing. In fact,
empirical estimates indicate that the degree of asymmetry of economic agents is not very large (Elliott et al.,
2005).
12
can still be used to obtain reliable interval estimates of the optimal service level.
Table 3: Multinomial payoff schemelower bound upper bound bonus / penalty
k (LBk) (UBk) (Euro)
1 0.00 0.80 -10
2 0.80 0.90 -5
3 0.90 0.95 -2.5
4 0.95 1.05 10
5 1.05 1.10 -1.25
6 1.10 1.20 -2.5
7 1.20 ∞ -10
Notes: the multinomial compensation scheme implies that at time t the manager gets a bonus bt = 10 Euro if n∗t × LB4 ≤nt < n∗t × UB4.
Let the manager’s daily payoff, Wt, be the sum of a fixed F and a variable part, vt, that
is Wt = F + vt. For the fixed part of the payoff, we impose that the call center manager
earns on average 1200 Euro for 28 working days, that is F = d1200/28e = 43 Euro per day.
The variable part of the payoff, vt, depends on the manager’s ability to staff the call center,
dt, that is vt ≡ vt (dt). dt is evaluated ex-post and it is defined as a function of the distance
between his decision, nt, and the optimal number of agents n∗t , where n∗t is calculated using
the realized number of incoming calls as input to the Erlang–C model.
The company relies on the compensation scheme displayed in Table 3. We have designed
compensations so as to penalize under–staffing more heavily than over-staffing, assuming that
the company’s objective is to maximize customer satisfaction. Moreover, we assume that
the company’s compensation policy implies symmetry of over- and under–staffing if forecast
errors of both signs exceed a given threshold. At the end of the forecasting sample, whose
length is P , the manager’s payoff will be: π = P × F +∑P
t=1 vt.
Following Dorfman and McIntosh (1997), we assume that the manager has a negative
exponential utility function U(π) = 1−exp (−λπ), where λ represents the manager’s absolute
risk aversion coefficient. Notice that, for the negative exponential utility function, λ−1
describes the willingness to lose. Depending on dt, the manager can either get a bonus
(bt), or be subject to a maximum penalty (pt) of 10 Euro. Given that each day he/she
can lose 20 Euro at most, λ is varied according to the formula λ = [j × P × (bt + pt)]−1,
where j = 0.1, 0.5, 0.7 denotes a percentage of the variable part of the payoff. This implies
that the willingness to lose, λ−1, can take on the following values, expressed in Euros:
13
λ−1 = {732, 3660, 5124}. If the manager could always get the bonus, the total payoff would
be P × (F + 10) = 18603 Euro, where F = 43 Euro and P = 351 days. Therefore, the values
of the willingness to lose are equivalent to 0.4%, 19.7% and 27.5% of the total payoff.
The end–of–period expected utility is EU(π) = 1 − Mπ(−λ), where Mπ (−λ) is the
Moment Generating Function (MGF) (see Collender and Chalfant, 1986; Elbasha, 2005;
Gbur and Collins, 1989). This result and our compensation scheme allow to calculate the
expected utility using Maximum Likelihood estimates of the multinomial MGF.9
The economic value of information. The economic value of information of a set of forecasts
can only be determined with reference to an alternative set of forecasts. Following Dorfman
and McIntosh (1997), we define the value of perfect information as the value of a model that
generates perfect forecasts, that is nt = n∗t , ∀t.
If the manager could purchase this model, he/she would face no risk and the payoff
distribution would be a single point at π∗ = max (π). The lack of risk (i.e. var(π∗) = 0)
implies that the value of perfect information, V ∗, is simply the payoff obtainable from the
perfect forecast, in other words V ∗ = π∗. Given that a forecast can be “consumed” only in
discrete quantities, the expected marginal utility of the forecast, MU , equals its expected
utility. Since in equilibrium the price ratio of two goods is equal to their marginal rate of
substitution, the value of the i − th forecasting model, Mi, is the solution of: Vi/V∗ =
MUi/MU∗. Solving for Vi, using MUi = EUi and V ∗ = π∗, yields (see Dorfman and
McIntosh, 1997):
Vi =π∗EUiEU(π∗)
(3)
Equation (3) can be used to define the incremental value of information of the forecast model
Mi with respect to model M0 as:
∆Vi ≡ Vi − V0 (4)
Therefore, Mi �M0, if ∆Vi > 0 or, equivalently, if Vi > V0.
9The MGF of a multinomially distributed random variable is equal to (∑r
k=1 pketk)
P. An estimate of
the probability pk can be calculated as: pk =∑H
t=1 I(nt ∈ CIk)/H, where CIk for k = 1, ..., 7, denotes the
naive confidence interval in Table 3.
14
The certainty equivalent. An alternative money metric of forecast accuracy of model Mi is
the certainty equivalent (CE), which is defined as the value π ≡ CEi that solves U(π) =
EU(πi):
CEi = −1
λlog [1− EU (πi)] (5)
We can state that Mi � Mj if CEi > CEj. The CE can also be used to determine the
maximum amount of money the manager is willing to pay in order to switch from modelMi
to model Mj.
We assume that the manager can choose between using the naive SRW forecast (M0) for
free, or buying modelMi from an expert. Moreover, let us assume that buying the forecast
model Mi costs δi, where δi represents a fraction of the payoff the manager would get from
the naive forecast, that is δi ≡ θπ0 with 0 < θ < 1. The fraction of the payoff deriving from
the naive model that the manager is willing to pay to use forecast modelMi can be written
as:
δi = CEi − CE0 (6)
or, equivalently, as: θ = (CEi − CE0)/π0.
Linking monetary and statistical measures of forecast accuracy. Given the assumptions il-
lustrated in the previous sections, when the statistical loss function is parametrized so as
to penalize under–prediction more heavily than over–prediction (i.e. ρ = 2, φ = 0.58), the
economic evaluation of forecasts in economic terms becomes a useful alternative to the eval-
uation based on standard statistical metrics. On this respect, Dorfman and McIntosh (1997)
and Leitch and Tanner (1991) have shown that money metrics of forecasting performance,
such as the value of information and the CE, are more closely related to the profitability of
forecasts than the traditional summary statistics based on loss functions.
3 Empirical results
Our main results can be summarized as follows:
• The benchmark SRW model is outperformed by relatively more general, easily imple-
15
mentable, methods.
• Modeling second moments is useful, as the addition of a GARCH component to AR-
MAX and SARMAX models improves their performance. Moreover, the SARMAX–
GARCH model is among the best performing specifications.
• Combining forecasts from different models often outperforms the forecasts obtained
from single models.
• ABMA is the preferred forecast combination method.
• The economic and statistical evaluation of models and combining methods deliver
consistent results. The best individual forecasts are those involving second–moment
modeling, while the best combining method is ABMA, based on a group of models
that includes models with a GARCH component.
• The degree of complexity of the ABMA forecasting method is higher than producing
forecasts with a single model. Then, when over-forecasting is less heavily penalized
than under–forecasting, the economic evaluation approach identifies the SARMAX–
GARCH model as the best option.
3.1 Statistical measures
Model rankings using univariate and multivariate flexible losses are presented in Table 4.
The univariate loss rankings for forecast horizons at one day (h = 1), one week (h = 7) and
one month (h = 28) are shown in columns 2-9, while rankings based on the multivariate
loss for h = 1, 2, . . . , 28 are presented in the last three columns. The analysis of forecast
accuracy over different horizons is fundamental to assist the management of call centers. In
fact, while forecasts at monthly horizon are needed for hiring new agents, forecasts at weekly
and daily horizon are used for the scheduling of the available pool of agents (Aksin et al.,
2007).
Focusing on single models, for both univariate and multivariate losses the SARMAX–
GARCH is always among the best–performing specifications. Looking at the symmetric
16
multivariate loss in column 10 of Table 4, we see that this result holds also when including
the combined forecasts in the ranking. As for the combining methods, the ABMA based
on the AIC seems to be the best available option in the majority of cases. Notice that the
best ABMA combinations are those based on groups of models that include the SARMAX
forecasts with and without GARCH equation and the ARMAX-GARCH forecasts, namely
those individual forecasts associated with some of the lowest individual and system losses.
When analyzing asymmetric losses, the ranking of models changes according to the
incidence of over– and under–forecasting. Nevertheless, we can confirm most of the results
just highlighted for the symmetric case. Interestingly, in the multivariate case, the MEM
becomes the best option when under–forecasting is more penalized than over–forecasting
(φ = 0.58).
Focusing on combination schemes, “minimum forecasts” based on the groups of models
G3 and G4 yield the lowest average losses when φ = 0.42. On the contrary, when under–
forecasting in costlier than over–forecasting (φ = 0.58), these forecasts are not optimal
anymore. In this case, either the ABMA–AIC combining methods or the “maximum fore-
casts” lead to the lowest average losses. Moreover, we can observe that neither count data
models, nor the Spline–SARX model seem to be valuable options for obtaining accurate
point forecasts.
Lastly, we find that second–moment modeling is important when forecasting call arrivals.
In fact, when a GARCH component is added to ARMAX and SARMAX models, their
ranking improves in most cases. This result suggests (see Section 4.4) that these simple
models could also be useful in the literature dealing with density forecasts (see e.g. Taylor,
2012).
From the standing point of a practitioner, results in Table 4 also suggest that, inde-
pendently of the shape of the loss function, investing in model building or outsourcing the
forecasting exercise could be worth its cost. In fact, the benchmark SRW model is always
outperformed by other relatively more general specifications.10
Since the number of forecasts under consideration is quite large, these conclusions might
10This fact is supported also by the Diebold and Mariano (1995) test. These results are available from the
authors upon request.
17
Tab
le4:
Ran
kin
gof
model
san
dco
mbin
edfo
reca
sts
h=
1h
=7
h=
28
h=
1,...,2
8φ
=0.4
2φ
=0.5
φ=
0.5
8φ
=0.4
2φ
=0.5
φ=
0.5
8φ
=0.4
2φ
=0.5
φ=
0.5
8φ
=0.4
2φ
=0.5
φ=
0.5
8M
044
(13)
44
(13)
44
(13)
30
(7)
32
(7)
31
(7)
41
(10)
43
(12)
43
(12)
29
(7)
43
(12)
43
(12)
M1
18
(4)
21
(5)
22
(6)
19
(4)
19
(4)
18
(4)
27
(5)
26
(5)
23
(4)
26
(5)
23
(5)
22
(6)
M2
21
(5)
18
(4)
17
(3)
14
(3)
14
(3)
13
(3)
16
(3)
16
(3)
15
(3)
15
(3)
14
(3)
28
(7)
M3
26
(8)
24
(7)
24
(7)
28
(6)
27
(6)
28
(6)
20
(4)
23
(4)
24
(5)
18
(4)
22
(4)
37
(10)
M4
13
(3)
16
(3)
21
(5)
7(2
)5
(2)
5(2
)6
(2)
6(2
)4
(2)
8(2
)6
(2)
31
(8)
M5
9(2
)9
(2)
9(1
)4
(1)
2(1
)2
(1)
5(1
)3
(1)
3(1
)4
(1)
1(1
)34
(9)
M6
23
(6)
22
(6)
20
(4)
21
(5)
20
(5)
21
(5)
28
(6)
27
(6)
27
(6)
28
(6)
26
(6)
8(3
)M
75
(1)
8(1
)16
(2)
34
(9)
43
(12)
43
(12)
45
(14)
45
(14)
45
(14)
44
(13)
45
(14)
45
(14)
M8
41
(12)
41
(12)
40
(12)
41
(12)
40
(11)
39
(11)
38
(9)
37
(9)
37
(9)
40
(11)
37
(9)
6(2
)M
937
(10)
37
(10)
37
(10)
38
(11)
37
(10)
36
(10)
34
(8)
33
(8)
34
(8)
37
(10)
34
(8)
14
(4)
M10
39
(11)
39
(11)
39
(11)
36
(10)
35
(9)
34
(9)
32
(7)
31
(7)
32
(7)
35
(9)
32
(7)
16
(5)
M11
45
(14)
46
(14)
46
(14)
44
(13)
44
(13)
44
(13)
43
(12)
41
(10)
40
(10)
43
(12)
42
(11)
1(1
)M
12
29
(9)
30
(9)
32
(9)
45
(14)
45
(14)
45
(14)
44
(13)
44
(13)
44
(13)
45
(14)
44
(13)
44
(13)
M13
24
(7)
27
(8)
30
(8)
33
(8)
33
(8)
32
(8)
42
(11)
42
(11)
42
(11)
30
(8)
41
(10)
42
(11)
Avg.G
116
[13]
15
[13]
12
[11]
23
[18]
21
[16]
20
[16]
13
[11]
12
[10]
12
[10]
25
[21]
18
[15]
9[6
]Avg.G
217
[14]
17
[14]
14
[13]
16
[13]
16
[13]
12
[10]
19
[16]
18
[15]
17
[14]
17
[14]
17
[14]
26
[20]
Avg.G
36
[5]
5[5
]3
[3]
13
[11]
12
[10]
10
[8]
12
[10]
11
[9]
11
[9]
12
[10]
11
[9]
19
[14]
Avg.G
412
[10]
12
[10]
8[8
]11
[9]
10
[8]
9[7
]8
[6]
9[7
]7
[5]
11
[9]
9[7
]25
[19]
Avg.G
540
[29]
40
[29]
35
[26]
40
[29]
39
[29]
38
[28]
36
[28]
36
[28]
36
[28]
39
[29]
36
[28]
11
[8]
Tr.Avg.G
18
[7]
6[6
]6
[6]
22
[17]
23
[18]
22
[17]
11
[9]
10
[8]
10
[8]
20
[16]
16
[13]
12
[9]
Tr.Avg.G
225
[18]
28
[20]
29
[22]
20
[16]
22
[17]
23
[18]
21
[17]
20
[17]
19
[16]
21
[17]
21
[18]
27
[21]
Tr.Avg.G
315
[12]
14
[12]
15
[14]
12
[10]
11
[9]
8[6
]14
[12]
13
[11]
13
[11]
13
[11]
13
[11]
20
[15]
Tr.Avg.G
419
[15]
23
[17]
23
[17]
10
[8]
9[7
]7
[5]
7[5
]7
[5]
5[3
]10
[8]
8[6
]24
[18]
Tr.Avg.G
547
[33]
47
[33]
47
[33]
47
[33]
47
[33]
47
[33]
47
[33]
47
[33]
47
[33]
46
[32]
47
[33]
47
[33]
Notes:
conti
nu
edon
nex
tp
age
18
Tab
le4
(con
tinued
)R
ankin
gof
model
san
dco
mbin
edfo
reca
sts
h=
1h
=7
h=
28
h=
1,...,2
8φ
=0.4
2φ
=0.5
φ=
0.5
8φ
=0.4
2φ
=0.5
φ=
0.5
8φ
=0.4
2φ
=0.5
φ=
0.5
8φ
=0.4
2φ
=0.5
φ=
0.5
8Med.G
14
[4]
4[4
]1
[1]
18
[15]
17
[14]
11
[9]
23
[19]
19
[16]
18
[15]
24
[20]
19
[16]
10
[7]
Med.G
220
[16]
20
[16]
19
[16]
17
[14]
18
[15]
15
[12]
21
[17]
20
[17]
19
[16]
19
[15]
20
[17]
23
[17]
Med.G
37
[6]
7[7
]7
[7]
9[7
]8
[6]
6[4
]15
[13]
14
[12]
14
[12]
14
[12]
12
[10]
18
[13]
Med.G
414
[11]
13
[11]
13
[12]
8[6
]4
[3]
4[3
]9
[7]
8[6
]6
[4]
9[7
]7
[5]
21
[16]
Med.G
536
[27]
36
[27]
36
[27]
38
[27]
37
[27]
36
[26]
34
[26]
33
[25]
34
[26]
36
[27]
33
[26]
13
[10]
MinG 1
31
[22]
34
[25]
43
[31]
24
[19]
31
[25]
41
[30]
18
[15]
40
[31]
41
[31]
1[1
]40
[31]
46
[32]
MinG 2
3[3
]3
[3]
5[5
]6
[5]
13
[11]
19
[15]
10
[8]
15
[13]
22
[19]
7[6
]10
[8]
38
[28]
MinG 3
1[1
]1
[1]
2[2
]1
[1]
6[4
]17
[14]
2[2
]5
[4]
9[7
]2
[2]
4[3
]40
[30]
MinG 4
2[2
]2
[2]
4[4
]2
[2]
7[5
]16
[13]
1[1
]4
[3]
8[6
]3
[3]
5[4
]39
[29]
MinG 5
35
[26]
35
[26]
34
[25]
35
[26]
34
[26]
33
[25]
30
[24]
29
[23]
30
[24]
34
[26]
31
[25]
17
[12]
MaxG 1
46
[32]
45
[32]
45
[32]
46
[32]
46
[32]
46
[32]
46
[32]
46
[32]
46
[32]
47
[33]
46
[32]
41
[31]
MaxG 2
32
[23]
31
[22]
28
[21]
29
[23]
28
[22]
27
[22]
29
[23]
28
[22]
28
[22]
31
[23]
28
[22]
4[3
]MaxG 3
34
[25]
33
[24]
33
[24]
32
[25]
30
[24]
30
[24]
37
[29]
35
[27]
31
[25]
33
[25]
30
[24]
2[1
]MaxG 4
33
[24]
32
[23]
31
[23]
31
[24]
29
[23]
29
[23]
31
[25]
30
[24]
29
[23]
32
[24]
29
[23]
3[2
]MaxG 5
43
[31]
43
[31]
42
[30]
43
[31]
42
[31]
42
[31]
40
[31]
39
[30]
39
[30]
42
[31]
39
[30]
5[4
]SICG 2
28
[20]
26
[19]
26
[19]
27
[22]
26
[21]
26
[21]
25
[21]
25
[21]
26
[21]
23
[19]
25
[20]
36
[27]
SICG 3
30
[21]
29
[21]
27
[20]
25
[20]
24
[19]
24
[19]
26
[22]
22
[19]
21
[18]
27
[22]
27
[21]
30
[23]
SICG 4
27
[19]
25
[18]
25
[18]
26
[21]
25
[20]
25
[20]
24
[20]
24
[20]
25
[20]
22
[18]
24
[19]
35
[26]
SICG 5
41
[30]
41
[30]
40
[29]
41
[30]
40
[30]
39
[29]
38
[30]
37
[29]
37
[29]
40
[30]
37
[29]
6[5
]AICG 2
21
[17]
18
[15]
17
[15]
14
[12]
14
[12]
13
[11]
16
[14]
16
[14]
15
[13]
15
[13]
14
[12]
28
[22]
AICG 3
10
[8]
10
[8]
10
[9]
4[3
]2
[1]
2[1
]3
[3]
1[1
]1
[1]
5[4
]2
[1]
32
[24]
AICG 4
10
[8]
10
[8]
10
[9]
4[3
]2
[1]
2[1
]3
[3]
1[1
]1
[1]
5[4
]2
[1]
32
[24]
AICG 5
37
[28]
37
[28]
37
[28]
38
[27]
37
[27]
36
[26]
34
[26]
33
[25]
34
[26]
37
[28]
34
[27]
14
[11]
Notes:
the
firs
tco
lum
nu
ses
the
foll
ow
ing
short
han
dn
ota
tion
:“A
vg.”
=A
ver
age,
“T
r.A
vg.”
=T
rim
med
Aver
age,
“M
ed.”
=M
edia
n,
“SIC
”=
AB
MA
usi
ng
SIC
an
d“A
IC”
=A
BM
Au
sin
g
AIC
;en
trie
sre
pre
sent
the
ran
kin
gof
mod
els
an
dfo
reca
stco
mb
inati
on
sbase
don
the
stati
stic
s√P−1∑P t=
1L t
,w
her
eL t
isth
egen
erali
zed
loss
fun
ctio
nin
Eq.
(1)
an
dP
=351;
stati
stic
sin
colu
mn
s2-1
0are
base
don
the
un
ivari
ate
loss
for
fore
cast
hori
zon
s,h
=1,7,2
8,
wh
ile
stati
stic
sin
the
last
thre
eco
lum
ns
are
base
don
the
mu
ltiv
ari
ate
loss
forh
=1,...,2
8;
the
shap
eof
the
loss
fun
ctio
nis
det
erm
ined
byρ
=2
an
dth
easy
mm
etry
coeffi
cien
t:φ
=0.4
2,0.5
0,0.5
8(τ
=−
0.1
6,0.0
0,0.1
6);
thes
evalu
esgu
ara
nte
eth
at
the
mu
ltiv
ari
ate
loss
isalw
ays
non
-neg
ati
ve
(see
Kom
un
jer
an
dO
wyan
g(2
012)
for
det
ail
s);
entr
ies
ou
tsid
eb
rack
ets
rep
rese
nt
the
over
all
ran
kin
gof
the
fore
cast
;en
trie
sin
rou
nd
bra
cket
sre
pre
sent
the
ran
kin
gam
on
gm
od
els,
wh
ile
entr
ies
insq
uare
bra
cket
sd
enote
the
ran
kin
gam
on
gco
mb
inin
gm
eth
od
s;m
od
els,M
m,
gro
up
sof
mod
els,G i
,an
dco
mb
inin
gm
eth
od
sare
des
crib
edin
Tab
les
1an
d2;
entr
ies
inb
old
den
ote
mod
els
wit
hth
elo
wes
tlo
ss.
19
Table 5: Reality Check testh = 1 h = 7 h = 28φ φ φ
0.42 0.50 0.58 0.42 0.50 0.58 0.42 0.50 0.58M0 – – – ? – – ?? ?? ??M1 – – ? – – – – – –M2 – – ? ? ? ? – – ?M3 – – ? – – – – ? ?M4 – – ?? ?? ?? ?? – ? ??M5 – ?? ?? ?? ?? ?? ?? ?? ??M6 ?? ?? ?? ? ?? ?? – – ?M7 ? ?? ?? – – – ? ? ?M8 – – – – – – – – –M9 – – – – – – – – –M10 – – – – – – – – –M11 – – – – – – – – –M12 – – – – – – – – –M13 – – – – – – ?? ?? ??Avg. G1 – – ? – ? ?? – ? ??Avg. G2 – ?? ?? – – – – – –Avg. G3 ?? ?? ?? – – ? – – –Avg. G4 – ?? ?? ? ? ?? ?? ?? ??Avg. G5 – – – – – – – – ?Tr. Avg. G1 – ? ? ? – ? – ? ??Tr. Avg. G2 – ? ?? – ? ? – – –Tr. Avg. G3 – ? ?? ? ? ?? – – ??Tr. Avg. G4 – – – ? – – ?? ?? ??Tr. Avg. G5 – – – – – – – – –Med. G1 ? ?? ?? – ?? ?? – – –Med. G2 – – – – – ? – – –Med. G3 – ?? ?? ?? ?? ?? – – ??Med. G4 – ? ?? ?? ?? ?? ?? ?? ??Med. G5 – – – – – – – – –Min G1 ?? ?? ?? – – – ?? ?? ??Min G2 ?? ?? ?? ? ? ?? ?? ?? ??Min G3 ?? ?? ?? ?? ?? ?? ?? ?? ??Min G4 ?? ?? ?? ?? ?? ?? ?? ?? ??Min G5 – – – – – – – – –Max G1 – – – – – – – – –Max G2 – – – – – – – – ?Max G3 – – – – – – – – –Max G4 – – – – – – – – –Max G5 – – – – – – – – –SIC G2 – – – – ? ? – – ?SIC G3 – – – – – – – ? ??SIC G4 – – – – – ? – – ?SIC G5 – – – – – – – – –AIC G2 – – ? ? ?? ?? – ? ?AIC G3 – ?? ?? ?? ?? ?? ?? ?? ??AIC G4 – ?? ?? ?? ?? ?? ?? ?? ??AIC G5 – – – – – – – – –
Notes: the table presents results of the Reality Check test of White (2000), as modified by Hansen (2005); the benchmark modelis indicated in the first column, where the following shorthand notation is used: “Avg.” = Average, “Tr. Avg.” = TrimmedAverage, “Med.” = Median, “SIC” = ABMA using SIC and “AIC” = ABMA using AIC; the test is implemented using thestationary (block) bootstrap of Politis and Romano (1994); the number of bootstrap repetitions is equal to 999, the block lengthequals 29 days; a p-value lower than 0.05 indicates rejection of the null hypothesis that the benchmark performs as well as thebest alternative model; “–” denotes a p-value < 0.05, “?” denotes 0.05 ≤ p-value < 0.1, “??” denotes a p-value ≥ 0.1.
be subject to data snooping effects. Both the Reality Check test (RCT) of White (2000)
and the Model Confidence Set (MCS) of Hansen et al. (2011) are designed to deal with
data snooping. The difference between the two procedures is that the former requires a
benchmark model, while the latter does not. Results of the RCT test of the null hypothesis
that the benchmark performs as well as the best alternative model are shown in Table 5.
Our findings are based on the consistent p–values of Hansen (2005), who has shown that the
20
Table 6: Are models within the MCS?h = 1 h = 7 h = 28
φ = 0.42 φ = 0.5 φ = 0.58 φ = 0.42 φ = 0.5 φ = 0.58 φ = 0.42 φ = 0.5 φ = 0.58M0 – – – – – – – – –M1 – – – – – – – – –M2 – – – – – – – – –M3 – – – – – – – – –M4 – X – – – – – – –M5 – X X – – – – – –M6 – X X – – – – – –M7 X X X – – – – – –M8 – – – – – – – – –M9 – – – – – – – – –M10 – – – – – – – – –M11 – – – – – – – – –M12 – – – – – – – – –M13 – – – – – – – – –Avg. G1 – – X – – – – – –Avg. G2 – X X – – – – – –Avg. G3 – X X – – – – – –Avg. G4 – X X – – – – – –Avg. G5 – – – – – – – – –Tr. Avg. G1 – X X – – – – – –Tr. Avg. G2 – – – – – – – – –Tr. Avg. G3 – – – – – – – – –Tr. Avg. G4 – – – – – – – – –Tr. Avg. G5 – – – – – – – – –Med. G1 – X X – – – – – –Med. G2 – – – – – – – – –Med. G3 – X X – – – – – –Med. G4 – – – – – – – – –Med. G5 – – – – – – – – –Min G1 – – – – – – – – –Min G2 X X X – – – – – –Min G3 X X X X – – X – –Min G4 X X X X – – X – –Min G5 – – – – – – – – –Max G1 – – – – – – – – –Max G2 – – – – – – – – –Max G3 – – – – – – – – –Max G4 – – – – – – – – –Max G5 – – – – – – – – –SIC G2 – – – – – – – – –SIC G3 – – – – – – – – –SIC G4 – – – – – – – – –SIC G5 – – – – – – – – –AIC G2 – – – – – – – – –AIC G3 – – – – – – – – –AIC G4 – X X X X X X X XAIC G5 – – – – – – – – –
Notes: the table presents results of the Model Confidence Set (MCS) procedure of Hansen et al. (2011), implemented usingthe stationary (block) bootstrap of Politis and Romano (1994); the number of bootstrap repetitions is equal to 999, the blocklength equals 29 days; “–” indicates that the model is not in the MCS at the 90% confidence level, while “X” indicates that themodel belongs to the MCS; the first column uses the following shorthand notation: “Avg.” = Average, “Tr. Avg.” = TrimmedAverage, “Med.” = Median, “SIC” = ABMA using SIC and “AIC” = ABMA using AIC.
original procedure has low power when a poor performing forecast enters the set of alternative
models. Using the SARMAX–GARCH model, or the ABMA–AIC combined forecasts (based
on the models belonging to group G3 or group G4), we reject the null hypothesis only once.
The MCS test is used to compare the forecast accuracy of models without selecting a
benchmark model and yields a set of specifications that contains the best forecast with a
prespecified asymptotic probability. As it can be seen from Table 6, this test is more selective
21
than the RCT. Considering one day ahead forecasts and under MSE loss, the MCS at the 90%
confidence level contains only four individual models: SARMAX, SARMAX–GARCH, PAR
and Airline. When over–forecasting is more penalized than under–prediction (i.e. φ = 0.42),
the only model entering the MCS is the Airline. On the contrary, when positive forecast
errors are more heavily weighted than negative forecast errors (i.e. φ = 0.58), the SARMAX–
GARCH and the PAR are also in the MCS. When the forecast horizon is one week, or one
month, only the ABMA–AIC combined forecasts based on model group G4 are always in the
MCS.
Therefore, when the loss function is parametrized so as to penalize under–prediction
more heavily than over–prediction, the best performing model and combination method are
the SARMAX–GARCH and the ABMA–AIC based on model group G4.
3.2 Monetary measures
We now focus on the task of selecting the best forecasting approach, given a forecast horizon
of one day and assuming that the manager of the call center is more adverse to under–
staffing than to over–staffing. We thus define the set of alternative forecasts so as to include
all individual models and the combined forecast obtained with ABMA–AIC applied to group
G4.
The economic evaluation of forecasts based on the willingness to pay, δi, and the incre-
mental value of information, ∆Vi, is presented in Table 7. Although both economic measures
of performance decrease as the absolute risk aversion increases, the manager’s willingness to
pay seems to be less responsive to such a change than the incremental value of information.
The second column of Table 7 shows the percentage change in the root MSE distance for
comparison. An entry below 100 indicates that the model Mi outperforms the benchmark.
All measures suggest that the worst performing model is MEM, while the best models are
the SARMAX–GARCH and the ABMA–AIC combined forecasts. The manager is willing
to pay up to 1687 Euro in order to use these models instead of the benchmark. The model
characterized by the minimum (positive) willingness to pay, 912 Euro, is the Poisson count
data specification.
22
The ranking based on the incremental value of information is consistent with that based
on the willingness to pay. On the contrary, being symmetric about zero forecast errors,
the ranking based on the MSE ranking is quite different. For instance, the Airline model
would be preferred to the SARMAX–GARCH model, which is the best option when the loss
function is consistent with the manager’s compensation scheme.
Overall, both economic and statistical evaluation of models indicate the SARMAX–
GARCH model and the ABMA–AIC combining method based on model group G4 as the
best options for the manager. However, using only statistical methods, we cannot clearly
identify which of these options is best. On the contrary, given that the monetary value of
the two forecasts and the manager’s willingness to pay are very similar, we can conclude that
the SARMAX–GARCH model is to be preferred to the ABMA–AIC forecast combination
method.
Table 7: Economic evaluation of modelsδi (Euro) ∆Vi (Euro)
∆RMSE (%) λ = 0.0002 λ = 0.0003 λ = 0.0005 λ = 0.0002 λ = 0.0003 λ = 0.0005M1 58.40 1377.40 1377.40 1377.30 401.66 114.03 11.96M2 58.50 1544.90 1544.80 1544.60 445.01 125.19 12.97M3 62.27 1342.40 1342.40 1342.30 392.47 111.63 11.74M4 57.06 1413.70 1413.70 1413.70 411.15 116.49 12.19M5 56.85 1687.40 1687.30 1687.20 481.04 134.29 13.77M6 59.02 1433.70 1433.60 1433.50 416.35 117.84 12.31M7 55.86 1517.40 1517.40 1517.30 437.98 123.40 12.81M8 76.36 912.39 912.30 912.16 275.33 80.22 8.73M9 76.10 988.59 988.47 988.28 296.65 86.05 9.30M10 76.10 988.59 988.47 988.28 296.65 86.05 9.30M11 123.66 -933.48 -933.27 -932.94 -324.07 -105.60 -13.59M12 62.72 1031.10 1031.10 1030.90 308.44 89.26 9.62M13 61.19 1217.50 1217.40 1217.40 359.21 102.88 10.93AIC G4 56.85 1687.40 1687.30 1687.20 481.04 134.29 13.77
Notes: economic evaluation of one day ahead forecasts; ∆RMSE=100×(RMSEi/RMSE0), where the RMSE corresponds to theflexible loss distance for ρ = 2 and φ = 0.5; the incremental value of information is ∆Vi = Vi − V0, where Vi is the value ofinformation from model i; the willingness to pay for model i is δ = CEi − CE0, where CEi is the certainty equivalent frommodel i; λ is the coefficient of risk aversion.
Actually, given that the latter method involves more than one model and that the
specification of models has to be periodically revised, the ABMA–AIC forecast combination
method will have higher maintenance costs than the SARMAX–GARCH model. Moreover,
if the model is run by an employee of the call center and not by the professional forecaster,
we see the “ease–of–use” as a critical factor for the choice of the best forecast.
All in all, we have shown that simple measures of performance expressed in monetary
terms are easy to construct and are more flexible than the often used symmetric loss func-
tions. This flexibility allows the forecasts’ user and the adviser to judge the predictive
23
performance of models with the same metric. Moreover, being expressed in monetary terms,
we believe that these measures are more interesting for practitioners than traditional sta-
tistical distances. Finally, from the perspective of the professional forecaster, we see the
results in this section as complementary to those based on flexible loss functions. From the
standpoint of forecast users, we believe that the economic measures are to be preferred, as
they are more closely linked to their profit maximizing behavior.
4 Robustness checks
4.1 Rolling estimation window
Although the recursive (or expanding) scheme has the advantage of using more observations
than its rolling (or moving) window counterpart, the latter is more robust to the presence of
structural breaks, while it is based on an estimation sample size which is arbitrarily chosen.
The fixed forecasting scheme is another option, which not only does involve an arbitrary
choice of the sample size, but also it is less robust to the presence of structural breaks in
the variance and in the mean of the series. Its main strength is the fact that the model is
estimated only once. However, this aspect does not justify its use in our setting, given that
all models we have proposed can be easily estimated and forecasts and simulations are based
on the most recent sample of observations.
For these reasons, we have decided to produce new forecasts for the Italian call center
with a rolling scheme and a sample size R = 371. This sample size matches that of the first
estimation sample used with the recursive scheme, and it allows to rely on the same evaluation
sample. Results in Table 8 show that SARMAX specifications, with or without the GARCH
component, remain among the preferred options at all forecast horizons, irrespective of the
shape of the loss function.
24
Table 8: Ranking of models for the Italian call center (rolling window estimation)(a) h = 1 (b) h = 7
φ = 0.42 φ = 0.5 φ = 0.58 φ = 0.42 φ = 0.5 φ = 0.58M0 14 14 14 14 14 14M1 3 3 4 4 4 4M2 5 4 1 5 5 5M3 7 7 9 10 12 13M4 2 1 3 1 1 1M5 4 2 2 3 3 2M6 9 8 7 2 2 3M7 6 6 6 13 13 12M8 10 10 10 9 9 8M9 11 11 11 8 8 7M10 12 12 12 7 7 6M11 13 13 13 11 10 9M12 1 5 5 6 6 10M13 8 9 8 12 11 11
(c) h = 28 (d) h = 1, ..., 28φ = 0.42 φ = 0.5 φ = 0.58 φ = 0.42 φ = 0.5 φ = 0.58
M0 14 14 14 14 14 14M1 5 5 5 6 4 5M2 4 4 4 7 5 3M3 11 11 12 1 11 13M4 1 2 2 2 1 6M5 2 1 1 4 2 2M6 3 3 3 5 3 4M7 13 13 13 13 13 12M8 8 8 6 10 8 7M9 7 7 8 9 6 8M10 6 6 7 8 7 9M11 9 9 9 11 9 1M12 10 10 10 3 10 10M13 12 12 11 12 12 11
Notes: entries represent the ranking of models based on the statistics√P−1
∑Pt=1 Lt, where Lt is the flexible loss function in
Eq. (1) and P = 351; statistics in panel (a) - (c) are based on the univariate loss for forecast horizons h = 1, 7, 28, while thosein panel (d) are based on the multivariate loss for h = 1, ..., 28; the shape of the loss function is determined by ρ = 2 and theasymmetry coefficient: φ = 0.42, 0.50, 0.58 (τ = −0.16, 0.00, 0.16); these values guarantee that the multivariate loss is alwaysnon-negative (see Komunjer and Owyang (2012) for details). Models, Mm, are described in Table 1. Entries in bold denotemodels with the lowest loss.
4.2 Additional call arrival data
The relevance of our results clearly depends on whether they can be extended to other call
arrival data. To shed light on this issue, we consider two additional series representing the
number of call arrivals recorded at the call centers operated by two banks, one located in
Israel and the other in the U.S.. In Table 9 we list, for each series, the model with the lowest
MSFE. Our results show that the ARMAX–GARCH and the SARMAX–GARCH models
are the winning options in all countries and at all forecast horizons.
25
Table 9: MSFE ranking of forecasting models for call centers in Israel, Italy and the U.S.country h = 1 h = 7 h = 28 h = 1, ..., 28
Israel M5 M5 M2 M5
Italy M7 M5 M5 M5
U.S. M2 M2 M3 M2
Notes: the first column identifies the dataset used; entries in columns 2-4 represent the model with the lowest Mean SquaredForecast Error (MSFE) at forecast horizon h = 1, 7, 28 days, while entries in the last column represent the model with thelowest sum of MSFE over forecast horizons h = 1, 2, . . . , 28. Models are described in Tables 1.
4.3 Degree of asymmetry of the loss function
In Section 3 the degree of asymmetry has been restricted to lie in the range φ = (0.42, 0.58)
because, with h = H = 28, the multivariate loss is non–negative for |τ | < 1/√H, that is
|τ | < 0.18. This corresponds to φ = (|τ | + 1)/2, that is 0.41 and 0.59. Therefore, while in
the multivariate case we cannot vary φ any further, when evaluating one forecast horizon
at the time we can consider more extreme values of φ. Figure 2 shows the best–performing
model for φ = 0.1, 0.2, . . . , 0.9 and h = 1, 7, 28. The ranking remains surprisingly stable for
a wide range of values of the asymmetry parameter.
4.4 On the usefulness of second–moment modeling
We have highlighted that the addition of a GARCH component to ARMAX and SARMAX
models improves their forecasting performance. However, having focused on point forecasts,
we have not fully exploited the strength of these specifications, namely the modeling of
the conditional variance of call arrival data. The GARCH component can be exploited in
the construction of optimal point forecasts for log–transformed variables and for density
forecasts.
26
Fig
ure
2:R
ankin
gof
model
sas
afu
nct
ion
ofth
eas
ym
met
ryof
the
loss
funct
ion
(a)
Isra
el
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
1
M8M
8M
8
M5M
5M
5
M2M
2M
2
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
7
M11
M5M
5M
5M
5M
5M
5
M2M
2
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
28
M11
M2M
2M
2M
2M
2M
2M
2M
2
(b)
Italy
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
1
M7M
7M
7M
7M
7M
5M
5M
6M
6
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
7
M5M
5M
5M
5M
5M
5M
5M
4
M11
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
28
M5M
5M
5M
5M
5M
5M
5M
6
M11
(c)
U.S
.
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
1
M2M
2M
2M
2M
2M
2M
2M
2M
2
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
7
M5
M2M
2M
2M
2M
2M
2M
2M
2
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
?
024681012 Best Model
h=
28
M2M
2M
2M
2M
3M
2M
3M
3M
3
Notes:
the
figu
resh
ow
sth
em
od
elth
at
min
imiz
esth
elo
ssfu
nct
ion
for
diff
eren
tvalu
esof
the
asy
mm
etry
deg
reeφ
=0.1
,0.2
,0.3
,0.4
,0.5
,0.6
,0.7
,0.8
,0.9
(τ=
2×φ−
1=
-0.8
,-0
.6,
-0.4
,-0
.2,
0.0
,0.2
,0.4
,0.6
,0.8
)
27
Table 10: Naive vs optimal forecastsModel Israel Italy US
M1 0.7477 1.0455 0.9802
M2 0.9778 1.0378 0.9867
M4 0.7133 1.0380 0.9802
M5 0.9758 1.0300 0.9861
Notes: entries are the ratio of the Root Mean Squared Forecast Error of optimal and naive one-step ahead forecasts:RMSFEopt/RMSFEnaive. Cases when the optimal forecast outperforms the naive forecast are indicated with bold text.
Naive vs optimal point forecasts. Since our results indicate that forecasts tend to improve
when a GARCH component is added to the model, it is worth investigating the usefulness
of second-moment modeling in detail. ARMAX and SARMAX models have been estimated
after log-transforming call arrival data (Yt) and forecasts are obtained by the inverse of the
log transformation: ft+h|t = exp(yt+h|t
), where yt+h|t = E (yt+h |yt, yt−1, ...) and yt = log(Yt).
Granger and Newbold (1976) label this approach as “naive”. The naive forecast is biased,
whereas an “optimal” forecast that uses the forecast error variance of the transformed series
(σ2y) to correct the bias would be f optt+h|t = exp
(yt+h|t + 0.5× σ2
y(h)).
Therefore, we compare ARMA and SARMAX models, with and without the GARCH
component, to see if incorporating the conditional variance predictions in the optimal forecast
formula yields to more accurate point forecasts than simply relying on the naive approach.
In the case of ARMAX and SARMAX models without GARCH component, we have used
the variance of the residuals as an estimate of σ2y . For the sake of brevity, we have carried
out this exercise only for one-step ahead forecasts, evaluated with the symmetric MSE loss.
Table 10 shows that optimal forecasts are better than naive forecasts in eight comparisons
out of twelve. Moreover, improvements associated with models with a GARCH component
are generally smaller than models without such component. These results are in line with
those of Lutkepohl and Xu (2010), who show that the accuracy gains due to use of optimal
instead of naive forecasts are usually minimal.
Density forecasts. Density forecasts have been produced for a subset of models that include
the benchmark SRW, ARMAX and SARMAX with and without GARCH component, the
Poisson and the Negative Binomial specifications.
28
Table 11: Density and quantile forecasts evaluationIsrael
Model RPS 5% 25% 75% 95%
M0 415.75 22.83 81.52 100.00 100.00
M1 660.51 10.87 43.48 100.00 100.00
M2 763.86 6.52 36.96 88.04 97.83
M4 707.10 9.78 44.57 98.91 100.00
M5 752.42 7.61 32.61 89.13 95.65
M8 1650.92 2.17 5.43 13.04 26.09
M9 835.89 4.35 38.04 94.57 100.00
Italy
Model RPS 5% 25% 75% 95%
M0 4604.45 6.15 36.00 87.69 98.46
M1 3966.79 7.08 41.85 89.54 97.23
M2 4019.20 5.85 36.00 82.46 94.46
M4 3941.04 8.31 42.77 89.23 96.92
M5 4009.09 7.08 34.15 82.77 94.46
M8 11174.43 0.00 0.62 2.77 6.15
M9 4800.73 7.69 40.92 84.31 96.00
U.S.
Model RPS 5% 25% 75% 95%
M0 6836.74 26.94 78.89 86.39 89.17
M1 4132.72 6.11 35.28 81.39 92.50
M2 3487.27 5.83 27.22 75.83 90.00
M4 4106.38 6.11 36.39 82.22 92.50
M5 3485.90 6.11 28.06 76.39 90.83
M8 10224.19 0.00 0.56 2.22 3.61
M9 5048.43 3.89 24.44 77.22 95.56
Notes: column 2 presents Ranked Probability Score (RPS), while columns 3-6 show the Empirical Coverage Probability (ECP)of 5%, 25%, 75% and 95% prediction intervals. In all cases the most accurate model is identified with boldface font.
We compare density forecasts obtained from count data models with those from linear
specifications on log–transformed data, assuming that the series in levels are log-normal.
One–step–ahead density forecasts are computed with Monte Carlo simulations. For each
observation in the evaluation sample we have drawn 1000 observations from the relevant
distribution using the estimated or forecast parameters. In particular, for SRW, ARMA
and SARMAX models, we simulate from a log–normal distribution with parameters equal
to the one–step ahead forecast of its mean and variance. When the model has a GARCH
component, we rely of the forecast of the conditional variance, otherwise we use the estimate
of the variance of the residuals. In the case of the Poisson, we rely on the one–step–ahead
estimate of the arrival rate. Lastly, for the Negative Binomial model, density forecasts are
drawn using the in-sample estimate of the over–dispersion parameter.
29
Density forecasts are evaluated using the Ranked Probability Score (RPS):
RPSt =J∑
j=0
[Ft(j)− I(Yt ≤ j)]2 (7)
where Ft is the forecast of the empirical cumulative distribution function of Yt, I(.) is
the indicator function and J is the possible maximum number of calls at date t, which is set
as the maximum between the realized and largest simulated forecast of Yt. Averaging over
the evaluation sample we get: RPS = H−1∑H
t=1RPSt, which is a loss function and it is
lower for models that perform better in terms of density forecast ability.
Moreover, we evaluated distributional forecasts computing the Empirical Coverage Prob-
ability (ECP) of 5%, 25%, 75% and 95% prediction intervals:
ECPθ =1
H
H∑
t=1
I(q0.5θt < Yt < q1−0.5θ
t
)(8)
Results in Table 11 show that ARMAX and SARMAX models yield the lowest RPS for
both the Italian and U.S. call arrival data, while in the case of Israeli, the SRW is the preferred
option. When looking at the ECP of the prediction interval we can see that in 9 cases out of
12 the most accurate forecasts are produced with models including a GARCH component.
Interestingly, there is also evidence that modeling the over–dispersion of call arrival data
with a Negative Binomial model might be a useful approach for density forecasting.
5 Conclusions
Call centers’ managers and companies relying on call center services are interested in ob-
taining accurate forecasts of call arrivals for achieving optimal operating efficiency. This
paper has shown how to choose among competing forecasting methods for call arrivals in
call centers.
The empirical exercise in this paper mimics the interaction between a professional fore-
caster and a manager who uses forecasts of incoming calls to decide how many operators are
required each day at a call center. In this context, we have evaluated fourteen models and a
30
set of seven forecast combination schemes using flexible loss functions, statistical tests and
economic measures of performance.
Each of the forecast models discussed in this paper is able to capture one or more
key features of the daily call arrival series. Moreover, all of the models and combination
methods are computationally tractable, with a relatively small number of parameters that
can be easily estimated and updated with any off–the–shelf statistical software as new data
become available. We view simplicity as an important point in the selection of a model
for forecasting call arrivals. In fact, to be of practical use, a forecast model not only should
reproduce the key features of the data, but also it has to be easily implementable, and able to
quickly generate new forecasts to support the operational decisions in call centers (Ibrahim
et al., 2016; Mehrotra and Fama, 2003).
We have shown that the professional forecaster can reduce the number of proposed
models using the Reality Check test and the Model Confidence Set. These tests, as well as
models ranking, suggest that the best available options are the SARMAX–GARCH and a
combined forecast obtained with ABMA. Subsequently, we have shown that the economic
evaluation of forecast accuracy leads to the same selection. However, given that individual
and combined forecasts have approximately the same monetary value, the manager will
choose eventually the SARMAX–GARCH model, due to lower maintenance costs. The
maintenance costs of a forecasting model used by an employee of the call center include direct
costs, related to periodical checks of the specification, as well as indirect costs, associated to
the relative complexity of the forecasting method. Given that forecast combinations require
a set of models, whose specifications have to be periodically checked, this approach will
probably lead to higher maintenance costs with respect to the SARMAX–GARCH model,
which is then the best available choice. Moreover, combined forecasts cannot be used for
simulation purposes, which typically need a unique set of key parameters.
Since our paper relies on a wide array of results covering different loss functions, forecast
horizons and call arrival series, a number of more comprehensive conclusions can be drawn.
First, since the benchmark SRW model is always outperformed by other relatively more
general specifications, investing in the use of these models might be a good option for the
management of the call center. Second, irrespective of the forecast horizon and the shape
31
of the loss function, forecast combination, especially if based on optimal combining weights
calculated by means of ABMA, proves to be useful and leads to lower statistical losses
than most forecasts obtained from individual models. Third, the statistical evaluation of
each forecast model indicates that second-moment modeling, in addition to seasonality, is
important. In fact, the ARMAX–GARCH and SARMAX–GARCH models emerge as the
best alternatives among both individual and combined forecasts. This last result implies
that anticipating the variability of call arrivals is extremely relevant since, when a call center
operates under a SLA, higher uncertainty requires higher staffing levels to meet service
quality objectives. Moreover, in the light of its ease of implementation, the SARMAX–
GARCH model seems to be a good candidate also for density forecasting.
References
Aksin, Z., Armony, M., and Mehrotra, V. (2007). The modern call center: A multi-
disciplinary perspective on operations management research. Production and Operations
Management, 16:665–688.
Andrews, B. H. and Cunningham, S. M. (1995). L.L. Bean improves call-center forecasting.
Interfaces, 25(6):1–13.
Antipov, A. and Meade, N. (2002). Forecasting call frequency at a financial services call
centre. The Journal of the Operational Research Society, 53(9):953–960.
Bastianin, A., Galeotti, M., and Manera, M. (2014). Causality and predictability in distri-
bution: The ethanol-food price relation revisited. Energy Economics, 42:152 – 160.
Bianchi, L., Jarrett, J., and Hanumara, R. C. (1998). Improving forecasting for telemarketing
centers by ARIMA modeling with intervention. International Journal of Forecasting,
14(4):497–504.
Cameron, A. C. and Trivedi, P. K. (1990). Regression-based tests for overdispersion in the
Poisson model. Journal of Econometrics, 46(3):347–364.
32
Channouf, N., L’Ecuyer, P., Ingolfsson, A., and Avramidis, A. (2007). The application of
forecasting techniques to modeling emergency medical system calls in Calgary, Alberta.
Health Care Management Science, 10(1):25–45.
Collender, R. N. and Chalfant, J. A. (1986). An alternative approach to decisions under
uncertainty using the empirical moment-generating function. American Journal of Agri-
cultural Economics, 68(3):727–731.
Diebold, F. X. and Mariano, R. S. (1995). Comparing predictive accuracy. Journal of
Business & Economic Statistics, 13(3):253–63.
Dorfman, J. H. and McIntosh, C. S. (1997). Economic criteria for evaluating commodity
price forecasts. Journal of Agricultural and Applied Economics, 29(2):337–345.
Elbasha, E. H. (2005). Risk aversion and uncertainty in cost-effectiveness analysis: the
expected-utility, moment-generating function approach. Health Economics, 14(5):457–470.
Elliott, G., Komunjer, I., and Timmermann, A. (2005). Estimation and testing of forecast
rationality under flexible loss. Review of Economic Studies, 72(4):1107–1125.
Elliott, G., Komunjer, I., and Timmermann, A. (2008). Biases in macroeconomic fore-
casts: Irrationality or asymmetric loss? Journal of the European Economic Association,
6(1):122–157.
Engle, R. F. (2002). New frontiers for ARCH models. Journal of Applied Econometrics,
17(5):425–446.
Franses, P. H. and van Dijk, D. (2005). The forecasting performance of various models for
seasonality and nonlinearity for quarterly industrial production. International Journal of
Forecasting, 21(1):87–102.
Gans, N., Koole, G., and Mandelbaum, A. (2003). Telephone call centers: A tutorial and
literature review. Manufacturing and Service Operations Management, 5(2):79–141.
Gardner, Jr., E. S. (2006). Exponential smoothing: The state of the art–part II. International
Journal of Forecasting, 22(4):637–666.
33
Garratt, A., Lee, K., Pesaran, H. M., and Shin, Y. (2003). Forecast uncertainties in macroeco-
nomic modeling: An application to the U.K. economy. Journal of the American Statistical
Association, 98(464):829–838.
Gbur, E. E. and Collins, R. A. (1989). A small-sample comparison of estimators in the
EU-MGF approach to decision making. American Journal of Agricultural Economics,
71(1):202–210.
Gneiting, T. (2011). Quantiles as optimal point forecasts. International Journal of Forecast-
ing, 27(2):197 – 207.
Granger, C. W. and Newbold, P. (1976). Forecasting transformed series. Journal of the
Royal Statistical Society. Series B, 38(2):189–203.
Hansen, P. R. (2005). A test for superior predictive ability. Journal of Business & Economic
Statistics, 23(4):365–380.
Hansen, P. R., Lunde, A., and Nason, J. M. (2011). The model confidence set. Econometrica,
79(2):453–497.
Ibrahim, R., Ye, H., L’Ecuyer, P., and Shen, H. (2016). Modeling and forecasting call
center arrivals: A literature survey and a case study. International Journal of Forecasting,
32(3):865–874.
Jongbloed, G. and Koole, G. (2001). Managing uncertainty in call centres using Poisson
mixtures. Applied Stochastic Models in Business and Industry, 17(4):307–318.
Jung, R. C. and Tremayne, A. R. (2011). Useful models for time series of counts or simply
wrong ones? AStA Advances in Statistical Analysis, 95(1):59–91.
Kim, T., Kenkel, P., and Brorsen, B. W. (2012). Forecasting hourly peak call volume for a
rural electric cooperative call center. Journal of Forecasting, 31(4):314–329.
Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46(1):33–50.
Komunjer, I. and Owyang, M. T. (2012). Multivariate forecast evaluation and rationality
testing. Review of Economics and Statistics, 94(4):1066–1080.
34
Leitch, G. and Tanner, J. E. (1991). Economic forecast evaluation: Profits versus the
conventional error measures. The American Economic Review, 81(3):580–590.
Lutkepohl, H. and Xu, F. (2010). The role of the log transformation in forecasting economic
variables. Empirical Economics, 42(3):619–638.
Mabert, V. A. (1985). Short interval forecasting of emergengy phone call (911) work loads.
Journal of Operations Management, 5(3):259–271.
Mehrotra, V. and Fama, J. (2003). Call center simulation modeling: methods, challenges,
and opportunities. In Chick, S., Sanchez, P. J., Ferrin, D., and Morrice, D. J., editors,
Proceedings of the 2003 Winter Simulation Conference, pages 135–143. IEEE Press, Pis-
cataway, NJ.
Milner, J. M. and Olsen, T. L. (2008). Service-level agreements in call centers: Perils and
prescriptions. Management Science, 54(2):238–252.
Newey, W. K. and Powell, J. L. (1987). Asymmetric least squares estimation and testing.
Econometrica, 55(4):819–847.
Politis, D. N. and Romano, J. P. (1994). The stationary bootstrap. Journal of the American
Statistical Association, 89(428):1303–1313.
Roubos, A., Koole, G., and Stolletz, R. (2012). Service-level variability of inbound call
centers. Manufacturing & Service Operations Management, 14(3):402–413.
Shen, H. and Huang, J. Z. (2008). Forecasting time series of inhomogeneous Poisson pro-
cesses with application to call center workforce management. Annals of Applied Statistics,
2(2):601–623.
Stock, J. H. and Watson, M. W. (2004). Combination forecasts of output growth in a
seven-country data set. Journal of Forecasting, 23(6):405–430.
Stolletz, R. (2003). Performance Analysis and Optimization of Inbound Call Centers.
Springer Berlin Heidelberg.
35
Taylor, J. W. (2008). A comparison of univariate time series methods for forecasting intraday
arrivals at a call center. Management Science, 54(2):253–265.
Taylor, J. W. (2012). Density forecasting of intraday call center arrivals using models based
on exponential smoothing. Management Science, 58(3):534–549.
Thomas, D. J. (2005). Measuring item fill-rate performance in a finite horizon. Manufacturing
& Service Operations Management, 7(1):74–80.
Timmermann, A. (2006). Forecast combinations. In Elliott, G., Granger, C. W. J., and
Timmermann, A., editors, Handbook of Economic Forecasting, volume 1, chapter 4, pages
135–196. Elsevier, Amsterdam.
Tran, T. B. and Davis, S. R. (2011). A quantitative approach to measure organisation
performance. International Journal of Social Science and Humanity, 1(4):289–293.
Tych, W., Pedregal, D. J., Young, P. C., and Davies, J. (2002). An unobserved component
model for multi-rate forecasting of telephone call demand: the design of a forecasting
support system. International Journal of Forecasting, 18(4):673–695.
Weinberg, J., Brown, L. D., and Stroud, J. R. (2007). Bayesian forecasting of an inhomo-
geneous poisson process with applications to call center data. Journal of the American
Statistical Association, 102(480):1185–1198.
White, H. (2000). A reality check for data snooping. Econometrica, 68(5):1097–1126.
36
Appendix to
“Statistical and Economic Evaluation of Time Series
Models for Forecasting Arrivals at Call Centers”
Contents
A Flexible loss functions 2
A.1 Univariate loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
A.2 Bivariate loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
B Descriptive statistics 8
B.1 Additional figures and descriptive statistics . . . . . . . . . . . . . . . . . . . 8
B.2 ARCH effects and seasonality in variance . . . . . . . . . . . . . . . . . . . . 14
B.3 Overdispersion test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1
A Flexible loss functions
In this section we present more details about the loss function used in the paper. The
statistical evaluation of forecasts is based on the loss function put forth by Komunjer and
Owyang (2012), that is a multivariate generalization of the loss function due to Elliott et al.
(2005, 2008). The strength of these metrics is to encompass a variety of symmetric and
asymmetric loss functions that are often used in empirical applications.
As for the notation, let fi,t+h|t ≡ E(Yt+h|It) be the h-step ahead forecast of Yt+h issued
with model i (Mi), where It is the information set available at time t and h is the forecast
horizon. The corresponding forecast error is ui,t+h|t = Yt+h − fi,t+h|t. The model subscript i
is dropped for ease of notation, and a set of (P × 1) vectors of forecasts errors, one for each
forecast horizon, is considered: u(h) = [u(h)1, . . . , u(h)P ]′ for h = 1, . . . , H, where H is the
maximum forecast horizon (i.e. 28 days in this paper) and P is the size of the evaluation
sample. These vectors form the rows of a (H × P ) matrix: u = [u(1) . . .u(H)]′, where up,
p = 1, . . . , P , indicates its p-th column. With this notation, we can write the multivariate
loss function of Komunjer and Owyang (2012) as:
Lp (up; ρ, τ ) =(‖up‖ρ + τ ′u
)‖up‖ρ−1
ρ (1)
where ‖up‖ρ ≡(∑H
h=1
∣∣u(h)p
∣∣ρ) 1ρ.
Properties of the multivariate loss function. The properties of the multivariate loss function
in equation (1) are established by Komunjer and Owyang (2012). The elements of the
(H × 1) vector τ determine the degree of asymmetry of the loss function. We assume
that all the elements of τ are equal to τ , so that the degree of asymmetry is the same
at all forecast horizons h, although we might consider a different τ for each h. Moreover,
−1 ≤ τ ≤ 1, so that the loss is symmetric and additively separable for τ = 0. Lastly,
it is useful to recall that the non-negativeness of Lp is ensured if the q norm of τ is less
than unity, where 1p
+ 1q
= 1, 1 ≤ p ≤ ∞, with the convention that q = ∞ if p = 1 and
‖τ‖∞ = max {τ1, . . . , τH}. For ρ = 2 and with a common τ for all h = 1, . . . , H, the non-
negativeness of Lp requires ‖τ‖2 < 1, which simplifies to |τ |√H < 1 and hence |τ | < 1√
H.
2
For example, when H = 1, 2, 28, we need |τ | < 1, |τ | < 0.7 and |τ | < 0.18, respectively.
A.1 Univariate loss
To fix the ideas, let us consider the case when the forecaster wants to evaluate forecasts at a
single horizon, so that H = 1 and u = u′(1) = [u(1)1, . . . , u(1)P ]. In this case we can simplify
the notation further and write u = [u1, . . . , uP ]. Thus, since the p-th column of u is a scalar,
we have: ‖up‖ρ ≡ (|up|ρ)1ρ = |up|. Therefore, equation (1) reduces to:
Lp (up; ρ, τ) = (|up|+ τ × u) |up|ρ−1
=
[1 + τ × u
|up|
]× |up|ρ
= [1 + τ × sign (up)]× |up|ρ
= [1 + τ × (2I(up > 0)− 1)]× |up|ρ (2)
where I(.) is the indicator function: I(up > 0) = 1 if up > 0, I(up > 0) = 0 if up < 0,
I(up > 0) = 12
if up = 0.1 Then, letting τ = 2φ− 1, we can write equation (2) as:
Lp (up; ρ, τ) = [1 + τ × (2I(up > 0)− 1)]× |up|ρ
= [1 + (2φ− 1)× (2I(up > 0)− 1)]× |up|ρ
= 2 [1− φ+ (2φ− 1)× I(up > 0)]× |up|ρ
= 2 [1− φ+ τ × I(up > 0)]× |up|ρ (3)
Finally, since I(up > 0) = 1− I(up < 0), we have:
Lp (up; ρ, τ) = 2 [1− φ+ τ × I(up > 0)]× |up|ρ
= 2 [1− φ+ (2φ− 1)× (1− I(up < 0))]× |up|ρ
= 2 [φ+ (1− 2φ)× I(up < 0)]× |up|ρ (4)
Notice that expression (4) corresponds to the univariate flexible loss of Elliott et al. (2005).
1To derive equation (2) we have used the facts that sign(x) = x|x| for x 6= 0 and that sign(x) = 2I(u >
0)− 1.
3
The loss function is asymmetric for φ 6= 0.5. In particular, over-forecasting (negative forecast
errors) is costlier than under-forecasting for φ < 0.5. On the contrary, when φ > 0.5, positive
forecast errors (under-prediction) are more heavily weighed than negative forecast errors
(over-prediction).
Special cases. Figure A1 shows that equation (4) nests several special cases. For ρ = 1
equation (4) gives the lin-lin loss:
Lp (up; 1, φ) = 2 [φ+ (1− 2φ)× I(up < 0)]× |up| (5)
Optimal forecasts from the lin-lin loss are conditional quantiles (see Koenker and Bassett,
1978; Gneiting, 2011). The function is symmetric and boils down to the Mean Absolute
Error loss for φ = 0.5 (τ = 0):
Lp(up; 1,
1
2
)= |up| (6)
For ρ = 2 equation (4) gives the quad-quad loss:
Lp (up; 2, φ) = 2 [φ+ (1− 2φ)× I(up < 0)]× |up|2 (7)
Optimal forecasts from the quad-quad loss are expectiles (Newey and Powell, 1987). See
Bastianin et al. (2014) for the use of expectiles in a forecast horse-race. The function is
symmetric and boils down to the Mean Squared Error loss for φ = 0.5 (τ = 0):
Lp(up; 2,
1
2
)= u2
p (8)
4
Figure A1: Examples of flexible loss functions
u = 0
0
L(u
;;;?
)
; = 1;? < 0:5
Overprediction(y < f)
Underprediction(y > f)
u = 0
0
L(u
;;;?
)
; = 2;? < 0:5
Overprediction(y < f)
Underprediction(y > f)
u = 0
0
L(u
;;;?
)
; = 1;? = 0:5
Overprediction(y < f)
Underprediction(y > f)
u = 0
0
L(u
;;;?
); = 2;? = 0:5
Overprediction(y < f)
Underprediction(y > f)
u = 0
0
L(u
;;;?
)
; = 1;? > 0:5
Overprediction(y < f)
Underprediction(y > f)
u = 0
0
L(u
;;;?
)
; = 2;? > 0:5
Overprediction(y < f)
Underprediction(y > f)
Notes: The figure shows the flexible loss function of Elliott et al. (2005) for different ρ and φ. The flexible loss function is givenby: Lp (up; ρ, φ) = 2 [φ+ (1− 2φ)× I(up < 0)]× |up|ρ, where u = y − f are forecast errors and ρ ≥ 1 and 0 < φ < 1. The firstcolumn of plots shows lin-lin losses (ρ = 1) for different asymmetry levels (φ). The second column of plots shows quad-quadlosses (ρ = 2) for different asymmetry levels (φ). Plots in the first row show that over-prediction (i.e. y− f < 0) is costlier thanunder-prediction (i.e. y − f > 0) for φ < 0.5. Plots in the second row show that the loss is symmetric about forecast errorswhen φ = 0.5. Plots in the third row show that under-prediction is costlier than over-prediction for φ > 0.5.
5
A.2 Bivariate loss
Let us now move to the case H = 2, so that there are only two series of forecast errors, one-
and two-step ahead. In this case u = [u(1), u(2)]′ is a (2× P ) matrix with the p-th column
given by up = [u(1)p, u(2)p]′ for p = 1, . . . , P . Assuming that ρ = 2, the multivariate loss
function in equation (1) reduces to:
Lp (up; 2, τ ) =(‖up‖2 + τ ′u
)‖up‖2
=
(
2∑
h=1
u2(h)p
) 12
+ τu(1)p + τu(2)p
(
2∑
h=1
u2(h)p
) 12
=((u2
(1)p + u2(2)p
) 12 + τu(1)p + τu(2)p
) (u2
(1)p + u2(2)p
) 12
=(u2
(1)p + u2(2)p
)+ τ ×
(u(1)p + u(2)p
)×√(
u2(1)p + u2
(2)p
)(9)
As in the univariate case, the bivariate (and, more generally, the multivariate) loss includes
some special cases: when τ = 0 (φ = 0.5) and ρ = 2, we obtain the trace of the MSE loss
function, while for τ = 0 (φ = 0.5) and ρ = 1, equation (9) reduces to the trace of the
MAE loss function (see Zeng and Swanson, 1998). In both cases, symmetry also ensures
the multivariate loss to be additively separable in univariate losses. On the contrary, when
τ 6= 0 (φ 6= 0.5), the loss function becomes asymmetric and is not additively separable in
individual losses. This is illustrated for the bivariate case in Figure A2.
6
Figure A2: Bivariate flexible loss function and sum of individual losses: contour plots
; = 2; = < 0; (? < 0:5)
u(1)p = 0
u(2
)p=
0; = 2; = < 0; (? < 0:5)
u(1)p = 0
u(2
)p=
0; = 2; = = 0; (? = 0:5)
u(1)p = 0
u(2
)p=
0
; = 2; = = 0; (? = 0:5)
u(1)p = 0
u(2
)p=
0
; = 2; = > 0; (? > 0:5)
u(1)p = 0
u(2
)p=
0
; = 2; = > 0; (? > 0:5)
u(1)p = 0
u(2
)p=
0
Notes: figures in the left panel show the contour plots of the bivariate flexible loss functions of Elliott et al. (2005) for differentρ and τ . The right panel shows the iso-loss curves obtained as the sum of univariate flexible losses with the same ρ and τ .u = y − f are forecast errors and τ = 2φ − 1. Plots in the first row show that over-prediction (i.e. y − f < 0) is costlier thanunder-prediction (i.e. y − f > 0) for τ < 0 (φ < 0.5). Plots in the second row show that the loss is symmetric about forecasterrors when τ = 0 (φ = 0.5). Plots in the third row show that under-prediction is costlier than over-prediction for τ > 0(φ > 0.5).
7
B Descriptive statistics
B.1 Additional figures and descriptive statistics
Table B1 shows the sample average of the three series of daily call arrivals used in the
paper. It is evident from the second column that the Italian and the U.S. series are char-
acterized by very high daily call volumes, while the Israeli series consists of lower counts.
The third and fourth columns report the coefficient of variation of the series and of their
log-transformations. In all cases the logarithm seems to stabilize the variance of the series.
Figures B3, B4, B5 show the line plots of daily call arrivals series used in the paper and
the boxplots of their distribution over the days of the week and the months of the year.
Figure B6 shows the log-transformed daily call arrivals and fitted values from cubic
spline interpolation. The estimated autocorrelation of the three series are shown in Figure
B7.
8
Table B1: Descriptive statistics for daily call arrivals: Israel, Italy and the U.S.Country Avg. Yt C.V. Yt C.V. log Yt TIsrael 960 0.51930 0.14027 361Italy 31258 0.60040 0.08743 749U.S. 45145 0.43377 0.06117 893
Notes: the first column shows the sample average daily call arrivals (Yt), the second column reports the coefficient of variationof Yt (C.V. Yt), the third column shows the coefficient of variation of log Yt (C.V. log Yt). The coefficient of variation, as wellas the sample average, are calculated using the a whole sample of observations, whose size is shown in the fourth column.
Figure B3: Daily call arrivals for the Italian call center
08/09/2008 2009 2010 26/09/20100
25000
50000
75000
100000
num
ber
of c
alls
CallsAverage = 31258Closing days
(a) Daily call arrivals
Mon Tue Wed Thu Fri Sat Sun
0
25000
50000
75000
100000
num
ber
of c
alls
(b) Box plots of daily call arrivals per day foreach day of the week.
JanFebMarAprMayJun Jul AugSepOctNovDec
0
25000
50000
75000
100000
num
ber
of c
alls
(c) Box plots of daily call arrivals per month foreach month of the year.
Notes: panel (a) shows the daily call arrival series (continuous line), its sample average (horizontal dashed line) and closingdays (circles). In each box plot in panels (b) and (c), the central mark indicates the median, the bottom and top edges of thebox indicate the 25th and 75th percentiles, respectively. The two bars, known as whiskers, are located at a distance of 1.5 timesthe interquartile range (i.e. the height of the box) below the 25th percentile and above the 75th percentile. Daily observationsthat fall above or below these two bars are denoted with crosses.
9
Figure B4: Daily call arrivals for the Israeli call centre
01/01/1999 31/12/19990
500
1000
1500
2000
num
ber
of c
alls
CallsAverage = 960
(a) Daily call arrivals
Mon Tue Wed Thu Fri Sat Sun
0
500
1000
1500
num
ber
of c
alls
(b) Box plots of daily call arrivals per day foreach day of the week.
Jan FebMar AprMayJun Jul AugSep Oct NovDec
0
500
1000
1500
num
ber
of c
alls
(c) Box plots of daily call arrivals per month foreach month of the year.
Notes: panel (a) shows the daily call arrival series (continuous line) and its sample average (horizontal dashed line). In eachbox plot in panels (b) and (c), the central mark indicates the median, the bottom and top edges of the box indicate the 25thand 75th percentiles, respectively. The two bars, known as whiskers, are located at a distance of 1.5 times the interquartilerange (i.e. the height of the box) below the 25th percentile and above the 75th percentile. Daily observations that fall aboveor below these two bars are denoted with crosses.
10
Figure B5: Daily call arrivals for the U.S. call center
01/04/2001 2002 2003 26/10/20030
20000
40000
60000
80000
num
ber
of c
alls
CallsAverage = 45145
(a) Daily call arrivals
Mon Tue Wed Thu Fri Sat Sun
20000
40000
60000
80000
num
ber
of c
alls
(b) Box plots of daily call arrivals per day foreach day of the week.
JanFebMar AprMayJun Jul AugSepOctNovDec
20000
40000
60000
80000
num
ber
of c
alls
(c) Box plots of daily call arrivals per month foreach month of the year.
Notes: panel (a) shows the daily call arrival series (continuous line) and its sample average (horizontal dashed line). In eachbox plot in panels (b) and (c), the central mark indicates the median, the bottom and top edges of the box indicate the 25thand 75th percentiles, respectively. The two bars, known as whiskers, are located at a distance of 1.5 times the interquartilerange (i.e. the height of the box) below the 25th percentile and above the 75th percentile. Daily observations that fall aboveor below these two bars are denoted with crosses.
11
Figure B6: Log daily call arrivals and cubic spline fit
01/0
1/19
99
31/1
2/19
99
3
4
5
6
7
8
log
num
ber
of c
alls
log Calls Cubic Spline fit
(a) Israel
08/0
9/20
08
2009
2010
26/0
9/20
10
7
8
9
10
11
12
log
num
ber
of c
alls
log Calls Cubic Spline fit
(b) Italy
01/0
4/20
01
2002
2003
26/1
0/20
03
8.5
9
9.5
10
10.5
11
11.5
log
num
ber
of c
alls
log Calls Cubic Spline fit
(c) US
Notes: for each call arrival series the figure shows the (log) number of incoming calls and the fitted value using a cubic spline.Closing days, if present, are set equal to the sample average before log-transforming the series.
12
Figure B7: Estimated autocorrelation of daily call arrivals
0 7 14 21 28 35 42 49 56Lag
-0.5
0
0.5
1
Aut
ocor
rela
tion
of d
aily
call
arriv
als
(a) Israeli call center.
0 7 14 21 28 35 42 49 56Lag
-0.5
0
0.5
1A
utoc
orre
latio
n of
dai
lyca
ll ar
rival
s
(b) Italian call center.
0 7 14 21 28 35 42 49 56Lag
-0.5
0
0.5
1
Aut
ocor
rela
tion
of d
aily
call
arriv
als
(c) US call center.
Notes: dashed lines are 95% confidence bands based on standard error calculated as the inverse of the square root of the numberof observations.
13
B.2 ARCH effects and seasonality in variance
The presence of conditional heteroskedasticity and seasonality in the variance of daily call
arrivals is investigated in Table B2. For the ARMAX and SARMAX models we present a
Lagrange Multiplier test of the null hypothesis of conditional heteroskedasticity, carried out
as a test for the presence of serial correlation of order one in the squared residuals. Similarly,
the test of the null hypothesis of seasonality in the variance is implemented by estimating
an auxiliary regression of the squared residuals (from the ARMAX model or the SARMAX
model) on a constant and six day-of-week dummy variables. The p-values are based on the
F -test of the joint significance of the six dummy variables (see Franses and Paap, 2004).
Table B2: ARCH effects and seasonality in variance
Israel Italy USModel Test p-value p-value p-valueARMAX ARCH effects 0.9939 0.0000 0.0000
Seasonality in variance 0.0928 0.2270 0.0045SARMAX ARCH effects 0.9651 0.0000 0.0000
Seasonality in variance 0.0882 0.2228 0.0049
Notes: For the ARMAX and the SARMAX models we present a Lagrange Multiplier test of the null hypothesis of conditionalheteroskedasticity, carried out as a test for the presence of serial correlation of order one in the squared residuals. Similarly, thetest of the null hypothesis of seasonality in the variance is implemented by estimating an auxiliary regression of the squaredresiduals (from the ARMAX model or the SARMAX model) on a constant and six day-of-week dummy variables. The p-valuesare based on an F -test of the joint significance of the six dummy variables (see Franses and Paap, 2004). The tests are carriedout on the first estimation sample, that is on the first b0.55× T c observations, where b·c denotes the floor function.
B.3 Overdispersion test
Call arrival volume represents a count and are often modeled as a Poisson process with
time-varying arrival rate, µt. The underlying assumption is that there is large population of
potential customers, each of whom acts independently making calls with a very low prob-
ability (Ibrahim et al., 2016). Then, the total number of incoming calls in a given time
period is approximately Poisson: Yt ∼ Pois(µt). In Poisson regression models the standard
approach is to rely on µt = exp(x′tβ), where xt is a vector of regressors (i.e. in this case,
day-of-the-week dummies and the lagged call volume) and β is a vector of parameters. The
equidispersion property is implicit in the Poisson approach: the variance of the call arrival
volume in each period is equal to its expected value during the same time frame. This
14
property is often not consistent with the features of call arrival data, that could well be
over-dispersed (i.e. the variance is larger than the mean). Since Jung and Tremayne (2011)
have shown that when forecasting with count data a dominating modeling approach does
not exist, we consider three specifications based on the Exponential, Poisson and Negative
Binomial distributions, respectively.
While the Poisson modeling approach posits that the arrival rate, µt, is a deterministic
function of the regressors, we can deal with overdispersion by assuming the arrival rate to
be stochastic. In particular, we consider a Poisson distribution with a Gamma-distributed
arrival rate, which implies that the series of call arrivals follows a Negative Binomial distribu-
tion. Assuming λt = µtνt = exp(x′tβ)νt, where νt > 0 is Gamma-distributed with unit mean
and variance α, Yt then follows a Negative Binomial distribution with mean µt and variance
equal to µt(1 + αµt). Notice that the Negative Binomial distribution includes the Poisson
as a special case when α = 0. In fact, α controls the degree of overdispersion: the variance
exceeds the mean when µt > 0 and α > 0, while for α = 0, E(Yt|xt) = var(Yt|xt) = µt.
Following Cameron and Trivedi (1990) a statistical test of overdispersion can be con-
structed after estimating the Poisson model. The starting point is to specify overdispersion
of the form var(Yt|xt) = µt(1 +αµt). Notice that the previous expression corresponds to the
variance of Yt when it follows a Negative Binomial distribution. A test statistic for H0 : α = 0
against H1 : α 6= 0 can be computed by estimating the Poisson model, constructing the fitted
values µt = exp(x′tβ), the residuals et = Yt − µt and running the following auxiliary OLS
regression:
e2t − Yt = αµ2
t + εt (10)
where εt is an error term. We have implemented this test to check if the three call arrival
series we have considered in the paper are overdispersed. The second column of Table B3
presents the estimates of the overdispersion parameter from running the auxiliary regression
in equation (10). A low p-value in the third column denotes rejection of H0 : α = 0 and
hence provides evidence against equidispersion of the data. The fourth column presents
estimates of the degree of overdispersion implied by the Negative Binomial model. A low
p-value in the fifth column denotes rejection of H0 : α = 0 and hence provides evidence
15
against equidispersion of the data. The Poisson model underlying the auxiliary regression of
the overdispersion test and the Negative Binomial model include a day-of-the-week dummies
and the lagged dependent variable.
Table B3: Overdispersion test and estimates of the Negative Binomial modelOverdispersion test Negative Binomial model
country α p-value α p-valueIsrael 0.0338 (0.0000) 0.0967 (0.0000)Italy 0.0550 (0.0000) 0.5509 (0.0000)US 0.0108 (0.0000) 0.0204 (0.0000)
Notes: the second column presents the estimates of the overdispersion parameter from running the auxiliary regression inequation (10). A low p-value in the third column denotes rejection of H0 : α = 0 and hence provides evidence againstequidispersion of the data. The fourth column presents estimates of the degree of overdispersion implied by the NegativeBinomial model. A low p-value in the fifth column denotes rejection of H0 : α = 0 and hence provides evidence againstequidispersion of the data. The Poisson model underlying the auxiliary regression of the overdispersion test and the NegativeBinomial model include a day-of-the-week dummies and the lagged dependent variable.
References
Bastianin, A., Galeotti, M., and Manera, M. (2014). Causality and predictability in distri-
bution: The ethanol-food price relation revisited. Energy Economics, 42:152 – 160.
Cameron, A. C. and Trivedi, P. K. (1990). Regression-based tests for overdispersion in the
Poisson model. Journal of Econometrics, 46(3):347–364.
Elliott, G., Komunjer, I., and Timmermann, A. (2005). Estimation and testing of forecast
rationality under flexible loss. Review of Economic Studies, 72(4):1107–1125.
Elliott, G., Komunjer, I., and Timmermann, A. (2008). Biases in macroeconomic fore-
casts: Irrationality or asymmetric loss? Journal of the European Economic Association,
6(1):122–157.
Franses, P. H. and Paap, R. (2004). Periodic Time Series Models. Oxford University Press,
Oxford.
Gneiting, T. (2011). Quantiles as optimal point forecasts. International Journal of Forecast-
ing, 27(2):197 – 207.
16
Ibrahim, R., Ye, H., L’Ecuyer, P., and Shen, H. (2016). Modeling and forecasting call
center arrivals: A literature survey and a case study. International Journal of Forecasting,
32(3):865–874.
Jung, R. C. and Tremayne, A. R. (2011). Useful models for time series of counts or simply
wrong ones? AStA Advances in Statistical Analysis, 95(1):59–91.
Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46(1):33–50.
Komunjer, I. and Owyang, M. T. (2012). Multivariate forecast evaluation and rationality
testing. Review of Economics and Statistics, 94(4):1066–1080.
Newey, W. K. and Powell, J. L. (1987). Asymmetric least squares estimation and testing.
Econometrica, 55(4):819–847.
Zeng, T. and Swanson, N. R. (1998). Predictive evaluation of econometric forecasting models
in commodity futures markets. Studies in Nonlinear Dynamics & Econometrics, 2(4):159–
177.
17