
Bootstrap Prediction Bands for Functional Time Series

Efstathios Paparoditis
Department of Mathematics and Statistics, University of Cyprus

Han Lin Shang
Department of Actuarial Studies and Business Analytics, Macquarie University

Abstract

A bootstrap procedure for constructing prediction bands for a stationary functional time series is proposed. The procedure exploits a general vector autoregressive representation of the time-reversed series of Fourier coefficients appearing in the Karhunen-Loève representation of the functional process. It generates backward-in-time, functional replicates that adequately mimic the dependence structure of the underlying process in a model-free way and have the same conditionally fixed curves at the end of each functional pseudo-time series. The bootstrap prediction error distribution is then calculated as the difference between the model-free, bootstrap-generated future functional observations and the functional forecasts obtained from the model used for prediction. This allows the estimated prediction error distribution to account for the innovation and estimation errors associated with prediction and the possible errors due to model misspecification. We establish the asymptotic validity of the bootstrap procedure in estimating the conditional prediction error distribution of interest, and we also show that the procedure enables the construction of prediction bands that achieve (asymptotically) the desired coverage. Prediction bands based on a consistent estimation of the conditional distribution of the studentized prediction error process also are introduced. Such bands allow for taking more appropriately into account the local uncertainty of prediction. Through a simulation study and the analysis of two data sets, we demonstrate the capabilities and the good finite-sample performance of the proposed method.

Keywords: Functional prediction; Prediction error; Principal components; Karhunen-Loève expansion.

arXiv:2004.03971v2 [math.ST] 27 May 2021


1 Introduction

Functional time series consist of random functions observed at regular time intervals. They can be classified into two main categories depending on whether the continuum is also a time variable. First, functional time series can arise from measurements obtained by separating an almost continuous time record into consecutive intervals (e.g., days, weeks, or years; see, e.g., Hörmann and Kokoszka, 2012). We refer to such data structures as sliced functional time series, examples of which include daily price curves of a financial stock (Kokoszka et al., 2017) and intraday particulate matter (Shang, 2017). When the continuum is not a time variable, functional time series can also arise when observations over a period are considered as finite-dimensional realizations of an underlying continuous function (e.g., yearly age-specific mortality rates; see, e.g., Chiou and Müller, 2009; Hyndman and Shang, 2009).

In either case, the underlying strictly stationary stochastic process is denoted by X = {Xt, t ∈ Z}, where each Xt is a random element in a separable Hilbert space H, with values Xt(τ) and τ varying within a compact interval I ⊂ R. We assume that I = [0, 1] without loss of generality. Furthermore, we assume that H = L2, the set of (equivalence classes of) measurable, real-valued functions x(·) defined on [0, 1] and satisfying $\int_0^1 x^2(\tau)\,d\tau < \infty$. Central statistical issues include modeling of the temporal dependence of the functional random variables {Xt, t ∈ Z}, making inferences about parameters of interest, and predicting future values of the process when an observed stretch X1, X2, . . . , Xn is given. Not only is it vital to obtain consistent estimators, but also to estimate the uncertainty associated with such estimators, to construct confidence or prediction intervals, and to implement hypothesis tests (e.g., Horváth et al., 2014). When such inference problems arise in functional time series, a resampling methodology, especially bootstrapping, is an important alternative to standard asymptotic considerations. For independent and identically distributed (i.i.d.) functional data, bootstrap procedures have been considered, among others, by Cuevas et al. (2006); McMurry and Politis (2011); Goldsmith et al. (2013); Shang (2015); Paparoditis and Sapatinas (2016), where appropriate sampling from the observed sample is used to mimic sampling from the population. However, for functional time series, the existing temporal dependence between the random elements Xt significantly complicates matters, and the bootstrap must be appropriately adapted in order to be successful.

The development of bootstrap procedures for functional time series has received increasing attention in recent decades. In an early paper, Politis and Romano (1994) obtained weak convergence results for approximate sums of weakly dependent, Hilbert space-valued random variables. Dehling et al. (2015) also obtained weak convergence results for Hilbert space-valued random variables, which are assumed to be weakly dependent in the sense of near-epoch dependence, and showed consistency of a non-overlapping block bootstrap procedure. Rana et al. (2015) extended the stationary bootstrap to functional time series, Ferraty and Vieu (2011) applied a residual-based bootstrap procedure to construct confidence intervals for the regression function in a nonparametric setting, and Zhu and Politis (2017) to kernel estimators. Franke and Nyarige (2019) proposed a residual-based bootstrap procedure for functional autoregressions. Pilavakis et al. (2019) established theoretical results for the moving block and the tapered block bootstrap, Shang (2018) applied a maximum entropy bootstrap procedure, and Paparoditis (2018) proposed a sieve bootstrap for functional time series.

In this paper, we build on the developments mentioned above and focus on constructing prediction intervals or bands for a functional time series. To elaborate, suppose that for every t ∈ Z, the zero mean random element Xt is generated as

Xt = f (Xt−1,Xt−2, . . .) + εt, (1)

where f : H∞ → H is some appropriate operator and {εt} is a zero mean i.i.d. innovation process in H with E‖εt‖² < ∞ and ‖ · ‖ the norm of H. For simplicity, we write εt ∼ i.i.d.(0, Cε), where Cε = E(εt ⊗ εt) is the covariance operator of εt and ⊗ denotes the tensor operator, defined by (x ⊗ y)(·) = 〈x, ·〉y, for x, y ∈ H. Suppose that based on the last k observed functional elements, Xn, Xn−1, . . . , Xn−k+1, k < n, a predictor

X̂n+h = ĝ(h)(Xn, Xn−1, . . . , Xn−k+1),   (2)

of Xn+h is used, where h ∈ N is the prediction horizon and ĝ(h) : Hk → H some estimated operator. For instance, we may think of ĝ(h) as an estimator of the best predictor, i.e., of the conditional expectation E(Xn+h | Xn, Xn−1, . . . , Xn−k+1). The important case we have in mind, however, is that where a model

Xt = g(Xt−1,Xt−2, . . . ,Xt−k) + vt, (3)

is used to predict Xn+h. Here g : Hk → H is an unknown, linear bounded operator and vt ∼ i.i.d.(0, Cv). Using model (3), an h-step-ahead predictor can be obtained as

$$\widehat{\mathcal{X}}_{n+h} = \widehat{g}\big(\widehat{\mathcal{X}}_{n+h-1},\ \widehat{\mathcal{X}}_{n+h-2},\ \ldots,\ \widehat{\mathcal{X}}_{n+h-k}\big), \qquad (4)$$

where ĝ is an estimator of the operator g in (3) and X̂t ≡ Xt if t ∈ {n − k + 1, n − k + 2, . . . , n}. Notice that the predictor (4) can also be written as X̂n+h = ĝ(h)(Xn, Xn−1, . . . , Xn−k+1) for some appropriate operator ĝ(h). In particular, setting ĝ(1) = ĝ, it is easily seen that ĝ(h) = ĝ(ĝ(h−1), . . . , ĝ(1), Xn, . . . , Xn−k+h) for 2 ≤ h ≤ k and ĝ(h) = ĝ(ĝ(h−1), . . . , ĝ(h−k)) for h > k. We stress here the fact that we do not assume that model (3) used for prediction coincides with the true data generating process (1); that is, we allow for model misspecification. Observe that the simple case, where g is known up to a finite-dimensional vector of parameters, is also covered by the above setup.
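To make the recursion behind ĝ(h) concrete, the following short sketch (our own illustration, not part of the original text) iterates a fitted one-step map, feeding predicted curves back into the lag window. The map `g_hat`, the grid discretization, and all names are hypothetical placeholders.

```python
import numpy as np

def predict_h_steps(g_hat, X_hist, h, k):
    """Recursive h-step-ahead prediction from a fitted one-step map.

    g_hat  : callable taking a list of k curves (most recent first) and
             returning the predicted next curve (each curve is a 1-d numpy
             array of function values on a common grid).
    X_hist : (n, J) array of observed curves X_1, ..., X_n.
    h, k   : prediction horizon and model order.
    """
    window = [X_hist[-j] for j in range(1, k + 1)]   # last k curves, most recent first
    pred = None
    for _ in range(h):
        pred = g_hat(window)                         # one-step prediction
        window = [pred] + window[:-1]                # predicted curve enters the window
    return pred

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J, n, k = 50, 100, 2
    X = rng.standard_normal((n, J))
    W = [0.4 * np.eye(J), 0.2 * np.eye(J)]           # placeholder lag operators
    g_hat = lambda curves: sum(Wj @ c for Wj, c in zip(W, curves))
    print(predict_h_steps(g_hat, X, h=3, k=k).shape) # (50,)
```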

As already mentioned, our aim is to construct a prediction band for Xn+h associated with the predictor X̂n+h. That is, given the part Xn,k = (Xn, Xn−1, . . . , Xn−k+1) of the functional time series observed and for any α ∈ (0, 1), we want to construct a band denoted by {[X̂n+h(τ) − Ln,h(τ), X̂n+h(τ) + Un,h(τ)], τ ∈ [0, 1]} such that

$$\lim_{n\to\infty} P\Big(\widehat{\mathcal{X}}_{n+h}(\tau) - L_{n,h}(\tau) \le \mathcal{X}_{n+h}(\tau) \le \widehat{\mathcal{X}}_{n+h}(\tau) + U_{n,h}(\tau),\ \text{for all } \tau \in [0,1]\ \Big|\ \mathcal{X}_{n,k}\Big) = 1 - \alpha.$$

Toward this end we focus on the estimation of the conditional distribution of the prediction error En+h = Xn+h − X̂n+h given Xn,k, which is a key quantity for the construction of the prediction band of interest. Using (4), this error can be decomposed as

$$\begin{aligned}
\mathcal{E}_{n+h} := \mathcal{X}_{n+h} - \widehat{\mathcal{X}}_{n+h} &= \varepsilon_{n+h} \\
&\quad + \big[\, f(\mathcal{X}_{n+h-1}, \mathcal{X}_{n+h-2}, \ldots) - g(\mathcal{X}_{n+h-1}, \mathcal{X}_{n+h-2}, \ldots, \mathcal{X}_{n+h-k}) \,\big] \\
&\quad + \big[\, g(\mathcal{X}_{n+h-1}, \mathcal{X}_{n+h-2}, \ldots, \mathcal{X}_{n+h-k}) - \widehat{g}(\widehat{\mathcal{X}}_{n+h-1}, \widehat{\mathcal{X}}_{n+h-2}, \ldots, \widehat{\mathcal{X}}_{n+h-k}) \,\big] \\
&= \mathcal{E}_{I,n+h} + \mathcal{E}_{M,n+h} + \mathcal{E}_{E,n+h},
\end{aligned}$$

with an obvious notation for EI,n+h, EM,n+h and EE,n+h. Notice that EI,n+h is the error attributable to the i.i.d. innovation, EM,n+h is the model specification error, and EE,n+h is the error attributable to the estimation of the unknown operator g and of the random elements Xn+h−1, . . . , Xn+h−k used for h-step prediction. Observe that if h = 1 then EE,n+1 only depends on the estimation error ĝ − g. Furthermore, if ĝ is a consistent estimator of g, for instance, if $\|\widehat{g} - g\|_{\mathcal{L}} \stackrel{P}{\to} 0$, with ‖ · ‖L being the operator norm, the part of the estimation error EE,n+h which is due to the estimator ĝ is asymptotically negligible. On the contrary, the misspecification error EM,n+h may not vanish asymptotically if the model used for prediction is different from the one generating the data.

To better illustrate the above discussion, consider the following example. Suppose that Xt is generated according to the FAR(2) model Xt = Φ1(Xt−1) + Φ2(Xt−2) + εt, Φ2 ≠ 0, and that a FAR(1) model Xt = R(Xt−1) + vt is used for prediction, where Φ1, Φ2, and R are appropriate operators. In general, R ≠ Φ1. With R̂ denoting an estimator of R, the prediction error En+h can be decomposed as

$$\mathcal{X}_{n+h} - \widehat{\mathcal{X}}_{n+h} = \varepsilon_{n+h} + \big(\Phi_1(\mathcal{X}_{n+h-1}) - R(\mathcal{X}_{n+h-1}) + \Phi_2(\mathcal{X}_{n+h-2})\big) + \big(R(\mathcal{X}_{n+h-1}) - \widehat{R}(\widehat{\mathcal{X}}_{n+h-1})\big).$$

Consider now the conditional distribution En+h | Xn. Notice that if h = 1, the model specification error (Φ1(Xn) − R(Xn) + Φ2(Xn−1)) causes a shift in this conditional distribution due to the term Φ1(Xn) − R(Xn), as well as an increase in variability due to the term Φ2(Xn−1). Similarly, for h ≥ 2 the model specification error (Φ1(Xn+h−1) − R(Xn+h−1) + Φ2(Xn+h−2)) does not vanish asymptotically. Furthermore, for h ≥ 2, the error (R(Xn+h−1) − R̂(X̂n+h−1)) is not only due to the estimator R̂ of R (as in the case h = 1), but also due to the fact that the unknown random element Xn+h−1 has been replaced by its predictor X̂n+h−1. This causes a further increase in variability.

An appropriate procedure to construct prediction intervals or bands should consider all three aforementioned sources affecting the prediction error and consistently estimate the conditional distribution En+h | Xn, Xn−1, . . . , Xn−k+1. However, and to the best of our knowledge, this issue has not been appropriately explored in the literature. In particular, and even in the most studied univariate, real-valued case, it is common to estimate the prediction error distribution by ignoring the model specification error, that is, by assuming that the model used for prediction is identical to the data generating process. Consequently, the bootstrap approaches applied in this context use the same model to make the prediction and to generate the bootstrap pseudo-time series. Such approaches ignore the model misspecification error; see Thombs and Schucany (1990), Breidt et al. (1995), Alonso et al. (2002), Pascual et al. (2004) as well as Pan and Politis (2016) and the references therein. See also Section 3 for more details.

In this paper, we develop a bootstrap procedure to construct prediction bands for functional time series that appropriately takes into account all three sources of errors affecting the conditional distribution of En+h. The proposed bootstrap approach generates, in a model-free way, pseudo-replicates X∗1, X∗2, . . . , X∗n, and X∗n+1, X∗n+2, . . . , X∗n+h of the functional time series at hand that appropriately mimic the dependence structure of the underlying functional process. Moreover, the approach ensures that the generated functional pseudo-time series has the same k functions at the end as the functional time series observed; that is, X∗t = Xt holds true for t ∈ {n − k + 1, n − k + 2, . . . , n}. This is important because, as already mentioned, it is the conditional distribution of En+h given Xn, Xn−1, . . . , Xn−k+1 in which we are interested. These requirements are fulfilled by generating the functional pseudo-elements X∗1, X∗2, . . . , X∗n using a backward-in-time vector autoregressive representation of the time-reversed process of scores appearing in the Karhunen-Loève representation (see Section 2 for details). Given the model-free, bootstrap-generated functional pseudo-time series X∗1, X∗2, . . . , X∗n and X∗n+1, . . . , X∗n+h, the same model used to obtain the predictor X̂n+h = ĝ(X̂n+h−1, . . . , X̂n+h−k), see (4), is then applied, and the pseudo-predictor X̂∗n+h = ĝ∗(X̂∗n+h−1, . . . , X̂∗n+h−k) is obtained. Here, X̂∗t = X∗t = Xt if t ∈ {n, n − 1, . . . , n − k + 1} and ĝ∗ denotes the same estimator as ĝ but based on the bootstrap functional pseudo-time series X∗1, X∗2, . . . , X∗n. The conditional (on Xn,k) distribution of the prediction error Xn+h − X̂n+h is then estimated using the conditional distribution of the bootstrap prediction error X∗n+h − X̂∗n+h. We show that the described procedure leads to consistent estimates of the conditional distribution of interest. We also prove the consistency of the bootstrap in estimating the conditional distribution of the studentized prediction error process in H. The latter consistency is important because it theoretically justifies the use of the proposed bootstrap method in the construction of simultaneous prediction bands for the h-step-ahead prediction that also appropriately account for the local variability of the corresponding prediction error. Using simulations and two empirical data applications, we demonstrate the good finite-sample performance of the proposed bootstrap procedure.

We perform dimension reduction via the truncation of the Karhunen-Loève representation, and we capture the infinite-dimensional structure of the underlying functional process by allowing the number of principal components used to increase to infinity (at an appropriate rate) with n. These aspects of our procedure are common to the sieve bootstrap introduced in Paparoditis (2018). However, and apart from the differences in the technical tools used for establishing bootstrap validity, a novel, general backward autoregressive representation of the vector process of scores is introduced, which is a key part of the bootstrap procedure proposed in this paper. This representation allows for the generation of the functional pseudo-time series X∗1, X∗2, . . . , X∗n backward in time and therefore enables this pseudo-time series to satisfy the condition X∗t = Xt for t ∈ {n − k + 1, n − k + 2, . . . , n}. The latter condition is essential for successfully evaluating the conditional distribution of the prediction error En+h | Xn,k using the bootstrap and for the construction of the desired prediction bands. This condition, together with the aforementioned backward vector autoregressive representation of the score process and the focus on the prediction error distribution, also constitutes the main difference to the resampling approach considered in Shang (2018), which has been used for estimating the long-run variance. That approach is based on bootstrapping the principal component scores by maximum entropy.

Antoniadis et al. (2006) and Antoniadis et al. (2016) considered nonparametric (kernel-type), one-step-ahead predictors and proposed a resampling method to construct pointwise prediction intervals. Model-based bootstrap procedures to construct pointwise prediction intervals for one-step-ahead prediction under (mainly) FAR(1) model assumptions on the data generating process have been considered in Rana et al. (2016) and Vilar et al. (2018). These approaches differ from ours. They are developed for particular predictors, they are designed for pointwise prediction intervals only, and the bootstrap approaches involved are designed only for the specific prediction setting considered. More related to our approach, in terms of not requiring particular model assumptions for the data generating process and of not being designed for a specific predictor, is the approach proposed in Aue et al. (2015) for the construction of prediction bands. In Section 5 we compare the performance of this approach with the bootstrap approach proposed in this paper.

This paper is organized as follows. In Section 2, we state the notation used and introduce the notion of backward vector autoregressive representations of the time-reversed vector process of scores appearing in the Karhunen-Loève representation. In Section 3, we present the proposed bootstrap procedure and show its asymptotic validity for the construction of simultaneous prediction bands. Section 4 is devoted to some practical issues related to the construction of prediction bands and the implementation of our procedure. Section 5 investigates the finite sample performance of the proposed bootstrap procedure using simulations, while in Section 6, applications of the new methodology to two real-life data sets are considered. Conclusions are provided in Section 7. Proofs and auxiliary lemmas are given in the Appendix and in the Supplementary Material.

2 Preliminaries

2.1 Setup and Examples of Predictors

Consider a time series X1, X2, . . . , Xn stemming from a stationary, L4-M-approximable stochastic process X = {Xt, t ∈ Z} with mean E(Xt) = 0 and autocovariance operator Cr = E(Xt ⊗ Xt+r), r ∈ Z. Recall that Cr is a Hilbert-Schmidt (HS) operator. The L4-M-approximability property allows for a weak dependence structure of the underlying functional process, which covers a wide range of commonly used functional time series models, including functional linear processes and functional autoregressive, conditional heteroscedasticity processes (see Hörmann and Kokoszka, 2010, for details). L4-M-approximability implies that ∑r∈Z ‖Cr‖HS < ∞ and, therefore, that the functional process X possesses a continuous, self-adjoint spectral density operator Fω, given by

$$\mathcal{F}_\omega = (2\pi)^{-1} \sum_{r\in\mathbb{Z}} C_r\, e^{-i r \omega}, \qquad \omega \in \mathbb{R},$$

which is trace class (Hörmann et al., 2015) (see also Panaretos and Tavakoli (2013) for a different set of weak dependence conditions on the functional process X). Here and in the sequel, ‖ · ‖HS denotes the Hilbert-Schmidt norm of an operator while ‖ · ‖F denotes the Frobenius norm of a matrix. We assume that the eigenvalues ν1(ω), ν2(ω), . . . , νm(ω) of the spectral density operator Fω are strictly positive for every ω ∈ [0, π].

Suppose that the h-step-ahead predictor of Xn+h is obtained as

$$\widehat{\mathcal{X}}_{n+h} = \widehat{g}^{(h)}(\mathcal{X}_n, \ldots, \mathcal{X}_{n-k+1}), \qquad (5)$$

where k ∈ N, k < n, is fixed and determined by the model selected to perform the prediction (see also (3)), while ĝ(h) denotes an estimator of the unknown operator g(h). Based on X̂n+h, our aim is to construct a prediction interval, respectively a prediction band, for Xn+h associated with the model (5) which is used for prediction. Toward this end, an estimator of the distribution of the prediction error En+h = Xn+h − X̂n+h is needed. More precisely, we are interested in estimating the conditional distribution

$$\mathcal{E}_{n+h} \,\big|\, \mathcal{X}_n, \mathcal{X}_{n-1}, \ldots, \mathcal{X}_{n-k+1}. \qquad (6)$$

Since we do not want to restrict our considerations to a specific predictor ĝ, many of the predictors applied in the functional time series literature fit in our setup. We elaborate on some examples:

1) Suppose that in (3) the operator g is given by g(Xn, . . . , Xn−k+1) = ∑_{j=1}^{k} Φj(Xn+1−j), with the Φj's being linear, bounded operators Φj : H → H. This is the case where a functional autoregressive model of order k (FAR(k)) is used to predict Xn+h; see Kokoszka and Reimherr (2013b), in which the issue of the selection of the order k also is discussed. Given some estimators Φ̂j of Φj, the corresponding h-step-ahead predictor is given by ĝ(X̂n+h−1, . . . , X̂n+h−k) = ∑_{j=1}^{k} Φ̂j(X̂n+h−j), where X̂t ≡ Xt if t ∈ {n, n − 1, . . . , n − k + 1}. A special case is the popular FAR(1) model, in which it is assumed that Xt is generated as Xt = Φ(Xt−1) + vt with ‖Φ‖L < 1 and vt an i.i.d. sequence in H (Bosq, 2000; Bosq and Blanke, 2007). Here and in the sequel, ‖ · ‖L denotes the operator norm.


2) Suppose that g(h)(Xn, . . . , Xn−k+1) = ∑_{j=1}^{d} 1⊤_j ∑_{l=1}^{k} Dl ξn+h−l vj, where 1j is the d-dimensional vector with the jth component equal to 1 and 0 elsewhere, ξt is the d-dimensional vector ξt = (〈Xt, vj〉, j = 1, 2, . . . , d)⊤, the vj are the orthonormal eigenfunctions corresponding to the d largest eigenvalues of the lag-0 covariance operator C0 = E(X0 ⊗ X0), and (D1, D2, . . . , Dk) are the matrices obtained by the orthogonal projection of ξt on the space spanned by (ξt−1, ξt−2, . . . , ξt−k). A predictor of Xn+h is then obtained as

$$\widehat{\mathcal{X}}_{n+h} = \sum_{j=1}^{d} \mathbf{1}_j^{\top}\, \widehat{\xi}_{n+h}\, \widehat{v}_j,$$

where ξ̂n+h = ∑_{l=1}^{k} D̂l ξ̂n+h−l (with ξ̂n+h−l equal to the estimated score vector whenever n + h − l ≤ n), ξ̂1, . . . , ξ̂n are the estimated d-dimensional score vectors ξ̂t = (〈Xt, v̂j〉, j = 1, 2, . . . , d)⊤, v̂j are the estimated orthonormal eigenfunctions corresponding to the d largest estimated eigenvalues of Ĉ0 = n−1 ∑_{t=1}^{n} (Xt − X̄n) ⊗ (Xt − X̄n), and (D̂l, l = 1, 2, . . . , k) are the estimated d × d matrices obtained by least squares fitting of a kth order vector autoregression to the time series ξ̂t, t = 1, 2, . . . , n (Aue et al., 2015); a discretized sketch of this predictor is given after these examples.

3) Similar to 2), the predictor of Xn+h can be obtained as X̂n+h = ∑_{j=1}^{d} ξ̂n+h,j v̂j, where ξ̂n+h,j is an h-step-ahead predictor of the jth score component, obtained via a univariate time series forecasting method applied to the estimated components (ξ̂1,j, . . . , ξ̂n,j) for each j = 1, . . . , d (Hyndman and Shang, 2009).

4) Let, for notational simplicity, k = 1 and g(h)(Xn) = E(Xn+h | Xn) be the conditional mean function of Xn+h given Xn. Consider the predictor X̂n+h obtained using a nonparametric estimator of g(h), for instance, a functional version of the Nadaraya-Watson estimator given by

$$\widehat{g}^{(h)}(\mathcal{X}) = \frac{\sum_{i=1}^{n-h} K\big[d(\mathcal{X}_i, \mathcal{X})/\delta\big]\, \mathcal{X}_{i+h}}{\sum_{j=1}^{n-1} K\big[d(\mathcal{X}_j, \mathcal{X})/\delta\big]},$$

where K(·) is a kernel function, δ > 0 is a smoothing bandwidth, and d(·, ·) is a distance function on H. X̂n+h = ĝ(h)(Xn) is then the predictor of Xn+h (see, e.g., Antoniadis et al., 2006).
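As a concrete illustration of the predictor in example 2), the sketch below (ours, not taken from the paper) estimates the leading eigenfunctions of the empirical lag-0 covariance operator on a grid, fits a VAR(k) to the score vectors by least squares, and maps the recursively forecast scores back to a curve. All names are hypothetical, and integrals are replaced by simple Riemann sums on an equispaced grid.

```python
import numpy as np

def fpca_var_predictor(X, d=3, k=2, h=1):
    """Sketch of an FPCA + VAR(k) h-step-ahead predictor on discretized curves.

    X : (n, J) array, rows are centered curves observed on an equispaced grid.
    Returns the predicted curve of length J.
    """
    n, J = X.shape
    w = 1.0 / J                                     # quadrature weight of the grid
    C0 = (X.T @ X) / n                              # pointwise covariance matrix
    eigval, eigvec = np.linalg.eigh(C0 * w)         # discretized covariance operator
    idx = np.argsort(eigval)[::-1][:d]              # d largest eigenvalues
    V = eigvec[:, idx] / np.sqrt(w)                 # eigenfunctions, L2-normalized
    scores = X @ V * w                              # (n, d) score vectors

    # least squares fit of a VAR(k) to the score series
    Y = scores[k:]                                  # responses
    Z = np.hstack([scores[k - l:n - l] for l in range(1, k + 1)])  # lagged design
    D = np.linalg.lstsq(Z, Y, rcond=None)[0]        # (k*d, d) stacked coefficients

    # recursive h-step-ahead forecast of the scores
    hist = scores.copy()
    for _ in range(h):
        z = np.hstack([hist[-l] for l in range(1, k + 1)])
        hist = np.vstack([hist, z @ D])
    return hist[-1] @ V.T                           # map forecast scores back to a curve

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 60)).cumsum(axis=1) / 8.0
    X -= X.mean(axis=0)                             # center the sample
    print(fpca_var_predictor(X, d=3, k=2, h=1).shape)   # (60,)
```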

2.2 The Time-Reversed Process of Scores

To introduce the proposed bootstrap procedure, it is important to first discuss some properties of the time-reversed process of scores associated with the functional process X. To this end, consider for m ∈ N the m-dimensional vector process of scores, that is, ξ = {ξt, t ∈ Z}, where ξt = (ξj,t = 〈Xt, vj〉, j = 1, 2, . . . , m)⊤ and v1, v2, . . . are the orthonormal eigenvectors corresponding to the eigenvalues λ1 > λ2 > . . ., in descending order, of the lag-0 autocovariance operator C0. Denote by ξ̃ = {ξ̃t, t ∈ Z} the time-reversed version of ξ, that is, ξ̃t = ξ−t for any t ∈ Z. We call ξ and ξ̃ the forward and the backward score processes, respectively. The autocovariance structure of both processes is closely related because for any h ∈ Z we have

$$\Gamma_{\tilde{\xi}}(h) := E\big[\tilde{\xi}_0(m)\, \tilde{\xi}_h^{\top}(m)\big] = E\big[\xi_0(m)\, \xi_{-h}^{\top}(m)\big] =: \Gamma_{\xi}(-h). \qquad (7)$$


Thus, properties of the forward score process ξ, which arise from its second-order structure, carry over to the backward process ξ̃. To elaborate, note first that the (Hilbert-Schmidt) norm summability of the autocovariance operators Ch as well as the assumption that the eigenvalues ν1(ω), ν2(ω), . . . , νm(ω) of the spectral density operator Fω are bounded away from zero for all ω ∈ [0, π] imply that the m × m spectral density matrix fξ(ω) = (2π)−1 ∑h∈Z Γξ(h)e−ihω of the forward score process ξ is continuous, bounded from above, and bounded away from zero from below (see Lemma 2.1 of Paparoditis, 2018). The same properties also hold true for the m × m spectral density matrix fξ̃(ω) = (2π)−1 ∑h∈Z Γξ̃(h)e−ihω of the backward score process ξ̃. This follows immediately from the corresponding and aforementioned properties of fξ, taking into account that, by equation (7), fξ̃(ω) = f⊤ξ(ω) for all ω ∈ [0, π]. Now, the fact that both spectral density matrices fξ and fξ̃ are bounded from above and from below implies, by Lemma 3.5 of Cheng and Pourahmadi (1993, p. 116), that both processes, the process ξ and the time-reversed process ξ̃, obey a so-called vector autoregressive representation. That is, infinite sequences of m × m matrices {Aj, j ∈ N} and {Bj, j ∈ N} as well as full rank m-dimensional white noise processes {et, t ∈ Z} and {ut, t ∈ Z} exist such that the random vectors ξt and ξ̃t have, respectively, the following autoregressive representations:

$$\xi_t = \sum_{j=1}^{\infty} A_j\, \xi_{t-j} + e_t \qquad (8)$$

and

$$\tilde{\xi}_t = \sum_{j=1}^{\infty} B_j\, \tilde{\xi}_{t-j} + u_t. \qquad (9)$$

We refer to (8) and (9) as the forward and backward vector autoregressive representations of ξt and ξ̃t, respectively, and to {et} and {ut} as the forward and the backward noise processes. We stress here the fact that representations (8) and (9) should not be confused with that of a linear (infinite order) vector autoregressive process. This is due to the fact that the noise vector processes {et} and {ut} appearing in representations (8) and (9), respectively, are only uncorrelated and not necessarily i.i.d. sequences of random vectors. Furthermore, the autoregressive matrices {Aj} and {Bj} appearing in the above representations also satisfy the summability conditions ∑_{j=1}^{∞} ‖Aj‖F < ∞ and ∑_{j=1}^{∞} ‖Bj‖F < ∞, while the corresponding power series

$$A(z) = I - \sum_{j=1}^{\infty} A_j z^j \quad \text{and} \quad B(z) = I - \sum_{j=1}^{\infty} B_j z^j$$

do not vanish for |z| ≤ 1; that is, A−1(z) and B−1(z) exist for all |z| ≤ 1 (see Cheng and Pourahmadi, 1993; Meyer and Kreiss, 2015, for more details on such vector autoregressive representations of weakly stationary processes). Using reversion in time and, specifically, the property that ξ̃t = ξ−t, equation (9) leads to the expression

$$\xi_t = \sum_{j=1}^{\infty} B_j\, \xi_{t+j} + u_t, \qquad (10)$$

which also can be written as B(L−1)ξt = ut, with the shift operator L defined by Lkξt = ξt−k for any k ∈ Z. Expression (10) implies that the two white noise innovation processes {et, t ∈ Z} and {ut, t ∈ Z} are related by

$$u_t = B(L^{-1})\,\xi_t = B(L^{-1})\,A^{-1}(L)\,e_t, \qquad t \in \mathbb{Z}. \qquad (11)$$


Notice that (11) generalizes to the vector autoregressive case an analogous expression obtained for the univariate autoregressive case by Findley (1986) and Breidt et al. (1995). Further, and as relation (11) verifies, even if ξt in (8) is a linear process, that is, even if {et} is an i.i.d. innovation process in Rm, the white noise innovation process {ut} appearing in the time-reversed process (10) is, in general, not an i.i.d. process.
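Relation (7) is easy to check numerically. The following toy sketch (ours, with made-up VAR(1) scores and hypothetical names) verifies that the sample autocovariance of the time-reversed score series at lag h coincides with that of the original series at lag −h.

```python
import numpy as np

def sample_autocov(xi, h):
    """Sample autocovariance matrix Gamma(h) = E[xi_t xi_{t+h}^T] of a
    (zero-mean) multivariate series xi of shape (n, m)."""
    n = xi.shape[0]
    if h >= 0:
        return xi[:n - h].T @ xi[h:] / n
    return xi[-h:].T @ xi[:n + h] / n

rng = np.random.default_rng(0)
n, m = 5000, 3
A = np.array([[0.5, 0.1, 0.0], [0.0, 0.4, 0.1], [0.1, 0.0, 0.3]])  # placeholder coefficients
xi = np.zeros((n, m))
for t in range(1, n):
    xi[t] = A @ xi[t - 1] + rng.standard_normal(m)
xi_rev = xi[::-1]                       # time-reversed (backward) score series

h = 2
print(np.max(np.abs(sample_autocov(xi_rev, h) - sample_autocov(xi, -h))))  # ~0
```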

3 Bootstrap Prediction Intervals

3.1 Bootstrap Procedure

The basic idea of the proposed bootstrap procedure is to generate a functional time series of pseudo-random elements X∗1, X∗2, . . . , X∗n, and future values X∗n+1, X∗n+2, . . . , X∗n+h, which appropriately imitate the dependence structure of the functional time series at hand, while at the same time satisfying the condition

$$\mathcal{X}^{*}_{n-k+1} = \mathcal{X}_{n-k+1},\quad \mathcal{X}^{*}_{n-k+2} = \mathcal{X}_{n-k+2},\quad \ldots,\quad \mathcal{X}^{*}_{n} = \mathcal{X}_{n}. \qquad (12)$$

The above condition is important because, as we have seen, the conditional distribution of En+1(·) given Xn, Xn−1, . . . , Xn−k+1 is the one in which we are interested. Toward this goal, and motivated by the functional sieve bootstrap proposed by Paparoditis (2018), we use the Karhunen-Loève representation and decompose the random element Xt in two parts:

$$\mathcal{X}_t = \sum_{j=1}^{\infty} \xi_{j,t}\, v_j = \underbrace{\sum_{j=1}^{m} \xi_{j,t}\, v_j}_{\mathcal{X}_{t,m}} + \underbrace{\sum_{j=m+1}^{\infty} \xi_{j,t}\, v_j}_{U_{t,m}}. \qquad (13)$$

In (13), the element Xt,m is considered as the main driving part of Xt, while the "remainder" Ut,m is treated as a white noise component. Now, to generate the functional pseudo-time series X∗1, X∗2, . . . , X∗n, we first bootstrap the m-dimensional time series of scores by using the backward vector autoregressive representation given in (10). Using the backward representation allows for the generation of a pseudo-time series of scores ξ∗1, ξ∗2, . . . , ξ∗n which satisfies the condition ξ∗t = ξt for t ∈ {n − k + 1, n − k + 2, . . . , n}. This is important to ensure that the bootstrap-generated time series X∗1, X∗2, . . . , X∗n fulfills requirement (12). The backwards-in-time-generated pseudo-time series of scores ξ∗1, ξ∗2, . . . , ξ∗n can then be transformed to pseudo-replicates of the main driving part Xt,m by using the equation X∗t,m = ∑_{j=1}^{m} ξ∗j,t vj. Notice that since by construction ξ∗t = ξt for t = n, n − 1, . . . , n − k + 1, we have that X∗t,m = Xt,m and, consequently, we set X∗t = Xt for the same set of time indices. Adding to the generated X∗t,m, for the remaining indices t = n − k, n − k − 1, . . . , 1, an appropriately resampled functional noise U∗t,m leads to the functional pseudo-replicates X∗1, X∗2, . . . , X∗n−k. As a result, a functional pseudo-time series X∗1, X∗2, . . . , X∗n can be obtained that imitates the dependence structure of X1, X2, . . . , Xn and at the same time satisfies (12). Notice that implementation of the above ideas requires estimation of the eigenvectors vj and of the scores ξj,t = 〈Xt, vj〉, because these quantities are not observed (see Section 3.2 for details).
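The decomposition (13) is straightforward to mirror on discretized data; the sketch below (our own illustration, with hypothetical names and a simple Riemann-sum inner product) splits each centered curve into its m-dimensional principal component part and the remainder that the bootstrap treats as functional noise.

```python
import numpy as np

def kl_split(Y, m):
    """Split centered curves Y (n, J) into X_{t,m} (first m components) and U_{t,m}."""
    n, J = Y.shape
    w = 1.0 / J                                  # quadrature weight on the grid
    C0 = (Y.T @ Y) / n                           # empirical covariance kernel
    eigval, eigvec = np.linalg.eigh(C0 * w)
    order = np.argsort(eigval)[::-1][:m]
    V = eigvec[:, order] / np.sqrt(w)            # estimated eigenfunctions v_1,...,v_m
    scores = Y @ V * w                           # estimated scores xi_{j,t}
    X_m = scores @ V.T                           # main driving part  X_{t,m}
    U_m = Y - X_m                                # remainder          U_{t,m}
    return scores, V, X_m, U_m

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    Y = rng.standard_normal((150, 40)).cumsum(axis=1) / 6.0
    Y -= Y.mean(axis=0)
    scores, V, X_m, U_m = kl_split(Y, m=4)
    print(scores.shape, X_m.shape, U_m.shape)    # (150, 4) (150, 40) (150, 40)
```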

Before proceeding with a precise description of the bootstrap algorithm, we illustrate its capability using a data example. Figure 1 shows the monthly sea surface temperatures for the last three years (analyzed in Section 6) together with 1,000 bootstrap replications obtained when k = 1 and using the bootstrap algorithm described in Section 3.2. Notice the asymmetric features of the time series paths generated and the fact that all 1,000 bootstrap samples displayed pass through the same final curve. That is, all generated bootstrap functional time series satisfy condition (12), which for the case k = 1 reduces to X∗n = Xn.



Figure 1: Sea surface temperature in El Niño region 1+2 displayed from January 2014 to December 2018 (black line), together with 1,000 different bootstrap samples (gray lines) when k = 1.

3.2 Bootstrap Algorithm

We now proceed with a detailed description of the bootstrap algorithm used to generate the functional pseudo-time series X∗1, X∗2, . . . , X∗n and X∗n+1, X∗n+2, . . . , X∗n+h. Steps 1 to 3 of the following algorithm concern the generation of the future pseudo-elements X∗n+1, X∗n+2, . . . , X∗n+h, while Steps 4 to 6 concern the generation of X∗1, . . . , X∗n.

Step 1: Center the observed functional time series by calculating Yt = Xt − X̄n, where X̄n = n−1 ∑_{t=1}^{n} Xt.

Step 2: Select integers m and p, where m is the truncation number in (13) and p the order used to approximate the infinite-order vector autoregressive representations (8) and (9). Denote by ξ̂1, ξ̂2, . . . , ξ̂n the time series of estimated, m-dimensional vectors of scores; that is,

$$\widehat{\xi}_t = \big(\langle Y_t, \widehat{v}_j\rangle,\ j = 1, 2, \ldots, m\big)^{\top}, \qquad t = 1, 2, \ldots, n,$$

where v̂j, j = 1, 2, . . . , m, are the estimated (up to a sign) orthonormal eigenfunctions corresponding to the m largest estimated eigenvalues of the lag-0 sample autocovariance operator Ĉ0 = n−1 ∑_{t=1}^{n} Yt ⊗ Yt.

Step 3: Fit a VAR(p) process to the "forward" series of estimated scores; that is, ξ̂t = ∑_{j=1}^{p} Âj,p ξ̂t−j + êt, t = p + 1, p + 2, . . . , n, with êt being the estimated residuals. Generate ξ∗n+h = ∑_{l=1}^{p} Âl,p ξ∗n+h−l + e∗n+h, where we set ξ∗n+h−l = ξ̂n+h−l if n + h − l ≤ n, and e∗n+h is i.i.d. resampled from the set of centered residuals {êt − ēn, t = p + 1, p + 2, . . . , n}, ēn = (n − p)−1 ∑_{t=p+1}^{n} êt. Calculate

$$\mathcal{X}^{*}_{n+h} = \bar{\mathcal{X}}_n + \sum_{j=1}^{m} \mathbf{1}_j^{\top}\, \xi^{*}_{n+h}\, \widehat{v}_j + U^{*}_{n+h,m},$$

where the U∗n+h,m are i.i.d. resampled from the set {Ût,m − Ūn, t = 1, 2, . . . , n}, Ūn = n−1 ∑_{t=1}^{n} Ût,m and Ût,m = Yt − ∑_{j=1}^{m} 1⊤_j ξ̂t v̂j. Recall that 1j denotes the m-dimensional vector with the jth component equal to 1 and 0 elsewhere.

If p ≤ k + h, move to Step 4. If p > k + h, generate for l = 1, 2, . . . , p − (k + h) additional random vectors ξ∗n+h+l = ∑_{j=1}^{p} Âj,p ξ∗n+h+l−j + e∗n+h+l, where the e∗n+h+l are i.i.d. generated in the same way as e∗n+h.

Step 4: Fit a VAR(p) process to the "backward" series of estimated scores; that is,

$$\widehat{\xi}_t = \sum_{j=1}^{p} \widehat{B}_{j,p}\, \widehat{\xi}_{t+j} + \widehat{u}_t, \qquad t = 1, 2, \ldots, n - p.$$

Step 5: Generate a pseudo-time series of the scores {ξ∗1, ξ∗2, . . . , ξ∗n} by setting ξ∗t = ξ̂t for t = n, n − 1, . . . , n − k + 1, and by using, for t = n − k, n − k − 1, . . . , 1, the backward vector autoregression ξ∗t = ∑_{j=1}^{p} B̂j,p ξ∗t+j + u∗t. Here u∗1, u∗2, . . . , u∗n−k are obtained as (see (11))

$$u^{*}_t = \widehat{B}_p(L^{-1})\, \widehat{A}_p^{-1}(L)\, e^{*}_t,$$

with Âp(z) = I − ∑_{j=1}^{p} Âj,p z^j, B̂p(z) = I − ∑_{j=1}^{p} B̂j,p z^j, z ∈ C, and where the e∗t are i.i.d. resampled as in Step 3.

Step 6: Generate a pseudo-functional time series {X∗1, X∗2, . . . , X∗n} as follows. For t = n, n − 1, . . . , n − k + 1 set

$$\mathcal{X}^{*}_t = \bar{\mathcal{X}}_n + \sum_{j=1}^{m} \mathbf{1}_j^{\top}\, \widehat{\xi}_t\, \widehat{v}_j + \widehat{U}_{t,m} \equiv \mathcal{X}_t,$$

while for t = n − k, n − k − 1, . . . , 1, use the obtained backward pseudo-scores ξ∗1, ξ∗2, . . . , ξ∗n−k and calculate

$$\mathcal{X}^{*}_t = \bar{\mathcal{X}}_n + \sum_{j=1}^{m} \mathbf{1}_j^{\top}\, \xi^{*}_t\, \widehat{v}_j + U^{*}_{t,m}.$$

Here, the U∗t,m are i.i.d. pseudo-elements resampled as in Step 3.

Step 7: If model (3) is used to obtain the prediction X̂n+h, then calculate the pseudo-predictor

$$\widehat{\mathcal{X}}^{*}_{n+h} = \bar{\mathcal{X}}_n + \widehat{g}^{*}\big(\widehat{\mathcal{X}}^{*}_{n+h-1} - \bar{\mathcal{X}}^{*}_n,\ \widehat{\mathcal{X}}^{*}_{n+h-2} - \bar{\mathcal{X}}^{*}_n,\ \ldots,\ \widehat{\mathcal{X}}^{*}_{n+h-k} - \bar{\mathcal{X}}^{*}_n\big), \qquad (14)$$

where we set X̂∗t − X̄∗n ≡ Xt − X̄n for t = n, n − 1, . . . , n − k + 1, X̄∗n = n−1 ∑_{t=1}^{n} X∗t, and ĝ∗ is the same estimator as ĝ used in (4) but obtained using the generated pseudo-time series X∗1, X∗2, . . . , X∗n. Alternatively, the bootstrap analogue of (5) can be calculated as

$$\widehat{\mathcal{X}}^{*}_{n+h} = \bar{\mathcal{X}}_n + \widehat{g}^{*(h)}\big(\mathcal{X}_n - \bar{\mathcal{X}}_n,\ \mathcal{X}_{n-1} - \bar{\mathcal{X}}_n,\ \ldots,\ \mathcal{X}_{n-k+1} - \bar{\mathcal{X}}_n\big). \qquad (15)$$

Here ĝ∗(h) is the same estimator as ĝ(h) given in (5) but based on the pseudo-time series X∗1, X∗2, . . . , X∗n.


Step 8: Use the distribution of E∗n+h = X∗n+h − X̂∗n+h to approximate the (conditional) distribution of En+h = Xn+h − X̂n+h given Xn−k+1, Xn−k+2, . . . , Xn.

Before investigating the theoretical properties of the above bootstrap procedure and evaluating its practical implementation for the construction of prediction bands, some remarks are in order.

Notice that X∗n+h in Step 3 is generated in a model-free way, while the estimated model ĝ∗ is only used for obtaining the pseudo-predictor X̂∗n+h. In this way the pseudo-error X∗n+h − X̂∗n+h is able to imitate not only the innovation and estimation errors affecting the prediction error Xn+h − X̂n+h but also the error arising from possible model misspecification. In Steps 4 and 5, the backward vector autoregressive representation is used to generate the pseudo-time series of scores ξ∗t, t = 1, 2, . . . , n, where this pseudo-time series satisfies the condition ξ∗t = ξ̂t for t = n − k + 1, n − k + 2, . . . , n. This enables the generation of a functional pseudo-time series X∗1, X∗2, . . . , X∗n in Step 6 satisfying requirement (12). A problem occurs when p > k + h, that is, when the autoregressive order used is larger than the number of future functional observations needed to run the backward vector autoregression. In this case, the time series of scores must be extended with the p − (k + h) "missing" future scores. This problem is solved in Step 3 by generating the additional pseudo-scores ξ∗n+h+l for l = 1, 2, . . . , p − (k + h).
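For readers who find code easier to follow, here is a compressed, discretized sketch of Steps 1 to 6 (our own simplification, not the authors' implementation): it keeps the last k curves fixed, fits forward and backward VAR(p) models to the estimated scores by least squares, builds the backward noise via the filter of Step 5, and assembles the pseudo-curves. The handling of pseudo-scores beyond time n (Step 3's extension for p > k + h) is simplified by truncation, and all function names are hypothetical.

```python
import numpy as np

def backward_bootstrap_series(X, m=3, p=2, k=1, rng=None):
    """One bootstrap pseudo-series X*_1,...,X*_n whose last k curves equal the
    observed ones (a simplified, discretized version of Steps 1-6).

    X : (n, J) array of curves on an equispaced grid; returns an (n, J) array.
    """
    rng = np.random.default_rng(rng)
    n, J = X.shape
    w = 1.0 / J
    Xbar = X.mean(axis=0)
    Y = X - Xbar                                         # Step 1: centering

    # Step 2: estimated eigenfunctions and m-dimensional score vectors
    eigval, eigvec = np.linalg.eigh((Y.T @ Y) / n * w)
    V = eigvec[:, np.argsort(eigval)[::-1][:m]] / np.sqrt(w)
    xi = Y @ V * w                                       # (n, m) estimated scores
    U = Y - xi @ V.T                                     # remainder curves U_{t,m}

    # forward VAR(p) fit (as in Step 3) by least squares; residuals e_t
    Zf = np.hstack([xi[p - l:n - l] for l in range(1, p + 1)])
    Af = np.linalg.lstsq(Zf, xi[p:], rcond=None)[0]      # stacked (p*m, m) coefficients
    e_c = xi[p:] - Zf @ Af
    e_c = e_c - e_c.mean(axis=0)                         # centered residuals

    # Step 4: backward VAR(p) fit, xi_t regressed on xi_{t+1},...,xi_{t+p}
    Zb = np.hstack([xi[l:n - p + l] for l in range(1, p + 1)])
    Bb = np.linalg.lstsq(Zb, xi[:n - p], rcond=None)[0]

    # Step 5: backward noise u*_t = B_p(L^{-1}) A_p^{-1}(L) e*_t
    e_star = e_c[rng.integers(0, len(e_c), size=n + p)]  # i.i.d. resampled e*
    z = np.zeros((n + p, m))                             # z = A_p^{-1}(L) e*
    for t in range(n + p):
        for j in range(1, min(p, t) + 1):
            z[t] += Af[(j - 1) * m:j * m].T @ z[t - j]
        z[t] += e_star[t]
    u_star = np.array([z[t] - sum(Bb[(j - 1) * m:j * m].T @ z[t + j]
                                  for j in range(1, p + 1)) for t in range(n)])

    # backward recursion for the pseudo-scores, keeping the last k score vectors fixed
    xi_star = xi.copy()
    for t in range(n - k - 1, -1, -1):
        xi_star[t] = u_star[t] + sum(Bb[(j - 1) * m:j * m].T @ xi_star[t + j]
                                     for j in range(1, p + 1) if t + j < n)

    # Step 6: assemble pseudo-curves; resample the centered functional remainder
    U_star = U[rng.integers(0, n, size=n)] - U.mean(axis=0)
    X_star = Xbar + xi_star @ V.T + U_star
    X_star[n - k:] = X[n - k:]                           # condition (12): last k curves fixed
    return X_star

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    X = rng.standard_normal((120, 40)).cumsum(axis=1) / 6.0
    X_star = backward_bootstrap_series(X, m=3, p=2, k=1, rng=1)
    print(X_star.shape, np.allclose(X_star[-1], X[-1]))  # (120, 40) True
```

Steps 3, 7 and 8 (generation of X∗n+h, the pseudo-predictor and the bootstrap prediction error) follow by the same recipe, reusing the forward VAR fit and whatever prediction model has been chosen.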

3.3 Bootstrap Validity

We establish consistency of the proposed bootstrap procedure in approximating the conditional error distribution (6) of interest. Regarding the underlying class of functional processes, we assume that X is a purely non-deterministic, mean square continuous and L4-M-approximable process. The mean square continuity of X implies that its mean and covariance functions are continuous. For simplicity of notation, we assume that EXt = 0.

Because we condition on the last k observations, in what follows all asymptotic results are derived under the assumption that we have observed a functional time series Xs, Xs+1, . . . , Xn in which we view n as fixed and allow s → −∞. This is also the meaning of the statement "as n → ∞" used in all derivations and asymptotic considerations in the sequel. Some conditions regarding the underlying process X and the behavior of the bootstrap parameters m and p, as well as the estimators ĝ and ĝ∗ used, are first imposed. Notice that to achieve bootstrap consistency, it is necessary to allow for the order p of the fitted autoregression and the number m of principal components used to increase to infinity with the sample size. This is required in order for the bootstrap to appropriately capture both the entire temporal dependence structure of the vector process of scores and the infinite-dimensional structure of the prediction error En+1.

Assumption 1:

(i) The autocovariance operator Ch of X satisfies ∑h∈Z |h|‖Ch‖HS < ∞.

(ii) For all ω ∈ [0, π], the spectral density operator Fω is of full rank, that is, kern(Fω) = {0}, and the eigenvalues λj of the full rank covariance operator C0 (in descending order) are denoted by λ1 > λ2 > λ3 > . . . > 0.

Assumption 2: The sequences p = p(n) and m = m(n) satisfy p → ∞ and m → ∞, as n → ∞, such that

(i) $m^2/\sqrt{p} \to 0$,

(ii) $\dfrac{p^3}{\sqrt{nm}\,\lambda_m^{2}}\sqrt{\sum_{j=1}^{m}\alpha_j^{-2}} = O(1)$, where α1 = λ1 − λ2 and αj = min{λj−1 − λj, λj − λj+1} for j = 2, 3, . . . , m.

(iii) $m^4 p^2\, \|\widehat{A}_{p,m} - \widetilde{A}_{p,m}\|_F = O_P(1)$, where $\widehat{A}_{p,m} = (\widehat{A}_{1,p}, \widehat{A}_{2,p}, \ldots, \widehat{A}_{p,p})$ and $\widetilde{A}_{p,m} = (\widetilde{A}_{1,p}, \widetilde{A}_{2,p}, \ldots, \widetilde{A}_{p,p})$. Here Ãj,p, j = 1, 2, . . . , p, are the same estimators as Âj,p, j = 1, 2, . . . , p, but based on the time series of true scores ξ1, ξ2, . . . , ξn. Furthermore, Aj,p, j = 1, 2, . . . , p, are the coefficient matrices of the best (in the mean square sense) linear predictor of ξt based on the finite past ξt−1, ξt−2, . . . , ξt−p.

Assumption 3: The estimators ĝ and ĝ∗ converge to the same limit g0; that is, ‖ĝ − g0‖L = oP(1) and ‖ĝ∗ − g0‖L = oP(1).

Several comments regarding the above assumptions are in order. Assumption 1(i) implies that the spectral density operator Fω is a continuously differentiable function of the frequency ω. Regarding Assumption 2, notice first that allowing for the number m of principal components used, as well as the order p of the vector autoregression fitted, to increase to infinity with the sample size makes the asymptotic analysis of the bootstrap quite involved. This is so because the bootstrap procedure is based on the time series of estimated instead of true scores, the dimension and the order of the fitted vector autoregression increase to infinity, and, at the same time, the eigenvalue λm of the lag-zero covariance operator C0 approaches zero as m increases to infinity with n. As we will see, a slow increase of m and p with respect to n is required to balance these different effects.

To elaborate, parts (i) and (ii) of Assumption 2 summarize the conditions imposed on the rate of increase of m and p to establish bootstrap consistency. Before discussing these conditions in more detail, observe that Assumption 2(iii) is a condition that the estimators of the autoregressive coefficient matrices have to fulfill, after ignoring the effects caused by the fact that estimated instead of true scores are used. Observe first that, in contrast to the estimators Âj,p, j = 1, 2, . . . , p, based on the vector of estimated scores ξ̂t, the estimators Ãj,p, j = 1, 2, . . . , p, stated in Assumption 2(iii) are based on the true (i.e., unobserved) vector of scores ξt, t = 1, 2, . . . , n. As an example, consider the case where Ãj,p, j = 1, 2, . . . , p, are the well-known Yule-Walker estimators of Aj,p, j = 1, 2, . . . , p. By the arguments given in Paparoditis (2018, p. 3521), we have in this case that

$$\|\widehat{A}_{p,m} - \widetilde{A}_{p,m}\|_F = O_P\Big( m p \big(\sqrt{m}\,\lambda_m^{-1} + p\big)^2 / \sqrt{n} \Big).$$

From this bound it is easily seen by straightforward calculations that, for these estimators, Assumption 2(iii) is satisfied if m and p increase to infinity as n → ∞ slowly enough such that $m^6 p^4 = O(\lambda_m^2 \sqrt{n})$ and $p^2/m^2 = O(\sqrt{n})$.

To give an example of a functional process, of an estimator Ãj,p, j = 1, 2, . . . , p, and of the rates with which p and m have to increase to infinity so that all parts of Assumption 2 are fulfilled, suppose again that Yule-Walker estimators of Aj,p, j = 1, 2, . . . , p, are used in the bootstrap procedure. Assume further that the eigenvalues of the lag-zero covariance operator C0 satisfy λj − λj+1 ≥ C · j−ϑ, j = 1, 2, . . ., where C is a positive constant and ϑ > 1. That is, assume that the eigenvalues of the lag-zero autocovariance operator C0 converge at a polynomial rate to zero. As it is shown in the supplementary material, Assumption 2(i), (ii) and (iii) are then satisfied if p = O(nγ) and m = O(nδ) with γ > 0 and δ > 0 such that

$$\gamma \in (0, 1/8) \quad \text{and} \quad \delta \in (0, \delta_{\max}), \quad \text{where } \delta_{\max} = \min\left\{ \frac{1-6\gamma}{6\vartheta},\ \frac{1-8\gamma}{12+4\vartheta},\ \gamma/4 \right\}. \qquad (16)$$

More specifically, if for instance ϑ = 2, then (1 − 8γ)/(12 + 4ϑ) < (1 − 6γ)/(6ϑ) and Assumption 2 is satisfied if

$$\gamma \in (0, 1/8) \quad \text{and} \quad \delta \in \Big(0,\ \min\Big\{ \frac{1-8\gamma}{20},\ \gamma/4 \Big\}\Big).$$

Notice that 0 < γ < 1/8 ensures that (1 − 8γ)/20 > 0.

Concerning Assumption 3, and given that we do not focus on a specific predictor, this assumption is necessarily a high-level type assumption. It requires that the estimator ĝ∗, which is based on the bootstrap pseudo-time series X∗1, X∗2, . . . , X∗n, converges in probability and in operator norm to the same limit g0 as the estimator ĝ based on the time series X1, X2, . . . , Xn. Notice that Assumption 3 can only be verified in a case-by-case investigation for a specific operator g at hand and for the particular estimators ĝ and ĝ∗ used to perform the prediction.

To elaborate, consider the following example. Suppose that a FAR(1) model Xt = Φ(Xt−1) + εt is used in equations (5) and (14) to obtain the predictors X̂n+1 and X̂∗n+1, respectively. A common estimator of Φ, based on an approximative solution of the Yule-Walker-type equation C1 = ΦC0, is given by

$$\widehat{\Phi}_M(\cdot) = \frac{1}{n-1} \sum_{t=1}^{n-1} \sum_{i=1}^{M} \sum_{j=1}^{M} \frac{1}{\widehat{\lambda}_j}\, \langle \cdot, \widehat{v}_j\rangle \langle \mathcal{X}_t, \widehat{v}_j\rangle \langle \mathcal{X}_{t+1}, \widehat{v}_i\rangle\, \widehat{v}_i, \qquad (17)$$

where M is some integer referring to the number of functional principal components included in the estimation of Φ (Bosq, 2000; Hörmann and Kokoszka, 2012). Observe that (17) is a kernel operator with kernel

$$\widehat{\varphi}_M(\tau, \sigma) = \frac{1}{n-1} \sum_{t=1}^{n-1} \sum_{i=1}^{M} \sum_{j=1}^{M} \frac{1}{\widehat{\lambda}_j}\, \langle \mathcal{X}_t, \widehat{v}_j\rangle \langle \mathcal{X}_{t+1}, \widehat{v}_i\rangle\, \widehat{v}_j(\sigma)\, \widehat{v}_i(\tau), \qquad (18)$$

and notice that ĝ = Φ̂M in this example. Furthermore, for fixed M and by the consistency properties of λ̂j and v̂j, it is not difficult to show that $\|\widehat{\Phi}_M(\cdot) - g_0\|_{\mathcal{L}} \stackrel{P}{\to} 0$, where the limiting operator g0 is given by

$$g_0(\cdot) \equiv C_{1,M}\Big( \sum_{j=1}^{M} \frac{1}{\lambda_j}\, \langle \cdot, v_j\rangle\, v_j \Big). \qquad (19)$$

Here, C1,M(·) = E〈Xt,M, ·〉Xt+1,M is a finite rank approximation of the lag-1 autocovariance operator C1 (see equation (13) for the definition of Xt,M). Further, ∑_{j=1}^{M} λ_j^{−1}〈·, vj〉vj is the corresponding approximation of the inverse operator C0^{−1}(·) = ∑_{j=1}^{∞} λ_j^{−1}〈·, vj〉vj, which appears when solving the aforementioned Yule-Walker-type equation (see Horváth and Kokoszka, 2012, Chapter 13, for details). Similarly, the same convergence also holds true for the bootstrap estimator ĝ∗, that is, ‖ĝ∗ − g0‖L → 0 in probability, with g0 given in (19). Now, if interest is focused on consistently estimating the operator Φ, then, from an asymptotic perspective, the number M of functional principal components used in approximating the inverse of the operator C0 has to increase to infinity at an appropriate rate as n goes to infinity. In this case, it can be shown under certain regularity conditions that ‖Φ̂M − Φ‖L = oP(1) (see Bosq, 2000, Theorem 8.7). Here g0 = Φ, and this limit is different from the one given in (19). In such a case, and for the estimator ĝ∗ to also converge to the same limit, additional arguments are needed since the technical derivations are then much more involved compared to those used in the case of a fixed M (we refer to Paparoditis, 2018, for more details on this type of asymptotic considerations).
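To make the estimator (17)-(18) more tangible, here is a small discretized sketch (ours, under the usual Riemann-sum approximation of inner products; names are placeholders) that builds the kernel φ̂M on a grid and applies the resulting operator to the last observed curve.

```python
import numpy as np

def far1_yule_walker_kernel(X, M=3):
    """Discretized version of the kernel (18) of the FAR(1) estimator Phi_M.

    X : (n, J) array of centered curves on an equispaced grid of J points.
    Returns phi_M, a (J, J) array with phi_M[i, l] ~ kernel(tau_i, sigma_l),
    so that (Phi_M x)(tau) ~ (1/J) * phi_M @ x for a curve x on the grid.
    """
    n, J = X.shape
    w = 1.0 / J
    C0 = (X.T @ X) / n
    eigval, eigvec = np.linalg.eigh(C0 * w)
    order = np.argsort(eigval)[::-1][:M]
    lam = eigval[order]                         # estimated eigenvalues
    V = eigvec[:, order] / np.sqrt(w)           # estimated eigenfunctions (columns)
    S = X @ V * w                               # scores <X_t, v_j>, shape (n, M)
    # G[j, i] = (1/(n-1)) * sum_t <X_t, v_j>/lam_j * <X_{t+1}, v_i>
    G = (S[:-1] / lam).T @ S[1:] / (n - 1)
    phi_M = V @ G.T @ V.T                       # kernel values on the grid
    return phi_M

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.standard_normal((300, 50)).cumsum(axis=1) / 7.0
    X -= X.mean(axis=0)
    phi = far1_yule_walker_kernel(X, M=3)
    x_pred = phi @ X[-1] / 50                   # one-step prediction of the next curve
    print(phi.shape, x_pred.shape)              # (50, 50) (50,)
```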


Before stating our first consistency result, we fix some additional notation. Recall the definition of Xn,k and denote by CE,h and C∗E,h the conditional covariance operators of the random elements En+h and E∗n+h, respectively, given Xn,k. That is, CE,h = E(En+h ⊗ En+h | Xn,k) and C∗E,h = E∗(E∗n+h ⊗ E∗n+h | Xn,k), where E∗ denotes expectation with respect to the bootstrap distribution. Recall that X∗t = Xt for t ∈ {n, n − 1, . . . , n − k + 1}. Let further

$$\sigma^2_{n+h}(\tau) = c_{\mathcal{E},h}(\tau, \tau) \quad \text{and} \quad \sigma^{*2}_{n+h}(\tau) = c^{*}_{\mathcal{E},h}(\tau, \tau), \qquad \tau \in [0, 1],$$

where cE,h and c∗E,h denote the kernels of the conditional covariance (integral) operators CE,h and C∗E,h, respectively, which exist since these operators are Hilbert-Schmidt. Denote by LXn,k(En+h) the conditional distribution of En+h | Xn,k, and by LXn,k(E∗n+h | X1, X2, . . . , Xn) the conditional distribution of E∗n+h | Xn,k given the observed functional time series X1, X2, . . . , Xn. The following theorem establishes consistency of the bootstrap procedure in estimating the conditional distribution of interest.

Theorem 3.1 Suppose that Assumptions 1, 2, and 3 are satisfied. Then,

$$d\Big( \mathcal{L}_{\mathcal{X}_{n,k}}(\mathcal{E}_{n+h}),\ \mathcal{L}_{\mathcal{X}_{n,k}}\big(\mathcal{E}^{*}_{n+h} \,\big|\, \mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n\big) \Big) = o_P(1), \qquad (20)$$

where d is any metric metricizing weak convergence on H.

The above result, together with the continuous mapping theorem, allows for the use of the conditional distribution of E∗n+h(τ) to construct pointwise prediction intervals for Xn+h(τ), or for the use of the conditional distribution of supτ∈[0,1] |E∗n+h(τ)| to construct prediction bands for Xn+h. Notice that the latter prediction bands will have the same width for all values of τ ∈ [0, 1] since they do not appropriately reflect the local variability of the prediction error En+h(τ). One way to take the (possibly different) prediction uncertainty at every τ ∈ [0, 1] into account is to use the studentized conditional distribution of the prediction error, that is, to use the process {En+h(τ)/σn+h(τ), τ ∈ [0, 1]} on H in order to construct the prediction bands. However, in this case, and in addition to the weak convergence of E∗n+h to En+h on H, establishing bootstrap consistency requires the uniform (over τ ∈ [0, 1]) convergence of the conditional variance of the bootstrap prediction error σ∗2n+h(τ) to σ2n+h(τ). This will allow the proposed bootstrap procedure to appropriately approximate the random behavior of the studentized process {En+h(τ)/σn+h(τ), τ ∈ [0, 1]}. To achieve such a uniform consistency of bootstrap estimates, additional conditions compared to those stated in the previous Assumptions 2 and 3 are needed. We begin with the following modification of Assumption 2.

Assumption 2′: The sequences m = m(n) and p = p(n) satisfy Assumptions 2(i), (iii), and

(ii) $\dfrac{p^5\, m}{n^{1/2}\, \lambda_m^{5/2}}\sqrt{\sum_{j=1}^{m}\alpha_j^{-2}} = O(1)$.

Our next assumption imposes conditions additional to those made in Assumption 3 and concerns the mean square consistency properties of the estimators ĝ(h) and ĝ∗(h) used to perform the prediction.

Assumption 3′: The estimators ĝ(h)(x) and ĝ∗(h)(x) satisfy, for any given x ∈ Hk,

$$\sup_{\tau\in[0,1]} E\big|\widehat{g}^{(h)}(x)(\tau) - g_{0,h}(x)(\tau)\big|^2 \to 0 \quad \text{and} \quad \sup_{\tau\in[0,1]} E^{*}\big|\widehat{g}^{*(h)}(x)(\tau) - g_{0,h}(x)(\tau)\big|^2 \to 0 \ \text{in probability}.$$


The following proposition discusses the conditions that the initial estimators ĝ and ĝ∗ have to fulfill so that Assumption 3′ is satisfied for the important case where the limiting operator g0 is an integral operator. Recall that if g0 is an integral operator with kernel cg0 satisfying $\int_0^1 \int_0^1 |c_{g_0}(\tau, s)|^2\, d\tau\, ds < \infty$, then g0 also is a Hilbert-Schmidt operator.

Proposition 3.1 Suppose that g0 is an integral operator with kernel cg : [0, 1] × [0, 1] → R, and let ĉg be an estimator of cg and ĝ the corresponding integral operator. If ‖ĝ‖²HS ≤ C for some constant C > 0 and if

(i) $E\|\widehat{g} - g_0\|^2_{HS} \to 0$, and

(ii) $\sup_{\tau\in[0,1]} E \int_0^1 \big(\widehat{c}_g(\tau, s) - c_g(\tau, s)\big)^2\, ds \to 0$,

as n → ∞, then Assumption 3′ is satisfied for any h ∈ N.

We observe that, apart from the basic requirement (i) on the mean square consistency of the estimator ĝ with respect to the Hilbert-Schmidt norm, the additional property one needs in the case of integral operators is the uniform mean square consistency stated in part (ii) of the above proposition. Bosq (2000, Theorem 8.7) established mean square consistency results when ĝ = Φ̂M with Φ̂M given in (17), in case g is an autoregressive, Hilbert-Schmidt operator of a FAR(1) process. Recall that in this case $\|g\|^2_{HS} = \int_0^1\int_0^1 |c_g(\tau, s)|^2\, d\tau\, ds < 1$ is required in order to ensure stationarity and causality of the FAR(1) process. Thus the requirement ‖ĝ‖²HS ≤ C stated in the above proposition essentially means, in this case, that the Hilbert-Schmidt norm of the estimator ĝ used should be bounded away from unity, uniformly in n.

We now establish the next theorem, which concerns the weak convergence of LXn,k(E∗n+h) as well as the uniform convergence of the conditional variance function σ∗2n+h(·) of the bootstrap prediction error.

Theorem 3.2 Suppose that Assumptions 1, 2′ and 3′ are satisfied. Then, in addition to assertion (20) of Theorem 3.1, the following also holds true:

$$\sup_{\tau\in[0,1]} \Big| \sigma^{*2}_{n+h}(\tau) - \sigma^{2}_{n+h}(\tau) \Big| \to 0, \quad \text{in probability}. \qquad (21)$$

Theorem 3.2 and Slutsky's theorem theoretically justify the use of {E∗n+h(τ)/σ∗n+h(τ), τ ∈ [0, 1]} to approximate the behavior of {En+h(τ)/σn+h(τ), τ ∈ [0, 1]}. As the following corollary shows, the bootstrap can then successfully be applied to construct a simultaneous prediction band for Xn+h that appropriately takes into account the local uncertainty of prediction.

Corollary 3.1 Suppose that the assumptions of Theorem 3.2 are satisfied. For τ ∈ [0, 1], let

$$V_{n+h}(\tau) = \frac{\mathcal{X}_{n+h}(\tau) - \widehat{\mathcal{X}}_{n+h}(\tau)}{\sigma_{n+h}(\tau)}, \quad \text{and} \quad V^{*}_{n+h}(\tau) = \frac{\mathcal{X}^{*}_{n+h}(\tau) - \widehat{\mathcal{X}}^{*}_{n+h}(\tau)}{\sigma^{*}_{n+h}(\tau)}.$$

Then,

$$\sup_{x\in\mathbb{R}} \Big| P\Big( \sup_{\tau\in[0,1]} \big|V_{n+h}(\tau)\big| \le x \,\Big|\, \mathcal{X}_{n,k}\Big) - P^{*}\Big( \sup_{\tau\in[0,1]} \big|V^{*}_{n+h}(\tau)\big| \le x \,\Big|\, \mathcal{X}_{n,k}\Big) \Big| \to 0,$$

in probability, where P∗(A) denotes the probability of the event A given the functional time series X1, X2, . . . , Xn.


4 Practical Construction of Prediction Intervals

As mentioned, the theoretical results of the previous section allow for the use of the quantiles of the distribution of E∗n+h(τ), or of V∗n+h(τ), to construct either pointwise prediction intervals for Xn+h(τ) for any τ ∈ [0, 1], or simultaneous prediction bands for {Xn+h(τ), τ ∈ [a, b]} for any 0 ≤ a < b ≤ 1. Notice that the conditional distributions of E∗n+h(τ) and V∗n+h(τ) can be evaluated by Monte Carlo, that is, by generating B replicates of E∗n+h and σ∗n+h, say, E∗n+h,1, E∗n+h,2, . . . , E∗n+h,B and σ∗n+h,1, σ∗n+h,2, . . . , σ∗n+h,B. Let M∗n,h = supτ∈[a,b] |V∗n+h(τ)|, where V∗n+h(τ) = E∗n+h(τ)/σ∗n+h(τ), and denote by Q∗h,1−α the 1 − α quantile of the distribution of M∗n,h. This distribution can consistently be estimated using the B replicates V∗n+h,b(τ) = E∗n+h,b(τ)/σ∗n+h,b(τ), b = 1, 2, . . . , B. The simultaneous (1 − α)100% prediction band for Xn+h over the desired interval [a, b], associated with the predictor X̂n+h, is then given by

$$\Big\{ \big[ \widehat{\mathcal{X}}_{n+h}(\tau) - Q^{*}_{h,1-\alpha} \cdot \sigma^{*}_{n+h}(\tau),\ \widehat{\mathcal{X}}_{n+h}(\tau) + Q^{*}_{h,1-\alpha} \cdot \sigma^{*}_{n+h}(\tau) \big],\ \tau \in [a, b] \Big\}. \qquad (22)$$

Clearly, a pointwise prediction interval for any τ ∈ [0, 1], or a prediction band for the entire interval [0, 1], can be obtained as a special case of (22). By the theoretical results established in Section 3, this prediction band achieves (asymptotically) the desired coverage probability 1 − α.
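In code, the band (22) is a few lines once B bootstrap prediction errors are available. The sketch below (ours) estimates σ∗n+h(τ) by the pointwise standard deviation of the bootstrap errors, which is one common choice and an assumption on our part, and then reads off the quantile Q∗h,1−α of the studentized sup-statistic.

```python
import numpy as np

def prediction_band(X_pred, E_star, alpha=0.05):
    """Simultaneous (1-alpha) prediction band (22) from bootstrap prediction errors.

    X_pred : (J,) predicted curve on a grid restricted to [a, b].
    E_star : (B, J) bootstrap prediction errors E*_{n+h,b} = X*_{n+h,b} - Xhat*_{n+h,b}.
    """
    sigma_star = E_star.std(axis=0, ddof=1)                 # estimate of sigma*_{n+h}(tau)
    M_star = np.max(np.abs(E_star) / sigma_star, axis=1)    # sup_tau |V*_{n+h,b}(tau)|
    Q = np.quantile(M_star, 1 - alpha)                      # Q*_{h,1-alpha}
    return X_pred - Q * sigma_star, X_pred + Q * sigma_star

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    J, B = 50, 1000
    X_pred = np.sin(np.linspace(0, np.pi, J))
    E_star = rng.standard_normal((B, J)) * np.linspace(0.2, 1.0, J)  # toy bootstrap errors
    lower, upper = prediction_band(X_pred, E_star, alpha=0.05)
    print(lower.shape, upper.shape)                         # (50,) (50,)
```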

5 Simulations

5.1 Choice of Tuning Parameters

The theory developed in Section 3 formulates conditions on the rates at which the bootstrap tuning parameters m and p have to increase to infinity with the sample size n such that bootstrap consistency can be established. An optimal choice of these parameters in practice, which also is consistent with the theoretical requirements stated in Section 3.3, is left as an open problem for future research. In this section, we discuss some relatively simple and practical rules to select m and p, which we found to work well in practice.

We first mention that different approaches have been proposed in the literature on how to choose m and that these approaches also can be applied in our setting. We mention here, among others, the pseudo-versions of the Akaike information criterion and Bayesian information criterion considered in Yao et al. (2005); the finite prediction error criterion considered in Aue et al. (2015); the eigenvalue ratio tests (Ahn and Horenstein, 2013); and the generalized variance ratio criterion introduced in Paparoditis (2018). However, and in order to reduce the computational burden in our simulation experiments, we apply a simple and commonly used rule to select the number m of functional principal components. This parameter is chosen here using the ratio of the variance explained by the m principal components to the total variance of the random element Xt. More specifically, m is selected as

$$\widehat{m}_{n,Q} = \underset{m \ge 1}{\operatorname{argmin}} \left\{ \frac{\sum_{j=1}^{m} \widehat{\lambda}_j}{\sum_{j=1}^{n} \widehat{\lambda}_j} \ge Q \right\},$$

where λ̂s denotes the sth estimated eigenvalue of the sample lag-0 covariance operator Ĉ0, and Q is a pre-determined value, with Q = 0.85 being a common choice (see, e.g., Hörmann and Kokoszka, 2012, p. 41). Once the parameter m has been selected, the order p of the fitted VAR model is chosen using a corrected Akaike information criterion (Hurvich and Tsai, 1993), that is, by minimizing

$$\text{AICC}(p) = n \log\big|\widehat{\Sigma}_{e,p}\big| + \frac{n(nm + pm^2)}{n - m(p+1) - 1},$$

over a range of values of p. Here $\widehat{\Sigma}_{e,p} = n^{-1}\sum_{t=p+1}^{n} \widehat{e}_{t,p}\, \widehat{e}_{t,p}^{\top}$ and êt,p are the residuals obtained by fitting the VAR(p) model to the m-dimensional vector time series of estimated scores ξ̂1, ξ̂2, . . . , ξ̂n; see also Step 3 of the bootstrap algorithm.
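A minimal sketch of the two selection rules of this subsection (ours; `p_max`, the toy data, and all names are placeholders): the explained-variance rule for m and the corrected AIC for the VAR order p.

```python
import numpy as np

def select_m(eigvals, Q=0.85):
    """Smallest m whose leading eigenvalues explain at least a fraction Q of the variance."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.argmax(ratio >= Q) + 1)

def select_p_aicc(scores, p_max=8):
    """Order of the VAR fitted to the score series, chosen by the corrected AIC."""
    n, m = scores.shape
    best_p, best_crit = 1, np.inf
    for p in range(1, p_max + 1):
        Z = np.hstack([scores[p - l:n - l] for l in range(1, p + 1)])   # lagged design
        A = np.linalg.lstsq(Z, scores[p:], rcond=None)[0]
        E = scores[p:] - Z @ A
        Sigma = E.T @ E / n                                             # residual covariance
        crit = n * np.log(np.linalg.det(Sigma)) \
               + n * (n * m + p * m**2) / (n - m * (p + 1) - 1)
        if crit < best_crit:
            best_p, best_crit = p, crit
    return best_p

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    eigvals = np.sort(rng.random(20))[::-1] ** 2
    scores = rng.standard_normal((200, 3)).cumsum(axis=0) * 0.1
    print(select_m(eigvals, Q=0.85), select_p_aicc(scores, p_max=6))
```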

5.2 Simulation Study

We utilize Monte Carlo methods to investigate the finite sample performance of the proposed bootstrap procedure. In particular, the final goal of our simulation study is to evaluate the interval forecast accuracy of the bootstrap prediction intervals under both regimes, that is, when the model used for prediction coincides with the model generating the data and when this is not the case. To this end, and in the first part of our simulation experiment, we always use a FAR(1) model for prediction. At the same time, we consider a data generating process which allows for the investigation of the behavior of the proposed bootstrap procedure to construct prediction bands under both aforementioned regimes. The functional time series X1, X2, . . . , Xn used stems from the process

$$\mathcal{X}_t(\tau) = \int_0^1 \psi(\tau, s)\, \mathcal{X}_{t-1}(s)\, ds + b \cdot \mathcal{X}_{t-2}(\tau) + B_t(\tau) + c \cdot B_{t-1}(\tau), \qquad t = 1, 2, \ldots, n, \qquad (23)$$

where $\psi(\tau, s) = 0.34\, \exp\{\tfrac{1}{2}(\tau^2 + s^2)\}$, τ ∈ [0, 1], and Bt(τ) are Brownian motions with zero mean and variance 1/(L − 1) with L = n + 1. Notice that ‖Ψ‖L ≈ 0.5, where Ψ is the integral operator associated with the kernel ψ. Three parameter combinations are considered:

Case I: b = c = 0; Case II: b = 0.4 and c = 0; Case III: b = 0.4 and c = 0.8.

Notice that for b = c = 0, the data are generated by a FAR(1) model, so in this case the model used for prediction coincides with the model generating the functional time series. In Case II, with b = 0.4 and c = 0, the data generating process follows a FAR(2) model, which is stationary because $\|\Psi\|_{\mathcal{L}} + |b| < 1$. This case imitates a regime of model misspecification. A situation of an even "heavier" model misspecification is simulated in Case III, where, for b = 0.4 and c = 0.8, the data generating process is a stationary FARMA(2,1) process with a large moving average component.
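For reference, a minimal simulation of model (23) on a grid of L = n + 1 equispaced points is sketched below (Python; the burn-in length and the seed are arbitrary choices, and the Brownian motions are generated as cumulative sums of Gaussian increments with variance 1/(L − 1)).

```python
import numpy as np

def simulate_farma(n, b=0.0, c=0.0, burn=100, seed=0):
    """Simulate model (23) on a grid of L = n + 1 equispaced points in [0, 1]."""
    rng = np.random.default_rng(seed)
    L = n + 1
    tau = np.linspace(0.0, 1.0, L)
    psi = 0.34 * np.exp(0.5 * (tau[:, None] ** 2 + tau[None, :] ** 2))  # kernel psi(tau, s)
    w = 1.0 / (L - 1)                                                    # integration weight

    def brownian():
        # Brownian motion on the grid: cumulative sum of N(0, 1/(L-1)) increments
        return np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(w), L - 1))))

    total = n + burn
    X = np.zeros((total, L))
    B = np.stack([brownian() for _ in range(total)])
    for t in range(2, total):
        X[t] = w * psi @ X[t - 1] + b * X[t - 2] + B[t] + c * B[t - 1]
    return X[burn:]

# Case III of the simulation design: a FARMA(2,1) process with b = 0.4 and c = 0.8
X = simulate_farma(n=200, b=0.4, c=0.8)
```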

Four sample sizes are considered in the simulation study, n = 100, 200, 400 and 800. Using the first 80% of the data as the initial training sample, we compute a one-step-ahead prediction interval. Then, we increase the training sample by one and compute the one-step-ahead prediction interval again. This procedure continues until the training sample reaches the sample size. With 20% of the data as the testing sample, we compute the interval forecast accuracy of the one-step-ahead prediction based on the FAR(1) model used for prediction. We present results evaluating the performance of the bootstrap method for all three cases described above.

In the second part of our simulation experiment, we compare the finite sample performance of the bootstrap for constructing simultaneous prediction bands with that of the procedure proposed as Algorithm 4 in Aue et al. (2015, Section 5.2). In this comparison, the full FARMA(2,1) model is used to generate the data, that is, model (23) with b = 0.4 and c = 0.8. At the same time, the one-step-ahead predictor $\widehat{\mathcal{X}}_{n+1}$ is obtained using the prediction method proposed by the authors in the aforecited paper. See also case 2) in Section 2.1.


5.3 Evaluation Criteria of the Interval Forecast Accuracy

To measure the interval forecast accuracy, we consider the coverage probability difference (CPD) between the nominal coverage probability and the empirical coverage probability, and the interval score criterion of Gneiting and Raftery (2007). The pointwise and uniform empirical coverage probabilities are defined as

$$\text{Coverage}_{\text{pointwise}} = 1 - \frac{1}{n_{\text{test}}\times J}\sum_{\eta=1}^{n_{\text{test}}}\sum_{j=1}^{J}\Big[\mathbb{1}\{\mathcal{X}_{\eta}(\tau_j) > \widehat{\mathcal{X}}^{\,\text{ub}}_{\eta}(\tau_j)\} + \mathbb{1}\{\mathcal{X}_{\eta}(\tau_j) < \widehat{\mathcal{X}}^{\,\text{lb}}_{\eta}(\tau_j)\}\Big],$$
$$\text{Coverage}_{\text{uniform}} = 1 - \frac{1}{n_{\text{test}}}\sum_{\eta=1}^{n_{\text{test}}}\Big[\mathbb{1}\{\mathcal{X}_{\eta}(\tau) > \widehat{\mathcal{X}}^{\,\text{ub}}_{\eta}(\tau)\} + \mathbb{1}\{\mathcal{X}_{\eta}(\tau) < \widehat{\mathcal{X}}^{\,\text{lb}}_{\eta}(\tau)\}\Big],$$
where $n_{\text{test}}$ denotes the number of curves in the forecasting period, $J$ denotes the number of discretized data points, $\widehat{\mathcal{X}}^{\,\text{ub}}_{\eta}$ and $\widehat{\mathcal{X}}^{\,\text{lb}}_{\eta}$ denote the upper and lower bounds of the corresponding prediction interval, and $\mathbb{1}\{\cdot\}$ is the indicator function. The pointwise and uniform CPDs are defined as $\text{CPD}_{\text{pointwise}} = |\text{Coverage}_{\text{pointwise}} - \text{Nominal coverage}|$ and $\text{CPD}_{\text{uniform}} = |\text{Coverage}_{\text{uniform}} - \text{Nominal coverage}|$. Clearly, the smaller the CPD value, the better the performance of the forecasting method.

The mean interval score criterion introduced by Gneiting and Raftery (2007), denoted by $S_{\alpha}$, combines the CPD and the half-width of the pointwise prediction interval. It is defined as
$$S_{\alpha} = \frac{1}{n_{\text{test}}\times J}\sum_{\eta=1}^{n_{\text{test}}}\sum_{j=1}^{J}\Big\{\big[\widehat{\mathcal{X}}^{\,\text{ub}}_{\eta}(\tau_j) - \widehat{\mathcal{X}}^{\,\text{lb}}_{\eta}(\tau_j)\big] + \frac{2}{\alpha}\big[\mathcal{X}_{\eta}(\tau_j) - \widehat{\mathcal{X}}^{\,\text{ub}}_{\eta}(\tau_j)\big]\mathbb{1}\{\mathcal{X}_{\eta}(\tau_j) > \widehat{\mathcal{X}}^{\,\text{ub}}_{\eta}(\tau_j)\} + \frac{2}{\alpha}\big[\widehat{\mathcal{X}}^{\,\text{lb}}_{\eta}(\tau_j) - \mathcal{X}_{\eta}(\tau_j)\big]\mathbb{1}\{\mathcal{X}_{\eta}(\tau_j) < \widehat{\mathcal{X}}^{\,\text{lb}}_{\eta}(\tau_j)\}\Big\},$$
where $\alpha$ denotes the level of significance, customarily $\alpha = 0.2$, corresponding to the 80% prediction interval, and $\alpha = 0.05$, corresponding to the 95% prediction interval. The optimal interval score is achieved when $\mathcal{X}_{\eta}(\tau_j)$ lies between $\widehat{\mathcal{X}}^{\,\text{lb}}_{\eta}(\tau_j)$ and $\widehat{\mathcal{X}}^{\,\text{ub}}_{\eta}(\tau_j)$, with the distance between the upper and lower bounds being minimal.
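All three criteria can be computed directly from the test curves and the band limits; a small sketch follows (Python; the arrays `x_true`, `x_lb` and `x_ub` are hypothetical placeholders for the n_test × J matrices of observed curves and of the lower and upper bounds).

```python
import numpy as np

def pointwise_coverage(x_true, x_lb, x_ub):
    """Empirical pointwise coverage over n_test curves evaluated at J grid points."""
    outside = (x_true > x_ub) | (x_true < x_lb)
    return 1.0 - outside.mean()

def cpd(coverage, nominal):
    """Coverage probability difference."""
    return abs(coverage - nominal)

def interval_score(x_true, x_lb, x_ub, alpha):
    """Mean interval score S_alpha in the sense of Gneiting and Raftery (2007)."""
    width = x_ub - x_lb
    below = (2.0 / alpha) * (x_lb - x_true) * (x_true < x_lb)
    above = (2.0 / alpha) * (x_true - x_ub) * (x_true > x_ub)
    return float((width + below + above).mean())

# toy usage on hypothetical arrays of shape (n_test, J)
rng = np.random.default_rng(1)
x_true = rng.normal(size=(20, 48))
center = x_true + rng.normal(scale=0.5, size=x_true.shape)
x_lb, x_ub = center - 1.0, center + 1.0
cov = pointwise_coverage(x_true, x_lb, x_ub)
print(cov, cpd(cov, 0.80), interval_score(x_true, x_lb, x_ub, alpha=0.2))
```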

5.4 Simulation Results

As already mentioned, in the first part of our simulations we use an estimated FAR(1) model to perform the prediction. The corresponding FAR(1) predictor is obtained as $\widehat{\mathcal{X}}_{n+1} = \bar{\mathcal{X}}_n + \widehat{\Phi}(\mathcal{X}_n - \bar{\mathcal{X}}_n)$, where $\widehat{\Phi}$ is a regularized, Yule-Walker-type estimator; see also (17). Table 1 and Table 2 present results based on 1,000 replications (i.e., a pseudo-random seed for each replication) and B = 1,000 bootstrap repetitions. For the case n = 800, and for computational reasons, we only consider 300 replications. Table 1 presents results for Case I (b = c = 0) and Case II (b = 0.4, c = 0), while Table 2 presents results for Case III (b = 0.4, c = 0.8).

From Table 1 and Table 2 some interesting observations can be made. First of all, the empirical coverage of the prediction intervals is good, even for the small sample sizes considered. It improves quickly and considerably as the sample size increases, and the empirical coverages get quite close to the desired nominal coverages. This is true for both the pointwise prediction intervals and the simultaneous prediction bands considered, and for both coverage levels used in the simulation study. Further, the $S_{\alpha}$ values are systematically larger for the cases b = 0.4, c = 0 and b = 0.4, c = 0.8 than for the case b = c = 0. As discussed in the introduction, this expected result is


Table 1: Empirical performance of the bootstrap prediction intervals and bands using the FAR(1) model to perform one-step-ahead predictions for functional time series stemming from model (23) with c = 0 and for Case I (b = 0) and Case II (b = 0.4).

Nominal                          n = 100          n = 200          n = 400          n = 800
coverage  Criterion              b = 0    b = 0.4 b = 0    b = 0.4 b = 0    b = 0.4 b = 0    b = 0.4
80%       Coverage_pointwise     0.778    0.745   0.791    0.778   0.797    0.797   0.799    0.797
          CPD_pointwise          0.0497   0.0739  0.0344   0.0413  0.0230   0.0238  0.0167   0.0180
          Coverage_uniform       0.772    0.714   0.798    0.766   0.812    0.792   0.803    0.796
          CPD_uniform            0.0841   0.1152  0.0551   0.0647  0.0390   0.0391  0.0271   0.0283
          S_{alpha=0.2}          2.5322   2.9329  2.5346   2.7687  2.4764   2.6647  2.4398   2.6141
95%       Coverage_pointwise     0.925    0.897   0.932    0.919   0.936    0.930   0.943    0.940
          CPD_pointwise          0.0344   0.0580  0.0245   0.0344  0.0172   0.0220  0.0164   0.0202
          Coverage_uniform       0.920    0.877   0.932    0.909   0.941    0.927   0.946    0.934
          CPD_uniform            0.0516   0.0862  0.0338   0.0502  0.0230   0.0303  0.0185   0.0282
          S_{alpha=0.05}         3.4949   4.3379  3.5005   3.9347  3.3988   3.7328  3.3558   3.6646

attributable to the fact that the model misspecification errors occurring for b = 0.4, c = 0 and for b = 0.4, c = 0.8 also cause an increase in the variability of the prediction error distribution, leading to prediction intervals that are wider than those for b = 0 and c = 0. Consequently, for the "heavier" case of model misspecification, Case III, that is, for b = 0.4 and c = 0.8, the $S_{\alpha}$ values are larger than for the case b = 0.4 and c = 0.

We next present in Table 3 the results for the second part of our simulations, which compare the performance of the bootstrap method proposed in this paper with Algorithm 4 of Aue et al. (2015), used to construct one-step-ahead prediction bands. Notice that the predictor $\widehat{\mathcal{X}}_{n+1}$ used in this part of the simulation study is the one proposed in Aue et al. (2015). As seen from Table 3, the proposed bootstrap approach outperforms the aforementioned algorithm of Aue et al. (2015), in that the coverage rates are uniformly closer to the nominal levels and the mean interval scores $S_{\alpha}$ are smaller for all $\alpha$ and for both sample sizes considered.

6 Empirical Data Analysis

For the two real-life data sets analyzed in this section we consider (one- and two-step-ahead) prediction using a FAR(1) model and a nonparametric forecasting method (NFR). The latter method, applied for one-step-ahead prediction, uses a nonparametric estimate of the lag-1 conditional mean function $g(\mathcal{X}_n) = E(\mathcal{X}_{n+1}\,|\,\mathcal{X}_n)$ (also see Section 2.1). Recall that $g(\mathcal{X}_n)$ is the best (in the mean square sense) predictor of $\mathcal{X}_{n+1}$ based on $\mathcal{X}_n$, and that different data-driven smoothing techniques exist for the estimation of $g$. We refer to the functional Nadaraya-Watson estimator (see, e.g., Masry, 2005; Ferraty and Vieu, 2006), the functional local linear estimator (Berlinet et al., 2011), the functional k-nearest neighbor estimator (Kudraszow and Vieu, 2013), and the distance-based local linear estimator (Boj et al., 2010), to name a few. Throughout this section, we use for the one-step-ahead prediction the Nadaraya-Watson estimator of $g$, which leads to the predictor $\widehat{\mathcal{X}}_{n+1} = \widehat{g}(\mathcal{X}_n)$ with $\widehat{g}$ given by
$$\widehat{g}(x) = \sum_{t=2}^{n}K\big(d(x, \mathcal{X}_{t-1})/\delta\big)\,\mathcal{X}_t \Big/ \sum_{t=2}^{n}K\big(d(x, \mathcal{X}_{t-1})/\delta\big).$$
In the Nadaraya-Watson estimator, $K(\cdot)$ is the Gaussian kernel and $\delta$ is the bandwidth, which in our calculations has been obtained using a generalized cross-validation procedure.
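A minimal sketch of this predictor is given below (Python); the L2 distance on the evaluation grid is used for $d$, the Gaussian kernel is spelled out explicitly, and the bandwidth `delta` is treated as given rather than selected by generalized cross-validation.

```python
import numpy as np

def nw_predict(X, delta):
    """Functional Nadaraya-Watson one-step-ahead predictor.

    X     : (n, J) array of observed curves on a grid of J points
    delta : bandwidth of the Gaussian kernel
    Returns the predicted curve for time n + 1.
    """
    n, J = X.shape
    w = 1.0 / (J - 1)                                    # grid weight for the L2 distance
    x_n = X[-1]                                          # conditioning curve X_n
    d = np.sqrt(w * ((X[:-1] - x_n) ** 2).sum(axis=1))   # d(X_n, X_{t-1}), t = 2, ..., n
    K = np.exp(-0.5 * (d / delta) ** 2)                  # Gaussian kernel weights
    return (K[:, None] * X[1:]).sum(axis=0) / K.sum()    # kernel-weighted average of X_t

# toy usage
rng = np.random.default_rng(2)
X = np.cumsum(rng.normal(size=(50, 24)), axis=1)
x_hat = nw_predict(X, delta=1.0)
```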


Table 2: Empirical performance of the bootstrap prediction intervals and bands using the FAR(1) model to perform one-step-ahead predictions for functional time series stemming from model (23) with b = 0.4 and c = 0.8.

Nominal
coverage  Criterion              n = 100  n = 200  n = 400  n = 800
80%       Coverage_pointwise     0.741    0.782    0.803    0.820
          CPD_pointwise          0.0888   0.0467   0.0275   0.0235
          Coverage_uniform       0.735    0.793    0.827    0.853
          CPD_uniform            0.1164   0.0652   0.0469   0.0542
          S_{alpha=0.2}          3.5907   3.0588   2.8808   2.7326
95%       Coverage_pointwise     0.890    0.919    0.931    0.942
          CPD_pointwise          0.0702   0.0372   0.0220   0.0114
          Coverage_uniform       0.875    0.914    0.934    0.950
          CPD_uniform            0.0921   0.0484   0.0271   0.0142
          S_{alpha=0.05}         5.8624   4.4679   4.0899   3.8067

Table 3: Finite sample performance of the bootstrap prediction intervals and of the prediction intervals of Aue et al. (2015) for time series stemming from model (23) with b = 0.4 and c = 0.8.

Nominal                          n = 100                        n = 200
coverage  Criterion              Aue et al.'s (2015)  Bootstrap Aue et al.'s (2015)  Bootstrap
80%       Coverage_pointwise     0.911                0.837     0.880                0.833
          CPD_pointwise          0.1140               0.0610    0.0901               0.0348
          Coverage_uniform       0.915                0.842     0.884                0.824
          CPD_uniform            0.1255               0.0897    0.0921               0.0466
          S_{alpha=0.2}          6.0432               3.0487    5.4047               2.7993
95%       Coverage_pointwise     0.977                0.953     0.967                0.949
          CPD_pointwise          0.0340               0.0291    0.0235               0.0106
          Coverage_uniform       0.974                0.946     0.968                0.949
          CPD_uniform            0.0400               0.0435    0.0240               0.0154
          S_{alpha=0.05}         7.1303               4.3025    5.7686               3.8679

In addition to the construction of prediction bands, we also demonstrate how the proposed bootstrap method can be used to select the prediction method that performs better according to some user-specified criterion. In particular, since the future random element $\mathcal{X}^{*}_{n+1}$ is generated in a model-free way, the bootstrap prediction error $\mathcal{X}^{*}_{n+1} - \widehat{\mathcal{X}}^{*}_{n+1}$ correctly imitates the behavior of the prediction error $\mathcal{X}_{n+1} - \widehat{\mathcal{X}}_{n+1}$ associated with the particular model/method used to perform the prediction. Thus, using some loss $L(\mathcal{X}_{n+1}, \widehat{\mathcal{X}}_{n+1})$ and based on the behavior of the corresponding bootstrap loss $L(\mathcal{X}^{*}_{n+1}, \widehat{\mathcal{X}}^{*}_{n+1})$, the proposed bootstrap procedure can also be applied to select, among different predictors, the one which performs better. In the following, we demonstrate such an application of the bootstrap in selecting between the FAR(1) and the NFR method using the behavior of the bootstrap estimator of the (conditional) mean square


error of prediction, i.e., of $E^{*}\big[(\mathcal{X}^{*}_{n+1}(\tau) - \widehat{\mathcal{X}}^{*}_{n+1}(\tau))^2\,\big|\,\mathcal{X}_n\big]$, $\tau \in [0, 1]$.
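Schematically, and assuming that for each candidate method the bootstrap future curves $\mathcal{X}^{*}_{n+1,b}$ and the corresponding forecasts $\widehat{\mathcal{X}}^{*}_{n+1,b}$ have been stored as B × J arrays (the names below are hypothetical), the selection rule simply compares bootstrap estimates of the conditional mean square prediction error:

```python
import numpy as np

def bootstrap_mse(x_star, x_star_hat):
    """Bootstrap estimate of E*[(X*_{n+1}(tau) - Xhat*_{n+1}(tau))^2 | X_n] on the grid."""
    return ((x_star - x_star_hat) ** 2).mean(axis=0)    # average over the B replicates

def select_method(candidates):
    """Pick the method with the smallest bootstrap MSE integrated over the grid.

    candidates : dict mapping a method name to a pair (x_star, x_star_hat),
                 each of shape (B, J).
    """
    imse = {name: float(bootstrap_mse(xs, xh).mean())
            for name, (xs, xh) in candidates.items()}
    best = min(imse, key=imse.get)
    return best, imse
```

Here the bootstrap MSE integrated over the grid plays the role of the user-specified loss; any other functional of the bootstrap prediction errors could be used instead.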

6.1 Monthly Sea Surface Temperature Data Set

Consider the monthly sea surface temperatures from January 1950 to December 2018 available at https://www.cpc.ncep.noaa.gov/data/indices/ersst5.nino.mth.81-10.ascii. These averaged sea surface temperatures were measured by moored buoys in the "Niño region". We consider all four Niño regions: Niño 1+2 is defined by the coordinates 0–10° South, 90–80° West; Niño 3 is defined by the coordinates 5° North – 5° South, 150°–90° West; Niño 4 is defined by the coordinates 5° North – 5° South, 160° East – 150° West; Niño 3+4 is defined by the coordinates 5° North – 5° South, 170–120° West. For the sea surface temperatures in the Niño 1+2 region, univariate and functional time series plots are shown in Figure 2.

(a) Univariate time series plot. (b) Functional time series plot.
Figure 2: Time series (left panel) and rainbow plots (right panel) of sea surface temperatures in the Niño 1+2 region from January 1982 to December 2017.

Applying the proposed bootstrap procedure and the two compared forecasting methods, we generate a set of B = 1,000 bootstrap one-step-ahead prediction error curves $\mathcal{X}^{*}_{n+1} - \widehat{\mathcal{X}}^{*}_{n+1}$. These are presented in the left panel of Figure 3 for the FAR(1) predictor and in the right panel of the same figure for the NFR predictor.

In Figure 4(a), we present the bootstrap estimates of the (conditional) mean square error $E\big[(\mathcal{X}_{n+1}(\tau_j) - \widehat{\mathcal{X}}_{n+1}(\tau_j))^2\,\big|\,\mathcal{X}_n\big]$, $j = 1, 2, \ldots, J$, of the two prediction methods. As shown in this figure, the mean square error produced by the NFR method is uniformly (across all points $\tau_1, \tau_2, \ldots, \tau_J$) smaller than the corresponding mean square prediction error produced using the FAR(1) method. Thus, for this functional time series, using the NFR method to perform the one-step-ahead prediction seems preferable. The point prediction using this method as well as the corresponding 80% and 95% simultaneous prediction bands are shown in Figure 4(b).

6.2 Intraday PM10 Data Set

We analyze the half-hourly measurements of the concentration of particulate matter with an aerodynamic diameter of less than 10μm in ambient air taken in Graz, Austria, from October 1, 2010, to March 31, 2011.

Figure 3: Bootstrap-generated one-step-ahead prediction error curves for the sea surface temperature data using the FAR(1) model (left panel) and the NFR method (right panel).

(a) Mean square error of one-step-ahead prediction errors. (b) Pointwise and uniform prediction intervals (NFR method).
Figure 4: Bootstrap estimates of the mean square error of the one-step-ahead prediction for the FAR(1) model and the NFR method (left panel). Point forecast together with 80% and 95% simultaneous prediction bands using the NFR method (right panel).

We convert the N = 8,736 discrete univariate time series points into n = 182 daily curves. A univariate time series display of the intraday pollution curves is given in Figure 5a, with the same data shown in Figure 5b as a time series of functions.
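The conversion from the univariate half-hourly series to daily functional observations amounts to a reshape into an n × 48 matrix; a short sketch follows (Python; applying the square-root transformation shown in Figure 5 is an assumption about the preprocessing).

```python
import numpy as np

def to_daily_curves(pm10, points_per_day=48):
    """Reshape a univariate half-hourly series into an (n_days, 48) matrix of daily curves."""
    n_days = len(pm10) // points_per_day
    curves = np.asarray(pm10[: n_days * points_per_day], dtype=float)
    curves = curves.reshape(n_days, points_per_day)
    return np.sqrt(curves)        # square-root transformation, as displayed for the PM10 data

# N = 8736 half-hourly observations -> n = 182 daily curves of 48 points each
pm10 = np.abs(np.random.default_rng(3).normal(20.0, 5.0, size=8736))
daily = to_daily_curves(pm10)
print(daily.shape)                # (182, 48)
```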

Using the bootstrap procedure, we construct one- and two-step-ahead prediction bands for this functional time series. We first generate B = 1,000 functional pseudo-time series, and we apply the FAR(1) and the NFR forecasting methods to obtain a set of one-step-ahead prediction error curves. These are displayed in Figure 6. In the left panel of Figure 7, we show the bootstrap estimates of the (conditional) mean square prediction error $E\big[(\mathcal{X}_{n+1}(\tau_j) - \widehat{\mathcal{X}}_{n+1}(\tau_j))^2\,\big|\,\mathcal{X}_n\big]$, $j = 1, 2, \ldots, J$, obtained using the FAR(1) and NFR methods. As this figure shows, neither of the two methods is uniformly (i.e., across all points $\tau_j \in [0, 1]$) better, with the FAR(1) method having a slight advantage.

(a) A univariate time series display. (b) A functional time series display.
Figure 5: Graphical displays of intraday measurements of the PM10 from October 1, 2010, to March 31, 2011, in Graz, Austria.

In particular, the bootstrap estimated conditional root mean square error (RMSE) of the FAR(1) method is 1.451, compared with 2.054 for the NFR method.

Figure 6: One-step-ahead prediction error curves for the intraday PM10 data using the FAR(1) model (left panel) and the NFR method (right panel).

We apply the FAR(1) method to perform the prediction for h = 1 and h = 2. The corresponding bootstrap-based simultaneous prediction bands for h = 1 are displayed in the right panel of Figure 7.

In contrast to the relatively small sample size of n = 69 curves of the monthly sea surface temperature data analyzed in the previous example, the sample size of n = 182 curves of the intraday PM10 data considered in this section allows us to further evaluate the performance of the FAR(1) prediction method. For this, we use the observed data from October 1, 2010, to January 30, 2011, as the initial training sample, and we produce one- and two-step-ahead forecasts by increasing the training sample by one curve each time. We iterate this procedure until the training sample contains all the observed data.
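The expanding-window evaluation just described can be organized as in the following sketch (Python; `forecast_fn` is a hypothetical placeholder for any function returning the h-step-ahead predicted curve from a training sample of curves).

```python
import numpy as np

def expanding_window_forecasts(X, n_init, h, forecast_fn):
    """One h-step-ahead forecast per expanding training sample.

    X           : (n, J) array of observed curves
    n_init      : size of the initial training sample
    h           : forecast horizon
    forecast_fn : callable (X_train, h) -> predicted curve of length J
    Returns arrays of forecasts and of the corresponding observed curves.
    """
    n = X.shape[0]
    preds, actuals = [], []
    for t in range(n_init, n - h + 1):
        preds.append(forecast_fn(X[:t], h))   # forecast from the first t curves
        actuals.append(X[t + h - 1])          # the curve actually observed h steps ahead
    return np.array(preds), np.array(actuals)
```

With n = 182 daily curves, an initial training sample of 122 curves (October 1, 2010, to January 30, 2011) and h = 1 or h = 2, this loop yields the 60 one-step and 59 two-step-ahead forecasts reported below.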

Figure 7: Bootstrap estimates of the mean square error of the one-step-ahead prediction for the FAR(1) model and the NFR method (left panel). Point forecast together with 80% and 95% simultaneous prediction bands using the FAR(1) method (right panel).

In this way, we construct 60 one-step and 59 two-step-ahead forecasts, enabling us to assess the interval forecast accuracy of the FAR(1) method. The results obtained for h = 1 and h = 2 are shown in Table 4. The FAR(1) prediction performs well for this functional time series, and the proposed bootstrap method produces, for both forecasting steps, prediction intervals whose empirical coverages are close to the desired nominal levels. Notice that, as expected, the uncertainty associated with h = 2 is larger than that for h = 1.

Table 4: Evaluation of the interval forecast accuracy at the forecast horizons h = 1 and h = 2 for the PM10 data set using the FAR(1) model for prediction and the sieve bootstrap procedure with 1,000 bootstrap replications.

Nominal
coverage  Criterion         h = 1   h = 2
80%       CPD_pointwise     0.789   0.777
          S_{alpha=0.2}     5.189   7.572
          CPD_uniform       0.833   0.777
95%       CPD_pointwise     0.943   0.923
          S_{alpha=0.05}    7.528   10.941
          CPD_uniform       0.917   0.901

7 Conclusions

We have presented a novel bootstrap method for the construction of prediction bands for a functional time series. In a model-free way, our method generates future functional pseudo-random elements that allow for valid estimation of the conditional distribution of the prediction error that a user-selected prediction method produces. The obtained bootstrap estimates of the


prediction error distribution account for the innovation and the estimation errors associated with prediction, as well as for the error arising when a different model is used for prediction than the one generating the observed functional time series. Theoretical results were presented to justify using the proposed bootstrap method for constructing prediction bands that also appropriately take into account the local variability of the conditional distribution of the prediction error. We have demonstrated the good finite sample performance of the presented bootstrap method through a series of simulations. The analysis of two real-life data sets has demonstrated the capabilities and the good finite sample behavior of the presented bootstrap method for the construction of prediction bands.

Appendix: Auxiliary Results and Proofs

Recall that $\mathcal{X}_{n+h} = \sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}_{n+h}v_j + U_{n+h,m}$, $\widehat{\mathcal{X}}_{n+h} = \widehat{g}_{(h)}(\mathcal{X}_n, \ldots, \mathcal{X}_{n-k+1})$, $\mathcal{X}^{*}_{n+h} = \sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j + U^{*}_{n+h,m}$ and $\widehat{\mathcal{X}}^{*}_{n+h} = \widehat{g}^{*}_{(h)}(\mathcal{X}_n, \ldots, \mathcal{X}_{n-k+1})$. Define $\mathcal{X}^{+}_{n+h,m} = \sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{+}_{n+h}v_j$, where $\boldsymbol{\xi}^{+}_{n+h} = \sum_{j=1}^{p}\widetilde{A}_{j,p}\boldsymbol{\xi}_{n+h-j} + e^{+}_{n+h}$, with $e^{+}_{n+h}$ i.i.d. resampled from the set $\{\widetilde{e}_t - \bar{e}_n,\ t = p+1, p+2, \ldots, n\}$, $\bar{e}_n = (n-p)^{-1}\sum_{t=p+1}^{n}\widetilde{e}_t$, and $\widetilde{e}_t = \boldsymbol{\xi}_t - \sum_{j=1}^{p}\widetilde{A}_{j,p}\boldsymbol{\xi}_{t-j}$, $t = p+1, p+2, \ldots, n$, are the residuals obtained from an autoregressive fit based on the time series of true scores $\boldsymbol{\xi}_1, \boldsymbol{\xi}_2, \ldots, \boldsymbol{\xi}_n$.

Lemma 7.1 Let $\Gamma_m(0) = E(\boldsymbol{\xi}_{n+h}\boldsymbol{\xi}_{n+h}^{\top})$, $\Gamma^{+}_m(0) = E(\boldsymbol{\xi}^{+}_{n+h}\boldsymbol{\xi}^{+\top}_{n+h})$ and $\Gamma^{*}_m(0) = E(\boldsymbol{\xi}^{*}_{n+h}\boldsymbol{\xi}^{*\top}_{n+h})$. If Assumptions 1 and 2 are satisfied, then
$$\|\Gamma^{+}_m(0) - \Gamma_m(0)\|_F = O_P\Big(\frac{m^2}{\sqrt{p}}\Big).$$
If Assumptions 1 and 2$'$ are satisfied, then
$$\|\Gamma^{*}_m(0) - \Gamma_m(0)\|_F = O_P\Big(\frac{p^5\sqrt{m}}{\sqrt{n}\,\lambda_m^2}\sqrt{\sum_{j=1}^{m}\alpha_j^{-2}}\Big).$$

Lemma 7.2 Let $\widehat{g}, g_0 : \mathcal{H}^k \to \mathcal{H}$. For $x = (x_1, x_2, \ldots, x_k) \in \mathcal{H}^k$ and $h \in \mathbb{N}$, let
$$\widehat{g}_{(h)}(x) = \widehat{g}\big(\widehat{g}_{(h-1)}(x), \widehat{g}_{(h-2)}(x), \ldots, \widehat{g}_{(h-k)}(x)\big),$$
where $\widehat{g}_{(1)}(x) = \widehat{g}(x)$ and $\widehat{g}_{(s)}(x) = x_{1-s}$ if $s \le 0$. Define $g_{0,(h)}(x)$ analogously. If $\|\widehat{g} - g_0\|_{\mathcal{L}} \xrightarrow{P} 0$, then $\|\widehat{g}_{(h)}(x) - g_{0,(h)}(x)\|_2 \xrightarrow{P} 0$ for any $h \in \mathbb{N}$.

Proof of Theorem 3.1: Recall the notation $\mathcal{X}_{n,k} = (\mathcal{X}_{n-k+1}, \mathcal{X}_{n-k+2}, \ldots, \mathcal{X}_n)$. Observe that
$$\|\widehat{g}_{(h)}(\mathcal{X}_{n,k}) - \widehat{g}^{*}_{(h)}(\mathcal{X}_{n,k})\|_2 \le \|\widehat{g}_{(h)}(\mathcal{X}_{n,k}) - g_{0,(h)}(\mathcal{X}_{n,k})\|_2 + \|\widehat{g}^{*}_{(h)}(\mathcal{X}_{n,k}) - g_{0,(h)}(\mathcal{X}_{n,k})\|_2 = o_P(1),$$
where the last equality follows by Assumption 3, Lemma 7.2 and the fact that $\|\mathcal{X}_{n,k}\|_2 = O_P(1)$. Hence $E_{n+h} - E^{*}_{n+h} = \sum_{j=1}^{\infty}\mathbf{1}_j^{\top}\boldsymbol{\xi}_{n+h}v_j - \big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j + U^{*}_{n+h,m}\big) + o_P(1)$ and, by Slutsky's theorem, it suffices to show that
$$d\Big(\sum_{j=1}^{\infty}\mathbf{1}_j^{\top}\boldsymbol{\xi}_{n+h}v_j,\ \sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j + U^{*}_{n+h,m}\Big) = o_P(1). \qquad (24)$$
Assertion (24) follows if we show that,


(i) $d\big(\sum_{j=1}^{m}\xi^{+}_{j,n+h}v_j,\ \sum_{j=1}^{\infty}\xi_{j,n+h}v_j\big) \to 0$,

(ii) $\big\|\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j - \sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{+}_{n+h}v_j\big\|_2 \xrightarrow{P} 0$, and

(iii) $U^{*}_{n+h,m} \xrightarrow{P} 0$.

To establish (i), consider the sequence $\{Y^{+}_n(h)\}$ in $\mathcal{H}$, where $Y^{+}_n(h) = \sum_{j=1}^{\infty}\widetilde{\xi}_{j,n+h}v_j$ with $\widetilde{\xi}_{j,n+h} = \xi^{+}_{j,n+h}$ for $j = 1, 2, \ldots, m$ and $\widetilde{\xi}_{j,n+h} = 0$ for $j \ge m+1$. For $k \in \mathbb{N}$, let $Y^{+}_{n,k}(h) = \sum_{j=1}^{k}\widetilde{\xi}_{j,n+h}v_j$. By Theorem 3.2 of Billingsley (1999), assertion (i) follows if we show that

(a) $Y^{+}_{n,k}(h) \xrightarrow{d} Y_k(h) = \sum_{j=1}^{k}\xi_{j,n+h}v_j$ for any $k \in \mathbb{N}$, as $n \to \infty$,

(b) $Y_k(h) \xrightarrow{d} Y(h) = \sum_{j=1}^{\infty}\xi_{j,n+h}v_j$, as $k \to \infty$,

(c) for any $\varepsilon > 0$, $\lim_{k\to\infty}\limsup_{n\in\mathbb{N}} P\big(\|Y^{+}_{n,m}(h) - Y^{+}_{n,k}(h)\|_2 > \varepsilon\big) = 0$.

Consider (a). Assume that $n$ is large enough such that $m > k$. Since $k$ is fixed, $(\widetilde{\xi}_{1,n+h}, \widetilde{\xi}_{2,n+h}, \ldots, \widetilde{\xi}_{k,n+h})^{\top} = (\xi^{+}_{1,n+h}, \xi^{+}_{2,n+h}, \ldots, \xi^{+}_{k,n+h})^{\top} = \boldsymbol{\xi}^{+}_{n+h}(k)$, where the latter vector is obtained as $\boldsymbol{\xi}^{+}_{n+h}(k) = I_{k,m}\boldsymbol{\xi}^{+}_{n+h}$ with $I_{k,m}$ the $k\times m$ matrix with elements $(j, j)$, $j = 1, 2, \ldots, k$, equal to one and zero else. Assume that $h = 1$. Then, the vector $\boldsymbol{\xi}^{+}_{n+1}$ is generated via the regression-type autoregression $\boldsymbol{\xi}^{+}_{n+1} = \sum_{j=1}^{p}\widetilde{A}_{j,p}\boldsymbol{\xi}_{n+1-j} + e^{+}_{n+1}$ with $e^{+}_{n+1}$ i.i.d. innovations. Therefore, and since $k$ is fixed, we have by standard arguments (see Lemma 3.1 of Meyer and Kreiss (2015)) that $\boldsymbol{\xi}^{+}_{n+1}(k) \xrightarrow{d} \boldsymbol{\xi}_{n+1}(k) = (\xi_{1,n+1}, \xi_{2,n+1}, \ldots, \xi_{k,n+1})^{\top}$. Suppose that the assertion is true for some $h \in \mathbb{N}$. For $h+1$ it follows by similar arguments, using the recursion $\boldsymbol{\xi}^{+}_{n+h+1} = \sum_{j=1}^{p}\widetilde{A}_{j,p}\boldsymbol{\xi}^{+}_{n+h+1-j} + e^{+}_{n+h+1}$, that $\boldsymbol{\xi}^{+}_{n+h+1}(k) \xrightarrow{d} \boldsymbol{\xi}_{n+h+1}(k) = (\xi_{1,n+h+1}, \xi_{2,n+h+1}, \ldots, \xi_{k,n+h+1})^{\top}$. By the continuous mapping theorem we then conclude that $Y^{+}_{n,k}(h) \xrightarrow{d} \sum_{j=1}^{k}\xi_{j,n+h}v_j = Y_k(h)$.

Consider (b). Notice that $E\|Y_k(h) - Y(h)\|_2^2 = E\big\|\sum_{j=k+1}^{\infty}\xi_{j,n+h}v_j\big\|_2^2 = \sum_{j=k+1}^{\infty}\lambda_j \to 0$, as $k \to \infty$, which by Markov's inequality and Slutsky's theorem implies that $Y_k(h) \xrightarrow{d} \sum_{j=1}^{\infty}\xi_{j,n+h}v_j$ as $k \to \infty$. Consider (c). We have
$$E\|Y^{+}_{n,m}(h) - Y^{+}_{n,k}(h)\|_2^2 = E\Big\|\sum_{j=k+1}^{m}\xi^{+}_{j,n+h}v_j\Big\|_2^2 = \sum_{j=k+1}^{m}\mathbf{1}_j^{\top}\Gamma^{+}_m(0)\mathbf{1}_j = \sum_{j=k+1}^{m}\lambda_j + \sum_{j=k+1}^{m}\mathbf{1}_j^{\top}\big(\Gamma^{+}_m(0) - \Gamma_m(0)\big)\mathbf{1}_j.$$

Now, since $\big|\sum_{j=k+1}^{m}\mathbf{1}_j^{\top}\big(\Gamma^{+}_m(0) - \Gamma_m(0)\big)\mathbf{1}_j\big| = O_P\big(\sqrt{m}\,\|\Gamma^{+}_m(0) - \Gamma_m(0)\|_F\big)$, we get by Lemma 7.1, Assumption 2 and Markov's inequality that
$$\limsup_{n\in\mathbb{N}} P\big(\|Y^{+}_{n,m}(h) - Y^{+}_{n,k}(h)\|_2 > \varepsilon\big) \le \frac{1}{\varepsilon^2}\Big\{\sum_{j=k+1}^{\infty}\lambda_j + O_P\big(\sqrt{m}\,\|\Gamma^{+}_m(0) - \Gamma_m(0)\|_F\big)\Big\},$$
which converges to zero as $k \to \infty$. The proofs of assertions (ii) and (iii) are given in the supplementary material.


Proof of Theorem 3.2: We only give the proof of assertion (20), since the weak convergence of the conditional distribution of $E^{*}_{n+h}\,\big|\,\mathcal{X}_{n,k}$ to the corresponding conditional distribution of $E_{n+h}\,\big|\,\mathcal{X}_{n,k}$ has been given under weaker assumptions in the proof of Theorem 3.1. We have
$$\begin{aligned}
\big|\sigma^{*2}_{n+h}(\tau) - \sigma^{2}_{n+h}(\tau)\big| &\le \big|E^{*}(\mathcal{X}^{*}_{n+h}(\tau))^2 - E(\mathcal{X}_{n+h}(\tau))^2\big| + 2\big|E\,\mathcal{X}_{n+h}(\tau)\big(\widehat{g}_{(h)}(\mathcal{X}_{n,k})(\tau) - g_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big)\big| \\
&\quad + 2\big|E^{*}\mathcal{X}^{*}_{n+h}(\tau)\big(\widehat{g}^{*}_{(h)}(\mathcal{X}_{n,k})(\tau) - g_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big)\big| + \big|E^{*}\big(\widehat{g}^{*}_{(h)}(\mathcal{X}_{n,k})(\tau)\big)^2 - g^{2}_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big| \\
&\quad + \big|E\big(\widehat{g}_{(h)}(\mathcal{X}_{n,k})(\tau)\big)^2 - g^{2}_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big|. \qquad (25)
\end{aligned}$$

Notice first that $\sup_{\tau\in[0,1]} E^{*}(\mathcal{X}^{*}_{n+h}(\tau))^2 = O_P(1)$ by equation (26) below and the fact that $\sup_{\tau\in[0,1]} E(\mathcal{X}_{n+h}(\tau))^2 = \sup_{\tau\in[0,1]} c(\tau, \tau) < \infty$, by the continuity of the kernel $c$. Using the Cauchy–Schwarz inequality and Assumption 3$'$, we get
$$\sup_{\tau\in[0,1]}\big|E\,\mathcal{X}_{n+h}(\tau)\big(\widehat{g}_{(h)}(\mathcal{X}_{n,k})(\tau) - g_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big)\big| \le \sqrt{\sup_{\tau\in[0,1]} E(\mathcal{X}_{n+h}(\tau))^2}\,\sqrt{\sup_{\tau\in[0,1]} E\big(\widehat{g}_{(h)}(\mathcal{X}_{n,k})(\tau) - g_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big)^2} \to 0,$$
$$\sup_{\tau\in[0,1]}\big|E^{*}\mathcal{X}^{*}_{n+h}(\tau)\big(\widehat{g}^{*}_{(h)}(\mathcal{X}_{n,k})(\tau) - g_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big)\big| \le \sqrt{\sup_{\tau\in[0,1]} E^{*}(\mathcal{X}^{*}_{n+h}(\tau))^2}\,\sqrt{\sup_{\tau\in[0,1]} E^{*}\big(\widehat{g}^{*}_{(h)}(\mathcal{X}_{n,k})(\tau) - g_{0,(h)}(\mathcal{X}_{n,k})(\tau)\big)^2} \to 0.$$
Using $a^2 - b^2 = (a-b)(a+b)$, the Cauchy–Schwarz inequality and Assumption 3$'$ again, we get for the last two terms on the right-hand side of the bound (25) that they also converge uniformly to zero, in probability. To establish assertion (20) it remains to show that

$$\sup_{\tau\in[0,1]}\big|E^{*}(\mathcal{X}^{*}_{n+h}(\tau))^2 - E(\mathcal{X}_{n+h}(\tau))^2\big| \to 0, \qquad (26)$$
in probability. Notice first that, due to the independence of $\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j$ and $U^{*}_{n+h,m}$, we have $E^{*}(\mathcal{X}^{*}_{n+h}(\tau))^2 = E^{*}\big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j(\tau)\big)^2 + E^{*}(U^{*}_{n+h,m}(\tau))^2$. Furthermore, using $c(\tau, \tau) = \sum_{j=1}^{\infty}\lambda_j v_j^2(\tau)$, where the convergence $|c(\tau, \tau) - \sum_{j=1}^{k}\lambda_j v_j^2(\tau)| \to 0$, as $k \to \infty$, is uniform in $\tau \in [0, 1]$, see Theorem 7.3.5 of Hsing and Eubank (2015), we get that $\sup_{\tau\in[0,1]} E(U_{n+h,m}(\tau))^2 = \sup_{\tau\in[0,1]}\sum_{j=m+1}^{\infty}\lambda_j v_j^2(\tau) \to 0$. Therefore, and because $E(\mathcal{X}_{n+h,m}(\tau)U_{n+h,m}(\tau)) = 0$ for all $\tau \in [0, 1]$, to establish (26) it suffices, by the Cauchy–Schwarz inequality and the inequality $\sup_{\tau\in[0,1]}\sqrt{f(\tau)} \le \sqrt{\sup_{\tau\in[0,1]} f(\tau)}$ for a non-negative function $f$ on $[0, 1]$, to show that, in probability,

(I) $\sup_{\tau\in[0,1]}\big|E^{*}\big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j(\tau)\big)^2 - E\big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}_{n+h}v_j(\tau)\big)^2\big| \to 0$, and

(II) $\sup_{\tau\in[0,1]} E^{*}\big(U^{*}_{n+h,m}(\tau)\big)^2 \to 0$.


Consider (I). We have
$$\begin{aligned}
\Big|E^{*}\Big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j(\tau)\Big)^2 - E\Big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}_{n+h}v_j(\tau)\Big)^2\Big| &\le \Big|\sum_{j_1,j_2=1}^{m}\mathbf{1}_{j_1}^{\top}\big(\Gamma^{*}_m(0) - \Gamma_m(0)\big)\mathbf{1}_{j_2}\,v_{j_1}(\tau)v_{j_2}(\tau)\Big| + \Big|\sum_{j_1,j_2=1}^{m}\mathbf{1}_{j_1}^{\top}\Gamma^{*}_m(0)\mathbf{1}_{j_2}\big(\widehat{v}_{j_1}(\tau)\widehat{v}_{j_2}(\tau) - v_{j_1}(\tau)v_{j_2}(\tau)\big)\Big| \\
&\le \|\Gamma^{*}_m(0) - \Gamma_m(0)\|_F\Big(\sum_{j=1}^{m}|v_j(\tau)|\Big) + \|\Gamma_m(0)\|_F\sum_{j=1}^{m}|\widehat{v}_j(\tau) - v_j(\tau)|\Big(\sum_{j=1}^{m}|\widehat{v}_j(\tau)| + \sum_{j=1}^{m}|v_j(\tau)|\Big).
\end{aligned}$$

To evaluate the above terms notice that $\|\Gamma_m(0)\|_F = O(1)$ and that, by Lemma 7.1, $\|\Gamma^{*}_m(0)\|_F = O_P(1)$, where both bounds are uniform in $m$. Furthermore, using $c(\tau, \tau) = \sum_{j=1}^{\infty}\lambda_j v_j^2(\tau)$ we get, by the continuity of the kernel $c(\cdot, \cdot)$ on the compact support $[0,1]\times[0,1]$, the bound $\sum_{j=1}^{m}\lambda_j v_j^2(\tau) \le c(\tau, \tau)$, which implies $\sup_{\tau\in[0,1]}\sum_{j=1}^{m}v_j^2(\tau) \le \lambda_m^{-1}C$, where $C := \sup_{\tau\in[0,1]}c(\tau, \tau) < \infty$. Moreover, arguing as in Kokoszka and Reimherr (2013a), who showed that $(\sqrt{n}(\widehat{v}_j - v_j),\ j = 1, 2, \ldots, m)$ converges weakly on $\mathcal{H}^m$, we get that $\{m^{-1}\sum_{j=1}^{m}(\sqrt{n}(\widehat{v}_j(\tau) - v_j(\tau)))^2,\ \tau\in[0,1]\}$ converges weakly on $\mathcal{H}$, which by the continuous mapping theorem implies that $\sup_{\tau\in[0,1]} m^{-1}\sum_{j=1}^{m} n(\widehat{v}_j(\tau) - v_j(\tau))^2 = O_P(1)$. Hence
$$\sum_{j=1}^{m}|\widehat{v}_j(\tau)| \le \sqrt{m}\sqrt{\sum_{j=1}^{m}v_j^2(\tau)} + \sqrt{m}\sqrt{\sum_{j=1}^{m}(\widehat{v}_j(\tau) - v_j(\tau))^2} = O\Big(\sqrt{\frac{m}{\lambda_m}} + \frac{m}{\sqrt{n}}\Big),$$
and
$$\sum_{j=1}^{m}|\widehat{v}_j(\tau) - v_j(\tau)|\Big(\sum_{j=1}^{m}|\widehat{v}_j(\tau)| + \sum_{j=1}^{m}|v_j(\tau)|\Big) = O_P\Big(\frac{m^{3/2}}{n\,\lambda_m^{1/2}} + \frac{m^2}{n^{3/2}}\Big).$$

Therefore,
$$\sup_{\tau\in[0,1]}\Big|E^{*}\Big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j(\tau)\Big)^2 - E\Big(\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}_{n+h}v_j(\tau)\Big)^2\Big| = O_P\Big(\sqrt{\frac{m}{\lambda_m}}\,\|\Gamma^{*}_m(0) - \Gamma_m(0)\|_F\Big) + o_P(1),$$
which vanishes because of Lemma 7.1 and Assumption 2$'$.

Consider (II). Since $E^{*}(U^{*}_{n+h,m}(\tau))^2 \le 2n^{-1}\sum_{t=1}^{n}\big(\widehat{U}_{t,n}(\tau)\big)^2 + 2\big(\bar{U}_n(\tau)\big)^2$, it suffices to show that $\sup_{\tau\in[0,1]} n^{-1}\sum_{t=1}^{n}\big(\widehat{U}_{t,n}(\tau)\big)^2 \to 0$. Toward this we use the bound
$$\frac{1}{n}\sum_{t=1}^{n}\big(\widehat{U}_{t,n}(\tau)\big)^2 \le \frac{4}{n}\sum_{t=1}^{n}\|\widehat{\boldsymbol{\xi}}_t - \boldsymbol{\xi}_t\|_2^2\sum_{j=1}^{m}v_j^2(\tau) + \frac{4}{n}\sum_{t=1}^{n}\|\boldsymbol{\xi}_t\|_2^2\sum_{j=1}^{m}(\widehat{v}_j(\tau) - v_j(\tau))^2 + \frac{2}{n}\sum_{t=1}^{n}\Big(\sum_{j=m+1}^{\infty}\xi_{j,t}v_j(\tau)\Big)^2. \qquad (27)$$

Since
$$\sup_{\tau\in[0,1]}\frac{1}{n}\sum_{t=1}^{n}\|\widehat{\boldsymbol{\xi}}_t - \boldsymbol{\xi}_t\|_2^2\sum_{j=1}^{m}v_j^2(\tau) \le C\,\frac{1}{\lambda_m}\,\frac{1}{n}\sum_{t=1}^{n}\|\widehat{\boldsymbol{\xi}}_t - \boldsymbol{\xi}_t\|_2^2 = \frac{1}{\lambda_m}\,O_P\Big(\frac{1}{n}\sum_{j=1}^{m}\frac{1}{\alpha_j^2}\Big)$$


and
$$\sup_{\tau\in[0,1]}\frac{1}{n}\sum_{t=1}^{n}\|\boldsymbol{\xi}_t\|_2^2\sum_{j=1}^{m}(\widehat{v}_j(\tau) - v_j(\tau))^2 \le O_P\Big(\frac{m}{n}\Big)\,\frac{1}{n}\sum_{t=1}^{n}\|\boldsymbol{\xi}_t\|_2^2 = O_P\Big(\frac{m}{n}\Big),$$

the first two terms on the right-hand side of (27) converge to zero. For the third term we get, after evaluating the squared term and substituting $\xi_{j,t} = \langle\mathcal{X}_t, v_j\rangle$, the bound
$$\begin{aligned}
\frac{1}{n}\sum_{t=1}^{n}\Big(\mathcal{X}_t(\tau) - \sum_{j=1}^{m}\xi_{j,t}v_j(\tau)\Big)^2 &\le \Big|\frac{1}{n}\sum_{t=1}^{n}\mathcal{X}_t^2(\tau) - E\mathcal{X}_t^2(\tau)\Big| + \Big|\sum_{j_1=1}^{m}\sum_{j_2=1}^{m}\langle(\widehat{C}_0 - C_0)(v_{j_1}), v_{j_2}\rangle v_{j_1}(\tau)v_{j_2}(\tau)\Big| \\
&\quad + 2\Big|\sum_{j=1}^{m}\int_0^1(\widehat{c}(\tau, s) - c(\tau, s))v_j(s)\,ds\, v_j(\tau)\Big| + \Big|E\mathcal{X}_t^2(\tau) - \sum_{j=1}^{m}\lambda_j v_j^2(\tau)\Big|. \qquad (28)
\end{aligned}$$

Now, $\sup_{\tau\in[0,1]}\big|n^{-1}\sum_{t=1}^{n}(\mathcal{X}_t^2(\tau) - E\mathcal{X}_t^2(\tau))\big| = O_P(n^{-1/2}) \to 0$ by the continuous mapping theorem and since $\{n^{-1/2}\sum_{t=1}^{n}(\mathcal{X}_t^2(\tau) - E\mathcal{X}_t^2(\tau)),\ \tau\in[0,1]\}$ converges weakly on $\mathcal{H}$. Furthermore, for the last term of (28) we have $\sup_{\tau\in[0,1]}\big|E\mathcal{X}_t^2(\tau) - \sum_{j=1}^{m}\lambda_j v_j^2(\tau)\big| \to 0$, by Theorem 7.3.5 of Hsing and Eubank (2015). Also,
$$\sup_{\tau\in[0,1]}\Big|\sum_{j_1=1}^{m}\sum_{j_2=1}^{m}\langle(\widehat{C}_0 - C_0)(v_{j_1}), v_{j_2}\rangle v_{j_1}(\tau)v_{j_2}(\tau)\Big| \le \|\widehat{C}_0 - C_0\|_{HS}\sup_{\tau\in[0,1]}\Big(\sum_{j=1}^{m}|v_j(\tau)|\Big)^2 \le m\,\|\widehat{C}_0 - C_0\|_{HS}\sup_{\tau\in[0,1]}\sum_{j=1}^{m}v_j^2(\tau) \le \frac{Cm}{\lambda_m}\|\widehat{C}_0 - C_0\|_{HS} = O_P\Big(\frac{m}{\sqrt{n}\,\lambda_m}\Big),$$

which converges to zero. Finally, using
$$\Big|\sum_{j=1}^{m}\int_0^1(\widehat{c}(\tau, s) - c(\tau, s))v_j(s)\,ds\, v_j(\tau)\Big|^2 \le \int_0^1(\widehat{c}(\tau, s) - c(\tau, s))^2 ds\Big(\sum_{j=1}^{m}|v_j(\tau)|\Big)^2 \le \int_0^1(\widehat{c}(\tau, s) - c(\tau, s))^2 ds\ m\sum_{j=1}^{m}v_j^2(\tau),$$
we get
$$\sup_{\tau\in[0,1]}\Big|\sum_{j=1}^{m}\int_0^1(\widehat{c}(\tau, s) - c(\tau, s))v_j(s)\,ds\, v_j(\tau)\Big| \le C\sqrt{\frac{m}{\lambda_m}}\sqrt{\sup_{\tau\in[0,1]}\int_0^1(\widehat{c}(\tau, s) - c(\tau, s))^2 ds} = O_P\Big(\sqrt{\frac{m}{\lambda_m n}}\Big) \to 0.$$


Supplement: Bootstrap Prediction Bands for Functional Time Series

This supplementary material contains the derivation of Condition (16), the proofs of Lemma 7.1, Lemma 7.2 and of Proposition 3.1, as well as the proofs of assertions (ii) and (iii) appearing in the proof of Theorem 3.1.

Derivation of Condition in (16): Note that the assumption $\lambda_j - \lambda_{j+1} \ge Cj^{-\vartheta}$ for $j = 1, 2, \ldots$ implies that $1/\lambda_j \le C^{-1}j^{\vartheta}$. For $p = O(n^{\gamma})$ and $m = O(n^{\delta})$, with $\gamma > 0$ and $\delta > 0$, Assumption 2(i) is satisfied if $\delta < \gamma/4$. Regarding Assumption 2(ii), verify that
$$\sqrt{\sum_{j=1}^{m}\frac{1}{\alpha_j^2}} \le \frac{1}{C}\frac{1}{\sqrt{2\vartheta + 1}}(m+1)^{\vartheta + 1/2} = O\big(n^{\delta(\vartheta + 1/2)}\big).$$
From this we get that
$$\frac{p^3}{\sqrt{nm}\,\lambda_m^2}\sqrt{\sum_{j=1}^{m}\frac{1}{\alpha_j^2}} = O\big(n^{3\gamma - 1/2 + 3\vartheta\delta}\big),$$
i.e., Assumption 2(iii) is satisfied if $\delta \le (1 - 6\gamma)/(6\vartheta)$ and $\gamma < 1/6$. For Assumption 2(iii) and in the case of Yule–Walker estimators, we have
$$\frac{m^6 p^4}{\lambda_m^2\sqrt{n}} = O\big(n^{6\delta + 4\gamma + 2\vartheta\delta - 1/2}\big),$$
which is fulfilled for $\delta \le (1 - 8\gamma)/(12 + 4\vartheta)$ and $\gamma < 1/8$. Note that for $\gamma < 1/8$ and $\delta > 0$, the condition $p^2/m^2 = O(\sqrt{n})$ also holds true. Putting the derived requirements on $\gamma$ and $\delta$ together leads to the condition in (15). $\square$

Proof of Lemma 7.1: Let $\Psi_{j,p}$, $\widetilde{\Psi}_{j,p}$ and $\widehat{\Psi}_{j,p}$, $j = 1, 2, \ldots$, be the coefficient matrices in the power series expansions of the inverse matrix polynomials $(I_m - \sum_{j=1}^{p}A_{j,p}z^j)^{-1}$, $(I_m - \sum_{j=1}^{p}\widetilde{A}_{j,p}z^j)^{-1}$ and $(I_m - \sum_{j=1}^{p}\widehat{A}_{j,p}z^j)^{-1}$, respectively, $|z| \le 1$, where $I_m$ is the $m\times m$ unit matrix. Set $\Psi_{0,p} = \widetilde{\Psi}_{0,p} = \widehat{\Psi}_{0,p} = I_m$ and let $\Sigma_e = E(e_{t,p}e_{t,p}^{\top})$, $\widetilde{\Sigma}_e = E(\widetilde{e}_{t,p}\widetilde{e}_{t,p}^{\top})$ and $\widehat{\Sigma}_e = E(\widehat{e}_{t,p}\widehat{e}_{t,p}^{\top})$. Since
$$\Gamma_m(0) = \sum_{j=0}^{\infty}\Psi_{j,p}\Sigma_e\Psi_{j,p}^{\top}, \quad \Gamma^{+}_m(0) = \sum_{j=0}^{\infty}\widetilde{\Psi}_{j,p}\widetilde{\Sigma}_e\widetilde{\Psi}_{j,p}^{\top} \quad \text{and} \quad \Gamma^{*}_m(0) = \sum_{j=0}^{\infty}\widehat{\Psi}_{j,p}\widehat{\Sigma}_e\widehat{\Psi}_{j,p}^{\top},$$
the assertion of the lemma follows using the same arguments as in the proof of Lemma 6.5 of Paparoditis (2018) and the bounds
$$\sum_{j=1}^{\infty}\|\widetilde{\Psi}_{j,p} - \Psi_{j,p}\|_F = o_P\big(m^{3/2}/\sqrt{p}\big) \quad \text{and} \quad \sum_{j=1}^{\infty}\|\widehat{\Psi}_{j,p} - \widetilde{\Psi}_{j,p}\|_F = O_P\Big(\frac{p^5\sqrt{m}}{\sqrt{n}\,\lambda_m^2}\sqrt{\sum_{j=1}^{m}\alpha_j^{-2}}\Big),$$
obtained in the aforementioned paper.

Proof of Lemma 7.2: We use the notation $y_n = (\widehat{g}_{(L)}(x), \ldots, \widehat{g}_{(L-k+1)}(x))$ and $y = (g_{0,(L)}(x), \ldots, g_{0,(L-k+1)}(x))$. For $h = 1$ we have
$$\|\widehat{g}(x) - g_0(x)\|_2 \le \|\widehat{g} - g_0\|_{\mathcal{L}}\|x\|_2 = o_P(1)O(1) = o_P(1),$$
since $\|\widehat{g} - g_0\|_{\mathcal{L}} \xrightarrow{P} 0$. Suppose that $\|\widehat{g}_{(L)}(x) - g_{0,(L)}(x)\|_2 \xrightarrow{P} 0$ for some $L \in \mathbb{N}$. Then,
$$\begin{aligned}
\|\widehat{g}_{(L+1)}(x) - g_{0,(L+1)}(x)\|_2 &\le \|\widehat{g}(y_n) - g_0(y_n)\|_2 + \|g_0(y_n) - g_0(y)\|_2 \\
&\le \|\widehat{g} - g_0\|_{\mathcal{L}}\|y_n\|_2 + \|g_0\|_{\mathcal{L}}\sum_{l=0}^{k-1}\|\widehat{g}_{(L-l)}(x) - g_{0,(L-l)}(x)\|_2 \\
&= o_P(1)O_P(1) + O(1)o_P(1) = o_P(1).
\end{aligned}$$

Proof of Proposition 3.1: We first show that, under the conditions of the proposition and for every $h \in \mathbb{N}$, the following is true:
$$\sup_{\tau\in[0,1]} E\int_0^1\big(c_{\widehat{g},h}(\tau, s) - c_{g,h}(\tau, s)\big)^2 ds \to 0, \qquad (29)$$
where $c_{\widehat{g},h}$ and $c_{g,h}$ are the kernels associated with the operators $\widehat{g}_{(h)}$ and $g_{(h)}$, respectively. These kernels satisfy $c_{g,h}(\tau, s) = \int_0^1 c_{g,h-1}(\tau, u)c_{g,1}(u, s)\,du$ with $c_{g,1}(\tau, s) = c_g(\tau, s)$, and $c_{\widehat{g},h}$ is defined similarly. Notice that for $h = 1$, assertion (29) is true by assumption. Suppose that the assertion is true for some $h \in \mathbb{N}$. Using $(a+b)^2 \le 2a^2 + 2b^2$ as well as the Cauchy–Schwarz inequality, we get
$$\int_0^1\big(c_{\widehat{g},h+1}(\tau, s) - c_{g,h+1}(\tau, s)\big)^2 ds \le 2\int_0^1\big(c_{\widehat{g},h}(\tau, u) - c_{g,h}(\tau, u)\big)^2 du\int_0^1\!\!\int_0^1 c_{\widehat{g}}^2(u, s)\,du\,ds + 2\int_0^1 c_{g,h}^2(\tau, u)\,du\int_0^1\!\!\int_0^1\big(c_{\widehat{g}}(u, s) - c_g(u, s)\big)^2 du\,ds.$$

From this we have
$$\begin{aligned}
\sup_{\tau\in[0,1]} E\int_0^1\big(c_{\widehat{g},h+1}(\tau, s) - c_{g,h+1}(\tau, s)\big)^2 ds &\le 2\sup_{\tau\in[0,1]} E\int_0^1\big(c_{\widehat{g},h}(\tau, u) - c_{g,h}(\tau, u)\big)^2 du\,\|\widehat{g}\|_{HS}^2 + 2\sup_{\tau\in[0,1]}\int_0^1 c_{g,h}^2(\tau, u)\,du\, E\|\widehat{g} - g\|_{HS}^2 \\
&\le C\Big\{\sup_{\tau\in[0,1]} E\int_0^1\big(c_{\widehat{g},h}(\tau, u) - c_{g,h}(\tau, u)\big)^2 du + E\|\widehat{g} - g\|_{HS}^2\Big\} \to 0.
\end{aligned}$$

Using (29) we prove by induction that Assumption 3$'$ is satisfied. For $h = 1$ we have
$$\sup_{\tau\in[0,1]} E\big|\widehat{g}(x)(\tau) - g(x)(\tau)\big|^2 \le \|x\|_2^2\sup_{\tau\in[0,1]} E\int_0^1\big(c_{\widehat{g}}(\tau, s) - c_g(\tau, s)\big)^2 ds \to 0.$$

Suppose that the assertion is true for some $h \in \mathbb{N}$. Using the Cauchy–Schwarz inequality and the bound $\|g(x)\| \le \|g\|_{HS}\|x\|$, we get for $h + 1$,
$$\begin{aligned}
\sup_{\tau\in[0,1]} E\big|\widehat{g}_{(h+1)}(x)(\tau) - g_{(h+1)}(x)(\tau)\big|^2 &\le 2\sup_{\tau\in[0,1]} E\Big|\int_0^1\big(c_{\widehat{g},h}(\tau, s) - c_{g,h}(\tau, s)\big)\widehat{g}(x)(s)\,ds\Big|^2 + 2\sup_{\tau\in[0,1]} E\Big|\int_0^1 c_{g,h}(\tau, s)\big(\widehat{g}(x) - g(x)\big)(s)\,ds\Big|^2 \\
&\le 2\sup_{\tau\in[0,1]} E\int_0^1\big(c_{\widehat{g},h}(\tau, s) - c_{g,h}(\tau, s)\big)^2 ds\int_0^1\widehat{g}^2(x)(s)\,ds + 2\sup_{\tau\in[0,1]}\int_0^1 c_{g,h}^2(\tau, s)\,ds\, E\int_0^1\big(\widehat{g}(x) - g(x)\big)^2(s)\,ds \\
&\le 2\|x\|_2^2\sup_{\tau\in[0,1]} E\int_0^1\big(c_{\widehat{g},h}(\tau, s) - c_{g,h}(\tau, s)\big)^2 ds\,\|\widehat{g}\|_{HS}^2 + 2\|x\|_2^2\sup_{\tau\in[0,1]}\int_0^1 c_{g,h}^2(\tau, s)\,ds\, E\|\widehat{g} - g\|_{HS}^2 \to 0.
\end{aligned}$$

Proof of Theorem 3.1: Consider assertion (ii). Since $\|v_j\|_2 = 1$, we get the bound
$$\begin{aligned}
\Big\|\sum_{j=1}^{m}\mathbf{1}_j^{\top}\big(\boldsymbol{\xi}^{+}_{n+h}v_j - \boldsymbol{\xi}^{*}_{n+h}\widehat{v}_j\big)\Big\|_2 &\le \Big\|\sum_{j=1}^{m}\mathbf{1}_j^{\top}\big(\boldsymbol{\xi}^{+}_{n+h} - \boldsymbol{\xi}^{*}_{n+h}\big)\widehat{v}_j\Big\|_2 + \Big\|\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}^{+}_{n+h}\big(v_j - \widehat{v}_j\big)\Big\|_2 \\
&\le \sqrt{m}\,\|\boldsymbol{\xi}^{*}_{n+h} - \boldsymbol{\xi}^{+}_{n+h}\|_2 + \|\boldsymbol{\xi}^{+}_{n+h}\|_2\sum_{j=1}^{m}\|\widehat{v}_j - v_j\|_2 = \sqrt{m}\,\|\boldsymbol{\xi}^{*}_{n+h} - \boldsymbol{\xi}^{+}_{n+h}\|_2 + O_P\Big(\frac{m}{\sqrt{n}}\sqrt{\sum_{j=1}^{m}\alpha_j^{-2}}\Big),
\end{aligned}$$
where the last equality follows using $\|\boldsymbol{\xi}^{+}_{n+h}\|_2^2 = O_P(m)$ and Lemma 3.2 of Hörmann and Kokoszka (2010). To evaluate the first term on the right-hand side of the last displayed inequality, we use the bound
$$\|\boldsymbol{\xi}^{*}_{n+h} - \boldsymbol{\xi}^{+}_{n+h}\|_2 \le \Big\|\sum_{j=1}^{p}\big(\widehat{A}_{j,p} - \widetilde{A}_{j,p}\big)\boldsymbol{\xi}^{*}_{n+h-j}\Big\|_2 + \Big\|\sum_{j=1}^{p}\widetilde{A}_{j,p}\big(\boldsymbol{\xi}^{+}_{n+h-j} - \boldsymbol{\xi}^{*}_{n+h-j}\big)\Big\|_2 + \|e^{*}_{n+h} - e^{+}_{n+h}\|_2 = \sum_{j=1}^{3}T_{j,n}, \qquad (30)$$
with an obvious notation for $T_{j,n}$, $j = 1, 2, 3$. Observe first that $E\|e^{*}_{n+h} - e^{+}_{n+h}\|_2^2 \to 0$ in probability, as in the proof of Lemma 6.7 in Paparoditis (2018), that is, $T_{3,n} \xrightarrow{P} 0$. For the first term of (30) we have
$$T_{1,n} = \Big\|\sum_{j=1}^{p}\big(\widehat{A}_{j,p} - \widetilde{A}_{j,p}\big)\boldsymbol{\xi}^{*}_{n+h-j}\Big\|_2 \le \sum_{j=1}^{p}\|\widehat{A}_{j,p} - \widetilde{A}_{j,p}\|_F\|\boldsymbol{\xi}^{*}_{n+h-j}\|_2 = O_P\big(\sqrt{m}\,p\,\|\widehat{A}_{p,m} - \widetilde{A}_{p,m}\|_F\big) = O_P\Big(\frac{m^{3/2}p^{5/2}}{\lambda_m^2}\sqrt{\frac{1}{n}\sum_{j=1}^{m}\alpha_j^{-2}}\Big) \to 0,$$


because $\|\widehat{A}_{p,m} - \widetilde{A}_{p,m}\|_F = O_P\Big(\big(p\sqrt{m}\,\lambda_m^{-1} + p^2\big)^2\sqrt{n^{-1}\sum_{j=1}^{m}\alpha_j^{-2}}\Big)$, see Paparoditis (2018, p. 5 of the Supplementary Material), and the last equality follows by Assumption 2. To show that $T_{2,n} \xrightarrow{P} 0$, we first show that, for any $h \in \mathbb{N}$,
$$\sum_{j=1}^{p}\|\boldsymbol{\xi}^{+}_{n+h-j} - \boldsymbol{\xi}^{*}_{n+h-j}\|_2^2 \xrightarrow{P} 0. \qquad (31)$$
For $h = 1$ we have
$$\sum_{j=1}^{p}\|\boldsymbol{\xi}^{+}_{n+h-j} - \boldsymbol{\xi}^{*}_{n+h-j}\|_2^2 = \sum_{j=1}^{p}\|\boldsymbol{\xi}_{n+h-j} - \widehat{\boldsymbol{\xi}}_{n+h-j}\|_2^2 \le \sum_{j=1}^{p}\|\mathcal{X}_{n+h-j}\|_2^2\sum_{j=1}^{m}\|\widehat{v}_j - v_j\|_2^2 = O_P\Big(p\sum_{j=1}^{m}\|\widehat{v}_j - v_j\|_2^2\Big) = O_P\Big(\frac{p}{n}\sum_{j=1}^{m}\alpha_j^{-2}\Big) \to 0.$$
Assume that the assertion is true for some $h \in \mathbb{N}$. We then have for $h + 1$ that
$$\begin{aligned}
\sum_{j=1}^{p}\|\boldsymbol{\xi}^{+}_{n+h+1-j} - \boldsymbol{\xi}^{*}_{n+h+1-j}\|_2^2 &= \|\boldsymbol{\xi}^{+}_{n+h} - \boldsymbol{\xi}^{*}_{n+h}\|_2^2 + o_P(1) \\
&\le 4\Big\|\sum_{j=1}^{p}\big(\widehat{A}_{j,p} - \widetilde{A}_{j,p}\big)\boldsymbol{\xi}^{*}_{n+h-j}\Big\|_2^2 + 4\Big\|\sum_{j=1}^{p}\widetilde{A}_{j,p}\big(\boldsymbol{\xi}^{+}_{n+h-j} - \boldsymbol{\xi}^{*}_{n+h-j}\big)\Big\|_2^2 + 2\|e^{+}_{n+h} - e^{*}_{n+h}\|_2^2 + o_P(1),
\end{aligned}$$

‖p

∑j=1

Aj,p(ξ+n+h−j − ξ

∗n+h−j)‖2

2 ≤p

∑j=1‖Aj,p‖2

F

p

∑j=1‖ξ+n+h−j − ξ

∗n+h−j‖2

2 = OP(1) · oP(1),

using the fact that ∑pj=1 ‖Aj,p‖2

F = OP(1) uniformly in m and p; see the proof of Lemma 6.5 inPaparoditis (2018). Using (31), we get for the term T2,n,

T22,n = ‖

p

∑j=1

Aj,p(ξ+n+h−j − ξ

∗n+h−j)‖2

2 ≤p

∑j=1‖Aj,p‖2

F

p

∑j=1‖ξ+n+h−j − ξ

∗n+h−j‖2

2 = OP(1) · oP(1)→ 0.

Consider assertion (iii). This assertion follows from Markov's inequality and the fact that $E^{*}\|U^{*}_{n+h,m}\|_2^2 \to 0$ in probability. The last statement is true since
$$\begin{aligned}
E^{*}\|U^{*}_{n+h,m}\|_2^2 &= \frac{1}{n}\sum_{t=1}^{n}\|\widehat{U}_{t,m} - \bar{U}_n\|_2^2 \le \frac{4}{n}\sum_{t=1}^{n}\|\widehat{U}_{t,m} - U_{t,m}\|_2^2 + \frac{4}{n}\sum_{t=1}^{n}\|U_{t,m}\|_2^2 + 2\|\bar{U}_n\|_2^2 \\
&= \frac{4}{n}\sum_{t=1}^{n}\Big\|\sum_{j=1}^{m}\mathbf{1}_j^{\top}\big(\widehat{\boldsymbol{\xi}}_t\widehat{v}_j - \boldsymbol{\xi}_t v_j\big)\Big\|_2^2 + o_P(1),
\end{aligned}$$


where the $o_P(1)$ term is due to the weak law of large numbers, the fact that $E\|U_{t,m}\|^2 = \sum_{j=m+1}^{\infty}\lambda_j \to 0$ as $m \to \infty$, and $\bar{U}_{n,m} \to 0$ in probability. For the first term on the right-hand side of the last displayed equality, we have that this term is bounded by
$$\begin{aligned}
\frac{8}{n}\sum_{t=1}^{n}\Big\|\sum_{j=1}^{m}\mathbf{1}_j^{\top}(\widehat{\boldsymbol{\xi}}_t - \boldsymbol{\xi}_t)\widehat{v}_j\Big\|_2^2 &+ \frac{8}{n}\sum_{t=1}^{n}\Big\|\sum_{j=1}^{m}\mathbf{1}_j^{\top}\boldsymbol{\xi}_t(\widehat{v}_j - v_j)\Big\|_2^2 \\
&\le \frac{8}{n}\sum_{t=1}^{n}\|\mathcal{X}_t\|_2^2\sum_{j=1}^{m}\|\widehat{v}_j - v_j\|_2^2 + 8\sqrt{\sum_{j,l=1}^{m}\Big|\frac{1}{n}\sum_{t=1}^{n}\xi_{j,t}\xi_{l,t}\Big|^2}\ \sum_{j=1}^{m}\|\widehat{v}_j - v_j\|_2^2 = O_P\Big(n^{-1}\sum_{j=1}^{m}\alpha_j^{-2}\Big) \to 0,
\end{aligned}$$
where the last equality follows because $n^{-1}\sum_{t=1}^{n}\|\mathcal{X}_t\|_2^2 = O_P(1)$ and $n^{-1}\sum_{t=1}^{n}\xi_{j,t}\xi_{l,t} \xrightarrow{P} \lambda_j\mathbb{1}_{j=l}$ as $n \to \infty$.

References

Ahn, S. C. and Horenstein, A. R. (2013), 'Eigenvalue ratio test for the number of factors', Econometrica 81(3), 1203–1227.

Alonso, A. M., Peña, D. and Romo, J. (2002), 'Forecasting time series with sieve bootstrap', Journal of Statistical Planning and Inference 100(1), 1–11.

Antoniadis, A., Brossat, X., Cugliari, J. and Poggi, J.-M. (2016), 'A prediction interval for a function-valued forecast model: Application to load forecasting', International Journal of Forecasting 32, 939–947.

Antoniadis, A., Paparoditis, E. and Sapatinas, T. (2006), 'A functional wavelet-kernel approach for time series prediction', Journal of the Royal Statistical Society: Series B 68(5), 837–857.

Aue, A., Norinho, D. D. and Hörmann, S. (2015), 'On the prediction of stationary functional time series', Journal of the American Statistical Association: Theory and Methods 110(509), 378–392.

Berlinet, A., Elamine, A. and Mas, A. (2011), 'Local linear regression for functional data', Annals of the Institute of Statistical Mathematics 63(5), 1047–1075.

Billingsley, P. (1999), Convergence of Probability Measures, John Wiley & Sons, New York.

Boj, E., Delicado, P. and Fortiana, J. (2010), 'Distance-based local linear regression for functional predictors', Computational Statistics & Data Analysis 54(2), 429–437.

Bosq, D. (2000), Linear Processes in Function Spaces, Lecture Notes in Statistics, Springer, New York.

Bosq, D. and Blanke, D. (2007), Inference and Prediction in Large Dimensions, John Wiley & Sons, Chichester.

Breidt, F. J., Davis, R. A. and Dunsmuir, W. T. M. (1995), 'Improved bootstrap prediction intervals for autoregressions', Journal of Time Series Analysis 16(2), 177–200.

Cheng, R. and Pourahmadi, M. (1993), 'Baxter's inequality and convergence of finite predictors of multivariate stochastic processes', Probability Theory and Related Fields 95, 115–124.


Chiou, J.-M. and Müller, H.-G. (2009), 'Modeling hazard rates as functional data for the analysis of cohort lifetables and mortality forecasting', Journal of the American Statistical Association: Applications and Case Studies 104(486), 572–585.

Cuevas, A., Febrero, M. and Fraiman, R. (2006), 'On the use of the bootstrap for estimating functions with functional data', Computational Statistics and Data Analysis 51(2), 1063–1074.

Dehling, H., Sharipov, S. O. and Wendler, M. (2015), 'Bootstrap for dependent Hilbert space-valued random variables with application to von Mises statistics', Journal of Multivariate Analysis 133, 200–215.

Ferraty, F. and Vieu, P. (2006), Nonparametric Functional Data Analysis, Springer, New York.

Ferraty, F. and Vieu, P. (2011), Kernel regression estimation for functional data, in F. Ferraty and Y. Romain, eds, 'The Oxford Handbook of Functional Data Analysis', Oxford University Press, Oxford.

Findley, D. F. (1986), Bootstrap estimates of forecast mean square errors for autoregressive processes, in D. M. Allen, ed., 'Computer Science and Statistics: The Interface', Elsevier Science, pp. 11–17.

Franke, J. and Nyarige, E. G. (2019), A residual-based bootstrap for functional autoregressions, Technical report, University of Kaiserslautern. URL: https://arxiv.org/abs/1905.07635

Gneiting, T. and Raftery, A. E. (2007), 'Strictly proper scoring rules, prediction and estimation', Journal of the American Statistical Association: Review Article 102(477), 359–378.

Goldsmith, J., Greven, S. and Crainiceanu, C. (2013), 'Corrected confidence bands for functional data using principal components', Biometrics 69(1), 41–51.

Hörmann, S., Kidziński, L. and Hallin, M. (2015), 'Dynamic functional principal components', Journal of the Royal Statistical Society: Series B 77(2), 319–348.

Hörmann, S. and Kokoszka, P. (2010), 'Weakly dependent functional data', Annals of Statistics 38(3), 1845–1884.

Hörmann, S. and Kokoszka, P. (2012), Functional time series, in T. S. Rao, S. S. Rao and C. R. Rao, eds, 'Handbook of Statistics', Vol. 30, North Holland, Amsterdam, pp. 157–186.

Horváth, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, Springer, New York.

Horváth, L., Kokoszka, P. and Rice, G. (2014), 'Testing stationarity of functional time series', Journal of Econometrics 179(1), 66–82.

Hsing, T. and Eubank, R. (2015), Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators, John Wiley & Sons, New York.

Hurvich, C. M. and Tsai, C.-L. (1993), 'A corrected Akaike information criterion for vector autoregressive model selection', Journal of Time Series Analysis 14(3), 271–279.

Hyndman, R. J. and Shang, H. L. (2009), 'Forecasting functional time series (with discussions)', Journal of the Korean Statistical Society 38(3), 199–221.


Kokoszka, P. and Reimherr, M. (2013a), 'Asymptotic normality of the principal components of functional time series', Stochastic Processes and their Applications 123(5), 1546–1562.

Kokoszka, P. and Reimherr, M. (2013b), 'Determining the order of the functional autoregressive model', Journal of Time Series Analysis 34(1), 116–129.

Kokoszka, P., Rice, G. and Shang, H. L. (2017), 'Inference for the autocovariance of a functional time series under conditional heteroscedasticity', Journal of Multivariate Analysis 162, 32–50.

Kudraszow, N. L. and Vieu, P. (2013), 'Uniform consistency of kNN regressors for functional variables', Statistics and Probability Letters 83(8), 1863–1870.

Masry, E. (2005), 'Nonparametric regression estimation for dependent functional data: asymptotic normality', Stochastic Processes and their Applications 115(1), 155–177.

McMurry, T. and Politis, D. N. (2011), Resampling methods for functional data, in F. Ferraty and Y. Romain, eds, 'The Oxford Handbook of Functional Data Analysis', Oxford University Press, New York, pp. 189–209.

Meyer, M. and Kreiss, J.-P. (2015), 'On the vector autoregressive sieve bootstrap', Journal of Time Series Analysis 36(3), 377–397.

Pan, L. and Politis, D. N. (2016), 'Bootstrap prediction intervals for linear, nonlinear and nonparametric autoregression', Journal of Statistical Planning and Inference 177, 1–27.

Panaretos, V. M. and Tavakoli, S. (2013), 'Fourier analysis of stationary time series in function space', Annals of Statistics 41(2), 568–603.

Paparoditis, E. (2018), 'Sieve bootstrap for functional time series', Annals of Statistics 46(6B), 3510–3538.

Paparoditis, E. and Sapatinas, T. (2016), 'Bootstrap-based testing of equality of mean functions or equality of covariance operators for functional data', Biometrika 103(3), 727–733.

Pascual, L., Romo, J. and Ruiz, E. (2004), 'Bootstrap predictive inference of ARIMA processes', Journal of Time Series Analysis 25(4), 449–465.

Pilavakis, D., Paparoditis, E. and Sapatinas, T. (2019), 'Moving block and tapered block bootstrap for functional time series with an application to the K-sample mean problem', Bernoulli 25(4B), 3496–3526.

Politis, D. N. and Romano, J. P. (1994), 'The stationary bootstrap', Journal of the American Statistical Association: Theory and Methods 89(428), 1303–1313.

Rana, P., Aneiros, G., Vilar, J. and Vieu, P. (2016), 'Bootstrap confidence intervals in functional nonparametric regression under dependence', Electronic Journal of Statistics 10(2), 1973–1999.

Rana, P., Aneiros-Perez, G. and Vilar, J. M. (2015), 'Detection of outliers in functional time series', Environmetrics 26(3), 178–191.

Shang, H. L. (2015), 'Resampling techniques for estimating the distribution of descriptive statistics of functional data', Communications in Statistics - Simulation and Computation 44(3), 614–635.


Shang, H. L. (2017), 'Functional time series forecasting with dynamic updating: An application to intraday particulate matter concentration', Econometrics and Statistics 1, 184–200.

Shang, H. L. (2018), 'Bootstrap methods for stationary functional time series', Statistics and Computing 28(1), 1–10.

Thombs, L. A. and Schucany, W. R. (1990), 'Bootstrap prediction intervals for autoregression', Journal of the American Statistical Association: Theory and Methods 85(410), 486–492.

Vilar, J., Aneiros, G. and Rana, P. (2018), 'Prediction intervals for electricity demand and price using functional data', International Journal of Electrical Power and Energy Systems 96, 457–472.

Yao, F., Müller, H. and Wang, J. (2005), 'Functional data analysis for sparse longitudinal data', Journal of the American Statistical Association: Theory and Methods 100(470), 577–590.

Zhu, T. and Politis, D. N. (2017), 'Kernel estimates of nonparametric functional autoregression models and their bootstrap approximation', Electronic Journal of Statistics 11(2), 2876–2906.
