Testing the marginal distribution
in time-varying location-scale models∗
Matei Demetrescu†
Christian-Albrechts-University of Kiel
Robinson Kruse‡
University of Cologne and CREATES
Preliminary version: February 25, 2019
Please do not quote
Abstract
Testing distributional assumptions is an evergreen topic in statistics, econometrics and other
quantitative disciplines. As a leading case in applied work, the paper begins with testing
the normality of time series exhibiting features such as serial dependence and time-varying
mean and volatility. The results extend to general location-scale models without essential
modifications. If one falsely assumes weak or strict stationarity, the marginal distribution of the
series of interest is a mixture of the baseline distribution with different location and scale at
different times, and standard distribution tests would reject too often when the null is actually
true. We consider here tests based on raw moments of the probability integral transform of the
suitably standardized series. For the standardization, nonparametric estimators of the mean
and the variance functions may be used, which eliminates the possibility of rejecting the null
of, say, normality when the marginal distributions are normal, but with time-varying mean
and variance. The use of probability integral transforms is advantageous as they are quite
sensitive to deviations from the null other than asymmetry and excess kurtosis. Short-run
dynamics are taken into account using the (fixed-b) Heteroskedasticity and Autocorrelation
Robust [HAR] approach of Kiefer and Vogelsang (2005, ET), which is shown to automatically
capture the effect of the estimation uncertainty arising from the empirical standardization. The
provided Monte Carlo experiments show that the new tests perform well in terms
of size, but also in terms of power, when compared to popular alternative procedures. An
application to testing a distributional assumption for log squared returns in a stochastic
volatility model sheds light on the empirical usefulness of the proposed test.
Key words: Distribution testing; Probability integral transform; Estimated standardization; Nonpara-
metric estimator; Robust testing.
JEL classification: C12; C14; C22
∗The authors would like to thank Mehdi Hosseinkouchack, Dominik Liebl, Philipp Sibbertsen and Michael Vogt for very helpful comments. We are grateful to Yuze Liu for providing excellent research assistance.
†Corresponding author: Institute for Statistics and Econometrics, Christian-Albrechts-University of Kiel, Olshausenstr. 40-60, D-24118 Kiel, Germany, e-mail: [email protected].
‡University of Cologne, Faculty of Management, Economics and Social Science, Albertus-Magnus-Platz, 50923 Cologne, Germany, e-mail: [email protected], and CREATES, Aarhus University, School of Economics and Management, Fuglesangs Allé 4, DK-8210 Aarhus V, Denmark, e-mail: [email protected].
1 Introduction
Testing distributional assumptions is an important aspect of applied work. For instance, non-
normality of disturbances is sometimes taken to indicate a misspecification in regression models;
non-normality may also be a prerequisite of certain modelling approaches; see e.g. the analysis
of non-causal time series models (Lanne and Saikkonen, 2011; Lanne et al., 2012; Lanne and
Saikkonen, 2013). In duration models, departures from the exponential distribution again indicate
misspecification. In an iid sampling situation, the Kolmogorov-Smirnov statistic is quite often
used to test distributional assumptions, but it is not straightforward to extend to serial dependence
and the use of estimated parameters. Bai (2003) resorts to the martingale transformation
of Khmaladze (1981); the martingale transform approach is quite demanding, though, so Bai
and Ng (2005) follow Jarque and Bera (1980) and resort to moment-based testing; see also
Lomnicki (1961) for an early discussion for linear processes or Bontemps and Meddahi (2005) for
an ingenious choice of moment restrictions. While Bai and Ng (2005) address normality testing
explicitly, moment-based testing can be extended to test other distributions as well.
But serial dependence and estimation uncertainty are not the only issues faced in econometric
practice. Consider for instance the situation where a series is marginally normal, but
exhibits one break in the mean or the variance. The pooled distribution is a mixture of two
normals, which is non-normal, so a normality test ignoring the break will reject the true null
more often than required by the nominal level of the test. The reasoning extends to more general
patterns of changes in mean or variance, and to other families of distributions. And indeed,
economic data are often found to exhibit time-varying moments. Even if mean breaks can be argued
away, examples of time-varying volatility can be found in the field of financial data such as asset
returns (see among others Guidolin and Timmermann, 2006; Amado and Teräsvirta, 2014;
Teräsvirta and Zhao, 2011; Amado and Teräsvirta, 2013) and also in macroeconomic time series
such as economic growth or price changes (see e.g. Stock and Watson, 2002; Sensier and van Dijk,
2004; Clark, 2009, 2011; Justiniano and Primiceri, 2008). Typical patterns are permanent breaks
(like the Great Moderation as an example of a downward break) or trends in the variance.
As a consequence, robust inference under time-heteroskedasticity with dependent data has received
considerable attention in the last decade.1
We discuss in this paper tests based on series standardized using means and variances estimated
in a nonparametric fashion, to account for possible time variation of unknown shape in the
location and the scale of the series of interest. The tests are based on moments of probability
integral transforms [PITs] of the standardized series. PITs have already been used successfully
by Knüppel (2015), though without accounting for the estimated standardization.
Regarding robustness against serial dependence, we rely on long-run variance estimation following
Bai and Ng (2005). We go one step further, though, and adopt the fixed-b asymptotic framework
of Kiefer and Vogelsang (2005). The main feature of the fixed-b framework is that the bandwidth
B used for long-run covariance estimation does not need to fulfill the standard assumption that
1Phillips and Xu (2006) and Xu (2008) deal with stationary autoregressions, while, for unit root autoregressions, the reader is referred to Cavaliere and Taylor (2008) or Cavaliere and Taylor (2009). Time-varying volatility has even larger effects in panels of (nonstationary) series, calling for suitable treatment; see e.g. Demetrescu and Hanck (2012) or Westerlund (2014).
b = B/T → 0 as T → ∞. On the contrary, the bandwidth is held fixed as a linear proportion of
the sample size T, i.e. B = [bT] with b ∈ (0, 1]. This leads to non-standard asymptotic limiting
distributions of test statistics (like t, Wald and F), in such a way that the critical values
obtained from these distributions reflect the choice of bandwidth and kernel even as T → ∞,
such that the fixed-b approach may provide much more accurate finite-sample inference.2 Our
main contribution is to show that the mean and variance functions may be estimated in a
nonparametric fashion. As a consequence, the practitioner does not have to specify a model
for the mean and the variance explicitly; moreover, the limiting distribution turns out to be
the same whether the mean and variance functions are known or estimated, thus leading to a
straightforward implementation of the proposed tests.
The remainder of the paper is structured as follows. In Section 2, the setup is described and the
newly proposed test statistics are introduced for the important particular case of normality. The
uncertainty induced by the nonparametrically estimated standardization is addressed in Section 3,
followed by the extension to other distributions in Section 4. Our Monte Carlo simulation study is included
in Section 5. Section 6 provides an empirical application of the tests to log squared returns for the
DJIA, FTSE and Nikkei stock indices. Section 7 concludes the study. Proofs, additional results,
response curves for critical values and a description of the Bai and Ng (2005) test statistic are
given in the Appendix.
In terms of notation, C stands for a generic constant whose value may change from one occurrence
to another, and ⇒ for weak convergence in a space of càdlàg functions endowed with a suitable
norm.
2 Model and test idea
To fix ideas, we first describe the proposed procedure for the null hypothesis of normality,
assuming that the mean and variance of the series of interest are known. Section 3 discusses
the feasible version of our test procedure with nonparametric standardization, and Section 4 the
application to other distributions under the null.
The null hypothesis to be tested is that the series of interest xt is marginally normal. The
series xt is taken to exhibit time-varying mean and variance behavior as given by the following
component model
xt = µt + σtzt, t = 1, 2, . . . , T,
where zt is unconditionally homoskedastic and otherwise short-range dependent, while the time-
varying mean and variance are allowed to have triangular array structures, µt = µt,T and σt =
σt,T , allowing e.g. for breaks.
The following assumptions make the notions of short-run dependence and time-varying moments
precise.
2See Yang and Vogelsang (2011), Vogelsang and Wagner (2013) or Sun (2014a,b) for recent contributions to this field, inter alia.
Assumption 1 Let $z_t$ be a marginally standard normal, strictly stationary series, strong
mixing with coefficients $\alpha(j)$ for which
$$\alpha(j) < A j^{-b} \quad \text{for some } b > 10/3.$$
Assume furthermore that $z_t$ has unity long-run variance, $\sum_{h=-\infty}^{\infty} \mathrm{E}(z_t z_{t+h}) = 1$. Finally, assume
absolutely summable 8th-order cumulants of $z_t$.
The strong mixing condition is a standard way of controlling the persistence of stochastic
processes and ensures that $z_t$ has short memory; given the unity long-run variance, $z_t$ is integrated
of order zero and $\sigma_t^2$ then gives the local long-run variance. The mixing coefficients $\alpha(j)$ are
only mildly restricted, given that normality of $z_t$ ensures finiteness of moments of any order,
so the typical trade-off between serial dependence and finiteness of higher-order moments is
not relevant here. The condition also allows for mild forms of conditional heteroskedasticity,
so the observed series $x_t$ may exhibit both conditional and unconditional heteroskedasticity.
Assumption 1 ensures e.g. weak convergence of the suitably normalized partial sums of $z_t$ and
$z_t^2$,
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} \begin{pmatrix} z_t \\ z_t^2 - 1 \end{pmatrix} \Rightarrow \begin{pmatrix} W_1(s) \\ W_2(s) \end{pmatrix}, \qquad (1)$$
where $(W_1, W_2)'$ is a bivariate Brownian motion (see e.g. Davidson, 1994, Chapter 29). Strict
stationarity is a more restrictive condition than needed for the convergence in (1), for which
weak stationarity would have sufficed in addition to the I(0) property and uniform boundedness
of higher-order moments. We shall consider nonlinear transformations of $z_t$, however, and strict
stationarity of $z_t$ ensures that the transformed series have constant variance; see below. Moreover,
strict stationarity is a reasonable assumption once the time-varying mean and variance have been
accounted for.
Strict stationarity of $z_t$ also separates the variance fluctuations from the serial dependence
properties. The unity long-run variance assumption on $z_t$ is an identifying restriction and allows
for the interpretation of $\sigma_t$ as the marginal (long-run) standard deviation. The mean and variance
functions themselves are taken to satisfy some smoothness conditions:
Assumption 2 The triangular arrays $\mu_{t,T}$ and $\sigma_{t,T}$ are given as $\mu_{t,T} = \mu(t/T)$ and $\sigma_{t,T} = \sigma(t/T)$, where both $\mu(\cdot)$ and $\sigma(\cdot)$ are Lipschitz-continuous on $[0, 1]$, and $\sigma(\cdot)$ is bounded away
from zero on $[0, 1]$. Let $\sigma''$ exist and be bounded on $[0, 1]$.
We base our test of the null hypothesis on moments of the transformed series rather than the original
series $z_t$. With $\Phi$ being the cdf (and $\varphi$ denoting the pdf) of the standard normal distribution,
the probability integral transform
$$p_t = \Phi(z_t)$$
is marginally uniform on $[0, 1]$ under the null. It then holds under the null of uniformly distributed
PITs that
$$\mathrm{E}\left(p_t^k\right) = \frac{1}{k+1}, \qquad k \in \mathbb{N}, \qquad (2)$$
such that, under Assumption 1,
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} \begin{pmatrix} p_t - \frac{1}{2} \\ \vdots \\ p_t^K - \frac{1}{K+1} \end{pmatrix} \Rightarrow \begin{pmatrix} B_1(s) \\ \vdots \\ B_K(s) \end{pmatrix}, \qquad (3)$$
where $(B_1, \ldots, B_K)'$ is a $K$-variate Brownian motion with covariance matrix $\Omega = \mathrm{E}\big((B_1(1), \ldots, B_K(1))'(B_1(1), \ldots, B_K(1))\big)$, which is taken to be positive definite. Because $p_t$ is
only marginally uniform, $\Omega$ depends in general on the specific data generating process at hand.
We shall resort to an estimate thereof (based on the usual spectral density approach;
see Newey and West, 1987; Andrews, 1991; Andrews and Monahan, 1992) to build Wald-type test
statistics of the moment restrictions in (2), so knowledge of $\Omega$ is not required. This follows the
approach of Bai and Ng (2005) or Bontemps and Meddahi (2005) for dealing with serial dependence
of unknown form.
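As an aside, the moment restrictions in (2) are easy to verify numerically. The following sketch is ours, not code from the paper; it checks that the raw PIT moments of an iid standard normal sample are close to $1/(k+1)$:

```python
# Check E(p^k) = 1/(k+1) for PITs of a standard normal sample (illustration).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)   # iid standard normal draws
p = norm.cdf(z)                    # probability integral transform

for k in range(1, 5):
    m_k = np.mean(p ** k)          # sample raw moment m_k
    # under the null, m_k should be close to 1/(k+1)
    print(f"k={k}: m_k={m_k:.4f}, target={1 / (k + 1):.4f}")
```

Under the alternative (e.g. skewed innovations), the sample moments drift away from these targets, which is precisely what the tests below exploit.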
Suppose for now that the test can be based directly on the empirical moments of $p_t$ (i.e. under known
parameters $\mu_t$ and $\sigma_t$). With $m_k = \frac{1}{T}\sum_{t=1}^{T} p_t^k$, a simple t-statistic for a single restriction on the
$k$-th moment is given by
$$t_k = \sqrt{T}\,\frac{m_k - \frac{1}{k+1}}{\hat\omega_k} \qquad (4)$$
with $\hat\omega_k^2$ being the $k$-th diagonal element of $\hat\Omega$ (i.e. the implied estimator of the long-run variance of $p_t^k$). Let
$$\hat\Omega = \sum_{j=-T+1}^{T-1} \kappa\left(\frac{j}{B}\right)\hat\Gamma_j \qquad (5)$$
denote an estimator of $\Omega$ with proportional bandwidth $B = [bT]$, $b > 0$, where the $\hat\Gamma_j$'s denote
the usual autocovariance matrix estimators at lag $j$,
$$\hat\Gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}\left(\mathbf{p}_t - \bar{\mathbf{p}}\right)\left(\mathbf{p}_{t-j} - \bar{\mathbf{p}}\right)', \quad j \geq 0, \qquad \hat\Gamma_j = \hat\Gamma_{-j}', \quad j < 0,$$
with $\mathbf{p}_t$ the vector stacking $p_t, p_t^2, \ldots, p_t^K$. For $b \in (0, 1]$ we have from Kiefer and Vogelsang
(2005) that
$$t_k^2 \Rightarrow \frac{W^2(1)}{Q_{b,\kappa}},$$
where $W$ is a standard Wiener process, and the functional $Q_{b,\kappa}$ is given in terms of the Brownian
bridge $W(s) - sW(1)$ and depends explicitly on the choice of kernel and bandwidth. For simplicity
we work with the two most popular kernels in applied time series analysis: (a) the quadratic
spectral [QS] kernel of Andrews (1991) with
$$\kappa(s) = \frac{25}{12\pi^2 s^2}\left(\frac{\sin(6\pi s/5)}{6\pi s/5} - \cos(6\pi s/5)\right)$$
and (b) the Bartlett kernel $\kappa(s) = (1 - |s|)\,\mathbb{1}(|s| \leq 1)$, with $\mathbb{1}$ the indicator function. For kernels with a smooth
second derivative, of which the QS kernel is one, it holds that
$$Q_{b,\kappa} = -\int_0^1\int_0^1 \frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)\left(W(r) - rW(1)\right)\left(W(s) - sW(1)\right)dr\,ds,$$
while, for the Bartlett kernel,
$$Q_{b,\kappa} = \frac{2}{b}\int_0^1 \left(W(r) - rW(1)\right)^2 dr - \frac{2}{b}\int_0^{1-b}\left(W(r+b) - (r+b)W(1)\right)\left(W(r) - rW(1)\right)dr.$$
For both kernels, the standard asymptotics ($t_k^2 \Rightarrow \chi_1^2$) is recovered when $b \to 0$ at suitable rates
(in fact $Q_{b,\kappa} \stackrel{d}{\to} 1$ for $b \to 0$; cf. Kiefer and Vogelsang, 2005).
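A minimal implementation of the $t_k$ statistic in (4) with a Bartlett-kernel long-run variance estimator might look as follows. This is our own sketch (function names are ours), covering the infeasible case of an already standardized series:

```python
# Sketch: t-statistic for the k-th PIT moment restriction E(p^k) = 1/(k+1),
# with a Bartlett long-run variance estimator at proportional bandwidth B = [bT].
import numpy as np
from scipy.stats import norm

def bartlett_lrv(u, b=0.1):
    """Bartlett-kernel long-run variance of the series u, bandwidth B = [bT]."""
    T = len(u)
    B = max(1, int(b * T))
    u = u - u.mean()
    lrv = np.mean(u * u)                     # lag-0 autocovariance
    for j in range(1, B):
        w = 1.0 - j / B                      # kappa(j/B) for the Bartlett kernel
        lrv += 2.0 * w * np.mean(u[j:] * u[:-j])
    return lrv

def t_k(z, k, b=0.1):
    """t-statistic for the k-th PIT moment, z assumed already standardized."""
    pk = norm.cdf(z) ** k
    T = len(pk)
    omega2 = bartlett_lrv(pk, b)
    return np.sqrt(T) * (pk.mean() - 1.0 / (k + 1)) / np.sqrt(omega2)
```

The squared statistic is then compared against fixed-b critical values for the chosen kernel and bandwidth rather than against $\chi_1^2$ quantiles.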
Working with several raw moments (a portmanteau test, so to say), we suggest constructing
$$T_K = T\left(m_1 - \frac{1}{2}, \ldots, m_K - \frac{1}{K+1}\right)\hat\Omega^{-1}\left(m_1 - \frac{1}{2}, \ldots, m_K - \frac{1}{K+1}\right)'. \qquad (6)$$
Similarly,
$$T_K \Rightarrow \mathbf{W}_K'(1)\,Q_{K,b,\kappa}^{-1}\,\mathbf{W}_K(1),$$
where $\mathbf{W}_K(s)$ is a $K$-dimensional vector of independent standard Wiener processes and $Q_{K,b,\kappa}$ is
the $K$-dimensional variant of the above functionals, relying on the Brownian bridges $\mathbf{W}_K(s) - s\mathbf{W}_K(1)$; see Kiefer and Vogelsang (2005) for details.
Compared with relying on $z_t$ directly, PITs have several advantages; see Knüppel (2015) again.
Among others, PITs are bounded, so that their higher-order cumulants are smaller than those
of the standard normal; the variability of the long-run covariance matrix estimators
is therefore smaller and the $\chi^2$ asymptotic approximation more accurate. The bias of the moments of
PITs is also typically smaller than that of the untransformed series; see the Appendix for some
evidence in this respect. At the same time, PITs still allow one to distinguish between skewness and
kurtosis as causes of non-normality: since the cdf of the standard normal is symmetric about
the point $(0, 0.5)$, the first raw moment of the PITs captures distributional asymmetry, but not
skewness alone. So a rejection of the null which is not driven by the first raw moment is clearly
not due to skewness.
To take advantage of the properties of the PIT-based tests, one must however standardize the
series prior to applying the PIT. We address this issue in the following section.
3 Local standardization and estimation uncertainty
If, e.g., the location and scale parameters of the sample to be tested are known (or given to the
researcher), the tests may be applied directly. Although this is not a purely hypothetical situation
(for instance, the evaluation of density forecasts is often conducted under such assumptions; see
Knüppel (2015) and the references therein), it is not the prevailing case in applied work. Let
therefore
$$\hat p_t = \Phi\left(\hat z_t\right) = \Phi\left(\frac{x_t - \hat\mu_t}{\hat\sigma_t}\right) \qquad (7)$$
with $\hat\mu_t$ and $\hat\sigma_t$ being estimators of the (time-varying) mean and standard deviation $\mu_t$ and $\sigma_t$.
Let also $\hat m_k = \frac{1}{T}\sum_{t=1}^{T} \hat p_t^k$ denote the sample average of $\hat p_t^k$.
The use of $\hat p_t$ instead of $p_t$ for computing a feasible statistic, say $\hat t_k$, typically affects the limiting
distributions and requires corrections. This is known in the literature as the Durbin problem; see
Durbin (1973). In previous work, Bai and Ng (2005) show how to robustify against estimating a
(constant) mean and variance, while Bontemps and Meddahi (2012) derive conditions under
which more general parametric standardization does not affect the limiting distribution. Bai
(2003) uses the Khmaladze transform to tackle this issue.
These approaches are discussed under stationarity assumptions. To account for the time-varying
nature of our model, we employ a local standardization to match the local stationarity properties
of the model. Consider to this end the Nadaraya-Watson estimators of the unknown functions
$\mu$ and $\sigma$, i.e. the local constant regressions of $x_t$ and $(x_t - \mathrm{E}(x_t))^2$ on the normalized time $t/T$:3
$$\hat\mu\left(\frac{t}{T}\right) = \frac{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right) x_j}{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right)} \qquad \text{and} \qquad \hat\sigma^2\left(\frac{t}{T}\right) = \frac{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right)\left(x_j - \hat\mu_j\right)^2}{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right)},$$
where we assume for simplicity that the boundary observations $x_0, x_{-1}, \ldots$ and $x_{T+1}, \ldots$ are available. These estimators are not unfamiliar in
time series analysis: using the uniform kernel, we obtain the classical centered moving averages
$$\hat\mu_t = \frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau} x_j \qquad \text{and} \qquad \hat\sigma_t^2 = \frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau} \left(x_j - \hat\mu_j\right)^2,$$
where the window width is obtained from the bandwidth $h$ by multiplication with $T$. The window
width $\tau$ is smaller than $T$, hence ensuring that $x_t$ is approximately standardized in finite samples,
and letting $\tau \to \infty$ ensures that, asymptotically, $x_t$ is standardized correctly. This is simply
standardizing locally instead of globally, as would have been sufficient in the case of strict stationarity
of $x_t$. In fact, we may allow for different window widths $\tau_\mu$ and $\tau_\sigma$ to permit a more flexible
choice of these tuning parameters; see also Remark 1 below.
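In code, the uniform-kernel (moving-average) standardization can be sketched as follows. This is our own illustration; in particular, the boundary treatment (shrinking windows at the sample ends) is our simplification, not the paper's:

```python
# Sketch: local standardization with centered moving averages of width
# 2*tau + 1, followed by the PITs of the standardized series.
import numpy as np
from scipy.stats import norm

def local_standardize(x, tau):
    T = len(x)
    mu = np.empty(T)
    for t in range(T):                       # local mean
        lo, hi = max(0, t - tau), min(T, t + tau + 1)
        mu[t] = x[lo:hi].mean()
    sig2 = np.empty(T)
    for t in range(T):                       # local variance around local mean
        lo, hi = max(0, t - tau), min(T, t + tau + 1)
        sig2[t] = np.mean((x[lo:hi] - mu[lo:hi]) ** 2)
    return (x - mu) / np.sqrt(sig2)

# Usage: a mid-sample volatility break is absorbed by the local standardization.
rng = np.random.default_rng(2)
T = 1000
sigma = np.where(np.arange(T) < T // 2, 1.0, 3.0)
x = sigma * rng.standard_normal(T)
p_hat = norm.cdf(local_standardize(x, tau=60))
```

A global standardization of the same sample would instead yield PITs piling up near 0.5 in the low-volatility half, producing spurious rejections of uniformity.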
The key step in analyzing the feasible statistic based on $\hat p_t$ is to note that the weak convergence
in (3) is replaced by the following limiting behavior.
Lemma 1 Let $\tau_\mu, \tau_\sigma, T \to \infty$ such that $\frac{T^{\kappa_1}}{\tau_{\mu,\sigma}} + \frac{\tau_{\mu,\sigma}}{T^{\kappa_2}} \to 0$ for $2/3 < \kappa_1 < \kappa_2 < 3/4$. Then, under
Assumptions 1 and 2 and the uniform kernel,
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} \left(\hat p_t^k - \frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1} W_1(s) - \frac{k}{2}\varpi_{k-1} W_2(s)$$
jointly for all $k = 1, \ldots, K$, with $\vartheta_{k-1} = \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right)$ and $\varpi_{k-1} = \mathrm{E}\left(p_t^{k-1} z_t \varphi(z_t)\right)$, as well
as $W_{1,2}$ from (1) and $B_k$ from (3).
Proof: see the Appendix.
3E.g. Vogt (2012) considers nonparametric multivariate regressions with time-varying regression surfaces.
Remark 1 The choice of the window widths $\tau_\mu$ and $\tau_\sigma$ is critical for the performance of the
smoothers. One may note that the imposed restrictions imply undersmoothing, which is
explained by the nature of the desired result. While classical nonparametric regression focuses on
minimizing the MSE of the estimated curve, we need to reduce the estimation bias to a minimum,
since the effect of the bias of $\hat\mu_t$, say, on the partial sums of $\hat p_t$ cumulates in $s$. This effectively
induces trends in the partial sums of $\hat p_t$, and, for weak convergence to a Wiener process to still
take place, these trends must be of negligible magnitude. This implies that the usual procedures for
choosing the window width, such as cross-validation, do not deliver suitable choices directly. We make
recommendations on the choice of the window width in the Monte Carlo section.
Remark 2 We state the lemma for the case of the uniform kernel to simplify the proofs. Other
kernel choices, say the popular Gaussian or Epanechnikov kernels, plausibly lead to analogous
results, as would using a local linear (or polynomial) regression. Moreover, one may allow for
a finite number of breaks, too. We provide simulation evidence in support of these claims, but
choose not to follow through analytically in order to focus on the main message.
Remark 3 Note that $\vartheta_0 = \mathrm{E}(\varphi(z_t)) = \int_{-\infty}^{\infty} \varphi^2(x)\,dx = \frac{1}{2\sqrt{\pi}}$; via the use of power series expansions
one may show that $\vartheta_1 = \frac{1}{4\sqrt{\pi}}$, but the higher-order expectations ($\vartheta_k$, $k \geq 2$) do not
seem to have a closed-form expression. We computed the expectations $\vartheta_{k-1} = \mathrm{E}(p_t^{k-1}\varphi(z_t))$ via
Monte Carlo simulation for $k = 1, 2, 3, 4$ with 1,000,000 observations and 10,000 replications;
the resulting values are $\vartheta = (0.2820948, 0.1410473, 0.0857805, 0.0581472)$. Clearly,
the simulated values for $k = 1$ and $k = 2$ match their theoretical counterparts perfectly. We
therefore expect the Monte Carlo precision of the higher-order terms to be quite reasonable as well.
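The quantities in Remark 3 can also be cross-checked by numerical quadrature; the following check is ours and reproduces the closed forms $\vartheta_0 = \frac{1}{2\sqrt{\pi}}$ and $\vartheta_1 = \frac{1}{4\sqrt{\pi}}$:

```python
# Numerical check of vartheta_{k-1} = E(p^{k-1} phi(z)), i.e. the integral
# of Phi(x)^{k-1} * phi(x)^2 over the real line.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def vartheta(k):
    integrand = lambda x: norm.cdf(x) ** (k - 1) * norm.pdf(x) ** 2
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

print(vartheta(1), 1 / (2 * np.sqrt(np.pi)))   # vartheta_0
print(vartheta(2), 1 / (4 * np.sqrt(np.pi)))   # vartheta_1
```

The same quadrature evaluates $\vartheta_k$ for $k \geq 2$, where no closed form seems available, as an alternative to simulation.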
The feasible test statistics are based on $\hat p_t$,
$$\hat t_k = \sqrt{T}\,\frac{\hat m_k - \frac{1}{k+1}}{\hat\omega_k} \qquad (8)$$
and
$$\hat T_K = T\left(\hat m_1 - \frac{1}{2}, \ldots, \hat m_K - \frac{1}{K+1}\right)\hat\Omega^{-1}\left(\hat m_1 - \frac{1}{2}, \ldots, \hat m_K - \frac{1}{K+1}\right)' \qquad (9)$$
with $\hat\Omega$ from (5) computed using $\hat p_t^k$ as well.
By Lemma 1 we have that the $K$ normalized partial sums $\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k - \frac{1}{k+1}\right)$ still converge
weakly to a $K$-dimensional Brownian motion, albeit with a different long-run covariance matrix
than $\Omega$, namely
$$\tilde\Omega = V \Xi V',$$
where $\Xi$ is the long-run covariance matrix of $\left(p_t, \ldots, p_t^K, z_t, z_t^2 - 1\right)'$, and
$$V = \left(I_K;\ \iota_K\right) \qquad \text{with} \qquad \iota_K = -\begin{pmatrix} \vartheta_0 & \cdots & K\vartheta_{K-1} \\ \frac{1}{2}\varpi_0 & \cdots & \frac{K}{2}\varpi_{K-1} \end{pmatrix}'.$$
Since the fixed-b asymptotics leads to partial-sum asymptotics for both $\hat m_k$ and $\hat\Omega$, where the
relevant long-run (co)variance matrix simply cancels out in (8) and (9), it turns out, perhaps
surprisingly, that no explicit correction is required. This is formalized in the following proposition.
Proposition 1 Under the assumptions of Lemma 1, and the additional condition that
$\Xi$ be positive definite, it holds that
$$\hat t_k^2 \Rightarrow \frac{W^2(1)}{Q_{b,\kappa}} \qquad \text{and} \qquad \hat T_K \Rightarrow \mathbf{W}_K'(1)\,Q_{K,b,\kappa}^{-1}\,\mathbf{W}_K(1)$$
as $T \to \infty$.
Proof: see the Appendix.
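Critical values for these limiting distributions can be tabulated by direct simulation. The following sketch is our own: it discretizes the Wiener process on a grid and evaluates the Bartlett-kernel functional $Q_{b,\kappa}$ by Riemann approximation:

```python
# Simulate draws from W^2(1)/Q_{b,kappa} (Bartlett kernel) on an n-point grid.
import numpy as np

def fixed_b_draw(n=1000, b=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    W = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)   # discretized Wiener process
    s = np.arange(1, n + 1) / n
    bridge = W - s * W[-1]                               # Brownian bridge W(s) - sW(1)
    m = int(b * n)
    # Riemann approximation of the two integrals in Q_{b,kappa} for Bartlett
    q = (2 / b) * (np.mean(bridge ** 2) - np.sum(bridge[m:] * bridge[:-m]) / n)
    return W[-1] ** 2 / q

rng = np.random.default_rng(42)
draws = np.array([fixed_b_draw(rng=rng) for _ in range(2000)])
cv_5pct = np.quantile(draws, 0.95)   # 5%-level critical value for t_k^2
```

With many more replications and a finer grid, this is essentially how the response curves cv(b) reported in the Appendix can be generated.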
Remark 4 It is questionable whether small-b asymptotics are feasible or worth pursuing. Due
to the local nature of the standardization, the convergence of $\hat p_t$ to $p_t$ is quite slow compared to the
parametric case (where it is easily shown that $\hat p_t - p_t = O_p(T^{-1/2})$). Showing that $\hat\Omega$ converges
to $\tilde\Omega$ is therefore more difficult under local standardization. Since such convergence is not required
for the fixed-b approach, this actually delivers a further argument in favor of using the latter.
Remark 5 As a comparison, Appendix C provides a discussion of parametric standardization.
Parametric approaches require a model for the mean $\mu_t$ and the variance $\sigma_t^2$, which is prone to
misspecification. Moreover, there are essential differences between the implications of the two
approaches. While parametric adjustment typically leads to bridge-type processes (see Lemma 3
in Appendix C for details), we obtain Brownian motions under local adjustment. As a consequence,
parametric standardization requires an explicit correction involving an estimate
of $\Xi$; see Proposition 2 in Appendix C. As may be seen there, a further disadvantage of the
parametric approach is that the effect depends on the shape of the mean or variance component
adjusted for.
4 Other distributions and extensions
Our framework allows testing other null distributions in location-scale models, since PITs apply
to any continuous distribution. In particular, it is straightforward to show that Lemma 1 and
Proposition 1 hold under mild regularity conditions (finite moments of any order; relaxing this
can be done at the cost of appropriately restricting the serial dependence), yet with e.g. $\vartheta_k =
\mathrm{E}\left(p_t^k f_0(z_t)\right)$, where $f_0$ is the density function of the null distribution of $z_t$. One further advantage
of the local standardization approach is that the expectations $\vartheta_k = \mathrm{E}\left(p_t^k f_0(z_t)\right)$ and $\varpi_k =
\mathrm{E}\left(p_t^k z_t f_0(z_t)\right)$ need not be computed explicitly, so the test is immediately applicable for any
continuous null shape in location-scale families.
5 Monte Carlo study
In our Monte Carlo simulation study we compare the $t_k$ and $T_K$ statistics to the procedure of
Bai and Ng (2005).4 The newly proposed tests are carried out using either a single moment
(first to fourth) or the first two ($T_2$), first three ($T_3$) or first four moments ($T_4$). We use sample
sizes of $T = 250$ and $T = 1000$.
Regarding autocorrelation, we consider a causal and invertible ARMA(1,1) process with AR and
MA parameters $\phi \in \{0, 0.85\}$ and $\theta \in \{0, -0.45\}$, respectively. The general form of the DGP is
given by
$$y_t = \mu_t + \sigma_t z_t$$
$$\mu_t = \mu_1 + (\mu_2 - \mu_1) \cdot \mathbb{1}(t \geq \lfloor \tau T \rfloor)$$
$$\sigma_t = \sigma_1 + (\sigma_2 - \sigma_1) \cdot \mathbb{1}(t \geq \lfloor \tau T \rfloor)$$
$$z_t = \phi z_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1}, \qquad \varepsilon_t \overset{iid}{\sim} D(0, 1).$$
Since all procedures are scale-invariant, we do not normalize the long-run variance of $z_t$ to unity.
The distribution of $\varepsilon_t$ is specified as follows. Under $H_0$, the innovations $\varepsilon_t$ are standard normally
distributed. Under the alternative, we consider three standardized non-normal mixture distributions
with mixture weight $c \in [0, 1]$:
1. Mixture of a normal and a Student-t(3) distribution,
2. Mixture of a normal and a lognormal-distribution,
3. Mixture of a normal and a χ2(3)-distribution.
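A generator for this DGP might be sketched as follows. The helper below is ours (the paper provides no code); it implements the t(3) mixture case, assuming the mean shift 0 → 3 and the volatility shift 1 → 3 at mid-sample used in the experiments:

```python
# Sketch: ARMA(1,1) series with mixture innovations and optional shifts.
import numpy as np

def simulate(T, phi=0.85, theta=-0.45, c=0.5, shift="none", tau=0.5, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # innovations: standard normal w.p. 1-c, standardized t(3) w.p. c
    pick = rng.random(T + 1) < c
    t3 = rng.standard_t(3, T + 1) / np.sqrt(3.0)       # t(3) scaled to unit variance
    eps = np.where(pick, t3, rng.standard_normal(T + 1))
    z = np.empty(T)
    z[0] = eps[1] - theta * eps[0]
    for t in range(1, T):                              # ARMA(1,1) recursion
        z[t] = phi * z[t - 1] + eps[t + 1] - theta * eps[t]
    brk = (np.arange(T) >= int(tau * T)).astype(float) # indicator 1(t >= [tau T])
    mu = 3.0 * brk if shift in ("mean", "both") else np.zeros(T)
    sig = 1.0 + 2.0 * brk if shift in ("vola", "both") else np.ones(T)
    return mu + sig * z
```

Setting c = 0 yields the size experiments; the log-normal and χ²(3) mixtures are obtained analogously by standardizing the respective draws.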
Regarding the long-run covariance matrix estimator, the fixed-bandwidth parameter is specified
as $b = 0.1$: previous simulation experiments showed that the size of the tests is quite stable across
different values of $b$, while power is higher for smaller values of $b$. Hence, we choose $b = 0.1$.
Results are presented for the Bartlett kernel with its linearly decaying weights.
The nonparametric estimator is the Nadaraya-Watson estimator with Gaussian kernel and a
down-scaled bandwidth chosen via cross-validation. The scaling factor $s$ is set equal to 3/4 or
3/5 for comparison. The bandwidth chosen according to cross-validation is down-scaled because the
asymptotic results require a small bias in the estimation. The nominal significance level equals
5% and the number of Monte Carlo replications is set to 2,000 for each single experiment. In
what concerns critical values for the fixed-b distributions, we provide them on the basis of the
limiting results with 1,000 observations and 50,000 replications for $K = 1, 2, 3, 4$. Estimated cubic
response curves cv(b) are reported in Table 18 together with an $R^2$ measure for the precision of the
approximation.
Results are reported in Tables 1-16. If the mixture weight $c = 0$, then the distribution of
the innovations $z_t$ is normal; this case refers to the IID and ARMA size experiments. For
4Details on the test proposed by Bai and Ng (2005) can be found in Appendix D.
the other three distributions, i.e. t(3), log-normal and $\chi^2(3)$, the mixture weight is set to
$c = 0.25, 0.5, 0.75, 1$. We expect power to increase with $c$, as more weight is given to the
non-normal distribution. In Tables 1-16, we distinguish between four cases: (i) no shift,
(ii) mean shift, (iii) vola shift and (iv) mean plus vola shift. In the case of a mean shift, a structural
break from 0 to 3 takes place in the middle of the sample. For a vola shift, the standard deviation
switches from unity to three at the same breakpoint.
While the Bai and Ng (2005) test is generally oversized (less so in the ARMA(1,1) case), the raw
moment-based tests are much closer to the nominal significance level of 5%. In some cases we
observe that they are marginally undersized. But, for $T = 250$ with
short-run dynamics, most of them are quite close to the desired frequency of rejections. Clearly,
the BN test is heavily oversized when there are mean and/or vola shifts. In contrast, the
newly proposed statistics properly account for such shifts and deliver quite accurate empirical
sizes. The size performance is somewhat better for $s = 3/4$ than for $s = 3/5$.5
Turning to power, we shall distinguish case (i), no shift, from the remaining ones. Only in the
absence of mean and vola shifts does the BN test perform reasonably, as it does not account for such
breaks. Here, we can compare the (size-unadjusted) power of the BN test to that of the newly proposed
ones, while keeping in mind that the BN test is generally somewhat oversized under the null.
These situations are given in Tables 1, 5, 9 and 13. As we can observe, the first moment cannot
detect excess kurtosis given symmetry, as in the t(3) distribution, but the second, third and fourth
moment tests are successful and deliver higher power than the BN test. Moreover, a combination
of moments does not pay off in terms of power: the power of a properly selected single-moment
statistic is generally higher than that observed for the statistics relying on combinations. For
the log-normal and $\chi^2$-distributions, the test based on the first moment appears to be quite
powerful and provides the largest power across the studied tests. It also beats the BN test. The
choice of the down-scaling factor for the bandwidth of the Nadaraya-Watson estimator does not
impact the results much. As the size results are better for $s = 3/4$, we recommend
this choice for practical purposes (we also use this setting in our empirical application).
For the remaining situations, a direct comparison of the BN test to the new ones is not meaningful
due to the massive over-rejections of the BN test under the null. These additional experiments
allow us to compare the power across the different break scenarios relative to
the benchmark case of no shifts. The results reveal that power is somewhat lower in the case of
mean shifts, but a bit higher when pure vola shifts are present. When both
effects are present, power is broadly similar to the benchmark case. Furthermore,
we may evaluate the consistency of our test statistics, which should reject more often as
$c$ increases and also as $T$ increases. As expected, power increases along both dimensions.
5It is of importance to note that the size does not vary much with the choice of the bandwidth parameter b, as unreported previous simulations revealed. This is an advantage when it comes to the power of such tests, which typically depends a lot on the bandwidth choice; cf. Kiefer and Vogelsang (2005). In this sense, we are not facing a size-power tradeoff, as we can select the most suitable b in a way that maximizes power. As indicated above, we select b = 0.1.
Table 1: Size and power for T = 250, no shift and s = 3/4

Normal
        t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
IID     0.054  0.052  0.053  0.043  0.051  0.051  0.047   0.121
ARMA    0.045  0.055  0.058  0.071  0.057  0.062  0.060   0.093

t(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.051  0.049  0.046  0.050  0.050  0.038  0.041   0.096
0.5     0.041  0.075  0.131  0.183  0.096  0.085  0.081   0.044
0.75    0.048  0.320  0.607  0.650  0.519  0.469  0.520   0.180
1       0.070  0.482  0.750  0.772  0.699  0.705  0.769   0.215

Log-Normal
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.078  0.061  0.049  0.052  0.061  0.048  0.052   0.102
0.5     0.664  0.594  0.500  0.372  0.485  0.391  0.386   0.578
0.75    0.971  0.931  0.874  0.755  0.973  0.998  0.998   0.950
1       0.959  0.920  0.855  0.726  0.967  1.000  1.000   0.954

χ²(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.056  0.057  0.054  0.052  0.049  0.055  0.053   0.112
0.5     0.692  0.584  0.383  0.230  0.522  0.435  0.338   0.666
0.75    1.000  0.989  0.919  0.583  0.999  0.997  0.992   1.000
1       1.000  0.999  0.957  0.636  1.000  1.000  1.000   1.000
Table 2: Size and power for T = 250, mean shift and s = 3/4

Normal
        t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
IID     0.042  0.046  0.046  0.041  0.042  0.045  0.040   0.999

t(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.044  0.058  0.056  0.059  0.064  0.073  0.052   0.994
0.5     0.049  0.082  0.150  0.168  0.117  0.096  0.089   0.942
0.75    0.045  0.254  0.552  0.633  0.524  0.431  0.440   0.798
1       0.034  0.371  0.666  0.733  0.668  0.588  0.644   0.536

Log-Normal
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.048  0.059  0.067  0.058  0.055  0.047  0.054   0.989
0.5     0.526  0.520  0.438  0.344  0.384  0.341  0.284   0.907
0.75    0.952  0.921  0.873  0.775  0.897  0.993  0.985   0.719
1       0.959  0.929  0.892  0.808  0.935  1.000  1.000   0.518

χ²(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.051  0.046  0.046  0.048  0.054  0.048  0.052   1.000
0.5     0.522  0.477  0.351  0.209  0.372  0.270  0.215   0.998
0.75    0.997  0.992  0.909  0.625  0.987  0.974  0.954   0.974
1       1.000  0.997  0.936  0.643  0.999  0.999  0.997   0.800
Table 3: Size and power for T = 250, vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.039  0.046  0.055  0.071  0.050  0.042   0.041     0.769

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.055  0.062  0.066  0.071  0.062  0.051   0.047     0.737
0.5        0.046  0.128  0.229  0.293  0.220  0.180   0.144     0.452
0.75       0.063  0.400  0.696  0.739  0.630  0.580   0.605     0.271
1          0.062  0.505  0.764  0.783  0.761  0.727   0.830     0.261

Log-Normal, c
0.25       0.056  0.076  0.095  0.107  0.078  0.058   0.051     0.701
0.5        0.588  0.613  0.555  0.449  0.451  0.424   0.442     0.277
0.75       0.955  0.914  0.862  0.760  0.943  0.996   0.995     0.780
1          0.957  0.931  0.892  0.793  0.952  1.000   1.000     0.924

χ2(3), c
0.25       0.052  0.058  0.071  0.076  0.053  0.051   0.041     0.740
0.5        0.602  0.652  0.562  0.405  0.491  0.369   0.320     0.532
0.75       0.997  0.995  0.927  0.702  0.999  0.996   0.983     0.850
1          0.998  0.995  0.955  0.727  0.998  1.000   1.000     0.975
Table 4: Size and power for T = 250, mean plus vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.059  0.049  0.056  0.060  0.057  0.047   0.048     0.998

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.061  0.072  0.079  0.092  0.073  0.056   0.058     0.998
0.5        0.050  0.149  0.247  0.268  0.197  0.162   0.134     0.936
0.75       0.060  0.399  0.657  0.662  0.564  0.483   0.490     0.739
1          0.051  0.492  0.766  0.785  0.728  0.670   0.712     0.622

Log-Normal, c
0.25       0.092  0.119  0.117  0.097  0.102  0.067   0.057     1.000
0.5        0.673  0.659  0.586  0.428  0.508  0.432   0.413     0.994
0.75       0.959  0.931  0.883  0.777  0.920  0.998   0.996     0.972
1          0.951  0.928  0.884  0.791  0.929  1.000   1.000     0.959

χ2(3), c
0.25       0.093  0.090  0.090  0.084  0.093  0.079   0.064     1.000
0.5        0.691  0.687  0.529  0.340  0.540  0.403   0.355     1.000
0.75       0.995  0.992  0.939  0.689  0.998  0.995   0.987     1.000
1          0.997  0.998  0.967  0.754  1.000  1.000   1.000     1.000
Table 5: Size and power for T = 250, no shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.054  0.047  0.047  0.055  0.056  0.043   0.037     0.126
ARMA       0.032  0.092  0.122  0.135  0.105  0.110   0.148     0.091

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.047  0.052  0.043  0.037  0.043  0.043   0.036     0.079
0.5        0.052  0.072  0.124  0.160  0.116  0.098   0.074     0.040
0.75       0.071  0.337  0.593  0.637  0.546  0.477   0.537     0.165
1          0.082  0.471  0.740  0.768  0.710  0.690   0.806     0.214

Log-Normal, c
0.25       0.068  0.062  0.053  0.044  0.056  0.054   0.052     0.106
0.5        0.664  0.592  0.483  0.362  0.481  0.405   0.380     0.550
0.75       0.959  0.919  0.846  0.733  0.963  0.997   0.998     0.957
1          0.975  0.935  0.867  0.749  0.975  1.000   1.000     0.964

χ2(3), c
0.25       0.070  0.048  0.055  0.055  0.063  0.062   0.059     0.123
0.5        0.656  0.570  0.407  0.237  0.502  0.363   0.315     0.639
0.75       0.999  0.997  0.932  0.605  0.999  0.997   0.990     1.000
1          0.998  0.998  0.955  0.640  0.999  1.000   1.000     1.000
Table 6: Size and power for T = 250, mean shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.047  0.041  0.051  0.048  0.046  0.045   0.051     0.998

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.044  0.049  0.055  0.059  0.048  0.057   0.048     0.996
0.5        0.040  0.089  0.145  0.162  0.126  0.111   0.099     0.940
0.75       0.051  0.256  0.499  0.574  0.463  0.389   0.399     0.785
1          0.034  0.344  0.693  0.757  0.668  0.555   0.599     0.542

Log-Normal, c
0.25       0.078  0.065  0.063  0.061  0.070  0.073   0.063     0.991
0.5        0.476  0.507  0.439  0.366  0.347  0.306   0.291     0.918
0.75       0.954  0.920  0.867  0.802  0.905  0.990   0.980     0.701
1          0.959  0.937  0.900  0.821  0.926  1.000   1.000     0.503

χ2(3), c
0.25       0.058  0.051  0.054  0.058  0.068  0.069   0.054     1.000
0.5        0.489  0.476  0.337  0.210  0.368  0.270   0.224     1.000
0.75       0.991  0.981  0.886  0.613  0.988  0.969   0.939     0.964
1          0.997  0.998  0.944  0.666  1.000  0.999   0.995     0.825
Table 7: Size and power for T = 250, vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.041  0.053  0.067  0.070  0.053  0.043   0.039     0.757

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.049  0.066  0.089  0.094  0.057  0.049   0.041     0.727
0.5        0.040  0.114  0.231  0.278  0.205  0.163   0.140     0.429
0.75       0.059  0.371  0.656  0.724  0.636  0.583   0.601     0.304
1          0.065  0.545  0.787  0.815  0.750  0.742   0.811     0.252

Log-Normal, c
0.25       0.068  0.092  0.102  0.099  0.081  0.062   0.064     0.708
0.5        0.609  0.633  0.577  0.461  0.464  0.419   0.417     0.300
0.75       0.972  0.921  0.879  0.778  0.963  0.998   0.997     0.782
1          0.968  0.942  0.880  0.780  0.969  1.000   1.000     0.920

χ2(3), c
0.25       0.076  0.078  0.103  0.106  0.079  0.066   0.069     0.751
0.5        0.589  0.616  0.498  0.351  0.439  0.341   0.267     0.517
0.75       0.997  0.991  0.934  0.698  0.992  0.992   0.982     0.843
1          0.996  0.994  0.949  0.707  0.999  1.000   0.999     0.979
Table 8: Size and power for T = 250, mean and vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.037  0.047  0.065  0.065  0.049  0.048   0.039     0.997

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.047  0.058  0.073  0.079  0.060  0.050   0.042     0.997
0.5        0.044  0.138  0.246  0.285  0.203  0.155   0.126     0.941
0.75       0.055  0.359  0.624  0.672  0.576  0.470   0.479     0.732
1          0.058  0.476  0.768  0.795  0.729  0.660   0.720     0.614

Log-Normal, c
0.25       0.099  0.112  0.101  0.085  0.082  0.071   0.065     1.000
0.5        0.666  0.643  0.548  0.439  0.487  0.430   0.410     0.993
0.75       0.962  0.932  0.882  0.781  0.934  0.993   0.989     0.968
1          0.960  0.933  0.897  0.825  0.937  1.000   0.999     0.963

χ2(3), c
0.25       0.077  0.076  0.076  0.065  0.067  0.073   0.067     1.000
0.5        0.643  0.635  0.503  0.343  0.518  0.411   0.361     1.000
0.75       0.996  0.990  0.933  0.706  0.990  0.985   0.983     1.000
1          0.996  0.996  0.963  0.743  0.996  0.997   0.997     1.000
Table 9: Size and power for T = 1000, no shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.047  0.049  0.050  0.043  0.045  0.049   0.043     0.083
ARMA       0.046  0.056  0.054  0.045  0.044  0.046   0.038     0.083

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.049  0.061  0.057  0.057  0.054  0.037   0.037     0.064
0.5        0.050  0.368  0.612  0.677  0.499  0.382   0.325     0.200
0.75       0.052  0.879  0.928  0.927  0.918  0.913   0.991     0.386
1          0.063  0.914  0.934  0.925  0.940  0.965   1.000     0.427

Log-Normal, c
0.25       0.138  0.173  0.157  0.137  0.124  0.086   0.068     0.124
0.5        0.997  0.981  0.952  0.923  0.996  0.993   0.976     0.993
0.75       0.998  0.994  0.986  0.970  0.999  1.000   1.000     0.996
1          0.998  0.994  0.985  0.965  0.999  1.000   1.000     0.994

χ2(3), c
0.25       0.095  0.073  0.065  0.049  0.072  0.056   0.047     0.100
0.5        1.000  1.000  0.985  0.896  0.995  0.985   0.954     1.000
0.75       1.000  1.000  1.000  0.998  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
Table 10: Size and power for T = 1000, mean shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.042  0.047  0.050  0.050  0.053  0.046   0.036     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.048  0.048  0.053  0.054  0.052  0.046   0.048     0.995
0.5        0.046  0.323  0.589  0.682  0.540  0.386   0.336     0.952
0.75       0.045  0.852  0.931  0.930  0.929  0.931   0.985     0.730
1          0.036  0.929  0.950  0.945  0.953  0.949   0.999     0.353

Log-Normal, c
0.25       0.143  0.179  0.170  0.161  0.126  0.097   0.083     0.996
0.5        0.992  0.966  0.950  0.924  0.985  0.970   0.955     0.979
0.75       0.997  0.988  0.980  0.962  0.997  1.000   1.000     0.958
1          1.000  0.993  0.988  0.976  0.997  1.000   1.000     0.979

χ2(3), c
0.25       0.078  0.088  0.071  0.063  0.066  0.058   0.040     1.000
0.5        0.998  0.995  0.969  0.879  0.990  0.970   0.936     1.000
0.75       1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  0.999  1.000  1.000   1.000     0.987
Table 11: Size and power for T = 1000, vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.049  0.053  0.055  0.058  0.044  0.040   0.054     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.057  0.074  0.100  0.113  0.096  0.069   0.059     0.944
0.5        0.056  0.427  0.687  0.756  0.629  0.494   0.442     0.609
0.75       0.069  0.917  0.946  0.941  0.936  0.943   0.992     0.408
1          0.069  0.951  0.958  0.956  0.955  0.977   0.999     0.407

Log-Normal, c
0.25       0.165  0.256  0.280  0.247  0.177  0.129   0.090     0.910
0.5        0.992  0.980  0.962  0.936  0.991  0.982   0.976     0.749
0.75       0.999  0.990  0.980  0.962  0.999  1.000   1.000     0.984
1          0.995  0.990  0.977  0.958  0.993  1.000   1.000     0.990

χ2(3), c
0.25       0.069  0.115  0.131  0.128  0.086  0.073   0.057     0.999
0.5        0.997  0.999  0.990  0.951  0.995  0.986   0.959     0.896
0.75       1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
Table 12: Size and power for T = 1000, mean plus vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.051  0.054  0.063  0.064  0.055  0.047   0.038     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.049  0.078  0.100  0.109  0.075  0.065   0.047     1.000
0.5        0.055  0.469  0.694  0.746  0.613  0.481   0.400     0.978
0.75       0.047  0.909  0.942  0.936  0.947  0.929   0.987     0.835
1          0.046  0.946  0.956  0.950  0.955  0.955   0.998     0.708

Log-Normal, c
0.25       0.238  0.301  0.282  0.241  0.217  0.157   0.122     1.000
0.5        0.996  0.986  0.968  0.944  0.992  0.983   0.972     1.000
0.75       0.994  0.992  0.986  0.967  0.994  1.000   1.000     0.996
1          0.993  0.991  0.988  0.972  0.993  1.000   1.000     0.993

χ2(3), c
0.25       0.121  0.139  0.123  0.117  0.099  0.082   0.056     1.000
0.5        1.000  0.998  0.994  0.942  0.997  0.989   0.969     1.000
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
Table 13: Size and power for T = 1000, no shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.050  0.043  0.045  0.049  0.055  0.049   0.044     0.083
ARMA       0.049  0.103  0.118  0.117  0.116  0.118   0.118     0.091

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.045  0.046  0.045  0.059  0.050  0.050   0.035     0.066
0.5        0.058  0.349  0.596  0.668  0.530  0.373   0.325     0.209
0.75       0.065  0.880  0.927  0.923  0.918  0.923   0.990     0.387
1          0.062  0.936  0.951  0.948  0.947  0.961   1.000     0.385

Log-Normal, c
0.25       0.159  0.165  0.152  0.117  0.125  0.106   0.077     0.127
0.5        0.996  0.973  0.957  0.927  0.994  0.989   0.976     0.997
0.75       1.000  0.992  0.983  0.967  1.000  1.000   1.000     0.995
1          0.998  0.995  0.982  0.960  0.999  1.000   1.000     0.994

χ2(3), c
0.25       0.080  0.057  0.048  0.038  0.067  0.067   0.047     0.101
0.5        0.999  0.997  0.984  0.899  0.994  0.987   0.950     0.998
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
Table 14: Size and power for T = 1000, mean shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.044  0.046  0.045  0.040  0.042  0.043   0.035     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.042  0.048  0.061  0.064  0.054  0.054   0.053     0.989
0.5        0.054  0.294  0.558  0.664  0.513  0.387   0.319     0.937
0.75       0.048  0.859  0.940  0.939  0.948  0.930   0.978     0.723
1          0.047  0.927  0.950  0.950  0.968  0.965   0.999     0.337

Log-Normal, c
0.25       0.150  0.172  0.152  0.123  0.118  0.081   0.077     0.999
0.5        0.997  0.981  0.966  0.939  0.990  0.960   0.951     0.984
0.75       0.999  0.994  0.986  0.967  0.999  1.000   1.000     0.971
1          0.998  0.993  0.985  0.974  0.999  1.000   1.000     0.978

χ2(3), c
0.25       0.078  0.080  0.067  0.057  0.070  0.044   0.038     1.000
0.5        0.992  0.993  0.969  0.879  0.981  0.945   0.898     1.000
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     0.990
Table 15: Size and power for T = 1000, vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.046  0.044  0.061  0.073  0.059  0.049   0.041     0.997

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.059  0.081  0.113  0.122  0.088  0.073   0.053     0.944
0.5        0.048  0.449  0.731  0.785  0.673  0.536   0.445     0.601
0.75       0.063  0.899  0.937  0.932  0.931  0.942   0.991     0.414
1          0.077  0.944  0.957  0.955  0.959  0.959   0.999     0.429

Log-Normal, c
0.25       0.167  0.241  0.258  0.247  0.176  0.106   0.083     0.905
0.5        0.998  0.986  0.965  0.937  0.993  0.984   0.975     0.757
0.75       1.000  0.992  0.984  0.966  0.999  1.000   1.000     0.986
1          0.998  0.992  0.982  0.954  0.998  1.000   1.000     0.985

χ2(3), c
0.25       0.083  0.129  0.122  0.124  0.095  0.074   0.055     0.999
0.5        0.998  0.998  0.993  0.949  0.992  0.969   0.956     0.927
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
Table 16: Size and power for T = 1000, mean and vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.048  0.057  0.075  0.072  0.052  0.045   0.042     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.042  0.084  0.111  0.123  0.083  0.068   0.053     1.000
0.5        0.047  0.443  0.685  0.742  0.623  0.491   0.401     0.984
0.75       0.052  0.903  0.944  0.938  0.942  0.931   0.980     0.813
1          0.042  0.946  0.957  0.956  0.955  0.955   1.000     0.700

Log-Normal, c
0.25       0.226  0.291  0.289  0.239  0.193  0.143   0.115     1.000
0.5        0.995  0.975  0.964  0.930  0.992  0.983   0.977     1.000
0.75       0.999  0.993  0.984  0.971  1.000  1.000   1.000     0.996
1          0.995  0.989  0.978  0.966  0.995  1.000   1.000     0.994

χ2(3), c
0.25       0.131  0.157  0.135  0.099  0.106  0.077   0.067     1.000
0.5        0.994  0.994  0.988  0.942  0.988  0.978   0.956     1.000
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
6 Distributional assumptions in a stochastic volatility model

As an empirical application, we test a distributional implication in Stochastic Volatility (SV) models. The basic SV model with a normality assumption reads
$$y_t = \exp(h_t/2)\,\xi_t$$
for the returns $y_t$, with $\xi_t \stackrel{iid}{\sim} N(0,1)$ and the first-order autoregressive log-volatility process $h_t$:
$$h_t = \alpha + \beta h_{t-1} + \varepsilon_t,$$
with $\varepsilon_t \stackrel{iid}{\sim} N(0,\sigma^2)$. Hence, apart from a constant term, the log-squared returns can be expressed as
$$x_t = h_t + \log(\xi_t^2),$$
where $x_t = \log(y_t^2)$. A direct implication is that the logarithm of squared returns follows a $\log\chi^2(\nu)$-distribution. Its density is given as
$$f(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)} \exp\left(\frac{1}{2}\nu x - \frac{1}{2}\exp(x)\right),$$
where $\nu = 1$. For simplicity, maximum likelihood estimation procedures for such models often rely either on a normality assumption or on this particular $\log\chi^2(1)$ distribution. We use the newly proposed test statistics to test these assumptions on $x_t$ via the locally standardized probability integral transformation. Corresponding values for $\vartheta_k$ and $\varpi_k$ are simulated accordingly.
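The decomposition of the log-squared returns and the log-$\chi^2(1)$ implication can be checked numerically with a short simulation (a minimal sketch; the parameter values for $\alpha$, $\beta$ and $\sigma$ are hypothetical and not taken from the paper):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
T = 100_000
alpha, beta, sigma = -0.7, 0.95, 0.3   # hypothetical SV parameters

# log-volatility: h_t = alpha + beta * h_{t-1} + eps_t
h = np.empty(T)
h[0] = alpha / (1 - beta)              # start at the unconditional mean
eps = rng.normal(0.0, sigma, T)
for t in range(1, T):
    h[t] = alpha + beta * h[t - 1] + eps[t]

xi = rng.normal(0.0, 1.0, T)           # iid N(0,1) innovations
y = np.exp(h / 2) * xi                 # returns
x = np.log(y**2)                       # log squared returns

# the decomposition x_t = h_t + log(xi_t^2) holds exactly
assert np.allclose(x, h + np.log(xi**2))

# log(xi^2) is log-chi^2(1): its PIT via the chi^2(1) cdf of e^x is uniform
p = chi2.cdf(xi**2, df=1)
print(abs(p.mean() - 0.5) < 0.01)
```

The probability integral transform of a log-$\chi^2(1)$ variate is obtained by applying the $\chi^2(1)$ cdf to the exponentiated value, which is what the last step exploits.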
Daily data is taken from the Realized Library for three markets on three different continents, namely the DJIA, FTSE and Nikkei. The sample spans February 2, 2000 to November 27, 2018 and contains T = 4602 observations. A very few observations with returns amounting to exactly zero are removed from the data sets. The nominal significance level is 5%. The fixed-bandwidth parameter is set equal to b = 0.1, as the Monte Carlo simulation results suggest. A Bartlett kernel is employed for the fixed-b covariance matrix estimator. Regarding the nonparametric regression estimators for the mean and the variance, we employ the Nadaraya-Watson estimator with a Gaussian kernel. The bandwidth is selected via down-scaled (s = 3/4) cross-validation. Results for the log-χ2(1) distribution are reported in Table 17. (Results for the normal distribution are unreported, but clear-cut: all test statistics reject the null.)

Table 17: Empirical results for the log-χ2(1) distribution, rejections at the 5% level in bold face

       DJIA    FTSE    NIKKEI
m1     0.525   0.527   0.526
m2     0.352   0.352   0.352
m3     0.257   0.256   0.256
m4     0.197   0.195   0.195
t1    -1.959  -1.814  -1.197
t2     1.201   0.737   1.301
t3     2.790   2.196   2.679
t4     3.753   3.014   3.300

Figure 1: DJIA log squared returns. Top left: data plot with fitted nonparametric mean function; top right: kernel density plot; bottom left: locally standardized data; bottom right: histogram of the PIT.
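To fix ideas, the local standardization and the computation of raw PIT moments can be sketched as follows (a simplified illustration, not the authors' code: `nw_smooth` and `pit_raw_moments` are hypothetical helper names, and the standardized null cdf uses the log-χ2(1) mean ψ(1/2) + log 2 and variance π²/2):

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import chi2

def nw_smooth(y, bw):
    # Nadaraya-Watson regression of y on rescaled time t/T, Gaussian kernel
    T = len(y)
    u = np.arange(T) / T
    w = np.exp(-0.5 * ((u[:, None] - u[None, :]) / bw) ** 2)
    return (w @ y) / w.sum(axis=1)

def pit_raw_moments(x, bw=0.1, kmax=4):
    # locally standardize, then PIT under the standardized log-chi^2(1) null
    mu = nw_smooth(x, bw)
    sig = np.sqrt(nw_smooth((x - mu) ** 2, bw))
    z = (x - mu) / sig
    m = digamma(0.5) + np.log(2.0)          # mean of log-chi^2(1)
    s = np.sqrt(np.pi**2 / 2.0)             # its standard deviation
    p = chi2.cdf(np.exp(m + s * z), df=1)   # F(x) = F_{chi^2(1)}(e^x)
    return [np.mean(p**k) for k in range(1, kmax + 1)]

# under the null, the k-th raw moment should be close to 1/(k+1)
x = np.log(chi2.rvs(df=1, size=2000, random_state=42))
moms = pit_raw_moments(x)
print([round(mk, 2) for mk in moms])
```

The reported statistics then compare these raw moments with their theoretical values 1/(k+1), scaled by a fixed-b long-run variance estimate; that final studentization step is omitted in this sketch.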
As we observe, the assumption that log squared returns follow a log-chi-squared distribution with one degree of freedom has to be rejected at the five percent level by most of the tests. Even though the deviations of the raw moments of the PITs from their theoretical values under the null might be relatively small, the corresponding statistics indicate their significance in most of the cases. The apparent deviation opens room for improvement in the estimation of SV models, where efficiency gains in finite samples are to be expected from exploiting more reasonable distributional assumptions. In short, the normality assumptions on ξt or on xt directly are highly questionable.
Figures 1-3 display the tested series together with the nonparametric fitted mean (top left), the kernel density estimator (top right), the locally standardized series zt (bottom left) and the histogram of the PIT (bottom right). Clearly, there are important variations in the mean to be captured by the Nadaraya-Watson estimator for all the series. Moreover, the histograms suggest some degree of non-uniformity, which is detected by the newly proposed statistics.
Figure 2: FTSE log squared returns. Top left: data plot with fitted nonparametric mean function; top right: kernel density plot; bottom left: locally standardized data; bottom right: histogram of the PIT.
Figure 3: NIKKEI log squared returns. Top left: data plot with fitted nonparametric mean function; top right: kernel density plot; bottom left: locally standardized data; bottom right: histogram of the PIT.
7 Concluding remarks

This work considers the long-standing issue of testing distributional assumptions. The newly proposed tests are based on raw moment conditions for probability integral transformations. By doing so, we are able to construct tests which are more sensitive to deviations from theoretical values under the null. The framework we provide makes use of the so-called fixed-bandwidth approach for the estimation of long-run covariance matrices of different raw moments. As a result, the empirical size is well controlled even in small samples under different types of autocorrelation. Time-varying unconditional means and variances are found in many economic series. In order to cope with this typical empirical feature, our framework also allows for nonparametric time-varying mean and variance estimation. As both the mean and the variance function of the time series are estimated, we provide a necessary correction which amounts to a modified long-run variance estimation. Our simulation study demonstrates that the suggested tests perform very well in finite samples. In an empirical application to log squared returns from three major stock indices, we study the merits and limitations of the robust raw moment-based statistics.
Appendix

To simplify notation in the proofs, we let w.l.o.g. $\tau_\mu = \tau_\sigma = \tau$.

A Preliminary results

Lemma 2 Let $g, h$ be two functions such that $\mathrm{E}(g(z_t)) = \mathrm{E}(h(z_t)) = 0$ and $g(x) = O(x^2) = h(x)$ as $x \to \pm\infty$. Under the assumptions of Lemma 1, we have that

1. $\sup_{t=1,\dots,T} \left| \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right| = O_p\left(T^{-\delta}\right)$ for some $0 < \delta < 1/8$ and any bounded sequence $w_j$;

2. $\sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| = O_p\left(T^{-\delta}\right)$ and $\sup_{t=1,\dots,T} |\hat\sigma_t - \sigma_t| = O_p\left(T^{-\delta}\right)$ for some $0 < \delta < 1/8$;

3. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} h(z_j) \right) = o_p(1)$;

4. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = o_p(1)$;

5. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} (\hat z_t - z_t)^2 = o_p(1)$;

6. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \varphi(z_t)(\hat z_t - z_t) + \varphi'(\xi_t)(\hat z_t - z_t)^2 \right)^2 = o_p(1)$, where $\xi_t$ lies between $\hat z_t$ and $z_t$ for any $1 \le t \le T$;

7. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j}{\hat\sigma_t} z_j \right) = \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t + o_p(1)$;

8. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\, z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = -\frac{1}{2}\, \mathrm{E}\left(p_t^{k-1} z_t \varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left(z_t^2 - 1\right) + o_p(1)$,

where the $o_p(1)$ terms are uniform in $s \in [0,1]$.
B Proofs

Proof of item 1

We first show that $\frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j)$ is uniformly $L_4$-bounded in $t=1,\dots,T$. We have
$$\left\| \frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right\|_4^4 = \mathrm{E}\left( \frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right)^4 = \frac{1}{\tau^2} \sum_{j_1=t-\tau}^{t+\tau} \sum_{j_2=t-\tau}^{t+\tau} \sum_{j_3=t-\tau}^{t+\tau} \sum_{j_4=t-\tau}^{t+\tau} w_{j_1} w_{j_2} w_{j_3} w_{j_4}\, \mathrm{E}\left( g(z_{j_1}) g(z_{j_2}) g(z_{j_3}) g(z_{j_4}) \right).$$
Now, an upper bound is given by
$$\left\| \frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right\|_4^4 \le \frac{C}{\tau^2} \sum_{j_1=t-\tau}^{t+\tau} \sum_{j_2=t-\tau}^{t+\tau} \sum_{j_3=t-\tau}^{t+\tau} \sum_{j_4=t-\tau}^{t+\tau} \mathrm{E}\left( z_{j_1}^2 z_{j_2}^2 z_{j_3}^2 z_{j_4}^2 \right),$$
where the absolute summability of the 8th-order cumulants of $z_t$ leads with standard arguments to the finiteness of this upper bound.
Then, the maximum over $T$ elements of a positive, uniformly $L_4$-bounded sequence is known to be $O_p\left(T^{1/4}\right)$, so
$$\sup_{t=1,\dots,T} \left| \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right| = O_p\left( \frac{\sqrt[4]{T}}{\sqrt{\tau}} \right),$$
from which the desired result follows given the rate restrictions on $\tau$.
Proof of item 2

Let us examine the properties of $\hat\mu_t$ first. We have that
$$\hat\mu_t - \mu_t = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} x_j - \mu_t = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_j - \mu_t) + \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j.$$
Thanks to the Lipschitz property of $\mu(\cdot)$, the first summand on the r.h.s. is $O\left(\frac{\tau}{T}\right)$ uniformly in $t=1,\dots,T$. Item 1 can be used to derive the uniform behavior of the second summand, such that $\sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| = O_p\left(T^{-\delta}\right)$ for some $0<\delta<1/8$ as required. The local variance estimator is given by
$$\hat\sigma_t^2 = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (x_j - \hat\mu_j)^2 = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j z_j - (\hat\mu_j - \mu_j) \right)^2,$$
so
$$\hat\sigma_t^2 - \sigma_t^2 = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j^2 z_j^2 - \sigma_t^2 \right) - \frac{2}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\hat\mu_j - \mu_j) + \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\hat\mu_j - \mu_j)^2.$$
Now,
$$\sup_{t=1,\dots,T} \left| \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\hat\mu_j - \mu_j) \right| \le \sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| \sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} |\sigma_j z_j|,$$
where
$$0 \le \sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} |\sigma_j z_j| \le \mathrm{E}(|z_t|) \sup_{t=1,\dots,T} \sigma_t + \sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j \left( |z_j| - \mathrm{E}(|z_j|) \right) = O_p(1)$$
with the same arguments used in the proof of item 1. Furthermore, for all $1 \le t \le T$,
$$\frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\hat\mu_j - \mu_j)^2 \le \left( \sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| \right)^2 = o_p(1),$$
so, after using item 1 again to conclude that $\sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j^2 \left( z_j^2 - 1 \right) = O_p\left(T^{-\delta}\right)$ for some $0<\delta<1/8$, we have that
$$\frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j^2 z_j^2 - \sigma_t^2 \right) = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j^2 \left( z_j^2 - 1 \right) + \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j^2 - \sigma_t^2 \right) = O_p\left(T^{-\delta}\right) + O\left( \frac{\tau}{T} \right)$$
uniformly in $t=1,\dots,T$, and thus $\sup_{t=1,\dots,T} \left| \hat\sigma_t^2 - \sigma_t^2 \right| = O_p\left(T^{-\delta}\right)$ as well.
Note that uniform consistency of $\hat\sigma_t$ implies, thanks to the properties of $\sigma_t$, $\sup_{t=1,\dots,T} \hat\sigma_t = O_p(1)$ and $\sup_{t=1,\dots,T} \hat\sigma_t^{-1} = O_p(1)$.
Proof of item 3

Split the sample in $B$ disjoint blocks of length $M$ and assume that $T = MB$ and $[sT] = M[sB]$ for the sake of the exposition. Then
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} h(z_j) \right) = \frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \left( \sum_{j=M(b-1)+m-\tau}^{M(b-1)+m+\tau} h(z_j) - \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right) + \frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j).$$
The first summand on the r.h.s. is easily shown to be $O_p\left( \frac{\sqrt{TM}}{\tau} \sup_{t=1,\dots,T} |h(z_t)| \right)$. For the second, note that
$$\left| \frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right| \le \frac{1}{\sqrt{T}} \sum_{b=1}^{B} \left| \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right) \right|.$$
The expectation of the r.h.s. is bounded via the Cauchy-Schwarz inequality by
$$\frac{1}{\sqrt{T}} \sum_{b=1}^{B} \mathrm{E}\left| \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right) \right| \le \frac{1}{\sqrt{T}} \sum_{b=1}^{B} \sqrt{ \mathrm{E}\left( \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right)^2 \mathrm{E}\left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right)^2 },$$
where
$$\mathrm{E}\left( \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right)^2 = O\left( \frac{1}{\tau} \right) \quad \text{and} \quad \mathrm{E}\left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right)^2 = O(M).$$
Hence
$$\frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) = O_p\left( \frac{B\sqrt{M}}{\sqrt{\tau T}} \right) = O_p\left( \sqrt{\frac{B}{\tau}} \right)$$
and
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} h(z_j) \right) = O_p\left( \max\left\{ \frac{\sqrt{M}}{\tau} T^{1/2+\gamma};\ \sqrt{\frac{B}{\tau}} \right\} \right),$$
so choosing $B$ appropriately leads to the desired result.
Proof of item 4

Use a Taylor series expansion for $x^{-1/2}$ about $x_0 = 1$ with rest term in differential form to obtain
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = -\frac{1}{2} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right) + \frac{3}{8} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t)\, \xi_t^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = A_{1T} + A_{2T}$$
with $\xi_t$ between $\frac{\hat\sigma_t^2}{\sigma_t^2}$ and unity for all $t=1,\dots,T$. Now, for $A_{1T}$, write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{1}{\sigma_t^2} \left( \sigma_j z_j + (\mu_j - \hat\mu_j) \right)^2 - 1 \right)$$
$$= \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j^2 z_j^2 - \sigma_t^2}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{(\mu_j - \hat\mu_j)^2}{\sigma_t^2} + \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\mu_j - \hat\mu_j) = B_{1T} + B_{2T} + B_{3T}.$$
For $B_{1T}$, we have
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j^2 z_j^2 - \sigma_t^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( z_j^2 - 1 \right) + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right)}{\sigma_t^2} z_j^2,$$
where the first summand on the r.h.s. vanishes thanks to item 3, while for the second we employ a Taylor series approximation of $\sigma^2(\cdot)$ about $t/T$ to obtain
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right) z_j^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{z_j^2 - 1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial^2 \sigma^2}{\partial s^2} \right|_{s=\xi_{t,j}} \frac{(j-t)^2}{T^2} \frac{z_j^2}{\sigma_t^2} = C_{1T} + C_{2T} + C_{3T}$$
for suitable $\xi_{t,j}$ between $t/T$ and $j/T$. Here, $C_{1T}$ vanishes along the lines of item 3 by noting that deterministic weights don't affect the result, $C_{2T} = 0$, and
$$|C_{3T}| \le \frac{sC\tau^2}{T\sqrt{T}} \sup_{t=1,\dots,T} |g(z_t)| \sup_{t=1,\dots,T} z_t^2;$$
this is seen to vanish too, uniformly in $s \in [0,1]$, since, given the finiteness of moments of any order of $z_t$ and thus of $z_t^2$ and $g(z_t)$, we have $\sup_t |g(z_t)| = O_p(T^\gamma) = \sup_{t=1,\dots,T} z_t^2$ for any $\gamma > 0$, and $\gamma$ can then be chosen arbitrarily close to 0 to make the r.h.s. $o_p(1)$.
For $B_{2T}$, we have
$$\left| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_j - \hat\mu_j)^2 \right| \le C \sup_t |g(z_t)| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_j - \hat\mu_j)^2 \le C \sup_t |g(z_t)| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\hat\mu_t - \mu_t)^2 + o_p(1)$$
with $|g(z_t)| = O_p(T^\gamma)$ for any $\gamma > 0$. We show in the following that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\hat\mu_t - \mu_t)^2 = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( (\mu_t - \mu_j) - \sigma_j z_j \right) \right)^2 = O_p\left(T^{-\delta}\right) \qquad (10)$$
for some $0 < \delta < \min\{\kappa_1 - 1/2;\ 3/4 - \kappa_2\}$, and simply pick $\gamma < \delta$ for our purposes. With the help of the Cauchy-Schwarz inequality, the term is easily seen to vanish when the terms $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_t - \mu_j) \right)^2$ and $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \right)^2$ both vanish themselves. This is indeed the case under our rate restrictions considering that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_t - \mu_j) \right)^2 = O\left( \frac{\sqrt{T}\tau^2}{T^2} \right)$$
and
$$\mathrm{E}\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \right)^2 \right) \le \sqrt{T}\, \mathrm{E}\left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \right)^2 = C \frac{\sqrt{T}}{\tau}$$
thanks to the uniform $L_4$-boundedness of normalized running averages of $z_t$, see the proof of item 1; thus, $B_{2T}$ vanishes at the required rate.
Moving on, we have
$$B_{3T} = \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \left( \mu_j - \frac{1}{2\tau+1} \sum_{k=j-\tau}^{j+\tau} \sigma_k z_k - \frac{1}{2\tau+1} \sum_{k=j-\tau}^{j+\tau} \mu_k \right)$$
$$= -\frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{\sigma_t^2} \frac{1}{(2\tau+1)^2} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \sum_{k=j-\tau}^{j+\tau} (\mu_k - \mu_j) - \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{\sigma_t^2} \frac{1}{(2\tau+1)^2} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \sum_{k=j-\tau}^{j+\tau} \sigma_k z_k,$$
where the first summand on the r.h.s. is immediately shown to vanish thanks to item 3 after noting that deterministic weights of uniform order $O(\tau/T)$ do not affect the arguments there. A tedious, yet straightforward application of the blocking arguments from the proof of item 3 shows the second summand to vanish in probability as well.
Summing up, $\sup_{s\in[0,1]} |A_{1T}| \stackrel{p}{\to} 0$; to complete the result, note that
$$0 \le \sup_{s\in[0,1]} |A_{2T}| \le C \sup_{t=1,\dots,T} \left| \xi_t^{-5/2} \right| \sup_{t=1,\dots,T} |g(z_t)| \sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2,$$
where the r.h.s., and thus $A_{2T}$, vanishes since $\sup_{t=1,\dots,T} \left| \xi_t^{-5/2} \right| = O_p(1)$, $\sup_{t=1,\dots,T} |g(z_t)| = O_p(T^\gamma)$ for any positive $\gamma$ and $\sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = O_p\left(T^{-\delta}\right)$, analogously to Equation (10), so the result follows after choosing $\gamma < \delta$.
Proof of item 5

We have that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right)^2 = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) + \frac{\mu_t - \hat\mu_t}{\hat\sigma_t} \right)^2$$
$$= \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\hat\sigma_t^2} (\mu_t - \hat\mu_t)^2 + \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) \frac{1}{\hat\sigma_t} (\mu_t - \hat\mu_t).$$
Noting that $\sup_{t=1,\dots,T} \hat\sigma_t^{-1}$ is bounded in probability, an application of the Cauchy-Schwarz inequality for the third summand on the r.h.s. shows that the result follows when the two terms $\sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2$ and $\sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\hat\sigma_t^2} (\mu_t - \hat\mu_t)^2$ vanish in probability. To show this, we have like in the proof of item 4 that
$$\sup_{s\in[0,1]} \left| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 \right| \le \sup_{t=1,\dots,T} z_t^2 \sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 = \sup_{t=1,\dots,T} z_t^2 \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 = o_p(1)$$
since $\sup_{t=1,\dots,T} z_t^2 = O_p(T^\gamma)$ and $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 = O_p\left(T^{-\delta}\right)$, where $\gamma < \delta$ may be picked, and similarly
$$0 \le \sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\hat\sigma_t^2} (\mu_t - \hat\mu_t)^2 \le \sup_{t=1,\dots,T} \frac{1}{\hat\sigma_t^2} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\mu_t - \hat\mu_t)^2 = o_p(1),$$
since $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\mu_t - \hat\mu_t)^2 = o_p(1)$, again like in the proof of item 4.
Proof of item 6

Note that
$$r_t = \left( \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right) \left( \varphi(z_t) + \varphi'(\xi_t) \left( \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right) \right),$$
where $\varphi$ and $\varphi'$ are bounded. The result follows with item 5 if $\sup_t \left| \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right| = O_p(1)$. This is indeed the case, since
$$\frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t = \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) z_t + \frac{\mu_t - \hat\mu_t}{\hat\sigma_t},$$
where $\hat\mu_t$ and $\hat\sigma_t$ converge uniformly at some rate $O_p\left(T^{-\delta}\right)$, see item 2, and $\sup_t |z_t| = O_p(T^\gamma)$ for any $\gamma > 0$, such that choosing $\gamma < \delta$ leads to the desired result.
Proof of item 7

Begin by writing
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j}{\hat\sigma_t} z_j = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\sigma_j}{\hat\sigma_t} - 1 \right) z_j + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} z_j = A_{1T} + A_{2T}.$$
We now show the first summand to vanish and resort to this end to the Taylor series approximation of $x^{-1/2}$ employed in the proof of item 4 to obtain analogously
$$A_{1T} = -\frac{1}{2} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j + \frac{3}{8} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \xi_{t,j}^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2 z_j,$$
where $\xi_{t,j}$ lies between $\frac{\hat\sigma_t^2}{\sigma_j^2}$ and unity for all $t=1,\dots,T$, being hence uniformly bounded. The first summand of $A_{1T}$ can be shown to be negligible by writing
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( p_t^{k-1}\varphi(z_t) - \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \right) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j + \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j,$$
and noting that arguments analogous to those in the proof of item 3 apply.
For the second summand of $A_{1T}$, with $\varphi(\cdot)$ being bounded on $\mathbb{R}$, we have
$$0 \le \sup_{s\in[0,1]} \left| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \xi_{t,j}^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2 z_j \right| \le \max_{x\in\mathbb{R}} \varphi(x) \sup_t |z_t| \sup_{t,j} \left( \xi_{t,j}^{-5/2} \right) \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2,$$
with $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2$ vanishing like in the proof of item 4 and $\sup_t |z_t| = O_p(T^\gamma)$ for positive $\gamma$ arbitrarily close to zero.
To complete the result, write
$$A_{2T} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( p_t^{k-1}\varphi(z_t) - \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \right) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} z_j + \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} z_j,$$
where the first summand on the r.h.s. vanishes thanks to item 3, while the second delivers the desired approximation upon re-arranging its sum elements.
Proof of item 8

Write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\, z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( p_t^{k-1}\varphi(z_t) z_t - \mathrm{E}\left(p_t^{k-1}\varphi(z_t) z_t\right) \right) \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) + \mathrm{E}\left(p_t^{k-1}\varphi(z_t) z_t\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right).$$
The first summand on the r.h.s. vanishes, see item 4, and, with the same Taylor series expansion of $x^{-1/2}$ employed there, we have for the second summand that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = -\frac{1}{2} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right) + \frac{3}{8} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \xi_t^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = A_{1T} + A_{2T},$$
with $\xi_t$ lying between $\frac{\hat\sigma_t^2}{\sigma_t^2}$ and unity for all $t=1,\dots,T$. The arguments in the proof of item 4 apply directly, with the exception of the analogues of $B_{1T}$ and $B_{3T}$. For the analogue of $B_{1T}$ from the proof of item 4 we write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j^2 z_j^2 - \sigma_t^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( z_j^2 - 1 \right) + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right) z_j^2}{\sigma_t^2},$$
where the summands of the first term on the r.h.s. are re-arranged to give the desired approximation, and the second term is given, similarly to the proof of item 4, by
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right) z_j^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{z_j^2 - 1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial^2 \sigma^2}{\partial s^2} \right|_{s=\xi_{t,j}} \frac{(j-t)^2}{T^2} \frac{z_j^2}{\sigma_t^2} = C_{1T} + C_{2T} + C_{3T}$$
for suitable $\xi_{t,j}$ between $t/T$ and $j/T$. To analyze $C_{1T}$, re-arrange sum terms to obtain
$$C_{1T} = \frac{C}{\sqrt{T}} \frac{1}{(2\tau+1)T} \sum_{t=1}^{[sT]} \left( z_t^2 - 1 \right) \left( \tau(\tau+1) + O_p\left(\tau^2\right) \right) = o_p(1)$$
uniformly in $s \in [0,1]$,
$$C_{2T} = 0,$$
and, for all $s \in [0,1]$,
$$0 \le C_{3T} \le \frac{C\tau^2}{T^2\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \le \frac{C\tau^2}{T^2\sqrt{T}} \sum_{t=1}^{T} z_t^2 = O_p\left( \frac{\tau^2}{T\sqrt{T}} \right) = o_p(1).$$
For the analogue of $B_{3T}$ from the proof of item 4, we re-arrange sum terms to obtain
$$\frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\mu_j - \hat\mu_j) = \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t (\mu_t - \hat\mu_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{1}{\sigma_j^2} + o_p(1).$$
To complete the result, write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t (\mu_t - \hat\mu_t) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_t - \mu_j) - \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j,$$
where both summands on the r.h.s. can be shown to vanish uniformly in $s$ using e.g. item 3.
Proof of Lemma 1
We let w.l.o.g. $\tau_\mu=\tau_\sigma=\tau$. Write with a Taylor expansion
\[
\hat p_t = p_t + \varphi(z_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right) + \varphi'(\xi_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right)^2 = p_t + r_t,
\]
where $\xi_t$ lies between $\frac{x_t-\hat\mu_t}{\hat\sigma_t}=\hat z_t$ and $\frac{x_t-\mu_t}{\sigma_t}=z_t$; note that $\varphi'(\cdot)$ is bounded on $\mathbb{R}$. Then,
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t^k-\frac{1}{k+1}\right) + \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1} r_t + \frac{k(k-1)}{2\sqrt{T}}\sum_{t=1}^{[sT]}\bar p_t^{k-2} r_t^2,
\]
where $\bar p_t$ lies between $p_t$ and $\hat p_t$. Since $\bar p_t\in[0,1]$ $\forall t$, like $p_t$ and $\hat p_t$, we have that
\[
0 \le \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\bar p_t^{k-2} r_t^2 \le \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} r_t^2 \overset{p}{\to} 0
\]
uniformly in $s$, thanks to Lemma 2 item 6.
We may then focus on
\[
\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1} r_t = \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right) + \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi'(\xi_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right)^2,
\]
where the second summand vanishes uniformly in $s$ since
\[
\left|\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi'(\xi_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right)^2\right| \le \frac{C}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat z_t-z_t\right)^2
\]
due to the boundedness of $\varphi'$ and $p_t$, and Lemma 2 item 5 applies. Now,
\[
\hat z_t - z_t = \frac{\sigma_t z_t+\mu_t-\hat\mu_t}{\hat\sigma_t} - z_t = z_t\left(\frac{\sigma_t}{\hat\sigma_t}-1\right) - \frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\frac{\sigma_j}{\hat\sigma_t}\, z_j + \frac{1}{\hat\sigma_t}\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left(\mu_t-\mu_j\right),
\]
such that the leading term of $\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1} r_t$ is given by
\[
\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\left(\hat z_t-z_t\right) = \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\, z_t\left(\frac{\sigma_t}{\hat\sigma_t}-1\right) - \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\frac{\sigma_j}{\hat\sigma_t}\, z_j + \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{p_t^{k-1}\varphi(z_t)}{\hat\sigma_t}\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left(\mu_t-\mu_j\right),
\]
where
\[
\left|\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{1}{\hat\sigma_t}\, p_t^{k-1}\varphi(z_t)\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left(\mu_t-\mu_j\right)\right| \le \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{1}{\hat\sigma_t}\, p_t^{k-1}\varphi(z_t)\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left|\mu_t-\mu_j\right| = O_p\!\left(\frac{\tau}{\sqrt{T}}\right),
\]
with $p_t$ and $\varphi(z_t)$ being bounded and positive, and $\sup_t \hat\sigma_t^{-1}$ bounded in probability and non-negative.
Using Lemma 2 again, items 7 and 8, we obtain uniformly in s ∈ [0, 1] that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t^k-\frac{1}{k+1}\right) - k\,\mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} z_t - \frac{k}{2}\,\mathrm{E}\left(p_t^{k-1} z_t\varphi(z_t)\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(z_t^2-1\right) + o_p(1),
\]
as required for the result, which follows with a multivariate invariance principle for strongly mixing sequences (see e.g. Davidson, 1994, Chapter 29).
Proof of Proposition 1
To simplify notation we provide the arguments for $t_k$ only; the extension for $K>1$ is trivial. The arguments in the proof of Theorem 2 in Kiefer and Vogelsang (2005) indicate that
\[
t_k = \frac{\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\hat p_t^k-\frac{1}{k+1}\right)}{\sqrt{-\frac{1}{T^2}\sum_{i=1}^{T-1}\sum_{j=1}^{T-1}\frac{T^2}{B^2}\,\kappa''\!\left(\frac{i-j}{B}\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{i}\left(\hat p_t^k-\overline{\hat p^k}\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{j}\left(\hat p_t^k-\overline{\hat p^k}\right)}} + o_p(1)
\]
for kernels with smooth derivatives, or
\[
t_k = \frac{\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\hat p_t^k-\frac{1}{k+1}\right)}{\sqrt{\frac{2}{bT}\sum_{i=1}^{T}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{i}\left(\hat p_t^k-\overline{\hat p^k}\right)\right)^2-\frac{2}{bT}\sum_{i=1}^{[(1-b)T]}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{i}\left(\hat p_t^k-\overline{\hat p^k}\right)\right)\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{i+[bT]}\left(\hat p_t^k-\overline{\hat p^k}\right)\right)}} + o_p(1)
\]
for the Bartlett kernel. From Lemma 1, we know that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1}W_1(s) - \frac{k}{2}\varpi_{k-1}W_2(s),
\]
where the process on the r.h.s. is a Brownian motion, say $\tilde B(s)$, with variance
\[
\omega_k^2 = \left(e_k',\,-k\vartheta_{k-1},\,-\frac{k}{2}\varpi_{k-1}\right)\Xi\left(e_k',\,-k\vartheta_{k-1},\,-\frac{k}{2}\varpi_{k-1}\right)',
\]
where $e_k$ is the $k$th column of the $K\times K$ identity matrix. Nonsingularity of $\Xi$ ensures that $\omega_k^2>0$, which then cancels out, so the continuous mapping theorem [CMT] then establishes the desired limiting null distribution.
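For illustration, the Bartlett-kernel form of $t_k$ can be computed directly from the partial sums of the estimated PITs. The following is a minimal sketch with our own (hypothetical) helper name, not code from the paper:

```python
import math

def tk_bartlett(p, k=1, b=0.1):
    """Fixed-b t-statistic for the kth raw PIT moment, Bartlett kernel (sketch).
    p: sequence of (estimated) PITs; under the null E[p^k] = 1/(k+1)."""
    T = len(p)
    pk = [x ** k for x in p]
    mean_pk = sum(pk) / T
    # numerator: scaled deviation of the kth sample moment from 1/(k+1)
    num = sum(x - 1.0 / (k + 1) for x in pk) / math.sqrt(T)
    # partial-sum process S_i = T^{-1/2} * sum_{t<=i} (p_t^k - mean)
    S, acc = [], 0.0
    for x in pk:
        acc += x - mean_pk
        S.append(acc / math.sqrt(T))
    m = int(b * T)
    # fixed-b Bartlett variance estimate built from the partial sums
    var = (2.0 / (b * T)) * sum(s * s for s in S)
    var -= (2.0 / (b * T)) * sum(S[i + m] * S[i] for i in range(T - m))
    return num / math.sqrt(var)
```

The denominator mirrors the Bartlett expression displayed above; the quadratic form is nonnegative by construction of the Bartlett kernel.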
C More on parametric mean adjustment
Since this section only serves the purpose of illustrating the influence the specific choice of model has on the feasible PITs $\hat p_t$, we treat $\sigma_t$ as known and set it to unity; similar effects are expected if $\sigma_t$ is to be modeled as well. Concretely, consider a parametric model for the mean of the observed time series $x_t$ such that
\[
x_t = \mu(t/T,\theta) + \sigma_t z_t.
\]
Note that normalizing the time is not restrictive, since one may e.g. redefine a classical linear trend model $\mu_t = \theta_1+\theta_2 t$ as $\mu_t = \theta_1 + (T\theta_2)\, t/T$ without loss of generality. We take the mean component to satisfy the following requirements.
Assumption 3 Let $\mu(s,\theta)$ have uniformly continuous 2nd order partial derivatives. The first and second order partial derivatives w.r.t. $\theta$ are weakly bounded uniformly in $s$, in the sense that there exists a nondecreasing function $f$ such that $\max\left\{\left\|\frac{\partial\mu(s,\theta)}{\partial\theta}\right\|;\left\|\frac{\partial^2\mu(s,\theta)}{\partial\theta\partial\theta'}\right\|\right\}\le f(\|\theta\|)$ for all $s\in[0,1]$.
This assumption allows for polynomial trend models, $\mu(s,\theta)=\sum_{j=1}^{p+1} s^{j-1}\theta_j$, for breaks in the mean, $\mu(s,\theta)=\theta_1+\theta_2 I(s\ge\tau)$, for smooth mean changes, e.g. $\mu(s,\theta)=\frac{1}{1+\exp(\theta_3(s-\theta_4))}\theta_1+\frac{\exp(\theta_3(s-\theta_4))}{1+\exp(\theta_3(s-\theta_4))}\theta_2$, or for $\mu(s,\theta)=\theta_1+\sum_{j=1}^{p}\left(\theta_{2j}\sin 2\pi js+\theta_{2j+1}\cos 2\pi js\right)$ motivated by approximations via Fourier sums.
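For concreteness, these specifications are easy to code as functions of rescaled time $s\in[0,1]$; the sketch below uses our own (hypothetical) function names, not identifiers from the paper:

```python
import math

def mu_poly(s, theta):
    """Polynomial trend: sum_j theta_j * s^(j-1), theta = [theta_1, ..., theta_{p+1}]."""
    return sum(th * s ** j for j, th in enumerate(theta))

def mu_break(s, theta1, theta2, tau):
    """Break in the mean at fraction tau: theta1 + theta2 * I(s >= tau)."""
    return theta1 + (theta2 if s >= tau else 0.0)

def mu_logistic(s, theta1, theta2, theta3, theta4):
    """Smooth (logistic) transition between theta1 and theta2."""
    w = 1.0 / (1.0 + math.exp(theta3 * (s - theta4)))
    return w * theta1 + (1.0 - w) * theta2

def mu_fourier(s, theta1, pairs):
    """Fourier sum: theta1 + sum_j (a_j sin(2 pi j s) + b_j cos(2 pi j s))."""
    return theta1 + sum(a * math.sin(2 * math.pi * j * s) + b * math.cos(2 * math.pi * j * s)
                        for j, (a, b) in enumerate(pairs, start=1))
```

Each function satisfies the smoothness/boundedness requirements of Assumption 3 except the break specification, which the text nevertheless lists as an allowed case.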
Based on this model, one obtains
\[
\hat p_t = \Phi(\hat z_t) = \Phi\!\left(x_t-\mu\big(t/T,\hat\theta\big)\right)
\]
by plugging in an estimator $\hat\theta$ which is taken to be $\sqrt{T}$-consistent. The straightforward choice is the NLS estimator, which we employ in the following; some of the requirements of Assumption 3, e.g. referring to the Hessian of $\mu$, help establish the limiting behavior of the NLS estimator. Irrespective of what estimator is used, we note that
\[
\hat p_t = \Phi\!\left(z_t-\left(\mu\big(t/T,\hat\theta\big)-\mu(t/T,\theta)\right)\right) \tag{11}
\]
such that the estimation has an effect. The following Lemma provides the precise result when $x_t$ is parametrically adjusted for a nonzero mean.
Lemma 3 Under Assumptions 1 through 3, it holds as $T\to\infty$ that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s)-k\vartheta_{k-1}\,\delta'(s,\theta)\,\Theta(1) \tag{12}
\]
where $\Theta(1)=\left(\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\frac{\partial\mu(s,\theta)}{\partial\theta}'\,\mathrm{d}s\right)^{-1}\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\,\mathrm{d}W_1(s)$, $\delta(s,\theta)=\int_0^s\frac{\partial\mu(r,\theta)}{\partial\theta}\,\mathrm{d}r$ and $\vartheta_k=\mathrm{E}\left(p_t^k\varphi(z_t)\right)$ as before.
Proof of Lemma 3
Begin by discussing the limiting behavior of the NLS estimator $\hat\theta$. We have under Assumptions 1 and 3 that
\[
\sqrt{T}\left(\hat\theta-\theta\right) \Rightarrow \left(\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\frac{\partial\mu(s,\theta)}{\partial\theta}'\,\mathrm{d}s\right)^{-1}\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\,\mathrm{d}W_1(s);
\]
this is a standard application of extremum estimator theory and we omit the details.
With the application of the mean value theorem when $k=1$ (or a Taylor series expansion with rest term in differential form) we obtain
\[
\hat p_t = p_t + \varphi(z_t)\left(\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big)\right) + \varphi'(\xi_t)\left(\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big)\right)^2
\]
where $\xi_t$ lies between $z_t$ and $z_t-\mu\big(t/T,\hat\theta\big)+\mu(t/T,\theta)$ for each $t$. The exact values for $\xi_t$ do not matter since $\varphi'$ is bounded. A second expansion, here about $\theta$, is required for the trend function $\mu$:
\[
\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big) = -\frac{\partial\mu(t/T,\theta)}{\partial\theta}'\left(\hat\theta-\theta\right) - \left(\hat\theta-\theta\right)'\left.\frac{\partial^2\mu(t/T,\theta)}{\partial\theta\partial\theta'}\right|_{\theta=\bar\vartheta_t}\left(\hat\theta-\theta\right)
\]
again with $\bar\vartheta_t$ between $\theta$ and $\hat\theta$ (note that since $t$ is an argument of $\mu$, $\bar\vartheta$ also depends on $t$, hence the notation). Putting the two together we obtain
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t-\frac{1}{2}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t-\frac{1}{2}\right) - \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi(z_t)\frac{\partial\mu(t/T,\theta)}{\partial\theta}'\left(\hat\theta-\theta\right) - \left(\hat\theta-\theta\right)'\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi(z_t)\left.\frac{\partial^2\mu(t/T,\theta)}{\partial\theta\partial\theta'}\right|_{\theta=\bar\vartheta_t}\left(\hat\theta-\theta\right) + R_{s,T},
\]
where $R_{s,T}$ is just the normalized partial sum of $\varphi'(\xi_t)\left(\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big)\right)^2$.
Examining the third summand on the r.h.s., we note that the boundedness of $\varphi$ and the fact that $\left\|\left.\frac{\partial^2\mu(t/T,\theta)}{\partial\theta\partial\theta'}\right|_{\theta=\bar\vartheta_t}\right\| \le f(\|\bar\vartheta_t\|) \le f\big(\max\{\|\theta\|;\|\hat\theta\|\}\big)$ make the partial sums of order $O_p(T)$, but $\hat\theta-\theta=O_p\big(T^{-1/2}\big)$ and the normalization with $\sqrt{T}$ make the entire summand vanish.
For the fourth summand, $R_{s,T}$, we have with a first-order Taylor expansion, $\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big) = -\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}'\left(\hat\theta-\theta\right)$ with $\bar\vartheta_t$ between $\theta$ and $\hat\theta$ for each $t$, that
\[
R_{s,T} = \left(\hat\theta-\theta\right)'\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi'(\xi_t)\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}'\left(\hat\theta-\theta\right).
\]
Since, similarly, $\varphi'$ is bounded and $\left\|\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}\right\| \le f(\|\bar\vartheta_t\|) \le f\big(\max\{\|\theta\|;\|\hat\theta\|\}\big)$ for all $t$, it follows that $\sup_s |R_{s,T}| = O_p\big(T^{-1/2}\big)$.
Summing up, we are left with the first two summands,
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t-\frac{1}{2}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t-\frac{1}{2}\right) - \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi(z_t)\frac{\partial\mu(t/T,\theta)}{\partial\theta}'\left(\hat\theta-\theta\right) + o_p(1);
\]
the same arguments show that analogous relations hold for $\hat p_t^k$. With $\sqrt{T}\big(\hat\theta-\theta\big)\Rightarrow\Theta(1)$ and
\[
\frac{1}{T}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\frac{\partial\mu(t/T,\theta)}{\partial\theta} \Rightarrow \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right)\int_0^s\frac{\partial\mu(r,\theta)}{\partial\theta}\,\mathrm{d}r = \vartheta_{k-1}\,\delta(s,\theta),
\]
the desired result follows.
Remark 6 Bai and Ng (2005) show in their Theorem 5 that regressing $x_t$ on a set of regressors has no effect on the limiting distributions beyond that of the intercept. There is no contradiction between their result and our Lemma 3, since the result in (12) applies in the case where the regressors are deterministic. For a comparison with Theorem 5 in Bai and Ng (2005), take one stochastic regressor and a linear model $x_t=\theta w_t$ such that $\frac{\partial\mu(t/T,\theta)}{\partial\theta}=w_t$. We obtain for stationary regressors that $\frac{1}{T}\sum_{t=1}^{[sT]}\varphi(z_t)w_t \Rightarrow s\,\mathrm{E}(\varphi(z_t)w_t)$. Now, Bai and Ng (2005) assume that an intercept is always present in the regression, which is equivalent to setting $\mathrm{E}(w_t)=0$; they also assume the regressors to be independent of $z_t$, hence $\mathrm{E}(\varphi(z_t)w_t)=0$ and the corresponding limit term vanishes. This is not the case when $w_t$ is deterministic, say an intercept or a trend, and the limiting distribution of $\hat\theta$ needs to be taken into account.
Clearly, the estimation effect described by Equation (12) will affect the limiting fixed-$b$ distribution of a statistic based on a parametrically estimated standardization. The effect is different from that derived in Lemma 1, since the presence of $\Theta(1)$ (as opposed to $W_1(s)$) indicates a bridge-type behavior of the limit process of the relevant partial sums. Moreover, the components $\Theta$ and $\delta$ depend on the specific model $\mu$ chosen. The statistics can be made pivotal, see below, but the limiting distributions are not the usual fixed-$b$ ones, except in the case of an intercept. The bottom line is that different deterministic components will lead to different distributions (with the exception of the small-$b$ case, where $\chi^2$ asymptotics may be recovered for all consistent choices of HAC covariance matrix estimator). This implies the need to simulate the distributions for each specific type of deterministic component accounted for in the data. While this can be done in advance for some popular combinations (see below for the case of intercept and trend, where the generalized Brownian bridge plays a role; cf. MacNeill, 1978), one solution for a generic mean function $\mu$ is to resort to some form of bootstrap. Since $z_t$ is strictly stationary and mixing, the residual-based iid or wild bootstrap is likely valid, but we do not pursue the topic here.
We now illustrate concretely the difference between nonparametric and parametric mean adjustment for the case of a linear trend. Considering constant variance for simplicity, we have the following procedure, simplified by the linearity of the mean function. Detrend $x_t$ using OLS regression and standardize the detrended series with $\hat\sigma_t$ to obtain $\hat z_t$. With $\big(\hat p_t,\dots,\hat p_t^K,\hat z_t\big)'$, compute as in the mean case a fixed-$b$ estimate of the long-run covariance matrix of $\big(\hat p_t,\dots,\hat p_t^K,\hat z_t\big)'$, say $\hat\Gamma$, and, based on it, the scaling matrix $\hat\Omega = V\hat\Gamma V'$ with $V$ as before, and then $\mathcal{T}_K$ from (9). Then,
Proposition 2 Under Assumptions 1 and 2, it holds as $T\to\infty$ that
\[
\mathcal{T}_K \Rightarrow W_K'(1)\, Q_{K,b,\kappa}^{-1}\, W_K(1),
\]
with
\[
Q_{K,b,\kappa} = -\int_0^1\int_0^1\frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)V(r)V'(s)\,\mathrm{d}r\,\mathrm{d}s
\]
for smooth kernels and
\[
Q_{K,b,\kappa} = \frac{2}{b}\int_0^1 V(r)V'(r)\,\mathrm{d}r - \frac{1}{b}\int_0^{1-b} V(r+b)V'(r)\,\mathrm{d}r - \frac{1}{b}\int_0^{1-b} V(r)V'(r+b)\,\mathrm{d}r
\]
for the Bartlett kernel, where $V(s)$ is, for demeaning, the first-order Brownian bridge
\[
V(s) = W_K(s) - sW_K(1)
\]
with $W_K$ a vector of independent standard Wiener processes; for detrending, $V(s)$ is the second-level Brownian bridge
\[
V(s) = W_K(s) + (2s-3s^2)W_K(1) - 6s(1-s)\int_0^1 W_K(r)\,\mathrm{d}r.
\]
Proof of Proposition 2
To deal with demeaning, let $\mu=\theta_1$ in Lemma 3 to obtain
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1}\, s\, W_1(1).
\]
We then need to examine the limiting behavior of the suitably normalized partial sums of $\hat z_t$. To this end, note that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\frac{\sigma_t}{\hat\sigma_t}-1\right)(z_t-\bar z) = o_p(1)
\]
uniformly in $s$ thanks to the arguments used in the proof of Lemma 1. Then,
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\hat z_t = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{\sigma_t}{\hat\sigma_t}(z_t-\bar z) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}(z_t-\bar z) + \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\frac{\sigma_t}{\hat\sigma_t}-1\right)(z_t-\bar z) \Rightarrow W_1(s) - sW_1(1).
\]
Let now
\[
B(s) = (B_1(s),\dots,B_K(s),W_1(s))'
\]
and
\[
\tilde B(s) = (B_1(s)-s\,\vartheta_0 W_1(1),\dots,B_K(s)-sK\vartheta_{K-1}W_1(1),\,W_1(s)-sW_1(1))';
\]
using the arguments of the proof of Theorem 2 in Kiefer and Vogelsang (2005) together with Lemma 1, we obtain e.g. for smooth kernels
\[
\mathcal{T}_K \Rightarrow \big(V\tilde B\big)'(1)\left(V\left(-\int_0^1\int_0^1\frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)\big(\tilde B(r)-r\tilde B(1)\big)\big(\tilde B(s)-s\tilde B(1)\big)'\,\mathrm{d}r\,\mathrm{d}s\right)V'\right)^{-1}V\tilde B(1).
\]
Note further that
\[
V\big(\tilde B(s)-s\tilde B(1)\big) = V\big(B(s)-sB(1)\big),
\]
and let $Y=V\tilde B$ such that
\[
\mathcal{T}_K \Rightarrow Y'(1)\left(-\int_0^1\int_0^1\frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)(Y(r)-rY(1))(Y(s)-sY(1))'\,\mathrm{d}r\,\mathrm{d}s\right)^{-1}Y(1)
\]
where $Y$ is a multivariate Brownian motion with covariance matrix $\Upsilon$. To obtain the required distribution, let $W=\Upsilon^{-1/2}Y(s)$, and note that $\Upsilon$ cancels out. The result for the Bartlett kernel follows analogously.
To deal with detrending, let $\mu=\theta_1+\theta_2 s$ in Lemma 3 to obtain
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1}\left(4sW_1(1) - 3s^2W_1(1) - 6s(1-s)\int_0^1 r\,\mathrm{d}W_1(r)\right).
\]
Note that $\int_0^1 r\,\mathrm{d}W_1(r) = W_1(1) - \int_0^1 W_1(r)\,\mathrm{d}r$; use then the same steps as for demeaning to arrive at the desired result.
D The Bai and Ng (2005) test procedure
The test statistic suggested by Bai and Ng (2005) is given by
\[
\hat\mu_{34} = Y_T'\,\big(\hat\gamma\hat\Phi\hat\gamma'\big)^{-1}Y_T
\]
where
\[
Y_T = \begin{bmatrix}\frac{1}{\sqrt{T}}\sum_{t=1}^T(y_t-\bar y)^3\\[4pt] \frac{1}{\sqrt{T}}\sum_{t=1}^T\big[(y_t-\bar y)^4-3\hat\sigma^4\big]\end{bmatrix}
\quad\text{and}\quad
\hat\gamma = \begin{bmatrix}-3\hat\sigma^2 & 0 & 1 & 0\\ 0 & -6\hat\sigma^2 & 0 & 1\end{bmatrix};
\]
$\bar y$, $\hat\sigma$ and $\hat\Phi$ are consistent estimators. The theoretical long-run covariance matrix $\Phi$ is given by $\Phi = \lim_{T\to\infty} T\,\mathrm{E}\big(\bar Z\bar Z'\big)$ with $Z_t' = \big[y_t-\mu,\;(y_t-\mu)^2-\sigma^2,\;(y_t-\mu)^3,\;(y_t-\mu)^4-3\sigma^4\big]$ and $\bar Z$ being the sample mean of $Z_t$. The limiting distribution of $\hat\mu_{34}$ is $\chi^2(2)$. This result is motivated by the fact that under normality, one obtains $Y_T = \gamma\frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t + o_p(1)$ with $\frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t \Rightarrow N(0,\Phi)$. We follow Bai and Ng (2005) and consider the Newey and West (1987) estimator.
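A self-contained sketch of this statistic (our implementation, not the authors' code; plug-in estimators, Bartlett/Newey-West weighting with a user-chosen truncation lag $L$) could look as follows:

```python
import math

def _newey_west(Z, L):
    """Bartlett-weighted long-run covariance of the rows of Z (lists of equal length)."""
    T, d = len(Z), len(Z[0])
    def gamma(j):
        G = [[0.0] * d for _ in range(d)]
        for t in range(j, T):
            for a in range(d):
                for c in range(d):
                    G[a][c] += Z[t][a] * Z[t - j][c] / T
        return G
    Phi = gamma(0)
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1.0)
        Gj = gamma(j)
        for a in range(d):
            for c in range(d):
                Phi[a][c] += w * (Gj[a][c] + Gj[c][a])
    return Phi

def mu34(y, L=4):
    """Bai-Ng (2005)-type joint skewness/kurtosis statistic (sketch); chi^2(2) limit under normality."""
    T = len(y)
    ybar = sum(y) / T
    s2 = sum((v - ybar) ** 2 for v in y) / T
    YT = [sum((v - ybar) ** 3 for v in y) / math.sqrt(T),
          sum((v - ybar) ** 4 - 3.0 * s2 * s2 for v in y) / math.sqrt(T)]
    Z = [[v - ybar, (v - ybar) ** 2 - s2, (v - ybar) ** 3, (v - ybar) ** 4 - 3.0 * s2 * s2]
         for v in y]
    Phi = _newey_west(Z, L)
    g = [[-3.0 * s2, 0.0, 1.0, 0.0], [0.0, -6.0 * s2, 0.0, 1.0]]
    # Omega = g Phi g'  (2 x 2)
    gP = [[sum(g[a][i] * Phi[i][c] for i in range(4)) for c in range(4)] for a in range(2)]
    O = [[sum(gP[a][i] * g[c][i] for i in range(4)) for c in range(2)] for a in range(2)]
    det = O[0][0] * O[1][1] - O[0][1] * O[1][0]
    Oinv = [[O[1][1] / det, -O[0][1] / det], [-O[1][0] / det, O[0][0] / det]]
    # quadratic form Y_T' Omega^{-1} Y_T
    return sum(YT[a] * Oinv[a][c] * YT[c] for a in range(2) for c in range(2))
```

The statistic is nonnegative by construction (the Newey-West estimate is positive semi-definite) and would be compared against $\chi^2(2)$ critical values.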
E Critical values
Table 18: Critical values via response curves from the $W_K'(1)Q_{K,b,\kappa}^{-1}W_K(1)$-distribution. $\kappa$ is the Bartlett kernel. The regression is given by $cv(b) = a_0 + a_1 b + a_2 b^2 + a_3 b^3 + \text{error}$ with corresponding $R^2$. Nominal significance levels are 0.9, 0.95, 0.975, 0.99 and 0.995.

K = 1
level    a0        a1         a2         a3          R^2
0.900    2.7055    6.1598     8.6142     -3.3854     0.9998
0.950    3.8415    10.2574    15.6231    -7.0320     0.9997
0.975    5.0239    15.8489    24.5892    -12.5751    0.9995
0.990    6.6349    26.3361    36.1330    -19.6341    0.9994
0.995    7.8794    37.5823    41.2076    -21.6338    0.9991

K = 2
level    a0        a1         a2         a3          R^2
0.900    4.6052    15.5300    33.0455    -18.0050    0.9998
0.950    5.9915    24.2350    48.4528    -27.7431    0.9998
0.975    7.3778    35.6889    62.8696    -36.8917    0.9997
0.990    9.2103    53.2832    88.7896    -55.9722    0.9996
0.995    10.5966   71.9545    96.5536    -60.2045    0.9994

K = 3
level    a0        a1         a2         a3          R^2
0.900    6.2514    30.2793    67.5629    -42.2680    0.9998
0.950    7.8147    45.5956    88.1783    -56.1070    0.9997
0.975    9.3484    63.5918    109.2760   -70.7583    0.9997
0.990    11.3449   94.2752    127.9765   -84.0108    0.9996
0.995    12.8382   121.7357   137.7951   -91.2883    0.9994

K = 4
level    a0        a1         a2         a3          R^2
0.900    7.7794    54.1072    94.7069    -61.0147    0.9997
0.950    9.4877    76.3485    121.5104   -79.8180    0.9997
0.975    11.1433   102.1803   145.6040   -97.0618    0.9997
0.990    13.2767   142.5323   169.0490   -113.2457   0.9997
0.995    14.8603   177.5045   183.2276   -123.6561   0.9996
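Reading the table: a critical value for a given bandwidth ratio $b$ is obtained by evaluating the cubic response curve at $b$. A small sketch for $K=1$ (coefficients copied from the table; names are ours):

```python
# response-curve coefficients (a0, a1, a2, a3) for K = 1, Bartlett kernel, from Table 18
COEF_K1 = {
    0.900: (2.7055, 6.1598, 8.6142, -3.3854),
    0.950: (3.8415, 10.2574, 15.6231, -7.0320),
    0.975: (5.0239, 15.8489, 24.5892, -12.5751),
    0.990: (6.6349, 26.3361, 36.1330, -19.6341),
    0.995: (7.8794, 37.5823, 41.2076, -21.6338),
}

def critical_value(b, level=0.950, coef=COEF_K1):
    """cv(b) = a0 + a1*b + a2*b^2 + a3*b^3 for the chosen quantile level."""
    a0, a1, a2, a3 = coef[level]
    return a0 + a1 * b + a2 * b ** 2 + a3 * b ** 3
```

At $b=0$ the curves reduce to $a_0$, which equals the corresponding $\chi^2(1)$ quantile (e.g. 3.8415 at the 0.95 level), consistent with the small-$b$ asymptotics mentioned in Appendix C.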
F Details on V matrices for different distributions

Table 19: Simulated coefficients in the V matrix.

Normal        ϑ_k        ϖ_k          t(3)          ϑ_k        ϖ_k
k = 1         0.28215    0.00023      k = 1         0.22969    -0.00018
k = 2         0.14116    0.04605      k = 2         0.11475    0.02124
k = 3         0.08588    0.04600      k = 3         0.06776    0.02126
k = 4         0.05822    0.04004      k = 4         0.04428    0.01832

Log-Normal    ϑ_k        ϖ_k          χ²(3)         ϑ_k        ϖ_k
k = 1         0.36215    -0.14576     k = 1         0.15910    -0.06493
k = 2         0.12372    -0.02913     k = 2         0.06125    -0.00416
k = 3         0.05921    -0.00549     k = 3         0.03239    0.00584
k = 4         0.03382    0.00110      k = 4         0.02004    0.00753
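The normal-case entries can be reproduced by Monte Carlo. The sketch below (our code) estimates $\mathrm{E}[p^{k-1}\varphi(z)]$ and $\mathrm{E}[p^{k-1}z\varphi(z)]$ for $z\sim N(0,1)$, $p=\Phi(z)$; comparing with the table suggests the row labelled $k$ reports the coefficients entering the limit for the $k$th moment, e.g. the $k=1$ row matches $\mathrm{E}[\varphi(z)]=1/(2\sqrt{\pi})\approx 0.2821$:

```python
import math
import random

def v_coefficients(kmax=4, n=200_000, seed=1):
    """Monte Carlo estimates of E[p^{k-1} phi(z)] and E[p^{k-1} z phi(z)], z ~ N(0,1)."""
    rng = random.Random(seed)
    st = [0.0] * kmax   # accumulators for E[p^{k-1} phi(z)]
    sw = [0.0] * kmax   # accumulators for E[p^{k-1} z phi(z)]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))           # standard normal CDF
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
        pw = 1.0
        for k in range(kmax):
            st[k] += pw * phi
            sw[k] += pw * z * phi
            pw *= p
    return [s / n for s in st], [s / n for s in sw]
```

For the non-normal cases one would draw $z$ from the (standardized) target distribution and use its CDF and density instead.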
References

Amado, C. and T. Teräsvirta (2013). Modelling volatility by variance decomposition. Journal of Econometrics 175 (2), 142–153.

Amado, C. and T. Teräsvirta (2014). Modelling changes in the unconditional variance of long stock return series. Journal of Empirical Finance 25 (1), 15–35.

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.

Andrews, D. W. K. and J. C. Monahan (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60 (4), 953–966.

Bai, J. (2003). Testing parametric conditional distributions of dynamic models. Review of Economics and Statistics 85 (3), 531–549.

Bai, J. and S. Ng (2005). Tests for skewness, kurtosis, and normality for time series data. Journal of Business & Economic Statistics 23 (1), 49–60.

Bontemps, C. and N. Meddahi (2005). Testing normality: a GMM approach. Journal of Econometrics 124 (1), 149–186.

Bontemps, C. and N. Meddahi (2012). Testing distributional assumptions: A GMM approach. Journal of Applied Econometrics 27 (6), 978–1012.

Cavaliere, G. and A. M. R. Taylor (2008). Time-transformed unit root tests for models with non-stationary volatility. Journal of Time Series Analysis 29 (2), 300–330.

Cavaliere, G. and A. M. R. Taylor (2009). Heteroskedastic time series with a unit root. Econometric Theory 25 (5), 1228–1276.

Clark, T. E. (2009). Is the Great Moderation over? An empirical analysis. Economic Review 4, 5–42.

Clark, T. E. (2011). Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility. Journal of Business & Economic Statistics 29 (3), 327–341.

Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.

Demetrescu, M. and C. Hanck (2012). Unit root testing in heteroskedastic panels using the Cauchy estimator. Journal of Business & Economic Statistics 30 (2), 256–264.

Durbin, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function, Volume 9. Society for Industrial and Applied Mathematics.

Guidolin, M. and A. Timmermann (2006). An econometric model of nonlinear dynamics in the joint distribution of stock and bond returns. Journal of Applied Econometrics 21 (1), 1–22.

Jarque, C. M. and A. K. Bera (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters 6 (3), 255–259.

Justiniano, A. and G. Primiceri (2008). The time-varying volatility of macroeconomic fluctuations. American Economic Review 98 (3), 604–641.

Khmaladze, E. V. (1981). Martingale approach in the theory of goodness-of-fit tests. Theory of Probability & Its Applications 26 (2), 240–257.

Kiefer, N. M. and T. J. Vogelsang (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory 21 (6), 1130–1164.

Knüppel, M. (2015). Evaluating the calibration of multi-step-ahead density forecasts using raw moments. Journal of Business & Economic Statistics 33 (2), 270–281.

Lanne, M., J. Luoto, and P. Saikkonen (2012). Optimal forecasting of noncausal autoregressive time series. International Journal of Forecasting 28 (3), 623–631.

Lanne, M. and P. Saikkonen (2011). Noncausal autoregressions for economic time series. Journal of Time Series Econometrics 3 (3), article 2.

Lanne, M. and P. Saikkonen (2013). Noncausal vector autoregression. Econometric Theory 29 (3), 447–481.

Lomnicki, Z. A. (1961). Tests for departure from normality in the case of linear stochastic processes. Metrika 4 (1), 37–62.

MacNeill, I. B. (1978). Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. The Annals of Statistics 6 (2), 422–433.

Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55 (3), 703–708.

Phillips, P. C. B. and K. L. Xu (2006). Inference in autoregression under heteroskedasticity. Journal of Time Series Analysis 27 (2), 289–308.

Sensier, M. and D. van Dijk (2004). Testing for volatility changes in U.S. macroeconomic time series. The Review of Economics and Statistics 86 (3), 833–839.

Stock, J. H. and M. W. Watson (2002). Has the business cycle changed and why? NBER Macroeconomics Annual 17 (1), 159–218.

Sun, Y. (2014a). Fixed-smoothing asymptotics in a two-step generalized method of moments framework. Econometrica 82 (6), 2327–2370.

Sun, Y. (2014b). Let's fix it: Fixed-b asymptotics versus small-b asymptotics in heteroskedasticity and autocorrelation robust inference. Journal of Econometrics 178 (3), 659–677.

Teräsvirta, T. and Z. Zhao (2011). Stylized facts of return series, robust estimates and three popular models of volatility. Applied Financial Economics 21 (1-2), 67–94.

Vogelsang, T. J. and M. Wagner (2013). A fixed-b perspective on the Phillips-Perron unit root tests. Econometric Theory 29, 609–628.

Vogt, M. (2012). Nonparametric regression for locally stationary time series. The Annals of Statistics 40 (5), 2601–2633.

Westerlund, J. (2014). Heteroscedasticity robust panel unit root tests. Journal of Business & Economic Statistics 32 (1), 112–135.

Xu, K.-L. (2008). Bootstrapping autoregression under non-stationary volatility. The Econometrics Journal 11 (1), 1–26.

Yang, J. and T. J. Vogelsang (2011). Fixed-b analysis of LM-type tests for a shift in mean. The Econometrics Journal 14 (3), 438–456.