Testing the marginal distribution
in time-varying location-scale models∗
Matei Demetrescu†
Christian-Albrechts-University of Kiel
Robinson Kruse‡
University of Cologne and CREATES
Preliminary version: February 25, 2019
Please do not quote
Abstract
Testing distributional assumptions is an evergreen topic in statistics, econometrics and other
quantitative disciplines. As a leading case in applied work, the paper begins with testing
the normality of time series exhibiting features such as serial dependence and time-varying
mean and volatility. The results extend to general location-scale models without essential
modifications. If one falsely assumes weak or strict stationarity, the marginal distribution of the
series of interest is a mixture of the baseline distribution with different location and scale at
different times, and standard distribution tests would reject too often when the null is actually
true. We consider here tests based on raw moments of the probability integral transform of the
suitably standardized series. For the standardization, nonparametric estimators of the mean
and the variance functions may be used, which eliminates the possibility of rejecting the null
of, say, normality when the marginal distributions are normal, but with time-varying mean
and variance. The use of probability integral transforms is advantageous as they are quite
sensitive to deviations from the null other than asymmetry and excess kurtosis. Short-run
dynamics are taken into account using the (fixed-b) Heteroskedasticity and Autocorrelation
Robust [HAR] approach of Kiefer and Vogelsang (2005, ET), which is shown to automatically
capture the effect of the estimation uncertainty arising from the empirical standardization. The
provided Monte Carlo experiments show that the new tests perform well in terms
of size, but also in terms of power, when compared to popular alternative procedures. An
application to testing a distributional assumption for log squared returns in a stochastic
volatility model sheds light on the empirical usefulness of the proposed test.
Key words: Distribution testing; Probability integral transform; Estimated standardization; Nonpara-
metric estimator; Robust testing.
JEL classification: C12; C14; C22
∗The authors would like to thank Mehdi Hosseinkouchack, Dominik Liebl, Philipp Sibbertsen and Michael Vogt for very helpful comments. We are grateful to Yuze Liu for providing excellent research assistance.
†Corresponding author: Institute for Statistics and Econometrics, Christian-Albrechts-University of Kiel, Olshausenstr. 40-60, D-24118 Kiel, Germany, e-mail: [email protected].
‡University of Cologne, Faculty of Management, Economics and Social Science, Albertus-Magnus-Platz, 50923 Cologne, Germany, e-mail: [email protected], and CREATES, Aarhus University, School of Economics and Management, Fuglesangs Allé 4, DK-8210 Aarhus V, Denmark, e-mail: [email protected].
1 Introduction
Testing distributional assumptions is an important aspect of applied work. For instance, non-
normality of disturbances is sometimes taken to indicate a misspecification in regression models;
non-normality may also be a prerequisite of certain modelling approaches; see e.g. the analysis
of non-causal time series models (Lanne and Saikkonen, 2011; Lanne et al., 2012; Lanne and
Saikkonen, 2013). In duration models, departures from the exponential distribution again indicate
misspecification. In an iid sampling situation, the Kolmogorov-Smirnov statistic is quite often
used to test distributional assumptions, but it is not straightforward to extend to serial dependence
and the use of estimated parameters. Bai (2003) resorts to the martingale transformation
of Khmaladze (1981); the martingale transform approach is quite demanding, though, so Bai
and Ng (2005) follow Jarque and Bera (1980) and resort to moment-based testing; see also
Lomnicki (1961) for an early discussion for linear processes or Bontemps and Meddahi (2005) for
an ingenious choice of moment restrictions. While Bai and Ng (2005) address normality testing
explicitly, moment-based testing can be extended to test other distributions as well.
But serial dependence and estimation uncertainty are not the only issues faced in econometric
practice. Consider for instance the situation where a series is marginally normal, but
exhibits one break in the mean or the variance. The pooled distribution is a mixture of two
normals, which is non-normal, so a normality test ignoring the break will reject the true null
more often than required by the nominal level of the test. The reasoning extends to more general
patterns of changes in mean or variance, and to other families of distributions. And indeed,
economic data are often found to exhibit time-varying moments. Even if mean breaks can be argued
away, examples of time-varying volatility can be found in the field of financial data such as asset
returns (see among others Guidolin and Timmermann, 2006; Amado and Teräsvirta, 2014;
Teräsvirta and Zhao, 2011; Amado and Teräsvirta, 2013) and also in macroeconomic time series
such as economic growth or price changes (see e.g. Stock and Watson, 2002; Sensier and van Dijk,
2004; Clark, 2009, 2011; Justiniano and Primiceri, 2008). Typical patterns are permanent breaks
(like the Great Moderation as an example of a downward break) or trends in the variance.
As a consequence, robust inference under time-heteroskedasticity with dependent data has received
considerable attention in the last decade.1
We discuss in this paper tests based on series standardized using means and variances estimated
in a nonparametric fashion, to account for possible time variation of unknown shape in the
location and the scale of the series of interest. The tests are based on moments of probability
integral transforms [PITs] of the standardized series. PITs have already been used successfully
by Knüppel (2015), though without accounting for the estimated standardization.
Regarding robustness against serial dependence, we rely on long-run variance estimation following
Bai and Ng (2005). We go one step further, though, and adopt the fixed-b asymptotic framework
of Kiefer and Vogelsang (2005). The main feature of the fixed-b framework is that the bandwidth
B used for long-run covariance estimation does not need to fulfill the standard assumption that
1Phillips and Xu (2006) and Xu (2008) deal with stationary autoregressions, while, for unit root autoregressions, the reader is referred to Cavaliere and Taylor (2008) or Cavaliere and Taylor (2009). Time-varying volatility has even larger effects in panels of (nonstationary) series, calling for suitable treatment; see e.g. Demetrescu and Hanck (2012) or Westerlund (2014).
b = B/T → 0 as T → ∞. On the contrary, the bandwidth is held fixed as a linear proportion of
the sample size T, i.e. B = [bT] with b ∈ (0, 1]. This leads to non-standard asymptotic limiting
distributions of test statistics (like t, Wald and F), in such a way that the critical values
obtained from these distributions reflect the choice of bandwidth and kernel even as T → ∞,
such that the fixed-b approach may provide much more accurate finite-sample inference.2 Our
main contribution is to show that the mean and variance functions may be estimated in a
nonparametric fashion. As a consequence, the practitioner does not have to specify a model
for the mean and the variance explicitly; moreover, the limiting distribution turns out to be
the same whether the mean and variance functions are known or estimated, thus leading to a
straightforward implementation of the proposed tests.
The remainder of the paper is structured as follows. In Section 2, the setup is described and the
newly proposed test statistics are introduced for the important particular case of normality. The
uncertainty induced by the nonparametrically estimated standardization is addressed in Section 3,
followed by the extension to other distributions in Section 4. Our Monte Carlo simulation study is included
in Section 5. Section 6 provides an empirical application of the tests to log squared returns for the
DJIA, FTSE and Nikkei stock indices. Section 7 concludes the study. Proofs, additional results,
response curves for critical values and a description of the Bai and Ng (2005) test statistic are
given in the Appendix.
In terms of notation, C stands for a generic constant whose value may change from one occurrence
to another, and ⇒ for weak convergence in a space of càdlàg functions endowed with a suitable
norm.
2 Model and test idea
To fix ideas, we first describe the proposed procedure for the null hypothesis of normality,
assuming that the mean and variance of the series of interest are known. Section 3 discusses
the feasible version of our test procedure with nonparametric standardization, and Section 4 the
application to other distributions under the null.
The null hypothesis to be tested is that the series of interest xt is marginally normal. The
series xt is taken to exhibit time-varying mean and variance behavior as given by the following
component model
xt = µt + σtzt, t = 1, 2, . . . , T,
where zt is unconditionally homoskedastic and otherwise short-range dependent, while the time-
varying mean and variance are allowed to have triangular array structures, µt = µt,T and σt =
σt,T , allowing e.g. for breaks.
The following assumptions make the notions of short-run dependence and time-varying moments
precise.
2See Yang and Vogelsang (2011), Vogelsang and Wagner (2013) or Sun (2014a,b) for recent contributions to this field, inter alia.
Assumption 1 Let $z_t$ be a marginally standard normal, strictly stationary series, strong
mixing with coefficients $\alpha(j)$ for which
$$\alpha(j) < A j^{-b} \quad \text{for some } b > 10/3.$$
Assume furthermore that $z_t$ has unity long-run variance, $\sum_{h=-\infty}^{\infty} \mathrm{E}(z_t z_{t+h}) = 1$. Finally, assume
absolutely summable 8th-order cumulants of $z_t$.
The strong mixing condition is a standard way of controlling the persistence of stochastic
processes and ensures that $z_t$ has short memory; given the unity long-run variance, $z_t$ is integrated
of order zero and $\sigma_t^2$ then gives the local long-run variance. The mixing coefficients $\alpha(j)$ are
only mildly restricted, given that normality of $z_t$ ensures finiteness of moments of any order,
so the typical trade-off between serial dependence and finiteness of higher-order moments is
not relevant here. The condition also allows for mild forms of conditional heteroskedasticity,
so the observed series $x_t$ may exhibit both conditional and unconditional heteroskedasticity.
Assumption 1 ensures e.g. weak convergence of the suitably normalized partial sums of $z_t$ and
$z_t^2$,
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} \begin{pmatrix} z_t \\ z_t^2 - 1 \end{pmatrix} \Rightarrow \begin{pmatrix} W_1(s) \\ W_2(s) \end{pmatrix}, \qquad (1)$$
where $(W_1, W_2)'$ is a bivariate Brownian motion (see e.g. Davidson, 1994, Chapter 29). Strict
stationarity is a more restrictive condition than needed for the convergence in (1), for which
weak stationarity would have sufficed in addition to the I(0) property and uniform boundedness
of higher-order moments. We shall consider nonlinear transformations of $z_t$, however, and strict
stationarity of $z_t$ ensures that the transformed series have constant variance; see below. Moreover,
strict stationarity is a reasonable assumption once the time-varying mean and variance have been
accounted for.
Strict stationarity of $z_t$ also separates the variance fluctuations from the serial dependence
properties. The unity long-run variance assumption on $z_t$ is an identifying restriction and allows
for the interpretation of $\sigma_t$ as the marginal (long-run) standard deviation. The mean and variance
functions themselves are taken to satisfy some smoothness conditions:
Assumption 2 The triangular arrays $\mu_{t,T}$ and $\sigma_{t,T}$ are given as $\mu_{t,T} = \mu(t/T)$ and $\sigma_{t,T} = \sigma(t/T)$, where both $\mu(\cdot)$ and $\sigma(\cdot)$ are Lipschitz-continuous on $[0, 1]$, and $\sigma(\cdot)$ is bounded away
from zero on $[0, 1]$. Let $\sigma''$ exist and be bounded on $[0, 1]$.
We base our test of the null hypothesis on moments of the transformed series rather than the original
series $z_t$. With $\Phi$ being the cdf (and $\varphi$ denoting the pdf) of the standard normal distribution,
the probability integral transform
$$p_t = \Phi(z_t)$$
is marginally uniform on $[0, 1]$ under the null. It then holds under the null of uniformly distributed
PITs that
$$\mathrm{E}\left(p_t^k\right) = \frac{1}{k+1}, \qquad k \in \mathbb{N}, \qquad (2)$$
such that, under Assumption 1,
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} \begin{pmatrix} p_t - \frac{1}{2} \\ \vdots \\ p_t^K - \frac{1}{K+1} \end{pmatrix} \Rightarrow \begin{pmatrix} B_1(s) \\ \vdots \\ B_K(s) \end{pmatrix}, \qquad (3)$$
where $(B_1, \ldots, B_K)'$ is a $K$-variate Brownian motion with covariance matrix $\Omega = \mathrm{E}\big((B_1(1), \ldots, B_K(1))'(B_1(1), \ldots, B_K(1))\big)$, which is taken to be positive definite. Because $p_t$ is
only marginally uniform, $\Omega$ depends in general on the specific data generating process at hand.
We shall resort to an estimate thereof (based on the usual spectral density approach;
see Newey and West, 1987; Andrews, 1991; Andrews and Monahan, 1992) to build Wald-type test
statistics of the moment restrictions in (2), so knowledge of $\Omega$ is not required. This follows the
approach of Bai and Ng (2005) or Bontemps and Meddahi (2005) for dealing with serial dependence
of unknown form.
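As an aside, the moment restrictions in (2) are easy to verify numerically. The following sketch is ours, not code from the paper; it checks that the raw PIT moments of an iid standard normal sample are close to $1/(k+1)$:

```python
# Check E(p^k) = 1/(k+1) for PITs of a standard normal sample (illustration).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)   # iid standard normal draws
p = norm.cdf(z)                    # probability integral transform

for k in range(1, 5):
    m_k = np.mean(p ** k)          # sample raw moment m_k
    # under the null, m_k should be close to 1/(k+1)
    print(f"k={k}: m_k={m_k:.4f}, target={1 / (k + 1):.4f}")
```

Under the alternative (e.g. skewed innovations), the sample moments drift away from these targets, which is precisely what the tests below exploit.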
Suppose for now that the test can be based directly on the empirical moments of $p_t$ (i.e. under known
parameters $\mu_t$ and $\sigma_t$). With $m_k = \frac{1}{T}\sum_{t=1}^{T} p_t^k$, a simple t-statistic for a single restriction on the
$k$-th moment is given by
$$t_k = \sqrt{T}\,\frac{m_k - \frac{1}{k+1}}{\hat\omega_k} \qquad (4)$$
with $\hat\omega_k^2$ being the $k$-th diagonal element of $\hat\Omega$ (i.e. the implied estimator of the long-run variance of $p_t^k$). Let
$$\hat\Omega = \sum_{j=-T+1}^{T-1} \kappa\left(\frac{j}{B}\right)\hat\Gamma_j \qquad (5)$$
denote an estimator of $\Omega$ with proportional bandwidth $B = [bT]$, $b > 0$, where the $\hat\Gamma_j$'s denote
the usual autocovariance matrix estimators at lag $j$,
$$\hat\Gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}\left(\mathbf{p}_t - \bar{\mathbf{p}}\right)\left(\mathbf{p}_{t-j} - \bar{\mathbf{p}}\right)', \quad j \geq 0, \qquad \hat\Gamma_j = \hat\Gamma_{-j}', \quad j < 0,$$
with $\mathbf{p}_t$ the vector stacking $p_t, p_t^2, \ldots, p_t^K$. For $b \in (0, 1]$ we have from Kiefer and Vogelsang
(2005) that
$$t_k^2 \Rightarrow \frac{W^2(1)}{Q_{b,\kappa}},$$
where $W$ is a standard Wiener process, and the functional $Q_{b,\kappa}$ is given in terms of the Brownian
bridge $W(s) - sW(1)$ and depends explicitly on the choice of kernel and bandwidth. For simplicity
we work with the two most popular kernels in applied time series analysis: (a) the quadratic
spectral [QS] kernel of Andrews (1991) with
$$\kappa(s) = \frac{25}{12\pi^2 s^2}\left(\frac{\sin(6\pi s/5)}{6\pi s/5} - \cos(6\pi s/5)\right)$$
and (b) the Bartlett kernel $\kappa(s) = (1 - |s|)\,\mathbb{1}(|s| \leq 1)$, with $\mathbb{1}$ the indicator function. For kernels with a smooth
second derivative, of which the QS kernel is one, it holds that
$$Q_{b,\kappa} = -\int_0^1\int_0^1 \frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)\left(W(r) - rW(1)\right)\left(W(s) - sW(1)\right)dr\,ds,$$
while, for the Bartlett kernel,
$$Q_{b,\kappa} = \frac{2}{b}\int_0^1 \left(W(r) - rW(1)\right)^2 dr - \frac{2}{b}\int_0^{1-b}\left(W(r+b) - (r+b)W(1)\right)\left(W(r) - rW(1)\right)dr.$$
For both kernels, the standard asymptotics ($t_k^2 \Rightarrow \chi_1^2$) is recovered when $b \to 0$ at suitable rates
(in fact $Q_{b,\kappa} \stackrel{d}{\to} 1$ for $b \to 0$; cf. Kiefer and Vogelsang, 2005).
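A minimal implementation of the $t_k$ statistic in (4) with a Bartlett-kernel long-run variance estimator might look as follows. This is our own sketch (function names are ours), covering the infeasible case of an already standardized series:

```python
# Sketch: t-statistic for the k-th PIT moment restriction E(p^k) = 1/(k+1),
# with a Bartlett long-run variance estimator at proportional bandwidth B = [bT].
import numpy as np
from scipy.stats import norm

def bartlett_lrv(u, b=0.1):
    """Bartlett-kernel long-run variance of the series u, bandwidth B = [bT]."""
    T = len(u)
    B = max(1, int(b * T))
    u = u - u.mean()
    lrv = np.mean(u * u)                     # lag-0 autocovariance
    for j in range(1, B):
        w = 1.0 - j / B                      # kappa(j/B) for the Bartlett kernel
        lrv += 2.0 * w * np.mean(u[j:] * u[:-j])
    return lrv

def t_k(z, k, b=0.1):
    """t-statistic for the k-th PIT moment, z assumed already standardized."""
    pk = norm.cdf(z) ** k
    T = len(pk)
    omega2 = bartlett_lrv(pk, b)
    return np.sqrt(T) * (pk.mean() - 1.0 / (k + 1)) / np.sqrt(omega2)
```

The squared statistic is then compared against fixed-b critical values for the chosen kernel and bandwidth rather than against $\chi_1^2$ quantiles.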
Working with several raw moments (a portmanteau test, so to say), we suggest constructing
$$T_K = T\left(m_1 - \frac{1}{2}, \ldots, m_K - \frac{1}{K+1}\right)\hat\Omega^{-1}\left(m_1 - \frac{1}{2}, \ldots, m_K - \frac{1}{K+1}\right)'. \qquad (6)$$
Similarly,
$$T_K \Rightarrow \mathbf{W}_K'(1)\,Q_{K,b,\kappa}^{-1}\,\mathbf{W}_K(1),$$
where $\mathbf{W}_K(s)$ is a $K$-dimensional vector of independent standard Wiener processes and $Q_{K,b,\kappa}$ is
the $K$-dimensional variant of the above functionals, relying on the Brownian bridges $\mathbf{W}_K(s) - s\mathbf{W}_K(1)$; see Kiefer and Vogelsang (2005) for details.
Compared with relying on $z_t$ directly, PITs have several advantages; see Knüppel (2015) again.
Among others, PITs are bounded, so that their higher-order cumulants are smaller than those
of the standard normal; the variability of the long-run covariance matrix estimators
is therefore smaller and the $\chi^2$ asymptotic approximation more accurate. The bias of the moments of
PITs is also typically smaller than that of the untransformed series; see the Appendix for some
evidence in this respect. At the same time, PITs still allow one to distinguish between skewness and
kurtosis as causes of non-normality: since the cdf of the standard normal is symmetric about
the point $(0, 0.5)$, the first raw moment of the PITs captures distributional asymmetry, but not
skewness alone. So a rejection of the null which is not driven by the first raw moment is clearly
not due to skewness.
To take advantage of the properties of the PIT-based tests, one must however standardize the
series prior to applying the PIT. We address this issue in the following section.
3 Local standardization and estimation uncertainty
If, e.g., the location and scale parameters of the sample to be tested are known (or given to the
researcher), the tests may be applied directly. Although this is not a purely hypothetical situation
(for instance, the evaluation of density forecasts is often conducted under such assumptions; see
Knüppel (2015) and the references therein), it is not the prevailing case in applied work. Let
therefore
$$\hat p_t = \Phi\left(\hat z_t\right) = \Phi\left(\frac{x_t - \hat\mu_t}{\hat\sigma_t}\right) \qquad (7)$$
with $\hat\mu_t$ and $\hat\sigma_t$ being estimators of the (time-varying) mean and standard deviation $\mu_t$ and $\sigma_t$.
Let also $\hat m_k = \frac{1}{T}\sum_{t=1}^{T} \hat p_t^k$ denote the sample average of $\hat p_t^k$.
The use of $\hat p_t$ instead of $p_t$ for computing a feasible statistic, say $\hat t_k$, typically affects the limiting
distributions and requires corrections. This is known in the literature as the Durbin problem; see
Durbin (1973). In previous work, Bai and Ng (2005) show how to robustify against estimating a
(constant) mean and variance, while Bontemps and Meddahi (2012) derive conditions under
which more general parametric standardization does not affect the limiting distribution. Bai
(2003) uses the Khmaladze transform to tackle this issue.
These approaches are discussed under stationarity assumptions. To account for the time-varying
nature of our model, we employ a local standardization to match the local stationarity properties
of the model. Consider to this end the Nadaraya-Watson estimators of the unknown functions
$\mu$ and $\sigma$, i.e. the local constant regressions of $x_t$ and $(x_t - \mathrm{E}(x_t))^2$ on the normalized time $t/T$:3
$$\hat\mu\left(\frac{t}{T}\right) = \frac{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right) x_j}{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right)} \qquad \text{and} \qquad \hat\sigma^2\left(\frac{t}{T}\right) = \frac{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right)\left(x_j - \hat\mu_j\right)^2}{\sum_{j=1}^{T} K\left(\frac{t/T - j/T}{h}\right)},$$
where we assume for simplicity that the boundary observations $x_0, x_{-1}, \ldots$ and $x_{T+1}, \ldots$ are available. These estimators are not unfamiliar in
time series analysis: using the uniform kernel, we obtain the classical centered moving averages
$$\hat\mu_t = \frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau} x_j \qquad \text{and} \qquad \hat\sigma_t^2 = \frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau} \left(x_j - \hat\mu_j\right)^2,$$
where the window width is obtained from the bandwidth $h$ by multiplication with $T$. The window
width $\tau$ is smaller than $T$, hence ensuring that $x_t$ is approximately standardized in finite samples,
and letting $\tau \to \infty$ ensures that, asymptotically, $x_t$ is standardized correctly. This is simply
standardizing locally instead of globally, as would have been sufficient in the case of strict stationarity
of $x_t$. In fact, we may allow for different window widths $\tau_\mu$ and $\tau_\sigma$ to permit a more flexible
choice of these tuning parameters; see also Remark 1 below.
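In code, the uniform-kernel (moving-average) standardization can be sketched as follows. This is our own illustration; in particular, the boundary treatment (shrinking windows at the sample ends) is our simplification, not the paper's:

```python
# Sketch: local standardization with centered moving averages of width
# 2*tau + 1, followed by the PITs of the standardized series.
import numpy as np
from scipy.stats import norm

def local_standardize(x, tau):
    T = len(x)
    mu = np.empty(T)
    for t in range(T):                       # local mean
        lo, hi = max(0, t - tau), min(T, t + tau + 1)
        mu[t] = x[lo:hi].mean()
    sig2 = np.empty(T)
    for t in range(T):                       # local variance around local mean
        lo, hi = max(0, t - tau), min(T, t + tau + 1)
        sig2[t] = np.mean((x[lo:hi] - mu[lo:hi]) ** 2)
    return (x - mu) / np.sqrt(sig2)

# Usage: a mid-sample volatility break is absorbed by the local standardization.
rng = np.random.default_rng(2)
T = 1000
sigma = np.where(np.arange(T) < T // 2, 1.0, 3.0)
x = sigma * rng.standard_normal(T)
p_hat = norm.cdf(local_standardize(x, tau=60))
```

A global standardization of the same sample would instead yield PITs piling up near 0.5 in the low-volatility half, producing spurious rejections of uniformity.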
The key step in analyzing the feasible statistic based on $\hat p_t$ is to note that the weak convergence
in (3) is replaced by the following limiting behavior.
Lemma 1 Let $\tau_\mu, \tau_\sigma, T \to \infty$ such that $\frac{T^{\kappa_1}}{\tau_{\mu,\sigma}} + \frac{\tau_{\mu,\sigma}}{T^{\kappa_2}} \to 0$ for $2/3 < \kappa_1 < \kappa_2 < 3/4$. Then, under
Assumptions 1 and 2 and the uniform kernel,
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} \left(\hat p_t^k - \frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1} W_1(s) - \frac{k}{2}\varpi_{k-1} W_2(s)$$
jointly for all $k = 1, \ldots, K$, with $\vartheta_{k-1} = \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right)$ and $\varpi_{k-1} = \mathrm{E}\left(p_t^{k-1} z_t \varphi(z_t)\right)$, as well
as $W_{1,2}$ from (1) and $B_k$ from (3).
Proof: see the Appendix.
3E.g. Vogt (2012) considers nonparametric multivariate regressions with time-varying regression surfaces.
Remark 1 The choice of the window widths $\tau_\mu$ and $\tau_\sigma$ is critical for the performance of the
smoothers. One may note that the imposed restrictions imply undersmoothing, which is
explained by the nature of the desired result. While classical nonparametric regression focuses on
minimizing the MSE of the estimated curve, we need to reduce the estimation bias to a minimum,
since the effect of the bias of $\hat\mu_t$, say, on the partial sums of $\hat p_t$ cumulates in $s$. This effectively
induces trends in the partial sums of $\hat p_t$, and, for weak convergence to a Wiener process to still
take place, these trends must be of negligible magnitude. This implies that the usual procedures for
choosing the window width, such as cross-validation, do not deliver suitable choices directly. We make
recommendations on the choice of the window width in the Monte Carlo section.
Remark 2 We state the lemma for the case of the uniform kernel to simplify the proofs. Other
kernel choices, say the popular Gaussian or Epanechnikov kernels, plausibly lead to analogous
results, as would using a local linear (or polynomial) regression. Moreover, one may allow for
a finite number of breaks, too. We provide simulation evidence in support of these claims, but
choose not to follow through analytically in order to focus on the main message.
Remark 3 Note that $\vartheta_0 = \mathrm{E}(\varphi(z_t)) = \int_{-\infty}^{\infty} \varphi^2(x)\,dx = \frac{1}{2\sqrt{\pi}}$; via the use of power series expansions
one may show that $\vartheta_1 = \frac{1}{4\sqrt{\pi}}$, but the higher-order expectations ($\vartheta_k$, $k \geq 2$) do not
seem to have a closed-form expression. We computed the expectations $\vartheta_{k-1} = \mathrm{E}(p_t^{k-1}\varphi(z_t))$ via
Monte Carlo simulation for $k = 1, 2, 3, 4$ with 1,000,000 observations and 10,000 replications;
the resulting values are $\vartheta = (0.2820948, 0.1410473, 0.0857805, 0.0581472)$. Clearly,
the simulated values for $k = 1$ and $k = 2$ match their theoretical counterparts perfectly. We
therefore expect the Monte Carlo precision of the higher-order terms to be quite reasonable as well.
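The quantities in Remark 3 can also be cross-checked by numerical quadrature; the following check is ours and reproduces the closed forms $\vartheta_0 = \frac{1}{2\sqrt{\pi}}$ and $\vartheta_1 = \frac{1}{4\sqrt{\pi}}$:

```python
# Numerical check of vartheta_{k-1} = E(p^{k-1} phi(z)), i.e. the integral
# of Phi(x)^{k-1} * phi(x)^2 over the real line.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def vartheta(k):
    integrand = lambda x: norm.cdf(x) ** (k - 1) * norm.pdf(x) ** 2
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

print(vartheta(1), 1 / (2 * np.sqrt(np.pi)))   # vartheta_0
print(vartheta(2), 1 / (4 * np.sqrt(np.pi)))   # vartheta_1
```

The same quadrature evaluates $\vartheta_k$ for $k \geq 2$, where no closed form seems available, as an alternative to simulation.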
The feasible test statistics are based on $\hat p_t$,
$$\hat t_k = \sqrt{T}\,\frac{\hat m_k - \frac{1}{k+1}}{\hat\omega_k} \qquad (8)$$
and
$$\hat T_K = T\left(\hat m_1 - \frac{1}{2}, \ldots, \hat m_K - \frac{1}{K+1}\right)\hat\Omega^{-1}\left(\hat m_1 - \frac{1}{2}, \ldots, \hat m_K - \frac{1}{K+1}\right)' \qquad (9)$$
with $\hat\Omega$ from (5) computed using $\hat p_t^k$ as well.
By Lemma 1 we have that the $K$ normalized partial sums $\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k - \frac{1}{k+1}\right)$ still converge
weakly to a $K$-dimensional Brownian motion, albeit with a different long-run covariance matrix
than $\Omega$, namely
$$\tilde\Omega = V \Xi V',$$
where $\Xi$ is the long-run covariance matrix of $\left(p_t, \ldots, p_t^K, z_t, z_t^2 - 1\right)'$, and
$$V = \left(I_K;\ \iota_K\right) \qquad \text{with} \qquad \iota_K = -\begin{pmatrix} \vartheta_0 & \cdots & K\vartheta_{K-1} \\ \frac{1}{2}\varpi_0 & \cdots & \frac{K}{2}\varpi_{K-1} \end{pmatrix}'.$$
Since the fixed-b asymptotics leads to partial-sum asymptotics for both $\hat m_k$ and $\hat\Omega$, where the
relevant long-run (co)variance matrix simply cancels out in (8) and (9), it turns out, perhaps
surprisingly, that no explicit correction is required. This is formalized in the following proposition.
Proposition 1 Under the assumptions of Lemma 1, and the additional condition that
$\Xi$ be positive definite, it holds that
$$\hat t_k^2 \Rightarrow \frac{W^2(1)}{Q_{b,\kappa}} \qquad \text{and} \qquad \hat T_K \Rightarrow \mathbf{W}_K'(1)\,Q_{K,b,\kappa}^{-1}\,\mathbf{W}_K(1)$$
as $T \to \infty$.
Proof: see the Appendix.
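Critical values for these limiting distributions can be tabulated by direct simulation. The following sketch is our own: it discretizes the Wiener process on a grid and evaluates the Bartlett-kernel functional $Q_{b,\kappa}$ by Riemann approximation:

```python
# Simulate draws from W^2(1)/Q_{b,kappa} (Bartlett kernel) on an n-point grid.
import numpy as np

def fixed_b_draw(n=1000, b=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    W = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)   # discretized Wiener process
    s = np.arange(1, n + 1) / n
    bridge = W - s * W[-1]                               # Brownian bridge W(s) - sW(1)
    m = int(b * n)
    # Riemann approximation of the two integrals in Q_{b,kappa} for Bartlett
    q = (2 / b) * (np.mean(bridge ** 2) - np.sum(bridge[m:] * bridge[:-m]) / n)
    return W[-1] ** 2 / q

rng = np.random.default_rng(42)
draws = np.array([fixed_b_draw(rng=rng) for _ in range(2000)])
cv_5pct = np.quantile(draws, 0.95)   # 5%-level critical value for t_k^2
```

With many more replications and a finer grid, this is essentially how the response curves cv(b) reported in the Appendix can be generated.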
Remark 4 It is questionable whether small-b asymptotics are feasible or worth pursuing. Due
to the local nature of the standardization, the convergence of $\hat p_t$ to $p_t$ is quite slow compared to the
parametric case (where it is easily shown that $\hat p_t - p_t = O_p(T^{-1/2})$). Showing that $\hat\Omega$ converges
to $\tilde\Omega$ is therefore more difficult under local standardization. Since such convergence is not required
for the fixed-b approach, this actually delivers a further argument in favor of using the latter.
Remark 5 As a comparison, Appendix C provides a discussion of parametric standardization.
Parametric approaches require a model for the mean $\mu_t$ and the variance $\sigma_t^2$, which is prone to
misspecification. Moreover, there are essential differences between the implications of the two
approaches. While parametric adjustment typically leads to bridge-type processes (see Lemma 3
in Appendix C for details), we obtain Brownian motions under local adjustment. As a consequence,
parametric standardization requires an explicit correction involving an estimate
of $\Xi$; see Proposition 2 in Appendix C. As may be seen there, a further disadvantage of the
parametric approach is that the effect depends on the shape of the mean or variance component
adjusted for.
4 Other distributions and extensions
Our framework allows testing other null distributions in location-scale models, since PITs apply
to any continuous distribution. In particular, it is straightforward to show that Lemma 1 and
Proposition 1 hold under mild regularity conditions (finite moments of any order; relaxing this
can be done at the cost of appropriately restricting the serial dependence), yet with e.g. $\vartheta_k =
\mathrm{E}\left(p_t^k f_0(z_t)\right)$, where $f_0$ is the density function of the null distribution of $z_t$. One further advantage
of the local standardization approach is that the expectations $\vartheta_k = \mathrm{E}\left(p_t^k f_0(z_t)\right)$ and $\varpi_k =
\mathrm{E}\left(p_t^k z_t f_0(z_t)\right)$ need not be computed explicitly, so the test is immediately applicable for any
continuous null shape in location-scale families.
5 Monte Carlo study
In our Monte Carlo simulation study we compare the $t_k$ and $T_K$ statistics to the procedure of
Bai and Ng (2005).4 The newly proposed tests are carried out using either a single moment
(first to fourth) or the first two ($T_2$), first three ($T_3$) or first four moments ($T_4$). We use sample
sizes of $T = 250$ and $T = 1000$.
Regarding autocorrelation, we consider a causal and invertible ARMA(1,1) process with AR and
MA parameters $\phi \in \{0, 0.85\}$ and $\theta \in \{0, -0.45\}$, respectively. The general form of the DGP is
given by
$$y_t = \mu_t + \sigma_t z_t$$
$$\mu_t = \mu_1 + (\mu_2 - \mu_1) \cdot \mathbb{1}(t \geq \lfloor \tau T \rfloor)$$
$$\sigma_t = \sigma_1 + (\sigma_2 - \sigma_1) \cdot \mathbb{1}(t \geq \lfloor \tau T \rfloor)$$
$$z_t = \phi z_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1}, \qquad \varepsilon_t \overset{iid}{\sim} D(0, 1).$$
Since all procedures are scale-invariant, we do not normalize the long-run variance of $z_t$ to unity.
The distribution of $\varepsilon_t$ is specified as follows. Under $H_0$, the innovations $\varepsilon_t$ are standard normally
distributed. Under the alternative, we consider three standardized non-normal mixture distributions
with mixture weight $c \in [0, 1]$:
1. Mixture of a normal and a Student-t(3) distribution,
2. Mixture of a normal and a lognormal-distribution,
3. Mixture of a normal and a χ2(3)-distribution.
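A generator for this DGP might be sketched as follows. The helper below is ours (the paper provides no code); it implements the t(3) mixture case, assuming the mean shift 0 → 3 and the volatility shift 1 → 3 at mid-sample used in the experiments:

```python
# Sketch: ARMA(1,1) series with mixture innovations and optional shifts.
import numpy as np

def simulate(T, phi=0.85, theta=-0.45, c=0.5, shift="none", tau=0.5, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # innovations: standard normal w.p. 1-c, standardized t(3) w.p. c
    pick = rng.random(T + 1) < c
    t3 = rng.standard_t(3, T + 1) / np.sqrt(3.0)       # t(3) scaled to unit variance
    eps = np.where(pick, t3, rng.standard_normal(T + 1))
    z = np.empty(T)
    z[0] = eps[1] - theta * eps[0]
    for t in range(1, T):                              # ARMA(1,1) recursion
        z[t] = phi * z[t - 1] + eps[t + 1] - theta * eps[t]
    brk = (np.arange(T) >= int(tau * T)).astype(float) # indicator 1(t >= [tau T])
    mu = 3.0 * brk if shift in ("mean", "both") else np.zeros(T)
    sig = 1.0 + 2.0 * brk if shift in ("vola", "both") else np.ones(T)
    return mu + sig * z
```

Setting c = 0 yields the size experiments; the log-normal and χ²(3) mixtures are obtained analogously by standardizing the respective draws.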
Regarding the long-run covariance matrix estimator, the fixed-bandwidth parameter is specified
as $b = 0.1$: previous simulation experiments showed that the size of the tests is quite stable across
different values of $b$, while power is higher for smaller values of $b$. Hence, we choose $b = 0.1$.
Results are presented for the Bartlett kernel with its linearly decaying weights.
The nonparametric estimator is the Nadaraya-Watson estimator with Gaussian kernel and a
down-scaled bandwidth chosen via cross-validation. The scaling factor $s$ is set equal to 3/4 or
3/5 for comparison. The bandwidth chosen according to cross-validation is down-scaled because the
asymptotic results require a small bias in the estimation. The nominal significance level equals
5% and the number of Monte Carlo replications is set to 2,000 for each single experiment. In
what concerns critical values for the fixed-b distributions, we provide them on the basis of the
limiting results with 1,000 observations and 50,000 replications for $K = 1, 2, 3, 4$. Estimated cubic
response curves cv(b) are reported in Table 18 together with an $R^2$ measure for the precision of the
approximation.
Results are reported in Tables 1-16. If the mixture weight $c = 0$, then the distribution of
the innovations $z_t$ is normal; this case refers to the IID and ARMA size experiments. For
4Details on the test proposed by Bai and Ng (2005) can be found in Appendix D.
the other three distributions, i.e. t(3), log-normal and $\chi^2(3)$, the mixture weight is set to
$c = 0.25, 0.5, 0.75, 1$. We expect power to increase with $c$, as more weight is given to the
non-normal distribution. In Tables 1-16, we distinguish between four cases: (i) no shift,
(ii) mean shift, (iii) vola shift and (iv) mean plus vola shift. In the case of a mean shift, a structural
break from 0 to 3 takes place in the middle of the sample. For a vola shift, the standard deviation
switches from unity to three at the same breakpoint.
While the Bai and Ng (2005) test is generally oversized (less so in the ARMA(1,1) case), the raw
moment-based tests are much closer to the nominal significance level of 5%. In some cases we
observe that they are marginally undersized. But, for $T = 250$ with
short-run dynamics, most of them are quite close to the desired frequency of rejections. Clearly,
the BN test is heavily oversized when there are mean and/or vola shifts. In contrast, the
newly proposed statistics properly account for such shifts and deliver quite accurate empirical
sizes. The size performance is somewhat better for $s = 3/4$ than for $s = 3/5$.5
Turning to power, we shall distinguish case (i), no shift, from the remaining ones. Only in the
absence of mean and vola shifts does the BN test perform reasonably, as it does not account for such
breaks. Here, we can compare the (size-unadjusted) power of the BN test to that of the newly proposed
ones, while keeping in mind that the BN test is generally somewhat oversized under the null.
These situations are given in Tables 1, 5, 9 and 13. As we can observe, the first moment cannot
detect excess kurtosis given symmetry, as in the t(3) distribution, but the second, third and fourth
moment tests are successful and deliver higher power than the BN test. Moreover, a combination
of moments does not pay off in terms of power: the power of a properly selected single-moment
statistic is generally higher than that observed for the statistics relying on combinations. For
the log-normal and $\chi^2$-distributions, the test based on the first moment appears to be quite
powerful and provides the largest power across the studied tests. It also beats the BN test. The
choice of the down-scaling factor for the bandwidth of the Nadaraya-Watson estimator does not
impact the results much. As the size results are better for $s = 3/4$, we recommend
this choice for practical purposes (we also use this setting in our empirical application).
For the remaining situations, a direct comparison of the BN test to the new ones is not meaningful
due to the massive over-rejections of the BN test under the null. These additional experiments
allow us to compare the power across the different break scenarios relative to
the benchmark case of no shifts. The results reveal that power is somewhat lower in the case of
mean shifts, but a bit higher when pure vola shifts are present. When both
effects are present, power is broadly similar to the benchmark case. Furthermore,
we may evaluate the consistency of our test statistics, which should reject more often as
$c$ increases and also as $T$ increases. As expected, power increases along both dimensions.
5It is of importance to note that the size does not vary much with the choice of the bandwidth parameter b, as unreported previous simulations revealed. This is an advantage when it comes to the power of such tests, which typically depends a lot on the bandwidth choice; cf. Kiefer and Vogelsang (2005). In this sense, we are not facing a size-power tradeoff, as we can select the most suitable b in a way that maximizes power. As indicated above, we select b = 0.1.
Table 1: Size and power for T = 250, no shift and s = 3/4

Normal
        t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
IID     0.054  0.052  0.053  0.043  0.051  0.051  0.047   0.121
ARMA    0.045  0.055  0.058  0.071  0.057  0.062  0.060   0.093

t(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.051  0.049  0.046  0.050  0.050  0.038  0.041   0.096
0.5     0.041  0.075  0.131  0.183  0.096  0.085  0.081   0.044
0.75    0.048  0.320  0.607  0.650  0.519  0.469  0.520   0.180
1       0.070  0.482  0.750  0.772  0.699  0.705  0.769   0.215

Log-Normal
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.078  0.061  0.049  0.052  0.061  0.048  0.052   0.102
0.5     0.664  0.594  0.500  0.372  0.485  0.391  0.386   0.578
0.75    0.971  0.931  0.874  0.755  0.973  0.998  0.998   0.950
1       0.959  0.920  0.855  0.726  0.967  1.000  1.000   0.954

χ²(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.056  0.057  0.054  0.052  0.049  0.055  0.053   0.112
0.5     0.692  0.584  0.383  0.230  0.522  0.435  0.338   0.666
0.75    1.000  0.989  0.919  0.583  0.999  0.997  0.992   1.000
1       1.000  0.999  0.957  0.636  1.000  1.000  1.000   1.000
Table 2: Size and power for T = 250, mean shift and s = 3/4

Normal
        t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
IID     0.042  0.046  0.046  0.041  0.042  0.045  0.040   0.999

t(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.044  0.058  0.056  0.059  0.064  0.073  0.052   0.994
0.5     0.049  0.082  0.150  0.168  0.117  0.096  0.089   0.942
0.75    0.045  0.254  0.552  0.633  0.524  0.431  0.440   0.798
1       0.034  0.371  0.666  0.733  0.668  0.588  0.644   0.536

Log-Normal
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.048  0.059  0.067  0.058  0.055  0.047  0.054   0.989
0.5     0.526  0.520  0.438  0.344  0.384  0.341  0.284   0.907
0.75    0.952  0.921  0.873  0.775  0.897  0.993  0.985   0.719
1       0.959  0.929  0.892  0.808  0.935  1.000  1.000   0.518

χ²(3)
c       t1     t2     t3     t4     W1,2   W1,2,3 W1,2,3,4 BN
0.25    0.051  0.046  0.046  0.048  0.054  0.048  0.052   1.000
0.5     0.522  0.477  0.351  0.209  0.372  0.270  0.215   0.998
0.75    0.997  0.992  0.909  0.625  0.987  0.974  0.954   0.974
1       1.000  0.997  0.936  0.643  0.999  0.999  0.997   0.800
Table 3: Size and power for T = 250, vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.039  0.046  0.055  0.071  0.050  0.042   0.041     0.769

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.055  0.062  0.066  0.071  0.062  0.051   0.047     0.737
0.5        0.046  0.128  0.229  0.293  0.220  0.180   0.144     0.452
0.75       0.063  0.400  0.696  0.739  0.630  0.580   0.605     0.271
1          0.062  0.505  0.764  0.783  0.761  0.727   0.830     0.261

Log-Normal, c
0.25       0.056  0.076  0.095  0.107  0.078  0.058   0.051     0.701
0.5        0.588  0.613  0.555  0.449  0.451  0.424   0.442     0.277
0.75       0.955  0.914  0.862  0.760  0.943  0.996   0.995     0.780
1          0.957  0.931  0.892  0.793  0.952  1.000   1.000     0.924

χ2(3), c
0.25       0.052  0.058  0.071  0.076  0.053  0.051   0.041     0.740
0.5        0.602  0.652  0.562  0.405  0.491  0.369   0.320     0.532
0.75       0.997  0.995  0.927  0.702  0.999  0.996   0.983     0.850
1          0.998  0.995  0.955  0.727  0.998  1.000   1.000     0.975
Table 4: Size and power for T = 250, mean plus vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.059  0.049  0.056  0.060  0.057  0.047   0.048     0.998

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.061  0.072  0.079  0.092  0.073  0.056   0.058     0.998
0.5        0.050  0.149  0.247  0.268  0.197  0.162   0.134     0.936
0.75       0.060  0.399  0.657  0.662  0.564  0.483   0.490     0.739
1          0.051  0.492  0.766  0.785  0.728  0.670   0.712     0.622

Log-Normal, c
0.25       0.092  0.119  0.117  0.097  0.102  0.067   0.057     1.000
0.5        0.673  0.659  0.586  0.428  0.508  0.432   0.413     0.994
0.75       0.959  0.931  0.883  0.777  0.920  0.998   0.996     0.972
1          0.951  0.928  0.884  0.791  0.929  1.000   1.000     0.959

χ2(3), c
0.25       0.093  0.090  0.090  0.084  0.093  0.079   0.064     1.000
0.5        0.691  0.687  0.529  0.340  0.540  0.403   0.355     1.000
0.75       0.995  0.992  0.939  0.689  0.998  0.995   0.987     1.000
1          0.997  0.998  0.967  0.754  1.000  1.000   1.000     1.000
Table 5: Size and power for T = 250, no shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.054  0.047  0.047  0.055  0.056  0.043   0.037     0.126
ARMA       0.032  0.092  0.122  0.135  0.105  0.110   0.148     0.091

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.047  0.052  0.043  0.037  0.043  0.043   0.036     0.079
0.5        0.052  0.072  0.124  0.160  0.116  0.098   0.074     0.040
0.75       0.071  0.337  0.593  0.637  0.546  0.477   0.537     0.165
1          0.082  0.471  0.740  0.768  0.710  0.690   0.806     0.214

Log-Normal, c
0.25       0.068  0.062  0.053  0.044  0.056  0.054   0.052     0.106
0.5        0.664  0.592  0.483  0.362  0.481  0.405   0.380     0.550
0.75       0.959  0.919  0.846  0.733  0.963  0.997   0.998     0.957
1          0.975  0.935  0.867  0.749  0.975  1.000   1.000     0.964

χ2(3), c
0.25       0.070  0.048  0.055  0.055  0.063  0.062   0.059     0.123
0.5        0.656  0.570  0.407  0.237  0.502  0.363   0.315     0.639
0.75       0.999  0.997  0.932  0.605  0.999  0.997   0.990     1.000
1          0.998  0.998  0.955  0.640  0.999  1.000   1.000     1.000
Table 6: Size and power for T = 250, mean shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.047  0.041  0.051  0.048  0.046  0.045   0.051     0.998

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.044  0.049  0.055  0.059  0.048  0.057   0.048     0.996
0.5        0.040  0.089  0.145  0.162  0.126  0.111   0.099     0.940
0.75       0.051  0.256  0.499  0.574  0.463  0.389   0.399     0.785
1          0.034  0.344  0.693  0.757  0.668  0.555   0.599     0.542

Log-Normal, c
0.25       0.078  0.065  0.063  0.061  0.070  0.073   0.063     0.991
0.5        0.476  0.507  0.439  0.366  0.347  0.306   0.291     0.918
0.75       0.954  0.920  0.867  0.802  0.905  0.990   0.980     0.701
1          0.959  0.937  0.900  0.821  0.926  1.000   1.000     0.503

χ2(3), c
0.25       0.058  0.051  0.054  0.058  0.068  0.069   0.054     1.000
0.5        0.489  0.476  0.337  0.210  0.368  0.270   0.224     1.000
0.75       0.991  0.981  0.886  0.613  0.988  0.969   0.939     0.964
1          0.997  0.998  0.944  0.666  1.000  0.999   0.995     0.825
Table 7: Size and power for T = 250, vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.041  0.053  0.067  0.070  0.053  0.043   0.039     0.757

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.049  0.066  0.089  0.094  0.057  0.049   0.041     0.727
0.5        0.040  0.114  0.231  0.278  0.205  0.163   0.140     0.429
0.75       0.059  0.371  0.656  0.724  0.636  0.583   0.601     0.304
1          0.065  0.545  0.787  0.815  0.750  0.742   0.811     0.252

Log-Normal, c
0.25       0.068  0.092  0.102  0.099  0.081  0.062   0.064     0.708
0.5        0.609  0.633  0.577  0.461  0.464  0.419   0.417     0.300
0.75       0.972  0.921  0.879  0.778  0.963  0.998   0.997     0.782
1          0.968  0.942  0.880  0.780  0.969  1.000   1.000     0.920

χ2(3), c
0.25       0.076  0.078  0.103  0.106  0.079  0.066   0.069     0.751
0.5        0.589  0.616  0.498  0.351  0.439  0.341   0.267     0.517
0.75       0.997  0.991  0.934  0.698  0.992  0.992   0.982     0.843
1          0.996  0.994  0.949  0.707  0.999  1.000   0.999     0.979
Table 8: Size and power for T = 250, mean and vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.037  0.047  0.065  0.065  0.049  0.048   0.039     0.997

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.047  0.058  0.073  0.079  0.060  0.050   0.042     0.997
0.5        0.044  0.138  0.246  0.285  0.203  0.155   0.126     0.941
0.75       0.055  0.359  0.624  0.672  0.576  0.470   0.479     0.732
1          0.058  0.476  0.768  0.795  0.729  0.660   0.720     0.614

Log-Normal, c
0.25       0.099  0.112  0.101  0.085  0.082  0.071   0.065     1.000
0.5        0.666  0.643  0.548  0.439  0.487  0.430   0.410     0.993
0.75       0.962  0.932  0.882  0.781  0.934  0.993   0.989     0.968
1          0.960  0.933  0.897  0.825  0.937  1.000   0.999     0.963

χ2(3), c
0.25       0.077  0.076  0.076  0.065  0.067  0.073   0.067     1.000
0.5        0.643  0.635  0.503  0.343  0.518  0.411   0.361     1.000
0.75       0.996  0.990  0.933  0.706  0.990  0.985   0.983     1.000
1          0.996  0.996  0.963  0.743  0.996  0.997   0.997     1.000
Table 9: Size and power for T = 1000, no shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.047  0.049  0.050  0.043  0.045  0.049   0.043     0.083
ARMA       0.046  0.056  0.054  0.045  0.044  0.046   0.038     0.083

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.049  0.061  0.057  0.057  0.054  0.037   0.037     0.064
0.5        0.050  0.368  0.612  0.677  0.499  0.382   0.325     0.200
0.75       0.052  0.879  0.928  0.927  0.918  0.913   0.991     0.386
1          0.063  0.914  0.934  0.925  0.940  0.965   1.000     0.427

Log-Normal, c
0.25       0.138  0.173  0.157  0.137  0.124  0.086   0.068     0.124
0.5        0.997  0.981  0.952  0.923  0.996  0.993   0.976     0.993
0.75       0.998  0.994  0.986  0.970  0.999  1.000   1.000     0.996
1          0.998  0.994  0.985  0.965  0.999  1.000   1.000     0.994

χ2(3), c
0.25       0.095  0.073  0.065  0.049  0.072  0.056   0.047     0.100
0.5        1.000  1.000  0.985  0.896  0.995  0.985   0.954     1.000
0.75       1.000  1.000  1.000  0.998  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
Table 10: Size and power for T = 1000, mean shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.042  0.047  0.050  0.050  0.053  0.046   0.036     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.048  0.048  0.053  0.054  0.052  0.046   0.048     0.995
0.5        0.046  0.323  0.589  0.682  0.540  0.386   0.336     0.952
0.75       0.045  0.852  0.931  0.930  0.929  0.931   0.985     0.730
1          0.036  0.929  0.950  0.945  0.953  0.949   0.999     0.353

Log-Normal, c
0.25       0.143  0.179  0.170  0.161  0.126  0.097   0.083     0.996
0.5        0.992  0.966  0.950  0.924  0.985  0.970   0.955     0.979
0.75       0.997  0.988  0.980  0.962  0.997  1.000   1.000     0.958
1          1.000  0.993  0.988  0.976  0.997  1.000   1.000     0.979

χ2(3), c
0.25       0.078  0.088  0.071  0.063  0.066  0.058   0.040     1.000
0.5        0.998  0.995  0.969  0.879  0.990  0.970   0.936     1.000
0.75       1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  0.999  1.000  1.000   1.000     0.987
Table 11: Size and power for T = 1000, vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.049  0.053  0.055  0.058  0.044  0.040   0.054     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.057  0.074  0.100  0.113  0.096  0.069   0.059     0.944
0.5        0.056  0.427  0.687  0.756  0.629  0.494   0.442     0.609
0.75       0.069  0.917  0.946  0.941  0.936  0.943   0.992     0.408
1          0.069  0.951  0.958  0.956  0.955  0.977   0.999     0.407

Log-Normal, c
0.25       0.165  0.256  0.280  0.247  0.177  0.129   0.090     0.910
0.5        0.992  0.980  0.962  0.936  0.991  0.982   0.976     0.749
0.75       0.999  0.990  0.980  0.962  0.999  1.000   1.000     0.984
1          0.995  0.990  0.977  0.958  0.993  1.000   1.000     0.990

χ2(3), c
0.25       0.069  0.115  0.131  0.128  0.086  0.073   0.057     0.999
0.5        0.997  0.999  0.990  0.951  0.995  0.986   0.959     0.896
0.75       1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
Table 12: Size and power for T = 1000, mean plus vola shift and s = 3/4

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.051  0.054  0.063  0.064  0.055  0.047   0.038     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.049  0.078  0.100  0.109  0.075  0.065   0.047     1.000
0.5        0.055  0.469  0.694  0.746  0.613  0.481   0.400     0.978
0.75       0.047  0.909  0.942  0.936  0.947  0.929   0.987     0.835
1          0.046  0.946  0.956  0.950  0.955  0.955   0.998     0.708

Log-Normal, c
0.25       0.238  0.301  0.282  0.241  0.217  0.157   0.122     1.000
0.5        0.996  0.986  0.968  0.944  0.992  0.983   0.972     1.000
0.75       0.994  0.992  0.986  0.967  0.994  1.000   1.000     0.996
1          0.993  0.991  0.988  0.972  0.993  1.000   1.000     0.993

χ2(3), c
0.25       0.121  0.139  0.123  0.117  0.099  0.082   0.056     1.000
0.5        1.000  0.998  0.994  0.942  0.997  0.989   0.969     1.000
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  0.999  1.000  1.000   1.000     1.000
Table 13: Size and power for T = 1000, no shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.050  0.043  0.045  0.049  0.055  0.049   0.044     0.083
ARMA       0.049  0.103  0.118  0.117  0.116  0.118   0.118     0.091

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.045  0.046  0.045  0.059  0.050  0.050   0.035     0.066
0.5        0.058  0.349  0.596  0.668  0.530  0.373   0.325     0.209
0.75       0.065  0.880  0.927  0.923  0.918  0.923   0.990     0.387
1          0.062  0.936  0.951  0.948  0.947  0.961   1.000     0.385

Log-Normal, c
0.25       0.159  0.165  0.152  0.117  0.125  0.106   0.077     0.127
0.5        0.996  0.973  0.957  0.927  0.994  0.989   0.976     0.997
0.75       1.000  0.992  0.983  0.967  1.000  1.000   1.000     0.995
1          0.998  0.995  0.982  0.960  0.999  1.000   1.000     0.994

χ2(3), c
0.25       0.080  0.057  0.048  0.038  0.067  0.067   0.047     0.101
0.5        0.999  0.997  0.984  0.899  0.994  0.987   0.950     0.998
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
Table 14: Size and power for T = 1000, mean shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.044  0.046  0.045  0.040  0.042  0.043   0.035     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.042  0.048  0.061  0.064  0.054  0.054   0.053     0.989
0.5        0.054  0.294  0.558  0.664  0.513  0.387   0.319     0.937
0.75       0.048  0.859  0.940  0.939  0.948  0.930   0.978     0.723
1          0.047  0.927  0.950  0.950  0.968  0.965   0.999     0.337

Log-Normal, c
0.25       0.150  0.172  0.152  0.123  0.118  0.081   0.077     0.999
0.5        0.997  0.981  0.966  0.939  0.990  0.960   0.951     0.984
0.75       0.999  0.994  0.986  0.967  0.999  1.000   1.000     0.971
1          0.998  0.993  0.985  0.974  0.999  1.000   1.000     0.978

χ2(3), c
0.25       0.078  0.080  0.067  0.057  0.070  0.044   0.038     1.000
0.5        0.992  0.993  0.969  0.879  0.981  0.945   0.898     1.000
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     0.990
Table 15: Size and power for T = 1000, vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.046  0.044  0.061  0.073  0.059  0.049   0.041     0.997

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.059  0.081  0.113  0.122  0.088  0.073   0.053     0.944
0.5        0.048  0.449  0.731  0.785  0.673  0.536   0.445     0.601
0.75       0.063  0.899  0.937  0.932  0.931  0.942   0.991     0.414
1          0.077  0.944  0.957  0.955  0.959  0.959   0.999     0.429

Log-Normal, c
0.25       0.167  0.241  0.258  0.247  0.176  0.106   0.083     0.905
0.5        0.998  0.986  0.965  0.937  0.993  0.984   0.975     0.757
0.75       1.000  0.992  0.984  0.966  0.999  1.000   1.000     0.986
1          0.998  0.992  0.982  0.954  0.998  1.000   1.000     0.985

χ2(3), c
0.25       0.083  0.129  0.122  0.124  0.095  0.074   0.055     0.999
0.5        0.998  0.998  0.993  0.949  0.992  0.969   0.956     0.927
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
Table 16: Size and power for T = 1000, mean and vola shift and s = 3/5

Normal     t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
IID        0.048  0.057  0.075  0.072  0.052  0.045   0.042     1.000

t(3), c    t1     t2     t3     t4     W1,2   W1,2,3  W1,2,3,4  BN
0.25       0.042  0.084  0.111  0.123  0.083  0.068   0.053     1.000
0.5        0.047  0.443  0.685  0.742  0.623  0.491   0.401     0.984
0.75       0.052  0.903  0.944  0.938  0.942  0.931   0.980     0.813
1          0.042  0.946  0.957  0.956  0.955  0.955   1.000     0.700

Log-Normal, c
0.25       0.226  0.291  0.289  0.239  0.193  0.143   0.115     1.000
0.5        0.995  0.975  0.964  0.930  0.992  0.983   0.977     1.000
0.75       0.999  0.993  0.984  0.971  1.000  1.000   1.000     0.996
1          0.995  0.989  0.978  0.966  0.995  1.000   1.000     0.994

χ2(3), c
0.25       0.131  0.157  0.135  0.099  0.106  0.077   0.067     1.000
0.5        0.994  0.994  0.988  0.942  0.988  0.978   0.956     1.000
0.75       1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
1          1.000  1.000  1.000  1.000  1.000  1.000   1.000     1.000
6 Distributional assumptions in a stochastic volatility model

As an empirical application, we test a distributional implication in Stochastic Volatility (SV) models. The basic SV model with a normality assumption reads
$$y_t = \exp(h_t/2)\,\xi_t$$
for the returns $y_t$, with $\xi_t \stackrel{iid}{\sim} N(0,1)$ and the first-order autoregressive log-volatility process $h_t$:
$$h_t = \alpha + \beta h_{t-1} + \varepsilon_t,$$
with $\varepsilon_t \stackrel{iid}{\sim} N(0,\sigma^2)$. Hence, apart from a constant term, the log-squared returns can be expressed as
$$x_t = h_t + \log(\xi_t^2),$$
where $x_t = \log(y_t^2)$. A direct implication is that the logarithm of squared returns follows a $\log\chi^2(\nu)$-distribution. Its density is given as
$$f(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)} \exp\left(\frac{1}{2}\nu x - \frac{1}{2}\exp(x)\right),$$
where $\nu = 1$. For simplicity, maximum likelihood estimation procedures for such models often rely either on a normality assumption or on this particular $\log\chi^2(1)$ distribution. We use the newly proposed test statistics to test these assumptions on $x_t$ via the locally standardized probability integral transformation. Corresponding values for $\vartheta_k$ and $\varpi_k$ are simulated accordingly.
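The decomposition of the log-squared returns and the log-$\chi^2(1)$ implication can be checked numerically with a short simulation (a minimal sketch; the parameter values for $\alpha$, $\beta$ and $\sigma$ are hypothetical and not taken from the paper):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
T = 100_000
alpha, beta, sigma = -0.7, 0.95, 0.3   # hypothetical SV parameters

# log-volatility: h_t = alpha + beta * h_{t-1} + eps_t
h = np.empty(T)
h[0] = alpha / (1 - beta)              # start at the unconditional mean
eps = rng.normal(0.0, sigma, T)
for t in range(1, T):
    h[t] = alpha + beta * h[t - 1] + eps[t]

xi = rng.normal(0.0, 1.0, T)           # iid N(0,1) innovations
y = np.exp(h / 2) * xi                 # returns
x = np.log(y**2)                       # log squared returns

# the decomposition x_t = h_t + log(xi_t^2) holds exactly
assert np.allclose(x, h + np.log(xi**2))

# log(xi^2) is log-chi^2(1): its PIT via the chi^2(1) cdf of e^x is uniform
p = chi2.cdf(xi**2, df=1)
print(abs(p.mean() - 0.5) < 0.01)
```

The probability integral transform of a log-$\chi^2(1)$ variate is obtained by applying the $\chi^2(1)$ cdf to the exponentiated value, which is what the last step exploits.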
Daily data is taken from the Realized Library for three markets on three different continents, namely the DJIA, FTSE and Nikkei. The sample spans February 2, 2000 to November 27, 2018 and contains T = 4602 observations. A very few observations with returns amounting to exactly zero are removed from the data sets. The nominal significance level is 5%. The fixed-bandwidth parameter is set equal to b = 0.1, as the Monte Carlo simulation results suggest. A Bartlett kernel is employed for the fixed-b covariance matrix estimator. Regarding the nonparametric regression estimators for the mean and the variance, we employ the Nadaraya-Watson estimator with a Gaussian kernel. The bandwidth is selected via down-scaled (s = 3/4) cross-validation. Results for the log-χ2(1) distribution are reported in Table 17. (Results for the normal distribution are unreported, but clear-cut: all test statistics reject the null.)

Table 17: Empirical results for the log-χ2(1) distribution, rejections at the 5% level in bold face

       DJIA    FTSE    NIKKEI
m1     0.525   0.527   0.526
m2     0.352   0.352   0.352
m3     0.257   0.256   0.256
m4     0.197   0.195   0.195
t1    -1.959  -1.814  -1.197
t2     1.201   0.737   1.301
t3     2.790   2.196   2.679
t4     3.753   3.014   3.300

Figure 1: DJIA log squared returns. Top left: data plot with fitted nonparametric mean function; top right: kernel density plot; bottom left: locally standardized data; bottom right: histogram of the PIT.
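To fix ideas, the local standardization and the computation of raw PIT moments can be sketched as follows (a simplified illustration, not the authors' code: `nw_smooth` and `pit_raw_moments` are hypothetical helper names, and the standardized null cdf uses the log-χ2(1) mean ψ(1/2) + log 2 and variance π²/2):

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import chi2

def nw_smooth(y, bw):
    # Nadaraya-Watson regression of y on rescaled time t/T, Gaussian kernel
    T = len(y)
    u = np.arange(T) / T
    w = np.exp(-0.5 * ((u[:, None] - u[None, :]) / bw) ** 2)
    return (w @ y) / w.sum(axis=1)

def pit_raw_moments(x, bw=0.1, kmax=4):
    # locally standardize, then PIT under the standardized log-chi^2(1) null
    mu = nw_smooth(x, bw)
    sig = np.sqrt(nw_smooth((x - mu) ** 2, bw))
    z = (x - mu) / sig
    m = digamma(0.5) + np.log(2.0)          # mean of log-chi^2(1)
    s = np.sqrt(np.pi**2 / 2.0)             # its standard deviation
    p = chi2.cdf(np.exp(m + s * z), df=1)   # F(x) = F_{chi^2(1)}(e^x)
    return [np.mean(p**k) for k in range(1, kmax + 1)]

# under the null, the k-th raw moment should be close to 1/(k+1)
x = np.log(chi2.rvs(df=1, size=2000, random_state=42))
moms = pit_raw_moments(x)
print([round(mk, 2) for mk in moms])
```

The reported statistics then compare these raw moments with their theoretical values 1/(k+1), scaled by a fixed-b long-run variance estimate; that final studentization step is omitted in this sketch.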
As we observe, the assumption that log squared returns follow a log-chi-squared distribution with one degree of freedom has to be rejected at the five percent level by most of the tests. Even though the deviations of the raw moments of the PITs from their theoretical values under the null might be relatively small, the corresponding statistics indicate their significance in most of the cases. The apparent deviation opens room for improvement in the estimation of SV models, where efficiency gains in finite samples are to be expected from exploiting more reasonable distributional assumptions. In short, the normality assumptions on ξt or on xt directly are highly questionable.
Figures 1-3 display the tested series together with the nonparametric fitted mean (top left), the kernel density estimator (top right), the locally standardized series zt (bottom left) and the histogram of the PIT (bottom right). Clearly, there are important variations in the mean to be captured by the Nadaraya-Watson estimator for all the series. Moreover, the histograms suggest some degree of non-uniformity, which is detected by the newly proposed statistics.
Figure 2: FTSE log squared returns. Top left: data plot with fitted nonparametric mean function; top right: kernel density plot; bottom left: locally standardized data; bottom right: histogram of the PIT.
Figure 3: NIKKEI log squared returns. Top left: data plot with fitted nonparametric mean function; top right: kernel density plot; bottom left: locally standardized data; bottom right: histogram of the PIT.
7 Concluding remarks

This work considers the long-standing issue of testing distributional assumptions. The newly proposed tests are based on raw moment conditions for probability integral transformations. By doing so, we are able to construct tests which are more sensitive to deviations from theoretical values under the null. The framework we provide makes use of the so-called fixed-bandwidth approach for the estimation of long-run covariance matrices of different raw moments. As a result, the empirical size is well controlled even in small samples under different types of autocorrelation. Time-varying unconditional means and variances are found in many economic series. In order to cope with this typical empirical feature, our framework also allows for nonparametric time-varying mean and variance estimation. As both the mean and the variance function of the time series are estimated, we provide a necessary correction which amounts to a modified long-run variance estimation. Our simulation study demonstrates that the suggested tests perform very well in finite samples. In an empirical application to log squared returns from three major stock indices, we study the merits and limitations of the robust raw moment-based statistics.
Appendix

To simplify notation in the proofs, we let w.l.o.g. $\tau_\mu = \tau_\sigma = \tau$.

A Preliminary results

Lemma 2 Let $g, h$ be two functions such that $\mathrm{E}(g(z_t)) = \mathrm{E}(h(z_t)) = 0$ and $g(x) = O(x^2) = h(x)$ as $x \to \pm\infty$. Under the assumptions of Lemma 1, we have that

1. $\sup_{t=1,\dots,T} \left| \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right| = O_p\left(T^{-\delta}\right)$ for some $0 < \delta < 1/8$ and any bounded sequence $w_j$;

2. $\sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| = O_p\left(T^{-\delta}\right)$ and $\sup_{t=1,\dots,T} |\hat\sigma_t - \sigma_t| = O_p\left(T^{-\delta}\right)$ for some $0 < \delta < 1/8$;

3. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} h(z_j) \right) = o_p(1)$;

4. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = o_p(1)$;

5. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} (\hat z_t - z_t)^2 = o_p(1)$;

6. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \varphi(z_t)(\hat z_t - z_t) + \varphi'(\xi_t)(\hat z_t - z_t)^2 \right)^2 = o_p(1)$, where $\xi_t$ lies between $\hat z_t$ and $z_t$ for any $1 \le t \le T$;

7. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j}{\hat\sigma_t} z_j \right) = \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t + o_p(1)$;

8. $\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\, z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = -\frac{1}{2}\, \mathrm{E}\left(p_t^{k-1} z_t \varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left(z_t^2 - 1\right) + o_p(1)$,

where the $o_p(1)$ terms are uniform in $s \in [0,1]$.
B Proofs

Proof of item 1

We first show that $\frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j)$ is uniformly $L_4$-bounded in $t=1,\dots,T$. We have
$$\left\| \frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right\|_4^4 = \mathrm{E}\left( \frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right)^4 = \frac{1}{\tau^2} \sum_{j_1=t-\tau}^{t+\tau} \sum_{j_2=t-\tau}^{t+\tau} \sum_{j_3=t-\tau}^{t+\tau} \sum_{j_4=t-\tau}^{t+\tau} w_{j_1} w_{j_2} w_{j_3} w_{j_4}\, \mathrm{E}\left( g(z_{j_1}) g(z_{j_2}) g(z_{j_3}) g(z_{j_4}) \right).$$
Now, an upper bound is given by
$$\left\| \frac{1}{\sqrt{\tau}} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right\|_4^4 \le \frac{C}{\tau^2} \sum_{j_1=t-\tau}^{t+\tau} \sum_{j_2=t-\tau}^{t+\tau} \sum_{j_3=t-\tau}^{t+\tau} \sum_{j_4=t-\tau}^{t+\tau} \mathrm{E}\left( z_{j_1}^2 z_{j_2}^2 z_{j_3}^2 z_{j_4}^2 \right),$$
where the absolute summability of the 8th-order cumulants of $z_t$ leads with standard arguments to the finiteness of this upper bound.
Then, the maximum over $T$ elements of a positive, uniformly $L_4$-bounded sequence is known to be $O_p\left(T^{1/4}\right)$, so
$$\sup_{t=1,\dots,T} \left| \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} w_j g(z_j) \right| = O_p\left( \frac{\sqrt[4]{T}}{\sqrt{\tau}} \right),$$
from which the desired result follows given the rate restrictions on $\tau$.
Proof of item 2

Let us examine the properties of $\hat\mu_t$ first. We have that
$$\hat\mu_t - \mu_t = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} x_j - \mu_t = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_j - \mu_t) + \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j.$$
Thanks to the Lipschitz property of $\mu(\cdot)$, the first summand on the r.h.s. is $O\left(\frac{\tau}{T}\right)$ uniformly in $t=1,\dots,T$. Item 1 can be used to derive the uniform behavior of the second summand, such that $\sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| = O_p\left(T^{-\delta}\right)$ for some $0<\delta<1/8$ as required. The local variance estimator is given by
$$\hat\sigma_t^2 = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (x_j - \hat\mu_j)^2 = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j z_j - (\hat\mu_j - \mu_j) \right)^2,$$
so
$$\hat\sigma_t^2 - \sigma_t^2 = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j^2 z_j^2 - \sigma_t^2 \right) - \frac{2}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\hat\mu_j - \mu_j) + \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\hat\mu_j - \mu_j)^2.$$
Now,
$$\sup_{t=1,\dots,T} \left| \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\hat\mu_j - \mu_j) \right| \le \sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| \sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} |\sigma_j z_j|,$$
where
$$0 \le \sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} |\sigma_j z_j| \le \mathrm{E}(|z_t|) \sup_{t=1,\dots,T} \sigma_t + \sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j \left( |z_j| - \mathrm{E}(|z_j|) \right) = O_p(1)$$
with the same arguments used in the proof of item 1. Furthermore, for all $1 \le t \le T$,
$$\frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\hat\mu_j - \mu_j)^2 \le \left( \sup_{t=1,\dots,T} |\hat\mu_t - \mu_t| \right)^2 = o_p(1),$$
so, after using item 1 again to conclude that $\sup_{t=1,\dots,T} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j^2 \left( z_j^2 - 1 \right) = O_p\left(T^{-\delta}\right)$ for some $0<\delta<1/8$, we have that
$$\frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j^2 z_j^2 - \sigma_t^2 \right) = \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j^2 \left( z_j^2 - 1 \right) + \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \sigma_j^2 - \sigma_t^2 \right) = O_p\left(T^{-\delta}\right) + O\left( \frac{\tau}{T} \right)$$
uniformly in $t=1,\dots,T$, and thus $\sup_{t=1,\dots,T} \left| \hat\sigma_t^2 - \sigma_t^2 \right| = O_p\left(T^{-\delta}\right)$ as well.
Note that uniform consistency of $\hat\sigma_t$ implies, thanks to the properties of $\sigma_t$, $\sup_{t=1,\dots,T} \hat\sigma_t = O_p(1)$ and $\sup_{t=1,\dots,T} \hat\sigma_t^{-1} = O_p(1)$.
Proof of item 3

Split the sample in $B$ disjoint blocks of length $M$ and assume that $T = MB$ and $[sT] = M[sB]$ for the sake of the exposition. Then
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} h(z_j) \right) = \frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \left( \sum_{j=M(b-1)+m-\tau}^{M(b-1)+m+\tau} h(z_j) - \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right) + \frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j).$$
The first summand on the r.h.s. is easily shown to be $O_p\left( \frac{\sqrt{TM}}{\tau} \sup_{t=1,\dots,T} |h(z_t)| \right)$. For the second, note that
$$\left| \frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right| \le \frac{1}{\sqrt{T}} \sum_{b=1}^{B} \left| \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right) \right|.$$
The expectation of the r.h.s. is bounded via the Cauchy-Schwarz inequality by
$$\frac{1}{\sqrt{T}} \sum_{b=1}^{B} \mathrm{E}\left| \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right) \right| \le \frac{1}{\sqrt{T}} \sum_{b=1}^{B} \sqrt{ \mathrm{E}\left( \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right)^2 \mathrm{E}\left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right)^2 },$$
where
$$\mathrm{E}\left( \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) \right)^2 = O\left( \frac{1}{\tau} \right) \quad \text{and} \quad \mathrm{E}\left( \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \right)^2 = O(M).$$
Hence
$$\frac{1}{\sqrt{T}} \sum_{b=1}^{[sB]} \sum_{m=1}^{M} g\left(z_{M(b-1)+m}\right) \frac{1}{2\tau+1} \sum_{j=M(b-1)-\tau}^{M(b-1)+\tau} h(z_j) = O_p\left( \frac{B\sqrt{M}}{\sqrt{\tau T}} \right) = O_p\left( \sqrt{\frac{B}{\tau}} \right)$$
and
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} h(z_j) \right) = O_p\left( \max\left\{ \frac{\sqrt{M}}{\tau} T^{1/2+\gamma};\ \sqrt{\frac{B}{\tau}} \right\} \right),$$
so choosing $B$ appropriately leads to the desired result.
Proof of item 4

Use a Taylor series expansion for $x^{-1/2}$ about $x_0 = 1$ with rest term in differential form to obtain
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = -\frac{1}{2} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right) + \frac{3}{8} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t)\, \xi_t^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = A_{1T} + A_{2T}$$
with $\xi_t$ between $\frac{\hat\sigma_t^2}{\sigma_t^2}$ and unity for all $t=1,\dots,T$. Now, for $A_{1T}$, write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{1}{\sigma_t^2} \left( \sigma_j z_j + (\mu_j - \hat\mu_j) \right)^2 - 1 \right)$$
$$= \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j^2 z_j^2 - \sigma_t^2}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{(\mu_j - \hat\mu_j)^2}{\sigma_t^2} + \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\mu_j - \hat\mu_j) = B_{1T} + B_{2T} + B_{3T}.$$
For $B_{1T}$, we have
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j^2 z_j^2 - \sigma_t^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( z_j^2 - 1 \right) + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right)}{\sigma_t^2} z_j^2,$$
where the first summand on the r.h.s. vanishes thanks to item 3, while for the second we employ a Taylor series approximation of $\sigma^2(\cdot)$ about $t/T$ to obtain
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right) z_j^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{z_j^2 - 1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial^2 \sigma^2}{\partial s^2} \right|_{s=\xi_{t,j}} \frac{(j-t)^2}{T^2} \frac{z_j^2}{\sigma_t^2} = C_{1T} + C_{2T} + C_{3T}$$
for suitable $\xi_{t,j}$ between $t/T$ and $j/T$. Here, $C_{1T}$ vanishes along the lines of item 3 by noting that deterministic weights don't affect the result, $C_{2T} = 0$, and
$$|C_{3T}| \le \frac{sC\tau^2}{T\sqrt{T}} \sup_{t=1,\dots,T} |g(z_t)| \sup_{t=1,\dots,T} z_t^2;$$
this is seen to vanish too, uniformly in $s \in [0,1]$, since, given the finiteness of moments of any order of $z_t$ and thus of $z_t^2$ and $g(z_t)$, we have $\sup_t |g(z_t)| = O_p(T^\gamma) = \sup_{t=1,\dots,T} z_t^2$ for any $\gamma > 0$, and $\gamma$ can then be chosen arbitrarily close to 0 to make the r.h.s. $o_p(1)$.
For $B_{2T}$, we have
$$\left| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} g(z_t) \frac{1}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_j - \hat\mu_j)^2 \right| \le C \sup_t |g(z_t)| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_j - \hat\mu_j)^2 \le C \sup_t |g(z_t)| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\hat\mu_t - \mu_t)^2 + o_p(1)$$
with $|g(z_t)| = O_p(T^\gamma)$ for any $\gamma > 0$. We show in the following that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\hat\mu_t - \mu_t)^2 = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( (\mu_t - \mu_j) - \sigma_j z_j \right) \right)^2 = O_p\left(T^{-\delta}\right) \qquad (10)$$
for some $0 < \delta < \min\{\kappa_1 - 1/2;\ 3/4 - \kappa_2\}$, and simply pick $\gamma < \delta$ for our purposes. With the help of the Cauchy-Schwarz inequality, the term is easily seen to vanish when the terms $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_t - \mu_j) \right)^2$ and $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \right)^2$ both vanish themselves. This is indeed the case under our rate restrictions considering that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_t - \mu_j) \right)^2 = O\left( \frac{\sqrt{T}\tau^2}{T^2} \right)$$
and
$$\mathrm{E}\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \right)^2 \right) \le \sqrt{T}\, \mathrm{E}\left( \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \right)^2 = C \frac{\sqrt{T}}{\tau}$$
thanks to the uniform $L_4$-boundedness of normalized running averages of $z_t$, see the proof of item 1; thus, $B_{2T}$ vanishes at the required rate.
Moving on, we have
$$B_{3T} = \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \left( \mu_j - \frac{1}{2\tau+1} \sum_{k=j-\tau}^{j+\tau} \sigma_k z_k - \frac{1}{2\tau+1} \sum_{k=j-\tau}^{j+\tau} \mu_k \right)$$
$$= -\frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{\sigma_t^2} \frac{1}{(2\tau+1)^2} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \sum_{k=j-\tau}^{j+\tau} (\mu_k - \mu_j) - \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{g(z_t)}{\sigma_t^2} \frac{1}{(2\tau+1)^2} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j \sum_{k=j-\tau}^{j+\tau} \sigma_k z_k,$$
where the first summand on the r.h.s. is immediately shown to vanish thanks to item 3 after noting that deterministic weights of uniform order $O(\tau/T)$ do not affect the arguments there. A tedious, yet straightforward application of the blocking arguments from the proof of item 3 shows the second summand to vanish in probability as well.
Summing up, $\sup_{s\in[0,1]} |A_{1T}| \stackrel{p}{\to} 0$; to complete the result, note that
$$0 \le \sup_{s\in[0,1]} |A_{2T}| \le C \sup_{t=1,\dots,T} \left| \xi_t^{-5/2} \right| \sup_{t=1,\dots,T} |g(z_t)| \sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2,$$
where the r.h.s., and thus $A_{2T}$, vanishes since $\sup_{t=1,\dots,T} \left| \xi_t^{-5/2} \right| = O_p(1)$, $\sup_{t=1,\dots,T} |g(z_t)| = O_p(T^\gamma)$ for any positive $\gamma$ and $\sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = O_p\left(T^{-\delta}\right)$, analogously to Equation (10), so the result follows after choosing $\gamma < \delta$.
Proof of item 5

We have that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right)^2 = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) + \frac{\mu_t - \hat\mu_t}{\hat\sigma_t} \right)^2$$
$$= \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\hat\sigma_t^2} (\mu_t - \hat\mu_t)^2 + \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) \frac{1}{\hat\sigma_t} (\mu_t - \hat\mu_t).$$
Noting that $\sup_{t=1,\dots,T} \hat\sigma_t^{-1}$ is bounded in probability, an application of the Cauchy-Schwarz inequality for the third summand on the r.h.s. shows that the result follows when the two terms $\sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2$ and $\sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\hat\sigma_t^2} (\mu_t - \hat\mu_t)^2$ vanish in probability. To show this, we have like in the proof of item 4 that
$$\sup_{s\in[0,1]} \left| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 \right| \le \sup_{t=1,\dots,T} z_t^2 \sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 = \sup_{t=1,\dots,T} z_t^2 \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 = o_p(1)$$
since $\sup_{t=1,\dots,T} z_t^2 = O_p(T^\gamma)$ and $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right)^2 = O_p\left(T^{-\delta}\right)$, where $\gamma < \delta$ may be picked, and similarly
$$0 \le \sup_{s\in[0,1]} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\hat\sigma_t^2} (\mu_t - \hat\mu_t)^2 \le \sup_{t=1,\dots,T} \frac{1}{\hat\sigma_t^2} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\mu_t - \hat\mu_t)^2 = o_p(1),$$
since $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} (\mu_t - \hat\mu_t)^2 = o_p(1)$, again like in the proof of item 4.
Proof of item 6

Note that
$$r_t = \left( \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right) \left( \varphi(z_t) + \varphi'(\xi_t) \left( \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right) \right),$$
where $\varphi$ and $\varphi'$ are bounded. The result follows with item 5 if $\sup_t \left| \frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t \right| = O_p(1)$. This is indeed the case, since
$$\frac{x_t - \hat\mu_t}{\hat\sigma_t} - z_t = \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) z_t + \frac{\mu_t - \hat\mu_t}{\hat\sigma_t},$$
where $\hat\mu_t$ and $\hat\sigma_t$ converge uniformly at some rate $O_p\left(T^{-\delta}\right)$, see item 2, and $\sup_t |z_t| = O_p(T^\gamma)$ for any $\gamma > 0$, such that choosing $\gamma < \delta$ leads to the desired result.
Proof of item 7

Begin by writing
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j}{\hat\sigma_t} z_j = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\sigma_j}{\hat\sigma_t} - 1 \right) z_j + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} z_j = A_{1T} + A_{2T}.$$
We now show the first summand to vanish and resort to this end to the Taylor series approximation of $x^{-1/2}$ employed in the proof of item 4 to obtain analogously
$$A_{1T} = -\frac{1}{2} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j + \frac{3}{8} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \xi_{t,j}^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2 z_j,$$
where $\xi_{t,j}$ lies between $\frac{\hat\sigma_t^2}{\sigma_j^2}$ and unity for all $t=1,\dots,T$, being hence uniformly bounded. The first summand of $A_{1T}$ can be shown to be negligible by writing
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( p_t^{k-1}\varphi(z_t) - \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \right) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j + \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right) z_j,$$
and noting that arguments analogous to those in the proof of item 3 apply.
For the second summand of $A_{1T}$, with $\varphi(\cdot)$ being bounded on $\mathbb{R}$, we have
$$0 \le \sup_{s\in[0,1]} \left| \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \xi_{t,j}^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2 z_j \right| \le \max_{x\in\mathbb{R}} \varphi(x) \sup_t |z_t| \sup_{t,j} \left( \xi_{t,j}^{-5/2} \right) \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2,$$
with $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left( \frac{\hat\sigma_t^2}{\sigma_j^2} - 1 \right)^2$ vanishing like in the proof of item 4 and $\sup_t |z_t| = O_p(T^\gamma)$ for positive $\gamma$ arbitrarily close to zero.
To complete the result, write
$$A_{2T} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( p_t^{k-1}\varphi(z_t) - \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \right) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} z_j + \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} z_j,$$
where the first summand on the r.h.s. vanishes thanks to item 3, while the second delivers the desired approximation upon re-arranging its sum elements.
Proof of item 8

Write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\, z_t \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( p_t^{k-1}\varphi(z_t) z_t - \mathrm{E}\left(p_t^{k-1}\varphi(z_t) z_t\right) \right) \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) + \mathrm{E}\left(p_t^{k-1}\varphi(z_t) z_t\right) \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right).$$
The first summand on the r.h.s. vanishes, see item 4, and, with the same Taylor series expansion of $x^{-1/2}$ employed there, we have for the second summand that
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\sigma_t}{\hat\sigma_t} - 1 \right) = -\frac{1}{2} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right) + \frac{3}{8} \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \xi_t^{-5/2} \left( \frac{\hat\sigma_t^2}{\sigma_t^2} - 1 \right)^2 = A_{1T} + A_{2T},$$
with $\xi_t$ lying between $\frac{\hat\sigma_t^2}{\sigma_t^2}$ and unity for all $t=1,\dots,T$. The arguments in the proof of item 4 apply directly, with the exception of the analogues of $B_{1T}$ and $B_{3T}$. For the analogue of $B_{1T}$ from the proof of item 4 we write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\sigma_j^2 z_j^2 - \sigma_t^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left( z_j^2 - 1 \right) + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right) z_j^2}{\sigma_t^2},$$
where the summands of the first term on the r.h.s. are re-arranged to give the desired approximation, and the second term is given, similarly to the proof of item 4, by
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{\left( \sigma_j^2 - \sigma_t^2 \right) z_j^2}{\sigma_t^2} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{z_j^2 - 1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial \sigma^2}{\partial s} \right|_{s=\frac{t}{T}} \frac{j-t}{T} \frac{1}{\sigma_t^2} + \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \left. \frac{\partial^2 \sigma^2}{\partial s^2} \right|_{s=\xi_{t,j}} \frac{(j-t)^2}{T^2} \frac{z_j^2}{\sigma_t^2} = C_{1T} + C_{2T} + C_{3T}$$
for suitable $\xi_{t,j}$ between $t/T$ and $j/T$. To analyze $C_{1T}$, re-arrange sum terms to obtain
$$C_{1T} = \frac{C}{\sqrt{T}} \frac{1}{(2\tau+1)T} \sum_{t=1}^{[sT]} \left( z_t^2 - 1 \right) \left( \tau(\tau+1) + O_p\left(\tau^2\right) \right) = o_p(1)$$
uniformly in $s \in [0,1]$,
$$C_{2T} = 0,$$
and, for all $s \in [0,1]$,
$$0 \le C_{3T} \le \frac{C\tau^2}{T^2\sqrt{T}} \sum_{t=1}^{[sT]} z_t^2 \le \frac{C\tau^2}{T^2\sqrt{T}} \sum_{t=1}^{T} z_t^2 = O_p\left( \frac{\tau^2}{T\sqrt{T}} \right) = o_p(1).$$
For the analogue of $B_{3T}$ from the proof of item 4, we re-arrange sum terms to obtain
$$\frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \frac{1}{\sigma_t^2} \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j (\mu_j - \hat\mu_j) = \frac{2}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t (\mu_t - \hat\mu_t) \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \frac{1}{\sigma_j^2} + o_p(1).$$
To complete the result, write
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t (\mu_t - \hat\mu_t) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} (\mu_t - \mu_j) - \frac{1}{\sqrt{T}} \sum_{t=1}^{[sT]} \sigma_t z_t \frac{1}{2\tau+1} \sum_{j=t-\tau}^{t+\tau} \sigma_j z_j,$$
where both summands on the r.h.s. can be shown to vanish uniformly in $s$ using e.g. item 3.
Proof of Lemma 1
We let w.l.o.g. $\tau_\mu=\tau_\sigma=\tau$. Write with a Taylor expansion
\[
\hat p_t = p_t + \varphi(z_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right) + \varphi'(\xi_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right)^2 = p_t + r_t,
\]
where $\xi_t$ lies between $\frac{x_t-\hat\mu_t}{\hat\sigma_t}=\hat z_t$ and $\frac{x_t-\mu_t}{\sigma_t}=z_t$; note that $\varphi'(\cdot)$ is bounded on $\mathbb{R}$. Then,
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t^k-\frac{1}{k+1}\right) + \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1} r_t + \frac{k(k-1)}{2\sqrt{T}}\sum_{t=1}^{[sT]}\bar p_t^{k-2} r_t^2,
\]
where $\bar p_t$ lies between $p_t$ and $\hat p_t$. Since $\bar p_t\in[0,1]$ $\forall t$, like $p_t$ and $\hat p_t$, we have that
\[
0 \le \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\bar p_t^{k-2} r_t^2 \le \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} r_t^2 \overset{p}{\to} 0
\]
uniformly in $s$, thanks to Lemma 2 item 6.
We may then focus on
\[
\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1} r_t = \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right) + \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi'(\xi_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right)^2,
\]
where the second summand vanishes uniformly in $s$ since
\[
\left|\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi'(\xi_t)\left(\frac{x_t-\hat\mu_t}{\hat\sigma_t}-z_t\right)^2\right| \le \frac{C}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat z_t-z_t\right)^2
\]
due to the boundedness of $\varphi'$ and $p_t$, and Lemma 2 item 5 applies. Now,
\[
\hat z_t - z_t = \frac{\sigma_t z_t+\mu_t-\hat\mu_t}{\hat\sigma_t} - z_t = z_t\left(\frac{\sigma_t}{\hat\sigma_t}-1\right) - \frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\frac{\sigma_j}{\hat\sigma_t}\, z_j + \frac{1}{\hat\sigma_t}\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left(\mu_t-\mu_j\right),
\]
such that the leading term of $\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1} r_t$ is given by
\[
\frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\left(\hat z_t-z_t\right) = \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\, z_t\left(\frac{\sigma_t}{\hat\sigma_t}-1\right) - \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\frac{\sigma_j}{\hat\sigma_t}\, z_j + \frac{k}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{p_t^{k-1}\varphi(z_t)}{\hat\sigma_t}\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left(\mu_t-\mu_j\right),
\]
where
\[
\left|\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{1}{\hat\sigma_t}\, p_t^{k-1}\varphi(z_t)\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left(\mu_t-\mu_j\right)\right| \le \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{1}{\hat\sigma_t}\, p_t^{k-1}\varphi(z_t)\,\frac{1}{2\tau+1}\sum_{j=t-\tau}^{t+\tau}\left|\mu_t-\mu_j\right| = O_p\!\left(\frac{\tau}{\sqrt{T}}\right),
\]
with $p_t$ and $\varphi(z_t)$ being bounded and positive, and $\sup_t \hat\sigma_t^{-1}$ bounded in probability and non-negative.
Using Lemma 2 again, items 7 and 8, we obtain uniformly in s ∈ [0, 1] that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t^k-\frac{1}{k+1}\right) - k\,\mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]} z_t - \frac{k}{2}\,\mathrm{E}\left(p_t^{k-1} z_t\varphi(z_t)\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(z_t^2-1\right) + o_p(1),
\]
as required for the result, which follows with a multivariate invariance principle for strongly mixing sequences (see e.g. Davidson, 1994, Chapter 29).
Proof of Proposition 1
To simplify notation we provide the arguments for $t_k$ only; the extension for $K>1$ is trivial. The arguments in the proof of Theorem 2 in Kiefer and Vogelsang (2005) indicate that
\[
t_k = \frac{\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\hat p_t^k-\frac{1}{k+1}\right)}{\sqrt{-\frac{1}{T^2}\sum_{i=1}^{T-1}\sum_{j=1}^{T-1}\frac{T^2}{B^2}\,\kappa''\!\left(\frac{i-j}{B}\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{i}\left(\hat p_t^k-\overline{\hat p^k}\right)\frac{1}{\sqrt{T}}\sum_{t=1}^{j}\left(\hat p_t^k-\overline{\hat p^k}\right)}} + o_p(1)
\]
for kernels with smooth derivatives, or
\[
t_k = \frac{\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\hat p_t^k-\frac{1}{k+1}\right)}{\sqrt{\frac{2}{bT}\sum_{i=1}^{T}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{i}\left(\hat p_t^k-\overline{\hat p^k}\right)\right)^2-\frac{2}{bT}\sum_{i=1}^{[(1-b)T]}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{i}\left(\hat p_t^k-\overline{\hat p^k}\right)\right)\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{i+[bT]}\left(\hat p_t^k-\overline{\hat p^k}\right)\right)}} + o_p(1)
\]
for the Bartlett kernel. From Lemma 1, we know that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1}W_1(s) - \frac{k}{2}\varpi_{k-1}W_2(s),
\]
where the process on the r.h.s. is a Brownian motion, say $\tilde B(s)$, with variance
\[
\omega_k^2 = \left(e_k',\,-k\vartheta_{k-1},\,-\frac{k}{2}\varpi_{k-1}\right)\Xi\left(e_k',\,-k\vartheta_{k-1},\,-\frac{k}{2}\varpi_{k-1}\right)',
\]
where $e_k$ is the $k$th column of the $K\times K$ identity matrix. Nonsingularity of $\Xi$ ensures that $\omega_k^2>0$, which then cancels out, so the continuous mapping theorem [CMT] then establishes the desired limiting null distribution.
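For illustration, the Bartlett-kernel form of $t_k$ can be computed directly from the partial sums of the estimated PITs. The following is a minimal sketch with our own (hypothetical) helper name, not code from the paper:

```python
import math

def tk_bartlett(p, k=1, b=0.1):
    """Fixed-b t-statistic for the kth raw PIT moment, Bartlett kernel (sketch).
    p: sequence of (estimated) PITs; under the null E[p^k] = 1/(k+1)."""
    T = len(p)
    pk = [x ** k for x in p]
    mean_pk = sum(pk) / T
    # numerator: scaled deviation of the kth sample moment from 1/(k+1)
    num = sum(x - 1.0 / (k + 1) for x in pk) / math.sqrt(T)
    # partial-sum process S_i = T^{-1/2} * sum_{t<=i} (p_t^k - mean)
    S, acc = [], 0.0
    for x in pk:
        acc += x - mean_pk
        S.append(acc / math.sqrt(T))
    m = int(b * T)
    # fixed-b Bartlett variance estimate built from the partial sums
    var = (2.0 / (b * T)) * sum(s * s for s in S)
    var -= (2.0 / (b * T)) * sum(S[i + m] * S[i] for i in range(T - m))
    return num / math.sqrt(var)
```

The denominator mirrors the Bartlett expression displayed above; the quadratic form is nonnegative by construction of the Bartlett kernel.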
C More on parametric mean adjustment
Since this section only serves the purpose of illustrating the influence the specific choice of model has on the feasible PITs $\hat p_t$, we treat $\sigma_t$ as known and set it to unity; similar effects are expected if $\sigma_t$ is to be modeled as well. Concretely, consider a parametric model for the mean of the observed time series $x_t$ such that
\[
x_t = \mu(t/T,\theta) + \sigma_t z_t.
\]
Note that normalizing the time is not restrictive, since one may e.g. redefine a classical linear trend model $\mu_t = \theta_1+\theta_2 t$ as $\mu_t = \theta_1 + (T\theta_2)\, t/T$ without loss of generality. We take the mean component to satisfy the following requirements.
Assumption 3 Let $\mu(s,\theta)$ have uniformly continuous 2nd order partial derivatives. The first and second order partial derivatives w.r.t. $\theta$ are weakly bounded uniformly in $s$, in the sense that there exists a nondecreasing function $f$ such that $\max\left\{\left\|\frac{\partial\mu(s,\theta)}{\partial\theta}\right\|;\left\|\frac{\partial^2\mu(s,\theta)}{\partial\theta\partial\theta'}\right\|\right\}\le f(\|\theta\|)$ for all $s\in[0,1]$.
This assumption allows for polynomial trend models, $\mu(s,\theta)=\sum_{j=1}^{p+1} s^{j-1}\theta_j$, for breaks in the mean, $\mu(s,\theta)=\theta_1+\theta_2 I(s\ge\tau)$, for smooth mean changes, e.g. $\mu(s,\theta)=\frac{1}{1+\exp(\theta_3(s-\theta_4))}\theta_1+\frac{\exp(\theta_3(s-\theta_4))}{1+\exp(\theta_3(s-\theta_4))}\theta_2$, or for $\mu(s,\theta)=\theta_1+\sum_{j=1}^{p}\left(\theta_{2j}\sin 2\pi js+\theta_{2j+1}\cos 2\pi js\right)$ motivated by approximations via Fourier sums.
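For concreteness, these specifications are easy to code as functions of rescaled time $s\in[0,1]$; the sketch below uses our own (hypothetical) function names, not identifiers from the paper:

```python
import math

def mu_poly(s, theta):
    """Polynomial trend: sum_j theta_j * s^(j-1), theta = [theta_1, ..., theta_{p+1}]."""
    return sum(th * s ** j for j, th in enumerate(theta))

def mu_break(s, theta1, theta2, tau):
    """Break in the mean at fraction tau: theta1 + theta2 * I(s >= tau)."""
    return theta1 + (theta2 if s >= tau else 0.0)

def mu_logistic(s, theta1, theta2, theta3, theta4):
    """Smooth (logistic) transition between theta1 and theta2."""
    w = 1.0 / (1.0 + math.exp(theta3 * (s - theta4)))
    return w * theta1 + (1.0 - w) * theta2

def mu_fourier(s, theta1, pairs):
    """Fourier sum: theta1 + sum_j (a_j sin(2 pi j s) + b_j cos(2 pi j s))."""
    return theta1 + sum(a * math.sin(2 * math.pi * j * s) + b * math.cos(2 * math.pi * j * s)
                        for j, (a, b) in enumerate(pairs, start=1))
```

Each function satisfies the smoothness/boundedness requirements of Assumption 3 except the break specification, which the text nevertheless lists as an allowed case.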
Based on this model, one obtains
\[
\hat p_t = \Phi(\hat z_t) = \Phi\!\left(x_t-\mu\big(t/T,\hat\theta\big)\right)
\]
by plugging in an estimator $\hat\theta$ which is taken to be $\sqrt{T}$-consistent. The straightforward choice is the NLS estimator, which we employ in the following; some of the requirements of Assumption 3, e.g. referring to the Hessian of $\mu$, help establish the limiting behavior of the NLS estimator. Irrespective of what estimator is used, we note that
\[
\hat p_t = \Phi\!\left(z_t-\left(\mu\big(t/T,\hat\theta\big)-\mu(t/T,\theta)\right)\right) \tag{11}
\]
such that the estimation has an effect. The following Lemma provides the precise result when $x_t$ is parametrically adjusted for a nonzero mean.
Lemma 3 Under Assumptions 1 through 3, it holds as $T\to\infty$ that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s)-k\vartheta_{k-1}\,\delta'(s,\theta)\,\Theta(1) \tag{12}
\]
where $\Theta(1)=\left(\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\frac{\partial\mu(s,\theta)}{\partial\theta}'\,\mathrm{d}s\right)^{-1}\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\,\mathrm{d}W_1(s)$, $\delta(s,\theta)=\int_0^s\frac{\partial\mu(r,\theta)}{\partial\theta}\,\mathrm{d}r$ and $\vartheta_k=\mathrm{E}\left(p_t^k\varphi(z_t)\right)$ as before.
Proof of Lemma 3
Begin by discussing the limiting behavior of the NLS estimator $\hat\theta$. We have under Assumptions 1 and 3 that
\[
\sqrt{T}\left(\hat\theta-\theta\right) \Rightarrow \left(\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\frac{\partial\mu(s,\theta)}{\partial\theta}'\,\mathrm{d}s\right)^{-1}\int_0^1\frac{\partial\mu(s,\theta)}{\partial\theta}\,\mathrm{d}W_1(s);
\]
this is a standard application of extremum estimator theory and we omit the details.
With the application of the mean value theorem when $k=1$ (or a Taylor series expansion with rest term in differential form) we obtain
\[
\hat p_t = p_t + \varphi(z_t)\left(\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big)\right) + \varphi'(\xi_t)\left(\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big)\right)^2
\]
where $\xi_t$ lies between $z_t$ and $z_t-\mu\big(t/T,\hat\theta\big)+\mu(t/T,\theta)$ for each $t$. The exact values for $\xi_t$ do not matter since $\varphi'$ is bounded. A second expansion, here about $\theta$, is required for the trend function $\mu$:
\[
\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big) = -\frac{\partial\mu(t/T,\theta)}{\partial\theta}'\left(\hat\theta-\theta\right) - \left(\hat\theta-\theta\right)'\left.\frac{\partial^2\mu(t/T,\theta)}{\partial\theta\partial\theta'}\right|_{\theta=\bar\vartheta_t}\left(\hat\theta-\theta\right)
\]
again with $\bar\vartheta_t$ between $\theta$ and $\hat\theta$ (note that since $t$ is an argument of $\mu$, $\bar\vartheta$ also depends on $t$, hence the notation). Putting the two together we obtain
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t-\frac{1}{2}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t-\frac{1}{2}\right) - \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi(z_t)\frac{\partial\mu(t/T,\theta)}{\partial\theta}'\left(\hat\theta-\theta\right) - \left(\hat\theta-\theta\right)'\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi(z_t)\left.\frac{\partial^2\mu(t/T,\theta)}{\partial\theta\partial\theta'}\right|_{\theta=\bar\vartheta_t}\left(\hat\theta-\theta\right) + R_{s,T},
\]
where $R_{s,T}$ is just the normalized partial sum of $\varphi'(\xi_t)\left(\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big)\right)^2$.
Examining the third summand on the r.h.s., we note that the boundedness of $\varphi$ and the fact that $\left\|\left.\frac{\partial^2\mu(t/T,\theta)}{\partial\theta\partial\theta'}\right|_{\theta=\bar\vartheta_t}\right\| \le f(\|\bar\vartheta_t\|) \le f\big(\max\{\|\theta\|;\|\hat\theta\|\}\big)$ make the partial sums of order $O_p(T)$, but $\hat\theta-\theta=O_p\big(T^{-1/2}\big)$ and the normalization with $\sqrt{T}$ make the entire summand vanish.
For the fourth summand, $R_{s,T}$, we have with a first-order Taylor expansion, $\mu(t/T,\theta)-\mu\big(t/T,\hat\theta\big) = -\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}'\left(\hat\theta-\theta\right)$ with $\bar\vartheta_t$ between $\theta$ and $\hat\theta$ for each $t$, that
\[
R_{s,T} = \left(\hat\theta-\theta\right)'\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi'(\xi_t)\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}'\left(\hat\theta-\theta\right).
\]
Since, similarly, $\varphi'$ is bounded and $\left\|\left.\frac{\partial\mu(t/T,\theta)}{\partial\theta}\right|_{\theta=\bar\vartheta_t}\right\| \le f(\|\bar\vartheta_t\|) \le f\big(\max\{\|\theta\|;\|\hat\theta\|\}\big)$ for all $t$, it follows that $\sup_s |R_{s,T}| = O_p\big(T^{-1/2}\big)$.
Summing up, we are left with the first two summands,
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t-\frac{1}{2}\right) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(p_t-\frac{1}{2}\right) - \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\varphi(z_t)\frac{\partial\mu(t/T,\theta)}{\partial\theta}'\left(\hat\theta-\theta\right) + o_p(1);
\]
the same arguments show that analogous relations hold for $\hat p_t^k$. With $\sqrt{T}\big(\hat\theta-\theta\big)\Rightarrow\Theta(1)$ and
\[
\frac{1}{T}\sum_{t=1}^{[sT]} p_t^{k-1}\varphi(z_t)\frac{\partial\mu(t/T,\theta)}{\partial\theta} \Rightarrow \mathrm{E}\left(p_t^{k-1}\varphi(z_t)\right)\int_0^s\frac{\partial\mu(r,\theta)}{\partial\theta}\,\mathrm{d}r = \vartheta_{k-1}\,\delta(s,\theta),
\]
the desired result follows.
Remark 6 Bai and Ng (2005) show in their Theorem 5 that regressing $x_t$ on a set of regressors has no effect on the limiting distributions beyond that of the intercept. There is no contradiction between their result and our Lemma 3, since the result in (12) applies in the case where the regressors are deterministic. For a comparison with Theorem 5 in Bai and Ng (2005), take one stochastic regressor and a linear model $x_t=\theta w_t$ such that $\frac{\partial\mu(t/T,\theta)}{\partial\theta}=w_t$. We obtain for stationary regressors that $\frac{1}{T}\sum_{t=1}^{[sT]}\varphi(z_t)w_t \Rightarrow s\,\mathrm{E}(\varphi(z_t)w_t)$. Now, Bai and Ng (2005) assume that an intercept is always present in the regression, which is equivalent to setting $\mathrm{E}(w_t)=0$; they also assume the regressors to be independent of $z_t$, hence $\mathrm{E}(\varphi(z_t)w_t)=0$ and the corresponding limit term vanishes. This is not the case when $w_t$ is deterministic, say an intercept or a trend, and the limiting distribution of $\hat\theta$ needs to be taken into account.
Clearly, the estimation effect described by Equation (12) will affect the limiting fixed-$b$ distribution of a statistic based on a parametrically estimated standardization. The effect is different from that derived in Lemma 1, since the presence of $\Theta(1)$ (as opposed to $W_1(s)$) indicates a bridge-type behavior of the limit process of the relevant partial sums. Moreover, the components $\Theta$ and $\delta$ depend on the specific model $\mu$ chosen. The statistics can be made pivotal, see below, but the limiting distributions are not the usual fixed-$b$ ones, except in the case of an intercept. The bottom line is that different deterministic components will lead to different distributions (with the exception of the small-$b$ case, where $\chi^2$ asymptotics may be recovered for all consistent choices of HAC covariance matrix estimator). This implies the need to simulate the distributions for each specific type of deterministic component accounted for in the data. While this can be done in advance for some popular combinations (see below for the case of intercept and trend, where the generalized Brownian bridge plays a role; cf. MacNeill, 1978), one solution for a generic mean function $\mu$ is to resort to some form of bootstrap. Since $z_t$ is strictly stationary and mixing, the residual-based iid or wild bootstrap is likely valid, but we do not pursue the topic here.
We now illustrate concretely the difference between nonparametric and parametric mean adjustment for the case of a linear trend. Considering constant variance for simplicity, we have the following procedure, simplified by the linearity of the mean function. Detrend $x_t$ using OLS regression and standardize the detrended series with $\hat\sigma_t$ to obtain $\hat z_t$. With $\big(\hat p_t,\dots,\hat p_t^K,\hat z_t\big)'$, compute as in the mean case a fixed-$b$ estimate of the long-run covariance matrix of $\big(\hat p_t,\dots,\hat p_t^K,\hat z_t\big)'$, say $\hat\Gamma$, and, based on it, the scaling matrix $\hat\Omega = V\hat\Gamma V'$ with $V$ as before, and then $\mathcal{T}_K$ from (9). Then,
Proposition 2 Under Assumptions 1 and 2, it holds as $T\to\infty$ that
\[
\mathcal{T}_K \Rightarrow W_K'(1)\, Q_{K,b,\kappa}^{-1}\, W_K(1),
\]
with
\[
Q_{K,b,\kappa} = -\int_0^1\int_0^1\frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)V(r)V'(s)\,\mathrm{d}r\,\mathrm{d}s
\]
for smooth kernels and
\[
Q_{K,b,\kappa} = \frac{2}{b}\int_0^1 V(r)V'(r)\,\mathrm{d}r - \frac{1}{b}\int_0^{1-b} V(r+b)V'(r)\,\mathrm{d}r - \frac{1}{b}\int_0^{1-b} V(r)V'(r+b)\,\mathrm{d}r
\]
for the Bartlett kernel, where $V(s)$ is, for demeaning, the first-order Brownian bridge
\[
V(s) = W_K(s) - sW_K(1)
\]
with $W_K$ a vector of independent standard Wiener processes; for detrending, $V(s)$ is the second-level Brownian bridge
\[
V(s) = W_K(s) + (2s-3s^2)W_K(1) - 6s(1-s)\int_0^1 W_K(r)\,\mathrm{d}r.
\]
Proof of Proposition 2
To deal with demeaning, let $\mu=\theta_1$ in Lemma 3 to obtain
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1}\, s\, W_1(1).
\]
We then need to examine the limiting behavior of the suitably normalized partial sums of $\hat z_t$. To this end, note that
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\frac{\sigma_t}{\hat\sigma_t}-1\right)(z_t-\bar z) = o_p(1)
\]
uniformly in $s$ thanks to the arguments used in the proof of Lemma 1. Then,
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\hat z_t = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\frac{\sigma_t}{\hat\sigma_t}(z_t-\bar z) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}(z_t-\bar z) + \frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\frac{\sigma_t}{\hat\sigma_t}-1\right)(z_t-\bar z) \Rightarrow W_1(s) - sW_1(1).
\]
Let now
\[
B(s) = (B_1(s),\dots,B_K(s),W_1(s))'
\]
and
\[
\tilde B(s) = (B_1(s)-s\,\vartheta_0 W_1(1),\dots,B_K(s)-sK\vartheta_{K-1}W_1(1),\,W_1(s)-sW_1(1))';
\]
using the arguments of the proof of Theorem 2 in Kiefer and Vogelsang (2005) together with Lemma 1, we obtain e.g. for smooth kernels
\[
\mathcal{T}_K \Rightarrow \big(V\tilde B\big)'(1)\left(V\left(-\int_0^1\int_0^1\frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)\big(\tilde B(r)-r\tilde B(1)\big)\big(\tilde B(s)-s\tilde B(1)\big)'\,\mathrm{d}r\,\mathrm{d}s\right)V'\right)^{-1}V\tilde B(1).
\]
Note further that
\[
V\big(\tilde B(s)-s\tilde B(1)\big) = V\big(B(s)-sB(1)\big),
\]
and let $Y=V\tilde B$ such that
\[
\mathcal{T}_K \Rightarrow Y'(1)\left(-\int_0^1\int_0^1\frac{1}{b^2}\,\kappa''\!\left(\frac{r-s}{b}\right)(Y(r)-rY(1))(Y(s)-sY(1))'\,\mathrm{d}r\,\mathrm{d}s\right)^{-1}Y(1)
\]
where $Y$ is a multivariate Brownian motion with covariance matrix $\Upsilon$. To obtain the required distribution, let $W=\Upsilon^{-1/2}Y(s)$, and note that $\Upsilon$ cancels out. The result for the Bartlett kernel follows analogously.
To deal with detrending, let $\mu=\theta_1+\theta_2 s$ in Lemma 3 to obtain
\[
\frac{1}{\sqrt{T}}\sum_{t=1}^{[sT]}\left(\hat p_t^k-\frac{1}{k+1}\right) \Rightarrow B_k(s) - k\vartheta_{k-1}\left(4sW_1(1) - 3s^2W_1(1) - 6s(1-s)\int_0^1 r\,\mathrm{d}W_1(r)\right).
\]
Note that $\int_0^1 r\,\mathrm{d}W_1(r) = W_1(1) - \int_0^1 W_1(r)\,\mathrm{d}r$; use then the same steps as for demeaning to arrive at the desired result.
D The Bai and Ng (2005) test procedure
The test statistic suggested by Bai and Ng (2005) is given by
\[
\hat\mu_{34} = Y_T'\,\big(\hat\gamma\hat\Phi\hat\gamma'\big)^{-1}Y_T
\]
where
\[
Y_T = \begin{bmatrix}\frac{1}{\sqrt{T}}\sum_{t=1}^T(y_t-\bar y)^3\\[4pt] \frac{1}{\sqrt{T}}\sum_{t=1}^T\big[(y_t-\bar y)^4-3\hat\sigma^4\big]\end{bmatrix}
\quad\text{and}\quad
\hat\gamma = \begin{bmatrix}-3\hat\sigma^2 & 0 & 1 & 0\\ 0 & -6\hat\sigma^2 & 0 & 1\end{bmatrix};
\]
$\bar y$, $\hat\sigma$ and $\hat\Phi$ are consistent estimators. The theoretical long-run covariance matrix $\Phi$ is given by $\Phi = \lim_{T\to\infty} T\,\mathrm{E}\big(\bar Z\bar Z'\big)$ with $Z_t' = \big[y_t-\mu,\;(y_t-\mu)^2-\sigma^2,\;(y_t-\mu)^3,\;(y_t-\mu)^4-3\sigma^4\big]$ and $\bar Z$ being the sample mean of $Z_t$. The limiting distribution of $\hat\mu_{34}$ is $\chi^2(2)$. This result is motivated by the fact that under normality, one obtains $Y_T = \gamma\frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t + o_p(1)$ with $\frac{1}{\sqrt{T}}\sum_{t=1}^T Z_t \Rightarrow N(0,\Phi)$. We follow Bai and Ng (2005) and consider the Newey and West (1987) estimator.
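A self-contained sketch of this statistic (our implementation, not the authors' code; plug-in estimators, Bartlett/Newey-West weighting with a user-chosen truncation lag $L$) could look as follows:

```python
import math

def _newey_west(Z, L):
    """Bartlett-weighted long-run covariance of the rows of Z (lists of equal length)."""
    T, d = len(Z), len(Z[0])
    def gamma(j):
        G = [[0.0] * d for _ in range(d)]
        for t in range(j, T):
            for a in range(d):
                for c in range(d):
                    G[a][c] += Z[t][a] * Z[t - j][c] / T
        return G
    Phi = gamma(0)
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1.0)
        Gj = gamma(j)
        for a in range(d):
            for c in range(d):
                Phi[a][c] += w * (Gj[a][c] + Gj[c][a])
    return Phi

def mu34(y, L=4):
    """Bai-Ng (2005)-type joint skewness/kurtosis statistic (sketch); chi^2(2) limit under normality."""
    T = len(y)
    ybar = sum(y) / T
    s2 = sum((v - ybar) ** 2 for v in y) / T
    YT = [sum((v - ybar) ** 3 for v in y) / math.sqrt(T),
          sum((v - ybar) ** 4 - 3.0 * s2 * s2 for v in y) / math.sqrt(T)]
    Z = [[v - ybar, (v - ybar) ** 2 - s2, (v - ybar) ** 3, (v - ybar) ** 4 - 3.0 * s2 * s2]
         for v in y]
    Phi = _newey_west(Z, L)
    g = [[-3.0 * s2, 0.0, 1.0, 0.0], [0.0, -6.0 * s2, 0.0, 1.0]]
    # Omega = g Phi g'  (2 x 2)
    gP = [[sum(g[a][i] * Phi[i][c] for i in range(4)) for c in range(4)] for a in range(2)]
    O = [[sum(gP[a][i] * g[c][i] for i in range(4)) for c in range(2)] for a in range(2)]
    det = O[0][0] * O[1][1] - O[0][1] * O[1][0]
    Oinv = [[O[1][1] / det, -O[0][1] / det], [-O[1][0] / det, O[0][0] / det]]
    # quadratic form Y_T' Omega^{-1} Y_T
    return sum(YT[a] * Oinv[a][c] * YT[c] for a in range(2) for c in range(2))
```

The statistic is nonnegative by construction (the Newey-West estimate is positive semi-definite) and would be compared against $\chi^2(2)$ critical values.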
E Critical values
Table 18: Critical values via response curves from the $W_K'(1)Q_{K,b,\kappa}^{-1}W_K(1)$-distribution. $\kappa$ is the Bartlett kernel. The regression is given by $cv(b) = a_0 + a_1 b + a_2 b^2 + a_3 b^3 + \text{error}$ with corresponding $R^2$. Nominal significance levels are 0.9, 0.95, 0.975, 0.99 and 0.995.

K = 1
level    a0        a1         a2         a3          R^2
0.900    2.7055    6.1598     8.6142     -3.3854     0.9998
0.950    3.8415    10.2574    15.6231    -7.0320     0.9997
0.975    5.0239    15.8489    24.5892    -12.5751    0.9995
0.990    6.6349    26.3361    36.1330    -19.6341    0.9994
0.995    7.8794    37.5823    41.2076    -21.6338    0.9991

K = 2
level    a0        a1         a2         a3          R^2
0.900    4.6052    15.5300    33.0455    -18.0050    0.9998
0.950    5.9915    24.2350    48.4528    -27.7431    0.9998
0.975    7.3778    35.6889    62.8696    -36.8917    0.9997
0.990    9.2103    53.2832    88.7896    -55.9722    0.9996
0.995    10.5966   71.9545    96.5536    -60.2045    0.9994

K = 3
level    a0        a1         a2         a3          R^2
0.900    6.2514    30.2793    67.5629    -42.2680    0.9998
0.950    7.8147    45.5956    88.1783    -56.1070    0.9997
0.975    9.3484    63.5918    109.2760   -70.7583    0.9997
0.990    11.3449   94.2752    127.9765   -84.0108    0.9996
0.995    12.8382   121.7357   137.7951   -91.2883    0.9994

K = 4
level    a0        a1         a2         a3          R^2
0.900    7.7794    54.1072    94.7069    -61.0147    0.9997
0.950    9.4877    76.3485    121.5104   -79.8180    0.9997
0.975    11.1433   102.1803   145.6040   -97.0618    0.9997
0.990    13.2767   142.5323   169.0490   -113.2457   0.9997
0.995    14.8603   177.5045   183.2276   -123.6561   0.9996
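Reading the table: a critical value for a given bandwidth ratio $b$ is obtained by evaluating the cubic response curve at $b$. A small sketch for $K=1$ (coefficients copied from the table; names are ours):

```python
# response-curve coefficients (a0, a1, a2, a3) for K = 1, Bartlett kernel, from Table 18
COEF_K1 = {
    0.900: (2.7055, 6.1598, 8.6142, -3.3854),
    0.950: (3.8415, 10.2574, 15.6231, -7.0320),
    0.975: (5.0239, 15.8489, 24.5892, -12.5751),
    0.990: (6.6349, 26.3361, 36.1330, -19.6341),
    0.995: (7.8794, 37.5823, 41.2076, -21.6338),
}

def critical_value(b, level=0.950, coef=COEF_K1):
    """cv(b) = a0 + a1*b + a2*b^2 + a3*b^3 for the chosen quantile level."""
    a0, a1, a2, a3 = coef[level]
    return a0 + a1 * b + a2 * b ** 2 + a3 * b ** 3
```

At $b=0$ the curves reduce to $a_0$, which equals the corresponding $\chi^2(1)$ quantile (e.g. 3.8415 at the 0.95 level), consistent with the small-$b$ asymptotics mentioned in Appendix C.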
F Details on V matrices for different distributions

Table 19: Simulated coefficients in the V matrix.

Normal        ϑ_k        ϖ_k          t(3)          ϑ_k        ϖ_k
k = 1         0.28215    0.00023      k = 1         0.22969    -0.00018
k = 2         0.14116    0.04605      k = 2         0.11475    0.02124
k = 3         0.08588    0.04600      k = 3         0.06776    0.02126
k = 4         0.05822    0.04004      k = 4         0.04428    0.01832

Log-Normal    ϑ_k        ϖ_k          χ²(3)         ϑ_k        ϖ_k
k = 1         0.36215    -0.14576     k = 1         0.15910    -0.06493
k = 2         0.12372    -0.02913     k = 2         0.06125    -0.00416
k = 3         0.05921    -0.00549     k = 3         0.03239    0.00584
k = 4         0.03382    0.00110      k = 4         0.02004    0.00753
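The normal-case entries can be reproduced by Monte Carlo. The sketch below (our code) estimates $\mathrm{E}[p^{k-1}\varphi(z)]$ and $\mathrm{E}[p^{k-1}z\varphi(z)]$ for $z\sim N(0,1)$, $p=\Phi(z)$; comparing with the table suggests the row labelled $k$ reports the coefficients entering the limit for the $k$th moment, e.g. the $k=1$ row matches $\mathrm{E}[\varphi(z)]=1/(2\sqrt{\pi})\approx 0.2821$:

```python
import math
import random

def v_coefficients(kmax=4, n=200_000, seed=1):
    """Monte Carlo estimates of E[p^{k-1} phi(z)] and E[p^{k-1} z phi(z)], z ~ N(0,1)."""
    rng = random.Random(seed)
    st = [0.0] * kmax   # accumulators for E[p^{k-1} phi(z)]
    sw = [0.0] * kmax   # accumulators for E[p^{k-1} z phi(z)]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))           # standard normal CDF
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
        pw = 1.0
        for k in range(kmax):
            st[k] += pw * phi
            sw[k] += pw * z * phi
            pw *= p
    return [s / n for s in st], [s / n for s in sw]
```

For the non-normal cases one would draw $z$ from the (standardized) target distribution and use its CDF and density instead.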
References

Amado, C. and T. Teräsvirta (2013). Modelling volatility by variance decomposition. Journal of Econometrics 175 (2), 142–153.

Amado, C. and T. Teräsvirta (2014). Modelling changes in the unconditional variance of long stock return series. Journal of Empirical Finance 25 (1), 15–35.

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.

Andrews, D. W. K. and J. C. Monahan (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60 (4), 953–966.

Bai, J. (2003). Testing parametric conditional distributions of dynamic models. Review of Economics and Statistics 85 (3), 531–549.

Bai, J. and S. Ng (2005). Tests for skewness, kurtosis, and normality for time series data. Journal of Business & Economic Statistics 23 (1), 49–60.

Bontemps, C. and N. Meddahi (2005). Testing normality: a GMM approach. Journal of Econometrics 124 (1), 149–186.

Bontemps, C. and N. Meddahi (2012). Testing distributional assumptions: A GMM approach. Journal of Applied Econometrics 27 (6), 978–1012.

Cavaliere, G. and A. M. R. Taylor (2008). Time-transformed unit root tests for models with non-stationary volatility. Journal of Time Series Analysis 29 (2), 300–330.

Cavaliere, G. and A. M. R. Taylor (2009). Heteroskedastic time series with a unit root. Econometric Theory 25 (5), 1228–1276.

Clark, T. E. (2009). Is the Great Moderation over? An empirical analysis. Economic Review 4, 5–42.

Clark, T. E. (2011). Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility. Journal of Business & Economic Statistics 29 (3), 327–341.

Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.

Demetrescu, M. and C. Hanck (2012). Unit root testing in heteroskedastic panels using the Cauchy estimator. Journal of Business & Economic Statistics 30 (2), 256–264.

Durbin, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function, Volume 9. Society for Industrial and Applied Mathematics.

Guidolin, M. and A. Timmermann (2006). An econometric model of nonlinear dynamics in the joint distribution of stock and bond returns. Journal of Applied Econometrics 21 (1), 1–22.

Jarque, C. M. and A. K. Bera (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters 6 (3), 255–259.

Justiniano, A. and G. Primiceri (2008). The time-varying volatility of macroeconomic fluctuations. American Economic Review 98 (3), 604–641.

Khmaladze, E. V. (1981). Martingale approach in the theory of goodness-of-fit tests. Theory of Probability & Its Applications 26 (2), 240–257.

Kiefer, N. M. and T. J. Vogelsang (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory 21 (6), 1130–1164.

Knüppel, M. (2015). Evaluating the calibration of multi-step-ahead density forecasts using raw moments. Journal of Business & Economic Statistics 33 (2), 270–281.

Lanne, M., J. Luoto, and P. Saikkonen (2012). Optimal forecasting of noncausal autoregressive time series. International Journal of Forecasting 28 (3), 623–631.

Lanne, M. and P. Saikkonen (2011). Noncausal autoregressions for economic time series. Journal of Time Series Econometrics 3 (3), article 2.

Lanne, M. and P. Saikkonen (2013). Noncausal vector autoregression. Econometric Theory 29 (3), 447–481.

Lomnicki, Z. A. (1961). Tests for departure from normality in the case of linear stochastic processes. Metrika 4 (1), 37–62.

MacNeill, I. B. (1978). Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. The Annals of Statistics 6 (2), 422–433.

Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55 (3), 703–708.

Phillips, P. C. B. and K. L. Xu (2006). Inference in autoregression under heteroskedasticity. Journal of Time Series Analysis 27 (2), 289–308.

Sensier, M. and D. van Dijk (2004). Testing for volatility changes in U.S. macroeconomic time series. The Review of Economics and Statistics 86 (3), 833–839.

Stock, J. H. and M. W. Watson (2002). Has the business cycle changed and why? NBER Macroeconomics Annual 17 (1), 159–218.

Sun, Y. (2014a). Fixed-smoothing asymptotics in a two-step generalized method of moments framework. Econometrica 82 (6), 2327–2370.

Sun, Y. (2014b). Let's fix it: Fixed-b asymptotics versus small-b asymptotics in heteroskedasticity and autocorrelation robust inference. Journal of Econometrics 178 (3), 659–677.

Teräsvirta, T. and Z. Zhao (2011). Stylized facts of return series, robust estimates and three popular models of volatility. Applied Financial Economics 21 (1-2), 67–94.

Vogelsang, T. J. and M. Wagner (2013). A fixed-b perspective on the Phillips-Perron unit root tests. Econometric Theory 29, 609–628.

Vogt, M. (2012). Nonparametric regression for locally stationary time series. The Annals of Statistics 40 (5), 2601–2633.

Westerlund, J. (2014). Heteroscedasticity robust panel unit root tests. Journal of Business & Economic Statistics 32 (1), 112–135.

Xu, K.-L. (2008). Bootstrapping autoregression under non-stationary volatility. The Econometrics Journal 11 (1), 1–26.

Yang, J. and T. J. Vogelsang (2011). Fixed-b analysis of LM-type tests for a shift in mean. The Econometrics Journal 14 (3), 438–456.