garch(1, 1) at small sample size and pairs trading with cointegration · 2019. 1. 3. · mance of...


GARCH(1, 1) at small sample size and pairs trading with cointegration

Wei Ruen Leong

A PhD thesis submitted to

School of Business and Social Sciences, Aarhus University,

in partial fulfillment of the requirements of

the PhD degree in

Economics and Business Economics.

April 2018


Preface

This dissertation is the result of three years of work as a PhD student at the Department of Economics and Business Economics at Aarhus University, from April 2015 to March 2018. I am grateful for the financial support and excellent research facilities provided by the department. I am also fortunate to have been part of the Center for Research in Econometric Analysis of Time Series (CREATES), funded by the Danish National Research Foundation.

I would like to personally thank my supervisor, Professor Eric Hillebrand, for his guidance and patience. I am thankful for the time he set aside for our weekly meetings to hear and discuss my research ideas. Special thanks go to Professor Robin Lumsdaine for hosting me for a semester at American University in Washington, DC. I would like to thank her for her hospitality and advice, which made my stay in DC a unique research experience.

I would also like to thank Solveig Nygaard Sørensen for her dedication to CREATES and excellent administrative support, in particular for helping me with travel issues before my departure to DC. A special mention goes to Karin Vinding, who helped me extensively with the proofreading. Thank you for your excellent work in polishing the rougher versions of the initial drafts. Another thank you goes to Thomas Stephansen, who provided me with an initial draft of the dissertation summary in Danish. I am also happy to have been surrounded by wonderful colleagues, who made my time at the department more eventful. Thank you all for the social event invitations and the discussions, be they work or non-work related.

Special thanks also go to Begüm for being there for me and for sharing our life journey together. Thank you for bringing color to my life; without you, my life would have been duller.

Last but not least, I would like to thank my parents for bringing me up, for the sacrifices they made, and for the unconditional love they provided. Without them, I would not be who I am today.


Wei Ruen Leong

Aarhus, April 2018


Updated preface

The pre-defence meeting was held on May 7, 2018, in Aarhus. I am grateful to the members of the assessment committee, consisting of Professor Robin Lumsdaine (American University) and Assistant Professor Cristina Amado (University of Minho), and headed by Professor Asger Lunde (Aarhus University), for their careful reading of the dissertation and their many insightful comments and suggestions. Some of the suggestions have been incorporated into the present version of the dissertation, while others remain for future research.

Wei Ruen Leong

Aarhus, August 2018


Contents

Summary

Danish summary

1 GARCH(1, 1) on small samples
1.1 Introduction
1.2 The model and the log-likelihood function
1.3 Monte Carlo study
1.4 Fitting the conditional variance to the squared returns
1.5 Conclusion
1.6 References

2 A modified GARCH(1, 1) model for cross-sectional data with small samples
2.1 Introduction
2.2 The Panel GARCH(1, 1) model
2.3 Monte Carlo study
2.4 A modified GARCH(1, 1) model
2.5 Empirical applications
2.6 Conclusion
2.7 References

3 Pairs trading with cointegration on multiple stock indices
3.1 Introduction
3.2 Literature review
3.3 Method of cointegration
3.4 Trading algorithm
3.5 Main application
3.6 Post-study analysis
3.7 Conclusion
3.8 References
3.9 Appendix


Summary

This dissertation consists of three self-contained chapters on time series econometrics, dealing in particular with conditional variance models and cointegration. A significant contribution to the literature on modeling time-varying volatility was the introduction of the autoregressive conditional heteroskedasticity (ARCH) model by Engle (1982), which was subsequently generalized by Bollerslev (1986). This gave rise to the popular generalized ARCH (GARCH) model. Over the years, various extensions of GARCH-type models have been proposed to help practitioners better capture the stylized facts of financial data. Despite the availability of more sophisticated GARCH-type models, the original GARCH(1, 1) model remains popular among practitioners and is often used as the benchmark for model comparisons. For example, the study by Hansen and Lunde (2005) concludes that the general reliability and forecasting performance of GARCH(1, 1) is superior when compared to other GARCH-type models.

The asymptotic theory of GARCH(1, 1) has been well explored and documented in the literature; see, for example, Lumsdaine (1996). The first two chapters seek to contribute to the existing theory by investigating the GARCH(1, 1) model at small sample sizes. The first chapter motivates and introduces the estimation issue encountered at small sample sizes, with univariate GARCH(1, 1) as the central focus. The second chapter continues the first by extending the same model to a panel setting, as done in Pakel et al. (2011), and proposes a fix to the estimation issue. The theme of the third and final chapter is unrelated to that of the two previous chapters. It applies a financial trading algorithm that uses cointegration (Engle and Granger, 1987) to select pairs of stocks for trading.

Chapter 1 - GARCH(1, 1) on small samples - conducts a Monte Carlo study of the GARCH(1, 1) model at small sample sizes with data-generating values that are commonly recorded in empirical studies. The Monte Carlo study leads to the discovery that one of the model parameter estimates tends to be biased downward, and is often negative even though its data-generating value is positive. We perform analytical work with the purpose of explaining why the negative parameter estimate can occur at small sample sizes.

Chapter 2 - A modified GARCH(1, 1) model for cross-sectional data with small samples - extends the study of the parameter bias of the GARCH(1, 1) model in small samples to a panel setting. We propose a modification of the GARCH(1, 1) conditional variance equation to fix the bias problem. We present two empirical applications of the modified GARCH(1, 1) model using macroeconomic panel data, and assess the improvement over the original model.

Chapter 3 - Pairs trading with cointegration on multiple stock indices - develops a popular statistical arbitrage trading algorithm that uses the concept of cointegration to select pairs of stocks for trading. Although a significant profit is obtained when the algorithm is implemented on stocks from the S&P 500, losses are incurred on other indices. This finding contradicts studies in the literature, which often report consistent profits as the norm. The large losses can be attributed to the short-lived cointegrating relationships among the selected pairs of stocks.

References

Bollerslev, T. (1986), 'Generalized autoregressive conditional heteroskedasticity', Journal of Econometrics 31(3), 307–327.

Engle, R. F. (1982), 'Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation', Econometrica 50(4), 987–1007.

Engle, R. F. and Granger, C. W. (1987), 'Co-integration and error correction: Representation, estimation, and testing', Econometrica 55(2), 251–276.

Hansen, P. R. and Lunde, A. (2005), 'A forecast comparison of volatility models: Does anything beat a GARCH (1, 1)?', Journal of Applied Econometrics 20(7), 873–889.

Lumsdaine, R. (1996), 'Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH (1, 1) and covariance stationary GARCH (1, 1) models', Econometrica 64(3), 575–596.

Pakel, C., Shephard, N. and Sheppard, K. (2011), 'Nuisance parameters, composite likelihoods and a panel of GARCH models', Statistica Sinica 21(1), 307–329.


Danish summary

This dissertation consists of three self-contained chapters in time series econometrics, each dealing in particular with conditional variance modeling and cointegration. A groundbreaking contribution to the modeling of time-varying volatility is the autoregressive conditional heteroskedasticity (ARCH) model of Engle (1982), which was subsequently generalized by Bollerslev (1986), leading to the popular generalized ARCH (GARCH) model. Over the years we have seen various extensions of GARCH-type models, all of which have attempted to meet the need to better handle the stylized facts that economic data exhibit. Despite the availability of more advanced GARCH models, the original GARCH(1, 1) model remains popular and is often used as a benchmark for model comparisons. See, for example, Hansen and Lunde (2005), who demonstrate greater general reliability and better forecasts when using GARCH(1, 1) compared to other GARCH-type models.

The asymptotic theory of GARCH(1, 1) has been explored and documented in the literature; see, for example, Lumsdaine (1996). The first two chapters seek to contribute to the existing theory by examining GARCH(1, 1) in small samples. The first chapter motivates and introduces the estimation problem in small samples, with univariate GARCH(1, 1) as the central focus. The second chapter continues the first and extends the same model to a panel context, as in Pakel et al. (2011), and proposes a solution to the estimation problem. The theme of the third and final chapter is entirely different. It is a chapter on the application of financial trading algorithms that exploit cointegration (Engle and Granger, 1987) in the selection of stock pairs for trading.

Chapter 1 - GARCH(1, 1) on small samples - carries out a Monte Carlo study of GARCH(1, 1) at small sample sizes with data-generating parameter values that are commonly recorded in empirical studies. The Monte Carlo study leads to the discovery that one of the model parameter estimates often exhibits a downward (negative) bias even though its true value is positive. Our analytical work explains why the negative parameter is more dominant at small sample sizes.

Chapter 2 - A modified GARCH(1, 1) model for cross-sectional data with small samples - extends the study of this parameter bias in GARCH(1, 1) to a panel setting. We propose a change to the conditional volatility equation of GARCH(1, 1) as a solution to the bias problem. We then give an empirical example using macroeconomic panel data on trade between different countries and compare the improvement of the modified GARCH(1, 1) over the original model.

Chapter 3 - Pairs trading with cointegration on multiple stock indices - develops a popular statistical arbitrage trading algorithm that uses cointegration to select pairs of stocks for trading. Although I find a considerable profit when the algorithm is implemented on stocks from the S&P 500, the losses on stocks from other indices are considerable. My results contradict existing studies in the literature, which often consistently show that profit is the norm. The large losses may be due to the short-lived cointegration relationship between the selected stocks.

References

Bollerslev, T. (1986), 'Generalized autoregressive conditional heteroskedasticity', Journal of Econometrics 31(3), 307–327.

Engle, R. F. (1982), 'Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation', Econometrica 50(4), 987–1007.

Engle, R. F. and Granger, C. W. (1987), 'Co-integration and error correction: Representation, estimation, and testing', Econometrica 55(2), 251–276.

Hansen, P. R. and Lunde, A. (2005), 'A forecast comparison of volatility models: Does anything beat a GARCH (1, 1)?', Journal of Applied Econometrics 20(7), 873–889.

Lumsdaine, R. (1996), 'Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH (1, 1) and covariance stationary GARCH (1, 1) models', Econometrica 64(3), 575–596.

Pakel, C., Shephard, N. and Sheppard, K. (2011), 'Nuisance parameters, composite likelihoods and a panel of GARCH models', Statistica Sinica 21(1), 307–329.


Chapter 1

GARCH(1, 1) on small samples

Wei Ruen Leong

Aarhus University and CREATES

Eric Hillebrand

Aarhus University and CREATES

Abstract

In this paper we conduct a Monte Carlo study of maximum likelihood estimates of the generalized autoregressive conditional heteroskedasticity (GARCH) model, for the GARCH(1, 1) case, on small samples. We discover an anomaly in the estimates of the parameter α associated with the lagged squared return term. In constrained optimization, where α > 0 is imposed, many estimates are on the boundary, and in unconstrained optimization, estimates are negative even if the data-generating value is positive. The negative α estimate increases the log-likelihood value. The increase disappears, however, as the sample size increases. We present a number of results that suggest that a negative estimate of α allows for a better fit of the conditional variance sequence to the squared return sequence. This better fit comes at the cost of nonsensical and non-stationary parameter values. At small sample sizes, however, these are of no numerical consequence, i.e. the conditional variances remain positive and do not explode.


1.1 Introduction

A stochastic process may have time-varying conditional variance, and there are two main approaches to modeling this aspect of volatility. The first is based on the stochastic volatility model, which assumes an independent source of randomness in the variance process. The second relies on the generalized autoregressive conditional heteroskedasticity (GARCH) family of models (see Teräsvirta (2009) for a general introduction), which, in contrast to stochastic volatility models, specify the conditional variance as a deterministic function of past data and model parameters. See Taylor (1994) for a general introduction to stochastic volatility models.

Since the seminal work by Engle (1982) on ARCH and the later generalization of the model by Bollerslev (1986), we have seen a surge in the development of different variants of GARCH-type models to capture different aspects of financial time series known as the stylized facts (Engle and Patton, 2001). Some prominent examples include the GJR-GARCH model by Glosten et al. (1993) and the TGARCH model by Zakoïan (1994), designed to capture the asymmetric response to negative versus positive volatility shocks. Hansen and Lunde (2005) compare the forecasting performance of models within the GARCH family using the model confidence set technique, and they conclude that GARCH(1, 1) generally performs the best. The most parsimonious model in the GARCH family, the GARCH(1, 1) model, is usually the default benchmark in model comparison studies due to its simplicity.

Over the years, the asymptotic theory of GARCH(1, 1) has been explored and well documented in the literature (see, for example, Lee and Hansen (1994); Lumsdaine (1996); Giraitis et al. (2000); Francq and Zakoïan (2011)). A small-sample downward bias in the ARCH model is documented in early simulation studies, for example in Engle et al. (1985) and Diebold and Pauly (1989). Studies in the GARCH literature that use small sample sizes also repeatedly point out other issues in the estimated parameters. For example, an investigation by Hwang and Valls Pereira (2006) reveals that there may be a small-sample bias in the GARCH(1, 1) parameter estimates, in particular a downward bias. Iglesias and Phillips (2011) connect the small-sample estimation bias to the number of exogenous variables in the GARCH mean equation, with sample sizes of 50 and 100. Furthermore, Ng and Lam (2006), who study the effect of sample size on the GARCH model, discover that if the sample size is lower than 700, the maximum likelihood algorithm will be directed to more than two optimal parameter estimates. For that reason, they recommend a sample size of at least 1000 for model estimation. Finally, Zumbach (2000) studies GARCH estimation through a transformation of the parameters. In his simulation study he notes that the GARCH process itself can lead to different solutions due to the identity $\frac{1}{T}\sum \varepsilon_t^2 = \frac{1}{T}\sum \sigma_t^2 = \frac{\omega}{1-\alpha-\beta}$, which holds asymptotically but not in finite samples. Here, $\varepsilon_t$ is the excess return, $\sigma_t^2$ is the conditional variance, $T$ is the sample size, and $\omega$, $\alpha$, $\beta$ are the parameters of the GARCH(1, 1) model. When the identity is not fulfilled to a high degree of accuracy, the estimation process becomes problematic.
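The finite-sample gap in this identity can be illustrated with a short simulation. The following is a minimal sketch and is not taken from the text: the function names, the parameter values (ω = 0.05, α = 0.10, β = 0.85), the burn-in length, and the seed are all illustrative assumptions.

```python
import random

def simulate_garch11(T, omega, alpha, beta, seed=0, burn_in=500):
    """Simulate T observations of eps_t and sigma_t^2 from a Gaussian GARCH(1, 1)."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)      # assumed start: unconditional variance
    eps = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
    eps_path, sigma2_path = [], []
    for t in range(T + burn_in):
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
        eps = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
        if t >= burn_in:                       # discard the burn-in draws
            eps_path.append(eps)
            sigma2_path.append(sigma2)
    return eps_path, sigma2_path

def identity_terms(T, omega=0.05, alpha=0.10, beta=0.85, seed=0):
    """Return the three terms of the identity: mean eps^2, mean sigma^2, omega/(1-alpha-beta)."""
    eps, sigma2 = simulate_garch11(T, omega, alpha, beta, seed)
    mean_eps2 = sum(e * e for e in eps) / T
    mean_sigma2 = sum(sigma2) / T
    return mean_eps2, mean_sigma2, omega / (1.0 - alpha - beta)

# Compare the three terms at a small, a medium, and a large sample size.
for T in (50, 500, 100_000):
    print(T, identity_terms(T))
```

For a single small-sample draw the three numbers will typically differ noticeably, while for the largest T they should be close to one another; the exact values depend on the seed.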

Owing to advances in data storage technology, financial volatility researchers now have the convenience of working with large sample sizes. It is not uncommon to find tick-level data being used in the high-frequency finance literature. For macroeconomic researchers, however, the opposite is true because of the low-frequency nature of macro data sets. Macroeconomic researchers therefore usually have to work with much smaller sample sizes than their financial counterparts. The sample sizes chosen for our Monte Carlo study reflect those encountered in macroeconomic studies. Although GARCH-type models are most commonly applied to financial data, various studies in the literature apply GARCH models in a macroeconomic context. In fact, the pioneering papers on GARCH by Engle (1982) and Bollerslev (1986) provide empirical applications using macroeconomic data, specifically data on inflation.

It is well known that the parameter estimates in GARCH models do not have a closed-form solution, although the first- and second-order analytical derivatives can be derived (Fiorentini et al., 1996), so numerical techniques have to be relied upon for estimation. However, the practical implementation of GARCH estimation is often not discussed, as highlighted by Zivot (2009), who provides a guide to the practical issues in estimating a GARCH model. Winker and Maringer (2009) provide optimization guidelines for estimating a GARCH model numerically. Additionally, Brooks et al. (2001), McCullough and Vinod (1999), and McCullough and Renfro (1998) explore the various numerical issues associated with GARCH log-likelihood maximization. They discuss the impact of starting values, the choice of optimization algorithm, and the convergence criteria in the estimation process. The most popular method of estimating a GARCH model is the quasi-maximum likelihood estimator (QMLE). Various alternative estimators exist, such as an autocorrelation function approach by Baillie and Chung (2001), a GMM estimator by Skoglund (2001), an approximate closed-form method by Kristensen and Linton (2006), and finally a semi-parametric version by Mishra et al. (2010). However, we focus our investigation on the small-sample accuracy of the GARCH model estimated by QMLE, since it remains the most popular method in empirical research.

In this paper, we focus particularly on the downward bias of the estimates of the parameter α in small samples. We conduct a Monte Carlo study in which GARCH(1, 1) is estimated at small sample sizes. We show that a negative α estimate can occur in small samples without violating the positivity constraint on the conditional variances. We also show that a negative α estimate allows the conditional variance sequence to better trace the squared return sequence.

The rest of the paper is structured as follows. Section 1.2 introduces the model and the log-likelihood function. Section 1.3 discusses our Monte Carlo study. Section 1.4 presents some analytical results. Finally, Section 1.5 concludes.

1.2 The model and the log-likelihood function

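We consider the simple GARCH(1, 1) model with random normal innovations, i.e. the data-generating process is of the form

$$\varepsilon_t^2 = \sigma_t^2 \xi_t^2, \qquad \xi_t \overset{iid}{\sim} N(0, 1), \tag{1.2.1}$$

$$\sigma_t^2 = \omega_0 + \alpha_0 \varepsilon_{t-1}^2 + \beta_0 \sigma_{t-1}^2. \tag{1.2.2}$$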
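As an illustration, the data-generating process (1.2.1)-(1.2.2) can be simulated in a few lines. This is a sketch under stated assumptions: the function name `simulate_dgp`, the parameter values, the choice to start the recursion at the unconditional variance ω0/(1 − α0 − β0), and the sample size T = 50 are illustrative and not taken from the text.

```python
import random

def simulate_dgp(T, omega0, alpha0, beta0, seed=0):
    """Draw (eps_t, sigma_t^2, xi_t) for t = 1..T from equations (1.2.1)-(1.2.2)."""
    rng = random.Random(seed)
    sigma2 = omega0 / (1.0 - alpha0 - beta0)   # assumed starting value for sigma_1^2
    eps, sig2, xi = [], [], []
    for _ in range(T):
        z = rng.gauss(0.0, 1.0)                # xi_t ~ iid N(0, 1)
        e = (sigma2 ** 0.5) * z                # eps_t = sigma_t * xi_t, hence (1.2.1)
        eps.append(e)
        sig2.append(sigma2)
        xi.append(z)
        # Next conditional variance from the recursion (1.2.2).
        sigma2 = omega0 + alpha0 * e * e + beta0 * sigma2
    return eps, sig2, xi

# A small sample of the kind considered in the text (hypothetical parameter values).
eps, sig2, xi = simulate_dgp(T=50, omega0=0.05, alpha0=0.10, beta0=0.85)
print(min(sig2), max(sig2))
```

Because the conditional variance is a deterministic function of past returns and the parameters, the only randomness enters through the draws of ξ_t.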

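For the joint density of a return sequence $\{\varepsilon_t\}_{t=1}^{T}$, we can write

$$\begin{aligned}
g(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T) &= g(\varepsilon_T \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{T-1})\, g(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{T-1}) \\
&= g(\varepsilon_T \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{T-1})\, g(\varepsilon_{T-1} \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{T-2})\, g(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{T-2}) \\
&\;\;\vdots \\
&= g(\varepsilon_T \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{T-1})\, g(\varepsilon_{T-1} \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{T-2}) \cdots g(\varepsilon_1).
\end{aligned}$$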

By the assumption $\xi_t \overset{iid}{\sim} N(0, 1)$, $\varepsilon_t \mid (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{t-1})$ is conditionally normally distributed with mean zero and variance $\sigma_t^2$. For the GARCH(1, 1) likelihood function we then have

$$L(\omega, \alpha, \beta \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T) = \frac{1}{\sqrt{2\pi\sigma_T^2}} \exp\left(-\frac{1}{2}\frac{\varepsilon_T^2}{\sigma_T^2}\right) \frac{1}{\sqrt{2\pi\sigma_{T-1}^2}} \exp\left(-\frac{1}{2}\frac{\varepsilon_{T-1}^2}{\sigma_{T-1}^2}\right) \cdots \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{1}{2}\frac{\varepsilon_1^2}{\sigma_1^2}\right). \tag{1.2.3}$$

The assumption $\xi_t \overset{iid}{\sim} N(0, 1)$ is not required, but is made here for ease of argument. If $\xi_t$ is not independently, identically, and normally distributed, then the maximization of the likelihood function in (1.2.3) represents a quasi-maximum likelihood approach. The parameter vector to be estimated is $(\omega, \alpha, \beta)$.

The GARCH(1, 1) likelihood function is a product of normal densities, and it is well defined as long as $\sigma_t^2 > 0$ for $t = 1, 2, \ldots, T$. This means that for a fixed realization of a return sequence $\{\varepsilon_t\}$, some parameters may be estimated at negative values as long as the condition $\sigma_t^2 > 0$ for all $t$ is satisfied. The log-likelihood is

$$\log L(\omega, \alpha, \beta \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T) = -\frac{1}{2}\sum_{t=1}^{T} \log \sigma_t^2 - \frac{1}{2}\sum_{t=1}^{T} \frac{\varepsilon_t^2}{\sigma_t^2} - \frac{T}{2}\log 2\pi.$$

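Define $\log \mathcal{L}(\cdot) := -2 \log L(\cdot) - T \log 2\pi$ to eliminate the additive constant $T \log 2\pi$ and the (negative) scaling constant $-\frac{1}{2}$. The (quasi) maximum likelihood estimation method then solves the minimization problem

$$\operatorname*{arg\,min}_{(\omega, \alpha, \beta)} \log \mathcal{L}(\omega, \alpha, \beta \mid \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T) = \operatorname*{arg\,min}_{(\omega, \alpha, \beta)} \sum_{t=1}^{T} \left( \log \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2} \right). \tag{1.2.4}$$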

Since $\sigma_t^2$ is an unobserved sequence, an initial value has to be chosen for $\sigma_1^2$ to start the recursive construction of $\sigma_t^2$. Commonly, $\sigma_1^2$ is chosen to be the sample mean of $\varepsilon_t^2$, i.e. $\frac{1}{T}\sum \varepsilon_t^2$.
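The objective in (1.2.4), together with the initialization of $\sigma_1^2$ at the sample mean of the squared returns, can be sketched as follows. The function names and the short return series are hypothetical, and the code only evaluates the objective; it does not reproduce the optimizer or settings used in the study.

```python
import math

def garch11_variances(eps, omega, alpha, beta):
    """Build sigma_t^2 recursively, starting from the sample mean of eps_t^2."""
    T = len(eps)
    sigma2 = [sum(e * e for e in eps) / T]     # sigma_1^2 = (1/T) * sum of eps_t^2
    for t in range(1, T):
        sigma2.append(omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2

def objective(eps, omega, alpha, beta):
    """Sum over t of log(sigma_t^2) + eps_t^2 / sigma_t^2, as in (1.2.4)."""
    sigma2 = garch11_variances(eps, omega, alpha, beta)
    if min(sigma2) <= 0.0:
        return math.inf                        # objective undefined unless all sigma_t^2 > 0
    return sum(math.log(s) + e * e / s for e, s in zip(eps, sigma2))

# Made-up returns, for illustration only.
returns = [0.5, -1.2, 0.3, 0.8, -0.1]
print(objective(returns, omega=0.1, alpha=0.10, beta=0.8))
print(objective(returns, omega=0.1, alpha=-0.05, beta=0.8))  # negative alpha, still finite
```

The guard that returns infinity reflects the condition that the likelihood is well defined only when every $\sigma_t^2 > 0$. A negative α is not ruled out as such: in the second call all conditional variances remain positive, so the objective is finite.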

1.2.1 The summand of the log-likelihood function

In Section 1.3, we present a Monte Carlo study that shows that the estimation process degenerates in very small samples, resulting in substantial bias. An important role in this phenomenon is played by the summand of $\log \mathcal{L}$, i.e. $\log \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2}$, for each $t$. The sequence $\varepsilon_t^2$ is given and fixed by the sample, and it can be treated as a sequence of positive real values, whereas $\sigma_t^2 \equiv \sigma_t^2(\omega, \alpha, \beta)$ is the model to be fitted to the $\varepsilon_t^2$ sequence. At $\sigma_t^2 = \varepsilon_t^2$ for all $t$, the log-likelihood function attains a minimum that is, however, not attainable in any situation where the number of parameters that govern $\sigma_t^2$ is lower than the number $T$ of observations of $\varepsilon_t^2$. Asymptotically, the optimality condition is $E(\sigma_t^2) = E(\varepsilon_t^2)$, and this is satisfied in GARCH(1, 1) by the data-generating parameters. When the identity $\sigma_t^2 = \varepsilon_t^2$ does not hold for all $t$, the optimization still tries to trace the $\varepsilon_t^2$ with $\sigma_t^2$ as closely as possible. The set of parameter estimates chosen by the optimizer to minimize the distance of the $\sigma_t^2$ from the $\varepsilon_t^2$ can be very different from the data-generating parameters in small samples.

Since the optimization routine relies on the minimization problem in (1.2.4), we begin by inspecting the function of interest, $\log \mathcal{L}$, to understand the problem. The summand, $\log \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2}$, can be represented by the function $f(x) = \log x + \frac{a}{x}$, where $x > 0$ and $a \in \mathbb{R}^+$ is some positive constant. The function has the following properties:

1. $f(x)$ is uniquely minimized at $x = a$.

2. If $a \neq b$, the functions $\log x + \frac{a}{x}$ and $\log x + \frac{b}{x}$ do not intersect each other anywhere for $x > 0$.

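Proof. We begin by proving the first property. The first-order derivative is

$$f'(x) = \frac{1}{x} - \frac{a}{x^2}, \tag{1.2.5}$$

which has the root $x = a$ when $f'(x) = 0$. To verify that it is the unique minimum, we calculate the second-order derivative, $f''(x) = \frac{2a - x}{x^3}$. Therefore $f''(a) = \frac{1}{a^2} > 0$. Moreover, $f'(x) = \frac{x - a}{x^2}$ is negative for $x < a$ and positive for $x > a$, so the minimum at $x = a$ is global.

Next, we prove the second property. Assume, for contradiction, that the two functions intersect at some (common) point $x > 0$. Then we have $\log x + \frac{a}{x} - \log x - \frac{b}{x} = 0$, i.e. $\frac{a - b}{x} = 0$. This implies that $a - b = 0$ since $x > 0$, but $a - b = 0$ contradicts $a \neq b$.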
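Both properties can also be checked numerically. The sketch below evaluates $f(x) = \log x + a/x$ on a grid; the grid bounds, the grid size, and the values of $a$ and $b$ are arbitrary illustrative choices, and a grid search is of course only a sanity check, not a substitute for the proof.

```python
import math

def f(x, a):
    """Summand of the objective: f(x) = log x + a / x for x > 0 and a > 0."""
    return math.log(x) + a / x

def grid(lo=0.01, hi=10.0, n=20000):
    """Equally spaced points on [lo, hi]."""
    step = (hi - lo) / n
    return [lo + i * step for i in range(n + 1)]

def argmin_on_grid(a, xs):
    """Grid point at which f(., a) is smallest."""
    return min(xs, key=lambda x: f(x, a))

xs = grid()

# Property 1: the minimizer is x = a, with minimum value log(a) + 1.
for a in (0.02, 0.5, 2.0):
    x_star = argmin_on_grid(a, xs)
    print(a, x_star, f(x_star, a), math.log(a) + 1.0)

# Property 2: for a != b the difference f(x, b) - f(x, a) = (b - a) / x never changes sign.
a, b = 0.5, 2.0
never_cross = all(f(x, b) > f(x, a) for x in xs)
print(never_cross)
```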

If $\sigma_t^2 = \varepsilon_t^2$ for all $t$, then the log-likelihood function attains its unique global minimum. However, $\sigma_t^2 = \varepsilon_t^2$ for all $t$ cannot be achieved by any parametric model $\sigma_t(\theta)$, where $\theta$ has $K < T$ parameters. To illustrate, we plot the function $f(\sigma_t^2) = \log \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2}$ for different values of $\varepsilon_t^2$ in Figure 1.2.1, where $x$ and $a$ stand for $\sigma_t^2$ and $\varepsilon_t^2$, respectively. Once a sample is realized, $\varepsilon_t^2$ is fixed, and $\sigma_t^2$ is the variable of interest, which changes in value depending on the model parameters. From Figure 1.2.1, the plotted function has an asymmetric U-shape, where the asymmetry stems from the two terms in its first-order derivative in (1.2.5). The term $-\frac{a}{x^2}$ dominates for $a > x$, whereas the term $\frac{1}{x}$ dominates for $x > a$. At $x = a$ (the minimum point), the first-order derivative is zero. Note that the lines in those two graphs do not intersect each other, consistent with the second property.

Figure 1.2.1: Plots of the function $\log \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2}$ for different fixed values of $\varepsilon_t^2$. For example, the blue line on the left-sided plot plots the function $\log \sigma_t^2 + \frac{0.02}{\sigma_t^2}$ for values of $\sigma_t^2$ on the x-axis. The left-sided plot represents fixed values for $\varepsilon_t^2 < 1$, and the right-sided plot represents fixed values for $\varepsilon_t^2 > 1$.

1.2.2 Three specific log-likelihood values

As noted in Zumbach (2000), in a finite (small) sample, the quantities $\frac{1}{T}\sum \varepsilon_t^2$ and $\frac{1}{T}\sum \sigma_t^2$ may not correspond to their expected values, and the optimizer may therefore return spurious parameter estimates. Zumbach (2000) demonstrates that the identity $\frac{1}{T}\sum \varepsilon_t^2 = \frac{1}{T}\sum \sigma_t^2 = \frac{\omega}{1-\alpha-\beta}$ only holds with a high degree of accuracy in sufficiently large samples. We take a point of departure different from Zumbach (2000) and introduce three specific log-likelihood values of interest:


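1) The (unattainable) minimum of the function, where $\sigma_t^2 = \varepsilon_t^2$ for all $t$:

$$\frac{1}{T}\log L(\sigma_t^2 = \varepsilon_t^2 \mid \varepsilon_t^2) = \frac{1}{T}\sum \log \varepsilon_t^2 + \frac{1}{T}\sum \frac{\varepsilon_t^2}{\varepsilon_t^2} = \frac{1}{T}\sum \log \varepsilon_t^2 + 1 \xrightarrow{p} E(\log \varepsilon_t^2) + 1 \tag{1.2.6}$$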

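2) The value attained by inserting the data-generating parameters:

$$\frac{1}{T}\log L(\sigma_t^2 = \sigma_{0,t}^2 \mid \varepsilon_t^2) = \frac{1}{T}\sum \log \sigma_{0,t}^2 + \frac{1}{T}\sum \frac{\varepsilon_t^2}{\sigma_{0,t}^2} = \frac{1}{T}\sum \log \varepsilon_t^2 - \frac{1}{T}\sum \log \xi_t^2 + \frac{1}{T}\sum \xi_t^2 \xrightarrow{p} E(\log \varepsilon_t^2) - E(\log \xi_t^2) + E(\xi_t^2) \tag{1.2.7}$$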

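3) The value attained by replacing $\sigma_t^2$ with $\overline{\varepsilon^2}$, the sample mean of the $\varepsilon_t^2$-sequence:

$$\frac{1}{T}\log L(\sigma_t^2 = \overline{\varepsilon^2} \mid \varepsilon_t^2) = \frac{1}{T}\sum \log \overline{\varepsilon^2} + \frac{1}{T}\sum \frac{\varepsilon_t^2}{\overline{\varepsilon^2}} = \log \overline{\varepsilon^2} + 1 \xrightarrow{p} \log E(\varepsilon_t^2) + 1 \tag{1.2.8}$$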
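The sample versions of (1.2.6), (1.2.7), and (1.2.8) can be computed directly from a squared-return sequence and a conditional variance path. The sketch below uses a short hypothetical sample and assumed parameters $(\omega, \alpha, \beta) = (1.0, 0.1, 0.5)$; the function name `three_values` is illustrative and not from the thesis.

```python
import math

def three_values(eps2, sigma2_dgp):
    """Return the sample versions of (1.2.6), (1.2.7) and (1.2.8)."""
    T = len(eps2)
    v_min = sum(math.log(e) for e in eps2) / T + 1.0                        # (1.2.6)
    v_dgp = sum(math.log(s) + e / s for e, s in zip(eps2, sigma2_dgp)) / T  # (1.2.7)
    v_mean = math.log(sum(eps2) / T) + 1.0                                  # (1.2.8)
    return v_min, v_dgp, v_mean

# Hypothetical squared returns; the variance path is built from the assumed
# parameters (omega, alpha, beta) = (1.0, 0.1, 0.5), starting at sigma_1^2 = 1.
eps2 = [0.8, 3.1, 0.2, 1.7, 0.05, 2.4]
sigma2_dgp = [1.0]
for e in eps2[:-1]:
    sigma2_dgp.append(1.0 + 0.1 * e + 0.5 * sigma2_dgp[-1])

v_min, v_dgp, v_mean = three_values(eps2, sigma2_dgp)
print(v_min, v_dgp, v_mean)
```

In any finite sample, (1.2.6) is a lower bound for the other two values: pointwise because $\log x + a/x \ge \log a + 1$, and for (1.2.8) by Jensen's inequality. The ordering between (1.2.7) and (1.2.8) is only guaranteed in the limit.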

Asymptotically, the three log-likelihood values introduced above can be ranked from smallest to largest as follows: (1.2.6), (1.2.7), (1.2.8). We show this in Lemma 1.

Lemma 1. If $\xi_t^2$ is i.i.d. chi-squared, then $E(\log \varepsilon_t^2) + 1 \le E(\log \varepsilon_t^2) - E(\log \xi_t^2) + E(\xi_t^2) \le \log E(\varepsilon_t^2) + 1$.

Remark. The difference between the log-likelihood value achieved by the data-generating parameters and the minimal log-likelihood value is $-E(\log \xi_t^2)$, since $E(\xi_t^2) = 1$. $\log \xi_t^2$ follows a log chi-squared distribution (Lee, 2012), and its expectation is $E(\log \xi_t^2) = \psi(1/2) + \log 2 \approx -1.27$, where $\psi$ denotes the digamma function. Even if $\xi_t^2$ is not i.i.d. chi-squared, the result of the lemma still holds because the function $-\log \xi_t^2 + \xi_t^2$ has a minimum value of 1.
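The value $E(\log \xi_t^2) \approx -1.27$ quoted in the remark can be checked numerically. The sketch below compares a Monte Carlo average with the standard closed form for a chi-squared variable with one degree of freedom, $\psi(1/2) + \log 2 = -\gamma_E - \log 2$, where $\gamma_E$ is the Euler-Mascheroni constant (hard-coded here). The seed and the number of draws are arbitrary choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

# Closed form: E[log chi2_1] = psi(1/2) + log 2 = -gamma_E - log 2
closed_form = -EULER_GAMMA - math.log(2.0)

# Monte Carlo check with xi ~ N(0, 1), so that xi^2 ~ chi2_1
rng = random.Random(1)
n_draws = 200_000
mc_estimate = sum(math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n_draws)) / n_draws

print(closed_form, mc_estimate)
```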


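Proof. First, we prove the inequality $E(\log \varepsilon_t^2) + 1 \le E(\log \varepsilon_t^2) - E(\log \xi_t^2) + \underbrace{E(\xi_t^2)}_{=1}$. Canceling common terms, we arrive at $-E(\log \xi_t^2) = -\psi(1/2) - \log 2 \approx 1.27 \ge 0$, where $\psi$ is the digamma function, which is true. To prove the second inequality, we need to show $E(\log \varepsilon_t^2) - E(\log \xi_t^2) + \underbrace{E(\xi_t^2)}_{=1} \le \log E(\varepsilon_t^2) + 1$. Assume it is not true. Then we have $\log E(\varepsilon_t^2) < E(\log \varepsilon_t^2) - E(\log \xi_t^2) = E\left(\log \frac{\varepsilon_t^2}{\xi_t^2}\right) = E(\log \sigma_{0,t}^2)$. The equality $E(\varepsilon_t^2) = E(\sigma_{0,t}^2)$ implies that $\log E(\varepsilon_t^2) = \log E(\sigma_{0,t}^2) \ge E(\log \sigma_{0,t}^2)$ by Jensen's inequality, which contradicts the assumption.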

The result of the lemma indicates that, ideally, we want to select $(\omega, \alpha, \beta)$ such that $\sigma_t^2 = \varepsilon_t^2$ for all $t$ to achieve the minimal log-likelihood value. However, this is not possible because the number of equations ($T$) exceeds the number of parameters when $T > 3$. The natural questions to ask are whether a set of parameters $(\omega, \alpha, \beta)$ exists such that $\sigma_t^2 \approx \varepsilon_t^2$ for all $t$ to a high degree of accuracy, and whether the resulting log-likelihood value is smaller than that of (1.2.7). If so, how different is this set of parameters from the data-generating parameters $(\omega_0, \alpha_0, \beta_0)$? We seek to answer those questions in our Monte Carlo study.

1.3 Monte Carlo study

We explore the small-sample performance of the GARCH(1, 1) model estimation in a Monte Carlo

study. The parameters are estimated using maximum likelihood assuming a normal distribution

of the innovation sequence. There is no closed-form solution for the maximum likelihood estima-

tors for GARCH(1, 1), and consequently numerical methods have to be applied. We follow the

convention of applying a standard gradient-based optimizer, the Newton-Raphson algorithm, for

our study.

1.3.1 Setup

We consider the data-generating process (DGP) of the form:


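$$\varepsilon_t^2 = C_0 + \sigma_t^2 \xi_t^2, \qquad \xi_t \overset{iid}{\sim} N(0, 1),$$

$$\sigma_t^2 = \omega_0 + \alpha_0 \varepsilon_{t-1}^2 + \beta_0 \sigma_{t-1}^2,$$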

which represents a GARCH(1, 1) model with random normal innovations and an intercept term in the mean equation. We adopt the data-generating parameter values used in Lumsdaine (1995). That is, we set $\omega_0 = 1$ and consider three choices of $(\alpha_0, \beta_0)$ (see Table 1.1). For the parameter $C_0$, Lumsdaine used a value of 1, but we choose a value of 0. Selecting $C_0 = 0$ reflects the common practice of demeaning the $\varepsilon_t^2$ sequence before passing it to the optimizing algorithm. When generating the simulated data and during estimation, we follow Lumsdaine's procedure of starting the recursion with $\sigma_1^2 = 1$. We also discard the first 100 observations during the simulation to avoid initial value bias.
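The simulation step described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the thesis code: the function name `simulate_dgp` and its arguments are hypothetical, the recursion starts at $\sigma_1^2 = 1$, and the first 100 draws are discarded as burn-in.

```python
import random

def simulate_dgp(T, omega0, alpha0, beta0, c0=0.0, burn_in=100, seed=None):
    """Simulate (eps_t^2, sigma_t^2) paths of length T from the GARCH(1, 1) DGP."""
    rng = random.Random(seed)
    sigma2 = 1.0                              # recursion started at sigma_1^2 = 1
    eps2_path, sigma2_path = [], []
    for t in range(T + burn_in):
        xi = rng.gauss(0.0, 1.0)              # xi_t ~ N(0, 1)
        eps2 = c0 + sigma2 * xi * xi          # eps_t^2 = C_0 + sigma_t^2 * xi_t^2
        if t >= burn_in:                      # keep only post-burn-in observations
            eps2_path.append(eps2)
            sigma2_path.append(sigma2)
        sigma2 = omega0 + alpha0 * eps2 + beta0 * sigma2
    return eps2_path, sigma2_path

# Example: the first specification (alpha_0, beta_0) = (0.10, 0.50) with T = 200.
eps2_path, sigma2_path = simulate_dgp(200, 1.0, 0.10, 0.50, seed=42)
print(len(eps2_path), min(sigma2_path))
```

With $\omega_0, \alpha_0, \beta_0 > 0$ and $C_0 = 0$, every simulated $\sigma_t^2$ is strictly positive by construction.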

We choose a Newton-Raphson algorithm for our estimation routine (Lumsdaine chose a modified simulated annealing algorithm). Throughout the estimation, we impose the non-negativity and weak stationarity constraints $\omega > 0$, $0 < \alpha < 1$, and $0 < \beta < 1$. The constraints guarantee non-negativity of the conditional variance $\sigma_t^2$ with probability 1 (Nelson and Cao, 1992). However, for a higher-order GARCH(p, q) process, the non-negativity constraints can be relaxed while the non-negativity of the conditional variances is still maintained (He and Teräsvirta, 1999; Tsai and Chan, 2008).

We are interested in sample sizes smaller than the sample size of 500 reported in Lumsdaine (1995), and thus we consider $T = 200, 150, 100, 50, 25$, and $10$. For each scenario, we run 10,000 Monte Carlo replications.

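Additionally, we compute the excess kurtosis of all specifications in this study. Define $\nu = 3\alpha_0^2 + 2\alpha_0\beta_0 + \beta_0^2$. If $\nu < 1$, then the fourth-order moment, $E(\varepsilon_t^4)$, exists (Bollerslev, 1986). Denote the excess kurtosis by

$$\kappa = \left(E(\varepsilon_t^4) - 3E(\varepsilon_t^2)^2\right) E(\varepsilon_t^2)^{-2} = 6\alpha_0^2\left(1 - 3\alpha_0^2 - 2\alpha_0\beta_0 - \beta_0^2\right)^{-1} = 6\alpha_0^2 (1-\nu)^{-1}.$$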


Model         1       2       3
$\alpha_0$    0.10    0.25    0.40
$\beta_0$     0.50    0.65    0.50
$\nu$         0.38    0.94    1.13
$\kappa$      0.10    5.77    -

Table 1.1: Specifications of $(\alpha_0, \beta_0)$ corresponding to models 3, 4, 5 in Lumsdaine (1995) with the corresponding values of $\nu$ and $\kappa$.
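The quantities $\nu$ and $\kappa$ follow directly from the formulas above. The sketch below evaluates them for the three $(\alpha_0, \beta_0)$ pairs of Table 1.1; the function names are illustrative, and `excess_kurtosis` returns `None` when $\nu \ge 1$, i.e. when the fourth moment does not exist.

```python
def nu(alpha0, beta0):
    """Fourth-moment condition: E(eps^4) exists if this value is below 1."""
    return 3 * alpha0**2 + 2 * alpha0 * beta0 + beta0**2

def excess_kurtosis(alpha0, beta0):
    """kappa = 6 * alpha0^2 / (1 - nu), or None if the fourth moment does not exist."""
    v = nu(alpha0, beta0)
    if v >= 1:
        return None
    return 6 * alpha0**2 / (1 - v)

for alpha0, beta0 in [(0.10, 0.50), (0.25, 0.65), (0.40, 0.50)]:
    print(alpha0, beta0, nu(alpha0, beta0), excess_kurtosis(alpha0, beta0))
```

For these three pairs the sketch gives $\nu = 0.38$, $0.935$, and $1.13$, so $\kappa$ is undefined for the third pair.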

A value of $\kappa = 0$ indicates mesokurtosis, a prominent feature of the normal distribution, whereas a value greater than zero indicates leptokurtosis (fat-tailedness). To capture the fat-tailedness of empirical financial data, an alternative to the assumption of normal innovations is the t-distribution, as in Hung et al. (2008), or a skewed version of it, as in Harris et al. (2004).

1.3.2 Results

Table 1.2 reports the sample averages of the parameter estimates over all Monte Carlo replications (10,000 in total). The last reported statistic, labeled BP (short for boundary proportion), is defined as the proportion of cases where the estimated $\alpha$ is less than 0.0001.

Generally speaking, at $T = 200$, all models have sample average estimates that are close to their data-generating parameter values, albeit with a downward bias for $\alpha$ and an upward bias for $\omega$. We find that for the first model, even at $T = 200$, the BP is close to 0.20, meaning that about 20% of the time, the estimated $\alpha$ is very close to its boundary value of zero.

For the second and third models, the BP statistics show few estimates of close-to-zero $\alpha$ at $T = 200$. As the sample size decreases, the BP statistics approach 1 regardless of the specification of $\alpha_0$ and $\beta_0$. The prevalence of close-to-zero $\alpha$ estimates is in line with the pile-up effect noted in Lumsdaine (1995): at small sample sizes, the parameter estimates pile up at their boundary values. The procedure described in Andrews (2001) can be used to test whether the $\alpha$ coefficient lies on the boundary, but the test will be less accurate at small sample sizes.

Table 1.3 reports the results for a number of different specifications of $(\alpha_0, \beta_0)$ with $T$ fixed at 200. Note that at a common empirical estimate, $(0.04, 0.90)$, the BP statistic is close to 0.40. The general finding is that the closer $\alpha_0$ is to zero, the higher the BP statistic. This can be problematic since empirical estimates of GARCH models often display small $\alpha$ estimates. Recall that an estimate of zero for $\alpha$ indicates a constant conditional variance and thus an unidentified GARCH model.

T           200      150      100      50       25       10

Model 1: $\alpha_0 = 0.1$, $\beta_0 = 0.5$
$\omega$    1.011    1.018    0.975    0.927    0.852    0.632
$\alpha$    0.095    0.096    0.094    0.100    0.088    0.048
$\beta$     0.498    0.498    0.517    0.533    0.571    0.689
BP          0.213    0.268    0.371    0.531    0.720    0.908

Model 2: $\alpha_0 = 0.25$, $\beta_0 = 0.65$
$\omega$    1.261    1.378    1.529    1.769    1.715    1.342
$\alpha$    0.232    0.230    0.215    0.183    0.131    0.045
$\beta$     0.629    0.617    0.614    0.619    0.681    0.801
BP          0.018    0.044    0.125    0.383    0.675    0.936

Model 3: $\alpha_0 = 0.4$, $\beta_0 = 0.5$
$\omega$    1.165    1.261    1.332    1.489    1.787    1.541
$\alpha$    0.377    0.372    0.354    0.310    0.242    0.111
$\beta$     0.485    0.473    0.477    0.500    0.537    0.708
BP          0.000    0.008    0.044    0.229    0.486    0.840

Table 1.2: Average parameter estimates and proportion of $\alpha$ estimates close to the boundary (BP). The table reports the average over 10,000 replications for each scenario.

Figures 1.3.1a through 1.3.1f plot representative Monte Carlo runs of $\varepsilon_t^2$-sequences and the (estimated) $\sigma_t^2$-sequences for $T = 200$. Recall that the global minimum of the (negative) log-likelihood function at $\sigma_t^2 = \varepsilon_t^2$ for all $t$ is unattainable, and the optimization algorithm should therefore strive to fit the $\sigma_t^2$-sequence as closely as possible to the $\varepsilon_t^2$-sequence. By visual inspection, models 2, 3, and 6 show a relatively better fit than models 1, 4, and 5. Note from Tables 1.2 and 1.3 that models 1, 4, and 5 have relatively high BP values at $T = 200$: 21.3%, 42.8%, and 22.7%, respectively. The highest BP statistic among models 2, 3, and 6 is only 4%, for model 6.

[Figure 1.3.1: six panels, each titled "Plot of return vs. cond. var sequences of GARCH(1, 1)", with "No. of observations" (0 to 200) on the x-axis and "Sequence value" on the y-axis. Panels: (a) $\alpha_0 = 0.1$, $\beta_0 = 0.5$; (b) $\alpha_0 = 0.25$, $\beta_0 = 0.65$; (c) $\alpha_0 = 0.4$, $\beta_0 = 0.5$; (d) $\alpha_0 = 0.04$, $\beta_0 = 0.9$; (e) $\alpha_0 = 0.09$, $\beta_0 = 0.9$; (f) $\alpha_0 = 0.2$, $\beta_0 = 0.79$.]

Figure 1.3.1: Representative Monte Carlo plots of $\varepsilon_t^2$ and $\sigma_t^2$ for the six different models.

Model                    4               5               6
$(\alpha_0, \beta_0)$    (0.04, 0.90)    (0.09, 0.90)    (0.20, 0.79)
$\omega$                 2.104           4.049           2.247
$\alpha$                 0.038           0.069           0.177
$\beta$                  0.834           0.879           0.775
BP                       0.428           0.227           0.040
$\nu$                    0.887           0.996           1.060
$\kappa$                 0.08            13.14           -

Table 1.3: Average parameter estimates and proportion of $\alpha$ estimates close to the boundary (BP). The table reports the average over 10,000 replications for each scenario. The kurtosis measures $\nu$ and $\kappa$ are calculated from the data-generating parameters and not from the estimates. $T$ is fixed at 200.

Motivated by our previous Monte Carlo results, which showed that the estimated $\alpha$ parameter often lies close to its boundary value, we repeat the same optimization procedure, but this time without the constraints $\omega > 0$ and $0 < \alpha, \beta < 1$. We initialize differently this time, with $\sigma_1^2 = \frac{1}{T}\sum \varepsilon_t^2$ as recommended by Bollerslev (1986). As a robustness check, we also consider the data-generating unconditional variance, $\omega_0/(1 - \alpha_0 - \beta_0)$, as the initial value. We change the data-generating $\omega_0$ from 1 to 0.001 and choose different $(\alpha_0, \beta_0)$ to reflect estimates commonly found in empirical studies, e.g. Bollerslev (1987), Engle et al. (1990), and Baillie and Bollerslev (2002). The small value chosen for $\omega_0$ has a disadvantage that was noted in Ma et al. (2007), who report spurious inference due to a distortion in the size of the t-statistics. Therefore, as a robustness check, we run the same Monte Carlo study with $\omega_0$-values that are larger and well within the interior of the interval $(0, 1)$, and our conclusions remain practically the same.

As before, we consider $M = 10{,}000$ Monte Carlo replications, but with small $T$ of sizes 50, 25, 10, and 5. We also add a group with $T = 10{,}000$ as a control to verify the accuracy of our optimizer in large samples. Because the optimization is now unconstrained, we include a safeguard in our code that checks for negative $\sigma_t^2$ implied by the candidate parameters $(\omega, \alpha, \beta)$. If $\sigma_t^2 < 0$ for any $t$, then the objective function is assigned a large value, and thus the corresponding $(\omega, \alpha, \beta)$ will be avoided by the optimizing algorithm. This safeguard was not needed in the previous Monte Carlo study because the positivity constraints $\omega, \alpha, \beta > 0$ automatically ensured that $\sigma_t^2 > 0$ for all $t$. The optimizer is initialized at the data-generating parameter values. Since we now consider unconstrained optimization, the boundary proportion (BP) statistic is redefined as the proportion of cases where the estimated $\alpha$ is negative. We also report the safeguard proportion (SP) statistic for reference, where the SP is computed as the number of times the safeguard was triggered divided by the number of times the objective function was evaluated.
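The safeguard and the SP statistic described above could be implemented along the following lines. This is a minimal sketch under assumptions: the class name `GuardedObjective` and the penalty value `1e10` are hypothetical (the text only says "a large value"), and the check uses $\sigma_t^2 \le 0$ rather than $\sigma_t^2 < 0$ so that the logarithm is always defined.

```python
import math

PENALTY = 1e10   # hypothetical "large value"; the actual number is not stated

class GuardedObjective:
    """Average negative Gaussian log-likelihood term with a positivity safeguard."""

    def __init__(self, eps2, sigma2_init):
        self.eps2 = list(eps2)
        self.sigma2_init = sigma2_init   # value used for sigma_1^2
        self.calls = 0                   # number of objective evaluations
        self.triggered = 0               # number of evaluations hitting the safeguard

    def __call__(self, omega, alpha, beta):
        self.calls += 1
        sigma2 = self.sigma2_init
        total = 0.0
        for t, e in enumerate(self.eps2):
            if t > 0:
                sigma2 = omega + alpha * self.eps2[t - 1] + beta * sigma2
            if sigma2 <= 0.0:            # safeguard: reject this parameter vector
                self.triggered += 1
                return PENALTY
            total += math.log(sigma2) + e / sigma2
        return total / len(self.eps2)

    @property
    def safeguard_proportion(self):
        """SP: safeguard triggers divided by objective evaluations."""
        return self.triggered / self.calls if self.calls else 0.0

eps2 = [0.020, 0.010, 0.030, 0.005, 0.015]            # hypothetical squared returns
objective = GuardedObjective(eps2, sum(eps2) / len(eps2))
value_ok = objective(0.001, 0.04, 0.90)               # admissible parameters
value_bad = objective(0.001, -5.0, 0.90)              # implies a negative sigma_2^2
print(value_ok, value_bad, objective.safeguard_proportion)
```

An instance of this class can be passed to any unconstrained optimizer; the counters accumulate over all evaluations the optimizer makes.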

Table 1.4 reports the Monte Carlo results. The main finding can be summarized as follows: when $T$ is very small, the estimated $\alpha$ is negative and the estimated $\beta$ is larger than one. This finding is independent of our choice of initialization of the first conditional variance, $\sigma_1^2$, at 1 or at $\omega_0/(1 - \alpha_0 - \beta_0)$. In a majority of cases, the proportion of negative $\alpha$ estimates (BP) decreases as $T$ increases from 5 to 50, and most of these values lie above 80%. The estimates of $\alpha$ generally become less negative as $T$ increases due to the constraint $\sigma_t^2 > 0$ for all $t$. This is explained in Section 1.4 by the $\alpha$ lower bound sequence. At $T = 10{,}000$, the control group, the estimates of $\omega$, $\alpha$, and $\beta$ are close to the data-generating values, as expected from maximum likelihood theory, and the BPs are zero in all cases. The safeguard proportion (SP) is below 10% in all cases.

1.3.3 The likelihood function on very small samples

The findings in Table 1.4 motivate the main research question of this chapter: For sufficiently small $T$, does the unconstrained maximum of the log-likelihood function of the GARCH(1, 1) model lie in the region $\alpha < 0$, $\beta > 1$, rather than at the data-generating parameters and, if so, why?

Consider the data-generating process with $\omega_0 = 0.001$, $\alpha_0 = 0.04$, and $\beta_0 = 0.90$. We simulate the $\{\varepsilon_t\}_{t=1}^{T}$ sequence for three different sample sizes, $T \in \{10, 25, 50\}$. We fix $\omega = \omega_0$, i.e. at the data-generating value, and then estimate $\alpha$ and $\beta$ by maximizing the GARCH(1, 1) log-likelihood. We carry out the optimization in two different ways: first, we constrain $\alpha < 0$ and $\beta > 1$ (region 1); then, we constrain $\alpha > 0$ and $\beta < 1$ (region 2). Figures 1.3.2a through 1.3.2f display, for some representatively chosen simulations and sample sizes $T \in \{10, 25, 50\}$, the realizations of the $\varepsilon_t^2$-sequence, the data-generating $\sigma_t^2$-sequence, and the estimated $\sigma_t^2$-sequences from regions 1 and 2.

T           5          10         25         50         10,000

Model 1: $\alpha_0 = 0.04$, $\beta_0 = 0.90$
$\omega$    0.0085     0.0022     0.0020     0.0026     0.0012
$\alpha$    -2.4039    -0.8531    -0.2739    -0.0846    0.0425
$\beta$     3.6427     1.7070     1.1544     0.9290     0.8661
BP          0.974      0.949      0.937      0.777      0.000
SP          0.070      0.081      0.088      0.079      0.039

Model 2: $\alpha_0 = 0.04$, $\beta_0 = 0.95$
$\omega$    0.2329     0.0134     0.0505     0.0139     0.0007
$\alpha$    -1.0034    -0.0540    -0.0877    -0.0361    0.0404
$\beta$     2.2176     1.2307     1.1069     0.9701     0.9383
BP          0.933      0.916      0.901      0.844      0.000
SP          0.063      0.079      0.096      0.093      0.043

Model 3: $\alpha_0 = 0.09$, $\beta_0 = 0.90$
$\omega$    0.0760     0.0361     0.0241     0.0401     0.0013
$\alpha$    -0.0232    -0.0580    -0.0269    -0.0454    0.0882
$\beta$     1.8807     1.3031     1.1347     0.8196     0.8965
BP          0.952      0.896      0.875      0.860      0.000
SP          0.063      0.078      0.092      0.081      0.043

Model 4: $\alpha_0 = 0.02$, $\beta_0 = 0.97$
$\omega$    -0.0735    0.0094     0.0092     0.0064     0.0011
$\alpha$    -2.6903    -0.9870    -0.3466    -0.1691    0.0200
$\beta$     4.0911     1.8576     1.2496     1.1059     0.9690
BP          0.893      0.962      0.984      0.957      0.000
SP          0.062      0.079      0.097      0.096      0.041

Model 5: $\alpha_0 = 0.05$, $\beta_0 = 0.93$
$\omega$    -0.0314    0.0052     0.0053     0.0061     0.0010
$\alpha$    -2.6977    -0.9197    -0.3085    -0.1064    0.0500
$\beta$     3.9642     1.8074     1.2076     0.9880     0.9291
BP          0.888      0.956      0.964      0.817      0.000
SP          0.064      0.079      0.091      0.084      0.040

Model 6: $\alpha_0 = 0.05$, $\beta_0 = 0.84$
$\omega$    -0.0039    0.0015     0.0015     0.0023     0.0010
$\alpha$    -2.2896    -0.7597    -0.2038    -0.0376    0.0504
$\beta$     3.4741     1.6106     1.0417     0.7938     0.8364
BP          0.872      0.942      0.843      0.663      0.000
SP          0.070      0.080      0.083      0.068      0.049

Table 1.4: Average parameter estimates (estimated without parameter constraints), proportions of negative $\alpha$ estimates (BP), and the number of times the estimation safeguard was triggered divided by the number of times the objective function was called (SP) for the six different models. The table reports the average over 10,000 replications for each scenario.

The purpose of the plots in Figures 1.3.2a through 1.3.2f is to provide graphical examples of (1.2.6) from Subsection 1.2.2 at work for different small sample sizes $T$. Specifically, the green and blue sequences in the plots show that $\sigma_t^2$ from region 1 (green) approximates $\varepsilon_t^2$ (blue). Recall from (1.2.6) that the optimal log-likelihood value is achieved when $\sigma_t^2 = \varepsilon_t^2$. However, as mentioned previously, this is not possible because the number of equations ($T$) exceeds the number of parameters when $T > 3$, which is the case for our chosen values of $T$ in those examples. As is evident from those plots, when $T$ increases, $\sigma_t^2$ from region 1 still tries to approximate $\varepsilon_t^2$, but the quality of the approximation deteriorates due to the increasing number of equations ($T$) to fulfill. We also plot the $\sigma_t^2$ sequence from region 2 (red) to compare it directly with the graphical behavior of the green sequence. Across all values of $T$ considered, we find that the red sequence is approximately constant, and it approximates the sample unconditional variance, $\frac{1}{T}\sum \varepsilon_t^2$, which is represented by the purple sequence.

To conclude, when the log-likelihood function is maximized over the two different regions, we find that the optimal solution from region 1 tries to fit $\sigma_t^2$ as closely as possible to $\varepsilon_t^2$, whereas the optimal solution from region 2 tries to fit $\sigma_t^2$ close to the sample unconditional variance, $\frac{1}{T}\sum \varepsilon_t^2$. The maximal log-likelihood value found in region 1 is always larger than that found in region 2.

To further examine the behavior of the log-likelihood function, we prepare plots of average (negative) log-likelihood surfaces on the space of $\alpha$ and $\beta$. We generate 10,000 replications of $\varepsilon_t^2$-sequences from a GARCH(1, 1) process with a given parameter vector $(\omega_0, \alpha_0, \beta_0)$. We then fix $\omega = \omega_0$ and calculate the log-likelihood of $(\omega_0, \alpha_i, \beta_i)$ for a grid of points $(\alpha_i, \beta_i)$ from regions 1 and 2, respectively. We do this for each of the 10,000 realizations and average the log-likelihood values for each grid point. We consider the sample sizes $T = 5$ in Figures 1.3.3 and 1.3.4 and $T = 1000$ in Figures 1.3.5 and 1.3.6. Since we multiply the log-likelihood value by $-1$, we are interested in the minimum of the surface values. Also, we cap the surface values at 1 for visibility.

[Figure 1.3.2: six panels with the observation index on the x-axis and "Sequence value" on the y-axis. Panels: (a) T = 10, L1 = 20.16, L2 = 18.83; (b) T = 10, L1 = 16.19, L2 = 13.88; (c) T = 25, L1 = 48.73, L2 = 46.33; (d) T = 25, L1 = 29.57, L2 = 28.84; (e) T = 50, L1 = 87.51, L2 = 86.82; (f) T = 50, L1 = 67.57, L2 = 64.03.]

Figure 1.3.2: Each graph has five sequences plotted: the blue sequence is $\varepsilon_t^2$, the green represents the fitted $\sigma_t^2$ sequence from the solution obtained in region 1, and the red shows the fitted $\sigma_t^2$ sequence from region 2. The light blue sequence is the data-generating $\sigma_t^2$ sequence, and the purple sequence is the sample mean of $\varepsilon_t^2$. L1 and L2 are the fitted log-likelihood values from regions 1 and 2, respectively.

[Figures 1.3.3 to 1.3.6: each figure has two surface panels, (a) Region 1 and (b) Region 2, with $-\log L$ on the vertical axis.]

Figure 1.3.3: Average surface plots at $T = 5$, $\omega_0 = 0.001$, $\alpha_0 = 0.04$, $\beta_0 = 0.90$. The average (over 10,000 replications) negative log-likelihood values are plotted on the vertical axis along grids of $\alpha$ and $\beta$ on the horizontal axes.

Figure 1.3.4: Average surface plots at $T = 5$, $\omega_0 = 0.001$, $\alpha_0 = 0.04$, $\beta_0 = 0.95$. The average (over 10,000 replications) negative log-likelihood values are plotted on the vertical axis along grids of $\alpha$ and $\beta$ on the horizontal axes.

Figure 1.3.5: Average surface plots at $T = 1000$, $\omega_0 = 0.001$, $\alpha_0 = 0.04$, $\beta_0 = 0.90$. The average (over 10,000 replications) negative log-likelihood values are plotted on the vertical axis along grids of $\alpha$ and $\beta$ on the horizontal axes.

Figure 1.3.6: Average surface plots at $T = 1000$, $\omega_0 = 0.001$, $\alpha_0 = 0.04$, $\beta_0 = 0.95$. The average (over 10,000 replications) negative log-likelihood values are plotted on the vertical axis along grids of $\alpha$ and $\beta$ on the horizontal axes.

The figures show that the minimum of the (negative) log-likelihood function is obtained in region 1, that is, in the region where $\alpha < 0$ and $\beta > 1$. Motivated by the observations made so far in the Monte Carlo study, we consider a hypothetical question: If a monotonically increasing sequence of $\varepsilon_t^2$ is encountered, are $\alpha < 0$ and $\beta > 1$ still needed for $\sigma_t^2$ to trace $\varepsilon_t^2$? We run a small Monte Carlo study to investigate. Let the data-generating parameters be $\omega_0 = 0.001$, $\alpha_0 = 0.04$, and $\beta_0 = 0.90$. We simulate sequences of $\varepsilon_t^2$ until we have 1,000 monotonically increasing sequences for each $T \in \{5, 6, 7, 8, 9\}$. Table 1.5 reports the proportion of negative $\alpha$ estimates among these 1,000 sequences for each $T$ considered. As it turns out, when a monotonically increasing sequence is encountered, negative $\alpha$ estimates are no longer common for small $T$. We conjecture that when a sequence has a downward movement, i.e. $\varepsilon_t^2 - \varepsilon_{t-1}^2 < 0$ for some $t$, a negative $\alpha$ estimate helps to improve the log-likelihood value.

T                     5        6        7        8        9
Prop. $\alpha < 0$    0.011    0.001    0.000    0.000    0.000

Table 1.5: The proportion of negative $\alpha$ estimates out of 1,000 monotonically increasing sequences of $\varepsilon_t^2$ for different sample sizes $T$.

Consider the case of initialization with $\sigma_1^2 = \frac{1}{T}(\varepsilon_1^2 + \dots + \varepsilon_T^2) = \overline{\varepsilon^2}$. With $T = 5$, we have

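$$\begin{aligned}
\sigma_2^2 &= \omega + \alpha\varepsilon_1^2 + \beta\overline{\varepsilon^2} \\
\sigma_3^2 &= \omega(1+\beta) + \alpha(\varepsilon_2^2 + \beta\varepsilon_1^2) + \beta^2\overline{\varepsilon^2} \\
\sigma_4^2 &= \omega(1+\beta+\beta^2) + \alpha(\varepsilon_3^2 + \beta\varepsilon_2^2 + \beta^2\varepsilon_1^2) + \beta^3\overline{\varepsilon^2} \\
\sigma_5^2 &= \omega(1+\beta+\beta^2+\beta^3) + \alpha(\varepsilon_4^2 + \beta\varepsilon_3^2 + \beta^2\varepsilon_2^2 + \beta^3\varepsilon_1^2) + \beta^4\overline{\varepsilon^2}.
\end{aligned}$$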

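Their differences are given as

$$\begin{aligned}
\sigma_3^2 - \sigma_2^2 &= \omega\beta + \alpha\varepsilon_2^2 + \alpha(\beta-1)\varepsilon_1^2 + \beta(\beta-1)\overline{\varepsilon^2} \\
\sigma_4^2 - \sigma_3^2 &= \omega\beta^2 + \alpha\varepsilon_3^2 + \alpha(\beta-1)\varepsilon_2^2 + \alpha\beta(\beta-1)\varepsilon_1^2 + \beta^2(\beta-1)\overline{\varepsilon^2} \\
\sigma_5^2 - \sigma_4^2 &= \omega\beta^3 + \alpha\varepsilon_4^2 + \alpha(\beta-1)\varepsilon_3^2 + \alpha\beta(\beta-1)\varepsilon_2^2 + \alpha\beta^2(\beta-1)\varepsilon_1^2 + \beta^3(\beta-1)\overline{\varepsilon^2}
\end{aligned}$$

and can be written more generally as

$$\sigma_t^2 - \sigma_{t-1}^2 = \omega\beta^{t-2} + \alpha\varepsilon_{t-1}^2 + \alpha(\beta-1)\sum_{j=2}^{t-1}\beta^{t-1-j}\varepsilon_{j-1}^2 + \beta^{t-2}(\beta-1)\overline{\varepsilon^2}, \quad \text{for } t \ge 3. \tag{1.3.1}$$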
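Equation (1.3.1) can be verified numerically against the GARCH(1, 1) recursion. The sketch below uses a hypothetical squared-return sequence and illustrative parameter values; the helper names are not from the thesis.

```python
def sigma2_path(eps2, omega, alpha, beta):
    """Recursion sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2,
    started at sigma_1^2 = sample mean of eps_t^2. Index 0 holds sigma_1^2."""
    path = [sum(eps2) / len(eps2)]
    for t in range(1, len(eps2)):
        path.append(omega + alpha * eps2[t - 1] + beta * path[-1])
    return path

def closed_form_difference(eps2, omega, alpha, beta, t):
    """Right-hand side of (1.3.1) for a 1-based time index t >= 3."""
    mean_eps2 = sum(eps2) / len(eps2)

    def e(i):
        return eps2[i - 1]          # 1-based access to eps_i^2

    tail = sum(beta ** (t - 1 - j) * e(j - 1) for j in range(2, t))  # j = 2, ..., t-1
    return (omega * beta ** (t - 2)
            + alpha * e(t - 1)
            + alpha * (beta - 1) * tail
            + beta ** (t - 2) * (beta - 1) * mean_eps2)

eps2 = [0.020, 0.010, 0.030, 0.005, 0.015]    # hypothetical squared returns, T = 5
omega, alpha, beta = 0.001, -0.25, 1.1        # illustrative values
path = sigma2_path(eps2, omega, alpha, beta)

for t in range(3, len(eps2) + 1):
    recursion_diff = path[t - 1] - path[t - 2]
    print(t, recursion_diff, closed_form_difference(eps2, omega, alpha, beta, t))
```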

Let $I_t = \{\varepsilon_t^2, \varepsilon_{t-1}^2, \dots, \varepsilon_0^2\}$. Take the same data-generating parameters $\omega_0 = 0.001$, $\alpha_0 = 0.04$, and $\beta_0 = 0.90$ with $T = 5$. At each $t$, we compute the value of $\varepsilon_t^2 - \varepsilon_{t-1}^2$. We are interested in the case where $\varepsilon_t^2 - \varepsilon_{t-1}^2$ is negative, i.e. there is a downward movement (about 50% of the time, see Table 1.6). If $\varepsilon_t^2 - \varepsilon_{t-1}^2$ is negative, we substitute different values of $\alpha$ and $\beta$ into (1.3.1). Table 1.7 presents the different values of $\alpha$ and $\beta$ and the estimate of $P(\sigma_t^2 - \sigma_{t-1}^2 < 0 \mid I_t)$, which is computed as the number of times $\sigma_t^2 - \sigma_{t-1}^2$ is negative divided by the total number of $\varepsilon_t^2$ sequences simulated (10,000 in our example). From Table 1.7, we see that the positive parameters (first two columns) produce a downward movement of $\sigma_t^2$, i.e. $\sigma_t^2 - \sigma_{t-1}^2 < 0$, at most 30% of the time. The first column contains the data-generating parameters. In contrast, if we allow for $\alpha < 0$ and $\beta > 1$, then $\sigma_t^2 - \sigma_{t-1}^2 < 0$ can be produced more frequently. The negative values of $\alpha$ in the last two columns are chosen in this example such that they do not produce a negative $\sigma_t^2$ for any $t$. In other words, they satisfy the $\alpha$ lower bound inequalities presented later in Section 1.4.

T \ t    2        3        4        5        6        7        8        9
5        49.93    49.97    50.02    49.98
6        49.91    50.04    50.02    49.92    50.04
7        49.99    49.99    49.97    49.94    49.99    50.02
8        49.99    49.98    49.99    49.97    49.98    49.99    49.99
9        49.89    50.00    50.03    49.92    50.09    49.90    49.94    50.02

Table 1.6: The table reports the probability estimates of a downward movement in the squared return sequence, $P(\varepsilon_t^2 - \varepsilon_{t-1}^2 < 0)$, in percent for different sample sizes $T$ at each $t$. The probability estimate is computed as the number of times the squared return difference is negative divided by 10,000, the number of replications.

t \ $(\alpha, \beta)$    (0.04, 0.9)    (0.1, 0.85)    (-0.25, 1.1)    (-0.3, 1.3)
3                        26.09          25.95          74.19           54.33
4                        26.68          28.00          76.61           52.08
5                        27.24          29.98          79.28           45.06

Table 1.7: The table reports the probability estimates of a downward movement in the conditional variance, $P(\sigma_t^2 - \sigma_{t-1}^2 < 0 \mid I_{t-1})$, in percent for $T = 5$, at each $t$ for different $(\alpha, \beta)$. The value of $\omega$ is fixed at 0.001 across all columns. The probability estimate is computed as the number of times the conditional variance difference is negative divided by 10,000, the number of replications.
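The requirement stated above, that a negative $\alpha$ must not produce a negative $\sigma_t^2$, can be checked for a given sample. The sketch below computes the sequence $\gamma_t$ from the lower bound inequalities of Section 1.4 using the decomposition $\sigma_t^2 = N_t + \alpha D_t$; the symbols $N_t$ and $D_t$, the function names, and the sample values are illustrative assumptions, and the initialization $\sigma_1^2 = \overline{\varepsilon^2}$ is used.

```python
def gamma_sequence(eps2, omega, beta):
    """Return [gamma_2, ..., gamma_T] for the initialization sigma_1^2 = mean(eps2).

    Writing sigma_t^2 = N_t + alpha * D_t, positivity of sigma_t^2 is
    equivalent to alpha > -N_t / D_t = -gamma_t (valid when D_t > 0).
    """
    num = sum(eps2) / len(eps2)           # N_1 = sample mean of eps_t^2
    den = 0.0                             # D_1 = 0
    gammas = []
    for t in range(2, len(eps2) + 1):
        num = omega + beta * num          # N_t = omega + beta * N_{t-1}
        den = eps2[t - 2] + beta * den    # D_t = eps_{t-1}^2 + beta * D_{t-1}
        gammas.append(num / den)
    return gammas

def sigma2_path(eps2, omega, alpha, beta):
    """GARCH(1, 1) variance recursion started at the sample mean of eps_t^2."""
    path = [sum(eps2) / len(eps2)]
    for t in range(1, len(eps2)):
        path.append(omega + alpha * eps2[t - 1] + beta * path[-1])
    return path

eps2 = [0.020, 0.010, 0.030, 0.005, 0.015]   # hypothetical squared returns
omega, beta = 0.001, 1.1
alpha_bound = -min(gamma_sequence(eps2, omega, beta))   # alpha must exceed this value
print(alpha_bound, min(sigma2_path(eps2, omega, -0.25, beta)))
```

Any $\alpha$ above `alpha_bound` keeps the whole $\sigma_t^2$ path positive for this sample, while any $\alpha$ below it makes at least one $\sigma_t^2$ negative.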

1.3.4 Probability of negative differences in $\sigma_t^2$

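We provide a numerical example to illustrate why $\alpha < 0$ and $\beta > 1$ improve the log-likelihood by allowing for more flexibility in matching a situation $\varepsilon_t^2 - \varepsilon_{t-1}^2 < 0$ with a corresponding $\sigma_t^2 - \sigma_{t-1}^2 < 0$. Recall that

$$\sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2 = \omega + \alpha(\sigma_{t-1}^2\xi_{t-1}^2) + \beta\sigma_{t-1}^2,$$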


so that the first difference becomes

$$\sigma_t^2 - \sigma_{t-1}^2 = \omega + (\alpha\xi_{t-1}^2 + \beta - 1)\sigma_{t-1}^2.$$

If $\sigma_t^2 - \sigma_{t-1}^2 < 0$, then $\alpha\xi_{t-1}^2 + \beta - 1 < 0$. The converse is not true, however, since $\omega > 0$ can overcompensate for the negativity of $(\alpha\xi_{t-1}^2 + \beta - 1)\sigma_{t-1}^2$. Therefore, $\alpha\xi_{t-1}^2 + \beta - 1 < 0$ is a necessary but not sufficient condition for $\sigma_t^2 - \sigma_{t-1}^2 < 0$.

If $\alpha\xi_{t-1}^2 + \beta - 1 > 0$, then $\sigma_t^2 - \sigma_{t-1}^2 > 0$ for $\omega > 0$, meaning that $P(\sigma_t^2 - \sigma_{t-1}^2 < 0) = 0$. In other words, $\alpha\xi_{t-1}^2 + \beta - 1 > 0$ is a sufficient condition to guarantee $P(\sigma_t^2 - \sigma_{t-1}^2 < 0) = 0$, assuming $\omega > 0$.

Denote the events

$$A: \ \omega + (\alpha\xi_{t-1}^2 + \beta - 1)\sigma_{t-1}^2 < 0, \quad \text{and} \quad B: \ (\alpha\xi_{t-1}^2 + \beta - 1)\sigma_{t-1}^2 < 0.$$

For $\omega > 0$, we have $\omega + (\alpha\xi_{t-1}^2 + \beta - 1)\sigma_{t-1}^2 > (\alpha\xi_{t-1}^2 + \beta - 1)\sigma_{t-1}^2$. Therefore, $A \subset B$, implying that $P(A) \le P(B)$. Note that event $A$ is the event of interest, i.e. $\sigma_t^2 - \sigma_{t-1}^2 < 0$. We now see that $0 \le P(A) \le P(B)$, where $P(B)$ is an upper bound on our probability of interest. For example, let the true parameters be $(\alpha_0, \beta_0) = (0.09, 0.90)$. Note that

$$P(B) = P\left((\alpha\xi_{t-1}^2 + \beta - 1)\sigma_{t-1}^2 < 0\right) = P(\alpha\xi_{t-1}^2 + \beta - 1 < 0),$$

where the last equality is due to $\sigma_{t-1}^2 > 0$. Assuming $\xi_t \sim N(0, 1)$, which implies $\xi_t^2 \sim \chi_1^2$, we obtain

$$P(\alpha\xi_{t-1}^2 + \beta - 1 < 0) = P\left(\xi_{t-1}^2 < \frac{1-\beta}{\alpha}\right) \approx 0.71.$$

Now pick some $\alpha < 0$ and $\beta > 1$, e.g. $(\alpha, \beta) = (-1, 1.01)$. Following the previous calculation and remembering to flip the inequality sign because $\alpha < 0$, we obtain

$$P(\alpha\xi_{t-1}^2 + \beta - 1 < 0) = P\left(\xi_{t-1}^2 > \frac{1-\beta}{\alpha}\right) \approx 0.92.$$
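Both probabilities can be reproduced from the distribution function of a chi-squared variable with one degree of freedom, $P(\chi_1^2 < x) = \operatorname{erf}(\sqrt{x/2})$. The following is a minimal sketch; the function names are illustrative.

```python
import math

def chi2_1_cdf(x):
    """CDF of a chi-squared variable with one degree of freedom."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0

def prob_b(alpha, beta):
    """P(alpha * xi^2 + beta - 1 < 0) for xi ~ N(0, 1)."""
    if alpha == 0:
        return 1.0 if beta < 1 else 0.0
    threshold = (1.0 - beta) / alpha
    if alpha > 0:
        return chi2_1_cdf(threshold)         # xi^2 < (1 - beta) / alpha
    return 1.0 - chi2_1_cdf(threshold)       # inequality flips for alpha < 0

p_positive = prob_b(0.09, 0.90)    # data-generating example from the text
p_negative = prob_b(-1.0, 1.01)    # example with alpha < 0 and beta > 1
print(round(p_positive, 2), round(p_negative, 2))
```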


To conclude, choosing $\alpha < 0$ and $\beta > 1$ suitably raises the upper bound $P(B)$ and thereby allows more probability mass for $P(A)$. This illustrates the phenomenon that the parameter space in the infeasible region has more flexibility in tracking a downward movement in the squared returns, i.e. $\varepsilon_t^2 - \varepsilon_{t-1}^2 < 0$, with a corresponding downward movement in $\sigma_t^2$.

1.4 Fitting the conditional variance to the squared returns

1.4.1 Bounds and extrema

As we have seen, as $T$ grows, the estimates of $\alpha$ and $\beta$ move into the meaningful region where $\alpha \ge 0$, $0 < \beta < 1$, and $\alpha + \beta \le 1$, and eventually they converge to their data-generating values. In this section, we derive a lower bound for $\alpha$ such that the implied conditional variance remains positive, and we thus show that, within this bound, a negative value of $\alpha$ is actually feasible. The first-order optimality condition $\varepsilon_t^2 = \sigma_t^2$ for all $t$ cannot be satisfied by a reasonable parametric model for $\sigma_t^2$. Instead, we seek a set $\theta$ of parameters that minimizes the (negative) log-likelihood term $\sum_t (\log \sigma_t^2 + \varepsilon_t^2/\sigma_t^2)$. Loosely speaking, the optimization strives to trace the $\varepsilon_t^2$-sequence with the $\sigma_t^2$-sequence as closely as possible by varying the parameters $\omega$, $\alpha$, and $\beta$. At the same time, $\omega$, $\alpha$, and $\beta$ must be chosen such that $\sigma_t^2 > 0$ for all $t$. A sufficient condition to guarantee the positivity of $\sigma_t^2$ for all $t$ is $\omega > 0$, $\alpha > 0$, and $\beta > 0$. Furthermore, $\alpha + \beta < 1$ guarantees weak stationarity of the GARCH(1, 1) process (Bollerslev, 1986).

In the remainder of this section, we consider limitations of the GARCH(1, 1) maximum likelihood estimators in fitting the $\sigma_t^2$-sequence to the $\varepsilon_t^2$-sequence. We consider the initialization with $\sigma_1^2 = \frac{1}{T}(\varepsilon_1^2 + \dots + \varepsilon_T^2) = \overline{\varepsilon^2}$ as suggested by Bollerslev (1986). Other initializations can be treated analogously with $\overline{\varepsilon^2}$ being replaced by e.g. 1, as done in Lumsdaine (1995).
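As a concrete illustration of the recursion and the objective discussed above, the following minimal sketch computes $\sigma_t^2$ from a squared-return series with the sample-mean initialization and evaluates $\sum_t (\log \sigma_t^2 + \varepsilon_t^2/\sigma_t^2)$. The function names and the sample values are assumptions made for illustration only; returning infinity when some $\sigma_t^2 \le 0$ is one possible convention, not necessarily the one used in the original estimation.

```python
import math


def garch11_variance(eps2, omega, alpha, beta, init=None):
    """Conditional variances sigma_1^2, ..., sigma_T^2 from squared returns eps2.

    sigma_1^2 is the sample mean of eps2 unless `init` is supplied
    (e.g. init=1.0 for the alternative initialization).
    """
    sigma2 = [sum(eps2) / len(eps2) if init is None else init]
    for t in range(1, len(eps2)):
        # sigma_{t+1}^2 = omega + alpha * eps_t^2 + beta * sigma_t^2
        sigma2.append(omega + alpha * eps2[t - 1] + beta * sigma2[t - 1])
    return sigma2


def objective(eps2, omega, alpha, beta):
    """Sum of log(sigma_t^2) + eps_t^2 / sigma_t^2; inf if any sigma_t^2 <= 0."""
    sigma2 = garch11_variance(eps2, omega, alpha, beta)
    if min(sigma2) <= 0.0:
        return math.inf
    return sum(math.log(s) + e / s for s, e in zip(sigma2, eps2))


eps2 = [0.04, 0.01, 0.02, 0.07, 0.03]  # made-up squared returns
sigma2 = garch11_variance(eps2, omega=0.005, alpha=0.1, beta=0.8)
value = objective(eps2, omega=0.005, alpha=0.1, beta=0.8)
```

Note that the objective does not require $\alpha > 0$; it only requires every $\sigma_t^2$ to be positive, which is the point developed in the next subsection.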

1.4.1.1 The α lower bound sequence

Recall that in Section 1.2, the GARCH(1, 1) likelihood function is a product of normal density functions, and it is well defined as long as $\sigma_t^2 > 0$ for $t = 1, 2, \dots, T$.

Similar to the presentation in Section 1.2, we consider the case of initialization with $\sigma_1^2 = \frac{1}{T}(\varepsilon_1^2 + \dots + \varepsilon_T^2) = \overline{\varepsilon^2}$. We have for $t \in \{2, \dots, T\}$

$$\begin{aligned}
\sigma_2^2 &= \omega + \alpha\varepsilon_1^2 + \beta\,\overline{\varepsilon^2} \\
\sigma_3^2 &= \omega(1 + \beta) + \alpha(\varepsilon_2^2 + \beta\varepsilon_1^2) + \beta^2\,\overline{\varepsilon^2} \\
\sigma_4^2 &= \omega(1 + \beta + \beta^2) + \alpha(\varepsilon_3^2 + \beta\varepsilon_2^2 + \beta^2\varepsilon_1^2) + \beta^3\,\overline{\varepsilon^2} \\
&\;\;\vdots \\
\sigma_T^2 &= \omega(1 + \beta + \dots + \beta^{T-2}) + \alpha(\varepsilon_{T-1}^2 + \beta\varepsilon_{T-2}^2 + \dots + \beta^{T-2}\varepsilon_1^2) + \beta^{T-1}\,\overline{\varepsilon^2}.
\end{aligned}$$

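Because the initialization $\overline{\varepsilon^2}$ is positive, $\sigma_1^2 > 0$ is always satisfied regardless of the choice of $\omega$, $\alpha$, and $\beta$. For $t \in \{2, \dots, T\}$, the condition $\sigma_t^2 > 0$ means that the following inequalities must hold:

$$\begin{aligned}
\alpha &> -\frac{\omega + \beta\,\overline{\varepsilon^2}}{\varepsilon_1^2} =: -\gamma_2 \\
\alpha &> -\frac{\omega(1 + \beta) + \beta^2\,\overline{\varepsilon^2}}{\varepsilon_2^2 + \beta\varepsilon_1^2} =: -\gamma_3 \\
\alpha &> -\frac{\omega(1 + \beta + \beta^2) + \beta^3\,\overline{\varepsilon^2}}{\varepsilon_3^2 + \beta\varepsilon_2^2 + \beta^2\varepsilon_1^2} =: -\gamma_4 \\
&\;\;\vdots \\
\alpha &> -\frac{\omega(1 + \beta + \dots + \beta^{T-2}) + \beta^{T-1}\,\overline{\varepsilon^2}}{\varepsilon_{T-1}^2 + \beta\varepsilon_{T-2}^2 + \dots + \beta^{T-2}\varepsilon_1^2} =: -\gamma_T.
\end{aligned}$$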

All inequalities must hold simultaneously, thus $\alpha > \max\{-\gamma_t\}_{t=2}^T$. We call $\{-\gamma_t\}_{t=2}^T$ the $\alpha$ lower bound sequence. Clearly, by inspection, if $\omega, \beta > 0$, the sequence $-\gamma_t$ is negative for all $t \in \{2, \dots, T\}$. The $\alpha$ lower bound sequence is neither monotonically increasing nor decreasing, but we can show that it is a Cauchy sequence (see Proposition 1), and therefore it converges to some real number. Unfortunately, we do not know the limit since the sequence is a random sequence depending on the realized values of $\{\varepsilon_t^2\}$.
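The lower bound sequence is straightforward to compute for a given sample. The sketch below, with illustrative function names and made-up data, evaluates $-\gamma_t$ for $t = 2, \dots, T$ by updating the geometric sum and the weighted sum of lagged squared returns recursively, and then confirms that a value of $\alpha$ just above $\max_t(-\gamma_t)$ keeps every $\sigma_t^2$ positive while a value just below does not. It assumes strictly positive squared returns so that the denominators are positive.

```python
def alpha_lower_bound(eps2, omega, beta):
    """Return (max_t(-gamma_t), list of -gamma_t) for t = 2, ..., T."""
    T = len(eps2)
    mean_eps2 = sum(eps2) / T
    geo = 0.0       # 1 + beta + ... + beta^(t-2)
    weighted = 0.0  # eps2_{t-1} + beta * eps2_{t-2} + ... + beta^(t-2) * eps2_1
    neg_gammas = []
    for t in range(2, T + 1):
        geo = 1.0 + beta * geo
        weighted = eps2[t - 2] + beta * weighted
        gamma_t = (omega * geo + beta ** (t - 1) * mean_eps2) / weighted
        neg_gammas.append(-gamma_t)
    return max(neg_gammas), neg_gammas


def garch11_variance(eps2, omega, alpha, beta):
    """sigma_t^2 for t = 1, ..., T with sigma_1^2 set to the sample mean."""
    sigma2 = [sum(eps2) / len(eps2)]
    for t in range(1, len(eps2)):
        sigma2.append(omega + alpha * eps2[t - 1] + beta * sigma2[t - 1])
    return sigma2


eps2 = [0.04, 0.01, 0.02, 0.07, 0.03]  # made-up squared returns
omega, beta = 0.005, 0.8
bound, neg_gammas = alpha_lower_bound(eps2, omega, beta)

just_above = garch11_variance(eps2, omega, bound + 1e-6, beta)
just_below = garch11_variance(eps2, omega, bound - 1e-3, beta)
print(bound, min(just_above) > 0, min(just_below) > 0)
```

In this example the bound is negative, so a range of negative $\alpha$ values yields a well-defined likelihood.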

We only impose the condition $\sigma_t^2 > 0$ for all $t$ in $\{2, 3, \dots, T\}$, which is needed to derive the lower-bound result, but we do not impose an upper limit on $\sigma_t^2$, i.e. we allow $\sigma_T^2 \to \infty$ or $\sigma_T^2 < \infty$ as $T \to \infty$. Without an upper limit, we do not derive an upper-bound result to work with.

Proposition 1. The sequence $\{-\gamma_t\}_{t=2}^T$, a lower bound sequence for $\alpha$, is a Cauchy sequence for any $\omega \in \mathbb{R}$ and $\beta > 1$ as $T \to \infty$, if $\varepsilon_t^2/\varepsilon_N^2 < \infty$ for all $t$.


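Proof. Fix $\omega \in \mathbb{R}$ and $\beta > 1$. Write

$$\gamma_t = \frac{\omega(1 + \beta + \dots + \beta^{t-2}) + \beta^{t-1}\,\overline{\varepsilon^2}}{\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2} = \frac{1 + \beta + \dots + \beta^{t-2}}{\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2}\,\omega + \frac{\beta^{t-1}}{\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2}\,\overline{\varepsilon^2}, \tag{1.4.1}$$

which can be decomposed as the sum of two different sequences multiplied by two scaling factors, $\omega$ and $\overline{\varepsilon^2}$. First, we inspect the second sequence in $\gamma_t$, i.e. the one associated with $\overline{\varepsilon^2}$. We claim that $\frac{\beta}{\varepsilon_1^2}, \frac{\beta^2}{\varepsilon_2^2 + \beta\varepsilon_1^2}, \dots$ (the sequence associated with $\overline{\varepsilon^2}$) is a monotonically decreasing sequence. We make this assertion because if $D_t$ denotes the difference between term $t-1$ and term $t$, then, writing $X := \varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-3}\varepsilon_1^2$ and $Y := \varepsilon_t^2 + \beta\varepsilon_{t-1}^2 + \dots + \beta^{t-2}\varepsilon_1^2$,

$$\begin{aligned}
D_t &= \frac{\beta^{t-2}}{X} - \frac{\beta^{t-1}}{Y} \\
&= \left(\frac{1}{X} - \frac{\beta}{Y}\right)\beta^{t-2} \\
&= \left(\frac{\varepsilon_t^2 - \beta\varepsilon_{t-1}^2 + \beta\varepsilon_{t-1}^2 - \beta^2\varepsilon_{t-2}^2 + \beta^2\varepsilon_{t-2}^2 - \dots - \beta^{t-2}\varepsilon_1^2 + \beta^{t-2}\varepsilon_1^2}{XY}\right)\beta^{t-2} \\
&= \frac{\varepsilon_t^2}{XY}\,\beta^{t-2} \ge 0.
\end{aligned}$$

We will show that $\frac{\beta}{\varepsilon_1^2}, \frac{\beta^2}{\varepsilon_2^2 + \beta\varepsilon_1^2}, \dots$ is a Cauchy sequence since $D_t \to 0$ when $t \to \infty$. Denote the minimum, $\min\{\varepsilon_t^2\}$, by $\varepsilon_N^2$; then $XY \ge \varepsilon_N^4(1 + \beta + \dots + \beta^{t-3})(1 + \beta + \dots + \beta^{t-2})$, such that

$$0 \le D_t = \frac{\varepsilon_t^2}{XY}\,\beta^{t-2} \le \frac{\varepsilon_t^2\,\beta^{t-2}}{\varepsilon_N^4(1 + \beta + \dots + \beta^{t-3})(1 + \beta + \dots + \beta^{t-2})} \to 0.$$

The convergence to zero of the right-hand side holds as $t \to \infty$ because the denominator diverges to $\infty$ at a much faster rate than the numerator.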

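For the second half of the proof, we consider the other sequence, from the coefficient term of $\omega$: $\frac{1}{\varepsilon_1^2}, \frac{1 + \beta}{\varepsilon_2^2 + \beta\varepsilon_1^2}, \dots$; similar to before, denote the difference between term $t-1$ and term $t$ by $D'_t$. For any $t$,

$$\begin{aligned}
D'_t &= \frac{1 + \beta + \dots + \beta^{t-3}}{\varepsilon_{t-2}^2 + \beta\varepsilon_{t-3}^2 + \dots + \beta^{t-3}\varepsilon_1^2} - \frac{1 + \beta + \dots + \beta^{t-2}}{\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2} \\
&= \frac{A_t - B_t}{(\varepsilon_{t-2}^2 + \beta\varepsilon_{t-3}^2 + \dots + \beta^{t-3}\varepsilon_1^2)(\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2)} \\
&= \frac{(\varepsilon_{t-1}^2 + \beta\varepsilon_{t-1}^2 + \dots + \beta^{t-3}\varepsilon_{t-1}^2) - (\varepsilon_{t-2}^2 + \beta\varepsilon_{t-3}^2 + \dots + \beta^{t-3}\varepsilon_1^2)}{(\varepsilon_{t-2}^2 + \beta\varepsilon_{t-3}^2 + \dots + \beta^{t-3}\varepsilon_1^2)(\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2)}
\end{aligned} \tag{1.4.2}$$

where we define

$$\begin{aligned}
A_t &:= (\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2) + (\beta\varepsilon_{t-1}^2 + \beta^2\varepsilon_{t-2}^2 + \dots + \beta^{t-1}\varepsilon_1^2) \\
&\quad + \dots + (\beta^{t-3}\varepsilon_{t-1}^2 + \beta^{t-2}\varepsilon_{t-2}^2 + \dots + \beta^{2t-5}\varepsilon_1^2), \\
B_t &:= (\varepsilon_{t-2}^2 + \beta\varepsilon_{t-3}^2 + \dots + \beta^{t-3}\varepsilon_1^2) + (\beta\varepsilon_{t-2}^2 + \beta^2\varepsilon_{t-3}^2 + \dots + \beta^{t-2}\varepsilon_1^2) \\
&\quad + \dots + (\beta^{t-2}\varepsilon_{t-2}^2 + \beta^{t-1}\varepsilon_{t-3}^2 + \dots + \beta^{2t-5}\varepsilon_1^2).
\end{aligned}$$

Except for the terms contained inside the first parenthesis in $B_t$, and all the terms associated with $\varepsilon_{t-1}^2$ in $A_t$, the remaining terms are common to $A_t$ and $B_t$. The last equality in (1.4.2) follows after canceling out the common terms in $A_t$ and $B_t$. The numerator in (1.4.2) can be either positive or negative, but the denominator is positive. Therefore, the sequence is neither monotonically decreasing nor increasing. However, the sequence $\frac{1}{\varepsilon_1^2}, \frac{1 + \beta}{\varepsilon_2^2 + \beta\varepsilon_1^2}, \dots$ is also a Cauchy sequence, because as $t \to \infty$, the first summation term in $D'_t$,

$$\begin{aligned}
0 &\le \frac{\varepsilon_t^2(1 + \beta + \dots + \beta^{t-2})}{(\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-3}\varepsilon_1^2)(\varepsilon_t^2 + \beta\varepsilon_{t-1}^2 + \dots + \beta^{t-2}\varepsilon_1^2)} \\
&\le \frac{\varepsilon_t^2(1 + \beta + \dots + \beta^{t-2})}{\varepsilon_N^4(1 + \beta + \dots + \beta^{t-3})(1 + \beta + \dots + \beta^{t-2})} = \frac{\varepsilon_t^2}{\varepsilon_N^4(1 + \beta + \dots + \beta^{t-3})} \to 0,
\end{aligned}$$

and the second summation term in $D'_t$,

$$0 \leftarrow -\frac{1}{B} = -\frac{A}{AB} \le -\frac{\overbrace{\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-3}\varepsilon_1^2}^{A}}{\underbrace{(\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-3}\varepsilon_1^2)}_{A}\,\underbrace{(\varepsilon_t^2 + \beta\varepsilon_{t-1}^2 + \dots + \beta^{t-2}\varepsilon_1^2)}_{B}} \le 0.$$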


Since $\gamma_t$ is a sum of two Cauchy sequences with two scaling factors ($\omega$ and $\overline{\varepsilon^2}$), it is therefore also a Cauchy sequence as $t \to \infty$.

1.4.1.2 Upper and lower bounds for $\sigma_t^2$

Recall that $\sigma_t^2 = \varepsilon_t^2$ for all $t$ is a first-order condition for a global and generally unattainable maximum of the log-likelihood. To facilitate a direct comparison of $\sigma_t^2$ and $\varepsilon_t^2$, we construct upper and lower bounds for $\sigma_t^2$ expressed in terms of $\varepsilon_t^2$.

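We have that for $t \in \{2, \dots, T\}$,

$$\begin{aligned}
\sigma_2^2 &= \omega + \alpha\varepsilon_1^2 + \beta\,\overline{\varepsilon^2} \\
\sigma_3^2 &= \omega(1 + \beta) + \alpha(\varepsilon_2^2 + \beta\varepsilon_1^2) + \beta^2\,\overline{\varepsilon^2} \\
&\;\;\vdots \\
\sigma_t^2 &= \omega(1 + \beta + \dots + \beta^{t-2}) + \alpha(\varepsilon_{t-1}^2 + \beta\varepsilon_{t-2}^2 + \dots + \beta^{t-2}\varepsilon_1^2) + \beta^{t-1}\,\overline{\varepsilon^2} \\
\sigma_{t+1}^2 &= \omega(1 + \beta + \dots + \beta^{t-1}) + \alpha(\varepsilon_t^2 + \beta\varepsilon_{t-1}^2 + \dots + \beta^{t-1}\varepsilon_1^2) + \beta^t\,\overline{\varepsilon^2} \\
&\;\;\vdots \\
\sigma_T^2 &= \omega(1 + \beta + \dots + \beta^{T-2}) + \alpha(\varepsilon_{T-1}^2 + \beta\varepsilon_{T-2}^2 + \dots + \beta^{T-2}\varepsilon_1^2) + \beta^{T-1}\,\overline{\varepsilon^2}.
\end{aligned}$$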

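Assume that $\omega > 0$, $\alpha > 0$, and $\beta > 0$, and define the current minimum and current maximum $\varepsilon_{n,t}^2 = \min\{\varepsilon_1^2, \dots, \varepsilon_t^2\}$ and $\varepsilon_{m,t}^2 = \max\{\varepsilon_1^2, \dots, \varepsilon_t^2\}$. Then

$$\omega(1 + \beta + \dots + \beta^{t-1}) + \alpha(\varepsilon_{n,t}^2 + \beta\varepsilon_{n,t}^2 + \dots + \beta^{t-1}\varepsilon_{n,t}^2) + \beta^t\,\overline{\varepsilon^2} < \omega(1 + \beta + \dots + \beta^{t-1}) + \alpha(\varepsilon_t^2 + \beta\varepsilon_{t-1}^2 + \dots + \beta^{t-1}\varepsilon_1^2) + \beta^t\,\overline{\varepsilon^2} = \sigma_{t+1}^2,$$

and thus,

$$\omega\left(\frac{1 - \beta^t}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^t}{1 - \beta}\right)\varepsilon_{n,t}^2 + \beta^t\,\overline{\varepsilon^2} < \sigma_{t+1}^2.$$

Similarly for the upper bound,

$$\sigma_{t+1}^2 < \omega\left(\frac{1 - \beta^t}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^t}{1 - \beta}\right)\varepsilon_{m,t}^2 + \beta^t\,\overline{\varepsilon^2}.$$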

Thus, $\omega\left(\frac{1 - \beta^t}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^t}{1 - \beta}\right)\varepsilon_{n,t}^2 + \beta^t\,\overline{\varepsilon^2}$ and $\omega\left(\frac{1 - \beta^t}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^t}{1 - \beta}\right)\varepsilon_{m,t}^2 + \beta^t\,\overline{\varepsilon^2}$ are the lower and upper bound for $\sigma_{t+1}^2$, respectively. We are interested in the relative distance between $\sigma_{t+1}^2$ and $\varepsilon_{t+1}^2$. Currently, we have a comparison between $\sigma_{t+1}^2$ and $\varepsilon_{n,t}^2$ ($\varepsilon_{m,t}^2$) through the lower (upper) bound but not a comparison between $\sigma_{t+1}^2$ and $\varepsilon_{t+1}^2$ directly.
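The bounds above can be checked on a sample path. The following minimal sketch, with illustrative names and made-up data, computes $\sigma_{t+1}^2$ by the GARCH(1, 1) recursion together with the closed-form lower and upper bounds built from the running minimum and maximum of the squared returns. It assumes $\omega, \alpha, \beta > 0$ and $\beta \ne 1$. The comparison is made with weak inequalities because the bounds hold with equality whenever all lagged squared returns coincide, as is trivially the case at $t = 1$.

```python
def sigma2_bounds(eps2, omega, alpha, beta):
    """Return (lower, sigma2_{t+1}, upper) for t = 1, ..., T-1."""
    T = len(eps2)
    mean_eps2 = sum(eps2) / T
    sigma2 = mean_eps2  # sigma_1^2, the sample-mean initialization
    triples = []
    for t in range(1, T):
        # sigma_{t+1}^2 = omega + alpha * eps_t^2 + beta * sigma_t^2
        sigma2 = omega + alpha * eps2[t - 1] + beta * sigma2
        geo = (1.0 - beta ** t) / (1.0 - beta)  # 1 + beta + ... + beta^(t-1)
        common = omega * geo + beta ** t * mean_eps2
        lower = common + alpha * geo * min(eps2[:t])  # uses eps2_{n,t}
        upper = common + alpha * geo * max(eps2[:t])  # uses eps2_{m,t}
        triples.append((lower, sigma2, upper))
    return triples


eps2 = [0.04, 0.01, 0.02, 0.07, 0.03]  # made-up squared returns
triples = sigma2_bounds(eps2, omega=0.005, alpha=0.1, beta=0.8)
for lower, s2, upper in triples:
    print(f"{lower:.5f} <= {s2:.5f} <= {upper:.5f}")
```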

1.4.1.3 Construction of the current minimum and current maximum sets

It would be desirable to have either the lower or upper bound of $\sigma_{t+1}^2$ expressed in terms of $\varepsilon_{t+1}^2$, to facilitate a direct comparison between $\sigma_{t+1}^2$ and $\varepsilon_{t+1}^2$, but this is only possible if $\varepsilon_{t+1}^2$ is the maximum or minimum. This comparison motivates the construction of the current maximum set and the current minimum set.

The construction of the current minimum set is an iterative procedure that begins at $t = 2$ and runs until $t = T$. We skip the first observation, $t = 1$, because there are no lagged values to compare against when computing the current minimum or current maximum. Note that in our case, $\sigma_1^2$ is initialized by $\overline{\varepsilon^2}$. We start with the empty set $\mathcal{N}$. We then begin the iterative procedure at the second observation, $t = 2$, and compute the quantity $\min\{\varepsilon_1^2, \varepsilon_2^2\}$. If $\min\{\varepsilon_1^2, \varepsilon_2^2\} = \varepsilon_2^2$, then it is the current minimum, and we assign $\varepsilon_2^2$ to the set $\mathcal{N}$. Proceed next to $t = 3$, and compute the quantity $\min\{\varepsilon_1^2, \varepsilon_2^2, \varepsilon_3^2\}$. We extend $\mathcal{N}$ by assigning $\varepsilon_3^2$ to it if it is the current minimum; otherwise we do not extend $\mathcal{N}$ at iteration $t = 3$. Continue the iterative procedure until $t = T$. Then, by construction, the set

$$\mathcal{N} = \{\varepsilon_{n_1}^2, \varepsilon_{n_2}^2, \dots, \varepsilon_{\bar{n}}^2 : n_1 < n_2 < \dots < \bar{n} \le T\}$$

is the current minimum set, and the indexes $n_1, n_2, \dots, \bar{n}$ are the time points when the current minimum is recorded. The same assignment of the maximum yields the current maximum set

$$\mathcal{M} = \{\varepsilon_{m_1}^2, \varepsilon_{m_2}^2, \dots, \varepsilon_{\bar{m}}^2 : m_1 < m_2 < \dots < \bar{m} \le T\}.$$

By construction, we have $\varepsilon_{n_1}^2 > \varepsilon_{n_2}^2 > \dots > \varepsilon_{\bar{n}}^2$ in $\mathcal{N}$ and $\varepsilon_{m_1}^2 < \varepsilon_{m_2}^2 < \dots < \varepsilon_{\bar{m}}^2$ in $\mathcal{M}$. Furthermore, the sets $\mathcal{N}$ and $\mathcal{M}$ are mutually exclusive since at time $t$, the element $\varepsilon_t^2$ cannot be a current minimum and a current maximum at the same time.

We define the subset $N \subseteq \mathcal{N}$, where $N = \{\varepsilon_t^2 \in \mathcal{N} : \varepsilon_t^2 < \overline{\varepsilon^2}\}$, and the subset $M \subseteq \mathcal{M}$, where $M = \{\varepsilon_t^2 \in \mathcal{M} : \varepsilon_t^2 > \overline{\varepsilon^2}\}$, and we call $N$ the current minimum lesser set and $M$ the current maximum greater set. Taking the subsets $N$ and $M$ of $\mathcal{N}$ and $\mathcal{M}$ is a technical step that is needed in the proof of Proposition 2. Similar to their parent sets $\mathcal{N}$ and $\mathcal{M}$, we can write

$$N = \{\varepsilon_{n_1}^2, \varepsilon_{n_2}^2, \dots, \varepsilon_n^2 : n_1 < n_2 < \dots < n \le T\},$$

and

$$M = \{\varepsilon_{m_1}^2, \varepsilon_{m_2}^2, \dots, \varepsilon_m^2 : m_1 < m_2 < \dots < m \le T\}.$$

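Figure 1.4.1 illustrates the current maximum and current minimum points for a sample of $T = 10$ points. At $t = 2$, $\varepsilon_2^2$ is the current minimum, since $\varepsilon_2^2 < \varepsilon_1^2$. It is then assigned to the current minimum set. At $t = 3$, $\varepsilon_3^2 > \varepsilon_2^2$, so $\varepsilon_3^2$ is not assigned to the current minimum set. Since $\varepsilon_1^2 > \varepsilon_3^2$, it is also not assigned to the current maximum set. At $t = 4$, $\varepsilon_4^2 > \varepsilon_t^2$ for $t = 1, 2, 3$, and thus it is assigned to the current maximum set. We continue in this fashion and obtain the current minimum set (black asterisks) and the current maximum set (red asterisks).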

Figure 1.4.1 (plot not reproduced; horizontal axis: $t = 1, \dots, 10$; vertical axis: squared return, from 0 to 0.08): The plot illustrates examples of current maximum points (marked by red asterisks) and current minimum points (marked by black asterisks) from a realized squared returns sample, $\varepsilon_t^2$, with $T = 10$.
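The iterative construction of the current minimum and current maximum sets can be sketched as follows. The function names and the ten sample values are made up for illustration (they are not the data behind Figure 1.4.1, although they follow the same pattern at $t = 2, 3, 4$), and the sets are represented by their 1-based time indexes rather than by the values themselves.

```python
def current_extrema(eps2):
    """Return (min_idx, max_idx): 1-based times t >= 2 at which eps2_t is a
    new running minimum or a new running maximum, respectively."""
    min_idx, max_idx = [], []
    run_min = run_max = eps2[0]  # t = 1 only provides the first comparison value
    for t in range(2, len(eps2) + 1):
        x = eps2[t - 1]
        if x < run_min:
            run_min = x
            min_idx.append(t)
        elif x > run_max:
            run_max = x
            max_idx.append(t)
    return min_idx, max_idx


def lesser_and_greater(eps2, min_idx, max_idx):
    """Keep current minima below, and current maxima above, the sample mean."""
    mean_eps2 = sum(eps2) / len(eps2)
    lesser = [t for t in min_idx if eps2[t - 1] < mean_eps2]
    greater = [t for t in max_idx if eps2[t - 1] > mean_eps2]
    return lesser, greater


eps2 = [0.03, 0.01, 0.02, 0.05, 0.04, 0.005, 0.06, 0.02, 0.08, 0.01]
min_idx, max_idx = current_extrema(eps2)
lesser, greater = lesser_and_greater(eps2, min_idx, max_idx)
print(min_idx, max_idx)
```

Because each observation enters at most one branch of the `if`/`elif`, the two index lists are disjoint, which mirrors the mutual exclusivity noted in the text.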


1.4.1.4 Current upper and lower bounds

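For the members in $N$, we can construct the current lower bounds as follows

$$\begin{aligned}
\left(\frac{\omega}{\varepsilon_{n_1}^2}\left(\frac{1 - \beta^{n_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n_1-1}}{1 - \beta}\right) + \beta^{n_1-1}b_{n_1}\right)\varepsilon_{n_1}^2 &< \sigma_{n_1}^2 \\
\left(\frac{\omega}{\varepsilon_{n_2}^2}\left(\frac{1 - \beta^{n_2-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n_2-1}}{1 - \beta}\right) + \beta^{n_2-1}b_{n_2}\right)\varepsilon_{n_2}^2 &< \sigma_{n_2}^2 \\
&\;\;\vdots \\
\left(\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \beta^{n-1}b_n\right)\varepsilon_n^2 &< \sigma_n^2
\end{aligned} \tag{1.4.3}$$

where $1 < b_{n_1} < b_{n_2} < \dots < b_n$ such that $\overline{\varepsilon^2} = b_{n_1}\varepsilon_{n_1}^2 = b_{n_2}\varepsilon_{n_2}^2 = \dots = b_n\varepsilon_n^2$, since by definition of the set $N$, the members are smaller than $\overline{\varepsilon^2}$.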

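Similarly for the members in $M$, we can construct the current upper bounds

$$\begin{aligned}
\sigma_{m_1}^2 &< \left(\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1}\right)\varepsilon_{m_1}^2 \\
\sigma_{m_2}^2 &< \left(\frac{\omega}{\varepsilon_{m_2}^2}\left(\frac{1 - \beta^{m_2-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_2-1}}{1 - \beta}\right) + \beta^{m_2-1}a_{m_2}\right)\varepsilon_{m_2}^2 \\
&\;\;\vdots \\
\sigma_m^2 &< \left(\frac{\omega}{\varepsilon_m^2}\left(\frac{1 - \beta^{m-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m-1}}{1 - \beta}\right) + \beta^{m-1}a_m\right)\varepsilon_m^2
\end{aligned} \tag{1.4.4}$$

where $1 > a_{m_1} > a_{m_2} > \dots > a_m > 0$ such that $\overline{\varepsilon^2} = a_{m_1}\varepsilon_{m_1}^2 = a_{m_2}\varepsilon_{m_2}^2 = \dots = a_m\varepsilon_m^2$, since by definition of the set $M$, the members are greater than $\overline{\varepsilon^2}$.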

1.4.2 A limitation in fitting current extrema

Having constructed the sets $N$ and $M$, we want to find a relationship between the current minima in $N$ and the current maxima in $M$. Assume we have a positive set of parameters, i.e. $\omega, \alpha, \beta > 0$, and additionally, assume that the set of parameters satisfies the stationarity constraint, $\alpha + \beta < 1$. We seek to answer the following question: If this set of parameters fits a current maximum in $M$, will the same set of parameters be able to fit the subsequent current minimum in $N$? It turns out the answer is no. We formulate the answer to the question more formally in Proposition 2.


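Proposition 2. Assume the parameter restrictions $\omega > 0$, $0 < \alpha < 1$, $0 < \beta < 1$ with $\alpha + \beta < 1$. Further assume that $N$ and $M$ are non-empty sets. Let $n$ be the smallest $n_i$ in $N$ such that $n_i > m_1$.

If $\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1} > 1$, then $\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \beta^{n-1}b_n > 1$.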

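Remark. The term $\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1}$ is the coefficient of the upper bound that corresponds to the first member in $M$. Suppose that $\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1} \le 1$; then $\sigma_{m_1}^2 < \varepsilon_{m_1}^2$, meaning that $P(\sigma_{m_1}^2 = \varepsilon_{m_1}^2) = 0$, i.e. we have zero probability of fitting $\sigma_{m_1}^2$ to $\varepsilon_{m_1}^2$. Suppose the opposite instead, $\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1} > 1$; then $P(\sigma_{m_1}^2 = \varepsilon_{m_1}^2) \ge 0$. Fix $m_1$, find in $N$ the smallest $n_i$ such that $n_i > m_1$, and denote it by $n$. According to the proposition, if $\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1} > 1$, then $\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \beta^{n-1}b_n > 1$, where the term $\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \beta^{n-1}b_n$ is the coefficient of the lower bound that corresponds to $n$. If this is the case, then $\varepsilon_n^2 < \sigma_n^2$, meaning that $P(\sigma_n^2 = \varepsilon_n^2) = 0$.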

To reiterate, by choosing positive specifications of $\omega$, $\alpha$ and $\beta$ such that there is a positive probability to fit the current maximum $\varepsilon_t^2$ that is greater than $\overline{\varepsilon^2}$, the same choice of $\omega$, $\alpha$ and $\beta$ will never fit the subsequent current minimum $\varepsilon_t^2$ that is lesser than $\overline{\varepsilon^2}$. Thus, at any given point $t$, the estimation algorithm can either fit the latest element from the set $N$ or the latest element from the set $M$, but not both. This limits the ability of the estimation to trace the $\varepsilon_t^2$-sequence with the $\sigma_t^2$-sequence.
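The statement of Proposition 2 can be illustrated numerically. The sketch below evaluates the bound coefficient at a current maximum $m_1$ and at the subsequent current minimum $n$, using $\overline{\varepsilon^2} = a_{m_1}\varepsilon_{m_1}^2 = b_n\varepsilon_n^2$ to recover $a_{m_1}$ and $b_n$ from the data. The sample, the indexes $m_1 = 4$ and $n = 6$, the parameter values, and the function name are all assumptions made for illustration; a single example does not replace the proof that follows.

```python
def bound_coefficient(t, eps2_t, mean_eps2, omega, alpha, beta):
    """Coefficient multiplying eps2_t in the current bound at time t.

    The factor a_t (for a maximum) or b_t (for a minimum) is recovered from
    mean_eps2 = a_t * eps2_t, respectively mean_eps2 = b_t * eps2_t.
    """
    geo = (1.0 - beta ** (t - 1)) / (1.0 - beta)
    ratio = mean_eps2 / eps2_t
    return (omega / eps2_t) * geo + alpha * geo + beta ** (t - 1) * ratio


# Made-up squared returns: t = 4 is the first current maximum above the mean,
# and t = 6 is the next current minimum below the mean.
eps2 = [0.03, 0.01, 0.02, 0.05, 0.04, 0.005, 0.06, 0.02, 0.08, 0.01]
mean_eps2 = sum(eps2) / len(eps2)
omega, alpha, beta = 0.01, 0.4, 0.5  # positive, with alpha + beta < 1
m1, n = 4, 6

coef_max = bound_coefficient(m1, eps2[m1 - 1], mean_eps2, omega, alpha, beta)
coef_min = bound_coefficient(n, eps2[n - 1], mean_eps2, omega, alpha, beta)
print(coef_max, coef_min)
```

With these values the upper-bound coefficient at $m_1$ exceeds one, so fitting the current maximum is not ruled out, and the lower-bound coefficient at $n$ also exceeds one, so $\sigma_n^2$ lies strictly above $\varepsilon_n^2$.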

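Proof. By construction, $n > m_1$; let $n = m_1 + t_0$, where $t_0$ is a positive integer. Rewrite the coefficient of the lower bound (1.4.3) as

$$\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \beta^{n-1}b_n = \frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \beta^{m_1+t_0-1}b_n.$$

Define the gain term as

$$G = \frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \beta^{m_1+t_0-1}b_n - \left(\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1}\right) \tag{1.4.5}$$

and a slight modification of the gain term, with $\omega/\varepsilon_{m_1}^2$ in (1.4.5) replaced by $\omega/\varepsilon_n^2$,

$$\tilde{G} = \frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \beta^{m_1+t_0-1}b_n - \left(\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1}\right). \tag{1.4.6}$$

The slight modification of the gain term is done to gather the common term $\omega/\varepsilon_n^2$. Note that $G > \tilde{G}$ because by assumption $\varepsilon_{m_1}^2 > \varepsilon_n^2$. If the net gain satisfies $G > 0$, then from (1.4.5)

$$\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1+t_0-1}}{1 - \beta}\right) + \beta^{m_1+t_0-1}b_n > \frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1}. \tag{1.4.7}$$

By assumption, $\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1} > 1$; combined with (1.4.7), this gives $\frac{\omega}{\varepsilon_n^2}\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{n-1}}{1 - \beta}\right) + \beta^{n-1}b_n > 1$ as desired. Therefore, to complete the proof of Proposition 2, it is sufficient to show that $G > 0$. To show $G > 0$, we will now proceed to show $\tilde{G} > 0$; then $G > \tilde{G} > 0$ as desired. Write

$$\tilde{G} = \beta^{m_1-1}\left(\frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}(1 - \beta^{t_0}) + \frac{\alpha}{1 - \beta}(1 - \beta^{t_0}) + \beta^{t_0}b_n - a_{m_1}\right). \tag{1.4.8}$$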

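Write the second factor in (1.4.8) as

$$\begin{aligned}
H &= \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}(1 - \beta^{t_0}) + \frac{\alpha}{1 - \beta}(1 - \beta^{t_0}) + \beta^{t_0}b_n - a_{m_1} \\
&= \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} + \frac{\alpha}{1 - \beta} - a_{m_1} + \beta^{t_0}\left(b_n - \frac{\alpha}{1 - \beta} - \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}\right).
\end{aligned} \tag{1.4.9}$$

Now, $\tilde{G} = \beta^{m_1-1}H$. By assumption, $\beta^{m_1-1} > 0$. Therefore, if $H > 0$, then $\tilde{G} > 0$. To show $H > 0$, we will first show $\frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} + \frac{\alpha}{1 - \beta} - a_{m_1} > 0$ in (1.4.9). By assumption,

$$\begin{aligned}
\frac{\omega}{\varepsilon_{m_1}^2}\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \alpha\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) + \beta^{m_1-1}a_{m_1} &> 1 \\
\left(\frac{\omega}{\varepsilon_{m_1}^2} + \alpha\right)\left(\frac{1 - \beta^{m_1-1}}{1 - \beta}\right) &> 1 - \beta^{m_1-1}a_{m_1} \\
\frac{\omega}{\varepsilon_{m_1}^2} + \alpha &> \frac{(1 - \beta^{m_1-1}a_{m_1})(1 - \beta)}{1 - \beta^{m_1-1}},
\end{aligned}$$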


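and by construction

$$\begin{aligned}
1 &> a_{m_1} \\
1 - \beta^{m_1-1}a_{m_1} &> a_{m_1} - \beta^{m_1-1}a_{m_1} \\
1 - \beta^{m_1-1}a_{m_1} &> a_{m_1}(1 - \beta^{m_1-1})
\end{aligned}$$

such that

$$\begin{aligned}
\frac{\omega}{\varepsilon_{m_1}^2} + \alpha &> \frac{a_{m_1}(1 - \beta^{m_1-1})(1 - \beta)}{1 - \beta^{m_1-1}} \\
\frac{\omega}{\varepsilon_{m_1}^2}\frac{1}{1 - \beta} + \frac{\alpha}{1 - \beta} &> a_{m_1} \\
\frac{\omega}{\varepsilon_{m_1}^2}\frac{1}{1 - \beta} + \frac{\alpha}{1 - \beta} - a_{m_1} &> 0.
\end{aligned}$$

Since $\varepsilon_n^2 < \varepsilon_{m_1}^2$, it follows that $\frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} + \frac{\alpha}{1 - \beta} - a_{m_1} > \frac{\omega}{\varepsilon_{m_1}^2}\frac{1}{1 - \beta} + \frac{\alpha}{1 - \beta} - a_{m_1} > 0$ as required. For the second part of (1.4.9), we consider two different cases for $b_n - \frac{\alpha}{1 - \beta} - \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}$.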

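Case 1. Assume $b_n - \frac{\alpha}{1 - \beta} - \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} \ge 0$, then

$$H = \underbrace{\frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} + \frac{\alpha}{1 - \beta} - a_{m_1}}_{> 0} + \beta^{t_0}\underbrace{\left(b_n - \frac{\alpha}{1 - \beta} - \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}\right)}_{\ge 0} > 0,$$

meaning that $\tilde{G} = \beta^{m_1-1}H > 0$ as required, and we are done for the first case.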

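Case 2. Assume $b_n - \frac{\alpha}{1 - \beta} - \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} < 0$ or equivalently $\frac{\alpha}{1 - \beta} + \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} > b_n$. Begin by rewriting

$$\begin{aligned}
\tilde{G} &= \beta^{m_1-1}\left(\frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}(1 - \beta^{t_0}) + \frac{\alpha}{1 - \beta}(1 - \beta^{t_0}) + \beta^{t_0}b_n - a_{m_1}\right) \\
&= \left(\frac{\alpha}{1 - \beta} + \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}\right)\beta^{m_1-1}(1 - \beta^{t_0}) + \beta^{m_1-1}(\beta^{t_0}b_n - a_{m_1}).
\end{aligned} \tag{1.4.10}$$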


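Given that $\frac{\alpha}{1 - \beta} + \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} > b_n$ and by construction $b_n > 1$, then

$$\begin{aligned}
\frac{\alpha}{1 - \beta} + \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta} &> 1 \\
\left(\frac{\alpha}{1 - \beta} + \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}\right)\beta^{m_1-1}(1 - \beta^{t_0}) &> \beta^{m_1-1}(1 - \beta^{t_0}) \\
\left(\frac{\alpha}{1 - \beta} + \frac{\omega}{\varepsilon_n^2}\frac{1}{1 - \beta}\right)\beta^{m_1-1}(1 - \beta^{t_0}) + \beta^{m_1-1}(\beta^{t_0}b_n - a_{m_1}) &> \beta^{m_1-1}(1 - \beta^{t_0}) + \beta^{m_1-1}(\beta^{t_0}b_n - a_{m_1}) \\
\tilde{G} &> \beta^{m_1-1}\big(\underbrace{1 - a_{m_1}}_{> 0} + \underbrace{\beta^{t_0}(b_n - 1)}_{> 0}\big) > 0,
\end{aligned}$$

meaning that $\tilde{G} > 0$ as required, and we are done for the second case.

In either case, $\tilde{G} > 0$, hence $G > \tilde{G} > 0$, and the proof of Proposition 2 is complete.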

1.5 Conclusion

We conduct a Monte Carlo study of the GARCH(1, 1) model on small sample sizes. The Monte Carlo exercise reveals that at those small sample sizes, the $\alpha$ estimates tend to be negative, even when the data-generating $\alpha_0$ is positive. This is due to the quasi-maximum likelihood method of estimating a GARCH(1, 1) that requires that $\sigma_t^2 = \varepsilon_t^2$ to achieve the maximal log-likelihood value. At small sample sizes, by selecting a negative $\alpha$ estimate over a positive one, the log-likelihood function can trace $\varepsilon_t^2$ better with $\sigma_t^2$. As the sample size $T$ increases, the negative $\alpha$ estimates disappear. We obtain some analytical results that demonstrate the limitation of GARCH(1, 1) to fit the $\sigma_t^2$-sequence to the $\varepsilon_t^2$-sequence when the parameter space is positive.

1.6 References

Andrews, D. W. (2001), 'Testing when a parameter is on the boundary of the maintained hypothesis', Econometrica 69(3), 683–734.

Baillie, R. T. and Bollerslev, T. (2002), 'The message in daily exchange rates: A conditional-variance tale', Journal of Business and Economic Statistics 20(1), 60–68.


Baillie, R. T. and Chung, H. (2001), 'Estimation of GARCH models from the autocorrelations of the squares of a process', Journal of Time Series Analysis 22(6), 631–650.

Bollerslev, T. (1986), 'Generalized autoregressive conditional heteroskedasticity', Journal of Econometrics 31(3), 307–327.

Bollerslev, T. (1987), 'A conditionally heteroskedastic time series model for speculative prices and rates of return', Review of Economics and Statistics 69(3), 542–547.

Brooks, C., Burke, S. P. and Persand, G. (2001), 'Benchmarks and the accuracy of GARCH model estimation', International Journal of Forecasting 17(1), 45–56.

Diebold, F. X. and Pauly, P. (1989), 'Small sample properties of asymptotically equivalent tests for autoregressive conditional heteroskedasticity', Statistical Papers 30(1), 105–131.

Engle, R. F. (1982), 'Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation', Econometrica 50(4), 987–1007.

Engle, R. F., Hendry, D. F. and Trumble, D. (1985), 'Small-sample properties of ARCH estimators and tests', Canadian Journal of Economics 18(1), 66–93.

Engle, R. F., Ng, V. K. and Rothschild, M. (1990), 'Asset pricing with a factor-ARCH covariance structure: Empirical estimates for treasury bills', Journal of Econometrics 45(1-2), 213–237.

Engle, R. F. and Patton, A. (2001), 'What good is a volatility model?', Quantitative Finance 1(2), 237–245.

Fiorentini, G., Calzolari, G. and Panattoni, L. (1996), 'Analytic derivatives and the computation of GARCH estimates', Journal of Applied Econometrics 11(4), 399–417.

Francq, C. and Zakoïan, J.-M. (2011), GARCH Models: Structure, statistical inference and financial applications, John Wiley & Sons.

Giraitis, L., Kokoszka, P. and Leipus, R. (2000), 'Stationary ARCH Models: Dependence structure and central limit theorem', Econometric Theory 16(1), 3–22.


Glosten, L. R., Jagannathan, R. and Runkle, D. E. (1993), 'On the relation between the expected value and the volatility of the nominal excess return on stocks', Journal of Finance 48(5), 1779–1801.

Hansen, P. R. and Lunde, A. (2005), 'A forecast comparison of volatility models: Does anything beat a GARCH (1, 1)?', Journal of Applied Econometrics 20(7), 873–889.

Harris, R. D., Coskun Küçüközmen, C. and Yilmaz, F. (2004), 'Skewness in the conditional distribution of daily equity returns', Applied Financial Economics 14(3), 195–202.

He, C. and Teräsvirta, T. (1999), 'Properties of the autocorrelation function of squared observations for second-order GARCH processes under two sets of parameter constraints', Journal of Time Series Analysis 20(1), 23–30.

Hung, J.-C., Lee, M.-C. and Liu, H.-C. (2008), 'Estimation of value-at-risk for energy commodities via fat-tailed GARCH models', Energy Economics 30(3), 1173–1191.

Hwang, S. and Valls Pereira, P. L. (2006), 'Small sample properties of GARCH estimates and persistence', European Journal of Finance 12(6-7), 473–494.

Iglesias, E. M. and Phillips, G. D. (2011), 'Small sample estimation bias in GARCH models with any number of exogenous variables in the mean equation', Econometric Reviews 30(3), 303–336.

Kristensen, D. and Linton, O. (2006), 'A closed-form estimator for the GARCH (1, 1) model', Econometric Theory 22(2), 323–337.

Lee, P. M. (2012), Bayesian Statistics: An Introduction, 4th edn, John Wiley & Sons.

Lee, S.-W. and Hansen, B. E. (1994), 'Asymptotic theory for the GARCH (1, 1) quasi-maximum likelihood estimator', Econometric Theory 10(1), 29–52.

Lumsdaine, R. (1995), 'Finite-sample properties of the maximum likelihood estimator in GARCH (1, 1) and IGARCH (1, 1) models: A Monte Carlo investigation', Journal of Business and Economic Statistics 13(1), 1–10.


Lumsdaine, R. (1996), 'Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH (1, 1) and covariance stationary GARCH (1, 1) models', Econometrica 64(3), 575–596.

Ma, J., Nelson, C. R. and Startz, R. (2007), 'Spurious inference in the GARCH(1, 1) model when it is weakly identified', Studies in Nonlinear Dynamics and Econometrics 11(1), 1–27.

McCullough, B. D. and Vinod, H. D. (1999), 'The numerical reliability of econometric software', Journal of Economic Literature 37(2), 633–665.

McCullough, B. and Renfro, C. G. (1998), 'Benchmarks and software standards: A case study of GARCH procedures', Journal of Economic and Social Measurement 25(2), 59–71.

Mishra, S., Su, L. and Ullah, A. (2010), 'Semiparametric estimator of time series conditional variance', Journal of Business and Economic Statistics 28(2), 256–274.

Nelson, D. B. and Cao, C. Q. (1992), 'Inequality constraints in the univariate GARCH model', Journal of Business and Economic Statistics 10(2), 229–235.

Ng, H. S. and Lam, K. P. (2006), How does sample size affect GARCH models?, in '9th Joint International Conference on Information Sciences (JCIS-06)', Advances in Intelligent Systems Research.

Skoglund, J. (2001), A simple efficient GMM estimator of GARCH models, Technical report, SSE/EFI Working Paper Series in Economics and Finance.

Taylor, S. J. (1994), 'Modeling stochastic volatility: A review and comparative study', Mathematical Finance 4(2), 183–204.

Teräsvirta, T. (2009), An introduction to univariate GARCH models, in T. G. Andersen, R. A. Davis, J. P. Kreiß and T. Mikosch, eds, 'Handbook of Financial Time Series', Springer-Verlag Berlin Heidelberg, pp. 17–42.

Tsai, H. and Chan, K.-S. (2008), 'A note on inequality constraints in the GARCH model', Econometric Theory 24(3), 823–828.


Winker, P. and Maringer, D. (2009), 'The convergence of estimators based on heuristics: Theory and application to a GARCH model', Computational Statistics 24(3), 533–550.

Zakoïan, J.-M. (1994), 'Threshold heteroskedastic models', Journal of Economic Dynamics and Control 18(5), 931–955.

Zivot, E. (2009), Practical issues in the analysis of univariate GARCH models, in T. G. Andersen, R. A. Davis, J. P. Kreiß and T. Mikosch, eds, 'Handbook of Financial Time Series', Springer-Verlag Berlin Heidelberg, pp. 113–155.

Zumbach, G. (2000), The pitfalls in fitting GARCH(1, 1) processes, in C. L. Dunis, ed., 'Advances in Quantitative Asset Management', Springer, pp. 179–200.


Chapter 2

A modified GARCH(1, 1) model for cross-sectional data with small samples

Wei Ruen Leong

Aarhus University and CREATES

Eric Hillebrand

Aarhus University and CREATES

Abstract

We perform a Monte Carlo study of biases in the parameter estimates of the panel GARCH(1, 1) model on small samples, in particular estimates of the parameter α associated with the lagged squared return. We show that the estimates of the parameter α are often negative. We propose a modified GARCH(1, 1) model to correct for the small-sample bias. In simulation studies we find that the modified model produces fewer negative α estimates and a lower sample variance of the α estimates compared to the original model. We also find that the modified model requires a smaller sample size to produce positive α estimates on average. When the sample size increases, the performance of the modified model deteriorates relative to the original model. We present two empirical applications of the modified model.

2.1 Introduction

From a practitioner's perspective, estimations of generalized autoregressive conditional heteroskedasticity (GARCH) type models are often not straightforward to implement; see for example the numerical challenges documented in Brooks et al. (2001) and McCullough and Renfro (1998). Standard empirical practice is to use the normal density for observations that may not be normal, and proceed with a quasi maximum likelihood estimation. The quasi maximum likelihood approach has the advantage of achieving consistency for the parameter estimates, provided that a sufficiently large sample size is employed for estimation. The consistency of the parameter estimates holds even when the true distribution of the innovations is not normal. There have been suggestions in the literature on what the ideal sample size should be in order for the asymptotic properties to hold to a high degree of accuracy. For example, Ng and Lam (2006) conclude from their simulation studies that the sample size should be at least 1000. In the financial econometrics literature, where financial data such as stock prices can be acquired at tick-frequency, obtaining the recommended sample size is not difficult. In this paper, our concern is for macroeconomic practitioners who do not have a large sample size to work with due to the inherent low-frequency nature of their data. Furthermore, it is also of particular interest for a practitioner to examine idiosyncratic effects across different cross-sectional units. In a financial context, this could, for instance, be stock returns across different companies, and in a macroeconomic context, this could be inflation rates across different countries. A natural choice in order to obtain inferences from cross-sectional data is the estimation of GARCH(1, 1) in a panel setting.

There are several empirical studies in the GARCH literature that apply GARCH models in a panel setting, for example Mezrich and Engle (1996) and Bauwens and Rombouts (2007) in a financial context. There are also applications of the panel GARCH(1, 1) model employing macroeconomic data. One example is by Lee (2010), who studies the conditional variance of outputs that are represented by an industrial production index from member countries in Group 7 (G7). Another example related to macroeconomics is Cavalcanti et al. (2014), who study the trade activity of countries and its effect on economic growth. Although both studies are macroeconomic in nature, the data sets they use differ in terms of the number of cross-sectional units. Lee (2010) employs a small panel of 7 countries, whereas the study by Cavalcanti et al. (2014) has a large panel with 118 exporting countries.

Even for a single time series, it has been documented that estimates of the α parameter (the

parameter associated with the lagged return squared term) have a downward bias, see for example

the simulation study conducted by Hwang and Valls Pereira (2006). One common estimation

method for GARCH(1, 1) involves restricting the parameter space such that all parameters are

non-negative. We have documented in Leong and Hillebrand (2018) that when this restriction is

lifted during the estimation process, not only are the estimates of α biased downwards, but they

are often also negative, even when the data-generating α0 is positive. In the present paper, we

address this problem by introducing two additional terms in the conditional variance equation to

allow the sign of the lagged return to have an effect on the conditional variance. The main idea of

introducing the signed terms is that they allow for a faster adjustment of the conditional variance

to downward movements in the squared return sequence. In Leong and Hillebrand (2018) we have

shown that the relative inflexibility of the conditional variance equation in adjusting to negative

changes in the squared return sequence is one of the main causes of the downward bias in α. The

inspiration for the sign terms originates from the asymmetric GARCH model introduced in Higgins

and Bera (1992) as well as other well-known GARCH models such as the GJR-GARCH model of

Glosten et al. (1993) and the threshold GARCH model of Zakoïan (1994).

The rest of the paper is organized as follows. Section 2.2 introduces the panel GARCH(1, 1) model.

Section 2.3 contains a Monte Carlo study of the panel GARCH(1, 1) model. Section 2.4 proposes

a new model to address the estimation problems identified in the simulation study. Section 2.5

demonstrates two empirical applications of our modified model. Finally, Section 2.6 concludes.

2.2 The Panel GARCH(1, 1) model

In a single time series setting, the GARCH(1, 1) mean and conditional variance equations are

typically represented as follows,

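\[
\varepsilon_t^2 = \sigma_t^2 \xi_t^2, \qquad \xi_t \overset{iid}{\sim} N(0, v^2),
\]

\[
\sigma_t^2 = \omega_0 + \alpha_0 \varepsilon_{t-1}^2 + \beta_0 \sigma_{t-1}^2.
\]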

The common assumption is that $v^2 = 1$.
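As a brief worked step, assuming $v^2 = 1$ and, as an added assumption not stated above, covariance stationarity with $\alpha_0 + \beta_0 < 1$, taking unconditional expectations of the variance equation gives the unconditional variance of the process:

```latex
\begin{align*}
E(\sigma_t^2) &= \omega_0 + \alpha_0 E(\varepsilon_{t-1}^2) + \beta_0 E(\sigma_{t-1}^2) \\
              &= \omega_0 + (\alpha_0 + \beta_0) E(\sigma_t^2), \\
E(\varepsilon_t^2) = E(\sigma_t^2) &= \frac{\omega_0}{1 - \alpha_0 - \beta_0}.
\end{align*}
```

For illustration, the values $\omega_0 = 0.001$, $\alpha_0 = 0.04$ and $\beta_0 = 0.90$ give $0.001 / 0.06 \approx 0.0167$.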

Suppose that we have a cross section of data $\varepsilon_{i,t}$ for $i = 1, \dots, N$ and $t = 1, \dots, T$; for example, $\varepsilon_{i,t}$ are indexed stock returns recorded over some discrete time period $t = 1, \dots, T$ across different companies (cross-sectional elements) $i$, with $N$ being the total number of companies. Further assume that $\varepsilon_{i,t}^2$ can be decomposed into its conditional variance $\sigma_{i,t}^2$ multiplied by a random innovation $\xi_{i,t}^2$. Due to cross-section-specific characteristics, it is natural to assume that the cross-sectional innovation term $\xi_{i,t}$ has varying variance $v_i^2$. Let $I_{i,t-1} = \{\varepsilon_{i,t-1}^2, \varepsilon_{i,t-2}^2, \dots, \varepsilon_{i,0}^2\}$ be the historical information set of cross section $i$ at time $t - 1$. Using the law of iterated expectations, we can write

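\begin{align*}
E(\varepsilon_{i,t}^2) &= E(\sigma_{i,t}^2 \xi_{i,t}^2) \\
&= E\big(E(\sigma_{i,t}^2 \xi_{i,t}^2 \mid I_{i,t-1})\big) \\
&= E\big(\sigma_{i,t}^2 E(\xi_{i,t}^2 \mid I_{i,t-1})\big) \\
&= v_i^2 \frac{\omega}{1 - \alpha - \beta}.
\end{align*}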

Letting $\omega_i = v_i^2 \omega$, the model with varying random innovation variances can be captured by estimating a GARCH(1, 1) model with varying intercept terms for each cross-sectional element $i$. It is common empirical practice to assume that the random innovation is normally distributed, such that the panel GARCH(1, 1) mean and conditional variance equations are given as follows,

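\[
\varepsilon_{i,t}^2 = \sigma_{i,t}^2 \xi_{i,t}^2, \qquad \xi_{i,t} \overset{iid}{\sim} N(0, v_i^2), \tag{2.2.1}
\]

\[
\sigma_{i,t}^2 = \omega_{i,0} + \alpha_0 \varepsilon_{i,t-1}^2 + \beta_0 \sigma_{i,t-1}^2. \tag{2.2.2}
\]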

The case of N = 1 collapses to the usual GARCH(1, 1) case, and therefore the panel

GARCH(1, 1) can be seen as a generalization of the GARCH(1, 1) model. For simplicity of

argument, we consider the case of independent random innovations ξi,t in the cross section here.

In the more general case, to allow for cross-sectional dependence between elements i and j, where

i ≠ j, the random innovation ξi,t in (2.2.1) can be introduced more generally as a multivariate

normally distributed random variable with correlation parameters ρij. Cross-sectional dependence

has been considered in the macroeconomic literature, for example, by Cermeño and Grier (2006),

and it is modeled, for example, in the Dynamic Conditional Correlation model by Engle (2002).

The panel GARCH(1, 1) model is also considered in Pakel et al. (2011), where the conditional

variance equation (2.2.2) is parameterized differently as follows:

\[
\sigma_{i,t}^2 = \mu_{i,0}(1 - \alpha_0 - \beta_0) + \alpha_0 \varepsilon_{i,t-1}^2 + \beta_0 \sigma_{i,t-1}^2, \tag{2.2.3}
\]

where $E(\varepsilon_{i,t}^2) = \mu_{i,0}$. The parameter $\mu_{i,0}$ is treated as a nuisance parameter. For this reason, Pakel et al. (2011) employ a two-stage estimation method detailed in Newey and McFadden (1994) to first obtain $\mu_{i,0}$ through the sample mean of $\varepsilon_{i,t}^2$. The estimated $\mu_{i,0}$ is then substituted into (2.2.3), and a composite likelihood method (Varin et al., 2011) is applied to obtain the estimates of $(\alpha_0, \beta_0)$. Monte Carlo results verify the accuracy of the two-stage estimation method only when the sample size $T$ is large, because the bias from the estimated nuisance parameter, $\mu_{i,0}$, is small only when the sample mean of $\varepsilon_{i,t}^2$ is close to its true expected value. In this paper, we are interested in the small sample case, and our estimation method differs from the one in Pakel et al. (2011) in that we estimate $\omega_{i,0}$ directly.
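The first stage of the two-stage approach described above can be sketched as follows. This is only an illustration of how the parameterization in (2.2.3) maps a sample mean of squared returns to an implied intercept; the function name `targeted_intercepts` and the toy numbers are hypothetical, and the second-stage composite likelihood step is not shown.

```python
def targeted_intercepts(panel, alpha, beta):
    """Map sample means of squared returns to implied GARCH intercepts.

    For each cross-sectional unit, mu_i is estimated by the sample mean of
    the squared returns, and the implied intercept is
    omega_i = mu_i * (1 - alpha - beta), as in the parameterization (2.2.3).
    """
    intercepts = []
    for series in panel:
        mu_hat = sum(e * e for e in series) / len(series)
        intercepts.append(mu_hat * (1.0 - alpha - beta))
    return intercepts


# Toy panel with two units (made-up numbers, for illustration only).
toy_panel = [[1.0, -1.0], [2.0, 2.0]]
print(targeted_intercepts(toy_panel, alpha=0.04, beta=0.90))
```

With few observations per unit, the sample mean in the first stage is itself noisy, which is the small-sample concern raised in the text.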

The panel GARCH model has features that are similar to those in autoregressive panels considered,

for example, in Diggle et al. (2002) and Arellano and Honoré (2001). A plot of a realized sample of

the panel GARCH(1, 1) model with α0 = 0.04, β0 = 0.90 and ωi,0 sampled independently from a

uniform distribution over the interval (0.001, 0.010) is illustrated in Figure 2.2.1 with N = 30 and

T = 100. Here, as in the following, we stack the cross sectional data when presenting a plot. By

construction of the panel GARCH(1, 1) model, each cross section of the squared return sequence

has a different conditional mean for non-identical intercept terms in (2.2.2).

2.3 Monte Carlo study

We perform a Monte Carlo study to evaluate the finite sample properties of the estimated parameters, especially at small sample sizes T. The data-generating process (DGP) considered is a panel GARCH(1, 1) as given in (2.2.1) and (2.2.2). For simplicity, we consider i.i.d. normal random innovations with cross-sectional independence. We consider M = 1000 Monte Carlo replications, sample sizes T = 50, 25, and 10, and numbers of cross-sectional units N = 50, 25, 10, 5, and 1, respectively. We also add a group with T = 1,000 to verify the accuracy of our estimation algorithm at the asymptotic level.

Figure 2.2.1: A stacked realized sample of the panel GARCH(1, 1) with 30 cross sections and 100 observations at each cross section. [Plot title: Panel GARCH(1, 1) squared return and its conditional variance. Horizontal axis: No. of observations. Vertical axis: Sequence value.]

The conditional variance equation in (2.2.2) is unobserved and dependent on its history, and the first observation of the conditional variance, $\sigma_1^2$, has to be initialized with a starting value when simulating the data-generating process. We choose to initialize with the unconditional variance, $\omega_i / (1 - \alpha - \beta)$. To eliminate any starting bias, we simulate 500 observations more than the required T and discard these extra observations as a burn-in.
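The simulation design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the code used in the study: the function name `simulate_panel_garch`, the seed handling, and the choice to discard the first 500 draws are assumptions, and the innovations are standard normal, so that any unit-specific innovation variance is absorbed into the intercept.

```python
import math
import random


def simulate_panel_garch(n_units, n_obs, alpha, beta, burn_in=500, seed=0):
    """Simulate a panel GARCH(1, 1) with unit-specific intercepts.

    Returns (omegas, panel), where panel[i] holds the last n_obs simulated
    returns of cross-sectional unit i.
    """
    if not 0.0 <= alpha + beta < 1.0:
        raise ValueError("need 0 <= alpha + beta < 1 for a finite variance")
    rng = random.Random(seed)
    # Intercepts drawn independently from a uniform distribution on (0.001, 0.010).
    omegas = [rng.uniform(0.001, 0.010) for _ in range(n_units)]
    panel = []
    for omega in omegas:
        sigma2 = omega / (1.0 - alpha - beta)  # unconditional variance as start value
        series = []
        for _ in range(burn_in + n_obs):
            eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
            series.append(eps)
            sigma2 = omega + alpha * eps * eps + beta * sigma2
        panel.append(series[burn_in:])  # drop the burn-in draws
    return omegas, panel


omegas, panel = simulate_panel_garch(n_units=5, n_obs=25, alpha=0.04, beta=0.95)
print(len(panel), len(panel[0]))
```

Fixing the seed makes a replication reproducible; a Monte Carlo loop would call the function with a different seed for each replication.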

When choosing DGP values, our focus is on a realistic range of α0 and β0. We consider six

different specifications in this study. Our choices for the values of α0 and β0 are motivated by

values commonly reported in empirical studies such as, for example, Bollerslev (1987), Engle et al.

(1990) and Baillie and Bollerslev (2002).

Spec    (α0, β0)         α0 + β0
1       (0.04, 0.90)     0.94
2       (0.055, 0.93)    0.985
3       (0.04, 0.95)     0.99
4       (0.055, 0.84)    0.895
5       (0.02, 0.97)     0.99
6       (0.09, 0.90)     0.99

The data-generating intercepts, $\omega_{i,0}$, are sampled independently from a uniform distribution on (0.001, 0.010). We use the Newton-Raphson algorithm, a standard gradient-based optimizer, for estimation. The commonly applied non-negativity constraint for the parameters of the GARCH(1, 1) process is

\[
\omega_{i,0} > 0 \text{ for all } i, \quad \alpha_0 > 0, \quad \beta_0 > 0. \tag{2.3.1}
\]

We do not impose the constraint (2.3.1) explicitly in our optimization routine. An implicit safeguard is, however, coded into our optimizer to check for negative conditional variances, $\sigma_{i,t}^2$, for all $i$ and $t$. This safeguard ensures that any combination of parameters that produces at least one negative $\sigma_{i,t}^2$ in any of the cross-sectional units $i$ is not allowed and will be dismissed by the optimizer. The optimizer initializes the first observation of the conditional variance, $\sigma_{i,1}^2$, with the sample unconditional variance, $(1/T) \sum_{t=1}^{T} \varepsilon_{i,t}^2$, for each cross section $i$. As a start value for the optimization routine, we supply the optimizer with the data-generating parameter values. We experimented with different starting values, but this did not affect our overall conclusions.
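The safeguard described above can be sketched as an objective function that rejects any parameter combination producing a non-positive conditional variance. This is a hedged illustration, not the authors' implementation: the Gaussian log-likelihood form, the function name `panel_negative_loglik`, and the choice to signal rejection by returning infinity are assumptions.

```python
import math


def panel_negative_loglik(omegas, alpha, beta, panel):
    """Gaussian negative log-likelihood of a panel GARCH(1, 1).

    Returns math.inf as soon as any conditional variance is not strictly
    positive, so that an optimizer would dismiss that parameter combination.
    No sign restriction is placed on alpha or beta themselves.
    """
    total = 0.0
    for omega, series in zip(omegas, panel):
        # Start the recursion at the sample mean of the squared returns.
        sigma2 = sum(e * e for e in series) / len(series)
        for eps in series:
            if sigma2 <= 0.0:
                return math.inf  # safeguard triggered
            total += 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                            + eps * eps / sigma2)
            sigma2 = omega + alpha * eps * eps + beta * sigma2
    return total


toy_panel = [[0.1, -0.2, 0.05]]  # made-up returns, for illustration only
print(panel_negative_loglik([0.005], -0.01, 0.90, toy_panel))  # negative alpha, still admissible
print(panel_negative_loglik([0.0], -5.0, 0.90, toy_panel))     # rejected by the safeguard
```

Counting how often the function returns infinity relative to the total number of calls would give a quantity analogous to a safeguard proportion.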

Due to the large number of ω's, especially when the number of cross-sectional units is large, we choose to summarize the ω's through the quantity $\varepsilon \equiv \varepsilon_{im} := \hat{\omega}_{im} - \omega_{i,0}$, for $i = 1, \dots, N$ and $m = 1, \dots, M$, i.e. the difference between the estimated and true ω for each cross-sectional element $i$ and for each Monte Carlo run $m$. The average of $\varepsilon^2$ can be interpreted as the mean-squared error in estimating ω across all cross-sectional units. We tabulate the average over all Monte Carlo replications and over the cross sections for the following quantities: $\varepsilon^2$, α, β, and α + β in Tables 2.1 to 2.3. We also report the proportion of negative α, denoted α-ve in the tables, and the 10% and 90% quantiles of α, denoted α0.1 and α0.9 in the tables. Finally, we report the safeguard proportion, SP, which is the number of times the safeguard was triggered divided by the number of times the objective function was evaluated. For the discussion in the text, we present the specifications that have α0 and β0 summing up to 0.99: Spec 3, 5 and 6.
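The summary statistics described above can be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, the quantile uses linear interpolation between order statistics (one of several common conventions, and not necessarily the one behind the tables), and the true intercepts are taken to be the same in every replication.

```python
def summarize_alpha(alpha_hats):
    """Mean, share of negative values, and 10%/90% empirical quantiles."""
    ordered = sorted(alpha_hats)
    n = len(ordered)

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return {
        "mean": sum(ordered) / n,
        "share_negative": sum(a < 0 for a in ordered) / n,
        "q10": quantile(0.10),
        "q90": quantile(0.90),
    }


def omega_mse(omega_hats, omega_true):
    """Average squared intercept error over replications and units.

    omega_hats[m][i] is the estimate for unit i in replication m.
    """
    errors = [(est - true) ** 2
              for run in omega_hats
              for est, true in zip(run, omega_true)]
    return sum(errors) / len(errors)


# Made-up estimates, for illustration only.
print(summarize_alpha([-0.2, -0.1, 0.0, 0.1, 0.2]))
print(omega_mse([[0.002, 0.004], [0.004, 0.004]], [0.003, 0.004]))
```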

We summarize our main findings from the Monte Carlo study as follows. As the sample size T decreases, we observe that the estimated α decreases below α0 and the estimated β increases above β0. For small enough T, the α estimates turn negative in almost all cases, even though α0 is positive, and the β estimate is frequently greater than one despite β0 being smaller than one. Note that in Tables 2.1 to 2.3, the phenomena α < 0 and β > 1 also occur for N = 1, i.e. they are not a consequence of the panel situation. When the panel structure is introduced to the GARCH(1, 1) model, i.e. when N > 1, we see that α becomes more negative on average. This happens in almost every case when T ≤ 25. The negativity of α, however, is relatively smaller for specifications with higher values of α0, e.g. Spec 6 with α0 = 0.09. The sum of α and β is always less than one on average, regardless of the specification of the model. The proportion of negative α is high in all cases with T ≤ 50. The safeguard proportion, SP, is less than 10% in all cases. The range between the 10% and 90% quantiles of α becomes narrower when the number of cross-sectional units increases.

Figures 2.3.1 through 2.3.3 present the histogram plots of the estimated α, β and their sum from the

Monte Carlo study. The histogram plots depict the effect of adding cross-sectional units at T = 25.

When an increasing number of cross-sectional units are added, the range of α and β estimated

by the GARCH(1, 1) becomes narrower. The reason is that when the number of cross-sectional

units increases, the number of times the safeguard requiring $\sigma_{i,t}^2 > 0$ for all i and t is triggered by the

optimizer also increases. The range of values of α and β that can satisfy the constraints therefore

becomes tighter. This can be seen clearly for the case (α, β) = (0.02, 0.97) in Figure 2.3.2 for the

α estimates. When N = 10, the lower end of the α estimates lies at around -0.6, and the lower

boundary increases to -0.5 when N = 25, and to -0.4 when N = 50.

An average value can sometimes be misleading since it can be influenced by extreme values in the

data. The negative average value reported for the α in our Monte Carlo study could potentially be

caused by negative extreme values while most of the α could actually be positive. This is, however,

not the case in our Monte Carlo study, since a large number of α are negative as can be seen in

the histogram plots. Furthermore, the mode and median of α are also negative.

N             50        25        10         5         1

Spec 3: T = 1000
ε²       0.00000   0.00000   0.00000   0.00000   0.00003
α        0.03944   0.03951   0.03933   0.03920   0.04046
β        0.94685   0.94652   0.94655   0.94627   0.93831
α + β    0.98629   0.98603   0.98588   0.98548   0.97877
α-ve     0.00000   0.00000   0.00000   0.00000   0.00000
α0.1     0.03724   0.03653   0.03458   0.03220   0.02524
α0.9     0.04171   0.04258   0.04415   0.04643   0.05783
SP       0.02795   0.03781   0.04374   0.04354   0.04363

Spec 3: T = 50
ε²       0.00068   0.00080   0.00098   0.00231   0.01471
α       -0.16347  -0.16465  -0.17340  -0.15575  -0.02913
β        1.09103   1.08827   1.10382   1.07521   0.95100
α + β    0.92755   0.92361   0.93041   0.91946   0.92187
α-ve     0.99600   0.99200   0.98500   0.91400   0.57200
α0.1    -0.20084  -0.20680  -0.22214  -0.23108  -0.24369
α0.9    -0.12368  -0.12236  -0.12925  -0.03484   0.04633
SP       0.05409   0.06092   0.08284   0.07739   0.09059

Spec 3: T = 25
ε²       0.00106   0.00217   0.00396   0.00229   0.01561
α       -0.21269  -0.25693  -0.30755  -0.31434  -0.02939
β        1.13127   1.13072   1.16335   1.18049   0.96617
α + β    0.91858   0.87379   0.85579   0.86614   0.93678
α-ve     0.97600   0.98400   0.97500   0.96900   0.65300
α0.1    -0.30560  -0.35831  -0.45267  -0.47878  -0.32861
α0.9    -0.13252  -0.15890  -0.18169  -0.17613   0.05113
SP       0.06241   0.07212   0.08275   0.07723   0.09566

Spec 3: T = 10
ε²       0.01618   0.01445   0.01049   0.06698   0.01621
α       -0.80873  -0.86626  -0.88139  -0.77011  -0.01885
β        1.45717   1.50062   1.53380   1.45046   0.95710
α + β    0.64844   0.63436   0.65241   0.68035   0.93824
α-ve     0.94400   0.94600   0.91900   0.84400   0.64000
α0.1    -1.54194  -1.67289  -1.68521  -1.76320  -0.03042
α0.9    -0.21152  -0.22874  -0.02392   0.02755   0.05973
SP       0.07088   0.08017   0.08276   0.08821   0.07855

Table 2.1: Average parameter estimates for the model in Spec 3 (α0 = 0.04, β0 = 0.95). The cross-section estimates for ωi are summarized by the mean-squared error term, ε². α-ve reports the proportion of α < 0. α0.1 and α0.9 report the 10% and 90% quantiles of the α estimates, respectively. SP reports the number of times the estimation safeguard was triggered divided by the number of times the objective function was called.

N             50        25        10         5         1

Spec 5: T = 1000
ε²       0.00000   0.00000   0.00000   0.00002   0.00224
α        0.01956   0.01962   0.01952   0.01972   0.01647
β        0.96522   0.96475   0.96436   0.96297   0.94251
α + β    0.98479   0.98438   0.98388   0.98270   0.95899
α-ve     0.00000   0.00000   0.00000   0.00000   0.00000
α0.1     0.01781   0.01730   0.01562   0.01406   0.00500
α0.9     0.02138   0.02212   0.02354   0.02555   0.03687
SP       0.02795   0.03781   0.04374   0.04354   0.04215

Spec 5: T = 50
ε²       0.00034   0.00071   0.00042   0.00193   0.11188
α       -0.16068  -0.17508  -0.17870  -0.19237  -0.03758
β        1.10219   1.10018   1.10791   1.10712   0.98616
α + β    0.94150   0.92510   0.92921   0.91475   0.94858
α-ve     0.99700   0.99400   0.99400   0.98600   0.66900
α0.1    -0.19583  -0.21935  -0.22349  -0.26171  -0.23298
α0.9    -0.12144  -0.12771  -0.13272  -0.13258   0.02755
SP       0.05409   0.06092   0.08284   0.07739   0.09249

Spec 5: T = 25
ε²       0.00062   0.00136   0.00263   0.00176   0.01574
α       -0.21234  -0.25491  -0.31419  -0.31342  -0.05760
β        1.14713   1.13143   1.17096   1.17556   1.00640
α + β    0.93479   0.87652   0.85676   0.86213   0.94880
α-ve     0.98600   0.97700   0.98100   0.96200   0.69000
α0.1    -0.30132  -0.36144  -0.46231  -0.47338  -0.39468
α0.9    -0.13804  -0.15355  -0.18293  -0.15876   0.02645
SP       0.06241   0.07212   0.08275   0.07723   0.09618

Spec 5: T = 10
ε²       0.01206   0.00993   0.02545   0.03952   0.01905
α       -0.81602  -0.90861  -0.91269  -0.88811  -0.00598
β        1.46143   1.52750   1.53953   1.52085   0.96163
α + β    0.64540   0.61889   0.62683   0.63273   0.95564
α-ve     0.95000   0.95900   0.96100   0.90500   0.72900
α0.1    -1.56391  -1.75593  -1.84658  -2.06469  -0.03559
α0.9    -0.24394  -0.25568  -0.18770  -0.00415   0.03904
SP       0.07088   0.08017   0.08276   0.08821   0.07711

Table 2.2: Average parameter estimates for the model in Spec 5 (α0 = 0.02, β0 = 0.97). The cross-section estimates for ωi are summarized by the mean-squared error term, ε². α-ve reports the proportion of α < 0. α0.1 and α0.9 report the 10% and 90% quantiles of the α estimates, respectively. SP reports the number of times the estimation safeguard was triggered divided by the number of times the objective function was called.

N             50        25        10         5         1

Spec 6: T = 1000
ε²       0.00000   0.00000   0.00000   0.00000   0.00001
α        0.08883   0.08893   0.08899   0.08893   0.08826
β        0.89790   0.89790   0.89747   0.89723   0.89657
α + β    0.98673   0.98683   0.98647   0.98617   0.98483
α-ve     0.00000   0.00000   0.00000   0.00000   0.00000
α0.1     0.08575   0.08457   0.08226   0.07892   0.06499
α0.9     0.09212   0.09336   0.09606   0.09977   0.09110
SP       0.02795   0.03781   0.04374   0.04354   0.04377

Spec 6: T = 50
ε²       0.00217   0.00961   0.01536   0.02716   0.07955
α       -0.10700  -0.05106  -0.03731  -0.04019  -0.04127
β        1.02654   0.87219   0.83509   0.85687   0.80525
α + β    0.91953   0.82113   0.79777   0.81668   0.76397
α-ve     0.82100   0.53100   0.48100   0.50300   0.49700
α0.1    -0.17039  -0.17121  -0.17947  -0.20391  -0.25823
α0.9     0.05987   0.08139   0.10505   0.12349   0.16456
SP       0.05409   0.06092   0.08284   0.07739   0.09338

Spec 6: T = 25
ε²       0.00235   0.00348   0.00349   0.00587   0.26779
α       -0.21081  -0.23916  -0.24798  -0.25814  -0.07913
β        1.09102   1.11034   1.13267   1.11169   0.88462
α + β    0.88021   0.87117   0.88468   0.85355   0.80549
α-ve     0.94500   0.95400   0.94300   0.89100   0.63000
α0.1    -0.30796  -0.34425  -0.37524  -0.43527  -0.52990
α0.9    -0.12909  -0.14940  -0.13723   0.01229   0.12918
SP       0.06241   0.07212   0.08275   0.07723   0.09389

Spec 6: T = 10
ε²       0.00807   0.01161   0.03169   0.10167   0.02556
α       -0.59415  -0.73003  -0.82042  -0.76439   0.05802
β        1.30103   1.41969   1.45964   1.43747   0.90314
α + β    0.70688   0.68965   0.63922   0.67307   0.96117
α-ve     0.84700   0.88600   0.90200   0.72600   0.46000
α0.1    -1.22550  -1.48795  -1.63520  -1.71070  -0.06138
α0.9     0.05744   0.01799  -0.00161   0.08590   0.11302
SP       0.07088   0.08017   0.08276   0.08821   0.07816

Table 2.3: Average parameter estimates for the model in Spec 6 (α0 = 0.09, β0 = 0.90). The cross-section estimates for ωi are summarized by the mean-squared error term, ε². α-ve reports the proportion of α < 0. α0.1 and α0.9 report the 10% and 90% quantiles of the α estimates, respectively. SP reports the number of times the estimation safeguard was triggered divided by the number of times the objective function was called.

Figure 2.3.1: Histogram plots for the estimated α and β of 1,000 Monte Carlo replications for Spec 3, α0 = 0.04, β0 = 0.95. Panels: (a) T = 25, N = 50; (b) T = 25, N = 25; (c) T = 25, N = 10.

Figure 2.3.2: Histogram plots for the estimated α and β of 1,000 Monte Carlo replications for Spec 5, α0 = 0.02, β0 = 0.97. Panels: (a) T = 25, N = 50; (b) T = 25, N = 25; (c) T = 25, N = 10.

Figure 2.3.3: Histogram plots for the estimated α and β of 1,000 Monte Carlo replications for Spec 6, α0 = 0.09, β0 = 0.90. Panels: (a) T = 25, N = 50; (b) T = 25, N = 25; (c) T = 25, N = 10.

2.4 A modified GARCH(1, 1) model

We have seen a frequent occurrence of negative α estimates in our Monte Carlo study at small sample sizes. Leong and Hillebrand (2018) show in their second proposition that if the constraint (2.3.1) is imposed, and a downward shift in the squared return occurs, i.e. $\varepsilon_{i,t}^2 < \varepsilon_{i,t-1}^2$, the GARCH(1, 1) conditional variance has difficulties capturing such a shift. Loosely speaking, by inspecting the conditional variance equation in (2.2.2), when all the parameters are positive and β0 ≈ 1, we have that $\sigma_{i,t}^2 > \sigma_{i,t-1}^2$. Therefore, again loosely speaking, as a means of overcoming the difficulty of capturing a downward shift, α has the tendency to be estimated in the negative region, at least in very small samples. To ameliorate this problem, it would be useful for the GARCH(1, 1) model to allow for negative terms in the conditional variance equation. Following a suggestion by Robin Lumsdaine, we propose to modify the GARCH(1, 1) model by deliberately introducing a misspecification; see also Lumsdaine and Ng (1999). The modified GARCH(1, 1) model is

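\[
\sigma_{i,t}^2 = \omega_i + \alpha \varepsilon_{i,t-1}^2 + \gamma_1 \varepsilon_{i,t-1} \mathbf{1}\{\varepsilon_{i,t-1} > 0\} + \gamma_2 \varepsilon_{i,t-1} \mathbf{1}\{\varepsilon_{i,t-1} < 0\} + \beta \sigma_{i,t-1}^2, \tag{2.4.1}
\]

where $\mathbf{1}\{\cdot\}$ is the indicator function.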
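The recursion in (2.4.1) can be sketched for a single cross-sectional unit as follows. The function name `modified_variance_path`, the starting value, and the parameter values in the usage lines are hypothetical and chosen only to show how signed terms with γ1 < 0 and γ2 > 0 pull the conditional variance down relative to the plain GARCH(1, 1) recursion.

```python
def modified_variance_path(returns, omega, alpha, beta, gamma1, gamma2, sigma2_start):
    """Conditional variances implied by the modified recursion (2.4.1).

    path[t] is the conditional variance for returns[t]; path[0] is the
    supplied starting value. Setting gamma1 = gamma2 = 0 recovers the
    plain GARCH(1, 1) recursion.
    """
    path = [sigma2_start]
    for eps in returns[:-1]:
        if eps > 0:
            signed = gamma1 * eps   # term active for positive lagged returns
        elif eps < 0:
            signed = gamma2 * eps   # term active for negative lagged returns
        else:
            signed = 0.0
        path.append(omega + alpha * eps * eps + signed + beta * path[-1])
    return path


toy_returns = [0.1, -0.1, 0.0]  # made-up returns, for illustration only
modified = modified_variance_path(toy_returns, 0.001, 0.04, 0.90, -0.04, 0.04, 0.01)
plain = modified_variance_path(toy_returns, 0.001, 0.04, 0.90, 0.0, 0.0, 0.01)
print(modified)
print(plain)
```

Because the signed terms can be negative, the modified recursion can itself produce non-positive variances, so a positivity check on the resulting path remains necessary.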

The one-period lagged return term, $\varepsilon_{i,t-1}$, can be either positive or negative. Therefore, to incorporate negative terms in (2.4.1), the expected signs of γ1 and γ2 should be the opposite of that of $\varepsilon_{i,t-1}$, i.e. γ1 < 0 and γ2 > 0. If this is the case, we conjecture that the modified model has the potential to capture negative shifts in $\varepsilon_{i,t}^2$. The addition of the extra terms in the modified model is not entirely without unwanted consequences. For example, they do cause an interpretation problem. The expected sign of γ2 contradicts one of the well-documented stylized facts seen in financial data, namely, the leverage effect first documented by Black (1976). The leverage effect highlights the inverse relationship between return and volatility, stating that a negative past return contributes to a higher future conditional variance. For a leverage effect to occur, it has to be that γ2 < 0, since γ2 is the parameter associated with a negative lagged return. For this reason, the additional terms should be viewed as auxiliary terms added to the conditional variance equation, and their sole purpose is to serve as a small-sample bias correction. They should not be assigned any interpretation beyond this.

Let the data-generating parameters be ω0 = 0.001, α0 = 0.04, and β0 = 0.90 in (2.2.2). We estimate the original GARCH(1, 1) and the modified GARCH(1, 1) in (2.4.1) for different sizes of T, and N = 1. Table 2.4 averages the estimated parameters over 1,000 Monte Carlo replications. The modified model produces positive α on average across all T, whereas a positive α average is produced by the original model only when T is equal to 150 and above. Additionally, the percentage of α estimates that are negative in the modified model is lower for T equal to 100 and below. The modified model is misspecified after all, and as T grows, it is expected to perform worse than the original model. For example, at T equal to 150, the original model begins to have a lower negative α percentage than the modified model. In addition, at T equal to 250, the modified model begins to have contradictory signs of γ1 and γ2. At T equal to 100 and below, the estimated signs of γ1 and γ2 are still consistent with the modified model's ability to introduce negative terms in the conditional variance equation.

                          T = 50              T = 100             T = 150             T = 250
Parameter     DGP       Ori       Mod       Ori       Mod       Ori       Mod       Ori       Mod
ω           0.001    0.0026    0.0185    0.0031    0.0426    0.0029    0.0486    0.0030    0.0828
α            0.04   -0.0849    0.0270   -0.0007    0.0340    0.0240    0.0252    0.0394    0.0396
γ1              -         -   -0.0388         -   -0.0150         -   -0.0067         -    0.0025
γ2              -         -    0.0405         -    0.0168         -    0.0089         -   -0.0014
β            0.90    0.9299    0.8389    0.8167    0.6326    0.7905    0.6197    0.7792    0.4526
α-ve            -    0.7830    0.4750    0.5660    0.4330    0.4110    0.4620    0.2220    0.3810

Table 2.4: Average parameter estimates and proportion of negative α estimates, α-ve, of the original (Ori) vs. the modified (Mod) model for different sizes of T and N = 1 over 1,000 Monte Carlo replications.

To further study the relationship between the sample size T and α, we conduct a simulation study

as follows. With the same data-generating parameter value for ω0 and varying values for α0, β0,

we record the minimum sample sizes that are needed to produce a positive α on average over

1,000 Monte Carlo replications for N = 1. The results are presented in Table 2.5 and demonstrate

that the modified model requires a lower sample size to produce a positive α on average. We also

present the ratio between the minimum sample sizes needed to produce positive α on average for

these two models. A common ratio of close to 0.5 indicates that the modified model needs about

half of the sample size required by the original model to produce positive α on average.
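As a quick arithmetic check of the ratios discussed above, the minimum sample sizes reported in Table 2.5 can be converted to ratios directly. The dictionary below is a transcription of the table values; the variable names are illustrative.

```python
# (T_m, T_o) pairs transcribed from Table 2.5, keyed by (alpha0, beta0).
min_sample_sizes = {
    (0.04, 0.80): (34, 66), (0.04, 0.85): (52, 71), (0.04, 0.90): (26, 84),
    (0.07, 0.80): (20, 38), (0.07, 0.85): (23, 49), (0.07, 0.90): (32, 60),
    (0.09, 0.80): (18, 38), (0.09, 0.85): (23, 38), (0.09, 0.90): (35, 56),
}

# Ratio of the modified model's minimum sample size to the original model's.
ratios = {key: t_mod / t_ori for key, (t_mod, t_ori) in min_sample_sizes.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

print(f"mean ratio: {mean_ratio:.3f}")
print(f"range: {min(ratios.values()):.3f} to {max(ratios.values()):.3f}")
```

Across the nine parameter combinations the ratio averages about 0.53 and ranges from roughly 0.31 to 0.73, so every entry is below one.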

              β0 = 0.80            β0 = 0.85            β0 = 0.90
α0        Tm    To   Tm/To     Tm    To   Tm/To     Tm    To   Tm/To
0.04      34    66    0.52     52    71    0.73     26    84    0.31
0.07      20    38    0.53     23    49    0.47     32    60    0.53
0.09      18    38    0.47     23    38    0.61     35    56    0.63

Table 2.5: Tm and To represent the minimum sample size from the modified and original models, respectively, to yield positive α on average over 1,000 Monte Carlo repetitions for N = 1. The ratio reported is computed as Tm/To. A ratio of less than one indicates that the modified model needs fewer observations to yield positive α on average.

For N = 1, and for various sample sizes T = 50, 75, and 100, we compare the modified and original models in terms of the proportions of negative α estimates, and the sample variances of the negative α estimates in Table 2.6. This helps us understand why the modified model performs better than the original model in small samples. We find that the proportion of negative α is smaller for the modified model than for the original model in a majority of cases. Exceptions occur if T = 100 or if α0 is relatively large (0.09). For example, at T = 75, for (α0, β0) = (0.07, 0.85), the proportions of negative α estimates produced by the modified and original models are approximately equal, at 0.417 and 0.433, respectively. The sample variance of the negative α estimate is, however, lower for the modified model (0.009) relative to the original model (0.026).

Consider T = 75 with (α0, β0) = (0.09, 0.80). The proportion of negative α estimates produced by the modified model is slightly higher at 0.388 compared to 0.349 produced by the original model. The sample variance of the negative α estimate is still lower for the modified model (0.007) relative to the original model (0.028). In general, the modified model produces a lower proportion of negative α estimates and a lower sample variance of the negative α estimates for T = 50, 75 relative to the original model. At T = 100, the original model starts to produce a lower proportion of negative α estimates, but the sample variances of the negative α estimates are still higher compared to the modified model in most cases.

N = 1, T = 50
                   β0 = 0.80                       β0 = 0.85                       β0 = 0.90
α0      αm-ve   αo-ve    Vm α    Vo α     αm-ve   αo-ve    Vm α    Vo α     αm-ve   αo-ve    Vm α    Vo α
0.04    0.348   0.667   0.005   0.036     0.399   0.716   0.007   0.035     0.480   0.783   0.010   0.028
0.07    0.360   0.567   0.007   0.044     0.389   0.629   0.011   0.040     0.490   0.709   0.025   0.034
0.09    0.366   0.519   0.010   0.046     0.402   0.561   0.016   0.043     0.561   0.666   0.037   0.041

N = 1, T = 75
                   β0 = 0.80                       β0 = 0.85                       β0 = 0.90
α0      αm-ve   αo-ve    Vm α    Vo α     αm-ve   αo-ve    Vm α    Vo α     αm-ve   αo-ve    Vm α    Vo α
0.04    0.338   0.546   0.004   0.024     0.392   0.575   0.006   0.022     0.467   0.633   0.009   0.022
0.07    0.385   0.409   0.006   0.026     0.417   0.433   0.009   0.026     0.454   0.483   0.020   0.025
0.09    0.388   0.349   0.007   0.028     0.422   0.362   0.012   0.028     0.486   0.402   0.058   0.025

N = 1, T = 100
                   β0 = 0.80                       β0 = 0.85                       β0 = 0.90
α0      αm-ve   αo-ve    Vm α    Vo α     αm-ve   αo-ve    Vm α    Vo α     αm-ve   αo-ve    Vm α    Vo α
0.04    0.329   0.467   0.002   0.014     0.362   0.490   0.004   0.013     0.442   0.520   0.007   0.014
0.07    0.345   0.337   0.004   0.017     0.397   0.325   0.006   0.016     0.446   0.368   0.014   0.015
0.09    0.376   0.233   0.005   0.018     0.398   0.241   0.008   0.016     0.446   0.273   0.040   0.014

Table 2.6: For N = 1 and T = 50, 75, 100 over 1,000 Monte Carlo repetitions, the statistics αm-ve and αo-ve report the proportion of negative α for the modified and original models, respectively. Additionally, Vm α and Vo α report the sample variances of α for the modified and original models, respectively.

2.5 Empirical applications

2.5.1 Commodity terms of trade

As a first empirical application, we apply the modified model to the panel data set considered

in Cavalcanti et al. (2014). The data set consists of the growth rates of the so-called commodity

terms of trade (CToT), which are a measure for gauging a country's import and export activities.

There are a total of 118 countries and 38 time series observations for each country (N = 118,

T = 38). The growth rates of CToT are obtained by taking the logarithmic difference of CToT.

After differencing, the effective sample size is T = 37 for each country. The growth rates of

CToT and their squared values, stacked across all countries, are illustrated in Figure 2.5.1. Notice

that N/T ≈ 3.2, that is, the number of cross-sectional units is 3.2 times larger than the sample

period. To model the unobserved latent conditional variances for the growth rates of CToT, a

panel GARCH(1, 1) model is estimated.
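The differencing step and the N/T ratio described above can be reproduced in a few lines. The following is a minimal Python sketch; the function name `log_growth` and the toy index series are illustrative assumptions and are not taken from the Cavalcanti et al. (2014) data set.

```python
import math

def log_growth(levels):
    """Return the log differences ln(x_t) - ln(x_{t-1}) of a positive series."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]

# Hypothetical CToT index for one country with T = 38 observations.
levels = [100.0 * 1.01 ** t for t in range(38)]
growth = log_growth(levels)

N = 118                 # number of countries, as stated in the text
T_eff = len(growth)     # differencing costs one observation: 38 -> 37
ratio = N / T_eff       # cross-sectional units per time period
```

With 38 levels per country the growth series has 37 entries, which is why the ratio is computed with T = 37 rather than 38.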

The estimated parameters of the original model compared to the modified model in (2.4.1) are presented in Table 2.7. We omit reporting the 118 estimates of ωi. Although the modified model produces a β estimate that is less than one, the estimate of α is still negative. In this case, it is even more negative than in the original model. The estimation process is numerically challenging because the number of cross-sectional units is large while T is small for each unit. The estimated conditional variances of the original and modified models are shown in Figure 2.5.2.

2.5.2 Inflation rates

A classical example of an application of conditional heteroskedasticity models is by Engle (1982), covering United Kingdom quarterly inflation for the period 1958-1977. In our second empirical example, we consider a panel of yearly inflation data from members of the G7 group of countries for the period 1966-1990. We have a total of 25 time series observations for each member (N = 7, T = 25). The inflation rates and their squared values stacked across all countries are illustrated in Figure 2.5.3. In this case, N/T ≈ 0.28, that is, the sample period is much longer than the number of cross-sectional units. We use a panel GARCH(1, 1) model to estimate the unobserved latent conditional variances for the inflation rates.


[Figure: panel (a) CToT growth rates, titled "Stacked panel data of CToT growth rates" (x-axis: No. of observations; y-axis: CToT growth rates); panel (b) Squared CToT growth rates, titled "Stacked panel data of squared CToT growth rates" (x-axis: No. of observations; y-axis: (CToT growth rates)^2).]

Figure 2.5.1: The growth rates of CToT (left figure) and their squared values (right figure). Both are stacked across all cross sections with a total number of 4,366 observations (T × N, T = 37, N = 118).

Est. parameters    Original             Modified
α                  -0.0385 (0.0268)     -0.0559 (0.0188)
γ1                 -                    -0.0001 (0.0015)
γ2                 -                    -0.0001 (0.0006)
β                  1.0321 (0.0332)      0.8441 (0.0207)
logL/(T × N)       3.9622               3.9155

Table 2.7: Comparison between the estimated parameters, produced by the original and modified models, applied to the growth rates of CToT. The standard errors of the estimates are provided in the parentheses.


[Figure: "Estimated conditional variances of CToT growth rates from panel GARCH(1, 1)" (x-axis: No. of observations; y-axis: Cond. variances; legend: Original model, Modified model).]

Figure 2.5.2: The stacked conditional variance estimates of CToT growth rates produced by the original (blue line) vs. modified (red line) models.

Est. parameters    Original             Modified
α                  -0.0772 (0.0419)     0.0389 (0.0418)
γ1                 -                    0.0844 (0.0081)
γ2                 -                    0.0947 (0.0402)
β                  1.0832 (0.0704)      0.6854 (0.0392)
logL/(T × N)       2.6849               2.6977

Table 2.8: Comparison between the estimated parameters, produced by the original vs. modified models, applied to the G7 inflation rates. The standard errors of the estimates are provided in the parentheses.

The estimated conditional variances for the inflation rates are presented in Figure 2.5.4. As before, we omit reporting the estimates of ωi in Table 2.8. The modified model produces positive α and β estimates, with β < 1. In contrast, the original model produces a negative α estimate and a β estimate above one. The log-likelihood value for the modified model is also higher than for the original model. The γ1 estimate in the modified model is opposite to its expected sign, however.


[Figure: panel (a) G7 inflation rates, titled "Stacked panel data of G7 inflation rates" (x-axis: No. of observations; y-axis: G7 inflation rates); panel (b) Squared G7 inflation rates, titled "Stacked panel data of squared G7 inflation rates" (x-axis: No. of observations; y-axis: (G7 inflation rates)^2). Legend in both panels: Canada, France, Germany, Italy, Japan, United Kingdom, United States.]

Figure 2.5.3: The G7 inflation rates (left figure) and their squared values (right figure). Both are stacked across all cross sections with a total number of 175 observations (T × N, T = 25, N = 7). The vertical lines, each representing a different member of the G7, mark the split/end of a cross section.

[Figure: "Estimated conditional variances of G7 inflation rates from panel GARCH(1, 1)" (x-axis: No. of observations; y-axis: Cond. variances; legend: Canada, France, Germany, Italy, Japan, United Kingdom, United States).]

Figure 2.5.4: The stacked conditional variance estimates of inflation rates, produced by the original (blue line) vs. modified (black dotted line) models. The vertical lines, each representing a different member of the G7, mark the split/end of a cross section.


2.6 Conclusion

The phenomenon of negative α estimates in GARCH(1, 1) on small sample sizes also occurs in the panel GARCH(1, 1) model. In our Monte Carlo study, we show that negative α estimates can become more frequent and more strongly negative when N > 1. We propose a modification of the GARCH(1, 1) conditional variance equation by adding two signed terms that can take negative values. We show in simulations that the modified model outperforms the original model in terms of producing a lower percentage of negative α estimates for T less than 100 at N = 1. Additionally, the modified model requires a smaller T to produce a positive average α estimate across a range of α0 and β0 values commonly reported in empirical studies. This is due to the ability of the modified model to produce a lower proportion of negative α estimates and a lower sample variance of the α estimates for T less than 100 at N = 1. We apply the proposed model to macroeconomic data in two different empirical examples. In both empirical examples, the original panel GARCH(1, 1) model produces negative α estimates and β estimates greater than one, in line with the results from the simulation. The modified GARCH(1, 1) model also produces a negative α estimate in the first example, but in the second example it performs better by producing both a positive α estimate and a β estimate below one.

2.7 References

Arellano, M. and Honoré, B. (2001), Panel data models: Some recent developments, in J. Heckman and E. Leamer, eds, `Handbook of econometrics', Vol. 5, North-Holland, Amsterdam, pp. 3229–3296.

Baillie, R. T. and Bollerslev, T. (2002), `The message in daily exchange rates: A conditional-variance tale', Journal of Business and Economic Statistics 20(1), 60–68.

Bauwens, L. and Rombouts, J. (2007), `Bayesian clustering of many GARCH models', Econometric Reviews 26(2-4), 365–386.

Black, F. (1976), Studies of stock price volatility changes, in `Proceedings of the American Statistical Association', Business and Economic Statistics Section, pp. 177–181.


Bollerslev, T. (1987), `A conditionally heteroskedastic time series model for speculative prices and rates of return', Review of Economics and Statistics 69(3), 542–547.

Brooks, C., Burke, S. P. and Persand, G. (2001), `Benchmarks and the accuracy of GARCH model estimation', International Journal of Forecasting 17(1), 45–56.

Cavalcanti, D. V., Tiago, V., Mohaddes, K. and Raissi, M. (2014), `Commodity price volatility and the sources of growth', Journal of Applied Econometrics 30(6), 857–873.

Cermeño, R. and Grier, K. B. (2006), Conditional heteroskedasticity and cross-sectional dependence in panel data: an empirical study of inflation uncertainty in the G7 countries, in B. Baltagi, ed., `Panel Data Econometrics Theoretical Contributions and Empirical Applications', Elsevier, New York, pp. 259–277.

Diggle, P. J., Heagerty, P., Liang, K.-Y. and Zeger, S. (2002), Analysis of Longitudinal Data, Oxford University Press, Oxford.

Engle, R. F. (1982), `Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation', Econometrica 50(4), 987–1007.

Engle, R. F. (2002), `Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models', Journal of Business and Economic Statistics 20(3), 339–350.

Engle, R. F., Ng, V. K. and Rothschild, M. (1990), `Asset pricing with a factor-ARCH covariance structure: Empirical estimates for treasury bills', Journal of Econometrics 45(1-2), 213–237.

Glosten, L. R., Jagannathan, R. and Runkle, D. E. (1993), `On the relation between the expected value and the volatility of the nominal excess return on stocks', Journal of Finance 48(5), 1779–1801.

Higgins, M. L. and Bera, A. K. (1992), `A class of nonlinear ARCH models', International Economic Review 33(1), 137–158.

Hwang, S. and Valls Pereira, P. L. (2006), `Small sample properties of GARCH estimates and persistence', European Journal of Finance 12(6-7), 473–494.


Lee, J. (2010), `The link between output growth and volatility: Evidence from a GARCH model with panel data', Economics Letters 106(2), 143–145.

Leong, W. R. and Hillebrand, E. (2018), GARCH(1, 1) on small samples, Technical report, Aarhus University and CREATES.

Lumsdaine, R. L. and Ng, S. (1999), `Testing for ARCH in the presence of a possibly misspecified conditional mean', Journal of Econometrics 93(2), 257–279.

McCullough, B. and Renfro, C. G. (1998), `Benchmarks and software standards: A case study of GARCH procedures', Journal of Economic and Social Measurement 25(2), 59–71.

Mezrich, J. and Engle, R. F. (1996), `GARCH for groups', Risk 9(8), 36–40.

Newey, W. K. and McFadden, D. (1994), Large sample estimation and hypothesis testing, in R. F. Engle and D. L. McFadden, eds, `Handbook of Econometrics, Vol. 4', Elsevier Science B.V., pp. 2111–2245.

Ng, H. S. and Lam, K. P. (2006), How does sample size affect GARCH models?, in `9th Joint International Conference on Information Sciences (JCIS-06)', Advances in Intelligent Systems Research.

Pakel, C., Shephard, N. and Sheppard, K. (2011), `Nuisance parameters, composite likelihoods and a panel of GARCH models', Statistica Sinica 21(1), 307–329.

Varin, C., Reid, N. and Firth, D. (2011), `An overview of composite likelihood methods', Statistica Sinica 21(1), 5–42.

Zakoïan, J.-M. (1994), `Threshold heteroskedastic models', Journal of Economic Dynamics and Control 18(5), 931–955.


Chapter 3

Pairs trading with cointegration on multiple stock indices

Wei Ruen Leong

Aarhus University and CREATES

Abstract

This paper investigates pairs trading, a popular statistical arbitrage trading strategy. The study applies the cointegration approach to select pairs for trading and compares the profitability of pairs trading across four different stock indices: S&P 500, S&P 500 small-cap, XU100, and Ibovespa100. Although a significant profit is obtained when the cointegration approach is implemented on stocks from the S&P 500, significant and large losses are incurred when I implement it on stocks selected from the other indices. The results challenge the existing ones in the literature, suggesting that consistent profit is not guaranteed and that losses can be incurred. A post-study analysis of how pairs trading can fail due to a short-lived cointegrating relationship is performed. Recommendations to overcome this issue are given in the concluding section and could pave the way for future studies.


3.1 Introduction

A statistical arbitrage technique seeks to profit from anomalies in the financial market identified through statistical tools, and pairs trading is one of the most widely used statistical arbitrage techniques. The idea of pairs trading was first conceived on Wall Street around the mid-1980s by Nunzio Tartaglia and his group of computer scientists, mathematicians and physicists at Morgan Stanley (Huck, 2010). Their algorithm generated hundreds of millions in profits for the firm, but the group disbanded in 1989 after consecutive periods of losses. Twenty years later, pairs trading is still one of the most popular statistical arbitrage strategies employed by hedge funds and investment banks alike (Gatev et al., 2006). This is not surprising, given that the underlying concept of pairs trading is straightforward: find a pair of stocks whose prices move closely together, then when their prices diverge, short the outperformer and go long the underperformer, betting on eventual convergence. Because of its proprietary nature, the strategy received little attention in the literature until recently, and the study of pairs trading in the financial literature is therefore still in its infancy. Since the seminal paper by Gatev et al. (2006) (henceforth GGR), who explore the performance of pairs trading, the literature has started to grow, and over the years we have witnessed many interesting follow-ups and extensions. Traditionally, the pairs trading methodology identifies stocks to trade based on correlation and other non-parametric means. In GGR, for example, stocks are identified by minimizing the sum of pairwise squared historical deviations. In this paper, I explore pairs trading with selection by parametric means, applying the method of cointegration.

Since the early version of the cointegration test introduced by Engle and Granger (1987) and later refined by Johansen (1991), the method of cointegration has been applied in a financial context, for example by Diebold et al. (1994) to investigate exchange rate dynamics and by Dwyer Jr. and Wallace (1992) to uncover the relationship between market efficiency and cointegration. Researchers have also been interested in applying cointegration in the context of financial trading (Caldeira and Guilherme, 2013; Lin et al., 2006; Vidyamurthy, 2004). It is well documented that some price series are cointegrated. This makes intuitive sense, since there are stocks that respond to the same idiosyncratic shocks, such as stocks of close competitors or common stock and preference stock from the same underlying company. Those stocks can be thought of as being close substitutes for one another, and by the law of one price they should be priced at one common level if the market is truly efficient; otherwise an arbitrage profit can be made. The method of cointegration in pairs trading is to identify stocks that share a common stochastic trend and will react more or less similarly to market shocks. A trading rule can then be devised to capture arbitrage profit when one stock moves too far away from the other, on the belief that the deviation is temporary given their cointegrated relationship. Despite its promise, pairs trading with cointegration is not risk free. The main problem is that stocks can be cointegrated in-sample while the relationship no longer holds out-of-sample. Another issue is that high trading costs may prevent arbitrage profit. Nevertheless, consistent profitability has been documented in the pairs trading literature.

This paper aims to investigate whether a simple pairs trading strategy built upon cointegration can be consistently profitable across different equity markets. The main study is performed on stocks selected from the S&P 500 index. The S&P 500 small-cap index is also selected due to the evidence presented in Fama and French (1993) documenting the strong performance of small and value stocks. Additionally, the S&P 500 small-cap has higher trading costs than the main index, and I want to investigate whether these higher trading costs reduce the profitability of pairs trading. I choose not to restrict my attention to the US equity market, since several studies document the profitability of pairs trading outside of it. For instance, Bolgün et al. (2010) apply GGR's methodology to the Istanbul stock exchange, finding profits as high as 3.36% per month. A similar study is performed by Perlin (2009), who finds highly volatile annualized raw returns ranging from -24% to 38% using data from the Brazilian stock exchange. Caldeira and Guilherme (2013) study pairs trading in the same market, but with the cointegration method, and they find a high annualized excess return of 16.38%. Broussard and Vaihekoski (2012) report a similar profit level of 12.5% per annum following GGR's methodology with data from the Finnish stock exchange. However, few papers in the literature compare profitability across countries in the same study. Motivated by these empirical findings, I perform a cross-market profit comparison by implementing the pairs trading algorithm on multiple stock indices apart from the S&P 500: S&P 500 small-cap, XU100 (Turkey), and Ibovespa100 (Brazil).

The rest of the paper is organized as follows. Section 3.2 gives a literature review on pairs trading, Section 3.3 provides an overview of the theory of cointegration and its relation to pairs trading, Section 3.4 explains the trading algorithm and the methodology employed, Section 3.5 presents and discusses the main application, Section 3.6 provides a post-study analysis, and finally Section 3.7 concludes.

3.2 Literature review

At present, the pairs trading selection techniques in the literature are diverse. According to Huck and Afawubo (2015), they can be categorized into three main methods: the minimum distance approach, the modeling of the mean reversion, and the combined forecasts approach and Multi-Criteria Decision Methods (MCDM). A different categorization is proposed in a survey by Krauss (2017), who groups the approaches into five categories instead of three.

The first method, the minimum distance approach, is a simple non-parametric approach that selects pairs by minimizing the squared distance, SSD_{i,j}, between two standardized stock price series, p_{i,t} and p_{j,t}:

\[ SSD_{i,j} := \min_{i \neq j} \sum_{t} (p_{i,t} - p_{j,t})^2. \tag{3.2.1} \]

Early papers in the literature adopt this approach for pairs selection due to its simplicity. Once pairs have been identified for trading, a long/short strategy is constructed. When an upward (downward) movement of two standard deviations from the zero mean is recorded in the standardized price difference, the first stock in the pair is shorted (bought) for some dollar value, and the second stock in the pair is bought (shorted) for the same dollar value. The two-standard-deviation rule was chosen arbitrarily by GGR and has become the convention in later papers. The two-standard-deviation bound can be thought of as the approximate 95% confidence interval of a Gaussian random variable, and any move outside the bound is considered an anomaly. As explained in Section 3.4, however, it turns out that the optimal choice is not two standard deviations but 0.75 standard deviations.
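The distance criterion and the threshold rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: prices are standardized by dividing by the first observation (one possible normalization, not necessarily the one used by GGR), the spread volatility is the sample standard deviation over the formation period, and all function names and toy prices are hypothetical.

```python
import statistics

def normalize(prices):
    """Rescale a price series to start at 1 (an assumed standardization)."""
    return [p / prices[0] for p in prices]

def ssd(p, q):
    """Sum of squared deviations between two standardized price series."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest_pair(series):
    """Return the pair of keys (i, j), i != j, with the smallest SSD."""
    keys = sorted(series)
    pairs = [(i, j) for n, i in enumerate(keys) for j in keys[n + 1:]]
    return min(pairs, key=lambda ij: ssd(series[ij[0]], series[ij[1]]))

def signal(diff, sigma, k=2.0):
    """Position in (stock 1, stock 2) given the standardized price difference.

    diff >  k * sigma: short stock 1, long stock 2  -> (-1, +1)
    diff < -k * sigma: long stock 1, short stock 2  -> (+1, -1)
    otherwise        : no position                  -> (0, 0)
    """
    if diff > k * sigma:
        return (-1, 1)
    if diff < -k * sigma:
        return (1, -1)
    return (0, 0)

# Toy formation-period prices (hypothetical numbers).
raw = {
    "A": [10.0, 10.5, 11.0, 10.8, 11.2],
    "B": [20.0, 21.2, 21.8, 21.5, 22.6],
    "C": [5.0, 4.0, 6.5, 3.9, 7.0],
}
std_prices = {name: normalize(p) for name, p in raw.items()}
i, j = closest_pair(std_prices)
diffs = [a - b for a, b in zip(std_prices[i], std_prices[j])]
sigma = statistics.stdev(diffs)  # spread volatility over the formation period
```

The threshold multiple `k` is left as a parameter so that the conventional value of 2 can be replaced by another choice, such as the 0.75 mentioned in the text.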

The second method, the modeling of the mean reversion of a time series, can be implemented in various ways. I describe the two most popular ones here. The first is to model the mean reversion as a stochastic spread. The stochastic spread method is first explained analytically by Elliott et al. (2005) without considering any empirics. The main idea is to use a mean-reverting stochastic process such as

\[ dp_t = \alpha(\mu - p_t)\,dt + \beta\,dW_t, \tag{3.2.2} \]

where (3.2.2) represents the Ornstein-Uhlenbeck (OU) process with parameters α, µ, and β representing the rate of mean reversion, the long-term mean, and the volatility of the process, respectively, and W_t is the Wiener process. The OU process can be used to capture the behavior of pairs that have a tendency to converge to some stable equilibrium level. Do et al. (2006) later expand the method by considering some empirics, and they assume that the spread is driven by a latent state variable that follows an OU process. By expressing the observable spread as the sum of a mean-reverting process and an additive Gaussian white noise, the model can be estimated through the Kalman filter in a state space setting. The second way is to apply the concept of cointegration to a pair of stock price series, p_{1t} and p_{2t}. For example, if p_{1t} and p_{2t} are cointegrated with cointegrating vector (1, −β)′, then the process

\[ p_{1t} - \beta p_{2t} \tag{3.2.3} \]

is stationary. The cointegrating parameter β in (3.2.3) has to be estimated in order to construct the estimated spread, p_{1t} − β̂ p_{2t}, that is used for trading due to its mean-reverting property. Since this paper adopts the method of cointegration, I refrain from commenting further here; the details are given in Section 3.3.
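To make the role of the OU parameters concrete, the following Python sketch simulates (3.2.2) on a discrete grid and recovers α and µ from a first-order autoregression. It is an illustration only and not the estimation procedure of Elliott et al. (2005) or Do et al. (2006): it relies on the standard result that an OU process sampled at interval dt is an AR(1) with coefficient exp(−α·dt), and the function names, parameter values and seed are assumptions.

```python
import math
import random

def simulate_ou(alpha, mu, beta, p0, n, dt=1.0, seed=0):
    """Simulate dp = alpha*(mu - p)dt + beta*dW using the exact transition."""
    rng = random.Random(seed)
    b = math.exp(-alpha * dt)                              # AR(1) coefficient
    sd = beta * math.sqrt((1.0 - b * b) / (2.0 * alpha))   # innovation std. dev.
    path = [p0]
    for _ in range(n):
        path.append(mu + (path[-1] - mu) * b + sd * rng.gauss(0.0, 1.0))
    return path

def fit_ou(path, dt=1.0):
    """Recover (alpha, mu) from the OLS regression p_{t+1} = a + b*p_t + e."""
    x, y = path[:-1], path[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    b = sxy / sxx
    a = my - b * mx
    return -math.log(b) / dt, a / (1.0 - b)

# Hypothetical parameter values chosen for illustration.
path = simulate_ou(alpha=0.1, mu=1.0, beta=0.2, p0=1.0, n=5000)
alpha_hat, mu_hat = fit_ou(path)
```

A larger α implies a smaller autoregressive coefficient and hence faster reversion to µ, which is the property a spread-based trading rule relies on.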

The third method combines neural networks to forecast prices and to make multi-criteria decisions. This approach differs from the other two because its implementation relies on machine learning techniques without reference to any economic equilibrium model. Huck (2009) and Huck (2010) are the main papers on this approach. According to Huck (2010), a non-technical overview of this approach is based on three phases: forecasting, ranking, and trading. For the forecasting phase, Huck (2009) uses the Elman neural network (Elman, 1990), and Huck (2010) uses a multi-step (up to four-step ahead) forecast approach. For the ranking phase, both papers use a multi-criteria decision method known as the Electre III method (see Greco et al. (2016) for a complete description of this method). The ranking system places undervalued stocks at the top and overvalued stocks at the bottom. The final phase, the trading phase, is implemented based on the ranking output: go long the stocks at the top of the ranking and short the bottom-ranked stocks.


There are other methods that are not categorized under the three main approaches, but have been given some attention in the literature. See Krauss (2017) for a complete list of such methods. I describe two of them here. The first is the copula approach. Two representative studies in this domain are by Stander et al. (2013) and Rad et al. (2016). A copula function is defined as a joint multivariate distribution function with uniform univariate marginal distributions. Consider a pair of random variables X and Y (e.g. a pair of stocks), with distribution functions F(X) and F(Y), and joint bivariate distribution function F(X, Y). After applying the probability integral transformation, one can write U = F(X) and V = F(Y), where U and V are both uniform random variables on the interval (0, 1). By Sklar's theorem (Sklar, 1996), their copula function is given as C(u, v) = P(U ≤ u, V ≤ v).

The copula functions considered by Rad et al. (2016) in their paper are the Clayton, Rotated Clayton, Gumbel, Rotated Gumbel, and Student-t copulas, whereas Stander et al. (2013) employ a set of twenty-two different Archimedean copulas. By the theory of copulas, the partial derivatives of the copula function give the conditional distribution functions (Nelsen, 2007) as follows:

\[ \frac{\partial C(u, v)}{\partial v} := h_U(u \mid v) = P(U \le u \mid V = v), \tag{3.2.4} \]

\[ \frac{\partial C(u, v)}{\partial u} := h_V(v \mid u) = P(V \le v \mid U = u). \tag{3.2.5} \]

The copula trading approach uses the functions h_U and h_V in (3.2.4) and (3.2.5) to estimate the probability that one random variable is less than a certain value, given that the other random variable takes a specific value. For example, Stander et al. (2013) consider the confidence bands P(U ≤ u | V = v) = 0.05 and P(V ≤ v | U = u) = 0.95 in their study.
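As an illustration of (3.2.4), the sketch below evaluates the conditional distribution function for the Clayton copula, one of the families listed above, using its standard closed form C(u, v) = (u^(−θ) + v^(−θ) − 1)^(−1/θ) for θ > 0. The function names, the parameter value and the band labels are assumptions for illustration; how a flagged observation maps into an actual trade is not specified here and differs across the cited studies.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 and u, v in (0, 1)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_h(u, v, theta):
    """h_U(u|v) = dC(u, v)/dv = P(U <= u | V = v) for the Clayton copula."""
    s = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * s ** (-1.0 / theta - 1.0)

def band(u, v, theta, lower=0.05, upper=0.95):
    """Label the conditional probability relative to the confidence bands."""
    h = clayton_h(u, v, theta)
    if h < lower:
        return "low"     # U is unusually small given V
    if h > upper:
        return "high"    # U is unusually large given V
    return "inside"

# Hypothetical dependence parameter and observation.
theta = 2.0
h_value = clayton_h(0.3, 0.7, theta)
```

The analytic derivative can be checked against a finite difference of `clayton_cdf`, which is a useful safeguard when implementing other copula families.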

The second method is the Principal Component Analysis (PCA) approach. The main paper on this approach is by Avellaneda and Lee (2010). The approach starts with a multi-factor model with n factors to decompose the return of a stock into systematic and idiosyncratic components as follows:

\[ R_i = \sum_{j=1}^{n} \beta_{ij} F^{(j)} + \varepsilon_i, \tag{3.2.6} \]

where R_i is the return of stock i and β_{ij} is the factor loading for stock i and factor j. The sum \sum_j \beta_{ij} F^{(j)} represents the systematic component, and ε_i is the residual term that represents the idiosyncratic component. Avellaneda and Lee (2010) use PCA to extract the factors in (3.2.6), and they assume that the stock return satisfies the following differential equation

\[ \frac{dP_t}{P_t} = \alpha\,dt + \sum_{j=1}^{n} \beta_j F^{(j)} + dX_t, \tag{3.2.7} \]

where P_t is the stock price time series with drift parameter α, and dX_t is the increment of some stationary process. Avellaneda and Lee (2010) consider the OU process for X_t in (3.2.7), and apply a trading model similar to Elliott et al. (2005) to implement trading.

Traditionally, papers in the literature have restricted themselves to the implementation of pairs trading in the equity market, and studies of pairs trading in other asset classes are uncommon. To end this section, I present a short list of pairs trading studies involving other asset classes for the interested reader: Exchange Traded Funds (ETFs) (Avellaneda and Lee, 2010; Schizas et al., 2011), energy futures (Dunis et al., 2008; Cummins and Bucca, 2012), and commodity futures (Bianchi et al., 2009; Liu and Chou, 2003).

3.3 Method of cointegration

3.3.1 Theoretical overview

The concept of cointegration is briefly introduced here since it is the method applied for the selection of stocks. For the sake of presentation, the definition of cointegration provided by Engle and Granger (1987) is adopted, although there are alternative definitions such as the one provided by Lütkepohl (2005).

Let y_t = (y_{t1}, y_{t2}, …, y_{tk})′ be a k × 1 time-series vector. If all components of y_t are I(d), i.e. integrated of order d, and if there exists β = (β_1, …, β_k)′ ∈ R^k \ {0} such that ε_t = β′y_t is I(d − b) with b > 0, then y_t is said to be cointegrated of order (d, b), denoted by y_t ∼ C(d, b).

The most common case is when all components of yt are I(1) and β′yt is I(0). This will be the

case considered in this pairs trading study.

Consider a VAR(p) model where the individual variables in y_t are either I(1) or I(0) as follows:

\[ y_t = \mu + A_1 y_{t-1} + A_2 y_{t-2} + \ldots + A_p y_{t-p} + u_t, \tag{3.3.1} \]

where µ is a k × 1 vector containing the intercept terms, A_i is a k × k coefficient matrix for i = 1, 2, …, p, and u_t is a k × 1 residual vector. Note that both µ and the A_i's are time invariant; in other words, they do not depend on t.

By adding and subtracting the appropriate lagged terms y_{t−i}, (3.3.1) can be represented in Error Correction Model (ECM) form as follows:

\[ \Delta y_t = \mu + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + u_t, \tag{3.3.2} \]

where Π = −(I_k − A_1 − … − A_p) and Γ_i = −(A_{i+1} + … + A_p) are both matrices of dimension k × k, and I_k represents the k × k identity matrix. However, (3.3.2) still needs to be balanced to make sense: the left-hand side is I(0), so the right-hand side must be I(0) as well. If rank(Π) = r < k, then Π can be decomposed into αβ′, where α and β are both k × r matrices. Although y_t is inherently non-stationary, there exist r linear combinations such that β′y_t ∼ I(0). Therefore, (3.3.2) is balanced as follows:

\[ \underbrace{\Delta y_t}_{I(0)} = \mu + \alpha \underbrace{\beta' y_{t-1}}_{r \text{ cointegrating vectors, } I(0)} + \underbrace{\sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i}}_{I(0)} + \underbrace{u_t}_{I(0)} \tag{3.3.3} \]

The parameter α, sometimes termed the speed of adjustment, governs how quickly the system returns to the equilibrium relationship after a shock. My main focus is restricted to the time series ε_t := β′y_t, where ε_t is the spread of the price series, or spread for short. When the spread is stationary, there will be trading opportunities due to its mean reversion property.
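The mapping from the VAR coefficient matrices to the ECM matrices can be checked numerically. The sketch below is a plain-Python illustration of the definitions Π = −(I_k − A_1 − … − A_p) and Γ_i = −(A_{i+1} + … + A_p); the helper names and the example matrix are hypothetical and chosen so that Π has rank one.

```python
def mat_add(a, b):
    """Element-wise sum of two square matrices stored as nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_neg(a):
    """Element-wise negation of a matrix."""
    return [[-x for x in row] for row in a]

def identity(k):
    """k x k identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def var_to_vecm(a_mats):
    """Map VAR(p) matrices [A_1, ..., A_p] to (Pi, [Gamma_1, ..., Gamma_{p-1}])."""
    k = len(a_mats[0])
    total = [[0.0] * k for _ in range(k)]
    for a in a_mats:
        total = mat_add(total, a)
    pi = mat_add(total, mat_neg(identity(k)))   # Pi = -(I - A_1 - ... - A_p)
    gammas = []
    for i in range(1, len(a_mats)):             # Gamma_i = -(A_{i+1} + ... + A_p)
        tail = [[0.0] * k for _ in range(k)]
        for a in a_mats[i:]:
            tail = mat_add(tail, a)
        gammas.append(mat_neg(tail))
    return pi, gammas

# Hypothetical bivariate VAR(1) with one unit root.
A1 = [[0.5, 0.5], [0.25, 0.75]]
Pi, Gammas = var_to_vecm([A1])
det = Pi[0][0] * Pi[1][1] - Pi[0][1] * Pi[1][0]   # zero determinant: rank(Pi) < 2
```

In this example Π = αβ′ with α = (−0.5, 0.25)′ and β = (1, −1)′, so the spread y_{t1} − y_{t2} is the stationary combination.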

The full-information maximum likelihood (FIML) method (Johansen, 1991; Hamilton, 1994) is employed for the estimation of (3.3.2). Details of the procedure are omitted, but a summary of the main idea is as follows. FIML assumes that u_t from (3.3.2) follows a multivariate Gaussian distribution, u_t ∼ N(0, Ω), where 0 is a k × 1 mean vector containing zeros, and Ω is a k × k covariance matrix. The rank of Π, r, is assumed known, i.e. the number of cointegrating relationships is pre-determined. The estimation idea is to form auxiliary regressions involving y_t on a constant, ∆y_{t−1}, …, and ∆y_{t−p+1}. The estimated k × k variance-covariance matrices, Σ, of the auxiliary regression residuals are then combined to form a matrix. The eigenvalues of the newly formed matrix, λ_i, are arranged in descending order. The r largest eigenvalues are then chosen and their associated eigenvectors, v_i, are combined to form an estimate of β. The estimated speed of adjustment, α, is subsequently retrieved as a function of β and a component of Σ. The other parameters µ, Γ_i, and Ω are retrieved similarly as functions of the components of Σ and the other parameter estimates.

Before the estimation step is performed, the testing step identifies the rank of Π. The hypothesis test employed in this study is a likelihood-ratio type test called the trace test (Johansen and Juselius, 1990). Again, the details are omitted, but the main points are briefly mentioned. The null and alternative hypotheses of interest are as follows:

H0 : rank(Π) = r (equivalently λr+1 = λr+2 = . . . = λk = 0)

H1 : rank(Π) > r.

The trace test tests whether the eigenvalues λi are significantly different from zero. Each λi always lies between 0 and 1, and the closer it is to 1, the stronger the cointegrating relationship. Since the hypothesis testing involves non-stationary series, the standard analysis does not apply, and the test statistic is not asymptotically chi-squared distributed. Johansen and Juselius (1990) tabulate the critical values needed for the test through Monte Carlo experimentation.
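The text omits the test statistic itself. As a minimal sketch, assuming the standard form of the trace statistic, −T Σ ln(1 − λi) summed over i = r + 1, . . . , k, the computation from a set of estimated eigenvalues could look as follows. The function name and arguments are illustrative and are not taken from the thesis; the statistic still has to be compared with tabulated critical values, which are not reproduced here.

```python
import math


def trace_statistic(eigenvalues, n_obs, r):
    """Trace statistic for H0: rank(Pi) = r.

    eigenvalues: estimated eigenvalues, each in [0, 1)
    n_obs:       effective sample size T (assumed known)
    r:           hypothesized cointegration rank
    """
    lams = sorted(eigenvalues, reverse=True)  # descending order, as in the text
    if not all(0.0 <= lam < 1.0 for lam in lams):
        raise ValueError("eigenvalues must lie in [0, 1)")
    # Only the k - r smallest eigenvalues enter the statistic.
    return -n_obs * sum(math.log(1.0 - lam) for lam in lams[r:])
```

If the k − r smallest eigenvalues are exactly zero, the statistic is zero; larger eigenvalues push it upward and make rejection of H0 more likely.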

3.3.2 Pairs trading and cointegration

A different representation of the ECM is presented here. It is known as the common trend representation (Juselius, 2006), and is easy to interpret when relating to the idea of cointegration and pairs trading. To derive it, write (3.3.1) using the lag polynomial as follows:

$$A(L) y_t = \mu + u_t, \qquad (3.3.4)$$

where A(L) is the lag polynomial. Assuming that A(L) contains a unit root, its determinant is |A(L)| = (1 − L)(1 − ρ2L) . . . (1 − ρkL). Since A Aadj = Aadj A = |A| Ik, where Aadj is the adjoint matrix of the lag polynomial A(L), multiply both sides of (3.3.4) by Aadj and rearrange to obtain

$$\begin{aligned}
|A(L)|\, y_t &= A^{adj}(L)(\mu + u_t), \\
(1 - L)(1 - \rho_2 L) \cdots (1 - \rho_k L)\, y_t &= A^{adj}(L)(\mu + u_t), \\
(1 - L)\, y_t &= \frac{A^{adj}(L)}{(1 - \rho_2 L) \cdots (1 - \rho_k L)} (\mu + u_t), \\
\Delta y_t &= C(L)(\mu + u_t),
\end{aligned}$$

where $C(L) := \frac{A^{adj}(L)}{(1 - \rho_2 L) \cdots (1 - \rho_k L)}$. Using Taylor's theorem and expanding around L = 1, C(L) = C(1) + C*(L)(1 − L), where C*(L) is the remainder lag polynomial, to obtain

$$\begin{aligned}
\Delta y_t &= [C(1) + C^*(L)(1 - L)](\mu + u_t), \\
y_t &= y_{t-1} + C(1)\mu + C(1) u_t + C^*(L)(u_t - u_{t-1}), \\
&\;\;\vdots \quad \text{(recursive substitution)} \\
y_t &= \underbrace{C(1) \sum_{i=1}^{t} u_i}_{\text{non-stationary component}} + \underbrace{C^*(L) u_t}_{\text{stationary component}} + \underbrace{y_0 - C^*(L) u_0 + C(1)\mu t}_{\text{initial values and deterministic trend}}, \qquad (3.3.5)
\end{aligned}$$

and (3.3.5) is the common trend representation of the same system, where yt is governed by a non-stationary stochastic trend component, a stationary component and deterministic components.

Consider the bivariate case where yt = (y1t, y2t)′ denotes a pair of stocks used for pairs trading. Suppose y1t and y2t are cointegrated with cointegrating vector (1, β)′; then y1t − βy2t is a stationary process by definition. By (3.3.5), write

$$\begin{aligned}
y_{1t} &= \underbrace{C(1) \sum_{i=1}^{t} u_{1i}}_{\text{non-stationary component}} + \text{stationary and deterministic components}, \\
\beta y_{2t} &= \underbrace{\beta C(1) \sum_{i=1}^{t} u_{2i}}_{\text{non-stationary component}} + \text{stationary and deterministic components}.
\end{aligned}$$

Then, it must be the case that $C(1)\sum_{i=1}^{t} u_{1i} = \beta C(1)\sum_{i=1}^{t} u_{2i}$, so that the non-stationary components cancel to make y1t − βy2t a stationary process. This can be interpreted as y1t and y2t sharing the same price-driving factors up to a constant scalar. A geometric interpretation is as follows. Let the price-driving factors be collected in the m × 1 vectors f1t and f2t, where m is the number of factors; then the inner product between the normalized versions of f1t and βf2t is $\frac{f_{1t}'\,\beta f_{2t}}{|f_{1t}|\,|\beta f_{2t}|} = \cos\theta$, where θ is the angle between the two vectors. If the stocks are cointegrated, it is expected that their price-driving vectors point in the same direction and are therefore parallel to each other. This gives θ = 0, so that the angle between f1t and βf2t is zero. For a more detailed explanation and a more formal treatment of these vectors of factors within a factor model, see Vidyamurthy (2004).

In reality, however, one cannot expect the relationship f1t = βf2t to hold with strict equality. A more plausible representation is f1t = βf2t + vt, where vt is some additive error component. One can then recognize this as an ordinary least squares (OLS) setup. By the OLS formula,

$$\beta = \frac{\operatorname{cov}(f_{1t}, f_{2t})}{\operatorname{var}(f_{2t})},$$

so the stronger the covariance (or equivalently the correlation) between f1t and f2t, the more likely y1t and y2t share the same price-driving factors up to a constant scalar, and consequently the stronger the cointegrating relationship.
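To make the OLS formula concrete, the following is a minimal sketch that computes the slope as the sample covariance divided by the sample variance. The function name and the toy inputs in the usage note are hypothetical; the thesis does not supply code.

```python
def ols_slope(f1, f2):
    """Slope of f1 on f2 computed as cov(f1, f2) / var(f2)."""
    n = len(f1)
    if n != len(f2) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean1 = sum(f1) / n
    mean2 = sum(f2) / n
    # The common 1/n (or 1/(n-1)) factor cancels in the ratio.
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(f1, f2))
    var = sum((b - mean2) ** 2 for b in f2)
    if var == 0:
        raise ValueError("f2 has zero variance")
    return cov / var
```

For instance, `ols_slope([3, 5, 7, 9], [1, 2, 3, 4])` recovers the slope of an exact linear relationship between the two toy series.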

3.4 Trading algorithm

3.4.1 Stock selection

The main algorithmic implementation concerns stock selection. To minimize the probability of selection error, the criterion for a pair to be selected for trading is tightened. A pair has to be identified as cointegrated by both the Engle-Granger cointegration test (Engle and Granger, 1987) and the Johansen rank test (Johansen, 1995). For these two cointegration tests, the number of augmented terms is fixed at three, although a more appropriate method is to search for the optimal number of augmented terms. The optimal number can be determined based on some criterion, e.g. the Schwarz Criterion or the Akaike Information Criterion. This is not implemented due to the long computational time. My main study considers 91 stocks, which means that a total of $\binom{91}{2} = 4095$ pairs have to be evaluated, and searching for the optimal lag length would be computationally cumbersome. If too few augmented terms are chosen, then the residuals are not whitened enough, leading to rejection of a potential cointegration relationship. The choice of three augmented terms for hypothesis testing is a good compromise in most cases.
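The size of the search space can be checked with a short sketch. The ticker labels below are placeholders, not the stocks used in the study, and the same-sector count assumes ten sectors with ten stocks each, as in the later restricted study.

```python
from itertools import combinations
from math import comb


def candidate_pairs(tickers):
    """Enumerate every unordered pair of tickers to be tested for cointegration."""
    return list(combinations(tickers, 2))


# Placeholder labels standing in for the 91 stocks of the main study.
tickers = [f"STOCK_{i}" for i in range(91)]
n_all_pairs = len(candidate_pairs(tickers))

# Same-sector restriction: ten sectors, ten stocks per sector (assumed).
n_same_sector_pairs = 10 * comb(10, 2)
```

Each candidate pair requires two cointegration tests per estimation period, which is why a lag-length search on top of this would be costly.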

The Augmented Dickey-Fuller (ADF) test is the usual choice for unit root testing in the Engle-Granger cointegration framework. It is, however, susceptible to size problems when the true process contains an MA component (Ng and Perron, 2001). Phillips and Perron (1988) design a modification of the ADF test, called the Phillips-Perron (PP) test, that addresses this issue through a non-parametric modification of the test statistic. Davidson and MacKinnon (2004) later show that the PP test often suffers from power problems in finite samples. Perron and Qu (2007) consider a further modification that improves the finite sample properties of the PP test, and their specification is used when performing unit root testing.

Once the pairs are selected, they are sorted in descending order according to their eigenvalues, λi, as explained in Section 3.3. The top 20 pairs are then selected for trading. The eigenvalues λi indicate the strength of the cointegration relationship: the higher, the better. If there are fewer than 20 pairs in a period, all of them are traded. It is possible that in a period no pairs pass both of the cointegration tests, leading to no trading during that period. For instance, in a later study, a restriction is introduced such that cointegrated pairs must be from the same industry, in which case the total number of potential cointegrated pairs drops to about $10\binom{10}{2} = 450$, which is a restrictive sample space.

One of the main problems when estimating cointegrating relationships is identifiability. It is well known that the cointegrating vector that forms the cointegrating relationship is not unique. When conducting a macroeconomic study, one can rely on theory to place a reasonable restriction on the cointegrating vector. For the construction of trading algorithms, a unique cointegrating vector is not as important, because any combination of cointegrating vectors that leads to a stationary spread is equally good for trading. However, for the sake of book-keeping and ease of computation, it is ideal to normalize with (1, β). With this choice of normalization, the coefficient on the first variable of a pair is normalized to one as follows:

$$y_t - \beta x_t, \qquad (3.4.1)$$

where yt is the first price series, and xt the second price series in a pair of stocks.

For my trading purposes, the cointegrating vector estimated in the Engle-Granger cointegration test is used since it always normalizes the first variable of a pair. Only cointegrated pairs with positive β are selected for trading. This is done to adhere to the long/short strategy proposed: the opening of one long (short) position on the first stock and β short (long) on the second stock. If β is negative, then by inspection of (3.4.1) the spread becomes yt + |β|xt, i.e. I would have to either long or short both stocks at the same time to create the spread for trading, which contradicts the long/short strategy proposed.
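The selection steps of this subsection can be sketched as a filter followed by a ranking. This is only an illustration: the dictionary keys are hypothetical, the outcomes of the two cointegration tests are assumed to be computed elsewhere, and applying the positive-β filter before the ranking is an assumption, since the text does not state the order.

```python
def select_pairs(results, max_pairs=20):
    """Pick up to max_pairs tradable pairs from per-pair test results.

    Each element of results is a dict with the (hypothetical) keys
    'pair', 'eg_pass', 'johansen_pass', 'eigenvalue' and 'beta'.
    """
    eligible = [
        r for r in results
        if r["eg_pass"] and r["johansen_pass"]  # must pass both tests
        and r["beta"] > 0                       # long/short strategy needs beta > 0
    ]
    # Rank by the Johansen eigenvalue, strongest relationship first.
    eligible.sort(key=lambda r: r["eigenvalue"], reverse=True)
    return eligible[:max_pairs]
```

When fewer than `max_pairs` pairs are eligible, the slice simply returns all of them, and an empty list corresponds to a period without trading.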

3.4.2 Trading rules

The first step of the trading algorithm is to estimate the cointegrating vector and the long-run mean. A trade position is opened when there is a 0.75σ deviation from the estimated mean (see Figure 3.4.1 for the reason behind this choice), where σ is the standard deviation of the spread. The wait-one-day rule common in the literature is employed to prevent the bid-ask bounce effect. If the spread stays within the 0.75σ to 1.50σ (−0.75σ to −1.50σ) bounds as measured from the estimated mean for two consecutive days, a short (long) position is opened.

The trade is closed with a profit when the spread reverts to the estimated mean, and the profit-taking rule is enforced strictly. When the spread hovers slightly above the estimated mean without touching or crossing it, the trade is not closed. It is closed only when the spread touches or crosses the estimated mean. In an unfavorable situation, i.e. a divergence instead of convergence of the spread, the trade is closed using a stop loss to prevent further loss. A stop loss is a level at which the trading position is closed when the trade moves unfavorably. The implementation of a stop loss in the study is realistic since it is often used in practice to minimize losses. Stop losses are also employed in the literature, for example by Caldeira and Guilherme (2013) and Lin et al. (2006). The stop loss is set at an arbitrary level of 1.50σ, twice the optimal threshold.
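The opening, closing and stop-loss rules above can be sketched as a small state machine. This is an illustrative reading of the rules rather than the code used in the study: it assumes positions are opened and closed at the observed spread on the signal day, ignores trading costs, and drops any position still open at the end of the sample. All names are hypothetical.

```python
def simulate_trades(spread, mean, sigma, entry=0.75, stop=1.50):
    """Return a list of (open_index, close_index, side, pnl) tuples."""
    trades = []
    position = 0      # -1: short the spread, +1: long the spread, 0: flat
    open_idx = None
    for t in range(1, len(spread)):
        dev_prev = spread[t - 1] - mean
        dev = spread[t] - mean
        if position == 0:
            # Wait-one-day rule: two consecutive days inside the entry band.
            if all(entry * sigma <= d < stop * sigma for d in (dev_prev, dev)):
                position, open_idx = -1, t
            elif all(-stop * sigma < d <= -entry * sigma for d in (dev_prev, dev)):
                position, open_idx = 1, t
        else:
            reverted = dev * position >= 0             # touched or crossed the mean
            stopped = -position * dev >= stop * sigma  # diverged to the stop loss
            if reverted or stopped:
                pnl = position * (spread[t] - spread[open_idx])
                side = "long" if position == 1 else "short"
                trades.append((open_idx, t, side, pnl))
                position, open_idx = 0, None
    return trades
```

A short position gains when the spread falls back to the mean, and a long position gains when it rises back; a stop-loss exit always realizes a loss under these assumptions.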

The trading rule employed above is to trade when there is a significant deviation from the equilibrium relationship, the estimated mean. A natural question is how to find the optimal (in terms of profit maximization) deviation from the mean before a trade order is placed. Following Vidyamurthy (2004), the question can be formulated as below.

[Figure 3.4.1: plot of the objective function ∆(1 − Φ(∆)) against ∆, for ∆ between 0 and 2.]

Figure 3.4.1: The objective function, ∆(1 − Φ(∆)), is used to determine the optimal threshold, where Φ(∆) is the cumulative distribution function of a Gaussian distribution. The graph shows the value of the objective function for a range of values of ∆. The solution to the maximization problem, $\arg\max_{\Delta} \Delta(1 - \Phi(\Delta)) = 0.75$, is marked by the red dotted line.

First, the spread is standardized to make it a unitless measure, and it is further assumed that the spread follows a Gaussian distribution, N(0, σ2). Consider a ∆ deviation above the zero mean. The probability of such an occurrence is P(X > ∆) = 1 − Φ(∆), where Φ(.) is the cumulative distribution function of a standard normal. The expected profit is ∆(1 − Φ(∆)). By the symmetry of the Gaussian distribution, an analogous argument holds for a ∆ deviation below the zero mean. Therefore, the total expected profit in T time steps is 2T∆(1 − Φ(∆)). The optimization problem is to maximize the expression 2T∆(1 − Φ(∆)) with respect to ∆. Since 2T is a scaling constant, the maximization problem can be rewritten as $\max_{\Delta} f(\Delta) = \Delta(1 - \Phi(\Delta))$. The first-order condition, 1 − Φ(∆) − ∆φ(∆) = 0 with φ the standard normal density, gives an optimal ∆ of approximately 0.75σ. The graph of the objective function for a range of values of ∆ is given in Figure 3.4.1.
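The maximizer can be checked numerically. The sketch below evaluates the objective on a fine grid for a standardized spread; the grid search and the function names are illustrative choices, not the method used in the thesis.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def objective(delta):
    """Expected profit per step for a threshold delta: delta * (1 - Phi(delta))."""
    return delta * (1.0 - std_normal_cdf(delta))


def optimal_threshold(lo=0.0, hi=2.0, steps=2000):
    """Grid search for the maximizer of the objective on [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=objective)
```

On the default grid, the maximizer lands close to 0.75 in units of the spread's standard deviation, in line with Figure 3.4.1.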

The sample is divided into two periods of interest: the estimation and trading periods. Both are arbitrarily set to three months. In the literature, the lengths of the estimation and trading periods are also set arbitrarily and vary between studies, e.g. in Caldeira and Guilherme (2013), the estimation period is one month and the trading period is four months. In fact, there is no optimal one-size-fits-all length for the estimation and trading periods. The optimal length varies on a case-by-case basis and is data dependent; see the investigation in Section 3.6.

During the estimation period, the stock selection algorithm as stated in Subsection 3.4.1 is used to

select stocks for pairs trading. After the stocks are selected, the trading rule stated in Subsection

3.4.2 is implemented and trading is executed for the next three months. At the end of the trading

period, the data for the past three months are used for re-estimation for the new trading period,

and trading commences immediately. The cycle continues in a rolling window fashion until all

data are used.
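The rolling scheme can be sketched with index windows. The window length in observations and the decision to drop an incomplete final trading window are assumptions made for illustration; the thesis specifies the windows only as three calendar months.

```python
def rolling_windows(n_obs, window):
    """Yield ((est_start, est_end), (trade_start, trade_end)) index pairs.

    Both windows have the same length, end indices are exclusive, and each
    trading window becomes the estimation window of the next cycle.
    """
    start = 0
    while start + 2 * window <= n_obs:
        estimation = (start, start + window)
        trading = (start + window, start + 2 * window)
        yield estimation, trading
        start += window
```

For example, `list(rolling_windows(10, 3))` produces two cycles, with the second estimation window coinciding with the first trading window.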

3.4.3 An example of a trade

Figure 3.4.2 illustrates an example of the use of the trading algorithm. The traded pair is ConocoPhillips (COP) and EOG Resources Inc (EOG); both are from the energy sector and are constituents of the S&P 500 listed on the New York Stock Exchange (NYSE). It is no coincidence that the pair is identified as cointegrated by both the Engle-Granger and Johansen cointegration tests, as the two prices tend to move almost synchronously, as depicted in Figure 3.4.2a. Once the pair is identified, their spread is estimated, and it is given by COPt − βEOGt, with estimated β = 0.11862 for this example, as labeled in Figure 3.4.2a. The sample standard deviation of the estimated spread, σεt, is used to form the upper/lower trading signals, ±0.75σεt, and the stop loss, ±1.50σεt. The estimated spread, its sample mean, µ, the trading signals and the stop loss are presented in Figure 3.4.2b. Notice that the estimated spread resembles a stationary white noise process, visually affirming the cointegrated relationship between the pair.

Once β, µ, the trade signal levels and the stop loss levels are identified, they are fixed throughout the entire trading period, and trading commences according to the rule stated in Subsection 3.4.2. Note that I do not de-mean and standardize the estimated spread for my trading exercise. The trading exercise for this example is illustrated in Figure 3.4.2c. The black crosses at the bottom of the graph indicate when trades are placed. For example, the first trade is placed on 10/08/12. The black cross moves upward, indicating that a short position is opened in anticipation of a downward reversal of the estimated spread toward the mean. Note that the wait-one-day rule is used here when opening the trade. The estimated spread first crosses the trading signal but is below the stop loss level on 10/07/12, and when it remains between the trade signal and stop loss levels for the next day, the trade is placed on the second day, that is on 10/08/12. The trade is closed two days later on 10/10/12

when the estimated spread crosses below the estimated mean. A profit is made due to the correct anticipation of the downward reversal. All other trades over the trading period are made similarly according to the pre-determined trading rules in Subsection 3.4.2.

[Figure 3.4.2, three panels. (a) Cointegrated pair selected: "Cointegrated stocks, COP vs. EOG", price of COP (left axis) and price of EOG (right axis) against time, 06/30/12 to 09/28/12, with β = 0.11862. (b) The estimated spread: "COP vs. EOG estimated spread" with trade signal and stop loss levels, 06/30/12 to 09/28/12. (c) The trade of the estimated spread: "COP vs. EOG trade" with trade signal and stop loss levels, 09/28/12 to 01/06/13.]

Figure 3.4.2: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is COP and EOG. The former (latter) price series is colored blue (red), and its price level is represented by the left-blue (right-red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, COPt − βEOGt. The value of β is shown on the upper left graph as β. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).

3.5 Main application

3.5.1 Data

The data for the study are downloaded from a Bloomberg Terminal. 100 stocks were downloaded for the main study, all of which were constituents of the S&P 500 at the time the data were obtained. For the classification of stocks, I adopt the Global Industry Classification Standard developed by S&P, which classifies stocks into ten different sectors. The ten sectors are energy, materials, industrials, consumer discretionary, consumer staples, health care, financials, information technology, telecommunication services, and utilities. The sector S&P 100 index is composed of the top 100 stocks from each sector ranked by market capitalization. For each sector, the top 10 companies from the sector S&P 100 index are selected, giving an equal representation of ten stocks from each sector with a total selection of 100 stocks.

The variables of interest that were downloaded are the closing prices and the bid and ask prices. Daily data are chosen, starting on 3rd Jan 2010 and ending on 13th Nov 2013, with a total of 1946 observations for the main study. Data for small-cap stocks are obtained similarly from the S&P 500 small-cap index and from the S&P small-cap indices of the sectors in question. The two remaining data sets, the Turkish and Brazilian stocks, are obtained directly from XU100 and Ibovespa100. Unfortunately, due to a lack of information on industry classifications, an equal representation of stocks from each sector is not sampled for XU100 and Ibovespa100.

The closing prices are already adjusted for dividends and stock splits when downloaded. Stocks with missing data are omitted from the study without replacement. Out of the 100 stocks downloaded from each index, the adjusted sample size is 91 stocks for the main study, 86 for the small-cap study, 62 for the Turkish study, and 46 for the Brazilian study. A stock list is provided in the appendix for the interested reader.


3.5.2 Return computation and trading cost

The common return computation is as follows:

$$r_{pt} = \frac{\sum_{i \in P} w_{it} r_{it}}{\sum_{i \in P} w_{it}}, \qquad (3.5.1)$$

where wit = wit−1(1 + rit−1), and wi1 = 1 is the initial value. There are two issues with the return computation above. If rit < 0 for several consecutive periods, then wit < 1, and similarly if rit > 0 for several consecutive periods, then wit > 1. This shows that this return computation formula underweights negative returns and overweights positive returns, such that the imbalance in weights leads to a bias towards positive returns. A more appropriate return formula should weight positive and negative returns equally. The other issue is that the formula relies on compounding returns because wit = (1 + ri1)(1 + ri2) . . . (1 + rit−1) (the identity is obtained after recursive substitution of terms). Achieving compounding returns requires the portfolio to be re-invested and re-balanced from the date the trading position is opened until it is closed. This is unrealistic, as trading costs will be hefty due to the frequent re-balancing.
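The weighting bias described above can be illustrated with a short sketch of (3.5.1). The function names and the toy numbers in the usage note are hypothetical.

```python
def compounded_weights(returns):
    """Weights w_1 = 1 and w_t = w_{t-1} * (1 + r_{t-1}) for one pair."""
    weights = [1.0]
    for r in returns[:-1]:
        weights.append(weights[-1] * (1.0 + r))
    return weights


def weighted_portfolio_return(weights_t, returns_t):
    """Portfolio return at time t as in (3.5.1), across the pairs in P."""
    return sum(w * r for w, r in zip(weights_t, returns_t)) / sum(weights_t)
```

Take two pairs with returns of +0.1 and −0.1 at time t. If the first pair has gained in the past (weight 1.1) and the second has lost (weight 0.9), `weighted_portfolio_return([1.1, 0.9], [0.1, -0.1])` is positive, whereas equal weights would give zero.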

The return computation (3.5.2) is designed to rely directly on the final profit and loss to avoid the bias. Additionally, to be realistic, the compounding return formula is not used. A divisor has to be chosen for the final profit and loss when computing the return; ideally it should be the size of the position undertaken. The divisor is chosen to be the mean of the absolute values of all opening-trade spreads¹, where the opening-trade spread is defined as the spread recorded when a long or short position for a pair is opened. My portfolio return rpt, during a trading period t, is defined as the sum of profits and losses for all pairs in the portfolio divided by the average of the opening spreads for all pairs as follows:

$$r_{pt} = \frac{\text{total profit/loss from all trades}}{\frac{1}{\text{Number of trades}} \sum_{k=1}^{\text{Number of trades}} \lVert \text{opening spread}_k \rVert} \qquad (3.5.2)$$

To be consistent with early papers on pairs trading, ordinary prices are used instead of log prices in my study. Ordinary prices are also more appropriate for my study since daily data are used, whereas the log approximation would be more suitable if high-frequency data were employed.

¹ I am only interested in the magnitude of the spread, hence the absolute value, since it is possible that the spread is negative when the short value is greater than the long value.
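A minimal sketch of (3.5.2) follows. The input layout, a list of (profit/loss, opening spread) tuples per trading period, is a hypothetical choice, and returning zero for a period without trades is an assumption.

```python
def portfolio_return(trades):
    """Period return as in (3.5.2).

    trades: list of (pnl, opening_spread) tuples for one trading period.
    """
    if not trades:
        return 0.0  # assumption: no trades means a zero return for the period
    total_pnl = sum(pnl for pnl, _ in trades)
    # Divisor: mean absolute opening spread, a proxy for the position size.
    mean_abs_open = sum(abs(spread) for _, spread in trades) / len(trades)
    if mean_abs_open == 0:
        raise ValueError("opening spreads are all zero")
    return total_pnl / mean_abs_open
```

For two trades with profits of 2.0 and −1.0 opened at spreads of 40.0 and −60.0, the total profit of 1.0 is divided by the mean absolute opening spread of 50.0.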


In terms of trading cost, I choose to use the median of the historical bid-ask spreads of a stock as a proxy for trading cost. The mean is not used because there are some outliers in the historical bid-ask spread data. According to Engelberg et al. (2009), there are various costs incurred when trading a stock, such as short-selling cost and transaction cost. Although a complete estimation of all the costs involved is usually difficult, it is assumed that all costs incurred are adequately conveyed in the bid-ask spread (Huang and Stoll, 1997). Therefore, the median of the historical bid-ask spread is sufficient as a proxy for the actual total trading costs involved. From the data, the bid-ask spread ranges from approximately $0.10 to $0.50 for stocks in the S&P 500, and the range is higher for stocks in the S&P 500 small-cap. Trading costs are incurred for a long position and a short position when a pair position is opened, and similarly when the pair position is closed. For a round-trip transaction, i.e. when a trade position is opened and then closed, there will be four costs involved. They are the two bid-ask spreads for the longed stock and the shorted stock, respectively, when the position is opened, and similarly the two bid-ask spreads when the position is closed.
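The round-trip cost described above can be sketched as four median bid-ask spreads. This illustration charges the cost per unit of each stock and does not scale the second leg by β; how the thesis sizes the cost of the second leg is not stated, so that choice is an assumption, as are the function and argument names.

```python
from statistics import median


def round_trip_cost(bid_ask_long, bid_ask_short):
    """Trading cost proxy for opening and then closing one pair position.

    bid_ask_long / bid_ask_short: historical bid-ask spreads of the longed
    and the shorted stock. The median is used to limit the effect of outliers.
    """
    cost_long = median(bid_ask_long)
    cost_short = median(bid_ask_short)
    # Two legs at the open plus two legs at the close: four costs in total.
    return 2 * (cost_long + cost_short)
```

Because the median is used, a single extreme observation in the bid-ask history leaves the cost proxy unchanged.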

Since pairs trading relies on short selling, it is natural to ask if short-selling constraints will decrease pairs trading profitability. D'Avolio (2002) shows that there are implicit costs called specials (proxied by the daily loan fee) when short selling illiquid stocks. D'Avolio (2002) further shows that short recalls can occur when there is a divergence of opinion between borrowers and short sellers, depriving arbitrageurs of their profits. If pairs trading profitability heavily depends on trading illiquid stocks, then the profit is expected to decrease due to the implicit costs. GGR study the robustness of pairs trading profitability by trading only stocks that do not suffer from liquidity problems and find that the change in profitability is negligible. Since the stocks considered in my study are all listed on the main stock exchanges of their respective countries, liquidity will not pose a problem.

3.5.3 Results

The trading algorithm is implemented on S&P 500 stocks, and the study runs from January 2010 to November 2013 with each trading period lasting three months. The results are presented in Table 3.1. The trading profits and returns are very volatile, but the total profits and returns are positive. Restricting each cointegrated pair of stocks to be from the same sector leads to a sharp increase in profit, and the return is also magnified greatly because fewer stocks are being picked and traded.

| Period | Unrestricted: net profit/loss | Unrestricted: total return | Restricted: net profit/loss | Restricted: total return |
|---|---|---|---|---|
| 1 | -15.900 | -0.017 | -0.074 | -0.001 |
| 2 | 67.135 | 0.154 | 51.512 | 0.605 |
| 3 | -6.026 | -0.015 | -0.473 | -0.007 |
| 4 | 11.558 | 0.024 | 0.006 | 0.000 |
| 5 | -10.735 | -0.011 | -2.670 | -0.132 |
| 6 | -0.808 | -0.002 | -1.086 | -0.072 |
| 7 | 0.276 | 0.002 | 0.075 | 0.005 |
| 8 | -11.628 | -0.039 | -8.201 | -0.079 |
| 9 | -4.491 | -0.010 | 17.571 | 0.200 |
| 10 | 17.183 | 0.032 | -7.930 | -0.090 |
| 11 | 1.763 | 0.002 | 5.538 | 0.126 |
| 12 | -36.472 | -0.022 | -0.389 | -0.014 |
| 13 | -5.485 | -0.010 | -0.510 | -0.003 |
| 14 | 2.499 | 0.005 | 2.667 | 0.061 |
| 15 | -6.020 | -0.018 | -1.216 | -0.157 |
| Total | 2.850 | 0.073 | 54.819 | 0.442 |

Table 3.1: S&P 500 trading profits/losses and returns (accounting for transaction costs) from January 2010 to November 2013 divided into fifteen trading periods. Two cases are reported. The restricted case restricts trading to pairs of stocks from the same sector, and the unrestricted case does not impose this restriction.

The procedure is repeated in the small-cap market to check if pairs trading can consistently be profitable, and the findings are recorded in Table 3.2. As seen in Table 3.2, the trading profits and returns are very volatile, and the total profits and returns are negative. In contrast to the previous case, when cointegrated pairs of stocks are restricted to be from the same sector, the profit and return worsen considerably. This shows that restricting cointegrated pairs of stocks to be from the same sector does not guarantee an increase in profit and return. The pair trading algorithm incurs a significant loss and a large negative return when implemented in equity markets outside of the US, as seen in Table 3.3. There are eight and seven consecutive periods of losses when the pair trading algorithm is implemented on XU100 and Ibovespa100, respectively.

| Period | Unrestricted: net profit/loss | Unrestricted: total return | Restricted: net profit/loss | Restricted: total return |
|---|---|---|---|---|
| 1 | 12.315 | 0.049 | -1.360 | -0.051 |
| 2 | -2.595 | -0.008 | -1.628 | -0.020 |
| 3 | 0.476 | 0.002 | -1.997 | -0.056 |
| 4 | -2.114 | -0.005 | 3.834 | 0.078 |
| 5 | -1.034 | -0.005 | -1.245 | -0.052 |
| 6 | -21.415 | -0.059 | -0.582 | -0.539 |
| 7 | 3.068 | 0.009 | -0.955 | -0.019 |
| 8 | -12.970 | -0.021 | -0.797 | -0.008 |
| 9 | -16.489 | -0.075 | 2.381 | 0.057 |
| 10 | 57.473 | 0.053 | -1.422 | -0.036 |
| 11 | 5.230 | 0.009 | 1.070 | 0.014 |
| 12 | -9.044 | -0.028 | -2.708 | -0.178 |
| 13 | -10.507 | -0.018 | -1.577 | -0.027 |
| 14 | -8.417 | -0.020 | -0.656 | -0.027 |
| 15 | 4.318 | 0.017 | -1.941 | -0.196 |
| Total | -1.706 | -0.099 | -9.583 | -1.061 |

Table 3.2: S&P small-cap 500 trading profits/losses and returns (accounting for transaction costs) from January 2010 to November 2013 divided into fifteen trading periods. Two cases are reported. The restricted case restricts trading to pairs of stocks from the same sector, and the unrestricted case does not impose this restriction.

| Period | XU100 (Turkey): net profit/loss | XU100 (Turkey): total return | Ibovespa100 (Brazil): net profit/loss | Ibovespa100 (Brazil): total return |
|---|---|---|---|---|
| 1 | -0.859 | -0.020 | -11.834 | -0.058 |
| 2 | -4.282 | -0.063 | -2.670 | -0.120 |
| 3 | -3.723 | -0.053 | -14.720 | -0.142 |
| 4 | -2.618 | -0.037 | -7.758 | -0.065 |
| 5 | -0.903 | -0.020 | -5.327 | -0.130 |
| 6 | -16.624 | -0.063 | -6.652 | -0.035 |
| 7 | 5.661 | 0.064 | -8.197 | -0.086 |
| 8 | -8.159 | -0.134 | 0.269 | 0.006 |
| 9 | -11.153 | -0.101 | -5.712 | -0.178 |
| 10 | -5.250 | -0.124 | -10.752 | -0.106 |
| 11 | -0.564 | -0.029 | 0.000 | 0.000 |
| 12 | -11.930 | -0.239 | -7.868 | -0.049 |
| 13 | -11.255 | -0.341 | 1.271 | 0.013 |
| 14 | -11.658 | -0.865 | -3.509 | -0.050 |
| 15 | -2.282 | -0.045 | -3.696 | -0.025 |
| Total | -85.599 | -2.070 | -87.154 | -1.025 |

Table 3.3: XU100 and Ibovespa100 trading profits/losses and returns (accounting for transaction costs) from January 2010 to November 2013 divided into fifteen trading periods. Only the unrestricted case is considered for both stock indices.

I present a trading example from each of the indices considered in this study. The first example is from the S&P 500, using the cointegrated pair of stocks IBM and MSI, as shown in Figure 3.5.1. Both stocks are from the technology sector. The cointegrated relationship is still intact during the trading period as shown in Figure 3.5.1c. The first trade is conducted just after 05/02/10 and is closed with a significant profit two days later due to the large downward reversal of the estimated spread. A second trade is conducted on 06/03/10 and is closed a day later due to a quick downward reversal of the estimated spread. The third trade is conducted in a similar fashion on 06/09/10, but this time with an upward reversal toward the mean. The final trade is on 06/23/10, and closes two days later with a profit. All four trades conducted over the trading period are profitable.

The second example is from the S&P 500 small-cap as shown in Figure 3.5.2, this example is


different from the previous example because, as Figure 3.5.2c shows, the cointegration relationship is intact for about three quarters of the trading period before it breaks down. The first trade on 10/16/12 is not profitable because a loss is incurred when the estimated spread crosses the stop loss level. However, subsequent trades are profitable when the reversal pattern of the estimated spread takes place.

The third example is from the XU100 index and is depicted in Figure 3.5.3. The cointegrating relationship is intact during the trading period, as shown in Figure 3.5.3c. The trade positions, however, are held for a longer period before closing for a profit. For example, the second trade on 01/18/12 is held for seven days before it closes when the estimated spread reverts to the estimated mean. All three trades from this example are profitable.

The fourth and final example is from the Ibovespa100 index and is illustrated in Figure 3.5.4. Compared to the previous examples, the cointegrating relationship is visually not as strong, as seen in Figure 3.5.4c. In fact, the first and second trades on 7/12/12 and 8/29/12, respectively, incur losses because the trade positions are closed when the estimated spread crosses the stop loss level. The third and final trade of this example, on 9/26/12, is however profitable due to the downward reversal of the estimated spread towards the estimated mean.
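The trading rule behind these four examples (open a position when the spread reaches the trade signal level, close it at the estimated mean or at the stop loss level) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_spread_trades` and the multipliers `k_open` and `k_stop` are hypothetical and are not the thresholds used in the study, transaction costs are ignored, and a position still open at the end of the trading period is simply dropped.

```python
from statistics import mean, stdev


def simulate_spread_trades(est_spread, trade_spread, k_open=1.0, k_stop=2.0):
    """Apply a threshold rule to a spread and return the closed trades.

    Hypothetical parameters (not taken from the study):
      k_open : trade signal distance from the mean, in standard deviations
      k_stop : stop loss distance from the mean, in standard deviations
    Each trade is (entry_index, exit_index, side, pnl, exit_reason).
    """
    mu = mean(est_spread)   # estimated mean carried over from the estimation period
    sd = stdev(est_spread)
    upper_open, lower_open = mu + k_open * sd, mu - k_open * sd
    upper_stop, lower_stop = mu + k_stop * sd, mu - k_stop * sd

    trades = []
    position = 0            # +1 long the spread, -1 short the spread, 0 flat
    entry_idx, entry_val = None, None
    for t, s in enumerate(trade_spread):
        if position == 0:
            # Open only while the spread lies between trade signal and stop loss.
            if upper_open <= s < upper_stop:
                position, entry_idx, entry_val = -1, t, s   # spread high: short it
            elif lower_stop < s <= lower_open:
                position, entry_idx, entry_val = 1, t, s    # spread low: long it
            continue
        reverted = s <= mu if position == -1 else s >= mu
        stopped = s >= upper_stop if position == -1 else s <= lower_stop
        if reverted or stopped:
            side = "short" if position == -1 else "long"
            pnl = position * (s - entry_val)   # gross P&L per unit of spread, no costs
            trades.append((entry_idx, t, side, pnl, "stop" if stopped else "mean"))
            position = 0
    return trades


# Made-up numbers, only to exercise the rule.
estimation = [9, 10, 11, 10, 9, 10, 11, 10]
trading = [10.0, 11.0, 10.5, 9.9, 9.0, 8.0, 10.0]
for trade in simulate_spread_trades(estimation, trading):
    print(trade)
```

In the made-up series, the first position is a short that closes at the mean, and the second is a long that is closed by the stop loss, mirroring the two outcomes described in the examples above.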

3.5.4 Sub-prime crisis and risk factors regression

The period around the onset of the sub-prime financial crisis, from the end of 2007 to the start of 2008, marked a particularly challenging era for quantitative hedge funds (Khandani and Lo, 2011), especially those that practiced statistical arbitrage strategies. Do and Faff (2010), however, report that pairs trading performs well during periods of financial distress. To investigate their claim, I conduct a second trading study running from January 2006 to December 2009, which overlaps the sub-prime crisis.

As shown in Table 3.4, there is no evidence that the sub-prime crisis contributes to an increase in profit. Contrary to the first study, restricting cointegrated pairs of stocks to come from the same sector actually decreases profit. Similarly, when the strategy is implemented on small-cap stocks, losses are incurred, as shown in Table 3.5. In fact, losses are larger compared to the previous scenario: restricting cointegrated pairs of stocks to the same sector leads to a moderate loss, whereas without the restriction the loss escalates dramatically. Similar to previous examples, as

[Figure 3.5.1 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, IBM vs. MSI", price of IBM (left axis) and price of MSI (right axis) against time, 01/02/10 to 04/02/10, with $\hat{\beta} = 1.199725$. (b) The estimated spread: "IBM vs. MSI estimated spread" against time, 01/02/10 to 04/02/10, with trade signal and stop loss levels. (c) The trade of the estimated spread: "IBM vs. MSI trade" against time, 03/23/10 to 07/01/10, with trade signal and stop loss levels.]

Figure 3.5.1: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is IBM and MSI. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{IBM}_t - \hat{\beta}\,\mathrm{MSI}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).

[Figure 3.5.2 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, PRA vs. LHO", price of PRA (left axis) and price of LHO (right axis) against time, 06/30/12 to 09/28/12, with $\hat{\beta} = 0.2118724$. (b) The estimated spread: "PRA vs. LHO estimated spread" against time, 06/30/12 to 09/28/12, with trade signal and stop loss levels. (c) The trade of the estimated spread: "PRA vs. LHO trade" against time, 09/28/12 to 01/06/13, with trade signal and stop loss levels.]

Figure 3.5.2: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is PRA and LHO. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{PRA}_t - \hat{\beta}\,\mathrm{LHO}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).

[Figure 3.5.3 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, GARAN vs. TSKB", price of GARAN (left axis) and price of TSKB (right axis) against time, 09/24/11 to 01/02/12, with $\hat{\beta} \approx 3.0536$. (b) The estimated spread: "GARAN vs. TSKB estimated spread" against time, 09/24/11 to 01/02/12, with trade signal and stop loss levels. (c) The trade of the estimated spread: "GARAN vs. TSKB trade" against time, 01/02/12 to 04/01/12, with trade signal and stop loss levels.]

Figure 3.5.3: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is GARAN and TSKB. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{GARAN}_t - \hat{\beta}\,\mathrm{TSKB}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).

[Figure 3.5.4 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, BBAS3 vs. PETR3", price of BBAS3 (left axis) and price of PETR3 (right axis) against time, 04/01/12 to 06/30/12, with $\hat{\beta} = 1.281318$. (b) The estimated spread: "BBAS3 vs. PETR3 estimated spread" against time, 04/01/12 to 06/30/12, with trade signal and stop loss levels. (c) The trade of the estimated spread: "BBAS3 vs. PETR3 trade" against time, 06/30/12 to 09/28/12, with trade signal and stop loss levels.]

Figure 3.5.4: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is BBAS3 and PETR3. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{BBAS3}_t - \hat{\beta}\,\mathrm{PETR3}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).


shown in Table 3.6, losses are incurred when the trading algorithm is implemented outside of the

US equity market. The returns are more volatile and the losses are greater than those of the

previous scenario.

To explore the risk exposures of the pairs trading returns to other factors, the returns are regressed on the Fama-French three factors (Fama and French, 1993), augmented with two additional factors: momentum and short-term reversal. The first factor, momentum, is in line with that of Jegadeesh and Titman (1993) and accounts for return continuation. Since pairs trading is a contrarian strategy that does not rely on momentum trading, i.e. buying (selling) stocks when they are on the rise (fall), the returns are expected to be negatively correlated with the momentum factor. Jegadeesh (1990) shows that using reversal strategies to select stocks based on past data yields positive abnormal returns. Since pairs trading relies on the reversal of the estimated spread to the mean to make a profit, the returns are expected to be positively correlated with the reversal factor.

The above regression is performed on the S&P 500 trading returns² for two different periods, non-sub-prime and sub-prime, and for two different specifications, unrestricted and restricted, where the latter restricts cointegrated pairs to come from the same sector. Due to the small sample size for returns, residual bootstrap sampling is employed to obtain the regression statistics. The regression results are presented in Table 3.7. The expected signs for the momentum and short-term reversal factors are negative and positive, respectively, in all four panels. The regression results for the other factors are consistent with those in the literature: the exposure of pairs trading excess returns to the market excess return is small, which is expected since pairs trading is essentially a market-neutral strategy. Exposures to the other two Fama-French factors, the difference between small and big stocks (SMB) and the difference between value and growth stocks (HML), are significant for half of the panels, and their signs are mixed.
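A residual bootstrap of the kind mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for Table 3.7: the function names `ols` and `residual_bootstrap_se`, the number of replications `n_boot`, and the seed are all assumptions, and the text does not state the exact bootstrap scheme. The idea is to fit the regression once, resample the fitted residuals with replacement, rebuild pseudo-returns from the fitted values, and re-estimate the coefficients on each pseudo-sample.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations.

    X is a list of rows; each row must already contain the constant term.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def residual_bootstrap_se(X, y, n_boot=500, seed=0):
    """Return (coefficients, bootstrap standard errors).

    n_boot and seed are illustrative choices, not values from the study.
    """
    rng = random.Random(seed)
    beta = ols(X, y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    draws = []
    for _ in range(n_boot):
        # Pseudo-sample: fitted values plus residuals drawn with replacement.
        y_star = [fi + rng.choice(resid) for fi in fitted]
        draws.append(ols(X, y_star))
    se = []
    for j in range(len(beta)):
        col = [d[j] for d in draws]
        m = sum(col) / n_boot
        se.append((sum((c - m) ** 2 for c in col) / (n_boot - 1)) ** 0.5)
    return beta, se


# Synthetic illustration with fifteen observations and two made-up factors.
rng = random.Random(1)
factors = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(15)]
returns = [0.1 - 0.5 * f[1] + 0.3 * f[2] + rng.gauss(0, 0.2) for f in factors]
coef, se = residual_bootstrap_se(factors, returns)
print(coef, se)
```

With only fifteen trading periods per regression, resampling residuals keeps the factor values fixed and avoids relying on large-sample approximations for the standard errors.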

3.6 Post-study analysis

Despite having good in-sample fit, the out-of-sample spreads are quite erratic. It is often the case that the trading period is too long to be profitable. On average, the cointegration relationship

² I choose to regress only the returns from trading S&P 500 stocks, since they have a well-balanced mix of positive and negative returns, unlike the other markets.

         Unrestricted                     Restricted
Period   Net profit/loss   Total return   Net profit/loss   Total return
1                  7.509          0.010             4.462          0.033
2                  0.450          0.001             0.000          0.000
3                  3.897          0.011            -0.455         -0.041
4                -33.471         -0.063             3.979          0.054
5                 -3.912         -0.007             1.379          0.016
6                 10.949          0.026            -5.165         -0.107
7                -12.764         -0.025            -3.429         -0.038
8                  2.420          0.001            16.238          0.009
9                123.131          0.097            -2.118         -0.040
10                 8.093          0.012             0.916          0.090
11                -6.177         -0.012             0.000          0.000
12                35.357          0.067             4.083          0.018
13               -28.574         -0.089            -6.430         -0.037
14                10.856          0.021             0.000          0.000
15               -10.804         -0.014           -12.729         -0.110
Total            106.960          0.035             0.731         -0.154

Table 3.4: S&P 500 trading profits/losses and returns (after accounting for transaction costs) from January 2006 to December 2009 (sub-prime crisis), divided into fifteen trading periods. Two cases are reported. The restricted case restricts trading to pairs of stocks that come from the same sector, and the unrestricted case does not impose this restriction.

         Unrestricted                     Restricted
Period   Net profit/loss   Total return   Net profit/loss   Total return
1                -16.552         -0.054             4.525          0.070
2                 -9.171         -0.042            -2.758         -0.020
3                 -7.811         -0.032             0.787          0.032
4                -16.291         -0.060             5.368          0.070
5                  3.365          0.006            -2.705         -0.120
6                -17.102         -0.037            -0.792         -0.021
7                -15.276         -0.045            -2.164         -0.020
8                  2.064          0.012            -6.088         -0.086
9                  0.606          0.002            -6.320         -0.045
10               -11.115         -0.054            -4.891         -0.302
11                -5.843         -0.034             4.152          0.109
12                -8.689         -0.039             8.629          0.033
13               -15.226         -0.069            -2.601         -0.048
14                -1.162         -0.006             1.739          0.090
15                -2.111         -0.010            -3.622         -0.124
Total           -120.314         -0.463            -6.741         -0.383

Table 3.5: S&P small-cap 500 trading profits/losses and returns (after accounting for transaction costs) from January 2006 to December 2009 (sub-prime crisis), divided into fifteen trading periods. Two cases are reported. The restricted case restricts trading to pairs of stocks that come from the same sector, and the unrestricted case does not impose this restriction.

         XU100 (Turkey)                   Ibovespa100 (Brazil)
Period   Net profit/loss   Total return   Net profit/loss   Total return
1                -10.737         -0.112           -28.278         -0.139
2                -11.993         -0.367             3.700          0.056
3                 -2.466         -0.092            -0.948         -0.049
4                 -7.299         -0.302           -23.273         -0.104
5                -16.312         -0.082            -3.755         -0.026
6                -10.753         -0.264           -12.160         -0.277
7                 -2.962         -0.087            -7.988         -0.062
8                 -3.108         -0.109            -3.505         -0.020
9                 -7.033         -0.099            -0.269         -0.003
10                -3.728         -0.179            -3.432         -0.267
11                -1.241         -0.150            -4.854         -0.090
12                -3.377         -0.365           -20.312         -0.114
13                -1.080         -0.109             0.800          0.005
14               -13.613         -0.117           -20.192         -0.115
15                -7.700         -0.386            -4.657         -0.042
Total           -103.403         -2.821          -129.123         -1.248

Table 3.6: XU100 and Ibovespa100 trading profits/losses and returns (after accounting for transaction costs) from January 2006 to December 2009 (sub-prime crisis), divided into fifteen trading periods. Only the unrestricted case is considered for both stock indices.

becomes obsolete two weeks after estimation. This results in a period of minimal activity where few or no trades are executed, as shown in Figure 3.6.1c. Even worse, there are cases where trades are triggered by false signals, which ultimately lead to losses, as seen in Figure 3.6.2c, where the spread oscillates between the trade signal and stop loss levels, triggering many stop-loss calls. This suggests that re-estimation is needed more often than the three-month period set by the study. To investigate whether this is the cause of the negative trading profits and returns in the other markets, the trading algorithm is re-implemented with the trading period reduced to two weeks. The focus is on two strongly cointegrated pairs in the Brazilian market: ELET3 vs. ELET6 and VALE3 vs.

[Figure 3.6.1 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, ENBR3 vs. VIVT4", price of ENBR3 (left axis) and price of VIVT4 (right axis) against time, 04/01/12 to 06/30/12, with $\hat{\beta} = 0.1250945$. (b) The estimated spread: "ENBR3 vs. VIVT4 estimated spread" against time, 04/01/12 to 06/30/12, with trade signal and stop loss levels. (c) The trade of the estimated spread: "ENBR3 vs. VIVT4 trade" against time, 06/30/12 to 09/28/12, with trade signal and stop loss levels.]

Figure 3.6.1: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is ENBR3 and VIVT4. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{ENBR3}_t - \hat{\beta}\,\mathrm{VIVT4}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).

[Figure 3.6.2 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, ALLL3 vs. ELET3", price of ALLL3 (left axis) and price of ELET3 (right axis) against time, 06/26/11 to 10/04/11, with $\hat{\beta} = 1.048615$. (b) The estimated spread: "ALLL3 vs. ELET3 estimated spread" against time, 06/26/11 to 10/04/11, with trade signal and stop loss levels. (c) The trade of the estimated spread: "ALLL3 vs. ELET3 trade" against time, 09/24/11 to 01/02/12, with trade signal and stop loss levels.]

Figure 3.6.2: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is ALLL3 and ELET3. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{ALLL3}_t - \hat{\beta}\,\mathrm{ELET3}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).

            Non-sub-prime                Sub-prime
              Unrestricted    Restricted  Unrestricted    Restricted
Intercept            0.289        -1.705         0.951         3.200
                   (0.386)       (1.198)       (2.396)       (6.123)
Market              -0.025        -0.013        -0.137        -0.003
                   (0.006)       (0.012)       (0.040)       (0.061)
SMB                 -0.062        -0.047        -0.066        -0.523
                   (0.050)       (0.087)       (0.307)       (0.446)
HML                 -0.027         0.077         0.161        -0.008
                   (0.081)       (0.041)       (0.506)       (0.208)
Momentum             0.017        -0.003        -0.573        -0.010
                   (0.052)       (0.002)       (0.323)       (0.011)
Reversal             0.139         0.010         0.376         0.030
                   (0.051)       (0.011)       (0.310)       (0.057)

Table 3.7: The table reports the regression of S&P 500 returns on five different risk factors, with standard errors reported in parentheses. The regression is performed on two different time periods: non-sub-prime and sub-prime. For each time period, two cases are considered. The first case (restricted) restricts pairs of stocks to be from the same sector, and for the second case (unrestricted), the pairs of stocks are not restricted to be from the same sector.

VALE5.³ The cointegrating relationship is still intact after two weeks with the same parameters from the estimation period, but an arbitrage profit is still hard to capture with the current trading rule because trading signals are not always triggered, as seen in Figure 3.6.3c, even when the out-of-sample spreads are better behaved. Furthermore, there are occasional false trading signals like the one shown in Figure 3.6.4c: trades are opened when the estimated spread lies between the trade signal and stop loss levels, but the spread never crosses the estimated mean for a profit, and the trades have to be closed under the stop loss rule. After accounting for trading costs, a minor loss is, in fact, incurred, even for these two strongly cointegrated pairs.

³ Each of these two pairs consists of the common and preference stocks of the same parent company. They share the same common stochastic trend, and therefore they should exhibit a strong cointegrating relationship.

[Figure 3.6.3 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, ELET3 vs. ELET6", price of ELET3 (left axis) and price of ELET6 (right axis) against time, 06/16/09 to 10/14/09, with $\hat{\beta} = 1.251224$. (b) The estimated spread: "ELET3 vs. ELET6 estimated spread" against time, 06/16/09 to 10/14/09, with trade signal and stop loss levels. (c) The trade of the estimated spread: "ELET3 vs. ELET6 trade" against time, 10/14/09 to 10/28/09, with trade signal and stop loss levels.]

Figure 3.6.3: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is ELET3 and ELET6. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{ELET3}_t - \hat{\beta}\,\mathrm{ELET6}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).

[Figure 3.6.4 (three panels). (a) The cointegrated pair selected: "Cointegrated stocks, VALE3 vs. VALE5", price of VALE3 (left axis) and price of VALE5 (right axis) against time, 12/17/12 to 05/06/13, with $\hat{\beta} \approx 0.9883$. (b) The estimated spread: "VALE3 vs. VALE5 estimated spread" against time, 12/17/12 to 05/06/13, with trade signal and stop loss levels. (c) The trade of the estimated spread: "VALE3 vs. VALE5 trade" against time, 04/24/13 to 05/10/13, with trade signal and stop loss levels.]

Figure 3.6.4: The upper left graph illustrates the daily price series of a pair of stocks identified by the trading algorithm. In this case, the pair considered is VALE3 and VALE5. The former (latter) price series is colored blue (red), and its price level is represented by the left, blue (right, red) vertical y-axis. The upper right graph illustrates the estimated spread of the cointegrated pair, $\mathrm{VALE3}_t - \hat{\beta}\,\mathrm{VALE5}_t$. The value of $\hat{\beta}$ is shown on the upper left graph. The light green and red lines mark the trade signal and stop loss levels, respectively. The sample mean of the estimated spread is represented by the purple crosses. The bottom graph presents the estimated spread over a trading period. The black crosses at the bottom of the graph indicate when trades are placed. When the black cross shifts upward (downward), a short (long) position is opened for the estimated spread. When the black cross moves upward or downward, it remains there until the trade is closed, and then it shifts back to its original level. The trade is conducted using the estimated mean, trade signal and stop loss levels carried over from the upper right graph. In all three graphs, the horizontal x-axis labels the time over the period concerned in mm/dd/yy format (m: month; d: day; y: year).


The obvious question to consider now is how to find the optimal re-estimation and trading periods. An investigation is conducted by considering different combinations of re-estimation and trading periods and the profit levels they produce. The results are illustrated in three-dimensional plots for the pairs ELET3 vs. ELET6 and VALE3 vs. VALE5 in Figures 3.6.5 and 3.6.6, respectively. The optimal re-estimation and trading periods for the ELET pair are 67 and 18 days, respectively, with a net trading profit of $0.82, whereas for the VALE pair they are 68 and 15 days, respectively, with a small net trading profit of $0.10. In general, the optimal re-estimation and trading periods are different for every pair, and one-size-fits-all re-estimation and trading periods like those considered in the study will not work well. Furthermore, across the different combinations of re-estimation and trading periods, there are more losing trades than winning trades, suggesting that it is easier to lose money than to make money using a simple pairs trading algorithm in the Brazilian market, even with strongly cointegrated pairs. The optimal re-estimation and trading periods can be determined by backtesting on the data after a trading period ends. In reality, however, they are unknown when the trading period commences. They are nonetheless vital for a profitable trading period since, as the previous examples show, one is more likely to end up with a losing combination of periods than a winning one. This makes profitable pairs trading even harder.
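The search over combinations of re-estimation and trading periods described above amounts to a grid search over a profit surface. The sketch below shows the mechanics only. The helper `grid_search_periods` and the `toy_profit` function are hypothetical: `toy_profit` is a synthetic stand-in for a full backtest of one pair and does not reproduce any result from this study.

```python
from itertools import product


def grid_search_periods(profit_fn, est_periods, trade_periods):
    """Evaluate profit_fn(est_days, trade_days) on every combination.

    Returns the best combination, its profit, the number of losing
    combinations, and the full profit surface as a dict.
    """
    surface = {(e, t): profit_fn(e, t)
               for e, t in product(est_periods, trade_periods)}
    best = max(surface, key=surface.get)
    n_losing = sum(1 for v in surface.values() if v < 0)
    return best, surface[best], n_losing, surface


def toy_profit(est_days, trade_days):
    """Synthetic placeholder surface with a single peak (NOT study output).

    In practice this would re-estimate the cointegrating relationship on
    est_days of data and backtest the trading rule over trade_days.
    """
    return 1.0 - 0.1 * abs(est_days - 66) - 0.2 * abs(trade_days - 12)


best, value, n_losing, surface = grid_search_periods(
    toy_profit, range(60, 76), range(5, 21))
print(best, value, n_losing, len(surface))
```

Because the surface is computed on data that are only available after the trading period ends, the maximizing combination is an ex post benchmark rather than a rule that could have been applied in real time, which is the point made in the paragraph above.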

[Figure 3.6.5: three-dimensional surface plot with axes "Trading period", "Reestimation period" and "Profit".]

Figure 3.6.5: The profit level is represented on the vertical axis over different trading and re-estimation periods on the horizontal axes for ELET3 vs. ELET6.

[Figure 3.6.6: three-dimensional surface plot with axes "Trading period", "Reestimation period" and "Profit".]

Figure 3.6.6: The profit level is represented on the vertical axis over different trading and re-estimation periods on the horizontal axes for VALE3 vs. VALE5.

3.7 Conclusion

A simple pairs trading strategy using the cointegration technique for pairs selection is implemented in this study. Although the strategy is found to be profitable when implemented on large-cap stocks in the US equity market, the profit vanishes and losses are incurred when I replicate the strategy on small-cap stocks in the US equity market and on stocks from foreign markets. The results from my study contrast with those of Bolgün et al. (2010) and Caldeira and Guilherme (2013), who report positive profits for the pairs trading strategy in the Turkish and Brazilian markets, respectively. Although a direct comparison cannot be made due to different trading periods and trading rules, my empirical results and further analysis show that it is hard for a simple pairs trading strategy to be consistently profitable across different equity markets.

The results from my paper contradict the findings by Do and Faff (2010) that pairs trading strategies perform strongly during periods of prolonged turbulence. When the trading strategy is implemented during the sub-prime crisis, it is found that returns are inferior compared to the period without a financial crisis. I analyze the risk exposures of pairs trading returns by means of a regression on the Fama-French three factors, augmented by two additional factors that control for momentum and mean reversion. I find the regression results to be consistent with those reported in the literature, and the parameters have the expected signs.
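As an illustration of this kind of risk-exposure regression, the sketch below fits an intercept and five factor loadings by ordinary least squares. It is a minimal example under stated assumptions: the factor labels, the simulated factor returns and the "true" coefficients are invented for the demonstration and are not the estimates or data of this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
names = ["MKT", "SMB", "HML", "MOM", "REV"]      # hypothetical factor labels
factors = rng.normal(scale=0.01, size=(n, 5))    # synthetic factor returns

# Synthetic strategy returns generated from known coefficients, so that
# the regression output can be checked against them.
true_alpha = 0.0005
true_beta = np.array([0.10, -0.05, 0.02, -0.20, 0.30])
returns = true_alpha + factors @ true_beta + rng.normal(scale=0.002, size=n)

# OLS with an intercept: returns_t = alpha + sum_k beta_k * factor_{k,t} + e_t
X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

for name, b in zip(names, beta_hat):
    print(f"{name}: {b:+.3f}")
print(f"alpha: {alpha_hat:+.5f}")
```

In this setup the intercept plays the role of the return left unexplained by the five factors, and each slope measures the exposure of the strategy to one factor.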

However, much more work is still needed to learn about pairs trading as a statistical arbitrage strategy. I conclude this paper by giving some recommendations for possible future research topics. In this study, it is very often the case that the cointegrating relationship starts to break down within two weeks, leading to minimal trades after that, or worse, to losses incurred due to false trading signals. Bock and Mestel (2009) argue that although basic econometric time series models can be applied to generate trading signals, the trading signals may no longer be valid when the fundamental economic reasons behind the relationship change. In this situation, betting on eventual convergence of the spread will incur losses. They suggest pairs trading using a Markov regime-switching model with switching mean and variance to detect structural changes in pairs. When implemented on pairs selected from the Dow Jones index, their regime-switching rule generates positive returns.

On the other hand, Do and Faff (2010) show that a simple pairs trading strategy will not be profitable. They incorporate two additional metrics: industry homogeneity and the frequency of historical price spread reversal. The first metric is incorporated in my study by restricting cointegrated pairs to be from the same industry. Mixed evidence of consistent profit improvement is obtained. I have, however, not considered the frequency of historical price spread reversal and would suggest such incorporation in a future study. The reversal frequency is defined as the number of past zero crossings of the price spread (Do and Faff, 2010), and in my case, it will be measured by the number of estimated mean crossings of the price spread based on historical data.
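The reversal frequency described above can be computed with a few lines of code. The sketch below counts how often a historical spread series changes side relative to its mean; the function name and the convention of ignoring observations that lie exactly on the mean are assumptions made for the example.

```python
import numpy as np

def mean_crossings(spread, mean=None):
    """Number of times the spread crosses its (estimated) mean.

    If `mean` is None, the sample mean of the series is used. Observations
    exactly equal to the mean are ignored, so touching the mean without
    changing side does not count as a crossing.
    """
    s = np.asarray(spread, dtype=float)
    m = s.mean() if mean is None else mean
    signs = np.sign(s - m)
    signs = signs[signs != 0]
    return int(np.sum(signs[1:] != signs[:-1]))

# Example: a spread that alternates around zero crosses it three times.
print(mean_crossings([1.0, -1.0, 1.0, -1.0], mean=0.0))
```

Pairs could then be ranked by this count over the estimation window, with a higher count indicating a spread that has historically reverted more often.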

The profit of pairs trading is studied extensively by Engelberg et al. (2009). They find that profitability is high when there is firm news around the date of divergence. When there is an idiosyncratic shock to one of the firms from the chosen pair, the divergence is more likely to be permanent, contributing to low profit, but the situation is the opposite when there is an industry-wide shock. They also find that a strategy that commits to closing each position within 10 days after a pair has diverged contributes to an increase in profit. A future study could consider their findings to refine the trading rules and signals.
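One way such a commitment to close positions could enter the trading rules is as a time stop on top of the usual convergence exit. The sketch below is only an illustration of that idea: the function name, the entry threshold and the use of a z-score series as input are assumptions, and it is not the rule used by Engelberg et al. (2009).

```python
def trade_with_time_stop(z, entry=2.0, max_hold=10):
    """Return a list of (open_day, close_day, position) trades.

    A position is opened when |z| > entry (short the spread if z > 0,
    long if z < 0) and closed when z crosses zero or after `max_hold`
    days, whichever comes first. Positions still open at the end of the
    series are not reported.
    """
    trades = []
    position, start = 0, None
    for t, zt in enumerate(z):
        if position == 0:
            if abs(zt) > entry:
                position = -1 if zt > 0 else 1
                start = t
        elif zt * position >= 0 or t - start >= max_hold:
            trades.append((start, t, position))
            position = 0
    return trades

# A spread that diverges and never reverts is closed by the time stop.
stuck = [0.0, 2.5] + [2.1] * 15
print(trade_with_time_stop(stuck))
# A spread that reverts is closed at the zero crossing.
print(trade_with_time_stop([0.0, -2.5, -1.0, 0.1]))
```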


3.8 References

Avellaneda, M. and Lee, J.-H. (2010), `Statistical arbitrage in the US equities market', Quantitative Finance 10(7), 761–782.

Bianchi, R., Drew, M. and Zhu, R. (2009), Pairs trading profits in commodity futures markets, in `Proceedings of Asian Finance Association 2009 International Conference', pp. 1–26.

Bock, M. and Mestel, R. (2009), A regime-switching relative value arbitrage rule, in `Operations Research Proceedings 2008', Springer-Verlag Berlin Heidelberg, pp. 9–14.

Bolgün, K. E., Kurun, E. and Güven, S. (2010), `Dynamic pairs trading strategy for the companies listed in the Istanbul stock exchange', International Review of Applied Financial Issues and Economics 2(1), 37–57.

Broussard, J. P. and Vaihekoski, M. (2012), `Profitability of pairs trading strategy in an illiquid market with multiple share classes', Journal of International Financial Markets, Institutions and Money 22(5), 1188–1201.

Caldeira, J. and Guilherme, M. (2013), `Selection of a portfolio of pairs based on cointegration: A statistical arbitrage strategy', Brazilian Review of Finance 11(1), 49–80.

Cummins, M. and Bucca, A. (2012), `Quantitative spread trading on crude oil and refined products markets', Quantitative Finance 12(12), 1857–1875.

Davidson, R. and MacKinnon, J. G. (2004), Econometric Theory and Methods, Oxford University Press, New York.

D'Avolio, G. (2002), `The market for borrowing stock', Journal of Financial Economics 66(2), 271–306.

Diebold, F. X., Gardeazabal, J. and Yilmaz, K. (1994), `On cointegration and exchange rate dynamics', Journal of Finance 49(2), 727–735.

Do, B., Faff, R. and Hamza, K. (2006), A new approach to modeling and estimation for pairs trading, in `Financial Management Association European Conference Proceedings 2006', pp. 87–99.


Do, B. and Faff, R. W. (2010), `Does simple pairs trading still work?', Financial Analysts Journal 66(4), 83–95.

Dunis, C. L., Laws, J. and Evans, B. (2008), `Trading futures spread portfolios: Applications of higher order and recurrent networks', European Journal of Finance 14(6), 503–521.

Dwyer Jr., G. P. and Wallace, M. S. (1992), `Cointegration and market efficiency', Journal of International Money and Finance 11(4), 318–327.

Elliott, R. J., Van Der Hoek, J. and Malcolm, W. P. (2005), `Pairs trading', Quantitative Finance 5(3), 271–276.

Elman, J. L. (1990), `Finding structure in time', Cognitive Science 14(2), 179–211.

Engelberg, J., Gao, P. and Jagannathan, R. (2009), An anatomy of pairs trading: The role of idiosyncratic news, common information and liquidity, in `Third Singapore International Conference on Finance 2009'.

Engle, R. F. and Granger, C. W. (1987), `Co-integration and error correction: Representation, estimation, and testing', Econometrica 55(2), 251–276.

Fama, E. F. and French, K. R. (1993), `Common risk factors in the returns on stocks and bonds', Journal of Financial Economics 33(1), 3–56.

Gatev, E., Goetzmann, W. N. and Rouwenhorst, K. G. (2006), `Pairs trading: Performance of a relative-value arbitrage rule', Review of Financial Studies 19(3), 797–827.

Greco, S., Figueira, J. and Ehrgott, M. (2016), Multiple Criteria Decision Analysis, Springer, New York.

Hamilton, J. D. (1994), Time Series Analysis, Princeton University Press.

Huang, R. D. and Stoll, H. R. (1997), `The components of the bid-ask spread: A general approach', Review of Financial Studies 10(4), 995–1034.

Huck, N. (2009), `Pairs selection and outranking: An application to the S&P 100 index', European Journal of Operational Research 196(2), 819–825.


Huck, N. (2010), `Pairs trading and outranking: The multi-step-ahead forecasting case', European Journal of Operational Research 207(3), 1702–1716.

Huck, N. and Afawubo, K. (2015), `Pairs trading and selection methods: Is cointegration superior?', Applied Economics 47(6), 599–613.

Jegadeesh, N. (1990), `Evidence of predictable behavior of security returns', Journal of Finance 45(3), 881–898.

Jegadeesh, N. and Titman, S. (1993), `Returns to buying winners and selling losers: Implications for stock market efficiency', Journal of Finance 48(1), 65–91.

Johansen, S. (1991), `Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models', Econometrica 59(6), 1551–1580.

Johansen, S. (1995), Likelihood-based inference in cointegrated vector autoregressive models, Oxford University Press.

Johansen, S. and Juselius, K. (1990), `Maximum likelihood estimation and inference on cointegration - with applications to the demand for money', Oxford Bulletin of Economics and Statistics 52(2), 169–210.

Juselius, K. (2006), The cointegrated VAR model: Methodology and applications, Oxford University Press.

Khandani, A. and Lo, A. W. (2011), `What happened to the quants in August 2007? Evidence from factors and transactions data', Journal of Financial Markets 14(4), 1–46.

Krauss, C. (2017), `Statistical arbitrage pairs trading strategies: Review and outlook', Journal of Economic Surveys 31(2), 513–545.

Lin, Y.-X., McCrae, M. and Gulati, C. (2006), `Loss protection in pairs trading through minimum profit bounds: A cointegration approach', Journal of Applied Mathematics and Decision Sciences 1(2006), 1–14.

Liu, S.-M. and Chou, C.-H. (2003), `Parities and spread trading in gold and silver markets: A fractional cointegration analysis', Applied Financial Economics 13(12), 899–911.


Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Springer.

Nelsen, R. B. (2007), An introduction to copulas, Springer.

Ng, S. and Perron, P. (2001), `Lag length selection and the construction of unit root tests with good size and power', Econometrica 69(6), 1519–1554.

Perlin, M. S. (2009), `Evaluation of pairs-trading strategy at the Brazilian financial market', Journal of Derivatives and Hedge Funds 15(2), 122–136.

Perron, P. and Qu, Z. (2007), `A simple modification to improve the finite sample properties of Ng and Perron's unit root tests', Economics Letters 94(1), 12–19.

Phillips, P. C. and Perron, P. (1988), `Testing for a unit root in time series regression', Biometrika 75(2), 335–346.

Rad, H., Low, R. K. Y. and Faff, R. (2016), `The profitability of pairs trading strategies: Distance, cointegration and copula methods', Quantitative Finance 16(10), 1541–1558.

Schizas, P., Thomakos, D. and Wang, T. (2011), Pairs trading on international ETFs. Unpublished manuscript.

Sklar, A. (1996), Random variables, distribution functions, and copulas: A personal look backward and forward, in L. Rüschendorf, B. Schweizer and M. D. Taylor, eds, `Distributions with Fixed Marginals and Related Topics', Institute of Mathematical Statistics, Hayward CA, pp. 1–14.

Stander, Y., Marais, D. and Botha, I. (2013), `Trading strategies with copulas', Journal of Economic and Financial Sciences 6(1), 83–107.

Vidyamurthy, G. (2004), Pairs Trading: Quantitative methods and analysis, John Wiley and Sons.


3.9 Appendix

3.9.1 List of stocks

Tables 3.8 to 3.11 present the list of stocks considered (in the applications) by their ticker symbols across four different stock indices.

No   Ticker symbol  No   Ticker symbol  No   Ticker symbol  No   Ticker symbol
1    EBAY           26   USB            51   HON            76   AEP
2    AMZN           27   GS             52   CAT            77   CNP
3    HD             28   MET            53   EMR            78   CMS
4    DIS            29   AAPL           54   DHR            79   ED
5    MCD            30   MSFT           55   XOM            80   D
6    FOXA           31   GOOG           56   CVX            81   DTE
7    UVV            32   IBM            57   SLB            82   DUK
8    TWX            33   CSCO           58   COP            83   VZ
9    LOW            34   ORCL           59   OXY            84   SBAC
10   YUM            35   YHOO           60   APC            85   TWX
11   PG             36   INTC           61   EOG            86   CTL
12   KO             37   JNJ            62   HAL            87   CCI
13   PEP            38   PFE            63   APA            88   FTR
14   BTI            39   MRK            64   DOW            89   HRS
15   ITYBY          40   GILD           65   DD             90   MSI
16   KMB            41   AMGN           66   FMC            91   LVLT
17   MDLZ           42   BMY            67   MOS
18   UL             43   UNH            68   FCX
19   PRU            44   CELG           69   IP
20   WFC            45   GE             70   IFF
21   JPM            46   UTX            71   MWV
22   BAC            47   BA             72   MON
23   C              48   MMM            73   AES
24   AIG            49   UNP            74   EIX
25   AXP            50   UPS            75   AEE

Table 3.8: S&P 500


No   Ticker symbol  No   Ticker symbol  No   Ticker symbol  No   Ticker symbol
1    BC             26   SF             51   ATU            76   CWT
2    FNP            27   GEO            52   EME            77   AWR
3    WWW            28   FEIC           53   CW             78   CPK
4    POOL           29   MMS            54   AIT            79   UTL
5    LYV            30   BDC            55   PDCE           80   USM
6    CBRL           31   VSAT           56   CKH            81   USMO
7    SHOO           32   CGNX           57   HOS            82   ALSK
8    BWLD           33   TYL            58   CRZO           83   CNSL
9    RYL            34   AXE            59   SGY            84   IDT
10   HAIN           35   MSCC           60   GEOS           85   SHEN
11   DAR            36   JCOM           61   NR             86   GNCMA
12   THS            37   CNC            62   TTI
13   WDFC           38   ALGN           63   POL
14   ANDE           39   QCOR           64   FUL
15   JJSF           40   WST            65   SWM
16   SAM            41   PRXL           66   KS
17   LNCE           42   VPHM           67   BCPC
18   SAFM           43   MDCO           68   TXI
19   SKT            44   HAE            69   SWC
20   PRAA           45   MWIV           70   GLT
21   PSEC           46   AOS            71   SJI
22   PRA            47   ODFL           72   EDE
23   LHO            48   TDY            73   EE
24   MAA            49   TTC            74   OTTR
25   PPS            50   ENS            75   MGEE

Table 3.9: S&P 500 small cap


No   Ticker symbol  No   Ticker symbol  No   Ticker symbol  No   Ticker symbol
1    AEFES          21   DYHOL          41   KRDMD          61   YKBNK
2    AKBNK          22   ECILC          42   NTHOL          62   ZOREN
3    AKENR          23   ECZYT          43   NTTUR
4    AKSA           24   ENKAI          44   OTKAR
5    ALARK          25   FENER          45   PTOFS
6    ALGYO          26   FROTO          46   SAHOL
7    ALKIM          27   GARAN          47   SASA
8    ANACM          28   GOODY          48   SISE
9    ANHYT          29   GSDHO          49   TATKS
10   ARCLK          30   GUBRF          50   TEBNK
11   ASELS          31   HURGZ          51   TEKST
12   ASUZU          32   IHEVA          52   TOASO
13   BAFGS          33   IPEKE          53   TRCAS
14   BIMAS          34   ISCTR          54   TRKCM
15   BRISA          35   ISGYO          55   TSKB
16   BRSAN          36   IZMDC          56   TTRAK
17   CIMSA          37   KARSN          57   ULKER
18   CLEBI          38   KARTN          58   VAKBN
19   DOAS           39   KONYA          59   VESTL
20   DOHOL          40   KOZAA          60   YAZIC

Table 3.10: XU100


No   Ticker symbol  No   Ticker symbol  No   Ticker symbol
1    ALLL3          21   EMBR3          41   TRPL4
2    AMBV4          22   ENBR3          42   UGPA3
3    BBAS3          23   GGBR4          43   USIM5
4    BBDC3          24   GOAU4          44   VALE3
5    BBDC4          25   GOLL4          45   VALE5
6    BRAP4          26   ITSA4          46   VIVT4
7    BRFS3          27   ITUB4
8    BRKM5          28   KLBN4
9    BTOW3          29   LAME4
10   CCRO3          30   LREN3
11   CMIG4          31   NATU3
12   CPFE3          32   OIBR4
13   CPLE6          33   PCAR4
14   CRUZ3          34   PETR3
15   CSAN3          35   PETR4
16   CSNA3          36   RENT3
17   CYRE3          37   RSID3
18   DASA3          38   SBSP3
19   ELET3          39   SUZB5
20   ELET6          40   TIMP3

Table 3.11: Ibovespa100
