
On Modeling the Volatility in Speculative Prices

Zhijie Hou

Dissertation submitted to the Faculty of the

Virginia Polytechnic Institute and State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Economics, Science

Aris Spanos, Chair

Richard Ashley

Kwok Ping Tsang

Wen You

May 1, 2014

Blacksburg, Virginia

Keywords: Volatility Modeling, Student’s t Distribution, Heterogeneity, Probabilistic Reduction Approach, Statistical Adequacy

Copyright 2014, Zhijie Hou

On Modeling the Volatility in Speculative Prices

Zhijie Hou

(ABSTRACT)

Following the Probabilistic Reduction (PR) approach, this dissertation proposes the Student’s t Autoregressive (St-AR) model, the Student’s t Vector Autoregressive (St-VAR) model, and their heterogeneous versions as alternatives to the various ARCH-type models for capturing univariate and multivariate volatility. The St-AR and St-VAR models differ from ARCH-type volatility models in that they give rise to internally consistent statistical models that do not rely on ad hoc specifications and parameter restrictions, and that model the conditional mean and conditional variance jointly.

The univariate modeling is illustrated using the Real Effective Exchange Rate (REER) indices of three major Asian currencies (the RMB, the Hong Kong Dollar and the Taiwan Dollar), while the multivariate volatility modeling is applied to investigate the relationship between the REER indices and stock price indices in mainland China, as well as the relationship between stock prices in mainland China and Hong Kong. Following the PR methodology, the information gained from Mis-Specification (M-S) testing guides the respecification from the original Normal-(V)AR models to the St-(V)AR models. The results from formal M-S tests and forecasting performance indicate that the St-(V)AR models provide a more appropriate way to model volatility for certain types of speculative price data.

Acknowledgements

I would like to thank my advisor, Dr. Aris Spanos, for his dedicated mentoring. Dr. Spanos introduced me to the field of Econometrics and Empirical Modeling. While working with Dr. Spanos, I have learned far more than any book or class could teach, and his patient and valuable guidance has greatly helped me to complete my degree.

I would like to thank Dr. Richard Ashley, Dr. Kwok Ping Tsang, and Dr. Wen You for serving as my graduate advisory committee members. I have learned a lot from the courses taught by these professors, which helped me build the foundation of my academic ability.

I would also like to thank all the other faculty, staff and colleagues in the Department of Economics for their warm help during my years in the department.

I would like to thank my parents, Jianxin and Mei, my wife Anqi, the rest of my family, and all of my friends. They have always been there for me. It would have been impossible for me to finish my education without their unconditional support and encouragement.


Contents

1 Introduction
1.1 Background
1.2 A Brief Overview

2 Perspectives on Volatility Modeling
2.1 Probabilistic Features of the Speculative Prices Data
2.1.1 Non-Normality
2.1.2 Heteroskedasticity
2.1.3 Time Trends & Seasonality
2.2 ARCH-type Volatility Modeling
2.2.1 The ARCH Model
2.2.2 The GARCH Model
2.2.3 The IGARCH Model
2.2.4 The Student’s t GARCH Model
2.2.5 The GARCH-in-Mean Model
2.2.6 The EGARCH & TGARCH Model
2.2.7 Multivariate GARCH Models
2.2.8 Summary of The ARCH-type Volatility Models
2.3 Probabilistic Reduction Approach
2.3.1 Specification
2.3.2 Misspecification Testing and Respecification
2.3.3 Generalized Procedure of PR Approach
2.4 Appendix
2.4.1 Implicit Restrictions for The ARCH Models
2.4.2 Implicit Restrictions for The Multivariate ARCH Models

3 Student’s t Family of Univariate Volatility Models
3.1 Introduction
3.2 Student’s t Family Univariate Volatility Models
3.2.1 St-AR Model
3.2.2 Heterogeneous St-AR Model
3.3 Empirical Applications
3.3.1 Introduction
3.3.2 RMB Real Effective Exchange Rate (REER) Index
3.3.3 HKD Real Effective Exchange Rate (REER) Index
3.3.4 TWD Real Effective Exchange Rate (REER) Index
3.4 Conclusion
3.5 Appendix
3.5.1 Proof of Proposition 3.1
3.5.2 Derivation of the Maximum Likelihood Function

4 Student’s t Family of Multivariate Volatility Models
4.1 Introduction
4.2 Student’s t Multivariate Volatility Models
4.2.1 St-VAR Model
4.2.2 Heterogeneous St-VAR Model
4.3 Empirical Applications
4.3.1 Introduction
4.3.2 RMB REER Index & Shanghai Stock Exchange Index
4.3.3 Shanghai Stock Exchange Index vs. Hang Seng Index
4.4 Appendix
4.4.1 Proof of Proposition 4.1
4.4.2 Derivation of the Maximum Likelihood Function

5 Conclusion
5.1 Introduction
5.2 Discussion and Future Prospect

List of Figures

3.1 Time Plot of The RMB Real Effective Exchange Rate Index
3.2 Fitted Values of RMB REER Index
3.3 Adjusted Fitted Values of RMB REER Index
3.4 Fitted Conditional Variance of RMB REER Index
3.5 The HKD Real Effective Exchange Rate Index
3.6 Fitted Values of HKD REER Index
3.7 Adjusted Fitted Values of HKD REER Index
3.8 Fitted Conditional Variance of HKD REER Index
3.9 The TWD Real Effective Exchange Rate Index
3.10 Fitted Values of TWD REER Index
3.11 Adjusted Fitted Values of TWD REER Index
3.12 Fitted Conditional Variance of TWD REER Index
4.1 Time Plot of RMB REER Index
4.2 Time Plot of SSE Index
4.3 Time Plot of Hang Seng Index returns
4.4 Time Plot of SSE Index returns


List of Tables

3.1 Student’s t Autoregressive Model
3.2 Heterogeneous (Linear) Student’s t Autoregressive Model
3.3 Heterogeneous (Quadratic) Student’s t Autoregressive Model
3.4 M-S Tests for Univariate Models
3.5 M-S Tests and Respecification of AR(3) Model for RMB REER
3.6 Estimation and MS Tests Results of 3rd order H-St-AR(3,3;4) for RMB
3.7 M-S Tests and Respecification of AR(2) Model for HKD REER
3.8 Estimation and MS Tests Results of Linear St-AR(3,3;4) model for HKD
3.9 M-S Tests and Respecification of AR(2) Model for TWD REER Index
3.10 Estimation and MS Tests Results of St-AR(3,3;5) for TWD REER Index
4.1 Student’s t Vector Autoregressive Model
4.2 Heterogeneous (Linear) Student’s t Vector Autoregressive Model
4.3 Heterogeneous (Quadratic) Student’s t Vector Autoregressive Model
4.4 M-S Tests for Multivariate Models
4.5 M-S Tests and Respecification of VAR(2) Model for RMB REER vs SSE
4.6 Estimation and MS Tests of 3rd order H-StVAR(3,3;5) for RMB REER vs SSE
4.7 Estimation of VAR(2) Models: Hang Seng Index vs SSE Index
4.8 M-S Tests and Respecification of the VAR(2) Model: Period 1
4.9 Estimation and MS Tests of StVAR(2,2;5): Period 1
4.11 Estimation and MS Tests of 2nd order H-StVAR(3,3;5): Period 2
4.13 Estimation and MS Tests of 2nd order H-StVAR(3,3;5): Period 3


Chapter 1

Introduction

There is an enormous literature on modeling the time-varying characteristics of financial time series. In general, most of the main features of financial time series data can be characterized by the first two moments, the mean and the variance, which are closely related to important topics such as volatility clustering, time trends, seasonal patterns and underlying distributions. In order to secure the reliability of estimation and forecasting, the modeler should pay considerable attention to the chance regularity patterns exhibited by the particular data 1.

The Probabilistic Reduction (PR) approach and statistical adequacy provide the cornerstone of the results in this dissertation. Following the PR approach proposed by Spanos (1986), this dissertation proposes a new empirical framework for modeling the volatility of speculative price data. Within this framework, it discusses a coherent methodological procedure for choosing an appropriate and parsimonious model that is capable of capturing the dynamic structure of the process underlying the data. The key to success is to use statistical adequacy as the criterion for evaluating and selecting the specification. In particular, the proposed Student’s t family of models accounts for the chance regularities in the data of interest more adequately than the traditional models.

First, it is well documented that the underlying distribution of many financial time series is not Normal. Leptokurtosis (fat tails) and the existence of heteroscedastic volatility in speculative prices lead to the use of the Student’s t distribution as a more appropriate distribution than the Normal distribution. This change gives rise to the Student’s t family of models, which not only captures the non-Normal kurtosis, but also removes the awkwardness of retaining Normality while allowing for heteroscedasticity. Moreover, the specifications of the Student’s t models are derived directly from the joint distribution of all relevant variables, without any ad hoc restrictions.

1 See Spanos (1999) for more details about the definition of chance regularity.

Second, since the statistical models in this dissertation are specified by imposing several important reduction assumptions on the joint distribution of the relevant variables, they enjoy two important advantages over the traditionally used ARCH-type models: (1) The first two conditional moments are modeled jointly, so their interrelationship is no longer ignored. (2) The reduction assumptions greatly reduce the number of parameters to be estimated, while the functional form of the conditional distribution makes it unnecessary to impose parameter restrictions to guarantee the positivity of the conditional variance and the existence of higher-order moments.

The volatility models developed in this dissertation can be grouped into two categories: univariate volatility models and multivariate volatility models. The former assume that the features of a series depend on its own past, while the latter also account for the interrelationships among different series. Each case is illustrated using several examples of the detailed modeling and estimation process under different reduction assumptions. In particular, I apply the Student’s t family of models 2 to a number of real-world speculative price datasets, and show how to choose an appropriate model using an iterative procedure of specification, Mis-Specification (M-S) testing and respecification. The technical innovation of this dissertation is a tractable method for capturing and explaining heteroscedasticity stemming from past information, and for introducing new types of heterogeneity in the conditional mean.

1.1 Background

Over the last three decades, financial economists have devoted considerable attention to modeling volatility in speculative price data. The topic has generated an enormous literature, and its empirical applications have spread over many areas of financial economics, both in theory and in practice. Good volatility forecasts can improve financial risk management, which basically boils down to forecasting the risk of holding complicated portfolios at various horizons. Volatility modeling is crucial for the valuation of options and of portfolios containing options, as well as for the success of many trading strategies associated with options. Volatility also plays an important role in asset pricing, since it is considered a measure of risk and investors want a premium for investing in risky assets. Banks and other financial institutions apply so-called Value-at-Risk (VaR) models to assess their risks. Modeling and forecasting volatility or, in other words, the covariance structure of asset returns, is therefore very important. Furthermore, pertinent volatility modeling in the context of time series can improve the precision of parameter estimation and the reliability of forecasting. Apart from these examples, volatility modeling has many other financial applications and provides valuable information that can be used in asset allocation, trading strategy, policy making, and designing financial instruments.

2 In practice, the parametrization of such a model can be complex. To maintain tractability, I simplify the problem by assuming that the multivariate Student’s t distribution used in this dissertation has marginals with the same degrees of freedom.

In finance and macroeconomics, volatility refers to the variation over time of series such as growth rates, stock returns and exchange rate returns. The most popular measure of this type of volatility is the standard deviation of the series. When measured by the standard deviation, however, volatility captures only the degree of dispersion, not the direction of changes. This is because, when calculating the standard deviation (or variance), all deviations from the mean are squared, so the distinction between negative and positive deviations is lost. In traditional time series analysis, volatility is commonly modeled as a conditional variance process within an ARCH-type framework.
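The point about squaring can be checked with a small sketch. The return values below are hypothetical numbers made up for illustration, and `volatility` is simply a name chosen here for the population standard deviation; flipping the sign of every return leaves the measure unchanged.

```python
import statistics


def volatility(returns):
    """Population standard deviation of a list of returns."""
    return statistics.pstdev(returns)


# Hypothetical daily returns in percent (illustrative values only).
returns = [0.5, -1.2, 0.3, 2.0, -0.7]
# Same magnitudes, opposite directions.
mirrored = [-r for r in returns]

# Both calls print the same number: the measure is blind to direction.
print(volatility(returns))
print(volatility(mirrored))
```

The two series describe opposite price movements, yet they are indistinguishable by this measure, which is why asymmetric responses have to be modeled separately.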

Estimating and forecasting the behavior of volatility are important components of analyzing the future behavior of financial indicators, and therefore deserve careful study. Predicting volatility is never easy: the interrelationships between financial indicators are convoluted, and complicated disturbances in the time series may conceal the properties of interest. One of the most important features of the volatility of financial data is that it is not directly observable, yet it is highly predictable. Extensive efforts have led to various model-based quantitative analyses that assist in investigating the volatility process and in establishing the relationship between the current values of financial indicators and their expected future values.

Despite numerous theoretical and empirical difficulties, several stylized features of speculative price data provide valuable information for formalizing financial volatility and for proposing econometric techniques for volatility estimation and forecasting.

1. It is widely accepted that the volatility of the prices of financial instruments evolves over time. Although various functional forms have been proposed for the conditional variance, numerous empirical findings agree that volatility is time varying and does not converge to any limit, even when the sample size is very large by econometric standards. This phenomenon is often referred to as conditional heteroskedasticity.

2. One consequence of volatility movements is the tendency to cluster, together with considerable second-order autocorrelation. That is, volatility may be high for certain periods and low for other periods. Volatility clustering is reflected in significantly positive autocorrelations of squared returns, which decay slowly toward zero.

3. There is extensive empirical evidence that volatility reacts asymmetrically to a large price increase and a large price drop in the past. In particular, increases in volatility are larger when previous returns are negative than when they are positive and of the same magnitude. This phenomenon is commonly referred to as the leverage effect.

4. The distribution of returns has higher kurtosis than the Normal, indicating that extreme returns have higher probability than expected under a Normal distribution.
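Features 2 and 4 in the list above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the data are artificial, generated by a hypothetical process whose standard deviation alternates between 1 and 3 in blocks of 100 observations, and the function names are chosen here for illustration. It does not address the leverage effect in feature 3.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def excess_kurtosis(x):
    """Sample kurtosis minus 3, so that Normal data give roughly zero."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


def simulate_regime_returns(n, seed=0):
    """Artificial returns: Normal draws whose standard deviation
    alternates between 1 and 3 in blocks of 100 observations."""
    rng = random.Random(seed)
    out = []
    for t in range(n):
        sigma = 1.0 if (t // 100) % 2 == 0 else 3.0
        out.append(rng.gauss(0.0, sigma))
    return out


returns = simulate_regime_returns(20000)
squared = [r * r for r in returns]

print("lag-1 autocorrelation of returns:        ", autocorrelation(returns, 1))
print("lag-1 autocorrelation of squared returns:", autocorrelation(squared, 1))
print("excess kurtosis of returns:              ", excess_kurtosis(returns))
```

In this artificial series the returns themselves are close to uncorrelated, while their squares are positively autocorrelated and the excess kurtosis is positive, even though every individual draw is Normal. Time-varying volatility alone is therefore enough to produce both clustering and thick tails.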

Apart from these characteristics, which are commonly observed in many economic and financial time series, there are additional important assumptions often imposed in volatility analysis. While financial data are usually observed only at discrete times, the volatility process is often assumed to evolve over time in a continuous manner, that is, volatility jumps are rare. In addition, volatility does not diverge to infinity, that is, it varies within some fixed range. Statistically speaking, this means that volatility is often stationary. Although these assumptions seem strict, they are reasonable and lead to significant simplification in volatility modeling. Relaxing these assumptions has motivated a good deal of further discussion.

These statistical properties play a crucial role in the development of volatility models. Some volatility models were proposed specifically to correct the inability of existing ones to capture the characteristics mentioned earlier. For example, the failure of traditional homoskedastic linear time series models has led to the use of heteroskedastic conditional variance formulations. Long-memory behavior motivates the extension from the ARCH model to the GARCH model. The EGARCH model was developed to capture the asymmetry in volatility clustering induced by large positive and negative returns. Finally, the non-linear dependence and thick tails exhibited by returns data give rise to the use of non-Normal distributions in modeling volatility.

In the last three decades, most studies have modeled volatility as the conditional variance estimated using Autoregressive Conditional Heteroscedasticity (ARCH) type models. The vast quantity of research on competing volatility models has been motivated by the introduction and development of the ARCH-type models. Since the original ARCH model was proposed by Engle (1982), ARCH-type models have been extended considerably and have proven quite successful in capturing some of the statistical features exhibited by the data.

In the ARCH(p) model, the conditional variance of the current error term, or innovation, is expressed as a p-th order weighted average of the squared error terms of previous periods. With this idea, the ARCH model has been remarkably successful in replicating the tendency of financial time series to move between high-volatility and low-volatility periods (volatility clustering). Following the publication of the ARCH model, an enormous body of work has focused on extending and generalizing it, mainly through alternative functional forms for the conditional variance. The most important modification is the generalized ARCH (GARCH) model suggested by Bollerslev (1986), who was a graduate student of Engle’s. Compared to the ARCH model, Bollerslev’s GARCH model allows for a more parsimonious parametrization of the volatility dynamics, in a way that is similar to the generalization of the autoregressive (AR) process to the autoregressive moving average (ARMA) process. Another notable modification is Nelson’s (1991) exponential GARCH (EGARCH) model, the first in the family of asymmetric GARCH models. It is motivated by a limitation shared by the ARCH and GARCH models: they use only information on the magnitude of the returns and completely ignore information on their sign. Other influential models include the GJR model of Glosten et al. (1993), the asymmetric power GARCH (APGARCH) model of Ding et al. (1993), and the component GARCH (CGARCH) model of Engle and Lee (1999). However, these models are just a small selection from the universe of existing ARCH specifications. In a review article, Degiannakis and Xekalaki (2004) present more than thirty variants of the original ARCH model.
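The ARCH(p) recursion described above can be sketched in a few lines. This is an illustrative simulation, not the estimation procedure used in the dissertation: it assumes Normal innovations, writes the conditional variance as sigma_t^2 = alpha0 + alpha_1 * eps_{t-1}^2 + ... + alpha_p * eps_{t-p}^2, and uses hypothetical parameter values (alpha0 = 0.2, alpha_1 = 0.5). The function name `simulate_arch` and its arguments are chosen here for illustration.

```python
import random


def simulate_arch(alpha0, alphas, n, seed=0, burn_in=500):
    """Simulate eps_t = sigma_t * z_t with z_t ~ N(0, 1) and
    sigma_t^2 = alpha0 + sum_i alphas[i-1] * eps_{t-i}^2.
    Returns the last n innovations and their conditional variances."""
    # Sign restrictions that keep the conditional variance positive.
    if alpha0 <= 0 or any(a < 0 for a in alphas):
        raise ValueError("need alpha0 > 0 and all alphas >= 0")
    rng = random.Random(seed)
    p = len(alphas)
    eps = [0.0] * p          # pre-sample innovations set to zero
    sigma2 = []
    for _ in range(n + burn_in):
        # eps[-1 - i] is the innovation from i + 1 periods ago.
        s2 = alpha0 + sum(alphas[i] * eps[-1 - i] ** 2 for i in range(p))
        eps.append(rng.gauss(0.0, 1.0) * s2 ** 0.5)
        sigma2.append(s2)
    # Drop the pre-sample values and the burn-in period.
    return eps[p + burn_in:], sigma2[burn_in:]


# Hypothetical ARCH(1) parameters, for illustration only.
eps, sigma2 = simulate_arch(alpha0=0.2, alphas=[0.5], n=50000)
sample_var = sum(e * e for e in eps) / len(eps)

print("sample variance of innovations:", sample_var)
print("implied unconditional variance:", 0.2 / (1.0 - 0.5))
```

The explicit check on `alpha0` and `alphas` reflects the kind of parameter restriction that ARCH-type models need in order to keep the conditional variance positive, which is the restriction the Student’s t models discussed earlier are said to avoid.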

Apart from the development of new specifications designed to capture more and more statistical features of the observed data, there has been tremendous development of theoretical results regarding the statistical properties of the most popular ARCH-type models. Nelson (1990) and Bougerol and Picard (1992) establish conditions for the stationarity and ergodicity of the GARCH process. Lee and Hansen (1994) as well as Lumsdaine (1996) prove the consistency and asymptotic normality of the quasi-maximum likelihood estimator for the GARCH(1, 1) model. Ling and McAleer (2002) derive the conditions for the existence of moments in the GARCH(p, q) model. A summary of recent theoretical results on GARCH models can be found in Li et al. (2002).

Another major development has been the extension to the family of multivariate GARCH models. Many of the univariate ARCH-type models have multivariate extensions. The advantage of the multivariate framework is that it can model temporal dependence in the conditional covariances as well as in the conditional variances. Although the multivariate models are viewed as providing a better description of reality, they suffer from problems of practical importance, such as over-parametrization and intractability. For a review of multivariate GARCH models, see Bauwens et al. (2006).

As will be discussed in Chapter 2, the ARCH family of models is commonly plagued by several problems that are difficult to address. In order to overcome the inadequacy of ARCH-type volatility models, this dissertation uses an alternative approach, the Probabilistic Reduction approach (Spanos 1986), which views statistical models as parameterizations of the joint distribution of the observable random variables underlying the chosen data. The distinctive feature of the PR approach is that statistical modeling depends on a set of probabilistic assumptions relating to the observable random variables, which can be classified into three broad categories: Distribution, Dependence, and Heterogeneity. A key step in the modeling process is to detect the chance regularity patterns exhibited by the data and to choose the appropriate reduction assumptions. The PR approach has advantages in empirical modeling because it lets the statistical features of the data play an important role in specifying the model, which makes it capable of bridging the gap between theory and the data of interest.

1.2 A Brief Overview

In Chapter 2, I review the important developments in volatility modeling. The advantages and weaknesses of the ARCH-type volatility models are discussed first; the PR approach is then introduced in terms of how it overcomes the limitations of the ARCH-type models. In Chapter 3, the Student’s t models for the univariate case are discussed. Chapter 4 extends the Student’s t models to the multivariate context. For both the univariate and the multivariate cases, heterogeneous versions are proposed and their applications are illustrated using empirical examples. Chapter 5 concludes and offers final remarks on future research.

Chapter 2

Perspectives on Volatility Modeling

2.1 Probabilistic Features of the Speculative Prices Data

The developments in volatility modeling are motivated by the fact that financial time series often exhibit chance regularities that can be related to their conditional variance over time, and these regularities have important applications in financial economics. The ARCH-type models are the most widely used statistical technique and the leading framework for volatility modeling in the current literature. However, these models invariably suffer from limitations, which have stimulated the search for new models. In this section, I briefly discuss several important probabilistic features of speculative prices. Bringing out these features can lead to a better understanding of the models.

2.1.1 Non-Normality

Although financial time series data often display some properties close to those of the Normal distribution, such as bell-shaped symmetry, it is widely accepted that financial returns have non-Normal properties. They often exhibit thick tails, indicating that their kurtosis (based on the fourth moment) is higher than it would be under Normality. The data appear to be leptokurtic, with a large concentration of observations around the mean and more outliers than under the Normal distribution, so that extreme returns have higher probability than expected under a Normal distribution. Violation of the Normality assumption has important implications for the first two moments of a time series. In particular, Normality implies a linear conditional mean and a homoscedastic conditional variance, while a non-Normal distribution could lead to a nonlinear mean, a heteroscedastic variance, or both.1 As discussed in the following chapters, the results of a series of misspecification tests suggest that the Normality assumption is severely violated in several speculative price data sets. In order to avoid the awkwardness of retaining Normality, an alternative distributional assumption is required.

In this work, the Normal distribution is replaced by the Student’s t distribution. The first reason is that the Student’s t distribution can capture some important characteristics of financial and macroeconomic time series, such as fatter tails and second-order dependence. Second, when the joint distribution is assumed to be multivariate Student’s t, we obtain a specification of the conditional distribution that captures the heteroscedasticity exhibited in the time series. Furthermore, it can be shown that all the parameters in the conditional mean and conditional variance equations can be written as functions of the parameters of the joint distribution. Therefore, the violation of Normality, which is troublesome in many studies, now serves as important information that helps us specify a plausible model that describes and explains the structure of the data.
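A minimal sketch may help make the second reason concrete. It uses the standard conditioning result for a bivariate Student’s t distribution written in terms of a location vector and a scale matrix; this parameterization, the function name and the numerical values are assumptions made here for illustration and need not match the notation used later in the dissertation. The conditional mean is linear in the conditioning variable, as under Normality, but the conditional variance grows with the squared distance of the conditioning variable from its location.

```python
def student_t_conditional_moments(x2, mu1, mu2, s11, s12, s22, nu):
    """Mean and variance of X1 given X2 = x2 when (X1, X2) is bivariate
    Student's t with location (mu1, mu2), scale matrix
    [[s11, s12], [s12, s22]] and nu degrees of freedom."""
    if nu <= 1:
        raise ValueError("the conditional variance requires nu > 1")
    beta = s12 / s22                      # slope of the conditional mean
    cond_mean = mu1 + beta * (x2 - mu2)   # linear in x2
    s11_given_2 = s11 - s12 ** 2 / s22    # conditional scale
    d = (x2 - mu2) ** 2 / s22             # squared standardized distance
    cond_var = (nu + d) / (nu - 1.0) * s11_given_2
    return cond_mean, cond_var


# Hypothetical parameter values, for illustration only.
for x2 in (0.0, 1.0, 3.0):
    mean, var = student_t_conditional_moments(
        x2, mu1=0.0, mu2=0.0, s11=1.0, s12=0.5, s22=1.0, nu=5.0)
    print(f"x2 = {x2}: conditional mean = {mean}, conditional variance = {var}")
```

As the degrees of freedom grow large, the factor (nu + d) / (nu - 1) tends to one and the conditional variance becomes constant, which recovers the homoscedastic Normal case. Both conditional moments depend only on parameters of the joint distribution, in line with the statement above.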

2.1.2 Heteroskedasticity

One of the most important empirical observations about speculative prices is that their volatility is not constant over time. The variance of the series does not converge to any limit, even when the sample size is very large. Moreover, their second-order moments are often positively autocorrelated. That is, if an asset price made a big move yesterday, it is more likely to make a big move today. This indicates the existence of volatility clustering or, in other words, second-order dependence. The most commonly used method for capturing this phenomenon in the current literature is the ARCH family of models, where ARCH stands for Autoregressive Conditional Heteroscedasticity. The “AR” comes from the fact that ARCH-type models are autoregressive in the conditional variance equation, the “Conditional” means that next period’s volatility is conditional on the information set from previous periods, and the “Heteroscedasticity” means that volatility varies over time. As will be discussed in the next section, although the original ARCH model and its numerous variants are quite successful in capturing the volatility clustering exhibited by time series in many empirical applications, these models have some severe problems that need to be addressed.

1 Spanos (1995) shows that if the assumptions of (a) linearity and (b) homoscedasticity are supplemented with the assumption of (c) linearity of the reverse regression, then assumptions (a)-(c) are tantamount to assuming joint Normality of the regressand and the regressors, not just conditional Normality.

Spanos (1994) proposes a new approach to modeling heteroscedasticity which enables the modeler to utilize the information conveyed by data plots in making informed decisions on the form and structure of heteroscedasticity. Several extensions have followed this work. For example, Spanos (2002) develops the Student’s t autoregressive (St-AR) model with dynamic heteroscedasticity as an alternative to the symmetric Stable family and the ARCH-type models for modeling speculative prices. Heracleous and Spanos (2006) propose the Student’s t Dynamic Linear Regression (St-DLR) model, which differs from the traditional ARCH-type models in that it can incorporate exogenous variables in the conditional variance in a natural way. In all the works mentioned above, the authors provide empirical evidence that the Student’s t family of models that follow the Probabilistic Reduction (PR) methodology dominate the alternative ARCH-type formulations on statistical adequacy grounds.

2.1.3 Time Trends & Seasonality

Many time series have time trends or cyclic patterns, in mean, or variance, or both. Dominating

trends or seasonality could conceal the underlying movement of the time series under study. There

are many empirical approaches to detrend and deseasonalize a time series. This work provides another

way to deal with these important issues, the Heterogeneous-Student’s t models. In particular,

time-heterogeneity and seasonal-heterogeneity can be easily introduced into the specification of the

original Student’s t models. Like most other econometric models for time series, the parameters

are assumed to be constant over time in the H-St-AR and H-St-VAR models. In these models,

heterogeneity is captured by specifying the mean vector of the joint distribution as a function of a time

index or seasonal dummies. There are two major advantages of Heterogeneous-Student’s t models.

First, in these models, it is unnecessary to remove the trend or seasonality beforehand,

so the chances that important information is eliminated by such a procedure decrease significantly.

Second, a Heterogeneous-Student’s t model can be used to detect a potential relationship between

time trends or seasonal patterns exhibited in the mean and heteroscedasticity in the variance. It

can be shown that in a Heterogeneous-Student’s t model, the parameters in the unconditional mean

equation can enter the specifications of the conditional mean and conditional variance.


2.2 ARCH-type Volatility Modeling

The autoregressive conditional heteroscedasticity (ARCH) model provides the first systematic

framework to capture conditional heteroscedasticity. In his Nobel lecture, Engle (2004) describes

the genesis of the ARCH model. When practitioners implement various financial strategies, estimates

of volatility, which is widely known to be time varying, are often required. An approach

called historical volatility, which is estimated by the sample standard deviation of returns over a

short period, was, and remains, widely used. Although it is simple and gives us a first-hand

look at dynamic volatility in the past, the right estimation period is difficult to determine.

Besides, it is the volatility over a future period that should be considered the risk, hence a forecast

of volatility is needed as well as a measure for today. A theory of dynamic volatilities is therefore

needed. This is the role that is played by the ARCH model and its variations and extensions.

The original idea was to find a model that could assess the validity of the conjecture of Friedman

(1977) that the unpredictability of inflation was a primary cause of business cycles. It is really

the uncertainty due to this unpredictability, rather than the level of inflation, that would influence

the investors’ behavior. Engle and his colleague Granger, who shared the Nobel prize with him,

found that squared residuals were often autocorrelated even though the residuals themselves were

not. Engle proposed the ARCH model to explain this fact, which is frequently observed in

economic data. In essence, the ARCH model is a generalization of the

sample variance. Specifically, instead of using short or long sample standard deviations, the ARCH

model forecasts the conditional variance in terms of the weighted averages of the past squared

forecast errors. The weighting parameters could be estimated based on maximum likelihood using

historical data. Once the weights are determined, this dynamic volatility at any time could be

measured and forecasted. Engle proposes a regression model with errors following an ARCH process

to parameterize conditional heteroscedasticity in a wage-price equation for the United Kingdom.

2.2.1 The ARCH Model

The basic ideas of ARCH models are that (a) the shock of a return is serially uncorrelated, but

dependent, and (b) the dependence of the shock can be described by a simple quadratic function

of its lagged values 2. Specifically, an ARCH(p) model can be specified in terms of its first two

2 See Tsay (2005) for more details.


conditional moments. The mean equation is given by:

$$y_t = X_t \beta + u_t \tag{2.1}$$

$$(u_t \mid \mathcal{F}_{t-1}) \sim N(0, h_t^2) \tag{2.2}$$

where $\mathcal{F}_{t-1}$ represents the past information of the dependent variable. The ARCH conditional

variance takes the form:

$$h_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i u_{t-i}^2 \tag{2.3}$$

From the structure of the model, it is seen that if the squared shock at period t, $u_t^2$, is

large, then the forecast conditional variance for the next period, $h_{t+1}^2$, will be large. This means that,

under the ARCH framework, large shocks tend to be followed by large shocks, while small shocks

tend to be followed by small shocks. This feature is similar to the volatility clustering observed

in time series of returns. However, it is clear that the conditional variance depends only on the

magnitude of the lagged residuals and not on their signs. In other words, under an ARCH model, positive

and negative shocks of the same modulus are assumed to have the same effect on volatility, which is

highly unrealistic. We will come back to this issue later.
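
The clustering mechanism described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: it uses an ARCH(1) process with conditionally Normal shocks, and the parameter values, the sample size, the seed and the helper names `simulate_arch1` and `lag1_autocorr` are all chosen for illustration and do not come from the text. It compares the lag-1 sample autocorrelation of the shocks with that of the squared shocks.

```python
import math
import random


def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate u_t = h_t * z_t with h_t^2 = alpha0 + alpha1 * u_{t-1}^2, z_t ~ N(0, 1)."""
    rng = random.Random(seed)
    u_prev = 0.0
    shocks = []
    for _ in range(n):
        h2 = alpha0 + alpha1 * u_prev ** 2      # ARCH(1) conditional variance
        u_prev = math.sqrt(h2) * rng.gauss(0.0, 1.0)
        shocks.append(u_prev)
    return shocks


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Illustrative (hypothetical) parameter values.
u = simulate_arch1(n=20000, alpha0=1.0, alpha1=0.4)
rho_levels = lag1_autocorr(u)
rho_squares = lag1_autocorr([v * v for v in u])
print(f"lag-1 autocorrelation of u_t:   {rho_levels:.3f}")
print(f"lag-1 autocorrelation of u_t^2: {rho_squares:.3f}")
```

In such a simulation the shocks themselves should show almost no first-order autocorrelation, while the squared shocks should be clearly positively autocorrelated, which is the second order dependence that ARCH models are designed to capture.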

It is worth noting that the specification of ARCH volatility in Equation 2.3 requires some

parameter restrictions, which are in some sense quite unappealing. First of all, it is required that

$\alpha_0 > 0$ and $\alpha_i \geq 0$ for $i > 0$ to ensure the positivity of the conditional variance. Second,

$\sum_{i=1}^{p} \alpha_i < 1$ (which reduces to $\alpha_1 < 1$ for an ARCH(1) model) is required for weak stationarity.

Third, in some applications, it is important to guarantee the existence

of higher moments of $u_t$, and therefore the $\alpha_i$ must satisfy some additional constraints. For example,

suppose the volatility of an asset return can be described by an ARCH(1) process. To study

its tail behavior, we require that the fourth moment of $u_t$ be finite. It is well known that if $u_t$ is

conditionally Normal, then the fourth-order moment of $u_t$ is

$$E(u_t^4) = \frac{3\alpha_0^2 (1 + \alpha_1)}{(1 - \alpha_1)(1 - 3\alpha_1^2)} \tag{2.4}$$

Consequently, for $u_t$ to be fourth-order stationary, $\alpha_1$ must satisfy the further condition that $\alpha_1^2 < \frac{1}{3}$.

In sum, $\alpha_1^2$ for an ARCH(1) model must lie in the interval $[0, \frac{1}{3})$. Unfortunately, the constraints

become increasingly complicated for higher order ARCH models. In practice, the parameter

restrictions often hurt the reliability of ARCH models because they are generally non-testable.
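
To make the ARCH(1) restrictions concrete, the following minimal sketch evaluates the fourth moment in Equation 2.4 together with the implied kurtosis. The function name `arch1_moments` and the parameter values are illustrative assumptions. The unconditional variance $\alpha_0/(1-\alpha_1)$ used below is the standard ARCH(1) result and is not stated in the text above.

```python
def arch1_moments(alpha0, alpha1):
    """Unconditional variance, fourth moment and kurtosis of a conditionally Normal ARCH(1) shock."""
    if alpha0 <= 0 or alpha1 < 0:
        raise ValueError("positivity requires alpha0 > 0 and alpha1 >= 0")
    if alpha1 >= 1:
        raise ValueError("weak stationarity requires alpha1 < 1")
    if 3 * alpha1 ** 2 >= 1:
        raise ValueError("a finite fourth moment requires alpha1^2 < 1/3")
    variance = alpha0 / (1 - alpha1)   # standard ARCH(1) result (assumption, not in the text)
    fourth = 3 * alpha0 ** 2 * (1 + alpha1) / ((1 - alpha1) * (1 - 3 * alpha1 ** 2))  # Equation 2.4
    kurtosis = fourth / variance ** 2
    return variance, fourth, kurtosis


# Hypothetical values: alpha1^2 = 0.25 lies inside [0, 1/3).
variance, fourth, kurtosis = arch1_moments(alpha0=1.0, alpha1=0.5)
print(variance, fourth, kurtosis)   # 2.0 36.0 9.0
```

The kurtosis of 9 in this example exceeds the Normal value of 3, which shows how an ARCH process with conditionally Normal shocks still produces a thick-tailed unconditional distribution, and how quickly the tails thicken as $\alpha_1^2$ approaches $\frac{1}{3}$.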


2.2.2 The GARCH Model

Although the representation of the ARCH model is simple, it was found that a relatively long

lag structure in the conditional variance equation is often called for to capture the long memory

present in the volatility process of financial time series. Consequently, the parameter restrictions

imposed to avoid negative variance become complicated rather rapidly, not to mention the fixed

lag structures for ensuring the existence of higher moments.

To circumvent these problems, Bollerslev (1986) proposes a generalization of the ARCH(p)

process, which allows for both long memory and a more flexible lag structure. As Bollerslev

(1986) himself described it, “the extension of the ARCH process to the GARCH process bears much

resemblance to the extension of the standard time series AR process to the general ARMA process,

permits a more parsimonious description in many situations.” In particular, the GARCH model

includes past conditional variances in the current conditional variance equation. The form of

the conditional mean is the same as that of the ARCH model, while the GARCH formulation of the

conditional variance can be written as follows:

$$h_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i u_{t-i}^2 + \sum_{j=1}^{q} \beta_j h_{t-j}^2 \tag{2.5}$$

The $\alpha_i$ and $\beta_j$ are referred to as ARCH and GARCH parameters, respectively. The strengths

and weaknesses of GARCH models are similar to those of ARCH models. According to the

conditional variance equation, a large $u_{t-1}^2$ or $h_{t-1}^2$ gives rise to a large $h_t^2$. Hence, like an ARCH

model, a GARCH model can generate the well-known behavior of volatility clustering in financial

time series. However, the GARCH models also ignore the sign of the shocks and treat positive

and negative shocks symmetrically. Besides, although the GARCH models are more parsimonious

than ARCH models, complicated parameter restrictions cannot be avoided. Clearly, a sufficient

condition for the positivity of the conditional variance is $\alpha_0 > 0$, $\alpha_i \geq 0$ for $i = 1, ..., p$, and $\beta_j \geq 0$ for $j = 1, ..., q$.

Nelson and Cao (1992) provide the necessary and sufficient conditions for positivity of the

conditional variance in higher-order GARCH models, which are much more complicated. Besides,

it is well-known that the GARCH(p, q) process is weakly stationary if and only if $\sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) < 1$

(with $\alpha_i = 0$ for $i > p$ and $\beta_i = 0$ for $i > q$). Bollerslev (1986) shows that for a GARCH(1, 1) model, if $3\alpha_1^2 + 2\alpha_1\beta_1 + \beta_1^2 < 1$, the

fourth-order moment exists. Conditions for stationarity and the existence of moments of GARCH

processes have received a lot of attention in the literature; see, for instance, Ling and McAleer (2002). In

general, for higher order GARCH processes, these conditions cannot be easily extended and

become unrealistic to verify.
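
For the GARCH(1, 1) case, the restrictions listed above are simple enough to check directly. The sketch below is illustrative only: the function names `garch11_checks` and `garch11_variance_path`, the parameter values and the choice to start the recursion at the unconditional variance $\alpha_0/(1-\alpha_1-\beta_1)$ are assumptions made for the example.

```python
def garch11_checks(alpha0, alpha1, beta1):
    """Evaluate the GARCH(1,1) restrictions quoted in the text."""
    return {
        "positivity": alpha0 > 0 and alpha1 >= 0 and beta1 >= 0,
        "weak_stationarity": alpha1 + beta1 < 1,
        "finite_fourth_moment": 3 * alpha1 ** 2 + 2 * alpha1 * beta1 + beta1 ** 2 < 1,
    }


def garch11_variance_path(shocks, alpha0, alpha1, beta1):
    """Run the Equation 2.5 recursion with p = q = 1 over a sequence of shocks u_t."""
    h2 = alpha0 / (1 - alpha1 - beta1)   # starting value: unconditional variance (an assumption)
    path = [h2]
    for u in shocks:
        h2 = alpha0 + alpha1 * u ** 2 + beta1 * h2
        path.append(h2)
    return path


ok = garch11_checks(alpha0=0.1, alpha1=0.1, beta1=0.8)
heavy = garch11_checks(alpha0=0.1, alpha1=0.3, beta1=0.65)
path = garch11_variance_path([0.0, 3.0, 0.0, 0.0], alpha0=0.1, alpha1=0.1, beta1=0.8)
print(ok)
print(heavy)
print([round(v, 4) for v in path])   # roughly [1.0, 0.9, 1.72, 1.476, 1.2808]
```

The second parameter set is weakly stationary but fails the fourth moment condition, which illustrates that the restrictions are nested and have to be checked one by one. The variance path shows a single large shock raising the conditional variance, which then decays slowly at rate $\beta_1$.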

2.2.3 The IGARCH Model

An important special case of the GARCH model is the Integrated GARCH (IGARCH) model,

proposed by Engle and Bollerslev (1986). In applications, the

estimated sum of the ARCH and GARCH parameters often turns out to be very close to unity. This phenomenon provides an

empirical motivation for the development of the integrated GARCH(p, q) or IGARCH(p, q) model.

Engle and Bollerslev (1986) first extend a standard GARCH(1, 1) model to an IGARCH(1, 1) model

by imposing the restriction that $\alpha_1 + \beta_1 = 1$. Nelson (1990) studies some probabilistic properties

of the volatility process $h_t^2$ under an IGARCH model. Nelson showed that under mild conditions

for the standardized shock $z_t = u_t/h_t$ and assuming $\alpha_0 > 0$, the GARCH(1, 1) process is strictly stationary even if $\alpha_1 + \beta_1 > 1$,

as long as $E[\log(\beta_1 + \alpha_1 z_t^2)] < 0$, where $\alpha_1$ multiplies the squared shock as in Equation 2.5. Under certain conditions, the volatility process is strictly

stationary but not weakly stationary, because it does not have the first two moments. Naturally,

a GARCH(p, q) model could be extended to an IGARCH(p, q) model by imposing the restriction

$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j = 1$ on the conditional variance in Equation 2.5.
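
Nelson's strict stationarity condition can be explored numerically. The sketch below is a rough Monte Carlo approximation, assuming standard Normal $z_t$ and writing the condition in the notation of Equation 2.5, so that $\alpha_1$ multiplies the squared standardized shock and $\beta_1$ is the coefficient on the lagged variance. The function name `nelson_log_moment`, the number of draws, the seed and the parameter values are all illustrative assumptions.

```python
import math
import random


def nelson_log_moment(alpha1, beta1, draws=100000, seed=0):
    """Monte Carlo estimate of E[log(beta1 + alpha1 * z^2)] for z ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        total += math.log(beta1 + alpha1 * z * z)
    return total / draws


igarch = nelson_log_moment(alpha1=0.3, beta1=0.7)      # alpha1 + beta1 = 1
explosive = nelson_log_moment(alpha1=2.0, beta1=1.0)   # far outside the stationary region
print(f"IGARCH(1,1) case: {igarch:.4f}")
print(f"explosive case:   {explosive:.4f}")
```

For the IGARCH case the estimate should be negative, in line with Jensen's inequality: $E[\log(\beta_1 + \alpha_1 z_t^2)] < \log E(\beta_1 + \alpha_1 z_t^2) = \log 1 = 0$. This is why an IGARCH(1, 1) process can be strictly stationary even though it is not weakly stationary.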

2.2.4 The Student’s t GARCH Model

An attractive feature of the GARCH process is that it successfully captures several characteristics

of financial time series, such as thick tails. The fact has been well documented in the literature

that even though the conditional distribution of the innovations is normal, the unconditional

distribution has thicker tails than the normal one. However, the degree of leptokurtosis induced

by the GARCH process often does not capture all of the leptokurtosis exhibited by high frequency

speculative prices. There is a fair amount of evidence that the conditional distribution of ut is

non-normal. Bollerslev (1987) suggests a fat-tailed conditional distribution that might be superior

to the conditional normal. In particular, he replaced the assumption of conditional Normality of

the error with that of Conditional Student’s t distribution. This development permits a distinction

between conditional heteroscedasticity and a conditional leptokurtic distribution, either of which

could account for the observed unconditional kurtosis in the data. The distribution of the error

term according to Bollerslev (1987) takes the form:

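$$f(u_t \mid \mathcal{F}_{t-1}^{p}) = \frac{\Gamma[\frac{1}{2}(\nu + 1)]}{\Gamma(\frac{1}{2}\nu)\,[\pi(\nu - 2) h_t^2]^{\frac{1}{2}}} \times \left[1 + \frac{u_t^2}{(\nu - 2) h_t^2}\right]^{-\frac{1}{2}(\nu + 1)} \tag{2.6}$$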


McGuirk, Robertson and Spanos (1993) find that the above functional form suggested by

Bollerslev for the distribution of $u_t$ differs from the Student’s t conditional distribution:

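$$f(u_t \mid \mathcal{F}_{t-1}^{p}) = \frac{\Gamma[\frac{1}{2}(\nu + p + 1)]}{\Gamma(\frac{1}{2}(\nu + p))\,\pi^{\frac{1}{2}}} (\nu \sigma^2 h_t^2)^{-\frac{1}{2}} \times \left[1 + \frac{u_t^2}{\nu \sigma^2 h_t^2}\right]^{-\frac{1}{2}(\nu + p + 1)} \tag{2.7}$$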

One can derive Formula 2.6 by substituting $h_t^2$ into the marginal Student’s t distribution and rearranging

the scale parameter, while Formula 2.7 can be obtained from the form of the conditional

Student’s t distribution derived from the joint distribution of the observations. It can be seen that the

degrees of freedom parameter $\nu$ enters 2.7 separately in the gamma functions but as a product with

$\sigma^2$ in the other terms. Any attempt to estimate $\nu$, ignoring $\sigma^2$, will result in the estimation of an

inappropriate mixture of both $\nu$ and $\sigma^2$.
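
A quick numerical check can confirm how the scale in Formula 2.6 is arranged. The sketch below assumes that the density is the marginal Student's t rescaled so that its variance equals $h_t^2$, with exponent $-\frac{1}{2}(\nu + 1)$ on the bracketed term, as a proper density requires. The function names, the trapezoidal integrator, the integration range and the parameter values are illustrative assumptions.

```python
import math


def bollerslev_t_density(u, h2, nu):
    """Student's t density scaled so that its variance equals h2 (requires nu > 2)."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * (nu - 2) * h2))
    return math.exp(log_const - 0.5 * (nu + 1) * math.log1p(u * u / ((nu - 2) * h2)))


def integrate(f, lower, upper, steps):
    """Composite trapezoidal rule."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for k in range(1, steps):
        total += f(lower + k * width)
    return total * width


h2, nu = 2.5, 6.0   # hypothetical conditional variance and degrees of freedom
mass = integrate(lambda u: bollerslev_t_density(u, h2, nu), -200.0, 200.0, 40000)
second_moment = integrate(lambda u: u * u * bollerslev_t_density(u, h2, nu), -200.0, 200.0, 40000)
print(f"total probability: {mass:.5f}")
print(f"second moment:     {second_moment:.5f} (h2 = {h2})")
```

The total probability should be close to one and the second moment close to $h_t^2$. In this form $\nu$ only governs tail thickness while $h_t^2$ alone sets the variance, whereas in Formula 2.7 the scale involves the product $\nu \sigma^2 h_t^2$, which is the source of the identification issue noted above.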

2.2.5 The GARCH-in-Mean Model

Many economic theories predict that the return of a macroeconomic or financial variable also

depends on its second order conditional moment. A typical example is the linear or approximately

linear relationship between output growth and its uncertainty which has been quite frequently

studied in the applied macroeconometrics literature. Likewise, in finance, the return of a security

may be related to its volatility. To model such a phenomenon, another variant of the GARCH

model, the GARCH-in-Mean (GARCH-M) model by Engle et al. (1987), is often applied, where the

“M” stands for “in the mean”. In a GARCH-M model, the conditional variance is modeled

by the usual GARCH equation, while the conditional mean is specified as a function including the

risk premium. Most commonly, the functional form of the risk premium is assumed to be linear or

logarithmic in the conditional variance or standard deviation. A simple GARCH-M (1, 1) model

can be written as:

$$y_t = \mu + \delta h_t^2 + u_t, \quad u_t \sim N(0, h_t^2) \tag{2.8}$$

$$h_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1}^2 \tag{2.9}$$

The formulation of the GARCH-M model implies the existence of serial correlations in the

return series, which are induced by those in the volatility process. Therefore the relationship between

an economic return and its volatility described by the GARCH-M model offers an explanation

for the serial correlations exhibited in historical return data.

More recently, a number of studies discuss the choice of the functional form of the risk premium.

It has been argued that there is no strong evidence that the risk premium is linear or logarithmic in


the conditional variance or standard deviation, which is commonly assumed when one utilizes the

GARCH-M model. In response to this issue, semi-parametric methods are applied which do not

require restrictive assumptions about the functional form of the risk premium apart from certain

smoothness conditions. In many of these models, the conditional variance is parametric while the

conditional mean is nonparametric, see Linton and Perron (2003) for an example.

2.2.6 The EGARCH & TGARCH Model

As mentioned earlier, although the GARCH(p, q) model successfully captures some statistical

features of financial time series, such as thick tailed returns and volatility clustering, its structure

imposes important limitations. Many authors, such as Nelson (1991), criticize the GARCH

models for their inability to allow an asymmetric response to good news and bad news. Since the

specification of the conditional variance depends only on the magnitude of the lagged residuals while

ignoring their signs, the GARCH models inherently assume that positive and negative shocks have

the same effects on volatility. In practice this is unrealistic because financial or economic variables

often respond differently to positive and negative shocks, the so-called “leverage effect”.

In order to capture the asymmetry manifested by the data, a new class of models, in which positive

shocks and negative shocks have different predictability for volatility, was introduced. The

model that gained the most popularity is Nelson’s (1991) EGARCH model. In particular, to allow for

asymmetric effects between positive and negative asset returns, he considers the following form for

the weighted innovation

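$$g(\varepsilon_t) = \theta \varepsilon_t + \gamma [|\varepsilon_t| - E(|\varepsilon_t|)] \tag{2.10}$$

where $\theta$ and $\gamma$ are real constants and both $\varepsilon_t = u_t/h_t$ and $|\varepsilon_t| - E(|\varepsilon_t|)$ are zero-mean iid

sequences with continuous distributions. For the standard Gaussian random variable $\varepsilon_t$, $E(|\varepsilon_t|) = \sqrt{2/\pi}$. For the standardized Student-t distribution,

$$E(|\varepsilon_t|) = \frac{2\sqrt{\nu - 2}\,\Gamma((\nu + 1)/2)}{(\nu - 1)\,\Gamma(\nu/2)\sqrt{\pi}}$$

An EGARCH(p, q) model can be written as

$$\log h_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i g(\varepsilon_{t-i}) + \sum_{j=1}^{q} \beta_j \log(h_{t-j}^2) \tag{2.11}$$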

It is worth pointing out that the specification of the volatility in terms of its logarithmic


transformation ensures that the conditional variance is positive in contrast to the GARCH models,

which require additional restrictions on the parameters. The properties of the EGARCH model

have attracted much research lately. Nelson (1991) derives the existence conditions for moments of

the general infinite-order Exponential ARCH model. Straumann and Mikosch provide a sufficient

condition for the stationarity of the EGARCH model. The expressions for moments of the first-

order EGARCH process can be found in He, Terasvirta and Malmsten (2002).
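
The asymmetry built into the weighted innovation of Equation 2.10 can be seen in a few lines of code. This is a minimal sketch: the function names and the values of $\theta$ and $\gamma$ are illustrative assumptions, and the Student's t expectation is written with the Gamma function in the numerator, as in the standard expression for a t innovation standardized to unit variance.

```python
import math


def expected_abs_gaussian():
    """E|eps| for a standard Normal innovation."""
    return math.sqrt(2.0 / math.pi)


def expected_abs_student_t(nu):
    """E|eps| for a Student's t innovation standardized to unit variance (nu > 2)."""
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    return 2.0 * math.sqrt(nu - 2) * math.exp(log_ratio) / ((nu - 1) * math.sqrt(math.pi))


def weighted_innovation(eps, theta, gamma, expected_abs):
    """g(eps) = theta * eps + gamma * (|eps| - E|eps|)."""
    return theta * eps + gamma * (abs(eps) - expected_abs)


m = expected_abs_gaussian()
# Hypothetical parameters: a negative theta makes bad news raise log-volatility more.
good_news = weighted_innovation(+1.0, theta=-0.1, gamma=0.2, expected_abs=m)
bad_news = weighted_innovation(-1.0, theta=-0.1, gamma=0.2, expected_abs=m)
print(f"g(+1) = {good_news:.4f}, g(-1) = {bad_news:.4f}")
print(f"E|eps| Normal = {m:.4f}, Student's t (nu = 5) = {expected_abs_student_t(5.0):.4f}")
```

Shocks of equal size but opposite sign differ in their effect by $2\theta$ in absolute value, so the sign of $\theta$ determines which kind of news moves the log-variance more. No positivity constraint on the parameters is needed because the recursion is written for $\log h_t^2$.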

A similar approach to handling leverage effects on the conditional standard deviation was introduced

by Glosten, Jagannathan, and Runkle (1993) and Rabemananjara and Zakoian (1994). A

TGARCH(p,q) model assumes the form

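$$h_t^2 = \alpha_0 + \sum_{i=1}^{p} (\alpha_i + \gamma_i N_{t-i}) u_{t-i}^2 + \sum_{j=1}^{q} \beta_j h_{t-j}^2 \tag{2.12}$$

where $N_{t-i}$ is an indicator for negative $u_{t-i}$, that is

$$N_{t-i} = \begin{cases} 1 & \text{for } u_{t-i} < 0 \\ 0 & \text{for } u_{t-i} \geq 0 \end{cases}$$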

and $\alpha_0$, $\alpha_i$, $\gamma_i$ and $\beta_j$ are non-negative parameters satisfying conditions similar to those of GARCH

models. From this model, it can be seen that the asymmetry between positive and negative innovations is

modeled, in that a positive $u_{t-i}$ contributes $\alpha_i u_{t-i}^2$ to $h_t^2$ while a negative $u_{t-i}$ has a different

impact $(\alpha_i + \gamma_i) u_{t-i}^2$ on $h_t^2$, with $\gamma_i \neq 0$. This model uses zero as its threshold to separate the

impacts of past shocks. This is a case of particular interest in which the TGARCH model is linear

in parameters, while other threshold values can also be used.
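
A one-step version of the threshold recursion makes the asymmetry explicit. The sketch below takes p = q = 1 with the threshold at zero; the function name `tgarch11_variance` and all parameter values are hypothetical and chosen only for illustration.

```python
def tgarch11_variance(u_prev, h2_prev, alpha0, alpha1, gamma1, beta1):
    """One step of the TGARCH recursion with p = q = 1 and threshold zero."""
    negative = 1 if u_prev < 0 else 0   # the indicator N_{t-1}
    return alpha0 + (alpha1 + gamma1 * negative) * u_prev ** 2 + beta1 * h2_prev


params = dict(alpha0=0.05, alpha1=0.05, gamma1=0.10, beta1=0.85)
after_gain = tgarch11_variance(u_prev=+2.0, h2_prev=1.0, **params)
after_loss = tgarch11_variance(u_prev=-2.0, h2_prev=1.0, **params)
print(round(after_gain, 4), round(after_loss, 4))   # 1.1 1.5
```

The two shocks have the same modulus, yet the negative one raises next period's variance by an extra $\gamma_1 u_{t-1}^2 = 0.4$, which is the leverage effect the model is meant to capture.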

2.2.7 Multivariate GARCH Models

In macroeconomics and finance, many empirical studies

find that economic variables react to the same information and hence have non-zero

covariances conditional on the information set. Hence the study of the relations between the

volatilities and co-volatilities of several variables is important. These issues raise the question of

the specification of the dynamics of covariances or correlations, which can be studied by using a

multivariate model. Naturally, many authors have generalized univariate volatility models to the

multivariate case.

Due to the dominating popularity of the GARCH model in univariate volatility modeling, many


substantive multivariate volatility models are direct extensions of the univariate GARCH model.

In particular, the extension of univariate GARCH to multivariate GARCH(MGARCH) can be

considered as an analogy of the generalization of ARMA to Vector ARMA (VARMA) models,

which are developed to handle vector time series. In the econometrics literature there are three approaches

for constructing multivariate GARCH models. The first approach consists of direct generalizations of the

univariate GARCH model; related models include VEC, BEKK and factor models. The second

approach uses linear combinations of univariate GARCH models, such as (generalized) orthogonal

models and latent factor models. The third approach uses nonlinear combinations of univariate

GARCH models, which include dynamic conditional correlation (DCC) models, the general

dynamic covariance model and copula-based multivariate models.

In practice, three issues are especially important in the field of multivariate volatility modeling.

The first issue is parsimony. Since the number of parameters increases rapidly when it

comes to multivariate models, overparametrization could make the specification intractable. The

second issue is flexibility. A good model should be able to describe various types of dynamic co-movement

of multiple time series with few unsatisfactory restrictions. The third issue is positive

definiteness of the variance matrix. In order to ensure the positive definiteness of the conditional variance matrix, a

number of conditions on the parameters are required, which are generally difficult to guarantee.

Further discussions about multivariate volatility modeling are presented in Chapter 4.

2.2.8 Summary of The ARCH-type Volatility Models

To sum up, the extensive research on volatility modeling has largely concentrated on capturing

characteristics of data and choosing the most parsimonious and appropriate functional forms

for the conditional variance which stem from the conditional distributions. For this purpose, a

large number of studies in the literature have made great efforts to propose univariate and multivariate

volatility models, where the latter take the interaction of multiple time series into

consideration as well. Since the 1980s, the ARCH family models have formed the most popular way of parameterizing

volatility clustering, as well as dynamic and nonlinear dependence observed in return series

of financial and macroeconomic variables. During the last thirty years, a wealth of refinements and

extensions of the basic ARCH model has been motivated by a combination of factors including

new findings in volatility theories, increased availability of time series data, and fast development

in computer programming.

While the various ARCH-type models have been quite successful and obtained overwhelming


popularity in empirical studies, they have also raised a number of issues which need to be addressed.

Engle (2002) discusses the development and weaknesses of the ARCH model and its many

extensions, and identifies promising areas of new research. In particular, this work is focused

on three limitations of the ARCH type models, as discussed below.

First, the specification is ad hoc. Both the functional expression of the conditional variance

and the distribution assumptions for the associated marginal and conditional distributions are pre-

specified without verification. The ARCH type models are deliberately designed for the purpose

of capturing several important features of the volatility of financial return series. Although it

is remarkable that such simple specifications can describe almost any financial time series, they

depend heavily on some implicit assumptions that are either awkward or very hard to verify.

It has been proven that there is a contradiction in retaining Normality while allowing for

a heteroscedastic conditional variance. Even if the shocks are assumed to follow the Student’s

t distribution, the functional form of the conditional variance implies some implicit restrictions,

which are quite unappealing in two ways: (1) these restrictions are not obvious and therefore

easily overlooked; (2) these restrictions are complicated and hard to verify even for a simple

ARCH(1) model, and become increasingly unwieldy when one uses more complicated models (see

Section 2.4 for more details). Ignoring or failing to satisfy these implicit restrictions will lead to

misspecification, which could be very harmful to estimation and prediction.

Second, for any functional form of the conditional variance, we must ensure the conditional

variance is positive. Two strategies are commonly used in ARCH type models. First, a number of

parameter restrictions are imposed. These restrictions are generally inequalities on the parameters

in the variance equation. For example, for the ARCH(1) model, the conditional variance is

given by

$$h_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2$$

In order to make sure $h_t^2$ is positive, we require $\alpha_0 > 0$ and $\alpha_1 \geq 0$. When it extends to a

GARCH(1,1) model, the conditional variance is given by

$$h_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1}^2$$

and the parameter constraints become $\alpha_0 > 0$, $\alpha_1 \geq 0$ and $\beta_1 \geq 0$. Such inequality constraints are

sufficient conditions for the positivity of the conditional variance of ARCH type models. When the

order of the ARCH effects or GARCH effects increases, the constraints multiply rapidly


and become very complicated. When it comes to multivariate cases, the positive definiteness of the

variance matrix $H_t$ has to be ensured. The parameter constraints can easily make the multivariate

models intractable, which leads to the second strategy. One can impose additional assumptions

on the dynamic structure of the variance matrix. For example, it is difficult to guarantee the

positive definiteness of $H_t$ in a VEC model. Therefore a special case of the VEC model was proposed

by Baba, Engle, Kraft and Kroner and given the name BEKK model. In models with strong

assumptions like the BEKK model, positive definiteness is guaranteed by sacrificing generality. In

sum, the ARCH type models impose either parameter constraints or simplifying assumptions to

ensure a positive variance. However, the parameter constraints are often very complicated and the

assumptions could be too strong to be realistic.
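
The way a BEKK-type structure secures positive definiteness can be sketched for the bivariate first-order case. The recursion used below, $H_t = CC' + A'u_{t-1}u_{t-1}'A + B'H_{t-1}B$, is the commonly used BEKK(1,1) form and is not written out in the text above, so it should be read as an assumption of this example. The matrices, the shocks and the helper functions are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]


def add(a, b):
    """Elementwise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bekk11_step(u_prev, h_prev, c, a, b):
    """H_t = C C' + A' u u' A + B' H_{t-1} B for a bivariate BEKK(1,1)."""
    uu = [[ui * uj for uj in u_prev] for ui in u_prev]   # outer product u u'
    intercept = matmul(c, transpose(c))
    arch = matmul(matmul(transpose(a), uu), a)
    garch = matmul(matmul(transpose(b), h_prev), b)
    return add(add(intercept, arch), garch)


def is_positive_definite_2x2(h):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return h[0][0] > 0 and h[0][0] * h[1][1] - h[0][1] * h[1][0] > 0


C = [[0.3, 0.0], [0.1, 0.2]]     # lower triangular with nonzero diagonal
A = [[0.3, -0.4], [0.5, 0.2]]    # no sign restrictions on the entries
B = [[0.9, 0.1], [-0.2, 0.8]]
H = [[1.0, 0.3], [0.3, 1.0]]     # positive definite starting value
flags = []
for u in ([1.5, -2.0], [0.0, 0.0], [-3.0, 0.5]):
    H = bekk11_step(u, H, C, A, B)
    flags.append(is_positive_definite_2x2(H))
print(flags)   # [True, True, True]
```

Each term on the right-hand side is a quadratic form, so $H_t$ remains positive definite whatever the signs of the entries of A and B, provided C is nonsingular and the starting matrix is positive definite. The price is the one noted above: the quadratic structure restricts the dynamics relative to an unrestricted VEC model.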

Third, in ARCH type models, the conditional mean and conditional variance of the return series

are modeled separately by a mean equation and a variance equation. The unknown parameters

related to these two conditional moments are split into two disjoint parts. Even for the special

case of the GARCH-in-mean model, where the conditional mean of the return partially depends

on the conditional variance, no relationship between the two sets of parameters is considered.

Separately modeling the conditional mean and the conditional variance is problematic because they

are supposed to come from the same distribution. The parameters both in the mean equation and in

the variance equation should be closely related to the parameters in the conditional distribution

of the return series. Ignoring these important relationships could lead to misspecification and

misleading inference.

2.3 Probabilistic Reduction Approach

In this section I present an overview of the Probabilistic Reduction (PR) methodology; see

Spanos (1986). In the context of the PR approach, the primary goal is to model the actual data

generating process (DGP), that is, the source of the data used to inquire about the phenomenon

of interest. The adequacy of any account of model specification will be assessed by its potential

in allowing the modeler to learn about the actual DGP and the phenomenon of interest (see

McGuirk and Spanos 2001).

It is in agreement with common sense that both substantive and statistical information play

important roles in learning from data. In econometric modeling, information from both sources

is combined to constitute an appropriate model, and the PR approach provides a methodological

framework for doing that in a coherent way. Broadly speaking, statistical information refers to the


chance regularity patterns exhibited by the data when viewed as realizations of generic stochastic

processes, without any information pertaining to what they represent substantively (see Spanos

2010a).

In empirical modeling, it is never easy to fill the gap between actual DGP on one side and

the theory and statistical models on the other. Theories might suggest important features to be

investigated in the available observed data, and to build the bridge that connects them is one major

issue. Embedding the economic theory within an appropriate statistical model is a challenging

task since the concept of “appropriateness” is multidimensional. Specifically, a number of crucial

foundational problems need to be addressed: (a) is the statistical model relevant in probing the

theory? (b) are its assumptions satisfied by the data? (c) are its assumptions internally consistent?

and (d) does it facilitate learning from the data about the phenomenon of interest?

An important element of the PR approach is the formalization of Fisher’s notion of “the reduction

of data to a few numerical values”. From the PR perspective on empirical modeling, one may use

an iterative procedure involving four interrelated steps: (1) specification, (2)

misspecification testing, (3) respecification, and (4) identification. Specification refers to the actual

choice of the statistical model. This is achieved by imposing a set of reduction assumptions on the

joint distribution of the random variables. We often utilize information from graphical techniques

to make decisions about the probabilistic assumptions that are needed to capture the empirical

regularities in the observed data. Once the model is fully specified we then proceed to the second

stage of misspecification testing in order to formally test whether we made the right assumptions

in the first stage. If the model is misspecified, we proceed to the third stage, respecification. In

this stage, the information obtained through the tests in the second stage could be very helpful.

Respecification is generally conducted by imposing a different set of reduction assumptions. Mis-

specification testing and respecification are repeated until we finally reach a statistically adequate

model. The primary purpose of the first three stages is to provide the link between the available

observed data and the assumptions making up the model. In the last stage, identification provides

the final link between the statistically adequate model and the theoretical model. The discussion

below provides more details about some of these steps.

2.3.1 Specification

In the context of the PR approach specification refers to the choice of a statistical model

in the context of which the theoretical question of interest will be assessed (McGuirk, Spanos


2001). The modeler’s major objective is to find an appropriate statistical model that is able to

embed the substantive question of interest. The reliability of the inference reached on the basis

of the estimated statistical model largely depends on its ability to account adequately for

the probabilistic structure of the observed data, in other words, its statistical adequacy. On the

other hand, a misspecified statistical model provides misleading inferences and a weak link between

the empirical and the theoretical models.

The PR methodology allows the structure of the data to play an important role in specifying

plausible statistical models. A statistical model is defined in terms of a number of probabilistic

assumptions, which can be obtained from the joint distribution of all the observable random

variables involved: the set of observations has been chosen by some theory in conjunction with

what aspects of the phenomenon of interest are measurable. For many economic data sets, one may

obtain useful information by using graphical techniques such as scatter plots, P-P plots and Q-Q

plots to make decisions about the probabilistic assumptions needed to capture the empirical

regularities in the observed data. The statistical model is then derived from the joint distribution

by sequentially imposing a few reduction assumptions, which is an important component of the PR

approach. As shown in Spanos (1999), the probabilistic assumptions that aim at reflecting the

probabilistic features of the data can be classified into three broad categories: Distribution

(D), Dependence (M), Heterogeneity (H).

The Distribution category refers to the particular distribution that best describes the data studied.

Generally the properties of the first two moments are closely related to this assumption.

Commonly used assumptions in this category include the Normal distribution, the Student’s t distribution,

etc. The Dependence category refers to the nature of temporal dependence present in our

data set, that is, the relationship between the current and past behavior of the

data. Some examples are Markov(p), m-dependence and martingale dependence. The Heterogeneity category

relates to time-varying characteristics such as trends and seasonal patterns in the mean

and variance of the data. The simplest assumptions in this category are Identical Distribution

and Stationarity. One should introduce heterogeneity using appropriate and parsimonious

methods when necessary.

Thus, in the PR perspective, a statistical model can be viewed as a consistent set of assump-

tions relating to the observable random variables underlying the data. In fact, all possible types

of statistical models can be constructed by imposing different assumptions from these three cate-

gories. That is, any statistical model can be characterized by the underlying reduction assumptions


imposed. For example, for a Normal/Linear regression model, the reduction assumptions imposed on the process Zt = (yt,X′t)′ are (D) Normal, (M) Independent, and (H) Identically Distributed.

The reduction process of this model can be written as:

$$D(Z_1, \ldots, Z_T; \Phi) = \prod_{t=1}^{T} D(Z_t; \varphi) = \prod_{t=1}^{T} D(y_t \mid X_t; \varphi_1)\, D(X_t; \varphi_2) \tag{2.13}$$

An advantage of the PR approach is that the model is specified exclusively in terms of the observable random variables involved rather than the error term. Unlike the GARCH framework, the PR approach is not ad-hoc, since it provides a systematic framework for selecting the appropriate functional form, which is uniquely determined by the form of the joint distribution.

In addition, an important property of volatility models is that the mean and variance conditional on the information set come from the same joint distribution, and therefore should be modeled jointly. In the PR approach, the conditional distribution is obtained by definition: the conditional density function is the joint density function of all relevant variables divided by the joint density function of the variables in the information set, where the joint density function is derived from the reduction assumptions. Once we obtain the functional form of the conditional density, we can readily derive the conditional mean and conditional variance of the time series.
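
As a worked statement of this definition, the display below writes the conditional density as a ratio and then reads off the first two conditional moments. The symbols $f$ and $\mathcal{F}_{t-1}$ (the variables in the information set) are illustrative notation assumed for this sketch only; the specific notation used for the St-AR model is introduced in Chapter 3.

```latex
f(y_t \mid \mathcal{F}_{t-1}) = \frac{f(y_t, \mathcal{F}_{t-1})}{f(\mathcal{F}_{t-1})},
\qquad
E(y_t \mid \mathcal{F}_{t-1}) = \int y_t \, f(y_t \mid \mathcal{F}_{t-1}) \, dy_t,
\qquad
Var(y_t \mid \mathcal{F}_{t-1}) = \int \bigl(y_t - E(y_t \mid \mathcal{F}_{t-1})\bigr)^2 f(y_t \mid \mathcal{F}_{t-1}) \, dy_t
```

Both moments are integrals against the same conditional density, which is why the parameters of the conditional mean and the conditional variance are linked through the joint distribution.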

2.3.2 Misspecification Testing and Respecification

Once the model is fully specified, we proceed to the stage of misspecification testing. Verifying that the underlying model assumptions are adequate for the data being analyzed is important, but it often fails to receive sufficient attention in empirical work. Misspecification testing helps in specifying and validating statistical models, and it provides important information on how to proceed when violations of the statistical assumptions are detected. Misspecification testing is viewed as testing without, that is, outside the boundaries of, the postulated model (see Spanos, 1999). As a result, the p-values are interpreted as providing evidence against the null hypothesis, in view of the observed data. Many different forms of misspecification tests have been proposed. For example, McGuirk, Driscoll, Alwang, and Huang (1995) illustrate a misspecification testing strategy in which one checks all testable statistical assumptions underlying a model using a battery of individual and joint misspecification tests. Most of these misspecification tests are based on auxiliary regressions with the standardized (weighted) estimated residuals of the maintained model (see Spanos 2002). Without misspecification testing, followed by respecification when necessary, the statistical model will be at best suspect and at worst will lead to misleading inference.

It is important to distinguish misspecification testing from model selection procedures such as the Akaike Information Criterion (AIC) and its various extensions. Spanos (2010) argues that these model selection procedures invariably give rise to unreliable inferences, primarily because they choose within a prespecified family of models, an approach that (a) assumes away the problem of model validation, and (b) ignores the relevant error probabilities.

Respecification refers to the choice of an alternative statistical model when the original underlying statistical assumptions are found to be inappropriate for the data under study. The results of misspecification testing offer important information for choosing respecification strategies. The iterative procedure of misspecification testing and respecification should be repeated until a statistically adequate model is obtained.

2.3.3 Generalized Procedure of PR Approach

I now outline the general modeling procedure from the perspective of the PR approach, taking a univariate volatility model as an example.

1. Specification: Choose the volatility model. One can start with a simple guess, for example an AR model or an ARMA model. At this stage, we do not need to pay too much attention to obtaining exact parameter estimates. First, several models with slightly different parameters might appear different but are actually very similar. More importantly, we are dealing with real world data rather than generated data, so a first attempt rarely captures adequately all the chance regularities in the data. The original model may be severely misspecified regardless of the values of its parameters. We have plenty of opportunities to modify the model and improve it. Therefore, obtaining precise results for the initial model may be pointless.

2. Estimation and model evaluation: Estimate the specified model and check the appropriateness of the underlying statistical assumptions. In this paper, we apply a series of misspecification tests to detect departures from the distribution, dependence and homogeneity assumptions. In general, a model is evaluated according to the results of the misspecification tests and the statistical significance of the estimated coefficients. The former indicates whether statistical adequacy has been obtained and, if it has, the latter determines the model’s performance in inference, including forecasting.

3. Respecification: When any form of misspecification is detected, the model is respecified with a view to accounting for the statistical information that the original model did not capture. The results of misspecification testing offer useful guidelines for respecification strategies. Two practical points are worth making. First, when the original model is misspecified, a modification does not necessarily lead to a correctly specified model. In general, several candidate models are possible and one needs to choose the most appropriate one with which to proceed. Second, when the number of parameters to estimate is large, it is difficult to find a model that guarantees statistical significance for all of them. To an acceptable degree, one can allow for insignificant coefficients of minor interest. In this situation, one could consider the forecasting performance.
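
The three steps above can be sketched as a loop. The sketch below is only an illustration under strong assumptions: the initial specification is an AR(p) model estimated by OLS, the only misspecification test is a Lagrange-multiplier style auxiliary regression of the residuals on their own first lag (checking for leftover dependence), and the only respecification move is to increase p by one. The function names and the 5% chi-square critical value are choices made here; the procedure described in the text uses a full battery of tests for distribution, dependence and heterogeneity.

```python
import numpy as np

# 5% critical value of a chi-square with 1 degree of freedom (assumed test size).
CHI2_1_CRIT_5PCT = 3.841


def lagged_design(y, p):
    """Return (target, X) for an AR(p) regression with an intercept."""
    T = len(y)
    cols = [np.ones(T - p)] + [y[p - i:T - i] for i in range(1, p + 1)]
    return y[p:], np.column_stack(cols)


def ols_residuals(target, X):
    """Residuals from regressing target on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef


def residual_dependence_stat(y, p):
    """n * R^2 from regressing AR(p) residuals on the regressors and u_{t-1}."""
    target, X = lagged_design(y, p)
    u = ols_residuals(target, X)
    aux_X = np.column_stack([X[1:], u[:-1]])
    dep = u[1:]
    e = ols_residuals(dep, aux_X)
    r2 = 1.0 - np.sum(e ** 2) / np.sum((dep - dep.mean()) ** 2)
    return len(dep) * r2


def respecify_until_adequate(y, max_p=6):
    """Specify, test, respecify: raise p until the dependence test passes."""
    stat = float("nan")
    for p in range(1, max_p + 1):
        stat = residual_dependence_stat(y, p)
        if stat < CHI2_1_CRIT_5PCT:
            return p, stat
    return None, stat
```

In an actual application, a rejection would guide the choice of the next model (for instance, moving from Normal to Student’s t, or adding a trend) rather than mechanically adding a lag.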

The PR approach has been successfully used in a number of empirical works. McGuirk, Robertson and Spanos (1993) illustrate the appropriateness of the Student’s t Autoregressive model with dynamic heteroskedasticity (St-AR) for modeling non-linear dependence and leptokurtosis in exchange rate data. The estimated St-AR models are shown to statistically dominate alternative ARCH-type formulations and suggest that volatility predictions are not necessarily as large or as variable as other models indicate. Spanos (1994) extends the well-known Normal/linear/homoskedastic models to a family of non-normal/linear/heteroskedastic models. The non-normality is kept within the bounds of the elliptically symmetric family of multivariate distributions (in particular the Student’s t distribution), which leads to several forms of heteroskedasticity, including quadratic and exponential functions of the conditioning variables. More recently, Heracleous and Spanos (2006) use the Student’s t dynamic linear regression (St-DLR) model as an alternative to the various extensions of the ARCH-type volatility model, using Dow Jones data and the three-month T-bill rate. This model is shown to incorporate exogenous variables in the conditional variance in a natural way and to address several limitations of ARCH-type models. Finally, Spanos (2011) proposes a recasting of the statistical foundations of panel data models using the Probabilistic Reduction perspective, in which statistical models are viewed as parameterizations of the observable stochastic (vector) process underlying the data. Using the PR perspective, several statistical models for panel data are given a complete list of assumptions in terms of the probabilistic structure of the observable processes underlying the data. These specifications bring out certain weaknesses in the probabilistic structure of current panel data models, including the inefficient way such models

account for the heterogeneity (individual or time) and/or dependence in panel data.

2.4 Appendix

2.4.1 Implicit Restrictions for The ARCH Models

Consider an ARCH(1) model, in which the conditional variance can be written as the sum of a constant term α0 and a multiple of the squared residual from the previous period. Suppose the mean of the return follows an AR(1) process. The mean equation is given by

$$y_t = \beta_0 + \beta_1 y_{t-1} + a_t \tag{2.14}$$

and the variance equation is given by

$$h_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 \tag{2.15}$$

Taking the expectation of both sides of the mean equation, we obtain

$$E(y_t) = \beta_0 + \beta_1 E(y_{t-1}) \tag{2.16}$$

Denoting $\mu = E(y_t) = E(y_{t-1})$, we have

$$\beta_0 = (1 - \beta_1)\mu \tag{2.17}$$

Therefore, the shock at time t − 1 can be rewritten as

$$a_{t-1} = y_{t-1} - \beta_0 - \beta_1 y_{t-2} = (y_{t-1} - \mu) - \beta_1 (y_{t-2} - \mu) = \tilde{y}_{t-1} - \beta_1 \tilde{y}_{t-2} \tag{2.18}$$

where

$$\tilde{y}_t = y_t - \mu$$

So the variance equation in the ARCH(1) model can be rewritten as

$$h_t^2(\text{ARCH}) = \alpha_0 + \alpha_1 \tilde{y}_{t-1}^2 + \alpha_1 \beta_1^2 \tilde{y}_{t-2}^2 - 2\alpha_1 \beta_1 \tilde{y}_{t-1} \tilde{y}_{t-2} \tag{2.19}$$

It can be seen in Chapter 3 that the variance equation in the St-AR(2,2) model takes the form

$$h_t^2(\text{St-AR}) = C\,(1 + \delta_{11} \tilde{y}_{t-1}^2 + \delta_{22} \tilde{y}_{t-2}^2 + 2\delta_{12} \tilde{y}_{t-1} \tilde{y}_{t-2}) \tag{2.20}$$

where C is a constant term and the δs are unknown parameters to be estimated. Comparing the specifications of $h_t^2(\text{ARCH})$ and $h_t^2(\text{St-AR})$, we find the following proportional relationships between the two sets of parameters.

ARCH(1)                     St-AR(2,2)
$\alpha_1$                  $\delta_{11}$
$\alpha_1 \beta_1^2$        $\delta_{22}$
$-2\alpha_1 \beta_1$        $2\delta_{12}$

It is easy to see that the ARCH(1) model is a special case of the St-AR(2,2) model subject to the parameter restriction:

$$\delta_{11} \delta_{22} = \delta_{12}^2$$
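
The mapping between the ARCH(1) parameters and the quadratic-form coefficients can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the proportionality constant between the two parameter sets is set to one, and the inputs `y1`, `y2` stand for the demeaned values of the series at t − 1 and t − 2.

```python
import math


def arch1_implied_deltas(alpha1, beta1):
    """Quadratic-form coefficients implied by ARCH(1) with an AR(1) mean.

    Returns (d11, d22, d12), taking the proportionality constant as one.
    """
    d11 = alpha1
    d22 = alpha1 * beta1 ** 2
    d12 = -alpha1 * beta1
    return d11, d22, d12


def arch1_variance_direct(alpha0, alpha1, beta1, y1, y2):
    """Conditional variance computed from the lagged shock a_{t-1}."""
    a_prev = y1 - beta1 * y2
    return alpha0 + alpha1 * a_prev ** 2


def arch1_variance_quadratic(alpha0, alpha1, beta1, y1, y2):
    """The same variance written as a quadratic form in the lagged values."""
    d11, d22, d12 = arch1_implied_deltas(alpha1, beta1)
    return alpha0 + d11 * y1 ** 2 + d22 * y2 ** 2 + 2 * d12 * y1 * y2


def arch1_restriction_holds(d11, d22, d12):
    """Check the implicit restriction d11 * d22 = d12^2."""
    return math.isclose(d11 * d22, d12 ** 2, rel_tol=1e-9, abs_tol=1e-12)
```

A freely chosen triple such as (1.0, 1.0, 0.2) fails the check, which is the sense in which the ARCH(1) form is a restricted special case.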

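Similarly, for an ARCH(2) model, the variance equation is given by

$$h_t^2(\text{ARCH}) = \alpha_0 + \alpha_1 \tilde{y}_{t-1}^2 + (\alpha_2 + \alpha_1 \beta_1^2)\, \tilde{y}_{t-2}^2 + \alpha_2 \beta_1^2 \tilde{y}_{t-3}^2 - 2\alpha_1 \beta_1 \tilde{y}_{t-1} \tilde{y}_{t-2} - 2\alpha_2 \beta_1 \tilde{y}_{t-2} \tilde{y}_{t-3} \tag{2.21}$$

where $\tilde{y}_t = y_t - \mu$ denotes the demeaned series. By matching this equation with that of a St-AR(3,3) model, defined as

$$h_t^2(\text{St-AR}) = C\,(1 + \delta_{11} \tilde{y}_{t-1}^2 + \delta_{22} \tilde{y}_{t-2}^2 + \delta_{33} \tilde{y}_{t-3}^2 + 2\delta_{12} \tilde{y}_{t-1} \tilde{y}_{t-2} + 2\delta_{13} \tilde{y}_{t-1} \tilde{y}_{t-3} + 2\delta_{23} \tilde{y}_{t-2} \tilde{y}_{t-3}) \tag{2.22}$$

we find that the two sets of parameters have the following proportional relationships.

ARCH(2)                             St-AR(3,3)
$\alpha_1$                          $\delta_{11}$
$\alpha_2 + \alpha_1 \beta_1^2$     $\delta_{22}$
$\alpha_2 \beta_1^2$                $\delta_{33}$
$-2\alpha_1 \beta_1$                $2\delta_{12}$
$0$                                 $2\delta_{13}$
$-2\alpha_2 \beta_1$                $2\delta_{23}$

From these relationships, we find that the ARCH(2) model is a special case of the St-AR(3,3) model with the following implicit restrictions:

$$\delta_{11} \delta_{33} = \delta_{12} \delta_{23}$$

$$\delta_{11} \delta_{22} \delta_{33} = \delta_{11} \delta_{23}^2 + \delta_{33} \delta_{12}^2$$

$$\delta_{13} = 0$$

Therefore we have at least three parameter restrictions for a simple ARCH(2) model. It is well known that the GARCH(1,1) model can be regarded as an ARCH model of infinite order, so the implicit restrictions for GARCH models will be much more complicated. In practice, these kinds of restrictions are hard to verify, and ignoring them leads to misspecification that undermines the reliability of inference.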
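
The three ARCH(2) restrictions can be verified in the same numerical spirit. In the sketch below the function names and dictionary keys are hypothetical, and the proportionality constant is again set to one; this is harmless because each restriction has the same degree in the δs on both sides, so a common scale factor cancels.

```python
import math


def arch2_implied_deltas(alpha1, alpha2, beta1):
    """St-AR(3,3) quadratic-form coefficients implied by ARCH(2), AR(1) mean."""
    return {
        "d11": alpha1,
        "d22": alpha2 + alpha1 * beta1 ** 2,
        "d33": alpha2 * beta1 ** 2,
        "d12": -alpha1 * beta1,
        "d13": 0.0,
        "d23": -alpha2 * beta1,
    }


def arch2_restrictions_hold(d, tol=1e-12):
    """Check the three implicit restrictions listed in the text."""
    r1 = math.isclose(d["d11"] * d["d33"], d["d12"] * d["d23"], abs_tol=tol)
    r2 = math.isclose(
        d["d11"] * d["d22"] * d["d33"],
        d["d11"] * d["d23"] ** 2 + d["d33"] * d["d12"] ** 2,
        abs_tol=tol,
    )
    r3 = d["d13"] == 0.0
    return r1 and r2 and r3
```

An unrestricted set of six δs will generally violate at least one of the three conditions, which illustrates how much structure the ARCH(2) form imposes.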

2.4.2 Implicit Restrictions for The Multivariate ARCH Models

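To ease the notational burden, consider a bivariate case:

$$y_{1,t} = \beta_{10} + \beta_{11} y_{1,t-1} + \beta_{12} y_{2,t-1} + u_{1,t} \tag{2.23}$$

$$y_{2,t} = \beta_{20} + \beta_{21} y_{1,t-1} + \beta_{22} y_{2,t-1} + u_{2,t} \tag{2.24}$$

The VEC-ARCH model, which is a simplified version of the VEC-GARCH model, describes the conditional variance-covariance matrix as follows:

$$\text{vech}(H_t) = \begin{pmatrix} h_{11,t} \\ h_{21,t} \\ h_{22,t} \end{pmatrix} = \begin{pmatrix} c_{11} \\ c_{21} \\ c_{22} \end{pmatrix} + \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix} \begin{pmatrix} u_{1,t-1}^2 \\ u_{1,t-1} u_{2,t-1} \\ u_{2,t-1}^2 \end{pmatrix} \tag{2.25}$$

where vech denotes the half-vectorization operator, which stacks the lower-triangular elements of a symmetric matrix into a vector. We can write the error terms as

$$u_{1,t-1} = y_{1,t-1} - \beta_{10} - \beta_{11} y_{1,t-2} - \beta_{12} y_{2,t-2} = \tilde{y}_{1,t-1} - \beta_{11} \tilde{y}_{1,t-2} - \beta_{12} \tilde{y}_{2,t-2} \tag{2.26}$$

$$u_{2,t-1} = y_{2,t-1} - \beta_{20} - \beta_{21} y_{1,t-2} - \beta_{22} y_{2,t-2} = \tilde{y}_{2,t-1} - \beta_{21} \tilde{y}_{1,t-2} - \beta_{22} \tilde{y}_{2,t-2} \tag{2.27}$$

where

$$\tilde{y}_{i,t} = y_{i,t} - \mu_i \quad \text{for } i = 1, 2$$

Let us focus on the conditional variance of $y_{1,t}$:

$$h_{11,t}(\text{V-ARCH}) = c_{11} + \alpha_{11} u_{1,t-1}^2 + \alpha_{12} u_{1,t-1} u_{2,t-1} + \alpha_{13} u_{2,t-1}^2 \tag{2.28}$$

Substituting Equations 2.26 and 2.27 into Equation 2.28 yields a long expression. For simplicity, I list each term with its coefficient:

Terms                                        Coefficients
$(\tilde{y}_{1,t-1})^2$                      $\alpha_{11}$
$(\tilde{y}_{1,t-2})^2$                      $\alpha_{11}\beta_{11}^2 + \alpha_{12}\beta_{11}\beta_{21} + \alpha_{13}\beta_{21}^2$
$(\tilde{y}_{2,t-1})^2$                      $\alpha_{13}$
$(\tilde{y}_{2,t-2})^2$                      $\alpha_{11}\beta_{12}^2 + \alpha_{12}\beta_{12}\beta_{22} + \alpha_{13}\beta_{22}^2$
$\tilde{y}_{1,t-1}\tilde{y}_{1,t-2}$         $-2\alpha_{11}\beta_{11} - \alpha_{12}\beta_{21}$
$\tilde{y}_{1,t-1}\tilde{y}_{2,t-1}$         $\alpha_{12}$
$\tilde{y}_{1,t-1}\tilde{y}_{2,t-2}$         $-2\alpha_{11}\beta_{12} - \alpha_{12}\beta_{22}$
$\tilde{y}_{1,t-2}\tilde{y}_{2,t-1}$         $-\alpha_{12}\beta_{11} - 2\alpha_{13}\beta_{21}$
$\tilde{y}_{1,t-2}\tilde{y}_{2,t-2}$         $2\alpha_{11}\beta_{11}\beta_{12} + \alpha_{12}\beta_{11}\beta_{22} + \alpha_{12}\beta_{12}\beta_{21} + 2\alpha_{13}\beta_{21}\beta_{22}$
$\tilde{y}_{2,t-1}\tilde{y}_{2,t-2}$         $-\alpha_{12}\beta_{12} - 2\alpha_{13}\beta_{22}$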
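
The coefficients listed above can be reproduced mechanically by writing the lagged errors as a linear map of the four lagged (demeaned) values and expanding the quadratic form. The sketch below assumes NumPy; the function names, the stacking order of the vector `z` and the matrix names `A`, `J`, `M` are choices made for this illustration and do not appear in the text.

```python
import numpy as np


def h11_direct(c11, a, B, y_lag1, y_lag2):
    """h11 from the lagged errors, as in the first row of the VEC-ARCH system.

    a = (alpha11, alpha12, alpha13); B is the 2x2 matrix of AR coefficients
    [[beta11, beta12], [beta21, beta22]]; y_lag1 and y_lag2 hold the demeaned
    values of (y1, y2) at t-1 and t-2.
    """
    u = y_lag1 - B @ y_lag2
    return c11 + a[0] * u[0] ** 2 + a[1] * u[0] * u[1] + a[2] * u[1] ** 2


def h11_quadratic_matrix(a, B):
    """Symmetric 4x4 matrix M with h11 = c11 + z' M z.

    z stacks the demeaned values as (y1[t-1], y2[t-1], y1[t-2], y2[t-2]).
    """
    A = np.array([[a[0], a[1] / 2.0], [a[1] / 2.0, a[2]]])
    J = np.hstack([np.eye(2), -B])  # lagged errors are u = J z
    return J.T @ A @ J
```

Diagonal entries of `M` are the coefficients on the squared terms, and twice an off-diagonal entry is the coefficient on the corresponding cross product, so each row of the list can be compared with one entry of `M`.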

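In the St-VAR(2,2) model (see Chapter 4 for more details), the corresponding element of the conditional variance-covariance matrix takes the form

$$h_{11,t}(\text{St-VAR}) = C_1 + C_2 \times \begin{pmatrix} \tilde{y}_{1,t-1} \\ \tilde{y}_{2,t-1} \\ \tilde{y}_{1,t-2} \\ \tilde{y}_{2,t-2} \end{pmatrix}' \begin{pmatrix} \delta_{11} & \delta_{21} & \gamma_{11} & \gamma_{21} \\ \delta_{21} & \delta_{22} & \gamma_{12} & \gamma_{22} \\ \gamma_{11} & \gamma_{12} & \lambda_{11} & \lambda_{12} \\ \gamma_{21} & \gamma_{22} & \lambda_{21} & \lambda_{22} \end{pmatrix} \begin{pmatrix} \tilde{y}_{1,t-1} \\ \tilde{y}_{2,t-1} \\ \tilde{y}_{1,t-2} \\ \tilde{y}_{2,t-2} \end{pmatrix} \tag{2.29}$$

where $\tilde{y}_{i,t} = y_{i,t} - \mu_i$. Matching the results in Equation 2.28 and Equation 2.29 yields the following proportional relationships between the two sets of parameters.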

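VEC-ARCH(1)                                                                                                                  St-VAR(2,2)
$\alpha_{11}$                                                                                                                $\delta_{11}$
$\alpha_{11}\beta_{11}^2 + \alpha_{12}\beta_{11}\beta_{21} + \alpha_{13}\beta_{21}^2$                                       $\lambda_{11}$
$\alpha_{13}$                                                                                                                $\delta_{22}$
$\alpha_{11}\beta_{12}^2 + \alpha_{12}\beta_{12}\beta_{22} + \alpha_{13}\beta_{22}^2$                                       $\lambda_{22}$
$-2\alpha_{11}\beta_{11} - \alpha_{12}\beta_{21}$                                                                            $2\gamma_{11}$
$\alpha_{12}$                                                                                                                $2\delta_{21}$
$-2\alpha_{11}\beta_{12} - \alpha_{12}\beta_{22}$                                                                            $2\gamma_{21}$
$-\alpha_{12}\beta_{11} - 2\alpha_{13}\beta_{21}$                                                                            $2\gamma_{12}$
$2\alpha_{11}\beta_{11}\beta_{12} + \alpha_{12}\beta_{11}\beta_{22} + \alpha_{12}\beta_{12}\beta_{21} + 2\alpha_{13}\beta_{21}\beta_{22}$     $2\lambda_{21}$
$-\alpha_{12}\beta_{12} - 2\alpha_{13}\beta_{22}$                                                                            $2\gamma_{22}$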

It can be shown that the VEC-ARCH model carries implicit restrictions. The proportional relationships presented above imply the following simultaneous equations:

$$-2\delta_{11}\beta_{11} - 2\delta_{21}\beta_{21} = 2C\gamma_{11} \tag{2.30}$$

$$-2\delta_{11}\beta_{12} - 2\delta_{21}\beta_{22} = 2C\gamma_{21} \tag{2.31}$$

$$-2\delta_{21}\beta_{11} - 2\delta_{22}\beta_{21} = 2C\gamma_{12} \tag{2.32}$$

$$-2\delta_{21}\beta_{12} - 2\delta_{22}\beta_{22} = 2C\gamma_{22} \tag{2.33}$$

where the βs are treated as unknowns. Since the number of equations equals the number of unknowns, we can write β11, β12, β21, β22 as functions of δ11, δ21, δ22, γ11, γ12, γ21, γ22, denoted by

$$\beta_{ij} = f_{ij}(\delta, \gamma) \quad \text{for } i, j = 1, 2$$

where δ = (δ11, δ21, δ22) and γ = (γ11, γ21, γ12, γ22). Substituting the solutions for βij into the equations related to the λs yields a number of implicit restrictions. For example,

$$\lambda_{11} = \alpha_{11}\beta_{11}^2 + \alpha_{12}\beta_{11}\beta_{21} + \alpha_{13}\beta_{21}^2$$

can be written as

$$\lambda_{11} = \delta_{11} f_{11}(\delta, \gamma)^2 + 2\delta_{21} f_{11}(\delta, \gamma) f_{21}(\delta, \gamma) + \delta_{22} f_{21}(\delta, \gamma)^2$$


Similar restrictions related to λ21 and λ22 can be derived. Note that these restrictions come from the conditional variance of y1,t alone. Comparing the expressions for h21,t and h22,t in the VEC-ARCH(1) model and the St-VAR(2,2) model leads to analogous results. Therefore, the VEC-ARCH model is actually a special case of the St-VAR(2,2) model with a number of restrictions that are neither obvious nor simple. These restrictions become intractable in three ways: (1) as the order of the ARCH effects increases, as in a high-order VEC-ARCH model or a VEC-GARCH model; (2) as the number of lags in the mean equation increases; and (3) as the number of series increases. Strong assumptions such as diagonality can simplify the restrictions but do not reduce their number. For other multivariate ARCH-type models, the implicit restrictions are also worth careful study.

Chapter 3

Student’s t Family of Univariate

Volatility Models

3.1 Introduction

In this chapter, I discuss univariate volatility modeling by introducing econometric models, alternative to the ARCH-type models, that are appropriate for certain speculative price data. The discussion focuses primarily on models stemming from the Probabilistic Reduction (PR) approach. These models are referred to as univariate volatility models. Sometimes exogenous variables or trend and seasonal patterns may also affect the conditional mean and conditional variance. It is well known that financial time series often move together over time, and that their volatilities involve not only series-specific attributes but also correlations among variables. These issues lead to multivariate volatility modeling and are considered in the next chapter.

Linear time series analysis provides a natural framework for studying the dynamic structure of a time series. In general, the most important characteristics of a time series are related to the first two moments of the conditional distributions of the observable stochastic process underlying the data. The conditional mean of a time series relates to its stationarity, dynamic dependence and trend properties, while the conditional variance relates to its variability. A number of econometric models and techniques are well documented for studying important departures from white noise, such as unit-root nonstationarity, autocorrelation, heterogeneity (trends), seasonality and heteroskedasticity.

It has been found in many empirical works that the volatility of returns fluctuates over time. A number of studies have documented that speculative price returns, such as stock price returns and foreign exchange returns, exhibit important “stylized facts” such as volatility clustering and leptokurtic marginal distributions. Although observations in these series are uncorrelated or nearly uncorrelated, they are in fact not independent, because the series contain higher order dependence. The most popular way of parameterizing this dependence is through models based on the Autoregressive Conditional Heteroscedasticity (ARCH) formulation proposed by Engle (1982). Weaknesses of the original ARCH model and the statistical features exhibited by return data have motivated quite a few extensions of the ARCH model with alternative functional forms or non-normal error distributions. Quintessential examples of ARCH-type models include Bollerslev’s (1986) GARCH model, Bollerslev’s (1987) Student’s t GARCH model and Nelson’s (1991) EGARCH model.

Although the ARCH-type models are very useful in time series analysis and have proven quite successful in the literature, they have some potential problems. In particular, this chapter focuses on three limitations invariably observed in the existing volatility modeling literature:

(1) ad-hoc specification,

(2) unwarranted parameter restrictions, and

(3) neglect of the interrelationship between the first two conditional moments.

To address these limitations, I introduce a new family of parametric models. These models are based on several reduction assumptions and on the statistical nature of the joint distribution of the observable random variables. The well-documented fat tails and leptokurtosis exhibited by speculative price data suggest replacing the Normality assumption with another distribution, such as the multivariate Student’s t. The “short memory” of many time series makes the assumption of a Markov(p) process reasonable, which considerably simplifies the specification and parametrization. Any heterogeneity exhibited by the data is modeled to ensure that the parameters are constant over time.

The remainder of the chapter is organized as follows. The Student’s t family of econometric models is described in the next section. Section 3.3 discusses the empirical performance of the Student’s t Autoregressive (St-AR) model and the Heterogeneous Student’s t Autoregressive (H-St-AR) model using several real world speculative price data sets. Through these applications, I present the complete procedure of specification, misspecification testing and respecification. Section 3.4 summarizes the results and implications.

3.2 Student’s t Family Univariate Volatility Models

A large number of studies have found that financial time series data commonly exhibit two characteristics. First, the empirical distribution of returns appears to be bell-shaped, symmetric and leptokurtic. Second, there are volatility clusters (second order dependence). These findings have given rise to a number of parametric conditional heteroskedastic models, which extend homoskedastic linear time series models by introducing a volatility equation. Among these, the Autoregressive Conditional Heteroscedastic (ARCH) family of models is the leading systematic framework in the volatility literature. Despite their popularity, some limitations reduce the usefulness of the ARCH models in financial analysis. In this section, I present the Student’s t family of volatility models, which inherently overcome some of these limitations. These models have advantages over traditional ones because they are not ad-hoc and they allow the underlying characteristics of the observed data to play an important role in specifying the statistical model; in addition, the related misspecification testing and respecification strategies are well designed and easy to apply when violations are detected. In the Student’s t family of models, the joint distribution of the variables is multivariate Student’s t, and the conditional heteroskedasticity of a given series arises from three possible sources: the past behavior of the variable itself, the behavior of other variables (see Chapter 4 for more details) and the heterogeneity in the mean of the time series.

Next I present the empirical specifications of two models: (1) the St-AR model and (2) the Heterogeneous St-AR model. In both models, the volatility depends only on the history of the series itself. In the first model, the first two moments are assumed to be time-invariant, while the second model extends the St-AR model by including heterogeneity over time in the mean, which may serve as another source of heteroskedasticity.

3.2.1 St-AR Model

Consider the observable random variables (y1, ..., yT). Within the Probabilistic Reduction (PR) methodology, this stochastic process can be summarized by the joint distribution D(y1, y2, ..., yT; ϕ). We can impose three sets of assumptions on this stochastic process so that it can be reduced to an operational form. In particular, in the St-AR model the relevant reduction assumptions are:

(D) Student’s t, (M) Markov(p), (H) Second Order Stationarity


Denote the conditioning information set by F. The Distribution (D) assumption implies that the conditional distribution f(yt|F) is Student’s t, with a conditional mean that is linear in F and a heteroskedastic conditional variance Cov(yt|F). The Dependence (M) assumption implies that (ut|F) is a Markov(p) process, indicating that one can make predictions about the future of the process based solely on its last p states. The Homogeneity (H) assumption implies that the parameters of the first two conditional moments are time-invariant. With these reduction assumptions, the joint distribution of (y1, ..., yT) can be simplified as

$$\begin{aligned} D(y_1, \ldots, y_T; \phi) &= D(y_1; \phi_0(1)) \prod_{t=2}^{T} D(y_t \mid y_{t-1}^{1}; \phi_1(t)) \\ &= D(y_1, \ldots, y_p; \phi_0(p)) \prod_{t=p+1}^{T} D(y_t \mid y_{t-1}^{t-p}; \phi_1(t)) \\ &= D(y_1, \ldots, y_p; \phi_0(p)) \prod_{t=p+1}^{T} D(y_t \mid y_{t-1}^{t-p}; \phi_1) \end{aligned}$$

where $y_{t-1}^{1} = (y_{t-1}, \ldots, y_1)$, $y_{t-1}^{t-p} = (y_{t-1}, \ldots, y_{t-p})$, ϕ0(1) denotes the parameters of the marginal distribution D(y1), ϕ0(p) denotes the parameters of the marginal distribution D(y1, ..., yp), and ϕ1(t) denotes the parameters of the conditional distribution $D(y_t \mid y_{t-1}^{t-p})$ at time t. The first equality indicates that the joint distribution can be decomposed into the product of a marginal distribution D(y1) and (T − 1) conditional distributions. The second equality is based on the Dependence (M) assumption of Markov(p), which allows us to reduce the conditioning information set to $y_{t-1}^{t-p}$ for any t > p. The Homogeneity (H) assumption of second order stationarity leads to the third equality, in which the parameters are constant over time. When T is large and p is small, the contribution of D(y1, ..., yp; ϕ0(p)) to the whole function is negligible. This procedure significantly reduces the number of unknown parameters to be estimated. To obtain an explicit expression for the parameters ϕ1, we need to consider the joint distribution of

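$$\Delta_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-(p-1)} \\ y_{t-p} \end{pmatrix} \sim St\left( \begin{pmatrix} \mu_y \\ \mu_y \\ \vdots \\ \mu_y \\ \mu_y \end{pmatrix}, \begin{pmatrix} \sigma_0 & \sigma_1 & \cdots & \sigma_{p-1} & \sigma_p \\ \sigma_1 & \sigma_0 & \cdots & \sigma_{p-2} & \sigma_{p-1} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \sigma_{p-1} & \sigma_{p-2} & \cdots & \sigma_0 & \sigma_1 \\ \sigma_p & \sigma_{p-1} & \cdots & \sigma_1 & \sigma_0 \end{pmatrix}; \upsilon \right) \tag{3.1}$$

denoted by

$$\Delta_t \sim St(\mu, \Sigma, \upsilon)$$

where υ is the degrees of freedom parameter, and

$$\mu = (\mu_1, \mu_p')', \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \Sigma_{21}' \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

The dimensions of the vectors and the matrices used above are as follows:

µ (m × 1), µ1 (1 × 1), µp (p × 1), Σ (m × m), σ11 (1 × 1), Σ21 (p × 1), Σ22 (p × p), m = p + 1.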

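The following proposition provides the joint distribution, conditional distribution and marginal distribution of ∆t = (yt, yt−1, ..., yt−p).

Proposition 3.1 Suppose that ∆t = (yt, yt−1, ..., yt−p) ∼ St(µ, Σ; υ) as defined in Eq. (3.1). The joint distribution, conditional distribution and marginal distribution of ∆t for all t ∈ T can be written as

$$D(\Delta_t; \phi) = D(y_t, y_{t-1}^{t-p}; \phi) = D(y_t \mid y_{t-1}^{t-p}; \phi_1)\, D(y_{t-1}^{t-p}; \phi_2) \sim St(\mu, \Sigma; \upsilon)$$

$$D(y_t \mid y_{t-1}^{t-p}; \phi_1) \sim St(\beta_0 + \beta' y_{t-1}^{t-p},\ \omega_t^2;\ \upsilon + p)$$

$$D(y_{t-1}^{t-p}; \phi_2) \sim St(\mu_p, \Sigma_{22}; \upsilon)$$

where

$$\phi_1 = (\beta_0, \beta, \sigma^2, \Sigma_{22}, \mu_p), \qquad \phi_2 = (\mu_p, \Sigma_{22}), \qquad \phi = (\phi_1, \phi_2),$$

$$\omega_t^2 = \frac{\upsilon}{\upsilon + p}\, \sigma^2 \left( 1 + \left[ \frac{1}{\upsilon} (y_{t-1}^{t-p} - \mu_p)' \Sigma_{22}^{-1} (y_{t-1}^{t-p} - \mu_p) \right] \right),$$

$$\beta_0 = \mu_y - \beta' \mu_p, \qquad \beta = \Sigma_{22}^{-1} \Sigma_{21}, \qquad \sigma^2 = \sigma_{11} - \Sigma_{21}' \Sigma_{22}^{-1} \Sigma_{21}$$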

The proof of Proposition 3.1 is provided in Section 3.5.1. Note that the density function of the joint distribution can be written in two ways:

(1) $D(\Delta_t; \phi) = D(y_t, \ldots, y_{t-p}; \phi)$

(2) $D(\Delta_t; \phi) = D(y_t \mid y_{t-1}^{t-p}; \phi_1)\, D(y_{t-1}^{t-p}; \phi_2)$

When we use this model and the models discussed in the next few sections, we focus on the second expression of the joint distribution, because all parameters of interest in the model are stated explicitly in it. Proposition 3.1 gives rise to the complete specification of the St-AR model. A St-AR(p, l; υ) model has three predetermined parameters: p, l and υ, where p is the order of the Markov process followed by the series, l is the number of lags in the conditional mean equation, and υ is the degrees of freedom parameter. These three parameters are not estimated. Instead, they are chosen on the basis of graphical analysis and the respecification procedure. The final choice of l, p and υ should be made on the basis of the model’s ability to account for the probabilistic features of the data, with the significance of the coefficients in the conditional mean and variance providing additional support. For simplicity, I set p = l. A St-AR(p, p; υ) model is fully specified in Table 3.1.

Table 3.1: Student’s t Autoregressive Model

Mean Equation

$$y_t = \beta_0 + \beta' y_{t-1}^{t-p} + u_t, \qquad (u_t \mid \mathcal{F}_{t-1}^{t-p}) \sim St(0, \omega_t^2; \upsilon)$$

Skedastic Equation

$$\omega_t^2 = \left( \frac{\upsilon}{\upsilon + p - 2} \right) \times \sigma^2 \left( 1 + \left[ \frac{1}{\upsilon} (y_{t-1}^{t-p} - \mu_p)' \Sigma_{22}^{-1} (y_{t-1}^{t-p} - \mu_p) \right] \right)$$

where

$$\beta_0 = \mu_1 - \beta' \mu_p, \qquad \beta = \Sigma_{22}^{-1} \Sigma_{21}, \qquad \sigma^2 = \sigma_{11} - \Sigma_{21}' \Sigma_{22}^{-1} \Sigma_{21}$$

and $\mathcal{F}_{t-1}^{t-p} = \sigma(y_{t-1}^{t-p})$ represents the conditioning set.

(D) → [1] $D(y_t \mid \mathcal{F}_{t-1}^{t-p})$ is Student’s t distributed.

(D) → [2] $E(y_t \mid \mathcal{F}_{t-1}^{t-p})$ is linear in $y_{t-1}^{t-p}$.

(D) → [3] $Cov(y_t \mid \mathcal{F}_{t-1}^{t-p}) = \left( \frac{\upsilon}{\upsilon + p - 2} \right) \times \sigma^2 \left( 1 + \left[ \frac{1}{\upsilon} (y_{t-1}^{t-p} - \mu_p)' \Sigma_{22}^{-1} (y_{t-1}^{t-p} - \mu_p) \right] \right)$ is heteroskedastic.

(M) → [4] $(u_t \mid \mathcal{F}_{t-1}^{t-p})$ is a Markov(p) process.

(H) → [5] $\Theta = (\mu, \beta_0, \beta, \sigma^2, \Sigma_{22})$ are t-invariant.
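
The mean and skedastic equations of Table 3.1 can be evaluated directly once the parameters of the joint distribution are given. The sketch below assumes NumPy; the function names are hypothetical, and the inputs (a common mean `mu_y`, the autocovariances `sigmas` that fill the Toeplitz matrix Σ, and the degrees of freedom `nu`) are treated as known rather than estimated.

```python
import numpy as np


def toeplitz_cov(sigmas):
    """Symmetric Toeplitz matrix whose (i, j) entry is sigmas[|i - j|]."""
    m = len(sigmas)
    idx = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return np.asarray(sigmas, dtype=float)[idx]


def star_conditional_moments(y_lags, mu_y, sigmas, nu):
    """Conditional mean and variance of y_t given (y_{t-1}, ..., y_{t-p}).

    y_lags : the p most recent values, newest first.
    sigmas : (sigma_0, ..., sigma_p), the entries of the Toeplitz matrix Sigma.
    """
    y_lags = np.asarray(y_lags, dtype=float)
    p = len(y_lags)
    Sigma = toeplitz_cov(sigmas)
    sigma11, Sigma21, Sigma22 = Sigma[0, 0], Sigma[1:, 0], Sigma[1:, 1:]

    beta = np.linalg.solve(Sigma22, Sigma21)   # beta = Sigma22^{-1} Sigma21
    sigma2 = sigma11 - Sigma21 @ beta          # sigma^2
    mu_p = np.full(p, mu_y)
    beta0 = mu_y - beta @ mu_p

    dev = y_lags - mu_p
    quad = dev @ np.linalg.solve(Sigma22, dev)

    cond_mean = beta0 + beta @ y_lags
    cond_var = nu / (nu + p - 2) * sigma2 * (1.0 + quad / nu)
    return cond_mean, cond_var
```

For example, with p = 1, `sigmas = [1.0, 0.5]`, `mu_y = 0` and `nu = 5`, the conditional variance is smallest when the lagged value equals the mean and grows quadratically as the lagged value moves away from it, which is how the model produces volatility clustering without a separate variance equation.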

The specification of the St-AR(p, p; υ) model indicates that the conditional mean is linear in the conditioning variables, just as in GARCH volatility models. On the other hand, the dynamic heteroskedasticity is modeled as a quadratic function of all the past conditioning information, with only a small number of unknown parameters. The St-AR conditional variance can be thought of as a sequentially smoothed version of the unconditional variance. In this model, the specification is not ad-hoc, because the functional form of the conditional variance is based entirely on the properties of the Student’s t distribution. Moreover, we do not need additional parameter restrictions to guarantee the positivity of the conditional variance and the existence of higher moments. Finally, the relationship between the parameters of the conditional mean and the conditional variance is well captured.

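To estimate the St-AR model, one should use the maximum likelihood method. Under stationarity, the log-likelihood function for the St-AR model can be written in terms of the recursive decomposition of D(∆t; ϕ). Substituting the functional forms of $D(y_t \mid y_{t-1}^{t-p}; \phi_1)$ and $D(y_{t-1}^{t-p}; \phi_2)$ into D(∆1, ..., ∆T; ϕ), we obtain

$$D(\Delta_1, \ldots, \Delta_T) = \prod_{t=1}^{T} D(y_t \mid y_{t-1}^{t-p}; \phi_1)\, D(y_{t-1}^{t-p}; \phi_2) \tag{3.2}$$

and the log-likelihood function takes the form

$$\ln L(\Delta_1, \ldots, \Delta_T; \phi) \propto C + \frac{T}{2} \ln |\Sigma_{22}^{-1}| - \frac{T}{2} \ln(\sigma^2) - \frac{1}{2} (\upsilon + p + 1) \sum_{t=1}^{T} \ln(\gamma_t) \tag{3.3}$$

where

$$C = T \left( \ln \Gamma\!\left( \frac{\upsilon + p + 1}{2} \right) - \frac{p + 1}{2} \ln(\pi \upsilon) - \ln \Gamma\!\left( \frac{\upsilon}{2} \right) \right)$$

$$\gamma_t = c_t + \frac{u_t^2}{\upsilon \sigma^2}$$

$$c_t = 1 + \frac{1}{\upsilon} (y_{t-1}^{t-p} - \mu_p)' \Sigma_{22}^{-1} (y_{t-1}^{t-p} - \mu_p)$$

$$u_t = y_t - \beta_0 - \beta' y_{t-1}^{t-p}$$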
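
The log-likelihood above can be coded almost line by line. The sketch below assumes NumPy and the standard library; the function name `star_loglik` is hypothetical, the sum runs over the observations for which all p lags are available (so the number of terms is `len(y) - p`), and the constant uses the log-gamma function for both gamma terms, as in the log density of a multivariate Student’s t.

```python
import math

import numpy as np


def star_loglik(y, mu_y, Sigma, nu):
    """Log-likelihood in the decomposed form of Eq. (3.3).

    y     : 1-D array of observations.
    mu_y  : common mean of the series.
    Sigma : (p+1) x (p+1) scale matrix of (y_t, y_{t-1}, ..., y_{t-p}).
    nu    : degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = Sigma.shape[0] - 1
    sigma11, Sigma21, Sigma22 = Sigma[0, 0], Sigma[1:, 0], Sigma[1:, 1:]

    beta = np.linalg.solve(Sigma22, Sigma21)
    sigma2 = sigma11 - Sigma21 @ beta
    beta0 = mu_y - beta.sum() * mu_y

    n_terms = len(y) - p
    const = (math.lgamma((nu + p + 1) / 2)
             - (p + 1) / 2 * math.log(math.pi * nu)
             - math.lgamma(nu / 2))
    _, logdet22 = np.linalg.slogdet(Sigma22)
    total = n_terms * (const - 0.5 * logdet22 - 0.5 * math.log(sigma2))

    for t in range(p, len(y)):
        lags = y[t - p:t][::-1]                 # (y_{t-1}, ..., y_{t-p})
        dev = lags - mu_y
        c_t = 1.0 + dev @ np.linalg.solve(Sigma22, dev) / nu
        u_t = y[t] - beta0 - beta @ lags
        gamma_t = c_t + u_t ** 2 / (nu * sigma2)
        total -= 0.5 * (nu + p + 1) * math.log(gamma_t)
    return total
```

A function of this form could be passed, with its sign reversed, to a numerical optimizer over the free elements of µ and Σ; the conditional-model parameters then follow from Proposition 3.1.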

A number of issues arise in the maximum likelihood estimation. First, Spanos (1992) has shown that the coefficients characterizing the conditional mean and variance are related through the parameters of the joint distribution. Consequently, the conditional mean and conditional variance should not be modeled separately. In the estimation, I first obtain the estimates of µ and Σ. Once µ and Σ are obtained, the derivations in Proposition 3.1 are used to obtain the estimates of ϕ = (β0, β, µp, Σ22, σ2). In this way, the interrelationship between the parameters of the first two conditional moments is captured. Asymptotic standard errors for the estimates of ϕ are obtained from the inverse of the final Hessian using the Delta method. Second, the first order conditions are non-linear and therefore require a numerical procedure. A combination of numerical optimization methods, such as the N-M (Nelder and Mead, 1965) method and the BFGS (Broyden, 1970) method, is used to help ensure that the optimization procedure reaches a global optimum. It is also worth pointing out at this stage that, in order to ensure the positive definiteness and the symmetric Toeplitz shape of the Σ matrix (m × m), we can factorize Σ as the product of two matrices, that is, Σ = LL′, where L is the m × (2m − 1) matrix:

$$L = \begin{pmatrix} l_1 & l_2 & \cdots & l_m & 0 & 0 & \cdots & 0 \\ 0 & l_1 & \cdots & l_{m-1} & l_m & 0 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & l_1 & l_2 & l_3 & \cdots & l_m \end{pmatrix}_{m \times (2m-1)} \tag{3.4}$$
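
The effect of this factorization can be checked numerically. The sketch below assumes NumPy and uses hypothetical function names; because L has m rows and 2m − 1 columns, the m × m matrix Σ is formed here as L times its transpose.

```python
import numpy as np


def banded_factor(l):
    """Build the m x (2m - 1) matrix whose i-th row is l shifted right by i."""
    l = np.asarray(l, dtype=float)
    m = len(l)
    L = np.zeros((m, 2 * m - 1))
    for i in range(m):
        L[i, i:i + m] = l
    return L


def toeplitz_from_factor(l):
    """Return the m x m matrix L L', symmetric Toeplitz by construction."""
    L = banded_factor(l)
    return L @ L.T
```

The (i, j) entry of the product depends only on |i − j|, since it is the inner product of two copies of the same row shifted by |i − j| positions. The optimizer can therefore search freely over (l1, ..., lm) while Σ keeps its Toeplitz shape and remains positive semi-definite (positive definite whenever L has full row rank).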

3.2.2 Heterogeneous St-AR Model

In the St-AR model discussed above, we make the important reduction assumption that the model is second order stationary, which implies that the parameters of the first two conditional moments are constant over time. However, the potential heterogeneity exhibited by many financial time series motivates relaxing the homogeneity assumption. Specifically, we now introduce heterogeneity into the univariate autoregressive model by imposing the following assumption:

$$\mu_y(t) = g(t)$$

where g(.) is a parametric function and t is the time index. Two useful examples of g(t) are the linear function and the quadratic function:

$$\mu_y(t) = \gamma_0 + \gamma_1 t$$

$$\mu_y(t) = \gamma_0 + \gamma_1 t + \gamma_2 t^2$$

where the γi are parameters to be estimated. The introduction of a time-varying E(yt) gives rise to the Heterogeneous St-AR (H-St-AR) model, whose conditional mean and conditional variance functions differ slightly from those of a St-AR model. In a Heterogeneous St-AR model, the vector process ∆t = (yt, ..., yt−p) is given by

$$\Delta_t \sim St(\mu(t), \Sigma, \upsilon)$$

where the mean µ(t) is time-varying and the Σ matrix is constant over time:

$$\mu(t) = \begin{pmatrix} \mu_y(t) \\ \mu_y(t-1) \\ \vdots \\ \mu_y(t-p) \end{pmatrix} = \begin{pmatrix} g(t) \\ g(t-1) \\ \vdots \\ g(t-p) \end{pmatrix}$$

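Proposition 3.2 below provides the joint distribution, conditional distribution and marginal distribution of ∆t. In fact, Proposition 3.2 can be regarded as an extension of Proposition 3.1.

Proposition 3.2 Suppose that ∆t = (yt, yt−1, ..., yt−p) ∼ St(µ(t), Σ; υ). The joint distribution, conditional distribution and marginal distribution of ∆t for all t ∈ T can be written as

$$D(\Delta_t; \phi(t)) = D(y_t, y_{t-1}^{t-p}; \phi(t)) = D(y_t \mid y_{t-1}^{t-p}; \phi_1(t))\, D(y_{t-1}^{t-p}; \phi_2(t)) \sim St(\mu(t), \Sigma; \upsilon)$$

$$D(y_t \mid y_{t-1}^{t-p}; \phi_1(t)) \sim St(\beta_0(t) + \beta' y_{t-1}^{t-p},\ \omega_t^2;\ \upsilon + p)$$

$$D(y_{t-1}^{t-p}; \phi_2(t)) \sim St(\mu_p(t), \Sigma_{22}; \upsilon)$$

where

$$\phi_1(t) = (\beta_0(t), \beta, \sigma^2, \Sigma_{22}, \mu_p(t)), \qquad \phi_2(t) = (\mu_p(t), \Sigma_{22}), \qquad \phi(t) = (\phi_1(t), \phi_2(t)),$$

$$\omega_t^2 = \frac{\upsilon}{\upsilon + p}\, \sigma^2 \left( 1 + \left[ \frac{1}{\upsilon} (y_{t-1}^{t-p} - \mu_p(t))' \Sigma_{22}^{-1} (y_{t-1}^{t-p} - \mu_p(t)) \right] \right),$$

$$\beta_0(t) = \mu_y(t) - \beta' \mu_p(t), \qquad \beta = \Sigma_{22}^{-1} \Sigma_{21}, \qquad \sigma^2 = \sigma_{11} - \Sigma_{21}' \Sigma_{22}^{-1} \Sigma_{21}$$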

The proof of Proposition 3.2 is very similar to that of Proposition 3.1 and is therefore not provided here. The main differences between the parametrization of a Heterogeneous St-AR model and that of a St-AR model come from β0(t), µp(t) and $\omega_t^2$. These quantities are constant in a St-AR model but vary over time as a result of t-heterogeneity in the conditional mean. It is important to note that, although time heterogeneity is introduced only in the conditional mean of the series, the conditional variance is also influenced by the time index through the functional form of $\omega_t^2$, which depends on µp(t). From Proposition 3.2 and some further derivations, we can obtain the complete specification of a Heterogeneous St-AR model. Note that the Heterogeneous St-AR model enjoys all the merits of the St-AR model. Moreover, the Heterogeneous St-AR model is useful for capturing the heterogeneity of the conditional mean of the series over time, which is possibly a source of heteroskedasticity. Next I present the complete specifications of two Heterogeneous St-AR models.
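To see how the time index reaches the conditional variance, the following Python sketch evaluates the scale $\omega_t^2$ of Proposition 3.2 for the same lagged observations at two different dates under a linear trend in the mean. The function name `omega2`, the NumPy dependency and every numerical value are assumptions chosen only for illustration; they are not estimates from this study.

```python
import numpy as np

def omega2(y_lags, mu_lags, Sigma22, sigma2, nu):
    """Scale omega_t^2 of Proposition 3.2 for one period.

    y_lags  : observed (y_{t-1}, ..., y_{t-p})
    mu_lags : their means (mu_y(t-1), ..., mu_y(t-p))
    """
    d = np.asarray(y_lags, float) - np.asarray(mu_lags, float)
    p = d.size
    quad = d @ np.linalg.solve(np.asarray(Sigma22, float), d)  # d' Sigma22^{-1} d
    return nu / (nu + p) * sigma2 * (1.0 + quad / nu)

# Hypothetical inputs (assumptions for illustration only)
Sigma22 = np.array([[1.0, 0.3], [0.3, 1.0]])
gamma0, gamma1 = 0.0, 0.1        # linear trend mu_y(t) = gamma0 + gamma1 * t
y_lags = np.array([1.0, 1.0])    # identical lagged observations at both dates

for t in (5, 50):
    mu_lags = gamma0 + gamma1 * (t - np.arange(1, 3))
    print(t, omega2(y_lags, mu_lags, Sigma22, sigma2=1.0, nu=4.0))
```

The two printed values differ even though the lagged observations are the same, because the deviation of the lags from the trending mean changes with t.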


[1] Linear Heterogeneity

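In the first specification, g(·) is assumed to be a linear function of t:

\[ \mu_y(t) = g(t) = \gamma_0 + \gamma_1 t \tag{3.5} \]

The parameter corresponding to β0 in a St-AR model becomes time-varying:

\[
\beta_0(t) = \gamma_0 + \gamma_1 t - \sum_{i=1}^{p} \beta_i\,(\gamma_0 + \gamma_1 (t - i)) = a_0 + a_1 t \tag{3.6}
\]

where

\[ \beta' = [\beta_1, ..., \beta_p] \]

\[ a_0 = \gamma_0\Big(1 - \sum_{i=1}^{p}\beta_i\Big) + \gamma_1 \sum_{i=1}^{p} i\beta_i \]

\[ a_1 = \gamma_1\Big(1 - \sum_{i=1}^{p}\beta_i\Big) \]

For example, when p = 2, we have

\[ a_0 = (1 - \beta_1 - \beta_2)\gamma_0 + (\beta_1 + 2\beta_2)\gamma_1 \]

\[ a_1 = (1 - \beta_1 - \beta_2)\gamma_1 \]

while when p = 3, we have

\[ a_0 = (1 - \beta_1 - \beta_2 - \beta_3)\gamma_0 + (\beta_1 + 2\beta_2 + 3\beta_3)\gamma_1 \]

\[ a_1 = (1 - \beta_1 - \beta_2 - \beta_3)\gamma_1 \]

According to these derivations and Proposition 3.2, we obtain the complete specification of a Heterogeneous (Linear) St-AR model, as shown in Table 3.2.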
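The closed forms for a0 and a1 can be checked numerically against the defining expression β0(t) = µy(t) − Σ βi µy(t − i). The Python sketch below does this for p = 3; the function names and the parameter values are hypothetical and serve only as an illustration.

```python
import numpy as np

def linear_coeffs(gamma0, gamma1, beta):
    """Return (a0, a1) for AR coefficients beta = (beta_1, ..., beta_p)."""
    beta = np.asarray(beta, float)
    i = np.arange(1, beta.size + 1)
    s = 1.0 - beta.sum()
    return gamma0 * s + gamma1 * np.sum(i * beta), gamma1 * s

def beta0_direct(t, gamma0, gamma1, beta):
    """beta_0(t) = mu_y(t) - sum_i beta_i * mu_y(t - i), evaluated term by term."""
    beta = np.asarray(beta, float)
    i = np.arange(1, beta.size + 1)
    return (gamma0 + gamma1 * t) - np.sum(beta * (gamma0 + gamma1 * (t - i)))

# Hypothetical values with p = 3 (not estimates from this study)
gamma0, gamma1, beta = 1.0, 0.5, [0.4, -0.2, 0.1]
a0, a1 = linear_coeffs(gamma0, gamma1, beta)
for t in (1, 10, 100):
    # The linear form a0 + a1*t reproduces the direct evaluation at every date
    assert np.isclose(a0 + a1 * t, beta0_direct(t, gamma0, gamma1, beta))
print(a0, a1)
```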


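Table 3.2: Heterogeneous (Linear) Student’s t Autoregressive Model

Mean Equation

\[
y_t = a_0 + a_1 t + \beta' y_{t-1}^{t-p} + u_t, \qquad (u_t \mid \mathcal{F}_{t-1}^{t-p}) \sim St(0, \omega_t^2; \upsilon)
\]

Skedastic Equation

\[
\omega_t^2 = \left(\frac{\upsilon}{\upsilon + p - 2}\right) \times \sigma^2 \left(1 + \left[\frac{1}{\upsilon}\,(y_{t-1}^{t-p} - \mu_p(t))'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-p} - \mu_p(t))\right]\right)
\]

where

\[
\beta = \Sigma_{22}^{-1}\Sigma_{21}, \quad \sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}, \quad \mu_y(t) = \gamma_0 + \gamma_1 t,
\]

\[
a_0 = \gamma_0\Big(1 - \sum_{i=1}^{p}\beta_i\Big) + \gamma_1 \sum_{i=1}^{p} i\beta_i, \quad a_1 = \gamma_1\Big(1 - \sum_{i=1}^{p}\beta_i\Big)
\]

Assumptions

(D) → [1] $D(y_t \mid \mathcal{F}_{t-1}^{t-p})$ is Student’s t
      [2] $E(y_t \mid \mathcal{F}_{t-1}^{t-p})$ is linear in $y_{t-1}^{t-p}$
      [3] $Var(y_t \mid \mathcal{F}_{t-1}^{t-p}) = \left(\frac{\upsilon}{\upsilon + p - 2}\right) \times \sigma^2 \left(1 + \left[\frac{1}{\upsilon}\,(y_{t-1}^{t-p} - \mu_p(t))'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-p} - \mu_p(t))\right]\right)$ is heteroskedastic
(M) → [4] $(u_t \mid \mathcal{F}_{t-1}^{t-p})$ is a Markov(p) process
(H) → [5] $\Theta = (a_0, a_1, \gamma_0, \gamma_1, \beta, \sigma^2, \Sigma_{22})$ are t-invariant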

[2] Quadratic Heterogeneity

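The second Heterogeneous St-AR model assumes that g(·) is a quadratic function of t:

\[ \mu_y(t) = g(t) = \gamma_0 + \gamma_1 t + \gamma_2 t^2 \tag{3.7} \]

Similarly, the parameter corresponding to β0 in a St-AR model can be written as a function of t:

\[
\beta_0(t) = \gamma_0 + \gamma_1 t + \gamma_2 t^2 - \sum_{i=1}^{p} \beta_i\,(\gamma_0 + \gamma_1 (t - i) + \gamma_2 (t - i)^2) = a_0 + a_1 t + a_2 t^2 \tag{3.8}
\]

where

\[ \beta' = [\beta_1, ..., \beta_p] \]

\[ a_0 = \gamma_0\Big(1 - \sum_{i=1}^{p}\beta_i\Big) + \gamma_1 \sum_{i=1}^{p} i\beta_i - \gamma_2 \sum_{i=1}^{p} i^2\beta_i \]

\[ a_1 = \gamma_1\Big(1 - \sum_{i=1}^{p}\beta_i\Big) + \gamma_2 \sum_{i=1}^{p} 2i\beta_i \]

\[ a_2 = \gamma_2\Big(1 - \sum_{i=1}^{p}\beta_i\Big) \]

For example, when p = 2, we have

\[ a_0 = (1 - \beta_1 - \beta_2)\gamma_0 + (\beta_1 + 2\beta_2)\gamma_1 - (\beta_1 + 4\beta_2)\gamma_2 \]

\[ a_1 = (1 - \beta_1 - \beta_2)\gamma_1 + (2\beta_1 + 4\beta_2)\gamma_2 \]

\[ a_2 = (1 - \beta_1 - \beta_2)\gamma_2 \]

while when p = 3, we have

\[ a_0 = (1 - \beta_1 - \beta_2 - \beta_3)\gamma_0 + (\beta_1 + 2\beta_2 + 3\beta_3)\gamma_1 - (\beta_1 + 4\beta_2 + 9\beta_3)\gamma_2 \]

\[ a_1 = (1 - \beta_1 - \beta_2 - \beta_3)\gamma_1 + (2\beta_1 + 4\beta_2 + 6\beta_3)\gamma_2 \]

\[ a_2 = (1 - \beta_1 - \beta_2 - \beta_3)\gamma_2 \]

According to these derivations and Proposition 3.2, we can obtain the complete specification of a Heterogeneous (Quadratic) St-AR model, as shown in Table 3.3.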
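As a cross-check on the quadratic case, the Python sketch below computes (a0, a1, a2) from the closed forms and compares them with the coefficients recovered by fitting a degree-2 polynomial through three direct evaluations of β0(t). The function names and parameter values are hypothetical; `np.polyfit` is used only as a convenient exact interpolator through three points.

```python
import numpy as np

def quadratic_coeffs(gamma, beta):
    """Closed-form (a0, a1, a2) for mu_y(t) = g0 + g1*t + g2*t^2."""
    g0, g1, g2 = gamma
    beta = np.asarray(beta, float)
    i = np.arange(1, beta.size + 1)
    s, s1, s2 = 1.0 - beta.sum(), np.sum(i * beta), np.sum(i**2 * beta)
    return g0 * s + g1 * s1 - g2 * s2, g1 * s + 2.0 * g2 * s1, g2 * s

def beta0_direct(t, gamma, beta):
    """beta_0(t) = mu_y(t) - sum_i beta_i * mu_y(t - i), evaluated term by term."""
    g0, g1, g2 = gamma
    mu = lambda x: g0 + g1 * x + g2 * x**2
    beta = np.asarray(beta, float)
    i = np.arange(1, beta.size + 1)
    return mu(t) - np.sum(beta * mu(t - i))

# Hypothetical values with p = 3 (not estimates from this study)
gamma, beta = (1.0, 0.5, 0.1), [0.4, -0.2, 0.1]
closed = quadratic_coeffs(gamma, beta)

ts = np.array([0.0, 1.0, 2.0])
vals = [beta0_direct(t, gamma, beta) for t in ts]
fitted = np.polyfit(ts, vals, 2)[::-1]   # reorder to (a0, a1, a2)

assert np.allclose(closed, fitted)
print(closed)
```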


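Table 3.3: Heterogeneous (Quadratic) Student’s t Autoregressive Model

Mean Equation

\[
y_t = a_0 + a_1 t + a_2 t^2 + \beta' y_{t-1}^{t-p} + u_t, \qquad (u_t \mid \mathcal{F}_{t-1}^{t-p}) \sim St(0, \omega_t^2; \upsilon)
\]

Skedastic Equation

\[
\omega_t^2 = \left(\frac{\upsilon}{\upsilon + p - 2}\right) \times \sigma^2 \left(1 + \left[\frac{1}{\upsilon}\,(y_{t-1}^{t-p} - \mu_p(t))'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-p} - \mu_p(t))\right]\right)
\]

where

\[
\beta = \Sigma_{22}^{-1}\Sigma_{21}, \quad \sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}, \quad \mu_y(t) = \gamma_0 + \gamma_1 t + \gamma_2 t^2,
\]

\[
a_0 = \gamma_0\Big(1 - \sum_{i=1}^{p}\beta_i\Big) + \gamma_1 \sum_{i=1}^{p} i\beta_i - \gamma_2 \sum_{i=1}^{p} i^2\beta_i,
\]

\[
a_1 = \gamma_1\Big(1 - \sum_{i=1}^{p}\beta_i\Big) + \gamma_2 \sum_{i=1}^{p} 2i\beta_i, \quad a_2 = \gamma_2\Big(1 - \sum_{i=1}^{p}\beta_i\Big)
\]

Assumptions

(D) → [1] $D(y_t \mid \mathcal{F}_{t-1}^{t-p})$ is Student’s t
      [2] $E(y_t \mid \mathcal{F}_{t-1}^{t-p})$ is linear in $y_{t-1}^{t-p}$
      [3] $Var(y_t \mid \mathcal{F}_{t-1}^{t-p}) = \left(\frac{\upsilon}{\upsilon + p - 2}\right) \times \sigma^2 \left(1 + \left[\frac{1}{\upsilon}\,(y_{t-1}^{t-p} - \mu_p(t))'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-p} - \mu_p(t))\right]\right)$ is heteroskedastic
(M) → [4] $(u_t \mid \mathcal{F}_{t-1}^{t-p})$ is a Markov(p) process
(H) → [5] $\Theta = (a_0, a_1, a_2, \gamma_0, \gamma_1, \gamma_2, \beta, \sigma^2, \Sigma_{22})$ are t-invariant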

The estimation of the Heterogeneous St-AR model is similar to that of the (homogeneous) St-AR model. By substituting the functional forms of the conditional and marginal distributions into the joint distribution $D(y_t \mid y_{t-1}^{t-p}; \phi_1(t))\,D(y_{t-1}^{t-p}; \phi_2(t))$, we obtain the log-likelihood function

\[
\ln L(y_1, ..., y_T; \Theta) \propto C + \frac{T}{2}\ln|\Sigma_{22}^{-1}| - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2}(\upsilon + p + 1)\sum_{t=1}^{T}\ln(\gamma_t) \tag{3.9}
\]

where

\[
C = T\left(\ln\Gamma\Big(\frac{\upsilon + p + 1}{2}\Big) - \frac{p + 1}{2}\ln(\pi\upsilon) - \ln\Gamma\Big(\frac{\upsilon}{2}\Big)\right)
\]

\[
\gamma_t = c_t + \frac{u_t^2}{\upsilon\sigma^2}
\]

\[
c_t = 1 + \frac{1}{\upsilon}\,(y_{t-1}^{t-p} - \mu_p(t))'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-p} - \mu_p(t))
\]

\[
u_t = y_t - \beta_0(t) - \beta' y_{t-1}^{t-p}
\]

The difference between function (3.2) and function (3.9) is that time heterogeneity enters the latter likelihood function through c_t and u_t. We first obtain the maximum likelihood estimates of the parameters γ0, γ1 (and γ2 for the quadratic Heterogeneous St-AR model) and Σ by maximizing the log-likelihood function (3.9); the parameters of the conditional distribution ϕ1 and of the marginal distribution ϕ2 can then be obtained. The estimation of the parameters and their standard errors is very similar to that for the St-AR model. In particular, the Σ matrix is specified in the same way as in the St-AR model, so we can continue to factorize Σ with the matrix L defined in (3.4) to ensure its positivity and symmetric-Toeplitz shape.
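A minimal Python sketch of a log-likelihood of this form is given below. It is not the estimation code used in this study: the function name, the polynomial parametrization of the trend through a coefficient vector `gamma`, the choice to sum over the dates t = p+1, ..., n for which all p lags are observed, and the NumPy dependency are all assumptions made for illustration. The term inside the logarithm is c_t + u_t²/(υσ²), as defined above.

```python
import numpy as np
from math import lgamma, log, pi

def hstar_loglik(y, gamma, Sigma, nu):
    """Log-likelihood summed over the dates with p observed lags.

    y     : observed series y_1, ..., y_n
    gamma : trend coefficients (gamma_0, gamma_1, ...) of mu_y(t)
    Sigma : (p+1) x (p+1) scale matrix of (y_t, y_{t-1}, ..., y_{t-p})
    nu    : degrees of freedom
    """
    y = np.asarray(y, float)
    Sigma = np.asarray(Sigma, float)
    p = Sigma.shape[0] - 1
    S21, S22 = Sigma[1:, 0], Sigma[1:, 1:]
    beta = np.linalg.solve(S22, S21)            # Sigma22^{-1} Sigma21
    sigma2 = Sigma[0, 0] - S21 @ beta           # sigma11 - Sigma21' Sigma22^{-1} Sigma21
    t = np.arange(1, y.size + 1)
    mu = np.polynomial.polynomial.polyval(t, gamma)   # mu_y(t) at every date

    sum_log = 0.0
    for j in range(p, y.size):                  # j is the 0-based index of date t
        d = (y[j - p:j] - mu[j - p:j])[::-1]    # lags minus mu_p(t), most recent first
        c_t = 1.0 + d @ np.linalg.solve(S22, d) / nu
        u_t = (y[j] - mu[j]) - beta @ d         # y_t - beta_0(t) - beta' lags
        sum_log += log(c_t + u_t**2 / (nu * sigma2))

    n_eff = y.size - p
    const = lgamma((nu + p + 1) / 2) - (p + 1) / 2 * log(pi * nu) - lgamma(nu / 2)
    _, logdet22 = np.linalg.slogdet(S22)
    return (n_eff * const - 0.5 * n_eff * logdet22
            - 0.5 * n_eff * log(sigma2) - 0.5 * (nu + p + 1) * sum_log)
```

In practice this function would be passed, with its sign reversed, to a numerical optimizer over the trend coefficients and the free elements of L; that step is omitted here.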

Moreover, the Heterogeneous St-AR models above are designed to capture a time trend in the data, with the mean of yt assumed to be a function of the time index. This specification needs modification when the main concern is cyclical behavior or particular effects related to t. For example, if a monthly time series exhibits a seasonal pattern, we can apply a trigonometric specification by assuming

\[ \mu_y(t) = b_0 + b_1\sin(2\pi\theta t) + b_2\cos(2\pi\theta t) \tag{3.10} \]

where θ is the frequency, defined in cycles per unit time. As another example, suppose the focus is on possible summer effects in the series. To address this issue, we can define the indicator variable

\[
D(t) =
\begin{cases}
1 & \text{if } t \text{ is June, July or August} \\
0 & \text{otherwise}
\end{cases}
\]

and introduce the summer effect heterogeneity by

\[ \mu_y(t) = g(t) + bD(t) \tag{3.11} \]
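The two alternative mean specifications can be sketched in a few lines of Python. The function names, the convention that months are coded 1 to 12, and the parameter values are assumptions for illustration only.

```python
import math

def seasonal_mean(t, b0, b1, b2, theta):
    """Trigonometric mean b0 + b1*sin(2*pi*theta*t) + b2*cos(2*pi*theta*t)."""
    angle = 2.0 * math.pi * theta * t
    return b0 + b1 * math.sin(angle) + b2 * math.cos(angle)

def summer_dummy(month):
    """D(t): 1 for June, July or August (months 6 to 8), 0 otherwise."""
    return 1 if month in (6, 7, 8) else 0

def summer_mean(t, month, g, b):
    """Mean g(t) + b*D(t), where g is any trend function supplied by the user."""
    return g(t) + b * summer_dummy(month)

# With monthly data, theta = 1/12 gives one cycle per year (hypothetical coefficients)
print(seasonal_mean(3, 1.0, 0.5, 0.2, 1 / 12), seasonal_mean(15, 1.0, 0.5, 0.2, 1 / 12))
print(summer_mean(10, 7, lambda t: 0.1 * t, 2.0))
```

The first line prints the same value twice because dates 3 and 15 are exactly one annual cycle apart.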

Finally, it is worth pointing out the difference between Heterogeneous St-AR models and the traditional time series methods for dealing with trend patterns. In many cases, time series involve trend patterns that can conceal both the true underlying movement in the series and certain regular characteristics that may be of interest to analysts. A trend over time may also be a source of nonstationarity. As a result, detrending and deseasonalizing techniques such as differencing and dummy variables are very important when dealing with raw time series data. A big advantage of these methods is that they are simple to apply; sometimes there is not even a parameter to estimate. However, a severe disadvantage of these methods, which is especially acute in volatility models, is that they totally ignore the interrelationship between the trend patterns and the volatility process. In the underlying statistically adequate model, the trend pattern in the time series may be responsible for the behavior of the volatility process. When the trend patterns are related to volatility, the Heterogeneous St-AR models are useful because they are based on the joint distribution of the observable random variables, in which the interrelationship between the first two moments is captured. As mentioned above, the heterogeneity parameters enter the specification of the conditional variance, which enables us to determine whether heterogeneity in the mean is a source of conditional heteroskedasticity.

3.3 Empirical Applications

3.3.1 Introduction

In this section I report the empirical results of the models discussed above using real-world data. There are two main goals. The first is to illustrate the applicability of the St-AR (H-St-AR) model for capturing the volatility in univariate time series of exchange rates. The second is to provide the results of a series of Mis-Specification (M-S) tests, in order to investigate the performance of these models.

For any volatility model, if the model captures the systematic anomalies exhibited in the time series data, then the weighted residual series should behave like white noise. The M-S tests applied are based on the standardized estimated residuals of the maintained model, which are defined as

\[ \tilde{u}_t = \hat{u}_t / \hat{w}_t \tag{3.12} \]

where $\hat{u}_t$ is the raw residual and $\hat{w}_t$ is the estimated conditional standard error. When the model is homoskedastic, $\hat{w}_t$ is constant. The relevant M-S tests are based on auxiliary regressions relating the weighted residuals $\tilde{u}_t$ or their squares $\tilde{u}_t^2$ to factors that might pick up departures from the model assumptions. In particular, I consider the following auxiliary regressions; Table 3.4 summarizes the hypotheses tested in the M-S tests for univariate volatility models.

\[
\tilde{u}_t = a_0 + a_1\tilde{u}_{t-1} + a_2\tilde{u}_{t-2} + b_1 y_t + b_2 y_t^2 + b_3 t + b_4 t^2 + v_t \tag{3.13}
\]

\[
\tilde{u}_t^2 = c_0 + c_1 y_t + c_2 y_t^2 + c_3\tilde{u}_{t-1}^2 + c_4\tilde{u}_{t-2}^2 + c_5 t + c_6 t^2 + v_t \tag{3.14}
\]

Table 3.4: M-S Tests for Univariate Models

Test                Null Hypothesis    Auxiliary regression
Linearity           b2 = 0             (3.13)
Homoscedasticity    c1 = c2 = 0        (3.14)
1st Independence    a1 = a2 = 0        (3.13)
2nd Independence    c3 = c4 = 0        (3.14)
1st t-invariance    b3 = b4 = 0        (3.13)
2nd t-invariance    c5 = c6 = 0        (3.14)
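The hypotheses in Table 3.4 are zero restrictions on subsets of coefficients in an auxiliary regression, so each can be examined with a standard F-type comparison of restricted and unrestricted least-squares fits. The Python sketch below shows one way to set this up for regression (3.13). It is an illustration under assumptions, not the testing code used in this study: the function names are hypothetical, NumPy is assumed, and the statistic computed is the textbook F statistic, which may differ from the exact form of the statistics reported in the tables.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def f_statistic(y, X, tested):
    """F statistic for H0: the coefficients on the columns `tested` of X are all zero."""
    n, k = X.shape
    kept = [j for j in range(k) if j not in tested]
    rss_u, rss_r = rss(y, X), rss(y, X[:, kept])
    return ((rss_r - rss_u) / len(tested)) / (rss_u / (n - k))

def design_313(u, y):
    """Regressand and regressors of (3.13); the first two observations are lost to lags.

    Columns: 0 constant, 1 u_{t-1}, 2 u_{t-2}, 3 y_t, 4 y_t^2, 5 t, 6 t^2.
    """
    u, y = np.asarray(u, float), np.asarray(y, float)
    t = np.arange(3, u.size + 1, dtype=float)
    t = t / t.max()   # rescaling leaves the column space, and hence F, unchanged
    X = np.column_stack([np.ones(t.size), u[1:-1], u[:-2], y[2:], y[2:]**2, t, t**2])
    return u[2:], X

# Usage with simulated (hypothetical) inputs
rng = np.random.default_rng(0)
u_sim, y_sim = rng.standard_normal(200), rng.standard_normal(200)
dep, X = design_313(u_sim, y_sim)
print(f_statistic(dep, X, [1, 2]))   # 1st Independence: a1 = a2 = 0
print(f_statistic(dep, X, [5, 6]))   # 1st t-invariance: b3 = b4 = 0
```

Regression (3.14) can be handled the same way by squaring the residual columns and the regressand.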

Apart from the results of the M-S tests, I also provide fitted values to evaluate the performance of the model. Using the estimated parameters and the model specification, the fitted values are calculated as

\[ \hat{y}_t = \hat{\beta}_0 + \hat{\beta}' X_t \]

where $\hat{\beta}_0$ and $\hat{\beta}$ are the estimated parameters and $X_t$ contains the explanatory variables of the particular model applied. When the time series is heteroskedastic, it is also necessary to study the behavior of the volatility. The adjusted fitted values are therefore defined as

\[ \tilde{y}_t = \hat{\beta}_0 + \hat{\beta}' X_t + u_t^{*} = \hat{y}_t + u_t^{*}, \qquad u_t^{*} \sim St(0, \omega_t^2; \upsilon) \]

where $u_t^{*}$ is a generated error term that follows a Student’s t distribution, with $\omega_t^2$ and υ defined by the specification in use. In particular, $\omega_t^2$ can be obtained using the last p periods of the relevant variables. Intuitively, the fitted values look like a smoothed version of the adjusted fitted values, with the dynamic volatility removed. The difference between $\hat{y}_t$ and $\tilde{y}_t$ roughly measures the impact of the conditional variance.
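The construction of adjusted fitted values can be sketched as follows. This is an illustration under explicit assumptions rather than the procedure used to produce the figures: the function name is hypothetical, NumPy is assumed, and $\omega_t^2$ is treated here as the squared scale of the Student's t draw, which is one possible reading of St(0, ω_t²; υ).

```python
import numpy as np

def adjusted_fitted(fitted, omega2, nu, seed=0):
    """Add generated Student's t errors with squared scale omega2 to the fitted values."""
    rng = np.random.default_rng(seed)
    fitted = np.asarray(fitted, float)
    scale = np.sqrt(np.asarray(omega2, float))
    # standard_t draws have unit scale; multiplying by sqrt(omega2) rescales them
    return fitted + scale * rng.standard_t(nu, size=fitted.shape)

# Hypothetical fitted values and conditional scales for five periods
fitted = np.array([0.10, 0.05, -0.02, 0.00, 0.08])
omega2 = np.array([0.5, 0.6, 2.0, 0.7, 0.5])
print(adjusted_fitted(fitted, omega2, nu=4.0))
```

Periods with a larger $\omega_t^2$ receive more dispersed generated errors, which is what makes the adjusted series track the volatility clustering of the data.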

3.3.2 RMB Real Effective Exchange Rate (REER) Index

In this subsection I study the RMB real effective exchange rate (REER) index. I consider the growth rate of the RMB REER index, measured as the log difference of monthly data recorded over the period April 1995 through November 2011 (T = 200).¹ Figure 3.1 shows the time plot of the growth rate of the RMB REER index.

[Figure 3.1: Time Plot of The RMB Real Effective Exchange Rate Index. Horizontal axis: Time Index; vertical axis: RMB.]

3.3.2.1 The Benchmark Model

To begin with, I use a simple AR(3) model as the benchmark model, in which most of the statistical regularities exhibited in the data are ignored. Let yt be the log difference of the RMB REER index at time t. The fitted model is

\[
y_t = \underset{(0.14)}{0.22} - \underset{(0.02)}{0.07}\,y_{t-1} - \underset{(0.07)}{0.11}\,y_{t-2} - \underset{(0.07)}{0.05}\,y_{t-3}
\]

The estimated coefficients and their standard errors (in parentheses) imply a significantly negative relationship between the growth rate of the RMB REER index and its first lag. The constant term and the coefficients of the second and third lags are insignificant at the 0.1 significance level. The results of the M-S tests are reported in part (a) of Table 3.5. The test statistics are reported in the second column, with p-values in parentheses. The conclusions at the 5% significance level are reported in the third column, where low p-values point to misspecification.

¹ I obtain the RMB REER data from http://www.hexun.com/.


Table 3.5: M-S Tests and Respecification of AR(3) Model for RMB REER

(a) M-S Tests

Assumption           Tests               Statistics      Conclusion
Distribution (D)     Linearity           −0.68 (0.50)    Not Rejected
                     Homoscedasticity    3.02 (0.05)∗    Rejected
Dependence (M)       1st Independence    0.03 (0.97)     Not Rejected
                     2nd Independence    0.10 (0.91)     Not Rejected
Heterogeneity (H)    1st t-invariance    3.47 (0.03)∗    Rejected
                     2nd t-invariance    0.39 (0.68)     Not Rejected

(b) Respecification

Assumptions          Original Model    M-S Tests Results    Respecification
Distribution (D)     Normal            Rejected             Student’s t (4)
Dependence (M)       Markov(3)         Not Rejected         Unchanged
Heterogeneity (H)    Homogeneous       Rejected             3rd order Heterogeneity

[1]: p-values are reported in parentheses
[2]: ∗ significant at 5%

It can be seen that the assumptions of homoscedasticity and homogeneity are violated for this time series. Considering that the series contains only 200 observations, p-values smaller than 5% are unsatisfactory. The violation of the homoscedasticity assumption is not surprising, because the Normality assumption is unreliable and the simple AR model totally ignores the chance regularities of the volatility process. Apart from heteroscedasticity, there also appears to be time heterogeneity: although log-differencing removes part of the time trend, the remainder still needs to be captured. As a result, respecification is called for to remove these departures from the underlying assumptions. Part (b) of Table 3.5 outlines the respecification strategy. In particular, I apply the Student’s t distribution with degrees of freedom parameter υ = 4, retain the Markov(3) dependence and use 3rd order heterogeneity in the conditional mean function. Note that this respecification is not necessarily appropriate for the data; further adjustments should be made if required.


3.3.2.2 The Heterogeneous St-AR(3,3;4) Model

Based on the results of the M-S tests and the suggested respecification strategy, the 3rd order H-St-AR(3,3;4) model is used to analyze the RMB REER index.² The estimation results of the 3rd order H-St-AR(3,3;4) model are reported in part (a) of Table 3.6, and the M-S tests are shown in part (b) of Table 3.6.

Table 3.6: Estimation and M-S Tests Results of 3rd order H-St-AR(3,3;4) for RMB

(a) Estimation

Parameters    Estimates
1             1.31 (0.67)∗
t             −1.74 (1.02)
t^2           0.65 (0.52)
t^3           0.15 (0.16)
y_{t−1}       −0.01 (0.04)
y_{t−2}       −0.11 (0.05)∗
y_{t−3}       0.06 (0.03)
σ^2           3.10 (0.23)∗
l11           3.16 (0.23)∗
l21           −0.05 (0.14)
l31           −0.36 (0.17)∗

(b) M-S Tests

Tests               Statistics     Conclusion
Linearity           0.22 (0.82)    Not Rejected
Homoscedasticity    1.74 (0.18)    Not Rejected
1st Independence    0.15 (0.86)    Not Rejected
1st t-invariance    0.49 (0.61)    Not Rejected

[1]: In part (a), standard errors are reported in parentheses
[2]: In part (b), p-values are reported in parentheses
[3]: ∗ significant at 5%

² The degrees of freedom are determined by comparing the P-P plots for the Normal distribution and the Student’s t distribution with different degrees of freedom, with the Cauchy distribution as the reference curve; see Heracleous and Spanos (2006). The visual inspection leads to the conjecture that the Student’s t distribution with 4 degrees of freedom might be the best choice. The Student’s t distributions with 3 and 5 degrees of freedom are also estimated, and the empirical results are slightly different. In addition, the 1st order (linear) and the 2nd order (quadratic) heterogeneity functions are also estimated, and the performance of those models is slightly inferior to that of the 3rd order heterogeneity function.


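The estimated mean equation is³

\[
E(y_t \mid y_{t-1}, y_{t-2}, y_{t-3}) = \underset{(0.67)}{1.31^{*}} - \underset{(1.02)}{1.74}\,t + \underset{(0.52)}{0.65}\,t^2 + \underset{(0.16)}{0.15}\,t^3 - \underset{(0.04)}{0.01}\,y_{t-1} - \underset{(0.05)}{0.11^{*}}\,y_{t-2} + \underset{(0.03)}{0.06^{*}}\,y_{t-3}
\]

The estimated variance function is

\[
var(y_t \mid y_{t-1}, y_{t-2}, y_{t-3}) = \underset{(0.23)}{3.10^{*}} \left[1 + \frac{1}{4}\,(y_{t-1}^{t-3} - \mu_3(t))'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-3} - \mu_3(t))\right]
\]

where

\[
\mu_3(t) =
\begin{pmatrix}
\underset{(0.62)}{1.56^{*}} - \underset{(1.09)}{1.97^{*}}\,(t-1) + \underset{(0.64)}{1.01^{*}}\,(t-1)^2 + \underset{(0.18)}{0.15}\,(t-1)^3 \\
\underset{(0.62)}{1.56^{*}} - \underset{(1.09)}{1.97^{*}}\,(t-2) + \underset{(0.64)}{1.01^{*}}\,(t-2)^2 + \underset{(0.18)}{0.15}\,(t-2)^3 \\
\underset{(0.62)}{1.56^{*}} - \underset{(1.09)}{1.97^{*}}\,(t-3) + \underset{(0.64)}{1.01^{*}}\,(t-3)^2 + \underset{(0.18)}{0.15}\,(t-3)^3
\end{pmatrix}
\]

and

\[
\Sigma_{22} =
\begin{pmatrix}
\underset{(0.24)}{3.16^{*}} & \underset{(0.14)}{-0.05} & \underset{(0.17)}{-0.36^{*}} \\
\underset{(0.14)}{-0.05} & \underset{(0.24)}{3.16^{*}} & \underset{(0.14)}{-0.05} \\
\underset{(0.17)}{-0.36^{*}} & \underset{(0.14)}{-0.05} & \underset{(0.24)}{3.16^{*}}
\end{pmatrix}
\]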

The 3rd order H-St-AR model reveals several interesting findings. First, in the Normal AR(3) model, the constant term is insignificant, small in magnitude and positive. In the H-St-AR(3,3;4) model, this term is decomposed into a significantly positive constant term with a much larger value and three other terms related to the time index. It is not surprising that the coefficients of the time trend terms in the heterogeneity function are insignificant, since the trend in the RMB REER series is not sharp, as shown in the time plot presented earlier in this section. Second, in the Normal AR(3) model, the coefficient of yt−1 is the only AR coefficient that is statistically significant, indicating that the current RMB REER is negatively related to its first lag. In the respecified model, the estimated coefficients differ from those in the AR model. The only significant AR coefficient in the H-St-AR model is the one for the second lag, implying that the current RMB REER is negatively related to its second lag, rather than to the first lag as suggested by the Normal AR(3) model. Third, the results of the M-S tests indicate that the respecification is effective. The 3rd order H-St-AR(3,3;4) model successfully captures the conditional heteroskedasticity, dependence and time heterogeneity in the RMB REER index series.

³ In this and the next chapter, ∗ denotes significance at 5%.


[Figure 3.2: Fitted Values of RMB REER Index. Horizontal axis: Time Index; vertical axis: Predicted Values.]

[Figure 3.3: Adjusted Fitted Values of RMB REER Index. Horizontal axis: Time Index; vertical axis: Adjusted Predicted Values.]

Figure 3.2 and Figure 3.3 show the unadjusted and adjusted fitted values of the 3rd order H-St-AR(3,3;4) model for the growth rate of the RMB REER index. Clearly the latter provides a better fit to the data. Figure 3.4 shows the fitted conditional variance of the H-St-AR(3,3;4) model. In most periods the conditional variance is of considerable magnitude, so the difference between the unadjusted and adjusted fitted values is obvious.


[Figure 3.4: Fitted Conditional Variance of RMB REER Index. Horizontal axis: Time Index; vertical axis: Conditional Variance.]

3.3.3 HKD Real Effective Exchange Rate (REER) Index

In this subsection, I study the time series of the Hong Kong Dollar (HKD) Real Effective Exchange Rate (REER) Index over the period from January 1994 to August 2013 (T = 236).⁴ Figure 3.5 shows the time plot of the growth rate of the HKD REER index.

[Figure 3.5: The HKD Real Effective Exchange Rate Index. Horizontal axis: Time Index; vertical axis: HKD.]


To begin with, we use a simple AR(2) model as the benchmark model. Let yt be the log difference of the HKD REER index at time t. The fitted model is

\[
y_t = \underset{(0.07)}{0.00} + \underset{(0.07)}{0.42^{*}}\,y_{t-1} - \underset{(0.07)}{0.11}\,y_{t-2}
\]

The AR(2) model suggests that the constant term is insignificant, the coefficient of yt−1 is significantly positive, and the coefficient of yt−2 is negative but insignificant. In order to evaluate the reliability of the AR(2) model, M-S tests are conducted to detect potential misspecification. The results of the tests are reported in part (a) of Table 3.7.

⁴ I obtain the HKD REER data from http://www.hexun.com/.

Table 3.7: M-S Tests and Respecification of AR(2) Model for HKD REER

(a) M-S Tests

Assumption           Tests               Statistics      Conclusion
Distribution (D)     Linearity           2.05 (0.04)∗    Rejected
Dependence (M)       2nd Independence    6.50 (0.00)∗    Rejected
Heterogeneity (H)    1st t-invariance    1.05 (0.35)     Not Rejected

(b) Respecification

Assumptions          Original Model    M-S Tests Results    Respecification
Dependence (M)       Markov(2)         Rejected             Markov(3)
Heterogeneity (H)    Homogeneous       Not Rejected         Unchanged



The M-S tests in Table 3.7(a) indicate two problems with the Normal AR(2) model. First, it fails to capture the heteroskedasticity exhibited in the HKD REER series, which means that the Normality assumption is highly questionable. Second, the second order dependence is not removed by the first two lags. To deal with these issues, the respecification strategy is listed in Table 3.7(b). Linearity of the conditional mean together with heteroskedasticity of the conditional variance leads to the Student’s t distribution, with υ = 4 as the initial degrees of freedom parameter. To capture the second order dependence, we use St-AR models under the Markov(3) assumption. Since the M-S tests show no evidence of time instability, there is no need to introduce a heterogeneity function at this stage. Hence, the St-AR(3,3;4) model is used to analyze the HKD REER series.


3.3.3.2 St-AR (3,3;4) Model

The estimation results of the St-AR(3,3;4) model are reported in part (a) in Table 3.8, and the

M-S tests are reported in part (b) of Table 3.8.

Table 3.8: Estimation and M-S Tests Results of Linear St-AR(3,3;4) model for HKD

(a) Estimation

Parameters    Estimates
1             −0.02 (0.03)
y_{t−1}       0.40 (0.04)∗
y_{t−2}       −0.10 (0.06)
y_{t−3}       0.06 (0.07)
σ^2           0.73 (0.05)∗
l11           0.86 (0.06)∗
l21           0.31 (0.04)∗
l31           0.06 (0.05)

(b) M-S Tests

Tests        Statistics     Conclusion
Linearity    1.00 (0.32)    Not Rejected



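According to the results, the estimated mean equation is

\[
E(y_t \mid y_{t-1}, y_{t-2}, y_{t-3}) = \underset{(0.03)}{-0.02} + \underset{(0.04)}{0.40^{*}}\,y_{t-1} - \underset{(0.06)}{0.10}\,y_{t-2} + \underset{(0.07)}{0.06}\,y_{t-3}
\]

and the variance equation is

\[
var(y_t \mid y_{t-1}, y_{t-2}, y_{t-3}) = \underset{(0.05)}{0.73^{*}} \left[1 + \frac{1}{4}\,(y_{t-1}^{t-3} - \mu_3)'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-3} - \mu_3)\right]
\]

where

\[
\mu_3 =
\begin{pmatrix}
\underset{(0.04)}{-0.03} \\
\underset{(0.04)}{-0.03} \\
\underset{(0.04)}{-0.03}
\end{pmatrix}
\]

and

\[
\Sigma_{22} =
\begin{pmatrix}
\underset{(0.06)}{0.86^{*}} & \underset{(0.04)}{0.31^{*}} & \underset{(0.05)}{0.06} \\
\underset{(0.04)}{0.31^{*}} & \underset{(0.06)}{0.86^{*}} & \underset{(0.04)}{0.31^{*}} \\
\underset{(0.05)}{0.06} & \underset{(0.04)}{0.31^{*}} & \underset{(0.06)}{0.86^{*}}
\end{pmatrix}
\]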
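As a worked illustration of how the estimated variance function responds to recent observations, the Python sketch below evaluates it with the point estimates reported above for the HKD series. The lag values passed to the function are hypothetical, and the function name and NumPy dependency are assumptions made for illustration.

```python
import numpy as np

# Point estimates reported for the St-AR(3,3;4) model of the HKD REER index
sigma2, nu = 0.73, 4.0
mu3 = np.full(3, -0.03)
Sigma22 = np.array([[0.86, 0.31, 0.06],
                    [0.31, 0.86, 0.31],
                    [0.06, 0.31, 0.86]])

def conditional_variance(lags):
    """Evaluate 0.73 * [1 + (1/4) d' Sigma22^{-1} d] with d = lags - mu3."""
    d = np.asarray(lags, float) - mu3
    return sigma2 * (1.0 + d @ np.linalg.solve(Sigma22, d) / nu)

calm = conditional_variance([-0.03, -0.03, -0.03])   # lags equal to their mean
shock = conditional_variance([3.0, 0.5, -0.2])       # hypothetical lags after a large move
print(calm, shock)
```

When the three lags sit at their mean the quadratic form vanishes and the function returns its floor of 0.73; any deviation of the lags from the mean raises the conditional variance above that floor.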

The M-S test results in Table 3.8(b) indicate no departure from the underlying statistical assumptions. In particular, the St-AR(3,3;4) model outperforms the simple AR(2) model in capturing heteroskedasticity and second order dependence. Comparing the estimates of the St-AR(3,3;4) and AR(2) models leads to the following findings. First, although the specification and estimation procedures of the two models are totally different, the resulting estimates are quite similar. Specifically, in both models the constant term is insignificant and very small in magnitude, and the coefficient of the first lag is significantly positive, with a similar magnitude of around 0.40. This is a good example of how a misspecified model can sometimes give rise to reasonable estimates. Second, the M-S tests for the AR(2) model suggest that the Markov(2) process fails to capture all the dependence exhibited in the series, which led to a switch from the Markov(2) process to the Markov(3) process in the respecification. However, in the estimates of the St-AR(3,3;4) model, the coefficients of the second and third lags are insignificant. The reason is that the AR(2) model is seriously misspecified in several respects, which makes it difficult to remedy the model purely on the basis of the information gained from the M-S tests. These results also illustrate why iterative rounds of M-S testing and respecification are often needed to find the best model. As a matter of fact, two more St-AR models, the St-AR(1,1;4) model and the St-AR(2,2;4) model, are also estimated, in which the dependence is assumed to be a Markov(1) and a Markov(2) process, respectively. The results show that when Normality is replaced by the Student’s t distribution with υ = 4, a Markov(1) process works well enough to capture the first and second order dependence, and the estimated coefficients of the first lag in the three St-AR models are very close.

Figure 3.6 and Figure 3.7 show the unadjusted and adjusted fitted values of the HKD REER index, respectively. Figure 3.8 shows the fitted conditional variance of the St-AR(3,3;4) model. It seems that the volatility of the HKD REER index is quite stable over time, except for a few sudden peaks.

[Figure 3.6: Fitted Values of HKD REER Index. Horizontal axis: Time Index; vertical axis: Predicted Values.]

[Figure 3.7: Adjusted Fitted Values of HKD REER Index. Horizontal axis: Time Index; vertical axis: Adjusted Predicted Values.]

[Figure 3.8: Fitted Conditional Variance of HKD REER Index. Horizontal axis: Time Index; vertical axis: Conditional Variance.]


3.3.4 TWD Real Effective Exchange Rate (REER) Index

In this subsection, I study the exchange rate of another currency in Asia, the Taiwan Dollar (TWD). The data considered are the log differences of the TWD REER index over the period July 1997 through August 2013 (T = 194).⁵ Figure 3.9 shows the time plot of the TWD REER Index.

[Figure 3.9: The TWD Real Effective Exchange Rate Index. Horizontal axis: Time Index; vertical axis: TWD.]


A simple AR(2) model is used as the benchmark model. Let yt be the log difference of the TWD REER Index. The fitted model is

\[
y_t = \underset{(0.08)}{-0.06} + \underset{(0.07)}{0.25^{*}}\,y_{t-1} - \underset{(0.05)}{0.07}\,y_{t-2}
\]

The resulting regression implies that the coefficient of the first lag is significantly positive, while the constant term and the coefficient of the second lag are both insignificant. However, these conclusions are highly unreliable, because clear departures from the underlying assumptions are detected by the M-S tests, as shown in Table 3.9.

⁵ I obtain the TWD REER data from http://www.hexun.com/.


Table 3.9: M-S Tests and Respecification of AR(2) Model for TWD REER Index

(a) M-S Tests

Assumption           Tests               Statistics       Conclusion
Distribution (D)     Linearity           2.85 (0.00)∗     Rejected
                     Homoscedasticity    19.09 (0.00)∗    Rejected
Dependence (M)       2nd Independence    3.31 (0.04)∗     Rejected
Heterogeneity (H)    1st t-invariance    0.62 (0.54)      Not Rejected

(b) Respecification

Assumptions          Original Model    M-S Tests Results    Respecification

[1]: p-values are reported in parentheses
[2]: ∗ significant at 5%

The results of the M-S tests reported above suggest that, in the Normal AR(2) model, there are serious departures from the underlying Distribution and Dependence assumptions. The results show no sign of time heterogeneity, so there is no need to introduce heterogeneity into the conditional mean function unless new evidence for it is found. In order to remedy the violations detected by the M-S tests, I apply the St-AR(3,3;5) model.


3.3.4.2 The St-AR(3,3;5) Model

The results of the estimation and the M-S tests are shown in part (a) and part (b) of Table 3.10, respectively.

Table 3.10: Estimation and M-S Tests Results of St-AR(3,3;5) for TWD REER Index

(a) Estimation

Parameters    Estimates
1             −0.05 (0.03)
y_{t−1}       0.35 (0.04)∗
y_{t−2}       −0.01 (0.06)
y_{t−3}       −0.15 (0.07)∗
σ^2           0.64 (0.05)∗
l11           0.75 (0.06)∗
l21           0.26 (0.04)∗
l31           0.03 (0.04)

(b) M-S Tests

Tests        Statistics      Conclusion
Linearity    −0.20 (0.84)    Not Rejected

[3]: ∗ significant at 5%

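The mean function takes the form

\[
E(y_t \mid y_{t-1}, y_{t-2}, y_{t-3}) = \underset{(0.03)}{-0.05} + \underset{(0.04)}{0.35^{*}}\,y_{t-1} - \underset{(0.06)}{0.01}\,y_{t-2} - \underset{(0.07)}{0.15^{*}}\,y_{t-3}
\]

The variance function takes the form

\[
var(y_t \mid y_{t-1}, y_{t-2}, y_{t-3}) = \underset{(0.05)}{0.64^{*}} \left[1 + \frac{1}{5}\,(y_{t-1}^{t-3} - \mu_3)'\,\Sigma_{22}^{-1}\,(y_{t-1}^{t-3} - \mu_3)\right]
\]

where

\[
\mu_3 =
\begin{pmatrix}
\underset{(0.04)}{0.06} \\
\underset{(0.04)}{0.06} \\
\underset{(0.04)}{0.06}
\end{pmatrix}
\]

and

\[
\Sigma_{22} =
\begin{pmatrix}
\underset{(0.06)}{0.75^{*}} & \underset{(0.04)}{0.26^{*}} & \underset{(0.04)}{0.03} \\
\underset{(0.04)}{0.26^{*}} & \underset{(0.06)}{0.75^{*}} & \underset{(0.04)}{0.26^{*}} \\
\underset{(0.04)}{0.03} & \underset{(0.04)}{0.26^{*}} & \underset{(0.06)}{0.75^{*}}
\end{pmatrix}
\]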

Table 3.10 indicates no departure from the statistical assumptions. There are several important findings from the St-AR(3,3;5) model. First, the constant term remains negative and insignificant. Second, as in the AR(2) model, the coefficient of the first lag is significantly positive, but its magnitude increases from 0.25 to 0.35. Third, the coefficient of the third lag, which is not included in the AR(2) model, is significantly negative. Fourth, the departures from the underlying Distribution and Dependence assumptions disappear in the St-AR model. Moreover, we find no evidence of heterogeneity, so it is unnecessary to introduce a heterogeneity function.

Figure 3.10 and Figure 3.11 show the unadjusted and adjusted fitted values of the St-AR(3,3;5) model for the TWD REER Index. Figure 3.12 shows the fitted conditional variance of the St-AR(3,3;5) model.

[Figure 3.10: Fitted Values of TWD REER Index. Horizontal axis: Time Index; vertical axis: Predicted Values.]

[Figure 3.11: Adjusted Fitted Values of TWD REER Index. Horizontal axis: Time Index; vertical axis: Adjusted Predicted Values.]

[Figure 3.12: Fitted Conditional Variance of TWD REER Index. Horizontal axis: Time Index; vertical axis: Conditional Variance.]

3.4 Conclusion

In this chapter, I discuss a few specifications for modeling univariate volatility based on the PR approach. In particular, these specifications are no longer ad hoc and allow the data structure to play an important role by imposing three categories of reduction assumptions on the joint distribution of all observations: (D) Distribution, (M) Dependence and (H) Heterogeneity. These specifications improve on the ARCH type models in two ways: first, there is no need for complicated parametric restrictions to guarantee the positivity of the conditional variance and the existence of higher order moments; second, the interrelationship between the first two conditional moments is taken into consideration.

The empirical applications suggest that the proposed Student’s t family of univariate volatility models provides an alternative way to describe and forecast the behavior of speculative price series. The empirical results of estimation and misspecification testing for three mainstream currencies in Asia lead to the following conclusions. The RMB Real Effective Exchange Rate Index is well captured by a 3rd order Heterogeneous St-AR(3,3;4) model, in which the historical information in three lags and a cubic time heterogeneity in the conditional mean are responsible for the properties of the volatility. The Hong Kong Dollar (HKD) Real Effective Exchange Rate Index can be captured by a St-AR(3,3;4) model. Similarly, the Taiwan Dollar (TWD) Real Effective Exchange Rate Index can be described by a St-AR(3,3;5) model. In the latter two applications, the dynamic heteroskedasticity depends on the historical information in the previous lags.


3.5 Appendix

3.5.1 Proof of Proposition 3.1

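Let Zt be a random vector of the form

\[
Z_t = \begin{pmatrix} Y_t \\ X_t \end{pmatrix}
\]

where the dimensions of the vectors are

\[ Z_t : k \times 1, \quad Y_t : k_1 \times 1, \quad X_t : k_2 \times 1 \]

with k = k1 + k2. Based on the assumption that Zt follows a St(µ, Σ; υ) distribution, the vector process can be written as

\[
Z_t \sim St(\mu, \Sigma; \upsilon) = St\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{21}' \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix};
\upsilon \right)
\]

where the dimensions of the vectors and matrices are

\[ \mu : k \times 1, \quad \mu_1 : k_1 \times 1, \quad \mu_2 : k_2 \times 1 \]

\[ \Sigma : k \times k, \quad \Sigma_{11} : k_1 \times k_1, \quad \Sigma_{21} : k_2 \times k_1, \quad \Sigma_{22} : k_2 \times k_2 \]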

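The density function of Zt is

\[
D(Z_t) = \frac{\Gamma(\frac{1}{2}(\upsilon + k))\,|\Sigma|^{-\frac{1}{2}}}{\Gamma(\frac{1}{2}\upsilon)\,(\pi\upsilon)^{\frac{1}{2}k}}
\left(1 + \frac{1}{\upsilon}(Z_t - \mu)'\Sigma^{-1}(Z_t - \mu)\right)^{-\frac{1}{2}(\upsilon + k)} \tag{3.15}
\]

The density function of Xt is

\[
D(X_t) = \frac{\Gamma(\frac{1}{2}(\upsilon + k_2))\,|\Sigma_{22}|^{-\frac{1}{2}}}{\Gamma(\frac{1}{2}\upsilon)\,(\pi\upsilon)^{\frac{1}{2}k_2}}
\left(1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)\right)^{-\frac{1}{2}(\upsilon + k_2)} \tag{3.16}
\]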

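The conditional density function of (Yt|Xt) can be written as

\[
D(Y_t \mid X_t) = D(Z_t)/D(X_t)
\]

\[
= \frac{\dfrac{\Gamma(\frac{1}{2}(\upsilon + k))\,|\Sigma|^{-\frac{1}{2}}}{\Gamma(\frac{1}{2}\upsilon)\,(\pi\upsilon)^{\frac{1}{2}k}}
\left(1 + \frac{1}{\upsilon}(Z_t - \mu)'\Sigma^{-1}(Z_t - \mu)\right)^{-\frac{1}{2}(\upsilon + k)}}
{\dfrac{\Gamma(\frac{1}{2}(\upsilon + k_2))\,|\Sigma_{22}|^{-\frac{1}{2}}}{\Gamma(\frac{1}{2}\upsilon)\,(\pi\upsilon)^{\frac{1}{2}k_2}}
\left(1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)\right)^{-\frac{1}{2}(\upsilon + k_2)}}
\]

\[
= \frac{\Gamma(\frac{1}{2}(\upsilon + k))}{\Gamma(\frac{1}{2}(\upsilon + k_2))\,(\pi\upsilon)^{\frac{1}{2}k_1}}
\left(\frac{|\Sigma|}{|\Sigma_{22}|}\right)^{-\frac{1}{2}}
\frac{\left(1 + \frac{1}{\upsilon}(Z_t - \mu)'\Sigma^{-1}(Z_t - \mu)\right)^{-\frac{1}{2}(\upsilon + k)}}
{\left(1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)\right)^{-\frac{1}{2}(\upsilon + k) + \frac{1}{2}k_1}}
\]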

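We simplify this function as follows. Using results from Searle (1982), the second term can be rewritten as

\[
\left(\frac{|\Sigma|}{|\Sigma_{22}|}\right)^{-\frac{1}{2}}
= \left(\frac{|\Sigma_{22}|\,|\Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}|}{|\Sigma_{22}|}\right)^{-\frac{1}{2}}
= |\sigma^2|^{-\frac{1}{2}}
\]

where $\sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}$.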

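Next consider the third term:

\[
\frac{\left(1 + \frac{1}{\upsilon}(Z_t - \mu)'\Sigma^{-1}(Z_t - \mu)\right)^{-\frac{1}{2}(\upsilon + k)}}
{\left(1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)\right)^{-\frac{1}{2}(\upsilon + k) + \frac{1}{2}k_1}}
= \left(\frac{1 + \frac{1}{\upsilon}(Z_t - \mu)'\Sigma^{-1}(Z_t - \mu)}{1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)}\right)^{-\frac{1}{2}(\upsilon + k)}
\times \left(1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)\right)^{-\frac{1}{2}k_1}
\]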

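Let $c_t = 1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)$. Note that $c_t$ is a scalar. Using results from Searle (1982), the numerator of the left part can be rewritten as

\[
\begin{aligned}
& 1 + \frac{1}{\upsilon}
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix}'
\left[
\begin{pmatrix} 0 & 0' \\ 0 & \Sigma_{22}^{-1} \end{pmatrix}
+ \begin{pmatrix} I \\ -\Sigma_{22}^{-1}\Sigma_{21} \end{pmatrix}
(\sigma^2)^{-1}
\begin{pmatrix} I & -\Sigma_{21}'\Sigma_{22}^{-1} \end{pmatrix}
\right]
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix} \\
&= c_t + \frac{1}{\upsilon}
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix}'
\begin{pmatrix} I \\ -\beta \end{pmatrix}
(\sigma^2)^{-1}
\begin{pmatrix} I & -\beta' \end{pmatrix}
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix} \\
&= c_t + \frac{u_t'(\sigma^2)^{-1}u_t}{\upsilon}
\end{aligned}
\]

where $\beta = \Sigma_{22}^{-1}\Sigma_{21}$ and $u_t = Y_t - \mu_1 - \beta'(X_t - \mu_2) = Y_t - \beta_0 - \beta'X_t$, with $\beta_0 = \mu_1 - \beta'\mu_2$. Moreover, the denominator of the left part is $c_t$ and the right part is $c_t^{-\frac{1}{2}k_1}$. So the third term can be written as

\[
\left(1 + \frac{u_t'(\sigma^2)^{-1}u_t}{\upsilon c_t}\right)^{-\frac{1}{2}(\upsilon + k)} c_t^{-\frac{1}{2}k_1}
\]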

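With the modified second and third terms, the conditional density function of $(Y_t|X_t)$ can be written as

\[
\begin{aligned}
D(Y_t|X_t) &= \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right)}
(\pi\upsilon)^{-\frac{1}{2}k_1}\,|\sigma^2|^{-\frac{1}{2}}\,c_t^{-\frac{1}{2}k_1}
\left(1 + \frac{u_t'(\sigma^2)^{-1}u_t}{\upsilon c_t}\right)^{-\frac{1}{2}(\upsilon + k)} \\
&= \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right)}
\left(\pi\upsilon \times \frac{\upsilon + k_2}{\upsilon}\right)^{-\frac{1}{2}k_1}
\left(c_t \times \frac{\upsilon}{\upsilon + k_2}\right)^{-\frac{1}{2}k_1}
|\sigma^2|^{-\frac{1}{2}}
\left(1 + \frac{1}{\upsilon + k_2}\,u_t'\left(\frac{\upsilon c_t \sigma^2}{\upsilon + k_2}\right)^{-1}u_t\right)^{-\frac{1}{2}(\upsilon + k)}
\end{aligned}
\]

Letting $\upsilon + k_2 = \upsilon^*$, so that $\upsilon + k = \upsilon^* + k_1$, this becomes

\[
D(Y_t|X_t) = \frac{\Gamma\left(\frac{1}{2}(\upsilon^* + k_1)\right)}{\Gamma\left(\frac{1}{2}\upsilon^*\right)(\pi\upsilon^*)^{\frac{1}{2}k_1}}
\left|\frac{\upsilon c_t \sigma^2}{\upsilon^*}\right|^{-\frac{1}{2}}
\left(1 + \frac{1}{\upsilon^*}\,u_t'\left(\frac{\upsilon c_t \sigma^2}{\upsilon^*}\right)^{-1}u_t\right)^{-\frac{1}{2}(\upsilon^* + k_1)}
\]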

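Letting $\beta_0 + \beta'X_t = \mu^*$ and $\dfrac{\upsilon c_t \sigma^2}{\upsilon^*} = \Sigma^*$, we obtain the conditional density function

\[
D(Y_t|X_t) = \frac{\Gamma\left(\frac{1}{2}(\upsilon^* + k_1)\right)}{\Gamma\left(\frac{1}{2}\upsilon^*\right)(\pi\upsilon^*)^{\frac{1}{2}k_1}}
|\Sigma^*|^{-\frac{1}{2}}
\left(1 + \frac{1}{\upsilon^*}(Y_t - \mu^*)'(\Sigma^*)^{-1}(Y_t - \mu^*)\right)^{-\frac{1}{2}(\upsilon^* + k_1)}
\tag{3.17}
\]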

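This function directly gives rise to

\[
(Y_t|X_t) \sim St(\mu^*, \Sigma^*; \upsilon^*)
\]

with the first two conditional moments:

\[
E(Y_t|X_t) = \mu^* = \beta_0 + \beta'X_t
\]

\[
Var(Y_t|X_t) = \frac{\upsilon^*}{\upsilon^* - 2}\Sigma^*
= \frac{\upsilon^*}{\upsilon^* - 2} \cdot \frac{\upsilon c_t \sigma^2}{\upsilon^*}
= \frac{\upsilon}{\upsilon + k_2 - 2} \times \sigma^2
\left(1 + \frac{1}{\upsilon}(X_t - \mu_2)'\Sigma_{22}^{-1}(X_t - \mu_2)\right)
\]

Replacing $Y_t$ with $y_t$ and $X_t$ with $y^{t-p}_{t-1}$ yields Proposition 3.1.

QED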
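The result of the proof can be checked numerically. The sketch below is only an illustration under assumed values: the function names `st_logpdf` and `conditional_parameters` and all numbers are hypothetical and do not come from the dissertation. It evaluates the log of $D(Z_t)/D(X_t)$ with the density (3.15) and compares it with the log-density of $St(\mu^*, \Sigma^*; \upsilon^*)$ from (3.17).

```python
import math
import numpy as np

def st_logpdf(x, mu, Sigma, nu):
    """Log-density of St(mu, Sigma; nu), parameterized as in (3.15)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    k = x.size
    d = x - mu
    quad = float(d @ np.linalg.solve(Sigma, d))
    _, logdet = np.linalg.slogdet(Sigma)
    return (math.lgamma(0.5 * (nu + k)) - math.lgamma(0.5 * nu)
            - 0.5 * k * math.log(math.pi * nu) - 0.5 * logdet
            - 0.5 * (nu + k) * math.log1p(quad / nu))

def conditional_parameters(x, mu, Sigma, nu, k1):
    """Return (mu*, Sigma*, nu*) of (Y | X = x) as stated in (3.17)."""
    mu1, mu2 = mu[:k1], mu[k1:]
    S11, S21, S22 = Sigma[:k1, :k1], Sigma[k1:, :k1], Sigma[k1:, k1:]
    beta = np.linalg.solve(S22, S21)       # beta = S22^{-1} S21
    sigma2 = S11 - S21.T @ beta            # sigma^2 = S11 - S21' S22^{-1} S21
    d = x - mu2
    c_t = 1.0 + float(d @ np.linalg.solve(S22, d)) / nu
    nu_star = nu + mu2.size
    return mu1 + beta.T @ d, nu * c_t * sigma2 / nu_star, nu_star

# Illustrative values only (k1 = 1, k2 = 2); not taken from the data.
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
nu, k1 = 5.0, 1
z = np.array([1.2, 0.4, 1.1])
y, x = z[:k1], z[k1:]

lhs = st_logpdf(z, mu, Sigma, nu) - st_logpdf(x, mu[k1:], Sigma[k1:, k1:], nu)
mu_star, Sigma_star, nu_star = conditional_parameters(x, mu, Sigma, nu, k1)
rhs = st_logpdf(y, mu_star, Sigma_star, nu_star)
print(abs(lhs - rhs))  # should be at floating-point rounding level
```

The check relies only on the partitioned-matrix identities used in the proof, so it should hold for any positive definite $\Sigma$ and any $\upsilon > 0$.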


3.5.2 Derivation of the Maximum Likelihood Function

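In order to obtain the likelihood function, we substitute the functional forms of $D(Y_t|X_t; \varphi_1)$ in (3.17) and $D(X_t; \varphi_2)$ in (3.16) into

\[
D(\Delta_1, \dots, \Delta_T) = \prod_{t=1}^{T} D(Y_t|X_t; \varphi_1)\,D(X_t; \varphi_2)
\tag{3.18}
\]

The factor $\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right)$ in the numerator of (3.16) cancels against the same factor in the denominator of (3.17), so the joint density function takes the form

\[
\begin{aligned}
D(\Delta_1, \dots, \Delta_T) &= \prod_{t=1}^{T}
\left(\frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}\upsilon\right)(\pi\upsilon)^{\frac{1}{2}k}}
|\sigma^2|^{-\frac{1}{2}}\,|\Sigma_{22}|^{-\frac{1}{2}}\,c_t^{-\frac{1}{2}(\upsilon + k)}
\left(1 + \frac{u_t'(\sigma^2)^{-1}u_t}{\upsilon c_t}\right)^{-\frac{1}{2}(\upsilon + k)}\right) \\
&= \prod_{t=1}^{T}\left(C_0\,|\sigma^2|^{-\frac{1}{2}}\,|\Sigma_{22}^{-1}|^{\frac{1}{2}}\,\gamma_t^{-\frac{1}{2}(\upsilon + k)}\right)
\end{aligned}
\]

where

\[
C_0 = \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}\upsilon\right)(\pi\upsilon)^{\frac{1}{2}k}},
\qquad
\gamma_t = c_t + \frac{u_t'(\sigma^2)^{-1}u_t}{\upsilon}
\]

In the St-AR case, $k_1 = 1$, $k_2 = p$, $\sigma^2$ is a positive scalar and $|\sigma^2| = \sigma^2$. Therefore, the log-likelihood function can be written as

\[
\ln L(\Delta_1, \dots, \Delta_T; \varphi) \propto C + \frac{T}{2}\ln|\Sigma_{22}^{-1}| - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2}(\upsilon + p + 1)\sum_{t=1}^{T}\ln(\gamma_t)
\tag{3.19}
\]

where $C = T\ln C_0$.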

Chapter 4

Student’s t Family of Multivariate Volatility Models

4.1 Introduction

In the previous chapter, I discussed volatility models for separate univariate time series, where the first and second order moments of each series depend on its own history. It is well documented that financial time series often move together over time. The need to recognize and understand the interrelationships among multiple time series motivated the development of multivariate volatility models. The ARCH type models are the most commonly used tools for analyzing the conditional variance of a univariate time series, and they can be extended to handle the relations between the volatilities and co-volatilities of several return series. A survey article by Bauwens, Laurent, and Rombouts (2006) provides a detailed review of the most important generalizations of ARCH type univariate volatility models to the multivariate case.

Consider the multivariate return series $y_t$ of dimension $N \times 1$. We consider the first two conditional moments of $y_t$. As in the univariate case, we condition on the sigma-field generated by the past information up to time $t - 1$, denoted by $\mathcal{F}_{t-1}$. The mean equation of the return can be written as

\[
y_t = \mu_t + a_t
\]

where $\mu_t$ represents the vector of conditional means of the returns, and $a_t$ represents the vector of shocks, or innovations. In general, $\mu_t$ depends on a set of unknown parameters $\theta_1$ and the history of the multivariate time series. For most return series, the conditional mean can be written as a vector ARMA structure, sometimes with relevant exogenous variables:

\[
E(y_t|\mathcal{F}_{t-1}) = \mu_t = \Gamma X_t + \sum_{i=1}^{p}\Phi_i y_{t-i} + \sum_{i=1}^{q}\Theta_i a_{t-i}
\tag{4.1}
\]

where $X_t$ denotes the exogenous variables, $p$ and $q$ are non-negative integers, and $\Gamma$, $\Phi_i$ and $\Theta_i$ are matrices of parameters. The conditional variance of the return $y_t$, given $\mathcal{F}_{t-1}$, can be written as

\[
Cov(a_t|\mathcal{F}_{t-1}) = H_t
\tag{4.2}
\]

where $H_t$ is an $N \times N$ positive definite matrix that depends on a set of unknown parameters $\theta_2$. In most cases, the parameters in the conditional mean, $\theta_1$, and the parameters in the conditional variance, $\theta_2$, are treated as two separate sets. Even in GARCH-in-mean models, where $\mu_t$ also depends on $H_t$, no interrelationship between the two sets of parameters is modeled. The key task of multivariate volatility modeling is to find appropriate and parsimonious specifications for $\mu_t$ and $H_t$ that are capable of capturing the statistical features of the data. The ARCH type multivariate models provide a number of different specifications of $H_t$. In their survey, Bauwens, Laurent, and Rombouts (2006) distinguish three approaches to constructing multivariate GARCH models: (i) direct generalizations of the univariate GARCH model of Bollerslev (1986); (ii) linear combinations of univariate GARCH models; (iii) nonlinear combinations of univariate GARCH models.

Although many different multivariate ARCH type models have been proposed, some important issues remain unresolved. In particular, the multivariate ARCH type models suffer from the same limitations as their univariate counterparts: (i) ad-hoc specification; (ii) parameter restrictions to ensure the positivity of the conditional variance; (iii) ignoring the interrelationship between the two sets of parameters in the conditional mean and conditional variance. In the multivariate case these limitations lead to misspecifications that are even more severe than in the univariate case, and the consequences are quite unpredictable. Apart from these three limitations, a specific issue for multivariate ARCH type models is that the number of parameters increases rapidly with the number of series. Many specifications are tractable only in the bivariate case. In practice, many models impose assumptions to simplify the dynamic structure of the conditional variance matrix, and these assumptions turn out to be unrealistic. These issues naturally motivate a new family of volatility models based on the PR approach, which is the main topic of the following sections.


In this chapter I propose an alternative approach to modeling multivariate volatility in speculative prices, the Student’s t Vector Autoregressive (St-VAR) Model, which follows the PR methodology. I extend the traditional Normal/Linear/Homoskedastic VAR in the direction of the Student’s t distribution. As in the previous chapter, the main motivation behind this choice is the desire to develop models that capture the stylized facts of leptokurticity, non-linear dependence and heteroskedasticity observed in speculative price data. I also extend the St-VAR model to the Heterogeneous St-VAR model by including time-related heterogeneity in the conditional mean.

4.2 Student’s t Multivariate Volatility Models

4.2.1 St-VAR Model

Consider the observations $Z_t = (y_{1t}, \dots, y_{Nt})'$, $t \in T$. From the perspective of the PR approach, the probabilistic structure of the process $Z_t$ can be summarized by imposing three categories of reduction assumptions on the joint distribution $D(Z_1, \dots, Z_T)$. In particular, the reduction assumptions are:

(D) Student’s t    (M) Markov(p)    (H) Second Order Stationarity

In light of the Dependence (M) assumption and the Homogeneity (H) assumption, the joint distribution of $(Z_1, \dots, Z_T)$ can be simplified as

\[
\begin{aligned}
D(Z_1, \dots, Z_T) &= D(Z_1; \varphi_0(1)) \prod_{t=2}^{T} D(Z_t|Z^{1}_{t-1}; \varphi_1(t)) \\
&= D(Z_1, \dots, Z_p; \varphi_0(p)) \prod_{t=p+1}^{T} D(Z_t|Z^{t-p}_{t-1}; \varphi_1(t)) \\
&= D(Z_1, \dots, Z_p; \varphi_0(p)) \prod_{t=p+1}^{T} D(Z_t|Z^{t-p}_{t-1}; \varphi_1)
\end{aligned}
\]

where $Z^{t-p}_{t-1} = (Z_{t-1}, \dots, Z_{t-p})$, $\varphi_0(1)$ denotes the parameters of the distribution $D(Z_1)$, $\varphi_0(p)$ denotes the parameters of the joint distribution $D(Z_1, \dots, Z_p)$, and $\varphi_1(t)$ denotes the parameters of the conditional distribution $D(Z_t|Z^{t-p}_{t-1})$ at time $t$. The first equality indicates that the joint distribution can be decomposed into the product of a marginal distribution and $(T - 1)$ conditional distributions. The second equality is based on the Dependence (M) assumption of Markov(p), which allows us to reduce the conditioning information set to $Z^{t-p}_{t-1}$ for any $t > p$. The Homogeneity (H) assumption of second order invariance leads to the third equality, in which the parameters are constant over time. When $T$ is large and $p$ is small, the contribution of $D(Z_1, \dots, Z_p; \varphi_0(p))$ to the whole function is negligible. This procedure significantly reduces the number of unknown parameters to be estimated. In order to obtain an explicit expression for the parameters $\varphi_1$, we need to consider the joint distribution of

\[
\Delta_t = \begin{pmatrix} Z_t \\ Z_{t-1} \\ \vdots \\ Z_{t-p} \end{pmatrix}
\sim St\left(
\begin{pmatrix} \mu_Z \\ \mu_Z \\ \vdots \\ \mu_Z \end{pmatrix},
\begin{pmatrix}
\Sigma_0 & \Sigma_1 & \cdots & \Sigma_p \\
\Sigma_1' & \Sigma_0 & \cdots & \Sigma_{p-1} \\
\vdots & \vdots & \ddots & \vdots \\
\Sigma_p' & \Sigma_{p-1}' & \cdots & \Sigma_0
\end{pmatrix}; \upsilon
\right)
\tag{4.3}
\]

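where $\upsilon$ is the degrees of freedom parameter and $\Sigma_n = cov(Z_{t_1}, Z_{t_2})$ for any $t_1 - t_2 = n$. In particular,

\[
\mu_Z = (E(y_{1t}), E(y_{2t}), \dots, E(y_{Nt}))'
\]

\[
\Sigma_n = \begin{pmatrix}
cov(y_{1t_1}, y_{1t_2}) & cov(y_{1t_1}, y_{2t_2}) & \cdots & cov(y_{1t_1}, y_{Nt_2}) \\
cov(y_{2t_1}, y_{1t_2}) & cov(y_{2t_1}, y_{2t_2}) & \cdots & cov(y_{2t_1}, y_{Nt_2}) \\
\vdots & \vdots & \ddots & \vdots \\
cov(y_{Nt_1}, y_{1t_2}) & cov(y_{Nt_1}, y_{2t_2}) & \cdots & cov(y_{Nt_1}, y_{Nt_2})
\end{pmatrix}
\]

where $t_1 - t_2 = n$.

Distribution (4.3) can be written compactly as

\[
\Delta_t \sim St(\mu, \Sigma; \upsilon)
\tag{4.4}
\]

where

\[
\mu = (\mu_1', \mu_2')', \qquad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{21}' \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\]

The dimensions of the vectors and matrices used above are as follows:

\[
\mu : m \times 1, \quad \mu_1 : N \times 1, \quad \mu_2 : Np \times 1, \quad \Sigma : m \times m, \quad \Sigma_{11} : N \times N, \quad
\Sigma_{21} : Np \times N, \quad \Sigma_{22} : Np \times Np, \quad m = N \times (p + 1).
\]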


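The following proposition provides the joint distribution, conditional distribution and marginal distribution of $\Delta_t = (Z_t', Z_{t-1}', \dots, Z_{t-p}')'$.

Proposition 4.1 Suppose that $\Delta_t = (Z_t', Z_{t-1}', \dots, Z_{t-p}')' \sim St(\mu, \Sigma; \upsilon)$. Then the joint distribution, conditional distribution and marginal distribution of $\Delta_t$ for all $t \in T$ can be written as

\[
\begin{aligned}
D(\Delta_t; \Theta) &= D(Z_t, Z^{t-p}_{t-1}; \varphi) = D(Z_t|Z^{t-p}_{t-1}; \varphi_1)\,D(Z^{t-p}_{t-1}; \varphi_2) \sim St(\mu, \Sigma; \upsilon) \\
D(Z_t|Z^{t-p}_{t-1}; \varphi_1) &\sim St(B_0 + B'Z^{t-p}_{t-1}, \Omega_t; \upsilon + Np) \\
D(Z^{t-p}_{t-1}; \varphi_2) &\sim St(\mu_2, \Sigma_{22}; \upsilon)
\end{aligned}
\]

where

\[
\varphi_1 = \{B_0, B, \sigma^2, \Sigma_{22}, \mu_1, \mu_2\}, \quad \varphi_2 = \{\mu_2, \Sigma_{22}\}, \quad \varphi = (\varphi_1, \varphi_2),
\]

\[
\Omega_t = \frac{\upsilon}{\upsilon + Np}\,\sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2)'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2)\right),
\]

\[
B_0 = \mu_1 - B'\mu_2, \quad B = \Sigma_{22}^{-1}\Sigma_{21}, \quad \sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}
\]

The proof of Proposition 4.1 is provided in section 4.4.1. Proposition 4.1 provides explicit parameterizations of the Student’s t VAR model. The full specification of the St-VAR model is presented in Table 4.1.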


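Table 4.1: Student’s t Vector Autoregressive Model

Mean Equation

\[
Z_t = B_0 + B'Z^{t-p}_{t-1} + u_t, \qquad (u_t|\mathcal{F}^{t-p}_{t-1}) \sim St(0, \Omega_t; \upsilon + Np)
\]

Skedastic Equation

\[
\Omega_t = \left(\frac{\upsilon}{\upsilon + Np - 2}\right) \times \sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2)'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2)\right)
\]

where $B = \Sigma_{22}^{-1}\Sigma_{21}$, $B_0 = \mu_1 - B'\mu_2$, $\sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}$, and $\mathcal{F}^{t-p}_{t-1} = \sigma(Z^{t-p}_{t-1})$ represents the conditioning set.

(D) → [1] $D(Z_t|\mathcal{F}^{t-p}_{t-1})$ is Student’s t
      [2] $E(Z_t|\mathcal{F}^{t-p}_{t-1})$ is linear in $Z^{t-p}_{t-1}$
      [3] $Cov(Z_t|\mathcal{F}^{t-p}_{t-1}) = \left(\frac{\upsilon}{\upsilon + Np - 2}\right) \times \sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2)'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2)\right)$

(H) → [5] $\Theta = (\mu, B_0, B, \sigma^2, \Sigma_{22})$ are t-invariant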
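To make the parameterization concrete, the following minimal sketch computes the conditional mean and the conditional covariance (the skedastic function of Table 4.1) from a given $\mu$, $\Sigma$ and $\upsilon$. The function name `stvar_conditional_moments` and the numerical values are hypothetical and chosen only for illustration; the lags are assumed to be stacked in the same order as the last $Np$ entries of $\Delta_t$.

```python
import numpy as np

def stvar_conditional_moments(mu, Sigma, nu, N, z_past):
    """Conditional mean and covariance of Z_t given the stacked lags z_past.

    mu (m,) and Sigma (m, m) describe Delta_t; the first N entries belong
    to Z_t and the remaining N*p entries to the lags.
    """
    mu1, mu2 = mu[:N], mu[N:]
    S11, S21, S22 = Sigma[:N, :N], Sigma[N:, :N], Sigma[N:, N:]
    Np = mu2.size
    B = np.linalg.solve(S22, S21)          # B = S22^{-1} S21
    B0 = mu1 - B.T @ mu2                   # B0 = mu1 - B' mu2
    sigma2 = S11 - S21.T @ B               # sigma^2 = S11 - S21' S22^{-1} S21
    d = z_past - mu2
    c_t = 1.0 + float(d @ np.linalg.solve(S22, d)) / nu
    mean = B0 + B.T @ z_past
    cov = nu / (nu + Np - 2.0) * sigma2 * c_t
    return mean, cov

# Hypothetical scalar example: N = 1, p = 1, nu = 5.
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
mean_calm, cov_calm = stvar_conditional_moments(mu, Sigma, 5.0, 1, np.array([0.0]))
mean_far, cov_far = stvar_conditional_moments(mu, Sigma, 5.0, 1, np.array([2.0]))
print(cov_calm, cov_far)
```

In this example the conditional variance is larger when the lagged value is far from its mean than when it equals the mean, which is the mechanism through which the model generates dynamic heteroskedasticity.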

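To estimate the St-VAR model, one should use the maximum likelihood method. Under stationarity, the log-likelihood function for the St-VAR model can be written in terms of the recursive decomposition of $D(\Delta_t; \varphi)$. By substituting the functional forms of $D(Z_t|Z^{t-p}_{t-1}; \varphi_1)$ and $D(Z^{t-p}_{t-1}; \varphi_2)$ into

\[
D(\Delta_1, \dots, \Delta_T) = D(\Delta_1, \dots, \Delta_p) \prod_{t=p+1}^{T} D(Z_t|Z^{t-p}_{t-1}; \varphi_1)\,D(Z^{t-p}_{t-1}; \varphi_2)
\]

one can obtain the log-likelihood function. In particular, ignoring the first $p$ initial conditions, the log-likelihood function takes the form

\[
\ln L(\Delta_1, \dots, \Delta_T; \varphi) \propto C + \frac{T}{2}\ln|\Sigma_{22}^{-1}| - \frac{T}{2}\ln(|\sigma^2|) - \frac{1}{2}(\upsilon + m + 1)\sum_{t=1}^{T}\ln(\gamma_t^2)
\tag{4.5}
\]

where

\[
\begin{aligned}
C &= T\ln\Gamma\left[\tfrac{1}{2}(\upsilon + m + 1)\right] - T\ln(\pi\upsilon) - T\ln\Gamma\left(\tfrac{1}{2}\upsilon\right) \\
\gamma_t &= c_t + \tfrac{1}{\upsilon}\left(u_t'(\sigma^2)^{-1}u_t\right) \\
c_t &= 1 + \tfrac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2)'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2) \\
u_t &= Z_t - B_0 - B'Z^{t-p}_{t-1}
\end{aligned}
\]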

See the derivation in section 4.4.2 for more details. The maximum likelihood estimates (MLE) of the parameters $\mu$ and $\Sigma$ are obtained by maximizing the log-likelihood function (4.5). We then obtain the estimates of $\Theta = (B_0, B, \mu_2, \Sigma_{22}, \sigma^2)$. Asymptotic standard errors for the estimates are obtained from the inverse of the final Hessian and derived using the delta method. Since the first order conditions are non-linear, a numerical procedure is required. It is also worth pointing out that the matrix $\Sigma$ has some important properties. First, it must be symmetric and positive definite, since it is the variance-covariance matrix of $\Delta_t$. Second, it can be partitioned into $(p + 1) \times (p + 1)$ blocks, each of dimension $N \times N$. These blocks are the covariance matrices of $Z_{t_1}$ and $Z_{t_2}$, which are symmetric only for $t_1 = t_2$. In order to guarantee these two properties of $\Sigma$ while estimating the parameters, we can factorize $\Sigma$ as the product of two matrices, that is, $\Sigma = LL'$, where $L$ is an $m \times (2m - 1)$ matrix. For example, when $N = 3$ and $p = 2$, $m = N \times (p + 1) = 9$, and $L$ takes the form:

L =

a1 a2 a3 a4 a5 a6 a7 a8 a9 0 0 0 0 0 0 0 0

0 b1 b2 b3 b4 b5 b6 b7 b8 b9 0 0 0 0 0 0 0

0 0 c1 c2 c3 c4 c5 c6 c7 c8 c9 0 0 0 0 0 0

0 0 0 a1 a2 a3 a4 a5 a6 a7 a8 a9 0 0 0 0 0

0 0 0 0 b1 b2 b3 b4 b5 b6 b7 b8 b9 0 0 0 0

0 0 0 0 0 c1 c2 c3 c4 c5 c6 c7 c8 c9 0 0 0

0 0 0 0 0 0 a1 a2 a3 a4 a5 a6 a7 a8 a9 0 0

0 0 0 0 0 0 0 b1 b2 b3 b4 b5 b6 b7 b8 b9 0

0 0 0 0 0 0 0 0 c1 c2 c3 c4 c5 c6 c7 c8 c9

(9×17)

(4.6)
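The effect of this factorization can be illustrated numerically. The sketch below is an illustration under assumptions: the helper name `build_L` and the random parameter values are hypothetical. It builds a matrix with the banded pattern of (4.6), in which row $r$ holds the parameter vector of variable $r \bmod N$ shifted $r$ columns to the right, and then forms the product $LL'$, which is the ordering that yields an $m \times m$ matrix.

```python
import numpy as np

def build_L(rows, p):
    """Build the m x (2m - 1) factor with the banded pattern of (4.6).

    rows : (N, m) array; rows[j] holds the m free parameters of variable j
           (the a's, b's and c's in the N = 3 example).
    """
    N, m = rows.shape
    assert m == N * (p + 1)
    L = np.zeros((m, 2 * m - 1))
    for r in range(m):
        L[r, r:r + m] = rows[r % N]   # same parameters, shifted r columns
    return L

N, p = 3, 2
m = N * (p + 1)
rng = np.random.default_rng(0)
rows = rng.normal(size=(N, m))        # hypothetical parameter values
L = build_L(rows, p)
Sigma = L @ L.T                       # m x m by construction

# Blocks on the same block-diagonal coincide (block Toeplitz pattern of (4.3)).
same_diag = np.allclose(Sigma[0:3, 0:3], Sigma[3:6, 3:6])
same_offdiag = np.allclose(Sigma[0:3, 3:6], Sigma[3:6, 6:9])
print(L.shape, same_diag, same_offdiag)
```

Because every entry of $LL'$ depends only on the two variables involved and on the distance between the rows, the blocks repeat along each block-diagonal, as required by (4.3), while symmetry and positive semi-definiteness hold automatically for any parameter values.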

4.2.2 Heterogeneous St-VAR Model

The idea of extending the St-VAR model to the Heterogeneous St-VAR model closely resembles the extension from the St-AR model to the Heterogeneous St-AR model. In fact, compared with the Heterogeneous St-AR model, the Heterogeneous St-VAR model offers a more flexible way to capture the heterogeneity exhibited in time series data. In particular, the mean functions of the $N$ variables considered can be designed to follow different processes. The major difference between a St-VAR model and a Heterogeneous St-VAR model is the additional assumption that the mean of the variables is time-varying, described by

\[
\mu_Z = g(t)
\tag{4.7}
\]

where $g(\cdot)$ is an $N \times 1$ parametric vector function. Two simple examples are the linear and the quadratic function:

\[
\mu(t) = \gamma_0 + \gamma_1 t
\]

\[
\mu(t) = \gamma_0 + \gamma_1 t + \gamma_2 t^2
\]

where the $\gamma_i$ are $N \times 1$ vectors of parameters to be estimated. Note that the two vector functions presented here imply that the $N$ variables follow processes with the same functional form, although it is possible for the means of the variables to follow different functional forms, or for some of them to be constant over time. For simplicity, in what follows I continue to use the vector functions. The introduction of a time-varying $E(y_t)$ gives rise to the Heterogeneous St-VAR model, whose conditional mean and conditional variance functions differ slightly from those of a St-VAR model. In a Heterogeneous St-VAR model, the vector process $\Delta_t = (Z_t', \dots, Z_{t-p}')'$ is given by

\[
\Delta_t \sim St(\mu(t), \Sigma; \upsilon)
\tag{4.8}
\]

where the mean $\mu(t)$ is time-varying and the variance-covariance matrix is time-invariant:

\[
\mu(t) = \begin{pmatrix} \mu_Z(t) \\ \mu_Z(t - 1) \\ \vdots \\ \mu_Z(t - p) \end{pmatrix}
= \begin{pmatrix} g(t) \\ g(t - 1) \\ \vdots \\ g(t - p) \end{pmatrix}
\tag{4.9}
\]

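Proposition 4.2 provides the joint distribution, conditional distribution and marginal distribution of $\Delta_t$.

Proposition 4.2 Suppose that $\Delta_t = (Z_t', Z_{t-1}', \dots, Z_{t-p}')' \sim St(\mu(t), \Sigma; \upsilon)$. Then the joint distribution, conditional distribution and marginal distribution of $\Delta_t$ for all $t \in T$ can be written as

\[
\begin{aligned}
D(\Delta_t; \varphi) &= D(Z_t, Z^{t-p}_{t-1}; \varphi) = D(Z_t|Z^{t-p}_{t-1}; \varphi_1)\,D(Z^{t-p}_{t-1}; \varphi_2) \sim St(\mu(t), \Sigma; \upsilon) \\
D(Z_t|Z^{t-p}_{t-1}; \varphi_1) &\sim St(B_0(t) + B'Z^{t-p}_{t-1}, \Omega_t; \upsilon + Np) \\
D(Z^{t-p}_{t-1}; \varphi_2) &\sim St(\mu_2(t), \Sigma_{22}; \upsilon)
\end{aligned}
\]

where

\[
\varphi_1 = \{B_0(t), B, \sigma^2, \Sigma_{22}, \mu_1(t), \mu_2(t)\}, \quad \varphi_2 = \{\mu_2(t), \Sigma_{22}\}, \quad \varphi = (\varphi_1, \varphi_2),
\]

\[
\Omega_t = \frac{\upsilon}{\upsilon + Np}\,\sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2(t))'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2(t))\right),
\]

\[
B_0(t) = \mu_1(t) - B'\mu_2(t), \quad B = \Sigma_{22}^{-1}\Sigma_{21}, \quad \sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}
\]

The proof of this proposition is very similar to that of Proposition 4.1. It is easy to see that the heterogeneity imposed on the mean of the joint distribution carries over to both the conditional mean function and the conditional variance function. Next I present the complete specifications of two Heterogeneous St-VAR models as examples.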

[1] Linear Heterogeneity

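The first model is a linear Heterogeneous St-VAR model in which the mean process follows

\[
\mu_Z = \gamma_0 + \gamma_1 t
\]

where $\gamma_0$ and $\gamma_1$ are $N \times 1$ vectors. Under this assumption, $B_0$ in the autoregressive function can also be written as a function of the time index:

\[
B_0(t) = \mu_1(t) - B'\mu_2(t) = \gamma_0 + \gamma_1 t - \sum_{i=1}^{p}\beta_i'(\gamma_0 + \gamma_1(t - i)) = a_0 + a_1 t
\tag{4.10}
\]

where $B' = (\beta_1', \dots, \beta_p')$, with $\beta_i'$ an $N \times N$ matrix for $i = 1, \dots, p$, and

\[
a_0 = \left(I - \sum_{i=1}^{p}\beta_i'\right)\gamma_0 + \left(\sum_{i=1}^{p} i\beta_i'\right)\gamma_1,
\qquad
a_1 = \left(I - \sum_{i=1}^{p}\beta_i'\right)\gamma_1
\]

The full specification of the Heterogeneous (Linear) St-VAR model is shown in Table 4.2.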


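Table 4.2: Heterogeneous (Linear) Student’s t Vector Autoregressive Model

Mean Equation

\[
Z_t = a_0 + a_1 t + B'Z^{t-p}_{t-1} + u_t, \qquad (u_t|\mathcal{F}^{t-p}_{t-1}) \sim St(0, \Omega_t; \upsilon + Np)
\]

Skedastic Equation

\[
\Omega_t = \left(\frac{\upsilon}{\upsilon + Np - 2}\right) \times \sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2(t))'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2(t))\right)
\]

where $\mu_2(t)$, $a_0$, $a_1$ are defined as above, $B = \Sigma_{22}^{-1}\Sigma_{21}$ and $\sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}$.

(D) → [1] $D(Z_t|\mathcal{F}^{t-p}_{t-1})$ is Student’s t
      [2] $E(Z_t|\mathcal{F}^{t-p}_{t-1})$ is linear in $Z^{t-p}_{t-1}$
      [3] $Cov(Z_t|\mathcal{F}^{t-p}_{t-1}) = \left(\frac{\upsilon}{\upsilon + Np - 2}\right) \times \sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2(t))'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2(t))\right)$

(H) → [5] $\Theta = (\{\gamma_i\}_{i=0}^{1}, \{a_i\}_{i=0}^{1}, B, \sigma^2, \Sigma_{22})$ are t-invariant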


[2] Quadratic Heterogeneity

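Next I present the complete specification of a quadratic Heterogeneous St-VAR model, in which the mean process is modeled as

\[
\mu_Z = \gamma_0 + \gamma_1 t + \gamma_2 t^2
\]

where $\gamma_0$, $\gamma_1$ and $\gamma_2$ are $N \times 1$ vectors. Under this assumption, $B_0$ in the autoregressive function is a quadratic function of $t$:

\[
B_0(t) = \mu_1(t) - B'\mu_2(t) = \gamma_0 + \gamma_1 t + \gamma_2 t^2 - \sum_{i=1}^{p}\beta_i'\left(\gamma_0 + \gamma_1(t - i) + \gamma_2(t - i)^2\right) = a_0 + a_1 t + a_2 t^2
\tag{4.11}
\]

where $B' = (\beta_1', \dots, \beta_p')$, with $\beta_i'$ an $N \times N$ matrix for $i = 1, \dots, p$, and

\[
\begin{aligned}
a_0 &= \left(I - \sum_{i=1}^{p}\beta_i'\right)\gamma_0 + \left(\sum_{i=1}^{p} i\beta_i'\right)\gamma_1 - \left(\sum_{i=1}^{p} i^2\beta_i'\right)\gamma_2 \\
a_1 &= \left(I - \sum_{i=1}^{p}\beta_i'\right)\gamma_1 + \left(\sum_{i=1}^{p} 2i\beta_i'\right)\gamma_2 \\
a_2 &= \left(I - \sum_{i=1}^{p}\beta_i'\right)\gamma_2
\end{aligned}
\]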
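The mapping from the trend parameters $\gamma_i$ to the intercept coefficients $a_i$ can be verified numerically. The sketch below is illustrative only: the function names and the randomly drawn matrices are hypothetical. It compares the direct evaluation of $B_0(t)$ in (4.11) with the polynomial $a_0 + a_1 t + a_2 t^2$.

```python
import numpy as np

def trend_coefficients(betas, g0, g1, g2):
    """Return a0, a1, a2 of (4.11); betas is a list of p (N x N) matrices beta_i."""
    I = np.eye(g0.size)
    S0 = sum(b.T for b in betas)
    S1 = sum(i * b.T for i, b in enumerate(betas, start=1))
    S2 = sum(i**2 * b.T for i, b in enumerate(betas, start=1))
    a0 = (I - S0) @ g0 + S1 @ g1 - S2 @ g2
    a1 = (I - S0) @ g1 + 2 * S1 @ g2
    a2 = (I - S0) @ g2
    return a0, a1, a2

def B0_direct(t, betas, g0, g1, g2):
    """Evaluate B0(t) = g(t) - sum_i beta_i' g(t - i) with a quadratic g."""
    g = lambda s: g0 + g1 * s + g2 * s**2
    return g(t) - sum(b.T @ g(t - i) for i, b in enumerate(betas, start=1))

# Hypothetical values: N = 2 variables, p = 2 lags.
rng = np.random.default_rng(1)
betas = [rng.normal(scale=0.3, size=(2, 2)) for _ in range(2)]
g0, g1, g2 = np.array([1.0, -2.0]), np.array([0.5, 0.1]), np.array([-0.02, 0.03])

a0, a1, a2 = trend_coefficients(betas, g0, g1, g2)
max_gap = max(np.max(np.abs(B0_direct(t, betas, g0, g1, g2) - (a0 + a1 * t + a2 * t**2)))
              for t in range(3, 50))
print(max_gap)  # should be at floating-point rounding level
```

Setting `g2` to zero reproduces the linear case (4.10), since the terms in $\gamma_2$ then drop out of $a_0$ and $a_1$.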


The full specification of the Heterogeneous (Quadratic) St-VAR model is shown in Table 4.3.


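Table 4.3: Heterogeneous (Quadratic) Student’s t Vector Autoregressive Model

Mean Equation

\[
Z_t = a_0 + a_1 t + a_2 t^2 + B'Z^{t-p}_{t-1} + u_t, \qquad (u_t|\mathcal{F}^{t-p}_{t-1}) \sim St(0, \Omega_t; \upsilon + Np)
\]

Skedastic Equation

\[
\Omega_t = \left(\frac{\upsilon}{\upsilon + Np - 2}\right) \times \sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2(t))'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2(t))\right)
\]

where $\mu_2(t)$, $a_0$, $a_1$, $a_2$ are defined as above, $B = \Sigma_{22}^{-1}\Sigma_{21}$ and $\sigma^2 = \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}$.

(D) → [1] $D(Z_t|\mathcal{F}^{t-p}_{t-1})$ is Student’s t
      [2] $E(Z_t|\mathcal{F}^{t-p}_{t-1})$ is linear in $Z^{t-p}_{t-1}$
      [3] $Cov(Z_t|\mathcal{F}^{t-p}_{t-1}) = \left(\frac{\upsilon}{\upsilon + Np - 2}\right) \times \sigma^2\left(1 + \frac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2(t))'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2(t))\right)$

(H) → [5] $\Theta = (\{\gamma_i\}_{i=0}^{2}, \{a_i\}_{i=0}^{2}, B, \sigma^2, \Sigma_{22})$ are t-invariant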

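The likelihood function of the Heterogeneous St-VAR model differs slightly from that of the St-VAR model because the time index $t$ enters $c_t$ and $u_t$. In particular, the log-likelihood function takes the form:

\[
\ln L(\Delta_1, \dots, \Delta_T; \varphi) \propto C + \frac{T}{2}\ln|\Sigma_{22}^{-1}| - \frac{T}{2}\ln(|\sigma^2|) - \frac{1}{2}(\upsilon + m + 1)\sum_{t=1}^{T}\ln(\gamma_t^2)
\tag{4.12}
\]

where

\[
\begin{aligned}
C &= T\ln\Gamma\left[\tfrac{1}{2}(\upsilon + m + 1)\right] - T\ln(\pi\upsilon) - T\ln\Gamma\left(\tfrac{1}{2}\upsilon\right) \\
\gamma_t &= c_t + \tfrac{1}{\upsilon}\left(u_t'(\sigma^2)^{-1}u_t\right) \\
c_t &= 1 + \tfrac{1}{\upsilon}(Z^{t-p}_{t-1} - \mu_2(t))'\Sigma_{22}^{-1}(Z^{t-p}_{t-1} - \mu_2(t)) \\
u_t &= Z_t - a_0 - a_1 t - a_2 t^2 - B'Z^{t-p}_{t-1}
\end{aligned}
\]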

I first obtain the maximum likelihood estimates of the parameters $\gamma_i$ ($i = 0, 1, 2$) and $\Sigma$ by maximizing the log-likelihood function; the parameters of the conditional distribution, $\varphi_1$, and of the marginal distribution, $\varphi_2$, can then be estimated. The procedures for estimating the parameters and their standard errors are very similar to those for the St-VAR model. In particular, the $\Sigma$ matrix is specified in the same way as in the St-VAR model, so we continue to factorize $\Sigma$ with a matrix $L$ like the one in equation (4.6) to ensure its positive definiteness and its particular block structure.

4.3 Empirical Applications

In this section, I provide several empirical applications of the St-VAR and Heterogeneous St-VAR models. The main objective is to illustrate the applicability of the St-VAR modeling framework to capturing multivariate volatility in financial analysis. I report the results of estimation, misspecification testing, and respecification.

4.3.1 Introduction

To evaluate the statistical models, I apply a series of Misspecification tests (M-S tests) to detect potential departures from the underlying assumptions. For any volatility model, if the model captures the systematic features exhibited by the time series data, then the weighted residual series should behave like white noise. The M-S tests applied are based on the standardized estimated residuals of the maintained model, which are defined as

\[
\hat{u}^*_t = L_t^{-1}\hat{u}_t
\]

where $\hat{u}_t = Z_t - \hat{Z}_t$ are the raw residuals and $L_t L_t' = var(Z_t|\mathcal{F}^{t-p}_{t-1})$. When the model is homoscedastic, $L_t$ is constant over time. The relevant M-S tests are based on auxiliary regressions relating the weighted residuals or their squares to factors that might pick up departures from the model assumptions. I consider the following two auxiliary regressions, and Table 4.4 summarizes the hypotheses tested.


\[
u_{it} = a_0 + a_1 u_{it-1} + a_2 u_{it-2} + b_1 y_{it} + b_2 y_{it}^2 + b_3 t + b_4 t^2 + v_{it}
\tag{4.13}
\]

\[
u_{it}^2 = c_0 + c_1 u_{it} + c_2 u_{it}^2 + c_3 y_{it-1}^2 + c_4 y_{it-2}^2 + c_5 t + c_6 t^2 + v_{it}
\tag{4.14}
\]

Table 4.4: M-S Tests for Multivariate Models

Test                 Null Hypothesis    Auxiliary regression
Linearity            b2 = 0             (4.13)
Homoscedasticity     c1 = c2 = 0        (4.14)
1st Independence     a1 = a2 = 0        (4.13)
2nd Independence     c3 = c4 = 0        (4.14)
1st t-invariance     b3 = b4 = 0        (4.13)
2nd t-invariance     c5 = c6 = 0        (4.14)
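The joint hypotheses in Table 4.4 are zero restrictions on subsets of coefficients in an auxiliary regression, and such restrictions are commonly assessed with an F-type test. The sketch below is an assumption-laden illustration, not the dissertation's procedure: the function names and the simulated series are hypothetical, and the regression is a simplified version of (4.13) without the terms in $y_{it}$. It computes the F statistic for the 1st t-invariance hypothesis $b_3 = b_4 = 0$.

```python
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def f_statistic(y, X, restricted):
    """F statistic for H0: the coefficients on the columns `restricted` are zero."""
    n, k = X.shape
    keep = [j for j in range(k) if j not in restricted]
    rss_u = ols_rss(y, X)             # unrestricted model
    rss_r = ols_rss(y, X[:, keep])    # model with H0 imposed
    q = len(restricted)
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

# Simulated "residuals" with a trending mean (a deliberate misspecification).
rng = np.random.default_rng(0)
T = 200
trend = np.arange(T) / T
u = rng.standard_normal(T) + 6.0 * trend

# Simplified version of (4.13): constant, two lags, t and t^2.
y = u[2:]
X = np.column_stack([np.ones(T - 2), u[1:-1], u[:-2], trend[2:], trend[2:] ** 2])
F_trend = f_statistic(y, X, restricted=[3, 4])   # H0: b3 = b4 = 0
print(F_trend)
```

Under the null the statistic is compared with an F distribution with $q$ and $n - k$ degrees of freedom; a large value, as in this simulated example with a built-in trend, indicates a departure from first-order t-invariance.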

Apart from the results of the M-S tests, the fitted values and predictions are also provided. As in the analysis of the univariate volatility models, two types of fitted values are calculated from the estimated parameters. The raw fitted values are

\[
\hat{Z}_t = \hat{B}_0 + \hat{B}'Z^{t-p}_{t-1}
\]

where $\hat{B}_0$ and $\hat{B}$ are the estimated parameters and $Z^{t-p}_{t-1}$ is the past history that serves as the explanatory variables. Since the time series are heteroscedastic, it is necessary to study the behavior of the volatility. For this purpose, the adjusted fitted values are defined as

\[
\tilde{Z}_t = \hat{B}_0 + \hat{B}'Z^{t-p}_{t-1} + \tilde{u}_t = \hat{Z}_t + \tilde{u}_t, \qquad \tilde{u}_t \sim St(0, \hat{\Omega}_t; \upsilon)
\]

where $\tilde{u}_t$ is a generated error term that follows a Student’s t distribution, with $\Omega_t$ and $\upsilon$ defined by the respective specification. In particular, $\hat{\Omega}_t$ can be obtained using the last $p$ periods of data on the relevant variables. Intuitively, the raw fitted values look like a smoothed version of the adjusted fitted values. The difference between $\tilde{Z}_t$ and $\hat{Z}_t$ roughly measures the impact of the conditional variance.


4.3.2 RMB REER Index & Shanghai Stock Exchange Index

In this section, I study bivariate time series data consisting of the monthly RMB Real Effective Exchange Rate (REER) Index and the Shanghai Stock Exchange (SSE) Composite Index over the period 1995.4-2011.11 (T=200).¹ Figure 4.1 and Figure 4.2 show the time plots of the log returns of the RMB REER Index and the SSE Index, respectively. The main goal of this analysis is to investigate and describe the interrelationship and comovement between these two series.

Figure 4.1: Time Plot of RMB REER Index (horizontal axis: Time Index; vertical axis: RMB REER)

Figure 4.2: Time Plot of SSE Index (horizontal axis: Time Index; vertical axis: SSE)


To begin with, I use the VAR(2) model with Normal distribution and homoscedasticity as the benchmark model. Let $Ex_t$ be the log return of the RMB Real Effective Exchange Rate Index at time $t$ and $St_t$ be the log return of the Shanghai Stock Exchange Composite Index at time $t$. The estimated simple VAR(2) model is reported below, with standard errors in parentheses.

¹ I obtain the RMB REER data from http://www.hexun.com/ and the SSE data from http://finance.yahoo.com/.


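\[
\begin{aligned}
Ex_t &= \underset{(0.14)}{0.23} - \underset{(0.07)}{0.02}\,Ex_{t-1} - \underset{(0.02)}{0.03^{*}}\,Ex_{t-2} - \underset{(0.07)}{0.12^{*}}\,St_{t-1} - \underset{(0.02)}{0.01}\,St_{t-2} \\
St_t &= \underset{(0.64)}{0.44} - \underset{(0.32)}{0.28}\,Ex_{t-1} + \underset{(0.07)}{0.01}\,Ex_{t-2} + \underset{(0.32)}{0.70^{*}}\,St_{t-1} - \underset{(0.07)}{0.17^{*}}\,St_{t-2}
\end{aligned}
\]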

In the RMB REER Index equation, the coefficients on the second lag of REER and the first lag of SSE are significantly negative, while in the SSE Index equation, the coefficients on the first and second lags of SSE are significantly positive and significantly negative, respectively. To evaluate the performance of this model, Table 4.5 presents the results of the M-S tests.

Table 4.5: M-S Tests and Respecification of VAR(2) Model for RMB REER vs SSE

(a) M-S Tests

                     Tests               REER           SSE            Conclusion
Distribution (D)     Linearity           0.03(0.97)     −1.43(0.15)    Not Rejected
                     Homoscedasticity    2.76(0.06)     0.08(0.92)     Not Rejected
Dependence (M)       1st Independence    0.02(0.98)     0.21(0.81)     Not Rejected
                     2nd Independence    3.18(0.04)∗    8.57(0.00)∗    Rejected
Heterogeneity (H)    1st t-invariance    3.68(0.03)∗    0.11(0.89)     Rejected
                     2nd t-invariance    1.52(0.22)     1.24(0.29)     Not Rejected

(b) Respecification

Distribution (D)     Normal          Not Rejected    Student’s t (5)
Heterogeneity (H)    Homogeneous     Rejected        3rd order Heterogeneity

[2]: * significant at 5%

The most notable implication of the results in Table 4.5(a) is the violation of the Markov(2) and Homogeneity assumptions. To remedy these violations, I replace the Markov(2) dependence with Markov(3) and apply third order time heterogeneity (second order heterogeneity was also tried, and the results show that third order is better). The assumption of Normality is not rejected by the M-S tests, but the p-value is small. Considering that the number of observations is not large, I use the Student’s t distribution with 5 degrees of freedom.


4.3.2.2 The Heterogeneous St-VAR(3,3;5) Model

In light of the information obtained in the M-S tests, I respecify the model using the 3rd order

H-StVAR(3,3;5) specification.

Table 4.6: Estimation and MS Tests of 3rd order H-StVAR(3,3;5) for RMB REER vs SSE

(a) Estimation

Parameters    RMB REER          SSE
1             8.96(3.21)∗       7.29(2.85)∗
t             5.42(3.38)        3.17(2.23)
t2            −14.41(5.34)∗     −1.73(0.86)∗
t3            −8.68(5.44)       −1.07(0.59)
Ext−1         −0.05(0.05)       −0.00(0.19)
Ext−2         −0.03(0.01)∗      0.10(0.04)∗
Ext−3         −0.10(0.05)∗      0.55(0.21)∗
Stt−1         0.00(0.01)        0.13(0.05)∗
Stt−2         0.04(0.02)∗       −0.31(0.12)∗
Stt−3         0.00(0.01)        −0.08(0.07)

(b) M-S Tests

Assumptions          RMB REER       SSE            Conclusions
Linearity            −0.12(0.90)    −0.92(0.36)    Pass
Homoscedasticity     1.42(0.24)     0.12(0.89)     Pass
1st Independence     0.14(0.87)     0.00(1.00)     Pass
2nd Independence     0.55(0.58)     0.39(0.68)     Pass
1st t-invariance     0.85(0.43)     0.45(0.64)     Pass
2nd t-invariance     0.01(0.99)     0.64(0.53)     Pass

[1]: In part (a), standard errors are reported in parentheses.
[2]: In part (b), p-values are reported in parentheses.


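The results of the estimation and M-S tests of the 3rd order H-StVAR(3,3;5) model are reported in Table 4.6(a) and (b), respectively.

The mean equation takes the form

\[
\begin{aligned}
E(Ex_t|Z^{t-3}_{t-1}) ={}& \underset{(3.21)}{8.96^{*}} + \underset{(3.38)}{5.42}\,t - \underset{(5.34)}{14.41^{*}}\,t^2 - \underset{(5.44)}{8.68}\,t^3 - \underset{(0.05)}{0.05}\,Ex_{t-1} - \underset{(0.01)}{0.03^{*}}\,Ex_{t-2} - \underset{(0.05)}{0.10^{*}}\,Ex_{t-3} \\
& + \underset{(0.01)}{0.00}\,St_{t-1} + \underset{(0.02)}{0.04^{*}}\,St_{t-2} + \underset{(0.01)}{0.00}\,St_{t-3} \\
E(St_t|Z^{t-3}_{t-1}) ={}& \underset{(2.85)}{7.29^{*}} + \underset{(2.23)}{3.17}\,t - \underset{(0.86)}{1.73^{*}}\,t^2 - \underset{(0.59)}{1.07}\,t^3 - \underset{(0.19)}{0.09}\,Ex_{t-1} + \underset{(0.04)}{0.10^{*}}\,Ex_{t-2} + \underset{(0.21)}{0.55^{*}}\,Ex_{t-3} \\
& + \underset{(0.05)}{0.13^{*}}\,St_{t-1} - \underset{(0.12)}{0.31^{*}}\,St_{t-2} - \underset{(0.07)}{0.08}\,St_{t-3}
\end{aligned}
\]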

The variance equation takes the form

\[
var(Z_t|Z^{t-3}_{t-1}) = \frac{5}{7} \times \sigma^2\left(1 + \frac{1}{5}(Z^{t-3}_{t-1} - \mu_3(t))'\Sigma_{22}^{-1}(Z^{t-3}_{t-1} - \mu_3(t))\right)
\]

where

\[
\sigma^2 = \begin{pmatrix}
3.26\,(0.24)^{*} & -0.75\,(0.56) \\
-0.75\,(0.56) & 58.75\,(4.56)^{*}
\end{pmatrix},
\]

\[
\mu_3(t) = \begin{pmatrix}
6.82\,(2.47)^{*} - 10.94\,(4.13)^{*}(t-1) + 5.86\,(2.30)^{*}(t-1)^2 - 1.53\,(0.77)^{*}(t-1)^3 \\
15.31\,(12.29) - 24.33\,(20.46)(t-1) + 12.88\,(11.24)(t-1)^2 - 3.76\,(3.61)(t-1)^3 \\
6.82\,(2.47)^{*} - 10.94\,(4.13)^{*}(t-2) + 5.86\,(2.30)^{*}(t-2)^2 - 1.53\,(0.77)^{*}(t-2)^3 \\
15.31\,(12.29) - 24.33\,(20.46)(t-2) + 12.88\,(11.24)(t-2)^2 - 3.76\,(3.61)(t-2)^3 \\
6.82\,(2.47)^{*} - 10.94\,(4.13)^{*}(t-3) + 5.86\,(2.30)^{*}(t-3)^2 - 1.53\,(0.77)^{*}(t-3)^3 \\
15.31\,(12.29) - 24.33\,(20.46)(t-3) + 12.88\,(11.24)(t-3)^2 - 3.76\,(3.61)(t-3)^3
\end{pmatrix},
\]


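and

\[
\Sigma_{22} = \begin{pmatrix}
3.34\,(0.25)^{*} & -1.31\,(0.56)^{*} & -0.11\,(0.14) & -1.44\,(0.62)^{*} & -0.33\,(0.17)^{*} & -0.18\,(0.76) \\
-1.31\,(0.56)^{*} & 61.84\,(4.73)^{*} & -0.57\,(0.62)^{*} & 5.32\,(3.01)^{*} & 1.78\,(0.71)^{*} & 7.51\,(3.32)^{*} \\
-0.11\,(0.14) & -0.57\,(0.62)^{*} & 3.34\,(0.25)^{*} & -1.31\,(0.56)^{*} & -0.11\,(0.14) & -1.44\,(0.62)^{*} \\
-1.44\,(0.62)^{*} & 5.32\,(3.01)^{*} & -1.31\,(0.56)^{*} & 61.84\,(4.73)^{*} & -0.57\,(0.62)^{*} & 5.32\,(3.01)^{*} \\
-0.33\,(0.17)^{*} & 1.78\,(0.71)^{*} & -0.11\,(0.14) & -0.57\,(0.62)^{*} & 3.34\,(0.25)^{*} & -1.31\,(0.56)^{*} \\
-0.18\,(0.76) & 7.51\,(3.32)^{*} & -1.44\,(0.62)^{*} & 5.32\,(3.01)^{*} & -1.31\,(0.56)^{*} & 61.84\,(4.73)^{*}
\end{pmatrix}
\]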

The most important finding from these results is that the H-St-VAR(3,3;5) model outperforms the simple VAR(2) model in the M-S tests. The p-values for the tests of the distributional assumptions are much larger than in the Normal-VAR(2) model, implying that the Student’s t distribution describes the data better than the Normal distribution, and the departures from the dependence and homogeneity assumptions are removed by the Markov(3) process and the third order polynomial heterogeneity function. Secondly, the estimated coefficients of the H-St-VAR model differ from those of the Normal-VAR model in both magnitude and significance. Specifically, in the Normal-VAR model the SSE index is only weakly related to the lags of the RMB REER index, while in the H-St-VAR model this relationship becomes much stronger.

4.3.3 Shanghai Stock Exchange Index vs. Hang Seng Index

In this part, I consider the bivariate time series data in two stock markets, the mainland

Chinese stock market and the Hong Kong market. In specific, I examine the relationship between

the Hang Seng Index of the Hong Kong Stock Exchange and the Shanghai Composite Index of

the Shanghai Stock Exchange through the period from December 1990 to January 2014. A salient

issue concerned in this study is the fact that the relationship and volatility of the stock markets

are substantially affected by sudden structural changes, corresponding to domestic and global

events. Examples of such events include the 1997 Asian financial crisis, the 1997-2000 Internet
bubble, and the recent global financial crisis since 2008. A number of works in the literature
provide evidence that sudden changes have a substantial impact on the structure of time series,
leading to changes in parameters and in volatility persistence. Among others, two important issues
are closely associated with the structural breaks exhibited in time series. First, sudden changes give
rise to apparent volatility persistence. It has been extensively documented that incorporating
sudden changes into GARCH-type models can significantly reduce the estimated persistence of
volatility. Conversely, ignoring sudden changes leads to the misleading inference that volatility is
highly persistent. The second issue is false information transmission across multiple time series.
The information flow of volatility from one series to another may be affected by sudden economic
events. It is well documented that ignoring sudden changes in a multivariate model can lead to
unreliable inference about cross-market shocks in terms of their intensity, direction and origin.
As a result, it is important to allow for possible sudden changes in volatility models, so as to
capture simultaneously their impact on volatility persistence and information transmission. The
issue that naturally arises is how to locate the change points of volatility. In general, a structural
change in the unconditional variance implies a structural change in the conditional variance
captured by a volatility model. Inclan and Tiao (1994) develop a cumulative sums of squares (CSS)
algorithm to identify discrete sub-periods of changing volatility in stock returns. In particular, they
propose the Inclan and Tiao (IT) statistic to test the null hypothesis that the unconditional variance
is constant over time against the alternative hypothesis that there is a structural break in the
unconditional variance. Since the IT statistic is designed for Normal i.i.d. processes, an unrealistic
condition given that real-world financial time series often exhibit non-Normality and persistence,
a number of adjustments have been proposed. For example, Sanso et al. (2004) relax the Normality
condition by taking the fourth-order moment properties of the disturbances and conditional
heteroskedasticity into account. Rapach and Strauss (2008) employ a nonparametric modification
of the IT statistic that allows for dependent processes. To test for multiple structural breaks in
volatility, Inclan and Tiao (1994) propose the iterative cumulative sums of squares (ICSS) algorithm
based on the IT statistic. Most modified versions of the original IT statistic can easily be used
within the ICSS algorithm.
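As a rough illustration of the IT statistic discussed above, the following Python sketch computes the centered cumulative sum of squares D_k = C_k/C_T - k/T and the statistic IT = sqrt(T/2) max_k |D_k| for a demeaned series. The function name, the simulated data and the use of 1.358 as the asymptotic 5% critical value under the Normal i.i.d. null are assumptions made for illustration; this is not the code used in the dissertation, and it implements only the single-break test, not the full ICSS iteration.

```python
import math
import random

def inclan_tiao_statistic(x):
    """Return (IT, k_star) for a demeaned series x.

    D_k = C_k / C_T - k / T, where C_k is the cumulative sum of squares.
    IT = sqrt(T / 2) * max_k |D_k|; k_star is the 1-based index of the maximum.
    """
    T = len(x)
    total = sum(v * v for v in x)
    running, best, k_star = 0.0, 0.0, 0
    for k, v in enumerate(x, start=1):
        running += v * v
        d_k = running / total - k / T
        if abs(d_k) > best:
            best, k_star = abs(d_k), k
    return math.sqrt(T / 2.0) * best, k_star

# Hypothetical data: the standard deviation triples after observation 300.
rng = random.Random(0)
calm = [rng.gauss(0.0, 1.0) for _ in range(300)]
wild = [rng.gauss(0.0, 3.0) for _ in range(300)]
it_break, k_break = inclan_tiao_statistic(calm + wild)
it_calm, _ = inclan_tiao_statistic(calm)

CRITICAL_5PCT = 1.358  # asymptotic value under the Normal i.i.d. null (assumed here)
print(it_break > CRITICAL_5PCT, k_break)
```

The estimated break date k_break should fall close to the true change at observation 300, while the statistic for the homogeneous sub-sample is much smaller.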

Despite the wide use of the ICSS algorithm and the numerous versions of IT-based statistics for
locating volatility change points, I use observable economic events as the breaks in this analysis,
for the following reasons. First, any statistic that tests for change points requires assumptions
about the time series data. Different time series, or even different parts of one time series, may
exhibit different statistical features, so it is difficult to find a general method that is consistently
appropriate. Second, a change in volatility is only one type of structural change; many other types,
such as changes in the mean and in the parameters, are also important. As far as I know, no single
method is designed to detect all important types of structural change simultaneously. Moreover,
different algorithms are very likely to give rise to different change points, which is quite
unsatisfactory. Third, most structural changes correspond closely to real-world events, which are
often observable with clear beginning and ending dates. To sum up, there is no evidence that the
complicated statistical change-point finders work better than simply using observable economic
events as natural change points.

In particular, I partition the period from December 1990 to January 2014 into four parts using
three cut points. The first part is the pre-Asian-crisis period, from December 1990 to October
1997. Since the opening up of the Chinese stock markets, more foreign investors have become
actively involved, and the stock markets in mainland China and Hong Kong began to share more
and more common information. The second part is the post-Asian-crisis period, from October 1997
to October 2006. Hong Kong was one of the areas hurt most by the Asian financial crisis that began
in 1997, and it took nearly ten years to recover from the stock market disaster. The Chinese stock
market was less affected than those of Southeast Asia and South Korea. However, China's GDP
growth slowed sharply during 1998 and 1999, which revealed its financial weaknesses and structural
problems. Both mainland China and Hong Kong were heavily shocked by the 1997 Asian financial
crisis, so it is reasonable to analyze the behavior of the markets separately in the pre- and
post-Asian-financial-crisis periods. The third part is the rise-and-fall period, from October 2006 to
August 2007. This period is short but abnormal: both the Chinese stock market and the Hong Kong
stock market experienced a huge rise followed by a huge fall. A number of works discuss the reasons
for this phenomenon, but there seems to be no theoretical agreement. One possible reason for the
sudden slump is the then-upcoming global financial crisis beginning in August 2007, which leads to
the last period. The fourth part is the global-crisis period, from August 2007 to January 2014.
During this period, both mainland China and Hong Kong suffered from falls in GDP growth and
exports, but the close cooperation between the two economies against the crisis was effective.
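The event-based partition described above can be sketched in a few lines of Python. The cut points are the months named in the text; placing each cut on the first day of the month, treating the intervals as half-open, and the function and label names are assumptions made only for this illustration.

```python
from datetime import date

# Cut points from the text; the day of the month (1st) is an assumption,
# since the text gives only month and year.
CUTS = [date(1997, 10, 1), date(2006, 10, 1), date(2007, 8, 1)]
LABELS = ["pre-Asian-crisis", "post-Asian-crisis", "rise-and-fall", "global-crisis"]

def period_of(d):
    """Return the sub-period label of an observation dated d (half-open intervals)."""
    for cut, label in zip(CUTS, LABELS):
        if d < cut:
            return label
    return LABELS[-1]

def split_by_period(dates, values):
    """Group the observations by sub-period, preserving time order."""
    groups = {label: [] for label in LABELS}
    for d, v in zip(dates, values):
        groups[period_of(d)].append(v)
    return groups

# Hypothetical usage with three dated returns.
demo = split_by_period(
    [date(1995, 3, 6), date(2007, 3, 5), date(2010, 1, 4)],
    [0.4, -1.2, 0.7],
)
print(demo["rise-and-fall"])
```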

[Figure: time plot; horizontal axis: Time Index; vertical axis: Hang Seng Index]

Figure 4.3: Time Plot of Hang Seng Index returns

[Figure: time plot; horizontal axis: Time Index; vertical axis: SSE Index]

Figure 4.4: Time Plot of SSE Index returns

Figures 4.3 and 4.4 show the time plots of the log returns of the Hang Seng Index and the SSE
Index, respectively². As discussed above, I partition the entire period into four parts: the
pre-Asian-crisis period, the post-Asian-crisis period, the rise-and-fall period and the global-crisis
period. In the third period, both markets experienced a sudden and abnormally sharp transition
from a bull market to a bear market, which is highly unpredictable. Therefore, in this work I do not
apply the volatility models to study the behavior of the stock returns in this period.

Let HKt be the log return of the Hang Seng Index on the Hong Kong stock exchange at time t, and
let CNt be the log return of the Shanghai Stock Exchange (SSE) Composite Index on the Shanghai
Stock Exchange at time t. To begin with, I use Normal VAR(2) models as the benchmark models to
analyze the relationship between these two stock return series. The results show that only a few
parameters in the Normal VAR(2) models are significant in the three periods.

Table 4.7: Estimation of VAR(2) Models: Hang Seng Index vs SSE Index

Period    1990.12-1997.10             1997.10-2006.10             2007.8-2014.1
          Hang Seng     SSE           Hang Seng     SSE           Hang Seng     SSE
1         0.41(0.17)∗   0.51(0.52)    0.11(0.17)    0.09(0.16)    0.02(0.23)    −0.07(0.21)
HKt−1     0.01(0.05)    0.22(0.16)    −0.03(0.05)   0.04(0.04)    −0.05(0.07)   0.17(0.06)∗
HKt−2     −0.01(0.02)   0.04(0.05)    −0.02(0.05)   −0.05(0.05)   −0.01(0.07)   −0.07(0.07)
CNt−1     0.05(0.05)    0.00(0.16)    0.06(0.05)    −0.02(0.04)   −0.04(0.07)   0.02(0.06)
CNt−2     0.00(0.02)    −0.04(0.05)   0.03(0.05)    0.02(0.05)    −0.01(0.07)   −0.03(0.07)

[1]: standard errors are reported in parentheses
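For readers who want to reproduce a benchmark of this kind, the following Python sketch estimates a Normal VAR(p) equation by equation with ordinary least squares and reports conventional standard errors. It is a minimal illustration under stated assumptions: the function name and the simulated data are hypothetical, NumPy is assumed to be available, and the regressors are ordered [constant, all variables at lag 1, all variables at lag 2], which differs from the row order of Table 4.7.

```python
import numpy as np

def fit_var_ols(Z, p=2):
    """OLS estimates of Z_t = a0 + A_1 Z_{t-1} + ... + A_p Z_{t-p} + e_t.

    Z has shape (T, N). Returns (coef, se, resid), where coef and se have
    shape (1 + N * p, N): one column per equation, rows ordered as
    [constant, lag-1 variables, ..., lag-p variables].
    """
    T, N = Z.shape
    Y = Z[p:]
    lags = [Z[p - j:T - j] for j in range(1, p + 1)]
    X = np.hstack([np.ones((T - p, 1))] + lags)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof        # residual variance per equation
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(np.outer(np.diag(xtx_inv), sigma2))
    return coef, se, resid

# Hypothetical data: a bivariate VAR(1) with a known lag matrix.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.0], [0.2, 0.3]])
Z = np.zeros((2000, 2))
for t in range(1, 2000):
    Z[t] = A1 @ Z[t - 1] + rng.standard_normal(2)

coef, se, resid = fit_var_ols(Z, p=2)
print(np.round(coef[1:3].T, 2))  # estimated lag-1 matrix, to be compared with A1
```

A coefficient would be flagged as significant when its ratio to the standard error is large in absolute value; that inference is only reliable if the Normal VAR assumptions hold, which is what the M-S tests examine.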

In order to evaluate the performance of the Normal VAR(2) models, I apply the M-S tests for
each period. The M-S tests not only detect potential departures from the underlying assumptions
related to the statistical features of the time series data, but also provide useful information for
respecifying the statistical model.

² I obtain the SSE data and the HSI data from http://finance.yahoo.com/.

4.3.3.1 Period 1: 1990-1997

Table 4.8 presents the M-S tests for the Normal VAR(2) in the first period: December 1990 to

October 1997.

Table 4.8: M-S Tests and Respecification of the VAR(2) Model: Period 1

(a) M-S Tests

Tests                                 Hang Seng       SSE            Conclusion
Distribution (D): Linearity           −2.08(0.04)∗    −1.06(0.29)    Rejected
Homoscedasticity                      20.47(0.00)∗    1.65(0.19)     Rejected
2nd Independence                      2.13(0.12)      0.11(0.89)     Not Rejected
Heterogeneity (H): 1st t-invariance   0.73(0.48)      0.54(0.58)     Not Rejected

(b) Respecification

Dependence (M)    Markov(2)    Not Rejected    Unchanged

Departure from the Normal distribution is the most important finding revealed by the M-S
tests. Therefore, the Normal distribution is replaced by the Student's t distribution with 5 degrees
of freedom. Since no other violations are detected, the dependence and homogeneity assumptions
are left unchanged. Table 4.9 reports the results for the respecified St-VAR(2,2;5) model.


Table 4.9: Estimation and M-S Tests of St-VAR(2,2;5): Period 1

(a) Estimation

          Hang Seng Index    SSE Index
1         0.51(0.09)∗        0.19(0.18)
HKt−1     0.01(0.04)         0.16(0.09)
HKt−2     −0.01(0.02)        0.12(0.04)∗
CNt−1     0.05(0.05)         0.07(0.12)
CNt−2     0.00(0.00)         0.09(0.06)

(b) M-S Tests

Assumptions         Hang Seng      SSE            Conclusions
Linearity           −0.16(0.87)    −0.73(0.36)    Not Rejected
1st Independence    0.16(0.85)     0.61(1.00)     Not Rejected
1st t-invariance    0.84(0.43)     0.46(0.64)     Not Rejected




The results of the St-VAR(2,2;5) model indicate a significantly positive relationship between the
SSE Index return and the second lag of the Hang Seng Index return. In addition, the M-S tests show
that the respecification leads to a statistically adequate model, with no departure from the
reduction assumptions detected.


4.3.3.2 Period 2: 1997-2006

Next I study the two stock return series in the second period, October 1997 to October 2006.


(a) M-S Tests

Distribution (D): Linearity    1.89(0.05)∗     −0.12(0.91)    Rejected
2nd t-invariance               26.90(0.00)∗    0.84(0.43)     Rejected

(b) Respecification

Dependence (M)       Markov(2)      Not Rejected    Markov(3)
Heterogeneity (H)    Homogeneous    Rejected        2nd Heterogeneity


The results in the above table provide important information. Again, the distributional
assumption of Normality is not satisfied, which naturally leads us to use the Student's t
distribution. In addition, the M-S tests show signs of time-heterogeneity, so a heterogeneity
function is introduced into the model. The results show that a second-order heterogeneity function
is sufficient to capture the second-order time-heterogeneity, so there is no need to use higher-order
polynomials. Note that the M-S tests indicate no departure from the Dependence assumption for
a VAR(2) model, so the directly suggested respecification would retain the Markov(2) process.
However, the iterative M-S testing and respecification procedure shows that an H-St-VAR(3,3;5)
model works better than an H-St-VAR(2,2;5) model.
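The role of the heterogeneity function in the mean can be made concrete with a small sketch of the regressor matrix for a second-order heterogeneous VAR(3): a constant, two trend terms and three lags of each series. The use of Legendre polynomials on a time index rescaled to [-1, 1] is an assumption made here for illustration (the dissertation states only that orthogonal polynomials are used), and the function names and simulated data are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def trend_terms(T, order=2):
    """Columns [1, P_1(s), ..., P_order(s)], with s the time index rescaled to [-1, 1]."""
    s = np.linspace(-1.0, 1.0, T)
    return legendre.legvander(s, order)            # shape (T, order + 1)

def hvar_design(Z, p=3, order=2):
    """Regressors [trend terms, Z_{t-1}, ..., Z_{t-p}] for observations t = p, ..., T-1."""
    T, N = Z.shape
    trend = trend_terms(T, order)[p:]
    lags = [Z[p - j:T - j] for j in range(1, p + 1)]
    return np.hstack([trend] + lags)

# Hypothetical bivariate data with 120 observations.
Z_demo = np.random.default_rng(1).standard_normal((120, 2))
X_demo = hvar_design(Z_demo, p=3, order=2)
print(X_demo.shape)
```

With two series, three lags and a second-order trend, each equation has nine mean parameters, matching the nine rows of the estimation panel that follows.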


Table 4.11: Estimation and M-S Tests of 2nd order H-St-VAR(3,3;5): Period 2

(a) Estimation

          Hang Seng Index    SSE Index
1         0.21(0.34)         0.44(0.46)
t         0.15(0.20)         0.38(0.29)
t2        0.39(0.51)         0.36(0.27)
HKt−1     0.01(0.03)         0.04(0.03)
HKt−2     0.00(0.03)         −0.02(0.03)
HKt−3     0.02(0.04)         −0.03(0.03)
CNt−1     −0.01(0.04)        0.05(0.03)
CNt−2     0.02(0.05)         0.02(0.04)
CNt−3     0.06(0.06)         0.06(0.05)

(b) M-S Tests

Assumptions    Hang Seng     SSE           Conclusions
Linearity      0.34(0.73)    0.48(0.63)    Not Rejected








The results indicate that the 2nd order H-St-VAR(3,3;5) model works better than the original
Normal VAR(2) model in the sense that the distributional and dependence assumptions are well
satisfied. Although the Hang Seng Index series still exhibits second-order time-heterogeneity, the
p-value of the corresponding M-S test is much larger than that in the VAR(2) model, indicating
that the H-St-VAR model reduces the negative effect of time-heterogeneity on the analysis.
Although the parameter estimates of the two models are quite similar, it is reasonable to claim
that the H-St-VAR model provides more reliable inference.


4.3.3.3 Period 3: 2007-2014

For the time series data in the last period, August 2007 to January 2014, the M-S tests show
that the Normal VAR(2) is again severely misspecified: all three reduction assumptions are
violated. The information gained from the M-S tests suggests a respecification to a 2nd order
H-St-VAR(3,3;5) model.


(a) M-S Tests

Distribution (D): Linearity    2.37(0.02)∗     1.01(0.31)    Rejected
Homoscedasticity               16.81(0.00)∗    1.86(0.16)    Rejected
2nd Independence               15.89(0.00)∗    1.37(0.25)    Rejected

(b) Respecification

Heterogeneity (H)    Homogeneous    Rejected    2nd Heterogeneity



Table 4.13: Estimation and M-S Tests of 2nd order H-St-VAR(3,3;5): Period 3

(a) Estimation

          Hang Seng Index    SSE Index
1         −0.88(1.66)        −0.01(1.45)
t         −1.02(1.63)        −1.03(1.61)
t2        0.01(1.63)         −0.96(1.49)
HKt−1     −0.10(0.04)∗       0.10(0.04)∗
HKt−2     0.09(0.04)∗        −0.04(0.04)
HKt−3     −0.01(0.05)        0.03(0.05)
CNt−1     −0.01(0.05)        0.01(0.05)
CNt−2     0.05(0.07)         −0.03(0.07)
CNt−3     −0.06(0.07)        −0.03(0.07)

(b) M-S Tests

Assumptions    Hang Seng      SSE           Conclusions
Linearity      −0.61(0.54)    0.05(0.96)    Not Rejected








The M-S tests detect no violation of the underlying assumptions related to the statistical
features of the time series data, so the respecification is successful. In addition, the H-St-VAR
model reveals some interesting findings that are hidden in the Normal VAR(2) model. Specifically,
in this period,

the Hang Seng Index return is negatively related to its first-order lag and positively related to
its second-order lag (see the estimates in Table 4.13), while the SSE Index return is positively
related to the first-order lag of the Hang Seng Index return.

4.4 Appendix

4.4.1 Proof of Proposition 4.1

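Let $Z_t$ be a random vector of the form

\[
Z_t = \begin{pmatrix} Y_t \\ X_t \end{pmatrix},
\]

where the dimensions of the vectors are

\[
Z_t : k \times 1, \quad Y_t : k_1 \times 1, \quad X_t : k_2 \times 1,
\]

with $k = k_1 + k_2$. Based on the assumption that $Z_t$ follows a $St(\mu, \Sigma; \upsilon)$ distribution, the
vector process can be written as

\[
Z_t \sim St(\mu, \Sigma; \upsilon)
= St\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{21}' \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}; \upsilon \right),
\]

where the dimensions of the vectors and matrices are

\[
\mu : k \times 1, \quad \mu_1 : k_1 \times 1, \quad \mu_2 : k_2 \times 1,
\]
\[
\Sigma : k \times k, \quad \Sigma_{11} : k_1 \times k_1, \quad \Sigma_{21} : k_2 \times k_1, \quad \Sigma_{22} : k_2 \times k_2.
\]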

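The density function of $Z_t$ is

\[
D(Z_t) = \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right) |\Sigma|^{-\frac{1}{2}}}
{\Gamma\left(\frac{1}{2}\upsilon\right) (\pi\upsilon)^{\frac{1}{2}k}}
\left( 1 + \frac{1}{\upsilon} (Z_t - \mu)' \Sigma^{-1} (Z_t - \mu) \right)^{-\frac{1}{2}(\upsilon + k)}
\tag{4.15}
\]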

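The density function of $X_t$ is

\[
D(X_t) = \frac{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right) |\Sigma_{22}|^{-\frac{1}{2}}}
{\Gamma\left(\frac{1}{2}\upsilon\right) (\pi\upsilon)^{\frac{1}{2}k_2}}
\left( 1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2) \right)^{-\frac{1}{2}(\upsilon + k_2)}
\tag{4.16}
\]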

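The conditional density function of $(Y_t|X_t)$ can be written as

\[
\begin{aligned}
D(Y_t|X_t) &= D(Z_t)/D(X_t) \\
&= \frac{\dfrac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right) |\Sigma|^{-\frac{1}{2}}}{\Gamma\left(\frac{1}{2}\upsilon\right) (\pi\upsilon)^{\frac{1}{2}k}}
\left( 1 + \frac{1}{\upsilon} (Z_t - \mu)' \Sigma^{-1} (Z_t - \mu) \right)^{-\frac{1}{2}(\upsilon + k)}}
{\dfrac{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right) |\Sigma_{22}|^{-\frac{1}{2}}}{\Gamma\left(\frac{1}{2}\upsilon\right) (\pi\upsilon)^{\frac{1}{2}k_2}}
\left( 1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2) \right)^{-\frac{1}{2}(\upsilon + k_2)}} \\
&= \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right) (\pi\upsilon)^{\frac{1}{2}k_1}}
\left( \frac{|\Sigma|}{|\Sigma_{22}|} \right)^{-\frac{1}{2}}
\frac{\left( 1 + \frac{1}{\upsilon} (Z_t - \mu)' \Sigma^{-1} (Z_t - \mu) \right)^{-\frac{1}{2}(\upsilon + k)}}
{\left( 1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2) \right)^{-\frac{1}{2}(\upsilon + k) + \frac{1}{2}k_1}}
\end{aligned}
\]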

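We simplify this function as follows. Using results from Searle (1982), the second term
can be rewritten as

\[
\left( \frac{|\Sigma|}{|\Sigma_{22}|} \right)^{-\frac{1}{2}}
= \left( \frac{|\Sigma_{22}| \, |\Sigma_{11} - \Sigma_{21}' \Sigma_{22}^{-1} \Sigma_{21}|}{|\Sigma_{22}|} \right)^{-\frac{1}{2}}
= |\sigma^2|^{-\frac{1}{2}},
\]

where $\sigma^2 = \Sigma_{11} - \Sigma_{21}' \Sigma_{22}^{-1} \Sigma_{21}$.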

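Next consider the third term:

\[
\begin{aligned}
&\frac{\left( 1 + \frac{1}{\upsilon} (Z_t - \mu)' \Sigma^{-1} (Z_t - \mu) \right)^{-\frac{1}{2}(\upsilon + k)}}
{\left( 1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2) \right)^{-\frac{1}{2}(\upsilon + k) + \frac{1}{2}k_1}} \\
&\quad = \left( \frac{1 + \frac{1}{\upsilon} (Z_t - \mu)' \Sigma^{-1} (Z_t - \mu)}
{1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2)} \right)^{-\frac{1}{2}(\upsilon + k)}
\times \left( 1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2) \right)^{-\frac{1}{2}k_1}
\end{aligned}
\]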

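Let $c_t = 1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2)$. Note that $c_t$ is a scalar. Using results from
Searle (1982), the numerator of the left part can be rewritten as

\[
\begin{aligned}
&1 + \frac{1}{\upsilon}
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix}'
\left[ \begin{pmatrix} 0 & 0' \\ 0 & \Sigma_{22}^{-1} \end{pmatrix}
+ \begin{pmatrix} I \\ -\Sigma_{22}^{-1} \Sigma_{21} \end{pmatrix}
(\sigma^2)^{-1}
\begin{pmatrix} I & -\Sigma_{21}' \Sigma_{22}^{-1} \end{pmatrix} \right]
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix} \\
&\quad = c_t + \frac{1}{\upsilon}
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix}'
\begin{pmatrix} I \\ -\beta \end{pmatrix}
(\sigma^2)^{-1}
\begin{pmatrix} I & -\beta' \end{pmatrix}
\begin{pmatrix} Y_t - \mu_1 \\ X_t - \mu_2 \end{pmatrix} \\
&\quad = c_t + \frac{u_t' (\sigma^2)^{-1} u_t}{\upsilon},
\end{aligned}
\]

where $\beta = \Sigma_{22}^{-1} \Sigma_{21}$, $\beta_0 = \mu_1 - \beta' \mu_2$ and $u_t = Y_t - \mu_1 - \beta'(X_t - \mu_2) = Y_t - \beta_0 - \beta' X_t$.
Besides, it is easy to see that the denominator of the left part is $c_t$ and the right part is
$c_t^{-\frac{1}{2}k_1}$. So the third term can be written as

\[
\left( 1 + \frac{u_t' (\sigma^2)^{-1} u_t}{\upsilon c_t} \right)^{-\frac{1}{2}(\upsilon + k)} c_t^{-\frac{1}{2}k_1}
\]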

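With the modified second and third terms, the conditional density function of $(Y_t|X_t)$ can be
written as

\[
\begin{aligned}
D(Y_t|X_t) &= \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right)}
(\pi\upsilon)^{-\frac{1}{2}k_1} |\sigma^2|^{-\frac{1}{2}} c_t^{-\frac{1}{2}k_1}
\left( 1 + \frac{u_t' (\sigma^2)^{-1} u_t}{\upsilon c_t} \right)^{-\frac{1}{2}(\upsilon + k)} \\
&= \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right)}
\left( \pi\upsilon \times \frac{\upsilon + k_2}{\upsilon} \right)^{-\frac{1}{2}k_1}
\left( c_t \times \frac{\upsilon}{\upsilon + k_2} \right)^{-\frac{1}{2}k_1}
|\sigma^2|^{-\frac{1}{2}} \\
&\qquad \times \left( 1 + \frac{1}{\upsilon} \times \frac{\upsilon}{\upsilon + k_2} \times
u_t' \left( \frac{\upsilon c_t \sigma^2}{\upsilon + k_2} \right)^{-1} u_t \right)^{-\frac{1}{2}(\upsilon + k)}
\qquad (\text{let } \upsilon + k_2 = \upsilon^*) \\
&= \frac{\Gamma\left(\frac{1}{2}(\upsilon^* + k_1)\right)}{\Gamma\left(\frac{1}{2}\upsilon^*\right) (\pi\upsilon^*)^{\frac{1}{2}k_1}}
\left| \frac{\upsilon c_t \sigma^2}{\upsilon^*} \right|^{-\frac{1}{2}}
\left( 1 + \frac{1}{\upsilon^*} u_t' \left( \frac{\upsilon \sigma^2 c_t}{\upsilon^*} \right)^{-1} u_t \right)^{-\frac{1}{2}(\upsilon^* + k_1)}
\end{aligned}
\]

Letting $\beta_0 + \beta' X_t = \mu^*$ and $\frac{\upsilon c_t \sigma^2}{\upsilon^*} = \Sigma^*$, we obtain the conditional density function

\[
D(Y_t|X_t) = \frac{\Gamma\left(\frac{1}{2}(\upsilon^* + k_1)\right)}{\Gamma\left(\frac{1}{2}\upsilon^*\right) (\pi\upsilon^*)^{\frac{1}{2}k_1}}
|\Sigma^*|^{-\frac{1}{2}}
\left( 1 + \frac{1}{\upsilon^*} (Y_t - \mu^*)' (\Sigma^*)^{-1} (Y_t - \mu^*) \right)^{-\frac{1}{2}(\upsilon^* + k_1)}
\tag{4.17}
\]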

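This function directly gives rise to

\[
(Y_t|X_t) \sim St(\mu^*, \Sigma^*; \upsilon^*) \tag{4.18}
\]

with the first two conditional moments:

\[
E(Y_t|X_t) = \mu^* = \beta_0 + \beta' X_t
\]
\[
Var(Y_t|X_t) = \frac{\upsilon^*}{\upsilon^* - 2} \Sigma^*
= \frac{\upsilon^*}{\upsilon^* - 2} \cdot \frac{\upsilon c_t \sigma^2}{\upsilon^*}
= \frac{\upsilon}{\upsilon + k_2 - 2} \times \sigma^2
\left( 1 + \frac{1}{\upsilon} (X_t - \mu_2)' \Sigma_{22}^{-1} (X_t - \mu_2) \right)
\]

Substituting $Z_t$ for $Y_t$ and $Z_{t-1}^{t-p}$ for $X_t$ yields Proposition 4.1.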

QED
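The result can be checked numerically in the simplest case k1 = k2 = 1. The Python sketch below evaluates the bivariate density (4.15) and the marginal density (4.16) and confirms that their ratio equals the Student's t density with location µ*, scale Σ* = υ c_t σ²/υ* and υ* = υ + 1 degrees of freedom. The parameter values and function names are arbitrary choices for illustration only.

```python
import math

def log_t1(y, loc, scale2, df):
    """Log density of a univariate St(loc, scale2; df), i.e. (4.16) with k2 = 1."""
    q = (y - loc) ** 2 / scale2
    return (math.lgamma(0.5 * (df + 1)) - math.lgamma(0.5 * df)
            - 0.5 * math.log(math.pi * df) - 0.5 * math.log(scale2)
            - 0.5 * (df + 1) * math.log1p(q / df))

def log_t2(y, x, mu1, mu2, s11, s21, s22, df):
    """Log density of a bivariate St(mu, Sigma; df), i.e. (4.15) with k = 2."""
    det = s11 * s22 - s21 ** 2
    dy, dx = y - mu1, x - mu2
    q = (s22 * dy * dy - 2.0 * s21 * dy * dx + s11 * dx * dx) / det
    return (math.lgamma(0.5 * (df + 2)) - math.lgamma(0.5 * df)
            - math.log(math.pi * df) - 0.5 * math.log(det)
            - 0.5 * (df + 2) * math.log1p(q / df))

def conditional_params(x, mu1, mu2, s11, s21, s22, df):
    """Location, scale, degrees of freedom and variance of (Y|X = x) for k1 = k2 = 1."""
    beta = s21 / s22
    sigma2 = s11 - s21 ** 2 / s22
    c_t = 1.0 + (x - mu2) ** 2 / (df * s22)
    df_star = df + 1
    loc = mu1 + beta * (x - mu2)
    scale2 = df * c_t * sigma2 / df_star
    cond_var = df / (df + 1 - 2) * sigma2 * c_t
    return loc, scale2, df_star, cond_var

# Arbitrary illustrative values.
mu1, mu2, s11, s21, s22, df = 0.4, 0.5, 2.0, 0.6, 1.5, 5
y, x = 1.2, -0.7
loc, scale2, df_star, cond_var = conditional_params(x, mu1, mu2, s11, s21, s22, df)
lhs = log_t2(y, x, mu1, mu2, s11, s21, s22, df) - log_t1(x, mu2, s22, df)
rhs = log_t1(y, loc, scale2, df_star)
print(abs(lhs - rhs) < 1e-10)
```

The conditional variance returned by the sketch depends on x through c_t, which is the heteroskedasticity implied by the Student's t assumption.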


4.4.2 Derivation of the Maximum Likelihood Function

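In order to obtain the likelihood function, we substitute the functional form of $D(Y_t|X_t; \varphi_1)$ in
(4.10) and $D(X_t; \varphi_2)$ in (4.9) into

\[
D(\Delta_1, \ldots, \Delta_T) = \prod_{t=1}^{T} D(Y_t|X_t; \varphi_1) D(X_t; \varphi_2)
\]

So the joint density function takes the form

\[
\begin{aligned}
D(\Delta_1, \ldots, \Delta_T)
&= \prod_{t=1}^{T} \left( \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right) (\pi\upsilon)^{\frac{1}{2}k}}
|\sigma^2|^{-\frac{1}{2}} |\Sigma_{22}|^{-\frac{1}{2}} c_t^{-\frac{1}{2}(k + \upsilon)}
\left( 1 + \frac{u_t' (\sigma^2)^{-1} u_t}{\upsilon c_t} \right)^{-\frac{1}{2}(\upsilon + k)} \right) \\
&= \prod_{t=1}^{T} \left( C_0 |\sigma^2|^{-\frac{1}{2}} |\Sigma_{22}^{-1}|^{\frac{1}{2}} \gamma_t^{-\frac{1}{2}(\upsilon + k)} \right)
\end{aligned}
\]

where

\[
C_0 = \frac{\Gamma\left(\frac{1}{2}(\upsilon + k)\right)}{\Gamma\left(\frac{1}{2}(\upsilon + k_2)\right) (\pi\upsilon)^{\frac{1}{2}k}},
\qquad
\gamma_t = c_t + \frac{u_t' (\sigma^2)^{-1} u_t}{\upsilon}
\]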

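In the St-VAR case, $k_1 = N$ and $k_2 = Np$, so that $k = k_1 + k_2 = N(p+1)$, and $\sigma^2$ is an $N \times N$ positive
definite matrix. Therefore, taking logarithms of the joint density, the log-likelihood function can
be written as

\[
\ln L(\Delta_1, \ldots, \Delta_T; \varphi) \propto C + \frac{T}{2} \ln |\Sigma_{22}^{-1}| - \frac{T}{2} \ln(|\sigma^2|)
- \frac{1}{2}(\upsilon + N(p+1)) \sum_{t=1}^{T} \ln(\gamma_t)
\tag{4.19}
\]

where $C = T \ln C_0$.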
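The parameter-dependent part of the log-likelihood can be sketched in Python as follows. The sketch takes the joint location vector and scale matrix of the stacked vector (Z_t, Z_{t-1}, ..., Z_{t-p}) as given, forms β, σ², c_t, u_t and γ_t as defined in Section 4.4.1, and uses the exponent ½(υ + k) with k = k1 + k2 from the joint density above. The function name, the simulated inputs and the omission of the constant C are assumptions made for illustration; NumPy is assumed to be available, and this is not the estimation code used in the dissertation.

```python
import numpy as np

def stvar_loglik_kernel(Z, p, mu, Sigma, nu):
    """Log-likelihood without the constant C, for data Z of shape (T, N).

    mu (length N * (p + 1)) and Sigma are the location and scale of the stacked
    vector (Z_t, Z_{t-1}, ..., Z_{t-p}); the first N entries refer to Z_t.
    """
    T_full, N = Z.shape
    k1, k2 = N, N * p
    mu1, mu2 = mu[:k1], mu[k1:]
    S11, S21, S22 = Sigma[:k1, :k1], Sigma[k1:, :k1], Sigma[k1:, k1:]
    beta = np.linalg.solve(S22, S21)               # k2 x k1
    sigma2 = S11 - S21.T @ beta                    # k1 x k1
    Y = Z[p:]
    X = np.hstack([Z[p - j:T_full - j] for j in range(1, p + 1)])
    dX = X - mu2
    c = 1.0 + np.einsum("ti,ti->t", dX @ np.linalg.inv(S22), dX) / nu
    U = (Y - mu1) - dX @ beta                      # rows are u_t'
    gamma = c + np.einsum("ti,ti->t", U @ np.linalg.inv(sigma2), U) / nu
    T = Y.shape[0]
    logdet_S22 = np.linalg.slogdet(S22)[1]
    logdet_s2 = np.linalg.slogdet(sigma2)[1]
    return (-0.5 * T * logdet_S22 - 0.5 * T * logdet_s2
            - 0.5 * (nu + k1 + k2) * np.log(gamma).sum())

# Hypothetical inputs: N = 2 series, p = 1 lag, arbitrary positive definite Sigma.
rng = np.random.default_rng(0)
Z = rng.standard_normal((60, 2))
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4.0 * np.eye(4)
mu = np.array([0.1, -0.2, 0.1, -0.2])
nu = 5.0
kernel = stvar_loglik_kernel(Z, p=1, mu=mu, Sigma=Sigma, nu=nu)
print(kernel)
```

In practice this function would be passed (with a sign change) to a numerical optimizer over the free parameters, with υ fixed at the value suggested by the M-S tests.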

Chapter 5

Conclusion

5.1 Introduction

How should one describe the volatility of speculative prices, which exhibit (i) non-Normal
distributions, (ii) volatility clustering, (iii) mean reversion and (iv) structural heterogeneity? The
main goal of this dissertation is to answer this question. More specifically, the key contributions
of this paper are as follows.

(1) This paper follows the PR approach to propose the Student’s t family models, particularly

including (Heterogeneous)St-AR and (Heterogeneous)St-VAR models, for capturing univariate and

multivariate volatility. The PR approach gives rise to a statistical model that is defined in terms

of observable random variables and their lags, and not errors, as is the case with the ARCH-type

formulations.

(2) I show that in both the univariate and multivariate cases, the ARCH type models are special
cases of the Student's t type models with implicit restrictions. Therefore the Student's t type
models generalize the ARCH type models and overcome several of their limitations. First, the use
of the multivariate Student's t distribution leads to a specification for the conditional variance
that is inherently heteroscedastic rather than ad hoc. Second, it gives rise to an internally
consistent model, which does not require any parametric positivity restrictions on the coefficients
or additional memory assumptions for the stability of the conditional variance. Third, this
specification allows us to model the conditional mean and conditional variance jointly, leading to
gains in efficiency, in contrast to the ARCH-type models, where the two functions are specified
separately.


(3) I propose heterogeneous versions of the Student's t type models in order to capture
interrelated changes in the structure of the first two moments of a time series. The Student's t
type models provide a convenient framework that enables us to model different types of
heterogeneity by introducing specific types of heterogeneous mean functions.

(4) I illustrate the applicability of the Student's t type models using data on real-world
speculative prices in Asia. The results of a number of empirical applications suggest that the
Student's t type models provide a promising way of modeling volatility. The forecasting
performance of the St-(V)AR models is far better than that of the Normal-(V)AR models. A key
reason for this is that the statistical adequacy of the Student's t models has been secured through
thorough M-S testing. The very different estimation results from the Normal-(V)AR and St-(V)AR
models illustrate the importance of appropriate model choice.

5.2 Discussion and Future Prospect

The challenge of volatility modeling is to find a parsimonious specification that gives rise to
reliable estimates and forecasts. In this dissertation I use the PR approach as an effective
specification selection method and statistical adequacy as the main criterion for evaluating the
reliability of the specification. The present study suggests several directions for further work.

(1) Models with different distributions can be considered. In this work, the main distributional
assumption is that the relevant variables follow the multivariate Student's t distribution, which is
a member of the elliptically symmetric family of distributions. Although the Student's t
distribution offers some advantages in volatility modeling, it can be extended to a more general
form in order to allow for (i) a non-linear mean process, (ii) marginal distributions with different
degrees of freedom and (iii) asymmetric volatility.

(2) Further study of different types of heterogeneity would be of interest. In this work, the
heterogeneity in the mean is captured by orthogonal polynomials. Other types of heterogeneity
functions, such as dummy variable functions, poly-trigonometric functions, or combinations of
different types, deserve careful study. Studying the properties and applicability of different types
of heterogeneity functions would make the specifications more flexible.

(3) In this work, the PR approach is mainly applied to build autoregressive models in both

univariate and multivariate cases. An important extension could be the application of the PR

approach in specifying other types of models, like panel data models and seemingly unrelated

regressions.

Bibliography

Andersen, T. G., & Bollerslev, T. (1998). Answering the Skeptics: Yes, Standard Volatility Models

Do Provide Accurate Forecasts. International Economic Review , 39 , 885–905.

Andreou, E., Pittis, N., & Spanos, A. (2001). On Modelling Speculative Prices: The Empirical

Literature. Journal of Economic Surveys, 15 , 187–220.

Andreou, E., & Spanos, A. (2003). Statistical Adequacy and the Testing of Trend Versus Difference

Stationarity. Journal of Economic Surveys, 22 , 217–237.

Bauwens, L., Laurent, S., & Rombouts, J. V. K. (2006). Multivariate GARCH Models: A Survey.

Journal of Applied Econometrics, 21 , 79–109.

Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of
Econometrics, 31, 307–327.

Bollerslev, T. (1987). A Conditionally Heteroskedastic Time Series Model for Speculative Prices

and Rates of Return. Review of Economics and Statistics, 69 , 542–547.

Bollerslev, T., & Baillie, R. T. (1990). A Multivariate Generalized ARCH Approach to Model-

ing Risk Premia in Forward Foreign Exchange Markets. Journal of International Money and

Finance, 9 , 309–324.

Bougerol, P., & Picard, N. (1992). Strict Stationarity of Generalized Autoregressive Processes.

The Annals of Probability , 20 , 1714–1730.

Degiannakis, S. A., & Xekalaki, E. (2004). Autoregressive Conditional Heteroscedasticity (ARCH)

Models: A Review. Quality Technology and Quantitative Management , 1 , 271–324.

Ding, Z., & Granger, C. W. (1996). Modeling Volatility Persistence of Speculative Prices: A New

Approach. Journal of Econometrics, 73 , 185–215.


Ding, Z., Granger, C. W., & Engle, R. F. (1993). A Long Memory Property of Stock Market

Returns and a New Model. Journal of Empirical Finance, 1 , 83–106.

Engle, R. (2002). New Frontiers for ARCH Models. Journal of Applied Econometrics, 5 , 425–446.

Engle, R. F. (1982). Autoregressive Conditional Heteroskedasticity with Estimates of the Variance

of United Kingdom Inflation. Econometrica, 50 , 987–1008.

Engle, R. F. (2004). Risk and Volatility: Econometric Models and Financial Practice. American

Economic Review , 94 , 405–420.

Engle, R. F., & Bollerslev, T. (1986). Modelling the Persistence of Conditional Variances. Econo-

metric Reviews, 5 , 1–50.

Engle, R. F., & Kroner, K. F. (1995). Multivariate Simultaneous Generalized ARCH. Econometric

Theory , 11 , 122–150.

Engle, R. F., Lilien, D. M., & Robins, R. P. (1987). Estimating Time Varying Risk Premia in the

Term Structure: The ARCH-M Model. Econometrica, 55 , 391–407.

Engle, R. F., Ng, V. K., & Rothschild, M. (1990). Asset Pricing with a Factor-ARCH Covariance

Structure: Empirical Estimates for Treasury Bills. Journal of Econometrics, 45 , 213–237.

Engle, R. F., & Patton, A. J. (2001). What Good is a Volatility Model? Quantitative Finance,

1 , 237–245.

Friedman, M. (2002). Nobel Lecture: Inflation and Unemployment. Journal of Political Economy ,

85 , 451–472.

Glosten, L. R., Jagannathan, R., & Runkle, D. E. (1993). On the Relation Between the Expected

Value and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance, 48 ,

1779–1801.

Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.

Hansen, P. R., & Lunde, A. (2005). A Forecast Comparison of Volatility Models: Does Anything

Beat a GARCH(1,1)? Journal of Applied Econometrics, 20 , 873–889.

He, C., & Terasvirta, T. (1999a). Fourth Moment Structure of the GARCH(p; q) Process. Econo-

metric Theory , 15 , 824–846.


He, C., & Terasvirta, T. (1999b). Properties of Moments of a Family of GARCH Processes. Journal

of Econometrics, 92 , 173–192.

He, C., Terasvirta, T., & Malmsten, H. (2002). Moment Structure of a Family of First-Order
Exponential GARCH Models. Econometric Theory, 18, 868–885.

Heracleous, M. S., & Spanos, A. (2006). The Student's t Autoregressive Model with Dynamic
Heteroskedasticity. Advances in Econometrics, 20, 289–319.

Inclan, C., & Tiao, G. C. (1994). Use of Cumulative Sums of Squares for Retrospective Detection

of Changes of Variance. Journal of the American Statistical Association, 89 , 913–923.

Kearney, C., & Patton, A. J. (2000). Multivariate GARCH Modeling of Exchange Rate Volatility

Transmission in the European Monetary System. The Financial Review , 41 , 29–48.

Kotz, S., & Nadarajah, S. (2004). Multivariate T-Distributions and Their Applications. Cambridge

University Press.

Lamoureux, C. G., & Lastrapes, W. D. (1990). Persistence in Variance, Structural Change, and

the GARCH Model. Journal of Business and Economic Statistics, 8 , 225–234.

Lee, G. G. J., & Engle, R. F. (1999). A Permanent and Transitory Component Model of Stock

Return Volatility. R. Engle and H. White (ed.) Cointegration, Causality, and Forecasting , (pp.

475–497).

Lee, S. W., & Hansen, B. E. (1994). Asymptotic Theory for the GARCH(1, 1) Quasimaximum

Likelihood Estimator. Econometric Theory , 10 , 29–52.

Li, W. K., Ling, S., & McAleer, M. (2002). Recent Theoretical Results for Time Series Models

with GARCH Errors. Journal of Economic Surveys, 16 , 245–69.

Ling, S., & McAleer, M. (2002). Stationarity and the existence of moments of a family of GARCH

processes. Journal of Econometrics, 106 , 109–117.

Linton, O. B., & Perron, B. (2003). The Shape of the Risk Premium: Evidence from a Semipara-

metric Generalized Autoregressive Conditional Heteroscedasticity Model. Journal of Business

and Economic Statistics, 21 , 354–67.

Lumsdaine, R. L. (1996). Consistency and Asymptotic Normality of the Quasi-maximum
Likelihood Estimator in IGARCH(1, 1) and Covariance Stationary GARCH(1, 1) Models.
Econometrica, 64, 575–596.


McGuirk, A., Driscoll, P., Alwang, J., & Huang, H. (1995). System Misspecification Testing and
Structural Change in the Demand for Meats. Journal of Agricultural and Resource Economics,
20, 1–21.

McGuirk, A., Robertson, J., & Spanos, A. (1993). Modeling Exchange Rate Dynamics: Non-Linear
Dependence and Thick Tails. Econometric Reviews, 12, 33–63.

McGuirk, A., & Spanos, A. (2002). The Model Specification Problem from a Probabilistic
Reduction Perspective. American Journal of Agricultural Economics, 83, 1168–1176.

Nelson, D. B. (1986). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econo-

metrica, 59 , 347–370.

Nelson, D. B. (1990). Stationarity and Persistence in the GARCH(1,1) Model. Econometric

Theory , 6 , 318–334.

Nelson, D. B., & Cao, C. Q. (1992). Inequality Constraints in the Univariate GARCH Model.

Journal of Business & Economic Statistics, 10 , 229–235.

Pan, M., Fok, R. C., & Liu, A. (2007). Dynamic Linkages between Exchange Rates and Stock

Prices: Evidence from East Asian Markets. International Review of Economics and Finance,

16 , 503–520.

Praag, B. M. V., & Wesselman, B. M. (1989). Elliptical Multivariate Analysis. Journal of Econo-

metrics, 41 , 189–203.

Rabemananjara, R., & Zakoian, J. M. (1993). Threshold ARCH Models and Asymmetries in
Volatility. Journal of Applied Econometrics, 8, 31–49.

Rapach, D. E., & Strauss, J. K. (2008). Structural Breaks and GARCH Models of Exchange Rate

Volatility. Journal of Applied Econometrics, 23 , 65–90.

Sanso, A., Arago, V., & Carrion, J. L. (2004). Testing for Changes in the Unconditional Variance
of Financial Time Series. DEA Working Papers 5, Universitat de les Illes Balears, Departament
d'Economia Aplicada.

Schwert, G. W. (1989). Why Does Stock Market Volatility Change Over Time? Journal of

Finance, 44 , 1115–1153.


Schwert, G. W., & Seguin, P. J. (1990). Heteroskedasticity in Stock Returns. Journal of Finance,

45 , 1129–1155.

Searle, S. R. (1982). Matrix Algebra Useful for Statistics. John Wiley.

Shumway, R. H., & Stoffer, D. S. (2006). Time Series Analysis and Its Applications With R

Examples. Springer Science+Business Media, LLC.

Spanos, A. (1986). Statistical Foundations of Econometric Modeling . Cambridge University Press.

Spanos, A. (1989). On Re-reading Haavelmo: A Retrospective View of Econometric Modeling.

Econometric Theory , 5 , 405–429.

Spanos, A. (1994). On Modeling Heteroskedasticity: The Student’s t and Elliptical Linear Regres-

sion Models. Econometric Theory , 10 , 286–315.

Spanos, A. (1995). On Normality and the Linear Regression Model. Econometric Review , 14 ,

195–203.

Spanos, A. (1999). Probability Theory and Statistical Inference: Econometric Modeling with Ob-

servational Data. Cambridge University Press.

Spanos, A. (2002). Student’s t Autoregressive Model with Dynamic Heteroskedasticity. Working

Paper .

Spanos, A. (2010a). Akaike-type Criteria and the Reliability of Inference: Model Selection versus

Statistical Model Specification. Journal of Econometrics, 158 , 204–220.

Spanos, A. (2010b). Statistical Adequacy and the Trustworthiness of Empirical Evidence: Statis-

tical vs. Substantive Information. Economic Modelling , 27 , 1436–1452.

Spanos, A. (2011). Revisiting the Statistical Foundations of Panel Data Modeling. Working Paper .

Tsai, H. (2008). A Note on Inequality Constraints in the GARCH Model. Econometric Theory ,

24 , 823–828.

Tsay, R. S. (2005). Analysis of Financial Time Series, 2nd Edition. Wiley-Interscience.

Tse, Y. K., & Tsui, A. K. C. (2002). A Multivariate GARCH Model with Time-varying Correla-

tions. Journal of Business and Economic Statistics, 20 , 351–362.

Zhao, H. (2010). Dynamic Relationship between Exchange Rate and Stock Price: Evidence from

China. Research in International Business and Finance, 24 , 103–112.
